<?xml version="1.0" standalone="yes"?>
<Paper uid="N06-1020">
  <Title>Effective Self-Training for Parsing</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> In parsing, we attempt to uncover the syntactic structure from a string of words. Much of the challenge of this lies in extracting the appropriate parsing decisions from textual examples. Given sufficient labelled data, there are several &amp;quot;supervised&amp;quot; techniques of training high-performance parsers (Charniak and Johnson, 2005; Collins, 2000; Henderson, 2004). Other methods are &amp;quot;semi-supervised&amp;quot; where they use some labelled data to annotate unlabeled data. Examples of this include self-training (Charniak, 1997) and co-training (Blum and Mitchell, 1998; Steedman et al., 2003). Finally, there are &amp;quot;unsupervised&amp;quot; strategies where no data is labeled and all annotations (including the grammar itself) must be discovered (Klein and Manning, 2002).</Paragraph>
    <Paragraph position="1"> Semi-supervised and unsupervised methods are important because good labeled data is expensive, whereas there is no shortage of unlabeled data.</Paragraph>
    <Paragraph position="2"> While some domain-language pairs have quite a bit of labelled data (e.g. news text in English), many other categories are not as fortunate. Less unsupervised methods are more likely to be portable to these new domains, since they do not rely as much on existing annotations.</Paragraph>
  </Section>
class="xml-element"></Paper>