<?xml version="1.0" standalone="yes"?>
<Paper uid="C02-1126">
  <Title>Recovering latent information in treebanks</Title>
  <Section position="4" start_page="0" end_page="0" type="intro">
    <SectionTitle>
2 Background
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.1 Head-lexicalization
</SectionTitle>
      <Paragraph position="0"> Many of the recent, successful statistical parsers have made use of lexical information or an implicit lexicalized grammar, both for English and, more recently, for other languages. All of these parsers recover the &amp;quot;hidden&amp;quot; lexicalizations in a treebank and find the most probable lexicalized tree when parsing, only to strip out this hidden information prior to evaluation. Also, in all these parsing e orts lexicalization has meant finding heads of constituents and then propagating those lexical heads to their respective parents. In fact, nearly identical head-lexicalizations were used in the dis- null criminative models described in (Magerman, 1995; Ratnaparkhi, 1997), the lexicalized PCFG models in (Collins, 1999), the generative model in (Charniak, 2000), the lexicalized TAG extractor in (Xia, 1999) and the stochastic lexicalized TAG models in (Chiang, 2000; Sarkar, 2001; Chen and Vijay-Shanker, 2000). Inducing a lexicalized structure based on heads has a two-pronged e ect: it not only allows statistical parsers to be sensitive to lexical information by including this information in the probability model's dependencies, but it also determines which of all possible dependencies-both syntactic and lexical--will be included in the model itself. For example, in Figure 2, the nonterminal NP(boy-NN) is dependent on VP(caught-VBD) and not the other way around.</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.2 Other tree transformations
</SectionTitle>
      <Paragraph position="0"> Lexicalization via head-finding is but one of many possible tree transformations that might be useful for parsing. As explored thoroughly by Johnson (1998), even simple, local syntactic transformations on training trees for an unlexicalized PCFG model can have a significant impact on parsing performance. Having picked up on this idea, Collins (1999) devises rules to identify arguments, i.e., constituents that are required to exist on a particular side of a head child constituent dominated by a particular parent. The parsing model can then probabilistically predict sets of requirements on either side of a head constituent, thereby incorporating a type of subcategorization information. While the model is augmented to include this subcatprediction feature, the actual identification of arguments is performed as one of many preprocessing steps on training trees, using a set of rules similar to those used for the identification of heads.</Paragraph>
      <Paragraph position="1"> Also, (Collins, 1999) makes use of several other transformations, such as the identification of subjectless sentences (augmenting S nodes to become SG) and the augmentation of nonterminals for gap threading. Xia (1999) combines head-finding with argument identification to extract elementary trees for use in the lexicalized TAG formalism. Other researchers investigated this type of extraction to construct stochastic TAG parsers (Chiang, 2000; Chen and Vijay-Shanker, 2000; Sarkar, 2001).</Paragraph>
    </Section>
    <Section position="3" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.3 Problems with heuristics
</SectionTitle>
      <Paragraph position="0"> While head-lexicalization and other tree transformations allow the construction of parsing models with more data-sensitivity and richer representations, crafting rules for these transformations has been largely an art, with heuristics handed down from researcher to researcher. What's more, on top of the large undertaking of designing and implementing a statistical parsing model, the use of heuristics has required a further e ort, forcing the researcher to bring both linguistic intuition and, more often, engineering savvy to bear whenever moving to a new treebank. For example, in the rule sets used by the parsers described in (Magerman, 1995; Ratnaparkhi, 1997; Collins, 1999), the sets of rules for finding the heads of ADJP, ADVP, NAC, PP and WHPP include rules for picking either the rightmost or leftmost FW (foreign word). The apparently haphazard placement of these rules that pick out FW and the rarity of FW nodes in the data strongly suggest these rules are the result of engineering e ort. Furthermore, it is not at all apparent that tree-transforming heuristics that are useful for one parsing model will be useful for another. Finally, as is often the case with heuristics, those used in statistical parsers tend not to be data-sensitive, and ironically do not rely on the words themselves.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>