<?xml version="1.0" standalone="yes"?>
<Paper uid="J03-4003">
  <Title>Head-Driven Statistical Models for Natural Language Parsing</Title>
  <Section position="3" start_page="0" end_page="591" type="intro">
    <SectionTitle>
Introduction
</SectionTitle>
    <Paragraph position="0"> Computational Linguistics Volume 29, Number 4 et al. (1992). In a history-based model, a parse tree is represented as a sequence of decisions, the decisions being made in some derivation of the tree. Each decision has an associated probability, and the product of these probabilities defines a probability distribution over possible derivations.</Paragraph>
    <Paragraph position="1"> We first describe three parsing models based on this approach. The models were originally introduced in Collins (1997); the current article  gives considerably more detail about the models and discusses them in greater depth. In Model 1 we show one approach that extends methods from probabilistic context-free grammars (PCFGs) to lexicalized grammars. Most importantly, the model has parameters corresponding to dependencies between pairs of headwords. We also show how to incorporate a &amp;quot;distance&amp;quot; measure into these models, by generalizing the model to a history-based approach. The distance measure allows the model to learn a preference for close attachment, or right-branching structures.</Paragraph>
    <Paragraph position="2"> In Model 2, we extend the parser to make the complement/adjunct distinction, which will be important for most applications using the output from the parser. Model 2 is also extended to have parameters corresponding directly to probability distributions over subcategorization frames for headwords. The new parameters lead to an improvement in accuracy.</Paragraph>
    <Paragraph position="3"> In Model 3 we give a probabilistic treatment of wh-movement that is loosely based on the analysis of wh-movement in generalized phrase structure grammar (GPSG) (Gazdar et al. 1985). The output of the parser is now enhanced to show trace coindexations in wh-movement cases. The parameters in this model are interesting in that they correspond directly to the probability of propagating GPSG-style slash features through parse trees, potentially allowing the model to learn island constraints. In the three models a parse tree is represented as the sequence of decisions corresponding to a head-centered, top-down derivation of the tree. Independence assumptions then follow naturally, leading to parameters that encode the X-bar schema, subcategorization, ordering of complements, placement of adjuncts, lexical dependencies, wh-movement, and preferences for close attachment. All of these preferences are expressed by probabilities conditioned on lexical heads. For this reason we refer to the models as head-driven statistical models.</Paragraph>
    <Paragraph position="4"> We describe evaluation of the three models on the Penn Wall Street Journal Tree-bank (Marcus, Santorini, and Marcinkiewicz 1993). Model 1 achieves 87.7% constituent precision and 87.5% consituent recall on sentences of up to 100 words in length in section 23 of the treebank, and Models 2 and 3 give further improvements to 88.3% constituent precision and 88.0% constituent recall. These results are competitive with those of other models that have been applied to parsing the Penn Treebank. Models 2 and 3 produce trees with information about wh-movement or subcategorization. Many NLP applications will need this information to extract predicate-argument structure from parse trees.</Paragraph>
    <Paragraph position="5"> The rest of the article is structured as follows. Section 2 gives background material on probabilistic context-free grammars and describes how rules can be &amp;quot;lexicalized&amp;quot; through the addition of headwords to parse trees. Section 3 introduces the three probabilistic models. Section 4 describes various refinments to these models. Section 5 discusses issues of parameter estimation, the treatment of unknown words, and also the parsing algorithm. Section 6 gives results evaluating the performance of the models on the Penn Wall Street Journal Treebank (Marcus, Santorini, and Marcinkiewicz  Collins Head-Driven Statistical Models for NL Parsing detailed analysis of the parser's performance on treebank data, including results on different constituent types. We also give a breakdown of precision and recall results in recovering various types of dependencies. The intention is to give a better idea of the strengths and weaknesses of the parsing models. Section 7 goes on to discuss the distance features in the models, the implicit assumptions that the models make about the treebank annotation style, and the way that context-free rules in the original treebank are broken down, allowing the models to generalize by producing new rules on test data examples. We analyze these phenomena through experiments on parsing accuracy, by collecting frequencies of various structures in the treebank, and through linguistically motivated examples. Finally, section 8 gives more discussion, by comparing the models to others that have been applied to parsing the treebank. We aim to give some explanation of the differences in performance among the various models.</Paragraph>
  </Section>
class="xml-element"></Paper>