<?xml version="1.0" standalone="yes"?>
<Paper uid="P04-1015">
  <Title>Incremental Parsing with the Perceptron Algorithm</Title>
  <Section position="3" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> In statistical approaches to NLP problems such as tagging or parsing, it seems clear that the representation used as input to a learning algorithm is central to the accuracy of an approach. In an ideal world, the designer of a parser or tagger would be free to choose any features which might be useful in discriminating good from bad structures, without concerns about how the features interact with the problems of training (parameter estimation) or decoding (search for the most plausible candidate under the model). To this end, a number of recently proposed methods allow a model to incorporate &amp;quot;arbitrary&amp;quot; global features of candidate analyses or parses. Examples of such techniques are Markov Random Fields (Ratnaparkhi et al., 1994; Abney, 1997; Della Pietra et al., 1997; Johnson et al., 1999), and boosting or perceptron approaches to reranking (Freund et al., 1998; Collins, 2000; Collins and Duffy, 2002).</Paragraph>
    <Paragraph position="1"> A drawback of these approaches is that in the general case, they can require exhaustive enumeration of the set of candidates for each input sentence in both the training and decoding phases1. For example, Johnson et al.</Paragraph>
    <Paragraph position="2"> (1999) and Riezler et al. (2002) use all parses generated by an LFG parser as input to an MRF approach - given the level of ambiguity in natural language, this set can presumably become extremely large. Collins (2000) and Collins and Duffy (2002) rerank the top N parses from an existing generative parser, but this kind of approach 1Dynamic programming methods (Geman and Johnson, 2002; Lafferty et al., 2001) can sometimes be used for both training and decoding, but this requires fairly strong restrictions on the features in the model.</Paragraph>
    <Paragraph position="3"> presupposes that there is an existing baseline model with reasonable performance. Many of these baseline models are themselves used with heuristic search techniques, so that the potential gain through the use of discriminative re-ranking techniques is further dependent on effective search.</Paragraph>
    <Paragraph position="4"> This paper explores an alternative approach to parsing, based on the perceptron training algorithm introduced in Collins (2002). In this approach the training and decoding problems are very closely related - the training method decodes training examples in sequence, and makes simple corrective updates to the parameters when errors are made. Thus the main complexity of the method is isolated to the decoding problem. We describe an approach that uses an incremental, left-to-right parser, with beam search, to find the highest scoring analysis under the model. The same search method is used in both training and decoding. We implemented the perceptron approach with the same feature set as that of an existing generative model (Roark, 2001a), and show that the perceptron model gives performance competitive to that of the generative model on parsing the Penn treebank, thus demonstrating that an unnormalized discriminative parsing model can be applied with heuristic search. We also describe several refinements to the training algorithm, and demonstrate their impact on convergence properties of the method.</Paragraph>
    <Paragraph position="5"> Finally, we describe training the perceptron model with the negative log probability given by the generative model as another feature. This provides the perceptron algorithm with a better starting point, leading to large improvements over using either the generative model or the perceptron algorithm in isolation (the hybrid model achieves 88.8% f-measure on the WSJ treebank, compared to figures of 86.7% and 86.6% for the separate generative and perceptron models). The approach is an extremely simple method for integrating new features into the generative model: essentially all that is needed is a definition of feature-vector representations of entire parse trees, and then the existing parsing algorithms can be used for both training and decoding with the models.</Paragraph>
  </Section>
class="xml-element"></Paper>