<?xml version="1.0" standalone="yes"?>
<Paper uid="P04-1013">
  <Title>Discriminative Training of a Neural Network Statistical Parser</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Much recent work has investigated the application of discriminative methods to NLP tasks, with mixed results. Klein and Manning (2002) argue that these results show a pattern where discriminative probability models are inferior to generative probability models, but that improvements can be achieved by keeping a generative probability model and training according to a discriminative optimization criteria. We show how this approach can be applied to broad coverage natural language parsing. Our estimation and training methods successfully balance the con icting requirements that the training method be both computationally tractable for large datasets and a good approximation to the theoretically optimal method. The parser which uses this approach outperforms both a generative model and a discriminative model, achieving state-of-the-art levels of performance (90.1% F-measure on constituents).</Paragraph>
    <Paragraph position="1"> To compare these di erent approaches, we use a neural network architecture called Simple Synchrony Networks (SSNs) (Lane and Henderson, 2001) to estimate the parameters of the probability models. SSNs have the advantage that they avoid the need to impose hand-crafted independence assumptions on the learning process. Training an SSN simultaneously trains a nite representations of the unbounded parse history and a mapping from this history representation to the parameter estimates. The history representations are automatically tuned to optimize the parameter estimates. This avoids the problem that any choice of hand-crafted independence assumptions may bias our results towards one approach or another. The independence assumptions would have to be di erent for the generative and discriminative probability models, and even for the parsers which use the generative probability model, the same set of independence assumptions may be more appropriate for maximizing one training criteria over another. By inducing the history representations speci cally to t the chosen model and training criteria, we avoid having to choose independence assumptions which might bias our results.</Paragraph>
    <Paragraph position="2"> Each complete parsing system we propose consists of three components, a probability model for sequences of parser decisions, a Simple Synchrony Network which estimates the parameters of the probability model, and a procedure which searches for the most probable parse given these parameter estimates. This paper outlines each of these components, but more details can be found in (Henderson, 2003b), and, for the discriminative model, in (Henderson, 2003a). We also present the training methods, and experiments on the proposed parsing models. null</Paragraph>
  </Section>
class="xml-element"></Paper>