<?xml version="1.0" standalone="yes"?>
<Paper uid="P02-1035">
  <Title>Parsing the Wall Street Journal using a Lexical-Functional Grammar and Discriminative Estimation Techniques</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Statistical parsing with systems that combine hand-coded, linguistically fine-grained grammars and stochastic disambiguation components has seen considerable progress in recent years. However, such attempts have so far been confined to a relatively small scale for various reasons. Firstly, the rudimentary character of functional annotations in standard treebanks has hindered the direct use of such data for statistical estimation of linguistically fine-grained statistical parsing systems. Rather, parameter estimation for such models had to resort to unsupervised techniques (Bouma et al., 2000; Riezler et al., 2000), or training corpora tailored to the specific grammars had to be created by parsing and manual disambiguation, resulting in relatively small training sets of around 1,000 sentences (Johnson et al., 1999).</Paragraph>
    <Paragraph position="1"> Furthermore, the effort involved in coding broad-coverage grammars by hand has often led to the specialization of grammars to relatively small domains, thus sacrificing grammar coverage (i.e. the percentage of sentences for which at least one analysis is found) on free text. The approach presented in this paper is a first attempt to scale up stochastic parsing systems based on linguistically fine-grained hand-coded grammars to the UPenn Wall Street Journal (henceforth WSJ) treebank (Marcus et al., 1994).</Paragraph>
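Grammar coverage as defined above reduces to simple arithmetic over per-sentence parse counts. The following Python sketch assumes a hypothetical input format, a list holding the number of analyses the grammar found for each sentence; the function name is illustrative and not part of the system described here.

```python
from typing import Sequence


def grammar_coverage(num_analyses: Sequence[int]) -> float:
    """Percentage of sentences for which at least one analysis is found.

    num_analyses[i] is the number of parses found for sentence i
    (a hypothetical input format chosen for illustration).
    """
    if not num_analyses:
        return 0.0
    covered = sum(1 for n in num_analyses if n > 0)
    return 100.0 * covered / len(num_analyses)


# Example: 3 of 4 sentences receive at least one parse.
print(grammar_coverage([2, 0, 15, 1]))  # 75.0
```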
    <Paragraph position="2"> The problem of grammar coverage, i.e. the fact that not all sentences receive an analysis, is tackled in our approach by extending a full-fledged Lexical-Functional Grammar (LFG) and a constraint-based parser with partial parsing techniques. In the absence of a complete parse, a so-called &quot;FRAGMENT grammar&quot; allows the input to be analyzed as a sequence of well-formed chunks. The set of fragment parses is then chosen by a fewest-chunk method, which prefers analyses consisting of the fewest chunks. With this combination of full and partial parsing techniques we achieve 100% grammar coverage on unseen data.</Paragraph>
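The fallback from full to partial parsing can be sketched as follows, under simplifying assumptions: fragment analyses are taken to be already enumerated as sequences of (category, words) chunks, and the chunk labels, the TOKEN category and the function names are hypothetical. In the actual system the FRAGMENT grammar and the fewest-chunk preference operate inside the constraint-based parser, so this only illustrates the selection criterion.

```python
from typing import List, Sequence, Tuple

# Hypothetical representation: a chunk is (category, covered words) and a
# fragment analysis is the sequence of chunks that together span the input.
Chunk = Tuple[str, Tuple[str, ...]]
FragmentParse = Sequence[Chunk]


def fewest_chunk(fragment_parses: List[FragmentParse]) -> List[FragmentParse]:
    """Keep every fragment analysis that uses the minimal number of chunks."""
    if not fragment_parses:
        return []
    fewest = min(len(p) for p in fragment_parses)
    return [p for p in fragment_parses if len(p) == fewest]


def analyze(full_parses: list, fragment_parses: List[FragmentParse]) -> list:
    """Return the complete parses if any exist, else the fewest-chunk fragment parses."""
    if full_parses:
        return full_parses
    return fewest_chunk(fragment_parses)


# Example: no complete parse exists for "the share expired early and".
candidates = [
    [("NP", ("the", "share")), ("VP", ("expired", "early")), ("TOKEN", ("and",))],
    [("S", ("the", "share", "expired", "early")), ("TOKEN", ("and",))],
]
best = analyze([], candidates)
print(len(best), len(best[0]))  # 1 2
```

In this sketch `fewest_chunk` returns every analysis tied at the minimum rather than a single one, leaving any remaining ambiguity to the stochastic disambiguation component.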
    <Paragraph position="3"> Another goal of this work is the best possible exploitation of the WSJ treebank for discriminative estimation of an exponential model on LFG parses. We define discriminative or conditional criteria with respect to the set of grammar parses consistent with the treebank annotations.</Paragraph>
    <!-- Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL), Philadelphia, July 2002, pp. 271-278. -->
    <!-- Figure (image not recovered); example sentence: "The golden share was scheduled to expire at the beginning of" -->
    <Paragraph position="5"> Such data, i.e. the parses consistent with the treebank, can be gathered by applying labels and brackets taken from the treebank annotation to the parser input. The rudimentary treebank annotations are thus used to provide partially labeled data for discriminative estimation of a probability model on linguistically fine-grained parses.</Paragraph>
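One standard way to formalize estimation from such partially labeled data is to maximize the conditional log-likelihood of the set of treebank-consistent parses, summing over which consistent parse is correct: for each sentence, log of the summed exp-scores of the consistent parses minus log of the summed exp-scores of all parses. The Python sketch below assumes that, for each sentence, the parser has already produced all parses and the consistent subset, each as a sparse feature dictionary; the data layout and function names are hypothetical, no regularization is included, and the optimizer actually used is not specified here. The gradient is included because any gradient-based optimizer would need it: expected feature counts over the consistent parses minus those over all parses.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical data layout: each parse is a sparse feature vector (name -> value).
Features = Dict[str, float]


def score(theta: Dict[str, float], f: Features) -> float:
    """Linear score theta . f of one parse under the exponential model."""
    return sum(theta.get(k, 0.0) * v for k, v in f.items())


def log_sum_exp(xs: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(x) for x in xs)) for non-empty xs."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def expected_features(theta: Dict[str, float], parses: List[Features]) -> Dict[str, float]:
    """E[f] under p(y) proportional to exp(theta . f(y)) over the given parses."""
    scores = [score(theta, f) for f in parses]
    z = log_sum_exp(scores)
    expect: Dict[str, float] = {}
    for s, f in zip(scores, parses):
        p = math.exp(s - z)
        for k, v in f.items():
            expect[k] = expect.get(k, 0.0) + p * v
    return expect


def conditional_log_likelihood(theta: Dict[str, float], data) -> Tuple[float, Dict[str, float]]:
    """Objective and gradient for partially labeled data.

    data: list of (all_parses, consistent_parses) pairs, where consistent_parses
    is the non-empty subset of all_parses compatible with the treebank annotation.
    """
    ll = 0.0
    grad: Dict[str, float] = {}
    for all_parses, consistent in data:
        ll += log_sum_exp([score(theta, f) for f in consistent])
        ll -= log_sum_exp([score(theta, f) for f in all_parses])
        for k, v in expected_features(theta, consistent).items():
            grad[k] = grad.get(k, 0.0) + v
        for k, v in expected_features(theta, all_parses).items():
            grad[k] = grad.get(k, 0.0) - v
    return ll, grad


parses = [{"a": 1.0}, {"b": 1.0}, {"a": 1.0, "b": 1.0}]
data = [(parses, [parses[0]])]  # only the first parse matches the treebank
ll, grad = conditional_log_likelihood({}, data)
print(round(ll, 4))                               # -1.0986
print({k: round(v, 4) for k, v in grad.items()})  # {'a': 0.3333, 'b': -0.6667}
```

With all weights at zero, the three parses are equally likely, so the consistent parse receives probability 1/3; the positive gradient on feature a pushes the model toward it.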
    <Paragraph position="6"> Concerning empirical evaluation of disambiguation performance, we feel that an evaluation measuring matches of predicate-argument relations is more appropriate for assessing the quality of our LFG-based system than the standard labeled-bracketing measure on section 23 of the WSJ treebank. The first evaluation we present compares predicate-argument relations in LFG f-structures (henceforth the LFG annotation scheme) against a gold standard of manually annotated f-structures for a representative subset of the WSJ treebank. The evaluation measure counts the number of predicate-argument relations in the f-structure of the parse selected by the stochastic model that match those in the gold standard annotation. Our parser plus stochastic disambiguator achieves an F-score of 79% under this evaluation regime.</Paragraph>
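The counting step can be sketched as precision, recall and F-score over relation triples. The Python sketch below assumes each f-structure has already been flattened into hypothetical (relation, predicate, argument) triples and that the reported F-score is the balanced harmonic mean of precision and recall; repeated triples are matched as a multiset. The triples in the example are invented for illustration.

```python
from collections import Counter
from typing import Iterable, Tuple

# Hypothetical encoding of a predicate-argument relation: (relation, predicate, argument).
Relation = Tuple[str, str, str]


def prf(test: Iterable[Relation], gold: Iterable[Relation]) -> Tuple[float, float, float]:
    """Precision, recall and balanced F-score of matching relations."""
    t, g = Counter(test), Counter(gold)
    matched = sum(min(c, g[r]) for r, c in t.items())  # multiset intersection size
    precision = matched / sum(t.values()) if t else 0.0
    recall = matched / sum(g.values()) if g else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


gold = [("subj", "expire", "share"), ("xcomp", "schedule", "expire"), ("adjunct", "expire", "at")]
selected = [("subj", "expire", "share"), ("obj", "schedule", "expire"), ("adjunct", "expire", "at")]
print([round(x, 3) for x in prf(selected, gold)])  # [0.667, 0.667, 0.667]
```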
    <Paragraph position="7"> Furthermore, we employ another metric that maps predicate-argument relations in LFG f-structures to the dependency relations (henceforth the DR annotation scheme) proposed by Carroll et al. (1999). Evaluation with this metric measures matches of dependency relations against Carroll et al.'s gold standard corpus. For a direct comparison of our results with Carroll et al.'s system, we computed an F-score that does not distinguish different types of dependency relations. Under this measure we obtain an F-score of 76%.</Paragraph>
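The untyped comparison can be illustrated by relabeling LFG relations with DR labels and then scoring only (head, dependent) pairs. The mapping table below is a hypothetical three-entry fragment chosen for illustration, not the mapping actually used, and the function names are likewise assumptions.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]  # (relation, head, dependent)

# Illustrative, incomplete (hypothetical) mapping from LFG grammatical functions
# to dependency-relation labels in the style of Carroll et al. (1999).
LFG_TO_DR: Dict[str, str] = {"SUBJ": "ncsubj", "OBJ": "dobj", "ADJUNCT": "ncmod"}


def to_dr(lfg_relations: Iterable[Triple]) -> List[Triple]:
    """Relabel LFG triples with DR labels; functions without a mapping are dropped."""
    return [(LFG_TO_DR[fn], h, d) for fn, h, d in lfg_relations if fn in LFG_TO_DR]


def untyped_f_score(test: Iterable[Triple], gold: Iterable[Triple]) -> float:
    """Balanced F-score over (head, dependent) pairs, ignoring the relation type."""
    t = Counter((h, d) for _, h, d in test)
    g = Counter((h, d) for _, h, d in gold)
    matched = sum(min(c, g[pair]) for pair, c in t.items())
    if matched == 0:
        return 0.0
    precision = matched / sum(t.values())
    recall = matched / sum(g.values())
    return 2 * precision * recall / (precision + recall)


parsed = to_dr([("SUBJ", "expire", "share"), ("OBJ", "schedule", "expire"), ("ADJUNCT", "expire", "at")])
gold = [("ncsubj", "expire", "share"), ("xcomp", "schedule", "expire"), ("ncmod", "expire", "at")]
print(untyped_f_score(parsed, gold))  # 1.0: the dobj/xcomp label mismatch is ignored
```

If relation types were taken into account, the mislabeled second pair would count as an error and the score on this toy example would drop to 2/3.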
    <Paragraph position="8"> This paper is organized as follows. Section 2 describes the Lexical-Functional Grammar, the constraint-based parser, and the robustness techniques employed in this work. In section 3 we present the details of the exponential model on LFG parses and the discriminative statistical estimation technique. Experimental results are reported in section 4. A discussion of results is in section 5.</Paragraph>
  </Section>
</Paper>