<?xml version="1.0" standalone="yes"?>
<Paper uid="E06-2025">
  <Title>Theoretical Evaluation of Estimation Methods for Data-Oriented Parsing</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Stochastic Tree Substitution Grammars (henceforth, STSGs) are a simple generalization of Probabilistic Context-Free Grammars, where the productive elements are not rewrite rules but elementary trees of arbitrary size. The increased flexibility allows STSGs to model a variety of syntactic and statistical dependencies, using relatively complex primitives but just a single and extremely simple global rule: substitution. STSGs can be seen as Stochastic Tree Adjoining Grammars without the adjunction operation.</Paragraph>
    <Paragraph position="1"> STSGs are the underlying formalism of most instantiations of an approach to statistical parsing known as &quot;Data-Oriented Parsing&quot; (Scha, 1990; Bod, 1998). In this approach the subtrees of the trees in a tree bank are used as elementary trees of the grammar. In most DOP models the grammar used is an STSG with, in principle, all subtrees[1] of the trees in the tree bank as elementary trees. For disambiguation, the best parse tree is taken to be the most probable parse according to the weights of the grammar.</Paragraph>
    <Paragraph position="2"> [1] A subtree t' of a parse tree t is a tree such that every node i' in t' equals a node i in t, and i' either has no daughters or the same daughter nodes as i.</Paragraph>
    <Paragraph position="3"> Several methods have been proposed to decide on the weights based on observed tree frequencies in a tree bank. The first such method is now known as &quot;DOP1&quot; (Bod, 1993). In combination with some heuristic constraints on the allowed subtrees, it has been remarkably successful on small tree banks. Despite this empirical success, Johnson (2002) argued that it is inadequate because it is biased and inconsistent. His criticism spearheaded a number of other methods, including (Bonnema et al., 1999; Bod, 2003; Sima'an and Buratto, 2003; Zollmann and Sima'an, 2005), and will be the starting point of our analysis. As it turns out, the DOP1 method really is biased and inconsistent, but not for the reasons Johnson gives, and it really is inadequate, but not because it is biased and inconsistent. In this note, we further show that the alternative methods that have been proposed only partly remedy the problems with DOP1, leaving weight estimation as an important open problem.</Paragraph>
  </Section>
</Paper>