<?xml version="1.0" standalone="yes"?>
<Paper uid="P01-1010">
<Title>What is the Minimal Set of Fragments that Achieves Maximal Parse Accuracy?</Title>
<Section position="7" start_page="2" end_page="2" type="concl">
<SectionTitle>5 Discussion: Converging Approaches</SectionTitle>
<Paragraph position="0"> The main goal of this paper was to find the minimal set of fragments that achieves maximal parse accuracy in Data Oriented Parsing (DOP). We have found that this minimal set of fragments is very large and extremely redundant. The highest parse accuracy is obtained by imposing only two constraints on the fragment set: a restriction of the number of words in a fragment's frontier to 12, and a restriction of the depth of unlexicalized fragments to 6. No other constraints were warranted.</Paragraph>
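<Paragraph position="1"> To make these two constraints concrete, the following Python sketch shows how such a fragment filter might look. This is an illustrative reconstruction, not the implementation used in our experiments: the nested-tuple encoding of fragments (a label followed by its children), the apostrophe convention for marking lexical items, and all names are assumptions made for the example.

MAX_FRONTIER_WORDS = 12   # constraint 1: words in a fragment's frontier
MAX_UNLEX_DEPTH = 6       # constraint 2: depth of unlexicalized fragments

def is_word(node):
    # Hypothetical convention: lexical items are strings marked with a
    # leading apostrophe, e.g. "'the"; bare strings are nonterminal leaves.
    return isinstance(node, str) and node.startswith("'")

def frontier_words(frag):
    # Count lexical items on the fragment's frontier.
    if isinstance(frag, str):
        return 1 if is_word(frag) else 0
    return sum(frontier_words(child) for child in frag[1:])

def depth(frag):
    # Depth of a fragment; a single leaf node has depth 0.
    if isinstance(frag, str):
        return 0
    return 1 + max(depth(child) for child in frag[1:])

def keep_fragment(frag):
    # Keep a fragment only if it satisfies both constraints; the depth
    # limit applies only to fully unlexicalized fragments.
    if frontier_words(frag) > MAX_FRONTIER_WORDS:
        return False
    if frontier_words(frag) == 0 and depth(frag) > MAX_UNLEX_DEPTH:
        return False
    return True

# e.g. keep_fragment(('NP', ('DT', "'the"), 'NN')) is True: one frontier
# word, depth 2.

Under these two checks, all remaining fragments seen in the treebank are retained, which reflects how permissive the resulting fragment set is.</Paragraph>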
<Paragraph position="2"> An important question is why maximal parse accuracy occurs with exactly these constraints. Although we do not know the answer, we surmise that such constraints differ from corpus to corpus and are related to general data-sparseness effects. In previous experiments with DOP1 on smaller and more restricted domains, we found that parse accuracy also decreases beyond a certain maximum subtree depth (see Bod 1998; Sima'an 1999). We expect that parse accuracy will likewise decrease beyond a certain depth for the WSJ, although we have not yet been able to find this depth.</Paragraph>
<Paragraph position="3"> A major difference between our approach and most other models tested on the WSJ is that the DOP model uses frontier lexicalization, while most other models use constituent lexicalization (in that they associate each constituent nonterminal with its lexical head -- see Collins 1996, 1999; Charniak 1997; Eisner 1997). The results in this paper indicate that frontier lexicalization is a promising alternative to constituent lexicalization. Our results also show that the linguistically motivated constraint that limits statistical dependencies to the locality of headwords of constituents is too narrow. Not only are counts of subtrees with nonheadwords important; counts of unlexicalized subtrees up to depth 6 also increase parse accuracy.</Paragraph>
<Paragraph position="4"> The only other model that uses frontier lexicalization and was tested on the standard WSJ split is that of Chiang (2000), who extracts a stochastic tree-insertion grammar, or STIG (Schabes & Waters 1996), from the WSJ, obtaining 86.6% LP and 86.9% LR for sentences ≤ 40 words. However, Chiang's approach is limited in at least two respects. First, each elementary tree in his STIG is lexicalized with exactly one lexical item, while our results show that parse accuracy increases if trees with more lexical items, and also unlexicalized trees, are included (in his conclusion, Chiang acknowledges that &quot;multiply anchored trees&quot; may be important). Second, Chiang computes the probability of a tree by taking into account only one derivation, while in STIG, as in DOP1, several derivations can generate the same tree.</Paragraph>
<Paragraph position="5"> Another difference between our approach and most other models is that the underlying grammar of DOP is a treebank grammar (cf. Charniak 1996, 1997), while most current stochastic parsing models use a &quot;Markov grammar&quot; (e.g. Collins 1999; Charniak 2000).</Paragraph>
<Paragraph position="6"> While a treebank grammar assigns probabilities only to rules or subtrees that have been observed in a treebank, a Markov grammar assigns a probability to any possible rule, resulting in a more robust model. We expect that applying the Markov-grammar approach to DOP will further improve our results. Research in this direction is already ongoing, although it has so far been tested only for rather limited subtree depths (see Sima'an 2000).</Paragraph>
<Paragraph position="7"> Although we believe that our main result is to have shown that almost arbitrary fragments within parse trees are important, it is still surprising that a relatively simple model like DOP1 outperforms most other stochastic parsers on the WSJ. Yet, to the best of our knowledge, DOP is the only model that does not a priori restrict the fragments used to compute the most probable parse. Instead, it starts out by taking into account all fragments seen in a treebank and then investigates fragment restrictions to discover the set of relevant fragments. From this perspective, the DOP approach can be seen as striving for the same goal as other approaches, but from a different direction. While other approaches usually limit the statistical dependencies beforehand (for example, to headword dependencies) and then try to improve parse accuracy by gradually letting in more dependencies, the DOP approach starts out by taking into account as many dependencies as possible and then tries to constrain them without losing parse accuracy. It is not unlikely that these two opposite directions will eventually converge on the same true set of statistical dependencies for natural language parsing.</Paragraph>
<Paragraph position="8"> As it happens, considerable convergence has already taken place. The history of stochastic parsing models shows a consistent increase in the scope of statistical dependencies that are captured by these models. Figure 4 gives a (very) schematic overview of this increase (see Carroll & Weir 2000 for a more detailed account, presented as a subsumption lattice with SCFG at the bottom and DOP at the top).</Paragraph>
[Figure 4: schematic overview of the increasing scope of statistical dependencies captured by stochastic parsers]
<Paragraph position="9"> Thus there seems to be a convergence towards a maximalist model which &quot;takes all fragments [...] and lets the statistics decide&quot; (Bod 1998: 5). While early head-lexicalized grammars restricted the fragments to the locality of headwords (e.g. Collins 1996; Eisner 1996), later models showed the importance of including context from higher nodes in the tree (Charniak 1997; Johnson 1998). This mirrors our finding that (unlexicalized) fragments of depth 2 and larger are useful. The importance of including single nonheadwords is now also uncontroversial (e.g. Collins 1997, 1999; Charniak 2000), and the current paper has shown the importance of including two or more nonheadwords. Recently, Collins (2000) observed that &quot;In an ideal situation we would be able to encode arbitrary features h_s, thereby keeping track of counts of arbitrary fragments within parse trees&quot;. This is in perfect correspondence with the DOP philosophy.</Paragraph>
</Section>
</Paper>