File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/06/p06-1109_intro.xml
Size: 3,312 bytes
Last Modified: 2025-10-06 14:03:37
<?xml version="1.0" standalone="yes"?>
<Paper uid="P06-1109">
<Title>An All-Subtrees Approach to Unsupervised Parsing</Title>
<Section position="4" start_page="865" end_page="866" type="intro">
<SectionTitle>2 DOP</SectionTitle>
<Paragraph position="0">The key idea of DOP is this: given an annotated corpus, use all subtrees, regardless of size, to parse new sentences. The DOP1 model in Bod (1998) computes the probabilities of parse trees and sentences from the relative frequencies of the subtrees. Although it is now known that DOP1's relative frequency estimator is statistically inconsistent (Johnson 2002), the model yields excellent empirical results and has been used in state-of-the-art systems. Let us illustrate DOP1 with a simple example. Assume a corpus consisting of only two trees, as given in figure 1.</Paragraph>
<Paragraph position="1">New sentences may be derived by combining fragments, i.e. subtrees, from this corpus, by means of a node-substitution operation indicated as ∘. Node-substitution identifies the leftmost nonterminal frontier node of one subtree with the root node of a second subtree (i.e., the second subtree is substituted on the leftmost nonterminal frontier node of the first subtree). Thus a new sentence such as Mary likes Susan can be derived by combining subtrees from this corpus, as shown in figure 2. DOP1 computes the probability of a subtree t as the probability of selecting t among all corpus subtrees that can be substituted on the same node as t. This probability is computed as the number of occurrences of t in the corpus, |t|, divided by the total number of occurrences of all subtrees t' with the same root label as t (this subtree probability is redressed by a simple correction factor discussed in Goodman (2003: 136) and Bod (2003)). Let r(t) return the root label of t. Then we may write:</Paragraph>
<Paragraph position="2">P(t) = \frac{|t|}{\sum_{t' : r(t') = r(t)} |t'|}</Paragraph>
<Paragraph position="3">The probability of a derivation t_1 ∘ ... ∘ t_n is computed by the product of the probabilities of its subtrees t_i:</Paragraph>
<Paragraph position="4">P(t_1 \circ \ldots \circ t_n) = \prod_i P(t_i)</Paragraph>
<Paragraph position="5">As we have seen, there may be several distinct derivations that generate the same parse tree. The probability of a parse tree T is the sum of the probabilities of its distinct derivations. Let t_{id} be the i-th subtree in the derivation d that produces tree T; then the probability of T is given by</Paragraph>
<Paragraph position="7">P(T) = \sum_d \prod_i P(t_{id})</Paragraph>
<Paragraph position="8">Thus DOP1 considers counts of subtrees of a wide range of sizes: everything from counts of single-level rules to entire trees is taken into account to compute the most probable parse tree of a sentence.</Paragraph>
<Paragraph position="9">A disadvantage of the approach may be that an extremely large number of subtrees (and derivations) must be considered. Fortunately there exists a compact isomorphic PCFG-reduction of DOP1 whose size is linear rather than exponential in the size of the training set (Goodman 2003).</Paragraph>
<Paragraph position="10">Moreover, Collins and Duffy (2002) show how a tree kernel can be applied to DOP1's all-subtrees representation. The currently most successful version of DOP1 uses a PCFG-reduction of the model with an n-best parsing algorithm (Bod 2003).</Paragraph>
</Section>
</Paper>
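
To make the node-substitution operation ∘ concrete, the following is a minimal Python sketch, not from the paper: it assumes a tree is a nested tuple (label, child1, ...), a terminal is a plain string, and an open substitution site (a nonterminal frontier node) is a 1-tuple (label,); the function names and this encoding are illustrative.

# A minimal sketch of DOP's node-substitution operation, under an assumed
# encoding: a tree is a nested tuple (label, child1, ...), a terminal is a
# plain string, and an open substitution site is a 1-tuple (label,).
# All names here are illustrative, not from the paper.

def leftmost_frontier_path(tree, path=()):
    """Return the path (child indices) to the leftmost open frontier node,
    or None if the tree has no open substitution site left."""
    if isinstance(tree, str):
        return None                      # terminal word
    if len(tree) == 1:
        return path                      # open substitution site (label,)
    for i, child in enumerate(tree[1:], start=1):
        p = leftmost_frontier_path(child, path + (i,))
        if p is not None:
            return p
    return None

def substitute(t1, t2):
    """t1 ∘ t2: substitute t2 at the leftmost nonterminal frontier node of t1.
    The root label of t2 must match the frontier label, else ∘ is undefined."""
    path = leftmost_frontier_path(t1)
    if path is None:
        raise ValueError("t1 has no open substitution site")

    def rebuild(node, rest):
        if not rest:
            if node[0] != t2[0]:
                raise ValueError("root label of t2 does not match frontier node")
            return t2
        i = rest[0]
        return node[:i] + (rebuild(node[i], rest[1:]),) + node[i + 1:]

    return rebuild(t1, path)

# Deriving "Mary likes Susan" from corpus subtrees:
frame = ("S", ("NP",), ("VP", ("V", "likes"), ("NP",)))
tree = substitute(substitute(frame, ("NP", "Mary")), ("NP", "Susan"))
print(tree)
# ('S', ('NP', 'Mary'), ('VP', ('V', 'likes'), ('NP', 'Susan')))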
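
Similarly, a minimal sketch of DOP1's probability model under the same tuple encoding, assuming the multiset of corpus subtrees has already been extracted into a Counter (that extraction step is elided for brevity); r, subtree_prob, derivation_prob, and parse_prob are illustrative names, not the paper's code.

# A minimal sketch of DOP1's relative-frequency estimator, assuming `counts`
# is a precomputed Counter over all corpus subtrees (hashable tuples as above).

from collections import Counter
from math import prod

def r(t):
    """r(t): the root label of subtree t."""
    return t[0]

def subtree_prob(t, counts):
    """P(t) = |t| / sum of |t'| over all corpus subtrees t' with r(t') = r(t).
    (Relative frequency; statistically inconsistent per Johnson 2002, but
    effective in practice, as the section notes.)"""
    denom = sum(n for s, n in counts.items() if r(s) == r(t))
    return counts[t] / denom

def derivation_prob(subtrees, counts):
    """P(t1 ∘ ... ∘ tn) = product over i of P(ti)."""
    return prod(subtree_prob(t, counts) for t in subtrees)

def parse_prob(derivations, counts):
    """P(T) = sum over T's distinct derivations d of the product over i of
    P(t_id), where t_id is the i-th subtree of d."""
    return sum(derivation_prob(d, counts) for d in derivations)

# Toy usage with a hand-built multiset of subtrees (hypothetical counts):
counts = Counter({
    ("NP", "Mary"): 1,
    ("NP", "Susan"): 1,
    ("S", ("NP",), ("VP", ("V", "likes"), ("NP",))): 1,
})
d = [("S", ("NP",), ("VP", ("V", "likes"), ("NP",))),
     ("NP", "Mary"), ("NP", "Susan")]
print(parse_prob([d], counts))  # 1.0 * 0.5 * 0.5 = 0.25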