<?xml version="1.0" standalone="yes"?> <Paper uid="E93-1006"> <Title>Using an Annotated Corpus as a Stochastic Grammar</Title> <Section position="3" start_page="37" end_page="38" type="intro"> <SectionTitle> NP VP NP VP </SectionTitle> <Paragraph position="0"> Suppose that our combination operation (indicated with o) consists of substituting a subtree on the leftmost identically labeled leaf node of another subtree. Then the sentence Mary likes Susan can be parsed as an S by combining the following subtrees from the corpus.</Paragraph> <Paragraph position="2"> But the same parse tree can also be derived by combining other subtrees, for instance:</Paragraph> <Paragraph position="4"> Thus, a parse can have several derivations involving different subtrees. These derivations have different probabilities. Using the corpus as our stochastic grammar, we estimate the probability of substituting a certain subtree on a specific node as the probability of selecting this subtree among all subtrees in the corpus that could be substituted on that node. The probability of a derivation can be computed as the product of the probabilities of the subtrees that are combined. For the example derivations above, this yields:</Paragraph> <Paragraph position="6"> This example illustrates that a statistical language model which defines probabilities over parses by taking into account only one derivation does not accommodate all statistical properties of a language corpus. Instead, we will define the probability of a parse as the sum of the probabilities of all its derivations. Finally, the probability of a string is equal to the sum of the probabilities of all its parses.</Paragraph> <Paragraph position="7"> We will show that conventional parsing techniques can be applied to DOP, but that this becomes very inefficient, since the number of derivations of a parse grows exponentially with the length of the input string.
However, we will show that DOP can be parsed in polynomial time by using Monte Carlo techniques.</Paragraph> <Paragraph position="8"> An important advantage of using a corpus for probability calculation is that no training of parameters is needed, unlike other stochastic grammars (Jelinek et al., 1990; Pereira and Schabes, 1992; Schabes, 1992). Secondly, since we take into account all derivations of a parse, no relationship that might possibly be of statistical interest is ignored.</Paragraph> </Section></Paper>