<?xml version="1.0" standalone="yes"?>
<Paper uid="E93-1006">
  <Title>Using an Annotated Corpus as a Stochastic Grammar</Title>
  <Section position="3" start_page="37" end_page="38" type="intro">
    <SectionTitle>
NP VP NP VP
</SectionTitle>
    <Paragraph position="0"> Suppose that our combination operation (indicated with ∘) consists of substituting a subtree on the leftmost identically labeled leaf node of another subtree. Then the sentence Mary likes Susan can be parsed as an S by combining the following subtrees from the corpus.</Paragraph>
    <Paragraph position="2"> But the same parse tree can also be derived by combining other subtrees, for instance:</Paragraph>
    <Paragraph position="4"> Thus, a parse can have several derivations involving different subtrees. These derivations have different probabilities. Using the corpus as our stochastic grammar, we estimate the probability of substituting a certain subtree on a specific node as the probability of selecting this subtree among all subtrees in the corpus that could be substituted on that node. The probability of a derivation can be computed as the product of the probabilities of the subtrees that are combined. For the example derivations above, this yields:</Paragraph>
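As a rough sketch of this estimate, the following Python computes substitution and derivation probabilities from a bag of subtree counts. The counts and subtrees are invented for illustration and are not the values from the paper's example; the function names are likewise assumptions. "All subtrees that could be substituted on that node" is read here as all corpus subtrees with the same root label.

```python
from collections import Counter
from fractions import Fraction
from math import prod

def substitution_probability(subtree, bag):
    """Relative frequency of `subtree` among corpus subtrees with the same root label."""
    same_root = sum(n for t, n in bag.items() if t[0] == subtree[0])
    return Fraction(bag[subtree], same_root)

def derivation_probability(derivation, bag):
    """Product of the substitution probabilities of the combined subtrees."""
    return prod(substitution_probability(t, bag) for t in derivation)

# Hypothetical bag of subtree occurrences; ("NP",) marks an open leaf.
s_open = ("S", ("NP",), ("VP",))
s_mary = ("S", ("NP", "Mary"), ("VP",))
np_mary = ("NP", "Mary")
np_susan = ("NP", "Susan")
vp_open = ("VP", ("V", "likes"), ("NP",))
vp_susan = ("VP", ("V", "likes"), ("NP", "Susan"))
bag = Counter({s_open: 2, s_mary: 1, np_mary: 1, np_susan: 1,
               vp_open: 1, vp_susan: 1})

# Two derivations of the same parse of "Mary likes Susan".
derivation_a = [s_open, np_mary, vp_open, np_susan]
derivation_b = [s_mary, vp_susan]

print(derivation_probability(derivation_a, bag))  # 1/12
print(derivation_probability(derivation_b, bag))  # 1/6
```

Exact fractions are used instead of floats so that the two derivation probabilities can be compared without rounding error.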
    <Paragraph position="6"> This example illustrates that a statistical language model which defines probabilities over parses by taking into account only one derivation does not accommodate all statistical properties of a language corpus. Instead, we will define the probability of a parse as the sum of the probabilities of all its derivations. Finally, the probability of a string is equal to the sum of the probabilities of all its parses.</Paragraph>
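The definitions given in prose so far can be written compactly as follows. The notation is an assumption introduced here for illustration and does not appear in this section of the paper: |t| is the number of occurrences of subtree t in the corpus, r(t) is the root label of t, d is a derivation combining the subtrees t_1, ..., t_n, T is a parse tree, and w is a string.

```latex
% Substitution probability: relative frequency among subtrees with the same root label.
\[ P(t) = \frac{|t|}{\sum_{t' : r(t') = r(t)} |t'|} \]
% Derivation probability: product over the combined subtrees.
\[ P(d) = \prod_{i=1}^{n} P(t_i), \qquad d = t_1 \circ \dots \circ t_n \]
% Parse probability: sum over all derivations of the parse.
\[ P(T) = \sum_{d \text{ derives } T} P(d) \]
% String probability: sum over all parses of the string.
\[ P(w) = \sum_{T \text{ yields } w} P(T) \]
```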
    <Paragraph position="7"> We will show that conventional parsing techniques can be applied to DOP, but that this becomes very inefficient, since the number of derivations of a parse grows exponentially with the length of the input string. However, we will show that DOP can be parsed in polynomial time by using Monte Carlo techniques.</Paragraph>
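To give a feel for the sampling idea, the sketch below estimates the probability of a parse by drawing random derivations instead of enumerating them. It is not the parsing algorithm the paper develops, which is not described in this section; it only illustrates that the relative frequency of a parse among sampled derivations approximates the sum over its derivations. The tree encoding, function names, and subtree counts are all hypothetical.

```python
import random
from collections import Counter

def open_leaf(tree, path=()):
    """Path to the leftmost open leaf (a one-element tuple), or None."""
    if isinstance(tree, str):
        return None
    if len(tree) == 1:
        return path
    for i, child in enumerate(tree[1:], start=1):
        found = open_leaf(child, path + (i,))
        if found is not None:
            return found
    return None

def subtree_at(tree, path):
    for i in path:
        tree = tree[i]
    return tree

def replace_at(tree, path, new):
    if not path:
        return new
    i = path[0]
    return tree[:i] + (replace_at(tree[i], path[1:], new),) + tree[i + 1:]

def draw(label, bag, rng):
    """Draw a subtree rooted in `label` with probability proportional to its count."""
    candidates = [t for t in bag if t[0] == label]
    weights = [bag[t] for t in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]

def sample_parse(bag, rng, root="S"):
    """Sample one derivation by repeated leftmost substitution; return its parse."""
    tree = draw(root, bag, rng)
    while True:
        path = open_leaf(tree)
        if path is None:
            return tree
        label = subtree_at(tree, path)[0]
        tree = replace_at(tree, path, draw(label, bag, rng))

def estimate_parse_probability(parse, bag, samples=20000, seed=0):
    """Fraction of sampled derivations whose result is `parse`."""
    rng = random.Random(seed)
    hits = sum(sample_parse(bag, rng) == parse for _ in range(samples))
    return hits / samples

# Hypothetical bag of subtree counts; ("NP",) marks an open leaf.
bag = Counter({
    ("S", ("NP",), ("VP",)): 2,
    ("S", ("NP", "Mary"), ("VP",)): 1,
    ("NP", "Mary"): 1,
    ("NP", "Susan"): 1,
    ("VP", ("V", "likes"), ("NP",)): 1,
    ("VP", ("V", "likes"), ("NP", "Susan")): 1,
})
target = ("S", ("NP", "Mary"), ("VP", ("V", "likes"), ("NP", "Susan")))
estimate = estimate_parse_probability(target, bag)
print(estimate)
```

In this toy bag the target parse has four derivations whose probabilities sum to 1/2, so the printed estimate should lie close to 0.5. The bag contains no recursive subtrees, which guarantees that every sampled derivation terminates.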
    <Paragraph position="8"> An important advantage of using a corpus for probability calculation is that no training of parameters is needed, as is the case for other stochastic grammars (Jelinek et al., 1990; Pereira and Schabes, 1992; Schabes, 1992). Secondly, since we take into account all derivations of a parse, no relationship that might possibly be of statistical interest is ignored.</Paragraph>
  </Section>
</Paper>