<?xml version="1.0" standalone="yes"?>
<Paper uid="E93-1006">
  <Title>Using an Annotated Corpus as a Stochastic Grammar</Title>
  <Section position="1" start_page="0" end_page="0" type="abstr">
    <SectionTitle>
Abstract
</SectionTitle>
    <Paragraph position="0"> In Data Oriented Parsing (DOP), an annotated corpus is used as a stochastic grammar. An input string is parsed by combining subtrees from the corpus. As a consequence, one parse tree can usually be generated by several derivations that involve different subtrees. This leads to a statistical model in which the probability of a parse is equal to the sum of the probabilities of all its derivations. In (Scha, 1990) an informal introduction to DOP is given, while (Bod, 1992a) provides a formalization of the theory.</Paragraph>
    <Paragraph position="1"> In this paper we compare DOP with other stochastic grammars in the context of Formal Language Theory. It is proved that it is not possible to create, for every DOP-model, a strongly equivalent stochastic CFG which also assigns the same probabilities to the parses.</Paragraph>
    <Paragraph position="2"> We show that the maximum probability parse can be estimated in polynomial time by applying Monte Carlo techniques. The model was tested on a set of hand-parsed strings from the Air Travel Information System (ATIS) spoken language corpus. Preliminary experiments yield 96% test set parsing accuracy.</Paragraph>
  </Section>
</Paper>