<?xml version="1.0" standalone="yes"?>
<Paper uid="P98-1091">
  <Title>An Empirical Evaluation of Probabilistic Lexicalized Tree Insertion Grammars *</Title>
  <Section position="3" start_page="0" end_page="558" type="intro">
    <SectionTitle>
2 PLTIG and Related Work
</SectionTitle>
    <Paragraph position="0"> The inspiration for the PLTIG formalism stems from the desire to lexicalize a context-free grammar. There are three ways in which one might do so. First, one can modify the tree structures so that all context-free productions contain lexical items. Greibach normal form provides a well-known example of such a lexicalized context-free formalism. This method is not practical because altering the structures of the grammar damages the linguistic information stored in the original grammar (Schabes and Waters, 1994). Second, one might propagate lexical information upward through the productions. Examples of formalisms using this approach include the work of Magerman (1995), Charniak (1997), Collins (1997), and Goodman (1997). Third, a more linguistically motivated approach is to expand the domain of productions downward to incorporate more tree structures. The Lexicalized Tree-Adjoining Grammar (LTAG) formalism (Schabes et al., 1988; Schabes, 1990), although not context-free, is the best-known instance in this category.</Paragraph>
    <Paragraph position="1"> PLTIGs belong to this third category and generate only context-free languages.</Paragraph>
    <Paragraph position="2"> LTAGs (and LTIGs) are tree-rewriting systems, consisting of a set of elementary trees combined by tree operations. We distinguish two types of trees in the set of elementary trees: the initial trees and the auxiliary trees. Unlike full parse trees but reminiscent of the productions of a context-free grammar, both types of trees may have nonterminal leaf nodes. Auxiliary trees have, in addition, a distinguished nonterminal leaf node, labeled with the same nonterminal as the root node of the tree, called the foot node. Two types of operations are used to construct derived trees, or parse trees: substitution and adjunction. An initial tree can be substituted into the nonterminal leaf node of another tree in a way similar to the substitution of nonterminals in the production rules of CFGs. An auxiliary tree is inserted into another tree through the adjunction operation, which splices the auxiliary tree into the target tree at a node labeled with the same nonterminal as the root and foot of the auxiliary tree. By using a tree representation, LTAGs extend the domain of locality of a grammatical primitive, so that they capture both lexical features and hierarchical structure. Moreover, the adjunction operation elegantly models intuitive linguistic concepts such as long distance dependencies between words. Unlike the N-gram model, which only offers dependencies between neighboring words, these trees can model the interaction of structurally related words that occur far apart.</Paragraph>
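    <Paragraph> The two operations described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the Node class, its is_foot and is_subst flags, the functions substitute, adjoin, and frontier, and the toy trees in demo are all hypothetical names and data introduced here.

```python
import copy
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A tree node; every field name here is an illustrative assumption."""
    label: str                                    # nonterminal or word
    children: List["Node"] = field(default_factory=list)
    is_foot: bool = False                         # foot node of an auxiliary tree
    is_subst: bool = False                        # open nonterminal leaf


def find_foot(node):
    """Return the foot node at or below `node`, or None if there is none."""
    if node.is_foot:
        return node
    for child in node.children:
        found = find_foot(child)
        if found is not None:
            return found
    return None


def substitute(site, initial):
    """Fill the open nonterminal leaf `site` with a copy of an initial tree."""
    if not site.is_subst or site.label != initial.label:
        raise ValueError("substitution needs an open leaf with a matching label")
    site.children = copy.deepcopy(initial).children
    site.is_subst = False


def adjoin(target, auxiliary):
    """Splice a copy of `auxiliary` into the tree at the node `target`."""
    aux = copy.deepcopy(auxiliary)
    foot = find_foot(aux)
    if foot is None or foot is aux:
        raise ValueError("an auxiliary tree needs a foot node below its root")
    if aux.label != target.label or foot.label != target.label:
        raise ValueError("root, foot, and target labels must agree")
    foot.children = target.children   # the old subtree now hangs from the foot
    foot.is_foot = False
    target.children = aux.children    # the target expands into the auxiliary tree


def frontier(node):
    """Return the leaf labels of the tree, left to right."""
    if not node.children:
        return [node.label]
    words = []
    for child in node.children:
        words.extend(frontier(child))
    return words


def demo():
    """Derive a toy sentence with one substitution and one adjunction."""
    np_site = Node("NP", is_subst=True)
    vp = Node("VP", [Node("V", [Node("sleeps")])])
    sentence = Node("S", [np_site, vp])
    substitute(np_site, Node("NP", [Node("cats")]))
    often = Node("VP", [Node("Adv", [Node("often")]), Node("VP", is_foot=True)])
    adjoin(vp, often)
    return frontier(sentence)
```

    In this sketch the auxiliary tree for "often" has its root and foot both labeled VP, so adjoining it at the VP node pushes the original verb phrase down under the foot, which is the splicing behavior described in the text.</Paragraph>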
    <Paragraph position="3"> Like LTAGs, LTIGs are tree-rewriting systems, but they differ from LTAGs in their generative power. LTAGs can generate some strictly context-sensitive languages. They do so by using wrapping auxiliary trees, which allow non-empty frontier nodes (i.e., leaf nodes whose labels are not the empty terminal symbol) on both sides of the foot node. A wrapping auxiliary tree makes the formalism context-sensitive because it coordinates the string to the left of its foot with the string to the right of its foot while allowing a third string to be inserted into the foot. Just as the ability to recursively center-embed moves the required parsing time from O(n) for regular grammars to O(n^3) for context-free grammars, so the ability to wrap auxiliary trees moves the required parsing time further, to O(n^6) for tree-adjoining grammars (see footnote 1). This level of complexity is far too computationally expensive for current technologies. The complexity of LTAGs can be moderated by eliminating just the wrapping auxiliary trees. LTIGs prevent wrapping by restricting auxiliary tree structures to one of two forms: the left auxiliary tree, whose non-empty frontier nodes are all to the left of the foot node; or the right auxiliary tree, whose non-empty frontier nodes are all to the right of the foot node. Auxiliary trees of different types cannot adjoin into each other if the adjunction would result in a wrapping auxiliary tree. The resulting system is strongly equivalent to CFGs, yet is fully lexicalized and still O(n^3) parsable, as shown by Schabes and Waters (1994).</Paragraph>
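    <Paragraph> The distinction between left, right, and wrapping auxiliary trees can be checked mechanically by looking at the frontier on either side of the foot. The following Python sketch is illustrative only: the Node class, the EMPTY marker for the empty terminal symbol, and the function classify_auxiliary are hypothetical names introduced here, and the "empty" return value covers a case the text above does not discuss.

```python
from dataclasses import dataclass, field
from typing import List

EMPTY = ""  # assumed marker for the empty terminal symbol


@dataclass
class Node:
    """A tree node; the field names are illustrative assumptions."""
    label: str
    children: List["Node"] = field(default_factory=list)
    is_foot: bool = False


def leaves(node):
    """Return the frontier nodes of the tree, left to right."""
    if not node.children:
        return [node]
    out = []
    for child in node.children:
        out.extend(leaves(child))
    return out


def classify_auxiliary(root):
    """Classify an auxiliary tree as 'left', 'right', 'wrapping', or 'empty'."""
    frontier = leaves(root)
    foot_positions = [i for i, leaf in enumerate(frontier) if leaf.is_foot]
    if len(foot_positions) != 1:
        raise ValueError("an auxiliary tree needs exactly one foot node")
    foot = foot_positions[0]
    # A frontier node counts as non-empty when its label is not EMPTY.
    has_left = any(leaf.label != EMPTY for leaf in frontier[:foot])
    has_right = any(leaf.label != EMPTY for leaf in frontier[foot + 1:])
    if has_left and has_right:
        return "wrapping"   # not permitted in an LTIG
    if has_left:
        return "left"
    if has_right:
        return "right"
    return "empty"          # no non-empty frontier node besides the foot
```

    Under these assumptions, a grammar loader could reject any elementary tree for which classify_auxiliary returns "wrapping", which is the restriction that keeps the formalism context-free.</Paragraph>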
    <Paragraph position="4"> Furthermore, LTIGs can be parameterized to form probabilistic models (Schabes and Waters, 1993). Informally speaking, a parameter is associated with each possible adjunction or substitution operation between a tree and a node.</Paragraph>
    <Paragraph position="5"> For instance, suppose there are V left auxiliary trees that might adjoin into node η. Then there are V + 1 parameters associated with node η that describe the distribution of the likelihood of any left auxiliary tree adjoining into node η. (We need one extra parameter for the case of no left adjunction.) A similar set of parameters is constructed for the right adjunction and substitution distributions.</Paragraph>
    <Paragraph position="7"> ... represent a bigram grammar. The arrows indicate adjunction sites.</Paragraph>
    <Paragraph position="8"> 1 ... complexity for the recognition of Tree Adjoining Languages is O(M(n^2)), where M(k) is the time needed to multiply two k x k boolean matrices (Rajasekaran and Yooseph, 1995).</Paragraph>
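    <Paragraph> The parameterization described above, with one parameter for each of the V left auxiliary trees plus one for the case of no left adjunction, can be pictured as a single probability distribution per node. The Python sketch below is an illustration only: the function name left_adjunction_parameters, the NO_ADJUNCTION label, and the idea of obtaining the parameters by normalizing non-negative weights are assumptions made here, and the sketch does not reproduce how the paper estimates its parameters.

```python
NO_ADJUNCTION = "NONE"  # assumed label for the "no left adjunction" outcome


def left_adjunction_parameters(weights, none_weight):
    """Normalize V tree weights plus one extra weight into V + 1 probabilities.

    `weights` maps each left auxiliary tree name to a non-negative weight;
    `none_weight` is the weight of not adjoining anything on the left.
    """
    if NO_ADJUNCTION in weights:
        raise ValueError("tree names must not clash with the NO_ADJUNCTION label")
    all_weights = list(weights.values()) + [none_weight]
    if any(0 > w for w in all_weights):
        raise ValueError("weights must be non-negative")
    total = sum(all_weights)
    if total == 0:
        raise ValueError("at least one weight must be positive")
    params = {tree: w / total for tree, w in weights.items()}
    params[NO_ADJUNCTION] = none_weight / total
    return params
```

    With V = 3 candidate trees, the returned dictionary has four entries that sum to one. The right adjunction and substitution distributions could be represented in the same way, with one such dictionary per node and operation.</Paragraph>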
  </Section>
</Paper>