<?xml version="1.0" standalone="yes"?>
<Paper uid="C00-2134">
  <Title>Lexicalized Tree Automata-based Grammars for Translating Conversational Texts</Title>
  <Section position="2" start_page="0" end_page="926" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Achieving both broad coverage for general texts and better quality for texts from a restricted domain has been an important issue in practical natural language processing. Conversational language is a typical domain in which this problem is notable, since conversational texts often include idioms, colloquial expressions, and/or extra-grammatical expressions, while the majority of utterances still obey a standard grammar.</Paragraph>
    <Paragraph position="1"> Furuse and Iida (1994) proposed an approach to spoken-language translation based on pattern matching on the surface form, combined with an example-based disambiguation method. Since the grammar rules are simple patterns containing surface expressions or constituent boundaries, they are easy to write, and domain-specific knowledge can be easily accumulated in the grammar. On the other hand, relationships between two trees are not easy to describe, especially when they are far apart in a larger tree. This might become an obstacle to expanding a domain-specific grammar into a general grammar with wide coverage.</Paragraph>
    <Paragraph position="2"> Brown (1996) approached this problem with a multi-engine architecture, where outputs from Transfer Machine Translation (MT), Knowledge-based MT and Example-based MT are combined on the chart during parsing. Ruland et al. (1998) employ a multi-parser, multi-strategy architecture for robust parsing of spoken language, where the results from different engines are combined on the chart using probability-based scores. A difficulty with these hybrid architectures is that it is not easy to properly compare and combine the results from different engines designed on different principles. In addition, these methods require considerable computational power, since multiple parsers have to be run simultaneously.</Paragraph>
    <Paragraph position="3"> A third approach, such as Takeda (1996), is grammar-based. In this approach, a method is provided to associate a grammar rule with a word or a set of words in order to encode their idiosyncratic syntactic behaviour. An associated grammar rule can be seen as a kind of example if it is described mostly by surface-level information. As is apparent from this description, this approach is an application of strong lexicalization of a grammar (Schabes, Abeillé and Joshi, 1988).</Paragraph>
    <Paragraph position="4"> This approach allows general rules and surface-level patterns to coexist in a uniform framework, and the combination of both types of rules is naturally defined. These advantages are a good reason to employ strongly lexicalized grammars as the basic grammar formalism. However, we feel there are some points to be improved in the current strongly lexicalized grammar formalisms.</Paragraph>
    <Paragraph position="5"> The first point is the existence of a globally defined special tree operation, which requires a special parsing algorithm. In a strongly lexicalized grammar formalism, each word is associated with a finite set of trees anchored by that word. The tree operations usually include substitution of a leaf node by another tree, which corresponds to the expansion of a nonterminal symbol by a rewriting rule in a CFG.</Paragraph>
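    <Paragraph> The substitution operation described above can be illustrated with a minimal sketch. The names Node, open_site, substitute and words, as well as the example trees for "likes", "John" and "Mary", are assumptions made for illustration only and do not come from any cited formalism or implementation.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)
    word: Optional[str] = None   # set only on lexical (anchor) leaves
    open_site: bool = False      # True on a leaf that awaits substitution


def substitute(tree: Node, initial: Node) -> bool:
    """Fill the leftmost open site whose label matches the root of `initial`."""
    for i, child in enumerate(tree.children):
        if child.open_site and child.label == initial.label:
            tree.children[i] = initial
            return True
        if substitute(child, initial):
            return True
    return False


def words(tree: Node) -> List[str]:
    """Collect the lexical items of a tree from left to right."""
    if tree.word is not None:
        return [tree.word]
    out: List[str] = []
    for child in tree.children:
        out.extend(words(child))
    return out


# Hypothetical elementary tree anchored by "likes", with two open NP sites.
likes = Node("S", [
    Node("NP", open_site=True),
    Node("VP", [Node("V", word="likes"), Node("NP", open_site=True)]),
])
substitute(likes, Node("NP", word="John"))
substitute(likes, Node("NP", word="Mary"))
print(words(likes))
```

Each call replaces one open leaf by a whole tree with the same root label, just as a CFG rule expands one nonterminal; the difference is that the tree anchored by "likes" covers both of its arguments at once.</Paragraph>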
    <Paragraph position="6"> However, if the tree operation is limited to substitution, the resulting grammar, namely Lexicalized Tree Substitution Grammar (LTSG), cannot even reproduce the trees obtained from non-lexicalized context-free grammars. This is clear from the fact that for any LTSG, there is a constant such that, in any tree built by the grammar, the distance between the root node and the nearest lexical item is less than that constant, while this property does not always hold for a CFG. Tree Insertion Grammar (TIG), introduced by Schabes et al. (1995), had to be equipped with the insertion operation in addition to substitution, so that it can be strongly equivalent to an arbitrary CFG. The insertion operation is a restricted form of the adjoining operation in Lexicalized Tree Adjoining Grammar (LTAG) (Joshi and Schabes, 1992).</Paragraph>
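    <Paragraph> The bounded-distance property can be checked on a small example. The sketch below assumes a toy CFG with the rules S -> S S and S -> a, and a hypothetical LTSG elementary tree S(S(a), S-site); the names Node, nearest_word_depth, balanced and derived_from_elementary are illustrative and not part of any cited work.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)
    word: Optional[str] = None   # set only on lexical leaves


def nearest_word_depth(tree: Node) -> Optional[int]:
    """Number of edges from the root to the closest lexical leaf (None if there is none)."""
    if tree.word is not None:
        return 0
    depths = [d for d in (nearest_word_depth(c) for c in tree.children) if d is not None]
    return 1 + min(depths) if depths else None


def balanced(n: int) -> Node:
    """Balanced tree licensed by the toy CFG: n levels of S -> S S, then S -> a."""
    if n == 0:
        return Node("S", [Node("a", word="a")])
    return Node("S", [balanced(n - 1), balanced(n - 1)])


def derived_from_elementary(filler: Node) -> Node:
    """The elementary tree S(S(a), S-site) with `filler` substituted at its open site."""
    return Node("S", [Node("S", [Node("a", word="a")]), filler])


# In CFG trees the nearest lexical item can lie arbitrarily far from the root.
cfg_depths = [nearest_word_depth(balanced(n)) for n in range(4)]
# Substitution never moves the anchor, so the distance stays fixed.
ltsg_depths = [nearest_word_depth(derived_from_elementary(balanced(n))) for n in range(4)]
print(cfg_depths, ltsg_depths)
```

Substitution only replaces leaves, so the anchor of the elementary tree at the root keeps its depth however large the substituted trees become. This is why no finite set of elementary trees combined by substitution alone can produce all of the balanced trees.</Paragraph>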
    <Paragraph position="7"> Thus, a special tree operation other than substitution is inevitable for strongly lexicalized grammars. It is needed to grow an infinite number of trees from a finitely ambiguous set of initial trees representing the extended domain of locality (EDOL) of the word.</Paragraph>
    <Paragraph position="8"> However, such a special tree operation requires a specially devised parsing algorithm. In addition, the algorithm is operation-specific, so a new algorithm has to be devised whenever an operation is added or modified. Our first motivation was to eliminate the need for globally defined special tree operations other than substitution whenever possible, without losing the EDOL.</Paragraph>
    <Paragraph position="9"> Another point is the fact that lexicalization is applied only to trees, not to the tree operations. For example, in LTAG, the set of initial trees anchored to a word is not enough to describe the whole set of trees anchored by that word, since initial trees are grown by adjunction of auxiliary trees. Since an auxiliary tree is in the EDOL of another word, the former word (the anchor of the initial tree) has limited direct control over which auxiliary tree can be adjoined to a certain node. For detailed control, the grammar writer has to give additional adjoining restrictions to the node, and/or detailed attribute-values to the nodes that can control adjunction through node operations such as unification.</Paragraph>
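    <Paragraph> The following sketch illustrates how adjunction with a node-level adjoining restriction might look. It is a simplified illustration under stated assumptions: the names Node, allowed, adjoin, beta_often and beta_not are hypothetical, and feature unification is not modelled.

```python
import copy
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)
    word: Optional[str] = None            # set only on lexical leaves
    foot: bool = False                    # True on the foot node of an auxiliary tree
    allowed: Optional[Set[str]] = None    # None means any auxiliary tree may adjoin


def find_foot(tree: Node) -> Optional[Node]:
    if tree.foot:
        return tree
    for child in tree.children:
        found = find_foot(child)
        if found is not None:
            return found
    return None


def adjoin(target: Node, aux_name: str, aux: Node) -> bool:
    """Adjoin a copy of `aux` at `target` in place; refuse if the restriction forbids it."""
    if target.label != aux.label:
        return False
    if target.allowed is not None and aux_name not in target.allowed:
        return False
    aux = copy.deepcopy(aux)
    foot = find_foot(aux)
    if foot is None or foot.label != target.label:
        return False
    # The foot node takes over whatever used to hang below the target node.
    foot.children, foot.word, foot.foot = target.children, target.word, False
    # The target node then takes over the material of the auxiliary tree root.
    target.children, target.word = aux.children, aux.word
    return True


def words(tree: Node) -> List[str]:
    if tree.word is not None:
        return [tree.word]
    out: List[str] = []
    for child in tree.children:
        out.extend(words(child))
    return out


# Hypothetical initial tree for "sleeps"; its VP node admits only beta_often.
vp = Node("VP", [Node("V", word="sleeps")], allowed={"beta_often"})
sleeps = Node("S", [Node("NP", word="John"), vp])
beta_often = Node("VP", [Node("Adv", word="often"), Node("VP", foot=True)])
beta_not = Node("VP", [Node("Adv", word="not"), Node("VP", foot=True)])

rejected = adjoin(vp, "beta_not", beta_not)
accepted = adjoin(vp, "beta_often", beta_often)
print(rejected, accepted, words(sleeps))
```

In this sketch the word "sleeps" influences adjunction only through the allowed set attached to its VP node. The auxiliary trees themselves belong to "often" and "not", which is the indirect control the paragraph above describes.</Paragraph>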
    <Paragraph position="10"> In short, we would like to define a lexicalized grammar such that 1) the only tree operation is substitution, 2) it has an extended domain of locality, and 3) tree operations as well as trees are lexicalized whenever possible. In the next section, we propose a grammar formalism that has these properties.</Paragraph>
  </Section>
</Paper>