<?xml version="1.0" standalone="yes"?>
<Paper uid="J97-3002">
  <Title>Stochastic Inversion Transduction Grammars and Bilingual Parsing of Parallel Corpora</Title>
  <Section position="10" start_page="389" end_page="398" type="evalu">
    <SectionTitle>
1. Initialization
</SectionTitle>
    <Paragraph position="0"/>
    <Paragraph position="2"> In our experience, this method has proven extremely effective for avoiding missegmentation pitfalls, essentially erring only in pathological cases involving coordination constructions or lexicon coverage inadequacies. The method is also straightforward to employ in tandem with other applications, such as those below.</Paragraph>
    <Paragraph position="3"> 7. Bracketing Bracketing is another intermediate corpus annotation, useful especially when a full-coverage grammar with which to parse a corpus is unavailable (for Chinese, an even more common situation than with English). Aside from purely linguistic interest, bracket structure has been empirically shown to be highly effective at constraining subsequent training of, for example, stochastic context-free grammars (Pereira and Schabes 1992; Black, Garside, and Leech 1993). Previous algorithms for automatic bracketing operate on monolingual texts and hence require more grammatical constraints; for example, tactics employing mutual information have been applied to tagged text (Magerman and Marcus 1990).</Paragraph>
    <Paragraph position="4"> Our method based on SITGs operates on the novel principle that lexical correspondences between parallel sentences yields information from which partial bracketings for both sentences can be extracted. The assumption that no grammar is available means that constituent categories are not differentiated. Instead, a generic bracketing transduction grammar is employed, containing only one nonterminal symbol, A, which rewrites either recursively as a pair of A's or as a single terminal-pair:  for all i,j English-Chinese lexical translations for all i English vocabulary for all j Chinese vocabulary Longer productions with rank &gt; 2 are not needed; we show in the subsections below that this minimal transduction grammar in normal form is generatively equivalent to any reasonable bracketing transduction grammar. Moreover, we also show how postprocessing using rotation and flattening operations restores the rank flexibility so that an output bracketing can hold more than two immediate constituents, as shown in Figure 11.</Paragraph>
    <Paragraph position="5"> The bq distribution actually encodes the English-Chinese translation lexicon with degrees of probability on each potential word translation. We have been using a lexicon that was automatically learned from the HKUST English-Chinese Parallel Bilingual Corpus via statistical sentence alignment (Wu 1994) and statistical Chinese word and collocation extraction (Fung and Wu 1994; Wu and Fung 1994), followed by an EM word-translation-learning procedure (Wu and Xia 1994). The latter stage gives us the bij probabilities directly. For the two singleton productions, which permit any word in either sentence to be unmatched, a small c-constant can be chosen for the probabilities bit and bq, so that the optimal bracketing resorts to these productions only when it is  Computational Linguistics Volume 23, Number 3 otherwise impossible to match the singletons. The parameter a here is of no practical effect, and is chosen to be very small relative to the bq probabilities of lexical translation pairs. The result is that the maximum-likelihood parser selects the parse tree that best meets the combined lexical translation preferences, as expressed by the bij probabilities. Pre-/postpositional biases. Many bracketing errors are caused by singletons. With singletons, there is no cross-lingual discrimination to increase the certainty between alternative bracketings. A heuristic to deal with this is to specify for each of the two languages whether prepositions or postpositions are more common, where &amp;quot;preposition&amp;quot; here is meant not in the usual part-of-speech sense, but rather in a broad sense of the tendency of function words to attach left or right. This simple strategem is effective because the majority of unmatched singletons are function words that lack counterparts in the other language. This observation holds assuming that the translation lexicon's coverage is reasonably good. For both English and Chinese, we specify a prepositional bias, which means that singletons are attached to the right whenever possible.</Paragraph>
    <Paragraph position="6"> A Singleton-Rebalancing Algorithm. We give here an algorithm for further improving the bracketing accuracy in cases of singletons. Consider the following bracketing produced by the algorithm of the previous section:</Paragraph>
    <Paragraph position="8"> The prepositional bias has already correctly restricted the singleton The/C/ to attach to the right, but of course The does not belong outside the rest of the sentence, but rather with Authority. The problem is that singletons have no discriminative power between alternative bracket matchings--they only contribute to the ambiguity. We can minimize the impact by moving singletons as deep as possible, closer to the individual word they precede or succeed; or in other words, we can widen the scope of the brackets immediately following the singleton. In general this improves precision since wide-scope brackets are less constraining.</Paragraph>
    <Paragraph position="9"> The algorithm employs a rebalancing strategy reminiscent of balanced tree structures using left and right rotations. A left rotation changes a (A(BC)) structure to a ((AB)C) structure, and vice versa for a right rotation. The task is complicated by the presence of both \[\] and 0 brackets with both L1- and L2-singletons, since each combination presents different interactions. To be legal, a rotation must preserve symbol order on both output streams. However, the following lemma shows that any subtree can always be rebalanced at its root if either of its children is a singleton of either language.</Paragraph>
    <Paragraph position="10"> Lemma 4 Let x be an Ll-singleton, y be an L2-singleton, and A, B, C be arbitrary terminal or nonterminal symbols. Then the following properties hold for the \[\] and () operators, where the ~ relation means that the same two output strings are generated, and the matching of the symbols is preserved:</Paragraph>
    <Paragraph position="12"> The method of Figure 8 modifies the input tree to attach singletons as closely as possible to couples, but remaining consistent with the input tree in the following sense: singletons cannot &amp;quot;escape&amp;quot; their immediately surrounding brackets. The key is that for any given subtree, if the outermost bracket involves a singleton that should be rotated into a subtree, then exactly one of the singleton rotation properties will apply. The method proceeds depth-first, sinking each singleton as deeply as possible.</Paragraph>
    <Paragraph position="13">  Flattening the Bracketing. In the worst case, both sentences might have perfectly aligned words, lending no discriminative leverage whatsoever to the bracketer. This leaves a very large number of choices: if both sentences are of length l, then there (2t~ i are ~ l \] ~ possible bracketings with rank 2, none of which is better justified than any other. Thus to improve accuracy, we should reduce the specificity of the bracketing's commitment in such cases.</Paragraph>
    <Paragraph position="14"> An inconvenient problem with ambiguity arises in the simple bracketing grammar above, illustrated by Figure 9; there is no justification for preferring either (a) or (b) over the other. In general the problem is that both the straight and inverted concatenation operations are associative. That is, \[A\[AA\]\] and \[\[AA\]A\] generate the same two output strings, which are also generated by \[AAA\]; and similarly with (A(AA)) and ((AA)A), which can also be generated by (AAA). Thus the parse shown in (c) is preferable to either (a) or (b) since it does not make an unjustifiable commitment either way.</Paragraph>
    <Paragraph position="15"> Productions in the form of (c), however, are not permitted by the normal form we use, in which each bracket can only hold two constituents. Parsing must overcommit, since the algorithm is always forced to choose between (A(BC)) and ((AB)C) structures even when no choice is clearly better. We could relax the normal form constraint, but longer productions clutter the grammar unnecessarily and, in the case of generic bracketing grammars, reduce parsing efficiency considerably.</Paragraph>
    <Paragraph position="16"> Instead, we employ a more complicated but better-constrained grammar as shown in Figure 10, designed to produce only canonical tail-recursive parses. We differentiate type A and B constituents, representing subtrees whose roots have straight and inverted orientation, respectively. Under this grammar, a series of nested constituents with the same orientation will always have a left-heavy derivation. The guarantee that parsing will produce a tail-recursive tree facilitates easily identification of those nesting levels that are associative (and therefore arbitrary), so that those levels can be &amp;quot;flattened&amp;quot; by a postprocessing stage after parsing into non-normal form trees like the one in Figure 9(c). The algorithm proceeds bottom-up, eliminating as many brackets as possible, by making use of the associativity equivalences \[lAB\]C\] = \[ABC\] and ((ABIC) ~ (ABC). The singleton bidirectionality and flipping commutativity equivalences (see Lemma 4) can also be applied whenever they render the associativity equivalences applicable.</Paragraph>
    <Paragraph position="17">  for all i,j English-Chinese lexical translations for all i English vocabulary for all j Chinese vocabulary Figure 10 A stochastic constituent-matching ITG.</Paragraph>
    <Paragraph position="18"> The final result after flattening sentence (8) is as follows:</Paragraph>
    <Paragraph position="20"> Experiment. Approximately 2,000 sentence-pairs with both English and Chinese lengths of 30 words or less were extracted from our corpus and bracketed using the algorithm described. Several additional criteria were used to filter out unsuitable sentence-pairs. If the lengths of the pair of sentences differed by more than a 2:1 ratio, the pair was rejected; such a difference usually arises as the result of an earlier error in automatic sentence alignment. Sentences containing more than one word absent from the translation lexicon were also rejected; the bracketing method is not intended to be robust against lexicon inadequacies. We also rejected sentence-pairs with fewer than two matching words, since this gives the bracketing algorithm no discriminative leverage; such pairs accounted for less than 2% of the input data. A random sample of the bracketed sentence-pairs was then drawn, and the bracket precision was computed under each criterion for correctness. Examples are shown in Figure 11.</Paragraph>
    <Paragraph position="21"> The bracket precision was 80% for the English sentences, and 78% for the Chinese sentences, as judged against manual bracketings. Inspection showed the errors to be due largely to imperfections of our translation lexicon, which contains approximately 6,500 English words and 5,500 Chinese words with about 86% translation accuracy (Wu and Xia 1994), so a better lexicon should yield substantial performance improvement.</Paragraph>
    <Paragraph position="22"> Moreover, if the resources for a good monolingual part-of-speech or grammar-based bracketer such as that of Magerman and Marcus (1990) are available, its output can readily be incorporated in complementary fashion as discussed in Section 9.</Paragraph>
    <Section position="1" start_page="395" end_page="396" type="sub_section">
      <SectionTitle>
8.1 Phrasal Alignment
</SectionTitle>
      <Paragraph position="0"> Phrasal translation examples at the subsentential level are an essential resource for many MT and MAT architectures. This requirement is becoming increasingly direct for the example-based machine translation paradigm (Nagao 1984), whose translation flexibility is strongly restricted if the examples are only at the sentential level. It can now be assumed that a parallel bilingual corpus may be aligned to the sentence level with reasonable accuracy (Kay and Ri3cheisen 1988; Catizone, Russel, and Warwick 1989; Gale and Church 1991; Brown, Lai, and Mercer 1991; Chen 1993), even for languages as disparate as Chinese and English (Wu 1994). Algorithms for subsentential alignment have been developed as well as granularities of the character (Church 1993), word (Dagan, Church, and Gale 1993; Fung and Church 1994; Fung and McKeown 1994), collocation (Smadja 1992), and specially segmented (Kupiec 1993) levels. However, the identification of subsentential, nested, phrasal translations within the parallel texts remains a nontrivial problem, due to the added complexity of dealing with constituent structure. Manual phrasal matching is feasible only for small corpora, either for toy-prototype testing or for narrowly restricted applications.</Paragraph>
      <Paragraph position="1"> Automatic approaches to identification of subsentential translation units have largely followed what we might call a &amp;quot;parse-parse-match&amp;quot; procedure. Each half of the parallel corpus is first parsed individually using a monolingual grammar. Subsequently, the constituents of each sentence-pair are matched according to some heuristic procedure. A number of recent proposals can be cast in this framework (Sadler and Vendelmans 1990; Kaji, Kida, and Morimoto 1992; Matsumoto, Ishimoto, and Utsuro 1993; Cranias, Papageorgiou, and Peperidis 1994; Grishman 1994).</Paragraph>
      <Paragraph position="2"> The parse-parse-match procedure is susceptible to three weaknesses: Appropriate, robust, monolingual grammars may not be available. This condition is particularly relevant for many non-Western European languages such as Chinese. A grammar for this purpose must be robust since it must still identify constituents for the subsequent matching process even for unanticipated or ill-formed input sentences.</Paragraph>
    </Section>
    <Section position="2" start_page="396" end_page="397" type="sub_section">
      <SectionTitle/>
      <Paragraph position="0"> The grammars may be incompatible across languages. The best-matching constituent types between the two languages may not include the same core arguments. While grammatical differences can make this problem unavoidable, there is often a degree of arbitrariness in a grammar's chosen set of syntactic categories, particularly if the grammar is designed to be robust. The mismatch can be exacerbated when the monolingual grammars are designed independently, or under different theoretical considerations.</Paragraph>
      <Paragraph position="1"> Selection between multiple possible arrangements may be arbitrary. By an &amp;quot;arrangement&amp;quot; between any given pair of sentences from the parallel corpus, we mean a set of matchings between the constituents of the sentences. The problem is that in some cases, a constituent in one sentence may have several potential matches in the other, and the matching heuristic may be unable to discriminate between the options.</Paragraph>
      <Paragraph position="2"> In the sentence pair of Figure 4, for example, both Security Bureau and police station are potential lexical matches to ~j. To choose the best set of matchings, an optimization over some measure of overlap between the structural analysis of the two sentences is needed. Previous approaches to phrasal matching employ arbitrary heuristic functions on, say, the number of matched subconstituents.</Paragraph>
      <Paragraph position="3"> Our method attacks the weaknesses of the parse-parse-match procedure by using (1) only a translation lexicon with no language-specific grammar, (2) a bilingual rather than monolingual formalism, and (3) a probabilistic formulation for resolving the choice between candidate arrangements. The approach differs in its single-stage operation that simultaneously chooses the constituents of each sentence and the matchings between them.</Paragraph>
      <Paragraph position="4"> The raw phrasal translations suggested by the parse output were then filtered to remove those pairs containing more than 50% singletons, since such pairs are likely to be poor translation examples. Examples that occurred more than once in the corpus were also filtered out, since repetitive sequences in our corpus tend to be nongrammatical markup. This yielded approximately 2,800 filtered phrasal translations, some examples of which are shown in Figure 12. A random sample of the phrasal translation pairs was then drawn, giving a precision estimate of 81.5%.</Paragraph>
      <Paragraph position="5"> Although this already represents a useful level of accuracy, it does not in our opinion reflect the full potential of the formalism. Inspection revealed that performance was greatly hampered by our noisy translation lexicon, which was automatically learned; it could be manually post-edited to reduce errors. Commercial on-line translation lexicons could also be employed if available. Higher precision could be also achieved without great effort by engineering a small number of broad nonterminal categories. This would reduce errors for known idiosyncratic patterns, at the cost of manual rule building.</Paragraph>
      <Paragraph position="6"> The automatically extracted phrasal translation examples are especially useful where the phrases in the two languages are not compositionally derivable solely from obvious word translations. An example is \[have acquired/C/ C//-~\] new/~J~ skills/~ ~j~\] in Figure 11. The same principle applies to nested structures also, such as (\[ ~/~ I who/,~ \] \[ have acquired/C/ C//~\] new/~J~ skills/~ \]), on up to the sentence level.</Paragraph>
      <Paragraph position="7">  an acceptable starting point for this new policy ~~IJ~~ are about 3.5 million pk~-350~ born in Hong ~ ~ ~ for Hong ~ have the right to decide our ~J~m~J~ in what way the Government would increase ~(J~{~t~}Jll~~@;~ their job opportunities ; and last month _L~ J~ never to say &amp;quot; never &amp;quot; ~-~&amp;quot;~&amp;quot; reserves and surpluses ~\]~\[I~, starting point for this new policy ~_~~ there will be many practical difficulties in terms \]~@~-~I~,~t of implementation year ended 3 1 March 1 9 9 1 ~_~Ph~J~ --n u-Figure 12 Examples of extracted phrasal translations.</Paragraph>
    </Section>
    <Section position="3" start_page="397" end_page="398" type="sub_section">
      <SectionTitle>
8.2 Word Alignment
</SectionTitle>
      <Paragraph position="0"> Under the ITG model, word alignment becomes simply the special case of phrasal alignment at the parse tree leaves. This gives us an interesting alternative perspective, from the standpoint of algorithms that match the words between parallel sentences. By themselves, word alignments are of little use, but they provide potential anchor points for other applications, or for subsequent learning stages to acquire more interesting structures.</Paragraph>
      <Paragraph position="1"> Word alignment is difficult because correct matchings are not usually linearly ordered, i.e., there are crossings. Without some additional constraints, any word position in the source sentence can be matched to any position in the target sentence, an assumption that leads to high error rates. More sophisticated word alignment algorithms therefore attempt to model the intuition that proximate constituents in close relationships in one language remain proximate in the other. The later IBM models are formulated to prefer collocations (Brown et al. 1993). In the case of word_align (Dagan, Church, and Gale 1993; Dagan and Church 1994), a penalty is imposed according to the deviation from an ideal matching, as constructed by linear interpolation? From this point of view, the proposed technique is a word alignment method that imposes a more realistic distortion penalty. The tree structure reflects the assumption that crossings should not be penalized as long as they are consistent with constituent structure. Figure 7 gives theoretical upper bounds on the matching flexibility as the lengths of the sequences increase, where the constituent structure constraints are reflected by high flexibility up to length-4 sequences and a rapid drop-off thereafter. In other words, ITGs appeal to a language universals hypothesis, that the core arguments of frames, which exhibit great ordering variation between languages, are relatively few and surface in syntactic proximity. Of course, this assumption over-simplistically 4 Direct comparison with word_align should be avoided, however, since it is intended to work on corpora whose sentences are not aligned.</Paragraph>
    </Section>
    <Section position="4" start_page="398" end_page="398" type="sub_section">
      <SectionTitle/>
      <Paragraph position="0"> blends syntactic and semantic notions. That semantic frames for different languages share common core arguments is more plausible than that syntactic frames do. In effect we are relying on the tendency of syntactic arguments to correlate closely with semantics. If in particular cases this assumption does not hold, however, the damage is not too great--the model will simply drop the offending word matchings (dropping as few as possible).</Paragraph>
      <Paragraph position="1"> In experiments with the minimal bracketing transduction grammar, the large majority of errors in word alignment were caused by two outside factors. First, word matchings can be overlooked simply due to deficiencies in our translation lexicon. This accounted for approximately 42% of the errors. Second, sentences containing nonliteral translations obviously cannot be aligned down to the word level. This accounted for another approximate 50% of the errors. Excluding these two types of errors, accuracy on word alignment was 96.3%. In other words, the tree structure constraint is strong enough to prevent most false matches, but almost never inhibits correct word matches when they exist.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>