
<?xml version="1.0" standalone="yes"?>
<Paper uid="P06-1067">
  <Title>Distortion Models For Statistical Machine Translation</Title>
  <Section position="4" start_page="0" end_page="530" type="relat">
    <SectionTitle>
2 Related Work
</SectionTitle>
    <Paragraph position="0"> Different languages have different word order requirements. SMT decoders attempt to generate translations in the proper word order by attempting many possible  word reorderings during the translation process. Trying all possible word reordering is an NP-Complete problem as shown in (Knight, 1999), which makes searching for the optimal solution among all possible permutations computationally intractable. Therefore, SMT decoders typically limit the number of permutations considered for efficiency reasons by placing reordering restrictions. Reordering restrictions for word-based SMT decoders were introduced by (Berger et al., 1996) and (Wu, 1996). (Berger et al., 1996) allow only re-ordering of at most n words at any given time. (Wu, 1996) propose using contiguity restrictions on the reordering. For a comparison and a more detailed discussion of the two approaches see (Zens and Ney, 2003). A different approach to allow for a limited reordering is to reorder the input sentence such that the source and the target sentences have similar word order and then proceed to monotonically decode the reordered source sentence.</Paragraph>
    <Paragraph position="1"> Monotone decoding translates words in the same order they appear in the source language. Hence, the input and output sentences have the same word order.</Paragraph>
    <Paragraph position="2"> Monotone decoding is very efficient since the optimal decoding can be found in polynomial time. (Tillmann et al., 1997) proposed a DP-based monotone search algorithm for SMT. Their proposed solution to address the necessary word reordering is to rewrite the input sentence such that it has a similar word order to the desired target sentence. The paper suggests that reordering the input reduces the translation error rate. However, it does not provide a methodology on how to perform this reordering.</Paragraph>
    <Paragraph position="3"> (Xia and McCord, 2004) propose a method to automatically acquire rewrite patterns that can be applied to any given input sentence so that the rewritten source and target sentences have similar word order. These rewrite patterns are automatically extracted by parsing the source and target sides of the training parallel corpus. Their approach show a statistically-significant improvement over a phrase-based monotone decoder.</Paragraph>
    <Paragraph position="4"> Their experiments also suggest that allowing the decoder to consider some word order permutations in addition to the rewrite patterns already applied to the source sentence actually decreases the BLEU score.</Paragraph>
    <Paragraph position="5"> Rewriting the input sentence whether using syntactic rules or heuristics makes hard decisions that can not be undone by the decoder. Hence, reordering is better handled during the search algorithm and as part of the optimization function.</Paragraph>
    <Paragraph position="6"> Phrase-based monotone decoding does not directly address word order issues. Indirectly, however, the phrase dictionary1 in phrase-based decoders typically captures local reorderings that were seen in the training data. However, it fails to generalize to word reorderings that were never seen in the training data. For example, a phrase-based decoder might translate the Ara1Also referred to in the literature as the set of blocks or clumps.</Paragraph>
    <Paragraph position="7"> bic phrase AlwlAyAt AlmtHdp2 correctly into English as the United States if it was seen in its training data, was aligned correctly, and was added to the phrase dictionary. However, if the phrase Almmlkp AlmtHdp is not in the phrase dictionary, it will not be translated correctly by a monotone phrase decoder even if the individual units of the phrase Almmlkp and AlmtHdp, and their translations (Kingdom and United, respectively) are in the phrase dictionary since that would require swapping the order of the two words.</Paragraph>
    <Paragraph position="8"> (Och et al., 1999; Tillmann and Ney, 2003) relax the monotonicity restriction in their phrase-based decoder by allowing a restricted set of word reorderings. For their translation task, word reordering is done only for words belonging to the verb group. The context in which they report their results is a Speech-to-Speech translation from German to English.</Paragraph>
    <Paragraph position="9"> (Yamada and Knight, 2002) propose a syntax-based decoder that restrict word reordering based on reordering operations on syntactic parse-trees of the input sentence. They reported results that are better than word-based IBM4-like decoder. However, their decoder is outperformed by phrase-based decoders such as (Koehn, 2004), (Och et al., 1999), and (Tillmann and Ney, 2003) . Phrase-based SMT decoders mostly rely on the language model to select among possible word order choices. However, in our experiments we show that the language model is not reliable enough to make the choices that lead to a better MT quality. This observation is also reported by (Xia and McCord, 2004).We argue that the distortion model we propose leads to a better translation as measured by BLEU.</Paragraph>
    <Paragraph position="10"> Distortion models were first proposed by (Brown et al., 1993) in the so-called IBM Models. IBM Models 2 and 3 define the distortion parameters in terms of the word positions in the sentence pair, not the actual words at those positions. Distortion probability is also conditioned on the source and target sentence lengths.</Paragraph>
    <Paragraph position="11"> These models do not generalize well since their parameters are tied to absolute word position within sentences which tend to be different for the same words across sentences. IBM Models 4 and 5 alleviate this limitation by replacing absolute word positions with relative positions. The latter models define the distortion parameters for a cept (one or more words). This models phrasal movement better since words tend to move in blocks and not independently. The distortion is conditioned on classes of the aligned source and target words. The entire source and target vocabularies are reduced to a small number of classes (e.g., 50) for the purpose of estimating those parameters.</Paragraph>
    <Paragraph position="12"> Similarly, (Koehn et al., 2003) propose a relative distortion model to be used with a phrase decoder. The model is defined in terms of the difference between the position of the current phrase and the position of the previous phrase in the source sentence. It does not con- null illustrate word positions in the sentence. The indices in the reordered English denote word position in the original English order.</Paragraph>
    <Paragraph position="13"> sider the words in those positions.</Paragraph>
    <Paragraph position="14"> The distortion model we propose assigns a probability distribution over possible relative jumps conditioned on source words. Conditioning on the source words allows for a much more fine-grained model. For instance, words that tend to act as modifers (e.g., adjectives) would have a different distribution than verbs or nouns. Our model's parameters are directly estimated from word alignments as we will further explain in Section 4. We will also show how to generalize this word distortion model to a phrase-based model.</Paragraph>
    <Paragraph position="15"> (Och et al., 2004; Tillman, 2004) propose orientation-based distortion models lexicalized on the phrase level. There are two important distinctions between their models and ours. First, they lexicalize their model on the phrases, which have many more parameters and hence would require much more data to estimate reliably. Second, their models consider only the direction (i.e., orientation) and not the relative jump.</Paragraph>
    <Paragraph position="16"> We are not aware of any work on measuring word order differences between a given language pair in the context of statistical machine translation.</Paragraph>
  </Section>
class="xml-element"></Paper>