<?xml version="1.0" standalone="yes"?>
<Paper uid="J03-1005">
  <Title>Word Reordering and a Dynamic Programming Beam Search Algorithm for Statistical Machine Translation</Title>
  <Section position="3" start_page="98" end_page="112" type="relat">
    <SectionTitle>
2. Previous Work
2.1 IBM Translation Approach
</SectionTitle>
    <Paragraph position="0"> In this article, we use the translation model presented in Brown et al. (1993), and the mathematical notation we use here is taken from that paper as well: a source string  . Here, I is the length of the target string, and J is the length of the source string. Among all possible target strings, we will choose the string with the highest probability as given by Bayes' 1 The word reordering restriction used in the search procedure described in Berger et al. (1996) is not mentioned in Brown et al. (1993), although exactly the translation model described there is used. Equivalently, we use exactly the translation model described in Brown et al. (1993) but try different reordering restrictions for the DP-based search procedure.</Paragraph>
    <Paragraph position="1">  ) is modeled using a series of five models of increasing complexity in training. Here, the model used for the translation experiments is the IBM-4 model. This model uses the same parameter set as the IBM-5 model, which in preliminary experiments did not yield better translation results. The actual implementation used during the experiments is described in Al-Onaizan et al. (1999) and in Och and Ney (2000). The argmax operation denotes the search problem (i.e., the generation of the output sentence in the target language). The overall architecture of the statistical translation approach is summarized in Figure 1. In general, as shown in this figure, there may be additional transformations to make the translation task simpler for the algorithm. The transformations may range from simple word categorization to more complex preprocessing steps that require some parsing of the source string. In this article, however, we will use only word catego- null Computational Linguistics Volume 29, Number 1 rization as an explicit transformation step. In the search procedure both the language and the translation model are applied after the text transformation steps. The following &amp;quot;types&amp;quot; of parameters are used for the IBM-4 translation model: Lexicon probabilities: We use the lexicon probability p(f  |e) for translating the single target word e as the single source word f . A source word f may be translated by the &amp;quot;null&amp;quot; word e  (i.e., it does not produce any target word e). A translation probability p(f  |e  ) is trained along with the regular translation probabilities.</Paragraph>
    <Paragraph position="2"> Fertilities: A single target word e may be aligned to n = 0, 1 or more source words. This is explicitly modeled by the fertility parameter ph(n  |e): the probability that the target word e is translated by n source words is ph(n  |e). The fertility for the &amp;quot;null&amp;quot; word is treated specially (for details see Brown et al. [1993]). Berger et al. (1996) describes the extension of a partial hypothesis by a pair of target words (e prime , e), where e prime is not connected to any source word f . In this case, the so-called spontaneous target word e prime is accounted for with the fertility. Here, the translation probability ph(0  |e prime ) and notranslation probability p(f  |e prime ).</Paragraph>
    <Paragraph position="3"> Class-based distortion probabilities: When covering a source sentence position j, we use distortion probabilities that depend on the previously covered source sentence positions (we say that a source sentence position j is covered for a partial hypothesis when it is taken account of in the translation process by generating a target word or the &amp;quot;null&amp;quot; word e  ). In Brown et al. (1993), two types of distortion probabilities are distinguished: (1) the leftmost word of a set of source words f aligned to the same target word e (which is called the &amp;quot;head&amp;quot;) is placed, and (2) the remaining source words are placed. Two separate distributions are used for these two cases. For placing the &amp;quot;head&amp;quot; the center function center(i) (Brown et al. [1993] uses the notation circledot i ) is used: the average position of the source words with which the target word e i[?]1 is aligned. The distortion probabilities are class-based: They depend on the word class F(f) of a covered source word f as well as on the word class E(e) of the previously generated target word e. The classes are automatically trained (Brown et al. 1992). When the IBM-4 model parameters are used during search, an input sentence can be processed one source position at a time in a certain order primarily determined by the distortion probabilities. We will use the following simplified set of translation model parameters: lexicon probabilities p(f  |e) and distortion probabilities p(j  |j prime , J). Here, j is the currently covered input sentence position and j prime is the previously covered input sentence position. The input sentence length J is included, since we would like to think of the distortion probability as normalized according to J. No fertility probabilities or &amp;quot;null&amp;quot; word probabilities are used; thus each source word f is translated as exactly one target word e and each target word e is translated as exactly one source word f . The simplified notation will help us to focus on the most relevant details of the DP-based search procedure. The simplified set of parameters leads to an unrealistic assumption about the length of the source and target sentence, namely, I = J. During the translation experiments we will, of course, not make this assumption. The implementation details for using the full set of IBM-4 model parameters are given in Section 3.9.2.  Tillmann and Ney DP Beam Search for Statistical MT</Paragraph>
    <Section position="1" start_page="101" end_page="101" type="sub_section">
      <SectionTitle>
2.2 Search Algorithms for Statistical Machine Translation
</SectionTitle>
      <Paragraph position="0"> In this section, we give a short overview of search procedures used in statistical MT: Brown et al. (1990) and Brown et al. (1993) describe a statistical MT system that is based on the same statistical principles as those used in most speech recognition systems (Jelinek 1976). Berger et al. (1994) describes the French-to-English Candide translation system, which uses the translation model proposed in Brown et al. (1993). A detailed description of the decoder used in that system is given in Berger et al. (1996) but has never been published in a paper: Throughout the search process, partial hypotheses are maintained in a set of priority queues. There is a single priority queue for each subset of covered positions in the source string. In practice, the priority queues are initialized only on demand; far fewer than the full number of queues possible are actually used. The priority queues are limited in size, and only the 1,000 hypotheses with the highest probability are maintained. Each priority queue is assigned a threshold to select the hypotheses that are going to be extended, and the process of assigning these thresholds is rather complicated. A restriction on the possible word reorderings, which is described in Section 3.6, is applied.</Paragraph>
      <Paragraph position="1"> Wang and Waibel (1997) presents a search algorithm for the IBM-2 translation model based on the A [?] concept and multiple stacks. An extension of this algorithm is demonstrated in Wang and Waibel (1998). Here, a reshuffling step on top of the original decoder is used to handle more complex translation models (e.g., the IBM-3 model is added). Translation approaches that use the IBM-2 model parameters but are based on DP are presented in Garc'ia-Varea, Casacuberta, and Ney (1998) and Niessen et al. (1998). An approach based on the hidden Markov model alignments as used in speech recognition is presented in Tillmann, Vogel, Ney, and Zubiaga (1997) and Tillmann, Vogel, Ney, Zubiaga, and Sawaf (1997). This approach assumes that source and target language have the same word order, and word order differences are dealt with in a preprocessing stage. The work by Wu (1996) also uses the original IBM model parameters and obtains an efficient search algorithm by restricting the possible word reorderings using the so-called stochastic bracketing transduction grammar.</Paragraph>
      <Paragraph position="2"> Three different decoders for the IBM-4 translation model are compared in Germann et al. (2001). The first is a reimplementation of the stack-based decoder described in Berger et al. (1996). The second is a greedy decoder that starts with an approximate solution and then iteratively improves this first rough solution. The third converts the decoding problem into an integer program (IP), and a standard software package for solving IP is used. Although the last approach is guaranteed to find the optimal solution, it is tested only for input sentences of length eight or shorter.</Paragraph>
      <Paragraph position="3"> This article will present a DP-based beam search decoder for the IBM-4 translation model. The decoder is designed to carry out an almost full search with a small number of search errors and with little performance degradation as measured by the word error  criterion. A preliminary version of the work presented here was published in Tillmann and Ney (2000).</Paragraph>
      <Paragraph position="4"> 3. Beam Search in Statistical Machine Translation</Paragraph>
    </Section>
    <Section position="2" start_page="101" end_page="104" type="sub_section">
      <SectionTitle>
3.1 Inverted Alignment Concept
</SectionTitle>
      <Paragraph position="0"> To explicitly describe the word order difference between source and target language, Brown et al. (1993) introduced an alignment concept, in which a source position j is mapped to exactly one target position i: regular alignment: j - i = a j</Paragraph>
      <Paragraph position="2"> Regular alignment example for the translation direction German to English. For each German source word there is exactly one English target word on the alignment path.</Paragraph>
      <Paragraph position="3"> An example for this kind of alignment is given in Figure 2, in which each German source position j is mapped to an English target position i. In Brown et al. (1993), this alignment concept is used for model IBM-1 through model IBM-5. For search purposes, we use the inverted alignment concept as introduced in Niessen et al. (1998) and Ney et al. (2000). An inverted alignment is defined as follows: inverted alignment: i - j = b i Here, a target position i is mapped to a source position j. The coverage constraint for an inverted alignment is not expressed by the notation: Each source position j should be &amp;quot;hit&amp;quot; exactly once by the path of the inverted alignment b</Paragraph>
      <Paragraph position="5"> advantage of the inverted alignment concept is that we can construct target sentence hypotheses from bottom to top along the positions of the target sentence. Using the inverted alignments in the maximum approximation, we rewrite equation (1) to obtain the following search criterion, in which we are looking for the most likely target  Tillmann and Ney DP Beam Search for Statistical MT Figure 3 Illustration of the transitions in the regular and in the inverted alignment model. The regular alignment model (left figure) is used to generate the sentence from left to right; the inverted alignment model (right figure) is used to generate the sentence from bottom to top.</Paragraph>
      <Paragraph position="7"> The following notation is used: e</Paragraph>
      <Paragraph position="9"> are the immediate predecessor target words,</Paragraph>
      <Paragraph position="11"> is the word to be hypothesized, p(e</Paragraph>
      <Paragraph position="13"> ) denotes the trigram language model probability, p(f</Paragraph>
      <Paragraph position="15"> ) denotes the lexicon probability for translating the target word e</Paragraph>
      <Paragraph position="17"> . Note that in equation (2) two products over i are merged into a single product over i. The translation probability p(f</Paragraph>
      <Paragraph position="19"> ) is computed in the maximum approximation using the distortion and the lexicon probabilities. Finally, p(J  |I) is the sentence length model, which will be dropped in the following (it is not used in the IBM-4 translation model). For each source sentence f</Paragraph>
      <Paragraph position="21"> to be translated, we are searching for the unknown mapping that optimizes equation (2):</Paragraph>
      <Paragraph position="23"> In Section 3.3, we will introduce an auxiliary quantity that can be evaluated recursively using DP to find this unknown mapping. We will explicitly take care of the coverage constraint by introducing a coverage set C of source sentence positions that have already been processed. Figure 3 illustrates the concept of the search algorithm using inverted alignments: Partial hypotheses are constructed from bottom to top along the positions of the target sentence. Partial hypotheses of length i[?]1 are extended to obtain partial hypotheses of the length i. Extending a partial hypothesis means covering a source sentence position j that has not yet been covered. For a given grid point in the  Computational Linguistics Volume 29, Number 1 Table 1 DP-based algorithm for solving traveling-salesman problems due to Held and Karp. The outermost loop is over the cardinality of subsets of already visited cities. input: cities j = 1, ..., J with distance matrix d</Paragraph>
      <Paragraph position="25"> * recover optimal sequence of cities translation lattice, the unknown target word sequence can be obtained by tracing back the translation decisions to the partial hypothesis at stage i = 1. The grid points are defined in Section 3.3. In the left part of the figure the regular alignment concept is shown for comparison purposes.</Paragraph>
    </Section>
    <Section position="3" start_page="104" end_page="105" type="sub_section">
      <SectionTitle>
3.2 Held and Karp Algorithm for Traveling-Salesman Problem
</SectionTitle>
      <Paragraph position="0"> Held and Karp (1962) presents a DP approach to solve the TSP, an optimization problem that is defined as follows: Given are a set of cities {1,..., J} and for each pair of cities j, j prime the cost d jj prime &gt; 0 for traveling from city j to city j prime . We are looking for the shortest tour, starting and ending in city 1, that visits all cities in the set of cities exactly once. We are using the notation C for the set of cities, since it corresponds to a coverage set of processed source positions in MT. A straightforward way to find the shortest tour is by trying all possible permutations of the J cities. The resulting algorithm has a complexity of O(J!). DP can be used, however, to find the shortest tour</Paragraph>
      <Paragraph position="2"> ), which is a much smaller complexity for larger values of J. The approach recursively evaluates the quantity D(C, j): D(C, j) := costs of the partial tour starting in city 1, ending in city j, and visiting all cities in C Subsets of cities C of increasing cardinality c are processed. The algorithm, shown in Table 1, works because not all permutations of cities have to be considered explicitly. During the computation, for a pair (C, j), the order in which the cities in C have been visited can be ignored (except j); only the costs for the best path reaching j has to be stored. For the initialization the costs for starting from city 1 are set: D({k}, k)=d  for each k [?]{2,...,|C|}. Then, subsets C of increasing cardinality are processed. Finally, the cost for the optimal tour is obtained in the second-to-last line of the algorithm. The optimal tour itself can be found using a back-pointer array in which the optimal decision for each grid point (C, j) is stored.</Paragraph>
      <Paragraph position="3"> Figure 4 illustrates the use of the algorithm by showing the &amp;quot;supergraph&amp;quot; that is searched in the Held and Karp algorithm for a TSP with J = 5 cities. When traversing the lattice from left to right following the different possibilities, a partial path to a node j corresponds to the subset C of all cities on that path together with the last visited  Tillmann and Ney DP Beam Search for Statistical MT Figure 4 Illustration of the algorithm by Held and Karp for a traveling salesman problem with J = 5 cities. Not all permutations of cities have to be evaluated explicitly. For a given subset of cities the order in which the cities have been visited can be ignored.</Paragraph>
      <Paragraph position="4"> city j. Of all the different paths merging into the node j, only the partial path with the smallest cost has to be retained for further computation.</Paragraph>
    </Section>
    <Section position="4" start_page="105" end_page="106" type="sub_section">
      <SectionTitle>
3.3 DP-Based Algorithm for Statistical Machine Translation
</SectionTitle>
      <Paragraph position="0"> In this section, the Held and Karp algorithm is applied to statistical MT. Using the concept of inverted alignments as introduced in Section 3.1, we explicitly take care of the coverage constraint by introducing a coverage set C of source sentence positions that have already been processed. Here, the correspondence is according to the fact that each source sentence position has to be covered exactly once, fulfilling the coverage constraint. The cities of the more complex translation TSP correspond roughly to triples (e prime , e, j), the notation for which is given below. The final path output by the translation algorithm will contain exactly one triple (e prime , e, j) for each source position j.</Paragraph>
      <Paragraph position="1"> The algorithm processes subsets of partial hypotheses with coverage sets C of increasing cardinality c. For a trigram language model, the partial hypotheses are of the form (e prime , e,C, j), where e prime , e are the last two target words, C is a coverage set for the already covered source positions, and j is the last covered position. The target word sequence that ends in e prime , e is stored as a back pointer to the predecessor partial hypothesis (and recursively to its predecessor hypotheses) and is not shown in the notation. Each distance in the TSP now corresponds to the negative logarithm of the product of the translation, distortion, and language model probabilities. The following  Computational Linguistics Volume 29, Number 1 Table 2 DP-based algorithm for statistical MT that consecutively processes subsets C of source sentence positions of increasing cardinality.</Paragraph>
      <Paragraph position="2"> input: source language string f</Paragraph>
      <Paragraph position="4"> for each pair (C, j), where C[?]{1, ..., J} and j [?]Cand |C |= c do for each pair of target words e</Paragraph>
      <Paragraph position="6"> The above auxiliary quantity satisfies the following recursive DP equation:  are the predecessor words. The DP equation is evaluated recursively for each hypothesis (e prime , e,C, j). The resulting algorithm is depicted in Table 2. Some details concerning the initialization and the finding of the best target language string are presented in Section 3.4. p($  |e, e prime ) is the trigram language probability for predicting the sentence boundary symbol $. The complexity of the algorithm is O(E</Paragraph>
      <Paragraph position="8"> ), where E is the size of the target language vocabulary.</Paragraph>
    </Section>
    <Section position="5" start_page="106" end_page="109" type="sub_section">
      <SectionTitle>
3.4 Verb Group Reordering: German to English
</SectionTitle>
      <Paragraph position="0"> The above search space is still too large to translate even a medium-length input sentence. On the other hand, only very restricted reorderings are necessary; for example, for the translation direction German to English, the word order difference is mostly restricted to the German verb group. The approach presented here assumes a mostly monotonic traversal of the source sentence positions from left to right.</Paragraph>
      <Paragraph position="1">  A small number of positions may be processed sooner than they would be in that monotonic traversal. Each source position then generates a certain number of target words. The restrictions are fully formalized in Section 3.5.</Paragraph>
      <Paragraph position="2"> A typical situation is shown in Figure 5. When translating the sentence monotoni- null cally from left to right, the translation of the German finite verb kann, which is the left verbal brace in this case, is skipped until the German noun phrase mein Kollege, which is the subject of the sentence, is translated. Then, the right verbal brace is translated: 2 Also, this assumption is necessary for the beam search pruning techniques to work efficiently.</Paragraph>
      <Paragraph position="4"> Figure 5 Word reordering for the translation direction German to English: The reordering is restricted to the German verb group.</Paragraph>
      <Paragraph position="5"> The infinitive besuchen and the negation particle nicht. The following restrictions are used: One position in the source sentence may be skipped for a distance of up to L = 4 source positions, and up to two source positions may be moved for a distance of at most R = 10 source positions (the notation L and R shows the relation to the handling of the left and right verbal brace). To formalize the approach, we introduce four verb  group states S: * Initial: A contiguous initial block of source positions is covered. * Skip: One word may be skipped, leaving a &amp;quot;hole&amp;quot; in the monotonic traversal.</Paragraph>
      <Paragraph position="6"> * Move: Up to two words may be &amp;quot;moved&amp;quot; from later in the sentence. * Cover: The sentence is traversed monotonically until the state Initial is reached.</Paragraph>
      <Paragraph position="7">  11. vierten 5. Kollege 4. mein 1. In 3. Fall 2. diesem 12. Mai 6. kann 13. .</Paragraph>
      <Paragraph position="8"> 8. besuchen 7. nicht 10. am 9. Sie  Figure 6 Order in which the German source positions are covered for the German-to-English reordering example given in Figure 5.</Paragraph>
      <Paragraph position="9"> The states Move and Skip both allow a set of upcoming words to be processed sooner than would be the case in the monotonic traversal. The state Initial is entered whenever there are no uncovered positions to the left of the rightmost covered position. The sequence of states needed to carry out the word reordering example in Figure 5 is given in Figure 6. The 13 source sentence words are processed in the order shown. A formal specification of the state transitions is given in Section 3.5. Any number of consecutive German verb phrases in a sentence can be processed by the algorithm. The finite-state control presented here is obtained from a simple analysis of the German-to-English word reordering problem and is not estimated from the training data. It can be viewed as an extension of the IBM-4 model distortion probabilities. Using the above states, we define partial hypothesis extensions of the following type:</Paragraph>
      <Paragraph position="11"> Not only the coverage set C and the positions j, j prime , but also the verb group states S,S prime , are taken into account. For the sake of brevity, we have omitted the target language words e, e prime in the notation of the partial hypothesis extension. For each extension an uncovered position is added to the coverage set C of the partial hypothesis, and the verb group state S may change. A more detailed description of the partial hypothesis extension for a certain state S is given in the next section in a more general context. Covering the first uncovered position in the source sentence, we use the lan- null Tillmann and Ney DP Beam Search for Statistical MT guage model probability p(e  |$, $). Here, $ is the sentence boundary symbol, which is thought to be at position 0 in the target sentence. The search starts in the hypothesis (Initial,{[?]},0). {[?]} denotes the empty set, where no source sentence position is covered. The following recursive equation is evaluated:  The search ends in the hypotheses (Initial,{1,..., J}, j); the last covered position may be in the range j [?]{J[?]L,..., J}, because some source positions may have been skipped at the end of the input sentence. {1,..., J} denotes a coverage set including all positions from position 1 to position J. The final translation probability Q</Paragraph>
      <Paragraph position="13"> ) denotes the trigram language model, which predicts the sentence boundary $ at the end of the target sentence. Q F can be obtained using an algorithm very similar to the one given in Table 2. The complexity of the verb group reordering for the translation direction German to English is O(E</Paragraph>
    </Section>
    <Section position="6" start_page="109" end_page="112" type="sub_section">
      <SectionTitle>
3.5 Word Reordering: Generalization
</SectionTitle>
      <Paragraph position="0"> For the translation direction English to German, the word reordering can be restricted in a similar way as for the translation direction German to English. Again, the word order difference between the two languages is mainly due to the German verb group.</Paragraph>
      <Paragraph position="1"> During the translation process, the English verb group is decomposed as shown in Figure 7. When the sentence is translated monotonically from left to right, the translation of the English finite verb can is moved, and it is translated as the German left verbal brace before the English noun phrase my colleague, which is the subject of the sentence. The translations of the infinitive visit and of the negation particle not are skipped until later in the translation process. For this translation direction, the translation of one source sentence position may be moved for a distance of up to L = 4 source positions, and the translation of up to two source positions may be skipped for a distance of up to R = 10 source positions (we take over the L and R notation from the previous section). Thus, the role of the skipping and the moving are simply reversed with respect to their roles in German-to-English translation. For the example translation in Figure 7, the order in which the source sentence positions are covered is given in Figure 8.</Paragraph>
      <Paragraph position="2"> We generalize the two approaches for the different translation directions as follows: In both approaches, we assume that the source sentence is mainly processed monotonically. A small number of upcoming source sentence positions may be processed earlier than they would be in the monotonic traversal: The states Skip and Move are used as explained in the preceding section. The positions to be processed outside the monotonic traversal are restricted as follows: * The number of positions dealt with in the states Move and Skip is restricted.</Paragraph>
      <Paragraph position="3"> * There are distance restrictions on the source positions processed in those states.</Paragraph>
      <Paragraph position="5"> Word reordering for the translation direction English to German: The reordering is restricted to the English verb group.</Paragraph>
      <Paragraph position="6"> These restrictions will be fully formalized later in this section. In the state Move, some source sentence positions are &amp;quot;moved&amp;quot; from later in the sentence to earlier. After source sentence positions are moved, they are marked, and the translation of the sentence is continued monotonically, keeping track of the positions already covered. To formalize the approach, we introduce four reordering states S:  * Initial: A contiguous initial block of source positions is covered.</Paragraph>
      <Paragraph position="7"> * Skip: A restricted number of source positions may be skipped, leaving &amp;quot;holes&amp;quot; in the monotonic traversal.</Paragraph>
      <Paragraph position="8"> * Move: A restricted number of words may be &amp;quot;moved&amp;quot; from later in the sentence.</Paragraph>
      <Paragraph position="9"> * Cover: The sentence is traversed monotonically until the state Initial is  1. In 13. not 2. this 3. case 6. colleague 14. visit 15. .</Paragraph>
      <Paragraph position="10"> 4. can 5. my 7. you 8. on 9. the 10. fourth 11. of 12. May  u(C) is the number of &amp;quot;skipped&amp;quot; positions, and m(C) is the number of &amp;quot;moved&amp;quot; positions. The function card(*) returns the cardinality of a set of source positions. The function w(C) describes the &amp;quot;window&amp;quot; size in which the word reordering takes place. A procedural description for the computation of the set of successor hypotheses for a given partial hypothesis (S,C, j) is given in Table 3. There are restrictions on the possible successor states: A partial hypothesis in state Skip cannot be expanded into a partial hypothesis in state Move and vice versa. If the coverage set for the newly generated hypothesis covers a contiguous initial block of source positions, the state Initial is entered. No other state S is considered as a successor state in this case (hence the use of the continue statement in the procedural description). The set of successor hypotheses Succ by which to extend the partial hypothesis (S,C, j) is computed using the constraints defined by the values for numskip, widthskip, nummove, and widthmove, as explained in the Appendix. In particular, a source position k is discarded for extension if the &amp;quot;window&amp;quot; restrictions are violated. Within the restrictions all possible successors are computed. It can be observed that the set of successors, as computed in Table 3, is never empty.</Paragraph>
      <Paragraph position="11">  Computational Linguistics Volume 29, Number 1 Table 3 Procedural description to compute the set Succ of successor hypotheses by which to extend a partial hypothesis (S,C, j).</Paragraph>
      <Paragraph position="12"> input: partial hypothesis (S,C, j)</Paragraph>
      <Paragraph position="14"> output: set Succ of successor hypotheses There is an asymmetry between the two reordering states Move and Skip: While in state Move, the algorithm is not allowed to cover the position l min (C). It must first enter the state Cover to do so. In contrast, for the state Skip, the newly generated hypothesis always remains in the state Skip (until the state Initial is entered.) This is motivated by the word reordering for the German verb group. After the right verbal brace has been processed, no source words may be moved into the verbal brace from later in the sentence. There is a redundancy in the reorderings: The same reordering might be carried out using either the state Skip or Move, especially if widthskip and widthmove are about the same. The additional computational burden is alleviated somewhat by the fact that the pruning, as introduced in Section 3.8, does not distinguish hypotheses according to the states. A complexity analysis for different reordering constraints is given in Tillmann (2001).</Paragraph>
    </Section>
    <Section position="7" start_page="112" end_page="112" type="sub_section">
      <SectionTitle>
3.6 Word Reordering: IBM-Style Restrictions
</SectionTitle>
      <Paragraph position="0"> We now compare the new word reordering approach with the approach used in Berger et al. (1996). In the approach presented in this article, source sentence words are aligned with hypothesized target sentence words.</Paragraph>
      <Paragraph position="1">  When a source sentence word is aligned, we say its position is covered. During the search process, a partial hypothesis is extended by choosing an uncovered source sentence position, and this choice is restricted. Only one of the first n uncovered positions in a coverage set may be chosen, where n is set to 4. This choice is illustrated in Figure 9. In the figure, covered positions are marked by a filled circle, and uncovered positions are marked by an unfilled circle. Positions that may be covered next are marked by an unfilled square. The restrictions for a coverage set C can be expressed in terms of the expression u(C) defined in the previous section: The number of uncovered source sentence positions to the left of the rightmost covered position. Demanding u(C) [?] 3, we obtain the S3 restriction</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>