<?xml version="1.0" standalone="yes"?> <Paper uid="W06-1609"> <Title>Statistical Machine Reordering</Title> <Section position="4" start_page="70" end_page="71" type="metho"> <SectionTitle> 2 N-gram-based SMT System </SectionTitle> <Paragraph position="0"> This section brie y describes the n-gram-based SMT which uses a translation model based on bilingual n-grams. It is actually a language model of bilingual units, referred to as tuples, which approximates the joint probability between source and target languages by using bilingual n-grams (de Gispert and Mari no, 2002).</Paragraph> <Paragraph position="1"> Bilingual units (tuples) are extracted from any word alignment according to the following constraints: null 1. a monotonous segmentation of each bilingual sentence pairs is produced, 2. no word inside the tuple is aligned to words outside the tuple, and 3. no smaller tuples can be extracted without violating the previous constraints.</Paragraph> <Paragraph position="2"> As a result of these constraints, only one segmentation is possible for a given sentence pair. Figure 1 presents a simple example which illustrates the tuple extraction process.</Paragraph> <Paragraph position="3"> I would like NULL to eat a huge ice-cream NULL quisiera ir a comer un helado gigante</Paragraph> <Paragraph position="5"> aligned bilingual sentence pair.</Paragraph> <Paragraph position="6"> Two important issues regarding this translation model must be considered. First, it often occurs that large number of single-word translation probabilities are left out of the model. This happens for all words that are always embedded in tuples containing two or more words. Consider for example the word ice-cream in Figure 1. As seen from the Figure, ice-cream is embedded into tuple t6. If a similar situation is encountered for all occurrences of ice-cream in the training corpus, then no translation probability for an independent occurrence of this word will exist.</Paragraph> <Paragraph position="7"> To overcome this problem, the tuple 4-gram model is enhanced by incorporating 1-gram translation probabilities for all the embedded words detected during the tuple extraction step. These 1-gram translation probabilities are computed from the intersection of both, the source-to-target and the target-to-source alignments.</Paragraph> <Paragraph position="8"> The second issue has to do with the fact that some words linked to NULL end up producing tuples with NULL source sides. Consider for example the tuple t3 in Figure 1. Since no NULL is actually expected to occur in translation inputs, this type of tuple is not allowed. Any target word that is linked to NULL is attached either to the word that precedes or the word that follows it. To determine this, we use the IBM1 probabilities, see Crego et al. (2005a).</Paragraph> <Paragraph position="9"> In addition to the bilingual n-gram translation model, the baseline system implements a log-linear combination of four feature functions, which are described as follows: * A target language model. This feature consists of a 4-gram model of words, which is trained from the target side of the bilingual corpus.</Paragraph> <Paragraph position="10"> * A word bonus function. This feature introduces a bonus based on the number of target words contained in the partial-translation hypothesis. It is used to compensate for the system's preference for short output sentences.</Paragraph> <Paragraph position="11"> * A source-to-target lexicon model. 
<Paragraph position="9"> In addition to the bilingual n-gram translation model, the baseline system implements a log-linear combination of four feature functions, which are described as follows: * A target language model. This feature consists of a 4-gram model of words, which is trained from the target side of the bilingual corpus.</Paragraph>
<Paragraph position="10"> * A word bonus function. This feature introduces a bonus based on the number of target words contained in the partial-translation hypothesis. It is used to compensate for the system's preference for short output sentences.</Paragraph>
<Paragraph position="11"> * A source-to-target lexicon model. This feature, which is based on the lexical parameters of IBM Model 1 (Brown et al., 1993), provides a complementary probability for each tuple in the translation table. These lexicon parameters are obtained from the source-to-target alignments.</Paragraph>
<Paragraph position="12"> * A target-to-source lexicon model. Similarly to the previous feature, this feature is based on the lexical parameters of IBM Model 1 but, in this case, these parameters are obtained from the target-to-source alignments.</Paragraph>
<Paragraph position="13"> All these models are combined in the decoder. Additionally, the decoder allows for a non-monotonous search with the following distortion model:</Paragraph>
<Paragraph position="14"> P_d(t_1^K) = exp( - \sum_{k=1}^{K} d_k )</Paragraph>
<Paragraph position="15"> where d_k is the distance between the first word of the kth tuple (unit) and the last word + 1 of the (k - 1)th tuple. Distances are measured in words, referring to the source side of the units.</Paragraph>
<Paragraph position="16"> To reduce the computational cost we place limits on the search using two parameters: the distortion limit (the maximum distance, measured in words, that a tuple is allowed to be reordered, m) and the reordering limit (the maximum number of reordering jumps in a sentence, j). This reordering feature is independent of the reordering approach presented in this paper, so the two can be used simultaneously. In order to combine the models in the decoder suitably, an optimization tool is needed to compute the log-linear weights for each model.</Paragraph>
</Section>
<Section position="5" start_page="71" end_page="72" type="metho">
<SectionTitle> 3 Statistical Machine Reordering </SectionTitle>
<Paragraph position="0"> As mentioned in the introduction, SMR and SMT are based on the same principles. Here, we give a detailed description of the proposed SMR reordering approach.</Paragraph>
<Section position="1" start_page="71" end_page="71" type="sub_section">
<SectionTitle> 3.1 Concept </SectionTitle>
<Paragraph position="0"> The aim of SMR is to use an SMT system to deal with reordering problems. The SMR system can therefore be seen as an SMT system which translates from an original source language (S) to a reordered source language (S'), given a target language (T). The translation task then changes from S2T to S'2T. The main difference between the two tasks is that the latter allows for: (1) a monotonized word alignment, and (2) a higher-quality monotonized translation.</Paragraph>
</Section>
<Section position="2" start_page="71" end_page="71" type="sub_section">
<SectionTitle> 3.2 Description </SectionTitle>
<Paragraph position="0"> Figure 2 shows the SMR block diagram. The input is the initial source sentence (S) and the output is the reordered source sentence (S'). There are three blocks inside SMR: (1) class replacement; (2) the decoder, which requires the translation model; and (3) the block which reorders the original sentence using the indexes given by the decoder. The following example specifies the input and output of each block inside the SMR system.</Paragraph>
<Paragraph position="1"> 1. Source sentence (S): El compromiso sólo podría mejorar 2. Source sentence classes (S-c): C38 C43 C49 C42 C22 3. Decoder output (translation, T):</Paragraph>
<Paragraph position="3"> where | indicates the segmentation into translation units and # divides the source and the target. The source part is composed of word classes and the target part is composed of the new positions of the source word classes, starting at 0.</Paragraph>
<Paragraph position="4"> 4. SMR output (S'). The reordering information inside each translation unit of the decoder output (T) is applied to the original source sentence (S): El sólo podría compromiso mejorar (see the sketch below).</Paragraph>
</Section>
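As a rough illustration of block (3) above, the sketch below applies the reordering indexes of each translation unit to the original sentence. The parsing code and the concrete decoder output string are our own assumptions: the exact output for this example is not reproduced above, and the paper only specifies the format (| separates units, # separates the source classes from the target positions).

def apply_smr_output(source_words, decoder_output):
    """Reorder the original sentence S according to the SMR decoder output T.

    decoder_output: translation units separated by '|'; inside each unit,
    '#' separates the source word classes from the local positions
    (starting at 0) in which those source words must be emitted.
    """
    reordered, cursor = [], 0
    for unit in decoder_output.split('|'):
        classes, positions = unit.strip().split('#')
        n = len(classes.split())
        local = [int(p) for p in positions.split()]
        fragment = source_words[cursor:cursor + n]
        reordered.extend(fragment[p] for p in local)
        cursor += n
    return reordered

# Hypothetical decoder output for the example sentence (the real unit
# segmentation produced by the system is not shown in the text).
S = "El compromiso sólo podría mejorar".split()
T = "C38#0 | C43 C49 C42#1 2 0 | C22#0"
print(" ".join(apply_smr_output(S, T)))   # El sólo podría compromiso mejorar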
<Section position="3" start_page="71" end_page="72" type="sub_section">
<SectionTitle> 3.3 Training </SectionTitle>
<Paragraph position="0"> For the reordering translation, we used an n-gram-based SMT system (and considered only the translation model). Figure 3 shows the block diagram of the training process of the SMR translation model, which is a bilingual n-gram-based model.</Paragraph>
<Paragraph position="1"> The training process uses the training source and target corpora and consists of the following steps: 1. Determine source and target word classes.</Paragraph>
<Paragraph position="2"> 2. Align the parallel training sentences at the word level in both translation directions. Compute the union of the two alignments to obtain a symmetrized many-to-many word alignment.</Paragraph>
<Paragraph position="3"> 3. Extract reordering tuples, see Figure 4.</Paragraph>
<Paragraph position="4"> (a) From the union word alignment, extract bilingual S2T tuples (i.e. source and target fragments) while maintaining the alignment inside the tuple. As an example of a bilingual S2T tuple consider: only possible compromise # compromiso sólo podría # 0-1 1-1 1-2 2-0, as shown in Figure 4, where the different fields are separated by # and correspond to: (1) the target fragment; (2) the source fragment; and (3) the word alignment (in this case, the indexes that respectively correspond to a target and a source word are separated by -).</Paragraph>
<Paragraph position="5"> (b) Modify the many-to-many word alignment of each tuple to a many-to-one alignment. If one source word is aligned to two or more target words, the most probable link given IBM Model 1 is chosen, while the others are omitted (i.e. the number of source words is the same before and after the reordering translation); see the sketch after this list. In the above example, the tuple would be changed to: only possible compromise # compromiso sólo podría # 0-1 1-2 2-0, as P_ibm1(only, sólo) is higher than P_ibm1(possible, sólo).</Paragraph>
<Paragraph position="6"> (c) From the bilingual S2T tuples (with a many-to-one inside alignment), extract bilingual S2S' tuples (i.e. the source fragment and its reordering). As in the example: compromiso sólo podría # 1 2 0, where the first field is the source fragment and the second is the reordering of these source words.</Paragraph>
<Paragraph position="7"> (d) Eliminate tuples whose source fragment consists of the NULL word.</Paragraph>
<Paragraph position="8"> (e) Replace the words of each tuple source fragment with the classes determined in Step 1.</Paragraph>
<Paragraph position="9"> 4. Compute the bilingual language model of the bilingual S2S' tuple sequence composed of the source fragment (in classes) and its reordering.</Paragraph>
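Steps 3(b) and 3(c) can be pictured with the following sketch, which keeps, for every source word, only its most probable link under an IBM Model 1 table and then reads the S2S' reordering off the surviving links in target order. The helper name simplify_and_reorder and the toy probability values are ours; only the selection rule follows the description above.

def simplify_and_reorder(tuple_links, p_ibm1):
    """tuple_links : list of (t, s) pairs, local 0-based indices inside one
                     bilingual S2T tuple (target index, source index).
       p_ibm1      : callable (t, s) -> IBM Model 1 lexical probability
                     (a placeholder for the real lexical table).
       Returns (many_to_one_links, s2s_reordering)."""
    # step 3(b): for a source word linked to several target words, keep the
    # most probable link and drop the others
    best = {}
    for t, s in tuple_links:
        if s not in best or p_ibm1(t, s) > p_ibm1(best[s], s):
            best[s] = t
    many_to_one = sorted((t, s) for s, t in best.items())
    # step 3(c): the reordering of the source fragment is the sequence of
    # source positions read off in target order
    s2s_reordering = [s for t, s in many_to_one]
    return many_to_one, s2s_reordering

# Toy probabilities for the example in step 3(b); real values would come
# from an IBM Model 1 table estimated on the training data.
toy_p = {("only", "sólo"): 0.6, ("possible", "sólo"): 0.2,
         ("possible", "podría"): 0.5, ("compromise", "compromiso"): 0.7}
tgt = ["only", "possible", "compromise"]
src = ["compromiso", "sólo", "podría"]
links = [(0, 1), (1, 1), (1, 2), (2, 0)]       # (target, source) pairs
m2o, order = simplify_and_reorder(
    links, lambda t, s: toy_p.get((tgt[t], src[s]), 0.0))
print(m2o)    # [(0, 1), (1, 2), (2, 0)]  i.e. 0-1 1-2 2-0
print(order)  # [1, 2, 0]                 i.e. compromiso sólo podría # 1 2 0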
<Paragraph position="10"> Once the translation model is built, the original source corpus S is translated into the reordered source corpus S' with the SMR system (see Figure 2). The reordered training source corpus and the original training target corpus are used to train the SMT system (as explained in Section 2). Finally, with this system, the reordered test source corpus is translated.</Paragraph>
</Section>
</Section>
<Section position="6" start_page="72" end_page="74" type="metho">
<SectionTitle> 4 Evaluation Framework </SectionTitle>
<Paragraph position="0"> In this section, we present the experiments carried out using the EsEn WMT06 and the ZhEn IWSLT05 parallel corpora. We detail the tools which have been used and the corpus statistics.</Paragraph>
<Section position="1" start_page="73" end_page="73" type="sub_section">
<SectionTitle> 4.1 Tools </SectionTitle>
<Paragraph position="0"/>
</Section>
<Section position="2" start_page="73" end_page="73" type="sub_section">
<SectionTitle> 4.2 Corpus Statistics </SectionTitle>
<Paragraph position="0"> Experiments were carried out on the Spanish-English task of the WMT06 evaluation (EuroParl corpus) and on the Chinese-to-English task of the IWSLT05 evaluation (BTEC corpus). The former is a large-corpus translation task, whereas the latter is a small-corpus translation task. Tables 1 and 2 show the main statistics of the data used, namely the number of sentences, words, vocabulary size, and mean sentence length for each language, for the training, development and test data sets.</Paragraph>
</Section>
<Section position="3" start_page="73" end_page="74" type="sub_section">
<SectionTitle> 4.3 Units </SectionTitle>
<Paragraph position="0"> In this section, statistics of the translation units for both approaches (S2T and S'2T) are shown (using the ZhEn task). All the experiments in this section were carried out using 100 classes in the SMR step.</Paragraph>
<Paragraph position="1"> Table 3 shows the vocabulary of bilingual n-grams and of embedded words in the translation model. Once the reordering translation has been computed, the alignment becomes more monotonic. It is commonly known that non-monotonicity poses difficulties for word alignment. Therefore, when the alignment becomes more monotonic, we expect an improvement in the alignment and, consequently, in the translation. Here, we observe a significant enlargement of the number of translation units, which leads to a growth of the translation vocabulary. We also observe a decrease in the number of embedded words (around 20%). From Section 2, we know that the probability of embedded words is estimated independently of the translation model. Reducing the number of embedded words allows for a better estimation of the translation model. Figure 5 shows the histogram of tuple sizes for the two approaches. We observe that the number of tuples is similar above length 5. However, there is a greater number of shorter units in the case of SMR+NB (shorter units lead to a reduction in data sparseness).</Paragraph>
<Paragraph position="3"> Table 4 shows the tuples used to translate the test set (total number and vocabulary). Note that the number of tuples and the vocabulary used to translate the test set are significantly greater after the reordering translation.</Paragraph>
</Section>
<Section position="4" start_page="74" end_page="74" type="sub_section">
<SectionTitle> 4.4 Results </SectionTitle>
<Paragraph position="0"> Here, we introduce the experiments that were carried out in order to evaluate the influence of the SMR approach on both the EsEn and ZhEn tasks. The log-linear translation model was optimized with the simplex algorithm by maximizing the BLEU score. The evaluation was carried out using references and translations in lowercase and, in the ZhEn task, without punctuation marks.</Paragraph>
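The weight optimization just mentioned can be sketched with an off-the-shelf simplex (Nelder-Mead) routine. The decode_and_bleu helper below is an assumption of ours, standing in for whatever tool re-decodes the development set with a given weight vector and scores the output with BLEU; the paper does not describe its optimizer's interface.

import numpy as np
from scipy.optimize import minimize

def decode_and_bleu(weights):
    """Placeholder: decode the development set with these log-linear
    weights and return the BLEU score of the resulting translations."""
    raise NotImplementedError

# One weight per model: bilingual translation model, target language model,
# word bonus, source-to-target and target-to-source lexicon models.
w0 = np.ones(5)
result = minimize(lambda w: -decode_and_bleu(w),   # maximize BLEU
                  w0, method="Nelder-Mead")
best_weights = result.x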
<Paragraph position="1"> We studied the influence of the proposed SMR approach on the n-gram-based SMT system described above, using a monotonous search (NBm, the monotonous baseline configuration) in the two tasks and a non-monotonous search (NBnm, the non-monotonous baseline configuration) in the ZhEn task. When allowing for reordering in the SMT decoder, the distortion limit (m) and the reordering limit (j) (see Section 2) were empirically set to 5 and 3, respectively, as these values showed a good trade-off between quality and efficiency. Both systems include the four features explained in Section 2: the language model, the word bonus, and the source-to-target and target-to-source lexicon models.</Paragraph>
<Paragraph position="2"> Tables 5 and 6 show the results on the test set.</Paragraph>
<Paragraph position="3"> The former corresponds to the influence of the SMR system on the EsEn task (NBm), whereas the latter corresponds to the influence of the SMR system on the ZhEn task (NBm and NBnm).</Paragraph>
</Section>
<Section position="5" start_page="74" end_page="74" type="sub_section">
<SectionTitle> 4.5 Discussion </SectionTitle>
<Paragraph position="0"> Both BLEU and NIST consistently increase after the inclusion of the SMR step when 100 classes are used. The improvement in translation quality can be explained as follows: * SMR takes advantage of the use of classes and correctly captures word reorderings that are missed by the standard SMT system. In addition, the use of classes allows new reorderings to be inferred.</Paragraph>
<Paragraph position="1"> * The new task S'2T becomes more monotonous. Therefore, the translation units tend to be shorter and SMT systems perform better.</Paragraph>
<Paragraph position="2"> The gain obtained in the SMR+NBnm case indicates that the reordering provided by the SMR system and the non-monotonous search are complementary. This means that the output of the SMR system could still be further monotonized. Note that the ZhEn task has complex word reorderings.</Paragraph>
<Paragraph position="3"> These preliminary results also show that SMR by itself provides improvements beyond those provided by the non-monotonous search.</Paragraph>
</Section>
</Section>
</Paper>