<?xml version="1.0" standalone="yes"?> <Paper uid="C04-1032"> <Title>Symmetric Word Alignments for Statistical Machine Translation</Title> <Section position="6" start_page="0" end_page="0" type="evalu"> <SectionTitle> 5 Results </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 5.1 Evaluation Criterion </SectionTitle> <Paragraph position="0"> We use the same evaluation criterion as described in (Och and Ney, 2000). We compare the generated word alignment to a reference alignment produced by human experts. The annotation scheme explicitly takes the ambiguity of the word alignment into account.</Paragraph> <Paragraph position="1"> There are two different kinds of alignments: sure alignments (S), which are used for unambiguous alignments, and possible alignments (P), which are used for alignments that might or might not exist. The P relation is used especially to align words within idiomatic expressions and free translations. It is guaranteed that the sure alignments are a subset of the possible alignments ($S \subseteq P$). The obtained reference alignment may contain many-to-one and one-to-many relationships.</Paragraph> <Paragraph position="2"> The quality of an alignment A is computed as appropriately redefined precision and recall measures. Additionally, we use the alignment error rate (AER), which is derived from the well-known F-measure.</Paragraph> <Paragraph position="3"> $\mathrm{recall} = \frac{|A \cap S|}{|S|}$, $\mathrm{precision} = \frac{|A \cap P|}{|A|}$, $\mathrm{AER}(S, P; A) = 1 - \frac{|A \cap S| + |A \cap P|}{|A| + |S|}$</Paragraph> <Paragraph position="4"> With these definitions a recall error can only occur if an S(ure) alignment is not found and a precision error can only occur if a found alignment is not even P(ossible).</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 5.2 Experimental Setup </SectionTitle> <Paragraph position="0"> We evaluated the presented lexicon symmetrization methods on the Verbmobil and the Canadian Hansards tasks. 
The German-English Verbmobil task (Wahlster, 2000) is a speech translation task in the domain of appointment scheduling, travel planning and hotel reservation. The French-English Canadian Hansards task consists of the debates in the Canadian Parliament.</Paragraph> <Paragraph position="1"> The corpus statistics are shown in Table 1 and Table 2. The number of running words and the vocabularies are based on full-form words including punctuation marks. As in (Och and Ney, 2003), the first 100 sentences of the test corpus are used as a development corpus to optimize model parameters that are not trained via the EM algorithm, e.g. the interpolation weights. The remaining part of the test corpus is used to evaluate the models. We use the same training schemes (model sequences) as presented in (Och and Ney, 2003): $1^5 H^5 3^3 4^3 6^3$ for the Verbmobil task, i.e. 5 iterations of IBM-1, 5 iterations of the HMM, 3 iterations of IBM-3, etc.; for the Canadian Hansards task, we use $1^5 H^{10} 3^3 4^3 6^3$. We refer to these schemes as the Model 6 schemes.</Paragraph> <Paragraph position="2"> For comparison, we also perform less sophisticated trainings, which we refer to as the HMM schemes ($1^5 H^{10}$ and $1^5 H^5$, respectively), as well as the IBM Model 4 schemes ($1^5 H^{10} 3^3 4^3$ and $1^5 H^5 3^3 4^3$).</Paragraph> <Paragraph position="3"> In all training schemes we use a conventional dictionary (possibly containing phrases) as additional training material. Because we use the same training and testing conditions as (Och and Ney, 2003), we will refer to the results presented in that article as the baseline results.</Paragraph> </Section> <Section position="3" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 5.3 Non-symmetric Alignments </SectionTitle> <Paragraph position="0"> In the first experiments, we use the state occupation probabilities from only one translation direction to determine the word alignment. 
This allows for a fair comparison with the Viterbi alignment computed as the result of the training procedure. In the source-to-target translation direction, we cannot estimate the probability for the target words with fertility zero and choose to set it to 0. In this case, the minimum weight edge cover problem is solved by the one-sided MWEC algorithm.</Paragraph> <Paragraph position="1"> Like the Viterbi alignments, the alignments produced by this algorithm satisfy the constraint that multiple source (target) words can only be aligned to one target (source) word.</Paragraph> <Paragraph position="2"> Tables 3 and 4 show the performance of the one-sided MWEC algorithm in comparison with the experiments reported in (Och and Ney, 2003). We report not only the final alignment error rates, but also the intermediate results for the HMM and IBM-4 training schemes.</Paragraph> <Paragraph position="3"> For IBM-3 to IBM-5, the Viterbi alignment and a set of promising alignments are used to determine the state occupation probabilities. Consequently, we observe similar alignment quality when comparing the Viterbi and the one-sided MWEC alignments.</Paragraph> <Paragraph position="4"> We also evaluated the alignment quality after applying alignment generalization methods, i.e. we combine the alignments of both translation directions. Experimentally, the best generalization heuristic for the Canadian Hansards task is the intersection of the source-to-target and the target-to-source alignments. For the Verbmobil task, the refined method of (Och and Ney, 2003) is used. 
Again, we observed similar alignment error rates when merging either the Viterbi alignments or the one-sided MWEC alignments. [Table caption: ... alignment methods and for various models (HMM, IBM-4, Model 6) on the Canadian Hansards task.]</Paragraph> </Section> <Section position="4" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 5.4 Symmetric Alignments </SectionTitle> <Paragraph position="0"> The heuristically generalized Viterbi alignments presented in the previous section can potentially avoid the alignment constraints (see footnote 3).</Paragraph> <Paragraph position="1"> However, the choice of the optimal generalization heuristic may depend on a particular language pair and may require extensive manual optimization. In contrast, the symmetric MWEC algorithm is a systematic and theoretically well-founded approach to the task of producing a symmetric alignment.</Paragraph> <Paragraph position="2"> In the experiments with the symmetric MWEC algorithm, the optimal interpolation parameter (see Equation 3) for the Verbmobil corpus was empirically determined as 0.8. This shows that the model parameters can be estimated more reliably in the direction from German to English. In the inverse English-to-German alignment training, the mappings of many English words to one German word are not allowed by the modeling constraints, although such alignment mappings are significantly more frequent than mappings of many German words to one English word.</Paragraph> <Paragraph position="3"> The experimentally best interpolation parameter for the Canadian Hansards corpus was 0.5. Thus the model parameters estimated in the translation direction from French to English are as reliable as the ones estimated in the direction from English to French.</Paragraph> <Paragraph position="4"> Footnote 3: Consequently, we will use them as the baseline for the experiments with symmetric alignments.</Paragraph> <Paragraph position="5"> Lines 2a and 2b of Table 5 show the performance of the MWEC algorithm. 
The alignment error rates are slightly lower if the HMM or the full Model 6 training scheme is used to train the state occupation probabilities on the Canadian Hansards task. On the Verbmobil task, the improvement is more significant, yielding an alignment error rate of 4.1%.</Paragraph> <Paragraph position="6"> Columns 4 and 5 of Table 5 contain the results of the experiments in which the costs $c_{ij}$ were determined as the loglinear interpolation of state occupation probabilities obtained from the HMM training scheme with those from IBM-4 (column 4) or from Model 6 (column 5). We set the interpolation parameters for the two translation directions proportional to the optimal values determined in the previous experiments. On the Verbmobil task, we obtain a further improvement of 19% relative over the baseline result reported in (Och and Ney, 2003), reaching an AER as low as 3.8%.</Paragraph> <Paragraph position="7"> The improvements of the alignment quality on the Canadian Hansards task are less significant. The manual reference alignments for this task contain many possible connections and only a few sure connections (cf. Table 2). Thus automatic alignments consisting of only a few reliable alignment points are favored. Because the differences in the number of words and word order between French and English are not as dramatic as e.g. between German and English, the probability of the empty word alignment is not very high. Therefore, plenty of alignment points are produced by the MWEC algorithm, resulting in a high recall and low precision. To increase the precision, we replaced the empty word connection costs (previously trained as state occupation probabilities using the EM algorithm) by global, word- and position-independent costs depending only on one of the involved languages. The alignment error rates for these experiments are given in lines 3a and 3b of Table 5. 
The global empty word probability for the Canadian Hansards task was empirically set to 0.45 for French and for English, and, for the Verbmobil task, to 0.6 for German and 0.1 for English. On the Canadian Hansards task, we achieved a further significant reduction of the AER. In particular, we reached an AER of 6.6% by performing only the HMM training.</Paragraph> <Paragraph position="8"> In this case the effectiveness of the MWEC algorithm is combined with the efficiency of the HMM training, resulting in a fast and robust alignment training procedure.</Paragraph> <Paragraph position="9"> We also tested the simpler one-sided MWEC algorithm. In contrast to the experiments presented in Section 5.3, we used the loglinear interpolated state occupation probabilities (given by Equation 3) as costs. Thus, although the algorithm is not able to produce a symmetric alignment, it operates with symmetrized costs. In addition, we used a combination heuristic to obtain a symmetric alignment. The results of these experiments are presented in Table 5, lines 4-6 a/b.</Paragraph> <Paragraph position="10"> The performance of the one-sided MWEC algorithm turned out to be quite robust on both tasks. However, the o-MWEC alignments are not symmetric and the achieved low AER depends heavily on the differences between the involved languages, which may favor many-to-one alignments in one translation direction only. That is why on the Verbmobil task, when determining the minimum weight in each row for the translation direction from English to German, the alignment quality deteriorates, because the algorithm cannot produce alignments which map several English words to one German word (line 5b of Table 5).</Paragraph> <Paragraph position="11"> Applying the generalization heuristics (line 6a/b of Table 5), we achieve an AER of 6.0% on the Canadian Hansards task when interpolating the state occupation probabilities trained with the HMM and with the IBM-4 schemes. 
On the Verbmobil task, the interpolation of the HMM and the Model 6 schemes yields the best result of 3.7% AER.</Paragraph> <Paragraph position="12"> In the latter experiment, we reached 97.3% precision and 95.2% recall.</Paragraph> </Section> </Section> </Paper>