<?xml version="1.0" standalone="yes"?> <Paper uid="N03-2016"> <Title>Cognates Can Improve Statistical Translation Models</Title> <Section position="4" start_page="0" end_page="3" type="concl"> <SectionTitle> 3 Experiments </SectionTitle> <Paragraph position="0"> We induced translation models using IBM Model 4 (Brown et al., 1990) with the GIZA toolkit (Al-Onaizan et al., 1999). The maximum sentence length in the training data was set at 30 words. The actual translations were produced with a greedy decoder (Germann et al., 2001). For the evaluation of translation quality, we used the BLEU metric (Papineni et al., 2002), which measures the n-gram overlap between the translated output and one or more reference translations. In our experiments, we used only one reference translation.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.1 Word alignment quality </SectionTitle> <Paragraph position="0"> In order to directly measure the influence of the added cognate information on the word alignment quality, we performed a single experiment using a set of 500 manually aligned sentences from Hansards (Och and Ney, 2000). GIZA was first trained on 50,000 sentences from Hansards, and then on the same training set augmented with a set of cognates. The set consisted of two copies of a list produced by applying a threshold of 0.58 to the LCSR list. The duplication factor was arbitrarily selected on the basis of earlier experiments with a different training and test set taken from Hansards.</Paragraph> <Paragraph position="1"> The incorporation of the cognate information resulted in a 10% relative reduction in the word alignment error rate, from 17.6% to 15.8%, and a corresponding improvement in both precision and recall. An examination of randomly selected alignments confirms the observation of Al-Onaizan et al. 
(1999) that the use of cognate information reduces the tendency of rare words to align to many co-occurring words.</Paragraph> <Paragraph position="2"> In another experiment, we concentrated on co-occurring identical words, which are extremely likely to represent mutual translations. In the baseline model, links were induced between 93.6% of identical words. In the cognate-augmented model, the ratio rose to 97.2%.</Paragraph> </Section> <Section position="2" start_page="0" end_page="3" type="sub_section"> <SectionTitle> 3.2 Europarl </SectionTitle> <Paragraph position="0"> Europarl is a tokenized and sentence-aligned multilingual corpus extracted from the Proceedings of the European Parliament (Koehn, 2002).</Paragraph> <Paragraph position="1"> The eleven official European Union languages are represented in the corpus. We consider the variety of languages important for validating the cognate-based approach as general, rather than language-specific.</Paragraph> <Paragraph position="2"> As the training data, we arbitrarily selected a subset of the corpus consisting of the proceedings from October 1998. By pairing English with the remaining languages, we obtained nine bitexts</Paragraph> <Paragraph position="4"> of aligned sentences (500,000 words). The test data consisted of 1755 unseen sentences, varying in length from 5 to 15 words, from the 2000 proceedings (Koehn, 2002).</Paragraph> <Paragraph position="5"> The English language model was trained separately on a larger set of 700,000 sentences from the 1996 proceedings. Figure 1 shows the BLEU scores as a function of the duplication factor for three methods of cognate identification averaged over nine language pairs. 
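The two string-similarity measures used here for cognate identification, LCSR and DICE, can be sketched as follows (an illustrative reimplementation, not the authors' code; LCSR is the longest-common-subsequence length divided by the length of the longer word, and DICE is computed here over sets of character bigrams):

```python
def lcsr(a: str, b: str) -> float:
    """Longest Common Subsequence Ratio:
    LCS length divided by the length of the longer word."""
    m, n = len(a), len(b)
    # Classic dynamic-programming table for the LCS length.
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            if a[i] == b[j]:
                dp[i + 1][j + 1] = dp[i][j] + 1
            else:
                dp[i + 1][j + 1] = max(dp[i][j + 1], dp[i + 1][j])
    longer = max(m, n)
    return dp[m][n] / longer if longer else 0.0


def dice(a: str, b: str) -> float:
    """DICE coefficient: 2 * |shared bigrams| / (|bigrams(a)| + |bigrams(b)|)."""
    ba = {a[i:i + 2] for i in range(len(a) - 1)}
    bb = {b[i:i + 2] for i in range(len(b) - 1)}
    total = len(ba) + len(bb)
    return 2 * len(ba & bb) / total if total else 0.0


# A co-occurring word pair is proposed as a cognate when its similarity
# clears a threshold, e.g. 0.58 for LCSR or 0.39 for DICE.
for pair in [("nation", "nation"), ("colour", "color")]:
    print(pair, lcsr(*pair), dice(*pair))
```

For the pair colour/color, the LCS is color, giving LCSR 5/6; the shared bigrams are co, ol, lo, giving DICE 2x3/9 = 2/3; both clear the thresholds mentioned in the text.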
The results averaged over a number of language pairs are more informative than results obtained on a single language pair, especially since the BLEU metric is only a rough approximation of translation quality and exhibits considerable variance. Three different similarity measures were compared: Simard, DICE with a threshold of 0.39, and LCSR with a threshold of 0.58. In addition, we experimented with two different methods of extending the training set with a list of cognates: one pair as one sentence (Simard), and thirty pairs as one sentence (DICE and LCSR).</Paragraph> <Paragraph position="6"> Greek was excluded because its non-Latin script requires a different type of approach to cognate identification.</Paragraph> <Paragraph position="7"> In the vast majority of the sentences, the alignment links are correctly induced between the respective cognates when multiple pairs per sentence are added. [Table 1: ... as a function of the LCSR threshold, and the corresponding BLEU scores, averaged over nine Europarl bitexts.]</Paragraph> <Paragraph position="8"> The results show a statistically significant improvement in the average BLEU score when the duplication factor is greater than 1, but no clear trend can be discerned for larger factors. There does not seem to be much difference between the various methods of cognate identification. Table 1 shows the results of augmenting the training set with different sets of cognates determined using LCSR. A threshold of 0.99 implies that only identical word pairs are admitted as cognates. Word pairs with LCSR around 0.5 are more likely than not to be unrelated. In each case two copies of the cognate list were used. The somewhat surprising result was that adding only &quot;high confidence&quot; cognates is less effective than adding many dubious cognates. In that particular set of tests, adding only identical word pairs, which almost always are mutual translations, actually decreased the BLEU score. Our results are consistent with the results of Al-Onaizan et al. (1999), who observed perplexity improvements even when &quot;extremely low&quot; thresholds were used. It seems that the robust statistical training algorithm is able to ignore the unrelated word pairs while utilizing the information provided by the true cognates.</Paragraph> </Section> <Section position="3" start_page="3" end_page="3" type="sub_section"> <SectionTitle> 3.3 A manual evaluation </SectionTitle> <Paragraph position="0"> In order to confirm that the higher BLEU scores reflect higher translation quality, we performed a manual evaluation of a set of a hundred six-token sentences. The models were induced on a 25,000-sentence portion of Hansards.</Paragraph> <Paragraph position="1"> The training set was augmented with two copies of a cognate list obtained by thresholding LCSR at 0.56. Results of a manual evaluation of the entire set of 100 sentences are shown in Table 2.</Paragraph> <Paragraph position="2"> Statistical significance was estimated in the following way. The variance of the BLEU score was approximated by randomly picking, with replacement, a sample of translated sentences from the test set. The size of the sample was equal to the size of the test set (1755 sentences). The score was computed in this way 200 times for each language. The mean and the variance of the nine-language average were computed by randomly picking one of the 200 scores for each language and computing the average. The mean result produced was 0.2025, which is very close to the baseline average score of 0.2027. The standard deviation of the average was estimated to be 0.0018, which implies that averages above 0.2054 are statistically significant at the 0.95 level.</Paragraph> <Paragraph position="3"> [Table 2: evaluation of translations generated by the baseline and the cognate-augmented models.]</Paragraph> <Paragraph position="4"> 
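The significance-estimation procedure described above can be sketched as follows (a minimal illustration under stated assumptions: `score_fn` stands in for a real corpus-level BLEU implementation, which is not shown here):

```python
import random
import statistics


def resample_scores(test_sentences, score_fn, n_rounds=200, seed=0):
    """Approximate the variance of a corpus-level score (e.g. BLEU):
    repeatedly draw, with replacement, a sample the same size as the
    test set and rescore it."""
    rng = random.Random(seed)
    n = len(test_sentences)
    return [score_fn([test_sentences[rng.randrange(n)] for _ in range(n)])
            for _ in range(n_rounds)]


def average_score_distribution(per_language_scores, n_rounds=200, seed=0):
    """Estimate the mean and standard deviation of the multi-language
    average by picking one resampled score per language and averaging."""
    rng = random.Random(seed)
    averages = [statistics.mean(rng.choice(scores)
                                for scores in per_language_scores)
                for _ in range(n_rounds)]
    return statistics.mean(averages), statistics.pstdev(averages)
```

In the setup above, `resample_scores` would be run once per language (200 rounds over the 1755-sentence test set), and `average_score_distribution` would then combine the nine per-language score lists into a mean and standard deviation for the nine-language average.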
Although the overall translation quality is low due to the small size of the training corpus and the lack of parameter tuning, the number of completely acceptable translations is higher when cognates are added.</Paragraph> </Section> <Section position="4" start_page="3" end_page="3" type="sub_section"> <SectionTitle> 4 Conclusion </SectionTitle> <Paragraph position="0"> Our experimental results show that the incorporation of cognate information can improve the quality of word alignments, which in turn leads to better translations. In our experiments, the improvement, although statistically significant, is relatively small, which can be attributed to the relative crudeness of the approach of appending the cognate pairs directly to the training data. In the future, we plan to develop a method of incorporating the cognate information directly into the training algorithm.</Paragraph> <Paragraph position="1"> We foresee that the performance of such a method will also depend on using more sophisticated word similarity measures.</Paragraph> </Section> </Section> </Paper>