<?xml version="1.0" standalone="yes"?> <Paper uid="P02-1044"> <Title>Word Translation Disambiguation Using Bilingual Bootstrapping</Title> <Section position="8" start_page="1" end_page="4" type="evalu"> <SectionTitle> 6 Experimental Results </SectionTitle> <Paragraph position="0"/> <Paragraph position="2"> We conducted two experiments on English-Chinese translation disambiguation.</Paragraph> <Section position="1" start_page="1" end_page="3" type="sub_section"> <SectionTitle> 6.1 Experiment 1: WSD Benchmark Data </SectionTitle> <Paragraph position="0"> We first applied BB, MB-B, and MB-D to translation of the English words 'line' and 'interest' using benchmark data. The data mainly consist of articles from the Wall Street Journal, and they are designed for conducting Word Sense Disambiguation (WSD) on the two words (e.g., Pedersen 2000); the data are available at http://www.d.umn.edu/~tpederse/data.html.</Paragraph> <Paragraph position="1"> We adopted from the HIT dictionary the Chinese translations of the two English words, as listed in Table 1. One sense of a word corresponds to one group of translations. The dictionary was created by Harbin Institute of Technology. We then used the benchmark data as our test data. (For the word 'interest', we used only its four major senses, because the remaining two minor senses occur in only 3.3% of the data.)</Paragraph> <Paragraph position="2"> As classified data in English, we defined a 'seed word' for each group of translations based on our intuition (cf., Table 1). Each seed word was then used as a classified 'sentence'. This way of creating classified data is similar to that in (Yarowsky, 1995). As unclassified data in English, we collected sentences from news articles on a web site (www.news.com), and as unclassified data in Chinese, we collected sentences from news articles on another web site (news.cn.tom.com). 
We observed that the distribution of translations in the unclassified data was balanced.</Paragraph> <Paragraph position="3"> Table 2 shows the sizes of the data. Note that there are in general more unclassified sentences in Chinese than in English, because an English word usually has several Chinese words as translations (cf., Figure 5).</Paragraph> <Paragraph position="4"> As a translation dictionary, we used the HIT dictionary, which contains about 76,000 Chinese words, 60,000 English words, and 118,000 links. We then used these data to conduct translation disambiguation with BB, MB-B, and MB-D, as described in Section 5.</Paragraph> <Paragraph position="5"> For both BB and MB-B, we used an ensemble of five Naive Bayesian Classifiers with window sizes of ±1, ±3, ±5, ±7, and ±9 words. For both BB and MB-B, we set the parameters β, b, and θ to 0.2, 15, and 1.5, respectively. The parameters were tuned on the basis of our preliminary experimental results with MB-B; they were not tuned for BB. We set the BB-specific parameter α to 0.4, which means that we treated the information from English and that from Chinese equally.</Paragraph> <Paragraph position="6"> Table 3 shows the translation disambiguation accuracies of the three methods, as well as that of a baseline method that always chooses the major translation. Figures 6 and 7 show the learning curves of MB-D, MB-B, and BB, and Figure 8 shows the accuracies of BB with different α values.</Paragraph> <Paragraph position="7"> From the results, we see that BB consistently and significantly outperforms both MB-D and MB-B; the results of a sign test are statistically significant (p-value < 0.001).</Paragraph> <Paragraph position="8"> Table 4 shows the results achieved by some existing supervised learning methods on the benchmark data (cf., Pedersen 2000). 
Although BB is nearly equivalent to an unsupervised learning method, it still performs favorably in comparison with the supervised methods (note that because the experimental settings differ, the results cannot be directly compared).</Paragraph> </Section> <Section position="2" start_page="3" end_page="4" type="sub_section"> <SectionTitle> 6.2 Experiment 2: Yarowsky's Words </SectionTitle> <Paragraph position="0"> We also conducted translation disambiguation on seven of the twelve English words studied in (Yarowsky, 1995). Table 5 shows the list of the words.</Paragraph> <Paragraph position="1"> For each of the words, we extracted about 200 sentences containing the word from the Encarta English corpus and labeled those sentences with Chinese translations ourselves. We used the labeled sentences as test data and the remaining sentences as unclassified data in English. We also used the sentences in the Great Encyclopedia Chinese corpus as unclassified data in Chinese, and again defined for each group of translations a seed word in English as a classified example (cf., Table 5). [Table 5 (Chinese translations omitted); seed words, numbers of unclassified English / Chinese sentences, and numbers of test sentences: bass: fish / music, 142 / 8811, 200; drug: treatment / smuggler, 3053 / 5398, 197; duty: discharge / export, 1428 / 4338, 197; palm: tree / hand, 366 / 465, 197; plant: industry / life, 7542 / 24977, 197; space: volume / outer, 3897 / 14178, 197; tank: combat / fuel, 417 / 1400, 199; Total: 16845 / 59567, 1384.]</Paragraph> <Paragraph position="2"> We did not, however, conduct translation disambiguation on the words 'crane', 'sake', 'poach', 'axes', and 'motion', because the first four words do not occur frequently in the Encarta corpus, and for the last word the accuracy of choosing the major translation already exceeds 98%.</Paragraph> <Paragraph position="3"> We next applied BB, MB-B, and MB-D to word translation 
disambiguation. The experimental settings were the same as those in Experiment 1.</Paragraph> <Paragraph position="4"> From Table 6, we see again that BB significantly outperforms MB-D and MB-B. (We will describe the results in detail in the full version of this paper.) Note that the results of MB-D here cannot be directly compared with those in (Yarowsky, 1995), mainly because the data used are different.</Paragraph> </Section> <Section position="3" start_page="4" end_page="4" type="sub_section"> <SectionTitle> 6.3 Discussions </SectionTitle> <Paragraph position="0"> We investigated why BB outperforms MB and found that the explanation given in Section 4 appears to hold, on the basis of the following observations. (1) In a Naive Bayesian Classifier, words with large probability ratios have a strong influence on the classification of t when they occur, particularly when they occur frequently. We collected the words with large probability ratios for each t in both BB and MB-B, and found that BB clearly has more 'relevant words' than MB-B. Here, 'relevant words' for t are words that are strongly indicative of t according to human judgment. Table 7 shows the top ten words in terms of probability ratio for the translation of 'interest' meaning 'money paid for the use of money' with respect to BB and MB-B; relevant words are underlined. Figure 9 shows the numbers of relevant words for the four translations of 'interest' with respect to BB and MB-B.</Paragraph> <Paragraph position="1"> (2) From Figure 8, we see that the performance of BB remains high or becomes higher when α is larger than 0.4 (recall that β was fixed to 0.2). 
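As a minimal sketch of the idea behind such a cross-lingual weight, the snippet below linearly interpolates per-translation classifier scores from the two languages; the interpolation form, the score values, and the translation labels are illustrative assumptions, not the paper's exact BB formulation.

```python
# Illustrative sketch (hypothetical data): weighting evidence from English
# and Chinese classifiers with a parameter alpha. The linear interpolation
# below is an assumption for illustration only.

def combine_scores(scores_en, scores_cn, alpha=0.4):
    """Interpolate English and Chinese classifier scores per translation t.

    Larger alpha gives more weight to the Chinese evidence.
    """
    return {
        t: alpha * scores_cn[t] + (1.0 - alpha) * scores_en[t]
        for t in scores_en
    }

# Hypothetical normalized scores for two candidate translations of 'interest'.
scores_en = {"financial": 0.55, "concern": 0.45}
scores_cn = {"financial": 0.80, "concern": 0.20}

combined = combine_scores(scores_en, scores_cn, alpha=0.4)
best = max(combined, key=combined.get)
```

In this toy example the Chinese scores shift the decision toward the 'financial' translation, and increasing alpha gives the Chinese evidence more influence; this loosely mirrors the trend seen in Figure 8, where performance stays high as α grows.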
This result strongly indicates that the information from Chinese has a positive effect on disambiguation.</Paragraph> <Paragraph position="2"> (3) One might argue that the higher performance of BB is attributable to the larger amount of unclassified data it uses, and that if we increased the unclassified data size for MB, MB could perform as well as BB.</Paragraph> <Paragraph position="3"> We conducted an additional experiment and found that this is not the case. Figure 10 shows the accuracies achieved by MB-B as the data size increases; the accuracies of MB-B do not improve further when the unlabeled data size increases. Figure 10 also plots the results of BB as well as those of a method referred to as MB-C. In MB-C, we linearly combine two MB-B classifiers constructed from two different unlabeled data sets; although the accuracies improve somewhat with MB-C, they are still much lower than those of BB.</Paragraph> </Section> </Section> </Paper>