<?xml version="1.0" standalone="yes"?>
<Paper uid="P02-1044">
<Title>Word Translation Disambiguation Using Bilingual Bootstrapping</Title>
<Section position="4" start_page="1" end_page="1" type="metho">
<SectionTitle> 3 Bilingual Bootstrapping </SectionTitle>
<Paragraph position="0"/>
<Section position="1" start_page="1" end_page="1" type="sub_section">
<SectionTitle> 3.1 Overview </SectionTitle>
<Paragraph position="0"> Instead of using Monolingual Bootstrapping (MB), we propose a new method for word translation disambiguation that uses Bilingual Bootstrapping (BB).</Paragraph>
<Paragraph position="1"> In translation from English to Chinese, for instance, BB makes use not only of unclassified data in English, but also of unclassified data in Chinese. It also uses a small amount of classified data in English and, optionally, a small amount of classified data in Chinese. The data in English and Chinese are not assumed to be parallel, but they should come from the same domain.</Paragraph>
<Paragraph position="2"> BB constructs classifiers for English-to-Chinese translation disambiguation by repeating two steps: (1) constructing classifiers for each of the languages on the basis of the classified data in both languages, and (2) using the constructed classifier in each language to classify some unclassified data and adding the newly classified data to the classified training data set of that language. We can use classified data in both languages at step (1) because words in one language generally have translations in the other, and we can find their translation relationships by using a dictionary.</Paragraph>
</Section>
<Section position="2" start_page="1" end_page="1" type="sub_section">
<SectionTitle> 3.2 Algorithm </SectionTitle>
<Paragraph position="0"> Let E denote a set of words in English, C a set of words in Chinese, and T a set of links in a translation dictionary, as shown in Figure 1. (Any two linked words can be translations of each other.) Mathematically, T is defined as a relation between E and C, i.e., \(T \subseteq E \times C\).</Paragraph>
<Paragraph position="1"> Let \(\varepsilon\) stand for a random variable on E and \(\gamma\) for a random variable on C. Also let \(e\) stand for a random variable on E, \(c\) for a random variable on C, and \(t\) for a random variable on T. While \(\varepsilon\) and \(\gamma\) represent words to be translated, \(e\) and \(c\) represent context words.</Paragraph>
<Paragraph position="2"> For an English word \(\varepsilon\), \[ C_\varepsilon = \{\gamma \mid (\varepsilon, \gamma) \in T\} \] denotes the set of its translations in Chinese; for a Chinese word \(\gamma\), \(E_\gamma\) is defined similarly.</Paragraph>
<Paragraph position="3"> Let \(\mathbf{e}\) denote a sequence of words (e.g., a sentence or a text) in English, \(\mathbf{e} = e_1 e_2 \cdots e_n\), and let \(\mathbf{c}\) denote a sequence of words in Chinese, \(\mathbf{c} = c_1 c_2 \cdots c_m\). We view \(\mathbf{e}\) and \(\mathbf{c}\) as examples representing context information for translation disambiguation.</Paragraph>
<Paragraph position="4"> For an English word \(\varepsilon\), we define a binary classifier for resolving each of its translation ambiguities \(\gamma \in C_\varepsilon\) on the basis of \[ P(\gamma \mid \mathbf{e}) \ \text{versus} \ P(\bar{\gamma} \mid \mathbf{e}), \] where \(\mathbf{e}\) denotes an example in English and \(\bar{\gamma}\) stands for 'not \(\gamma\)'. Similarly, for a Chinese word \(\gamma\), we define a classifier on the basis of \[ P(\varepsilon \mid \mathbf{c}) \ \text{versus} \ P(\bar{\varepsilon} \mid \mathbf{c}), \] where \(\mathbf{c}\) denotes an example in Chinese.</Paragraph>
<Paragraph position="5"> We denote the sets of classified and unclassified examples with respect to \(\varepsilon\) in English as \(L_\varepsilon\) and \(U_\varepsilon\), respectively. Similarly, we denote the sets of classified and unclassified examples with respect to \(\gamma\) in Chinese as \(L_\gamma\) and \(U_\gamma\).</Paragraph>
<Paragraph position="6"> We perform Bilingual Bootstrapping as described in Figure 2; its output is the set of classifiers in English and Chinese. Hereafter, we will explain only the process for English (the left-hand side); the process for Chinese (the right-hand side) can be conducted similarly.</Paragraph>
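<Paragraph position="7"> As an illustration of the two steps above, the following is a minimal, self-contained Python sketch of the English half of the loop (the Chinese half is symmetric). The toy dictionary T, the confidence threshold theta, the helper names, and the assumption that the Chinese examples carry sense labels shared with English (the A/D equivalences discussed in Section 4) are all illustrative assumptions, not the actual procedure of Figure 2.</Paragraph>

```python
from collections import Counter, defaultdict

# Toy stand-in for the dictionary relation T, mapping Chinese context
# words to linked English words (assumed for illustration only).
T = {"shui": "water", "yezi": "leaf", "jiqi": "machine"}

def train_nb(labeled):
    """Collect Naive Bayes counts from (context_words, sense) pairs."""
    prior, cond = Counter(), defaultdict(Counter)
    for words, sense in labeled:
        prior[sense] += 1
        cond[sense].update(words)
    return prior, cond

def classify(model, words):
    """Return (best sense, normalized score), with Laplace smoothing."""
    prior, cond = model
    vocab = {w for counter in cond.values() for w in counter}
    total = sum(prior.values())
    scores = {}
    for s in prior:
        p = prior[s] / total
        denom = sum(cond[s].values()) + len(vocab)
        for w in words:
            p *= (cond[s][w] + 1) / denom
        scores[s] = p
    z = sum(scores.values()) or 1.0
    best = max(scores, key=scores.get)
    return best, scores[best] / z

def translate(zh_example):
    """'Translate' a classified Chinese example into English through T."""
    words, sense = zh_example
    return [T[w] for w in words if w in T], sense

def bootstrap_en(labeled_en, labeled_zh, unlabeled_en,
                 rounds=5, theta=0.8):
    """English half of Bilingual Bootstrapping."""
    for _ in range(rounds):
        # Step (1): build the classifier from classified data in BOTH
        # languages, mapping Chinese examples into English through T.
        model = train_nb(labeled_en + [translate(x) for x in labeled_zh])
        # Step (2): classify unclassified examples and move the
        # confidently classified ones into the classified set.
        rest = []
        for words in unlabeled_en:
            sense, score = classify(model, words)
            if score >= theta:
                labeled_en.append((words, sense))
            else:
                rest.append(words)
        unlabeled_en = rest
    return model
```

<Paragraph position="8"> In this sketch the dictionary links do the same work as at step (1) above: classified Chinese examples become additional English training data.</Paragraph>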
</Section>
<Section position="3" start_page="1" end_page="1" type="sub_section">
<SectionTitle> 3.3 Naive Bayesian Classifier </SectionTitle>
<Paragraph position="0"> While we can in principle employ any kind of classifier in BB, here we use a Naive Bayesian Classifier. At step 1 in BB, we construct the classifier as described in Figure 3. At step 2, for each example \(\mathbf{e}\), we calculate with the Naive Bayesian Classifier \[ \arg\max_{\gamma \in C_\varepsilon} P(\gamma \mid \mathbf{e}) = \arg\max_{\gamma \in C_\varepsilon} P(\gamma)\,P(\mathbf{e} \mid \gamma). \] The second equation is based on Bayes' rule (the constant factor \(P(\mathbf{e})\) is dropped). In the calculation, we assume that the context words in \(\mathbf{e}\) (i.e., \(e_1, e_2, \ldots, e_n\)) are independently generated on the basis of the model, so that \(P(\mathbf{e} \mid \gamma) = \prod_{i=1}^{n} P(e_i \mid \gamma)\).</Paragraph>
<Paragraph position="1"> We can calculate \(P(e \mid t)\) under the same model. We can, therefore, employ the Expectation and Maximization Algorithm (EM Algorithm) (Dempster et al. 1977) to estimate the parameters of the model, including \(P(e \mid t)\). We also use the relation T in the estimation. Initially, we set \(P(e \mid t)\) to the uniform distribution. We then estimate the parameters by iteratively updating them as described in Figure 4 until they converge. Here \(f(c, t)\) stands for the frequency of \(c\) related to \(t\). The context information in Chinese is thereby 'translated' into that in English through the links in T.</Paragraph>
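<Paragraph position="2"> To make the estimation concrete, the following Python sketch shows the general shape of such an EM procedure under an assumed generative story (the paper's actual updates are those of Figure 4, which is not reproduced here): given a link t, an English context word e is drawn from \(P(e \mid t)\) and then rendered as a Chinese word c chosen uniformly among e's dictionary translations; \(P(e \mid t)\) starts uniform and is re-estimated from the observed frequencies \(f(c, t)\).</Paragraph>

```python
from collections import defaultdict

def em_estimate(f_ct, translations, english_words, iters=20):
    """Estimate P(e | t) by EM.

    f_ct:          {(c, t): frequency} -- observed frequencies f(c, t)
    translations:  {e: set of Chinese words linked to e in T}
    english_words: list of English context words e (all keys of translations)
    The argument names and the generative story are illustrative assumptions.
    """
    links = {t for (_, t) in f_ct}
    # Initialization: P(e | t) uniform, as in the text above.
    p_e_t = {t: {e: 1.0 / len(english_words) for e in english_words}
             for t in links}
    for _ in range(iters):
        counts = {t: defaultdict(float) for t in links}
        for (c, t), freq in f_ct.items():
            # E-step: posterior over which e generated c under link t.
            post = {e: p_e_t[t][e] / len(translations[e])
                    for e in english_words if c in translations[e]}
            z = sum(post.values())
            if z == 0.0:
                continue
            for e, p in post.items():
                counts[t][e] += freq * p / z
        # M-step: renormalize expected counts into new P(e | t).
        for t in links:
            z = sum(counts[t].values())
            if z > 0.0:
                p_e_t[t] = {e: counts[t].get(e, 0.0) / z
                            for e in english_words}
    return p_e_t
```

<Paragraph position="3"> With the estimated \(P(e \mid t)\), the Chinese context frequencies can be turned into English pseudo-observations through the links in T, which is what lets the English classifier benefit from classified data in Chinese.</Paragraph>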
</Section>
</Section>
<Section position="6" start_page="1" end_page="1" type="metho">
<SectionTitle> 4 Comparison between BB and MB </SectionTitle>
<Paragraph position="0"> We note that Monolingual Bootstrapping is a special case of Bilingual Bootstrapping (consider the situation in which \(\alpha\) equals 0 in formula (1)). Moreover, it seems safe to say that BB can always perform better than MB.</Paragraph>
<Paragraph position="1"> The many-to-many relationship between the words in the two languages stands out as the key to the higher performance of BB.</Paragraph>
<Paragraph position="2"> Suppose that the classifier with respect to 'plant' has two decisions (denoted as A and B in Figure 5), and that the classifiers with respect to 'gongchang' and 'zhiwu' in Chinese have two decisions each, (C and D) and (E and F), respectively. A and D are equivalent to each other (i.e., they represent the same sense), and so are B and E.</Paragraph>
<Paragraph position="3"> Assume that examples are classified after several iterations of BB as depicted in Figure 5. Here, circles denote examples that are correctly classified and crosses denote examples that are incorrectly classified.</Paragraph>
<Paragraph position="4"> Since A and D are equivalent to each other, we can 'translate' the examples classified into D and use them to boost the classification into A. This works because the misclassified examples (crosses) in D are those mistakenly drawn from C, and they do not have much negative effect on the classification into A, even though the translation from Chinese into English can introduce some noise. Similar explanations apply to the other classification decisions.</Paragraph>
<Paragraph position="5"> In contrast, MB uses only the examples in A and B to construct a classifier, and as the number of misclassified examples increases (which is inevitable in bootstrapping), its performance stops improving.</Paragraph>
</Section>
<Section position="7" start_page="1" end_page="1" type="metho">
<SectionTitle> 5 Word Translation Disambiguation </SectionTitle>
<Paragraph position="0"/>
<Section position="1" start_page="1" end_page="1" type="sub_section">
<SectionTitle> 5.1 Using Bilingual Bootstrapping </SectionTitle>
<Paragraph position="0"> While it is possible to apply the algorithm of BB described in Section 3 straightforwardly to word translation disambiguation, we use here a variant of it for better adaptation to the task and for a fairer comparison with existing technologies. The variant of BB has four modifications.</Paragraph>
<Paragraph position="1"> (1) It employs an ensemble of Naive Bayesian Classifiers (NBCs), because an ensemble of NBCs generally performs better than a single NBC (Pedersen 2000). In the ensemble, it creates different NBCs using as data the words within different window sizes surrounding the word to be disambiguated (e.g., 'plant' or 'zhiwu'), and it then constructs a new classifier by linearly combining the NBCs.</Paragraph>
<Paragraph position="2"> (2) It employs the heuristic of 'one sense per discourse' (cf. Yarowsky 1995) after using the ensemble of NBCs.</Paragraph>
<Paragraph position="3"> (3) It uses classified data in English only at the beginning.</Paragraph>
<Paragraph position="4"> (4) It individually resolves the ambiguities of selected English words such as 'plant' and 'interest'. As a result, in the case of 'plant', for example, the classifiers with respect to 'gongchang' and 'zhiwu' make classification decisions only into D and E, not into C and F (in Figure 5). It calculates \(\lambda\) accordingly and sets \(\theta = 0\) at the right-hand side of step 2.</Paragraph>
</Section>
<Section position="2" start_page="1" end_page="1" type="sub_section">
<SectionTitle> 5.2 Using Monolingual Bootstrapping </SectionTitle>
<Paragraph position="0"> We consider here two implementations of MB for word translation disambiguation.</Paragraph>
<Paragraph position="1"> In the first implementation, in addition to the basic algorithm of MB, we also use (1) an ensemble of Naive Bayesian Classifiers, (2) the heuristic of 'one sense per discourse', and (3) a small amount of classified data in English at the beginning. We will denote this implementation as MB-B hereafter.</Paragraph>
<Paragraph position="2"> The second implementation differs from the first only in (1): it employs a decision list as the classifier instead of an ensemble of NBCs. This implementation is exactly the one proposed in (Yarowsky 1995), and we will denote it as MB-D hereafter.</Paragraph>
<Paragraph position="3"> MB-B and MB-D can be viewed as state-of-the-art methods for word translation disambiguation using bootstrapping.</Paragraph>
</Section>
</Section>
</Paper>