<?xml version="1.0" standalone="yes"?>
<Paper uid="C02-1058">
  <Title>Unsupervised Word Sense Disambiguation Using Bilingual Comparable Corpora</Title>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
(Zhan Che &lt;SENSYA&gt;, Dui &lt;TAI&gt;)
</SectionTitle>
    <Paragraph position="0"> Russia, Serb, air, area, army, battle, commander, defense, fight, fire, force, government, helicopter, soldier null Fig. 2 Example of common related words larger weights than the others.</Paragraph>
    <Paragraph position="1"> The disparity of topical coverage between the corpora of two languages and the insufficient coverage of the bilingual dictionary also cause a lot of pairs of related words not to be aligned with any pair of related words. To recover the failure in alignment, we introduce a &amp;quot;wild card&amp;quot; pair, with which every first-language pair of related words is aligned compulsorily. The alignment with the wild-card pair suggests all senses of the first-language polysemous word, and it is accompanied by a set consisting of the first-language common related words with the same weight.</Paragraph>
  </Section>
  <Section position="5" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3 Proposed method
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.1 Defining word senses
</SectionTitle>
      <Paragraph position="0"> We define each sense of a polysemous word x of the first language by a synonym set consisting of x itself and one or more of its translations y</Paragraph>
      <Paragraph position="2"> second language. The synonym set is similar to that in WordNet (Miller 1990) except that it is bilingual, not monolingual. Examples of some sets are given below. null {tank, tanku&lt;TANKU&gt;, Shui Cao &lt;SUISOU&gt;, Cao &lt;SOU&gt;} {tank, Zhan Che &lt;SENSYA&gt;} These synonym sets define the &amp;quot;container&amp;quot; sense and the &amp;quot;military vehicle&amp;quot; sense of &amp;quot;tank&amp;quot; respectively. Translations that preserve the ambiguity are preferably eliminated from the synonym sets defining senses because they are useless for distinguishing the senses. An example is given below.</Paragraph>
      <Paragraph position="3"> {title, Jian Shu ki&lt;KATAGAKI&gt;, Cheng Hao &lt;SYOUGOU&gt;, taitoru &lt;TAITORU&gt;, Jing Cheng &lt;KEISYOU&gt;} {title, Ti Ming &lt;DAIMEI&gt;, Ti Mu &lt;DAIMOKU&gt;, Biao Ti &lt;HYOUDAI&gt;, Shu Ming &lt;SYOMEI&gt;, taitoru&lt;TAITORU&gt;} {title, taitoru&lt;TAITORU&gt;, Xuan Shou Quan &lt;SENSYUKEN&gt;} These synonym sets define the &amp;quot;person's rank or profession&amp;quot; sense, the &amp;quot;name of a book or play&amp;quot; sense, and the &amp;quot;championship&amp;quot; sense of &amp;quot;title&amp;quot;. A Japanese word &amp;quot;taitoru&lt;TAITORU&gt;&amp;quot;, which represents all these senses, is preferably eliminated from all these synonym sets.</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.2 Extraction of pairs of related words
</SectionTitle>
      <Paragraph position="0"> The corpus of each language is statistically processed in order to extract a collection of pairs of related words in the language (Kaji et al. 2000). First, we extract words from the corpus and count the occurrence frequencies of each word. We reject words whose frequencies are less than a certain threshold. We also extract pairs of words co-occurring in a window and count the co-occurrence frequency of each pair of words. In the present implementation, the words are restricted to nouns and unknown words, which are probably nouns, and the window size is set to 25 words excluding function words.</Paragraph>
      <Paragraph position="1"> Next, we calculate mutual information MI(x, x') between each pair of words x and x'. MI(x, x') is defined by the following formula:</Paragraph>
      <Paragraph position="3"> where Pr(x) is the occurrence probability of x, and Pr(x, x') is the co-occurrence probability of x and x'. Finally, we select pairs of words whose mutual information value is larger than a certain threshold and at the same time whose relation is judged to be statistically significant through a log-likelihood ratio test.</Paragraph>
    </Section>
    <Section position="3" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.3 Alignment of pairs of related words
</SectionTitle>
      <Paragraph position="0"> In this section, R</Paragraph>
      <Paragraph position="2"> denote the collections of pairs of related words extracted from the corpora of the first language and the second language, respectively. D denotes a bilingual dictionary, that is, a collection of pairs consisting of a first-language word and a second-language word that are translations of each other. Let X(x) be the set of clues for determining the sense of a first-language polysemous word x, i.e.,</Paragraph>
      <Paragraph position="4"> Henceforth, the j-th clue for determining the sense of x is denoted as x'(j).</Paragraph>
      <Paragraph position="5"> Let Y(x, x'(j)) be the set of counterparts of a pair of first-language related words (x, x'(j)), i.e.,</Paragraph>
      <Paragraph position="7"> aligned with each counterpart (y, y') ([?]Y(x, x'(j))), and a weighted set of common related words Z((x,</Paragraph>
      <Paragraph position="9"> where w(x&amp;quot;), which denotes the weight of x&amp;quot;, is set as follows:</Paragraph>
      <Paragraph position="11"> The mutual information of the counterpart, MI(y, y'), was incorporated into the weight according to the assumption that alignments with pairs of strongly related words are more plausible than those with pairs of weakly related words. The coefficient a was set to 5 experimentally.</Paragraph>
      <Paragraph position="12"> (2) Each pair of first-language related words (x, x'(j)) is aligned with the wild-card pair (y</Paragraph>
      <Paragraph position="14"/>
    </Section>
    <Section position="4" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.4 Calculation of correlation between senses and clues
      <Paragraph position="0"> We define the correlation between the i-th sense S(i) and the j-th clue x'(j) of a polysemous word x as follows: null  where A((x, x'(j)), (y, y), S(i)) denotes the plausibility of alignment of (x, x'(j)) with (y, y) suggesting S(i). The first factor in the above formula, i.e., the mutual information between the polysemous word and the j-th clue, is the base of the correlation. The numerator of the second factor is the maximum plausibility of alignments that suggest the i-th sense of the polysemous word. The denominator of the second factor has been introduced to normalize the plausibility.</Paragraph>
      <Paragraph position="1"> We define the plausibility of alignment suggesting a sense as the weighted sum of the correlations between the sense and the common related words, i.e.,</Paragraph>
      <Paragraph position="3"> As the definition of the correlation between senses and clues is recursive, we calculate it iteratively with the following initial values: C  (S(i), x'(j))=MI(x, x'(j)).</Paragraph>
      <Paragraph position="4"> The number of iteration was set at 6 experimentally. null Figure 3 shows how the correlation values converge. &amp;quot;Troop&amp;quot; demonstrates a typical pattern of convergence; namely, while the correlation with the relevant sense is kept constant, that with the irrelevant sense decreases as the iteration proceeds. &amp;quot;Ozone&amp;quot; demonstrates the effect of the wild-card pair. Note that the correlation values due to an alignment with the wild-card pair begin to diverge in the second cycle of iteration. The alignment with the wild-card pair, which is shared by all senses, does not produce any distinction among the senses in the first cycle of iteration; the divergence is caused by the difference in correlation values between the senses and the common related words.</Paragraph>
    </Section>
    <Section position="5" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.5 Selection of the sense of a polysemous word
</SectionTitle>
      <Paragraph position="0"> Consulting sense-vs.-clue correlation data acquired by the method described in the preceding sections, we select a sense for each instance of a polysemous word x in a text. The score of each sense of the polysemous word is defined as the sum of the correlations between the sense and clues appearing in the context, i.e.,</Paragraph>
      <Paragraph position="2"> A window of 51 words (25 words before the polysemous word and 25 words after it) is used as the context. Scores of all senses of a polysemous word are calculated, and the sense whose score is largest is selected as the sense of the instance of the polysemous word.</Paragraph>
      <Paragraph position="3"> When all scores are zero, no sense can be selected; the case is called &amp;quot;inapplicable&amp;quot;.</Paragraph>
    </Section>
  </Section>
  <Section position="6" start_page="0" end_page="0" type="metho">
    <SectionTitle>
4 Experiment
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.1 Experimental method
</SectionTitle>
      <Paragraph position="0"> We evaluated our method through an experiment using corpora of English and Japanese newspaper articles.</Paragraph>
      <Paragraph position="1"> The first language was English and the second language was Japanese. A Wall Street Journal corpus (July, 1994 to Dec., 1995; 189 Mbytes) and a Nihon Keizai Shimbun corpus (Dec., 1993 to Nov., 1994; 275 Mbytes) were used as the training comparable corpus.</Paragraph>
      <Paragraph position="2"> EDR (Japan Electronic Dictionary Research Institute) English-to-Japanese and Japanese-to-English dictionaries were merged for the experiment. The resulting dictionary included 269,000 English nouns and 276,000 Japanese nouns. Pairs of related words were extracted from the corpus of each language under the following parameter settings:  - threshold for occurrence frequencies of words: 10 - threshold for mutual information: 0.0  These settings were common to the English and Japanese corpora.</Paragraph>
      <Paragraph position="3"> We selected 60 English polysemous nouns as the test words. Words whose different senses appear in newspapers were preferred. The frequencies of the test words in the training corpus ranged from 39,140 (&amp;quot;share&amp;quot;, the third noun in descending order of frequency) to 106 (&amp;quot;appreciation&amp;quot;, the 2,914th noun).  We defined the senses of each test word. The number of senses per test word ranged from 2 to 8, and the average was 3.4. For each test word, sense-vs.-clue correlation data were acquired by the method described in Sections 3.2 through 3.4. 175 clues on average were acquired for each test word.</Paragraph>
      <Paragraph position="4"> For evaluation, we selected 100 test passages per test word from a Wall Street Journal corpus (Jan., 1996 to Dec. 1996) whose publishing period was different from that of the training corpus. The instances of test words positioned in the center of each test passage were disambiguated by the method described in Section 3.5, and the results were compared with the manually selected senses.</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.2 Results and evaluation
</SectionTitle>
      <Paragraph position="0"> We used two measurements, applicability and precision (Dagan and Itai 1994), to evaluate the performance of our method. The applicability is the proportion of instances of the test word(s) that the method could disambiguate. The precision is the proportion of disambiguated instances of the test word(s) that the method disambiguated correctly. The applicability and precision of the proposed method, averaged over the 60 test polysemous words, were 88.5% and 77.7%, respectively.</Paragraph>
      <Paragraph position="1"> The performance of our method on six out of the 60 test words is summarized in Table 1. That is, the instances are classified according to the correct sense and the sense selected by our method. These results show that the performance varies according to the test words, that our method is better in the case of frequent senses, but worse in the case of infrequent senses, and that our method can easily distinguish topic-specific senses, but not generic senses.</Paragraph>
      <Paragraph position="2"> We consider the reason for the poor performance concerning &amp;quot;measure&amp;quot; [Table 1(a)] and &amp;quot;race&amp;quot; [Table 1(c)] as follows. The second sense of &amp;quot;measure&amp;quot;, {measure, Dui Ce &lt;TAISAKU&gt;, Shou Duan &lt;SYUDAN&gt;, Chu Zhi &lt;SYOTI&gt;}, is a very generic sense; therefore effective clues for identifying the sense could not be acquired.</Paragraph>
      <Paragraph position="3"> The first sense of &amp;quot;race&amp;quot;, {race, resu&lt;REESU&gt;, Jing Zheng &lt;KYOUSOU&gt;, Jing Zou &lt;KYOUSOU&gt;, Zheng i&lt;ARASOI&gt;, Zhan &lt;SEN&gt;}, is specific to the &amp;quot;race for the presidency&amp;quot; topic and the second sense of &amp;quot;race&amp;quot;, {race, Ren Zhong &lt;ZINSYU&gt;, Min Zu &lt;MINZOKU&gt;, Zhong Shu &lt;SYUZOKU&gt;}, is specific to the &amp;quot;racial discrimination&amp;quot; topic; however, both topics are related to &amp;quot;politics&amp;quot; and, therefore, many clues were shared by these two senses.</Paragraph>
      <Paragraph position="4"> Comparison with a baseline method, which selects the most frequent sense of each polysemous word independently of contexts, was also done. Since large sense-tagged corpora were not available, we simulated the baseline method with a modified version of the proposed method; namely, for each polysemous word, the sense that maximizes the sum of correlations with all clues was selected as the most frequent sense. The applicability of the baseline method is 100%, while that of the proposed method is less than 100%. To compare with the baseline method, the proposed method was substituted with the proposed method + baseline method; namely, the baseline method was applied when the proposed method was inapplicable.</Paragraph>
      <Paragraph position="5"> The average precisions of the baseline method and the proposed method + baseline method, both of which attained 100% applicability, were 62.8% and 73.4% respectively. Figure 4 visualizes the superiority of the proposed method + baseline method; the 60 test polysemous words are scattered on a plane whose horizontal and vertical coordinates represent the precision of the baseline method and that of the proposed method + baseline method, respectively.</Paragraph>
    </Section>
  </Section>
  <Section position="7" start_page="0" end_page="0" type="metho">
    <SectionTitle>
5 Discussion
</SectionTitle>
    <Paragraph position="0"> Although it has produced promising results, the developed WSD method has a few problems. These limitations, along with future extensions, are discussed  below.</Paragraph>
    <Paragraph position="1"> (1) Multilingual distinction of senses  The developed method is based on the premise that the senses of a polysemous word in a language are lexicalized differently in another language. However, the premise is not always true; that is, the ambiguity of a word may be preserved by its translations. As described in Section 3.1, we preferably use translations that do not preserve the ambiguity. However, doing so is useless unless such translations are frequently used words. An essential approach to solving this problem is to use two or more second languages (Resnik and Yarowsky 2000).</Paragraph>
    <Paragraph position="2"> (2) Use of syntactic relations The developed method extracts clues for WSD according to co-occurrence in a window. However, it is obvious that doing this is not suitable for all polysemous words. Syntactic co-occurrence is more useful for disambiguating some sorts of polysemous words. It is an important and interesting research issue to extend our method so that it can acquire clues according to syntactic co-occurrence. This extended method does not replace the present method; however, we  should combine both methods or use the one suitable for each polysemous word. It should be noted that this extension also enables disambiguation of polysemous verbs.</Paragraph>
    <Paragraph position="3"> The framework of the method is compatible with syntactic co-occurrence. Basically, we only have to incorporate a parser into the step of extracting pairs of related words. A parser of the first language is indispensable, but a parser of the second language is not. As for the second language, we can use co-occurrence in a small-sized window instead of syntactic cooccurrence. null</Paragraph>
  </Section>
  <Section position="8" start_page="0" end_page="0" type="metho">
    <SectionTitle>
6 Comparison with other methods
</SectionTitle>
    <Paragraph position="0"> While our method aligns pairs of related words that are statistically extracted, WSD using parallel corpora aligns instances of words (Brown, et al. 1991). Both alignment techniques are quite different. Actually, from the technological viewpoint, our method is close to WSD using a second-language monolingual corpus Table 1 Results of sense selection for six polysemous words  S1: a system or instrument for calculating amount, size, weight, etc.</Paragraph>
    <Paragraph position="1"> S2: an action taken to gain a certain end S3: a law suggested in Parliament  S1: any competition, or a contest of speed S2: one of the groups that humans can be divided into according to physical features, history, language, etc.</Paragraph>
    <Paragraph position="2"> S3: a channel for a current of water  S1: a word or name given to a person to be used before his/her name as a sign rank, profession, etc.</Paragraph>
    <Paragraph position="3"> S2: a name given to a book, play, etc.</Paragraph>
    <Paragraph position="4"> S3: the legal right to own something S4: the position of being the winner of an sports competition  S1: a legal process in which a court examines a case S2: a process of testing to determine quality, value, usefulness, etc.</Paragraph>
    <Paragraph position="5"> S3: a sports competition that tests a player's ability S4: annoying thing or person S5: difficulties and troubles (Dagan and Itai 1994; Kikui 1998), where instances of co-occurrence in a first-language text are aligned with co-occurrences statistically extracted from the second-language corpus. A comparison of our method with WSD using a second-language monolingual corpus is given below.</Paragraph>
    <Paragraph position="6"> First, our method performs alignment during the acquisition phase, and transforms word-word correlation data into sense-clue correlation data, which is far more informative than the original word-word correlation data. In contrast, a method using a second-language monolingual corpus uses original word-word correlation data during the disambiguation phase. This difference results in a difference in the performance of WSD, particularly in a poor-context situation (e.g., query translation).</Paragraph>
    <Paragraph position="7"> Second, our method can acquire sense-clue correlation even from a pair of related words for which alignment results in failure [e.g., C({tank, tanku&lt;TANKU&gt;, Shui Cao &lt;SUISOU&gt;, Cao &lt;SOU&gt;}, ozone) in Figure 3]. On the contrary, a conventional WSD method using a second-language monolingual corpus uses only pairs of related words for which alignment results in success. Thus, our method can elicit more information than the conventional method.</Paragraph>
    <Paragraph position="8"> Tanaka and Iwasaki (1996) exploited the idea of translingually aligning word co-occurrences to extract pairs consisting of a word and its translation form a non-aligned (comparable) corpus. The essence of their method is to obtain a translation matrix that maximizes the distance between the co-occurrence matrix of the first language and that of the second language. Their method is useful for extracting corpus-dependent translations; however, it does not extract knowledge for WSD, i.e., which co-occurring word suggests which sense or translation.</Paragraph>
  </Section>
class="xml-element"></Paper>