File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/02/c02-1058_intro.xml

Size: 6,001 bytes

Last Modified: 2025-10-06 14:01:22

<?xml version="1.0" standalone="yes"?>
<Paper uid="C02-1058">
  <Title>Unsupervised Word Sense Disambiguation Using Bilingual Comparable Corpora</Title>
  <Section position="3" start_page="0" end_page="0" type="intro">
    <SectionTitle>
2 Approach
2.1 Framework
</SectionTitle>
    <Paragraph position="0"> A comparable corpus consists of a first-language corpus and a second-language corpus of the same domain.</Paragraph>
    <Paragraph position="1"> Unlike in a parallel corpus, sentences and word instances in a comparable corpus cannot be aligned translingually. Therefore, we extract a collection of statistically significant pairs of related words from each language corpus independently of the other language, and then align the pairs of related words translingually with the help of a bilingual dictionary. The underlying assumption is that translations of words that are related in one language are also related in the other language (Rapp 1995).</Paragraph>
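The alignment step above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the pairs of related words are taken as given (the significance test used to extract them is not shown), Japanese words are romanized, relatedness is treated as symmetric, and the names `align_pairs`, `english_pairs`, `japanese_pairs`, and `dictionary`, as well as all the data, are hypothetical.

```python
from itertools import product

# Hypothetical toy input: pairs of related words extracted independently from
# each corpus (the extraction statistic is assumed, not shown).
english_pairs = {("tank", "gasoline"), ("tank", "soldier"), ("tank", "troop")}
japanese_pairs = {("tanku", "gasorin"), ("sensya", "heisi"),
                  ("sensya", "tai"), ("suisou", "mure")}

# Hypothetical bilingual dictionary: English word -> romanized Japanese translations.
dictionary = {
    "tank": {"tanku", "sensya", "suisou", "sou"},
    "gasoline": {"gasorin"},
    "soldier": {"heisi"},
    "troop": {"tai", "mure", "gun", "tasuu"},
}


def align_pairs(pairs_l1, pairs_l2, bilingual_dict):
    """Return every (pair_l1, pair_l2) in which both words of pair_l2 are
    dictionary translations of the corresponding words of pair_l1."""
    related_l2 = set(pairs_l2)
    alignments = []
    for w1, w2 in sorted(pairs_l1):
        for t1, t2 in product(bilingual_dict.get(w1, ()), bilingual_dict.get(w2, ())):
            # Relatedness is assumed symmetric, so accept either word order.
            if (t1, t2) in related_l2 or (t2, t1) in related_l2:
                alignments.append(((w1, w2), (t1, t2)))
    return sorted(alignments)


alignments = align_pairs(english_pairs, japanese_pairs, dictionary)
for english, japanese in alignments:
    print(english, "->", japanese)
```

In this toy run, (tank, troop) receives two candidate alignments, (sensya, tai) and (suisou, mure), which is the kind of ambiguity Section 2.2 addresses.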
    <Paragraph position="2"> Translingual alignment of pairs of related words enables us to acquire knowledge useful for WSD, i.e., sense-clue pairs. For example, the alignment of (tank, gasoline) with (タンク &lt;TANKU&gt;, ガソリン &lt;GASORIN&gt;) implies that "gasoline" is a clue for selecting the "container" sense of "tank", which is translated as "タンク &lt;TANKU&gt;". Likewise, the alignment of (tank, soldier) with (戦車 &lt;SENSYA&gt;, 兵士 &lt;HEISI&gt;) implies that "soldier" is a clue for selecting the "military vehicle" sense of "tank", which is translated as "戦車 &lt;SENSYA&gt;".</Paragraph>
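A sketch of how an aligned pair could be turned into a sense-clue pair, assuming each sense of the polysemous word is identified by the translations that express it. The sense labels, the mapping `SENSE_OF_TRANSLATION`, and the function `sense_clue_pair` are hypothetical; Japanese words are romanized.

```python
# Hypothetical sense inventory: (polysemous word, translation) -> sense label.
SENSE_OF_TRANSLATION = {
    ("tank", "tanku"): "container",
    ("tank", "suisou"): "container",
    ("tank", "sensya"): "military vehicle",
}


def sense_clue_pair(alignment, sense_of_translation):
    """Map ((word, clue), (word_translation, clue_translation)) to
    (word, sense, clue), or None if the translation has no sense label."""
    (word, clue), (word_translation, _clue_translation) = alignment
    sense = sense_of_translation.get((word, word_translation))
    if sense is None:
        return None
    return (word, sense, clue)


examples = [
    (("tank", "gasoline"), ("tanku", "gasorin")),
    (("tank", "soldier"), ("sensya", "heisi")),
]
pairs = [sense_clue_pair(a, SENSE_OF_TRANSLATION) for a in examples]
print(pairs)
# [('tank', 'container', 'gasoline'), ('tank', 'military vehicle', 'soldier')]
```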
    <Paragraph position="3"> Figure 1 shows an overview of our proposed method for acquiring knowledge for WSD. In the framework of translingually aligning pairs of related words, we encounter two major problems: the ambiguity in alignment, and the disparity of topical coverage between the corpora of the two languages. The following sections discuss how to overcome these problems.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.2 Coping with ambiguity in alignment
</SectionTitle>
      <Paragraph position="0"> When pairs of related words are matched via a bilingual dictionary, a pair in one language can often be aligned with two or more pairs in the other language.</Paragraph>
      <Paragraph position="1"> For example, the English pair (tank, troop) can be aligned with the Japanese pairs (水槽 &lt;SUISOU&gt;, 群れ &lt;MURE&gt;), (槽 &lt;SOU&gt;, 多数 &lt;TASUU&gt;), (戦車 &lt;SENSYA&gt;, 群 &lt;GUN&gt;), (戦車 &lt;SENSYA&gt;, 多数 &lt;TASUU&gt;), and (戦車 &lt;SENSYA&gt;, 隊 &lt;TAI&gt;). We resolve this ambiguity on the assumption that a correct alignment is accompanied by many common related words (words related to both words of the pair) that can be aligned with each other. In the above example, many words related to both "tank" and "troop" can be aligned with words related to both "戦車 &lt;SENSYA&gt;" and "隊 &lt;TAI&gt;" (see Figure 2(b5)).</Paragraph>
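The disambiguation criterion above can be sketched by counting, for each candidate alignment, the first-language common related words that have a translation among the second-language common related words, and choosing the candidate with the largest count. This plain count is a simplification (the next paragraph refines it into a sum of correlations). The related-word lists, the romanized Japanese translations, and the function names are hypothetical toy data, not taken from the paper's figure.

```python
# Hypothetical related-word lists for each language (romanized Japanese).
related_en = {
    "tank": {"army", "battle", "fire", "area", "gasoline"},
    "troop": {"army", "battle", "fire", "area", "commander"},
}
related_ja = {
    "sensya": {"rikugun", "sentou", "kasai", "chiiki"},
    "tai": {"rikugun", "sentou", "chiiki", "shikikan"},
    "suisou": {"mizu", "kasai"},
    "mure": {"tori", "kasai"},
}
# Hypothetical English-to-Japanese dictionary for the related words.
dictionary = {
    "army": {"rikugun", "guntai"},
    "battle": {"sentou"},
    "fire": {"kasai", "hi"},
    "area": {"chiiki"},
    "commander": {"shikikan"},
    "gasoline": {"gasorin"},
}


def common_related(word_a, word_b, related):
    """Words related to both word_a and word_b."""
    return related.get(word_a, set()).intersection(related.get(word_b, set()))


def alignable_common_words(pair_l1, pair_l2, related_l1, related_l2, bilingual_dict):
    """First-language common related words of pair_l1 that have at least one
    translation among the second-language common related words of pair_l2."""
    common_l1 = common_related(*pair_l1, related_l1)
    common_l2 = common_related(*pair_l2, related_l2)
    return {w for w in common_l1
            if bilingual_dict.get(w, set()).intersection(common_l2)}


def best_alignment(pair_l1, candidates, related_l1, related_l2, bilingual_dict):
    """Pick the candidate pair with the most alignable common related words."""
    return max(candidates, key=lambda pair_l2: len(alignable_common_words(
        pair_l1, pair_l2, related_l1, related_l2, bilingual_dict)))


candidates = [("suisou", "mure"), ("sensya", "tai")]
print(best_alignment(("tank", "troop"), candidates, related_en, related_ja, dictionary))
# ('sensya', 'tai')
```

Here (sensya, tai) is supported by three aligned common related words (army, battle, area) against one (fire) for (suisou, mure).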
      <Paragraph position="2"> The plausibility of an alignment is evaluated on the basis of the set of first-language common related words that can be aligned with second-language common related words. The plausibility of alignment is then used to calculate the correlation between the senses of a polysemous word and the clues for selecting the most suitable sense. To evaluate the plausibility of alignment precisely, we define it as the sum of the correlations between the sense suggested by the alignment and the common related words accompanying the alignment.</Paragraph>
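Because the plausibility of an alignment is defined from correlations, and the correlations are in turn computed from plausibilities, one way to realize this is a fixed-point iteration. The sketch below is only an illustration: the uniform starting correlation of 1.0, the normalization by the most plausible sense for each clue, the fixed number of iterations, and all names and data are assumptions of this sketch, since this section does not give the exact formulas. The hypothetical table `aligned_words` maps (sense, clue) to the first-language common related words of (polysemous word, clue) that accompany the alignment suggesting that sense.

```python
# Hypothetical toy data for the polysemous word "tank".
SENSES = ["container", "military vehicle"]
CLUES = ["gasoline", "soldier", "army", "fuel", "water", "battle"]

# (sense, clue) -> first-language common related words of (tank, clue)
# that accompany the alignment suggesting that sense (hypothetical).
aligned_words = {
    ("container", "gasoline"): {"fuel", "water"},
    ("military vehicle", "gasoline"): {"army"},
    ("military vehicle", "soldier"): {"army", "battle"},
    ("container", "fuel"): {"gasoline", "water"},
    ("container", "water"): {"fuel"},
    ("military vehicle", "army"): {"soldier", "battle"},
    ("military vehicle", "battle"): {"soldier", "army"},
}


def plausibility(sense, clue, aligned, corr):
    """Sum of correlations between the suggested sense and the common
    related words accompanying the alignment."""
    return sum(corr.get((sense, w), 0.0) for w in aligned.get((sense, clue), ()))


def estimate_correlations(senses, clues, aligned, n_iter=10):
    """Alternate between plausibilities and correlations (assumed scheme)."""
    corr = {(s, c): 1.0 for s in senses for c in clues}  # assumed start value
    for _ in range(n_iter):
        new_corr = {}
        for c in clues:
            pl = {s: plausibility(s, c, aligned, corr) for s in senses}
            top = max(pl.values())
            for s in senses:
                # Assumed normalization: the best sense for this clue gets 1.0.
                new_corr[(s, c)] = pl[s] / top if top > 0 else 0.0
        corr = new_corr
    return corr


corr = estimate_correlations(SENSES, CLUES, aligned_words)
for clue in CLUES:
    print(clue, {s: corr[(s, clue)] for s in SENSES})
```

In this toy run "gasoline" ends up favoring the container sense (1.0 against 0.5) and "soldier" the military-vehicle sense; the spurious alignment of (tank, gasoline) to the military-vehicle sense keeps a lower, nonzero score.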
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.3 Coping with disparity between corpora
</SectionTitle>
      <Paragraph position="0"> Given the disparity of topical coverage between the corpora of the two languages, as well as the insufficient coverage of the bilingual dictionary, the method described in the preceding section seems too strict. As exemplified in Figure 2, even for a correct alignment of a first-language pair of related words with a second-language pair of related words, only a small part of the first-language common related words can be aligned with second-language common related words. To make the method more robust, instead of the set of first-language common related words that can be aligned with second-language common related words, we use a weighted set consisting of all the first-language common related words, in which those aligned with second-language common related words are given …
[Figure 1: overview of the proposed method for acquiring knowledge for WSD]
[Figure 2]
(a) Common related words of (tank, troop): Army, Bosnian, Bosnian government, Chechen, Chechnya, Force, Grozny, Israel, Moscow, Mr. Yeltsin, Mr. Yeltsin's, NATO, Pentagon, Republican, Russia, Russian, Secretary, Serb, U.N., Yeltsin, Yeltsin's, air, area, army, assault, battle, bomb, carry, civilian, commander, control, defense, fight, fire, force, government, helicopter, military, missile, rebel, soldier, weapon
(b1) Common related words of (tank, troop) that can be aligned with common related words of (水槽 &lt;SUISOU&gt;, 群れ &lt;MURE&gt;): air, area, fire, government
(b2) Common related words of (tank, troop) that can be aligned with common related words of (槽 &lt;SOU&gt;, 多数 &lt;TASUU&gt;): area, army, control, force
(b3) Common related words of (tank, troop) that can be aligned with common related words of (戦車 &lt;SENSYA&gt;, 群 &lt;GUN&gt;): area, army, battle, commander, force, government
(b4) Common related words of (tank, troop) that can be aligned with common related words of (戦車 &lt;SENSYA&gt;, 多数 &lt;TASUU&gt;): Serb, area, army, battle, force, government
(b5) Common related words of (tank, troop) that can be aligned with common related words of (戦車 &lt;SENSYA&gt;, 隊 &lt;TAI&gt;): …</Paragraph>
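The weighted set can be sketched as follows. The sentence that defines the weights is cut off in this copy, so the values used here (1.0 for a word that can be aligned, a smaller hypothetical `unaligned_weight` of 0.2 otherwise) are assumptions of the sketch, as are the function names and the uniform correlations. The first-language common related words and the aligned subset are taken from Figure 2(a) and 2(b3), restricted to a few words for brevity.

```python
def weighted_common_words(common_l1, aligned, unaligned_weight=0.2):
    """Weight every first-language common related word: 1.0 if it can be
    aligned with a second-language common related word, else unaligned_weight
    (both values are assumptions of this sketch)."""
    return {w: (1.0 if w in aligned else unaligned_weight) for w in common_l1}


def weighted_plausibility(sense, weights, corr):
    """Weighted sum of correlations between the sense and the common related words."""
    return sum(weight * corr.get((sense, w), 0.0) for w, weight in weights.items())


# Subset of Figure 2(a) and the words aligned under Figure 2(b3).
common_l1 = {"area", "army", "battle", "commander", "force", "government",
             "soldier", "weapon", "helicopter"}
aligned_b3 = {"area", "army", "battle", "commander", "force", "government"}

weights = weighted_common_words(common_l1, aligned_b3)
# Hypothetical uniform correlations, just to make the arithmetic visible.
corr = {("military vehicle", w): 1.0 for w in common_l1}
print(round(weighted_plausibility("military vehicle", weights, corr), 6))  # 6.6
```

Under the strict method only the six aligned words would count (a score of 6.0 with these correlations); the weighted set lets the three unaligned words (soldier, weapon, helicopter) still contribute.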
    </Section>
  </Section>
</Paper>