<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-2007">
  <Title>Word Sense Disambiguation Using Automatically Translated Sense Examples</Title>
  <Section position="3" start_page="45" end_page="46" type="metho">
    <SectionTitle>
2 Acquisition of Sense Examples
</SectionTitle>
    <Paragraph position="0"> Wang and Carroll (2005) proposed an automatic approach to acquire sense examples from large amount of Chinese text and English-Chinese and Chinese-English dictionaries. The acquisition process is summarised as follows:  1. Translate an English ambiguous word  to Chinese, using an English-Chinese lexicon. Given the assumption that mappings between words and senses are different between English and Chinese, each sense a1 a2 of  maps to a distinct Chinese word. At the end of this step, we have produced a set  , which consists of Chinese words a5 a6 a8 a10 a6 a12 a10 a13 a13 a13 a10 a6 a18 a20 ,wherea6 a2 is the translation corresponding to sense a1 a2 of  , we collect the text snippets retrieved and construct a Chinese corpus. 3. Word-segment these Chinese text snippets. 4. Use an electronic Chinese-English lexicon to translate  the Chinese corpora constructed word by word to English. null This process can be completely automatic and unsupervised. However, in order to compare the performance against other WSD systems, one needs to map senses in the bilingual dictionary to those used by gold standard datasets, which are often from WordNet (Fellbaum, 1998). This step is inevitable unless we use senses in the bilingual dictionary as gold standard. Fortunately, the mapping process only takes a very short time   ,comparing to the effort that it would take to manually sense annotate training examples. At the end of the acquisition process, for each sense a23 a24 of an ambiguous word a25 , we have a large set of English contexts. Note that a context is represented by a bag of words only. We mimicked this process and built a set of sense examples.</Paragraph>
    <Paragraph position="1"> To obtain a richer set of features, we adapted the above process and carried out another acquisition experiment. When translating Chinese text snippets to English in the 4th step, we used MT software instead of a bilingual dictionary. The intuition is that although machine translated text contains noise, features like word ordering, POS tags  A similar process took 15 minutes per noun as reported in (Chan and Ng, 2005), and about an hour for 20 nouns as reported in (Wang and Carroll, 2005).</Paragraph>
    <Paragraph position="2">  {English sense example 1 for sense 1 of w} {English sense example 2 for sense 1 of w} ... ...</Paragraph>
    <Paragraph position="3"> {English sense example 1 for sense 2 of w} {English sense example 2 for sense 2 of w} ... ...</Paragraph>
    <Paragraph position="4"> Figure 1:Adapted process of automatic acquisition of sense examples. For simplicity, assume  has two senses.</Paragraph>
    <Paragraph position="5"> and bigrams/trigrams may still be of some use for ML classifiers. In this approach, the 3rd step can be omitted, since MT software should be able to take care of segmentation. Figure 1 illustrates our adapted acquisition process.</Paragraph>
    <Paragraph position="6"> As described above, we prepared two sets of training examples for each English word sense to disambiguate: one set was translated word-by-word by looking up a bilingual dictionary, as proposed in (Wang and Carroll, 2005), and the other translated using MT software. In detail, we first mapped senses of ambiguous words, as defined in the gold-standard TWA (Mihalcea, 2003) and Senseval-3 lexical sample (Mihalcea et al., 2004) datasets (which we use for evaluation) onto their  corresponding Chinese translations. We did this by looking up an English-Chinese dictionary PowerWord 2002 2 . This mapping process involved human intervention, but it only took an annotator (fluent speaker in both Chinese and English) 4 hours. Since some Chinese translations are  also ambiguous, which may affect WSD performance, the annotator was asked to select the Chinese words that are relatively unambiguous (or ideally monosemous) in Chinese for the target word senses, when it was possible. Sometimes multiple senses of an English word can map to the same Chinese word, according to the English-Chinese dictionary. In such cases, the annotator was advised to try to capture the subtle difference between these English word senses and then to  PowerWord is a commercial electronic dictionary application. There is a free online version at: http://cb.kingsoft.com.</Paragraph>
    <Paragraph position="7"> select different Chinese translations for them, using his knowledge on the languages. Then, using the translations as queries, we retrieved as many text snippets as possible from the Chinese Gigaword Corpus. For efficiency purposes, we randomly chose maximumly a0 a1 a1 text snippets for each sense, when acquiring data for nouns and adjectives from Senseval-3 lexical sample dataset. The length of the snippets was set to a4</Paragraph>
    <Paragraph position="9"> characters.</Paragraph>
    <Paragraph position="10"> From here we prepared two sets of sense examples differently. For the approach of dictionary-based translation, we segmented all text snippets, using the application ICTCLAS  . After the segmentor marked all word boundaries, the system automatically translated the text snippets word by word using the electronic LDC Mandarin-English Translation Lexicon 3.0. All possible translations of each word were included. As expected, the lexicon does not cover all Chinese words. We simply discarded those Chinese words that do not have an entry in this lexicon. We also discarded those Chinese words with multiword English translations. Finally we got a set of sense examples for each sense. Note that a sense example produced here is simply a bag of words without ordering.</Paragraph>
    <Paragraph position="11"> We prepared the other set of sense examples by translating text snippets with the MT software Systran a5 a6  Standard, where each example contains much richer features that potentially can be exploited by ML algorithms.</Paragraph>
  </Section>
  <Section position="4" start_page="46" end_page="47" type="metho">
    <SectionTitle>
3 Experimental Settings
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="46" end_page="47" type="sub_section">
      <SectionTitle>
3.1 Training
</SectionTitle>
      <Paragraph position="0"> We applied the Vector Space Model (VSM) algorithm on the two different kinds of sense examples (i.e., dictionary translated ones vs. MT software translated ones), as it has been shown to perform well with the features described below (Agirre and Martinez, 2004a). In VSM, we represent each context as a vector, where each feature has an 1 or 0 value to indicate its occurrence or absence.</Paragraph>
      <Paragraph position="1"> For each sense in training, a centroid vector is obtained, and these centroids are compared to the vectors that represent test examples, by means of the cosine similarity function. The closest centroid assigns its sense to the test example.</Paragraph>
      <Paragraph position="2"> For the sense examples translated by MT software, we analysed the sentences using different  tools and extracted relevant features. We applied stemming and POS tagging, using the fnTBL toolkit (Ngai and Florian, 2001), as well as shallow parsing  . Then we extracted the following types of topical and domain features  ,whichwere then fed to the VSM machine learner:  Topical features: we extracted lemmas of the content words in two windows around the target word: the whole context and a</Paragraph>
      <Paragraph position="4"> window. We also obtained salient bigrams in the context, with the methods and the software described in (Pedersen, 2001). We included another feature type, which match the closest words (for each POS and in both directions) to the target word (e.g. LEFT NOUN &amp;quot;dog&amp;quot; or LEFT VERB &amp;quot;eat&amp;quot;).  Domain features: The &amp;quot;WordNet Domains&amp;quot; resource was used to identify the most relevant domains in the context. Following the relevance formula presented in (Magnini and Cavagli'a, 2000), we defined two feature types: (1) the most relevant domain, and (2) a list of domains above a threshold  .</Paragraph>
      <Paragraph position="5"> For the dictionary-translated sense examples, we simply used bags of words as features.</Paragraph>
    </Section>
    <Section position="2" start_page="47" end_page="47" type="sub_section">
      <SectionTitle>
3.2 Evaluation
</SectionTitle>
      <Paragraph position="0"> We evaluated our WSD classifier on both coarse-grained and fine-grained datasets. For coarse-grained WSD evaluation, we used TWA dataset (Mihalcea, 2003), which is a binarily sense-tagged corpus drawn from the British National Corpus (BNC), for 6 nouns. For fine-grained evaluation, we used Senseval-3 English lexical sample dataset (Mihalcea et al., 2004), which comprises 7,860 sense-tagged instances for training and 3,944 for testing, on 57 words (nouns, verbs and adjectives). The examples were mainly drawn from BNC. WordNet a2 a6 a3 a6 a2  was used as sense inventory for nouns and adjectives, and Wordsmyth  for verbs. We only evaluated our WSD systems on nouns and adjectives.</Paragraph>
      <Paragraph position="1">  This software was kindly provided by David Yarowsky's group at Johns Hopkins University.</Paragraph>
      <Paragraph position="2">  Preliminary experiments using local features (bigrams and trigrams) showed low performance, which was expected because of noise in the automatically acquired data.  This software was kindly provided by Gerard Escudero's group at Universitat Politecnica de Catalunya. The threshold was set in previous work.</Paragraph>
      <Paragraph position="3">  http://www.wordsmyth.net We also used the SemCor corpus (Miller et al., 1993) for tuning our relative-threshold heuristic. It contains a number of texts, mainly from the Brown Corpus, comprising about 200,000 words, where all content words have been manually tagged with senses from WordNet.</Paragraph>
      <Paragraph position="4"> Throughout the paper we will use the concepts of precision and recall to measure the performance of WSD systems, where precision refers to the ratio of correct answers to the total number of answers given by the system, and recall indicates the ratio of correct answers to the total number of instances. Our ML systems attempt every instance and always give a unique answer, and hence precision equals to recall. When comparing with other systems that participated in Senseval-3 in Table 7, both recall and precision are shown. When POS and overall averages are given, they are calculated by micro-averaging the number of examples per word.</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="47" end_page="48" type="metho">
    <SectionTitle>
4 Experiments on TWA dataset
</SectionTitle>
    <Paragraph position="0"> First we trained a VSM classifier on the sense examples translated with the Systran MT software (we use notion &amp;quot;MT-based approach&amp;quot; to refer to this process), and then tested it on the TWA test dataset. We tried two combinations of features: one only used topical features and the other used the whole feature set (i.e., topical and domain features). Table 1 summarises the sizes of the training/test data, the Most Frequent Sense (MFS) baseline and performances when applying the two different feature combinations. We can see that best results were obtained when using all the features. It also shows that both our systems achieved a significant improvement over the MFS baseline. Therefore, in the subsequent WSD experiments following the MT-based approach, we decided to use the entire feature set.</Paragraph>
    <Paragraph position="1"> To compare the machine-translated sense examples with the ones translated word-by-word, we then trained the same VSM classifier on the examples translated with a bilingual dictionary (we use notion &amp;quot;dictionary-based approach&amp;quot; to refer to this process) and evaluated it on the same test dataset. Table 2 shows results of the dictionary-based approach and the MT-based approach. For comparison, we include results from another system (Mihalcea, 2003), which uses monosemous relatives to automatically acquire sense examples.</Paragraph>
    <Paragraph position="2"> The right-most column shows results of a 10-fold  translated sense examples, with different sets of features. The MFS baseline(%) and the number of training and test examples are also shown.</Paragraph>
    <Paragraph position="3">  tems and a supervised cross-validation on test data. cross-validation on the TWA data, which indicates the score that a supervised system would attain, taking additional advantage that the examples for training and test are drawn from the same corpus.</Paragraph>
    <Paragraph position="4"> We can see that our MT-based approach has achieved significantly better recall than the other two automatic methods. Besides, the results of our unsupervised system are approaching the performance achieved with hand-tagged data. It is worth mentioning that Mihalcea (2003) applied a similar supervised cross-validation method on this dataset that scored 83.35%, very close to our unsupervised system  . Thus, we can conclude that the MT-based system is able to reach the best performance reported on this dataset for an unsupervised system.</Paragraph>
  </Section>
class="xml-element"></Paper>