<?xml version="1.0" standalone="yes"?>
<Paper uid="P04-1036">
  <Title>Finding Predominant Word Senses in Untagged Text</Title>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3 Experiment with SemCor
</SectionTitle>
    <Paragraph position="0"> In order to evaluate our method we use the data in SemCor as a gold-standard. This is not ideal since we expect that the sense frequency distributions within SemCor will differ from those in the BNC, from which we obtain our thesaurus. Nevertheless, since many systems performed well on the English all-words task for SENSEVAL-2 by using the frequency information in SemCor this is a reasonable approach for evaluation.</Paragraph>
    <Paragraph position="1"> We generated a thesaurus entry for all polysemous nouns which occurred in SemCor with a frequency a96 2, and in the BNC with a frequency a137 10 in the grammatical relations listed in section 2.1 above. The jcn measure uses corpus data for the calculation of IC. We experimented with counts obtained from the BNC and the Brown corpus. The variation in counts had negligible affect on the results. 3 The experimental results reported here are obtained using IC counts from the BNC corpus. All the results shown here are those with the size of thesaurus entries (a3 ) set to 50. 4 We calculate the accuracy of finding the predominant sense, when there is indeed one sense with a higher frequency than the others for this word in SemCor (a54 a62a18a138a32a130a60a130 ). We also calculate the WSD accuracy that would be obtained on SemCor, when using our first sense in all contexts (a139a140a62</Paragraph>
    <Paragraph position="3"/>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.1 Results
</SectionTitle>
      <Paragraph position="0"> The results in table 1 show the accuracy of the ranking with respect to SemCor over the entire set of 2595 polysemous nouns in SemCor with 3Using the default IC counts provided with the package did result in significantly higher results, but these default files are obtained from the sense-tagged data within SemCor itself so we discounted these results.</Paragraph>
      <Paragraph position="1"> 4We repeated the experiment with the BNC data for jcn using a141a35a142a51a143a86a144a69a145a92a146a16a144a69a145a31a147a32a144 and a148a16a144 however, the number of neighbours used gave only minimal changes to the results so we do not report them here.</Paragraph>
      <Paragraph position="2">  the jcn and lesk WordNet similarity measures.</Paragraph>
      <Paragraph position="3"> The random baseline for choosing the predominant sense over all these words (a75  a152 ) is 24%. Again, the automatic ranking outperforms this by a large margin. The first sense in SemCor provides an upper-bound for this task of 67%.</Paragraph>
      <Paragraph position="4"> Since both measures gave comparable results we restricted our remaining experiments to jcn because this gave good results for finding the predominant sense, and is much more efficient than lesk, given the precompilation of the IC files.</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.2 Discussion
</SectionTitle>
      <Paragraph position="0"> From manual analysis, there are cases where the acquired first sense disagrees with SemCor, yet is intuitively plausible. This is to be expected regardless of any inherent shortcomings of the ranking technique since the senses within SemCor will differ compared to those of the BNC. For example, in WordNet the first listed sense of pipe is tobacco pipe, and this is ranked joint first according to the Brown files in SemCor with the second sense tube made of metal or plastic used to carry water, oil or gas etc.... The automatic ranking from the BNC data lists the latter tube sense first. This seems quite reasonable given the nearest neighbours: tube, cable, wire, tank, hole, cylinder, fitting, tap, cistern, plate.... Since SemCor is derived from the Brown corpus, which predates the BNC by up to 30 years 5 and contains a higher proportion of fiction 6, the high ranking for the tobacco pipe sense according to SemCor seems plausible. null Another example where the ranking is intuitive, is soil. The first ranked sense according to SemCor is the filth, stain: state of being unclean sense whereas the automatic ranking lists dirt, ground, earth as the first sense, which is the second ranked 5The text in the Brown corpus was produced in 1961, whereas the bulk of the written portion of the BNC contains texts produced between 1975 and 1993.</Paragraph>
      <Paragraph position="1"> 66 out of the 15 Brown genres are fiction, including one specifically dedicated to detective fiction, whilst only 20% of the BNC text represents imaginative writing, the remaining 80% being classified as informative.</Paragraph>
      <Paragraph position="2"> sense according to SemCor. This seems intuitive given our expected relative usage of these senses in modern British English.</Paragraph>
      <Paragraph position="3"> Even given the difference in text type between SemCor and the BNC the results are encouraging, especially given that our a139a97a62 a129 a76 a130 results are for polysemous nouns. In the English all-words SEN-SEVAL-2, 25% of the noun data was monosemous.</Paragraph>
      <Paragraph position="4"> Thus, if we used the sense ranking as a heuristic for an &amp;quot;all nouns&amp;quot; task we would expect to get precision in the region of 60%. We test this below on the</Paragraph>
    </Section>
    <Section position="3" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4 SENSEVAL-2 English all Words Data
</SectionTitle>
      <Paragraph position="0"> In order to see how well the automatically acquired predominant sense performs on a WSD task from which the WordNet sense ordering has not been taken, we use the SENSEVAL-2 all-words data (Palmer et al., 2001). 7 This is a hand-tagged test suite of 5,000 words of running text from three articles from the Penn Treebank II. We use an all-words task because the predominant senses will reflect the sense distributions of all nouns within the documents, rather than a lexical sample task, where the target words are manually determined and the results will depend on the skew of the words in the sample. We do not assume that the predominant sense is a method of WSD in itself. To disambiguate senses a system should take context into account.</Paragraph>
      <Paragraph position="1"> However, it is important to know the performance of this heuristic for any systems that use it.</Paragraph>
      <Paragraph position="2"> We generated a thesaurus entry for all polysemous nouns in WordNet as described in section 2.1 above. We obtained the predominant sense for each of these words and used these to label the instances in the noun data within the SENSEVAL-2 English all-words task. We give the results for this WSD task in table 2. We compare results using the first sense listed in SemCor, and the first sense according to the SENSEVAL-2 English all-words test data itself.</Paragraph>
      <Paragraph position="3"> For the latter, we only take a first-sense where there is more than one occurrence of the noun in the test data and one sense has occurred more times than any of the others. We trivially labelled all monosemous items.</Paragraph>
      <Paragraph position="4"> Our automatically acquired predominant sense performs nearly as well as the first sense provided by SemCor, which is very encouraging given that 7In order to do this we use the mapping provided at http://www.lsi.upc.es/~nlp/tools/mapping.html (Daud'e et al., 2000) for obtaining the SENSEVAL-2 data in WordNet 1.6. We discounted the few items for which there was no mapping. This amounted to only 3% of the data.</Paragraph>
      <Paragraph position="5">  our method only uses raw text, with no manual labelling. The performance of the predominant sense provided in the SENSEVAL-2 test data provides an upper bound for this task. The items that were not covered by our method were those with insufficient grammatical relations for the tuples employed. Two such words, today and one, each occurred 5 times in the test data. Extending the grammatical relations used for building the thesaurus should improve the coverage. There were a similar number of words that were not covered by a predominant sense in SemCor. For these one would need to obtain more sense-tagged text in order to use this heuristic. Our automatic ranking gave 67% precision on these items. This demonstrates that our method of providing a first sense from raw text will help when sense-tagged data is not available.</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="0" end_page="0" type="metho">
    <SectionTitle>
5 Experiments with Domain Specific Corpora
</SectionTitle>
    <Paragraph position="0"> A major motivation for our work is to try to capture changes in ranking of senses for documents from different domains. In order to test this we applied our method to two specific sections of the Reuters corpus. We demonstrate that choosing texts from a particular domain has a significant influence on the sense ranking. We chose the domains of SPORTS and FINANCE since there is sufficient material for these domains in this publicly available corpus.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
5.1 Reuters Corpus
</SectionTitle>
      <Paragraph position="0"> The Reuters corpus (Rose et al., 2002) is a collection of about 810,000 Reuters, English Language News stories. Many of the articles are economy related, but several other topics are included too. We selected documents from the SPORTS domain (topic code: GSPO) and a limited number of documents from the FINANCE domain (topic codes: ECAT and MCAT).</Paragraph>
      <Paragraph position="1"> The SPORTS corpus consists of 35317 documents (about 9.1 million words). The FINANCE corpus consists of 117734 documents (about 32.5 million words). We acquired thesauruses for these corpora using the procedure described in section 2.1.</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
5.2 Two Experiments
</SectionTitle>
      <Paragraph position="0"> There is no existing sense-tagged data for these domains that we could use for evaluation. We therefore decided to select a limited number of words and to evaluate these words qualitatively. The words included in this experiment are not a random sample, since we anticipated different predominant senses in the SPORTS and FINANCE domains for these words.</Paragraph>
      <Paragraph position="1"> Additionally, we evaluated our method quantitatively using the Subject Field Codes (SFC) resource (Magnini and Cavagli`a, 2000) which annotates WordNet synsets with domain labels. The SFC contains an economy label and a sports label. For this domain label experiment we selected all the words in WordNet that have at least one synset labelled economy and at least one synset labelled sports. The resulting set consisted of 38 words. We contrast the distribution of domain labels for these words in the 2 domain specific corpora.</Paragraph>
    </Section>
    <Section position="3" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
5.3 Discussion
</SectionTitle>
      <Paragraph position="0"> The results for 10 of the words from the qualitative experiment are summarized in table 3 with the WordNet sense number for each word supplied alongside synonyms or hypernyms from WordNet for readability. The results are promising. Most words show the change in predominant sense (PS) that we anticipated. It is not always intuitively clear which of the senses to expect as predominant sense for either a particular domain or for the BNC, but the first senses of words like division and goal shift towards the more specific senses (league and score respectively). Moreover, the chosen senses of the word tie proved to be a textbook example of the behaviour we expected.</Paragraph>
      <Paragraph position="1"> The word share is among the words whose predominant sense remained the same for all three corpora. We anticipated that the stock certificate sense would be chosen for the FINANCE domain, but this did not happen. However, that particular sense ended up higher in the ranking for the FINANCE domain. null Figure 2 displays the results of the second experiment with the domain specific corpora. This figure shows the domain labels assigned to the predominant senses for the set of 38 words after ranking the words using the SPORTS and the FINANCE corpora.</Paragraph>
      <Paragraph position="2"> We see that both domains have a similarly high percentage of factotum (domain independent) labels, but as we would expect, the other peaks correspond to the economy label for the FINANCE corpus, and the sports label for the SPORTS corpus.</Paragraph>
      <Paragraph position="3"> Word PS BNC PS FINANCE PS SPORTS pass 1 (accomplishment) 14 (attempt) 15 (throw) share 2 (portion, asset) 2 2 division 4 (admin. unit) 4 6 (league) head 1 (body part) 4 (leader) 4 loss 2 (transf. property) 2 8 (death, departure) competition 2 (contest, social event) 3 (rivalry) 2 match 2 (contest) 7 (equal, person) 2 tie 1 (neckwear) 2 (affiliation) 3 (draw) strike 1 (work stoppage) 1 6 (hit, success) goal 1 (end, mental object) 1 2 (score)  inant senses for 38 polysemous words ranked using the SPORTS and FINANCE corpus.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>