<?xml version="1.0" standalone="yes"?>
<Paper uid="N06-2036">
  <Title>Word Domain Disambiguation via Word Sense Disambiguation</Title>
  <Section position="3" start_page="141" end_page="142" type="metho">
    <SectionTitle>
2 WDD via WSD
</SectionTitle>
    <Paragraph position="0"> Our approach relies on the use of WordNet Domains (Bagnini and Cavaglia 2000) and can be outlined in the following two steps:  1. use a WordNet-based WSD algorithm to assign a sense to each word in the input text, e.g. doctor barb2right doctor#n#1 2. use WordNet Domains to map disam- null biguated words into the subject domain associated with the word, e.g. doctor#n#1barb2rightdoctor#n#1#MEDICINE. null</Paragraph>
    <Section position="1" start_page="141" end_page="141" type="sub_section">
      <SectionTitle>
2.1 WordNet Domains
</SectionTitle>
      <Paragraph position="0"> WordNet Domains is an extension of WordNet (http://wordnet.princeton.edu/) where synonym sets have been annotated with one or more sub-ject domain labels, as shown in Figure 1. Subject domains provide an interesting and useful classification which cuts across part of speech and WordNet sub-hierarchies. For example, doctor#n#1 and operate#n#1 both have sub-ject domain MEDICINE, and SPORT includes both athlete#n#1 with top hypernym lifeform#n#1 and sport#n#1 with top hypernym act#n#2.</Paragraph>
    </Section>
    <Section position="2" start_page="141" end_page="142" type="sub_section">
      <SectionTitle>
2.2 Word Sense Disambiguation
</SectionTitle>
      <Paragraph position="0"> To assign a sense to each word in the input text, we used the WSD algorithm presented in Sanfilippo et al. (2006). This WSD algorithm is based on a supervised classification approach that uses SemCor1 as training corpus. The algorithm employs the OpenNLP MaxEnt implementation of the maximum entropy classification algorithm (Berger et al. 1996) to develop word sense recognition signatures for each lemma which predicts the most likely sense for the lemma according to the context in which the lemma occurs.</Paragraph>
      <Paragraph position="1"> Following Dang &amp; Palmer (2005) and Kohomban &amp; Lee (2005), Sanfilippo et al. (2006) use contextual, syntactic and semantic information to inform our verb class disambiguation system.</Paragraph>
      <Paragraph position="2"> * Contextual information includes the verb under analysis plus three tokens found on each side of the verb, within sentence boundaries. Tokens included word as well as punctuation.</Paragraph>
      <Paragraph position="3"> * Syntactic information includes grammatical dependencies (e.g. subject, object) and morpho-syntactic features such as part of speech, case, number and tense.</Paragraph>
      <Paragraph position="4"> * Semantic information includes named entity types (e.g. person, location, organization) and hypernyms.</Paragraph>
      <Paragraph position="5"> We chose this WSD algorithm as it provides some of the best published results to date, as the comparison with top performing WSD systems in Senseval3 presented in Table 1 shows---see</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>