<?xml version="1.0" standalone="yes"?>
<Paper uid="W04-0838">
  <Title>SenseLearner: Minimally Supervised Word Sense Disambiguation for All Words in Open Text</Title>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3 SenseLearner
</SectionTitle>
    <Paragraph position="0"> Our goal is to use as little annotated data as possible, and at the same time make the algorithm general enough to be able to disambiguate all content words in a text. We are therefore using (1) SemCor (Miller et al., 1993) - a balanced, semantically annotated dataset, with all content words manually tagged by trained lexicographers - to learn a se-</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
Association for Computational Linguistics
</SectionTitle>
      <Paragraph position="0"> for the Semantic Analysis of Text, Barcelona, Spain, July 2004 SENSEVAL-3: Third International Workshop on the Evaluation of Systems mantic language model for the words seen in the training corpus; and (2) information drawn from WordNet (Miller, 1995), to derive semantic generalizations for those words that did not appear in the annotated corpus.</Paragraph>
      <Paragraph position="1"> The input to the disambiguation algorithm consists of raw text. The output is a text with word meaning annotations for all open-class words.</Paragraph>
      <Paragraph position="2"> The algorithm starts with a preprocessing stage, where the text is tokenized and annotated with parts of speech; collocations are identified using a sliding window approach, where a collocation is considered to be a sequence of words that forms a compound concept defined in WordNet; named entities are also identified at this stage.</Paragraph>
      <Paragraph position="3"> Next, the following two main steps are applied sequentially: 1. Semantic Language Model. In the first step, a semantic language model is learned for each part of speech, starting with the annotated corpus. These models are then used to annotate words in the test corpus with their corresponding meaning. This step is applicable only to those words that appeared at least once in the training corpus.</Paragraph>
  </Section>
  <Section position="5" start_page="0" end_page="0" type="metho">
    <SectionTitle>
2. Semantic Generalizations using Syntactic Dependencies and a Conceptual Network
</SectionTitle>
    <Paragraph position="1"> This method is applied to those words not covered by the semantic language model.</Paragraph>
    <Paragraph position="2"> Through the semantic generalizations it makes, this second step is able to annotate words that never appeared in the training corpus. null</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.1 Semantic Language Model
</SectionTitle>
      <Paragraph position="0"> The role of this first module is to learn a global model for each part of speech, which can be used to disambiguate content words in any input text.</Paragraph>
      <Paragraph position="1"> Although significantly more general than models that are built individually for each word in a test corpus as in e.g. (Hoste et al., 2002) - the models can only handle words that were previously seen in the training corpus, and therefore their coverage is not 100%.</Paragraph>
      <Paragraph position="2"> Starting with an annotated corpus formed by all annotated files in SemCor, a separate training data set is built for each part of speech. The following features are used to build the training models.</Paragraph>
      <Paragraph position="3"> Nouns a0 The first noun, verb, or adjective before the target noun, within a window of at most five words to the left, and its part of speech.</Paragraph>
      <Paragraph position="4"> Verbs a0 The first word before and the first word after the target verb, and its part of speech.</Paragraph>
      <Paragraph position="5"> Adj a0 One relying on the first noun after the target adjective, within a window of at most five words.</Paragraph>
      <Paragraph position="6"> a0 A second model relying on the first word before and the first word after the target adjective, and its part of speech.</Paragraph>
      <Paragraph position="7"> The two models for adjectives are applied individually, and then combined through voting. null For each open-class word in the training corpus (i.e. SemCor), a feature vector is built and added to the corresponding training set. The label of each such feature vector consists of the target word and the corresponding sense, represented as word#sense. Using this procedure, a total of 170,146 feature vectors are constructed: 86,973 vectors in the noun model, 47,838 in the verb model, and 35,335 vectors in each of the two adjective models.</Paragraph>
      <Paragraph position="8"> To annotate new text, similar vectors are created for all content-words in the raw text. The vectors are stored in different files based on their syntactic class, and a separate learning process is run for each part-of-speech. For learning, we are using the Timbl memory based learning algorithm (Daelemans et al., 2001), which was previously found useful for the task of word sense disambiguation (Mihalcea, 2002).</Paragraph>
      <Paragraph position="9"> Following the learning stage, each vector in the test data set - and thus each content word - is labeled with a predicted word and sense. If the word predicted by the learning algorithm coincides with the target word in the test feature vector, then the predicted sense is used to annotate the test instance. Otherwise, if the predicted word is different than the target word, no annotation is produced, and the word is left for annotation in a later stage.</Paragraph>
      <Paragraph position="10"> During the evaluations on the SENSEVAL-3 English all-words data set, 1,782 words were tagged using the semantic language model, resulting in an average coverage of 85.6%.</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.2 Semantic Generalizations using Syntactic
Dependencies and a Conceptual Network
</SectionTitle>
      <Paragraph position="0"> Similar to (Lin, 1997), we consider the syntactic dependency of words, but we also consider the conceptual hierarchy of a word obtained through the WordNet semantic network - as a means for generalization, capable to handle unseen words. Thus, this module can disambiguate multiple words using the same knowledge source.</Paragraph>
      <Paragraph position="1"> Moreover, the algorithm is able to disambiguate a word even if it does not appear in the training corpus. For instance, if we have a verb-object dependency pair, &amp;quot;drink water&amp;quot; in the training corpus, using the conceptual hierarchy, we will be able to successfully disambiguate the verb-object pair &amp;quot;take tea&amp;quot;, even if this particular pair did not appear in the training corpus. This is done via the generalization learned from the semantic network - &amp;quot;drink water&amp;quot; allows us to infer a more general relation &amp;quot;take-in liquid&amp;quot;, which in turn will help disambiguate the pair &amp;quot;take tea&amp;quot;, as a specialization for &amp;quot;take-in liquid&amp;quot;.</Paragraph>
      <Paragraph position="2"> The semantic generalization algorithm is divided into two phases: training phase and test phase.</Paragraph>
      <Paragraph position="3"> Training Phase As mentioned above, we use the annotated data provided in SemCor for training purposes. In order to combine the syntactic dependency of words and the conceptual hierarchies through WordNet hypernymy relations, the following steps are performed:  1. Remove the SGML tags from SemCor, and produce a raw file with one sentence per line.</Paragraph>
      <Paragraph position="4"> 2. Parse the sentence using the Link parser (Sleator and Temperley, 1993), and save all the dependency-pairs.</Paragraph>
      <Paragraph position="5"> 3. Add part-of-speech and sense information (as provided by SemCor) to each open word in the dependency-pairs.</Paragraph>
      <Paragraph position="6"> 4. For each noun or verb in a dependency-pair,  obtain the WordNet hypernym tree of the word. We build a vector consisting of the words themselves, their part-of-speech, their WordNet sense, and a reference to all the hypernym synsets in WordNet. The reason for attaching hypernym information to each dependency pair is to allow for semantic generalizations during the learning phase.</Paragraph>
      <Paragraph position="7"> 5. For each dependency-pair, we generate positive feature vectors for the senses that appear in the training set, and negative feature vectors for all the remaining possible senses.</Paragraph>
      <Paragraph position="8"> Test Phase After training, we can use the generalized feature vector to assign the appropriate sense to new words in a test data set. In the test phase, we complete the following steps:  1. Parse each sentences of the test file using the Link parser, and save all the dependencypairs. null 2. Start from the leftmost open word in the sentence and retrieve all the other open words it connects to.</Paragraph>
      <Paragraph position="9"> 3. For each such dependency-pair, create fea null ture vectors for all possible combinations of senses. For example, if the first open word in the pair has two possible senses and the second one has three possible senses, this results in a total of six possible feature vectors.</Paragraph>
      <Paragraph position="10"> 4. Finally, we pass all these feature vectors to a memory based learner, Timbl (Daelemans et al., 2001), which will attempt to label each feature vector with a positive or negative label, based on information learned from the training data.</Paragraph>
      <Paragraph position="11"> An Example Consider the following sentence from SemCor: The Fulton County Grand Jury said Friday an investigation of Atlanta's recent primary election produced &amp;quot;no evidence&amp;quot; that any irregularities took place. As mentioned before, the first step consists of parsing the sentence and collecting all possible dependency-pairs among words, such as subject-verb, verb-object, etc. For simplicity, let us focus on the verb-object relation between produce and evidence. We extract the proper senses of the two words from Sem-Cor. Thus, at this point, combining the syntactic knowledge from the parser, and the semantic knowledge extracted from SemCor, we know that there is a object-verb link/relation between produced#v#4 and evidence#n#1.</Paragraph>
      <Paragraph position="12"> We now look up the hypernym tree for each of the words involved in the current dependency-pair, and create a feature vector as follows: Os, produce#v#4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, produce#v#4, expose#v#3, show#v#4, evidence#n#1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, evidence#n#1, information#n#3, cognition#n#1, psychological feature#n#1 where Os indicates an object-verb relation, and the null elements are used to pad the feature vector for a constant size of 20 elements per word.</Paragraph>
      <Paragraph position="13"> Assuming the following sentence in the test data: &amp;quot;expose meaningful information.&amp;quot;, we identify an object-verb relation between expose and information. Although none of the words in the pair &amp;quot;expose information&amp;quot; appear in the training corpus, by looking up the IS-A hierarchy from Word-Net, we will be able to successfully disambiguate this pair, as both &amp;quot;expose&amp;quot; and &amp;quot;information&amp;quot; appear in the feature vector (see the vector above).</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>