<?xml version="1.0" standalone="yes"?>
<Paper uid="W04-0849">
  <Title>Class-based Collocations for Word-Sense Disambiguation</Title>
  <Section position="3" start_page="2" end_page="2" type="metho">
    <SectionTitle>
3 Class-based Collocations
</SectionTitle>
    <Paragraph position="0"> The HyperColl features are intended to capture a portion of the information in the WordNet hypernyms links (i.e., is-a relations). Hypernym-based collocations are formulated by replacing each word in the context of the target word (e.g., in the same sentence as the target word) with its complete hypernym ancestry from WordNet.</Paragraph>
    <Paragraph position="1"> Since context words are not sense-tagged, each synset representing a different sense of a context word is included in the set of hypernyms replacing that word. Likewise, in the case of multiple inheritance, each parent synset is included.</Paragraph>
    <Paragraph position="2"> The collocation variable HyperColl s for each sense s is binary, corresponding to the absence or presence of any hypernym in the set chosen for s. This set of hypernyms is chosen using the ratio of conditional probability to prior probability as described for the WordColl s feature above. In contrast, HyperColl [?],i selects nonsense-specific hypernym collocations: 10 separate binary features are used based on the G  selection criteria. (More of these features could be used, but they are limited for tractability.) For more details on hypernym collocations, see (O'Hara, forthcoming).</Paragraph>
    <Paragraph position="3"> Word-similarity classes (Lin, 1998) derived from clustering are also used to expand the pool of potential collocations; this type of semantic relatedness among words is expressed in the SimilarColl feature. For the DictColl features, definition analysis (O'Hara, forthcoming) is used to determine the semantic relatedness of the defining words. Differences between these two sources of word relations are illustrated by looking at the information they provide for 'ballerina': null word-clusters:</Paragraph>
    <Paragraph position="5"> This shows that word clusters capture a wider range of relatedness than the dictionary definitions at the expense of incidental associations (e.g., 'nicole'). Again, because context words are not disambiguated, the relations for all senses of a context word are conflated. For details on the extraction of word clusters, see (Lin, 1998); and, for details on the definition analysis, see (O'Hara, forthcoming).</Paragraph>
    <Paragraph position="6"> When formulating the features SimilarColl and DictColl, the words related to each context word are considered as potential collocations (Wiebe et al., 1998). Co-occurrence fre- null 99.72% of the answers were attempted. All features from Figure 1 were used.</Paragraph>
    <Paragraph position="7"> quencies f(s,w) are used in estimating the conditional probability P(s|w) required by the relative conditional probability selection scheme noted earlier. However, instead of using a unit weight for each co-occurrence, the relatedness weight is used (e.g., 0.056 for 'pianist'); and, because a given related-word might occur with more than one context word for the same targetword sense, the relatedness weights are added. The conditional probability of the sense given the relatedness collocation is estimated by dividing the weighted frequency by the sum of all such weighted co-occurrence frequencies for the word:</Paragraph>
    <Paragraph position="9"> Here wf(s, w) stands for the weighted co-occurrence frequency of the related-word collocation w and target sense s.</Paragraph>
    <Paragraph position="10"> The relatedness collocations are less reliable than word collocations given the level of indirection involved in their extraction. Therefore, tighter constraints are used in order to filter out extraneous potential collocations. In particular, the relative percent gain in the conditional versus prior probability must be 80% or higher, a threshold again determined via an optimization search over the Senseval-2 data. In addition, the context words that they are related to must occur more than four times in the training data.</Paragraph>
  </Section>
class="xml-element"></Paper>