<?xml version="1.0" standalone="yes"?>
<Paper uid="W02-0813">
<Title>Combining Contextual Features for Word Sense Disambiguation</Title>
<Section position="4" start_page="0" end_page="0" type="metho">
<SectionTitle>3 System Description</SectionTitle>
<Paragraph position="0"> We developed an automatic WSD system that uses a maximum entropy framework to combine linguistic contextual features from corpus instances of each verb to be tagged. Under the maximum entropy framework (Berger et al., 1996), evidence from different features can be combined without assuming feature independence. The automatic tagger estimates the conditional probability that a word has sense x given that it occurs in context y, where y is a conjunction of features. The estimated probability is derived from feature weights that are determined automatically from training data so as to produce a probability distribution that has maximum entropy, under the constraint that it is consistent with the observed evidence.</Paragraph>
<Paragraph position="1"> In order to extract the linguistic features necessary for the model, all sentences were first automatically part-of-speech-tagged using a maximum entropy tagger (Ratnaparkhi, 1998) and parsed using the Collins parser (Collins, 1997). In addition, an automatic named entity tagger (Bikel et al., 1997) was run on the sentences to map proper nouns to a small set of semantic classes. Following work by Chodorow, Leacock, and Miller, we divided the possible model features into topical and local contextual features. Topical features looked for the presence of keywords occurring anywhere in the sentence and in any surrounding sentences provided as context (usually one or two sentences). The set of 200-300 keywords is specific to each lemma to be disambiguated, and is determined automatically from training data so as to minimize the entropy of the probability of the senses conditioned on the keyword.
The local features for a verb w in a particular sentence tend to look only within the smallest clause containing w. They include collocational features requiring no linguistic preprocessing beyond part-of-speech tagging (1), syntactic features that capture relations between the verb and its complements (2-4), and semantic features that incorporate information about noun classes for objects (5-6):
1. the word w, the part of speech of w, and the words at positions -2, -1, +1, +2 relative to w
2. whether or not the sentence is passive
3. whether there is a subject, direct object, indirect object, or clausal complement (a complement whose node label is S in the parse tree)
4. the words (if any) in the positions of subject, direct object, indirect object, particle, and prepositional complement (and its object)
5. a Named Entity tag (PERSON, ORGANIZATION, LOCATION) for proper nouns appearing in (4)
6. WordNet synsets and hypernyms for the nouns appearing in (4) [1]
This set of local features relies on access to syntactic structure as well as semantic class information, and represents our move towards using richer syntactic and semantic knowledge sources to model human performance.</Paragraph>
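To make the feature set and the classifier concrete, here is a minimal Python sketch of the pipeline described above. It is illustrative only: the instance field names (verb, window, subj, sem_classes, etc.), the (sense, feature) weights mapping, and the function names are hypothetical stand-ins for the output of the taggers and parser named above, not the paper's actual implementation, and the weight estimation procedure itself (e.g., iterative scaling) is omitted. maxent_probs implements the standard conditional log-linear form p(x|y) = exp(sum_i lambda_{x,i} f_i(y)) / Z(y), and keyword_entropy shows the conditional-entropy criterion used to select the 200-300 topical keywords per lemma.

```python
import math
from collections import defaultdict

def extract_features(instance):
    """Build the conjunction of contextual features for one verb instance.

    `instance` is a hypothetical dict assumed to be pre-populated by the
    POS tagger, parser, and named entity tagger; these field names do not
    come from the paper itself.
    """
    w = instance["verb"]
    feats = [f"word={w}", f"pos={instance['pos']}"]
    # (1) collocational features: words at positions -2..+2 relative to w
    for offset in (-2, -1, +1, +2):
        feats.append(f"w{offset:+d}={instance['window'].get(offset, '<pad>')}")
    # (2) voice of the sentence
    if instance.get("passive"):
        feats.append("passive")
    # (3)-(4) presence and fillers of syntactic complement positions
    for role in ("subj", "dobj", "iobj", "clausal_comp", "particle", "pcomp"):
        filler = instance.get(role)
        if filler is not None:
            feats.append(f"has_{role}")
            feats.append(f"{role}={filler}")
    # (5)-(6) semantic classes per filler: NE tags, WordNet synsets/hypernyms
    for role, classes in instance.get("sem_classes", {}).items():
        for c in classes:
            feats.append(f"{role}_class={c}")
    # topical features: lemma-specific keywords found in the wider context
    for kw in instance.get("keywords", []):
        feats.append(f"kw={kw}")
    return feats

def maxent_probs(feats, senses, weights):
    """Conditional log-linear model: p(x|y) = exp(sum_f weights[x,f]) / Z(y)."""
    scores = {x: sum(weights.get((x, f), 0.0) for f in feats) for x in senses}
    z = sum(math.exp(s) for s in scores.values())
    return {x: math.exp(s) / z for x, s in scores.items()}

def keyword_entropy(keyword, training):
    """Entropy of the sense distribution conditioned on seeing `keyword`;
    the lowest-entropy keywords would be kept for each lemma."""
    counts = defaultdict(int)
    for inst in training:
        if keyword in inst["context_words"]:
            counts[inst["sense"]] += 1
    total = sum(counts.values())
    if total == 0:
        return float("inf")
    return -sum(c / total * math.log2(c / total) for c in counts.values())
```

At tagging time, the predicted sense for an instance would simply be the argmax of maxent_probs(extract_features(instance), senses, weights).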
<Paragraph position="2"> [1] Nouns were not disambiguated in any way, and all possible synsets and hypernyms for the noun were included. No separate disambiguation of noun complements was done because, given enough data, the maximum entropy model should assign high weights to the correct semantic classes of the correct noun sense if they represent defining selectional restrictions.</Paragraph>
</Section>
</Paper>