<?xml version="1.0" standalone="yes"?> <Paper uid="W04-0834"> <Title>Supervised Word Sense Disambiguation with Support Vector Machines and Multiple Knowledge Sources</Title> <Section position="3" start_page="0" end_page="0" type="metho"> <SectionTitle> 2 Support Vector Machines (SVM) </SectionTitle> <Paragraph position="0"> The SVM (Vapnik, 1995) performs optimization to find a hyperplane with the largest margin that separates training examples into two classes. A test example is classified according to the side of the hyperplane on which it lies. Input features can be mapped into a high-dimensional space before performing the optimization and classification. A kernel function can be used to reduce the computational cost of training and testing in the high-dimensional space. If the training examples are nonseparable, a regularization parameter C (C = 1 by default) can be used to control the trade-off between achieving a large margin and a low training error. We used the implementation of SVM in WEKA (Witten and Frank, 2000), where each nominal feature with n possible values is converted into n binary (0 or 1) features. If a nominal feature takes the i-th value, then the i-th binary feature is set to 1 and all the other binary features are set to 0. The default linear kernel is used. 
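As a minimal sketch (not the WEKA implementation itself, and using hypothetical tag values), the nominal-to-binary conversion described above can be written as:

```python
# Sketch of WEKA-style nominal-to-binary conversion: a nominal feature
# with n possible values becomes n binary (0/1) features.
# The tag inventory below is an illustrative assumption.

def one_hot(value, possible_values):
    """If the nominal feature takes the i-th value, the i-th binary
    feature is 1 and all other binary features are 0."""
    return [1 if v == value else 0 for v in possible_values]

pos_tags = ["DT", "NN", "NNS", "VBD"]  # hypothetical nominal values
print(one_hot("NNS", pos_tags))        # [0, 0, 1, 0]
```

Exactly one of the n binary features fires for any observed value, which is what lets a linear kernel assign a separate weight to each nominal value.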
Since SVM only handles binary (2-class) classification, we built one binary classifier for each sense class.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> SENSEVAL-3: Third International Workshop on the Evaluation of Systems for the Semantic Analysis of Text, Barcelona, Spain, July 2004. Association for Computational Linguistics </SectionTitle> <Paragraph position="0"> Note that our supervised learning approach made use of a single learning algorithm, without combining multiple learning algorithms as adopted in other research (e.g., Florian et al., 2002).</Paragraph> </Section> </Section> <Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> 3 Multiple Knowledge Sources </SectionTitle> <Paragraph position="0"> To disambiguate a word occurrence w, systems nusels and nusmlst used the first four knowledge sources listed below. System nusmlsts used the English sense given for the target ambiguous word w as an additional knowledge source. 
Previous research (Ng and Lee, 1996; Stevenson and Wilks, 2001; Florian et al., 2002; Lee and Ng, 2002) has shown that a combination of knowledge sources improves WSD accuracy.</Paragraph> <Paragraph position="1"> Our experiments on the provided training data of the SENSEVAL-3 translation and sense subtask also indicated that the additional knowledge source of the English sense of the target word further improved accuracy (see Section 4.3 for details).</Paragraph> <Paragraph position="2"> We did not attempt feature selection, since our previous research (Lee and Ng, 2002) indicated that SVM performs better without feature selection.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.1 Part-of-Speech (POS) of Neighboring Words </SectionTitle> <Paragraph position="0"> We use 7 features to encode this knowledge source: P-3, P-2, P-1, P0, P1, P2, P3, where P-i (Pi) is the POS of the i-th token to the left (right) of w, and P0 is the POS of w. A token can be a word or a punctuation symbol, and each of these neighboring tokens must be in the same sentence as w. We use a sentence segmentation program (Reynar and Ratnaparkhi, 1997) and a POS tagger (Ratnaparkhi, 1996) to segment the tokens surrounding w into sentences and assign POS tags to these tokens.</Paragraph> <Paragraph position="1"> For example, to disambiguate the word bars in the POS-tagged sentence &quot;Reid/NNP saw/VBD me/PRP looking/VBG at/IN the/DT iron/NN bars/NNS ./.&quot;, the POS feature vector is (IN, DT, NN, NNS, ., null, null), where null denotes the POS tag of a null token.</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.2 Single Words in the Surrounding Context </SectionTitle> <Paragraph position="0"> For this knowledge source, we consider all single words (unigrams) in the surrounding context of w, and these words can be in a different sentence from w. 
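The POS-window extraction of Section 3.1 can be sketched as follows (a minimal sketch, assuming the sentence is already tokenized and POS-tagged; the null-tag string is an assumption):

```python
# Sketch of the 7 POS features P-3 .. P3 around a target token,
# padding with a null tag beyond the sentence boundary.

NULL = "<null>"  # stand-in POS tag for a null token

def pos_window(tags, idx, k=3):
    """Return [P-k, ..., P-1, P0, P1, ..., Pk] for the token at
    position idx, where out-of-sentence positions get the null tag."""
    feats = []
    for offset in range(-k, k + 1):
        j = idx + offset
        feats.append(tags[j] if 0 <= j < len(tags) else NULL)
    return feats

# "Reid saw me looking at the iron bars ." with target "bars" at index 7
tags = ["NNP", "VBD", "PRP", "VBG", "IN", "DT", "NN", "NNS", "."]
print(pos_window(tags, 7))
# ['IN', 'DT', 'NN', 'NNS', '.', '<null>', '<null>']
```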
For each training or test example, the SENSEVAL-3 official data set provides a few sentences as the surrounding context. In the results reported here, we consider all words in the provided context.</Paragraph> <Paragraph position="1"> Specifically, all tokens in the surrounding context of w are converted to lower case and replaced by their morphological root forms. Tokens present in a list of stop words, or tokens that do not contain at least one alphabetic character (such as numbers and punctuation symbols), are removed. All remaining tokens from all training contexts provided for w are gathered. Each remaining token t contributes one feature. In a training (or test) example, the feature corresponding to t is set to 1 iff the context of w in that training (or test) example contains t.</Paragraph> <Paragraph position="2"> For example, if w is the word bars and the set of selected unigrams is {chocolate, iron, beer}, the feature vector for the sentence &quot;Reid saw me looking at the iron bars .&quot; is (0, 1, 0).</Paragraph> </Section> <Section position="3" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.3 Local Collocations </SectionTitle> <Paragraph position="0"> A local collocation C(i,j) refers to the ordered sequence of tokens in the local, narrow context of w.</Paragraph> <Paragraph position="1"> Offsets i and j denote the starting and ending position (relative to w) of the sequence, where a negative (positive) offset refers to a token to its left (right). For example, let w be the word bars in the sentence &quot;Reid saw me looking at the iron bars .&quot; Then C(-2,-1) is the iron and C(-1,2) is iron . null, where null denotes a null token. Like POS, a collocation does not cross a sentence boundary. 
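The surrounding-word features of Section 3.2 can be sketched as below. This is a simplified sketch: the stop-word list is an illustrative assumption, and the morphological-root step (which a real system would perform with a lemmatizer) is omitted.

```python
# Sketch of binary unigram features over the surrounding context:
# lower-case the tokens, drop stop words and non-alphabetic tokens,
# then mark presence/absence of each vocabulary unigram.

STOP = {"the", "at", "me", "saw"}  # illustrative stop words

def unigram_features(vocab, context_tokens):
    """vocab: unigrams gathered from all training contexts of w.
    Returns one 0/1 feature per vocabulary unigram."""
    ctx = {t.lower() for t in context_tokens
           if t.isalpha() and t.lower() not in STOP}
    return [1 if u in ctx else 0 for u in vocab]

vocab = ["chocolate", "iron", "beer"]
sent = "Reid saw me looking at the iron bars .".split()
print(unigram_features(vocab, sent))  # [0, 1, 0]
```

This reproduces the (0, 1, 0) vector from the paper's bars example: only iron occurs in the context.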
To represent this knowledge source of local collocations, we extracted 11 features corresponding to the following collocations: C(-1,-1), C(1,1), C(-2,-2), C(2,2), C(-2,-1), C(-1,1), C(1,2), C(-3,-1), C(-2,1), C(-1,2), and C(1,3). This set of 11 features is the union of the collocation features used in Ng and Lee (1996) and Ng (1997).</Paragraph> <Paragraph position="2"> Note that each collocation C(i,j) is represented by one feature that can have many possible feature values (the local collocation strings), whereas each distinct surrounding word is represented by one feature that takes binary values (indicating the presence or absence of that word). For example, if w is the word bars and the set of collocations for C(-2,-1) is {a chocolate, the wine, the iron}, then the feature value for collocation C(-2,-1) in the sentence &quot;Reid saw me looking at the iron bars .&quot; is the iron.</Paragraph> </Section> <Section position="4" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.4 Syntactic Relations </SectionTitle> <Paragraph position="0"> We first parse the sentence containing w with a statistical parser (Charniak, 2000). The constituent tree structure generated by Charniak's parser is then converted into a dependency tree in which every word points to a parent headword. For example, in the sentence &quot;Reid saw me looking at the iron bars .&quot;, the word Reid points to the parent headword saw. Similarly, the word me also points to the parent headword saw.</Paragraph> <Paragraph position="1"> We use different types of syntactic relations, depending on the POS of w. 
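The collocation extraction of Section 3.3 can be sketched as follows (a minimal sketch; the null-token string and the tokenized input are assumptions):

```python
# Sketch of a local collocation C(i,j): the ordered token sequence
# from offset i to offset j around the target word, excluding the
# target itself and padding with a null token past the sentence
# boundary.

NULL = "<null>"

def collocation(tokens, idx, i, j):
    parts = []
    for offset in range(i, j + 1):
        if offset == 0:
            continue  # the target word w itself is excluded
        k = idx + offset
        parts.append(tokens[k] if 0 <= k < len(tokens) else NULL)
    return " ".join(parts)

sent = "Reid saw me looking at the iron bars .".split()
# target "bars" is at index 7
print(collocation(sent, 7, -2, -1))  # the iron
print(collocation(sent, 7, -1, 2))   # iron . <null>
```

Each C(i,j) string is then treated as one nominal feature value, which the WEKA front end expands into binary features as described in Section 2.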
If w is a noun, we use four features: its parent headword h, the POS of h, the voice of h (active, passive, or null if h is not a verb), and the relative position of h with respect to w (whether h is to the left or right of w). If w is a verb, we use six features: the nearest word l to the left of w such that w is the parent headword of l, the nearest word r to the right of w such that w is the parent headword of r, the POS of l, the POS of r, the POS of w, and the voice of w. If w is an adjective, we use two features: its parent headword h and the POS of h.</Paragraph> <Paragraph position="2"> Headwords are obtained from a parse tree with the script used for the CoNLL-2000 shared task (Tjong Kim Sang and Buchholz, 2000). Some examples are shown in Table 1. Each POS (noun, verb, or adjective) is illustrated by one example. For each example, (a) shows w and its POS; (b) shows the sentence where w occurs; and (c) shows the feature vector corresponding to syntactic relations.</Paragraph> </Section> <Section position="5" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.5 Source Language (English) Sense </SectionTitle> <Paragraph position="0"> For the translation and sense subtask of the multilingual lexical sample task, the sense of an ambiguous word w in the source language (English) is provided for most of the training and test examples. An example with an unknown English sense is denoted with a question mark (&quot;?&quot;) in the corpus. We treat &quot;?&quot; as another &quot;sense&quot; of w (just like any other valid sense of w).</Paragraph> <Paragraph position="1"> We compile the set of English senses of a word w encountered in the whole training corpus. For each sense s in this set, a binary feature is generated for each training and test example. 
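These sense-indicator features can be sketched as below (a minimal sketch; the sense labels are hypothetical, and &quot;?&quot; is treated as an ordinary sense, as described above):

```python
# Sketch of the English-sense binary features: every sense of w seen
# in the training corpus (including "?" for unknown) contributes one
# 0/1 feature, set to 1 for the sense the example carries.

def sense_features(sense_inventory, example_sense):
    return [1 if s == example_sense else 0 for s in sense_inventory]

inventory = ["bank%river", "bank%money", "?"]  # hypothetical labels
print(sense_features(inventory, "bank%money"))  # [0, 1, 0]
print(sense_features(inventory, "?"))           # [0, 0, 1]
```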
If an example has s as the English sense of w, this binary feature (corresponding to s) is set to 1; otherwise, it is set to 0.</Paragraph> </Section> </Section> </Paper>