<?xml version="1.0" standalone="yes"?> <Paper uid="W04-0824"> <Title>Multi-Component Word Sense Disambiguation</Title> <Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> 3 External training data </SectionTitle> <Paragraph position="0"> There are 57 different ambiguous words in the task: 32 verbs, 20 nouns, and 5 adjectives. For each word $w$ a training set of pairs $(\mathbf{x}_i, y_i)$, $i = 1, \dots, n$, is generated from the task-specific data; $\mathbf{x}_i \in \mathbb{R}^d$ is a vector of features and $y_i \in Y(w)$, where $Y(w)$ is the set of possible senses of $w$. Nouns are labeled with WordNet 1.7.1 synset labels, while verbs and adjectives are annotated with Wordsmyth's dictionary labels. For nouns and verbs we used the hierarchies of WordNet to generate the additional training data. We used the given sense map to map Wordsmyth senses to WordNet synsets. For adjectives we simply used the task-specific data and a standard flat classifier. (We used WordNet 2.0 in our experiments, using the WordNet sense map files to map synsets from 1.7.1 to 2.0.)</Paragraph> <Paragraph position="2"> For each noun or verb synset we generated a fixed number $k$ of other semantically similar synsets. For each sense we start collecting synsets among the descendants of the sense itself and work our way up the hierarchy, following the paths from the sense to the top, until we have found $k$ synsets. At each level we look for the closest $k$ descendants of the current synset; this is the closest_descendants() function of Algorithm 1. If there are $k$ or fewer descendants we collect them all; otherwise, we take the closest $k$ around the synset, exploiting the fact that, when ordered using the synset IDs as keys, similar synsets tend to be close to each other. Algorithm 1 presents a schematic description of the procedure.
Algorithm 1 Find $k$ Closest Neighbors
1: input $Q = \{s\}$, $N_s = \emptyset$, $k$
2: repeat
3:   $v \leftarrow \mathrm{FRONT}(Q)$
4:   $desc \leftarrow \mathrm{closest\_descendants}(v, k)$
5:   for each $u \in desc$ do
6:     if $|N_s| < k$ then
7:       $N_s \leftarrow N_s \cup \{u\}$
8:     end if
9:   end for
10:  for each $v'$ such that $v'$ is a parent of $v$ do
11:    ENQUE($Q$, $v'$)
12:  end for
13:  DEQUE($Q$)
14: until $|N_s| = k$ or $Q = \emptyset$</Paragraph> <Paragraph position="4"> For each sense $y$ of a noun or verb we produced a set $N_y$ of $k = 100$ similar neighbor synsets of $y$. We label this set with $\hat{y}$; thus for each set of labels $Y(w)$ we induce a set of pseudo-labels $\hat{Y}(w)$. For each neighbor synset we then generated a training instance from the WordNet glosses (e.g., the example sentence for the noun synset relegation is "He has been relegated to a post in Siberia"). At the end of this process, for each noun or verb, there is an additional training set of pairs $(\mathbf{x}_j, \hat{y}_j)$.</Paragraph>
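<Paragraph> A minimal Python sketch of the neighbor-collection procedure in Algorithm 1, assuming the hierarchy is given as plain parent/child dictionaries keyed by integer synset IDs; the helper names, the toy data, and the exclusion of the starting sense are illustrative assumptions rather than details taken from the paper.

from collections import deque

def closest_descendants(synset, children, k):
    # Gather all descendants of `synset`; if there are more than k, keep the k
    # whose integer IDs are numerically closest to the synset's own ID,
    # following the observation that similar synsets tend to have nearby IDs.
    descendants, stack = [], list(children.get(synset, []))
    while stack:
        s = stack.pop()
        descendants.append(s)
        stack.extend(children.get(s, []))
    if len(descendants) <= k:
        return descendants
    return sorted(descendants, key=lambda s: abs(s - synset))[:k]

def find_k_closest_neighbors(sense, parents, children, k):
    # Algorithm 1: start from the sense itself, collect its closest descendants,
    # and move up through its parents until k neighbor synsets have been found.
    neighbors, queue = set(), deque([sense])
    while queue and len(neighbors) < k:
        v = queue.popleft()
        for u in closest_descendants(v, children, k):
            if u != sense and len(neighbors) < k:
                neighbors.add(u)          # skip the starting sense itself
        queue.extend(parents.get(v, []))  # continue one level up the hierarchy
    return neighbors

# Toy hierarchy with integer "synset IDs" (hypothetical data).
parents = {5: [2], 6: [2], 7: [3], 8: [3], 2: [1], 3: [1], 1: []}
children = {1: [2, 3], 2: [5, 6], 3: [7, 8]}
print(find_k_closest_neighbors(5, parents, children, 3))  # e.g. {2, 3, 6}
</Paragraph>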
</Section> <Section position="5" start_page="0" end_page="0" type="metho"> <SectionTitle> 4 Classifier </SectionTitle> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.1 Multiclass averaged perceptron </SectionTitle> <Paragraph position="0"> Our base classifier is the multiclass averaged perceptron. The multiclass perceptron (Crammer and Singer, 2003) is an on-line learning algorithm that extends the standard perceptron to the multiclass case. It takes as input a training set $(\mathbf{x}_i, y_i)$, $i = 1, \dots, n$. As in the standard perceptron, one introduces a weight vector $v_y \in \mathbb{R}^d$ for every $y \in Y(w)$, and defines the classifier $H$ by the so-called winner-take-all rule
$H(\mathbf{x}; V) = \arg\max_{y \in Y(w)} \langle v_y, \mathbf{x} \rangle$
where $V \in \mathbb{R}^{d \times |Y(w)|}$ refers to the matrix of weights, with every column corresponding to one of the weight vectors $v_y$. The algorithm is summarized in Algorithm 2. Training patterns are presented one at a time. Whenever $H(\mathbf{x}_i; V) \neq y_i$ an update step is performed; otherwise the weight vectors remain unchanged. To perform the update, one first computes the error set $E_i$ containing those class labels that have received a higher score than the correct class:
$E_i = \{ r \neq y_i : \langle v_r, \mathbf{x}_i \rangle \geq \langle v_{y_i}, \mathbf{x}_i \rangle \}$ (2)
The perceptron algorithm defines a sequence of weight matrices $V^{(0)}, \dots, V^{(n)}$, where $V^{(i)}$ is the weight matrix after the first $i$ training items have been processed. In the standard perceptron, the weight matrix $V = V^{(n)}$ is used to classify the unlabeled test examples. However, a variety of methods can be used for regularization or smoothing in order to reduce the effect of overtraining. Here we used the averaged perceptron (Collins, 2002), where the weight matrix used to classify the test data is the average of all of the matrices posited during training, i.e., $V = \frac{1}{n} \sum_{i=1}^{n} V^{(i)}$.</Paragraph>
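<Paragraph> A compact Python sketch of the multiclass averaged perceptron described above; the ultraconservative scaling of the update (promote the correct class by $\mathbf{x}_i$, demote every class in the error set by $\mathbf{x}_i / |E_i|$) and the random toy data are assumptions for illustration, since Algorithm 2's exact constants are not reproduced in this text.

import numpy as np

class MulticlassAveragedPerceptron:
    def __init__(self, n_features, n_classes, epochs=50):
        self.V = np.zeros((n_features, n_classes))  # one weight column v_y per sense
        self.V_sum = np.zeros_like(self.V)          # running sum of weight matrices
        self.steps = 0
        self.epochs = epochs

    def fit(self, X, y):
        for _ in range(self.epochs):
            for x_i, y_i in zip(X, y):
                scores = x_i @ self.V
                # Error set E_i: labels scoring at least as high as the true sense.
                E = [r for r in range(self.V.shape[1])
                     if r != y_i and scores[r] >= scores[y_i]]
                if E:
                    self.V[:, y_i] += x_i            # positive update for the true sense
                    for r in E:                      # demote every label in the error set
                        self.V[:, r] -= x_i / len(E)
                self.V_sum += self.V                 # accumulate for averaging
                self.steps += 1
        return self

    def predict(self, x):
        # Averaged perceptron: classify with the mean of all matrices seen in training.
        V_avg = self.V_sum / max(self.steps, 1)
        return int(np.argmax(x @ V_avg))

# Tiny usage example with random data (shapes only, not the task's features).
rng = np.random.default_rng(0)
X, y = rng.normal(size=(20, 10)), rng.integers(0, 4, size=20)
clf = MulticlassAveragedPerceptron(n_features=10, n_classes=4, epochs=10).fit(X, y)
print(clf.predict(X[0]))
</Paragraph>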
</Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.2 Multicomponent architecture </SectionTitle> <Paragraph position="0"> Task-specific and external training data are integrated with a two-component perceptron whose discriminant function combines the scores of the two components. The first component is trained on the task-specific data. The second component learns a separate weight matrix $M$, in which each column vector represents a set label $\hat{y}$, and is trained on both the task-specific and the additional training sets. Each component is weighted by a parameter $\alpha$, namely 1 and 0.5: in the former case only the first component is used; in the latter both components are used and their contributions are equally weighted.</Paragraph> <Paragraph position="1"> The training procedure for the multicomponent classifier is described in Algorithm 3. It is a simplification of the algorithm presented in (Ciaramita et al., 2003). The two algorithms are similar, except that convergence, if the data is separable, is clear in this case because the two components are trained individually with the standard multiclass perceptron procedure. Convergence is typically achieved in fewer than 50 iterations; the number of training iterations used for evaluation on the unseen test data was chosen by cross-validation, together with the weighting value that performed better, and on most words the multicomponent model outperforms the flat one. With this version of the algorithm the implementation is simpler, especially if several components are included.</Paragraph> </Section> <Section position="3" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.3 Multilabel cases </SectionTitle> <Paragraph position="0"> Often, several senses of an ambiguous word are very close in the hierarchy. Thus it can happen that a synset belongs to the neighbor set of more than one sense of the ambiguous word. When this is the case the training instance for that synset is treated as a multilabeled instance; i.e., $\hat{y}_i$ is actually a set of labels for $\mathbf{x}_i$, that is, $\hat{y}_i \subseteq \hat{Y}(w)$. Several methods can be used to deal with multilabeled instances; here we use a simple generalization of Algorithm 2. The error set for a multilabel training instance is defined as
$E_i = \{ r \notin \hat{y}_i : \exists y \in \hat{y}_i, \ \langle v_r, \mathbf{x}_i \rangle \geq \langle v_y, \mathbf{x}_i \rangle \}$ (3)
which is equivalent to the definition in Equation 2 when $|\hat{y}_i| = 1$. The positive update of Algorithm 2 (line 6) is also redefined. The update concerns the set of labels $Y_i \subseteq \hat{y}_i$ for which some incorrect label achieved a better score, i.e., $Y_i = \{ y \in \hat{y}_i : \exists r \notin \hat{y}_i, \ \langle v_r, \mathbf{x}_i \rangle \geq \langle v_y, \mathbf{x}_i \rangle \}$. For each $y \in Y_i$ the update is equal to $\frac{1}{|Y_i|} \mathbf{x}_i$, which, again, reduces to the former case when $|Y_i| = 1$.</Paragraph> </Section>
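<Paragraph> A short Python sketch of the two-component scoring and of the multilabel error set and update above; the $(1 - \alpha)$ weighting of the second component and the normalization of the negative update are assumptions made for illustration, as the text does not spell them out, and the toy data are hypothetical.

import numpy as np

def two_component_predict(x, V, M, set_label_of, alpha=0.5):
    # Two-component discriminant (one possible reading of Section 4.2): the
    # task-specific component V and the neighbor-set component M, indexed by the
    # pseudo-label of each sense. alpha = 1 uses only the first component;
    # alpha = 0.5 weights the two components equally.
    scores = [alpha * (x @ V[:, y]) + (1.0 - alpha) * (x @ M[:, set_label_of[y]])
              for y in range(V.shape[1])]
    return int(np.argmax(scores))

def multilabel_update(x, V, true_labels):
    # Generalized perceptron step for a multilabeled instance (Section 4.3).
    scores = x @ V
    # Error set (Equation 3): incorrect labels scoring at least as high as
    # some correct label.
    E = [r for r in range(V.shape[1])
         if r not in true_labels and any(scores[r] >= scores[y] for y in true_labels)]
    if not E:
        return V
    # Correct labels outscored by some incorrect label; each promoted by x/|Y_i|.
    Y_i = [y for y in true_labels if any(scores[r] >= scores[y] for r in E)]
    for y in Y_i:
        V[:, y] += x / len(Y_i)
    for r in E:
        V[:, r] -= x / len(E)   # demotion of the error set, assumed as in Algorithm 2
    return V

# Hypothetical example: 6 senses sharing 4 pseudo-labels, 10 features.
rng = np.random.default_rng(1)
V, M = rng.normal(size=(10, 6)), rng.normal(size=(10, 4))
set_label_of = {0: 0, 1: 0, 2: 1, 3: 2, 4: 3, 5: 3}
x = rng.normal(size=10)
print(two_component_predict(x, V, M, set_label_of, alpha=0.5))
V = multilabel_update(x, V, true_labels={1, 2})
</Paragraph> </Section> </Paper>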