<?xml version="1.0" standalone="yes"?> <Paper uid="W04-0830"> <Title>The University of Jaen Word Sense Disambiguation System *</Title> <Section position="3" start_page="0" end_page="0" type="metho"> <SectionTitle> 2 Experimental Environment </SectionTitle> <Paragraph position="0"> The presented disambiguator uses the Vector Space Model (VSM) as its information representation model. Each sense of a word is represented as a vector in an n-dimensional space, where n is the number of words in all of its contexts.</Paragraph> <Paragraph position="1"> The accuracy of the disambiguator depends essentially on the word weights. We use the LVQ algorithm to adjust them. The input vector weights are calculated with the standard tf*idf scheme (Salton and McGill, 1983), where the documents are the paragraphs. The vectors are presented to the LVQ network and, after training, the output vectors (called prototype or codebook vectors) are obtained, containing the adjusted weights for all senses of each word.</Paragraph> <Paragraph position="2"> Any word to be disambiguated is represented as a vector in the same way. This representation must be compared with all the trained word sense vectors by applying the cosine similarity rule: sim(w_k, x) = (w_k · x) / (||w_k|| ||x||) (1)</Paragraph> <Paragraph position="4"> The sense corresponding to the vector of highest similarity is selected as the disambiguated sense.</Paragraph> <Paragraph position="5"> To train the neural network we have integrated semantic information from two linguistic resources: the SemCor corpus and the WordNet lexical database.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 2.1 SemCor </SectionTitle> <Paragraph position="0"> Firstly, SemCor (the Brown Corpus labeled with WordNet senses) was used in full (the Brown-1, Brown-2 and Brown-v partitions). We used the paragraph as the contextual semantic unit, and each context was included in the training vector set.</Paragraph> <Paragraph position="1"> The SENSEVAL-3 English tasks use the WordNet 1.7.1 sense inventory, but SemCor is tagged with an earlier version of WordNet (specifically, WordNet 1.6).</Paragraph> <Paragraph position="2"> Figure 1. SemCor context for &quot;climb&quot;.</Paragraph> <Paragraph position="3"> Therefore it was necessary to update the SemCor word senses. We have used the version of SemCor automatically mapped to the WordNet 1.7.1 senses, available on the WordNet site.</Paragraph> <Paragraph position="5"> Figure 1 shows the common format for all the resource input paragraphs. For each word, the POS and sense are given; e.g. &quot;climb\2#1&quot; is the verb &quot;climb&quot; with sense 1. In addition, this context contains 158 different words, all of which are shown as word-frequency pairs.</Paragraph> </Section>
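As an illustration of the representation just described, the following is a minimal sketch (not the authors' code) of how a context given as word-frequency pairs can be turned into a tf*idf vector and compared against the trained sense vectors with the cosine rule of equation (1). All names (tfidf_vector, doc_freq, disambiguate) are illustrative assumptions.

    import math

    def tfidf_vector(word_freqs, doc_freq, n_docs):
        # Standard tf*idf (Salton and McGill, 1983), with paragraphs as the
        # "documents"; word_freqs holds the word-frequency pairs of one
        # context, doc_freq maps word -> number of paragraphs containing it.
        vec = {}
        for word, tf in word_freqs.items():
            df = doc_freq.get(word, 0)
            if df:
                vec[word] = tf * math.log(n_docs / df)
        return vec

    def cosine(u, v):
        # Cosine similarity between two sparse vectors (equation 1).
        dot = sum(w * v.get(t, 0.0) for t, w in u.items())
        nu = math.sqrt(sum(w * w for w in u.values()))
        nv = math.sqrt(sum(w * w for w in v.values()))
        return dot / (nu * nv) if nu and nv else 0.0

    def disambiguate(context_vec, sense_vectors):
        # Select the sense whose trained vector has the highest similarity.
        return max(sense_vectors, key=lambda s: cosine(context_vec, sense_vectors[s]))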
<Section position="3" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 2.2 WordNet </SectionTitle> <Paragraph position="0"> Semantic relations from WordNet 1.7.1 were also considered, in particular synonymy, antonymy, hyponymy, homonymy, hyperonymy, meronymy, and coordinate terms, in order to generate artificial paragraphs containing the words along each relation.</Paragraph> <Paragraph position="1"> For example, for a word with 7 senses, 7 artificial paragraphs with the synonyms of the 7 senses were added, 7 more with all its hyponyms, and so on.</Paragraph> <Paragraph position="2"> Figure 2 shows these artificial paragraphs for the verb &quot;climb&quot;.</Paragraph> </Section> </Section> <Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> 3 Learning Vector Quantization </SectionTitle> <Paragraph position="0"> The LVQ algorithm (Kohonen, 1995) performs supervised learning: it uses a set of inputs together with their correctly annotated outputs, and adjusts the model whenever the model output disagrees with the known output.</Paragraph> <Paragraph position="1"> LVQ is a classification method based on neural competitive learning, which allows a group of categories to be defined on the input data space by reinforcement, either positive (reward) or negative (punishment). In competitive learning the output neurons compete to become active, and only a single output neuron is active at any one time.</Paragraph> <Paragraph position="2"> The general application of LVQ is to adjust the weights of labels attached to high-dimensional input vectors. Technically, the labels are represented as regions of the data space, each associated with an adjustable prototype or codebook vector: a codebook vector, w_k, is associated with each class, k. This is particularly useful for pattern classification problems.</Paragraph> <Paragraph position="3"> The learning algorithm is very simple. First, the learning rate and the codebook vectors are initialized. Then the following procedure is repeated for all the training input vectors until a stopping criterion is satisfied:
- Select a training input pattern, x, with class d, and present it to the network.
- Calculate the Euclidean distance, ||x - w_k||, between the input vector and each codebook vector.
- Determine the winner codebook vector, w_c, closest to the input vector, x: c = arg min_k ||x - w_k|| (2)</Paragraph> <Paragraph position="7"> This codebook vector is the winner neuron, and only this neuron updates its weights, according to the learning equation (equation 3). If the class of the input pattern, x, matches the class of the winner codebook vector, w_c (the classification has been correct), then the codebook vector is moved closer to the pattern (reward); otherwise it is moved further away (punishment).</Paragraph> <Paragraph position="8"> Let x(t) be an input vector at time t, and w_k(t) the codebook vector for class k at time t. The following equation defines the basic learning process of the LVQ algorithm: w_c(t+1) = w_c(t) + s α(t) [x(t) - w_c(t)] (3) where s = +1 if x and w_c belong to the same class (c = d), and s = -1 if they do not (c ≠ d). α(t) is the learning rate, with 0 < α(t) < 1, and is a monotonically decreasing function of time. It is recommended that α(t) be rather small initially, say smaller than 0.1 (Kohonen, 1995), and that it keep decreasing to a given threshold, u, very close to 0.</Paragraph> <Paragraph position="13"> The codebook vectors for the LVQ were initialized to zero, and every training vector was presented to the neural network, modifying the prototype vector weights according to whether the winner was chosen correctly.</Paragraph> <Paragraph position="14"> All training vectors were presented several times, updating the weights according to the learning equation. The learning rate factor α(t) begins at 0.1 and decreases linearly, α(t) = 0.1 (1 - t/P), where P is the number of iterations performed in the training. The number of iterations has been fixed at 25, because at this point the network is stabilized.</Paragraph> <Paragraph position="15"> The LVQ must find the winner sense by calculating the Euclidean distances between the codebook vectors and the input vector. The shortest distance points to the winner, whose weights must be updated.</Paragraph> </Section>
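The procedure above translates almost directly into code. The following is a minimal sketch of the LVQ1 training loop under the settings described (zero-initialized codebook vectors, Euclidean winner selection as in equation (2), the reward/punish update of equation (3), and a learning rate decaying linearly from 0.1 over 25 iterations). It is an illustrative reconstruction, not the authors' implementation, and the name train_lvq is an assumption.

    import numpy as np

    def train_lvq(X, y, n_classes, alpha0=0.1, iterations=25):
        # X: (n_samples, n_features) tf*idf training vectors
        # y: class (sense) index of each training vector
        n_features = X.shape[1]
        codebooks = np.zeros((n_classes, n_features))  # initialized to zero
        for t in range(iterations):
            alpha = alpha0 * (1.0 - t / iterations)    # linear decay of the learning rate
            for x, d in zip(X, y):
                # winner: codebook at the shortest Euclidean distance (equation 2)
                c = int(np.argmin(np.linalg.norm(codebooks - x, axis=1)))
                s = 1.0 if c == d else -1.0            # reward if correct, punish otherwise
                # equation 3: move the winner toward or away from the pattern
                codebooks[c] += s * alpha * (x - codebooks[c])
        return codebooks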
<Section position="5" start_page="0" end_page="0" type="metho"> <SectionTitle> 4 English Tasks </SectionTitle> <Paragraph position="0"> The training corpus generated from SemCor and WordNet has been used to train the neural networks. All the contexts of every word to be disambiguated constitute a domain; each domain represents a word and its senses. Figure 3 shows the codebook vectors generated after the training process for the &quot;climb&quot; domain.</Paragraph> <Paragraph position="1"> We have generated one network per domain, so that after the training process we have as many adjusted domains as there are words to disambiguate. In the network architecture of a domain, the number of input units is the number of different terms in all the contexts of the given domain, and the number of output units is the number of its different senses.</Paragraph> <Paragraph position="2"> The disambiguation system has been used in the English lexical sample and English all words tasks. For the English lexical sample task, we have used the available SENSEVAL-3 corpus to train the neural networks, together with the contexts generated from SemCor and WordNet for each word in the SENSEVAL-3 corpus. For the Eng- Once the training has finished, the testing begins. The test is very simple: we compute the similarity between a given vector of the evaluation corpus and all the codebook vectors of its domain, and the highest similarity value corresponds to the disambiguated sense (the winner sense). If no sense can be found (i.e., the cosine similarity value cannot be obtained), we assign by default the most frequent sense (the first sense in WordNet).</Paragraph> <Paragraph position="3"> The official results achieved by the University of Jaen system are presented in Table 1 for the English lexical sample task, and in Table 2 for English all words.</Paragraph> </Section>
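As a final illustration, a minimal sketch of the test step just described, assuming the sparse-vector conventions of the earlier sketches; test_word and mfs_sense are illustrative names, and the fallback condition is an assumption about when the cosine value cannot be obtained (the context shares no terms with the training vocabulary, so the similarity is undefined or zero).

    import math

    def test_word(context_vec, sense_codebooks, mfs_sense):
        # Compare an evaluation-corpus vector with every codebook vector
        # of its domain; the highest cosine similarity wins.
        best_sense, best_sim = None, 0.0
        for sense, codebook in sense_codebooks.items():
            dot = sum(w * codebook.get(t, 0.0) for t, w in context_vec.items())
            nu = math.sqrt(sum(w * w for w in context_vec.values()))
            nv = math.sqrt(sum(w * w for w in codebook.values()))
            if nu and nv and dot > 0:
                sim = dot / (nu * nv)
                if sim > best_sim:
                    best_sense, best_sim = sense, sim
        # Most-frequent-sense fallback (the first sense in WordNet) when
        # no similarity can be computed.
        return best_sense if best_sense is not None else mfs_sense

</Paper>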