<?xml version="1.0" standalone="yes"?>
<Paper uid="W04-0855">
  <Title>UBB system at Senseval3</Title>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
2 Machine learning approach in WSD
</SectionTitle>
    <Paragraph position="0"> Our system falls in the supervised learning approach category. It was trained to learn a classifler that can be used to assign a yet unseen example to one or two of a flxed number of senses.</Paragraph>
    <Paragraph position="1"> We had a trained corpus (a number of annotated contexts), from where the system learned the classifler, and a test corpus which the system will annotate.</Paragraph>
    <Paragraph position="2"> In our system we used the Vector Space Model: a context c was represented as a vector~c of some features which we will present bellow. By a context we mean the same deflnition as in Senseval denotation: the content between &lt;context&gt; and &lt;/context&gt;.</Paragraph>
    <Paragraph position="3"> The notations used to explain our method are (Manning and Schutze, 1999): + w - the word to be disambiguate; + s1;C/C/C/;sNs the senses for w; + c1;C/C/C/;cNc the contexts for w; + v1;C/C/C/;vNf the features selected.</Paragraph>
    <Paragraph position="4"> As we treated each word w to be disambiguated separately, let us explain the method for a single word. The features selected was the set of ALL words used in the trained corpus (nouns, verbs, prepositions, etc) , so we used the cooccurrence paradigm (Dagan, Lee and Pereira , 1994).</Paragraph>
    <Paragraph position="5"> The vector of a context c of the target word w is deflned as: + ~c = (w1;C/C/C/;wjWj) where wi is the number of occurences of the word vi in the context c and vi is a word from the entire trained corpus of j W j words.</Paragraph>
    <Paragraph position="6"> The similarity between two contexts ca;cb is the normalised cosine between the vectors ~ca and ~cb (Jurafsky and Martin, 2000):</Paragraph>
    <Paragraph position="8"/>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
Association for Computational Linguistics
</SectionTitle>
      <Paragraph position="0"> for the Semantic Analysis of Text, Barcelona, Spain, July 2004 SENSEVAL-3: Third International Workshop on the Evaluation of Systems The number wi is the weight of the feature vi. This can be the frequency of the feature vi (term frequency or tf), or &amp;quot;inverse document frequency &amp;quot;, denoted by idf. In our system we considered all the words from the entire corpus, so both these aspects are satisfled.</Paragraph>
  </Section>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3 k-NN or memory-based learning
</SectionTitle>
    <Paragraph position="0"> At training time, our k-NN model memorizes all the contexts in the training set by their associated features. Later, when proceeds a new context cnew, the classifler flrst selects k contexts in the training set that are closest to cnew, then pick the best sense (senses) for cnew (Jackson and Moulinier, 2002).</Paragraph>
    <Paragraph position="1">  that means A is the set of the k nearest neighbors contexts of ~cnew.</Paragraph>
    <Paragraph position="3"> We used the value of k set to 3 after some experimental veriflcations.</Paragraph>
    <Paragraph position="4"> A major problem with supervised approaches is the need for a large sense tagged training set. The bootstrapping methods use a small number of contexts labeled with senses having a high degree of confldence.</Paragraph>
    <Paragraph position="5"> These labeled contexts are used as seeds to train an initial classifler. This is then used to extract a larger training set from the remaining untagged contexts. Repeating this process, the number of training contexts grows and the number of untagged contexts reduces. We will stop when the remaining unannotated corpus is empty or any new context can't be annotated.</Paragraph>
    <Paragraph position="6"> In(Tatarand Serban, 2001), (Serbanand Tatar, 2003) we presented an algorithm which falls in this category. The algorithm is based on the two principles of Yarowsky (Resnik and Yarowsky, 1999): + One sense per discourse: the sense of a target word is highly consistent within a given discourse (document); + One sense per collocation: the contextual features ( nearby words) provide strong clues to the sense of a target word.</Paragraph>
    <Paragraph position="7"> Also, for each iteration, the algorithm uses a NBC classifler. We intend to present a second system based on this algorithm at the next Senseval contest.</Paragraph>
  </Section>
  <Section position="5" start_page="0" end_page="0" type="metho">
    <SectionTitle>
4 Implementation details
</SectionTitle>
    <Paragraph position="0"> Our disambiguation system is written in JDK 1.4.</Paragraph>
    <Paragraph position="1"> In order to improve the performance of the disambiguation algorithm, we made the following reflnements in the above k-NN algorithm. First one is to substitute the lack of an e-cient tool for stemming words in Romanian.</Paragraph>
    <Paragraph position="2">  1. We deflned a relation between words as - : W PS W, where W is the set of words. If w1 2 W and w2 2 W are two words, we  say that (w1;w2) 2 - if w1 and w2 have the same grammatical root. Therefore, if w is a word and C is a context, we say that w occurs in C ifi exists a word w2 2 C so that (w;w2) 2 -. In other words, we replaced the stemming step with collecting all the words with the same root in a single class. This collection is made considering the rules for romanian morphology; 2. The step 3 of the algorithm for choosing the appropriate sense (senses) of a polysemic word w in a given context C (in fact the sense that maximizes the set S = fScore(C;sj) j j = 1;C/C/C/Nsg of scores for C) is divided in three sub-steps: + If there is a single sense s that maximizes S, then s is reported as the appropriate sense for C; + If there are two senses s1 and s2 that maximize S, then s1 and s2 are reported as the appropriate senses for C; + Consider that Max1 and Max2 are the flrst two maximum values from S where (Max1 &gt; Max2). If Max1 is obtained for a sense s1 and if Max2 is obtained for a sense s2 and if</Paragraph>
    <Paragraph position="4"> minimum score from S, then s1 and s2 are reported as the appropriate senses for C.</Paragraph>
    <Paragraph position="5"> Experimentally, we proved that the above improvements grow the precision of the disambiguation process.</Paragraph>
    <Paragraph position="6">  Considering as baseline procedure the majority sense (all contexts are solved with the most frequent sense in the training corpus), for the word nucleu (noun) is obtained a precision of 0,78 while our procedure obtained 0,81. Also, for the word desena (verb) the baseline procedure of the majority sense obtains precision 0,81 while our procedure obtained 0,85.</Paragraph>
    <Paragraph position="7"> At this stage our system has not as a goal to label with U (unknown) a context, every time choosing one or two from the best scored senses. Annotating with the label U is one of our coming improving. This can be done simply by adding as a new sense for each word the sense U. A simple experiment reported a number of right annotated contexts.</Paragraph>
    <Paragraph position="8"> Another direction to improve our system is to exploit better the senses as they are done in training corpus: our system simply consider the flrst sense.</Paragraph>
  </Section>
class="xml-element"></Paper>