<?xml version="1.0" standalone="yes"?> <Paper uid="W04-3204"> <Title>Unsupervised WSD based on automatically retrieved examples: The importance of bias</Title> <Section position="6" start_page="8" end_page="8" type="concl"> <SectionTitle> 6 Conclusions and Future Work </SectionTitle> <Paragraph position="0"> This paper explores the large-scale acquisition of sense-tagged examples for WSD, a very promising line of research that remains relatively under-studied.</Paragraph> <Paragraph position="1"> We have applied the &quot;monosemous relatives&quot; method to automatically construct a web corpus, which we have used to train three systems based on Decision Lists: one fully supervised (using examples from Semcor and the web corpus), one minimally supervised (relying on the distribution of senses in Semcor and the web corpus), and one fully unsupervised (using an automatically acquired sense rank and the web corpus). These systems were tested on the Senseval-2 lexical-sample test set.</Paragraph> <Paragraph position="2"> We have shown that the fully supervised system, which combines our web corpus with the examples in Semcor, improves over the same system trained on Semcor alone. This improvement is especially noticeable for nouns that have fewer than 10 examples in Semcor. Regarding the minimally supervised and fully unsupervised systems, we have shown that they perform considerably better than the other systems of the same category presented in the Senseval-2 lexical-sample competition.</Paragraph> <Paragraph position="3"> The system can be trained for all nouns in WordNet, using the data available at http://ixa2.si.ehu.es/pub/sensecorpus.</Paragraph> <Paragraph position="4"> The research also highlights the importance of bias: knowing how many examples per sense should be fed into the machine learning system is a key issue. 
We have explored several possibilities, and shown that the learning system (DL) is able to learn from the web corpus in all cases, outperforming the corresponding sense-distribution heuristic.</Paragraph> <Paragraph position="5"> We think that this research opens opportunities for further improvement. We note that the MFS heuristic and the supervised systems based on the Senseval-2 training data remain well ahead of our results, and our research aims to investigate ways to close this gap. We have carried out some experiments along the lines of adding automatically retrieved examples to the available hand-tagged data (Semcor and Senseval-2). The preliminary results indicate that this process has to be performed carefully, taking into account the bias of the senses and applying a quality check to the examples before they are included in the training data.</Paragraph> <Paragraph position="6"> In the future, we also want to test the performance of more powerful Machine Learning methods, explore feature selection methods for each individual word, and investigate more sophisticated ways to combine the examples from the web corpus with those from Semcor or Senseval. Now that the monosemous corpus is available for all nouns, we would also like to test the system on the all-words task. In addition, we will give preference to multiwords that contain the target word when choosing the relatives. Finally, more sophisticated methods for acquiring examples are now available, such as ExRetriever (Fernandez et al., 2004), which could open the way to better examples and better performance.</Paragraph> </Section> </Paper>