<?xml version="1.0" standalone="yes"?> <Paper uid="C04-1131"> <Title>Word sense disambiguation criteria: a systematic study</Title> <Section position="5" start_page="1" end_page="1" type="concl"> <SectionTitle> 4 Conclusion </SectionTitle> <Paragraph position="0"> We have described the results of a systematic, in-depth study of WSD criteria. This may be the first study of this extent carried out within a unified framework. It enabled us to confirm certain results reported in the literature, such as: * the importance of short contexts; * the importance of an adjacent noun or adverb for adjective disambiguation; * the importance of an adjacent adjective, or a noun in a very short context, for noun disambiguation; * the importance of the noun in the area after the verb, and the use of asymmetrical contexts, for verb disambiguation.</Paragraph> <Paragraph position="1"> We have also obtained more original results, not always in line with common practice in the field, such as: * the importance of stop-words, whose removal decreases performance almost systematically; * better results from bigrams taken alone than from unigrams alone; * the constraint of including, or being adjacent to, the target word proves unnecessary.</Paragraph> <Paragraph position="2"> Disambiguation accuracy could probably be improved by studying other sources of information useful for disambiguation, such as: * criteria based on binary syntactic relations (noun-noun, noun-verb, adjective-noun, etc.) to capture information that may be absent from short contexts; * thesauri or other resources for generalizing over context words, to overcome the data sparseness problem; * topical text information; * selectional restrictions.</Paragraph> <Paragraph position="3"> This preliminary study focuses on homogeneous criteria (for example: lemmas located in positions -2 to +2). 
To improve disambiguation accuracy, we have to look for heterogeneous criteria by gathering the most relevant pieces of contextual evidence, not necessarily of the same type (for example: lemma in position -2, part-of-speech in position -1, morphological form of the target word, and lemma in position +2). This feature selection leads to a combinatorial explosion that can be addressed with genetic algorithms.</Paragraph> </Section> </Paper>