<?xml version="1.0" standalone="yes"?>
<Paper uid="C04-1131">
  <Title>Word sense disambiguation criteria: a systematic study</Title>
  <Section position="5" start_page="1" end_page="1" type="concl">
    <SectionTitle>
4 Conclusion
</SectionTitle>
    <Paragraph position="0"> We have described here the results of a systematic, in-depth study of WSD criteria. This may be the first study of this extent carried out within a unified framework. It enabled us to confirm certain results reported in the literature, such as: * the importance of short contexts; * the importance of an adjacent noun or adverb for adjective disambiguation; * the importance of an adjacent adjective, or of a noun in a very short context, for noun disambiguation; * the importance of the noun in the area after the verb, and the use of asymmetrical contexts, for verb disambiguation.</Paragraph>
    <Paragraph position="1"> We have also obtained more original results, not always in line with common practice in the field, such as: * the importance of stop-words, whose removal decreases performance almost systematically; * better results from bigrams taken alone than from unigrams taken alone; * the finding that requiring a criterion to include, or be adjacent to, the target word is an unnecessary constraint.</Paragraph>
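As a minimal sketch of the kind of criteria compared above, the following Python function collects the unigrams or bigrams found in a short window around the target word. The function name `window_ngrams`, its parameters, the example sentence and the stop-word list are all illustrative assumptions, not the implementation used in this study. The n-grams are not required to include or adjoin the target, and stop-words are kept unless a filter is passed explicitly.

```python
def window_ngrams(lemmas, target, n, radius, stop_words=frozenset()):
    """Return all contiguous n-grams inside the window [target - radius, target + radius].

    Hypothetical helper: n-grams need not contain or touch the target word.
    Stop-words are kept by default; pass a set to filter them out first.
    """
    start = max(0, target - radius)
    end = min(len(lemmas), target + radius + 1)
    window = [w for w in lemmas[start:end] if w not in stop_words]
    # range() is empty when the window holds fewer than n tokens
    return [tuple(window[i:i + n]) for i in range(len(window) - n + 1)]


# Illustrative lemmatised sentence; the target word "bank" is at index 4.
lemmas = ["he", "sit", "on", "the", "bank", "of", "the", "river"]

unigrams = window_ngrams(lemmas, target=4, n=1, radius=2)
bigrams = window_ngrams(lemmas, target=4, n=2, radius=2)
# Same bigram criterion after removing an assumed stop-word list
filtered = window_ngrams(lemmas, target=4, n=2, radius=2,
                         stop_words={"on", "the", "of"})
```

In this toy example the window yields four bigrams when stop-words are kept and none once they are filtered out, which only illustrates how much of a short context consists of stop-words; it is not evidence for the result itself.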
    <Paragraph position="2"> Disambiguation accuracy could probably be improved by studying other sources of information useful for disambiguation, such as: * criteria based on binary syntactic relations (noun-noun, noun-verb, adjective-noun, etc.) to capture information that may be absent from short contexts; * the use of thesauri or other sources of information to generalize over context words and so overcome the data sparseness problem; * topical text information; * selectional restrictions.</Paragraph>
    <Paragraph position="3"> This preliminary study focuses on homogeneous criteria (for example: lemmas located at positions -2 to +2). To improve disambiguation accuracy, we have to look for heterogeneous criteria that gather the most relevant pieces of contextual evidence, not necessarily of the same type (for example: lemma in position -2, part-of-speech in position -1, morphological form of the target word and lemma in position +2). This feature selection leads to a combinatorial explosion that can be addressed with genetic algorithms. [Table caption fragment: number of senses (S), sense repartition entropy (H) and baseline accuracy (Most Frequent Sense: MFS).]</Paragraph>
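The difference between homogeneous and heterogeneous criteria can be sketched as follows. This is a minimal illustration under stated assumptions, not the system used in the study: a criterion is represented as a list of (relative position, attribute) pairs, tokens are dictionaries with the hypothetical keys "form", "lemma" and "pos", the homogeneous example is assumed to exclude the target word itself, and the example sentence and its tags are invented.

```python
def extract(tokens, target, criterion):
    """Return one feature value per (offset, attribute) pair of the criterion.

    Positions falling outside the sentence yield the placeholder "PAD".
    """
    values = []
    for offset, attribute in criterion:
        index = target + offset
        if index in range(len(tokens)):
            values.append(tokens[index][attribute])
        else:
            values.append("PAD")
    return tuple(values)


# Homogeneous criterion: lemmas at positions -2 to +2 (target excluded by assumption)
HOMOGENEOUS = [(offset, "lemma") for offset in (-2, -1, 1, 2)]

# Heterogeneous criterion from the example in the text: lemma at -2,
# part-of-speech at -1, form of the target word, lemma at +2
HETEROGENEOUS = [(-2, "lemma"), (-1, "pos"), (0, "form"), (2, "lemma")]

# Illustrative tagged sentence; the target word "banks" is at index 2.
sentence = [
    {"form": "The", "lemma": "the", "pos": "DET"},
    {"form": "river", "lemma": "river", "pos": "NOUN"},
    {"form": "banks", "lemma": "bank", "pos": "NOUN"},
    {"form": "were", "lemma": "be", "pos": "VERB"},
    {"form": "flooded", "lemma": "flood", "pos": "VERB"},
]

homogeneous_features = extract(sentence, 2, HOMOGENEOUS)
heterogeneous_features = extract(sentence, 2, HETEROGENEOUS)
```

With three attributes and five positions there are already 15 candidate (position, attribute) pairs, hence 2^15 possible subsets to evaluate, which illustrates why exhaustive search over heterogeneous criteria quickly becomes impractical.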
  </Section>
</Paper>