<?xml version="1.0" standalone="yes"?> <Paper uid="W06-0101"> <Title>Improving Context Vector Models by Feature Clustering for Automatic Thesaurus Construction</Title> <Section position="8" start_page="4" end_page="5" type="concl"> <SectionTitle> 6. Error Analysis and Conclusion </SectionTitle> <Paragraph position="0"> Using context vector models to construct a thesaurus suffers from the problems of large feature dimensions and data sparseness. We propose a feature clustering method to overcome these problems. The experimental results show that it performs better than the LSI models in distinguishing related/unrelated pairs for infrequent data, and that it also achieves relevant scores on the other evaluations. The feature clustering method raises discriminative ability, but it is not robust enough to improve performance in extracting synonyms.</Paragraph> <Paragraph position="1"> The results also reveal that it is easy to decide whether a pair is related or unrelated once the two words share at least one of their senses. However, this is not the case when seeking synonyms. One has to discriminate the senses of each word first and then compute the similarity between these senses to obtain synonyms.</Paragraph> <Paragraph position="2"> Because the feature clustering method lacks the ability to discriminate the senses of a word, it can handle the task of distinguishing correlated pairs but not the task of identifying synonyms. Also, after analyzing the discrimination errors made by context vector models, we found that some errors are not due to insufficient contextual information. Certain synonyms have dissimilar contexts for different reasons. We observed the following phenomena in these cases: a) Some senses of synonyms in the testing data are not their dominant senses.</Paragraph> <Paragraph position="3"> Take guang1hua2 (Guang Hua) for example: it has a sense of &quot;splendid&quot;, which is similar to the sense of guang1mang2 (Guang Mang). 
Guang1hua2 and guang1mang2 are certainly interchangeable to a certain degree, as in guang1hua2jin4shi4 (Guang Hua Jin Shi) and guang1mang2jin4shi4 (Guang Mang Jin Shi), or xi2ri4guang1hua2 (Xi Ri Guang Hua) and xi2ri4guang1mang2 (Xi Ri Guang Mang). However, the dominant contextual sense of guang1hua2 is more likely to be a place name, as in guang1hua2shi4chang3 (Guang Hua Shi Chang) or hua1lian2guang1hua2 (Hua Lian Guang Hua), etc. (footnote 3).</Paragraph> <Paragraph position="4"> b) Some synonyms differ in usage for pragmatic reasons.</Paragraph> <Paragraph position="5"> Synonyms with different context vectors could result from different perspectives.</Paragraph> <Paragraph position="6"> For example, we may view wai4jie4 (Wai Jie) as a container image with the viewer inside, whereas yi3wai4 (Yi Wai) takes an omniscient perspective. This similarity in meaning combined with a difference in perspective leads to distinct grammatical usage and different collocations.</Paragraph> <Paragraph position="7"> Similarly, zhong1shen1 (Zhong Shen) and sheng1ping2 (Sheng Ping) both refer to &quot;life-long time&quot;. Zhong1shen1 describes things after a time point, whereas sheng1ping2 describes matters before a time point.</Paragraph> <Paragraph position="8"> c) Domain-specific usages.</Paragraph> <Paragraph position="9"> For example, in medical news, wa1wa1 (Wa Wa) occurs frequently with bo1li2 (Bo Li) to refer to a kind of illness. Footnote 3: This may be due to different genres. In newspapers the proper-noun usage of guang1hua2 is more common than in literary text.</Paragraph> <Paragraph position="10"> The corpus then reinterprets wa1wa1 (Wa Wa) as a sick person, because it occurs with medical terms. But the synonym of wa1wa1 (Wa Wa), xiao3peng2you3 (Xiao Peng You), stands for money in some finance news. Therefore, the meanings of words change from time to time. 
It is hard to decide which meaning is the right answer when finding synonyms.</Paragraph> <Paragraph position="11"> Given the above observations, our future research will address how to distinguish the different senses of a word from its context features. Once we can distinguish the corresponding features for different senses, it will help us to extract more accurate synonyms for both frequent and infrequent words.</Paragraph> </Section> </Paper>