<?xml version="1.0" standalone="yes"?> <Paper uid="C02-1090"> <Title>Taxonomy learning factoring the structure of a taxonomy into a semantic classification decision</Title> <Section position="6" start_page="3" end_page="3" type="concl"> <SectionTitle> 7. Discussion </SectionTitle> <Paragraph position="0"> In this paper we have examined different ways of exploiting the taxonomic organization of a thesaurus to improve the accuracy of classifying new words into its classes.</Paragraph> <Paragraph position="1"> The study demonstrated that taxonomic similarity between nearest neighbors, in addition to their distributional similarity to the new word, may be useful evidence on which a classification decision can be based. We have proposed a &quot;tree ascending&quot; classification algorithm which extends the kNN method by making use of the taxonomic similarity between nearest neighbors.</Paragraph> <Paragraph position="2"> This algorithm was found to have a very good ability to choose a superconcept of the correct class for a new word. On the basis of this finding, another algorithm was developed that combines the tree ascending algorithm and kNN in order to optimize the search for the correct class. Although its improvement over kNN was found to have only limited statistical significance, the results of the study indicate that this algorithm is a promising way to incorporate the structure of a thesaurus into the decision on the class of a new word. We conjecture that the tree ascending algorithm leaves considerable room for improvement and for combination with other algorithms such as kNN.</Paragraph> <Paragraph position="3"> The tree descending algorithm, a technique widely used for text categorization, proved to be much less efficient than standard classifiers when applied to the task of augmenting a domain-specific thesaurus. 
Its poor performance is due to the fact that, in such a thesaurus, top concepts differ greatly in the amount of distributional data used to represent them, which very often misleads the top-down search.</Paragraph> <Paragraph position="4"> We believe that a study of the two algorithms on a larger thesaurus, where richer taxonomic information is available, can yield a further understanding of the role of taxonomic information in the performance of the algorithms.</Paragraph> </Section> </Paper>