<?xml version="1.0" standalone="yes"?> <Paper uid="N03-1036"> <Title>Unsupervised methods for developing taxonomies by combining syntactic and statistical information</Title> <Section position="7" start_page="0" end_page="0" type="concl"> <SectionTitle> 5 Related work and future directions </SectionTitle> <Paragraph position="0"> The experiments in this paper describe one combination of algorithms for lexical acquisition: both the finding of semantic neighbors and the process of class-labelling could take many alternative forms, and an exhaustive evaluation of such combinations is far beyond the scope of this paper. Various mathematical models and distance measures are available for modelling semantic proximity, and more detailed linguistic preprocessing (such as chunking, parsing and morphology) could be used in a variety of ways. As an initial step, we will investigate how the granularity of part-of-speech classification affects our results for lexical acquisition. The class-labelling algorithm could be adapted to use more sensitive measures of distance (Budanitsky and Hirst, 2001), and correlations between taxonomic distance and WordSpace similarity could be used as a filter.</Paragraph> <Paragraph position="1"> The coverage and accuracy of the initial taxonomy we are hoping to enrich have a great influence on success rates for our methods as they stand. Since these are precisely the aspects of the taxonomy we are hoping to improve, this raises the question of whether we can use automatically obtained hypernyms as well as the hand-built ones to help classification. 
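The neighbor-finding step discussed above can be illustrated with a minimal sketch. This is not the paper's WordSpace implementation: the vectors below are invented toy values, and cosine similarity is used as one of the many possible proximity measures mentioned in the text.

```python
# Minimal sketch of finding semantic neighbors by cosine similarity.
# The vectors are toy stand-ins for WordSpace rows; all values are invented.
import math

vectors = {
    "trout":   [0.9, 0.1, 0.0],
    "salmon":  [0.8, 0.2, 0.1],
    "herring": [0.7, 0.3, 0.0],
    "therapy": [0.1, 0.1, 0.9],
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def neighbors(word, k=2):
    # Score every other word against the target, then keep the top k.
    scores = [(cosine(vectors[word], vec), other)
              for other, vec in vectors.items() if other != word]
    return [w for _, w in sorted(scores, reverse=True)[:k]]
```

With these toy vectors, the nearest neighbors of "trout" are the other fish rather than "therapy", mirroring the kind of neighbor sets reported in the paper.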
This could be tested by randomly removing many nodes from WordNet before we begin, and measuring the effect of using automatically derived classifications for some of these words (possibly those with high confidence scores) to help with the subsequent classification of others.</Paragraph> <Paragraph position="2"> The uses of semantic neighbors and class-labelling for computing with meaning go far beyond the experimental setup for lexical acquisition described in this paper -- for example, Resnik (1999) used the idea of a most informative subsuming node (which can be regarded as a kind of class-label) for disambiguation, as did Agirre and Rigau (1996) with the conceptual density algorithm. Taking a whole domain as a 'context', this approach to disambiguation can be used for lexical tuning. For example, using the Ohsumed corpus of medical abstracts, the top few neighbors of operation are amputation, disease, therapy and resection. Our algorithm gives medical care, medical aid and therapy as possible class-labels for this set, which successfully picks out the sense of operation that is most important for the medical domain. The level of detail which is appropriate for defining and grouping terms depends very much on the domain in question. For example, the immediate hypernyms offered by WordNet for the word trout include fish, foodstuff, salmonid, malacopterygian, teleost fish, food fish and saltwater fish. Many of these classifications are inappropriately fine-grained for many circumstances. To find a degree of abstraction which is suitable for the way trout is used in the BNC, we found its semantic neighbors, which include herring, swordfish, turbot, salmon and tuna. The highest-scoring class-labels for this set are the preferred labels: the ones most humans would give if asked what a trout is. 
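The class-labelling idea behind the trout example can be sketched as follows. This is a deliberately simplified stand-in, not the paper's algorithm: it labels a neighbor set with the taxonomy class covering the most members, and the mini-taxonomy below is invented for illustration (it is not WordNet).

```python
# Hedged sketch of class-labelling by hypernym voting: the label for a set of
# neighbors is the class that covers most of them. The taxonomy is invented.
hypernyms = {
    "trout":     ["fish", "food fish", "salmonid"],
    "herring":   ["fish", "food fish"],
    "swordfish": ["fish", "saltwater fish"],
    "salmon":    ["fish", "food fish", "salmonid"],
    "tuna":      ["fish", "food fish", "saltwater fish"],
}

def class_label(words):
    # Count how many words in the set each candidate class covers.
    votes = {}
    for w in words:
        for h in hypernyms.get(w, []):
            votes[h] = votes.get(h, 0) + 1
    # Highest vote count wins; ties broken alphabetically for determinism.
    return sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))[0][0]
```

On the neighbor set from the trout example, the broad class wins over the fine-grained ones, matching the intuition that the preferred label sits at the level of abstraction a human would choose.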
This process can be used to select the concepts from an ontology which are appropriate to a particular domain in a completely unsupervised fashion, using only the documents from that domain whose meanings we wish to describe.</Paragraph> <Paragraph position="3"> Demonstration Interactive demonstrations of the class-labelling algorithm and WordSpace are available on the web at http://infomap.stanford.edu/classes and http://infomap.stanford.edu/webdemo. An interface to WordSpace incorporating the part-of-speech information is currently under consideration.</Paragraph> </Section> </Paper>