<?xml version="1.0" standalone="yes"?>
<Paper uid="N03-1036">
  <Title>Unsupervised methods for developing taxonomies by combining syntactic and statistical information</Title>
  <Section position="3" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> The importance of automatic methods for enriching lexicons, taxonomies and knowledge bases from free text is well-recognized. For rapidly changing domains such as current affairs, static knowledge bases are inadequate for responding to new developments, and the cost of building and maintaining resources by hand is prohibitive.</Paragraph>
    <Paragraph position="1"> This paper describes experiments which develop automatic methods for taking an original taxonomy as a skeleton and fleshing it out with new terms discovered in free text. The method is fully automatic and unsupervised, apart from using the original taxonomic skeleton to suggest possible classifications for new terms. We evaluate how accurately our methods can reconstruct the WordNet taxonomy (Fellbaum, 1998).</Paragraph>
    <Paragraph position="2"> The problem of enriching the lexical information in a taxonomy can be posed in two complementary ways.</Paragraph>
    <Paragraph position="3"> Firstly, given a particular taxonomic class (such as fruit) one could seek members of this class (such as apple, banana). This problem is addressed by Riloff and Shepherd (1997), Roark and Charniak (1998) and more recently by Widdows and Dorow (2002). Secondly, given a particular word (such as apple), one could seek suitable taxonomic classes for describing this object (such as fruit, foodstuff). The work in this paper addresses the second of these questions.</Paragraph>
    <Paragraph position="4"> The goal of automatically placing new words into a taxonomy has been attempted in various ways for at least ten years (Hearst and Schütze, 1993). The process for placing a word w in a taxonomy T using a corpus C often contains some version of the following stages:
* For a word w, find words from the corpus C whose occurrences are similar to those of w. Consider these the 'corpus-derived neighbors' N(w) of w.</Paragraph>
    <Paragraph position="5"> * Assuming that at least some of these neighbors are already in the taxonomy T, map w to the place in the taxonomy where these neighbors are most concentrated.
Hearst and Schütze (1993) added 27 words to WordNet using a version of this process, with a 63% accuracy at assigning new words to one of a number of disjoint WordNet 'classes' produced by a previous algorithm. (Direct comparison with this result is problematic since the number of classes used is not stated.) A more recent example is the top-down algorithm of Alfonseca and Manandhar (2001), which seeks the node in T which shares the most collocational properties with the word w, adding 42 concepts taken from The Lord of the Rings with an accuracy of 28%.</Paragraph>
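The two-stage process above can be sketched in miniature. This is not the paper's class-labelling algorithm (which Section 3 defines); it is a minimal majority-vote illustration, assuming the neighbors N(w) have already been computed and that the taxonomy is represented as a hypothetical word-to-class mapping:

```python
from collections import Counter

def place_word(word, neighbors, taxonomy):
    """Map a word to the class where its corpus-derived neighbors
    are most concentrated (here: a simple majority vote over the
    neighbors that already appear in the taxonomy)."""
    # taxonomy: dict mapping known words to their class labels
    votes = Counter(taxonomy[n] for n in neighbors if n in taxonomy)
    if not votes:
        return None  # none of the neighbors is in the taxonomy
    best_class, _ = votes.most_common(1)[0]
    return best_class

# Toy example: classify "apple" from its distributional neighbors.
taxonomy = {"banana": "fruit", "pear": "fruit", "carrot": "vegetable"}
print(place_word("apple", ["banana", "pear", "carrot", "quince"], taxonomy))
# prints "fruit"
```

A real implementation must define "most concentrated" over the taxonomy's tree structure rather than flat class labels, which is exactly the degree of freedom the next paragraph raises.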
    <Paragraph position="6"> The algorithm as presented above leaves many degrees of freedom and open questions. What methods should be used to obtain the corpus-derived neighbors N(w)? This question is addressed in Section 2. Given a collection of neighbors, how should we define a &quot;place in the taxonomy where these neighbors are most concentrated?&quot; This question is addressed in Section 3, which defines a robust class-labelling algorithm for mapping a list of words into a taxonomy. In Section 4 we describe experiments, determining the accuracy with which these methods can be used to reconstruct the WordNet taxonomy. To our knowledge, this is the first such evaluation for a large sample of words. Section 5 discusses related work and other problems to which these techniques can be adapted.</Paragraph>
  </Section>
</Paper>