<?xml version="1.0" standalone="yes"?>
<Paper uid="P06-1038">
  <Title>Efficient Unsupervised Discovery of Word Categories Using Symmetric Patterns and High Frequency Words</Title>
  <Section position="4" start_page="297" end_page="298" type="intro">
    <SectionTitle>
2 Previous Work
</SectionTitle>
    <Paragraph position="0"> Much work has been done on lexical acquisition of all sorts. The three main distinguishing axes are (1) the type of corpus annotation and other human input used; (2) the type of lexical relationship targeted; and (3) the basic algorithmic approach. The two main approaches are pattern-based discovery and clustering of context feature vectors.</Paragraph>
    <Paragraph position="1"> Many of the papers cited below aim at the construction of hyponym (is-a) hierarchies. Note that they can also be viewed as algorithms for category discovery, because a subtree in such a hierarchy defines a lexical category.</Paragraph>
    <Paragraph position="2"> A first major algorithmic approach is to represent word contexts as vectors in some space and to use similarity measures and automatic clustering in that space (Curran and Moens, 2002). Pereira (1993) and Lin (1998) use syntactic features in the vector definition. Pantel and Lin (2002) improve on the latter by clustering by committee. Caraballo (1999) uses conjunction and appositive annotations in the vector representation.</Paragraph>
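The context-vector approach can be sketched in a few lines. The following is a toy illustration of the general idea only, not a reimplementation of any cited system: each word is represented by counts of its immediate neighbors, and cosine similarity in that space suggests category membership.

```python
from collections import defaultdict
from math import sqrt

# Toy corpus; window of +/-1 word defines the context features.
corpus = ("the cat sat on the mat . the dog sat on the rug . "
          "a cat ate fish . a dog ate meat .").split()

contexts = defaultdict(lambda: defaultdict(int))
for i, w in enumerate(corpus):
    for j in (i - 1, i + 1):
        if 0 <= j < len(corpus):
            contexts[w][corpus[j]] += 1

def cosine(u, v):
    """Cosine similarity of two sparse count vectors."""
    dot = sum(u[k] * v.get(k, 0) for k in u)
    nu = sqrt(sum(x * x for x in u.values()))
    nv = sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv)

# Words with shared contexts ("cat"/"dog") score higher than
# distributionally unrelated pairs ("cat"/"on").
print(cosine(contexts["cat"], contexts["dog"]) >
      cosine(contexts["cat"], contexts["on"]))  # → True
```

The cited systems differ mainly in which features enter the vector (syntactic dependencies, conjunctions, appositives) and in the clustering algorithm applied afterwards.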
    <Paragraph position="3"> Footnote 2: We did not compare against methods that use richer syntactic information, both because they are supervised and because they are much more computationally demanding.</Paragraph>
    <Paragraph position="4"> Footnote 3: We are not aware of any multilingual evaluation previously reported on the task.</Paragraph>
    <Paragraph position="5"> The only previous works addressing our problem without requiring any syntactic annotation are those that decompose a lexically-defined matrix (by SVD, PCA, etc.), e.g., (Schütze, 1998; Deerwester et al., 1990). Such matrix decomposition is computationally heavy and has not been proven to scale well as the number of words assigned to categories grows.</Paragraph>
    <Paragraph position="6"> Agglomerative clustering (e.g., (Brown et al., 1992; Li, 1996)) can produce hierarchical word categories from an unannotated corpus. However, we are not aware of work in this direction that has been evaluated with good results on lexical category acquisition. The technique is also quite demanding computationally.</Paragraph>
    <Paragraph position="7"> The second main algorithmic approach is to use lexico-syntactic patterns. Patterns have been shown to produce more accurate results than feature vectors, at a lower computational cost on large corpora (Pantel et al., 2004). Hearst (1992) uses a manually prepared set of initial lexical patterns to discover hierarchical categories, and then utilizes those categories to automatically discover additional patterns.</Paragraph>
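A minimal sketch of the lexico-syntactic pattern idea, in the spirit of Hearst's hyponym patterns (a simplified illustration with an invented toy sentence, not Hearst's actual pattern set): a single hand-written pattern such as "X such as Y and/or Z" yields (hyponym, hypernym) candidates directly from raw text.

```python
import re

# One hand-written lexico-syntactic pattern: "X such as Y (and|or) Z".
# Matches of this pattern propose Y and Z as hyponyms of X.
text = ("We study languages such as French and Spanish , "
        "and instruments such as guitars or violins .")

pattern = re.compile(r"(\w+) such as (\w+) (?:and|or) (\w+)")
pairs = []
for hypernym, *hyponyms in pattern.findall(text):
    for h in hyponyms:
        pairs.append((h, hypernym))

print(pairs)
# → [('French', 'languages'), ('Spanish', 'languages'),
#    ('guitars', 'instruments'), ('violins', 'instruments')]
```

Hearst's bootstrapping step then takes the discovered category members and searches the corpus for new contexts they share, promoting recurring contexts to additional patterns.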
    <Paragraph position="8"> Berland and Charniak (1999) use hand-crafted patterns to discover part-of (meronymy) relationships, and Chklovski and Pantel (2004) discover various interesting relations between verbs. Both use information obtained by parsing. Pantel et al. (2004) reduce the depth of the linguistic data used, but still require POS tagging.</Paragraph>
    <Paragraph position="9"> Many papers directly target specific applications and build lexical resources as a side effect. Named Entity Recognition can be viewed as an instance of our problem in which the desired categories contain words that are names of entities of a particular kind, as done by Freitag (2004) using co-clustering. Many Information Extraction papers discover relationships between words using syntactic patterns (Riloff and Jones, 1999).</Paragraph>
    <Paragraph position="10"> Widdows and Dorow (2002) and Dorow et al. (2005) discover categories using two hard-coded symmetric patterns, and are thus the closest works to ours. They also introduce an elegant graph representation, which we adopt, and they report good results. However, they require POS tagging of the corpus, use only two hard-coded patterns ('x and y', 'x or y'), deal only with nouns, and require non-trivial computations on the graph.</Paragraph>
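The core of the symmetric-pattern approach can be sketched as follows. This is a simplified illustration of the two hard-coded patterns and the graph representation only; the cited systems additionally use POS tags and more sophisticated graph clustering than the connected components used here. Every occurrence of "x and y" or "x or y" adds an edge between x and y, and connected regions of the resulting graph suggest word categories.

```python
from collections import defaultdict

# Toy corpus containing the two symmetric patterns 'x and y' / 'x or y'.
corpus = ("buy apples and oranges or pears ; "
          "meet on monday or tuesday and wednesday").split()

# Build an undirected co-occurrence graph from the pattern instances.
graph = defaultdict(set)
for i, w in enumerate(corpus[1:-1], start=1):
    if w in ("and", "or"):
        x, y = corpus[i - 1], corpus[i + 1]
        graph[x].add(y)
        graph[y].add(x)

def component(start):
    """Return the connected component containing `start` (DFS)."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph[node])
    return seen

print(sorted(component("apples")))  # → ['apples', 'oranges', 'pears']
```

Even on this toy input, the fruit words and the weekday words fall into separate components, which is the intuition the graph representation exploits.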
    <Paragraph position="11"> A third, less common, approach uses set-theoretic inference, for example (Cimiano et al., 2005). Again, that paper uses syntactic information. In summary, no previous work has combined the accuracy, scalability and performance advantages of patterns with the fully unsupervised, unannotated nature possible with clustering approaches. This severely limits the applicability of previous work to the huge corpora available at present.</Paragraph>
  </Section>
</Paper>