<?xml version="1.0" standalone="yes"?>
<Paper uid="P06-1038">
  <Title>Efficient Unsupervised Discovery of Word Categories Using Symmetric Patterns and High Frequency Words</Title>
  <Section position="8" start_page="302" end_page="303" type="concl">
    <SectionTitle>
6 Discussion
</SectionTitle>
    <Paragraph position="0"> We have presented a novel method for pattern-based discovery of lexical semantic categories.</Paragraph>
    <Paragraph position="1"> It is the first pattern-based lexical acquisition method that is fully unsupervised, requiring no corpus annotation and no manually provided patterns or words. Pattern candidates are discovered using meta-patterns of high-frequency words and content words, and symmetric patterns are identified using simple graph-theoretic measures. Categories are generated using a novel graph clique-set algorithm. The only other fully unsupervised lexical category acquisition approach is based on decomposition of a matrix defined by context feature vectors, and it has not yet been shown to scale well.</Paragraph>
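    <!-- Editorial sketch, not part of the original paper. It illustrates one
    simple way a graph-theoretic symmetry measure for a pattern could look:
    treat each instance "x PATTERN y" as a directed edge (x, y) and score the
    pattern by the fraction of edges whose reverse edge also occurs. The
    function name symmetry_score, the toy word pairs, and this particular
    measure are illustrative assumptions; the measures actually used by the
    authors are defined in the method section of the paper and may differ.

```python
def symmetry_score(pairs):
    """Fraction of distinct directed edges (x, y) whose reverse (y, x) also occurs."""
    # Deduplicate instances and ignore self-loops such as ("cat", "cat").
    edges = {(x, y) for x, y in pairs if x != y}
    if not edges:
        return 0.0
    mutual = sum(1 for (x, y) in edges if (y, x) in edges)
    return mutual / len(edges)


# Hypothetical toy instances for two candidate patterns (not corpus data).
instances = {
    "X and Y": [("cat", "dog"), ("dog", "cat"), ("red", "blue"), ("blue", "red")],
    "X of Y": [("king", "spain"), ("cup", "tea"), ("tea", "cup")],
}

scores = {pattern: symmetry_score(pairs) for pattern, pairs in instances.items()}
for pattern, score in scores.items():
    print(pattern, round(score, 3))
```

    On this toy input every "X and Y" edge has a reverse edge, while only two
    of the three "X of Y" edges do, so the first pattern looks more symmetric
    under this simplified score. -->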
    <Paragraph position="2"> Our algorithm was evaluated using both human judgment and automatic comparison with WordNet. Its results were superior to those of previous work (even though that work used a POS-tagged corpus), and it was computationally more efficient. Our algorithm is also easy to implement.</Paragraph>
    <Paragraph position="3"> Computational efficiency and, in particular, the lack of any annotation requirement are important criteria, because they allow the use of huge corpora, which are now becoming available and continue to grow in size.</Paragraph>
    <Paragraph position="4"> There are many directions to pursue in the future: (1) support multi-word lexical items; (2) increase category quality by improved merge algorithms; (3) discover various relationships (e.g., hyponymy) between the discovered categories; (4) discover finer inter-word relationships, such as verb selection preferences; (5) study various properties of discovered patterns in a detailed manner; and (6) adapt the algorithm to morphologically rich languages.</Paragraph>
    <Paragraph position="5">  words' precision of 90.47%. This metric was reported to be 82% in (Widdows and Dorow, 2002).</Paragraph>
    <Paragraph position="6"> It should be noted that our algorithm can be viewed as one for automatic discovery of word senses, because it allows a word to participate in more than a single category. When merged properly, the different categories containing a word can be viewed as the set of its senses. We are planning an evaluation according to this measure after improving the merge stage.</Paragraph>
  </Section>
</Paper>