<?xml version="1.0" standalone="yes"?>
<Paper uid="P06-3002">
  <Title>Unsupervised Part-of-Speech Tagging Employing Efficient Graph Clustering</Title>
  <Section position="4" start_page="7" end_page="7" type="intro">
    <SectionTitle>
1.3 Outline
</SectionTitle>
    <Paragraph position="0"> This work constructs an unsupervised POS tagger from scratch. The input to our system is a considerable amount of unlabeled, monolingual text without any POS information. In a first stage, we employ a clustering algorithm on distributional similarity, which groups a subset of the 10,000 most frequent words of a corpus into several hundred clusters (partitioning 1). Second, we use similarity scores on neighbouring co-occurrence profiles to obtain another several hundred clusters of medium- and low-frequency words (partitioning 2). The combination of both partitionings yields sets of word forms, each belonging to the same derived syntactic category. To increase text coverage, we add ambiguous high-frequency words that were discarded for partitioning 1 to the lexicon. Finally, we train a Viterbi tagger with this lexicon and augment it with an affix classifier for unknown words.</Paragraph>
    <Paragraph position="1"> The resulting taggers are evaluated against outputs of supervised taggers for various languages.</Paragraph>
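    <Paragraph position="2"> As an illustration of the distributional similarity that the first stage relies on, the following minimal sketch builds left- and right-neighbour count profiles for each word form and compares two profiles by cosine similarity. It is not the clustering procedure or similarity measure used in this work: the function names, the BOS and EOS boundary markers, the choice of cosine, and the toy corpus are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
from math import sqrt


def context_profiles(sentences):
    """Count immediate left and right neighbours for every word form.

    Hypothetical helper: "BOS" and "EOS" are assumed boundary markers.
    """
    profiles = defaultdict(Counter)
    for tokens in sentences:
        padded = ["BOS"] + list(tokens) + ["EOS"]
        for i in range(1, len(padded) - 1):
            word = padded[i]
            profiles[word][("L", padded[i - 1])] += 1  # left neighbour
            profiles[word][("R", padded[i + 1])] += 1  # right neighbour
    return profiles


def cosine(p, q):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(count * q[feature] for feature, count in p.items())
    norm_p = sqrt(sum(c * c for c in p.values()))
    norm_q = sqrt(sum(c * c for c in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 0.0
    return dot / (norm_p * norm_q)


# Toy corpus (assumed): nouns share contexts, determiners do not share them with nouns.
corpus = [
    ["the", "cat", "sleeps"],
    ["the", "dog", "sleeps"],
    ["a", "cat", "runs"],
    ["a", "dog", "runs"],
]
profiles = context_profiles(corpus)
sim_noun = cosine(profiles["cat"], profiles["dog"])
sim_mixed = cosine(profiles["cat"], profiles["the"])
```

In this toy corpus, "cat" and "dog" occur in identical contexts and therefore receive a higher similarity than "cat" and "the", which share no neighbours. A clustering step would then group word forms whose profiles are similar.</Paragraph>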
  </Section>
</Paper>
