<?xml version="1.0" standalone="yes"?>
<Paper uid="C04-1080">
  <Title>Part of Speech Tagging in Context</Title>
  <Section position="3" start_page="0" end_page="0" type="intro">
    <SectionTitle>
2 Previous Work
</SectionTitle>
    <Paragraph position="0"> A common formulation of an unsupervised part-of-speech tagger takes the form of a hidden Markov model (HMM), where the states correspond to part-of-speech tags, t</Paragraph>
    <Paragraph position="2"> each time a state is visited. The training of HMM-based taggers involves estimating lexical</Paragraph>
    <Paragraph position="4"> ). The ultimate goal of HMM training is to find the model that maximizes the probability of a given training text, which can be done easily using the forward-backward, or Baum-Welch algorithm (Baum et al 1970, Bahl, Jelinek and Mercer, 1983). These model probabilities are then used in conjunction with the Viterbi algorithm (Viterbi, 1967) to find the most probable sequence of part-of-speech tags for a given sentence.</Paragraph>
    <Paragraph position="5"> When estimating tag sequence probabilities, an HMM tagger, such as that described in Merialdo (1991), typically takes into account a history consisting of the previous two tags -- e.g. we</Paragraph>
    <Paragraph position="7"> ). Kupiec (1992) describes a modified trigram HMM tagger in which he computes word classes for which lexical probabilities are then estimated, instead of computing probabilities for individual words. Words contained within the same equivalence classes are those which possess the same set of possible parts of speech.</Paragraph>
    <Paragraph position="8"> Another highly-accurate method for part-of-speech tagging from unlabelled data is Brill's unsupervised transformation-based learner (UTBL) (Brill, 1995). Derived from his supervised transformation-based tagger (Brill, 1992), UTBL uses information from the distribution of unambiguously tagged data to make informed labeling decisions in ambiguous contexts. In contrast to the HMM taggers previously described, which make use of contextual information coming from the left side only, UTBL considers both left and right contexts.</Paragraph>
    <Paragraph position="9"> Reported tagging accuracies for these methods range from 87% to 96%, but are not directly comparable. Kupiec's HMM class-based tagger, when trained on a sample of 440,000 words of the original Brown corpus, obtained a test set accuracy of 95.7%. Brill assessed his UTBL tagger using 350,000 words of the Brown corpus for training, and found that 96% of words in a separate 200,000-word test set could be tagged correctly. Furthermore, he reported test set accuracy of 95.1% for the UTBL tagger trained on 120,000 words of Penn Treebank and tested on a separate test set of 200,000 words taken from the same corpus. Finally, using 1 million words from the Associated Press for training, Merialdo's trigram tagger was reported to have an accuracy of 86.6%. This tagger was assessed using a tag set other than that which is employed by the Penn Treebank.</Paragraph>
    <Paragraph position="10"> Unfortunately none of these results can be directly compared to the others, as they have used different, randomized and irreproducible splits of training and test data (Brill and Kupiec), different tag sets (Merialdo) or different corpora altogether. The HMM taggers we have discussed so far are similar in that they use condition only on left context when estimating probabilities of tag sequences. Recently, Toutanova et al. (2003) presented a supervised conditional Markov Model part-of-speech tagger (CMM) which exploited information coming from both left and right contexts. Accuracy on the Penn Treebank using two tags to the left as features in addition to the current tag was 96.10%. When using tag to the left and tag to the right as features in addition to the current tag, accuracy improved to 96.55%.</Paragraph>
    <Paragraph position="11"> Lafferty et al. (2001) also compared the accuracies of several supervised part-of-speech tagging models, while examining the effect of directionality in graphical models. Using a 50%50% train-test split of the Penn Treebank to assess HMMs, maximum entropy Markov models (MEMMs) and conditional random fields (CRFs), they found that CRFs, which make use of observation features from both the past and future, outperformed HMMs which in turn outperformed MEMMs.</Paragraph>
  </Section>
class="xml-element"></Paper>