<?xml version="1.0" standalone="yes"?> <Paper uid="E95-1020"> <Title>Distributional Part-of-Speech Tagging Hinrich Schfitze</Title> <Section position="3" start_page="0" end_page="141" type="relat"> <SectionTitle> 2 Related Work </SectionTitle> <Paragraph position="0"> The simplest part-of-speech taggers are bigram or trigram models (Church, 1989; Charniak et al., 1993). They require a relatively large tagged training text. Transformation-based tagging as introduced by Brill (1993) also requires a hand-tagged text for training. No pretagged text is necessary for Hidden Markov Models (Jelinek, 1985; Cutting et al., 1991; Kupiec, 1992). Still, a lexicon is needed that specifies the possible parts of speech for every word. Brill and Marcus (1992a) have shown that the effort necessary to construct the part-of-speech lexicon can be considerably reduced by combining learning procedures and a partial part-of-speech categorization elicited from an informant.</Paragraph> <Paragraph position="1"> The present paper is concerned with tagging languages and sublanguages for which no a priori knowledge about grammatical categories is available, a situation that occurs often in practice (Brill and Marcus, 1992a).</Paragraph> <Paragraph position="2"> Several researchers have worked on learning grammatical properties of words. Elman (1990) trains a connectionist net to predict words, a process that generates internal representations that reflect grammatical category. Brill et al. (1990) try to infer grammatical category from bi-gram statistics. Finch and Chater (1992) and Finch (1993) use vector models in which words are clustered according to the similarity of their close neighbors in a corpus. Kneser and Ney (1993) present a probabilistic model for entropy maximization that also relies on the immediate neighbors of words in a corpus. Biber (1993) applies factor analysis to collocations of two target words (&quot;certain&quot; and &quot;right&quot;) with their immediate neighbors.</Paragraph> <Paragraph position="3"> What these approaches have in common is that they classify words instead of individual occurrences. Given the widespread part-of-speech ambiguity of words this is problematicJ How should a word like &quot;plant&quot; be categorized if it has uses both as a verb and as a noun? How can a categorization be considered meaningful if the infinitive marker &quot;to&quot; is not distinguished from the homophonous preposition? In a previous paper (Schfitze, 1993), we trained a neural network to disambiguate part-of-speech *Although Biber (1993) classifies collocations, these can also be ambiguous. For example, &quot;for certain&quot; has both senses of &quot;certain&quot;: &quot;particular&quot; and &quot;sure&quot;.</Paragraph> <Paragraph position="4"> word side nearest neighbors onto left onto right seemed left seemed right into toward away off together against beside around down reduce among regarding against towards plus toward using unlike appeared might would remained had became could must should seem seems wanted want going meant tried expect likely Table h Words with most similar left and right neighbors for &quot;onto&quot; and &quot;seemed&quot;. using context; however, no information about the word that is to be categorized was used. This scheme fails for cases like &quot;The soldiers rarely come home.&quot; vs. &quot;The soldiers will come home.&quot; where the context is identical and information about the lexical item in question (&quot;rarely&quot; vs. 
<Paragraph position="3"> What these approaches have in common is that they classify words instead of individual occurrences. Given the widespread part-of-speech ambiguity of words, this is problematic.* How should a word like &quot;plant&quot; be categorized if it has uses both as a verb and as a noun? How can a categorization be considered meaningful if the infinitive marker &quot;to&quot; is not distinguished from the homophonous preposition?
*Although Biber (1993) classifies collocations, these can also be ambiguous. For example, &quot;for certain&quot; has both senses of &quot;certain&quot;: &quot;particular&quot; and &quot;sure&quot;.</Paragraph>
<Paragraph position="4">
word    side   nearest neighbors
onto    left   into toward away off together against beside around down
onto    right  reduce among regarding against towards plus toward using unlike
seemed  left   appeared might would remained had became could must should
seemed  right  seem seems wanted want going meant tried expect likely
Table 1: Words with most similar left and right neighbors for &quot;onto&quot; and &quot;seemed&quot;.
In a previous paper (Sch&#252;tze, 1993), we trained a neural network to disambiguate part-of-speech using context; however, no information about the word that is to be categorized was used. This scheme fails for cases like &quot;The soldiers rarely come home.&quot; vs. &quot;The soldiers will come home.&quot;, where the context is identical and information about the lexical item in question (&quot;rarely&quot; vs. &quot;will&quot;) is needed in combination with context for correct classification. In this paper, we will compare two tagging algorithms, one based on classifying word types and one based on classifying words-plus-context.</Paragraph> </Section> </Paper>