<?xml version="1.0" standalone="yes"?>
<Paper uid="C96-2114">
  <Title>Linguistic Indeterminacy as a Source of Errors in Tagging</Title>
  <Section position="4" start_page="0" end_page="676" type="metho">
    <SectionTitle>
3 Manual and Automatic Markup
</SectionTitle>
    <Paragraph position="0"> The SUC has been annotated by a process that combines automatic and manual steps. The raw texts get their first analysis from the SWETWOL computerized dictionary (Karlsson 1992) and then pass a step of postprocessing to reach the analysis described in the SUC tagging manual (Ejerhed et al.</Paragraph>
    <Paragraph position="1"> 1992). The coverage of the dictionary is high, but the degree of ambiguity in Swedish is also high, actually higher than in English, so the texts return from dictionary lookup with 51% of the word tokens carrying more than one analysis.</Paragraph>
    <Paragraph position="2"> In the next step, a human annotator is to mark for each ambiguous word which of the suggested readings is the correct one and for each unambiguous word whether the suggested reading is correct. The output of this step is used as the 'man version' in the  man-machine comparison (or rather the 'woman version' as the majority of the annotators were female students).</Paragraph>
    <Paragraph position="3"> The entire corpus of 1 million words has passed through this stage of manual disambiguation and annotation, which makes it an important standard that can be used as a tool, e.g., when training probabilistic taggers. The goal of the experiment reported in Kallgren (1996) was, however, to compare 'sheer' machine tagging to the performance of human annotators. The tagger used is thus one that does not need tagged and disambiguated material to be trained on, namely the XPOST originally constructed at Xerox Parc (Cutting et al. 1992, Cutting and Pedersen 1993).</Paragraph>
    <Paragraph position="4"> The XPOST algorithm has been transferred to other languages than English. Douglass Cutting himself made the first Swedish version of it (Cutting 1993) and a later version has been implemented by Gunnar Eriksson (Eriksson 1995) and refined by Tomas Svensson (Svensson 1996). It is this latter version that has been used in the experiment.</Paragraph>
    <Paragraph position="5"> Starting from a set of texts and a lexicon, the XPOST looks up all words in the texts and assigns to them a set of one or more readings. The words are then classified into so-called ambiguity classes according to which set of readings they have been assigned. The training is performed on ambiguity classes and not on individual word tokens. Kallgren (1996) gives a more covering description of how XPOST is used on the Swedish material and also sketches the major differences between this algorithm and some others used for tagging, such as PARTS (Church 1988) and VOLSUNGA (DeRose 1988).</Paragraph>
    <Paragraph position="6"> A characteristic tbature of the SUC is its high number of different tags. The number of part-of-speech tags used in the SUC is 21. With the addition of a category for foreign words the number of major categories used is 22 (plus three tags for punctuation), which is in no way a remarkable amount, but the SUC tags are composite. This means that all words have one tag for part-of-speech, but for many parts-of speech this tag is followed by other tags for various morphological features, Where, e.g., English nouns have a variation between two possible values, singular and plural, the Swedish pattern allows for 1 x 2 x 2 x 2 x 3 = 24 different tags, specifying not only part-of-speech but also gender, number, definiteness, and case. The number of different tags actually occurring in texts is mostly around 180.</Paragraph>
    <Paragraph position="7"> A remarkable fact is that the high number of different tags does not seem to influence the training and performance of probabilistic taggers negatively in the way that might have been expected. The morphological errors in the material are not disturbingly many, considering the fact that all Swedish content words have such features.</Paragraph>
    <Paragraph position="8"> Morphological agreement provides enough information to make it possible fbr an atttomatic tagger to pick the right form in most cases. This sensitivity to close context probably explains why the high number of tags does not influence performance when it comes to picking an alternative, but it does not explain why training is so little affected by the high number of different observed situations.</Paragraph>
    <Section position="1" start_page="676" end_page="676" type="sub_section">
      <SectionTitle>
4 Results from a Comparison between 'Man' and 'Machine'
</SectionTitle>
      <Paragraph position="0"> The automatic tagger was run on 50,000 words of text not used in the training of the tagger. The output was compared to the same texts with manual disambiguation. All instances where the two differ have been manually inspected. The evaluation of the results is far from trivial. The 'correctness' of the tagging must be judged relative to some norm. One such norm is the SUC tagging manual (Ejerhed et al. 1992).</Paragraph>
      <Paragraph position="1"> Although it is very comprehensive and explicit, no manual can ever foresee and cover all the tricky instances that will occur in unrestricted language. Another norm is the intuition of the working linguist, with the possibility of consulting other people to get their intuitions. This also has clear drawbacks. There will always remain a set of doubtful cases which do not necessarily depend on deficits in the linguistic description. Be it here sufficient to say that in general I prefer the term 'consistent (with a certain norm)' instead of the term 'correct'; nevertheless, in the following discussion I will call the deviances from the applied norm 'errors'.</Paragraph>
      <Paragraph position="2"> Table I gives the errors found in a material of 50,498 words sorted according to whether they occurred in automatically or manually tagged text or both. Where both have an error, the errors can sometimes be of the same type, sometimes of different types.</Paragraph>
      <Paragraph position="3">  The automatic tagger is truly automatic in that it has not at all been adjusted to the specific task at hand. With fairly little trimming it could well reach a level of at least 95-96% consistence with the human annotator but now the basic idea was to test it 'raw'. Humans are not infallible, if anyone thought so, 1.2% of the errors are man-made. It is still a consolation to see that human annotators are seven times as good as computers when it comes to disambiguation.</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="676" end_page="678" type="metho">
    <SectionTitle>
5 Types of Errors
</SectionTitle>
    <Paragraph position="0"> The errors occurring in the material can be classified according to type. By 'error type' is here meant a classification of tag pairs with an erroneous tag followed by the correct tag, e.g., an error can be of the type 'preposition suggested where it should have been an adverb'. This classification shows both which parts-of-speech are most often involved in errors and which readings of a particular word are most often mixed up with each other, and in which direction the errors mostly go. The classification can also give hints about what could possibly be done about the errors.</Paragraph>
    <Section position="1" start_page="677" end_page="677" type="sub_section">
      <SectionTitle>
5.1 Errors among Content Words
</SectionTitle>
      <Paragraph position="0"> It is clear that content words (here: nouns, verbs, adjectives, participles, proper nouns) are seldom involved in errors. Considering the large proportion of the number of running words that these major categories cover, this is even more remarkable. If words from these categories are ever mixed up, they are mixed up in very specific patterns, namely with themselves (as when different inflected forms of the same stem coincide) or they are mixed up with words they are related to (e.g., by derivation). Among the ten most common error types for either automatic or manual disambiguation, there are actually only two that involve content words.</Paragraph>
      <Paragraph position="1"> One of these error types is almost exclusively in the realm of automatic disambiguation. Swedish nouns are inflected according to five different declensions, one of which has zero plural. The automatic tagger sometimes mistakes singular nouns of that declension without modifiers for plurals, but never the other way round. This is just as could be expected; 'naked' plurals are far more common than 'naked' singulars in all declinations and will thus be favoured by the statistics. To remedy this situation, it would probably be necessary to have a phrasal lexicon, as most instances of naked singular nouns appear in lexicalized phrases.</Paragraph>
      <Paragraph position="2"> As has been pointed out for English material (cf.</Paragraph>
      <Paragraph position="3"> below) different inflections of the same verb can get mixed up. This phenomenon can be found in Swedish too, but not very frequently.</Paragraph>
      <Paragraph position="4"> The other common error type involving content words concerns adverbs derived from adjectives. The most frequent derivational pattern for Swedish adverbs makes them identical to neutral singular indefinite adjectives. Here both manual and automatic disambiguation leads to errors but in different directions. The automatic tagger suggests adverb where there should have been an adjective, while human annotators sometimes call an adverb an adjective. Both types mainly occur post-verbally and often at the very end of a graphic sentence, where it may be difficult to decide whether the concerned word is a predicative adjective or an adverb. It may well be that a subcategorization of verbs might eliminate the problem, but this is a large task to implement both in the lexicon and in the tagger.</Paragraph>
      <Paragraph position="5"> However, these errors are neither the most frequent nor the most disturbing ones. Instead, it is the function words that get mixed up in all their different uses. Actually, almost all errors concern function words and a scrutiny of them makes it clear how doubtful the whole concept of correctness is in this connection.</Paragraph>
    </Section>
    <Section position="2" start_page="677" end_page="678" type="sub_section">
      <SectionTitle>
5.2 Errors among Function Words
</SectionTitle>
      <Paragraph position="0"> The degree of homography - or is it polysemy? - is generally higher among function words than among content words which, of course, leads to more situations where errors can occur. Furthermore, the number of readings connected with each word token is highly dependent on the linguistic description used as a basis for the tagging system, its theoretical assumptions and the granularity of the system, among other things.</Paragraph>
      <Paragraph position="1"> The ten words most frequently involved in errors in the studied material are (with approximate translations and number of errors in parenthesis) the following: 'det' (it~the in neuter gender, 330 errors), 'ett' (a/one in neuter, 254), 'sore' (rel.pron and adv., 180), 'den' (it~the in common gender, 153), 'om' (if, about, 122), 'en' (a/one in common, 109), 'att' (that, inf.marker, 83), 'sS.' (so, 79), 'ut' (out, 73), 'fOr' (for, 70). They are all high frequency function words that play many different syntactic roles depending on their context.</Paragraph>
      <Paragraph position="2"> One interesting fact that the classification into error types makes clear is that all the different readings of these words do not get mixed tip at random but in rather strong, often mirror-like patterns. Let us take the word 'om' as an example. It can be used as adverb, preposition, or subordinating conjunction and all the six possible mistagged combinations do occur, but with quite varying frequency. Three of them are almost neglectable and one has a strong unidirectional pattern where the reading as an adverb (more precisely a verbal particle) is often taken for a preposition. This is an instance of the by far most common error type in the entire material, and is of course directly dependent on the way verbal particles are treated in the underlying linguistic description.</Paragraph>
      <Paragraph position="3"> The remaining two error types are the most interesting ones. They form a bidirectional pattern where the reading as a preposition is confused with the reading as a subordinating conjunction.</Paragraph>
      <Paragraph position="4"> Preposition instead of subjunction appears 40 times, subjunction instead of preposition 33 times, altogether 77 of the 122 errors connected with the word 'om'.</Paragraph>
      <Paragraph position="5"> All errors on this word were machine-induced, except 8 cases where human annotators took a subjunction to  be a preposition. Some of the error situations may be regarded as truly undecidable.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>