<?xml version="1.0" standalone="yes"?>
<Paper uid="C96-2130">
  <Title>Learning Part-of-Speech Guessing Rules from Lexicon: Extension to Non-Concatenative Operations*</Title>
  <Section position="6" start_page="773" end_page="774" type="evalu">
    <SectionTitle>
4 Evaluation
</SectionTitle>
    <Paragraph position="0"> The direct performance measures of the rule-sets gave us the grounds for the comparison and selection of the best performing guessing rule-sets.</Paragraph>
    <Paragraph position="1"> The task of unknown word guessing is, however, a subtask of the overall part-of-speech tagging process. Thus we are mostly interested in how the advantage of one rule-set over another will affect the tagging performance. So, we performed an independent evaluation of the impact of the word guessing sets on tagging accuracy. In this evaluation we used the cascading application of prefix rules, suffix rules and ending-guessing rules as described in (Mikheev, 1996). We measured whether the addition of the suffix rules with alterations increases the accuracy of tagging in comparison with the standard rule-sets. In this experiment we used a tagger which was a C++ re-implementation of the LISP-implemented HMM Xerox tagger described in (Kupiec, 1992), trained on the Brown Corpus. For words which failed to be guessed by the guessing rules we applied the standard method of classifying them as common nouns (NN) if they are not capitalised inside a sentence and proper nouns (NP) otherwise.</Paragraph>
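The cascade described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the rule-set representation (a mapping from affix to candidate tags) and all function names are assumptions made for clarity.

```python
# Sketch of the cascading unknown-word guesser: prefix rules first, then
# suffix rules, then ending-guessing rules, with the standard NN/NP
# fallback for words no rule covers. The affix -> tags dictionaries are a
# hypothetical simplification of the paper's rule-sets.

def guess_pos(word, prefix_rules, suffix_rules, ending_rules,
              sentence_initial=False):
    """Return a list of candidate POS tags for an unknown word."""
    lowered = word.lower()
    # 1. Prefix rules.
    for prefix, tags in prefix_rules.items():
        if lowered.startswith(prefix):
            return tags
    # 2. Suffix rules (in the extended set these include suffixes with
    #    alterations in the last letter, e.g. "tried" -> "try").
    for suffix, tags in suffix_rules.items():
        if lowered.endswith(suffix):
            return tags
    # 3. Ending-guessing rules on the last few characters.
    for ending, tags in ending_rules.items():
        if lowered.endswith(ending):
            return tags
    # 4. Fallback: proper noun if capitalised inside a sentence,
    #    common noun otherwise.
    if word[0].isupper() and not sentence_initial:
        return ["NP"]
    return ["NN"]
```

A word like &amp;quot;merging&amp;quot; would be caught by a suffix or ending rule, while an uncovered capitalised word mid-sentence falls through to NP.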
    <Paragraph position="2"> In the evaluation of tagging accuracy on unknown words we paid attention to two metrics.</Paragraph>
    <Paragraph position="3"> First we measure the accuracy of tagging solely on unknown words (i.e., words not listed in the training lexicon): UnknownScore = CorrectlyTaggedUnknownWords / TotalUnknownWords. This metric gives us the exact measure of how the tagger has done when equipped with different guessing rule-sets. In this case, however, we do not account for the known words which were mis-tagged because of the unknown ones. To put that aspect in perspective we measure the overall tagging performance: TotalScore = CorrectlyTaggedWords / TotalWords. To perform such evaluation we tagged several texts of different origins, excluding texts from the Brown Corpus. These texts were not seen at the training phase, which means that neither the tagger nor the guesser had been trained on them, and they naturally contained words unknown to the lexicon. For each text we performed two tagging experiments. In the first experiment we tagged the text with the full-fledged Brown Corpus lexicon and hence had only those unknown words which naturally occur in this text. In the second experiment we tagged the same text with a lexicon which contained only closed-class words (articles, prepositions, conjunctions, etc.) and short words. This small lexicon contained only 5,456 entries out of the 53,015 entries of the original Brown Corpus lexicon. All other words were considered unknown and had to be guessed by the guesser.</Paragraph>
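The two metrics above can be computed from parallel gold and predicted tag sequences. This is a minimal sketch; the function and variable names are illustrative and not taken from the paper's implementation.

```python
# Compute UnknownScore and TotalScore for one tagged text.
# A word counts as unknown when it is absent from the lexicon.

def tagging_scores(words, gold_tags, predicted_tags, lexicon):
    """Return (unknown_score, total_score) for a tagged text."""
    total = correct = unknown = unknown_correct = 0
    for word, gold, pred in zip(words, gold_tags, predicted_tags):
        total += 1
        if gold == pred:
            correct += 1
        if word not in lexicon:  # unknown word: not in the training lexicon
            unknown += 1
            if gold == pred:
                unknown_correct += 1
    unknown_score = unknown_correct / unknown if unknown else 1.0
    total_score = correct / total if total else 1.0
    return unknown_score, total_score
```

UnknownScore isolates the guesser's contribution, while TotalScore also captures known words mis-tagged because of errors on neighbouring unknown ones.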
    <Paragraph position="4"> In both experiments we measured tagging accuracy when tagging with the guesser equipped with the standard Prefix+Suffix+Ending rule-sets and with the additional rule-set of suffixes with alterations in the last letter.</Paragraph>
    <Paragraph position="5"> Table 2 presents the results of a typical example of such experiments. There we tagged a text of 5,970 words. This text was found to contain 347 words unknown to the Brown Corpus lexicon and, as can be seen, the additional rule-set did not cause any improvement to the tagging accuracy. Then we tagged the same text using the small lexicon. Out of the 5,970 words of the text, 2,215 were unknown to the small lexicon. Here we noticed that the additional rule-set improved the tagging accuracy on unknown words by about 1%: there were 21 more word-tokens tagged correctly because of the additional rule-set. Among these words were: &amp;quot;classified&amp;quot;, &amp;quot;applied&amp;quot;, &amp;quot;tries&amp;quot;, &amp;quot;tried&amp;quot;, &amp;quot;merging&amp;quot;, &amp;quot;subjective&amp;quot;, etc.</Paragraph>
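The reduced lexicon used in the second experiment could be built along these lines. The closed-class tag list and the length threshold below are assumptions for illustration only; the paper does not specify the exact cut-offs.

```python
# Sketch of constructing the small lexicon: keep only closed-class
# entries and short words. CLOSED_CLASS_TAGS and MAX_SHORT_LEN are
# hypothetical choices, not values from the paper.

CLOSED_CLASS_TAGS = {"AT", "IN", "CC", "PPS", "MD"}  # articles, preps, conj., ...
MAX_SHORT_LEN = 3  # assumed "short word" threshold

def build_small_lexicon(full_lexicon):
    """full_lexicon: dict mapping word -> set of POS tags."""
    return {word: tags for word, tags in full_lexicon.items()
            if tags & CLOSED_CLASS_TAGS or len(word) <= MAX_SHORT_LEN}
```

Every word outside the resulting lexicon is then treated as unknown and handed to the guesser.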
  </Section>
</Paper>