<?xml version="1.0" standalone="yes"?>
<Paper uid="P98-2251">
  <Title>Predicting Part-of-Speech Information about Unknown Words using Statistical Methods</Title>
  <Section position="6" start_page="1505" end_page="1506" type="evalu">
    <SectionTitle>
5 Results
</SectionTitle>
    <Paragraph position="0"> The results from the initial experiments are shown in Table 1. Some trends can be seen in this data. For example, choosing between the prefix distribution and the suffix distribution using entropy calculations clearly improves performance over the baseline method (by about 4-5% overall), and using only suffix distributions improves it by another 4-5%. The use of context improves the likelihood that the correct tag is among the n-best predictions for small values of n (by nearly 4% for 1-best), but it is less important for larger values of n. On the other hand, smoothing the distributions with open-class tag distributions offers no improvement for the 1-best results, but does improve the n-best performance for larger values of n.</Paragraph>
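The entropy-based selection described above could be sketched as follows. This is an illustrative reconstruction, not the authors' code: the function names and the example tag distributions are hypothetical, and the only assumption carried over from the text is that the lower-entropy (more peaked, hence more informative) affix distribution is preferred.

```python
import math

def entropy(dist):
    """Shannon entropy (in bits) of a tag probability distribution."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def choose_distribution(prefix_dist, suffix_dist):
    """Pick whichever affix-based tag distribution has lower entropy,
    i.e. is more sharply peaked and therefore more predictive."""
    if entropy(suffix_dist) <= entropy(prefix_dist):
        return suffix_dist
    return prefix_dist

# Hypothetical tag distributions for a single unknown word:
prefix_dist = {"NN": 0.4, "VB": 0.3, "JJ": 0.3}   # flat -> high entropy
suffix_dist = {"NN": 0.8, "JJ": 0.2}              # peaked -> low entropy
chosen = choose_distribution(prefix_dist, suffix_dist)
```

Here the suffix distribution would be chosen, since its entropy (about 0.72 bits) is well below that of the flatter prefix distribution (about 1.57 bits).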
    <Paragraph position="1"> Overall, the best-performing system was the one using both context and open-class smoothing, relying only on suffix information. To offer a more valid comparison between this work and Mikheev's latest work (Mikheev, 1997), the accuracies were measured again while ignoring mistags between NN and NNP (common and proper nouns), as Mikheev did. This improved the results to 77.5% for 1-best, 89.9% for 2-best, and 94.9% for 3-best. Mikheev obtains 87.5% accuracy when using a full HMM tagging system with his cascading tagger. It should be noted that our system does not use a full tagger, and presumably a full tagger would correctly disambiguate many of the words for which the correct tag was not the 1-best choice. Also, Mikheev's work suffers from reduced coverage, while our predictor offers a prediction for every unknown word encountered.</Paragraph>
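The evaluation scheme used for the comparison above can be sketched as a small scoring function. This is a hypothetical sketch of the metric, not the paper's evaluation code: it counts a word as correct when its gold tag appears among the top-n predicted tags, with an option to conflate NN and NNP before scoring, as in the Mikheev-style comparison.

```python
def nbest_accuracy(predictions, gold_tags, n, collapse_proper_nouns=False):
    """Fraction of words whose gold tag appears among the top-n
    predicted tags. If collapse_proper_nouns is True, NN and NNP
    (common vs. proper noun) are treated as the same tag."""
    def norm(tag):
        return "NN" if collapse_proper_nouns and tag in ("NN", "NNP") else tag

    correct = 0
    for ranked, gold in zip(predictions, gold_tags):
        topn = {norm(t) for t in ranked[:n]}
        if norm(gold) in topn:
            correct += 1
    return correct / len(gold_tags)

# Hypothetical ranked predictions for two unknown words:
preds = [["NNP", "VB"], ["JJ", "NN"]]
gold = ["NN", "NN"]
strict_1best = nbest_accuracy(preds, gold, n=1)                          # 0.0
lenient_1best = nbest_accuracy(preds, gold, n=1, collapse_proper_nouns=True)  # 0.5
```

With NN/NNP collapsed, the first word's NNP prediction matches its NN gold tag, which mirrors how the conflated evaluation raises the reported accuracy figures.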
  </Section>
</Paper>