<?xml version="1.0" standalone="yes"?>
<Paper uid="P02-1060">
  <Title>Named Entity Recognition using an HMM-based Chunk Tagger</Title>
  <Section position="6" start_page="4321" end_page="4321" type="evalu">
    <SectionTitle>
5 Experimental Results
</SectionTitle>
    <Paragraph position="0"> In this section, we report the experimental results of our system for English NER on the MUC-6 and MUC-7 NE shared tasks, as shown in Table 6, and then examine the impact of training data size on performance using the MUC-7 training data. In each experiment, the MUC dry-run data serves as the held-out development data and the MUC formal test data as the held-out test data.</Paragraph>
    <Paragraph position="1"> For both the MUC-6 and MUC-7 NE tasks, Table 7 shows the performance of our system under the MUC evaluation, while Figure 1 compares our system with others. Here, precision (P) measures the number of correct NEs in the answer file over the total number of NEs in the answer file, and recall (R) measures the number of correct NEs in the answer file over the total number of NEs in the key file, while the F-measure is the weighted harmonic mean of precision and recall: F = ((b^2 + 1) * P * R) / (b^2 * P + R), with b = 1. The results show that the performance is significantly better than that reported by any other machine-learning system. Moreover, the performance is consistently better than that of systems based on handcrafted rules.</Paragraph>
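The MUC-style scoring described above can be sketched as follows. This is a hedged simplification: it counts exact-match NEs only, whereas the real MUC scorer also awards partial credit for type-only or text-only matches; the function name and the example counts are illustrative, not from the paper.

```python
def precision_recall_f(correct, answer_total, key_total, beta=1.0):
    """Compute precision, recall, and the weighted harmonic mean F.

    correct      -- number of correct NEs in the answer file
    answer_total -- total NEs proposed in the answer file
    key_total    -- total NEs in the key (gold) file
    beta         -- the b weight in the F-measure; b = 1 in the paper
    """
    p = correct / answer_total if answer_total else 0.0
    r = correct / key_total if key_total else 0.0
    if p + r == 0.0:
        return p, r, 0.0
    f = (beta ** 2 + 1) * p * r / (beta ** 2 * p + r)
    return p, r, f

# Illustrative counts only: 90 correct out of 100 proposed and 100 gold NEs.
p, r, f = precision_recall_f(correct=90, answer_total=100, key_total=100)
```

With b = 1 the formula reduces to the familiar balanced F1, 2PR / (P + R).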
    <Paragraph position="2">  With any learning technique, one important question is how much training data is required to achieve acceptable performance; more generally, how does performance vary as the training data size changes? Figure 2 shows the result for the MUC-7 NE task. It shows that about 200KB of training data already gives a performance of 90%, while reducing the training data to 100KB leads to a significant drop in performance. It also shows that our system still has some room for performance improvement, which may be due to the complex word feature and the corresponding data sparseness problem in our system. One class of errors comes from cases where there is no explicit indicator information in or around the NE and no reference to other NEs in the macro context of the document. The NEs contributed by f are always well-known ones, e.g. Microsoft, IBM and Bach (a composer), which are introduced in texts without much helpful context.</Paragraph>
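The training-size experiment above can be sketched as a learning curve: train on increasing prefixes of the data and score each model on a fixed test set. The "tagger" below is a hypothetical stand-in (a per-word majority-tag memorizer), not the paper's HMM-based chunk tagger, and the toy corpus is invented for illustration.

```python
from collections import Counter

def train(pairs):
    # Majority tag per word seen in training (stand-in for the HMM tagger).
    votes = {}
    for word, tag in pairs:
        votes.setdefault(word, Counter())[tag] += 1
    return {w: c.most_common(1)[0][0] for w, c in votes.items()}

def accuracy(model, test):
    # Unseen words default to the non-entity tag "O".
    hits = sum(1 for word, tag in test if model.get(word, "O") == tag)
    return hits / len(test)

# Invented toy data; real experiments would subsample the MUC-7 training set.
data = [("IBM", "ORG"), ("Bach", "PER"), ("Microsoft", "ORG"), ("Paris", "LOC")]
test = [("IBM", "ORG"), ("Paris", "LOC")]

# Learning curve over growing training-set sizes.
curve = [accuracy(train(data[:n]), test) for n in (2, 4)]
```

On this toy data the curve rises as more training pairs are seen, mirroring the qualitative shape of Figure 2; with real data the curve would flatten once the feature space is adequately covered.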
  </Section>
</Paper>