<?xml version="1.0" standalone="yes"?>
<Paper uid="W03-0427">
  <Title>Memory-based one-step named-entity recognition: Effects of seed list features, classifier stacking, and unannotated data</Title>
  <Section position="7" start_page="0" end_page="0" type="concl">
    <SectionTitle>
6 Discussion
</SectionTitle>
    <Paragraph position="0"> In this paper we have presented a memory-based named-entity recognition system that chunks and labels named entities in one step. We reported on three extensions: incorporating seed list information, second-stage stacking, and adding selected instances from classified unannotated data to the training material.</Paragraph>
    <Paragraph position="1"> First, we trained and tested a basic classifier without any of the extensions. Subsequently, we found that incorporating seed list information as binary features does not always help; the seed lists had a positive effect on only two of the four test sets. There can be several explanations for this, such as the quality of the seed lists, the parameter setting chosen in the iterative deepening process, or overestimated weights given to the features by the classifier. Due to the tight time schedule, we could not investigate this further.</Paragraph>
    <Paragraph position="2"> Second, second-stage stacking consistently improves generalisation performance on all test sets compared to the seed-list-extended systems.</Paragraph>
    <Paragraph position="3"> Third, only in the final experiment did we add selected classified instances from unannotated data. This gave a reasonable additional boost in performance on the English development set, where the system attains an overall F-rate of 86.97 (an error reduction of 8% over the initial classifier). The same effect was seen on both German test sets, on which the combination of the three extensions achieved F-scores of 59.58 (5% error reduction) and 63.02 (5% error reduction). This effect is not seen on the English test set; here the initial classifier performs best. This can partly be explained by the fact that the last two extensions were built upon the first extension, which had a markedly lower score than the initial classifier to begin with.</Paragraph>
    <Paragraph position="4"> In sum, our results suggest that two of the three extensions, the stacking method and the unlabeled instance selection method, have been consistently helpful. Seed list features, however, have not.</Paragraph>
  </Section>
</Paper>