<?xml version="1.0" standalone="yes"?>
<Paper uid="W03-0422">
  <Title>Learning a Perceptron-Based Named Entity Chunker via Online Recognition Feedback</Title>
  <Section position="5" start_page="0" end_page="0" type="evalu">
    <SectionTitle>
5 Experiments and Results
</SectionTitle>
    <Paragraph position="0"> A list of functional words was automatically extracted from each language's training set by selecting the lower-cased words that appear inside NEs three times or more. For each language, we also constructed a gazetteer from the NEs in the training set; during training, only a random 40% of its entries were considered.</Paragraph>
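The extraction described above can be sketched as follows. This is a minimal illustration, not the authors' code: the input format (a list of annotated entity phrases, each a token list) and the function names are assumptions.

```python
from collections import Counter
import random


def extract_functional_words(ne_phrases, min_count=3):
    """Collect lower-cased words occurring inside NEs at least min_count times.

    ne_phrases: list of entity phrases, each a list of tokens (assumed format).
    """
    counts = Counter(
        tok for phrase in ne_phrases for tok in phrase if tok.islower()
    )
    return {word for word, c in counts.items() if c >= min_count}


def sample_gazetteer(entries, fraction=0.4, seed=0):
    """Keep a random fraction of the gazetteer entries for training."""
    rng = random.Random(seed)
    return [e for e in entries if rng.random() < fraction]
```

For example, with the toy phrases `["Bank", "of", "America"]`, `["University", "of", "Chicago"]`, and `["Museum", "of", "Art"]`, only "of" reaches the frequency threshold and is kept as a functional word.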
    <Paragraph position="1"> We performed parameter tuning on the English language. Concerning the features, we set the window sizes (Lw and Lp) to 3 (we tested values 2 and 3), and we discarded features occurring fewer than 5 times in the data. When moving to German, we found it better to work with lemmas instead of word forms.</Paragraph>
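The two feature decisions above, a symmetric context window and a frequency cutoff, can be sketched as below. This is an illustrative reconstruction under assumed representations (tokens as strings, features as string lists), not the system's actual feature extractor.

```python
from collections import Counter


def window_features(tokens, i, w=3):
    """Context words in a symmetric window of size w around position i,
    padding past the sentence boundaries."""
    padded = ["<pad>"] * w + tokens + ["<pad>"] * w
    j = i + w  # position of token i in the padded sequence
    return padded[j - w:j] + padded[j + 1:j + w + 1]


def prune_rare_features(feature_lists, min_freq=5):
    """Drop features seen fewer than min_freq times across all examples."""
    counts = Counter(f for feats in feature_lists for f in feats)
    vocab = {f for f, c in counts.items() if c >= min_freq}
    return [[f for f in feats if f in vocab] for feats in feature_lists]
```

`window_features(["a", "b", "c"], 0, w=2)` yields two padding symbols followed by the two right-context words, mirroring a window of size 2 at a sentence start.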
    <Paragraph position="2"> Concerning the learning algorithm, we evaluated kernel degrees from 1 to 5. Degrees 2 and 3 performed somewhat better than the others, and we chose degree 2. We then ran the algorithm through the English training set for up to five epochs, and through the German training set for up to three epochs. For both languages, performance was still slightly increasing as more training sentences were visited; unfortunately, we were not able to run the algorithm until performance stabilized. Table 1 summarizes the results obtained on all sets. Clearly, the NERC task on English is much easier than on German. The figures indicate that the moderate performance on German is mainly caused by low recall, especially for ORG and MISC entities. It is interesting to note that while in English the performance is much better on the development set, in German we achieve better results on the test set. This seems to indicate that the difference in performance between development and test sets is due to irregularities in the NEs appearing in each set, rather than to overfitting in our learning strategy.</Paragraph>
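A perceptron with a degree-2 polynomial kernel, as selected above, can be sketched in dual form: the model stores its mistakes and scores new examples against them through the kernel. This is a generic kernel-perceptron sketch over binary feature sets, not the paper's online recognition-feedback learner; the kernel form (x·y + 1)^d is an assumption.

```python
def poly_kernel(x, y, d=2):
    """Polynomial kernel over binary feature sets: (x . y + 1)^d,
    where the dot product counts shared features."""
    return (len(x & y) + 1) ** d


def kernel_perceptron(train, epochs=2, d=2):
    """Dual (kernel) perceptron for binary classification.

    train: list of (feature_set, label) pairs with label in {-1, +1}.
    Returns the list of stored mistakes (support examples).
    """
    mistakes = []
    for _ in range(epochs):
        for x, y in train:
            score = sum(yi * poly_kernel(xi, x, d) for xi, yi in mistakes)
            if y * score <= 0:  # misclassified (or zero score): store it
                mistakes.append((x, y))
    return mistakes


def predict(mistakes, x, d=2):
    score = sum(yi * poly_kernel(xi, x, d) for xi, yi in mistakes)
    return 1 if score > 0 else -1
```

Raising the degree d lets the classifier weight feature co-occurrences (pairs for d=2, triples for d=3), which matches the observation that degrees 2 and 3 outperform the linear case.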
    <Paragraph position="3"> The general performance of the phrase recognition system we present is fairly good, and we believe it is competitive with state-of-the-art named entity extraction systems.</Paragraph>
  </Section>
</Paper>