<?xml version="1.0" standalone="yes"?> <Paper uid="W03-0421"> <Title>A Simple Named Entity Extractor using AdaBoost</Title> <Section position="7" start_page="0" end_page="0" type="evalu"> <SectionTitle> 7 Results </SectionTitle>
<Paragraph position="0"> The described system has been applied to both languages in the shared task, though the German and English settings are not identical: the German corpus enables the use of lemma features while the English one does not, and the trigger word list used is available for English but not for German.</Paragraph>
<Paragraph position="1"> The results of the BIO model for the NER task on the development and test sets for English and German are presented in Table 1. As will be seen later for the whole task, the results are systematically better for English than for German. As can be observed, the behaviour on the English development and test sets is quite different: while on the development set the NER module achieves a very good balance between precision and recall, on the test set precision drops by almost 4 points, making the F1 results much worse. In contrast, the development and test sets for German behave much more similarly. In this case, the recall levels obtained are much lower than the precision ones, which indicates the difficulty of reliably detecting the beginnings of Named Entities in German (where all common and proper nouns are capitalized). A non-greedy tagging procedure would probably have a chance to improve the recognition results.</Paragraph>
<Paragraph position="2"> Regarding the NEC task, the optimal feature selection differs for each language: chunk information is almost useless for English (or even harmful when combined with PoS features), but useful for German. Conversely, although the use of left predictions for NEC is useful for English, the lower accuracy of the German system renders those features harmful (they are very useful when perfect left predictions are assumed).
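The non-greedy tagging suggestion above can be illustrated with a small sketch. This is not the paper's implementation: the transition rule, function names and per-token scores below are hypothetical stand-ins for the AdaBoost classifier margins, used only to contrast greedy left-to-right decoding with a Viterbi-style search over the whole B-I-O sequence.

```python
# Sketch: greedy vs. non-greedy (Viterbi-style) decoding over per-token
# B/I/O scores.  The scores are illustrative stand-ins for classifier
# margins; the only structural constraint is that "I" may not follow "O".

TAGS = ["B", "I", "O"]

def allowed(prev, cur):
    # An "I" tag may only continue an entity opened by "B" or "I".
    return not (cur == "I" and prev == "O")

def greedy(scores):
    """Pick the best legal tag at each position, left to right."""
    tags, prev = [], "O"
    for s in scores:
        best = max((t for t in TAGS if allowed(prev, t)), key=lambda t: s[t])
        tags.append(best)
        prev = best
    return tags

def viterbi(scores):
    """Find the globally best legal tag sequence."""
    # trellis[i][t] = (best score of a legal sequence ending in t, backpointer)
    trellis = [{t: (scores[0][t], None) for t in TAGS if allowed("O", t)}]
    for s in scores[1:]:
        col = {}
        for t in TAGS:
            cands = [(trellis[-1][p][0] + s[t], p)
                     for p in trellis[-1] if allowed(p, t)]
            if cands:
                col[t] = max(cands)
        trellis.append(col)
    # Trace back from the best final tag.
    tag = max(trellis[-1], key=lambda t: trellis[-1][t][0])
    seq = [tag]
    for col in reversed(trellis[1:]):
        tag = col[tag][1]
        seq.append(tag)
    return list(reversed(seq))

scores = [{"B": 1.0, "I": 0.0, "O": 1.2},
          {"B": 0.1, "I": 2.0, "O": 0.5}]
print(greedy(scores))   # ['O', 'O']  -- a locally good "O" blocks the entity
print(viterbi(scores))  # ['B', 'I']  -- the global search recovers it
```

On this toy input the greedy tagger misses the entity entirely, which mirrors the text's point about beginnings of German entities: once the first token is tagged "O", no later "I" can attach to it.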
Table 2 presents NEC accuracy results assuming a perfect recognition of named entities. The basic feature set includes all lexical, orthographic, affix and bag-of-words information. P stands for Part-of-Speech features, C for chunking-related information, T for trigger-word features and g/G for gazetteer-related information2. In general, more complex feature sets yield better results, except for the C case in English, as noted above.</Paragraph>
<Paragraph position="3"> Table 4 presents the results on the NEE task obtained by pipelining the NER and NEC modules. The NEC module used knowledge extracted from the training set as well as external sources such as the gazetteer and trigger word lists.</Paragraph>
<Paragraph position="4"> Almost the same conclusions extracted from the NER results apply to the complete task, although the results here are lower due to the cascade of errors introduced by the two modules: 1) results on English are definitely better than on German; 2) the development and test sets behave consistently for German, while for English they differ significantly. We find the latter particularly disappointing, since it indicates that no reliable conclusions can be drawn about the generalization error of the constructed NEE system by testing it on a 3,000-sentence corpus. This may be caused by the training set not being representative enough, or by a learning of the NEE system too biased towards the development set.</Paragraph>
<Paragraph position="5"> Regarding particular categories, we can see that for English the results are not extremely dissimilar (F1 values fall within a range of 10 points for each set), with LOC and PER being the easiest to identify and ORG and MISC the most difficult. Comparatively, bigger differences are observed for German (F1 ranges from 52.58 to 80.79 on the test set); e.g., recognition of MISC entities is far worse than that of all the rest.
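The pipelining of the NER and NEC modules discussed above, and the way first-stage errors cascade into the second stage, can be sketched as follows. The two classifier stubs (`ner`, `nec`) are hypothetical stand-ins, not the paper's AdaBoost models; only the plumbing between the stages is the point.

```python
# Sketch of a two-stage NEE pipeline: a NER step produces B-I-O tags,
# entity spans are read off those tags, and a NEC step assigns each
# span a category (LOC/PER/ORG/MISC).  Any span the first stage misses
# or mis-delimits can never be recovered by the second stage -- the
# "cascade of errors" noted in the text.

def spans_from_bio(tags):
    """Turn a B-I-O tag sequence into (start, end) entity spans."""
    spans, start = [], None
    for i, t in enumerate(tags):
        if t == "B":
            if start is not None:
                spans.append((start, i))
            start = i
        elif t == "O":
            if start is not None:
                spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(tags)))
    return spans

def ner(tokens):
    # Hypothetical recognizer: treats capitalized tokens as entity tokens.
    tags, in_entity = [], False
    for tok in tokens:
        if tok[:1].isupper():
            tags.append("I" if in_entity else "B")
            in_entity = True
        else:
            tags.append("O")
            in_entity = False
    return tags

def nec(tokens, span):
    # Hypothetical classifier: a tiny gazetteer lookup with an ORG fallback.
    gazetteer = {"London": "LOC", "Fischler": "PER"}
    return gazetteer.get(tokens[span[0]], "ORG")

def nee(tokens):
    """Full pipeline: recognize spans, then classify each one."""
    tags = ner(tokens)
    return [(tokens[s:e], nec(tokens, (s, e))) for s, e in spans_from_bio(tags)]

print(nee("Fischler flew to London yesterday".split()))
```

Because `nec` only ever sees what `ner` proposes, its accuracy measured over gold spans (as in Table 2) is an upper bound on its contribution to the pipelined results.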
Another slight difference with respect to English is that the easiest category is PER instead of LOC.</Paragraph>
<Paragraph position="6"> In order to allow a fair comparison with other systems, Table 3 presents the results achieved on the development set without using external knowledge. The features used correspond to the basic model plus Part-of-Speech information (plus chunks for German), plus a gazetteer built with the entities appearing in the training corpus.</Paragraph> </Section> </Paper>