File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/evalu/04/w04-1218_evalu.xml
Size: 2,600 bytes
Last Modified: 2025-10-06 13:59:16
<?xml version="1.0" standalone="yes"?> <Paper uid="W04-1218"> <Title>Adapting an NER-System for German to the Biomedical Domain</Title> <Section position="5" start_page="93" end_page="94" type="evalu"> <SectionTitle> 5 Evaluation </SectionTitle> <Paragraph position="0"> The entire evaluation was conducted on the corpus made available for the shared task Bio-Entity Recognition. All configurations were trained on the 2,000 abstracts provided, i.e. 500,000 words of training data, and were finally evaluated on the 100,000-word evaluation data. The first rows of Table 2 show the scores for the different classifiers and components; the remaining rows show the performance of the best configuration for each NE-class.</Paragraph> <Paragraph position="1"> On the basis of the scores in Table 2, it is possible to discuss the impact and value of the different components of the system.</Paragraph> <Paragraph position="2"> Using the surface words instead of f3, the subword-form representation with positional character n-grams, leads to a decrease of more than 2 points in both recall and precision.</Paragraph> <Paragraph position="3"> The f-score of the Markov Model, trained on the word forms, is almost comparable to that of the basic SVM-configuration f1-f3, but the precision of the SVM is higher.</Paragraph> <Paragraph position="4"> The post-processing component cannot be applied to the output of the Markov Model, as the definition of revisability is specifically designed for the output of the seven SVM-classifiers. The post-processing component shows very good results and leads to an increase of 4 points, almost equal for precision and recall, i.e. the component is able to address the boundary detection problem by defining the revisability of a tag with regard to a competing tag.</Paragraph> <Paragraph position="5">
See Table 1 for the feature sets f1-f4; post-Proc refers to the second post-processing component described in Section 3.</Paragraph> <Paragraph position="6"> Combining the basic SVM-configuration f1-f3 with f4, the probabilities calculated by the Markov Model, leads to a slight increase compared to the post-processing component. We are convinced that the post-processing and the Markov Model cover similar phenomena, as both support the SVM in detecting the correct boundaries.</Paragraph> <Paragraph position="7"> The combination of all feature sets f1-f4 with the post-processing leads to a further increase of 1 point, demonstrating the ability of the SVM to optimize its predictions on heterogeneous knowledge sources.</Paragraph> </Section> </Paper>