<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-2209">
<Title>Active Annotation</Title>
<Section position="3" start_page="64" end_page="64" type="intro">
<SectionTitle>2 Experimental setup</SectionTitle>
<Paragraph position="0">The data used in the experiments that follow are taken from the BioNLP 2004 named entity recognition shared task (Kim et al., 2004). The text passages have been annotated with five classes of entities: &quot;DNA&quot;, &quot;RNA&quot;, &quot;protein&quot;, &quot;cell type&quot; and &quot;cell line&quot;. In our experiments, following the example of Dingare et al. (2004), we simplified the annotation to a single entity class, &quot;gene&quot;, which subsumes the DNA, RNA and protein classes. To evaluate performance on the task, we used the evaluation script supplied with the data, which computes the F-score (F1 = 2 × Precision × Recall / (Precision + Recall)) for each entity class. Note that all tokens of an entity must be recognized correctly for it to count as a correct prediction; a partially recognized entity counts as both a precision and a recall error. In all the experiments that follow, the official split of the data into training and test sets was maintained.</Paragraph>
<Paragraph position="1">The named entity recognition system used in our experiments is the open-source NLP toolkit LingPipe. Its named entity recognition module is a hidden Markov model (HMM) with Witten-Bell smoothing. On the data described above, it achieved an F-score of 70.06%.</Paragraph>
</Section>
</Paper>
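The strict, entity-level scoring described above (an entity is correct only if every one of its tokens is recognized, and a partial match is penalized on both the precision and the recall side) can be sketched as follows. This is an illustrative reimplementation, not the official shared-task evaluation script; the function name `entity_f1` and the representation of entities as `(start, end, label)` token spans are assumptions made for the example.

```python
def entity_f1(gold, predicted):
    """Strict entity-level F1: gold/predicted are sets of (start, end, label)
    token spans. Only exact-span, exact-label matches count as correct, so a
    partially recognized entity is simultaneously a precision error (a wrong
    prediction) and a recall error (a missed gold entity)."""
    gold, predicted = set(gold), set(predicted)
    tp = len(gold & predicted)  # exact matches only
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

For example, if the gold standard contains the spans `(0, 2)` and `(5, 6)` but the system predicts `(0, 1)` and `(5, 6)`, the partial match `(0, 1)` yields one false positive and one false negative, giving precision = recall = 0.5 and hence F1 = 0.5.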