<?xml version="1.0" standalone="yes"?> <Paper uid="W04-1218"> <Title>Adapting an NER-System for German to the Biomedical Domain</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> Named entity recognition (NER) is the detection and classification of proper names into predefined categories. Besides the distinction between rule-based and automatically trained systems, approaches can be classified according to the amount of domain and/or linguistic knowledge they incorporate.</Paragraph> <Paragraph position="1"> In order to build an efficient and easy-to-adapt system, we developed a knowledge-poor approach that is successful for German person names (Rossler, 2004). German NER shares some characteristics with bio-entity recognition, such as the unreliable capitalization of names, the resulting difficulty of boundary detection, and the consequent need to handle homonymous and polysemous items. We believe that the adaptation process can bring out some interesting aspects of the biomedical domain.</Paragraph> <Paragraph position="2"> In Section 2 we introduce the design guidelines and the underlying model of our knowledge-poor approach to NER. In Section 3 we describe the adaptation of the system and the modifications and enhancements involved. Section 4 introduces a three-level model for observing word forms, which allows further improvements based on discourse units and the use of unlabeled data. These techniques were successfully applied to German person names, where they led to an increase of more than 10 points in F-score and thus to state-of-the-art performance. However, they completely failed on the bio-entity task.
We will discuss what the failure of these techniques reveals about the bio-entity task.</Paragraph> <Paragraph position="3"> Section 5 presents and discusses the final evaluation, while Section 6 contains some concluding remarks.</Paragraph> </Section> </Paper>