<?xml version="1.0" standalone="yes"?> <Paper uid="W04-1217"> <Title>Exploiting Context for Biomedical Entity Recognition: From Syntax to the Web</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> The explosion of information in molecular biology and genetics has created a unique opportunity for natural language processing techniques to aid biomedical researchers and database curators by providing text mining services. Yet typical natural language processing tasks such as named entity recognition, information extraction, and word sense disambiguation are particularly challenging in the biomedical domain because of its highly complex and idiosyncratic language.</Paragraph> <Paragraph position="1"> With the increasing use of shared tasks and shared evaluation procedures (e.g., the recent BioCreative, TREC, and KDD Cup evaluations), it is rapidly becoming clear that performance in this domain is markedly lower than what the field has come to expect in the standard newswire domain. The Coling 2004 shared task focuses on Named Entity Recognition, requiring participating systems to identify five named entity types (protein, RNA, DNA, cell line, and cell type) in the GENIA corpus of MEDLINE abstracts (Ohta et al., 2002). In this paper we describe a machine learning system that incorporates a diverse set of features and various external resources to accomplish this task. We describe the system in detail and also discuss some sources of error.</Paragraph> </Section> </Paper>