<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-3306">
  <Title>Human Gene Name Normalization using Text Matching with Automatically Extracted Synonym Dictionaries</Title>
  <Section position="2" start_page="0" end_page="41" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Gene and protein name identification and recognition in biomedical text are challenging problems.</Paragraph>
    <Paragraph position="1"> A recent competition, BioCreAtIvE, highlighted the two tasks inherent in gene recognition: identifying gene mentions in text (task 1A) (Yeh et al., 2005) and normalizing an identified gene list (task 1B) (Hirschman et al., 2005).</Paragraph>
    <Paragraph position="2"> This competition resulted in many novel and useful approaches, but the results made clear that further important work is necessary, especially for normalization, the subject of the current work.</Paragraph>
    <Paragraph position="3"> Compared with gene NER, gene normalization is syntactically easier because identification of the textual boundaries of each mention is not required.</Paragraph>
    <Paragraph position="4"> However, gene normalization poses significant semantic challenges, as it requires detection of the actual gene intended, along with reporting of the gene in a standardized form (Crim et al., 2005). Several approaches have been proposed for gene normalization, including classification techniques (Crim et al., 2005; McDonald et al., 2004), rule-based systems (Hanisch et al., 2005), text matching with dictionaries (Cohen, 2005), and combinations of these approaches. Integrated systems for gene identification typically have three stages: identifying candidate mentions in text, identifying the semantic intent of each mention, and normalizing mentions by associating each mention with a unique gene identifier (Morgan et al., 2004). In our current work, we focus upon normalization, which is currently underexplored for human gene names. Our objective is to create systems for automatically identifying human gene mentions with high accuracy that can be used for practical tasks in biomedical literature retrieval and extraction. Our current approach relies on a manually created and tuned set of rules.</Paragraph>
  </Section>
</Paper>