<?xml version="1.0" standalone="yes"?>
<Paper uid="W04-1216">
  <Title>Named Entity Recognition in Biomedical Texts using an HMM Model</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> In the Message Understanding Conference (MUC), named entity recognition (NER) aims to classify proper nouns, dates, times, measures, locations, and similar expressions. Many researchers have adapted their systems from MUC to the biomedical domain (Fukuda et al., 1998; Proux et al., 1998; Nobata et al., 2000; Collier et al., 2000; Gaizauskas et al., 2000; Kazama et al., 2002; Takeuchi et al., 2002; Lee et al., 2003; Zhou et al., 2004). Unlike rule-based systems, machine learning-based systems can train their models on labeled data.</Paragraph>
    <Paragraph position="1"> However, because biomedical text is highly irregular in form, system designers still need to choose word features carefully, and this choice requires domain-specific knowledge. How to acquire such domain knowledge automatically has not been fully investigated. Our system is built on a hidden Markov model (HMM) that uses the words themselves as features. We gather a large unlabeled corpus from MEDLINE, compute word similarity information from it, and apply word similarity-based smoothing to overcome data sparseness.</Paragraph>
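The smoothing idea described above can be illustrated with a small sketch. The introduction does not give the formula, so the code assumes one common interpolated form: the maximum-likelihood emission probability P(w|t) is mixed with a similarity-weighted average of P(w'|t) over the k words most similar to w. Here similarity is the cosine between context-count vectors built from unlabeled text. The function names, the weight lam, the window size and the neighbour count k are hypothetical and are not taken from the paper. This is a sketch under these assumptions, not the authors' implementation.

```python
import math
from collections import Counter, defaultdict


def context_vectors(sentences, window=1):
    """Count the neighbouring words of every word in an unlabeled corpus."""
    vecs = defaultdict(Counter)
    for sent in sentences:
        for i, w in enumerate(sent):
            # Words within `window` positions to the left or right form the context.
            for j in range(max(0, i - window), min(len(sent), i + window + 1)):
                if j != i:
                    vecs[w][sent[j]] += 1
    return vecs


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def ml_emissions(tagged):
    """Maximum-likelihood P(word | tag) from labeled (word, tag) pairs."""
    counts = defaultdict(Counter)
    for w, t in tagged:
        counts[t][w] += 1
    probs = {}
    for t, ws in counts.items():
        n = sum(ws.values())
        probs[t] = {w: c / n for w, c in ws.items()}
    return probs


def smoothed_emission(word, tag, p_ml, vecs, lam=0.7, k=5):
    """Assumed scheme: lam * P_ML(word|tag) + (1 - lam) * similarity-weighted
    average of P_ML(w'|tag) over the k words most similar to `word`.
    lam and k are hypothetical settings; the result is not renormalised."""
    direct = p_ml.get(tag, {}).get(word, 0.0)
    if word not in vecs:
        return direct  # no context information, fall back to the ML estimate
    sims = sorted(
        ((cosine(vecs[word], vecs[w]), w) for w in vecs if w != word),
        reverse=True,
    )[:k]
    total = sum(s for s, _ in sims)
    if total == 0:
        return direct
    neighbour = sum(s * p_ml.get(tag, {}).get(w, 0.0) for s, w in sims) / total
    return lam * direct + (1 - lam) * neighbour


# Toy example: IL-4 is absent from the labeled data but shares contexts with IL-2.
corpus = [["IL-2", "gene", "expression"], ["IL-4", "gene", "expression"]]
vecs = context_vectors(corpus)
p_ml = ml_emissions([("IL-2", "protein"), ("cell", "cell_type")])
print(smoothed_emission("IL-4", "protein", p_ml, vecs))  # about 0.15
```

Interpolating rather than replacing keeps the direct estimate dominant for words seen in the labeled data, while words never seen with a tag borrow some probability from words that occur in similar contexts.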
  </Section>
</Paper>