File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/abstr/03/w03-0432_abstr.xml

Size: 827 bytes

Last Modified: 2025-10-06 13:43:02

<?xml version="1.0" standalone="yes"?>
<Paper uid="W03-0432">
  <Title>Named Entity Recognition Using a Character-based Probabilistic Approach</Title>
  <Section position="1" start_page="0" end_page="0" type="abstr">
    <SectionTitle>
Abstract
</SectionTitle>
    <Paragraph position="0"> We present a named entity recognition and classification system that uses only probabilistic character-level features. Classifications by multiple orthographic tries are combined in a hidden Markov model framework to incorporate both internal and contextual evidence. As part of the system, we perform a preprocessing stage in which capitalisation is restored to sentence-initial and all-caps words with high accuracy. We report f-values of 86.65 and 79.78 for the English datasets, and 50.62 and 54.43 for the German datasets.</Paragraph>
  </Section>
</Paper>