<?xml version="1.0" standalone="yes"?>
<Paper uid="C04-1140">
  <Title>High-Performance Tagging on Medical Texts</Title>
  <Section position="1" start_page="0" end_page="0" type="abstr">
    <SectionTitle>
Abstract
</SectionTitle>
    <Paragraph position="0"> We ran both Brill's rule-based tagger and TNT, a statistical tagger, with a default German newspaper-language model on a medical text corpus. Supplied with limited lexicon resources, TNT outperforms the Brill tagger and reaches state-of-the-art performance (close to 97% accuracy). We then trained TNT on a large annotated medical text corpus with a slightly extended tagset that captures certain particularities of medical language, and achieved 98% tagging accuracy. Hence, statistical off-the-shelf POS taggers can not only be reused immediately for medical NLP but, when trained on medical corpora, also achieve a higher performance level than on the newspaper genre.</Paragraph>
  </Section>
</Paper>