File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/abstr/02/p02-1021_abstr.xml

Size: 1,226 bytes

Last Modified: 2025-10-06 13:42:24

<?xml version="1.0" standalone="yes"?>
<Paper uid="P02-1021">
  <Title>Semi-Supervised Maximum Entropy Based Approach to Acronym and Abbreviation Normalization in Medical Texts</Title>
  <Section position="1" start_page="0" end_page="0" type="abstr">
    <SectionTitle>
Abstract
</SectionTitle>
    <Paragraph position="0"> Text normalization is an important aspect of successful information retrieval from medical documents such as clinical notes, radiology reports and discharge summaries. In the medical domain, a significant part of the general problem of text normalization is abbreviation and acronym disambiguation. Numerous abbreviations are used routinely throughout such texts and knowing their meaning is critical to data retrieval from the document. In this paper I will demonstrate a method of automatically generating training data for Maximum Entropy (ME) modeling of abbreviations and acronyms and will show that using ME modeling is a promising technique for abbreviation and acronym normalization. I report on the results of an experiment involving training a number of ME models used to normalize abbreviations and acronyms on a sample of 10,000 rheumatology notes with ~89% accuracy.</Paragraph>
  </Section>
</Paper>