<?xml version="1.0" standalone="yes"?>
<Paper uid="C04-1114">
  <Title>Improving Statistical Machine Translation in the Medical Domain using the Unified Medical Language System</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Hospitals in the United States have to deal with an increasing number of patients who have no knowledge of the English language. It is not surprising that in this setting translation errors can lead to severe problems (Neergard, 2003; Flores et al., 2003). This is one of the main reasons why the medical domain plays an important role in many current natural language processing projects. In particular, many text and speech translation projects include tasks that involve translating texts or dialogues on medical topics.</Paragraph>
    <Paragraph position="1"> The goal of this research was to improve translation quality in the medical domain using a statistical machine translation system. Such a system deduces translation rules from large amounts of parallel text in the source and target languages.</Paragraph>
    <Paragraph position="2"> The general approach of gathering as much training data as possible is usually complicated and expensive. It is therefore necessary to make use of data and databases that are already available, and it is reasonable to hope that some ideas and special methods could improve performance in limited domains such as the medical domain.</Paragraph>
    <Paragraph position="3"> The Internet, and especially the WWW, offers a lot of data related to medical topics. Especially interesting and promising for us was the Unified Medical Language System® (UMLS, 1986-2004), available from the US National Library of Medicine. It provides a vast amount of information concerning medical terms, and we extracted information from this database to improve an existing translation system.</Paragraph>
    <Paragraph position="4"> The paper first gives an introduction to the Unified Medical Language System. We then point out which parts could be useful for statistical machine translation and later show how the baseline system was significantly improved using this data.</Paragraph>
  </Section>
</Paper>