<?xml version="1.0" standalone="yes"?>
<Paper uid="C04-1114">
  <Title>Improving Statistical Machine Translation in the Medical Domain using the Unified Medical Language System</Title>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
2 The Unified Medical Language System
2.1 Introduction
</SectionTitle>
    <Paragraph position="0"> The Unified Medical Language System (UMLS, 1986-2004) project was initiated in 1986 by the U.S. National Library of Medicine.</Paragraph>
    <Paragraph position="1"> It integrates different knowledge sources (e.g. biomedical vocabularies and dictionaries) into one database.</Paragraph>
    <Paragraph position="2"> The goal is to help health professionals and researchers use biomedical information from these different sources. The UMLS is usually updated three or four times per year.</Paragraph>
    <Paragraph position="3"> It consists of three main knowledge repositories: the UMLS Metathesaurus, the UMLS Semantic Network and the SPECIALIST lexicon.</Paragraph>
    <Paragraph position="4"> Interesting facts about the UMLS, related work and further information can be found in (Lindbergh, 1990; Kashyap, 2003; Brown et al., 2003; Friedman et al., 2001; Zweigenbaum et al., 2003).</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.2 The UMLS Metathesaurus
</SectionTitle>
      <Paragraph position="0"> The UMLS Metathesaurus provides a common structure for approximately 100 source biomedical vocabularies.</Paragraph>
      <Paragraph position="1"> The 2003AB version of the Metathesaurus contains 900,551 concepts named by 2,247,457 terms. It is organized by concept, where a concept is a cluster of terms (i.e. synonyms, lexical variants and translations) with the same meaning.</Paragraph>
      <Paragraph position="2"> [2003AB was the current release when the experiments described in this paper were carried out. The most recent version is now 2004AA, which contains some additional and updated information. All numbers given in this paper refer to the 2003AB version.]</Paragraph>
      <Paragraph position="3"> Translations are present for up to 14 additional languages besides English. It is very likely that other languages will be added in later releases. Table 1 shows the distribution of the terms according to the 15 different languages.</Paragraph>
      <Paragraph position="4"> For example, the concept &quot;arm&quot; includes the English plural form &quot;arms&quot; as a lexical variant, and &quot;bras&quot;, &quot;arm&quot;, &quot;braccio&quot;, &quot;braco&quot;, &quot;ruka&quot; and &quot;brazo&quot; as the French, German, Italian, Portuguese, Russian and Spanish translations.</Paragraph>
      <Paragraph position="5"> Some entries also contain case information. The entries are not limited to single words; some terms are longer phrases like &quot;third degree burn of lower leg&quot; or &quot;loss of consciousness&quot;. The Metathesaurus also includes inter-concept relationships across the multiple vocabularies. The main relationship types are shown in Table 2. The synonym relationship is realized implicitly by different terms that are affiliated with the same concept.</Paragraph>
      <Paragraph position="6"> The co-occurrence relationship refers to concepts co-occurring in MEDLINE publications. In addition, each concept is categorized into semantic types according to the UMLS Semantic Network.</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.3 The UMLS Semantic Network
</SectionTitle>
      <Paragraph position="0"> The UMLS Semantic Network categorizes the concepts of the UMLS Metathesaurus through semantic types and relationships.</Paragraph>
      <Paragraph position="1"> Every concept in the Metathesaurus is part of one or more semantic types.</Paragraph>
      <Paragraph position="2"> There are 135 semantic types arranged in a generalization hierarchy with the two roots &quot;Entity&quot; and &quot;Event&quot;. This hierarchy is still rather abstract (it is no more than six levels deep).</Paragraph>
      <Paragraph position="3"> A more detailed generalization hierarchy is realized with the child, parent and sibling relationships of the UMLS Metathesaurus.</Paragraph>
      <Paragraph position="4"> Figure 1 shows some examples of semantic types.</Paragraph>
    </Section>
    <Section position="3" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.4 The SPECIALIST lexicon
</SectionTitle>
      <Paragraph position="0"> The SPECIALIST lexicon contains over 30,000 English words. It is intended to be a general English lexicon including many biomedical terms.</Paragraph>
      <Paragraph position="1"> The lexicon entry for each word or term records the syntactic, morphological and orthographic information.</Paragraph>
    </Section>
    <Section position="4" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
Specialist Lexicon
</SectionTitle>
      <Paragraph position="0"> Figure 2 shows the entry for &quot;anesthetic&quot;. There is a spelling variant &quot;anaesthetic&quot; and an entry number. The category in this case is noun (there is another entry for &quot;anesthetic&quot; as an adjective). The variants slot contains a code indicating the inflectional morphology of the entry. The word &quot;anesthetic&quot; can be either a regular count noun (with the regular plural &quot;anesthetics&quot;) or an uncountable noun.</Paragraph>
    </Section>
  </Section>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3 Machine Translation Experiments
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.1 The Baseline System
</SectionTitle>
      <Paragraph position="0"> The Baseline system, which we used to test different approaches to improve the translation performance, is a statistical machine translation system. The task was to facilitate doctor-patient dialogues across languages. In this case we chose translation from Spanish to English.</Paragraph>
      <Paragraph position="1"> The Baseline system was trained using 9,227 lines of training data (90,012 English words, 89,432 Spanish words). Of these, 3,227 lines are &quot;in-domain&quot; data: doctor-patient dialogues that we collected during ongoing research projects in our group. The other 6,000 lines are out-of-domain data from the C-Star Project. This data also consists of dialogues, but not from the medical domain.</Paragraph>
      <Paragraph position="2"> The test data consists of 500 lines with 6,886 words. The test data was also taken from medical dialogues between a doctor and a patient and contains a reasonable number of medical terms but the language is not very complex. Figure 3 shows some example test sentences (from the reference data).</Paragraph>
      <Paragraph position="3"> (...) Doctor: The symptoms you are describing and given your recent change in diet, I believe you may be anemic.</Paragraph>
      <Paragraph position="4"> Patient: Anemic? Really? Is that serious? Doctor: Anemia can be very serious if left untreated. Being anemic means your body lacks a sufficient amount of red blood cells to carry oxygen through your body.</Paragraph>
      <Paragraph position="5"> The Baseline system uses IBM1 lexicon transducers and different types of phrase transducers (Zhang et al., 2003; Vogel et al., 1996; Vogel et al., 2003). The language model is a trigram language model with Good-Turing smoothing, built with the SRI Toolkit (SRI, 1995-2004) using only the English part of the training data.</Paragraph>
      <Paragraph position="6"> The Baseline system scores 0.171 BLEU and 4.72 NIST. [BLEU and NIST are well-known scoring methods for measuring machine translation quality. Both calculate the precision of a translation by comparing it to a reference translation and incorporating a length penalty (Doddington, 2001; Papineni et al., 2002).]</Paragraph>
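      <Paragraph> To make the scoring idea concrete, the following is a simplified sketch of a BLEU-style score for one sentence and one reference: clipped n-gram precision combined with a brevity penalty. It is not the evaluation script behind the numbers reported here. Real BLEU is computed over a whole test corpus, and NIST additionally weights n-grams by their information content. The function names ngrams and bleu are assumptions made for this illustration.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=4):
    """Simplified single-reference, sentence-level BLEU without smoothing."""
    cand, ref = candidate.split(), reference.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        # Clipped matches: an n-gram counts at most as often as in the reference.
        matches = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        if matches == 0 or total == 0:
            return 0.0
        log_precisions.append(math.log(matches / total))
    # Brevity penalty: punish candidates that are shorter than the reference.
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(sum(log_precisions) / max_n)


score = bleu("i need to examine your", "i need to examine your arm")
print(round(score, 3))
```

In this toy call every n-gram of the candidate also occurs in the reference, so the score is reduced only by the brevity penalty for the missing last word.</Paragraph>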
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.2 Extracting dictionaries from the UMLS
</SectionTitle>
      <Paragraph position="0"> The most natural way to exploit the UMLS database in a statistical machine translation system is to extract additional Spanish-English lexicons or phrasebooks.</Paragraph>
      <Paragraph position="1"> The UMLS Metathesaurus provides translation information, since we can assume that Spanish and English terms associated with the same concept are translations of each other. For example, since the English term &quot;arm&quot; is associated with the same concept as the Spanish term &quot;brazo&quot;, we can deduce that &quot;arm&quot; is the English translation of &quot;brazo&quot;.</Paragraph>
      <Paragraph position="2"> Unfortunately, the UMLS does not contain morphological information about languages other than English. This means that it cannot be detected automatically that &quot;brazo&quot; is a singular form and thus the translation of &quot;arm&quot; rather than of &quot;arms&quot;.</Paragraph>
      <Paragraph position="3"> As most of the entries are in singular form, we simply extracted every possible combination of Spanish and English terms, accepting possible errors such as combining the singular &quot;brazo&quot; with the plural &quot;arms&quot;.</Paragraph>
      <Paragraph position="4"> The resulting (lower-cased) Spanish-English lexicon/phrasebook contains 495,248 pairs of words and phrases. This means each Spanish term is combined with seven English terms on average.</Paragraph>
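      <Paragraph> The extraction step can be sketched as a cross product over the terms of each concept. This is a minimal illustration under stated assumptions, not the authors' implementation: the record layout (concept identifier, language code, term), the language codes, the toy concept identifiers and the function name extract_lexicon are all hypothetical.

```python
from collections import defaultdict
from itertools import product


def extract_lexicon(records, source_lang="SPA", target_lang="ENG"):
    """Pair every source-language term with every target-language term
    attached to the same concept (hypothetical record layout)."""
    terms_by_concept = defaultdict(lambda: defaultdict(set))
    for concept_id, lang, term in records:
        # Lower-case the terms, as was done for the extracted lexicon.
        terms_by_concept[concept_id][lang].add(term.lower())

    pairs = set()
    for by_lang in terms_by_concept.values():
        source_terms = by_lang.get(source_lang, ())
        target_terms = by_lang.get(target_lang, ())
        # Every combination is kept, including singular/plural mismatches.
        pairs.update(product(source_terms, target_terms))
    return pairs


# Toy records; the concept identifiers are made up for illustration.
records = [
    ("C-ARM", "ENG", "Arm"),
    ("C-ARM", "ENG", "arms"),
    ("C-ARM", "SPA", "brazo"),
    ("C-ARM", "FRE", "bras"),
    ("C-LEG", "ENG", "leg"),
]
lexicon = extract_lexicon(records)
print(sorted(lexicon))
```

The toy concept for arm yields both (brazo, arm) and the mismatched (brazo, arms), which is exactly the kind of error the extraction accepts. The concept without a Spanish term contributes nothing.</Paragraph>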
      <Paragraph position="5"> This seems to be a very large number, but it has to be considered that the UMLS, and therefore the resulting lexicon, contains terms that are probably too specialized to be really useful for translating dialogues (e.g. &quot;1,1,1-trichloropropene-2,3-oxide&quot; translating to &quot;oxido de tricloropropeno&quot;). Nevertheless, there are many meaningful entries, as the following experiments show.</Paragraph>
      <Paragraph position="6"> Applying the dictionaries to the Baseline system. In the first step we simply added this lexicon/phrasebook as an additional transducer and did not change the language model.</Paragraph>
      <Paragraph position="7"> This experiment increased both scores, to 0.180 BLEU and 4.86 NIST.</Paragraph>
      <Paragraph position="8"> In particular, this system has higher coverage: only 302 words (types) are not covered by the training data, compared to 411 for the Baseline system.</Paragraph>
      <Paragraph position="9"> Adding the English side to the language model. As the extracted dictionary contains many phrases, it seemed reasonable to add its English side to the language modeling data. This also prevents words from the extracted dictionary from being treated as &quot;unknown&quot; by the language model if they were not in the language model training data. This further improved the scores to 0.182 BLEU and 4.92 NIST.</Paragraph>
      <Paragraph position="10"> It should not be surprising to get an improvement in these first two experiments because basically just more data was used to train the systems. The really interesting ideas will be presented in the next sections.</Paragraph>
    </Section>
    <Section position="3" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.3 Using the Semantic Type Information
</SectionTitle>
      <Paragraph position="0"> The overall idea behind using the semantic type information is to generalize the training data.</Paragraph>
      <Paragraph position="1"> The training data contains for example sentence pairs like: Necesito examinar su cabeza.</Paragraph>
      <Paragraph position="2"> I need to examine your head.</Paragraph>
      <Paragraph position="3"> Necesito examinar su brazo.</Paragraph>
      <Paragraph position="4"> I need to examine your arm.</Paragraph>
      <Paragraph position="5"> Necesito examinar su rodilla.</Paragraph>
      <Paragraph position="6"> I need to examine your knee.</Paragraph>
      <Paragraph position="7"> If we could generalize these sentences by replacing the specific body parts like &quot;head&quot;, &quot;arm&quot; and &quot;knee&quot; with a general tag, e.g. &quot;@BODYPART&quot;, and treat this tag specially, we could use one sentence of training data for every body part imaginable in this sentence.</Paragraph>
      <Paragraph position="8"> We would only need an additional lexicon that translates body parts.</Paragraph>
      <Paragraph position="9"> Necesito examinar su @BODYPART.</Paragraph>
      <Paragraph position="10"> I need to examine your @BODYPART.</Paragraph>
      <Paragraph position="11"> In addition, we could correctly translate possibly unseen sentences like &quot;Necesito examinar su antebrazo&quot; (&quot;I need to examine your forearm&quot;) if we could automatically deduce that &quot;antebrazo/forearm&quot; is a body part and if we knew this translation pair.</Paragraph>
      <Paragraph position="12"> Some additional similar sentences in which we could apply the same ideas are: Enseneme que @BODYPART es.</Paragraph>
      <Paragraph position="13"> Show me which @BODYPART.</Paragraph>
      <Paragraph position="14"> ¿Que @BODYPART le/la duele? Which @BODYPART hurts? (In the last sentence, whether the Spanish side reads &quot;¿Que @BODYPART la duele?&quot; or &quot;¿Que @BODYPART le duele?&quot; depends on the gender of the body part. As we are translating from Spanish to English, this did not seem to be a big problem.) As stated before, every concept in the UMLS Metathesaurus is categorized into one or more semantic types defined in the UMLS Semantic Network.</Paragraph>
      <Paragraph position="15"> The two semantic types &quot;Body Part, Organ, or Organ Component&quot; and &quot;Body Location or Region&quot; from the UMLS Semantic Network cover fairly closely what is colloquially meant by a body part.</Paragraph>
      <Paragraph position="16"> [The terminological difference is that the semantic type &quot;Body Part, Organ, or Organ Component&quot; is defined by a certain function; for example, &quot;liver&quot; and &quot;eye&quot; belong to this semantic type. The semantic type &quot;Body Location or Region&quot;, in contrast, is defined by the topographical location of the respective body part. Examples are &quot;head&quot; and &quot;arm&quot;, whose function is not as clearly defined as the function of a &quot;liver&quot;.] This information was used in the next experiment. We first filtered the general Spanish-English dictionary that we had extracted from the UMLS so that it contained only words and phrases from the two semantic types &quot;Body Part, Organ, or Organ Component&quot; and &quot;Body Location or Region&quot;. This gave a dictionary of 11,260 translation entries for body parts. Again, each Spanish term is combined with about seven English terms on average.</Paragraph>
      <Paragraph position="17"> In the next step we replaced every occurrence of a word or phrase pair from this new dictionary in the training data with a general body-part tag, but only if the pair occurred on both the Spanish and the English side.</Paragraph>
      <Paragraph position="18"> 527 sentence pairs of the original 9,227 sentence pairs contained a word or phrase pair from this dictionary.</Paragraph>
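      <Paragraph> The replacement step can be sketched as follows. This is an assumed, simplified version: sentences are taken to be lower-cased and tokenized on whitespace, only the first occurrence of each pair is replaced, and the helper names find and tag_sentence_pair are hypothetical.

```python
def find(tokens, phrase):
    """Return the index of the first occurrence of phrase in tokens, or None."""
    n = len(phrase)
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == phrase:
            return i
    return None


def tag_sentence_pair(src, tgt, dictionary, tag="@BODYPART"):
    """Replace a dictionary pair with the tag, but only if the source term
    occurs in the source sentence and its translation in the target."""
    src_tokens, tgt_tokens = src.split(), tgt.split()
    for src_term, tgt_term in dictionary:
        s, t = src_term.split(), tgt_term.split()
        i, j = find(src_tokens, s), find(tgt_tokens, t)
        if i is not None and j is not None:
            src_tokens[i:i + len(s)] = [tag]
            tgt_tokens[j:j + len(t)] = [tag]
    return " ".join(src_tokens), " ".join(tgt_tokens)


# Toy body-part dictionary, including a singular/plural mismatch.
body_parts = [("brazo", "arm"), ("brazo", "arms"), ("cabeza", "head")]
tagged = tag_sentence_pair(
    "necesito examinar su brazo .",
    "i need to examine your arm .",
    body_parts,
)
print(tagged)
```

Requiring a match on both sides also filters out most of the mismatched dictionary entries: the pair (brazo, arms) is never applied here because the English sentence contains no &quot;arms&quot;.</Paragraph>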
      <Paragraph position="19"> A retraining of the translation system with this changed training data resulted in transducer rules containing this body-part-tag.</Paragraph>
      <Paragraph position="20"> With cascaded transducers (Vogel and Ney, 2000), the first transducer that is applied in the actual translation (in this case the body-part dictionary) replaces the Spanish body part with its translation pair and the body-part tag.</Paragraph>
      <Paragraph position="21"> The subsequent transducers can then apply their generalized rules containing the body-part tag instead of the real body part.</Paragraph>
      <Paragraph position="22"> For example, consider the translation of the sentence: Necesito examinar su antebrazo.</Paragraph>
      <Paragraph position="23"> First step, apply the body-part dictionary rule (antebrazo-forearm): Necesito examinar su @BODYPART(antebrazo-forearm). Second step, apply a generalized transducer rule (a rule could be: Necesito examinar su @BODYPART - I need to examine your @BODYPART): I need to examine your @BODYPART(antebrazo-forearm). Final step, resolve the tags: I need to examine your forearm.</Paragraph>
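      <Paragraph> The three steps of this example can be mimicked with a toy cascade. The sketch below is only an illustration of the control flow: the actual system is a statistical decoder, whereas here a generalized rule is an exact-match table lookup (an unknown sentence raises KeyError), tags are resolved left to right, and the names BODY_PARTS, RULES and translate are assumptions.

```python
# Toy body-part dictionary (first transducer in the cascade).
BODY_PARTS = {"antebrazo": "forearm", "brazo": "arm"}

# Toy generalized rule of the kind learned from tagged training data.
RULES = {"necesito examinar su @BODYPART .": "i need to examine your @BODYPART ."}


def translate(sentence):
    # Step 1: the body-part dictionary replaces the term with the tag
    # and remembers the translation pair that belongs to it.
    tagged, pairs = [], []
    for token in sentence.split():
        if token in BODY_PARTS:
            pairs.append((token, BODY_PARTS[token]))
            tagged.append("@BODYPART")
        else:
            tagged.append(token)
    # Step 2: apply a generalized rule that contains the tag.
    output = RULES[" ".join(tagged)]
    # Step 3: resolve each tag with its stored translation, left to right.
    for _, translation in pairs:
        output = output.replace("@BODYPART", translation, 1)
    return output


print(translate("necesito examinar su antebrazo ."))
```

The single rule also covers &quot;brazo&quot; and any other entry of the body-part dictionary, which is the generalization effect the tag is meant to achieve.</Paragraph>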
      <Paragraph position="24"> By applying this to the whole translation system the score improved to 0.188 BLEU/4.94 NIST.</Paragraph>
      <Paragraph position="25"> Using other semantic types. As the body-part lexicon and the replacement of body parts proved to be helpful, we applied two more of these replacement strategies. Consider the following four sentence pairs from the training data. ¿Siente dolor cuando respira? Do you feel pain when you breathe? ¿Cuando le empezo la fiebre? When did the fever start? ¿Podria ser artritis? Could this be arthritis? ¿Es grave la anemia, doctor? Is anemia serious, doctor? The first two sentences contain findings or symptoms with the terms &quot;dolor/pain&quot; and &quot;fiebre/fever&quot;. The last two sentences contain diseases with &quot;artritis/arthritis&quot; and &quot;anemia/anemia&quot;. The appropriate semantic types from the UMLS Semantic Network for these terms are &quot;Finding&quot; and &quot;Sign or Symptom&quot; for &quot;pain&quot; and &quot;fever&quot;, and &quot;Disease or Syndrome&quot; for &quot;arthritis&quot; and &quot;anemia&quot;. Filtering the Spanish-English dictionary resulted in 25,987 &quot;Finding/Sign or Symptom&quot; translation pairs (approximately three English terms per Spanish term) and 116,793 &quot;Disease or Syndrome&quot; translation pairs (approximately five English terms per Spanish term).</Paragraph>
      <Paragraph position="26"> Of the sentence pairs in the training data, 198 contained a &quot;Finding/Sign or Symptom&quot; pair and 127 contained a &quot;Disease or Syndrome&quot; pair from these dictionaries.</Paragraph>
      <Paragraph position="27"> The final translation with those three semantic types replaced in the training data and using the three filtered dictionaries with the cascaded transducer application gave a translation performance of 0.190 BLEU/5.02 NIST.</Paragraph>
      <Paragraph position="28"> This shows that, although less than 10% of the sentences were affected by the replacement with the appropriate tags, we could improve the overall translation performance.</Paragraph>
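      <Paragraph> The figure of less than 10% can be checked from the counts reported above, together with the relative gains over the Baseline. The short calculation below uses only numbers stated in this section. The sum of the three counts is an upper bound, since it assumes that no sentence pair is counted under more than one tag type.

```python
total_pairs = 9227
affected = {
    "body part": 527,
    "finding/sign or symptom": 198,
    "disease or syndrome": 127,
}

# Upper bound: assumes no sentence pair is counted under two tag types.
upper_bound = sum(affected.values())
share = upper_bound / total_pairs
print(f"at most {upper_bound} of {total_pairs} pairs ({share:.1%})")

# Relative gains of the final system over the Baseline system.
baseline_bleu, final_bleu = 0.171, 0.190
baseline_nist, final_nist = 4.72, 5.02
bleu_gain = (final_bleu - baseline_bleu) / baseline_bleu
nist_gain = (final_nist - baseline_nist) / baseline_nist
print(f"relative gain: BLEU {bleu_gain:.1%}, NIST {nist_gain:.1%}")
```

At most 852 of the 9,227 sentence pairs (about 9.2%) were tagged, while BLEU rose by roughly 11% and NIST by roughly 6% relative to the Baseline.</Paragraph>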
    </Section>
    <Section position="4" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
Example translations
</SectionTitle>
      <Paragraph position="0"> The last example sentence is an interesting case.</Paragraph>
      <Paragraph position="1"> The best system does not get more words right than the Baseline system, so the BLEU/NIST score does not improve. But &quot;sternum&quot; is a synonym of the correct &quot;breastbone&quot; and a more technical term. This supports the claim that the UMLS tends to contain more technical terms (like &quot;tenosynovitis&quot; in the first sentence).</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="0" end_page="0" type="metho">
    <SectionTitle>
4 Future work
</SectionTitle>
    <Paragraph position="0"> In principle, every semantic type from the Semantic Network could be used in the same way as the five semantic types that were used in the experiments. We did not do this here because further semantic types occurred extremely rarely in the test and training data. This could easily be done for other test and training data, however, and it is reasonable to expect similar improvements.</Paragraph>
    <Paragraph position="1"> Another idea is to use a more specialized approach and to make use of the relationships in the UMLS Metathesaurus. Each concept could be generalized by its parent concepts instead of its semantic type. For example, the generalization hierarchy for the concept &quot;leg&quot; is: leg - lower extremity - extremity - body region - anatomy.</Paragraph>
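    <Paragraph> A sketch of this idea, using only the example chain given above: each concept is mapped to one parent, and a concept is generalized by climbing a chosen number of levels. The single-parent table and the names PARENT, generalizations and generalize are assumptions for illustration. A real Metathesaurus concept may have several parents, which this sketch does not handle.

```python
# Toy parent links taken from the example chain in the text.
PARENT = {
    "leg": "lower extremity",
    "lower extremity": "extremity",
    "extremity": "body region",
    "body region": "anatomy",
}


def generalizations(concept, parent=PARENT):
    """Return the concept followed by its ancestors, most specific first."""
    chain = [concept]
    while chain[-1] in parent:
        chain.append(parent[chain[-1]])
    return chain


def generalize(concept, levels, parent=PARENT):
    """Climb at most `levels` steps up the hierarchy."""
    chain = generalizations(concept, parent)
    return chain[min(levels, len(chain) - 1)]


print(generalizations("leg"))
print(generalize("leg", 2))
```

Choosing the number of levels controls how coarse the tag becomes: two steps up from leg gives extremity, which is more specific than a single tag for all body parts.</Paragraph>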
    <Paragraph position="2"> This could be especially helpful when translating to morphologically richer languages than English because the usage of extremities could differ from other body parts for example.</Paragraph>
    <Paragraph position="3"> In the extracted dictionaries every translation pair was given the same translation probability. It might be helpful to re-score these probabilities using information from bilingual or monolingual texts, so that frequently used terms receive higher translation probabilities than rarely used terms.</Paragraph>
    <Paragraph position="4"> As the example translations showed, the dictionaries extracted from the UMLS tend to contain technical terms instead of colloquial terms (the translation &quot;sternum&quot; instead of &quot;breastbone&quot;). We can further assume that a doctor prefers to use the more technical terms and a patient prefers the more colloquial terms. Therefore it could be interesting to examine whether having two different translation systems, one for sentences uttered by a doctor and one for sentences uttered by a patient, would improve the overall translation performance.</Paragraph>
  </Section>
</Paper>