<?xml version="1.0" standalone="yes"?>
<Paper uid="W01-1414">
  <Title>Adding Domain Specificity to an MT system</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> The machine translation system described here is a French-English translation system that uses a broad-coverage French analyzer, a large multi-purpose French dictionary, a large French-English bilingual lexicon, an application-independent English natural language generation component, and a transfer component. The transfer component consists of high-quality transfer patterns automatically acquired from sentence-aligned bilingual corpora using an alignment grammar and algorithm described in detail in Menezes (2001) (see Figure 1 for an overview of the French-English MT system).</Paragraph>
    <Paragraph position="1"> The transfer component consists only of correspondences learned during the alignment process. Training takes place on aligned sentences that have been analyzed by the French and English analysis systems to yield system-specific dependency structures called Logical Forms (LF). The LF structures, when aligned, allow the extraction of lexical and structural translation correspondences, which are stored in the transfer database for use at runtime. The transfer database can also be thought of as an example-base of conceptual structure representations. See Figure 2 for an illustration of the training process.</Paragraph>
    <Paragraph position="2"> The transfer database for French-English was trained on approximately 200,000 pairs of aligned sentences from computer manuals and help files. In these aligned pairs, the French text was produced by human translators from the original English version.</Paragraph>
    <Paragraph position="3"> Sample sentences from the training set are: French training sentence: Dans le menu Demarrer, pointez sur Programmes, sur Outils d'administration (commun), puis cliquez sur Gestionnaire des utilisateurs pour les domaines.</Paragraph>
    <Paragraph position="4"> English training sentence: On the Start menu, point to Programs, point to Administrative Tools (Common), and then click User Manager for Domains.</Paragraph>
    <Paragraph position="5"> The French-English lexicon is used during the training of the transfer component to establish initial, tentative word correspondences during the alignment process. The sources for the bilingual dictionary were the Cambridge University Press English-French, Soft-Art English-French, and Langenscheidt French-English and English-French dictionaries. The English-French translation data was reversed to create French-English pairs in order to augment the size of the dictionary, with a final translation count of 75,000 pairs.</Paragraph>
    <Paragraph position="6"> However, a quick examination of the sample sentences above shows that many terms are highly specific to the domain, e.g., menu Demarrer &lt;-&gt; Start menu. To further add to the specificity of the vocabulary available to the alignment process, we added translation pairs extracted from the actual domain, using statistical word/phrase assignment, as described below. This resulted in one file of automatically created French-English translation correspondences, or Word Associations (WA), and a second file of specialized multi-word translation correspondences, which we term Title Associations (TA). These files, of size 30,000 and 2,600 respectively, improved both the quality of the alignments and overall translation quality.</Paragraph>
  </Section>
</Paper>