<?xml version="1.0" standalone="yes"?>
<Paper uid="W97-0405">
  <Title>A Formal Basis for Spoken Language Translation by Analogy</Title>
  <Section position="3" start_page="0" end_page="32" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> The last decade has seen growing interest in the example-based framework for translation of written and spoken language (Nagao, 1984), (Jones, 1996).</Paragraph>
    <Paragraph position="1"> This approach, sometimes called analogical, case-based, or memory-based, originated with the following insight. In the course of translating an expression, a skilled human translator often recalls a similar translation that she has performed or studied before, and then carries out the new translation by analogy to the previous case, instead of applying a large number of lexical and grammatical rules in her head. In an example-based translation architecture, pairs of bilingual expressions are stored in the example database. The source language input expression is matched against the source language portion of each example pair, and the best matching example</Paragraph>
    <Section position="1" start_page="0" end_page="32" type="sub_section">
      <SectionTitle>
1.1 Pre-translation
</SectionTitle>
      <Paragraph position="0"> The example-based approach has certain advantages over traditional rule-based approaches to translating spoken language. Since an analogical system relies on a database of pre-translated example pairs, it results in high translation quality. High translation quality requires not only that the output be grammatically correct, but also that the output sound natural and idiomatic. Spoken utterances consist of larger portions of fully-lexicalized or semi-lexicalized morpheme sequences, the use of which greatly contributes to sounding natural and native-like, but whose meanings are not totally predictable from their forms (Pawley and Syder, 1983). An analogical system can generate natural-sounding output more easily than a compositional, rule-based system, because it directly uses the correspondences between source-language and target-language expressions.</Paragraph>
    </Section>
    <Section position="2" start_page="32" end_page="32" type="sub_section">
      <SectionTitle>
1.2 Robustness
</SectionTitle>
      <Paragraph position="0"> Another important requirement for spoken language translation is that the system has to be very robust.</Paragraph>
      <Paragraph position="1"> Spoken utterances contain a lot of disfluencies, such as pronunciation errors, word selection errors, word fragments and repairs. Furthermore, a speech translation module also has to handle the errors introduced by the speech recognition component. In an analogical system, the process that matches the input expression against examples can be very robust, and can always return the best matching output expressions instead of failing completely.</Paragraph>
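      <Paragraph position="2"> The robustness property described above can be illustrated with a minimal sketch. It is not the matching procedure of this paper: the word-overlap similarity from Python's standard difflib module is an assumed stand-in for a real matching function, and the example pairs, the names EXAMPLES, similarity and translate_by_analogy are all hypothetical. The sketch only shows that a nearest-example lookup returns the closest stored pair even for a disfluent input, instead of failing.

```python
from difflib import SequenceMatcher

# Hypothetical toy example database: (source expression, target expression).
# These pairs are invented for illustration only.
EXAMPLES = [
    ("where is the station", "wo ist der Bahnhof"),
    ("how much is the ticket", "wie viel kostet die Fahrkarte"),
    ("thank you very much", "vielen Dank"),
]


def similarity(a, b):
    """Word-level similarity in [0, 1]; an assumed stand-in for a real matcher."""
    return SequenceMatcher(None, a.split(), b.split()).ratio()


def translate_by_analogy(utterance, examples):
    """Return (target, source, score) for the closest stored example.

    The lookup always yields some example when the database is non-empty,
    so a noisy input degrades the score rather than causing a failure.
    """
    if not examples:
        raise ValueError("example database is empty")
    source, target = max(examples, key=lambda pair: similarity(utterance, pair[0]))
    return target, source, similarity(utterance, source)


if __name__ == "__main__":
    # A disfluent input with a filler word and a repeated word.
    print(translate_by_analogy("uh where is the the station", EXAMPLES))
```

 In this sketch the similarity score could also serve as a confidence value, so that a caller can decide how far to trust a distant match.</Paragraph>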
    </Section>
    <Section position="3" start_page="32" end_page="32" type="sub_section">
      <SectionTitle>
1.3 Improving Translation Quality
</SectionTitle>
      <Paragraph position="0"> An additional requirement of an automatic translation system is that it should be possible to improve the translation quality by expending additional effort. In a traditional rule-based system, as the knowledge sources (such as grammar rules, semantic disambiguation rules, transfer rules, etc.) expand in size, there comes a point at which the complex interrelationships between the different types of information preclude any further improvement. In an analogical system, it is possible to incrementally improve the translation quality by adding more examples to the example database, and by effecting corresponding improvements in the matching function, e.g. by refining the thesaurus or re-estimating word similarities from an expanded bilingual corpus.</Paragraph>
    </Section>
    <Section position="4" start_page="32" end_page="32" type="sub_section">
      <SectionTitle>
1.4 The Problem of Scalability
</SectionTitle>
      <Paragraph position="0"> Unfortunately, the pure analogical approach lacks scalability. The effort required to acquire and maintain the example database, the cost of the space required to store the examples, and the cost of the time required to search the database can become prohibitively high, since a pure analogical system requires a separate example for every linguistic variation.</Paragraph>
    </Section>
  </Section>
</Paper>