<?xml version="1.0" standalone="yes"?>
<Paper uid="C92-2101">
  <Title>LEARNING TRANSLATION TEMPLATES FROM BILINGUAL TEXT</Title>
  <Section position="4" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1. Introduction
</SectionTitle>
    <Paragraph position="0"> In the field of machine translation, there is growing interest in example-based approaches. The basic idea of example-based machine translation is to perform translation by imitating translation examples of similar sentences [Nagao84]. This is similar to a method often used by human translators. If appropriate examples are available, high-quality translations can be produced.</Paragraph>
    <Paragraph position="1"> We are developing a two-phase example-based machine translation system which is composed of two subsystems: learning of translation templates from examples and translation based on template matching.</Paragraph>
    <Paragraph position="2"> This paper discusses in particular how to learn translation templates from examples. While most previous research in this area has focused on other aspects [Sato90][Sumita91], we believe that automatic learning from examples is essential for implementing practical example-based machine translation systems. [Fig. 1: Two-Phase Example-Based Machine Translation] [Actes de COLING-92, Nantes, 23-28 août 1992, p. 672 / Proc. of COLING-92, Nantes, Aug. 23-28, 1992] Fig. 1 shows the two-phase example-based machine translation system. As shown in the figure, a collection of translation templates is learned from a bilingual corpus. Source language (SL) texts are translated into target language (TL) texts by using the translation templates.</Paragraph>
    <Paragraph position="3"> Each translation template is a bilingual pair of pseudo sentences. Each pseudo sentence is a sentence which includes variables. Conditions concerning syntactic categories, semantic categories, etc. are attached to each variable.</Paragraph>
    <Paragraph position="4"> A word or phrase satisfying the conditions can be substituted for a variable. The two pseudo sentences constituting a template include the same set of variables. Parallel substitution of pairs of words or phrases, which are translations of each other, for the variables in a template produces a pair of real sentences which are translations of each other.</Paragraph>
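    <Paragraph> The parallel substitution described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the class name Template, the {X} variable notation, the function names, and the toy sentence pair are all assumptions introduced for this example, and only syntactic-category conditions are modelled.

```python
import re
from dataclasses import dataclass


@dataclass
class Template:
    """A bilingual pair of pseudo sentences that share one set of variables."""
    source: str        # SL pseudo sentence; {X} marks a variable (assumed notation)
    target: str        # TL pseudo sentence using the same variables
    conditions: dict   # variable name -> required syntactic category


def variables(pseudo_sentence):
    """Return the set of variable names used in a pseudo sentence."""
    return set(re.findall(r"\{(\w+)\}", pseudo_sentence))


def instantiate(template, bindings):
    """Substitute translation pairs for the variables of both pseudo sentences.

    bindings maps a variable name to (SL unit, TL unit, syntactic category).
    """
    names = variables(template.source)
    if names != variables(template.target) or names != set(bindings):
        raise ValueError("source, target and bindings must use the same variables")
    source, target = template.source, template.target
    for name, (sl_unit, tl_unit, category) in bindings.items():
        # A unit may only fill a variable whose condition it satisfies.
        if template.conditions.get(name) != category:
            raise ValueError(f"{name} does not satisfy its condition")
        source = source.replace("{" + name + "}", sl_unit)
        target = target.replace("{" + name + "}", tl_unit)
    return source, target


# Hypothetical toy template and binding, chosen only for illustration.
template = Template("play {X}", "{X} をする", {"X": "NP"})
pair = instantiate(template, {"X": ("tennis", "テニス", "NP")})
```

 Because the same binding is applied to both pseudo sentences, the two results stay translations of each other as long as each bound pair is itself a translation pair.</Paragraph>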
    <Paragraph position="5"> The learning procedure is divided into two steps. In the first step, a series of translation templates is generated from each pair of sentences in the corpus.</Paragraph>
    <Paragraph position="6"> This first step is subdivided into (a) coupling of corresponding units (words and phrases) and (b) generation of translation templates as shown in Fig. 2.</Paragraph>
    <Paragraph position="7"> The details of (a) and (b) are described in Section 3 and Section 4, respectively. In the second step, translation templates are refined to resolve conflicts among the translation templates. The details of the second step are described in Section 5.</Paragraph>
    <Paragraph position="8"> Translation based on templates consists of (i) SL template matching, (ii) translation of words and phrases, and (iii) TL sentence generation, as shown in Fig. 3.</Paragraph>
    <Paragraph position="9"> Translation templates are regarded as directional from SL to TL, although they are actually bidirectional.</Paragraph>
    <Paragraph position="10"> First, a translation template whose SL part matches the SL sentence to be translated is retrieved. Words and phrases in the SL sentence are then bound to the variables in the template. Second, the words and phrases which are bound to variables are translated by a conventional machine translation method. Finally, a TL sentence is generated by substituting the translated words and phrases for the variables in the TL part of the translation template.</Paragraph>
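    <Paragraph> The three steps of template-based translation can be sketched as below. This is a rough illustration under stated assumptions: the function name, the {X} variable notation and the regular-expression matcher are hypothetical, conditions on variables are ignored, and a plain dictionary lookup stands in for the conventional machine translation method that the paper leaves unspecified.

```python
import re


def translate_with_template(sentence, sl_pattern, tl_pattern, translate_unit):
    """Translate a sentence with one template; return None if the SL part does not match."""
    # (i) SL template matching: literal text must match exactly, and each
    #     {VAR} slot captures the word or phrase bound to that variable.
    parts = re.split(r"\{(\w+)\}", sl_pattern)
    literals, names = parts[0::2], parts[1::2]
    regex = "(.+?)".join(re.escape(literal) for literal in literals)
    match = re.fullmatch(regex, sentence)
    if match is None:
        return None
    bindings = dict(zip(names, match.groups()))
    # (ii) Translation of the bound words and phrases (stand-in translator).
    translated = {name: translate_unit(unit) for name, unit in bindings.items()}
    # (iii) TL sentence generation: fill the variables of the TL part.
    return re.sub(r"\{(\w+)\}", lambda m: translated[m.group(1)], tl_pattern)


# Hypothetical toy dictionary and template, for illustration only.
toy_dictionary = {"tennis": "テニス", "baseball": "野球"}
result = translate_with_template(
    "play tennis", "play {X}", "{X} をする", toy_dictionary.get
)
no_match = translate_with_template(
    "sing a song", "play {X}", "{X} をする", toy_dictionary.get
)
```

 A full system would first retrieve a matching template from the learned collection; here a single template is passed in directly to keep the sketch short.</Paragraph>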
    <Paragraph position="11">  3. Coupling of Corresponding Units in</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
Bilingual Text
</SectionTitle>
      <Paragraph position="0"> An algorithm for coupling corresponding units (words and phrases) between a sentence in one language and its translation in another language is described. Although it is applicable to any pair of languages, it is explained for Japanese and English. The procedure consists of four steps: (a) analysis of Japanese sentence, (b) analysis of English sentence, (c) coupling of possible corresponding words between Japanese and English sentences, and (d) coupling of corresponding phrases between Japanese and English sentences.</Paragraph>
      <Paragraph position="1"> [Figure: translation example. The Japanese sentence is not recoverable; its English translation is "The maximum length of a record is 512 bytes."] [Figure: sentence analysis table; contents not recoverable.]</Paragraph>
      <Paragraph position="2"> (a) Analysis of Japanese sentence: The Japanese sentence is segmented into words by consulting a Japanese language dictionary. Then it is parsed with a parallel parsing algorithm, e.g. the CYK (Cocke-Younger-Kasami) method. As a result, a Japanese sentence analysis table is produced which expresses all possible phrases in the sentence. This Japanese sentence analysis table is a triangular matrix, as shown in the upper right portion of Fig. 4. The syntactic categories (phrase markers) of all possible phrases constituted by the i-th through (i+j-1)-th words in the Japanese sentence are written in the (i, j)-element of the table. Resolution of syntactic ambiguity is postponed until the phrase coupling step.</Paragraph>
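      <Paragraph> A sentence analysis table of this kind can be sketched with a textbook CYK chart. The code below is an illustration only: the function name, the binary toy grammar, the lexicon, and the English example sentence are assumptions made for this sketch, not the grammar or dictionary used in the paper. The table is indexed as in the text, so element (i, j) holds the categories of the phrase made of the i-th through (i+j-1)-th words.

```python
from collections import defaultdict


def analysis_table(words, lexicon, rules):
    """Build a CYK table keyed by (i, j): start position i (1-based), length j."""
    n = len(words)
    table = defaultdict(set)
    # Length-1 phrases come straight from the lexicon.
    for i, word in enumerate(words, start=1):
        table[(i, 1)] |= lexicon.get(word, set())
    # Longer phrases are built from every split into two shorter phrases.
    for length in range(2, n + 1):
        for i in range(1, n - length + 2):
            for split in range(1, length):
                for left in table[(i, split)]:
                    for right in table[(i + split, length - split)]:
                        table[(i, length)] |= rules.get((left, right), set())
    return table


# Hypothetical toy lexicon and binary rules, for illustration only.
lexicon = {
    "he": {"NP"}, "bought": {"V"}, "a": {"Det"}, "car": {"N"},
    "with": {"P"}, "four": {"Num"}, "dollars": {"N"},
}
rules = {
    ("Det", "N"): {"NP"}, ("Num", "N"): {"NP"}, ("P", "NP"): {"PP"},
    ("NP", "PP"): {"NP"}, ("V", "NP"): {"VP"}, ("VP", "PP"): {"VP"},
    ("NP", "VP"): {"S"},
}
words = "he bought a car with four dollars".split()
table = analysis_table(words, lexicon, rules)
```

 In this toy run the table keeps both readings of the prepositional phrase: element (3, 5) contains NP for "a car with four dollars", and element (2, 3) contains VP for "bought a car". No reading is discarded at this stage, which matches the decision to postpone ambiguity resolution.</Paragraph>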
      <Paragraph position="3"> (b) Analysis of English sentence: The English sentence is similarly analyzed and an English sentence analysis table is obtained. The English sentence analysis table is a triangular matrix, as shown in the lower left portion of Fig. 4.</Paragraph>
      <Paragraph position="4"> (c) Coupling of possible corresponding words: Each pair of words between the Japanese sentence and its translation in English is coupled if, and only if, the pair is found in the bilingual dictionary. Obviously, there is potential ambiguity in correspondence between words if the sentence includes words which have a common translation. The most typical case is when a word occurs more than once in a sentence, as shown in Fig. 5. In this example, the correspondence between the two 'パス' and the two 'path' cannot be determined simply by consulting the bilingual dictionary. This ambiguity will therefore be resolved in the process of coupling phrases. The coupling of words between the Japanese and English sentences is done in order to obtain candidates for variables in translation templates. We therefore restrict coupling to content words. A content word is usually replaceable with another word without affecting the grammar of the sentence. Verbs are, of course, closely related to sentence patterns. However, a group of verbs can produce the same sentence pattern. Therefore, verbs are candidates for variables. On the other hand, function words are closely related to sentence patterns. Moreover, correspondence is not straightforward between Japanese function words and English function words. Therefore, function words should be excluded from coupling.</Paragraph>
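      <Paragraph> Word coupling by dictionary lookup can be sketched as follows. The function name, the data layout (a dictionary from a Japanese word to a set of English translations, plus a set of content words), and the three-word toy sentences are assumptions made for illustration. The sketch deliberately keeps ambiguous couplings, since the text resolves them later during phrase coupling.

```python
def couple_words(ja_words, en_words, bilingual_dict, content_words):
    """Return all (i, j) pairs, 1-based, whose words are dictionary translations.

    Only Japanese content words are considered; function words are skipped.
    A word with several possible counterparts yields several pairs.
    """
    pairs = []
    for i, ja in enumerate(ja_words, start=1):
        if ja not in content_words:
            continue
        for j, en in enumerate(en_words, start=1):
            if en in bilingual_dict.get(ja, set()):
                pairs.append((i, j))
    return pairs


# Hypothetical toy input in which the same word occurs twice on each side.
ja_words = ["パス", "の", "パス"]
en_words = ["path", "of", "path"]
pairs = couple_words(ja_words, en_words, {"パス": {"path"}}, {"パス"})
```

 With a repeated word, the lookup alone produces all four position pairs, which is exactly the ambiguity that the dictionary cannot settle.</Paragraph>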
      <Paragraph position="5"> (d) Coupling of corresponding phrases: The Japanese and English sentence analysis tables are searched bottom up for corresponding phrases. For each phrase X in the Japanese analysis table, the English sentence analysis table is searched for a phrase Y which includes a counterpart for each word inside of X, but none for words outside of X. If a Y is found, X and Y are coupled together.</Paragraph>
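      <Paragraph> The search for a phrase Y given a phrase X can be sketched as below. This is a simplified illustration, not the paper's procedure in full: phrases are represented as (start, end) word spans (1-based, inclusive), the word couplings are assumed to be already one-to-one, and the first (shortest) matching Y is taken. The function name and the toy spans are hypothetical.

```python
def couple_phrases(ja_phrases, en_phrases, word_pairs):
    """Couple Japanese spans to English spans, examining shorter phrases first."""
    ja_to_en = dict(word_pairs)  # assumes each coupled word has one counterpart
    coupled = []
    for x in sorted(ja_phrases, key=lambda span: span[1] - span[0]):
        x_positions = range(x[0], x[1] + 1)
        inside = {en for ja, en in ja_to_en.items() if ja in x_positions}
        if not inside:
            continue  # X contains no coupled content word
        outside = set(ja_to_en.values()) - inside
        for y in sorted(en_phrases, key=lambda span: span[1] - span[0]):
            y_positions = set(range(y[0], y[1] + 1))
            # Y must hold every counterpart of words inside X and
            # no counterpart of a word outside X.
            if inside.issubset(y_positions) and outside.isdisjoint(y_positions):
                coupled.append((x, y))
                break
    return coupled


# Hypothetical toy input: Japanese word order A _ B _ C (content words at
# positions 1, 3, 5) against English "B and C of A" (B=1, C=3, A=5).
word_pairs = [(1, 5), (3, 1), (5, 3)]
spans = [(1, 3), (3, 5), (1, 5)]
coupled = couple_phrases(spans, spans, word_pairs)
```

 In this toy run the Japanese span (1, 3), which groups A with B, finds no English partner and is left uncoupled, while span (3, 5) is coupled to the English span (1, 3) and the two full sentences are coupled to each other.</Paragraph>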
      <Paragraph position="6"> (i) Resolution of ambiguity in correspondence between words: Ambiguity in correspondence between words is resolved during the phrase coupling process as follows. Assume that a word J in the Japanese sentence has more than one counterpart in the English sentence. When a phrase X which includes J is coupled to a phrase Y in the English sentence, it is assumed that the correct counterpart for J is included in Y. This decision is highly reliable, as shorter phrases are examined before longer phrases. An example of ambiguity resolution in correspondence between words is given in Fig. 5. In this example, the ambiguity in correspondence between the two 'パス' and the two 'path' is resolved simultaneously as NP ( パス名 ) and NP ( path name ) are coupled together. Here, X ( w_1 w_2 ... w_n ) stands for a phrase whose syntactic category is X and which is constituted by words w_1, w_2, ..., and w_n. (ii) Resolution of syntactic ambiguity: A phrase X in one language sentence S is not coupled to any phrase in the other language sentence T if T does not include a phrase which includes counterparts for all the words inside X, but none for words outside of X.</Paragraph>
      <Paragraph position="7"> This means that syntactic ambiguity is resolved implicitly in the process of coupling phrases. An example of this is shown in Fig. 4. While the English sentence analysis table contains NP ( a car with four dollars ), the Japanese sentence analysis table does not contain a phrase which includes the counterparts of 'car', 'four', and 'dollars' and none of the other content words. Accordingly NP ( a car with four dollars ) is not coupled to any phrase in the Japanese sentence. This means that NP ( a car with four dollars ) is rejected.</Paragraph>
      <Paragraph position="8"> Fig. 6 shows another example of ambiguity resolution. The pair of sentences is 'A の B と C' and 'B and C of A'. While the Japanese sentence analysis table contains NP ( A の B ), the English sentence analysis table does not contain a phrase which includes A and B and does not include C. Accordingly NP ( A の B ) [...] one phrase containing the same set of content words. In Fig. 7(a), for example, S' ( [Japanese garbled in source] ) and ADVP ( [Japanese garbled in source] ) contain the same set of content words. Likewise, S' ( the path name is omitted ) and ADVP ( If the path name is omitted ) contain the same set of content words {path, name, omit}. There are several possibilities for deciding which phrase to couple to which phrase. We decided that the smallest ones should be coupled together and the largest ones should be coupled together. In the above example, the Japanese S' and S' ( the path name is omitted ) are coupled together, and the Japanese ADVP and ADVP ( If the path name is omitted ) are coupled together.</Paragraph>
      <Paragraph position="9"> This strategy is also effective when a content word has no counterpart, as shown in Fig. 7(b). The [...] Each pair of coupled units is a candidate for being replaced with a variable. A template is obtained by choosing a subset of the coupled unit pairs and assigning a unique variable to each pair in the subset. The syntactic categories (phrase markers) of the unit in the Japanese sentence are appended to the variable in the Japanese part of the template. Likewise, the syntactic categories of the unit in the English sentence are appended to the variable in the English part of the template.</Paragraph>
      <Paragraph position="10"> The above procedure can be applied to any subset of the coupled units, as long as units which do not overlap are chosen. Accordingly, a series of translation templates can be generated from a pair of sentences. A pair of sentences and some of the translation templates generated from it are shown in Fig. 2.</Paragraph>
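      <Paragraph> Enumerating templates from non-overlapping subsets of coupled units can be sketched as follows. The function names, the tuple layout for a coupled unit (Japanese span, English span, Japanese category, English category), the X1, X2 variable naming, and the toy sentence pair are all assumptions introduced for this illustration.

```python
from itertools import combinations


def positions(span):
    """Word positions covered by a (start, end) span, 1-based and inclusive."""
    return set(range(span[0], span[1] + 1))


def render(words, labels):
    """Replace each span in labels (span -> variable text) by its variable."""
    by_start = {span[0]: (span, text) for span, text in labels.items()}
    out, i = [], 1
    while len(words) >= i:
        if i in by_start:
            span, text = by_start[i]
            out.append(text)
            i = span[1] + 1  # skip the words covered by the variable
        else:
            out.append(words[i - 1])
            i += 1
    return " ".join(out)


def generate_templates(ja_words, en_words, coupled_units):
    """Yield one (Japanese part, English part) template per valid subset."""
    templates = []
    for size in range(1, len(coupled_units) + 1):
        for subset in combinations(coupled_units, size):
            # Units in a subset must not overlap on either side.
            if not all(
                positions(a[0]).isdisjoint(positions(b[0]))
                and positions(a[1]).isdisjoint(positions(b[1]))
                for a, b in combinations(subset, 2)
            ):
                continue
            ja = {u[0]: f"X{k}[{u[2]}]" for k, u in enumerate(subset, start=1)}
            en = {u[1]: f"X{k}[{u[3]}]" for k, u in enumerate(subset, start=1)}
            templates.append((render(ja_words, ja), render(en_words, en)))
    return templates


# Hypothetical toy sentence pair and coupled units, for illustration only.
ja_words = ["A", "の", "B", "と", "C"]
en_words = ["B", "and", "C", "of", "A"]
units = [((1, 1), (5, 5), "NP", "NP"),   # A
         ((3, 5), (1, 3), "NP", "NP"),   # B ... C
         ((3, 3), (1, 1), "NP", "NP")]   # B
templates = generate_templates(ja_words, en_words, units)
```

 Three coupled units give five templates here, because the two subsets that contain both the B unit and the larger unit covering B are skipped as overlapping. The number of subsets grows quickly with the number of coupled units, which is one reason the refinement step of Section 5 is needed.</Paragraph>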
      <Paragraph position="11"> A translation template need not correspond to a full sentence. Fragmentary translation templates, which correspond to fragments in a sentence, improve the flexibility of the system. The result of translation by a [...] A fragmentary translation template is generated by choosing a coupled unit pair and applying the above-described procedure to the inside of the units. The syntactic categories of the units are appended to the fragmentary translation template. An example of a fragmentary translation template is:  ADVP ( X[NP] [Japanese garbled in source] ) / ADVP ( if X[NP] is omitted ), which is generated from the following pair of sentences: [Japanese sentence not recoverable] / If the path name is omitted, the current path is assumed.</Paragraph>
      <Paragraph position="12"> 5. Refinement of Translation Templates  Obviously the procedure described here also generates some ineffective templates, which should of course be eliminated from the collection of translation templates. The remaining ones should be refined.</Paragraph>
      <Paragraph position="13"> In this stage, translation templates are considered to be directional. All the translation templates obtained from a bilingual corpus are grouped by their SL part, and further subgrouped by their TL part. When there is a group of templates whose SL parts are the same but whose TL parts are different, we say that they conflict with each other, because they can produce different translations for the same sentence.</Paragraph>
      <Paragraph position="14"> If a template does not conflict with any other template, it is judged effective. It will probably produce good translations for sentences in the domain of the corpus. If a template conflicts with many templates, it is judged useless and eliminated from the collection of templates. If a template conflicts with a small number of templates, it is judged incomplete but possibly effective. Templates which conflict with each other are refined by examining the original translation examples from which they were generated. That is, semantic categories which distinguish each template are extracted from the original translation examples, and attached to variables in the template.</Paragraph>
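      <Paragraph> The grouping and the three-way judgement can be sketched as below. The function name, the labels, and in particular the threshold max_conflicts are assumptions: the text only distinguishes "many" from "a small number" of conflicts and gives no figure. The example templates are hypothetical as well.

```python
from collections import defaultdict


def classify_templates(templates, max_conflicts=2):
    """Label each (SL part, TL part) template by its number of conflicts.

    max_conflicts is an assumed cut-off between "a small number" and "many".
    """
    tl_parts_by_sl = defaultdict(set)
    for sl_part, tl_part in templates:
        tl_parts_by_sl[sl_part].add(tl_part)
    labels = {}
    for sl_part, tl_parts in tl_parts_by_sl.items():
        # Templates with the same SL part and different TL parts conflict.
        conflicts = len(tl_parts) - 1
        for tl_part in tl_parts:
            if conflicts == 0:
                labels[(sl_part, tl_part)] = "effective"
            elif conflicts > max_conflicts:
                labels[(sl_part, tl_part)] = "useless"
            else:
                labels[(sl_part, tl_part)] = "incomplete"
    return labels


# Hypothetical toy collection, for illustration only.
labels = classify_templates([
    ("play X[NP]", "X[NP] をする"),
    ("play X[NP]", "X[NP] をひく"),
    ("eat X[NP]", "X[NP] を食べる"),
])
```

 Templates labelled "incomplete" are the ones that would go on to refinement with semantic categories; that refinement step is not modelled in this sketch.</Paragraph>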
      <Paragraph position="15"> A simple example is given below. There is a conflict between templates (#1) and (#2):  (#1) play X[NP] / X[NP] をする.</Paragraph>
      <Paragraph position="16"> (#2) play X[NP] / X[NP] をひく.</Paragraph>
      <Paragraph position="17">  The following are translation examples from which (#1) is generated: play baseball / 野球をする.</Paragraph>
      <Paragraph position="18"> play tennis / テニスをする.</Paragraph>
      <Paragraph position="19"> And the following are translation examples from which (#2) is generated: play the piano / ピアノをひく.</Paragraph>
      <Paragraph position="20"> play the violin / バイオリンをひく.</Paragraph>
      <Paragraph position="21"> The conflict between (#1) and (#2) is resolved by using the semantic categories 'sport' and 'instrument' extracted from these examples. The following are the refined versions of the templates:  (#1') play X[NP/sport] / X[NP] をする.</Paragraph>
      <Paragraph position="22"> (#2') play X[NP/instrument] / X[NP] をひく. 6. Discussion</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
6.1 Advantages of two-phase example-based
</SectionTitle>
      <Paragraph position="0"> machine translation The proposed system has the following advantages.</Paragraph>
    </Section>
  </Section>
</Paper>