<?xml version="1.0" standalone="yes"?>
<Paper uid="W04-3245">
  <Title>From Machine Translation to Computer Assisted Translation using Finite-State Models</Title>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
2 Machine translation with finite-state transducers
</SectionTitle>
    <Paragraph position="0"> Given a source sentence s, the goal of MT is to find a target sentence t̂ that maximizes:</Paragraph>
    <Paragraph position="2"> The joint distribution Pr(t, s) can be modeled by a Stochastic Finite-State Transducer T (Picó and Casacuberta, 2001):</Paragraph>
    <Paragraph position="4"> A SFST is a finite-state network whose transitions are labeled by three items: 1. a source symbol (a word from the source language vocabulary); 2. a target string (a sequence of words from the target language vocabulary); and 3. a transition probability.</Paragraph>
    <Paragraph position="5">  SFSTs have been successfully applied to many translation tasks (Vidal, 1997; Amengual et al., 2000; Casacuberta et al., 2001). Furthermore, there exist efficient search algorithms such as Viterbi (Viterbi, 1967) for the best path and the Recursive Enumeration Algorithm (REA) (Jiménez and Marzal, 1999) for the n-best paths.</Paragraph>
    <Paragraph position="6"> One possible way of inferring SFSTs is the Grammatical Inference and Alignments for Transducer Inference (GIATI) technique (previously named MGTI, for Morphic-Generator Transducer Inference) (Casacuberta et al., 2004). Given a finite sample of string pairs, it works in three steps: 1. Building training strings. Each training pair is transformed into a single string over an extended alphabet, yielding a new sample of strings. The "extended alphabet" contains words or substrings from the source and target sentences of the training pairs.</Paragraph>
    <Paragraph position="7"> 2. Inferring a (stochastic) regular grammar.</Paragraph>
    <Paragraph position="8"> Typically, a smoothed n-gram model is inferred from the sample of strings obtained in the previous step.</Paragraph>
    <Paragraph position="9"> 3. Transforming the inferred regular grammar into a transducer. The symbols associated with the grammar rules are transformed into source/target symbols by applying an adequate transformation, thereby turning the grammar inferred in the previous step into a transducer.</Paragraph>
    <Paragraph position="10"> The transformation of a parallel corpus into a corpus of single sentences is performed with the help of statistical alignments: each word is joined with its translation in the output sentence, creating an "extended word". This joining is done taking care not to invert the order of the output words. The third step is trivial with this arrangement. In our experiments, the alignments are obtained using the GIZA software (Och and Ney, 2000; Al-Onaizan et al., 1999), which implements IBM statistical models (Brown et al., 1993).</Paragraph>
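As a rough illustration of the joining step described above, the following sketch attaches to each source word the target words aligned to it, producing one extended-word string per sentence pair while keeping the target order monotone. The alignment representation (a dictionary from source positions to lists of target positions) and the "|" and "_" separators are illustrative assumptions, not the exact GIATI notation.

```python
def build_extended_strings(pairs, alignments):
    """Sketch of GIATI step 1: turn each aligned sentence pair into a
    single string over an extended alphabet. alignments[k][j] lists the
    target positions aligned to source word j of pair k (assumed format).
    Target words are emitted in order, never inverted."""
    corpus = []
    for (src, tgt), align in zip(pairs, alignments):
        emitted = 0  # next target position still to be emitted
        ext = []
        for j, s_word in enumerate(src):
            positions = align.get(j, [])
            # attach target words up to the last one aligned to this
            # source word, so the output order is preserved
            last = max(positions) if positions else emitted - 1
            joined = tgt[emitted:last + 1]
            emitted = max(emitted, last + 1)
            ext.append(s_word + "|" + "_".join(joined))
        corpus.append(ext)
    return corpus
```

A crossing alignment simply delays the source word's attachment: the earlier source word absorbs the span of target words, and the later one is joined with an empty translation.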
  </Section>
  <Section position="4" start_page="0" end_page="11" type="metho">
    <SectionTitle>
3 Interactive search
</SectionTitle>
    <Paragraph position="0"> The concept of interactive search is closely related to the CAT paradigm. This paradigm introduces a new factor t_p into the general machine translation equation (Equation 1). t_p represents a prefix in the target language obtained as a result of the interaction between the human translator and the machine translation system.</Paragraph>
    <Paragraph position="1"> As a side effect of this reformulation, the optimization defined in Equation 3 is performed over the set of target suffixes rather than the set of complete target sentences. Hence, the goal of CAT in the finite-state transducer framework is to find a prediction of the best suffix t̂_s, given a source sentence s, a prefix t_p of the target sentence and a SFST T:</Paragraph>
    <Paragraph position="3"> A transducer can be understood as a weighted graph in which every path is a possible source-target sentence pair represented in a compact manner.</Paragraph>
    <Paragraph position="4"> Given a source sentence s to be translated, this sentence is initially employed to define a set of paths in the transducer whose sequence of source symbols is compatible with the source sentence. Equation 3 simply defines the most probable path (target suffix t̂_s) among those that are compatible and have t_p as a target prefix.</Paragraph>
    <Paragraph position="6"/>
    <Paragraph position="8"> The search for this path (the one whose product of edge probabilities is maximum) is performed by Viterbi decoding over the set of paths compatible with the source sentence. The concatenation of the target symbols of this best path yields the target sentence (translation).</Paragraph>
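The Viterbi decoding over such a set of compatible paths can be sketched as follows. The graph representation (a topologically ordered state list plus an adjacency map of (next state, target word, probability) transitions, with designated start and final states) is an assumption made for illustration, not the prototype's actual data structures.

```python
import math

def viterbi_best_path(nodes, edges, start, final):
    """Sketch of Viterbi decoding over an acyclic word graph.
    nodes: states in topological order; edges: state -> list of
    (next_state, word, prob). Returns the most probable target
    string and its log-probability."""
    NEG_INF = float("-inf")
    best = {s: NEG_INF for s in nodes}
    back = {}
    best[start] = 0.0
    for s in nodes:  # topological order: best[s] is final when visited
        if best[s] == NEG_INF:
            continue
        for nxt, word, prob in edges.get(s, []):
            score = best[s] + math.log(prob)
            if score > best[nxt]:
                best[nxt] = score
                back[nxt] = (s, word)
    # recover the translation by walking the back-pointers
    words = []
    s = final
    while s != start:
        s, w = back[s]
        words.append(w)
    words.reverse()
    return " ".join(words), best[final]
```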
    <Paragraph position="9"> The solution to the search problem has been devised in two phases. The first one copes with the extraction of a word graph W from a SFST T given a source sentence s. A word graph represents the set of paths whose sequence of source symbols is compatible with the source sentence s.</Paragraph>
    <Paragraph position="10"> The second phase involves the search for the best translation over the word graph W. To be more precise, in the present work the concept of best translation has been extended to a set of best translations (n-best translations). This search can be carried out efficiently by taking into account not only the a posteriori probability of a given translation t, but also the minimum edit cost with respect to the target prefix. The way in which this latter criterion is integrated in the search process is explained in Section 3.2.</Paragraph>
    <Section position="1" start_page="11" end_page="11" type="sub_section">
      <SectionTitle>
3.1 Word-graph derivation
</SectionTitle>
      <Paragraph position="0"> A word graph represents the set of all possible translations for a given source sentence s that are embedded in the SFST T. The derivation of the word graph is performed by intersecting the SFST T with the source sentence s, defining a subgraph in T whose paths are compatible with the source sentence. Interactive search can be simplified significantly by using this representation of the set of target sentences, since the inclusion of edit cost operations along with the search procedure introduces some peculiarities that can be solved efficiently in the word graph. An example of a word graph is shown in Figure 1.</Paragraph>
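A minimal sketch of this intersection follows, assuming a toy transition format (state, source word, target string, probability, next state). States of the word graph are pairs (transducer state, source position); reachability pruning and backoff handling are omitted for brevity.

```python
def derive_word_graph(transitions, source):
    """Sketch: intersect a SFST with a source sentence s. An SFST
    transition (q, src_word, tgt_string, prob, q2) is kept only where
    src_word matches a word of the sentence; the resulting edge advances
    the source position by one. Transition format is illustrative."""
    graph = {}  # (q, i) -> list of ((q2, i+1), tgt_string, prob)
    for q, src_word, tgt_string, prob, q2 in transitions:
        for i, w in enumerate(source):
            if w == src_word:
                graph.setdefault((q, i), []).append(
                    ((q2, i + 1), tgt_string, prob))
    return graph
```

In a full implementation, states unreachable from the initial pair (initial state, position 0) or not leading to a final pair would additionally be pruned.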
    </Section>
    <Section position="2" start_page="11" end_page="11" type="sub_section">
      <SectionTitle>
3.2 Search for n-best translations given a prefix of the target sentence
</SectionTitle>
      <Paragraph position="0"> The application of this type of search is aimed at the core of CAT. In this paradigm, given a source sentence s, the human translator is provided with a list of n translations, also called n-best translations. Then, the human translator proceeds to accept a prefix of one of these n-best translations as correct, appending some rectifications to the selected prefix.</Paragraph>
      <Paragraph position="1"> This new prefix t_p of the target sentence, together with the source sentence s, generates a new set of best translations that are again modified by the human translator. This process is repeated as many times as necessary to achieve the desired final translation.</Paragraph>
      <Paragraph position="2"> Ideally, the task would be to find the target suffix t_s that maximizes the a posteriori probability given a prefix t_p of the target sentence and the input sentence. In practice, however, it may happen that t_p is not present in the word graph W. The solution is to use not t_p but a prefix t'_p that minimizes the edit distance to t_p and is compatible with W. Therefore, the score of a target translation t = t_p t_s is characterized by two functions: the edit cost between the target prefix t_p and the optimal prefix t'_p found in the word graph W, and the a posteriori probability of the suffix, Pr(t_s | t'_p, s).</Paragraph>
      <Paragraph position="4"> In order to favor those translations that are closer to the user's preferences, the list of n-best translations has been prioritized using two criteria: first, the minimum edit cost, and then the a posteriori probability.</Paragraph>
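This two-level ordering can be sketched as a simple sort, assuming each candidate carries a precomputed edit cost and posterior probability (hypothetical triples, not the prototype's actual data structures).

```python
def rank_candidates(candidates, n):
    """Sketch of the two-criteria ordering: candidates are
    (translation, edit_cost, posterior) triples, sorted first by minimum
    edit cost to the user prefix, then by decreasing a posteriori
    probability; the n best are returned."""
    ordered = sorted(candidates, key=lambda c: (c[1], -c[2]))
    return [t for t, _, _ in ordered[:n]]
```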
      <Paragraph position="5"> The algorithm proposed to solve this search problem is an adapted version of the Recursive Enumeration Algorithm (REA) described in (Jiménez and Marzal, 1999) that integrates the minimum edit cost algorithm in the search procedure to deal with words, introduced by the user, that are not present in the word graph. This algorithm consists of two parts: Forward search, which calculates the 1-best path from the initial state to every state in the word graph W. Paths in the word graph are weighted not only based on their a posteriori probability, but also on their edit cost with respect to the target sentence prefix.</Paragraph>
      <Paragraph position="6"> To this purpose, fictitious edges have been inserted into the word graph to represent edit operations such as insertion, substitution and deletion. These edit operations have been included in the word graph in the following way: - Insertion: an insertion edge has been "inserted", with unitary cost, as a loop for each state in the word graph.</Paragraph>
      <Paragraph position="7"> - Deletion: a deletion edge, with unitary cost, is "added" for each arc in the word graph, having the same source and target states as its sibling arc.</Paragraph>
      <Paragraph position="8"> - Substitution: each arc in the word graph is treated as a substitution edge whose edit cost is proportional to the Levenshtein distance between the symbol associated with the arc and the word prefix employed to traverse it during the search. This substitution cost is zero when the word prefix matches the symbol in the word graph arc.</Paragraph>
      <Paragraph position="9"> Backward search, which enumerates candidates for the k-best path along the (k-1)-best path. This recursive algorithm defines the next best path that arrives at a given state q as the next best path reaching a predecessor state q' plus the arc from q' to q. If this next best path arriving at state q' has not been calculated yet, the procedure is called recursively until a 1-best path is found or no more paths exist.</Paragraph>
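The combined effect of the insertion, deletion and substitution edges can be emulated offline by computing the minimum edit cost between the user prefix and any prefix of a word-graph hypothesis, as in the following sketch. This is a plain dynamic program over word sequences, not the actual search-integrated implementation, and the unit-cost substitutions simplify the prefix-sensitive costs described above.

```python
def prefix_edit_cost(user_prefix, hypothesis):
    """Minimum number of insertions, deletions and substitutions needed
    to turn SOME prefix of `hypothesis` into `user_prefix` (both word
    lists). Taking the minimum over hypothesis prefixes mirrors matching
    the user prefix along a word-graph path."""
    m = len(user_prefix)
    # prev[j]: cost of editing the hypothesis words seen so far
    # into user_prefix[:j]; start with the empty hypothesis prefix
    prev = list(range(m + 1))
    best = prev[m]
    for h_word in hypothesis:
        cur = [prev[0] + 1]  # deleting every hypothesis word so far
        for j in range(1, m + 1):
            sub = prev[j - 1] + (0 if user_prefix[j - 1] == h_word else 1)
            cur.append(min(sub, prev[j] + 1, cur[j - 1] + 1))
        prev = cur
        best = min(best, prev[m])
    return best
```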
      <Paragraph position="10"> To reduce the computational cost of the search, a beam-search technique (Ney et al., 1992) has been implemented. During the word graph construction, two beam coefficients were employed to penalize those edges leading to backoff states over those arriving at normal states. Finally, a third beam coefficient controls how far a hypothesis may stray, in terms of number of edit operations.</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="11" end_page="11" type="metho">
    <SectionTitle>
4 Experimental results
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="11" end_page="11" type="sub_section">
      <SectionTitle>
4.1 Corpus features
</SectionTitle>
      <Paragraph position="0"> The corpus employed to perform experiments was the Xerox corpus (SchlumbergerSema S.A et al., 2001). It involves the translation of technical Xerox manuals from English to Spanish, French and German and vice-versa. Some statistics about the data used for training and test purposes are shown in Table 1.</Paragraph>
    </Section>
    <Section position="2" start_page="11" end_page="11" type="sub_section">
      <SectionTitle>
4.2 Sample session
</SectionTitle>
      <Paragraph position="0"> A TT2 interactive prototype, which uses the search techniques presented in the previous sections, has been implemented. The user can customize this prototype in different ways: the number of suggested translations, the length in words of these suggestions, etc. In the example below, the number of suggestions is five and the length of these suggestions has not been bounded.</Paragraph>
      <Paragraph position="1"> Example 1 This example shows the functionality of the TT2 prototype and its interaction with a translator through a translation instance from English to Spanish for a given sentence drawn from the Xerox corpus. For a better understanding of this example, the reference target sentence is given below: Reference target sentence: Instalación de controladores de impresora y archivos PPD.</Paragraph>
      <Paragraph position="2"> Source sentence: Installing the Printer Drivers and PPDs.</Paragraph>
      <Paragraph position="3">  de impresión y archivos PPD adaptados.</Paragraph>
      <Paragraph position="4"> Hypothesis 0.3: Instalación de la los controladores de impresión y ver los archivos PPD.</Paragraph>
      <Paragraph position="5"> Hypothesis 0.4: Instalación de la esta los controladores de impresión y ver los archivos PPD. User interaction 0: Hypothesis 0.2 is selected and the cursor is positioned at the beginning of the word "los". Then, the translator would type the character "c", that is, the next character in the reference target sentence.</Paragraph>
      <Paragraph position="6"> Prefix 0: Instalación de c Hypothesis 1.0: Instalación de c los controladores de impresión y archivos PPD adaptados.</Paragraph>
      <Paragraph position="7"> Hypothesis 1.1: Instalación de c los controladores de impresión y ver los archivos PPD.</Paragraph>
      <Paragraph position="8"> Hypothesis 1.2: Instalación de c esta los controladores de impresión y archivos PPD adaptados. Hypothesis 1.3: Instalación de c esta los controladores de impresión y ver los archivos PPD. Hypothesis 1.4: Instalación de controladores de impresión y fax y en archivos PPD adaptados.</Paragraph>
      <Paragraph position="9"> User interaction 1: Hypothesis 1.4 is selected and the cursor is positioned between the characters "s" and "i" of the word "impresión". Then, the translator would type the next character in the reference target sentence: "o".</Paragraph>
      <Paragraph position="10">  Prefix 1: Instalación de controladores de impreso Hypothesis 2.0: Instalación de controladores de impresora y archivos PPD adaptados.</Paragraph>
      <Paragraph position="11"> Hypothesis 2.1: Instalación de controladores de impresora y ver los archivos PPD.</Paragraph>
      <Paragraph position="12"> Hypothesis 2.2: Instalación de controladores de impresora/fax y ver los archivos PPD.</Paragraph>
      <Paragraph position="13"> Hypothesis 2.3: Instalación de controladores de impresora/fax y archivos PPD adaptados.</Paragraph>
      <Paragraph position="14"> Hypothesis 2.4: Instalación de controladores de impresora y fax de CentreWare y ver los archivos PPD.</Paragraph>
      <Paragraph position="15"> User interaction 2: Hypothesis 2.0 is selected and the cursor is positioned at the end of the word "PPD". The translator would just need to add the character ".".</Paragraph>
      <Paragraph position="16"> Prefix 2: Instalación de controladores de impresora y archivos PPD.</Paragraph>
      <Paragraph position="17"> Hypothesis 3.0: Instalación de controladores de impresora y archivos PPD.</Paragraph>
      <Paragraph position="18"> Hypothesis 3.1: Instalación de controladores de impresora y archivos PPD.: Hypothesis 3.2: Instalación de controladores de impresora y archivos PPD..</Paragraph>
      <Paragraph position="19"> Hypothesis 3.3: Instalación de controladores de impresora y archivos PPD...</Paragraph>
      <Paragraph position="20"> Hypothesis 3.4: Instalación de controladores de impresora y archivos PPD.:.</Paragraph>
      <Paragraph position="21"> User interaction 3: Hypothesis 3.0 is selected and the user accepts the target sentence.</Paragraph>
      <Paragraph position="22"> Final hypothesis: Instalación de controladores de impresora y archivos PPD.</Paragraph>
    </Section>
    <Section position="3" start_page="11" end_page="11" type="sub_section">
      <SectionTitle>
4.3 Translation quality evaluation
</SectionTitle>
      <Paragraph position="0"> The assessment of the techniques presented in Section 3 has been carried out using four measures: 1. Translation Word Error Rate (TWER): it is defined as the minimum number of word substitution, deletion and insertion operations needed to convert the target sentence provided by the transducer into the reference translation. It is also known as edit distance.</Paragraph>
      <Paragraph position="1">  2. Character Error Rate (CER): Edit distance in terms of characters between the target sentence provided by the transducer and the reference translation.</Paragraph>
      <Paragraph position="2"> 3. Key-Stroke Ratio (KSR): number of keystrokes that are necessary to achieve the reference translation, plus the acceptance keystroke, divided by the number of running characters.</Paragraph>
    </Section>
  </Section>
  <Section position="6" start_page="11" end_page="11" type="metho">
    <SectionTitle>
4. BiLingual Evaluation Understudy (BLEU)
</SectionTitle>
    <Paragraph position="0"> (Papineni et al., 2002): basically, it is a function of the k-substrings that appear both in the hypothesized target sentence and in the reference target sentence.</Paragraph>
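The edit-distance-based measures above can be sketched as follows; normalizing TWER and CER by the reference length is an assumption about how the rates are reported, and BLEU is omitted for brevity.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters),
    counting substitutions, deletions and insertions at unit cost."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, 1):
        cur = [i]
        for j, r in enumerate(ref, 1):
            sub = prev[j - 1] + (0 if r == h else 1)
            cur.append(min(sub, prev[j] + 1, cur[j - 1] + 1))
        prev = cur
    return prev[len(ref)]

def twer(ref_sentence, hyp_sentence):
    """TWER sketch: word-level edit distance, here normalized by the
    number of reference words (normalization is an assumption)."""
    ref, hyp = ref_sentence.split(), hyp_sentence.split()
    return edit_distance(ref, hyp) / len(ref)

def cer(ref_sentence, hyp_sentence):
    """CER sketch: character-level edit distance, here normalized by the
    number of reference characters (normalization is an assumption)."""
    return edit_distance(list(ref_sentence), list(hyp_sentence)) / len(ref_sentence)
```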
    <Paragraph position="1"> These experiments were performed with 3-gram transducers based on the GIATI technique. The leftmost column shows the language pair employed for each experiment: English (En), Spanish (Es), French (Fr) and German (De). The two main central columns compare the results obtained with 1-best translation to those with 5-best translations. When using 5-best translations, the target sentence out of these five that most reduces the corresponding error measure is selected. The results are shown in Table 2.</Paragraph>
    <Paragraph position="2"> The best results were obtained for the English-Spanish language pairs, for which the human translator would need to type less than 25% of the total reference sentences. In other words, this would theoretically result in a factor-of-4 increase in the productivity of human translators. In fact, preliminary subjective evaluations have received positive feedback from professional translators testing the prototype.</Paragraph>
    <Paragraph position="3">  Furthermore, in all cases there is a clear and significant improvement in error measures when moving from 1-best to 5-best translations. This gain in translation quality diminishes in a log-wise fashion as the number of best translations increases. However, the number of hypotheses should be limited according to the user's capability to skim through the candidate translations and decide which one to select.</Paragraph>
    <Paragraph position="4"> Table 3 presents the results obtained on a simplified version of the corpus. This simplification consists of tokenization, case normalization and the substitution of numbers, printer codes, etc. by their corresponding category labels.</Paragraph>
    <Paragraph position="5">  Language pairs such as English-French present somewhat higher error rates, as is also the case for English-German, reflecting the complexity of the task faced in these experiments.</Paragraph>
  </Section>
</Paper>