<?xml version="1.0" standalone="yes"?>
<Paper uid="P01-1050">
  <Title>Towards a Unified Approach to Memory- and Statistical-Based Machine Translation</Title>
  <Section position="6" start_page="0" end_page="0" type="concl">
    <SectionTitle>
6 Discussion
</SectionTitle>
    <Paragraph position="0"> The approach to translation described in this paper is quite general. It can be applied in conjunction with other statistical translation mod-Sentence Humans Greedy with Greedy with Greedy without Commercial Commercial  els. And it can be applied in conjunction with existing translation memories. To do this, one would simply have to train the statistical model on the translation memory provided as input, determine the Viterbi alignments, and enhance the existing translation memory with word-level alignments as produced by the statistical translation model. We suspect that using manually produced TMEMs can only increase the performance as such TMEMs undergo periodic checks for quality assurance.</Paragraph>
    <Paragraph position="1"> The work that comes closest to using a statistical TMEM similar to the one we propose here is that of Vogel and Ney (2000), who automatically derive from a parallel corpus a hierarchical TMEM. The hierarchical TMEM consists of a set of transducers that encode a simple grammar. The transducers are automatically constructed: they reflect common patterns of usage at levels of abstractions that are higher than the words. Vogel and Ney (2000) do not evaluate their TMEM-based system, so it is difficult to empirically compare their approach with ours. From a theoretical perspective, it appears though that the two approaches are complementary: Vogel and Ney (2000) identify abstract patterns of usage and then use them during translation. This may address the data sparseness problem that is characteristic to any statistical modeling effort and produce better translation parameters.</Paragraph>
    <Paragraph position="2"> In contrast, our approach attempts to stir the statistical decoding process into directions that are difficult to reach when one relies only on the parameters of a particular translation model.</Paragraph>
    <Paragraph position="3"> For example, the two phrases &amp;quot;il est mort&amp;quot; and &amp;quot;he kicked the bucket&amp;quot; may appear only in one sentence in an arbitrary large corpus. The parameters learned from the entire corpus will very likely associate very low probability to the words &amp;quot;kicked&amp;quot; and &amp;quot;bucket&amp;quot; being translated into &amp;quot;est&amp;quot; and &amp;quot;mort&amp;quot;. Because of this, a statistical-based MT system will have trouble producing a translation that uses the phrase &amp;quot;kick the bucket&amp;quot;, no matter what decoding technique it employs. However, if the two phrases are stored in the TMEM, producing such a translation becomes feasible.</Paragraph>
    <Paragraph position="4"> If optimal decoding algorithms capable of searching exhaustively the space of all possible translations existed, using TMEMs in the style presented in this paper would never improve the performance of a system. Our approach works because it biases the decoder to search in sub-spaces that are likely to yield translations of high probability, subspaces which otherwise may not be explored. The bias introduced by TMEMs is a practical alternative to finding optimal translations, which is NP-complete (Knight, 1999).</Paragraph>
    <Paragraph position="5"> It is clear that one of the main strengths of the TMEM is its ability to encode contextual, long-distance dependencies that are incongruous with the parameters learned by current context poor, reductionist channel models. Unfortunately, the criterion used by the decoder in order to choose between a translation produced starting from a gloss and one produced starting from a TMEM is biased in favor of the gloss-based translation. It is possible for the decoder to produce a perfect translation using phrases from the TMEM, and yet, to discard the perfect translation in favor of an incorrect translation of higher probability that was obtained from a gloss (or from the TMEM).</Paragraph>
    <Paragraph position="6"> It would be desirable to develop alternative ranking techniques that would permit one to prefer in some instances a TMEM-based translation, even though that translation is not the best according to the probabilistic channel model. The examples in Table 7 shows though that this is not trivial: it is not always the case that the translation of high-Translations Does this translation Is this Is this the translation use TMEM translation of highest phrases? correct? probability? monsieur le pr'esident , je aimerais savoir .</Paragraph>
    <Paragraph position="7"> mr. speaker , i would like to know . yes yes yes mr. speaker , i would like to know . no yes yes je ne peux vous entendre , brian .</Paragraph>
    <Paragraph position="8"> i cannot hear you , brian . yes yes yes i can you listen , brian . no no no alors , je termine l`a - dessus .</Paragraph>
    <Paragraph position="9"> therefore , i will conclude my remarks . yes yes no therefore , i conclude - over . no no yes  est probability is the perfect one. The first French sentence in Table 7 is correctly translated with or without help from the translation memory. The second sentence is correctly translated only when the system uses a TMEM seed; and fortunately, the translation of highest probability is the one obtained using the TMEM seed. The translation obtained from the TMEM seed is also correct for the third sentence. But unfortunately, in this case, the TMEM-based translation is not the most probable. null Acknowledgments. This work was supported by DARPA-ITO grant N66001-00-1-9814.</Paragraph>
  </Section>
class="xml-element"></Paper>