<?xml version="1.0" standalone="yes"?>
<Paper uid="N06-1002">
  <Title>Do We Need Phrases? Challenging the Conventional Wisdom in Statistical Machine Translation</Title>
  <Section position="8" start_page="14" end_page="15" type="concl">
    <SectionTitle>
6. Conclusions
</SectionTitle>
    <Paragraph position="0"> In this paper we have teased apart the role of  phrases and handled each contribution via a distinct model best suited to the task. Non-compositional translations stay as MTU phrases. Context and robust estimation is provided by MTU-based n-gram models. Local and global ordering is handled by a tree-based model.</Paragraph>
    <Paragraph position="1"> The first interesting result is that at normal phrase sizes, augmenting an SMT system with MTU n-gram models improves quality; whereas replacing the standard phrasal channel models by the more theoretically sound MTU n-gram channel models leads to very similar performance.</Paragraph>
    <Paragraph position="2"> Even more interesting are the results on smaller phrases. A system using very small phrases (size 2) and MTU bigram models matches (English-French) or at least approaches (English-Japanese) the performance of the baseline system using large phrases (size 4). While this work does not yet obviate the need for phrases, we consider it a promising step in that direction.</Paragraph>
    <Paragraph position="3"> An immediate practical benefit is that it allows systems to use much smaller phrases (and hence smaller phrase tables) with little or no loss in quality. This result is particularly important for syntax-based systems, or any system that allows discontiguous phrases. Given a fixed length limit, the number of surface phrases extracted from any sentence pair of length n where all words are uniquely aligned is O(n), but the number of treelets is potentially exponential in the number of children; and the number of rules with two gaps extracted by Chiang (2005) is potentially O(n3). Our results using MTUs suggest that such systems can avoid unwieldy, poorly estimated long phrases and instead anchor decoding on shorter, more tractable knowledge units such as MTUs, incorporating channel model information and contextual knowledge with an MTU n-gram model.</Paragraph>
    <Paragraph position="4"> Much future work does remain. From inspecting the model weights of the best systems, we note that only the source order MTU n-gram model has a major contribution to the overall score of a given candidate. This suggests that the three distinct models, despite their different walk orders, are somewhat redundant. We plan to consider other approaches for conditioning on context. Furthermore phrasal channel models, in spite of the laundry list of problems presented here, have a significant impact on translation quality. We hope to replace them with effective models without the brittleness and sparsity issues of heavy lexicalization.</Paragraph>
  </Section>
class="xml-element"></Paper>