<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-3124">
  <Title>Microsoft Research Treelet Translation System: NAACL 2006 Europarl Evaluation</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1. Introduction
</SectionTitle>
    <Paragraph position="0"> The dependency treelet translation system developed at MSR is a statistical MT system that takes advantage of linguistic tools, namely a source-language dependency parser and a word alignment component. [1] To train a translation system, we require a sentence-aligned parallel corpus. First, the source side is parsed to obtain dependency trees. Next, the corpus is word-aligned, and the source dependencies are projected onto the target sentences using the word alignments. From the aligned dependency corpus, we extract all treelet translation pairs and train an order model and a bi-lexical dependency model.</Paragraph>
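    <Paragraph> The projection step described above can be illustrated with a minimal sketch. It assumes a strictly one-to-one word alignment and a head-index encoding of the dependency tree; the function name project_dependencies and the toy sentence pair are hypothetical, and unaligned or many-to-one cases, which a real system has to handle, are simply skipped here.

```python
from typing import Dict, List, Optional


def project_dependencies(
    source_heads: List[Optional[int]],
    alignment: Dict[int, int],
    target_len: int,
) -> List[Optional[int]]:
    """Project source dependency arcs onto the target sentence.

    source_heads[i] is the index of the head of source word i (None for the root).
    alignment maps a source word index to a target word index (assumed one-to-one).
    Returns target_heads, where target_heads[j] is the projected head of target
    word j, or None when no arc could be projected.
    """
    target_heads: List[Optional[int]] = [None] * target_len
    for src_child, src_head in enumerate(source_heads):
        if src_head is None:
            # The source root has no incoming arc to project.
            continue
        tgt_child = alignment.get(src_child)
        tgt_head = alignment.get(src_head)
        if tgt_child is None or tgt_head is None:
            # Simplifying assumption: arcs touching unaligned words are dropped.
            continue
        target_heads[tgt_child] = tgt_head
    return target_heads


# Toy example: "the red house" aligned to "la maison rouge".
source_heads = [2, 2, None]       # "the" and "red" both depend on "house"
alignment = {0: 0, 1: 2, 2: 1}    # the-la, red-rouge, house-maison
target_heads = project_dependencies(source_heads, alignment, 3)
print(target_heads)
```

In the toy example, "la" and "rouge" both end up attached to "maison", mirroring the source tree.</Paragraph>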
    <Paragraph position="1"> To translate, we parse the input sentence and employ a decoder to find a combination and ordering of treelet translation pairs that cover the source tree and are optimal according to a set of models. In a now-common generalization of the classic noisy-channel framework, we use a log-linear combination of models [2], as shown below:</Paragraph>
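    <Paragraph> A log-linear combination of this kind is commonly written in the form below. The symbols S (source sentence), T (candidate translation), and F (the set of feature functions) are notation assumed here, with one weight \lambda_f per feature function f.

```latex
\hat{T} = \operatorname*{arg\,max}_{T} \sum_{f \in F} \lambda_f \, f(S, T)
```
    </Paragraph>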
    <Paragraph position="3"> Such an approach toward translation scoring has proven very effective in practice, as it allows a translation system to incorporate information from a variety of probabilistic or non-probabilistic sources. The weights Λ = { λ_f } are selected by discriminatively training against held-out data.</Paragraph>
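    <Paragraph> As a rough illustration of how such a weighted combination scores candidates, the sketch below sums weighted feature values and picks the highest-scoring candidate. The two feature functions, their weights, and the candidate list are toy assumptions made for this example; they are not the models used by the system, and the weights are set by hand rather than trained discriminatively.

```python
from typing import Callable, Dict, List

# A feature function maps a (source, candidate translation) pair to a real value.
# It need not be a probability.
FeatureFn = Callable[[str, str], float]


def loglinear_score(
    source: str,
    candidate: str,
    features: Dict[str, FeatureFn],
    weights: Dict[str, float],
) -> float:
    """Weighted sum of feature values: sum over f of lambda_f * f(S, T)."""
    return sum(weights[name] * fn(source, candidate) for name, fn in features.items())


def best_candidate(
    source: str,
    candidates: List[str],
    features: Dict[str, FeatureFn],
    weights: Dict[str, float],
) -> str:
    """Return the candidate with the highest log-linear score."""
    return max(candidates, key=lambda c: loglinear_score(source, c, features, weights))


# Hypothetical toy features, for illustration only.
def length_penalty(source: str, candidate: str) -> float:
    # Penalize differences in word count between source and candidate.
    return -float(abs(len(source.split()) - len(candidate.split())))


def word_count(source: str, candidate: str) -> float:
    # Non-probabilistic feature: number of words in the candidate.
    return float(len(candidate.split()))


features = {"length_penalty": length_penalty, "word_count": word_count}
weights = {"length_penalty": 1.0, "word_count": 0.1}  # hand-set, not trained

candidates = ["this is a house", "house", "this is truly a very big house"]
best = best_candidate("das ist ein Haus", candidates, features, weights)
print(best)
```

Because the score is a plain weighted sum, adding a new information source only requires a new feature function and a corresponding weight.</Paragraph>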
  </Section>
</Paper>