<?xml version="1.0" standalone="yes"?>
<Paper uid="W05-0836">
  <Title>Training and Evaluating Error Minimization Rules for Statistical Machine Translation</Title>
  <Section position="2" start_page="0" end_page="208" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> State of the art statistical machine translation takes advantage of exponential models to incorporate a large set of potentially overlapping features to select translations from a set of potential candidates.</Paragraph>
    <Paragraph position="1"> As discussed in (Och, 2003), the direct translation model represents the probability of target sentence 'English' e = e1 ...eI being the translation for a source sentence 'French' f = f1 ...fJ through an exponential, or log-linear model</Paragraph>
    <Paragraph position="3"> where e is a single candidate translation for f from the set of all English translations E, l is the parameter vector for the model, and each hk is a feature function of e and f. In practice, we restrict E to the set Gen(f) which is a set of highly likely translations discovered by a decoder (Vogel et al., 2003). Selecting a translation from this model under the Maximum A Posteriori (MAP) criteria yields</Paragraph>
    <Paragraph position="5"> This decision rule is optimal under the zero-one loss function, minimizing the Sentence Error Rate (Mangu et al., 2000). Using the log-linear form to model pl(e|f) gives us the flexibility to introduce overlapping features that can represent global context while decoding (searching the space of candidate translations) and rescoring (ranking a set of candidate translations before performing the argmax operation), albeit at the cost of the traditional source-channel generative model of translation proposed in (Brown et al., 1993).</Paragraph>
    <Paragraph position="6"> A significant impact of this paradigm shift, however, has been the movement to leverage the flexibility of the exponential model to maximize performance with respect to automatic evaluation met- null rics. Each evaluation metric considers different aspects of translation quality, both at the sentence and corpus level, often achieving high correlation to human evaluation (Doddington, 2002). It is clear that the decision rule stated in (1) does not reflect the choice of evaluation metric, and substantial work has been done to correct this mismatch in criteria. Approaches include integrating the metric into the decision rule, and learning l to optimize the performance of the decision rule. In this paper we will compare and evaluate several aspects of these techniques, focusing on Minimum Error Rate (MER) training (Och, 2003) and Minimum Bayes Risk (MBR) decision rules, within a novel training environment that isolates the impact of each component of these methods.</Paragraph>
  </Section>
class="xml-element"></Paper>