<?xml version="1.0" standalone="yes"?>
<Paper uid="P05-2016">
  <Title>Dependency-Based Statistical Machine Translation</Title>
  <Section position="13" start_page="94" end_page="94" type="evalu">
    <SectionTitle>
7 Evaluation
</SectionTitle>
    <Paragraph position="0"> We propose to evaluate system performance with version 0.9 of the NIST automated scorer (NIST, 2002), which is a modification of the BLEU system (Papineni et al., 2001). BLEU computes a score from a weighted combination of the precisions of the n-grams that match the reference translations, along with a penalty applied when the system output is substantially shorter than the reference translation closest in length. Experiments have shown a high degree of correlation between BLEU scores and human judgments of translation quality. The most notable difference in the NIST scorer is that it weights n-grams by their informativeness, so that rarer n-grams contribute more to the score. Details of the scorer can be found in the cited NIST document (NIST, 2002).</Paragraph>
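As an illustration of the BLEU computation summarized above, the following is a minimal sketch and not the NIST scorer itself. It assumes the standard formulation of Papineni et al., in which clipped n-gram precisions for n = 1 to 4 are combined by a uniformly weighted geometric mean (a weighted sum in log space) and multiplied by a brevity penalty. It also assumes sentence-level scoring over pre-tokenized input, whereas the published metric is computed over a whole test corpus. The function names are hypothetical and chosen only for this example.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, references, n):
    """Clipped n-gram precision: a candidate n-gram is credited at most as
    many times as it occurs in any single reference."""
    cand_counts = ngrams(candidate, n)
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0
    max_ref = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return clipped / total


def brevity_penalty(candidate, references):
    """Penalize output shorter than the reference closest in length."""
    c = len(candidate)
    if c == 0:
        return 0.0
    # Closest reference length; ties are broken toward the shorter reference.
    r = min((len(ref) for ref in references),
            key=lambda length: (abs(length - c), length))
    return 1.0 if c > r else math.exp(1.0 - r / c)


def bleu(candidate, references, max_n=4):
    """Sentence-level BLEU with uniform weights (an assumption of this sketch)."""
    precisions = [modified_precision(candidate, references, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        # Without smoothing, one zero precision drives the geometric mean to zero.
        return 0.0
    log_mean = sum(math.log(p) for p in precisions) / max_n
    return brevity_penalty(candidate, references) * math.exp(log_mean)
```

Calling bleu(hypothesis.split(), [reference.split()]) on whitespace-tokenized strings yields a score between 0 and 1. The NIST scorer differs in that it replaces the uniform treatment of matching n-grams with information weights and uses a different length penalty; neither is reproduced here.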
    <Paragraph position="1"> For our experiments, we propose to use the data from the PDT, which has already been segmented into training, held-out (or development test), and evaluation sets. As a baseline, we will run the GIZA++ implementation of IBM's Model 4 translation algorithm under the same training conditions as our own system (Al-Onaizan et al., 1999; Och and Ney, 2000; Germann et al., 2001).</Paragraph>
  </Section>
</Paper>