<?xml version="1.0" standalone="yes"?>
<Paper uid="P05-2016">
  <Title>Dependency-Based Statistical Machine Translation</Title>
  <Section position="13" start_page="94" end_page="94" type="evalu">
    <SectionTitle>7 Evaluation</SectionTitle>
    <Paragraph position="0">We propose to evaluate system performance with version 0.9 of the NIST automated scorer (NIST, 2002), which is a modification of the BLEU metric (Papineni et al., 2001). BLEU calculates a score from a weighted sum of the counts of matching n-grams, together with a penalty for a significant difference in length between the system output and the reference translation closest in length. Experiments have shown a high degree of correlation between BLEU scores and human judgments of translation quality. The most notable difference in the NIST scorer is that it weights n-grams by a notion of informativeness; details of the scorer can be found in the NIST report (NIST, 2002).</Paragraph>
    <Paragraph position="1">For our experiments, we propose to use the data from the PDT, which has already been segmented into training, held-out (development test), and evaluation sets. As a baseline, we will run the GIZA++ implementation of IBM's Model 4 translation algorithm under the same training conditions as our own system (Al-Onaizan et al., 1999; Och and Ney, 2000; Germann et al., 2001).</Paragraph>
  </Section>
</Paper>
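To make the scoring mechanism described above concrete, the following is a minimal, illustrative sketch of a BLEU-style sentence score: clipped ("modified") n-gram precisions combined with a brevity penalty for candidates shorter than the reference. It is an assumption-laden toy, not the NIST v0.9 scorer (which additionally weights n-grams by informativeness and operates over a whole test corpus) and not the authors' evaluation code; the function names `ngrams` and `bleu` and the single-reference, smoothed setup are hypothetical choices made here for brevity.

```python
# Toy BLEU-style score: geometric mean of modified n-gram precisions times a
# brevity penalty. Illustrative only; not the NIST v0.9 scorer and not an
# official BLEU implementation (single reference, crude smoothing).
import math
from collections import Counter


def ngrams(tokens, n):
    """Counter of the n-grams (as tuples) occurring in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU-style score for one candidate/reference token-list pair."""
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        ref_counts = ngrams(reference, n)
        # Modified precision: clip each candidate n-gram count by its reference count.
        matched = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = max(sum(cand_counts.values()), 1)
        # Smooth zero matches to avoid log(0) in this toy version.
        log_precisions.append(math.log(max(matched, 1e-9) / total))

    # Brevity penalty: penalize candidates shorter than the reference length.
    c, r = len(candidate), len(reference)
    bp = 1.0 if c > r else math.exp(1.0 - r / max(c, 1))
    return bp * math.exp(sum(log_precisions) / max_n)


if __name__ == "__main__":
    cand = "the cat sat on the mat".split()
    ref = "the cat is sitting on the mat".split()
    print(round(bleu(cand, ref), 4))
```

The NIST variant can be thought of as replacing the uniform per-n-gram credit in `matched` with a weight that grows as the matched n-gram becomes rarer (more informative) in the reference data; the corpus-level details are given in the scorer's documentation (NIST, 2002).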