<?xml version="1.0" standalone="yes"?>
<Paper uid="C04-1091">
  <Title>An Algorithmic Framework for the Decoding Problem in Statistical Machine Translation</Title>
  <Section position="10" start_page="0" end_page="0" type="evalu">
    <SectionTitle>
5 Experiments and Results
</SectionTitle>
    <Paragraph position="0"> In this section we describe our experimental setup and present initial results. Our goal was not only to evaluate the performance of our algorithms on real data, but also to assess how easy the algorithms are to code and whether a straightforward implementation with no parameter tuning can give satisfactory results.</Paragraph>
    <Paragraph position="1"> We implemented the algorithms in C++ and conducted the experiments on an IBM RS-6000 dual-processor machine with 1 GB of RAM. We built a French-English translation model (IBM Model 3) by training on a corpus of 100K sentence pairs from the Hansard corpus. The translation direction was from French to English. We built an English language model by training on a corpus of about 800 million words. We divided the test sentences into several classes based on their length. Each length class consisted of 300 French test sentences.</Paragraph>
    <Paragraph position="2"> We implemented four algorithms: 1.1 (NaiveDecode), 1.2 (Alternating Search with l restricted to m), 2.1 (NaiveDecode with l varying from m/2 to 2m) and 2.2 (Alternating Search). In order to compare the performance of the algorithms proposed in this paper with a previous decoding algorithm, we also implemented the dynamic-programming-based algorithm of (Tillman, 2001). For each of the algorithms, we computed the following: 1. Average time taken for translation for each length class.</Paragraph>
    <Paragraph position="3"> 2. NIST score of the translations for each length class.</Paragraph>
    <Paragraph position="4"> 3. Average value of the optimization function for the translations for each length class.</Paragraph>
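The per-class averaging behind measures 1 and 3 can be sketched as follows. This is a minimal illustration, not the implementation used in the experiments: the function names length_class and summarize, the class width of 10 words, and the record layout are all assumptions made for the example. The NIST score (measure 2) is computed over a whole set of translations and is not reproduced here.

```python
from collections import defaultdict
from statistics import mean


def length_class(num_words, width=10, cap=51):
    """Map a sentence length to a class label such as '11-20' or '51'.

    The width of 10 and the first class starting at 1 are assumptions;
    the text only names the classes '11-20' and '51' (51 words or more).
    """
    if num_words >= cap:
        return str(cap)
    low = ((max(num_words, 1) - 1) // width) * width + 1
    return f"{low}-{low + width - 1}"


def summarize(records):
    """Average decoding time and log score per length class.

    records: iterable of (num_words, seconds, neg_log_score) tuples,
    a hypothetical layout chosen for this sketch.
    """
    buckets = defaultdict(list)
    for num_words, seconds, score in records:
        buckets[length_class(num_words)].append((seconds, score))
    return {
        label: {
            "avg_time": mean(t for t, _ in rows),
            "avg_score": mean(s for _, s in rows),
        }
        for label, rows in buckets.items()
    }
```

Calling summarize on the records produced by one decoder yields one point per length class for the time and log-score plots.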
    <Paragraph position="5"> The results of the experiments are summarized in Plots 1, 2 and 3. In all the plots, the x-axis denotes the length class. 11-20 indicates the class of sentences with lengths between 11 and 20 words. 51 indicates the group of sentences with length 51 words or more. Plot 1 shows the average time taken by the algorithms to translate the sentences in each length class. Time is shown in seconds on a log scale. Plot 2 shows the NIST score of the translations for each length class, while Plot 3 shows the average log score of the translations (the negative log of Pr(f,a|e)Pr(e)), again for each length class. It can be seen from Plot 1 that all of our algorithms are indeed very fast in practice. They are, in fact, an order of magnitude faster than the Held-Karp algorithm. Our algorithms are able to translate even long sentences (50+ words) in a few seconds.</Paragraph>
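Written out, the log score reported in Plot 3 is the following quantity. Here f is read as the French input, e as the English translation and a as the alignment, following the usual convention for the IBM models; the name score is introduced only for this illustration.

```latex
\[
\mathrm{score}(e, a)
  = -\log\bigl(\Pr(f, a \mid e)\,\Pr(e)\bigr)
  = -\log \Pr(f, a \mid e) \;-\; \log \Pr(e)
\]
```

Because this is a negative log probability, a smaller value corresponds to a more probable translation under the model.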
    <Paragraph position="6"> Plot 3 shows that the log scores of the translations computed by our algorithms are very close to those computed by the Held-Karp algorithm. Plot 2 compares the NIST scores obtained with each of the algorithms. Among the four algorithms based on our framework, Algorithm 2.2 gives the best NIST scores, as expected. Although the log scores of our algorithms are comparable to those of the Held-Karp algorithm, our NIST scores are lower. It should be noted that the mathematical quantity that our algorithms try to optimize is the log score. Plot 3 shows that our algorithms are quite good at finding solutions with good scores.</Paragraph>
  </Section>
</Paper>