<?xml version="1.0" standalone="yes"?>
<Paper uid="W05-0834">
  <Title>Word Graphs for Statistical Machine Translation</Title>
  <Section position="7" start_page="194" end_page="196" type="evalu">
    <SectionTitle>
5 Experimental Results
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="194" end_page="195" type="sub_section">
      <SectionTitle>
5.1 Tasks
</SectionTitle>
      <Paragraph position="0"> We will show experimental results for two Chinese English translation tasks.</Paragraph>
      <Paragraph position="1">  main is travel expressions from phrase-books. This is a small task with a clean training and test corpus. The vocabulary is limited and the sentences are relatively short. The corpus statistics are shown in Table 1. The Chinese part of this corpus is already segmented into words.</Paragraph>
      <Paragraph position="2"> NIST Chinese English Task. The second task is the NIST Chinese English large data track task. For this task, there are many bilingual corpora available. The domain is news, the vocabulary is very large and the sentences have an average length of 30 words. We train our statistical models on various corpora provided by LDC. The Chinese part is segmented using the LDC segmentation tool. After the preprocessing, our training corpus consists of about three million sentences with somewhat more than 50 million running words. The corpus statistics of the preprocessed training corpus are shown in Table 2. We use the NIST 2002 evaluation data as test set.</Paragraph>
      <Paragraph position="3">  as a function of the word graph density for different window sizes.</Paragraph>
    </Section>
    <Section position="2" start_page="195" end_page="195" type="sub_section">
      <SectionTitle>
5.2 Search Space Analysis
</SectionTitle>
      <Paragraph position="0"> In Table 3, we show the search space statistics of the IWSLT task for different reordering window sizes.</Paragraph>
      <Paragraph position="1"> Each line shows the resulting graph densities after the corresponding step in our search as described in Section 3.2. Our search process starts with the re-ordering graph. The segmentation into phrases increases the graph densities by a factor of two. Doing the phrase translation results in an increase of the densities by a factor of twenty. Unsegmenting the phrases, i.e. replacing the phrase edges with a sequence of word edges doubles the graph sizes. Applying the language model results in a signi cant increase of the word graphs.</Paragraph>
      <Paragraph position="2"> Another interesting aspect is that increasing the window size by one roughly doubles the search space.</Paragraph>
    </Section>
    <Section position="3" start_page="195" end_page="196" type="sub_section">
      <SectionTitle>
5.3 Word Graph Error Rates
</SectionTitle>
      <Paragraph position="0"> In Figure 1, we show the graph word error rate for the IWSLT task as a function of the word graph density. This is done for different window sizes for the reordering. We see that the curves start with a single-best word error rate of about 50%. For the monotone search, the graph word error rate goes down to about 31%. Using local reordering during the search, we can further decrease the graph word error rate down to less than 17% for a window size of 5. This is almost one third of the single-best word error rate. If we aim at halving the single-best word error rate, word graphs with a density of less than  as a function of the word graph density for different window sizes.</Paragraph>
      <Paragraph position="1"> 200 would already be suf cient.</Paragraph>
      <Paragraph position="2"> In Figure 2, we show the same curves for the NIST task. Here, the curves start from a single-best word error rate of about 64%. Again, dependent on the amount of reordering the graph word error rate goes down to about 36% for the monotone search and even down to 23% for the search with a window of size 5. Again, the reduction of the graph word error rate compare to the single-best error rate is dramatic. For comparison we produced an N-best list of size 10 000. The N-best list error rate (or oraclebest WER) is still 50.8%. A word graph with a density of only 8 has about the same GWER.</Paragraph>
      <Paragraph position="3"> In Figure 3, we show the graph position-independent word error rate for the IWSLT task. As this error criterion ignores the word order it is not affected by reordering and we show only one curve.</Paragraph>
      <Paragraph position="4"> We see that already for small word graph densities the GPER drops signi cantly from about 42% down to less than 14%.</Paragraph>
      <Paragraph position="5">  independent word error rate as a function of the word graph density.</Paragraph>
      <Paragraph position="6"> In Figure 4, we show the graph BLEU scores for the IWSLT task. We observe that, similar to the GPER, the GBLEU score increases signi cantly already for small word graph densities. We attribute this to the fact that the BLEU score and especially the PER are less affected by errors of the word order than the WER. This also indicates that producing translations with correct word order, i.e. syntactically well-formed sentences, is one of the major problems of current statistical machine translation systems.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>