<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-1607">
  <Title>Phrasetable Smoothing for Statistical Machine Translation</Title>
  <Section position="7" start_page="57" end_page="58" type="evalu">
    <SectionTitle>
5 Experiments
</SectionTitle>
    <Paragraph position="0"> We carried out experiments in two different settings: broad-coverage ones across six European language pairs using selected smoothing techniques and relatively small training corpora; and Chinese to English experiments using all implemented smoothing techniques and large training corpora. For the black-box techniques, the smoothed phrase table replaced the original relative-frequency (RF) phrase table. For the glass-box techniques, a phrase table (either the original RF phrase table or its replacement after black-box smoothing) was interpolated in loglinear fashion with the smoothing glass-box distribution, with weights set to maximize BLEU on a development corpus.</Paragraph>
    <Paragraph position="1"> To estimate the significance of the results across different methods, we used 1000-fold pairwise bootstrap resampling at the 95% confidence level.</Paragraph>
    <Section position="1" start_page="57" end_page="57" type="sub_section">
      <SectionTitle>
5.1 Broad-Coverage Experiments
</SectionTitle>
      <Paragraph position="0"> In order to measure the benefit of phrasetable smoothing for relatively small corpora, we used the data made available for the WMT06 shared task (WMT, 2006). This exercise is conducted openly with access to all needed resources and is thus ideal for benchmarking statistical phrase-based translation systems on a number of language pairs.</Paragraph>
      <Paragraph position="1"> The WMT06 corpus is based on sentences extracted from the proceedings of the European Parliament. Separate sentence-aligned parallel corpora of about 700,000 sentences (about 150MB) are provided for the three language pairs having one of French, Spanish and German with English. SRILM language models based on the same source are also provided for each of the four languages. We used the provided 2000-sentence devsets for tuning loglinear parameters, and tested on the 3064-sentence test sets.</Paragraph>
      <Paragraph position="2"> Results are shown in table 1 for relative-frequency (RF), Good-Turing (GT), Kneser-Ney with 1 (KN1) and 3 (KN3) discount coefficients; and loglinear combinations of both RF and KN3 phrasetables with Zens-Ney-IBM1 (ZN-IBM1) smoothed phrasetables (these combinations are denoted RF+ZN-IBM1 and KN3+ZN-IBM1).</Paragraph>
      <Paragraph position="3"> It is apparent from table 1 that any kind of phrase table smoothing is better than using none; the minimum improvement is 0.45 BLEU, and the difference between RF and all other methods is statistically significant. Also, Kneser-Ney smoothing gives a statistically significant improvement over GT smoothing, with a minimum gain of 0.30 BLEU. Using more discounting coefficients does not appear to help. Smoothing relative frequencies with an additional Zens-Ney phrasetable gives about the same gain as Kneser-Ney smoothing on its own. However, combining Kneser-Ney with Zens-Ney gives a clear gain over any other method (statistically significant for all language pairs except en-es and en-de) demonstrating that these approaches are complementary.</Paragraph>
    </Section>
    <Section position="2" start_page="57" end_page="58" type="sub_section">
      <SectionTitle>
5.2 Chinese-English Experiments
</SectionTitle>
      <Paragraph position="0"> To test the effects of smoothing with larger corpora, we ran a set of experiments for Chinese-English translation using the corpora distributed for the NIST MT05 evaluation (www.nist.gov/speech/tests/mt). These are summarized in table 2. Due to the large size of the out-of-domain UN corpus, we trained one phrasetable on it, and another on all other parallel corpora (smoothing was applied to both). We also used a subset of the English Gigaword corpus to augment the LM training material.</Paragraph>
      <Paragraph position="1">  experiments, including fixed-discount with uni-gram smoothing (FDU), and Koehn-Och-Marcu smoothing with the IBM1 model (KOM-IBM1)  smoothing method fr [?]- en es [?]- en de [?]- en en [?]- fr en [?]- es en [?]- de  as described in section 3.3. As with the broad-coverage experiments, all of the black-box smoothing techniques do significantly better than the RF baseline. However, GT appears to work better in the large-corpus setting: it is statistically indistinguishable from KN3, and both these methods are significantly better than all other fixed-discount variants, among which there is little difference. null Not surprisingly, the two glass-box methods, ZN-IBM1 and KOM-IBM1, do poorly when used on their own. However, in combination with another phrasetable, they yield the best results, obtained by RF+ZN-IBM1 and GT+KOM-IBM1, which are statistically indistinguishable. In constrast to the situation in the broad-coverage setting, these are not significantly better than the best black-box method (GT) on its own, although RF+ZN-IBM1 is better than all other glass-box combinations.</Paragraph>
      <Paragraph position="2"> smoothing method BLEU score  A striking difference between the broad-coverage setting and the Chinese-English setting is that in the former it appears to be beneficial to apply KN3 smoothing to the phrasetable that gets combined with the best glass-box phrasetable (ZN), whereas in the latter setting it does not. To test whether this was due to corpus size (as the broad-coverage corpora are around 10% of those for Chinese-English), we calculated Chinese-English learning curves for the RF+ZN-IBM1 and KN3-ZN-IBM1 methods, shown in figure 1. The results are somewhat inconclusive: although the KN3+ZN-IBM1 curve is perhaps slightly flatter, the most obvious characteristic is that this method appears to be highly sensitive to the particular corpus sample used.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>