<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-3123">
  <Title>Constraining the Phrase-Based, Joint Probability Statistical Translation Model</Title>
  <Section position="5" start_page="155" end_page="156" type="evalu">
    <SectionTitle>
4 Experiments
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="155" end_page="155" type="sub_section">
      <SectionTitle>
4.1 Constraints
</SectionTitle>
      <Paragraph position="0"> The unconstrained joint model becomes intractable with very small amounts of training data. On a machine with 2 Gb of memory, we were only able to train 10,000 sentences of the German-English Europarl corpora. Beyond this, pruning is required to keep the model in memory during EM.</Paragraph>
      <Paragraph position="1">  Table1showsthattheapplicationofthewordconstraints considerably reduces the size of the space of phrasal alignments that is searched. It also improves the BLEU score of the model, by guiding it to explore the more promising areas of the search space.</Paragraph>
    </Section>
    <Section position="2" start_page="155" end_page="156" type="sub_section">
      <SectionTitle>
4.2 Scalability
</SectionTitle>
      <Paragraph position="0"> Even though the constrained joint model reduces complexity,pruningisstillneededinordertoscale up to larger corpora. After the initialization phase of the training, all phrase pairs with counts less  trained on 10,000 sentences of the German-English Europarl corpora and tested with the Europarl test set used in Koehn et al. (2003) than 10 million times that of the phrase pair with the highest count, are pruned from the phrase table. The model is also parallelized in order to speed up training.</Paragraph>
      <Paragraph position="1"> The translation models are included within a log-linear model (Och and Ney, 2002) which allows a weighted combination of features functions. For the comparison of the basic systems in Table 2 only three features were used for both the joint and the standard model: p(e|f), p(f|e) andthelanguagemodel,andtheyweregivenequal weights.</Paragraph>
      <Paragraph position="2"> The results in Table 2 show that the joint model is capable of training on large data sets, with a reasonable performance compared to the standard model. However, here it seems that the standard model has a slight advantage. This is almost certainlyrelatedtothefactthatthejointmodelresults null in a much smaller phrase table. Pruning eliminates many phrase pairs, but further investigations indicate that this has little impact on BLEU scores.  and model size in millions of phrase pairs for Spanish-English null The results in Table 3 compare the joint and the standard model with more features. Apart from including all Pharaoh's default features, we use two new features for both the standard and joint models: a 5-gram language model and a lexicalized reordering model as described in Koehn et al. (2005). The weights of the feature functions, or model components, are set by minimum error rate training provided by David Chiang from the University of Maryland.</Paragraph>
      <Paragraph position="3"> On smaller data sets (Koehn et al., 2003) the joint model shows performance comparable to the standard model, however the joint model does not reach the level of performance of the stan- null dard model showing the effect of the 5-gram language model, distortion length of 6 (dl) and the addition of lexical reordering for the English-Spanish and Spanish-English tasks.</Paragraph>
      <Paragraph position="4"> dard model for this larger data set. This could be due to the fact that the joint model results in a much smaller phrase table. During EM only phrase pairs that occur in an alignment visited during hill-climbing are retained. Only a very small proportion of the alignment space can be searched and this reduces the chances of finding optimum parameters. The small number of alignments visited would lead to data sparseness and over-fitting. Another factor could be efficiency trade-offs like the fast but not optimal competitive linking search for phrasal alignments.</Paragraph>
    </Section>
    <Section position="3" start_page="156" end_page="156" type="sub_section">
      <SectionTitle>
4.3 German-English submission
</SectionTitle>
      <Paragraph position="0"> We also submitted a German-English system using the standard approach to phrase extraction. The purpose of this submission was to validate the syntactic reordering method that we previously proposed (Collins et al., 2005). We parse the German training and test corpus and reorder it according to a set of manually devised rules. Then, we use our phrase-based system with standard phrase extraction, lexicalized reordering, lexical scoring, a 5-gram LM, and the Pharaoh decoder.</Paragraph>
      <Paragraph position="1"> On the development test set, the syntactic re-ordering improved performance from 26.86 to 27.70. The best submission in last year's shared task achieved a score of 24.77 on this set.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>