<?xml version="1.0" standalone="yes"?>
<Paper uid="P04-1023">
  <Title>Statistical Machine Translation with Word- and Sentence-Aligned Parallel Corpora</Title>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3 Parameter Estimation Using Word- and
Sentence-Aligned Corpora
</SectionTitle>
    <Paragraph position="0"> As an alternative to collecting a huge amount of sentence-aligned training data, we can annotate some of our sentence pairs with word-level alignments. This explicitly provides information that highlights plausible alignments, and thereby helps the parameters converge on reasonable settings with less training data.</Paragraph>
    <Paragraph position="1"> Since word alignments are inherent in the IBM translation models, it is straightforward to incorporate this information into the parameter estimation procedure. For sentence pairs with explicit word-level alignments marked, fractional counts over all permissible alignments need not be collected. Instead, whole counts are collected for the single hand-annotated alignment of each word-aligned sentence pair. By doing this the expected complete log likelihood collapses to a single term, the complete log likelihood log p(f,a|e), and the E-step is circumvented.</Paragraph>
    <Paragraph position="2"> The parameter estimation procedure now involves maximizing the likelihood of data aligned only at the sentence level and also of data aligned at the word level. The mixed likelihood function, M, combines the expected information contained in the sentence-aligned data with the complete information contained in the word-aligned data.</Paragraph>
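    <Paragraph position="3"> The displayed equation did not survive extraction; based on the definitions in the surrounding text (expected complete log likelihood over the sentence-aligned data, complete log likelihood over the word-aligned data), a plausible reconstruction of the mixed likelihood function is:

```latex
\mathcal{M} \;=\; \sum_{s=1}^{N_s} E_{p(\mathbf{a}\mid \mathbf{f}_s,\mathbf{e}_s)}\!\left[\log p(\mathbf{f}_s,\mathbf{a}\mid \mathbf{e}_s)\right] \;+\; \sum_{w=1}^{N_w} \log p(\mathbf{f}_w,\mathbf{a}_w\mid \mathbf{e}_w)
```
</Paragraph>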
    <Paragraph position="4"> Here s and w index the Ns sentence-aligned sentences and Nw word-aligned sentences in our corpora respectively. Thus M combines the expected complete log likelihood and the complete log likelihood. In order to control the relative contributions of the sentence-aligned and word-aligned data in the parameter estimation procedure, we introduce a mixing weight lambda that can take values between 0 and 1.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.1 The impact of word-level alignments
</SectionTitle>
      <Paragraph position="0"> The impact of word-level alignments on parameter estimation is closely tied to the structure of the IBM Models. Since translation and word alignment parameters are shared between all sentences, a source-target word pair that was aligned in the word-aligned section of the corpus will tend to have a relatively high posterior alignment probability in the sentence-aligned section as well.</Paragraph>
      <Paragraph position="1"> In this way, the alignments from the word-aligned data effectively percolate through to the sentence-aligned data indirectly constraining the E-step of EM.</Paragraph>
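      <Paragraph position="2"> The estimation scheme described in this section can be sketched in code. The following is a toy illustration in an IBM Model 1 setting (function and variable names are ours; this is not the GIZA++ implementation):

```python
from collections import defaultdict

# Toy sketch of one estimation pass. Sentence-aligned pairs contribute
# fractional expected counts via the E-step; word-aligned pairs contribute
# whole counts for their single annotated alignment, so the E-step is
# circumvented for them. `lam` is the mixing weight lambda from the text.

def mixed_count_pass(sent_pairs, word_pairs, t, lam):
    """sent_pairs: list of (f_words, e_words) aligned only at sentence level.
    word_pairs: list of (f_words, e_words, links), where links = [(i, j)]
    means f_words[i] is aligned to e_words[j].
    t: dict mapping (f, e) -> current translation probability p(f|e)."""
    counts = defaultdict(float)   # weighted counts c(f, e)
    totals = defaultdict(float)   # weighted counts c(e)
    # E-step over the sentence-aligned data, weighted by (1 - lam).
    for f_words, e_words in sent_pairs:
        for f in f_words:
            norm = sum(t[(f, e)] for e in e_words)
            for e in e_words:
                posterior = t[(f, e)] / norm   # fractional count
                counts[(f, e)] += (1 - lam) * posterior
                totals[e] += (1 - lam) * posterior
    # Whole counts from the hand-annotated alignments, weighted by lam.
    for f_words, e_words, links in word_pairs:
        for i, j in links:
            counts[(f_words[i], e_words[j])] += lam
            totals[e_words[j]] += lam
    # M-step: renormalise the mixed counts.
    return {(f, e): c / totals[e] for (f, e), c in counts.items()}
```

With a uniform initialisation, a word pair that is hand-aligned in the word-aligned portion immediately receives a higher translation probability, which in turn raises its posterior in the sentence-aligned portion on the next pass; this is the percolation effect described above.</Paragraph>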
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.2 Weighting the contribution of
word-aligned data
</SectionTitle>
      <Paragraph position="0"> By incorporating lambda, Equation 6 becomes an interpolation of the expected complete log likelihood provided by the sentence-aligned data and the complete log likelihood provided by the word-aligned data.</Paragraph>
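      <Paragraph> The interpolated objective itself was lost in extraction; given the description of the weighting (and noting that the exact placement of lambda here is our reconstruction), it plausibly takes the form:

```latex
\mathcal{M} \;=\; (1-\lambda) \sum_{s=1}^{N_s} E_{p(\mathbf{a}\mid \mathbf{f}_s,\mathbf{e}_s)}\!\left[\log p(\mathbf{f}_s,\mathbf{a}\mid \mathbf{e}_s)\right] \;+\; \lambda \sum_{w=1}^{N_w} \log p(\mathbf{f}_w,\mathbf{a}_w\mid \mathbf{e}_w)
```
</Paragraph>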
      <Paragraph position="1"> The use of a weight to balance the contributions of unlabeled and labeled data in maximum likelihood estimation was proposed by Nigam et al.</Paragraph>
      <Paragraph position="2"> (2000). lambda quantifies our relative confidence in the expected and observed statistics estimated from the sentence- and word-aligned data respectively. Standard maximum likelihood estimation (MLE), which weights all training samples equally, corresponds to an implicit value of lambda equal to the proportion of word-aligned data in the whole of the training set: lambda = Nw / (Nw + Ns). However, when the total amount of sentence-aligned data is much larger than the amount of word-aligned data, this implies a value of lambda close to zero, meaning that M can be maximized while essentially ignoring the likelihood of the word-aligned data. Since we believe that the explicit word-alignment information will be highly effective in distinguishing plausible alignments in the corpus as a whole, we expect benefits from setting lambda to amplify the contribution of the word-aligned data, particularly when it constitutes a relatively small portion of the corpus.</Paragraph>
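      <Paragraph> The arithmetic behind the implicit weight is easy to check (the corpus sizes below are ours, chosen only to resemble the setting of far more sentence-aligned than word-aligned pairs):

```python
# Implicit weight under standard MLE, which treats every training sample
# equally: lambda = Nw / (Nw + Ns). With many more sentence-aligned pairs
# than word-aligned pairs, the implicit lambda is close to zero, so the
# word-aligned likelihood is all but ignored unless lambda is raised.

def implicit_lambda(n_word_aligned: int, n_sentence_aligned: int) -> float:
    return n_word_aligned / (n_word_aligned + n_sentence_aligned)

print(round(implicit_lambda(500, 34_000), 3))   # toy sizes: prints 0.014
```
</Paragraph>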
    </Section>
  </Section>
  <Section position="5" start_page="0" end_page="0" type="metho">
    <SectionTitle>
4 Experimental Design
</SectionTitle>
    <Paragraph position="0"> To perform our experiments with word-level alignments we modified GIZA++, an existing and freely available implementation of the IBM models and HMM variants (Och and Ney, 2003). Our modifications involved circumventing the E-step for sentences which had word-level alignments and incorporating these observed alignment statistics in the M-step. The observed and expected statistics were weighted by lambda and (1 - lambda) respectively, as were their contributions to the mixed log likelihood. In order to measure the accuracy of the predictions that the statistical translation models make under our various experimental settings, we chose the alignment error rate (AER) metric, which is defined in Och and Ney (2003). We also investigated whether improved AER leads to improved translation quality. We used the alignments created during our AER experiments as the input to a phrase-based decoder, translated a test set of 350 sentences, and used the Bleu metric (Papineni et al., 2001) to automatically evaluate machine translation quality.</Paragraph>
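    <Paragraph> AER compares a predicted alignment A against sure (S) and possible (P) links in a gold standard. A minimal implementation of the Och and Ney (2003) definition (variable names are ours):

```python
def alignment_error_rate(predicted, sure, possible):
    """AER = 1 - (|A n S| + |A n P|) / (|A| + |S|), following Och and
    Ney (2003). `predicted` is the model's set of (i, j) links, `sure`
    the unambiguous gold links S, and `possible` the ambiguous links P."""
    a, s = set(predicted), set(sure)
    p = set(possible) | s   # by convention S is a subset of P
    return 1.0 - (len(a.intersection(s)) + len(a.intersection(p))) / (len(a) + len(s))
```

An alignment that reproduces every sure link and adds only possible ones scores 0; each wrong or missed link raises the rate.</Paragraph>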
    <Paragraph position="1"> We used the Verbmobil German-English parallel corpus as a source of training data because it has been used extensively in evaluating statistical translation and alignment accuracy. This data set comes with a manually word-aligned set of 350 sentences which we used as our test set.</Paragraph>
    <Paragraph position="2"> Our experiments additionally required a very large set of word-aligned sentence pairs to be incorporated in the training set. Since previous work has shown that an alignment error rate as low as 6% can be achieved for the Verbmobil data when training on the complete set of 34,000 sentence pairs, we automatically generated a set of alignments for the entire training data set using the unmodified version of GIZA++. We wanted to use automatic alignments in lieu of actual hand alignments so that we would be able to perform experiments using large data sets. We ran a pilot experiment to test whether our automatic alignments would produce similar results to manual alignments.</Paragraph>
    <Paragraph position="3"> We divided our manual word alignments into training and test sets and compared the performance of models trained on human-aligned data against models trained on automatically aligned data. A 100-fold cross validation showed that manual and automatic alignments produced AER results within 0.1% of each other. Having satisfied ourselves that automatic alignments were a sufficient stand-in for manual alignments, we performed our main experiments, which fell into the following categories:  1. Verifying that the use of word-aligned data has an impact on the quality of alignments predicted by the IBM Models, and comparing the quality increase to that gained by using a bilingual dictionary in the estimation stage.</Paragraph>
    <Paragraph position="4"> 2. Evaluating whether improved parameter estimates of alignment quality lead to improved translation quality.</Paragraph>
    <Paragraph position="5"> 3. Experimenting with how increasing the ratio of word-aligned to sentence-aligned data affected the performance.</Paragraph>
    <Paragraph position="6"> 4. Experimenting with our lambda parameter, which allows us to weight the relative contributions of the word-aligned and sentence-aligned data, and relating it to the ratio experiments.</Paragraph>
    <Paragraph position="7"> 5. Showing that improvements to AER and translation quality held for another corpus.</Paragraph>
  </Section>
</Paper>