<?xml version="1.0" standalone="yes"?>
<Paper uid="P04-1023">
  <Title>Statistical Machine Translation with Word- and Sentence-Aligned Parallel Corpora</Title>
  <Section position="3" start_page="0" end_page="0" type="intro">
    <SectionTitle>
2 Parameter Estimation Using
Sentence-Aligned Corpora
</SectionTitle>
    <Paragraph position="0"> The task of statistical machine translation is to choose the source sentence, e, that is the most probable translation of a given sentence, f, in a foreign language. Rather than choosing e* to directly maximize p(e|f), Brown et al. (1993) apply Bayes' rule and select the source sentence: e* = argmax_e p(e) p(f|e). (1) In this equation p(e) is a language model probability and p(f|e) is a translation model probability. A series of increasingly sophisticated translation models, referred to as the IBM Models, was defined in Brown et al. (1993).</Paragraph>
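The noisy-channel decision rule in (1) can be sketched in a few lines. This is a toy illustration, not the authors' system: the candidate sentences and all probabilities below are invented, and a real decoder would search a much larger hypothesis space in log space.

```python
# Hypothetical toy instance of the decision rule e* = argmax_e p(e) * p(f|e).
# All candidates and probabilities are invented for illustration.

def noisy_channel_decode(f, candidates, lm_prob, tm_prob):
    """Pick the source sentence e maximizing p(e) * p(f|e)."""
    return max(candidates, key=lambda e: lm_prob[e] * tm_prob[(f, e)])

candidates = ["the house", "house the"]
lm_prob = {"the house": 0.09, "house the": 0.001}        # p(e)
tm_prob = {("la maison", "the house"): 0.5,
           ("la maison", "house the"): 0.5}              # p(f|e)

best = noisy_channel_decode("la maison", candidates, lm_prob, tm_prob)
print(best)  # the language model breaks the translation-model tie
```

Note how the language model resolves word-order ambiguity that the translation model alone cannot, which is the point of the Bayes' rule decomposition.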
    <Paragraph position="1"> The translation model, p(f|e), is defined as a marginal probability obtained by summing over word-level alignments, a, between the source and target sentences: p(f|e) = Σ_a p(f,a|e). (2) While word-level alignments are a crucial component of the IBM models, the model parameters are generally estimated from sentence-aligned parallel corpora without explicit word-level alignment information. The reason for this is that word-aligned parallel corpora do not generally exist. Consequently, word-level alignments are treated as hidden variables. To estimate the values of these hidden variables, the expectation maximization (EM) framework for maximum likelihood estimation from incomplete data is used (Dempster et al., 1977).</Paragraph>
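The marginalization in (2) can be checked numerically. The sketch below assumes IBM Model 1-style assumptions (each foreign word aligns to exactly one source position, including a NULL word, and p(f,a|e) is a product of lexical probabilities); the lexical table is invented. Under those assumptions the exponential sum over alignment functions factorizes into a product of per-word sums:

```python
from itertools import product

# Sketch under Model 1-style assumptions: each foreign word f_j aligns to one
# source position a_j (possibly NULL), and the joint probability of f and a
# is prod_j t(f_j | e_{a_j}).  The lexical table t is invented.
t = {("la", "the"): 0.7, ("la", "house"): 0.05, ("la", "NULL"): 0.25,
     ("maison", "the"): 0.1, ("maison", "house"): 0.8, ("maison", "NULL"): 0.1}

e = ["NULL", "the", "house"]
f = ["la", "maison"]

# Explicit sum over all len(e)**len(f) alignment functions, as in (2)
p_sum = 0.0
for a in product(range(len(e)), repeat=len(f)):
    p = 1.0
    for j, i in enumerate(a):
        p *= t[(f[j], e[i])]
    p_sum += p

# The independence assumptions let the sum factorize: prod_j sum_i t(f_j | e_i)
p_factored = 1.0
for fj in f:
    p_factored *= sum(t[(fj, ei)] for ei in e)

assert abs(p_sum - p_factored) < 1e-12
```

The factorized form is what makes exact E-steps tractable for the simpler IBM models.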
    <Paragraph position="2"> The previous section describes how the translation probability of a given sentence pair is obtained by summing over all alignments, p(f|e) = Σ_a p(f,a|e). EM seeks to maximize the marginal log likelihood, log p(f|e), indirectly by iteratively maximizing a bound on this term known as the expected complete log likelihood, ⟨log p(f,a|e)⟩_q(a),¹</Paragraph>
    <Paragraph position="4"> log p(f|e) = log Σ_a p(f,a|e) (3)
= log Σ_a q(a) [p(f,a|e) / q(a)] (4)
≥ Σ_a q(a) log [p(f,a|e) / q(a)] (5)
where the bound in (5) is given by Jensen's inequality. By choosing q(a) = p(a|f,e) this bound becomes an equality.</Paragraph>
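The bound in (5) and its tightness at q(a) = p(a|f,e) can be verified numerically. In this small check the joint probabilities p(f,a|e) for three alignments are invented; for any distribution q the bound holds, and plugging in the posterior recovers log p(f|e) exactly:

```python
import math

# Numerical check of the Jensen bound: for any distribution q over alignments,
# sum_a q(a) log(p(f,a|e)/q(a)) <= log p(f|e), with equality when
# q(a) = p(a|f,e).  The joint probabilities below are invented.
p_joint = {"a1": 0.06, "a2": 0.03, "a3": 0.01}   # p(f,a|e) for three alignments
marginal = sum(p_joint.values())                  # p(f|e)
log_marginal = math.log(marginal)

def bound(q):
    return sum(q[a] * math.log(p_joint[a] / q[a]) for a in p_joint)

q_uniform = {a: 1 / 3 for a in p_joint}
q_posterior = {a: p / marginal for a, p in p_joint.items()}  # p(a|f,e)

assert bound(q_uniform) <= log_marginal + 1e-12   # strict bound for non-posterior q
assert abs(bound(q_posterior) - log_marginal) < 1e-12  # equality at the posterior
```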
    <Paragraph position="5"> This maximization consists of two steps:
* E-step: calculate the posterior probability under the current model of every permissible alignment for each sentence pair in the sentence-aligned training corpus;
* M-step: maximize the expected log likelihood under this posterior distribution, ⟨log p(f,a|e)⟩_q(a), with respect to the model's parameters.</Paragraph>
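The E-step/M-step alternation, including the fractional counts discussed next, can be sketched for Model 1-style lexical probabilities t(f|e). The two-sentence corpus below is invented; each E-step computes posterior alignment weights, and each M-step renormalizes the accumulated fractional counts:

```python
from collections import defaultdict

# Minimal EM sketch for Model 1-style lexical translation probabilities.
# The tiny corpus is invented for illustration.
corpus = [(["the", "house"], ["la", "maison"]),
          (["the"], ["la"])]

# Uniform initialization of t(f|e)
f_vocab = {fw for _, fs in corpus for fw in fs}
t = defaultdict(lambda: 1.0 / len(f_vocab))

for _ in range(10):
    counts = defaultdict(float)   # fractional counts c(f,e)
    totals = defaultdict(float)   # c(e)
    for es, fs in corpus:
        for fw in fs:
            # E-step: posterior that fw aligns to each source word ew
            z = sum(t[(fw, ew)] for ew in es)
            for ew in es:
                post = t[(fw, ew)] / z
                counts[(fw, ew)] += post
                totals[ew] += post
    # M-step: renormalize the fractional counts into probabilities
    for (fw, ew), c in counts.items():
        t[(fw, ew)] = c / totals[ew]

print(round(t[("maison", "house")], 3))
```

Because "la" also appears in a sentence pair without "maison", the posteriors gradually concentrate t(maison|house) and t(la|the), illustrating how fractional counts pull the parameters toward plausible alignments.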
    <Paragraph position="6"> While in standard maximum likelihood estimation events are counted directly to estimate parameter settings, in EM we effectively collect fractional counts of events (here permissible alignments weighted by their posterior probability), and use these to iteratively update the parameters.</Paragraph>
    <Paragraph position="7"> ¹Here ⟨·⟩_q(·) denotes an expectation with respect to q(·).
Since only some of the permissible alignments make sense linguistically, we would like EM to use the posterior alignment probabilities calculated in the E-step to weight plausible alignments more highly than the large number of bogus alignments that are included in the expected complete log likelihood. This in turn should encourage the parameter adjustments made in the M-step to converge to linguistically plausible values.</Paragraph>
    <Paragraph position="8"> Since the number of permissible alignments for a sentence pair grows exponentially in the lengths of the sentences for the later IBM Models, a large number of informative example sentence pairs is required to distinguish between plausible and implausible alignments. Given sufficient data, the distinction emerges because words which are mutual translations appear together more frequently in aligned sentences in the corpus.</Paragraph>
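The exponential growth is easy to quantify. For the simpler IBM models, each of the m foreign words aligns to one of the l source words or NULL, giving (l+1)^m permissible alignments; the quick tabulation below (an illustration, not from the paper) shows how fast this count grows with sentence length:

```python
# Number of permissible alignment functions when each of m foreign words
# picks one of l source positions or NULL: (l + 1) ** m.
def num_alignments(l, m):
    return (l + 1) ** m

for length in (5, 10, 20):
    print(length, num_alignments(length, length))
```

Even for 10-word sentences the count exceeds 10^10, which is why posterior weighting over data, rather than enumeration by inspection, is the only practical way to separate plausible from implausible alignments.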
    <Paragraph position="9"> Given the high number of model parameters and permissible alignments, however, huge amounts of data will be required to estimate reasonable translation models from sentence-aligned data alone.</Paragraph>
  </Section>
</Paper>