<?xml version="1.0" standalone="yes"?>
<Paper uid="C00-2163">
  <Title>A Comparison of Alignment Models for Statistical Machine Translation</Title>
  <Section position="3" start_page="0" end_page="1087" type="metho">
    <SectionTitle>
2 Alignment with HMM
</SectionTitle>
    <Paragraph position="0"> In the Hidden-Markov alignment model we assume a first-order dependence for the alignments a_j and that the translation probability depends only on a_j and not on a_{j-1}: Pr(f_j, a_j | f_1^{j-1}, a_1^{j-1}, e_1^I) = p(a_j | a_{j-1}, I) · p(f_j | e_{a_j}).</Paragraph>
    <Paragraph position="2"> Later, we will describe a refinement with a dependence on e_{a_{j-1}} in the alignment model. Putting everything together, we have the following basic model: Pr(f_1^J | e_1^I) = sum over all a_1^J of the product over j = 1..J of [p(a_j | a_{j-1}, I) · p(f_j | e_{a_j})],</Paragraph>
    <Paragraph position="4"> with the alignment probability p(i | i', I) and the translation probability p(f | e). To find a Viterbi alignment for the HMM-based model we resort to dynamic programming (Vogel et al., 1996).</Paragraph>
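    <Paragraph> As an illustration, a minimal Python sketch of this dynamic-programming search follows. The tables p_align (standing for p(i | i', I)) and p_trans (standing for p(f | e)), and the uniform treatment of the first alignment position, are assumptions of the sketch, not specifications from the paper.

# A minimal sketch of Viterbi alignment search for the HMM-based model.
# p_align[ip][i] plays the role of p(i | i', I); p_trans[(f, e)] plays the
# role of p(f | e). Both are assumed to be already trained (hypothetical names).

def viterbi_alignment(f_sent, e_sent, p_align, p_trans):
    I, J = len(e_sent), len(f_sent)
    # delta[j][i]: score of the best alignment prefix a_1..a_j ending in a_j = i
    delta = [[0.0] * I for _ in range(J)]
    back = [[0] * I for _ in range(J)]
    for i in range(I):  # uniform start distribution: an assumption of the sketch
        delta[0][i] = (1.0 / I) * p_trans.get((f_sent[0], e_sent[i]), 1e-12)
    for j in range(1, J):
        for i in range(I):
            best_ip = max(range(I), key=lambda ip: delta[j - 1][ip] * p_align[ip][i])
            back[j][i] = best_ip
            delta[j][i] = (delta[j - 1][best_ip] * p_align[best_ip][i]
                           * p_trans.get((f_sent[j], e_sent[i]), 1e-12))
    # backtrace from the best final state
    a = [max(range(I), key=lambda i: delta[J - 1][i])]
    for j in range(J - 1, 0, -1):
        a.append(back[j][a[-1]])
    a.reverse()
    return a  # a[j] is the position of the English word generating f_sent[j]

Like standard HMM Viterbi decoding, this runs in O(J · I^2) time.</Paragraph>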
    <Paragraph position="5"> The training of the HMM is done by the EM algorithm. In the E-step the lexical and alignment counts for one sentence pair (f, e) are calculated:</Paragraph>
    <Paragraph position="7"> To avoid the summation over all possible alignments a, (Vogel et al., 1996) use the maximum approximation, where only the Viterbi alignment path is used to collect counts. We used the Baum-Welch algorithm (Baum, 1972) to train the model parameters in our experiments. Thereby it is possible to perform an efficient training using all alignments.</Paragraph>
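    <Paragraph> For training over all alignments, the E-step posteriors can be obtained with the usual forward-backward recursions. A sketch under the same hypothetical table names as above (the uniform start distribution again being an assumption):

# Baum-Welch E-step for one sentence pair (f, e): accumulate expected lexical
# counts and alignment (transition) counts from forward-backward posteriors.

def collect_counts(f, e, p_align, p_trans, lex_counts, align_counts):
    I, J = len(e), len(f)
    emit = [[p_trans.get((f[j], e[i]), 1e-12) for i in range(I)] for j in range(J)]
    alpha = [[0.0] * I for _ in range(J)]
    beta = [[0.0] * I for _ in range(J)]
    for i in range(I):
        alpha[0][i] = (1.0 / I) * emit[0][i]
        beta[J - 1][i] = 1.0
    for j in range(1, J):           # forward pass
        for i in range(I):
            alpha[j][i] = emit[j][i] * sum(alpha[j - 1][ip] * p_align[ip][i]
                                           for ip in range(I))
    for j in range(J - 2, -1, -1):  # backward pass
        for i in range(I):
            beta[j][i] = sum(p_align[i][i2] * emit[j + 1][i2] * beta[j + 1][i2]
                             for i2 in range(I))
    total = sum(alpha[J - 1][i] for i in range(I))  # sentence likelihood
    for j in range(J):
        for i in range(I):
            gamma = alpha[j][i] * beta[j][i] / total  # posterior of a_j = i
            lex_counts[(f[j], e[i])] = lex_counts.get((f[j], e[i]), 0.0) + gamma
            if j:  # expected transition counts for the alignment model
                for ip in range(I):
                    xi = (alpha[j - 1][ip] * p_align[ip][i] * emit[j][i]
                          * beta[j][i] / total)
                    align_counts[(ip, i)] = align_counts.get((ip, i), 0.0) + xi

The maximum approximation of (Vogel et al., 1996) would instead add a count of one along the single Viterbi path.</Paragraph>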
    <Paragraph position="8"> To make the alignment parameters independent from absolute word positions we assume that the alignment probabilities p(i | i', I) depend only on the jump width (i - i'). Using a set of non-negative parameters {c(i - i')}, we can write the alignment probabilities in the form p(i | i', I) = c(i - i') / sum_{i''=1}^{I} c(i'' - i').</Paragraph>
    <Paragraph position="10"> This form ensures that for each word position i', i' = 1, ..., I, the alignment probabilities satisfy the normalization constraint.</Paragraph>
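    <Paragraph> A direct transcription of this parameterization (c as a dictionary from jump widths to non-negative counts; the uniform fallback for an all-zero row is an assumption of the sketch):

# p(i | i', I) = c(i - i') / sum_{i''=1..I} c(i'' - i'), positions 1..I.

def jump_prob(c, i, i_prev, I):
    denom = sum(c.get(i2 - i_prev, 0.0) for i2 in range(1, I + 1))
    return c.get(i - i_prev, 0.0) / denom if denom > 0 else 1.0 / I

c = {-1: 1.0, 0: 4.0, 1: 8.0, 2: 2.0}  # toy jump-width counts, illustrative only
print(jump_prob(c, 3, 2, 5))           # probability of a jump of width +1</Paragraph>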
    <Paragraph position="11"> Extension: refined alignment model. The count table c(i - i') has only 2 · I_max - 1 entries. This might be suitable for small corpora, but for large corpora it is possible to make a more refined model of Pr(a_j | a_1^{j-1}, f_1^{j-1}, e_1^I). Especially, we analyzed the effect of a dependence on e_{a_{j-1}} or f_j.</Paragraph>
    <Paragraph position="12"> As a dependence on all English words would result in a huge number of alignment parameters, we use as (Brown et al., 1993) equivalence classes G over the English and the French words. Here G is a mapping of words to classes. This mapping is trained automatically using a modification of the method described in (Kneser and Ney, 1991). We use 50 classes in our experiments. The most general form of alignment distribution that we consider in the HMM is p(a_j - a_{j-1} | G(e_{a_{j-1}}), G(f_j), I). Extension: empty word. In the original formulation of the HMM alignment model there is no 'empty' word which generates French words having no directly aligned English word. A direct inclusion of an empty word in the HMM model by adding an e_0 as in (Brown et al., 1993) is not possible if we want to model the jump distances i - i', as the position i = 0 of the empty word is chosen arbitrarily. Therefore, to introduce the empty word we extend the HMM network by I empty words e_{I+1}, ..., e_{2I}. The English word e_i has a corresponding empty word e_{i+I}. The position of the empty word encodes the previously visited English word.</Paragraph>
    <Paragraph position="13"> We enforce the following constraints for the transitions in the HMM network (i ≤ I, i' ≤ I): p(i + I | i', I) = p_0 · δ(i, i'), p(i + I | i' + I, I) = p_0 · δ(i, i'), p(i | i' + I, I) = p(i | i', I).</Paragraph>
    <Paragraph position="15"> The parameter p_0 is the probability of a transition to the empty word. In our experiments we set p_0 = 0.2.</Paragraph>
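    <Paragraph> A sketch of the resulting 2I-state transition structure. The paper states the constraints above; scaling the transitions into real words by (1 - p_0) so that each row still sums to one is an added assumption of this sketch:

# Extend an I-state transition table to 2I states: state I + i is the empty
# word that remembers the last real English position i; p_0 = 0.2 in the paper.

def extend_with_empty(p_align, p0=0.2):
    I = len(p_align)
    T = [[0.0] * (2 * I) for _ in range(2 * I)]
    for ip in range(I):
        for i in range(I):
            # real -> real and empty -> real reuse the original jump model
            T[ip][i] = (1.0 - p0) * p_align[ip][i]
            T[ip + I][i] = (1.0 - p0) * p_align[ip][i]
        # real -> empty and empty -> empty keep the remembered position fixed
        T[ip][ip + I] = p0
        T[ip + I][ip + I] = p0
    return T</Paragraph>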
    <Paragraph position="16"> Smoothing. For a better estimation of infrequent events we introduce the following smoothing of alignment probabilities: p'(a_j | a_{j-1}, I) = α · (1/I) + (1 - α) · p(a_j | a_{j-1}, I). In our experiments we use α = 0.4.</Paragraph>
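    <Paragraph> In code the smoothing is a single interpolation step:

def smooth_align_prob(p, I, alpha=0.4):
    # interpolate with the uniform distribution 1/I; alpha = 0.4 in the paper
    return alpha * (1.0 / I) + (1.0 - alpha) * p</Paragraph>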
    <Paragraph position="17"> 3 Model 1 and Model 2. Replacing the dependence on a_{j-1} in the HMM alignment model by a dependence on j, we obtain a model which can be seen as a zero-order Hidden-Markov Model, similar to Model 2 proposed by (Brown et al., 1993). Assuming a uniform alignment probability p(i | j, I) = 1/I, we obtain Model 1.</Paragraph>
    <Paragraph position="18"> Assuming that the dominating factor in the alignment model of Model 2 is the distance relative to the diagonal line of the (j, i) plane, the model p(i | j, I) can be structured as follows (Vogel et al., 1996):</Paragraph>
    <Paragraph position="20"> This model will be referred to as diagonal-oriented Model 2.</Paragraph>
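    <Paragraph> A hedged sketch of the diagonal-oriented distribution: we assume here that the parameters depend on the rounded distance to the diagonal, i.e. p(i | j, I) proportional to c(round(i - j · I / J)); the exact bucketing of the non-integer distances is an assumption of the sketch.

# Diagonal-oriented Model 2: alignment probability depends only on the
# (bucketed) distance of position i to the diagonal point j * I / J.

def diag_model2_prob(c, i, j, I, J):
    def dist(i2):
        return round(i2 - j * I / J)  # rounding choice is an assumption
    denom = sum(c.get(dist(i2), 0.0) for i2 in range(1, I + 1))
    return c.get(dist(i), 0.0) / denom if denom > 0 else 1.0 / I</Paragraph>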
    <Paragraph position="21"> 4 Model 3 and Model 4. The fertility models of (Brown et al., 1993) explicitly model the probability p(φ | e) that the English word e_i is aligned to φ French words.</Paragraph>
    <Paragraph position="24"> Model 3 of (Brown et al., 1993) is a zero-order alignment model like Model 2, including in addition fertility parameters. Model 4 of (Brown et al., 1993) is also a first-order alignment model (along the source positions) like the HMM, but it also includes fertilities. In Model 4 the alignment position j of an English word depends on the alignment position of the previous English word (with non-zero fertility) j'. It models a jump distance j - j' (for consecutive English words), while in the HMM a jump distance i - i' (for consecutive French words) is modeled. The full description of Model 4 (Brown et al., 1993) is rather complicated, as the cases that English words have fertility larger than one and that English words have fertility zero have to be considered.</Paragraph>
    <Paragraph position="25"> For training of Model 3 and Model 4, we use an extension of the program GIZA (Al-Onaizan et al., 1999). Since there is no efficient way in these models to avoid the explicit summation over all alignments in the EM algorithm, the counts are collected only over a subset of promising alignments. No efficient algorithm is known to compute the Viterbi alignment for Models 3 and 4. Therefore, the Viterbi alignment is computed only approximately using the method described in (Brown et al., 1993).</Paragraph>
    <Paragraph position="26"> The models 1-4 are trained in succession, with the final parameter values of one model serving as the starting point for the next.</Paragraph>
    <Paragraph position="27"> A special problem in Model 3 and Model 4 concerns the deficiency of the model. This results in problems in the re-estimation of the parameter which describes the fertility of the empty word. In normal EM training, this parameter is steadily decreasing, producing too many alignments with the empty word. Therefore we set the probability for aligning a source word with the empty word to a suitably chosen constant value.</Paragraph>
    <Paragraph position="28"> As in the HMM, we can easily extend the dependencies in the alignment model of Model 4 using the word class of the previous English word.</Paragraph>
  </Section>
  <Section position="4" start_page="1087" end_page="1087" type="metho">
    <SectionTitle>
5 Including a Manual Dictionary
</SectionTitle>
    <Paragraph position="0"> We propose here a simple method to make use of a bilingual dictionary as an additional knowledge source in the training process by extending the training corpus with the dictionary entries. Thereby, the dictionary is used already in EM training and can improve the alignment not only for words which are in the dictionary but indirectly also for other words.</Paragraph>
    <Paragraph position="1"> The additional sentences in the training corpus are weighted with a factor F_lex during the EM training of the lexicon probabilities.</Paragraph>
    <Paragraph position="2"> We assign the dictionary entries which really co-occur in the training corpus a high weight F_lex and the remaining entries a very low weight. In our experiments we use F_lex = 10 for the co-occurring dictionary entries, which is equivalent to adding every dictionary entry ten times to the training corpus.</Paragraph>
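    <Paragraph> A minimal sketch of this weighting scheme; the numeric value of the 'very low' weight for non-co-occurring entries (eps below) is hypothetical, as the paper only says 'very low':

# Weight dictionary entries for EM training of the lexicon: co-occurring
# entries count like F_lex = 10 sentence pairs, the rest almost not at all.

def dictionary_weights(dict_entries, cooccurring_pairs, f_lex=10.0, eps=1e-4):
    return {(f, e): (f_lex if (f, e) in cooccurring_pairs else eps)
            for (f, e) in dict_entries}</Paragraph>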
  </Section>
  <Section position="5" start_page="1087" end_page="1087" type="metho">
    <SectionTitle>
6 The Alignment Template System
</SectionTitle>
    <Paragraph position="0"> The statistical machine translation method described in (Och et al., 1999) is based on a word-aligned training corpus and thereby makes use of single-word based alignment models. The key element of this approach are the alignment templates, which are pairs of phrases together with an alignment between the words within the phrases. The advantage of the alignment template approach over word-based statistical translation models is that word context and local re-orderings are explicitly taken into account. We typically observe that this approach produces better translations than the single-word based models. The alignment templates are automatically trained using a parallel training corpus. For more information about the alignment template approach see (Och et al., 1999).</Paragraph>
  </Section>
  <Section position="6" start_page="1087" end_page="1088" type="metho">
    <SectionTitle>
7 Results
</SectionTitle>
    <Paragraph position="0"> We present results on the Verbmobil Task, which is a speech translation task in the domain of appointment scheduling, travel planning, and hotel reservation (Wahlster, 1993).</Paragraph>
    <Paragraph position="1"> We measure the quality of the above-mentioned alignment models with respect to alignment quality and translation quality.</Paragraph>
    <Paragraph position="2"> To obtain a reference alignment for evaluating alignment quality, we manually aligned about 1.4 percent of our training corpus. We allowed the humans who performed the alignment to specify two different kinds of alignments: an S (sure) alignment, which is used for alignments which are unambiguous, and a P (possible) alignment, which is used for alignments which might or might not exist. The P relation is used especially to align words within idiomatic expressions, free translations, and missing function words. It is guaranteed that S ⊆ P. Figure 1 shows an example of a manually aligned sentence with S and P relations. The human-annotated alignment does not prefer any translation direction and may therefore contain many-to-one and one-to-many relationships. The annotation has been performed by two annotators, producing sets S1, P1, S2, P2.</Paragraph>
    <Paragraph position="3"> The reference alignment is produced by forming the intersection of the sure alignments (S = S1 ∩ S2) and the union of the possible alignments (P = P1 ∪ P2).</Paragraph>
    <Paragraph position="4"> The quality of an alignment A = {(j, a_j)} is measured using the following alignment error rate: AER(S, P; A) = 1 - (|A ∩ S| + |A ∩ P|) / (|A| + |S|).</Paragraph>
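    <Paragraph> In code, with alignments represented as sets of position pairs (j, i), the reference construction and the alignment error rate read:

# Reference alignment from two annotations and the alignment error rate
# AER(S, P; A) = 1 - (|A ∩ S| + |A ∩ P|) / (|A| + |S|).

def reference(s1, p1, s2, p2):
    return s1.intersection(s2), p1.union(p2)  # S = S1 ∩ S2, P = P1 ∪ P2

def aer(a, s, p):
    hits = len(a.intersection(s)) + len(a.intersection(p))
    return 1.0 - hits / (len(a) + len(s))

# For A = S1: S ⊆ S1 gives |A ∩ S| = |S|, and S1 ⊆ P1 ⊆ P gives |A ∩ P| = |A|,
# so the AER is zero, as noted below.</Paragraph>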
    <Paragraph position="6"> that ......... \[\] at ......... \[\] ....... V1V1.</Paragraph>
    <Paragraph position="7"> leave ....... \[---'l \[-&amp;quot;~ &amp;quot; ....... liE\].</Paragraph>
    <Paragraph position="8"> let ....... Cll-1 &amp;quot; e ...... * ....</Paragraph>
    <Paragraph position="9"> say ..... * .....</Paragraph>
    <Paragraph position="11"> Obviously, if we compare the sure alignments of every single annotator with the reference alignment we obtain an AER of zero percent.</Paragraph>
    <Paragraph position="12"> Table 1: Corpus characteristics for alignment quality experiments.</Paragraph>
    <Paragraph position="0"> Table 1 shows the characteristics of the training and test corpus used in the alignment quality experiments. The test corpus for these experiments (not for the translation experiments) is part of the training corpus.</Paragraph>
    <Paragraph position="1"> Table 2 shows the alignment quality of different alignment models. Here the alignment models of HMM and Model 4 do not include a dependence on word classes. We conclude that more sophisticated alignment models are crucial for good alignment quality. Consistently, the use of a first-order alignment model, the modeling of an empty word, and fertilities result in better alignments. Interestingly, the simpler HMM alignment model outperforms Model 3, which shows the importance of first-order alignment models. The best performance is achieved with Model 4. The improvement by using a dictionary is small compared to the effect of using better alignment models. We see a significant difference in alignment quality if we exchange source and target languages. This is due to the restriction in all alignment models that a source language word can be aligned to at most one target language word. If German is the source language, the frequently occurring German word compounds cannot be aligned correctly, as they typically correspond to two or more English words.</Paragraph>
    <Paragraph position="2"> Table 3 shows the effect of including a dependence on word classes in the alignment model of the HMM or Model 4. For the evaluation of the translation quality we used the automatically computable Word Error Rate (WER) and the Subjective Sentence Error Rate (SSER) (Nießen et al., 2000). The WER corresponds to the edit distance between the produced translation and one predefined reference translation. To obtain the SSER the translations are classified by human experts into a small number of quality classes ranging from "perfect" to "absolutely wrong". In comparison to the WER, this criterion is more meaningful, but it is also very expensive to measure. The translations are produced by the alignment template system mentioned in the previous section.</Paragraph>
    <Paragraph position="3"> The results are shown in Table 5. We see a clear improvement in translation quality as measured by the SSER, whereas the WER is more or less the same for all models. The improvement is due to better lexicons and better alignment templates extracted from the resulting alignments.</Paragraph>
  </Section>
</Paper>