<?xml version="1.0" standalone="yes"?>
<Paper uid="N04-4026">
  <Title>A Unigram Orientation Model for Statistical Machine Translation</Title>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
2 Orientation Unigram Model
</SectionTitle>
    <Paragraph position="0"> The basic idea of the orientation model can be illustrated as follows: In the example translation in Fig. 1, block a0a5a11 occurs to the left of block a0a2a1 . Although the joint block  a0a8a11 a3a6a0a5a1 a8 consisting of the two smaller blocks a0a2a1 and a0a8a11 has not been seen in the training data, we can still profit from the fact that block a0a10a11 occurs more frequently with left than with right orientation. In our Arabic-English training data, block a0a10a11 has been seena30a44a41 a1 a0a8a11 a8 a33a52a51a54a53 times with left orientation, and a30a44a43 a1 a0a8a11 a8 a33a56a55 with right orientation, i.e. it is always involved in swapping. This intuition is formalized using unigram counts with orientation. The orientation model is related to the distortion model in (Brown et al., 1993), but we do not compute a block alignment during training. We rather enumerate all relevant blocks in some order. Enumeration does not allow us to capture position dependent distortion probabilities, but we can compute statistics about adjacent block predecessors. null Our baseline model is the unigram monotone model described in (Tillmann and Xia, 2003). Here, we select blocks a0 from word-aligned training data and unigram block occurrence counts a30 a1  a8 are computed: all blocks for a training sentence pair are enumerated in some order and we count how often a given block occurs in the parallel training data 1. The training algorithm yields a list of about a57 a51 blocks per training sentence pair. In this paper, we make extended use of the baseline enumeration procedure: for each block a0 , we additionally enumerate all its left and right predecessors a0a10a9 . No optimal block segmentation is needed to compute the predecessors: for each block a0 , we check for adjacent predecessor blocks a0 a9 that also occur in the enumeration list. We compute left</Paragraph>
    <Paragraph position="2"> Here, we enumerate all adjacent predecessors a0a5a9 of block</Paragraph>
    <Paragraph position="4"> some right adjacent predecessor block a0a5a9 . The 'right' orientation count a30a44a43 a1  a8 is defined accordingly. Note, that in general the unigram count a30 a1</Paragraph>
    <Paragraph position="6"> during enumeration, a block a0 might have both left and right adjacent predecessors, either a left or a right adjacent predecessor, or no adjacent predecessors at all. The orientation count collection is illustrated in Fig. 2: each time a block a0 has a left or right adjacent predecessor in the parallel training data, the orientation counts are incremented accordingly.</Paragraph>
    <Paragraph position="7"> The decoding orientation restrictions are illustrated in  a63a65a64a67a66a69a68a2a70a72a71 and the phrase length is less or equala73 . No other selection criteria are applied. For thea74a76a75a37a77a79a78a81a80 model, we keep all blocks for whicha63a65a64a67a66a69a68a76a70</Paragraph>
    <Paragraph position="9"> order: for each block a0 , we look for left and right adjacent predecessors a0a10a9 .</Paragraph>
    <Paragraph position="10"> orientation is generated. If a block is skipped e.g. block a0a8a12 in Fig 3 by first generating block a0a10a11 then block a0a8a12 , the block a0a8a12 is generated using left orientationa5 a12 a33 a28 . Since the block translation is generated from bottom-to-top, the blocks a0a8a11 and a0 a1 do not have adjacent predecessors below them: they are generated by a default model  predecessor is ignored. The a11 a16 are chosen to be optimal on the devtest set (the optimal parameter setting is shown in Table. 1). Only two parameters have to be optimized due to the constraint that the a11 a16 have to sum to a60</Paragraph>
    <Paragraph position="12"> Straightforward normalization over all successor blocks in Eq. 2 and in Eq. 3 is not feasible: there are tens of millions of possible successor blocks a0 . In future work, normalization over a restricted successor set, e.g. for a given source input sentence, all blocks a0 that match this sentence might be useful for both training and decoding. The segmentation model in Eq. 1 naturally prefers translations that make use of a smaller number of blocks which leads to a smaller number of factors in Eq. 1. Using fewer 'bigger' blocks to carry out the translation generally seems to improve translation performance. Since normalization does not influence the number of blocks used to carry out the translation, it might be less important for our segmentation model.</Paragraph>
    <Paragraph position="13"> We use a DP-based beam search procedure similar to the one presented in (Tillmann and Xia, 2003). We maximize</Paragraph>
    <Paragraph position="15"> a8 orientation is generated as shown in the left picture. In the right picture, block swapping generates block a0a10a12 to the left of block a0a10a11 . The blocks a0a10a11 and a0 a1 do not have a left or right adjacent predecessor.</Paragraph>
    <Paragraph position="16"> over all block segmentations with orientationa1 a0a4a3a1 a3a6a5a7a3a1a9a8 for which the source phrases yield a segmentation of the input sentence. Swapping involves only blocks a1  the block swapping. In particular the orientationa5 a9 of the predecessor block a0a10a9 is ignored: in future work, we might take into account that a certain predecessor block a0 a9 typically precedes other blocks.</Paragraph>
  </Section>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3 Experimental Results
</SectionTitle>
    <Paragraph position="0"> The translation system is tested on an Arabic-to-English  a18 million training sentence pairs. The Arabic data is romanized, some punctuation tokenization and some number classing are carried out on the English and the Arabic training data. As devtest set, we use testing data provided by LDC, which consists of a60 a55a27a26 a18 sentences with a53a54a51 a23a5a23a5a25 Arabic words with a26 reference translations. As a blind test set, we use MT 03 Arabic-English  Three systems are evaluated in our experiments: a29 a55 is the baseline block unigram model without re-ordering. Here, monotone block alignments are generated: the blocks  a16 have only left predecessors (no blocks are swapped). This is the model presented in (Tillmann and Xia, 2003). For the a29 a60 model, the sentence is translated mostly monotonously, and only neighbor blocks are allowed to be swapped (at most a60 block is skipped). The a29 a60a31a30a33a32 a29 model allows for the same block swapping as the a29 a60 model, but additionally uses the orientation component described in Section 2: the block swapping is controlled  English test data: LDC devtest set and DARPA MT 03 blind test set.</Paragraph>
    <Paragraph position="1">  vtest set: the Arabic phrases are romanized. The example blocks were swapped in the development test set translations. The counts are obtained from the parallel training data.</Paragraph>
    <Paragraph position="2">  a8 are used.</Paragraph>
    <Paragraph position="3"> Experimental results are reported in Table 1: three BLEU results are presented for both devtest set and blind test set. Two scaling parameters are set on the devtest set and copied for use on the blind test set. The second column shows the model name, the third column presents the optimal weighting as obtained from the devtest set by carrying out an exhaustive grid search. The fourth column shows BLEU results together with confidence intervals (Here, the word casing is ignored). The block swapping model a29 a60 a30 a32 a29 obtains a statistical significant improvement over the baseline a29 a55 model. Interestingly, the swapping model a29 a60 without orientation performs worse than the baseline a29 a55 model: the word-based trigram language model alone is too weak to control the block swapping: the model is too unrestrictive to handle the block swapping reliably. Additionally, Table 2 presents devtest set example blocks that have actually been swapped. The training data is unsegmented, as can be seen from the first two blocks. The block in the first line has been seen a18 times more often with left than with right orientation. Blocks for which the ratio a8  are likely candidates for swapping in our Arabic-English experiments. The ratio a8 itself is not currently used in the orientation model. The orientation model mostly effects blocks where the Arabic and English words are verbs or nouns. As shown in Fig. 1, the orientation model uses  and only the default model for the adjective block a0 a1 . Although the noun block might occur by itself without adjective, the swapping is not controlled by the occurrence of the adjective block a0a2a1 (which does not have adjacent predecessors). We rather model the fact that a noun block a0 is typically preceded by some block a0a5a9 . This situation seems typical for the block swapping that occurs on the evaluation test set.</Paragraph>
  </Section>
  <Section position="5" start_page="0" end_page="0" type="metho">
    <SectionTitle>
Acknowledgment
</SectionTitle>
    <Paragraph position="0"> This work was partially supported by DARPA and monitored by SPAWAR under contract No. N66001-99-28916. The paper has greatly profited from discussion with Kishore Papineni and Fei Xia.</Paragraph>
  </Section>
class="xml-element"></Paper>