<?xml version="1.0" standalone="yes"?>
<Paper uid="N04-4026">
  <Title>A Unigram Orientation Model for Statistical Machine Translation</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> In recent years, phrase-based systems for statistical machine translation (Och et al., 1999; Koehn et al., 2003; Venugopal et al., 2003) have delivered state-of-the-art performance on standard translation tasks. In this paper, we present a phrase-based unigram system similar to the one in (Tillmann and Xia, 2003), which is extended by a unigram orientation model. The units of translation are blocks: pairs of phrases without internal structure. Fig. 1 shows an example block translation, taken from the devtest set, using five Arabic-English blocks b_1, ..., b_5 (the Arabic words are romanized). The unigram orientation model is trained from word-aligned training data. During decoding, we view translation as a block segmentation process: the input sentence is segmented from left to right and the target sentence is generated from bottom to top, one block at a time. A monotone block sequence is generated, except that a pair of neighbor blocks may be swapped. The novel orientation model is used to assist the block swapping: as shown in Section 3, block swapping in which only a trigram language model is used to compute probabilities between neighbor blocks fails to improve translation performance. (Wu, 1996) and (Zens and Ney, 2003) present reordering models that make use of a straight/inverted orientation model related to our work. Here, we investigate in detail the effect of restricting word reordering to neighbor-block swapping only.</Paragraph>
    <Paragraph position="1"> In this paper, we assume a block generation process that generates block sequences from bottom to top, one block at a time. The score of a successor block b depends on its predecessor block b' and on its orientation relative to the block b'. In Fig. 1 for example, block b_1 is the predecessor of block b_2, and block b_2 is the predecessor of block b_3. The target clump of a predecessor block b' is adjacent to the target clump of a successor block b. A right adjacent predecessor block b' is a block where additionally the source clumps are adjacent and the source clump of b' occurs to the right of the source clump of b. A left adjacent predecessor block is defined accordingly.</Paragraph>
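The adjacency and orientation definitions above can be sketched in code. This is a minimal illustration, not the paper's implementation; representing a block's source clump as an inclusive word-position interval, and treating the predecessor as the previously generated block (so target clumps are adjacent by construction), are assumptions:

```python
from typing import NamedTuple


class Block(NamedTuple):
    # Source-side clump as an inclusive word-position interval (assumption).
    src_start: int
    src_end: int


def orientation(pred: Block, succ: Block) -> str:
    """Three-valued orientation of a successor block relative to its
    predecessor (the previously generated block).

    R: predecessor is left adjacent on the source side  -> monotone step
    L: predecessor is right adjacent on the source side -> neighbor swap
    N: source clumps are not adjacent                   -> neutral
    """
    if pred.src_end + 1 == succ.src_start:
        return "R"  # left adjacent predecessor -> right orientation
    if succ.src_end + 1 == pred.src_start:
        return "L"  # right adjacent predecessor -> left orientation
    return "N"


# Monotone step: predecessor covers source words 0-1, successor covers 2-3.
print(orientation(Block(0, 1), Block(2, 3)))  # R
# Swapped neighbors: successor's source clump lies left of the predecessor's.
print(orientation(Block(2, 3), Block(0, 1)))  # L
```

The three cases mirror the text: a monotone continuation yields right orientation, a neighbor swap yields left orientation, and any non-adjacent configuration falls back to neutral.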
    <Paragraph position="2"> During decoding, we compute the score Pr(b_1^n, o_1^n) of a block sequence with orientation as a product of block scores: Pr(b_1^n, o_1^n) ≈ ∏_{i=1}^{n} p(b_i, o_i | b_{i-1}, o_{i-1}), where b_i is a block and o_i ∈ {L, R, N} is a three-valued orientation component linked to the block b_i (the orientation o_{i-1} of the predecessor block is ignored). A block b_i has right orientation (o_i = R) if it has a left adjacent predecessor. Accordingly, a block b_i has left orientation (o_i = L) if it has a right adjacent predecessor. If a block has neither a left nor a right adjacent predecessor, its orientation is neutral (o_i = N). The neutral orientation is not modeled explicitly in this paper; rather, it is handled as a default case as explained below. In Fig. 1, block b_5, for example, is generated using left orientation. During decoding most blocks have right orientation (o_i = R), since the block translations are mostly monotone.</Paragraph>
    <Paragraph position="11"> We try to find a block sequence with orientation (b_1^n, o_1^n) that maximizes Pr(b_1^n, o_1^n). A unigram model assigns a probability p(b) to a block based on its occurrence count N(b). The blocks are counted from word-aligned training data. We also collect unigram counts with orientation: a left count N_L(b) and a right count N_R(b). These counts are defined via an enumeration process and are used to define the orientation model p(o | b). The three models are combined in a log-linear way, as shown in the following section.</Paragraph>
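The count-based parameterization above can be illustrated as follows. This is a sketch under stated assumptions: the toy counts are invented, the neutral orientation is left to the default handling mentioned in the text, and the log-linear weights are placeholders (the actual combination is deferred to the following section):

```python
import math
from collections import Counter

# Hypothetical unigram counts collected from word-aligned training data:
# N(b) per block, plus orientation-specific left/right counts N_L(b), N_R(b).
N = Counter({"b1": 10, "b2": 4})
N_left = Counter({"b1": 2, "b2": 3})
N_right = Counter({"b1": 8, "b2": 1})


def p_unigram(b):
    """Block unigram probability from occurrence counts."""
    return N[b] / sum(N.values())


def p_orientation(o, b):
    """Orientation model estimated from left/right counts; assumes the
    block has at least one oriented count (neutral is a default case)."""
    total = N_left[b] + N_right[b]
    return (N_left[b] if o == "L" else N_right[b]) / total


def log_linear_score(b, o, weights=(1.0, 1.0)):
    """Log-linear combination of the two unigram models; the weights are
    illustrative placeholders, not trained values."""
    w1, w2 = weights
    return w1 * math.log(p_unigram(b)) + w2 * math.log(p_orientation(o, b))
```

For the toy counts, block "b1" is mostly generated monotonically (N_R = 8 of 10 oriented occurrences), so p_orientation("R", "b1") = 0.8 and left orientation is correspondingly penalized.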
  </Section>
</Paper>