<?xml version="1.0" standalone="yes"?> <Paper uid="W06-1609"> <Title>Statistical Machine Reordering</Title> <Section position="3" start_page="0" end_page="70" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> During the last few years, SMT systems have evolved from the original word-based approach (Brown et al., 1993) to phrase-based translation systems (Koehn et al., 2003). In parallel to the phrase-based approach, the use of bilingual n-grams gives comparable results, as shown by Crego et al. (2005a). Two basic issues differentiate the n-gram-based system from the phrasebased: training data are monotonously segmented into bilingual units; and, the model considers n-gram probabilities rather than relative frequencies.</Paragraph> <Paragraph position="1"> This translation approach is described in detail by Mari no et al. (2005). The n-gram-based system follows a maximum entropy approach, in which a log-linear combination of multiple models is implemented (Och and Ney, 2002), as an alternative to the source-channel approach.</Paragraph> <Paragraph position="2"> In both systems, introducing reordering capabilities is of crucial importance for certain language pairs. Recently, new reordering strategies have been proposed in the literature on SMT such as the reordering of each source sentence to match the word order in the corresponding target sentence, see Kanthak et al. (2005) and Crego et al. (2005b).</Paragraph> <Paragraph position="3"> Similarly, Matusov et al. (2006) describe a method for simultaneously aligning and monotonizing the training corpus. The main problems of these approaches are: (1) the fact that the proposed monotonization is based on the alignment and cannot be applied to the test sets, and (2) the lack of reordering generalization.</Paragraph> <Paragraph position="4"> This paper presents a reordering approach called statistical machine reordering (SMR) which improves the reordering capabilities of SMT systems without incurring any of the problems mentioned above. SMR is a rst-pass translation performed on the source corpus, which converts it into an intermediate representation, in which source-language words are presented in an order that more closely matches that of the target language. SMR and SMT are performed using the same modeling tools as n-gram-based systems but using different statistical log-linear models.</Paragraph> <Paragraph position="5"> In order to be able to infer new reorderings we use word classes instead of words themselves as the input to the SMR system. In fact, the use of classes to help in the reordering is a key difference between our approach and standard SMT systems.</Paragraph> <Paragraph position="6"> This paper is organized as follows: Section 2 outlines the baseline system. Section 3 describes the reordering strategy in detail. Section 4 presents and discusses the results, and Section 5 presents our conclusions and suggestions for further work.</Paragraph> </Section> class="xml-element"></Paper>