<?xml version="1.0" standalone="yes"?> <Paper uid="N03-1010"> <Title>Greedy Decoding for Statistical Machine Translation in Almost Linear Time</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> Most of the current work in statistical machine translation builds on word replacement models developed at IBM in the early 1990s (Brown et al., 1990, 1993; Berger et al., 1994, 1996). Based on the conventions established in Brown et al. (1993), these models are commonly referred to as the (IBM) Models 1-5.</Paragraph> <Paragraph position="1"> One of the big challenges in building actual MT systems within this framework is that of decoding: finding the translation candidate e that maximizes the translation probability P(e | f) for the given input f. Knight (1999) has shown the problem to be NP-complete.</Paragraph> <Paragraph position="2"> Due to the complexity of the task, practical MT systems usually do not employ optimal decoders (that is, decoders that are guaranteed to find an optimal solution within the constraints of the framework), but rely on approximative algorithms instead. Empirical evidence suggests that such algorithms can perform reasonably well.</Paragraph> <Paragraph position="3"> For example, Berger et al. (1994) attribute only 5% of the translation errors of their Candide system, which uses a restricted stack search, to search errors. Using the same evaluation metric (but different evaluation data), Wang and Waibel (1997) report search error rates of 7.9% and 9.3%, respectively, for their decoders.</Paragraph> <Paragraph position="4"> 1 Technically, the complexity is still O(n^2). However, the quadratic component has such a small coefficient that it does not have any noticeable effect on the translation speed for all reasonable inputs.</Paragraph> <Paragraph position="5"> Och et al. (2001) and Germann et al.
(2001) both implemented optimal decoders and benchmarked approximative algorithms against them. Och et al. report word error rates of 68.68% for optimal search (based on a variant of the A* algorithm), and 69.65% for the most restricted version of a decoder that combines dynamic programming with a beam search (Tillmann and Ney, 2000).</Paragraph> <Paragraph position="6"> Germann et al. (2001) compare translations obtained by a multi-stack decoder and a greedy hill-climbing algorithm against those produced by an optimal integer programming decoder that treats decoding as a variant of the traveling-salesman problem (cf. Knight, 1999).</Paragraph> <Paragraph position="7"> Their overall performance metric is the sentence error rate (SER). For decoding with IBM Model 3, they report SERs of about 57% (6-word sentences) and 76% (8-word sentences) for optimal decoding, 58% and 75% for stack decoding, and 60% and 75% for greedy decoding, which is the focus of this paper.</Paragraph> <Paragraph position="8"> All these numbers suggest that approximative algorithms are a feasible choice for practical applications. The purpose of this paper is to describe speed improvements to the greedy decoder mentioned above. While acceptably fast for the kind of evaluation used in Germann et al. (2001), namely sentences of up to 20 words, its speed becomes an issue for more realistic applications.</Paragraph> <Paragraph position="9"> Brute-force translation of the 100 short news articles in Chinese from the TIDES MT evaluation in June 2002 (878 segments; ca. 25k tokens) requires, without any of the improvements described in this paper, over 440 CPU hours, using the simpler, &quot;faster&quot; variant of the algorithm (described below). We will show that this time can be reduced to ca. 40 minutes without sacrificing translation quality.</Paragraph> <Paragraph position="10"> [Figure: greedy decoding, step by step. initial string: I do not understand the logic of these people . | pick fertilities: I not not understand the logic of these people . | replace words: Je ne pas comprends la logique de ces gens . | reorder: Je ne comprends pas la logique de ces gens . | insert spurious words: Je ne comprends pas la logique de ces gens -là . (from actual training or decoding logs)]</Paragraph> <Paragraph position="11"> In the following, we first describe the underlying IBM model(s) of machine translation (Section 2) and our hill-climbing algorithm (Section 3). In Section 4, we discuss improvements to the algorithm and its implementation, and the effect of restrictions on word reordering.</Paragraph> </Section></Paper>