<?xml version="1.0" standalone="yes"?> <Paper uid="P01-1030"> <Title>Fast Decoding and Optimal Decoding for Machine Translation</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> A statistical MT system that translates (say) French sentences into English is divided into three parts: (1) a language model (LM) that assigns a probability P(e) to any English string, (2) a translation model (TM) that assigns a probability P(f | e) to any pair of English and French strings, and (3) a decoder. The decoder takes a previously unseen sentence f and tries to find the e that maximizes P(e | f), or equivalently (by Bayes' rule, since P(f) does not depend on e) maximizes P(e) · P(f | e).</Paragraph> <Paragraph position="1"> Brown et al. (1993) introduced a series of TMs based on word-for-word substitution and reordering, but did not include a decoding algorithm. If the source and target languages are constrained to have the same word order (by choice or through suitable pre-processing), then the linear Viterbi algorithm can be applied (Tillmann et al., 1997). If re-ordering is limited to rotations around nodes in a binary tree, then optimal decoding can be carried out by a high-polynomial algorithm (Wu, 1996). For arbitrary word-reordering, the decoding problem is NP-complete (Knight, 1999).</Paragraph> <Paragraph position="2"> A sensible strategy (Brown et al., 1995; Wang and Waibel, 1997) is to examine a large subset of likely decodings and choose only from among them. Of course, it is possible to miss a good translation this way. If the decoder returns ê but there exists some e for which P(e | f) > P(ê | f), this is called a search error. 
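The search-error definition can be made concrete with a toy example. The following is a minimal Python sketch, not any of the decoders studied in this paper: the French input, the English candidates, and all probabilities are hypothetical, and the names score, decode, and is_search_error are illustrative only. Each candidate e is scored by P(e) · P(f | e); a decoder that examines only a subset of candidates returns ê, and a search error is flagged when an exhaustive search over all candidates finds an e with a strictly higher score.

```python
# Toy noisy-channel decoding example. All strings and probabilities are
# hypothetical and chosen only to illustrate the definitions in the text.

# Hypothetical language model P(e) over a tiny set of English strings.
LM = {
    "the house is small": 0.04,
    "the small house is": 0.001,
    "small is the house": 0.0005,
}

# Hypothetical translation model P(f | e), keyed by (French, English) pairs.
F = "la maison est petite"
TM = {
    (F, "the house is small"): 0.02,
    (F, "the small house is"): 0.03,
    (F, "small is the house"): 0.01,
}


def score(f: str, e: str) -> float:
    """P(e) * P(f | e); proportional to P(e | f) because P(f) is fixed."""
    return LM[e] * TM[(f, e)]


def decode(f: str, candidates: list[str]) -> str:
    """Return the candidate e that maximizes P(e) * P(f | e)."""
    return max(candidates, key=lambda e: score(f, e))


def is_search_error(f: str, returned: str, all_candidates: list[str]) -> bool:
    """True if some candidate scores strictly higher than the returned one."""
    return score(f, decode(f, all_candidates)) > score(f, returned)


# A decoder that examines only a subset of likely decodings can miss the best one.
subset = ["the small house is", "small is the house"]
e_hat = decode(F, subset)
print(e_hat, is_search_error(F, e_hat, list(LM)))  # prints: the small house is True
```

Exhaustive enumeration works here only because the toy candidate set has three members.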
As Wang and Waibel (1997) remark, it is hard to know whether a search error has occurred: the only way to show that a decoding is sub-optimal is to actually produce a higher-scoring one.</Paragraph> <Paragraph position="3"> Thus, while decoding is a clear-cut optimization task in which every problem instance has a right answer, it is hard to come up with good answers quickly. This paper reports on measurements of speed, search errors, and translation quality in the context of a traditional stack decoder (Jelinek, 1969; Brown et al., 1995) and two new decoders. The first is a fast greedy decoder, and the second is a slow optimal decoder based on generic mathematical programming techniques.</Paragraph> </Section> </Paper>