<?xml version="1.0" standalone="yes"?> <Paper uid="N03-1017"> <Title>Statistical Phrase-Based Translation</Title> <Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> 3 Methods for Learning Phrase Translation </SectionTitle> <Paragraph position="0"> We carried out experiments to compare the performance of three different methods to build phrase translation probability tables. We also investigate a number of variations. We report most experimental results on a German-English translation task, since we had sufficient resources available for this language pair. We confirm the major points in experiments on additional language pairs.</Paragraph> <Paragraph position="1"> As the first method, we learn phrase alignments from a corpus that has been word-aligned by a training toolkit for a word-based translation model: the Giza++ [Och and Ney, 2000] toolkit for the IBM models [Brown et al., 1993]. The extraction heuristic is similar to the one used in the alignment template work by Och et al. [1999].</Paragraph> <Paragraph position="2"> A number of researchers have proposed to focus on the translation of phrases that have a linguistic motivation [Yamada and Knight, 2001; Imamura, 2002]. They only consider word sequences as phrases, if they are constituents, i.e. subtrees in a syntax tree (such as a noun phrase). To identify these, we use a word-aligned corpus annotated with parse trees generated by statistical syntactic parsers [Collins, 1997; Schmidt and Schulte im Walde, 2000].</Paragraph> <Paragraph position="3"> The third method for comparison is the joint phrase model proposed by Marcu and Wong [2002]. This model learns directly a phrase-level alignment of the parallel corpus.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.1 Phrases from Word-Based Alignments </SectionTitle> <Paragraph position="0"> The Giza++ toolkit was developed to train word-based translation models from parallel corpora. As a byproduct, it generates word alignments for this data. We improve this alignment with a number of heuristics, which are described in more detail in Section 4.5.</Paragraph> <Paragraph position="1"> We collect all aligned phrase pairs that are consistent with the word alignment: The words in a legal phrase pair are only aligned to each other, and not to words outside [Och et al., 1999].</Paragraph> <Paragraph position="2"> Given the collected phrase pairs, we estimate the phrase translation probability distribution by relative frequency: null</Paragraph> <Paragraph position="4"/> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.2 Syntactic Phrases </SectionTitle> <Paragraph position="0"> If we collect all phrase pairs that are consistent with word alignments, this includes many non-intuitive phrases. For instance, translations for phrases such as &quot;house the&quot; may be learned. Intuitively we would be inclined to believe that such phrases do not help: Restricting possible phrases to syntactically motivated phrases could filter out such non-intuitive pairs.</Paragraph> <Paragraph position="1"> Another motivation to evaluate the performance of a phrase translation model that contains only syntactic phrases comes from recent efforts to built syntactic translation models [Yamada and Knight, 2001; Wu, 1997]. In these models, reordering of words is restricted to reordering of constituents in well-formed syntactic parse trees. 
<Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.2 Syntactic Phrases </SectionTitle>
<Paragraph position="0"> If we collect all phrase pairs that are consistent with word alignments, this includes many non-intuitive phrases. For instance, translations for phrases such as &quot;house the&quot; may be learned. Intuitively we would be inclined to believe that such phrases do not help: restricting possible phrases to syntactically motivated phrases could filter out such non-intuitive pairs.</Paragraph>
<Paragraph position="1"> Another motivation to evaluate the performance of a phrase translation model that contains only syntactic phrases comes from recent efforts to build syntactic translation models [Yamada and Knight, 2001; Wu, 1997]. In these models, reordering of words is restricted to reordering of constituents in well-formed syntactic parse trees. When augmenting such models with phrase translations, typically only translation of phrases that span entire syntactic subtrees is possible. It is important to know whether this is a helpful or harmful restriction.</Paragraph>
<Paragraph position="2"> Consistent with Imamura [2002], we define a syntactic phrase as a word sequence that is covered by a single subtree in a syntactic parse tree.</Paragraph>
<Paragraph position="3"> We collect syntactic phrase pairs as follows: we word-align a parallel corpus, as described in Section 3.1. We then parse both sides of the corpus with syntactic parsers [Collins, 1997; Schmidt and Schulte im Walde, 2000].</Paragraph>
<Paragraph position="4"> For all phrase pairs that are consistent with the word alignment, we additionally check whether both phrases are subtrees in the parse trees. Only these phrases are included in the model.</Paragraph>
<Paragraph position="5"> Hence, the syntactically motivated phrase pairs learned are a subset of the phrase pairs learned without knowledge of syntax (Section 3.1).</Paragraph>
<Paragraph position="6"> As in Section 3.1, the phrase translation probability distribution is estimated by relative frequency.</Paragraph>
</Section>
<Section position="3" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.3 Phrases from Phrase Alignments </SectionTitle>
<Paragraph position="0"> Marcu and Wong [2002] proposed a translation model that assumes that lexical correspondences can be established not only at the word level, but at the phrase level as well. To learn such correspondences, they introduced a phrase-based joint probability model that simultaneously generates both the Source and Target sentences in a parallel corpus. Expectation Maximization learning in Marcu and Wong's framework yields both (i) a joint probability distribution \(\phi(\bar{e}, \bar{f})\) over phrase pairs, and (ii) a joint distribution \(d(i, j)\), which reflects the probability that a phrase at position \(i\) is translated into a phrase at position \(j\).</Paragraph>
<Paragraph position="2"> To use this model in the context of our framework, we simply marginalize to conditional probabilities the joint probabilities estimated by Marcu and Wong [2002]. Note that this approach is consistent with the approach taken by Marcu and Wong themselves, who use conditional models during decoding.</Paragraph>
<Paragraph position="3"> [Table: distinct phrase pairs (maximum phrase length 4)] </Paragraph>
</Section> </Section> </Paper>
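As a rough illustration of this marginalization step (a sketch, not Marcu and Wong's actual training or decoding code), assume the joint model's estimates are available as a dictionary mapping (English phrase, foreign phrase) pairs to joint probabilities; conditional phrase translation probabilities are then obtained by normalizing over the foreign phrases.

```python
from collections import defaultdict

def joint_to_conditional(joint):
    """Convert a joint phrase distribution p(e, f) into the conditional
    phi(f | e) = p(e, f) / sum_f' p(e, f') by marginalizing over f."""
    marginal_e = defaultdict(float)
    for (e_phrase, f_phrase), p in joint.items():
        marginal_e[e_phrase] += p
    return {(e, f): p / marginal_e[e] for (e, f), p in joint.items()}
```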