<?xml version="1.0" standalone="yes"?> <Paper uid="W06-3117"> <Title>Stochastic Inversion Transduction Grammars for Obtaining Word Phrases for Phrase-based Statistical Machine Translation</Title> <Section position="3" start_page="0" end_page="130" type="metho"> <SectionTitle> 2 Phrase-based Statistical Machine Translation </SectionTitle> <Paragraph position="0"> The translation units in a phrase-based statistical translation system are bilingual phrases rather than simple paired words. Several systems that follow this approach have been presented in recent work (Zens et al., 2002; Koehn, 2004). These systems have demonstrated excellent translation performance in real tasks.</Paragraph> <Paragraph position="1"> A phrase-based statistical machine translation system works in three steps (Zens et al., 2002): first, the source sentence is segmented into phrases; second, each source phrase is translated into a target phrase; and third, the target phrases are reordered to compose the target sentence.</Paragraph> <Paragraph position="2"> Bilingual translation phrases are an important component of a phrase-based system. Different methods have been proposed to obtain bilingual translation phrases, mainly from word-based alignments and from syntax-based models (Yamada and Knight, 2001).</Paragraph> <Paragraph position="3"> In this work, we focus on learning bilingual word phrases by using Stochastic Inversion Transduction Grammars (SITGs) (Wu, 1997). This formalism allows us to obtain bilingual word phrases in a natural way from the bilingual parsing of two sentences. In addition, SITGs make it easy to incorporate many desirable characteristics into word phrases, such as length restrictions, selection according to the word alignment probability, and bracketing information. We review this formalism in the following section. 
</Paragraph> </Section> <Section position="4" start_page="130" end_page="130" type="metho"> <SectionTitle> 3 Stochastic Inversion Transduction Grammars </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="130" end_page="130" type="sub_section"> <SectionTitle> </SectionTitle> <Paragraph position="0"> Stochastic Inversion Transduction Grammars (SITGs) (Wu, 1997) can be viewed as a restricted subset of Stochastic Syntax-Directed Transduction Grammars. They can be used to parse two strings simultaneously: the source sentence and the target sentence. SITGs are closely related to Stochastic Context-Free Grammars.</Paragraph> <Paragraph position="1"> Formally, a SITG in Chomsky Normal Form, $G_s$, can be defined as a tuple $(N, S, W_1, W_2, R, p)$, where: $N$ is a finite set of non-terminal symbols; $S \in N$ is the axiom of the SITG; $W_1$ is a finite set of terminal symbols of language 1; and $W_2$ is a finite set of terminal symbols of language 2. $R$ is a finite set of: lexical rules of the type $A \to x/\epsilon$, $A \to \epsilon/y$, and $A \to x/y$; direct syntactic rules, written $A \to [B C]$; and inverse syntactic rules, written $A \to \langle B C \rangle$,</Paragraph> <Paragraph position="3"> where $A, B, C \in N$, $x \in W_1$, $y \in W_2$, and $\epsilon$ is the empty string. When a direct syntactic rule is used in a parse, both strings are parsed with the syntactic rule $A \to B C$. When an inverse rule is used in a parse, one string is parsed with the syntactic rule $A \to B C$, and the other string is parsed with the syntactic rule $A \to C B$. The term $p$ of the tuple is a function that attaches a probability to each rule.</Paragraph> <Paragraph position="4"> An efficient Viterbi-like parsing algorithm based on a dynamic programming scheme is proposed by Wu (1997). The proposed algorithm has a time complexity of $O(|x|^3 |y|^3 |R|)$. 
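To give a rough sense of this growth, the following Python sketch evaluates the dominant term of the bound, reading it as cubic in each sentence length and linear in the number of rules. The function name `viterbi_cost` and the sample lengths are illustrative assumptions, and constant factors are ignored.

```python
def viterbi_cost(len_x, len_y, num_rules):
    """Dominant term |x|^3 * |y|^3 * |R| of the bound (constants dropped)."""
    return len_x ** 3 * len_y ** 3 * num_rules


if __name__ == "__main__":
    # Hypothetical sentence lengths, with both sentences equally long.
    for n in (5, 10, 20, 40):
        print(n, viterbi_cost(n, n, num_rules=1))
```

Under this reading, doubling both sentence lengths multiplies the cost by 64.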
Note that this time complexity restricts the use of the algorithm in real tasks to short strings.</Paragraph> <Paragraph position="5"> If a bracketed corpus is available, then a modified version of the parsing algorithm can be defined to take into account the bracketing of the strings.</Paragraph> <Paragraph position="6"> The modifications are similar to those proposed by Pereira and Schabes (1992) for the inside algorithm.</Paragraph> <Paragraph position="7"> Following the notation of Pereira and Schabes (1992), we can define a partially bracketed corpus as a set of sentence pairs that are annotated with parentheses that mark constituent frontiers. More precisely, a bracketed corpus $\Omega$ is a set of tuples $(x, B_x, y, B_y)$, where $x$ and $y$ are strings, $B_x$ is the bracketing of $x$, and $B_y$ is the bracketing of $y$. Let $d_{xy}$ be a parse of $x$ and $y$ with the SITG $G_s$. If the SITG does not have useless symbols, then each non-terminal that appears in each sentential form of the derivation $d_{xy}$ generates a pair of substrings, one of $x$ and one of $y$, and thus defines a span $(i, j)$ of $x$ and a span $(k, l)$ of $y$. A derivation of $x$ and $y$ is compatible with $B_x$ and $B_y$ if all the spans defined by it are compatible with $B_x$ and $B_y$. This compatibility can be easily defined by the function $c(i, j, k, l)$, which takes a value of $1$ if $(i, j)$ does not overlap any span in $B_x$ and $(k, l)$ does not overlap any span in $B_y$;</Paragraph> <Paragraph position="9"> otherwise it takes a value of $0$. This function filters out those derivations (or partial derivations) whose parsing is not compatible with the bracketing defined in the sample (Sanchez and Benedi, 2006).</Paragraph> <Paragraph position="10"> The algorithm can be implemented to compute only those subproblems in the dynamic programming scheme that are compatible with the bracketing. 
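As an illustration of this filtering, the Python sketch below implements one possible reading of the compatibility test, in the spirit of Pereira and Schabes (1992): two spans overlap when they share words but neither contains the other. The names `overlaps`, `c`, and `compatible_spans`, and the convention that a span (i, j) covers the words between positions i and j, are assumptions made for this example.

```python
from itertools import combinations


def overlaps(span, bracket):
    """True if the spans cross: they share words but neither contains the other."""
    i, j = span
    k, l = bracket
    return (k > i and j > k and l > j) or (i > k and l > i and j > l)


def c(i, j, k, l, brackets_x, brackets_y):
    """Return 1 if (i, j) crosses no bracket of x and (k, l) none of y, else 0."""
    ok_x = not any(overlaps((i, j), b) for b in brackets_x)
    ok_y = not any(overlaps((k, l), b) for b in brackets_y)
    return 1 if ok_x and ok_y else 0


def compatible_spans(n, brackets):
    """All spans of an n-word string that cross none of the given brackets."""
    return [(i, j) for i, j in combinations(range(n + 1), 2)
            if not any(overlaps((i, j), b) for b in brackets)]
```

For a four-word string, all 10 spans are compatible when no brackets are given, whereas the full bracketing [(0, 2), (2, 4), (0, 4)] leaves only 7, so far fewer subproblems need to be computed.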
Thus, the time complexity is $O(|x|^3 |y|^3 |R|)$ for an unbracketed string, while the time complexity is</Paragraph> <Paragraph position="12"> $O(|x| |y| |R|)$ for a fully bracketed string. Note that the latter time complexity allows us to work with real tasks with longer strings.</Paragraph> <Paragraph position="13"> Moreover, the parse tree can be efficiently obtained. Each node in the tree relates two word phrases of the strings being parsed. The related word phrases can be considered to be translations of each other. These word phrases can be used to compute the translation table of a phrase-based statistical machine translation system.</Paragraph> </Section> </Section> </Paper>