<?xml version="1.0" standalone="yes"?> <Paper uid="P02-1039"> <Title>A Decoder for Syntax-based Statistical MT</Title> <Section position="3" start_page="0" end_page="0" type="intro"> <SectionTitle> 2 Syntax-based TM </SectionTitle> <Paragraph position="0"> The syntax-based TM defined by (Yamada and Knight, 2001) assumes an English parse tree ε as the channel input. The channel applies three kinds of stochastic operations to each node ε_i: reordering the child nodes (ρ), inserting an optional extra word to the left or right of the node (ν), and translating leaf words (τ).¹ These operations are independent of each other and are conditioned on the features (R, N, T) of the node. Figure 1 shows an example.</Paragraph> <Paragraph position="1"> The child node sequence of the top node VB is reordered from PRP-VB1-VB2 into PRP-VB2-VB1, as seen in the second tree (Reordered). An extra word ha is inserted at the leftmost node PRP, as seen in the third tree (Inserted). The English word He under the same node is translated into the foreign word kare, as seen in the fourth tree (Translated). After these operations, the channel emits a foreign sentence f by taking the leaves of the modified tree.</Paragraph> <Paragraph position="2"> Formally, the channel probability P(f|ε) is</Paragraph> <Paragraph position="3"> P(f|ε) = Σ_{θ: Str(θ(ε)) = f} P(θ|ε), where P(θ|ε) = Π_{i=1..n} r(ρ_i|R(ε_i)) · n(ν_i|N(ε_i)) · t(τ_i|T(ε_i)), and Str(θ(ε)) denotes the foreign word sequence obtained by applying the operations θ to the tree ε.</Paragraph> <Paragraph position="4"> The model tables r(ρ|R), n(ν|N), and t(τ|T) are called the r-table, n-table, and t-table, respectively.</Paragraph> <Paragraph position="5"> These tables contain the probabilities of the channel operations (ρ, ν, τ) conditioned on the features (R, N, T). In Figure 1, the r-table specifies the probability of having the second tree (Reordered) given the first tree. 
The n-table specifies the probability of having the third tree (Inserted) given the second tree. (Footnote 1: The channel operations are designed to model differences in word order (SVO for English vs. VSO for Arabic) and in case-marking schemes (word positions in English vs. case-marker particles in Japanese).)</Paragraph> <Paragraph position="6"> The t-table specifies the probability of having the fourth tree (Translated) given the third tree.</Paragraph> <Paragraph position="7"> The probabilities in the model tables are obtained automatically by an EM algorithm, using pairs of ε (channel input) and f (channel output) as a training corpus. A bilingual corpus usually comes as pairs of translated sentences, so we need to parse the corpus. Since only the channel-input side needs to be parsed, many X-to-English translation systems can be developed with an English parser alone.</Paragraph> <Paragraph position="8"> The conditioning features (R, N, T) can be anything available on the tree ε; however, they should be selected carefully so as not to cause data-sparseness problems. The choice of features may also affect the decoding algorithm. In our experiment, the sequence of child node labels was used for R, the pair of the node label and its parent label was used for N, and the identity of the English word was used for T. For example, r(ρ|R) = P(PRP-VB2-VB1 | PRP-VB1-VB2) for the top node in Figure 1. Similarly, for the node PRP, n(ν|N) = P(right, ha | VB-PRP) and t(τ|T) = P(kare | he). More detailed examples are found in (Yamada and Knight, 2001).</Paragraph> </Section> </Paper>
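The independence of the three channel operations means the probability of a whole operation sequence is just a product of per-node table lookups. The following is a minimal Python sketch of that product form for the Figure 1 example; the table entries and probability values are illustrative placeholders, not trained values from the paper.

```python
# Toy sketch of the channel probability P(theta|tree) as a product of
# independent per-node operation probabilities (r-, n-, and t-table lookups).
# All probability values below are made up for illustration.

r_table = {("PRP-VB1-VB2", "PRP-VB2-VB1"): 0.72}   # reorder: child sequence -> new order
n_table = {("VB-PRP", ("right", "ha")): 0.40}      # insert: parent-node feature -> (side, word)
t_table = {("he", "kare"): 0.95}                   # translate: English word -> foreign word

def channel_prob(ops):
    """Multiply the table probabilities of each (table, key) operation."""
    p = 1.0
    for table, key in ops:
        p *= table[key]
    return p

# The operation sequence theta for the Figure 1 example:
theta = [
    (r_table, ("PRP-VB1-VB2", "PRP-VB2-VB1")),
    (n_table, ("VB-PRP", ("right", "ha"))),
    (t_table, ("he", "kare")),
]
print(channel_prob(theta))  # 0.72 * 0.40 * 0.95
```

Summing this product over all operation sequences θ whose output string equals f gives the channel probability P(f|ε); the decoder's job is to search that space efficiently.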
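The three conditioning features described above (child-label sequence for R, node-plus-parent label for N, word identity for T) can be sketched as simple functions over a toy parse-tree node; the `Node` class and helper names here are illustrative, not from the paper.

```python
# Hedged sketch of feature extraction for the r-, n-, and t-tables.
# R(eps) = sequence of child node labels, N(eps) = (parent label, node label),
# T(eps) = identity of the English leaf word. Node structure is hypothetical.

class Node:
    def __init__(self, label, children=None, word=None):
        self.label = label
        self.word = word
        self.children = children or []
        self.parent = None
        for c in self.children:
            c.parent = self

def reorder_feature(node):
    """R(eps): dash-joined sequence of child node labels."""
    return "-".join(c.label for c in node.children)

def insert_feature(node):
    """N(eps): pair of parent label and node label ("TOP" if no parent)."""
    return (node.parent.label if node.parent else "TOP", node.label)

def translate_feature(node):
    """T(eps): the English word at a leaf node."""
    return node.word

# The top of the Figure 1 example tree: VB dominating PRP("he"), VB1, VB2.
prp = Node("PRP", word="he")
top = Node("VB", children=[prp, Node("VB1"), Node("VB2")])

print(reorder_feature(top))    # "PRP-VB1-VB2"
print(insert_feature(prp))     # ("VB", "PRP")
print(translate_feature(prp))  # "he"
```

These feature values are exactly the conditioning contexts in the worked examples: reordering is conditioned on PRP-VB1-VB2, insertion at PRP on the pair (VB, PRP), and translation on the word "he".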