File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/04/c04-1015_metho.xml
Size: 7,131 bytes
Last Modified: 2025-10-06 14:08:43
<?xml version="1.0" standalone="yes"?> <Paper uid="C04-1015"> <Title>Example-based Machine Translation Based on Syntactic Transfer with Statistical Models</Title> <Section position="4" start_page="11" end_page="11" type="metho"> <SectionTitle> 3 Statistical Generation </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="11" end_page="11" type="sub_section"> <SectionTitle> 3.1 Translation Model and Language Model </SectionTitle> <Paragraph position="0"> Statistical generation searches for the most appropriate sequence of target words from the target tree output from the example-based syntactic transfer. The most appropriate sequence is determined from the product of the translation model and the language model in the same manner as statistical MT. In other words, when F and E denote the channel target and channel source sequence, respectively, the output word sequence</Paragraph> <Paragraph position="2"> isfies the following equation is searched for.</Paragraph> <Paragraph position="4"> We only utilize the lexicon model as the translation model in this paper, similar to the models proposed by Vogel et al. (2003). Namely, when f and e denote the channel target and channel source word, respectively, the translation probability is computed by the following equation.</Paragraph> <Paragraph position="6"> The IBM models include other models, such as fertility, NULL, and distortion models. As we described in Section 2.2, the quality of machine translation is maintained using only the lexicon model because syntactical correctness is already preserved by example-based transfer.</Paragraph> <Paragraph position="7"> For the language model, we utilize a standard word n-gram model.</Paragraph> </Section> <Section position="2" start_page="11" end_page="11" type="sub_section"> <SectionTitle> 3.2 Bottom-up Generation </SectionTitle> <Paragraph position="0"> We can construct word graphs by serializing the target tree structure, which allows us to select the best word sequence from the graphs. However, the tree structure already shares nodes transferred from the same input sub-sequence. The cost of calculating probabilities is equivalent if we calculate the probabilities while serializing the tree structure. We call this method bottom-up generation in this paper.</Paragraph> <Paragraph position="1"> Figure 5 shows a partial example of bottom-up generation when the target tree in Figure 4 is given. For each node, word sub-sequences and their probabilities (language and translation) are obtained from child nodes. Then, the new probabilities of the word sequence combination are calculated, and the n-best sequences are selected. These n-best sequences and their probabilities are reused to calculate the probabilities of parent nodes. When the translation probability is calculated, the source word sub-sequence is obtained by tracing transfer mapping, and the applied translation model is restricted to the source sub-sequence. In other words, the translation probability is locally calculated between the corresponding phrases.</Paragraph> <Paragraph position="2"> When the generation reaches the top node, the language probability is re-calculated with marks for start-of-sentence and end-of-sentence, and the n-best list is re-sorted. 
</Section>
</Section>
<Section position="5" start_page="11" end_page="11" type="metho">
<SectionTitle> 4 Evaluation </SectionTitle>
<Paragraph position="0"> To evaluate the effect of integrating statistical MT models into example-based MT, we compared several methods that differ in the statistical generation module.</Paragraph>
<Section position="1" start_page="11" end_page="11" type="sub_section">
<SectionTitle> 4.1 Experimental Setting </SectionTitle>
<Paragraph position="0"> Bilingual Corpus: The corpus used in the following experiments is the Basic Travel Expression Corpus (Takezawa et al., 2002; Kikui et al., 2003). This is a collection of Japanese sentences and their English translations based on expressions that are usually found in phrasebooks for foreign tourists. We divided it into subsets for training and testing as shown in Table 1.</Paragraph>
<Paragraph position="1"> Transfer Rules: Transfer rules were acquired from the training set using hierarchical phrase alignment, and low-frequency rules that appeared fewer than twice were removed. The number of rules was 24,310.</Paragraph>
<Paragraph position="2"> Translation Model and Language Model: We used a lexicon model of IBM Model 4 learned by GIZA++ (Och and Ney, 2003) and word bigram and trigram models learned by the CMU-Cambridge Toolkit (TM and LM denote the log probabilities of the translation and language models, respectively).</Paragraph>
<Paragraph position="3"> Compared Methods: The following methods were compared.</Paragraph>
<Paragraph position="4"> * Baseline The translation is obtained directly from the target tree that was output from the example-based transfer module. The translation words were selected in advance as those having the highest frequency in the training corpus. This is the baseline for translating a sentence using only the example-based transfer.</Paragraph>
<Paragraph position="5"> * Bottom-up The bottom-up generation selects the best translation from the outputs of the example-based transfer. We used the 100-best criterion in this experiment.</Paragraph>
</Section>
</Section>
<Section position="6" start_page="11" end_page="11" type="metho">
<SectionTitle> * All Search </SectionTitle>
<Paragraph position="0"> For all combinations that can be generated from the outputs of the example-based transfer, we calculated the translation and language probabilities and selected the best translation.</Paragraph>
<Paragraph position="1"> That is, a globally optimal solution was selected within the search space restricted by the example-based transfer.</Paragraph>
</Section>
<Section position="7" start_page="11" end_page="11" type="metho">
<SectionTitle> * LM Only </SectionTitle>
<Paragraph position="0"> In the same way as All Search, the best translation was searched for, but only the language model was used for calculating the probabilities.</Paragraph>
<Paragraph position="1"> The purpose of this experiment is to measure the influence of the translation model.</Paragraph>
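<Paragraph> To make the contrast between these methods concrete, the sketch below illustrates the kind of scoring shared by Bottom-up and All Search, log P(E) + log P(F|E), with LM Only simply dropping the translation-model term. The tables 'lexicon' and 'bigram' and all function names are assumptions for illustration, not the paper's code.
# Illustrative candidate scoring: log P(E) + log P(F|E); LM Only drops the P(F|E) term.
import math

def lexicon_logprob(source_words, target_words, lexicon):
    # IBM-Model-1-style lexicon score: each source word is explained by the sum
    # of word-translation probabilities over the candidate's target words.
    score = 0.0
    for f in source_words:
        score += math.log(sum(lexicon.get((f, e), 1e-10) for e in target_words))
    return score

def lm_logprob(words, bigram):
    # Bigram language-model score with sentence-boundary markers.
    marked = ["BOS"] + list(words) + ["EOS"]
    return sum(bigram.get(pair, -10.0) for pair in zip(marked, marked[1:]))

def best_translation(candidates, source_words, lexicon, bigram, use_tm=True):
    # All Search scores every candidate generated by the transfer; LM Only sets use_tm=False.
    def score(words):
        s = lm_logprob(words, bigram)
        if use_tm:
            s += lexicon_logprob(source_words, words, lexicon)
        return s
    return max(candidates, key=score)
</Paragraph>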
<Paragraph position="2"> Evaluation Metrics: From the test set, 510 sentences were evaluated by the following automatic and subjective evaluation metrics. The number of reference translations for automatic evaluation was 16 per sentence.</Paragraph>
<Paragraph position="3"> BLEU: Automatic evaluation by BLEU score (Papineni et al., 2002).</Paragraph>
<Paragraph position="4"> NIST: Automatic evaluation by NIST score (Doddington, 2002).</Paragraph>
<Paragraph position="5"> mWER: The mean word error rate, where the word error rate is calculated between each MT result and all of its reference translations and the lowest rate is selected.</Paragraph>
</Section>
</Paper>