<?xml version="1.0" standalone="yes"?> <Paper uid="P03-1057"> <Title>Feedback Cleaning of Machine Translation Rules Using Automatic Evaluation</Title> <Section position="3" start_page="0" end_page="1" type="intro"> <SectionTitle> 2 MT System and Problems of Automatic Acquisition </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 2.1 MT Engine </SectionTitle> <Paragraph position="0"> We use the Hierarchical Phrase Alignment-based Translator (HPAT) (Imamura, 2002) as our transfer-based MT system. The most important knowledge in HPAT is its transfer rules, which define the correspondences between source and target language expressions. Examples of English-to-Japanese transfer rules are shown in Figure 2. The transfer rules can be regarded as a synchronized context-free grammar.</Paragraph> <Paragraph position="1"> When the system translates an input sentence, the sentence is first parsed using the source patterns of the transfer rules. Next, a tree structure of the target language is generated by mapping the source patterns to the corresponding target patterns. When non-terminal symbols remain in the target tree, target words are inserted by referring to a translation dictionary.</Paragraph> <Paragraph position="2"> Ambiguities that arise during parsing or mapping are resolved by selecting the rules that minimize the semantic distance between the input words and the source examples (real examples in the training corpus) of the transfer rules (Furuse and Iida, 1994).</Paragraph> <Paragraph position="3"> For instance, when the input phrase &quot;leave at 11 a.m.&quot; is translated into Japanese, Rule 2 in Figure 2 is selected because the semantic distance from the source example (arrive, p.m.)
is the shortest to the head words of the input phrase (leave, a.m.).</Paragraph> </Section> <Section position="2" start_page="0" end_page="1" type="sub_section"> <SectionTitle> 2.2 Problems of Automatic Acquisition </SectionTitle> <Paragraph position="0"> HPAT automatically acquires its transfer rules from parallel corpora by using Hierarchical Phrase Alignment (Imamura, 2001). However, the resulting rule set contains many incorrect or redundant rules. The causes of this problem can be roughly classified as follows.</Paragraph> <Paragraph position="1"> * Errors in automatic rule acquisition * Translation variety in corpora - The acquisition process cannot generalize the rules because bilingual sentences depend on the context or the situation.</Paragraph> <Paragraph position="2"> - Corpora contain multiple (paraphrasable) translations of the same source expression. In the experiment of Imamura (2002), about . Most of these rules are low-frequency. That study reported that MT quality improved slightly even after low-frequency rules were removed, which reduced the rule set to about 1/9 of its previous size. However, since some low-frequency rules, such as idiomatic rules, are necessary for translation, removing low-frequency rules alone cannot dramatically improve MT quality.</Paragraph> </Section> </Section> </Paper>