<?xml version="1.0" standalone="yes"?> <Paper uid="N01-1019"> <Title>Information-based Machine Translation</Title> <Section position="6" start_page="0" end_page="0" type="concl"> <SectionTitle> 4. Implementation and Evaluation </SectionTitle> <Paragraph position="0"> A prototype implementation of this translation method has been created by the Sony USRL Speech Translation group (Franz et al. 200b). The prototype was developed for the &quot;overseas travel domain&quot;, which includes utterances and expressions useful for travel between countries such as Japan and the USA.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.1. Lexicon and Example Database </SectionTitle> <Paragraph position="0"> The English-to-Japanese translation system includes an English dictionary with 6,483 unique English root forms, and the English-to-Japanese example database contains 14,281 separate example pairs. These entries consist of constructions of various sizes, ranging from conjoined sentences to individual words. For some example pairs, the system automatically extracts corresponding parts from the source and target expressions and creates a new example pair. As a result, the system has a total of 24,072 example database entries available.</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.2. Development Set </SectionTitle> <Paragraph position="0"> We developed, tested, and refined the system until the main predicates of all 615 development set sentences with to have were translated correctly. To do so, the system used 129 distinct example pairs with the main verb to have.</Paragraph> <Paragraph position="1"> Many example pairs encode a specific translation: 68 of the 129 entries were used to translate only one expression from the development set. On the other hand, some entries are very general and are used to translate a large number of expressions. 
The most frequently used entry is Do you have sushi? → すしがありますか (sushi-ga arimasu-ka), which is used to translate 113 of the 615 development set expressions.</Paragraph> </Section> <Section position="3" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.3. Linguistic Transfer </SectionTitle> <Paragraph position="0"> The transfer grammar contains 153 context-free rules. Each rule includes a rule-body with GPL statements, which can include calls to the example matching procedure and calls to sub-transfer rules. To translate the 615 expressions in the to have development set, the system performed an average of 3.4 match-and-transfer steps. (In many cases, more than one transfer path was pursued.) Only 26 of the 615 expressions were translated with a single match-and-transfer step. Examples of such expressions include Have a good one! and You can have it. At the other extreme, the maximum number of match-and-transfer steps required to translate a single input expression was 9. One of the expressions that required 9 match-and-transfer steps was The double on the third floor has a really nice view of the ocean.</Paragraph> </Section> <Section position="4" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.4. Evaluation </SectionTitle> <Paragraph position="0"> The system was evaluated using a new corpus of unseen expressions with the verb to have. The evaluation data was collected from three different travel phrase books published by Barron, Berlitz, and Lonely Planet. The English expressions containing to have as a regular verb (and have got as a main predicate) were manually extracted from the phrase books. The resulting evaluation corpus contained 405 unique expressions with have, with an average length of 5.5 words. The evaluation corpus was translated by the translation system, and each of the output expressions was examined and manually categorized according to its translation quality. 
The result is shown in the table below: The category &quot;flawless translation&quot; refers to translations without any obvious flaws or problems. &quot;Incomplete translations due to OOV&quot; refers to translations where the main predicate was correctly translated, but some out-of-vocabulary (OOV) nouns or modifiers caused parts of the source-language input to be carried through to the target-language expression. The category &quot;wrong translation&quot; refers to translations where the main predicate is incorrectly translated, with or without out-of-vocabulary words.</Paragraph> </Section> <Section position="5" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.5. Discussion </SectionTitle> <Paragraph position="0"> Some of the wrong translations are due to ambiguities in the object noun phrase, such as a fall in My child has had a fall, which the system translated as watashi-no kodomo-wa aki-ga arimashita (meaning My child had an autumn).</Paragraph> <Paragraph position="1"> There were also a number of expressions that should have been translated into different predicates in Japanese, but which were not covered in the example database. Examples of these include the following: The evaluation shows that the information-based translation method works reliably for translating short, single-clause utterances. In support of the generality of this method, we found that translation accuracy could be improved by adding more examples, and that the features that mark specificity of example entries are applicable to expressions with other common verbs besides have.</Paragraph> <Paragraph position="2"> 4.6. Future Work One difficult problem remains in the treatment of support verb constructions. When the object has a modifier, the modifier has to be transferred as a verbal modifier in the target language if the target language requires a single verb construction. 
For example, to have a close look is translated as to look closely, and to have another look is translated as to look again. There are, however, not enough data in the development set to draw any conclusions about how generally these modifiers can be treated across different support verb constructions.</Paragraph> <Paragraph position="3"> One hypothesis is that there are different degrees of proximity between the support verb and the object noun phrase. In some cases, there might be only one fixed phrase to be interpreted as the support verb construction, while other cases may allow many different modifiers for the object noun phrase. This is suggested by the case of to have a seat in the development set. This phrase allows the interpretation of to sit only if the object noun phrase is exactly a seat. The expression to have another seat cannot be translated as to sit again, but rather as something like for another seat to exist. Further analysis of support verb construction data, including instances with other verbs besides have, will be necessary to determine how these constructions can best be handled in the current framework.</Paragraph> <Paragraph position="4"> Another avenue for future work is the use of machine learning techniques to select linguistic features, and statistical methods (such as log-linear models) to model the effect of feature combinations.</Paragraph> <Paragraph position="5"> Conclusion The approach described in this paper is based on the conviction that natural language transfer must be driven by qualitative, linguistic information. The analysis of the problem of translating one construction from English to Japanese has shown that a significant amount of linguistic information is necessary for achieving high-quality translation of something as simple as single-clause input. 
The transfer method described in this paper as one possible solution integrates translation examples with linguistic rules and constraints in an effective manner.</Paragraph> <Paragraph position="6"> The linguistic information used in this approach is general and domain-independent; domain-specific translation knowledge is confined to the example database. This modular system architecture presents significant advantages for developing, maintaining, and extending a practical machine translation system.</Paragraph> </Section> </Section> </Paper>