<?xml version="1.0" standalone="yes"?> <Paper uid="N04-3010"> <Title>A THAI SPEECH TRANSLATION SYSTEM FOR MEDICAL DIALOGS</Title> <Section position="3" start_page="0" end_page="0" type="intro"> <SectionTitle> 3. Machine Translation </SectionTitle> <Paragraph position="0"> The Machine Translation (MT) component of our current Thai system is based on an interlingua called the Interchange Format (IF). The IF, developed by CMU, has been expanded and now encompasses concepts in both the travel and medical domains, as well as many general-use or cross-domain concepts in many different languages [4].</Paragraph> <Paragraph position="1"> Interlingua-based MT has several advantages, namely: (1) it abstracts away from variations in syntax across languages, providing potentially deep analysis of meaning without relying on information pertinent only to one particular language pair, (2) modules for analysis and generation can be developed monolingually, with additional reference only to the second &quot;language&quot; of the interlingua, (3) the speaker can be given a paraphrase in his or her own language, which can help verify the accuracy of the analysis and be used to alert the listener to inaccurate translations, and (4) translation systems can be extended to new languages simply by hooking up new monolingual modules for analysis and/or generation, eliminating the need to develop a completely new system for each new language pair.</Paragraph> <Paragraph position="2"> Thai has some particular characteristics, which we addressed in the IF and which appear in the grammars as follows: For natural language generation from interlingua for Thai and English, we are currently investigating two options: knowledge-based generation with the pseudo-unification-based GenKit generator developed at CMU, which employs manually written semantic/syntactic grammars and lexicons, and statistical generation operating on a training corpus of aligned interlingua and natural language correspondences.
Performance tests, as well as the amount and quality of training data, will decide which approach is pursued in the future.</Paragraph> </Section> </Paper>