<?xml version="1.0" standalone="yes"?>
<Paper uid="C04-1103">
  <Title>Direct Orthographical Mapping for Machine Transliteration</Title>
  <Section position="9" start_page="111" end_page="111" type="concl">
    <SectionTitle>
5 Conclusions
</SectionTitle>
    <Paragraph position="0"> In this paper, we propose a new framework, direct orthographical mapping (DOM), for machine transliteration and back-transliteration. Under the DOM framework, we further propose a joint source-channel transliteration model, also called the n-gram TM, and we implement the NCM model under DOM for reference. We use the EM algorithm as an unsupervised training approach to train both the n-gram TM and the NCM. The proposed methods are tested on an English-Chinese name corpus and on English-Japanese katakana word pairs extracted from the EDICT dictionary. The data-driven, one-step mapping strategy greatly reduces the development effort of machine transliteration systems and improves accuracy significantly over earlier reported results. We also find that back-transliteration is more challenging than transliteration.</Paragraph>
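As a rough illustration of the joint source-channel idea (not the paper's implementation), a bigram n-gram TM scores an aligned transliteration as a product of n-gram probabilities over joint source-target pairs. All pair strings and probability values below are invented for the example:

```python
from math import log

# The modeling unit of the n-gram TM is a joint (source, target) pair.
# "^" marks the sequence start (an assumed convention, not the paper's).
BOS = ("^", "^")

def ngram_tm_logprob(pairs, bigram_prob, floor=1e-8):
    """Log P(E, C) under a bigram joint source-channel TM:
    P(E, C) is approximated by the product over k of
    P(pair_k | pair_{k-1}); unseen bigrams get a small floor."""
    score = 0.0
    prev = BOS
    for pair in pairs:
        score += log(bigram_prob.get((prev, pair), floor))
        prev = pair
    return score

# Hypothetical alignment for "Smith" -> "史密斯" (illustrative values).
p1, p2, p3 = ("s", "史"), ("mi", "密"), ("th", "斯")
bigram_prob = {(BOS, p1): 0.4, (p1, p2): 0.5, (p2, p3): 0.6}
good = ngram_tm_logprob([p1, p2, p3], bigram_prob)
bad = ngram_tm_logprob([p2, p1, p3], bigram_prob)  # implausible order
```

Because the units are joint pairs, a single n-gram over them couples the transformation rules and the target-language context in one model.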
    <Paragraph position="1"> The DOM framework demonstrates several unique advantages over the phoneme-based approach: 1) by skipping the intermediate phonemic interpretation, the transliteration error rate is reduced significantly; 2) transliteration models under DOM are data-driven, so, given a sufficient training corpus, the modeling approach applies to different language pairs; 3) DOM presents a paradigm shift for machine transliteration and provides a platform for implementing many other transliteration models. The n-gram TM is a successful implementation of the DOM framework for the following reasons: 1) the n-gram TM captures contextual information in both the source and target languages jointly; unlike the phoneme-based approach, the modeling of transformation rules and of the target language is tightly coupled in the n-gram TM.</Paragraph>
    <Paragraph position="2"> 2) Since the n-gram TM uses the transliteration pair as its modeling unit, the same model applies to bi-directional transliteration. 3) The bilingual alignment process is integrated into the decoding process in the n-gram TM, which allows us to jointly optimize alignment and transliteration automatically. Hence manual pre-alignment is unnecessary.</Paragraph>
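The joint optimization of alignment and transliteration can be sketched as a dynamic program over segmentations of the source string (a minimal illustration under assumed unigram pair probabilities, not the paper's decoder; a full n-gram TM would condition each pair on its predecessors):

```python
def decode(source, pair_prob, max_chunk=3):
    """Return (best_score, target) maximizing the product of unigram
    pair probabilities over all segmentations of `source`, so the
    alignment (segmentation) and the output are searched jointly."""
    n = len(source)
    best = [(0.0, "")] * (n + 1)   # (score, target prefix); 0.0 = unreachable
    best[0] = (1.0, "")
    for i in range(n):
        score_i, tgt_i = best[i]
        if score_i == 0.0:
            continue
        for j in range(i + 1, min(i + max_chunk, n) + 1):
            chunk = source[i:j]
            for tgt, p in pair_prob.get(chunk, {}).items():
                cand = (score_i * p, tgt_i + tgt)
                if cand[0] > best[j][0]:
                    best[j] = cand
    return best[n]

# Hypothetical pair inventory with illustrative probabilities.
pair_prob = {
    "s": {"史": 0.3},
    "mi": {"密": 0.4},
    "th": {"斯": 0.5},
    "smi": {"史密": 0.1},
}
score, target = decode("smith", pair_prob)
```

Here the segmentation s / mi / th (probability 0.3 x 0.4 x 0.5 = 0.06) beats smi / th (0.1 x 0.5 = 0.05), so no manually pre-aligned training pairs are needed at decoding time.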
    <Paragraph position="3"> Named entities are sometimes translated using a combination of transliteration and translation of meaning. As the proposed framework allows direct orthographical mapping, we are extending our approach to handle such name translation. We are also extending our method to handle the disorder and fertility issues in named entity translation.</Paragraph>
  </Section>
</Paper>