<?xml version="1.0" standalone="yes"?>
<Paper uid="C00-1061">
  <Title>English-to-Korean Transliteration using Multiple Unbounded Overlapping Phoneme Chunks</Title>
  <Section position="2" start_page="0" end_page="418" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> In Korean technical documents, many English words are used in their original forms. Sometimes, however, they are transliterated into Korean, and a single word can take several different forms. Examples (1) and (2) show various transliterations found in KTSET 2.0 (Park et al., 1996).</Paragraph>
    <Paragraph position="1"> (1) data (a) 데이타 (teyitha) [1,033] (b) 데이터 (teyithe) [527] (2) digital (a) 디지틀 (ticithul) [254] (b) 디지탈 (ticithal) [7] (c) 디지털 (ticithel) [6] The bracketed numbers are the frequencies in KTSET. These various transliterations are not negligible for natural language processing, especially in information retrieval. Because the same word is treated as several different words, calculations based on word frequency would produce misleading results. An experiment shows that the effectiveness of information retrieval increases when the various forms, including the English words, are treated equivalently (Jeong et al., 1997). We could use a dictionary to find a correct transliteration and its variations, but this is not feasible because transliterated words are usually technical terms and proper nouns, which are highly productive. Therefore an automatic transliteration system is needed to find transliterations without manual intervention.</Paragraph>
    <Paragraph position="2"> There have been some studies on E-K transliteration. They tried to explain transliteration as a phoneme-per-phoneme or alphabet-per-phoneme classification problem, and they restricted the information length to two or three units before and after an input unit. In fact, many linguistic phenomena involved in E-K transliteration are expressed in terms of units that exceed a phoneme or an alphabet. For example, 'a' in 'ace' is transliterated into "에이 (eyi)", but in 'acetic' into "어 (e)" and in 'acetone' into "아 (a)". If we restrict the information length to two alphabets, then we cannot explain these phenomena: in all three words 'a' is followed by the same two alphabets, "ce", so the three words get the same result for 'a'.</Paragraph>
    <Paragraph position="3"> (3) ace 에이스 (eyisu) (4) acetic 어시틱 (esithik) (5) acetone 아세톤 (aseython) In this paper, we propose an E-K transliteration model based on phoneme chunks that do not have a length limit and that can explain transliteration phenomena with some degree of reliability. The model treats transliteration not as an alphabet-per-alphabet but as a chunk-per-chunk classification problem.</Paragraph>
    <Paragraph position="4"> This paper is organized as follows. In section 2, we survey E-K transliteration. In section 3, we propose phoneme-chunk-based transliteration and back-transliteration. In section 4, the results of experiments are presented. Finally, the conclusion follows in section 5.</Paragraph>
  </Section>
</Paper>