File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/abstr/04/w04-1111_abstr.xml

Size: 893 bytes

Last Modified: 2025-10-06 13:43:50

<?xml version="1.0" standalone="yes"?>
<Paper uid="W04-1111">
  <Title>A Statistical Model for Hangeul-Hanja Conversion in Terminology Domain</Title>
  <Section position="1" start_page="0" end_page="0" type="abstr">
    <SectionTitle>
Abstract
</SectionTitle>
    <Paragraph position="0"> Sino-Korean words, which were historically borrowed from Chinese, can be written in either Hanja (Chinese characters) or Hangeul (Korean characters). Existing Korean Input Method Editors (IMEs) provide only a simple dictionary-based approach to Hangeul-Hanja conversion. This paper presents a sentence-based statistical model for Hangeul-Hanja conversion in which word tokenization is included as a hidden process. The model reaches 91.4% character accuracy and 81.4% word accuracy in the terminology domain when only very limited Hanja data is available.</Paragraph>
  </Section>
</Paper>