File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/concl/04/w04-1111_concl.xml

Size: 1,946 bytes

Last Modified: 2025-10-06 13:54:15

<?xml version="1.0" standalone="yes"?>
<Paper uid="W04-1111">
  <Title>A Statistical Model for Hangeul-Hanja Conversion in Terminology Domain</Title>
  <Section position="7" start_page="0" end_page="0" type="concl">
    <SectionTitle>
6 Conclusion
</SectionTitle>
    <Paragraph position="0"> This paper proposes a sentence-based statistical model for Hangeul-Hanja conversion in Korean. The model provides a unified approach to the whole conversion process, which includes word tokenization, sino-Korean word recognition, and selection of the correct Hanja correspondence. A series of experiments was carried out on issues in the model and the system implementation.</Paragraph>
    <Paragraph position="1"> These experiments cover adapting the model to the character level or the word level, the influence of the TM weight, and the effect of different POS tag constraints on sino-Korean word recognition.</Paragraph>
    <Paragraph position="2"> The experiments show that the best result is achieved with a character-based TM that uses both dictionary and user data. The best character accuracy in the computer science and electronic engineering terminology domain is 91.4%, which is even better than the draft result from an untrained human translator.</Paragraph>
    <Paragraph position="3"> This paper also uses several different evaluation standards to see which is the most suitable. We found that word/term accuracy and word-based precision/recall reflect user readability well, while character accuracy is more suitable for evaluating system performance in full detail.</Paragraph>
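As a rough illustration of how these measures differ, the following minimal sketch computes a character accuracy and a word-based precision/recall for one converted term. The function names, the bag-of-words matching, and the sample strings are assumptions made for illustration only; the exact metric definitions used in the experiments are not given in this section.

```python
from collections import Counter


def character_accuracy(reference, hypothesis):
    """Share of positions where the output character equals the reference.

    Assumption: both strings have the same length, since each Hangeul
    syllable corresponds to exactly one output character.
    """
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have equal length")
    if not reference:
        return 0.0
    matches = sum(1 for r, h in zip(reference, hypothesis) if r == h)
    return matches / len(reference)


def word_precision_recall(reference_words, hypothesis_words):
    """Bag-of-words precision and recall over converted words (hypothetical definition)."""
    ref = Counter(reference_words)
    hyp = Counter(hypothesis_words)
    # Count each word at most as often as it appears on both sides.
    overlap = sum(min(count, hyp[word]) for word, count in ref.items())
    precision = overlap / sum(hyp.values()) if hyp else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    return precision, recall


# Invented sample: the last character of the second word is wrong.
print(character_accuracy("情報處理", "情報處里"))
print(word_precision_recall(["情報", "處理"], ["情報", "處里"]))
```

In this invented sample a single wrong character leaves three of four characters correct but makes one of the two words wrong, which is why the word-level figures track what a reader perceives while the character-level figure gives a finer-grained view of system behaviour.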
    <Paragraph position="4"> We are doing further research on the general domain, especially on utilizing the concept hierarchy of a thesaurus to solve the data sparseness problem. We are also considering using a Japanese corpus for Hangeul-Hanja conversion, because the Kanji in Japanese overlap to some extent with the Hanja in Korean.</Paragraph>
  </Section>
</Paper>