File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/abstr/98/w98-1110_abstr.xml

Size: 1,333 bytes

Last Modified: 2025-10-06 13:49:33

<?xml version="1.0" standalone="yes"?>
<Paper uid="W98-1110">
  <Title>Generalized unknown morpheme guessing for hybrid POS tagging of Korean</Title>
  <Section position="1" start_page="0" end_page="0" type="abstr">
    <SectionTitle>
Abstract
</SectionTitle>
    <Paragraph position="0"> Most errors in Korean morphological analysis and POS (Part-of-Speech) tagging are caused by unknown morphemes. This paper presents a generalized method for handling unknown morphemes in POSTAG (POStech TAGger), a hybrid statistical/rule-based POS tagging system. Generalized unknown morpheme guessing combines a morpheme pattern dictionary, which encodes general lexical patterns of Korean morphemes, with a posteriori syllable tri-gram estimation.</Paragraph>
    <Paragraph position="1"> The syllable tri-grams help to calculate the lexical probabilities of unknown morphemes and are used to search for the best tagging result.</Paragraph>
    <Paragraph position="2"> In our scheme, we can guess the POS tags of unknown morphemes regardless of their number and position in an eojeol, which was not possible in previous Korean tagging systems. In a series of experiments on corpora from three different domains, we achieve 97% tagging accuracy even though the test corpora contain many unknown morphemes.</Paragraph>
  </Section>
</Paper>