<?xml version="1.0" standalone="yes"?>
<Paper uid="C00-2165">
  <Title>KCAT : A Korean Corpus Annotating Tool Minimizing Human Intervention</Title>
  <Section position="2" start_page="0" end_page="1096" type="intro">
    <SectionTitle>
1. Introduction
</SectionTitle>
    <Paragraph position="0"> The POS annotated corpora are very important as a resource of useful information for natural language processing. A problem for corpus annotation is the trade-off between efficiency and accuracy.</Paragraph>
    <Paragraph position="1"> Although manual POS tagging is very reliable, it is labor intensive, and it is hard to produce a consistent POS-tagged corpus this way. On the other hand, automatic tagging is prone to errors for infrequently occurring words due to the lack of overall linguistic information. At present, it is almost impossible to construct a highly accurate corpus by using an automatic tagger alone.</Paragraph>
    <Paragraph position="2"> As a consequence, a semi-automatic tagging method is proposed for corpus annotation. In such a method, an automatic tagger tags each word and human experts correct the mis-tagged words in the post-editing step. But in the post-editing step, as the human expert cannot know which word has been annotated incorrectly, he must check every word in the whole corpus. And he must do the same work again and again for the same words in the same context. This situation causes as much labor-intensive work as manual tagging. In this paper, we propose a semi-automatic tagging method that can reduce the human labor and guarantee consistent tagging.</Paragraph>
    <Section position="1" start_page="0" end_page="1096" type="sub_section">
      <SectionTitle>
2. System Requirements
</SectionTitle>
      <Paragraph position="0"> To develop an efficient tool that attempts to build a large, accurately annotated corpus with minimal human labor, we must consider the following requirements: * In order to minimize human labor, the same human intervention to tag and to correct the same word in the same context should not be repeated.</Paragraph>
      <Paragraph position="1"> * There may be a word which was tagged inconsistently in the same context because it was tagged by different human experts or at a different task time. An efficient tool should prevent such inconsistency and guarantee the consistency of the annotated results.</Paragraph>
      <Paragraph position="2"> * It must provide an effective annotating capability for many unknown words in the whole corpus.</Paragraph>
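      <Paragraph position="3"> The first two requirements can be illustrated with a minimal sketch, which is not the KCAT implementation. It assumes that "the same context" means the immediately preceding and following words, and that a human correction is stored once and then reapplied automatically. The class name CorrectionMemory, its methods, the boundary markers BOS and EOS, and the English tag labels are all hypothetical choices made for illustration.

```python
# Hypothetical sketch: none of these names come from the paper.
# Assumption: "context" = the adjacent words on each side.


class CorrectionMemory:
    """Remembers human tag corrections keyed by a word and its neighbours."""

    def __init__(self):
        # (previous word, word, next word) maps to the human-chosen tag
        self._memory = {}

    @staticmethod
    def _key(words, i):
        # Sentence boundaries get placeholder markers (an assumption).
        prev_word = words[i - 1] if i != 0 else "BOS"
        next_word = words[i + 1] if i + 1 != len(words) else "EOS"
        return (prev_word, words[i], next_word)

    def record(self, words, i, tag):
        """Store the tag a human expert assigned to words[i]."""
        self._memory[self._key(words, i)] = tag

    def apply(self, words, tags):
        """Reapply remembered corrections to automatically assigned tags.

        Returns the corrected tags and the indices that have no
        remembered decision and so still need human review.
        """
        corrected = list(tags)
        pending = []
        for i in range(len(words)):
            remembered = self._memory.get(self._key(words, i))
            if remembered is None:
                pending.append(i)
            else:
                corrected[i] = remembered
        return corrected, pending


memory = CorrectionMemory()
# A human expert corrects "can" once in this context.
memory.record(["the", "can", "rusted"], 1, "NOUN")
# The same word in the same context is later fixed without intervention.
tags, pending = memory.apply(["the", "can", "rusted"], ["DET", "VERB", "VERB"])
print(tags, pending)
```

Because every stored decision is reapplied identically, a word in a given context receives the same tag no matter which expert first corrected it or when, which is the consistency property the second requirement asks for. The third requirement, unknown words, is outside the scope of this sketch.</Paragraph>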
    </Section>
  </Section>
</Paper>