<?xml version="1.0" standalone="yes"?>
<Paper uid="P98-2167">
  <Title>Machine Aided Error-Correction Environment for Korean Morphological Analysis and Part-of-Speech Tagging</Title>
  <Section position="4" start_page="1015" end_page="1016" type="metho">
    <SectionTitle>
3 Proposed Model
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="1015" end_page="1015" type="sub_section">
      <SectionTitle>
3.1 The Causes of Part-of-Speech Tagging Error
</SectionTitle>
      <Paragraph position="0"> We identify three main causes of POS tagging errors. The first is low accuracy on unknown words: assigning the most likely tag to an unknown word cannot be expected to give a good result. Second, the linguistic information reflects only morpheme concatenation, as mentioned in the previous section; the complex morphological characteristics of Korean make such errors especially common. Third, ambiguities of meaning cannot be resolved, since the tagger cannot distinguish them at the morphological level.</Paragraph>
    </Section>
    <Section position="2" start_page="1015" end_page="1015" type="sub_section">
      <SectionTitle>
3.2 Processing Unknown Words
</SectionTitle>
      <Paragraph position="0"> Some tagging errors come from unknown words, that is, words that have no entry in the dictionary. If at least one candidate morphological analysis yields a sequence of morphemes that are all registered in the dictionary, the unknown-word identification routine is not invoked, even if other candidate analyses contain an unknown word.</Paragraph>
      <Paragraph position="1"> If no candidate analysis succeeds, the system suggests possible POS-tagged unknown words.</Paragraph>
      <Paragraph position="2"> In our system, if the morphological analyzer cannot find an analysis in which all morphemes are in the dictionary, the word is assumed to contain an unknown word. The user then adds any such unknown words to the dictionary with the dictionary manager, and the morphological analyzer is called once again. Because the user adds the identified unknown words to the dictionary, morphological overanalysis can be avoided.</Paragraph>
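      <Paragraph position="3"> The unknown-word check described above can be sketched as follows. This is a minimal illustration in Python under stated assumptions, not the authors' implementation: it assumes a toy analyzer that treats every split of a word into contiguous pieces as a candidate analysis, and a dictionary that is a plain set of strings. The function names segmentations, has_known_analysis and find_unknown_words are hypothetical, and the romanized entries are illustrative only.
```python
def segmentations(word):
    """Yield every split of `word` into contiguous, non-empty pieces.

    Assumption: a real morphological analyzer would propose far fewer
    candidates; exhaustive splitting is exponential and only for illustration.
    """
    if not word:
        yield []
        return
    for i in range(1, len(word) + 1):
        for rest in segmentations(word[i:]):
            yield [word[:i]] + rest


def has_known_analysis(word, dictionary):
    """True if at least one candidate analysis uses only dictionary entries."""
    return any(
        all(piece in dictionary for piece in candidate)
        for candidate in segmentations(word)
    )


def find_unknown_words(words, dictionary):
    """Return the words for which no candidate analysis fully succeeds."""
    return [w for w in words if not has_known_analysis(w, dictionary)]


# Illustrative use: the user adds the reported word, then analysis is rerun.
dictionary = {"saram", "i"}
unknown = find_unknown_words(["sarami", "haggyo"], dictionary)
dictionary.update(unknown)  # stands in for the dictionary manager
remaining = find_unknown_words(["sarami", "haggyo"], dictionary)
```
 In this sketch a word is reported as unknown only when every candidate analysis fails, which mirrors the condition stated in the text; rerunning the check after the dictionary update corresponds to calling the analyzer once again.</Paragraph>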
    </Section>
    <Section position="3" start_page="1015" end_page="1016" type="sub_section">
      <SectionTitle>
3.3 Correction of Errors
</SectionTitle>
      <Paragraph position="0"> The output of any tagger will contain errors, and correcting them by hand is costly. Hence, it is helpful to use a system that finds tagging errors and corrects them. In the proposed model, error correction is defined as suggesting candidate tags to the user and finding words that are likely to be wrongly tagged. Correction rules and a manual correction log are needed for automatic error detection and candidate suggestion. The rule-based method finds wrong tags by exact match against predefined pairs of rule and suggestion. The correction rules have the form: (&lt;current morpheme&gt; &lt;current tag&gt;)* / position of wrong morpheme or tag / corrected morpheme or tag, where * means repetition. Four kinds of operators can be used in the current morpheme or tag.</Paragraph>
      <Paragraph position="1">  * Don't Care(*) permits a match with any morpheme or tag. To replace every tag α after a noun with tag β, the rule '* &lt;noun&gt; * &lt;α&gt; / 4 / &lt;β&gt;' is used.</Paragraph>
      <Paragraph position="2"> * Or(|) matches any one of the alternative expressions. To replace every tag α after a common or proper noun with tag β, the rule '* &lt;noun&gt; | &lt;propernoun&gt; * &lt;α&gt; / 4 / &lt;β&gt;' is used.</Paragraph>
      <Paragraph position="3"> * Closure(+) matches only the content before &quot;+&quot;. To replace every tag α after a common noun (tagged as 'ncn', 'ncpa', or 'ncps') with tag β, the rule '* nc+ * &lt;α&gt; / 4 / &lt;β&gt;' is sufficient.</Paragraph>
      <Paragraph position="4"> * Not(!) matches anything except the expression following &quot;!&quot;. If we replace all the tags except α after a noun with tag α, the rule</Paragraph>
      <Paragraph position="6"> For example, the following rule replaces every tag 'jcs' before the word &quot;되다(doeda)&quot; with 'jcc'.</Paragraph>
      <Paragraph position="7"> '* jcs 되(doe) pvg / 2 / jcc' The other method uses the manual correction log. Errors that are not detected by the correction rules must be corrected by a human tagger. The result of each correction is recorded for later use. A manual log entry consists of an error part and a suggestion part. For example, when we change &quot;다운(da'un)/ncpa&quot; to &quot;답(dab)/xsm+ㄴ(n)/etm&quot;, the entry will be 'da'un/ncpa, dab/xsm+n/etm'. The entry also applies to augmented cases, such as '사람(saram)/ncn+da'un/ncpa' and '학교(hag'gyo)/ncn+da'un/ncpa'.</Paragraph>
      <Paragraph position="8"> A correction rule can apply to many kinds of word phrase, whereas a manual log entry concerns only one instance of a word phrase. With the manual correction log, many repetitive errors in a document can be remedied.</Paragraph>
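      <Paragraph position="9"> The two correction mechanisms can be contrasted in a short sketch. The Python code below is an illustration under stated assumptions, not the authors' implementation: a rule is assumed to be a flat list of alternating morpheme and tag patterns plus a 1-based position and a replacement, the wildcard is written as an asterisk, and an analysis is a list of (morpheme, tag) pairs or a string of morpheme/tag units joined by a plus sign. The names element_matches, apply_rule and apply_log, the operator precedence, and the morpheme "i" are hypothetical.
```python
def element_matches(pattern, value):
    """Match one morpheme or tag against one pattern element.

    Assumed operator reading: "*" is don't care, "!x" is not x,
    "a|b" is a or b, and "nc+" is any value that begins with "nc".
    """
    if pattern == "*":
        return True
    if pattern.startswith("!"):
        return not element_matches(pattern[1:], value)
    if "|" in pattern:
        return any(element_matches(p, value) for p in pattern.split("|"))
    if pattern.endswith("+"):
        return value.startswith(pattern[:-1])
    return pattern == value


def apply_rule(pairs, pattern, position, replacement):
    """Rewrite the element at `position` wherever `pattern` matches."""
    flat = [item for pair in pairs for item in pair]
    width = len(pattern)
    # Step by 2 so that a match always starts on a morpheme, not on a tag.
    for start in range(0, len(flat) - width + 1, 2):
        window = flat[start:start + width]
        if all(element_matches(p, v) for p, v in zip(pattern, window)):
            flat[start + position - 1] = replacement
    return list(zip(flat[0::2], flat[1::2]))


def apply_log(analysis, log):
    """Replace each logged error part by its suggestion part."""
    units = analysis.split("+")
    for wrong, fixed in log:
        wrong_units, fixed_units = wrong.split("+"), fixed.split("+")
        out, i = [], 0
        while i != len(units):
            if units[i:i + len(wrong_units)] == wrong_units:
                out.extend(fixed_units)
                i += len(wrong_units)
            else:
                out.append(units[i])
                i += 1
        units = out
    return "+".join(units)


# The rule from the text: change 'jcs' to 'jcc' before doe/pvg.
tagged = [("saram", "ncn"), ("i", "jcs"), ("doe", "pvg")]
fixed = apply_rule(tagged, ["*", "jcs", "doe", "pvg"], 2, "jcc")

# The log entry from the text, applied to an augmented case.
log = [("da'un/ncpa", "dab/xsm+n/etm")]
repaired = apply_log("saram/ncn+da'un/ncpa", log)
```
 The rule matches any morpheme carrying the tag 'jcs' in the given context, while the log entry only rewrites the one logged analysis wherever it recurs, which is the difference in scope described in the text.</Paragraph>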
    </Section>
  </Section>
  <Section position="5" start_page="1016" end_page="1017" type="metho">
    <SectionTitle>
4 Implementation
</SectionTitle>
    <Paragraph position="0"> We have implemented an error-correction environment that provides the human tagger with an interactive and efficient tagging environment.</Paragraph>
    <Paragraph position="1"> The overall structure of our environment is shown in Figure 2.</Paragraph>
    <Paragraph position="2"> The process of making POS-tagged documents in this environment is as follows:  1. Identify unknown words through morphological analysis.</Paragraph>
    <Paragraph position="3"> 2. Add each unknown word to the dictionary. 3. Repeat morphological analysis with the updated dictionary until no more unknown words are found.</Paragraph>
    <Paragraph position="4"> 4. Run automatic POS tagging.</Paragraph>
    <Paragraph position="5"> 5. Detect unknown-word errors and suggest a correct candidate word.</Paragraph>
    <Paragraph position="6"> 6. Act according to the reaction of the human tagger: apply or discard the suggested modification, or accept direct input from the human tagger.</Paragraph>
    <Paragraph position="7"> 7. Repeat steps 5 and 6 with automatic error correction using rules and correction logs, so that tagging accuracy improves incrementally.</Paragraph>
    <Paragraph position="8"> 8. Manually correct any errors that were not detected.</Paragraph>
    <Paragraph position="9"> 9. Save what the human tagger corrected at step 8, then detect errors and give suggestions on the POS-tagged document again using the manual log.</Paragraph>
    <Paragraph position="10"> 10. If an unknown word exists in the result from step 9, save the result in the dictionary; otherwise, add it to the manual log.</Paragraph>
    <Paragraph position="11"> 11. Repeat steps 8 and 10 until the human tagger finds no error in the POS-tagged document. Figure 3 shows the Tagging Workbench.</Paragraph>
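    <Paragraph position="12"> The interaction in steps 5, 6 and 9 can be sketched as a small review loop. The Python code below is a simplified illustration, not the workbench itself: it assumes suggestions are given as a mapping from word index to candidate analysis, and that the human tagger is modelled by a callback that returns True to approve, False to reject, or a string as direct input. The names review_suggestions and ask are hypothetical, and the sketch records every direct input in the manual log without the dictionary branch of step 10.
```python
def review_suggestions(tagged, suggestions, ask):
    """Apply approved suggestions and log the tagger's direct corrections.

    tagged:      list of analyses, one string per word
    suggestions: dict mapping a word index to a candidate analysis
    ask:         callback(current, candidate) returning True, False,
                 or a replacement string typed by the human tagger
    """
    result = list(tagged)  # leave the caller's list untouched
    manual_log = []
    for index in sorted(suggestions):
        current, candidate = result[index], suggestions[index]
        answer = ask(current, candidate)
        if answer is True:
            result[index] = candidate  # suggestion approved
        elif isinstance(answer, str):
            manual_log.append((current, answer))  # saved for the next pass
            result[index] = answer
        # Any other answer (for example False) keeps the current analysis.
    return result, manual_log


# Illustrative use with a scripted tagger instead of a real user.
scripted = iter([True, "dab/xsm+n/etm"])
corrected, manual_log = review_suggestions(
    ["i/jcs", "da'un/ncpa"],
    {0: "i/jcc", 1: "da'un/xsm"},
    lambda current, candidate: next(scripted),
)
```
 Entries collected in the log pair the erroneous analysis with the tagger's correction, so they can be reused as suggestions on the next pass over the document.</Paragraph>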
  </Section>
</Paper>