<?xml version="1.0" standalone="yes"?>
<Paper uid="I05-2034">
  <Title>Probabilistic Models for Korean Morphological Analysis</Title>
  <Section position="7" start_page="199" end_page="201" type="evalu">
    <SectionTitle>
4 Experiments
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="199" end_page="200" type="sub_section">
      <SectionTitle>
4.1 Experimental environment
</SectionTitle>
      <Paragraph position="0"> For evaluation, we use three data sets with different tag sets and annotation guidelines: the ETRI POS-tagged corpus, the KAIST POS-tagged corpus, and the Sejong POS-tagged corpus. All experiments were performed using 10-fold cross-validation.</Paragraph>
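      <Paragraph> The 10-fold cross-validation protocol can be sketched as follows. This is a minimal illustration only: the function name k_fold_splits, the round-robin assignment of sentences to folds, and the stand-in corpus are assumptions, since this section does not describe how the folds were actually drawn.

```python
from typing import Iterator, List, Sequence, Tuple


def k_fold_splits(
    sentences: Sequence[str], k: int = 10
) -> Iterator[Tuple[List[str], List[str]]]:
    """Yield (train, test) pairs; each sentence is held out in exactly one fold."""
    if k not in range(1, len(sentences) + 1):
        raise ValueError("k must be between 1 and the number of sentences")
    for fold in range(k):
        # Assumption: sentence i is held out in fold (i mod k).
        test = [s for i, s in enumerate(sentences) if i % k == fold]
        train = [s for i, s in enumerate(sentences) if i % k != fold]
        yield train, test


# Hypothetical stand-in corpus; a real run would use the tagged sentences.
corpus = [f"sentence-{i}" for i in range(25)]
folds = list(k_fold_splits(corpus, k=10))
```

Each measure would then be computed on the held-out part of every fold and averaged over the ten folds.</Paragraph>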
      <Paragraph position="1">  In this paper, we use the following measures to evaluate the system:  Answer inclusion rate (AIR) is defined as the number of Eojeols whose results contain the gold standard, divided by the total number of Eojeols in the test data.
        Example Eojeols with their morpheme-level and syllable-level tags:
        Eojeol        | na-neun 'I'   | hag-gyo-e 'to school' | gan-da 'go'
        Tagged Eojeol | na/np+neun/jx | hag-gyo/nc+e/jc       | ga/pv+n-da/ef
        Morpheme      | na neun       | hag-gyo e             | ga n-da
        Morpheme tag  | np jx         | nc jc                 | pv ef
        Syllable      | na neun       | hag gyo e             | ga n da
        Syllable tag  | B-np B-jx     | B-nc I-nc B-jc        | B-pv B-ef I-ef
      </Paragraph>
      <Paragraph position="2"> Average ambiguity (AA) is defined as the average number of results that the system returns per Eojeol.</Paragraph>
      <Paragraph position="3"> Failure rate (FR) is defined as the number of Eojeols for which no output is produced, divided by the number of Eojeols in the test data.</Paragraph>
      <Paragraph position="4"> 1-best tagging accuracy (1A) is defined as the number of Eojeols whose single highest-probability interpretation matches the gold standard, divided by the total number of Eojeols in the test data.</Paragraph>
      <Paragraph position="5"> There is a trade-off between AIR and AA. If a system outputs many results, they are likely to include the correct answer, but the ambiguity increases, and vice versa. The higher the AIR, the better the system. The AIR can serve as an upper bound on the accuracy of POS taggers. In contrast to AIR, the lower the AA, the better the system. A low AA reduces the burden on the disambiguation process of the POS tagger. 1A is not commonly used as an evaluation measure for morphological analysis because previous systems do not rank their results, but ProKOMA can be evaluated by this measure because it provides probabilities for its results. This measure can also serve as a baseline for POS tagging.</Paragraph>
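      <Paragraph> The four measures can be computed as in the following minimal sketch. The function name evaluate, the data layout (one list of scored analyses per Eojeol), and the sample analyses with their probabilities are illustrative assumptions, not ProKOMA output. The sketch also assumes that AA is averaged over all Eojeols, including those with no output, which the definition above leaves open.

```python
from typing import Dict, List, Tuple

# One analysis is a tagged string such as "na/np+neun/jx" plus its probability.
Analysis = Tuple[str, float]


def evaluate(outputs: List[List[Analysis]], gold: List[str]) -> Dict[str, float]:
    """Compute AIR, AA, FR and 1A over parallel lists of Eojeols."""
    if len(outputs) != len(gold) or not gold:
        raise ValueError("outputs and gold must be non-empty and aligned")
    total = len(gold)
    included = failed = best_correct = result_count = 0
    for candidates, answer in zip(outputs, gold):
        result_count += len(candidates)
        if not candidates:
            failed += 1  # no analysis produced: counts toward FR
            continue
        if any(tagged == answer for tagged, _ in candidates):
            included += 1  # gold analysis is among the results: AIR
        best = max(candidates, key=lambda c: c[1])[0]
        if best == answer:
            best_correct += 1  # most probable analysis is the gold one: 1A
    return {
        "AIR": included / total,
        "AA": result_count / total,  # assumption: averaged over all Eojeols
        "FR": failed / total,
        "1A": best_correct / total,
    }


# Made-up system output for the Eojeols "na-neun hag-gyo-e gan-da".
gold = ["na/np+neun/jx", "hag-gyo/nc+e/jc", "ga/pv+n-da/ef"]
outputs = [
    [("na/np+neun/jx", 0.7), ("nal/pv+neun/etm", 0.3)],  # gold ranked first
    [("hag-gyo/nc+e/jx", 0.6), ("hag-gyo/nc+e/jc", 0.4)],  # gold ranked second
    [],  # analysis failed
]
scores = evaluate(outputs, gold)
```

In this toy input the second Eojeol contributes to AIR but not to 1A, which is why AIR bounds 1A from above, and dropping its lower-ranked candidate would lower AA at the cost of AIR.</Paragraph>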
    </Section>
    <Section position="2" start_page="200" end_page="201" type="sub_section">
      <SectionTitle>
4.2 Experimental results
</SectionTitle>
      <Paragraph position="0"> To investigate the performance and effectiveness of the three models, we conducted several tests with different combinations of the models, and ran each test on all three corpora. The results of the experiments are listed in Table 5. In the table, &quot;E&quot;, &quot;M&quot;, and &quot;S&quot; denote the Eojeol-unit analysis, the morpheme-unit analysis, and the syllable-unit analysis, respectively. Columns with more than one symbol indicate that the corresponding models are applied sequentially.</Paragraph>
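      <Paragraph> One way to read a combined column such as "EMS" is sketched below. The sketch assumes that "sequentially" means a later model is consulted only when the earlier ones return no result; this section does not spell out the combination rule, so treat that as an assumption. The three analyzer functions and their tiny lexicons are toy stand-ins, not ProKOMA's models.

```python
from typing import Callable, List, Sequence

Analyzer = Callable[[str], List[str]]


def analyze_sequentially(eojeol: str, analyzers: Sequence[Analyzer]) -> List[str]:
    """Return the results of the first analyzer that produces any output."""
    for analyzer in analyzers:
        results = analyzer(eojeol)
        if results:
            return results
    return []  # every model failed, so this Eojeol would count toward FR


# Toy stand-ins for the three models (hypothetical lexicons and tags).
def eojeol_unit(eojeol: str) -> List[str]:
    lexicon = {"na-neun": ["na/np+neun/jx"]}  # low coverage, few results
    return lexicon.get(eojeol, [])


def morpheme_unit(eojeol: str) -> List[str]:
    lexicon = {
        "na-neun": ["na/np+neun/jx", "nal/pv+neun/etm"],
        "hag-gyo-e": ["hag-gyo/nc+e/jc"],
    }
    return lexicon.get(eojeol, [])


def syllable_unit(eojeol: str) -> List[str]:
    return [f"{eojeol}/nc"]  # always produces some guess


ems = [eojeol_unit, morpheme_unit, syllable_unit]
first = analyze_sequentially("na-neun", ems)    # answered by the "E" stand-in
second = analyze_sequentially("hag-gyo-e", ems)  # falls through to "M"
third = analyze_sequentially("gan-da", ems)      # falls through to "S"
```

Under this reading, placing the Eojeol-unit model first returns fewer candidates for covered Eojeols, and appending the syllable-unit model leaves fewer Eojeols without any output.</Paragraph>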
      <Paragraph position="1"> According to the results, the single models differ significantly from one another, especially &quot;E&quot; and &quot;S&quot;. Because of the low coverage of the Eojeol-unit analysis, &quot;E&quot; shows the lowest AIR and the highest FR. However, it also shows the lowest AA because it produces a small number of results. In contrast, &quot;S&quot; shows the highest AA but the best AIR and FR, because it produces many results.</Paragraph>
      <Paragraph position="2"> Most previous systems use the morpheme as the processing unit for morphological analysis. We therefore examine the effectiveness of the proposed models based on the Eojeol and the syllable.</Paragraph>
      <Paragraph position="3"> First, compare the models that use the Eojeol-unit analysis with those that do not (&quot;M&quot; vs. &quot;EM&quot;, &quot;S&quot; vs. &quot;ES&quot;, and &quot;MS&quot; vs. &quot;EMS&quot;). When the Eojeol-unit analysis is applied, AA decreases, and AIR and 1A increase. Then, compare the models that use the syllable-unit analysis with those that do not (&quot;E&quot; vs. &quot;ES&quot;, &quot;M&quot; vs. &quot;MS&quot;, and &quot;EM&quot; vs. &quot;EMS&quot;). When the syllable-unit analysis is applied, AIR and 1A increase, and FR decreases. Therefore, both models are very useful compared with using the morpheme-unit model alone.</Paragraph>
      <Paragraph position="4"> Table 6 compares ProKOMA with two systems that participated in MATEC 99. In this evaluation, the ETRI corpus was used, and the test data contains 33,855 Eojeols. The evaluation data used in MATEC 99 and ours are not the same, but are close. As can be seen, the system of Lee et al. (1999) is better than ProKOMA in terms of AIR, but it generates too many results (a higher AA).
        Table 6 (values for the two MATEC 99 systems):
                                  | Lee et al. (1999) | Song et al. (1999)
        Answer inclusion rate (%) | 98                | 92
        Average ambiguity         | 4.13              | 1.75
      </Paragraph>
    </Section>
  </Section>
</Paper>