<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-0122">
  <Title>Sydney, July 2006. (c) 2006 Association for Computational Linguistics. On Using Ensemble Methods for Chinese Named Entity Recognition</Title>
  <Section position="5" start_page="143" end_page="144" type="metho">
    <SectionTitle>
3 Experiments
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="143" end_page="143" type="sub_section">
      <SectionTitle>
3.1 Data
</SectionTitle>
      <Paragraph position="0"> We selected the City University of Hong Kong (CityU) and Microsoft Research (MSRA) corpora to evaluate our methods. CityU is a Traditional Chinese corpus, and MSRA is a Simplified Chinese corpus.</Paragraph>
    </Section>
    <Section position="2" start_page="143" end_page="144" type="sub_section">
      <SectionTitle>
3.2 Results
</SectionTitle>
      <Paragraph position="0"> Table 5 shows the results of several methods applied to the MSRA corpus. The memory-based ensemble method, which combines the results of a maximum entropy model with those of a CRF classifier, achieves the best performance. Majority voting over the results of three CRF models based on different feature sets performs worst.</Paragraph>
      <Paragraph position="1"> [Table 5: MSRA] The results obtained on CityU, presented in Table 6, show that the single CRF classifier achieves the best performance. None of the ensemble methods outperforms the non-ensemble methods. [...] based ensemble methods under different rules. We set the frequency threshold to 2 and the relative frequency threshold to 0.5. The results show that the relative frequency rule effectively reduces the loss of precision caused by the memory-based classifier tagging more entities. The memory-based ensemble method works well on the MSRA corpus, but not on the CityU corpus. On the MSRA corpus, the memory-based ensemble method outperforms the individual CRF model by approximately 0.4% in FB1. We found that the memory-based classifier cannot achieve better performance than the CRF model because it misclassifies many organization names. Therefore, we chose another strategy that restricts the memory-based classifier to tagging person names only. Under this restriction, the memory-based classifier improves FB1 by approximately 0.2%.</Paragraph>
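      <Paragraph position="2"> The two thresholds above can be illustrated with a minimal sketch. This section gives only the threshold values (2 and 0.5), so the rule definitions used here are assumptions: the frequency of a string is taken to be the number of times it was labelled as an entity in the training data, and its relative frequency is that count divided by the total number of times the string occurs. The helper name build_memory and the example counts are hypothetical.

```python
FREQ_THRESHOLD = 2        # frequency threshold stated in the text
REL_FREQ_THRESHOLD = 0.5  # relative frequency threshold stated in the text


def build_memory(tagged_counts, total_counts):
    """Return the strings that pass both rules (hypothetical helper).

    tagged_counts: times each string was labelled as an entity in training.
    total_counts:  times each string occurred in training at all.
    Both rule definitions are assumptions, not taken from the paper.
    """
    memory = set()
    for text, tagged in tagged_counts.items():
        if tagged >= FREQ_THRESHOLD:
            # Guard against inconsistent counts: total is never below tagged.
            total = max(total_counts.get(text, tagged), tagged)
            if tagged / total >= REL_FREQ_THRESHOLD:
                memory.add(text)
    return memory


# Invented counts: A passes both rules, B is too rare, C is too ambiguous.
tagged = {"A": 5, "B": 1, "C": 3}
total = {"A": 6, "B": 1, "C": 10}
print(build_memory(tagged, total))  # prints {'A'}
```

Under these assumptions, string C shows how a relative frequency rule could protect precision: it is labelled as an entity three times, but only in 30% of its occurrences, so it is not stored.</Paragraph>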
    </Section>
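    <Paragraph position="3"> The majority vote over three CRF models mentioned in Section 3.2 can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the section does not say whether voting is done per token or per entity, so the sketch votes per token, uses BIO-style tags, and breaks ties in favour of the earliest model in the list. The function name majority_vote and the example tag sequences are hypothetical.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine tag sequences from several models by per-token majority vote.

    predictions: list of tag sequences, one per model, all the same length.
    Ties go to the earliest model in the list (an assumption).
    """
    combined = []
    for tags in zip(*predictions):
        counts = Counter(tags)
        best = max(counts.values())
        # The first tag, in model order, that reaches the top count wins.
        combined.append(next(t for t in tags if counts[t] == best))
    return combined


# Invented outputs of three models for a four-token sentence.
crf_a = ["B-PER", "I-PER", "O", "B-ORG"]
crf_b = ["B-PER", "I-PER", "O", "O"]
crf_c = ["B-PER", "O", "O", "B-ORG"]
print(majority_vote([crf_a, crf_b, crf_c]))
# prints ['B-PER', 'I-PER', 'O', 'B-ORG']
```

Per-token voting can produce inconsistent tag sequences, such as an I- tag with no preceding B- tag, so a real system would probably need a repair step or entity-level voting.</Paragraph>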
  </Section>
</Paper>