<?xml version="1.0" standalone="yes"?> <Paper uid="W06-0122"> <Title>Sydney, July 2006. ©2006 Association for Computational Linguistics. On Using Ensemble Methods for Chinese Named Entity Recognition</Title> <Section position="5" start_page="143" end_page="144" type="metho"> <SectionTitle> 3 Experiments </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="143" end_page="143" type="sub_section"> <SectionTitle> 3.1 Data </SectionTitle> <Paragraph position="0"> We selected the City University of Hong Kong (CityU) and Microsoft Research (MSRA) corpora to evaluate our methods. CityU is a Traditional Chinese corpus, and MSRA is a Simplified Chinese corpus.</Paragraph> </Section> <Section position="2" start_page="143" end_page="144" type="sub_section"> <SectionTitle> 3.2 Results </SectionTitle> <Paragraph position="0"> Table 5 shows the results of several methods applied to the MSRA corpus. The memory-based ensemble method, which combines the results of a maximum entropy model and those of a CRF classifier, achieves the best performance. Majority voting over the results of three CRF models based on different feature sets performs worst.</Paragraph> <Paragraph position="1"> Table 5: MSRA. The results obtained on CityU, presented in Table 6, show that the single CRF classifier achieved the best performance. None of the ensemble methods outperforms the non-ensemble methods. [...] based ensemble methods under different rules. We set the frequency threshold to 2 and the relative frequency threshold to 0.5. The results show that the relative frequency rule effectively reduces the loss of precision caused by more entities being tagged by the memory-based classifier. The memory-based ensemble method works well on the MSRA corpus, but not on the CityU corpus. On the MSRA corpus, the memory-based ensemble method outperforms the individual CRF model by approximately 0.4% in FB1. 
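The two thresholds can be read as a filter on which remembered entity strings the memory-based classifier is allowed to tag. The following is only a minimal sketch under stated assumptions, not the implementation used in the experiments: it assumes that "frequency" means how often a string was annotated as an entity in the training data, and that "relative frequency" means that count divided by the total number of times the string occurs. The function name build_memory and both count dictionaries are hypothetical.

```python
def build_memory(tagged_counts, total_counts, min_freq=2, min_rel_freq=0.5):
    """Return the set of entity strings that pass both thresholds.

    tagged_counts: times each string was annotated as an entity (assumed input).
    total_counts: times each string occurred at all in the training text
                  (assumed input).
    min_freq and min_rel_freq default to the values quoted in the text (2 and 0.5).
    """
    memory = set()
    for entity, tagged in tagged_counts.items():
        total = total_counts.get(entity, 0)
        if total == 0:
            # No occurrence count available, so the ratio is undefined; skip.
            continue
        # Frequency rule: the string must have been seen as an entity often enough.
        # Relative frequency rule: it must be an entity in most of its occurrences.
        if tagged >= min_freq and tagged / total >= min_rel_freq:
            memory.add(entity)
    return memory


# Toy counts with placeholder strings (not taken from either corpus).
tagged = {"alpha": 5, "beta": 1, "gamma": 2}
total = {"alpha": 6, "beta": 1, "gamma": 10}
memory = build_memory(tagged, total)
print(sorted(memory))  # prints ['alpha']
```

In this toy example "beta" fails the frequency rule and "gamma" fails the relative frequency rule. The second case is the kind of ambiguous string whose removal would be expected to limit the loss of precision mentioned above.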
We found that the memory-based classifier cannot outperform the CRF model because it misclassifies many organization names. Therefore, we chose another strategy that restricts the memory-based classifier to tagging person names only. Under this restriction, the memory-based classifier improves FB1 by approximately 0.2%.</Paragraph> </Section> </Section> </Paper>