<?xml version="1.0" standalone="yes"?>
<Paper uid="I05-3033">
  <Title>Towards a Hybrid Model for Chinese Word Segmentation</Title>
  <Section position="4" start_page="190" end_page="190" type="evalu">
    <SectionTitle>
3 Results
</SectionTitle>
    <Paragraph position="0"> The segmenter was evaluated on the closed track of the Peking University Corpus in the bakeoff.</Paragraph>
    <Paragraph position="1"> In the development stage, we partitioned the official training data into two portions: the training set consists of 90% of the data, and the development set consists of the other 10%. The POC tagging accuracy on the development set is summarized in Table 1. The results indicate that the TBL tagger significantly improves the initial tagging produced by the HMM tagger.</Paragraph>
    <Paragraph position="2">  The performance of the merging algorithm on the development set is summarized in Table 2.</Paragraph>
    <Paragraph position="3"> To understand whether and how much the heuristics contribute to improving segmentation, we evaluated four versions of the merging algorithm. The set of heuristics used to handle non-Chinese characters and numeric type compounds did not seem to improve segmentation results on the development set, suggesting that these characters are handled well by the tagging component. However, the second set of heuristics improved segmentation accuracy significantly.</Paragraph>
    <Paragraph position="4"> This seems to confirm our hypothesis that longer words tend to behave more stably.</Paragraph>
    <Paragraph position="5">  Set. H1 stands for the set of heuristics used to handle non-Chinese characters and numeric type compounds. H2 stands for the set of heuristics used to handle long words.</Paragraph>
    <Paragraph position="6">  The official results of the segmenter in the closed-track of the Peking University Corpus are summarized in Table 3. It is somewhat unexpected that the results on the official test data dropped over 2% compared with the results obtained on the development set. Compared with the other systems, the segmenter performed relatively well on OOV words.</Paragraph>
    <Paragraph position="7"> Our preliminary error analysis indicates that this discrepancy in performance is partially attributable to two kinds of inconsistencies between the training and test datasets. One is that there are many ASCII numbers in the test set, but none in the training set. These numbers became unknown characters to the tagger and affected tagging accuracy. It is possible that this inconsistency affected our system more than other systems. Second, there are also a number of segmentation inconsistencies between the training and test sets, but these should have affected all systems more or less equally. The error analysis also indicates that the current segmenter performed poorly on transliterations of foreign names.</Paragraph>
  </Section>
class="xml-element"></Paper>