File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/06/p06-2013_intro.xml

Size: 3,093 bytes

Last Modified: 2025-10-06 14:03:42

<?xml version="1.0" standalone="yes"?>
<Paper uid="P06-2013">
  <Title>Sydney, July 2006. ©2006 Association for Computational Linguistics. An Empirical Study of Chinese Chunking</Title>
  <Section position="3" start_page="0" end_page="97" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Chunking identifies the non-recursive cores of various types of phrases in text, possibly as a precursor to full parsing or information extraction. Steven P. Abney was the first to introduce chunks for parsing (Abney, 1991).</Paragraph>
    <Paragraph position="1"> Ramshaw and Marcus (1995) first represented base noun phrase recognition as a machine learning problem. The CoNLL-2000 shared task extended this to tagging many kinds of English phrases besides noun phrases (Sang and Buchholz, 2000). Additionally, many machine learning approaches, such as Support Vector Machines (SVMs) (Vapnik, 1995), Conditional Random Fields (CRFs) (Lafferty et al., 2001), Memory-based Learning (MBL) (Park and Zhang, 2003), Transformation-based Learning (TBL) (Brill, 1995), and Hidden Markov Models (HMMs) (Zhou et al., 2000), have been applied to text chunking (Sang and Buchholz, 2000; Hammerton et al., 2002).</Paragraph>
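To make the tagging view of chunking concrete, the sketch below converts a per-token tag sequence into chunk spans. It assumes IOB2-style tags (B-X opens a chunk of type X, I-X continues it, O is outside any chunk), as in the CoNLL-2000 data; this section does not specify a tag scheme, and the function name iob_to_chunks is hypothetical.

```python
from typing import List, Tuple


def iob_to_chunks(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Convert IOB2 tags into (chunk_type, start, end) spans, end exclusive.

    Assumption: tags look like "B-NP", "I-NP" or "O". The scheme is an
    illustrative choice, not one stated in this section.
    """
    chunks = []
    start = None
    ctype = None
    # The extra "O" is a sentinel that closes a chunk ending at the last token.
    for i, tag in enumerate(tags + ["O"]):
        continues = (
            tag.startswith("I-") and ctype is not None and tag[2:] == ctype
        )
        if not continues:
            # Any O, B-X, or mismatched I-X tag closes the open chunk.
            if ctype is not None:
                chunks.append((ctype, start, i))
                start, ctype = None, None
            # B-X opens a chunk; a stray I-X is leniently treated as an opener.
            if tag.startswith("B-") or tag.startswith("I-"):
                start, ctype = i, tag[2:]
    return chunks
```

For example, the tags B-NP B-VP B-NP I-NP I-NP I-NP over a six-token sentence yield one single-token NP, one single-token VP, and one four-token NP.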
    <Paragraph position="2"> Chinese chunking is a difficult task, and much work has been done on it (Li et al., 2003a; Tan et al., 2005; Wu et al., 2005; Zhao et al., 2000). However, there are many different Chinese chunk definitions, derived from different data sets (Li et al., 2004; Zhang and Zhou, 2002), so comparing the performance of previous studies in Chinese chunking is very difficult. Furthermore, compared with other languages, Chinese chunking poses some special problems (Li et al., 2004).</Paragraph>
    <Paragraph position="3"> In this paper, we extract a chunking corpus from the UPENN Chinese Treebank-4 (CTB4) and present an empirical study of Chinese chunking on this corpus. First, we evaluate state-of-the-art models on the corpus to clarify their performance in Chinese chunking. Then we propose two approaches to improve the performance of Chinese chunking. 1) We propose an approach to resolve the special problems of Chinese chunking. This approach extends the chunk tags for each problem by a tag-extension function. 2) We propose two novel voting methods based on the characteristics of the chunking task. Compared with traditional voting methods, the proposed voting methods consider long-distance information. The experimental results show that the proposed approaches improve the performance of Chinese chunking significantly.</Paragraph>
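As a point of comparison for the voting methods mentioned above, here is a minimal sketch of one common form of traditional voting: a per-token majority vote over the outputs of several taggers. It is not the method proposed in the paper, whose details are not given in this section; the function name majority_vote and the tie-breaking rule are assumptions made for illustration.

```python
from collections import Counter
from typing import List


def majority_vote(predictions: List[List[str]]) -> List[str]:
    """Combine tag sequences from several taggers by per-token majority vote.

    Assumption: every tagger labels the same tokens, so all sequences have
    equal length. Ties go to the tag from the earliest-listed tagger.
    """
    if not predictions:
        return []
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all taggers must label the same number of tokens")
    voted = []
    # zip(*predictions) yields, for each token, the tags from all taggers.
    for token_tags in zip(*predictions):
        counts = Counter(token_tags)
        best = max(counts.values())
        # The first tag in tagger order that reaches the top count wins.
        voted.append(next(t for t in token_tags if counts[t] == best))
    return voted
```

Because each token is decided in isolation, this baseline can produce a tag sequence that no single tagger proposed, which illustrates why a voting scheme may benefit from information beyond the current token.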
    <Paragraph position="4"> The rest of this paper is organized as follows: Section 2 describes the definitions of Chinese chunks. Section 3 briefly introduces the models and features for Chinese chunking. Section 4 proposes a tag-extension method. Section 5 proposes two new voting approaches. Section 6 explains the experimental results. Finally, Section 7 draws the conclusions.</Paragraph>
  </Section>
</Paper>