An Empirical Study of Chinese Chunking

2 Definitions of Chinese Chunks

We defined the Chinese chunks based on the CTB4 dataset. Many researchers have extracted chunks from different versions of CTB (Tan et al., 2005; Li et al., 2003b), but these studies did not provide sufficient detail. We developed a tool to extract the corpus from CTB4 by modifying the tool Chunklink.

2.1 Chunk Types

We define 12 types of chunks: ADJP, ADVP, CLP, DNP, DP, DVP, LCP, LST, NP, PP, QP, and VP (Xue et al., 2000); the remaining chunk types in CTB4 (FRAG, PRN, and UCP) are not used. Table 1 provides definitions of these chunks.

2.2 Data Representation

To represent the chunks clearly, we represent the data with an IOB-based model, as the CoNLL00 shared task did, in which every word is tagged with a chunk type label extended with I (inside a chunk), O (outside any chunk), or B (inside a chunk, and also the first word of the chunk).

Each chunk type can be extended with the I or B tags; for instance, NP yields the two tags B-NP and I-NP. Together with O, we therefore have 25 chunk tags (12 x 2 + 1) under the IOB-based model, and every word in a sentence is tagged with one of them. For instance, a word-segmented and Part-of-Speech tagged sentence is tagged as follows (the Chinese example itself does not survive extraction; only its tag sequence is recoverable):

S1: [the sentence bracketed with chunk types]
S2: B-NP / B-VP / B-NP / I-NP / O

Here S1 denotes the sentence tagged with chunk types, and S2 denotes the sentence tagged with chunk tags based on the IOB-based model.

With this data representation, the problem of Chinese chunking can be regarded as a sequence tagging task. That is to say, given a sequence of tokens (words paired with Part-of-Speech tags), x = x1, x2, ..., xn, we need to generate a sequence of chunk tags, y = y1, y2, ..., yn.
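As a concrete illustration of this encoding, the following minimal sketch (ours, not from the paper) maps chunk spans to IOB tags; the (type, start, end) span format is an assumption made for the example.

```python
# Minimal sketch of the IOB encoding described above. `chunks` is assumed to
# be a list of (chunk_type, start, end) spans over a word-segmented sentence;
# the span format is illustrative, not taken from the paper.

def to_iob_tags(num_words, chunks):
    """Map chunk spans to one B-/I-/O tag per word."""
    tags = ["O"] * num_words
    for chunk_type, start, end in chunks:  # end is exclusive
        tags[start] = "B-" + chunk_type    # first word of the chunk
        for i in range(start + 1, end):
            tags[i] = "I-" + chunk_type    # non-initial words of the chunk
    return tags

# A five-word sentence with an NP, a VP, and a two-word NP, followed by an
# unchunked word, reproduces the tag sequence S2 above:
print(to_iob_tags(5, [("NP", 0, 1), ("VP", 1, 2), ("NP", 2, 4)]))
# ['B-NP', 'B-VP', 'B-NP', 'I-NP', 'O']
```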
2.3 Data Set

The CTB4 dataset consists of 838 files. In the experiments, we used the first 728 files (FID from chtb 001.fid to chtb 899.fid) as training data, and the other 110 files (FID from chtb 900.fid to chtb 1078.fid) as testing data. In the following sections, we use the term CTB4 Corpus to refer to the extracted data set. Table 2 lists details of the CTB4 Corpus.

3 Chinese Chunking

3.1 Models for Chinese Chunking

In this paper, we applied four models: SVMs, CRFs, TBL, and MBL, all of which have achieved good performance in other languages. We describe these models only briefly, since full details are presented elsewhere (Kudo and Matsumoto, 2001; Sha and Pereira, 2003; Ramshaw and Marcus, 1995; Sang, 2002).

3.1.1 SVMs

Support Vector Machines (SVMs) are a powerful supervised learning paradigm based on the Structural Risk Minimization principle from computational learning theory (Vapnik, 1995). Kudo and Matsumoto (2000) applied SVMs to English chunking and achieved the best performance in the CoNLL00 shared task (Sang and Buchholz, 2000). They created 231 SVM classifiers, one for each unordered pair of chunk tags (K(K-1)/2 classifiers for K tag classes); the final decision was given by their weighted voting, and the label sequence was then chosen using a dynamic programming algorithm. Tan et al. (2004) applied SVMs to Chinese chunking, using sigmoid functions to extract probabilities from the SVM outputs as a post-processing step. In this paper, we used YamCha (V0.33) in our experiments.
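The pairwise scheme can be sketched with scikit-learn, which stands in here purely for illustration (the paper's experiments used YamCha); its one-vs-one mode trains K(K-1)/2 pairwise classifiers and combines their votes, which with K = 22 tag classes would give the 231 classifiers mentioned above. The feature vectors below are toy placeholders.

```python
# Illustrative sketch of pairwise (one-vs-one) SVM voting in the spirit of
# Kudo and Matsumoto (2000); scikit-learn substitutes for YamCha, and the
# feature vectors are toy stand-ins for encoded word/POS contexts.
from sklearn.svm import SVC

X = [[0, 1], [1, 0], [1, 1], [0, 0]]
y = ["B-NP", "I-NP", "B-VP", "O"]

# decision_function_shape="ovo" exposes the one-vs-one scheme explicitly:
# K classes yield K*(K-1)/2 pairwise classifiers whose votes are combined.
clf = SVC(kernel="linear", decision_function_shape="ovo").fit(X, y)
print(clf.decision_function([[1, 1]]).shape)  # (1, 6): 4*3/2 pairwise scores
print(clf.predict([[1, 1]]))
```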
3.1.2 CRFs

Conditional Random Fields (CRFs) are a powerful sequence labeling model (Lafferty et al., 2001) that combines the advantages of generative and classification models. Sha and Pereira (2003) showed that state-of-the-art results can be achieved with CRFs in English chunking. CRFs allow us to utilize a large number of observation features as well as state-sequence-based features and any other features we want to add. Tan et al. (2005) applied CRFs to Chinese chunking, and their experimental results showed that the CRF approach provides better performance than HMMs. In this paper, we used MALLET (V0.3.2) (McCallum, 2002) to implement the CRF model.

3.1.3 TBL

Transformation-based learning (TBL), first introduced by Brill (1995), is mainly based on the idea of successively transforming the data in order to correct errors. The transformation rules obtained are usually few, yet powerful. Li et al. (2004) applied TBL to Chinese chunking, and it provided good performance on their corpus. In this paper, we used fnTBL (V1.0) to implement the TBL model.

3.1.4 MBL

Memory-based learning (MBL) is a non-parametric inductive learning paradigm that stores training instances in a memory structure, on which predictions for new instances are based (Walter et al., 1999). The similarity between a new instance X and an example Y in memory is computed using a distance metric. Tjong Kim Sang (2002) applied MBL to English chunking; it performs well on a variety of shallow parsing tasks, often yielding good results. In this paper, we used TiMBL (Daelemans et al., 2004) to implement the MBL model.

3.2 Features

The observations are based on features that can represent the differences between events. We utilize both lexical and Part-of-Speech (POS) information as features, taking the lexical and POS information within a fixed window, and we also consider different combinations of them. The features are listed as follows:

* WORD: uni-grams and bi-grams of words in an n window.
* POS: uni-grams and bi-grams of POS tags in an n window.
* WORD+POS: both the WORD and the POS features.

where n is a predefined number denoting the window size. For instance, with n set to 2, the WORD features at the 3rd position (the word tagged NR; the Chinese words of Example 1 do not survive extraction) are the uni-grams w-2, w-1, w0, w+1, w+2 and the bi-grams w-2w-1, w-1w0, w0w+1, w+1w+2.
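A minimal sketch of these window features follows; the boundary padding symbols and the feature-string format are our own illustrative choices, not taken from the paper.

```python
# Window features for position i: uni-grams of words and POS tags in an
# n window, plus word bi-grams (POS bi-grams would be built analogously).

def window_features(tokens, i, n=2):
    """tokens is a list of (word, pos) pairs; returns feature strings."""
    pad = ("<S>", "<S>")                 # boundary padding (an assumption)
    padded = [pad] * n + tokens + [pad] * n
    c = i + n                            # index of token i after padding
    feats = []
    for k in range(-n, n + 1):           # uni-grams w-2 .. w+2, POS likewise
        w, p = padded[c + k]
        feats.append(f"W{k}={w}")
        feats.append(f"P{k}={p}")
    for k in range(-n, n):               # word bi-grams w-2w-1 .. w+1w+2
        feats.append(f"W{k}{k + 1}={padded[c + k][0]}|{padded[c + k + 1][0]}")
    return feats

tokens = [("w1", "NN"), ("w2", "VV"), ("w3", "NR"), ("w4", "NN"), ("w5", "DEG")]
print(window_features(tokens, 2))        # features at the 3rd position
```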
4 Tag-Extension

In Chinese chunking there are some difficult problems, related to special terms, noun-noun compounds, named-entity tagging, and coordination. In this section, we propose an approach that resolves these problems by extending the chunk tags. In the current data representation, the chunk tags are too generic to construct accurate models. Therefore, we define a tag-extension function fs that extends the chunk tags as follows:

    fs : T × Q → Te    (1)

where T denotes the original tag set, Q denotes the problem set, and Te denotes the extended tag set. For instance, given a problem q ∈ Q, we extend the chunk tags with q; for NP recognition this yields two new tags, B-NP-q and I-NP-q. We name this approach Tag-Extension.

In the following three case studies, we demonstrate how Tag-Extension is used to resolve the difficult problems in NP recognition.

1) Special Terms: these noun phrases are special terms, such as "(Forbidden Zone)", which are bracketed with special punctuation marks (the Chinese quotation and bracket characters do not survive extraction). They are divided into two types: chunks with this punctuation and chunks without it.

2) Coordination: these problems are related to the conjunctions glossed "and", "and", "or", and "and" (four distinct Chinese conjunctions whose characters do not survive extraction). They can be divided into two types: chunks with conjunctions and chunks without conjunctions. In an example such as "(Hong Kong)/ ... (living maintenance)/", it is difficult to tell whether the modifier is shared across the coordination or not, even for people. We extend the tags with COO for coordination: B-NP-COO and I-NP-COO.

3) Named Entities Tagging: named entities (NEs) (Sang and Meulder, 2003) are not distinguished in CTB4; they are all tagged as NR. However, they play different roles in chunks, especially in noun phrases. Compare "(Macau)-NR (Airport)-NN" and "(Hong Kong)-NR (Airport)-NN" with "(Deng Xiaoping)-NR (Mr.)-NN" and "(Song Weiping)-NR (President)-NN": Macau and Hong Kong are LOCATION, while Deng Xiaoping and Song Weiping are PERSON. To investigate the effect of named entities, we use a LOCATION dictionary, generated from the PFR corpus of ICL, Peking University, to tag location words in the CTB4 Corpus. We then extend the tags with LOC for this problem: B-NP-LOC and I-NP-LOC.

From the above case studies, the steps of Tag-Extension are as follows. First, identify a special problem of chunking. Second, extend the chunk tags via Equation (1). Finally, replace the tags of the related tokens with the new chunk tags. After Tag-Extension, the newly added chunk tags describe these special problems; a sketch of the final step appears below.
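The sketch below illustrates the final replacement step for the coordination case; the per-token flag marking the affected chunks is left abstract, standing in for however a given problem q is detected.

```python
# Minimal sketch of Tag-Extension: tags of tokens inside chunks affected by a
# problem (here COO) are rewritten from B-NP/I-NP to B-NP-COO/I-NP-COO.

def extend_tags(tags, in_problem, problem="COO"):
    """tags: per-token chunk tags; in_problem: per-token booleans marking
    tokens inside chunks affected by the problem."""
    out = []
    for tag, flagged in zip(tags, in_problem):
        if flagged and tag in ("B-NP", "I-NP"):
            out.append(tag + "-" + problem)  # e.g. B-NP -> B-NP-COO
        else:
            out.append(tag)
    return out

tags = ["B-NP", "I-NP", "I-NP", "O", "B-VP"]
in_problem = [True, True, True, False, False]  # an NP containing a conjunction
print(extend_tags(tags, in_problem))
# ['B-NP-COO', 'I-NP-COO', 'I-NP-COO', 'O', 'B-VP']
```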
5 Voting Methods

Kudo and Matsumoto (2001) reported that they achieved higher accuracy by voting among systems trained with different data representations, and Tjong Kim Sang et al. (Sang and Buchholz, 2000) reported similar results from combining different systems. In order to provide better results, we also apply voting over the basic systems SVMs, CRFs, MBL, and TBL. Based on the characteristics of the chunking task, we propose two new voting methods that take long-distance information into account.

In a weighted voting method, we can assign different weights to the results of the individual systems (van Halteren et al., 1998). However, this requires a larger amount of computation, as the training data must be divided and repeatedly used to obtain the voting weights. In this paper, we therefore give the same weight to all basic systems in our voting methods. Suppose we have K basic systems, the input sentence is x = x1, x2, ..., xn, and the result of the j-th basic system is tj = t1j, t2j, ..., tnj, 1 ≤ j ≤ K. Our goal is to obtain a new result y = y1, y2, ..., yn by voting.

5.1 Basic Voting

This is the traditional voting method, the same as Uniform Weight in (Kudo and Matsumoto, 2001); we name it Basic Voting. At each position we have K candidate tags, one from each basic system. After voting, we choose the candidate with the most votes as the final result for that position.

5.2 Sent-based Voting

Since we treat chunking as a sequence labeling task, we can also compute the votes of one sentence instead of one word; we name this Sent-based Voting. For one sentence, we have K candidates: the tagged sequences produced by the K basic systems. First, we vote on each position, as done in Basic Voting. Then we compute the votes of every candidate by accumulating the votes of its tags over all positions. Finally, we choose the candidate with the most votes as the final result for the sentence. That is to say, we make the decision based on the votes of the whole sentence instead of each position.

5.3 Phrase-based Voting

In chunking, one phrase includes one or more words, and the word tags within one phrase depend on each other. Therefore, we propose a novel voting method based on phrases, which computes the votes of one phrase instead of one word or one sentence; we name it Phrase-based Voting.

There are two steps in the Phrase-based Voting procedure: first we segment a sentence into pieces, and then we calculate the votes of each piece. Table 3 gives the algorithm of Phrase-based Voting, where F(tij, tik) is a binary function indicating agreement between two systems' tags at position i:

    F(tij, tik) = 1 if tij = tik, and 0 otherwise.

In the segmenting step, we look for the "O" and "B-XP" tags (where XP can be replaced by any type of phrase) in the results of the basic systems, and we start a new piece wherever all K results have an "O" or "B-XP" tag at the same position.

In the voting step, the goal is to choose a result for each piece. For each piece, we have K candidates. First, we vote on each position within the piece, as done in Basic Voting. Then we accumulate the votes of each position for every candidate. Finally, we pick the candidate with the most votes as the final result for the piece.

The difference among these three voting methods lies in the range over which decisions are made: Basic Voting decides at one word, Phrase-based Voting within one piece, and Sent-based Voting over one sentence. The sketch below contrasts the three.
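Under the definitions above, the three methods can be sketched as follows. Tie-breaking is not specified in the paper; here ties fall to the earliest candidate, and the segmentation assumes well-formed IOB sequences, which always begin with "O" or a "B-" tag.

```python
# Illustrative sketch of Basic, Sent-based, and Phrase-based Voting over the
# outputs of K basic systems (K tag sequences of equal length n).
from collections import Counter

def basic_voting(results):
    """Vote independently at each position."""
    return [Counter(tags).most_common(1)[0][0] for tags in zip(*results)]

def sent_based_voting(results):
    """Accumulate position-wise votes over the whole sequence and return
    the candidate sequence with the most votes."""
    def votes(candidate):
        return sum(sum(other[i] == tag for other in results)
                   for i, tag in enumerate(candidate))
    return max(results, key=votes)

def phrase_based_voting(results):
    """Cut a new piece wherever all K systems have an O or B- tag, then
    apply sequence-style voting inside each piece."""
    n = len(results[0])
    # For well-formed IOB output, position 0 always qualifies as a cut.
    cuts = [i for i in range(n)
            if all(r[i] == "O" or r[i].startswith("B-") for r in results)]
    cuts.append(n)
    output = []
    for start, end in zip(cuts, cuts[1:]):
        output.extend(sent_based_voting([r[start:end] for r in results]))
    return output

results = [["B-NP", "I-NP", "O", "B-VP"],
           ["B-NP", "I-NP", "O", "B-VP"],
           ["B-NP", "B-NP", "O", "B-VP"]]
print(basic_voting(results))         # ['B-NP', 'I-NP', 'O', 'B-VP']
print(phrase_based_voting(results))  # ['B-NP', 'I-NP', 'O', 'B-VP']
```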