<?xml version="1.0" standalone="yes"?> <Paper uid="P06-2013"> <Title>Sydney, July 2006. c(c)2006 Association for Computational Linguistics An Empirical Study of Chinese Chunking</Title> <Section position="9" start_page="100" end_page="102" type="evalu"> <SectionTitle> 6 Experiments </SectionTitle> <Paragraph position="0"> In this section, we investigated the performance of Chinese chunking on the CTB4 Corpus.</Paragraph> <Paragraph position="1"> Input: Sequence: x = x1,...,xn; K results: tj = t1j,...,tnj,1 [?] j [?] K. Output: Voted results: y = y1,y2,...,yn Segmenting: Segment the sentence into pieces. Pieces[]=null; begin = 1 For each i in (2, n){ For each j in (1,K) if(tij is not &quot;O&quot; and &quot;B-XP&quot;) break; if(j > K){ add new piece: p = xbegin,...,xi[?]1 into Pieces; begin = i; }} Voting: Choose the result with the most votes for each</Paragraph> <Paragraph position="3"> Choose tbegin,kmax,...,tend,kmax as the result for</Paragraph> <Section position="1" start_page="100" end_page="100" type="sub_section"> <SectionTitle> 6.1 Experimental Setting </SectionTitle> <Paragraph position="0"> To investigate the chunker sensitivity to the size of the training set, we generated different sizes of training sets, including 1%, 2%, 5%, 10%, 20%, 50%, and 100% of the total training data.</Paragraph> <Paragraph position="1"> In our experiments, we used all the default parameter settings of the packages. Our SVMs and CRFs chunkers have a first-order Markov dependency between chunk tags.</Paragraph> <Paragraph position="2"> We evaluated the results as CONLL2000 sharetask did. The performance of the algorithm was measured with two scores: precision P and recall R. Precision measures how many chunks found by the algorithm are correct and the recall rate contains the percentage of chunks defined in the corpus that were found by the chunking program. The two rates can be combined in one measure:</Paragraph> <Paragraph position="4"> In this paper, we report the results with F1 score.</Paragraph> </Section> <Section position="2" start_page="100" end_page="101" type="sub_section"> <SectionTitle> 6.2 Experimental Results </SectionTitle> <Paragraph position="0"> 6.2.1 POS vs. WORD+POS In this experiment, we compared the performance of different feature representations, in- null and set the window size as 2. We also investigated the effects of different sizes of training data. The SVMs and CRFs approaches were used in the experiments because they provided good performance in chunking(Kudo and Matsumoto, 2001)(Sha and Pereira, 2003).</Paragraph> <Paragraph position="1"> Figure 1 shows the experimental results, where xtics denotes the size of the training data, &quot;WP&quot; refers to WORD+POS, &quot;P&quot; refers to POS. We can see from the figure that WORD+POS yielded better performance than POS in the most cases. However, when the size of training data was small, the performance was similar. With WORD+POS, SVMs provided higher accuracy than CRFs in all training sizes. However, with POS, CRFs yielded better performance than SVMs in large scale training sizes. Furthermore, we found SVMs with WORD+POS provided 4.07% higher accuracy than with POS, while CRFs provided 2.73% higher accuracy.</Paragraph> </Section> <Section position="3" start_page="101" end_page="101" type="sub_section"> <SectionTitle> 6.2.2 Comparison of Models </SectionTitle> <Paragraph position="0"> In this experiment, we compared the performance of the models, including SVMs, CRFs, MBL, and TBL, in Chinese chunking. 
6.1 Experimental Setting

To investigate the chunkers' sensitivity to the size of the training set, we generated training sets of different sizes: 1%, 2%, 5%, 10%, 20%, 50%, and 100% of the total training data.

In our experiments, we used the packages' default parameter settings. Our SVMs and CRFs chunkers have a first-order Markov dependency between chunk tags.

We evaluated the results as the CoNLL-2000 shared task did. The performance of the algorithm was measured with two scores: precision P and recall R. Precision measures how many of the chunks found by the algorithm are correct, and recall measures the percentage of chunks defined in the corpus that were found by the chunking program. The two rates can be combined in one measure:

    F1 = 2PR / (P + R)

In this paper, we report results with the F1 score.
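As an illustration of how these chunk-level scores are computed from BIO-tagged sequences, here is a small sketch. The helpers below are our own, not the official CoNLL-2000 evaluation script, and they assume well-formed BIO tags.

def extract_chunks(tags):
    """Collect chunks from a BIO tag sequence as (type, start, end) spans.
    Assumes well-formed tags: every chunk opens with a "B-" tag."""
    chunks, start = [], None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes the last chunk
        if start is not None and not tag.startswith("I-"):
            chunks.append((tags[start][2:], start, i))
            start = None
        if tag.startswith("B-"):
            start = i
    return chunks

def chunk_f1(gold_tags, pred_tags):
    """Chunk-level precision, recall, and F1: a predicted chunk counts as
    correct only if its type and both boundaries match a gold chunk."""
    gold = set(extract_chunks(gold_tags))
    pred = set(extract_chunks(pred_tags))
    correct = len(gold & pred)
    p = correct / len(pred) if pred else 0.0
    r = correct / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

gold = ["B-NP", "I-NP", "B-VP", "O"]
pred = ["B-NP", "B-NP", "B-VP", "O"]
print(chunk_f1(gold, pred))  # (0.333..., 0.5, 0.4)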
6.2 Experimental Results

6.2.1 POS vs. WORD+POS

In this experiment, we compared the performance of different feature representations, POS and WORD+POS, and set the window size as 2. We also investigated the effects of different sizes of training data. The SVMs and CRFs approaches were used in these experiments because they have provided good performance in chunking (Kudo and Matsumoto, 2001; Sha and Pereira, 2003).

Figure 1 shows the experimental results, where the x-axis denotes the size of the training data, "WP" refers to WORD+POS, and "P" refers to POS. We can see from the figure that WORD+POS yielded better performance than POS in most cases. However, when the size of the training data was small, the performance was similar. With WORD+POS, SVMs provided higher accuracy than CRFs at all training sizes. However, with POS, CRFs yielded better performance than SVMs at large training sizes. Furthermore, we found that SVMs with WORD+POS provided 4.07% higher accuracy than with POS, while CRFs provided 2.73% higher accuracy.

6.2.2 Comparison of Models

In this experiment, we compared the performance of the models, including SVMs, CRFs, MBL, and TBL, in Chinese chunking. We used the feature WORD+POS and set the window size as 2 for the first two models. For MBL, WORD features were within a window of size one, and POS features were within a window of size two. We used the original data for TBL without any reformatting.

Table 4 shows the comparative results of the models. We found that the SVMs approach was superior to the other ones: it yielded results that were 0.72%, 1.51%, and 3.58% higher in accuracy than the CRFs, TBL, and MBL approaches, respectively. Looking at each category in detail, the SVMs approach provided the best results in ten categories, the CRFs approach in one category, and the TBL approach in five categories.

6.2.3 Comparison of Voting Methods

In this section, we compared the performance of the voting methods over the four basic systems used in Section 6.2.2. Table 5 shows the results of the voting systems, where V1 refers to Basic Voting, V2 refers to Sent-based Voting, and V3 refers to Phrase-based Voting. We found that Basic Voting provided slightly worse results than SVMs. However, by applying the Sent-based Voting method, we achieved higher accuracy than any single system. Furthermore, we achieved even higher accuracy by applying Phrase-based Voting, which provided 0.22% and 0.94% higher accuracy than the SVMs and CRFs approaches, respectively, the best two single systems.

These results suggest that the Phrase-based Voting method is quite suitable for the chunking task, since it considers one chunk, rather than one word or one sentence, as the voting unit.

6.2.4 Tag-Extension

NP is the most important phrase in Chinese chunking, and about 47% of the phrases in the CTB4 Corpus are NPs. In this experiment, we presented the results of Tag-Extension in NP recognition.

Table 6 shows the experimental results of Tag-Extension, where "NPR" refers to chunking without any extension, "SPE" refers to chunking with Special Terms Tag-Extension, "COO" refers to chunking with Coordination Tag-Extension, "LOC" refers to chunking with LOCATION Tag-Extension, "NPR*" refers to voting over eight systems (four of SPE and four of COO), and "V3" refers to the Phrase-based Voting method.

For NP recognition, SVMs again yielded the best results, but it was surprising that TBL provided 0.17% higher accuracy than CRFs. By applying Phrase-based Voting, we achieved better results: 0.30% higher accuracy than SVMs.

From the table, we can see that the Tag-Extension approach can provide better results. In COO, TBL obtained the largest improvement, 0.16%, and in SPE, TBL and CRFs obtained the same improvement, 0.42%. We also found that Phrase-based Voting can improve the performance significantly: NPR* provided 0.51% higher accuracy than SVMs, the best single system.

For LOC, the voting method helped to improve the performance, providing at least 0.33% higher accuracy than any single system. But we also found that CRFs and MBL provided better results, while SVMs and TBL yielded worse results. The reason is that our NE tagging method was very simple. We believe NE tagging can be effective in Chinese chunking if a highly accurate Named Entity Recognition system is used.
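This excerpt does not spell out the tag mapping itself, but the Tag-Extension idea can be sketched as follows: subtype markers (special terms, coordination, locations) are appended to NP tags before training and stripped again before evaluation, so the learner can model the subtypes without changing the evaluation tag set. Everything below, including the helper names and the marker format, is a hypothetical illustration rather than the authors' implementation.

def extend_tags(tags, subtypes):
    """Attach a subtype marker (e.g. "COO" for coordination, "SPE" for
    special terms, "LOC" for locations) to NP chunk tags before training.
    `subtypes[i]` is the marker for word i, or None. Hypothetical helper."""
    return [
        f"{t}-{s}" if s and t.endswith("NP") else t
        for t, s in zip(tags, subtypes)
    ]

def strip_extension(tags):
    """Map extended tags back to the original tag set for evaluation."""
    base = []
    for t in tags:
        for ext in ("-COO", "-SPE", "-LOC"):
            if t.endswith(ext):
                t = t[: -len(ext)]
                break
        base.append(t)
    return base

tags = ["B-NP", "I-NP", "I-NP", "O"]
marks = ["COO", "COO", "COO", None]
extended = extend_tags(tags, marks)  # ['B-NP-COO', 'I-NP-COO', 'I-NP-COO', 'O']
assert strip_extension(extended) == tags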