<?xml version="1.0" standalone="yes"?>
<Paper uid="P06-2010">
  <Title>A Hybrid Convolution Tree Kernel for Semantic Role Labeling</Title>
  <Section position="7" start_page="76" end_page="78" type="evalu">
    <SectionTitle>
5 Experiments and Discussion
</SectionTitle>
    <Paragraph position="0"> The aim of our experiments is to verify the effectiveness of our hybrid convolution tree kernel and and its combination with the standard flat features.</Paragraph>
    <Section position="1" start_page="76" end_page="77" type="sub_section">
      <SectionTitle>
5.1 Experimental Setting
5.1.1 Corpus
</SectionTitle>
      <Paragraph position="0"> We use the benchmark corpus provided by CoNLL-2005 SRL shared task (Carreras and M`arquez, 2005) provided corpus as our training, development, and test sets. The data consist of sections of the Wall Street Journal (WSJ) part of the Penn TreeBank (Marcus et al., 1993), with information on predicate-argument structures extracted from the PropBank corpus (Palmer et al., 2005). We followed the standard partition used in syntactic parsing: sections 02-21 for training, section 24 for development, and section 23 for test. In addition, the test set of the shared task includes three sections of the Brown corpus. Table 2 provides counts of sentences, tokens, annotated propositions, and arguments in the four data sets.</Paragraph>
      <Paragraph position="1">  The preprocessing modules used in CONLL2005 include an SVM based POS tagger (Gim'enez and M`arquez, 2003), Charniak (2000)'s full syntactic parser, and Chieu and Ng (2003)'s Named Entity recognizer.</Paragraph>
      <Paragraph position="2">  The system is evaluated with respect to precision, recall, and Fb=1 of the predicted arguments. Precision (p) is the proportion of arguments predicted by a system which are correct. Recall (r) is the proportion of correct arguments which are predicted by a system. Fb=1 computes the harmonic mean of precision and recall, which is the final measure to evaluate the performances of systems. It is formulated as:</Paragraph>
      <Paragraph position="4"> program of the CoNLL-2005 SRL shared task to evaluate a system performance.</Paragraph>
      <Paragraph position="5">  We use constituents as the labeling units to form the labeled arguments. In order to speed up the learning process, we use a four-stage learning ar- null et al., 2005) is used to handle some unmatched arguments with constituents, such as AM-MOD, AM-NEG.</Paragraph>
      <Paragraph position="6">  We use the Voted Perceptron (Freund and Schapire, 1998) algorithm as the kernel machine. The performance of the Voted Perceptron is close to, but not as good as, the performance of SVM on the same problem, while saving computation time and programming effort significantly. SVM is too slow to finish our experiments for tuning parameters. null The Voted Perceptron is a binary classifier. In order to handle multi-classification problems, we adopt the one vs. others strategy and select the one with the largest margin as the final output. The training parameters are chosen using development data. After 5 iteration numbers, the best performance is achieved. In addition, Moschitti (2004)'s Tree Kernel Tool is used to compute the tree kernel function.</Paragraph>
    </Section>
    <Section position="2" start_page="77" end_page="78" type="sub_section">
      <SectionTitle>
5.2 Experimental Results
</SectionTitle>
      <Paragraph position="0"> In order to speed up the training process, in the following experiments, we ONLY use WSJ sections 02-05 as training data. The same as Moschitti (2004), we also set the u = 0.4 in the computation of convolution tree kernels.</Paragraph>
      <Paragraph position="1"> In order to study the impact of l in hybrid convolution tree kernel in Eq. 1, we only use the hybrid kernel between Kpath and Kcs. The performance curve on development set changing with l is shown in Figure 6.</Paragraph>
      <Paragraph position="2">  The performance curve shows that when l = 0.5, the hybrid convolution tree kernel gets the best performance. Either the Path kernel (l = 1, Fb=1 = 61.26) or the Constituent Structure kernel (l = 0, Fb=1 = 54.91) cannot perform better than the hybrid one. It suggests that the two individual kernels are complementary to each other. In addition, the Path kernel performs much better than the Constituent Structure kernel. It indicates that the predicate-constituent related features are more effective than the constituent features for SRL.</Paragraph>
      <Paragraph position="3"> Table 3 compares the performance comparison among our Hybrid convolution tree kernel, Moschitti (2004)'s PAF kernel, standard flat features with Linear kernels, and Poly kernel (d = 2). We can see that our hybrid convolution tree kernel out-performs the PAF kernel. It empirically demonstrates that the weight linear combination in our hybrid kernel is more effective than PAF kernel for SRL.</Paragraph>
      <Paragraph position="4"> However, our hybrid kernel still performs worse than the standard feature based system. This is simple because our kernel only use the syntactic structure information while the feature-based method use a large number of hand-craft diverse features, from word, POS, syntax and semantics, NER, etc. The standard features with polynomial kernel gets the best performance. The reason is that the arbitrary binary combination among features implicated by the polynomial kernel is useful to SRL. We believe that combining the two methods can perform better.</Paragraph>
      <Paragraph position="5"> In order to make full use of the syntactic information and the standard flat features, we present a composite kernel between hybrid kernel (Khybrid) and standard features with polynomial</Paragraph>
      <Paragraph position="7"> where 0 [?] g [?] 1.</Paragraph>
      <Paragraph position="8"> The performance curve changing with g in Eq. 2 on development set is shown in Figure 7.  We can see that when g = 0.5, the system achieves the best performance and Fb=1 = 70.78. It's statistically significant improvement (kh2 test with p = 0.1) than only using the standard features with the polynomial kernel (g = 0, Fb=1 = 70.25) and much higher than only using the hybrid convolution tree kernel (g = 1, Fb=1 = 66.01). The main reason is that the convolution tree kernel can represent more general syntactic features than standard flat features, and the standard flat features include the features that the convolution tree kernel cannot represent, such as Voice, Sub-Cat. The two kind features are complementary to each other.</Paragraph>
      <Paragraph position="9"> Finally, we train the composite method using the above setting (Eq. 2 with when g = 0.5) on the entire training set. The final performance is shown in Table 4.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>