<?xml version="1.0" standalone="yes"?> <Paper uid="P06-2010"> <Title>Sydney, July 2006. c(c)2006 Association for Computational Linguistics A Hybrid Convolution Tree Kernel for Semantic Role Labeling</Title> <Section position="5" start_page="73" end_page="74" type="metho"> <SectionTitle> 3 Feature-based methods for SRL </SectionTitle> <Paragraph position="0"> Usually feature-based methods refer to the methods which use the flat features to represent instances. At present, most of the successful SRL systems use this method. Their features are usually extended from Gildea and Jurafsky (2002)'s work, which uses flat information derived from a parse tree. According to the literature, we select the Constituent, Predicate, and Predicate-Constituent related features shown in Table 1. However, to find relevant features is, as usual, a complex task. In addition, according to the description of the standard features, we can see that the syntactic features, such as Path, Path Length, bulk large among all features. On the other hand, the previous researches (Gildea and Palmer, 2002; Punyakanok et al., 2005) have also recognized the necessity of syntactic parsing for semantic role labeling. However, the standard flat features cannot model the syntactic information well. A predicate-argument pair has two different Path features even if their paths differ only for a node in the parse tree. This data sparseness problem prevents the learning algorithms from generalizing unseen data well. In order to address this problem, one method is to list all sub-structures of the parse tree. However, both space complexity and time complexity are too high for the algorithm to be realized.</Paragraph> </Section> <Section position="6" start_page="74" end_page="76" type="metho"> <SectionTitle> 4 Hybrid Convolution Tree Kernels for SRL </SectionTitle> <Paragraph position="0"> In this section, we introduce the previous kernel method for SRL in Subsection 4.1, discuss our method in Subsection 4.2 and compare our method with previous work in Subsection 4.3.</Paragraph> <Section position="1" start_page="74" end_page="74" type="sub_section"> <SectionTitle> 4.1 Convolution Tree Kernels for SRL </SectionTitle> <Paragraph position="0"> Moschitti (2004) proposed to apply convolution tree kernels (Collins and Duffy, 2001) to SRL.</Paragraph> <Paragraph position="1"> He selected portions of syntactic parse trees, which include salient sub-structures of predicatearguments, to define convolution kernels for the task of predicate argument classification. This portions selection method of syntactic parse trees is named as predicate-arguments feature (PAF) kernel. Figure 2 illustrates the PAF kernel feature space of the predicate buy and the argument Arg1 in the circled sub-structure.</Paragraph> <Paragraph position="2"> The kind of convolution tree kernel is similar to Collins and Duffy (2001)'s tree kernel except the sub-structure selection strategy. Moschitti (2004) only selected the relative portion between a predicate and an argument.</Paragraph> <Paragraph position="3"> Given a tree portion instance defined above, we design a convolution tree kernel in a way similar to the parse tree kernel (Collins and Duffy, 2001).</Paragraph> <Paragraph position="4"> Firstly, a parse tree T can be represented by a vector of integer counts of each sub-tree type (regardless of its ancestors):</Paragraph> <Paragraph position="6"> This results in a very high dimension since the number of different subtrees is exponential to the tree's size. 
<Paragraph position="6"> Thus it is computationally infeasible to use the feature vector $\Phi(T)$ directly. To solve this problem, we introduce the tree kernel function, which can calculate the dot product between the above high-dimensional vectors efficiently. The kernel function is defined as follows: $K(T_1,T_2) = \langle \Phi(T_1), \Phi(T_2) \rangle = \sum_i \#subtree_i(T_1) \cdot \#subtree_i(T_2) = \sum_{n_1 \in N_1} \sum_{n_2 \in N_2} \sum_i I_i(n_1) \cdot I_i(n_2)$</Paragraph> <Paragraph position="7"> where $N_1$ and $N_2$ are the sets of all nodes in trees $T_1$ and $T_2$, respectively, and $I_i(n)$ is the indicator function whose value is 1 if and only if a sub-tree of type $i$ is rooted at node $n$, and 0 otherwise. Collins and Duffy (2001) show that $K(T_1,T_2)$ is an instance of convolution kernels over tree structures, and that it can be computed in $O(|N_1| \times |N_2|)$ by the following recursive definitions (let $\Delta(n_1,n_2) = \sum_i I_i(n_1) \cdot I_i(n_2)$): (1) if the productions at $n_1$ and $n_2$ are different, then $\Delta(n_1,n_2) = 0$; (2) else if $n_1$ and $n_2$ are pre-terminals with the same production, then $\Delta(n_1,n_2) = \mu$; (3) else $\Delta(n_1,n_2) = \mu \prod_{j=1}^{nc(n_1)} \big(1 + \Delta(ch(n_1,j), ch(n_2,j))\big)$, where $nc(n_1)$ is the number of children of $n_1$, $ch(n,j)$ is the $j$-th child of node $n$, and $\mu$ ($0 < \mu < 1$) is a decay factor that makes the kernel value less variable with respect to tree size.</Paragraph> </Section>
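The recursion above translates directly into code. The following is a minimal Python sketch (ours, not the authors' implementation) of this kernel over trees encoded as the nested tuples used in the earlier sketch; memoizing `delta` over node pairs yields the $O(|N_1| \times |N_2|)$ bound.

```python
# Collins-and-Duffy convolution tree kernel via the Delta recursion.
# A node is (label, child, ...); a pre-terminal is (POS, word).

MU = 0.4  # decay factor mu, 0 < mu < 1

def is_preterminal(node):
    return len(node) == 2 and isinstance(node[1], str)

def production(node):
    """The node's label plus the label (or word) of each child."""
    return (node[0],) + tuple(c if isinstance(c, str) else c[0]
                              for c in node[1:])

def delta(n1, n2):
    if production(n1) != production(n2):
        return 0.0            # case (1): productions differ
    if is_preterminal(n1):
        return MU             # case (2): identical pre-terminals
    result = MU               # case (3): recurse over aligned children
    for c1, c2 in zip(n1[1:], n2[1:]):
        result *= 1.0 + delta(c1, c2)
    return result

def nodes(tree):
    yield tree
    for child in tree[1:]:
        if isinstance(child, tuple):
            yield from nodes(child)

def kernel(t1, t2):
    """K(T1, T2): sum of Delta(n1, n2) over all node pairs."""
    return sum(delta(n1, n2) for n1 in nodes(t1) for n2 in nodes(t2))
```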
<Section position="2" start_page="74" end_page="76" type="sub_section"> <SectionTitle> 4.2 Hybrid Convolution Tree Kernels </SectionTitle> <Paragraph position="0"> In the PAF kernel, the feature space is treated as one integral portion that includes a predicate and one of its arguments. We note that the PAF feature actually consists of two kinds of features: the so-called parse tree Path feature and the so-called Constituent Structure feature.</Paragraph> <Paragraph position="1"> These two feature spaces represent different information. The Path feature describes the linking information between a predicate and its arguments, while the Constituent Structure feature captures the syntactic structure of an argument. We believe it is more reasonable to capture these two kinds of features separately, since they contribute to SRL through different feature spaces and are best fused with different weights. Therefore, we propose two convolution kernels to capture the two features and combine them into one hybrid convolution kernel for SRL. Figure 3 illustrates the two feature spaces, where the Path feature space is circled by solid curves and the Constituent Structure feature spaces are circled by dotted curves. We name the corresponding kernels the Path kernel and the Constituent Structure kernel, respectively.</Paragraph> <Paragraph position="2"> Figure 4 illustrates the distinction between the PAF kernel and our kernel. In the PAF kernel, the two tree structures are judged equal when the constituents NP and PRP are considered, as shown in Figure 4(a). However, the two constituents play different roles in the sentence and should not be treated as equal. Figure 4(b) shows the computation with our kernel: when the hybrid convolution tree kernel is computed, the NP-PRP substructure is not counted, so the two trees are correctly distinguished.</Paragraph> <Paragraph position="3"> On the other hand, the Constituent Structure feature space usually occupies the largest part of the traditional PAF feature space, so the Constituent Structure kernel dominates the PAF kernel computation, as shown in Figure 5, where believes is the predicate and A1 is a long sub-sentence. According to our experimental results in Section 5.2, the Constituent Structure kernel does not perform well, and the PAF kernel suffers accordingly. In our hybrid method, by contrast, we can adjust the balance between the Path feature and the Constituent Structure feature by tuning their weights to obtain an optimal result.</Paragraph> <Paragraph position="4"> Having defined the two convolution tree kernels, the Path kernel $K_{path}$ and the Constituent Structure kernel $K_{cs}$, we can define a new kernel that composes and extends them. According to Joachims et al. (2001), the set of kernel functions is closed under linear combination, which means that the following $K_{hybrid}$ is a valid kernel if $K_{path}$ and $K_{cs}$ are both valid:</Paragraph> <Paragraph position="5"> $K_{hybrid} = \lambda K_{path} + (1 - \lambda) K_{cs}$ (1) where $0 \le \lambda \le 1$.</Paragraph> <Paragraph position="6"> By the definitions of the Path and Constituent Structure kernels, each kernel is explicit: it can be viewed as a matching of features. Since the features are enumerable on the given data, both kernels are valid, and therefore the new kernel $K_{hybrid}$ is valid. We name it the hybrid convolution tree kernel.</Paragraph> <Paragraph position="7"> Since the size of a parse tree is not constant, we normalize $K(T_1,T_2)$ by dividing it by $\sqrt{K(T_1,T_1) \cdot K(T_2,T_2)}$.</Paragraph> </Section>
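As a small illustration of Equation (1) and the normalization step, the sketch below (ours, not from the paper) combines the two normalized kernels with weight lambda. The `kernel` function is the one sketched in Subsection 4.1, and the `path` and `cs` fields, holding the Path and Constituent Structure portions of an instance's parse tree, are hypothetical names introduced for this example.

```python
import math

def normalize(k, t1, t2):
    """Size-normalized kernel: K(T1,T2) / sqrt(K(T1,T1) * K(T2,T2))."""
    return k(t1, t2) / math.sqrt(k(t1, t1) * k(t2, t2))

def hybrid_kernel(inst1, inst2, lam=0.5):
    """Equation (1): K_hybrid = lambda * K_path + (1 - lambda) * K_cs."""
    assert 0.0 <= lam <= 1.0
    k_path = normalize(kernel, inst1.path, inst2.path)  # Path kernel
    k_cs = normalize(kernel, inst1.cs, inst2.cs)  # Constituent Structure kernel
    return lam * k_path + (1.0 - lam) * k_cs
```

In practice, lambda would be tuned on development data, in line with the weighting motivation given above.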
<Section position="3" start_page="76" end_page="76" type="sub_section"> <SectionTitle> 4.3 Comparison with Previous Work </SectionTitle> <Paragraph position="0"> It is worth investigating the differences between our method and the feature-based methods. The surface difference lies in the instance representation (parse tree vs. feature vector) and the similarity calculation mechanism (kernel function vs. dot product); the more fundamental difference is that they operate in different feature spaces. In the kernel method, we implicitly represent a parse tree by a vector of integer counts of each sub-tree type; that is, we consider all sub-tree types together with their frequencies of occurrence. On the one hand, the predicate-argument related features of the flat feature set, such as Path and Position, are embedded in the Path feature space, as are the Predicate and Predicate POS features; the constituent related features, such as Phrase Type, Head Word, Last Word, and POS, are embedded in the Constituent Structure feature space. On the other hand, the remaining features of the flat feature set, such as Named Entity, Previous and Next Word, Voice, SubCat, and Suffix, are not covered by our hybrid convolution tree kernel. From the syntactic viewpoint, the tree representation in our feature space is more robust than the Parse Tree Path feature of the flat feature set, since the Path feature is sensitive to small changes in the parse tree and does not preserve its hierarchical information.</Paragraph> <Paragraph position="1"> It is also worth comparing our method with the previous kernels. Our method is similar to Moschitti (2004)'s predicate-argument feature (PAF) kernel. However, our hybrid kernel differentiates the Path feature from the Constituent Structure feature in order to capture the syntactic structure information for SRL more effectively. In addition, Moschitti (2004) studied only the task of argument classification, while we report experimental results on both argument identification and classification.</Paragraph> </Section> </Section> </Paper>