<?xml version="1.0" standalone="yes"?>
<Paper uid="W05-0631">
  <Title>Semantic Role Labeling using libSVM</Title>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
CoNLL-2005 Shared Task (Carreras and Marquez, 2005)
</SectionTitle>
    <Paragraph position="0"> The SRL system described here depends on a full syntactic parse from the Charniak parser, and investigates aspects of using Support Vector Machines (SVMs) as the machine learning technique for the SRL problem, using the libSVM package.</Paragraph>
    <Paragraph position="1"> In common with many other systems, this system uses the two-level strategy of first identifying which phrases can be arguments to predicates in general, and then labeling the arguments according to that predicate. The argument identification phase is a binary classifier that decides whether each constituent in the full syntax tree of the sentence is a potential argument. These potential arguments are passed into the argument labeling classifier, which uses binary classifiers for each label to decide if that label should be given to that argument. A post-processing phase picks the best labeling that satisfies the constraints of labeling the predicate arguments.</Paragraph>
    <Paragraph position="2"> For overall classification strategy and for suggestions of features, we are indebted to the work of Pradhan et al (2005) and to the work of many authors in both the CoNLL-2004 shared task and the similar semantic roles task of Senseval-3. We used the results of their experiments with features, and worked primarily on features for the identifying classifier and with the constraint satisfaction problem on the final argument output.</Paragraph>
  </Section>
  <Section position="5" start_page="0" end_page="207" type="metho">
    <SectionTitle>
2 System Description
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="0" end_page="205" type="sub_section">
      <SectionTitle>
2.1 Input Data
</SectionTitle>
      <Paragraph position="0"> In this system, we chose to use full syntax trees from the Charniak parser, as the constituents of those trees more accurately represented argument phrases in the training data at the time of the data release. Within each sentence, we first map the predicate to a constituent in the syntax tree. In the cases that the predicate is not represented by a constituent, we found that these were verb phrases of length two or more, where the first word was the main verb (carry out, gotten away, served up, etc.).</Paragraph>
      <Paragraph position="1"> In these cases, we used the first word constituent as the representation of the predicate, for purposes of computing other features that depended on a relative position in the syntax tree.</Paragraph>
      <Paragraph position="2">  We next identify every constituent in the tree as a potential argument, and label the training data accordingly. Although approximately 97% of the arguments in the training data directly matched constituents in the Charniak tree, only 91.3% of the arguments in the development set match constituents. Examination of the sentences with incorrect parses show that almost all of these are due to some form of incorrect attachment, e.g. prepositional attachment, of the parser. Heuristics can be derived to correct constituents with quotes, but this only affected a small fraction of a percent of the incorrect arguments. Experiments with corrections to the punctuation in the Collins parses were also unsuccessful in identifying additional constituents.</Paragraph>
      <Paragraph position="3"> Our recall results on the development directory are bounded by the 91.3% alignment figure.</Paragraph>
      <Paragraph position="4"> We also did not use the the partial syntax, named entities or the verb senses in the development data.</Paragraph>
    </Section>
    <Section position="2" start_page="205" end_page="205" type="sub_section">
      <SectionTitle>
2.2 Learning Components: SVM classifiers
</SectionTitle>
      <Paragraph position="0"> For our system, we chose to use libSVM, an open source SVM package (Chang and Lin, 2001).</Paragraph>
      <Paragraph position="1"> In the SRL problem, the features are nominal, and we followed the standard practice of representing a nominal feature with n discrete values as n binary features. Many of the features in the SRL problem can take on a large number of values, for example, the head word of a constituent may take on as many values as there are different words present in the training set, and these large number of features can cause substantial performance issues.</Paragraph>
      <Paragraph position="2"> The libSVM package has several kernel functions available, and we chose to use the radial basis functions (RBF). For the argument labeling problem, we used the binary classifiers in libSVM, with probability estimates of how well the label fits the distribution. These are normally combined using the &amp;quot;one-against-one&amp;quot; approach into a multi-class classifier. Instead, we combined the binary classifiers in our own post-processing phase to get a labeling satisfying the constraints of the problem.</Paragraph>
    </Section>
    <Section position="3" start_page="205" end_page="205" type="sub_section">
      <SectionTitle>
2.3 The Identifier Classifier Features
</SectionTitle>
      <Paragraph position="0"> One aspect of our work was to use fewer features for the identifier classifier than the basic feature set from (Gildea and Jurafsky, 2002). The intuition behind the reduction is that whether a constituent in the tree is an argument depends primarily on the structure and is independent of the lexical items of the predicate and headword. This reduced feature set is: Phrase Type: The phrase label of the argument.</Paragraph>
      <Paragraph position="1"> Position: Whether the phrase is before or after the predicate.</Paragraph>
      <Paragraph position="2"> Voice: Whether the predicate is in active or passive voice. Passive voice is recognized if a past participle verb is preceded by a form of the verb &amp;quot;be&amp;quot; within 3 words.</Paragraph>
      <Paragraph position="3"> Sub-categorization: The phrase labels of the children of the predicate's parent in the syntax tree. Short Path: The path from the parent of the argument position in the syntax tree to the parent of the predicate.</Paragraph>
      <Paragraph position="4"> The first four features are standard, and the short path feature is defined as a shorter version of the standard path feature that does not use the argument phrase type on one end of the path, nor the predicate type on the other end.</Paragraph>
      <Paragraph position="5"> The use of this reduced set of features was confirmed experimentally by comparing the effect of this reduced feature set on the F-measure of the identifier classifier, compared to feature sets that also added the predicate, the head word and the path features, as normally defined.</Paragraph>
    </Section>
    <Section position="4" start_page="205" end_page="206" type="sub_section">
      <SectionTitle>
2.4 Using the Identifier Classifier for Training and Testing
</SectionTitle>
      <Paragraph position="0"> Theoretically, the input for training the identifier classifier is that, for each predicate, all constituents in the syntax tree are training instances, labeled true if it is any argument of that predicate, and false otherwise. However, this leads to too many negative (false) instances for the training. To correct this, we experimented with two filters for negative instances. The first filter is simply a random filter; we randomly select a percentage of arguments for each argument label. Experiments with the percentage showed that 30% yielded the best F-measure for the identifier classifier.</Paragraph>
      <Paragraph position="1"> The second filter is based on phrase labels from the syntax tree. The intent of this filter was to remove one word constituents of a phrase type that was never used. We selected only those phrase  labels whose frequency in the training was higher than a threshold. Experiments showed that the best threshold was 0.01, which resulted in approximately 86% negative training instances.</Paragraph>
      <Paragraph position="2"> However, in the final experimentation, comparison of these two filters showed that the random filter was best for F-measure results of the identifier classifier.</Paragraph>
      <Paragraph position="3"> The final set of experiments for the identifier classifier was to fine tune the RBF kernel training parameters, C and gamma. Although we followed the standard grid strategy of finding the best parameters, unlike the built-in grid program of libSVM with its accuracy measure, we judged the results based on the more standard F-measure of the classifier. The final values are that C = 2 and gamma = 0.125.</Paragraph>
      <Paragraph position="4"> The final result of the identifier classifier trained on the first 10 directories of the training set is: Precision: 78.27% Recall: 89.01% (F-measure: 83.47) Training on more directories did not substantially improve these precision and recall figures.</Paragraph>
    </Section>
    <Section position="5" start_page="206" end_page="206" type="sub_section">
      <SectionTitle>
2.5 Labeling Classifier Features
</SectionTitle>
      <Paragraph position="0"> The following is a list of the features used in the labeling classifiers.</Paragraph>
      <Paragraph position="1"> Predicate: The predicate lemma from the training file.</Paragraph>
      <Paragraph position="2"> Path: The syntactic path through the parse tree from the argument constituent to the predicate. Head Word: The head word of the argument constituent, calculated in the standard way, but also stemmed. Applying stemming reduces the number of unique values of this feature substantially, 62% in one directory of training data. Phrase Type, Position, Voice, and Subcategorization: as in the identifier classifier. In addition, we experimented with the following features, but did not find that they increased the labeling classifier scores.</Paragraph>
      <Paragraph position="3"> Head Word POS: the part of speech tag of the head word of the argument constituent.</Paragraph>
      <Paragraph position="4"> Temporal Cue Words: These words were compiled by hand from ArgM-TMP phrases in the training data.</Paragraph>
      <Paragraph position="5"> Governing Category: The phrase label of the parent of the argument.</Paragraph>
      <Paragraph position="6"> Grammatical Rule: The generalization of the subcategorization feature to show the phrase labels of the children of the node that is the lowest parent of all arguments of the predicate.</Paragraph>
      <Paragraph position="7"> In the case of the temporal cue words, we noticed that using our definition of this feature increased the number of false positives for the ARGM-TMP label; we guess that our temporal cue words included too many words that occured in other labels. Due to lack of time, we were not able to more fully pursue these features.</Paragraph>
    </Section>
    <Section position="6" start_page="206" end_page="206" type="sub_section">
      <SectionTitle>
2.6 Using the Labeling Classifier for Training and Testing
</SectionTitle>
      <Paragraph position="0"> Our strategy for using the labeling classifier is that in the testing, we pass only those arguments to the labeling classifier that have been marked as true by the identifier classifier. Therefore, for training the labeling classifier, instances were constituents that were given argument labels in the training set, i.e. there were no &amp;quot;null&amp;quot; training examples. null For the labeling classifier, we also found the best parameters for the RBF kernel of the classifier. For this, we used the grid program of libSVM that uses the multi-class classifier, using the accuracy measure to tune the parameters, since this combines the precision of the binary classifiers for each label. The final values are that C = 0.5 and gamma = 0.5.</Paragraph>
      <Paragraph position="1"> In order to show the contribution of the labeling classifier to the entire system, a final test was done on the development set, but passing it the correct arguments. We tested this with a labeling classifier trained on 10 directories and one trained on 20 directories, showing the final F-measure:</Paragraph>
    </Section>
    <Section position="7" start_page="206" end_page="207" type="sub_section">
      <SectionTitle>
2.7 Post-processing the classifier labels
</SectionTitle>
      <Paragraph position="0"> The final part of our system was to use the results of the binary classifiers for each argument label to produce a final labeling subject to the labeling constraints. null For each predicate, the constraints are: two constituents cannot have the same argument label, a constituent cannot have more than one label, if two constituents have (different) labels, they cannot have any overlap, and finally, no argument can overlap the predicate.</Paragraph>
      <Paragraph position="1">  WSJ test (bottom).</Paragraph>
      <Paragraph position="2"> To achieve these constraints, we used the probabilities produced by libSVM for each of the binary argument label classifiers. We produced a constraint satisfaction module that uses a greedy algorithm that uses probabilities from the matrix of potential labeling for each constituent and label. The algorithm iteratively chooses a label for a node with the highest probability and removes any potential labeling that would violate constraints with that chosen label. It continues to choose labels for nodes until all probabilities in the matrix are lower than a threshold, determined by experiments to be .3. In the future, it is our intent to replace this greedy algorithm with a dynamic optimization algorithm. null</Paragraph>
    </Section>
  </Section>
  <Section position="6" start_page="207" end_page="207" type="metho">
    <SectionTitle>
3 Experimental Results
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="207" end_page="207" type="sub_section">
      <SectionTitle>
3.1 Final System and Results
</SectionTitle>
      <Paragraph position="0"> The final system used an identifier classifier trained on (the first) 10 directories, in approximately 7 hours, and a labeling classifier trained on 20 directories, in approximately 23 hours. Testing took approximately 3.3 seconds per sentence.</Paragraph>
      <Paragraph position="1"> As a further test of the final system, we trained both the identifier classifier and the labeling classifier on the first 10 directories and used the second 10 directories as development tests. Here are some of the results, showing the alignment and F-measure on each directory, compared to 24.</Paragraph>
      <Paragraph position="2">  Finally, we note that we did not correctly anticipate the final notation for the predicates in the test set for two word verbs. Our system assumed that two word verbs would be given a start and an end, whereas the test set gives just the one word predicate. null</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>