<?xml version="1.0" standalone="yes"?>
<Paper uid="P05-1072">
  <Title>Semantic Role Labeling Using Different Syntactic Views</Title>
  <Section position="4" start_page="581" end_page="581" type="metho">
    <SectionTitle>
PREDICATE LEMMA
</SectionTitle>
    <Paragraph position="0"> PATH: Path from the constituent to the predicate in the parse tree.</Paragraph>
    <Paragraph position="1"> POSITION: Whether the constituent is before or after the predicate.</Paragraph>
  </Section>
  <Section position="5" start_page="581" end_page="581" type="metho">
    <SectionTitle>
VOICE
PREDICATE SUB-CATEGORIZATION
PREDICATE CLUSTER
HEAD WORD: Head word of the constituent.
HEAD WORD POS: POS of the head word
NAMED ENTITIES IN CONSTITUENTS: 7 named entities as 7 binary features.
</SectionTitle>
    <Paragraph position="0"> PARTIAL PATH: Path from the constituent to the lowest common ancestor of the predicate and the constituent.</Paragraph>
    <Paragraph position="1"> VERB SENSE INFORMATION: Oracle verb sense information from PropBank HEAD WORD OF PP: Head of PP replaced by head word of NP inside it, and PP replaced by PP-preposition</Paragraph>
  </Section>
  <Section position="6" start_page="581" end_page="581" type="metho">
    <SectionTitle>
FIRST AND LAST WORD/POS IN CONSTITUENT
ORDINAL CONSTITUENT POSITION
CONSTITUENT TREE DISTANCE
CONSTITUENT RELATIVE FEATURES: Nine features representing
</SectionTitle>
    <Paragraph position="0"> the phrase type, head word and head word part of speech of the parent, and left and right siblings of the constituent.</Paragraph>
  </Section>
  <Section position="7" start_page="581" end_page="582" type="metho">
    <SectionTitle>
TEMPORAL CUE WORDS
DYNAMIC CLASS CONTEXT
SYNTACTIC FRAME
CONTENT WORD FEATURES: Content word, its POS, and named entities in the content word
</SectionTitle>
    <Paragraph position="0"> in the content word  As described in (Pradhan et al., 2004), we post-process the n-best hypotheses using a trigram language model of the argument sequence. We analyze the performance on three tasks: Argument Identification - This is the process of identifying the parsed constituents in the sentence that represent semantic arguments of a given predicate.</Paragraph>
    <Paragraph position="1">  using hand-corrected parses and automatic parses on PropBank data.</Paragraph>
    <Paragraph position="2"> Table 2 shows the performance of the system using the hand corrected, TreeBank parses (HAND) and using parses produced by a Charniak parser (AUTOMATIC). Precision (P), Recall (R) and F1 scores are given for the identification and combined tasks, and Classification Accuracy (A) for the classification task.</Paragraph>
    <Paragraph position="3"> Classification performance using Charniak parses is about 3% absolute worse than when using Tree-Bank parses. On the other hand, argument identification performance using Charniak parses is about 12.7% absolute worse. Half of these errors - about 7% are due to missing constituents, and the other half - about 6% are due to mis-classifications. Motivated by this severe degradation in argument identification performance for automatic parses, we examined a number of techniques for improving argument identification. We made a number of changes to the system which resulted in improved performance. The changes fell into three categories: i) new features, ii) feature selection and calibration, and iii) combining parses from different syntactic representations.</Paragraph>
  </Section>
  <Section position="8" start_page="582" end_page="583" type="metho">
    <SectionTitle>
3 Additional Features
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="582" end_page="582" type="sub_section">
      <SectionTitle>
3.1 CCG Parse Features
</SectionTitle>
      <Paragraph position="0"> While the Path feature has been identified to be very important for the argument identification task, it is one of the most sparse features and may be difficult to train or generalize (Pradhan et al., 2004; Xue and Palmer, 2004). A dependency grammar should generate shorter paths from the predicate to dependent words in the sentence, and could be a more robust complement to the phrase structure grammar paths extracted from the Charniak parse tree. Gildea and Hockenmaier (2003) report that using features extracted from a Combinatory Categorial Grammar (CCG) representation improves semantic labeling performance on core arguments. We evaluated features from a CCG parser combined with our baseline feature set. We used three features that were introduced by Gildea and Hockenmaier (2003): Phrase type - This is the category of the maximal projection between the two words - the predicate and the dependent word.</Paragraph>
      <Paragraph position="1"> Categorial Path - This is a feature formed by concatenating the following three values: i) category to which the dependent word belongs, ii) the direction of dependence and iii) the slot in the category filled by the dependent word.</Paragraph>
      <Paragraph position="2"> Tree Path - This is the categorial analogue of the path feature in the Charniak parse based system, which traces the path from the dependent word to the predicate through the binary CCG tree.</Paragraph>
      <Paragraph position="3"> Parallel to the hand-corrected TreeBank parses, we also had access to correct CCG parses derived from the TreeBank (Hockenmaier and Steedman, 2002a). We performed two sets of experiments.</Paragraph>
      <Paragraph position="4"> One using the correct CCG parses, and the other using parses obtained using StatCCG4 parser (Hockenmaier and Steedman, 2002). We incorporated these features in the systems based on hand-corrected TreeBank parses and Charniak parses respectively.</Paragraph>
      <Paragraph position="5"> For each constituent in the Charniak parse tree, if there was a dependency between the head word of the constituent and the predicate, then the corresponding CCG features for those words were added to the features for that constituent. Table 3 shows the performance of the system when these features were added. The corresponding baseline performances are mentioned in parentheses.</Paragraph>
    </Section>
    <Section position="2" start_page="582" end_page="583" type="sub_section">
      <SectionTitle>
3.2 Other Features
</SectionTitle>
      <Paragraph position="0"> We added several other features to the system. Position of the clause node (S, SBAR) seems to be  CCG features to the Baseline system.</Paragraph>
      <Paragraph position="1"> an important feature in argument identification (Hacioglu et al., 2004) therefore we experimented with four clause-based path feature variations. We added the predicate context to capture predicate sense variations. For some adjunctive arguments, punctuation plays an important role, so we added some punctuation features. All the new features are shown in</Paragraph>
    </Section>
  </Section>
  <Section position="9" start_page="583" end_page="583" type="metho">
    <SectionTitle>
CLAUSE-BASED PATH VARIATIONS:
</SectionTitle>
    <Paragraph position="0"> I. Replacing all the nodes in a path other than clause nodes with an &amp;quot;*&amp;quot;.</Paragraph>
    <Paragraph position="1"> For example, the path NP|S|VP|SBAR|NP|VP|VBD becomes NP|S|*S|*|*|VBD II. Retaining only the clause nodes in the path, which for the above example would produce NP|S|S|VBD, III. Adding a binary feature that indicates whether the constituent is in the same clause as the predicate, IV. collapsing the nodes between S nodes which gives NP|S|NP|VP|VBD. PATH N-GRAMS: This feature decomposes a path into a series of trigrams. For example, the path NP|S|VP|SBAR|NP|VP|VBD becomes: NP|S|VP, S|VP|SBAR, VP|SBAR|NP, SBAR|NP|VP, etc. We used the first ten trigrams as ten features. Shorter paths were padded with nulls.</Paragraph>
  </Section>
  <Section position="10" start_page="583" end_page="583" type="metho">
    <SectionTitle>
SINGLE CHARACTER PHRASE TAGS: Each phrase category is clustered
</SectionTitle>
    <Paragraph position="0"> to a category defined by the first character of the phrase label.</Paragraph>
    <Paragraph position="1"> PREDICATE CONTEXT: Two words and two word POS around the predicate and including the predicate were added as ten new features.</Paragraph>
    <Paragraph position="2"> PUNCTUATION: Punctuation before and after the constituent were added as two new features.</Paragraph>
    <Paragraph position="3"> FEATURE CONTEXT: Features for argument bearing constituents were added as features to the constituent being classified.</Paragraph>
  </Section>
  <Section position="11" start_page="583" end_page="584" type="metho">
    <SectionTitle>
4 Feature Selection and Calibration
</SectionTitle>
    <Paragraph position="0"> In the baseline system, we used the same set of features for all the n binary ONE VS ALL classifiers.</Paragraph>
    <Paragraph position="1"> Error analysis showed that some features specifically suited for one argument class, for example, core arguments, tend to hurt performance on some adjunctive arguments. Therefore, we thought that selecting subsets of features for each argument class might improve performance. To achieve this, we performed a simple feature selection procedure. For each argument, we started with the set of features introduced by (Gildea and Jurafsky, 2002). We pruned this set by training classifiers after leaving out one feature at a time and checking its performance on a development set. We used the kh2 significance while making pruning decisions. Following that, we added each of the other features one at a time to the pruned baseline set of features and selected ones that showed significantly improved performance. Since the feature selection experiments were computationally intensive, we performed them using 10k training examples.</Paragraph>
    <Paragraph position="2"> SVMs output distances not probabilities. These distances may not be comparable across classifiers, especially if different features are used to train each binary classifier. In the baseline system, we used the algorithm described by Platt (Platt, 2000) to convert the SVM scores into probabilities by fitting to a sigmoid. When all classifiers used the same set of features, fitting all scores to a single sigmoid was found to give the best performance. Since different feature sets are now used by the classifiers, we trained a separate sigmoid for each classifier.</Paragraph>
    <Paragraph position="3">  Foster and Stine (2004) show that the pooladjacent-violators (PAV) algorithm (Barlow et al., 1972) provides a better method for converting raw classifier scores to probabilities when Platt's algorithm fails. The probabilities resulting from either conversions may not be properly calibrated. So, we binned the probabilities and trained a warping function to calibrate them. For each argument classifier, we used both the methods for converting raw SVM scores into probabilities and calibrated them using a development set. Then, we visually inspected the calibrated plots for each classifier and chose the method that showed better calibration as the calibration procedure for that classifier. Plots of the predicted probabilities versus true probabilities for the ARGM-TMP VS ALL classifier, before and after calibration are shown in Figure 2. The performance improvement over a classifier that is trained using all the features for all the classes is shown in Table 5.</Paragraph>
    <Paragraph position="4"> Table 6 shows the performance of the system after adding the CCG features, additional features ex-</Paragraph>
  </Section>
  <Section position="12" start_page="584" end_page="585" type="metho">
    <SectionTitle>
5 Alternative Syntactic Views
</SectionTitle>
    <Paragraph position="0"> Adding new features can improve performance when the syntactic representation being used for classification contains the correct constituents. Additional features can't recover from the situation where the parse tree being used for classification doesn't contain the correct constituent representing an argument. Such parse errors account for about 7% absolute of the errors (or, about half of 12.7%) for the Charniak parse based system. To address these errors, we added two additional parse representations: i) Minipar dependency parser, and ii) chunking parser (Hacioglu et al., 2004). The hope is that these parsers will produce different errors than the Charniak parser since they represent different syntactic views. The Charniak parser is trained on the Penn TreeBank corpus. Minipar is a rule based dependency parser. The chunking parser is trained on PropBank and produces a flat syntactic representation that is very different from the full parse tree produced by Charniak. A combination of the three different parses could produce better results than any single one.</Paragraph>
    <Section position="1" start_page="584" end_page="585" type="sub_section">
      <SectionTitle>
5.1 Minipar-based Semantic Labeler
</SectionTitle>
      <Paragraph position="0"> Minipar (Lin, 1998; Lin and Pantel, 2001) is a rule-based dependency parser. It outputs dependencies between a word called head and another called modifier. Each word can modify at most one word. The dependency relationships form a dependency tree.</Paragraph>
      <Paragraph position="1"> The set of words under each node in Minipar's dependency tree form a contiguous segment in the original sentence and correspond to the constituent in a constituent tree. We formulate the semantic labeling problem in the same way as in a constituent structure parse, except we classify the nodes that represent head words of constituents. A similar formulation using dependency trees derived from Tree-Bank was reported in Hacioglu (Hacioglu, 2004).</Paragraph>
      <Paragraph position="2"> In that experiment, the dependency trees were derived from hand-corrected TreeBank trees using head word rules. Here, an SVM is trained to assign PropBank argument labels to nodes in Minipar dependency trees using the following features: Table 8 shows the performance of the Minipar-based semantic parser.</Paragraph>
      <Paragraph position="3"> Minipar performance on the PropBank corpus is substantially worse than the Charniak based system.</Paragraph>
      <Paragraph position="4"> This is understandable from the fact that Minipar is not designed to produce constituents that would exactly match the constituent segmentation used in TreeBank. In the test set, about 37% of the argu- null PREDICATE LEMMA HEAD WORD: The word representing the node in the dependency tree. HEAD WORD POS: Part of speech of the head word.</Paragraph>
      <Paragraph position="5"> POS PATH: This is the path from the predicate to the head word through the dependency tree connecting the part of speech of each node in the tree. DEPENDENCY PATH: Each word that is connected to the head  word has a particular dependency relationship to the word. These are represented as labels on the arc between the words. This feature is the dependencies along the path that connects two words.</Paragraph>
    </Section>
  </Section>
  <Section position="13" start_page="585" end_page="585" type="metho">
    <SectionTitle>
VOICE
POSITION
</SectionTitle>
    <Paragraph position="0"> ments do not have corresponding constituents that match its boundaries. In experiments reported by Hacioglu (Hacioglu, 2004), a mismatch of about 8% was introduced in the transformation from hand-corrected constituent trees to dependency trees. Using an errorful automatically generated tree, a still higher mismatch would be expected. In case of the CCG parses, as reported by Gildea and Hockenmaier (2003), the mismatch was about 23%. A more realistic way to score the performance is to score tags assigned to head words of constituents, rather than considering the exact boundaries of the constituents as reported by Gildea and Hockenmaier (2003). The results for this system are shown in Table 9.</Paragraph>
    <Section position="1" start_page="585" end_page="585" type="sub_section">
      <SectionTitle>
5.2 Chunk-based Semantic Labeler
</SectionTitle>
      <Paragraph position="0"> Hacioglu has previously described a chunk based semantic labeling method (Hacioglu et al., 2004). This system uses SVM classifiers to first chunk input text into flat chunks or base phrases, each labeled with a syntactic tag. A second SVM is trained to assign semantic labels to the chunks. The system is trained on the PropBank training data.</Paragraph>
    </Section>
  </Section>
  <Section position="14" start_page="585" end_page="585" type="metho">
    <SectionTitle>
WORDS
PREDICATE LEMMAS
PART OF SPEECH TAGS
</SectionTitle>
    <Paragraph position="0"> BP POSITIONS: The position of a token in a BP using the IOB2 representation (e.g. B-NP, I-NP, O, etc.) CLAUSE TAGS: The tags that mark token positions in a sentence with respect to clauses.</Paragraph>
    <Paragraph position="1"> NAMED ENTITIES: The IOB tags of named entities.</Paragraph>
    <Paragraph position="2"> TOKEN POSITION: The position of the phrase with respect to the predicate. It has three values as &amp;quot;before&amp;quot;, &amp;quot;after&amp;quot; and &amp;quot;-&amp;quot; (for the predicate) PATH: It defines a flat path between the token and the predicate CLAUSE BRACKET PATTERNS CLAUSE POSITION: A binary feature that identifies whether the token is inside or outside the clause containing the predicate HEADWORD SUFFIXES: suffixes of headwords of length 2, 3 and 4. DISTANCE: Distance of the token from the predicate as a number of base phrases, and the distance as the number of VP chunks. LENGTH: the number of words in a token.</Paragraph>
    <Paragraph position="3"> PREDICATE POS TAG: the part of speech category of the predicate PREDICATE FREQUENCY: Frequent or rare using a threshold of 3. PREDICATE BP CONTEXT: The chain of BPs centered at the predicate  within a window of size -2/+2.</Paragraph>
    <Paragraph position="4"> PREDICATE POS CONTEXT: POS tags of words immediately preceding and following the predicate.</Paragraph>
  </Section>
  <Section position="15" start_page="585" end_page="585" type="metho">
    <SectionTitle>
PREDICATE ARGUMENT FRAMES: Left and right core argument patterns
</SectionTitle>
    <Paragraph position="0"> around the predicate.</Paragraph>
  </Section>
  <Section position="16" start_page="585" end_page="586" type="metho">
    <SectionTitle>
NUMBER OF PREDICATES: This is the number of predicates in the sentence.
</SectionTitle>
    <Paragraph position="0"> For each token (base phrase) to be tagged, a set of features is created from a fixed size context that surrounds each token. In addition to the above features, it also uses previous semantic tags that have already been assigned to the tokens contained in the linguistic context. A 5-token sliding window is used for the context.</Paragraph>
    <Paragraph position="2"> combined task of Id. and classification.</Paragraph>
    <Paragraph position="3"> SVMs were trained for begin (B) and inside (I) classes of all arguments and outside (O) class for a total of 78 one-vs-all classifiers. Again, TinySVM5 along with YamCha6 (Kudo and Matsumoto, 2000; Kudo and Matsumoto, 2001) are used as the SVM training and test software.</Paragraph>
    <Paragraph position="4"> Table 11 presents the system performances on the PropBank test set for the chunk-based system.</Paragraph>
  </Section>
  <Section position="17" start_page="586" end_page="587" type="metho">
    <SectionTitle>
6 Combining Semantic Labelers
</SectionTitle>
    <Paragraph position="0"> We combined the semantic parses as follows: i) scores for arguments were converted to calibrated probabilities, and arguments with scores below a threshold value were deleted. Separate thresholds were used for each parser. ii) For the remaining arguments, the more probable ones among overlapping ones were selected. In the chunked system, an argument could consist of a sequence of chunks.</Paragraph>
    <Paragraph position="1"> The probability assigned to the begin tag of an argument was used as the probability of the sequence of chunks forming an argument. Table 12 shows the performance improvement after the combination. Again, numbers in parentheses are respective baseline performances.</Paragraph>
    <Paragraph position="2">  mance on argument identification and argument identification and classification tasks after combining all three semantic parses.</Paragraph>
    <Paragraph position="3"> The main contribution of combining both the Minipar based and the Charniak-based parsers was significantly improved performance on ARG1 in addition to slight improvements to some other arguments. Table 13 shows the effect on selected arguments on sentences that were altered during the the combination of Charniak-based and Chunk-based parses.</Paragraph>
    <Paragraph position="4">  changed during pair-wise Charniak and Chunk combination. null A marked increase in number of propositions for which all the arguments were identified correctly from 0% to about 46% can be seen. Relatively few predicates, 107 out of 4500, were affected by this combination.</Paragraph>
    <Paragraph position="5"> To give an idea of what the potential improvements of the combinations could be, we performed an oracle experiment for a combined system that tags head words instead of exact constituents as we did in case of Minipar-based and Charniak-based semantic parser earlier. In case of chunks, first word in prepositional base phrases was selected as the head word, and for all other chunks, the last word was selected to be the head word. If the correct argument was found present in either the Charniak, Minipar or Chunk hypotheses then that was selected. The results for this are shown in Table 14. It can be seen that the head word based performance almost approaches the constituent based performance reported on the hand-corrected parses in Table 3 and there seems to be considerable scope for improvement.</Paragraph>
    <Paragraph position="6">  based scoring after oracle combination. Charniak (C), Minipar (M) and Chunker (CH).</Paragraph>
    <Paragraph position="7"> Table 15 shows the performance improvement in the actual system for pairwise combination of the parsers and one using all three.</Paragraph>
    <Paragraph position="8">  based scoring after combination. Charniak (C), Minipar (M) and Chunker (CH).</Paragraph>
  </Section>
class="xml-element"></Paper>