<?xml version="1.0" standalone="yes"?>
<Paper uid="W04-2421">
  <Title>Semantic Role Labeling Via Generalized Inference Over Classifiers</Title>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
2 SNoW Learning Architecture
</SectionTitle>
    <Paragraph position="0"> The learning algorithm used is a variation of the Winnow update rule incorporated in SNoW (Roth, 1998; Roth and Yih, 2002), a multi-class classifier that is specifically tailored for large-scale learning tasks. SNoW learns a sparse network of linear functions, in which the targets (phrase border predictions or argument type predictions, in this case) are represented as linear functions over a common feature space. It incorporates several improvements over the basic Winnow update rule. In particular, a regularization term is added, which has the effect of trying to separate the data with a thick separator (Grove and Roth, 2001; Hang et al., 2002). In the work presented here we use this regularization with a fixed parameter.</Paragraph>
    <Paragraph position="1"> Experimental evidence has shown that SNoW activations are monotonic with the confidence in the prediction. Therefore, they can provide a good source of probability estimates. We use softmax (Bishop, 1995) over the raw activation values to obtain conditional probabilities. Specifically, suppose the number of classes is n, and the raw activation value of class i is $\mathrm{act}_i$. The posterior estimate for class i is derived by the following equation: $p_i = e^{\mathrm{act}_i} / \sum_{j=1}^{n} e^{\mathrm{act}_j}$.</Paragraph>
    <Paragraph position="3"/>
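    <Paragraph> The softmax step described above can be sketched in a few lines of Python. This is only an illustration: the function name and the activation values are hypothetical, and subtracting the maximum activation is a standard numerical-stability trick that is not mentioned in the text.

```python
import math


def softmax(activations):
    """Turn raw activation values into a probability distribution."""
    # Subtracting the largest activation avoids overflow in exp()
    # and leaves the resulting probabilities unchanged.
    peak = max(activations)
    exps = [math.exp(a - peak) for a in activations]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw activations for three classes.
probs = softmax([2.0, 1.0, 0.1])
print(probs)
```

 Because softmax is monotonic in each activation, the ranking of the classes is preserved while the values become comparable across examples.</Paragraph>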
  </Section>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3 First Phase: Find Argument Candidates
</SectionTitle>
    <Paragraph position="0"> The first phase is to predict the phrases of a given sentence that correspond to some argument (given the verb).</Paragraph>
    <Paragraph position="1"> Unfortunately, it turns out that it is difficult to predict the exact phrases accurately. Therefore, the goal of the first phase is to output a superset of the correct phrases by filtering out unlikely candidates.</Paragraph>
    <Paragraph position="2"> Specifically, we learn two classifiers, one to detect beginning phrase locations and a second to detect end phrase locations. Each multi-class classifier makes predictions over forty-three classes: thirty-two argument types, ten continuous argument types, and one null class (not-begin for the first classifier, not-end for the second). The following features are used:
 * Word feature includes the current word, two words before and two words after.</Paragraph>
    <Paragraph position="3"> * Part-of-speech tag (POS) feature includes the POS tags of the current word, two words before and after.</Paragraph>
    <Paragraph position="4"> * Chunk feature includes the BIO tags for chunks of the current word, two words before and after.</Paragraph>
    <Paragraph position="5"> * Predicate lemma &amp; POS tag show the lemma form and POS tag of the active predicate.</Paragraph>
    <Paragraph position="6"> * Voice feature indicates the voice (active/passive) of the current predicate. This is extracted with a simple rule: a verb is identified as passive if it follows a to-be verb in the same phrase chunk and its POS tag is VBN (past participle) or it immediately follows a noun phrase.</Paragraph>
    <Paragraph position="7"> * Position feature describes if the current word is before or after the predicate.</Paragraph>
    <Paragraph position="8"> * Chunk pattern feature encodes the sequence of chunks from the current word to the predicate.</Paragraph>
    <Paragraph position="9"> * Clause tag indicates the boundary of clauses.</Paragraph>
    <Paragraph position="10"> * Clause path feature is a path formed from a semi-parsed tree containing only clauses and chunks.</Paragraph>
    <Paragraph position="11"> Each clause is named with the chunk immediately preceding it. The clause path is the path from predicate to target word in the semi-parsed tree.</Paragraph>
    <Paragraph position="12"> * Clause position feature is the position of the target word relative to the predicate in the semi-parsed tree containing only clauses. Specifically, there are four configurations: the target word and the predicate share the same parent, the parent of the target word is an ancestor of the predicate, the parent of the predicate is an ancestor of the target word, or otherwise.</Paragraph>
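    <Paragraph> As an illustration of the window-based features in the list above (word, POS tag and chunk tag of the current word and two words on each side), the following sketch builds string features for one position. The function name, the feature string format and the PAD symbol for positions outside the sentence are assumptions made for this example, not details from the system described here.

```python
def window_features(tokens, index, prefix, size=2):
    """Features for tokens[index] and `size` neighbours on each side."""
    features = []
    for offset in range(-size, size + 1):
        pos = index + offset
        # Positions that fall outside the sentence get a padding symbol.
        inside = pos >= 0 and len(tokens) > pos
        value = tokens[pos] if inside else "PAD"
        features.append(f"{prefix}[{offset:+d}]={value}")
    return features


# Hypothetical sentence with its POS tags.
words = ["He", "ate", "fish"]
tags = ["PRP", "VBD", "NN"]
print(window_features(words, 0, "w") + window_features(tags, 0, "pos"))
```

 The same helper can be reused for chunk BIO tags by passing the tag sequence and a different prefix.</Paragraph>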
    <Paragraph position="13"> Because each phrase consists of a single beginning and a single ending, these classifiers can be used to construct a set of potential phrases (by combining each predicted begin with each predicted end after it of the same type). Although the outputs of this phase are potential argument candidates, along with their types, the second phase re-scores the arguments using all possible types. After eliminating the types from consideration, the first phase achieves 98.96% and 88.65% recall (overall, without verb) on the training and the development set, respectively. Because these are the only candidates that are passed to the second phase, 88.65% is an upper bound of the recall for our overall system.</Paragraph>
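    <Paragraph> The construction of candidate phrases from begin and end predictions can be sketched as follows. The data layout (lists of word index and argument type pairs) and the function name are hypothetical, and the sketch assumes that a one-word phrase, whose begin and end fall on the same word, is allowed.

```python
def build_candidates(begins, ends):
    """Pair each predicted begin with every end of the same type at or after it.

    begins, ends: lists of (word_index, argument_type) predictions.
    Returns a sorted list of (start, end, argument_type) spans.
    """
    candidates = set()
    for start, begin_type in begins:
        for end, end_type in ends:
            if begin_type == end_type and end >= start:
                candidates.add((start, end, begin_type))
    return sorted(candidates)


# Hypothetical predictions from the two classifiers.
begins = [(0, "A0"), (3, "A1"), (5, "A1")]
ends = [(1, "A0"), (4, "A1"), (6, "A1")]
print(build_candidates(begins, ends))
# [(0, 1, 'A0'), (3, 4, 'A1'), (3, 6, 'A1'), (5, 6, 'A1')]
```

 Note that overlapping spans such as (3, 4) and (3, 6) are both kept at this stage; only the spans, not their types, are carried forward to be re-scored.</Paragraph>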
  </Section>
  <Section position="5" start_page="0" end_page="0" type="metho">
    <SectionTitle>
4 Second Phase: Phrase Classification
</SectionTitle>
    <Paragraph position="0"> The second phase of our system assigns the final argument classes to (a subset of) the phrases supplied from the first phase. This task is accomplished in two steps. First, a multi-class classifier is used to supply confidence scores corresponding to how likely individual phrases are to have specific argument types. Then we look for the most likely solution over the whole sentence, given the matrix of confidences and linguistic information that serves as a set of global constraints over the solution space.</Paragraph>
    <Paragraph position="1"> Again, the SNoW learning architecture is used to train a multi-class classifier to label each phrase with one of the argument types or with a special class, no argument.</Paragraph>
    <Paragraph position="2"> Training examples are created from the phrase candidates supplied from the first phase using the following features:
 * Predicate lemma &amp; POS tag, voice, position, clause path, clause position, chunk pattern: the same features as in the first phase.</Paragraph>
    <Paragraph position="3"> * Word &amp; POS tag from the phrase, including the first/last word and tag, and the head word (see footnote 1).</Paragraph>
    <Paragraph position="4"> * Named entity feature tells if the target phrase is, embeds, overlaps, or is embedded in a named entity.</Paragraph>
    <Paragraph position="5"> * Chunk features are the same as named entity (but with chunks, e.g. noun phrases).</Paragraph>
    <Paragraph position="6"> * Length of the target phrase, in number of words and chunks.</Paragraph>
    <Paragraph position="7"> * Verb class feature is the class of the active predicate described in the frame files.</Paragraph>
    <Paragraph position="8"> * Phrase type uses simple heuristics to identify the target phrase as VP, PP, or NP.</Paragraph>
    <Paragraph position="9"> * Sub-categorization describes the phrase structure around the predicate. We split the clause containing the predicate into three parts: the predicate chunk and the segments before and after the predicate. The sequence of the phrase types of these three segments is our feature.</Paragraph>
    <Paragraph position="10"> * Baseline follows the rule of identifying AM-NEG and AM-MOD and uses them as features.</Paragraph>
    <Paragraph position="11"> * Clause coverage describes how much of the local clause (from the predicate) is covered by the target phrase.</Paragraph>
    <Paragraph position="12"> * Chunk pattern length feature counts the number of patterns in the phrase.</Paragraph>
    <Paragraph position="13"> * Conjunctions join every pair of the above features as new features.</Paragraph>
    <Paragraph position="14"> * Boundary words &amp; POS tags include one or two words/tags before and after the target phrase.</Paragraph>
    <Paragraph position="15"> 1 We use simple rules to first decide if a candidate phrase type is VP, NP, or PP. The headword of an NP phrase is the right-most noun. Similarly, the left-most verb/preposition of a VP/PP phrase is extracted as the headword.
 * Bigrams are pairs of words/tags in the window from two words before the target to the first word of the target, and also from the last word to two words after the phrase.</Paragraph>
    <Paragraph position="16"> * Sparse collocation picks one word/tag from the two words before the phrase, the first word/tag, the last word/tag of the phrase, and one word/tag from the two words after the phrase to join as features.</Paragraph>
    <Paragraph position="17"> Alternatively, we could have derived a scoring function from the first phase confidences of the open and closed predictors for each argument type. This method has proved useful in the literature for shallow parsing (Punyakanok and Roth, 2001). However, we expected additional global features of the phrase to be necessary, given the variety and complexity of the argument types. See Table 1 for a comparison.</Paragraph>
    <Paragraph position="18"> Formally (but very briefly), the phrase classifier is attempting to assign labels to a set of phrases, $S_{1:M}$, indexed from 1 to M. Each phrase $S_i$ can take any label from a set of phrase labels, P, and the indexed set of phrases can take a set of labels, $s_{1:M} \in P^M$. If we assume that the classifier returns a score, $\mathrm{score}(S_i = s_i)$, corresponding to the likelihood of seeing label $s_i$ for phrase $S_i$, then, given a sentence, the unaltered inference task that is solved by our system maximizes the score of</Paragraph>
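    <Paragraph> Written out, and assuming that the score of a full labeling decomposes into the sum of the per-phrase scores (an assumption made here, consistent with the linear formulation used later in this section), the unconstrained objective could take the following form. The hat notation for the selected labeling is also an assumption.

```latex
\[
\hat{s}_{1:M} \;=\; \arg\max_{s_{1:M} \in P^M} \; \sum_{i=1}^{M} \mathrm{score}(S_i = s_i)
\]
```
</Paragraph>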
    <Paragraph position="20"> The second step for phrase identification is eliminating labelings using global constraints derived from linguistic information and structural considerations. Specifically, we limit the solution space through the use of a filter function, F, that eliminates many phrase labelings from consideration. It is interesting to contrast this with previous work that filters individual phrases (see Carreras and Màrquez, 2003). Here, we are concerned with global constraints as well as constraints on the phrases. Therefore, the final labeling becomes</Paragraph>
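    <Paragraph> As a sketch, assuming the labeling score is the sum of the per-phrase scores and writing $F(P^M)$ for the subset of $P^M$ that the filter function F keeps (both the additive form and this notation are assumptions made here), the constrained objective could be written as:

```latex
\[
\hat{s}_{1:M} \;=\; \arg\max_{s_{1:M} \in F(P^M)} \; \sum_{i=1}^{M} \mathrm{score}(S_i = s_i)
\]
```

 The only change from the unconstrained task is the search space: labelings rejected by F are never scored.</Paragraph>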
    <Paragraph position="22"> The filter function used considers the following constraints:
 1. Arguments cannot cover the predicate except those that contain only the verb or the verb and the following word.</Paragraph>
    <Paragraph position="23"> 2. Arguments cannot overlap with the clauses (they can be embedded in one another).</Paragraph>
    <Paragraph position="24"> 3. If a predicate is outside a clause, its arguments cannot be embedded in that clause.</Paragraph>
    <Paragraph position="25"> 4. No overlapping or embedding phrases.</Paragraph>
    <Paragraph position="26"> 5. No duplicate argument classes for A0-A5, V.
 6. Exactly one V argument per sentence.</Paragraph>
    <Paragraph position="27"> 7. If there is C-V, then there has to be a V-A1-CV pattern.
 8. If there is an R-XXX argument, then there has to be an XXX argument.</Paragraph>
    <Paragraph position="28"> 9. If there is a C-XXX argument, then there has to be an XXX argument; in addition, the C-XXX argument must occur after XXX.</Paragraph>
    <Paragraph position="29"> 10. Given the predicate, some argument classes are illegal (e.g. predicate 'stalk' can take only A0 or A1).
 Constraint 1 is valid because all the arguments of a predicate must lie outside the predicate. The exception is for the boundary of the predicate itself. Constraints 1 through 3 can be evaluated on a per-phrase basis and thus can be applied to the individual phrases at any time. For efficiency's sake, we eliminate these even before the second phase scoring begins. Constraints 5, 8, and 9 are valid for only a subset of the arguments.</Paragraph>
    <Paragraph position="30"> These constraints are easy to transform into linear constraints (for example, for each class c, constraint 5 becomes $\sum_{i=1}^{M} [S_i = c] \le 1$).2 Then the optimum solution of the cost function given in Equation 2 can be found by integer linear programming.3 A similar method was used for entity/relation recognition (Roth and Yih, 2004). Almost all previous work on shallow parsing and phrase classification has used Constraint 4 to ensure that there are no overlapping phrases. By considering additional constraints, we show improved performance (see</Paragraph>
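    <Paragraph> To make the constrained inference concrete, the following sketch enumerates every labeling of a small candidate set, discards those rejected by a filter, and keeps the highest-scoring survivor. It is a brute-force stand-in for the integer linear program, not the method used in the system, and it is only practical for a handful of phrases. The filter covers constraints 4, 5 and 8 only; the NULL label for the no-argument class, the additive score, and all names and numbers are assumptions for illustration.

```python
from itertools import product

CORE = {"A0", "A1", "A2", "A3", "A4", "A5", "V"}


def spans_overlap(a, b):
    """True if two inclusive (start, end) spans share at least one word."""
    return not (a[0] > b[1] or b[0] > a[1])


def is_legal(spans, labels):
    """Filter function covering constraints 4, 5 and 8 only."""
    used = [(s, l) for s, l in zip(spans, labels) if l != "NULL"]
    # Constraint 4: labeled phrases may not overlap or embed one another.
    for i in range(len(used)):
        for j in range(i + 1, len(used)):
            if spans_overlap(used[i][0], used[j][0]):
                return False
    names = [l for _, l in used]
    # Constraint 5: no duplicate core argument classes.
    if any(names.count(c) > 1 for c in CORE):
        return False
    # Constraint 8: an R-XXX argument requires an XXX argument.
    return all(l[2:] in names for l in names if l.startswith("R-"))


def best_labeling(spans, scores, label_set):
    """Exhaustively search the filtered labelings for the maximum total score."""
    best, best_score = None, float("-inf")
    for labels in product(label_set, repeat=len(spans)):
        if not is_legal(spans, labels):
            continue
        total = sum(scores[i][l] for i, l in enumerate(labels))
        if total > best_score:
            best, best_score = labels, total
    return best, best_score


# Hypothetical candidates and classifier confidences.
spans = [(0, 1), (3, 4), (3, 6)]
scores = [
    {"A0": 0.8, "A1": 0.1, "NULL": 0.1},
    {"A0": 0.1, "A1": 0.6, "NULL": 0.3},
    {"A0": 0.1, "A1": 0.7, "NULL": 0.2},
]
labels, total = best_labeling(spans, scores, ["A0", "A1", "NULL"])
print(labels, total)
```

 In this example the second and third candidates overlap, so the filter forces one of them to NULL even though each would be labeled A1 if scored independently.</Paragraph>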
  </Section>
</Paper>