<?xml version="1.0" standalone="yes"?> <Paper uid="W06-1617"> <Title>Semantic Role Labeling of NomBank: A Maximum Entropy Approach</Title> <Section position="4" start_page="138" end_page="138" type="metho"> <SectionTitle> 2 Overview of NomBank </SectionTitle> <Paragraph position="0"> The NomBank (Meyers et al., 2004c; Meyers et al., 2004b) annotation project originated from the NOMLEX (Macleod et al., 1997; Macleod et al., 1998) nominalization lexicon developed under the New York University Proteus Project. NOMLEX lists 1,000 nominalizations and the correspondences between their arguments and the arguments of their verb counterparts. NomBank frames combine various lexical resources (Meyers et al., 2004a), including an extended NOMLEX and PropBank frames, and form the basis for annotating the argument structures of common nouns.</Paragraph> <Paragraph position="1"> Similar to PropBank, NomBank annotation is made on the Penn TreeBank II (PTB II) corpus.</Paragraph> <Paragraph position="2"> For each common noun in PTB II that takes arguments, its core arguments are labeled with ARG0, ARG1, etc., and its modifying arguments are labeled with ARGM-LOC to denote location, ARGM-MNR to denote manner, etc. Annotations are made on PTB II parse tree nodes, and argument boundaries align with the spans of parse tree nodes.</Paragraph> <Paragraph position="3"> A sample sentence and its parse tree labeled in the style of NomBank are shown in Figure 1.</Paragraph> <Paragraph position="4"> For the nominal predicate &quot;replacement&quot;, &quot;Ben Bernanke&quot; is labeled as ARG0 and &quot;Greenspan 's&quot; is labeled as ARG1. There is also the special label &quot;Support&quot; on &quot;nominated&quot;, which introduces &quot;Ben Bernanke&quot; as an argument of &quot;replacement&quot;. The support construct will be explained in detail in Section 4.2.3.</Paragraph> <Paragraph position="5"> We are not aware of any NomBank-based automatic SRL systems. The work in (Pradhan et al., 2004) experimented with an automatic SRL system developed using a relatively small set of manually selected nominalizations from FrameNet and the Penn Chinese TreeBank. The SRL accuracy of their system is not directly comparable to ours.</Paragraph> </Section> <Section position="5" start_page="138" end_page="139" type="metho"> <SectionTitle> 3 Model training and testing </SectionTitle> <Paragraph position="0"> We treat the NomBank-based SRL task as a classification problem and divide it into two phases: argument identification and argument classification. During the argument identification phase, each parse tree node is marked as either argument or non-argument. Each node marked as argument is then labeled with a specific class during the argument classification phase. The identification model is a binary classifier, while the classification model is a multi-class classifier.</Paragraph> <Paragraph position="1"> Opennlp maxent, an implementation of Maximum Entropy (ME) modeling, is used as the classification tool. Since its introduction to the Natural Language Processing (NLP) community (Berger et al., 1996), ME-based classifiers have been shown to be effective in various NLP tasks. ME modeling is based on the insight that the best model is consistent with the set of constraints imposed and otherwise as uniform as possible. ME models the probability of label l given input x as in Equation 1:
p(l|x) = \frac{1}{Z_x} \exp\left( \sum_{i=1}^{n} \lambda_i f_i(l,x) \right)   (1)
f_i(l,x) is a feature function that maps label l and input x to either 0 or 1, the summation is over all n feature functions, \lambda_i is the weight parameter of feature function f_i(l,x), and Z_x is a normalization factor. In the identification model, label l corresponds to either &quot;argument&quot; or &quot;non-argument&quot;, and in the classification model, label l corresponds to one of the specific NomBank argument classes. The classification output is the label l with the highest conditional probability p(l|x).</Paragraph>
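As an informal illustration of this classification setup (a minimal sketch, not the authors' implementation, which uses Opennlp maxent), the following Python fragment trains a multinomial logistic regression classifier, which has the same functional form as the conditional ME model in Equation 1, over binary indicator features. The feature names, feature values, and toy training examples are invented for illustration.

# Sketch: ME-style argument classification with binary indicator features.
# scikit-learn's LogisticRegression stands in for Opennlp maxent; all feature
# names, values, and labels below are invented toy examples, not Table 1.
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

# Each candidate parse tree node is described by string-valued features of the
# kind listed in Table 1 (predicate, phrase type, head word, syntactic path, ...).
train_x = [
    {"pred": "replacement", "phrase_type": "NP", "head_word": "Bernanke", "path": "NP^S^VP^NP"},
    {"pred": "replacement", "phrase_type": "NP", "head_word": "Greenspan", "path": "NP^NP"},
    {"pred": "replacement", "phrase_type": "PP", "head_word": "in", "path": "PP^NP"},
]
train_y = ["ARG0", "ARG1", "ARGM-LOC"]

vec = DictVectorizer()                    # each feature=value pair becomes a 0/1 indicator f_i(l,x)
model = LogisticRegression(max_iter=1000)
model.fit(vec.fit_transform(train_x), train_y)

# The classification output is the label with the highest conditional probability p(l|x).
probs = model.predict_proba(vec.transform([train_x[0]]))[0]
print(max(zip(model.classes_, probs), key=lambda pair: pair[1]))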
<Paragraph position="3"> To train the ME-based identification model, training data is gathered by treating each parse tree node that is an argument as a positive example and the rest as negative examples. Classification training data is generated from argument nodes only.</Paragraph> <Paragraph position="4"> During testing, the algorithm of (Toutanova et al., 2005) for enforcing non-overlapping arguments is used. The algorithm maximizes the log-probability of the entire NomBank-labeled parse tree. Specifically, assuming we only have the two classes &quot;ARG&quot; and &quot;NONE&quot;, the maximum log-probability of a NomBank-labeled parse tree is defined by Equation 2.</Paragraph> <Paragraph position="5"> Max(T) = \max\left( ARG(T) + \sum_{child} NONETree(child),\; NONE(T) + \sum_{child} Max(child) \right)   (2)</Paragraph> <Paragraph position="6"> Max(T) is the maximum log-probability of a tree T, NONE(T) and ARG(T) are respectively the log-probabilities of our argument identification model assigning the labels &quot;NONE&quot; and &quot;ARG&quot; to tree node T, child ranges over each of T's children, and NONETree(child) is the log-probability of node child and every node it dominates being labeled &quot;NONE&quot;. Details are presented in Algorithm 1.</Paragraph> <Paragraph position="7"> Algorithm 1 Maximizing the probability of an SRL tree.
Input: p {syntactic parse tree}; m {argument identification model, which assigns each constituent in the parse tree a log likelihood of being a semantic argument}.
Output: score {maximum log likelihood of the parse tree p with arguments identified using model m}.
MLParse(p, m)
  if parse p is a leaf node then
    return max(Score(p, m, ARG), Score(p, m, NONE))
  else
    ARGscore := Score(p, m, ARG); NONEscore := Score(p, m, NONE)
    for each node ci in Children(p) do
      ARGscore := ARGscore + NONEParse(ci, m)
      NONEscore := NONEscore + MLParse(ci, m)
    end for
    return max(ARGscore, NONEscore)
  end if
NONEParse(p, m)
  NONEscore := Score(p, m, NONE)
  if parse p is a leaf node then
    return NONEscore
  else
    for each node ci in Children(p) do
      NONEscore := NONEscore + NONEParse(ci, m)
    end for
    return NONEscore
  end if
Subroutine: Children(p) returns the list of children nodes of p. Score(p, m, state) returns the log likelihood assigned by model m to parse node p with state, where state is either ARG or NONE.</Paragraph>
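Equation 2 and Algorithm 1 amount to a simple recursion over the parse tree. The following Python code is an illustrative sketch of that recursion under assumed data structures (a minimal Node class and a score(node, state) callback standing in for the log likelihoods of the identification model); it is not the authors' implementation.

import math

class Node:
    # Minimal parse tree node; only the children list is needed by the algorithm.
    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)

def none_tree(node, score):
    # Log-probability of labeling node and every node it dominates as NONE.
    total = score(node, "NONE")
    for child in node.children:
        total += none_tree(child, score)
    return total

def ml_parse(node, score):
    # Maximum log-probability of the subtree rooted at node, as in Equation 2.
    if not node.children:  # leaf node
        return max(score(node, "ARG"), score(node, "NONE"))
    # Case 1: node is an argument, so every node it dominates must be NONE
    # (this enforces non-overlapping arguments).
    arg_score = score(node, "ARG") + sum(none_tree(c, score) for c in node.children)
    # Case 2: node is NONE, and each child subtree is maximized independently.
    none_score = score(node, "NONE") + sum(ml_parse(c, score) for c in node.children)
    return max(arg_score, none_score)

# Toy usage with a dummy scoring function standing in for the ME model.
tree = Node("NP", [Node("NNP"), Node("NNP")])
dummy_score = lambda node, state: math.log(0.6) if state == "NONE" else math.log(0.4)
print(ml_parse(tree, dummy_score))

Extending the recursion from the two classes ARG and NONE to the full set of argument labels gives the multi-class version used later during combined identification and classification.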
<Paragraph position="12"> NomBank sections 02-21 are used as training data, and sections 24 and 23 are used as development and test data, respectively.</Paragraph> <Section position="1" start_page="139" end_page="139" type="sub_section"> <SectionTitle> 3.1 Training data preprocessing </SectionTitle> <Paragraph position="0"> Unlike PropBank annotation, which does not contain overlapping arguments (in the form of parse tree node domination) and does not allow predicates to be dominated by arguments, the NomBank annotation in the September 2005 release contains such cases. In NomBank sections 02-21, about 0.6% of the argument nodes dominate some other argument node or the predicate. To simplify our task, during training example generation we ignore arguments that dominate the predicate. We also ignore arguments that are dominated by other arguments, so that when argument domination occurs, only the argument with the largest word span is kept. We do not perform similar pruning on the test data.</Paragraph> </Section> </Section> <Section position="6" start_page="139" end_page="142" type="metho"> <SectionTitle> 4 Features and feature selection </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="139" end_page="139" type="sub_section"> <SectionTitle> 4.1 Baseline NomBank SRL features </SectionTitle> <Paragraph position="0"> Table 1 lists the baseline features we adapted from previous PropBank-based SRL systems (Pradhan et al., 2005; Xue and Palmer, 2004). For ease of description, related features are grouped, and each specific feature is given an individual reference name. For example, feature b11FW in group b11 denotes the first word spanned by the constituent, and b13LH denotes the left sister's head word. We also experimented with various feature combinations, inspired by the features used in (Xue and Palmer, 2004). These are listed as features b31 to b34 in Table 1.</Paragraph> <Paragraph position="1"> Suppose the current constituent under identification or classification is &quot;NP-Ben Bernanke&quot; in Figure 1; the baseline features are instantiated for this constituent accordingly, and some features fail to instantiate.</Paragraph> </Section> <Section position="2" start_page="139" end_page="141" type="sub_section"> <SectionTitle> 4.2 NomBank-specific features </SectionTitle> <Paragraph position="0"> 4.2.1 NomBank predicate morphology and class The &quot;NomBank-morph&quot; dictionary provided by the current NomBank release maps the base form of a noun to its various morphological forms. Besides singular-plural noun form mappings, it also maps base nouns to hyphenated and compound nouns. For example, &quot;healthcare&quot; and &quot;medical care&quot; both map to &quot;care&quot;. For NomBank SRL features, we use this set of more specific mappings to replace the morphological mappings based on WordNet. Specifically, we replace feature b1 in Table 1 with feature a1 in Table 3.</Paragraph> <Paragraph position="1"> The current NomBank release also contains the &quot;NOMLEX-PLUS&quot; dictionary, which gives the class of each nominal predicate according to its origin and the role it plays. For example, &quot;employment&quot; originates from the verb &quot;employ&quot; and is classified as &quot;VERB-NOM&quot;, while the nouns &quot;employer&quot; and &quot;employee&quot; are classified as &quot;SUBJECT&quot; and &quot;OBJECT&quot;, respectively. Other classes include &quot;ADJ-NOM&quot; for nominalizations of adjectives and &quot;NOM-REL&quot; for relational nouns. The class of a nominal predicate is very indicative of the roles of its arguments: we would expect a &quot;VERB-NOM&quot; predicate to take both ARG0 and ARG1, but an &quot;OBJECT&quot; predicate to take only ARG0. We incorporate the class of the nominal predicate as an additional feature in our NomBank SRL system, adding feature a2 in Table 3 to use this information.</Paragraph> <Paragraph position="3"> (Table 3 also lists additional features of neighboring arguments: n1, for each argument already classified, is b3-b4-b5-b6-r, where r is the argument class, and otherwise b3-b4-b5-b6; n2 is the corresponding backoff version.)</Paragraph>
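To make the use of these dictionary resources concrete, the following sketch shows one way features a1 (the NomBank-morph base form of the predicate) and a2 (its NOMLEX-PLUS class) could be looked up. The dictionary contents are invented miniature stand-ins for the actual NomBank-morph and NOMLEX-PLUS files, and the feature-string format is hypothetical.

# Sketch of the dictionary-based NomBank-specific features a1 and a2.
# The entries below are invented examples, not the real resource contents.
NOMBANK_MORPH = {       # morphological or compound form mapped to base form
    "replacements": "replacement",
    "healthcare": "care",
}
NOMLEX_PLUS_CLASS = {   # base form mapped to nominal predicate class
    "employment": "VERB-NOM",
    "employer": "SUBJECT",
    "employee": "OBJECT",
    "replacement": "VERB-NOM",
}

def nombank_specific_features(predicate_token):
    base = NOMBANK_MORPH.get(predicate_token, predicate_token)   # feature a1
    pred_class = NOMLEX_PLUS_CLASS.get(base, "UNKNOWN")          # feature a2
    return ["a1=" + base, "a2=" + pred_class]

print(nombank_specific_features("replacements"))   # ['a1=replacement', 'a2=VERB-NOM']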
<Paragraph position="5"> About 14% of the argument node instances in NomBank sections 02-21 are identical to their nominal predicate nodes. Most of these nominal predicates are DEFREL relational nouns (Meyers et al., 2004c). Examples of DEFREL relational nouns include &quot;employee&quot;, &quot;participant&quot;, and &quot;husband&quot;, where the nominal predicate itself takes part as an implied argument.</Paragraph> <Paragraph position="6"> We include in our classification features an indicator of whether the argument coincides with the nominal predicate. We also include a feature testing whether the argument is one of the DEFREL nouns we extracted from NomBank training sections 02-21. These two features correspond to a3 and a4 in Table 3.</Paragraph> <Paragraph position="7"> Statistics show that almost 60% of the arguments of nominal predicates occur locally, inside the noun phrase headed by the nominal predicate. For the cases where an argument appears outside the local noun phrase, over half of these arguments are introduced by support verbs.</Paragraph> <Paragraph position="8"> Consider our example &quot;Ben Bernanke was nominated as Greenspan's replacement.&quot;: the argument &quot;Ben Bernanke&quot; is introduced by the support verb &quot;nominated&quot;. The arguments introduced by support verbs can appear syntactically distant from the nominal predicate.</Paragraph> <Paragraph position="9"> To capture the location of arguments and the existence of support verbs, we add features indicating whether the argument is under the noun phrase headed by the predicate, whether the noun phrase headed by the predicate is dominated by a VP phrase or has neighboring VP phrases, and whether there is a verb between the argument and the predicate. These are represented as features a5, a6, and a7 in Table 3. Feature a7 was also proposed by the system in (Pradhan et al., 2004).</Paragraph> <Paragraph position="10"> We also experimented with various feature combinations, inspired by the features used in (Xue and Palmer, 2004). These are listed as features a11 to a16 in Table 3.</Paragraph> <Paragraph position="11"> The research of (Jiang et al., 2005; Toutanova et al., 2005) has shown the importance of capturing information about the global argument frame in order to correctly classify the local argument.</Paragraph> <Paragraph position="12"> We make use of the features {b3,b4,b5,b6} of the neighboring arguments, as defined in Table 1.</Paragraph> <Paragraph position="13"> Arguments are classified from left to right, in the textual order in which they appear. For arguments that have already been labeled, we also add their argument class r. Specifically, for each argument to the left of the current argument, we have a feature b3-b4-b5-b6-r. For each argument to the right of the current argument, the feature is defined as b3-b4-b5-b6.</Paragraph> <Paragraph position="14"> We extract these features in a window of size 7, centered at the current argument. We also add a backoff version (b3-b6-r or b3-b6) of this feature. These additional features are shown as n1 and n2 in Table 3.</Paragraph> <Paragraph position="15"> Suppose the current constituent under identification or classification is &quot;NP-Ben Bernanke&quot; in Figure 1. The instantiations of the additional features in Table 3 are listed in Table 4.</Paragraph> </Section>
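The neighboring-argument features n1 and n2 can be pictured with the following sketch, which generates them for one candidate argument during left-to-right classification. The b3-b6 values shown are placeholders rather than the actual Table 1 definitions, the feature-string format is invented, and the backoff n2 is rendered here under one reading of the notation b3-b6 (keeping only b3 and b6); the window of size 7 (the current argument plus three neighbors on each side) follows the description above.

# Sketch of neighboring-argument features n1 (full) and n2 (backoff).
# Each candidate argument is a dict with its b3-b6 baseline features; "label"
# is filled in for arguments already classified to the left of the current one.
def neighbor_features(candidates, current_index, window=7):
    feats = []
    half = window // 2
    start = max(0, current_index - half)
    end = min(len(candidates), current_index + half + 1)
    for j in range(start, end):
        if j == current_index:
            continue
        c = candidates[j]
        full = "-".join([c["b3"], c["b4"], c["b5"], c["b6"]])
        backoff = "-".join([c["b3"], c["b6"]])
        if c.get("label"):                       # already classified: append its class r
            feats.append("n1=" + full + "-" + c["label"])
            feats.append("n2=" + backoff + "-" + c["label"])
        else:                                    # not yet classified
            feats.append("n1=" + full)
            feats.append("n2=" + backoff)
    return feats

# Toy usage: the left neighbor is already labeled ARG0; all values are placeholders.
cands = [
    {"b3": "NP", "b4": "path1", "b5": "Bernanke", "b6": "x1", "label": "ARG0"},
    {"b3": "NP", "b4": "path2", "b5": "Greenspan", "b6": "x2", "label": None},
]
print(neighbor_features(cands, 1))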
<Section position="3" start_page="141" end_page="142" type="sub_section"> <SectionTitle> 4.3 Feature selection </SectionTitle> <Paragraph position="0"> Features used by our SRL system are automatically extracted from the PTB II parse trees manually labeled in NomBank. Features from Table 1 and Table 3 are selected empirically and incrementally, according to their contribution to accuracy on development section 24. The feature selection process stops when adding any of the remaining features fails to improve the SRL accuracy on development section 24. We start the selection process with the basic set of features {b1,b2,b3,b4,b5,b6}. The detailed feature selection algorithm is presented in Algorithm 2.</Paragraph> <Paragraph position="1"> Features for argument identification and argument classification are selected independently. To select the features for argument classification, we assume that all arguments have been correctly identified.</Paragraph> <Paragraph position="2"> Algorithm 2 Greedy feature selection.
Input: Fall {the set of candidate features}; Fselect {the starting feature set {b1,...,b6}}, with Mselect = Train(Fselect) and Eselect = Evaluate(Mselect, development section 24).
Output: Fselect, Mselect {the selected feature set and its model}.
loop
  Fcandidate := the set of features in Fall that are not yet in Fselect
  for each feature f in Fcandidate, train a model on Fselect + {f} with Train and score it on development section 24 with Evaluate
  let fmax, Mmax, Emax be the best-scoring candidate feature, its model, and its score
  if Fcandidate is empty or Emax does not exceed Eselect then return Fselect, Mselect end if
  Fselect := Fselect + {fmax}; Mselect := Mmax; Eselect := Emax
end loop
Subroutine: Evaluate(Model,Data) returns the accuracy score obtained by evaluating Model on Data. Train(FeatureSet) returns the maxent model trained on the given feature set.</Paragraph> <Paragraph position="3"> After performing greedy feature selection, the baseline set of features selected for identification is {b1-b6, b11FW, b11LW, b12L, b13RH, b13RP, b14, b15H, b18, b20, b32-b34}, and the baseline set of features selected for classification is {b1-b6, b11, b12, b13LH, b13LP, b13RP, b14, b15, b16, b17P, b20, b31-b34}. Note that features in {b19, b21} are not selected. For the additional features in Table 3, greedy feature selection chose {a1, a5, a6, a11, a12, a14} for the identification model and {a1, a3, a6, a11, a14, a16, n1, n2} for the classification model.</Paragraph> </Section> </Section>
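The greedy selection loop of Algorithm 2 can be sketched as below. Here train and evaluate are assumed callbacks (train a maxent model on a feature set; score it on development section 24); the function is an illustration of the selection strategy described above, not the authors' implementation.

# Sketch of greedy forward feature selection in the style of Algorithm 2.
def greedy_feature_selection(basic_features, candidate_features, train, evaluate):
    selected = list(basic_features)            # start from {b1, ..., b6}
    best_model = train(selected)
    best_score = evaluate(best_model)
    remaining = [f for f in candidate_features if f not in selected]
    while remaining:
        # Train one model per remaining candidate feature and keep the best one.
        scored = [(evaluate(train(selected + [f])), f) for f in remaining]
        score, feature = max(scored, key=lambda pair: pair[0])
        if score > best_score:                 # stop when no candidate improves accuracy
            selected.append(feature)
            remaining.remove(feature)
            best_score = score
            best_model = train(selected)
        else:
            break
    return selected, best_model

# Toy usage with stand-in callbacks (the "model" is just the feature list).
feats, _ = greedy_feature_selection(["b1", "b2"], ["a1", "a2"],
                                    train=lambda fs: fs, evaluate=lambda m: len(m))
print(feats)   # ['b1', 'b2', 'a1', 'a2'] under these toy callbacks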
<Section position="7" start_page="142" end_page="143" type="metho"> <SectionTitle> 5 Experimental results </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="142" end_page="142" type="sub_section"> <SectionTitle> 5.1 Scores on development section 24 </SectionTitle> <Paragraph position="0"> After applying the feature selection algorithm in Section 4.3, the SRL F1 scores on development section 24 are presented in Table 5. We separately present the F1 scores of the identification-only and classification-only models. We also apply the classification model to the output of the identification phase (which, in general, may contain erroneously identified arguments) to obtain the combined accuracy. During identification-only and combined identification and classification testing, the tree log-probability maximization algorithm based on Equation 2 (and its extension to multiple classes) is used. During classification-only testing, we classify each correctly identified argument using the classification ME model. The &quot;baseline&quot; row lists the F1 scores when only the baseline features are used, and the &quot;additional&quot; row lists the F1 scores when the additional features are added to the baseline features.</Paragraph> </Section> <Section position="2" start_page="142" end_page="142" type="sub_section"> <SectionTitle> 5.2 Testing on section 23 </SectionTitle> <Paragraph position="0"> The identification and classification models based on the features chosen in Section 4.3 are then applied to test section 23. The resulting F1 scores are listed in Table 6. Using additional features, the identification-only, classification-only, and combined F1 scores are 82.50, 87.80, and 72.73, respectively. Performing a chi-square test at the 0.05 level of significance, we found that the improvement of the classification model using additional features over using just the baseline features is statistically significant, while the corresponding improvements due to additional features for the identification model and the combined model are not statistically significant.</Paragraph> <Paragraph position="1"> The improved classification accuracy due to the use of additional features does not contribute any significant improvement to the combined identification and classification SRL accuracy. This is due to the noisy arguments identified by the inadequate identification model, since the accurate determination of the additional features (such as those of neighboring arguments) depends critically on an accurate identification model.</Paragraph> </Section> <Section position="3" start_page="142" end_page="143" type="sub_section"> <SectionTitle> 5.3 Using automatic syntactic parse trees </SectionTitle> <Paragraph position="0"> So far we have assumed the availability of correct syntactic parse trees during model training and testing. We relax this assumption by using the re-ranking parser of (Charniak and Johnson, 2005) to automatically generate the syntactic parse trees for both the training and the test data. The F1 scores of our best NomBank SRL system, when applied to automatic syntactic parse trees, are 66.77 on development section 24 and 69.14 on test section 23. These F1 scores are for combined identification and classification, with the use of additional features. Compared with the scores in Table 5 and Table 6, the use of automatic parse trees lowers the F1 score by more than 3%. The decrease in accuracy is expected, due to the noise introduced by automatic syntactic parsing.</Paragraph> </Section> </Section></Paper>