<?xml version="1.0" standalone="yes"?> <Paper uid="W03-1006"> <Title>Use of Deep Linguistic Features for the Recognition and Labeling of Semantic Arguments</Title> <Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> 3 Overview </SectionTitle> <Paragraph position="0"> Gildea and Palmer (2002) show that semantic role labels can be predicted given syntactic features derived from the PTB with fairly high accuracy. Furthermore, they show that this method can be used in conjunction with a parser to produce parses annotated with semantic labels, and that the parser out-performs a chunker. The features they use in their experiments can be listed as follows.</Paragraph> <Paragraph position="1"> Head Word (HW.) The predicate's head word as well as the argument's head word is used.</Paragraph> <Paragraph position="2"> Phrase Type. This feature represents the type of phrase expressing the semantic role. In Figure 3 phrase type for the argument prices is NP.</Paragraph> <Paragraph position="3"> Path. This feature captures the surface syntactic relation between the argument's constituent and the predicate. See Figure 3 for an example.</Paragraph> <Paragraph position="4"> Position. This binary feature represents whether the argument occurs before or after the predicate in the sentence.</Paragraph> <Paragraph position="5"> Voice. This binary feature represents whether the predicate is syntactically realized in either passive or active voice.</Paragraph> <Paragraph position="6"> Notice that for the exception of voice, the features solely represent surface syntax aspects of the input parse tree. This should not be taken to mean that deep syntax features are not important. For example, in their inclusion of voice, Gildea and Palmer (2002) note that this deep syntax feature plays an important role in connecting semantic role with surface grammatical function.</Paragraph> <Paragraph position="7"> Aside from voice, we posit that other deep linguistic features may be useful to predict semantic role. In this work, we explore the use of more general, deeper syntax features. We also experiment with semantic features derived from the PropBank.</Paragraph> <Paragraph position="8"> tween the predicate falling and the argument prices, the path feature is VBG&quot;VP&quot;VP&quot;S#NP. Our methodology is as follows. The first stage entails generating features representing different levels of linguistic analysis. This is done by first automatically extracting several kinds of TAG from the PropBank. This may in itself generate useful features because TAG structures typically relate closely syntactic arguments with their corresponding predicate. Beyond this, our TAG extraction procedure produces a set of features that relate TAG structures on both the surface-syntax as well as the deep-syntax level. Finally, because a TAG is extracted from the PropBank, we have a set of semantic features derived indirectly from the PropBank through TAG.</Paragraph> <Paragraph position="9"> The second stage of our methodology entails using these features to predict semantic roles. We first experiment with prediction of semantic roles given gold-standard parses from the test corpus. We subsequently experiment with their prediction given raw text fed through a deterministic dependency parser.</Paragraph> <Paragraph position="10"> 4 Extraction of TAGs from the PropBank Our experiments depend upon automatically extracting TAGs from the PropBank. 
<Paragraph position="10"> 4 Extraction of TAGs from the PropBank
Our experiments depend upon automatically extracting TAGs from the PropBank. In doing so, we follow the work of others in extracting grammars of various kinds from the PTB, whether it be TAG (Xia, 1999; Chen and Vijay-Shanker, 2000; Chiang, 2000), combinatory categorial grammar (Hockenmaier and Steedman, 2002), or constraint dependency grammar (Wang and Harper, 2002). We will discuss TAGs and an important principle guiding their formation, the extraction procedure from the PTB that is described in (Chen, 2001), including extensions to extract a TAG from the PropBank, and finally the extraction of deeper linguistic features from the resulting TAG.</Paragraph> <Paragraph position="11"> [Figure 4 caption, partially recovered: Prices are falling has been fragmented into three tree frames.]</Paragraph> <Paragraph position="12"> A TAG is defined to be a set of lexicalized elementary trees (Joshi and Schabes, 1991). They may be composed by several well-defined operations to form parse trees. A lexicalized elementary tree from which the lexical item has been removed is called a tree frame or a supertag. The lexical item in the tree is called an anchor. Although the TAG formalism allows wide latitude in how elementary trees may be defined, various linguistic principles generally guide their formation. An important principle is that dependencies, including long-distance dependencies, are typically localized in the same elementary tree by appropriate grouping of syntactically or semantically related elements.</Paragraph> <Paragraph position="13"> The extraction procedure fragments a parse tree from the PTB that is provided as input into elementary trees. See Figure 4. These elementary trees can be composed by TAG operations to form the original parse tree. The extraction procedure determines the structure of each elementary tree by localizing dependencies through the use of heuristics. Salient heuristics include the use of a head percolation table (Magerman, 1995) and another table that distinguishes between complement and adjunct nodes in the tree. For our current work, we use the head percolation table to determine heads of phrases. Also, we treat a PropBank argument (ARG0 ... ARG9) as a complement and a PropBank adjunct (ARGM's) as an adjunct when such annotation is available. Otherwise, we basically follow the approach of (Chen, 2001) (specifically, CA1).</Paragraph> <Paragraph position="15"> Beyond the extraction procedure itself, (Chen, 2001) introduces the notion of grouping linguistically-related extracted tree frames together. In one approach, each tree frame is decomposed into a feature vector. Each element of this vector describes a single linguistically-motivated characteristic of the tree.</Paragraph> <Paragraph position="16"> The elements comprising a feature vector are listed in Table 1. Each elementary tree is decomposed into a feature vector in a relatively straightforward manner. For example, the POS feature is obtained from the preterminal node of the elementary tree. There are also features that specify the syntactic transformations that an elementary tree exhibits. Each such transformation is recognized by structurally matching the elementary tree against a pattern that identifies the transformation's existence. For more details, see (Chen, 2001).</Paragraph>
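<Paragraph> As a rough illustration of this decomposition, a tree frame could be mapped to a feature vector by pattern matching along the lines of the sketch below. This is hypothetical: the bracketed-string encoding, the '*' anchor marker, and the feature names are invented for illustration and are not the actual inventory of Table 1 or the procedure of (Chen, 2001).
    # Hypothetical sketch of decomposing a tree frame (supertag) into a
    # feature vector by structural pattern matching; the encoding and the
    # feature names are invented, not those of (Chen, 2001).
    def decompose(tree_frame):
        """tree_frame: bracketed string for an unlexicalized elementary tree,
        e.g. '(S (NP) (VP (VBG*)))', where '*' marks the anchor's preterminal."""
        features = {}
        # The POS feature is read off the preterminal that carries the anchor.
        features["pos"] = tree_frame.split("*")[0].split("(")[-1].strip()
        # Syntactic transformations recognized by simple structural patterns.
        features["passive"] = "(VBN*" in tree_frame
        features["wh_movement"] = "(WHNP" in tree_frame
        features["relative_clause"] = "(SBAR (WHNP" in tree_frame
        return features

    # The right-most tree frame of Figure 4 (anchor falling, NP subject slot).
    print(decompose("(S (NP) (VP (VBG*)))"))
    # {'pos': 'VBG', 'passive': False, 'wh_movement': False, 'relative_clause': False}
</Paragraph>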
<Paragraph position="17"> Given a set of elementary trees which compose a TAG, and also the feature vector corresponding to each tree, it is possible to annotate each node representing an argument in the tree with role information. These are syntactic roles, including, for example, subject and direct object. Each argument node is labeled with two kinds of roles: a surface syntactic role and a deep syntactic role. The former is obtained by determining the position of the node with respect to the anchor of the tree, using the usual positional rules for determining argument status in English. The latter is obtained from the former and also from knowledge of the syntactic transformations that have been applied to the tree. For example, we determine the deep syntactic role of a wh-moved element by "undoing" the wh-movement using the trace information in the PTB.</Paragraph> <Paragraph position="18"> The PropBank contains all of the annotation of the Penn Treebank as well as semantic annotation. For our current work, we extract two kinds of TAG from the PropBank. One grammar, SEM-TAG, has elementary trees annotated with the aforementioned syntactic information as well as semantic information. Semantic information includes semantic role as well as semantic subcategorization information. The other grammar, SYNT-TAG, differs from SEM-TAG only by the absence of any semantic role information.</Paragraph> </Section> <Section position="5" start_page="0" end_page="0" type="metho"> <SectionTitle> 5 Corpora </SectionTitle> <Paragraph position="0"> For our experiments, we use a version of the PropBank in which only the most commonly appearing predicates have been annotated, rather than all of them. Our extracted TAGs are derived from Sections 02-21 of the PTB.</Paragraph> <Paragraph position="1"> Furthermore, training data for our experiments are always derived from these sections. Section 23 is used for test data.</Paragraph> <Paragraph position="2"> Not all of the semantic roles found in the PropBank are used in our experiments.</Paragraph> <Paragraph position="3"> In particular, we only include as semantic roles those instances in the PropBank that are localized in the same elementary tree in the extracted TAG. As a consequence, adjunct semantic roles (ARGM's) are basically absent from our test corpus. Furthermore, not all of the complement semantic roles are found in our test corpus. For example, cases of subject-control PRO are ignored because the surface subject is found in a different tree frame than the predicate. Still, a large majority of complement semantic roles are found in our test corpus (more than 87%).</Paragraph> </Section> <Section position="6" start_page="0" end_page="0" type="metho"> <SectionTitle> 6 Semantic Roles from Gold-Standard Linguistic Information </SectionTitle> <Paragraph position="0"> This section is devoted to evaluating different features obtained from a gold-standard corpus in the task of determining semantic role. We use the feature set mentioned in Section 3 as well as features derived from the TAGs mentioned in Section 4. In this section, we detail the latter set of features. We then describe the results of using different feature sets.</Paragraph> <Paragraph position="1"> These experiments are performed using the C4.5 decision tree machine learning algorithm. The standard settings are used. Furthermore, results are always given using unpruned decision trees because we find that these performed best on a development set.</Paragraph>
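<Paragraph> As a rough, hypothetical analogue of this experimental setup (not the authors' implementation), a decision tree over categorical features can be trained as in the sketch below; the scikit-learn classifier and one-hot encoding stand in for C4.5, and the toy feature values and labels are invented.
    # Hypothetical stand-in for the C4.5 setup: one-hot encode categorical
    # features and grow an unpruned decision tree that predicts the role label.
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import OneHotEncoder
    from sklearn.tree import DecisionTreeClassifier

    # Invented training instances: (predicate head word, argument head word, drole)
    X = [["fall", "prices", "0"], ["hit", "ball", "1"], ["hit", "boy", "0"]]
    y = ["ARG1", "ARG1", "ARG0"]

    model = make_pipeline(
        OneHotEncoder(handle_unknown="ignore"),
        DecisionTreeClassifier(),          # grown fully, i.e. left unpruned
    )
    model.fit(X, y)
    print(model.predict([["fall", "prices", "0"]]))   # ['ARG1']
</Paragraph>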
<Paragraph position="2"> These features are determined during the extraction of a TAG: Supertag Path. This is the path in a tree frame from its preterminal node to a particular argument node in that tree frame. The supertag path of the subject of the right-most tree frame in Figure 4 is VBG↑VP↑S↓NP. Supertag. This can be the tree frame corresponding to either the predicate or the argument.</Paragraph> <Paragraph position="3"> Srole. This is the surface-syntactic role of an argument. Examples of values include 0 (subject) and 1 (direct object).</Paragraph> <Paragraph position="4"> Ssubcat. This is the surface-syntactic subcategorization frame. For example, the ssubcat corresponding to a transitive tree frame would be NP0 NP1. PPs as arguments are always annotated with the preposition. For example, the ssubcat for the passive version of hit would be NP1 NP2(by).</Paragraph> <Paragraph position="5"> Drole. This is the deep-syntactic role of an argument. Examples of values include 0 (subject) and 1 (direct object).</Paragraph> <Paragraph position="6"> Dsubcat. This is the deep-syntactic subcategorization frame. For example, the dsubcat corresponding to a transitive tree frame would be NP0 NP1. Generally, PPs as arguments are annotated with the preposition. For example, the dsubcat for load is NP0 NP1 NP2(into). The exception is when the argument is not realized as a PP in the syntactically untransformed realization of the predicate. For example, the dsubcat for the passive version of hit would be NP0 NP1.</Paragraph> <Paragraph position="7"> Semsubcat. This is the semantic subcategorization frame.</Paragraph> <Paragraph position="8"> We first experiment with the set of features described in Gildea and Palmer (2002): Pred HW, Arg HW, Phrase Type, Position, Path, Voice. Call this feature set GP0. The error rate, 10.0%, is lower than that reported by Gildea and Palmer (2002), 17.2%. This is presumably because our training and test data have been assembled in a different manner, as mentioned in Section 5.</Paragraph> <Paragraph position="9"> Our next experiment uses the same set of features, with the exception that Path has been replaced with Supertag Path. (Feature set GP1.) The error rate is reduced from 10.0% to 9.7%. This is statistically significant (t-test, p < 0.05), albeit a small improvement. One explanation for the improvement is that Path does not generalize as well as Supertag Path does. For example, the path feature value VBG↑VP↑VP↑S↓NP reflects surface subject position in the sentence Prices are falling, but so does VBG↑VP↑S↓NP in the sentence Sellers regret prices falling. Because TAG localizes dependencies, the corresponding values for Supertag Path in these sentences would be identical.</Paragraph> <Paragraph position="10"> We now experiment with our surface syntax features: Pred HW, Arg HW, Ssubcat, and Srole. (Feature set SURFACE.) Its performance on SEM-TAG is 8.2% whereas its performance on SYNT-TAG is 7.6%, a tangible improvement over previous models. One reason for the improvement could be that this model is assigning semantic labels with knowledge of the other roles the predicate assigns, unlike previous models.</Paragraph> <Paragraph position="12"> Our next experiment involves using deep syntax features: Pred HW, Arg HW, Dsubcat, and Drole. (Feature set DEEP.) Its performance on both SEM-TAG and SYNT-TAG is 6.5%, better than previous models. Its performance is better than SURFACE presumably because syntactic transformations are taken into account by the deep syntax features. Note also that the transformations which are taken into account are a superset of the transformations taken into account by Gildea and Palmer (2002).</Paragraph>
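<Paragraph> As a worked illustration of how deep syntax features abstract away from such transformations, consider recovering a Drole from an Srole once a passive tree frame has been detected. The function below is our own sketch under simplifying assumptions, not the extraction procedure itself.
    # Minimal sketch (not the extraction procedure) of "undoing" the passive
    # transformation: mapping a surface-syntactic role to a deep-syntactic role.
    def drole_from_srole(srole, passive):
        """srole: '0' (subject), '1' (direct object), or 'by' for a by-phrase
        (the 'by' value is our own shorthand, not the paper's notation)."""
        if not passive:
            return srole
        if srole == "0":      # surface subject of a passive -> deep direct object
            return "1"
        if srole == "by":     # demoted agent in the by-phrase -> deep subject
            return "0"
        return srole

    # "The ball was hit by the boy."
    print(drole_from_srole("0", passive=True))    # the ball -> '1'
    print(drole_from_srole("by", passive=True))   # the boy  -> '0'
</Paragraph>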
<Paragraph position="14"> This experiment considers the use of semantic features: Pred HW, Arg HW, Semsubcat, and Drole. (Feature set SEMANTIC.) Of course, there are only results for SEM-TAG; the error rate turns out to be 1.9%. This is the best performance yet.</Paragraph> <Paragraph position="17"> In our final experiment, we use supertag features: the Supertag of the predicate and of the argument, together with Drole. (Feature set SUPERTAG.) The error rates are 2.8% for SEM-TAG and 7.4% for SYNT-TAG. Considering SEM-TAG only, this model performs better than its corresponding DEEP model, probably because supertags for SEM-TAG include crucial semantic information. Considering SYNT-TAG only, this model performs worse than its corresponding DEEP model, presumably because of sparse data problems when modeling supertags.</Paragraph> <Paragraph position="18"> This sparse data problem is also apparent when comparing the SUPERTAG model based on SEM-TAG with the corresponding SEM-TAG SEMANTIC model.</Paragraph> </Section> <Section position="7" start_page="0" end_page="0" type="metho"> <SectionTitle> 7 Semantic Roles from Raw Text </SectionTitle> <Paragraph position="0"> In this section, we are concerned with the problem of finding semantic arguments and labeling them with their correct semantic role given raw text as input. In order to perform this task, we parse the raw text using a combination of supertagging and LDA, a method that yields partial dependency parses annotated with TAG structures. We perform this task using both SEM-TAG and SYNT-TAG. For the former, after supertagging and LDA, the task is accomplished because the TAG structures are already annotated with semantic role information. For the latter, we use the best performing model from Section 6 in order to find semantic roles given syntactic features from the parse.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 7.1 Supertagging </SectionTitle> <Paragraph position="0"> Supertagging (Bangalore and Joshi, 1999) is the task of assigning a single supertag to each word given raw text as input. For example, given the sentence Prices are falling, a supertagger might return the supertagged sentence in Figure 4. Supertagging returns an almost-parse in the sense that it performs much of the parsing disambiguation. The typical technique for supertagging is the trigram model, akin to models of the same name for part-of-speech tagging. This is the technique that we use here.</Paragraph> <Paragraph position="1"> Data sparseness is a significant issue when supertagging with an extracted grammar (Chen and Vijay-Shanker, 2000). For this reason, we smooth the emission probabilities P(w|t) in the trigram model using distributional similarity, following Chen (2001). In particular, we use Jaccard's coefficient as the similarity metric with a similarity threshold of 0.04 and a radius of 25 because these settings were found to attain optimal results in Chen (2001).</Paragraph>
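<Paragraph> To illustrate the kind of smoothing described above, the sketch below pools emission counts for a word with those of its distributionally most similar words selected by Jaccard's coefficient. This is our own simplification with invented toy data, not the actual model of Chen (2001); only the threshold and radius settings follow the text.
    # Minimal sketch (our simplification, not Chen (2001)) of smoothing P(w|t):
    # words whose context sets are similar enough under Jaccard's coefficient
    # contribute their emission counts for supertag t to the target word.
    def jaccard(a, b):
        union = a.union(b)
        return len(a.intersection(b)) / len(union) if union else 0.0

    def smoothed_emission(word, tag, contexts, emit_counts, tag_counts,
                          threshold=0.04, radius=25):
        # Keep the `radius` most similar words whose similarity meets the threshold.
        sims = sorted(((jaccard(contexts[word], contexts[v]), v)
                       for v in contexts if v != word), reverse=True)
        neighbours = [v for s, v in sims[:radius] if s >= threshold]
        count = emit_counts.get((word, tag), 0)
        count += sum(emit_counts.get((v, tag), 0) for v in neighbours)
        return count / tag_counts[tag]

    # Invented toy data: context sets and counts for three words, one supertag.
    contexts = {"falling": {"are", "prices"}, "rising": {"are", "rates"},
                "blue": {"sky", "is"}}
    emit_counts = {("rising", "t1"): 4}
    tag_counts = {"t1": 10}
    print(smoothed_emission("falling", "t1", contexts, emit_counts, tag_counts))
    # "rising" is similar enough to contribute its count, giving 0.4
</Paragraph>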
<Paragraph position="2"> Training data for supertagging is Sections 02-21 of the PropBank. A supertagging model based on SEM-TAG performs with 76.32% accuracy on Section 23. The corresponding model for SYNT-TAG performs with 80.34% accuracy. Accuracy is measured for all words in the sentence, including punctuation. The SYNT-TAG model performs better than the SEM-TAG model, understandably, because SYNT-TAG is the simpler grammar.</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 7.2 LDA </SectionTitle> <Paragraph position="0"> LDA is an acronym for Lightweight Dependency Analyzer (Srinivas, 1997). Given as input a supertagged sequence of words, it outputs a partial dependency parse. It takes advantage of the fact that supertagging provides an almost-parse in order to dependency parse the sentence in a simple, deterministic fashion. Basic LDA is a two-step procedure. The first step involves linking each word serving as a modifier with the word that it modifies. The second step involves linking each word serving as an argument with its predicate. A link is only established if it satisfies the grammatical requirements stipulated by the supertags. The version of LDA used in this work differs from Srinivas (1997) in that there are additional constraints on the linking process. In particular, a link is not established if its existence would create crossing brackets or cycles in the dependency tree for the sentence.</Paragraph> <Paragraph position="1"> We perform LDA on two versions of Section 23, one supertagged with SEM-TAG and the other with SYNT-TAG. The results are shown in Table 3. Evaluation is performed on dependencies excluding leaf-node punctuation. Each dependency is evaluated according to both whether the correct head and dependent are related and whether they both receive the correct part-of-speech tag. The F-measure scores, in the 70% range, are relatively low compared to Collins (1999), which has a corresponding score of around 90%. This is perhaps to be expected because Collins (1999) is based on a full parser. Note also that the accuracy of LDA is highly dependent on the accuracy of the supertagged input. This explains, for example, the fact that the accuracy on SEM-TAG supertagged input is lower than the accuracy on SYNT-TAG supertagged input.</Paragraph> </Section> <Section position="3" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 7.3 Semantic Roles from LDA Output </SectionTitle> <Paragraph position="0"> The output of LDA is a partial dependency parse annotated with TAG structures. We can use this output to predict semantic roles of arguments. The manner in which this is done depends on the kind of grammar that is used. The LDA output using SEM-TAG is already annotated with semantic role information because it is encoded in the grammar itself. On the other hand, the LDA output using SYNT-TAG contains strictly syntactic information. In this case, we use the highest performing model from Section 6 in order to label arguments with semantic roles.</Paragraph> <Paragraph position="1"> Evaluation of the prediction of semantic roles takes the following form. Each argument labeled by a semantic role in the test corpus is treated as one trial. Certain aspects of this trial are always checked for correctness. These include checking that the semantic role and the dependency-link are correct. There are other aspects which may or may not be checked, depending on the type of evaluation. One aspect, "bnd," is whether or not the argument's bracketing as specified in the dependency tree is correct. Another aspect, "arg," is whether or not the head word of the argument is chosen correctly.</Paragraph>
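<Paragraph> As a sketch of how this evaluation could be scored, the code below compares predicted trials with the gold annotation under the selected aspects and computes recall, precision, and F-measure. This is our own illustration with invented toy data, not the actual evaluation code.
    # Sketch (not the actual evaluation code) of scoring semantic-role trials.
    # Each trial records (role, link, bnd, arg); "bnd" and "arg" are optional aspects.
    def score(gold, predicted, check=("arg",)):
        aspects = ("role", "link") + tuple(check)
        idx = {"role": 0, "link": 1, "bnd": 2, "arg": 3}
        correct = sum(1 for k, p in predicted.items() if k in gold and
                      all(p[idx[a]] == gold[k][idx[a]] for a in aspects))
        precision = correct / len(predicted) if predicted else 0.0
        recall = correct / len(gold) if gold else 0.0
        f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        return recall, precision, f

    # Invented toy data: two gold arguments, one of which is predicted correctly.
    gold = {"a1": ("ARG1", "falling-prices", "(NP prices)", "prices"),
            "a2": ("ARG0", "hit-boy", "(NP the boy)", "boy")}
    pred = {"a1": ("ARG1", "falling-prices", "(NP prices)", "prices")}
    print(score(gold, pred, check=("arg",)))   # (0.5, 1.0, 0.666...)
</Paragraph>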
<Paragraph position="2"> Table 4 shows the results when we use SEM-TAG in order to supertag the input and perform LDA.
Table 4:
Task: determine     Recall  Precision  F
base + arg          0.39    0.84       0.53
base + bnd          0.28    0.61       0.38
base + bnd + arg    0.28    0.61       0.38
</Paragraph> <Paragraph position="3"> When the boundaries are found, additionally finding the head word does not result in a decrease of performance. However, correctly identifying the head word instead of the boundaries leads to a substantial increase in performance. Furthermore, note the low recall and high precision of the "base + arg" evaluation. In part this is due to the nature of the PropBank corpus that we are using. In particular, because not all predicates in our version of the PropBank are annotated with semantic roles, the supertagger for SEM-TAG will sometimes annotate text with no semantic roles when in fact it should contain them.</Paragraph> <Paragraph position="4"> Table 5 shows the results of first supertagging the input with SYNT-TAG and then using a model trained on the DEEP feature set to annotate the resulting syntactic structure with semantic roles. This two-step approach greatly increases performance over the corresponding SEM-TAG based approach.</Paragraph> <Paragraph position="5"> These results are comparable to the results from Gildea and Palmer (2002), but only roughly, because of differences in corpora. Gildea and Palmer (2002) achieve a recall of 0.50, a precision of 0.58, and an F-measure of 0.54 when using the full parser of Collins (1999). They also experiment with using a chunker, which yields a recall of 0.35, a precision of 0.50, and an F-measure of 0.41.</Paragraph> </Section> </Section> </Paper>