<?xml version="1.0" standalone="yes"?>
<Paper uid="W04-2416">
<Title>Semantic Role Labeling by Tagging Syntactic Chunks</Title>
<Section position="3" start_page="0" end_page="0" type="metho">
<SectionTitle> 2 System Description </SectionTitle>
<Paragraph position="0"/>
<Section position="1" start_page="0" end_page="0" type="sub_section">
<SectionTitle> 2.1 Data Representation </SectionTitle>
<Paragraph position="0"> In this paper, we change the representation of the original data as follows: the bracketed representation of roles is converted into the IOB2 representation (Ramshaw and Marcus, 1995; Sang and Veenstra, 1999), and word tokens are collapsed into base phrase (BP) tokens. Since the semantic annotation in the PropBank corpus does not have any embedded structure, there is no loss of information in the first change; moreover, it results in a simpler representation with a reduced set of tagging labels. The second change can lose some information in cases where the semantic chunks do not align with the sequence of BPs. However, in Section 3.2 we show that the loss in performance due to this misalignment is much smaller than the gain in performance achieved by the change in representation.</Paragraph>
<Paragraph position="1"> Figure 1 caption: ...phrase data representation used in this paper. Words are collapsed into base phrase types, retaining only headwords with their respective features. The bracketed representation of semantic role labels is converted into the IOB2 representation. See text for details.</Paragraph>
<Paragraph position="2"> The new representation is illustrated in Figure 1 along with the original representation. Comparing the two, we note the following differences and advantages of the new representation: BPs are classified instead of words.</Paragraph>
<Paragraph position="3"> Only the BP headwords (rightmost words) are retained as word information.</Paragraph>
<Paragraph position="4"> The number of tagging steps is smaller.</Paragraph>
<Paragraph position="5"> A fixed context spans a larger segment of a sentence.</Paragraph>
<Paragraph position="6"> Therefore, the P-by-P semantic role chunker classifies larger units, ignores some of the words, uses a relatively larger context for a given window size, and performs the labeling faster.</Paragraph>
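<Paragraph position="7"> As a minimal illustrative sketch (assumed details, not the system's actual implementation), the second change could be carried out as follows in Python; the Token and BPToken containers and their field names are assumptions introduced only for illustration:

from dataclasses import dataclass
from typing import List

@dataclass
class Token:
    word: str       # surface word
    bp_tag: str     # base phrase chunk tag in IOB2, e.g. "B-NP", "I-NP", "O"
    role_tag: str   # semantic role tag in IOB2, e.g. "B-A0", "I-A0", "O"

@dataclass
class BPToken:
    headword: str   # rightmost word of the base phrase
    bp_type: str    # phrase type, e.g. "NP", "VP", or "O" for outside words
    role_tag: str   # IOB2 role tag of the whole phrase

def collapse_to_bp(tokens: List[Token]) -> List[BPToken]:
    """Collapse word tokens into base phrase tokens (P-by-P representation)."""
    bps: List[BPToken] = []
    current: List[Token] = []

    def flush() -> None:
        if current:
            # Headword is the rightmost word; the role tag is taken from the
            # first word, assuming the role chunk aligns with the base phrase.
            bps.append(BPToken(headword=current[-1].word,
                               bp_type=current[0].bp_tag[2:],
                               role_tag=current[0].role_tag))
            current.clear()

    for tok in tokens:
        if tok.bp_tag.startswith("B-"):
            flush()
            current.append(tok)
        elif tok.bp_tag.startswith("I-") and current:
            current.append(tok)
        else:
            # Words outside any base phrase are kept as individual tokens.
            flush()
            bps.append(BPToken(headword=tok.word, bp_type="O",
                               role_tag=tok.role_tag))
    flush()
    return bps

Outside words are deliberately left uncollapsed in this sketch, reflecting the observation in Section 3.2 that most role/chunk misalignments involve "outside" chunks.</Paragraph>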
</Section>
<Section position="2" start_page="0" end_page="0" type="sub_section">
<SectionTitle> 2.2 Features </SectionTitle>
<Paragraph position="0"> The following features, which we refer to as the base features, are provided in the shared task data for each sentence:
Base phrase positions: the IOB2 representation of base phrase chunks (e.g. B-NP, I-NP, O, etc.).
Clause tags: tags that mark token positions in a sentence with respect to clauses (e.g. *S)*S) marks a position where two clauses end).
Named entities: the IOB tags of named entities; there are four categories: LOC, ORG, PERSON and MISC.</Paragraph>
<Paragraph position="1"> Using the available information, we have created the following token-level features:
Token position: the position of the phrase with respect to the predicate. It takes three values: "before", "after" and "-" (for the predicate itself).
Path: a flat path between the token and the predicate, defined as a chain of base phrases. At both ends, the chain is terminated with the POS tag of the predicate and the headword of the token.
Clause bracket patterns: we use two clause patterns for each token; one is the clause bracket chain between the token and the predicate, and the other runs from the token to the sentence beginning or end, depending on the token's position with respect to the predicate.
Clause position: a binary feature indicating whether the token is inside or outside the clause that contains the predicate.
Headword suffixes: suffixes of the headword, of length 2, 3 and 4.
Distance: we use two notions of distance; the first is the distance of the token from the predicate as a number of base phrases, and the second is the same distance as a number of VP chunks.
Length: the number of words in a token.</Paragraph>
<Paragraph position="2"> We also use some sentence-level features:
Predicate POS tag: the part-of-speech category of the predicate.
Predicate frequency: a feature indicating whether the predicate is frequent or rare with respect to the training set; the threshold on the counts is currently set to 3.
Predicate BP context: the chain of BPs centered at the predicate within a window of size -2/+2.
Predicate POS context: the POS tags of the words that immediately precede and follow the predicate; the POS tag of a preposition is replaced with the preposition itself.
Predicate argument frames: the left and right patterns of the core arguments (A0 through A5) for each predicate. We use the three most frequent argument frames for each side, depending on the position of the token in focus with respect to the predicate (e.g. raise has A0 and A1 A0 (A0 being the most frequent) as its left argument frames, and A1, A1 A2 and A2 as its three most frequent right argument frames).
Number of predicates: the number of predicates in the sentence.</Paragraph>
<Paragraph position="3"> For each token (base phrase) to be tagged, a set of ordered features is created from a fixed-size context that surrounds the token. In addition to the above features, we also use the previous semantic IOB tags that have already been assigned to the tokens in the context. A 5-token sliding window is used for the context, and a greedy left-to-right tagging is performed.</Paragraph>
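<Paragraph position="4"> As an illustrative sketch (assumed details, not the system's actual code), the greedy left-to-right decoding over BP tokens with a 5-token window and previously assigned tags as features could be organized as follows; it reuses the BPToken fields assumed in the sketch of Section 2.1, and the classify callback is a placeholder:

from typing import Callable, Dict, List, Sequence

def token_position(i: int, pred: int) -> str:
    """Token position feature: 'before', 'after', or '-' for the predicate."""
    if i == pred:
        return "-"
    return "before" if pred - i > 0 else "after"

def extract_features(bps: Sequence, i: int, pred: int,
                     assigned: List[str], size: int = 2) -> Dict[str, str]:
    """Features from a -2/+2 window around token i, plus the role tags
    already assigned to tokens on the left of the window (greedy decoding)."""
    feats: Dict[str, str] = {"position": token_position(i, pred)}
    for off in range(-size, size + 1):
        j = i + off
        if j in range(len(bps)):
            feats[f"bp[{off}]"] = bps[j].bp_type
            feats[f"hw[{off}]"] = bps[j].headword
            if j in range(len(assigned)):  # left context: tag already known
                feats[f"tag[{off}]"] = assigned[j]
    return feats

def tag_sentence(bps: Sequence, pred: int,
                 classify: Callable[[Dict[str, str]], str]) -> List[str]:
    """Greedy left-to-right assignment of semantic IOB2 tags to BP tokens."""
    assigned: List[str] = []
    for i in range(len(bps)):
        assigned.append(classify(extract_features(bps, i, pred, assigned)))
    return assigned

In practice, the classify callback would be backed by the SVM classifiers described in Section 2.3, applied to the full feature set listed above.</Paragraph>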
<Paragraph position="5"> All of the above features are designed to implicitly capture the patterns of sentence constructs with respect to different word/predicate usages and senses. We acknowledge that they significantly overlap, and extensive experiments are required to determine the impact of each feature on the performance.</Paragraph>
</Section>
<Section position="3" start_page="0" end_page="0" type="sub_section">
<SectionTitle> 2.3 Classifier </SectionTitle>
<Paragraph position="0"> All SVM classifiers were realized using TinySVM with a polynomial kernel of degree 2, together with the general-purpose SVM-based chunker YamCha. SVMs were trained for [...].</Paragraph>
<Paragraph position="1"> Table 1 caption: Both systems use the base features provided (i.e. no feature engineering is done). Results are on the dev set.</Paragraph>
</Section>
<Section position="4" start_page="0" end_page="0" type="sub_section">
<SectionTitle> Table 2 header: Method | Sentences | Training Examples </SectionTitle>
<Paragraph position="0"/>
<Paragraph position="2"/>
</Section>
</Section>
<Section position="4" start_page="0" end_page="0" type="metho">
<SectionTitle> 3 Experimental Results </SectionTitle>
<Paragraph position="0"/>
<Section position="1" start_page="0" end_page="0" type="sub_section">
<SectionTitle> 3.1 Data and Evaluation Metrics </SectionTitle>
<Paragraph position="0"> The data provided for the shared task is part of the February 2004 release of the PropBank corpus. It consists of sections from the Wall Street Journal part of the Penn Treebank. All experiments were carried out using Sections 15-18 for training, Section 20 for development, and Section 21 for testing. The results were evaluated for precision, recall and Fβ=1 numbers using the srl-eval.pl script provided by the shared task organizers.</Paragraph>
</Section>
<Section position="2" start_page="0" end_page="0" type="sub_section">
<SectionTitle> 3.2 W-by-W and P-by-P Experiments </SectionTitle>
<Paragraph position="0"> In these experiments we used only the base features to compare the two approaches. Table 1 illustrates the overall performance on the dev set. Although both systems were trained using the same number of sentences, the actual number of training examples in each case was quite different. Those numbers are presented in Table 2. It is clear that the P-by-P method uses much less data for the same number of sentences. Despite this, we note in particular a considerable improvement in recall. The data reduction was not, however, without cost. Some arguments were missed because they do not align with the base phrase chunks, owing to inconsistencies in the semantic annotation and to errors in automatic base phrase chunking. The percentage of this misalignment was around 2.5% (over the dev set). We observed that nearly 45% of the mismatches involved "outside" chunks. Therefore, sequences of words with outside tags were not collapsed.</Paragraph>
</Section>
<Section position="3" start_page="0" end_page="0" type="sub_section">
<SectionTitle> 3.3 Best System Results </SectionTitle>
<Paragraph position="0"> In these experiments all of the features described earlier were used with the P-by-P system. Table 3 presents our best system performance on the development set. The additional features improved the performance from 61.02 to 71.72. The performance of the same system on the test set is illustrated in Table 4.</Paragraph>
</Section>
</Section>
</Paper>