<?xml version="1.0" standalone="yes"?> <Paper uid="W04-2420"> <Title>Two-Phase Semantic Role Labeling based on Support Vector Machines</Title> <Section position="3" start_page="0" end_page="0" type="metho"> <SectionTitle> 2 Two Phase Semantic Role Labeling based on SVMs </SectionTitle> <Paragraph position="0"> We regard semantic role labeling as a problem of classifying syntactic constituents. However, a syntactic constituent can be a chunk or a clause. Therefore, we have to identify the boundaries of semantic arguments before we assign roles to them.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 2.1 Semantic Argument Identification </SectionTitle> <Paragraph position="0"> This phase finds the boundaries of semantic arguments. A sequence of chunks or a subclause in the immediate clause of a predicate can be a semantic argument of the predicate. Each chunk or subclause in that clause therefore serves as a unit constituent of an argument. The chunks within a subclause are ignored.</Paragraph> <Paragraph position="1"> To identify the semantic arguments of a target predicate, it is necessary to find the dependency relation between each constituent and the predicate. Identifying a dependency relation is important for identifying a subject/object relation (S. 
Buchholz, 2002) and also for identifying the semantic arguments of a target predicate.</Paragraph> <Paragraph position="2"> Therefore, the features for finding dependency relations are implicitly represented in the feature set for the identification task.</Paragraph> <Paragraph position="3"> To implement the SVM-based method, we represent the constituents of an argument in B/I/O notation and assign one of the following classes to each constituent: B-ARG, representing the beginning of a semantic argument; I-ARG, representing a part of a semantic argument; or O, indicating that the constituent does not belong to any semantic argument.</Paragraph> <Paragraph position="4"> Because we take a chunk or a subclause as the unit of a constituent, the words in the target phrase other than the predicate do not belong to any constituent. Therefore, these words have to be handled separately. In the training data, we often observed that a semantic argument begins at the word right after the predicate.</Paragraph> <Paragraph position="5"> To agree with the chunk boundaries, we regard the word following a predicate as the beginning word of a new chunk. Namely, when the chunk tag of that word is I, we change I to B. Also, the words located in front of the predicate in the target phrase are post-processed by 4 hand-crafted rules and 211 automated rules based on frequency in the training data.</Paragraph> <Paragraph position="6"> To restrict the search space of constituents, we use the clause boundaries. The left search boundary for identifying the semantic arguments is set to the left boundary of the second upper clause, and the right search boundary is set to the right boundary of the immediate clause.</Paragraph> <Paragraph position="7"> For this phase, we use 29 features representing syntactic and semantic information related to the constituent and the predicate. Table 1 shows the set of features employed. 
The features can be described as follows: position: This is a binary feature identifying whether the constituent is before (-1) or after (1) the predicate in the immediate clause. The feature value (-2) means that the constituent is outside the immediate clause. [Caption fragment: ... target phrase containing the predicate deliver, and C means the constituent, such as a chunk (e.g. Under) or a subclause (e.g. Rockwell said).]</Paragraph> <Paragraph position="8"> distance: The distance is measured as the number of chunks between the predicate and the constituent.</Paragraph> <Paragraph position="9"> # of VP, NP, SBAR: These are numeric features representing the number of chunks of each of these types between the predicate and the constituent.</Paragraph> <Paragraph position="10"> # of POS [CC], [,], [:]: These are numeric features representing the number of tokens with each of these POS tags between the predicate and the constituent.</Paragraph> <Paragraph position="11"> POS [&quot;] &amp; POS [&quot;]: This feature represents the difference between # of POS[&quot;] and # of POS[&quot;] counted in the range from the predicate to the constituent. In Table 1, the feature value (-1) means that # of POS[&quot;] is larger than # of POS[&quot;]. The feature value (1) conversely means that # of POS[&quot;] is larger than # of POS[&quot;]. The feature value (0) means that # of POS[&quot;] is equal to # of POS[&quot;]. path: This is the syntactic path from the predicate to the constituent; it is a symbolic feature comprising all the elements (chunks or subclauses) between the predicate and the constituent.</Paragraph> <Paragraph position="12"> beginning word's POS: In the target phrase, these values appear only with VPs and represent the POS of the syntactic head (MD, TO, VB, VBD, VBG, VBN, VBP, VBZ). 
This represents a property of the target phrase; for example, the feature value TO indicates that the target phrase is a to-infinitive.</Paragraph> <Paragraph position="13"> context: These features carry information about the predicate itself, the left context of the predicate, the constituent itself, and the left and right context of the constituent. In Table 1, - denotes the left context and + denotes the right context. When the constituent is a subclause, its chunk type is set to the chunk type of the first chunk in the subclause.</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 2.2 Semantic Role Assignment </SectionTitle> <Paragraph position="0"> In this phase, we assign appropriate semantic roles to the identified semantic arguments. For learning SVM classifiers, we consider not all semantic roles but only 18 semantic roles, selected by frequency in the training data (Table 2). The AM-MOD and AM-NEG roles are post-processed by hand-crafted rules. Because we decrease the number of SVM classifiers to be learned, the training cost of the classifiers is reduced. Furthermore, we can alleviate the unbalanced class distribution problem by ex... [Table 2: semantic role: A0, A1, A2, A3, A4, R-A0, R-A1, R-A2, C-A1, AM-TMP, AM-ADV, AM-MNR, AM-LOC, AM-DIS, AM-PNC, AM-CAU, AM-DIR, AM-EXT] This phase also uses all features applied in the semantic argument identification phase, except for # of POS [:] and POS[&quot;] &amp; POS[&quot;]. 
In addition, we use the following feature.</Paragraph> <Paragraph position="1"> voice: This is a binary feature identifying whether the target phrase is active or passive.</Paragraph> <Paragraph position="2"> Figure 2 shows the two-phase semantic role labeling procedure using the example sentence in Figure 1.</Paragraph> </Section> </Section> <Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> 3 Experiments </SectionTitle> <Paragraph position="0"> For the experiments, we used the SVM light package (T. Joachims, 2002).</Paragraph> <Paragraph position="1"> In both the semantic argument identification phase and the semantic role assignment phase, we used a polynomial kernel (degree 2) with the one-vs-rest classification method. Table 3 shows the experimental results on the test set, and Table 4 shows the experimental results on the development set. Table 4 also shows the performance of each phase.</Paragraph> <Paragraph position="2"> To improve performance, we try to select discriminative features for each subtask. In particular, since the performance of the identification phase is critical to the overall performance, we concentrate on improving the identification performance. Our system obtains an F-measure of 74.08 in the identification phase, as presented in Table 4. For the argument classification task, our system obtains a classification accuracy (A) of 85.45.</Paragraph> </Section> </Paper>