<?xml version="1.0" standalone="yes"?>
<Paper uid="W04-2418">
  <Title>A Memory-Based Approach for Semantic Role Labeling</Title>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3 The Recognition Module
</SectionTitle>
    <Paragraph position="0"> This module identifies the arguments of a proposition, without assigning a label. For this task we use the IOB2 format, where B marks an element at the beginning of an argument, I an element inside an argument and O an element that does not belong to an argument.</Paragraph>
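    <Paragraph> As an illustration of the IOB2 encoding described above, the following minimal Python sketch converts argument spans into B, I and O tags. The function name to_iob2 and the (start, end) span representation are assumptions made for this example; they are not taken from the original system.

```python
def to_iob2(n_elements, spans):
    """Encode unlabeled argument spans as IOB2 tags.

    n_elements: number of elements (words or chunks) in the sentence.
    spans: list of (start, end) index pairs, end inclusive (hypothetical format).
    """
    tags = ["O"] * n_elements          # default: outside any argument
    for start, end in spans:
        tags[start] = "B"              # first element of the argument
        for i in range(start + 1, end + 1):
            tags[i] = "I"              # remaining elements inside the argument
    return tags


# Two arguments: elements 0-1 and elements 3-5.
example_tags = to_iob2(6, [(0, 1), (3, 5)])
```
    </Paragraph>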
    <Paragraph position="1"> As all argument boundaries, except for those within the target verb chunks, coincide with base chunk boundaries, the data is processed by words only within the target verb chunk, and by chunks otherwise.</Paragraph>
    <Paragraph position="2"> The recognition module uses the following features: Head word and POS of the focus element, where the head of a multi-word chunk is its last word.</Paragraph>
    <Paragraph position="3"> Chunk type: one of the 12 chunk types, without the B- or I- prefix.</Paragraph>
    <Paragraph position="4"> Clause information: whether the element is at the beginning, at the end or inside a clause.</Paragraph>
    <Paragraph position="5"> Directionality: whether the focus element comes before the target verb, after the target verb, or coincides with the target verb.</Paragraph>
    <Paragraph position="6"> Distance: numerical distance (1 .. n) between the focus element and the target verb.</Paragraph>
    <Paragraph position="7"> Adjacency: whether the focus element is adjacent to the verb chunk, not adjacent to it, or within it. The target verb and voice: the voice is passive if the target verb is a past participle preceded by a form of to be, and active otherwise.</Paragraph>
    <Paragraph position="8"> Context: in addition, the features head word, part of speech, chunk type and adjacency of the three chunks each to the left and right of the focus chunk are used as context information.</Paragraph>
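    <Paragraph> The positional and voice features listed above could be computed roughly as in the following Python sketch. It is only an illustration under stated assumptions: elements are addressed by integer index, the verb chunk is given as an inclusive index range, the past participle is identified by the Penn Treebank tag VBN, and "preceded by a form of to be" is read as any earlier word in the verb chunk. All function and key names are hypothetical.

```python
BE_FORMS = {"be", "am", "is", "are", "was", "were", "been", "being"}


def positional_features(focus, target, verb_start, verb_end):
    """Directionality, distance and adjacency of a focus element.

    focus, target: non-negative indices of the focus element and the target verb.
    verb_start, verb_end: inclusive index range of the verb chunk (assumption).
    """
    if focus == target:
        direction = "coincides"
    elif focus in range(target):       # any index smaller than the target
        direction = "before"
    else:
        direction = "after"

    # Assumption: the distance is 0 only for the target verb itself.
    distance = abs(focus - target)

    if focus in range(verb_start, verb_end + 1):
        adjacency = "within"
    elif focus in (verb_start - 1, verb_end + 1):
        adjacency = "adjacent"
    else:
        adjacency = "not_adjacent"

    return {"direction": direction, "distance": distance, "adjacency": adjacency}


def voice(verb_chunk, target):
    """verb_chunk: list of (word, pos) pairs; target: index of the target verb."""
    _, pos = verb_chunk[target]
    preceded_by_be = any(w.lower() in BE_FORMS for w, _ in verb_chunk[:target])
    return "passive" if pos == "VBN" and preceded_by_be else "active"
```

 For example, with a verb chunk covering elements 3 and 4 and the target verb at index 4, the element at index 2 lies before the verb, at distance 2, and is adjacent to the verb chunk.
    </Paragraph>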
    <Paragraph position="9"> Testing each feature separately showed the directionality and adjacency features to be most useful. Omitting one feature at a time decreased performance in every case. Therefore, all of the above features were used in the final system.</Paragraph>
    <Paragraph position="10"> The best TiMBL parameter setting for this task was determined to be the Modified Value Difference metric (MVDM) paired with a set of seven nearest neighbors. As we anticipated, the nature of the task requires a more subtle differentiation than the Overlap metric can provide. Furthermore, the size of the training set is apparently sufficient to take full advantage of MVDM. The results for both metrics and all values of k are summarized in Table 1. It is interesting to observe the effect of the k value for each class. Although the results for the I- and O-classes decrease after k=7, those for the B-class do not. However, since the overall results are best for k=7, this value was chosen for the final system.</Paragraph>
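    <Paragraph> To make the difference between the two metrics concrete, the following Python sketch contrasts the Overlap metric with the standard definition of the Modified Value Difference metric, in which the distance between two values of a feature is the sum over classes of the absolute difference of their class-conditional probabilities. This is a simplified illustration and not TiMBL's implementation; the function names and the toy data are assumptions.

```python
from collections import Counter, defaultdict


def class_distributions(values, labels):
    """Estimate P(class | value) for one feature from training pairs."""
    counts = defaultdict(Counter)
    for value, label in zip(values, labels):
        counts[value][label] += 1
    return {
        value: {label: n / sum(c.values()) for label, n in c.items()}
        for value, c in counts.items()
    }


def mvdm(dist, v1, v2):
    """Sum over classes of |P(c | v1) - P(c | v2)|."""
    classes = set(dist[v1]) | set(dist[v2])
    return sum(abs(dist[v1].get(c, 0.0) - dist[v2].get(c, 0.0)) for c in classes)


def overlap(v1, v2):
    """Overlap metric: 0 for identical values, 1 otherwise."""
    return 0.0 if v1 == v2 else 1.0


# Toy data (invented for illustration): chunk types with IOB classes.
values = ["NP", "NP", "PP", "PP", "VP", "VP"]
labels = ["B", "B", "B", "I", "O", "O"]
dist = class_distributions(values, labels)
```

 On this toy data the Overlap metric treats NP as equally distant from PP and VP, whereas MVDM places NP closer to PP because the two values share part of their class distribution.
    </Paragraph>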
    <Paragraph position="11"> For all metric/k combinations, the results for the I class are much lower than for the other two. The most common error is the assignment of the O class to I-elements, or vice versa. This performance distribution implies that while the beginning of most arguments is recognized correctly, their span is not, which results in many &quot;broken-up&quot; arguments.</Paragraph>
    <Paragraph position="12"> To filter out the actual arguments, we tried a strict and a lenient approach. For the latter, any sequence of elements that is not labeled as O is considered an argument (i.e. also those not starting with a B-element). Although this approach slightly reduces the number of missed arguments, it also vastly overgenerates, which ultimately decreases performance. The former approach recognizes as arguments only those sequences beginning with a B-element. Since B is the class most reliably predicted by the classifier, this approach yields better overall performance.</Paragraph>
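    <Paragraph> The strict and lenient filtering approaches can be sketched in Python as follows. This is a minimal illustration, not the original code: the function name and the (start, end) output format are hypothetical, and the sketch assumes that a B tag always opens a new argument, even when it directly follows another argument.

```python
def extract_arguments(tags, strict=True):
    """Return (start, end) spans, end inclusive, from a list of B/I/O tags.

    strict=True: only sequences that begin with a B-element count.
    strict=False: any maximal run of non-O elements counts (lenient).
    """
    spans = []
    start = None                        # start index of the open argument
    for i, tag in enumerate(tags):
        if tag == "B":
            if start is not None:       # close the previous argument
                spans.append((start, i - 1))
            start = i
        elif tag == "I":
            if start is None and not strict:
                start = i               # lenient: an I may open an argument
        else:                           # "O" closes any open argument
            if start is not None:
                spans.append((start, i - 1))
            start = None
    if start is not None:               # argument running to the end
        spans.append((start, len(tags) - 1))
    return spans


predicted = ["I", "I", "O", "B", "I", "O", "I"]
strict_spans = extract_arguments(predicted, strict=True)
lenient_spans = extract_arguments(predicted, strict=False)
```

 For the predicted tag sequence above, the strict approach keeps only the span starting at the B-element, while the lenient approach also returns the two I-only runs, which illustrates how it overgenerates.
    </Paragraph>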
  </Section>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
4 The Labeling Module
</SectionTitle>
    <Paragraph position="0"> This module assigns one of the 30 semantic role labels to the arguments extracted by the recognition module. Here, we used only ten features, of which four are &quot;recycled&quot; from the previous module: Word, POS and chunk sequence: the head words of all the chunks in the argument, their respective parts of speech and chunk types. As TiMBL only allows feature vectors of a fixed length, each of the sequences represents one value.</Paragraph>
    <Paragraph position="1"> Clause information: as an element sequence can be a whole clause, we added this value to the beginning, end and inside values described in Section 3. Length: the length in chunks of the argument.</Paragraph>
    <Paragraph position="2"> Directionality and adjacency: same as in Section 3. The target verb and voice: same as in Section 3.</Paragraph>
    <Paragraph position="3"> Prop Bank roleset of the target verb: as an analysis of the training data showed that about 86% of the verbs were used in their first sense, and the rolesets for the first two senses are often identical, we only considered the roleset of the first sense.</Paragraph>
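    <Paragraph> Because each sequence must occupy a single slot in a fixed-length feature vector, one simple way to build the sequence and length features is to join the per-chunk values into one string. The Python sketch below illustrates this idea only; the separator, the dictionary keys and the function name are assumptions, and the original system may have encoded the sequences differently.

```python
def labeling_features(chunks, sep="_"):
    """Build the sequence and length features for one argument.

    chunks: list of dicts with hypothetical keys 'head', 'pos' and 'type',
    one dict per chunk in the argument.
    """
    return {
        "word_seq": sep.join(c["head"] for c in chunks),    # one value per sequence
        "pos_seq": sep.join(c["pos"] for c in chunks),
        "chunk_seq": sep.join(c["type"] for c in chunks),
        "length": len(chunks),                              # length in chunks
    }


# Invented example argument consisting of three chunks.
argument = [
    {"head": "sale", "pos": "NN", "type": "NP"},
    {"head": "of", "pos": "IN", "type": "PP"},
    {"head": "shares", "pos": "NNS", "type": "NP"},
]
features = labeling_features(argument)
```
    </Paragraph>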
    <Paragraph position="4"> Just as for the recognition module, the directionality and adjacency features had the highest information gain. The POS sequence and length features showed no effect, and their omission even slightly improved performance. Therefore, the final system uses only eight features. To test the performance of this module independently from the first, it was evaluated on the gold-standard arguments (i.e. recognition score of 100). While MVDM once again outperforms the Overlap metric, the optimal value for k in this setting is one. The former supports the assumption that for feature values such as words, or word sequences, some values are more similar than others. The latter suggests that the size of the nearest neighbor set (1 vs. 7) should be somewhat proportional to the length of the feature vector (8 vs. 45).</Paragraph>
    <Paragraph position="5"> The results for each semantic role are summarized in</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
gument spans
</SectionTitle>
      <Paragraph position="0"> are fairly easy to predict. However, it must be noted that given the correct span, the complex (and most frequently occurring) arguments A0 and A1 can also be predicted with very high accuracy. On the downside, the accuracy for most adjuncts is rather low, even though their surface patterns are thought to be somewhat restricted (e.g. AM-LOC, AM-TMP, AM-MNR, AM-EXT).</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="0" end_page="0" type="metho">
    <SectionTitle>
5 Evaluation
</SectionTitle>
    <Paragraph position="0"> Tables 3 and 4 show the final results for the development and test sets, respectively. Although each module performs fairly well separately, their combined results are suboptimal. This is probably because the labeling module is trained on gold-standard arguments and cannot deal with the noise introduced by the recognition module. The argument type whose results suffer the most is A1, because it usually spans several chunks and is difficult for the recognition module to retrieve correctly.</Paragraph>
    <Paragraph position="1"> Improvements to the system could be made on the syntactic, lexical, and semantic levels. Firstly, it is crucial to improve the performance of the recognition module on I-elements. This could be done either by using a head-lexicalized parser or, on a lower level, by a pre-processing module that resolves prepositional phrase attachment. Performance for adjuncts such as AM-LOC or AM-TMP could be improved by using gazetteers of trigger words (e.g. Tuesday) or morphemes (e.g. -day).</Paragraph>
    <Paragraph position="2"> Furthermore, one could use a semantic database such as WordNet to cluster words. Last but not least, more advantage could be taken of the information in Prop Bank, so different representations of the rolesets should be explored.</Paragraph>
  </Section>
</Paper>