<?xml version="1.0" standalone="yes"?>
<Paper uid="W04-2413">
  <Title>Semantic Role Labelling With Chunk Sequences</Title>
  <Section position="7" start_page="0" end_page="0" type="concl">
    <SectionTitle>
5 Experiments and Results
</SectionTitle>
    <Paragraph position="0"> Optimising Step 1 (Argument Identification). On the development set, we explored the impact of the different features from Section 3.3 on Step 1. Our optimal model contained, as shallow features, all except the sequence's position; as divider features, the divider sequence; as higher-level features, the preposition and the superchunk; and as EM features, all. Adding further features degraded the model's performance.</Paragraph>
    <Paragraph position="1"> Table 1 presents an overview of different combinations of feature sets (evaluation scores are category-specific for LABEL). We optimised the category-specific F-score for LABEL, since only examples predicted as LABEL are forwarded to Step 2. The first line (all) shows that the main problem in the first step is recall, which limits the number of arguments available for Step 2. For this reason, we varied the threshold t of the classification procedure: assign LABEL(x) if P(LABEL | x) > t. We found the optimal category-specific F-score for a lowered threshold, increasing the recall at the cost of precision.</Paragraph>
    <Paragraph position="4"> Optimising Step 2 (Argument Labelling). We performed the same optimisation for Step 2, using the output of our best model of Step 1 as input. The best model for Step 2 uses all shallow features except the sequence's position; all higher-level features except negation; all divider features; and no EM-clustering features. Table 2 shows the performance of the complete system for different feature sets (based on the best argument identification model). We also give two upper bounds for our system: one caused by the arguments lost in the sequence computation, and one caused by the arguments missed by Step 1. The final model on the test set. Our best model combines the two models for Steps 1 and 2 indicated in boldface. Table 3 shows detailed results on the test set. Discussion. During the development phase, we compared the performance of our final architecture with one that does not filter out sequences with infrequent dividers, as outlined in Sec. 2. Even though filtering loses 7.5% of the arguments in the development set, the F-score improves by about 12%. This shows that intelligent filtering is a crucial factor in a chunk-sequence-based system. The main problem for both subtasks is recall. This might also explain the disappointing performance of the EM features: the small amount of available training data limits the coverage of the models, so EM features tend to increase the precision of a model at the cost of recall. At the overall low level of recall, adding EM features leaves performance virtually unchanged for Step 1 and even degrades it for Step 2.</Paragraph>
    <Paragraph position="5"> For both of our subtasks, adding more features to a given model can harm its performance. Evidently, some features predict the training data better than the development data, and can mislead the model. This can be seen as a kind of overfitting. Therefore, it is important to test not only feature sets, but also single features.</Paragraph>
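The point about testing single features, not only feature sets, can be sketched as a greedy ablation loop; evaluate() is a hypothetical stand-in for training a model on the given features and scoring it on the development set, and the feature names in the usage below are toy values.

```python
def ablate(features, evaluate):
    """Greedily drop any feature whose removal does not hurt the dev score.

    A feature that predicts the training data better than the development
    data (the overfitting described above) will show up here as a feature
    whose removal leaves the dev score unchanged or improves it.
    """
    kept = list(features)
    best = evaluate(kept)
    for f in list(kept):               # snapshot: kept shrinks as we drop
        trial = [x for x in kept if x != f]
        score = evaluate(trial)
        if score >= best:              # f did not help (or hurt): drop it
            kept, best = trial, score
    return kept, best
```

The `>=` comparison also discards redundant features that neither help nor hurt, keeping the model as small as possible; a stricter `>` would keep them.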
    <Paragraph position="6"> The two subtasks have rather different profiles. Table 1 shows that Step 1 hardly uses higher-level features, while the single divider feature has some impact. Step 2, on the other hand, improves considerably when higher-level features are added; divider features are less important (see Table 2). It appears that the split of semantic role labelling into argument identification and argument labelling mirrors a natural division of the problem, whose two parts rely on different types of information.</Paragraph>
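The two-step architecture discussed throughout this section can be summarised in a short sketch: Step 1 decides whether a candidate chunk sequence is an argument at all, and only sequences passing Step 1 reach Step 2, which assigns the actual role. identify() and label() are hypothetical stand-ins for the two trained classifiers, and the sequences and roles in the usage below are toy values.

```python
def srl_pipeline(sequences, identify, label):
    """Two-stage semantic role labelling over candidate chunk sequences."""
    roles = {}
    for seq in sequences:
        if identify(seq):            # Step 1: argument identification
            roles[seq] = label(seq)  # Step 2: argument labelling
    return roles
```

The pipeline makes the recall bottleneck concrete: an argument rejected by identify() never reaches label(), which is why Step 1 was tuned towards recall above.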
  </Section>
</Paper>