<?xml version="1.0" standalone="yes"?>
<Paper uid="W03-1008">
  <Title>Identifying Semantic Roles Using Combinatory Categorial Grammar</Title>
  <Section position="8" start_page="3" end_page="3" type="evalu">
    <SectionTitle>
6 Results
</SectionTitle>
    <Paragraph position="0"> Because of the mismatch between the constituent structures of CCG and the Treebank, we score both systems according to how well they identify the head words of PropBank's arguments. Table 2 gives the performance of the system on both PropBank's core, or numbered, arguments, and on all PropBank roles including the adjunct-like ArgM roles. To analyze the impact of errors in the syntactic parses, we present results using features extracted from both automatic parser output and the gold standard parses in the Penn Treebank (without functional tags) and in CCGbank. Using the gold standard parses provides an upper bound on the performance of the system based on automatic parses. Since the Collins parser does not provide trace information, its upper bound is given by the system tested on the gold-standard Treebank representation with traces removed. In Table 2, &quot;core&quot; indicates results on PropBank's numbered arguments (ARG0...ARG5) only, and &quot;all&quot; includes numbered arguments as well as the ArgM roles. Most of the numbered arguments (in particular ARG0 and ARG1) correspond to arguments that the CCG category of the verb directly subcategorizes for. The CCG-based system outperforms the system based on the Collins parser on these core arguments, and has comparable performance when all PropBank labels are considered. We believe that the superior performance of the CCG system on these core arguments is due to its ability to recover long-distance dependencies, whereas we attribute its lower performance on non-core arguments mainly to the mismatches between PropBank and CCGbank.</Paragraph>
    <Paragraph position="1"> The importance of long-range dependencies for our task is indicated by the fact that the performance on the Penn Treebank gold standard without traces is significantly lower than that on the Penn Treebank with trace information. Long-range dependencies are especially important for core arguments, as shown by the fact that removing trace information from the Treebank parses results in a bigger drop for core arguments (83.5 to 76.3 F-score, a drop of 7.2 points) than for all roles (74.1 to 70.2, a drop of 3.9 points). The ability of the CCG parser to recover these long-range dependencies accounts for its higher performance, and in particular its higher recall, on core arguments. [Displaced table-caption fragment: "... row in this table corresponds to the second row in Table 2."]</Paragraph>
    <Paragraph position="2"> The CCG gold standard performance is below that of the Penn Treebank gold standard with traces.</Paragraph>
    <Paragraph position="3"> We believe this performance gap to be caused by the mismatches between the CCG analyses and the PropBank annotations described in Section 5.2. For the reasons described, the head words of the constituents that have PropBank roles are not necessarily the head words that stand in a predicate-argument relation in CCGbank. If two words do not stand in a predicate-argument relation, the CCG system falls back on the path feature. This feature is much sparser in CCG: since CCG categories encode subcategorization information, the number of categories in CCGbank is much larger than the number of Penn Treebank labels. Analysis of our system's output shows that the system trained on the Penn Treebank gold standard obtains 55.5% recall on those relations that require the CCG path feature, whereas the system using CCGbank only achieves 36.9% recall on these.</Paragraph>
    <Paragraph position="4"> Also, in CCG, the complement-adjunct distinction is represented in the categories for the complement (e.g. PP) or adjunct (e.g. (S\NP)\(S\NP)) and in the categories for the head (e.g. (S[dcl]\NP)/PP or S[dcl]\NP). In generating the CCGbank, various heuristics were used to make this distinction. In particular, for PPs, it depends on the &quot;closely-related&quot; (CLR) function tag, which is known to be unreliable. The decisions made in deriving the CCGbank often do not match the hand-annotated complement-adjunct distinctions in PropBank, and this inconsistency is likely to make our CCGbank-based features less predictive. A possible solution is to regenerate the CCGbank using the PropBank annotations.</Paragraph>
    <Paragraph position="5"> The impact of our head-word based scoring is analyzed in Table 3, which compares results when only the head word must be correctly identified (as in Table 2) with results when both the beginning and end of the argument must be correctly identified in the sentence (as in Gildea and Palmer (2002)). Even if the head word is given the correct label, the boundaries of the entire argument may be different from those given in the PropBank annotation. Since constituents in CCGbank do not always match those in PropBank, even the CCG gold standard parses obtain comparatively low scores according to this metric. This is exacerbated when automatic parses are considered.</Paragraph>
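    <Paragraph position="6"> The difference between the two scoring regimes can be illustrated with a minimal sketch. It is not the scoring script used for the experiments: the Argument class, the helper names (head_key, boundary_key, score), and the toy data are all hypothetical, and the sketch assumes that arguments of a single predicate are compared as sets of labeled items.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Argument:
    """One labeled argument of a predicate (hypothetical representation)."""
    label: str  # role label, e.g. "ARG0", "ARG1", "ARGM-TMP"
    start: int  # index of the first token of the argument span
    end: int    # index of the last token of the argument span (inclusive)
    head: int   # index of the head word of the argument


def head_key(arg):
    # Head-word scoring: only the label and the head word must match.
    return (arg.label, arg.head)


def boundary_key(arg):
    # Boundary scoring: the label and both span boundaries must match.
    return (arg.label, arg.start, arg.end)


def score(gold, predicted, key):
    """Return (precision, recall, f_score) under the given matching key."""
    gold_keys = {key(a) for a in gold}
    pred_keys = {key(a) for a in predicted}
    correct = len(gold_keys.intersection(pred_keys))
    precision = correct / len(pred_keys) if pred_keys else 0.0
    recall = correct / len(gold_keys) if gold_keys else 0.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_score


# Toy data, invented for illustration (not taken from the paper).
# The predicted ARG1 has the right head word but ends one token early.
gold = [Argument("ARG0", 0, 1, 1), Argument("ARG1", 3, 6, 4)]
predicted = [Argument("ARG0", 0, 1, 1), Argument("ARG1", 3, 5, 4)]

head_scores = score(gold, predicted, head_key)
boundary_scores = score(gold, predicted, boundary_key)
```

In this toy case the predicted ARG1 counts as correct under head-word scoring but as an error under boundary scoring, which is the kind of discrepancy that Table 3 quantifies.</Paragraph>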
  </Section>
</Paper>