File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/05/i05-6007_intro.xml

Size: 5,067 bytes

Last Modified: 2025-10-06 14:03:02

<?xml version="1.0" standalone="yes"?>
<Paper uid="I05-6007">
  <Title>Syntactic Identification of Attribution in the RST Treebank</Title>
  <Section position="2" start_page="0" end_page="57" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Interest in discourse structure has grown in recent years. A prominent product of this interest is the RST Treebank (Carlson et al., 2002), which imposes hierarchical structures on multi-sentence discourses. Since the texts in the RST Treebank are taken from the syntactically annotated Penn Treebank (Marcus et al., 1993), it is natural to ask how the discourse structures in the RST Treebank relate to the syntactic structures of the Penn Treebank.</Paragraph>
    <Paragraph position="1"> In our view, the most natural relationship would be that discourse structures always relate well-formed syntactic expressions, typically sentences. Discourse trees would then be seen as elaborations of syntactic trees, adding relations between sentential nodes that are not linked by syntactic relations. This would allow discourse structures and syntactic structures to coexist in a combined hierarchical structure.</Paragraph>
    <Paragraph position="2"> Surprisingly, this is not what we found when we examined the syntax-discourse relation in the RST Treebank. A large proportion of relations apply to subsentential spans of text, and these spans may or may not correspond to nodes in the syntax tree.</Paragraph>
    <Paragraph position="3"> Is this complicated relation between syntax and discourse necessary? Our hypothesis is that the subsentential relations in the RST Treebank are in fact redundant; if this is true, it should be possible to infer these relations automatically from Penn Treebank syntactic information alone.</Paragraph>
    <Paragraph position="4"> In this paper, we present the results of an initial study that strongly supports our hypothesis. We examine the Attribution relation, which is of particular interest for three reasons. First, it appears quite frequently in the RST Treebank (15% of all relations, according to Marcu et al., 1999). Second, it always appears within, rather than across, sentence boundaries. Third, it conflicts with Penn Treebank syntax, always relating text spans that do not correspond to nodes in the syntax tree. We describe a system that identifies Attributions using simple, clearly defined syntactic features. This system identifies RST Attributions with precision and recall above 90%, which we take as strong evidence that Attribution is in fact a syntactic relation. The system performs dramatically better than the automatic identification of such relations reported by Soricut and Marcu (2003), where precision and recall were below .76. Furthermore, the human annotator agreement reported for the RST Treebank project is also well below our results, with f-scores no higher than .77 (Soricut and Marcu, 2003). In what follows, we first describe Attributions as they are understood in the RST Treebank project. Next, we present the Attribution identification procedure, followed by our results. We compare these results with related work, as well as with the inter-coder agreement reported in the RST Treebank project. Finally, we discuss plans for future work.</Paragraph>
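The comparison above is stated in terms of precision, recall, and f-score. As a minimal sketch of how these scores relate, assume (hypothetically; the matching criterion is not specified in this section) that each Attribution is encoded as a pair of token-offset spans, (satellite, nucleus), and that a predicted relation counts as correct only if it exactly matches a gold relation. The function name `precision_recall_f1` and the sample offsets are illustrative only.

```python
from typing import Set, Tuple

# Hypothetical encoding: a relation is ((sat_start, sat_end), (nuc_start, nuc_end))
# in token offsets; a prediction counts only if it matches a gold relation exactly.
Span = Tuple[int, int]
Relation = Tuple[Span, Span]


def precision_recall_f1(gold: Set[Relation], predicted: Set[Relation]) -> Tuple[float, float, float]:
    """Exact-match precision, recall, and balanced F-score (harmonic mean)."""
    correct = len(gold.intersection(predicted))
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Illustrative data: one of two predictions matches one of two gold relations.
gold = {((0, 5), (5, 17)), ((20, 23), (23, 30))}
predicted = {((0, 5), (5, 17)), ((40, 42), (42, 50))}
print(precision_recall_f1(gold, predicted))  # → (0.5, 0.5, 0.5)
```

Because the f-score is the harmonic mean of precision and recall, it can only exceed .77 if both precision and recall are reasonably high at the same time.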
    <Paragraph position="5"> 2 Attributions in the RST Treebank
The RST coding manual (Carlson and Marcu, 2001) gives the following definition of Attribution: "Instances of reported speech, both direct and indirect, should be marked for the rhetorical relation of ATTRIBUTION. The satellite is the source of the attribution (a clause containing a reporting verb, or a phrase beginning with 'according to'), and the nucleus is the content of the reported message (which must be in a separate clause). The ATTRIBUTION relation is also used with cognitive predicates, to include feelings, thoughts, hopes, etc."</Paragraph>
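The manual's definition can be read as a surface test on candidate satellites: a phrase beginning with "according to", or a clause containing a reporting verb or cognitive predicate. The sketch below is purely illustrative and is not the identification procedure of this paper, which relies on Penn Treebank syntax; `REPORTING_VERBS` is a hypothetical, deliberately tiny stand-in for a real lexicon, and `looks_like_attribution_source` is an illustrative name.

```python
# Hypothetical, deliberately tiny lexicon of reporting and cognitive predicates.
REPORTING_VERBS = {
    "said", "says", "declared", "reported", "announced",  # reported speech
    "believes", "thinks", "hopes", "feels", "expects",    # cognitive predicates
}


def looks_like_attribution_source(clause: str) -> bool:
    """Return True if the clause matches the manual's description of a satellite:
    a phrase beginning with 'according to', or a clause containing a reporting verb."""
    tokens = clause.lower().replace(",", " ").split()
    if tokens[:2] == ["according", "to"]:
        return True
    return any(token in REPORTING_VERBS for token in tokens)


print(looks_like_attribution_source("The legendary GM chairman declared"))  # → True
print(looks_like_attribution_source("According to analysts,"))              # → True
print(looks_like_attribution_source("his company would make a car"))        # → False
```

A word list like this says nothing about where the satellite ends and the nucleus begins; that boundary is exactly what the syntactic structure discussed next has to supply.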
    <Paragraph position="6"> The following is an example cited in the coding manual: [The legendary GM chairman declared] [that his company would make &quot;a car for every purse and purpose.&quot;] (wsj_1377). In the RST Treebank, the attribution verb is grouped with the subject into a single text span. This span constitutes the Attribution Satellite, while the Nucleus is the SBAR complement of the attribution verb, as shown in Figure 1. This conflicts with the syntactic structure in the Penn Treebank. As shown in Figure 2, the attribution verb is grouped with its SBAR complement, forming a VP, which is related to the subject. The main difference between the two structures concerns the position of the verb: in the RST Treebank, the verb is grouped with the subject, while in the Penn Treebank, it is grouped with the SBAR complement. In the following section, we describe our method for identifying RST Attributions, based on the Penn Treebank syntactic structure.</Paragraph>
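To make the mismatch concrete, the sketch below encodes a simplified Penn Treebank bracketing of the example as nested Python tuples and lists the token span of every constituent. The encoding is hypothetical: the inner clause is flattened and the quotation marks are dropped. The helpers `constituent_spans` and `rst_attribution_spans` are illustrative names, and the mapping handles only the single pattern of an S whose children are a subject NP and a VP made of a verb plus an SBAR. It is not the identification procedure of this paper.

```python
from typing import List, Tuple, Union

Tree = Union[str, tuple]  # a leaf word, or (label, child, child, ...)
Span = Tuple[int, int]    # [start, end) token offsets


def leaves(tree: Tree) -> List[str]:
    """Return the words under a node, left to right."""
    if isinstance(tree, str):
        return [tree]
    words: List[str] = []
    for child in tree[1:]:
        words.extend(leaves(child))
    return words


def constituent_spans(tree: Tree, start: int = 0) -> List[Tuple[str, Span]]:
    """List (label, span) for every labelled node, in pre-order."""
    if isinstance(tree, str):
        return []
    result = [(tree[0], (start, start + len(leaves(tree))))]
    offset = start
    for child in tree[1:]:
        result.extend(constituent_spans(child, offset))
        offset += len(leaves(child))
    return result


def rst_attribution_spans(sentence: Tree) -> Tuple[Span, Span]:
    """Hypothetical mapping for S = (NP subject, VP = (verb, SBAR)), sentence-initial:
    the satellite is subject + verb, the nucleus is the SBAR complement."""
    subject, vp = sentence[1], sentence[2]
    verb, sbar = vp[1], vp[2]
    boundary = len(leaves(subject)) + len(leaves(verb))
    return (0, boundary), (boundary, boundary + len(leaves(sbar)))


# Simplified bracketing of the wsj_1377 example (inner VP flattened).
example = ("S",
           ("NP", "The", "legendary", "GM", "chairman"),
           ("VP", ("VBD", "declared"),
                  ("SBAR", ("IN", "that"),
                           ("S", ("NP", "his", "company"),
                                 ("VP", "would", "make", "a", "car", "for",
                                        "every", "purse", "and", "purpose")))))

spans = {span for _, span in constituent_spans(example)}
satellite, nucleus = rst_attribution_spans(example)
print(satellite, satellite in spans)  # → (0, 5) False
print(nucleus, nucleus in spans)      # → (5, 17) True
```

The RST Satellite (0, 5) covers the subject plus the verb and matches no constituent, whereas the Nucleus (5, 17) coincides with the SBAR node, and the verb together with the SBAR forms the VP (4, 17). This mirrors the contrast between Figure 1 and Figure 2.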
  </Section>
</Paper>