File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/04/w04-2703_intro.xml
Size: 4,548 bytes
Last Modified: 2025-10-06 14:02:45
<?xml version="1.0" standalone="yes"?> <Paper uid="W04-2703"> <Title>Annotating Discourse Connectives And Their Arguments</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> Large-scale annotated corpora have played a critical role in speech and natural language research. The Penn TreeBank (PTB) is an example of such a resource with worldwide impact on natural language processing (Marcus et al., 1993). However, the PTB deals with text only at the sentence level. With the demand for more powerful NLP applications comes a need for greater richness in annotation. At the sentence level, Penn Propbank is adding predicate-argument annotation to sentences in PTB (Kingsbury and Palmer, 2002). At the discourse level, there are efforts to produce corpora annotated with rhetorical relations (Carlson et al., 2003). This paper describes a more basic discourse-level annotation project - the Penn Discourse TreeBank (PDTB) - that aims to produce a large-scale corpus in which discourse connectives are annotated, along with their arguments.</Paragraph> <Paragraph position="1"> There have been several approaches to describing discourse in terms of discourse relations (Mann and Thompson, 1988; Asher and Lascarides, 1998; Polanyi and van den Berg, 1996). In these approaches, the additional meaning that the discourse contributes beyond the sentence derives from discourse relations. Specification of the discourse relations for a discourse thus constitutes a description of a certain level of discourse structure.</Paragraph> <Paragraph position="2"> Rather than starting from (abstract) discourse relations, we describe an approach to annotating a large-scale corpus using a more basic characterisation of discourse structure, based on discourse connectives and their arguments. The motivation for such an approach stems from work by Webber and Joshi (1998), Webber et al. (1999a), Webber et al. 
(2000), which integrates sentence-level structures with discourse-level structure (using tree-adjoining grammars for both cases, LTAG and DLTAG, respectively).1 This allows structural composition and its associated semantic composition at the sentence level to be smoothly carried over to the discourse level, a goal also shared by Gardent (1997), Schilder (1997) and Polanyi and van den Berg (1996), among others.2 Discourse connectives and their arguments can be annotated with high reliability (cf. Section 4). This is not surprising, given that the task resembles that of annotating verbs and their arguments at the sentence level (Kingsbury and Palmer, 2002). In fact, we use a fine-grained, lexically grounded annotation in which argument labels are specific to the discourse connectives involved, in much the same way as in Kingsbury and Palmer (2002).</Paragraph> <Paragraph position="3"> 1 In the PDTB annotations, we have deliberately adopted a policy of making the annotations independent of the DLTAG framework for two reasons: (1) to make the annotated corpus widely useful to researchers working in different frameworks and (2) to make the annotators' task easier, thereby increasing inter-annotator reliability. 2 However, the approaches in Gardent (1997), Schilder (1997), and Polanyi and van den Berg (1996) are different in two ways: a) the process by which discourse derives compositional aspects of meaning is considered entirely separate from how clauses do so, and b) only two mechanisms are used for deriving discourse semantics - compositional semantics and inference.</Paragraph> <Paragraph position="4"> In contrast, a recent attempt (Carlson et al., 2003) at using RST-type relations for annotating a much smaller corpus has already revealed difficulties involved in reliably annotating more abstract discourse relations. 
Moreover, this type of annotation does not contain any record of the basis on which a relation was assigned.</Paragraph> <Paragraph position="5"> The paper is organized as follows. Section 2 gives a brief overview of the fundamental ideas underlying the design of the PDTB annotation. Section 3 describes the annotation project in detail, including the size of the corpus, the annotations completed so far, and the annotation instructions as formulated in the guidelines. Section 4 presents an analysis of the current annotations and the results on inter-annotator agreement. Section 5 concludes with a summary of the work.</Paragraph> </Section> </Paper>