<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-2605">
  <Title>Discourse Parsing: Learning FOL Rules based on Rich Verb Semantic Representations to automatically label Rhetorical Relations</Title>
  <Section position="2" start_page="0" end_page="33" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> The availability of corpora annotated with syntactic information has facilitated the use of probabilistic models on tasks such as syntactic parsing. Current state-of-the-art syntactic parsers reach accuracies between 86% and 90%, as measured by different types of precision and recall (for more details, see Collins (2003)). Recent semantic (Kingsbury and Palmer, 2002) and discourse (Carlson et al., 2003) annotation projects are paving the way for developments in semantic and discourse parsing as well. However, unlike syntactic parsing, discourse parsing has yet to see significant development.</Paragraph>
    <Paragraph position="1"> Previous work on discourse parsing (Soricut and Marcu, 2003; Forbes et al., 2001) has focused on syntactic and lexical features only. However, discourse relations connect clauses or sentences, and hence descriptions of events and states. It makes linguistic sense that the semantics of the two clauses (generally built around the semantics of the verbs, composed with that of their arguments) affects the discourse relation(s) connecting the clauses. This may be even more evident in our instructional domain, where relations derived from planning, such as Precondition-Act, may relate clauses.</Paragraph>
    <Paragraph position="2"> Of course, since semantic information is hard to come by, it is not surprising that previous work on discourse parsing did not use it, or used only shallow word-level ontological semantics as specified in WordNet (Polanyi et al., 2004). But when rich sentence-level semantics is available, it makes sense to experiment with it for discourse parsing.</Paragraph>
    <Paragraph position="3"> A second major difficulty with using such rich verb semantic information is that it is represented using complex data structures. Traditional Machine Learning methods cannot handle highly structured data such as First Order Logic (FOL), a representation well suited to sentence-level semantics. Such FOL representations cannot be reduced to a vector of attribute/value pairs, as the relations and interdependencies that exist among the predicates would be lost.</Paragraph>
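The loss of structure under an attribute/value encoding can be illustrated with a minimal Python sketch. The predicate names, the variable names, and the flattening scheme below are illustrative assumptions only; they are not taken from VerbNet or from the system described in this paper. Two sets of literals use the same predicates but differ in whether the two literals share an event variable. The flat encoding cannot tell them apart, while a relational view can.

```python
from collections import Counter

# Hypothetical FOL-style literals written as (predicate, arg1, arg2).
# Predicate and variable names are illustrative only.
linked = [("cause", "X", "E"), ("motion", "E", "Y")]      # same event E in both literals
unlinked = [("cause", "X", "E1"), ("motion", "E2", "Y")]  # two unrelated events


def flatten(literals):
    """Attribute/value view: one attribute per predicate, valued by its count."""
    return dict(Counter(pred for pred, *_ in literals))


def shared_arguments(literals):
    """Relational view: arguments that occur in more than one literal."""
    counts = Counter(arg for _, *args in literals for arg in set(args))
    return {arg for arg, n in counts.items() if n != 1}


if __name__ == "__main__":
    # Both clause descriptions collapse to the same flat vector.
    print(flatten(linked) == flatten(unlinked))
    # Only the relational view exposes the shared event variable.
    print(shared_arguments(linked), shared_arguments(unlinked))
```

In this sketch the flat vectors are identical, so any learner that sees only those vectors must treat the two descriptions alike, whereas a learner that operates on the literals themselves can condition on the shared variable.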
    <Paragraph position="4"> Inductive Logic Programming (ILP) can learn structured descriptions since it learns FOL descriptions. In this paper, we present our first steps using ILP to learn semantic descriptions of discourse relations. Also of relevance to the topic of this workshop is that discourse structure is inherently highly structured, since it is generally described in hierarchical terms: basic units of analysis, generally clauses, are related by discourse relations, resulting in more complex units, which in turn can be related via discourse relations. At the moment, we do not yet address the problem of parsing at higher levels of discourse.</Paragraph>
    <Paragraph position="5"> We intend to build on the work we present in this paper to achieve that goal.</Paragraph>
    <Paragraph position="6"> The task of discourse parsing can be divided into two disjoint sub-problems (Soricut and Marcu, 2003; Polanyi et al., 2004): the automatic identification of segment boundaries and the labeling of rhetorical relations. Though we consider automatic segmentation to be an important part of discourse parsing, we have focused entirely on the latter problem of automatically labeling rhetorical relations [...] semantics1 of elementary discourse units (EDUs)2 based on VerbNet (Kipper et al., 2000) as background knowledge and manually annotated rhetorical relations as training examples. It is trained on far fewer examples than the state-of-the-art syntax-based discourse parser (Soricut and Marcu, 2003). Nevertheless, it achieves a comparable level of performance with an F-Score of 60.24. Figure 1 shows a block diagram of SemDP's system architecture. Segmentation, annotation of rhetorical relations and parsing constitute the data collection phase of the system. Learning is accomplished using an ILP-based system, Progol (Muggleton, 1995). As can be seen in Figure 1, Progol takes as input both rich verb semantic information of pairs of EDUs and the rhetorical relations between them. The goal was to learn rules using the semantic information from pairs of EDUs as in Example 1: (1) EDU1: &quot;Sometimes, you can add a liquid to the water&quot; EDU2: &quot;to hasten the process&quot; relation(EDU1,EDU2,&quot;Act:goal&quot;).</Paragraph>
    <Paragraph position="7"> to automatically label unseen examples with the correct rhetorical relation.</Paragraph>
    <Paragraph position="8"> The rest of the paper is organized as follows.</Paragraph>
    <Paragraph position="9"> Section 2 describes our data collection methodology. In section 3, Progol, the ILP system that we used to induce rules for discourse parsing is detailed. Evaluation results are presented in section 4 followed by the conclusion in section 5.</Paragraph>
    <Paragraph position="10"> 1 The semantic information we used is composed of VerbNet semantic predicates that capture event semantics as well as thematic roles.</Paragraph>
    <Paragraph position="11"> 2 EDUs are minimal discourse units produced as a result of discourse segmentation.</Paragraph>
  </Section>
</Paper>