<?xml version="1.0" standalone="yes"?> <Paper uid="W98-1124"> <Title>Improving summarization through rhetorical parsing tuning</Title> <Section position="3" start_page="206" end_page="207" type="intro"> <SectionTitle> 2 Background work </SectionTitle> <Paragraph position="0"> RST. The discourse theory that we use is Rhetorical Structure Theory (RST) (Mann and Thompson, 1988). Central to RST is the notion of rhetorical relation, which is a relation that holds between two non-overlapping text spans called NUCLEUS and SATELLITE.</Paragraph> <Paragraph position="1"> (There are a few exceptions to this rule: some relations, such as CONTRAST, are multinuclear.) The distinction between nuclei and satellites comes from the empirical observation that the nucleus expresses what is more essential to the writer's purpose than the satellite, and that the nucleus of a rhetorical relation is comprehensible independent of the satellite, but not vice versa.</Paragraph> <Paragraph position="2"> Text coherence in RST is assumed to arise from a set of constraints. The constraints operate on the nucleus, on the satellite, and on the combination of nucleus and satellite. For example, an EVIDENCE relation holds between the nucleus (labelled as 5 in text (1), which is shown below) and the satellite (labelled as 6 in text (1)), because the nucleus presents some information that the writer believes to be insufficiently supported to be accepted by the reader; the satellite presents some information that is thought to be believed by the reader or that is credible to her; and the comprehension of the satellite increases the reader's belief in the nucleus. Rhetorical relations can be assembled into rhetorical structure trees (RS-trees) by recursively applying individual relations to spans that range in size from one clause-like unit to the whole text.</Paragraph> <Paragraph position="3"> Rhetorical parsing.
Recent developments in computational linguistics have created the means for the automatic derivation of rhetorical structures of unrestricted texts. For example, when the text shown in (1), below, is given as input to the rhetorical parsing algorithm that is discussed in detail by Marcu (1997b; 1997c), it is broken into ten elementary units (those surrounded by square brackets). The rhetorical parsing algorithm then uses cue phrases and a simple notion of semantic similarity in order to hypothesize rhetorical relations among the elementary units. Eventually, the algorithm derives the rhetorical structure tree shown in figure 1.</Paragraph> <Paragraph position="4"> (1) [With its distant orbit (50 percent farther from the sun than Earth) and slim atmospheric blanket, 1] [Mars experiences frigid weather conditions. 2] [Surface temperatures typically average about -60 degrees Celsius (-76 degrees Fahrenheit) at the equator and can dip to -123 degrees C near the poles. 3] [Only the midday sun at tropical latitudes is warm enough to thaw ice on occasion, 4] [but any liquid water formed in this way would evaporate almost instantly 5] [because of the low atmospheric pressure. 6] [Although the atmosphere holds a small amount of water, and water-ice clouds sometimes develop, 7] [most Martian weather involves blowing dust or carbon dioxide. 8] [Each winter, for example, a blizzard of frozen carbon dioxide rages over one pole, and a few meters of this dry-ice snow accumulate as previously frozen carbon dioxide evaporates from the opposite polar cap. 9] [Yet even on the summer pole, where the sun remains in the sky all day long, temperatures never warm enough to melt frozen water. 10] Figure 1: The rhetorical structure tree derived by the rhetorical parser (Marcu, 1997c) for text (1).</Paragraph> <Paragraph position="5"> This discourse structure obeys the constraints put forth by Mann and Thompson (1988) and Marcu (1996). It is a binary tree whose leaves are the elementary textual units in (1).
Each node in the tree plays either the role of nucleus or satellite. In figure 1, nuclei are represented by solid boxes, while satellites are represented by dotted boxes. The internal nodes of the discourse structure are labelled with names of rhetorical relations and with numbers. The numbers denote the salient or promotion units of that node; they correspond to the most important units in the subsumed text span. They are determined in a bottom-up fashion, as follows: the salient unit of a leaf is the leaf itself; the salient units of an internal node are given by the union of the salient units of its immediate nuclear children. For example, the node that spans units [4-6] has salient units 4 and 5 because the immediate children of the node labelled with relation CONTRAST are both nuclei, which have promotion units 4 and 5 respectively. The root node, which spans units [1-10], has 2 as its salient unit because, of its immediate children, only the node that corresponds to span [1-6] is a nucleus, and the salient unit of that node is 2. In figure 1, parent nodes are linked to subordinated nuclei by solid arrows; parent nodes are linked to subordinated satellites by dotted lines.</Paragraph> <Paragraph position="6"> Discourse-based summarization. Once a discourse structure such as that shown in figure 1 is created, we can derive a partial ordering of the important units in the original text by considering that units promoted closer to the root are more important than units promoted only to nodes farther from it.
By applying this criterion to the tree in figure 1, we obtain the partial ordering shown in (2), below, because unit 2 is the only promotion unit associated with the root, unit 8 is the only unit found one level below the root, units 3 and 10 are the only units found two levels below the root, and so on.</Paragraph> <Paragraph position="7"> (2) 2 > 8 > 3, 10 > 1, 4, 5, 7, 9 > 6 Using partial ordering (2), we can obtain a summary that contains k% of the original text by selecting the first k% of the units in the partial ordering.</Paragraph> <Paragraph position="8"> By applying this algorithm, Marcu (1997a; 1997c) has built a summarization system that achieved 52.77% recall (at 50.00% precision) on the clause-like units that human judges considered important in a collection of five texts.</Paragraph> </Section> </Paper>
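<!--
Editorial sketch (not part of the original paper). The bottom-up computation of promotion units and the resulting partial ordering described in Section 2 can be illustrated roughly as follows. The Node class and the function names are hypothetical, chosen only for this illustration. The example tree covers only the span [4-6], using the relations the text states for it (CONTRAST over unit 4 and span [5-6], EVIDENCE between units 5 and 6); the full tree of figure 1 is not reproduced here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Node:
    """A node of a rhetorical structure tree (hypothetical representation)."""
    role: str                        # "nucleus" or "satellite"
    unit: Optional[int] = None       # set for leaves only
    relation: Optional[str] = None   # set for internal nodes only
    children: List["Node"] = field(default_factory=list)


def promotion_units(node: Node) -> Set[int]:
    """A leaf promotes itself; an internal node promotes the union of the
    promotion units of its immediate nuclear children."""
    if node.unit is not None:
        return {node.unit}
    promoted: Set[int] = set()
    for child in node.children:
        if child.role == "nucleus":
            promoted |= promotion_units(child)
    return promoted


def partial_ordering(root: Node) -> List[List[int]]:
    """Group units by the depth of the shallowest node that promotes them,
    most important group first."""
    depth_of: Dict[int, int] = {}

    def visit(node: Node, depth: int) -> None:
        for unit in promotion_units(node):
            depth_of[unit] = min(depth, depth_of.get(unit, depth))
        for child in node.children:
            visit(child, depth + 1)

    visit(root, 0)
    levels: Dict[int, List[int]] = {}
    for unit, depth in depth_of.items():
        levels.setdefault(depth, []).append(unit)
    return [sorted(levels[d]) for d in sorted(levels)]


# Subtree for span [4-6] as described in the text. The role given to the
# top node is a placeholder assumption; it does not affect the result.
span_5_6 = Node("nucleus", relation="EVIDENCE",
                children=[Node("nucleus", unit=5), Node("satellite", unit=6)])
span_4_6 = Node("nucleus", relation="CONTRAST",
                children=[Node("nucleus", unit=4), span_5_6])

print(sorted(promotion_units(span_4_6)))  # [4, 5]
print(partial_ordering(span_4_6))         # [[4, 5], [6]]
```

Within this subtree, units 4 and 5 rank above unit 6, which agrees with their relative order in (2). The sketch recomputes promotion sets at every node for brevity; caching them per node would be the natural refinement for larger trees.
-->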