<?xml version="1.0" standalone="yes"?>
<Paper uid="W00-1426">
  <Title>Can text structure be incompatible with rhetorical structure?</Title>
  <Section position="4" start_page="194" end_page="195" type="metho">
    <SectionTitle>
2 Rhetorical structure and text structure
</SectionTitle>
    <Paragraph position="0"> To distinguish clearly between RhetRep and DocRep, we need to define the kinds of information that should be included in the two representations. Bateman and Rondhuis (1997) compare several approaches to rhetorical representation, citing in particular RST (Mann and Thompson, 1988) and Segmented Discourse Representation Theory (SDRT; Asher, 1993). These approaches share the idea that rhetorical representations are composed of propositions linked by rhetorical relations; SDRT also includes the logical apparatus of DRT, thus covering notions like necessity and logical scope, which are missing from RST. For the most part, NLG applications have used the RST framework, adapted in various ways; the most common representation, proposed also as the RAGS standard, is that of a tree in which terminal nodes represent elementary propositions, while non-terminal nodes represent rhetorical relationships. This representation, proposed for example by Scott and Souza (1990), is illustrated by figure 1, which might be realized by the following passage: (1) Elixir occasionally provokes a mild allergic reaction [B], because it contains gestodene [C].</Paragraph>
    <Paragraph position="1"> However, Elixir has no serious side-effects [A].</Paragraph>
    <Paragraph position="2"> Assuming an RST-based framework, an important issue is whether the rhetorical representation should already imply a linear order. Most researchers have followed Scott and Souza in assuming that linear order should be left unspecified; it is during the transition to the document representation that the material is distributed among linguistic units (or perhaps diagrams, in a multimedia document) arranged in a specific order. Thus the cause relation in figure 1, for example, could be realized with nucleus first, or satellite first, or satellite embedded within nucleus: (2a) Elixir occasionally provokes a mild allergic reaction [B], because it contains gestodene [C].</Paragraph>
    <Paragraph position="3"> (2b) Because it contains gestodene [C], Elixir occasionally provokes a mild allergic reaction [B].</Paragraph>
    <Paragraph position="4"> (2c) Elixir, because it contains gestodene [C], occasionally provokes a mild allergic reaction [B].</Paragraph>
    <Paragraph position="5"> In the RAGS proposal, which aims to extract a useful common approach from current work in NLG, the DocRep comprises an ordered tree corresponding roughly to the 'logical markup' in notations like HTML and LaTeX. More precisely, a distinction is made between abstract and concrete levels of representation, where the abstract representation corresponds to logical markup (e.g., concepts like 'paragraph' and 'emphasis'), while the concrete representation also covers graphical markup (concepts like 'vertical space' and 'bold face'). In terms of this distinction, it is the AbsDocRep that is specified during text planning; graphical markup can be deferred to a later formatting stage.</Paragraph>
    <Paragraph position="6"> Figure 2 shows two alternative document representations expressing the rhetorical content in figure 1. Following Power (2000), the nodes of the tree are labelled with 'text-categories' using a system that extends the 'text grammar' proposed by Nunberg (1990). (Footnote 1: Nunberg's terms 'text-phrase', 'text-clause', and 'text-sentence' refer to textual categories, which should not be confused with their syntactic counterparts. They are defined not by syntactic formation rules but by their role in text-structure, which is typically marked as follows: text-sentences begin with a capital letter and end in a full stop; text-clauses are separated by semicolons; text-phrases are constituents of text-clauses, sometimes separated by commas, although within text-clauses the hierarchical structure is expressed mainly through syntax.) These document representations can now be passed to the tactical generator for the syntactic realization of the elementary propositions; the resulting texts might be as follows: (3a) Elixir occasionally provokes a mild allergic reaction [B], because it contains gestodene [C].</Paragraph>
    <Paragraph position="7"> However, Elixir has no serious side-effects [A].</Paragraph>
    <Paragraph position="8"> (3b) Elixir contains gestodene [C]; consequently, it occasionally provokes a mild allergic reaction [B]; however, Elixir has no serious side-effects [A].</Paragraph>
  </Section>
  <Section position="5" start_page="195" end_page="196" type="metho">
    <SectionTitle>
3 Structural compatibility
</SectionTitle>
    <Paragraph position="0"> Summarising the argument so far, we have made three main points: • Rhetorical structure has typically been represented by unordered RST trees such as figure 1.</Paragraph>
    <Paragraph position="1"> • Document structure, which conveys information similar to logical markup in HTML, can be represented by ordered trees in which nodes are labelled with text-categories (figure 2).</Paragraph>
    <Paragraph position="2"> • A given rhetorical representation can be expressed by a variety of different document representations, in which the propositions occur in different orders, and in different text-category configurations, and the rhetorical relations are expressed by different connectives.</Paragraph>
    <Paragraph position="3"> This formulation of the problem raises an obvious question: how can we characterize the set of document representations that adequately realize a given rhetorical representation? Elsewhere (Power, 2000), we have argued that an adequate realization must meet three conditions: Correct content: All propositions and rhetorical relations must be expressed.</Paragraph>
    <Paragraph position="4"> Well-formed structure: General formation rules for document structure must be respected (e.g. a text-sentence cannot contain a paragraph, unless the paragraph is indented).</Paragraph>
    <Paragraph position="5"> Structural compatibility: The document representation must organize the propositions in a way that is compatible with their organization in rhetorical structure.</Paragraph>
    <Paragraph position="6"> The first two conditions are relatively straightforward, but what is meant, exactly, by 'structural compatibility'? Assuming that we are comparing two trees, the strongest notion of compatibility is isomorphism, which can be defined for our purposes as follows: DocRep is isomorphic with RhetRep if they group the elementary propositions in exactly the same way.</Paragraph>
    <Paragraph position="7"> More formally, every set of propositions that is dominated by a node in DocRep should be dominated by a node in RhetRep, and vice-versa.</Paragraph>
    <Paragraph position="8"> Under this definition, the rhetorical representation in figure 1 is isomorphic with the document representation in figure 2a, but not with that in figure 2b: • Proceeding top-down and left-to-right, the five nodes in figure 1 dominate the proposition sets {A,B,C}, {A}, {B,C}, {B}, and {C}. • Ignoring nodes that express discourse connectives, the nodes in figure 2a dominate the proposition sets {A,B,C}, {B,C}, {B}, {C} (twice), and {A} (twice). These are exactly the same sets that were obtained for figure 1.</Paragraph>
    <Paragraph position="9"> • The corresponding sets for figure 2b are {A,B,C}, {C}, {B} (twice), and {A} (twice). Since the set {B,C} is missing from this list, there is a grouping in figure 1 that is not realized in figure 2b, so these representations are not isomorphic.</Paragraph>
    <Paragraph position="10"> Since structures like figure 2b are common, isomorphism seems too strong a constraint; we have therefore proposed (Power, 2000) the following weaker notion of compatibility: DocRep is compatible with RhetRep if every grouping of the elementary propositions in DocRep is also found in RhetRep.</Paragraph>
    <Paragraph position="11"> Formally, every set of propositions that is dominated by a node in DocRep should be dominated by a node in RhetRep, but the converse is not required. Under this constraint, we allow the document representation to omit rhetorical groupings, but not to introduce new ones. The resulting structures may be ambiguous, but this will not matter if the unexpressed rhetorical relationships can be inferred from the content. [Figure 3: tree diagram; the recoverable node labels include "you forget to take your tablet" and "Go on as before".]</Paragraph>
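    <Paragraph> The definitions of isomorphism and compatibility given above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: it assumes that trees are encoded as nested tuples whose leaves are proposition labels, and the names leaves, groupings, is_isomorphic, is_compatible, FIG1, FIG2A and FIG2B are hypothetical. The three encodings are assumptions chosen only so that their dominated proposition sets match those listed in the text for figures 1, 2a and 2b; they are not transcriptions of the figures.

```python
# Trees are nested tuples; a leaf is a proposition label such as "A".
# The encodings below are assumptions that reproduce the proposition
# sets listed in the text, not transcriptions of the original figures.


def leaves(tree):
    """Return the set of propositions dominated by a node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def groupings(tree):
    """Return every proposition set dominated by some node of the tree."""
    found = {leaves(tree)}
    if not isinstance(tree, str):
        for child in tree:
            found |= groupings(child)
    return found


def is_isomorphic(doc_rep, rhet_rep):
    """Both trees group the propositions in exactly the same way."""
    return groupings(doc_rep) == groupings(rhet_rep)


def is_compatible(doc_rep, rhet_rep):
    """Every grouping in DocRep also occurs in RhetRep (not vice versa)."""
    return groupings(doc_rep).issubset(groupings(rhet_rep))


FIG1 = ("A", ("B", "C"))           # RhetRep: A related to the cause pair (B, C)
FIG2A = (("B", ("C",)), ("A",))    # DocRep: a unit grouping B and C, then A
FIG2B = (("C",), ("B",), ("A",))   # DocRep: three sister text-clauses

print(is_isomorphic(FIG2A, FIG1), is_compatible(FIG2A, FIG1))
print(is_isomorphic(FIG2B, FIG1), is_compatible(FIG2B, FIG1))
```

Under these assumed encodings the first line printed is True True and the second is False True, which mirrors the claim in the text that figure 2b is compatible with figure 1 without being isomorphic with it. Comparing sets of groupings deliberately ignores linear order and repeated sets.</Paragraph>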
  </Section>
  <Section position="6" start_page="196" end_page="198" type="metho">
    <SectionTitle>
4 Extraposition
</SectionTitle>
    <Paragraph position="0"> The compatibility rule may be a useful text-planning heuristic, but as a constraint on adequacy it still seems too strong. Looking through our corpus of patient information leaflets, we have noticed some exceptions, especially in passages giving conditional instructions: (4) If you forget to take your tablet [A], take another as soon as you remember [B] or wait until it is time to take your next dose [C].</Paragraph>
    <Paragraph position="1"> Then go on as before [D].</Paragraph>
    <Paragraph position="2"> From the point of view of document structure, this passage is a paragraph comprising two text-sentences: thus the proposition D is separated from the other three propositions, which are grouped in the first sentence. However, rhetorically speaking, D belongs to the consequent of the conditional: it is the final step of the plan that should be activated if the patient forgets to take a dose (figure 3). Compatibility is violated because the DocRep contains a node (the first text-sentence) dominating the proposition set {A, B, C}, which is not dominated by any node in figure 3.</Paragraph>
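    <Paragraph> The violation just described can be located mechanically. The sketch below is an illustration under stated assumptions, not part of the original analysis: trees are nested tuples with proposition labels as leaves, the names leaves, groupings, violations, RHET and DOC are hypothetical, and the two encodings are guesses at the structures of figure 3 and of passage (4) that follow the description in the text.

```python
def leaves(tree):
    """Return the set of propositions dominated by a node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def groupings(tree):
    """Return every proposition set dominated by some node of the tree."""
    found = {leaves(tree)}
    if not isinstance(tree, str):
        for child in tree:
            found |= groupings(child)
    return found


def violations(doc_rep, rhet_rep):
    """Groupings introduced by DocRep that RhetRep does not contain."""
    return groupings(doc_rep) - groupings(rhet_rep)


# Assumed encoding of figure 3: antecedent A; consequent = (B or C), then D.
RHET = ("A", (("B", "C"), "D"))
# Assumed encoding of passage (4): a paragraph of two text-sentences.
DOC = (("A", ("B", "C")), ("D",))

for group in violations(DOC, RHET):
    print(sorted(group))
```

With these encodings the only grouping reported is ['A', 'B', 'C'], the set dominated by the first text-sentence.</Paragraph>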
    <Paragraph position="3"> Such examples might be explained as the result of loose punctuation or layout, perhaps through imitation of the patterns of conversation, in which extra material is often tagged on as an afterthought. Thus proposition D remains grouped with B and C (they occur consecutively), but through a minor relaxation of normal punctuation it has been separated by a full stop rather than a comma. However, this explanation fails to cover variations of the example in which the propositions in the consequent are not realized consecutively in the DocRep: (5) Consult your doctor immediately [A] if a rash develops [B]. It might become seriously infected [C].</Paragraph>
    <Paragraph position="4"> In this example, A must be grouped rhetorically with C rather than with B, unless we take the radical step of allowing rhetorical structure to contradict logical structure. The proposition C cannot be logically conjoined with the conditional because it contains a hypothetical discourse referent (the rash) that is bound to the antecedent, and is therefore inaccessible outside the conditional.</Paragraph>
    <Paragraph position="5"> If passages of this kind are not artifacts of loose punctuation, why do they occur? A plausible reason, we suggest, is that some complex rhetorical patterns cannot easily be realized in a way that maintains structural compatibility, usually because text-clauses are overloaded. Conditionals are especially prone to this problem because the only common discourse connective ('if') is a subordinating conjunction which can only link spans within a syntactic sentence (and thus within a text-clause). If either the antecedent or the consequent is complex, the author is faced with a tricky problem. We have found examples in patient information leaflets of conditional sentences so long that they are almost incomprehensible. More skilled authors, however, succeed in presenting the material clearly either by using layout (e.g., a complex antecedent is presented as an indented list), or by a trick of rhetorical reorganization that we will call extraposition. It is this trick that introduces an incompatibility between RhetRep and DocRep.</Paragraph>
    <Paragraph position="6"> Extraposition typically occurs when a rhetorical representation R contains a complex embedded constituent C. To respect structural compatibility, R should be realized by a document unit that contains the realization of C; instead, in extraposition, a document unit realizing R - C is coordinated with one realizing C, so that the extraposed material C is raised in the DocRep to the same level as R. To reconstruct the meaning of the whole passage, the reader has to plug C back into R. In most cases, the author facilitates this task through an explicit deictic reference to the extraposed material (Bouayad-Agha et al., 2000): (6) If you have any of the following, tell your doctor: difficulty in breathing; abdominal pains; nausea or vomiting. Occasionally, however, the author leaves the extraposition implicit, assuming that the reader can infer the correct location of C within R from the propositional content. In such cases, the extraposition looks like an afterthought, because the unit realizing R - C contains no signal that a gap in its content will be filled in later.</Paragraph>
    <Paragraph position="7"> We have also come across rare examples of another kind of incompatibility in which Marcu's (1996) principle of nuclearity is violated by grouping together two satellites which have the same nucleus. Suppose that the rhetorical representation in figure 1 is realized by the following passage, in a context in which the reader knows nothing about gestodene: (7) Although Elixir has no serious side-effects [A], it contains gestodene [C]. Consequently, it occasionally provokes a mild allergic reaction [B].</Paragraph>
    <Paragraph position="8"> The apparent concession relation between A and C here is paradoxical, since in rhetorical structure they are unrelated. Of course a contrast between A and C might be perceived by a medical expert; however, one can construct similar examples in which the apparent relation is even less plausible: (8a) Although we usually work from nine to five [A], today is Friday [C]. Consequently, we can go home early [B].</Paragraph>
    <Paragraph position="9"> This may be rather loose, but many people find it acceptable. It could be explained as a rhetorical trick in which the sheer paradox of the concession serves as a signal that it is incomplete. The device might be spelled out as follows: Although Elixir has no serious side-effects [A], there exists a contrasting state of affairs resulting from the fact that it contains gestodene [C]. This state of affairs is that it occasionally provokes a mild allergic reaction [B].</Paragraph>
    <Paragraph position="10"> Unlike the conditional examples above, this device works only when the rhetorically grouped propositions B and C are consecutive in the DocRep. Thus whatever view is taken of example (8a), everyone finds its variant (8b) much worse: (8b) # Today is Friday [C] although we usually work from nine to five [A]. Consequently, we can go home early [B].</Paragraph>
  </Section>
  <Section position="7" start_page="198" end_page="199" type="metho">
    <SectionTitle>
5 Implications for NLG
</SectionTitle>
    <Paragraph position="0"> For many NLG applications, the notion of compatibility defined above is a useful hard constraint; even if violations of this constraint are sometimes acceptable, they are not essential.</Paragraph>
    <Paragraph position="1"> However, for some kinds of material (e.g., complex instructions), extraposition is a convenient rhetorical device which might improve the readability of the generated texts, so it is worth considering how a text planner might be configured so as to allow solutions that violate compatibility. In terms of the RAGS framework, there are broadly two possible approaches. First, we could introduce incompatibility by defining transformations on the RhetRep; alternatively, we could relax the constraints governing the transition from RhetRep to DocRep.</Paragraph>
    <Paragraph position="2"> The RAGS proposal (1999) allows for rhetorical transformations through a distinction between abstract and concrete rhetorical representations. The abstract representation AbsRhetRep expresses the rhetorical content of the underlying message, while the concrete RhetRep expresses the rhetorical structure directly realized in the text and corresponds to the representation used by Scott and Souza (1990) to discuss textual realisation. If RhetRep is incompatible with AbsRhetRep, the text structure DocRep will also be incompatible with AbsRhetRep, even though the rules for realizing rhetorical structure by document structure are themselves compatibility-preserving. Transformation operations are also used by Marcu (2000) to map Japanese rhetorical structures onto English-like rhetorical structures, but these are mappings between two RhetReps rather than from an AbsRhetRep to a RhetRep.</Paragraph>
    <Paragraph position="3"> If transformations are allowed, there are obvious dangers that the message will be expressed in such a distorted way that the reader cannot recover the original intention. For this reason, rhetorical transformations must be defined with care. A fairly safe option would appear to be the extraposition of a proposition elaborating the antecedent of a conditional, even though such a transformation would violate Marcu's (1996) 'nuclearity principle' (assuming that the antecedent is regarded as the satellite). The following examples show that this transformation leads to acceptable texts regardless of the order of nucleus and satellite within the conditional: (9a) Do not use Elixir if you have had an allergic reaction to Elixir. An allergic reaction may be recognised as a rash, itching or shortness of breath.</Paragraph>
    <Paragraph position="4"> (9b) If you have had an allergic reaction to Elixir, do not use Elixir. An allergic reaction may be recognised as a rash, itching or shortness of breath.</Paragraph>
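    <Paragraph> The extraposition transformation illustrated by (9a) and (9b) can be sketched as an operation on trees. What follows is a hedged illustration, not the RAGS mechanism or the authors' planner: trees are assumed to be nested tuples with proposition labels as leaves, and the names remove, extrapose, leaves, groupings, ABS_RHET and RHET, together with the labels N, S and E, are hypothetical.

```python
def leaves(tree):
    """Return the set of propositions dominated by a node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def groupings(tree):
    """Return every proposition set dominated by some node of the tree."""
    found = {leaves(tree)}
    if not isinstance(tree, str):
        for child in tree:
            found |= groupings(child)
    return found


def remove(tree, target):
    """Return the tree without subtrees equal to target (None if empty).

    A node left with a single child is replaced by that child.
    """
    if tree == target:
        return None
    if isinstance(tree, str):
        return tree
    kept = [remove(child, target) for child in tree]
    kept = [child for child in kept if child is not None]
    if not kept:
        return None
    return kept[0] if len(kept) == 1 else tuple(kept)


def extrapose(tree, target):
    """Coordinate what remains of the tree with the extraposed constituent."""
    rest = remove(tree, target)
    if rest is None:
        raise ValueError("cannot extrapose the whole tree")
    return (rest, target)


# Hypothetical labels for example (9): N = "do not use Elixir",
# S = "you have had an allergic reaction", E = the elaboration of S.
ABS_RHET = ("N", ("S", "E"))
RHET = extrapose(ABS_RHET, "E")

print(RHET)
print(groupings(RHET).issubset(groupings(ABS_RHET)))
```

Under these assumptions the result is (('N', 'S'), 'E'), and the check prints False because the new grouping {N, S} does not occur in the original tree. The check looks only at groupings, so it gives the same answer for either order of nucleus and satellite.</Paragraph>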
    <Paragraph position="5"> However, the approach based on rhetorical transformations leads to difficulties when the acceptability of the resulting text depends on linear order as well as grouping. For instance, suppose that we try extraposing the elaboration of a satellite when the main relation is not a conditional, but a concession. The following passages show two texts that might result, but in this case the second version sounds anomalous: even if they are not grouped together in the DocRep, the satellite and its elaboration at least need to be consecutive.</Paragraph>
    <Paragraph position="6"> (10a) You should not stop taking Elixir, even though you might experience some mild effects. For example, feelings of dizziness and nausea are very common at the beginning of treatment.</Paragraph>
    <Paragraph position="7"> (10b) # Even though you might experience some mild effects at the beginning of the treatment, you should not stop taking Elixir.</Paragraph>
    <Paragraph position="8"> For example, feelings of dizziness and nausea are very common at the beginning of treatment.</Paragraph>
    <Paragraph position="9"> A transformation from AbsRhetRep to RhetRep cannot distinguish these cases, so that 10a is allowed while 10b is prohibited, unless the RhetRep is at least partially specified for linear order. Adhering strictly to the RAGS framework, where linear order is specified only in AbsDocRep, one would have to adopt the alternative of building an incompatible AbsDocRep from RhetRep, constraining the linear order at this stage.</Paragraph>
  </Section>
</Paper>