<?xml version="1.0" standalone="yes"?>
<Paper uid="W98-1411">
  <Title>EXPERIMENTS USING STOCHASTIC SEARCH FOR TEXT PLANNING</Title>
  <Section position="4" start_page="99" end_page="99" type="metho">
    <SectionTitle>
3 Evaluating RST trees
</SectionTitle>
    <Paragraph position="0"> A key requirement for the use of any stochastic search approach is the ability to: assess the quality of a possible solution. Thus we are forced to confront *directly the task of evaluating RST trees.</Paragraph>
    <Paragraph position="1"> We assign a candidate tree a score which is the sum of scores for particular features the tree may have. A positive score here indicates a good feature and a negative one indicates a bad one.</Paragraph>
    <Paragraph position="2"> We cannot make any claims to have the best way of evaluating RS trees. The problem is far too complex and our knowledge of the issues involved so meagre that only a token gesture can be made</Paragraph>
    <Paragraph position="4"> at this point. We offer the following evaluation scheme merely so that the basis of our experiments is clear and because we believe that some of the ideas are starting in the right direction. Here are the features that we score for: Topic and Interestingness We assume that the entity that the text is &amp;quot;about&amp;quot;is specified with the input. It is highly desirable that the &amp;quot;top nucleus&amp;quot; (most important nucleus) of the text be about this entity. Also we prefer texts that use interesting relations. We score as follows:  -10 for a top nucleus not mentioning the subject of the text -30 for a joint relation +21 for a relation other than joint and elaboration * Size of Substructures - Scott and de Souza \[Scott and de Souza 90\] say that the greater the amount of intervening text between the propositions of a relation, the more difficult it will be to reconstruct its message. We score as follows: -4 for each fact that will come textually between a satellite and its nucleus Constraints on Information Ordering Our relations have preconditions which are facts that should be conveyed before them. we score as follows: -20 for an unsatisfied precondition for a relation Focus Movement We do nothave a complex model of focus development through the text, though development of such a model would be worthwhile. As McKeown and others have done, we prefer certain transitions over others. If consecutive facts mention the same entities or verb, the prospects for aggregation are greater, and this is usually desirable. We score as follows: -9 for a fact (apart from the first) not mentioning any previously mentioned entity -3 for a fact not mentioning any entity in the previous fact, but whose subject is a previously mentioned entity * +3 for a fact retaining the subject of the last fact as its subject * +3 for a fact using the same verb as the previous one Object Introduction When an entity is first introduced as the subject of a fact, it is usual for that to be a very general statement about the entity. Preferring this introduces a mild schema-like influence to the system. We score as follows: +3 for the first fact with a given entity as subject having verb &amp;quot;is&amp;quot;</Paragraph>
  </Section>
  <Section position="5" start_page="99" end_page="103" type="metho">
    <SectionTitle>
4 Using Stochastic Search for Text Planning
</SectionTitle>
    <Paragraph position="0"> Using the above evaluation metric for RS trees, we have experimented with a range* of stochastic search methods. Space does not permit us to discuss more than one initial experiment in this section. In the next section, we describe a Couple of methods based on genetic algorithms which proved more productive.</Paragraph>
    <Section position="1" start_page="99" end_page="99" type="sub_section">
      <SectionTitle>
4.1 Subtree Swapping
</SectionTitle>
      <Paragraph position="0"> The subtree swapping approach produces new trees by swapping random subtrees in a candidate  solution. It works as follows: 1. Initialise with a tree for each combination of interesting (non-elaboration) relations, with any fact only appearing in one. Make into a complete tree by combining together these relations and any unused facts with &amp;quot;joint&amp;quot; relations (or better ones if available). 2. Repeatedly select a random tree and swap over two random subtrees, repairing all relations.  Add the new tree to the population.</Paragraph>
      <Paragraph position="1"> When two subtrees are swapped over in an RS tree, some of the relations indicated in the tree no longer apply (i:e. those higher relations that make use of the nuclei of the subtrees). These are &amp;quot;repaired&amp;quot; by in each case selecting the &amp;quot;best&amp;quot; valid relation that really relates the top nuclei (i.e. a non-elaboration relation is chosen if possible, otherwise an elaboration if that is valid, with &amp;quot;joint&amp;quot; as a last resort).</Paragraph>
      <Paragraph position="2"> We investigated variations on this algorithml including having initial random balanced trees (including the &amp;quot;best&amp;quot; relation at each point) and focussing the subtree swapping On subtrees that contributed to bad scores, :but the above algorithm was the one that seemed most successful.</Paragraph>
    </Section>
    <Section position="2" start_page="99" end_page="103" type="sub_section">
      <SectionTitle>
4.2 Initial Results
</SectionTitle>
      <Paragraph position="0"> Figure 2 shows an example tex t generated by subtree swapping. Note that we have taken liberties in editing by hand the surface text (for instance, by introducing better referring expressions and aggregation). For clarity, coreference has been indicated by subscripts. The ordering of the material and the use of rhetorical relations &amp;quot;are the only things which are determined by the algorithm.</Paragraph>
      <Paragraph position="1"> Results for subtree swapping are shown together with later results in Figure 5 (the example text shown for subtree swapping is for the item named j-342540). The most obvious feature of these results is the huge variability of the results , which suggests that there are many local maxima in the search space. Looking at the texts produced, we can see a number of problems. If there is only * one way smoothly to include a fact in the text, the chance of finding it by random subtree swapping is very low. The Same goes for fixing other local problems in the text. The introduction of &amp;quot;the previous jewel&amp;quot; is an example of this. This entity can only be introduced elegantly through the fact that it, like the current item, is encrusted with jewels. The text is also still suffering from material getting between a satellite and its nucleus. For instance, there is a relation (indicated by the colon) between &amp;quot;It is encrusted with jewels&amp;quot; and &amp;quot;it has silver links encrusted asymmetrically...&amp;quot;, but this is weakened by the presence of &amp;quot;and is an Organic style jewel&amp;quot; in the middle).</Paragraph>
      <Paragraph position="2"> The trouble is that subtree swapping needs incrementally to acquire all good features not present in whichever initial tree develops into the best solution. It can only acquire these features &amp;quot;acCidentally&amp;quot; and the chances of stumbling on them are small. Different initial trees will contain * different good fragments, and it seems desirable to be able to combine the good parts of different * solutions. This motivates using some sort of Crossover operation that can combine elements of two solutions into a new one \[Goldberg 89\]. But it is not immediately clear how crossover could work on two RS trees, tn particular, two chosen trees will rarely have non-trivial subtrees with equal fringes. Their way of breaking up the material may be so different that it is hard to imagine how one could combine elements of both. i .- !!i</Paragraph>
      <Paragraph position="4"> This jewel/ is made from diamonds, yellow metal, pearls, oxidized white metal and opals.</Paragraph>
      <Paragraph position="5"> It~ was made in 1976 and was made in London.</Paragraph>
      <Paragraph position="6"> This jewe4 draws on natural themes for inspiration: itl uses natural pearls. Iti was made by Flockinger who is an English designer.</Paragraph>
      <Paragraph position="7"> Flockinger lived in London which is a city.</Paragraph>
      <Paragraph position="8"> This jeweli is a necklace and is set with jewels.</Paragraph>
      <Paragraph position="9"> Iti is encrusted with jewels and is an Organic style jewel: iti has silver links encrusted asymetrically with pearls and diamonds.</Paragraph>
      <Paragraph position="10"> Indeed, Organic style jewels are usually encrusted with jewels. Organic style jewels usually draw On natural themes for inspiration and are made up of asymmetrical shapes.</Paragraph>
      <Paragraph position="11"> Organic style jewels usually have a coarse texture.</Paragraph>
      <Paragraph position="12"> * This jewel/is 72.0 cm long.</Paragraph>
      <Paragraph position="13"> The previous \]ewelj has little diamonds scattered around its edges and has an encrusted bezel. Itj is encrusted with jewels: itj features diamonds encrusted on a natural shell.  5 Restricting the Space of RST Trees As a way of making a crossover operation conceivable, our first step has been to reduce the planning problem to that of planning the sequential order of the facts (in a way that echoes Marcu's approach to some extent). We have done this by making certain restrictions on the RS trees that we are prepared to build. In particular, we make the following assumptions: * 1. The nucleus and satellite of a non-joint relation can never be separated. 2. &amp;quot;Joint&amp;quot; relations are used to connect unrelated paragraphs.  With these assumptions, an RS tree is characterised (almost) by the sequence of facts at its leaves. Indeed, we have an algorithm that almost deterministically builds a tree from a sequence of facts, according to these principles. * (The algorithm is not completely deterministic, * because there may be more than one non-elaboration relation that can be used with two given facts as nucleus and satellite - our evaluation function won't, of course, differentiate between these). The algorithm for building a tree from a sequence essentially makes a tree that can be processed by a reader with minimal short-term memory. The tree will be right-branching and if the reader just remembers the last fact at any point, then they can follow the connection between the text so far and the next fact 2 Interestingly, Marcu uses &amp;quot;right skew&amp;quot; to b disambiguate between alternative ~tree s produced in rhetorical parsing. Here we are setting it as a much harder constraint. The only 2In fact, there is local left-branching for (non-nested) relations whose satellite is presented first. Such relations are often presented using embedded clauses in a way that signals the deviation from right-branching clearly to the reader.</Paragraph>
      <Paragraph position="14">  exception is &amp;quot;joint&amp;quot; relations, which can join together texts of any size, but since there is no real relation involved in them there is no memory load in interpreting them.</Paragraph>
      <Paragraph position="15"> The first two assumptions above make fundamental use of the order in which facts will appear in the text. For simplicity, we assume that every relation has a fixed order Of nucleus and satellite (though this assumption could be relaxed). The approach i s controversial in that it takes into account realisati0n order in the criterion for a legal tree. It is likely that the above assumptions will not apply equally well to all types of text. Still, they mean that the planningproblem can :be reduced to that of planning a sequence. The next experiments were an attempt to evaluate this idea.</Paragraph>
      <Paragraph position="16"> * 6 Using a Genetic Algorithm The genetic algorithm we used takes the following form:  1. Enumerate a set of random initial sequences by loosely following sequences of facts where consecutive facts mention the same entity.</Paragraph>
      <Paragraph position="17"> 2. Evaluate sequences by evaluating the trees they give rise to.</Paragraph>
      <Paragraph position="18"> - 3. Perform mutation and crossover on the sequences, with mutation * having a relatively small probability.</Paragraph>
      <Paragraph position="19"> 4. When the &amp;quot;best'/ sequence has not changed for a time, invoke mutation repeatedly until it does.</Paragraph>
      <Paragraph position="20"> 5. Stop after a given number of iterations, and return the tree for the &amp;quot;best&amp;quot;* sequence.  Notice that although the algorithm manipulates sequences, the evaluation is one that operate s on trees. Mutation is a unary operation which, given one sequence, generates a new one. Crossover is binary in that it generates new solution(s ) based on two existing ones. The choice of mutation and crossover operations depends on how the sequences are internally represented and should facilitate the exchange of useful subparts of solutions. Two different representations have been tried so far. The relevant features are summariSed in Figure 3.</Paragraph>
    </Section>
    <Section position="3" start_page="103" end_page="103" type="sub_section">
      <SectionTitle>
6.1 Ordinal Representation
</SectionTitle>
      <Paragraph position="0"> The ordinal representation \[Michalewicz 92\] assumes that ~ there is an initial canonical sequence of facts (in the figure, this is assumed to be 1,2,3,4). A given sequence is represented by a sequence of numbers, where the ith element indicates the position of the ith element of the sequence in that canonical sequence with all previous elements deleted. So the ith element is always a number between 1 and n + 1 - i, where n is the length of the sequence. Mutation is implemented by a change of a random element to a random legal value. *Crossover (here) is implemented by two-point crossover - the material between two random points *of the sequences (the same points for both)is swapped over, yielding two new sequences. The ordina ! representation has been used extensively for tasks such as the travelling salesman problem , and it has the advantage that the crossover operation is particulariy simple.</Paragraph>
    </Section>
    <Section position="4" start_page="103" end_page="103" type="sub_section">
      <SectionTitle>
6.2 Path Representation
</SectionTitle>
      <Paragraph position="0"> to the new ones they give rise to. A sequence of facts is represented simply as that sequence.</Paragraph>
      <Paragraph position="1"> Mutation selects a random element, removes it from the sequence and then inserts it again in a random place. Crossover inserts a random subsequence of one solution into another, deleting duplicates that occur outside the inserted subsequence.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>