<?xml version="1.0" standalone="yes"?> <Paper uid="W96-0402"> <Title>Learning Micro-Planning Rules for Preventative Expressions*</Title> <Section position="4" start_page="11" end_page="71" type="metho"> <SectionTitle> 3 Corpus Analysis </SectionTitle> <Paragraph position="0"> In terms of text generation, our interest is in finding mappings from features related to the function1 1Horn (1989) gives a more complete categorisation of negative forms.</Paragraph> <Paragraph position="1"> of these expressions, to those related to their grammatical form. Functional features include the semantic features of the message being expressed, the pragmatic features of the context of communication, and the features of the surrounding text being generated. In this section we will briefly discuss the nature of our corpus, and the function and form features that we have coded. We will conclude with a discussion of the inter-coder reliability. A more detailed discussion of this portion of the work is given elsewhere (Vander Linden and Di Eugenio, 1996).</Paragraph> <Section position="1" start_page="11" end_page="71" type="sub_section"> <SectionTitle> 3.1 Corpus </SectionTitle> <Paragraph position="0"> The corpus from which we take all our coded examples has been collected opportunistically off the internet and from other sources. It is 4.5 MB in size and is made up entirely of written English instructional texts. As a collection, these texts are the result of a variety of authors working in a variety of contexts.</Paragraph> <Paragraph position="1"> We broke the corpus texts into expressions using a simple sentence-breaking algorithm and then collected the negative imperatives by probing for expressions that contain the grammatical forms we were interested in (i.e., expressions containing phrases such as don't, never, and take care). The grammatical forms we found, 1283 occurrences in all, constitute 2.7% of the expressions in the full corpus.
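The probing step just described can be sketched as a simple lexical filter. This is an illustrative reconstruction, not the authors' actual scripts; the pattern set and function name are assumptions:

```python
import re

# Illustrative probe patterns for the grammatical forms of interest
# (don't / do not, never, and the take care / make sure /
# be careful / be sure family).
PROBE = re.compile(
    r"\b(do not|don'?t|never|take care|make sure|be careful|be sure)\b",
    re.IGNORECASE,
)

def probe_expressions(expressions):
    """Return the expressions containing one of the probed lexical strings."""
    return [e for e in expressions if PROBE.search(e)]

expressions = [
    "Never disconnect the ground.",
    "Remove the service cover.",
    "Be careful not to burn the garlic.",
]
hits = probe_expressions(expressions)
print(hits)
```

As the paper notes, such a probe over-generates (e.g., conditional don't clauses), so the raw hits still need manual filtering before coding.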
The first line in Table 1, marked &quot;Raw Grep&quot;, indicates the quantity of each type. We then filtered the results. When the probe returned more than 100 examples for a grammatical form, we randomly selected around 100 of those returned, as shown in line 2 of Table 1 (labelled &quot;Raw Sample&quot;). We then removed those examples that, although they contained the desired lexical string, did not constitute negative imperatives (e.g., &quot;If you don't like the colors of the file ..., use Binder to change them.&quot;), as shown in line 3, labelled &quot;Final Coding&quot;.</Paragraph> <Paragraph position="2"> The final corpus sample is made up of 279 examples, all of which have been coded for the features to be discussed in the next two sections.</Paragraph> <Paragraph position="3"> Table 2 also shows the relative sizes of the various types of instructions in the corpus as well as the number of examples from this sample that came from each type.</Paragraph> </Section> <Section position="2" start_page="71" end_page="71" type="sub_section"> <SectionTitle> 3.2 Form </SectionTitle> <Paragraph position="0"> Because of its syntactic nature, the form feature coding was very robust. The possible feature values were: DONT -- for the do not and don't forms discussed above; NEVER -- for imperatives containing never; and neg-TC -- for take care, make sure, be careful, and be sure expressions with negative arguments. The two authors agreed on their coding of this feature in all cases.</Paragraph> </Section> <Section position="3" start_page="71" end_page="71" type="sub_section"> <SectionTitle> 3.3 Function Features </SectionTitle> <Paragraph position="0"> We will now briefly discuss three of the function features we have coded: INTENTIONALITY, AWARENESS, and SAFETY.
We illustrate them in turn using α to refer to the prevented action and using &quot;agent&quot; to refer to the reader and executor of the instructions.</Paragraph> <Paragraph position="1"> Intentionality: This feature encodes whether or not the writer believes that the agent will consciously adopt the intention of performing α: CON is used to code situations where the agent intends to perform α. In this case, the agent 2Note that we used a number of examples from Di Eugenio's thesis (1993) which were included as excerpts. In this table we include only an estimate of the full size of that portion of the corpus.</Paragraph> <Paragraph position="2"> must be aware that α is one of his or her possible alternatives.</Paragraph> <Paragraph position="3"> UNC is used to code situations in which the agent doesn't realize that there is a choice involved (cf. Di Eugenio, 1993). It is used in two situations: when α is totally accidental, or when the agent may not take into account a crucial feature of α.</Paragraph> <Paragraph position="4"> Awareness: This feature captures whether or not the writer believes that the agent is aware that the consequences of α are bad: AW is used when the agent is aware that α is bad.
For example, the agent may be told &quot;Be careful not to burn the garlic&quot; when he or she is perfectly well aware that burning things when cooking them is bad.</Paragraph> <Paragraph position="5"> UNAW is used when the agent is perceived to be unaware that α is bad.</Paragraph> <Paragraph position="6"> Safety: This feature captures whether or not the author believes that the agent's safety is put at risk by performing α: BADP is used when the agent's safety is put at risk by performing α.</Paragraph> <Paragraph position="7"> NOT is used when performing α is not unsafe, but may, rather, be simply inconvenient.</Paragraph> </Section> <Section position="4" start_page="71" end_page="71" type="sub_section"> <SectionTitle> 3.4 Inter-coder reliability </SectionTitle> <Paragraph position="0"> Each author independently coded each of the features for all the examples in the sample. The percentage agreement for each of the features is shown in the following table: form, 100%; intentionality, 74.9%; awareness, 93.5%; safety, 90.7%. As advocated by Carletta (1996), we have used the Kappa coefficient (Siegel and Castellan, 1988) as a measure of coder agreement. For nominal data, this statistic not only measures agreement, but also factors out chance agreement. If P(A) is the proportion of times the coders agree, and P(E) is the proportion of times that coders are expected to agree by chance, K is computed as follows: K = (P(A) - P(E)) / (1 - P(E)).</Paragraph> <Paragraph position="2"> There are various ways of computing P(E) according to Siegel and Castellan (1988); most researchers agree on the following formula, which we also adopted: P(E) = p1^2 + p2^2 + ... + pm^2,</Paragraph> <Paragraph position="4"> where m is the number of categories, and pj is the proportion of objects assigned to category j.</Paragraph> <Paragraph position="5"> The mere fact that K may have a value k greater than zero is not sufficient to draw any conclusion, however, as it must be established whether k is significantly different from zero.
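The computation just defined can be sketched directly from the two formulas; the numeric inputs below are invented for illustration, not values from the paper's coding:

```python
def kappa(p_agree, category_proportions):
    """Kappa coefficient: K = (P(A) - P(E)) / (1 - P(E)),
    where chance agreement P(E) is the sum of the squared
    proportions pj of objects assigned to each category j."""
    p_e = sum(p ** 2 for p in category_proportions)
    return (p_agree - p_e) / (1 - p_e)

# Illustrative values: coders agree 90% of the time over two
# categories used with proportions 0.7 and 0.3.
k = kappa(0.90, [0.7, 0.3])
print(round(k, 3))  # → 0.762

# Perfect agreement yields K = 1.0, as observed for the form feature.
print(kappa(1.0, [0.5, 0.5]))  # → 1.0
```

Note how chance agreement is factored out: 90% raw agreement shrinks to K ≈ 0.76 once the skewed category distribution is accounted for.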
There are suggestions in the literature that allow us to draw general conclusions without these further computations. For example, Rietveld and van Hout (1993) suggest a general scale correlating K values with degrees of inter-coder reliability. For the form feature, the Kappa value is 1.0, indicating perfect agreement. The function features, which are more subjective in nature, engender more disagreement among coders, as shown by the K values in the following table: feature K</Paragraph> </Section> </Section> <Section position="5" start_page="71" end_page="71" type="metho"> <SectionTitle> INTENTIONALITY 0.46 AWARENESS 0.76 SAFETY 0.71 </SectionTitle> <Paragraph position="0"> According to this scale, therefore, the AWARENESS and SAFETY features show &quot;substantial&quot; agreement and the INTENTIONALITY feature shows &quot;moderate&quot; agreement. We have coded other functional features as well, but they have either not proven as reliable as these, or are not as useful in text planning.</Paragraph> <Paragraph position="1"> In addition, Siegel and Castellan (1988) point out that it is possible to check the significance of K when the number of objects is large; this involves computing the distribution of K itself. Under this approach, the three values above are significant at the .000005 level.</Paragraph> </Section> <Section position="6" start_page="71" end_page="71" type="metho"> <SectionTitle> 4 Automated Learning </SectionTitle> <Paragraph position="0"> The corpus analysis results in a set of examples coded with the values of the function and form features. This data can be used to find correlations between the two types of features; such correlations, in text generation, are typically implemented as decision trees or rule sets mapping from function features to forms.</Paragraph> <Paragraph position="1"> In this study, we used 179 coded examples as input to the learning algorithm.
These are the examples on which the two authors agreed on their coding of all the features. The distribution of the grammatical forms in these examples is shown in the following table: form frequency</Paragraph> <Paragraph position="3"> The learning algorithm used these examples to derive a decision tree which we then integrated into an existing micro-planner.</Paragraph> <Section position="1" start_page="71" end_page="71" type="sub_section"> <SectionTitle> 4.1 Data Mining </SectionTitle> <Paragraph position="0"> We have used Quinlan's C4.5 learning algorithm (1993) in this study; this algorithm can induce either decision trees or rules. To provide a more convenient learning environment, we have used Clementine (1995), a tool which allows rapid reconfiguration of various data manipulation facilities, including C4.5. Figure 1 shows the basic control stream we used for learning and testing decision trees. Data is input from the split-output file node on the left of the figure and is passed through filtering modules until it reaches the output modules on the right. The two select modules (pointed to by the main input node) select the examples reserved for the training set and the testing set respectively. The upper stream processes the training set and contains a type module which marks the main syntactic form (i.e., DONT, NEVER, or Neg-TC) as the variable to be predicted and the AWARENESS, SAFETY, and INTENTIONALITY features as the inputs. Its output is passed to the C4.5 node, labelled mform, which produces the decision tree. We then use two copies of the resulting decision tree, represented by the diamond-shaped nodes marked with mform, to test the accuracy on the testing and the training sets.</Paragraph> <Paragraph position="1"> One run of the system, for example, gave the following decision tree:</Paragraph> <Paragraph position="3"> This tree takes the three function features and predicts the DONT, NEVER, and Neg-TC forms.
It confirms our intuitions that never imperatives are used when personal safety may be endangered (coded as safety=&quot;BADP&quot;), and that Neg-TC forms are used when the reader is expected to be aware of the danger that may arise (cf. Vander Linden and Di Eugenio, 1996). It accurately predicts the grammatical form of 74.5% of the 161 training examples, and 83.3% of the 18 testing examples.</Paragraph> <Paragraph position="4"> Because there are relatively few training examples in our coded corpus, we have also performed a 10-way cross-validation test. 3 None of the derived trees in this test were remarkably different from the one just shown, although they did order the INTENTIONALITY and AWARENESS features differently. The average accuracy of the learned decision trees on the testing sets was 75.4%.</Paragraph> <Paragraph position="5"> Note that although this level of accuracy is better than 55.9%, the score achieved by simply selecting DONT in all cases, there is still more work to be done. The current features must be refined, and more features may need to be added. We are currently experimenting with a number of possibilities. Note also that we have not distinguished between the various sub-forms of DONT and Neg-TC shown in Table 1; this will require yet more features.</Paragraph> <Paragraph position="6"> Clementine can also &quot;balance&quot; the input to C4.5 by duplicating training examples with under-represented feature values. We used this to increase the number of NEVER and Neg-TC examples to match the number of DONT examples. Ultimately, this reduced the accuracy of the learned trees to 68.0% in a cross-validation test.
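The behaviour of the learned tree, as described above, can be paraphrased as a rule cascade. This is an illustrative reconstruction, not the tree C4.5 actually output (the split order varied across cross-validation runs, and INTENTIONALITY is omitted here because the runs ordered it inconsistently):

```python
def predict_form(safety, awareness, intentionality):
    """Illustrative rule cascade mirroring the learned tree's behaviour:
    danger to personal safety favours NEVER, the reader's awareness of
    the danger favours a negated take-care form, and DONT is the
    default (the most frequent form overall)."""
    if safety == "BADP":    # agent's personal safety is endangered
        return "NEVER"
    if awareness == "AW":   # reader already aware the action is bad
        return "neg-TC"
    return "DONT"

# The feature settings used in the two worked examples of Section 5.2:
print(predict_form("NOT", "UNAW", "CON"))    # dismantling the frame → DONT
print(predict_form("BADP", "UNAW", "UNC"))   # disconnecting the ground → NEVER
```

On these settings the cascade reproduces the outputs shown later in Section 5.2 ("Do not dismantle the frame." and "Never disconnect the ground.").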
The resulting decision trees tended not to include all three features.</Paragraph> </Section> <Section position="2" start_page="71" end_page="71" type="sub_section"> <SectionTitle> 4.2 Integration </SectionTitle> <Paragraph position="0"> Because it is common for us to rebuild decision trees frequently during analysis, we implemented a routine which automatically converts the decision tree into the appropriate KPML-style system networks with their associated choosers, inquiries, and inquiry implementations (Bateman, 1995). This makes the network compatible with the DRAFTER micro-planner, a descendant of IM-AGENE (Vander Linden and Martin, 1995). The conversion routine takes the following inputs: * the applicable language(s) -- C4.5 produces its decision trees based on examples from a particular language, and KPML is capable of being conditionalised for particular languages. Thus, we may perform separate corpus analyses of a particular phenomenon for various languages, and learn separate micro-planning trees; 3A cross-validation test is a test where C4.5 breaks the data into different combinations of training and testing sets, builds and tests decision trees for each, and averages the results.
* the input feature(s) -- The sub-network being built must fit into the overall categorisations of the full micro-planner, and thus we must specify the text functions that would trigger entry to the new sub-network; * the decision tree itself; * a feature-value function -- To traverse the new sub-network, the KPML inquiries require a function that can determine the value of the features for each pass through the network; * grammatical form specifications -- The sub-network must eventually build sentence plan language (SPL) commands for input to KPML, and thus must be told the appropriate SPL terms to use to specify the required grammatical forms; * an output file name.</Paragraph> <Paragraph position="1"> For our example, the system sub-network shown in Figure 2 is produced based on the decision tree shown above. 4 It is important to note here that although the micro-planner is implemented as a systemic resource, the machine learning algorithm is no respecter of systemic linguistic theory. It simply builds decision trees. This gives rise to three distinctly non-systemic features of these learned networks: 4Only the systems are shown in the KPML dump given in Figure 2. The realisation statements, choosers, inquiries, and inquiry implementations are not shown.</Paragraph> <Paragraph position="2"> 1. The realisation statements are included only at the leaf nodes of the network. We have built no intelligent facility for decomposing the realisation statements and filtering common realisations up the tree.</Paragraph> <Paragraph position="3"> 2. The learning algorithm will freely reuse systems (i.e., features) at various points in the tree. This did not happen in Figure 2, but occasionally one of the features is independently used in different sub-trees of the network. We are forced, therefore, to index the system and feature names with integers to disambiguate.</Paragraph> <Paragraph position="4"> 3.
There is no meta-functional distinction in the network; rather, all the features, regardless of their semantic type, are included in the same tree.</Paragraph> <Paragraph position="5"> The sub-network derived in this section was spliced into the existing micro-planning network for the full generation system. As mentioned above, this integration was done by manually specifying the desired input conditions for the sub-network when the micro-planning rules are built. For the preventative expression sub-network, this turned out to be a relatively simple matter. DRAFTER's model of procedural relations includes a warning relation which may be attached by the author where appropriate. The micro-planner, therefore, is able to identify those portions of the procedure which are to be expressed as warnings, and to enter the derived sub-network appropriately. This same process could be done with any of the other procedural relations (e.g., purpose, precondition). This assumes, however, the existence of a core set of micro-plans which perform the procedural categorisation properly; these were built by hand. We have only just begun to experiment with the possibility of building the entire network automatically from a more exhaustive corpus analysis.</Paragraph> </Section> </Section> <Section position="7" start_page="71" end_page="71" type="metho"> <SectionTitle> 5 A DRAFTER Example </SectionTitle> <Paragraph position="0"> Given the corpus analysis and the learned system networks discussed above, we will present an example of how preventative expressions can be delivered in DRAFTER, an implemented text generation application.
DRAFTER is an instructional text authoring tool that allows technical authors to specify a procedural structure, and then uses that structure as input to a multilingual text generation facility (Paris and Vander Linden, 1996).</Paragraph> <Paragraph position="1"> The instructions are generated in English and in French.</Paragraph> <Paragraph position="2"> To date, our domain of application has been manuals for software user interfaces, but because this domain does not commonly contain preventative expressions (see Table 2), we have extended DRAFTER's domain model to include coverage for do-it-yourself applications. Although this switch has entailed some additions to the domain model, DRAFTER's input and generation facilities remain as they were.</Paragraph> <Section position="1" start_page="71" end_page="71" type="sub_section"> <SectionTitle> 5.1 Input Specification </SectionTitle> <Paragraph position="0"> In DRAFTER, technical authors specify the content of instructions in a language-independent manner using the DRAFTER specification tool. This tool allows the authors to specify both the propositional representations of the actions to be included, and the procedural relationships between those propositions. Figure 3 shows the DRAFTER interface after this has been done. We will use the procedure shown there as an example in this section; details of how to build it can be found elsewhere (Paris and Vander Linden, 1996).</Paragraph> <Paragraph position="1"> The INTERFACE and ACTIONS panes on the left of Figure 3 list all the objects and actions defined so far. These are all shown in terms of a pseudo-text which gives an indication, albeit ungrammatical, of the nature of the action. For example, the main goal, &quot;repair device&quot;, represents the action of the reader repairing an arbitrary device.
This node may be expressed in any number of different grammatical forms depending upon context.</Paragraph> <Paragraph position="2"> The WORKSPACE pane shows the procedure, represented in an outline format. The main user goal of repairing the device is represented by the largest, enclosing box. Within this box, there is a single method, called &quot;Repair Method&quot;, which details how the repair should be done. There are three sub-actions: consulting the manual, unplugging the device, and removing the cover. There is also a warning slot filled with the action &quot;\[reader\] damage service cover&quot;. This indicates that the reader should avoid damaging the service cover. 5 Neither the propositional nor the procedural information discussed so far specifies the three features needed by the decision network derived in the previous section (i.e., intentionality, awareness, and safety). At this point, we see no straightforward way in which they could be determined automatically (see Ansari's discussion of this issue (1995)). We, therefore, rely on the author to set them manually. DRAFTER allows authors to set generation parameters on individual actions using a dialog box mechanism. Figure 4 shows a case in which the author has marked the following four features for the warning action &quot;damage service cover&quot;: 5Actually, this could also be interpreted as an ensurative warning, meaning that the reader should make sure to damage the service cover (although this is clearly nonsensical in this case).
We have not yet analysed such expressions and thus do not support them in DRAFTER.</Paragraph> <Paragraph position="3"> * The action is to be prevented, rather than ensured; * Performing the action would result in inconvenience, but not in personal danger; * The user is likely to do the action accidentally, rather than consciously; * The user is likely to be aware that performing the action would create problems.</Paragraph> </Section> <Section position="2" start_page="71" end_page="71" type="sub_section"> <SectionTitle> 5.2 Text Generation </SectionTitle> <Paragraph position="0"> Once the input procedure is specified, the author may initiate text generation from any node in the procedural hierarchy. When the technical author generates from the root goal node in Figure 3, for example, the following texts are produced: English: To repair the device 1. Consult the repair manual.</Paragraph> <Paragraph position="1"> 2. Unplug the device.</Paragraph> <Paragraph position="2"> 3. Remove the service cover.</Paragraph> <Paragraph position="3"> Note that the French version employs éviter (avoid) rather than the less common prendre soin de ne pas (take care not). This is possible because the French text is produced by a separate micro-planning sub-network. This sub-network was not based on a corpus study of French preventatives, but rather was implemented by taking the learned English decision tree, modifying it in accordance with the intuitions of a French speaker, and automatically constructing French systems from that modified decision tree. Clearly, a corpus study of French preventatives is still needed, but this does show DRAFTER's ability to make use of KPML's language-conditionalised resources. Were we to replace the warning with other sorts of warnings, the expression would also change according to the learned micro-planning network.
If authors, for example, wish to prevent the reader from performing the action of dismantling the frame of the device, and they decide that the reader is unaware of this danger, and that the action is consciously performed and not unsafe, DRAFTER produces the following text: Do not dismantle the frame.</Paragraph> <Paragraph position="4"> Ne pas démonter l'armature.</Paragraph> <Paragraph position="5"> If authors wish to prevent the reader from disconnecting the ground connection, and they decide that the reader is unaware of this danger, that the action would be unconsciously performed, and that the consequences are indeed life-threatening, DRAFTER produces the following text: Never disconnect the ground.</Paragraph> <Paragraph position="6"> Ne jamais déconnecter la borne de terre.</Paragraph> </Section> </Section> </Paper>