<?xml version="1.0" standalone="yes"?> <Paper uid="A92-1006"> <Title>Applied Text Generation*</Title> <Section position="4" start_page="40" end_page="41" type="metho"> <SectionTitle> 3 The Structure of Joyce </SectionTitle> <Paragraph position="0"> Joyce consists of three separate modules, which perform distinct tasks and access their own knowledge bases (Figure 5).</Paragraph> <Paragraph position="1"> 1. The text planner accesses the domain representation and produces a list of propositions, which represents both the content and the structure of the intended text. Each proposition is expressed in a language-independent, conceptual frame-like formalism. It encodes a minimal amount of information, but can be realized as an independent sentence if necessary. The text planner draws on domain communication knowledge expressed in a high-level schema language (see Section 4).</Paragraph> <Paragraph position="2"> 2. The sentence planner takes the list of propositions and determines how to express them in natural language. This task includes choosing lexicalizations and a syntactic structure for each proposition, and assembling these lexico-syntactic structures, called Deep Syntactic Representation or DSyntR, into larger sentences. It draws on knowledge captured in the conceptual/English dictionary.</Paragraph> <Paragraph position="3"> 3. The linguistic realizer takes the syntactic structures and produces surface sentences. It draws on syntactic and morphological knowledge, expressed in the English lexicon.</Paragraph> <Paragraph position="4"> Usually, the different tasks of text generation are divided among two modules (planning and realization), rather than three. However, there is a certain amount of disagreement about where the line between the two is to be drawn. 
For example, McKeown's TEXT (McKeown 1985) performs the tasks that Joyce classifies as sentence planning as part of the realization process, whereas Meteer's SPOKESMAN (Meteer 1989) classifies them as part of text planning. (See (Meteer 1990, p.23sq) for a useful summary of the terminological issues 1.) In this paper, &quot;text planning&quot; will always be used in the narrow sense of &quot;content selection and organization&quot;. The architecture of Joyce is directly influenced by that of the SEMSYN system (Rösner 1987; Rösner 1988). Rösner divides the realization component into two parts, the &quot;generator kernel&quot; and the &quot;generator front end&quot;. This distinction is mirrored exactly by the distinction between sentence planning and realization in Joyce.</Paragraph> <Paragraph position="5"> There are two main advantages to such a tripartite architecture, one conceptual and the other practical. Conceptually, the advantage is that linguistic planning tasks are clearly separated from the actual grammar, which comprises word order and morphological rules. These rules can be stated independently of the formulation of purely semantic rules that determine lexical and syntactic choices. This modularity makes the system more maintainable. The linguistic planning tasks should, however, be clearly separated from the textual planning tasks: while the linguistic planning tasks are language-dependent, the textual planning tasks appear not to be 2. 1 Note that the tasks Meteer groups together as &quot;Syntax&quot; - choosing the syntactic structure and linearization - are inseparable only in certain syntactic representations. In Joyce, the Deep-Syntactic Representation encodes syntactic structure but not linear order (see Section 6 for details).</Paragraph> <Paragraph position="6"> 2 We are not aware of any example in which different text plans (as defined here) are needed for different languages. 
The fact that functionally similar texts may display different structures in different cultures should not be confused with language-specific constraints on text structure.</Paragraph> <Paragraph position="7"> Thus, if multi-lingual generation is desired, text planning and sentence planning ought to be performed by distinct components.</Paragraph> <Paragraph position="8"> On a more practical level, modularity in design and implementation can be exploited by parallel processing of independent modules. While the current implementations of Joyce do not allow for parallel execution, the incremental processing of parallel computing tasks on a serial machine is also advantageous, as is argued in the WIP project (Wahlster et al 1991; Harbusch et al 1991) 3. Incrementality reduces the initial response time of the system (though not the overall processing time). This can be crucial if multi-paragraph text is to be generated by an interface tool. In the Joyce system, the text planner cedes control to the sentence planner as soon as the text planner has defined a proposition. Once the sentence planner has constructed the DSyntR of a complete sentence, it sends it to the realizer which generates the English sentence. Thus, the first sentence is output by Joyce shortly after the text generator is invoked; text continues to be output approximately at reading speed. The effect is that a user of the text generator has the impression that he or she never has to wait for the system to respond, even when it is generating lengthy texts.</Paragraph> <Paragraph position="9"> Throughout the system, processing is message-driven in the sense of (McDonald et al 1987): control lies in the input, which is used to construct the next level of representation. There is no need for backtracking or feedback from one level of processing to an earlier one. 
As is argued by McDonald et al., such an architecture contributes to processing efficiency.</Paragraph> <Paragraph position="10"> We will now discuss the three modules of Joyce in more detail.</Paragraph> </Section> <Section position="5" start_page="41" end_page="42" type="metho"> <SectionTitle> 4 The Text Planner </SectionTitle> <Paragraph position="0"> Prior to the design of the text planning component of Joyce, several existing approaches were studied. Since the structure of the descriptive text (Figure 2) does not mirror the structure of the domain, Paris's &quot;procedural strategy&quot; (Paris and McKeown 1987) cannot be used in general. Hovy's RST-based planner (Hovy 1988) assumes that content selection has already been performed, contrary to the situation in the Ulysses application; furthermore, there are efficiency problems in a pure STRIPS-like planning paradigm. We therefore found McKeown's schema-based approach (McKeown 1985) to be the most promising. However, it turned out that general rhetorical schemas cannot adequately capture the structure of the intended texts. In (Kittredge et al 1991), we argue that planning certain types of texts - such as reports and descriptions - requires domain-specific knowledge about how to communicate in that domain. That knowledge we call &quot;domain communication knowledge&quot; (DCK). 3 Incrementality within the realizer has little practical benefit when the realizer is reasonably fast; its study is mainly motivated by psycholinguistic considerations. Therefore, there was no attempt in Joyce to make the realizer incremental.</Paragraph> <Paragraph position="1"> For example, in describing secure system designs you must relate the security level of each component, but not, say, the number of ports or their security levels. Furthermore, the connectivity of components should be stated before their functionality. 
In the flow analyzer text, the security levels of the components need not be communicated at all, but if a component (other than the final component of the path) downgrades information, it must be stated whether and why the component is secure. This very precise knowledge about which domain information needs to be communicated and in what order cannot simply be derived from general principles. We have also argued that in many existing text planning systems, such DCK has been encoded implicitly. In the interest of efficiency, modularity and portability we have decided to represent DCK explicitly in Joyce.</Paragraph> <Paragraph position="2"> We have developed a &quot;schema language&quot; for easy representation of DCK, called DICKENS (Domain Communication Knowledge ENcoding Schemas). The schemas are similar in form to those used by McKeown. Basically, schemas can be seen as a description of text structure.</Paragraph> <Paragraph position="3"> The system, however, interprets each schema as a list of instructions. The instructions can be calls to other schemas, recursive calls to the same schema, or they can be one of a set of special commands provided by the schema language. One special command produces a specific proposition and sends it to the sentence planner.</Paragraph> <Paragraph position="4"> Other special commands support conditional branching and iteration. During execution, each schema is associated with a particular subset of the domain representation, which is called the focus (in the sense of McKeown's &quot;global focus&quot;). In the Ulysses application, the focus always corresponds to one component. There are special commands to shift the focus. In addition to the focus, which limits the domain representation from which information can be communicated, a theme can be set which determines information structure within individual propositions. The theme corresponds to McKeown's &quot;local focus&quot;. 
As has been widely recognized, thematic structure affects issues such as grammatical voice at the linguistic level.</Paragraph> <Paragraph position="5"> In addition, two further special commands were found to be necessary in order to perform text planning: * A portion of the text plan can be edited. To do this, a schema is called, but any propositions that are created (by the schema or by any schema it calls) are not sent to the sentence planner. They are kept on a separate list in the order they are created. When the execution of the schema terminates, an editing function is applied to the list. The editing function can delete propositions, change their order, change their contents or create new ones. The choice of an editing function depends on the domain and on the particular requirements of the text. Further study is needed in order to determine the types of editing operations that can be made and to devise a high-level language to express them; the goal is to eventually establish a library of editing operations. Typical editing operations we have used include juxtaposing similar propositions or juxtaposing propositions with certain similar slots (typically, the agent slot).</Paragraph> <Paragraph position="6"> An example is given in Section 7.</Paragraph> <Paragraph position="7"> This type of revision is different from the revision discussed in (Gabriel 1988) and (Meteer 1991). In these systems, the linguistic specification of the target texts is revised. In Joyce, it is the text plan itself, i.e. the pre-linguistic representation of text content and structure, that is subject to revision.</Paragraph> <Paragraph position="8"> * Schemas can post to a &quot;blackboard&quot;, and check this blackboard for messages. 
This allows for additional control and communication between schemas which are called at different times during the text planning process and cannot communicate with each other directly.</Paragraph> <Paragraph position="9"> Instead of being templates that limit the structure of the text to certain preconceived types, the schemas are now an explicit and compact representation of domain communication knowledge.</Paragraph> </Section> <Section position="6" start_page="42" end_page="43" type="metho"> <SectionTitle> 5 The Sentence Planner </SectionTitle> <Paragraph position="0"> The sentence planner combines all those planning tasks that are specific to the target language. It receives propositions from the text planner and sends the DSyntR of complete sentences to the realizer for processing. It has two main tasks: first, it chooses lexical and syntactic realizations by consulting the Conceptual/English dictionary; second, it determines sentence scope by merging the DSyntR of individual propositions.</Paragraph> <Paragraph position="1"> We will discuss each of these steps in turn.</Paragraph> <Paragraph position="2"> The Conceptual/English dictionary is implemented as a set of procedures that operate on the propositions.</Paragraph> <Paragraph position="3"> Each proposition is mapped into the DSyntR of a clause (i.e., its root is a verb). Lexicalization can take pragmatic factors into account. It can also refer to a history of lexicalizations if lexical variation is desired. After a DSyntR has been constructed, certain syntactic paraphrase operations are performed if necessary, for example passivization if a grammatical object is the theme of the sentence, or if the subject is absent.</Paragraph> <Paragraph position="4"> The second task of the sentence planner is to determine the scope of sentences. Combining the linguistic realization of propositions into larger sentences is a crucial issue because it increases the quality of the generated text. 
For example, The low-level Address Register and the multilevel Locator are data-bases (from the Host text in Figure 2) is significantly better than the four clauses from which it was formed: The Address Register is a data-base. It is low-level. The Locator is a data-base. It is multilevel. An informal study in which subjects were asked to revise a (grammatical) text containing only single-proposition sentences supported the claim that longer sentences are preferred over shorter ones whenever possible and reasonable.</Paragraph> <Paragraph position="5"> The first question that arises is at what level propositions should be combined. To date, the issue of sentence scoping has always been dealt with at a pre-linguistic, conceptual level (e.g. (Dale 1988) or (Carcagno and Iordanskaja 1989)). However, different languages have different syntactic means of combining clauses; clause combining must refer to the specific linguistic resources of the target language. Therefore, in Joyce the task is performed by the sentence planner rather than the text planner 4. Joyce performs the following syntactic clause-combining operations: relative clause formation, adjectival attachment (the process by which an adjective from a copula-construction is embedded in an NP), and conjunction. Conjunction includes multiple conjunctions of more than one clause, and may lead to elision of repeated sentence elements (&quot;conjunction reduction&quot;). For example, in the example quoted above, the lexeme data-base occurs only once in the conjoined sentence.</Paragraph> <Paragraph position="6"> The second question that arises is how clause combination should be restricted. We have identified stylistic and discourse constraints. The stylistic constraints are constraints against the sentence becoming too long (an upper bound on the number of clauses that can be combined into one sentence), and a constraint on recursive embedding of relative clauses. 
Discourse constraints are imposed by the structure of the text: clauses belonging to conceptually distinct text units should not be combined. The text planner can send a special message, called conceptual-break, to the sentence planner. It signals the beginning of a new textual unit. These special messages are triggered by appropriate indications in the DICKENS specification of the DCK.</Paragraph> <Paragraph position="7"> The algorithm is as follows. The sentence planner maintains a &quot;current&quot; DSyntR. Each incoming proposition is translated into a DSyntR, which the sentence planner then attempts to merge with the current DSyntR. If none of the clause combination strategies work, or if stylistic heuristics interfere, or if the incoming proposition is a conceptual-break, the current DSyntR is sent to the realizer and the new DSyntR becomes the current one. The process of clause combination can be very easily modeled at the DSyntR level: relative clause formation and conjunction reduce to simple tree composition operations. (In the case of adjectival attachment only the adjective node is attached.) Issues such as word order in relative clauses, the morphological form of the complementizer, and conjunction reduction can be dealt with at further stages of processing.</Paragraph> </Section> <Section position="7" start_page="43" end_page="43" type="metho"> <SectionTitle> 6 The Linguistic Realizer </SectionTitle> <Paragraph position="0"> The linguistic component is based on Meaning-Text Theory (MTT) (Mel'čuk 1988), and is a reimplementation (in Lisp) of Polguère's Prolog implementation of a Meaning-Text model for English (Iordanskaja et al 1988; Iordanskaja et al 1991).</Paragraph> <Paragraph position="1"> MTT defines three successive levels of representation.</Paragraph> <Paragraph position="2"> With each level of representation is associated a component which transforms the representation into the next higher level. 
Each component is implemented as a separate module in Joyce.</Paragraph> <Paragraph position="3"> * The Deep-Syntactic Representation (DSyntR) is a dependency grammar tree representing the syntactic relationships between the meaning-bearing words of a sentence. Sister nodes are unordered with respect to each other. The nodes are labelled with lexemes which are annotated with features. Numerical arc labels represent the syntactic arguments of the governing lexeme, while ATTR represents the attributive relation. An example is shown in Figure 6. Note that the function words the, is, to are not yet represented.</Paragraph> <Paragraph position="4"> 4 In Section 7, we discuss an example in which two propositions are merged by the text planner. The crucial point is that in that example, the two propositions are merged into a single proposition. Here, we are discussing cases in which two distinct propositions are linguistically realized in the same sentence.</Paragraph> </Section> <Section position="8" start_page="43" end_page="43" type="metho"> <SectionTitle> * The Surface-Syntactic Representation (SSyntR) is </SectionTitle> <Paragraph position="0"> also a dependency grammar representation, but it includes all lexemes of the final sentence.</Paragraph> <Paragraph position="1"> The transition between DSyntR and SSyntR is achieved by looking up function words in the English lexicon and by expanding grammatical features such as verb tenses.</Paragraph> <Paragraph position="2"> the written form of the English sentence. Morphological processing is done by a component closely based on SUTRA-S (Emele and Momma 1985).</Paragraph> <Paragraph position="3"> While linguistic realizers based on other theories could have been used, this MTT-based approach offers the following advantages: * The approach is based on an independently motivated linguistic theory. Much linguistic work has already been done in the MTT framework (for example (Mel'čuk and Pertsov 1987)).</Paragraph> 
<Paragraph position="4"> * The modularization of different types of linguistic knowledge makes the grammar easier to maintain. Parallelism in computation could be exploited.</Paragraph> <Paragraph position="5"> * The dependency grammar used to express the two syntactic levels of representation permits the separation of the semantically relevant issue of grammatical relations (e.g., subjecthood) from pragmatically relevant issues of surface word order (e.g., topicalization).</Paragraph> </Section> <Section position="9" start_page="43" end_page="44" type="metho"> <SectionTitle> 7 An Example </SectionTitle> <Paragraph position="0"> As an example, consider the sample text in Figure 4. It describes the occurrence of an insecure flow in component Black Box. The texts that explain insecure flows are generated by a set of eight schemas, one of which is shown in Figure 7. It is the first that is invoked.</Paragraph> <Paragraph position="1"> Special commands are preceded by a colon; commands not starting with a colon are calls to other schemas. The arguments to special commands immediately follow the command. The :title special command generates a title. Command :theme sets the initial theme of the paragraph, influencing issues such as passivization. Then follow three :make-proposition commands, which each produce one proposition. The first argument to :make-proposition is the class of the proposition. The slots are typically filled with pointers into the domain representation of the application program.</Paragraph> <Paragraph position="2"> focus is a pointer maintained by the text planner which refers to the global focus (currently the component Black Box, represented by pointer #&lt;COMPONENT Black Box&gt;), while get-information and entry-port are functions provided by the underlying application program. 
Not all arguments must be filled by a :make-proposition command; the sentence planner will choose lexical and syntactic realizations accordingly. The text planner sends an insecure-flow proposition to the sentence planner, which translates it into a DSyntR tree (which represents the clause In the Black Box an insecure flow occurs) and returns control to the text planner. The text planner then proceeds to the next :make-proposition command, and sends the proposition shown in Figure 8 to the sentence planner.</Paragraph> </Section> <Section position="10" start_page="44" end_page="45" type="metho"> <SectionTitle> ENTER AGENT #&lt;information&gt; OBJECT #&lt;COMPONENT Black Box&gt; LOCATION #&lt;PORT P9&gt; </SectionTitle> <Paragraph position="0"> When the sentence planner receives the enter proposition, it translates it into the DSyntR tree shown in Figure 9, which could be expressed as the clause information enters the Black Box through P6. Note that the choice of enter as verb is due to the fact that information is currently the theme; if Black Box were the theme, the choice would have been receives. The sentence planner then tries to combine the new DSyntR with the current one (which was derived from the previous proposition). This fails (since the two clauses have different verbs and different actants), so the current DSyntR is sent to the realizer, which prints out the first sentence. The new DSyntR becomes the current one. Control is returned to the text planner, which processes the third :make-proposition command and sends the appropriate proposition to the sentence planner. The sentence planner generates the clausal DSyntR tree shown in Figure 10 (the information is classified). [Figure 10: DSyntR of sentence The information is classified] It then attempts to combine the new clause with the &quot;current DSyntR&quot;, first using the adjectival attachment strategy. 
This succeeds, yielding the tree shown in Figure 11. It then returns control to the text planner, since another clause could be merged with the current DSyntR. The text planner then calls schema conceptual-break. The only effect of this schema is to send a conceptual-break message to the sentence planner, which thereupon sends its current DSyntR to the realizer. The realizer prints out the surface sentence Classified information enters the Black Box through P6.</Paragraph> <Paragraph position="1"> The last command of the schema first shifts the (global) focus to next-component, which is the next component traversed by the insecure flow. The second argument of the :shift-focus-and-edit command designates the next schema to be called. This command also initiates the editing process. All propositions that are generated as a result of this command are kept on a list rather than sent to the sentence planner. When the command has been executed, the list is edited by the function given as the third argument, #'merge-send-data. The effect of this function is to combine two successive send propositions into a single, new one, so that two clauses such as the Analyzer sends the information to the Incrementor and the Incrementor sends the information to the Formater yield the Analyzer sends the information to the Formater through the Incrementor. Note that this combination is not a linguistic one but a conceptual one, since it relies on facts about sending data in this domain, rather than on the syntax or lexical semantics of the verb send. 
It must therefore be performed by the text planner, and not the sentence planner.</Paragraph> </Section> <Section position="11" start_page="45" end_page="45" type="metho"> <SectionTitle> 8 Porting the System </SectionTitle> <Paragraph position="0"> Porting is an important way to evaluate complete applied text generation systems, since there is no canonical set of tasks that such a system must be able to perform and on which it can be tested. (Realization components, on the other hand, can be tested for their syntactic and perhaps lexical coverage.) Joyce was originally designed to generate only component descriptions (as in Figure 2).</Paragraph> <Paragraph position="1"> The &quot;flow analyzer&quot; heuristic tool was added later to the system, and the completely different type of text it required was a first successful test of Joyce and its text planner in particular.</Paragraph> <Paragraph position="2"> The modular design of Joyce proved beneficial during the porting to the new application. The following conceptually well-defined tasks were required during the development of the &quot;flow analyzer&quot; application: 1. Since the flow analyzer is a new type of tool, no corpus of texts was available for study. Instead, sample texts were written by hand and critiqued by domain experts. The texts were then revised and resubmitted to the experts. The &quot;ideal text&quot; that emerged was then analyzed and the DCK needed to generate it expressed in terms of schemas. We interpret the cycle of writing, critiquing and revising as a process of DCK acquisition.</Paragraph> <Paragraph position="3"> 2. New classes of proposition were defined. These include enter, upgrade and downgrade. Some of the proposition classes from the earlier descriptive application could be reused, such as send.</Paragraph> <Paragraph position="4"> 3. The Conceptual/English dictionary was extended to account for the new proposition classes.</Paragraph> <Paragraph position="5"> 4. 
Several new lexical items were entered into the English lexicon. For example, the English lexeme downgrade subcategorizes for two nouns and a prepositional phrase obligatorily headed by to.</Paragraph> <Paragraph position="6"> Note that those parts of Joyce that deal with facts of English (including clause combination) needed no attention (other than updating the lexicon).</Paragraph> <Paragraph position="7"> We are currently working on porting a successor of Joyce to several new applications, including the generation of project management reports. Initial results, including a prototype, are encouraging.</Paragraph> </Section> </Paper>