<?xml version="1.0" standalone="yes"?> <Paper uid="T87-1045"> <Title>What is Special About Natural Language Generation Research?</Title> <Section position="2" start_page="0" end_page="227" type="metho"> <SectionTitle> 1 Shared Foundations </SectionTitle> <Paragraph position="0"> While there are considerable differences in the tasks to be solved in Text Generation and NL Understanding, the two areas of research draw on a significant number of shared ideas and knowledge. They constitute an account of what the facts and phenomena of natural language are. Moving from fine-grained to coarse-grained phenomena, they include: 1. Lexicon: Most work in both understanding and generation assumes a taxonomy of basic word classes, a notion of the semantic senses of words and a morphology. Also in both, there is currently a strong trend toward recognizing many sorts of lexical complexities: idioms, collocations, lexical functions (in several senses) and other inter-item interactions.</Paragraph> <Paragraph position="1"> 2. Grammar: There are shared descriptions of the types of constructions that are available in a specific language. At a minimum, a language processing program includes a grammar, some specification of a set of syntactic patterns. 1 Legal Notice: This research was supported by the Air Force Office of Scientific Research contract No. F49620-84-C-0100. The views and conclusions contained in this document are those of the author and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of the Air Force Office of Scientific Research or the U.S. Government.</Paragraph> <Paragraph position="2"> 3. Discourse Phenomena: Descriptions of various discourse phenomena are important in both lines of work. Anaphora is particularly prominent. A cluster of phenomena identified with terms such as theme, focus and topic is also basic. 
There is also a general recognition that ordinary language does not make explicit everything that is being conveyed, and that the non-explicit material is just as important as the explicit material in effective language use. It seems likely that there will be substantial cross-fertilization between these two lines of current work on discourse, partly because the available descriptions of discourse are still not well agreed upon.</Paragraph> <Paragraph position="3"> 4. Situational Phenomena: The situation in which the language is used, including a description of the language user and the task at hand, is acknowledged as important and is actively studied. Goal pursuit by the language user(s) is regarded as an important orienting notion.</Paragraph> <Paragraph position="4"> Both generation and understanding are working hard on all of these. Inevitably, there is some complementarity (see Section 3). But although the descriptive foundations are shared in a loose way, we will see that the sorts of problems addressed differ sharply.</Paragraph> <Paragraph position="5"> More substantial sharing occurs in the areas of knowledge representation and inference. Here the problems and solutions, not just the recognition of phenomena, are shared. There is hope for convergence, for one all-sufficient underlying representational form, and for a non-directional view of language. It is often suggested that an adequate text generator must have an understander inside to check its work. Still, the research activity is dominated by the differences rather than the shared elements.</Paragraph> </Section> <Section position="3" start_page="227" end_page="229" type="metho"> <SectionTitle> 2 Technical Distinctives of Text Generation </SectionTitle> <Paragraph position="0"> Just observing work on understanding and generation, it's clear that the people working and writing on these topics are usually not writing about the same things. 
To start to understand the situation we can look at the technical differences and then later judge how fundamental these differences are.</Paragraph> <Paragraph position="1"> What are the apparent differences? One class consists of problems which are major sources of difficulty in NL Understanding but which are minor or absent in NL Generation: 1. Covering all the ways to say things is not a problem. These days it's sufficient (and difficult enough) to have one way to say everything, with just enough perturbations to get sufficiently fluent text.</Paragraph> <Paragraph position="3"> 2. Goal identification is not a problem. A generation system can know its own goals easily. Of course, coming up with the right goals is still a problem.</Paragraph> <Paragraph position="4"> 3. Vocabulary coverage is not a problem. The lexicon of a generator can be created in correspondence with available knowledge; the user's unbounded number of other ways of expressing the knowledge do not have an impact.</Paragraph> <Paragraph position="5"> 4. Ambiguity is a secondary problem. People, operating in context with a rich knowledge of the subject matter, can disambiguate generated language very well.</Paragraph> <Paragraph position="6"> Another class consists of problems which are important in Generation but minor or absent in Understanding: 1. Deciding how much to say, and what things to not say, are problems. This involves maintaining brevity, avoiding saying what is too obvious, and yet providing sufficient background information to make the generated text comprehensible.</Paragraph> <Paragraph position="7"> 2. Design of text structure is a problem. This is sometimes taken to be the coherence problem as well: text must be coherent, and appropriate structure makes it so. Structure design has many identifiable subproblems: a. Structure building includes adding material to make presentation of the basic subject matter work. 
For example, it is often necessary to add evidence, concessives, circumstantials, antithesis, contrast and other supporting material.</Paragraph> <Paragraph position="8"> b. Structuring a text causes assertion-like effects in addition to the expected effects of individual clauses. Controlling these effects, and taking advantage of them as a resource, is a problem.</Paragraph> <Paragraph position="9"> c. Ordering the material for presentation is very consequential.</Paragraph> <Paragraph position="10"> d. Various sorts of text carry the expectation of special patterns and formulaic text: titles, abstracts, salutations, origination dates, authorship notes and acknowledgments.</Paragraph> <Paragraph position="11"> e. Making the text smooth-flowing and easy to comprehend involves leading the reader's attention. There are many particular techniques which contribute. This requirement constrains structure design and requires extra work at the structural and sentential levels.</Paragraph> <Paragraph position="12"> 3. Even after creating a detailed text plan, with all clauses identified, there are substantial additional technical issues in carrying out the plan.</Paragraph> <Paragraph position="13"> a. Presuming that the plan is in terms of a sequence of (effects of) clauses, the sentence boundaries are not determined. Which clauses should be combined into sentences? What relations need to be expressed by conjunctions? What conjunction uses can be reduced to noun conjunction or some other lower rank? b. Deciding when to use anaphora is a problem.</Paragraph> <Paragraph position="16"> c. Lexical selection is a problem. 
Relatedly, there are many varieties of idioms and lexical collocations whose restricted character is important only for generation, not understanding.</Paragraph> <Paragraph position="17"> d. English has rather elaborate provisions which enable the reader's attention to flow smoothly over the material. These include emphasis devices, and also various kinds of theme control (including passivization as one of many kinds). These must be controlled in order to create high quality running text.</Paragraph> </Section> <Section position="4" start_page="229" end_page="229" type="metho"> <SectionTitle> 3 The Alternative View: The Differences in the Tasks are Unreal </SectionTitle> <Paragraph position="0"> The claim has been made that there are really no underlying language problems that are unique to either generation or understanding. Rather, every evident problem has a counterpart which may or may not be evident on the other side of the fence. So, for example, the counterpart of (Generation: deciding how much to say) is (Understanding: identifying the selectivity involved in saying just this much). The counterpart of (Generation: lexical selection) is (Understanding: drawing conclusions from the fact that this particular term was used rather than alternative terms). And so forth. The underlying claim is that if a process is used in generation, it has effects which may be discernible, interpretable, even significant. The earliest use of this claim that I know of was by Chip Bruce, in the presentation of [Bruce 75].</Paragraph> <Paragraph position="1"> As a statement of what sorts of effects can (in principle) be found, this has a certain plausibility, and may be technically correct. Nevertheless, it does not represent the state of the art in terms of problems actually worked on. Instead, the lists of problems being addressed by generation and understanding research differ substantially, and will remain different for a long time to come. 
This is because the problems that limit the achievable quality of performance, the problems that pace progress, differ strongly between generation and understanding.</Paragraph> </Section> <Section position="5" start_page="229" end_page="230" type="metho"> <SectionTitle> 4 Distinctives of Text Generation as a Research Task </SectionTitle> <Paragraph position="0"> There are non-technical factors that make research into text generation very different from text understanding research: 1. In both duration and number of workers, there has been far less activity in generation than in understanding. In spite of much recent expansion in generation work (see [Kempen 86] for a representative collection) there are far fewer precedents and established results in generation. Work in generation is less known, so much so that some people habitually conceive of all AI language research as NLU (natural language understanding). (See, for example, the IJCAI87 call for papers.)</Paragraph> <Paragraph position="2"> 2. It is easier to control a generation task, since it is not subject to an uncontrolled input source (the user). There is inherently more control over vocabulary, lexical phenomena, syntactic range and semantic diversity in generation.</Paragraph> <Paragraph position="3"> 3. Generation and understanding need to be understood in terms of an overall model of human communication. The nature of language and the constraints on its use come from its role in communication. If investigation of communication is taken as the underlying task, then generation gives much better access to that task, just because it is much easier to develop methods and programs that work with whole discourses rather than being restricted to tiny numbers of sentences.</Paragraph> </Section> </Paper>