<?xml version="1.0" standalone="yes"?>
<Paper uid="W00-1419">
  <Title>Generating a Controlled Language</Title>
  <Section position="1" start_page="0" end_page="144" type="abstr">
    <SectionTitle>
Abstract
</SectionTitle>
    <Paragraph position="0"> This paper argues for looking at Controlled Languages (CL) from a Natural Language Generation (NLG) perspective. We show that CLs are used in a normative environment in which different textual modules can be identified, each having its own set of rules constraining the text.</Paragraph>
    <Paragraph position="1"> These rules can be used as a basis for natural language generation. These ideas were tested in a proof of concept generator for the domain of aircraft maintenance manuals.</Paragraph>
    <Paragraph position="2"> 1 What is a Controlled Language? Controlled Languages (CLs) result from a growing concern about technical documentation quality and translation, be it human or automatic. A CL consists of a glossary and of writing rules for the linguistic aspect of the documentation. These rules are given as recommendations or prohibitions for both the lexicon and the grammar. Currently, most CLs are varieties of &quot;controlled English&quot; which derive from the Caterpillar Tractor Company Fundamental English that was elaborated in the sixties (Scheurs and Adriaens, 1992). However, CLs are presently being defined for German, Swedish and French.</Paragraph>
    <Paragraph position="3"> Technical writers find it difficult to comply with the writing rules of a CL, which are often hard to justify (CLA, 1996). For them, a CL is seen as an additional constraint on an already complex task. This is why tools have been developed for CL users, the best known being conformity checkers/controllers such as AlethCL or SECC (CLA, 1996). [Footnote: Work done while at the Aérospatiale Research Center.]</Paragraph>
    <Paragraph position="4"> A writer expects that the checking tool should not only detect errors but also propose a CL conformable expression. A. Nasr (Nasr, 1996), who worked on the problem of CL reformulation, underlines the difficulties of this task. Reformulation cannot make any hypotheses about the conformity of the input sentences, and therefore must deal with a wider variety of lexico-syntactical constructions than those allowed in a CL. Some instances of noncompliance are relatively easy to detect but much more difficult to correct: for example, sentences that are longer than the prescribed number of words.</Paragraph>
    <Paragraph position="5"> So there is little hope that human writers will ever produce documentation complying strictly with a CL even with the help of a conformity checker. We argue that it may be more promising to use NLG technology for generating documentation in CL instead of analyzing it afterwards, as is the case with a conformity checker. Few researchers have looked at CLs from a generation point of view (Nasr, 1996; Hartley and Paris, 1996), but we think that there are very compelling reasons for taking a generation perspective, in addition to the advantages of NLG for CLs that will be presented in section 3: * As CLs can be viewed as linguistic specifications for human beings, it seems natural to consider them as specifications for the linguistic component of an NLG system.</Paragraph>
    <Paragraph position="6"> * CL writing specifications come on top of other writing norms which deal with document structuring. For example, in the aeronautical industry, CLs such as Simplified English (SE) (AEC, 1995) and Français Rationalisé (FR) (GIFAS, 1996) extend the ATA 100 norms (Bur, 1995) which describe the division of the document into chapters, sections, subsections, etc. reflecting a tree-structured functional organization of the airplane: a chapter corresponds to a system (e.g. main rotor), a section to a sub-system (e.g. gear box), a subsection to a sub-sub-system (e.g. set of gears), and so on. Over this thematic structure is added a communicative structure to fulfill two main goals: describe all systems of the airplane and prescribe all maintenance instructions for the airplane. The norms of the ATA can be viewed as specifications for the text structuring component of an NLG system.</Paragraph>
    <Paragraph position="7"> * The thematic and communicative structuring of the document must also conform to a systematic non-linear page numbering system and strict formatting rules using SGML tags. These constraints can be viewed as specifications for the layout component of an NLG system.</Paragraph>
    <Paragraph position="8"> So we claim that CLs should not be considered outside the context of the production of complex structured documents, which naturally raises the question of the automatic generation of this documentation given some formal representation. This claim led V. Lux (Lux, 1998) to redefine the notion of a CL. Her study has shown that only a few syntactic constraints (e.g. coordination constraints) are applicable to the whole document. Most constraints are only valid for sub-parts of the document, identified as &quot;textual modules&quot;. Each textual module has a particular communicative goal and a precise theme according to the ATA 100 norms. It can be divided into smaller modules: for example, the Task module is divided into simpler Sub-Task modules which are themselves composed of simpler Instructions modules. From a linguistic point of view, a textual module uses only a controlled sublanguage. V. Lux thus extended FR to a new CL called FREM (Français Rationalisé Etendu Modulaire) comprising many CLs, each having its own syntactic rules for a specific textual module. She also performed a corpus study showing that the same textual modules could be identified for both French and English. It should thus be possible to remodularize SE similarly to what has been done to FR with FREM. In this paper, we therefore introduce the notion of an Extended Modular Controlled Language (EMCL) which first defines some general rules and then some more specific ones for each textual module. We now look at the problem of automatically generating technical documentation complying both with structuration norms such as ATA 100 and with the rules of an EMCL.</Paragraph>
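    <Paragraph> The two-level organization of an EMCL described above (a few rules valid for the whole document, plus rules scoped to one textual module) can be sketched as a small data structure. This is only an illustrative sketch in Python: the class name EMCL, the rule names and the rules themselves are hypothetical and are not taken from FREM or SE.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# A rule maps a sentence to True when the sentence conforms.
Rule = Callable[[str], bool]


@dataclass
class EMCL:
    """Toy model: document-wide rules plus rules scoped to one textual module."""
    general_rules: Dict[str, Rule] = field(default_factory=dict)
    module_rules: Dict[str, Dict[str, Rule]] = field(default_factory=dict)

    def violations(self, module: str, sentence: str) -> List[str]:
        # General rules always apply; module rules apply only inside that module.
        active = {**self.general_rules, **self.module_rules.get(module, {})}
        return [name for name, conforms in active.items() if not conforms(sentence)]


# Hypothetical rules, invented for illustration only.
emcl = EMCL(
    general_rules={"max_20_words": lambda s: 20 >= len(s.split())},
    module_rules={
        "Instruction": {"no_is_to_be": lambda s: " is to be " not in s},
        "Description": {},
    },
)

print(emcl.violations("Instruction", "The valve is to be closed."))
print(emcl.violations("Description", "The valve is to be closed."))
```

In this sketch the same sentence is rejected inside an Instruction module and accepted inside a Description module, which is the kind of module-dependent behaviour that a single undivided CL cannot express.</Paragraph>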
    <Paragraph position="9"> 2 How to generate technical documentation? We assume that a generation system can be divided into a What to say component and a How to say it component, even though this may be considered a gross simplification.</Paragraph>
    <Section position="1" start_page="141" end_page="142" type="sub_section">
      <SectionTitle>
2.1 What to say component
</SectionTitle>
      <Paragraph position="0"> The main difficulty for NLG in a real environment lies in knowledge modeling. For aircraft maintenance manuals, existing ontologies could probably be reused, but even then the modeling efforts required are huge. Nevertheless, we assume that it is possible to design forms which are sequentially presented to the user to be filled, as in Drafter (Paris et al., 1995), through which the technical writer provides the information to convey in an appropriate formalism.</Paragraph>
      <Paragraph position="1"> These forms can be derived directly from the tree-like structure of the document given in the ATA norms. The goal is that, once the writer has finished filling in these forms, the technical documentation is already properly structured in an abstract language instead of a natural one.</Paragraph>
      <Paragraph position="2"> In a general text generation setting, using forms to describe What is to be said might seem like a difficult task; but in the context of technical writing, the informational content is almost already prescribed and forms are thus a simple way of complying with the rules of a CL.</Paragraph>
      <Paragraph position="3"> Indeed, in the now common web environments, forms are frequently used for eliciting information from users. This input can then be processed by the How to say it and layout components. The writers who find it very difficult to comply with the rules of a CL have no problem complying with the ATA 100 norms, thereby producing documents with the right thematic and communicative structuration. This can be seen as an illustration of observations made in psycholinguistics. Levelt (Levelt, 1989, p. 9) describes a model of the speaker's activity in which choices in the What to say component are conscious, while choices in the How to say it component are automatic. This model helps understand some of the difficulties that CL users face. A CL forces the writer to become conscious of behavioral mechanisms that are usually automatic; the writer is thus distracted from choices made earlier in her/his writing task. So s/he often ends up writing it in the way it has to be written but does not write exactly what had to be written, thus defeating the whole purpose of a CL, which was meant to produce a better expression of the information.</Paragraph>
      <Paragraph position="4"> This model also explains why a human writer has fewer difficulties following the ATA norms: this part of the job corresponds to conscious choices. In the NLG scenario, this is replaced by filling in some information in the forms that are presented.</Paragraph>
      <Paragraph position="5"> To sum up, the What to say component requires a model of the domain and the design of a series of forms to be filled. A human writer using the NLG system has to fill in forms but, on the other hand, s/he does not have to learn a CL, since compliance with the CL norms is taken care of by the How to say it component which we now describe.</Paragraph>
    </Section>
    <Section position="2" start_page="142" end_page="143" type="sub_section">
      <SectionTitle>
2.2 How to say it component
</SectionTitle>
      <Paragraph position="0"> In this section, it is assumed that if a CL is in fact an EMCL such as FREM, a specific How to say it component is designed for each textual module, but always retaining the same formalism. The lexicon used in the How to say it component should be exactly the one enforced by the CL. Similarly, the syntactic constructions and the discourse structures of this component should correspond to the set of allowed constructions / structures in the CL. This can simplify some lexical, syntactic and even discourse choices to be made within the generation system and thus ensure that the generated text complies with the rules of the CL.</Paragraph>
      <Paragraph position="1"> However, many writing rules in a CL place particular syntactic constraints on the use of a given lexical item, e.g. in FR a rule forbids the use of empêcher (prevent) when followed by an infinitive clause. To handle such numerous lexically dependent syntactic rules, a formalism based on a lexicalized grammar is needed. We chose Lexicalized Tree Adjoining Grammar (LTAG) for the following reasons: * A text generation formalism inspired by LTAG, called G-TAG, has been designed, implemented and used in several applications (Danlos and Meunier, 1996; Meunier, 1997; Danlos, 1998; Meunier and Danlos, 1998; Danlos, 2000). G-TAG takes as input an event graph which can be provided by the user by filling in some forms which ensure that all the necessary information for generation is provided.</Paragraph>
      <Paragraph position="2"> * G-TAG deals with textual phenomena such as sentence connectors by extending LTAG to handle discourse comprised of more than one sentence. One of the major innovations of FREM compared to FR (and of EMCL compared to CL) is to implement rules for connecting sentences (clauses). The way to connect sentences has largely been ignored in CLs, although this linguistic issue raises ambiguities which can lead to maintenance errors. For example, simple juxtaposition of sentences is allowed in FR but disallowed in FREM because it is highly dangerous. A technician reading Nettoyer X. Verser Y sur X. (Clean X. Pour Y on X.) could interpret this to mean either &quot;Clean X with Y&quot; or &quot;Clean X with Z, and next pour Y on X&quot;. Only one of these operations is right; the other one may lead to a maintenance error. On the other hand, traditional syntactical ambiguities such as a prepositional attachment will not usually lead to maintenance errors because the technician can usually solve them on the basis of some domain knowledge.</Paragraph>
      <Paragraph position="3"> * The lexicalized grammar in G-TAG is compiled from the metagrammar designed and implemented by M.H. Candito (Candito, 1996).</Paragraph>
      <Paragraph position="4"> This makes it easy to follow the evolution of the rules of an (EM)CL. For example, if the rule to write an Instruction changes from &quot;Put a verb in the infinitive&quot; to &quot;Insert an imperative&quot;, then this must be changed everywhere in the lexicalized grammar. Using the metagrammar we can achieve this quite easily because of the hierarchical organization of an LTAG: with only one rule, an imperative can be allowed and an infinitive disallowed (in a main clause) for every verb, whatever its argument structure and syntactic construction. G-TAG thus seems a good candidate for producing technical documentation complying with the constraints of an (EM)CL. A technical documentation generator prototype in the aeronautical domain is described in Section 4. It is written in Flaubert, an implementation of G-TAG (Danlos and Meunier, 1996). The How to say it component would have to be completed by adding a layout component complying with the norms of ATA 100. We should also provide revision tools to allow the writer to fine tune the final text.</Paragraph>
      <Paragraph position="5"> So, automatically generating technical documentation seems technically possible provided the technical writer is willing to fill in forms, which in principle should be less demanding than learning the rules of an (EM)CL. This approach also has other advantages, described in the next section.</Paragraph>
    </Section>
    <Section position="3" start_page="143" end_page="143" type="sub_section">
      <SectionTitle>
3.1 Multilinguality
</SectionTitle>
      <Paragraph position="0"> One of the major assets of NLG is its capacity to simultaneously generate texts in several languages, and to regenerate updates as often as necessary, using a single input representation, thus ensuring coherence among the generated texts.</Paragraph>
      <Paragraph position="1"> Until now, CLs have dealt with multilinguality by means of the translation hypothesis. It is for this reason that FR was developed by adapting SE, in order to ease the translation from French to English. FR authors try to ensure that everything that can be written in FR can also be translated into SE. From this point of view, the definition of a source CL1 depends on the definition of a target CL2. Developers of CL1 are more likely to select structures which can be easily or even literally translated into CL2. What then happens if CL1 and CL2 are structurally different? This can lead to a situation where CL1 imposes a cumbersome writing style that contravenes conventions shared by native speakers of L1, thereby contradicting CLs' aim of enhancing understandability. Rules of an (EM)CL should be elaborated without such multilingual considerations. Their definition should principally pay attention to the characteristics of one language, trying to avoid typical ambiguities. Such criteria are difficult enough to deal with in a single language without taking translation problems into account.</Paragraph>
      <Paragraph position="2"> Now if we consider multilingual generation in (EM)CLs, we find that there are major benefits from the multilingualism modeling proposed by NLG. In particular, defining a common representation is possible since the structure of the documentation is language independent. Recall from section 1 that the thematic structure of the documentation in the aeronautical domain must reflect the functional decomposition of the airplane and that the same textual modules can be identified in many languages. Thus nothing has to be changed in the What to say component (Section 2.1) going from one language to the other. Only the How to say it component (Section 2.2) need be adapted to the target (EM)CL which should be monolingually defined.</Paragraph>
    </Section>
    <Section position="4" start_page="143" end_page="144" type="sub_section">
      <SectionTitle>
3.2 NLG as an aid for testing and developing a CL
</SectionTitle>
      <Paragraph position="0"> An NLG system can provide concrete assistance for the testing and for the development of a CL.</Paragraph>
      <Paragraph position="1"> An NLG system that integrates the CL constraints can help discover contradictions in the CL definition. As an illustration, a major difficulty in CL definition concerns the coherence between the lexicon and the writing rules, as illustrated by (Emorine, 1994) with the following example: * Empêcher l'oxygène de s'accumuler (Prevent the oxygen from accumulating) does not conform to a FR lexically dependent syntactic rule, according to which empêcher (prevent) should not be followed by an infinitive clause.</Paragraph>
      <Paragraph position="2"> * Empêcher l'accumulation d'oxygène (Prevent oxygen accumulation) does not conform to the FR lexicon, according to which the verb s'accumuler (accumulate) should be used instead of the noun accumulation (accumulation). * Empêcher que l'oxygène ne s'accumule (Prevent that the oxygen accumulates) does not conform to the writing rule that forbids the use of the subjunctive mode.</Paragraph>
      <Paragraph position="3"> So we come to a dead end if we want to use the verb empêcher (prevent). This problem can be detected automatically by the NLG system and an appropriate fix can be made in the grammar.</Paragraph>
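      <Paragraph> The dead end above can be detected mechanically: if every candidate realization of a message violates some rule, the CL definition is incoherent for that message. The following Python sketch illustrates the idea under strong simplifying assumptions. The three candidates and the three constraints come from the example above, but the feature labels, the Candidate class and the functions conforming and is_dead_end are hypothetical; a real generator would derive the features from its grammar instead of listing them by hand.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class Candidate:
    """A candidate realization with the features that the rules inspect."""
    text: str
    features: FrozenSet[str]


# Constraint description -> forbidden feature (labels are hypothetical).
FORBIDDEN: Dict[str, str] = {
    "empêcher followed by an infinitive clause": "empecher_infinitive",
    "noun accumulation instead of the verb s'accumuler": "noun_accumulation",
    "subjunctive mood": "subjunctive",
}

CANDIDATES: List[Candidate] = [
    Candidate("Empêcher l'oxygène de s'accumuler", frozenset({"empecher_infinitive"})),
    Candidate("Empêcher l'accumulation d'oxygène", frozenset({"noun_accumulation"})),
    Candidate("Empêcher que l'oxygène ne s'accumule", frozenset({"subjunctive"})),
]


def conforming(candidates: List[Candidate], forbidden: Dict[str, str]) -> List[Candidate]:
    """Return the candidates that carry none of the forbidden features."""
    banned = set(forbidden.values())
    return [c for c in candidates if c.features.isdisjoint(banned)]


def is_dead_end(candidates: List[Candidate], forbidden: Dict[str, str]) -> bool:
    """True when the rules leave no way of expressing the message."""
    return not conforming(candidates, forbidden)


print(is_dead_end(CANDIDATES, FORBIDDEN))
```

Removing one constraint from FORBIDDEN and calling conforming again shows which realization a given relaxation of the CL would make available, which is one way of exploring possible fixes.</Paragraph>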
      <Paragraph position="4"> NLG can be used for checking a CL, which is helpful even if the CL is intended for a human writer because it may avoid the discovery of various cases of incoherence by the writer. If the writers can justify their writing difficulties by pointing out inconsistencies in the CL definition, they won't be motivated to use what they will tend to consider as an absurd invention by people who understand nothing about the job.</Paragraph>
      <Paragraph position="5"> NLG can also help strengthen CLs' claim to lead to more homogeneous texts, which is equivalent to forbidding certain paraphrases. NLG precisely deals with paraphrase as, for some inputs, an NLG system will produce several texts. In this way, NLG helps identify which paraphrases still remain possible in the CL. In practice, when an NLG system proposes several texts for one input, it raises the question for the CL developer: Should a constraint be added to the CL definition in order to forbid some of these texts?</Paragraph>
    </Section>
  </Section>
</Paper>