<?xml version="1.0" standalone="yes"?>
<Paper uid="C92-2090">
  <Title>FROM COGRAM TO ALCOGRAM: TOWARD A CONTROLLED ENGLISH GRAMMAR CHECKER</Title>
  <Section position="1" start_page="0" end_page="0" type="metho">
    <SectionTitle>
0. Abstract*
FROM COGRAM TO ALCOGRAM:
TOWARD A CONTROLLED ENGLISH GRAMMAR CHECKER
GEERT ADRIAENS [1,2] DIRK SCHREURS [2]
</SectionTitle>
    <Paragraph position="0"> [1] Siemens-Nixdorf Software Center Liège, Rue des Fories 2, 4020 Liège, Belgium [2] University of Leuven Center for Computational Linguistics, Maria-Theresiastraat 21, 3000 Leuven, Belgium geert@et.kuleuven.ac.be
In this paper we describe the roots of Controlled English (CE), the analysis of several existing CE grammars, the development of a well-founded 150-rule CE grammar (COGRAM), the elaboration of an algorithmic variant (ALCOGRAM) as a basis for NLP applications, the use of ALCOGRAM in a CAI program teaching writers how to use it effectively, and the preparatory study into a Controlled English grammar and style checker within a desktop publishing (DTP) environment.</Paragraph>
  </Section>
  <Section position="2" start_page="0" end_page="0" type="metho">
    <SectionTitle>
1. Introduction
</SectionTitle>
    <Paragraph position="0"> The use of controlled or simplified languages for text writing is a controversial matter, mainly because it is felt as an attack on the writer's freedom of expression.</Paragraph>
    <Paragraph position="1"> Still, we see more and more attempts to introduce control and simplification in the text writing process, mostly integrated within intelligent text processing environments and complex NLP applications such as machine translation (see Section 2 for a short overview). There are at least two types of motivation that have led us and other researchers to pursue this matter with renewed interest. First, experience with large-scale NLP applications that should be capable of handling a wide range of inputs (in our case, the METAL MT system, used for the translation of technical and administrative texts) has shown that there are limits to fine-tuning big grammars to handle semi-grammatical or otherwise badly written sentences. The degree of complexity added to an already complex NLP grammar tends to lead to a deterioration of overall translation quality and (where relevant) speed. On the other hand, simple pre-editing tools that e.g. help split up overly long sentences into shorter units (a very mild way of simplifying the input) have proved to lead to amazing improvements in output quality for the application of METAL in administrative text translation (Deprez 1991). In general, the avoidance of lexical, syntactic and stylistic ambiguities is believed to make machine translation or other NLP applications easier.</Paragraph>
    <Paragraph position="2"> Second, there is a growing need in international industrial environments for standardization and simplification of written communication; the experience is that the language used in industrial documents such as manuals needs a thorough revision to be used efficiently by both native and (especially) non-native writers and readers. To ensure that the language of technical documents is unambiguous, well-structured, economical and easily translatable, controlled language has been thought to be the solution, although this solution is often proprietary to a company and hence difficult to access by the NLP research community.</Paragraph>
    <Paragraph position="3"> * The research reported in this paper has been funded by Alcatel Bell in the period 1989-1991.</Paragraph>
    <Paragraph position="4"> In this paper, we report on ongoing research and development of a Controlled English grammar for technical documentation (course material and systems documentation) in the area of telecommunication. We started by examining three representative controlled grammars (AECMA, Ericsson, IBM). Finding them incomplete and defective in many ways, we developed our own controlled grammar, COGRAM. Since such a paper grammar is not the most motivating of texts for technical writers to use in the writing process, we decided to restructure it in an algorithmic way (ALCOGRAM) with an eye to using it in a computer-aided language learning tool and a more ambitious grammar and style checking program. The first application is finished and currently being tested at the Alcatel-Bell company. We are also designing the checker for operation within the Interleaf DTP environment, which already offers integrated rudimentary lexical control.</Paragraph>
    <Paragraph position="5"> But let us start by giving a short overview of the history and current application of controlled English in the NLP research and the industrial communities.</Paragraph>
    <Paragraph position="6"> 2. The roots of Controlled English
The foundation for most of the current CE manuals was laid by the Caterpillar Tractor Company (Peoria, Illinois, USA) in the mid-1960s. This company (currently still active in the CE field) introduced Caterpillar Fundamental English (CFE), on which two significant derivatives, i.e. Smart's Plain English Program (PEP) and White's International Language for Serving and Maintenance (ILSAM), were based. PEP gave birth to grammars used by Clark, Rockwell International, and Hyster, while ILSAM can be considered the root of grammars used by AECMA (Association Européenne de Constructeurs de Matériel Aérospatial), IBM, Rank Xerox, and Ericsson Telecommunications. Nowadays, a considerable number of variants of Controlled English can be found in many corporations. In the USA, Boeing successfully uses an [...] publications and to aid translation, whether carried out by conventional or computer-aided methods (Pym, 1988). At Wolfson College in Cambridge E. Johnson developed Airspeak and Seaspeak, both restricted languages. Policespeak is currently being developed to enable fast and accurate communication with the French counterparts when the Channel Tunnel opens in 1993 (Jackson, 1990). In the Netherlands, the BSO/DLT machine-based translation project also benefits from the linguistic confines and standardization of terminology (Van der Korst, 1986). In the French TITUS system, controlled language (&quot;Langage Documentaire Canonique&quot;) is used to improve machine translation of abstracts of technical papers on textile fabrics (Ducrot ...). Since the above-mentioned grammars have been adapted to the individual needs of each company, they might - to some extent - differ from one another.
Unfortunately, we were not able to get hold of any grammar of the PEP branch. Despite this limitation, three of the above-mentioned grammars, namely AECMA, Ericsson English, and the IBM manual, were taken as the starting point from which our research and development in the domain of CE could evolve.
(Actes de COLING-92, Nantes, 23-28 août 1992, p. 595; Proc. of COLING-92, Nantes, Aug. 23-28, 1992)</Paragraph>
    <Paragraph position="7"> 3. Preliminary linguistic study
Although our study of three CE grammars does not claim to be exhaustive, it does reveal the structural dissimilarities between the AECMA, Ericsson, and IBM grammars.</Paragraph>
    <Paragraph position="8"> Moreover, it underscores some of the qualities and deficiencies of each manual concerning spelling, syntax, style, and other information such as completeness and readability. Whereas the English used in all three grammars is good, the grammars differ overtly in structure. The following subsections summarize the study (Lemmens 1989 : 10).</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.1 Spelling
Spelling
</SectionTitle>
      <Paragraph position="0"> word list; new words allowed; free compounding; spelling checker</Paragraph>
    </Section>
  </Section>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
AECMA ERICSSON IBM
</SectionTitle>
    <Paragraph position="0"/>
    <Paragraph position="2"> As to the lexical organization, all three manuals contain a controlled vocabulary list. In particular, Ericsson English (EE) uses a two-level lexicon: Level 1 documents may only contain those lexical items that are marked 1, whereas Level 2 documents can be edited using a more extended vocabulary. In the IBM word list a marginal &quot;!&quot; symbol indicates that &quot;the word has some restriction, either a restriction to one meaning or a caution that the word is not at eight-grade level and should only be used with care.&quot; Other words are preceded by a marginal &quot;X&quot; indicating &quot;a word to be avoided&quot;.</Paragraph>
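    <Paragraph> The two-level lexicon described above can be pictured with a minimal Python sketch. Everything in it is an assumption made for illustration: the names LEXICON and words_not_allowed, the sample entries, and the numeric level marks are hypothetical and are not taken from the Ericsson or IBM word lists.

```python
# Hypothetical two-level word list: each entry maps a word to the lowest
# document level at which it may be used (1 = basic, 2 = extended).
# The entries are invented for this sketch.
LEXICON = {
    "connect": 1,
    "cable": 1,
    "unit": 1,
    "the": 1,
    "to": 1,
    "configure": 2,
}


def words_not_allowed(words, document_level):
    """Return the words that are unlisted or marked above document_level."""
    flagged = []
    for word in words:
        level = LEXICON.get(word.lower())
        # A word is flagged when it is missing from the list or when its
        # level mark is higher than the level of the document being checked.
        if level is None or level > document_level:
            flagged.append(word)
    return flagged
```

A check of this kind would only point at candidate words; whether a flagged word may still be used (for instance after being defined in the document) remains a matter for the grammar's own procedure.</Paragraph>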
    <Paragraph position="3"> All the words used in the three grammars must conform to the spelling used in the word lists. EE prefers British spelling, whereas AECMA consistently uses American spelling rules as prescribed in the Webster dictionary.</Paragraph>
    <Paragraph position="4"> Obviously, as they were inspired by individual heritage and international business matters, each of these companies has taken pragmatic decisions that match its internal organization.</Paragraph>
    <Paragraph position="5"> To check lexical terminology and spelling in its documents, IBM supports its writers by means of three computer-assisted instruction programs : WORD CHECKER II, SPELL 370, and PROOF.</Paragraph>
    <Paragraph position="6"> The AECMA grammar reveals a remarkable degree of lexical flexibility: &quot;Besides the words in the dictionary, the writer can also use those words which he decides belong to one of two categories : either Technical Names or Manufacturing Processes&quot; (AECMA : iv).</Paragraph>
    <Paragraph position="7"> Nevertheless, controlled rules tell whether or not a term belongs to the field of Technical Names or Manufacturing Processes. &quot;Inhouse preferences&quot; can be &quot;defined in your company's house rules, or by your editors&quot; (AECMA : vi). In a controlled grammar, however, you cannot deliberately add new meanings to the vocabulary list or transfer words from one lexical category to another. The Ericsson grammar, for example, demands that no new lexical items be listed unless the Ericsson Standards Department gives permission to do so. Similar authority holds for the IBM DPPG Customer and Service Information. Nevertheless, Ericsson describes a special procedure for using non-listed words: &quot;If you need to use a new word that is useful only in a very specialized context, give a definition of the word in EE, in the document that you are writing. If you need to give several definitions in the document, make an alphabetical list of the definitions at the end of the document&quot; (EE : 8). The IBM grammar restricts the use of new words heavily. Writers can, if really necessary, use X-marked words, provided they have been defined and even illustrated in every line where they might be encountered for the first time, and preferably in a glossary as well. All three manuals allow noun clusters or compounds, if the number of nouns making up the cluster does not exceed three.</Paragraph>
    <Paragraph position="8"> Adding prefixes or suffixes to items listed in the lexicon is also not allowed.</Paragraph>
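    <Paragraph> The three-noun limit on clusters lends itself to a mechanical check. The following Python sketch is an illustration under stated assumptions, not a procedure described in any of the three manuals: the function name long_noun_clusters, the constant MAX_CLUSTER, and the tag &quot;N&quot; for nouns are hypothetical, and the input is assumed to be already tagged by some other means.

```python
MAX_CLUSTER = 3  # limit stated in the text: at most three nouns per cluster


def long_noun_clusters(tagged_words, limit=MAX_CLUSTER):
    """Return every run of consecutive nouns that is longer than limit.

    tagged_words is a list of (word, tag) pairs in which the tag "N"
    marks a noun. How the words get their tags is outside this sketch.
    """
    clusters = []
    current = []
    for word, tag in tagged_words:
        if tag == "N":
            current.append(word)
            continue
        # A non-noun ends the current run; keep the run only if too long.
        if len(current) > limit:
            clusters.append(current)
        current = []
    # The sentence may end in the middle of a noun run.
    if len(current) > limit:
        clusters.append(current)
    return clusters
```

For a tagged phrase such as &quot;the line circuit test is&quot; the function returns an empty list, because the run of three nouns stays within the limit; a fourth consecutive noun would cause the whole run to be reported.</Paragraph>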
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.2 Syntax
Syntax AECMA ERICSSON IBM
</SectionTitle>
      <Paragraph position="0"> Grid 2 : Syntax
                    | AECMA      | ERICSSON   | IBM
verb forms          | restricted | restricted | restricted
subclause           | nothing    | limited    | very little
grammar checker     | no         | no         | no
tense distribution  | nothing    | nothing    | nothing
linguistic basis    | weak       | weak       | weak
descriptive         | little     | little     | little

As to syntax control, Ericsson English states that &quot;the two fundamental principles of writing are : the meaning must be clear; the language must be simple&quot; (EE : 8). Ericsson, AECMA, and IBM control more or less identical grammatical units, although each company has its own way of simplifying syntax. All three grammars control verb forms, but AECMA Simplified English (SE) does not allow either a gerund or a participle. EE only allows gerunds (&quot;EE uses -ing words ... as nouns to describe activities&quot;) and it &quot;does not use present participles or the continuous tenses&quot;. IBM in its turn lets the present participle function either as an adjective or as a noun.</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.3 Style
Style AECMA ERICSSON IBM
</SectionTitle>
      <Paragraph position="0"> Grid 3 : Style
                     | AECMA | ERICSSON | IBM
punctuation          | basic | nothing  | basic
sentence structure   | +/-   | little   | little
paragraph structure  | basic | nothing  | nothing

Apart from some elementary rules of punctuation control, the EE grammar does not focus on stylistic control. AECMA Simplified English refers to some punctuation, and it discusses sentence length, paragraph length, and structure. IBM has a special Information Development Guidelines manual called &quot;STYLE&quot;. It goes without saying that uniformity of style and layout enhances the overall quality of documents in controlled language.</Paragraph>
    </Section>
    <Section position="3" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.4. Miscellaneous
</SectionTitle>
      <Paragraph position="0"> Grid 4 : Other information
              | AECMA | ERICSSON | IBM
check list    | no    | no       | no
completeness  | no    | no       | no
readability   | +/-   | ok       | good

At times, one of the three grammars proposes - besides a rule of control - valuable information, which cannot be found in the other two grammars. The AECMA grammar, for example, instructs the writer how to change a passive sentence into an active one and states that no verbs should be left out to reduce the sentence length. In addition, one particular grammar sometimes does not contain a rule of control which the two others have: the Ericsson grammar does not refer to control of articles; AECMA and IBM do not take into account subordinate clauses (except for controlling the participial adverbial subclause). Still, although individually focusing on syntax control, all three manuals are incomplete: since EE considers but a few aspects of subordinate clause control, the grammar reveals insufficiency and incompleteness. There are no satisfactory answers to questions such as: What about gapping and elliptic structures? How about using zero-relative markers and zero-connectives? Are sentential relative clauses allowed? Can nominal relatives be used? The rules of control are vague as, for instance, in the EE statement &quot;A comma divides a sentence into its natural components and makes it easier to read&quot;. What does &quot;natural components&quot; mean? Numerous examples of rules that are not well-defined or vague instructions indubitably cause confusion and lead to grammatical mistakes.</Paragraph>
    </Section>
  </Section>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3.5 Conclusion
</SectionTitle>
    <Paragraph position="0"> First of all, we concluded that &quot;the linguistic foundation of these manuals are at times very weak: oversimplifications often leads to linguistic inaccuracies; frequently linguistic structures are not covered; the instructions are at times vague and ambiguous; and often the rules disregard linguistic reality&quot; (Lemmens 1989 : 11).</Paragraph>
    <Paragraph position="1"> Secondly, in all three grammars there is a lack of clear distinction between descriptive and normative principles. There is no specification whether the structures to be avoided are ungrammatical or simply non-controlled. Typical of the three grammars is the normative &quot;Do not use&quot; meaning &quot;Avoid&quot;. Seldom - if ever - is this phrase used to show that the writer should not use a construction because it is ungrammatical. For example, the rules for distributing &quot;when&quot; and &quot;if&quot; do not mention the incorrect use of &quot;when&quot; in conditional subclauses.</Paragraph>
    <Paragraph position="2"> Moreover, sometimes descriptive information needs to be included, e.g. a list of alternative constructions in common English not to be used by the writers.</Paragraph>
    <Paragraph position="3"> Unfortunately, to guide the writing of descriptive documents the rules set forth by the above-mentioned grammars have to be violated regularly. To write a new CE grammar, a clear distinction between the rules for editing, on the one hand, basic instructive technical documents, and, on the other hand, &quot;higher-level&quot; descriptive documents (EE Level 1 and 2) will be required.</Paragraph>
    <Paragraph position="4"> Consequently, &quot;... it is not sufficient to construct a new grammar by just melting together the three grammars, as was mentioned earlier. The new grammar should also be linguistically well-founded, unambiguous, and, where necessary, descriptively adequate&quot; (Lemmens 1989 : 11).</Paragraph>
    <Paragraph position="5"> 4. Organization of the COGRAM project
Since the development of the Controlled English grammar (COGRAM) - as it will be presented in this paper - mainly consists of two components, a word list and a grammar, a two-dimensional strategy has to be taken into account.</Paragraph>
    <Paragraph position="6"> On the one hand, a limited lexical database is being developed. A basic word list containing 2000 terms has been constituted to be used in computer-aided language learning exercises. Recently, this list has been extended to a vocabulary package of approximately 5000 words.</Paragraph>
    <Paragraph position="7"> Moreover, another 1000 technical terms were added to make the controlled vocabulary more complete. On the other hand, the field of Controlled English has been studied to generate a selection of adequate grammar rules that pertain to multiple aspects of technical writing: lexical structures, syntactic patterns, and stylistic features.</Paragraph>
    <Paragraph position="8"> Both the lexical database and the grammar need to be integrated into a powerful tool for writers. To ensure that an introduction of the grammar at a company will take place without many users psychologically objecting to Controlled English, we have thought of illustrating the grammar rules by means of straight-to-the-point examples, all taken from the users' field of interest.</Paragraph>
  </Section>
</Paper>