<?xml version="1.0" standalone="yes"?>
<Paper uid="A88-1027">
  <Title>THE EXPERIENCE OF DEVELOPING A LARGE-SCALE NATURAL LANGUAGE TEXT PROCESSING SYSTEM: CRITIQUE</Title>
  <Section position="1" start_page="0" end_page="0" type="metho">
    <SectionTitle>
THE EXPERIENCE OF DEVELOPING A LARGE-SCALE
NATURAL LANGUAGE TEXT PROCESSING SYSTEM: CRITIQUE
</SectionTitle>
    <Paragraph position="0"> Stephen D. Richardson and Lisa C. Braden-Harder IBM Thomas J. Watson Research Center P.O. Box 704</Paragraph>
  </Section>
  <Section position="2" start_page="0" end_page="199" type="metho">
    <SectionTitle>
Abstract
</SectionTitle>
    <Paragraph position="0"> This paper describes our experience in developing the CRITIQUE system. It describes three application areas in which the system is being used and discusses some characteristics of CRITIQUE which we believe are applicable to large-scale natural language systems in general: performance, robustness, flexibility, presentation, and accuracy.</Paragraph>
    <Paragraph position="1"> Introduction CRITIQUE is a large-scale natural language text processing system that identifies grammar and style errors in English text. This advanced prototype, which is currently being developed at IBM Research, is based on a broad-coverage natural language parser (Richardson, 1985). The parser provides a unique approximate syntactic parse for a large percentage of English text and diagnoses over 100 grammar and style errors.</Paragraph>
    <Paragraph position="2"> Earlier writing-aid systems, such as Writer's Workbench (Macdonald, et al, 1982), contain functions which identify parts-of-speech of words, perform string-level phrase identification, and generate readability statistics. Similar functions are apparent in many systems now commercially available, some of which are described in an issue of The Seybold Report (1984). To our knowledge, however, no other system uses a parser that produces complete structural analyses for sentences.</Paragraph>
    <Paragraph position="3"> CRITIQUE is an extension of the EPISTLE project which began in 1980 (Heidorn, et al, 1982). The parser and grammar are implemented in PLNLP (the Programming Language for Natural Language Processing), developed by George Heidorn. PEG (the PLNLP English Grammar) has been written by Karen Jensen, and the style rules were written by Yael Ravin. Today, CRITIQUE is being tested in a variety of applications ranging from office correspondence and technical documentation to student essays. Also, PLNLP and PEG have been incorporated into several other research applications, such as machine translation systems.</Paragraph>
    <Paragraph position="4"> At the 1986 ACL meeting, Gary Hendrix described his experience in developing a natural language interface for real users (Hendrix, 1986). In contrast with user interface systems, we consider CRITIQUE to be a text processing system. The latter may be distinguished from the former by its broad coverage of texts that were prepared independently to communicate ideas and not strictly to interact with a computer system. Until now, the experience of developing a large-scale natural language text processing system has not been discussed in the literature.</Paragraph>
    <Paragraph position="5"> This paper first describes the overall processing in the CRITIQUE system. Then it describes three application areas in which the system is being used. The remaining sections discuss some characteristics of CRITIQUE which we believe are applicable to large-scale natural language systems in general: performance, robustness, flexibility, presentation, and accuracy. The discussion draws on our experience in all three application areas.</Paragraph>
    <Paragraph position="6"> Processing in CRITIQUE CRITIQUE processes text in six steps. The first step determines sentence, heading, and paragraph boundaries. In the next step, lexical processing identifies unrecognized words and awkward phrases. The on-line dictionary which is used includes more than 100,000 entries and provides information used in syntactic processing as well. After lexical analysis, text is passed to the parser, which produces a parse tree, and in so doing checks for grammar errors. Then stylistic analysis diagnoses potential style problems. CRITIQUE also generates statistical information about documents based on the lexical and syntactic analyses. The final step involves error summarization and display.</Paragraph>
    <Paragraph position="7"> CRITIQUE has an interactive processing mode that is fully integrated with a text editor, allowing users to update the text as needed. As the text is modified, new sentences are re-analyzed to ensure that no new errors have been introduced. The system provides three levels of on-line help: the first level identifies the error, the second provides a brief explanation, and the third provides a complete tutorial. Figure 1 is an illustration of the second level of help. The user can also specify style preferences in an individual profile. Possible errors are filtered through the profile to determine whether or not they should be displayed. Hard-copy output is also available. I am writing to recommend Susan Hayes, who's application you recently received.</Paragraph>
    <Paragraph position="8">  name of error, suggested correction, and a brief explanation Application Areas for CRITIQUE During the development of CRITIQUE, we have directed our efforts towards three major application areas: office environments, publications organizations, and educational institutions. Each area has its own particular needs and requirements. In the office environment, professionals require quick, succinct feedback on their memos and other documents. They are less interested in maintaining a particular style, but want insurance against obvious grammatical and spelling mistakes. Our parsing grammar was originally developed using a data base of office correspondence. There has also been an abundance of feedback at IBM Research, where the system has been made available to hundreds of users. These users submitted over 3,000 pages of text to CRITIQUE in 1987.</Paragraph>
    <Paragraph position="9"> Publications organizations usually have strict requirements for style and consistency which exist in the form of tedious style guides. The professional writers in such organizations also want succinct feedback, but are usually willing to wait longer to receive it, since their documents are typically longer and more involved. An IBM technical writing group and the US government have been our source of experience and feedback in this area.</Paragraph>
    <Paragraph position="10"> Use by educational institutions has proven to be the most challenging of the three areas. There is a wide range of ill-formed text to deal with, originating from classes in composition, business writing, technical writing, and ESL (English as a Second Language). The professors in these various areas also sometimes have differing opinions on grammar and style. Although there may not be such a great need for quick processing time (except by those students who procrastinate), processing cost must be minimized to fit most university budgets. We currently are doing joint studies with three universities to help test and refine CRITIQUE.</Paragraph>
    <Paragraph position="11"> Performance Broad-coverage natural language processing is computationally expensive. To do it in real time is even more so. Whereas large offices and publications organizations may be able to afford extensive computing power, such is not the case in many of the environments where a system such as CRITIQUE would be most useful.</Paragraph>
    <Paragraph position="12"> Although CRITIQUE has been developed in a large IBM-mainframe environment, several significant steps have been taken to improve its performance with a view toward running on much smaller machines. In addition, a version of the PLNLP parser on which CRITIQUE is based was successfully ported to an IBM PC in the summer of 1986. Work is continuing on other versions which would run the complete PLNLP English Grammar (PEG) on intelligent workstations such as the IBM RT PC and PS/2.</Paragraph>
    <Paragraph position="13"> We have used two complementary approaches to achieve satisfactory performance. One is to distribute the parts of the system which can run in parallel over multiple processors (where available), and the other is to optimize the performance of the programs themselves.</Paragraph>
    <Paragraph position="14"> To distribute the processing involved, we have used &amp;quot;parsing server&amp;quot; programs which may operate either on the same physical computer, or on several computers connected by a network.</Paragraph>
    <Paragraph position="15"> When CRITIQUE is invoked by a user, each sentence in the user's document is sent as a separate task to a &amp;quot;manager server&amp;quot; which then distributes such tasks to as many parsing servers as are available. After analysis, information about a sentence is returned via the manager server to the user's editing environment. With this scheme, multiple users can access multiple parsing servers that may reside on different computers linked by a network. In this way, several of a user's sentences may be processed in parallel and asynchronously with respect to other tasks (such as word processing) that the user may be doing.</Paragraph>
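    The per-sentence task distribution described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: threads in one process stand in for parsing servers on a network, and the names parse_sentence and manage are hypothetical, not part of CRITIQUE or PLNLP.

```python
from concurrent.futures import ThreadPoolExecutor


def parse_sentence(sentence):
    """Hypothetical stand-in for a parsing server: returns a toy analysis."""
    return {"sentence": sentence, "words": len(sentence.split())}


def manage(sentences, num_servers=3):
    """Send each sentence out as a separate task; collect results in document order."""
    with ThreadPoolExecutor(max_workers=num_servers) as pool:
        # map() runs the tasks concurrently but yields results in submission order
        return list(pool.map(parse_sentence, sentences))


document = [
    "I am writing to recommend Susan Hayes.",
    "Her application arrived recently.",
]
results = manage(document)
print([r["words"] for r in results])  # [7, 4]
```

    Because each sentence is an independent task, the caller (here, the editor session) is free to continue other work while analyses are pending.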
    <Paragraph position="16"> Although this distributed processing system is currently implemented on a network of mainframes, the transition to a workstation-based network like those found in small businesses and university environments will not be difficult. The distributed architecture is also well suited to exploit the power of parallel processor machines currently under development. The granularity of the processing involved, which is now at the sentence level, may also be made smaller or larger, depending on resulting efficiency and the possible need to consider larger segments of text for a more complete analysis.</Paragraph>
    <Paragraph position="17"> The parsing servers referred to above consist of the PLNLP parsing engine, the PLNLP English Grammar, and a large set of style rules. PLNLP supports the writing of procedures as well as rules. Consequently, the parsing engine itself is written as a set of PLNLP procedures. In addition to the run-time environment, the translator for the PLNLP language is also written using PLNLP. When an entire programming language system such as this is written in itself, a high degree of portability and language-specific optimization may be achieved, further enhancing overall system performance.</Paragraph>
    <Paragraph position="18"> The PLNLP translator currently turns PLNLP rules and procedures into LISP or PL.8 (a highly-optimized PL/I variant) code which is then compiled and executed. Work has also been done using C as a base (for the PC version mentioned earlier), and this work will be extended for portability across computers. Direct compilation into machine code is also being considered.</Paragraph>
    <Paragraph position="19"> In our experience with various programming languages and environments, we have found it desirable to maintain two versions of the system, which share the same PLNLP source code. One is geared toward grammar and style rule development, being somewhat slower, but very flexible, and containing a set of specially designed tools and development aids. This version of the system now runs in LISP. The other version, running in PL.8, is optimized for fast execution and is about ten times faster than the development version. CRITIQUE uses the PL.8 version, which can analyze a sentence of about 15-20 words in one CPU second on an IBM 3081 computer. This translates into a few seconds of elapsed time under an average load.</Paragraph>
    <Paragraph position="20"> Even as computers become more powerful, there will continue to be a corresponding increase in the complexity and amount of computation involved in natural language processing. Through use of a highly-optimized production run-time environment, PLNLP is able to achieve the required performance without sacrificing flexibility during development.</Paragraph>
    <Paragraph position="21"> One last performance issue should be mentioned: the need for a well-integrated dictionary system. As previously stated, CRITIQUE's dictionary is able to recognize well over 100,000 words, providing both morphological and syntactic information about those words. The trade-offs between keeping the dictionary on disk or in memory are more significant in a very large-scale system. Disk I/O's, including &amp;quot;hidden&amp;quot; paging I/O's when the dictionary is in virtual memory, must be carefully considered and minimized. It has been our experience that expensive dynamic morphological processing should also be kept to a minimum, although this may not be possible for other languages.</Paragraph>
    <Section position="1" start_page="197" end_page="199" type="sub_section">
      <SectionTitle>
Robustness
</SectionTitle>
      <Paragraph position="0"> Any computer system should be robust. This is especially true of natural language systems, and, in particular, those which specialize in handling ill-formed input. Robustness should be considered at every level of processing, both for the system in general and for the particulars of dealing with natural language inputs.</Paragraph>
      <Paragraph position="1"> At the system level, the distributed architecture which is used by CRITIQUE for performance reasons requires robust task management mechanisms. The manager server carefully tracks the progress of each task (sentence) and the availability of parser servers on the network. If a parser loses its network connection, exceeds a predetermined time limit, or otherwise fails while processing a task, that task is sent out again to another parser. If a parser fails while processing a task, it automatically restarts itself. Statistics concerning usage and task flow, as well as comments recorded by users about the usefulness or accuracy of critique information, are maintained by the manager server and automatically distributed to system developers each day.</Paragraph>
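      The re-dispatch behavior of the manager server might look roughly like the following Python sketch. The function and parser names are hypothetical, and built-in exceptions stand in for a lost connection or an exceeded time limit; the actual task management protocol is not described in enough detail here to reproduce.

```python
def dispatch(task, parsers, max_attempts=3):
    """Try a task on successive parsers; re-send it elsewhere when one fails."""
    failures = []
    for parser in parsers[:max_attempts]:
        try:
            return parser(task), failures
        except (TimeoutError, ConnectionError) as err:
            # Record the failure, then fall through to the next available parser
            failures.append(f"{parser.__name__}: {type(err).__name__}")
    return None, failures


def flaky_parser(sentence):
    """Hypothetical parser that always exceeds its time limit."""
    raise TimeoutError("exceeded time limit")


def healthy_parser(sentence):
    """Hypothetical parser that always succeeds."""
    return f"parsed({sentence})"


result, log = dispatch("Lets contemplate this.", [flaky_parser, healthy_parser])
print(result)  # parsed(Lets contemplate this.)
print(log)     # ['flaky_parser: TimeoutError']
```

    The failure log returned alongside the result is the kind of record that could feed the daily usage statistics mentioned above.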
      <Paragraph position="2"> At the natural language level, robustness first comes into play in handling the various formats of text inputted to the system. Text which has been &amp;quot;manually&amp;quot; formatted (using an editor, &amp;quot;WYSIWYG&amp;quot; style), as well as text with imbedded formatting commands (IBM's SCRIPT and GML commands are currently supported) is scanned by CRITIQUE to identify &amp;quot;parsable&amp;quot; segments. This process excludes tables, figures, headings, addresses, etc., and is table driven to accommodate the varying requirements of users in the different application areas. Publications organizations, for example, typically have special additional sets of formatting commands that must be supported.</Paragraph>
      <Paragraph position="3"> During parsing, words which are not in the dictionary are assigned default morphological and syntactic information so as to avoid a parsing failure. Most such words are generally assumed to be singular nouns, although there are some exceptions. This is usually adequate to obtain a reasonable parse, but can cause problems when it is a verb that is misspelled.</Paragraph>
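      A minimal Python sketch of this default assignment follows. The three-word lexicon and the tag names are invented for illustration, and the exceptions mentioned above are not modeled.

```python
LEXICON = {"the": "DET", "dog": "NOUN-SG", "barks": "VERB"}  # toy dictionary


def lookup(word):
    """Return (category, known); unknown words default to singular noun."""
    category = LEXICON.get(word.lower())
    if category is None:
        return ("NOUN-SG", False)
    return (category, True)


print([lookup(w) for w in "The dog brks".split()])
```

    The misspelled verb "brks" is tagged as a singular noun, which illustrates why a misspelled verb can lead the parser astray.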
      <Paragraph position="4"> Parsing may take place in one pass or two, if necessary. The first pass applies the rules of the grammar with all of the constraints in force. If a parse is not obtained, then a second pass is made, applying the rules with selected constraints being relaxed. Certain lexical substitution rules for easily confused words (e.g., whose/who's, its/it's) are also activated during the second pass. If a parse is still not obtained after the second pass, whether because of an unanticipated error, an unrecognized word, or a possible weakness in the grammar, then the &amp;quot;parse fitting&amp;quot; procedure is invoked (Jensen, et al, 1984). This procedure relies on the fact that the parsing algorithm is bottom-up in nature, and therefore intermediate well-formed parse structures are produced for segments of the sentence. These structures may be &amp;quot;fitted&amp;quot; together to form a parse for the sentence if no other complete structure is found.</Paragraph>
      <Paragraph position="5"> Even when a fitted parse is obtained, grammar and style error detection is still active within the successfully parsed segments.</Paragraph>
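      The control flow of the strict pass, the relaxed pass, and the parse-fitting fallback can be summarized in a short Python sketch. The toy parser and fitter below are hypothetical placeholders with a single agreement constraint; they are not the PLNLP parser or the fitting procedure of Jensen et al.

```python
def analyze(sentence, parse, fit):
    """Strict pass, then relaxed pass, then parse fitting as a last resort."""
    tree = parse(sentence, relaxed=False)
    if tree is not None:
        return "strict", tree
    tree = parse(sentence, relaxed=True)
    if tree is not None:
        return "relaxed", tree
    return "fitted", fit(sentence)


def toy_parse(sentence, relaxed):
    """Hypothetical parser: two words, with subject-verb agreement unless relaxed."""
    words = sentence.split()
    if len(words) != 2:
        return None
    subject, verb = words
    agrees = subject.endswith("s") != verb.endswith("s")
    if agrees or relaxed:
        return ("S", subject, verb)
    return None


def toy_fit(sentence):
    """Hypothetical fitting: join each word as its own segment."""
    return ("FITTED", *sentence.split())


for text in ["dogs bark", "dogs barks", "the dogs bark loudly"]:
    print(text, analyze(text, toy_parse, toy_fit)[0])
```

    A parse obtained only on the relaxed pass indicates which constraint had to be dropped, which is what lets the system report the corresponding grammar error.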
      <Paragraph position="6"> If multiple parses are obtained, the system selects one based on a parse metric which favors trees in which modifying words and phrases are attached to the closest qualifying constituent (Heidorn, 1982). If the number of parses obtained exceeds a certain threshold, CRITIQUE takes advantage of the situation and informs the user that the sentence is probably unclear. If the parser fails for some system reason, the user will receive a message that the segment of text in question was &amp;quot;too difficult to process.&amp;quot; No one can foresee all the errors that humans can make. It is for this reason that we have included these robust mechanisms, and that we continue to enhance the system to catch new errors as experience and feedback dictate.</Paragraph>
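      As a rough illustration of selecting among multiple parses, the Python sketch below scores each candidate by the total distance of its modifier attachments and flags the sentence when there are too many candidates. The scoring function and the threshold value are assumptions made for this sketch; the actual parse metric is the one cited above and is not reproduced here.

```python
def choose_parse(parses, ambiguity_threshold=4):
    """Pick the parse with the closest attachments; flag heavy ambiguity."""
    # Each parse is (label, [attachment distance per modifier]); lower total wins
    best = min(parses, key=lambda p: sum(p[1]))
    unclear = len(parses) > ambiguity_threshold
    return best[0], unclear


# "I saw the man with the telescope": two candidate attachments for the phrase
parses = [
    ("PP attached to verb 'saw'", [3]),
    ("PP attached to noun 'man'", [1]),
]
print(choose_parse(parses))  # ("PP attached to noun 'man'", False)
```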
      <Paragraph position="7"> Flexibility By virtue of the significantly different needs of each application area listed earlier, flexibility has been a requirement throughout the development of CRITIQUE. For example, the publication organizations we have dealt with have required large additions of terminology, the handling of special input formats and formatting commands, and additional style critiques dictated by organizational style guidelines. Universities, being pedagogically oriented, have been very much concerned with the format and content of the critique information presented in the output.</Paragraph>
      <Paragraph position="8"> We have attempted to handle the need for this flexibility at the individual, installation, and application area levels.</Paragraph>
      <Paragraph position="9"> The basic CRITIQUE system provides pre-determined critiques, intuitively organized into groups, with default thresholds, if applicable, and general help and tutorial information. It handles the formats of files by default according to certain file naming conventions. The vocabulary in the dictionary comes mainly from Webster's 7th Collegiate dictionary, and the grammar and style error rules have been developed according to several widely accepted sources. Every item of information produced by the system is controlled by a switch or threshold contained in a user profile. We have found that a good set of defaults in this profile is indispensable, since most users do not bother to change them.</Paragraph>
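      Filtering critiques through switches and thresholds could be sketched as follows in Python. The critique codes, the score field, and the profile layout are all hypothetical; the text states only that each item is controlled by a switch or threshold.

```python
from dataclasses import dataclass


@dataclass
class Critique:
    code: str            # hypothetical error code, e.g. "WHOSE"
    message: str         # short description of the problem
    score: float = 1.0   # hypothetical strength, compared with a threshold


def filter_critiques(critiques, profile):
    """Keep critiques whose switch is on and whose score meets the threshold."""
    default = {"enabled": True, "threshold": 0.0}
    shown = []
    for critique in critiques:
        setting = profile.get(critique.code, default)
        if setting["enabled"] and critique.score >= setting["threshold"]:
            shown.append(critique)
    return shown


profile = {
    "PASSIVE": {"enabled": False, "threshold": 0.0},       # switched off by the user
    "LONG_SENTENCE": {"enabled": True, "threshold": 0.5},  # shown only above 0.5
}
found = [
    Critique("WHOSE", "who's / whose confusion"),
    Critique("PASSIVE", "passive construction"),
    Critique("LONG_SENTENCE", "sentence may be too long", score=0.3),
]
print([c.code for c in filter_critiques(found, profile)])  # ['WHOSE']
```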
      <Paragraph position="10"> Individuals who use the system are free to change any of the settings in the profile according to their own tastes and needs. They may also add words to an addendum which is used solely for the purpose of checking spelling.</Paragraph>
      <Paragraph position="11"> Knowledgeable users, or, more commonly, installation administrators, may change the default settings in the system profile or create several profiles for different purposes. Such would be the case for university classes of different types or various publications groups, each with its own particular style requirements. This level of customization also includes changing the grouping of critiques and the associated code (used by the system to flag the occurrence of an error in the output), message, help, and tutorial information, and making large additions of specialized terminology to the system dictionary. New classes of word- and phrase-level errors may be added to the dictionary as well.</Paragraph>
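      One simple way to model the individual, installation, and system levels of customization is a layered lookup, sketched here with Python's collections.ChainMap. The setting names and values are invented for illustration; CRITIQUE's actual profile format is not described in this paper.

```python
from collections import ChainMap

system_defaults = {"PASSIVE": True, "LONG_SENTENCE": True, "max_sentence_words": 30}
installation = {"max_sentence_words": 25}  # e.g. set by an administrator for one class
individual = {"PASSIVE": False}            # a single user's own preference

# Lookups search the individual layer first, then installation, then system defaults
profile = ChainMap(individual, installation, system_defaults)
print(profile["PASSIVE"], profile["max_sentence_words"], profile["LONG_SENTENCE"])
# False 25 True
```

    With this layout, a user who changes nothing still gets a complete profile from the defaults.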
      <Paragraph position="12"> Users at some of our test sites have requested the ability to add classes of style errors. This is not currently possible, because they would have to be able to write their own PLNLP rules. For now, further types of customization, for entire application areas, for example, are performed by the system developers, although there is continual re-evaluation of where to draw the line.</Paragraph>
      <Paragraph position="13"> It is important to point out that the kinds of system customization described above have not, thus far, included tuning the grammar for special handling of the texts common in a particular application area. Every effort has been made to keep PEG as broad-coverage as possible. In fact, there has been a tendency during the development of CRITIQUE to move certain types of error detection, where possible, from the grammar to the style rule component. Since style rules are applied only after a parse has been obtained by the grammar rules, this lessens the possibility that testing for an error will interfere with grammar rule processing.</Paragraph>
    </Section>
    <Section position="2" start_page="199" end_page="199" type="sub_section">
      <SectionTitle>
Presentation
</SectionTitle>
      <Paragraph position="0"> Systems such as CRITIQUE are generally used to process texts which have been prepared for a human audience often using word processing software. Therefore it seems natural, perhaps even necessary, that these systems be tightly integrated with a word processing environment.</Paragraph>
      <Paragraph position="1"> The CRITIQUE system architecture, which has been described previously from a distributed processing standpoint, may also be viewed as incorporating a word processing environment as a user interface, with a background natural language processor. There is nothing in the CRITIQUE system interface that requires that what the parser servers return be grammatical and stylistic information. The &amp;quot;descriptors&amp;quot; produced by the parsers are general in nature and could be used to send back any kind of information, possibly including a content characterization for information retrieval purposes or even a translation into another language. In this way, the system may be considered as a general purpose natural language processing environment.</Paragraph>
      <Paragraph position="2"> With respect to the presentation of critique information in this integrated environment, the differing needs of the application areas have been evident once again. Several lessons in human factors have been learned and the results implemented. In a prior version of CRITIQUE, problems were simply underlined on the screen, and users were required to point to a particular problem and request that a window be opened which contained a description of what was wrong. As a result of studying the usage statistics gathered by the manager server at IBM Research, we determined that users were not asking for the descriptions of errors. Instead, they seemed to rely on their intuitions, only making use of the fact that CRITIQUE had flagged a particular word or phrase. This led us to replace the underlining with a brief, highlighted code word or phrase which indicates what the problem is. In cases where CRITIQUE suggests a corrected form of a word, that form is now used as the error indicator. This new format for displaying errors is shown in Figure 2.</Paragraph>
      <Paragraph position="3"> Lets contemplate how a president is selected.</Paragraph>
      <Paragraph position="4"> *Let's In many cases the best candidate in the eyes of</Paragraph>
    </Section>
  </Section>
  <Section position="3" start_page="199" end_page="199" type="metho">
    <SectionTitle>
[MISSING COMMA
</SectionTitle>
    <Paragraph position="0"> the public is the one who has the most exposure.</Paragraph>
    <Paragraph position="1"> This is no way to chose a president, but *choose unfortunately it is often true. The total package of a candidates political ideas don't really make *doesn't an impression on the public. His appearance</Paragraph>
  </Section>
  <Section position="4" start_page="199" end_page="199" type="metho">
    <SectionTitle>
[FRAGMENT
</SectionTitle>
    <Paragraph position="0"/>
  </Section>
</Paper>