File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/concl/88/a88-1027_concl.xml

Size: 10,119 bytes

Last Modified: 2025-10-06 13:56:15

<?xml version="1.0" standalone="yes"?>
<Paper uid="A88-1027">
  <Title>THE EXPERIENCE OF DEVELOPING A LARGE-SCALE NATURAL LANGUAGE TEXT PROCESSING SYSTEM: CRITIQUE</Title>
  <Section position="5" start_page="199" end_page="201" type="concl">
    <SectionTitle>
CRITIQUE
</SectionTitle>
    <Paragraph position="0"> From this experience and other similar ones, we concluded that professionals using CRITIQUE in an office environment preferred a quick, interactive review of memos and documents. The amount of feedback on the screen at any one time should be maximized, and the number of keystrokes and overall review time thereby minimized.</Paragraph>
    <Paragraph position="1"> Publications organizations have proved similar in many respects. However, due to the length and complexity of documents produced in such organizations, users may be more willing to wait for their output, and often make use of overnight batch runs.</Paragraph>
    <Paragraph position="2"> One feature of CRITIQUE that has proved useful in this respect is called &quot;interactive review.&quot; It relies on the fact that the system saves all of the information produced about a given file on disk at the end of a session or run. This information is read back the next time the same document is processed, eliminating the need to reprocess sentences that have not changed. As a result, very large files can be run overnight and then reviewed interactively the following day, lessening the impact on prime-shift computer usage.</Paragraph>
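The paper does not describe how CRITIQUE stores its saved information, so the following is only a minimal Python sketch of the idea under stated assumptions: each sentence is keyed by a SHA-256 hash of its exact text, the per-sentence results are kept in a JSON file, and `analyze_sentence` is a hypothetical stand-in for the parsing and critiquing step. On a rerun, only sentences whose key has not been seen before are analyzed again.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable, Dict, List


def sentence_key(sentence: str) -> str:
    """Stable key for a sentence: SHA-256 of its exact text."""
    return hashlib.sha256(sentence.encode("utf-8")).hexdigest()


def review_document(
    sentences: List[str],
    cache_path: Path,
    analyze_sentence: Callable[[str], List[str]],
) -> Dict[str, List[str]]:
    """Return critiques per sentence, re-analyzing only new or edited sentences.

    `analyze_sentence` is hypothetical: it maps one sentence to a list of
    critique strings and stands in for the expensive grammar checker.
    """
    # Load whatever a previous session or overnight run saved for this document.
    cache: Dict[str, List[str]] = {}
    if cache_path.exists():
        cache = json.loads(cache_path.read_text(encoding="utf-8"))

    fresh: Dict[str, List[str]] = {}
    results: Dict[str, List[str]] = {}
    for sentence in sentences:
        key = sentence_key(sentence)
        if key not in fresh:
            # Reuse saved results; only unseen sentences reach the analyzer.
            fresh[key] = cache[key] if key in cache else analyze_sentence(sentence)
        results[sentence] = fresh[key]

    # Save at the end of the session, keeping only sentences still present.
    cache_path.write_text(json.dumps(fresh), encoding="utf-8")
    return results
```

Keying on the exact sentence text means that any edit to a sentence, even a single character, forces that sentence to be reanalyzed, while unchanged sentences are read back from disk; entries for sentences no longer in the document are dropped when the file is rewritten.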
    <Paragraph position="3"> Publications groups, which occasionally use subcontractors who do not have access to on-line information, provided part of the motivation for an option to produce printed output that is almost identical to what is viewed on the screen. They also required the flexibility to integrate the information in an organizational style guide easily with the interactive tutorials for each critique.</Paragraph>
    <Paragraph position="4"> The universities we are working with considered the abbreviated presentation of critique information we developed to be appropriate for their advanced students, but inadequate for others. They want the ability to lengthen explanations where desired and to group critiques by type, presenting only certain types in the output at any one time.</Paragraph>
    <Paragraph position="5"> Our varied experiences in these application areas have resulted in highly flexible, table-driven presentation modes for both batch and interactive output. We continue to experiment and make changes based on feedback.</Paragraph>
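The paper gives no details of CRITIQUE's presentation tables, so this Python sketch only illustrates what "table-driven" could mean for the university requests described above: one table holds a short and a long explanation per critique, and another maps each presentation mode to the critique types it shows and the explanation length it uses. All mode names, critique codes, type labels and explanation texts are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class Critique:
    kind: str     # critique type, e.g. "grammar" or "style" (hypothetical labels)
    code: str     # key into the explanation table
    excerpt: str  # the flagged phrase


# Hypothetical explanation table: a short and a long form per critique code.
EXPLANATIONS: Dict[str, Dict[str, str]] = {
    "COMMA_SPLICE": {
        "short": "Comma splice.",
        "long": "Two independent clauses are joined only by a comma; "
                "use a period, a semicolon, or a conjunction.",
    },
    "PASSIVE": {
        "short": "Passive voice.",
        "long": "Consider an active construction that names the agent.",
    },
}


@dataclass(frozen=True)
class PresentationMode:
    kinds: FrozenSet[str]  # critique types shown in this mode
    detail: str            # "short" or "long" explanations


# Hypothetical presentation table: changing behavior means editing data, not code.
MODES: Dict[str, PresentationMode] = {
    "advanced_student": PresentationMode(frozenset({"grammar", "style"}), "short"),
    "novice_grammar_only": PresentationMode(frozenset({"grammar"}), "long"),
}


def render(critiques: List[Critique], mode_name: str) -> List[str]:
    """Format only the critique types the mode selects, at the mode's detail level."""
    mode = MODES[mode_name]
    return [
        f'"{c.excerpt}": {EXPLANATIONS[c.code][mode.detail]}'
        for c in critiques
        if c.kind in mode.kinds
    ]
```

Adding a mode for a new audience, or lengthening an explanation, then means editing a table entry rather than the rendering code.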
    <Paragraph position="6"> Accuracy Accuracy is perhaps the most important aspect of a natural language system's overall performance. It may be evaluated from two perspectives: the actual &quot;under-the-covers&quot; natural language processing involved, and the user's perception. Given the state of the art, we may consider it a blessing that it is possible for the latter to be somewhat better than the former.</Paragraph>
    <Paragraph position="7"> From a processing perspective in CRITIQUE, we reiterate that the PLNLP English Grammar produces parses which are approximate. Without recourse to semantics we cannot hope for much better. However, we are quite pleased with the coverage and accuracy that we have obtained, and find them to be adequate for the requirements of a system like CRITIQUE. The semantic ambiguities and inaccuracies which remain in the parses have not been a stumbling block to the usefulness of the system. This demonstrates that some degree of inaccuracy at the natural language processing level can be acceptable as long as it is not readily visible to the user. We do not pretend to be completely satisfied with this situation, however, and we are doing research in the area of &quot;dictionary-based&quot; semantic analysis. This will enable us to improve some of the attachments in the parse trees produced by PEG (Binot and Jensen, 1987).</Paragraph>
    <Paragraph position="8"> Even a parser that can deal with a wide range of ill-formed input cannot be expected to parse &quot;gobbledegook&quot; successfully if it lacks a sophisticated semantic component. In aiming to produce a useful and accurate analysis of text, one must also assume a maximum degree of ill-formedness that can be handled. The sentence given below in Figure 3, which was taken from a real student essay, illustrates the kind of ill-formedness that challenges CRITIQUE to its limits. The system did point out the comma splice in this sentence, but nothing else.</Paragraph>
    <Paragraph position="9"> &quot;He starts to condemn Nora for her mistake and made as if she is like poison that can be contagious, Trovald was ready to take away the kids and kick Nora out as an outcast as how they did with Mr. Krogstad.&quot; Figure 3. An ill-formed sentence from a student essay. In the earlier discussion of parser robustness, we pointed out that error detection is still performed within the successfully processed segments of a fitted parse. Our testing to this point indicates that critiques produced in such situations are about as accurate as those produced from non-fitted parses. This is another case where the user's perception may differ from the underlying performance of the system.</Paragraph>
    <Paragraph position="10"> In general, however, we have found users' perceptions and feedback to be most helpful.</Paragraph>
    <Paragraph position="11"> The facility that CRITIQUE provides for giving feedback allows users to classify the advice provided by the system as correct, useful, missed, or wrong. These categories are self-explanatory except, perhaps, for useful. This refers to a critique that is not exactly correct but that draws the user's attention to a particular phrase or sentence, where a real problem is then noticed. We tend to count such critiques together with correct ones when evaluating the usefulness and accuracy of the system.</Paragraph>
    <Paragraph position="12"> The most undesirable critiques are those in the wrong category, as they tend to destroy user confidence in the system and are not well tolerated in educational environments. We have found, however, that professionals seem much more forgiving of wrong critiques, as long as the time required to disregard them is minimal. This is similar to using spelling checkers, which wrongly highlight many proper names, acronyms, etc., but are still considered quite useful. In order to analyze CRITIQUE's current accuracy in an educational environment, we recently processed a number of student essays provided by the computer-aided writing program at Colorado State University. We randomly selected 10 essays from each of four groups: freshman composition, business writing, ESL (English as a Second Language), and professional writing.</Paragraph>
    <Paragraph position="13"> The diagnoses made by CRITIQUE in these essays were reviewed and classified according to whether they were correct, useful, or wrong. We did not consider errors that were missed, but simply concentrated on the correctness of the critiques actually provided by the system. The reason for this orientation was our concern with the potentially damaging effect of wrong advice.</Paragraph>
    <Paragraph position="14"> We adjusted the analysis in both directions, in a manner that we believe is fair. On the one hand, we did not count correct critiques of a trivial or mechanical nature, such as misspelled words, superficial punctuation checks, or readability scores. On the other hand, we also did not include a particular class of incorrect comma critiques, the handling of which we need to improve. All other non-trivial critiques generated by the system were counted. The results are shown in Table 1.</Paragraph>
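The counting procedure described above can be sketched in Python as follows. This is an illustrative reconstruction, not the authors' actual tooling: the critique-class tags (including `comma_known_issue` for the excluded class of incorrect comma critiques) are hypothetical, and, following the description of the feedback categories, useful critiques are counted together with correct ones in the reported share. Missed errors are deliberately absent, since only critiques the system actually produced were reviewed.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable

# Verdicts a reviewer may assign to a critique the system produced.
# "missed" is not included: only produced critiques are reviewed.
VERDICTS = ("correct", "useful", "wrong")

# Hypothetical tags for trivial, mechanical critiques that are not counted.
TRIVIAL_KINDS = frozenset({"spelling", "superficial_punctuation", "readability"})


@dataclass(frozen=True)
class ReviewedCritique:
    group: str    # essay group, e.g. "ESL" or "business writing"
    kind: str     # critique class (hypothetical tag)
    verdict: str  # one of VERDICTS


def is_counted(c: ReviewedCritique) -> bool:
    """Apply both adjustments: skip trivial critiques, and skip the known
    class of incorrect comma critiques (hypothetical tag "comma_known_issue")."""
    if c.kind in TRIVIAL_KINDS:
        return False
    return not (c.kind == "comma_known_issue" and c.verdict == "wrong")


def tally(critiques: Iterable[ReviewedCritique]) -> Dict[str, Dict[str, float]]:
    """Per group: count of each verdict plus the share judged correct or useful."""
    counts: Dict[str, Counter] = {}
    for c in critiques:
        if c.verdict not in VERDICTS:
            raise ValueError(f"unknown verdict: {c.verdict}")
        if is_counted(c):
            counts.setdefault(c.group, Counter())[c.verdict] += 1

    report: Dict[str, Dict[str, float]] = {}
    for group, n in counts.items():
        total = sum(n[v] for v in VERDICTS)  # > 0: a group exists only once counted
        row: Dict[str, float] = {v: n[v] for v in VERDICTS}
        row["correct_or_useful_share"] = (n["correct"] + n["useful"]) / total
        report[group] = row
    return report
```

Both adjustments live in `is_counted`, so the effect of each exclusion on the per-group figures can be checked by changing a single predicate.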
    <Paragraph position="15"> Table 1. Non-trivial critiques. The analysis confirmed feedback we have received from users at IBM Research that CRITIQUE is most helpful on straightforward texts before they are significantly revised. The more polished and almost literary style of the professional essays challenged CRITIQUE's ability to provide generally useful advice. The ESL texts, written by native Arabic, Chinese, and Spanish speakers, were also difficult, containing a large percentage of very ill-formed sentences. This is indicated by the higher number of useful critiques for this group, although it could be argued that these critiques may not be as useful to users who lack native intuitions about English.</Paragraph>
    <Paragraph position="16"> For the ESL group, correcting spelling errors first resulted in significantly better grammar-checking performance. This was not true for the other groups. In general, CRITIQUE also appears to be more accurate on texts with a shorter average sentence length.</Paragraph>
    <Paragraph position="17"> Conclusions Based on real experience in the application areas of office environments, publications organizations and educational institutions, CRITIQUE has been developed to a level of apparent usefulness. Acceptable system performance has been achieved through the use of distributed and optimized processing. The system has achieved a high level of robustness and flexibility in most of its aspects, including presentation. The accuracy of the system is currently acceptable for many types of texts and environments, and accuracy continues to improve with exposure in each of the three application areas. CRITIQUE exemplifies a framework for the development of broad-coverage, large-scale natural language text processing systems.</Paragraph>
    <Paragraph position="18"> Acknowledgements We would like to thank Professor Charles Smith and his colleagues from Colorado State University, who provided the student essays used in the analysis of accuracy, as well as valuable feedback about CRITIQUE. We also express sincere thanks to George Heidorn for his comments and guidance in the preparation of this paper.</Paragraph>
  </Section>
</Paper>