<?xml version="1.0" standalone="yes"?> <Paper uid="H89-1032"> <Title>PLANS FOR A TASK-ORIENTED EVALUATION OF NATURAL LANGUAGE UNDERSTANDING SYSTEMS</Title> <Section position="3" start_page="0" end_page="197" type="intro"> <SectionTitle> INTRODUCTION </SectionTitle> <Paragraph position="0"> This project undertakes to provide meaningful measures of progress in the field of natural language processing (NLP). In particular, it is intended to result in the definition of a theory- and implementation-independent test of the text analysis capabilities of text understanding systems that analyze short (paragraph-length) texts taken from military messages. The test is task-oriented in order to facilitate assessment of the general state of the art and provide a meaningful basis for comparing notes across systems. This design would seem to have two major problems, however: the reduction of a system's capabilities to a simple quantification of right versus wrong answers, and the lack of desired focus on understanding capabilities versus application capabilities.</Paragraph> <Paragraph position="1"> It is claimed, however, that if the task performance is recorded on development data as well as test data and is repeated on the test data after updates are made, additional insights can be gained into a system's robustness, breadth and depth of coverage, and potential for handling novel text. A measurement of utility can be gained as well, by comparing performance on the original task with performance using a version of the inputs in which punctuation and spelling errors, highly elliptical constructions and sublanguage constructions have been eliminated. 
These additional measurements open up the black box to some extent, providing information that far exceeds what would be obtainable from a single measurement of performance on the test data in a blind test.</Paragraph> <Paragraph position="2"> Also, despite the fact that the NLP systems are treated as black boxes, the evaluation should provide significant insights into their understanding versus application capabilities, because successful performance of the task does not require that back end modules contribute substantive information to the template fills. For example, the template fills do not require that any computations be performed on the data. This aspect of the test design is another way in which the black box has been opened up or narrowed down to increase the meaningfulness of the results. It is important, however, to recognize that the test is applicable only to &quot;complete&quot; and non-interactive systems, ones that are capable of accepting unseen texts and working essentially without human intervention to understand them.</Paragraph> <Paragraph position="3"> This project is funded by DARPA/ISTO under ARPA Order No. 6359, Program Code 8E20, Program Element Code 62301E.</Paragraph> </Section></Paper>