<?xml version="1.0" standalone="yes"?>
<Paper uid="J84-1002">
  <Title>A Formal Basis for Performance Evaluation of Natural Language Understanding Systems</Title>
  <Section position="2" start_page="0" end_page="0" type="abstr">
    <SectionTitle>
1. Introduction
</SectionTitle>
    <Paragraph position="0"> Research on natural language processing has recently been characterized by the design and implementation of a number of experimental systems. Recent survey reports (Waltz 1977, Kaplan 1982) list more than one hundred of the most successful and relevant systems in the classical application fields of data base inquiry, machine translation, question answering, and man-machine interfacing.</Paragraph>
    <Paragraph position="1"> This trend is not surprising in research whose specific aim is to provide automated tools for understanding or translating natural languages, but it is also evident in natural language research with a more theoretical flavour.</Paragraph>
    <Paragraph position="2"> The successful construction of a well-performing system is in fact often considered the most evident proof of the validity of a theory; designing running systems is therefore routine practice, and sometimes even the specific goal, of several researchers.</Paragraph>
    <Paragraph position="3"> The task of evaluating the performance of a given system and that of comparing the behaviour of different systems therefore appear to be fundamental issues. Despite its widely recognized relevance (Woods 1977, Tennant 1980), measuring the performance of a system for natural language processing is still poorly defined. It mostly relies on intuitive reasoning and lacks a sound theoretical foundation. As Tennant (1980) clearly points out, there is a nearly complete absence of meaningful evaluation in current natural language processing research. This leaves several crucial questions unanswered. The lack of evaluation constitutes a serious obstacle to the development of a sound technology in natural language processing. Copyright 1984 by the Association for Computational Linguistics. Permission to copy without fee all or part of this material is granted provided that the copies are not made for direct commercial advantage and the CL reference and this copyright notice are included on the first page. To copy otherwise, or to republish, requires a fee and/or specific permission. 0362-613X/84/030015-16$03.00. Computational Linguistics, Volume 10, Number 1, January-March 1984. Giovanni Guida and Giancarlo Mauri, A Formal Basis for Performance Evaluation of NLUS.</Paragraph>
    <Paragraph position="4"> The purpose of this paper is to provide a formal and quantitative model for the performance evaluation task. In particular, we give a formal definition of &quot;understanding power&quot;, and we propose some techniques for measuring this feature in practice. Our proposal is based on several assumptions, which we discuss below. First, we take as the object of our attention only that module of a natural language system that is devoted to understanding natural language, that is, to mapping input expressions into formal internal representations. This can clearly include several kinds of processing activities, such as linguistic analysis, reasoning, and inferencing, but its ultimate goal must be the construction of a correct internal representation, not the provision of any type of service to the end user of the natural language system. Thus, for example, a question answering system (Tennant 1979) does not belong to the class of natural language understanding systems that concern us; instead, it is the natural language interface it contains that exactly meets our requirements. Second, we assume the following naive notion of performance: the extent to which a system is able to correctly understand natural language expressions in a given application domain. The resources needed by the system to accomplish its task are irrelevant in this case. In other words, we want to capture and measure the &quot;power&quot; of the system, in terms of how much and how well it is capable of understanding, not its &quot;efficiency&quot;, that is, how much it costs (for example, in terms of time and memory requirements) to understand what it is capable of understanding.</Paragraph>
    <Paragraph position="5"> Third, we want to define a measure of performance that allows the evaluation of the input-output characteristics of a particular system in a given domain. This kind of measure is clearly inappropriate for revealing and testing features such as the power of a model as opposed to that of a particular implementation of it, the applicability of the model to other domains, or its extensibility, which are more closely related to the internal structure and mode of operation of a system than to its input-output behaviour. The goal of evaluating such more general properties, pursued by Tennant (1980) through the method of abstract analysis (mainly based on taxonomies of conceptual, linguistic, and implementational issues), is not considered in this work.</Paragraph>
    <Paragraph position="6"> This paper is organized as follows. In section 2 we discuss, in an intuitive yet precise way, the basic concepts involved in the performance evaluation problem, in order to obtain a sufficiently clear specification of what we want to formalize. Then, in section 3, we give an abstract definition of the formal model, and in section 4 we discuss some actual cases of particular interest. Section 5 presents some techniques that could be used to measure the performance of a natural language understanding system in practice. In section 6 we offer some concluding remarks and present open problems and promising topics for future research. A limited case-study experiment with the proposed model is presented in the appendix.</Paragraph>
  </Section>
</Paper>