<?xml version="1.0" standalone="yes"?> <Paper uid="J84-1002"> <Title>A Formal Basis for Performance Evaluation of Natural Language Understanding Systems</Title> <Section position="11" start_page="0" end_page="0" type="concl"> <SectionTitle> 26 Computational Linguistics, Volume 10, Number 1, January-March 1984 </SectionTitle> <Paragraph position="0"> Giovanni Guida and Giancarlo Mauri A Formal Basis for Performance Evaluation of NLUS be part of a future paper.</Paragraph> <Paragraph position="1"> Regarding the main directions of current research, we mention: * experimentation with the proposed model in the evaluation of large systems; * development of appropriate sampling techniques for the experimental evaluation of π; * experimentation with several different choices of μ and p; * design of techniques for special-purpose evaluation (choice of the goal, definition of μ and p, sampling, etc.); * analysis of whether the notion of μ-p-profile is adequate for representing all relevant details of the performance of a system.</Paragraph> <Paragraph position="2"> Beyond these issues, we also point out two more ambitious and promising problems, which will be addressed in future work. The approach to performance evaluation presented in this paper has two major limitations: first, it is concerned only with input-output behaviour and does not take into account the internal model on which a system is based; second, it does not deal with the efficiency of the natural language understanding process. Regarding the first limitation, except where commercial applications are considered, one is primarily interested in models rather than in particular implementations. 
It is far more significant that a model, a knowledge representation method, and a parsing algorithm have been designed for building natural language understanding systems than that a specific system has been constructed in a particular domain for a particular use. Tennant (1980) (see also Woods 1977) proposes a method, called abstract analysis, that organizes the evaluation of the internal behaviour of a natural language system in an informal but disciplined way, through taxonomies of conceptual, linguistic, and implementational issues (including analysis of failure causes, domain-dependent features, knowledge base completeness and closure, algorithm deficiencies, extensibility, etc.). A demanding research issue that could contribute substantially to research on natural language processing is the definition of more formal methods that, building on this proposal, allow a &quot;deep&quot; evaluation and comparison of systems on the basis of their internal structure and mode of operation, as opposed to the &quot;surface&quot; measure of their input-output behaviour considered in the present paper.</Paragraph> <Paragraph position="3"> Concerning the second limitation, efficiency, two aspects seem worth considering: the experimental measurement of the efficiency of a specific system in understanding natural language, which could appropriately complement the concept of performance defined in the present work; and the theoretical evaluation of the complexity of the general model underlying the construction of a particular system, which could also complement the notion of &quot;deep&quot; evaluation mentioned above.</Paragraph> </Section></Paper>