<?xml version="1.0" standalone="yes"?> <Paper uid="E89-1016"> <Title>User studies and the design of Natural Language Systems</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 1.1 Approaches to the evaluation of NL systems </SectionTitle> <Paragraph position="0"> It is clear that a number of different criteria might be employed in the evaluation of Natural Language (NL) systems. It is also clear that there is no consensus on how evaluation should be carried out [RQR*88, GM84]. Among the different criteria that have been suggested are (a) Coverage; (b) Learnability; (c) General software requirements; (d) Comparison with other interface media. Coverage is concerned with the set of inputs which the system should be capable of handling, and one issue we will discuss is how this set should be identified. Learnability is premised on the fact that complete coverage is not foreseeable in the near future. As a consequence, any NL system will have limitations, and one problem for users will be to learn to communicate within such limitations. Learnability is measured by the ease with which new users are able to identify these coverage limitations and exploit what coverage is available to carry out their task. The general software criteria of importance are speed, size, modifiability, and installation and maintenance costs. Comparison studies have mainly required users to perform the same task using either a formal query language such as SQL or a restricted natural language, and have evaluated one against the other on such parameters as time to solution or number of queries per task [SW83, JTS*85].
Our discussion will mainly address the problem of coverage: we shall not discuss these other issues further.</Paragraph> <Paragraph position="1"> Our concern here will be with interactive NL interfaces and not other applications of NL technology such as MT or messaging systems. Interactive interfaces are not designed to be used in isolation; rather, they are intended to be connected to some sort of back-end system, to improve access to that system.</Paragraph> <Paragraph position="2"> Our view is that NL systems should be evaluated with this in mind: the aim will be to identify the NL inputs which a typical user would want to enter in order to utilise that back-end system to carry out a representative task. By representative task we mean the class of task that the back-end system was designed to carry out. In the case of databases, this would be accessing or updating information. For expert systems it might involve identifying or diagnosing faults.</Paragraph> </Section> </Section> </Paper>