<?xml version="1.0" standalone="yes"?> <Paper uid="H92-1009"> <Title>Human-Machine Problem Solving Using Spoken Language Systems (SLS): Factors Affecting Performance and User Satisfaction</Title> <Section position="2" start_page="0" end_page="49" type="intro"> <SectionTitle> 1. INTRODUCTION </SectionTitle> <Paragraph position="0"> Data collection is a critical component of the DARPA Spoken Language Systems (SLS) program. Data are crucial not only for system training, development and evaluation, but also for analyses that can provide insight to guide future research and development. By observing users interacting with an SLS under different conditions, we can assess which issues may best be addressed by human factors and which will require technological solutions. System developers can benefit from considering not only initial use of an SLS, but also the experience of a user over time.</Paragraph> <Paragraph position="1"> Systems based on current technology work best when speech and language closely resemble the training data used to develop the system. However, there is considerable variability in the degree to which the speech and language of new users match that of the training data. The current paper examines the importance of this initial match. It is possible that users whose speech does not conform to the system may be able to adapt their behavior over time (e.g., Stern and Rudnicky \[11\]). In order to evaluate technology in terms of the demands of the application, we need to understand the extent and the nature of such adaptation and the conditions that affect it. Although system performance can be measured in a number of ways, in this paper, we focus on (1) self-reports of user satisfaction, and (2) recognition performance. 
Further studies could include additional measures.</Paragraph> <Paragraph position="2"> SRI has been collecting data in the air travel planning domain using a number of different systems (see Bly et al. \[1\]; Kowtko and Price \[5\]). In moving from wizard-based data collection to the use of SRI's SLS, we observed changes in user behavior that were associated with system errors. Some of these behaviors were adaptive; for example, learning to avoid out-of-vocabulary words or unusual syntax should facilitate successful interaction. Other behaviors, however, were non-adaptive and could actually impede the interaction. For example, speaking more loudly or in a hyperarticulate style may be detrimental to system performance insofar as these styles differ from those observed in training material dominated by wizard-mediated data in which system errors are minimal.</Paragraph> <Paragraph position="4"> It is difficult to predict how well an SLS will need to perform in order to be acceptable to users. Both speed and accuracy are crucial to system acceptability; we have therefore collected data using versions of the system that prioritize one of these parameters at the expense of the other. The present study first addresses the issue of user satisfaction with different levels of system speed and accuracy and then focuses on an example of an adaptive behavior and another that is maladaptive. These behaviors represent a subset of potential factors influencing human-machine interaction.</Paragraph> <Paragraph position="5"> Because these issues are not restricted to any particular system, they should be of general interest to developers of SLS technology.</Paragraph> <Paragraph position="6"> In the first study, we compared three points in the speed-accuracy space for this application: (1) an extremely slow but very accurate wizard-mediated system (described in Bly et al.
\[1\]) with a 2-3 minute response time and a minimal error rate; (2) a software version of the DECIPHER recognizer with a response time of several times real time and a fairly low word error rate; and (3) a version of the DECIPHER recognizer implemented in special-purpose hardware using older word models, which has a very fast response time but currently has a higher word error rate.</Paragraph> <Paragraph position="7"> We compared user satisfaction based on responses to a post-session questionnaire.</Paragraph> <Paragraph position="8"> The second study investigated the effect of user experience on syntax and word choice. We hypothesized that one way users might adapt would be to conform to the language models constraining recognition. We therefore measured recognition performance in subjects' first and second scenarios, and compared sentence perplexities in order to determine whether any changes in recognition performance could be attributed to a change in perplexity.</Paragraph> <Paragraph position="9"> The third study examined the effect of hyperarticulate speech on recognition and tested whether instructions to users could reduce this potentially maladaptive behavior.</Paragraph> <Paragraph position="10"> We coded each utterance for hyperarticulation and compared recognizer performance for normal and hyperarticulate utterances. We also compared rates of hyperarticulation for subjects who were either given or not given the instructions.</Paragraph> </Section> </Paper>