<?xml version="1.0" standalone="yes"?> <Paper uid="W04-2301"> <Title>Usability and Acceptability Studies of Conversational Virtual Human Technology</Title> <Section position="2" start_page="0" end_page="0" type="metho"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> An &quot;accessible&quot; user interface is one that is easy to learn and easy to use; such an interface can yield measurable benefits such as decreased learning time and greater user satisfaction (i.e., acceptance) [28]. Easy-to-learn and easy-to-use interfaces have been described as having navigational and visual consistency, clear communication between the user and application, appropriate representations, few and non-catastrophic errors, task support and feedback, and user control [15,20,21,28].</Paragraph> <Paragraph position="1"> As part of our Technology Assisted Learning (TAL) initiative, we have been particularly interested in how accessible responsive virtual human technology (RVHT) applications are. Usability testing, commonly conducted for commercial software to ensure that it meets the needs of the end user, is likewise vital to creating effective training and assessment software employing innovative technologies. This paper presents findings from a series of studies investigating how users accept and evaluate RVHT applications.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 1.1 Background on RVHT and TAL </SectionTitle> <Paragraph position="0"> Since approximately 1996, we have worked on a series of PC-based applications in which the user interacts with responsive virtual characters. Applications have ranged from trauma patient assessment [13] to learning tank maintenance diagnostic skills [9] to gaining skills in avoiding non-response during field interviews [3]. In these applications, which we collectively categorize as involving RVHT, the PC simulates a person's behavior in response to user input. Users interact with the virtual characters via voice, mouse, menu, and/or keyboard.</Paragraph> <Paragraph position="1"> We are certainly not alone in developing training, assessment, marketing, and other RVHT applications (see, e.g., [2,4,7,16,17,19,22,24,25]), but the breadth across domains and combination of technologies is unusual.</Paragraph> <Paragraph position="2"> The RVHT applications are representative of those developed in our TAL division. We define TAL as &quot;proactively applying the benefits of technology to help people train more safely, learn better, retain skills longer, and achieve proficiency less expensively&quot;. We develop TAL applications for jobs requiring complicated knowledge and skills, complex or expensive equipment or work material, or a high cost of on-the-job training or failure on the job; for jobs where safety or spatial awareness is essential; and for large student throughput requirements [6,12].</Paragraph> <Paragraph position="3"> Practicing skills in a safe and supportive environment allows the student to learn flexible approaches. Flexibility is critical for interaction skills [8] and for performing well under time-constrained, information-poor, and other difficult conditions [4,14]. The consistency gained by repeating this practice in virtual environments leads directly to good decisions on the job [24].
By practicing skills in safe, computer-generated settings, students have the opportunity through repetition to develop practical experience and skills that would otherwise be difficult to acquire. Practice also leads to increased confidence prior to the first real on-the-job experience.</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 1.2 RVHT Architecture </SectionTitle> <Paragraph position="0"> We have developed a PC-based architecture, Avatalk, that enables users to engage in unscripted conversations with virtual humans and see and hear their realistic responses [10]. Among the components that underlie the architecture are a Language Processor and a Behavior Engine. The Language Processor accepts spoken input and maps it to an underlying semantic representation, and then functions in reverse, mapping semantic representations to gestural and speech output.</Paragraph> <Paragraph position="1"> Our applications variously use spoken natural language interaction [9], text-based interaction, and menu-based interaction. The Behavior Engine maps Language Processor output and other environmental stimuli to virtual human behaviors. These behaviors include decision-making and problem solving, performing actions in the virtual world, and spoken dialog. The Behavior Engine also controls the dynamic loading of contexts and knowledge for use by the Language Processor. The virtual characters are rendered via a Visualization Engine that performs gesture, movement, and speech actions, through morphing of the vertices of a 3D model and playing of key-framed animation files (largely based on motion-capture data). Physical interaction with the virtual character (e.g., using medical instruments) is realized via object-based and instrument-specific selection maps [29]. These interactions are controlled by both the Behavior Engine and the Visualization Engine.</Paragraph> <Paragraph position="2"> We keep track of domain knowledge via state variable settings and also by taking advantage of the planning structure inherent in our architecture [11]. Our virtual humans reason about social roles and conventions (what can be stated or asked at any point in the dialog) [23] and grammar definitions (how it gets stated or asked). The architecture was designed to allow application creators flexibility in assigning general and domain-specific knowledge. Hence, our virtual humans discuss relevant concerns or excuses based on specific setup variables indicating knowledge level and initial emotional state. Our personality models and emotion reasoning are based on well-accepted theories that guide realistic emotional behavior [1,4,23,24,26]. After user input, we update emotional state based on lexical, syntactic, and semantic analyses [11].</Paragraph>
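<Paragraph position="3"> To make this control flow concrete, the following sketch walks one conversational turn through the three components just described. Avatalk's actual interfaces are not published in this paper, so every class, method, and variable name below is hypothetical; this is a minimal illustration of the described pipeline, not the implementation.

# Hypothetical sketch of one Avatalk-style conversational turn.
# All names are illustrative; the architecture is described only
# at the level of the prose above.
from dataclasses import dataclass, field

@dataclass
class CharacterState:
    emotion: str = "neutral"        # initial emotional state (a setup variable)
    knowledge_level: str = "high"   # domain-knowledge setup variable
    context: dict = field(default_factory=dict)  # dynamically loaded knowledge

class LanguageProcessor:
    def parse(self, utterance):
        # Forward pass: map spoken/typed input to a semantic representation (stub).
        return {"speech_act": "ask", "topic": "time_required"}
    def generate(self, semantics):
        # Reverse pass: map a semantic representation to speech and gesture output (stub).
        return "It should take about ten minutes.", "nod"

class BehaviorEngine:
    def respond(self, state, semantics):
        # Update emotional state from lexical/syntactic/semantic cues, then
        # choose a response consistent with social roles and conventions.
        if semantics["topic"] == "time_required":
            state.emotion = "impatient"
        return {"speech_act": "answer", "topic": semantics["topic"],
                "tone": state.emotion}

class VisualizationEngine:
    def perform(self, text, gesture):
        # Stand-in for vertex morphing and key-framed animation playback.
        print(f"[{gesture}] spoken: {text}")

def conversation_turn(lp, be, viz, state, user_input):
    semantics = lp.parse(user_input)     # Language Processor, forward
    plan = be.respond(state, semantics)  # Behavior Engine decides behavior
    text, gesture = lp.generate(plan)    # Language Processor, reverse
    viz.perform(text, gesture)           # Visualization Engine renders

conversation_turn(LanguageProcessor(), BehaviorEngine(), VisualizationEngine(),
                  CharacterState(), "How long will this take?")

In the actual architecture these components run continuously and share dynamically loaded contexts; the sketch compresses that into a single synchronous call chain.</Paragraph>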
</Section> <Section position="3" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 1.3 Overview of Paper </SectionTitle> <Paragraph position="0"> We present findings from studies of four different applications. The applications are, in order of presentation, a virtual pediatric standardized patient, a trainer for practicing informed consent procedures, a telephone survey interviewer trainer, and two implementations of a tradeshow booth marketing product. For each we briefly describe the application, the participants, and the results, concentrating on results that bear on accessibility, engagement, and usability. We tie the results together in a lessons-learned section.</Paragraph> </Section> </Section> <Section position="3" start_page="0" end_page="0" type="metho"> <SectionTitle> 2 Virtual Pediatric Patient </SectionTitle> <Paragraph position="0"> Training and assessment in pediatrics is complicated by the fact that children rarely behave in a consistent manner. Consequently, curricula are difficult to develop, performance assessment is restricted, and practice opportunities are limited. Our goals using RVHT have been to develop specific interactive training sessions using virtual pediatric characters and to explore related educational issues [10].</Paragraph> <Paragraph position="1"> One educational issue in pediatric medicine is instruction. Medical students rotating through pediatrics have limited exposure to children and are given limited one-on-one faculty observation time; hence the curricular material is mostly passive, while on-the-job learning involves variable experiences with behaviors or problems and dispersed learners. Another educational need in pediatric medicine is associated with assessment: there is no reliable or valid authentic assessment for young children (current assessments are text-based or use multimedia video), as is possible with standardized patients for adults, and interaction skills with children may not be valued by the student.</Paragraph> <Paragraph position="2"> Our use of virtual pediatric patients follows models of experiential learning, in which abstract conceptualization leads to active engagement and experimentation, which leads to concrete experience, which leads to reflective observation, which leads back to the beginning of the cycle [15,21]. By adding virtual characters, we are adding experiential learning to the traditional classroom, discussions, and rounds.</Paragraph> <Paragraph position="3"> Our work supports training and assessment not only of verbal interaction skills, but also of medical diagnostic skills, dealing with the spectrum of behavioral responses, and other types of high-level problem solving.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 2.1 Methods </SectionTitle> <Paragraph position="0"> Three specific interactive pediatric scenarios have been developed to date in our virtual pediatric standardized patient (VPSP) application. In one scenario, the clinician is tasked with obtaining an ear exam in a very young girl. The girl may be helpful if she is healthy but whiny if she has an ear infection. In another scenario, the clinician is asked to examine the lungs of a pre-teen boy. In the last scenario, the clinician must obtain a high-risk behavior history from a teenage girl.</Paragraph> <Paragraph position="1"> Educational issues that we are addressing include defining and identifying pediatric interactive strategies, program validity, scoring performance, and providing feedback. Our goals are to provide information toward a &quot;gold-standard&quot; setting, to acquire language data that improves the robustness of the interaction, and to address face, content, and construct validity.
We hypothesize that expert and novice users will provide valuable development information about language and strategies in these scenarios, and that differences will emerge based on expertise with children and experience with technology.</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 2.2 Results </SectionTitle> <Paragraph position="0"> Interactive pediatric scenarios were created and shown to content and educational experts. The first rounds of feedback from experts, on the girl and boy scenarios, came at exhibit sessions at the Association of American Medical Colleges annual meeting in November 2002 and the Medicine Meets Virtual Reality conference in January 2003. From comments at these sessions we revised the scenarios and added the adolescent scenario. The latest round of feedback, and the results described here, came from the Council on Medical Student Education in Pediatrics (COMSEP) annual meeting in April 2003.</Paragraph> <Paragraph position="1"> Fourteen attendees at the COMSEP meeting were recruited to use the VPSP application. The attendees were first given a questionnaire asking about their experience with completing ear exams, lung exams, and adolescent social histories, and also about their computing experience. They were then given brief instructions on how to use the application, told to choose whichever of the scenarios they wanted, and handed headphones to avoid distraction. Finally, they were given a questionnaire asking their perceptions of the realism of the application in comparison to clinical experience.</Paragraph> <Paragraph position="2"> In a way, this was the toughest group of all we have tested. These participants were true experts, unaware of the technology (until a debriefing at the end of each session), and presented with an application prototype.</Paragraph> <Paragraph position="3"> Given this context, the data are acceptable. On average, these participants rated the realism of the response time and the realism of the objections, concerns, and questions posed by the virtual characters as &quot;somewhat&quot; realistic. They rated the realism of the overall conversation as a little better than &quot;not very&quot; realistic. However, somewhat surprisingly, when asked to compare the simulated clinical experience with real clinical experience, the participants rated the comparison as somewhat challenging; that is, the comparison is reasonable. Four of the participants even found the simulated experience &quot;moderately&quot; or &quot;extremely&quot; challenging. Analysis of the participants' log files shows they spent an average of almost 4 1/2 minutes in the scenarios, taking eight conversational turns and collectively covering 32 topics (of a possible 130 topics across all scenarios, and with no prompting). The participants were observed to take the cases seriously, asking strategic questions to get the virtual character to cooperate, and becoming frustrated when their questions were misinterpreted. (We are pleased by frustration, as it implies engagement, though anxious, too, to make the application work better.)
We take these data as encouraging, but we fully understand the need to revise the language and behavior of the virtual characters in depth to satisfy acceptance criteria.</Paragraph> </Section> </Section> <Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> 3 Practice on Informed Consent </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.1 Methods </SectionTitle> <Paragraph position="0"> Under grant funds to enhance our IRB program, we created a virtual reality simulation for teaching proper informed consent procedures. In the application, a potential survey respondent poses questions regarding the survey, the sponsor, confidentiality, privacy, compensation, and contact information [27].</Paragraph> <Paragraph position="1"> In November 2003, we presented the trainer to a group of five experienced telephone or field interviewers who were being trained for a study intended to better understand the health effects of exposure to smoke, dust, and debris from the collapse of the World Trade Center. We observed the participants and also had them, after completing the interactions, fill out a short questionnaire on their familiarity with computers and their impressions of the application.</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.2 Results </SectionTitle> <Paragraph position="0"> The application required participants to touch on all aspects of informed consent before finishing. The only way an interaction could be cut short was if the participant replied incorrectly to a question (e.g., giving the wrong sponsor name, or indicating that participation was mandatory rather than voluntary). Participants interacted with the virtual character no fewer than three and up to six times.</Paragraph> <Paragraph position="1"> The results were generally positive, particularly in the subjects' assessment of usability and enjoyment.</Paragraph> <Paragraph position="2"> The realism of the character was consistently rated by the participants as moderately realistic (an average of 5.2 on a 7-point scale), a decent rating given the virtual character's relatively few body movements and facial gestures. Ease of use (5.8), enjoyment (5.6), and effectiveness (5.4) were all rated moderately to very easy, enjoyable, or effective. An observer also rated the level of engagement of the interaction. Engagement, verbalization, and information seeking were all moderately or highly demonstrated. Participants were judged either relaxed or amused by the interaction; they responded in a moderate amount of time, and they appeared to comprehend what was being asked. As would be expected, they were also judged to find the interaction not at all provocative and to need very little negotiation.</Paragraph>
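<Paragraph position="3"> The completion rule just described can be summarized in a short sketch: the interaction ends early only on an incorrect reply, and succeeds only once every informed consent topic has been covered. This is our illustration, not the application's actual code; the topic list follows the description in Section 3.1, with a voluntary-participation item assumed from the incorrect-reply example above.

# Hypothetical sketch of the session rule described above. Topic names
# follow Section 3.1; "voluntary_participation" is assumed from the
# incorrect-reply example (mandatory vs. voluntary).
REQUIRED_TOPICS = {"survey", "sponsor", "confidentiality", "privacy",
                   "compensation", "contact_information",
                   "voluntary_participation"}

def run_consent_session(turns):
    """turns: iterable of (topic, answered_correctly) pairs, in order."""
    covered = set()
    for topic, answered_correctly in turns:
        if not answered_correctly:
            return "cut short: incorrect reply"  # e.g., wrong sponsor name
        covered.add(topic)
    if covered >= REQUIRED_TOPICS:
        return "finished: all consent topics covered"
    return "in progress: character keeps posing questions"

# Example: a wrong answer about the sponsor ends the session immediately.
print(run_consent_session([("survey", True), ("sponsor", False)]))
</Paragraph>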
</Section> </Section> <Section position="5" start_page="0" end_page="0" type="metho"> <SectionTitle> 4 Telephone Survey Interviewer </SectionTitle> <Paragraph position="0"> One of the most difficult skills for a telephone interviewer to learn is gaining cooperation from sample members and avoiding refusals. In telephone interviewing in particular, the first half-minute on the telephone with a sample member is crucial. Sample members almost automatically turn to common phrases to avoid taking part in surveys: &quot;How long will this take?&quot; &quot;How was I selected?&quot; &quot;I don't do surveys.&quot; &quot;I don't have time.&quot; &quot;I'm just not interested.&quot; &quot;What is the survey about?&quot; Non-response research suggests that the best approach to obtaining participation is for the interviewer to immediately reply with an appropriate, informative, tailored response [2,9].</Paragraph> <Paragraph position="1"> We tested an RVHT application designed to simulate the first 30-60 seconds of a telephone interview [21]. Interviewers begin with an introduction and then need to respond to a series of objections or questions raised by the virtual respondent. Ultimately, the virtual character ends the conversation by either granting the interview or hanging up the telephone. The emotional state of the virtual respondent varies from scenario to scenario. A total of six basic objections were recorded in four different tones of voice for both a male and a female virtual respondent.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.1 Methods </SectionTitle> <Paragraph position="0"> The assessment provided here of the interviewer training module is based on researcher/instructor observations and user debriefings in the form of a questionnaire. Empirical data were collected on users' observed ability to interact with the application as well as their perception of the interaction. The training application was tested with a group of 48 novice telephone interviewers during Spring 2002.</Paragraph> <Paragraph position="1"> To evaluate the accessibility of the application, we focused on the users' understanding of the application's basic features, their ability to complete each task, and the capabilities shown by different users (e.g., based on ethnicity, job level, and education level). To evaluate acceptance of the application by the trainees, we debriefed participants using a structured questionnaire and moderator-facilitated focus groups to gauge reactions and engagement in the application. We were interested in the realism of the virtual humans, the speed and accuracy of the speech recognition, and the trainees' detection of changes in the emotive states of the virtual human.</Paragraph> <Paragraph position="2"> Finally, each training session was observed by either the researchers or training instructors, who made notes of their observations. These observations are included as part of the analysis.</Paragraph> </Section> </Section> <Section position="6" start_page="0" end_page="0" type="metho"> <SectionTitle> 4.2 Results </SectionTitle> <Paragraph position="0"> Ease of Use: Users of the RVHT application found it very accessible, with 84 percent indicating the software was either extremely easy or very easy to use (52% extremely, 31% very, 13% somewhat, 4% not too, 0% not at all). Only eight (17%) of the 48 trainees indicated that they required additional assistance to use the training software (after the initial training received by all trainees).</Paragraph> <Paragraph position="1"> Realism of the Training Environment: The promise of RVHT-based training tools is that they can simulate a real environment, thereby allowing trainees repetitive practice in conditions that are as close as possible to what they will encounter on the job. For this particular application, the virtual respondent needed to mirror the behaviors and emotions of real respondents encountered when doing live interviewing.
This means delivering an array of objections to the trainees, in different tones of speech and at different emotional levels, in a fast-paced manner. Interviewers were asked a series of questions to try to assess how well they accepted the virtual environment as a substitute for real work conditions.</Paragraph> <Paragraph position="2"> The answer is somewhat mixed. In general, trainees did not find the virtual environment to be realistic, and they cited two primary reasons: the slowness of the response of the virtual respondent and the limited number of different objections/questions offered by the virtual respondent. They did, however, find the responses that were offered to be realistic, and they stated that they could detect and respond to changes in tone and emotional cues offered by the virtual respondents. A majority of the trainees also indicated that they felt the sessions helped them improve the skills needed at the outset of an interview either &quot;somewhat&quot; or &quot;a lot&quot;. When asked how realistic they found the overall conversation with the virtual respondent, 17 percent of participants said they thought it was &quot;extremely&quot; or &quot;very&quot; realistic, and 44 percent said it was &quot;somewhat&quot; realistic. Slowness of the virtual respondents in replying (due to the lag caused by the speech recognizer as it interpreted the interviewer's responses and determined the next script to launch) was the primary problem cited by interviewers. Over three-quarters (77%) of the users felt the response time was too slow (4% felt it was too fast and 19% indicated the speed was just right).</Paragraph> <Paragraph position="3"> The trainees were, however, more positive when evaluating the realism of the objections and questions offered by the virtual respondent. A plurality (48%) indicated that the content of what was said was either &quot;extremely&quot; or &quot;very&quot; realistic, with 40 percent saying it was &quot;somewhat&quot; realistic. They also felt it was relatively easy to determine the emotional state of the virtual respondent based on the tone of voice they heard (23% &quot;extremely&quot; easy, 44% &quot;very&quot; easy, 29% &quot;somewhat&quot; easy, and 4% &quot;not too&quot; easy). Likewise, the content of the speech used by the virtual character was also a good cue to trainees as to the virtual human's emotional state (8% &quot;extremely&quot; easy to tell, 54% &quot;very&quot; easy, 27% &quot;somewhat&quot; easy, 10% &quot;not too&quot; easy). Being able to recognize changes in the emotional state of the virtual respondent changed how the interviewers approached the situation: nearly 60 percent indicated that they behaved differently in the practice scenario based on the tone of the virtual respondent's voice. Thus, the content of the objections raised by the virtual respondent and the emotional behavior of the virtual human were generally accepted by the trainees and caused them to react differently within the various training scenarios. It appears, however, that while the interviewers do recognize and react to emotional cues, they do not necessarily process these as being very distinct; they focus more on the actual content of the argument (regardless of tone of voice or gender) when considering how diverse the scenarios are.</Paragraph>
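<Paragraph position="4"> For concreteness, the recorded scenario pool described at the start of this section (six basic objections, four tones of voice, two respondent voices) can be enumerated as below. The objection texts are those quoted earlier; the tone labels and the selection function are our own illustrative assumptions, not the application's actual design.

# Hypothetical enumeration of the recorded scenario pool:
# 6 objections x 4 tones x 2 voices = 48 recorded variants.
# Tone labels are assumed; the paper does not name them.
import itertools
import random

OBJECTIONS = ["How long will this take?", "How was I selected?",
              "I don't do surveys.", "I don't have time.",
              "I'm just not interested.", "What is the survey about?"]
TONES = ["neutral", "impatient", "irritated", "suspicious"]  # assumed labels
VOICES = ["male", "female"]

POOL = list(itertools.product(OBJECTIONS, TONES, VOICES))
assert len(POOL) == 48

def next_objection(rng):
    # Vary the emotional state from scenario to scenario by sampling
    # a recorded variant from the pool.
    return rng.choice(POOL)

print(next_objection(random.Random(0)))
</Paragraph>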
<Paragraph position="5"> Enjoyment and Reuse: An effective training tool is also one that trainees enjoy using, would use again, and would recommend to others. Approximately two-thirds (65%) of the users said that they found using the RVHT software to be fun and enjoyable. Nearly three-quarters (73%) said they would like to use the software again. In addition, 83 percent said they would recommend the program as a training tool for other interviewers. In open-ended responses, a number of interviewers indicated that it would be a very good practice vehicle for new or less experienced interviewers.</Paragraph> </Section> <Section position="7" start_page="0" end_page="0" type="metho"> <SectionTitle> 5 ExhibitAR Applications </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 5.1 Methods </SectionTitle> <Paragraph position="0"> Using earlier versions of the same underlying technology, we created a product called ExhibitAR that was positioned as a virtual tradeshow attendant. It was put into operation as a kiosk, drawing attention to the booth, augmenting the sales and marketing staff, and providing engaging dialog with visitors regarding the company and company products. We report on user data collected at three venues: the Exhibitor Show held in February 1999, the Space Congress held in April 1999, and the American Society for Training and Development (ASTD) International Conference &amp; Exposition held in May 1999.</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 5.2 Results </SectionTitle> <Paragraph position="0"> The ExhibitAR product did attract visitors to the booths and answered visitors' questions, a definite advantage on the competitive tradeshow floor. At the Space Congress show, in front of a reasonably technical audience over four days, the application attracted 335 visitors, who conversed with the virtual characters for an average of 61.4 seconds and five conversational turns. At ASTD, with less technical attendees, over three days, 197 visitors spoke with the virtual characters for an average of 28.4 seconds and 2.6 conversational turns.</Paragraph> <Paragraph position="1"> We analyzed not only the number of visitors and their conversations, but also the content of the conversations. For ASTD, every one of the 63 topics of conversation was covered at least once; the average per topic was almost nine occurrences (i.e., nine different visitors asked about the topic). For Space Congress, again every topic was covered, and the average for the 39 topics was 35 occurrences per topic.</Paragraph> <Paragraph position="2"> The most common topics for both applications were a greeting, asking about the company and its associates, asking what to do or say, asking how the technology worked, and asking the current date or time, but topics specific to each application were also discussed.</Paragraph> <Paragraph position="3"> The Exhibitor data are less telling, but this was the show at which ExhibitAR was introduced, and it was the only venue where the visitor was not prompted at all. The application attracted 45 visitors over 2 1/2 days, each visitor averaging 2.5 turns and 21.4 seconds. Though each of the 25 topics was covered at least once, the only topic that was covered considerably more often than any other was a request for assistance.</Paragraph> <Paragraph position="4"> (This led us to devise a prompting feature.) Visitor data from RVHT marketing applications are not conclusive regarding usability or acceptability, but they are suggestive.
Even at the time these data were collected (five years ago), less technical users were sufficiently engaged to converse with the virtual characters for just under half a minute, and more technical users for just over a minute. Given prompting, the users covered the range of topics designed into the applications. It is important to note that these users had never before seen the applications, had no training or practice time, and had to learn to use the applications at that moment, yet they stuck with the conversation for a significant period of time.</Paragraph> <Paragraph position="5"> The data are only anecdotal, but RVHT continues to attract visitors to exhibit booths. The various applications described in earlier sections, and others, have been shown since 1999 at least a dozen times to audiences varying from educators to medical practitioners to public health workers to military service personnel. Visitors are increasingly less surprised (skeptical?) to encounter virtual characters, and more impressed with the state of the art. They appear willing to accept virtual characters as sensible for training, assessment, and marketing uses.</Paragraph> </Section> </Section> </Paper>