File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/evalu/04/p04-3033_evalu.xml
Size: 5,100 bytes
Last Modified: 2025-10-06 13:59:14
<?xml version="1.0" standalone="yes"?>
<Paper uid="P04-3033">
<Title>MATCHKiosk: A Multimodal Interactive City Guide</Title>
<Section position="5" start_page="0" end_page="0" type="evalu">
<SectionTitle> 4 Discussion and Related Work </SectionTitle>
<Paragraph position="0"> A number of design issues arose in the development of the kiosk, many of which highlight differences between multimodal interfaces for kiosks and those for mobile systems.</Paragraph>
<Paragraph position="1"> Array Microphone While a close-talking headset or on-device microphone can be used on a mobile device, we found that a single microphone gave very poor performance on the kiosk. Users stand in different positions with respect to the display, and there may be more than one person standing in front of it. To overcome this problem, we mounted an array microphone above the touchscreen that tracks the location of the talker.</Paragraph>
<Paragraph position="2"> Robust Recognition and Understanding is particularly important for kiosks since they have so many first-time users. We utilize the techniques for robust language modelling and multimodal understanding described in Bangalore and Johnston (2004).</Paragraph>
<Paragraph position="3"> Social Interaction For mobile multimodal interfaces, even those with graphical embodiment, we found there to be little or no need to support social greetings and small talk. However, for a public kiosk, which different unknown users will approach, those capabilities are important. We added basic support for social interaction to the language understanding and dialog components. The system is able to respond to inputs such as Hello, How are you?, Would you like to join us for lunch?, and so on.</Paragraph>
<Paragraph position="4"> Context-sensitive GUI Compared to mobile systems on palmtops, phones, and tablets, kiosks can offer more screen real estate for graphical interaction. This allowed for large, easy-to-read buttons for accessing help and other functions, which the system alters as the dialog progresses. These buttons enable the system to support a kind of mixed initiative in multimodal interaction: the user can take the initiative in the spoken and handwritten modes, while the system provides a more system-oriented initiative in the graphical mode.</Paragraph>
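To make this context-dependent button mechanism concrete, here is a minimal Python sketch of how a button bar might be derived from the current dialog state. The state names, button labels, and the buttons_for function are hypothetical illustrations under assumed states, not the actual MATCHKiosk implementation.

    # Hypothetical sketch: context-sensitive GUI buttons driven by dialog state.
    # State names and button labels are invented; this is not MATCHKiosk code.

    DEFAULT_BUTTONS = ["Help", "Start Over"]

    # System-initiative suggestions offered in the graphical mode for each state;
    # speech and pen input remain open for user initiative throughout.
    BUTTONS_BY_STATE = {
        "idle": ["Find a restaurant", "Get subway directions"],
        "restaurant_chosen": ["Phone number", "Review", "Directions"],
        "route_shown": ["Print directions", "New search"],
    }

    def buttons_for(state):
        """Return the button labels to display for the current dialog state."""
        return DEFAULT_BUTTONS + BUTTONS_BY_STATE.get(state, [])

    if __name__ == "__main__":
        for state in ("idle", "route_shown", "unknown_state"):
            print(state, "->", buttons_for(state))

On this view, the graphical mode carries the system's initiative (suggested next moves for the current state), while the open-ended spoken and pen modes leave the initiative with the user.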
<Paragraph position="5"> Printing Kiosks can make use of printed output as a modality. One issue that arises is that printed outputs such as directions frequently require a very different style and format from onscreen presentations.</Paragraph>
<Paragraph position="6"> In previous work, a number of different multimodal kiosk systems supporting different sets of input and output modalities have been developed.</Paragraph>
<Paragraph position="7"> The Touch-N-Speak kiosk (Raisamo, 1998) combines spoken language input with a touchscreen.</Paragraph>
<Paragraph position="8"> The August system (Gustafson et al., 1999) is a multimodal dialog system mounted in a public kiosk.</Paragraph>
<Paragraph position="9"> It supported spoken input from users and multimodal output with a talking head, text-to-speech, and two graphical displays. The system was deployed in a cultural center in Stockholm, enabling collection of realistic data from the general public. SmartKom-Public (Wahlster, 2003) is an interactive public information kiosk that supports multimodal input through speech, hand gestures, and facial expressions. The system uses a number of cameras and a video projector for the display. The MASK kiosk (Lamel et al., 2002), developed by LIMSI and the French national railway (SNCF), provides rail tickets and information using a speech and touch interface. The mVPQ kiosk system (Narayanan et al., 2000) provides access to corporate directory information and call completion. Users can provide input either by speech or by touching options presented on a graphical display. MACK, the Media Lab Autonomous Conversational Kiosk (Cassell et al., 2002), provides information about groups and individuals at the MIT Media Lab. Users interact using speech and gestures on a paper map that sits between the user and an embodied agent.</Paragraph>
<Paragraph position="10"> In contrast to August and mVPQ, MATCHKiosk supports composite multimodal input combining speech with pen drawings and touch. The SmartKom-Public kiosk supports composite input, but differs in that it uses free-hand gesture for pointing, while MATCH utilizes pen input and touch.</Paragraph>
<Paragraph position="11"> August, SmartKom-Public, and MATCHKiosk all employ graphical embodiments: SmartKom uses an animated character, August a model-based talking head, and MATCHKiosk a sample-based, video-realistic talking head. MACK uses an articulated graphical embodiment with the ability to gesture. In Touch-N-Speak, a number of different techniques using time and pressure are examined for enabling selection of areas on a map using touch input. In MATCHKiosk, this issue does not arise since areas can be selected precisely by drawing with the pen.</Paragraph>
</Section>
</Paper>