<?xml version="1.0" standalone="yes"?>
<Paper uid="W04-3004">
<Title>Virtual Modality: a Framework for Testing and Building Multimodal Applications</Title>
<Section position="2" start_page="0" end_page="0" type="intro">
<SectionTitle> 1 Introduction </SectionTitle>
<Paragraph position="0"> Multimodal systems have recently drawn significant attention from researchers, and the reasons for this interest are many. First, speech-recognition-based applications and systems have become mature enough for larger-scale deployment. The underlying technologies are gradually exhibiting increased robustness and performance, and from a usability point of view, users can see clear benefits from speech-driven applications. The next evolutionary step is to extend the &quot;one-dimensional&quot; (i.e., speech-only) interface to include other modalities, such as gesture, sketch, gaze, and text. This will lead to a better and more comprehensive user experience.</Paragraph>
<Paragraph position="1"> A second reason is the widely accepted, and expected, mobility and pervasiveness of computers. Devices are becoming more powerful and versatile; they can be connected anywhere and at any time to networks, as well as to each other. This poses new demands on the user interface: it is no longer sufficient to support only a single input modality. Depending on the specific application, the usage scenario, and the context, users should be offered a variety of options by which to interact with the system in an appropriate and efficient way.</Paragraph>
<Paragraph position="2"> Third, as the output capabilities of devices provide ever-richer multimedia experiences, it is natural that the input mechanism must also handle various modalities in an intuitive and comprehensive manner. If a map is displayed to the user, it is natural to expect that the user may want to refer to this physical entity, for instance, via gestures, pointing, gaze, or other, not necessarily speech-based, communicative means.</Paragraph>
<Paragraph position="3"> Multimodal interfaces give the user alternatives and flexibility in the interaction; they are enabling rather than restricting. The primary goal is to fully understand the user's intention, and this can be achieved only if all intentional user inputs, as well as any available contextual information (e.g., location, pragmatics, sensory data, user preferences, and current and previous interaction histories), are taken into account.</Paragraph>
<Paragraph position="4"> This paper is organized as follows. Section 2 introduces the concept of Virtual Modality and describes how the multimodal data are generated. Section 3 explains the underlying Galaxy environment and briefly summarizes the operation of the Context Resolution module, which is responsible for, among other tasks, resolving deictic references. Data generation and the corresponding statistics are covered in Section 4. The experimental methodology is described in Section 5. Finally, the results are summarized and directions for future work are outlined.</Paragraph>
</Section>
</Paper>