<?xml version="1.0" standalone="yes"?> <Paper uid="A00-1046"> <Title>The Efficiency of Multimodal Interaction for a Map-based Task</Title> <Section position="3" start_page="334" end_page="336" type="intro"> <SectionTitle> 3 Results </SectionTitle> <Paragraph position="0"> Analyses revealed that multimodal interaction resulted in a 3.7-fold speed increase in creating units compared to the GUI, paired t-test, t(3) = 5.791, p < 0.005, one-tailed. In addition, it provided a 3.3-fold increase in creating control measures, paired t-test, t(3) = 8.298, p < 0.002, one-tailed (see Table I).6 Much of this speed differential can be traced to the need to browse the echelons of the US military, scrolling long lists of units with the GUI (e.g., 126 units are in the list of US Army companies), followed by a separate dragging operation to position the selected unit. In contrast, QuickSet users specified the type of entity directly, and supplied its location in parallel. Likewise, the speed differential for the control measures may be attributed to the user's ability to both draw and speak in parallel, whereas the GUI required separate actions for going into and out of drawing mode, for selecting the type of control measure, and for selecting appropriate points on the map.</Paragraph> <Paragraph position="1"> Although there were fewer errors on average when using the direct manipulation GUI, they were not significantly fewer than when interacting multimodally. In contrast, the time needed to repair an error was significantly lower when interacting multimodally than with the GUI, paired t-test, t(3) = 4.703, p < 0.009, one-tailed. On balance, the same users completing the same tasks spent 26% more total time correcting errors with the GUI than with the multimodal interface.</Paragraph> <Paragraph position="2"> 6 In general, the user could at that point say anything that would unify with the type of entity being created, such as "facing two two five degrees in defensive posture." This would add further data to the type of entity being created. Similar data could be added via the GUI, but it required interacting with a dialogue box that was created only after the unit's constituents were loaded (a time-consuming operation). Since QuickSet users could supply the data before the constituents were loaded, it was deemed fairer to ignore this QuickSet capability, even though it speeds up multimodal interaction considerably and employs more extensive natural language processing.</Paragraph> <Paragraph position="3"> It should be pointed out that the paired t-test takes into consideration the number of subjects. Thus, these findings at these significance levels are particularly strong. A second set of nonparametric tests (Wilcoxon signed ranks) was also performed, indicating that multimodal interaction was significantly faster (p < 0.034, one-tailed) in creating units and control measures, and also in correcting errors.</Paragraph>
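The testing procedure above is standard and can be sketched concretely. The following Python fragment (using scipy.stats) shows how a paired t-test and a Wilcoxon signed-rank test of this form are computed; the four per-subject mean times are hypothetical placeholders for illustration, since the paper reports only the resulting test statistics, not the raw data.

    # Sketch of the two significance tests reported above. The per-subject
    # mean creation times are HYPOTHETICAL placeholders for illustration.
    from scipy import stats

    gui_times = [22.4, 25.1, 19.8, 23.6]     # hypothetical GUI means (seconds)
    multimodal_times = [6.1, 6.8, 5.2, 6.4]  # hypothetical QuickSet means (seconds)

    # Paired t-test: n = 4 subjects gives n - 1 = 3 degrees of freedom,
    # which is why large t(3) values at these p levels are strong results.
    t_stat, p_two_sided = stats.ttest_rel(gui_times, multimodal_times)
    # Halving the two-sided p is valid here because the observed difference
    # lies in the hypothesized direction (GUI slower than multimodal).
    print(f"t(3) = {t_stat:.3f}, one-tailed p = {p_two_sided / 2:.4f}")

    # Nonparametric check: Wilcoxon signed-rank test on the same pairs.
    w_stat, p_w = stats.wilcoxon(gui_times, multimodal_times, alternative="greater")
    print(f"Wilcoxon W = {w_stat}, one-tailed p = {p_w:.4f}")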
<Paragraph position="4"> [Table I caption fragment: ... various types of entities and to repair errors when interacting multimodally versus with the ExInit GUI] The expert users were interviewed after the study regarding which interface they preferred and why. Multimodal interaction was strongly preferred by all users. Reasons cited included its efficiency and its support of precise drawing of linear and area features.</Paragraph> <Paragraph position="5"> Conclusions This study indicates that when the user knows what s/he wants, there can be substantial efficiency advantages of multimodal interaction over direct manipulation GUIs for a map-based task. Despite having only four subjects, the results exhibited extremely strong statistical significance. These results stand in contrast to prior research [6, 9, 10, 18] in which speed advantages of spoken input were washed out by the cost of correcting recognition errors.</Paragraph> <Paragraph position="6"> In the present study, not only was multimodal interaction substantially faster than GUI-based interaction, even including error correction times, but error correction itself was four times more costly with a GUI than with multimodal interaction. These findings do not support those of Karat et al. [9], who found that for correcting errors in a dictation task, keyboard-mouse input led to a 2.3-fold speed increase over speech.</Paragraph> <Paragraph position="7"> Both sets of findings might be reconciled by noting that the advantages of any type of user interface, especially spoken and multimodal interaction, may be task dependent.</Paragraph> <Paragraph position="8"> We attribute the findings here to the ability of multimodal interfaces to support parallel specification of complementary parts of a communicative act, as well as direct rather than hierarchical or scrolled access to types of entities. Moreover, because the user can employ each mode for its strengths, s/he can offload different aspects of the communication to different human cognitive systems, leading to greater efficiency [21] and fewer user errors [13].</Paragraph> <Paragraph position="9"> It might be claimed that these results apply only to this GUI, and that a different GUI might offer superior performance. First, it is worth noting that the same pattern of results was found for the two GUI elements (drop-down list and hierarchical browser). Thus, the results cannot simply be attributed to the misuse of a hierarchical tool. Second, we point out that this GUI was developed as a product, and that many military systems use very similar user interface tools for the same purposes (selecting units).7 Thus, these results may have substantial practical impact for users performing this task.</Paragraph> <Paragraph position="10"> More generally, one study cannot establish results for all possible user interfaces. There will certainly be occasions in which a menu-based GUI will be superior to a multimodal interface - e.g., when the user does not in fact know what s/he wants and needs to browse.</Paragraph> <Paragraph position="11"> Other GUI tools, such as a search field with command completion, can be envisioned that would provide direct access; a minimal sketch of this idea follows below. However, it is arguable whether such an interface element belongs squarely to graphical user interfaces, since it draws more on features of language. Also, it would require the user to type, even in circumstances (such as mobile usage) where typing would be infeasible. Given our philosophy of using each modality for its strengths, we believe multimodal and graphical user interfaces should be integrated, rather than cast as opposites.</Paragraph>
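To make the command-completion alternative concrete, here is a minimal Python sketch of prefix-based completion over unit type names, giving direct access without scrolling; the unit names are invented placeholders, not drawn from ExInit or QuickSet.

    # Minimal sketch of a command-completion search field: the user types a
    # prefix and sees matching unit types directly, instead of scrolling a
    # long list (e.g., 126 US Army companies). Unit names are invented.
    UNIT_TYPES = [
        "mechanized infantry company",
        "mechanized infantry battalion",
        "armored cavalry troop",
        "field artillery battery",
    ]

    def complete(prefix, candidates=UNIT_TYPES):
        """Return all candidate names that start with the typed prefix."""
        prefix = prefix.lower()
        return [name for name in candidates if name.lower().startswith(prefix)]

    print(complete("mech"))  # -> both "mechanized infantry ..." entries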
<Paragraph position="12"> Finally, we would expect that these advantages of multimodal interaction may generalize to other tasks and other user interfaces in which selection among many possible options is required.</Paragraph> <Paragraph position="13"> 7 In fact, a recent experiment by the US Marines had mobile combatants using small portable computers with a similar direct manipulation interface as they participated in field exercises. The user interface was generally regarded as the weakest aspect of the experiment.</Paragraph> <Paragraph position="14"> Obviously, a small experiment only illuminates a small space. But it should be clear that when current technologies are blended into a synergistic multimodal interface, the result may provide substantial improvements on some types of tasks heretofore performed with graphical user interface technologies. We conjecture that the more we can take advantage of the strengths of spoken language technology, the larger this advantage will become. Future research should search for more such tasks, and develop more general toolkits that support rapid adaptation of multimodal technologies to support them.</Paragraph> </Section> </Paper>