<?xml version="1.0" standalone="yes"?> <Paper uid="E06-1009"> <Title>Information Presentation in Spoken Dialogue Systems</Title> <Section position="7" start_page="70" end_page="71" type="evalu"> <SectionTitle> 5.2 Results </SectionTitle> <Paragraph position="0"> A significant preference for our system was observed. (In the diagrams, our system, which combines user modelling and stepwise refinement, is called UMSR, whereas the system based on Polifroni's approach is called SR.) There were a total of 190 forced choices in the experiment (38 participants × 5 dialogue pairs). UMSR was preferred 120 times (≈ 63%), whereas SR was preferred only 70 times (≈ 37%). This difference is highly significant (p < 0.001) using a two-tailed binomial test. Thus, the null hypothesis that both systems are preferred equally often can be rejected with high confidence.</Paragraph> <Paragraph position="1"> The evaluation results for the Likert scale questions confirmed our expectations. The SR dialogues received on average slightly higher scores for understandability (question 1), which can be explained by the shorter length of the system turns for that system. However, the difference is not statistically significant (p = 0.97 using a two-tailed paired t-test). The differences in results for the other questions are all highly statistically significant, especially for question 2, which assesses the quality of the overview of the options given by the system responses, and question 3, which assesses the confidence that all relevant options were mentioned by the system. Both were significant at p < 0.0001. These results confirm our hypothesis that our strategy of presenting tradeoffs explicitly and summarizing irrelevant options improves users' overview of the option space and also increases their confidence in having heard about all relevant options, and thus their confidence in the system. The difference for question 4 (accessibility of the optimal option) is also statistically significant (p < 0.001). 
Quite surprisingly, subjects reported that they felt they could access options more quickly even though the dialogues were usually longer. The average scores (based on 190 values) are shown in Figure 7.</Paragraph> <Paragraph position="2"> To assess whether the content given by our system is too complex for oral presentation and requires participants to read system turns several times, we recorded reading times and correlated them with the number of characters in a system turn. We found a linear relation, which indicates that participants did not re-read passages and is a promising sign for the use of our strategy in SDS.</Paragraph> </Section> </Paper>
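<!--
The forced-choice figures reported in Section 5.2 (120 of 190 choices for UMSR, 70 for SR) can be re-checked with a short script. This is a minimal sketch, assuming the reported test is an exact two-tailed binomial test against a null preference rate of 0.5; the function name binomial_two_tailed_p is hypothetical and does not come from the paper.

```python
from math import comb


def binomial_two_tailed_p(successes: int, trials: int) -> float:
    """Exact two-tailed binomial test against a null rate of 0.5.

    Hypothetical helper for illustration. Because the null distribution
    is symmetric at 0.5, the two-tailed p-value is twice the tail
    probability of the more extreme side, capped at 1.
    """
    k = max(successes, trials - successes)
    upper_tail = sum(comb(trials, i) for i in range(k, trials + 1)) / 2 ** trials
    return min(1.0, 2 * upper_tail)


# Counts taken from the text: 38 participants x 5 dialogue pairs = 190 choices.
umsr, sr = 120, 70
total = umsr + sr
p_value = binomial_two_tailed_p(umsr, total)

print(f"UMSR share: {umsr / total:.2f}, SR share: {sr / total:.2f}")
print(f"two-tailed p below 0.001: {p_value < 0.001}")
```

Under these assumptions the shares round to 0.63 and 0.37, and the p-value falls below the 0.001 threshold quoted in the text.
-->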