<?xml version="1.0" standalone="yes"?>
<Paper uid="P01-1012">
<Title>Detecting problematic turns in human-machine interactions: Rule-induction versus memory-based learning approaches</Title>
<Section position="6" start_page="0" end_page="0" type="concl">
<SectionTitle>5 Discussion</SectionTitle>
<Paragraph position="0"> In this study we have looked at automatic methods for problem detection using simple features which are available in the vast majority of spoken dialogue systems and require little or no computational overhead. We have investigated two approaches to problem detection. The first approach is aimed at testing whether a user utterance, captured in a noisy word graph (noisy in the sense that it is not a perfect image of the user's input), and/or the recent history of system utterances, is predictive of whether the utterance itself will be misrecognised. The results of this task, which basically amounts to a signal quality test, show that problematic cases can be discerned with an accuracy of about 65%. Although this is somewhat above the baseline of 58% decision accuracy obtained when no problems are predicted, signalling recognition problems with word graph features and previous system question types as predictors is a hard task. As other studies suggest (e.g., Hirschberg et al. 1999), confidence scores and acoustic/prosodic features could be of help.</Paragraph>
<Paragraph position="1"> The second approach tested whether the word graph for the current user utterance and/or the recent history of system question types could be employed to predict whether the previous user utterance caused communication problems. The underlying assumption is that users will signal problems as soon as they become aware of them through the feedback provided by the system.</Paragraph>
<Paragraph position="2"> Thus, in a sense, this second approach represents a noisy channel filtering task: the current utterance has to be decoded as signalling a problem or not.</Paragraph>
<Paragraph position="3"> As the results show, this task can be performed at a surprisingly high level: about 91% decision accuracy (an error reduction of 38%), with an F-score on the problem category of 89. This result can only be obtained using a combination of features; neither the word graph features in isolation nor the system question types in isolation offer enough predictive power to rise above the sharp baseline of 86% accuracy and an F-score on the problem category of 79.</Paragraph>
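For concreteness, the reported figures can be related as follows. The sketch below is a minimal illustration, not taken from the paper: the function names, the class labels and the toy data are hypothetical. It computes decision accuracy, the F-score on the problem class, and the relative error reduction with respect to the majority-class baseline that predicts "no problem" for every utterance.

```python
# Minimal sketch (not from the paper) relating decision accuracy, the F-score
# on the "problem" class, and the relative error reduction over the
# majority-class baseline. Label names and toy data are hypothetical.
from collections import Counter


def accuracy(gold, pred):
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def f_score(gold, pred, target="problem", beta=1.0):
    tp = sum(1 for g, p in zip(gold, pred) if g == p == target)
    fp = sum(1 for g, p in zip(gold, pred) if p == target and g != target)
    fn = sum(1 for g, p in zip(gold, pred) if g == target and p != target)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return (1 + beta ** 2) * precision * recall / (beta ** 2 * precision + recall)


def error_reduction(gold, pred):
    # Relative reduction of the error rate compared with always predicting
    # the majority class ("no_problem" in this kind of data).
    majority = Counter(gold).most_common(1)[0][0]
    baseline_error = 1.0 - accuracy(gold, [majority] * len(gold))
    model_error = 1.0 - accuracy(gold, pred)
    return (baseline_error - model_error) / baseline_error


if __name__ == "__main__":
    gold = ["no_problem"] * 6 + ["problem"] * 4
    pred = ["no_problem"] * 6 + ["problem"] * 3 + ["no_problem"]
    print(accuracy(gold, pred), f_score(gold, pred), error_reduction(gold, pred))
```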
<Paragraph position="4"> Keeping information sources isolated or combining them directly influences the relative performance of the memory-based IB1-IG algorithm versus the RIPPER rule-induction algorithm. When the features are of the same type, the accuracies of the memory-based and the rule-induction systems do not differ significantly (with one exception). In contrast, when features from different sources (e.g., words in the word graph and question type features) are combined, RIPPER profits more than IB1-IG does, which causes RIPPER to perform significantly more accurately. The feature independence assumption of memory-based learning appears to be the culprit: by definition, IB1-IG does not give extra weight to apparently relevant interactions between feature values from different sources. In contrast, in nine out of the twelve rules that RIPPER produces, word graph features and system question type features are explicitly integrated as joint left-hand side conditions.</Paragraph>
<Paragraph position="5"> The current results show that for on-line detection of communication problems at the utterance level it is already beneficial to pay attention only to the lexical information in the word graph and the sequence of system question types, features which are present in most spoken dialogue systems and which can be obtained with little or no computational overhead. Such automatic problem detection is potentially very useful for spoken dialogue systems, since it provides a quantitative criterion for, for instance, changing the dialogue strategy (initiative, verification) or the speech recognition engine (from one trained on normal speech to one trained on hyperarticulate speech).</Paragraph>
</Section>
</Paper>