<?xml version="1.0" standalone="yes"?> <Paper uid="W02-0221"> <Title>Training a Dialogue Act Tagger For Human-Human and Human-Computer Travel Dialogues</Title> <Section position="7" start_page="0" end_page="0" type="concl"> <SectionTitle> 4 Discussion and Future Work </SectionTitle> <Paragraph position="0"> In summary, our results show that: (1) It is possible to assign DATE dialogue act tags to system utterances in HC dialogues from many different systems for the same domain with high accuracy; (2) A DATE tagger trained on data from an earlier version of the system only achieves moderate accuracy on a later version of the system without a small amount of labelled training data from that later version; (3) Labelled training data from HC dialogues can improve the performance of a DATE tagger for HH dialogue when only a small amount of HH training data is available.</Paragraph> <Paragraph position="1"> Previous work has also reported results for dialogue act taggers, using similar features to those we use, with accuracies ranging from 62% to 75% (Reithinger and Klesen, 1997; Shriberg et al., 2000; Samuel et al., 1998). Our best accuracy for the HC data is 98%. The best performance for the HH corpus is 76% accuracy for the cross-validation study using only HH data. However, accuracies reported for previous work are not directly comparable to ours for several reasons. First, some of our results concern labelling the system side of utterances in HC dialogues for the purpose of automatic evaluation of system performance. 
It is much easier to develop a high-accuracy tagger for HC dialogue than it is for HH dialogue.</Paragraph> <Paragraph position="2"> We also applied the DATE tagger to HH dialogue, and focused on the travel agent side of the dialogue.</Paragraph> <Paragraph position="3"> Here the accuracies that we report are more comparable with those of other researchers, but large differences should nevertheless be expected due to differences in the types of corpora, dialogue act tagging schemes, and features used.</Paragraph> <Paragraph position="4"> We considered the possibility of generating dialogue acts automatically in the log files. This idea was attractive because it is easy to implement the generation of dialogue act tags in the log files. Large amounts of human-computer data would then be available for the human-human labelling task or for evaluation efforts. However, this turned out to be impractical because we found it difficult to get dialogue designers across the different participating sites to agree on a labelling standard.</Paragraph> <Paragraph position="5"> We therefore believe that machine learning methods for classification, such as the one discussed here, might still be necessary to automate the tagging task for rapid evaluation and labelling efforts.</Paragraph> <Paragraph position="6"> As part of the ISLE NSF/EU project, the labelled corpus that we developed for this work will soon be released by the LDC, and other researchers will then be able to utilize it to improve upon our results. In addition, we believe this corpus could be useful as a training resource for spoken response generation in dialogue systems. For example, the dialogue act representation can be used to provide a broad range of text-planning inputs for a stochastic sentence planner in the travel domain (Walker et al., 2001b), or to represent the systems' dialogue strategies for reinforcement learning (Walker, 2000; Scheffler and Young, 2002). 
In future work, we hope to demonstrate that features derived from the labelling of the system side of the dialogue can also improve performance of a dialogue act tagger for the human utterances in the dialogue, and to conduct additional analyses demonstrating the utility of this representation for cross-site evaluation.</Paragraph> </Section> </Paper>