File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/02/w02-0221_intro.xml
Size: 2,870 bytes
Last Modified: 2025-10-06 14:01:30
<?xml version="1.0" standalone="yes"?> <Paper uid="W02-0221"> <Title>Training a Dialogue Act Tagger For Human-Human and Human-Computer Travel Dialogues</Title> <Section position="3" start_page="0" end_page="0" type="intro"> <SectionTitle> 2000 Corpus </SectionTitle> <Paragraph position="0"> U: hello A: people's travel. what city do you want to fly to U: chicago A: on what date U: on the twenty second of may there will be two people travelling A: what time do you need to depart U: as soon as possible after five p.m. A: the first flight after five p.m. on that date is at five thirty five p.m. arriving in chicago at six oh six p.m. on u.s. air U: is that o'hare A: yes it is U: okay. i'll take that A: on what date do you want to return U: the following sunday A: at what time do you need to depart U: i want to arrive no later than seven in the evening</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> Corpus </SectionTitle> <Paragraph position="0"> perience and on users' behavior. Furthermore, users in the HC dialogues rarely took initiative and their utterances showed very little variation (Doran et al., 2001). In addition, we believe that once the system side of the dialogues is labelled, it will be much easier to label the user side of the dialogues.</Paragraph> <Paragraph position="1"> We report the results of applying a rule-induction method to train and test DATE taggers on various combinations of the DARPA Communicator June-2000 and October-2001 HC corpora, and the CMU HH corpus in the travel planning domain. The accuracy of a DATE tagger trained and tested on the June-2000 corpus is 98.5%. On the October-2001 corpus, this tagger achieves an accuracy of only 71.8%, but adding 2000 utterances from the 2001 corpus to the training data improves accuracy on the rest of the 2001 corpus to 93.8%.
The accuracy of a tagger trained on the HC corpora and tested on the CMU-corpus is 36.7% (a significant improvement over the baseline of 28%). A DATE tagger trained on 305 examples of the HH data achieves an accuracy of 48.75%, but the addition of the HC training data improves accuracy to 55.5% (majority class baseline = 28%). This pair of results demonstrates quantitatively that the HC data can be used to improve performance of a tagger for HH data. However, a larger training corpus of HH data improves performance to 76.6% accuracy, as estimated by 20-fold cross-validation on the CMU-corpus.</Paragraph> <Paragraph position="2"> Section 2 describes the corpora, the DATE dialogue act tagging scheme, methods for tagging the corpora for the experiments, and the features used to train a DATE dialogue act tagger for DATE labelling of the corpora. Section 3 presents our results. We postpone discussion and comparison with related work until Section 4.</Paragraph> </Section> </Section> </Paper>