<?xml version="1.0" standalone="yes"?> <Paper uid="H05-1127"> <Title>Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing (HLT/EMNLP), pages 1011-1018, Vancouver, October 2005. ©2005 Association for Computational Linguistics Learning Mixed Initiative Dialog Strategies By Using Reinforcement Learning On Both Conversants</Title> <Section position="7" start_page="1017" end_page="1017" type="concl"> <SectionTitle> 6 Conclusion </SectionTitle> <Paragraph position="0"> In this paper, we proposed using reinforcement learning to learn a dialog strategy for the system. Our approach differs from past research in that we learn the system policy in conjunction with a user policy. Learning the user policy allows us to minimize human involvement: neither a training corpus needs to be collected nor a simulated user built. Thus, the only human input required for this approach was to define the domain task and to define success in that domain. While our training approach did not always find an effective policy, we overcame this obstacle by carefully choosing the ratio of the weights in the objective function and by running the learning algorithm multiple times. Our approach resulted in learned system and user dialog policies that achieved performance comparable to handcrafted system and user policy pairs. Furthermore, the learned system policies were robust.</Paragraph> <Paragraph position="1"> When the learned system policies 'conversed' with the handcrafted user policies, the resulting dialogs had solution quality comparable to what the handcrafted system and user policies achieved together.</Paragraph> <Paragraph position="2"> Even without guaranteed convergence, our approach could be applied to more complicated domains in order to learn an effective dialog policy. 
Our approach would be especially useful in situations where no corpora of human-human interactions exist for the domain, or as a way to provide a check against a policy based on human intuition. In most situations where the domain requires significant collaboration between the dialog system and the user, training both the system and a user simultaneously will prove to be a much less costly and labor-intensive approach.</Paragraph> </Section> </Paper>