<?xml version="1.0" standalone="yes"?>
<Paper uid="P06-1024">
<Title>Learning More Effective Dialogue Strategies Using Limited Dialogue Move Features</Title>
<Section position="8" start_page="190" end_page="191" type="concl">
<SectionTitle>5 Conclusion and Future Work</SectionTitle>
<Paragraph position="0">We have used user simulations that are n-gram models learned from COMMUNICATOR data to explore reinforcement learning of full dialogue strategies with some high-level context information (the user's and system's last dialogue moves). Almost all previous work (e.g. Singh et al., 2002; Pietquin, 2004; Scheffler and Young, 2001) has included only low-level information in state representations. In contrast, work exploring very large state spaces has to date relied on a hybrid supervised/reinforcement learning technique, where the reinforcement learning element has not been shown to significantly improve policies over the purely supervised case (Henderson et al., 2005).</Paragraph>
<Paragraph position="1">We presented our experimental environment, the reinforcement learner, the simulated users, and our methodology. In testing with the simulated COMMUNICATOR users, the new strategies learned with higher-level (i.e. dialogue move) information in the state outperformed the low-level RL baseline (only slot status information) by 7.8% and the original COMMUNICATOR systems by 65.9%. These strategies obtained more reward than the RL baseline by filling and confirming all of the slots with fewer system turns on average. Moreover, the learned strategies show interesting emergent dialogue behaviour, such as making effective use of the 'give help' action and switching focus to a different subtask when the current subtask is proving problematic.</Paragraph>
<Paragraph position="2">In future work, we plan to use even more realistic user simulations, for example those developed following Georgila et al. (2005a), which incorporate elements of goal-directed user behaviour. We will continue to investigate whether we can maintain tractability and learn superior strategies as we incrementally add more high-level contextual information to the state. At some stage this may necessitate using a generalisation method such as linear function approximation (Henderson et al., 2005). We also intend to use feature selection techniques (e.g. CFS subset evaluation (Rieser and Lemon, 2006)) in order to determine which contextual features are important.</Paragraph>
<Paragraph position="3">We will also carry out a more direct comparison with the hybrid strategies learned by Henderson et al. (2005). In the slightly longer term, we will test our learned strategies on humans using a full spoken dialogue system. We hypothesize that the strategies which perform best in terms of task completion and user satisfaction scores (Walker et al., 2000) will be those learned with high-level dialogue context information in the state.</Paragraph>
</Section>
</Paper>
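
As a minimal illustration of the approach summarised above, the following Python sketch shows a dialogue state that combines low-level slot-status features with the user's and system's last dialogue moves, together with a bigram (n=2) user simulation that samples the next user move conditioned on the system's last move. The slot names, move inventory, and probabilities here are invented for illustration; the paper's n-gram models were estimated from COMMUNICATOR data, and its exact feature sets are not reproduced here.

    import random
    from dataclasses import dataclass
    from typing import Dict, Tuple

    # Hypothetical slot inventory (not the paper's exact slot set).
    SLOTS = ("dest_city", "depart_date", "depart_time")

    @dataclass(frozen=True)
    class State:
        """RL state: per-slot status plus the user's and system's
        last dialogue moves (the paper's high-level context features)."""
        slot_status: Tuple[str, ...]   # each entry: "unfilled", "filled", or "confirmed"
        last_user_move: str
        last_system_move: str

    class BigramUserSimulation:
        """User simulation as an n-gram model (here n=2): the next user
        dialogue move is sampled conditioned on the system's last move."""

        def __init__(self, cond_probs: Dict[str, Dict[str, float]]):
            self.cond_probs = cond_probs   # P(user_move | system_move)

        def respond(self, system_move: str) -> str:
            dist = self.cond_probs[system_move]
            moves = list(dist.keys())
            weights = list(dist.values())
            return random.choices(moves, weights=weights, k=1)[0]

    # Toy conditional distributions, standing in for n-grams learned from data.
    user_sim = BigramUserSimulation({
        "ask_slot":     {"provide_info": 0.8, "no_answer": 0.2},
        "confirm_slot": {"yes_answer": 0.7, "no_answer": 0.2, "reprovide_info": 0.1},
        "give_help":    {"provide_info": 0.9, "no_answer": 0.1},
    })

    state = State(slot_status=("unfilled",) * len(SLOTS),
                  last_user_move="none", last_system_move="none")
    print(state)
    print(user_sim.respond("ask_slot"))

A reinforcement learner over this state space would, as in the paper's setup, choose system moves (e.g. asking, confirming, or giving help) and receive reward for filling and confirming slots in few turns; the baseline state described above would simply drop the two dialogue-move fields.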