<?xml version="1.0" standalone="yes"?>
<Paper uid="W02-0213">
  <Title>Dialogue Act Recognition with Bayesian Networks for Dutch Dialogues</Title>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
2 Bayesian Networks and Speech Act
Recognition
</SectionTitle>
    <Paragraph position="0"> Since Austin and Searle we know that deliberately producing a linguistic utterance (a 'locutionary act') is at the same time performing a speech act (an 'illocutionary act'). Many researchers have contributed to distinguishing and categorising the types of speech acts we can perform.</Paragraph>
    <Paragraph position="1"> See (Traum, 2000) for a valuable discussion of dialogue act taxonomies and an extensive bibliography. A dialogue system needs a user model: the better the user model, the better the system is able to understand the user's intentions from the locutionary act. We consider the human participant in a dialogue as a source of communicative actions. An action can be a verbal dialogue act or a non-verbal act, such as pointing at some object. We assume the user is rational: there is a dependency between the action performed and the intentional state of the user. If we restrict ourselves to communicative acts that are realized by uttering (speaking or typing) a sentence, we can model the user by a probability distribution</Paragraph>
    <Paragraph position="3"> P(U = u | DA = da): the probability that the user produces an utterance u (the stochastic variable U has the value u) given that he performs a dialogue act da (DA has the value da). Or, maybe better: the confidence we can have in believing that the user uses utterance u if we know that the dialogue act he performs is da. Since there are many distinct wordings u for performing a given dialogue act da and, on the other hand, there are distinct dialogue acts that can be performed by the same utterance, we need more than superficial linguistic information to decide upon the intended dialogue act given an utterance. The task of the dialogue act recognition (DAR) module of a dialogue system is to answer the question: what is the most likely dialogue act da intended by the user, given that the system has observed the utterance u in a dialogue context c? (Notice that we have equated the utterance produced by the user with the utterance recognised by the system: there is no information loss between the module that records the utterance and the input of the dialogue act recognition module.) To make this problem tractable we further restrict the model by assuming that a) the user engaged in a dialogue can only have the intention to perform one of a finite number of possible dialogue acts; b) each of the possible natural language utterances u produced by the user and observed by the system can be represented by a finite number of feature-value pairs (f_i = v_i); and c) the dialogue context can be represented by a finite number of feature-value pairs (g_i = c_i).</Paragraph>
    <Paragraph position="4"> Given these restrictions, the DAR problem becomes to find that value da of DA that maximises the conditional probability P(DA = da | f_1 = v_1, ..., f_n = v_n, g_1 = c_1, ..., g_m = c_m).</Paragraph>
    <Paragraph position="6"> For the probabilistic model from which this can be computed we use a Bayesian network (Pearl, 1988). A Bayesian network is a directed acyclic graph in which the nodes represent the stochastic variables considered, while the structure (given by the arcs between the nodes) constitutes a set of conditional independencies among these variables: a variable is conditionally independent of its non-descendants in the network, given its parents in the network. Consider the network in Figure 2: it contains one node representing the dialogue act (DA), 3 nodes representing utterance features (NumWrds, CanYou and IWant) and a node representing a context feature (PrevDA).</Paragraph>
    <Paragraph position="7"> From the network structure follows that for example variable DA is conditionally independent of variable NumWrds, given variable CanYou.</Paragraph>
    <Paragraph position="8"> The conditional independencies make the model computationally more feasible: finding a specification of the joint probability distribution (jpd) for the model reduces to finding the conditional probability distributions of each of the variables given their network parents. In our example network, the following jpd specification holds:</Paragraph>
    <Paragraph position="10"> P(DA, PrevDA, NumWrds, CanYou, IWant) = P(DA | par(DA)) P(PrevDA | par(PrevDA)) P(NumWrds | par(NumWrds)) P(CanYou | par(CanYou)) P(IWant | par(IWant)), where par(X) denotes the set of parents of X in the network.</Paragraph>
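To make the factorisation concrete, the following Python sketch computes a joint probability as the product of each variable's conditional probability given its parents. It is illustrative only: the arcs (PrevDA -> DA; DA -> CanYou, IWant; CanYou -> NumWrds) are an assumed reading of the example network that is merely consistent with the stated independence of DA and NumWrds given CanYou, and all probability tables are invented.

```python
# Illustrative factorised joint distribution; structure and numbers assumed.
from itertools import product

# Toy conditional probability tables; every number is made up.
cpts = {
    "PrevDA":  {"question": 0.6, "statement": 0.4},
    "DA":      {"question":  {"answer": 0.7, "request": 0.3},
                "statement": {"answer": 0.2, "request": 0.8}},
    "CanYou":  {"answer":  {True: 0.1, False: 0.9},
                "request": {True: 0.6, False: 0.4}},
    "IWant":   {"answer":  {True: 0.05, False: 0.95},
                "request": {True: 0.5,  False: 0.5}},
    "NumWrds": {True:  {"few": 0.8, "many": 0.2},
                False: {"few": 0.4, "many": 0.6}},
}

def joint(prev_da, da, can_you, i_want, num_wrds):
    # Product of each variable's conditional probability given its parents.
    return (cpts["PrevDA"][prev_da]
            * cpts["DA"][prev_da][da]
            * cpts["CanYou"][da][can_you]
            * cpts["IWant"][da][i_want]
            * cpts["NumWrds"][can_you][num_wrds])

# The factorisation defines a proper distribution: all assignments sum to 1.
total = sum(joint(p, d, c, i, n)
            for p, d, c, i, n in product(
                ["question", "statement"], ["answer", "request"],
                [True, False], [True, False], ["few", "many"]))
```

Because each local table is a proper conditional distribution, the product sums to one over all joint assignments, which is what makes the reduction to local tables sound.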
    <Paragraph position="12"> The construction of a Bayesian network hence amounts to choosing a network structure (the conditional independencies) and choosing the conditional probability distributions. In practice, the probabilities will have to be assessed from empirical data using statistical techniques. The structure can be generated from data too, but another option is to choose it manually: the arcs in the network can then be chosen based on the intuition that they represent a causal or temporal relationship between two variables. Strictly speaking, however, a Bayesian network only represents informational relationships between variables.</Paragraph>
    <Paragraph position="13"> Notice that the machine learning technique known as the Naive Bayes classifier (see for instance (Mitchell, 1997)) assumes that all variables are conditionally independent of each other given the variable that has to be classified. A Naive Bayes classifier can be seen as a special case of a Bayesian network classifier in which the network structure consists of arcs from the class variable to all variables representing the features (see the figure). Naive Bayes classifiers will perform as well as the Bayesian network technique only if all feature variables are indeed conditionally independent given the class variable. The problem is, of course, how we can know that they are conditionally independent: if we do not have complete analytical knowledge about the (in)dependencies, only analysing the data can answer this question. The advantage of using Bayesian networks is that methods exist to construct the network structure as well as the conditional probabilities. Moreover, Bayesian networks are more flexible in their use: unlike Naive Bayes classifiers, we can retrieve the posterior probabilities of all the network variables without re-computing the model. Bayesian networks have the same advantage over decision tree learning methods like C4.5, which output a decision tree for classifying instances with respect to a single selected class variable. Experiments have shown that Naive Bayes classifiers give results that are as good as or even better than those obtained by decision tree classification techniques. Hence, there are theoretical as well as practical reasons to use Bayesian networks. However, since there is hardly any experience in using Bayesian networks for dialogue act classification, we have to do experiments to see whether this technique also performs better than the alternatives mentioned above for this particular application.</Paragraph>
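As a point of comparison, the Naive Bayes decision rule discussed above can be sketched in a few lines. The feature names and training samples below are hypothetical; only the rule itself (class prior times a product of per-feature likelihoods, with add-one smoothing) reflects the technique.

```python
# Minimal Naive Bayes dialogue-act classifier; all data is invented.
from collections import Counter, defaultdict
import math

class NaiveBayesDA:
    """Naive Bayes: features assumed conditionally independent given the act."""

    def fit(self, samples):
        # samples: list of (feature_dict, dialogue_act) pairs
        self.n = len(samples)
        self.prior = Counter(da for _, da in samples)
        self.counts = defaultdict(Counter)  # (feature, act) -> value counts
        for feats, da in samples:
            for f, v in feats.items():
                self.counts[(f, da)][v] += 1
        return self

    def predict(self, feats):
        def log_score(da):
            s = math.log(self.prior[da] / self.n)
            for f, v in feats.items():
                c = self.counts[(f, da)]
                # add-one smoothing so unseen feature values keep nonzero mass
                s += math.log((c[v] + 1) / (sum(c.values()) + len(c) + 1))
            return s
        return max(self.prior, key=log_score)

# Hypothetical training data: two surface features of an utterance.
samples = [
    ({"CanYou": True,  "IWant": False}, "request"),
    ({"CanYou": True,  "IWant": False}, "request"),
    ({"CanYou": False, "IWant": True},  "statement"),
    ({"CanYou": False, "IWant": False}, "statement"),
]
model = NaiveBayesDA().fit(samples)
```

Note that such a classifier only answers queries about the class variable; retrieving posteriors of other variables, as a general Bayesian network permits, would require rebuilding the model.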
    <Paragraph position="14"> The next two sections describe experiments with 1) the SCHISMA corpus - elaborating on previous work described in (Keizer, 2001) - and 2) a preliminary small corpus of navigation dialogues.</Paragraph>
    <Paragraph position="15"> We motivate our choice of dialogue acts and features and present some first results in training a Bayesian network and testing its performance.</Paragraph>
  </Section>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3 Experiments with the Schisma corpus
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.1 Dialogue acts and features
</SectionTitle>
      <Paragraph position="0"> The current dialogue system for interacting with Karin is based on analyses of the SCHISMA corpus. This is a corpus of 64 dialogues, obtained through Wizard of Oz experiments. The interaction between the wizard - a human simulating the system to be developed - and the human user took place through keyboard-entered utterances, so the dialogues are textual. The task at hand is information exchange and transaction: users can make inquiries about scheduled theatre performances and, if desired, make ticket reservations.</Paragraph>
      <Paragraph position="1"> We have manually annotated 20 dialogues from the SCHISMA corpus, using two layers of the DAMSL multi-layer annotation scheme (Allen and Core, 1997), a standard for annotating task-oriented dialogues in general. The layer of Forward-looking Functions contains acts that characterise the effect an utterance has on the subsequent dialogue, while acts on the layer of Backward-looking Functions indicate how an utterance relates to the previous dialogue. Because DAMSL does not provide a refined set of dialogue acts concerning information exchange, we have added some new dialogue acts. For example, ref-question, if-question and alts-question were added as acts that further specify the existing info-request.</Paragraph>
      <Paragraph position="2"> For the experiments, we selected a subset of forward- and backward-looking functions from the hierarchy that we judged the most important ones to recognise; they are listed in Table 1. In Figure 4, a fragment of an example dialogue between S (the server) and C (the client) is given, in which we have indicated which forward- and backward-looking functions were performed in each utterance.</Paragraph>
      <Paragraph position="3">  The user utterances have also been tagged manually with linguistic features. We have distinguished the features in Table 2, assuming they can be provided by a linguistic parser.</Paragraph>
      <Paragraph position="4"> The dialogue context features selected include the backward-looking function of the last system utterance and the forward-looking function of the previous user utterance. In the experiment with the SCHISMA dialogues we have constructed a network structure (see Figure 5) by hand and then used the data of the annotated dialogues to train the required conditional probabilities.</Paragraph>
      <Paragraph position="5">  The choice of structure is based on the intuition that the model reflects how a client decides which communicative action to take. Although the arcs themselves have no explicit meaning - they only contribute to the set of conditional independencies - they can be seen here as a kind of temporal or causal relationship between the variables (as mentioned earlier in Section 2): given the dialogue context, defined by the previous forward-looking function of the client (PFFC) and the previous backward-looking function of the server (PBFS), the client decides which forward-looking function to perform (FFC); from this decision he/she formulates a natural language utterance with certain features, including the sentence type (SeTp), the subject type (SuTp) and punctuation (Punct).</Paragraph>
      <Paragraph position="6"> Recalling the notion of conditional independence in Bayesian networks described in Section 2, it follows that by choosing the network structure of Figure 5, we have made the (admittedly, disputable) assumption that, given the forward-looking function of the client, the three utterance features are conditionally independent of each other.</Paragraph>
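Given such a structure, the posterior over the forward-looking function can be obtained by simple enumeration: multiply the conditional probability of each candidate FFC given the context by the likelihood of the observed utterance features, then normalise. The sketch below is illustrative only; the variable names follow Figure 5, but every probability table is invented.

```python
# Posterior inference by enumeration for an assumed Figure-5-style structure
# (PFFC, PBFS -> FFC; FFC -> utterance features). Tables are invented.
def posterior_ffc(pffc, pbfs, evidence, p_ffc, p_feat):
    """P(FFC | context, utterance features) by enumeration.

    p_ffc[(pffc, pbfs)][ffc] = P(FFC = ffc | PFFC = pffc, PBFS = pbfs)
    p_feat[feat][ffc][value] = P(feat = value | FFC = ffc)
    """
    scores = {}
    for ffc, prior in p_ffc[(pffc, pbfs)].items():
        s = prior
        for feat, value in evidence.items():
            s *= p_feat[feat][ffc][value]
        scores[ffc] = s
    z = sum(scores.values())  # normalising constant
    return {ffc: s / z for ffc, s in scores.items()}

# Invented tables over two candidate forward-looking functions.
p_ffc = {("info-request", "answer"): {"info-request": 0.5, "statement": 0.5}}
p_feat = {"SeTp": {"info-request": {"question": 0.9, "decl": 0.1},
                   "statement":    {"question": 0.2, "decl": 0.8}}}

post = posterior_ffc("info-request", "answer", {"SeTp": "question"},
                     p_ffc, p_feat)
```

With these toy numbers, observing a question-type sentence shifts the posterior toward info-request, which is the kind of update the trained network performs on real evidence.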
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.2 Results and evaluation
</SectionTitle>
      <Paragraph position="0"> For assessing the conditional probability distributions, we have used the Maximum A Posteriori (MAP) learning technique - see e.g. (Heckerman, 1999). For training we have used 330 data samples, which is 75% of the available data; the remaining samples have been used for testing. We have measured the performance of the network in terms of the accuracy of estimating the correct forward-looking function for different cases of available evidence, varying from no evidence at all to evidence on all features.</Paragraph>
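A common concrete form of MAP estimation for a discrete conditional probability table uses Dirichlet pseudo-counts. The sketch below assumes a symmetric prior with pseudo-count alpha; the exact prior used in the experiments is not specified here, so treat the details as an assumption rather than the paper's actual settings.

```python
# Hedged sketch: MAP estimate of P(value | parent_config) with a symmetric
# Dirichlet prior (pseudo-count alpha); all details are assumptions.
from collections import Counter, defaultdict

def map_cpt(observations, alpha=1.0):
    """observations: iterable of (parent_config, value) pairs.

    alpha is the pseudo-count added to every value; alpha = 1 amounts to
    add-one smoothing.
    """
    by_parent = defaultdict(Counter)
    for parent, value in observations:
        by_parent[parent][value] += 1
    values = sorted({v for _, v in observations})
    cpt = {}
    for parent, counts in by_parent.items():
        total = sum(counts.values()) + alpha * len(values)
        cpt[parent] = {v: (counts[v] + alpha) / total for v in values}
    return cpt

# Toy example: three annotated utterances sharing one parent configuration.
cpt = map_cpt([("ctx", "request"), ("ctx", "request"), ("ctx", "statement")])
```

The pseudo-counts keep every entry of the table strictly positive, which matters when, as here, the training set is small and many feature combinations are unseen.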
      <Paragraph position="1"> This resulted in an average accuracy of 43.5%.</Paragraph>
      <Paragraph position="2"> Adding complete evidence to the network for every test sample resulted in 38.7% accuracy.</Paragraph>
      <Paragraph position="3"> As the amount of data from the SCHISMA corpus currently available is rather small, the results cannot be expected to be very good, and more data have to be collected for further experiments. Still, the test results show that the accuracy is significantly better than the expected accuracy of 8.3% obtained by guessing the dialogue act randomly.</Paragraph>
      <Paragraph position="4"> A tighter baseline commonly used is the relative frequency of the most frequent dialogue act. For the data used here, this gives a baseline of 32.5%, which is still less than our network's accuracy.</Paragraph>
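The two baselines can be stated precisely: uniform random guessing over an inventory of k dialogue acts gives expected accuracy 1/k (8.3% for k = 12), and the majority baseline is the relative frequency of the most frequent act in the gold annotations. A small sketch, with hypothetical helper names:

```python
# Evaluation helpers for the two baselines discussed in the text.
from collections import Counter

def accuracy(gold, predicted):
    """Fraction of predictions that match the gold annotation."""
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)

def random_baseline(n_acts):
    """Expected accuracy of guessing uniformly among n_acts dialogue acts."""
    return 1.0 / n_acts

def majority_baseline(gold):
    """Relative frequency of the most frequent dialogue act."""
    return Counter(gold).most_common(1)[0][1] / len(gold)
```

With 12 dialogue acts, random_baseline(12) is approximately 0.083, matching the 8.3% figure reported above; the majority baseline depends only on the gold label distribution.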
    </Section>
  </Section>
  <Section position="5" start_page="0" end_page="0" type="metho">
    <SectionTitle>
4 Experiments with the navigation corpus
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.1 Dialogue acts and features
</SectionTitle>
      <Paragraph position="0"> A small corpus of dialogues was derived from the first implementation of a dialogue system for interaction with the navigation agent. For the experiments with the navigation corpus we also use the DAMSL layers of Forward- and Backward-looking Functions. On each of these two layers we only distinguish dialogue acts on the first level of the hierarchies (see Table 3 for the dialogue acts used); a more refined subcategorisation should be performed by a second step in the DAR module. The dialogue acts in Table 1 can be found at the deeper levels of the DAMSL hierarchy, e.g. a request is a special case of an infl addr fut act and an acknowledge is a special case of an understanding. The dialogue act recogniser may also use more application-specific knowledge in further identification of the user intention, such as dialogue information concerning topic/focus.</Paragraph>
      <Paragraph position="1">  For the navigation dialogues we have chosen a set of surface features of what will eventually be spoken utterances, in contrast to the typed dialogues in the SCHISMA corpus. Therefore, we do not use a textual feature like punctuation. For each utterance, the feature values are found automatically using a tagger (the features in the SCHISMA dialogues were tagged manually). In Table 4 we have listed the features, with their possible values, that we initially consider relevant.</Paragraph>
      <Paragraph position="2"> The dialogue context features include the backward- and forward-looking function of the previous dialogue act. This is always a dialogue act performed by the system. The possible dialogue acts performed by the system are the same as those performed by the user.</Paragraph>
      <Paragraph position="3"> The network is generated from data that were obtained by manually annotating the user utterances in the navigation corpus following the DAMSL instructions as close as possible. As with every categorisation there are problematic</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
Features Values
</SectionTitle>
      <Paragraph position="0">
lenq              one, few, many
iswh              true, false
not in prev       true, false
startsWithCanYou  true, false
startsWithCanI    true, false
startsWithIWant   true, false
Table 4: the features and their possible values.</Paragraph>
      <Paragraph position="1"> border cases, e.g. when to annotate with indirect speech acts. We used the criterion that such an act should be recognised without task-specific considerations. Therefore the utterance &quot;I want to make a phone-call&quot; is annotated as a statement, although eventually it should be interpreted as an info-request (&quot;where can I find a phone?&quot;) in the context of a navigation dialogue. After the dialogue act has been recognised, the navigation agent will make a plan for further actions and perform the planned actions. We will not discuss that here.</Paragraph>
    </Section>
  </Section>
</Paper>