<?xml version="1.0" standalone="yes"?>
<Paper uid="P95-1016">
  <Title>Utilizing Statistical Dialogue Act Processing in Verbmobil</Title>
  <Section position="3" start_page="116" end_page="116" type="metho">
    <SectionTitle>
2 The Dialogue Model and Predictions of Dialogue Acts
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="116" end_page="116" type="sub_section">
      <SectionTitle/>
      <Paragraph position="0"> Like previous approaches for modeling task-oriented dialogues we assume that a dialogue can be described by means of a limited but open set of dialogue acts (see e.g. (Bilange, 1991), (Mast et al., 1992)). We selected the dialogue acts by examining the VERBMOBIL corpus, which consists of transliterated spoken dialogues (German and English) for appointment scheduling. We examined this corpus for the occurrence of dialogue acts as proposed by e.g.</Paragraph>
      <Paragraph position="1"> (Austin, 1962; Searle, 1969) and for the necessity to introduce new, sometimes problem-oriented dialogue acts. We first defined 17 dialogue acts together with semi-formal rules for their assignment to utterances (Maier, 1994). After one year of experience with these acts, the users of dialogue acts in VERBMOBIL selected them as the domain independent &amp;quot;upper&amp;quot; concepts within a more elaborate hierarchy that becomes more and more propositional and domain dependent towards its leaves (Jekat et al., 1995). Such a hierarchy is useful e.g. for translation purposes.</Paragraph>
      <Paragraph position="2"> Following the assignment rules, which also served as starting point for the automatic determination of dialogue acts within the semantic evaluation component, we hand-annotated over 200 dialogues with dialogue act information to make this information available for training and test purposes.</Paragraph>
      <Paragraph position="3"> Figure 1 shows the domain independent dialogue acts and the transition networks which define admissible sequences of dialogue acts. In addition to the dialogue acts in the main dialogue network, there are five dialogue acts, which we call deviations, that can occur at any point of the dialogue. They are represented in an additional subnetwork which is shown at the bottom of figure 1. The networks serve as the basis for the implementation of a parser which determines whether an incoming dialogue act is compatible with the dialogue model.</Paragraph>
      <Paragraph position="4"> As mentioned in the introduction, it is not only important to extract the dialogue act of the current utterance, but also to predict possible follow up dialogue acts. Predictions about what comes next are needed internally in the dialogue component and externally by other components in VERB-MOBIL. An example of the internal use, namely the treatment of unexpected input by the plan recognizer, is described in section 4. Outside the dialogue component dialogue act predictions are used e.g. by the abovementioned semantic evaluation component and the keyword spotter. The semantic evaluation component needs predictions when it determines the dialogue act of a new utterance to narrow down the set of possibilities. The keyword spotter can only detect a small number of keywords that are selected for each dialogue act from the VERBMOBIL corpus of annotated dialogues using the Keyword Classification Tree algorithm (Kuhn, 1993; Mast, 1995).</Paragraph>
      <Paragraph position="5"> For the task of dialogue act prediction a knowledge source like the network model cannot be used since the average number of predictions in any state of the main network is five. This number increases when the five dialogue acts from the subnetwork which can occur everywhere are considered as well. In that case the average number of predictions goes up to 10. Because the prediction of 10 dialogue acts from a total number of 17 is not sufficiently restrictive and because the dialogue network does not represent preference information for the various dialogue acts we need a different model which is able to make reliable dialogue act predictions. Therefore we developed a statistical method which is described in detail in the next section.</Paragraph>
    </Section>
  </Section>
  <Section position="4" start_page="116" end_page="119" type="metho">
    <SectionTitle>
3 The Statistical Prediction Method and its Evaluation
</SectionTitle>
    <Paragraph position="0"> and its Evaluation In order to compute weighted dialogue act predictions we evaluated two methods: The first method is to attribute probabilities to the arcs of our network by training it with annotated dialogues from our corpus. The second method adopted information theoretic methods from speech recognition. We  implemented and tested both methods and currently favor the second one because it is insensitive to deviations from the dialogue structure as described by the dialogue model and generally yields better prediction rates. This second method and its evaluation will be described in detail in this section.</Paragraph>
    <Paragraph position="1"> Currently, we use n-gram dialogue act probabilities to compute the most likely follow-up dialogue act. The method is adapted from speech recognition, where language models are commonly used to reduce the search space when determining a word that can match a part of the input signal (Jellinek, 1990). It was used for the task of dialogue act prediction by e.g. (Niedermair, 1992) and (Nagata and Morimoto, 1993). For our purpose, we consider a dialogue S as a sequence of utterances Si where each utterance has a corresponding dialogue act si. If P(S) is the statistical model of S, the probability can be approximated by the n-gram probabilities</Paragraph>
    <Paragraph position="3"> Therefore, to predict the nth dialogue act sn we can use the previously uttered dialogue acts and determine the most probable dialogue act by comput-</Paragraph>
    <Paragraph position="5"> To approximate the conditional probability P(.I.) the standard smoothing technique known as deleted interpolation is used (Jellinek, 1990) with</Paragraph>
    <Paragraph position="7"> where f are the relative frequencies computed from a training corpus and qi weighting factors with ~&amp;quot;~qi = 1.</Paragraph>
    <Paragraph position="8"> To evaluate the statistical model, we made various experiments. Figure 2 shows the results for three representative experiments (TS1-TS3, see also (Reithinger, 1995)).</Paragraph>
    <Paragraph position="9">  In all experiments 41 German dialogues (with 2472 dialogue acts) from our corpus are used as training data, including deviations. TS1 and TS2 use the same 81 German dialogues as test data. The difference between the two experiments is that in TS1 only dialogue acts of the main dialogue network are processed during the test, i.e. the deviation acts of the test dialogues are not processed. As can be seen -- and as could be expected -- the prediction rate drops heavily when unforseeable deviations occur. TS3 shows the prediction rates, when all currently available annotated dialogues (with 7197 dialogue acts) from the corpus are processed, including deviations.</Paragraph>
    <Paragraph position="11"> Compared to the data from (Nagata and Morimoto, 1993) who report prediction rates of 61.7 %, 77.5% and 85.1% for one, two or three predictions respectively, the predictions are less reliable. However, their set of dialogue acts (or the equivalents, called illocutionary force types) does not include dialogue acts to handle deviations. Also, since the dialogues in our corpus are rather unrestricted, they have a big variation in their structure. Figure 3 shows the variation in prediction rates of three dialogue acts for 47 dialogues which were taken at random from our corpus. The x-axis represents the different diMogues, while the y-axis gives the hit rate for three predictions. Good examples for the differences in the dialogue structure are the diMogue pairs #15/#16 and #41/#42. The hit rate for dialogue #15 is about 54% while for #16 it is about 86%.</Paragraph>
    <Paragraph position="12"> Even more extreme is the second pair with hit rates of approximately 93% vs. 53%. While diMogue #41 fits very well in the statisticM model acquired from the training-corpus, dialogue #42 does not. This figure gives a rather good impression of the wide variety of material the dialogue component has to cope  The dialogue model specified in the networks models all diMogue act sequences that can be usually expected in an appointment scheduling dialogue. In case unexpected input occurs repair techniques have  to be provided to recover from such a state and to continue processing the dialogue in the best possible way. The treatment of these cases is the task of the dialogue plan recognizer of the dialogue component.</Paragraph>
    <Paragraph position="13"> The plan recognizer uses a hierarchical depth-first left-to-right technique for dialogue act processing (Vilain, 1990). Plan operators have been used to encode both the dialogue model and methods for recovery from erroneous dialogue states. Each plan operator represents a specific goal which it is able to fulfill in case specific constraints hold. These constraints mostly address the context, but they can also be used to check pragmatic features, like e.g. whether the dialogue participants know each other. Also, every plan operator can trigger follow-up actions, h typical action is, for example, the update of the dialogue memory. To be able to fulfill a goal a plan operator can define subgoals which have to be achieved in a pre-specified order (see e.g.</Paragraph>
    <Paragraph position="14"> (Maybury, 1991; Moore, 1994) for comparable approaches). null fmwl_2_01: der Termin den wir neulich abgesprochen haben am zehnten an dem Samstag (MOTIVATE) (the date we recently agreed upon, the lOth that Saturday) da kann ich doch nich' (REJECT) (then I can not) wit sollten einen anderen ausmachen (INIT) (we should make another one) mpsl_2_02: wean ich da so meinen Termin-Kalender anschaue, (DELIBERATE) (if I look at my diary) dan sieht schlecht aus (REJECT).</Paragraph>
    <Paragraph position="15">  Since the VERBMOBIL system is not actively participating in the appointment scheduling task but only mediating between two dialogue participants it has to be assumed that every utterance, even if it is not consistent with the dialogue model, is a legal dialogue step. The first strategy for error recovery therefore is based on the hypothesis that the attribution of a dialogue act to a given utterance has been incorrect or rather that an utterance has various facets, i.e. multiple dialogue act interpretations. Currently, only the most plausible dialogue act is provided by the semantic evaluation component. To find out whether there might be an additional interpretation the plan recognizer relies on information provided by the statistics module. If an incompatible dialogue act is encountered, an alternative dialogue act is looked up in the statistical module which is most likely to come after the preceding dialogue act and which can be consistently followed by the current dialogue act, thereby gaining an admissible dialogue act sequence.</Paragraph>
    <Paragraph position="16"> To illustrate this principle we show a part of the processing of two turns (fmwl..2_01 and mpsl_2_02, see figure 4) from an example dialogue with the dialogue act assignments as provided by the semantic evaluation component. The translations stick to the German words as close as possible and are not provided by VERBMOBIL. The trace of the dialogue component is given in figure 5, starting with pro- null In this example the case for statistical repair occurs when a REJECT does not - as expected - follow a SUGGEST. Instead, it comes after the INIT of the topic to be negotiated and after a DELIBERATE. The latter dialogue act can occur at any point of the dialogue; it refers to utterances which do not contribute to the negotiation as such and which can be best seen as &amp;quot;thinking aloud&amp;quot;. As first option, the plan recognizer tries to repair this state using statistical information, finding a dialogue act which is able to connect INIT and REJECT 1. As can be seen in figure 5 the dialogue acts REQUEST_COMMENT, DE-LIBERATE, and SUGGEST can be inserted to achieve a consistent dialogue. The annotated scores are the product of the transition probabilities times 1000 between the previous dialogue act, the potential insertion and the current dialogue act which are provided  by the statistic module. Ordered according to their scores, these candidates for insertion are tested for compatibility with either the previous or the current dialogue act. The notion of compatibility refers to dialogue acts which have closely related meanings or which can be easily realized in one utterance.</Paragraph>
    <Paragraph position="17"> To find out which dialogue acts can be combined we examined the corpus for cases where the repair mechanism proposes an additional reading. Looking at the sample dialogues we then checked which of the proposed dialogue acts could actually occur together in one utterance, thereby gaining a list of admissible dialogue act combinations. In the VERBMOBIL corpus we found that dialogue act combinations like SUGGEST and REJECT can never be attributed to one utterance, while INIT can often also be interpreted as a SUQGEST therefore getting a typical follow-up reaction of either an acceptance or a rejection. The latter case can be found in our example: INIT gets an additional reading of SUGeEST.</Paragraph>
    <Paragraph position="18"> In cases where no statistical solution is possible plan-based repair is used. When an unexpected dialogue act occurs a plan operator is activated which distinguishes various types of repair. Depending on the type of the incoming dialogue act specialized repair operators are used. The simplest case covers dialogue acts which can appear at any point of the dialogue, as e.g. DELIBERATE and clarification dialogues (CLARIFY_QUERY and CLARIFY-ANSWER).</Paragraph>
    <Paragraph position="19"> We handle these dialogue acts by means of repair in order to make the planning process more efficient: since these dialogue acts can occur at any point in the dialogue the plan recognizer in the worst case has to test for every new utterance whether it is one of the dialogue acts which indicates a deviation. To prevent this, the occurrence of one of these dialogue acts is treated as an unforeseen event which triggers the repair operator. In figure 5, the plan recognizer issues a warning after processing the DELIBERATE dialogue act, because this act was inserted by means of a repair operator into the dialogue structure.</Paragraph>
  </Section>
class="xml-element"></Paper>