<?xml version="1.0" standalone="yes"?>
<Paper uid="C96-2125">
  <Title>Learning dialog act processing</Title>
  <Section position="3" start_page="740" end_page="740" type="metho">
    <SectionTitle>
2 The Task
</SectionTitle>
    <Paragraph position="0"> The main task is the examination of learning for dialog act processing and the domain is the arrangement of business dates. For this domain we have developed a classification of dialog acts which is shown in table 1 together with examples. Our guideline for the choice of these dialog acts was based on (1) the particular domain and corpus and (2) our goal to learn rather few dialog act categories but in a robust manner 2.</Paragraph>
    <Paragraph position="1">  For example, in our example turn below there are several utterances and each of them has a particular dialog act as shown below. The turn starts with a rejection, followed by an explaining statement. Then a suggestion is made and a request for commenting on this suggestion: * Dienstags um zehn ist bei mir nun wiederum schlecht (Tuesday at 10 is for me now again bad) -> rejection * weil ich da noch trainieren bin (because I there still train) -> statement * ich denke (I think) -> miscellaneous * wir sollten das Ganze dann doch auf die naechste Woche verschieben (we should the whole then really to the next week delay; we should delay the whole then really to the next week) -> suggestion</Paragraph>
    <Paragraph position="3"> It is important to note that segmentation parsing and dialog act processing work incrementally and in parallel on the incoming stream of word hypotheses. After each incoming word the segmentation parsing and dialog act processing analyze the current input. For instance, dialog act hypotheses are available with the first input word, although good hypotheses may only be possible after most of an utterance has been seen. Our general goal here is to produce hypotheses about segmentation and dialog acts as early as possible in an incremental manner. (Footnote 2: This is also motivated by our additional goal of receiving noisy input directly from a speech recognizer.)</Paragraph>
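The incremental scheme described above can be sketched as a small control loop: after every incoming word, both the segmenter and the dialog act classifier are re-run on the input seen so far, so hypotheses are available from the first word on. This is a minimal illustration, not the SCREEN implementation; all function names and the toy rules are assumptions.

```python
# Minimal sketch of incremental, parallel processing of a word stream.
# After each word, both analyses are applied to the input seen so far.

def incremental_hypotheses(words, segment_fn, dialog_act_fn):
    """Yield a (segmentation, dialog act) hypothesis pair after each word."""
    seen = []
    for word in words:
        seen.append(word)
        yield segment_fn(seen), dialog_act_fn(seen)

# Toy stand-ins (illustrative only): flag a boundary at the conjunction
# "weil", and guess "rejection" once a negative word has appeared.
def toy_segment(seen):
    return "boundary" if seen[-1] == "weil" else "inside"

def toy_dialog_act(seen):
    return "rejection" if "schlecht" in seen else "unknown"

turn = ["dienstags", "um", "zehn", "ist", "schlecht", "weil"]
hyps = list(incremental_hypotheses(turn, toy_segment, toy_dialog_act))
```

Note how the early hypotheses are weak ("unknown") and improve as more of the utterance is seen, matching the observation above that good hypotheses may only be possible later in an utterance.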
  </Section>
  <Section position="4" start_page="740" end_page="741" type="metho">
    <SectionTitle>
3 The Overall Approach
</SectionTitle>
    <Paragraph position="0"> The research presented here is embedded in a larger effort for examining hybrid connectionist learning capabilities for the analysis of spoken language at various acoustic, syntactic, semantic and pragmatic levels. To investigate hybrid connectionist architectures for speech/language analysis we developed the SCREEN system (Symbolic Connectionist Robust Enterprise for Natural language) (Wermter and Weber, 1996). For the task of analyzing spontaneous language we pursue a shallow screening analysis which uses primarily flat representations (like category sequences) wherever possible.</Paragraph>
    <Paragraph position="1"> (Figure: the dialog act component in SCREEN, with example utterances such as &amp;quot;also Friday the nineteenth is not possible&amp;quot; (reject) and &amp;quot;but Thursday afternoon is ok for me&amp;quot;.) The interpretation of utterances is based on syntactic, semantic and dialog knowledge for each word. The syntactic and semantic knowledge is provided by other SCREEN components and has been described elsewhere (Wermter and Weber, 1995). Each word of an utterance is processed incrementally and passed to the segmentation parser and to the dialog act network. The dialog act network provides the currently recognized dialog act for the current flat frame representation of the utterance part. The segmentation parser provides knowledge about utterance boundaries. This is important control knowledge for the dialog act network since without knowing about utterance boundaries the dialog network may assign incorrect dialog acts.</Paragraph>
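The control relationship described above can be made concrete in a short sketch: the segmentation parser flags utterance boundaries, and the dialog act network uses them to reset its context so that acts are not carried across utterances. This is an illustration of the control flow only; the class and function names, and the stand-in classifier, are assumptions, not the SCREEN code.

```python
# Sketch: the parser's boundary decisions act as control knowledge for the
# dialog act network by resetting its per-utterance context.

class DialogActNetwork:
    def __init__(self):
        self.context = []          # words of the current utterance only

    def step(self, word):
        self.context.append(word)
        # Stand-in for the recurrent network: act for the current context.
        return "rejection" if "schlecht" in self.context else "statement"

    def reset(self):
        self.context = []

def process_turn(words, is_boundary, net):
    """Return one dialog act hypothesis per incoming word."""
    acts = []
    for word in words:
        acts.append(net.step(word))
        if is_boundary(word):       # control knowledge from the parser
            net.reset()
    return acts

acts = process_turn(["ist", "schlecht", "weil", "ich", "trainiere"],
                    lambda w: w == "weil", DialogActNetwork())
```

Without the reset at the boundary, the "schlecht" evidence from the rejection would leak into the following statement, which is exactly the failure mode the text describes.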
  </Section>
  <Section position="5" start_page="741" end_page="742" type="metho">
    <SectionTitle>
4 The Segmentation Parser
</SectionTitle>
    <Paragraph position="0"> The segmentation parser receives one word at a time and builds up a flat frame structure in an incremental manner (see tables 2 and 3). Together with each word the segmentation parser receives syntactic and semantic knowledge about this word based on other syntactic and semantic modules in SCREEN. Each word is associated with 1. its most plausible basic syntactic category (e.g. noun, verb, adjective), 2. its most plausible abstract syntactic category (e.g. noun group, verb group, prepositional group), 3. its basic semantic category (e.g., animate, abstract), and 4. its abstract semantic category (e.g., agent, object, recipient).</Paragraph>
    <Paragraph position="1">  (Table 2: incremental translation of &amp;quot;Dienstags um zehn ist bei mir nun wiederum schlecht&amp;quot; (Tuesday at 10 is for me now again bad).) This syntactic and semantic category knowledge is used by the segmentation parser for two main purposes. First, this category knowledge is needed for our segmentation heuristics. For our domain we have developed segmentation rules which allow the system to split turns into utterances. For instance, if we know that the basic syntactic category of the word &amp;quot;because&amp;quot; is conjunction and it is part of a conjunction group, then this is an indication to close the current frame and trigger a new frame for the next utterance. Second, the category knowledge, primarily the abstract semantic knowledge, is used for filling the frames, so that we get a symbolically accessible structure rather than a tagged word sequence.</Paragraph>
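Both uses of the category knowledge can be sketched together: a conjunction heading a conjunction group closes the current frame, and the abstract semantic categories fill the frame slots. This is a hedged sketch under assumed category names and a single rule; the paper's actual rule set and frame format are not reproduced here.

```python
# Sketch of the two uses of category knowledge: (1) a segmentation rule
# that closes a frame at a conjunction-group conjunction, and (2) slot
# filling by abstract semantic category. Category labels are illustrative.

def segment(tagged_words):
    """tagged_words: list of (word, basic_syn, abstract_syn, abstract_sem)."""
    frames, frame = [], {}
    for word, basic_syn, abstract_syn, abstract_sem in tagged_words:
        if basic_syn == "conjunction" and abstract_syn == "conjunction-group" and frame:
            frames.append(frame)      # close the current frame ...
            frame = {}                # ... and trigger a new one
        frame.setdefault(abstract_sem, []).append(word)
    if frame:
        frames.append(frame)
    return frames

turn = [("Dienstags", "noun", "noun-group", "time"),
        ("ist", "verb", "verb-group", "state"),
        ("schlecht", "adjective", "adjective-group", "evaluation"),
        ("weil", "conjunction", "conjunction-group", "connective"),
        ("ich", "pronoun", "noun-group", "agent"),
        ("trainiere", "verb", "verb-group", "action")]
frames = segment(turn)
```

The result is a symbolically accessible structure (frames keyed by semantic slot) rather than a flat tagged word sequence, as the text emphasizes.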
    <Paragraph position="2">  (Table 3: incremental translation of &amp;quot;weil ich da noch trainieren bin&amp;quot; (because I there still train [am]).) The segmentation parser is able to segment 84% of the 184 turns with 314 utterances correctly.</Paragraph>
    <Paragraph position="3"> The remaining 16% are mostly difficult ambiguous cases some of which could be resolved if more knowledge could be used. For instance, while many conjunctions like &amp;quot;because&amp;quot; are good indicators for utterance borders, some conjunctions like &amp;quot;and&amp;quot; and &amp;quot;or&amp;quot; may not start new coordinated subsentences but coordinate noun groups. Fundamental structural disambiguation could be used to deal with these cases. Since they occur relatively rarely in our spoken utterances we have chosen not to incorporate structural disambiguation. Furthermore, another class of errors is characterized by time and location specifiers which can occur at the end or start of an utterance. For instance, consider the example: &amp;quot;On Tuesday the sixth of April I still have a slot in the afternoon - is that possible&amp;quot; versus &amp;quot;On Tuesday the sixth of April I still have a slot - in the afternoon is that possible&amp;quot;. Such decisions are difficult and additional knowledge like prosody might help here.</Paragraph>
    <Paragraph position="4"> Currently, there is a preference for filling the earlier frame.</Paragraph>
  </Section>
  <Section position="6" start_page="742" end_page="743" type="metho">
    <SectionTitle>
5 The Dialog Act Network
</SectionTitle>
    <Paragraph position="0"> In table 1 we have described the dialog acts we use in our domain. Before we start to describe any experiments on learning dialog acts we show the distribution of dialog acts across our training and test sets. Table 4 shows the distribution for our set of 184 turns with 314 utterances. There were 100 utterances in the training set and 214 in the test set. As we can see, suggestions and explanatory statements often occur but in general all dialog acts occur reasonably often. This distribution analysis is important for judging the learning and generalization behavior.</Paragraph>
    <Paragraph position="1">  After this initial distribution analysis we now describe our network architecture for learning dialog acts. Dialog acts depend a lot on significant words and word order. Certain key words are much more significant for a certain dialog act than others. For instance &amp;quot;propose&amp;quot; is highly significant for the dialog act suggest, while &amp;quot;in&amp;quot; is not. Therefore we computed a smoothed dialog act plausibility vector for each word w which reflects the plausibility of the categories for a particular word. The sum of all values is 1 and each value is at least 0.01. The plausibility value of a word w in a dialog category da_i with the frequency f is computed as described in the formula below.</Paragraph>
    <Paragraph position="2"> p_{da_i}(w) = (f_{da_i}(w) + n_0(w) * 0.01) / f(w), where f_{da_i}(w) is the frequency of w in dialog act category da_i, n_0(w) is the number of categories in which w does not occur, and f(w) is the total frequency of w in the corpus. Table 5 shows examples of plausibility vectors for some words. As we can see, &amp;quot;bad&amp;quot; has the highest plausibility for the reject dialog act, and &amp;quot;propose&amp;quot; for the suggest dialog act. On the other hand the word &amp;quot;is&amp;quot; is not particularly significant for certain dialog acts and therefore has a plausibility vector with relatively evenly distributed values.</Paragraph>
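The two constraints stated in the text (the values of a word's plausibility vector sum to 1, and each value is at least 0.01) can be met with a simple additive smoothing of the per-category relative frequencies. The sketch below is one concrete way to satisfy those constraints, not necessarily the authors' exact formula; the function name and the toy counts are assumptions.

```python
# Sketch of a smoothed dialog act plausibility vector: relative frequencies
# of the word per category, shrunk so every value is at least the floor
# (0.01) and the vector still sums to 1. One possible reading of the
# constraints in the text, not a reconstruction of the original code.

def plausibility_vector(category_freqs, floor=0.01):
    """category_freqs: frequency f_da_i(w) of word w per dialog act category."""
    total = sum(category_freqs)                       # total frequency f(w)
    n = len(category_freqs)
    # Scale the raw relative frequencies to leave room for the floor mass.
    return [(f / total) * (1 - n * floor) + floor for f in category_freqs]

# Toy counts for a word like "bad" over four categories
# (suggest, reject, query, miscellaneous); the counts are invented.
vec = plausibility_vector([1, 17, 1, 1])
```

With these toy counts the reject component dominates, mirroring the behavior described for "bad" in table 5, while every category keeps a small nonzero plausibility.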
    <Paragraph position="3"> (Table 5: Three examples for plausibility vectors, for the words &amp;quot;bad&amp;quot;, &amp;quot;propose&amp;quot; and &amp;quot;is&amp;quot;.) We have experimented with different variations of simple recurrent networks (Elman, 1990) for learning dialog act assignment. We had chosen simple recurrent networks since these networks can represent the previous context in an utterance in their recurrent context layer. The best performing network is shown in figure 2.</Paragraph>
    <Paragraph position="4"> (Figure 2: Dialog act network with dialog plausibility vectors as input; layers: input, context, hidden, output.) Input to this network is the current word represented by its dialog plausibility vector. The output is the dialog act of the whole utterance. Between input and output layer there are the hidden layer and the context layer. All the feedforward connections in the network are fully connected.</Paragraph>
    <Paragraph position="5"> Only the recurrent connections from the hidden layer to the context layer are 1:1 copy connections, which represent the internal learned context of the utterance before the current word. Training in these networks is performed by using gradient descent (Rumelhart et al., 1986) using up to 3000 cycles through the training set. By using the internal learned context it is possible to make dialog act assignments for a whole utterance. While processing a whole utterance, each word is presented with its plausibility vector and at the output layer we can check the incrementally assigned dialog acts for each incoming word of the utterance.</Paragraph>
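The forward pass of such a simple recurrent (Elman) network can be sketched in a few lines: the hidden activations are copied 1:1 into a context layer, which is fed back as additional input at the next word. The weights, dimensions, and sigmoid choice below are illustrative assumptions; training by gradient descent is omitted.

```python
# Sketch of an Elman-style forward pass with 1:1 hidden-to-context copy
# connections. Weights are toy values, not trained parameters.
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def srn_forward(sequence, w_ih, w_ch, w_ho):
    """sequence: list of input vectors (e.g. plausibility vectors).
    w_ih: input->hidden, w_ch: context->hidden, w_ho: hidden->output."""
    n_hidden = len(w_ih[0])
    context = [0.0] * n_hidden           # empty context before the utterance
    outputs = []
    for x in sequence:
        hidden = [sigmoid(sum(x[i] * w_ih[i][h] for i in range(len(x)))
                          + sum(context[c] * w_ch[c][h] for c in range(n_hidden)))
                  for h in range(n_hidden)]
        context = hidden[:]              # 1:1 copy connection, not trained
        outputs.append([sigmoid(sum(hidden[h] * w_ho[h][o] for h in range(n_hidden)))
                        for o in range(len(w_ho[0]))])
    return outputs

# Tiny toy dimensions: 2 input units, 2 hidden units, 2 output dialog acts.
outs = srn_forward([[1.0, 0.0], [0.0, 1.0]],
                   w_ih=[[0.5, -0.5], [-0.5, 0.5]],
                   w_ch=[[0.1, 0.1], [0.1, 0.1]],
                   w_ho=[[1.0, -1.0], [-1.0, 1.0]])
```

One output vector is produced per incoming word, which is what allows the incrementally assigned dialog acts described above to be read off at every step.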
    <Paragraph position="6"> We have experimented with different input knowledge (only dialog act plausibility vectors, additional abstract semantic plausibility vectors, etc.), different architectures (different numbers of context layers, and different numbers of units in the hidden layer, etc.). Due to space restrictions it is not possible to describe all these comparisons. Therefore we just focus on the description of the network with the best generalization performance.</Paragraph>
    <Paragraph position="7">  (Table 6: Performance of the network with dialog plausibility vectors in percent.) Table 6 shows the results for our training and test utterances. The overall performance was 82.0% on the training set and 79.4% on the test set. An utterance was counted as classified in the correct dialog act class if the majority of the outputs of the dialog act network corresponded with the desired dialog act. This good performance is partly due to the distributed representation in the dialog plausibility vector at the input layer. Other second best networks with additional local representations for abstract semantic category knowledge could perform better on the training set but failed to generalize on the test set and only reached 71%.</Paragraph>
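The majority-vote evaluation criterion described above is simple enough to state as code: an utterance counts as correct if the most frequent per-word output matches the desired dialog act. The sketch below is illustrative (the function names and toy data are assumptions), not the original evaluation code.

```python
# Sketch of the evaluation criterion: majority vote over the network's
# per-word dialog act outputs, compared against the desired act.
from collections import Counter

def majority_correct(per_word_acts, desired_act):
    """True if desired_act is the most frequent per-word hypothesis."""
    return Counter(per_word_acts).most_common(1)[0][0] == desired_act

def accuracy(utterances):
    """utterances: list of (per-word dialog act hypotheses, desired act)."""
    hits = sum(majority_correct(acts, gold) for acts, gold in utterances)
    return 100.0 * hits / len(utterances)

# Toy data: three utterances, two of which the majority vote gets right.
acc = accuracy([(["suggest", "suggest", "reject"], "suggest"),
                (["reject", "suggest", "suggest"], "suggest"),
                (["query", "reject", "reject"], "suggest")])
```

Note that the vote tolerates a few wrong early hypotheses per utterance, which fits the incremental setting where the first words often carry little evidence.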
    <Paragraph position="8"> The remaining errors are partly due to rarely occurring dialog acts. For instance, there are only 2% of the training utterances and 2.8% of the test utterances which belong to the request-comment dialog act. The network was not able to learn correct assignments due to the small amount of training data. The drop in the performance for the query dialog act from training to test set can be explained by the higher variability of the queries compared to all other categories. Since queries differ much more from each other than all other dialog acts they could not be generalized. However they do not occur very often. All other often occurring dialog act categories performed very well as the individual percentages and the overall percentage show.</Paragraph>
  </Section>
</Paper>