File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/00/w00-1015_intro.xml
Size: 19,722 bytes
Last Modified: 2025-10-06 14:00:57
<?xml version="1.0" standalone="yes"?> <Paper uid="W00-1015"> <Title>Flexible Speech Act Based Dialogue Management</Title> <Section position="4" start_page="0" end_page="134" type="intro"> <SectionTitle> 2 Background </SectionTitle> <Paragraph position="0"> First, we discuss our system architecture and the data flow between modules. Second, we present the application description of a movie service, which we will use for the examples in later sections. Third, we present some of our current primitives. Finally, we describe the dialogue engine and how it uses the application description and other sources to calculate dialogue primitives.</Paragraph> <Section position="1" start_page="0" end_page="131" type="sub_section"> <SectionTitle> 2.1 System Architecture </SectionTitle> <Paragraph position="0"> Our system architecture is presented in Figure 1.</Paragraph> <Paragraph position="1"> The dialogue manager takes an application description (Section 2.2) and a set of dialogue strategies (Sections 3 and 4) as input, both provided by the service designer. The application description describes the parameters needed by the service and is necessarily application dependent. The dialogue strategies contain directions for how the dialogue shall proceed in certain situations, for example, whether to ask for confirmation or spelling of a badly recognized parameter value, or whether to generate system-directed or user-directed dialogue.</Paragraph> <Paragraph position="2"> The output of our dialogue manager is a bag of abstract, language-independent primitives. On the generation side, the GEN-primitives encode the next system utterance; a response generator translates them into text, which is then synthesized. On the recognition side, the REC-primitives represent the dialogue manager's predictions about the next user utterance. REC-primitives are translated into (recognition) contexts and grammars for speech recognition, and they may activate sub-components of a synsem grammar. 
After speech recognition has taken place, the dialogue engine must be told which predictions came true. The pragmatic interpreter therefore maps the output of the synsem interpreter onto a sub-bag of REC-primitives, which is then returned to the dialogue engine for further processing (Section 2.4).</Paragraph> </Section> <Section position="2" start_page="131" end_page="131" type="sub_section"> <SectionTitle> 2.2 Application Description </SectionTitle> <Paragraph position="0"> The application description (AD) specifies the tasks that a service can solve and the parameter values needed to solve them. The AD for a movie service is presented in Figure 2. Our representation is an extended version of and-or trees (footnote 1). In Figure 2, the U-shaped symbols represent and-relations, while the V-shaped symbols represent or-relations. Thus, this movie service can perform three tasks: selling tickets, providing movie information, or providing theatre information. If the user wants to buy tickets, the system needs to acquire six parameter values, e.g., the show time, the date, and the name of the film. Date and show time can be acquired in several ways. For example, a date can be a simple date (e.g., &quot;November 17th&quot;) or a combination of day of the week and week (e.g., &quot;Wednesday this week&quot;).</Paragraph> <Paragraph position="1"> The nodes keep state information. Open nodes have not yet been negotiated, topic nodes are being negotiated, and closed nodes have been negotiated. The currently active task has status active. Parameters can be retrieved through the functions activeTask(AD), openParams(AD), closedParams(AD), and topicParams(AD). Status(p) returns the status of parameter p; tasks(AD) and params(AD) return the task and parameter nodes.</Paragraph> <Paragraph position="2"> Similar hierarchical domain descriptions have been suggested in (Young et al., 1990) for a naval domain and in (Caminero-Gil et al., 1996) for an e-mail assistance domain. 
A tree-like organization of the domain is sufficient for the information retrieval domains that we are currently considering. [Figure 2 caption, partially recovered: Notation: U and V represent and/or-relations. Subscripts t and p denote tasks and parameters.]</Paragraph> <Paragraph position="3"> We expect, however, that in future work we will need to switch to a semantic network structure. Since our future research includes automatic generation of system utterances from our dialogue primitives, we hope to be able to utilize the ontology and domain organization work, which has proven so useful for text generation (Bateman et al., 1994; Bateman et al., 1995), for both dialogue management and text generation.</Paragraph> </Section> <Section position="3" start_page="131" end_page="133" type="sub_section"> <SectionTitle> 2.3 Dialogue Primitives </SectionTitle> <Paragraph position="0"> Following the procedure outlined in Section 2.4, the dialogue manager calculates a bag of primitives for each turn and speaker. Our current collection is motivated by our experience with several domains, e.g., a movie service, a horoscope service, and directory assistance. The collection is not exhaustive, and we will add primitives as wider dialogue coverage is required.</Paragraph> <Paragraph position="1"> Notation: A primitive is written primName(p=v,n), where primName is its name; p ∈ params(AD) ∪ {aTask}; aTask is a special parameter whose values ∈ tasks(AD); v is the value of p; and n is an integer denoting the number of times a primitive has been uttered. If v is uninstantiated, it is left out for readability. Unless otherwise stated, p ∈ params(AD).</Paragraph> <Paragraph position="2"> 2.3.1 GEN-Primitives. Our current GEN-primitives: salutation(p=v): system opens or closes the interaction. p ∈ {hello, goodbye}, v ∈ {morning, day, evening}.</Paragraph> <Paragraph position="3"> requestValue(p): system requests a value for the parameter p. 
p ∈ params(AD) ∪ {aTask}.</Paragraph> <Paragraph position="4"> requestValue(p=v): system asks whether the value v of parameter p is correct. If this form is used, the system has a list of alternative values for p, and v is not a recognition result (e.g., Frankfurt am Main or Frankfurt an der Oder, where Frankfurt is the recognition result). requestValue(aTask=v), v ∈ tasks(AD) ∪ {repeatPreServiceTask, useService, repeatService}: system requests a value for aTask. If v ∈ {repeatPreServiceTask, useService, repeatService}, the system asks whether the user wants the pre-service task repeated, the service started (first task after the pre-service task), or a new task started.</Paragraph> <Paragraph position="5"> requestConfirm(p=v): system asks whether the value v of parameter p is correct. v is a recognition result. p ∈ params(AD) ∪ {aTask}. Ambiguous results not resulting from speech recognition, e.g., Frankfurt am Main vs. Frankfurt an der Oder, would yield multiple requestValue(p=v) primitives.</Paragraph> <Paragraph position="6"> requestValueABC(p): system requests the spelling of the value of parameter p.</Paragraph> <Paragraph position="7"> requestParam(p=v): system asks whether the value v is a value for parameter p.</Paragraph> <Paragraph position="8"> evaluate(p=v): system acknowledges value v of parameter p.</Paragraph> <Paragraph position="9"> promise(p=v): system promises to attempt to answer the user's request. p ∈ params(AD) ∪ {aTask}. v ∈ {pleaseWait}. Only used after navigate(), requestParam(), or requestAlternative() if the user has to wait long for a reply.</Paragraph> <Paragraph position="10"> inform(aTask=v): system informs about the acquired database results. v ∈ activeTask(AD) ∪ {tooMany, zero}. If v = activeTask(AD), there are several answers; if v = tooMany or v = zero, there are either too many answers to be enumerated or zero answers.</Paragraph> <Paragraph position="11"> inform(aTask=n): system presents the n'th answer to the query t. 
n > 0. informAlternative(p): system informs that there are several possible values for p. p ∈ params(AD) ∪ {aTask}. v ∈ {tooMany, null}. If v = tooMany, there are too many alternatives to be enumerated. v = null means that v is uninstantiated, not that there are zero alternatives.</Paragraph> <Paragraph position="12"> informAlternative(p=v): system informs that a possible value of p is v. p ∈ params(AD) ∪ {aTask}.</Paragraph> <Paragraph position="13"> informNegative(p): system informs that the user misrecognized something. p ∈ params(AD) ∪ {aTask}.</Paragraph> <Paragraph position="14"> informPositive(p): system informs that the user recognized something correctly. p ∈ params(AD) ∪ {aTask}.</Paragraph> <Paragraph position="15"> withdraw(p): system withdraws from dialogue for reason p ∈ {error} before it has started negotiations. withdrawOffer(aTask=v): system withdraws an offer for reason v ∈ {error}.</Paragraph> <Paragraph position="16"> withdrawPromise(aTask=v): system withdraws a promise for reason v ∈ {error}.</Paragraph> <Paragraph position="17"> In Section 3, we present several sample instantiations of the primitives.</Paragraph> <Paragraph position="18"> Our current REC-primitives: requestParam(p): user asks which parameter the system requested. p ∈ params(AD) ∪ {null}. requestAlternatives(p): user requests possible values for parameter p.</Paragraph> <Paragraph position="19"> requestConfirm(aTask=n): user asks system to confirm an answer that it has given, e.g., &quot;Was the first answer $30?&quot; 0 &lt; n &lt; number of query results. informValue(p=v): user provides value v for parameter p. p was requested. (footnote 2) informExtraValue(p=v): user provides value v for parameter p. p was not requested in the preceding system utterance.</Paragraph> <Paragraph position="20"> informValueABC(p=v): user spells the value v of parameter p. The spelling is expanded by synsem and expansions are presented to the dialogue manager. 
(footnote 2) informPositive(p=v): user confirms that the value of parameter p is v. p ∈ params(AD) ∪ {aTask}. informNegative(p=v): user disconfirms that the value of parameter p is v. p ∈ params(AD) ∪ {aTask}.</Paragraph> <Paragraph position="21"> correctValue(p=v): user corrects a misrecognized value. Often used together with informNegative. For example, &quot;Hamburg, not Homburg.&quot; (footnote 2) informGarbage(p): user says something, but recognizer and/or synsem could not make sense of it.</Paragraph> <Paragraph position="22"> changeValue(p=v): user changes the value of parameter p to v instead of v'. (footnote 2) repeatValue(p=v): user repeats the value v of parameter p. (footnote 2) correctParam(p=v): user corrects that v is the value of p, not of p'.</Paragraph> <Paragraph position="23"> disambiguate(p=v): user chooses v as the value of p when presented with a choice between several values for p. p ∈ params(AD) ∪ {aTask}.</Paragraph> <Paragraph position="24"> rejectValue(p=v): the user has been given a series of alternatives and chooses p=v'. Primitive is combined with disambiguate(p=v').</Paragraph> <Paragraph position="25"> navigate(aTask=v): user navigates in the query results. v ∈ {forward, backward, repeat, n} where 0 &lt; n &lt; number of query results. (footnote 2) rejectRequest(p=v): user ignores or does not hear the system request. v ∈ {null, didNotHear}.</Paragraph> <Paragraph position="26"> rejectOffer(aTask=v): user ignores or does not hear the system offer. v ∈ tasks(AD) ∪ {null, didNotHear}. evaluate(t=v): user evaluates an answer she has received. v ∈ {positive, neutral, negative, cancel}. cancel is used to end the current dialogue after at least one answer has been given and start a new one without calling again.</Paragraph> <Paragraph position="27"> promise(p): user promises to find a value for p. withdrawAccept(aTask=v): user withdraws from the conversation for reason v ∈ {cancel, hangup}. 
With cancel, the user ends the current dialogue before an answer has been given and starts a new task without calling again. (footnote 2) withdrawPromise(p=v): user withdraws a promise to provide a value for reason v ∈ {cancel, hangup}. (footnote 2) withdrawRequest(p=v): user withdraws a request. p ∈ params(AD) ∪ {forward, backward, repeat, n}. (footnote 2) null(): returned to the dialogue manager if the user does not say anything and is not expected to say anything, e.g., after a greeting or promise. In Section 3, we present several sample instantiations of the primitives.</Paragraph> </Section> <Section position="4" start_page="133" end_page="134" type="sub_section"> <SectionTitle> 2.4 Dialogue Engine </SectionTitle> <Paragraph position="0"> The dialogue engine (Hagen, 1999) consists of a reasoning engine and several knowledge sources: an AD defines an application's data needs, a dialogue grammar defines how a dialogue may proceed at the level of speech acts, and a dialogue history is a dynamically growing parse tree of an ongoing dialogue with respect to the dialogue grammar. Other knowledge sources may be required, for instance, recognition confidence or disambiguation of city names.</Paragraph> <Paragraph position="1"> The dialogue engine calculates the next turn by consulting and combining information from the knowledge sources. It consults the dialogue history and the dialogue grammar in order to calculate which speech acts may continue a dialogue. Speech acts have no propositional content; thus, in the context of the current dialogue history and the state of the application description, they are translated into dialogue primitives, which do have content, for example, the name of a parameter and a potential value for this parameter. Here we will walk through an example of how some primitives are calculated in a simple question-answer dialogue.</Paragraph> <Paragraph position="2"> Example: For our example we will use the AD in Figure 2. 
Assume that the task has already been negotiated and set to theatre information (i.e., activeTask(AD) = theatreInfo), so the system needs to acquire the name of the theatre and the name of the city. All other nodes in the AD are closed since they are not relevant to this task. The speech act grammar used in our system is presented in Appendix A, but we will use a trivial grammar for the example. It can account for simple question-answer dialogues where a request from the system (sys) is followed by an inform from the user (usr). The system can respond to the inform with a sub-dialogue (footnote 3):</Paragraph> <Paragraph position="4"> The dialogue history reflects all previous negotiations (here: task theatreInfo).</Paragraph> <Paragraph position="6"> The next turn can be rooted either in the Inform(usr) after the inform(usr) or in the Dialogue(sys) after Inform(usr).</Paragraph> <Paragraph position="7"> With all the above knowledge sources in place, the calculation of the next dialogue turn can start: 1. The last speech act in the dialogue history gives us a starting point in the grammar. Moving forward from inform(usr), the next atomic speech act is request(sys), either as a flat structure (i.e., request(sys) off Dialogue(sys)) or in a sub-dialogue (i.e., Dialogue(sys)+request(sys) off Inform(usr)).</Paragraph> <Paragraph position="8"> 2. Knowing that the system can request something, the dialogue engine consults the AD to determine what the system can ask about. The flat structure (request(sys)) represents negotiation of the task, but since we assume that negotiation of the task is complete (i.e., Status(theatreInfo) = active), this speech act is not interpreted into a primitive. [Footnote 3: The star (*) means that a dialogue may contain several request(sys) + Inform(usr) sequences. Lower-case speech acts are atomic, while others are complex. The dialogue in square brackets ([]) is optional.]</Paragraph> <Paragraph position="9"> Next we consider the sub-dialogue structure. 
Both children of theatreInfo are open (i.e., they have not been negotiated yet), so the system randomly chooses to pursue city, whose state is changed to topic. The speech act and the parameter are combined into the primitive requestValue(city), i.e., request a value for the parameter city (e.g., &quot;In which city is the theatre?&quot;). We chose to use the sub-dialogue structure instead of the flat structure to represent negotiation of parameter values since they are subordinate to the task in the sense that the task dictates which parameter values are needed. This is also the case for the real grammar (Appendix A).</Paragraph> <Paragraph position="10"> 3. The primitive requestValue(city) is added to the dialogue history:</Paragraph> <Paragraph position="12"> 4. The grammar states that inform(usr) (i.e., Inform(usr) + inform(usr)) is the next speech act in the dialogue.</Paragraph> <Paragraph position="13"> requestValue(city) was the last primitive spoken.</Paragraph> <Paragraph position="14"> Reasoning that a user inform in response to a system requestValue should involve the same parameter as the system's requestValue, the engine combines this information to form the primitive informValue(city), i.e., the user should respond to the system request with a value for the parameter city. Let us assume that the user replied &quot;Hong Kong&quot;; the dialogue history is then expanded:</Paragraph> <Paragraph position="16"> 5. Starting from inform(usr), the grammar returns request(sys) and Dialogue(sys)+request(sys). Since a recognition result is available from the previous turn, the engine checks its recognition confidence. If it is high, the engine would consider the negotiation of city finished, change its state to closed, and discard Dialogue(sys)+request(sys) since there is nothing to be requested about a closed parameter. 
It would translate request(sys) into requestValue(theatre) since theatre is the only remaining open parameter.</Paragraph> <Paragraph position="17"> If confidence is low, the dialogue engine may decide to ask the user to confirm the recognized value. In that case, Dialogue(sys)+request(sys) would be interpreted into requestConfirm(city=Hong Kong). Whether request(sys) would be interpreted or not depends on the dialogue strategies chosen by the service designer (see Sections 3 and 4).</Paragraph> <Paragraph position="18"> If confidence is extremely low, the dialogue engine may decide to repeat the question. In that case, request(sys) would be interpreted into requestValue(city, 2), while the sub-dialogue structure would be discarded.</Paragraph> <Paragraph position="19"> 6. Any interpretation of the flat structure would result in the following addition to the last Dialogue(sys) in the dialogue history.</Paragraph> <Paragraph position="21"> Our example shows how a speech act can result in several primitives depending on the context, and thus how the dialogue manager dynamically reacts to external events. Although this brief description may not show it, our dialogue manager can handle mixed-initiative dialogue (Hagen, 1999). In (Hagen, 1999), we also present our theory of taking, keeping, and relinquishing the initiative.</Paragraph> <Paragraph position="22"> Heisterkamp and McGlashan (1996) presented an approach that uses a division of functionality similar to ours: task (=application), contextual (=synsem + pragmatic), and pragmatic interpretation (=dialogue engine). They also use abstract parameterized units similar to ours, but they do not use a speech act grammar to calculate the units. Rather, they map contextual functions onto dialogue goals, e.g., the function new_for_system(goalcity:munich) introduces the dialogue goal confirm(goalcity:munich). In terms of our primitives, this could be expressed as requestConfirm() follows informValue(). 
We choose not to start our modelling at this level since we want to be able to vary what follows informValue(), e.g., requestConfirm(), requestValueABC(), or evaluate().</Paragraph> </Section> </Section> </Paper>
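The confidence-dependent choice in step 5 of the Section 2.4 example can be sketched in Python. This is a minimal illustration under stated assumptions, not the dialogue engine of (Hagen, 1999). The classes Param and Task, the helpers open_params, fmt, and next_gen_primitive, and the numeric thresholds HIGH and VERY_LOW are all hypothetical; the text only distinguishes high, low, and extremely low confidence. The sketch also takes the first open parameter where the text says the system chooses randomly, so that the result is reproducible, and it collapses the grammar lookup into a direct choice of primitive.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Param:
    """A parameter node of the application description (AD)."""
    name: str
    status: str = "open"  # "open", "topic", or "closed" (Section 2.2)


@dataclass
class Task:
    """A task node; its children are the parameters the task needs."""
    name: str
    params: List[Param] = field(default_factory=list)
    status: str = "active"


def open_params(task: Task) -> List[Param]:
    """Hypothetical counterpart of openParams(AD), restricted to one task."""
    return [p for p in task.params if p.status == "open"]


def fmt(name: str, param: str, value: Optional[str] = None, n: int = 1) -> str:
    """Render a primitive as primName(p=v, n); v and n = 1 are left out."""
    body = param if value is None else f"{param}={value}"
    return f"{name}({body})" if n == 1 else f"{name}({body}, {n})"


# Assumed thresholds: the text gives no numbers.
HIGH, VERY_LOW = 0.8, 0.3


def next_gen_primitive(task: Task, topic: Param, value: str,
                       confidence: float) -> Optional[str]:
    """Choose the next GEN-primitive after the user's informValue(topic=value)."""
    if confidence >= HIGH:
        topic.status = "closed"           # negotiation of topic is finished
        remaining = open_params(task)
        if not remaining:
            return None                   # nothing left to request
        remaining[0].status = "topic"     # first open parameter, not random
        return fmt("requestValue", remaining[0].name)
    if confidence < VERY_LOW:
        return fmt("requestValue", topic.name, n=2)   # repeat the question
    return fmt("requestConfirm", topic.name, value)   # ask for confirmation


if __name__ == "__main__":
    for conf in (0.9, 0.5, 0.1):
        task = Task("theatreInfo", [Param("city", "topic"), Param("theatre")])
        print(conf, next_gen_primitive(task, task.params[0], "Hong Kong", conf))
```

With these assumed thresholds, the three confidence levels yield requestValue(theatre), requestConfirm(city=Hong Kong), and requestValue(city, 2), matching the three cases described in step 5. In the system described above, the mapping from confidence to primitive is governed by the dialogue strategies chosen by the service designer rather than by fixed constants.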