<?xml version="1.0" standalone="yes"?> <Paper uid="W97-0402"> <Title>A Dialogue Analysis Model with Statistical Speech Act Processing for Dialogue Machine Translation*</Title> <Section position="5" start_page="10" end_page="12" type="metho"> <SectionTitle> 3 Statistical Speech Act Processing </SectionTitle> <Paragraph position="0"> We construct a statistical dialogue model based on speech acts as follows.</Paragraph> <Paragraph position="1"> Let D denote a dialogue which consists of a sequence of n utterances, U1, U2, ..., Un, and let Si denote the speech act of Ui. With this notation,</Paragraph> <Paragraph position="3"> P(D) = prod_{i=1..n} P(Ui|U1, U2, ..., Ui-1) ≈ prod_{i=1..n} P(Ui|Si) P(Si|S1, S2, ..., Si-1), (1) where P(Ui|U1, U2, ..., Ui-1) is the probability that Ui will be uttered given a sequence of utterances U1, U2, ..., Ui-1. As shown in equation (1), we can approximate P(Ui|U1, U2, ..., Ui-1) by the product of the sentential probability P(Ui|Si) and the contextual probability P(Si|S1, S2, ..., Si-1) (Nagata and Morimoto 1994). In subsequent sections, we describe the details of each probability.</Paragraph> <Paragraph position="5"/> <Section position="1" start_page="10" end_page="11" type="sub_section"> <SectionTitle> 3.1 Sentential Probability </SectionTitle> <Paragraph position="0"> There is a strong relation between the speaker's speech act and the surface utterances expressing that speech act (Allen 1989; Andernach 1996). That is, the speaker utters a sentence which best expresses his/her intention (speech act). This sentence allows the hearer to infer what the speaker's speech act is. However, a sentence can be used to perform several speech acts, depending on its context.</Paragraph> <Paragraph position="1"> The sentential probability P(Ui|Si) represents the relationship between speech acts and the features of surface sentences. In this paper, we approximate an utterance with a syntactic pattern, which consists of selected syntactic features.</Paragraph> <Paragraph position="2"> We define a syntactic pattern as a fixed number of syntactic features.
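The decomposition in equation (1) can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the probability tables below are invented toy values, and the contextual term is truncated to a bigram for brevity.

```python
# Toy sentential probabilities P(Ui|Si): how well an utterance's surface
# form fits a candidate speech act (invented values for illustration).
p_sentential = {
    ("what kind of room do you have", "ask-ref"): 0.6,
    ("we have single and double rooms", "response"): 0.7,
}

# Toy contextual probabilities P(Si | previous act), truncated to a
# bigram; None stands for the start of the dialogue.
p_contextual = {
    (None, "ask-ref"): 0.2,
    ("ask-ref", "response"): 0.5,
}

def dialogue_probability(tagged_utterances):
    """Score a dialogue D = U1..Un as the product of the sentential and
    contextual probabilities, following equation (1)."""
    score, prev_act = 1.0, None
    for utterance, act in tagged_utterances:
        score *= p_sentential.get((utterance, act), 1e-6)
        score *= p_contextual.get((prev_act, act), 1e-6)
        prev_act = act
    return score

d = [("what kind of room do you have", "ask-ref"),
     ("we have single and double rooms", "response")]
```

Higher-scoring speech act sequences correspond to more plausible analyses of the dialogue.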
Sentence Type, Main-Verb, Aux-Verb, and Clue-Word are selected as the syntactic features since they provide strong cues for inferring speech acts. The features of a syntactic pattern, with possible entries, are shown in figure 1.</Paragraph> <Paragraph position="3"> * Sentence Type represents the mood of an utterance. Assert, YN-Quest, WH-Quest, and Imperative are possible sentence types.</Paragraph> <Paragraph position="4"> * Main-Verb is the type of the main verb in the utterance. PA is used when the main verb represents a state, and PV for verbs of type event or action. Utterances without verbs belong to FRAG (fragment). In the case of performative verbs (e.g., promise, request, etc.), the lexical items themselves are used as the Main-Verb because these are closely tied to specific speech acts.</Paragraph> <Paragraph position="5"> * Clue-Word is a clue word that appears in an utterance having particular speech acts, such as Yes, No, O.K., and so on.</Paragraph> <Paragraph position="6"> We extracted 167 pairs of speech acts and syntactic patterns from a dialogue corpus automatically, using a conventional parser. Applying these syntactic patterns to all utterances in the corpus, we found that the average number of speech act ambiguities per utterance is 3.07. Table 1 shows a part of the syntactic patterns extracted from the corpus.</Paragraph> <Paragraph position="7"> Since a syntactic pattern can be matched with several speech acts, we compute the sentential probability P(Ui|Si) using a probabilistic score calculated from the corpus. Equation (2) represents the approximated sentential probability.
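Assuming equation (2) takes the usual relative-frequency form freq(Fi, Si)/freq(Si), the estimate can be sketched as follows; the pattern/speech-act pairs are invented stand-ins for the 167 patterns extracted from the corpus.

```python
from collections import Counter

# Invented (syntactic pattern, speech act) observations; the real model
# extracts these pairs from a dialogue corpus with a parser.
observations = [
    ("YN-Quest/PV", "ask-if"),
    ("YN-Quest/PV", "ask-if"),
    ("Assert/PA", "ask-if"),
    ("YN-Quest/PV", "request-confirm"),
]

pair_freq = Counter(observations)
act_freq = Counter(act for _, act in observations)

def sentential_probability(pattern, act):
    """P(Ui|Si) ~ freq(Fi, Si) / freq(Si): how often speech act Si
    is realized with syntactic pattern Fi in the corpus."""
    if act_freq[act] == 0:
        return 0.0
    return pair_freq[(pattern, act)] / act_freq[act]
```

Unseen acts get probability zero here; a real system would smooth these counts.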
F denotes the syntactic pattern, and freq denotes the frequency count of its argument.</Paragraph> <Paragraph position="9"> P(Ui|Si) ≈ P(Fi|Si) = freq(Fi, Si) / freq(Si) (2) </Paragraph> </Section> <Section position="2" start_page="11" end_page="12" type="sub_section"> <SectionTitle> 3.2 Contextual Probability </SectionTitle> <Paragraph position="0"> The contextual probability P(Si|S1, S2, ..., Si-1) is the probability that an utterance with speech act Si is uttered given that utterances with speech acts S1, S2, ..., Si-1 were previously uttered. Since previous speech acts constrain the possible speech acts of the next utterance, contextual information plays an important role in determining the speech act of an utterance. For example, if an utterance with the ask-ref speech act is uttered, then the next speech act would be one of response, request-confirm, and reject. In this case, response would be the most likely candidate.</Paragraph> <Paragraph position="1"> The following table shows an example of the speech act bigrams.</Paragraph> <Paragraph position="2"> This table shows that response is the most likely candidate speech act for the utterance following an utterance with the ask-ref speech act. Request-confirm and ask-if are also probable candidates.</Paragraph> <Paragraph position="3"> Since it is impossible to consider all preceding utterances S1, S2, ..., Si-1 as contextual information, we use the n-gram model. However, simply using the n utterances linearly adjacent to an utterance as contextual information is problematic because of the subdialogues which frequently occur in a dialogue. Let us consider an example dialogue.</Paragraph> <Paragraph position="4"> 1. A : I would like to reserve a room.</Paragraph> <Paragraph position="5"> 2. B : What kind of room do you want? 3. A : What kind of room do you have? 4. B : We have single and double rooms.</Paragraph> <Paragraph position="6"> 5. A : A single room, please.</Paragraph> <Paragraph position="8"> In dialogue 3, utterances 3-4 are part of an embedded segment. In utterance 3, the speaker asks for the type of rooms without responding to B's question (utterance 2). This subdialogue continues up to utterance 4. As the example shows, dialogues cannot be viewed as a linear sequence of utterances; rather, dialogues have a hierarchical structure. Therefore, if we use the n utterances linearly adjacent to an utterance as its context, we cannot reflect the hierarchical structure of a dialogue in the model.</Paragraph> <Paragraph position="9"> Therefore, we approximate the context of an utterance as the speech acts of the n utterances which are hierarchically recent to it. An utterance A is hierarchically recent to an utterance B if A is adjacent to B in the tree structure of the discourse (Walker 1996). Equation (3) represents the approximated contextual probability in terms of hierarchical recency in the case of a trigram. In this equation, Ui is adjacent to Uj and Uj is adjacent to Uk in the discourse structure, where 1 ≤ k &lt; j ≤ i-1.</Paragraph> <Paragraph position="11"> P(Si|S1, S2, ..., Si-1) ≈ P(Si|Sj, Sk) (3) </Paragraph> </Section> </Section> <Section position="6" start_page="12" end_page="13" type="metho"> <SectionTitle> 4 Discourse Structure Analysis </SectionTitle> <Paragraph position="0"> Now we can define a discourse structure analysis model with statistical speech act processing.</Paragraph> <Paragraph position="1"> Formally, we choose the Si which maximizes the probability max_{Si} P(Fi|Si) P(Si|Sj, Sk), (4) where Si is a possible speech act for the utterance Ui, and Uj and Uk are utterances such that Uj is hierarchically adjacent to Ui and Uk to Uj, where 1 ≤ k &lt; j ≤ i-1.</Paragraph> <Paragraph position="2"> One problem with equation (4) is searching all possible Uj that Ui can be connected to. We use dialogue transition networks (DTNs) and a stack to maintain the dialogue state efficiently. The dialogue transition networks describe the possible flow of speech acts in dialogues, as shown in figure 2 (Seo et al. 1994; Jin Ah Kim et al. 1995).
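The maximization in equation (4) can be sketched as follows. The candidate set and both probability tables are hypothetical values chosen to mirror the ask-ref example above, not numbers from the paper.

```python
def best_speech_act(pattern, s_j, s_k, candidates, p_sent, p_ctx):
    """Pick the speech act Si maximizing P(Fi|Si) * P(Si|Sj, Sk),
    where Sj and Sk are the hierarchically recent speech acts."""
    return max(candidates,
               key=lambda s: p_sent.get((pattern, s), 0.0)
                             * p_ctx.get((s_k, s_j, s), 0.0))

# Hypothetical tables: after request ... ask-ref, a response is likeliest.
p_sent = {("Assert/PA", "response"): 0.5, ("Assert/PA", "accept"): 0.3}
p_ctx = {("request", "ask-ref", "response"): 0.6,
         ("request", "ask-ref", "accept"): 0.1}

act = best_speech_act("Assert/PA", s_j="ask-ref", s_k="request",
                      candidates=["response", "accept"],
                      p_sent=p_sent, p_ctx=p_ctx)
```

In the full system, the candidate set comes from the matched syntactic patterns, and (Sj, Sk) are supplied by the dialogue transition network stack described below.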
Since a DTN is defined using recursive transition networks, it can handle recursively embedded subdialogues. It works just like an RTN parser (Woods 1970). If a subdialogue is initiated, a new dialogue transition network is started and the current state is pushed onto the stack. On the other hand, if a subdialogue ends, its dialogue transition network is terminated and the previous state is popped from the stack. This process continues until the dialogue is finished.</Paragraph> <Paragraph position="3"> With the DTN and the stack, the system generates expectations for all possible speech acts of the next utterance. For example, let us consider dialogue 3. Figure 3 shows the transitions for dialogue 3. In utterance 2, according to the RA diagram in figure 2, B may request-confirm or request-information. Since B asks for the type of rooms, a push operation occurs and an RI diagram is initiated. In utterance 3, A does not know the possible room sizes and hence asks B to provide such information. Therefore, a push operation occurs again and a new RI diagram is initiated. This diagram is continued by response in utterance 4. In utterance 5, this diagram is popped from the stack by the response to ask-ref in utterance 2.</Paragraph> <Paragraph position="4"> In this state, several continuations of the dialogue can be expected. Therefore, if we assume that ask-if and request-confirm are possible from the syntactic pattern of the next utterance, then the following table can be expected for the next utterance from the dialogue transition networks.</Paragraph> <Paragraph position="5"> We believe that this is not enough to cover all the phenomena of dialogues. However, considering that the utterances requiring context for translation are relatively few, it is practically acceptable for dialogue machine translation.</Paragraph> </Section> </Paper>