<?xml version="1.0" standalone="yes"?> <Paper uid="P06-2120"> <Title>Stochastic Discourse Modeling in Spoken Dialogue Systems Using Semantic Dependency Graphs</Title> <Section position="4" start_page="937" end_page="940" type="metho"> <SectionTitle> 2 Semantic Dependency Graph </SectionTitle> <Paragraph position="0"> Because speech act theory was developed to extract the functional meaning of an utterance in a dialogue (Searle, 1979), the discourse, or history, can be defined as a sequence of speech acts, $H = \{SA^1, SA^2, \ldots, SA^{t-1}\}$, and speech act theory can accordingly be adopted for discourse modeling. Based on this definition, semantic discourse analysis using dependency graphs tries to identify the speech act sequence of the discourse. Discourse modeling by means of speech act identification that considers the history is therefore expressed as Equation (1), which introduces the hidden variable $D_i$ representing the i-th possible dependency graph derived from the word sequence W. The dependency relation $r_k$ between a word $w_k$ and its headword $w_{kh}$ is denoted</Paragraph> <Paragraph position="2"> $DR_k(w_k, w_{kh}) \equiv r_k$. The dependency graph, which is composed of the set of dependency relations in the word sequence W, is defined as $D_i(W) = \{DR_k(w_k, w_{kh})\}_k$.</Paragraph> <Paragraph position="4"> The probability of hypothesis $SA^t$</Paragraph> <Paragraph position="6"> given the word sequence W and history H is described by Equation (1). According to Bayes' rule, the speech act identification model can be decomposed into two components, $P(SA^t \mid D_i, W, H)$ and $P(D_i \mid W, H)$:</Paragraph> <Paragraph position="8"> $$SA^* = \arg\max_{SA^t} P(SA^t \mid W, H) = \arg\max_{SA^t} \sum_{D_i} P(SA^t, D_i \mid W, H) = \arg\max_{SA^t} \sum_{D_i} P(SA^t \mid D_i, W, H)\, P(D_i \mid W, H) \quad (1)$$</Paragraph> <Paragraph position="10"> where $SA^*$ and $SA^t$ are the most probable speech act and the potential speech act at the t-th dialogue turn, respectively. $W = \{w_1, w_2, \ldots, w_m\}$ denotes the word sequence extracted from the user's utterance without considering the stop words.
H is the history representing the previous t-1 turns.</Paragraph> <Paragraph position="12"/> <Section position="1" start_page="938" end_page="939" type="sub_section"> <SectionTitle> 2.1 Speech act identification using semantic dependency with discourse analysis </SectionTitle> <Paragraph position="0"> In this analysis, we apply semantic dependency, the word sequence, and discourse analysis to the identification of the speech act. Since $D_i$ is the i-th possible dependency graph derived from the word sequence W, speech act identification with semantic dependency can be simplified as Equation (2).</Paragraph> <Paragraph position="2"/> <Paragraph position="4"> As the history is defined as the speech act sequence, the joint probability of $D_i$, $SA^t$, and H</Paragraph> <Paragraph position="6"> can be expressed as Equation (4).</Paragraph> <Paragraph position="7"> Because of data sparseness in the training corpus, the probability</Paragraph> <Paragraph position="9"> $P(D_i, SA^t, SA^{t-1}, \ldots, SA^1)$ is hard to estimate, so a speech act bigram model is adopted as an approximation.</Paragraph> <Paragraph position="11"> To combine the semantic and syntactic structures, the relations defined in HowNet are employed as the dependency relations, and the hypernym is adopted as the semantic concept according to the primary features of the words defined in HowNet. The headwords are decided by the part-of-speech (POS) based algorithm proposed by Academia Sinica in Taiwan. The probabilities of the headwords are estimated with a probabilistic context-free grammar (PCFG) trained on the Treebank developed by Sinica (Chen et al., 2001). That is, the headwords are extracted according to the syntactic structure, and the dependency graphs are constructed from the semantic relations defined in HowNet.
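As a minimal sketch of the decision rule in Equation (1), and not the implementation used in this work, the following Python fragment sums the product of the graph probability and the speech act probability over the candidate dependency graphs and then takes the argmax. The function name, the dictionary layout, and all numbers are hypothetical; the speech act table is assumed to be already conditioned on the previous speech act, in line with the bigram approximation.

```python
from collections import defaultdict


def identify_speech_act(graph_posteriors, act_given_graph):
    """Return the speech act with the largest score summed over graphs.

    graph_posteriors: {graph_id: P(D_i | W, H)}            (hypothetical input)
    act_given_graph:  {graph_id: {speech_act: P(SA | D_i, H)}}
    The second table is assumed to be conditioned on the previous
    speech act only (bigram approximation of the history H).
    """
    scores = defaultdict(float)
    for graph_id, p_graph in graph_posteriors.items():
        for act, p_act in act_given_graph[graph_id].items():
            # Equation (1): accumulate P(SA | D_i, W, H) * P(D_i | W, H).
            scores[act] += p_act * p_graph
    best_act = max(scores, key=scores.get)
    return best_act, dict(scores)


# Toy probabilities, invented purely for illustration.
graph_posteriors = {"D1": 0.7, "D2": 0.3}
act_given_graph = {
    "D1": {"confirm": 0.6, "query": 0.4},
    "D2": {"confirm": 0.1, "query": 0.9},
}
best_act, scores = identify_speech_act(graph_posteriors, act_given_graph)
```

With these toy numbers the summed score is 0.45 for `confirm` and 0.55 for `query`, so the sketch selects `query` even though the most probable single graph favours `confirm`; this is the effect of summing over the hidden variable rather than committing to one graph.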
According to the previous definitions, the independence assumption, and bigram smoothing of the speech act model using the back-off procedure, we can rewrite the equation, where $\alpha$ is the mixture factor for normalization.</Paragraph> <Paragraph position="12"> According to the conceptual representation of the word, the transformation function $f(\cdot)$ transforms a word into its hypernym, which is defined as its semantic class using HowNet. The dependency relation between the semantic classes of two words is mapped to the conceptual space, and the semantic roles among the dependency relations are also obtained. On condition that</Paragraph> <Paragraph position="14"> and the relations are independent, the equation simplifies, and its terms are estimated according to Equations (7) and (8), respectively.</Paragraph> <Paragraph position="16"> where $C(\cdot)$ represents the number of events in the training corpus. With the definitions in Equations (7) and (8), Equation (6) becomes practicable.</Paragraph> </Section> <Section position="2" start_page="939" end_page="940" type="sub_section"> <SectionTitle> 2.2 Semantic dependency analysis using word sequence and discourse </SectionTitle> <Paragraph position="0"> Although the discourse can be expressed as the speech act sequence</Paragraph> <Paragraph position="2"> Because several dependency graphs can be generated from the word sequence W, by introducing the hidden factor $D_i$</Paragraph> <Paragraph position="4"> the probability $P(W)$ can be written as the sum of the probabilities $P(W, D_i)$</Paragraph> <Paragraph position="6"> over the candidate graphs. Further, the dependency relations are assumed to be independent of each other, and the probability is therefore simplified as</Paragraph> <Paragraph position="8"> The probability of the dependency relation between words is defined as that between the concepts, that is, the hypernyms of the words, and then the dependency rules are introduced. The</Paragraph> <Paragraph position="10"/> <Paragraph position="12"> where the function $f(\cdot)$
denotes the transformation from the words to the corresponding semantic classes.</Paragraph> </Section> </Section> <Section position="5" start_page="940" end_page="943" type="metho"> <SectionTitle> 3 Experiments </SectionTitle> <Paragraph position="0"> To evaluate the proposed method, a spoken dialogue system for the medical domain with multiple services was investigated. Three main services are used: a registration information service, a clinic information service, and an FAQ information service.</Paragraph> <Paragraph position="1"> The system mainly provides on-line registration. For this purpose, health education documents are provided as the FAQ files, and the inference engine, which derives clinic information from the patients' syndromes, is constructed according to a medical encyclopedia. An example is illustrated in Figure 2. [Figure 2: An example of dialog] Twelve speech acts are defined and shown in Figure 1.</Paragraph> <Paragraph position="2"> Every service corresponds to the 12 speech acts with different probabilities.</Paragraph> <Paragraph position="3"> The speech recognition engine embedded in the dialog system is based on Hidden Markov Models (HMMs). The feature vector consists of 26 MFCC coefficients. The decoding strategy is based on the classical Viterbi algorithm. The character error rate (CER) of the Chinese speech recognition system is 18.3 percent, and the vocabulary size is 25,132.</Paragraph> <Section position="1" start_page="940" end_page="940" type="sub_section"> <SectionTitle> 3.1 Analysis of corpus </SectionTitle> <Paragraph position="0"> The training corpus was collected by on-line recording at National Cheng Kung University Hospital in the first phase and by the Wizard-of-Oz method in the second phase. In total, the corpus contains 1,862 dialogues with 13,986 sentences.
The frequencies of the speech acts used in the system are shown in Figure 3.</Paragraph> <Paragraph position="1"> The number of dialogue turns is also important to the success of the dialogue task. In the corpus, dialogues with more than 15 turns usually failed to complete the task; that is, the common ground could not be achieved. These failed dialogues were filtered out of the training corpus before the following experiments were conducted. The distribution of the number of turns per dialogue is shown in Figure 4.</Paragraph> </Section> <Section position="2" start_page="940" end_page="941" type="sub_section"> <SectionTitle> 3.2 Precision of speech act identification related to the corpus size </SectionTitle> <Paragraph position="0"> The size of the training corpus is crucial to the practicability of the proposed method. In this experiment, we analyze how the number of training sentences affects the precision rate of speech act identification using the semantic dependency graphs with and without the discourse information. The precision rates for speech act identification reached 95.6 and 92.4 percent for training corpora containing 10,036 and 7,012 sentences, using semantic dependency graphs with and without history, respectively. This means that the semantic dependency graph with discourse outperforms that without discourse, but more training data are needed to include the discourse in speech act identification. Fig. 5 shows the relationship between the speech act identification rate and the size of the training corpus. The figure shows that the semantic dependency graph with discourse analysis needs more training sentences than that without discourse.
This implies that discourse analysis plays an important role in the identification of the speech act.</Paragraph> </Section> <Section position="3" start_page="941" end_page="943" type="sub_section"> <SectionTitle> 3.3 Performance analysis of semantic dependency graph </SectionTitle> <Paragraph position="0"> To evaluate the performance, two systems were developed for comparison. One is based on the Bayes' classifier (Walker et al., 1997), and the other uses the partial pattern tree (Wu et al., 2004) to identify the speech act of the user's utterances. Since the dialogue discourse is defined as a sequence of speech acts, the prediction of the speech act of a new input utterance becomes the core issue for discourse modeling. The accuracy for speech act identification is shown in Table 1.</Paragraph> <Paragraph position="1"> The results show that semantic dependency graphs obtain an obvious improvement compared to the other approaches. The reason is that identifying the speech act requires not only the meanings of the words or concepts but also the structural information and the implicit semantic relations defined in the knowledge base. Besides, taking the discourse into consideration improves the prediction of the speech act of the new or next utterance. This means the discourse model can improve the accuracy of speech act identification; that is, discourse modeling can help understand the user's intention, especially when the answer is very short. For example, the user may only say &quot;yes&quot; or &quot;no&quot; for confirmation.
Misclassification of the speech act then happens because of the limited information.</Paragraph> <Paragraph position="2"> However, a better interpretation can be obtained by introducing the semantic dependency relations as well as the discourse information.</Paragraph> <Paragraph position="3"> To obtain a single measurement, the average accuracy for speech act identification is shown in Table 1; the best result is obtained by the semantic dependency graphs with the discourse. This means the discourse information can help speech act identification. In addition, the semantic dependency graph outperforms the traditional approaches because of the semantic analysis of words with their corresponding relations.</Paragraph> <Paragraph position="4"> The success of a dialog depends on achieving common ground between the users and the machine, which is the most important issue in dialogue management. To compare the semantic dependency graph with previous approaches, 150 individuals who were not involved in the development of this project were asked to use the dialogue system so that the task success rate could be measured. After the incomplete tasks were filtered out, 131 dialogs were used as the analysis data in this experiment. The results are listed in Table 2.</Paragraph> <Paragraph position="5"> We found that the dialogue completion rate and the average length of the dialogs using the dependency graph are better than those using the Bayes' classifier and the partial pattern tree approach. There are two main reasons. First, the dependency graph can keep the most important information in the user's utterance, while in the semantic slot/frame approach, the semantic objects that do not match the semantic slot/frame are generally filtered out. The dependency graph approach is thus able to skip the repetition of similar utterances that fill the same information into different semantic slots.
Second, the dependency graph-based approach can provide inference to help interpret the user's intention.</Paragraph> <Paragraph position="6"> For semantic understanding, correct interpretation of the information in the user's utterances is essential. Correct speech act identification and correct extraction of the semantic objects are both important issues for semantic understanding in spoken dialogue systems. Five main categories in the medical application (clinic information, doctor's information, confirmation of the clinic information, registration time, and clinic inference) are analyzed in this experiment.</Paragraph> <Paragraph position="7"> According to the results shown in Table 3, the worst case occurred in queries for the doctor's information using the partial pattern tree. Mis-identification of the speech act results in unmatched semantic slots/frames. This does not happen with the semantic dependency graph, which always keeps the most important semantic objects according to the dependency relations in the graph rather than the semantic slots.</Paragraph> <Paragraph position="8"> Rather than filtering out the unmatched semantic objects, the semantic dependency graph is constructed to keep the semantic relations in the utterance. This means that the system can preserve most of the user's information via the semantic dependency graphs. As shown in Table 3, the identification rate of the speech act is higher for the semantic dependency graph than for the partial pattern tree and the Bayes' classifier.</Paragraph> </Section> </Section> </Paper>