<?xml version="1.0" standalone="yes"?> <Paper uid="C02-1136"> <Title>Stochastic Dependency Parsing of Spontaneous Japanese Spoken Language</Title> <Section position="3" start_page="1" end_page="2" type="metho"> <SectionTitle> 2 Linguistic Analysis of Spontaneous Speech </SectionTitle> <Paragraph position="0"> We have investigated spontaneously spoken utterances in an in-car speech dialogue corpus constructed at the Center for Integrated Acoustic Information Research (CIAIR), Nagoya University (Kawaguchi et al., 2001). The corpus contains speech from dialogues between drivers and navigators (humans, a Wizard of OZ system, or a spoken dialogue system) and their transcripts.</Paragraph> <Section position="1" start_page="1" end_page="1" type="sub_section"> <SectionTitle> 2.1 CIAIR In-car Speech Dialogue Corpus </SectionTitle> <Paragraph position="0"> The in-car speech dialogue data collection project at CIAIR started in 1999 (Kawaguchi et al., 2002). The project developed a car for data collection and has been collecting a total of about 140 hours of multimodal data such as speech, images, and location information. These data are available for investigating in-car speech dialogues. The speech files are transcribed into ASCII text files by hand. An example of a transcript is shown in Figure 1. As a preliminary analysis, discourse tags are assigned to fillers, hesitations, and so on. Furthermore, each speech is segmented into utterance units at pauses, and the exact start time and end time are provided for each unit. Environmental information about sex (male/female), speaker's role (driver/navigator), dialogue task (navigation/information retrieval/...), and noise condition (noisy/clean) is provided for each utterance unit.</Paragraph> <Paragraph position="1"> In order to study the features of in-car dialogue speech, we have investigated all the driver's utterance units in 195 dialogues.
The numbers of fillers, hesitations, and slips per utterance unit are 0.34, 0.07, and 0.04, respectively. The fact that these frequencies are no lower than those of human-human conversations suggests that the in-car speech in the corpus is spontaneous.</Paragraph> </Section> <Section position="2" start_page="1" end_page="2" type="sub_section"> <SectionTitle> 2.2 Dependency Structure of Spoken Language </SectionTitle> <Paragraph position="0"> In order to characterize spontaneous dialogue speech from the viewpoint of dependency, we have constructed a spoken language corpus with dependency structures. Dependency analyses have been provided by hand for all the driver's utterance units in 81 spoken dialogues of the in-car speech corpus. The specifications of parts of speech and dependency grammar are in accordance with those of the Kyoto Corpus (Kurohashi and Nagao, 1997), a corpus of written Japanese text. We have provided the following criteria for the linguistic phenomena peculiar to spoken language: * Fillers and hesitations do not depend on any bunsetsu; they form dependency structures independently.</Paragraph> <Paragraph position="1"> * A bunsetsu whose head bunsetsu is omitted does not depend on any bunsetsu.</Paragraph> <Paragraph position="2"> * The part-of-speech specification covers phrases peculiar to spoken language through additional lexical entries in the dictionary.</Paragraph> <Paragraph position="3"> * We have defined one conversational turn as the unit of dependency parsing. Dependencies may extend over two utterance units, but hardly ever over two conversational turns.</Paragraph> <Paragraph position="4"> The outline of the corpus with dependency analyses is shown in Table 1. There are 11,789 dependencies for 24,993 bunsetsus. The average number of dependencies per turn is 1.94, far fewer than in written language such as newspaper articles (about 10 dependencies).
This does not necessarily mean that dependency parsing of spoken language is easier than that of written language. Since not every bunsetsu depends on another bunsetsu, it is also necessary to identify the bunsetsus that have no head bunsetsu. In fact, bunsetsus without a head bunsetsu account for 52.8% of the total.</Paragraph> <Paragraph position="5"> Next, we investigated inversion phenomena and dependencies extending over two utterance units. This data contains 320 inversions, occurring in 3.8% of all utterance turns, or about 0.04 times per turn. This means that inversion phenomena cannot be ignored in spoken language processing. About 86.5% of inversions appear at the last bunsetsu. On the other hand, 73 dependencies extend over two utterance units, occurring in 5.4% of the 1,362 turns consisting of more than two units. Therefore, we can conclude that utterance units are not always sufficient as parsing units for spoken language.</Paragraph> </Section> </Section> <Section position="4" start_page="2" end_page="3" type="metho"> <SectionTitle> 3 Stochastic Dependency Parsing </SectionTitle> <Paragraph position="0"> Our method provides the most plausible dependency analysis for each spoken language utterance unit by relaxing syntactic constraints and utilizing stochastic information acquired from a large-scale spoken dialogue corpus. In this paper, we define one turn as the parsing unit according to the result of our investigation described in Section 2.2.</Paragraph> <Section position="1" start_page="2" end_page="3" type="sub_section"> <SectionTitle> 3.1 Dependency Structural Constraints </SectionTitle> <Paragraph position="0"> As Section 1 has already pointed out, most conventional techniques for Japanese dependency parsing have assumed three syntactic constraints. Since phenomena that hardly appear in written language appear frequently in spoken language, actual dependency structures do not always satisfy these constraints.
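The headless-bunsetsu rate quoted above can be checked directly against the counts in Table 1; a minimal sketch, assuming each of the 11,789 annotated dependencies marks exactly one bunsetsu as having a head:

```python
# Rate of bunsetsus with no head bunsetsu, recomputed from Table 1.
# Assumption: each annotated dependency gives exactly one bunsetsu a head.
bunsetsus = 24993
dependencies = 11789
headless_rate = (bunsetsus - dependencies) / bunsetsus
print(f"{headless_rate:.1%}")  # prints 52.8%
```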
Our method relaxes the constraints for the purpose of robust dependency parsing. That is, our method considers that bunsetsus which have no head bunsetsu, such as fillers and hesitations, depend on themselves (relaxing the constraint that each bunsetsu depends on exactly one other bunsetsu). Moreover, we permit a bunsetsu to depend on a bunsetsu to its left in order to cope with inversion phenomena (relaxing the constraint that dependencies are directed from left to right).</Paragraph> </Section> <Section position="2" start_page="3" end_page="3" type="sub_section"> <SectionTitle> 3.2 Utilizing Stochastic Information </SectionTitle> <Paragraph position="0"> Our method calculates the plausibility of a dependency structure by utilizing stochastic information. The following attributes are used for this purpose. (Since dependencies that cross each other are very rare, the no-crossing constraint is not relaxed.)</Paragraph> <Paragraph position="2"> If a bunsetsu has an ancillary word, the type of the dependency of the bunsetsu</Paragraph> <Paragraph position="4"> is the lexicon, part-of-speech, and conjugated form of that word; if not, it</Paragraph> <Paragraph position="6"> is the part-of-speech and the conjugated form of the last morpheme. Table 2 shows several examples of the types of dependencies. The location of the dependent bunsetsu indicates whether or not it is the last bunsetsu of the turn. 
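The relaxation described in Section 3.1 can be sketched in code. The helpers below are a hypothetical illustration, not the authors' implementation: every bunsetsu may take itself as head (the headless case), any bunsetsu to its right (the ordinary case), or any bunsetsu to its left (the inversion case), while crossing dependencies are still ruled out on whole structures.

```python
def candidate_heads(n):
    """Candidate head indices for each of n bunsetsus in a turn,
    under the relaxed constraints: self (headless), rightward
    (ordinary dependency), or leftward (inversion)."""
    return {i: list(range(n)) for i in range(n)}

def crosses(dep_a, dep_b):
    """True if two dependencies (dependent, head) cross each other;
    this constraint is kept, since crossing dependencies are very rare."""
    i, j = sorted(dep_a)
    k, l = sorted(dep_b)
    return (l > j > k > i) or (j > l > i > k)

print(candidate_heads(3)[0])    # prints [0, 1, 2]
print(crosses((0, 2), (1, 3)))  # prints True
print(crosses((0, 0), (1, 2)))  # prints False: self-dependencies never cross
```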
As Section 2 indicates, the method uses the location attribute in calculating the probability of an inversion, because most inversion phenomena appear at the end of the turn.</Paragraph> <Paragraph position="7"> The probability of a dependency between bunsetsus is calculated from these attribute values as follows:</Paragraph> <Paragraph position="9"> Here, C is a cooccurrence frequency function and B is a sequence of bunsetsus (b_1, ..., b_n). In formula (1), the first term of the right-hand side expresses the probability of cooccurrence between the independent words, and the second term expresses the probability of the distance between the bunsetsus. The problem of data sparseness is reduced by considering these phenomena to be independent of each other and separating the probability into these two terms. The probability that a bunsetsu has no head bunsetsu can also be calculated with formula (1), by considering such a bunsetsu to depend on itself (i.e., i = j). The probability that the dependency structure of a sequence of bunsetsus B is S can be calculated from the dependency probabilities between bunsetsus as follows.</Paragraph> <Paragraph position="11"> For a sequence of bunsetsus B, the method identifies the structure whose probability is maximal as the most plausible one.</Paragraph> </Section> <Section position="3" start_page="3" end_page="3" type="sub_section"> <SectionTitle> 3.3 Parsing Example </SectionTitle> <Paragraph position="0"> A parsing example for a user's utterance containing a filler &quot;eto&quot;, a hesitation &quot;so&quot;, an inversion between &quot;nai-ka-na&quot; and &quot;chikaku-ni&quot;, and a pause, &quot;Eto konbini nai-ka-na &lt;pause&gt; so sono chikaku-ni (Is there a convenience store near there?)&quot;, is as follows. The sequence of bunsetsus of the sentence is &quot;[eto (well)], [konbini (convenience store)], [nai-ka-na (is there?)], &lt;pause&gt;, [so], [sono (there)], [chikaku-ni (near)]&quot;. 
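The maximization over structures described in Section 3.2 can be sketched with a brute-force search; the probability table below is a toy example with made-up values, not the values of Table 3:

```python
from itertools import product
from math import prod

def best_structure(p):
    """p[i][j] is the probability that bunsetsu i depends on bunsetsu j
    (p[i][i]: bunsetsu i has no head).  Following the product form of
    formula (2), a structure's probability is the product of its
    dependency probabilities; brute force over all head assignments,
    exponential but adequate for short turns."""
    n = len(p)
    return max(product(range(n), repeat=n),
               key=lambda heads: prod(p[i][heads[i]] for i in range(n)))

# Toy 3-bunsetsu turn: a filler, then two ordinary bunsetsus.
p = [
    [0.9, 0.05, 0.05],  # bunsetsu 0: almost surely headless (a filler)
    [0.1, 0.1, 0.8],    # bunsetsu 1: likely depends on bunsetsu 2
    [0.2, 0.1, 0.7],    # bunsetsu 2: likely headless (last bunsetsu)
]
print(best_structure(p))  # prints (0, 2, 2)
```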
The types of dependencies of the bunsetsus and the dependency probabilities between bunsetsus are shown in Tables 2 and 3, respectively. Table 3 expresses, for instance, that the probability that &quot;konbini&quot; depends on &quot;nai-ka-na&quot; is 0.40. Moreover, the probability that &quot;eto&quot; depends on &quot;eto&quot; means the probability that &quot;eto&quot; does not depend on any bunsetsu. Calculating the probability of every possible structure according to Table 3, the dependency structure shown in Figure 3 obtains the maximum probability.</Paragraph> <Paragraph position="1"> (a): The result for 241 bunsetsus with a head (b): The result for 240 bunsetsus with no head (a)+(b): The result for 481 bunsetsus</Paragraph> </Section> </Section> <Section position="5" start_page="3" end_page="3" type="metho"> <SectionTitle> 4 Parsing Experiment </SectionTitle> <Paragraph position="0"> In order to evaluate the effectiveness of our method, an experiment on dependency parsing was carried out using the CIAIR corpus (Kawaguchi et al., 2001).</Paragraph> <Section position="1" start_page="3" end_page="3" type="sub_section"> <SectionTitle> 4.1 Outline of Experiment </SectionTitle> <Paragraph position="0"> We used the same data as in our investigation in Section 2.2. That is, among all the driver's utterance units of the 81 dialogues, 100 turns were used as test data and 5,978 turns as learning data. The test data, in which the average number of bunsetsus per turn is 4.81, contain 481 dependencies.</Paragraph> </Section> <Section position="2" start_page="3" end_page="3" type="sub_section"> <SectionTitle> 4.2 Experimental Result </SectionTitle> <Paragraph position="0"> The results of the parsing experiment are partially shown in Figure 4. Table 4 shows the evaluation. For parsing accuracy, both precision and recall were measured. 355 of the 415 dependencies extracted by our method are correct, giving a precision of 85.5% and a recall of 73.8%. 
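These figures can be reproduced from the raw counts given above:

```python
# Precision and recall of Table 4, recomputed from the raw counts.
correct = 355     # correct dependencies among those extracted
extracted = 415   # dependencies output by the method
gold = 481        # dependencies in the test data
print(f"precision = {correct / extracted:.1%}")  # prints precision = 85.5%
print(f"recall = {correct / gold:.1%}")          # prints recall = 73.8%
```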
We have confirmed that the parsing accuracy of our method for spoken language is as high as that of other methods for written language (Fujio and Matsumoto, 1998; Uchimoto et al., 1999).</Paragraph> <Paragraph position="1"> Our method correctly identified 200 of the 240 bunsetsus that do not have a head bunsetsu.</Paragraph> <Paragraph position="2"> Most of them are fillers, hesitations, and so on.</Paragraph> <Paragraph position="3"> This makes it clear that utilizing the dependency probabilities is effective for identifying such bunsetsus.</Paragraph> </Section> </Section> </Paper>