<?xml version="1.0" standalone="yes"?>
<Paper uid="C02-1136">
  <Title>Stochastic Dependency Parsing of Spontaneous Japanese Spoken Language</Title>
  <Section position="2" start_page="0" end_page="1" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> With recent advances in continuous speech recognition technology, a considerable number of studies have been conducted on spoken dialogue systems. For smooth interaction with the user, the system must be able to understand spontaneous speech.</Paragraph>
    <Paragraph position="1"> Since spoken language includes many grammatically ill-formed phenomena such as fillers, hesitations and self-repairs, grammar-oriented approaches are not necessarily suited to spoken language processing. A technique for robust parsing is thus strongly required.</Paragraph>
    <Paragraph position="2"> This paper describes the characteristic features of spoken Japanese, based on an investigation of a large-scale spoken dialogue corpus from the viewpoint of dependency, and proposes a dependency parsing method that takes these features into account.</Paragraph>
    <Paragraph position="3"> Conventional methods of dependency parsing have assumed the following three syntactic constraints (Kurohashi and Nagao, 1994): 1. No dependency is directed from right to left.</Paragraph>
    <Paragraph position="4"> 2. Dependencies do not cross each other.</Paragraph>
    <Paragraph position="5"> 3. Each bunsetsu, except the last one, depends on only one bunsetsu.</Paragraph>
    <Paragraph position="6"> As far as we have investigated the corpus, however, many spoken utterances do not satisfy these constraints because of inversion phenomena, bunsetsus that have no head bunsetsu, and so on. Therefore, our parsing method relaxes the first and third of the above three constraints; that is, it permits dependencies directed from right to left and bunsetsus that do not depend on any bunsetsu. The parsing results are expressed as partial dependency structures.</Paragraph>
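The three constraints and their relaxation can be illustrated with a small checker. This is only a minimal sketch under stated assumptions, not the parsing method of this paper: the representation (a list in which entry i holds the index of the bunsetsu that bunsetsu i depends on, or None when it has no head) and the function name check_constraints are hypothetical choices made for illustration.

```python
from itertools import combinations
from typing import Dict, List, Optional


def check_constraints(heads: List[Optional[int]]) -> Dict[str, bool]:
    """Report which of the three conventional constraints a structure satisfies.

    heads[i] is the index of the bunsetsu that bunsetsu i depends on,
    or None if bunsetsu i depends on nothing (assumed representation).
    """
    n = len(heads)
    deps = [(i, h) for i, h in enumerate(heads) if h is not None]

    # Constraint 1: every head must appear to the right of its dependent.
    left_to_right = all(h > i for i, h in deps)

    # Constraint 2: no two dependencies cross. After sorting the spans by
    # start position, spans (a, b) and (c, d) cross when c lies strictly
    # inside (a, b) and d lies strictly to the right of b.
    spans = sorted((min(i, h), max(i, h)) for i, h in deps)
    non_crossing = not any(
        c > a and b > c and d > b
        for (a, b), (c, d) in combinations(spans, 2)
    )

    # Constraint 3: every bunsetsu except the last one has a head.
    # (A list entry holds at most one head, so "only one" is built in.)
    single_head = all(heads[i] is not None for i in range(n - 1))

    return {
        "left_to_right": left_to_right,
        "non_crossing": non_crossing,
        "single_head": single_head,
    }


# A structure of the kind conventional parsers assume.
print(check_constraints([1, 2, None]))
# An inversion-like structure: bunsetsu 2 depends leftward on bunsetsu 1,
# and bunsetsu 1 has no head.
print(check_constraints([1, None, 1]))
```

Under the relaxation described above, a structure such as the second one would be acceptable as long as the non-crossing constraint holds, whereas a conventional parser would reject it.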
    <Paragraph position="7"> The method acquires in advance the probabilities of dependencies from a spoken dialogue corpus tagged with dependency structures, and provides the most plausible dependency structure for each utterance on the basis of these probabilities. Several techniques for dependency parsing based on stochastic approaches have been proposed so far. Fujio and Matsumoto used a probability based on the co-occurrence frequency of two bunsetsus for dependency parsing (Fujio and Matsumoto, 1998). Uchimoto et al. proposed a technique for learning a dependency probability model based on the maximum entropy method (Uchimoto et al., 1999). (Note: A bunsetsu is one of the linguistic units in Japanese, and roughly corresponds to a basic phrase in English. A bunsetsu consists of one independent word and zero or more ancillary words. A dependency is a modification relation between two bunsetsus.)</Paragraph>
    <Paragraph position="8"> However, since these techniques are intended for written language, it is not clear whether they are applicable to spoken language. As a technique for stochastic parsing of spoken language, Den has suggested a new idea for detecting and parsing self-repaired expressions (Den, 1995); however, the phenomena with which that framework can cope are restricted.</Paragraph>
    <Paragraph position="9"> By contrast, our method provides the most plausible dependency structures for natural speech by utilizing stochastic information. To evaluate the effectiveness of our method, we conducted a dependency parsing experiment using all the driver utterances in 81 spoken dialogues of the CIAIR in-car speech dialogue corpus (Kawaguchi et al., 2001). The experimental results show that our method is effective for robust parsing of spontaneously spoken language.</Paragraph>
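The general idea of acquiring dependency probabilities from a tagged corpus and using them to rank candidate structures can be sketched as follows. This is a simple relative-frequency illustration and should not be read as the probability model of this paper. The feature labels ("noun-wo", "verb", and so on), the function names, and the floor value for unseen pairs are all hypothetical.

```python
from collections import Counter
from math import prod
from typing import Dict, Iterable, List, Tuple

Pair = Tuple[str, str]  # (dependent label, candidate head label), assumed features


def estimate_dependency_probs(
    observations: Iterable[Tuple[Pair, bool]]
) -> Dict[Pair, float]:
    """Relative frequency with which each labelled pair is a true dependency."""
    total: Counter = Counter()
    positive: Counter = Counter()
    for pair, is_dependency in observations:
        total[pair] += 1
        if is_dependency:
            positive[pair] += 1
    return {pair: positive[pair] / total[pair] for pair in total}


def structure_score(
    structure: List[Pair], probs: Dict[Pair, float], floor: float = 1e-6
) -> float:
    """Product of the dependency probabilities in a candidate structure.

    Pairs never seen in the corpus receive the small floor probability.
    """
    return prod(probs.get(pair, floor) for pair in structure)


# Toy tagged observations (hypothetical labels and counts).
observations = [
    (("noun-wo", "verb"), True),
    (("noun-wo", "verb"), True),
    (("noun-wo", "verb"), False),
    (("noun-wo", "noun"), False),
    (("adverb", "verb"), True),
]
probs = estimate_dependency_probs(observations)

# Rank two candidate structures and keep the more plausible one.
candidates = [
    [("adverb", "verb"), ("noun-wo", "verb")],
    [("adverb", "verb"), ("noun-wo", "noun")],
]
best = max(candidates, key=lambda s: structure_score(s, probs))
print(best)
```

Enumerating candidate structures explicitly is only practical for toy inputs; the sketch is meant to show how corpus counts turn into a ranking, not how an efficient search would be organized.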
  </Section>
</Paper>