<?xml version="1.0" standalone="yes"?>
<Paper uid="P06-1022">
  <Title>Dependency Parsing of Japanese Spoken Monologue Based on Clause Boundaries (Tomohiro Ohno, Shigeki Matsubara, Hideki Kashioka)</Title>
  <Section position="3" start_page="169" end_page="170" type="intro">
    <SectionTitle>
2 Parsing Unit of Japanese Monologues
</SectionTitle>
    <Paragraph position="0"> Our method achieves efficient parsing by adopting a unit shorter than a sentence as the parsing unit.</Paragraph>
    <Paragraph position="1"> Since the search range of a dependency relation can be narrowed by dividing a long monologue sentence into small units, we can expect the parsing time to be shortened.</Paragraph>
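    <Paragraph position="2"> As a rough illustration of why shorter units narrow the search range, the sketch below counts candidate dependency pairs under a simplifying assumption that is ours and not stated in this section: each bunsetsu may depend on any later bunsetsu in the same unit. The function names and the example unit lengths are hypothetical, and the count ignores the links from each unit's final bunsetsu to other units.

```python
def candidate_pairs(n_bunsetsus: int) -> int:
    """Count (dependent, head) candidates when every bunsetsu may
    depend on any later bunsetsu in the same unit (an assumption)."""
    return n_bunsetsus * (n_bunsetsus - 1) // 2


def candidates_after_split(unit_lengths: list[int]) -> int:
    """Count candidates when the search is confined to each unit.

    Links from the final bunsetsu of each unit to other units are
    deliberately not counted here.
    """
    return sum(candidate_pairs(n) for n in unit_lengths)


# Hypothetical 12-bunsetsu sentence split into four units of 3 bunsetsus.
whole = candidate_pairs(12)
split = candidates_after_split([3, 3, 3, 3])
print(whole, split)
```

Because the count grows quadratically with unit length, the saving from splitting is largest for the long sentences typical of monologues.</Paragraph>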
    <Section position="1" start_page="169" end_page="169" type="sub_section">
      <SectionTitle>
2.1 Clauses and Dependencies
</SectionTitle>
      <Paragraph position="0"> In Japanese, a clause basically contains one verb phrase. Therefore, a complex sentence or a compound sentence contains one or more clauses.</Paragraph>
      <Paragraph position="1"> Moreover, since a clause constitutes a syntactically sufficient and semantically meaningful language unit, it can be used as an alternative parsing unit to a sentence.</Paragraph>
      <Paragraph position="2"> Our proposed method assumes that a sentence is a sequence of one or more clauses and that every bunsetsu in a clause, except the final bunsetsu, depends on another bunsetsu in the same clause.</Paragraph>
      <Paragraph position="3"> As an example, Fig. 1 presents the dependency structure of a Japanese sentence glossed as &quot;[...] poll that the Prime Minister's Office announced the other day indicates that the ratio of people advocating capital punishment is nearly 80%.&quot; This sentence consists of four clauses, including one glossed as &quot;the ratio of people is nearly 80%.&quot; Each clause forms a dependency structure (solid arrows in Fig. 1), and a dependency relation from the final bunsetsu links the clause with another clause (dotted arrows in Fig. 1).</Paragraph>
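      <Paragraph position="4"> The assumption of this section can be stated as a simple check. The following is a minimal sketch, not the authors' implementation; the function name and the index-based encoding of units and heads are hypothetical choices made for illustration.

```python
def violations(unit_ends: list[int], heads: list[int]) -> list[int]:
    """Return indices of non-final bunsetsus whose head lies outside their unit.

    unit_ends: index of the final bunsetsu of each unit, in ascending
               order; the last entry is the last bunsetsu of the sentence.
    heads:     heads[i] is the index of the bunsetsu that bunsetsu i
               depends on, or -1 for the sentence-final bunsetsu.
    """
    bad = []
    start = 0
    for end in unit_ends:
        # Only non-final bunsetsus are checked; the final one (index `end`)
        # is allowed to depend on a bunsetsu in another unit.
        for i in range(start, end):
            if heads[i] not in range(start, end + 1):
                bad.append(i)
        start = end + 1
    return bad


# Hypothetical sentence of 5 bunsetsus in two units: [0, 1] and [2, 3, 4].
ok = violations([1, 4], [1, 4, 3, 4, -1])
broken = violations([1, 4], [3, 4, 3, 4, -1])
print(ok, broken)
```

In the first call every non-final bunsetsu has its head inside its own unit, so nothing is reported; in the second, bunsetsu 0 depends on bunsetsu 3 in the other unit and is flagged.</Paragraph>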
    </Section>
    <Section position="2" start_page="169" end_page="170" type="sub_section">
      <SectionTitle>
2.2 Clause Boundary Unit
</SectionTitle>
      <Paragraph position="0"> To adopt a clause as an alternative parsing unit, a monologue sentence must be divided into clauses as preprocessing for the subsequent dependency parsing. However, since some kinds of clauses are embedded in main clauses, it is fundamentally difficult to divide a monologue into clauses in one dimension (Kashioka and Maruyama, 2004).</Paragraph>
      <Paragraph position="1"> We therefore approximate the clause segmentation of a monologue sentence by using a clause boundary annotation program (Maruyama et al., 2004). This program identifies units corresponding to clauses by detecting the end boundaries of clauses, and it determines the positions and types of clause boundaries from a local morphological analysis alone. That is, for a sentence morphologically analyzed by ChaSen (Matsumoto et al., 1999), the positions of clause boundaries are identified and clause boundary labels are inserted there. There are 147 labels, such as &quot;compound clause&quot; and &quot;adnominal clause.&quot; (Footnote 2: The labels include a few other constituents that do not strictly represent clause boundaries but can be regarded as syntactically independent elements, such as &quot;topicalized element,&quot; &quot;conjunctives,&quot; and &quot;interjections.&quot;) In our research, we adopt the unit sandwiched between two clause boundaries detected by clause boundary analysis, which we call the clause boundary unit, as an alternative parsing unit. We regard the label provided for the end boundary of a clause boundary unit as that unit's type.</Paragraph>
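      <Paragraph position="2"> The construction of clause boundary units from labelled end boundaries can be sketched as follows. This is an illustration under stated assumptions, not the interface of the clause boundary annotation program: the token list, the dictionary layout for boundaries, the names Unit and split_units, and the fallback type &quot;unlabelled&quot; are all hypothetical.

```python
from typing import NamedTuple


class Unit(NamedTuple):
    tokens: list[str]
    unit_type: str  # label of the boundary that closes the unit


def split_units(tokens: list[str], boundaries: dict[int, str]) -> list[Unit]:
    """Split tokens into units closed by labelled end boundaries.

    boundaries maps the index of the token that ends a unit to the
    boundary label inserted after it (a hypothetical layout).
    """
    units, current = [], []
    for i, tok in enumerate(tokens):
        current.append(tok)
        if i in boundaries:
            # The unit takes its type from its end-boundary label.
            units.append(Unit(current, boundaries[i]))
            current = []
    if current:  # trailing tokens with no detected end boundary
        units.append(Unit(current, "unlabelled"))
    return units


# Placeholder tokens; the two labels are the examples named in the text.
units = split_units(
    ["t0", "t1", "t2", "t3", "t4"],
    {1: "adnominal clause", 4: "compound clause"},
)
for unit in units:
    print(unit.unit_type, unit.tokens)
```

Each unit spans the tokens between two consecutive boundaries and is typed by the label at its end, mirroring the definition given above.</Paragraph>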
    </Section>
    <Section position="3" start_page="170" end_page="170" type="sub_section">
      <SectionTitle>
2.3 Relation between Clause Boundary Units and Dependency Structures
</SectionTitle>
      <Paragraph position="0"> To clarify the relation between clause boundary units and dependency structures, we investigated the monologue corpus &quot;Asu-Wo-Yomu.&quot; In the investigation, we used 200 sentences for which morphological analysis, bunsetsu segmentation, clause boundary analysis, and dependency parsing were performed automatically and then corrected by hand. Here, the part-of-speech specification follows the IPA parts-of-speech used in the ChaSen morphological analyzer (Matsumoto et al., 1999), bunsetsu segmentation follows the rules of CSJ (Maekawa et al., 2000), clause boundary analysis follows the rules of Maruyama et al. (2004), and the dependency grammar follows that of the Kyoto Corpus (Kurohashi and Nagao, 1997).</Paragraph>
      <Paragraph position="1"> Table 1 shows the results of analyzing the 200 sentences. Of the 2,430 bunsetsus, 951 are final bunsetsus of clause boundary units. Among the remaining 1,479 non-final bunsetsus, only 94 depend on a bunsetsu located outside their clause boundary unit. That is, 93.6% (1,385/1,479) of the dependency relations from non-final bunsetsus stay within a clause boundary unit. These results indicate that the assumption made in our research is valid to some extent.</Paragraph>
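      <Paragraph position="2"> The figures reported above can be re-derived with a few lines of arithmetic. The variable names are ours; the three input counts are taken directly from the text.

```python
all_bunsetsus = 2430    # bunsetsus in the 200 sentences
final_bunsetsus = 951   # final bunsetsus of clause boundary units
outside = 94            # non-final bunsetsus whose head is outside their unit

non_final = all_bunsetsus - final_bunsetsus  # bunsetsus covered by the assumption
inside = non_final - outside                 # dependencies that stay within a unit
ratio = inside / non_final

print(f"{inside}/{non_final} = {ratio:.1%}")
```

The remaining 94 relations, about 6.4% of the non-final bunsetsus, are the cases in which the assumption does not hold.</Paragraph>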
    </Section>
  </Section>
</Paper>