File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/04/c04-1002_intro.xml

Size: 3,445 bytes

Last Modified: 2025-10-06 14:02:06

<?xml version="1.0" standalone="yes"?>
<Paper uid="C04-1002">
  <Title>Linear-Time Dependency Analysis for Japanese</Title>
  <Section position="3" start_page="0" end_page="1" type="intro">
    <SectionTitle>
2 Parsing Japanese
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="0" end_page="1" type="sub_section">
      <SectionTitle>
2.1 Syntactic Properties of Japanese
</SectionTitle>
      <Paragraph position="0"> Japanese is basically an SOV language with relatively free word order. Whereas English expresses the syntactic function of each word through word order, Japanese expresses it through postpositions. For example, one or more postpositions following a noun play a role similar to that of noun declension in German, which indicates grammatical case.</Paragraph>
      <Paragraph position="1"> Based on these properties, the bunsetsu was devised and has been used for the syntactic analysis of Japanese sentences. A bunsetsu consists of one or more content words followed by zero or more function words. Defined this way, the bunsetsu lets us analyze a Japanese sentence much as we analyze the grammatical roles of words in an inflecting language such as German.</Paragraph>
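The definition above suggests a simple chunking rule: a new bunsetsu starts at every content word that follows a function word. The sketch below assumes each morpheme has already been tagged as a content word ("C") or a function word ("F"); this two-way tag set and the function name `chunk_bunsetsus` are hypothetical simplifications, and practical chunkers rely on richer part-of-speech information. The example input is the sentence "Ken gave a book to her" split into morphemes.

```python
from typing import List, Optional, Tuple

# A morpheme is (surface form, tag). The tag is "C" for a content word
# or "F" for a function word (a hypothetical, simplified tag set).
Morpheme = Tuple[str, str]


def chunk_bunsetsus(morphemes: List[Morpheme]) -> List[List[str]]:
    """Group morphemes into bunsetsus: one or more content words
    followed by zero or more function words."""
    bunsetsus: List[List[str]] = []
    prev_tag: Optional[str] = None
    for surface, tag in morphemes:
        # Open a new bunsetsu at the first morpheme, or at a content
        # word that directly follows a function word.
        if not bunsetsus or (tag == "C" and prev_tag == "F"):
            bunsetsus.append([])
        bunsetsus[-1].append(surface)
        prev_tag = tag
    return bunsetsus


sentence = [("Ken", "C"), ("ga", "F"), ("kanojo", "C"), ("ni", "F"),
            ("hon", "C"), ("wo", "F"), ("age", "C"), ("ta", "F")]
print(["-".join(b) for b in chunk_bunsetsus(sentence)])
# ['Ken-ga', 'kanojo-ni', 'hon-wo', 'age-ta']
```

Consecutive content words stay in the same bunsetsu, matching the "one or more content words" part of the definition.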
      <Paragraph position="2"> Thus, strictly speaking, it is bunsetsu order, rather than word order, that is free, with one exception: the bunsetsu containing the main verb of a sentence must be placed at the end of the sentence. For example, the following two sentences have identical meanings: (1) Ken-ga kanojo-ni hon-wo age-ta. (2) Ken-ga hon-wo kanojo-ni age-ta. (-ga: subject marker; -ni: dative case particle; -wo: accusative case particle. English translation: Ken gave a book to her.) Note that the rightmost bunsetsu, 'age-ta,' which is composed of a verb stem and a past tense marker, must be placed at the end of the sentence.</Paragraph>
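Because case is carried by the particle inside each bunsetsu, the role of each argument can be read off without looking at its position, which is why sentences (1) and (2) mean the same thing. The sketch below uses only the three particle glosses given above; the function name `case_roles` and the convention of splitting each bunsetsu at its last hyphen are illustrative assumptions that fit this romanized example only.

```python
from typing import Dict, List

# Particle glosses taken from the example sentences above.
PARTICLE_ROLES = {"ga": "subject", "ni": "dative", "wo": "accusative"}


def case_roles(sentence: str) -> Dict[str, str]:
    """Map each argument to its case role using only its particle."""
    bunsetsus: List[str] = sentence.rstrip(".").split()
    roles: Dict[str, str] = {}
    # Every bunsetsu except the last (the main verb) is an argument here.
    for bunsetsu in bunsetsus[:-1]:
        word, particle = bunsetsu.rsplit("-", 1)
        roles[word] = PARTICLE_ROLES[particle]
    return roles


s1 = "Ken-ga kanojo-ni hon-wo age-ta."
s2 = "Ken-ga hon-wo kanojo-ni age-ta."
print(case_roles(s1) == case_roles(s2))  # True
print(case_roles(s1))
# {'Ken': 'subject', 'kanojo': 'dative', 'hon': 'accusative'}
```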
      <Paragraph position="3"> 'Bunsetsu' is composed of two Chinese characters, 'bun' and 'setsu': 'bun' means a sentence and 'setsu' means a segment, so a bunsetsu is regarded as a small syntactic segment of a sentence. An eojeol in Korean (Yoon et al., 1999) is almost the same concept as a bunsetsu, and the chunks defined by Abney (1991) for English are also very similar to bunsetsus. Below we list the constraints on Japanese dependency, including those mentioned above.</Paragraph>
      <Paragraph position="4"> C1. Each bunsetsu except the rightmost one has exactly one head.</Paragraph>
      <Paragraph position="5"> C2. Each head bunsetsu is always placed to the right of its modifier.</Paragraph>
      <Paragraph position="6"> C3. Dependencies do not cross one another.</Paragraph>
      <Paragraph position="7"> These properties are also largely shared by Korean and Mongolian.</Paragraph>
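The constraints C1 to C3 listed above can be checked mechanically. The sketch below assumes a dependency structure is encoded as a list `heads`, where `heads[i]` is the index of the head of the i-th bunsetsu and the rightmost bunsetsu's entry is `None`; this encoding and the function name `satisfies_constraints` are illustrative choices, not taken from the paper.

```python
from typing import List, Optional


def satisfies_constraints(heads: List[Optional[int]]) -> bool:
    """Check constraints C1-C3 for a (hypothetical) head-list encoding.

    heads[i] is the index of the head of bunsetsu i, or None for the
    rightmost bunsetsu, which has no head.
    """
    n = len(heads)
    if n == 0:
        return True
    # C1: the list encoding allows at most one head per bunsetsu, so it
    # suffices that only the rightmost bunsetsu lacks a head.
    if heads[-1] is not None or any(h is None for h in heads[:-1]):
        return False
    # C2: each head lies to the right of its modifier, inside the sentence.
    if any(not (n > heads[i] > i) for i in range(n - 1)):
        return False
    # C3: arcs (i, h_i) and (j, h_j), with j after i, cross exactly when
    # the positions appear in the order i, j, h_i, h_j.
    for i in range(n - 1):
        for j in range(i + 1, n - 1):
            if heads[j] > heads[i] > j:
                return False
    return True


# Ken-ga kanojo-ni hon-wo age-ta: all three arguments modify the verb.
print(satisfies_constraints([3, 3, 3, None]))  # True
print(satisfies_constraints([2, 3, 3, None]))  # False (arcs 0-2 and 1-3 cross)
print(satisfies_constraints([1, 0, None]))     # False (violates C2)
```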
    </Section>
    <Section position="2" start_page="1" end_page="1" type="sub_section">
      <SectionTitle>
2.2 Typical Steps of Parsing Japanese
</SectionTitle>
      <Paragraph position="0"> Because Japanese has the properties above, parsing Japanese commonly proceeds in the following steps: 1. Break a sentence into morphemes (i.e., morphological analysis).</Paragraph>
      <Paragraph position="1"> 2. Chunk them into bunsetsus.</Paragraph>
      <Paragraph position="2"> 3. Analyze dependencies between these bunsetsus. 4. Label each dependency with a semantic role such as agent, object, or location. We focus on dependency analysis in Step 3.</Paragraph>
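The first three steps can be wired together as a toy pipeline. This sketch rests on strong assumptions: step 1 is replaced by splitting hyphen-segmented text and tagging morphemes with a hypothetical hand-written list of function words, and step 3 is a deliberately naive placeholder that attaches every bunsetsu to its immediate right neighbor. That placeholder satisfies C1 to C3 but is not the dependency analysis method studied in this paper; step 4 is omitted.

```python
from typing import List, Optional, Tuple

# Hypothetical toy lexicon standing in for a real morphological analyzer.
FUNCTION_WORDS = {"ga", "ni", "wo", "ta"}


def analyze_morphemes(text: str) -> List[Tuple[str, str]]:
    """Step 1 (toy): split hyphen-segmented text and tag each morpheme
    as a content word ("C") or a function word ("F")."""
    morphemes = text.replace("-", " ").split()
    return [(m, "F" if m in FUNCTION_WORDS else "C") for m in morphemes]


def chunk(morphemes: List[Tuple[str, str]]) -> List[List[str]]:
    """Step 2: content words followed by function words form a bunsetsu."""
    bunsetsus: List[List[str]] = []
    prev_tag: Optional[str] = None
    for surface, tag in morphemes:
        if not bunsetsus or (tag == "C" and prev_tag == "F"):
            bunsetsus.append([])
        bunsetsus[-1].append(surface)
        prev_tag = tag
    return bunsetsus


def analyze_dependencies(bunsetsus: List[List[str]]) -> List[Optional[int]]:
    """Step 3 (placeholder): attach each bunsetsu to the next one.
    This trivially satisfies C1-C3 but is only a naive baseline."""
    n = len(bunsetsus)
    return [i + 1 for i in range(n - 1)] + [None] if n else []


def parse(text: str) -> Tuple[List[List[str]], List[Optional[int]]]:
    bunsetsus = chunk(analyze_morphemes(text))
    return bunsetsus, analyze_dependencies(bunsetsus)


bunsetsus, heads = parse("Ken-ga kanojo-ni hon-wo age-ta")
print(heads)  # [1, 2, 3, None]
```

Keeping each step a separate function mirrors the step-by-step decomposition above, so the placeholder for step 3 can be swapped for a real dependency analyzer without touching steps 1 and 2.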
    </Section>
  </Section>
</Paper>