<?xml version="1.0" standalone="yes"?>
<Paper uid="N06-1023">
  <Title>A Fully-Lexicalized Probabilistic Model for Japanese Syntactic and Case Structure Analysis</Title>
  <Section position="2" start_page="0" end_page="176" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Case structure (predicate-argument structure or logical form) represents what arguments are related to a predicate, and forms a basic unit for conveying the meaning of natural language text. Identifying such case structure plays an important role in natural language understanding.</Paragraph>
    <Paragraph position="1"> In English, syntactic case structure can be mostly derived from word order. For example, the left argument of the predicate is the subject, and the right argument of the predicate is the object in most cases.</Paragraph>
    <Paragraph position="2"> Blaheta and Charniak proposed a statistical method for analyzing function tags in the Penn Treebank, and achieved a high accuracy of 95.7% for syntactic roles such as SBJ (subject) and DTV (dative) (Blaheta and Charniak, 2000). In recent years, there have been many studies on semantic structure analysis (semantic role labeling) based on PropBank (Kingsbury et al., 2002) and FrameNet (Baker et al., 1998). These studies classify syntactic roles into semantic ones such as agent, experiencer and instrument.
    Case structure analysis of Japanese is very different from that of English. In Japanese, postpositions are used to mark cases. Frequently used postpositions are &quot;ga&quot;, &quot;wo&quot; and &quot;ni&quot;, which usually mark nominative, accusative and dative case, respectively. However, when an argument is followed by the topic-marking postposition &quot;wa&quot;, its case marker is hidden. In addition, case-marking postpositions are often omitted in Japanese. These troublesome characteristics make Japanese case structure analysis very difficult.
    [Author footnotes: currently at the National Institute of Information and Communications Technology, Japan (dk@nict.go.jp); currently at the Graduate School of Informatics, Kyoto University (kuro@i.kyoto-u.ac.jp).]</Paragraph>
    <Paragraph position="3"> To address these problems and realize Japanese case structure analysis, wide-coverage case frames are required. For example, let us describe how to apply case structure analysis to the following sentence:
      bentou-wa    taberu
      lunchbox-TM  eat
      (eat lunchbox)
    In this sentence, taberu (eat) is a verb, and bentou-wa (lunchbox-TM) is a case component (i.e. argument) of taberu. The case marker of &quot;bentou-wa&quot; is hidden by the topic marker (TM) &quot;wa&quot;. The analyzer matches &quot;bentou&quot; (lunchbox) with the most suitable case slot (CS) in the following case frame of &quot;taberu&quot; (eat).</Paragraph>
    <Paragraph position="4"> Case frame of taberu (eat):
      CS    examples
      ga    person, child, boy, ...
      wo    lunch, lunchbox, dinner, ...
    Since &quot;bentou&quot; (lunchbox) is included in the &quot;wo&quot; examples, its case is analyzed as &quot;wo&quot;. As a result, we obtain the case structure &quot;φ:ga bentou:wo taberu&quot;, which means that the &quot;ga&quot; (nominative) argument is omitted, and the &quot;wo&quot; (accusative) argument is &quot;bentou&quot; (lunchbox). In this paper, we perform such case structure analysis based on example-based case frames that are constructed from a huge raw corpus in an unsupervised manner.</Paragraph>
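The slot matching described above can be sketched in a few lines of Python. This is only an illustrative toy, not the authors' analyzer: the names CASE_FRAME_TABERU, assign_case, case_structure and the ASCII marker EMPTY are hypothetical, and exact membership in the example list stands in for whatever notion of the "most suitable" slot the real system uses.

```python
# Toy case frame for "taberu" (eat), using only the examples listed in the text.
# All names here are hypothetical and chosen for illustration.
CASE_FRAME_TABERU = {
    "ga": {"person", "child", "boy"},
    "wo": {"lunch", "lunchbox", "dinner"},
}

EMPTY = "phi"  # assumed ASCII stand-in for the "omitted argument" symbol


def assign_case(noun, case_frame):
    """Return the first case slot whose example set contains the noun, else None."""
    for slot, examples in case_frame.items():
        if noun in examples:
            return slot
    return None


def case_structure(predicate, arguments, case_frame):
    """Fill case slots from the arguments; unfilled slots are marked EMPTY."""
    filled = {slot: EMPTY for slot in case_frame}
    for noun in arguments:
        slot = assign_case(noun, case_frame)
        if slot is not None:
            filled[slot] = noun
    parts = [f"{filler}:{slot}" for slot, filler in filled.items()]
    return " ".join(parts + [predicate])


print(case_structure("taberu", ["lunchbox"], CASE_FRAME_TABERU))
```

Exact matching fails for any noun that is not literally listed, so a practical analyzer would need a softer notion of suitability (for example, similarity to the listed examples); that part is deliberately left out of this sketch.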
    <Paragraph position="5"> Let us consider syntactic analysis, into which our method of case structure analysis is integrated. Recently, many accurate statistical parsers have been proposed (e.g., (Collins, 1999; Charniak, 2000) for English, (Uchimoto et al., 2000; Kudo and Matsumoto, 2002) for Japanese). Since they use lexical information from the tagged corpus in some way, they are called &quot;lexicalized parsers&quot;. On the other hand, unlexicalized parsers achieved an almost equivalent accuracy to such lexicalized parsers (Klein and Manning, 2003; Kurohashi and Nagao, 1994). Accordingly, we can say that the state-of-the-art lexicalized parsers are mainly based on unlexicalized (grammatical) information due to the sparse data problem. Bikel also indicated that Collins' parser can use bilexical dependencies only 1.49% of the time; the rest of the time, it backs off to condition one word on just phrasal and part-of-speech categories (Bikel, 2004).</Paragraph>
    <Paragraph position="6"> This paper aims at exploiting much more lexical information, and proposes a fully-lexicalized probabilistic model for Japanese syntactic and case structure analysis. Lexical information is extracted not from a small tagged corpus, but from a huge raw corpus as case frames. This model performs case structure analysis by a generative probabilistic model based on the case frames, and selects the syntactic structure that has the highest case structure probability.</Paragraph>
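To make the selection step concrete, the following Python sketch scores candidate analyses and keeps the most probable one. It is a minimal illustration under stated assumptions, not the model proposed in the paper: the counts in CASE_FRAME_COUNTS are invented, the relative-frequency estimate with a fixed FLOOR for unseen examples is a stand-in for the generative model, and each entry of CANDIDATES is assumed to hold the case assignments induced by one candidate syntactic structure.

```python
import math

# Invented example counts per (predicate, case slot); not taken from the paper.
CASE_FRAME_COUNTS = {
    ("taberu", "wo"): {"lunchbox": 8, "dinner": 12},
    ("taberu", "ga"): {"child": 15, "person": 5},
}
FLOOR = 1e-6  # assumed probability for examples unseen in a slot


def example_prob(predicate, slot, noun):
    """Relative frequency of the noun among the slot's examples, floored if unseen."""
    counts = CASE_FRAME_COUNTS.get((predicate, slot), {})
    total = sum(counts.values())
    if total == 0 or noun not in counts:
        return FLOOR
    return counts[noun] / total


def log_score(assignments):
    """Sum of log probabilities over (predicate, slot, noun) assignments."""
    return sum(math.log(example_prob(p, s, n)) for p, s, n in assignments)


def best_analysis(candidates):
    """Return the name of the candidate with the highest log score."""
    return max(candidates, key=lambda name: log_score(candidates[name]))


# Two hypothetical readings of "bentou-wa taberu".
CANDIDATES = {
    "bentou as wo": [("taberu", "wo", "lunchbox")],
    "bentou as ga": [("taberu", "ga", "lunchbox")],
}
print(best_analysis(CANDIDATES))
```

Working in log space keeps the product of many small probabilities numerically stable once sentences contain more than one predicate.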
  </Section>
</Paper>