<?xml version="1.0" standalone="yes"?>
<Paper uid="C02-1122">
  <Title>Fertilization of Case Frame Dictionary for Robust Japanese Case Analysis</Title>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
2 Japanese Grammar
</SectionTitle>
    <Paragraph position="0"> We introduce Japanese grammar briefly in this section.</Paragraph>
    <Paragraph position="1"> Japanese is a head-final language. Word order does not play a case-marking role. Instead, postpositions function as case markers (CMs). The basic structure of a Japanese sentence is as follows:</Paragraph>
    <Paragraph position="3"> (1) kare ga hon wo kaku (he writes a book) ga and wo are postpositions that mark nominative and accusative, respectively. kare ga and hon wo are case components, and kaku is a verb.</Paragraph>
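    <Paragraph> As an illustration of the structure just described, the following minimal sketch splits a romanized, pre-tokenized sentence into case components and a final verb. It assumes the surface order kare ga hon wo kaku for the sentence glossed as 'he writes a book'; the function name and the two-entry marker table are hypothetical and cover only the postpositions mentioned here.

```python
# Hypothetical, minimal inventory of case-marking postpositions from this section.
CASE_MARKERS = {"ga": "nominative", "wo": "accusative"}


def split_case_components(tokens):
    """Split a verb-final token list into case components and the verb.

    Returns a list of (noun, postposition, case) triples and the verb.
    """
    *body, verb = tokens  # head-final: the verb is the last token
    components = []
    noun = None
    for token in body:
        if token in CASE_MARKERS and noun is not None:
            # A postposition closes the case component opened by the noun.
            components.append((noun, token, CASE_MARKERS[token]))
            noun = None
        else:
            noun = token
    return components, verb


components, verb = split_case_components("kare ga hon wo kaku".split())
print(components, verb)
```

Because the case is read off the postposition rather than the position, swapping the two case components in the input yields the same triples in a different order.
    </Paragraph>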
    <Paragraph position="4"> There are two phenomena in which case markers are hidden.</Paragraph>
    <Paragraph position="5"> In Japanese, a modifying clause is placed to the left of the noun it modifies. In this paper, we call a noun modified by a clause a clausal modifiee. A clausal modifiee is usually a case component of the verb of the modifying clause. There is, however, no case marker expressing their relation.  In (2), hito 'person' has a ga 'nominative' relation to kaita 'wrote'. In (3), hon 'book' has a wo 'accusative' relation to kaita 'wrote'.</Paragraph>
    <Paragraph position="6"> There are some non-case-marking postpositions, such as wa and mo. They topicalize or emphasize noun phrases. We call them topic markers (TMs), and we call a phrase followed by one of them a TM phrase.</Paragraph>
    <Paragraph position="7">  (he wrote a book also) In (4), wa is interpreted as ga 'nominative'. In (5), mo is interpreted as wo 'accusative'.

3 Construction of the initial case frame dictionary

This section describes how to construct the initial case frame dictionary. This is the first stage of our two-stage approach, and it follows the method proposed by Kawahara and Kurohashi (2001). In the rest of this section, we describe this approach in detail.</Paragraph>
    <Paragraph position="8"> The biggest problem in automatic case frame construction is verb sense ambiguity. Verbs with different meanings should have different case frames, but it is hard to disambiguate verb senses precisely. To deal with this problem, we distinguish predicate-argument examples, which are collected from a large corpus, by coupling a verb with its closest case component. That is, examples are not distinguished by verbs such as naru 'make/become' and tsumu 'load/accumulate', but by couples such as "tomodachi ni naru" 'make a friend', "byouki ni naru" 'become sick', "nimotsu wo tsumu" 'load baggage', and "keiken wo tsumu" 'accumulate experience'. This process also produces separate case frames that have almost the same meaning or usage. For example, "nimotsu wo tsumu" 'load baggage' and "busshi wo tsumu" 'load supplies' are separate case frames. To merge these similar case frames and increase the coverage of each case frame, we cluster the case frames.</Paragraph>
    <Paragraph position="9"> We employ the following procedure for automatic case frame construction:
1. A large raw corpus is parsed by a Japanese parser, and reliable predicate-argument examples are extracted from the parse results. Nouns with a TM such as wa or mo and clausal modifiees are discarded, because their case markers cannot be determined by syntactic analysis.</Paragraph>
    <Paragraph position="10"> 2. The extracted examples are bundled according to the verb and its closest case component, making initial case frames.</Paragraph>
    <Paragraph position="11"> 3. The initial case frames are clustered using a similarity measure, resulting in the final case frames. The similarity is calculated using the NTT thesaurus.</Paragraph>
    <Paragraph position="12"> We constructed a case frame dictionary from 20 years of newspaper articles (about 20,000,000 sentences).</Paragraph>
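    <Paragraph> The bundling in step 2 of the procedure above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the predicate-argument examples have already been extracted by a parser, that each example lists its case components in surface order (so the last one is closest to the verb), and it omits the thesaurus-based clustering of step 3. The function name, the data layout, and the toy examples are assumptions.

```python
from collections import defaultdict


def build_initial_case_frames(examples):
    """Bundle examples by (closest noun, its case marker, verb).

    examples: list of (verb, [(noun, case_marker), ...]); the components
    are in surface order, so the last one is closest to the verb.
    Returns {key: {case_marker: [example nouns]}}.
    """
    frames = defaultdict(lambda: defaultdict(list))
    for verb, components in examples:
        if not components:
            continue  # no closest case component: cannot form a key
        closest_noun, closest_cm = components[-1]
        key = (closest_noun, closest_cm, verb)
        for noun, cm in components:
            frames[key][cm].append(noun)
    return frames


# Toy examples (illustrative only, not taken from the corpus).
toy_examples = [
    ("tsumu", [("kare", "ga"), ("nimotsu", "wo")]),
    ("tsumu", [("kare", "ga"), ("keiken", "wo")]),
    ("tsumu", [("torakku", "ni"), ("nimotsu", "wo")]),
]
frames = build_initial_case_frames(toy_examples)
for key, slots in frames.items():
    print(key, dict(slots))
```

Keying on the verb together with its closest case component keeps "nimotsu wo tsumu" and "keiken wo tsumu" apart even though they share the verb, which is the point of the coupling described in this section.
    </Paragraph>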
  </Section>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
4 Target expressions
</SectionTitle>
    <Paragraph position="0"> The following expressions could not be handled with the initial case frame dictionary described in section 3, because the case frames lack the necessary information.</Paragraph>
    <Paragraph position="1"> Non-gapping relation: This is the case in which the clausal modifiee is not a case component of the verb in the modifying clause, but is semantically associated with the clause.</Paragraph>
    <Paragraph position="2">  (the meeting in which he has the initiative) In this example, kaigi 'meeting' is not a case component of nigiru 'have', and there is no case relation between kaigi and nigiru. We call this relation a non-gapping relation.</Paragraph>
    <Paragraph position="3"> Double nominative sentence: This is the case in which the verb has two nominatives, in sentences such as the following.  (the engine of the car is good) In this example, wa plays the role of a nominative marker, so yoi 'good' subcategorizes two nominatives: kuruma 'car' and engine. We call the outer nominative outer ga, and we call such a sentence a double nominative sentence.</Paragraph>
    <Paragraph position="4"> Case change: In Japanese, the same meaning can be expressed with different case markers. We call this case change.  In this example, Mary has a kara 'from' relation to eta 'derive'. In this paper, we handle case change related to no 'of', such as (no, kara). The following is an example in which the outer nominative is related to the no case.</Paragraph>
    <Paragraph position="5">  (the engine of the car is good) As in this example, the outer nominative of (7) can instead appear as a nominal modifier of the inner nominative. This is the case change (no, outer ga). There is another situation, different from the above, in which an NP marked with no modifies a case component but has no case relation to the verb.  (he has the initiative in the meeting) In this example, kaigi 'meeting' has a no relation to syudoken 'initiative', but does not have a case relation to nigiru 'have'. This example is a transformation of (6), and involves the case change (no, non-gapping).</Paragraph>
  </Section>
  <Section position="5" start_page="0" end_page="0" type="metho">
    <SectionTitle>
5 Fertilization of case frame dictionary
</SectionTitle>
    <Paragraph position="0"> We construct a fertilized case frame dictionary from the initial case frame dictionary described in section 3, to handle the complicated expressions described in section 4.</Paragraph>
    <Paragraph position="1"> We apply case analysis to a large corpus using the dictionary, collect information that could not be acquired by parsing alone, and upgrade the case frame dictionary.</Paragraph>
    <Paragraph position="2"> The procedure is as follows (figure 1):  1. The initial case frames are acquired by the method shown in section 3.</Paragraph>
    <Paragraph position="3"> 2. Case analysis utilizing the case frames acquired in phase 1 is applied to a large corpus, and examples of outer nominative are collected from the case analysis results.</Paragraph>
    <Paragraph position="4">  3. Case analysis utilizing the case frames acquired in phase 2 is applied to the large corpus, and examples of non-gapping relation are collected similarly.</Paragraph>
    <Paragraph position="5"> 4. Case similarities are judged to handle case change.</Paragraph>
    <Paragraph position="6"> 5.1 Case analysis based on the initial case frame dictionary

Case analysis of TM phrases and clausal modifiees relies on a case frame dictionary. This section describes an example of case analysis utilizing the initial case frame dictionary.

CM    examples               input
nom   person, child, ...     he
acc   book, paper, ...       book
loc*  library, house, ...    library

kare 'he' and tosyokan 'library' correspond to nominative and locative, respectively, according to the surface cases. The case marker of the TM phrase "hon wa" 'book (TM)' cannot be determined from the surface case, but it is interpreted as wo 'accusative' because "hon wa" 'book (TM)' matches the accusative case slot of the case frame (underlined in the</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
5.2 Collecting examples of outer nominative
</SectionTitle>
      <Paragraph position="0"> In the initial case frame construction described in section 3, TM phrases were discarded, because their case markers could not be determined by parsing. In the example (7), "engine ga yoi"  dictionary tells a case of a TM phrase. Correspondence to outer nominative cannot be determined by case slot matching directly, but only indirectly. If the TM cannot correspond to any case slot of the initial case frame, the TM can be regarded as outer nominative. For example, in the case of (7), since the case frame of "engine ga yoi" 'the engine is good' has only a nominative slot, which corresponds to "engine", the TM of "kuruma wa" cannot correspond to any case slot and is recognized as outer nominative. On the other hand, in the case of (11), the TM of hon wa is recognized as accusative, because hon 'book' is similar to the examples of the accusative slot. We can distinguish and collect outer nominative examples in this way.</Paragraph>
      <Paragraph position="1"> We apply the following procedure to each sentence that has both a TM and ga. To reduce the influence of parsing errors, these sentences are collected only when the TM phrase has no candidate head to modify other than its verb.</Paragraph>
      <Paragraph position="2"> 1. We apply case analysis to the verb that is the head of a TM phrase. If the verb does not have a closest case component and therefore cannot select a case frame, we quit processing this sentence and proceed to the next sentence. In this phase, the TM phrase is not matched to any case of the selected case frame.</Paragraph>
      <Paragraph position="3"> 2. If every case of the case frame already corresponds to a case component in the input, the TM cannot correspond to any case slot and is regarded as outer nominative. This TM phrase is added to the outer nominative examples of the case frame.</Paragraph>
      <Paragraph position="4"> The following is an example of this process.</Paragraph>
      <Paragraph position="5">  Case analysis of this example chooses the following case frame "futan ga kakaru" 'impose a burden'.</Paragraph>
      <Paragraph position="6">
          CM    examples                   input
impose    nom*  burden                     burden
          dat   heart, legs, loins, ...    legs and loins

futan 'burden' and ashi-koshi 'legs and loins' correspond to the nominative and dative of the case frame, respectively, and sumo corresponds to none of its cases. Accordingly, the TM of "sumo wa" is recognized as outer nominative, and sumo is added to the outer nominative examples of the case frame "futan ga kakaru".</Paragraph>
      <Paragraph position="7"> This process produced outer nominative examples for 15,302 case frames (of 597 verbs).</Paragraph>
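      <Paragraph> The decision in step 2 of the procedure in this subsection can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: case frame selection (step 1) is taken as already done, the set of cases matched by the input is given, and the dictionary layout and function name are hypothetical. The two frames reuse the slots shown in this paper's examples.

```python
def collect_outer_nominative(frame, matched_cases, tm_noun):
    """Record tm_noun as outer nominative if no case slot is left unmatched.

    frame: {"slots": {case: [example nouns]}, "outer_ga": [nouns]}
    matched_cases: set of cases already filled by input case components.
    Returns True if the TM phrase was recorded as outer nominative.
    """
    unmatched = [case for case in frame["slots"] if case not in matched_cases]
    if unmatched:
        # Some slot is still free, so the TM phrase may fill it instead.
        return False
    frame["outer_ga"].append(tm_noun)
    return True


# Frame "futan ga kakaru": both slots are filled by the input.
kakaru = {
    "slots": {"nom": ["burden"], "dat": ["heart", "legs", "loins"]},
    "outer_ga": [],
}
sumo_is_outer = collect_outer_nominative(kakaru, {"nom", "dat"}, "sumo")

# Frame from section 5.1: the accusative slot is still free for "hon wa".
frame_5_1 = {
    "slots": {
        "nom": ["person", "child"],
        "acc": ["book", "paper"],
        "loc": ["library", "house"],
    },
    "outer_ga": [],
}
hon_is_outer = collect_outer_nominative(frame_5_1, {"nom", "loc"}, "hon")

print(sumo_is_outer, kakaru["outer_ga"], hon_is_outer)
```

In the second call the function only declines to record an outer nominative; deciding that "hon wa" is accusative still requires the similarity-based slot matching described in section 5.1.
      </Paragraph>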
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
5.3 Collecting examples of non-gapping relation
</SectionTitle>
      <Paragraph position="0"> Examples of non-gapping relation can be collected in a similar way to outer nominative. When a clausal modifiee has a non-gapping relation, it should not be similar to the examples of any case in the case frame, because the constructed case frames contain examples only for cases other than the non-gapping relation. From this point of view, we apply the following procedure to each example sentence that contains a modifying clause. To reduce the influence of parsing errors, example sentences are collected only when the verb in the clause has no candidate head to modify other than its clausal modifiee ("... [modifying verb] N1 no N2" is not collected).
1. We apply case analysis to the verb contained in a modifying clause. If the verb does not have a closest case component and therefore cannot select a case frame, we quit processing this sentence and proceed to the next sentence. In this phase, the clausal modifiee is not matched to any case of the selected case frame.</Paragraph>
      <Paragraph position="1"> 2. If the similarity between the clausal modifiee and the examples of any case that has no correspondence with the input case components does not exceed a threshold, this clausal modifiee is added to the examples of non-gapping relation in the case frame. We set the threshold to 0.3 empirically.</Paragraph>
      <Paragraph position="2"> The following is an example of this process.</Paragraph>
      <Paragraph position="3">  (` got a license to carry on business) Case analysis of this example chooses the following case frame "{gyomu, business} wo itonamu" 'carry on {work, business}'.</Paragraph>
      <Paragraph position="4">
            CM    examples              input
carry on    nom   bank, company, ...    -
            acc*  work, business        business

The nominative of this case frame has no correspondence with a case component of the input, so the clausal modifiee, menkyo 'license', is checked to see whether it can correspond to the nominative case examples. In this case, the similarity between menkyo 'license' and the nominative examples is not high enough. Consequently, the relation of menkyo 'license' is recognized as a non-gapping relation, and menkyo is added to the examples of non-gapping relation in the case frame "{gyomu, business} wo itonamu".</Paragraph>
      <Paragraph position="5">  (suspect that ` carried on telephone business illegally) In this case, the above case frame is also selected. Since utagai 'suspect' is not similar to the nominative case examples, it is added to case examples of non-gapping relation in the case frame.</Paragraph>
      <Paragraph position="6"> This process produced non-gapping relation examples for 23,094 case frames (of 637 verbs).</Paragraph>
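      <Paragraph> The threshold test in step 2 of this subsection can be sketched as follows. This is a minimal illustration, not the authors' implementation: the paper computes similarity with the NTT thesaurus, which is replaced here by a hypothetical lookup table of invented scores, and the function names, the noun kaisha, and the data layout are assumptions. Only the threshold value 0.3 and the case frame slots come from the text.

```python
THRESHOLD = 0.3  # value reported in the text

# Invented similarity scores standing in for a thesaurus-based measure.
TOY_SCORES = {
    ("menkyo", "bank"): 0.1,
    ("menkyo", "company"): 0.1,
    ("kaisha", "company"): 0.9,
}


def toy_similarity(word, example):
    """Hypothetical similarity lookup; unknown pairs score 0.0."""
    return TOY_SCORES.get((word, example), 0.0)


def is_non_gapping(modifiee, case_frame, matched_cases, similarity):
    """Return True if the modifiee fits none of the unmatched case slots.

    case_frame: {case: [example nouns]}
    matched_cases: cases already filled by input case components.
    """
    for case, examples in case_frame.items():
        if case in matched_cases:
            continue  # slot already taken by an input case component
        best = max((similarity(modifiee, ex) for ex in examples), default=0.0)
        if best > THRESHOLD:
            return False  # similar enough to be an ordinary case component
    return True


itonamu = {"nom": ["bank", "company"], "acc": ["work", "business"]}
menkyo_result = is_non_gapping("menkyo", itonamu, {"acc"}, toy_similarity)
kaisha_result = is_non_gapping("kaisha", itonamu, {"acc"}, toy_similarity)
print(menkyo_result, kaisha_result)
```

Passing the similarity function as a parameter keeps the threshold logic separate from whatever thesaurus or embedding measure is actually available.
      </Paragraph>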
      <Paragraph position="7"> Collecting examples of non-gapping relation for all the case frames

Non-gapping relation words that are widely distributed over verbs can be considered to have a non-gapping relation for all verbs and case frames. We add these words to the examples of non-gapping relation of all the case frames. For example, 5 verbs have menkyo 'license' (example (13)) in their non-gapping relation, and 381 verbs have utagai 'suspect' (example (14)). We consequently judge that utagai has a non-gapping relation for all the case frames. We call such a word a global non-gapping word.</Paragraph>
      <Paragraph position="8"> We treated words that have a non-gapping relation for more than 100 verbs as global non-gapping words. We acquired 128 global non-gapping words; the following are some examples (in English).</Paragraph>
      <Paragraph position="9"> possibility, necessity, result, course, case, thought, schedule, outlook, plan, chance, ...</Paragraph>
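      <Paragraph> The selection of global non-gapping words can be sketched as a count of distinct verbs per word. This is a minimal illustration under stated assumptions: the input is a list of (verb, word) pairs taken from the collected non-gapping examples, the function name and the placeholder verb identifiers are hypothetical, and only the verb counts (5 and 381) and the cutoff of more than 100 verbs come from the text.

```python
from collections import defaultdict


def global_non_gapping_words(non_gapping_examples, min_verbs=100):
    """Return words whose non-gapping relation spans more than min_verbs verbs.

    non_gapping_examples: iterable of (verb, word) pairs.
    """
    verbs_per_word = defaultdict(set)
    for verb, word in non_gapping_examples:
        verbs_per_word[word].add(verb)
    return {
        word for word, verbs in verbs_per_word.items() if len(verbs) > min_verbs
    }


# Placeholder verb identifiers; only the counts follow the text.
toy_pairs = [(f"verb_{i}", "menkyo") for i in range(5)]
toy_pairs += [(f"verb_{i}", "utagai") for i in range(381)]
global_words = global_non_gapping_words(toy_pairs)
print(global_words)
```

Using a set per word ensures that a verb contributing several case frames or several sentences is counted once.
      </Paragraph>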
    </Section>
  </Section>
</Paper>