<?xml version="1.0" standalone="yes"?>
<Paper uid="C02-1078">
  <Title>A Probabilistic Method for Analyzing Japanese Anaphora Integrating Zero Pronoun Detection and Resolution</Title>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
2 A System for Analyzing Japanese
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
Zero Pronouns
2.1 Overview
</SectionTitle>
      <Paragraph position="0"> Figure 1 depicts the overall design of our system to analyze Japanese zero pronouns. We explain the entire process based on this figure.</Paragraph>
      <Paragraph position="1"> First, given an input Japanese text, our system performs morphological and syntactic analyses. In the case of Japanese, morphological analysis involves word segmentation and part-of-speech tagging because Japanese sentences lack lexical segmentation, for which we use the JUMAN morphological analyzer (Kurohashi and Nagao, 1998b). Then, we use the KNP parser (Kurohashi, 1998) to identify syntactic relations between segmented words.</Paragraph>
      <Paragraph position="2"> Second, in a zero pronoun detection phase, the system uses syntactic relations to detect omitted cases (nominative, accusative, and dative) as zero pronoun candidates. To avoid overdetecting zero pronouns, we use the IPAL verb dictionary (Information-technology Promotion Agency, 1987), which includes case frames associated with 911 Japanese verbs. We discard zero pronoun candidates that are unlisted in the case frames associated with the verb in question.</Paragraph>
      <Paragraph position="3"> For verbs unlisted in the IPAL dictionary, only nominative cases are regarded as obligatory. The system also computes the probability that case c related to target verb v is a zero pronoun, P_zero(c|v), to select plausible zero pronoun candidates.</Paragraph>
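As a concrete sketch of this detection step, the following toy function mirrors the rule just described. The CASE_FRAMES entries are hypothetical stand-ins for IPAL dictionary entries, not actual IPAL data.

```python
# Toy stand-in for the IPAL case-frame dictionary (the real dictionary
# lists case frames for 911 Japanese verbs).
CASE_FRAMES = {
    "taberu": {"ga", "wo"},  # "eat": nominative and accusative obligatory
    "iku": {"ga", "ni"},     # "go": nominative and dative obligatory
}

def detect_zero_pronouns(verb, filled_cases):
    """Return the obligatory cases of `verb` left unfilled in the clause.

    For verbs unlisted in the dictionary, only the nominative (ga) is
    regarded as obligatory, as described in the text.
    """
    obligatory = CASE_FRAMES.get(verb, {"ga"})
    return obligatory - set(filled_cases)
```

For example, detect_zero_pronouns("taberu", ["wo"]) flags the unfilled nominative as a zero pronoun candidate.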
      <Paragraph position="4"> Ideally, in the case where a verb in question is polysemous, word sense disambiguation is needed to select the appropriate case frame, because different verb senses often correspond to different case frames. However, we currently merge the multiple case frames for a verb into a single frame so as to avoid the polysemy problem. This issue needs to be further explored.</Paragraph>
      <Paragraph position="6"> Third, in a zero pronoun resolution (i.e., antecedent identification) phase, for each zero pronoun the system extracts antecedent candidates from the preceding contexts, which are ordered according to the extent to which they can be the antecedent for the target zero pronoun. From the viewpoint of probability theory, our task here is to compute the probability that zero pronoun ph refers to antecedent a_i, P(a_i|ph), and to select the candidate that maximizes this probability. For the purpose of computing this score, we model zero pronouns and antecedents in Section 2.2.</Paragraph>
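The candidate selection just described is an argmax over antecedent candidates. A minimal sketch, where `prob` is a hypothetical callable standing in for the probabilistic model of Sections 2.2 and 2.3:

```python
def select_antecedent(candidates, prob):
    """Return the antecedent candidate a maximizing P(a | ph).

    `prob` is any callable that scores a candidate; it stands in for the
    resolution model described in the text.
    """
    return max(candidates, key=prob)

# Invented scores for three candidate antecedents:
scores = {"Taro": 0.6, "hon": 0.1, "Hanako": 0.3}
best = select_antecedent(scores, scores.get)
```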
      <Paragraph position="7"> Finally, the system outputs texts containing anaphoric relations. In addition, the number of zero pronouns analyzed by the system can optionally be controlled based on the certainty score described in Section 2.4.</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.2 Modeling Zero Pronouns and
Antecedents
</SectionTitle>
      <Paragraph position="0"> According to past literature associated with zero pronoun resolution and our preliminary study, we use the following six features to model zero pronouns and antecedents.</Paragraph>
      <Paragraph position="1"> * Features for zero pronouns
- Verbs that govern zero pronouns (v), which denote the verbs whose cases are omitted.</Paragraph>
      <Paragraph position="2"> - Surface cases related to zero pronouns (c),  for which possible values are Japanese case marker suffixes, ga (nominative), wo (accusative), and ni (dative). Those values indicate which cases are omitted.</Paragraph>
    </Section>
  </Section>
  <Section position="4" start_page="0" end_page="25" type="metho">
    <SectionTitle>
* Features for antecedents
</SectionTitle>
    <Paragraph position="0"> - Post-positional particles (p), which play crucial roles in resolving Japanese zero pronouns (Kameyama, 1986; Walker et al., 1994).</Paragraph>
    <Paragraph position="1"> - Distance (d), which denotes the distance (proximity) between a zero pronoun and an antecedent candidate in an input text. In the case where they occur in the same sentence, its value is 0. In the case where an antecedent occurs n sentences before the sentence including a zero pronoun, its value is n.</Paragraph>
    <Paragraph position="2"> - Constraint related to relative clauses (r), which denotes whether or not an antecedent is included in a relative clause. In the case where it is included, the value of r is true; otherwise it is false. The rationale behind this feature is that Japanese zero pronouns tend not to refer to noun phrases in relative clauses.</Paragraph>
    <Paragraph position="3"> - Semantic classes (n), which represent semantic classes associated with antecedents. We use 544 semantic classes defined in the Japanese Bunruigoihyou thesaurus (National Language Research Institute, 1964), which contains 55,443 Japanese nouns.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.3 Our Probabilistic Model for Zero
Pronoun Detection and Resolution
</SectionTitle>
      <Paragraph position="0"> We consider the probability that unsatisfied case c related to verb v is a zero pronoun, P_zero(c|v), and the probability that zero pronoun ph_c refers to antecedent a_i, P(a_i|ph_c). Thus, the probability that case c (ph_c) is zero-pronominalized and refers to candidate a_i is formalized as the product of the two in Equation (1). P_zero(c|v) and P(a_i|ph_c) are computed in the detection and resolution phases, respectively (see Figure 1).</Paragraph>
      <Paragraph position="7"> Since zero pronouns are omitted obligatory cases, whether or not case c is a zero pronoun depends on the extent to which case c is obligatory for verb v. Case c is likely to be obligatory for verb v if c frequently co-occurs with v. Thus, we compute P_zero(c|v) based on the co-occurrence frequency of &lt;v,c&gt; pairs, which can be extracted from unannotated corpora.</Paragraph>
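The excerpt does not quote the exact estimator, but a natural reading is the relative co-occurrence frequency of case c among all cases of v. The sketch below uses invented counts and also folds in the rule, stated in the text, that ga is fixed to probability 1:

```python
from collections import Counter

# Invented <v,c> co-occurrence counts standing in for counts extracted
# from unannotated corpora.
cooc = Counter({("yomu", "wo"): 60, ("yomu", "ni"): 20, ("yomu", "ga"): 120})

def p_zero(c, v, cooc=cooc):
    """P_zero(c|v) estimated as the relative co-occurrence frequency of
    case c with verb v (our assumed estimator); ga is fixed to 1 because
    it is obligatory for most Japanese verbs.
    """
    if c == "ga":
        return 1.0
    total = sum(n for (verb, _case), n in cooc.items() if verb == v)
    return cooc[(v, c)] / total if total else 0.0
```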
      <Paragraph position="9"> P_zero(c|v) takes 1 in the case where c is ga (nominative), regardless of the target verb, because ga is obligatory for most Japanese verbs.</Paragraph>
      <Paragraph position="10"> Given the formal representation for zero pronouns and antecedents in Section 2.2, the probability, P(a|ph), is expressed as in Equation (2).</Paragraph>
      <Paragraph position="12"> To improve the efficiency of probability estimation, we decompose the right-hand side of Equation (2) as follows.</Paragraph>
      <Paragraph position="13"> Since a preliminary study showed that d_i and r_i were relatively independent of the other features, we approximate Equation (2) as in Equation (3), and we can further approximate Equation (3) to derive Equation (4).</Paragraph>
      <Paragraph position="19"> Here, the first three factors, which involve p_i, d_i, and r_i, are related to syntactic properties, and the remaining factor, P(n_i|v,c), is a semantic property associated with zero pronouns and antecedents. We shall call the former and latter the "syntactic" and "semantic" models, respectively.</Paragraph>
      <Paragraph position="24"> Each parameter in Equation (4) is computed as in Equations (5), where F(x) denotes the frequency of x in corpora annotated with anaphoric relations.</Paragraph>
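Equations (5) are relative-frequency (maximum likelihood) estimates over annotated corpora. A minimal sketch over invented (v, c, x) triples, where x is any single feature value observed with an annotated zero pronoun:

```python
from collections import Counter

def mle_conditional(triples):
    """Estimate P(x | v, c) = F(v, c, x) / F(v, c) from annotated
    (v, c, x) triples, mirroring the relative-frequency form of
    Equations (5)."""
    joint = Counter(triples)
    marginal = Counter((v, c) for v, c, _x in triples)
    return {(v, c, x): n / marginal[(v, c)] for (v, c, x), n in joint.items()}

# Invented annotated triples (verb, case, antecedent particle):
probs = mle_conditional([
    ("yomu", "ga", "wa"), ("yomu", "ga", "wa"), ("yomu", "ga", "ga"),
])
```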
      <Paragraph position="26"> Since estimating the semantic model, P(n_i|v,c), needs large-scale annotated corpora, the data sparseness problem is crucial. Thus, we explore the use of unannotated corpora.</Paragraph>
      <Paragraph position="28"> In P(n_i|v,c), v and c are features for a zero pronoun, and n_i is a feature for an antecedent. However, we can regard v, c, and n_i as features for a verb and its case noun, because zero pronouns are omitted case nouns. Thus, it is possible to estimate the probability based on co-occurrences of verbs and their case nouns, which can be extracted automatically from large-scale unannotated corpora.</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.4 Computing Certainty Score
</SectionTitle>
      <Paragraph position="0"> Since zero pronoun analysis is not a stand-alone application, our system is used as a module in other NLP applications, such as machine translation. In those applications, it is desirable that erroneous anaphoric relations are not generated.</Paragraph>
      <Paragraph position="1"> Thus, we propose a notion of certainty to output only zero pronouns that are detected and resolved with a high certainty score.</Paragraph>
      <Paragraph position="2"> We formalize the certainty score, C(ph_c, a_i), in terms of the probabilities computed above.</Paragraph>
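The excerpt does not preserve the definition of C, but its use is a simple threshold filter over the system's output. A sketch under that assumption, with invented data:

```python
def filter_by_certainty(analyses, threshold):
    """Keep only the anaphoric relations whose certainty score meets the
    threshold; raising the threshold trades coverage for accuracy, as in
    the coverage-accuracy curves of Section 3.2.
    """
    return [(zp, ant) for zp, ant, certainty in analyses if certainty >= threshold]

# Invented analyses: (zero_pronoun_id, antecedent, certainty score).
results = filter_by_certainty([("ph1", "Taro", 0.9), ("ph2", "hon", 0.2)], 0.5)
```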
    </Section>
    <Section position="3" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.1 Methodology
</SectionTitle>
      <Paragraph position="0"> To investigate the performance of our system, we used Kyotodaigaku Text Corpus version 2.0 (Kurohashi and Nagao, 1998a), in which 20,000 articles from the 1995 Mainichi Shimbun newspaper were analyzed by JUMAN and KNP (i.e., the morphological and syntactic analyzers used in our system) and revised manually. From this corpus, we randomly selected 30 general articles (e.g., politics and sports) and manually annotated those articles with anaphoric relations for zero pronouns. The number of zero pronouns contained in those articles was 449.</Paragraph>
      <Paragraph position="1"> We used a leave-one-out cross-validation evaluation method: we conducted 30 trials in each of which one article was used as a test input and the remaining 29 articles were used for producing a syntactic model. We used six years worth of Mainichi Shimbun newspaper articles (Mainichi Shimbunsha, 1994-1999) to produce a semantic model based on co-occurrences of verbs and their case nouns.</Paragraph>
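The leave-one-out protocol above can be sketched as follows; each of the 30 articles serves once as the test input while the remainder trains the syntactic model.

```python
def leave_one_out(articles):
    """Yield (test_article, training_articles) pairs, one trial per
    article, as in the 30-trial cross-validation described above."""
    for i, test in enumerate(articles):
        yield test, articles[:i] + articles[i + 1:]

# Toy corpus of three articles:
splits = list(leave_one_out(["a1", "a2", "a3"]))
```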
      <Paragraph position="2"> To extract verbs and their case noun pairs from newspaper articles, we performed a morphological analysis by JUMAN and extracted dependency relations using a relatively simple rule: we assumed that each noun modifies the verb of highest proximity. As a result, we obtained 12 million co-occurrences associated with 6,194 verb types. Then, we generalized the extracted nouns into semantic classes in the Japanese Bunruigoihyou thesaurus. In the case where a noun was associated with multiple classes, the noun was assigned to all possible classes. In the case where a noun was not listed in the thesaurus, the noun itself was regarded as a single semantic class.</Paragraph>
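The nearest-verb rule for extracting verb and case noun co-occurrences can be sketched as follows, over an invented token sequence; the real processing runs on JUMAN's morphological output rather than this toy representation.

```python
def extract_cooccurrences(tokens):
    """Attach each case-marked noun to the nearest following verb
    (Japanese is head-final), approximating the simple dependency rule
    described in the text.

    `tokens` is a list of (surface, pos) pairs in sentence order.
    """
    pairs, pending = [], []
    for surface, pos in tokens:
        if pos == "noun":
            pending.append(surface)
        elif pos == "verb":
            pairs.extend((surface, noun) for noun in pending)
            pending = []
    return pairs

# Invented tokens for "Taro-ga hon-wo yomu" (Taro reads a book):
pairs = extract_cooccurrences(
    [("Taro-ga", "noun"), ("hon-wo", "noun"), ("yomu", "verb")]
)
```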
    </Section>
    <Section position="4" start_page="0" end_page="25" type="sub_section">
      <SectionTitle>
3.2 Comparative Experiments
</SectionTitle>
      <Paragraph position="0"> Fundamentally, our evaluation is two-fold: we evaluated only zero pronoun resolution (antecedent identification) and a combination of detection and resolution. In the former case, we assumed that all the zero pronouns were correctly detected and investigated the effectiveness of the resolution model, P(a_i|ph). In the latter case, we investigated the effectiveness of the combined model in Equation (1).</Paragraph>
      <Paragraph position="4"> First, we compared the performance of different models for zero pronoun resolution. As a control (baseline) model, we took approximately two man-months to develop a rule-based model (Rule) through an analysis of ten articles in Kyotodaigaku Text Corpus. This model uses rules typically used in existing rule-based methods: 1) post-positional particles that follow antecedent candidates, 2) proximity between zero pronouns and antecedent candidates, and 3) conjunctive particles. We did not use semantic properties in the rule-based method because they decreased the system accuracy in a preliminary study.</Paragraph>
      <Paragraph position="5"> Table 1 shows the results, where we regarded the k-best antecedent candidates as the final output and compared results for different values of k. In the case where the correct answer was included in the k-best candidates, we judged it correct. In addition, "Accuracy" is the ratio between the number of zero pronouns whose antecedents were correctly identified and the number of zero pronouns correctly detected by the system (404 for all the models). Bold figures denote the highest performance for each value of k across different models. Here, the average number of antecedent candidates per zero pronoun was 27 regardless of the model, and thus the accuracy was 3.7% in the case where the system randomly selected antecedents.</Paragraph>
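The k-best criterion used in Table 1 can be made precise with a small sketch; the candidate lists and gold antecedents below are invented for illustration.

```python
def k_best_accuracy(system_output, gold, k):
    """Fraction of zero pronouns whose gold antecedent appears among the
    system's top-k ranked candidates (the criterion used in Table 1)."""
    correct = sum(
        1 for zp, ranked in system_output.items() if gold[zp] in ranked[:k]
    )
    return correct / len(system_output)

# Invented ranked candidate lists and gold antecedents:
output = {"ph1": ["Taro", "hon"], "ph2": ["inu", "Hanako"]}
gold = {"ph1": "Taro", "ph2": "Hanako"}
```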
      <Paragraph position="6"> Looking at the results for the two different semantic models, Sem2 outperformed Sem1, which indicates that the use of co-occurrences of verbs and their case nouns was effective in identifying antecedents and in avoiding the data sparseness problem when producing a semantic model.</Paragraph>
      <Paragraph position="7"> The syntactic model, Syn, outperformed the two semantic models, and therefore the syntactic features used in our model were more effective than the semantic features for identifying antecedents. When both syntactic and semantic models were used in Both2, the accuracy was further improved. While the rule-based method, Rule, achieved a relatively high accuracy, our complete model, Both2, outperformed Rule irrespective of the value of k. To sum up, we conclude that both syntactic and semantic models were effective in identifying appropriate anaphoric relations.</Paragraph>
      <Paragraph position="8"> At the same time, since our method requires annotated corpora, the relation between the corpus size and accuracy is crucial. Thus, we performed two additional experiments associated with Both2.</Paragraph>
      <Paragraph position="9"> In the first experiment, we varied the number of annotated articles used to produce a syntactic model, where a semantic model was produced based on six years worth of newspaper articles. In the second experiment, we varied the number of unannotated articles used to produce a semantic model, where a syntactic model was produced based on 29 annotated articles. In Figure 2, which plots the relation between corpus size and accuracy for the combination of syntactic and semantic models (Both2), we show the two independent results together as space is limited: the dashed and solid graphs correspond to the results of the first and second experiments, respectively. Given all the articles for modeling, the resultant accuracy for each experiment was 50.7%, which corresponds to that for Both2 with k = 1 in Table 1.</Paragraph>
      <Paragraph position="11"> In the case where the number of articles was varied in producing a syntactic model, the accuracy improved rapidly over the first five articles. This indicates that a high accuracy can be obtained with a relatively small number of supervised articles. In the case where the amount of unannotated corpora was varied in producing a semantic model, the accuracy improved marginally as the corpus size increased. However, note that we do not need human supervision to produce a semantic model.</Paragraph>
      <Paragraph position="12"> Finally, we evaluated the effectiveness of the combination of zero pronoun detection and resolution in Equation (1). To investigate the contribution of the detection model, P_zero(c|v), we also used the resolution model, P(a_i|ph), alone for comparison. Both cases used Both2 to compute the probability for zero pronoun resolution. We varied a threshold for the certainty score to plot coverage-accuracy graphs for zero pronoun detection (Figure 3) and antecedent identification (Figure 4).</Paragraph>
      <Paragraph position="16"> In Figure 3, "coverage" is the ratio between the number of zero pronouns correctly detected by the system and the total number of zero pronouns in input texts, and "accuracy" is the ratio between the number of zero pronouns correctly detected and the total number of zero pronouns detected by the system. Note that since our system failed to detect a number of zero pronouns, the coverage could not reach 100%.</Paragraph>
      <Paragraph position="17"> Figure 3 shows that as the coverage decreased, the accuracy improved, irrespective of the model used. When compared with the case of P(a_i|ph) alone, our combined model, which uses P_zero(c|v), achieved a higher accuracy regardless of the coverage. In Figure 4, "coverage" is the ratio between the number of zero pronouns whose antecedents were generated and the number of zero pronouns correctly detected by the system. The accuracy was improved by decreasing the coverage, and our model marginally improved the accuracy over P(a_i|ph).</Paragraph>
      <Paragraph position="20"> According to the above results, our model was effective in improving the accuracy of zero pronoun detection and did not have a side effect on the antecedent identification process. As a result, the overall accuracy of zero pronoun detection and resolution was improved.</Paragraph>
    </Section>
  </Section>
</Paper>