<?xml version="1.0" standalone="yes"?>
<Paper uid="C96-2147">
  <Title>Zero Pronoun Resolution in Japanese Discourse Based on Centering Theory</Title>
  <Section position="4" start_page="0" end_page="871" type="metho">
    <SectionTitle>
2 Two Versions of the Centering Theory
</SectionTitle>
    <Paragraph position="0"> In the centering theory, each sentence has two structures associated with it: a set of discourse entities called forward-looking centers, Cfs, that appear in the sentence, and a special member of Cfs called the backward-looking center, Cb. The Cb is the discourse entity that the sentence most centrally concerns. A Cf may become a Cb later in the discourse. The set of Cfs is ordered by their grammatical properties, which are considered to reflect their degrees of salience. The centering theory specifies the following (heuristic) rule: If the Cb of the current sentence is the same as the Cb of the previous sentence, a (zero) pronoun should be used.</Paragraph>
    <Paragraph position="1"> There are two versions of the centering theory that have been applied to Japanese zero pronoun resolution: Kameyama's (Kameyama, 1986) and Walker's (Walker et al., 1994). Roughly speaking, both versions use the same forward center ranking for Japanese:</Paragraph>
    <Paragraph position="3"> where Empathy is a grammatical property that indicates the speaker's position in describing a situation. In addition to the above rule, Kameyama's version uses the property sharing constraint that two zero pronouns in adjacent sentences, which co-specify the same Cb, should share one of the grammatical properties. This constraint is used for ranking discourse entities in the order of preference as the antecedent of a zero pronoun.</Paragraph>
    <Paragraph position="4"> Walker's version, on the other hand, uses the following additional rules and constraint:</Paragraph>
  </Section>
  <Section position="5" start_page="871" end_page="871" type="metho">
    <SectionTitle>
* Constraint
</SectionTitle>
    <Paragraph position="0"> For each sentence Ui: The center, Cb(Ui), is the highest-ranked element of Cf(Ui-1) that appears in Ui.</Paragraph>
  </Section>
  <Section position="6" start_page="871" end_page="871" type="metho">
    <SectionTitle>
* Rules
</SectionTitle>
    <Paragraph position="0"> For each sentence Ui:  1. If a certain element of Cf(Ui-1) appears as a (zero) pronoun in Ui, then so is Cb(Ui).</Paragraph>
    <Paragraph position="1"> 2. Transition states are ordered, where the transition state is determined based on two factors: whether the Cb of the current sentence is the same as that of the previous sentence, and whether the Cb is the same as the highest-ranked member of the Cf of the current sentence. This transition ordering is used for ranking discourse entities in the order of preference as the antecedent of a zero pronoun.</Paragraph>
    <Paragraph position="2"> Basically, when the centering algorithm is used for (zero) pronoun resolution, the algorithm first generates all possible antecedents for the (zero) pronouns in a sentence by enumerating all possible Cb and Cf pairs for the sentence, and then filters and ranks these possible antecedents with the constraint and rules mentioned above. The Cb of the sentence is computed as a side effect of performing the (zero) pronoun resolution.</Paragraph>
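    <Paragraph> As an illustration only, the constraint and the two transition factors described above can be sketched in Python. This is a minimal sketch, not the authors' implementation: the function names are hypothetical, the role ranking in ROLE_RANK is an assumption modelled on the ranking usually given for Japanese by Walker et al. (1994), and the four transition labels are the standard centering names, which this section does not spell out. Treating an undefined previous Cb as unchanged is likewise an assumption.

```python
# Assumed ranking of grammatical roles (hypothetical; most salient first).
ROLE_RANK = ["topic", "empathy", "subject", "object2", "object", "other"]


def rank_cf(entities):
    """Order (entity, role) pairs by the assumed role ranking; return entities."""
    ordered = sorted(entities, key=lambda pair: ROLE_RANK.index(pair[1]))
    return [entity for entity, _role in ordered]


def backward_center(prev_cf, current_entities):
    """Constraint: Cb(Ui) is the highest-ranked element of Cf(Ui-1) found in Ui."""
    present = set(current_entities)
    for entity in prev_cf:  # prev_cf is already in ranked order
        if entity in present:
            return entity
    return None  # no element of Cf(Ui-1) appears in Ui


def transition(cb, prev_cb, highest_cf):
    """Classify the transition state from the two factors named in Rule 2."""
    same_cb = prev_cb is None or cb == prev_cb  # assumption for an undefined Cb
    cb_is_highest = cb == highest_cf
    if same_cb:
        return "continue" if cb_is_highest else "retain"
    return "smooth-shift" if cb_is_highest else "rough-shift"


# Toy usage with hypothetical entities.
prev_cf = rank_cf([("book", "object"), ("Taro", "topic")])
cur_cf = rank_cf([("Jiro", "object"), ("Taro", "subject")])
cb = backward_center(prev_cf, cur_cf)
print(cb, transition(cb, "Taro", cur_cf[0]))
```

    In this toy discourse the Cb stays on Taro and Taro is also the highest-ranked Cf of the current sentence, which corresponds to the most preferred transition state.</Paragraph>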
  </Section>
  <Section position="7" start_page="871" end_page="873" type="metho">
    <SectionTitle>
3 Processing Complex Sentences with the Centering Theory
</SectionTitle>
    <Paragraph position="0"> In the centering theory that we outlined in the last section, the 'sentence' that serves as the basic unit of processing is a simple sentence that contains only one predicate (verb). The centering theory, therefore, has not adequately addressed how to handle complex sentences that contain multiple verbs. However, the centering algorithms need to handle complex sentences, which are prevalent in naturally occurring discourse.</Paragraph>
    <Paragraph position="1"> We can think of (at least) two ways to handle complex sentences. For instance, consider processing a complex sentence of the form 'SX Conj SY,' where SX and SY each consist of a simple sentence and Conj is a conjunctive element (Suri and McCoy, 1994). One can imagine processing SX first and then SY as if they were a linear sequence of simple sentences, applying the centering theory to each sentence successively and updating the data structures for centering.</Paragraph>
    <Paragraph position="2"> On the other hand, the whole sentence can be treated as a single unit. This approach, however, has two problems. First, intrasentential ellipsis, in which the antecedent exists in the same sentence, cannot be handled with the centering theory, because the centering theory only handles intersentential ellipsis. Therefore, intrasentential ellipsis must be dealt with separately from intersentential ellipsis. Secondly, in the centering theory, it is unclear whether two zero pronouns with the same grammatical property in different simple sentences (of a complex sentence) can be handled simultaneously without any extension to the original theory.</Paragraph>
    <Paragraph position="3"> Comparing these two approaches, we adopt the former. If a sentence contains multiple verbs, we partition it into multiple simple sentences and apply the centering theory to the sequence of partitioned simple sentences individually for zero pronoun resolution. Using this approach, we need not modify the original centering algorithm drastically to handle complex sentences. Even intrasentential ellipsis can be handled with the centering theory, because after partitioning the antecedent and the zero pronoun are contained in different simple sentences.</Paragraph>
    <Paragraph position="4"> 3.1 The range of search for the antecedent
    Since the centering theory uses only the information in the previous and current sentences, this might be problematic when we adopt the 'partition' approach. For example, if the previous sentence consists of three simple sentences, the first simple sentence in the previous sentence becomes the third from the current sentence after partitioning. As a result of partitioning, the information in the previous and current (post-partitioned simple) sentences might not even include the information in the current (pre-partitioned) sentence. We think this is inadequate, since the antecedents of zero pronouns often appear in the previous (pre-partitioned) sentence.</Paragraph>
    <Paragraph position="5"> Therefore, it is necessary to extend the range of search for the antecedent to earlier (post-partitioned simple) sentences.</Paragraph>
    <Paragraph position="6"> To determine to what extent we should extend the range of search for the antecedent, we make the following investigations and experiment:
    * How many simple sentences does a naturally occurring sentence consist of?
    * How many sentences away from the current sentence do we find the antecedent of a zero pronoun in real discourses?
    * How does the accuracy of the zero pronoun resolution change if we vary the range of simple sentences where the antecedent of a zero pronoun is searched?
    The first investigation is performed manually, and the result shows that 10,000 sentences of the review articles from the newspaper consist of 24,332 simple sentences. Therefore, a naturally occurring Japanese sentence can be considered to consist of 2.0 to 2.5 simple sentences on average.</Paragraph>
    <Paragraph position="7"> The second investigation is performed manually on one of the test discourses that are mentioned in the next section, and the result shows that 95% of the antecedents appear in the previous or current (pre-partitioned) sentence. This result is consistent with the larger-scale investigation that Fujisawa et al. (Fujisawa et al., 1991) made for the same purpose. Fujisawa's investigation, on 1,087 sentences of the scientific journal and 1,426 sentences of the review articles from the newspaper, showed that 87.6% of the antecedents appeared in the previous or current sentence and 95.1% appeared in the previous two sentences or current sentence.</Paragraph>
    <Paragraph position="8"> The third experiment is performed on two of the test discourses in the next section, by implementing the two versions of the centering algorithm that are mentioned in the last section and varying the range of simple sentences where the antecedent of a zero pronoun is searched from the previous sentence to the previous ten sentences.</Paragraph>
    <Paragraph position="9"> The experiment shows that the accuracy improves until the previous 2 to 4 sentences are searched, but degrades after that.</Paragraph>
    <Paragraph position="10"> Taking all these results into account, we decide that the antecedents are searched for in the previous four simple sentences. Since the antecedent tends to appear in a sentence closer to the zero pronoun, as Fujisawa's investigation indicates, we adopt the following forward center ranking among the Cfs of the previous four simple sentences: $C_f^1 &gt; C_f^2 &gt; C_f^3 &gt; C_f^4$, where $C_f^n$ represents the Cf of the n-th simple sentence from the current sentence.</Paragraph>
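    <Paragraph> The four-sentence search window and the nearest-first ranking can be pictured with a small Python sketch. It is only an illustration under stated assumptions: the class name CenterHistory and its methods are hypothetical, each Cf list is assumed to be already ranked by grammatical properties, and keeping only the best-ranked occurrence of a repeated entity is a simplification that the text does not specify.

```python
from collections import deque

WINDOW = 4  # number of previous simple sentences searched, as chosen in the text


class CenterHistory:
    """Keeps the ranked Cf lists of the most recent simple sentences."""

    def __init__(self, window=WINDOW):
        # Nearest sentence is stored first; the oldest one falls off the end.
        self._recent = deque(maxlen=window)

    def push(self, ranked_cf):
        """Record the ranked Cf list of the simple sentence just processed."""
        self._recent.appendleft(list(ranked_cf))

    def candidates(self):
        """Antecedent candidates, nearest sentence first, duplicates dropped."""
        seen = set()
        ordered = []
        for cf in self._recent:
            for entity in cf:
                if entity not in seen:
                    seen.add(entity)
                    ordered.append(entity)
        return ordered


# Toy usage with hypothetical entities: five simple sentences are processed,
# so the Cf list of the first one is no longer inside the window.
history = CenterHistory()
for cf in (["a"], ["b"], ["c", "a"], ["d"], ["e"]):
    history.push(cf)
print(history.candidates())
```

    A bounded deque is used here because the window only ever needs the most recent few sentences; any structure that drops the oldest entry would serve equally well.</Paragraph>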
    <Section position="1" start_page="872" end_page="873" type="sub_section">
      <SectionTitle>
3.2 Taking into account the information of conjunctive postpositions
</SectionTitle>
      <Paragraph position="0"> Even if the antecedents are searched for in the previous four simple sentences, the simple 'partition' approach might not yield good performance, because the information of the conjunctive postpositions that stand between two adjacent simple sentences is not taken into account. For example, consider the following sentences:  (a) Taro wa issyoukenmei benkyou siteita.</Paragraph>
      <Paragraph position="1"> (b) Jiro ga koe wo kake temo,  kizukanakatta.</Paragraph>
      <Paragraph position="2"> These sentences are partitioned into the following simple sentences:  he did not notice him.</Paragraph>
      <Paragraph position="3"> Here φ represents a zero pronoun. Applying the centering algorithm to these sentences, the process becomes as follows:</Paragraph>
      <Paragraph position="5"> Therefore, the counter-intuitive interpretation that 'Jiro did not notice Taro' is obtained.</Paragraph>
      <Paragraph position="6"> Since two adjacent simple sentences in a complex sentence are combined by a conjunctive postposition that indicates the relationship between them, using the information of the conjunctive postposition might improve the performance of the zero pronoun resolution.</Paragraph>
      <Paragraph position="7"> To clarify how the zero pronoun resolution relies on the information of conjunctive postpositions, we investigate whether the noun phrases with the same grammatical property agree in two adjacent simple sentences that have a conjunctive postposition between them, by extracting sentences with conjunctive postpositions from the review articles in the newspaper and enumerating the agreement and disagreement. The enumeration is performed for two cases: where both sentences have zero pronouns, and where only one of the sentences has a zero pronoun.
      3 The first sentence in a discourse has no Cb.</Paragraph>
      <Paragraph position="8"> Twelve main conjunctive postpositions are investigated. The result of the investigation is quite similar to Yoshimoto's and Minami's investigations (Yoshimoto, 1986; Minami, 1974), which classify the conjunctive postpositions into three classes:
      * Class A: 'nagara' ('while'), 'tari' ('and'), 'tutu' ('while'), 'te' ('and') 4
      If two sentences have a conjunctive postposition of class A between them, the subject noun phrases tend to coincide, both in the case where both sentences have zero pronouns and in the case where only one of the sentences has a zero pronoun.</Paragraph>
      <Paragraph position="9"> * Class B: 'temo' ('although'), 'node' ('because'), 'noni' ('although'), 'keredo' ('although'), 'ba' ('if'), 'kara' ('because'), 'to' ('when')
      If two sentences have a conjunctive postposition of class B between them, the antecedent tends not to be the subject of the other sentence, in the case where only one of the sentences has a zero pronoun in the subject position. In the case where both sentences have zero pronouns, the agreement/disagreement depends on the context and shows no tendency.
      * Class C: 'ga' ('but')
      The agreement/disagreement depends on the context and shows no tendency.</Paragraph>
      <Paragraph position="10"> From this result of the investigation, we decide to apply the following heuristics concerning conjunctive postpositions to the zero pronoun resolution. Since conjunctive postpositions of class A show a strong preference for the two subjects in adjacent sentences to coincide, we use this preference instead of the centering algorithm for the zero pronoun resolution in the simple sentence after a conjunctive postposition of class A, and try to find the antecedents of zero pronouns in the same position of the adjacent sentence, if any. In this case also, the center of the current sentence is computed similarly to the ordinary algorithm, and the antecedent of the zero pronoun becomes the Cb of the current sentence. In the case of conjunctive postpositions of class B, the antecedent tends not to be the subject of the other sentence if one of the sentences has a zero pronoun in the subject position. We think this tendency implies that noun phrases in the sentence before a conjunctive postposition of class B tend not to be the antecedents of zero pronouns in the next sentence. Therefore, we give these noun phrases the least preference as antecedents, although the zero pronoun resolution is performed by the original centering algorithm.</Paragraph>
      <Paragraph position="11"> 4 In parentheses, we show the direct translation of conjunctive postpositions into English.</Paragraph>
      <Paragraph position="12"> Since conjunctive postpositions of class C have no preference for the antecedents of zero pronouns, the zero pronoun resolution is performed as usual. Consider again the following sentences:  he did not notice him.</Paragraph>
      <Paragraph position="13"> If the original centering algorithm is applied to each sentence uniformly, the counter-intuitive interpretation is obtained, as mentioned above.</Paragraph>
      <Paragraph position="14"> Taking into account the information of conjunctive postpositions and applying the above heuristics, since (b1) and (b2) have the conjunctive postposition of class B, 'temo' ('although'), between them, the noun phrases in sentence (b1) have the least preference, and the order of Cf in sentence (b1) becomes the opposite of that in the original centering algorithm.</Paragraph>
      <Paragraph position="15"> Therefore, the antecedents of the zero pronouns in sentence (b2) are identified as follows:</Paragraph>
      <Paragraph position="17"> Here this interpretation, that 'Taro did not notice Jiro', fits our intuition.</Paragraph>
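      <Paragraph> The three heuristics can be summarized in a short Python sketch. This is an assumption-laden illustration rather than the procedure used in the paper: the function order_candidates and its data layout are hypothetical, the romanized postposition spellings follow our reading of the class lists, demoting every noun phrase of the adjacent sentence to the end of the list is only one possible way to realize 'least preference' for class B, and a missing postposition is treated like class C.

```python
# Postposition classes as listed in the text (romanized forms).
POSTPOSITION_CLASS = {
    "nagara": "A", "tari": "A", "tutu": "A", "te": "A",
    "temo": "B", "node": "B", "noni": "B", "keredo": "B",
    "ba": "B", "kara": "B", "to": "B",
    "ga": "C",
}


def order_candidates(history, postposition, role="subject"):
    """Rank antecedent candidates for a zero pronoun in the given role.

    history: ranked lists of (entity, role) pairs for the previous simple
             sentences, nearest sentence first.
    postposition: conjunctive postposition linking the nearest previous simple
             sentence to the current one, or None at a sentence boundary.
    """
    cls = POSTPOSITION_CLASS.get(postposition, "C")  # assumed default
    adjacent = [entity for entity, _ in history[0]]
    older = list(dict.fromkeys(
        entity for cf in history[1:] for entity, _ in cf
        if entity not in adjacent
    ))
    if cls == "A":
        # Class A: prefer the noun phrase in the same position of the
        # adjacent sentence, then fall back to the usual order.
        same_role = [entity for entity, r in history[0] if r == role]
        rest = [e for e in adjacent + older if e not in same_role]
        return same_role + rest
    if cls == "B":
        # Class B: noun phrases of the adjacent sentence get least preference.
        return older + adjacent
    # Class C or no postposition: nearest sentence first, as usual.
    return adjacent + older


# The Taro/Jiro example: (b1) precedes (b2) and is linked to it by 'temo'.
history = [[("Jiro", "subject")], [("Taro", "topic")]]
print(order_candidates(history, "temo"))  # ['Taro', 'Jiro']
print(order_candidates(history, None))    # ['Jiro', 'Taro']
```

      With 'temo' the subject zero pronoun of (b2) is resolved to Taro first, which matches the intuitive reading 'Taro did not notice Jiro'; without the heuristic the nearest subject, Jiro, would be preferred.</Paragraph>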
    </Section>
  </Section>
</Paper>