<?xml version="1.0" standalone="yes"?>
<Paper uid="C94-2117">
  <Title>AN EMPIRICAL STUDY ON THE GENERATION OF ZERO ANAPHORS IN CHINESE</Title>
  <Section position="5" start_page="732" end_page="734" type="evalu">
    <SectionTitle>
3 Results
</SectionTitle>
    <Paragraph position="0"> Having done this, we carried out similar experiments with enhanced rules.</Paragraph>
    <Section position="1" start_page="732" end_page="732" type="sub_section">
      <SectionTitle>
3.1 Effect of using Rule 1 and adding syntactic constraints
</SectionTitle>
      <Paragraph position="0"> In Sets 1 and 2 of the testing data, there are 651 and 149 anaphors, respectively. The result of applying the algorithm of Rule 1 to the data is shown in Table 1. The data contain 7 and 1 long-distance zero anaphors, respectively, for which the algorithm decides to use non-zero ones in the corresponding positions; consequently, they belong to the missing type. As the result in Table 1 shows, the performance of the algorithm is obviously unpromising.</Paragraph>
      <Paragraph position="1"> There are certain syntactic constraints on zero anaphora, regardless of discourse factors, as shown in [Li and Thompson 79, Li and Thompson 81]. Therefore, we enhanced Rule 1 by adding these syntactic constraints on zero anaphora, which gives Rule 1a below. Rule 1a can alternatively be represented as a decision tree, as in Fig. 1, where internal nodes are conditions in the rule and leaf nodes are decisions about the anaphor type, either zero or non-zero.</Paragraph>
      <Paragraph position="2"> Rule 1a: If an entity, e, in the current utterance was referred to in the immediately preceding utterance and does not violate any syntactic constraint on zero anaphora, then a zero anaphor is used for e; otherwise a non-zero anaphor is used.</Paragraph>
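      <Paragraph> As an illustration only, Rule 1a can be sketched as a small decision function. The class and field names below (Mention, in_previous_utterance, violates_syntax) are hypothetical, and the sketch assumes both conditions have already been annotated for each anaphor; it is not the implementation used in the experiments.

```python
from dataclasses import dataclass


@dataclass
class Mention:
    """One anaphor candidate; both fields are hypothetical annotations."""
    in_previous_utterance: bool  # entity was referred to in the preceding utterance
    violates_syntax: bool        # a zero form would break a syntactic constraint


def rule_1a(m: Mention) -> str:
    """Return "zero" or "non-zero" following the wording of Rule 1a."""
    if m.in_previous_utterance and not m.violates_syntax:
        return "zero"
    return "non-zero"


print(rule_1a(Mention(True, False)))  # zero
print(rule_1a(Mention(True, True)))   # non-zero
```

Dropping the violates_syntax test would give a plausible reading of the original Rule 1, whose exact wording is not repeated in this section.</Paragraph>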
      <Paragraph position="3"> In Table 1, by using Rule 1a, the correct cases increase from 408 to 510 and from 98 to 126 for Sets 1 and 2, respectively. Though Rule 1a improves on its ancestor's performance, the result still discourages us from using it for the generation of zero anaphors.</Paragraph>
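      <Paragraph> To put these counts in proportion, the share of correctly generated anaphors can be derived from the totals given earlier (651 and 149 anaphors). The percentages are not stated in the text; the sketch below merely computes them from the reported counts, and the names TOTALS, CORRECT and accuracy are illustrative.

```python
# Anaphor totals and correct counts as reported in the text.
TOTALS = {"Set 1": 651, "Set 2": 149}
CORRECT = {
    "Rule 1": {"Set 1": 408, "Set 2": 98},
    "Rule 1a": {"Set 1": 510, "Set 2": 126},
}


def accuracy(rule, data_set):
    """Share of anaphors whose form the rule predicts correctly."""
    return CORRECT[rule][data_set] / TOTALS[data_set]


for rule in CORRECT:
    for data_set in TOTALS:
        print(f"{rule}, {data_set}: {accuracy(rule, data_set):.1%}")
```

Under these counts, Rule 1 is correct for roughly 63% and 66% of the anaphors in Sets 1 and 2, and Rule 1a for roughly 78% and 85%.</Paragraph>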
    </Section>
    <Section position="2" start_page="732" end_page="733" type="sub_section">
      <SectionTitle>
3.2 The effect of adding discourse structure
</SectionTitle>
      <Paragraph position="0"> Grosz and Sidner suggest that three structures can be identified within a discourse: linguistic structure, intentional structure, and attentional state [Grosz and Sidner 86]. An important idea in the theory is the mutual effect between the linguistic expressions in utterances constituting the discourse and the discourse segment structure.</Paragraph>
      <Paragraph position="2"> Table 1: results of Rules 1 and 1a. The table body is only partly legible in the source; the surviving cells include 408 (Rule 1) and 126, 22 and 1 (Rule 1a).</Paragraph>
      <Paragraph position="3"> What concerns us here is the interrelationship between the forms of referring expressions and the discourse segment structures. In NL generation systems, the semantic structures of messages to be produced are usually organized according to hierarchical intentional structures; then, based on the structures, referring expressions are decided [Hovy 90, Dale 92]. Hence, in this subsection, we employ the idea of discourse structure to improve our algorithm for the generation of zero anaphors.</Paragraph>
      <Paragraph position="4"> In their study [Li and Thompson 79], Li and Thompson propose that &quot;the degree of preference for the occurrence of pronominal anaphora in a clause inversely corresponds to the degree of connection with the preceding clause.&quot; They list the following conditions that decrease connection: switching from background to foreground information, or vice versa, between two clauses; the second clause being headed by an adverbial expression; and two clauses being spoken by two different participants.</Paragraph>
      <Paragraph position="5"> In general, a zero anaphor used to refer to some entity in the previous utterance might be expected to indicate the continuation of a discourse segment, while a non-zero anaphor occurring in the same situation signals a discourse segment boundary. From the generator's perspective, when the anaphoric form for a phrase referring to some entity in the previous utterance is to be decided, the factor of discourse segment boundary must be taken into consideration.</Paragraph>
      <Paragraph position="6"> Therefore, based on this idea, we improve the previous rules for the generation of zero anaphors, Rules 1 and 1a, to make the following rule. The decision tree for Rule 2 is shown in Fig. 2.</Paragraph>
      <Paragraph position="7"> Rule 2: If an entity, e, in the current utterance, u, was referred to in the immediately preceding utterance and does not violate any syntactic constraints on zero anaphora, then if u is not the beginning of a discourse segment, then a zero anaphor is used for e; otherwise, a non-zero anaphor is used.</Paragraph>
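      <Paragraph> A minimal sketch of Rule 2 as a nested decision, mirroring the order of conditions in the rule text. The function and parameter names are hypothetical, and the three boolean inputs are assumed to come from prior annotation of the data.

```python
def rule_2(in_previous_utterance, violates_syntax, starts_segment):
    """Return "zero" or "non-zero" following the wording of Rule 2.

    in_previous_utterance: entity was referred to in the preceding utterance
    violates_syntax: a zero form would break a syntactic constraint
    starts_segment: the current utterance begins a discourse segment
    """
    if not in_previous_utterance or violates_syntax:
        return "non-zero"
    if starts_segment:
        return "non-zero"
    return "zero"


print(rule_2(True, False, False))  # zero
print(rule_2(True, False, True))   # non-zero
```

The only change from Rule 1a is the extra test on the segment boundary, which turns some former zero decisions into non-zero ones.</Paragraph>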
      <Paragraph position="8"> To perform the experiments for the new rule, we have to access the discourse segment structures of the testing data. Therefore, we annotated the boundaries between discourse segments in the testing data and the hierarchical discourse structures according to the discourse segment intentions. We further carried out a test by comparing our annotations with those of other native speakers of Chinese. In the test, four native speakers of Chinese were asked to do the same tasks we had done for five articles selected from the testing data. On average, 76% of the speakers' annotations coincide with ours. According to this comparison, the annotations we made were reliable for the purpose of the experiment. We then performed the experiment by employing the algorithm of Rule 2. As shown in Table 2, for the Set 1 data, 49 and 12 zero anaphors were over- and under-generated by the algorithm, respectively. For the other set of testing data, Rule 2 achieves an even better result.</Paragraph>
      <Paragraph position="10"/>
    </Section>
    <Section position="3" start_page="733" end_page="734" type="sub_section">
      <SectionTitle>
3.3 The effect of topic
</SectionTitle>
      <Paragraph position="0"> In this subsection, we use the feature of topic in Chinese to further refine the previous rules. The basic idea here is to investigate the positions of antecedent and anaphor in their respective utterances. In the following, we divide the position of anaphors in their respective utterances into topic and non-topic. For each anaphor, its antecedent's position falls into one of the following categories: topic; direct object or the NP following a presentative verb; and others. We thus classify the following types, A to F, of antecedent-anaphor pairs: the antecedents of Types A and C are in topic position, those of B and D are in direct object position or are the NP following a presentative verb, and those of E and F are in other positions; the anaphors of Types A, B and E are in topic position, and those of C, D and F are in non-topic positions.</Paragraph>
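      <Paragraph> The six pair types can be read as a lookup on two features. The sketch below is an illustration only: the label "object" stands for "direct object or the NP following a presentative verb", and the names PAIR_TYPES and pair_type are hypothetical.

```python
# (antecedent position, anaphor is in topic position) -> pair type.
# "object" abbreviates: direct object, or the NP following a presentative verb.
PAIR_TYPES = {
    ("topic", True): "A",
    ("object", True): "B",
    ("topic", False): "C",
    ("object", False): "D",
    ("other", True): "E",
    ("other", False): "F",
}


def pair_type(antecedent_position, anaphor_is_topic):
    """Classify an antecedent-anaphor pair into one of Types A to F."""
    return PAIR_TYPES[(antecedent_position, anaphor_is_topic)]


print(pair_type("topic", True))    # A
print(pair_type("object", False))  # D
```

An unknown antecedent label raises a KeyError, which keeps annotation mistakes visible instead of silently assigning a type.</Paragraph>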
      <Paragraph position="1"> Since in the new rule the conditions on topic and non-topic will only be considered after the conditions in Rule 2, in investigating the antecedent-anaphor pairs we have to exclude the ones whose anaphors either violate syntactic constraints on zero anaphora or are at the beginning of discourse segments. In other words, the new condition will be attached under the Z-node in the decision tree of Fig. 2. In the Set 1 testing data, there are 239 such pairs, among which the anaphors of 49 pairs are zeroed by the algorithm of Rule 2 but appear in non-zero forms in the text. In other words, the 49 anaphors were over-generated by our algorithm, which in our terms belong to the false type; the other 190 cases belong to the correct type. The number of each type of pair for both correct and over-generated cases in the testing data is shown in Table 3.</Paragraph>
      <Paragraph position="2"> Table 3 (only partly legible in the source; the cells for Types D, E and F are garbled): false: A 15, B 14, C 6, total 49; total: A 173, B 41, C 7, total 239. For the Set 1 data in Table 3, the over-generated cases of both Types A and B, 15 and 14 out of 173 and 41, respectively, are the minorities of the respective types, while on the contrary, the numbers of over-generated cases of Types C and E are greater than their counterparts. Thus, if we let anaphors of Types A and B be zero and Types C and E non-zero, then there will be 29 (15+14) over-generated zero anaphors and 5 (1+4) under-generated ones for the Set 1 testing data. The numbers for Types D and F do not conclusively support either using zero or non-zero in this case. In Chen's study [Chen 87], he found a higher percentage of zero anaphors occurring in the topic position with their antecedent most frequently in the topic or object positions of the immediately previous utterance, which strongly supports the idea of letting anaphors of Types A and B be zero and others non-zero. We choose to generate non-zero anaphora for Types D and F. We thus obtain Rule 3 by adding the effect of topic into Rule 2.</Paragraph>
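      <Paragraph position="4"> Rule 3: If an entity, e, in the current utterance, u, was referred to in the immediately preceding utterance and does not violate any syntactic constraints on zero anaphora, then if u is not the beginning of a discourse segment, then if e is either a Type A or B pair, then a zero anaphor is used for e; otherwise, a non-zero anaphor is used.</Paragraph>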
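      <Paragraph> Rule 3 can be sketched in the same style as a decision function. The sketch assumes that the first two conditions carry over unchanged from Rule 2, as the discussion of the Z-node in Fig. 2 suggests, and that the pair type (A to F) has already been determined; all names are hypothetical.

```python
def rule_3(in_previous_utterance, violates_syntax, starts_segment, pair_type):
    """Return "zero" or "non-zero" under the assumed reading of Rule 3.

    The first two tests are taken over from Rule 2 (an assumption);
    pair_type is one of "A" to "F".
    """
    if not in_previous_utterance or violates_syntax:
        return "non-zero"
    if starts_segment:
        return "non-zero"
    # Topic condition: only Types A and B (anaphor in topic position,
    # antecedent in topic or object position) are zeroed.
    if pair_type in ("A", "B"):
        return "zero"
    return "non-zero"


print(rule_3(True, False, False, "A"))  # zero
print(rule_3(True, False, False, "C"))  # non-zero
```

Types D and F fall into the non-zero branch, matching the choice stated in the text.</Paragraph>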
      <Paragraph position="5"> As a short summary, the numbers of anaphors in the Set 1 testing data satisfying the conditions of Rule 3 are shown in Fig. 4, where Z, N and P represent zero, pronominal and nominal anaphors, respectively.</Paragraph>
      <Paragraph position="6"> Indicated in the root node is the total number of all kinds of anaphors in the data. The correct match is calculated by summing up the numbers of non-zero anaphors, pronouns and nominal anaphors, under non-zero leaf nodes and zero anaphors under zero leaf nodes. Non-zero anaphors under zero leaf nodes are the false matches. Conversely, zero anaphors under non-zero leaf nodes are the missing matches.</Paragraph>
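      <Paragraph> The three match categories can be tallied mechanically once each anaphor has a predicted form and an observed form. The following is a sketch under assumed encodings: predictions are the strings "zero" and "non-zero", observed forms are "Z" for zero and any other label for pronominal or nominal anaphors; the function name score is hypothetical.

```python
from collections import Counter


def score(predicted, actual):
    """Count correct, false and missing matches.

    predicted: list of "zero" / "non-zero" decisions made by a rule
    actual: list of observed forms, "Z" for a zero anaphor,
            anything else for a non-zero (pronominal or nominal) one
    """
    counts = Counter()
    for p, a in zip(predicted, actual):
        actual_is_zero = a == "Z"
        if (p == "zero") == actual_is_zero:
            counts["correct"] += 1
        elif p == "zero":
            counts["false"] += 1    # zero generated, text has a non-zero form
        else:
            counts["missing"] += 1  # non-zero generated, text has a zero form
    return counts


result = score(["zero", "zero", "non-zero", "non-zero"], ["Z", "P", "N", "Z"])
print(result["correct"], result["false"], result["missing"])  # 2 1 1
```

In this encoding, the false matches correspond to over-generated zero anaphors and the missing matches to under-generated ones.</Paragraph>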
    </Section>
  </Section>
</Paper>