<?xml version="1.0" standalone="yes"?> <Paper uid="J97-1007"> <Title>An Empirical Study on the Generation of Anaphora in Chinese Ching-Long Yeh* Tatung Institute of Technology</Title> <Section position="2" start_page="0" end_page="170" type="abstr"> <SectionTitle> 1. Introduction </SectionTitle>
<Paragraph position="0"> The field of natural language generation has made a great deal of progress in the generation of multisentential text in recent years (McKeown 1985; Maybury 1990; Dale 1992; Hovy 1993). Most of the well-known systems first select and organize the message contents to be generated and then map the organized results into a sequence of surface sentences. When mapping into the surface form, selecting appropriate forms for anaphora is essential to making the generated text a cohesive unit (McDonald 1980; Dale 1992). In this paper, our goal is the computer generation of anaphora in Chinese.</Paragraph>
<Paragraph position="1"> In Chinese, anaphora can be classified as zero, pronominal, and nominal forms, as exemplified in (1) by φ, ta^i 'he', and nage ren^i 'that person', respectively (Chen 1987).¹ Zero anaphora are generally noun phrases that are understood from the context [...] the superscript b is the index of the referent. A single φ without any script represents an intrasentential zero anaphor. Also note that a superscript attached to an NP is used to represent the index of the referent.</Paragraph>
<Paragraph position="2"> © 1997 Association for Computational Linguistics. Computational Linguistics, Volume 23, Number 1. [...] anaphora to denote those that are specified in discourse, namely, pronominal and nominal anaphora.</Paragraph>
<Paragraph position="3">
(1) a. Zhangsan^i jinghuang de wang wai pao,
       Zhangsan frightened NOM towards outside run
       'Zhangsan was frightened and ran outside.'
    b. φ zhuangdao yige ren^j,
       (he) bump-to a person
       '(He) bumped into a person.'
    c. ta^i kanqing le na ren^j de zhangxiang,
       he see-clear ASPECT that person GEN appearance
       'He saw clearly that person's appearance.'
    d. φ^i_2 renchu na ren^j shi shui.
       (he) recognize that person is who
       '(He) recognized who that person is.'
</Paragraph>
<Paragraph position="4"> This research starts by establishing possible rules for the generation of anaphora in Chinese. Previous work suggests deriving such rules from the results of linguistic study, including general principles such as the Gricean maxims (Grice 1975), used by Dale and Haddock (1991), Reiter and Dale (1992), and Dale (1992), and focus theory, used by Dale (1992). A shortcoming of this previous work is that it remains unclear how effective the resulting rules are at generating anaphora. To address this, we adopt an empirical approach, deriving rules from observations of real texts.</Paragraph>
<Paragraph position="5"> The basic methodology is to start with a set of human-generated Chinese texts and the simplest possible anaphor generation rule (one that considers only the locality of anaphora). We then progressively add extra tests to the rule, based on independently motivated but simple linguistic principles. At each stage, we conduct experiments that compare the anaphora occurring in the human-generated texts with those that would be generated by a computer taking the same syntactic and semantic content as the human texts and generating Chinese anaphora according to the rule being tested (this has to be simulated by hand). This process continues until a rule with promising performance on the data is obtained. 
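The comparison step described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the procedure actually used in the study (which was simulated by hand): the `Anaphor` record, the `local` and `blocked` features, both rule functions, and the toy data are all hypothetical names and values introduced here for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Anaphor:
    human_form: str  # form the human writer chose: "zero" or "nonzero"
    local: bool      # hypothetical feature: antecedent is in the immediately preceding clause
    blocked: bool    # hypothetical stand-in for one extra linguistic test


Rule = Callable[[Anaphor], str]


def rule_locality(a: Anaphor) -> str:
    """Simplest rule (assumed form): use a zero anaphor whenever the antecedent is local."""
    return "zero" if a.local else "nonzero"


def rule_locality_plus_test(a: Anaphor) -> str:
    """Refined rule: locality plus one extra (hypothetical) test."""
    return "zero" if a.local and not a.blocked else "nonzero"


def match_rate(rule: Rule, data: List[Anaphor]) -> float:
    """Fraction of anaphors for which the rule's form equals the human writer's form."""
    hits = sum(1 for a in data if rule(a) == a.human_form)
    return hits / len(data)


# Toy data invented purely for illustration; not taken from the paper.
toy = [
    Anaphor("zero", True, False),
    Anaphor("nonzero", True, True),
    Anaphor("nonzero", False, False),
    Anaphor("zero", True, False),
]

# Compare each rule in the sequence against the human choices.
for rule in (rule_locality, rule_locality_plus_test):
    print(rule.__name__, match_rate(rule, toy))
```

Representing each rule as a plain function keeps the sequence of increasingly complex rules easy to extend: a further test becomes one more function scored by the same `match_rate`.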
The objective is thus to determine how complex a rule must be to account for the complexity of anaphor generation exhibited by the test data.</Paragraph>
<Paragraph position="6"> This paper presents one sequence of rules developed using the above methodology and evaluates the effectiveness of the new linguistic principles taken into account at each stage. At present, we have chosen only one intuitively plausible way to generate increasingly complex rules, with refinements introduced as they occurred to us (though not motivated by the data). Clearly, the work could and should be extended to consider all possible combinations of the principles in all possible orders.</Paragraph>
<Paragraph position="7"> Except where noted below, the preselected Chinese data serve as an independent test of the effectiveness of the different rules, which are based on principles that have been independently suggested in the literature. However, because the chosen data determine the termination condition for the development, the rules could be overfitting those data. Therefore, a selection of the rules has been implemented in a Chinese natural language generation system, and their results are further evaluated in an experiment with native speakers.</Paragraph>
<Paragraph position="8"> This paper concentrates on the use of zero, pronominal, and nominal anaphora in generated Chinese text. We are not concerned with lexical anaphora (Tutin and Kittredge 1992), where the anaphor and its antecedent share meaning components while the anaphor belongs to an open lexical class. For example, flower can be used as a lexical anaphor for rose (Tutin and Kittredge 1992).</Paragraph>
<Paragraph position="9"> [Figure caption: Decision tree, classification tree, and result for Rule 1.]</Paragraph>
<Paragraph position="10"> In Sections 2 to 3.3, we establish the rules for the generation of anaphora in Chinese. 
We consider the case of zero anaphora (Section 2) first, followed by nonzero anaphora (Section 3), which divide into pronouns (Section 3.1) and nominal anaphora (Sections 3.2 and 3.3). Next, in Section 4, we describe the implementation of the generation rules in our Chinese generation system and show the results of evaluating the anaphora in the text generated by systems employing different rules. Finally, Section 5 presents the conclusions.</Paragraph> </Section></Paper>