<?xml version="1.0" standalone="yes"?> <Paper uid="W96-0412"> <Title>An Evaluation of Anaphor Generation in Chinese</Title> <Section position="4" start_page="111" end_page="112" type="intro"> <SectionTitle> 2 Previous Work and Our Approach </SectionTitle> <Paragraph position="0"> Though the field of natural language generation has progressed towards composing complex texts, the evaluation of natural language generation systems has remained at the discussion stage (May90; MM91). Two broad methods have been identified for evaluating natural language generation systems: glass box and black box evaluation (MM91). The glass box method examines the internal workings of individual components in a system, while the black box method looks at the behaviour of the system in terms of its input and output. The difficulty with the glass box method is the lack of a clear division between components in generation systems. Even if the black box method is adopted, however, it is difficult to determine what the appropriate input for generation is and to be objective in evaluating the output text.</Paragraph> <Paragraph position="1"> In this paper, we aim to investigate the quality of anaphors generated by the referring expression component in our Chinese natural language generation system. The referring expression component lies between the text planner and the linguistic realisation component in the system, as shown in Fig. 1. On accepting an input goal from the user, the system invokes the text planner, which uses the operators in the plan library to build a plan, a hierarchical discourse structure that satisfies the input goal. After text planning is finished, the anaphoric forms and descriptions are decided by traversing the plan tree. As discussed in (YM95; Yeh95), the algorithm of the referring expression component first determines an appropriate form for an anaphor to be generated. 
</Paragraph> <Paragraph position="2"> Suppose that the referring expression components we wish to compare all adopt the above basic algorithm. Then the essential characteristic that distinguishes them from each other is the rules used in the components and how these rules are implemented. If all of these referring expression components are embedded in the same Chinese natural language generation system, as in Fig. 1, for example, then, given an input to the system, anaphors in the resulting texts can be characterised by the rules used in the referring expression component and their implementation.</Paragraph> <Paragraph position="3"> By adopting this approach, we avoid the problems of both evaluation methods stated above, except for the objective evaluation of the output text. Since no machine can read the generated texts and give an impartial judgement about them, we rely on the opinions of human readers who are native speakers of Chinese to assess the quality of the generated anaphors. This is an easier task than assessing the quality of whole texts. To compensate for possible bias among individual readers, we sent the output texts to a group of readers and took the average of their judgements as the measurement.</Paragraph> <Paragraph position="4"> In brief, each object system in our evaluation is thought of as having the same individual components, including control and knowledge bases (which are discussed in full in (Yeh95) but cannot be presented here for reasons of space), except that the anaphor generation rules used in the referring expression components differ from one another.</Paragraph> <Paragraph position="6"> 
In the existing literature, we cannot find other work on the generation of Chinese referring expressions (or indeed on the full evaluation of anaphor generation for any other language), which means that we have no real working systems to compare with. In practice, we employ our Chinese natural language generation system described in (Yeh95) as the backbone of the evaluation work because it is easy for us to control and maintain. What we have to do for each generation system is simply to insert the corresponding generation rule.</Paragraph> </Section> </Paper>