File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/concl/00/w00-0206_concl.xml
Size: 7,676 bytes
Last Modified: 2025-10-06 13:52:49
<?xml version="1.0" standalone="yes"?> <Paper uid="W00-0206"> <Title>An Application of the Interlingua System ISS for Spanish-English Pronominal Anaphora Generation</Title> <Section position="7" start_page="48" end_page="50" type="concl"> <SectionTitle> 5 Evaluation </SectionTitle> <Paragraph position="0"> The syntactic generation and morphological generation modules of our approach have been evaluated, with one experiment carried out for each module. In the first experiment, the generation of Spanish zero-pronouns into English, using the techniques described above in subsection 4.1.2, has been evaluated (footnote 6). In the second experiment, the generation of English pronouns into Spanish ones has been evaluated.</Paragraph> <Paragraph position="1"> In this experiment, number and gender discrepancies and their resolution, described above in section 4.2, have been taken into account.</Paragraph> <Paragraph position="2"> Footnote 6: Syntactic discrepancies have not been evaluated because the aim of this work is only the generation of pronominal anaphora into the target language, so the evaluation of the generation of the whole sentence into the target language has been omitted.</Paragraph> <Paragraph position="3"> With reference to the first experiment, our computational system has been trained with a handmade corpus (footnote 7) that contains 106 zero-pronouns. With this training, we have extracted the degree of importance of the preferences that are used in the anaphora resolution module of the system. Furthermore, we have been able to check and correct the techniques used in the detection and generation of zero-pronouns into English. After that, we have carried out a blind evaluation on unrestricted texts using the set of preferences and the generation techniques learned during the training phase.
In this case, partial parsing of the text with no semantic information has been used.</Paragraph> <Paragraph position="4"> With regard to unrestricted texts, our system has been run on two different Spanish corpora: a) a fragment of the Spanish version of The Blue Book corpus (15,571 words), which contains the handbook of the International Telecommunications Union CCITT, and b) a fragment of the Lexesp corpus (9,746 words), which contains ten Spanish texts from different genres and authors, taken mainly from newspapers. These corpora have been POS-tagged. Having worked with different genres and disparate authors, we feel that the applicability of our proposal to other sorts of texts is assured.</Paragraph> <Paragraph position="5"> To evaluate the generation of Spanish zero-pronouns into English, three tasks have been accomplished: a) the evaluation of the detection of zero-pronouns, b) the evaluation of anaphora resolution and c) the evaluation of generation. a) Evaluating the detection of zero-pronouns.</Paragraph> <Paragraph position="6"> To do this, verbs have been classified into two categories: 1) verbs whose subjects have been omitted, and 2) verbs whose subjects have not. We have obtained a success rate (footnote 8) of 88% on 1,599 classified verbs, with no significant differences seen between the corpora. We should also remark that a success rate of 98% has been obtained in the detection of verbs whose subjects were omitted, whereas only 80% was achieved for verbs whose subjects were not. Footnote 7: This corpus contains sentences with zero-pronouns made by different researchers of our Research Group. Footnote 8: By &quot;success rate&quot;, we mean the number of verbs successfully classified, divided by the total number of verbs in the text.</Paragraph> <Paragraph position="7"> This lower success rate can be explained by several reasons. One important reason is the non-detection of impersonal verbs by the POS tagger. Two other reasons are
the lack of semantic information and the inaccuracy of the grammar used. It is important to note that 46% of the verbs in these corpora have their subjects omitted, which shows quite clearly the importance of this phenomenon in Spanish.</Paragraph> <Paragraph position="8"> b) Evaluating anaphora resolution. In this task, zero-pronoun resolution is evaluated. Of the 1,599 verbs classified in these two corpora, 734 have zero-pronouns. Only 228 of them (footnote 9), however, are in third person and will be anaphorically resolved. A success rate of 75% was attained for the 228 zero-pronouns. By &quot;successful resolutions&quot; we mean that the solutions offered by our system agree with the solutions offered by two human experts.</Paragraph> <Paragraph position="9"> c) Evaluating zero-pronoun generation. The generation of the 228 Spanish zero-pronouns into English has been evaluated. The following results have been obtained: a success rate of 70% in Lexesp and a success rate of 89% in The Blue Book. Overall (both corpora), a success rate of 75% has been achieved. The errors are mainly produced by failures in anaphora resolution and failures in the generation of the pronouns he/she/it: some heuristics (footnote 10), which have sometimes failed, have been applied because the corpora used do not include semantic information.</Paragraph> <Paragraph position="10"> In the second experiment, we have evaluated the generation of Spanish personal pronouns with subject role from the English ones. 
A fragment of the English version of The Blue Book corpus (70,319 words) containing 165 pronouns with subject role has been used in order to carry out a blind evaluation. A success rate of 85.41% has been achieved. The errors are mainly produced by failures in anaphora resolution and in the choice of the gender of the antecedent's Head in Spanish. To choose the gender of the antecedent's Head, an electronic dictionary has been used to translate the original English word into the Spanish one, and the gender is subsequently extracted from the Spanish word. Several problems have occurred when using this electronic dictionary: 1) the word to be translated does not appear in the dictionary, and therefore a heuristic is applied to assign the gender; 2) the correct sense of the English word is not chosen, and therefore the gender could be assigned incorrectly.</Paragraph> <Paragraph position="11"> Footnote 9: The remaining pronouns are not in third person, or they are cataphoric (the antecedent appears after the anaphor) or exophoric (the antecedent does not appear, linguistically, in the text). Footnote 10: For instance: &quot;all the pronouns in third person and singular whose antecedents are proper nouns have been translated into he (antecedent with masculine gender) or she (antecedent with feminine gender); otherwise they have been translated into it&quot;.</Paragraph> <Paragraph position="12"> Conclusion In this paper, a complete approach to resolve and generate pronominal anaphora in the Spanish and English languages is presented. The approach works on unrestricted texts to which partial parsing techniques have been applied.</Paragraph> <Paragraph position="13"> After parsing and resolving pronominal anaphora, an interlingua representation (based on semantic roles and features) of the whole text is obtained. 
The representation of the whole text is one of the main advantages of our system, because several problems that are hardly solved by the majority of MT systems can be treated and solved. These problems are the generation of intersentential anaphora, the detection of coreference chains and the generation of Spanish zero-pronouns into English. The generation of zero-pronouns and of Spanish personal pronouns has been evaluated, obtaining success rates of 75% and 85.41% respectively.</Paragraph> </Section> </Paper>