<?xml version="1.0" standalone="yes"?>
<Paper uid="W99-0210">
  <Title>Coreference-oriented Interlingual Slot Structure &amp; Machine Translation</Title>
  <Section position="2" start_page="69" end_page="74" type="metho">
    <SectionTitle>
2 The anaphora resolution module
</SectionTitle>
    <Paragraph position="0"> This section describes the anaphora resolution module of our system in two subsections. The first presents the algorithm for anaphora resolution; the second presents the evaluation of the module.</Paragraph>
    <Section position="1" start_page="69" end_page="70" type="sub_section">
      <SectionTitle>
2.1 The algorithm
</SectionTitle>
      <Paragraph position="0"> This subsection describes an algorithm that deals with discourse anaphora in unrestricted texts using partial or full parsing. It follows the process described in Figure 1 and is applied after each sentence has been parsed.</Paragraph>
      <Paragraph position="1"> The algorithm is shown in Figure 2. It can deal with pronominal anaphora, surface-count anaphora and one-anaphora, as shown in Ferrández (1998a). It takes as input a Slot Structure (SS) corresponding to the output of the parsing module and a list of antecedents, which consists of the slot structures of all the previously parsed noun phrases. For each anaphor in this SS, several constraints and preferences are applied. The output of the algorithm is a new SS in which each anaphor has been stored with its correct antecedent.</Paragraph>
      <Paragraph position="2"> Figure 2 (recovered steps): Parse a sentence. We obtain its slot structure (SS1). For each anaphor in SS1: select the antecedents of the previous X sentences, depending on the kind of anaphor, in L0; apply constraints (depending on the kind of anaphor) to L0 with a result of L1.  The detection of the anaphors and possible antecedents is easily carried out by means of the information stored in each SS, i.e. its functor and arity. For example, the antecedents have an SS with np as their functor, whereas the pronouns have pron. We have considered the previous two sentences when searching for antecedents of a pronoun. The algorithm applies a set of constraints (morphosyntactic agreement and c-command constraints) to the list of possible antecedents in order to discard candidates. If only one candidate remains, it is taken as the antecedent of the anaphor. If more than one candidate remains, a set of preferences (syntactic parallelism, lexical information, reiteration of an antecedent in the text, ...) is applied. These preferences sort the list of remaining antecedents, and the first one is selected as the antecedent. The constraints and preferences are described in more detail in Ferrández (1998a) and Ferrández et al. (1998b).</Paragraph>
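      <Paragraph> The constraint-then-preference scheme described above can be sketched roughly as follows. This is only an illustrative sketch, not the implementation of the system: the Mention class and its fields are hypothetical stand-ins for a slot structure, the only constraint modelled is gender and number agreement (c-command is left out), and the preferences are reduced to syntactic parallelism followed by recency, the latter being an assumption of this sketch.

```python
from dataclasses import dataclass

@dataclass
class Mention:
    """Simplified stand-in for a slot structure (hypothetical fields)."""
    text: str
    gender: str    # "masc" or "fem"
    number: str    # "sg" or "pl"
    sentence: int  # index of the sentence containing the mention
    role: str      # syntactic role, e.g. "subj" or "obj"

def agrees(anaphor, candidate):
    # Morphosyntactic agreement constraint (gender and number only).
    return (anaphor.gender == candidate.gender
            and anaphor.number == candidate.number)

def resolve(anaphor, antecedents, window=2):
    # L0: candidates from the current sentence and the previous `window` sentences.
    l0 = [c for c in antecedents
          if anaphor.sentence - c.sentence in range(window + 1)]
    # L1: candidates that survive the constraints.
    l1 = [c for c in l0 if agrees(anaphor, c)]
    if not l1:
        return None
    if len(l1) == 1:
        return l1[0]
    # Preferences: syntactic parallelism first, then recency (assumed ordering).
    ranked = sorted(l1, key=lambda c: (c.role == anaphor.role, c.sentence),
                    reverse=True)
    return ranked[0]

nps = [
    Mention("Pedro", "masc", "sg", 0, "subj"),
    Mention("Ana", "fem", "sg", 0, "obj"),
    Mention("el parque", "masc", "sg", 0, "obj"),
]
print(resolve(Mention("ella", "fem", "sg", 1, "subj"), nps).text)  # Ana
print(resolve(Mention("él", "masc", "sg", 1, "subj"), nps).text)   # Pedro
```

      In the first call a single candidate survives the agreement constraint, so no preference is needed; in the second, two candidates survive and the parallelism preference selects the previous subject.
      </Paragraph>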
    </Section>
    <Section position="2" start_page="70" end_page="71" type="sub_section">
      <SectionTitle>
2.2 Evaluation of the anaphora resolution module
</SectionTitle>
      <Paragraph position="0"> As we reported in Ferrández et al. (1998b), we ran our system on part of the Spanish version of The Blue Book corpus. We did not use semantic information, since the tagger did not provide it. Despite this, we obtained the following figures: the system detected 100% of the pronominal anaphors, the average length of sentences containing anaphors was 48 words, and pronominal anaphora was resolved with 83% accuracy (pronouns correctly resolved divided by the total number of pronouns). For The Blue Book in English, we obtained the following figures: 79 pronouns (it: 41, they: 29, themselves: 9) resolved with an accuracy of 87.3% (it: 80.5%, they: 93.1%, themselves: 100%), with an average of 22 words per sentence.</Paragraph>
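      <Paragraph> As a consistency check on the English figures, the short calculation below recomputes the accuracies from the pronoun counts. The counts per pronoun come from the text; the numbers of correctly resolved pronouns (33, 27 and 9) are not reported and are inferred here as the integers that reproduce the reported percentages, so they should be read as an assumption.

```python
# Pronoun counts reported for the English sample.
counts = {"it": 41, "they": 29, "themselves": 9}

# Correct resolutions per pronoun: NOT reported in the text. These values are
# inferred as the integers that reproduce the reported accuracies.
correct = {"it": 33, "they": 27, "themselves": 9}

def accuracy(right, total):
    """Accuracy in percent: pronouns correctly resolved over all pronouns."""
    return round(100 * right / total, 1)

per_pronoun = {p: accuracy(correct[p], counts[p]) for p in counts}
overall = accuracy(sum(correct.values()), sum(counts.values()))

print(sum(counts.values()))  # 79
print(per_pronoun)           # {'it': 80.5, 'they': 93.1, 'themselves': 100.0}
print(overall)               # 87.3
```

      Under this assumption 69 of the 79 pronouns are resolved correctly, which matches the overall accuracy of 87.3%.
      </Paragraph>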
      <Paragraph position="1"> The references that failed did so mainly because of the lack of semantic information and because of some weaknesses in the English grammar that we use. For example, in sentence (1) our system did not select the right antecedent (The French term &quot;communication&quot; and the Spanish term &quot;comunicación&quot;) because the inverted comma symbol (&quot;) was tagged as a separate word. Our grammar does not foresee this inside an np, so the coordination of the two nps failed.</Paragraph>
      <Paragraph position="2"> (1) Note 2 - The French term &quot;communication&quot; and the Spanish term &quot;comunicación&quot; have the current meaning given in this definition, but they also acquire a more specific meaning in telecommunication (see 0009, 0010 and 0011).</Paragraph>
      <Paragraph position="3"> With reference to the differences between English and Spanish pronoun resolution, we have observed that there is a greater number of possible antecedents for Spanish pronouns (26) than for English pronouns (11). This could be due to the greater length of the Spanish sentences.</Paragraph>
      <Paragraph position="4"> Another difference is that the constraints (c-command and morphological agreement) have played a more important role in detecting the antecedent in Spanish texts: the total number of possible antecedents is reduced from 733 to 222 (a reduction of 70%), whereas for English texts the reduction is only 37.7%. This is mainly because Spanish carries more morphological information than English.</Paragraph>
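      <Paragraph> The Spanish reduction figure can be checked directly from the two totals given above; the computation below uses only those two numbers.

```latex
\[
\frac{733 - 222}{733} = \frac{511}{733} \approx 0.697
\]
```

      That is, about 69.7% of the candidate antecedents are discarded by the constraints, which the text rounds to 70%.
      </Paragraph>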
      <Paragraph position="5"> With regard to the importance of each kind of information in each language, applying exactly the same set of preferences in Spanish and English yields 76% accuracy in English.</Paragraph>
      <Paragraph position="6"> However, we obtained better accuracy (87.3%) by giving more importance to syntactic parallelism and less importance to statistical information.  slot structure obtained after applying the anaphora resolution module) from the source language and generates the slot structure in the target language. In our proposal, we study pronominal anaphora generation exclusively. We divide the section into several subsections that resolve the different discrepancies between English and Spanish. Figure 3 shows the ISS.</Paragraph>
    </Section>
    <Section position="3" start_page="71" end_page="71" type="sub_section">
      <SectionTitle>
3.1 Number discrepancy resolution
</SectionTitle>
      <Paragraph position="0"> One problem is caused by words in different languages that express the same concept but differ in number. Such a word may be referred to by a singular pronoun in the source language and by a plural pronoun in the target language.</Paragraph>
      <Paragraph position="1"> To resolve these discrepancies correctly, we construct a table of the words that are referred to by a singular pronoun in the source language and by a plural pronoun in the target language. This table is consulted first during anaphora translation. If the pronoun and its antecedent appear in the table, we carry out the indicated transformation.</Paragraph>
      <Paragraph position="2"> Figure 4 (column headers only): Antecedent | Spanish anaphor | English anaphor | Antecedent.  Figure 4 shows some examples of these words.</Paragraph>
      <Paragraph position="3"> In Figure 5 the English-Spanish translation of a sentence with number discrepancies is described. In this figure, the translation of English SS 3 into  This SS stores for each constituent the following information: constituent name, semantic and morphologic information (structure with functor conc), discourse marker (identifier of the entity or discourse object) and the SS of its subconstituents. As can be observed in Figure 5 we store in the SS of pronouns the information of the right antecedent obtained after applying the anaphora resolution module.</Paragraph>
      <Paragraph position="4"> It is necessary to emphasise that after carrying out the translation, the anaphor must agree in number and person with the verb of the sentence where it appears.</Paragraph>
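      <Paragraph> The table lookup described in this subsection might be sketched as below. The table layout and the function name are hypothetical; the single entry is the police/policía pair discussed later in the paper, and the default argument stands for whatever pronoun the regular translation would otherwise produce.

```python
# English-to-Spanish number discrepancy table (illustrative layout).
# Key: (English antecedent, English pronoun); value: Spanish pronoun.
# Only the police/policía case is taken from the paper.
NUMBER_DISCREPANCIES = {
    ("police", "they"): "ésta",  # "policía" is singular in Spanish
}

def translate_pronoun(antecedent, pronoun, default):
    """Consult the discrepancy table first; otherwise keep the default."""
    return NUMBER_DISCREPANCIES.get((antecedent, pronoun), default)

print(translate_pronoun("police", "they", default="ellos"))  # ésta
print(translate_pronoun("women", "they", default="ellas"))   # ellas
```

      After this step the verb of the target sentence still has to be made to agree in number and person with the chosen pronoun, which the sketch does not attempt.
      </Paragraph>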
    </Section>
    <Section position="4" start_page="71" end_page="73" type="sub_section">
      <SectionTitle>
3.2 Gender discrepancy resolution
</SectionTitle>
      <Paragraph position="0"> In order to solve personal pronoun gender discrepancies, we construct a table that translates Spanish personal pronouns into the English ones and vice versa.</Paragraph>
      <Paragraph position="1"> In the Spanish-English translation we only have problems with the pronoun it. The Spanish pronoun él/éste (masculine singular third person) can be translated into he or it. If the antecedent of the pronoun él/éste refers to a person, we translate it into he. If the antecedent of the pronoun is an animal or a thing, we translate it into it. These characteristics of the antecedent can be obtained from the semantic information stored in its SS. This semantic information can be incorporated into the system using the IRSAS method (Moreno et al., 1992) or another linguistic resource, such as WordNet. A similar problem occurs with the Spanish pronoun ella/ésta, which is solved in the same way.</Paragraph>
      <Paragraph position="2"> In the example of Figure 6, the third argument of the conc structures of these SS is the semantic type, according to the IRSAS ontology. As can be observed, the np &quot;the cat&quot; has the semantic type nonhuman(animal), and for this reason the pronoun ella is translated into the English pronoun it.</Paragraph>
      <Paragraph position="4"> The table of Figure 7 is used for the remaining pronouns and a direct conversion into English is made.</Paragraph>
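      <Paragraph> A minimal sketch of the Spanish-English rule follows. The semantic type labels ("human", "animal", "thing") are simplified stand-ins for the IRSAS types, and the DIRECT table is a hypothetical subset of the direct conversions, since the contents of Figure 7 are not reproduced here.

```python
# Hypothetical subset of the direct conversion table for the remaining pronouns.
DIRECT = {"yo": "I", "nosotros": "we", "ellos": "they", "ellas": "they"}

def spanish_to_english(pronoun, semantic_type=None):
    """Translate a Spanish personal pronoun using the antecedent's semantic type."""
    if pronoun in ("él", "éste"):
        return "he" if semantic_type == "human" else "it"
    if pronoun in ("ella", "ésta"):
        return "she" if semantic_type == "human" else "it"
    return DIRECT[pronoun]

print(spanish_to_english("ella", "animal"))  # it   (antecedent: the cat)
print(spanish_to_english("él", "human"))     # he
print(spanish_to_english("ellas"))           # they
```
      </Paragraph>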
      <Paragraph position="5"> As noted above, Spanish carries more morphological information than English, which is highly relevant in the English-Spanish translation. In order to choose the right Spanish pronoun, we must obtain the gender and number information from the antecedent of the anaphor and then carry out the translation. The pronoun it poses a series of problems, since it can be translated into four different Spanish pronouns (él, ella, éste, ésta). These Spanish pronouns can refer to both animals and things, but normally él/ella refers to animals and éste/ésta refers to things. Therefore, in our automatic interlingual mechanism, the pronoun is translated into él/ella when its antecedent is an animal and into éste/ésta when its antecedent is a thing, since this is the most common use in Spanish.</Paragraph>
      <Paragraph position="6"> Finally, an additional difficulty exists in the translation of the pronoun you. In Spanish, there are two pronouns for the singular second person (tú or usted) and three pronouns for the plural second person (vosotros/vosotras or ustedes).</Paragraph>
      <Paragraph position="7"> Basically, the difference is that the pronouns tú/vosotros/vosotras are used in informal (colloquial) language, whereas usted/ustedes are used in formal language. Choosing the right pronoun therefore requires specific knowledge of the situation. Our proposal does not carry out word sense disambiguation and simply chooses the colloquial pronouns tú/vosotros/vosotras in these cases.</Paragraph>
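      <Paragraph> The English-Spanish choices for it and you described above can be sketched as two small functions. The function names, the feature labels and the masculine default for plural you are assumptions of this sketch; the mapping itself follows the text.

```python
def translate_it(gender, semantic_type):
    """Spanish pronoun for English 'it', from the antecedent's features."""
    if semantic_type == "animal":
        return "él" if gender == "masc" else "ella"
    # Things take the demonstrative forms.
    return "éste" if gender == "masc" else "ésta"

def translate_you(number, gender="masc"):
    """Always choose the colloquial forms, as no situational knowledge is used."""
    if number == "sg":
        return "tú"
    return "vosotros" if gender == "masc" else "vosotras"

print(translate_it("masc", "animal"))  # él
print(translate_it("fem", "thing"))    # ésta
print(translate_you("sg"))             # tú
print(translate_you("pl", "fem"))      # vosotras
```
      </Paragraph>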
    </Section>
    <Section position="5" start_page="73" end_page="73" type="sub_section">
      <SectionTitle>
3.3 Syntactic discrepancy resolution
</SectionTitle>
      <Paragraph position="0"> This discrepancy arises because the surface structure of Spanish sentences is more flexible than that of English sentences. The constituents of a Spanish sentence can appear in any position of the sentence. In order to carry out a correct translation into English, we must first reorganise the Spanish sentence. In the English-Spanish translation, by contrast, this reorganisation is generally not necessary and a direct translation can be carried out.</Paragraph>
      <Paragraph position="1"> Figure 8 (partially recovered): A Pedro lo vi ayer (literally: To Peter him saw yesterday), I saw Peter yesterday; SS of the initial sentence in Spanish: sentencePP(pp(prep(A), np(Pedro)), pron(lo), verb(vi,  Let us see an example with the Spanish sentence &quot;A Pedro lo vi ayer&quot; (I saw Peter yesterday). In this sentence, the object of the verb appears before the verb (in the theoretical position of the subject) and the subject is omitted. Moreover, there is a pronoun, lo (him), that functions as a complement of the verb vi (saw). In Spanish this pronoun refers to the object of the verb, Pedro (Peter), when the object is moved from its theoretical place after the verb, as occurs in this sentence. The pronominal subject has been omitted, but we can recover it because the verb is in the first person singular (information stored in its conc structure), so the subject is the pronoun yo (I). Therefore, the solution is a new SS in which the order of the constituents is the usual one in English: subject, verb, complements of the verb.</Paragraph>
      <Paragraph position="2"> Figure 8 shows this process graphically. In this sentence, the pp (&quot;a Pedro&quot;) functions as an indirect object of the verb (because it has the preposition a (to)), and the subject of the verb has to be in the first person singular. After reorganising the sentence, we carry out the translation of each constituent. The words that have not been parsed (freeWord) are translated into the appropriate words in the target language.</Paragraph>
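      <Paragraph> The reorganisation step for this example might look roughly as follows. The flat list of dictionaries is a hypothetical simplification of the slot structure, the role labels are assumptions, and the sketch covers only the pattern of this one sentence (fronted object, doubling clitic, omitted subject).

```python
# Subject pronouns recoverable from verb agreement (small illustrative subset).
SUBJECT_PRONOUN = {("1", "sg"): "yo", ("2", "sg"): "tú", ("1", "pl"): "nosotros"}

def reorganise(constituents):
    """Reorder a flat list of constituents into subject, verb, complements."""
    verb = next(c for c in constituents if c["cat"] == "verb")
    subject = next((c for c in constituents if c.get("role") == "subj"), None)
    if subject is None:
        # Omitted subject: rebuild it from the person and number of the verb.
        key = (verb["person"], verb["number"])
        subject = {"cat": "pron", "text": SUBJECT_PRONOUN[key], "role": "subj"}
    complements = [
        c for c in constituents
        # The clitic that doubles the fronted object is dropped.
        if c is not verb and c is not subject and c.get("role") != "clitic"
    ]
    return [subject, verb] + complements

spanish = [
    {"cat": "pp", "text": "a Pedro", "role": "obj"},
    {"cat": "pron", "text": "lo", "role": "clitic"},
    {"cat": "verb", "text": "vi", "person": "1", "number": "sg"},
    {"cat": "freeWord", "text": "ayer"},
]
print([c["text"] for c in reorganise(spanish)])
# ['yo', 'vi', 'a Pedro', 'ayer']
```

      Translating each constituent of the reordered list in turn then gives the English order subject, verb, complements, as in "I saw Peter yesterday".
      </Paragraph>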
    </Section>
    <Section position="6" start_page="73" end_page="74" type="sub_section">
      <SectionTitle>
3.4 Elliptical zero-subject construction resolution
</SectionTitle>
      <Paragraph position="0"> Omitting the pronominal subject is usual in Spanish. In these cases, we take the number and person information from the verb to obtain the corresponding English pronoun.</Paragraph>
      <Paragraph position="1"> Figure 9 (partially recovered): Pedro ganó el partido de tenis. Ø Sólo perdió un set. SS of the sentence: sentence(np(&quot;Pedro&quot;), vp(&quot;ganó el partido de tenis&quot;)); sentence(_, vp(&quot;Sólo perdió un set&quot;)).  We can detect the omission of the pronominal subject of a sentence by means of the SS of the sentence, as shown in Figure 9. When the subject is omitted, the SS has a Prolog variable in the slot corresponding to this noun phrase. We can obtain the information corresponding to the subject from the verb of the sentence. In this figure, it is third person, singular and masculine or feminine. With these omitted pronominal anaphors, we apply the preference for the subject of the previous sentence (if it agrees in person and number, and if it is semantically consistent). This information is used to find the antecedent, in this case Pedro (Peter), which has masculine gender, so the final translation chooses a masculine pronoun (he).</Paragraph>
      <Paragraph position="2"> Sometimes we can also obtain the gender information of the pronoun when the verb is copulative. For example: Pedro_i vio a Ana_j en el parque. Ø_j Estaba muy guapa (Peter_i saw Ann_j in the park. She_j was very beautiful). In this example, the verb estaba (was) is copulative, so its subject has to agree in gender and number with its object. In this way, we can obtain the gender information from the object, guapa (beautiful woman), which has feminine gender, so the omitted pronoun is she instead of he.</Paragraph>
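      <Paragraph> The two sources of information used for omitted subjects (verb agreement for person and number, antecedent or copulative attribute for gender) can be combined as in the sketch below. Function names and feature labels are hypothetical, and the semantic check that would select it for non-human antecedents is left out.

```python
# English subject pronouns that do not depend on gender.
GENDERLESS = {
    ("1", "sg"): "I", ("2", "sg"): "you",
    ("1", "pl"): "we", ("2", "pl"): "you", ("3", "pl"): "they",
}

def subject_gender(antecedent_gender, copulative=False, attribute_gender=None):
    """With a copulative verb, the attribute can supply the gender directly."""
    if copulative and attribute_gender is not None:
        return attribute_gender
    return antecedent_gender

def zero_subject_pronoun(person, number, gender=None):
    """English pronoun for a Spanish omitted subject."""
    if (person, number) == ("3", "sg"):
        return {"masc": "he", "fem": "she"}[gender]
    return GENDERLESS[(person, number)]

# "Ø Sólo perdió un set": antecedent Pedro (masculine).
print(zero_subject_pronoun("3", "sg", subject_gender("masc")))  # he
# "Ø Estaba muy guapa": the attribute "guapa" is feminine.
print(zero_subject_pronoun("3", "sg", subject_gender(None, True, "fem")))  # she
```
      </Paragraph>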
    </Section>
  </Section>
  <Section position="3" start_page="74" end_page="74" type="metho">
    <SectionTitle>
4 Commercial MT system evaluation and discussion
</SectionTitle>
    <Paragraph position="0"> In this section, we evaluate different commercial MT systems, analysing their deficiencies in translating pronominal anaphora. We study how these MT systems deal with the discrepancies presented above. In this paper we evaluate 4 systems: (1) Key Translator Pro Version 2.0  Figure 10 shows the English-Spanish translation of a sentence with gender discrepancies. In (1) and (2) the pronoun they is wrongly translated into ellos (masculine plural); in (3) and (4) the pronoun is omitted. The pronominal subject can be omitted in Spanish. However, our automatic ISS mechanism always generates the pronoun explicitly in Spanish.</Paragraph>
    <Paragraph position="1"> The correct translation of this anaphoric expression in our system is the pronoun ellas (feminine plural). The information related to gender and number must be extracted from the correct antecedent.</Paragraph>
    <Paragraph position="2"> Source language : Women were in the duty-free shop. They were buying gifts for their husbands.</Paragraph>
    <Paragraph position="3">  (1) Mujeres sido en el exento de derechos de aduana tienda. Ellos estaban regalos comprantes para sus esposos. (2) Las mujeres estaban en la tienda libre de impuestos. Ellos compraban los regalos para sus esposos.</Paragraph>
    <Paragraph position="4"> (3) Las mujeres estaban en el departamento con franquicia. Ø Compraban regalos para sus maridos.</Paragraph>
    <Paragraph position="5"> (4) Las mujeres estuvieron en la tienda de libre-de-impuestos. Ø Estuvieron comprando regalos para sus maridos.  Target language: Las mujeres estaban en la tienda libre de impuestos. Ellas estaban comprando regalos para sus maridos. Figure 10.</Paragraph>
    <Paragraph position="6"> Figure 11 shows a Spanish-English translation with gender discrepancies. The Spanish pronoun él is translated into he in (1), (2), (3) and (4), while the right translation is the pronoun it. In our proposal, we solve the problem using the semantic information of the antecedent. In this case, the antecedent el mono (the monkey) is an animal; therefore, the pronoun él must be translated into it.</Paragraph>
    <Paragraph position="7"> Figure 12 shows a number discrepancy. The word police is plural in English, while it is singular in Spanish (policía). In (1), (2), (3) and (4) we can observe wrong translations and pronouns that do not agree with the verb.</Paragraph>
    <Paragraph position="8"> Before the translation, the number discrepancy table is consulted, and if the pronoun and its antecedent appear in this table, we carry out the indicated transformation. After the translation, the anaphor must agree in number and person with the verb of the sentence where it appears. Source language: El mono se bebió la leche. Después, él saltó entre los árboles.</Paragraph>
    <Paragraph position="9">  (1) The monkey was drunk the milk. Afterwards, he jumped between the trees.</Paragraph>
    <Paragraph position="10"> (2) The monkey was drunk the milk. After, he jumped between the trees.</Paragraph>
    <Paragraph position="11"> (3) The monkey drank milk. Later, he jumped between the trees.</Paragraph>
    <Paragraph position="12"> (4) The monkey [bebió] milk her/you/it [Después], [él] [saltó] 1~he~she/you enter the [árboles].</Paragraph>
    <Paragraph position="13"> Target language: The monkey drank milk. Later, it jumped between the trees.</Paragraph>
  </Section>
  <Section position="4" start_page="74" end_page="75" type="metho">
    <SectionTitle>
4 The symbol Ø in a position of the sentence marks the
</SectionTitle>
    <Paragraph position="0"> omitted words in that position.</Paragraph>
    <Paragraph position="1">  Source language: The police are coming. They are just in time.  (1) La policía viene. Ellos son solamente en tiempo. (2) Los policías vienen. Ellos son simplemente en el tiempo. (3) El policía está viniendo. Él es justa en tiempo. (4) La policía está viniendo. Ø Justamente son a tiempo.  Target language: La policía está viniendo. Ésta llegará a tiempo. Figure 12.</Paragraph>
    <Paragraph position="2"> Figure 13 shows an example of Spanish-English syntactic discrepancies. The systems (1), (2), (3) and (4) fail in the translation. In our mechanism, we first reorganise the sentence and then carry out the translation.</Paragraph>
    <Paragraph position="3"> Source language: A Pedro lo vi ayer.</Paragraph>
    <Paragraph position="4">  (1) To I Ask for was seen it yesterday.</Paragraph>
    <Paragraph position="5"> (2) To Pedro I saw it yesterday.</Paragraph>
    <Paragraph position="6"> (3) To Pedro I saw yesterday.</Paragraph>
    <Paragraph position="7"> (4) TO/AT Pedro saw him/you/it yesterday.  Target language : I saw Peter yesterday. Figure 13.</Paragraph>
    <Paragraph position="8"> Finally, we analyse the Spanish elliptical zero-subject construction. In Figure 14, the systems (1), (2) and (4) fail in the translation. In our proposal, we obtain the information corresponding to the subject from the verb of the sentence. In this example, the pronoun must be first or third person singular. We extract the gender information from the correct antecedent (feminine) and determine that the pronoun is she (ella), feminine third person singular. Source language: La mujer tenía hambre. Ø Comía el melón.  (1) The woman was hungry. Ø Was eating the melon. (2) The woman were hungry. Ø Was eating the melon. (3) The woman was hungry. She ate the melon. (4) The woman was being hungry. 1~he./she/you was eating the</Paragraph>
  </Section>
</Paper>