<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-1904">
  <Title>Evaluation and Improvement of Cross-Lingual Question Answering Strategies</Title>
  <Section position="3" start_page="24" end_page="2005" type="metho">
    <SectionTitle>
4 Adopted approach
</SectionTitle>
    <Paragraph position="0"> To handle the conversion from French to English, our system applies two strategies in parallel. They differ in what is translated when processing a question asked in French. The first sub-system, called MUSQAT, analyzes the question in French and then translates the question terms extracted by this question analysis module, following the - - - arrows in Figure 1. The second sub-system uses a machine translation tool (Reverso) to translate the questions and then applies our English monolingual system, called QALC, following the ..-.. arrows in Figure 1. These strategies are detailed later in the article.</Paragraph>
    <Paragraph position="1"> Although these are the most common strategies for this kind of task, an original feature of our system is that it implements both of them, which enables us to merge their results in order to improve the global performance of the system.</Paragraph>
    <Paragraph position="2"> In Table 1, we present an analysis of the results we obtained for the CLEF evaluation campaign.</Paragraph>
    <Paragraph position="3"> We evaluate the results obtained at two different points of the question-answering process, i.e. after the sentence selection (point (a) in Figure 1) and after the answer extraction (point (b) in Figure 1). At point (a), we count how many questions (among the global evaluation set of 200 questions) have an appropriate answer in the first five sentences. At point (b), we distinguish the answers that the analysis process labels as named entities (NE) from the others, since the corresponding answering processes are different. We also detail how many answers are ranked first, or within the first five ranks, since we take the first five answers into account. As illustrated in Table 1, the two strategies for dealing with multilingualism give quite different results, which can be explained by the characteristics of each strategy.</Paragraph>
    <Paragraph position="4"> MUSQAT performs the question analysis on well-formed French questions, so this analysis is more reliable. Yet the term translations are obtained from every possible translation of each term, and thus without taking any context into account ; moreover, they depend on the quality of the dictionaries used, and introduce noise because of erroneous translations.
TAB. 1 - Performance of our system in CLEF
In MUSQAT, we do not only translate monoterms (i.e. terms composed of a single word) : the biterms (composed of two words) of the French questions are also extracted by the question analysis. Every sequence of two terms tagged as adjective/common noun or proper noun/proper noun... constitutes a biterm. Each word of the biterm is translated, and the existence of the corresponding English biterm is then checked in the corpus. The biterms thus obtained are used by the subsequent modules of the system. Taking biterms into account is useful since they provide a minimal context for the words forming them, for the translation as well as for the re-indexing and re-ranking of the documents (see Figure 1), as explained in (Ferret et al., 2002). Moreover, the presence of the biterm translations in the corpus is a kind of validation of the monoterm translations.</Paragraph>
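The biterm extraction described above can be sketched as follows. This is a minimal illustration, not the MUSQAT implementation: the tag names (ADJ, NOUN, PROPN and so on), the tagged example and the function name are assumptions, and only the two tag patterns named in the text are covered.

```python
# Hypothetical tag names; the text lists only two patterns and implies others.
BITERM_TAG_PAIRS = {("ADJ", "NOUN"), ("PROPN", "PROPN")}


def extract_biterms(tagged_question):
    """Return adjacent word pairs whose tag pair is a known biterm pattern."""
    pairs = zip(tagged_question, tagged_question[1:])
    return [
        (first_word, second_word)
        for (first_word, first_tag), (second_word, second_tag) in pairs
        if (first_tag, second_tag) in BITERM_TAG_PAIRS
    ]


# Illustrative tagging of a question quoted later in the paper (tags are made up).
tagged = [
    ("Combien", "ADV"), ("de", "ADP"), ("communautés", "NOUN"),
    ("Di", "PROPN"), ("Mambro", "PROPN"), ("a-t-il", "VERB"), ("créé", "VERB"),
]
print(extract_biterms(tagged))
```

With this toy tagging, the only pair retained is the proper noun sequence Di Mambro.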
    <Paragraph position="5"> Translating the whole question, as implemented by Reverso+QALC, has the advantage of giving a unique translation of the question terms, which is quite reliable. But neither the grammaticality nor the realism of the translated question is assured, and thus the question analysis, based on regular expression patterns, can be disturbed.</Paragraph>
    <Paragraph position="6"> In this work, we tried to evaluate each strategy and to bypass their drawbacks : on the one hand (Section 5), by examining how the biterm translation in MUSQAT could be made more reliable, and on the other hand (Section 6), by improving the question analysis for QALC by relying on the French questions.
EACL 2006 Workshop on Multilingual Question Answering - MLQA06</Paragraph>
  </Section>
  <Section position="4" start_page="2005" end_page="2005" type="metho">
    <SectionTitle>
5 Biterm translation
</SectionTitle>
    <Paragraph position="0"> The translation of terms and biterms present in the question is achieved using two dictionaries.</Paragraph>
    <Paragraph position="1"> The first of them, which was used last year for our participation in CLEF, is Magic-Dic. It is a dictionary under GPL licence, which was retained for its capacity to evolve : users can submit new translations, which are checked before being integrated. Yet it is quite incomplete. This year we also used FreeDict (also under GPL licence) to fill in the gaps of Magic-Dic.</Paragraph>
    <Paragraph position="2"> FreeDict added 424 translations to the 690 terms already obtained. By mixing both sets of translations we obtained 463 additional biterms, making a total of 777 biterms.</Paragraph>
    <Paragraph position="3"> Nevertheless, whatever the quality and size of the dictionaries, the problem of biterm translation remains the same : since biterms are not in the dictionaries, the only way for us to get their translation is to combine all the different term translations. The main drawback of this approach is the noise it generates, since none of the terms constituting the biterm is disambiguated. For example, three different translations are found for the biterm Conseil de défense : defense council, defense advice and defense counsel ; but only the first of these should finally be retained by our system.</Paragraph>
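As an illustration of this combination step, the sketch below builds every candidate English biterm from the translations of each term. The dictionary entries are reduced to the Conseil de défense example, and the function name and the modifier-first word order are assumptions made for this example only.

```python
from itertools import product

# Hypothetical dictionary entries, limited to the example given in the text.
TRANSLATIONS = {
    "conseil": ["council", "advice", "counsel"],
    "défense": ["defense"],
}


def candidate_biterms(head, modifier, translations):
    """Combine every translation of both terms of a French 'head de modifier' biterm.

    English candidates put the modifier first (e.g. 'defense council').
    A term missing from the dictionary yields no candidate at all.
    """
    heads = translations.get(head, [])
    modifiers = translations.get(modifier, [])
    return [f"{m} {h}" for h, m in product(heads, modifiers)]


print(candidate_biterms("conseil", "défense", TRANSLATIONS))
```

None of the three candidates is disambiguated at this stage, which is the noise the paragraph refers to.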
    <Paragraph position="4"> To reduce this noise, an interesting possibility is to validate the obtained biterms by searching for them or their variants in the complete collection of documents. (Grefenstette, 1999) reports a quite similar experiment in the context of a machine translation task : he uses the Web to rank the possible translations of noun phrases, and in particular noun biterms. Fastr (Jacquemin, 1996) is a parser which takes as input a corpus and a list of terms (multi-terms or monoterms) and outputs the indexed corpus, in which terms and their variants are recognized. Hence, Fastr is well suited to biterm validation : it tags all the biterms present in the collection, whether in their original form or in a variant, which can be semantic or syntactic.</Paragraph>
    <Paragraph position="5"> In order to validate the biterms, the complete collection of the CLEF campaign (500 Mbyte) was first tagged using the TreeTagger, then Fastr was applied. The results are presented in Table 2 : 39.5% of the 777 biterms were found in the collection, for a total of 63,404 occurrences. There is thus an average of 206 occurrences for each biterm found. If we do not take into account the most frequent biterm (last year, with 30,981 occurrences), this average falls to 105. The 52 biterms which are found in their original form only are most of the time names of persons. Lastly, biterms that are never found in their original form often contain one badly translated term ; for example, the biterm oil importation is not present in the collection, but its variant import of oil is found 28 times. It may then be worthwhile to replace these biterms with the most frequent of their variants.</Paragraph>
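The averages quoted above can be checked with a short calculation. It assumes that the number of biterms found is 39.5% of 777 rounded to the nearest integer, and that the reported averages are truncated rather than rounded; neither point is stated in the text.

```python
import math

total_biterms = 777
found_ratio = 0.395             # share of biterms found in the collection
total_occurrences = 63404
top_biterm_occurrences = 30981  # occurrences of the most frequent biterm

# Assumption: the count of found biterms is the rounded share of the 777 biterms.
found_biterms = round(total_biterms * found_ratio)

average_all = total_occurrences / found_biterms
average_without_top = (total_occurrences - top_biterm_occurrences) / (found_biterms - 1)

print(found_biterms)                    # 307
print(math.floor(average_all))          # 206
print(math.floor(average_without_top))  # 105
```

A single biterm thus accounts for almost half of all occurrences, which is why removing it roughly halves the average.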
    <Paragraph position="6"> Whenever a biterm is validated in this way (found in the collection beyond a chosen threshold), the translation of its terms is itself validated, and the other translations are discarded. Thus, biterm validation enables us to validate monoterm translations. The next step will be to evaluate how this new set of terms and biterms improves the results of MUSQAT.</Paragraph>
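A minimal sketch of this validation step is given below. It assumes that occurrence counts (original forms plus variants, as Fastr would provide them) are already available as a plain mapping; the data structures, the threshold value and the counts in the example are invented for illustration.

```python
def validate_biterms(candidates, occurrences, threshold=1):
    """Keep candidate biterms seen at least `threshold` times in the collection.

    `candidates` maps a French biterm to its candidate English translations.
    `occurrences` maps an English biterm to its count in the collection.
    Returns the validated biterms and the set of term translations they validate.
    """
    validated_biterms = {}
    validated_terms = set()
    for french, english_candidates in candidates.items():
        kept = [b for b in english_candidates if occurrences.get(b, 0) >= threshold]
        if kept:
            validated_biterms[french] = kept
            # A validated biterm also validates the translation of each of its terms.
            for biterm in kept:
                validated_terms.update(biterm.split())
    return validated_biterms, validated_terms


# Invented counts, for illustration only.
candidates = {"conseil de défense": ["defense council", "defense advice", "defense counsel"]}
occurrences = {"defense council": 12, "defense counsel": 1}
biterms, terms = validate_biterms(candidates, occurrences, threshold=5)
print(biterms, sorted(terms))
```

With these made-up counts, only defense council passes the threshold, so council is kept as the translation of conseil while advice and counsel are discarded.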
    <Paragraph position="7"> After the CLEF 2005 evaluation, we had at our disposal the set of questions in their original English version (this set was provided by the organizers).</Paragraph>
    <Paragraph position="8"> We also had the English translation (far less correct) provided by the automatic translator Reverso. As we can see in Table 3, the number of terms and biterms is nearly the same for each set of questions. In the set of translations given by Reverso, we manually examined how many biterms were false and found that here again the figures were close to those of the original version. There are two main reasons why a biterm may be false :
- in two thirds of cases, the association itself is false : the two terms should not have been associated ; this is the case, for example, of many country, from the question How many countries joined the international coalition to restore the democratic government in Haiti ?
- in one third of cases, one of the terms is not translated or is translated with an erroneous term, like movement zapatiste, coming from the question What carry the courtiers of the ..., as translated by Reverso, which should have produced What do supporters of the Zapatistas in Mexico wear ?
TAB. 3 - Biterms in the different sets of questions
However, we calculated that among the 204 biterms given by Reverso, 106 are also present in the original set of questions in English. Among the 98 remaining biterms, 38 are false (for the reasons given above). This leaves 60 biterms which are neither erroneous nor present in the original version. Some of them contain a term which has been translated using a different, but nevertheless correct, word ; yet most of these 60 biterms have a different syntax from those constructed from the original version, which is due to the syntax of the questions translated by Reverso.</Paragraph>
    <Paragraph position="9"> This leads us to conclude that even if Reverso produces syntactically erroneous questions, the vocabulary it chooses is adequate most of the time.</Paragraph>
    <Paragraph position="10"> Yet it is still worthwhile to also use the biterms constructed from the dictionaries, since they are much more numerous and provide variants of the biterms returned by Reverso.</Paragraph>
    <SectionTitle>
6 Multilingual question analysis
</SectionTitle>
    <Paragraph position="11"> For the evaluations, we have developed a question analysis in both languages. It is based on the morpho-syntactic tagging and the syntactic analysis of the questions. Different elements are then detected from both analyses : recognition of the expected answer type, of the question category, of the temporal context...</Paragraph>
    <Paragraph position="12"> There are of course lexicons and patterns which are specific to each language, but the core of the module is language-independent. This module was evaluated on corpora of similar questions in French and in English, and its results on both languages are quite close (around 90% recall and precision for the expected answer type, for example ; for more details, see (Ligozat et al., 2006)).</Paragraph>
    <Paragraph position="13"> As presented above, our system relies on two distinct strategies to answer a cross-language question : - Either the question is analyzed in the original language, and then translated term by term. The question analysis is then more reliable since it processes a grammatically correct question ; yet the translation of terms has no context to rely on.</Paragraph>
    <Paragraph position="14"> - Or the question is first translated into the target language before being analyzed. Although this strategy improves the translation, its main drawback is that each translation error has strong consequences on the question analysis. We will now try to evaluate to what extent the translation errors actually influence our question analysis, and to find solutions to minimize this influence in the Reverso+QALC system.</Paragraph>
    <Paragraph position="15"> An error in the question translation can lead to wrong terms or an incorrect English construction.</Paragraph>
    <Paragraph position="16"> Thus, the translation of the question "Combien y a-t-il d'habitants en France ?" ("How many inhabitants are there in France ?") is "How much is there of inhabitants in France ?".</Paragraph>
    <Paragraph position="17"> In order to evaluate our second strategy, Reverso+QALC, which uses question translation followed by a monolingual system, it is interesting to estimate the influence of such a coarse translation on the results of our system.</Paragraph>
    <Paragraph position="18"> In order to avoid these translation problems, it is possible to adapt either the input or the output of the translation module. (Ahn et al., 2004) present an example of a system applying pre- and post-corrections by means of surface reformulation rules. However, this type of correction is highly dependent on the kind of questions to process, as well as on the errors of the translation tool that is used.</Paragraph>
    <Paragraph position="19"> We suggest using another kind of processing, which makes the most of the cross-lingual character of the task, in order to improve the analysis of the translated questions and to take into account the possibility of errors in these questions.</Paragraph>
    <Paragraph position="20"> Our present system already takes into account some of the most frequent translation errors, by allowing the question analysis module to loosen some of its rules when the question has been translated. Thus, a definition question such as "Qu'est-ce que l'UNITA ?", translated as "What UNITA ?" by our translation tool instead of "What is the UNITA ?", will nevertheless be correctly analyzed by our rules : the pattern WhatGN will be considered as corresponding to a definition question, while on a non-translated question, only the pattern WhatBeGN will be allowed.</Paragraph>
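The rule loosening described above might look like the following sketch. It assumes that GN stands for a noun phrase and approximates it with a crude regular expression over raw strings, whereas the actual rules operate on tagged and parsed questions; the pattern definitions and the function name are hypothetical.

```python
import re

# Crude stand-in for a noun phrase: optional determiner followed by one word.
NOUN_PHRASE = r"(?:(?:the|an?)\s+)?[A-Za-z][\w-]*"

# Simplified versions of the WhatBeGN and WhatGN patterns named in the text.
WHAT_BE_GN = re.compile(rf"^What\s+(?:is|are|was|were)\s+{NOUN_PHRASE}\s*\?$", re.IGNORECASE)
WHAT_GN = re.compile(rf"^What\s+{NOUN_PHRASE}\s*\?$", re.IGNORECASE)


def is_definition_question(question, translated=False):
    """Accept the strict pattern always, and the loose one only for translated questions."""
    question = question.strip()
    if WHAT_BE_GN.match(question):
        return True
    return translated and bool(WHAT_GN.match(question))


print(is_definition_question("What is the UNITA ?"))            # True
print(is_definition_question("What UNITA ?"))                   # False
print(is_definition_question("What UNITA ?", translated=True))  # True
```

Restricting the loose pattern to translated input keeps the analysis of well-formed questions unchanged.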
    <Paragraph position="21"> In order to improve our processing of approximations in the translated questions, the solution we suggest here consists in performing the question analysis in both the source and the target languages, and in transferring the information (or at least part of it) returned by the source analysis into the target analysis. This is possible first because our system treats both languages in a parallel way, and second because some of the information returned by the question analysis module uses the same terms in English and in French, for example the question category or the expected Named Entity type.</Paragraph>
    <Paragraph position="22"> More precisely, we propose, in the task with French questions and English documents, to analyse the French questions and their English translations, and then to transfer the question category and the expected answer type of the French questions into the English question analysis. The information found in the source language should be more reliable since it is obtained on a real question.</Paragraph>
    <Paragraph position="23"> For example, for the question "Combien de communautés Di Mambro a-t-il créé ?" ("How many communities has Di Mambro created ?"), Reverso's translation is "How many Di Mambro communities has he create ?", which prevents the question analysis module from analyzing it correctly.</Paragraph>
    <Paragraph position="24"> The French analysis is thus used, which provides the question category combien (how many) and the expected named entity type NUMBER. This information is reported in the English analysis file. These characteristics of the question are used at two different steps of the question answering process : when selecting the candidate sentences and when extracting the answers. Improving their reliability should then enable us to increase the number of correct answers after these two steps.</Paragraph>
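The transfer of information between the two analyses can be sketched as a simple field copy. The dictionary representation, the field names and the example term lists are assumptions for illustration; the paper only states that the question category and the expected answer type are carried over into the English analysis file.

```python
def transfer_analysis(french_analysis, english_analysis,
                      fields=("category", "expected_answer_type")):
    """Return a copy of the English analysis with selected fields taken from
    the French analysis. Field names are hypothetical."""
    improved = dict(english_analysis)
    for field in fields:
        value = french_analysis.get(field)
        if value is not None:
            improved[field] = value
    return improved


# Illustrative analyses for the Di Mambro question; the term lists are made up.
french = {"category": "combien", "expected_answer_type": "NUMBER",
          "terms": ["communautés", "Di Mambro"]}
english = {"category": None, "expected_answer_type": None,
           "terms": ["Di Mambro", "communities"]}
improved = transfer_analysis(french, english)
print(improved)
```

Only the language-neutral fields are overwritten; the English terms are left untouched, since they are needed to search the English documents.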
    <Paragraph position="25"> In order to test this strategy, we conducted an experiment based on the CLEF 2005 FR-EN task and the 200 corresponding French questions. We ran the question answering system on three question files :
- The first question file (here called English file) contained the original English questions (provided by the CLEF organizers). This file will be considered as a test file, since the results of our system on this file represent those that would be reached without translation errors.
- The second file (called Translated file) contained the analysis of the translated questions.</Paragraph>
    <Paragraph position="26"> - The last file (called Improved file) contained the same analysis, except that the question category and the expected answer type were replaced by those of the French analysis.</Paragraph>
    <Paragraph position="27"> Then we counted the number of correct answers for each input question file after the sentence selection and after the answer extraction. The results obtained by our system on each file are presented in Figure 2, Figure 3 and Figure 4. These figures present the number of questions expecting a named entity answer, the number expecting another kind of answer, and the total number of questions, as well as the results of our system on each type of question : the number of correctly answered questions is given for the first five ranks and for the first rank, first for the sentences ("long answers") and then for the short answers.</Paragraph>
    <Paragraph position="28"> These results show that the information transfer from the source language to the target language significantly improves the system's results ; the number of correct answers increases in every case. At the first rank, it increases from 34 on the translated questions file to 36 on the improved file, and from 52 to 55 for the first 5 ranks. These results are closer to those of the monolingual system, which returns 41 correct answers at the first rank, and 59 in the first 5 ranks.
FIG. 2 - QALC's results (i.e. number of correct answers) on the 200 questions
FIG. 3 - Results on the named entities questions
FIG. 4 - Results on the non named entities questions</Paragraph>
    <Paragraph position="29"> It is interesting to see that the difference between the monolingual and the bilingual systems is less noticeable after the sentence selection step than after the answer extraction step, which suggests that the last step of our process is more sensitive to translation errors. Moreover, this experiment shows that this step can be improved thanks to an information transfer between the source and the target languages. To extend this strategy, we could also match each French question term to its English equivalent, in order to translate all the information given by the French analysis into English. The question analysis errors would thus be minimized.</Paragraph>
  </Section>
</Paper>