<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-1901">
  <Title>QA better than IR ?</Title>
  <Section position="3" start_page="0" end_page="1" type="metho">
    <SectionTitle>
2 Background
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.1 Qristal pedigree
</SectionTitle>
      <Paragraph position="0"> Qristal (a French acronym for "Questions-Reponses Integrant un Systeme de Traitement Automatique des Langues", which can be translated as "Question Answering System using NLP") is, as far as we know, the first multilingual Question Answering system available on the consumer market (B2C). It handles French, English, Italian, Portuguese and Polish.</Paragraph>
      <Paragraph position="1"> Qristal allows the user to query on a static corpus or on the Web. It supplies answers in one or any of the 4 languages.</Paragraph>
      <Paragraph position="2"> Our system is described in detail in other papers (Amaral, 2004; Laurent 2004; Laurent 2005-1; Laurent 2005-2). Qristal is based on our Cordial syntactic analyzer and extensively uses all the usual constituents of the natural language processing, while, as seldom found, remarkably featuring anaphora resolution and metaphor detection.</Paragraph>
      <Paragraph position="3"> Originally developed within the framework of the European project TRUST2 and M-CAST3, our system has evolved, over the last five years, from a monolingual single-user program into a multilingual multi-user system.</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="1" type="sub_section">
      <SectionTitle>
2.2 Qristal benchmarks and performances
</SectionTitle>
      <Paragraph position="0"> A beta version of Qristal was evaluated in July 2004 in the EQueR4 evaluation campaign organized in France by several ministries (Ayache, 2004; Ayache, 2005). With a MRR (&amp;quot;Mean Reciprocal Rank&amp;quot;; cf. Ayache, 2004) of 0.58 for the exact answers and 0.70 for the snippets, our system ranked first out of the seven  Question Answering systems evaluated.</Paragraph>
      <Paragraph position="1"> The marketed version of Qristal was evaluated during the CLEF 2005 (Laurent, 2005-2) and obtained 64% of exact answers for French to French, 39.5% from English to French and 36.5% from Portuguese to French. Once again, Qristal ranked first in this evaluation, for French engines and for all cross language systems, all pairs considered.</Paragraph>
      <Paragraph position="2"> Since this evaluation, the resources were increased and some algorithms revised, so our last tests brought us a 70% of exacted answers and 45 % for cross language.</Paragraph>
    </Section>
  </Section>
  <Section position="4" start_page="1" end_page="3" type="metho">
    <SectionTitle>
3 QA and IR
</SectionTitle>
    <Paragraph position="0"> It is true that, intrinsically, IR engines and QA systems differ in design, objectives and processes. An IR engine is geared to deliver snippets or docs from a query, a QA system strive to deliver the exact answer to a question. If one is to differentiate 3 key features of both systems, one of the first difference concerns the query mode : natural language for the QA systems and &amp;quot;Boolean like&amp;quot; for the IR engines. We define &amp;quot;Boolean like&amp;quot; extensively as the use of Boolean operators associated to underlying constraints induced by the word matching techniques. The table 1 gives the results of Google Desktop for natural language requests and Boolean requests (set of questions detailed below) and we can see that results with natural language requests are not so good:  natural language and Boolean requests This performance table shows that classical engines are not suited to answer questions in natural language. To quote Google &amp;quot;A Google search is an easy, honest and objective way to find high-quality websites with information relevant to your search.&amp;quot; The Google technology considers, at least in French, equally and of same &amp;quot;weight, words like, &amp;quot;de&amp;quot; or &amp;quot;le&amp;quot; and the highly semantically-loaded words of the query. This leads to a dramatic upsurge of noise in their results. Therefore, using classic engines require a good knowledge of their syntax and their underlying word matching techniques, like the necessity of grouping between quotation marks, the &amp;quot;noun phrases&amp;quot; and the expressions.</Paragraph>
    <Paragraph position="1"> The second difference concerns what is delivered to the user. Question Answering systems deliver one or more exact answers to a question and their context whereas classical engines return snippets with links to the texts those snippets were extracted from.</Paragraph>
    <Paragraph position="2"> The third difference relates to the dynamic and openness status of the corpora. Usually QA systems use confined or close corpora with low up-date rate, while classical IR engines are tuned to the Web queries and their reference file are continuously updated.</Paragraph>
    <Paragraph position="3"> Qristal QA is able to deliver answers from both web-based queries and closed corpora. We were eager to apply our proposed metrics on the web-based deliveries, but unfortunately, we had not at our disposal the appropriate web reference file of questions and answers, probably impossible to elaborate considering the extremely high up-dating rate of the web pages.</Paragraph>
    <Paragraph position="4"> Therefore we had but no choice to use a closed corpus, Google Desktop being able to manage this type of corpus (see note 1). We used the reference file of questions and answers established for EQueR.</Paragraph>
    <Section position="1" start_page="1" end_page="2" type="sub_section">
      <SectionTitle>
3.1 Question Answering systems
</SectionTitle>
      <Paragraph position="0"> Qristal interface mimics almost all the usual screen template of IR engine (see figure 2). It displays results in different languages, keeps the track or trace of the precedent requests, allows the user to choose the requested corpus and, if thought necessary, to make a semantic disambiguation of the question terms.</Paragraph>
      <Paragraph position="1">  Results (answer, sentences, links) are displayed at three levels: if an exact answer is found, it EACL 2006 Workshop on Multilingual Question Answering - MLQA06  is displayed in the top right part of the window, sentences in the lower right part of the window and links on top of the sentences. Note that the words or phrases supporting the inferred answer are put in bold in the text. These words are sometimes pronouns (anaphora) and, frequently, synonyms or derivate forms of the request words.</Paragraph>
    </Section>
    <Section position="2" start_page="2" end_page="2" type="sub_section">
      <SectionTitle>
3.2 Information Retrieval engines
</SectionTitle>
      <Paragraph position="0"> Numerous Information Retrieval engines are available for closed or web-based corpora.</Paragraph>
      <Paragraph position="1"> For our evaluation we selected the Google engine, as it is available both for a web and closed PC desktop usage (in version &amp;quot;Desktop Search&amp;quot;), although, regrettably, this beta version had a few minor defects. The snippets supplied by Google are generally fragments of sentences, sometimes with cut words, stemming from excerpts of text seeming to correspond the best to the query. All you can expect is a help in the selection of the text(s) in which is likely to be present the answer to your query rather than a pinpointed and elaborated exact answer.</Paragraph>
    </Section>
    <Section position="3" start_page="2" end_page="3" type="sub_section">
      <SectionTitle>
3.3 Evaluation method of the performances
</SectionTitle>
      <Paragraph position="0"> Corpus of requests and answers The corpus selected for this evaluation is the corpus used for the EQueR evaluation campaign. This choice was justified by the size of the corpus (over half million texts for about 1.5 Gb) and, especially, by the fact that we have an important corpus of questions (500) with many answers and references for all these questions.</Paragraph>
      <Paragraph position="1"> To generate a comprehensive package of tests of Question Answering systems, ELDA, organizer of the EQueR campaign, made a compilation of all the results returned by all the participants. Then, thanks to a thorough examination by several specialists (one of whom being an author of this paper), this corpus has been verified, increased and validated. In doing so, most certainly, the immense majority of the possible answers are inventoried, the great majority of the references of these answers is known5 to the extent that this corpus of questions and answers can be automatically run, the results subsequently requiring only a reduced amount of checking.</Paragraph>
      <Paragraph position="2"> The initial corpus of 500 questions has been reduced to 330 questions for this evaluation. In 5 a set comprising textual corpora, questions and answers is available at ELDA (http://www.elda.org/ article139.html) fact, the last 100 questions were only reformulations of previous questions and offered not enough interest. 30 questions concerned binary answers YES/NO and 40 questions concerned lists as answers. As Information Retrieval engines are not able to return binary or list answers, including them within the evaluation would have biased it. Finally, five questions were without any answer. They were removed by the organizers of EQueR and we also did so accordingly. For these five &amp;quot;noanswer&amp;quot;questions, Qristal systematically returned correct answers i.e. NIL, where as a classical search engine like Google would systematically return at least one answer. We decided not to include these &amp;quot;NIL&amp;quot; questions so as not further penalize the IR Google engine.</Paragraph>
      <Paragraph position="3"> Evaluation of the &amp;quot;user effort&amp;quot; We have two competing systems on the same corpora and a reference file with questions and answers. We need now to define the basis of their comparison.</Paragraph>
      <Paragraph position="4"> The main comparative evaluation between the Question Answering systems and the Information Retrieval engines (Kwok, 2001) considered only the reading time while counting characters to be read to reach the answer.</Paragraph>
      <Paragraph position="5"> Knowing the delay needed to obtain the results in most Question Answering systems (McGowan, 2005), it seems necessary to take also in account this delay if we want to measure the global user effort to obtain an answer to his question.</Paragraph>
      <Paragraph position="6"> We consider that the user wants a correct answer to his question and we consider that the answer is correct if this answer can be found in the snippet or in the text linked to the snippet for Google (to the sentence for Qristal). So we compared the quality of the systems as follows: percentage of correct answers ranked first, ranked within the five first, the ten first and the hundred first. We considered both the answer as part of the snippets or part or the snippets and documents.</Paragraph>
      <Paragraph position="7"> For the user, the quality of the answers returned, especially in the first results page, is paramount. But, we think another item has to be taken into account: the time needed to obtain the answer. This time is the compound of three elements: * the time to key in the question, * the delay before the results display, * the reading time of the snippets or sentences to reach a correct answer.</Paragraph>
      <Paragraph position="8"> EACL 2006 Workshop on Multilingual Question Answering - MLQA06  Addition of these three elements provides the measure of the &amp;quot;user effort&amp;quot;.</Paragraph>
      <Paragraph position="9"> Time to key in the question This time is shorter for a &amp;quot;Boolean like&amp;quot; engine. Typing a question in Qristal needs in average nine seconds more than with the query in Google. However it supposes that the user types the Boolean like request at the same speed than a natural language request. This implies that the user is very familiar with the Google syntax. For example the question 6: Quel age a l'abbe Pierre ? will be converted into Google syntax by: &amp;quot;abbe Pierre&amp;quot; ans This Boolean request increases the probability to obtain the effective age, not only snippets with the words &amp;quot;age&amp;quot; and &amp;quot;abbe Pierre&amp;quot;. Other example, the question 37: Quel evenement a eu lieu le 27 decembre 1978 en Algerie ? will be converted into Google syntax: &amp;quot;27 decembre 1978&amp;quot; Algerie This Boolean request is needed either words like &amp;quot;evenement&amp;quot; or &amp;quot;avoir lieu&amp;quot; will bring out more noise than correct answers.</Paragraph>
      <Paragraph position="10"> So, we translated the complete list of question in the Google syntax in addition to the questions set in natural language. Compared results were shown on Table 1.</Paragraph>
      <Paragraph position="11"> To measure the time necessary to enter questions, we counted the number of characters typed (always inferior in Google) and multiplied this number by an average speed of 150 characters by minute. We know that a professional typist types at a speed of 300 to 400 characters per minute, so our chosen speed corresponds to a user keying in with two fingers. The following table gives the numbers of characters and the times for the two systems:  Delay to display the results This is the elapsed time between the click on the button &amp;quot;OK&amp;quot; and the display of the results. Note here that, strangely, Google Desktop has a response time distinctly bigger than the response time of Google on the Web, especially when the request contains a group of words between quotes.</Paragraph>
      <Paragraph position="12"> Reading time to reach one answer To fix a reading speed, we tested several users. An average speed of 40 characters by second (2 400 characters per minute, or also 400 words per minute) seems a fair measure. It corresponds to a reader with a higher-education background, according to Richaudeau, 1977.</Paragraph>
      <Paragraph position="13"> While making these tests, we noted that, if the user knows the answer, the reading speed of both the snippets and texts would increase to 100 characters per second (6 000 characters per minute, or 1 000 words per minute). Even if very few questions have an obvious answer, we decided to calculate the reading times with both speeds of 40 characters per second and 100 characters per second.</Paragraph>
      <Paragraph position="14"> The speed of 100 characters per second will however be considered as a superior limit that favours clearly Google where the snippets, constituted by fragments of sentences, sometimes fragments of words, are more difficult and longer to read than the sentences returned by Qristal.</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="3" end_page="6" type="metho">
    <SectionTitle>
4 Results of the benchmark
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="3" end_page="4" type="sub_section">
      <SectionTitle>
4.1 Evaluation on the 330 questions
</SectionTitle>
      <Paragraph position="0"> For the 330 questions of the evaluation, results  Qristal returns an exact answer for nearly 70% of questions and a correct answer is returned as exact answer or in the first sentence for 82% of the 330 questions. This has to be compared with the 10% of answer found in first position by Google Desktop. If we consider the snippets and the documents, Qristal returns a correct answer in first rank for 86% of the questions and Google in more than 30%.</Paragraph>
      <Paragraph position="1"> These results give a clear advantage to the Question Answering system on the Information Retrieval engine. This superiority in quality exists also in quantity. Here is the table of user efforts to obtain a correct answer: EACL 2006 Workshop on Multilingual Question Answering - MLQA06  (on 330 questions) It appears that the time to type down the question in Qristal is nearly 9 seconds longer than with Google. The elapsed time before display is similar. On the other hand, as Google gives a correct answer in a higher rank or in a document, not a snippet, the number of characters to be read before reaching an answer is finally more important. Finally, if we consider the average reading speed of 40 characters per second, Qristal needs in average 29 seconds against 73 seconds, in other words the user effort to obtain a good answer is 2.5 times higher with Google than with Qristal.</Paragraph>
      <Paragraph position="2"> If we don't take into account the time to enter the question, Google requires a user effort 6 to 7 time higher than Qristal to reach an answer. This comparison would be effective in the case of voice-based submitted query. In that case, the acquisition of the question would become more difficult according to the syntax of the Boolean engine (&amp;quot;open the quotes&amp;quot;, &amp;quot;close the quotes&amp;quot;...)</Paragraph>
    </Section>
    <Section position="2" start_page="4" end_page="5" type="sub_section">
      <SectionTitle>
4.2 Evaluation on 231 questions
</SectionTitle>
      <Paragraph position="0"> Looking carefully at each answer returned by Google Desktop, we discovered that it ignored some texts or, more exactly, some parts of texts, especially the end of these texts. The help pages of this software point out this &amp;quot;bug&amp;quot; : However, if you're searching for a word within the file, please note that Google Desktop searches only about the first 10,000 words. In a few cases, Google Desktop may index slightly fewer words to save space in your search index and on your hard drive.6 Of course this default impacted on the results and the comparisons. Thus, we decided, in a second iteration of this evaluation, to consider only the 231 questions where Google Desktop found at</Paragraph>
      <Paragraph position="2"> least one correct answer. Google Desktop found no answer with those 99 (330-231) removed questions for two main reasons. Firstly, as it doesn't manage a full indexation of documents.</Paragraph>
      <Paragraph position="3"> Secondly, as some complex questions like &amp;quot;why&amp;quot; or &amp;quot;how&amp;quot; questions often lead it to silence on this evaluation.</Paragraph>
      <Paragraph position="4"> This selection of 231 questions favours Google Desktop but it allows a more accurate comparison. Here are the results for those 231  The corpus of 231 questions is thus &amp;quot;easier&amp;quot; than that of 325 questions. This confirms the score of Qristal for the exact answers: 73.6% versus 69.7%, and the score for the correct answer in first rank: 89.6% versus 81.8%. But the results of Google are of course better: 13.9% in the first snippet against 9.7%, 43.7% in the first snippet or the first document, against 30.6%. However the advantage of the QA system over the IR engine is clear in terms of quality, especially if we consider only the snippets. This advantage is also clear for the user effort, even if any of the answer not found by Qristal penalizes this system as the reading time of this question is the consolidation of all reading times of all the snippets displayed for this question!  The mean times of question entering and displaying results are nearly the same for those 231 questions than for the 325 questions. But, because Google Desktop finds an answer to all questions, the reading time before a correct answer is, in that case, reduced for Google.</Paragraph>
      <Paragraph position="5"> The mean times of question entering and displaying results are nearly the same for those 231 questions than for the 325 questions. But, because Google Desktop finds an answer to all questions, the reading time before a correct answer is, in that case, reduced for Google.</Paragraph>
      <Paragraph position="6"> Finally, with an average reading speed of 40 characters by second, the user effort is two times higher with Google Desktop than it is with Qristal. And if we don't take into account the time to type in the question, the user effort with Google is 6 times higher than with Qristal.</Paragraph>
      <Paragraph position="7"> Using the same presentation than Kwok, 2001, the following graph gives the compared results of the two systems. On Y-axis is the number of correct answers and in X-axis the number of characters read, for the 231 questions: Figure 8 : number of correct answers by characters The interest of Question Answering systems is particularly noticeable at the beginning of the graph, seeing that Qristal displays a correct answer as exact answer at the top of the screen in more than 70% of the questions while Google Desktop needs to read about 1000 characters in the snippets and in the documents to obtain a similar success rate.</Paragraph>
    </Section>
    <Section position="3" start_page="5" end_page="6" type="sub_section">
      <SectionTitle>
4.3 Comparison by type of question(s)
</SectionTitle>
      <Paragraph position="0"> The above statistics concern all types of queries.</Paragraph>
      <Paragraph position="1"> In fact, 25 questions wait a definition, the others being factual requests. The following table of the 231 questions corpus gives the results for these two categories:  The only significant gap in these results is that Google provides better results for definitions (32% of correct answers in the first snippet against 12% for the factual questions).</Paragraph>
      <Paragraph position="2"> We also looked at the questions beginning by &amp;quot;comment&amp;quot; (&amp;quot;how&amp;quot;), but we excluded those beginning by &amp;quot;comment s'appelle&amp;quot; (&amp;quot;how is called&amp;quot;) or &amp;quot;comment est mort&amp;quot; (&amp;quot;how did somebody die&amp;quot;), i.e. 16 questions (3, 50, 59, 90, 93, 117, 148, 154, 165, 196, 199, 234, 247, 249, 263, 295). The results are :  beginning by &amp;quot;comment&amp;quot; (&amp;quot;how&amp;quot;) These results are not satisfying for any of the two systems but Qristal displays a correct answer in 9 cases on 16, versus 0 for Google Desktop. This underlines that the Question Answering systems are more successful when the queries are not purely factual requests. Most certainly this could be caused by the fact that those questions require a deeper analysis.</Paragraph>
      <Paragraph position="3"> A closer examination of factual questions revealed that the most difficult questions for the Information Retrieval engines are the questions about location. For example Google Desktop finds the country related to the Vilvorde town only at the 23rd rank, and the country related to Johannesburg is given only at the 13th rank; the region of Cancale is displayed at the 18th rank; EACL 2006 Workshop on Multilingual Question Answering - MLQA06  and the department (county) where is located Annemasse only at the 23rd rank.</Paragraph>
      <Paragraph position="4"> More generally, a search engine finds answers more easily when these answers contain the words of the query. For example, to the query 252 (&amp;quot;A quelle peine fut condamne Jean-Marie Villemin le 16 decembre 1993?&amp;quot; [&amp;quot;What was the sentence received by Jean-Marie Villemin on 16 December 1993 ?&amp;quot;]), Google Desktop does not find any answer because the acceptable answers &amp;quot;cinq ans de prison&amp;quot; (five years of prison) or &amp;quot;cinq annees d'emprisonnement&amp;quot; (a five-year prison sentence) does not contain, in French, any word of the query. Similarly the search engine has many difficulties to display the development of acronyms like those of the question 141 (&amp;quot;Que signifie CGT?&amp;quot; [&amp;quot;What is the significance of CGT?&amp;quot;]), or of the question 319 (&amp;quot;Qu'est-ce que l'EEE?&amp;quot; [&amp;quot;What is EEE?&amp;quot;]). Because the answers are developments of these capital letters which are not so frequent, except when the acronym is rare, as in this case the acronym is often followed or preceded by his significance, like for the questions 327 (&amp;quot;Qu'est-ce que le Cermoc ?&amp;quot; [&amp;quot;What is Cermoc?&amp;quot;]) or 330 (&amp;quot;Qu'est-ce que l'OACI ?&amp;quot; [What is OACI?&amp;quot;]). For these two questions, Google Desktop returns the correct answer in the first snippet.</Paragraph>
    </Section>
  </Section>
  <Section position="6" start_page="6" end_page="7" type="metho">
    <SectionTitle>
5 Perspectives
</SectionTitle>
    <Paragraph position="0"> The evaluation was designed in such fair way to take into account all the differences between the Information Retrieval engines and the Question Answering systems.</Paragraph>
    <Paragraph position="1"> We made all possible efforts not to favour the QA systems and avoid non equitable comparison. For example, the evaluation includes the requests to Google Desktop made with the most sophisticated achievable query syntax to generate a return of the best answers, knowing that if they were keyed in as for the natural language requests, their success rate would have dropped considerably (see Table 1).</Paragraph>
    <Paragraph position="2"> It is most unlikely that one is able to formulate queries in Boolean like style as quickly as in NL questions. Conversely, for a same given number of characters, reading Google snippets requires most likely far more time than reading complete NL sentences. However, despite all these metrical choices more favourable to the classical search engine, the Question Answering system obtains better results with regard to the quality of the answers and to the user effort.</Paragraph>
    <Paragraph position="3"> If we were able to compare the Web versions of Google and Qristal, the results would be probably different.</Paragraph>
    <Paragraph position="4"> First, because Qristal uses the search engines as a meta-engine without any indexation. Next, because GoogleWeb is really faster at displaying the results from the Web than Google Desktop.</Paragraph>
    <Paragraph position="5"> At last because the redundancy, due to the large volume of indexed pages on the Web, allows the implementation of some very successful techniques.</Paragraph>
    <Paragraph position="6"> For example it may happen that you find the questions in natural language followed by their answers inside Web pages and this in such a way that asking a request in natural language in Google, you can obtain sometimes a very pertinent answer. To the question &amp;quot;Pourquoi le ciel est bleu?&amp;quot; (&amp;quot;Why the sky is blue?&amp;quot;) or to the question &amp;quot;Pourquoi la mer est bleue?&amp;quot; (&amp;quot;Why the sea is blue?&amp;quot;), Google Web returns in first rank snippets and documents very accurately.</Paragraph>
    <Paragraph position="7"> However the analysis of the documents and contained answers permit to the Question Answering systems to return more accurate answers. For example, with the request &amp;quot;capitale anglaise&amp;quot; (&amp;quot;English capital&amp;quot;), Google returns a lot of snippets containing the phrase &amp;quot;capitale anglaise&amp;quot; (&amp;quot;English capital&amp;quot;) but not the word Londres or London in these snippets. In an Information Retrieval engine the answers are very often less justified by the context than it is the case with Question Answering systems. This is because the snippets group essentially words contained in the query. For example, to the question 26 (&amp;quot;Qui a ecrit Germinal?&amp;quot; [&amp;quot;Who wrote Germinal?&amp;quot;], converted in Google syntax by : &amp;quot;auteur Germinal&amp;quot; [&amp;quot;writer Germinal&amp;quot;]), the search engine returns &amp;quot;Emile Zola&amp;quot; in the second snippet but the snippet &amp;quot;L'exposition &amp;quot;Emile Zola, photographe&amp;quot; fait escale&amp;quot; (The exhibition &amp;quot;Emile Zola, photograph&amp;quot; stops at) would be considered as an answer out of its context and non receivable within a campaign like TREC, even if we can read in the text : &amp;quot;L'auteur de &amp;quot;Germinal&amp;quot;, l'ecrivain francais Emile Zola (1840-1902), etait aussi un photographe de talent&amp;quot; (&amp;quot;The author of Germinal, the writer Emile Zola (1840-1902), was also a talented photograph&amp;quot;). We could almost say that the classical search engines return far better results when the user already knows the answer to his query.</Paragraph>
    <Paragraph position="8"> A complete compared benchmark and exhaustive evaluation of search engines and question answering systems needs to be made on the Web.</Paragraph>
    <Paragraph position="9"> EACL 2006 Workshop on Multilingual Question Answering - MLQA06  The evaluation method described above could be applied but, knowing the difficulty to validate a Web corpus of answers, specially the difficulty to keep it referentially constant, the effort to estimate the quality of the returned answers would be far much enormous than the one engaged for this evaluation on a closed corpus.</Paragraph>
  </Section>
class="xml-element"></Paper>