<?xml version="1.0" standalone="yes"?>
<Paper uid="P99-1027">
  <Title>Should we Translate the Documents or the Queries in Cross-language Information Retrieval?</Title>
  <Section position="3" start_page="208" end_page="209" type="metho">
    <SectionTitle>
2 Translation Model
</SectionTitle>
    <Paragraph position="0"> The algorithm for fast translation, which has been described previously in some detail (McCarley and Roukos, 1998) and used with considerable success in TREC (Franz et al., 1999), is a descendant of IBM Model 1 (Brown et al., 1993). Our model captures important features of more complex models, such as fertility (the number of French words output when a given English word is translated), but ignores complexities such as distortion parameters that are unimportant for IR. Very fast decoding is achieved by implementing it as a direct-channel model rather than as a source-channel model. The basic structure of the English→French model is the probability distribution p(n_i, f_1 ... f_{n_i} | e_i, context(e_i)) (1) of the fertility n_i of an English word e_i and a set of French words f_1 ... f_{n_i} associated with that English word, given its context. Here we regard the context of a word as the preceding and following non-stop words; our approach can easily be extended to other types of contextual features. This model is trained on approximately 5 million sentence pairs of Hansard (Canadian parliamentary) and UN proceedings, which have been aligned on a sentence-by-sentence basis by the methods of (Brown et al., 1991) and then further aligned on a word-by-word basis by methods similar to (Brown et al., 1993). The French→English model can be described by simply interchanging English and French notation above. It is trained separately on the same training data, using identical procedures.</Paragraph>
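The per-word structure of Eq. (1) can be illustrated with a toy lookup table. The entries below and the most-probable-entry decoding are illustrative assumptions for a sketch, not the trained Hansard/UN model.

```python
# Toy sketch of the direct-channel model of Eq. (1): for each English
# word e_i the model stores a distribution over (fertility n_i, French
# words f_1..f_{n_i}) conditioned on the context of e_i (the preceding
# and following non-stop words). All probabilities here are made up.

model = {
    # (word, prev_context, next_context) -> [((fertility, french_words), prob), ...]
    ("potato", None, None): [((3, ("pomme", "de", "terre")), 0.7),
                             ((1, ("patate",)), 0.3)],
}

def translate_word(e_word, prev_ctx=None, next_ctx=None):
    """Pick the most probable (fertility, French words) entry for e_word
    in its context; back off to a context-free entry if none matches."""
    entries = model.get((e_word, prev_ctx, next_ctx)) or model.get((e_word, None, None))
    if not entries:
        return (1, (e_word,))  # pass unknown words through unchanged
    (fertility, french), _prob = max(entries, key=lambda kv: kv[1])
    return fertility, french
```

Decoding with a direct-channel model is then a single table lookup per word, which is what makes the approach fast enough to translate entire document collections.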
  </Section>
  <Section position="4" start_page="209" end_page="210" type="metho">
    <SectionTitle>
3 Information Retrieval
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="209" end_page="210" type="sub_section">
      <SectionTitle>
Experiments
</SectionTitle>
      <Paragraph position="0"> The document sets used in our experiments were the English and French parts of the document set used in the TREC-6 and TREC-7 CLIR tracks. The English document set consisted of 3 years of AP newswire (1988-1990), comprising 242918 stories originally occupying 759 MB. The French document set consisted of the same 3 years of SDA (a Swiss newswire service), comprising 141656 stories and originally occupying 257 MB. Identical query sets and appropriate relevance judgments were available in both English and French. The 22 topics from TREC-6 were originally constructed in English and translated by humans into French. The 28 topics from TREC-7 were originally constructed (7 each from four different sites) in English, French, German, and Italian, and human translated into all four languages. We have no knowledge of which TREC-7 queries were originally constructed in which language. The queries contain three SGML fields (&lt;topic&gt;, &lt;description&gt;, &lt;narrative&gt;), which allows us to contrast short (&lt;description&gt; field only) and long (all three fields) forms of the queries.</Paragraph>
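Constructing the short and long query forms from the three SGML fields can be sketched as follows; this is a toy extractor for illustration, not the system's actual preprocessing.

```python
import re

def extract_field(topic_sgml, tag):
    """Return the text of one SGML field, or "" if the tag is absent."""
    m = re.search(rf"<{tag}>(.*?)</{tag}>", topic_sgml, re.DOTALL)
    return m.group(1).strip() if m else ""

def build_query(topic_sgml, form="short"):
    """Short form: <description> only; long form: all three fields."""
    if form == "short":
        return extract_field(topic_sgml, "description")
    fields = ("topic", "description", "narrative")
    return " ".join(extract_field(topic_sgml, f) for f in fields)
```

For example, `build_query(t, "long")` concatenates the topic, description, and narrative texts in order, separated by spaces.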
      <Paragraph position="1"> Queries from TREC-7 appear to be somewhat &quot;easier&quot; than queries from TREC-6, across both document sets. This difference is not accounted for simply by the number of relevant documents, since there were considerably fewer relevant French documents per TREC-7 query than per TREC-6 query.</Paragraph>
      <Paragraph position="2"> With this set of resources, we performed the two different sets of CLIR experiments, denoted EqFd (English queries retrieving French documents) and FqEd (French queries retrieving English documents). In both EqFd and FqEd we employed both techniques (translating the queries, translating the documents). We emphasize that the query translation in EqFd was performed with the same English→French translation system as the document translation in FqEd, and that the document translation in EqFd was performed with the same French→English translation system as the query translation in FqEd. We further emphasize that both translation systems were built from the same training data, and thus are as close to identical quality as can likely be attained. Note also that the results presented are not those of the TREC-7 CLIR task, which involved both cross-language information retrieval and the merging of documents retrieved from sources in different languages.</Paragraph>
      <Paragraph position="3"> Preprocessing of documents includes part-of-speech tagging and morphological analysis. (The training data for the translation models was preprocessed identically, so that the translation models translated between morphological root words rather than between words.) Our information retrieval system consists of first pass scoring with the Okapi formula (Robertson et al., 1995) on unigrams and symmetrized bigrams (with en, des, de, and - allowed as connectors) followed by a second pass re-scoring using local context analysis (LCA) as a query expansion technique (Xu and Croft, 1996). Our primary basis for comparison of the results of the experiments was TREC-style average precision after the second pass, although we have checked that our principal conclusions follow on the basis of first pass scores, and on the precision at rank 20. In the query translation experiments, our implementation of query expansion corresponds to the post-translation expansion of (Ballasteros and Croft, 1997), (Ballasteros and Croft, 1998).</Paragraph>
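A minimal first-pass scorer in the spirit of the Okapi formula can be sketched as follows. The k1 and b constants and the idf variant are standard BM25 choices assumed for illustration, not necessarily those used in the paper; the bigram indexing and the LCA second pass are omitted.

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each document (a list of tokens) against the query terms
    with a standard BM25/Okapi-style formula over unigrams."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    df = Counter()                      # document frequency of each term
    for d in docs:
        df.update(set(d))
    scores = []
    for d in docs:
        tf = Counter(d)                 # term frequency within this document
        s = 0.0
        for t in query_terms:
            if t not in tf:
                continue
            idf = math.log(1 + (N - df[t] + 0.5) / (df[t] + 0.5))
            s += idf * tf[t] * (k1 + 1) / (
                tf[t] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores
```

Because the formula matches unigrams, a multi-word French phrase such as pomme de terre contributes three separate unigram matches unless bigrams are indexed, which motivates the bigram analysis in Section 4.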
      <Paragraph position="4"> All adjustable parameters in the IR system were left unchanged from their values in our TREC ad-hoc experiments (Chan et al., 1997),(Franz and Roukos, 1998), (Franz et al., 1999) or cited papers (Xu and Croft, 1996), except for the number of documents used as the basis for the LCA, which was estimated at 15 from scaling considerations.</Paragraph>
      <Paragraph position="5"> Average precision for both query and document translation were noted to be insensitive to this parameter (as previously observed in other contexts) and not to favor one or the other method of CLIR.</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="210" end_page="211" type="metho">
    <SectionTitle>
4 Results
</SectionTitle>
    <Paragraph position="0"> In experiment EqFd, document translation outperformed query translation, as seen in columns qt and dt of Table 1. In experiment FqEd, query translation outperformed document translation, as seen in the columns qt and dt of Table 2. The relative performances of query and document translation, in terms of average precision, do not differ between long and short forms of the queries, contrary to expectations that query translation might fare better on longer queries. A more sophisticated translation model, incorporating more nonlocal features into its definition of context, might reveal a difference in this aspect. A simple explanation is that in both experiments, French→English translation outperformed English→French translation. It is surprising that the difference in performance is this large, given that the training of the translation systems was identical. Reasons for this difference could lie in the structure of the languages themselves; for example, the French tendency to use phrases such as pomme de terre for potato may hinder retrieval based on the Okapi formula, which tends to emphasize matching unigrams. However, separate monolingual retrieval experiments indicate that the advantages gained by indexing bigrams in the French documents were not only too small to account for the difference between the retrieval experiments involving opposite translation directions, but were in fact smaller than the gains made by indexing bigrams in the English documents. The fact that French is a more highly inflected language than English is unlikely to account for the difference, since both translation systems and the IR system used morphologically analyzed text. Differences in the quality of pre-processing steps in each language, such as tagging and morphing, are more difficult to account for, in the absence of standard metrics for these tasks.
However, we believe that differences in preprocessing for each language have only a small effect on retrieval performance. Furthermore, these differences are likely to be compensated for by the training of the translation algorithm: since its training data was preprocessed identically, a translation engine trained to produce language in a particular style of morphing is well suited for matching translated documents with queries morphed in the same style. A related concern is &quot;matching&quot; between translation model training data and retrieval set: the English AP documents might have been more similar to the Hansard than the Swiss SDA documents. All of these concerns heighten the importance of studying both translation directions within the language pair.</Paragraph>
    <Paragraph position="1"> On a query-by-query basis, the scores are quite correlated, as seen in Fig. (1). On TREC-7 short queries, the average precisions of query and document translation are within 0.1 of each other on 23 of the 28 queries, on both FqEd and EqFd. The remaining outlier points tend to be accounted for by simple translation errors (e.g. vol d'oeuvres d'art → flight art on the TREC-7 query CL036).</Paragraph>
    <Paragraph position="2"> [Table legend: all numbers are TREC average precisions. qt: query translation system; dt: document translation system; qt + dt: hybrid system combining qt and dt; ht: monolingual baseline (equivalent to human translation); ht + dt: hybrid system combining ht and dt.] With the limited number of queries available, it is not clear whether the difference in retrieval results between the two translation directions is a result of small effects across many queries, or is principally determined by the few outlier points.</Paragraph>
    <Paragraph position="3"> We remind the reader that the query translation and document translation approaches to CLIR are not symmetrical. Information is distorted in a different manner by the two approaches, and thus a combination of the two approaches may yield new information. We have investigated this aspect by developing a hybrid system in which the score of each document is the mean of its (normalized) scores from both the query and document translation experiments. (A more general linear combination would perhaps be more suitable if the average precision of the two retrievals differed substantially.) We observe that the hybrid systems which combine query translation and document translation outperform both query translation and document translation individually, on both sets of documents. (See column qt + dt of Tables 1 and 2.)</Paragraph>
  </Section>
  <Section position="6" start_page="211" end_page="212" type="metho">
    <SectionTitle>
</SectionTitle>
    <Paragraph position="0"> Given the tradeoff between computer resources and quality of translation, some would propose that correspondingly more computational effort should be put into query translation. From this point of view, a document translation system based on fast MT should be compared with a query translation system based on higher quality, but slower, MT. We can meaningfully investigate this limit by regarding the human-translated versions of the TREC queries as the extreme high-quality limit of machine translation. In this task, monolingual retrieval (the usual baseline for judging the degree to which translation degrades retrieval performance in CLIR) can be regarded as the extreme high-quality limit of query translation. Document translation, however, provides another source of information, since the context-sensitive aspects of the translation account for context in a manner distinct from current algorithms of information retrieval. Thus we do a further set of experiments in which we mix document translation and monolingual retrieval. Surprisingly, we find that the hybrid system outperforms the pure monolingual system. (See columns ht and ht + dt of Tables 1 and 2.) Thus we conclude that a mixture of document translation and query translation can be expected to outperform pure query translation, even very high quality query translation.</Paragraph>
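The hybrid score combination used in this and the preceding experiments can be sketched as follows. Max-normalization is an assumed choice here, since the text specifies only that scores are normalized before taking the mean.

```python
def normalize(scores):
    """Scale a run's scores into [0, 1] by dividing by the maximum.
    (Assumed normalization; the paper does not specify the scheme.)"""
    m = max(scores.values()) or 1.0
    return {doc: s / m for doc, s in scores.items()}

def hybrid(run_a, run_b):
    """Mean of normalized scores per document; a document missing from
    one run contributes 0 from that run."""
    a, b = normalize(run_a), normalize(run_b)
    docs = set(a) | set(b)
    return {d: (a.get(d, 0.0) + b.get(d, 0.0)) / 2 for d in docs}
```

A more general weighted combination `w * a + (1 - w) * b` reduces to this with w = 0.5, matching the paper's remark that unequal weights would matter only if the two retrievals differed substantially in average precision.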
  </Section>
</Paper>