<?xml version="1.0" standalone="yes"?>
<Paper uid="P99-1029">
  <Title>Using Mutual Information to Resolve Query Translation Ambiguities and Query Term Weighting</Title>
  <Section position="7" start_page="226" end_page="227" type="concl">
    <SectionTitle>
5 Experiments
</SectionTitle>
    <Paragraph position="0"> We developed a system for our cross-language IR techniques and conducted some basic experiments using the collection from the Cross-Language Track of TREC 6. The 24 English queries are comprised of three fields: titles, descriptions, and narratives. These English queries were manually translated into Korean queries so that we can pretend as if the Korean queries had been generated by human users for cross-language IR. In order to compare cross-language IR and mono-language IR, we used the Smart 11.0 system developed by Cornell University.</Paragraph>
    <Paragraph position="1"> Our goal was to examine the efficacy of the disambiguation and term weighting schemes in our query translation. We ran our system with three sets of queries, differentiated by the query lengths: 'title' queries with title fields only, 'short' queries with description fields only, and 'long' queries with all the three fields. The retrieval effectiveness measured with l 1-point average precision was used for comparison against the baseline of monolingual retrieval using the original English query.</Paragraph>
    <Paragraph position="2"> Table 3 gives the experimental results from using the four types of query set. The result from &amp;quot;Translated Query I&amp;quot; was generated only with the keyword selection and dictionary-based query translation stages. The result &amp;quot;Translated Query II&amp;quot; was generated after all the stages of our word disambiguation and query term weighting were done. And the result from the manually disambiguated query set was generated by manually selecting the best candidate terms from the Translated Query I.</Paragraph>
    <Paragraph position="3">  The performance of the Translated query set I was about 70%, 67%, and 56% of monolingual retrieval for the three cases, respectively. The performances of the translated query set II were about 82%, 85%, and 79% of monolingual retrieval for the three cases, respectively. The performance of the disambiguated queries, 85%, 94%, and 86% of monolingual retrieval for the three cases, respectively, can be treated as the upper limit for the cross-language retrieval. The reason why they are not 100% is attributed to the several factors. They are: 1) the inaccuracy of the manual translation of the original English query into the Korean queries, 2) the inaccuracy of the Korean morphological analyzer and the tagger in generating query words, and 3) the inaccuracy in generating candidate terms using the bilingual dictionary.</Paragraph>
    <Paragraph position="4"> The difference between Translated Query I and Translated Query II indicates that the Ml-based disambiguation and the term weighting schemes are effective in enhancing the retrieval effectiveness. In addition, the results show that the use of these query translation schemes is more effective with long queries than with shorter queries. This is expected because the longer the queries are, the more contextual information can be used for mutual disambiguation.</Paragraph>
    <Paragraph position="5"> Conclusion It has been known that query translation using a simple bilingual dictionary leads to a more than 40% drop in retrieval effectiveness due to translation ambiguity. Our query translation method uses mutual information extracted from the 1988 - 1990 AP corpus in order to solve the problems of the bilingual word disambiguation and query term weighting. The experiments using test collection of TREC-6 Cross-Language Track show that the method improves retrieval effectiveness in Korean-to-English cross-language IR. The performance can be up to 85% of the monolingual retrieval case. We also found that we obtained the largest percent increase with long queries.</Paragraph>
    <Paragraph position="6"> While the experimental results are very promising, there are several issues to be explored. First, we need to test how effectively the method can be applied. Second, we intend to experiment with other co-occurrence metrics, instead of the mutual information statistic, for possible improvement. This investigation is motivated by our observation of some counter-intuitive MI values. Third, we also plan on using different algorithms for choosing the terms and calculating the weights.</Paragraph>
    <Paragraph position="7"> In addition, we plan to use the pseudo relevance feedback method that has been proven to be effective in monolingual retrieval. Terms in some top-ranked documents are thrown into the original query with an assumption that at least some, if not all, of the documents are relevant to the original query and that the terms appearing in the documents are useful in representing user's information need. Here we need to determine a threshold value for the number of top ranked document for our cross-language retrieval situation, let alone other phenomenon.</Paragraph>
  </Section>
class="xml-element"></Paper>