<?xml version="1.0" standalone="yes"?>
<Paper uid="W00-1312">
<Title>Cross-lingual Information Retrieval using Hidden Markov Models</Title>
<Section position="12" start_page="101" end_page="101" type="concl">
<SectionTitle> 12 Conclusions and Future Work </SectionTitle>
<Paragraph position="0"> We proposed an approach to cross-lingual IR based on hidden Markov models, in which the system estimates the probability that a query in one language could be generated from a document in another language. Experiments using the TREC5 and TREC6 Chinese test sets and the TREC4 Spanish test set show the following: * Our retrieval model can reduce the performance degradation due to translation ambiguity. This had been a major limiting factor for other query-translation approaches.</Paragraph>
<Paragraph position="1"> * Some earlier studies suggested that query translation is not an effective approach to cross-lingual IR (Carbonell et al., 1997).</Paragraph>
<Paragraph position="2"> However, our results suggest that query translation can be effective, particularly if a bilingual dictionary is the primary bilingual resource available.</Paragraph>
<Paragraph position="3"> * Manual selection from the translations in the bilingual dictionary improves performance only slightly over the HMM.</Paragraph>
<Paragraph position="4"> * We believe an algorithm cannot rule out a possible translation with absolute confidence; it is more effective to rely on probability estimation/re-estimation to differentiate likely translations from unlikely ones.</Paragraph>
<Paragraph position="5"> * A more serious limitation to effective cross-lingual IR than translation ambiguity is the incompleteness of the bilingual lexicon used for query translation.</Paragraph>
<Paragraph position="6"> * Cross-lingual IR performance is typically 75% of mono-lingual performance for our HMM on the Chinese and Spanish collections.</Paragraph>
<Paragraph position="7"> Future improvements in cross-lingual IR will come by attacking
the incompleteness of bilingual dictionaries and by improving query expansion and context-dependent translation. Our current model assumes that query terms are generated one at a time. We would like to extend the model to allow phrase generation in the query generation process. We also wish to explore techniques to extend bilingual lexicons.</Paragraph>
</Section>
</Paper>