<?xml version="1.0" standalone="yes"?> <Paper uid="W03-1110"> <Title>Issues in Pre- and Post-translation Document Expansion: Untranslatable Cognates and Missegmented Words</Title> <Section position="3" start_page="0" end_page="0" type="intro"> <SectionTitle> 2 Related Work </SectionTitle> <Paragraph position="0"> This work draws on prior research in pseudo-relevance feedback for both queries and documents.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 2.1 Pre- and Post-translation Query Expansion </SectionTitle> <Paragraph position="0"> Pre-translation query expansion has two goals. The first is that of monolingual query expansion: providing additional terms to refine the query and to increase the probability of matching the terminology chosen by the authors of the document. The second is to provide additional terms that reduce the risk of failing to translate a concept in the query simply because the particular term is not present in the translation lexicon. (Ballesteros and Croft, 1997) evaluated pre- and post-translation query expansion in a Spanish-English cross-language information retrieval task and found that combining the two improved both precision and recall, with pre-translation expansion improving both precision and recall and post-translation expansion enhancing precision. The dictionary ablation experiments of (McNamee and Mayfield, 2002), on the effect of translation resource size and pre- and post-translation query expansion effectiveness, demonstrated the dominant role of pre-translation expansion in providing translatable terms. 
If too few terms are translated, post-translation expansion can provide little improvement.</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 2.2 Document Expansion </SectionTitle> <Paragraph position="0"> The document expansion approach was first proposed by (Singhal et al., 1999) in the context of spoken document retrieval. Since spoken document retrieval involves searching error-prone automatic speech recognition transcriptions, Singhal et al. introduced document expansion as a way of recovering words that might have been in the original broadcast but had been misrecognized. They speculated that correctly recognized terms would yield a topically coherent transcript, while the sporadic errors would be drawn from a random distribution.</Paragraph> <Paragraph position="1"> Enriching the documents with highly selective terms drawn from highly ranked documents, retrieved by using the document itself as a query, yielded retrieval effectiveness that improved not only over the original errorful transcription but also over a perfect manual transcription. (Levow and Oard, 2000) applied post-translation document expansion to both spoken documents and newswire text in Mandarin-English multi-lingual retrieval and found some improvements in retrieval effectiveness. (Levow, 2003) evaluated multi-scale units (words and bigrams) for post-transcription expansion of Mandarin spoken documents, finding significant improvements for expansion with word units using bigram-based indexing.</Paragraph> </Section> </Section> </Paper>