File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/abstr/03/w03-1110_abstr.xml
Size: 1,445 bytes
Last Modified: 2025-10-06 13:43:05
<?xml version="1.0" standalone="yes"?> <Paper uid="W03-1110"> <Title>Issues in Pre- and Post-translation Document Expansion: Untranslatable Cognates and Missegmented Words</Title> <Section position="1" start_page="0" end_page="0" type="abstr"> <SectionTitle> Abstract </SectionTitle> <Paragraph position="0"> Query expansion by pseudo-relevance feedback is a well-established technique in both mono- and cross-lingual information retrieval, enriching and disambiguating the typically terse queries provided by searchers. Comparable document-side expansion is a more recent development motivated by error-prone transcription and translation processes in spoken document and cross-language retrieval. In the cross-language case, one can perform expansion before translation, after translation, or at both points. We investigate the relative impact of pre- and post-translation document expansion for cross-language spoken document retrieval in Mandarin Chinese. We find that post-translation expansion yields a highly significant improvement in retrieval effectiveness, while improvements due to pre-translation expansion alone or in combination do not reach significance. We identify two key factors of segmentation and translation in Chinese orthography that limit the effectiveness of pre-translation expansion in the Chinese-English case, while post-translation expansion yields its full benefit.</Paragraph> </Section> </Paper>