<?xml version="1.0" standalone="yes"?> <Paper uid="C02-2020"> <Title>Looking for candidate translational equivalents in specialized, comparable corpora</Title> <Section position="3" start_page="0" end_page="0" type="intro"> <SectionTitle> 2 Background </SectionTitle> <Paragraph position="0"> Salton (1970) first demonstrated that, with carefully constructed thesauri, cross-language retrieval can perform as well as monolingual retrieval. In many experiments, parallel corpora have been used to train statistical models for bilingual lexicon compilation and for disambiguating query translations (Hiemstra et al., 1997; Littman et al., 1998). A limiting factor in these experiments was the costly human effort needed to collect large parallel corpora, although the experiments of Chen and Nie (2000) point to a potential solution: collecting parallel Web pages automatically.</Paragraph> <Paragraph position="1"> Comparable corpora are &quot;texts which, though composed independently in the respective language communities, have the same communicative function&quot; (Laffling, 1992). Such non-parallel texts can become widely used in the development of bilingual lexicons and in cross-language information research, as they may be easier to collect than parallel corpora (Fung and Yee, 1998; Rapp, 1999; Picchi and Peters, 1998). Among these, Rapp (1999) proposed that the co-occurrence patterns of words that are translations of each other are correlated across languages.</Paragraph> <Paragraph position="2"> Fung and Yee (1998) demonstrated that the associations between a word and its context seed words are preserved in comparable texts of different languages. 
Picchi and Peters (1998) designed procedures that retrieve cross-lingual lexical equivalents together, and proposed that their system could support applications such as retrieving documents that contain terms or contexts which are semantically equivalent in more than one language.</Paragraph> </Section> </Paper>