<?xml version="1.0" standalone="yes"?>
<Paper uid="W03-1108">
  <Title>Learning Bilingual Translations from Comparable Corpora to Cross-Language Information Retrieval: Hybrid Statistics-based and Linguistics-based Approach</Title>
  <Section position="7" start_page="2" end_page="2" type="concl">
    <SectionTitle>
5 Conclusions and Future Work
</SectionTitle>
    <Paragraph position="0"> Dictionary-based translation has been widely used in CLIR because of its simplicity and availability.</Paragraph>
    <Paragraph position="1"> However, the failure to translate words and compounds, together with the limitations of general-purpose dictionaries, especially for specialized vocabulary, is among the reasons for the drop in retrieval performance in CLIR. Bilingual dictionaries and thesauri can be enriched through bilingual terminology acquisition from large corpora. Parallel corpora are costly to acquire, and their availability is extremely limited for any pair of languages; for some languages, which are represented by only a small number of pages on the WWW, they do not exist at all.</Paragraph>
    <Paragraph position="2"> In contrast, comparable corpora are more abundant, more available in different domains, less expensive and more accessible through the WWW.</Paragraph>
    <Paragraph position="3"> In the present paper, we investigated the extraction of bilingual terminology from comparable corpora in order to enrich existing bilingual lexicons and thus enhance Cross-Language Information Retrieval. We proposed a two-stage translation model consisting of bi-directional extraction, merging and disambiguation of the extracted bilingual terminology. Combining this model with linguistics-based pruning proved effective for the Japanese-English language pair. Most of the selected terms could be considered as translation candidates or expansion terms in CLIR.</Paragraph>
    <Paragraph position="4"> Ongoing research focuses on integrating transliteration for the special phonetic alphabet.</Paragraph>
    <Paragraph position="5"> Techniques for phrasal translation will be investigated in order to select the best phrasal translation alternatives in CLIR. Future work on CLIR also includes evaluations using other combinations and more efficient weighting schemes that are not included in the SMART retrieval system, such as OKAPI, which has shown great success in information retrieval.</Paragraph>
  </Section>
</Paper>