<?xml version="1.0" standalone="yes"?>
<Paper uid="P03-2025">
  <Title>Bilingual Terminology Acquisition from Comparable Corpora and Phrasal Translation to Cross-Language Information Retrieval</Title>
  <Section position="4" start_page="0" end_page="0" type="evalu">
    <SectionTitle>
3 Experiments and Evaluations in CLIR
</SectionTitle>
    <Paragraph position="0"> Experiments were carried out to measure the improvement achieved by our proposal on bilingual Japanese-English CLIR tasks, i.e. Japanese queries used to retrieve English documents. Collections of news articles from Mainichi Newspapers (1998-1999) for Japanese and Mainichi Daily News (1998-1999) for English were used as comparable corpora. We also treated the documents of the NTCIR-2 test collection as comparable corpora in order to cope with special features of the test collection during evaluation. The NTCIR-2 test collection (Kando, 2001) was used to evaluate the proposed strategies in CLIR. The SMART information retrieval system (Salton, 1971), which is based on the vector space model, was used to retrieve English documents.</Paragraph>
    <Paragraph position="1"> Content words (nouns, verbs, adjectives, adverbs) were extracted from the English and Japanese texts. Two morphological analyzers were used in linguistic pre-processing: ChaSen version 2.2.9 (Matsumoto et al., 1997) for Japanese texts and OAK2 (Sekine, 2001) for English texts. The EDR dictionary (EDR, 1996) was used to translate the context vectors of the source and target languages.</Paragraph>
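As a rough illustration of translating a context vector through a bilingual dictionary, the following Python sketch maps each co-occurring source word onto its dictionary translations. It is a minimal sketch under stated assumptions, not the authors' implementation: the function name `translate_context_vector`, the toy dictionary (which is not taken from EDR), the toy weights, and the choice to split a weight evenly across translation alternatives are all hypothetical.

```python
from collections import defaultdict


def translate_context_vector(vector, dictionary):
    """Map a source-language context vector into the target language.

    Assumption of this sketch: each co-occurrence weight is split evenly
    across the dictionary translations of its word, and words without a
    dictionary entry are dropped.
    """
    translated = defaultdict(float)
    for word, weight in vector.items():
        candidates = dictionary.get(word, [])
        for target in candidates:
            translated[target] += weight / len(candidates)
    return dict(translated)


# Hypothetical toy dictionary and context vector (not real EDR entries).
toy_dictionary = {"銀行": ["bank"], "金利": ["interest", "rate"]}
source_vector = {"銀行": 2.0, "金利": 3.0, "未知語": 1.0}

target_vector = translate_context_vector(source_vector, toy_dictionary)
print(target_vector)
```

Splitting the weight evenly keeps the total mass of a translated word constant regardless of how many alternatives the dictionary lists; copying the full weight to every alternative would be an equally plausible choice.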
    <Paragraph position="2"> Initial experiments were conducted on several combinations of weighting parameters and schemes of the SMART retrieval system for document terms and query terms. The best performance was achieved by the ATN.NTC combined weighting scheme.</Paragraph>
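To make the ATN.NTC label concrete, the sketch below spells out the usual reading of the SMART three-letter codes: augmented term frequency (a), inverse document frequency (t) and no normalization (n) on one side, and natural term frequency (n), inverse document frequency (t) and cosine normalization (c) on the other. It assumes that the first triple applies to documents and the second to queries, as the sentence order above suggests. The function names and the toy collection are invented for illustration, and the code is not the SMART implementation itself.

```python
import math
from collections import Counter


def idf(term, doc_freq, n_docs):
    """'t' component: log(N / df); 0.0 for terms absent from the collection."""
    df = doc_freq.get(term, 0)
    return math.log(n_docs / df) if df else 0.0


def atn_weights(tokens, doc_freq, n_docs):
    """ATN: augmented tf (0.5 + 0.5 * tf / max_tf) times idf, unnormalized."""
    tf = Counter(tokens)
    if not tf:
        return {}
    max_tf = max(tf.values())
    return {
        t: (0.5 + 0.5 * c / max_tf) * idf(t, doc_freq, n_docs)
        for t, c in tf.items()
    }


def ntc_weights(tokens, doc_freq, n_docs):
    """NTC: natural tf times idf, then cosine (unit-length) normalization."""
    tf = Counter(tokens)
    raw = {t: c * idf(t, doc_freq, n_docs) for t, c in tf.items()}
    norm = math.sqrt(sum(w * w for w in raw.values()))
    return {t: w / norm for t, w in raw.items()} if norm > 0 else raw


def score(doc_vec, query_vec):
    """Vector space model score: inner product of the two weighted vectors."""
    return sum(w * doc_vec.get(t, 0.0) for t, w in query_vec.items())


# Toy collection (invented for illustration only).
docs = [["bank", "rate", "rate"], ["bank", "merger"], ["rate", "cut"]]
doc_freq = Counter(term for doc in docs for term in set(doc))
query_vec = ntc_weights(["rate", "cut"], doc_freq, len(docs))
scores = [score(atn_weights(d, doc_freq, len(docs)), query_vec) for d in docs]
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
print(ranking)
```

In this toy run the third document, which contains both query terms, ranks first, and the second document, which shares no term with the query, scores zero.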
    <Paragraph position="3"> The proposed two-stage model using comparable corpora improved average precision by +27.1% compared to the simple model (one-stage comparable corpora-based translation), with a difference of -32.87% relative to the average precision of monolingual retrieval. Combining it with linguistics-based pruning performed better still, improving average precision by +41.7% and +11.5% compared to the simple comparable corpora-based model and the two-stage comparable corpora-based model, respectively.</Paragraph>
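The three relative gains reported above are mutually consistent if each percentage is read as a relative change in average precision, since relative gains compose multiplicatively rather than additively. The short Python check below illustrates this; the helper name `compose_gains` is hypothetical and the relative-change reading is an assumption of this sketch.

```python
def compose_gains(*gains_percent):
    """Compose successive relative gains (in percent) into one overall gain."""
    factor = 1.0
    for gain in gains_percent:
        factor *= 1.0 + gain / 100.0
    return (factor - 1.0) * 100.0


# +27.1% (two-stage over simple) followed by +11.5% (pruning over two-stage).
combined = compose_gains(27.1, 11.5)
print(round(combined, 1))  # prints 41.7
```

The composed figure matches the +41.7% reported for the pruned model over the simple model, whereas adding the two percentages would give only 38.6%.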
    <Paragraph position="4"> Applying re-scoring techniques to phrasal translation yielded significantly better results, with improvements of 10.35%, 8.27% and 3.08% for the WWW-based, NTCIR-based and comparable corpora-based techniques, respectively, compared to the hybrid of the two-stage comparable corpora model and linguistics-based pruning.</Paragraph>
    <Paragraph position="5"> The proposed approach based on bi-directional comparable corpora strongly affected the translation, because related words could be added as translation alternatives or expansion terms. Extracting bilingual terminology from bi-directional comparable corpora, pruning with linguistics-based knowledge and re-scoring with different phrasal translation techniques all had positive effects on query translation/expansion and thus on document retrieval.</Paragraph>
  </Section>
</Paper>