<?xml version="1.0" standalone="yes"?>
<Paper uid="P06-1075">
  <Title>The Effect of Translation Quality in MT-Based Cross-Language Information Retrieval</Title>
  <Section position="3" start_page="0" end_page="593" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Cross-Language Information Retrieval (CLIR) enables users to construct queries in one language and search documents in another language. CLIR requires that either the queries or the documents be translated from one language into another, using available translation resources.</Paragraph>
    <Paragraph position="1"> Previous studies have concentrated on query translation because it is computationally less expensive than document translation, which requires substantial processing time and storage (Hull &amp; Grefenstette, 1996).</Paragraph>
    <Paragraph position="2"> There are three kinds of query translation methods: Machine Translation (MT) based methods, dictionary-based methods, and corpus-based methods. Correspondingly, three types of translation resources are required: MT systems, bilingual wordlists, and parallel or comparable corpora.</Paragraph>
    <Paragraph position="3"> CLIR effectiveness depends on both the design of the retrieval system and the quality of the translation resources that are used.</Paragraph>
    <Paragraph position="4"> In this paper, we explore the relationship between the translation quality of the MT system and retrieval effectiveness. The MT system involved in this research is a rule-based English-to-Chinese MT (ECMT) system. We degrade the MT system in two ways. One is to degrade the rule base of the system by progressively removing rules from it. The other is to degrade the dictionary by gradually removing word entries from it. With both methods, we observe successive changes in the translation quality of the MT system.</Paragraph>
    <Paragraph position="5"> We conduct query translation with the degraded MT systems and obtain translated queries of varying quality. We then submit the translated queries to the IR system and evaluate its retrieval performance. Retrieval effectiveness is found to be strongly influenced by the translation quality of the queries. We further analyze the factors that affect retrieval effectiveness. Title queries are found to be preferred in MT-based query translation. In addition, the size of the dictionary is shown to have a stronger impact on retrieval effectiveness than the size of the rule base in MT-based query translation.</Paragraph>
    <Paragraph position="6"> The remainder of this paper is organized as follows. In Section 2, we briefly review related work. In Section 3, we introduce the two systems involved in this research: the rule-based ECMT system and the KIDS IR system. In Section 4, we describe our experimental method. Sections 5 and 6 report and discuss the experimental results. Finally, we present our conclusions and future work in Section 7.</Paragraph>
  </Section>
</Paper>