<?xml version="1.0" standalone="yes"?>
<Paper uid="C02-1086">
  <Title>Implicit Ambiguity Resolution Using Incremental Clustering in Korean-to-English Cross-Language Information Retrieval</Title>
  <Section position="2" start_page="2" end_page="2" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> This paper describes a method of applying dynamic incremental clustering to the implicit resolution of query ambiguities in Korean-to-English cross-language information retrieval. The method uses the clusters of retrieved documents as a context for re-weighting each retrieved document and for re-ranking the retrieved documents.</Paragraph>
    <Paragraph position="1"> Cross-language information retrieval (CLIR) enables users to retrieve documents written in a language different from the query language. The methods used in CLIR fall into two categories: statistical approaches and translation approaches. Statistical methods establish cross-lingual associations without language translation (Dumais et al, 1997; Rehder et al, 1997; Yang et al, 1998), but they require large-scale bilingual corpora. In the translation approach, either queries or documents are translated. Though document translation is possible when high-quality machine translation systems are available (Kwon et al, 1997; Oard and Hackett, 1997), it is not very practical. Query translation methods (Hull and Grefenstette, 1996; Davis, 1996; Eichmann et al, 1998; Yang et al, 1998; Jang et al, 1999; Chun, 2000) based on bilingual dictionaries, multilingual ontologies, or thesauri are much more practical. Many researchers adopt dictionary-based query translation because it is simpler and more practical, given the wide availability of bilingual and multilingual dictionaries. To achieve high-performance CLIR using dictionary-based query translation, however, it is necessary to solve the problem of increased ambiguity of query terms. One way of resolving query ambiguities is to use statistics such as mutual information (Church and Hanks, 1990), computed from existing corpora, to measure associations between query terms (Jang et al, 1999).</Paragraph>
    <Paragraph position="2"> Document clusters, widely adopted in various applications such as browsing and viewing of document results (Hearst and Pedersen, 1996) or topic detection (Allan et al, 1998), also reflect the association of terms and documents. Lee et al (2001) showed that incorporating a document re-ranking method based on document clusters into vector space retrieval achieved a significant improvement in monolingual IR, as it contributed to resolving ambiguities caused by polysemous query terms.</Paragraph>
    <Paragraph position="3"> The noise or ambiguity produced by dictionary-based query translation in CLIR is much greater than the polysemous ambiguity in monolingual IR. For example, the Korean term 'eunhaeng[eun-haeng]' is polysemous, with two meanings: 'bank' and 'ginkgo'. The English term 'bank' is itself polysemous, so the translated query ends up with magnified ambiguity. We will show that the method we propose, i.e. implicit ambiguity resolution using incremental clustering, is highly effective in dealing with the increased query ambiguities in CLIR.</Paragraph>
  </Section>
</Paper>