<?xml version="1.0" standalone="yes"?>
<Paper uid="W99-0605">
  <Title>Cross-Language Information Retrieval for Technical Documents</Title>
  <Section position="5" start_page="32" end_page="35" type="evalu">
    <SectionTitle>
4 Evaluation
</SectionTitle>
    <Paragraph position="0"> This section investigates the performance of our CLIR system based on the TREC-type evaluation methodology: the system outputs 1,000 top documents, and TREC evaluation software is used to calculate the recall-precision trade-off and ll-point average precision.</Paragraph>
    <Paragraph position="1"> For the purpose of our evaluation, we used the NACSIS test collection (Kando et al., 1998).</Paragraph>
    <Paragraph position="2"> This collection consists of 21 Japanese queri~'s and approximately 330,000 documents (in el- null ther a confl)ination of English and Japanese or either of the languages individually), collected fi'om technical papers published by 65 Japanese associations tbr various fields. Each document consists of the document ID, title, name(s) of author(s), name/date of conference, hosting organization, abstract and keywords, from which titles, abstracts and keywords were used for our evaluation. We used as target documents approximately 187,000 entries where abstracts are in both English and Japanese. Each query consists of the title of the topic, description, narrative and list of synonyms, from which we used only the description. Roughly speaking, most topics are related to electronic, information and control engineering. Figure 4 shows example descriptions (translated into English by one of the authors). Relevance assessment was performed based on one of the three ranks of relevance, i.e., &amp;quot;relevant&amp;quot;, &amp;quot;partially relevant&amp;quot; and &amp;quot;irrelevant&amp;quot;. In our evaluation, relevant documents refer to both &amp;quot;relevant&amp;quot; and &amp;quot;partially relevant&amp;quot; documents 5.</Paragraph>
    <Section position="1" start_page="33" end_page="34" type="sub_section">
      <SectionTitle>
4.1 Evaluation of compound word translation
</SectionTitle>
      <Paragraph position="0"> translation We compared the following query translation methods: (1 i a control, in which all possible translations derived from the (original) EDR technical terminology dictionary are used as query terms (&amp;quot;EDR&amp;quot;), (2) all possible base word translations derived from our dictionary are used (&amp;quot;all&amp;quot;),  not.</Paragraph>
      <Paragraph position="1"> (3) randomly selected k translations derived from our bilingual dictionary are used (&amp;quot;random&amp;quot;), (4) k-best translations through compound  word translation are used (&amp;quot;C\Y=T&amp;quot;). For system &amp;quot;EDR&amp;quot;, compound words unlisted in the EDR dictionary were manuMly segmented so that substrings (shorter compound words or base words) can be translated. For both systems &amp;quot;random&amp;quot; and &amp;quot;CWT&amp;quot;, we arbitrarily set k = 3. Figure 5 and Table 2 show the recall-precision curve and l 1-point average precision for each method, respectively. In these, &amp;quot;J-J&amp;quot; refers to the result obtained by the Japanese-Japanese IR system, which uses as documents Japanese titles/abstracts/keywords comparable to English fields in the NACSIS collection. This can be seen as the upper bound for CLIR performance 6. Looking at these results, we can conclude that the dictionary production and probabilistic translation methods we proposed are effective for CLIR.</Paragraph>
      <Paragraph position="2">  of compound word translation 6Regrettably, since the NACSIS collection does not contain English queries, we cannot estimate the upper bound performance by English-English IR.</Paragraph>
    </Section>
    <Section position="2" start_page="34" end_page="34" type="sub_section">
      <SectionTitle>
4.2 Evaluation of transliteration
</SectionTitle>
      <Paragraph position="0"> In the NACSIS collection, three queries contain katakana (base) words unlisted in our bilingual dictionary: Those words are &amp;quot;ma-i-ni-n-gu (mining)&amp;quot; and &amp;quot;ko-ro-ke-i-sho-n (collocation)&amp;quot;. However, to emphasize the effectiveness of transliteration, we compared the following extreme cases: (1) a control, in which every katakana word is discarded from queries (&amp;quot;control&amp;quot;), (2) a case where transliteration is applied to every katakana word and top 10 candidates are used (&amp;quot;translit&amp;quot;).</Paragraph>
      <Paragraph position="1"> Both cases use system &amp;quot;CWT&amp;quot; in Section 4.1. In the case of &amp;quot;translit&amp;quot;, we do not use katakana entries listed in the base word dictionary. Figure 6 and Table 3 show the recall-precision curve and ll-point average precision for each case, respectively. In these, results for &amp;quot;CWT&amp;quot; correspond to those in Figure 5 and Table 2, respectively. We can conclude that our transliteration method significantly improves the baseline perfomlance (i.e., &amp;quot;control&amp;quot;), and comparable to word-based translation ill terms of CLIR performance. null An interesting observation is that the use of transliteration is robust against typos in documents, because a number of similar strings are used as query terms. For example, our transliteration method produced the following strings for &amp;quot;ri-da-ku-sho-n (reduction)&amp;quot;: riduction, redction, redaction, reduction. null All of these words are effective for retrieval, because they are contained in the target documents. null</Paragraph>
    </Section>
    <Section position="3" start_page="34" end_page="35" type="sub_section">
      <SectionTitle>
4.3 Evaluation of the overall performance
</SectionTitle>
      <Paragraph position="0"> performance We compared our system (&amp;quot;CWT+translit&amp;quot;) with the Japanese-Japanese IR system, where (unlike the evaluation in Section 4.2) transliteration was applied only to &amp;quot;ma-i-ni-n-gu (mining)&amp;quot; and &amp;quot;ko-ro-ke-i-sho-n (collocation)&amp;quot;. Figure 7 and Table 4 show the recall-precision curve and l 1-point average precision for each system, respectively, from which one can see that our CLIR system is quite comparable with the monolingual IR system in performance. In addition, from Figure 5 to 7, one can see that the monolingual system generally performs better  at lower re(:all while the CLIR system pertbrms b(,It(,r at higher recall.</Paragraph>
      <Paragraph position="1"> For further investigation, let us discuss similar (~xperim(mtal results reported by Kando and Aizawa (1998), where a bilingual dictionary produced ti'om Japanese/English keyword pairs in the NACSIS documents is used for query translation. Their evaluation method is almost the same as pertbrmed in our experinmnts. One difference is that they use the &amp;quot;OpenText&amp;quot; search engine 7, and thus the performance tbr Jal)anese-Japanese IR is higher than obtained in out&amp;quot; evaluation. However, the performance of their Japanese-English CLIR systems, which is roughly 50-60% of that for their Japanese-Japanese IR system, is comparable with our CLIR system performance. It is expected that using a more sophisticated search engine, our CLIR system will achieve a higher performance than that obtained by Kando and Aizawa.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>
Download Original XML