<?xml version="1.0" standalone="yes"?>
<Paper uid="P03-2008">
  <Title>A Ranking Model of Proximal and Structural Text Retrieval Based on Region Algebra</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> In the biomedical field, the number of papers is very large and still growing, which makes it difficult to find the information one needs. Although keyword-based retrieval systems can be applied to a database of papers, users may not get the information they want because the relations between the keywords are not specified. If document structures such as &quot;title&quot;, &quot;sentence&quot;, and &quot;author&quot;, as well as relations between terms, are tagged in the texts, retrieval can be improved by specifying such structures. Retrieval models that specify both structures and words have been pursued by many researchers (Chinenyanga and Kushmerick, 2001; Wolff et al., 1999; Theobald and Weikum, 2000; Deutsch et al., 1998; Salminen and Tompa, 1994; Clarke et al., 1995). However, unlike keyword-based retrieval, these models are not robust: they retrieve only exact matches for queries.</Paragraph>
    <Paragraph position="1"> In previous research (Masuda et al., 2003), we proposed a new ranking model that enables proximal and structural search for structured text. This paper investigates an application of the ranked region algebra to information retrieval from large-scale but unannotated documents. We report in detail what kind of data can be retrieved in the experiments. Our approach is to annotate documents automatically with document structures and semantic tags using taggers, and to retrieve information by specifying both structures and words using ranked region algebra. In this paper, we apply our approach to the OHSUMED test collection (Hersh et al., 1994), which is a public test collection for information retrieval in the field of biomedical science but is not tag-annotated. We annotate OHSUMED with various taggers and retrieve information from the tag-annotated corpus.</Paragraph>
    <Paragraph position="2"> We have implemented the ranking model in our retrieval engine and conducted preliminary experiments to evaluate it. In the experiments, we used the GENIA corpus (Ohta et al., 2002) as a small but manually tag-annotated corpus, and OHSUMED as a large but automatically tag-annotated corpus. The experiments show that our model succeeded in retrieving relevant answers that an exact-matching model fails to retrieve because of its lack of robustness, as well as relevant answers that a non-structured model fails to retrieve because of its lack of structural specification. We report where structural specification works and where it does not in the experiments with OHSUMED.</Paragraph>
    <Paragraph position="3"> Section 2 explains the region algebra. In Section 3, we describe our ranking model for structured queries and texts. In Section 4, we show the experimental results of this system.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
Expression Description
</SectionTitle>
      <Paragraph position="0"/>
      <Paragraph position="2"/>
    </Section>
  </Section>
</Paper>