File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/abstr/03/n03-2020_abstr.xml
Size: 3,497 bytes
Last Modified: 2025-10-06 13:42:47
<?xml version="1.0" standalone="yes"?> <Paper uid="N03-2020"> <Title>A Robust Retrieval Engine for Proximal and Structural Search</Title> <Section position="1" start_page="0" end_page="0" type="abstr"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> In text retrieval, including XML retrieval and Region Algebra, many researchers have pursued models for specifying what kinds of information should appear in specified structural and linear positions (Chinenyanga and Kushmerick, 2001; Wolff et al., 1999; Theobald and Weikum, 2000; Clarke et al., 1995). These models have attracted attention because they are considered basic frameworks for retrieving or extracting complex information such as events. However, unlike keyword-based IR, these models are not robust: they support only exact matching of queries, whereas we would like to know to what degree the contents in specified structural positions are relevant to those in the query even when the structure does not exactly match the query.</Paragraph> <Paragraph position="1"> This paper describes a new ranked retrieval model that enables proximal and structural search over structured texts. We extend the model proposed in Region Algebra to be robust by i) incorporating the notion of ranking used in keyword-based search, and ii) expanding queries. While ordinary ranked retrieval models compute relevance measures in terms of words, our model assumes that they are defined over more general structural fragments, i.e., extents (contiguous fragments of a text) as proposed in Region Algebra. We decompose queries into subqueries so that the system retrieves not only exactly matched extents but also partially matched ones. 
Our model is robust like keyword-based search, and also enables us to specify the structural and linear positions in texts as done by Region Algebra.</Paragraph> <Paragraph position="2"> The significance of this work lies not in the development of a new relevance measure, nor in showing the superiority of structure-based search over keyword-based search, but in the proposal of a framework for integrating proximal and structural ranking models. Since the model treats all types of structures in texts, not only ordinary text structures like &quot;title,&quot; &quot;abstract,&quot; &quot;authors,&quot; etc., but also semantic tags corresponding to recognized named entities or events can be used for indexing text fragments and can contribute to the relevance measure. Since extents are treated similarly to keywords in traditional models, our model can be integrated with any ranking and scalability techniques used by keyword-based models.</Paragraph> <Paragraph position="3"> We have implemented the ranking model in our retrieval engine and conducted preliminary experiments to evaluate it. Unfortunately, we used a rather small corpus for the experiments, mainly because there is no test collection of structured queries and tag-annotated texts. Instead, we used the GENIA corpus (Ohta et al., 2002) as structured text; it is an XML document annotated with semantic tags in the field of biomedical science. The experiments show that our model succeeded in retrieving relevant answers that an exact-matching model fails to retrieve because of its lack of robustness, and relevant answers that a non-structured model fails to retrieve because of its lack of structural specification.</Paragraph> </Section> </Paper>