File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/abstr/00/w00-1102_abstr.xml
Size: 5,810 bytes
Last Modified: 2025-10-06 13:41:53
<?xml version="1.0" standalone="yes"?> <Paper uid="W00-1102"> <Title>Exploiting Lexical Expansions and Boolean Compositions for Web Querying</Title> <Section position="1" start_page="0" end_page="13" type="abstr"> <SectionTitle> Abstract </SectionTitle> <Paragraph position="0"> This paper describes an experiment aimed at evaluating the role of NLP-based optimizations (i.e. morphological derivation and synonymy expansion) in web search strategies. Keywords and their expansions are composed in two different Boolean expressions (i.e. expansion insertion and Cartesian combination) and then compared with a conjunctive composition of the keywords, considered as the baseline.</Paragraph> <Paragraph position="1"> Results confirm the hypothesis that linguistic optimizations significantly improve search engine performance.</Paragraph> <Paragraph position="2"> Introduction The purpose of this work was to verify whether, and to what extent, linguistic optimizations of the input query can improve the performance of an existing search engine on the web. (Footnote 1: The results reported in this paper are part of a more extended project under development at ITC-irst, which involves a collaboration with Kataweb, an Italian web portal. We thank both Kataweb and Inktomi Corporation for kindly having placed the search engine for the experiments at our disposal.)</Paragraph> <Paragraph position="3"> First of all we tried to determine a proper baseline against which to compare the optimized search strategies. Such a baseline should reflect as closely as possible the average use of the search engine by typical users when querying the web. A query is usually composed of a limited number of keywords (i.e. two or three), in a lemmatized form, that the search engine composes by default in a conjunctive expression.
Starting from this level (we call it the &quot;basic level&quot;) we have designed two more sophisticated search strategies that introduce a number of linguistic optimizations over the keywords and adopt two composition modalities allowed by the &quot;advanced search&quot; capabilities of the search engine. One modality (i.e. Keyword expansion Insertion Search - KIS) first expands each keyword of the basic level with morphological derivations and synonyms, then builds a Boolean expression where each expansion is added to the base keyword list. The second modality (i.e. Keyword Cartesian expansion Search - KCS) adopts the same expansions as the previous one, but composes a Boolean expression where all the possible tuples among the base keywords and expansions are considered.</Paragraph> <Paragraph position="4"> The working hypothesis is that the introduction of lexical expansions should bring an improvement in the retrieval of relevant documents. To verify the hypothesis, a comparative evaluation has been carried out using the three search modalities described above over a set of factual questions. The results of the queries have been manually scored along a five-value scale, with the aim of taking into account not only the presence in the document of the answer to the question, but also the degree of contextual information provided by the document itself with respect to the question. Both the presence of the answer and the contextual information have been estimated by two relevance functions, one that considers the document position, the other that does not.</Paragraph> <Paragraph position="5"> The experiment results confirm that the introduction of a limited number of lexical expansions (i.e. 2-3) improves the engine performance.
In addition, the Cartesian composition of the expansions behaves significantly better than the search modality based on keyword insertion.</Paragraph> <Paragraph position="6"> Some of the problems that we faced in this work have already been discussed in the literature. The use of query expansions for text retrieval is a debated topic. Voorhees (1998) argues that WordNet-derived query expansions are effective for very short queries, while they do not bring any improvement for long queries. From a number of experiments, Mandala et al. (1998) conclude that WordNet query expansions can increase recall but degrade precision. Three reasons are suggested to explain this behavior: (i) the lack of relations among terms of different parts of speech in WordNet; (ii) many semantic relations are not present in WordNet; (iii) proper names are not included in WordNet. Gonzalo et al. (1998) pointed out some more weaknesses of WordNet for Information Retrieval purposes, in particular the lack of domain information and the fact that sense distinctions are excessively fine-grained for the task. A topic related to query expansion is query translation, which is performed in Cross-Language Information Retrieval (Verdejo et al., 2000).</Paragraph> <Paragraph position="7"> This work brings additional elements in favor of the thesis that using linguistic expansions can improve IR in a web search scenario. In addition we argue that, to be effective, query expansion has to be combined with proper search modalities. The evaluation experiment we carried out, even within the limitations due to time and budget constraints, was designed to take into account the indications that came out of the recent TREC workshop on Question Answering (Voorhees, 2000).</Paragraph> <Paragraph position="8"> The paper is structured as follows. Sections 1 and 2 respectively present the modalities for the linguistic expansion and for the query composition.
Section 3 reports the experimental setting for the comparative evaluation of the three search modalities. Section 4 describes and discusses the results obtained, while in the conclusions we propose some directions for future work.</Paragraph> </Section> </Paper>