<?xml version="1.0" standalone="yes"?>
<Paper uid="W00-1104">
  <Title>Semantic Indexing using WordNet Senses</Title>
  <Section position="2" start_page="0" end_page="35" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> The main problem with the traditional Boolean word-based approach to Information Retrieval (IR) is that it usually returns either too many results or the wrong results to be useful.</Paragraph>
    <Paragraph position="1"> Keywords often have multiple lexical functionalities (i.e., they can have various parts of speech) or several semantic senses. Also, relevant information can be missed when the exact keywords are not specified.</Paragraph>
    <Paragraph position="2"> The solution is to include more information in the documents to be indexed, so as to enable a system to retrieve documents based either on the words, regarded as lexical strings, or on the semantic meaning of the words.</Paragraph>
    <Paragraph position="3"> With this idea in mind, we designed an IR system which performs combined word-based and sense-based indexing and retrieval. The inputs to IR systems consist of a question/query and a set of documents from which the information has to be retrieved. We add lexical and semantic information to both the query and the documents during a preprocessing phase in which the input question and the texts are disambiguated. The disambiguation process relies on contextual information and identifies the meaning of the words based on WordNet (Fellbaum, 1998) senses.</Paragraph>
    <Paragraph position="4"> As described in the fourth section, we have opted for a disambiguation algorithm which is semi-complete (it disambiguates about 55% of the nouns and verbs) but highly precise (over 92% accuracy), instead of using a complete but less precise disambiguation. A part-of-speech tag is also appended to each word.</Paragraph>
    <Paragraph position="5"> After adding these lexical and semantic tags to the words, the documents are ready to be indexed: the index is created using the words as lexical strings (to ensure a word-based retrieval), and the semantic tags (for the sense-based retrieval).</Paragraph>
    <Paragraph position="6"> Once the index is created, an input query is answered using the document retrieval component of our system. First, the query is fully disambiguated; then, it is adapted to a specific format which incorporates semantic information, as found in the index, and uses the AND and OR operators implemented in the retrieval module.</Paragraph>
    <Paragraph position="7"> Hence, using semantic indexing, we try to solve the two main problems of the IR systems described earlier: (1) relevant information is no longer missed when the exact keywords are not specified, because with the new tags added to the words we also retrieve words which are semantically related to the input keywords; (2) using the sense-based component of our retrieval system, the number of results returned from a search can be reduced by specifying exactly the lexical functionality and/or the meaning of an input keyword.</Paragraph>
    <Paragraph position="8"> WordNet 1.6 is used in our system.</Paragraph>
    <Paragraph position="9"> The system was tested using the Cranfield standard test collection. This collection consists of 1400 SGML-formatted documents from the aerodynamics field. From the 225 questions associated with this data set, we randomly selected 50 questions and built three types of queries for each of them: (1) a query that uses only keywords selected from the question, stemmed using the WordNet stemmer; (2) a query that uses the keywords from the question and the synsets for these keywords; and (3) a query that uses the keywords from the question, the synsets for these keywords, and the synsets for the keywords' hypernyms. All these types of queries have been run against the semantic index described in this paper. Comparative results indicate the performance benefits of a retrieval system that uses combined word-based and synset-based indexing and retrieval over classic word-based indexing.</Paragraph>
  </Section>
</Paper>