<?xml version="1.0" standalone="yes"?>
<Paper uid="W00-1110">
  <Title>Automatic summarization of search engine hit lists</Title>
  <Section position="3" start_page="99" end_page="100" type="intro">
    <SectionTitle>
2 Search
</SectionTitle>
    <Paragraph position="0"> The search component of SNS is a personalized search engine called MySearch. MySearch utilizes a centralized relational database to store all the URL indexes and other related URL information. Spiders are used to fetch URLs from the Internet. After a URL is downloaded, the following steps are applied to index the URL:  along with its frequency and position information.</Paragraph>
    <Paragraph position="1"> The contents of URLs are indexed based on the locations of the keywords: Anchor, Title, and Body. This allows weighted retrieval based on different word positions. For example, a user can specify a weight of 5 for a keyword appearing in the title, 4 for the anchor, and 2 for the body. This information can be saved in the user's personal profile and used for later weighted ranking. Besides weighted search, MySearch also supports Boolean search and Vector Space search (Salton, 1989). For the vector space model, TF-IDF is used for ranking. We used a modified version of TF-IDF: log(tf+0.5)*log(N/df), where tf is the number of times a term appears in the content of a URL, N is the total number of documents in the text collection, and df is the number of unique URLs in which the term appears in the entire collection.</Paragraph>
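The two scoring ideas in this paragraph can be sketched roughly as follows. This is a minimal illustration, not the MySearch implementation: the function names, the use of the natural logarithm, the zero score for absent terms, and the choice to multiply each location's keyword count by its weight are all assumptions made for the example.

```python
import math

# Hypothetical defaults taken from the example in the text: title 5, anchor 4, body 2.
DEFAULT_WEIGHTS = {"title": 5, "anchor": 4, "body": 2}


def modified_tf_idf(tf, n_docs, df):
    """Return log(tf + 0.5) * log(N / df) for one term in one URL.

    Assumptions: natural logarithm; a term that is absent (tf == 0) or
    unseen in the collection (df == 0) contributes 0.
    """
    if tf == 0 or df == 0:
        return 0.0
    return math.log(tf + 0.5) * math.log(n_docs / df)


def weighted_position_score(counts, weights=DEFAULT_WEIGHTS):
    """Sum per-location keyword counts scaled by the user's location weights.

    Multiplying count by weight is an assumed way to combine the two.
    """
    return sum(weights.get(location, 0) * count for location, count in counts.items())


# One title hit and three body hits: 5*1 + 4*0 + 2*3
print(weighted_position_score({"title": 1, "anchor": 0, "body": 3}))  # 11
```

Keeping the weights in a plain dictionary mirrors the idea of a saved personal profile: a different user would simply pass a different mapping.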
    <Paragraph position="2"> A user can choose which search method to use, and can also combine Boolean search with Vector Space search.</Paragraph>
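One plausible way to combine the two methods is to use the Boolean query as a filter and the vector space score as the ranking. The text does not say how MySearch combines them, so the sketch below is only an assumption; the function names and the toy in-memory index are hypothetical.

```python
import math


def tf_idf(tf, n_docs, df):
    # Modified TF-IDF from the text; natural log and zero for absent terms are assumptions.
    if tf == 0 or df == 0:
        return 0.0
    return math.log(tf + 0.5) * math.log(n_docs / df)


def boolean_then_rank(docs, required_terms, query_terms):
    """Keep URLs containing every required term, then rank them by summed TF-IDF.

    docs maps a URL to a dict of term -> term frequency (a toy in-memory index).
    """
    n_docs = len(docs)
    # Document frequency: number of URLs in which each query term appears.
    df = {t: sum(1 for tfs in docs.values() if tfs.get(t, 0) > 0) for t in query_terms}
    results = []
    for url, tfs in docs.items():
        if all(tfs.get(t, 0) > 0 for t in required_terms):  # Boolean AND filter
            score = sum(tf_idf(tfs.get(t, 0), n_docs, df[t]) for t in query_terms)
            results.append((url, score))
    # Highest score first.
    return sorted(results, key=lambda pair: pair[1], reverse=True)
```

Called with a small index, `boolean_then_rank(docs, ["clinton"], ["clinton", "senate"])` would drop every URL that lacks "clinton" and order the rest by their combined score for both terms.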
    <Paragraph position="3"> These options are provided to give users more flexibility in controlling the retrieval results, because past research has indicated that different ranking functions perform differently (Salton, 1989).</Paragraph>
    <Paragraph position="4"> A sample search for &quot;Clinton&quot; using the TF-IDF Vector Space search is shown in Figure 3. The keyword &quot;Clinton&quot; is highlighted using a different color to help users get more contextual information. The retrieval status value is shown in a bold black font after the URL title.</Paragraph>
  </Section>
</Paper>