<?xml version="1.0" standalone="yes"?>
<Paper uid="N04-4013">
  <Title>Web Search Intent Induction via Automatic Query Reformulation</Title>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
2 Prior Work
</SectionTitle>
    <Paragraph position="0"> Our interest is in informational queries. The general approach we explore to assist users nd what they want is to present structured results. Dumais et. al. (2001) have shown that displaying structured results improves a user's ability to nd relevant documents quickly.</Paragraph>
    <Paragraph position="1"> There are three general techniques for presenting web search results in a structured manner, ranging from totally supervised methods to totally unsupervised methods. The rst approach, manual classi cation, is typied by a system like Yahoo!, where humans have created a hierarchical structure describing the web and manually classify web pages into this hierarchy. The second approach, automatic classi cation (see, for instance, the classi cation system reported by Dumais (2000)) builds on the hierarchies constructed for manual classi cation systems, but web pages are categorized by a (machinelearned) text classi cation system. The third approach, typi ed by systems such as Vivisimo and the system of Zamir et al. (1999), look at the text of the returned documents and perform document clustering.</Paragraph>
    <Paragraph position="2"> A related unsupervised approach to this problem is from Beeferman and Berger (2000). Their approach leverages click-through data to cluster related queries.</Paragraph>
    <Paragraph position="3"> The intuition behind their method is that if two different queries lead to users clicking on the same URL, then these queries are related (and vice-versa). They perform agglomerative clustering to group queries, based on click-through data.</Paragraph>
    <Paragraph position="4"> Our approach is most closely related to this agglomerative clustering approach, but does not require click-through data. Moreover, the use of click-through data can result in query clusters with low user utility (see Section 3.2). Furthermore, our approach does not suffer from the computation cost of document clustering by text and produces structured results with meaningful names without the economic cost of building hierarchies.</Paragraph>
  </Section>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3 Methodology
</SectionTitle>
    <Paragraph position="0"> Our goal is to provide a range of possible needs to a user whose query is underspeci ed. Suppose a naive user John enters a query for y shing. This query will retrieve a large set of documents. We assume that John's search need (information about ies for catching trout) is somewhere in or near this set, but we do not know exactly where. However, we can attempt to identify other queries, made by other people, that are relevant to John's need. We refer to this process as Query Driven Search Expansion and henceforth refer to our system as the QDSE system.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.1 Formal Speci cation
</SectionTitle>
      <Paragraph position="0"> Formally, if Q is the set of queries to our search engine and D is the set of indexed documents, let R be a binary relation on Q D where qRd if and only if d is in the return set for the query q. It is likely that the set of related queries is quite large for a given q (in practice the size is on the order of ten thousand; for our dataset, y shing has 29; 698 related queries). However, some of these queries will be only tangentially related to q. Moreover, some of them will be very similar to each other. In order to measure these similarities, we de ne a distance metric between two queries q and q0 based on their returned document sets, ignoring the text of the query:</Paragraph>
      <Paragraph position="2"> One could then sort the set of related queries according to kq; q0k and present the top few to the user. Unfortunately, this is insuf cient: the top few are often too similar to each other to provide any new useful information.</Paragraph>
      <Paragraph position="3"> To get around this problem, we use the maximal marginal relevance (MMR) scheme originally introduced by Carbonell et. al. (1998). In doing so, we order alternative quereies according to: argmin</Paragraph>
      <Paragraph position="5"> (2) where q0s are drawn from unreturned query expansions and q00s are drawn from the previously returned set.1</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.2 Alternative Distance Metrics
</SectionTitle>
      <Paragraph position="0"> One particular thing to note in Equation 1 is that we do not take relative rankings into account in calculating distance. One could de ne a distance metric weighted by each document's position in the return list.</Paragraph>
      <Paragraph position="1"> We ran experiments using PageRank to weight the distance (calculated based on a recent full web crawl). System output was observed to be signi cantly inferior to the standard ranking. We attribute this degradation to the following: if two queries agree only on their top documents, they are too similar to be worth presenting to the user as alternatives. This is the same weakness as is found in the Beeferman and Berger (2000) approach.</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="0" end_page="0" type="metho">
    <SectionTitle>
4 System
</SectionTitle>
    <Paragraph position="0"> The system described above functions in a completely automatic fashion and responds in real-time to users queries. Across the top of the return results, the query 1Queries that appear to be URLs, and strings with a very small edit distance to the original are discarded.</Paragraph>
    <Paragraph position="1"> is listed, as are the top ranked alternative queries. Each of these query suggestions is a link to a heading, which are shown below. Below this list are the top ve search result links from MSN Search under the original query2.</Paragraph>
    <Paragraph position="2"> After the top ve results from MSN Search, we display each header with a +/- toggle to expand or collapse it.</Paragraph>
    <Paragraph position="3"> Under each expanded query we list its top 4 results.</Paragraph>
  </Section>
  <Section position="6" start_page="0" end_page="0" type="metho">
    <SectionTitle>
5 Evaluation Setup
</SectionTitle>
    <Paragraph position="0"> Evaluating the results of search engine algorithms without embedding these algorithms in an on-line system is a challenge. We evaluate our system against a standard web search algorithm (in our case, MSN Search). Ideally, since our system is focused on informational queries, we would like a corpus of hquery; intenti pairs, where the query is underspeci ed. One approach would be to create this corpus ourselves. However, doing so would bias the results. An alternative would be to use query logs; unfortunately, these do not include intents. In the next section, we explain how we create such pairs.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
5.1 Deriving Query/Intent Pairs
</SectionTitle>
      <Paragraph position="0"> We have a small collection of click-through data, based on experiments run at Microsoft Research over the past year. Given this data, for a particular user and query, we look for the last URL they clicked on and viewed for at least two minutes3. We consider all of these documents to be satisfactory solutions for the user's search need. We further discard pairs that were in the top ve because we intend to use these pairs to evaluate our system against vanilla MSN Search. Since the rst ve results our system returns are identical to the rst ve results MSN Search returns, it is not worthwhile annotating these data-points (this resulted in a removal of about 20% of the data, most of which were navigational queries).</Paragraph>
      <Paragraph position="1"> These hquery; URLi pairs give us a hint at how to get to the desired hquery; intenti pairs. For each hquery; URLi pair, we looked at the query itself and the web page at the URL. Given the query, the relevant URL and the top ve MSN Search results, we attempted to create a reasonable search intent that was (a) consistent with the query and the URL, but (b) not satis ed by any of the top ve results. There were a handful of cases (approximately an additional 5%) where we could not think of a reasonable intent for which (b) held in these cases, we discarded that pair.4 In all, we created 52 such pairs; four randomly  to be relevant. This does not concern us, as we do not actually use these URLs for evaluation purposes we simply use them to gain insight into intents.</Paragraph>
      <Paragraph position="2"> 4We make no claim that the intents we derive were necessarily the original intent in the mind of the user. We only go through this process to get a sense of the sorts of information chosen hquery; URL; intenti triples are shown in Table 1.</Paragraph>
      <Paragraph position="3"> Once the intents have been derived, the original URLs are thrown away: they are not used in any of our experiments.</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
5.2 Relevance Annotation
</SectionTitle>
      <Paragraph position="0"> Our evaluation now consists of giving human annotators hquery; intenti pairs and having them mark the rst relevant URL in the return set (if there is one). However, in order to draw an unbiased comparison between our system and vanilla MSN Search, we need to present the output from both as a simple ordered list. This requires rst converting our system's output to a list.</Paragraph>
      <Paragraph position="1">  We wish to linearize our results in such a way that the position of the rst relevant URL enables us to draw meaningful inferences. In vanilla MSN search, we can ascribe a cost of 1 to reading each URL in the list: having a relevant URL as the 8th position results in a cost of 8. Similarly, we wish to ascribe a cost to each item in our results. We do this by making the assumption that the user is able to guess (with 100% accuracy) which sub-category a relevant URL will be in (we will evaluate this assumption later). Given this assumption, we say that the cost of a link in the top 5 vanilla MSN links is simply its position on the page. Further down, we assume there is a cost for reading each of the MSN links, as well as a cost for reading each header until you get to the one you want.</Paragraph>
      <Paragraph position="2"> Finally, there is a cost for reading down the list of links under that header. Given this cost model, we can linearize our results by simply sorting them by cost (in this model, several links will have the same cost in this case, we fall back to the original ordering).</Paragraph>
      <Paragraph position="3">  We divided the 52 hquery; intenti pairs into two sets of 32 (12 common pairs). Each set of 32 was then scrambled and half were assigned to class System 1 and half were assigned to class System 2. It was ensured that the 12 overlapping pairs were evenly distributed.</Paragraph>
      <Paragraph position="4"> Four annotators were selected. The rst two were presented with the rst 32 pairs and the second two were presented with the second 32 pairs, but with the systems swapped.5 Annotators were given a query, the intent, and the top 100 documents returned from the search according to the corresponding system (in the case of QDSE, enough alternate queries were selected so that there would be exactly 100 total documents listed). The annotator selected the rst link which answered the intent. If there was no relevant link, they recorded that. people really are looking for, so that we need not invent queries off the tops of our heads.</Paragraph>
      <Paragraph position="5"> 5The interface used for evaluation converted the QDSE results into a linear list using our linearization technique so that the interface was consistent for both systems.</Paragraph>
    </Section>
    <Section position="3" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
5.3 Predictivity Annotation
</SectionTitle>
      <Paragraph position="0"> Our cost function for the linearization of the hierarchical results (see Section 5.2.1) assumes that users are able to predict which category will contain a relevant link. In order to evaluate this assumption, we took our 52 queries and the automatically generated category names for each using the QDSE system. We then presented four new annotators with the queries, intents and categories. They selected the rst category which they thought would contain a relevant link. They also were able to select a None category if they did not think any would contain relevant links. Each of the four annotators performed exactly the same annotation it was done four times so agreement could be calculated.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>