<?xml version="1.0" standalone="yes"?>
<Paper uid="W05-0205">
  <Title>Towards Intelligent Search Assistance for Inquiry-Based Learning</Title>
  <Section position="4" start_page="25" end_page="26" type="metho">
    <SectionTitle>
3 Method
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="25" end_page="25" type="sub_section">
      <SectionTitle>
3.1 Utilizing Learning Context
</SectionTitle>
      <Paragraph position="0"> OLISA acquires search context by parsing OIBL logs and by monitoring search history. For example, in the planning phase of a learning task, IdeaKeeper asks students to input DQ, Sub-Questions (SQs), potential keywords, and to answer some questions such as &amp;quot;what do I know&amp;quot;, &amp;quot;what do I want to know&amp;quot;, etc.</Paragraph>
      <Paragraph position="1"> The context information is represented as bag-of-words feature vectors. To calculate the vectors, we first remove common terms. We compiled a corpus of 30 million words from 6700 full-length documents collected from diverse sources. Word frequencies were calculated for the 168K unique words in the corpus; a word is considered common if it is among the 1000 most frequent words. The remaining words are stemmed using Porter's algorithm (Porter, 1980).</Paragraph>
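The preprocessing step described above can be sketched as follows. This is a minimal stand-in, not the authors' implementation: the corpus is whatever text collection is at hand, and `crude_stem` is a deliberately rough suffix stripper standing in for Porter's algorithm.

```python
from collections import Counter
import re

def build_common_words(corpus_texts, top_k=1000):
    """Rank corpus words by frequency; the top_k most frequent are 'common'."""
    freq = Counter()
    for text in corpus_texts:
        freq.update(re.findall(r"[a-z]+", text.lower()))
    return {w for w, _ in freq.most_common(top_k)}

def crude_stem(word):
    """Very rough suffix stripping; a stand-in for Porter's algorithm."""
    for suffix in ("ing", "edly", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def preprocess(text, common_words):
    """Tokenize, drop common terms, and stem the remainder."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [crude_stem(t) for t in tokens if t not in common_words]
```

In practice one would substitute a real Porter stemmer and the authors' 6700-document corpus statistics.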
      <Paragraph position="2"> All contextual information is combined to form a main feature vector C = (w_1, w_2, ..., w_n), where w_i is the weight of the ith term in the combined context. It is defined as the product of term frequency (tf) and inverse document frequency (idf).</Paragraph>
      <Paragraph position="4"> Compared with the traditional tf measure, we do not assign a uniform weight to all words in the context.</Paragraph>
      <Paragraph position="5"> Rather, we consider the DQ/SQ and the current query more important than the rest of the context, and define their tf differently from that of other context terms.</Paragraph>
      <Paragraph position="7"> The tf for SQ terms is calculated similarly. For the term frequency of the current query, we assign a larger weight, as it represents the current information needs. The idf is defined as idf_i = log(N / n_i),</Paragraph>
      <Paragraph position="9"> where N is the total number of documents in the corpus, and n_i is the number of documents containing the ith term. The term weight is then defined by w_i = tf_i x idf_i.</Paragraph>
      <Paragraph position="13"> These context feature vectors are calculated for later use in re-ranking search results.</Paragraph>
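The context-weighting scheme can be sketched as follows. The exact boosting formula for DQ/SQ and current-query terms is not recoverable from the text; the multiplicative `boost` factor below is an assumed stand-in.

```python
import math
from collections import Counter

def context_weights(context_terms, boosted_terms, corpus_doc_freq, n_docs, boost=3.0):
    """tf-idf weights for the combined context.

    boosted_terms: DQ/SQ and current-query terms, which receive a larger tf
    (the boost value is an assumption, not the paper's formula).
    corpus_doc_freq: term -> number of corpus documents containing it (n_i).
    """
    tf = Counter(context_terms)
    weights = {}
    for term, freq in tf.items():
        if term in boosted_terms:
            freq *= boost
        n_t = corpus_doc_freq.get(term, 1)      # documents containing the term
        idf = math.log(n_docs / n_t)            # idf_i = log(N / n_i)
        weights[term] = freq * idf              # w_i = tf_i * idf_i
    return weights
```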
      <Paragraph position="14"> Meanwhile, we use Brill's tagger (Brill, 1995) to determine the parts of speech (POS) of words in the DQ/SQ. Heuristic rules (Zhang and Xuan, 2005) based on POS are used to extract noun phrases.</Paragraph>
      <Paragraph position="15"> Noun phrases containing words with high term weight are considered keyphrases. The keyphrase weight is defined by:</Paragraph>
      <Paragraph position="17"/>
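The keyphrase-weight equation itself did not survive extraction; the sketch below uses an assumed aggregation (the mean tf-idf weight of a phrase's member terms) purely for illustration, together with the high-weight filter the text does describe.

```python
def keyphrase_weight(phrase_terms, term_weights):
    """Assumed aggregation: mean tf-idf weight of the phrase's member terms.
    (The paper's actual keyphrase-weight formula is not reproduced here.)"""
    ws = [term_weights.get(t, 0.0) for t in phrase_terms]
    return sum(ws) / len(ws) if ws else 0.0

def extract_keyphrases(noun_phrases, term_weights, threshold=1.0):
    """Keep noun phrases that contain at least one high-weight term."""
    return [p for p in noun_phrases
            if any(term_weights.get(t, 0.0) >= threshold for t in p)]
```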
    </Section>
    <Section position="2" start_page="25" end_page="26" type="sub_section">
      <SectionTitle>
3.2 Term Suggestion
</SectionTitle>
      <Paragraph position="0"> When a user commits a query, OLISA first searches it on the selected search engines (Google by default). If the total hit count exceeds a certain threshold (2 million by default), we consider the query potentially too general. In addition to running the original query, we call the term suggestion component to narrow down the search concept by expanding the query.</Paragraph>
      <Paragraph position="1"> WordNet (Fellbaum, 1998) is used during the expansion. Below is an outline of our heuristic algorithm for generating term suggestions.</Paragraph>
      <Paragraph position="2"> for each keyword in original query do
    if the keyword is part of a keyphrase then
        form queries by merging each phrase with the original query
        if multiple keyphrases are involved then
            select up to #maxPhrase keyphrases with highest weights
if #queries &gt; 0 then return queries
for each keyword that has hyponyms in WordNet do
    if some hyponym occurs at least once in learning context then
        form queries by merging the hyponym with the original query
    else
        form suggestions by merging the hyponym with the original query
if #queries &gt; 0 or #suggestions &gt; 0 then return queries and suggestions
for each keyword in original query that has synonyms in WordNet do
    if some synonym is part of a keyphrase then
        form suggestions by merging keywords in phrase with original query
        if multiple keyphrases are involved then
            select up to #maxPhrase keyphrases with highest weights
return suggestions

On the other hand, if the total hit count is below a certain threshold, the query is potentially too specific, and the term suggestion component is called to generalize the query. The procedure is similar to the algorithm above but runs in the reverse direction: for example, keywords replace phrases and hypernyms replace hyponyms.</Paragraph>
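The narrowing pass of the heuristic can be sketched as below. Names and signature are assumptions; `hyponyms` stands in for a WordNet lookup (e.g. via nltk), and keyphrases are assumed pre-sorted by weight.

```python
def suggest_narrower(query, ranked_keyphrases, hyponyms, context_terms, max_phrase=2):
    """Sketch of the narrowing pass (names and signature are assumptions).

    ranked_keyphrases: keyphrases sorted by weight, highest first.
    hyponyms: keyword -> list of hyponyms (WordNet would supply these).
    context_terms: words appearing in the learning context.
    """
    keywords = query.split()
    # 1. Prefer keyphrases containing a query keyword (up to #maxPhrase).
    matched = [p for p in ranked_keyphrases
               if any(kw in p.split() for kw in keywords)][:max_phrase]
    if matched:
        return [f"{query} {p}" for p in matched], []
    # 2. Fall back to hyponyms: auto-query when the hyponym occurs in the
    #    learning context, otherwise offer it only as a suggestion.
    queries, suggestions = [], []
    for kw in keywords:
        for hyp in hyponyms.get(kw, []):
            bucket = queries if any(t in context_terms for t in hyp.split()) else suggestions
            bucket.append(f"{query} {hyp}")
    return queries, suggestions
```

The generalizing pass would mirror this, replacing phrases with keywords and hyponyms with hypernyms.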
      <Paragraph position="3"> Since there are cases where learners desire specific search terms, both the original and expanded queries are submitted, and results for the former are presented at the top of the returned list.</Paragraph>
      <Paragraph position="4"> If no new queries are constructed, OLISA will return the results from the original query along with the suggestions. Otherwise, OLISA will send requests for each expanded query to the selected search engines. Since by default we return up to R search engine results to the user, we extract the top R/(#newQuery+1) entries from the results of each new query and of the original query. These results are re-ranked by an algorithm described later. The combined results are then presented to the user in IdeaKeeper along with a list of expanded queries and suggestions.</Paragraph>
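The per-query quota and merge step can be sketched as follows (a minimal sketch; the function name and result representation are assumptions):

```python
def merge_results(results_by_query, original_query, r_total=10):
    """Take R/(#newQuery+1) entries from each query's result list; the
    original query's share leads the combined list (before re-ranking)."""
    per_query = r_total // len(results_by_query)   # R / (#newQuery + 1)
    merged = list(results_by_query[original_query][:per_query])
    for query, results in results_by_query.items():
        if query != original_query:
            merged.extend(results[:per_query])
    return merged
```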
    </Section>
    <Section position="3" start_page="26" end_page="26" type="sub_section">
      <SectionTitle>
3.3 Query Reformulation
</SectionTitle>
      <Paragraph position="0"> From our observation, in OIBLE students often submit questions in natural language. However, most of the time such queries do not return desirable results. Therefore, we loosely follow Kwok (2001) to reformulate queries. We apply the Link Grammar Parser (Sleator and Temperley, 1993) to parse sentence structure. For example, one student asked &amp;quot;What is fat good for&amp;quot;. The parser generates the following linkage:</Paragraph>
      <Paragraph position="2"> LEFT-WALL what is.v fat.n good.a for.p ? where &amp;quot;SI&amp;quot; indicates subject-verb inversion. From this linkage, we are able to reformulate the query as &amp;quot;fat is good for&amp;quot;. Meanwhile, regular expressions are used to eliminate interrogative words, e.g. &amp;quot;what&amp;quot; and &amp;quot;where&amp;quot;.</Paragraph>
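A regex-based reformulation in this spirit can be sketched as below. The pattern list is an illustrative assumption, not the paper's actual rules, and it handles only a one-word subject after the inverted auxiliary.

```python
import re

# Assumed interrogative pattern; the paper's actual regular expressions
# are not reproduced in this extraction.
INTERROGATIVE = re.compile(
    r"^(what|where|when|who|why|how)\s+(is|are|was|were|do|does|did)\s+", re.I)

def reformulate(question):
    """Drop the interrogative lead and undo subject-verb inversion:
    'What is fat good for?' -> 'fat is good for'."""
    q = question.strip().rstrip("?").strip()
    m = INTERROGATIVE.match(q)
    if not m:
        return q
    verb = m.group(2)
    rest = q[m.end():].split()
    if not rest:
        return q
    # Re-insert the auxiliary verb after the (assumed one-word) subject.
    return " ".join([rest[0], verb] + rest[1:])
```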
      <Paragraph position="3"> Search engines may return very different results for the original query and the reformulated queries.</Paragraph>
      <Paragraph position="4"> For the example above, Google returned 620 hits for the reformulated query, but only 2 hits for the quoted original question.</Paragraph>
      <Paragraph position="5"> By sending requests in both the original and reformulated forms, we can significantly improve recall without losing much precision.</Paragraph>
    </Section>
    <Section position="4" start_page="26" end_page="26" type="sub_section">
      <SectionTitle>
3.4 Integrating Multiple Search Engines
</SectionTitle>
      <Paragraph position="0"> We enhanced the searching component of IdeaKeeper by integrating multiple search engines (e.g. Google, AskJeeves, NSDL). IdeaKeeper parses and transforms search results and presents users with a uniform format for results from different search engines. A spelling-check function for search keywords is built into OLISA; it combines spelling-check results from Google with suggestions from our own program based on a local frequency-based dictionary.</Paragraph>
    </Section>
    <Section position="5" start_page="26" end_page="26" type="sub_section">
      <SectionTitle>
3.5 Search Results Re-Ranking
</SectionTitle>
      <Paragraph position="0"> After query reformulation, OLISA sends requests to the selected search engines. For performance reasons, we retrieve only a total of 100 snippets (R_Q snippets from each query) from the web search engines. A feature vector is calculated for each snippet in a manner similar to (5), except that tf is the actual frequency without the additional weighting. The similarity between the learning context C and each document D (i.e., snippet) is then calculated over their feature vectors. The higher the similarity score, the more relevant the snippet is to the user's query as well as to the overall learning context.</Paragraph>
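The exact similarity measure did not survive extraction; cosine similarity over the sparse term-weight vectors is the conventional choice and is assumed in the sketch below.

```python
import math

def similarity(context_vec, doc_vec):
    """Cosine similarity between sparse term-weight vectors (dicts).
    The paper's exact similarity measure is not shown; cosine is assumed."""
    dot = sum(w * doc_vec.get(t, 0.0) for t, w in context_vec.items())
    norm_c = math.sqrt(sum(w * w for w in context_vec.values()))
    norm_d = math.sqrt(sum(w * w for w in doc_vec.values()))
    if norm_c == 0.0 or norm_d == 0.0:
        return 0.0
    return dot / (norm_c * norm_d)
```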
      <Paragraph position="1"> OLISA re-ranks snippets by similarity scores.</Paragraph>
      <Paragraph position="2"> To avoid confusing learners, the snippets from the original query and from the expanded queries are re-ranked independently. The R_Q re-ranked results from the original query appear at the top by default, followed by the other re-ranked results with signs indicating their corresponding queries. The expanded queries and further search term suggestions are shown in a dropdown list in IdeaKeeper.</Paragraph>
    </Section>
  </Section>
</Paper>