<?xml version="1.0" standalone="yes"?>
<Paper uid="E06-2029">
  <Title>Adaptivity in Question Answering with User Modelling and a Dialogue Interface</Title>
  <Section position="4" start_page="199" end_page="199" type="metho">
    <SectionTitle>
2 User model
</SectionTitle>
    <Paragraph position="0"> Depending on the application of interest, the UM can be designed to suit the information needs of the QA module in different ways. As our current application, YourQA2,isalearning-oriented, web-based system, our UM consists of the user's:  1) age range, a [?]{7 [?] 11,11 [?] 16,adult}; 2) reading level, r [?]{poor,medium,good}; 3) webpages of interest/bookmarks, w.</Paragraph>
    <Paragraph position="1">  Analogies can be found with the SeAn (Ardissono et al., 2001) and SiteIF (Magnini and Strapparava, 2001) news recommender systems where age and browsing history, respectively, are part of the UM. In this paper we focus on how to filter and adapt search results using the reading level parameter.</Paragraph>
  </Section>
  <Section position="5" start_page="199" end_page="199" type="metho">
    <SectionTitle>
3 Dialogue interface
</SectionTitle>
    <Paragraph position="0"> The dialogue component will interact with both the UM and the QA module. From a UM point of view, the dialogue history will store previous conversations useful to construct and update a model of the user's interests, goals and level of understanding. From a QA point of view, the main goal of the dialogue component is to provide users with a friendly interface to build their requests. A typical scenario would start this way:  -- System: Hi, how can I help you? -- User: I would like to know what books Roald Dahl wrote.  The query sentence &amp;quot;what books Roald Dahl wrote&amp;quot;, is thus extracted and handed to the QA module. In a second phase, the dialogue module is responsible for providing the answer to the user once the QA module has generated it. The dialogue manager consults the UM to decide on the most suitable formulation of the answer (e.g. short sentences) and produce the final answer accordingly, e.g.: --System: RoaldDahlwrotemanybooksforkidsandadults, including: &amp;quot;The Witches&amp;quot;, &amp;quot;Charlie and the Chocolate Factory&amp;quot;, and &amp;quot;James and the Giant Peach&amp;quot;.</Paragraph>
  </Section>
  <Section position="6" start_page="199" end_page="200" type="metho">
    <SectionTitle>
4 Question Answering Module
</SectionTitle>
    <Paragraph position="0"> The flow between the three QA phases - question processing, document retrieval and answer generation - is described below (see Fig. 2).</Paragraph>
    <Section position="1" start_page="199" end_page="199" type="sub_section">
      <SectionTitle>
4.1 Question processing
</SectionTitle>
      <Paragraph position="0"> We perform query expansion, which consists in creating additional queries using question word synonyms in the purpose of increasing the recall of the search engine. Synonyms are obtained via</Paragraph>
    </Section>
    <Section position="2" start_page="199" end_page="200" type="sub_section">
      <SectionTitle>
4.2 Retrieval
</SectionTitle>
      <Paragraph position="0"> Document retrieval We retrieve the top 20 documents returned by Google4 for each query produced via query expansion. These are processed in the following steps, which progressively narrow the part of the text containing relevant information. null Keyphrase extraction Once the documents are retrieved, we perform keyphrase extraction to determine their three most relevant topics using Kea (Witten et al., 1999), an extractor based on Naive Bayes classification.</Paragraph>
      <Paragraph position="1"> Estimation of reading levels To adapt the readability of the results to the user, we estimate the reading difficulty of the retrieved documents using the Smoothed Unigram Model (Collins-Thompson and Callan, 2004), which proceeds in  two phases. 1) In the training phase, sets of representativedocumentsarecollectedforagivennum- null ber of reading levels. Then, a unigram language model is created for each set, i.e. a list of (word stem, probability) entries for the words appearing in its documents. Our models account for the following reading levels: poor (suitable for ages 711), medium (ages 11-16) and good (adults). 2) In the test phase, given an unclassified document D, its estimated reading level is the model lmi maximizing the likelihood that D [?] lmi5.</Paragraph>
      <Paragraph position="2"> Clustering We use the extracted topics and estimated reading levels as features to apply hierarchical clustering on the documents. We use the WEKA (Witten and Frank, 2000) implementation of the Cobweb algorithm. This produces a tree where each leaf corresponds to one document, and sibling leaves denote documents with similar topics and reading difficulty.</Paragraph>
    </Section>
    <Section position="3" start_page="200" end_page="200" type="sub_section">
      <SectionTitle>
4.3 Answer extraction
</SectionTitle>
      <Paragraph position="0"> In this phase, the clustered documents are filtered based on the user model and answer sentences are located and formatted for presentation.</Paragraph>
      <Paragraph position="1"> UM-based filtering The documents in the cluster tree are filtered according to their reading difficulty: only those compatible with the UM's reading level are retained for further analysis6.</Paragraph>
      <Paragraph position="2"> Semantic similarity Within each of the retained documents, we seek the sentences which are semantically most relevant to the query by applying the metric in (Alfonseca et al., 2001): we represent each document sentence p and the query q as word sets P = {pw1,...,pwm} and Q = {qw1,...,qwn}. The distance from p to q is then distq(p) = summationtext1[?]i[?]m minj[d(pwi,qwj)], where d(pwi,qwj) is the word-level distance between pwi and qwj based on (Jiang and Conrath, 1997).</Paragraph>
      <Paragraph position="3"> Ranking Given the query q, we thus locate</Paragraph>
      <Paragraph position="5"> comes the document score. Moreover, each clus-</Paragraph>
      <Paragraph position="7"> word in the document, C(w,d) is the number of occurrences of w in D and P(w|lmi) is the probability with which w occurs in lmi 6However, if their number does not exceed a given threshold, we accept in our candidate set part of the documents havingthenextlowestreadability-oramediumreadabilityifthe null user's reading level is low ter is assigned a score consisting in the maximal score of the documents composing it. This allows to rank not only documents, but also clusters, and  presentresultsgroupedbyclusterindecreasingorder of document score. Answer presentation We present our answers in an HTML page, where results are listed following the ranking described above. Each result consists of the title and clickable URL of the originating document, and the passage where the sentence which best answers the query is located and highlighted. Question keywords and potentially useful information such as named entities are in colour.</Paragraph>
    </Section>
  </Section>
  <Section position="7" start_page="200" end_page="200" type="metho">
    <SectionTitle>
5 Sample result
</SectionTitle>
    <Paragraph position="0"> We have been running our system on a range of queries, including factoid/simple, complex and controversial ones. As an example of the latter, we report the query &amp;quot;Who wrote the Iliad?&amp;quot;, which is a subject of debate. These are some top results: -- UMgood: &amp;quot;Most Classicists would agree that, whether there was ever such a composer as &amp;quot;Homer&amp;quot; or not, the Homeric poems are the product of an oral tradition [...] Could the Iliad and Odyssey have been oral-formulaic poems, composed on the spot by the poet using a collection of memorized traditional verses and phases?&amp;quot; -- UMmed: &amp;quot;No reliable ancient evidence for Homer [...] General ancient assumption that same poet wrote Iliad and Odyssey (and possibly other poems) questioned by many modern scholars: differences explained biographically in ancient world (e g wrote Od. in old age); but similarities could be due to imitation.&amp;quot; -- UMpoor: &amp;quot;Homer wrote The Iliad and The Odyssey (at least, supposedly a blind bard named &amp;quot;Homer&amp;quot; did).&amp;quot; In the three results, the problem of attribution of the Iliad is made clearly visible: document passages provide a context which helps to explain the controversy at different levels of difficulty.</Paragraph>
  </Section>
class="xml-element"></Paper>