<?xml version="1.0" standalone="yes"?>
<Paper uid="W04-2506">
<Title>A Novel Approach to Focus Identification in Question/Answering Systems</Title>
<Section position="2" start_page="0" end_page="0" type="intro">
<SectionTitle> 1 Introduction </SectionTitle>
<Paragraph position="0"> One method of retrieving information from vast document collections is textual Question/Answering (Q/A).</Paragraph>
<Paragraph position="1"> Q/A is an Information Retrieval (IR) paradigm that returns a short list of answers, extracted from relevant documents, to a question formulated in natural language. A different method of finding the desired information is to navigate the subject categories assigned hierarchically to groups of documents, in a style made popular by Yahoo.com among others. Once the desired category is reached, its documents are inspected and the information is eventually retrieved.</Paragraph>
<Paragraph position="2"> Q/A systems incorporate a paragraph retrieval engine to find paragraphs that contain candidate answers, as reported in (Clark et al., 1999; Pasca and Harabagiu, 2001). To our knowledge, no information about the text categories of these paragraphs is currently employed in any Q/A system. Instead, other semantic information, such as the semantic classes of the expected answers derived from question processing, is used to retrieve paragraphs and later to extract answers. Typically, the semantic classes of answers are organized in hierarchical ontologies and do not relate in any way to the categories associated with documents.</Paragraph>
<Paragraph position="3"> The ontology of expected answer classes contains concepts like PERSON, LOCATION or PRODUCT, whereas the categories associated with documents are more similar to topics than to concepts, e.g., acquisitions, trading or earnings. Given that text categories convey semantic information different from the classes of the expected answers, we argue in this paper that text categories can be used to improve the quality of textual Q/A. In fact, by assigning text categories to both questions and answers, we obtain additional information about their similarity, which allows systems to perform a first level of word disambiguation. For example, if a user asks about the characteristics of Apple, two types of answers may be retrieved: (a) answers about the Apple company and (b) answers related to the agricultural domain. If, instead, the computer subject is selected, only the answers involving the Apple company are considered. Thus, topic categories allow Q/A systems to detect the correct focus and consequently filter out many incorrect answers.</Paragraph>
<Paragraph position="4"> In order to assign categories to questions and answers, the set of documents on which the Q/A system operates has to be pre-categorized. For our experiments, we trained our basic Q/A system on the well-known text categorization benchmark Reuters-21578. This allows us to take, as the categories of an answer, the categories of the documents that contain that answer. Assigning categories to questions is more difficult because: (a) questions are not known in advance and (b) their small size (in terms of number of words) often prevents the detection of their categories.</Paragraph>
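As an illustration of the category-based filtering described above, the following minimal Python sketch keeps only the candidate answers whose source-document categories overlap with the categories assigned to the question. It is a sketch under stated assumptions, not the implementation used in this paper; the names (Answer, filter_by_category) and the example categories are illustrative.

# Minimal sketch of category-based answer filtering: candidate answers inherit
# the categories of the documents that contain them, and are kept only if those
# categories overlap with the categories assigned to the question.
# All names and categories are illustrative, not the authors' implementation.
from dataclasses import dataclass, field

@dataclass
class Answer:
    text: str
    doc_categories: set = field(default_factory=set)  # categories of the source document

def filter_by_category(question_categories, answers):
    """Keep only answers whose document categories intersect the question's categories."""
    question_categories = set(question_categories)
    if not question_categories:  # no reliable question category: keep everything
        return answers
    return [a for a in answers if a.doc_categories & question_categories]

# Usage: a question classified under a Reuters-like "computer" topic keeps only
# the answer extracted from a computer-related document.
answers = [
    Answer("Apple released a new personal computer.", {"computer"}),
    Answer("Apple harvests peaked in October.", {"agriculture"}),
]
print(filter_by_category({"computer"}, answers))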
<Paragraph position="5"> The article is organized as follows: Section 2 describes our Q/A system, whereas Section 3 presents the question categorization problem and the solutions adopted. Section 4 presents the filtering and re-ranking methods that combine the basic Q/A system with the question classification models. Section 5 reports the experiments on question categorization, basic Question Answering, and Question Answering based on Text Categorization (TC). Finally, Section 6 draws the conclusions.</Paragraph>
</Section>
</Paper>