<?xml version="1.0" standalone="yes"?>
<Paper uid="H05-1118">
  <Title>Integrating linguistic knowledge in passage retrieval for question answering</Title>
  <Section position="2" start_page="0" end_page="939" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Improving information retrieval (IR) through natural language processing (NLP) has been the goal for many researchers. NLP techniques such as lemmatization and compound splitting have been used in several studies (Krovetz, 1993; Hollink et al., 2003). Linguistically motivated syntactic units such as noun phrases (Zhai, 1997), head-modifier pairs (Fagan, 1987; Strzalkowski et al., 1996) and subject-verb-object triples (Katz and Lin, 2003) have also been integrated in information retrieval. However, most of these studies resulted in only little success or even decreasing performance. It has been argued that NLP and especially deep syntactic analysis is still too brittle and ineffective (Katz and Lin, 2003).</Paragraph>
    <Paragraph position="1"> Integrating NLP in information retrieval seems to be very hard because the task here is to match plain text keywords to natural language documents.</Paragraph>
    <Paragraph position="2"> In question answering (QA), however, the task is to match natural language questions to relevant answers within document collections. For this, we have to analyze the question in order to determine what kind of answer the user is expecting. Traditional information retrieval is used in QA systems to filter out relevant passages from the document collection which are then processed to extract possible answers. Hence, the performance of this passage retrieval component (especially in terms of recall) is crucial for the success of the entire system. NLP tools and linguistic resources are frequently used in QA systems, e.g. (Bernardi et al., 2003; Moldovan et al., 2002), although not very often for passage retrieval (some exceptions are (Strzalkowski et al., 1996; Katz and Lin, 2003; Neumann and Sacaleanu, 2004)).</Paragraph>
    <Paragraph position="3"> Our goal is to utilize information that can be extracted from the analyzed question in order to match linguistic features and syntactic units in analyzed  documents. The main research question is to find appropriate units and features that actually help to improve the retrieval component. Furthermore, we have to find an appropriate way of combining query terms to optimize IR performance. For this, we apply an iterative learning approach based on example questions annotated with their answers.</Paragraph>
    <Paragraph position="4"> In the next section we will give a brief description of our question answering system with focus on the passage retrieval component. Thereafter we will discuss the query optimization algorithm followed by a section on experimental results. The final section contains our conclusions.</Paragraph>
  </Section>
class="xml-element"></Paper>