<?xml version="1.0" standalone="yes"?> <Paper uid="W04-0504"> <Title>A Qualitative Comparison of Scientific and Journalistic Texts from the Perspective of Extracting Definitions</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> An information retrieval system informs on the existence (or non-existence) and whereabouts of documents relating to the request of a user (Lancaster, 1968). By contrast, a question answering (QA) system allows a user to ask a question in natural language and receive a concise answer, possibly with a validating context (Hirschman and Gaizauskas, 2001).</Paragraph> <Paragraph position="1"> Questions asking for definitions of terms (i.e., 'What is X?') occur frequently in the query logs of search engines (Voorhees, 2003). However, because of their complexity, recent work in question answering has largely neglected them and has concentrated instead on factoid questions, whose answer is a single word or short phrase (Blair-Goldensohn et al., 2003). Much of this work has been motivated by the question answering track of the Text REtrieval Conference (TREC), which evaluates systems on a common task.</Paragraph> <Paragraph position="2"> In a recent project inspired by our experiences in TREC (Sutcliffe et al., 2003), we built a system for extracting definitions of technical terms from scientific texts. The domain was salmon biology, which differs markedly from that of news articles.</Paragraph> <Paragraph position="3"> What, then, is the effect of domain on the applicability of QA? In this paper we attempt to answer this question, focusing on definitions and drawing on our findings from previous projects.</Paragraph> <Paragraph position="4"> The rest of the paper is structured as follows. First, we review recent related work.
Second, we summarise the objectives, methods and findings of the SOK-I QA project, named after the sockeye salmon. Third, we compare the characteristics of scientific text with those of newspaper articles, illustrating our points with examples from our SOK-I collection as well as from the New York Times, the CLEF 1994 Los Angeles Times collection and the AQUAINT corpus. Fourth, we discuss the implications of these differences for definitional QA.</Paragraph> <Paragraph position="5"> Finally, we draw conclusions from the study.</Paragraph> </Section> </Paper>