File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/96/c96-2166_intro.xml
Size: 2,431 bytes
Last Modified: 2025-10-06 14:06:05
<?xml version="1.0" standalone="yes"?> <Paper uid="C96-2166"> <Title>Fast Generation of Abstracts from General Domain Text Corpora by Extracting Relevant Sentences</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> With increasing amounts of machine readable information being available, one of the major problems for users is to find those texts that are most relevant to their interests and needs in as short an amount of time as possible.</Paragraph> <Paragraph position="1"> The traditional IR approach is that the user inputs a boolean query (possibly in a natural language-like formulation) and the system responds by presenting to the user the texts that are a &quot;best match&quot; to his query. In corpora where abstracts are not already provided it might facilitate the retrieval process a lot if text abstracts could be generated automatically, either off-line to be stored together with the texts (e.g., as ranked sentence numbers), or on-line, in accordance with the user's query.</Paragraph> <Paragraph position="2"> So far, there have been two main approaches in this field (for overviews on abstracting and summarizing see, e.g., (?) or (?)). One is oriented more towards information extraction, working with a knowledge base in a limited domain (&quot;top down&quot;, see e.g., (?; ?; ?)); the other type relies mainly on various heuristics (&quot;bottom up&quot;, see e.g., (?; ?)) which are less dependent on the domain but are still at least tuned to the text sort and thus have to be adapted whenever the system would have to be applied outside its original environment. Combinations of these methods have also been attempted recently (see e.g.,
(?)).</Paragraph> <Paragraph position="3"> The focus of this paper will be the description and evaluation of an abstracting system which avoids the disadvantages that come with most of these traditional approaches, while still being able to achieve a performance which closely matches the results of an identical abstracting task performed by human subjects in a comparative study.</Paragraph> <Paragraph position="4"> The results indicate that it is indeed possible to build a system relying on a simple and efficient algorithm, using standard tf*idf weights only, while still achieving a satisfying output.</Paragraph> </Section> </Paper>