<?xml version="1.0" standalone="yes"?> <Paper uid="W97-0704"> <Title>Automated Text Summarization in SUMMARIST</Title> <Section position="3" start_page="0" end_page="18" type="intro"> <SectionTitle> L2 SUMMARIST </SectionTitle> <Paragraph position="0"> Over the past two years we have been developing the text summarization system SUMMARIST.</Paragraph> <Paragraph position="1"> In this paper, we describe its structure and provide details on the evaluation results of two of its component modules. The goal of SUMMARIST is to provide both extracts and abstracts for arbitrary English (and later, other-language) input text. SUMMARIST combines symbolic world knowledge (embodied in WordNet, dictionaries, and similar resources) with robust NLP processing (using IR and statistical techniques) to overcome the problems endemic to either approach alone. These problems arise because existing robust NLP methods tend to operate at the word level and hence miss concept-level generalizations, which symbolic world knowledge provides; symbolic knowledge, on the other hand, is too difficult to acquire on a large enough scale to provide coverage and robustness. For robust summarization, both aspects are needed. The heart of abstract formation is the interpretation process performed to fuse concepts. This step occurs in the middle of the summarization procedure: to find the appropriate set of concepts in an input text, an initial stage of concept identification and extraction is required, and to produce the summary, a final stage of generation is needed. Thus SUMMARIST is based on the following 'equation': summarization = topic identification + interpretation + generation</Paragraph> <Paragraph position="3"> This breakdown is motivated as follows:
1. Identification: Select or filter the input to determine the most important, central topics. For generality we assume that a text can have many (sub)topics, and that the topic extraction process can be parameterized to include more or fewer of them to produce longer or shorter summaries.</Paragraph> </Section></Paper>
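The three-stage 'equation' summarization = topic identification + interpretation + generation can be illustrated with a minimal Python sketch. All function names here are hypothetical, and the stages are stand-ins rather than SUMMARIST's actual modules: topic identification is approximated by a simple word-frequency sentence scorer (an assumption, not the system's method), and the interpretation and generation stages are stubs, so the output is an extract rather than a fused abstract. The parameter `k` mirrors the parameterized topic extraction described for stage 1: raising it keeps more material and yields a longer summary.

```python
import re
from collections import Counter

# Hypothetical stopword list used only by this sketch.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "are",
             "for", "on", "with", "that", "it", "as", "by"}


def tokenize(text: str) -> list[str]:
    """Lowercase alphabetic tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def identify_topics(sentences: list[str], k: int) -> list[str]:
    """Stage 1 (stand-in): keep the k sentences with the highest average
    content-word frequency. A larger k yields a longer summary."""
    freq = Counter(w for s in sentences for w in tokenize(s))

    def score(s: str) -> float:
        words = tokenize(s)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:k])  # restore original document order
    return [sentences[i] for i in keep]


def interpret(topics: list[str]) -> list[str]:
    """Stage 2 (stub): SUMMARIST fuses concepts here using world knowledge;
    this sketch passes the selected material through unchanged."""
    return topics


def generate(concepts: list[str]) -> str:
    """Stage 3 (stub): join the selected material into output text."""
    return " ".join(concepts)


def summarize(text: str, k: int = 2) -> str:
    """Chain the three stages: identification, interpretation, generation."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return generate(interpret(identify_topics(sentences, k)))
```

For example, `summarize(text, k=1)` keeps only the single highest-scoring sentence, while a larger `k` retains more sentences in their original order. Keeping the stages as separate functions reflects the modular breakdown in the text: a knowledge-based interpretation step could replace the stub without changing the other two stages.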