<?xml version="1.0" standalone="yes"?> <Paper uid="X98-1022"> <Title>AN NTU-APPROACH TO AUTOMATIC SENTENCE EXTRACTION FOR SUMMARY GENERATION</Title> <Section position="3" start_page="0" end_page="163" type="intro"> <SectionTitle> 1. INTRODUCTION </SectionTitle> <Paragraph position="0"> Toward the end of the 20th century, the Internet has become part of everyday life. People enjoy Internet services from various providers, and these ISPs (Internet Service Providers) do their best to fulfill users' information needs. However, if we investigate the techniques used in these services, we find that they are no different from those used in traditional Information Retrieval or Natural Language Processing. What cyberspace does provide is an environment in which these techniques can serve more people than ever before.</Paragraph> <Paragraph position="1"> Under the leadership of Professor Hsin-Hsi Chen, the members of the Natural Language Processing Lab (NLPL) in the Department of Computer Science and Information Engineering, National Taiwan University, have dedicated themselves to NLP research for many years.</Paragraph> <Paragraph position="2"> Their research results have been reported in the literature and have earned recognition from colleagues in the NLP field. Many systems for various NLP applications have been developed, especially for Chinese and English, and some of them can be accessed directly through WWW browsers. For example, an MT meta-server [1] provides an online English-to-Chinese translation service (http://nlg3.csie.ntu.edu.tw/mtir/mtir.html).</Paragraph> <Paragraph position="3"> The Language & Information Processing System Lab (LIPS) in the Department of Library and Information Science, National Taiwan University, also devotes itself to research in language, information, and library science.</Paragraph> <Paragraph position="4"> Chen and Chen [2] proposed a hybrid model for noun extraction from running texts and provided an automatic evaluation method. 
Chen [3] proposed a corpus-based model to identify topics and used it to determine sub-topical structures.</Paragraph> <Paragraph position="5"> Generally speaking, with our current research results we are able to handle numerous NLP applications and to apply NLP techniques to other applications. The two laboratories believe that current Internet services will not be sufficient for people living in the next century. At least two kinds of services will be important and crucial in the 21st century: information extraction and automatic summarization.</Paragraph> <Paragraph position="6"> Information Extraction (IE) systems [4] extract predefined information from data or documents. What kind of information is appropriate is a domain-dependent problem; for example, the information conveyed by business news is very different from that conveyed by terrorism news. As a result, the predefined information plays an important role in IE systems. In fact, this predefined information is what is known as metadata [5], so joint efforts on IE and metadata will benefit both sides.</Paragraph> <Paragraph position="7"> Automatic summarization uses an automatic mechanism to produce a condensed version of the original document. Two methodologies can be applied to constructing summaries. The first extracts sentences directly from the text; the second analyzes the text, extracts its conceptual representation, and then generates a summary from that representation. Whichever methodology is adopted, processing time should be kept as short as possible for Internet applications.</Paragraph> <Paragraph position="8"> As mentioned above, we regard information extraction and automatic summarization as two important Internet services for the next century.</Paragraph> <Paragraph position="9"> We therefore participated in MET-2 and SUMMAC-1 to address information extraction and automatic summarization, respectively. 
In this paper, we focus on the SUMMAC-1 tasks; details of our MET-2 work can be found in the paper presented at the MET-2 Conference [6].</Paragraph> <Paragraph position="10"> This paper is organized as follows. Section 2 discusses the types of summaries and their functions, and also describes the SUMMAC-1 tasks and how they correspond to the functions of traditional summaries. Sections 3 and 4 propose models for the categorization task and the ad hoc task, respectively; these two sections describe in detail the methods for extracting feature vectors, calculating extraction strengths, and identifying discourse segments. Section 5 summarizes our results and compares them with those of other systems.</Paragraph> <Paragraph position="11"> Section 6 gives a short conclusion.</Paragraph> </Section> </Paper>