<?xml version="1.0" standalone="yes"?>
<Paper uid="X98-1022">
  <Title>AN NTU-APPROACH TO AUTOMATIC SENTENCE EXTRACTION FOR SUMMARY GENERATION</Title>
  <Section position="4" start_page="163" end_page="164" type="metho">
    <SectionTitle>
2. SUMMARY AND SUMMAC-1 TASKS
</SectionTitle>
    <Paragraph position="0"> In general, summarization is the creation of a short version of the original document. The functions of summaries are as follows \[7\]:
* Announcement: announce the existence of the original document
* Screening: determine the relevance of the original document
* Substitution: replace the original document
* Retrospection: point back to the original document
A summary can be one of four types: indicative summary, informative summary, critical summary, and extract. Indicative summaries usually serve the announcement and screening functions. By contrast, informative summaries serve the substitution function. It is very difficult to generate critical summaries automatically. Extracts can serve announcement and substitution. In general, all four types of summaries are retrospective.</Paragraph>
    <Paragraph position="1"> In the Internet environment, the most important summary types are the indicative summary and the informative summary. However, for researchers working on automatic summarization, the common type is the extract, because an extract is produced by selecting sentences from the original document, which is an easier way to produce a summary. But how can an extract provide the functionality of an informative summary and that of an indicative summary? A common way is to produce a fixed-length extract as the indicative summary and a best extract as the informative summary. These are also the two kinds of summaries underlying the tasks of SUMMAC-1.</Paragraph>
    <Paragraph position="2"> SUMMAC-1 announces three tasks for automatic summarization: the first is the categorization task, the second is the adhoc task, and the third is the Q&amp;A task. These three tasks have their own designated purposes, and in the SUMMAC-1 design each task addresses its own type of summary. Although these task definitions are not identical to the types discussed in the previous paragraph, this does not interfere with the development of an automatic summarization system.</Paragraph>
    <Paragraph position="3"> Because we have much experience in applying language techniques to similar tasks \[3, 8\], we decided, after long discussion, to take part in the categorization task and the adhoc task. The reasons are as follows. For an application in the Internet environment, providing introductory information to naive users is very important, and generic indicative summaries are well suited to this function. However, users have their own innate knowledge, and at times they want the generated summary to be relevant to the issued query. These two different needs are fulfilled by the first and second tasks initiated by SUMMAC-1. As for the third task, Q&amp;A, we think it is much more closely related to information extraction; it can be addressed together with IE as a part of MUC's tasks.</Paragraph>
  </Section>
  <Section position="5" start_page="164" end_page="165" type="metho">
    <SectionTitle>
3. CATEGORIZATION TASK
</SectionTitle>
    <Paragraph position="0"> As the SUMMAC-1 call for papers states, the goal of the categorization task is to evaluate generic summaries to determine whether the key concept of a given document is captured in the summary. The SUMMAC-1 documents fall into sets of topics, and each topic contains approximately 100 documents.</Paragraph>
    <Paragraph position="1"> The task asks summarization systems to produce a summary for each document. The assessor reads the summary and then assigns it to one of five topics or to a sixth, 'non-relevant' topic. The testing set of documents covers two general domains, environment and global economy.</Paragraph>
    <Paragraph position="2"> Each domain in turn consists of five topics, and each topic contains 100 documents. As a result, these documents can be regarded as positive cues for the corresponding topic; by contrast, documents of other topics can be treated as negative cues for the topic under consideration. The training stage and the testing stage are described in the following paragraphs.</Paragraph>
    <Paragraph position="3"> For each topic, the following procedure is  executed in the training stage.</Paragraph>
    <Paragraph position="4">
(1) Screen out function words for each document.
(2) Calculate word frequencies for the current topic as the positive feature vector (PFV).
(3) Calculate word frequencies for the other topics as the negative feature vector (NFV).
The testing stage is shown as follows.</Paragraph>
    <Paragraph position="5">
(1) Exclude function words in the test documents.
(2) Identify the appropriate topic for each test document.
(3) Use the PFV and NFV of the identified topic to rank the sentences of the test document.
(4) Select sentences to construct a best summary.
(5) Select sentences to construct a fixed-length summary.
Based on this outline, the approach for summary generation under the categorization task can be depicted as Figure 1 shows.</Paragraph>
    <Paragraph position="6"> Step (1) of both the training stage and the testing stage excludes function words. A stop list is used for this purpose: a stop list widely distributed on the Internet and another list collected by us are combined. The resultant stop list consists of 744 words, such as abaft, aboard, about, above, across, afore, after, again, against, ain't, aint, albeit, all, almost, alone, along, alongside, already, also, although, always, am, amid, and so on.</Paragraph>
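    <Paragraph> As an illustration, the function-word screening of Step (1) can be sketched in Python as follows; the stop list shown here is a tiny illustrative subset, not the actual 744-word list.
```python
# Step (1): screen out function words using a stop list.
# This set is a small illustrative subset of the full 744-word stop list.
STOP_WORDS = {"about", "above", "across", "after", "against", "all", "also", "am"}

def remove_function_words(tokens):
    """Keep only tokens that are not in the stop list (case-insensitive)."""
    return [t for t in tokens if t.lower() not in STOP_WORDS]
```
</Paragraph>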
    <Paragraph position="7"> Steps (2) and (3) of the training stage regard the document collection of a topic as a whole to extract the PFV and NFV. First, the document collection of a topic is treated as a pool of words. Step (2) calculates the frequency of each word in this pool and screens out words with frequency lower than 3. Step (3) repeats the same procedure, but this time the pool consists of words from the document collections of the other topics. After normalization, the two feature vectors PFV = (pw_1, pw_2, pw_3, ..., pw_n) and NFV = (nw_1, nw_2, nw_3, ..., nw_n) are constructed as unit vectors. The PFV and NFV are used to extract sentences of a document, and the extracted sentences constitute the summary. The idea behind this approach is that we use documents to retrieve the strongly related sentences, in parallel to the way an IR system uses a query to retrieve related documents. Step (2) of the testing stage identifies which topic a test document belongs to. The PFVs and NFVs are compared with the test document. Assume that the test document D consists of the words dw_1, dw_2, dw_3, ..., dw_n, i.e., D = (dw_1, dw_2, dw_3, ..., dw_n), and that there are m pairs of PFV and NFV. The following equation is used to determine that the i-th topic is the best for the document under consideration.</Paragraph>
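    <Paragraph> The pooling, frequency thresholding, and unit normalization of Steps (2) and (3) can be sketched as follows; L2 (Euclidean) normalization is assumed here, since the text only says the vectors are made unit vectors.
```python
import math
from collections import Counter

def feature_vector(documents):
    """Pool the words of a document collection (a list of token lists),
    drop words occurring fewer than 3 times, and normalize the remaining
    frequency vector to unit length."""
    counts = Counter(w for doc in documents for w in doc)
    kept = {w: c for w, c in counts.items() if c >= 3}
    norm = math.sqrt(sum(c * c for c in kept.values()))
    return {w: c / norm for w, c in kept.items()}
```
The PFV of a topic is then feature_vector applied to that topic's documents, and the NFV is feature_vector applied to the pooled documents of all other topics.</Paragraph>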
    <Paragraph position="9"> The similarity shown in the following is measured by inner product.</Paragraph>
    <Paragraph position="10"> sim(PFV, D) = Σ_{j=1}^{n} (pw_j × dw_j) Once the topic is determined, Step (3) uses the corresponding PFV_i and NFV_i to select sentences from the document. Whether a sentence S = (sw_1, sw_2, sw_3, ..., sw_n) is selected as part of the summary depends on the relative score shown below. The similarity is again measured by inner product.</Paragraph>
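    <Paragraph> The topic-identification equation itself is an image lost from this copy of the paper; a plausible reading, consistent with the surrounding text, is to pick the topic that maximizes the difference between its PFV similarity and its NFV similarity to the document. A sketch under that assumption:
```python
def inner_product(u, v):
    """Inner-product similarity over sparse word-weight dictionaries."""
    return sum(w * v[t] for t, w in u.items() if t in v)

def identify_topic(doc_vec, topic_vectors):
    """Step (2) of the testing stage: topic_vectors is a list of (PFV, NFV)
    pairs; pick the index whose PFV matches the document best relative to
    its NFV. The combining formula is our assumption, not the paper's."""
    def score(pair):
        pfv, nfv = pair
        return inner_product(pfv, doc_vec) - inner_product(nfv, doc_vec)
    return max(range(len(topic_vectors)), key=lambda i: score(topic_vectors[i]))
```
</Paragraph>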
    <Paragraph position="12"> In Step (4), the ranked list of RSes is examined and the maximal score gap between two adjacent RSes is identified. If the number of sentences above the identified gap is between 10% and 50% of the number of all sentences, these sentences are extracted as the best summary. Otherwise, the next-largest gap is examined to see whether it is suitable. Step (5) takes the best summary generated in Step (4) and cuts it to a fixed-length summary according to the SUMMAC-1 rule.</Paragraph>
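    <Paragraph> The gap-based cutoff of Step (4) can be sketched as follows; the tie-breaking order and the fallback when no gap satisfies the 10%-50% constraint are our own assumptions, since the paper does not specify them.
```python
def best_summary_indices(rs_scores):
    """Rank sentences by relative score (RS), then cut at the maximal score
    gap between adjacent ranked scores, provided the resulting summary holds
    10%-50% of all sentences; otherwise try the next-largest gap."""
    order = sorted(range(len(rs_scores)), key=lambda i: rs_scores[i], reverse=True)
    ranked = [rs_scores[i] for i in order]
    n = len(ranked)
    # candidate cut points, largest gap first
    gaps = sorted(range(n - 1), key=lambda k: ranked[k] - ranked[k + 1], reverse=True)
    for cut in gaps:
        size = cut + 1
        if 0.5 * n >= size >= 0.1 * n:
            return sorted(order[:size])
    # fallback when no gap qualifies (our assumption): take the top half
    return sorted(order[: max(1, n // 2)])
```
</Paragraph>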
  </Section>
  <Section position="6" start_page="165" end_page="166" type="metho">
    <SectionTitle>
4. ADHOC TASK
</SectionTitle>
    <Paragraph position="0"> The adhoc task is designed to evaluate user-directed summaries; that is to say, the generated summary should be closely related to the user's query. This kind of summary is much more important for Internet applications. We have devoted ourselves to related research for a long time. A text model based on the interaction of nouns and verbs was proposed in \[3\] and used to identify the topics of documents. Chen and Chen \[8\] extended the text model to partition texts into discourse segments.</Paragraph>
    <Paragraph position="1"> The following shows the process of NTU's approach to the adhoc task in the SUMMAC-1 formal run.</Paragraph>
    <Paragraph position="2">
(1) Assign a part of speech to each word in the texts.
(2) Calculate the extraction strength (ES) of each sentence.</Paragraph>
    <Paragraph position="3">
(3) Partition the text into meaningful segments.
(4) Filter out irrelevant segments according to the user's query.</Paragraph>
    <Paragraph position="4">
(5) Filter out irrelevant sentences based on ES.
(6) Generate the best summary.</Paragraph>
    <Paragraph position="5">
(7) Generate the fixed-length summary from the best summary.</Paragraph>
    <Paragraph position="6"> Step (1) identifies the nouns and verbs in the texts, which are regarded as the core words and will be used in Step (2). Step (2) is the major stage of our approach and will be discussed in detail. Generally speaking, each word in a sentence has its role. Some words convey ideas, suggestions, and concepts; some words are functional rather than meaningful. Therefore, it is much more reasonable to strip out these function words when we model the information flow in texts. Nouns and verbs are the two parts of speech under consideration. In addition, a measure of word importance is needed to weight each noun or verb on an appropriate scale. Traditionally, term frequency (TF) has been widely used in information retrieval research. The idea is that, after excluding the function words, the words that occur frequently carry the meaning underlying a text. However, if these words appear in many documents, their discriminative power decreases.</Paragraph>
    <Paragraph position="7"> Sparck Jones \[9\] proposed the inverse document frequency (IDF) to rectify the aforementioned shortcoming. The IDF is shown as follows:</Paragraph>
    <Paragraph position="9"> where P is the number of documents in the collection and O(w) is the number of documents containing the word w.</Paragraph>
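    <Paragraph> The IDF formula image is missing from this copy, so the sketch below assumes the standard log(P / O(w)) form that the variable definitions suggest.
```python
import math

def idf(word, documents):
    """Inverse document frequency under the assumed form log(P / O(w)),
    where P is the number of documents and O(w) is the number of documents
    containing the word; documents is a list of word sets."""
    p = len(documents)
    o = sum(1 for doc in documents if word in doc)
    return math.log(p / o) if o > 0 else 0.0
```
</Paragraph>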
    <Paragraph position="10"> Nouns and verbs in well-organized texts are in general coherent. In order to summarize texts automatically, it is necessary to analyze the factors that compose texts, that is, the writing process of human beings. We use four distributional parameters to construct a text model: word importance, word frequency, word co-occurrence, and word distance. The following discusses each factor in sequence. Word importance means how strongly a word that appears in a text can act as a core word of the text; in other words, it represents the possibility of selecting the word as an index term. The IDF is chosen to measure word importance in this paper. In addition, the frequency of a word itself also plays an important role in texts; for example, a word with high frequency usually leaves a strong impression on readers. The proposed model combines these two factors as the predecessors did.</Paragraph>
    <Paragraph position="11"> If a text discusses a special subject, many related words should appear together to support this subject; that is to say, these related words will co-occur frequently. From the viewpoint of statistics, a distributional parameter like mutual information \[10\] can be used to capture this phenomenon.</Paragraph>
    <Paragraph position="12"> Including the distance factor is motivated by the fact that related events are usually located in the same text neighborhood. The distance is measured by the difference between the cardinal numbers of two words: we assign a cardinal number to each verb and noun in sentences, and the cardinal numbers are kept continuous across sentences in the same paragraph. As a result, the distance between two words, w_i and w_j, is D(w_i, w_j) = | C(w_i) - C(w_j) |,</Paragraph>
    <Paragraph position="14"> where D denotes the distance and C the cardinal number.</Paragraph>
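    <Paragraph> A minimal sketch of the cardinal numbering and the resulting distance:
```python
def assign_cardinals(paragraph_words):
    """Assign consecutive cardinal numbers to the nouns/verbs of a paragraph;
    numbering runs continuously across sentences in the same paragraph."""
    return {w: i for i, w in enumerate(paragraph_words, start=1)}

def word_distance(cardinal, w1, w2):
    """Distance as the absolute difference of cardinal numbers: D = abs(C(w1) - C(w2))."""
    return abs(cardinal[w1] - cardinal[w2])
```
</Paragraph>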
    <Paragraph position="15"> Considering the four factors together, the proposed model for the adhoc task is shown as follows:</Paragraph>
    <Paragraph position="17"> CS is the connective strength for a noun n, where SNN denotes the strength of a noun with other nouns, SNV the strength of a noun with other verbs, and pn and pv are the weights for SNN and SNV, respectively.</Paragraph>
    <Paragraph position="18"> The determination of pn and pv is via deleted interpolation \[11\] (Jelinek, 1985). The equations for SNN and SNV are shown below.</Paragraph>
    <Paragraph position="20"> f(w_i, w_j) is the co-occurrence count of the words w_i and w_j, and f(w) is the frequency of the word w. In fact, f(w_i, w_j) / (f(w_i) × f(w_j)) is a normalized co-occurrence measure with the same form as mutual information.</Paragraph>
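    <Paragraph> The normalized co-occurrence measure can be sketched as follows; the symmetric pair lookup is an implementation assumption on our part.
```python
def cooccurrence_strength(f_pair, f, wi, wj):
    """Normalized co-occurrence f(wi, wj) / (f(wi) * f(wj)), the
    mutual-information-like measure underlying SNN and SNV; f_pair maps
    word pairs to co-occurrence counts, f maps words to frequencies."""
    # co-occurrence is symmetric, so try both pair orders (our assumption)
    pair = (wi, wj) if (wi, wj) in f_pair else (wj, wi)
    return f_pair.get(pair, 0) / (f[wi] * f[wj])
```
</Paragraph>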
    <Paragraph position="21"> When the connectivity score for each noun in a sentence is available, the chance for a sentence to be extracted as a part of summary can be expressed as follows. We call it extraction strength (ES).</Paragraph>
    <Paragraph position="23"> where m is the number of nouns in the sentence S_i.</Paragraph>
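    <Paragraph> The exact ES equation is an image lost from this copy; the sketch below assumes a plain average of the connectivity scores of the m nouns of a sentence.
```python
def extraction_strength(noun_scores):
    """ES of a sentence from the connectivity scores (CS) of its m nouns;
    a simple average is assumed here, as the formula image is missing."""
    m = len(noun_scores)
    return sum(noun_scores) / m if m > 0 else 0.0
```
</Paragraph>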
    <Paragraph position="24"> Because texts are well organized and coherent, it is necessary to take paragraphs into consideration for summary generation. However, the number of sentences in a paragraph may be only one or two, especially in newswire. It is therefore indispensable to group sentences into meaningful segments or discourse segments before carrying out the summarization task.</Paragraph>
    <Paragraph position="25"> Step (3) serves this purpose. A sliding window of size W is moved from the first sentence to the last, and a score for the sentences within the window is calculated. Accordingly, a series of scores is generated, and the score-sentence relation determines the boundaries of discourse segments. Figure 2 shows the aforementioned process and how the scores are calculated. The window size W is 3 in this experiment.</Paragraph>
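    <Paragraph> The sliding-window scoring of Step (3) can be sketched as follows; the in-window score here is a plain sum of per-sentence scores, which is an assumption, since the actual calculation is defined in Figure 2 (not reproduced in this copy).
```python
def window_scores(sentence_scores, w=3):
    """Slide a window of size w over the sentence scores and score each
    window position; local minima in the resulting series suggest
    discourse-segment boundaries."""
    scores = []
    for start in range(len(sentence_scores) - w + 1):
        scores.append(sum(sentence_scores[start:start + w]))
    return scores
```
</Paragraph>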
    <Paragraph position="26"> Once the discourse segments are determined, the user's query is used to filter out less relevant segments. This is fulfilled in Step (4). The nouns of the query are compared with the nouns in each segment, using the same technique as for calculating SNN mentioned above \[8\]. As a result, the relevance of each segment to the query is calculated, and then the median score is identified. The median is used to normalize the calculated score of each segment, and segments with a normalized score lower than 0.5 are filtered out.</Paragraph>
    <Paragraph position="27"> Step (5) filters out the irrelevant sentences in the segments selected in Step (4). The ES of each sentence calculated in Step (2) is used as the ranking basis, but the ES of the first sentence and that of the last sentence are doubled. Again, the median of these ESes is chosen to normalize the scores, and the sentences with a normalized score higher than 0.5 are selected as the best summary in Step (6). Because the length of the fixed-length summary cannot exceed 10% of the original text, Step (7) selects the top-ranked sentences that do not break this rule to form the fixed-length summary.</Paragraph>
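    <Paragraph> Steps (5)-(7) can be sketched as follows; the upper-median choice for even-length lists and the greedy length-budgeted selection for the fixed-length summary are our own reading of the 10% rule.
```python
def select_sentences(es_scores, text_lengths):
    """Double the ES of the first and last sentences, normalize by the
    median ES, keep sentences scoring above 0.5 as the best summary, then
    greedily keep top-ES sentences within 10% of the total text length."""
    scores = list(es_scores)
    scores[0] *= 2
    scores[-1] *= 2
    ranked = sorted(scores)
    median = ranked[len(ranked) // 2]  # upper median for even lengths
    best = [i for i, s in enumerate(scores) if s / median > 0.5]
    # fixed-length summary: top-ES sentences within the 10% length budget
    budget = 0.1 * sum(text_lengths)
    fixed, used = [], 0.0
    for i in sorted(best, key=lambda i: scores[i], reverse=True):
        if used + text_lengths[i] > budget:
            break
        fixed.append(i)
        used += text_lengths[i]
    return best, sorted(fixed)
```
</Paragraph>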
  </Section>
</Paper>