<?xml version="1.0" standalone="yes"?>
<Paper uid="P06-3007">
  <Title>Investigations on Event-Based Summarization (Sydney, July 2006. ©2006 Association for Computational Linguistics)</Title>
  <Section position="5" start_page="37" end_page="38" type="metho">
    <SectionTitle>
3 Independent Event-based Summarization
</SectionTitle>
    <Paragraph position="0"> Based on our observation, we assume that events in the documents may differ in importance. Important event terms are repeated and occur with more event elements, because reporters want to state them clearly. At the same time, writers may omit the time or location of an important event once they have already described it. Therefore, in our research, event terms occurring in different circumstances are assigned different weights. Event terms occurring between two event elements should be more important than event terms beside only one event element, and event terms co-occurring with participants may be more important than event terms beside only a time or location.</Paragraph>
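The weighting intuition above can be illustrated with a small Python sketch. The element-type labels (`participant`, `time`, `location`) and all numeric weights are hypothetical placeholders chosen only to respect the stated ordering (two neighbouring elements above one, a participant neighbour above a time or location neighbour); the configurations actually evaluated are described in Section 5.2.

```python
# Hypothetical weights; the paper's real configurations are in Section 5.2.
BASE_WEIGHT = {1: 1.0, 2: 2.0}  # keyed by number of adjacent event elements
PARTICIPANT_BONUS = 0.5  # extra weight when a participant is adjacent


def context_weight(neighbor_types: list[str]) -> float:
    """Weight an event term by the types of the event elements beside it.

    neighbor_types lists the adjacent elements' types, e.g.
    ["participant", "time"]; a term with no adjacent element gets 0.0.
    """
    count = min(len(neighbor_types), 2)  # at most one element on each side
    if count == 0:
        return 0.0
    weight = BASE_WEIGHT[count]
    if "participant" in neighbor_types:
        weight += PARTICIPANT_BONUS
    return weight


print(context_weight(["participant", "location"]))  # 2.5: between two elements
print(context_weight(["participant"]))  # 1.5: beside a participant only
print(context_weight(["time"]))  # 1.0: beside a time expression only
```

An additive bonus is only one way to encode the participant preference; a multiplicative factor would respect the same ordering.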
    <Paragraph position="1"> The independent event-based summarization approach involves the following steps.</Paragraph>
    <Paragraph position="2"> 1. Given a cluster of documents, analyze each sentence one at a time. Ignore sentences that do not contain any event element.
2. Tag the event terms in the sentence that lie between two event elements, or near an event element within a distance limit.</Paragraph>
    <Paragraph position="3"> For example: [Event Element A, Event Term, Event Element B], [Event Term, Event Element A], [Event Element A, Event Term].
3. Assign different weights to different event terms according to their contexts. Contexts refer to the number of event elements beside an event term and the types of these event elements. Different weight configurations are described in Section 5.2.</Paragraph>
    <Paragraph position="4"> 4. Take the average tf*idf score as the weight of every event term or event element. The algorithm is similar to Centroid.</Paragraph>
    <Paragraph position="5"> 5. Sum up the weights of the event terms and event elements in each sentence.</Paragraph>
    <Paragraph position="6"> 6. Select the sentences with the highest weights, up to the required summary length.</Paragraph>
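The steps above can be sketched end to end in Python. This is a minimal illustration under several assumptions the text does not fix: sentences arrive already filtered (step 1) and tagged (step 2), each as a list of (unit, context weight) pairs where the context weight comes from step 3 and is 1.0 for event elements; each document is reduced to its list of event units; the averaged tf*idf uses within-cluster idf with add-one smoothing (Centroid normally takes idf from a background corpus); the step-3 and step-4 weights are combined by multiplication; and summary length is counted in sentences. All function names are hypothetical.

```python
import math
from collections import Counter

# (event term or event element, context weight from step 3; 1.0 for elements)
Unit = tuple[str, float]


def average_tfidf(documents: list[list[str]]) -> dict[str, float]:
    """Step 4 (Centroid-style): tf*idf of each unit averaged over the cluster.

    Assumption: idf is computed inside the cluster with add-one smoothing,
    so units that occur in every document keep a positive weight.
    """
    n_docs = len(documents)
    df = Counter(unit for doc in documents for unit in set(doc))
    totals: Counter = Counter()
    for doc in documents:
        for unit, tf in Counter(doc).items():
            totals[unit] += tf * math.log(1 + n_docs / df[unit])
    return {unit: total / n_docs for unit, total in totals.items()}


def summarize(sentences: list[list[Unit]], documents: list[list[str]], k: int) -> list[int]:
    """Steps 4 to 6: score each pre-tagged sentence and keep the top k.

    Assumption: a unit contributes its context weight times its averaged
    tf*idf. Returns the selected sentence indices in original order.
    """
    tfidf = average_tfidf(documents)
    scores = [sum(w * tfidf.get(unit, 0.0) for unit, w in sent) for sent in sentences]
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


documents = [["hurricane", "hit", "florida"],
             ["hurricane", "moved", "louisiana"],
             ["storm", "hit", "florida"]]
sentences = [[("hurricane", 1.0), ("hit", 2.0), ("florida", 1.0)],
             [("hurricane", 1.0), ("moved", 1.0), ("louisiana", 1.0)],
             [("storm", 1.0), ("hit", 1.0)]]
print(summarize(sentences, documents, k=2))  # [0, 1]
```

Returning indices in document order rather than score order is a design choice for readable extracts; ordering by score would also be reasonable.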
  </Section>
  <Section position="6" start_page="38" end_page="39" type="metho">
    <SectionTitle>
4 Relevant Event-based Summarization
</SectionTitle>
    <Paragraph position="0"> Independent event-based approaches do not exploit the relevance between events. However, we believe this relevance may be useful for identifying important events. After a document is represented by events, relevant events are linked together. We assume that important events may be mentioned often, and that events associated with important events may also be important. Under this assumption, PageRank is a suitable algorithm for identifying the importance of events from a map. In the following sections, we discuss how to represent documents by events and how to identify important events with the PageRank algorithm.</Paragraph>
    <Section position="1" start_page="38" end_page="38" type="sub_section">
      <SectionTitle>
4.1 Document Representation
</SectionTitle>
      <Paragraph position="0"> We employ an event map to represent the content of a document cluster about a certain topic. In an event map, nodes are event terms or event elements, and edges represent association or modification between two nodes. Since the sentence is a natural unit for expressing meaning, we assume that all event terms in a sentence are relevant to one another and should be linked together. All links between nodes are undirected.</Paragraph>
      <Paragraph position="1"> In the ideal case, event elements should be linked to their associated event terms. At the same time, an event element may modify another element; for example, one element is a head noun and the other is its modifier. An event term (e.g., a verb variant) may also modify an event element or the event term of another event. Capturing these associations and modifications would require a full parser. Because current parsing technology is imperfect, an effective alternative is to simulate the parse tree with simplifications, which avoids introducing parser errors. The simplifications are as follows: event elements are attached only to their corresponding event terms; an event term is not attached to an event element of another event; and an event element is not attached to another event element. Heuristics are used to attach event elements to their corresponding event terms.</Paragraph>
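The linking rules above can be sketched as an undirected adjacency map in Python. This is a hypothetical illustration: the attachment heuristics are not specified, so the function takes each event term together with the elements already attached to it, and the example sentence and its tagging are invented for illustration.

```python
from itertools import combinations

Graph = dict[str, set[str]]


def add_link(graph: Graph, a: str, b: str) -> None:
    """Add an undirected link between nodes a and b."""
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)


def sentence_map(events: dict[str, list[str]]) -> Graph:
    """Build the event map of one sentence under the simplified rules.

    events maps each event term to the event elements that (unshown,
    hypothetical) attachment heuristics assigned to it.
    """
    graph: Graph = {term: set() for term in events}
    # All event terms in the same sentence are linked to one another.
    for a, b in combinations(events, 2):
        add_link(graph, a, b)
    # Each event element is linked only to its own event term: there are
    # no element-element links and no links to another event's term.
    for term, elements in events.items():
        for element in elements:
            add_link(graph, term, element)
    return graph


# Hypothetical tagging of "The storm hit the coast on Monday, flooding several towns".
events = {"hit": ["storm", "coast", "Monday"], "flooding": ["towns"]}
graph = sentence_map(events)
print(sorted(graph["hit"]))  # ['Monday', 'coast', 'flooding', 'storm']
print(sorted(graph["towns"]))  # ['towns' links only to its own term: 'flooding']
```

Storing each link in both endpoints' neighbour sets keeps the map undirected, which matches the representation described above.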
      <Paragraph position="2"> Given the sentence "Andrew had become little more than a strong rainstorm early yesterday, moving across Mississippi state and heading for the north-eastern US", the event map is shown in Fig. 1. Once each sentence is represented by a map, a cluster of documents yields multiple maps. If nodes from different maps match lexically, they may denote the same thing and should be linked. For example, if named entity</Paragraph>
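Merging per-sentence maps into one cluster map can be sketched in Python by keying nodes on a normalized label, so lexically matching nodes collapse into a single node. The text does not define "lexical match" precisely, so the exact-match-after-case-folding rule in `lexical_key` is an assumption, and both function names are hypothetical.

```python
Graph = dict[str, set[str]]


def lexical_key(label: str) -> str:
    """Hypothetical lexical-match key: case-folded, whitespace-collapsed."""
    return " ".join(label.casefold().split())


def merge_maps(maps: list[Graph]) -> Graph:
    """Union sentence maps, collapsing lexically matching nodes into one node."""
    cluster: Graph = {}
    for graph in maps:
        for node, neighbors in graph.items():
            key = lexical_key(node)
            linked = {lexical_key(n) for n in neighbors} - {key}  # drop self-loops
            cluster.setdefault(key, set()).update(linked)
    return cluster


map_a = {"hit": {"Andrew"}, "Andrew": {"hit"}}
map_b = {"weakened": {"andrew"}, "andrew": {"weakened"}}
merged = merge_maps([map_a, map_b])
print(sorted(merged["andrew"]))  # ['hit', 'weakened']
```

A looser matcher (stemming, alias lists for named entities) could replace `lexical_key` without changing `merge_maps`.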
    </Section>
    <Section position="2" start_page="38" end_page="39" type="sub_section">
      <SectionTitle>
4.2 Importance Identification by PageRank
</SectionTitle>
      <Paragraph position="0"> Given the whole map for a cluster of documents, the next step is to identify the focus of these documents. Based on our assumption about important content in the previous section, the PageRank algorithm (Page et al., 1998) is employed for this task. PageRank assumes that a node connected to more nodes may be more likely to represent a salient concept, and that nodes related to significant nodes are closer to the salient concept than those that are not. The algorithm assigns a significance score to each node according to the number of nodes linking to it as well as the significance of those nodes. For PageRank, we replace every undirected link in Figure 1 with two directed links, one in each direction.</Paragraph>
      <Paragraph position="1"> The importance (denoted PR) of a node A is calculated as follows:
$$PR(A) = (1 - d) + d \left( \frac{PR(B_1)}{C(B_1)} + \frac{PR(B_2)}{C(B_2)} + \cdots + \frac{PR(B_n)}{C(B_n)} \right)$$
where $B_1, B_2, \ldots, B_n$ are all nodes that link to node A, and $C(B_i)$ is the number of outgoing links from node $B_i$.</Paragraph>
      <Paragraph position="7"> The score of each node is obtained by applying this equation iteratively. d is a damping factor that keeps loops in the map structure from dominating the scores; as suggested by Page et al. (1998), d is set to 0.85. The significance of each sentence for inclusion in the summary is then derived from the significance of the event terms and event elements it contains.</Paragraph>
    </Section>
  </Section>
</Paper>