File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/06/p06-1047_intro.xml

Size: 13,511 bytes

Last Modified: 2025-10-06 14:03:34

<?xml version="1.0" standalone="yes"?>
<Paper uid="P06-1047">
  <Title>Extractive Summarization using Inter- and Intra- Event Relevance</Title>
  <Section position="4" start_page="369" end_page="371" type="intro">
    <SectionTitle>
2. Related Work
</SectionTitle>
    <Paragraph position="0"> Event-based summarization has been investigated in recent research. It was first presented in (Daniel, Radev and Allison, 2003), who treated a news topic in multi-document summarization as a series of sub-events according to human understanding of the topic. They determined the degree of sentence relevance to each sub-event through human judgment and evaluated six extractive approaches. Their paper concluded that recognizing the sub-events that comprise a single news event is essential for producing better summaries. However, it is difficult to automatically break a news topic into sub-events.</Paragraph>
    <Paragraph position="1"> Later, atomic events were defined as the relationships between the important named entities (Filatova and Hatzivassiloglou, 2004), such as participants, locations and times (which are called relations), through the verbs or action nouns labeling the events themselves (which are called connectors). They evaluated sentences based on co-occurrence statistics of the named entity relations and the event connectors involved. The proposed approach was reported to outperform the conventional tf*idf approach. Named entities are clearly key elements in their model. However, the constraints defining events seemed quite stringent.</Paragraph>
    <Paragraph position="2"> Dependency parsing, anaphora resolution and co-reference resolution have also been applied to event recognition, drawing to varying degrees on NLP and IE techniques (Yoshioka and Haraguchi, 2004; Vanderwende, Banko and Menezes, 2004; Leskovec, Grobelnik and Fraling, 2004). Rather than pre-specifying events, these efforts extracted (verb)-(dependent relation)-(noun) triples as events and used the triples to form a graph, merged by relations.</Paragraph>
    <Paragraph position="3"> Events in documents are in fact related in various ways. Judging whether sentences are salient and organizing them into a coherent summary can take advantage of event relevance. Unfortunately, this was neglected in most previous work. Barzilay and Lapata (2005) exploited the distributional and referential information of discourse entities to improve summary coherence. While they captured text relatedness with entity transition sequences, i.e. entity-based summarization, we are particularly interested in relevance between events in event-based summarization.</Paragraph>
    <Paragraph position="4"> Extractive summarization requires ranking sentences with respect to their importance.</Paragraph>
    <Paragraph position="5"> Successfully used in Web-link analysis and more recently in text summarization, Google's PageRank (Brin and Page, 1998) is one of the most popular ranking algorithms. It is a graph-based ranking algorithm that decides the importance of a node within a graph by taking into account global information recursively computed from the entire graph, rather than relying only on local node-specific information. A graph can be constructed by adding a node for each sentence, phrase or word. Edges between nodes are established using inter-sentence similarity relations as a function of content overlap, or using grammatical relations between words or phrases.</Paragraph>
    <Paragraph position="6"> The application of PageRank in sentence extraction was first reported in (Erkan and Radev, 2004). The similarity between two sentence nodes according to their term vectors was used to generate links and define link strength. The same idea was followed and investigated extensively (Mihalcea, 2005). Yoshioka and Haraguchi (2004) went one step further toward event-based summarization. Two sentences were linked if they shared similar events. When tested on TSC-3, the approach favoured longer summaries. In contrast, the importance of the verbs and nouns constructing events was evaluated with PageRank as individual nodes aligned by their dependence relations (Vanderwende et al., 2004; Leskovec et al., 2004).</Paragraph>
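    <Paragraph> To make the sentence-graph idea concrete, the following is a minimal sketch, not the implementation of any of the cited systems. It assumes whitespace tokenization, raw term counts as term vectors, cosine similarity as link strength and a fixed number of power iterations; the function names cosine and rank_sentences are illustrative only.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words term vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_sentences(sentences, d=0.85, iterations=50):
    """Weighted PageRank over a sentence-similarity graph (power iteration)."""
    vectors = [Counter(s.lower().split()) for s in sentences]
    n = len(vectors)
    # Link strength is the cosine similarity of term vectors; no self-loops.
    sim = [[cosine(vectors[i], vectors[j]) if i != j else 0.0
            for j in range(n)] for i in range(n)]
    out_sum = [sum(row) for row in sim]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            # Each neighbour j passes on its score in proportion to the
            # strength of its link to i; isolated sentences pass on nothing.
            incoming = sum(sim[j][i] / out_sum[j] * scores[j]
                           for j in range(n) if out_sum[j])
            new_scores.append((1 - d) / n + d * incoming)
        scores = new_scores
    return scores
```

    Sentences that share vocabulary with many others accumulate score from their neighbours, while a sentence with no lexical overlap keeps only the baseline term.</Paragraph>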
    <Paragraph position="7"> Although we agree that the fabric of event constitutions constructed by their syntactic relations can help dig out the important events, we have two comments. First, not all verbs denote event happenings. Second, semantic similarity or relatedness between action words should be taken into account.</Paragraph>
    <Paragraph position="8">  3. Event-based Summarization 3.1. Event Definition and Event Map  Events can be broadly defined as &quot;Who did What to Whom When and Where&quot;. Both linguistic and empirical studies acknowledge that event arguments help characterize the effects of a verb's event structure, even though verbs or other words denoting events determine the semantics of an event. In this paper, we choose verbs (such as &quot;elect&quot;) and action nouns (such as &quot;supervision&quot;) as event terms that can characterize or partially characterize actions or incident occurrences. They roughly relate to &quot;did What&quot;. One or more associated named entities are treated as what linguists call event arguments. Four types of named entities are currently under consideration: &lt;Person&gt;, &lt;Organization&gt;, &lt;Location&gt; and &lt;Date&gt;.</Paragraph>
    <Paragraph position="9"> They convey the information of &quot;Who&quot;, &quot;Whom&quot;, &quot;When&quot; and &quot;Where&quot;. A verb or an action noun is deemed an event term only when it appears at least once between two named entities.</Paragraph>
    <Paragraph position="10"> Events are commonly related to one another semantically, temporally, spatially, causally or conditionally, especially when the documents to be summarized are about the same or very similar topics. Therefore, all event terms and named entities involved can be explicitly connected or implicitly related, and weave a document or a set of documents into an event fabric, i.e. an event graphical representation (see Figure 1). The nodes in the graph are of two types. Event terms (ET) are indicated by rectangles and named entities (NE) are indicated by ellipses. They represent concepts rather than instances. Words in either their original form or morphological variations are represented with a single node in the graph, regardless of how many times they appear in the documents. We call this representation an event map, from which the most important concepts can be picked out for the summary.</Paragraph>
    <Paragraph position="11"> Figure 1. Sample sentences and their graphical representation. The advantage of representing actions and entities as separate nodes, rather than simply combining them into one event or sentence node, is that it provides a convenient way of analyzing the relevance among event terms and named entities by either their semantic or their distributional similarity. More importantly, this favors the extraction of concepts and makes conceptual compression possible.</Paragraph>
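    <Paragraph> The selection rule for event terms can be sketched as follows. This is an illustration under stated assumptions, not our actual implementation: the input is assumed to be a sentence already tagged by some upstream tool, the tag names NE, VERB and ACTION_NOUN are hypothetical, and "between two named entities" is read as having at least one named entity on each side within the sentence.

```python
EVENT_TAGS = {"VERB", "ACTION_NOUN"}  # hypothetical tag names


def event_terms(tagged_sentence):
    """Return the candidate terms that have a named entity on both sides.

    tagged_sentence is a list of (token, tag) pairs in sentence order.
    """
    ne_positions = [i for i, (_, tag) in enumerate(tagged_sentence)
                    if tag == "NE"]
    if len(ne_positions) == 0:
        return []
    first, last = ne_positions[0], ne_positions[-1]
    # With a single named entity, first == last and nothing qualifies.
    return [word for i, (word, tag) in enumerate(tagged_sentence)
            if tag in EVENT_TAGS and i > first and last > i]
```

    Applied to a tagged version of the sample sentence about America Online, the verbs and action nouns lying between the first and last named entity are kept, and any candidate after the last named entity is discarded.</Paragraph>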
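    <Paragraph> A minimal sketch of how such an event map might be assembled is given below. It assumes events have already been extracted as pairs of an event term and its named entities; the small LEMMAS table is a hypothetical stand-in for a real morphological analyser, and build_event_map is an illustrative name.

```python
from collections import defaultdict

# Hypothetical lemma table standing in for a real morphological analyser.
LEMMAS = {"bought": "buy", "buys": "buy", "elected": "elect"}


def build_event_map(events):
    """Build a two-type graph from (event_term, [named_entity, ...]) pairs.

    Returns (et_nodes, ne_nodes, edges), where edges[(et, ne)] counts how
    often the event term and the named entity occur in the same event.
    """
    et_nodes, ne_nodes = set(), set()
    edges = defaultdict(int)
    for term, entities in events:
        # Morphological variants collapse into one concept node.
        et = LEMMAS.get(term.lower(), term.lower())
        et_nodes.add(et)
        for ne in entities:
            ne_nodes.add(ne)
            edges[(et, ne)] += 1
    return et_nodes, ne_nodes, dict(edges)
```

    Because nodes stand for concepts rather than instances, repeated mentions only increase the count on an existing edge and never add a new node.</Paragraph>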
    <Paragraph position="12"> We then integrate the strength of the connections between nodes into this graphical model in terms of the relevance defined from different perspectives. The relevance is indicated by $r(node_i, node_j)$, where $node_i$ and $node_j$ represent two nodes, each either an event term ($et$) or a named entity ($ne$). Then, the significance of each node, indicated by $w(node_i)$, is calculated with the PageRank ranking algorithm. Sections 3.2 and 3.3 address the issues of deriving $r(node_i, node_j)$ according to intra- and/or inter-event relevance and calculating $w(node_i)$ in detail.</Paragraph>
    <Paragraph position="16"> Sample sentence (Figure 1): &lt;Organization&gt; America Online &lt;/Organization&gt; was to buy &lt;Organization&gt; Netscape &lt;/Organization&gt; and forge a partnership with &lt;Organization&gt; Sun &lt;/Organization&gt;, benefiting all three and giving technological independence from &lt;Organization&gt; Microsoft &lt;/Organization&gt;.</Paragraph>
    <Section position="1" start_page="371" end_page="371" type="sub_section">
      <SectionTitle>
3.2 Intra- and Inter- Event Relevance
</SectionTitle>
      <Paragraph position="0"> We consider both intra-event and inter-event relevance for summarization. Intra-event relevance measures how an action is associated with its arguments. It is indicated as $R(ET, NE)$ and $R(NE, ET)$ in Table 1 below. This is a kind of direct relevance, as the connections between actions and arguments are established directly from the text surface. No inference or background knowledge is required.</Paragraph>
      <Paragraph position="1"> We consider that when the connection between an event term $et_i$ and a named entity</Paragraph>
      <Paragraph position="3"> are related as explained in Section 2. By means of inter-event relevance, we consider how an event term (or a named entity involved in an event) is associated with another event term (or another named entity involved in the same or different events) syntactically, semantically and distributionally. It is indicated by $R(ET, ET)$ or $R(NE, NE)$ in Table 1 and measures an indirect connection that is not explicit in the event map and needs to be derived from an external resource or the overall event distribution.</Paragraph>
      <Paragraph position="4">  The intra-event relevance $R(ET, NE)$ can be simply established by counting how many times</Paragraph>
      <Paragraph position="6"> One way to measure the term relevance is to make use of a general language knowledge base, such as WordNet (Fellbaum, 1998). WordNet::Similarity is a freely available software package that makes it possible to measure the semantic relatedness between a pair of concepts, or in our case event terms, based on WordNet (Pedersen, Patwardhan and Michelizzi, 2004). It supports three measures. The one we choose is the function lesk.</Paragraph>
      <Paragraph position="8"> Alternatively, term relevance can be measured according to the distribution of the terms in the specified documents. We believe that if two events are concerned with the same participants, or occur at the same location or at the same time, these two events are interrelated in some way. This observation motivates us to derive event term relevance from the number of named entities they share.</Paragraph>
      <Paragraph position="10"> associate. $|\cdot|$ indicates the number of elements in the set. The relevance of named entities can be derived in a similar way.</Paragraph>
      <Paragraph position="12"> The relevance derived with (E3) and (E4) is indirect relevance. In previous work, a clustering algorithm, shown in Figure 2, was proposed (Xu et al., 2006) to merge the named entities that refer to the same person (such as Ranariddh, Prince Norodom Ranariddh and President Prince Norodom Ranariddh). It is used for co-reference resolution and aims at joining the same concept into a single node in the event map. The experimental results suggest that merging named entities improves performance to some extent, but not markedly. When applying the same algorithm to cluster all four types of named entities in the DUC data, we observe that the named entities in the same cluster do not always refer to the same object, even when they are indeed related in some way. For example, &quot;Mississippi&quot; is a state in the southeast United States, while &quot;Mississippi River&quot; is the second-longest river in the United States and flows through &quot;Mississippi&quot;.</Paragraph>
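      <Paragraph> The idea of deriving event term relevance from shared named entities can be sketched as below. This is an assumption-laden illustration: it uses the raw size of the intersection of the two entity sets as the relevance value, which may differ from the exact form of the equation, and shared_entity_relevance is an illustrative name.

```python
def shared_entity_relevance(ne_sets):
    """Relevance between event terms as the number of shared named entities.

    ne_sets maps each event term to the set of named entities it is
    associated with. Pairs that share no entity are left out.
    """
    terms = sorted(ne_sets)
    relevance = {}
    for i, a in enumerate(terms):
        for b in terms[i + 1:]:
            shared = len(ne_sets[a].intersection(ne_sets[b]))
            if shared:
                # The measure is symmetric, so store both directions.
                relevance[(a, b)] = shared
                relevance[(b, a)] = shared
    return relevance
```

      Swapping the roles of the two node types, so that each named entity maps to the set of event terms it is associated with, gives the corresponding sketch for named entity relevance.</Paragraph>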
      <Paragraph position="13"> Step 1: Each named entity is represented by the sequence of its words, where $w_i$ is the $i$th word in it. The cluster it belongs to is initialized to the named entity itself.</Paragraph>
      <Paragraph position="18"> Step 2: For each named entity</Paragraph>
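      <Paragraph> Since the steps of Figure 2 are only partly legible here, the following sketch should be read as one plausible word-overlap clustering and not as the algorithm of Xu et al. (2006). The merging rule, which joins two named entities when the words of one are contained in the other, is an assumption, as is the name cluster_entities.

```python
def cluster_entities(entities):
    """Group named entities whose word sets are in a subset relation."""
    words = {e: set(e.lower().split()) for e in entities}
    parent = {e: e for e in entities}  # each entity starts as its own cluster

    def find(e):
        # Follow parent links up to the representative of the cluster.
        while parent[e] != e:
            e = parent[e]
        return e

    # Assumed merging rule: join a and b when every word of a occurs in b.
    for a in entities:
        for b in entities:
            if a != b and words[a].issubset(words[b]):
                parent[find(a)] = find(b)

    clusters = {}
    for e in entities:
        clusters.setdefault(find(e), []).append(e)
    return sorted(sorted(c) for c in clusters.values())
```

      Under this rule the three variants of Ranariddh end up in one cluster, but "Mississippi" and "Mississippi River" are merged as well, which reproduces the over-merging problem observed on the DUC data.</Paragraph>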
      <Paragraph position="20"> In addition, the relevance of named entities can sometimes be revealed by sentence context. Take the following most frequently used sentence patterns as examples: Figure 3. The example patterns. Considering that two neighbouring named entities in a sentence are usually relevant, the following window-based relevance is also experimented with.</Paragraph>
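      <Paragraph> A window-based relevance of this kind might be computed as in the sketch below. The exact definition is not reproduced here, so the details are assumptions: each sentence is reduced to its named entities in order of appearance, the window is measured in entity positions rather than words, and the relevance of a pair is the number of times it falls inside the window.

```python
from collections import Counter


def window_relevance(sentences, window=1):
    """Count pairs of named entities that appear close together.

    sentences is a list of lists of named entities, each in order of
    appearance. A pair is counted when the positions of the two entities
    in that sequence differ by at most `window`.
    """
    counts = Counter()
    for entities in sentences:
        for i, a in enumerate(entities):
            for b in entities[i + 1:i + 1 + window]:
                if a != b:
                    # Sort the pair so that the count is order-independent.
                    counts[tuple(sorted((a, b)))] += 1
    return counts
```

      With a window of one, only directly neighbouring named entities are related; widening the window trades precision for coverage.</Paragraph>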
    </Section>
    <Section position="2" start_page="371" end_page="371" type="sub_section">
      <SectionTitle>
3.3 Significance of Concepts
</SectionTitle>
      <Paragraph position="0"> The significance score, i.e. the weight $w(node_i)$ of each node $node_i$, is then estimated recursively with the PageRank ranking algorithm, which assigns the significance score to each node according to the number of nodes connecting to it as well as the strength of their connections. The  node . $d$ is the factor used to avoid the limitation of loops in the map structure. It is set to 0.85 experimentally. The significance of each sentence to be included in the summary is then obtained from the significance of the events it contains. The sentences with higher significance are selected for the summary, provided that they are not exact duplicates of one another. We are aware of the important roles of information fusion and sentence compression in summary generation. However, the focus of this paper is to evaluate event-based approaches in extracting the most important sentences. Conceptual extraction based on event relevance is our future direction.</Paragraph>
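      <Paragraph> The recursive estimation can be sketched as follows. The update used here is a common weighted form of PageRank, in which each node passes on its weight in proportion to the strength of its connections; it is an assumption and may differ in normalization from the equation used in our experiments. Summing node weights to score a sentence is likewise only one possible reading of how sentence significance is obtained, and both function names are illustrative.

```python
def node_significance(relevance, d=0.85, iterations=50):
    """Estimate node weights with a weighted PageRank update.

    relevance maps (node_a, node_b) to a non-negative connection strength.
    """
    nodes = sorted({n for pair in relevance for n in pair})
    out_sum = {n: 0.0 for n in nodes}
    for (a, _), r in relevance.items():
        out_sum[a] += r
    w = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        new_w = {n: 1 - d for n in nodes}
        for (a, b), r in relevance.items():
            if out_sum[a] > 0:
                # Node a contributes to b in proportion to the link strength.
                new_w[b] += d * w[a] * r / out_sum[a]
        w = new_w
    return w


def sentence_significance(sentence_nodes, w):
    """Score a sentence by summing the weights of the concepts it contains."""
    return sum(w.get(n, 0.0) for n in set(sentence_nodes))
```

      Sentences can then be sorted by this score and added to the summary in descending order, skipping any sentence identical to one already selected.</Paragraph>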
    </Section>
  </Section>
</Paper>