<?xml version="1.0" standalone="yes"?>
<Paper uid="C04-1108">
  <Title>Improving Chronological Sentence Ordering by Precedence Relation</Title>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
2 Sentence Ordering
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.1 Sentence ordering problem
</SectionTitle>
      <Paragraph position="0"> Our goal is to determine the most probable permutation of the given sentences and to generate a well-structured text. When a human is asked to arrange sentences, he or she can perform this task without difficulty, just as we write out thoughts in a text. However, we must consider what makes this task possible, since computers are by nature unaware of the order of things. Discourse coherence, as typified by rhetorical relations (Mann and Thompson, 1988) and coherence relations (Hobbs, 1990), helps answer this question. Hume (Hume, 1748) claimed that the qualities from which association arises, and by which the mind is conveyed from one idea to another, are three: resemblance; contiguity in time or place; and cause and effect. [Figure 1: three example sentences. a) Dolly the clone sheep was born in 1996. b) The father is of a different kind and Dolly had been pregnant for about five months. c) Dolly gave birth to two children in her life.]</Paragraph>
      <Paragraph position="1"> That is to say, we should organize a text from frag[...] relation. This is especially true when ordering sentences from newspaper articles, because we must arrange a large number of time-series events concerning several topics.</Paragraph>
      <Paragraph position="2"> Barzilay et al. (Barzilay et al., 2002) address the problem of sentence ordering in the context of multi-document summarization and the impact of sentence ordering on the readability of a summary. They propose two naive sentence-ordering techniques: majority ordering (which examines the most frequent orders in the original documents) and chronological ordering (which orders sentences by publication date). After showing that the naive ordering algorithms do not produce satisfactory orderings, Barzilay et al. also investigate, through experiments with humans, how to identify patterns of orderings that can improve the algorithm. Based on the experiments, they propose another algorithm that combines chronological ordering with topical segmentation to separate sentences referring to one topic from those referring to another.</Paragraph>
      <Paragraph position="3"> Lapata (Lapata, 2003) proposes another approach to information ordering, based on a probabilistic model that assumes the probability of any given sentence is determined by its adjacent sentence and that learns constraints on sentence order from a corpus of domain-specific texts. Lapata estimates the transition probability between sentences from features such as verbs (precedence relationships of verbs in the corpus), nouns (entity-based coherence, obtained by keeping track of the nouns) and dependencies (the structure of sentences).</Paragraph>
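      <Paragraph> The adjacency assumption described above can be written compactly. The following is only a sketch of that assumption in our own notation (the symbols S_i for the i-th sentence of a candidate ordering and n for the number of sentences are illustrative, and S_0 is taken to be a hypothetical start marker); it is not quoted from Lapata's paper:

```latex
P(S_1, \dots, S_n) \approx \prod_{i=1}^{n} P(S_i \mid S_{i-1})
```

Under this reading, ordering amounts to searching for the permutation that maximizes the product of the sentence-to-sentence transition probabilities.</Paragraph>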
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.2 Improving chronological ordering
</SectionTitle>
      <Paragraph position="0"> Against the background of these studies, we propose the use of antecedent sentences to arrange sentences. Let us consider the example shown in Figure 1. There are three sentences a, b, and c, from which we get the order [a-b-c] by chronological ordering. When we read these sentences in this order, we find sentence b to be incorrectly positioned. This is because sentence b is written on the presupposition that the reader already knows that Dolly had a child. In other words, it is more fitting to regard sentence b as an elaboration of sentence c. As one may easily imagine, some sentences precede sentence b in its original document. The lack of this presupposed content obscures what a sentence is saying and confuses readers. Hence, we should refine the chronological order and revise it to [a-c-b], putting sentence c before sentence b.</Paragraph>
      <Paragraph position="1"> Figure 2 shows a block diagram of our ordering algorithm. Given nine sentences denoted by [a b ... i], for example, the algorithm eventually produces an ordering such as [a-b-f-c-i-g-d-h-e]. Like conventional ordering techniques (Barzilay et al., 2002), we consider topical segmentation and chronological ordering to be fundamental to sentence ordering, and we attempt to refine the resulting ordering. We first recognize topics in the source documents to separate sentences referring to one topic from those referring to another. In the Figure 2 example we obtain two topical segments (clusters) as the output of topical clustering. In the second phase we order the sentences of each segment chronologically. If two sentences have the same time stamp, we settle their order on the basis of sentence position and resemblance relation. Finally, we refine each ordering by resolving antecedent sentences and output the final ordering. In the rest of this section we give a detailed description of each phase.</Paragraph>
    </Section>
    <Section position="3" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.3 Topical clustering
</SectionTitle>
      <Paragraph position="0"> The first task is to categorize sentences by their topics. We assume that a newspaper article is written about one topic. Hence, to classify the topics of sentences, we only have to classify articles by their topics. Suppose we are given l articles in which we find m distinct terms. Let D be a document-term matrix (l x m), whose element Dij represents the frequency of term #j in document #i. We use Di to denote the term vector of document #i (the i-th row vector of D). After measuring the distance or dissimilarity between two articles #x and #y:</Paragraph>
      <Paragraph position="2"> we apply the nearest neighbor method (Cover and Hart, 1967) to merge a pair of clusters when their minimum distance is lower than a given parameter a = 0.3 (determined empirically). Finally, we classify sentences according to the topical clusters, assuming that a sentence in a document belonging to a cluster also belongs to that cluster.</Paragraph>
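      <Paragraph> The clustering step can be sketched as follows. The dissimilarity measure itself is not given in the text above, so this sketch assumes cosine distance between term-frequency vectors; the function names (cosine_distance, cluster_articles) and the toy matrix are illustrative only. Clusters are merged while the minimum distance between any two of their articles is below the threshold.

```python
import math
from itertools import combinations


def cosine_distance(u, v):
    """Assumed dissimilarity: 1 - cosine similarity of two term vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 if norm == 0 else 1.0 - dot / norm


def cluster_articles(doc_term, threshold=0.3):
    """Merge clusters whose minimum article-to-article distance is below threshold.

    doc_term is a list of term-frequency rows, one row per article.
    Returns a sorted list of clusters, each a sorted list of article indices.
    """
    clusters = [{i} for i in range(len(doc_term))]
    while True:
        best = None  # (distance, index p, index q) of the closest mergeable pair
        for p, q in combinations(range(len(clusters)), 2):
            d = min(cosine_distance(doc_term[x], doc_term[y])
                    for x in clusters[p] for y in clusters[q])
            if threshold > d and (best is None or best[0] > d):
                best = (d, p, q)
        if best is None:
            return sorted(sorted(c) for c in clusters)
        _, p, q = best
        clusters[p] |= clusters[q]  # p comes before q, so deleting q is safe
        del clusters[q]


# Toy document-term matrix: articles 0 and 1 share terms, as do articles 2 and 3.
docs = [[2, 1, 0, 0], [1, 1, 0, 0], [0, 0, 3, 1], [0, 0, 1, 1]]
print(cluster_articles(docs))
```

Every sentence then inherits the cluster of the article it came from, as described above.</Paragraph>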
    </Section>
    <Section position="4" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.4 Chronological ordering
</SectionTitle>
      <Paragraph position="0"> It is difficult for computers to find a resemblance or cause-effect relation between two phenomena, and we have no conclusive evidence as to whether a pair of sentences gathered arbitrarily from multiple documents is related at all. A newspaper usually deals with novel events that have occurred since the last publication. Hence, the publication date (time) of each article turns out to be a good estimator of the resemblance relation (i.e., we observe a trend or series of relevant events in a time period), contiguity in time, and the cause-effect relation (i.e., an event occurs as a result of previous events). Although resolving temporal expressions in sentences (e.g., yesterday, the next year, etc.) (Mani and Wilson, 2000; Mani et al., 2003) may give a more precise estimation of these relations, it is not an easy task.</Paragraph>
      <Paragraph position="1"> For this reason we order the sentences of each segment (cluster) chronologically, assigning each sentence a time stamp given by its publication date (i.e., the date when the article was written).</Paragraph>
      <Paragraph position="2"> When several sentences have the same time stamp, we settle their order on the basis of sentence position and sentence connectivity. If two sentences have the same time stamp and belong to the same article, we restore their original order. If sentences have the same time stamp but come from different articles, we place first the sentence that is more similar to the previously ordered sentences, to assure sentence connectivity.</Paragraph>
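      <Paragraph> A minimal sketch of this tie-breaking scheme follows. The Sentence fields, the use of Jaccard overlap of term sets as the similarity, and the toy data are all assumptions made for illustration; the text does not specify the similarity measure.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class Sentence:
    label: str
    date: str          # ISO publication date of the source article
    article: int       # hypothetical article identifier
    position: int      # index of the sentence within its article
    terms: frozenset   # content words (hypothetical representation)


def jaccard(a, b):
    """Assumed similarity: overlap of two term sets."""
    union = a.union(b)
    return len(a.intersection(b)) / len(union) if union else 0.0


def chronological_order(sentences):
    ordered = []
    by_date = sorted(sentences, key=lambda s: s.date)
    for _, group in groupby(by_date, key=lambda s: s.date):
        remaining = sorted(group, key=lambda s: (s.article, s.position))
        while remaining:
            # Only the earliest remaining sentence of each article is eligible,
            # which restores the original order inside an article.
            heads = {}
            for s in remaining:
                heads.setdefault(s.article, s)
            # Among articles, prefer the sentence most similar to what is
            # already ordered (sentence connectivity).
            context = frozenset().union(*(s.terms for s in ordered))
            best = max(heads.values(), key=lambda s: jaccard(s.terms, context))
            ordered.append(best)
            remaining.remove(best)
    return ordered


toy = [
    Sentence("a", "2000-01-01", 1, 0, frozenset({"sheep", "clone", "born"})),
    Sentence("b", "2000-01-02", 2, 3, frozenset({"farm", "lamb"})),
    Sentence("c", "2000-01-02", 3, 0, frozenset({"sheep", "birth"})),
    Sentence("d", "2000-01-02", 2, 5, frozenset({"farm", "visit"})),
]
print([s.label for s in chronological_order(toy)])
```

In the toy data, sentences b, c, and d share a date; c is placed first because it overlaps with the already ordered sentence a, and b stays ahead of d because both come from the same article.</Paragraph>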
    </Section>
    <Section position="5" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.5 Ordering refinement by precedence relation
</SectionTitle>
      <Paragraph position="0"> After we obtain an ordering of a topical segment by chronological ordering, we improve it, as shown in Figure 1, based on antecedent sentences. Figure 3 shows the background idea of ordering refinement by precedence relation. Just as in the example in Figure 1, we have three sentences a, b, and c in chronological order. First we take sentence a from the sentences and check its antecedent sentences. Seeing that there are no sentences prior to sentence a in article #1, we place sentence a here. Then we take sentence b from the remaining sentences and check its antecedent sentences.</Paragraph>
      <Paragraph position="1"> This time we find several sentences before sentence b in article #2. Grasping what the antecedent sentences are saying, we first confirm whether their content is mentioned by the previously arranged sentences (i.e., sentence a). If it is mentioned, we put sentence b here and extend the ordering to [a-b]. Otherwise, we search the remaining sentences (i.e., sentence c in this example) for a substitute for what the precedent sentences are saying. In the Figure 3 example, we find that sentence a does not refer to what sentence c' is saying, whereas sentence c approximately does.</Paragraph>
      <Paragraph position="2"> [Figure 4: Refinement of the chronological ordering [a-b-c-d-e-f]. (1) No precedent sentences before sentence a. Choose a. (2) No precedent sentences before sentence b. Choose b. (3) There are precedent sentences before sentence c. Search a shortest path from c to b and a. We found sentence e to be the closest to the precedent sentences of c. Search a shortest path from e to b and a. No precedent sentences before e. Choose e. We find a path from c to b and a via e is the shortest. (4) There are precedent sentences before sentence d. Search a shortest path from d to c, e, b and a. We find the direct path from d to c is the shortest. (5) Choose the rest, sentence f. The refined ordering is a-b-e-c-d-f.]</Paragraph>
      <Paragraph position="7">  Putting sentence c before b, we finally get the refined ordering [a-c-b].</Paragraph>
      <Paragraph position="8"> Supposing that sentence c mentions information similar to that of c' but expresses more than c', it is not unusual for an extraction method to choose sentence c rather than sentence c'.</Paragraph>
      <Paragraph position="9"> Because a method for multi-document summarization (e.g., MMR (Carbonell and Goldstein, 1998)) strives for information coverage while rejecting redundant information, it is quite natural that the method does not choose both sentence c' and sentence c, for reasons of redundancy, and prefers sentence c to c', for reasons of information coverage.</Paragraph>
      <Paragraph position="10"> Figure 4 illustrates how the algorithm refines a given chronological ordering [a-b-c-d-e-f].</Paragraph>
      <Paragraph position="11"> We define distance as a dissimilarity value of the precedent information of a sentence. When a sentence has antecedent sentences and their content is not mentioned by the previously arranged sentences, this distance will be high. When a sentence has no precedent sentences, we define the distance to be 0. In the Figure 4 example we do not change the position of sentences a and b because they do not have precedent sentences (i.e., they are lead sentences). On the other hand, sentence c has some precedent sentences in its original document. Preparing a term vector of the precedent sentences, we calculate how much of the precedent content is covered by other sentences, using the distance defined above. In the Figure 4 example the distance from sentences a and b to c is high (distance = 0.7). We search for a shortest path from sentence c to sentences a and b by best-first search in order to find suitable sentences to place before sentence c. Given that sentence e in Figure 4 describes content similar to the precedent sentences of sentence c and is a lead sentence, we trace the shortest path from sentence c to sentences a and b via sentence e. We extend the resultant ordering to [a-b-e-c], inserting sentence e before sentence c. Then we consider sentence d, which is again not a lead sentence (distance = 0.4). Preparing a term vector of the precedent sentences of sentence d, we search for a shortest path from sentence d to sentences a, b, c, and e. The search result shows that we should leave sentence d in place this time, because its precedent content seems to be described better by sentences a, b, c, and e than by f. In this way we get the final ordering, [a-b-e-c-d-f].</Paragraph>
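      <Paragraph> The refinement can be illustrated with a simplified sketch. It is not the best-first shortest-path search described above: it swaps in a greedy recursive insertion, and it assumes that the distance is the fraction of precedent terms not yet covered by the arranged sentences. The names precedence_distance and refine, the set-based term representation, and the toy data are all hypothetical.

```python
def precedence_distance(precedent_terms, arranged_terms):
    """0.0 for lead sentences; otherwise the uncovered share of precedent terms."""
    if not precedent_terms:
        return 0.0
    covered = len(precedent_terms.intersection(arranged_terms))
    return 1.0 - covered / len(precedent_terms)


def refine(chronological, terms, precedents):
    """Greedy refinement of a chronological ordering (a simplification).

    chronological: list of sentence labels in chronological order.
    terms / precedents: label -> frozenset of terms in the sentence itself /
    in the sentences that precede it in its original article.
    """
    ordered, pending = [], list(chronological)

    def context():
        return frozenset().union(*(terms[s] for s in ordered))

    def place(label, visiting):
        visiting = visiting | {label}  # guards against circular requirements
        while True:
            current = precedence_distance(precedents[label], context())
            candidates = [s for s in pending if s not in visiting]
            if current == 0.0 or not candidates:
                break

            def after(s):
                # Distance of label if candidate s were arranged first.
                return precedence_distance(
                    precedents[label], context().union(terms[s]))

            best = min(candidates, key=after)
            if 0.0 >= current - after(best):
                break  # no remaining sentence covers more precedent content
            place(best, visiting)  # put the covering sentence first
        pending.remove(label)
        ordered.append(label)

    while pending:
        place(pending[0], frozenset())
    return ordered


# Toy data loosely mirroring the Figure 4 walk-through.
terms = {"a": frozenset({"t1"}), "b": frozenset({"t2"}),
         "c": frozenset({"t3"}), "d": frozenset({"t4"}),
         "e": frozenset({"p1", "p2"}), "f": frozenset({"t6"})}
precedents = {"a": frozenset(), "b": frozenset(),
              "c": frozenset({"p1", "p2"}), "d": frozenset({"t3", "q1"}),
              "e": frozenset(), "f": frozenset()}
print("-".join(refine(list("abcdef"), terms, precedents)))
```

With this toy data, sentence e is pulled in front of c because it covers the precedent content of c, while d stays in place because the only remaining sentence, f, covers none of its missing precedent content.</Paragraph>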
    </Section>
  </Section>
</Paper>