<?xml version="1.0" standalone="yes"?>
<Paper uid="P02-1059">
  <Title>Supervised Ranking in Open-Domain Text Summarization</Title>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3 Diversity Based Summarization
</SectionTitle>
    <Paragraph position="0"> As an unsupervised summarizer, we use diversity based summarization (DBS) (Nomoto and Matsumoto, 2001c). It takes a cluster-and-rank approach to generating summaries. The idea is to form a summary by collecting sentences representative of diverse topics discussed in the text. A nice feature about their approach is that by creating a summary covering potential topics, which could be marginal to the main thread of the text, they are in fact able to accommodate the variability in sentence selection: some people may pick up subjects (sentences) as important which others consider irrelevant or only marginal for summarization. DBS accomodates this situation by picking them all, however marginal they might be.</Paragraph>
    <Paragraph position="1"> More specifically, DBS is a tripartite process consisting of the following:  1. Find-Diversity: find clusters of lexically similar sentences in text. (In particular, we represent a sentence here a vector of tfidf weights of index terms it contains.) 2. Reduce-Redundancy: for each cluster found, choose a sentence that best represents that cluster. null 3. Generate-Summary: collect the representa null tive sentences, put them in some order, and return them to the user.</Paragraph>
    <Paragraph position="2"> Find-Diversity is based on the K-means clustering algorithm, which they extended with Minimum Description Length Principle (MDL) (Li, 1998; Yamanishi, 1997; Rissanen, 1997) as a way of optimizing K-means. Reduce-Redundancy is a tfidf based ranking model, which assigns weights to sentences in the cluster and returns a sentence that ranks highest. The weight of a sentence is given as the sum of tfidf scores of terms in the sentence.</Paragraph>
    <Paragraph position="3">  function. t(~u) is some leaf node assigned to ~u by DT. P(Select j ~u;DT) = fi the number of &amp;quot;Select&amp;quot; sentences at t(~u) the total number of sentences at t(~u)</Paragraph>
  </Section>
  <Section position="5" start_page="0" end_page="0" type="metho">
    <SectionTitle>
4 Combining ProbDT and DBS
</SectionTitle>
    <Paragraph position="0"> Combining ProbDT and DBS is done quite straight-forwardly by replacing Reduce-Redundacy with ProbDT. Thus instead of picking up a sentence with the highest tfdif based weight, DBS/ProbDT attempts to find a sentences with the highest score for P(Select j ~u;DT).</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.1 Features
</SectionTitle>
      <Paragraph position="0"> The following lists a set of features used for encoding a sentence in ProbDT. Most of them are either length- or location-related features.1  a decision tree or, for that matter, to use features other than tfidf for representing sentences in clustering. The idea is worthy of consideration, but not pursued here.</Paragraph>
      <Paragraph position="1">  graph in which X occurs, 'Length(Par(X))' denotes the number of sentences that occur in that paragraph. LocWithinPar takes continuous values ranging from 0 to l!1l , where l is the length of a paragraph: a paragraph initial sentence would have 0 and a paragraph final sentence l!1l .</Paragraph>
      <Paragraph position="2"> &lt;LenText&gt; The text length in Japanese character i.e. kana, kanji.</Paragraph>
      <Paragraph position="3"> &lt;LenSen&gt; The sentence length in kana/kanji.</Paragraph>
      <Paragraph position="4"> Some work in Japanese linguistics found that a particular grammatical class a sentence final element belongs to could serve as a cue to identifying summary sentences. These include categories like PAST/NON-PAST, INTERROGATIVE, and NOUN and QUESTION-MARKER. Along with Ichikawa (1990), we identified a set of sentence-ending cues and marked a sentence as to whether it contains a cue from the set.2 Included in the set are inflectional classes PAST/NON-PAST (for the verb and verbal adjective), COPULA, and NOUN, parentheses, and QUESTION-MARKER -ka. We use the following attribute to encode a sentence-ending form.</Paragraph>
      <Paragraph position="5"> &lt;EndCue&gt; The feature encodes one of sentence2Word tokens are extracted by using CHASEN, a Japanese morphological analyzer which is reported to achieve the accuracy rate of over 98% (Matsumoto et al., 1999). ending forms described above. It is a discrete valued feature. The value ranges from 0 to 6. (See Table 2 for details.) Finally, one of two class labels, 'Select' and 'Don't Select', is assigned to a sentence, depending on whether it is wis or not. The 'Select' label is for wis sentences, and the 'Don't Select' label for non-wis sentences.</Paragraph>
    </Section>
  </Section>
  <Section position="6" start_page="0" end_page="0" type="metho">
    <SectionTitle>
5 Decision Tree Algorithms
</SectionTitle>
    <Paragraph position="0"> To examine the generality of our approach, we consider, in addition to C4.5 (Quinlan, 1993), the following decision tree algorithms. C4.5 is used with default options, e.g., CF=25%.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
5.1 MDL-DT
</SectionTitle>
      <Paragraph position="0"> MDL-DT stands for a decision tree with MDL based pruning. It strives to optimize the decision tree by pruning the tree in such a way as to produce the shortest (minimum) description length for the tree. The description length refers to the number of bits required for encoding information about the decision tree. MDL ranks, along with Akaike Information Criterion (AIC) and Bayes Information Criterion (BIC), as a standard criterion in machine learning and statistics for choosing among possible (statistical) models. As shown empirically in Nomoto and Matsumoto (2000) for discourse domain, pruning DT with MDL significantly reduces the size of tree, while not compromising performance. null</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
5.2 SSDT
</SectionTitle>
      <Paragraph position="0"> SSDT or Subspace Splitting Decision Tree represents another form of decision tree algorithm.(Wang and Yu, 2001) The goal of SSDT is to discover patterns in highly biased data, where a target class, i.e., the class one likes to discover something about, accounts for a tiny fraction of the whole data. Note that the issue of biased data distribution is particularly relevant for summarization, as a set of sentences to be identified as wis usually account for a very small portion of the data.</Paragraph>
      <Paragraph position="1"> SSDT begins by searching the entire data space for a cluster of positive cases and grows the cluster by adding points that fall within some distance to the center of the cluster. If the splitting based on the cluster offers a better Gini index than simply using  positive class, white circles represent negative class. SSDT starts with a small spherical cluster of positive points (solid circle) and grows the cluster by 'absorbing' positive points around it (dashed circle). one of the attributes to split the data, SSDT splits the data space based on the cluster, that is, forms one region outside of the cluster and one inside.3 It repeats the process recursively on each subregions spawned until termination conditions are met. Figure 2 gives a snapshot of SSDT at work. SSDT locates some clusters of positive points, develops spherical clusters around them.</Paragraph>
      <Paragraph position="2"> With its particular focus on positive cases, SSDT is able to provide a more precise characterization of them, compared, for instance, to C4.5.</Paragraph>
    </Section>
  </Section>
  <Section position="7" start_page="0" end_page="0" type="metho">
    <SectionTitle>
6 Test Data and Procedure
</SectionTitle>
    <Paragraph position="0"> We asked 112 Japanese subjects (students at graduate and undergraduate level) to extract 10% sentences in a text which they consider most important in making a summary. The number of sentences to extract varied from two to four, depending on the length of a text. The age of subjects varied from 18 to 45. We used 75 texts from three different categories (25 for each category); column, editorial and news report. Texts were of about the same size in terms of character counts and the number of paragraphs, and were selected randomly from articles that appeared in a Japanese financial daily (Nihon-Keizai-Shimbun-Sha, 1995). There were, on average, 19.98 sentences per text.</Paragraph>
    <Paragraph position="1"> 3For a set S of data with k classes, its Gini index is given as: Gini(S) = 1!x50ki p2i , where pi denotes the probability of observing class i in S.</Paragraph>
    <Paragraph position="2">  The kappa agreement among subjects was 0.25. The result is in a way consistent with Salton et al. (1999), who report a low inter-subject agreement on paragraph extracts from encyclopedias and also with Gong and Liu (2001) on a sentence selection task in the cable news domain. While there are some work (Marcu, 1999; Jing et al., 1998) which do report high agreement rates, their success may be attributed to particularities of texts used, as suggested by Jing et al. (1998). Thus, the question of whether it is possible to establish an ideal summary based on agreement is far from settled, if ever. In the face of this, it would be interesting and perhaps more fruitful to explore another view on summary, that the variability of a summary is the norm rather than the exception.</Paragraph>
    <Paragraph position="3"> In the experiments that follow, we decided not to rely on a particular level of inter-coder agreement to determine whether or not a given sentence is wis. Instead, we used agreement threshold to distinguish between wis and non-wis sentences: for a given threshold K, a sentence is considered wis (or positive) if it has at least K votes in favor of its inclusion in a summary, and non-wis (negative) if not. Thus if a sentence is labeled as positive at K , 1, it means that there are one or more judges taking that sentence as wis. We examined K from 1 to 5.</Paragraph>
    <Paragraph position="4"> (On average, seven people are assigned to one article. However, one would rarely see all of them unanimously agree on their judgments.) Table 3 shows how many positive/negative instances one would get at a given agreement threshold. At K , 1, out of 1424 instances, i.e., sentences, 707 of them are marked positive and 717 are marked negative, so positive and negative instances are evenly spread across the data. On the other hand, at K , 5, there are only 72 positive instances. This means that there is less than one occurrence of wis case per article.</Paragraph>
    <Paragraph position="5"> In the experiments below, each probabilistic rendering of the DTs, namely, C4.5, MDL-DT, and SSDT is trained on the corpus, and tested with and without the diversity extension (Find-Diversity).</Paragraph>
    <Paragraph position="6"> When used without the diversity component, each ProbDT works on a test article in its entirety, producing the ranked list of sentences. A summary with compression rate is obtained by selecting top percent of the list. When coupled with Find-Diversity, on the other hand, each ProbDT is set to work on each cluster discovered by the diversity component, producing multiple lists of sentences, each corresponding to one of the clusters identified.</Paragraph>
    <Paragraph position="7"> A summary is formed by collecting top ranking sentences from each list.</Paragraph>
    <Paragraph position="8"> Evaluation was done by 10-fold cross validation. For the purpose of comparison, we also ran the diversity based model as given in Nomoto and Matsumoto (2001c) and a tfidf based ranking model (Zechner, 1996) (call it Z model), which simply ranks sentences according to the tfidf score and selects those which rank highest. Recall that the diversity based model (DBS) (Nomoto and Matsumoto, 2001c) consists in Find-Diversity and the ranking model by Zechner (1996), which they call Reduce-Redundancy.</Paragraph>
  </Section>
class="xml-element"></Paper>