<?xml version="1.0" standalone="yes"?>
<Paper uid="A97-1042">
  <Title>Identifying Topics by Position</Title>
  <Section position="4" start_page="283" end_page="289" type="metho">
    <SectionTitle>
3 Training the Rules
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="283" end_page="284" type="sub_section">
      <SectionTitle>
3.1 Background
</SectionTitle>
      <Paragraph position="0"> The purposes of our study are to clarify these contradictions, to test the abovementioned intuitions and results, and to verify the hypothesis that the importance of a sentence in a text is indeed related to its ordinal position. Furthermore, we wish to discover empirically which textual positions are in fact the richest ones for topics, and to develop a method by which the optimal positions can be determined automatically and their importance evaluated.</Paragraph>
      <Paragraph position="1"> To do all this, one requires a much larger document collection than that available to Edmundson and Baxendale. For the experiments described here, we used the Ziff-Davis texts from the corpus pro- null duced for DARPA's TIPSTER program (Harman, 1994). Volume 1 of the Ziff corpus, on which we trained the system, consists of 13,000 newspaper texts about new computers and related hardware, computer sales, etc., whose genre can be characterized as product announcements. The average text length is 71 sentences (34.4 paragraphs). Each text is accompanied by both a set of three to eight topic keywords and an abstract of approx. 6 sentences (both created by a human).</Paragraph>
      <Paragraph position="2"> In summary, we did the following: To determine the efficacy of the Position Method, we empirically determined the yield of each sentence position in the corpus, measuring against the topic keywords. We next ranked the sentence positions by their average yield to produce the Optimal Position Policy (OPP) for topic positions for the genre. Finally, now comparing to the abstracts accompanying the texts, we measured the coverage of sentences extracted from the texts according to the policy, cumulatively in the position order specified by the policy. The high degree of coverage indicated the effectiveness of the position method.</Paragraph>
    </Section>
    <Section position="2" start_page="284" end_page="284" type="sub_section">
      <SectionTitle>
3.2 Sentence Position Yields and the
Optimal Position Policy
</SectionTitle>
      <Paragraph position="0"> We determined the optimal position for topic occurrence as follows. Given a text T and a list of topics keywords t/ of T, we label each sentence of T with its ordinal paragraph and sentence number (P~,Sn). We then removed all closed-class words from the texts. We did not perform morphological restructuring (such as canonicalization to singular nouns, verb roots, etc.) or anaphoric resolution (replacement of pronouns by originals, etc.), for want of robust enough methods to do so reliably. This makes the results somewhat weaker than they could be.</Paragraph>
      <Paragraph position="1"> What data is most appropriate for determining the optimal position? We had a choice between the topic keywords and the abstracts accompanying each text in the corpus. Both keywords and abstracts contain phrases and words which also appear in the original texts; on the assumption that these phrases or words are more important in the text than other ones, we can assign a higher importance to sentences with more such phrases or words (or parts of them)) Since a topic keyword has a fixed boundary, using it to rank sentences is easier than using an abstract.</Paragraph>
      <Paragraph position="2"> For this reason we defined sentence yield as the average number of different topic keywords mentioned in a sentence. We computed the yield of each sentence position in each text essentially by counting 1 How many topic keywords would be taken over verbatim from the texts, as opposed to generated paraphrastically by the human extractor, was a question for empirical determination--the answer provides an upper bound for the power of the Position Method.</Paragraph>
      <Paragraph position="3"> the number of different topic keywords contained in the appropriate sentence in each text, and averaging over all texts. Sometimes, however, keywords consist of multiple words, such as &amp;quot;spreadsheet software&amp;quot;. In order to reward a full-phrase mention in a sentence over just a partial overlap with a multi-word keyword/phrase, we used a formula sensitive to the degree of overlap. In addition, to take into account word position, we based this formula on the Fibonacci function; it monotonically increases with longer matched substrings, and is normalized to produce a score of 1 for a complete phrase match. Our hit function H measures the similarity between topic keyword ti and a window wij that moves across each sentence (Pm,Sn) of the text. A window matches when it contains the same words as a topic keyword ti. The length of the window equals the length of the topic keyword. Moving the window from the beginning of a sentence to the end, we computed all the H, scores and added them together to get the total score H, for the whole sentence. We acquired the H, scores for all sentences in T and repeated the whole process for the each text in the corpus.</Paragraph>
      <Paragraph position="4"> After obtaining all the H, scores, we sorted all the sentences according to their paragraph and sentence numbers. For each paragraph and sentence number position, we computed the average Havg score.</Paragraph>
      <Paragraph position="5"> These average yields for each position are plotted in Figure 1, which shows the highest-yield sentence position to be (P2,$1), followed by (P3,$1), followed by (P4,S1), etc.</Paragraph>
      <Paragraph position="6"> Finally, we sorted the paragraph and sentence position by decreasing yield Hang scores. For positions with equal scores, different policies are possible: one can prefer sentence positions in different paragraphs on the grounds that they are more likely to contains distinctive topics. One should also prefer sentence positions with smaller Sin, since paragraphs are generally short. Thus the Optimal Position Policy for the Ziff-Davis corpus is the list</Paragraph>
      <Paragraph position="8"/>
    </Section>
    <Section position="3" start_page="284" end_page="286" type="sub_section">
      <SectionTitle>
3.3 Additional Measures and Checks
</SectionTitle>
      <Paragraph position="0"> Throughout the above process, we performed additional measures and checks in order to help us prevent spurious or wrong rules. We collected facts about the training corpus, including the average number of paragraphs per text (PPT), the average number of sentences per paragraph (SPP), and the average number of sentences per human-made summary (SPS). PPT and SPP prevent us from forming a rule such as 251h sentence in the lO0lh paragraph when PPT is 15 and SPP is 5. SPS suggests how many sentences to extract. For the ZIFF Vol. 1 corpus, PPT is 34.43, SPP is 2.05, and SPS is 5.76.</Paragraph>
      <Paragraph position="1"> Most texts have under 30 paragraphs; 97.2% of para-</Paragraph>
      <Paragraph position="3"> TIPSTER ZIFF VOL1 POLICY DETERMINATION MAP k l&amp;quot;l-L \[ J LI I&amp;quot;t TTTC/.J, ITI'III itlI&amp;quot;H,, , IIIII Illl.l 'J IAtl IIitlt II1&amp;quot;II1&amp;quot;II TI'-I LI ITI't-LIJ.4.1- I L,kl I t,LI kel% l I I I I I I-l-rl I-f 11- I'/&amp;quot; PARAGRAPH PosnloN IN A TEXT  position; lightest shade shows highest yield. graphs have fewer than 5 sentences. 47.7% of paragraphs have only one sentence (thus the first sentence is also the last), and 25.2% only two. With regard to the abstracts, most have 5 sentences and over 99.5% have fewer than 10.</Paragraph>
      <Paragraph position="4"> We also counted how many different topic key-words each specific text unit contains, counted once per keyword. This different hit measure dhit played an important role, since the OPP should be tuned to sentence positions that bear as many different topic keywords as possible, instead of positions with very high appearances of just a few topic keywords. We can compute dhit for a sentence, several sentences, or several paragraphs. Sentenceyield is dhit score of a sentence. Figure 2 shows dhit scores for the first 50 paragraph positions, and Figure 3 dhit scores for the last 50 positions (counting backward from the end of each text). Since PPT=34.43, the first and last 50 positions fully cover the majority of texts. The former graph illustrates the immense importance of the title sentence (dhit = 1.96), and the importance of the second (dhit = 0.75) and third (dhit = 0.64) paragraphs relative to the first (dhit = 0.59). Paragraphs close to the beginning of texts tend to bear more informative content; this is borne out in Figure 3, which clearly indicates that paragraph positions close to the end of texts do not show particularly high values, while the peak occurs at position P-14 with dhit = 0.42. This peak occurs precisely where most texts have their second or third paragraphs (recall that the average text length is 13 to 16 paragraphs).</Paragraph>
      <Paragraph position="5"> To examine Baxendale's first/last sentence hypothesis, we computed the average dhit scores for the first and the last 10 sentence positions in a paragraph as shown in Figure 4 and Figure 5 respectively. The former indicates that the closer a sentence lies  tence positions in a paragraph.</Paragraph>
      <Paragraph position="6"> to the beginning of a paragraph, the higher its dhit score is. This confirms the first sentence hypothesis. On the other hand, the latter figure does not support the last sentence hypothesis; it suggests instead that the second sentence from the end of a paragraph contains the most information. This is explained by the fact that 47.7% of paragraphs in the corpus contain only one sentence and 25.2% of the paragraphs contain two sentences, and the SPP is 2.05: the second-last sentence is the first! Figure 3: Vol. 1 dhit distribution for the last 50 paragraph positions, counting backward. 4 Evaluation The goal of creating an Optimal Position Policy is to adapt the position hypothesis to various domains or genres in order to achieve maximal topic coverage.</Paragraph>
      <Paragraph position="7"> Two checkpoints are required:  tion Map in contour view.</Paragraph>
      <Paragraph position="8"> 1. applying the procedure of creating an OPP to another collection in the same domain should result in a similar OPP, and 2. sentences selected according to the OPP should  indeed carry more information than other sentences. null Two evaluations were conducted to confirm these points.</Paragraph>
      <Paragraph position="9"> In both cases, we compared the sentences extracted according to the OPP to the sentences contained in the human-generated abstracts. Though we could have used topic keywords for both training and evaluation, we decided that the abstracts would provide a more interesting and practical measure for output, since the OPP method extracts from the text full sentences instead of topic phrases. Accordingly, we used as test corpus another, previously unseen, set of 2,907 texts from Vol. 2 of the Ziff-Davis corpus, which contained texts of the same nature and genre as Vol. 1.</Paragraph>
    </Section>
    <Section position="4" start_page="286" end_page="287" type="sub_section">
      <SectionTitle>
4.1 Evaluation I
</SectionTitle>
      <Paragraph position="0"> This evaluation established the validity of the Position Hypothesis, namely that the OPP so determined does in fact provide a way of identifying highyield sentences, and is not just a list of average highyield positions of the corpus we happened to pick.</Paragraph>
      <Paragraph position="1"> following the same steps as before, we therefore derived a new OPP on the test corpus.</Paragraph>
      <Paragraph position="2"> The result of the average scores of 300 positions (Pro, Sn) shown in Figure 6, with 1 &lt; m &lt; 30 and  termination maps of the training and test sets confirms two things: First, correspondences exist between topics and sentence positions in texts such as the ZIFF-Davis collection. Second, the regularity between topics and sentence positions can be used to identify topic sentences in texts.</Paragraph>
    </Section>
    <Section position="5" start_page="287" end_page="289" type="sub_section">
      <SectionTitle>
4.2 Evaluation II
</SectionTitle>
      <Paragraph position="0"> In the evaluation, we measured the word overlap of sentences contained in the abstracts with sentence(s) extracted from a text according to the OPP. For each measure, we recorded scores cumulatively, choosing first the most promising sentence according to the OPP, then the two most promising, and so on.</Paragraph>
      <Paragraph position="1"> We measured word overlap as follows: first, we removed all function (closed-class) words from the abstract and from the text under consideration. Then, for the first 500 sentence positions (the top 1, 2, 3,..., taken according to the OPP), we counted the number of times a window of text in the extracted sentences matched (i.e., exactly equalled) a window of text in the abstract. (Again we performed no morphology manipulations or reference resolution, steps which would improve the resulting scores.) We performed the counts for window lengths of 1, 2, 3, 4, and 5 words. If a sentence in an abstract matched more than one sentence extracted by the OP, only the first match was tallied. For each number of sentences extracted, and for each window size, we averaged the counts over all 2,907 texts.</Paragraph>
      <Paragraph position="2"> We define some terms and three measures used to assess the quality of the OPP-selected extracts. For an extract E and a abstract A: E . wmi. a window i of size m in E.</Paragraph>
      <Paragraph position="3"> wAi: a window i of size m in A.</Paragraph>
      <Paragraph position="4"> IWEI: total number of windows of size m in E.</Paragraph>
      <Paragraph position="5"> IWmAJ: total number of different windows of size m in A, i.e., how many A Wmi  Precision, Pro, measures what percentage of windows of size m in E can also be found in A (that is, P,~ indicates what percentage of E is considered important with regard to A). Recall, Rm, measures the diversity of E. A high P,,~ does not guarantee recovery of all the possible topics in A, but a high Rm does ensure that many different topics in A are covered in E. However, a high Rm alone does not warrant good performance either. For example, an OPP that selects all the sentences in the original text certainly has a very high Rm, but this extract duplicates the original text and is the last thing we want as a summary! Duplicate matches (the same word(s) in different windows) were counted in P but not in R.</Paragraph>
      <Paragraph position="6">  slowly and the recall score increases more rapidly as we choose more sentences according to the OPP. Selecting 7 sentences (is 10% of the average length of a ZIFF text), the precision is 0.38 and the recall 0.35. Considering that the matching process requires exact match and morphological transformation is not used, this result is very encouraging. However, with window size 2, precision and recall scores drop seriously, and more so with even larger windows. This suggests using variable-length windows, sizing according to maximal match. So doing would also avoid counting matches on window size 1 into matches of larger window sizes. The contributions of precision, P~, and recall, R~, from each m-word window alone, can be approximated by:</Paragraph>
      <Paragraph position="8"> Figure 9 and Figure 10 show precision and recall scores with individual contributions from window sizes 1 to 5. Precision P~ and recall R~ of variable-length windows can be estimated as follows:</Paragraph>
      <Paragraph position="10"> The performance of variable-length windows compared with windows of size 1 should have a difference less than the amount shown in the segments of window size &gt; 5.</Paragraph>
      <Paragraph position="11">  Coverage, Cm, tests similarity between E and A in a very loose sense. It counts the number of sentences in A with at least one hit in E (i.e., there exists at least one pair of windows wmiA and wEmj such that wAi = WEj). Cm estimates the potential of the OPP procedure. Figure 11 shows the cumulative average coverage scores of the top ten sentence positions of the training set following the OPP. Figure 11 indicates that 68% of sentences in A shared with the title sentence at least one word, 25% two words, 10% three words, 4% four words, and 2% five words. The amount of sharing at least one word goes up to 88% if we choose the top 5 positions according to the OPP and 95% if we choose the top 10 positions? The contribution of coverage score, C~, solely from m-word match between E and A can be computed as follows:</Paragraph>
      <Paragraph position="13"> The result is shown in Figure 12. Notice that the topmost segment of each column in Figure 12 represents the contribution from matches of at least five words long, since we only have Cm up to m = 5. The average number of sentences per summary (SPS) is 5.76. If we choose the top 5 sentence positions according to the OPP, Figure 12 tells us that these 5-sentences extracts E (the average length of an abstract), cover 88% of A in which 42% derives solely from one-word matches, 22% two words, 11% three words, and 6% four words. The average number of sentences per text in the corpus is about 70. If we produce an extract of about 10% of the average length of a text, i.e. 7 sentences, the coverage score is 0.91. This result is extremely promising and confirms the OPP-selected extract bearing important  OPP selected positions.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>
Download Original XML