<?xml version="1.0" standalone="yes"?>
<Paper uid="J02-4006">
  <Title>Using Hidden Markov Modeling to Decompose Human-Written Summaries</Title>
  <Section position="5" start_page="531" end_page="535" type="metho">
    <SectionTitle>
3.2 Assigning Bigram Probabilities
</SectionTitle>
    <Paragraph position="0"/>
    <Paragraph position="2"> times in the document, for i = 1***N, then the total number of possible position sequences is F  The sequences of positions in summary sentence decomposition. general heuristic rules can be safely assumed: First, humans are more likely to cut phrases than single, isolated words; second, humans are more likely to combine nearby sentences into a single sentence than those far apart. These two rules guide us in the decomposition process.</Paragraph>
    <Paragraph position="3"> We translate the heuristic rules into the bigram probability PROB(I</Paragraph>
    <Paragraph position="5"> represent two adjacent words in the input summary sentence (abbreviated henceforth as PROB(I</Paragraph>
    <Paragraph position="7"> assigned as follows:  [?] 1)) (i.e., words in two adjacent positions in the document), then PROB(I</Paragraph>
    <Paragraph position="9"> ) is assigned the maximal value P1.</Paragraph>
    <Paragraph position="10"> For example, PROB((subcommittee =(2, 41)  |communications =(2, 40)) in Figure 1 will be assigned the maximal value. (Rule: Two adjacent words in a summary are most likely to come from two adjacent words in the</Paragraph>
    <Paragraph position="12"> ) is assigned the second-highest value P2. For example, PROB(of =(4, 16)  |subcommittee = (4, 1)) will be assigned a high probability. (Rule: Adjacent words in a summary are highly likely to come from the same sentence in the document, retaining their relative order, as in the case of sentence reduction. This rule can be further refined by adding restrictions on</Paragraph>
    <Paragraph position="14"> ) is assigned the third-highest value P3. For example, PROB(of =(2, 30)  |subcommittee = (2, 41)). (Rule: Adjacent words in a summary can come from the same sentence in the document but change their relative order. For example, a subject can be moved from the end of the sentence to the front, as in</Paragraph>
    <Paragraph position="16"> ) is assigned the fourth-highest value P4. For example, PROB(of =(3, 5)  |subcommittee =</Paragraph>
    <Section position="1" start_page="533" end_page="534" type="sub_section">
      <Paragraph position="0"> (2, 41)). (Rule: Adjacent words in a summary can come from nearby sentences in the document and retain their relative order, such as in sentence combination. CONST is a small constant such</Paragraph>
      <Paragraph position="2"> ) is assigned the fifth-highest value P5. For example, PROB(of =(1, 10)  |subcommittee = (2, 41)). (Rule: Adjacent words in a summary can come from nearby sentences in the document but reverse their relative orders.)</Paragraph>
      <Paragraph position="4"> ) is assigned the smallest value P6. For example, PROB(of =(23, 43)  |subcommittee =(2, 41)). (Rule: Adjacent words in a summary are not very likely to come from sentences far apart.) Figure 2 shows a graphical representation of the above rules for assigning bi-gram probabilities. The nodes in the figure represent possible positions in the document, and the edges output the probability of moving from one node to another. These bigram probabilities are used to find the most likely position sequence in the next step. Assigning values to P1-P6 is experimental. In our experiments, the maximal value is assigned 1 and others are usually assigned evenly decreasing values: 0.9, 0.8, and so on. These values, however, can be experimentally adjusted for different corpora. We decide the approximate optimal values of P1-P6 by testing different values for P1-P6 and choosing the values that give the best performance in the tests.</Paragraph>
      <Paragraph position="5"> Figure 2 is considered a very abstract representation of our HMM for decomposition. Each word position in the figure represents a state in the HMM. For example, (S, W) is a state, and (S, W + 1) is another state. Note that (S, W) and (S, W + 1) are relative values; the S and W in the state (S, W) have different values based on the  Computational Linguistics Volume 28, Number 4 particular word position under consideration. This relative model can be easily transformed, however, into an absolute model. (S, W) can be replaced by every possible word position in the document; transition probabilities between every possible pair of positions can be assigned in the same way as in Figure 2. In section 3.6, we describe how the abstract model can be transformed into the absolute model and give a formal description of our HMM.</Paragraph>
    </Section>
    <Section position="2" start_page="534" end_page="534" type="sub_section">
      <SectionTitle>
3.3 The Viterbi Algorithm
</SectionTitle>
      <Paragraph position="0"> To find the most likely sequence, we must find a sequence of positions that maximizes the probability PROB(I</Paragraph>
      <Paragraph position="2"> ). Using the bigram model, this probability can be approximated as</Paragraph>
      <Paragraph position="4"> ) has been assigned as indicated earlier, we therefore have all the information needed to solve the problem. We use the Viterbi algorithm (Viterbi 1967) to find the most likely sequence. For an N-word sequence, supposing each word occurs M times in the document, the Viterbi algorithm is guaranteed to find the most likely sequence using k x N x M  steps for some constant k, compared to M N for the brute force search algorithm.</Paragraph>
      <Paragraph position="5"> We have slightly revised the Viterbi algorithm for our application. In the initialization step, equal chance is assumed for each possible document position of the first word in the sequence. In the iteration step, we take special measures to handle the case when a summary word does not appear in the document (i.e., has an empty position list). We mark the word as nonexistent in the original document and continue the computation as if it did not appear in the sequence.</Paragraph>
    </Section>
    <Section position="3" start_page="534" end_page="535" type="sub_section">
      <SectionTitle>
3.4 Postediting
</SectionTitle>
      <Paragraph position="0"> After the phrases are identified, the program postedits to cancel mismatchings that arise because the Viterbi algorithm assigns each word in the input sequence to a position in the document, as long as the word appears at least once. For instance, in the example of sentence combination given in section 2, the summary sentence combined two reduced document sentences by adding the conjunction and. The word and was inserted by the human writer, but the Viterbi algorithm assigned it to a document position, since it occurred in the original document. The goal of the postedit step is to annul such mismatchings.</Paragraph>
      <Paragraph position="1"> The postedit step deals with two types of mismatchings: wrong assignment of document positions for inserted stop words in a summary sentence and wrong assignment of document positions for isolated content words in a summary sentence. To correct the first type of mismatching, if any document sentence contributes only stop words for the summary, the matching is canceled, since the stop words are more likely to have been inserted by humans rather than coming from the original document. This is the case for the example just discussed. To correct the second type of mismatching, if a document sentence provides only a single non-stop word, we also cancel such matching, since humans rarely cut single words from the original text to generate a summary sentence.</Paragraph>
    </Section>
    <Section position="4" start_page="535" end_page="535" type="sub_section">
      <SectionTitle>
3.5 An Example
</SectionTitle>
      <Paragraph position="0"> To demonstrate the program, we now present an example from beginning to end. The following input sample summary sentence is also shown in Figure 3: Arthur B. Sackler, vice president for law and public policy of Time Warner Inc.</Paragraph>
      <Paragraph position="1"> and a member of the Direct Marketing Association, told the communications subcommittee of the Senate Commerce Committee that legislation to protect children's privacy online could destroy the spontaneous nature that makes the Internet unique.</Paragraph>
      <Paragraph position="2"> We first indexed the document, listing for each word its possible positions in the document.</Paragraph>
      <Paragraph position="3">  possible position sequences. Using the bigram probabilities as assigned in section 3.2, we ran the Viterbi algorithm to find the most likely position sequence. After every word was assigned a most likely document position, we marked the phrases in the sentence by conjoining words from adjacent document positions.</Paragraph>
      <Paragraph position="4"> Figure 3 shows the final result for the sample input summary sentence. The phrases in the summary are tagged (FNUM:SNUM actual-text), where FNUM is the sequential number of the phrase and SNUM is the number of the document sentence in which the phrase originates. SNUM = [?]1 means that the phrase is not derived from the original document. The borrowed phrases are tagged (FNUM actual-text) in the document sentences.</Paragraph>
      <Paragraph position="5"> In this example, the program correctly concluded that the summary sentence was constructed by reusing the original text. It identified the four document sentences that were combined into the summary sentence; it also correctly divided the summary sentence into phrases, pinpointing the exact document origin of each. In this example, the phrases that were borrowed from the document ranged from single words to long clauses. Certain borrowed phrases were also syntactically transformed; despite these, the program successfully decomposed the sentence.</Paragraph>
      <Paragraph position="6"> The decomposition outputs such as shown in Figure 3 were then used for building the training corpora for sentence reduction and sentence combination. The output shown in Figure 3 was included in the corpus for sentence combination, since the summary sentence was constructed by merging document sentences. If a summary sentence was constructed by removing phrases from a single document sentence, then it was included in the training corpus for sentence reduction.</Paragraph>
    </Section>
  </Section>
  <Section position="6" start_page="535" end_page="537" type="metho">
    <Paragraph position="0"> inc) (F1:S-1 and) (F2:S0 a member of the direct marketing association told) (F3:S2 the communications subcommittee of the senate commerce committee) (F4:S-1 that legislation) (F5:S1 to protect) (F6:S4 children's) (F7:S4 privacy) (F8:S4 online) (F9:S0 could destroy the spontaneous nature that makes the internet unique) Source Document Sentences: Sentence 0: a proposed new law that would require web publishers to obtain parental consent before collecting personal information from children (F9 could destroy the spontaneous nature that makes the internet unique) (F2 a member of the direct marketing association told) a senate panel thursday Sentence 1: (F0 arthur b sackler vice president for law and public policy of time warner inc) said the association supported efforts (F5 to protect) children online but he urged lawmakers to find some middle ground that also allows for interactivity on the internet Sentence 2: for example a child's e-mail address is necessary in order to respond to inquiries such as updates on mark mcguire's and sammy sosa's home run figures this year or updates of an online magazine sackler said in testimony to (F3 the communications subcommittee of the senate commerce committee) Sentence 4: the subcommittee is considering the (F6 children's) (F8 online) (F7 privacy) protection act which was drafted on the recommendation of the federal trade commission Figure 3 A sample output of the summary sentence decomposition program (boldface text indicates material that was cut from the original document and reused in the summary, and italic text in the summary sentence indicates material that was added by the human writer).</Paragraph>
    <Section position="1" start_page="536" end_page="537" type="sub_section">
      <SectionTitle>
3.6 Formal Description of Our Hidden Markov Model
</SectionTitle>
      <Paragraph position="0"> We first illustrate how an absolute model can be created from the relative model represented in Figure 2. For simplicity, suppose there are only two sentences in the original document and each sentence has two words. From the relative model in Figure 2, we can build an absolute model as shown in Figure 4.</Paragraph>
      <Paragraph position="1"> In the absolute model, there are four states, (1,1), (1,2), (2,1), and (2,2), each corresponding to a word position. Each state has only one observation symbol (i.e., out-</Paragraph>
      <Paragraph position="3"> Example of the absolute hidden Markov model.</Paragraph>
    </Section>
    <Section position="2" start_page="537" end_page="537" type="sub_section">
      <Paragraph position="0"> put): the word in that position. Each state is interconnected with the other states in the model. The state transition probabilities, which represent the probabilities of transitioning from one state to another state, can be assigned following the rules shown in Figure 2. In this case, however, we need to normalize the values of {P  so that for each state the sum of the transition probabilities is one, which is a basic requirement for an HMM. This normalization is needed in theory in order to conform our relative model to a formal model, but in practice it is not needed in the decomposition process, since it does not affect the final result. The initial state distribution is uniform; that is, the initial state, labeled as Ph in Figure 4, has an equal chance to reach any state in the model.</Paragraph>
      <Paragraph position="1"> We give a formal description of our HMM for decomposition as follows. For each original document, we can build an absolute model based on the relative model in Figure 2. In the absolute model, each state corresponds to a word position, and each word position corresponds to a state. The observation symbol set includes all the words in the document, and the observation symbol probabilities are defined as</Paragraph>
      <Paragraph position="3"> ) are defined as we described in Figure 2, with every word position linked to every other word position, and state initial probabilities are uniform as we mentioned. This Markov model is hidden because one symbol sequence can correspond to many state sequences, meaning that many position sequences can correspond to a word sequence, as shown in Figure 1. Generally, in a hidden Markov model, one state sequence can also corrrespond to many symbol sequences. Our HMM does not have this attribute.</Paragraph>
    </Section>
  </Section>
  <Section position="7" start_page="537" end_page="540" type="metho">
    <SectionTitle>
4. Evaluations
</SectionTitle>
    <Paragraph position="0"> Three experiments were performed to evaluate the decomposition module. In the first experiment, we evaluated decomposition in a task called summary alignment. This measured how successfully the decomposition program can align sentences in the summary with document sentences that are semantically equivalent. In the second experiment, we asked humans to judge whether the decomposition results were correct. Compared to the first experiment, this was a more direct evaluation, using a larger collection of documents. The third experiment evaluated the portability of the program.</Paragraph>
    <Paragraph position="1"> The corpus used in the first experiment consisted of 10 documents from the Ziff-Davis corpus, which contains articles related to computer products and is available on TIPSTER discs from Linguistic Data Consortium (LDC) (Harman and Liberman 1993). The corpus used in the second experiment consisted of 50 documents related to telecommunications issues. The corpus used in the third experiment consisted of legal documents on court cases, provided by the Westlaw Group.</Paragraph>
    <Section position="1" start_page="537" end_page="539" type="sub_section">
      <SectionTitle>
4.1 Summary Alignment
</SectionTitle>
      <Paragraph position="0"> The goal of the summary alignment task was to find sentences in the document that were semantically equivalent to the summary sentences. We used a small collection of 10 documents, gathered by Marcu (1999). Marcu presented these 10 documents together with their human-written summaries from the Ziff-Davis corpus to 14 human judges. These human judges were instructed to extract sentences from the original document that were semantically equivalent to the summary sentences. Sentences selected by the majority of human judges were collected to build an extract (i.e., extraction-based summary) of the document. This resulting extract was used as the gold standard in our evaluation. Note that this evaluation will be biased against the  decomposition model, as Marcu's semantic equivalence is a broader concept than our cut-and-paste equivalence.</Paragraph>
      <Paragraph position="1"> Decomposition provides a list of source document sentences for each summary sentence, as shown in Figure 3. We can build an automatic extract for the document by selecting all the source document sentences identified by the decomposition program. We compared this automatic extract with the gold-standard extract. The program achieved an average 81.5% precision, 78.5% recall, and 79.1% F-measure for 10 documents. By comparison, the average performance of 14 human judges was 88.8% precision, 84.4% recall, and 85.7% F-measure. Detailed results for each document are shown in Table 1. Precision, recall, and F-measure are computed as follows:  Further analysis indicates two types of errors made by the program. The first is that the program failed to find semantically equivalent sentences with very different wordings. For example, it did not find the correspondence between the summary sentence Running Higgins is much easier than installing it and the document sentence The program is very easy to use, although the installation procedure is somewhat complex. This is not really an &amp;quot;error,&amp;quot; since the program is not designed to find such paraphrases. For decomposition purposes, the program needs only to indicate that the summary sentence is not produced by cutting and pasting text from the original document. The program correctly indicated this by returning no matching sentence.</Paragraph>
      <Paragraph position="2"> The second problem is that the program may identify a nonrelevant document sentence as relevant if it contains some words common to the summary sentence.</Paragraph>
      <Paragraph position="3"> This typically occurs when a summary sentence is not constructed by cutting and pasting text from the document but shares words with certain document sentences.</Paragraph>
      <Paragraph position="4"> For example, the decomposition program mistakenly linked the summary sentence The program is very easy to use, although the installation procedure is somewhat complex with</Paragraph>
    </Section>
    <Section position="2" start_page="539" end_page="539" type="sub_section">
      <Paragraph position="0"> the document sentence All you need to decide during the easy installation is where you want to put the Higgins files and associated directories; this must be a directory available to all e-mail users, because they had a number of words in common, including the, is, easy, to, and installation. Our postediting steps are designed to cancel such false matchings, although we cannot remove them completely.</Paragraph>
      <Paragraph position="1"> It is worth noting that the extract based on human judgments, considered the gold standard in this evaluation, is not perfect. For example, two document sentences may express the same information (i.e., they are semantic paraphrases), and all human subjects may consider this information important enough to be in the summary, but half of the subjects selected one sentence and half selected the other; thus, both sentences will be included in the extract although they are semantic paraphrases. Precisely this happened in the extract of document ZF109-601-903. The document sentence This group productivity package includes e-mail, group scheduling and alerting, keyword cross-reference filing, to-do lists, and expense reporting and the document sentence At $695 for 8 users, this integrated software package combines LAN-based e-mail with a variety of personal information management functions, including group scheduling, personal calendars, to-do lists, expense reports, and a cross-referenced key-word database were both included in the extract, although they contain very similar information. The program picked up only the second document sentence, yet this correct decision was penalized in the evaluation because of the mistake in the gold standard.</Paragraph>
      <Paragraph position="2"> The program won perfect scores for 3 out of 10 documents. We checked the three summaries and found that their texts were largely produced by cut and paste, compared to other summaries with sentences written completely from scratch by humans.</Paragraph>
      <Paragraph position="3"> This indicates that when only the decomposition task is considered, the algorithm performs very well.</Paragraph>
    </Section>
    <Section position="3" start_page="539" end_page="539" type="sub_section">
      <SectionTitle>
4.2 Human Judgments of Decomposition Results
</SectionTitle>
      <Paragraph position="0"> Since the first experiment did not directly assess the program's performance for the decomposition task, we conducted another experiment to evaluate the correctness of the decomposition results. First, we selected 50 summaries from a telecommunications corpus and ran the decomposition program.</Paragraph>
      <Paragraph position="1"> A human subject was asked to judge whether the decomposition results were correct. A result was considered correct when all three questions posed in the decomposition problem were correctly answered. As stated in section 1, the decomposition program needs to answer the following three questions: (1) Is a summary sentence constructed by reusing the text from the original document? (2) If so, what phrases in the sentence come from the original document? and (3) From where in the document do the phrases come? The 50 summaries contained a total of 305 sentences.</Paragraph>
      <Paragraph position="2"> Eighteen (6.2%) sentences were wrongly decomposed, for an accuracy rate of 93.8%.</Paragraph>
      <Paragraph position="3"> Most errors occurred when a summary sentence was not constructed by cutting and pasting but contained many overlapping words with certain sentences in the document. The accuracy rate here was much higher than the precision and recall results in the first experiment. An important factor here is that we did not require the program to find semantically equivalent document sentence(s) if a summary sentence used very different wordings.</Paragraph>
    </Section>
    <Section position="4" start_page="539" end_page="540" type="sub_section">
      <SectionTitle>
4.3 Portability
</SectionTitle>
      <Paragraph position="0"> In the third and final evaluation of decomposition, we tested the program on legal documents in a joint experiment with the Westlaw Group, which provides lawyers with court case documents. Such documents start with a &amp;quot;synopsis&amp;quot; of the case, written by attorneys, followed by &amp;quot;headnotes,&amp;quot; which are points of law also written by  notwithstanding) (F3:S0 the return) (F4:S1 operates as an admission by relator of truth of facts well pleaded), (F5:S1 but claims that in law the return presents no sufficient) (F6:S-1 ground) (F7:S1 why relief sought) (F8:S1 should not be granted). OPINION: Sentence 0: As to the effect to be given (F0 the motion for the issuance of the peremptory writ) (F3 the return) of the respondents (F2 notwithstanding), it is well to state at the outset that under our decided cases such a motion stands as the equivalent of a demurrer to a pleading in a law action.</Paragraph>
      <Paragraph position="1"> Sentence 1: It (F4 operates as an admission by the relator of the truth of the facts well pleaded) by the respondent (F5 but claims that in law the return presents no sufficient) reason (F7 why the relief sought) in the alternative writ (F8 should not be granted).</Paragraph>
      <Paragraph position="2"> Figure 5 A sample output of legal document decomposition (boldface text indicates material that was cut from the original document and reused in the summary, and italic text in the summary sentence indicates material that was added by the human writer).</Paragraph>
      <Paragraph position="3"> attorneys and summarized from the discussions. The last part is the discussion, called &amp;quot;opinion.&amp;quot; The task here was to match each headnote entry with the corresponding text in the opinion. When lawyers study a legal case document, they can see not only the important points of law, but also where these points are discussed in the opinion. We applied our decomposition program to this task. We did not adjust our HMM parameters. A sample decomposition result is shown in Figure 5. Similar to the notation used in Figure 3, the phrases in the headnote are tagged (FNUM:SNUM actual-text), where FNUM is the sequential number of the phrase and SNUM is the number of the document sentence where the phrase comes from. SNUM = [?]1 means that the phrase did not come from the original document. The borrowed phrases are tagged (FNUM actual-text) in the opinion. Note that in this example, we ignored the difference of the determiners (&amp;quot;a,&amp;quot; &amp;quot;the,&amp;quot; etc.) in the phrases, so the summary phrase &amp;quot;a motion for issuance of a peremptory writ&amp;quot; was considered to originate from the document phrase &amp;quot;the motion for the issuance of the peremptory writ,&amp;quot; although the two phrases were not identical.</Paragraph>
      <Paragraph position="4"> We received 11 headnotes from Westlaw and examined the decomposition results for all of them. The program found the correct source sentences and identified the correct origins of the phrases for every headnote.</Paragraph>
      <Paragraph position="5"> In summary, we performed three experiments in three different domains--computer, telecommunications news, and legal--and in each case achieved good results, with no change or minimal parameter adjustment to the HMM. This demonstrates that our proposed decomposition approach is portable. The reason for this portability may be that the heuristic rules that we used to build the HMM are indeed general and remain true for different humans and for articles from different domains.</Paragraph>
    </Section>
  </Section>
  <Section position="8" start_page="540" end_page="541" type="metho">
    <SectionTitle>
5. Applications of Decomposition Results
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="540" end_page="541" type="sub_section">
      <SectionTitle>
5.1 Providing Training and Testing Corpora for Summarization
</SectionTitle>
      <Paragraph position="0"> We have used the decomposition results in our development of a text generation system for domain-independent summarization. The generation system mimics two revision operations presented in section 2: sentence reduction and sentence combination. The decomposition program is used to build corpora for training and evaluating</Paragraph>
    </Section>
    <Section position="2" start_page="541" end_page="541" type="sub_section">
      <Paragraph position="0"> the sentence reduction and combination modules. The corpora contained examples as shown in Figure 3. Details of the summarization system can be found in Jing (2001).</Paragraph>
    </Section>
    <Section position="3" start_page="541" end_page="541" type="sub_section">
      <SectionTitle>
5.2 Corpus Analysis
</SectionTitle>
      <Paragraph position="0"> We performed a corpus analysis using the decomposition program. We automatically analyzed 300 human-written summaries of news articles on telecommunications, provided by the Benton Foundation. The number of sentences in each summary ranged from 2 to 21; the corpus contained a total of 1,642 summary sentences. The results indicated that 315 summary sentences (19%) did not have matching sentences in the document: They were written from scratch by humans rather than by cutting and pasting phrases from the original text. Of the summary sentences, 686 (42%) matched a single sentence in the document. These sentences were constructed by sentence reduction, sometimes together with other operations such as lexical paraphrasing and syntactic transformation. In addition, 592 sentences (36%) matched two or three sentences in the document and 49 sentences (3%) matched more than three sentences in the document. These sentences were constructed by sentence combination, often together with other operations, especially sentence reduction, since the sentences were usually reduced before they were combined. These results suggested that a significant portion (81%) of summary sentences produced by humans were based on cutting and pasting the original text. Sentence reduction was applied in at least 42% of the cases.</Paragraph>
      <Paragraph position="1"> Sentence combination was applied in 39% of the cases.</Paragraph>
    </Section>
    <Section position="4" start_page="541" end_page="541" type="sub_section">
      <SectionTitle>
5.3 Improving User Interfaces
</SectionTitle>
      <Paragraph position="0"> The decomposition result can be used in applications other than summarization. For example, in the experiment we performed jointly with Westlaw (see section 4.3), we found that linking summaries and original documents can potentially improve user interfaces, helping users to easily browse and find relations between portions of the text.</Paragraph>
    </Section>
  </Section>
  <Section position="9" start_page="541" end_page="542" type="metho">
    <SectionTitle>
6. Related Work
</SectionTitle>
    <Paragraph position="0"> Researchers have previously tried to align summary sentences with sentences in a document, mostly by manual effort (Edmundson 1969; Kupiec, Pedersen, and Chen 1995; Teufel and Moens 1997). Given the cost of this manual annotation process, only small collections of text have been annotated. Decomposition provides a means of performing this alignment automatically, building large corpora for summarization research.</Paragraph>
    <Paragraph position="1"> Marcu (1999) presented an approach for aligning summary sentences with semantically equivalent sentences in a document. It adopted an information retrieval based approach, coupled with discourse processing. Although our decomposition also aims to link summaries with the original documents, major differences exist between the two approaches. While Marcu's algorithm operates at the sentence or clause level, our decomposition program deals with phrases at various granularities (anything from a word to a complicated phrase to a complete sentence). Furthermore, the approaches used by the two systems are distinct. Marcu's approach first breaks sentences into clauses, then uses rhetorical structure to decide which clauses should be considered, and finally employs an IR-based similarity measure to decide which clauses in the document are similar to those in human-written abstracts. Our HMM solution first builds the HMM, then uses a dynamic programming technique to find the optimal answer.</Paragraph>
    <Paragraph position="2"> Marcu reported a performance of 77.45%, 80.06%, and 78.15% for precision, recall, and F-measure, respectively, when the system was evaluated at the sentence level in the  Computational Linguistics Volume 28, Number 4 summary alignment task described in section 3.1. When tested on the same set of test documents and for the same task, our system averaged 81.5% precision, 78.5% recall, and 79.1% F-measure, as shown in Table 1.</Paragraph>
    <Paragraph position="3"> We transformed the decomposition problem into a problem of finding the most likely document position for each word in the summary, which is, in some sense, similar to the problem of aligning parallel bilingual corpora (Brown, Lai, and Mercer 1991; Gale and Church 1991). Whereas Brown, Lai, and Mercer and Gale and Church aligned sentences in a parallel bilingual corpus, we aligned phrases in a summary with phrases in a document. Brown, Lai, and Mercer (1991) also used an HMM in their solution for bilingual corpora alignment. Their model and our model, however, differ greatly: Their model used sentence length as a feature, whereas ours used word position as a feature; they used an aligned training corpus to compute transition probabilities, whereas we did not use any annotated training data.</Paragraph>
  </Section>
class="xml-element"></Paper>