<?xml version="1.0" standalone="yes"?>
<Paper uid="C04-1064">
  <Title>Dependency-based Sentence Alignment for Multiple Document Summarization</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Many researchers who study automatic summarization want to create systems that generate abstracts of documents rather than extracts. We can generate an abstract by using various methods, such as sentence compaction, sentence combination, and paraphrasing. In order to implement and evaluate these techniques, we need large-scale corpora in which the original sentences are aligned with summary sentences. Such corpora are also useful for training and evaluating sentence extraction systems.</Paragraph>
    <Paragraph position="1"> However, it is costly to create these corpora.</Paragraph>
    <Paragraph position="2"> Figure 1 shows an example of summary sentences and original sentences from TSC-2 (Text Summarization Challenge 2) multiple document summarization data (Okumura et al., 2003). From this example, we can see many-to-many correspondences.</Paragraph>
    <Paragraph position="3"> For instance, summary sentence (A) consists of a part of source sentence (A). Summary sentence (B) consists of parts of source sentences (A), (B), and (C). It is clear that the correspondence among the sentences is very complex. Therefore, robust and accurate alignment is essential.</Paragraph>
    <Paragraph position="4"> In order to achieve such alignment, we need not only syntactic information but also semantic information. Therefore, we combine two methods. First, we introduce the &quot;dependency tree path&quot; (DTP) for syntactic information. Second, we introduce the &quot;Extended String Subsequence Kernel&quot; (ESK) for semantic information.</Paragraph>
    <Paragraph position="6"> Source(A) [Japanese text not recoverable from the extraction; English gloss]: First, we stop the new investment of 64-Mega bit memory from competitive companies, such as in Korea or Taiwan, and we begin the investment for development of valuable system-on-chip or 256-Mega bit DRAM from now on.</Paragraph>
    <Paragraph position="9"> Source(B) [Japanese text not recoverable from the extraction; English gloss]: We begin the investment for valuable development and will be supplied with general-purpose DRAMs for personal computers from Taiwan in the long run.</Paragraph>
    <Paragraph position="10"> Figure 1: An example of summary sentences and their source sentences from TSC-2 multiple document summarization data. Underlined strings are used in summary sentences.</Paragraph>
    <Paragraph position="12"> Experimental results using different similarity measures show that DTP consistently improves alignment accuracy and ESK enhances the performance.</Paragraph>
  </Section>
</Paper>