<?xml version="1.0" standalone="yes"?> <Paper uid="C04-1064"> <Title>Dependency-based Sentence Alignment for Multiple Document Summarization</Title> <Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> 3 An Alignment Method based on Syntax </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> and Semantics </SectionTitle> <Paragraph position="0"> For example, Figure 2 shows two sentences that have different word order but the same meaning.</Paragraph> <Paragraph position="1"> The English translation is &quot;I took the lost article to the neighborhood police.&quot;</Paragraph> <Paragraph position="3"> Since conventional techniques other than BOW are strongly influenced by word order, they are fragile when word order is damaged.</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.1 Dependency Tree Path (DTP) </SectionTitle> <Paragraph position="0"> When we unify two sentences, some elements become longer, and word order may be changed to improve readability. When we rephrase sentences, the dependency structure does not change in many cases, even if word order changes. For example, the two sentences in Figure 2 share the same dependence structure shown in Figure 3. Therefore, we transform a sentence into its dependency structure.</Paragraph> <Paragraph position="1"> This allows us to consider a sentence as a set of dependency tree paths from a leaf to the root node of the tree.</Paragraph> <Paragraph position="2"> For instance, the two sentences in Figure 2 can be transformed into the following DTPs.</Paragraph> <Paragraph position="3"> a18a20a19a22a21a24a23a26a25a28a27 (I took) watashi ga todoke ta a18a20a29a20a30a32a31a34a33a36a35a38a37a39a23a26a25a40a27 (took to the neighborhood police) kinjo no keisatsu ni todoke ta a18a42a41a34a43a45a44a47a46a34a23a48a25a28a27 (took the lost article) otoshimono wo todoke ta.</Paragraph> </Section> <Section position="3" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.2 An Alignment Algorithm using DTPs </SectionTitle> <Paragraph position="0"> In this section, we describe a method that aligns source sentences with the summary sentences in an abstract.</Paragraph> <Paragraph position="1"> Our algorithm is very simple. We take the corresponding sentence to be the one whose DTP is most similar to that of the summary sentence. The algorithm consists of the following steps: Step 0 Transform all source sentences into DTPs. Step 1 For each sentence &quot;a49 &quot; in the abstract, apply Step 2 and Step 3.</Paragraph> <Paragraph position="2"> Step 2 Transform &quot;a49 &quot; into a DTP set. Here,a50a52a51a53a49a55a54 denotesa49 's DTP set. a50a52a51a53a56a58a57a59a54 denotes the DTP set of thea60-th source sentences.</Paragraph> <Paragraph position="4"> Step 3 For eacha101a103a102a51a105a104a106a50a52a51a53a49a55a54a105a54, we align an optimal source sentence as follows: We define sima51a101a107a102a103a108a56a83a57a59a54 defa109 max sima51a101a107a102a110a108a74a101 a54. Here,a101 a104a42a50a52a51a105a56a58a57a53a54, where, fora101a110a102 , we align a source sentence that satisfiesa111a113a112a105a114a12a115a116a111a113a117a119a118a105a120a122a121a63a123a125a124a105a126a113a127a125a128a130a129a132a131a134a133a86a115 a51a101a102a108a56a57a54. 
3.3 Similarity Metrics

We need a similarity metric to rank DTP similarity. The following cosine measure (Hearst, 1997) is used in many NLP tasks:

$$\mathrm{sim}(S_1, S_2) = \frac{\sum_{t} w(t, S_1)\, w(t, S_2)}{\sqrt{\sum_{t} w(t, S_1)^2} \sqrt{\sum_{t} w(t, S_2)^2}} \quad (1)$$

Here, $w(t, S_1)$ and $w(t, S_2)$ denote the weight of term $t$ in texts $S_1$ and $S_2$, respectively. Note that syntactic and semantic information is lost in the BOW representation.

In order to solve this problem, we use similarity measures based on word co-occurrences. As an example of its application, N-gram co-occurrence is used for evaluating machine translations (Papineni et al., 2002). String Subsequence Kernel (SSK) (Lodhi et al., 2002) and Word Sequence Kernel (WSK) (Cancedda et al., 2003) are extensions of n-gram-based measures used for text categorization.

In this paper, we compare WSK to its extension, the Extended String Subsequence Kernel (ESK).

First, we describe WSK. WSK receives two sequences of words as input and maps each of them into a high-dimensional vector space. WSK's value is just the inner product of the two vectors. For instance, the WSK value for 'abaca' and 'abbab' is determined as follows. The subsequences whose length is three or less are shown in Table 1. Here, $\lambda$ is a decay parameter for the number of skipped words. For example, the subsequence 'aba' appears in 'abaca' once without skips. In addition, it appears again with two skips, i.e., 'ab**a.' Therefore, abaca's vector has $1 + \lambda^2$ in the component corresponding to 'aba.' From Table 1, we can calculate the WSK value as the inner product of the two subsequence vectors.

In this way, we can measure the similarity between two texts. However, WSK disregards synonyms, hyponyms, and hypernyms. Therefore, we introduce ESK, an extension of WSK and a simplification of the HDAG Kernel (Suzuki et al., 2003). ESK allows us to add word senses to each word. Here, we do not try to disambiguate word senses, but use all possible senses listed in a dictionary. Figure 4 shows an example of subsequences and their values. The use of word senses yields flexible matching even when paraphrasing is used for summary sentences.

Formally, ESK is defined as follows:

$$\mathrm{esk}(T, U) = \sum_{m=1}^{n} \sum_{t_i \in T} \sum_{u_j \in U} K_m(t_i, u_j)$$

Here, $t_i$ and $u_j$ are nodes of $T$ and $U$, respectively, and $K_m(t_i, u_j)$ accumulates the weighted counts of common subsequences of length $m$ ending at $t_i$ and $u_j$, with each skipped node contributing a decay factor $\lambda$, as in WSK. The function $\mathrm{val}(t_i, u_j)$ returns the number of attributes (the word and its senses) shared by the two nodes.

Finally, we define the similarity measure by normalizing ESK. This similarity can be regarded as an extension of the cosine measure:

$$\mathrm{sim}_{\mathrm{esk}}(T, U) = \frac{\mathrm{esk}(T, U)}{\sqrt{\mathrm{esk}(T, T)\, \mathrm{esk}(U, U)}}$$
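To make the subsequence weighting concrete, here is a brute-force sketch of a WSK-style kernel in Python. It is illustrative only: real implementations use dynamic programming as in Lodhi et al. (2002), the helper names are assumptions, and ESK would additionally attach word senses to each node and count shared attributes via val.

```python
# Brute-force WSK-style subsequence kernel (practical only for short inputs).
from itertools import combinations
from collections import Counter
from math import sqrt

def subseq_vector(seq, n, lam):
    """Map seq to a vector over subsequences of length <= n; each occurrence
    is weighted by lam ** (number of items skipped inside its span)."""
    vec = Counter()
    for m in range(1, n + 1):
        for idx in combinations(range(len(seq)), m):
            skips = (idx[-1] - idx[0] + 1) - m
            vec[tuple(seq[i] for i in idx)] += lam ** skips
    return vec

def wsk(s, t, n=3, lam=0.5):
    """WSK value: inner product of the two subsequence vectors."""
    u, v = subseq_vector(s, n, lam), subseq_vector(t, n, lam)
    return sum(u[k] * v[k] for k in u if k in v)

def sim(s, t, n=3, lam=0.5):
    """Normalized kernel, an extension of the cosine measure."""
    return wsk(s, t, n, lam) / sqrt(wsk(s, s, n, lam) * wsk(t, t, n, lam))

# 'aba' occurs in 'abaca' once with no skips and once as 'ab**a' (two skips),
# so its component is 1 + lam**2, matching the example in the text.
print(subseq_vector("abaca", 3, 0.5)[("a", "b", "a")])  # 1.25
print(sim("abaca", "abbab"))
```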
4.1 Corpus

We used the TSC2 corpus, which includes both single and multiple document summarization data. Table 2 shows its statistics. For each data set, each of three experts made short abstracts and long abstracts.

For each data set, summary sentences were aligned with source sentences. Table 3 shows the distribution of the number of aligned original sentences for each summary sentence; the values in brackets are percentages. Table 4 shows the distribution of the number of aligned summary sentences for each original sentence. These tables show that sentences are often split and reconstructed. In particular, the multiple document summarization data exhibit very complex correspondences because various summarization techniques, such as sentence compaction, sentence combination, and sentence integration, are used.

4.2 Comparison of Alignment Methods

We compared the proposed methods with a baseline algorithm using various similarity measures.

Baseline: This is a simple algorithm that compares sentences to sentences. Each summary sentence is compared with all source sentences, and the top $N$ sentences that have a similarity score over a certain threshold value $\theta$ are aligned.

DTP-based Method: This method was described in Section 3.2. In order to obtain DTPs, we used the Japanese morphological analyzer ChaSen and the dependency structure analyzer CaboCha (Kudo and Matsumoto, 2002).

We utilized the following similarity measures.

BOW: BOW is defined by equation (1). Here, we use only nouns and verbs.

N-gram: This is a simple extension of BOW. We add n-gram sequences to BOW. We examined "2-gram" (unigram + bigram) and "3-gram" (unigram + bigram + trigram).

TREE: The Tree Kernel (Collins and Duffy, 2001) is a similarity measure based on the number of common subtrees. We regard a sentence as a dependency structure tree.

ESK: We used a Japanese lexicon (Ikehara et al., 1997) to obtain word senses. The parameters $n$ and $\lambda$ were varied under the same conditions as above.

4.3 Evaluation Metric

Each system's alignment output was scored by the average F-measure. For each summary sentence, the following F-measure was calculated:

$$F = \frac{(1 + \beta^2) \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\beta^2 \cdot \mathrm{Precision} + \mathrm{Recall}}$$

Here, Precision $= a / x$ and Recall $= a / y$, where $x$ is the number of source sentences aligned by a system for the summary sentence, $a$ is the number of correct source sentences in the output, and $y$ is the number of source sentences aligned by the human expert. We set $\beta$ to 1. This F-measure was averaged over all summary sentences.
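As a worked illustration of this metric, here is a small sketch in Python; the data layout (one set of aligned source-sentence indices per summary sentence, for system and expert) and the helper names are assumptions.

```python
# Average F-measure of Section 4.3 over per-summary-sentence alignment sets.

def f_measure(system: set, reference: set, beta: float = 1.0) -> float:
    """F_beta for one summary sentence's alignment."""
    if not system or not reference:
        return 0.0
    a = len(system & reference)     # correct source sentences in the output
    precision = a / len(system)     # a / x
    recall = a / len(reference)     # a / y
    if precision + recall == 0:
        return 0.0
    return (1 + beta**2) * precision * recall / (beta**2 * precision + recall)

def average_f(system_alignments, reference_alignments, beta=1.0):
    """Average the per-summary-sentence F-measures (beta = 1 in the paper)."""
    scores = [f_measure(s, r, beta)
              for s, r in zip(system_alignments, reference_alignments)]
    return sum(scores) / len(scores)

# Example with two summary sentences: F = (2/3 + 2/3) / 2 ~ 0.667
print(average_f([{1, 2}, {5}], [{2}, {5, 6}]))
```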