<?xml version="1.0" standalone="yes"?>
<Paper uid="C04-1064">
  <Title>Dependency-based Sentence Alignment for Multiple Document Summarization</Title>
  <Section position="3" start_page="0" end_page="0" type="relat">
    <SectionTitle>
2 Related Work
</SectionTitle>
    <Paragraph position="0"> Several methods have been proposed to realize automatic alignment between abstracts and sentences in source documents.</Paragraph>
    <Paragraph position="1"> Banko et al. (1999) proposed a method based on sentence similarity using a bag-of-words (BOW) representation. For each sentence in the given abstract, the corresponding source sentence is determined by combining the similarity score with heuristic rules. However, a bag-of-words representation is known to be suboptimal for short texts such as single sentences (Suzuki et al., 2003).</Paragraph>
    <Paragraph position="2"> Marcu (1999) regards a sentence as a set of &quot;units&quot; that correspond to clauses and defines similarity between units based on the BOW representation. The best source sentences are then extracted in terms of &quot;unit&quot; similarity. Jing and McKeown (1999) proposed bigram-based similarity using a Hidden Markov Model. Barzilay and Elhadad (2003) combine edit distance with context information around sentences. However, these three methods tend to be strongly influenced by word order. When the summary sentence and the source sentences disagree in word order, the methods fail to work well.</Paragraph>
    <Paragraph position="3"> A supervised learning-based method called SimFinder was proposed by Hatzivassiloglou et al. (Hatzivassiloglou et al., 1999; Hatzivassiloglou et al., 2001). They convert each sentence into a feature vector based on word counts, proper nouns, and other features, and then classify sentence pairs as &quot;similar&quot; or not. Their approach is effective when a large amount of training data is available. However, the human cost of creating this training data cannot be disregarded.</Paragraph>
  </Section>
</Paper>