<?xml version="1.0" standalone="yes"?>
<Paper uid="W03-1102">
  <Title>A Practical Text Summarizer by Paragraph Extraction for Thai</Title>
  <Section position="3" start_page="0" end_page="0" type="relat">
    <SectionTitle>
2 Related Work
</SectionTitle>
    <Paragraph position="0"> A comprehensive survey of text summarization approaches can be found in (Mani, 1999). Here we briefly review work based on the extraction approach.</Paragraph>
    <Paragraph position="1"> Luhn (1959) proposed a simple but effective approach that uses term frequencies and their relative positions to weight sentences, which are then extracted to form a summary. Subsequent work has demonstrated the success of Luhn's approach (Buyukkokten et al., 2001; Lam-Adesina and Jones, 2001; Jaruskulchai et al., 2003). Edmundson (1969) proposed the use of other features such as title words, sentence locations, and bonus words to improve sentence extraction. Goldstein et al. (1999) presented an extraction technique that assigns weighted scores to both statistical and linguistic features of a sentence. Recently, Salton et al. (1999) have developed a model that represents a document as an undirected graph. The basic idea is to treat paragraphs as vertices and the similarity between two paragraphs as an edge. They suggested that the most important paragraphs should be linked to many other paragraphs, since such paragraphs are likely to discuss topics covered in the paragraphs they are linked to.</Paragraph>
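    The graph model described above can be illustrated with a minimal sketch. It is not the implementation of Salton et al.; it assumes whitespace tokenization, cosine similarity over raw term counts, and a hypothetical similarity threshold of 0.2 for deciding whether two paragraphs are linked. The function names cosine and rank_by_degree are invented for this illustration.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    norm = norm_a * norm_b
    return dot / norm if norm else 0.0


def rank_by_degree(paragraphs: list[str], threshold: float = 0.2) -> list[int]:
    """Return paragraph indices ordered from most to least connected.

    An edge joins two paragraphs when their similarity reaches the
    threshold (an assumed value, not taken from the cited work).
    """
    vectors = [Counter(p.lower().split()) for p in paragraphs]
    n = len(vectors)
    degree = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            if cosine(vectors[i], vectors[j]) >= threshold:
                degree[i] += 1
                degree[j] += 1
    # Highest degree first; ties keep the original document order.
    return sorted(range(n), key=lambda i: (-degree[i], i))
```

    Under these assumptions, the paragraphs at the front of the returned list are the ones linked to the most other paragraphs, and would be the first candidates for extraction.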
    <Paragraph position="2"> Statistical learning approaches have also been studied for the text summarization problem. The first known supervised learning algorithm was proposed by Kupiec et al. (1995). Their approach uses Bayes' rule to estimate the probability that a sentence should be included in a summary given its feature values, under the assumption that the features are independent. Other supervised learning algorithms have also been investigated. Chuang and Yang (2000) studied several algorithms for extracting sentence segments, such as decision trees, naive Bayes classifiers, and neural networks. They also used rhetorical relations to represent features. One drawback of supervised learning algorithms is that they require an annotated corpus to learn accurately. However, they may perform well when summarizing documents in a specific domain.</Paragraph>
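    The Bayes-rule scoring described above can be sketched as follows. This is only an illustration under stated assumptions: features are binary, probabilities are estimated by counting with add-one smoothing, and the names train and score are hypothetical. It is not the feature set or estimator used by Kupiec et al.

```python
def train(examples):
    """Estimate the model from (features, in_summary) pairs.

    Each features value is a tuple of 0/1 feature values for one sentence.
    """
    n = len(examples)
    k = len(examples[0][0])
    positives = [f for f, in_summary in examples if in_summary]
    prior = len(positives) / n
    # Add-one smoothing (an assumption of this sketch) avoids zero estimates.
    p_given_pos = [
        [(sum(1 for f in positives if f[j] == v) + 1) / (len(positives) + 2)
         for v in (0, 1)]
        for j in range(k)
    ]
    p_feature = [
        [(sum(1 for f, _ in examples if f[j] == v) + 1) / (n + 2)
         for v in (0, 1)]
        for j in range(k)
    ]
    return prior, p_given_pos, p_feature


def score(model, features):
    """Prior times the product over features of P(F_j | in summary) / P(F_j)."""
    prior, p_given_pos, p_feature = model
    result = prior
    for j, v in enumerate(features):
        result *= p_given_pos[j][v] / p_feature[j][v]
    return result
```

    Because of the independence assumption and the smoothing, the score is best read as a ranking value rather than a calibrated probability: sentences are sorted by score and the top ones are extracted.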
    <Paragraph position="3"> This paper presents an approach for extracting the most relevant paragraphs from the original document to form a summary. The idea of our approach is to exploit both the local and global properties of paragraphs. The local property can be considered as clusters of significant words within each paragraph, while the global property can be thought of as the relations among all paragraphs in the document. These two properties can be combined and tuned to produce a single measure reflecting the informativeness of each paragraph. Finally, we can apply this combined measure to rank and extract the most relevant paragraphs.</Paragraph>
  </Section>
</Paper>