<?xml version="1.0" standalone="yes"?>
<Paper uid="W03-1102">
  <Title>A Practical Text Summarizer by Paragraph Extraction for Thai</Title>
  <Section position="6" start_page="0" end_page="0" type="evalu">
    <SectionTitle>
5 Experiments
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
5.1 Data Sets
</SectionTitle>
      <Paragraph position="0"> The typical approach for testing a summarization system is to create an &amp;quot;ideal&amp;quot; summary, either by professional abstractors or by merging summaries provided by multiple human subjects using methods such as majority opinion, union, or intersection (Jing et al., 1998). This approach is known as the intrinsic method. Unlike for English, standard data sets are not yet available for evaluating Thai text summarization systems. However, in order to observe the characteristics of our algorithm, we collected Thai documents, including agricultural news (D1.AN), general news (D2.GN), and columnists' articles (D3.CA), to build data sets. Each data set consists of 10 documents, and document sizes range from 1 to 4 pages. We asked a student in the Department of Thai, Faculty of Liberal Arts, to summarize each document manually by selecting the most relevant paragraphs, i.e., those that indicate the main points of the document. These paragraphs are called extracts, and are then used for evaluating our algorithm.</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
5.2 Performance Evaluations
</SectionTitle>
      <Paragraph position="0"> We evaluate the results of summarization using the standard precision, recall, and F1 measures. Let J be the number of extracts in the summary, K be the number of selected paragraphs in the summary, and M be the number of extracts in the test document. We then define the precision of the algorithm as the fraction between the number of extracts in the summary and the number of selected paragraphs in the summary:</Paragraph>
      <Paragraph position="1"> P = J / K, </Paragraph>
      <Paragraph position="2"> recall as the fraction between the number of extracts in the summary and the number of extracts in the test document:</Paragraph>
      <Paragraph position="3"> R = J / M, </Paragraph>
      <Paragraph position="4"> and finally F1, the harmonic mean of precision and recall, calculated as follows:</Paragraph>
      <Paragraph position="5"> F1 = 2PR / (P + R). </Paragraph>
      <Paragraph position="6"/>
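      <Paragraph position="7"> The three measures above can be sketched as a short script (an illustration only, not the authors' code; the function name and paragraph identifiers are hypothetical):

```python
def evaluate_summary(selected, target_extracts):
    """Compute precision, recall, and F1 for a paragraph-extraction summary.

    selected        -- paragraph ids chosen by the summarizer (the summary)
    target_extracts -- paragraph ids marked as extracts by the human judge
    """
    J = len(set(selected) & set(target_extracts))  # extracts in the summary
    K = len(selected)                              # selected paragraphs
    M = len(target_extracts)                       # extracts in the document
    precision = J / K if K else 0.0
    recall = J / M if M else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```
</Paragraph>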
    </Section>
    <Section position="3" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
5.3 Experimental Results
</SectionTitle>
      <Paragraph position="0"> In this section, we provide experimental evidence that our algorithm gives acceptable performance.</Paragraph>
      <Paragraph position="1"> The compression rates of paragraph extraction used to form a summary are 20% and 30%. These rates yield a number of extracts in the summary comparable to the number of actual extracts in a given test document. The threshold α of the cosine similarity is 0.2. The parameter λ for combining the local and global properties is 0.5. For the distance between significant words in a cluster, we require that significant words be separated by no more than three insignificant words.</Paragraph>
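      <Paragraph> The clustering constraint above (significant words separated by no more than three insignificant words) can be sketched as follows. This is a Luhn-style keyword clustering; the scoring rule (significant-word count squared over cluster span) and all names are our assumptions for illustration, not taken from this section:

```python
def find_clusters(tokens, significant, max_gap=3):
    """Group the positions of significant words into clusters in which
    consecutive significant words are separated by at most max_gap
    insignificant words."""
    positions = [i for i, t in enumerate(tokens) if t in significant]
    clusters = []
    for pos in positions:
        # Gap = number of insignificant words between this significant
        # word and the previous one in the current cluster.
        if clusters and pos - clusters[-1][-1] - 1 <= max_gap:
            clusters[-1].append(pos)
        else:
            clusters.append([pos])
    return clusters

def cluster_significance(cluster):
    # Luhn-style score (assumed): (significant words)^2 / cluster span.
    span = cluster[-1] - cluster[0] + 1
    return len(cluster) ** 2 / span
```
</Paragraph>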
      <Paragraph position="2"> Tables 1 and 2 show precision, recall, and F1 for compression rates of 20% and 30%, respectively. We can see that the average precision values of our algorithm decrease slightly, while the average recall values increase, when we increase the compression rate. [Figure 4 content: extracted keywords for a Thai document about Intel's Centrino mobile processors (e.g., Pentium M, mobile processor, Intel, notebook, desktop, technology, performance, power consumption) and the summarization result at a 20% compression rate.]</Paragraph>
      <Paragraph position="3"> Since a higher compression rate tends to select more paragraphs from the document, it increases the chance that the selected paragraphs match the target extracts. On the other hand, it also causes irrelevant paragraphs to be included in the summary, so precision can decrease. Further experiments on larger text corpora are needed to determine the performance of our summarizer. However, these preliminary results are very encouraging. Figure 4 illustrates an example of keywords and extracted summaries for a Thai document using a compression rate of 20%. The implementation of our algorithm is now available for user testing at http://mickey.sci.ku.ac.th/~TextSumm/index.html.</Paragraph>
      <Paragraph position="4"> The computation time to summarize moderately sized documents, such as newspaper articles, is less than one second.</Paragraph>
    </Section>
  </Section>
</Paper>