File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/abstr/03/w03-0504_abstr.xml

Size: 803 bytes

Last Modified: 2025-10-06 13:43:02

<?xml version="1.0" standalone="yes"?>
<Paper uid="W03-0504">
  <Title>Summarization of Noisy Documents: A Pilot Study</Title>
  <Section position="1" start_page="0" end_page="0" type="abstr">
    <SectionTitle>
Abstract
</SectionTitle>
    <Paragraph position="0"> We investigate the problem of summarizing text documents that contain errors as a result of optical character recognition. Each stage in the process is tested, the error effects analyzed, and possible solutions suggested. Our experimental results show that current approaches, which are developed to deal with clean text, suffer significant degradation even with slight increases in the noise level of a document. We conclude by proposing possible ways of improving the performance of noisy document summarization.</Paragraph>
  </Section>
</Paper>