File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/abstr/06/w06-1707_abstr.xml

Size: 803 bytes

Last Modified: 2025-10-06 13:45:30

<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-1707">
  <Title>Corporator: A tool for creating RSS-based specialized corpora Cedrick Fairon</Title>
  <Section position="1" start_page="0" end_page="0" type="abstr">
    <SectionTitle>
Abstract
</SectionTitle>
    <Paragraph position="0"> This paper presents a new approach and a software tool for collecting specialized corpora from the Web. The approach takes advantage of RSS (Really Simple Syndication), a very popular XML-based standard used on the Web for sharing content among websites. After a brief introduction to RSS, we explain why this type of data source is of interest for corpus development. Finally, we present Corporator, an open-source program designed for collecting corpora from RSS feeds.</Paragraph>
  </Section>
</Paper>