File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/concl/06/w06-1705_concl.xml

Size: 1,096 bytes

Last Modified: 2025-10-06 13:55:43

<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-1705">
  <Title>Annotated web as corpus</Title>
  <Section position="7" start_page="31" end_page="31" type="concl">
    <SectionTitle>
6 Conclusion
</SectionTitle>
    <Paragraph position="0"> Future work includes an analysis of the balance between computational and bandwidth requirements. When distributing corpus annotation, it is essential that each work-unit require only a small amount of data transmission in return for a large computational gain.</Paragraph>
    <Paragraph position="1"> In this paper, we have discussed the requirement for annotation of web-derived corpus data. Currently, the tagging of web-derived corpus data is a bottleneck because of the sheer volume of corpus processing involved. Our proposal is to construct a framework for large-scale distributed corpus annotation using existing peer-to-peer technology. We have presented the challenges that lie ahead for such an approach. Work is now underway to address the clean-up of PDF data for inclusion in corpora downloaded from the web.</Paragraph>
  </Section>
</Paper>