File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/concl/06/w06-1710_concl.xml

Size: 1,337 bytes

Last Modified: 2025-10-06 13:55:43

<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-1710">
  <Title>Web Corpus Mining by instance of Wikipedia</Title>
  <Section position="6" start_page="72" end_page="72" type="concl">
    <SectionTitle>
4 Conclusion
</SectionTitle>
    <Paragraph position="0"> We presented a cluster-based approach to structure learning for web documents, as a step toward a combined algorithm for webgenre exploration and categorization. As argued in Section 1, such an algorithm is needed in web corpus linguistics for webgenre tagging, which is a prerequisite for measuring genre-sensitive collocations. To evaluate the approach, we used a corpus of wiki-based articles. The evaluation showed that there is an information gain when the similarities of web documents are measured irrespective of their lexical content. This is in line with the genre model of systemic functional linguistics (Ventola, 1987), which predicts an impact of genre membership on text structure. Because the corpus used for evaluation is limited to tree-like structures, the approach needs further development. Future work will address this, especially the classification of graph-like representations of web documents that take their link structure into account.</Paragraph>
  </Section>
</Paper>