<?xml version="1.0" standalone="yes"?>
<Paper uid="P06-2090">
  <Title>Implementing a Characterization of Genre for Automatic Genre Identification of Web Pages</Title>
  <Section position="4" start_page="700" end_page="701" type="intro">
    <SectionTitle>
2 Background
</SectionTitle>
    <Paragraph position="0"> Although both Crowston and Williams (2000) and Shepherd and Watters (1998) describe the evolution of genres on the web well, when it comes to the actual genre identification of web pages (Roussinov et al., 2001, and Shepherd et al., 2004, respectively), they set aside the evolutionary aspect and consider genre from a static point of view. For Crowston and Williams (2000) and the follow-up study by Roussinov et al. (2001), most genres imply a combination of &lt;purpose/function, form, content&gt; and, because genres are complex entities, a multi-facetted classification seems appropriate (Kwasnik and Crowston, 2004). For Shepherd and Watters (1998) and the practical implementation by Shepherd et al. (2004), cybergenres or web genres are characterized by the triple &lt;content, form, functionality&gt;, where functionality is a key evolutionary aspect afforded by the web.</Paragraph>
    <Paragraph position="1"> Crowston and co-workers have not yet implemented the combination of &lt;purpose/function, form, content&gt; together with the facetted classification in any automatic classification model, but the tuple &lt;content, form, function&gt; has been employed by Rehm (2006) in an original approach to the analysis of a single web genre, the personal home page in the academic domain. Rehm (2006) describes the relationship between HTML and web genres and depicts the evolutionary processes that shape and form web genres. In the practical implementation, however, he focuses only on a single web genre, the academic's personal home page, which is seen from a static point of view. As far as we know, Boese and Howe (2005) is the only study that tries to implement a diachronic view of the genre of web pages, using the triple &lt;style, form, content&gt;. This study has the practical aim of finding out whether feature sets for genre identification need to be changed or updated because of genre evolution. The authors tried to detect the change by applying a classifier to two parallel corpora separated by a six-year gap. Although this study does not focus on how to detect newly created web genres or how to deal with difficult web pages, it is an interesting starting point for traditional diachronic analysis applied to automatic genre classification.</Paragraph>
    <Paragraph position="2"> In contrast, the model described in this paper aims at pointing out genre hybridism and individualisation in web pages. These two phenomena can be interpreted in terms of genre evolution in future investigations.</Paragraph>
  </Section>
</Paper>