<?xml version="1.0" standalone="yes"?>
<Paper uid="N04-1016">
  <Title>The Web as a Baseline: Evaluating the Performance of Unsupervised Web-based Models for a Range of NLP Tasks</Title>
  <Section position="10" start_page="0" end_page="0" type="concl">
    <SectionTitle>
9 Conclusions
</SectionTitle>
    <Paragraph position="0"> We showed that simple, unsupervised models using web counts can be devised for a variety of NLP tasks. The tasks were selected so that they cover both syntax and semantics, both generation and analysis, and a wider range of n-grams than have been previously used.</Paragraph>
    <Paragraph position="1"> For all but two tasks (candidate selection for MT and noun countability detection) we found that simple, unsupervised models perform significantly better when n-gram frequencies are obtained from the web rather than from a standard large corpus. This result is consistent with Keller and Lapata's (2003) findings that the web yields better counts than the BNC. The reason for this seems to be that the web is much larger than the BNC (about 1000 times); the size seems to compensate for the fact that simple heuristics were used to obtain web counts, and for the noise inherent in web data.</Paragraph>
    <Paragraph position="2"> Our results were less encouraging when it comes to comparisons with state-of-the-art models. We found that in all but one case, web-based models fail to significantly outperform the state of the art. The exception was compound noun interpretation, for which the Altavista model was significantly better than the Lauer's (1995) model.</Paragraph>
    <Paragraph position="3"> For three tasks (candidate selection for MT, adjective ordering, and compound noun bracketing), we found that the performance of the web-based models was not significantly different from the performance of the best models reported in the literature.</Paragraph>
    <Paragraph position="4"> Note that for all the tasks we investigated, the best performance in the literature was obtained by supervised models that have access not only to simple bigram or trigram frequencies, but also to linguistic information such as part-of-speech tags, semantic restrictions, or context (or a thesaurus, in the case of Lauer's models). When unsupervised web-based models are compared against supervised methods that employ a wide variety of features, we observe that having access to linguistic information makes up for the lack of vast amounts of data.</Paragraph>
    <Paragraph position="5"> Our results therefore indicate that large data sets such as those obtained from the web are not the panacea that they are claimed to be (at least implicitly) by authors such as Grefenstette (1998) and Keller and Lapata (2003).</Paragraph>
    <Paragraph position="6"> Rather, in our opinion, web-based models should be used as a new baseline for NLP tasks. The web baseline indicates how much can be achieved with a simple, unsupervised model based on n-grams with access to a huge data set. This baseline is more realistic than baselines obtained from standard corpora; it is generally harder to beat, as our comparisons with the BNC baseline throughout this paper have shown.</Paragraph>
    <Paragraph position="7"> Note that for certain tasks, the performance of a web baseline model might actually be sufficient, so that the effort of constructing a sophisticated supervised model and annotating the necessary training data can be avoided.</Paragraph>
    <Paragraph position="8"> Another possibility that needs further investigation is the combination of web-based models with supervised methods. This can be done with ensemble learning methods or simply by using web-based frequencies (or probabilities) as features (in addition to linguistically motivated features) to train supervised classifiers.</Paragraph>
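    The second option above can be sketched as follows. Here a log-scaled web count is combined with a binary linguistic feature in a single feature vector; the feature names, the toy data, and the tiny perceptron are illustrative assumptions, not the setup of any model in the paper.

```python
import math

# Sketch: web-based frequencies as features alongside linguistically
# motivated ones, fed to a simple supervised classifier (a perceptron).

def features(bigram_count, pos_match):
    # Log-scaled web count, a binary linguistic feature (e.g. a POS
    # pattern match), and a constant bias term.
    return [math.log(1 + bigram_count), 1.0 if pos_match else 0.0, 1.0]

def train_perceptron(data, epochs=20):
    w = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        for x, y in data:  # y in {-1, +1}
            if y * sum(wi * xi for wi, xi in zip(w, x)) <= 0:
                w = [wi + y * xi for wi, xi in zip(w, x)]
    return w

# Toy training set: high web counts plus a POS match mark positives.
data = [
    (features(50000, True), +1),
    (features(20000, True), +1),
    (features(10, False), -1),
    (features(3, False), -1),
]
w = train_perceptron(data)
score = sum(wi * xi for wi, xi in zip(w, features(40000, True)))
print(score > 0)  # -> True
```

    An ensemble variant would instead treat the web baseline's own prediction as one vote among several component models.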
  </Section>
</Paper>