<?xml version="1.0" standalone="yes"?>
<Paper uid="W02-1029">
  <Title>Ensemble Methods for Automatic Thesaurus Extraction</Title>
  <Section position="9" start_page="0" end_page="0" type="concl">
    <SectionTitle>
8 Conclusion
</SectionTitle>
    <Paragraph position="0"> This paper demonstrates the effectiveness of ensemble methods for thesaurus extraction and investigates the performance of ensemble extractors on corpora ranging up to 300 million words in size. Contrary to work reported by Banko and Brill (2001), the ensemble methods continue to outperform the best individual systems for very large corpora. The trend in Figure 3 suggests that this may continue for corpora even larger than we have experimented with.</Paragraph>
    <Paragraph position="1"> Further, this paper examines the differences between thesaurus extraction and confusion set disambiguation, and links ensemble ef cacy to the nature of each task and the problems of representation sparseness and noise. This is done by evaluating ensembles with varying levels of contextual complexity and constraints.</Paragraph>
    <Paragraph position="2"> The poorly constrained window methods, where contextual correlation is often low, outperformed the ensembles, which parallels results from (Banko and Brill, 2001). This suggests that large training sets ameliorate the predominantly noise-induced bias of the best individual learner better than amortising the bias over many similar ensemble constituents. Noise is reduced as occurrence counts stabilise with larger corpora, improving individual classi er performance, which in turn causes ensemble constituents to converge, reducing complementarity.</Paragraph>
    <Paragraph position="3"> This reduces the ef cacy of classi er combination and contributes to individual classi ers outperforming the ensemble methods.</Paragraph>
    <Paragraph position="4"> For more complex, constrained methods the same principles apply. Since the correlation between context and target is much stronger, there is less noise in the representation. However, the added constraints reduce the number of contextual relations extracted from each sentence, leading to data sparseness. These factors combine so that ensemble methods continued to outperform the best individual methods.</Paragraph>
    <Paragraph position="5"> Finally, corpus size must be considered with respect to the parameters of the contextual representation extracted from the corpus. The value of larger corpora is partly dependent on how much information is extracted from each sentence of training material. We fully expect individual thesaurus extractors to eventually outperform ensemble methods as sparseness and complementarity are reduced, but this is not true for 100 or 300 million words since the best performing representations extract very few contexts per sentence.</Paragraph>
    <Paragraph position="6"> We would like to further investigate the relationship between contextual complexity, data sparseness, noise and learner bias on very large corpora. This includes extending these experiments to an even larger corpus with the hope of establishing the cross over point for thesaurus extraction. Finally, although wider machine learning research uses large ensembles, many NLP ensembles use only a handful of classi ers. It would be very interesting to experiment with a large number of classi ers using bagging and boosting techniques on very large corpora.</Paragraph>
  </Section>
class="xml-element"></Paper>