
<?xml version="1.0" standalone="yes"?>
<Paper uid="W04-0861">
  <Title>The &quot;Meaning&quot; System on the English Allwords Task</Title>
  <Section position="5" start_page="0" end_page="0" type="metho">
    <SectionTitle>
4 Integration of WSD modules
</SectionTitle>
    <Paragraph position="0"> All the individual modules have to be integrated in order to construct a complete allwords WSD system. Following the architecture described in section 1, we decided to apply the unsupervised modules only to the subset of the corpus not covered by the training examples. Some efforts on applying the unsupervised modules jointly with the supervised failed at improving accuracy. See an example in table 3.</Paragraph>
    <Paragraph position="1">  As a first approach, we devised three baseline systems (Base-1, Base-2, and Base-3), which use the best modules available in both subsets. Base-1 applies the SVMa8 supervised method and the MFS for the non supervised part. Base-2 applies also the SVMa8 supervised method and the cascade DDDa0 -MFS for the non supervised part (MFS is used in the cases in which DDDa0 abstains). Base-3 shares the same approach but uses a third unsupervised module: DDDa0 -DDDa1a3a2 -MFS.</Paragraph>
    <Paragraph position="2"> The precision results of the baselines systems can be found in the right hand side of table 3. As it can be observed, the positive contribution of the DDDa0 module is very significant since Base-2 performs</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.2 points higher than Base-1. The addition of the
</SectionTitle>
      <Paragraph position="0"> third unsupervised module (DDDa1 a2 ) makes Base-3 to gain 0.4 extra precision points.</Paragraph>
      <Paragraph position="1"> As simple combination schemes we considered majority voting and weighted voting. More sophisticated combination schemes are difficult to tune due to the extreme data sparseness on the validation set. In the case of unsupervised systems, these combination schemes degraded accuracy because the least accurate systems perform much worse that the best ones. Thus, we simply decided to apply a cascade of unsupervised modules sorted by precision on the Senseval-2 corpus.</Paragraph>
      <Paragraph position="2"> In the case of the supervised classifiers there is a chance of improving the global performance, since there are several modules performing almost as well as the best. Previous to the experiments, we calculated the agreement rates on the outputs of each pair of systems (low agreements increase the probability of uncorrelatedness between errors of different systems). We obtained an average agreement of 83.17%, with values between 64.7% (AB vs SVMa9 ) and 88.4% (SVMa9 vs cosVSM).</Paragraph>
      <Paragraph position="3"> The ensembles were obtained by incrementally aggregating, to the best performing classifier, the classifiers from a list sorted by decreasing accuracy. The ranking of classifiers can be performed by evaluating them at different levels of granularity: from particular words to the overall accuracy on the whole validation set. The level of granularity defines a tradeoff between classifier specialization and risk of overfitting to the tuning corpus. We decided to take an intermediate level of granularity, and sorted the classifiers according to their performance on word sets based on the number of training examples available3.</Paragraph>
      <Paragraph position="4"> Table 2 contains the results of the ranking experiment, by considering five word-sets of increasing number of training examples: between 10 and 20, between 21 and 40, between 41 and 80, etc. At each cell, the accuracy value is accompanied by the relative position the system achieves in that particular subset. Note that the resulting orderings, though highly correlated, are quite different from the one derived from the overall results.</Paragraph>
      <Paragraph position="5">  Table 3 shows the precision results4 of the Meaning system obtained on the whole Senseval-2 corpus by combining from 1 to 7 supervised classifiers according to the classifier orderings of table 2 for each subset of words. The unsupervised classifiers are all applied in a cascade sorted by precision. M-Vot stands for a majority voting scheme, while W-Vot refers to the weighted voting scheme. The weights for the classifiers are simply the accuracy values on the validation corpus. As an additional example, the column M-Vot+ shows the results of the voting scheme when the unsupervised DDDa0 module is also included in the ensemble. The table also includes the baseline results.</Paragraph>
      <Paragraph position="6"> Unfortunately, the ensembles of classifiers did not provide significant improvements on the final precision. Only in the case of weighted voting a slight improvement is observed when adding up to</Paragraph>
  </Section>
class="xml-element"></Paper>