<?xml version="1.0" standalone="yes"?> <Paper uid="W04-3204"> <Title>Unsupervised WSD based on automatically retrieved examples: The importance of bias</Title> <Section position="5" start_page="5" end_page="8" type="evalu"> <SectionTitle> 5 Evaluation </SectionTitle> <Paragraph position="0"> In all experiments, the recall of the systems is presented as the evaluation measure. Coverage is total (because of the high overlap of topical features), so recall and precision are the same; the only exception is the experiment in Section 4.3, where the coverage is only partial when local features are used.</Paragraph> <Paragraph position="1"> In order to evaluate the acquired corpus, our first task was to analyze the impact of bias. The results are shown in Table 5. There are two figures for each distribution: (1) simply assigning the first-ranked sense, and (2) using the monosemous corpus following the predetermined bias. As described in Section 3, the testing part of the Senseval-2 lexical-sample data was used for evaluation. We also include the results using the Senseval-2 bias, which is taken from the training part. The recall per word for some distributions can be seen in Table 4.</Paragraph> <Paragraph position="2"> The results show clearly that recall improves significantly when bias information from a hand-tagged corpus is used, even when the bias comes from a corpus (Semcor) different from the target corpus (Senseval). The bias is useful by itself, and the higher the performance of the first-ranked-sense heuristic, the lower the gain from using the monosemous corpus. We want to note that in fully unsupervised mode we attain a recall of 43.2% with the automatic ranking. Using the minimally supervised information of bias, we get 49.8% if we take the bias from an external corpus (Semcor) and 57.5% if we have access to the bias of the target corpus (Senseval). These results show clearly that the acquired corpus carries useful information about the word senses, and that bias is extremely important.</Paragraph> <Paragraph position="3"> [Table 5 caption: Results using the monosemous corpus with Senseval-2 training bias (MR, and substitution), Semcor bias, and Automatic bias. The Senseval-2 results are given by feature type.]</Paragraph> <Paragraph position="4"> We will present two further experiments performed with the monosemous corpus resource. The goal of the first is to measure the WSD performance achieved using Semcor as the only supervised data source. In the second experiment, we compare the performance of our totally unsupervised approach (monosemous corpus and automatic bias) with other unsupervised approaches in the Senseval-2 English lexical-sample task.</Paragraph> <Section position="1" start_page="7" end_page="7" type="sub_section"> <SectionTitle> 5.1 Monosemous corpus and Semcor bias </SectionTitle> <Paragraph position="0"> In this experiment we compared the performance obtained using the monosemous corpus (with Semcor bias and minimum ratio) with that obtained using the examples from Semcor. We noted clear differences depending on the number of training examples available for each word, and therefore studied each of the word-sets described in Section 3.4 separately. The results per word-set are shown in Table 6. The figures correspond to the recall when training on Semcor, on the web corpus, and on the combination of both.</Paragraph>
<Paragraph position="1"> If we focus on set B (words with fewer than 10 examples in Semcor), we see that the MFS figure is very low (40.1%). Some words have no occurrences at all in Semcor, and for these the sense is chosen at random. It made no sense to train the DLs for this set, so that result is not in the table. For this set the bias information from Semcor is also scarce, but the DLs trained on the web corpus raise the performance to 47.8%.</Paragraph> <Paragraph position="2"> For set A the average number of examples is higher, which raises the Semcor MFS result (51.9%). We see that the recall for DLs trained on Semcor is lower than the MFS baseline (50.5%). The main reason for these low results is the difference between the training and testing corpora (Semcor and Senseval). Previous work on the portability of hand-tagged corpora has shown that factors such as the genre or topic of the corpus heavily affect the results (Martinez and Agirre, 2000). If we train on the web corpus the results improve, and the best results are obtained with the combination of both corpora, reaching 51.6%. We note, however, that this is still lower than the Semcor MFS.</Paragraph> <Paragraph position="3"> Finally, we examine the results for the whole set of nouns in the Senseval-2 lexical sample (last row in Table 6), where we see that the best approach relies on the web corpus. In order to disambiguate the 29 nouns using only Semcor, we apply MFS when there are fewer than 10 examples (set B) and train the DLs for the rest.</Paragraph> <Paragraph position="4"> The results in Table 6 show that the web corpus raises recall, and the best results are obtained by combining the Semcor data and the web examples (50.3%). As noted above, the web corpus is especially useful when there are few examples in Semcor (set B), so we ran another test, using the web corpus only for set B and applying MFS for set A. The recall was slightly better (50.5%), as shown in the last column.</Paragraph> <Paragraph position="5"> [Table 6 caption (fragment): "... web corpus (Semcor bias), and a combination of both, compared to that of the Semcor MFS."]</Paragraph> </Section> <Section position="2" start_page="7" end_page="8" type="sub_section"> <SectionTitle> 5.2 Monosemous corpus and Automatic bias (unsupervised method) </SectionTitle> <Paragraph position="0"> In this experiment we compared the performance of our unsupervised system with other approaches. For this goal, we used the resources available from the Senseval-2 competition, where the answers of the participating systems in the different tasks were available. This made it possible to compare our results with those of the other systems deemed unsupervised by the organizers, on the same test data and set of nouns.</Paragraph> <Paragraph position="1"> [Table 7 caption (fragment): "... sample, using different bias to create the corpus. The type column shows the kind of system."]</Paragraph>
<Paragraph position="2"> Of the 5 systems presented as unsupervised in the Senseval-2 lexical-sample task, the WASP-Bench system relied on lexicographers to hand-code information semi-automatically (Tugwell and Kilgarriff, 2001). This system does not use the training data, but as it relies on manually coded knowledge we think it falls clearly into the supervised category.</Paragraph> <Paragraph position="3"> The results for the other 4 systems and our own are shown in Table 7. We show the results for the totally unsupervised system and the minimally supervised system (Semcor bias). We classified the UNED system (Fernandez-Amoros et al., 2001) as minimally supervised: it does not use hand-tagged examples for training, but some of the heuristics applied by the system rely on the bias information available in Semcor. The distribution of senses is used to discard low-frequency senses, and also to choose the first sense as a back-off strategy. Under the same conditions, our minimally supervised system attains 49.8% recall, nearly 5 points more.</Paragraph> <Paragraph position="4"> The rest of the systems are fully unsupervised, and they perform significantly worse than our system.</Paragraph> </Section> </Section> </Paper>