<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-2608">
  <Title>Syntagmatic Kernels: a Word Sense Disambiguation Case Study</Title>
  <Section position="7" start_page="61" end_page="61" type="evalu">
    <SectionTitle>
6 Evaluation
</SectionTitle>
    <Paragraph position="0"> In this section we evaluate the Syntagmatic Kernel, showing that it improves over the standard feature extraction technique based on bigrams and trigrams of words and PoS tags.</Paragraph>
    <Section position="1" start_page="61" end_page="61" type="sub_section">
      <SectionTitle>
6.1 Experimental settings
</SectionTitle>
      <Paragraph position="0"> We conducted the experiments on two lexical sample tasks (English and Italian) of the Senseval-3 competition (Mihalcea and Edmonds, 2004). In lexical-sample WSD, a set of target words is selected and training data is provided as a set of texts.</Paragraph>
      <Paragraph position="1"> In each text, a given target word is manually annotated with a sense from a predetermined set of possibilities. Table 2 describes the tasks by reporting the number of words to be disambiguated, the mean polysemy, and the size of the training, test, and unlabeled corpora. Note that the organizers of the English task did not provide any unlabeled material.</Paragraph>
      <Paragraph position="2"> For English we therefore used a domain model (DM) built from the training partition of the task (ignoring the sense annotations), while for Italian we acquired the DM from the unlabeled corpus made available by the organizers.</Paragraph>
    </Section>
    <Section position="2" start_page="61" end_page="61" type="sub_section">
      <SectionTitle>
6.2 Performance of the Syntagmatic Kernel
</SectionTitle>
      <Paragraph position="0"> Table 3 shows the performance of the Syntagmatic Kernel on both data sets. As a baseline, we report the result of a standard approach consisting of explicit bigrams and trigrams of words and PoS tags around the words to be disambiguated (Yarowsky, 1994). The results show that the Syntagmatic Kernel outperforms the baseline in every configuration (hard and soft matching). The soft-matching criteria further improve the classification performance. It is interesting to note that the Domain Proximity methodology obtained better results than WordNet Synonymy.</Paragraph>
      <Paragraph position="1"> The different results observed between Italian and English using the Domain Proximity soft-matching criterion are probably due to the small size of the unlabeled English corpus.</Paragraph>
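      <Paragraph> The baseline feature extraction mentioned above can be illustrated with a minimal sketch. It is not the exact feature set of Yarowsky (1994) or of our system: the function name local_ngram_features, the window size, and the feature naming scheme are assumptions made only for illustration. The sketch collects every contiguous bigram and trigram of words and of PoS tags that falls inside a fixed window around the target word, and records its offset relative to the target.

```python
def local_ngram_features(words, tags, target, window=3):
    """Return explicit bigram/trigram features around position `target`.

    `words` and `tags` are parallel lists (tokens and their PoS tags).
    The window size and the feature string format are illustrative
    assumptions, not values taken from the paper.
    """
    lo = max(0, target - window)               # left edge of the window
    hi = min(len(words), target + window + 1)  # right edge (exclusive)
    features = set()
    for name, seq in (("w", words), ("p", tags)):
        for n in (2, 3):                       # bigrams, then trigrams
            for start in range(lo, hi - n + 1):
                offset = start - target        # position relative to target
                gram = "_".join(seq[start:start + n])
                features.add(f"{name}{n}@{offset}:{gram}")
    return features


words = ["he", "sat", "on", "the", "bank", "of", "the", "river"]
tags = ["PRP", "VBD", "IN", "DT", "NN", "IN", "DT", "NN"]
feats = local_ngram_features(words, tags, target=4, window=2)
```

      Each feature is matched exactly in this representation, which corresponds to the hard-matching setting; the soft-matching criteria discussed in this section relax that exact match.</Paragraph>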
      <Paragraph position="2"> In these experiments, the parameters n and λ are optimized by cross-validation. For KnColl, we obtained the best results with n = 2 and λ = 0.5. For KnPoS, n = 3 and λ → 0. The domain cardinality k' was set to 50.</Paragraph>
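      <Paragraph> As a hedged illustration of how two kernel parameters can be selected by cross-validation, the following sketch runs a grid search with k-fold splitting. The number of folds, the candidate grids, and the callback train_and_score are all assumptions: the callback is a hypothetical placeholder for training a classifier on one split and returning its score on the held-out part, and it does not reproduce our actual experimental code.

```python
from itertools import product


def k_fold_indices(n_items, k):
    """Split range(n_items) into k contiguous folds of near-equal size."""
    base, rem = divmod(n_items, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if rem > i else 0)  # first `rem` folds get one extra
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def grid_search_cv(examples, train_and_score, n_values, decay_values, k=5):
    """Return (mean_score, n, decay) for the best parameter pair.

    `train_and_score(train, dev, n, decay)` is a hypothetical callback
    that trains on `train` and returns a score measured on `dev`.
    """
    folds = k_fold_indices(len(examples), k)
    best = None
    for n, decay in product(n_values, decay_values):
        scores = []
        for held_out in folds:
            held = set(held_out)
            train = [ex for i, ex in enumerate(examples) if i not in held]
            dev = [examples[i] for i in held_out]
            scores.append(train_and_score(train, dev, n, decay))
        mean = sum(scores) / len(scores)
        if best is None or mean > best[0]:
            best = (mean, n, decay)
    return best
```

      In practice the callback would train the kernel classifier for one target word; the grids passed as n_values and decay_values are the only place where candidate settings are listed.</Paragraph>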
      <Paragraph position="3"> Finally, the global performance (F1) of the full WSD system (see Section 5) is 73.3 on the English lexical sample task and 61.3 on the Italian one. To our knowledge, these figures represent the current state of the art on these tasks.</Paragraph>
    </Section>
  </Section>
</Paper>