<?xml version="1.0" standalone="yes"?>
<Paper uid="N06-2036">
<Title>Word Domain Disambiguation via Word Sense Disambiguation</Title>
<Section position="4" start_page="142" end_page="142" type="evalu">
<SectionTitle> 3 Evaluation </SectionTitle>
<Paragraph position="0"> To evaluate our WDD approach, we used both the SemCor and Senseval3 data sets. Both corpora were stripped of their sense annotations and processed with an extension of the WSD algorithm of Sanfilippo et al. (2006) to assign a WordNet sense to each noun, verb and adjective.</Paragraph>
<Paragraph position="1"> The extension enlarged the training data set to include a selection of WordNet examples (full sentences containing a main verb) and the Open Mind Word Expert corpus (Chklovski and Mihalcea 2002).</Paragraph>
<Paragraph position="2"> Both the original hand-coded word sense annotations of the SemCor and Senseval3 corpora and the word sense annotations assigned by the WSD algorithm used in this study were mapped into subject domain annotations using WordNet Domains, as described in the opening paragraph of section 2 above. The versions of the SemCor and Senseval3 corpora in which subject domain annotations were generated from hand-coded word senses served as the gold standard. A baseline for both corpora was obtained by assigning to each lemma the subject domain corresponding to sense 1 of the lemma.</Paragraph>
<Paragraph position="3"> WDD results of a tenfold cross-validation for the SemCor data set are given in Table 2. Accuracy is high across nouns, verbs and adjectives. To verify the statistical significance of these results against the baseline, we used a standard proportions comparison test (see Fleiss 1981, p. 30).</Paragraph>
<Paragraph position="4"> According to this test, the accuracy of our system is significantly better than the baseline.</Paragraph>
<Paragraph position="5"> The high accuracy of our WDD algorithm is corroborated by the results for the Senseval3 data set in Table 3. Such corroboration is important because the Senseval3 corpus was not part of the data set used to train the WSD algorithm that provided the basis for subject domain assignment. Our WDD algorithm compares favorably with the approach explored in Magnini and Strapparava (2000), who report 0.82 precision/recall in the WDD task for a subset of nouns in SemCor.</Paragraph>
<Paragraph position="6"> Suarez and Palomar (2002) report WDD results of 78.7% accuracy for nouns against a baseline of 68.7% accuracy for the same data set. As in the present study, Suarez and Palomar derive the baseline by assigning to each lemma the subject domain corresponding to sense 1 of the lemma. Unfortunately, a meaningful comparison with Suarez and Palomar (2002) is not possible because they use a different data set, the DSO corpus. We are currently repeating our study with the DSO corpus and will include the results of this evaluation in the final version of the paper, so that our results are directly comparable with those reported by Suarez and Palomar.</Paragraph>
</Section>
</Paper>