<?xml version="1.0" standalone="yes"?> <Paper uid="W04-2807"> <Title>Different Sense Granularities for Different Applications</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> The difficulty of finding consistent criteria for making sense distinctions has been thoroughly attested to in the literature (Kilgarriff, '97, Hanks, '00). Difficulties have been found with truth-theoretical criteria, linguistic criteria and definitional criteria (Sparck-Jones, '86, Geeraerts, '93). In spite of the proliferation of dictionaries, there is no methodology by which two lexicographers working independently are guaranteed to derive the same set of distinctions for a given word, with objects and events vying for which is the most difficult to characterize (Cruse, '86, Apresjan, '74, Pustejovsky, '91, '95).</Paragraph> <Paragraph position="1"> On the other hand, accurate Word Sense Disambiguation (WSD) could significantly improve the precision of Information Retrieval by ensuring that the senses of verbs in the retrieved documents match the sense of the verb in the query. For example, the two queries "What do you call a successful movie?" and "Whom do you call for a successful movie?", submitted to AskJeeves, retrieve the same set of documents, even though they ask quite different questions and reference very different senses of call. The documents retrieved are also not very relevant, again because the search does not distinguish which matches contain relevant senses and which do not.</Paragraph> <Paragraph position="2"> Tips on Being a Successful Movie Vampire ... I shall call the police.</Paragraph> <Paragraph position="3"> Successful Casting Call &amp; Shoot for ``Clash of Empires'' ... 
thank everyone for their participation in the making of yesterday's movie.</Paragraph> <Paragraph position="4"> Demme's casting is also highly entertaining, although I wouldn't go so far as to call it successful. This movie's resemblance to its predecessor is pretty vague...</Paragraph> <Paragraph position="5"> VHS Movies: Successful Cold Call Selling: Over 100 New Ideas, Scripts, and Examples from the Nation's Foremost Sales Trainer.</Paragraph> <Paragraph position="6"> The two senses of call in the two queries can be easily distinguished by their differing predicate-argument structures. They are also separate senses in WordNet, but WordNet has an additional 26 senses for call, and the current best performance of an automatic Word Sense Disambiguation system on this type of polysemous verb is only 60.2% (Dang and Palmer, 2002). Is it possible that sense distinctions that are less fine-grained than WordNet's distinctions could be made more reliably, and could still benefit this type of NLP application? The idea of underspecification as a solution to WSD has been proposed by Buitelaar (2000), among others, who pointed out that for some applications, such as document categorization, information retrieval, and information extraction, it may be sufficient to know if a given word belongs to a certain class of WordNet senses or under-specified sense. On the other hand, there is evidence that machine translation of languages as diverse as Chinese and English will require all of the fine-grained sense distinctions that WordNet is capable of providing, and even more (Ng, et al., 2003, Palmer, et al., to appear).</Paragraph> <Paragraph position="7"> A hierarchical approach to verb senses, of the type discussed in this paper, presents obvious advantages for the problem of word sense disambiguation. The human annotation task is simplified, since there are fewer choices at each level and clearer distinctions between them. 
The automated systems can combine training data from closely related senses to overcome the sparse data problem, and both humans and systems can back off to a more coarse-grained choice when fine-grained choices prove too difficult.</Paragraph> <Paragraph position="8"> The approach to verb senses presented in this paper assumes three different levels of sense distinctions: PropBank Framesets, WordNet groupings, and WordNet senses. In a project for the semantic annotation of predicate-argument structure, PropBank, we have made coarse-grained sense distinctions for the 700 most polysemous verbs in the Penn TreeBank (Kingsbury and Palmer, '02). These distinctions are based primarily on different subcategorization frames that require different argument label annotations. In a separate project, as discussed in Palmer et al. 2004, we have grouped SENSEVAL-2 verb senses (which came from WordNet 1.7). These manual groupings were shown to reconcile a substantial portion of the manual and automatic tagging disagreements, indicating that many of these disagreements are fairly subtle (Palmer, et al., '04).</Paragraph> <Paragraph position="9"> The three levels of sense distinctions form a continuum of granularity. Our criterion for the Framesets, being primarily syntactic, is also the most clear-cut. These distinctions are based primarily on usages of a verb that have different numbers of predicate-arguments; however, they also separate verb senses on semantic grounds if these senses are not closely related. Sense groupings provide an intermediate level of hierarchy, where groups are distinguished by more fine-grained criteria. Both Frameset and grouping distinctions can be made consistently by humans and systems (over 90% accuracy for Framesets and 82% for groupings) and are surprisingly compatible; 95% of our groups map directly onto a single PropBank sense.</Paragraph> </Section> </Paper>