<?xml version="1.0" standalone="yes"?>
<Paper uid="W04-2411">
  <Title>Calculating Semantic Distance between Word Sense Probability Distributions</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Much attention has recently been given to calculating the similarity of word senses, in support of various natural language learning and processing tasks. Such techniques apply within a semantic hierarchy, or ontology, such as WordNet. Typical methods comprise an edge-distance measurement over the two sense nodes being compared within the hierarchy (Leacock and Chodorow, 1998; Rada et al., 1989; Wu and Palmer, 1994). Other approaches instead assume a probability distribution over the entire sense hierarchy; similarity is captured between individual senses by a formula over the information content (negative log probabilities) of relevant nodes (e.g., Jiang and Conrath, 1997; Lin, 1998).</Paragraph>
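For concreteness, the two information-content measures cited above can be sketched in a few lines of Python. The probabilities below are toy values; in practice each node's probability is estimated from corpus frequencies propagated up the hierarchy.

```python
import math

def information_content(prob):
    """Information content of a node: negative log of its probability."""
    return -math.log(prob)

def lin_similarity(p_a, p_b, p_lcs):
    """Lin (1998): similarity of two senses via the information content
    of each sense and of their lowest common subsumer (lcs)."""
    return 2.0 * information_content(p_lcs) / (
        information_content(p_a) + information_content(p_b))

def jiang_conrath_distance(p_a, p_b, p_lcs):
    """Jiang and Conrath (1997): distance as the information content the
    two senses do not share through their lowest common subsumer."""
    return (information_content(p_a) + information_content(p_b)
            - 2.0 * information_content(p_lcs))
```

Both measures return their extreme value (maximal similarity, zero distance) when the two senses coincide with their lowest common subsumer.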
    <Paragraph position="1"> The latter case assumes that there is a single WordNet probability distribution of interest, which is estimated by populating the hierarchy with word frequencies from an appropriate corpus (e.g., Jiang and Conrath, 1997). But some problems more naturally give rise to multiple conditional probability distributions estimated from counts that are conditioned on various contexts, such as different corpora or differing word usage within a single corpus. Each of these contexts would yield a distinct WordNet probability distribution, or what we will call a sense profile. In this situation, instead of asking how similar are two senses within a single sense profile, one may want to know how similar are two sense profiles--i.e., two (conditional) distributions across the entire set of nodes.</Paragraph>
    <Paragraph position="2"> This question could be important to a number of applications. When two sets of WordNet frequency counts are conditioned on differing contexts, a comparison of the resulting probability distributions can give us a measure of the degree of semantic similarity of the conditioning contexts themselves. These conditioning contexts may be any relevant ones defined by the application, such as differing sets of documents (to support asking how similar various document collections are), or differing usages of words within or across document collections (to support asking questions about the similarity of various words in their usages). For example, we foresee comparing the sense profile of the objects of some verb in a particular set of documents to that of its objects in another set of documents, as an indicator of differing senses of the verb across the collections.</Paragraph>
    <Paragraph position="3"> We have developed a general method for answering such questions, formulating a measure of the distance between probability distributions defined over an ontological hierarchy, which we call &quot;sense profile distance,&quot; or SPD. SPD is calculated as a tree distance that aggregates the individual semantic distances between nodes in the hierarchy, weighted by their probability in the two sense profiles. SPD can be calculated between two probability distributions over any hierarchy that supports a user-supplied semantic distance function. (In fact, the two sense profiles need not strictly be probability distributions--the measure is well-defined as long as the sum of the values of the two sense profiles is equal.) We demonstrate our method on a problem that arises in lexical acquisition, of determining whether two different argument positions across syntactic usages of a verb are assigned the same semantic role. For example, even though the truck shows up in two different syntactic positions, it is the Destination of the action in both of the sentences I loaded the truck with hay and I loaded hay onto the truck. Automatic detection of such argument alternations is important to acquisition of verb lexical semantics (Dang et al., 2000; Dorr and Jones, 2000; Merlo and Stevenson, 2001; Schulte im Walde and Brew, 2002; Tsang et al., 2002), and moreover, may play a role in automatic processing of language for applied tasks, such as question-answering (Katz et al., 2001), information extraction (Riloff and Schmelzenbach, 1998), detection of text relations (Teufel, 1999), and determination of verb-particle constructions (Bannard, 2002). We focus on this problem to illustrate how our general method works, and how it aids in a particular natural language learning task.</Paragraph>
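The paper defines SPD formally in Section 3; the Python sketch below only illustrates the idea as described here, aggregating pairwise node distances weighted by the probability mass each profile places on the nodes. The toy hierarchy, the path-length distance, and the product weighting are all assumptions chosen for illustration, not the paper's actual formulation.

```python
# Toy hierarchy: each node maps to its parent (None marks the root).
PARENT = {"dog": "animal", "cat": "animal", "truck": "artifact",
          "animal": "entity", "artifact": "entity", "entity": None}

def root_path(node):
    """Nodes from `node` up to the root, inclusive."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path

def path_distance(a, b):
    """Edge-count distance between two nodes via their nearest common
    ancestor (one possible user-supplied distance function)."""
    pa, pb = root_path(a), root_path(b)
    shared = set(pa).intersection(pb)
    return min(pa.index(c) + pb.index(c) for c in shared)

def sense_profile_distance(p, q, node_dist):
    """Hypothetical SPD-style aggregation: sum every pairwise node
    distance, weighted by the probability each profile assigns to the
    two nodes. p, q: dicts mapping hierarchy node to probability mass."""
    return sum(p[a] * q[b] * node_dist(a, b) for a in p for b in q)
```

Under this sketch, identical point profiles yield distance zero, while profiles concentrated on semantically distant nodes yield larger values; as the text notes, the measure only requires that the two profiles have equal total mass.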
    <Paragraph position="4"> As in McCarthy (2000), we cast argument alternation detection as a comparison of sense profiles across two different argument positions of a verb. Our method differs, however, in two important respects. First, our measure can be used on any probability distribution, while McCarthy's approach applies only to a very narrow form of sense profile known as a tree cut.1 The dependence on tree cuts greatly limits the applicability of her measure in both this and other problems, since only a particular method can be used for populating the WordNet hierarchy with probability estimates. Second, our approach provides a much finer-grained measure of the distance between the two profiles. McCarthy's method rewards probability mass that occurs in the same subtree across two distributions, but does not take into account the distance between the classes that carry the probability mass.</Paragraph>
    <Paragraph position="5"> Our new SPD method integrates a comparison of probability distributions over WordNet with a node distance measure. SPD thus enables us to calculate a more detailed comparison over the probability patterns of WordNet classes. As our results indicate, this has advantages for argument alternation detection, but more importantly, we think it is crucial for generalizing the method to a wider range of problems.</Paragraph>
    <Paragraph position="6"> 1A tree cut for tree T is a set of nodes C in T such that every leaf node of T has exactly one member of C on a path between it and the root (Li and Abe, 1998). As a sense profile, a tree cut will have a non-zero probability associated with every node in C, and a zero probability for all other nodes in T. Figure 1 in Section 3 has examples of two tree cuts.</Paragraph>
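The tree-cut property in this footnote can be checked mechanically. A minimal sketch, assuming the tree is encoded as a dict from each node to its list of children (an encoding chosen here for illustration):

```python
def is_tree_cut(children, root, cut):
    """Li and Abe (1998): `cut` is a tree cut iff every leaf has exactly
    one member of `cut` on the path between it and the root, inclusive.
    children: dict mapping each internal node to its child nodes."""
    def check(node, cut_members_seen):
        if node in cut:
            cut_members_seen += 1
        kids = children.get(node, [])
        if not kids:  # leaf: its root path must contain exactly one cut node
            return cut_members_seen == 1
        return all(check(kid, cut_members_seen) for kid in kids)
    return check(root, 0)
```

A set that leaves some leaf uncovered, or that places two members on one root path, fails the check.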
    <Paragraph position="7"> In the next section, we present background work on comparing sense profiles, and on using them to detect alternations. In Section 3, we describe our new SPD measure, and show how it captures both the general differences between WordNet probability distributions and the fine-grained semantic distances between the nodes that comprise them. Section 4 presents our corpus methodology and experimental set-up. In Section 5, we evaluate SPD against other distance measures, and evaluate the different effects of our experimental factors, such as the precise distance functions we use in SPD and the division of our verbs into frequency bands. By classifying the frequency bands separately, our method achieves a combined accuracy of 70% overall on unseen test verbs, in a task with a baseline of 50%. We summarize our findings in Section 6 and point to directions in our on-going work.</Paragraph>
  </Section>
</Paper>