<?xml version="1.0" standalone="yes"?> <Paper uid="W06-1103"> <Title>investigations</Title> <Section position="4" start_page="0" end_page="8" type="metho"> <SectionTitle> 2 Philosophical evidence </SectionTitle> <Paragraph position="0"> Children have a natural eagerness to recognize regularities in the world and to mimic the behavior of competent members of their linguistic community. In these words, Wittgenstein (1980) simply describes how infants acquire the language of their community. What underlies the activities surrounding a common use of language is similar to the way we use words to express something: &quot;Consider for example the proceedings that we call games. I mean board-games, card-games, ball-games, Olympic games, and so on.</Paragraph> <Paragraph position="1"> What is common to them all?&quot; (Wittgenstein, 1968: 66). Wittgenstein answers that these expressions are characterized by similarities he calls family resemblances.</Paragraph> <Paragraph position="2"> Given that a dictionary's purpose is to define concepts, we could hope to see such family resemblances among its definitions. Contrary to this intuition, Table 1 shows definitions and examples for a few senses of game in WordNet (footnote 1), and no resemblance can be found among them in terms of words shared by the definitions or examples. Nevertheless, humans are able to give graded judgments of similarity between different senses of the word game. For example, the similarity between sense 1 and sense 3 is intuitively greater than that between sense 1 and sense 4.</Paragraph> <Paragraph position="3"> Sense 1: A single play of a sport or other contest. Example: The game lasted two hours.</Paragraph> <Paragraph position="4"> Sense 2: A contest with rules to determine a winner. Example: You need four people to play this game.</Paragraph> <Paragraph position="5"> Sense 3: The game equipment needed in order to play a particular game. Example: 
The child received several games for his birthday.</Paragraph> <Paragraph position="6"> Sense 4: Your occupation or line of work. Example: He's in the plumbing game.</Paragraph> <Paragraph position="7"> Sense 5: A secret scheme to do something (especially something underhand or illegal). Example: [...] I saw through his little game from the start.</Paragraph> <Paragraph position="8"> Before being tempted to call up gigabytes of corpus data and computational power to help us identify the family resemblance emerging here, let us look further at the nature of that notion from a philosophical point of view. The possible senses of individual things can be traced back to Aristotle's work, where they are identified &quot;without qualification&quot; as the primary substance of a thing (Cassam, 1986). For Aristotle, what accounts for the substance of an object is the thing itself, namely its essence. Taking a slightly different view of the notion of a family of objects, Putnam (1977) instead pursues a quest for natural kinds. According to him, the distinguishing characteristics that &quot;hold together&quot; natural kinds are the &quot;core facts [...] conveying the use of words of that kind&quot; (Putnam, 1977: 118). Putnam rejects any analytical approach holding that the meaning of a word X is given by a conjunction of properties P = {P1, P2, ..., Pn} such that P is the essence of X. The problem is that a &quot;natural kind may have abnormal members&quot; (Putnam, 1977: 103). For instance, normal lemons have a yellow peel, but suppose, following Putnam, that a new environmental condition makes lemon peel become</Paragraph> </Section> <Section position="5" start_page="8" end_page="9" type="metho"> <SectionTitle> 1 See http://wordnet.princeton.edu/ </SectionTitle> <Paragraph position="0"> blue. An analytical view would be unable to state which of the yellow or the blue lemons are now the normal members of the natural class of lemons. 
Putnam instead relies on &quot;scientific theory construction&quot; to define what an object of a natural kind is. He therefore does not regard the fact that dictionaries &quot;are cluttered up [...] with pieces of empirical information&quot; (Putnam, 1977: 118) as a defect in conveying core facts about a natural class.</Paragraph> <Paragraph position="1"> In contrast to Putnam, Fodor (1998) is a virulent opponent of a mind-independent similarity semantics subject to scientific discovery. With his well-known doorknob example, Fodor shows that there is no natural kind, hidden essence or peculiar structure that makes a doorknob a doorknob. &quot;No doubt, some engineer might construct a counter-example - a mindless doorknob detector; and we might even come to rely on such a thing when groping for a doorknob in the dark&quot; (Fodor, 1998: 147). However, such a device would have to be built on what strikes us as doorknobhood, i.e. on what satisfies the doorknob stereotype: &quot;the gadget would have to be calibrated to us since there is nothing else in nature that responds selectively to doorknobs&quot; (Fodor, 1998: 147). According to Fodor, our capacity to acquire the concept of doorknob involves a similarity metric. It is the innate human capacity to determine which concepts are similar to doorknob that allows the characterization of doorknobhood.</Paragraph> <Paragraph position="2"> Therefore, Fodor states that the meaning of concepts is mind-dependent and that individuation is not intractable. Members of a language community, although they experience diverse forms of a concept, will tend to acquire similar stereotypes of that concept.</Paragraph> <Paragraph position="3"> This brief exploration of philosophical approaches to concept representation and delimitation can inform the way humans establish a gold standard for the word sense disambiguation (WSD) task. 
In fact, adherence to one model rather than another has an impact on who should perform the evaluation (footnote 2). Senseval-2 was in line with Putnam's view of a 'division of linguistic labour', relying on lexicographers' judgments to build its gold standard (Kilgarriff, 1998). Senseval-3, on the other hand, collected data via the Open-Mind Initiative (footnote 3). This was much more in line with Fodor's view that ordinary people can use their own similarity metric to disambiguate polysemous terms. Interestingly, a recent empirical study (Murray and Green, 2004) showed that judgments by ordinary people were consistent among themselves but differed from those of lexicographers. It is important to decide who the best judges are. That decision can certainly be based on the foreseen application, but also, as we suggest here, on theoretical grounds.</Paragraph> </Section> <Section position="6" start_page="9" end_page="21" type="metho"> <SectionTitle> 3 Psychological Evidence </SectionTitle> <Paragraph position="0"> We pursue our search for insights into how humans can establish gold standards for the WSD task, now trying to answer the &quot;how&quot; question rather than the &quot;who&quot; question. Fodor's view might lead us to decide that non-experts can perform similarity judgments, but it does not tell us how these judgments should be performed. Different psychological models offer possible answers. Similarity judgments have been studied extensively by experimental psychologists, and distinct theories provide some evidence for an internal human cognitive mechanism underlying such judgments. In this section, we present three approaches: subjective scaling and objective scaling (Voinov, 2002), and the semantic differential (Osgood et al. 
1957).</Paragraph> <Section position="1" start_page="9" end_page="21" type="sub_section"> <SectionTitle> 3.1 Subjective Scaling </SectionTitle> <Paragraph position="0"> In subjective scaling (Voinov, 2002), subjective human judgment is treated as convenient raw material for comparing empirical studies of similarity. Subjects are asked to point out the &quot;similarities among n objects of interest - whether concepts, persons, traits, symptoms, cultures or species&quot; (Shepard, 1974: 373). The similarity judgments are then represented in an n x n matrix of objects by a multidimensional scaling (MDS) of the distances between objects. Equation 1 shows the evaluation of similarity. There, $d(x_{ik}, x_{jk})$ stands for the distance between objects $x_i$ and $x_j$ on stimulus (dimension) $k$, and $w_k$ is the psychological salience of stimulus $k$:</Paragraph> <Paragraph position="2"> Shepard's MDS theory assumes a monotonic transformation from the nonmetric psychological salience of a stimulus to a metric space model. By definition, the resulting metric function $d$ over a set $X$ must fulfill the following conditions for all $x, y, z \in X$: (i) $d(x,y) \geq d(x,x) = 0$ (minimality); (ii) $d(x,y) = d(y,x)$ (symmetry); (iii) $d(x,y) \leq d(x,z) + d(z,y)$ (triangle inequality). According to Shepard (1974), the distance in equation (1) can be computed with different metrics, some of which are given in Lebart and Rajman (2000). The Euclidean metric is the best known:</Paragraph> <Paragraph position="4"> There is one main concern with the MDS model.</Paragraph> <Paragraph position="5"> Tversky (1977) criticized the adequacy of metric distance functions. He showed that the three conditions of minimality, symmetry and triangle inequality are sometimes violated empirically. 
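The three metric conditions, and the way they can fail, can be checked mechanically. The Python sketch below assumes the common weighted Minkowski form $d(x_i, x_j) = (\sum_k w_k |x_{ik} - x_{jk}|^r)^{1/r}$ with salience weights $w_k$; this may differ from the exact form of equation (1). The helpers `minkowski` and `check_metric_axioms` are hypothetical names. With r = 2 (Euclidean) all three conditions hold. With an exponent below 1 the triangle inequality breaks, which is the kind of violation Tversky's critique turns on.

```python
import functools
import itertools
import random


def minkowski(x, y, w, r=2.0):
    """Weighted Minkowski distance (sum_k w_k * |x_k - y_k| ** r) ** (1 / r).

    r = 2 gives the weighted Euclidean metric and r = 1 the city-block metric;
    the weights w_k stand in for the psychological salience of each dimension.
    """
    return sum(wk * abs(xk - yk) ** r for xk, yk, wk in zip(x, y, w)) ** (1.0 / r)


def check_metric_axioms(points, dist, eps=1e-9):
    """Report whether minimality, symmetry and the triangle inequality hold on `points`."""
    pairs = list(itertools.product(points, repeat=2))
    minimality = all(
        eps >= abs(dist(x, x)) and dist(x, y) + eps >= dist(x, x) for x, y in pairs
    )
    symmetry = all(eps >= abs(dist(x, y) - dist(y, x)) for x, y in pairs)
    triangle = all(
        dist(x, z) + dist(z, y) + eps >= dist(x, y)
        for x, y, z in itertools.product(points, repeat=3)
    )
    return {"minimality": minimality, "symmetry": symmetry, "triangle": triangle}


random.seed(0)
points = [tuple(random.uniform(-1.0, 1.0) for _ in range(3)) for _ in range(6)]
salience = (1.0, 0.5, 2.0)  # hypothetical w_k values

euclidean = functools.partial(minkowski, w=salience, r=2.0)
print(check_metric_axioms(points, euclidean))
# {'minimality': True, 'symmetry': True, 'triangle': True}

# An exponent below 1 yields a non-metric: d((0,0),(1,1)) = 4 exceeds 1 + 1.
sub_metric = functools.partial(minkowski, w=(1.0, 1.0), r=0.5)
print(check_metric_axioms([(0.0, 0.0), (1.0, 1.0), (1.0, 0.0)], sub_metric))
# {'minimality': True, 'symmetry': True, 'triangle': False}
```

Human similarity judgments carry no such guarantee. This is why the empirical violations reported by Tversky count against the MDS model rather than against any particular metric.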
For instance, Tversky and Gati showed empirically that assessments of the similarity between pairs of countries were asymmetric. They asked for &quot;the degree to which Red China is similar to North Korea&quot; (1978: 87) and for the reverse order, i.e. the similarity between North Korea and Red China.</Paragraph> </Section> <Section position="2" start_page="21" end_page="21" type="sub_section"> <SectionTitle> 3.2 Objective Scaling </SectionTitle> <Paragraph position="0"> Voinov calls the second approach objective scaling, &quot;though this term is not widely accepted&quot; (Voinov, 2002). According to him, the objectivity of the method comes from the fact that similarity measures are calculated from the ratio of objective features that describe the objects under analysis. Subjects are thus asked to make qualitative judgments on the common or distinctive features of objects, and the comparison is then made by means of distance axioms. Tversky's (1977) contrast model (CM) is the best-known formalization of this approach. In his model, the measure of similarity is computed by: $S(A,B) = \theta f(A \cap B) - \alpha f(A - B) - \beta f(B - A)$ (5).</Paragraph> <Paragraph position="2"> Here $f(A \cap B)$ represents a function of the features common to both entities A and B. $f(A - B)$ is the function of the features belonging to A but not B, and $f(B - A)$ is the function of the features belonging to B but not A. $\theta$, $\alpha$ and $\beta$ are their respective weighting parameters. Equation (5) is the matching axiom of the CM. A second fundamental property of that model is given by the axiom of monotonicity, $S(A,B) \geq S(A,C)$ (6):</Paragraph> <Paragraph position="4"> if $A \cap C \subseteq A \cap B$, $A - B \subseteq A - C$ and $B - A \subseteq C - A$, then (6) is satisfied. With these two axioms (5-6), Tversky (1977) defined the basis of what he called the matching function, using the theoretical notion of feature sets rather than the geometric concept of similarity distance.</Paragraph> <Paragraph position="5"> Several empirical studies followed this research on the CM, aiming to find the correlation between human judgments of similarity and difference. 
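The Python sketch below makes the contrast model concrete. It evaluates $S(A,B) = \theta f(A \cap B) - \alpha f(A - B) - \beta f(B - A)$ on two hypothetical feature sets. It assumes f is simply set size; Tversky leaves f as a general salience function. Exact fractions are used so the printed values are not blurred by rounding. With equal weights the measure is symmetric. With $\alpha$ greater than $\beta$, the less prominent entity comes out as more similar to the prominent one than the reverse. This is the kind of asymmetry Tversky and Gati report, and no symmetric metric distance can produce it.

```python
from fractions import Fraction


def contrast_similarity(a, b, theta=1, alpha=Fraction(1, 2), beta=Fraction(1, 2), f=len):
    """Tversky's contrast model:
    S(A, B) = theta * f(A n B) - alpha * f(A - B) - beta * f(B - A).

    `a` and `b` are feature sets; `f` scores the salience of a feature set
    (plain set size here, an illustrative assumption).
    """
    common = f(a.intersection(b))
    distinctive_a = f(a - b)  # features of A not shared by B
    distinctive_b = f(b - a)  # features of B not shared by A
    return theta * common - alpha * distinctive_a - beta * distinctive_b


# Hypothetical feature sets: the prototype is the more prominent entity,
# with more distinctive features than the variant.
variant = {"f1", "f2", "f3"}
prototype = {"f1", "f2", "g1", "g2", "g3"}

# With alpha == beta the model is symmetric ...
print(contrast_similarity(variant, prototype), contrast_similarity(prototype, variant))
# 0 0

# ... but weighting the subject's distinctive features more (alpha greater than beta)
# makes "A is similar to B" differ from "B is similar to A".
w = dict(alpha=Fraction(4, 5), beta=Fraction(1, 5))
print(contrast_similarity(variant, prototype, **w), contrast_similarity(prototype, variant, **w))
# 3/5 -3/5
```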
Although some results show a correlation between these judgments, there is a limit to their complementarity: &quot;the relative weights of the common and distinctive features vary with the nature of the task and support the focusing hypothesis that people attend more to the common features in judgments of similarity than in judgments of the difference&quot; (Tversky and Gati, 1978: 84). Later on, Medin et al. (1990) also reported cases where judgments of similarity and difference are not inverses. The first is when entities differ in their number of features; the second is when similarity/difference judgments involve both attributes and relations. &quot;Although sameness judgments are typically described as more global or non-analytic than difference judgments, an alternative possibility is that they focus on relations rather than attributes&quot; (Medin et al., 1990: 68).</Paragraph> </Section> <Section position="3" start_page="21" end_page="21" type="sub_section"> <SectionTitle> 3.3 Semantic Differential </SectionTitle> <Paragraph position="0"> One standard psycholinguistic method for measuring similarity of meaning combines subjective scaling with its transposition into a semantic space. A well-known method of this kind is the Semantic Differential (SD), developed by Osgood et al. (1957).</Paragraph> <Paragraph position="1"> The SD methodology measures the meanings that individual subjects grant to words and concepts according to a series of factor analyses. The factors are bipolar adjective pairs placed at either end of a Likert scale (Likert, 1932), which is designed to rate the individual's reaction to the contrasted stimulus. For instance, the SD of a concept can be rated with two stimuli, goodness (good-bad) and temperature (cold-hot). If the subject feels that the observed concept is neutral with regard to the polar terms, the check-mark should be at position 0. 
In our example, the mark on the good-bad scale is at position 1, to the left of the neutral point 0, which means slightly good. Positions 2 and 3 on that same side would mean quite good and extremely good, respectively. A similar analysis applies to the cold-hot scale.</Paragraph> <Paragraph position="2"> The theoretical background of this methodology, which tries to standardize across subjects the meaning of the same linguistic stimulus, relies on psychological research on synesthesia. Simply put, synesthesia is akin to a double reaction to a stimulus. For example, when presented with images of concepts, subjects do not only react spontaneously to the images. They are also able to characterize the associated concept in terms of almost any bipolar adjective pair (hot-cold, pleasant-unpleasant, simple-complex, vague-precise, dull-sharp, static-dynamic, sweet-bitter, emotional-rational, etc.). According to Osgood et al., &quot;the imagery found in synesthesia is intimately tied up with language metaphor, and both represent semantic relations&quot; (1957: 23). In SD, bipolar adjectives used in succession can mediate a generalization to the meaning of a sign, as uncertainty on each scale is reduced through the successive process of elicitation. By postulating a representation in a semantic space, each orthogonal axis of selection produces a semantic differentiation when the subjects rate the semantic alternatives on a bipolar scale.</Paragraph> <Paragraph position="3"> Although that space could be multidimensional, empirical studies of factor analysis (Osgood et al., 1957) showed the stability and relative importance of three particular dimensions, labeled Evaluation, Potency, and Activity (EPA). We refer the reader to Osgood et al. 
(1957) for further explanation of these EPA dimensions.</Paragraph> </Section> <Section position="4" start_page="21" end_page="21" type="sub_section"> <SectionTitle> 3.4 WSD and human judgments </SectionTitle> <Paragraph position="0"> Table 2 highlights commonalities and differences between the three psychological models explored.</Paragraph> <Paragraph position="1"> In Table 2, we show that both MDS (Shepard, 1974) and CM (Tversky, 1977) rely on a set of predefined traits. This is a major problem, as it requires defining in advance the set of traits on which similarity between objects is judged. SD (Osgood et al., 1957), on the other hand, uses a few bipolar scales for positioning concepts, but argues that these scales are not concept-dependent and can be used to grasp the meaning of all concepts. A second major difference highlighted in Table 2 is that MDS is the only approach that looks at continuous perceptual dimensions of a stimulus. CM, by contrast, proceeds with discrete conceptual traits, and SD departs even further by treating entities as primitives. Finally, Table 2 records an interesting observation brought forth by Tversky and by the later empirical studies of Medin et al. (1990): the notions of similarity and difference are not equivalent.</Paragraph> <Paragraph position="2"> Coming back to the question of &quot;how&quot; human evaluation could be performed to provide a gold standard for the WSD task, weighing the pros and cons of the different models leads us to suggest a particular strategy of sense attribution. Combining Tversky's distinction between similarity and difference with the successive elicitation of Osgood et al., two bipolar Likert scales could be used to delimit a similarity concept: a resembling axis and a contrasting axis. 
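One way such two-axis judgments might be recorded and aggregated is sketched below in Python. Several pieces are hypothetical: the `PairJudgment` schema, the encoding of the 7-point bipolar scale as -3 to +3, and the sense labels. The one design choice carried over from the discussion above is that the resembling and contrasting ratings are averaged separately rather than merged into a single score. This is because similarity and difference judgments need not be inverses.

```python
from dataclasses import dataclass
from statistics import mean

# 7-point bipolar scale (3 2 1 0 1 2 3 on the form), encoded here as -3 .. +3.
SCALE = range(-3, 4)


@dataclass(frozen=True)
class PairJudgment:
    """One annotator's ratings of a pair of statements (hypothetical schema)."""

    item_a: str
    item_b: str
    resembling: int   # -3 = not resembling at all ... +3 = strongly resembling
    contrasting: int  # -3 = not contrasting at all ... +3 = strongly contrasting

    def __post_init__(self):
        for value in (self.resembling, self.contrasting):
            if value not in SCALE:
                raise ValueError(f"rating {value} is outside the scale -3 .. +3")


def aggregate(judgments):
    """Average each axis separately; the two scores are deliberately not merged,
    since similarity and difference judgments need not be inverses."""
    judgments = list(judgments)
    return {
        "resembling": mean(j.resembling for j in judgments),
        "contrasting": mean(j.contrasting for j in judgments),
    }


# Hypothetical ratings by three annotators comparing senses 1 and 3 of "game".
ratings = [
    PairJudgment("game, sense 1", "game, sense 3", resembling=2, contrasting=-1),
    PairJudgment("game, sense 1", "game, sense 3", resembling=1, contrasting=0),
    PairJudgment("game, sense 1", "game, sense 3", resembling=3, contrasting=-2),
]
print(aggregate(ratings))
```

The same record works whether the pair is two sense definitions, two examples, or a definition and an example.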
In this approach, the similarity concept remains general, which avoids the problem of finding, for each instance, specific traits on which to base a judgment.</Paragraph> <Paragraph position="3"> The empirical studies of Murray and Green (2004) already use a Likert scale, but on an &quot;applying&quot; axis. For each definition of a word, subjects are asked to decide whether it &quot;applies perfectly&quot; or rather &quot;barely applies&quot; to a context containing the word. Such an axis is of limited use for mapping senses onto examples. More general resembling and contrasting axes would allow similarity judgments on any pair of statements, whether two sense definitions, two examples, or a sense definition and an example.</Paragraph> </Section> </Section> <Section position="7" start_page="21" end_page="21" type="metho"> <SectionTitle> 4 Mathematical Models of Similarity </SectionTitle> <Paragraph position="0"> Logic and mathematics offer a wealth of similarity measurement models. According to Dubois et al. (1997), they are used for cognitive tasks such as classification, case-based reasoning and interpolation. In the present study, we restrict our investigation to the classification task, as representative of the unsupervised WSD task.</Paragraph> <Paragraph position="1"> The other approaches are inferential strategies, which use already solved problems to extrapolate or interpolate solutions to new problems. These would be appropriate for WSD in a supervised context (given training data), but due to space constraints we postpone discussion of those models to a later study. Our present analysis divides classification models according to two criteria: the cardinality of sets and proximity-based similarity measures.</Paragraph> <Section position="1" start_page="21" end_page="21" type="sub_section"> <SectionTitle> 4.1 Cardinality of sets </SectionTitle> <Paragraph position="0"> In line with De Baets et al. 
(2001), similarity measures can be investigated under a rational, cardinality-based criterion of sets. In an extensive study of 28 similarity measures for ordinary sets, they showed that these measures can be classified on the basis of only a few properties.</Paragraph> <Paragraph position="1"> They first proposed to build the class of cardinality-based similarity measures from one general formula:</Paragraph> <Paragraph position="3"> The classification of these 28 similarity measures (which can all be linked to the general formula) becomes possible by borrowing two concepts from the framework of fuzzy sets. The first is the t-norm T (fuzzy intersection operator); the second is T-equivalence, for the property of T-indistinguishability (De Baets et al., 2001). Thus, a measure M is a T-equivalence on the universe U if it satisfies the following conditions for any $x, y, z \in U$: (i) $M(x,x) = 1$ (reflexivity), (ii) $M(x,y) = M(y,x)$ (symmetry), and</Paragraph> <Paragraph position="5"> (iii) $T(M(x,y), M(y,z)) \leq M(x,z)$ (T-transitivity).</Paragraph> <Paragraph position="6"> All 28 measures show reflexivity and symmetry, but they vary in the type of transitivity they achieve. In fact, after studying the boundary and monotonicity behavior of the different measures, De Baets et al. (2001) group them under four types corresponding to four different fuzzy intersections (t-norms). These are the standard intersection $Z(a,b) = \min(a,b)$, the Lukasiewicz t-norm $L(a,b) = \max(a+b-1, 0)$, the algebraic product $P(a,b) = ab$, and the drastic intersection $D(a,b)$. The last is equal to $a$ when $b = 1$, to $b$ when $a = 1$, and to 0 otherwise. We refer the reader to De Baets et al. (2001) for the full scope of their results. For example, Jaccard's coefficient J (equation 9) and Russell-Rao's coefficient R (equation 10) are both L-transitive.</Paragraph> </Section> <Section position="2" start_page="21" end_page="21" type="sub_section"> <SectionTitle> 4.2 Proximity-based </SectionTitle> <Paragraph position="0"> Following our second criterion of classification, mathematics also offers diverse proximity-based similarity measures. 
We subdivide these mathematical measures into three groups: the distance model, the probabilistic model, and the angular coefficients. The first, the distance model, partly overlaps with the subjective scaling of similarity presented among the psychological approaches (section 3.1). The mathematical model is the same, with a distance metric $d(x,y)$ computed between the objects in a space.</Paragraph> <Paragraph position="1"> Formulae such as (2), (3) and (4) of section 3.1 are among the proximity-based similarity measures.</Paragraph> <Paragraph position="2"> Second, the probabilistic model is based on the statistical analysis of objects and their attributes in a data space. Lebart and Rajman (2000) gave many examples of that kind of proximity measure, such as the Kullback-Leibler distance, $D(p \| q) = \sum_i p_i \log (p_i / q_i)$. The third mathematical model is also a metric space model, but it uses angular measures between feature vectors to determine the similarity between objects. A well-known measure from that group is the cosine correlation, $\cos(x,y) = \frac{\sum_i x_i y_i}{\sqrt{\sum_i x_i^2}\,\sqrt{\sum_i y_i^2}}$. Cross and Sudkamp (2002) and Miyamoto (1990) briefly describe the conditions that apply to proximity-based measures for fuzzy sets. However, we are not aware of any study of these measures as extensive as that of De Baets et al. (2001) for cardinality-based measures, presented in section 4.1. We make such an attempt in the following section.</Paragraph> </Section> </Section> <Section position="8" start_page="21" end_page="21" type="metho"> <SectionTitle> 5 Analysis of similarity metrics </SectionTitle> <Paragraph position="0"> In this section, we perform a classification and analysis exercise for similarity measures (footnote 4). These measures may be used for WSD, but more generally in any task where similarity between words is required. 
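As a small illustration of the four categories used in this analysis, the Python sketch below computes one representative measure from each: cardinality, distance, probability and angle. It works on two hypothetical co-occurrence profiles. Several details are illustrative assumptions: the counts, the vocabulary, the helper names, and the add-one smoothing applied before the Kullback-Leibler divergence. The smoothing is needed because the divergence is undefined when q is zero where p is not. Note that the probabilistic measure already stands apart from the others by being asymmetric.

```python
import math


def jaccard(a, b):
    """Card: |A n B| / |A u B| over sets of context words (1.0 for two empty sets)."""
    union = a.union(b)
    return len(a.intersection(b)) / len(union) if union else 1.0


def euclidean(x, y):
    """Dist: straight-line distance between count vectors."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def kl_divergence(p, q):
    """Prob: Kullback-Leibler divergence D(p || q) in nats; q must be positive wherever p is."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def cosine(x, y):
    """Ang: cosine of the angle between two feature vectors."""
    dot = sum(xi * yi for xi, yi in zip(x, y))
    return dot / (math.sqrt(sum(xi * xi for xi in x)) * math.sqrt(sum(yi * yi for yi in y)))


def smoothed_distribution(counts):
    """Add-one smoothing, then normalise counts to probabilities."""
    total = sum(counts) + len(counts)
    return [(c + 1) / total for c in counts]


# Hypothetical co-occurrence counts of two words over a shared context vocabulary.
vocab = ["play", "win", "rules", "toy", "box"]
word1 = [4, 3, 2, 0, 0]
word2 = [2, 0, 1, 3, 2]

contexts1 = {w for w, c in zip(vocab, word1) if c}
contexts2 = {w for w, c in zip(vocab, word2) if c}
p, q = smoothed_distribution(word1), smoothed_distribution(word2)

print("Card  Jaccard  ", jaccard(contexts1, contexts2))
print("Dist  Euclidean", euclidean(word1, word2))
print("Prob  KL(p||q) ", kl_divergence(p, q), "KL(q||p)", kl_divergence(q, p))
print("Ang   cosine   ", cosine(word1, word2))
```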
Table 3 shows the measures classified into the four categories of the mathematical model presented in section 4: measures of cardinality (Card), distance (Dist), probability (Prob) and angle (Ang).</Paragraph> <Paragraph position="1"> We argue that these groupings can be further justified on the basis of two criteria: the psychological model of meaning (Table 2) and the typical properties of the classes (Table 4). The first criterion concerns the representation of concepts. It distinguishes between the dense state and the discrete state (footnote 5) of concept (meaning) attributes. This psychological distinction helps to categorize some metrics that seem hybrid (Card and Dist), such as Gotoh. In such a metric, the penalty for a gap between two concepts applies to a defect of the dense state, as in a blurred image, rather than to the absence of the discrete state, i.e. of a feature. It is therefore classified in the Dist category. (Footnote 5: [...] that MDS better suits continuous perceptual domains and that set-theoretic models accommodate discrete features, as in the CM.)</Paragraph> <Paragraph position="2"> The second criterion is a study of the properties shared within each category of the mathematical model. Table 4 summarizes these properties using the following schema: (m) minimality, (r) reflexivity, (s) symmetry, (ti) triangle inequality, (tr) transitivity.</Paragraph> <Paragraph position="3"> From Table 4, we see for instance that reflexivity is a basic property of cardinality measures, because we wish to count discrete objects in a set consistently. Conversely, minimality is characteristic of distance measures, since it is revealed by displacement or change, for example in distinctive images. 
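The reflexivity, symmetry and transitivity entries of such a property analysis can be probed numerically for a cardinality measure. The Python sketch below implements the four t-norms listed in section 4.1. It then checks Jaccard's coefficient exhaustively on all subsets of a four-element universe, using exact fractions. The helper names, the small universe, and the convention that two empty sets have similarity 1 are assumptions. An exhaustive check on a small universe is evidence, not a proof. The outcome is consistent with De Baets et al. (2001): reflexivity and symmetry hold, and Jaccard's coefficient comes out L-transitive (and hence D-transitive). Min- and product-transitivity fail, for example on {0}, {0, 1} and {1}.

```python
from fractions import Fraction
from itertools import combinations, product


# The four t-norms (fuzzy intersections) named by De Baets et al. (2001).
def t_min(a, b):  # Z: standard intersection
    return min(a, b)


def t_lukasiewicz(a, b):  # L: Lukasiewicz t-norm
    return max(a + b - 1, 0)


def t_product(a, b):  # P: algebraic product
    return a * b


def t_drastic(a, b):  # D: drastic intersection
    if b == 1:
        return a
    if a == 1:
        return b
    return 0


def jaccard(a, b):
    """|A n B| / |A u B| as an exact fraction; two empty sets count as identical."""
    union = a.union(b)
    return Fraction(len(a.intersection(b)), len(union)) if union else Fraction(1)


def is_t_transitive(measure, t_norm, items):
    """True if T(M(x, y), M(y, z)) never exceeds M(x, z) over all triples."""
    return all(
        measure(x, z) >= t_norm(measure(x, y), measure(y, z))
        for x, y, z in product(items, repeat=3)
    )


universe = range(4)
subsets = [frozenset(c) for r in range(len(universe) + 1) for c in combinations(universe, r)]

reflexive = all(jaccard(x, x) == 1 for x in subsets)
symmetric = all(jaccard(x, y) == jaccard(y, x) for x, y in product(subsets, repeat=2))
print("reflexive", reflexive, "symmetric", symmetric)  # reflexive True symmetric True

t_norms = {"Z": t_min, "P": t_product, "L": t_lukasiewicz, "D": t_drastic}
results = {name: is_t_transitive(jaccard, t, subsets) for name, t in t_norms.items()}
print(results)  # {'Z': False, 'P': False, 'L': True, 'D': True}
```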
Following Fodor (1998), we say that statistical or probabilistic approaches exhibit several necessary and sufficient conditions for the inclusion of elements in the extension of a concept. The dominant element, however, such as the pattern of comparison (in Maximal matches, for instance), is anti-reflexive and asymmetric with respect to the resulting elements. The resultant itself is symmetric, though still anti-reflexive. We also single out the angular metrics from the distance measures, even though they use a similar analysis of the qualitative variation of entities. According to Ekman and Sjöberg (1965), a method that converts similarity into a cosine representation has the advantage of revealing two components of percepts, since the two-dimensional vector models both magnitude and direction. Angular metrics can thus be used to contrast two semantic features of entities.</Paragraph> <Section position="1" start_page="21" end_page="21" type="sub_section"> <SectionTitle> 5.1 A closer look at properties </SectionTitle> <Paragraph position="0"> Finding that different sets of properties can serve as dividing lines between groups of metrics is interesting in itself. It does not, however, answer the question of which set is more appropriate than the others. We do not wish to answer this question here, as we believe it is application-dependent, but we do wish to emphasize that it should be asked before choosing a particular measure. In fact, for each property there is an appropriate question to ask, as summarized in Table 5.</Paragraph> <Paragraph position="1"> Table 5 - Questioning for Measure Selection</Paragraph> </Section> <Section position="2" start_page="21" end_page="21" type="sub_section"> <SectionTitle> Property Question </SectionTitle> <Paragraph position="0"> Minimality: Is the minimal distance between objects the distance of an object with itself? 
Symmetry: Is it true that the distance between x and y is always the same as the distance between y and x? Triangle Inequality: Is it appropriate that the direct distance between x and z is never greater than the composed distance from x to y and y to z? Reflexivity: Is it true that the relation holding between an object and itself is always the same? Transitivity: Is it necessarily the case that, when x is similar to y and y is similar to z, x is similar to z? For the task of WSD investigated in this paper, we hope to open the debate as to which properties should be taken into consideration.</Paragraph> </Section> </Section> </Paper>