<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-2506">
  <Title>Characterizing Response Types and Revealing Noun Ambiguity in German Association Norms</Title>
  <Section position="4" start_page="41" end_page="42" type="metho">
    <SectionTitle>
3 Data Collection Method
</SectionTitle>
    <Paragraph position="0"> This section introduces our elicitation procedure.</Paragraph>
    <Paragraph position="1"> Materials: 409 German nouns referring to picturable objects were chosen as target stimuli. To ensure broad coverage, target objects represented a variety of semantic classes including animals, plants, professions, food, furniture, vehicles, and tools. Simple black and white line drawings of target stimuli were drawn from several sources, including Snodgrass and Vanderwart (1980) and the picture database from the Max Planck Institute for Psycholinguistics in the Netherlands.</Paragraph>
    <Paragraph position="2"> Participants: 300 German participants, mostly students from Saarland University, received either course credit or monetary compensation for filling out a questionnaire.</Paragraph>
    <Paragraph position="3"> Procedure: The 409 target stimuli were divided randomly into three separate questionnaires consisting of approximately 135 nouns each. Each questionnaire was printed in two formats: target objects were either presented as pictures together with their preferred name (to ensure that associate responses were provided for the desired lexical item) or the name of the target objects was presented without a representative picture accompanying it. Next to each target stimulus three lines were printed on which participants could write up to three semantic associate responses for the stimulus. The order of stimulus presentation was individually randomized for each participant. Participants were instructed to give one associate word per line, for a maximum of three responses per trial. No time limits were given for responding, though participants were told to work swiftly and without interruption. Each version of the questionnaire was filled out by 50 participants, resulting in a maximum of 300 data points for any given target stimulus (50 participants × 2 presentation modes × 3 responses).</Paragraph>
    <Paragraph position="4"> Collected associate responses were entered into a database with the following additional information: For each target stimulus we recorded a) whether it was presented as a picture or in written form, and b) whether the name was a homophone (and thus likely to elicit semantic associates for multiple meanings). For each response type provided by a participant, we coded a) the order of the response, i.e., first, second, third, b) the part-of-speech of the response, and c) the type of semantic relation between the target stimulus and the response (e.g., part-whole relations such as car wheel, and categorical relationships such as hypernymy, hyponymy, and synonymy).</Paragraph>
  </Section>
  <Section position="5" start_page="42" end_page="44" type="metho">
    <SectionTitle>
4 Analysis of Response Types
</SectionTitle>
    <Paragraph position="0"> As described in Section 2, one might expect variation in the response types for the two presentation modes, because the associations provided in the 'picture+word' condition were biased towards the depicted sense of the target noun. Our first analysis evaluates what sorts of differences are in fact observed in the data, i.e., which intuitions are empirically supported, and which are not. To this end, this section is concerned with systematic differences in response types when target stimuli were presented in written form ('word only', subsequently W condition) or when the written form was accompanied by a picture ('picture+word', subsequently PW condition). We first give our predictions for the differences in response types and then continue with the corresponding analyses of response types.</Paragraph>
    <Section position="1" start_page="42" end_page="42" type="sub_section">
      <SectionTitle>
4.1 Predictions
</SectionTitle>
      <Paragraph position="0"> Based on our intuitions, we predicted the following differences.</Paragraph>
      <Paragraph position="1"> 1. The overall number of response tokens is unlikely to differ for the two presentation modes, since participants are limited to three associate responses per target stimulus in both presentation modes.</Paragraph>
      <Paragraph position="2"> 2. The overall number of response types, however, should differ: in the PW condition we expect a bias towards the depicted noun sense, resulting in a smaller number of response types than in the W condition.</Paragraph>
      <Paragraph position="3"> 3. The PW condition produces fewer idiosyncratic response types than the W condition, because pictures reinforce associations that are either depicted, or at least related to the depicted sense and its characteristics, resulting in less response diversity.</Paragraph>
      <Paragraph position="4"> 4. The PW condition receives more associations that show a part-of relation to the target stimulus than the W condition, because characteristics of the pictures can highlight specific parts of the whole.</Paragraph>
      <Paragraph position="5"> 5. The type agreement, i.e., the number of response types on which the PW and the W conditions agree, is expected to differ with respect to the target noun. For target nouns that are highly ambiguous we expect low type agreement. Note that this prediction does not refer to a PW-W distinction, but instead uses the PW-W distinction to approach the issue of noun senses.</Paragraph>
    </Section>
    <Section position="2" start_page="42" end_page="42" type="sub_section">
      <SectionTitle>
4.2 Response Type Distributions
</SectionTitle>
      <Paragraph position="0"> The analyses to follow are based on stimulus-response frequency distributions: For each target stimulus and each response type, we calculated how often the response type was provided. The result was a frequency distribution for the 409 target nouns, providing frequencies for each response type. The frequency distributions were distinguished for the PW condition and the W condition. Table 1 provides an example of the most frequent response types and their frequencies for the homophone target noun Schloss, as described in Section 2; the 'lock' meaning was depicted, 'castle' is an alternative meaning. Hereafter, we will refer to an association provided in the PW condition as association PW, and an association provided in the W condition as association W, e.g.,</Paragraph>
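      <Paragraph> As a minimal sketch of how such distributions can be built, the following Python fragment counts response types per target noun and condition and derives the token, type, and idiosyncratic-type counts analyzed in Section 4.3. The response words and the function names are invented for illustration and are not taken from our database.</Paragraph>
      <Paragraph><![CDATA[
```python
from collections import Counter, defaultdict

# Hypothetical raw data: (target noun, condition, response) triples.
# The responses are invented for illustration only.
responses = [
    ("Schloss", "PW", "Riegel"),
    ("Schloss", "PW", "Riegel"),
    ("Schloss", "PW", "Tor"),
    ("Schloss", "W", "Riegel"),
    ("Schloss", "W", "Burg"),
    ("Schloss", "W", "Burg"),
    ("Schloss", "W", "Prinzessin"),
]


def frequency_distributions(triples):
    """Map each (target, condition) pair to a Counter of response types."""
    dist = defaultdict(Counter)
    for target, condition, response in triples:
        dist[(target, condition)][response] += 1
    return dist


def summarize(counter):
    """Return (tokens, types, idiosyncratic types) for one distribution."""
    tokens = sum(counter.values())          # all responses given
    types = len(counter)                    # distinct responses
    # Idiosyncratic: response types provided exactly once for this target.
    idiosyncratic = sum(1 for freq in counter.values() if freq == 1)
    return tokens, types, idiosyncratic


dist = frequency_distributions(responses)
pw_summary = summarize(dist[("Schloss", "PW")])
w_summary = summarize(dist[("Schloss", "W")])
```
]]></Paragraph>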
    </Section>
    <Section position="3" start_page="42" end_page="44" type="sub_section">
      <SectionTitle>
4.3 Results
</SectionTitle>
      <Paragraph position="0"> Based on the frequency distributions in Section 4.2, we analyzed the response types according to our predictions in Section 4.1.</Paragraph>
      <Paragraph position="1"> Number of response tokens: The number of response tokens was compared for each target stimulus in both presentation modes. The total number of response tokens was 58,642 (mean = 143) in the PW condition and 58,072 (mean = 142) in the W condition. We had predicted that Token(PW) ~ Token(W). The analysis showed, however, that in 243 of 409 cases (59%) the number of response tokens was larger for PW than for W (Token(PW) &gt; Token(W)); in 132 cases (32%) Token(PW) &lt; Token(W), and in 34 cases (8%) Token(PW) = Token(W). The unpredicted difference between presentation modes was significant across items in a two-tailed t-test, t(408) = ?.077, p &lt; .001. We take the result as an indication that pictures facilitate the production of associations. This is an interesting insight especially since the number of associate responses per target stimulus was limited while response time was not.</Paragraph>
      <Paragraph position="2"> Number of response types: The number of response types was compared for each target stimulus in both presentation modes. The total number of response types in the PW condition was 19,800 (mean = 48) compared with 20,332 (mean = 50) in the W condition. We had predicted that Type(W) &gt; Type(PW). The results showed indeed that in 229 of the 409 cases (56%) the number of response types was larger for W than</Paragraph>
      <Paragraph position="4"> ence, although small, was significant, t(408) =</Paragraph>
      <Paragraph position="6"> Idiosyncratic response types: The proportions of idiosyncratic response types (i.e., associate responses that were provided only once for a certain target stimulus) were compared for each target stimulus in both presentation modes. In total, 12,011 (mean = 29) idiosyncratic responses were provided in the PW condition and 12,582 (mean = 31) idiosyncratic responses in the W condition. We had predicted that Idio(W) &gt; Idio(PW). The analysis showed indeed that in 216 of the 409 cases (53%) the number of idiosyncratic responses was larger for W than for PW (Idio(W) &gt; Idio(PW)); in 175</Paragraph>
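      <Paragraph> The comparison across items can be sketched as a paired t-test, under the assumption that each target noun contributes one PW count and one W count. The token counts below are invented, and the helper name paired_t is hypothetical; the sketch returns only the statistic and the degrees of freedom, not a p-value.</Paragraph>
      <Paragraph><![CDATA[
```python
import math


def paired_t(xs, ys):
    """Return (t, df) for a paired t-test on two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 items")
    diffs = [x - y for x, y in zip(xs, ys)]
    n = len(diffs)
    mean = sum(diffs) / n
    # Unbiased sample variance of the per-item differences.
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    if var == 0.0:
        raise ValueError("all differences are identical; t is undefined")
    return mean / math.sqrt(var / n), n - 1


# Invented token counts for five target nouns (one value per noun).
tokens_pw = [145, 150, 139, 148, 142]
tokens_w = [142, 146, 140, 143, 141]
t_stat, df = paired_t(tokens_pw, tokens_w)
```
]]></Paragraph>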
      <Paragraph position="8"> notion of a restricted set of responses in the PW condition relative to the W condition.</Paragraph>
      <Paragraph position="9"> Part-of response types: Based on the manual annotation of semantic relations between target nouns and responses, proportions of response types which stand in a part-of relation to the target nouns were determined. The total number of part-of response types was 876 (mean = 2.7) in the PW condition, and 901 (mean = 2.8) in the W condition.</Paragraph>
      <Paragraph position="10"> We predicted that Part(PW) &gt; Part(W). The analysis showed however that in only 94 of the 409 cases (29%) the number of part-of responses was larger for PW than for W (Part(PW) &gt; Part(W)); in 114 cases (35%) Part(W) &gt; Part(PW), and in 115 cases (36%) Part(W) = Part(PW). The difference between conditions was not significant across items, t(322) = 1.42, p &gt; .1. The absence of a reliable difference in this analysis possibly suggests that our pictures did not regularly enhance a part-whole relationship.</Paragraph>
      <Paragraph position="11"> Type agreement: The final analysis was based on response type agreement for PW and W. However, this analysis did not aim to distinguish between the two presentation modes but rather used the agreement proportions as a diagnostic of potential target noun ambiguity. Here we calculated the total amount of overlap between the PW and W conditions. For this calculation, we identified the number of response types that occur in both the PW and W conditions for a particular target stimulus and divided that number by the total number of response types produced for that target stimulus, irrespective of condition. In other words, if a noun PW receives responses A and B and noun W receives responses B and C, then the total number of shared response types is 1, namely response B, and the total number of response types across conditions is 3, namely A, B and C. Thus, the proportion of agreement is .33.</Paragraph>
      <Paragraph position="12"> We reasoned that target nouns with low type agreement are likely to be ambiguous. To test this, we sorted the targets by their proportion of agreement, and compared the top and bottom 20 targets.</Paragraph>
      <Paragraph position="13"> In the manual annotation of our stimuli, cf. Section 3, we had recorded that 10% of our stimuli were homophones. Thus, a random distribution would predict two ambiguous items in a 20 item sample if the proportion of agreement is not an indicator of ambiguity. Instead, we found 11 ambiguous nouns in the set of 20 targets with lowest agreement proportions and 2 ambiguous nouns in the set of 20 targets with highest agreement proportions. A χ² test indicated that the number of ambiguous nouns found in the two sets differed significantly, χ² = 7.29, p &lt; .01. Summarizing this first set of analyses, we found that the associate responses for concrete German nouns differed significantly depending on the format under which they were elicited, namely the presentation mode. The fact that we found more response types in total and also more idiosyncratic responses when target nouns were presented in the 'word only' vs. the 'picture+word' condition suggests that alternative meanings were more active when participants were presented with written stimuli compared to depicted stimuli. It is also interesting to note that not all our intuitive predictions were borne out. For example, despite our feeling that the picture should bias the inclusion of depicted part-of relations, such as the broom ~ witch example discussed above, this intuition was not supported by the data. This fact highlights the importance of first analyzing the responses to ensure the necessary conditions are present for the identification of ambiguous words.</Paragraph>
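      <Paragraph> The agreement proportion described above is the size of the intersection of the two response-type sets divided by the size of their union. A minimal sketch, with a hypothetical function name and the A/B/C example from the text:</Paragraph>
      <Paragraph><![CDATA[
```python
def type_agreement(pw_types, w_types):
    """Shared response types divided by all response types across conditions."""
    pw, w = set(pw_types), set(w_types)
    union = pw | w
    if not union:
        return 0.0  # no responses at all: define agreement as 0
    return len(pw & w) / len(union)


# Noun PW received responses A and B; noun W received responses B and C.
agreement = type_agreement({"A", "B"}, {"B", "C"})
```
]]></Paragraph>
      <Paragraph> Sorting all targets by this value and inspecting both ends of the list gives the sets of nouns with lowest and highest agreement.</Paragraph>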
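      <Paragraph> For readers who want to retrace the comparison of the two 20-item sets, the following sketch computes a chi-square statistic for a 2×2 table (11 ambiguous and 9 unambiguous nouns versus 2 ambiguous and 18 unambiguous nouns). Whether a continuity correction should be applied is an assumption of the sketch, so both variants are computed; the function name is hypothetical.</Paragraph>
      <Paragraph><![CDATA[
```python
def chi_square_2x2(a, b, c, d, yates=True):
    """Chi-square statistic for the 2x2 table [[a, b], [c, d]].

    With yates=True, Yates' continuity correction is applied.
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column of the table sums to zero")
    diff = abs(a * d - b * c)
    if yates:
        diff = max(0.0, diff - n / 2)
    return n * diff ** 2 / denom


# Rows: lowest-agreement set, highest-agreement set.
# Columns: ambiguous nouns, unambiguous nouns.
chi2_corrected = chi_square_2x2(11, 9, 2, 18)
chi2_plain = chi_square_2x2(11, 9, 2, 18, yates=False)
```
]]></Paragraph>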
    </Section>
  </Section>
  <Section position="6" start_page="44" end_page="47" type="metho">
    <SectionTitle>
5 Analysis of Noun Senses
</SectionTitle>
    <Paragraph position="0"> The second analysis in this paper addresses the distinction of noun senses on the basis of associations. Our goal is to identify the - potentially multiple - senses of target nouns, and to reveal differences in the noun senses with respect to the presentation modes. The analysis was done as follows.
      1. The target-response pairs were clustered. The soft cluster analysis was expected to assign semantically similar noun senses into common clusters, as based on shared associate responses. (Section 5.1)
      2. The clusters were used to predict the ambiguity of nouns and their respective senses. (Section 5.2)
      3. The clusters and their predictability were evaluated by annotating noun senses with Duden dictionary definitions, and calculating interannotator agreement. (Section 5.3)</Paragraph>
    <Section position="1" start_page="44" end_page="45" type="sub_section">
      <SectionTitle>
5.1 Latent Semantic Noun Clusters
</SectionTitle>
      <Paragraph position="0"> Target nouns were clustered on the basis of their association frequencies, cf. Table 1. I.e., the clustering result was determined by joint frequencies of the target nouns and the respective associations. The targets themselves were described by the noun-condition combination, e.g. Schloss PW, and Schloss W. We used noun-condition combinations as compared to nouns only, because the clustering result should not only distinguish senses of nouns in general, but in addition predict the noun senses with respect to the condition.</Paragraph>
      <Paragraph position="1"> Various techniques have been exploited for word sense disambiguation. Closely related to our work, Schvaneveldt's pathfinder networks (Schvaneveldt, 1990) were based on word associations and were used to identify word senses.</Paragraph>
      <Paragraph position="2"> An enormous number of approaches in computational linguistics can be found on the SENSEVAL webpage (SENSEVAL,), which hosts a word sense disambiguation competition. We applied Latent Semantic Clusters (LSC) to our association data. The LSC algorithm is an instance of the Expectation-Maximisation algorithm (Baum, 1972) for unsupervised training based on unannotated data, and has been applied to model the selectional dependency between two sets of words participating in a grammatical relationship (Rooth, 1998; Rooth et al., 1999). The resulting cluster analysis defines two-dimensional soft clusters which are able to generalise over hidden data. LSC training learns three probability distributions, one for the probabilities of the clusters, and one for each tuple input item and each cluster (i.e., a probability distribution for the target nouns and each cluster, and one for the associations and each cluster), thus the two dimensions. We use an implementation of the LSC algorithm as provided by Helmut Schmid.</Paragraph>
      <Paragraph position="3"> The LSC output depends not only on the distributional input, but also on the number of clusters to model. As a rule, the more clusters are modeled, the more skewed the resulting probability distributions for cluster membership are. Since the goal of this work was not to optimize the clustering parameters, but to judge the general predictability of such models, we concentrated on two clustering models, with 100 and 200 clusters, respectively.</Paragraph>
      <Paragraph position="4"> Table 2 presents the most probable noun-condition combinations for a cluster from the 100-cluster analysis: The cluster probability is 0.01295 (probabilities ranged from 0.00530 to 0.02674).</Paragraph>
      <Paragraph position="5"> The most probable associations that were common to members of this cluster were Ritter 'knight', scharf 'sharp'. This example shows that the associations provide a semantic description of the cluster, and the target nouns themselves appear in the cluster if one of their senses is related to the cluster description. In addition, we can see that, e.g., Schloss appears in this cluster only in the W condition. The reason for this is that the picture showed the 'lock' sense of Schloss, so the PW condition was less likely to elicit 'castle'-related responses.</Paragraph>
      <Paragraph position="6"> This example cluster illustrates nicely what we expect from the cluster analysis with respect to distinguishing noun senses.</Paragraph>
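      <Paragraph> To make the three distributions concrete, the following is a small, self-contained EM sketch for a two-dimensional latent class model in which the probability of a pair is the sum over clusters of p(c) · p(noun|c) · p(association|c). It is not the implementation we used; the function name, the random initialisation, the iteration count, and the toy counts are all assumptions made for illustration.</Paragraph>
      <Paragraph><![CDATA[
```python
import random


def train_lsc(pair_counts, n_clusters, n_iter=50, seed=0):
    """Fit p(c), p(noun|c), p(assoc|c) to {(noun, assoc): count} by EM."""
    if not pair_counts:
        raise ValueError("pair_counts must not be empty")
    rng = random.Random(seed)
    nouns = sorted({n for n, _ in pair_counts})
    assocs = sorted({a for _, a in pair_counts})

    def random_dist(keys):
        # Random positive weights, normalised to sum to 1.
        weights = [rng.random() + 0.1 for _ in keys]
        total = sum(weights)
        return {k: w / total for k, w in zip(keys, weights)}

    p_c = [1.0 / n_clusters] * n_clusters
    p_n = [random_dist(nouns) for _ in range(n_clusters)]
    p_a = [random_dist(assocs) for _ in range(n_clusters)]

    for _ in range(n_iter):
        # E-step: spread each observed count over the clusters in
        # proportion to the current posterior p(c | noun, assoc).
        c_tot = [0.0] * n_clusters
        n_tot = [dict.fromkeys(nouns, 0.0) for _ in range(n_clusters)]
        a_tot = [dict.fromkeys(assocs, 0.0) for _ in range(n_clusters)]
        for (n, a), count in pair_counts.items():
            joint = [p_c[c] * p_n[c][n] * p_a[c][a] for c in range(n_clusters)]
            z = sum(joint)
            if z == 0.0:
                continue
            for c in range(n_clusters):
                w = count * joint[c] / z
                c_tot[c] += w
                n_tot[c][n] += w
                a_tot[c][a] += w
        # M-step: re-estimate the three distributions from expected counts.
        total = sum(c_tot)
        for c in range(n_clusters):
            p_c[c] = c_tot[c] / total
            if c_tot[c] > 0.0:
                p_n[c] = {n: v / c_tot[c] for n, v in n_tot[c].items()}
                p_a[c] = {a: v / c_tot[c] for a, v in a_tot[c].items()}
    return p_c, p_n, p_a


# Invented joint frequencies of noun-condition combinations and associations.
pair_counts = {
    ("Schloss_W", "Ritter"): 5,
    ("Schloss_W", "Riegel"): 2,
    ("Schloss_PW", "Riegel"): 6,
    ("Burg_W", "Ritter"): 7,
}
p_c, p_n, p_a = train_lsc(pair_counts, n_clusters=2)
```
]]></Paragraph>
      <Paragraph> Because EM only finds a local optimum, the clusters obtained depend on the initialisation; the sketch fixes a seed solely to make runs repeatable.</Paragraph>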
    </Section>
    <Section position="2" start_page="45" end_page="46" type="sub_section">
      <SectionTitle>
5.2 Prediction of Noun Ambiguity and Noun
Senses
</SectionTitle>
      <Paragraph position="0"> The noun clusters were used to predict the ambiguity of nouns and their respective senses. The two-dimensional cluster probabilities, as introduced above, offer the following information: • Which associations are highly probable for a cluster? The most probable associations are considered as defining the semantic content of the cluster.</Paragraph>
      <Paragraph position="1"> • Which target nouns are highly probable for a cluster and its semantic content, i.e. the associations? Relating the target nouns in a cluster with the cluster associations defines the respective sense of the noun. To refer to the above example, finding Schloss in a cluster together with associations such as 'castle' and 'fight' relates this instance of Schloss to the 'castle' sense and not the 'lock' sense.</Paragraph>
      <Paragraph position="2"> • Which target nouns are in the same cluster and therefore refer to a common sense/aspect of the nouns? This information is relevant for revealing sense differences of target nouns with respect to the conditions PW vs. W.</Paragraph>
      <Paragraph position="3"> In order to predict whether a noun is in a cluster or not, we needed a cut-off value for the membership probability. We settled on 1%, i.e., a target noun with a probability of ≥ 1% was considered a member of a cluster. Based on the 200-cluster information, we then performed the following analyses on noun ambiguity and noun senses.</Paragraph>
      <Paragraph position="4"> Prediction of noun ambiguity: For each target noun, we predicted its ambiguity by the number of clusters it was a member of. For example, the highly ambiguous noun Becken 'basin, cymbal, pelvis' (among other senses), was a member of 8 clusters, as compared to the unambiguous Bäcker 'baker' which was a member of only one cluster. Membership in several clusters does not necessarily point to multiple noun senses (because different combinations of associations might define similar semantic contents), but nevertheless the clusters provide an indication of the degree of noun ambiguity. The total number of senses in the 200-cluster analysis was 735, which means an average of 1.8 senses for each target stimulus (across presentation condition).</Paragraph>
      <Paragraph position="5"> Discrimination of noun senses: The most probable associations in the clusters were assumed to describe the semantic content of the clusters.</Paragraph>
      <Paragraph position="6"> They can be used to discriminate noun senses of polysemous nouns. Referring back to our example noun Becken, it appeared in one cluster with the most probable associations Wasser 'water', Garten 'garden', Feuerwehr 'fire brigade', giessen 'water', and nass 'wet', describing the 'basin' sense of the target noun; in a second cluster it appeared with Musik 'music', laut 'loud', Instrument 'instrument', Orchester 'orchestra', and Jazz, describing the music-related sense; and in a third cluster it appeared with Hand 'hand', Bein 'leg', Ellenbogen 'elbow', Körper 'body' and Muskel 'muscle', describing the body-related sense, etc.</Paragraph>
      <Paragraph position="7"> Noun similarity: Those target nouns which were assigned to a common cluster were assumed to be semantically similar (with respect to the cluster content). Again, referring back to our example noun Becken and the three senses discriminated above, in the first cluster referring to the 'basin' sense we find other nouns such as Eimer 'bucket', Fontäne 'fountain', Brunnen 'fountain, well', Weiher 'pond', and Vase 'vase', all related to water and water container; in the second cluster referring to the music sense we find Tuba 'tuba', Trompete 'trumpet', Saxophon 'sax', and Trommel 'drum', and in the third cluster referring to the body sense we find Arm 'arm', and Knochen 'bone'.</Paragraph>
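      <Paragraph> The membership cut-off and the ambiguity count can be sketched as follows. The sketch assumes that membership is decided on the probability of a noun-condition combination given a cluster and that the condition is encoded as a suffix such as _PW or _W; the probabilities are invented, truncated toy values and the function names are hypothetical.</Paragraph>
      <Paragraph><![CDATA[
```python
CUTOFF = 0.01  # membership threshold of 1%


def cluster_members(p_noun_given_cluster, cutoff=CUTOFF):
    """Per cluster, the noun-condition labels at or above the cut-off."""
    return [
        {label for label, p in dist.items() if p >= cutoff}
        for dist in p_noun_given_cluster
    ]


def clusters_per_noun(members):
    """Number of clusters each noun belongs to, ignoring the condition."""
    counts = {}
    for cluster in members:
        # Strip the condition suffix so PW and W count as the same noun.
        for noun in {label.rsplit("_", 1)[0] for label in cluster}:
            counts[noun] = counts.get(noun, 0) + 1
    return counts


# Invented toy probabilities (only a few entries per cluster are shown).
p_noun_given_cluster = [
    {"Becken_W": 0.030, "Eimer_PW": 0.020, "Baecker_W": 0.002},
    {"Becken_PW": 0.040, "Becken_W": 0.025, "Tuba_W": 0.015},
    {"Baecker_PW": 0.050, "Becken_W": 0.004},
]
members = cluster_members(p_noun_given_cluster)
ambiguity = clusters_per_noun(members)
```
]]></Paragraph>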
      <Paragraph position="8"> Discrimination of PW vs. W noun senses: Combining the previous two analyses allowed us to discriminate senses as provided by the two experimental conditions. Remember that the target nouns in the clusters included the specification of the condition. If we find a target noun in a certain cluster with both condition specifications, it means that some associations produced to both the PW and the W conditions referred to the same noun sense. If a target noun appears in a certain cluster only with one condition specified, it means that the associations captured the respective noun sense only in one condition. Thus, a target noun appearing in a cluster in only one condition was an indication for ambiguity. Going back to our example noun Becken and its three example clusters, we find the noun in both conditions only in one of the three clusters, namely the cluster for the music sense, and this happens to be the sense depicted in the PW condition. In the two other clusters, we only find Becken in the W condition. In total, Becken appears in both conditions only in 1 out of 8 clusters, in only the PW condition in 1 cluster, and in only the W condition in 6 clusters.</Paragraph>
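      <Paragraph> The condition-wise comparison amounts to checking, for every cluster, which of the two noun-condition labels of a target noun is a member. A minimal sketch follows; the cluster contents are invented so as to reproduce the 1/1/6 split reported for Becken, and the label format with _PW and _W suffixes is an assumption.</Paragraph>
      <Paragraph><![CDATA[
```python
def condition_split(members, noun):
    """Count clusters containing `noun` in both conditions, PW only, or W only."""
    split = {"both": 0, "PW only": 0, "W only": 0}
    for cluster in members:
        in_pw = f"{noun}_PW" in cluster
        in_w = f"{noun}_W" in cluster
        if in_pw and in_w:
            split["both"] += 1
        elif in_pw:
            split["PW only"] += 1
        elif in_w:
            split["W only"] += 1
    return split


# Invented cluster memberships (sets of noun-condition labels).
members = [
    {"Becken_PW", "Becken_W", "Tuba_W"},   # sense shared by both conditions
    {"Becken_PW", "Trommel_PW"},           # sense found only with the picture
    {"Arm_W", "Knochen_PW"},               # cluster without the target noun
]
members += [{"Becken_W", f"Filler{i}_W"} for i in range(6)]  # W-only senses

becken_split = condition_split(members, "Becken")
```
]]></Paragraph>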
      <Paragraph position="9"> The four analyses demonstrate how the clusters can be used to predict and discriminate noun senses. Of course, the predictions are not perfect, but they approximately correspond to our linguistic intuitions. Impressively, the clusters revealed not only blatantly polysemous words such as Becken but also distinct facets of a word. For example, the stimulus Filter 'filter' had associations to coffee-related senses as well as cigarette-related senses, both of which were then reflected in the clusters.</Paragraph>
    </Section>
    <Section position="3" start_page="46" end_page="47" type="sub_section">
      <SectionTitle>
5.3 Evaluation of Noun Clusters
</SectionTitle>
      <Paragraph position="0"> In order to perform a more independent evaluation of the clusters which is not only based on specific examples, we assessed the clusters by two annotators. 20 homophones were manually selected from the 409 target nouns. In addition, we relied on the indicators for ambiguity as defined in Section 4, and selected the 20 top and bottom nouns from the ordered list of type agreement for the two conditions. The manual list showed some overlap with the selection dependent on type agreement, resulting in a list of 51 target nouns.</Paragraph>
      <Paragraph position="1"> For each of the selected target nouns, we looked up the noun senses as defined by the Duden, a standard German dictionary. We primarily used the stylistic dictionary (Dudenredaktion, 2001), but used the foreign language dictionary (Dudenredaktion, 2005) if the noun was missing in the former. Each target noun was defined by its (short version) sense definitions. For example, Schloss was defined by the senses Vorrichtung zum Verschliessen 'device for closing' and Wohngebäude von Fürsten und Adeligen 'residential building for princes and noblemen'.</Paragraph>
      <Paragraph position="2"> As targets for the evaluation, we used the two cluster analyses as mentioned above, containing 100 and 200 clusters with membership probability cut-offs at 1%. Two annotators were then presented with two lists each: For each cluster analysis, they saw a list of the 51 selected target nouns, accompanied by the clusters they were members of, i.e., for which they showed a probability ≥ 1%, ignoring the condition of the target noun (PW vs. W). In total, the annotators were given 82/91 clusters which included any of the 51 selected nouns.</Paragraph>
      <Paragraph position="4"> For each cluster, the annotators saw the five most probable associations, and all cluster members.</Paragraph>
      <Paragraph position="5"> The annotators were asked to select a Duden sense for each cluster, if possible. The results of the annotation are presented in Table 3. Annotator 1 identified a Duden sense for 72/75% of the clusters, annotator 2 for 78/71%. Interannotator agreement on which of the Duden senses was appropriate for a cluster (if any) was 81/85%; κ = .78/.7?. The evaluation of the clusters as carried out by the sense annotation demonstrates that the cluster senses correspond largely to Duden senses. This first kind of evaluation models the precision of the cluster analyses. A second kind of evaluation assessed how many different Duden senses we capture with the cluster analyses; this evaluation models the recall of the cluster analyses. Duden defines a total of 113 senses for our target nouns. Table 4 specifies the recall for the data sets and annotators. The evaluations show that the precision is much larger than the recall. It might be worth applying the clustering with a different number of clusters and/or a different cut-off for the cluster membership probability, but that would lower the precision of the analyses. We believe that the evaluation numbers are quite impressive, especially considering that Duden not only specifies everyday vocabulary, but includes colloquial expressions (such as Ballon as 'human head'), out-dated senses (such as Mond as 'month'), and domain-specific senses (such as Blatt as 'shoulder of a hoofed game').</Paragraph>
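      <Paragraph> The two evaluation measures can be sketched as follows, assuming that chance-corrected agreement is computed as Cohen's kappa over the per-cluster sense labels and that recall is the number of distinct dictionary senses selected divided by the number of senses the dictionary lists. The labels, the toy total of four senses, and the function names are invented for illustration.</Paragraph>
      <Paragraph><![CDATA[
```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Agreement expected by chance from the annotators' label frequencies.
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0
    return (observed - expected) / (1.0 - expected)


def sense_recall(labels, total_senses):
    """Distinct senses selected (None = no sense found) over all senses."""
    found = {label for label in labels if label is not None}
    return len(found) / total_senses


# Invented sense labels for six clusters; None means no sense was selected.
annotator_1 = ["s1", "s1", "s2", None, "s2", "s1"]
annotator_2 = ["s1", "s2", "s2", None, "s2", "s1"]

kappa = cohen_kappa(annotator_1, annotator_2)
recall_1 = sense_recall(annotator_1, total_senses=4)
```
]]></Paragraph>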
    </Section>
  </Section>
</Paper>