<?xml version="1.0" standalone="yes"?>
<Paper uid="W03-1401">
  <Title>Metonymy as a Cross-lingual Phenomenon</Title>
  <Section position="7" start_page="0" end_page="0" type="evalu">
    <SectionTitle>
5. Evaluation
</SectionTitle>
    <Paragraph position="0"> The cross-linguistic filter yields a subset of the monolingual analysis data described in section 4.1. It covers 404 distinct English nouns out of a total of 8062 (5%).</Paragraph>
    <Paragraph position="1"> This original filter considered nouns satisfying the criteria of Apresjan (cf.</Paragraph>
    <Paragraph position="2"> section 1), i.e. nouns that belong to a group of at least two words whose sense distinctions exhibit a particular relationship.</Paragraph>
    <Paragraph position="3"> The percentage covered by the cross-linguistic data, compared to the original analysis, ranges from 100% for the very small potential classes of regular polysemy (2-3 words) to 1-2% for medium-sized (30-50 words) and large classes (100+ words).</Paragraph>
    <Paragraph position="4"> In order to create a set for manual evaluation, the set of 404 English nouns was reduced by strengthening the Apresjan criterion: a word was considered only if it belonged to a set of at least three words illustrating the regular polysemy (RP). We will refer to this as a three-word RP class. The rationale is that two-word candidate RP classes introduce noise, because a fortuitous coincidence of senses is more probable in a set of just two words. This step reduced the number of participating words to 394. At this point, 177 words were randomly chosen from this set for manual evaluation. The evaluation consisted of examining the hypernym pairs that reflect a candidate regular polysemic relation.1 The criteria used in this step are semantic homogeneity (the semantic relation that defines the candidate RP class should apply to the majority of the participating words) and specificity of the pattern (the lower the position of the hypernymic pair in the hierarchy, the more specific the semantic relation).</Paragraph>
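The strengthened criterion can be illustrated with a minimal Python sketch. It assumes that candidate RP classes are stored as a mapping from a hypernym pair to the set of words exhibiting it; the function names and the two-word class below are hypothetical, and the member words are only the few listed in the examples of this section.

```python
from typing import Dict, Set, Tuple

HypernymPair = Tuple[str, str]


def filter_rp_classes(
    classes: Dict[HypernymPair, Set[str]], min_size: int = 3
) -> Dict[HypernymPair, Set[str]]:
    """Keep only candidate RP classes with at least `min_size` member words."""
    return {
        pair: words for pair, words in classes.items() if len(words) >= min_size
    }


def participating_words(classes: Dict[HypernymPair, Set[str]]) -> Set[str]:
    """Collect every word that occurs in at least one retained class."""
    words: Set[str] = set()
    for members in classes.values():
        words.update(members)
    return words


# Toy data: two classes taken from the examples in this section (truncated to
# three listed members each) and one invented two-word class.
candidates = {
    ("plant", "edible fruit"): {"apple", "banana", "fig"},
    ("person", "quality"): {"attraction", "authority", "beauty"},
    ("hypothetical A", "hypothetical B"): {"word1", "word2"},
}

kept = filter_rp_classes(candidates)
print(sorted(participating_words(kept)))
```

With `min_size=2` the same function corresponds to the original Apresjan criterion, so the two filters differ only in the threshold.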
    <Paragraph position="5"> Of these words, 109 displayed valid regular polysemic patterns (62%) and 68 did not (38%). The automatic filtering method thus has a 62% success rate in identifying valid regular polysemic patterns. Below are a few examples of cross-linguistic RP classes that satisfied the evaluation criteria.</Paragraph>
    <Paragraph position="6">  combinations of hypernym pairs can be considered for the same set of words. (In fact, the possibilities are the Cartesian product of the ancestors of each of the hypernyms in the pair.) If all hypernymic combinations were taken into account, this would amount to an average of 17 classes per word.</Paragraph>
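The Cartesian product mentioned in this note can be sketched as follows. This is a minimal Python illustration, not the procedure actually used: the toy hierarchy is invented, each node is assumed to have a single parent (WordNet allows several), and the hypernyms themselves are assumed to count among the candidates.

```python
from itertools import product

# Hypothetical toy hierarchy mapping each node to its single parent.
PARENT = {
    "edible fruit": "produce",
    "produce": "food",
    "food": "entity",
    "plant": "organism",
    "organism": "entity",
}


def ancestors_inclusive(node, parent=PARENT):
    """Return the node followed by all of its ancestors up to the root."""
    chain = [node]
    while chain[-1] in parent:
        chain.append(parent[chain[-1]])
    return chain


def candidate_pairs(first, second):
    """All hypernym pairs obtainable by generalising either side of the pair."""
    return list(product(ancestors_inclusive(first), ancestors_inclusive(second)))


pairs = candidate_pairs("plant", "edible fruit")
print(len(pairs))  # 3 nodes on one side times 4 on the other: 12
```

Even in this small hierarchy a single pair expands to twelve candidates, which makes the reported average of 17 classes per word plausible once all combinations are admitted.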
    <Paragraph position="7"> crocheting natural or synthetic fibers) - Covering (a natural object that covers or envelops)
English RP class (4 total): wool, hair, fleece, tapa
Dutch RP class (1 total): wol
Spanish RP class (1 total): lana
Coverage of the intersection between all three languages: 25% of set derived from WordNet
Hypernymic Pair: Plant (a living organism lacking the power of locomotion) - Edible fruit (edible reproductive body of a seed plant especially one having sweet flesh)
English RP class (159 total): apple, boxberry, blackcurrant, banana, fig . . .</Paragraph>
    <Paragraph position="8"> Dutch RP class (9 total): banaan, vijg, persimoen, meloen...</Paragraph>
    <Paragraph position="9"> Spanish RP class (20 total): banana, platano, melon, caqui, higo...</Paragraph>
    <Paragraph position="10"> Coverage of the intersection between all three languages: 2.5% of set derived from</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
WordNet
</SectionTitle>
      <Paragraph position="0"> Hypernymic Pair: Person (a human being) - Quality (an essential and distinguishing attribute of something or someone)
English RP class (11 total): attraction, authority, beauty, . . .</Paragraph>
      <Paragraph position="1"> Dutch RP class (1 total): schoonheid
Spanish RP class (4 total): belleza, atraccion, autoridad, imagen
Word intersection between all three languages: 9% of set derived from WordNet
Hypernymic Pair: Substance (that which has mass and occupies space) - Drug (something that is used as a medicine or narcotic)
English RP class (25 total): alcohol, bromide, dragee, histamine, iodine, liquor...
Dutch RP class (2 total): broom, cocktail
Spanish RP class (10 total): bromuro, histamina, muscatel, yodo...</Paragraph>
      <Paragraph position="2"> Word intersection between all three languages: 4% of set derived from WordNet
It is possible to view these results as an indication of the cross-linguistic validity of the regular polysemic patterns and their level of universality relative to the language families represented by the wordnets. The hypothesis is that if a metonymic pattern occurs in several languages, there is stronger evidence for a higher level of universality of the regular polysemic pattern.</Paragraph>
      <Paragraph position="3"> Of course, the coverage of the wordnets in EuroWordNet interferes with these results. Since the Dutch and Spanish wordnets are only half the size of the English wordnet, only limited coverage can be expected. Still, the coverage is consistently low in most cases, often no more than 2-5%. On the basis of wordnet size alone, one would expect higher coverage.</Paragraph>
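The coverage figures quoted in the examples appear to be the share of the English RP class that is also lexicalised in both other languages (one of four words gives 25%). A minimal Python sketch of that reading follows; the concept identifiers are hypothetical stand-ins for the inter-lingual links between the wordnets, and the choice of the English class as denominator is inferred from the reported numbers rather than stated in the text.

```python
def coverage(english, *others):
    """Fraction of English class concepts that every other language also has.

    Each argument maps a word to a (hypothetical) shared concept identifier.
    """
    english_concepts = set(english.values())
    shared = set(english_concepts)
    for lexicon in others:
        shared = shared.intersection(lexicon.values())
    return len(shared) / len(english_concepts)


# First example class of this section; identifiers c1..c4 are invented.
english = {"wool": "c1", "hair": "c2", "fleece": "c3", "tapa": "c4"}
dutch = {"wol": "c1"}
spanish = {"lana": "c1"}

print(f"{coverage(english, dutch, spanish):.0%}")  # 25%
```

Under this reading, a class as large as the 159-word Plant - Edible fruit class needs only a handful of shared words to reach the reported 2.5%, which is why large classes dominate the low end of the range.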
      <Paragraph position="4"> There are other explanations for the lack of identical lexicalizations in other target language wordnets:
1. The metonymic pattern is language specific and is not realised as a polysemous word in the target language. For example, the Dutch kantoor is synonymous with the English office in the sense 'where professional or clerical duties are performed', but its sense distinctions cannot mirror the regular polysemic relation in English with 'a job in an organization or hierarchy'.</Paragraph>
      <Paragraph position="5"> 2. The pattern is unattested in the target language in terms of usage but forms a potential sense extension in that language. For instance, the Spanish iglesia and the Dutch kerk both mean 'building for worship' and 'a service conducted in a church'. The Spanish wordnet has an additional systematically related sense for iglesia ('institution to express belief in a divine power') that is not shared by its Dutch counterpart but is a valid new sense.</Paragraph>
      <Paragraph position="6"> 3. The missing sense can in fact only be lexicalized by another word, compound or derivation related to the word with the potentially missing sense. For example, the Dutch vereniging has the sense (an association of people with similar interests). The English equivalent is club, for which there is another sense in WordNet (a building occupied by a club). This is not a felicitous sense extension for the Dutch vereniging, because the favoured lexicalization is the compound verenigingshuis (club house).</Paragraph>
      <Paragraph position="7"> 4. The metonymic pattern is in fact attested in the language, but one or more senses participating in the pattern have not yet been captured in the wordnet. One of the reasons could be the sense granularity of the resource on which the wordnet was built. For example, embassy has one sense in WordNet (a building where ambassadors live or work). The Dutch translational equivalent ambassade has an additional sense denoting the people representing their country. This sense can be projected to the English WordNet as a regular polysemy pattern that is also valid in English. In fact, LDOCE (Procter, 1978) only lists the sense which is missing in WordNet.</Paragraph>
      <Paragraph position="8"> 7. Coverage and extendibility
There are many RP classes whose English word members do not all have a Dutch or Spanish counterpart. We wanted to evaluate the universality of the regular polysemic relations by testing native speaker intuitions about these regular polysemic gaps. This was done by projecting the senses of the participating English words in an RP class onto Dutch and Spanish, and assessing whether the missing senses were adequate additional senses in these two languages. The experiment we conducted was very small; we intend to perform more experiments of this kind in the future. The pattern we examined is the hypernymic combination occupation (the principal activity in your life) - discipline (a branch of knowledge). This RP class has five members. Two Dutch and two Spanish native speakers were asked to judge the felicitousness of the senses that are missing in the Dutch and Spanish wordnets. Below is a short discussion of each member.</Paragraph>
      <Paragraph position="9"> interior design
1. the trade of planning the layout and furnishings of an architectural interior
2. the branch of architecture dealing with the selection and organization of furnishings for an architectural interior
The corresponding Dutch word binnenhuisarchitectuur has only one sense which is linked to both WordNet senses by means of a near-synonymy relation. This means that the Dutch wordnet is underspecified for the distinction of these metonymically related senses and can be extended with the specific sense distinctions (see explanation 4 above). This coincided with the verdict of the Dutch jury.</Paragraph>
      <Paragraph position="10"> The Spanish WordNet has a separate translation for each sense: interiorismo (corresponding to interior design 1) and deseno de interiores (corresponding to interior design 2). The latter translational equivalent was considered to also have a possible trade reading.</Paragraph>
      <Paragraph position="11"> law
1. the learned profession that is mastered by graduate study in a law school and that is responsible for the judicial system
4. the branch of philosophy concerned with the law
The Dutch rechtswetenschap has only one sense, which is linked to both WordNet senses by means of a near-synonymy relation. This again means that the Dutch wordnet is underspecified for the distinction of these metonymically related senses and can be extended with the specific sense distinctions (see explanation 4 above). This coincided with the verdict of the Dutch jury. The Spanish equivalent of law 4 is jurisprudencia, whereas law 1 has no counterpart in the Spanish wordnet. The profession reading was not considered a felicitous additional sense for this word. Both subjects remarked that another word captures both meanings: leyes, which is not present in the Spanish wordnet.</Paragraph>
      <Paragraph position="12"> literature
1. the profession or art of a writer
2. the humanistic study of a body of literature
The Dutch letterkunde is linked only to sense 2 of literature. Sense 1 was not considered to be a straightforward new sense for this word by the judges.</Paragraph>
      <Paragraph position="13"> The Spanish literatura lacks a profession reading in the Spanish wordnet. This sense was considered valid by one subject but rejected by the other.</Paragraph>
      <Paragraph position="14"> politics
1. the profession devoted to governing and to political affairs
2. the study of government of states and other political units
The Dutch word politicologie also has only one sense that is linked to both WordNet senses by means of a near-synonymy relation. This again means that the Dutch wordnet is underspecified for the distinction of these metonymically related senses and can be extended with the specific sense distinctions. The Dutch subjects, however, were not happy with the profession reading. The Spanish politica lacks a profession reading in the Spanish wordnet. The Spanish subjects considered this a valid sense for this word.</Paragraph>
      <Paragraph position="15"> theology
1. the learned profession acquired by specialized courses in religion (usually taught at a college or seminary)
2. the rational and systematic study of religion and its influences and of the nature of religious truth
The Dutch theologie has no profession reading. This reading was considered valid by the Dutch subjects.</Paragraph>
      <Paragraph position="16"> The Spanish teologia has both senses in the Spanish wordnet, and this coincides with the subjects' intuition.</Paragraph>
      <Paragraph position="17"> The results are summarized in table 2 below. Overall, the projection of the word senses onto the Dutch wordnet yields a sense extension for one word out of a possible two. For the Spanish wordnet the same process creates valid new senses for two out of four words.</Paragraph>
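The totals in this summary can be re-derived from the per-word discussion. The Python sketch below encodes one possible reading of that discussion and should be taken as an assumption, not as the content of table 2: words whose single wordnet sense is linked to both English senses (binnenhuisarchitectuur, rechtswetenschap, politicologie) are treated as underspecified and left out, words that already have both senses (teologia) are left out as well, and a split verdict is not counted as a valid new sense.

```python
# Verdicts on projected senses, transcribed from the discussion above.
# The labels "accepted", "rejected" and "split" are illustrative names.
VERDICTS = {
    "Dutch": {
        "letterkunde": "rejected",
        "theologie": "accepted",
    },
    "Spanish": {
        "deseno de interiores": "accepted",
        "jurisprudencia": "rejected",
        "literatura": "split",
        "politica": "accepted",
    },
}


def tally(verdicts):
    """Return (number of accepted projections, number of candidate words)."""
    accepted = sum(1 for verdict in verdicts.values() if verdict == "accepted")
    return accepted, len(verdicts)


for language, verdicts in VERDICTS.items():
    accepted, total = tally(verdicts)
    print(f"{language}: {accepted} of {total} projected senses accepted")
```

Under these assumptions the counts come out as one of two for Dutch and two of four for Spanish, matching the figures stated in the text.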
    </Section>
  </Section>
</Paper>