<?xml version="1.0" standalone="yes"?> <Paper uid="J04-1002"> <Title>© 2004 Association for Computational Linguistics CorMet: A Computational, Corpus-Based Conventional Metaphor Extraction System</Title> <Section position="3" start_page="24" end_page="31" type="metho"> <SectionTitle> 2. The Metaphor Extraction Engine </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="24" end_page="25" type="sub_section"> <SectionTitle> 2.1 Searching the Net for Domain Corpora </SectionTitle> <Paragraph position="0"> Ideally, CorMet could draw on a large quantity of manually vetted, highly representative domain-specific documents. The precompiled corpora available on-line (Kucera 1992; Marcus, Santorini, and Marcinkiewicz 1993) do not span enough subjects. Other on-line data sources include the Internet's hierarchically structured indices, such as Yahoo's ontology (www.yahoo.com) and Google's (www.google.com). Each index entry contains a small number of high-quality links to relevant Web pages, but this is not helpful, because CorMet requires many documents, and those documents need not be of more than moderate quality. Searching the Internet for domain-specific text seems to be the only way to obtain sufficiently large, diverse corpora.</Paragraph> <Paragraph position="1"> CorMet obtains documents by submitting queries to the Google search engine.</Paragraph> <Paragraph position="2"> There are two types of queries: one to fetch any domain-specific documents and</Paragraph> </Section> <Section position="2" start_page="25" end_page="25" type="sub_section"> <SectionTitle> Mason CorMet </SectionTitle> <Paragraph position="0"> another to fetch domain-specific documents that contain a particular verb. The first kind of query consists of a conjunction of two to five randomly selected domain keywords. Domain keywords are words characteristic of a domain, supplied by the user as input. 
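As a rough illustration of the first kind of query, the following Python sketch enumerates conjunctions of two to five keywords and shuffles them. The function name, the seed and limit parameters, and the AND syntax are assumptions made for this example; the article does not say how CorMet draws or formats its queries. The sample keywords are the FINANCE keywords given in the article.

```python
import random
from itertools import combinations


def domain_queries(keywords, min_terms=2, max_terms=5, limit=10, seed=0):
    """Build conjunctive queries from randomly ordered keyword subsets."""
    unique = sorted(set(keywords))
    upper = min(max_terms, len(unique))
    # Enumerate every keyword subset of an allowed size.
    subsets = [
        combo
        for size in range(min_terms, upper + 1)
        for combo in combinations(unique, size)
    ]
    # Shuffle so that the first `limit` queries are a random selection.
    rng = random.Random(seed)
    rng.shuffle(subsets)
    return [" AND ".join(combo) for combo in subsets[:limit]]


finance = ["stocks", "bonds", "NASDAQ", "Dow", "investment", "finance"]
for query in domain_queries(finance, limit=3):
    print(query)
```

With six keywords there are 15 + 20 + 15 + 6 = 56 distinct conjunctions of two to five terms, so even a short keyword list yields many different queries.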
For the FINANCE domain, a reasonable set of keywords is stocks, bonds, NASDAQ, Dow, investment, finance. Each query incorporates only a few keywords in order to maximize the number of distinct possible queries.</Paragraph> <Paragraph position="1"> Queries for domain-specific documents containing a particular verb are composed of a conjunction of domain-specific terms and a disjunction of forms of the verb that are more likely to be verbs than other parts of speech. For the verb attack, for instance, acceptable forms are attacked and attacking, but not attack and attacks, which are more likely to be nouns. The syntactic categories in which a word form appears are determined by reference to WordNet.</Paragraph> <Paragraph position="2"> Some queries for the verb attack in the FINANCE domain are:
1. (attacked OR attacking) AND (bonds AND Dow AND investment)
2. (attacked OR attacking) AND (NASDAQ AND investment AND finance)
3. (attacked OR attacking) AND (stocks AND bonds AND NASDAQ)
4. (attacked OR attacking) AND (stocks AND NASDAQ AND Dow)
Queries return links to up to 10,000 documents, of which CorMet fetches and analyzes no more than 3,000. In the 13 domains studied, about 75% of these documents are relevant to the domain of interest (as measured through a randomly chosen, hand-evaluated sample of 100 documents per domain), so the noise is substantial. The documents are processed to remove embedded scripts and HTML tags.</Paragraph> <Paragraph position="3"> The mined documents are parsed with the Apple Pie Parser (Sekine and Grishman 1995). 
Case frames are extracted from parsed sentences using templates; for instance, (S (NP &amp; OBJ) (VP (were | was | got | get) (VP WORDFORM-PASSIVE)) is used to extract roles for passive, agentless sentences (where WORDFORM-PASSIVE is replaced by a passive form of the verb under analysis).</Paragraph> </Section> <Section position="3" start_page="25" end_page="26" type="sub_section"> <SectionTitle> 2.2 Finding Characteristic Predicates </SectionTitle> <Paragraph position="0"> Learning the selectional preferences for a verb in a domain is time-consuming, so it is useful to find a small set of important verbs in each domain. CorMet seeks information about verbs typical of a domain, because these verbs are more likely to figure in metaphors in which that domain is the metaphor's source. Besiege, for instance, is characteristic of the MILITARY domain and appears in many instances of the MILITARY → MEDICINE mapping, such as The antigens besieged the virus.</Paragraph> <Paragraph position="1"> To find domain-characteristic verbs, CorMet dynamically obtains a large sample of domain-relevant documents, decomposes them into a bag-of-words representation, stems the words with an implementation of the Porter (1980) stemmer, and finds the ratio of occurrences of each word stem to the total number of stems in the domain corpus. The frequency of each stem in the corpus is compared to its frequency in general English (as recorded in an English-language frequency dictionary [Kilgarriff 2003]).</Paragraph> <Paragraph position="2"> The 400 verb stems with the highest relative frequency (computed as a ratio of the stem's frequency in the domain to its frequency in the English frequency dictionary) are considered characteristic. CorMet treats any word form that may be a verb (according to WordNet) as though it were a verb, which biases CorMet toward verbs with common nominal homonyms. 
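A minimal sketch of this ranking step, assuming the corpus has already been tokenized and stemmed. The inputs english_freq (relative frequencies from a general-English frequency list) and candidate_verbs (stems that WordNet lists as possible verbs) are hypothetical stand-ins, and the toy frequencies are invented for the example.

```python
from collections import Counter


def characteristic_stems(domain_stems, english_freq, candidate_verbs, top_n=400):
    """Rank verb stems by domain frequency relative to general English."""
    counts = Counter(domain_stems)
    total = sum(counts.values())
    scores = {}
    for stem, count in counts.items():
        # Skip stems that cannot be verbs or have no reference frequency.
        if stem not in candidate_verbs or english_freq.get(stem, 0) == 0:
            continue
        scores[stem] = (count / total) / english_freq[stem]
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_n]


# Toy, invented data: 20 stems from a pretend LAB corpus.
lab_stems = ["pour"] * 6 + ["mix"] * 3 + ["be"] * 11
english_freq = {"pour": 0.001, "mix": 0.002, "be": 0.05}
ranked = characteristic_stems(lab_stems, english_freq, {"pour", "mix", "be"})
print([stem for stem, _ in ranked])  # ['pour', 'mix', 'be']
```

Although be is the most frequent stem in the toy corpus, it ranks last because it is also very frequent in general English.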
Word stems that have high relative frequency in more than one domain, like e-mail and download, are eliminated on the suspicion that they are more characteristic of documents on the Internet in general than of a substantive domain. Table 1 lists the 20 highest-scoring stems in the LAB and FINANCE domains.</Paragraph> </Section> <Section position="4" start_page="26" end_page="27" type="sub_section"> <SectionTitle> 2.3 Selectional Preference Learning </SectionTitle> <Paragraph position="0"> There are three constraints on CorMet's selectional-preference-learning algorithm. First, it must tolerate noise, because complex sentences are often misparsed, and the case frame extractor is error prone. Second, it should be able to work around WordNet's lacunae. Finally, there should be a reasonable metric for comparing the similarity between selectional preferences.</Paragraph> <Paragraph position="1"> CorMet first uses the selectional-preference-learning algorithm described in Resnik (1993), then clusters the results. Resnik's algorithm takes a set of words observed in a case slot (e.g., the subject of pour or the indirect object of give) and finds the WordNet nodes that best characterize the selectional preferences of that slot. (Note that WordNet nodes are treated as categories subcategorizing their descendants.) A case slot has a preference for a WordNet node to the extent that that node, or one of its descendants, is more likely to appear in that case slot than it is to appear at random.</Paragraph> <Paragraph position="2"> An overall measure of the choosiness of a case slot is selectional-preference strength, S_R(p), defined as the relative entropy of the posterior probability P(c|p) and the prior probability P(c) (where P(c) is the a priori probability of the appearance of a WordNet node c, or one of its descendants, and P(c|p) is the probability of that node or one of its descendants appearing in a case slot p). 
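As a small worked illustration of this definition, the sketch below computes the relative entropy in bits for a toy case slot. The three flat classes and their probabilities are invented; in CorMet the classes are nested WordNet nodes, which this example does not model.

```python
import math


def preference_strength(prior, posterior):
    """Relative entropy D(P(c|p) || P(c)) in bits for one case slot."""
    return sum(
        q * math.log2(q / prior[c])
        for c, q in posterior.items()
        if q > 0
    )


prior = {"liquid": 0.25, "solid": 0.25, "person": 0.5}
choosy = {"liquid": 1.0, "solid": 0.0, "person": 0.0}
print(preference_strength(prior, prior))   # 0.0
print(preference_strength(prior, choosy))  # 2.0
```

A slot whose fillers follow the prior has zero strength, whereas a slot that accepts only liquids resolves two bits here because liquids have prior probability one quarter.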
Recall that the relative entropy of two distributions X and Y, D(X||Y), is the inefficiency incurred by using an encoding optimal for Y to encode X. Selectional-preference strength is therefore $$S_R(p) = D\bigl(P(c \mid p) \,\|\, P(c)\bigr) = \sum_{c} P(c \mid p) \log \frac{P(c \mid p)}{P(c)}$$</Paragraph> <Paragraph position="4"/> </Section> <Section position="5" start_page="27" end_page="28" type="sub_section"> <SectionTitle> </SectionTitle> <Paragraph position="0"> The degree to which a case slot selects for a particular node is measured by selectional association. In effect, the selectional associations divide up the selectional-preference strength for a case slot among that slot's possible fillers. Selectional association is defined as $$A_R(p, c) = \frac{1}{S_R(p)} \, P(c \mid p) \log \frac{P(c \mid p)}{P(c)}$$</Paragraph> <Paragraph position="2"> To compute A_R(p, c), what is needed is a distribution over word classes, but what is observed in the corpus is a distribution over word forms. Resnik's algorithm works around this problem by approximating a word class distribution from the word form distribution. For each word form observed filling a case slot, credit is divided evenly among all of that word form's possible senses (and their ancestors in WordNet). Although Resnik's algorithm makes no explicit attempt at sense disambiguation, greater activation tends to accumulate in those nodes that best characterize a predicate's selectional preferences.</Paragraph> <Paragraph position="3"> CorMet uses Resnik's algorithm to learn domain-specific selectional preferences. It often finds different selectional preferences for predicates whose preferences should, intuitively, be the same. In the MILITARY domain, the object of assault selects strongly for fortification but not social group, whereas the selectional preferences for the object of attack are the opposite. Taking the cosine of the selectional preferences of these two case slots (one of many possible similarity metrics) gives a surprisingly low score. 
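To make the comparison concrete, here is a small sketch of the cosine between two selectional-preference vectors stored as sparse dictionaries keyed by WordNet node. The association values are invented to mimic the assault and attack contrast; they are not CorMet's data.

```python
import math


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(node, 0.0) for node, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Invented association values, for illustration only.
assault_obj = {"fortification": 0.8, "social_group": 0.1}
attack_obj = {"fortification": 0.1, "social_group": 0.8}
print(round(cosine(assault_obj, attack_obj), 3))
```

The two slots share both nodes but weight them in opposite ways, so the cosine comes out far below 1 even though the verbs are near synonyms.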
In order to facilitate more accurate judgments of selectional-preference similarity, CorMet finds clusters of WordNet nodes that, although not as accurate, allow more meaningful comparisons of selectional preferences.</Paragraph> <Paragraph position="4"> Clusters are built using the nearest-neighbor clustering algorithm (Jain, Murty, and Flynn 1999). A predicate's selectional preferences are represented as vectors whose nth element represents the selectional association of the nth WordNet node for that predicate. The similarity function used is the dot product of the two selectional-preference vectors. Empirically, the level of granularity obtained by running nearest-neighbor clustering twice (i.e., clustering over the sets of nodes constituting selectional preferences, then clustering over the clusters) produces the most conceptually coherent clusters.</Paragraph> <Paragraph position="5"> There are typically fewer than 100 second-order clusters (i.e., clusters of clusters) per domain. In the LAB domain there are 54 second-order clusters, and in the FINANCE domain there are 67. The time complexity of searching for metaphorical interconcept mappings between two domains is proportional to the number of pairs of salient domain objects, so it is more efficient to search over pairs of salient clusters than over the more numerous individual salient nodes.</Paragraph> <Paragraph position="6"> Table 2 shows a MILITARY cluster. These clusters are helpful for finding verbs with similar, but not identical, selectional preferences. Although attack, for instance, does not select for fortification, it does select for other elements of fortification's cluster, such as building and defensive structure.</Paragraph> <Paragraph position="7"> The fundamental limitation of WordNet with respect to selectional-preference learning is that it fails to exhaust all possible lexical relationships. 
WordNet can hardly be blamed: The task of recording all possible relationships between all English words is prohibitively large, if not infinite. Nevertheless, there are many words that intuitively should have a common parent but do not. For instance, liquid body substance and water should both be hyponyms of liquid, but in WordNet their shallowest common ancestor is substance. One of the descendants of substance is solid, so there is no single node that represents all liquids.</Paragraph> <Paragraph position="8"> Li and Abe (1998) describe another method of corpus-driven selectional-preference learning that finds a tree cut of WordNet for each case slot. A tree cut is a set of nodes that specifies a partition of the ontology's leaf nodes, where a node stands for all the leaf nodes descended from it. The method chooses among possible tree cuts according to minimum-description-length criteria. The description length of a tree cut representation is the sum of the size of the tree cut itself (i.e., the minimum number of nodes specifying the partition) and the space required for representing the observed data with that tree cut. For CorMet's purposes, the problem with this approach is that it is difficult to find clusters of (possibly hypernymically related) nodes representing a selectional preference using its results (because the tree cut includes exactly one node on each path from each leaf node to the root). There are similar objections to similar approaches such as that of Carroll and McCarthy (2000).</Paragraph> </Section> <Section position="6" start_page="28" end_page="29" type="sub_section"> <SectionTitle> 2.4 Polarity </SectionTitle> <Paragraph position="0"> Polarity is a measure of the directionality and magnitude of structure transfer between two concepts or two domains. 
Nonzero polarity exists when language characteristic of a concept from one domain is used of a different concept in a different domain.</Paragraph> <Paragraph position="1"> The kind of characteristic language CorMet can detect is limited to verbal selectional preferences.</Paragraph> <Paragraph position="2"> Say CorMet is searching for a mapping between the concepts liquids (characteristic of the LAB domain) and assets (characteristic of the FINANCE domain), as illustrated in Figure 1. There are verbs in LAB that strongly select for liquids, such as pour, flow, and freeze. In FINANCE, these verbs select for assets. In FINANCE there are verbs that strongly select for assets, such as spend, invest, and tax. In the LAB domain, these verbs select for nothing in particular. This suggests that liquid is the source concept and asset is the target concept, which implies that LAB and FINANCE are the source and target domains, respectively. CorMet computes the overall polarity between two domains (as opposed to between two concepts) by summing over the polarity between each pair of high-salience concepts from the two domains of interest.</Paragraph> <Paragraph position="3"> Interconcept polarity is defined as follows: Let a be the set of case slots in domain X with the strongest selectional preference for the node cluster A. Let b be the set of case slots in domain Y with the strongest selectional preferences for the node cluster B. The degree of structure flow from A in X to B in Y is computed as the degree to which the predicates a select for the nodes B in Y, or selection_strength(Y, a, B).</Paragraph> <Paragraph position="4"> Structure flow in the opposite direction is selection_strength(X, b, A). The definition of selection_strength(Domain, case_slots, node_cluster) is the average of the selectional-preference strengths of the predicates in case_slots for the nodes in node_cluster in Domain. The polarity for a and b is the difference between the two quantities. 
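The definition can be sketched as follows. The nested dictionary layout (domain, then case slot, then node), the averaging over every slot and node pair, and the sign convention (flow from X to Y minus flow from Y to X) are assumptions of this example, since the text does not fix them; the numbers are invented.

```python
def selection_strength(prefs, domain, case_slots, node_cluster):
    """Average preference of the given case slots for the cluster's nodes."""
    values = [
        prefs[domain].get(slot, {}).get(node, 0.0)
        for slot in case_slots
        for node in node_cluster
    ]
    return sum(values) / len(values) if values else 0.0


def polarity(prefs, x, y, slots_a, cluster_a, slots_b, cluster_b):
    """Flow from A in X to B in Y, minus the flow from B in Y to A in X."""
    flow_x_to_y = selection_strength(prefs, y, slots_a, cluster_b)
    flow_y_to_x = selection_strength(prefs, x, slots_b, cluster_a)
    return flow_x_to_y - flow_y_to_x


# Invented preferences: pour selects for assets in FINANCE,
# but spend selects for nothing liquid in LAB.
prefs = {
    "LAB": {"pour.obj": {"liquid": 0.9}, "spend.obj": {}},
    "FINANCE": {"pour.obj": {"asset": 0.5}, "spend.obj": {"asset": 0.8}},
}
print(polarity(prefs, "LAB", "FINANCE",
               ["pour.obj"], ["liquid"], ["spend.obj"], ["asset"]))  # 0.5
```

Under this convention a positive value points from LAB to FINANCE, matching the liquid and asset example; swapping the two domains flips the sign.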
If the polarity is near zero, there is not much structure flow and no evidence for a metaphoric mapping.</Paragraph> <Paragraph position="5"> In some cases a difference in selectional preferences between domains does not indicate the presence of a metaphor.</Paragraph> </Section> <Section position="7" start_page="29" end_page="31" type="sub_section"> <SectionTitle> </SectionTitle> <Paragraph position="0"> Figure 1: Asymmetric structure transfer between LAB and FINANCE. Predicates from LAB that select for liquids are transferred to FINANCE and select for money. On the other hand, predicates from FINANCE that select for money are transferred to LAB and do not select for liquids.
To take a fictitious but illustrative example, say that in the LAB domain the subject of sit has a preference for chemists whereas in the FINANCE domain it has a preference for investment bankers. The difference in selectional preferences is caused by the fact that chemists are the kind of person more likely to appear in LAB documents and investment bankers in FINANCE ones. Instances like this are easy to filter out because their polarity is zero.</Paragraph> <Paragraph position="1"> A verb is treated as characteristic of a domain X if it is at least twice as frequent in the domain corpus as it is in general English and it is at least one and a half times as frequent in domain X as in the contrasting domain Y (these ratios were chosen empirically). Pour, for instance, occurs three times as often in FINANCE and twenty-three times as often in LAB as it does in general English. 
Since it is nearly eight times as frequent in LAB as in FINANCE, it is considered characteristic of the former.</Paragraph> <Paragraph position="2"> This heuristic resolves the confusion that can be caused by the ubiquity of certain conventional metaphors: the high density of metaphorical uses of pour in FINANCE could otherwise make it seem as though pour were characteristic of that domain.</Paragraph> <Paragraph position="3"> A verb with weak selectional preferences (e.g., exist) is a bad choice for a characteristic predicate even if it occurs disproportionately often in a domain. Highly selective verbs are more useful because violations of their selectional preferences are more informative. For this reason a predicate's salience to a domain is defined as its selectional-preference strength times the ratio of its frequency in the domain to its frequency in English.</Paragraph> <Paragraph position="4"> Literal and metaphorical selectional preferences may coexist in the same domain.</Paragraph> <Paragraph position="5"> Consider the selectional preferences of pour in the chemical and financial domains. In the LAB domain, pour is mostly used literally: People pour liquids. There are occasional metaphorical uses (e.g., Funding is pouring into basic proteomics research), but the literal sense is more common. [Computational Linguistics, Volume 30, Number 1] In FINANCE, pour is mostly used metaphorically, although there are occasional literal uses (e.g., Today oil poured into the new Turkmenistan pipeline). Algorithms 1-3 show pseudocode for finding metaphoric mappings between concepts. 
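The thresholding steps that appear in the listings for Algorithms 1 and 2 can be sketched in Python as follows. The function names and the threshold values passed in are hypothetical, since the article does not report values for C1, C2, or the noise threshold, and the comments give one reading of what each test is for.

```python
def detect_mapping(polarity_1_to_2, polarity_2_to_1, c1, c2):
    """Filter from Algorithm 2: a signed polarity score, or 0 for no mapping."""
    difference = polarity_1_to_2 - polarity_2_to_1
    if c1 > abs(difference):
        return 0.0  # the two flows are too close to indicate a direction
    if polarity_1_to_2 > c2 and polarity_2_to_1 > c2:
        return 0.0  # strong flow in both directions
    return difference


def mapping_direction(score, noise_threshold):
    """Output rule from Algorithm 1 applied to a polarity score."""
    if score > noise_threshold:
        return "concept1 -> concept2"
    if -noise_threshold > score:
        return "concept2 -> concept1"
    return None


score = detect_mapping(0.9, 0.1, c1=0.2, c2=0.5)
print(mapping_direction(score, noise_threshold=0.1))  # concept1 -> concept2
```

Returning a signed score keeps both pieces of information that the output rule needs: the magnitude decides whether a mapping is reported at all, and the sign decides its direction.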
Algorithm 1: Find_Inter_Concept_Mappings(domain1, domain2)
comment: Find mappings from concepts in domain1 to concepts in domain2 or vice versa
Domain_1_Clusters ← Get_Best_Clusters(domain1)
Domain_2_Clusters ← Get_Best_Clusters(domain2)</Paragraph> <Paragraph position="7">
if Polarity_score > NOISE_THRESHOLD then output mapping(Concept_1 → Concept_2)
if Polarity_score &lt; −NOISE_THRESHOLD then output mapping(Concept_2 → Concept_1)
Algorithm 2: Detect_Inter_Concept_Mapping(Concept_1, Concept_2, domain1, domain2)
polarity_from_1_to_2 ← Inter_Concept_Polarity(Concept_1, Concept_2, domain1, domain2)
polarity_from_2_to_1 ← Inter_Concept_Polarity(Concept_2, Concept_1, domain2,</Paragraph> <Paragraph position="9">
if absolute_value(polarity_from_1_to_2 − polarity_from_2_to_1) &lt; C1 then return (0);
if polarity_from_1_to_2 > C2 and polarity_from_2_to_1 > C2 then return (0);
return (polarity_from_1_to_2 − polarity_from_2_to_1)</Paragraph> </Section> <Section position="8" start_page="31" end_page="31" type="sub_section"> <SectionTitle> 2.5 Systematicity </SectionTitle> <Paragraph position="0"> According to the thematic-relations hypothesis (Gruber 1976), many domains are conceived of in terms of physical objects moving along paths between locations in space.</Paragraph> <Paragraph position="1"> In the money domain, assets are mapped to objects and asset holders are mapped to locations. In the idea domain, ideas are mapped to objects, minds are mapped to locations, and communications are mapped to paths. Axioms of inference from the source domain usually become available for reasoning about the target domain, unless there is an aspect of the target domain that specifically contradicts them. 
For instance, in the domain of material objects, a thing moved from point X to point Y is no longer at X, but in the idea domain, it exists at both locations.</Paragraph> <Paragraph position="2"> Thematically related metaphors may consistently co-occur in the same sentences.</Paragraph> <Paragraph position="3"> For example, the metaphors LIQUID → MONEY and CONTAINERS → INSTITUTIONS often co-occur, as in the sentence Capital flowed into the new company. Conversely, co-occurring metaphors are often components of a single metaphorical conceptualization.</Paragraph> <Paragraph position="4"> A metaphorical mapping is therefore more credible when it is a component of a system of mappings.</Paragraph> <Paragraph position="5"> In CorMet, systematicity measures a metaphorical mapping's tendency to co-occur with other mappings. The systematicity score for a mapping X is defined as the number of strong, distinct mappings co-occurring with X. This measure goes only a little way toward capturing the extent to which a metaphor exhibits the structure described in the thematic-relations hypothesis, but extending CorMet to find the entities that correspond to objects, locations, and paths is beyond the scope of this article.</Paragraph> </Section> <Section position="9" start_page="31" end_page="31" type="sub_section"> <SectionTitle> 2.6 Confidence Rating </SectionTitle> <Paragraph position="0"> CorMet computes a confidence measure for each metaphor it discovers. Confidence is a function of three things. The more verbs mediating a metaphor (as attack and assault mediate ENEMY → DISEASE in The antigen attacked the virus and Chemotherapy assaults the tumor), the more credible it is. Strongly unidirectional structure flow from source domain to target makes a mapping more credible. Finally, a mapping is more likely to be correct if it systematically co-occurs with other mappings. 
The confidence measure should not be interpreted as a probability of correctness: The data available for calibrating such a distribution are inadequate. The weight of each factor, assigned a plausible value empirically, is given in Table 3.</Paragraph> <Paragraph position="1"> The confidence measure is intended to wrap all the available evidence about a metaphor's credibility into one number. A principled way of doing this is desirable, but unfortunately there are not enough data to make meaningful use of machine-learning techniques to find the best set of components and weights. There is substantial arbitrariness in the confidence rating: The components used and the weights they are assigned could easily be different and are best considered guesses that give reasonable results.</Paragraph> </Section> </Section> <Section position="4" start_page="31" end_page="37" type="metho"> <SectionTitle> 3. Two Examples </SectionTitle> <Paragraph position="0"> This section provides a walk-through of the derivation and analysis of the concept mapping LIQUID → MONEY and components of the interconcept mapping WAR → MEDICINE. In the interests of brevity only representative samples of CorMet's data are shown. See Mason (2002) for a more detailed account.</Paragraph> <Section position="1" start_page="31" end_page="35" type="sub_section"> <SectionTitle> 3.1 LIQUID → MONEY </SectionTitle> <Paragraph position="0"> CorMet's inputs are two sets of characteristic keywords, one for each domain (Table 4). The keywords must characterize a cluster in the space of Internet documents, but CorMet is relatively insensitive to the particular keywords.</Paragraph> <Paragraph position="1"> It is difficult to find keywords characterizing a cluster centering on money alone, so keywords for a more general domain, FINANCE, are provided. It is also difficult to characterize a cluster of documents mostly about liquids. 
Chemical-engineering articles and hydrographic encyclopedias tend to pertain to the highly technical aspects of liquids instead of their everyday behavior. Documents related to laboratory work are targeted on the theory that most references to liquids in a corpus dedicated to the manipulation and transformation of different states of matter are likely to be literal and will not necessarily be highly technical. Tables 5 and 6 show the top 20 characteristic verbs for LAB and FINANCE, respectively.</Paragraph> <Paragraph position="2"> CorMet finds the selectional preferences of all of the characteristic predicates' case slots. A sample of the selectional preferences of the top 20 verbs in LAB and FINANCE is shown in Tables 7 and 8, respectively. The leftmost columns of these two tables have the (stemmed form of the) characteristic verb and the thematic role characterized. The right-hand sides have clusters of characteristic nodes. The numbers associated with the nodes are the bits of uncertainty about the identity of a word x resolved by the fact that x fills the given case slot, or P(x - N) − P(x - N | case_slot(x)) (where x - N is read as x is N or a hyponym of N).</Paragraph> <Paragraph position="3"> All of the 400 possible mappings between the top 20 concepts (clusters) from the two domains are examined. Each possible mapping is evaluated in terms of polarity, the number of frames instantiating the mapping, and the systematic co-occurrence of that mapping with different, highly salient mappings. The best mappings for LAB × FINANCE are shown in Table 9.</Paragraph> <Paragraph position="4"> Mappings are expressed in abbreviated form for clarity, with only the most recognizable (if not necessarily the most salient) node of each concept displayed. The foremost mapping characterizes money in terms of liquid, the mapping for which the two domains were selected. The second represents a somewhat less intuitive mapping from liquids to institutions.
</Paragraph> </Section> <Section position="2" start_page="35" end_page="36" type="sub_section"> <SectionTitle> </SectionTitle> <Paragraph position="0"> This metaphor is driven primarily by institutions' capacity to dissolve. Of course, this mapping is incorrect insofar as solids undergo dissolution, not liquids. CorMet made this mistake because of faulty thematic-role identification; it frequently failed to distinguish between the different thematic roles played by the subjects in sentences like The company dissolved and The acid dissolved the compound. The third mapping characterizes communication as a liquid. This was not the mapping the author had in mind when he chose the domains, but it is intuitively plausible: One speaks of information flowing as readily as of money flowing. That this mapping appears in a search not targeted to it reflects this metaphor's strength. It also illustrates a source of error in inferring the existence of conventional metaphors between domains from the existence of interconcept mappings. The fourth mapping is from containers to organizations. This mapping complements the first one: As liquids flow into containers, so money flows into organizations. Another good mapping, not present here, is money flows into equities and investments. CorMet misses this mapping because, at the level of concepts, money and equities are conflated. This happens because they are near relatives in the WordNet ontology and because there is very high overlap between the predicates selecting for them.</Paragraph> <Paragraph position="1"> Compare the mappings CorMet derived with the Master Metaphor List's (Lakoff, Espenson, and Schwartz 1991) characterization of the MONEY IS A LIQUID metaphor:
1. Cash is a Liquid.</Paragraph> <Paragraph position="2">
(a) liquid assets
(b) currency
(c) liquidating assets
(d) My money is all dried up
(e) He's just sponging off you (absorbing cash)
(f) He's solvent/insolvent
2. 
Gain/Loss is Movement of a Liquid.</Paragraph> <Paragraph position="3">
(a) cash flow
(b) influx and outflux of money
(c) Don't pour your money down the drain
3. Money Which Cannot be Accessed is Frozen
(a) frozen assets
(b) price freeze
4. Control in Financial Situation is Control in Liquid
(a) keep your head above water, financially
(b) get in over your head
(c) stay afloat
(d) the business went under/sunk
(e) drowning in debts
The Master Metaphor List also describes INVESTMENTS ARE CONTAINERS FOR MONEY, as exemplified in the following:
1. Put your money in bonds.</Paragraph> <Paragraph position="4">
2. The bottom of the economy dropped out.</Paragraph> <Paragraph position="5">
3. I'm down to my bottom dollar.</Paragraph> <Paragraph position="6">
4. This is an airtight investment.</Paragraph> <Paragraph position="7"> CorMet has found mappings that can reasonably be construed as corresponding to these metaphors. Compare the mappings from the Master Metaphor List with frames mined by this system and identified as instantiating liquid-income, shown in Table 10. It is important to note that although CorMet can list the case frames that have driven the derivation of a particular high-level mapping, it is designed to discover high-level mappings, not interpret or even recognize particular instances of metaphorical language. Just as in the Master Metaphor List, there are frames in the CorMet listing in which money and equities are characterized as liquids, are moved as liquids (e.g., pouring earnings and pumping reserves), and change state as liquids (e.g., melting stocks, dissolving stakes, evaporating profits, frozen money).</Paragraph> </Section> <Section position="3" start_page="36" end_page="37" type="sub_section"> <SectionTitle> 3.2 MILITARY → MEDICINE </SectionTitle> <Paragraph position="0"> This subsection describes the search for mappings between the MEDICINE and MILITARY domains. The top characteristic verbs of the two domains are shown in Tables 12 and 13. 
Their selectional preferences are given in Tables 14 and 15, respectively.</Paragraph> <Paragraph position="1"> The highest-quality mappings between the MILITARY and MEDICINE domains are shown in Table 16. This pair of domains produces more mappings than the LAB and FINANCE pair. Many source concepts from the MILITARY domain are mapped to body parts. The heterogeneity of the source concepts seems to be driven by the heterogeneity of possible military targets. Similarly, many source concepts are mapped to drugs.</Paragraph> </Section> <Section position="4" start_page="37" end_page="37" type="sub_section"> <SectionTitle> </SectionTitle> <Paragraph position="0"> The case frames supporting this mapping suggest that this is because of the heterogeneity of military aggressors (fortifications do not generally fall into this category; this mapping is an error caused by the frame extractor's frequent confusion of subject and object). These mappings can be interpreted as indicating that things that are attacked map to body parts and things that attack map to drugs.</Paragraph> <Paragraph position="1"> The mapping fortification → illness represents the mapping of targetable strongholds to disease. Illnesses are conceived of as fortifications besieged by treatment.</Paragraph> <Paragraph position="2"> Compare this with the Master Metaphor List's characterization of TREATING ILLNESS IS FIGHTING A WAR:</Paragraph> </Section> </Section> <Section position="5" start_page="37" end_page="40" type="metho"> <SectionTitle> </SectionTitle> <Paragraph position="0">
1. The Disease is an Enemy.</Paragraph> <Paragraph position="1">
2. The Body is a Battleground.</Paragraph> <Paragraph position="2">
(a) The body is not immune to invasion.
(b) The disease infiltrates your body and takes over.
3. 
Infection is an Attack by the Disease.</Paragraph> <Paragraph position="3">
(a) His body was under siege by AIDS.</Paragraph> <Paragraph position="4">
(b) He was attacked by an unknown virus.</Paragraph> <Paragraph position="5">
(c) The virus began an attack on the organ systems.
4. Medicine is a Weapon.</Paragraph> <Paragraph position="6">
(a) The so-called cure is no magic bullet.
5. Medical Procedures are Attacks by the Patient.
(a) The doctors tried to wipe out the infection.
6. The Immune System is a Defense.</Paragraph> <Paragraph position="7">
(a) The body normally has its own defenses.
7. Winning the War is being Cured of the Disease.
(a) Beating measles takes patience.</Paragraph> <Paragraph position="8">
8. Being Defeated is Dying.</Paragraph> <Paragraph position="9">
(a) The patient finally gave up the battle.
[Table fragment: destroy therapy tissue destroy cancer bone destroy virus liver destroy internist stomach target organ target vaccine intestines]
CorMet's results can reasonably be interpreted as matching all of the mappings from the Master Metaphor List except winning-is-a-cure and defeat-is-dying. CorMet's failure to find these mappings is caused by the fact that win, lose, and their synonyms do not have high salience in the MILITARY domain, which may be a reflection of the ubiquity of win and lose outside of that domain. Table 17 shows sample frames from which the body part → {fortification, vehicle, military action, region, skilled worker} mapping was derived.

4. Testing against the Master Metaphor List

This section describes the evaluation of CorMet against a gold standard, specifically, by determining how many of the metaphors in a subset of the Master Metaphor List (Lakoff, Espenson, and Schwartz 1991) can be discovered by CorMet given a characterization of the relevant source and target domains. The final evaluation of the correspondence between the mappings CorMet discovers and the Master Metaphor List entries is necessarily done by hand. 
This is a highly subjective method of evaluation; a formal, objective evaluation of correctness would be preferable, but at present no such metric is available.</Paragraph> <Paragraph position="10"> The Master Metaphor List is the basis for evaluation because it is composed of manually verified metaphors common in English. The test set is restricted to those elements of the Master Metaphor List with concrete source and target domains. This requirement excludes many important conventional metaphors, such as EVENTS ARE ACTIONS. About a fifth of the Master Metaphor List meets this constraint. This fraction is surprisingly small: the bulk of the Master Metaphor List consists of subtle refinements of a few highly abstract metaphors. The concept pairs and corresponding domain pairs for the target metaphors in the Master Metaphor List are given in Table 18.</Paragraph> <Paragraph position="11"> A mapping discovered by CorMet is considered correct if submappings specified in the Master Metaphor List are nearly all present with high salience and incorrect submappings are present with comparatively low salience. The mappings discovered that best represent the targeted metaphors are shown in Table 19. Some of these test cases are marked successes. For instance, ECONOMIC HARM</Paragraph> </Section> <Section position="6" start_page="40" end_page="42" type="metho"> <SectionTitle> IS PHYSICAL INJURY seems to be captured by the mapping from the loss-3 cluster to </SectionTitle> <Paragraph position="0"> the harm-1 cluster. CorMet found reasonable mappings in 10 of 13 cases attempted.</Paragraph> <Paragraph position="1"> This implies 77% accuracy, although in light of the small test set and the subjectivity of the judgments, this number must not be taken too seriously.</Paragraph> <Paragraph position="2"> Some test cases were disappointing. CorMet found no mapping between THEORY and ARCHITECTURE. 
This seems to be an artifact of the low-quality corpora obtained for these domains. The documents intended to be relevant to architecture were often about zoning or building policy, not the structure of buildings. For theory, many documents were calls for papers or about university department policy. It is unsurprising that there are no particular mappings between two sets of miscellaneous administrative and policy documents. The weakness of the ARCHITECTURE corpus also prevented CorMet from discovering any BODY - ARCHITECTURE mappings.</Paragraph> <Paragraph position="3"> Accuracy could be improved by refining the process by which domain-specific corpora are obtained to eliminate administrative documents or by requiring documents to have a higher density of domain-relevant terms.</Paragraph> <Paragraph position="4"> Is it meaningful when CorMet finds a mapping, or will it find a mapping between any pair of domains? To answer this question, CorMet was made to search for mappings between randomly selected pairs of domains. Table 20 lists a set of arbitrarily selected domain pairs and the strength of the polarization between them. In all cases, the polarization is zero. This can be interpreted as an encouraging lack of false positives. Another perspective is that CorMet should have found mappings between some of these pairs, such as MEDICINE and SOCIETY, on the theory that societies can be said to sicken, die, or heal. Although this is certainly a valid conventional metaphor, it seems to be less prominent than those metaphors that CorMet did discover.</Paragraph> </Section> </Paper>