<?xml version="1.0" standalone="yes"?> <Paper uid="W06-3506"> <Title>Catching Metaphors</Title> <Section position="4" start_page="41" end_page="42" type="metho"> <SectionTitle> 3 Objective </SectionTitle> <Paragraph position="0"> While this work in Cognitive Semantics is suggestive, without a corpus-based analysis, it is hard to accuratelyestimatetheimportanceofmetaphoricinformation for Natural Language Processing (NLP) tasks such as Question Answering or Information Distillation. Our work is a first step to remedy this situation. We start with our computational definition of metaphor as a mapping from concrete to abstract domains. We then investigate the Wall Street Journal (WSJ) corpus, selecting a subset of its verbal targets and labeling them as either metaphoric or literal. While we had anticipated the pervasiveness of metaphor, we could not anticipate just how pervasive with over 90% of the labeled data being metaphoric.</Paragraph> <Paragraph position="1"> Provided with labeled training data, our task is to automatically classify the verbal targets of unseen utterances as either metaphoric or literal. Motivated by the intuition that the types of a target's arguments are important for making this determination, we extracted information about the arguments from the PropBank (Kingsbury et al., 2002) annotation for each sentence, using WordNet (Fellbaum, 1998) as the type hierarchy.</Paragraph> <Section position="1" start_page="41" end_page="42" type="sub_section"> <SectionTitle> 3.1 Using Verbal Arguments </SectionTitle> <Paragraph position="0"> A metaphor is a structured mapping between the rolesoftwoframesthatmakesitpossibletodescribe a (usually) more abstract concept in terms of a more concrete one (Lakoff and Johnson, 1980). The more abstract concept is referred to as the target domain while the more concrete concept is referred to as the 1. MET : Texas Air has {run} into difficulty...</Paragraph> <Paragraph position="1"> 2. LIT : &quot;I was doing the laundry and nearly broke my neck {running} upstairs to see ...</Paragraph> <Paragraph position="2"> MET indicates a metaphoric use of the target verb and LIT indicates a literal use.</Paragraph> <Paragraph position="3"> source domain. More precisely, the metaphor maps roles of the target frame onto the source frame. Figure 1 shows some example sentences with a particular verbal target run in curly braces. Example 1 is a metaphoric usage (marked by MET) of run where the destination role is filled by the state of difficulty. Example 2 is a literal usage (marked by LIT) of run.</Paragraph> <Paragraph position="4"> The arguments of a verb are an important factor for determining whether that verb is being used metaphorically. If they come from the source domain frame, then the likelihood is high that the verb is being used literally. In the example literal sentence from Figure 1, the theme is a person, which is a physical object and thus part of the source domain. If, on the other hand, the arguments come from the target domain, then it is likely that the verb is being used metaphorically. Consider the metaphorical run from Figure 1. 
<Section position="5" start_page="42" end_page="43" type="metho"> <SectionTitle> 4 Data </SectionTitle>
<Paragraph position="0"> Because no available corpus is labeled for the metaphoric/literal distinction, we labeled a subset of the WSJ corpus for our experiments. To focus the task, we concentrated on motion-related frames that act as the source domain for the Event Structure Metaphor, plus some additional non-motion-based frames, including Cure and Placing. Figure 2 shows the selected frames along with example lexical units from each frame.</Paragraph>
<Paragraph position="1"> To identify relevant sentences, we first obtained from FrameNet a list of lexical units that evoke the selected source frames. Since the WSJ is labeled with PropBank word senses, we then had to determine which PropBank senses correspond to these lexical units; the selected senses are shown in Figure 3. (Figure 2, recoverable fragment: Frame / Example LUs; Motion / float, glide, go, soar.)</Paragraph>
<Paragraph position="2"> As anyone who has inspected both PropBank and FrameNet can attest, these two important lexical resources have chosen different ways to describe verbal senses, and thus in many cases determining which PropBank sense corresponds to a particular FrameNet sense is not a straightforward process. Verbs like slide have a single PropBank sense used to describe both the slid in The book slid off the table and the slid in I slid the book off the table. While FrameNet puts slide both in the Motion frame and in the Cause-motion frame, PropBank uses the argument labeling to distinguish these two senses. Occasionally, PropBank has two senses, one for the literal interpretation and one for the metaphoric interpretation, where FrameNet uses a single sense.</Paragraph>
<Paragraph position="3"> Because we intended to classify both literal and metaphoric language, both PropBank senses of hobble were included. However, most verbs do not have distinct literal and metaphoric senses in PropBank. The final step in obtaining the relevant portion of the WSJ corpus is to use the lists of PropBank senses that correspond to the FrameNet frames and extract sentences with these targets. Because the PropBank annotations label which PropBank sense is being annotated, this process is straightforward. Having obtained the WSJ sentences with items that evoke the selected source frames, we labeled the data using a three-way split: metaphoric (MET), literal (LIT), or unclear. For our experiments, we concentrated only on those cases where the label was MET or LIT and ignored the unclear cases.</Paragraph>
<Paragraph position="4"> As is shown in Figure 4, the WSJ data is heavily weighted towards metaphor over all the frames that we annotated. This tremendous bias towards metaphoric usage of motion/cause-motion lexical items shows just how prevalent the Event Structure Metaphor is, especially in the domain of economics, where it is used to describe market fluctuations and policy decisions.</Paragraph>
<Paragraph position="5"> Figure 5 shows the breakdown for each lexical item in the Cure frame. Note that most of the frequently occurring verbs are strongly biased towards either a literal or a metaphoric usage. Ease, for example, in all 81 of its uses describes the easing of an abstract entity, and is thus always metaphoric in this data.</Paragraph> </Section>
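Operationally, the selection step described in Section 4 is a filter over PropBank-annotated instances. The sketch below illustrates the idea under assumed data structures; the frame-to-roleset mapping and the record layout are made-up fragments for illustration, not the mapping actually used in this work.

```python
# Illustrative sketch of the data-selection step: keep only WSJ instances whose
# PropBank roleset was mapped (by hand, per Section 4) to a selected FrameNet
# source frame. Roleset ids and the record layout are assumptions.
FRAME_TO_ROLESETS = {
    "Motion":  {"go.01", "float.01", "soar.01"},
    "Cure":    {"treat.01", "ease.01"},
    "Placing": {"place.01"},
}
SELECTED_ROLESETS = set().union(*FRAME_TO_ROLESETS.values())

def select_instances(propbank_instances):
    """propbank_instances: iterable of dicts with 'roleset', 'words', and 'args'
    fields (a hypothetical layout for gold PropBank annotations)."""
    for inst in propbank_instances:
        if inst["roleset"] in SELECTED_ROLESETS:
            yield inst  # candidate sentence for MET / LIT / unclear labeling
```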
<Section position="6" start_page="43" end_page="44" type="metho"> <SectionTitle> 5 The Approach </SectionTitle>
<Paragraph position="0"> As has been discussed in this paper, there are at least two factors that are useful in determining whether the verbal target of an utterance is being used metaphorically: (1) the bias of the verb, and (2) the arguments of the verbal target in that utterance. To determine whether the arguments suggest a metaphoric or a literal interpretation, the system needs access to information about which constituents of the utterance correspond to the arguments of the verbal target. The PropBank annotations fill this role in our system. For each utterance that is used for training or needs to be classified, the gold-standard PropBank annotation is used to determine the verbal target's arguments.</Paragraph>
<Paragraph position="1"> For every verbal target in question, we used the following method to extract the types of its arguments:
1. Use PropBank to extract the target's arguments.
2. For each argument, extract its head using rules closely based on those of Collins (1999).
3. If the head is a pronoun, use the pronoun type (without coreference resolution) as the type of the argument.
4. If the head is a named entity, use the IdentiFinder tag as the type of the argument (BBN IdentiFinder, 2004).
5. Otherwise, use the name of the head's WordNet synset as the type of the argument.</Paragraph>
<Paragraph position="2"> Consider the sentence The drug is being used primarily to {treat} anemias. The PropBank annotation of this sentence marks the drug as ARG3 and anemias as ARG2. We turned this information into features for the classifier as shown in Figure 6. The verb feature is intended to capture the bias of the verb. The ARGX TYPE features capture the types of the arguments directly. To measure the trade-offs between various combinations of features, we randomly partitioned the data set into a training set (65% of the data), a validation set (15% of the data), and a test set (20% of the data).</Paragraph> </Section> </Paper>
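Read as a whole, Section 5 describes a feature-extraction procedure that can be summarized in a few lines of code. The sketch below is a loose reconstruction, not the authors' implementation: the input layout (pre-extracted argument heads and optional named-entity tags), the pronoun encoding, and the use of the NLTK WordNet interface are assumptions made for illustration.

```python
# Loose reconstruction of the Section 5 feature extraction (illustrative only):
# each PropBank argument of a verbal target is typed by its pronoun, its
# named-entity tag, or the name of its head's WordNet synset; the verb itself
# is kept as a feature to capture its bias.
from nltk.corpus import wordnet as wn

PRONOUNS = {"i", "you", "he", "she", "it", "we", "they"}  # simplified list

def argument_type(head, ne_tag=None):
    """head: argument head word (assumed already found with Collins-style head
    rules); ne_tag: IdentiFinder-style tag, if the head is a named entity."""
    if head.lower() in PRONOUNS:
        return "PRONOUN_" + head.upper()                   # step 3 (hypothetical encoding)
    if ne_tag is not None:
        return ne_tag                                       # step 4
    synsets = wn.synsets(head, pos=wn.NOUN)
    return synsets[0].name() if synsets else "UNKNOWN"     # step 5

def features(verb, args):
    """args: dict such as {'ARG2': ('anemias', None), 'ARG3': ('drug', None)}."""
    feats = {"verb": verb}                                  # verb-bias feature
    for label, (head, ne_tag) in args.items():
        feats[label + "_TYPE"] = argument_type(head, ne_tag)
    return feats

# Figure 6 style example: "The drug is being used primarily to {treat} anemias."
print(features("treat", {"ARG3": ("drug", None), "ARG2": ("anemias", None)}))
```

Under this encoding, the example yields a verb feature plus a WordNet-synset type for each argument; a random 65/15/20 split of such feature vectors would then provide the training, validation, and test sets described above.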