<?xml version="1.0" standalone="yes"?>
<Paper uid="W05-1007">
  <Title>Frame Semantic Enhancement of Lexical-Semantic Resources</Title>
  <Section position="3" start_page="57" end_page="58" type="metho">
    <SectionTitle>
2 Lexical-Semantic Resources
</SectionTitle>
    <Paragraph position="0"> Lexical-semantic resources, such as FrameNet and PropBank, which involve semantic frames and/or semantic roles, are one kind of resource that SemFrame's output can enhance. SemFrame could also benefit a resource like WordNet that captures different kinds of semantic relationships. Here we discuss characteristics of these resources that make them amenable to enhancement through SemFrame.</Paragraph>
    <Section position="1" start_page="57" end_page="57" type="sub_section">
      <SectionTitle>
2.1 FrameNet
</SectionTitle>
      <Paragraph position="0"> FrameNet documents the semantic and syntactic behavior of words with respect to frames. A frame characterizes a conventional conceptual structure, for instance, a situation involving risk, a hitting event, or a commercial transaction. Lexical units are said to evoke a frame. For example, use of the literal sense of buy introduces into a discourse an expectation that some object or service (the Goods) passes from one person (the Seller) to another (the Buyer) in exchange for something of (presumably equivalent) value (typically Money).</Paragraph>
      <Paragraph position="1"> A significant contribution of the FrameNet project is the creation of frames, which involves the enumeration both of participant roles in the frame (a.k.a. frame elements or frame slots) and of lexical units that evoke the frame. As of May 2005, 657 frames have been defined in FrameNet; approximately 8600 lexical unit/frame associations have been made.</Paragraph>
      <Paragraph position="2"> FrameNet's approach to identifying frames is &quot;opportunistic&quot; and driven by the corpus data being annotated. Thus the FrameNet team does not expect to have a full inventory of frames until a substantial proportion of the general-purpose vocabulary of English has been analyzed. As the development of FrameNet is labor-intensive, supplementing FrameNet's frames and evoking lexical units using data from SemFrame would be beneficial.</Paragraph>
    </Section>
    <Section position="2" start_page="57" end_page="58" type="sub_section">
      <SectionTitle>
2.2 PropBank
</SectionTitle>
      <Paragraph position="0"> Like FrameNet, PropBank (Kingsbury et al., 2002) is a project aimed at semantic annotation, in this case of the Penn English Treebank.[4] The intent of PropBank is to provide for &quot;automatic extraction of relational data&quot; on the basis of consistent labeling of predicate argument relationships. Typically the labels/semantic roles are verb-specific (but are often standardized across synonyms). For example, the set of semantic arguments for promise, pledge, etc. (its 'roleset') includes the promiser, the person promised to, and the promised thing or action. These correspond respectively to FrameNet's Speaker, Addressee, and Message elements within the Commitment frame.</Paragraph>
      <Paragraph position="1"> The more general labels used in FrameNet and SemFrame give evidence of a more systematic approach to semantic argument structure, more easily promoting the discovery of relationships among frames. It can be seen from the terminology used that PropBank is more focused on the individual arguments of the semantic argument structure, while FrameNet and SemFrame are more focused on the overall gestalt of the argument structure, that is, the frame. The use of FrameNet and SemFrame to suggest more generic (that is, frame-relevant) roleset labels would help move PropBank toward greater systematicity.

        [Footnote 4] The semantic annotation tasks in the FrameNet and PropBank projects enable them to link semantic roles and syntactic behavior. Enhancing and stabilizing SemFrame's semantic frame inventory must precede the inclusion of such linkage in SemFrame.</Paragraph>
    </Section>
    <Section position="3" start_page="58" end_page="58" type="sub_section">
      <SectionTitle>
2.3 WordNet
</SectionTitle>
      <Paragraph position="0"> WordNet is a lexical database for English nouns, verbs, adjectives, and adverbs. Fine-grained sense distinctions are recognized and organized into synonym sets ('synsets'), WordNet's basic unit of analysis; each synset has a characterizing gloss, and most are exemplified through one or more phrases or sentences.</Paragraph>
      <Paragraph position="1"> In addition to the synonymy relationship at the heart of WordNet, other semantic relationships are referenced, including, among others, antonymy, hyponymy, troponymy, partonomy, entailment, and cause-to. On the basis of these relationships, Fellbaum (1998) noted that WordNet reflected the structure of frame semantics to a degree, but suggested that its organization by part of speech would preclude a full frame-semantic approach.</Paragraph>
      <Paragraph position="2"> With release 2.0, WordNet added morphological and topical category relationships that cross over part-of-speech boundaries. This development relates to incorporating a full frame-semantic approach in WordNet in two ways.</Paragraph>
      <Paragraph position="3"> First, since the lexical units that evoke a frame are not restricted to a single part of speech, the ability to create links between parts of speech is required in order to encode frame semantic relationships.</Paragraph>
      <Paragraph position="4"> Second, topical categories (e.g., slang, meat, navy, Arthurian legend, celestial body, historical linguistics, Mafia) have a kinship with semantic frames, but are not the same. While topical category domains map between categories and lexical items--as do semantic frames--it is often not clear what internal structure might be posited for a category domain. What, for example, would the participant structure of 'meat' look like? Should WordNet choose to adopt a full frame-semantic approach, FrameNet and SemFrame are natural starting points for identifying frame-semantic relationships between synsets. The most beneficial enhancement would involve WordNet's incorporating FrameNet and/or SemFrame frames as a separate resource, with a mapping between WordNet's synsets and the semantic frame inventory. SemFrame has the extra advantage that its lexical units are already identified as WordNet synsets.</Paragraph>
    </Section>
  </Section>
  <Section position="4" start_page="58" end_page="62" type="metho">
    <SectionTitle>
3 Development of SemFrame
</SectionTitle>
    <Paragraph position="0"> There are two main processing stages in producing SemFrame output: The first establishes verb classes, while the second generates semantic frames. The next two subsections describe these stages.</Paragraph>
    <Section position="1" start_page="58" end_page="60" type="sub_section">
      <SectionTitle>
3.1 Establishing Verb Classes
</SectionTitle>
      <Paragraph position="0"> SemFrame adopts a multistep approach to identifying sets of frame-semantically related verb senses.</Paragraph>
      <Paragraph position="1"> The basic steps involved in the current version[5] of SemFrame are:
        1. Building a graph with WordNet verb synsets as vertices and semantic relationships as edges
        2. Identifying for each vertex a maximal highly connected component (HCC) (i.e., a highly interconnected subgraph that the vertex is part of)
        3. Eliminating HCC's with undesirable qualities
        4. Forming preliminary verb semantic classes by supplementing HCC's with reliable semantic relationships
        5. Merging verb semantic classes with a high degree of overlap

        Building the Relationships Graph

        WordNet 2.0 includes a vast array of semantic relationships between synsets of the same part of speech and has now been enhanced with relationships linking synsets of different parts of speech. Some of these relationships are almost guaranteed to link synsets that evoke the same frame, while others operate within the bounds of a semantic frame on some occasions, but not others. Among the relationship types in WordNet most fruitful for identifying verb synsets within the same frame semantic verb class are: synonymy (e.g., buy, purchase, as collocated within synsets), antonymy (e.g., buy, sell), cause-to (e.g., transfer, change hands), entailment (e.g., buy, pay), verb group (e.g., different commercial senses of buy), morphological derivation (e.g., buy, buyer),[6] and &quot;see also&quot; (e.g., buy, buy out). Instances of these relationship types for all verb synsets in WordNet 2.0 are represented as edges within the graph.

        [Footnote 5] The process of establishing verb classes has been redesigned. All that has been carried over from the previous/initial version of SemFrame is the use of some of the same WordNet relationships. New in the current version are: the use of relationship types first implemented in WordNet 2.0, the predominant and exclusive use of WordNet as the source of data (the previous version used WordNet as a source secondary to the Longman Dictionary of Contemporary English), and modeling the identification of classes of related verbs as a graph, specifically through the use of highly connected components.</Paragraph>
      <Paragraph position="2"> Additional edges are inserted between any two synsets/vertices related by two or more of the following: clustering of synsets based on the occurrence of word stems in their glosses and example sentences;[7] hyperonymy/hyponymy relationships; and category domain relationships. These three relationship types are too noisy to be used on their own for identifying frame semantic relationships among synsets, but when a relationship is verified by two or more of these relationships, the likelihood that the related synsets evoke the same frame is considerably higher. Table 1 summarizes the number of edges in the graph supported by each relationship type.</Paragraph>
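The edge-building rules above can be sketched in Python. This is a minimal illustration, not SemFrame's implementation: it assumes the relationships have already been extracted as (synset, synset, type) triples, and the function name `build_graph`, the relation-type labels, and the toy synset names are all hypothetical.

```python
from collections import defaultdict

# Relation types treated as reliable enough to form an edge on their own
# (the label strings are assumptions made for this sketch).
RELIABLE = {"synonymy", "antonymy", "cause", "entailment",
            "verb_group", "derivation", "see_also"}
# Noisy relation types: an edge is added only when two or more of them agree.
NOISY = {"gloss_cluster", "hypernymy", "category_domain"}


def build_graph(relations):
    """Build an undirected graph from (synset_a, synset_b, relation_type) triples."""
    reliable_pairs = set()
    noisy_support = defaultdict(set)
    for a, b, rel in relations:
        pair = frozenset((a, b))
        if len(pair) != 2:
            continue  # ignore self-links
        if rel in RELIABLE:
            reliable_pairs.add(pair)
        elif rel in NOISY:
            noisy_support[pair].add(rel)

    edges = set(reliable_pairs)
    for pair, kinds in noisy_support.items():
        if len(kinds) >= 2:  # verified by two or more noisy relation types
            edges.add(pair)

    graph = defaultdict(set)
    for pair in edges:
        a, b = tuple(pair)
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)
```

A pair supported by only one noisy relation type contributes no edge, so a synset with no other relationships does not appear in the graph at all.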
      <Paragraph position="3"> Identifying Highly Connected Components (HCC's)

        Step 1 constructs a graph interconnecting thousands of WordNet verb synsets. Identifying sets of verb synsets likely to evoke the same semantic frame requires identifying subgraphs with a high degree of interconnectivity.

        [Footnote 6] SemFrame relates verb synsets with a morphological derivation relationship to a common noun synset. This includes verbs related to different members of the shared noun synset.
        [Footnote 7] Voorhees' (1986) hierarchical agglomerative clustering algorithm was implemented.</Paragraph>
      <Paragraph position="4"> Empirical investigation has shown that &quot;highly connected components&quot; (Hartuv and Shamir, 2000), that is, induced subgraphs of size k in which every vertex's connectivity exceeds k/2 vertices, identify such sets of verb synsets.[8] For example, in a 5-vertex highly connected component, each vertex is related to at least 3 other vertices. Figure 1 shows a portion of the original graph in which relationship arcs constituting an HCC are given as solid lines, while those that fail the interconnectivity threshold are given as dotted lines.</Paragraph>
      <Paragraph position="5"> Given an undirected graph, the Hartuv-Shamir algorithm for identifying HCC's returns zero or more non-overlapping subgraphs (including zero or more singleton vertices). But it is inaccurate to assume that verb synsets evoke only a single frame, as is suggested by non-overlapping subgraphs.[9] For this reason, we have modified the Hartuv-Shamir algorithm to identify a maximal HCC, if one exists, for (i.e., that includes) each vertex of the graph. This modification reduces the effort involved in identifying any single HCC: Since the diameter of an HCC is no greater than two, only those vertices that are neighbors of the source vertex, or neighbors of those neighbors, need to be examined.</Paragraph>
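The search-space restriction described above follows from the diameter bound: any HCC containing a given vertex lies within two steps of it. A minimal sketch of collecting that candidate set is shown here; the function name `candidates_within_two` is hypothetical, and how SemFrame then searches the candidates for a maximal HCC is not shown.

```python
def candidates_within_two(graph, source):
    """Return the source vertex, its neighbours, and its neighbours' neighbours.

    graph maps each vertex to the set of its neighbours.
    """
    near = set(graph.get(source, ()))
    two_hop = set()
    for neighbour in near:
        two_hop.update(graph.get(neighbour, ()))
    return near | two_hop | {source}
```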
      <Paragraph position="6"> [Footnote 8] The algorithm for computing HCC's first finds the minimum cut for a (sub)graph. If the graph meets the highly connected component criterion, the graph is returned, else the algorithm is called recursively on each of the subgraphs created by the cut. The Stoer-Wagner (1997) algorithm has been implemented for finding the minimum cut.</Paragraph>
      <Paragraph position="7"> [Footnote 9] Semantic frames can be defined at varying levels of generality; thus, a given synset may evoke a set of hierarchically related frames. Words/Synsets may also evoke multiple, unrelated frames simultaneously; criticize, for example, evokes both a Judging frame and a Communication frame.</Paragraph>
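The recursive procedure in the footnote on computing HCC's can be sketched as follows. This is an illustration of the unmodified, non-overlapping algorithm only, and it swaps in a brute-force minimum cut (enumerating bipartitions) as a stand-in for the Stoer-Wagner algorithm, so it is feasible only for very small graphs. The function names are hypothetical, and vertices are assumed to be mutually sortable (for example, all strings).

```python
from itertools import combinations


def cut_size(graph, side, members):
    """Count edges running between `side` and the rest of `members`."""
    return sum(
        1
        for u in side
        for v in graph.get(u, ())
        if v in members and v not in side
    )


def min_cut(graph, members):
    """Brute-force minimum edge cut (NOT Stoer-Wagner; tiny graphs only).

    Returns (cut size, one side of the cut). Requires two or more members.
    """
    nodes = sorted(members)
    anchor, rest = nodes[0], nodes[1:]
    best = None
    for r in range(len(rest)):  # the anchor's side never absorbs every vertex
        for extra in combinations(rest, r):
            side = {anchor, *extra}
            size = cut_size(graph, side, members)
            if best is None or best[0] > size:
                best = (size, side)
    return best


def hcs(graph, members):
    """Split `members` into non-overlapping highly connected subgraphs.

    A subgraph on n vertices is accepted when its minimum cut exceeds n/2;
    otherwise the two sides of the cut are processed recursively.
    Singleton vertices are returned as one-element sets.
    """
    members = set(members)
    if len(members) == 1:
        return [members]
    size, side = min_cut(graph, members)
    if 2 * size > len(members):
        return [members]
    return hcs(graph, side) + hcs(graph, members - side)
```

On two 4-cliques joined by a single edge, the first minimum cut removes the joining edge and each clique is then accepted, since its minimum cut of 3 exceeds 4/2.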
    </Section>
    <Section position="2" start_page="60" end_page="60" type="sub_section">
      <SectionTitle>
Eliminating Duplicates
</SectionTitle>
      <Paragraph position="0"> Because HCC's were generated for each vertex in the relationships graph, considerable duplication and overlap existed in the output. The output of step 2 was cleaned up using three filters. First, duplicate HCC's were eliminated. Second, any HCC wholly included within another HCC was deleted.[10] Third, any HCC based only on morphological derivation relationships was deleted. In SemFrame, all verb synsets morphologically derived from the same noun synset were related to each other. Thus all verb synsets derived from a common noun synset are guaranteed to generate an HCC. If only such relationships support an HCC, the likelihood that all of the interrelated verb synsets evoke the same semantic frame is much lower than if other types of relationships also provide evidence for their interrelationship.</Paragraph>
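The three filters can be sketched in Python. This is a minimal illustration under assumed data structures: HCC's are sets of synset identifiers, and `edge_types` (a hypothetical name) maps each unordered pair of synsets to the set of relation-type labels supporting that edge. The label "derivation" for morphological derivation is likewise an assumption.

```python
def filter_hccs(hccs, edge_types):
    """Apply the duplicate, inclusion, and derivation-only filters.

    hccs: iterable of vertex sets.
    edge_types: dict mapping frozenset({u, v}) to a set of relation names.
    """
    # Filter 1: drop exact duplicates.
    unique = {frozenset(h) for h in hccs}

    # Filter 2: drop any HCC wholly included within another HCC.
    maximal = [
        h for h in unique
        if not any(h != other and h.issubset(other) for other in unique)
    ]

    # Filter 3: drop HCC's supported only by morphological derivation edges.
    kept = []
    for h in maximal:
        kinds = set()
        for u in h:
            for v in h:
                kinds.update(edge_types.get(frozenset((u, v)), ()))
        if kinds != {"derivation"}:
            kept.append(set(h))
    return kept
```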
      <Paragraph position="1"> Supplementing HCC's

        The HCC's generated in step 2 that survived the filters implemented in step 3 form the basis of verb framesets, that is, sets of verb senses that evoke the same semantic frame. Specifically, all the synsets represented by vertices in a single HCC form a frameset.</Paragraph>
      <Paragraph position="2"> The connectivity threshold imposed by HCC's helps maintain reasonably high precision of the resulting framesets, but is too strict for high recall. Some types of relationships known to operate within frame-semantic boundaries generally do not survive the connectivity threshold cutoff. For example, for frames of a certain level of generality, if a specific verb evokes that frame, it is also the case that its antonym evokes the frame, as antonyms operate against the backdrop of the same situational context; that is, they share participant structure.[11] However, since antonymy is (only) a lexical relationship between two word senses, A and B, the tight coupling of A and B is unlikely to be reflected in A's being directly related to other synsets that are related to B and vice-versa. Thus, antonyms are unlikely to be highly connected through WordNet to other words/synsets that evoke the frame and thus fail the HCC connectivity threshold. The same argument can be made for causatively related verbs.</Paragraph>
      <Paragraph position="3"> [Footnote 10] Given the interest in generating semantic frames of varying levels of generality, this filter may itself be eliminated in the future.</Paragraph>
      <Paragraph position="4"> [Footnote 11] Identifying antonyms is especially helpful in the case of conversives, as with buy and sell; the inclusion of both in the frameset promotes discovery of all relevant frame participants, in this case, both buyer and seller.</Paragraph>
      <Paragraph position="5"> A post-processing step was required therefore to add to a frameset any verb synsets related through WordNet's antonymy or cause-to relationships to a member of the frameset. Similarly, any verb synset entailed by a member of a verb frameset was added to the frameset.</Paragraph>
      <Paragraph position="6"> Other verb synsets fail to survive the connectivity threshold cutoff because they enter into few relationships of any kind. If a verb synset is related to only one other verb synset, the assumption is made that it evokes the same frame as that one other synset; it is then added to the corresponding frameset.</Paragraph>
      <Paragraph position="7"> Lastly, if a synset is related to two or more members of a frameset, the likelihood that it evokes the same semantic frame is reasonably high. Such verb synsets were added to the frameset if not already present.</Paragraph>
      <Paragraph position="8"> At the end of this phase, any framesets wholly included within another frameset were again deleted.</Paragraph>
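The supplementing rules described above can be sketched as a single pass over the graph. This is an illustration under several assumptions: `typed` (a hypothetical name) maps a relation label to a dict from each synset to the synsets it is related to, with the entailment entry listing the synsets that a verb entails; additions are computed against the original frameset only, since the text does not say whether the rules are applied repeatedly.

```python
def supplement(frameset, graph, typed):
    """Return the frameset extended by the supplementing rules.

    frameset: set of synsets from a surviving HCC.
    graph: dict mapping each synset to the set of its neighbours.
    typed: dict mapping relation name to {synset: set of related synsets}.
    """
    base = set(frameset)
    added = set()

    # Antonyms, causatively related verbs, and verbs entailed by a member.
    for member in base:
        for rel in ("antonymy", "cause", "entailment"):
            added.update(typed.get(rel, {}).get(member, ()))

    for synset, neighbours in graph.items():
        inside = neighbours.intersection(base)
        if len(inside) >= 2:
            # Related to two or more members of the frameset.
            added.add(synset)
        elif len(neighbours) == 1 and inside:
            # Related to only one other synset, and that synset is a member.
            added.add(synset)

    return base | added
```

The final deletion of framesets wholly included within another frameset is a separate pass over all framesets and is not repeated here.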
    </Section>
    <Section position="3" start_page="60" end_page="60" type="sub_section">
      <SectionTitle>
Merging Overlapping Verb Classes
</SectionTitle>
      <Paragraph position="0"> The preceding processes produced many framesets with a significant degree of overlap. For any two framesets, if at least half of the verb synsets in both framesets were also members of the other, the two framesets were merged into a single frameset.</Paragraph>
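The merge criterion can be sketched as follows. This is a minimal illustration: the helper names are hypothetical, and repeating the pairwise merge until no further pair qualifies is an assumption, since the text states only the pairwise condition.

```python
def should_merge(a, b):
    """True if at least half of each frameset's members also occur in the other."""
    shared = len(a.intersection(b))
    return 2 * shared >= len(a) and 2 * shared >= len(b)


def merge_framesets(framesets):
    """Merge qualifying pairs repeatedly until no pair meets the criterion."""
    merged = [set(f) for f in framesets]
    changed = True
    while changed:
        changed = False
        for i in range(len(merged)):
            for j in range(i + 1, len(merged)):
                if should_merge(merged[i], merged[j]):
                    merged[i] = merged[i] | merged[j]
                    del merged[j]
                    changed = True
                    break
            if changed:
                break
    return merged
```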
      <Paragraph position="1"> Summary of Stage 1 Results

        The above steps generated 1434 framesets, varying in size from 2 to 25 synsets (see Table 2). Small framesets dominate the results, with over 60% of the framesets including only 2 or 3 synsets.</Paragraph>
      <Paragraph position="2"> Representative examples of these framesets are given in Appendix A, where members of each synset appear in parentheses, followed by the synset's gloss. (Examples are ordered by frameset size.) Smaller and medium-sized framesets generally enjoy high precision, but many of the largest framesets would be better split into two or more framesets.</Paragraph>
    </Section>
    <Section position="4" start_page="60" end_page="61" type="sub_section">
      <SectionTitle>
3.2 Generating Semantic Frames
</SectionTitle>
      <Paragraph position="0"> Generating frames from verb framesets relies on the insight that the semantic arguments of a frame are largely drawn from nouns associated with verb  synsets in the frameset. In SemFrame's processing, these include nouns in the gloss of a verb synset or in the gloss of its corresponding LDOCE verb sense(s), as well as nouns (that is, noun synsets) to which a verb synset is morphologically related and those naming the category domain to which a verb synset belongs. In the latter two cases, the nouns come disambiguated within WordNet, but nouns from glosses must undergo disambiguation. The set of noun senses associated with a verb frameset is then analyzed against the WordNet noun hierarchy, using an adaptation of Agirre and Rigau's (1995) conceptual density measure. This analysis identifies a frame name and a set of frame participants, all of which correspond to nodes in the WordNet noun hierarchy.</Paragraph>
      <Paragraph position="1"> Disambiguating Nouns from Glosses

        First we consider how nouns from WordNet and LDOCE verb glosses are disambiguated.[12] This step involves looking for matches between the stems of words in the glosses of WordNet noun synsets that include the noun needing to be disambiguated, on the one hand, and the stems of words in the glosses of all WordNet verb synsets (and corresponding LDOCE verb senses) in the frameset, on the other hand.</Paragraph>
      <Paragraph position="2"> A similarity score is computed by dividing the match count by the number of non-stop-word stems in the senses under consideration. SemFrame favors predominant senses by examining word senses in frequency order. Any sense with a non-zero similarity score that is the highest score yet seen is chosen as an appropriate word sense.</Paragraph>
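The scoring and selection rule can be sketched in Python. This is an illustration under stated assumptions: glosses are supplied as sets of already-stemmed, non-stop-word tokens; the denominator is read as the combined number of stems in the noun gloss and the frameset glosses; and every sense that sets a new highest non-zero score, in frequency order, is retained. The function names are hypothetical.

```python
def similarity(noun_stems, frameset_stems):
    """Match count divided by the number of stems under consideration."""
    total = len(noun_stems) + len(frameset_stems)
    if total == 0:
        return 0.0
    return len(noun_stems.intersection(frameset_stems)) / total


def choose_senses(senses, frameset_stems):
    """Select senses whose score is non-zero and the highest yet seen.

    senses: list of (sense_id, gloss_stems) pairs, most frequent sense first.
    """
    chosen = []
    best = 0.0
    for sense_id, stems in senses:
        score = similarity(stems, frameset_stems)
        if score > best:  # strictly higher, so ties favour the more frequent sense
            chosen.append(sense_id)
            best = score
    return chosen
```

Because senses are visited in frequency order and a later sense must strictly exceed the best score so far, a predominant sense wins ties against rarer ones.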
      <Paragraph position="3"> The various nodes within WordNet's noun network that correspond to a verb frameset--either through morphological derivation or category domain relationships in WordNet or through the disambiguation of nouns from the glosses of verbs in the frameset--constitute 'evidence synsets' for the participant structure of the corresponding semantic frame and form the input for the conceptual density calculation.

        [Footnote 12] Identification of LDOCE verb senses that correspond to WordNet verb synsets is carried out using a similar strategy.</Paragraph>
      <Paragraph position="4"> In preparation for use in calculating conceptual density, evidence synsets are given weights that take into account the source and basis of the disambiguation. In the current implementation, noun synsets related to the frameset through morphological derivation or shared category domain are given a weight of 4.0 (the nouns are guaranteed to be related to the verbs, and disambiguation of the nouns is built into the fact that relationships are given between synsets); disambiguated noun synsets coming from WordNet verb synsets receive a weight of 2.0 (since the original framesets contain WordNet synsets, and the disambiguation strategy is fairly conservative); non-disambiguated nouns coming from LDOCE verbs related to the frameset have a weight of 0.5 (LDOCE verbs are a step removed from the original framesets, and the nouns have not been disambiguated); all other nouns receive a weight of 1.0. The weight for non-disambiguated nouns is ultimately distributed across the noun's senses, with higher proportions of the weight being assigned to more frequent senses.</Paragraph>
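The weighting scheme can be written down as a small lookup. The four weights come from the text; the source labels, function names, and in particular the rule for spreading a non-disambiguated noun's weight over its senses (shares proportional to 1/rank) are assumptions made for illustration, since the text says only that more frequent senses receive higher proportions.

```python
# Weights stated in the text, keyed by hypothetical source labels.
SOURCE_WEIGHTS = {
    "derivation": 4.0,
    "category_domain": 4.0,
    "wordnet_gloss_disambiguated": 2.0,
    "ldoce_undisambiguated": 0.5,
}
DEFAULT_WEIGHT = 1.0  # all other nouns


def evidence_weight(source):
    """Return the weight for an evidence synset given its source label."""
    return SOURCE_WEIGHTS.get(source, DEFAULT_WEIGHT)


def spread_over_senses(weight, n_senses):
    """Split a weight over senses ranked by frequency (rank 1 = most frequent).

    Shares proportional to 1/rank are an assumed scheme, not SemFrame's.
    """
    shares = [1.0 / rank for rank in range(1, n_senses + 1)]
    total = sum(shares)
    return [weight * share / total for share in shares]
```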
    </Section>
    <Section position="5" start_page="61" end_page="62" type="sub_section">
      <SectionTitle>
Computing Conceptual Density
</SectionTitle>
      <Paragraph position="0"> The overall idea behind transforming the list of evidence synsets into a list of participants involves using the relationship structure of WordNet to identify an appropriately small set of concepts (i.e., synsets) within WordNet that account for (i.e., are superordinate to) as many of the evidence synsets as possible; such synsets will be referred to as 'covering synsets'.</Paragraph>
      <Paragraph position="1"> This task relies on the hypothesis that a frame's evidence synsets will not be randomly distributed across WordNet, but will be clustered in various subtrees within the hierarchy. Intuitively, when evidence synsets cluster together, the subtrees in which they occur will be more dense than those subtrees where few or no evidence synsets occur. It is hypothesized that the WordNet subtrees with the highest density are the most likely to correspond to frame slots. Thus, the task is to identify such clusters/subtrees and then to designate the nodes at the roots of the subtrees as covering synsets (subject to certain constraints).</Paragraph>
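The intuition about dense subtrees can be illustrated with a deliberately simple stand-in. The sketch below is not the conceptual density measure used by SemFrame or by Agirre and Rigau (1995): it scores each node by total evidence weight in its subtree divided by the number of nodes in that subtree, and ranks nodes that cover a minimum number of evidence synsets. The tree, the function names, and the `min_support` parameter are hypothetical.

```python
def subtree_density(children, weights, root):
    """Return {node: (density, support)} for every node under root.

    children: dict mapping a node to a list of its child nodes.
    weights: dict mapping evidence synsets to their weights.
    density here is evidence weight per subtree node (an assumed stand-in).
    """
    result = {}

    def visit(node):
        size = 1
        weight = weights.get(node, 0.0)
        support = 1 if node in weights else 0
        for child in children.get(node, ()):
            child_size, child_weight, child_support = visit(child)
            size += child_size
            weight += child_weight
            support += child_support
        result[node] = (weight / size, support)
        return size, weight, support

    visit(root)
    return result


def covering_candidates(children, weights, root, min_support=2):
    """Rank nodes covering at least min_support evidence synsets by density."""
    scored = subtree_density(children, weights, root)
    ranked = [
        (density, node)
        for node, (density, support) in scored.items()
        if support >= min_support
    ]
    return [node for density, node in sorted(ranked, reverse=True)]
```

In a toy hierarchy where two heavily weighted evidence synsets sit under one small subtree, the root of that subtree ranks above the root of the whole hierarchy, which covers more evidence but is far sparser.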
      <Paragraph position="2"> The conceptual density measure we have used has been inspired by the measure of the same name in Agirre and Rigau (1995). The conceptual density, CD(n), of a node n is computed as follows:</Paragraph>
      <Paragraph position="4"> Both frame names and frame slots are identified on the basis of this conceptual density measure, with the frame name being taken from the node with the highest conceptual density from a specified group of subnetworks within the WordNet noun network (including abstractions, actions, events, phenomena, psychological features, and states). Frame slots are subject to a density threshold (based on mean density and variance), an evidence-synset-support threshold, and a constraint on the number of possible slots to be taken from specific subnetworks within WordNet. Further details on the computation and interpretation of conceptual density are given in Green and Dorr (2004).</Paragraph>
      <Paragraph position="5"> Frame names and frame structures for the framesets in Appendix A are given in Appendix B. The full set of SemFrame's frames (including ca. 30,000 lexical unit/frame associations) is publicly available at: http://www.cs.umd.edu/~rgreen/semframe2.tar.gz.</Paragraph>
      <Paragraph position="6"> The correspondence between frameset sizes and the number of slots generated for the frame is worth noting, since we have independent evidence about the number of slots that should be generated. Frames in FrameNet generally have from 1 to 5 slots (occasionally more). Over 70% of SemFrame's frames contain from 1 to 5 frame slots. Of course, generating an appropriate number of frame slots is not the same as generating the right frame slots, a determination that requires empirical investigation.</Paragraph>
    </Section>
  </Section>
</Paper>