<?xml version="1.0" standalone="yes"?> <Paper uid="W97-0210"> <Title>Investigating Complementary Methods for Verb Sense Pruning</Title> <Section position="4" start_page="0" end_page="59" type="metho"> <SectionTitle> 2 Exploiting domain-independent syntactic clues </SectionTitle> <Paragraph position="0"> A given word may have n distinct senses and appear within m different syntactic contexts, but typically not all n × m combinations are valid. The syntactic context can partly disambiguate the semantic content. For example, when the verb question has a that-clause complement, it cannot have the sense of &quot;ask&quot;, but rather must have the sense of &quot;challenge&quot;. To identify such interacting syntactic and semantic constraints at the lexical level, we utilize three knowledge bases for verbs: * The COMLEX database (Grishman et al., 1994; Macleod and Grishman, 1995), which includes detailed subcategorization information for each verb, as well as for some adjectives and nouns.</Paragraph> <Paragraph position="2"> * Levin's classification of verbs in terms of their allowed alternations (Levin, 1993). Alternations include syntactic transformations such as there-insertion (e.g., A ship appeared on the horizon → There appeared a ship on the horizon) and locative-inversion (e.g., → On the horizon there appeared a ship). Much in the same way as subcategorization frames, alternations are constrained by the sense of the word; for example, the verb appear allows there-insertion and locative-inversion in its senses of &quot;come into being&quot; or &quot;become visible&quot;, but not in its senses of &quot;come out&quot; or &quot;participate in a play&quot;.</Paragraph> <Paragraph position="3"> * WordNet's (Miller et al., 1990) hierarchical semantic classification. WordNet supplies links between semantically related senses as encoded in synonym sets (synsets). Though many words are polysemous, Miller et al.
(1990) argue that a set of synonymous or nearly synonymous words can serve to identify the single lexical concept they have in common. WordNet also supplies limited subcategorization information, in the form of allowed sentential frames (&quot;verb frames&quot;) for each sense. WordNet thus contains the needed information on permissible combinations of syntactic context and semantic content, but its subcategorization information is limited: only thirty-five different subcategorization frames are used for all verbs in WordNet, and the frames supplied are partial. COMLEX provides more detailed specifications of the syntactic frames for each verb (92 distinct subcategorization types). The allowed alternations (which we encoded in machine-readable form from the detailed rules supplied in (Levin, 1993)) provide additional constraints. Mapping the more precise syntactic information in COMLEX to the verb frames of WordNet allows the construction of a more detailed syntactic entry for each word sense, and enables the association of alternation constraints with the senses in WordNet. In the future, it will also allow us to use corpora tagged with COMLEX subcategorization frames, e.g., (Macleod et al., 1996).</Paragraph> <Paragraph position="4"> We have manually constructed a table that maps WordNet syntactic constraints to the ones used in COMLEX (and vice versa) and another that maps allowed alternations from (Levin, 1993) to COMLEX or WordNet syntactic frames. A program consults the three databases and the mapping tables and, for each word occurrence, constructs a list of the senses that are compatible with the syntactic constraints. During this process, a detailed entry for the word is formed, containing both syntactic and semantic information.
The resulting entries comprise a rich lexical resource that we plan to use for text generation and other applications (Jing et al., 1997).</Paragraph> <Paragraph position="5"> For a specific example, consider the verb appear.</Paragraph> <Paragraph position="6"> The pertinent information in the three databases for this word is listed in parts (a)-(c) of Figure 1. For example, one of the subcategorization frames of appear in part (a), ADJP-PRED-RS, indicates a predicate adjective with subject raising, as in He appeared confused. Part (b) of Figure 1 lists no alternations that are applicable to this subcategorization frame, while part (c) shows only two WordNet synsets where appear takes an adjectival complement, senses S1 and S8. The complex entry of Figure 2 is produced automatically from these three types of lexical information. The resulting syntax-semantics restriction matrix for appear is shown in Table 1. When appear is encountered in a particular syntactic structure, the program consults the restriction matrix to eliminate senses that can be excluded.</Paragraph> <Paragraph position="8"> for the verb appear.</Paragraph> <Paragraph position="9"> In the case of appear, only 47 cells of the 8 × 23 matrix represent possible combinations of syntactic patterns with senses, corresponding to a 74.5% reduction in ambiguity.</Paragraph> <Paragraph position="10"> Due to incompatibilities between the COMLEX and WordNet representations of syntactic information, and the differences in coverage, the process of linking the information sources can in some cases result in relatively underspecified rows of a restriction matrix, or in spurious cells. For example, the frame ADVP-PRED-RS in Table 1 occurs in COMLEX but does not correspond to any of the more general frames mentioned in WordNet.
Rather than having no appropriate senses for this syntactic pattern, we map it to WordNet's verb frames &quot;Something ----s Adjective/Noun&quot; and &quot;Somebody ----s Adjective&quot; by analyzing experiment results regressively.</Paragraph> <Paragraph position="11"> On the other hand, the entry for S2 in the PP-TO-INF-RS frame for appear is spurious: appear does not occur in the S2 meaning of &quot;become visible&quot; with a to-prepositional phrase and a subject-controlled infinitive. In a sentence with this syntactic structure, such as &quot;The river appeared to the residents to be rising too rapidly&quot;, appear can take only senses S1 and S6 for animate subjects and senses S3 and S7 for inanimate subjects. Yet the cell for S2 × PP-TO-INF-RS is generated in our matrix because of the overly general specification of verb frames in WordNet. We have chosen to risk overgeneration in these cases at present, rather than accidentally eliminate a valid sense. Eliminating spurious cells by hand would be time-consuming and error-prone, but the automatic classification method we report in the next section may help prune them. Also, as reported elsewhere (Jing et al., 1997), we are extending our lexical resource with annotations of frequency information for each sense-subcategorization pair, derived from sense-tagged corpus data. As data is accumulated, zero frequency could be taken to represent less valid usages.</Paragraph> <Paragraph position="12"> We have performed preliminary evaluation tests of our method for tagging verb occurrences with pruned word sense tags using the Brown corpus. The first step of the method is to identify the subcategorization pattern for a specific verb token. Here we rely on heuristics to identify the major constituents to the left and right of a verb token, as described in (Jing et al., 1997).
After hypothesizing the subcategorization pattern for a specific verb token, we use our sense restriction matrices (as in Table 1) to tag the verb token with a pruned set of senses.</Paragraph> <Paragraph position="13"> We evaluate the resulting sense tag against the version of the Brown corpus that has been hand-tagged with WordNet senses (Miller et al., 1993). For appear, which we use as an example throughout this paper, we find 100 tokens in the Brown corpus. Of these, 46 are intransitive or have a locative prepositional phrase complement. Our method tags each of these tokens with two or three possible senses, and in all but one case, the sense tag includes the valid sense. Another 31 tokens are followed by to and a subject-controlled infinitive. In all these cases, our method makes a single, correct prediction out of the eight possible senses. For all 100 uses of appear in the corpus, the average number of possible senses predicted by our method is 1.99. We find a 75-76% reduction of possible senses (depending on whether we use the additional something/somebody selectional constraints), with only 2-3% of the tags being incorrect.</Paragraph> <Paragraph position="14"> For the 5,676 verbs present in all three databases, the average reduction in ambiguity was 36.82% for words with two to four senses, 59.36% for words with five to ten senses, and 73.86% for words with more than ten senses; the overall average for all polysemous words was 47.91%. Figure 3 is a bar chart showing, for each number of senses from 1 to 41, how many verbs with that number of senses occur in our databases.</Paragraph> <Paragraph position="16"> of senses. Low frequencies are not drawn to scale; rather, the presence of a bar for a category corresponding to more than 10 senses indicates that at least one verb falls in that category.</Paragraph> <Paragraph position="17">
The most polysemous verb in our databases, run, is identified as having 41 senses.</Paragraph> <Paragraph position="18"> About half the verbs have more than one sense, and 20% have more than two. Our method performs better on the more polysemous words, which are the most difficult to prune. This increased difficulty applies even to statistical methods because of the large number of alternatives and the likely closeness in meaning among them. Selecting a subset of almost synonymous verb senses is significantly harder than, for example, disambiguating bank between the &quot;edge of river&quot; and &quot;financial institution&quot; senses.</Paragraph> </Section> <Section position="5" start_page="59" end_page="62" type="metho"> <SectionTitle> 3 Using domain-dependent semantic classifications to identify predominant senses </SectionTitle> <Paragraph position="0"> The process outlined above has two significant advantages. First, it can be automatically applied, assuming a robust method for parsing the relevant verb phrase context (the experiments presented in (Pustejovsky et al., 1993) depend on the same type of information). Second, it reduces the ambiguity of a given word without sacrificing accuracy, insofar as the three input knowledge sources are accurate. To further restrict the size of the set of valid senses produced, we are currently exploring domain-dependent, automatically constructed semantic classifications. Semantic classification programs (Brown et al., 1992; Hatzivassiloglou and McKeown, 1993; Pereira et al., 1993) use statistical information based on co-occurrence with appropriate marker words to partition a set of words into semantic groups or classes. For example, using head nouns that occur with premodifying adjectives as one type of marker word, the adjective set {blue, cold, green, hot, red} can be partitioned into the subsets (lexical fields (Lehrer, 1974)) {blue, green, red} and {cold, hot}.
Automatic classification programs can achieve high performance, near that of humans on the same task, when supplied with enough data and with appropriate syntactic constraints (see (Hatzivassiloglou, 1996) for a detailed evaluation). However, given that each word must be assigned to one class independently of context,¹ the problem of ambiguity is &quot;solved&quot; by placing each word in the class where it fits best; that is, in the class dictated by the predominant sense of the word in the training text.</Paragraph> <Paragraph position="1"> While this might be a limitation of partitioning methods for lexicographical purposes, it offers an advantage for our task. By an indirect route, it allows the automatic identification of the predominant sense of a word in a given text or subject topic. It is indirect because the actual result is groups of word forms, but we presume each group to represent a relatively homogeneous semantic class. Thus we presume that the relevant sense of a given word form in a group is in the same lexical field as the senses of the other word forms in the same group. The process is highly domain-dependent, i.e., the same set of words will be partitioned in different ways when the domain changes. For example, when our word grouping system (Hatzivassiloglou and McKeown, 1993) classified about 280 frequent adjectives in stock market reports, it formed, among others, the cluster {common, preferred}. This cluster would look odd were the domain not considered.² This information on predominant senses for each word form in a given corpus can be computed automatically, but remains implicit. To map the results onto word sense associations, and thus explicitly identify the predominant senses, we utilize the links between senses provided by WordNet.
We note that while words like question and ask are ultimately connected in WordNet, the actual connections are only between some of the senses of the two words.</Paragraph> <Paragraph position="2"> Similarly, the words question and dispute are also connected, but through a different subset of senses.</Paragraph> <Paragraph position="3"> Thus, if the automatically induced semantic classification indicates that the predominant sense of question is associated with dispute rather than with ask (by placing question and dispute but not ask in the same group), we can infer which of the WordNet senses of question is the predominant one in this domain. The algorithm involves the following steps: ¹Some systems produce &quot;soft&quot; clusters, where words can belong to more than one group. These can be converted to non-overlapping groups for the purposes of this discussion by assigning each word to the group for which it has the highest membership coefficient.</Paragraph> <Paragraph position="4"> ²In this domain, the two adjectives are complementaries, describing the two types of issued stock shares. * Construct the domain-dependent word classification. * For each word x, let Y = {Y1, Y2, ...} be the set of other words placed in the same semantic group with x.</Paragraph> <Paragraph position="5"> * For each Yi ∈ Y, traverse the WordNet hierarchy and locate the (set of) senses of x, Si, that are connected with some sense of Yi. The distance and the types of links that can be traversed while still considering two senses &quot;related&quot; can be heuristically determined; alternatively, we can use a measure of semantic distance such as those proposed in (Resnik, 1995) or (Passonneau et al., 1996).</Paragraph> <Paragraph position="6"> * Finally, the union of the sets Si contains the predominant sense of x.
While in the general case it is possible to have multiple links between word forms (corresponding to different sense pairings), typically each Si will contain only one sense, and their union will contain a few elements. This set can be further reduced, e.g., by giving more weight to senses supported by more than one of the Yi's or by unambiguous Yi's.</Paragraph> <Paragraph position="7"> For a concrete example, consider the verb question, which can have, among others, the senses of dispute (sense 1 in WordNet) or inquire (sense 3 in WordNet). If we consider a sense as linked with one of the senses of question if it is in the maximal subtree which includes that sense but no other senses of question, we find the following links between question and the verbs ask, inquire, challenge,</Paragraph> <Paragraph position="8"> and dispute: (question1, ask), (question2, ask), (question3, ask), (question3, inquire), and (question1, challenge). Thus, if question is placed in the same semantic group with ask and inquire, the three senses {1, 2, 3} survive out of the five senses of question, with a preference for sense 3. If, on the other hand, question is classified with challenge and dispute, only sense 1 survives.</Paragraph> <Paragraph position="9"> We performed an experiment analyzing a specific verb group produced by one semantic clustering program (McMahon and Smith, 1996). This group contains 19 verbs, all but one of them ambiguous, including ask, call, charge, regard, say, and wish.</Paragraph> <Paragraph position="10"> We measured for each sense of the 19 words how many of the other words have at least one sense linked with that sense in WordNet (in the same top-level verb sense tree). The results, part of which is shown in Table 2, indicate that some senses are much more strongly connected with the other words in the group, and so probably predominate in the corpus that was used to induce the group.
For example, one of the senses of ask, &quot;require&quot; (as in This job asks (for) long hours) is not linked to any of the other 18 words in the cluster, and should therefore be removed. If, for each word w we analyze, we require that each of its probable senses be linked to at least a fixed percentage (e.g., one-third) of the total number of words linked to w, we can eliminate many of the senses as improbable.</Paragraph> <Paragraph position="11"> The achieved reduction in ambiguity (for the 18 ambiguous words) ranges from 20% to 84.62% (including cases of full disambiguation), and its average for all 18 words is 55.89%.</Paragraph> <Paragraph position="12"> In another experiment, we looked at a specific corpus, taking into account the frequency distribution of the verbs in it. We selected the J part of the Brown corpus, which focuses on learned knowledge (the Natural Sciences, Mathematics, Medicine, the Humanities, etc.) (Kučera and Francis, 1967). This part of the corpus is more homogeneous and contains a larger number of articles (80). The increased homogeneity makes it suitable for investigating our hypothesis of predominant verb senses.</Paragraph> <Paragraph position="13"> We selected five verbs from this sub-corpus (show, describe, present, prove, and introduce), and applied our algorithm assuming that the predominant senses of these verbs are linked together and, consequently, that the five verbs would be placed in the same group by the clustering program.</Paragraph> <Paragraph position="14"> Under this assumption, we measured the reduction in ambiguity (number of possible senses) for each verb (types) as well as over all occurrences of the five verbs in the sub-corpus (tokens) when the cluster-based algorithm is applied. We also counted how many of the verbs receive a wrong tag, i.e., a set of senses that does not include the hand-assigned one. The results of these experiments are shown in Table 3.
We observe that the cluster-based method achieves a 49.27% reduction in the number of senses when measured on types. When the distribution of the words is factored in, the corresponding measure on tokens (which better describes the applicability of the method in practice) is 38.00%. The average error rate is 8.48%; this average is driven up by the inclusion of present, prove, and introduce in our test set. The relatively high error rate for these verbs may be due to their low frequency in our corpus, or may indicate that their predominant senses are not associated with the predominant senses of show and describe as we hypothesized.</Paragraph> </Section> <Section position="6" start_page="62" end_page="63" type="metho"> <SectionTitle> 4 Combining the two methods </SectionTitle> <Paragraph position="0"> While the syntactic constraints method almost always produces a semantic tag that includes the correct sense for a verb,³ it has no capability to further distinguish the surviving senses in the tag. The semantic link-based method, on the other hand, can eliminate some senses from this tag. By applying the two methods in tandem and intersecting the sense sets produced by them, we can reduce the size of the final tag. Using the verb &quot;show&quot; of the experiment described in the previous section as an illustration, we note that whenever the verb takes only a direct object, the syntactic method eliminates three of the thirteen possible senses while always retaining the correct sense in the produced tag (error rate 0%).</Paragraph> <Paragraph position="1"> ³Assuming no gaps in the subcategorization information for this verb in COMLEX and WordNet.</Paragraph> <Paragraph position="2"> For the same verb and subcategorization pattern, the cluster-based method rejects four of the thirteen senses with error rate 5% (i.e., 3 out of 58 occurrences in the J part of the Brown Corpus will be assigned wrong tags).
The intersection of the two methods increases the number of rejected senses to five. It reduces the ambiguity by 38% but has the combined error rate of both methods, in this case 5%.</Paragraph> <Paragraph position="3"> As we see from this experiment, the integration of the two methods can increase the reduction in ambiguity, but may slightly increase the error rate.</Paragraph> <Paragraph position="4"> We are investigating ways to stratify the application of the cluster-based method on appropriate groups of tokens identified by the syntactic method, by separately clustering tokens of the same verb that appear in different syntactic frames. We expect that this will partly alleviate the increase in the error rate.</Paragraph> </Section></Paper>