<?xml version="1.0" standalone="yes"?> <Paper uid="J04-1003"> <Title>c(c) 2004 Association for Computational Linguistics Verb Class Disambiguation Using Informative Priors</Title> <Section position="8" start_page="98" end_page="98" type="relat"> <SectionTitle> 8. Related Work </SectionTitle> <Paragraph position="0"> Levin's (1993) seminal study on diathesis alternations and verb semantic classes has recently influenced work in dictionary creation (Dorr 1997; Dang et al. 1998; Dorr and Jones 1996) and notably lexicon acquisition on the basis of the assumption that verbal meaning can be gleaned from corpora using cues pertaining to syntactic structure (Merlo and Stevenson 2001; Schulte im Walde 2000; Lapata 1999; McCarthy 2000).</Paragraph> <Paragraph position="1"> Previous work in word sense disambiguation has not tackled explicitly the ambiguity problems arising from Levin's classification, although methods for deriving informative priors in an unsupervised manner have been proposed by Ciaramita and Johnson (2000) and Chao and Dyer (2000) within the context of noun and adjective sense disambiguation, respectively. In this section we review related work on classification and lexicon acquisition and compare it to our own work.</Paragraph> <Paragraph position="2"> Computational Linguistics Volume 30, Number 1 Dang et al. (1998) observe that verbs in Levin's (1993) database are listed in more than one class. The precise meaning of this ambiguity is left open to interpretation in Levin, as it may indicate that the verb has more than one sense or that one sense (i.e., class) is primary and the alternations for this class should take precedence over the alternations for the other classes for which the verb is listed. Dang et al. augment Levin's semantic classes with a set of &quot;intersective&quot; classes that are created by grouping together sets of existing classes that share a minimum of three overlapping members.</Paragraph> <Paragraph position="3"> Intersective classes are more fine-grained than the original Levin classes and exhibit more-coherent sets of syntactic frames and associated semantic components. Dang et al. further argue that intersective classes are more compatible with WordNet than the broader Levin classes and thus make it possible to attribute the semantic components and associated sets of syntactic frames to specific WordNet senses as well, thereby enriching the WordNet representation and providing explicit criteria for word sense disambiguation.</Paragraph> <Paragraph position="4"> Most statistical approaches, including ours, treat verbal-meaning assignment as a semantic classification task. The underlying question is the following: How can corpus information be exploited in deriving the semantic class for a given verb? Despite the unifying theme of using corpora and corpus distributions for the acquisition task, the approaches differ in the inventory of classes they employ, in the methodology used for inferring semantic classes, and in the specific assumptions concerning the verbs to be classified (e.g., can they be polysemous or not).</Paragraph> <Paragraph position="5"> Merlo and Stevenson (2001) use grammatical features (acquired from corpora) to classify verbs into three semantic classes: unergative, unaccusative, and object drop. These classes are abstractions of Levin's (1993) classes and as a result yield a coarser classification. 
<Paragraph position="6"> Schulte im Walde (2000) uses subcategorization information and selectional restrictions to cluster verbs into semantic classes compatible with Levin (1993). Subcategorization frames are induced from the BNC using a robust statistical parser (Carroll and Rooth 1998). The selectional restrictions are acquired using Resnik's (1993) information-theoretic measure of selectional association, which combines distributional and taxonomic information (e.g., WordNet) to formalize how well a predicate associates with a given argument. Two sets of experiments are run to evaluate the contribution of selectional restrictions using two types of clustering algorithms: iterative clustering and latent-class clustering (see Schulte im Walde [2000] for details). The approach is evaluated on 153 verbs taken from Levin, 53 of which are polysemous (i.e., belong to more than one class). The size of the derived clusters is restricted to four verbs, and the clusters are compared to Levin: Verbs are classified correctly if they are members of a nonsingleton cluster that is a subset of a Levin class. Polysemous verbs can be assigned to distinct clusters only using the latent-class clustering method. The best results achieve a recall of 36% and a precision of 61% (over a baseline of 5%, calculated as the proportion of randomly created clusters that are subsets of a Levin class) using subcategorization information only and iterative clustering. Inclusion of information about selectional restrictions yields a lower precision of 38% (with a recall of 20%), again using iterative clustering.</Paragraph>
<Paragraph position="7"> Dorr and Jones (1996) use Levin's (1993) classification to show that there is a predictable relationship between verbal meaning and syntactic behavior. They create a database of Levin verb classes and the sentences exemplifying them (including both positive and negative examples, i.e., examples marked with asterisks). A parser is used to extract basic syntactic patterns for each semantic class. These patterns form the syntactic signature of the class. Dorr and Jones show that 97.9% of the semantic classes can be identified uniquely by their syntactic signatures. Grouping verbs (instead of classes) with identical signatures to form a semantic class yields a 6.3% overlap with Levin classes. Dorr and Jones's results are somewhat difficult to interpret, since in practice information about a verb and its syntactic signature is not available, and it is precisely this information that is crucial for classifying verbs into Levin classes. Schulte im Walde's study and our own show that the acquisition of syntactic signatures (i.e., subcategorization frames) from corpora is feasible; however, these acquired signatures are not necessarily compatible with Levin and in most cases will depart from those derived by Dorr and Jones, as negative examples are not available in real corpora.</Paragraph>
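A minimal sketch of the signature idea follows; the class names and frame sets are invented and far smaller than Levin's inventory, and the representation (a pair of positive and negative frame sets) is only an approximation of Dorr and Jones's database:

    # Toy illustration of syntactic signatures: each class is described by the
    # frames its examples allow and disallow; a class is uniquely identifiable
    # when no other class shares the same signature. All data here is invented.
    from collections import Counter

    class_signatures = {
        "Spray/Load": (frozenset({"NP V NP PP", "NP V NP with-PP"}), frozenset({"NP V"})),
        "Fill":       (frozenset({"NP V NP with-PP"}),               frozenset({"NP V NP PP"})),
        "Clear":      (frozenset({"NP V NP PP", "NP V NP with-PP"}), frozenset({"NP V"})),
        "Butter":     (frozenset({"NP V NP"}),                       frozenset({"NP V NP PP"})),
    }

    counts = Counter(class_signatures.values())
    unique = [name for name, sig in class_signatures.items() if counts[sig] == 1]
    print(f"{len(unique)} of {len(class_signatures)} classes have a unique signature")
    # Here "Spray/Load" and "Clear" share a signature, so 2 of 4 are unique.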
<Paragraph position="8"> Ciaramita and Johnson (2000) propose an unsupervised Bayesian model for disambiguating verbal objects that uses WordNet's inventory of senses. For each verb the model creates a Bayesian network whose architecture is determined by WordNet's hierarchy and whose parameters are estimated from a list of verb-object pairs found in a corpus. A common problem for unsupervised models trained on verb-object tuples is that the objects can belong to more than one semantic class. The class ambiguity problem is commonly resolved by considering each observation of an object as evidence for each of the classes the word belongs to. The formalization of the problem in terms of Bayesian networks allows the contribution of different senses to be weighted via explaining away (Pearl 1988): If A is a hyponym of B and C is a hyponym of B, and B is true, then finding that C is true makes A less likely.</Paragraph>
<Paragraph position="9"> Prior knowledge about the likelihoods of concepts is hand-coded in the network according to the following principles: (1) it is unlikely that any given class will be a priori selected for; (2) if a class is selected, then its hyponyms are also likely to be selected; (3) a word is likely as the object of a verb if at least one of its classes is selected for. Likely and unlikely here correspond to numbers that sum to one. Ciaramita and Johnson show that their model outperforms other word sense disambiguation approaches that do not make use of prior knowledge.</Paragraph>
<Paragraph position="10"> Chao and Dyer (2000) propose a method for the disambiguation of polysemous adjectives in adjective-noun combinations that also uses Bayesian networks and WordNet's taxonomic information. Prior knowledge about the likelihood of different senses or semantic classes is derived heuristically by submitting queries (e.g., great hurricane) to the AltaVista search engine and extrapolating the frequency of the adjective-noun pair from the number of returned documents (see Mihalcea and Moldovan [1998] for details of this technique). For each polysemous adjective-noun combination, the synonyms representative of each sense are retrieved from WordNet (e.g., {great, large, big} vs. {great, neat, good}). Queries are submitted to AltaVista for each synonym-noun pair; the number of documents returned is then used as an estimate of how likely the different adjective senses are. Chao and Dyer obtain better results when prior knowledge is factored into their Bayesian network.</Paragraph>
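The general shape of this count-based prior estimation is sketched below with invented counts and a hypothetical helper function; Chao and Dyer obtain their counts from AltaVista document hits rather than from a local table, and the actual probability model is theirs, not this simplification:

    # Illustrative sketch: turning (synonym, noun) co-occurrence counts into
    # sense priors. The counts below are invented; Chao and Dyer (2000) derive
    # them from search-engine document counts for each synonym-noun query.
    def sense_priors(noun, sense_synonyms, counts):
        """Estimate P(sense | noun) from synonym-noun co-occurrence counts."""
        totals = {sense: sum(counts.get((syn, noun), 0) for syn in synonyms)
                  for sense, synonyms in sense_synonyms.items()}
        mass = sum(totals.values()) or 1
        return {sense: total / mass for sense, total in totals.items()}

    sense_synonyms = {
        "great#size":    ["great", "large", "big"],
        "great#quality": ["great", "neat", "good"],
    }
    counts = {  # hypothetical document counts for (synonym, noun) queries
        ("great", "hurricane"): 800, ("large", "hurricane"): 1200,
        ("big", "hurricane"): 1500, ("neat", "hurricane"): 40,
        ("good", "hurricane"): 90,
    }
    print(sense_priors("hurricane", sense_synonyms, counts))
    # -> the "size" sense of "great" receives most of the probability mass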
<Paragraph position="11"> Our work focuses on the ambiguity inherently present in Levin's (1993) classification. The problem is ignored by Merlo and Stevenson (2001), who focus only on monosemous verbs. Polysemous verbs are included in Schulte im Walde's (2000) experiments: The clustering approach can assign a verb to more than one class, without, however, providing information about its dominant class.</Paragraph>
<Paragraph position="12"> We recast Levin's classification in a statistical framework and show, in agreement with Merlo and Stevenson and Schulte im Walde, that corpus-based distributions provide important information for semantic classification, especially in the case of polysemous verbs whose meaning cannot be easily inferred from the immediate surrounding context (i.e., subcategorization). We additionally show that the derived model is useful not only for determining the most likely overall class for a given verb (i.e., across the corpus), but also for disambiguating polysemous verb tokens in context.</Paragraph>
<Paragraph position="13"> Like Schulte im Walde (2000), we rely on subcategorization frames extracted from the BNC (although using a different methodology). We employ Levin's inventory of semantic classes, arriving at a finer-grained classification than Merlo and Stevenson (2001). In contrast to Schulte im Walde, we do not attempt to discover Levin classes from corpora; instead, we exploit Levin's classification and corpus frequencies in order to derive a distribution of verbs, classes, and their frames that is not known a priori but is approximated using simplifying assumptions. Our approach is not particularly tied to Levin's exact classification. We have presented in this article a general framework that could be extended to related classifications such as the semantic hierarchy proposed by Dang et al. (1998). In fact, the latter may be more appropriate than Levin's original classification for our disambiguation experiments, as it is based on a tighter correspondence between syntactic frames and semantic components and contains links to the WordNet taxonomy.</Paragraph>
<Paragraph position="14"> Prior knowledge with regard to the likelihood of polysemous verb classes is acquired automatically in an unsupervised manner by combining corpus frequencies estimated from the BNC and information inherent in Levin. The models proposed by Chao and Dyer (2000) and Ciaramita and Johnson (2000) are not directly applicable to Levin's classification, as the latter is not a hierarchy (and therefore not a DAG) and cannot be straightforwardly mapped into a Bayesian network. However, in agreement with Chao and Dyer and Ciaramita and Johnson, we show that prior knowledge about class preferences improves word sense disambiguation performance.</Paragraph>
<Paragraph position="15"> Unlike Schulte im Walde (2000) and Merlo and Stevenson (2001), we ignore information about the arguments of a given verb in the form of either selectional restrictions or argument structure while building our prior models. The latter information is, however, indirectly taken into account in our disambiguation experiments: The verbs' arguments are features for our naive Bayesian classifiers. Such information can also be incorporated into the prior model in the form of conditional probabilities, where the verb is, for example, conditioned on the thematic role of its arguments if this is known (see Gildea and Jurafsky [2000] for a method that automatically labels thematic roles).</Paragraph>
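As a rough sketch of how an informative class prior combines with contextual features in a naive Bayes token disambiguator (a simplified stand-in with invented numbers, not the exact model or probability estimates used in this article), each candidate class is scored by its prior times the conditional probabilities of the observed context features:

    import math

    # Toy naive Bayes disambiguation of a polysemous verb token in context.
    # Classes, priors, and feature probabilities are invented; in the article
    # they are estimated from the BNC and Levin's classification.
    def disambiguate(context_features, priors, feature_probs, floor=1e-6):
        scores = {}
        for verb_class, prior in priors.items():
            log_score = math.log(prior)
            for feature in context_features:
                prob = feature_probs[verb_class].get(feature, floor)
                log_score += math.log(prob)
            scores[verb_class] = log_score
        return max(scores, key=scores.get)

    priors = {"Manner of Speaking": 0.7, "Sound Emission": 0.3}  # hypothetical P(class)
    feature_probs = {
        "Manner of Speaking": {"reporter": 0.040, "door": 0.001},
        "Sound Emission":     {"reporter": 0.002, "door": 0.030},
    }
    print(disambiguate(["reporter"], priors, feature_probs))  # -> 'Manner of Speaking'
    print(disambiguate(["door"], priors, feature_probs))      # -> 'Sound Emission'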
<Paragraph position="16"> Unlike Merlo and Stevenson (2001), Schulte im Walde (2000), and Dorr and Jones (1996), we provide a general probabilistic model that assigns a probability to each class of a given verb by calculating the probability of a complex expression in terms of the probabilities of the simpler expressions that compose it. We further show that this model is useful for disambiguating polysemous verbs in context.</Paragraph>
<Paragraph position="17"> Appendix: Disambiguation Results with Co-occurrences Figures 6-9 show the performance of our naive Bayesian classifier when co-occurrences are used as features. We experimented with four types of context: left context (Left), right context (Right), sentential context (Sentence), and the sentence within which the ambiguous verb is found together with its immediately preceding sentence (PSentence). The context was encoded as lemmas or parts of speech.</Paragraph>
</Section> </Paper>