<?xml version="1.0" standalone="yes"?>
<Paper uid="J92-1001">
  <Title>Using Multiple Knowledge Sources for Word Sense Discrimination</Title>
  <Section position="2" start_page="0" end_page="4" type="intro">
    <SectionTitle>
1. Introduction
</SectionTitle>
    <Paragraph position="0"> Many problems in applied natural language processing -- including information retrieval, database generation from text, and machine translation -- hinge on relating words to other words that are similar in meaning. Current approaches to these applications are often word-based -- that is, they treat words in the input as strings, mapping them directly to other words. However, the fact that many words have multiple senses and different words often have similar meanings limits the accuracy of such systems. An alternative is to use a knowledge representation, or interlingua, to reflect text content, thereby separating text representation from the individual words.</Paragraph>
    <Paragraph position="1"> These approaches can, in principle, be more accurate than word-based approaches, but have not been sufficiently robust to perform any practical text processing task. Their lack of robustness is generally due to the difficulty in building knowledge bases that are sufficient for broad-scale processing.</Paragraph>
    <Paragraph position="2"> But a synthesis is possible. Applications can achieve greater accuracy by working at the level of word senses instead of word strings. That is, they would operate on text in which each word has been tagged with its sense. Robustness need not be sacrificed, however, because this tagging does not require a full-blown semantic analysis. Demonstrating this claim is one of the goals of this paper.</Paragraph>
    <Paragraph position="3"> Here is an example of the level of analysis a sense tagger would provide to an application program. Suppose that the input is Example 1. (Correspondence should be addressed to the author at Department of Computer Science, University of Toronto, Toronto, Canada M5S 1A4, or mcroy@ai.toronto.edu.)</Paragraph>
    <Paragraph position="4"> © 1992 Association for Computational Linguistics. Computational Linguistics, Volume 18, Number 1. Example 1 The agreement reached by the state and the EPA provides for the safe storage of the waste.</Paragraph>
    <Paragraph position="5"> The analysis would provide an application with the following information.</Paragraph>
    <Paragraph position="6"> * agreement refers to a state resulting from concurrence, rather than an act, object, or state of being equivalent.</Paragraph>
    <Paragraph position="7"> * reach is intended to mean 'achieve,' rather than 'extend an arm.'
* state refers to a government body, rather than an abstract state of existence.</Paragraph>
    <Paragraph position="8"> * safe in this context is an adjective corresponding to 'secure,' rather than a noun corresponding to a container for valuables.</Paragraph>
    <Paragraph position="9"> * The EPA and the state were co-agents in completing some agreement that is instrumental in supplying a secure place to keep garbage, rather than there was some equivalence that extended its arm around the state while the EPA was busy filling safes with trash.</Paragraph>
    <Paragraph position="10"> Preliminary evidence suggests that having access to a sense tagging of the text improves the performance of information retrieval systems (Krovetz 1989).</Paragraph>
    <Paragraph position="11"> The primary goal of this paper, then, is to describe in detail methods and knowledge that will enable a language analyzer to tag each word with its sense. To demonstrate that the approach is sufficiently robust for practical tasks, the article will also discuss the incorporation of the approach into an existing system, TRUMP (Jacobs 1986, 1987, 1989), and the application of it to unrestricted texts. The principles that make up the approach are completely general, however, and not just specific to TRUMP.</Paragraph>
    <Paragraph position="12"> An analyzer whose tasks include word-sense tagging must be able to take an input text, determine the concept that each word or phrase denotes, and identify the role relationships that link these concepts. Because determining this information accurately is knowledge-intensive, the analyzer should be as flexible as possible, requiring a minimum amount of customization for different domains. One way to gain such flexibility is to give the system enough generic information about word senses and semantic relations so that it will be able to handle texts spanning more than a single domain.</Paragraph>
    <Paragraph position="13"> While having an extensive grammar and lexicon is essential for any system's domain independence, this increased flexibility also introduces degrees of ambiguity not frequently addressed by current NLP work. Typically, the system will have to choose from several senses for each word. For example, we found that TRUMP's base of nearly 10,000 root senses and 10,000 derivations provides an average of approximately four senses for each word of a sentence taken from the Wall Street Journal. The potential for combinatoric explosion resulting from such ambiguity makes it critical to resolve ambiguities quickly and reliably. It is unrealistic to assume that word sense discrimination can be left until parsing is complete, as suggested, for example, by Dahlgren, McDowell, and Stabler (1989) and Janssen (1990).</Paragraph>
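The combinatorics behind that claim are easy to make concrete. A minimal sketch, using the paper's observed average of roughly four senses per word:

```python
# Illustrative sketch: if each word carries, on average, four candidate
# senses (the figure reported for TRUMP's lexicon on Wall Street Journal
# text), the number of complete sense assignments for a sentence grows
# exponentially with its length.

AVG_SENSES = 4  # average senses per word, per the paper

def sense_assignments(sentence_length, senses_per_word=AVG_SENSES):
    """Number of distinct ways to assign one sense to every word."""
    return senses_per_word ** sentence_length

# A modest 10-word sentence already yields over a million combinations.
print(sense_assignments(10))  # 1048576
```

Deferring discrimination until parsing is complete would, in the worst case, leave the parser to sort through all of these assignments.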
    <Paragraph position="14"> No simple recipe can resolve the general problem of lexical ambiguity. Although semantic context and selectional restrictions provide good cues to disambiguation, they are neither reliable enough, nor available quickly enough, to be used alone. The approach to disambiguation that we will take below combines many different, strong sources of information: syntactic tags, word frequencies, collocations, semantic context (clusters), selectional restrictions, and syntactic cues. The approach incorporates a number of innovations, including:
* a hybridization of several lexicons to help control which senses are considered:
- a static generic lexicon
- a lexicon linked to collocations
- a lexicon linked to concretions (i.e., specializations of abstract senses of words)
- lexicons linked to specialized conceptual domains;
* a separate processing phase, prior to parsing, that eliminates some ambiguities and identifies baseline semantic preferences;
* a preference combination mechanism, applied during parsing and semantic interpretation, that uses dynamic measures of strength based on specificity, instead of a fixed, ordered set of rules.</Paragraph>
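The last of these innovations, dynamic preference combination, can be illustrated abstractly. The following is a hypothetical sketch, not TRUMP's actual implementation: the cue names and specificity weights are invented for illustration, but the idea of letting more specific cues contribute stronger, defeasible votes follows the text:

```python
# Hypothetical sketch of combining sense preferences from multiple cues.
# Weights are illustrative stand-ins for dynamic specificity measures:
# a collocation match votes more strongly than raw word frequency.

CUE_SPECIFICITY = {
    "syntactic_tag": 3.0,
    "collocation": 4.0,
    "word_frequency": 1.0,
    "semantic_cluster": 2.0,
}

def combine_preferences(votes):
    """votes: list of (cue_name, sense) pairs; return the best-scoring sense."""
    scores = {}
    for cue, sense in votes:
        scores[sense] = scores.get(sense, 0.0) + CUE_SPECIFICITY[cue]
    return max(scores, key=scores.get)

# 'safe' in Example 1: the adjective tag and the semantic cluster around
# storage together outweigh any frequency bias toward the container sense.
best = combine_preferences([
    ("syntactic_tag", "safe/secure"),
    ("semantic_cluster", "safe/secure"),
    ("word_frequency", "safe/container"),
])
print(best)  # safe/secure
```

Unlike a fixed, ordered rule set, nothing here is absolute: a strong cue can still be outvoted when several weaker cues agree against it.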
    <Paragraph position="15"> Although improvements to our system are ongoing, it already interprets arbitrary text and makes coarse word sense selections reasonably well. (Section 6 will give some quantitative assessments.) No other system, to our knowledge, has been as successful. We will now review word sense discrimination and the determination of role relations. In Section 3, we discuss some sources of knowledge relevant to solving these problems, and, in Section 4, how TRUMP's semantic interpreter uses this knowledge to identify sense preferences. Section 5 describes how it combines the preference information to select senses. Afterward, we will discuss the results of our methods and the avenues for improvement that remain.</Paragraph>
    <Paragraph position="16"> 2. Cues to Word Sense Discrimination
The problem of word sense discrimination is to choose, for a particular word in a particular context, which of its possible senses is the "correct" one for the context. Information about senses can come from a wide variety of sources:
* the analysis of each word into its root and affixes, that is, its morphology;
* the contextually appropriate part or parts of speech of each word, that is, its syntactic tag or tags;
* for each sense of the word, whether the sense is preferred or deprecated -- either in general, because of its frequency, or in the context, because it is the expected one for a domain;
* whether a word is part of a common expression, or collocation, such as a nominal compound (e.g., soda cracker) or a predicative relation (e.g., take action);
* whether a word sense is supported by the semantic context -- for example, by its association with other senses in the context sharing a semantic category, a situation, or a topic;
* whether the input satisfies the expectations created by syntactic cues (e.g., some senses only take arguments of a particular syntactic type);
* whether it satisfies role-related expectations (i.e., expectations regarding the semantic relations that link syntactically attached objects);
* whether the input refers to something already active in the discourse focus.</Paragraph>
    <Paragraph position="17"> Of course, not all these cues will be equally useful.</Paragraph>
    <Paragraph position="18"> We have found that, in general, the most important sources of information for word sense discrimination are syntactic tags, morphology, collocations, and word associations. Role-related expectations are also important, but to a slightly lesser degree. Syntactic tags are very important, because knowing the intended part of speech is often enough to identify the correct sense. For example, according to our lexicon, when safe is used as an adjective (as in Example 1), it always denotes the sense related to security, whereas safe used as a noun always denotes a type of container for storing valuables.</Paragraph>
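That observation amounts to a simple lookup: once the tagger has fixed the part of speech, only the senses listed under that tag survive. A minimal sketch, with an invented two-entry lexicon standing in for TRUMP's:

```python
# Toy lexicon keyed by (word, part of speech). The entries mirror the
# paper's 'safe' example: the adjective and noun readings never share
# senses, so the syntactic tag alone resolves the word.

LEXICON = {
    ("safe", "adj"): ["secure"],
    ("safe", "noun"): ["container-for-valuables"],
}

def senses_for(word, tag):
    """Return only the senses compatible with the assigned syntactic tag."""
    return LEXICON.get((word, tag), [])

print(senses_for("safe", "adj"))   # ['secure']
print(senses_for("safe", "noun"))  # ['container-for-valuables']
```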
    <Paragraph position="19"> Morphology is also a strong cue to discrimination because certain sense-affix combinations are preferred, deprecated, or forbidden. Consider the word agreement. The verb agree can mean 'concur,' 'benefit,' or 'be equivalent' and, in general, adding the affix -ment to a verb creates a noun corresponding to an act, or to its result, its object, or its associated state. However, of the twelve possible combinations of root sense and affix sense, in practice only four occur: agreement can refer only to the act, object, or result in the case of the 'concur' sense of agree, or the state in the case of the 'equivalence' sense of agree. Furthermore, the last of these combinations is deprecated. Collocations and word associations are also important sources of information because they are usually "dead giveaways," that is, they make immediate and obvious sense selections. For example, when paired with increase, the preposition in clearly denotes a patient rather than a temporal or spatial location, or a direction. Word associations such as bank/money similarly create a bias for the related senses. Despite their apparent strength, however, the preferences created by these cues are not absolute, as other cues may defeat them. For example, although normally the collocation wait on means 'serve' (Mary waited on John), the failure of a role-related expectation, such as that the BENEFICIARY be animate, can override this preference (Mary waited on the steps). Thus, collocations and word associations are strong sources of information that an understander must weigh against other cues, and not just treat as rules for sense-filtering (as in Hirst 1987 or Dahlgren, McDowell, and Stabler 1989).</Paragraph>
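The arithmetic behind the agreement example can be spelled out. The sense inventories below are taken from the text; the sketch simply enumerates the 3 x 4 root/affix pairings and filters them against the four attested ones:

```python
# The agree + -ment example from the text: three root senses times four
# affix senses gives twelve candidate pairings, of which the lexicon
# licenses only four (and the 'equivalence' state reading is deprecated).

ROOT_SENSES = ["concur", "benefit", "be-equivalent"]
AFFIX_SENSES = ["act", "result", "object", "state"]  # what -ment can form

VALID = {
    ("concur", "act"),
    ("concur", "object"),
    ("concur", "result"),
    ("be-equivalent", "state"),  # attested but deprecated
}

all_pairs = [(r, a) for r in ROOT_SENSES for a in AFFIX_SENSES]
print(len(all_pairs))                             # 12
print(len([p for p in all_pairs if p in VALID]))  # 4
```

Ruling out eight of twelve readings before parsing even begins is why morphology counts as a strong cue.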
    <Paragraph position="20"> The selection of a role relationship can both influence and be influenced by the selection of word senses, because preferences partially constrain the various combinations of a role, its holder, and the filler. For example, the preposition from prefers referring to the SOURCE role; transfers, such as give, prefer to have a DESTINATION role; and instances of colors, such as red, prefer to fill a COLOR role. Approaches based on the word disambiguation model tend to apply constraint satisfaction techniques to combine these role preferences (Hirst 1987). Preferences based on role-related expectations are often only a weak cue because they apply primarily to verbs and are not normally very restrictive.</Paragraph>
    <Paragraph position="21"> Although generally a weak cue, role-related preferences are quite valuable for the disambiguation of prepositions. In our view, prepositions should be treated essentially the same as other words in the lexicon. The meaning of a preposition either names a relation directly, as one of its core senses (Hirst [1987] also allows this), or indirectly, as a specialized sense triggered, for example, by a collocation or concretion. Because the meaning of a preposition actually names a relation, relation-based cues are a good source of information for disambiguating them. (References to objects in the discourse focus can also be a strong cue for disambiguating prepositions, but this cue appears fairly infrequently [Whittemore, Ferrara, and Brunner 1990].) The problem of determining role relationships entangles word sense discrimination with the problem of syntactic attachment. The attachment problem is a direct result of the ambiguity in determining whether a concept is related to an adjacent object, or to some enveloping structure that incorporates the adjacent object. Most proposed solutions to this problem specify a fixed set of ordered rules that a system applies until a unique, satisfactory attachment is found (Fodor and Frazier 1980; Wilks, Huang, and Fass 1985; Shieber 1983; Hirst 1987; Dahlgren, McDowell, and Stabler 1989). Such rules can be syntactic, semantic, or pragmatic. Syntactic rules attempt to solve the attachment problem independent of the sense discrimination problem. For example, a rule for Right Association (also known as Late Closure) says to prefer attaching a new word to the lowest nonterminal node on the rightmost branch of the current structure (i.e., in the same structure as the last word processed) (Kimball 1973). 
Semantic rules, by contrast, intertwine the problems of discrimination and attachment; one must examine all combinations of senses and attachments to locate the semantically best one. Such rules normally also collapse the attachment problem into the conceptual role filling problem. For example, a lexical preference rule specifies that the preference for a particular attachment depends on how strongly or weakly the verb of the clause prefers its possible arguments (Fodor 1978; Ford, Bresnan, and Kaplan 1982). Pragmatic rules also intermingle sense discrimination and attachment, but consider the context of the utterance. For example, one suggested rule says to prefer to build structures describing objects just mentioned (Crain and Steedman 1985; Altmann and Steedman 1988).</Paragraph>
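A fixed-order rule list of the kind just described can be sketched schematically. The rule bodies below are toy stand-ins (real rules consult parse structure and lexical entries), but the control flow, trying each rule in a set order and stopping at the first unique survivor, is exactly the strategy whose limits the next paragraph discusses:

```python
# Hypothetical sketch of a fixed-order attachment strategy. Candidate
# attachment sites are listed left to right as they were built; the
# rules and their ordering are illustrative, not any cited system's.

def right_association(sites):
    """Syntactic rule: prefer the most recently built (rightmost) site."""
    return [sites[-1]] if sites else []

def lexical_preference(sites):
    """Semantic rule: keep only sites whose head prefers the new phrase."""
    return [s for s in sites if s.get("verb_prefers")]

FIXED_RULE_ORDER = [lexical_preference, right_association]

def attach(sites):
    """Apply the rules in order; stop at the first unique survivor."""
    for rule in FIXED_RULE_ORDER:
        survivors = rule(sites)
        if len(survivors) == 1:
            return survivors[0]
    return sites[-1]  # fall back to the most recent attachment site

sites = [{"head": "adjourned"}, {"head": "hearing", "verb_prefers": True}]
print(attach(sites)["head"])  # hearing
```

Because the rule order is frozen, such a system cannot let context decide which rule should dominate in a given sentence.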
    <Paragraph position="22"> The accuracy of systems with fixed-order rules is limited by the fact that it is not always possible to strictly order a set of rules independent of the context. For example, Dahlgren, McDowell, and Stabler (1989) propose the rule "If the object of the preposition is an expression of time, then S-attach the PP" to explain the preference for assuming that "in the afternoon" modifies adjourn in Example 2: Example 2 The judge adjourned the hearing in the afternoon.</Paragraph>
    <Paragraph position="23"> Although they admit this rule would fail for a sentence like John described the meeting on January 20th, where the NP has a lexical preference for a time modifier, lexical preferences are not always the determining factor either. The existence of a conceptually similar object in the context (such as "the morning trial") can also create an expectation for the grouping "hearing in the afternoon," as in Example 3 below.</Paragraph>
    <Paragraph position="24"> Example 3 The judge had to leave town for the day. He found a replacement to take over his morning trial, but couldn't find anyone else that was available. He called the courthouse and cancelled the hearing in the afternoon.</Paragraph>
    <Paragraph position="25"> Moreover, pragmatic effects are not always the determining factor either, leading many people to judge the following sentence as silly (Hirst 1987).</Paragraph>
    <Paragraph position="26"> Example 4 The landlord painted all the walls with cracks (Rayner, Carlson, and Frazier 1983). The presence of different lexical items or different objects in the discourse focus may strengthen or weaken the information provided by an individual rule. Another possibility, which we will discuss in Section 5, is to weigh all preference information dynamically (cf. Schubert 1986; McRoy and Hirst 1990).</Paragraph>
    <Paragraph position="27"> The system we will be describing in Section 4 will use many of the cues described above, including syntactic tags, morphology, word associations, and role-related expectations. But first, we need to discuss the sources of knowledge that enable a system to identify these cues.</Paragraph>
  </Section>
</Paper>