<?xml version="1.0" standalone="yes"?>
<Paper uid="J92-2004">
  <Title>Inheritance in Natural Language Processing</Title>
  <Section position="2" start_page="0" end_page="209" type="abstr">
    <SectionTitle>
1. Introduction
</SectionTitle>
    <Paragraph position="0"> Imagine that you are a linguistic innocent setting out to build a computer lexicon for English. You begin by encoding everything you know about the verb love and then turn your attention to the verb hate. Although the two are antonyms, most of the properties that you have listed for love will show up again in your list for hate. Your first thought is to put this list of common properties into an editor macro, to spare yourself the laborious task of typing them all in each time you add another verb. But it soon becomes clear that this strategy will lead to a huge representation for your lexicon, and one that keeps saying the same thing again and again. Your second thought is to put the common property list in just one place and call it, say, TRANSITIVE VERB. Then you amend what you have entered for love and hate so that all the common material is replaced by a notation indicating that each is a transitive verb. This works well, and you add a couple of thousand more English verbs without difficulty. It is only when you reach elapse and expire that you find yourself faced once more with the tedious task of typing out full lists of properties, since these two verbs cannot be accurately represented by a reference to the TRANSITIVE VERB property list. Looking at the entries for these two anomalous verbs induces a feeling of déjà vu. They too have many properties in common, just not exactly the same set of common properties as hate and love and their siblings. Following the strategy that worked well before, you gather their common properties together, give them the name INTRANSITIVE VERB, strip the duplicated material from the entries for elapse and expire, and replace it with a notation that points to your list of intransitive verb properties. [Figure: Monotonic single inheritance.]</Paragraph>
    <Paragraph position="1"> As you inspect your handiwork, you notice that the lists of properties associated with TRANSITIVE VERB and INTRANSITIVE VERB now exhibit exactly the kind of duplication that you first saw when you wrote down your entries for love and hate. Indeed, their commonalities outnumber their differences. Once again you invoke the style of solution that you have used before: you collect the common properties together, give the collection the name VERB, and then rework your formulation of TRANSITIVE VERB and INTRANSITIVE VERB so as to strip the shared material and replace it with a notation indicating that each is an instance of VERB.</Paragraph>
    <Paragraph position="2"> Although you may not realize it, what you have done is build an inheritance network to represent the information that you are including in your lexicon (see Figure 1). The root node of this network is VERB, and it has two daughters, TRANSITIVE VERB and INTRANSITIVE VERB, which inherit all the properties associated with the root. Each of these two nodes has further daughters (Love, Elapse, etc.). These leaf nodes inherit all the properties of VERB together with all the properties of their immediate parent, and the inherited properties are added to the properties listed as idiosyncratic to the lexical item itself (e.g., the property of being orthographically represented as /l o v e/). This very simple lexical network has two characteristics worth drawing attention to. First, each node has a single parent, so there is only one path through which properties may be inherited. A network of this kind consists either of a single tree of nodes or of a set of (unconnected) trees of nodes, and we will call such a network a single inheritance network.[1] Second, in describing our example, we have been assuming that each node inherits all the properties associated with its parent node, which, in logicians' parlance, means that property inheritance is monotonic.
[1] Two trees are unconnected if and only if they have no nodes in common. For present purposes, a set of unconnected trees can always be trivially converted into an equivalent single tree by adding a new root for all the trees, one that has no properties associated with it.
Walter Daelemans et al. Inheritance in Natural Language Processing</Paragraph>
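To make the monotonic single inheritance regime concrete, here is a minimal Python sketch, assuming a toy encoding in which each node stores its own attribute-value pairs and has at most one parent. The class `Node`, its `properties()` method, and the attribute names (`category`, `subcat`, `past_participle`, `orth`, and so on) are hypothetical and not taken from any system surveyed here. Because inheritance is monotonic, a local value that disagrees with an inherited one is reported as a contradiction rather than allowed to override it.

```python
from typing import Dict, Optional


class Node:
    """A node in a single inheritance network (at most one parent)."""

    def __init__(self, name: str, parent: Optional["Node"] = None,
                 **props: str) -> None:
        self.name = name
        self.parent = parent
        self.local: Dict[str, str] = dict(props)

    def properties(self) -> Dict[str, str]:
        """Monotonic inheritance: every inherited property plus the local ones.

        Nothing may be overridden, so a local value that disagrees with an
        inherited one is a contradiction rather than a refinement.
        """
        inherited = self.parent.properties() if self.parent is not None else {}
        for attr, value in self.local.items():
            if inherited.get(attr, value) != value:
                raise ValueError(f"{self.name}: {attr} = {value} contradicts "
                                 f"inherited {attr} = {inherited[attr]}")
        return {**inherited, **self.local}


# A toy version of the network described above (hypothetical attributes).
verb = Node("VERB", category="verb",
            present_participle="/i n g/", past_participle="/e d/")
transitive = Node("TRANSITIVE VERB", verb, subcat="NP")
intransitive = Node("INTRANSITIVE VERB", verb, subcat="none")
love = Node("Love", transitive, orth="/l o v e/")
elapse = Node("Elapse", intransitive, orth="/e l a p s e/")

print(love.properties()["past_participle"])  # /e d/
print(elapse.properties()["subcat"])         # none
```

Each leaf's full entry is simply the union of the properties along its unique path to the root, which is what makes this kind of network easy to understand.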
    <Paragraph position="3"> Neither single inheritance nor monotonicity is a necessary characteristic of inheritance networks. Suppose you try to add Beat to the network we have been describing. The obvious thing to do is to insert it as a daughter of TRANSITIVE VERB. But then your network is likely to claim that the past participle of Beat is *beated. One potential solution to this problem would be to define a node called EN TRANSITIVE VERB and attach Beat to it as a daughter. However, this strategy simply pushes the problem further up the inheritance tree: EN TRANSITIVE VERB cannot be a daughter of TRANSITIVE VERB, since it contains a property (past participle = /e n/) that is inconsistent with a property associated with the latter (past participle = /e d/). Nor can the new node be attached as a daughter of VERB, for exactly the same reason. It seems, therefore, that the new node may have to be defined wholly from scratch, duplicating all but one of the properties of TRANSITIVE VERB. To avoid this disagreeable conclusion, we might consider another potential solution, in which we remove any reference to the past participle suffix at the level of the VERB node and specify it instead at the level of that node's daughters. At first sight, this appears to be a most attractive option. In fact, by adopting it, we embark on a slippery slope that will end with VERB stripped of almost all the properties canonically associated with verbs. For each property you might expect it to have, if there is a single verb in English that is exceptional with respect to that property, then the property cannot appear at the VERB node. In the case of morphological properties, this is likely to mean that "present participle = /i n g/" is the only property that can be associated with the VERB node. And, in the case of syntactic properties, it is likely to mean that banalities such as "category = verb" will be all we are able to list.</Paragraph>
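The slippery slope can be stated as a simple computation: under monotonic inheritance, a property placed at the root is inherited unchanged by every verb, so the root can hold only those attribute-value pairs for which no verb is exceptional, that is, the intersection of all the entries. The sketch below assumes a hypothetical, fully spelled-out toy lexicon; the function name `root_properties` and the attribute names are illustrative only.

```python
from functools import reduce
from typing import Dict, List


def root_properties(entries: List[Dict[str, str]]) -> Dict[str, str]:
    """Attribute-value pairs that a monotonic root node could carry.

    Every entry below the root inherits the root's properties unchanged,
    so a pair can sit at the root only if no entry is exceptional for it:
    the result is the intersection of all the entries' pairs.
    """
    if not entries:
        return {}
    pair_sets = [set(entry.items()) for entry in entries]
    return dict(reduce(set.intersection, pair_sets))


# Hypothetical, fully spelled-out entries for four verbs.
lexicon = {
    "love": {"category": "verb", "present_participle": "/i n g/",
             "past_participle": "/e d/", "subcat": "NP"},
    "hate": {"category": "verb", "present_participle": "/i n g/",
             "past_participle": "/e d/", "subcat": "NP"},
    "elapse": {"category": "verb", "present_participle": "/i n g/",
               "past_participle": "/e d/", "subcat": "none"},
    "beat": {"category": "verb", "present_participle": "/i n g/",
             "past_participle": "/e n/", "subcat": "NP"},
}

print(sorted(root_properties(list(lexicon.values())).items()))
# [('category', 'verb'), ('present_participle', '/i n g/')]
```

With only four verbs, the root is already reduced to the category and the present participle suffix; adding further exceptional verbs can only shrink the intersection, never enlarge it.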
    <Paragraph position="4"> How are we to avoid these rather dismal alternatives? There are (at least) two possibilities. One is to abandon single inheritance. Suppose we reorganize our network so that TRANSITIVE VERB and INTRANSITIVE VERB encode only syntactic properties of verbs. We then introduce two further nodes, ED VERB and EN VERB, which encode only morphological properties. Then we allow Beat to have both TRANSITIVE VERB and EN VERB as its parents. A network of this kind can no longer be represented as a tree (or set of unconnected trees) and is said to employ multiple inheritance (see Figure 2). Another possibility is to abandon monotonicity. We leave Beat where we first attached it, under TRANSITIVE VERB in our original network, and we associate the property "past participle = /e n/" with it. If inheritance continues to be construed monotonically, then the network will make contradictory claims about the past participle of Beat. But if we adopt a nonmonotonic interpretation of inheritance, in which properties attached to a node take precedence over those inherited from a parent, then no contradiction arises. Such nonmonotonic inheritance is known as "default inheritance" (see Figure 3).</Paragraph>
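The default alternative can be sketched by changing only the lookup rule: instead of accumulating every inherited property, a query walks up the parent chain and returns the value found nearest the node, so a value attached to Beat takes precedence over the one it would otherwise inherit. This is a minimal sketch assuming single parents only; the class, method, and attribute names are hypothetical.

```python
from typing import Dict, Optional


class Node:
    """A single-parent node whose own properties take precedence (defaults)."""

    def __init__(self, name: str, parent: Optional["Node"] = None,
                 **props: str) -> None:
        self.name = name
        self.parent = parent
        self.local: Dict[str, str] = dict(props)

    def lookup(self, attr: str) -> Optional[str]:
        """Default inheritance: walk up the parent chain; the nearest value wins."""
        node: Optional[Node] = self
        while node is not None:
            if attr in node.local:
                return node.local[attr]
            node = node.parent
        return None  # no node on the path supplies the attribute


verb = Node("VERB", category="verb",
            present_participle="/i n g/", past_participle="/e d/")
transitive = Node("TRANSITIVE VERB", verb, subcat="NP")
love = Node("Love", transitive, orth="/l o v e/")
# Beat stays under TRANSITIVE VERB but overrides the inherited default.
beat = Node("Beat", transitive, orth="/b e a t/", past_participle="/e n/")

print(love.lookup("past_participle"))     # /e d/  (inherited default)
print(beat.lookup("past_participle"))     # /e n/  (local value wins)
print(beat.lookup("present_participle"))  # /i n g/
```

Beat keeps every other verbal property it inherits; only the one property it states for itself is overridden, which is exactly what the monotonic encoding could not express without contradiction.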
    <Paragraph position="5"> Monotonic single inheritance networks are easy to build and easy to understand. If one designs a notation for defining them, it is straightforward to state the semantics of that notation: translation into first-order logic, for example, is quite trivial. Unfortunately, for the reasons hinted at in the example above, monotonic single inheritance networks are not well suited to the description of natural languages. As a result, as we shall see below, most researchers who have employed inheritance techniques in NLP have chosen to use default inheritance, multiple inheritance, or, very commonly, both. Networks that employ default and/or multiple inheritance are also quite easy to build, but they are much harder to understand.</Paragraph>
    <Paragraph position="6"> The combination of default and multiple inheritance is especially problematic: "despite a decade of study, with increasingly subtle examples and counterexamples being considered, consensus has yet to emerge regarding the proper treatment of multiple inheritance with cancellations" (Selman and Levesque 1989, p. 1140). Unsurprisingly, the problem has given rise to a large and growing list of publications in the knowledge representation literature (see, e.g., Horty, Thomason, and Touretzky 1990, and references therein). Almost all of this theoretical work has concerned itself with very simple networks that can only say whether or not a monadic property holds of a node in the network. Recently, however, Thomason and Touretzky (1991) have turned their attention to the properties of more expressive networks, potentially capable of encoding what would need to be encoded in any real NLP application. Nonmonotonic inference more generally (i.e., not just in networks) has arguably been the dominant theoretical concern in the AI literature of the late 1980s (as measured, for example, by the proportion of papers on the topic that appeared in Artificial Intelligence over the period).</Paragraph>
    <Paragraph position="7"> One of the key issues in the knowledge representation literature has been how to deal with the default inheritance of mutually contradictory information from two or more parent nodes. Most NLP researchers who have embraced multiple inheritance techniques have avoided this issue by adopting one of two strategies. On the first strategy, information is partitioned between parent nodes. You can, for example, inherit morphological properties from node A and syntactic properties from node B, but no single property can be inherited from more than one parent node. This is known as "orthogonal inheritance." One way of thinking of it is as a set of disjoint single inheritance networks layered on top of each other. On the second strategy, a given property, or set of properties, may potentially be inherited from more than one parent node, but the parents are ordered: the first parent in the ordering that is able to supply the property wins, and contradiction is thus avoided. We will refer to this strategy as "prioritized inheritance."</Paragraph>
    <Paragraph position="8"> The use of inheritance networks in current NLP comes from three rather separate traditions. The first is that of "semantic nets" in AI, which runs from Quillian (1968) through Fahlman's (1979) NETL to the late-1980s monographs of Touretzky (1986) and Etherington (1988). The second is that of data abstraction in programming languages, which has led to (a) object orientation in computer science, with its notions of classes and inheritance as embodied in such languages as Smalltalk, Simula, Flavors, CLOS, and C++, and (b) the use of type hierarchies, which have become widespread in unification-oriented NLP since the appearance of Aït-Kaci (1984) and Cardelli (1984). Of necessity, the type hierarchy work in NLP has remained strictly monotonic. The third is the notion of "markedness" in linguistics, which originates in the Prague School phonology of the 1930s, reappears in the "generative phonology" of Chomsky and Halle (1968) and in Hetzron's (1975) and Jackendoff's (1975) models of the lexicon, and shows up in syntax in the "feature specification defaults" of Gazdar, Klein, Pullum, and Sag (1985).[2] Unlike the other two traditions, the linguistic tradition does not embody a notion of inheritance per se. But the issue of how to decide which operations take precedence over others has been a continuing concern in the literature (see, e.g., Pullum 1979, especially Section 1.4.1, and references therein).</Paragraph>
    <Paragraph position="9"> The consensus view among computational linguists currently working with default inheritance networks, though largely unspoken, appears to be that nodes that are close (or identical) to the root(s) of the network should be used to encode that which is regular, "unmarked," and productive, and that distance from the root(s) should […]. At the very least, this is what emerges from their practice. The differences between the current strands of NLP work in this area are partly philosophical (e.g., whether psycholinguistic data could or should be relevant to the structure of the network), partly methodological (e.g., whether networks should be built in a formal language designed for the purpose or implemented in an existing computer language), partly technical (e.g., whether a negation operator is useful, or whether orthogonal networks are to be preferred to those using prioritized inheritance), and partly theoretical (e.g., the trade-off between the semantic perspicuity of monotonic networks and the expressiveness and concision of their nonmonotonic competitors).
Computational Linguistics Volume 18, Number 2</Paragraph>
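The two strategies for avoiding conflicts under multiple inheritance, orthogonal and prioritized inheritance, can be contrasted on a single toy network. In the minimal sketch below, assuming each node has an ordered list of parents, orthogonal lookup insists that at most one parent supplies a given attribute, while prioritized lookup lets the first parent in the ordering that supplies it win. The node `X` with two morphological parents is hypothetical, as are all class, method, and attribute names; in both lookups a node's own values are consulted before its parents'.

```python
from typing import List, Optional


class Node:
    """A node with an ordered list of parents (multiple inheritance)."""

    def __init__(self, name: str, parents: Optional[List["Node"]] = None,
                 **props: str) -> None:
        self.name = name
        self.parents: List["Node"] = list(parents or [])
        self.local = dict(props)

    def lookup_orthogonal(self, attr: str) -> Optional[str]:
        """Orthogonal inheritance: at most one parent may supply attr."""
        if attr in self.local:
            return self.local[attr]
        found = []
        for parent in self.parents:
            value = parent.lookup_orthogonal(attr)
            if value is not None:
                found.append((parent.name, value))
        if len(found) > 1:
            raise ValueError(f"{self.name}: {attr} inherited from {found}")
        return found[0][1] if found else None

    def lookup_prioritized(self, attr: str) -> Optional[str]:
        """Prioritized inheritance: the first parent that supplies attr wins."""
        if attr in self.local:
            return self.local[attr]
        for parent in self.parents:
            value = parent.lookup_prioritized(attr)
            if value is not None:
                return value
        return None


# Syntactic and morphological properties live on separate parent nodes.
transitive = Node("TRANSITIVE VERB", category="verb", subcat="NP")
ed_verb = Node("ED VERB", present_participle="/i n g/",
               past_participle="/e d/")
en_verb = Node("EN VERB", present_participle="/i n g/",
               past_participle="/e n/")
beat = Node("Beat", [transitive, en_verb], orth="/b e a t/")
# Hypothetical node whose two parents both supply a past participle.
x = Node("X", [en_verb, ed_verb])

print(beat.lookup_orthogonal("past_participle"))  # /e n/
print(x.lookup_prioritized("past_participle"))    # /e n/ (first parent wins)
```

Calling `x.lookup_orthogonal("past_participle")` instead raises `ValueError`, because both parents supply the attribute; the orthogonal design rules such a node out, whereas the prioritized design accepts it and lets the parent ordering decide.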
    <Paragraph position="10"> In the subsequent sections of this paper we will survey the use that computational linguists have made of inheritance networks over the last dozen years. To organize this survey chronologically (e.g., by date of publication) would be to impose a wholly spurious sense of historical continuity on what has, in fact, been a fairly haphazard set of parallel developments. It is tempting to organize the discussion that follows by reference to technical and formal parameters, but the area is too young for that to be possible without a great deal of rather arbitrary taxonomy. So we have chosen to play safe and organize the material by levels of linguistic description. This is not wholly satisfactory, since a significant number of the approaches we discuss have been applied to several levels of description, which means that we have to refer to them in more than one section. But we hope that readers will bear with us.</Paragraph>
  </Section>
</Paper>