<?xml version="1.0" standalone="yes"?>
<Paper uid="P86-1037">
  <Title>Some Uses of Higher-Order Logic in Computational Linguistics</Title>
  <Section position="3" start_page="247" end_page="248" type="metho">
    <SectionTitle>
2. Higher-Order Logic
</SectionTitle>
    <Paragraph position="0"> The higher-order logic we study here, called T, can be thought of as a subsystem of either Church's Simple Theory of Types [5] or of Montague's intensional logic IL [6]. Unlike Church's or Montague's logics, T is very weak because it assumes no axioms regarding extensionality, definite descriptions, infinity, choice, or possible worlds. T encompasses only the most primitive logical notions, and generalizes first-order logic by introducing stronger notions of variables and substitutions. Our use of T is not driven by a desire to capture the meaning of linguistic objects, as was the hope of Montague. It is our hope that programs written in T will do that.</Paragraph>
    <Paragraph position="1"> The language of T is a typed language. The typing mechanism provides for the usual notion of sorts often used in first-order logic and also for the notion of functional types. We take as primitive types (i.e. sorts) o for booleans and i for (first-order) individuals, adding others as needed.</Paragraph>
    <Paragraph position="2"> Functional types are written as α → β, where α and β are types. This type is intended to denote the type of functions whose domains are α and whose codomains are β.</Paragraph>
    <Paragraph position="3"> For example, i → i denotes the type of functions which map individuals to individuals, and (i → i) → o denotes the type of functions from that domain to the booleans. In reading such expressions we use the convention that → is right associative, i.e. we read α → β → γ as α → (β → γ). The terms or formulas of T are specified along with their respective types by the following simple rules: We start with denumerable sets of constants and variables at each type. A constant or variable in any of these sets is considered to be a formula of the corresponding type. Then, if A is of type α → β and B is of type α, the function application (A B) is a formula of type β. Finally, if x is a variable of type α and C is a term of type β, the function abstraction λx C is a formula of type α → β.</Paragraph>
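The two formation rules can be made concrete in a small Python sketch (a hypothetical encoding, not part of the paper): types and terms are plain data, and a checker enforces that applications and abstractions are built exactly as the rules above prescribe.

```python
from dataclasses import dataclass

# Types: primitive ("i", "o") or arrow types a -> b.
@dataclass(frozen=True)
class Prim:
    name: str

@dataclass(frozen=True)
class Arrow:
    dom: object
    cod: object

i, o = Prim("i"), Prim("o")

# Terms: constants and variables carry their type; App and Abs mirror
# the two formation rules of T.
@dataclass(frozen=True)
class Var:
    name: str
    ty: object

@dataclass(frozen=True)
class Const:
    name: str
    ty: object

@dataclass(frozen=True)
class App:
    fn: object
    arg: object

@dataclass(frozen=True)
class Abs:
    var: Var
    body: object

def type_of(t):
    """Return the type of a well-formed term, enforcing the rules of T."""
    if isinstance(t, (Var, Const)):
        return t.ty
    if isinstance(t, App):                 # (A B) : b  when A : a -> b, B : a
        fty, aty = type_of(t.fn), type_of(t.arg)
        assert isinstance(fty, Arrow) and fty.dom == aty, "ill-typed application"
        return fty.cod
    if isinstance(t, Abs):                 # \x.C : a -> b  when x : a, C : b
        return Arrow(t.var.ty, type_of(t.body))

# Example: a predicate p : i -> o applied to a constant a : i has type o.
p = Const("p", Arrow(i, o))
a = Const("a", i)
x = Var("x", i)
print(type_of(App(p, a)))          # Prim(name='o')
print(type_of(Abs(x, App(p, x))))  # Arrow(dom=Prim(name='i'), cod=Prim(name='o'))
```

The names `Prim`, `Arrow`, `type_of`, and the constants `p` and `a` are all invented for this illustration.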
    <Paragraph position="4"> We assume that the following symbols, called the logical constants, are included in the set of constants of the corresponding type: true of type o, ¬ of type o → o, ∧, ∨, and ⊃ each of type o → o → o, and Π and Σ of type (α → o) → o for each type α. All these symbols except the last two correspond to the normal propositional connectives. The symbols Π and Σ are used in conjunction with the abstraction operation to represent universal and existential quantification: ∀x P is an abbreviation for Π(λx P) and ∃x P is an abbreviation for Σ(λx P). Π and Σ are examples of what are often called generalized quantifiers.</Paragraph>
    <Paragraph position="5"> The type o has a special role in this language. A formula with a function type of the form t₁ → ... → tₙ → o is called a predicate of n arguments. The i-th argument of such a predicate is of type tᵢ. Predicates are to be thought of as representing sets and relations. Thus a predicate of type i → o represents a set of individuals, a predicate of type (i → o) → o represents a set of sets of individuals,  and a predicate of type i → (i → o) → o represents a binary relation between individuals and sets of individuals.</Paragraph>
    <Paragraph position="6"> Formulas of type o are called propositions. Although predicates are essentially functions, we shall generally use the term function to denote a formula that does not have the type of a predicate.</Paragraph>
    <Paragraph position="7"> Derivability in T, denoted by ⊢T, is defined in the following (simplified) fashion. The axioms of T are the propositional tautologies, the formula ∀x Bx ⊃ Bt, and the formula ∀x (Px ∧ Q) ⊃ (∀x Px ∧ Q). The rules of inference of the system are Modus Ponens, Universal Generalization, Substitution, and λ-conversion. The rules of λ-conversion that we assume here are α-conversion (change of bound variables), β-conversion (contraction), and η-conversion (replace A with λz (A z) and vice versa, provided A has type α → β, z has type α, and z is not free in A). λ-conversion is essentially the only rule in T that is not in first-order logic, but combined with the richer syntax of formulas in T it makes more complex inferences possible.</Paragraph>
    <Paragraph position="8"> In general, we shall consider two terms to be equal if they are each convertible to the other; further distinctions can be made between formulas in this sense by omitting the rule for η-conversion, but we feel that such distinctions are not important in our context. We say that a formula is a λ-normal formula if it has the form λx₁ ... λxₙ (h t₁ ... tₘ), where n, m ≥ 0, where h is a constant or variable, (h t₁ ... tₘ) has a primitive type, and, for 1 ≤ i ≤ m, tᵢ also has the same form.</Paragraph>
    <Paragraph position="9"> We call the list of variables x₁, ..., xₙ the binder, h the head, and the formulas t₁, ..., tₘ the arguments of such a formula. It is well known that every formula A can be converted to a λ-normal formula that is unique up to α-conversions. We call such a formula a λ-normal form of A and we use λnorm(A) to denote any of these alphabetic variants. Notice that a proposition in λ-normal form must have an empty binder and contain either a constant or free variable as its head. A proposition in λ-normal form which has a non-logical constant as its head is called atomic.</Paragraph>
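The reduction to λ-normal form can be sketched in Python (a hypothetical illustration, not the paper's machinery), with capture-avoiding substitution implementing β-contraction; for the typed terms of T this process terminates, as the comment below assumes.

```python
import itertools

# Terms as nested tuples: ("var", x), ("app", f, a), ("lam", x, body).
fresh = (f"_v{n}" for n in itertools.count())

def free_vars(t):
    tag = t[0]
    if tag == "var":
        return {t[1]}
    if tag == "app":
        return free_vars(t[1]) | free_vars(t[2])
    x, body = t[1], t[2]
    return free_vars(body) - {x}

def subst(t, x, s):
    """Capture-avoiding substitution [s/x]t."""
    tag = t[0]
    if tag == "var":
        return s if t[1] == x else t
    if tag == "app":
        return ("app", subst(t[1], x, s), subst(t[2], x, s))
    y, body = t[1], t[2]
    if y == x:
        return t
    if y in free_vars(s):                  # alpha-rename to avoid capture
        z = next(fresh)
        body, y = subst(body, y, ("var", z)), z
    return ("lam", y, subst(body, x, s))

def normalize(t):
    """Reduce to beta-normal form (terminates for typed terms)."""
    tag = t[0]
    if tag == "var":
        return t
    if tag == "lam":
        return ("lam", t[1], normalize(t[2]))
    f, a = normalize(t[1]), t[2]
    if f[0] == "lam":                      # beta-contraction
        return normalize(subst(f[2], f[1], a))
    return ("app", f, normalize(a))

# ((\x.\y. h x y) a) b  normalizes to  h a b: empty binder, head h,
# arguments a and b -- exactly the shape described above.
t = ("app", ("app", ("lam", "x", ("lam", "y",
        ("app", ("app", ("var", "h"), ("var", "x")), ("var", "y")))),
        ("var", "a")), ("var", "b"))
print(normalize(t))   # ('app', ('app', ('var', 'h'), ('var', 'a')), ('var', 'b'))
```

The tuple encoding and the helper names are invented for this sketch.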
    <Paragraph position="10"> Our purpose in this paper is not merely to use a logic as a representational device, but also to think of it as a device for specifying computations. It turns out that T is too complex for the latter purpose. We shall therefore restrict our attention to what may be thought of as a higher-order analogue of positive Horn clauses. We define these below.</Paragraph>
    <Paragraph position="11"> We shall henceforth assume that we have a fixed set of nonlogical constants. The positive Herbrand Universe is identified in this context to be the set of all the λ-normal formulas that can be constructed via function application and abstraction using the nonlogical constants and the logical constants true, ∧, ∨ and Σ; the omission here is of the symbols ¬, ⊃, and Π. We shall use the symbol H⁺ to denote this set of terms. Propositions in this set are of special interest to us. Let G and A be propositions in H⁺ such that A is atomic. A (higher-order) definite clause then is the universal closure of a formula of the form G ⊃ A, i.e. the formula ∀x̄ (G ⊃ A) where x̄ is an arbitrary listing of all the free variables in G and A, some of which may be function and predicate variables. These formulas are our generalization of positive Horn clauses for first-order logic. The formula on the left of the ⊃ in a higher-order definite clause may contain nested disjunctions and existential quantification.</Paragraph>
    <Paragraph position="12"> This generalization may be dispensed with in the first-order case because of the existence of appropriate normal forms.</Paragraph>
    <Paragraph position="13"> For the higher-order case, it is more natural to retain the embedded disjunctions and existential quantifications since substitutions for predicate variables have the potential for re-introducing them. Illustrations of this aspect appear in Section 4.</Paragraph>
    <Paragraph position="14"> Deductions from higher-order definite clauses are very similar to deductions from positive Horn clauses in first-order logic. Substitution, unification, and backchaining can be combined to build a theorem prover in either case. However, unification in the higher-order setting is complicated by the presence of λ-conversion: two terms t and s are unifiable if there exists some substitution σ such that σs and σt are equal modulo λ-conversions. Since β-conversion is a very complex process, determining this kind of equality is difficult. The unification of typed λ-terms is, in general, not decidable, and when unifiers do exist, there need not exist a single most general unifier. Nevertheless, it is possible to systematically search for unifiers in this setting [8] and an interpreter for higher-order definite clauses can be built around this procedure. The resulting interpreter can be made to resemble Prolog except that it must account for the extra degree of nondeterminism which arises from higher-order unification. Although there are several important issues regarding the search for higher-order unifiers, we shall ignore them here since all the unification problems which arise in this paper can be solved by even a simple-minded implementation of the procedure described in [8].</Paragraph>
  </Section>
  <Section position="4" start_page="248" end_page="250" type="metho">
    <SectionTitle>
3. λProlog
</SectionTitle>
    <Paragraph position="0"> We have used higher-order definite clauses and a depth-first interpreter to describe a logic programming language called λProlog. We present below a brief exposition of the higher-order features of this language that we shall use in the examples in the later sections. A fuller description of the language and of the logical considerations underlying it may be found in [9].</Paragraph>
    <Paragraph position="1"> Programs in λProlog are essentially higher-order definite clauses. The following set of clauses that define certain standard list operations serve to illustrate some of the syntactic features of our language.</Paragraph>
    <Paragraph position="2">  append nil K K.</Paragraph>
    <Paragraph position="3"> append (cons X L) K (cons X M) :- append L K M.</Paragraph>
    <Paragraph position="4"> member X (cons X L).</Paragraph>
    <Paragraph position="5"> member X (cons Y L) :- member X L.</Paragraph>
    <Paragraph position="6">  As should be apparent from these clauses, the syntax of λProlog borrows a great deal from that of Prolog. Symbols that begin with capital letters represent variables. All other symbols represent constants. Clauses are written backwards and the symbol :- is used for ⊃. There are, however, some differences. We have adopted a curried notation for terms, rather than the notation normally used in a first-order language. Since the language is a typed one, types must be associated with each term. This is done by  either explicitly defining the type of a constant or a variable, or by inferring such a type by a process very similar to that used in the language ML [7]. The type expressions that are attached to symbols may contain variables which provide a form of polymorphism. As an example, cons and nil above are assumed to have the types A -&gt; (list A) -&gt; (list A) and (list A) respectively; they serve to define lists of different kinds, but each list being such that all its elements have a common type. (For the convenience of expression, we shall actually use Prolog's notation for lists in the remainder of this paper, i.e. we shall write (cons X L) as [X|L].) In the examples in this paper, we shall occasionally provide type associations, but in general we shall assume that the reader can infer them from context when it is important. We need to represent λ-abstraction in our language, and we use the symbol \ for this purpose; i.e. λX A is written in λProlog as X\A.</Paragraph>
    <Paragraph position="7"> The following program, which defines the operation of mapping a function over a list, illustrates a use of function variables in our language.</Paragraph>
    <Paragraph position="8"> mapfun F [X|L] [(F X)|K] :- mapfun F L K.</Paragraph>
    <Paragraph position="9"> mapfun F [] [].</Paragraph>
    <Paragraph position="10"> Given these clauses, (mapfun F L1 L2) is provable only if L2 is a list that results from applying F to each element of L1. The interpreter for λProlog would therefore evaluate the goal (mapfun (X\(g X X)) [a,b] L) by returning the value [(g a a), (g b b)] for L.</Paragraph>
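When F is known, mapfun behaves like ordinary functional mapping; a minimal Python sketch (hypothetical, not from the paper) makes the point that only function application, i.e. β-reduction, happens at each element.

```python
def mapfun(f, xs):
    """Map f over xs: each step is just an application, with no
    subgoal invoked (contrast with mappred, defined later)."""
    return [f(x) for x in xs]

# The term X\(g X X) becomes a Python lambda; g builds a first-order term.
g = lambda a, b: ("g", a, b)
print(mapfun(lambda x: g(x, x), ["a", "b"]))   # [('g', 'a', 'a'), ('g', 'b', 'b')]
```

This mirrors the evaluation of (mapfun (X\(g X X)) [a,b] L) returning [(g a a), (g b b)].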
    <Paragraph position="11"> The logical considerations underlying the language permit function variables to be treated as first-class logic programming variables. In other words, the values of such variables can be computed through unification. For example, consider the query (mapfun F [a,b] [(g a a), (g a b)]).</Paragraph>
    <Paragraph position="12"> There is exactly one substitution for F, namely X\(g a X), that makes the above query provable. In searching for such higher-order substitutions, the interpreter for λProlog would need to backtrack over choices of substitutions. For example, if the interpreter attempted to prove the above goal by attempting to unify (F a) with (g a a), it would need to consider the following four possible substitutions for F: X\(g X X), X\(g a X), X\(g X a), and X\(g a a).</Paragraph>
    <Paragraph position="13"> If it chooses any of these other than the second, the interpreter would fail in unifying (F b) with (g a b), and would therefore have to backtrack over that choice.</Paragraph>
    <Paragraph position="14"> It is important to notice that the set of functions that are representable using the typed λ-terms of λProlog is not the set of all computable functions. The set of functions that are so representable is in fact much weaker than those representable in, for example, a functional programming language like Lisp. Consider the goal (mapfun F [a,b] [c,d]).</Paragraph>
    <Paragraph position="15"> There is clearly a Lisp function which maps a to c and b to d, namely, (lambda (x) (if (eq x 'a) 'c (if (eq x 'b) 'd nil))). Such a function is, however, not representable using our typed λ-terms since these do not contain any constants representing conditionals (or fixed-point operators needed for recursive definitions). It is actually this restriction on our term structures that makes the determination of function values through unification a reasonable computational operation. The provision of function variables and higher-order unification has several uses, some of which we shall examine in later sections. Before doing that we consider briefly certain kinds of function terms that have a special status in the logic programming context, namely predicate terms. 4. Predicates as Values From a logical point of view, predicates are not much different from other functions; essentially they are functions that have a type of the form α₁ → ... → αₙ → o. In a logic programming language, however, variables of this type may play a different and more interesting role than non-predicate variables. This is because such variables may appear inside the terms of a goal as well as at the head of a goal. In a sense, they can be used intensionally and extensionally (or nominally and saturated). When they appear intensionally, predicates can be determined through unification just as functions are. When they appear extensionally, they are essentially "executed." An example of these mixed uses of predicate variables is provided by the following set of clauses; the logical connectives ∧ and ∨ are represented in λProlog by the symbols , and ;, true is represented by true and Σ is represented by the symbol sigma that has the polymorphic type (A -&gt; o) -&gt; o.</Paragraph>
    <Paragraph position="16"> sublist P [X|L] [X|K] :- P X, sublist P L K.</Paragraph>
    <Paragraph position="17"> sublist P [X|L] K :- sublist P L K.</Paragraph>
    <Paragraph position="18"> sublist P [] [].</Paragraph>
    <Paragraph position="19"> have_age L K :- sublist Z\(sigma X\(age Z X)) L K.</Paragraph>
    <Paragraph position="20"> same_age L K :- sublist Z\(age Z A) L K.</Paragraph>
    <Paragraph position="21">  age bob 23.</Paragraph>
    <Paragraph position="22"> age sue 24.</Paragraph>
    <Paragraph position="23"> age ned 23.</Paragraph>
    <Paragraph position="24"> The first three clauses define the predicate sublist whose first argument is a predicate and is such that (sublist P L K) is provable if K is some sublist of L and all the members of K satisfy the property expressed by the predicate P. The fourth clause uses sublist to define the predicate have_age which is such that (have_age L K) is provable if K is a sublist of the objects in L which have an age. In the definition of have_age a predicate term that contains an explicit quantifier is used to instantiate the predicate argument of sublist; the predicate (Z\(sigma X\(age Z X))), which may be written in logic as λz ∃x age(z, x), is true of an individual if that individual has an age. This predicate term needs to be executed in the course of evaluating, for example, the query (have_age [bob, sue, ned] K). The predicate same_age, whose definition is obtained by dropping the quantifier from the predicate term, defines a different property; (same_age L K) is true only when the objects in K all have the same age.</Paragraph>
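The behavior of sublist with an executed predicate argument can be mimicked in Python with a backtracking generator (a hypothetical sketch; the facts and helper names below merely restate the example above).

```python
ages = {"bob": 23, "sue": 24, "ned": 23}   # the age facts above

def sublists(p, xs):
    """All sublists of xs (in order) whose members satisfy predicate p,
    mirroring the sublist clauses via backtracking."""
    if not xs:
        yield []
        return
    x, rest = xs[0], xs[1:]
    if p(x):                        # first clause: keep x if p holds of it
        for k in sublists(p, rest):
            yield [x] + k
    yield from sublists(p, rest)    # second clause: skip x

# have_age: the predicate existentially quantifies over the age, so it
# only asks whether some age exists for the individual.
have_age = lambda z: z in ages
print(list(sublists(have_age, ["bob", "sue", "ned", "tom"])))

# same_age: dropping the quantifier leaves the age A free and shared,
# so every member of one answer must have the same age A.
def same_age_sublists(xs):
    results = []
    for a in set(ages.values()):                # try each binding for A
        results += [k for k in sublists(lambda z: ages.get(z) == a, xs) if k]
    return results
print(same_age_sublists(["bob", "sue", "ned"]))
```

With these facts, [bob, ned] is a same_age answer (both 23) while [bob, sue] is not.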
    <Paragraph position="25">  Another example is provided by the following set of clauses that define the operation of mapping a predicate over a list.</Paragraph>
    <Paragraph position="26"> mappred P [X|L] [Y|K] :- P X Y, mappred P L K.</Paragraph>
    <Paragraph position="27"> mappred P [] [].</Paragraph>
    <Paragraph position="28"> This set of clauses may be used, for example, to evaluate the following query: mappred (X\Y\(age Y X)) [23,24] L.</Paragraph>
    <Paragraph position="29"> This query essentially asks for a list of two people, the first of which is 23 years old while the second is 24 years old. Given the clauses that appear in the previous example, this query has two different answers: [bob, sue] and [ned, sue]. Clearly the mapping operation defined here is much stronger than a similar operation considered earlier, namely that of mapping a function over a list. In evaluating a query that uses this set of clauses a new goal, i.e. (P X Y), is formed whose evaluation may require arbitrary computations to be performed. As opposed to this, in the earlier case only λ-reductions are performed. Thus, mappred is more like the mapping operations found in Lisp than mapfun is.</Paragraph>
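The extra power of mappred comes from running a subgoal at each element; a Python backtracking sketch (hypothetical, reusing the age facts above) reproduces both answers of the query.

```python
ages = {"bob": 23, "sue": 24, "ned": 23}

def mappred(p, xs):
    """All output lists Ys with p(x, y) holding pairwise, via backtracking.
    Unlike mapfun, each step solves an arbitrary subgoal p(x, y)."""
    if not xs:
        yield []
        return
    for y in [n for n in ages if p(xs[0], n)]:   # all solutions of p(x, y)
        for ys in mappred(p, xs[1:]):
            yield [y] + ys

# (X\Y\(age Y X)) applied to [23, 24]: find people of those ages in order.
print(list(mappred(lambda x, y: ages[y] == x, [23, 24])))
# [['bob', 'sue'], ['ned', 'sue']]
```

The two results correspond exactly to the answers [bob, sue] and [ned, sue] above.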
    <Paragraph position="30"> In the cases considered above, predicate variables that appeared as the heads of goals were fully instantiated before the goal was invoked. This kind of use of predicate variables is similar to the use of apply and lambda terms in Lisp: λ-contraction followed by the goal invocation simulates the apply operation in the Prolog context. However, the variable head of a goal may not always be fully instantiated when the goal has to be evaluated. In such cases there is a question as to what substitutions should be attempted. Consider, for example, the query (P bob 23). One value that may be returned for P is X\Y\(age X Y), and this may seem to be the most "natural" value. There are, however, many more substitutions for P which also satisfy this goal:</Paragraph>
    <Paragraph position="32"> Terms such as X\Y\(age bob 23) and X\Y\true, among others, could be picked, since if they were substituted for P in the query they would result in a provable goal. There are, clearly, too many substitutions to pick from and perhaps backtrack over. Furthermore several of these may have little to do with the original intention of the query. A better strategy may be to pick the one substitution that has the largest "extension" in such cases; in the case considered here, such a substitution for P would be the term X\Y\true. It is possible to make such a choice without adding to the incompleteness of an interpreter.</Paragraph>
    <Paragraph position="33"> Picking such a substitution does not necessarily trivialize the use of predicate variables. If a predicate occurs intensionally as well as extensionally in a goal, this kind of a trivial substitution may not be possible. To illustrate this let us consider the following set of clauses: primrel father. primrel mother.</Paragraph>
    <Paragraph position="34"> primrel wife.</Paragraph>
    <Paragraph position="35"> primrel husband.</Paragraph>
    <Paragraph position="36"> rel R :- primrel R.</Paragraph>
    <Paragraph position="37"> rel X\Y\(sigma Z\(R X Z, S Z Y)) :- primrel R, primrel S.</Paragraph>
    <Paragraph position="38"> The first four clauses identify four primitive relations between individuals (primrel has type (i -&gt; i -&gt; o) -&gt; o). These are then used to define other relations that are a result of "joining" primitive relations. Now if (mother jane mary) and (wife john jane) are provided as additional clauses, then the query (rel R, R john mary) would yield the substitution X\Y\(sigma Z\(wife X Z, mother Z Y)) for R. This query asks for a relation (in the sense of rel) between john and mary. The answer substitution provides the relation mother-in-law.</Paragraph>
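The "joining" of primitive relations through a shared intermediate can be sketched in Python (hypothetical; relations are modeled as sets of pairs built from the facts just mentioned).

```python
# Primitive relations as sets of pairs, from the additional facts above.
primrel = {
    "mother": {("jane", "mary")},
    "wife":   {("john", "jane")},
    "father": set(),
    "husband": set(),
}

def joins():
    """All relations of the form X\\Y\\(sigma Z\\(R X Z, S Z Y)):
    compose each pair of primitive relations through a shared Z."""
    for r, rpairs in primrel.items():
        for s, spairs in primrel.items():
            pairs = {(x, y) for (x, z1) in rpairs
                            for (z2, y) in spairs if z1 == z2}
            yield (r, s), pairs

# The query (rel R, R john mary): find a composed relation that holds
# between john and mary -- wife followed by mother, i.e. mother-in-law.
answers = [name for name, pairs in joins() if ("john", "mary") in pairs]
print(answers)   # [('wife', 'mother')]
```

The single answer (wife, mother) corresponds to the substitution X\Y\(sigma Z\(wife X Z, mother Z Y)).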
    <Paragraph position="39"> We have been able to show (Theorem 1 of [9]) that any proof in T of a goal formula from a set of definite clauses which uses a predicate term containing the logical connectives ¬, ⊃, or Π can be converted into another proof in which only predicate terms from H⁺ are used. Thus, it is not possible for a predicate term containing these connectives</Paragraph>
    <Paragraph position="41"> to be specified by a λProlog program, i.e. be the unique substitution which makes some goal provable from some set of definite clauses. This is because a consequence of our theorem is that if such a term is an answer substitution then there is also another λ-term that does not use implications or universal quantifications that can be used to satisfy the given goal. If an understanding of a richer set of predicate constructions is desired, then one course is to leave definite clause logic for a stronger logic. An alternative approach, which we use in Section 6, is to represent predicates as function terms whose types do not involve o.</Paragraph>
    <Paragraph position="42"> This, of course, means that such predicate constructions could not be the head of goals. Hence, additional definite clauses would be needed to interpret the meaning of these encoded predicates.</Paragraph>
  </Section>
  <Section position="5" start_page="250" end_page="251" type="metho">
    <SectionTitle>
5. A Simple Parsing Example
</SectionTitle>
    <Paragraph position="0"> The enriched term structure of λProlog provides two facilities that are useful in certain contexts. The notion of λ-abstraction allows the representation of binding a variable over a certain expression, and the notion of application together with λ-contraction captures the idea of substitution. A situation where this might be useful is in representing expressions in first-order logic as terms, and in describing logical manipulations on them. Consider, for example, the task of representing the formula ∀x∃y(P(x, y) ⊃ Q(y, x)) as a term. Fragments of this formula may be encoded into first-order terms, but there is a genuine problem with representing the quantification. We need to represent the variable being quantified as a genuine variable, since, for instance, instantiating the quantifier involves substituting for the variable. At the same time we desire to distinguish between occurrences of a variable within the scope of the quantifier and occurrences outside of it. The mechanism of λ-abstraction provides the tool needed to make such distinctions. To illustrate this let us consider how the formula above may be encoded as a λ-term. Let the primitive type b be the type of terms that represent first-order formulas. Further let us assume we have the constants &amp; and =&gt; of type b -&gt; b -&gt; b, and all  and some of type (i -&gt; b) -&gt; b. These latter two constants have the type of generalized quantifiers and are in fact used to represent quantifiers. The λ-term (all X\(some Y\(p X Y =&gt; q Y X))) may be used to represent the above formula.</Paragraph>
    <Paragraph position="1"> The type b should be thought of as a term-level encoding of the boolean type o.</Paragraph>
    <Paragraph position="2"> A more complete illustration of the facilities alluded to above may be provided by considering the task of translating simple English sentences into logical forms. As an example, consider translating the sentence "Every man loves a woman" to the logical form ∀x (man(x) ⊃ ∃y (woman(y) ∧ loves(x, y))),</Paragraph>
    <Paragraph position="4"> which in our context will be represented by the λ-term (all X\(man X =&gt; (some Y\(woman Y &amp; loves X Y)))). A higher-order version of a DCG [10] for performing this task is provided below. This DCG draws on the spirit of Montague Grammars. (See [11] for a similar example.)</Paragraph>
    <Paragraph position="6"> sentence (P1 P2) --&gt; np P1, vp P2, [.].</Paragraph>
    <Paragraph position="7"> np (P1 P2) --&gt; determ P1, nom P2.</Paragraph>
    <Paragraph position="8"> np P --&gt; propernoun P.</Paragraph>
    <Paragraph position="9"> nom P --&gt; noun P.</Paragraph>
    <Paragraph position="10"> nom X\((P1 X) &amp; (P2 X)) --&gt; noun P1, relcl P2.</Paragraph>
    <Paragraph position="11"> vp X\(P2 Y\(P1 X Y)) --&gt; transverb P1, np P2. vp P --&gt; intransverb P.</Paragraph>
    <Paragraph position="12"> relcl P --&gt; [that], vp P.</Paragraph>
    <Paragraph position="13"> determ P1\P2\(all X\(P1 X =&gt; P2 X)) --&gt; [every]. determ P1\P2\(some X\(P1 X &amp; P2 X)) --&gt; [a]. determ P1\P2\(P2 (iota P1)) --&gt; [the]. noun man --&gt; [man].</Paragraph>
    <Paragraph position="14"> noun woman --&gt; [woman].</Paragraph>
    <Paragraph position="15"> propernoun P\(P john) --&gt; [john].</Paragraph>
    <Paragraph position="16"> propernoun P\(P mary) --&gt; [mary].</Paragraph>
    <Paragraph position="17"> transverb loves --&gt; [loves].</Paragraph>
    <Paragraph position="18"> transverb likes --&gt; [likes].</Paragraph>
    <Paragraph position="19"> intransverb lives --&gt; [lives]. We use above the type token for English words; the DCG translates a list of such tokens to a term of some corresponding type. In the last few clauses certain constants are used in an overloaded manner. Thus the constant man corresponds to two distinct constants, one of type token and another of type i -&gt; b. We have also used the symbol iota that has type (i -&gt; b) -&gt; i. This constant plays the role of a definite description operator; it picks out an individual given a description of a set of individuals. Thus, parsing the sentence "The woman that loves john likes mary" produces the term (likes (iota X\(woman X &amp; loves X john)) mary), the intended meaning of which is the predication of the relationship of liking between an object that is picked out by the description X\(woman X &amp; loves X john) and mary.</Paragraph>
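The semantic side of this grammar can be sketched in Python as Montague-style composition (a hypothetical illustration, not the paper's λProlog): object-level terms are tuples, quantifier bodies are Python functions standing in for abstractions, and the tags imp and and stand in for the implication and conjunction constants of type b.

```python
def show(t, d=0):
    """Print a term, instantiating each binder with a fresh variable name."""
    if t[0] in ("all", "some"):
        v = f"X{d}"
        return f"({t[0]} {v}\\{show(t[1](v), d + 1)})"
    if t[0] in ("imp", "and"):
        return f"({show(t[1], d)} {t[0]} {show(t[2], d)})"
    return "(" + " ".join(map(str, t)) + ")"

# Lexicon: nouns are predicates; determiners map a noun predicate to a
# type-raised noun phrase, as the determ entries in the grammar do.
man   = lambda x: ("man", x)
woman = lambda x: ("woman", x)
loves = lambda x, y: ("loves", x, y)
every = lambda p: lambda q: ("all",  lambda x: ("imp", p(x), q(x)))
a     = lambda p: lambda q: ("some", lambda x: ("and", p(x), q(x)))

# "Every man loves a woman": the sentence meaning is the NP applied to
# the VP, and the VP feeds each subject x to the object NP.
np1 = every(man)
vp  = lambda x: a(woman)(lambda y: loves(x, y))
print(show(np1(vp)))
# (all X0\((man X0) imp (some X1\((woman X1) and (loves X0 X1)))))
```

All function names here are invented for the sketch; the printed term has the same shape as the λ-term the DCG computes.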
    <Paragraph position="20"> Using this DCG to parse a sentence illustrates the role that abstraction and application play in realizing the notion of substitution. It is interesting to compare this DCG with the one in Prolog that is presented in [10]. The first thing to note is that the two will parse a sentence in nearly identical fashions. In the first-order version, however, there is a need to explicitly encode the process of substitution, and considerable ingenuity must be exercised in devising grammar rules that take care of this process. In contrast, in λProlog the process of substitution and the process of parsing are handled by two distinct mechanisms, and consequently the resulting DCG is more perspicuous and so also easier to extend.</Paragraph>
    <Paragraph position="21"> The DCG presented above may also be used to solve the inverse problem, namely that of obtaining a sentence given a logical form, and this illustrates the use of higher-order unification. Consider the task of obtaining a sentence from the logical form (all X\(man X =&gt; (some Y\(woman Y &amp; loves X Y)))). The first grammar rule requires this term to be unified with a term of the form (P1 P2), and one unifier sets P1 to P\(all X\(man X =&gt; P X)) and P2 to X\(some Y\(woman Y &amp; loves X Y)).</Paragraph>
    <Paragraph position="23"> Once this unifier is picked, the task then breaks into that of obtaining a noun phrase from P\(all X\(man X =&gt; P X)) and a verb phrase from X\(some Y\(woman Y &amp; loves X Y)).</Paragraph>
    <Paragraph position="24"> The use of higher-order unification thus seems to provide a top-down decomposition in the search for a solution. This view turns out to be a little simplistic, however, since unification permits more structural decompositions than are warranted in this context. Thus, another unifier for the pair considered above is the one that sets P1 to P\P and P2 to the entire term (all X\(man X =&gt; (some Y\(woman Y &amp; loves X Y)))),</Paragraph>
    <Paragraph position="26"> which does not correspond to a meaningful decomposition in the context of the rest of the rules. It is possible to prevent such decompositions by anticipating the rest of the grammar rules. Alternatively decompositions may be eschewed altogether; a logical form may be constructed bottom-up and compared with the given one. The first alternative detracts from the clarity, or the specificational nature, of the solution. The latter involves an exhaustive search over the space of all sentences. The DCG considered here, together with higher-order unification, seems to provide a balance between clarity and efficiency.</Paragraph>
    <Paragraph position="27"> The final point to be noted is that the terms that are produced at intermediate stages in the parsing process are logically meaningful terms, and computations on such terms may be encoded in other clauses in our language. In Section 7, we show how some of these terms can be directly interpreted as frame-like objects.</Paragraph>
  </Section>
</Paper>