<?xml version="1.0" standalone="yes"?> <Paper uid="A92-1001"> <Title>Deriving Database Queries from Logical Forms by Abductive Definition Expansion</Title> <Section position="5" start_page="2" end_page="2" type="metho"> <SectionTitle> 3 Translation Schemas </SectionTitle> <Paragraph position="0"> The ideas sketched out above can be formalised as the inference rules (2), (3) and (4):</Paragraph> <Paragraph position="2"> where θ is a substitution that replaces each yi with a different unique constant.</Paragraph> <Paragraph position="4"> where θ substitutes a unique constant for x.</Paragraph> <Paragraph position="5"> In each of these, the formulas before the :=> are the premises, and the formula after it is the conclusion. The inference rules can be justified within the framework of the sequent calculus (Robinson 1979), though space limitations prevent us from doing so here. (2) is the base case: it gives sufficient conditions for using (1) to expand P1 (the head of the definition) to P' (its body).</Paragraph> <Paragraph position="6"> The other formulas, (3) and (4), are the main recursive cases. (3) expresses expansion of a conjunction in terms of expansion of one of its conjuncts, adding the other conjunct to the environment of assumptions as it does so; (4) expresses expansion of an existentially quantified form in terms of expansion of its body, replacing the bound variables with unique constants. We will refer to inference rules like (3) and (4) as expansion-schemas or just schemas. One or more such schemas must be given for each of the logical operators of the representation language, defining the expansion of a construct built with that operator in terms of the expansion of one of its constituents. The central use of the equivalences is thus as truth-preserving conditional rewriting rules, which license translation of the head into the body in environments where the conditions hold. 
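The recursive behaviour of the schemas described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the CLARE implementation: the tuple encoding of formulas, the names is_var, substitute, conjuncts, match and expand, and the sample rule are all hypothetical. Conditions are "proved" only by literal membership in the environment, both conjuncts are expanded in one pass rather than one atom per step, and the restriction on existentially bound variables in rule heads is not modelled.

```python
import itertools

_ids = itertools.count(1)  # source of unique constant names


def is_var(term):
    # Assumed convention: variables are capitalised strings, constants are not.
    return term[:1].isupper()


def substitute(formula, binding):
    # Apply a term-to-term mapping throughout a formula.
    kind = formula[0]
    if kind == "atom":
        return ("atom", formula[1], tuple(binding.get(t, t) for t in formula[2]))
    if kind == "and":
        return ("and", substitute(formula[1], binding), substitute(formula[2], binding))
    # kind == "exists"; variable shadowing is ignored in this sketch
    return ("exists", formula[1], substitute(formula[2], binding))


def conjuncts(formula):
    # Flatten nested conjunctions into a set of assumptions.
    if formula[0] == "and":
        return conjuncts(formula[1]) | conjuncts(formula[2])
    return {formula}


def match(head, atom):
    # One-way matching of a rule head against a ground atom.
    if head[1] != atom[1] or len(head[2]) != len(atom[2]):
        return None
    binding = {}
    for pattern, value in zip(head[2], atom[2]):
        if is_var(pattern):
            if binding.setdefault(pattern, value) != value:
                return None
        elif pattern != value:
            return None
    return binding


def expand(formula, env, rules):
    kind = formula[0]
    if kind == "exists":
        # Existential case: bound variables become unique constants.
        theta = {v: f"{v.lower()}_{next(_ids)}" for v in formula[1]}
        body = expand(substitute(formula[2], theta), env, rules)
        restore = {const: var for var, const in theta.items()}
        return ("exists", formula[1], substitute(body, restore))
    if kind == "and":
        # Conjunction case: each conjunct is expanded with the other assumed.
        left, right = formula[1], formula[2]
        return ("and",
                expand(left, env | conjuncts(right), rules),
                expand(right, env | conjuncts(left), rules))
    # Base case: rewrite the atom if a rule's conditions hold in env.
    for head, conditions, body in rules:
        binding = match(head, formula)
        if binding is not None and all(substitute(c, binding) in env for c in conditions):
            return substitute(body, binding)
    return formula


# Hypothetical rule: woman1(P) rewrites to DB_EMPLOYEE(P,w,HasCar) when employee1(P) holds.
RULES = [(
    ("atom", "woman1", ("P",)),
    [("atom", "employee1", ("P",))],
    ("exists", ("HasCar",), ("atom", "DB_EMPLOYEE", ("P", "w", "HasCar"))),
)]
QUERY = ("exists", ("C",),
         ("and", ("atom", "woman1", ("C",)), ("atom", "employee1", ("C",))))
RESULT = expand(QUERY, frozenset(), RULES)
```

In this sketch woman1(C) is rewritten because employee1(C) is available as an assumption from the sibling conjunct; with no such assumption in the environment the atom is left unchanged.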
There is a second use of the equivalences as normal Horn-clauses, which as we soon shall see is also essential to the translation process. An equivalence of the form P1 ∧ P2 ∧ ... ↔ Q1 ∧ Q2 ∧ ...</Paragraph> <Paragraph position="7"> implies the validity, for any i, of all Horn-clauses either of the form</Paragraph> <Paragraph position="9"> We will refer to these, respectively, as normal and backward Horn-clause readings of the equivalence. For example, the rule and(man1(X),employee1(X)) <-> exists([HasCar],employee(X,m,HasCar)) produces two normal Horn-clause readings, man1(X) <- employee(X,m,HasCar).</Paragraph> <Paragraph position="10"> employee1(X) <- employee(X,m,HasCar).</Paragraph> <Paragraph position="11"> and one backward Horn-clause reading, employee(X,m,sk1(X)) <- man1(X),employee1(X). where sk1 is a Skolem function. Note that in the equivalential reading, as well as in the backward one, it is essential to distinguish between existential and universal quantification of variables on the left-hand side. The equivalential reading of a rule of type p(X,Y) <-> q(Y) licenses, for example, expansion of p(a,b) to q(b); the justification for this is that q(b) implies p(X,b) for any value of X. However, if the rule is changed to exists([X],p(X,Y)) <-> q(Y) the expansion is no longer valid, since q(b) only implies that p(X,b) is valid for some value of X, and not necessarily for a. This pair of examples should clarify why the constants involved in schema (2) must be unique. We are now in a position to explain the basic expansion process; in the interests of expositional clarity, we will postpone mention of the abductive proof mechanism until section 6. 
Our strategy is to use (2) and the expansion-schemas as the kernel of a system that allows expansion of logical forms, using the equivalences as expandable complex definitions.</Paragraph> <Paragraph position="12"> The actual process of expansion of a complex formula F is a series of single expansion steps, each of which consists of the expansion of an atomic constituent of F. An expansion step contains the following sub-steps: Recurse: descend through F using the expansion-schemas, until an atomic sub-formula A is reached. During this process, an environment E has been accumulated in which conditions will be proved, and some bound variables will have been replaced by unique constants.</Paragraph> <Paragraph position="13"> Translate: find a rule ∃yi.(H ∧ C) ↔ B such that (i) H (the 'head') unifies with A with m.g.u. θ, and (ii) θ pairs the yi only with unique constants in A deriving from existentially bound variables. If it is then possible to prove θ(C) in E, replace A with θ(B).</Paragraph> <Paragraph position="14"> Simplify: if possible, apply simplifications to the resulting formula.</Paragraph> </Section> <Section position="6" start_page="2" end_page="3" type="metho"> <SectionTitle> 4 A Simple Example </SectionTitle> <Paragraph position="0"> We now present a simple example to illustrate how the process works.</Paragraph> <Paragraph position="1"> In CLARE, the sentence (S2) (S2) Do any women work on CLARE? receives the LF exists([C,E], and(woman1(C), work_on1(E,C,clare))) This has to be mapped to a query which accesses two database relations, DB_EMPLOYEE(Emp1,Sex,HasCar) and DB_PROJECT_MEMBER(Emp1,Project); the desired result is thus: exists([C,H], and(DB_EMPLOYEE(C,w,H), DB_PROJECT_MEMBER(clare,C))) (Sex can be w or m). The most clearly non-trivial part is justifying the conversion between the linguistic relation woman1(X) and the database relation DB_EMPLOYEE(X,w,_). 
Even in the limited PRM domain, it is incorrect to state that &quot;woman&quot; is equivalent to &quot;employee classed as being of female sex&quot;; there are for example large numbers of women who are listed in the DB_PAYEE relation as having been the recipients of payments. It is more correct to say that a tuple of type DB_EMPLOYEE(X,w,_) is equivalent to the conjunction of two pieces of information: firstly that X is a woman, and secondly that she is an employee. This can be captured in the rule and(woman1(Person), employee1(Person)) <-> exists([HasCar], and(DB_EMPLOYEE(Person,w,HasCar))) (EQ1) In the left-to-right direction, the rule can be read as &quot;woman1(X) translates to DB_EMPLOYEE(X,w,_), in contexts where it is possible to prove employee1(X).&quot; For the rule to be of use in the present example, we must therefore provide a justification for employee1(X)'s holding in the context of the query. The simplest way to ensure that this is so is to provide a Horn-clause meaning postulate, employee1(X) <- DB_PROJECT_MEMBER(Project,X). (HC1) which encodes the fact that project members are employees. Similarly, we will need an equivalence rule to convert between work_on1 and DB_PROJECT_MEMBER. Here the fact we want to state is that project-members are pre- [...] since this will allow us to infer (by looking in the database) that the predicate project1 holds of clare. Two expansions now produce the desired transformation; in each, the schemas (4) and (3) are used in turn to reduce to the base case of expanding an atom. Remember that schema (4) replaces variables with unique constants; when displaying the results of such a transformation, we will consistently write X* to symbolize the new constant associated with the variable X.</Paragraph> <Paragraph position="2"> The first atom to be expanded is woman1(C*), and the corresponding environment of assumptions is {work_on1(E*,C*,clare)}. 
woman1(C*) unifies with the head of the rule (EQ1), making its conditions employee1(C*). Using the Horn-clause meaning postulate (HC1), this can be reduced to DB_PROJECT_MEMBER(Project,C*). Note that C* in this formula is a constant, while Project is a variable. This new goal can now be reduced again, by applying the rule (EQ2) as a backwards Horn-clause, to and(work_on1(Event,C*,Project), project1(Project)). The first conjunct can be proved from the assumptions, instantiating Project to clare; the second conjunct can now be derived from the normal Horn-clause reading of rule (EQ3), together with the fact that clare is listed as a project in the database. This completes the reasoning that justifies expanding woman1(C) in the context of this query, to exists([HasCar], and(DB_EMPLOYEE(C,w,HasCar))) The second expansion is similar; the atom to be expanded here is work_on1(E*,C*,clare), and the environment of assumptions is {woman1(C*)}. Now the rule (EQ2) can be used; its conditions after unification with the head are project1(clare), the validity of which follows from another application of (EQ3). So work_on1(E,C,clare) can be expanded to DB_PROJECT_MEMBER(clare,C), giving the desired result.</Paragraph> </Section> <Section position="7" start_page="3" end_page="3" type="metho"> <SectionTitle> 5 Existential Quantification </SectionTitle> <Paragraph position="0"> We have so far given little justification for the complications introduced by existential quantification on the left-hand sides of equivalences. These become important in connection with the so-called &quot;Doctor on Board&quot; problem (Perrault and Grosz, 1988), which in our domain can be illustrated by a query like (S3), (S3) Does Mary have a car? This receives the LF exists([C,E], and(car1(C), have1(E,mary,C))) for which the intended database query will be exists([S], DB_EMPLOYEE(mary,S,y)) if Mary is listed as an employee. 
However, we also demand that a query like (S4) (S4) Which car does Mary have? should be untranslatable, since there is clearly no way to extract the required information from the DB_EMPLOYEE relationship.</Paragraph> <Paragraph position="1"> The key equivalence is (EQ4) which defines the linguistic predicate car1. When used in the context of (S3), (EQ4) can be applied in exactly the same way as (EQ2) and (EQ3) were in the previous example; the condition have1(E,P,C) will be proved by looking at the other conjunct, and employee1(mary) by referring to the database. The substitution used to match the car1 predication from the LF with the head of (EQ4) fulfills the conditions on the translate step of the expansion procedure: the argument of car1 is bound by an existential quantifier both in the LF and in (EQ4). In (S4), on the other hand, car1 occurs in the LF in a context where its argument is bound by a find quantifier, which is regarded as a type of universal. The matching substitution will thus be illegal, and translation will fail as required.</Paragraph> </Section> <Section position="8" start_page="3" end_page="3" type="metho"> <SectionTitle> 6 Abductive Expansion </SectionTitle> <Paragraph position="0"> We now turn to the topic of abductive expansion. As pointed out in section 1, it is normally impossible to justify an equivalence between an LF and a database query without making use of a number of implicit assumptions, most commonly ones stemming from the hypothesis that the LF should be interpretable within the given domain.</Paragraph> <Paragraph position="1"> The approach we take here is closely related to that pioneered by Hobbs and his colleagues (Hobbs et al. 1988). We include declarations asserting that certain goals may be assumed without proof during the process of justifying conditions; each such declaration associates an assumption cost with a goal of this kind, and proofs with low assumption cost are preferred. 
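The preference for proofs with low assumption cost can be sketched in Python as follows. This is an illustrative sketch only, assuming propositional goals and acyclic Horn clauses; the function names prove and within_budget, the goal names and the cost values are hypothetical and are not taken from CLARE.

```python
def prove(goal, facts, clauses, assumable):
    # Return (cost, assumptions) for the cheapest proof found, or None.
    if goal in facts:
        return (0, [])
    candidates = []
    if goal in assumable:
        # The goal may be assumed without proof at its declared cost.
        candidates.append((assumable[goal], [goal]))
    for body in clauses.get(goal, []):
        total, used = 0, []
        for subgoal in body:
            result = prove(subgoal, facts, clauses, assumable)
            if result is None:
                break
            total += result[0]
            used = used + result[1]
        else:
            # Every subgoal of this clause body was established.
            candidates.append((total, used))
    if not candidates:
        return None
    return min(candidates, key=lambda c: c[0])


def within_budget(result, max_cost):
    # Reject proofs whose total assumption cost exceeds the limit.
    return result is not None and not result[0] > max_cost


# Hypothetical clauses: a condition follows either from explicit evidence
# or from a default assumption that carries a cost.
CLAUSES = {"condition": [["explicit_evidence"], ["default_assumption"]]}
ASSUMABLE = {"default_assumption": 15}

CHEAP = prove("condition", {"explicit_evidence"}, CLAUSES, ASSUMABLE)
COSTLY = prove("condition", set(), CLAUSES, ASSUMABLE)
```

The list of assumptions returned alongside the cost is what would allow a system of this kind to report to the user which assumptions were relied on.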
So for example the meaning postulate relating the linguistic predicate payment1 and the intermediate predicate transaction is and(payment1(Trans), payment_from_SRI(Trans)) <-> exists([Cheque,Date,Payee], transaction(Trans,Cheque,Date,Payee)) (EQ5) &quot;transactions are payments from SRI&quot; and there is also a Horn-clause meaning postulate</Paragraph> <Paragraph position="3"> and an assumption declaration assume(payments_referred_to_are_from_SRI, cost(0)) The advantage of this mechanism (which may at first sight seem rather indirect) is that it makes it possible explicitly to keep track of when the assumption payments_referred_to_are_from_SRI has been used in the course of deriving a database query from the original LF. Applied systematically, it allows a set of assumptions to be collected in the course of performing the translation; if required, CLARE can then inform the user as to their nature. In the current version of the PRM application, there are about a dozen types of assumption that can be made. Most of these are similar to the one shown above: that is to say, they are low-cost assumptions that cheques, payments, projects and so on are SRI-related.</Paragraph> <Paragraph position="4"> One type of assumption, however, is sufficiently different as to deserve explicit mention. These are related to the problem, mentioned in Section 1, of queries &quot;contingently&quot; outside the database's domain. The PRM database, for instance, is limited in time, only containing records of transactions carried out over a specified eighteen-month period. Reflecting this, meaning postulates distinguish between the two predicates transaction and DB_TRANSACTION, which respectively are intended to mean &quot;A transaction of this type took place&quot; and &quot;A transaction of this type is recorded in the database&quot;. 
The meaning postulate linking them is</Paragraph> <Paragraph position="6"> The interesting thing about (HC2) is that the information needed to prove the condition transaction_data_available(Date) is sometimes, though not always, present in the LF. It will be present in a query like (S1), which explicitly mentions a period; there are further axioms that allow the system to infer in these circumstances that the conditions are fulfilled. However, a query like (S5), (S5) Show the largest payment to Cow's Milk.</Paragraph> <Paragraph position="7"> contains no explicit mention of time. To deal with sentences like (S5), there is a meaning postulate</Paragraph> <Paragraph position="9"> 31/3/91).</Paragraph> <Paragraph position="10"> with an associated assumption declaration assume(payments_referred_to_made_between(17/8/89, 31/3/91), cost(15)).</Paragraph> <Paragraph position="11"> The effect of charging the substantial cost of 15 units for the assumption (the maximum permitted cost for an expansion step being 20) is in practice strongly to prefer proofs where it is not used; the net result from the user's perspective is that s/he is informed of the contingent temporal limitation of the database only when it is actually relevant to answering a query. This has obvious utility in terms of increasing the interface's user-friendliness.</Paragraph> </Section> <Section position="9" start_page="3" end_page="5" type="metho"> <SectionTitle> 7 Simplification Using Functional </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="3" end_page="5" type="sub_section"> <SectionTitle> Information </SectionTitle> <Paragraph position="0"> A problem arising from the definition-expansion process which we have so far not mentioned is that the database queries it produces tend to contain a considerable amount of redundancy. 
For example, we shall see below in section 9 that the database query derived from sentence (S1) originally contains three separate instances of the transaction relation, one from each of the original linguistic predicates payment1, make2 and during1. Roughly speaking, payment1(Ev) expands to transaction(Ev,_,_,_), make2(Ev,Ag,P,To) to transaction(Ev,_,To,_) and during_Temporal(Ev,Date) to transaction(Ev,_,_,Date); the database query will conjoin all three of these together. It is clearly preferable, if possible, to merge them instead, yielding a composite predication transaction(Ev,_,To,Date).</Paragraph> <Paragraph position="1"> Our framework allows an elegant solution to this problem if a little extra declarative information is provided, specifically information concerning functional relationships in predicates. The key fact is that transaction is a function from its first argument (the transaction identifier) to the remaining ones (the cheque number, the payee and the date). The system allows this information to be entered as a &quot;function&quot; meaning postulate in the form function(transaction(Id,ChequeNo,Payee,Date),</Paragraph> <Paragraph position="3"> This is treated as a concise notation for the meaning</Paragraph> <Paragraph position="5"> which is just a conditional form of the equivalential meaning postulates already described. It is thus possible to handle &quot;merging&quot; simplification of this kind, as well as definition expansion, with a uniform mechanism.</Paragraph> <Paragraph position="6"> In the current version of the system, the transformation process operates in a cycle, alternating expansions followed by simplifications using the same basic interpreter; simplification consists of functional &quot;merging&quot; followed by reduction of equalities where this is applicable.</Paragraph> <Paragraph position="7"> The simplification process is even more important when processing assertions. 
Consider, for example, what would happen to the pair of sentences (S6) - (S7) without simplification: (S6) Clara is an employee who has a car.</Paragraph> <Paragraph position="8"> (S7) Clara is a woman.</Paragraph> <Paragraph position="9"> (S6) translates into the database form exists([A,B], DB_EMPLOYEE(clara,A,y)) (The second field in DB_EMPLOYEE indicates sex, and the third whether or not the employee has a company car). This can then be put into Horn-clause form as DB_EMPLOYEE(clara,sk1,y) and asserted into the Prolog database. Since Clara is now known to be an employee, (S7) will produce the unit clause DB_EMPLOYEE(clara,w,sk2) The two clauses produced would contain all the information entered, but they could not be entered into a relational database as they stand; a normal database has no interpretation for the Skolem constants sk1 and sk2. However, it is possible to use function information to merge them into a single record. The trick is to arrange things so that the system can when necessary recover the existentially quantified form from the Skolemized one; all assertions which contain Skolem constants are kept together in a &quot;local cache&quot;. Simplification of assertions then proceeds according to the following sequence of steps: 1. Retrieve all assertions from the local cache. 2. Construct a formula A, which is their logical conjunction. 3. Let A0 be A, and let {sk1...skn} be the Skolem constants in A. For i = 1 ... n, let xi be a new variable, and let Ai be the formula ∃xi.Ai-1[ski/xi], i.e. the result of replacing ski with xi and quantifying existentially over it.</Paragraph> <Paragraph position="10"> 4. Perform normal function merging on An, and call the result A'.</Paragraph> <Paragraph position="11"> 5. Convert A' into Horn-clause form, and replace the result in the local cache.</Paragraph> <Paragraph position="12"> In the example above, this works as follows. 
After (S6) and (S7) have been processed, the local cache contains the clauses DB_EMPLOYEE(clara,sk1,y) DB_EMPLOYEE(clara,w,sk2) A = A0 is then the formula and(DB_EMPLOYEE(clara,sk1,y), DB_EMPLOYEE(clara,w,sk2)) and A2 is exists([X1,X2], and(DB_EMPLOYEE(clara,X1,y), DB_EMPLOYEE(clara,w,X2))) Since DB_EMPLOYEE is declared functional on its first argument, the second conjunct is reduced to two equalities: giving the formula exists([X1,X2], and(DB_EMPLOYEE(clara,X1,y)</Paragraph> <Paragraph position="14"> which finally simplifies to A', DB_EMPLOYEE(clara,w,y), a record without Skolem constants, which can be added to a normal relational database.</Paragraph> </Section> <Section position="2" start_page="5" end_page="5" type="sub_section"> <SectionTitle> 8 Search Strategies for Definition Expansion </SectionTitle> <Paragraph position="0"> This section describes the problems that must be solved at the implementation level if the definition-expansion scheme is to work with acceptable efficiency. The structure of the top loop in the definition-expansion process is roughly that of a Prolog meta-interpreter, whose clauses correspond to the &quot;expansion-schemas&quot; described in section 3.</Paragraph> <Paragraph position="1"> The main predicate in the expansion interpreter contains an argument used to pass the environment of assumptions, which corresponds to the Conds in the schemas above. The interpreter successively reduces the formula to be expanded to a sub-formula, possibly adding new hypotheses to the environment of assumptions. When an atomic formula is reached, the interpreter attempts to find an equivalence with a matching head (where &quot;matching&quot; includes the restrictions on quantification described at the end of section 3), and if it does so then attempts to prove the conditions. 
If a proof is found, the atom is replaced by the body of the selected equivalence.</Paragraph> <Paragraph position="2"> The computationally expensive operation is that of proving the conditions; since inference uses the equivalences in both directions, it can easily become very inefficient. The development of search techniques for making this type of inference tractable required a significant effort, though their detailed description is beyond the scope of this paper. Very briefly, two main strategies are employed. Most importantly, the application of &quot;backward&quot; Horn-clause readings of equivalences is restricted to cases similar to that illustrated in section 4, where there are dependencies between the expansion of two or more conjuncts. In addition to this, there are a number of heuristics for penalizing expenditure of effort on branches judged likely to lead to infinite recursion or redundant computation.</Paragraph> <Paragraph position="3"> For the project resource management domain, which currently has 165 equivalence rules, the time taken for query derivation from LF is typically between 1 and 10 seconds under Quintus Prolog on a Sun SPARCstation 2.</Paragraph> </Section> </Section> <Section position="10" start_page="5" end_page="5" type="metho"> <SectionTitle> 9 A Full Example </SectionTitle> <Paragraph position="0"> In this section, we will present a more elaborate illustration of CLARE's current capabilities in this area, showing how the process of definition expansion works for the sentence (S1). This initially receives an LF which after some simplification has the form find([PayEv], exists([Payer,MakeEv], and(payment1(PayEv), and(make2(MakeEv,Payer,PayEv,bt), during1(PayEv, interval(date(1990,1,1), date(1990,12,31))))))) As already indicated, the resulting database query will have as its main predicate the relation DB_TRAN-</Paragraph> </Section> </Paper>