<?xml version="1.0" standalone="yes"?>
<Paper uid="C02-1128">
  <Title>Text Authoring, Knowledge Acquisition and Description Logics</Title>
  <Section position="1" start_page="0" end_page="67" type="abstr">
    <SectionTitle>
Abstract
</SectionTitle>
    <Paragraph position="0"> We present a principled approach to the problem of connecting a controlled document authoring system with a knowledge base. We start by describing closed-world authoring situations, in which the knowledge base is used for constraining the possible documents and orienting the user's selections. Then we move to open-world authoring situations in which, additionally, choices made during authoring are echoed back to the knowledge base. In this way the information implicitly encoded in a document becomes explicit in the knowledge base and can be re-exploited for simplifying the authoring of new documents. We show how a Datalog KB is sufficient for the closed-world situation, while a Description Logic KB is better-adapted to the more complex open-world situation. All along, we pay special attention to logically sound solutions and to decidability issues in the different processes.</Paragraph>
    <Paragraph position="1"> Introduction. Recently there has been a surge of interest in interactive natural language generation systems (Paris et al., 1995; Power and Scott, 1998; Coch and Chevreau, 2001). Such systems rely on the ability to generate a natural language text from an abstract content representation but, contrary to traditional NLG (Natural Language Generation) systems, this representation is only partially available at the beginning of the text production process; it is then gradually completed by a human author, typically using content-selection menus correlated with regions of the evolving generated text.</Paragraph>
    <Paragraph position="2"> One such system, MDA (Multilingual Document Authoring) (citation omitted) is based on a formal specification -- using a variant of Definite Clause Grammars (DCGs) (Pereira and Warren, 1980) -- of what counts as a valid abstract content representation. The different derivation trees in the grammar correspond to texts with different contents, and at each step of the authoring process the user is asked to make interactive choices on how to expand the current partial derivation tree one step further. There are important analogies between this process and the process of authoring an XML document under the control of a DTD or a Schema, but DCGs are more expressive in terms of the contextual constraints that can be expressed and also are more adapted to the production of grammatical text.</Paragraph>
    <Paragraph position="3">  In published MDA work, all the knowledge about what constitutes a valid document is provided in the grammars, with no clear separation between (1) world knowledge (the fact that a certain pharmaceutical drug contains some molecule makes it dangerous for a certain patient condition) and (2) constraints about document organization (if a certain drug is dangerous for a certain condition, then a warning should be generated at a certain place in the document).</Paragraph>
    <Paragraph position="4"> A more principled and modular solution is to leave in the grammar all constraints pertaining to document/textual organization, and to use an external logical theory to express knowledge about the world described by the documents. A document will then be constrained to have a semantic interpretation that is compatible with the external theory.</Paragraph>
    <Paragraph position="5"> The aims of this paper are the following.</Paragraph>
    <Paragraph position="6"> 1. To provide a formally precise and computationally tractable model for this approach. The logical theory we will be using will take the form of a Description Logic (DL) knowledge base (Donini et al., 1996); DLs are subsets of FOPC (First-Order Predicate Calculus) which provide a trade-off between expressivity and tractability (in particular decidability) and have recently been given a lot of attention in the knowledge representation community and in activities around the Semantic Web. They are now starting to attract attention in the computational linguistics community as well (Gabsdil et al., 2001; Striegnitz, 2001); 2. To show how this model can be used not only for constraining the document during the authoring process, but also for using the document as a source of new knowledge to be added in a logically sound way to the KB (knowledge acquisition); 3. To discuss conditions under which the whole process of authoring is decidable.</Paragraph>
    <Paragraph position="7">  The grammars used in MDA are typically more "semantically" than "syntactically" oriented, and a choice between two alternatives for expanding a nonterminal in the grammar tends to correlate with a clear distinction of meaning in the final text. A given grammar covers a semantically unified class of documents (e.g. employment offers, drug package leaflets, etc.), in a way analogous to the customized XML DTDs used for technical documentation.</Paragraph>
    <Paragraph position="8"> The paper is organized as follows. We first describe a class of situations, closed-world authoring, in which the flow of information is strictly from the knowledge base to the document. The MDA approach is briefly presented, and we show how the document specification can be interfaced with an "informationally complete" KB, using a Datalog representation (Ceri et al., 1989); then we present conditions on the specification which guarantee decidability of the closed-world authoring process, that is, which guarantee that at each authoring step the selections presented to the author are "real choices" which will not result in dead ends at a later stage of authoring. We then move on to open-world authoring, in which the flow of information is bi-directional between the KB and the document. Now we start working with an "informationally incomplete" KB, using a Description Logic representation, which can be satisfied in several "possible worlds"; the document being authored has to be compatible with at least one of these possible worlds. We give conditions on the grammar which guarantee that, as long as the DL on which the KB is built is intrinsically decidable, the authoring process as a whole is also decidable. We introduce a notion of light semantics, which corresponds to a restricted form of semantic interpretation for the document allowing exchange of information between the document and the knowledge base and permitting knowledge acquisition during the authoring process. In particular the knowledge gained during the authoring of a document can be re-used for simplifying the authoring of other documents.</Paragraph>
    <Paragraph position="9"> Closed-world authoring. MDA. We start by briefly introducing the MDA framework through a simplified example. The focus of this paper is on the document content aspects (as represented by what we call the abstract content tree) and not on the textual realization aspects, which are handled in a simplistic way here (see (citation omitted) for details on MDA).</Paragraph>
    <Paragraph position="10"> Grammar G1: dfa1: dfa(D,F,A) → "the drug", drug(D), "has the form of a", dform(D,F), "and is administered by", dadm(D,A).</Paragraph>
    <Paragraph position="11"> dform1: dform(D,F) → form(F), &amp; df(D,F).</Paragraph>
    <Paragraph position="12"> dadm1: dadm(D,A) → admin(A), comments(D,A).</Paragraph>
    <Paragraph position="13"> coms1: comments(D,A) → " ", &amp; da(D,A).</Paragraph>
    <Paragraph position="14"> coms2: comments(D,A) → comments(D,A), ";", comment(D,A).</Paragraph>
    <Paragraph position="15"> com1: comment(D,A) → "strictly follow instructions". com2: comment(diprox,A) → "take a glass of water". diprox: drug(diprox) → "Diprox".</Paragraph>
    <Paragraph position="16"> xenor: drug(xenor) → "Xenor".</Paragraph>
    <Paragraph position="17"> burpal: drug(burpal) → "Burpal".</Paragraph>
    <Paragraph position="18"> tablet: form(tablet) → "tablet".</Paragraph>
    <Paragraph position="19"> solution: form(solution) → "solution".</Paragraph>
    <Paragraph position="20"> swallow: admin(swallow) → "swallowing".</Paragraph>
    <Paragraph position="21"> chew: admin(chew) → "chewing".</Paragraph>
    <Paragraph position="22"> drink: admin(drink) → "drinking".</Paragraph>
    <Paragraph position="23"> Auxiliary clauses D1: df(diprox,tablet).</Paragraph>
    <Paragraph position="24"> df(xenor,tablet).</Paragraph>
    <Paragraph position="25"> df(burpal,solution).</Paragraph>
    <Paragraph position="26"> da(diprox,swallow).</Paragraph>
    <Paragraph position="27"> da(xenor,chew).</Paragraph>
    <Paragraph position="28"> da(burpal,drink).</Paragraph>
    <Paragraph position="29"> The form of grammar G1 is a variant of the DCG format (Pereira and Warren, 1980): (1) each of the grammar clauses is given a unique name (e.g. dfa1); (2) the nonterminals are notated in lowercase and are parameterized by variable or ground terms; (3) the terminals are enclosed in double quotes; (4) the auxiliary predicates (a.k.a. Prolog calls, usually enclosed in curly brackets) appear after the ampersand sign.</Paragraph>
    <Paragraph position="30"> Free generation. If we start from the initial nonterminal dfa(D,F,A) and expand it nondeterministically until we get to terminal strings (so-called free generation mode), we can obtain (among others) the texts: (T1) "the drug Diprox has the form of a tablet and is administered by swallowing", (T2) "the drug Xenor has the form of a tablet and is administered by chewing; strictly follow instructions", but not the text: "the drug Burpal has the form of a tablet and is administered by swallowing".</Paragraph>
    <Paragraph position="31"> Authoring. The authoring mode is different from the free generation mode in that it gives the author the responsibility of choosing expansions for nonterminals rather than enumerating all possible expansions nondeterministically. Thus, after all the obligatory expansions from dfa(D,F,A) (expansions for which there is only one possibility in the grammar) have been done, the frontier of the derivation tree contains some terminals and the nonterminals drug(D), form(F), admin(A), comments(D,A), and has to satisfy the constraint df(D,F). At this point the user can freely choose which of these nonterminals to expand next, say form(F). There are two possible ways to expand this nonterminal: through the clause named tablet or through the clause named solution, and the system displays to the user a menu listing these two choices. Assume that the author chooses tablet. The nonterminal form(F) is expanded into the terminal "tablet", F is unified with tablet, and the process is repeated until no nonterminal remains to be expanded. At the end of this process, the collection of choices that the user has made can be represented as a tree labeled by names of clauses, for instance: (AT1) dfa1(diprox, dform1(tablet), dadm1(swallow,coms1)) from which a complete derivation tree can be reconstructed as well as the associated terminal string, which in this case is seen to be equal to T1.</Paragraph>
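    <Paragraph> The reconstruction of a text from an abstract tree can be sketched in Python. This is only an illustration under stated assumptions, not the MDA implementation: the tuple encoding of trees, the TEMPLATES table, the functions tokens and realize, and the tree AT2 (our guess at a tree yielding T2) are all introduced here for the example, the blank terminal of coms1 is treated as empty, and the final replacement of " ;" by ";" is an ad hoc way of attaching the semicolon.
```python
# Hypothetical encoding of grammar G1: each clause name maps to a template
# whose items are either terminal strings or integer indexes into the
# children of the corresponding abstract-tree node.
TEMPLATES = {
    "dfa1": ["the drug", 0, "has the form of a", 1,
             "and is administered by", 2],
    "dform1": [0],
    "dadm1": [0, 1],
    "coms1": [],                      # assumption: contributes no text
    "coms2": [0, ";", 1],
    "com1": ["strictly follow instructions"],
    "com2": ["take a glass of water"],
    "diprox": ["Diprox"], "xenor": ["Xenor"], "burpal": ["Burpal"],
    "tablet": ["tablet"], "solution": ["solution"],
    "swallow": ["swallowing"], "chew": ["chewing"], "drink": ["drinking"],
}


def tokens(tree):
    """Return the terminal strings of an abstract tree, left to right."""
    # A leaf is a bare clause name; an inner node is (name, child, ...).
    if isinstance(tree, str):
        name, children = tree, ()
    else:
        name, children = tree[0], tree[1:]
    out = []
    for item in TEMPLATES[name]:
        if isinstance(item, int):
            out.extend(tokens(children[item]))
        else:
            out.append(item)
    return out


def realize(tree):
    """Join the terminals and attach semicolons to the preceding word."""
    return " ".join(tokens(tree)).replace(" ;", ";")


AT1 = ("dfa1", "diprox", ("dform1", "tablet"),
       ("dadm1", "swallow", "coms1"))
AT2 = ("dfa1", "xenor", ("dform1", "tablet"),
       ("dadm1", "chew", ("coms2", "coms1", "com1")))
print(realize(AT1))
print(realize(AT2))
```
Under these assumptions realize(AT1) yields the text T1 and realize(AT2) yields T2. The sketch ignores the constraints df and da, which a real system would check while the tree is being built.</Paragraph>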
    <Paragraph position="32"> Such a tree of choices as AT1 will be called an abstract content tree, or simply an abstract tree. Different abstract trees correspond to different sets of choices of content and also to different document instances in the class of documents associated with the grammar. It is then natural to see an abstract tree as a representation of the content of a document belonging to that class.</Paragraph>
    <Paragraph position="33">  Life/death issues. There is one important issue that we did not discuss in the explanation just given, namely how exactly the system determines which choices to propose to the user once he has selected a new nonterminal to be expanded. One possibility is to present him with all the possible names of clauses which are headed by the nonterminal in question (as was done for form(F)), but then it is possible that the author makes a choice that will never lead to a complete valid document.</Paragraph>
    <Paragraph position="34"> For instance, let us go back to the point just after the author has chosen tablet as the clause for expanding form(F); at this point the nonterminals on the frontier of the derivation tree are: drug(D), admin(A), comments(D,A), with the constraint df(D,tablet) in the background. Suppose the author next chooses to expand admin(A); if the system were working in a naive fashion, it would then display the choices swallow, chew, and drink. However it is easy to see that drink is in fact ruled out as a choice: any complete document would eventually have to satisfy the constraints df(D,tablet) and da(D,drink), but there is no drug in the database which is compatible with both this form and this administration. We can say that drink is a "dead" choice in this context.</Paragraph>
    <Paragraph position="35"> In order to prevent the author from entering a dead end, what is really needed is for the system to foresee such possible clashes and to present to the author only those choices which may eventually lead to a valid document; in the case at hand, it should present the "live" choices swallow and chew.</Paragraph>
    <Paragraph position="36"> Remark. When exactly one choice is possible, the system should not even present any choice to the author, but make the only possible expansion decision on its own: authoring should be done automatically at that point. In these cases the authoring mode becomes closer to the classical non-interactive NLG mode, and in the limit, when knowledge-base inferences force all authoring choices, the two modes converge.</Paragraph>
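    <Paragraph> The filtering of dead choices, together with the automatic expansion mentioned in the remark, can be sketched as follows in Python. The sketch hard-codes the facts of D1 and only covers the single situation discussed above (a form has been chosen and an administration is being chosen); the names live_admin_choices and next_step are hypothetical and do not come from MDA.
```python
# Facts of the auxiliary database D1, as sets of (drug, value) pairs.
DF = {("diprox", "tablet"), ("xenor", "tablet"), ("burpal", "solution")}
DA = {("diprox", "swallow"), ("xenor", "chew"), ("burpal", "drink")}
ADMIN_CLAUSES = ["swallow", "chew", "drink"]


def live_admin_choices(chosen_form):
    """Keep the admin clauses for which some drug D satisfies both
    df(D, chosen_form) and da(D, admin)."""
    drugs_with_form = {d for (d, f) in DF if f == chosen_form}
    return [a for a in ADMIN_CLAUSES
            if any((d, a) in DA for d in drugs_with_form)]


def next_step(chosen_form):
    """Return ('menu', choices), ('auto', choice) or ('dead', None)."""
    live = live_admin_choices(chosen_form)
    if len(live) == 1:
        return ("auto", live[0])   # forced choice: expand without asking
    if not live:
        return ("dead", None)      # cannot occur if earlier menus were filtered
    return ("menu", live)


print(next_step("tablet"))    # ('menu', ['swallow', 'chew'])
print(next_step("solution"))  # ('auto', 'drink')
```
With the form tablet the author sees a two-item menu, whereas with the form solution the only live administration is drink, so no menu needs to be shown.</Paragraph>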
    <Paragraph position="37"> Finitely-parameterized grammars, Datalog, and decidability of life/death. In the current MDA system, the method for determining whether a choice is live or dead is incomplete. This is because the nonterminal parameters can be terms of arbitrary complexity (built from variables, constants and function symbols), which makes it easy to simulate an arbitrary Prolog program with a DCG.</Paragraph>
    <Paragraph position="38">  Determining whether the initial nonterminal may lead to a complete valid document is then undecidable in general. It is usually possible for the grammar writer to exercise some care in designing the grammars so that life/death problems do not hinder the authoring process in practice, but a principled solution would be preferable. (Footnote on abstract trees: This abstract tree approach to document content stems from the work of Aarne Ranta on his "Grammatical Framework (GF)", in which he was inspired by the interactive proof editors in a higher-order typed/functional setting such as ALF and COQ, in which the user attempts to build a proof of a formula through stepwise top-down refinements of a partial proof (Ranta, 1999). In the present paper the abstract trees can be seen as proofs of an initial goal in a logic programming setting.) (Footnote on simulating Prolog programs: Even without the use of auxiliary predicates: a pure Prolog program is equivalent to a DCG generating empty strings.)</Paragraph>
    <Paragraph position="39"> In order to tackle this problem, we will be making two fundamental assumptions: (i) the nonterminal parameters in the grammar clauses -- as well as the goal arguments in the auxiliary program clauses -- are variables or constants; (ii) all variables take their value in the finite set of constants present in the grammar and auxiliary clauses.</Paragraph>
    <Paragraph position="40"> Under these assumptions, we are now dealing with a DCG with finite-domain parameters both for its grammar component and for its auxiliary predicate component. The auxiliary predicate component is then formally the same as a Datalog database (Ceri et al., 1989), as in our example D1.</Paragraph>
    <Paragraph position="41">  We can then see the authoring model as consisting of two components: a finitely parameterized DCG, and a Datalog database.</Paragraph>
    <Paragraph position="42"> Now, it is striking that, when working with finite-domain DCGs, not only the auxiliary predicate component, but also the grammar component, has formal similarities to a Datalog base: in fact, if one "forgets" in the grammar G1 all the terminal strings, then one obtains a Datalog program DP1: dfa1: dfa(D,F,A) ← drug(D), dform(D,F), dadm(D,A).</Paragraph>
    <Paragraph position="43"> dform1: dform(D,F) ← form(F), &amp; df(D,F).</Paragraph>
    <Paragraph position="44"> dadm1: dadm(D,A) ← admin(A), comments(D,A).</Paragraph>
    <Paragraph position="45"> coms1: comments(D,A) ← &amp; da(D,A).</Paragraph>
    <Paragraph position="46"> coms2: comments(D,A) ← comments(D,A), comment(D,A). com1: comment(D,A).</Paragraph>
    <Paragraph position="47"> com2: comment(diprox,A).</Paragraph>
    <Paragraph position="48"> diprox: drug(diprox).</Paragraph>
    <Paragraph position="49"> xenor: drug(xenor).</Paragraph>
    <Paragraph position="50"> burpal: drug(burpal).</Paragraph>
    <Paragraph position="51"> tablet: form(tablet).</Paragraph>
    <Paragraph position="52"> solution: form(solution).</Paragraph>
    <Paragraph position="53"> swallow: admin(swallow).</Paragraph>
    <Paragraph position="54"> chew: admin(chew).</Paragraph>
    <Paragraph position="55"> drink: admin(drink).</Paragraph>
    <Paragraph position="56"> Deciding the productivity of a parameterized nonterminal in the combination G1+D1 is then formally equivalent to proving it as a program goal in the combination DP1+D1 (which is itself a global Datalog program), and a derivation in G1 has a one-to-one correspondence to a proof in DP1.</Paragraph>
    <Paragraph position="57"> For instance, deciding the productivity of the nonterminal dfa(D,tablet,drink) is equivalent to proving the goal dfa(D,tablet,drink) in the Datalog program DP1+D1: because no such proof can be found, the nonterminal is not productive.</Paragraph>
    <Paragraph position="58"> (Footnote on D1: The database D1 only contains facts (Datalog's EDB), but it could also contain recursively defined predicates (Datalog's IDB) without impact on the discussion.)</Paragraph>
    <Paragraph position="59"> Now, the interest of this translation is that provability of a goal in a Datalog program is not only known to be decidable, but also to be amenable to efficient implementation (Abiteboul et al., 1995).</Paragraph>
    <Paragraph position="60"> Consider the situation discussed before, just after the author has chosen the form tablet, and at the point where the system needs to present him with a list of choices for admin(A). At that point, the system is confronted with the following question: what are the possible values for A such that the goal drug(D), admin(A), comments(D,A), df(D,tablet) is satisfiable? This question can be succinctly represented as the following conjunctive Datalog query: answer(A) ← drug(D), admin(A), comments(D,A), df(D,tablet), for which a number of optimization techniques exist (see (Ceri et al., 1989; Abiteboul et al., 1995)), and which returns as possible values for A the set {swallow, chew}.</Paragraph>
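    <Paragraph> To make the query concrete, here is a minimal Python sketch of naive bottom-up Datalog evaluation applied to part of DP1+D1. It is an unoptimized illustration, not one of the techniques of the cited works. Two assumptions are ours: only the clauses needed by the query are encoded, and com1 is given the body drug(D), admin(A) so that every head variable is bound, whereas the source states it as the bare fact comment(D,A). The names FACTS, RULES, match, solve and fixpoint are hypothetical.
```python
# Ground facts of D1 and DP1, written as tuples (predicate, arg, ...).
FACTS = {
    ("drug", "diprox"), ("drug", "xenor"), ("drug", "burpal"),
    ("admin", "swallow"), ("admin", "chew"), ("admin", "drink"),
    ("df", "diprox", "tablet"), ("df", "xenor", "tablet"),
    ("df", "burpal", "solution"),
    ("da", "diprox", "swallow"), ("da", "xenor", "chew"),
    ("da", "burpal", "drink"),
}
# Rules as (head, body); uppercase terms are variables.
RULES = [
    (("comments", "D", "A"), [("da", "D", "A")]),                 # coms1
    (("comments", "D", "A"),
     [("comments", "D", "A"), ("comment", "D", "A")]),            # coms2
    (("comment", "D", "A"), [("drug", "D"), ("admin", "A")]),     # com1 (assumed body)
]


def match(atom, fact, env):
    """Extend env so that atom equals fact, or return None."""
    if atom[0] != fact[0] or len(atom) != len(fact):
        return None
    env = dict(env)
    for term, value in zip(atom[1:], fact[1:]):
        if term[0].isupper():                  # variable
            if env.setdefault(term, value) != value:
                return None
        elif term != value:                    # constant mismatch
            return None
    return env


def solve(body, facts, env=None):
    """Yield every binding that satisfies all atoms of body."""
    env = {} if env is None else env
    if not body:
        yield env
        return
    for fact in facts:
        extended = match(body[0], fact, env)
        if extended is not None:
            yield from solve(body[1:], facts, extended)


def fixpoint(rules, facts):
    """Apply the rules until no new fact is derived."""
    facts = set(facts)
    while True:
        new = set()
        for head, body in rules:
            for env in solve(body, facts):
                new.add((head[0],) + tuple(env.get(t, t) for t in head[1:]))
        if new.issubset(facts):
            return facts
        facts |= new


QUERY = [("drug", "D"), ("admin", "A"),
         ("comments", "D", "A"), ("df", "D", "tablet")]
MODEL = fixpoint(RULES, FACTS)
ANSWERS = {env["A"] for env in solve(QUERY, MODEL)}
print(sorted(ANSWERS))  # ['chew', 'swallow']
```
The recursive clause coms2 adds no new comments facts, which is consistent with the observation that it can be dropped from DP1 when only productivity matters.</Paragraph>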
    <Paragraph position="61">  The advantage for authoring is clear: at each choice point, the system is capable of returning a valid list of choices more efficiently than by applying more naive techniques. It is also worth noting that some fundamental issues in authoring are so closely connected with database query optimization.</Paragraph>
    <Paragraph position="62">  Open-world authoring. In an authoring context, some grammatically valid documents will never be authored because they do not correspond to any possible state of affairs. Typically the grammar specifies a much larger set of documents than the ones which are actually possible. If this were not the case, then an author would not have to take the trouble to direct the production process by making content choices that he alone can make. That is to say, a document which has actually been authored conveys more meaning than just stating "I am a valid document relative to the specification". However, in a closed-world environment as we have been discussing until now, that additional meaning has no explicit counterpart in the knowledge base; it is only represented implicitly in the abstract content tree, in a form which is not perspicuous and would be difficult to re-use for the authoring of other documents or to share with other processes.</Paragraph>
    <Paragraph position="63">  (Footnote on the query for admin(A): In this case, the set of possible values for the parameter A coincides with the set of possible values for the names of the expanding clauses for admin(A). In general this is not the case, but it is simple to add a parameter to each nonterminal that indexes its (finitely many) possible expanding clauses.)</Paragraph>
    <Paragraph position="64">  (Footnote on query optimization: A DCG is nothing else than a context-free grammar with parameterized nonterminals and a unification mechanism between the parameters. Because of the analogy between DTD/Schemas and CFGs, it seems likely that the same approach could be useful for extending XML-based authoring through the use of finite-domain parameters and unification.)</Paragraph>
    <Paragraph position="65"> (Footnote on DP1: The fact that the program DP1 is equivalent to G1 as far as nonterminal productivity is concerned does not mean that the two objects are equivalent for authoring purposes. The grammar associates different texts with different derivations of the same ground nonterminal (for instance, there are an infinite number of texts produced by comments(diprox,tablet), corresponding to different combinations of coms1, coms2, com1, com2), whereas the program is of interest to us here not for the different proofs of a given ground goal, but for whether this goal is provable or not. Note that the clause named coms2 can be eliminated from the program DP1 without changing its interpretation (because in order to prove comments(D,A) it requires a proof of comments(D,A)), which makes the program non-recursive and therefore simplifies the check for productivity; eliminating the same clause from G1 would however completely change the meaning of the grammar.)</Paragraph>
    <Paragraph position="66">  In a closed-world context, the KB constraints which are tested during the authoring process are completely passive: they are seen purely as validity checks against the knowledge base.</Paragraph>
    <Paragraph position="67"> By contrast, open-world authoring sees the KB constraints not only as checks, but also as conditions on the world being described. When authoring a document, the author is not neutrally picking out one of the documents valid relative to the KB, but asserting that the constraints do hold of the actual world.</Paragraph>
    <Paragraph position="68">  Let us illustrate this idea. We are now viewing the formal specification of valid documents as consisting, as before, of a grammar of the type previously described (we will take again the grammar G1), but instead of a Datalog database, we are now using an informationally incomplete description logic knowledge base KB1:</Paragraph>
    <Paragraph position="70"> This knowledge base is written using a certain number of DL constructors -- existential quantification, concept enumeration, disjoint union (an abbreviation: A = B ⊎ C can be replaced by the two constraints A = B ⊔ C and B ⊓ C = ⊥, and B ⊎ C ⊎ D is an abbreviation for (B ⊎ C) ⊎ D) -- and we are assuming the unique name convention (all named individuals are different). The constructors which are used place the knowledge base in the class ALCO (Donini et al., 1996). (Footnote: Note an analogy here with the Semantic Web perspective: tags used in XML documents may convey implicit semantic information, but in order to make this information sharable, it had better be represented explicitly in some formal knowledge representation.) (Footnote: In the language of pragmatics, the author is then performing a speech act by committing to the "truth" of the document.) (Footnote: An introduction to DLs would take us too far afield; let's just say that there is a whole family of DLs, which differ by the logical constructors they allow, and that most can be seen as decidable fragments of first-order logic. An accessible recent introduction to DLs is available at http://www.cs.man.ac.uk/ horrocks/Slides/leipzig-jun-01.pdf .)</Paragraph>
    <Paragraph position="71"> The TBOX can be glossed in the following way. The TabletDrugs are those drugs D for which df(D,tablet), the SolutionDrugs those drugs for which df(D,solution), ..., the DrinkDrugs those drugs for which da(D,drink). The drugs can come in either one of the two forms: tablet and solution, and in either one of the three administrations swallow, chew and drink. Finally TabletDrugs are either swallow drugs or chew drugs, whereas SolutionDrugs are always drink drugs. The ABOX says what we already know about the form and administration of Burpal.</Paragraph>
    <Paragraph position="72"> The list of relations in D1 is compatible with KB1: indeed it is easy to see that one can obtain a model of KB1 by taking the relations of D1 along with the facts: diprox: TabletDrugs</Paragraph>
    <Paragraph position="74"> In a certain sense the TBOX of KB1 can be seen as a conceptual schema for the database D1, which states certain general relations about the forms and administrations of drugs, or about the uniqueness of form and administration for a drug, but which does not say how many drugs there are or what are the properties of these drugs.</Paragraph>
    <Paragraph position="75"> Valid abstract trees and incomplete KBs. Let us return to our authoring example in this new context. We now associate grammar G1 with KB1 instead of D1.</Paragraph>
    <Paragraph position="76"> We then make the assumption that all constant parameters appearing in the grammar (diprox, xenor, burpal, tablet, etc.) are to be considered distinct named individuals for the KB, and that the constraint relations (da, df) are all unary or binary and correspond to concepts or roles in the KB.</Paragraph>
    <Paragraph position="77"> Let's now look again at the abstract tree AT1: dfa1(diprox, dform1(tablet), dadm1(swallow,coms1)). This abstract tree is valid relative to G1 (it corresponds to a possible complete derivation) but it is not necessarily valid relative to the combination &lt;G1,KB1&gt;; this notion is defined in the following way: because the abstract tree uniquely determines the set of rules which have been used for building the derivation, it also uniquely determines a set of associated KB constraints; thus AT1 is associated with the set of constraints: {df(diprox,tablet), da(diprox,swallow)}.</Paragraph>
    <Paragraph position="78"> Now we say that AT1 is valid relative to the combination &lt;G1,KB1&gt; if and only if it is valid relative to G1 and its associated set of constraints is compatible with KB1. In other words we need to show that the addition of the two constraints df(diprox,tablet), da(diprox,swallow) to the ABOX still leads to a satisfiable knowledge base. This can be shown by exhibiting a model as we did a few paragraphs ago, and therefore AT1 is a valid abstract tree relative to &lt;G1,KB1&gt;.</Paragraph>
    <Paragraph position="79"> The satisfiability of KB1 extended with the two relations, which we just showed by informal reasoning, can also be established computationally, thanks to the decidability of KB-consistency checking in ALCO (Donini et al., 1996).</Paragraph>
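    <Paragraph> The compatibility check can be illustrated by a brute-force model search in Python. This is a sketch under strong assumptions, not a DL reasoner: it encodes only the informal gloss of the TBOX given above (each drug has exactly one form and one administration, tablet drugs are swallowed or chewed, solution drugs are drunk), it assumes that the ABOX records Burpal as a solution that is drunk (as in D1), and it handles only ground constraints over the three named drugs. All names in the code are hypothetical.
```python
from itertools import product

DRUGS = ("diprox", "xenor", "burpal")
FORMS = ("tablet", "solution")
ADMINS = ("swallow", "chew", "drink")
# Gloss of the TBOX: administrations allowed for each form.
ALLOWED = {"tablet": {"swallow", "chew"}, "solution": {"drink"}}
# Assumed ABOX content (the source only says it describes Burpal).
ABOX = {("df", "burpal", "solution"), ("da", "burpal", "drink")}


def satisfiable(constraints):
    """True if some world gives each drug one form and one allowed
    administration while making the ABOX and all constraints true."""
    required = ABOX | set(constraints)
    profiles = [(f, a) for f in FORMS for a in ADMINS if a in ALLOWED[f]]
    for world in product(profiles, repeat=len(DRUGS)):
        facts = set()
        for drug, (form, admin) in zip(DRUGS, world):
            facts.add(("df", drug, form))
            facts.add(("da", drug, admin))
        if required.issubset(facts):
            return True
    return False


AT1_CONSTRAINTS = {("df", "diprox", "tablet"), ("da", "diprox", "swallow")}
print(satisfiable(AT1_CONSTRAINTS))                       # True
print(satisfiable({("df", "diprox", "tablet"),
                   ("da", "diprox", "drink")}))           # False
```
The first call mirrors the validity of AT1: some world compatible with the assumed knowledge base makes both constraints true. The second shows a clash between the tablet form and the drink administration for the same drug.</Paragraph>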
    <Paragraph position="80"> Open- vs. closed-world authoring, satisfiability vs. deducibility.</Paragraph>
    <Paragraph position="81"> Note that validity of an abstract tree in the open-world authoring context involves the satisfiability of a conjunction of constraints relative to the knowledge base, whereas the notion of validity of an abstract tree in the closed-world authoring context involves the dual notion of deducibility of a conjunction of constraints relative to the knowledge base (in the Datalog context, being true in the minimal Herbrand model is the same as being deducible from the Horn clauses constituting the base).</Paragraph>
    <Paragraph position="82"> Decidability of the authoring process. In order to illustrate the process, let's go back to the point in the authoring after all obligatory expansions of dfa(D,F,A) have been made, where the frontier of the derivation tree is drug(D), form(F), admin(A), comments(D,A), and where the user has chosen to expand form(F). There are apparently two possible expansions: the clauses with names tablet and solution. Before presenting these choices to the user, the system must check that they are live, namely, as before, that they may lead to a complete valid document.</Paragraph>
    <Paragraph position="83"> Choosing the tablet expansion leads to the derivation frontier drug(D), admin(A), comments(D,A) with constraint df(D,tablet). In order to decide whether the frontier is live, the system needs to enumerate possible complete derivations of this frontier until it finds one that is satisfiable relative to KB1, in which case it returns a positive answer; if it does not find one, it should return a negative answer. In principle, the enumeration might never stop, but because of the finite parameter condition on the grammar, the system has only to enumerate a finite number of trees. This is because if a derivation tree is of the form S(... A1(... A2(...) ...) ...) where S is a ground instantiation of the initial nonterminal and A1 and A2 are the same ground instantiation of a nonterminal ("repetitive derivation"), then the satisfiability of S(... A1(... A2(...) ...) ...) relative to KB1 implies the satisfiability of S(... A2(...) ...): a model of the larger derivation tree is again a model of the smaller derivation tree. This means that when checking life/death we never need to consider a repetitive derivation during the enumeration of derivations. In particular, because we are dealing with a finite parameter domain, the derivations that we need to consider have a bounded depth (otherwise we would necessarily encounter repetitive situations), and the decidability of the process follows.</Paragraph>
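    <Paragraph> The bounded enumeration can be sketched in Python for a small ground fragment of G1. The sketch assumes that nonterminals have already been instantiated with constants, represents each clause by its children and its KB constraints, and cuts off any branch in which a nonterminal reappears below itself; the names RULES and constraint_sets are hypothetical.
```python
# Ground grammar fragment:
# nonterminal -> list of (clause name, child nonterminals, KB constraints).
RULES = {
    "comments(diprox,swallow)": [
        ("coms1", [], [("da", "diprox", "swallow")]),
        ("coms2", ["comments(diprox,swallow)",
                   "comment(diprox,swallow)"], []),
    ],
    "comment(diprox,swallow)": [
        ("com1", [], []),
        ("com2", [], []),
    ],
}


def constraint_sets(nonterminal, path=()):
    """Return the constraint sets of all non-repetitive derivations
    of a ground nonterminal."""
    if nonterminal in path:
        return set()              # repetitive derivation: prune it
    results = set()
    for _clause, children, constraints in RULES[nonterminal]:
        partial = {frozenset(constraints)}
        for child in children:
            child_sets = constraint_sets(child, path + (nonterminal,))
            # Combine every choice made so far with every child derivation.
            partial = {a | b for a in partial for b in child_sets}
        results |= partial
    return results


print(constraint_sets("comments(diprox,swallow)"))
```
Only the coms1 derivation survives for comments(diprox,swallow), since coms2 immediately repeats the same nonterminal. A life/death check would then ask whether at least one of the returned constraint sets is satisfiable relative to the knowledge base.</Paragraph>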
    <Paragraph position="84">  In the case of choosing tablet, the abstract tree AT1 is enumerated at some point in the process, and its satisfiability relative to KB1 can be decidably checked: tablet is then shown to be a live authoring choice. The same process shows solution to be live. (Footnote on the decidability argument: The same reasoning could be made for proving decidability in the closed-world case, instead of appealing there to the decidability of Datalog queries.)</Paragraph>
    <Paragraph position="85"> Now, let's go to the point where, after having chosen tablet, the author decides to select an expansion for admin(A). The derivation frontier is then drug(D), admin(A), comments(D,A), with the constraint df(D,tablet), and the apparently possible expansions are swallow, chew, and drink. Both swallow and chew can be seen to be live by reasoning similar to the above. In the case of drink, we have to check whether the sequence drug(D), comments(D,drink), with the constraint df(D,tablet), is live. Let's choose to expand comments(D,drink) first. The expansion coms2 leads to a repetitive situation (comments(D,drink) is above comments(D,drink) in the derivation path) and is therefore discarded; the expansion coms1 leads to the frontier drug(D), with the constraints df(D,tablet) and da(D,drink). However the two constraints cannot be simultaneously satisfied in KB1. This can be shown computationally by using the satisfiability check in KB1, but also by the following informal reasoning: df(D,tablet) and da(D,drink) imply that D is both in TabletDrugs and in DrinkDrugs; by the second fact it is in SolutionDrugs, but SolutionDrugs and TabletDrugs have an empty intersection. Thus all expansions of comments lead to invalidity; hence drink is not a live choice.</Paragraph>
    <Paragraph position="86"> Open-world authoring and hybrid knowledge bases. The process that we have just described for finding live selections, although decidable, is clearly not optimized. In the case of closed-world authoring that we discussed at the beginning of this paper, we said that, from the point of view of detecting life/death situations, a Datalog program such as DP1 could be used in place of the grammar G1, and that the combination of DP1 + D1 could be treated as a global Datalog database to which standard query optimization techniques could be applied.</Paragraph>
    <Paragraph position="87"> Is there some comparable possibility here? A clue comes from the area of hybrid knowledge bases in the description logic community. Some researchers have shown that by associating Description Logics with Datalog one can significantly increase the expressive power of both formalisms, which have a nice complementarity (recursive definitions can be easily expressed in Datalog, but not in DLs; partial knowledge can be easily expressed in DLs, but not in Datalog) (Levy and Rousset, 1996; Donini et al., 1998). The open-world authoring approach we propose has strong connections with these hybrid knowledge bases (citation omitted) and it seems likely that optimization techniques from that area may be transferred to our problem.</Paragraph>
    <Paragraph position="88"> Light semantics and knowledge acquisition. Let's step back and reconsider the rationale behind open-world authoring. We are considering a situation in which there is an "actual world" which is not completely known either to the knowledge base or to the author; however both the KB and the author are supposed to have correct partial knowledge about that world.</Paragraph>
    <Paragraph position="89"> The system presents the author with a collection of documents which, from its point of view, are compatible with what it knows about the actual world. Among these documents, the author picks (during the authoring process) one document that, from his point of view, is compatible with what he knows about the actual world.</Paragraph>
    <Paragraph position="90"> So the author is not passively exploring the space of documents considered possible by the system (although that could certainly be a nonstandard mode of operation if the author puts on a developer's hat and wants to see what the system believes is possible), but is actively committing to certain facts about the world.</Paragraph>
    <Paragraph position="91"> What are these facts? What the author is producing is an abstract content tree, which corresponds to a completely specific choice of expansion rules for the nonterminals of the grammar. This means that the abstract tree completely determines a set of associated ground KB relations. For instance, AT1 determines the set {df(diprox,tablet), da(diprox,swallow)}. These are the facts that the author asserts to be true in the actual world. Light semantics. Such facts are aspects of the document content that the document &quot;exports&quot; to the knowledge base and thereby makes formally explicit. They provide what we shall call a light semantics for the document. In terms of light semantics, if we were to build a standard logical form for the whole document, for instance for AT1, that logical form would simply be the conjunction of the associated asserted facts: df(diprox,tablet) ∧ da(diprox,swallow). Light semantics does not attempt to model the whole semantics of the document (for instance, in our example, there is no explicit logical counterpart to the different choices for the comment nonterminal), but focuses instead on modeling those parts of the document semantics that can be tractably handled both by the knowledge representation component and by the authoring process.</Paragraph>
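    <Paragraph> As a rough illustration of light semantics, the following Python sketch collects the ground facts attached to the nodes of an abstract tree and prints their conjunction. The Node class, the function names, and the shape given to AT1 here are assumptions made for the example (in particular, which rule carries which fact is guessed); only the two resulting facts come from the text.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Fact = Tuple[str, str, str]  # (relation, subject, value)


@dataclass
class Node:
    """One rule application in an abstract content tree (hypothetical shape)."""
    rule: str
    facts: List[Fact] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)


def light_semantics(tree):
    """Collect every ground fact asserted anywhere in the tree."""
    found = set(tree.facts)
    for child in tree.children:
        found |= light_semantics(child)
    return found


def logical_form(facts):
    """Render the light semantics as a conjunction of atoms."""
    atoms = [f"{rel}({subj},{val})" for rel, subj, val in sorted(facts)]
    return " ∧ ".join(atoms)


# An assumed rendering of AT1: diprox, in tablet form, to be swallowed.
at1 = Node("doc", children=[
    Node("diprox"),
    Node("tablet", facts=[("df", "diprox", "tablet")]),
    Node("swallow"),
    Node("coms1", facts=[("da", "diprox", "swallow")]),
])

at1_facts = light_semantics(at1)
print(logical_form(at1_facts))
```

Nodes that carry no facts contribute nothing to the result, which mirrors the point that light semantics deliberately leaves parts of the document without a logical counterpart.</Paragraph>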
    <Paragraph position="92"> (Footnote: When working in a more powerful framework for logical forms, such as Montague semantics, the interpretation of a document may depend in non-monotonic ways on the interpretations of its parts, as in negated contexts: &quot;it is not the case that ...&quot; or in opaque contexts: &quot;John believes that ...&quot;. Predicting at authoring time which selections are live relative to such a knowledge representation framework, while possible in principle, seems to be a difficult research question. Another (orthogonal) argument in favor of light semantics is the fact that if we consider the communicative role of a document inside a predefined class of documents, then there is no point in formally representing those parts of a document that are not contrastive between two documents in the class; for instance, there is no need to analyze the sentence &quot;Always ask your doctor's advice in case of doubt&quot; in any semantic detail if it appears in all documents of the class: these semantic details are irrelevant to the informational content of the document as opposed to other documents of the class. A thorough discussion of this point, connected to considerations of information theory, would bring us outside the scope of this paper.)</Paragraph>
    <Paragraph position="93"> Knowledge acquisition. Once the author has committed to a document, he has revealed a certain number of facts that he knows about the actual world and that the KB possibly did not &quot;know&quot;. These facts (in our example: df(diprox,tablet) and da(diprox,swallow)) can then be added to the ABOX of the knowledge base, and can be used either for their own sake (knowledge acquisition) or in order to constrain the authoring of a new document.</Paragraph>
    <Paragraph position="94"> So after the authoring of AT1, the ABOX of KB1 becomes:</Paragraph>
    <Paragraph position="96"> Suppose now the user authors a new document, first making a selection for drug(D), and choosing diprox.</Paragraph>
    <Paragraph position="97"> Then the KB &quot;knows&quot; that tablet is the only choice for F and swallow the only choice for A. Indeed they are possible choices (because df(diprox,tablet) and da(diprox,swallow) are in the ABOX of the KB), but they are also the only choices, for diprox is now known to be in TabletDrugs and in SwallowDrugs; it can therefore not be in SolutionDrugs or in ChewDrugs or in DrinkDrugs, which means that none of the facts df(diprox,solution), da(diprox,chew) or da(diprox,drink) may hold. After the author's choice of diprox, the derivation frontier is form(F), admin(A), comments(diprox,A) with the constraint df(diprox,F). The author then chooses to expand form(F), and the system notices that the only live choice is tablet, and performs this expansion without asking the user. The frontier is now admin(A), comments(diprox,A), with the constraint df(diprox,tablet).</Paragraph>
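    <Paragraph> The inference just described can be mimicked with a very small consistency check. The Python sketch below is not a Description Logic reasoner: it hard-codes a handful of axioms that we assume KB1 contains because the informal reasoning in the text relies on them (TabletDrugs and SolutionDrugs are disjoint, SwallowDrugs is disjoint from ChewDrugs and from DrinkDrugs, and DrinkDrugs is included in SolutionDrugs), and all identifiers are hypothetical. A candidate fact counts as live when adding it to the ABOX does not place the drug in two disjoint classes.

```python
from itertools import combinations

# Assumed fragment of KB1 (see the hedges in the lead-in).
CLASS_OF = {
    ("df", "tablet"): "TabletDrugs",
    ("df", "solution"): "SolutionDrugs",
    ("da", "swallow"): "SwallowDrugs",
    ("da", "chew"): "ChewDrugs",
    ("da", "drink"): "DrinkDrugs",
}
SUBCLASS = {"DrinkDrugs": "SolutionDrugs"}
DISJOINT = {
    frozenset(("TabletDrugs", "SolutionDrugs")),
    frozenset(("SwallowDrugs", "ChewDrugs")),
    frozenset(("SwallowDrugs", "DrinkDrugs")),
}


def classes_of(drug, facts):
    """Classes the drug must belong to, closed under the subclass axioms."""
    found = set()
    for rel, subj, val in facts:
        if subj == drug and (rel, val) in CLASS_OF:
            cls = CLASS_OF[(rel, val)]
            while cls is not None:
                found.add(cls)
                cls = SUBCLASS.get(cls)
    return found


def consistent(drug, facts):
    """True unless the drug ends up in two classes declared disjoint."""
    pairs = combinations(classes_of(drug, facts), 2)
    return not any(frozenset(pair) in DISJOINT for pair in pairs)


def live_values(drug, rel, candidates, abox):
    """Candidate values whose fact can be added to the ABOX consistently."""
    return [v for v in candidates if consistent(drug, abox | {(rel, drug, v)})]


abox = {("df", "diprox", "tablet"), ("da", "diprox", "swallow")}
forms = live_values("diprox", "df", ["tablet", "solution"], abox)
admins = live_values("diprox", "da", ["swallow", "chew", "drink"], abox)
```

For a drug about which the ABOX says nothing, every candidate form remains live, which is the open-world behaviour: absence of a fact is not treated as its negation.</Paragraph>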
    <Paragraph position="98"> Now the user can choose to expand admin(A), and the only live choice is swallow. At that point the frontier is comments(diprox,swallow) with the constraint df(diprox,tablet). The author can then make choices for comments(diprox,swallow) that lead to zero or several instances of comment(diprox,swallow). At a certain point he will choose the nonrecursive expansion coms1, which will lead to an empty frontier, with the constraints df(diprox,tablet) and da(diprox,swallow).</Paragraph>
    <Paragraph position="99"> Rather than waiting for the user to point to the nonterminal he wants to expand next before computing its live choices, the system could obviously compute the live choices for all nonterminals on the frontier beforehand and perform the obligatory expansions without any input from the user, at a slightly higher computational cost. In this way, after the initial choice of diprox as the drug, the other steps of the authoring process would be done automatically, apart from the choice of how many (and which) comments to make, which would still remain the responsibility of the author.</Paragraph>
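    <Paragraph> One possible way to organize this eager strategy is sketched below in Python. It is a simplified illustration: the frontier is reduced to a dictionary from slot names to chosen values, the function live_choices is passed in as a parameter and stands for whatever liveness test the system uses, and the toy table of live choices is invented to match the diprox example. All names are hypothetical.

```python
def auto_expand(frontier, live_choices):
    """Fill in every undecided slot that has exactly one live choice.

    frontier: dict mapping a slot name to its chosen value, or None.
    live_choices: function (slot, current_state) returning a list of values.
    """
    decided = dict(frontier)
    changed = True
    while changed:  # a forced choice may make other slots forced in turn
        changed = False
        for slot in list(decided):
            if decided[slot] is None:
                options = live_choices(slot, decided)
                if len(options) == 1:
                    decided[slot] = options[0]
                    changed = True
    return decided


def toy_live_choices(slot, state):
    # Invented table: after choosing diprox, form and admin are forced,
    # while the number of comments is still up to the author.
    table = {
        "form": ["tablet"],
        "admin": ["swallow"],
        "comments": ["coms1", "coms2"],
    }
    return table[slot]


state = auto_expand(
    {"drug": "diprox", "form": None, "admin": None, "comments": None},
    toy_live_choices,
)
```

Slots with several live choices are left undecided, so in this toy run only the comments slot still requires a decision from the author.</Paragraph>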
    <Paragraph position="100"> Conclusion. In the course of the paper we have defined several notions, such as life/death issues in authoring processes, closed-world versus open-world authoring, and light document semantics. We have presented a formal approach to closed-world authoring that shows a correspondence between life/death problems and conjunctive Datalog queries, as well as a formal approach to open-world document authoring based on Description Logics. We have also sketched proofs of decidability for life/death issues in these different processes. Finally, we have shown how an open-world authoring context can be used for supporting a novel form of knowledge acquisition.</Paragraph>
  </Section>
</Paper>