File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/82/p82-1008_metho.xml
Size: 18,087 bytes
Last Modified: 2025-10-06 14:11:29
<?xml version="1.0" standalone="yes"?> <Paper uid="P82-1008"> <Title>Prince, E. \[1982\] &quot;The Simple Futurate: Not Simply Progrsslve Futurate Minus Progressive,&quot;</Title> <Section position="1" start_page="0" end_page="0" type="metho"> <SectionTitle> TRANSPORTABLE NATURAL-LANGUAGE INTERFACES: PROBLEMS AND TECHNIQUES </SectionTitle> <Paragraph position="0"/> </Section> <Section position="2" start_page="0" end_page="46" type="metho"> <SectionTitle> I OVERVIEW </SectionTitle> <Paragraph position="0"> I will address the questions posed to the panel from wlthln the context of a project at SRI, TEAM \[Grosz, 1982b\], that is developing techniques for transportable natural-language interfaces.</Paragraph> <Paragraph position="1"> The goal of transportability is to enable nonspeciallsts to adapt a natural-language processing system for access to an existing conventional database. TEAM is designed to interact with two different kinds of users.</Paragraph> <Paragraph position="2"> During an acquisition dlalogue, a database expert (DBE) provides TEAM with information about the files and fields in the conventlonal database for which a natural-language interface is desired.</Paragraph> <Paragraph position="3"> (Typlcally this database already exists and is populated, but TEAM also provides facillties for creating small local databases.) This dlalogue results in extension of the language-processlng and data access components that make it possible for an end user to query the new database in natural language.</Paragraph> <Paragraph position="4"> A major benefit of using natural language is that it shifts onto the system the burden of mediating between two views of the data--the way in which it is stored (the &quot;database view&quot;) and the way in which an end user thinks about it (the &quot;user's view&quot;). Basically, database access is done in terms of files, records, and fields, while natural-language expressions refer to the same information in terms of entities and relationships in the world.</Paragraph> <Paragraph position="5"> In my discussion, I will assume the use of a general grammar of English rather than a semantic grammar, and also that the interpretation of queries will pass through an intermediate stage in which a database-lndependent representation of the meaning of the query is derived before constructing the formal database query. This is because systems based on semantic grammars amalgamate i~formatlon about language, about the domain, ~ asout the database in ways that make it difficult to transfer those systems to new databases. I will use the term &quot;conceptual schema&quot; to refer to the internal representation of Pennsylvania.</Paragraph> <Paragraph position="6"> information about the entities in the domain of discourse and the relationships that can hold among them, 2 and &quot;database schema&quot; to refer to the encoding of information about the way concepts in the conceptual schema map onto the structures of the database. In addition, I will use the term &quot;logical form&quot; to refer to the representation of the literal meaning of an expression in the context of an utterance.</Paragraph> <Paragraph position="7"> The insistence on transportability (which distinguishes TEAM from previous systems such as LADDER \[Hendrlx et al., 1978\], LUNAR \[Woods, Kaplan, and Webber, 1972\], PLANES \[Waltz, 1975\], REL \[Thompson, 1975\], and CHAT \[Warren, 1981\]) entails two major consequences for the design of a natural-language interface. First, the database cannot be restructured to make the way in which it stores data more compatible with the way in which a user would pose his questions. Second, because the DBE cannot be expected to know about the internal structure of the conceptual schema and the database schema, these must be organized so that the information they encode about any particular database and its corresponding domain can be obtained systematically (and, therefore, automatically).</Paragraph> <Paragraph position="8"> These differences are crucial to any consideration of the issues before this panel.</Paragraph> <Paragraph position="9"> Although, for any partlcular database, it may be possible to handcraft solutions to each problem, such an approach is not viable for a transportable system. Handcraftlng requires expertise in computational linguistics, knowledge of the internal structures and algorithms used in an interface, and so forth--none of which the DBE can be expected to possess. In addition, interfacing to an existing conventional database introduces many problems caused by the difference between the database view and the end user's view. Many of these problems can be avoided if one is allowed to design the database as well as the natural-language system. However, given the prevalence of existing conventional databases, approaches that make this assumption are likely to have llmited applicability in the near future.</Paragraph> <Paragraph position="10"> Most of the issues the panel has been asked to address arise (or have analogues) in any 2 This schema is a restricted form of the standard AI knowledge base.</Paragraph> <Paragraph position="11"> application of natural-language processing. In the sections that follow, my objective in dlscusslng each of these issues will be to point out where I see the constraints of the database query task as simplifying the general problem and where, on the other hand, transportability (and the way in which database systems typically structure information and view the world) makes things more difficult. Inevitably, l will be raising at least as many questions as I answer.</Paragraph> </Section> <Section position="3" start_page="46" end_page="47" type="metho"> <SectionTitle> II AGGREGATES </SectionTitle> <Paragraph position="0"> It is useful to separate problems involving aggregates into two categories: (I) those t!mt involve mapping from natural-language to logical form, and (2) those that involve translating from logical form into a formal database query. The examples presented to the panel have elements of each of these.</Paragraph> <Paragraph position="1"> In addressing the question of logical form, I first want to note how similar &quot;how many&quot; and &quot;how much&quot; questions are to other degree questions (e.g., &quot;How tall is John?&quot;). Consider, for example, (I) James is old./ How old is James? (2) The department is big./ How big is the department? (3) (4) The department has many employees./ How many employees does the department have? The ship is heavy./ How heavy is the ship? (5) The ship is carrying much coal./ How much coal is the ship carrying? Hence, it seems that the logical forms for the queries ought to bear a close resemblance. In interpreting degree questions, the language-processing component of TEAM \[Gzosz et al., 1982a\], applies a hlgher-order degree operator to the predicate that underlies the adjective. For example, the logical form for &quot;How tall is John?&quot; would be (WHAT H (HEIGHT H) ((DEGREE TALL) JOHN H)) The problem in transferring this treatment to &quot;how many&quot; and &quot;how much&quot; questions is that while adjectives llke &quot;heavy&quot; are usually treated as predicates, &quot;many&quot; is usually treated as a quantifier. So, if &quot;how&quot; is treated by uniformly applying some kind of hlgher-ozdez degree operator, then that operator has to apply to both predicates and quantiflers. Another possibility would be to apply the degree operator to an entire fozmula, as in (WHAT H (HEIGHT H) ((DEGREE (TALL JOHN) H)) rather than Just to the head of the formula. Whether this can be made to work, however, depends on whether a satisfactory analysis can be provided when the formula consists of mole than Just a predicate and its arguments.</Paragraph> <Paragraph position="2"> The problem of an appropriate logical form for these questions is not affected by the need for transportability. However, transportability does make the problem of translating from logical form into a database query more difficult. Fields that store count totals, llke NUMBER-OF-EMPLOYEES, are semantically complex in much the same way as the CHILD-OF-ALUMNUS field (the predicate encoded by a count field can be defined in terms of a count operator and the domain entities that are to be counted), and they present similar problems for transportability and database access (see section 5). The question therefore (to which I do not have an answer) is whether this kind of semantically complex field is any simpler to handle than the more general case.</Paragraph> <Paragraph position="3"> In addition, some ways of storing information about aggregates in these semantically complex fields may require inferences to be drawn to answer ceztaln kinds of queries. For example, if the number of employees in a department must be calculated from the number of employees in each office of the department, answering queries about the number of employees in a department will require reasoning about the part/whole relationship between offices and departments and how the number of employees in a department depends on that relationship. A general treatment of such cases would require both the acquisition of information about the part/whole relationship Impllcltly encoded in the database, and the ability to infer that (in this case) the count for the whole is the sum of the counts for the parts. The need for drawing inferences arises with mass fields as well as with count fields. For example, consider a database of ships and their cargoes, with separate entries for the different kinds of cargo a ship is carrylng. Then an answer to &quot;How much cargo is the ship cazzylng?&quot; will require the same kind of totaling operation as does the query about the number of employees in the above example. It may be possible to handle the most straightforward cases of these phenomena by adding special purpose information (&quot;hacks&quot; to compensate for the lack of theorem-proving capabilltes) for each operator corresponding to a data access system aggregate function, specifying how it interacts with part/whole relationships (AVERAGE will work differently from TOTAL).</Paragraph> </Section> <Section position="4" start_page="47" end_page="47" type="metho"> <SectionTitle> III TIME AND TENSE </SectionTitle> <Paragraph position="0"> The context of database querying does not seem to make questions concerning time and tense any easier than they are for linguistics or philosophy in general; in fact, they are actually more difficult because of the extensional nature of the temporal information stored in a database.</Paragraph> <Paragraph position="1"> It does not appear useful, even in the database query context, to have different representations for sentences involving concepts related to points in time and those involving intervals. The same natural-language expressions about time may be used to refer to a given time as either a point or an interval. Consider, (6) How far did the Fox travel yesterday? (yesterday as an interval over which an event extends) <7) Who was the officer of the day yesterday? (yesterday as a point in a sequence of days) It is fairly easy to imagine databases against which each of these queries might be posed and, in each case, &quot;yesterday&quot; might correspond either to a single database entzy or to a set of entries spanning an interval. Furthermore, the same verb can be used to refer to activities in terms of points or intervals--e.g., (8) The ship is sailing to Naples.</Paragraph> <Paragraph position="2"> (interval) (9) The ship is sailing tomorrow. (point) --and the same event may be viewed as occurring during an interval or at some single point \[Moore, 1981\]. (See Prince \[1982\] for an interesting discussion of the differences between (9) and &quot;The ship sails tomorrow.&quot;) On the issue of interpolation, we should note that questions involving temporal attributes also involve at least one other attribute of an entity (e.g., its location). To handle adequately queries about times not explicitly represented in the database, such factors must he taken into account as the time scale over which an attribute changes (e.g., a ship's position changes more slowly than an airplane's), and whether or not the change is linear. In general, this requires mechanisms for reasoning about temporal relationships and complex events, mechanisms normally absent in database systems. Also note that, even when interpolation is possible, additional mechanisms are needed to handl- queries about times beyond the last zecord~d~ e+me. (I have been living in Philadelphia for the last four month , Out I will not be two months hence.) All this suggests that naive interpolation is likely to result in incorrect answers (entities may even have ceased to exist since the last data about them was recorded). I believe it is misleading to provide direct responses involving such interpolation, because the user has no way of knowing that the system's reasoning is only approximate, or knowing on what it has based its answer. If the natural-language interface isolates a user from the manner in which information is stored, it must compensate by furnishing sufficient information in its responses to allow the user to assess their validity. Of course, this is a more general issue than one concerning Just time, but the appeal of interpolatlon (as a simple solution) may mislead us into thinking we can provide the user with an answer that later reflection will reveal as worse than no answer at all.</Paragraph> <Paragraph position="3"> In an interface designed for a particular database, special purpose routines may be provided that take such factors as time scale into account. The problem is more difficult to deal with for a transportable natural-language interface, but two strategies appear possible. One is to provide the two values of the attribute being queried that correspond to times that bracket the time specified in an actual query. The second is to associate with each attribute-time pairing an interval oyez which the attribute value can be considered to be constant, as well as possibly a function for interpolating between values and extrapolating from them. The problem for transportability, then, is obtaining the ~equisite information from the DBE.</Paragraph> </Section> <Section position="5" start_page="47" end_page="48" type="metho"> <SectionTitle> IV QUANTIFYING INTO QUESTIONS </SectionTitle> <Paragraph position="0"> The problem of quantifying into questions may have a simpler solution in the database query environment than it does in general. Database queries usually seek an enumeration (as opposed to queries seeking a description, as in &quot;Which woman does every Englishman admire most? His mother.&quot; \[Engdahl, 1982\]). For such cases, it seems possible to analyze a question as a REQUEST to INFORM (an analysis done in \[Cohen and Perrault, 1979\] to allow planning of questions, taking into account plans and goals of both speakers and hearers), with REQUEST being the illocutionary-force operator. If this is done, a quantifier can outscope the INFORM without outscoplng the REQUEST. Thus, the logical form of &quot;Who commands each ship&quot; would be something like</Paragraph> </Section> <Section position="6" start_page="48" end_page="48" type="metho"> <SectionTitle> V SEMANTICALLY COMPLEX FIELDS </SectionTitle> <Paragraph position="0"> The predicate represented in a semantically complex field llke CHILD-OF-ALUMNUS typically has a definition in terms of simpler concepts, namely an existential quantifier and whatever entity is being quantified over (in this case ALUMNUS). In a nontranspoztable system, some of the variability of expression that these fields give rise to can be handled by enriching the conceptual schema appropriately (e.g., adding to it the class of alumnl). However, as the query &quot;Did either of John Jones's parents attend the college?&quot; illustrates, this by itself is not sufficient in general.</Paragraph> <Paragraph position="1"> In extreme cases, sophisticated deductive capabilities may be necessary to answer questions that can arise in connection with semantically complex fields. For example, the BLI~FILE database (to which LADDER provided an interface) has a field DOC that records whether or not a ship has a doctor on board. To answer a query like &quot;Is there a doctor within 200 miles of Philadelphia?&quot; requires not only repzesentlon of the connection between a positive value In the DOC field and the existence of a doctor, but also the ability to reason that, if a ship that has a doctor on board is within 200 miles of Philadelphia, then the doctor himself is within 200 miles of Philadelphia.</Paragraph> <Paragraph position="2"> An apparent precondition for the correct treatment of semantically complex fields is that the system should have a richer model of the domain than the model constituted by the database itself. Konollge \[1981\] suggests one possible approach to this in which a metatheory is employed to describe both the domain of discourse and the information the database contains. Axioms in the metalanguage are used to encode things llke the connection between the existence of an alumnus and a particular value in the CHILD-OF-ALLPMNUS field. It does not seem possible to handle a wide variety of semantically complex fields In a transportable system, unless the system is much richer than typical DB systems (in which case much more general knowledge acquisition schemes must be implemented, such as those proposed by Hendrix and Haas \[1982\], for example). ~owever, transportable systems can provide for a fairly wide range of fixed phrases corresponding to these fields \[Grosz et el, 1982b\]).</Paragraph> </Section> class="xml-element"></Paper>