<?xml version="1.0" standalone="yes"?>
<Paper uid="C92-4206">
  <Title>Multimodal Database Query</Title>
  <Section position="2" start_page="0" end_page="0" type="abstr">
    <SectionTitle>
Abstract
</SectionTitle>
    <Paragraph position="0"> The paper proposes a multimodal interface for a real sales database application. We show how natural language processing may be integrated with a visual, direct manipulation method of database query, to produce a user interface which supports a flexible form of query specification, provides implicit guidance about the coverage of the linguistic component, and allows more focused discourse reference.</Paragraph>
    <Paragraph position="1"> Introduction Recently there has been a burgeoning of interest in the combination of natural language processing with visual and gestural forms of communication. The range of research includes the interpretation of combined linguistic and diagrammatic input (e.g. Klein and Pineda, 1990), the generation of multimedia explanations (e.g. Wahlster et al., 1991), the integration of NLP with hypertext (Stock, 1991), and the combination of natural language input with pointing (e.g. Kobsa et al., 1986), menus (Tennant et al., 1983), and forms (Cohen et al., 1989). The rationale for most of this work is that since the different modes are best suited to expressing different kinds of information, the expressiveness of the communication can be increased by employing a combination of modes rather than one in isolation. At one end of the spectrum, this rationale has led to applications in which there is a clear dividing line between the function of the different modes; for example, in GRAFLOG (Klein and Pineda, 1990), the function of linguistic utterances like &quot;This line is a wall&quot; is to provide real-world interpretations for the parts of a line drawing. Our own work lies at the other end of the spectrum. We are interested in exploring the power of combining natural language and direct manipulation when the function and expressive power of the two modes are similar.</Paragraph>
    <Paragraph position="2"> The present paper describes this dual approach in the context of a specific, real database query application. We have integrated NLP with a visual, direct manipulation method of database query, in such a way that both query modes can express approximately the same range of queries. Our objective in this work is to explore the ways in which the co-presence of a direct manipulation interface improves the usability and flexibility of the natural language interface. This paper is concerned with the nature of our computational proposals rather than with empirical evidence about their utility. So far we have performed a user trial on the direct manipulation interface alone (Frohlich, 1991), and we intend to perform similar experiments with the combined interface and a stand-alone version of the NL interface.</Paragraph>
    <Paragraph position="3"> We start with an overview of our application area, and then summarise the pertinent features of our direct manipulation interface. The body of the paper illustrates how we have integrated NLP with this interface, with respect to the available vocabulary, the portrayal of user queries, and reference to past queries.</Paragraph>
    <Paragraph position="4"> Application Domain The target of our application is a relational database used by a large UK company to summarise the value of their product sales. The main users of the data are sales professionals and their secretarial assistants. To date, these users have had two routes of access to the data: they can retrieve fixed-format financial tables using a menu-based query system, or they can use a rather brittle natural language interface (supplied by a third-party vendor). The design of our system stems partly from users' experience of these existing interfaces. Our user interface is geared to supporting the following user requirements: it should be simple to use; it should support flexible-format views or &quot;reports&quot; on the data; and it should allow new queries to be composed with reference to past queries.</Paragraph>
    <Paragraph position="5"> Although our user interface is a prototype, and is not yet in use, we have done nothing to change the structure of the underlying relational database.</Paragraph>
    <Paragraph position="6"> The database summarises sales along three dimensions, or parameters: the product sold, the purchaser (in this case, retail outlets), and the time of sale. In conceptual terms, each dimension forms a hierarchy. For example, the leaves of the purchaser or &quot;customer&quot; hierarchy identify trading points (numeric identifiers corresponding to distinct physical retail stores); at the next level, these are grouped into trading concerns (corresponding to the familiar high-street names of retail chains); and at the top level the trading concerns are grouped into corporate concerns (corresponding to the public limited company which owns the chain). Similarly, the time hierarchy represents financial time periods from leaf &quot;bookweeks&quot; up to the financial year, and the product hierarchy represents specific product box sizes up to a general classification of market areas. [Actes de COLING-92, Nantes, 23-28 août 1992; Proc. of COLING-92, Nantes, Aug. 23-28, 1992]</Paragraph>
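The three-level customer hierarchy described above can be pictured as a small tree. The following is a minimal sketch only, not the system's actual data model: the class Node, the helper find, the corporate concern name, and the Walkers trading point numbers are all invented for illustration. Only the trading point numbers 181, 367 and 2717 are taken from the paper, where they are given as Andrews trading points.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """One value in a dimension hierarchy, tagged with its concept level."""
    level: str   # concept, e.g. "trading_concern"
    value: str   # value, e.g. "Andrews"
    children: list = field(default_factory=list)

    def leaves(self) -> list:
        """Return the leaf nodes at or below this node, left to right."""
        if not self.children:
            return [self]
        found = []
        for child in self.children:
            found.extend(child.leaves())
        return found


def find(root: Node, value: str) -> Optional[Node]:
    """Depth-first search for the first node carrying the given value."""
    if root.value == value:
        return root
    for child in root.children:
        hit = find(child, value)
        if hit is not None:
            return hit
    return None


# Hypothetical fragment of the customer hierarchy (see the note above on
# which values come from the paper and which are invented).
CUSTOMERS = Node("corporate_concern", "Example Holdings plc", [
    Node("trading_concern", "Andrews", [
        Node("trading_point", "181"),
        Node("trading_point", "367"),
        Node("trading_point", "2717"),
    ]),
    Node("trading_concern", "Walkers", [
        Node("trading_point", "9001"),
        Node("trading_point", "9002"),
    ]),
])

andrews = find(CUSTOMERS, "Andrews")
print([leaf.value for leaf in andrews.leaves()])
```

Expanding a trading concern to its leaves in this way is one plausible route to the extensional reading of an expression such as "Andrews trading points".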
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
Direct Manipulation Interface
</SectionTitle>
      <Paragraph position="0"> The direct manipulation interface represents each sales dimension by a visual domain model. For instance, the temporal domain model (see Figure 1) is depicted as a scrollable timeline, demarcated into hierarchical time periods. The model presents both the concepts (e.g. &quot;year&quot;, &quot;quarter&quot;) and values (e.g. &quot;1990&quot;, &quot;Q2-90&quot;) which are distinguished in the temporal dimension. The product and customer domain models are similarly displayed as hierarchical &quot;pick-[...] Users pose queries by constructing the format, or appearance, of the report they want to see, using a technique we have dubbed &quot;Query by Format&quot;. The present system supports three types of query in this fashion, providing: numeric summaries of sales with respect to specific parameters (e.g. the value of sales of Krunchy in June), lists of values related to the sales parameters (e.g. all trading points which bought Krunchy), and details on specific values (e.g. the telephone number of trading point 647). To simplify the exposition, the remainder of this paper will discuss only the first kind of query, requesting summaries of sales.</Paragraph>
      <Paragraph position="1"> Figure 2 shows a sales summary table which the user has created and then evaluated as a query against the database. It represents the sales of Krunchy for Oct-90 and for Nov-90, for the stores Andrews and Walkers. This table is created by first selecting &quot;Create a sales table&quot; from a menu, which produces a skeletal table structure without row or column headings. The user then specifies the headings by gesturally selecting value elements (at any level) from the domain models and dropping them into appropriate positions on the table. Once the user is satisfied with the format of the table, it is submitted to the system where it is interpreted as a query against the database, and the derived results filled in. The table therefore has a standard &quot;intersective&quot; semantics, in which each cell represents the summed total of sales with respect to its corresponding row and column constraints; for example, the top left-hand cell in Figure 2 represents the total value of sales of Krunchy in Oct-90 to Andrews.</Paragraph>
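The intersective reading of a table can be made concrete with a short sketch: each cell sums the sales that satisfy the table-wide constraints together with that cell's row and column constraints. The record layout, the function names matches and evaluate_table, and all sale values below are assumptions made for illustration; the paper does not give the database schema or any figures.

```python
from itertools import product

# Hypothetical sales records; the real data lives in a relational database.
SALES = [
    {"product": "Krunchy", "bmonth": "Oct-90", "trading_concern": "Andrews", "value": 120},
    {"product": "Krunchy", "bmonth": "Oct-90", "trading_concern": "Andrews", "value": 30},
    {"product": "Krunchy", "bmonth": "Nov-90", "trading_concern": "Walkers", "value": 75},
    {"product": "Crispo", "bmonth": "Oct-90", "trading_concern": "Andrews", "value": 999},
]


def matches(record, constraints):
    """True if the record agrees with every attribute/value constraint."""
    return all(record[attr] == val for attr, val in constraints.items())


def evaluate_table(sales, fixed, rows, cols):
    """Fill each cell with the summed sales for fixed + row + column constraints."""
    table = {}
    for (row_attr, row_val), (col_attr, col_val) in product(rows, cols):
        constraints = {**fixed, row_attr: row_val, col_attr: col_val}
        table[(row_val, col_val)] = sum(
            record["value"] for record in sales if matches(record, constraints)
        )
    return table


cells = evaluate_table(
    SALES,
    fixed={"product": "Krunchy"},
    rows=[("bmonth", "Oct-90"), ("bmonth", "Nov-90")],
    cols=[("trading_concern", "Andrews"), ("trading_concern", "Walkers")],
)
print(cells[("Oct-90", "Andrews")])
```

With these invented records, the cell for Oct-90 and Andrews holds the two matching Krunchy sales and ignores the sale of the other product.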
      <Paragraph position="2"> Each such report created by the user forms a distinct window on the screen, and is therefore subject to standard window management functions such as iconisation. Importantly, the user can return to an existing report at any stage and refine it to create a new view--by adding, expanding, deleting, or replacing report headings.</Paragraph>
      <Paragraph position="3"> Natural language processing is based on the Core Language Engine (CLE; Alshawi et al., 1989), which performs application-independent processing from string segmentation and morphological analysis through to quantifier scoping and discourse reference, and produces semantic logical forms as output. The CLE contains &quot;hooks&quot; for application-specific modules, and we have used these to augment the CLE with an application-specific lexicon, a set of rules for reference resolution in our domain, and a module for evaluating the logical forms against the relational database. The present coverage of the NL system is similar to the gestural interface, in terms of the subject of the query and the conceptual parameters of the query. So we can ask questions about the value of sales with respect to any of the three sales parameters (e.g. &quot;Show the sales of Krunchy 250g for Q3-90&quot;) and questions about the sales parameters themselves (e.g. &quot;What Andrews trading points are there?&quot;). It goes beyond the graphics in supporting certain forms of request which yield a simple textual response (such as yes/no questions). At present, natural language queries and direct manipulation queries have separate routes of access to the database, and we achieve the integration described below by translating between the two worlds at certain points in processing.</Paragraph>
      <Paragraph position="4"> The following sections present three ways in which we have explored the integration of natural language with the direct manipulation interface. In each case we argue that the integrated interface is superior to stand-alone, teletype-style natural language interaction. Vocabulary A well-known problem with natural language interfaces to databases is that the user may be uncertain about the conceptual scope of the database and the supported linguistic coverage (Hendrix, 1982).</Paragraph>
      <Paragraph position="5"> The graphical environment of our NL interface offers a partial solution to this problem, since the displayed domain models remind the user of the linguistically available parameters of a sale. In particular, each domain model communicates the range of sales-parameter concepts and values that can be referred to, and in doing so shows one way of expressing the related content nominals in linguistic input. For example, the temporal domain model indicates that the underlined forms in &quot;Get Krunchy sales for bweek 13-19 May-90&quot; are available as lexical expressions. (We also allow for synonyms in the NL, given the tendency to refer to &quot;bweek&quot;, say, as &quot;week&quot;.) Note that the domain models are an appropriate site for lexical reference, since they abstract away from the internal structure and content of the database in order to provide a user-oriented &quot;view&quot; of the data. For example, the temporal model depicts information from relational tables as a single hierarchy, and combines distinct database fields to form single values within this hierarchy which are meaningful to the end-user.</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
Representation of Queries
</SectionTitle>
      <Paragraph position="0"> The user's natural language dialogue with the system is displayed in a separate window in teletype form.</Paragraph>
      <Paragraph position="1"> Yes/no and &quot;how many&quot; questions elicit a simple textual response, whereas the response to other queries is a textual pointer (e.g. &quot;See table 13&quot;) to a report displayed elsewhere on the screen. The decision on presentation style is made by a set of rules indexed on the sentence form (e.g. WH-question, imperative) and the requested class of data.</Paragraph>
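A rule set indexed on sentence form and data class can be sketched as a simple lookup table. Everything in the sketch is an assumption: the form and class names, the particular rules, and the default are invented, since the paper says only that such rules exist and that yes/no and "how many" questions receive a textual answer while other queries receive a pointer to a report.

```python
TEXT, TABLE = "text", "table"

# Hypothetical rules keyed on (sentence form, requested class of data).
RULES = {
    ("yes_no_question", "sales"): TEXT,
    ("how_many_question", "parameter_values"): TEXT,
    ("wh_question", "parameter_values"): TABLE,
    ("imperative", "sales"): TABLE,
}


def choose_presentation(sentence_form, data_class, default=TABLE):
    """Look up the presentation style, falling back to a report."""
    return RULES.get((sentence_form, data_class), default)


def render_response(sentence_form, data_class, answer, table_id):
    """Return the text shown in the dialogue window."""
    if choose_presentation(sentence_form, data_class) == TEXT:
        return str(answer)
    return f"See table {table_id}"


print(render_response("imperative", "sales", None, 13))
print(render_response("how_many_question", "parameter_values", 3, None))
```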
      <Paragraph position="2"> Here we will consider those queries which request sales figures. Figure 3 shows a numeric summary table that has been produced in response to the input Show all sales in Nov-90 to Andrews trading points.</Paragraph>
      <Paragraph position="3"> This linguistically created report is a &quot;live&quot; graphi-[...]tics as if it had been constructed by direct manipulation. In fact, we can think of the table as a representation of the natural language query in the language of tabular queries. The table readily expresses the linguistic constraint &quot;in Nov-90&quot; with the tabular label &quot;Nov-90&quot;. The intensional expression &quot;Andrews trading points&quot; cannot be represented directly, since our tabular restrictions must be extensional values from the domain models and not intensional concepts.</Paragraph>
      <Paragraph position="4"> However, we can represent this intensional expression indirectly in terms of its semantically equivalent extension, i.e. the set of all trading points related to Andrews.</Paragraph>
      <Paragraph position="5"> To display a natural language query and database response in this form, we must therefore not only retrieve the correct values from the database, but also generate row and/or column labels which correctly define the values in the table. [1] In general, this is a non-trivial task which ultimately requires sensitivity to the given/new information structure of the input; our approach uses information derived only from the logical form of the query, and corresponding values from the database. In the current version of our system, table labels are generated by the module which evaluates CLE logical forms against the database, as it searches for sales values in accordance with the constraints of the query. For example, to find all sales in Nov-90 to Andrews trading points, it searches for all sales values such that the following constraints hold: [2]</Paragraph>
      <Paragraph position="7"> [1] An alternative approach is simply to generate the tabular constraints, and rely on tabular query processing to produce the sales totals. We have not explored this approach.</Paragraph>
      <Paragraph position="8"> [2] Note that the notation here has been simplified for the purposes of exposition.</Paragraph>
      <Paragraph position="9"> Here t is a free variable that will match any trading point value. Each time we find a matching sale value, we record, with this value, the corresponding values of the attributes bmonth, trading_concern, and trading_point. This results in a set of tuples of the form &lt;sale_value, bmonth, trading_concern, trading_point&gt;, such as:</Paragraph>
      <Paragraph position="11"> if 181, 367, and 2717 are the only Andrews trading points to have bought goods in November 1990. Such a set of tuples can then be transformed into a tree structure which removes the repetition of values apparent in the set of tuples. This tree structure corresponds to a correct labelling of the table, where each node represents a label.</Paragraph>
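The step from a flat set of tuples to a label tree can be sketched as follows. This is an illustration under stated assumptions, not the authors' algorithm: the function name build_label_tree, the nested-dictionary representation, and every sale value are invented, and the sketch sums sale values that share the same label path so that each leaf corresponds to one table cell.

```python
def build_label_tree(tuples):
    """Fold (sale_value, label_1, ..., label_n) tuples into nested dicts.

    Shared label prefixes collapse into a single branch, which removes the
    repetition of values; each leaf holds the summed sale value.
    """
    tree = {}
    for sale_value, *labels in tuples:
        node = tree
        for label in labels[:-1]:
            node = node.setdefault(label, {})
        leaf = labels[-1]
        node[leaf] = node.get(leaf, 0) + sale_value
    return tree


# Tuples of the form (sale_value, bmonth, trading_concern, trading_point).
# The sale values are hypothetical; the trading points come from the paper.
TUPLES = [
    (100, "nov-90", "andrews", 181),
    (250, "nov-90", "andrews", 367),
    (75, "nov-90", "andrews", 2717),
    (50, "nov-90", "andrews", 181),
]

print(build_label_tree(TUPLES))
```

Here the month and the trading concern each appear once as a node, and the three trading points appear as the labels beneath them.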
      <Paragraph position="12"> This approach extends to the representation of a class of more complex expressions involving negation, coordination and quantification. For example, under the wide-scope reading of &quot;sales to all trading concerns except Andrews and Walkers&quot;, we find all trading concerns such that there is a sale with the following constraints:</Paragraph>
      <Paragraph position="14"> Here we generate a set of tuples of the form &lt;sale_value, trading_concern&gt;, where trading_concern will vary over all concerns except the excluded stores.</Paragraph>
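The exclusion reading can be illustrated with a small filter over tuples of sale value and trading concern. The function name, the concern names Brights and Carters, and all values are hypothetical, and summing per concern (to give one cell for each remaining concern) is an assumption about how the tuples would be turned into a table.

```python
def sales_except(sales, excluded):
    """Return (summed_sale_value, trading_concern) for non-excluded concerns."""
    totals = {}
    for sale_value, concern in sales:
        if concern not in excluded:
            totals[concern] = totals.get(concern, 0) + sale_value
    # Sort by concern name so the resulting labels have a stable order.
    return [(total, concern) for concern, total in sorted(totals.items())]


# Hypothetical (sale_value, trading_concern) pairs.
SALES_BY_CONCERN = [
    (100, "Andrews"),
    (40, "Brights"),
    (60, "Walkers"),
    (25, "Brights"),
    (10, "Carters"),
]

print(sales_except(SALES_BY_CONCERN, {"Andrews", "Walkers"}))
```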
      <Paragraph position="15"> As an example of a reading which we cannot represent in a tabular query, consider &quot;Total the sales for Jan-90 and Feb-90&quot;. Here we can express the reading in which two totals are required, mapping to a table with one cell for each month. But we cannot express the reading where the user would like to see a single cell table, corresponding to the summed sales of January and February, because such a query cannot be specified in the tabular language. Hence, this reading is blocked, because the rules which transform the set of tuples {&lt;sale_value, bmonth&gt;} into a tree structure do not allow distinct values (i.e. jan-90 and feb-90) to be represented by a single node in the label tree.</Paragraph>
      <Paragraph position="16"> Hence it is not possible to represent all natural language queries in our simple intersective, tabular language, because the former can be much more expressive. However, our interface approach alleviates this problem to a good extent, since tabular sales summaries are just one of a variety of gestural query devices at our disposal (i.e. those mentioned earlier in the paper) to express the communicative content of a natural language query, and we can add others as the need arises. None of our gestural query methods has the expressive power of a relational query language such as, say, QBE (Zloof, 1975); rather, we have created a set of graphical access methods tailored to our target users' needs, which strike a balance between expressive power and ease of use. We decided on the range of our query devices by analysing the transcripts of the users' real sessions with the existing natural language front-end, in addition to other forms of analysis, such as interviews with target users. Given that natural language and direct manipulation both yield the same tabular output, what is the advantage of supporting two modes of query, rather than one? First, the user can build up a table using whichever mode or combination of modes is most productive. By combining the presentation of gestural and linguistic queries, a linguistically generated report can be refined and extended by the direct manipulation operations described earlier. In the following section we will see how natural language query can extend an existing table, completing the circle of mixed-mode dialogue. Many of the distinguishing and productive features of natural language documented by Walker (1989), such as coordination, negation and quantification, can be beneficially applied to the present task.
So a user may start a query table with the request &quot;Find sales for the first week in every month&quot;, exploiting the rich quantificational structure of English to swiftly generate a set of labels that would take a good many point-and-click actions.</Paragraph>
      <Paragraph position="17"> The user may then wish to further parameterise this query with certain products which are best selected visually (perhaps because their spelling is difficult to remember).</Paragraph>
      <Paragraph position="18"> Second, our user studies in this and other domains have shown that there are differences in the preferences of individual users. Some users simply prefer the feel of, say, natural language interaction, for reasons which are difficult to explicate with theoretical factors such as those above. Here the user may issue even gesturally simple commands like &quot;Open&quot; from the linguistic command line.</Paragraph>
      <Paragraph position="19"> Reference to Past Queries Given the exposition so far, we can construct a new table with either natural language or gestures, but can only modify an existing table using direct manipulation. To complete this picture of multimodal dialogue, we must also allow linguistic queries to refer to past tables and from them specify modified versions.</Paragraph>
      <Paragraph position="20"> In our application domain, we expect users to have several reports open on their screen at any one time, and identifying an individual table solely in terms of the content of a referring expression can be arduous, if not impossible. One option is to name the report using its unique identifier. In common with many other systems (e.g. Kobsa et al., 1986), we also allow the user to refer to an object by pointing to it. Our present system only allows one such deictic reference per sentence, and behaves as follows. At any point, the user can click on a button on a table to make it the &quot;context&quot; for the next linguistic query. If this action occurs, an anaphoric referring expression, like &quot;these Andrews sales&quot; in &quot;Show these Andrews sales as a bar chart&quot;, is then taken to refer directly to the contextual table, assuming that the content of the expression (in this case, Andrews sales) does not contradict the content of the table. However, if the context button is pressed and then followed by the use of a definite noun phrase, as in &quot;Show the sales for Jan-90&quot;, then the table is seen as providing the universe of reference (i.e. the set of sales specified by the table) for &quot;the sales for Jan-90&quot;, rather than the referent itself. In this case, then, the query will yield a new table which combines the constraints of the contextual table with the Jan-90 constraint provided by the definite NP.</Paragraph>
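The two behaviours of the context button can be sketched as a small resolution function. The representation of a table as a mapping from attribute to a set of allowed values, the function names, and the use of set intersection to combine constraints are all assumptions made for illustration; the paper states only that a demonstrative refers to the contextual table when its content does not contradict the table, and that a definite noun phrase yields a new table combining both sets of constraints.

```python
def contradicts(table, np_constraints):
    """True if the noun phrase names a value that the table rules out."""
    for attr, value in np_constraints.items():
        allowed = table.get(attr)
        if allowed is not None and value not in allowed:
            return True
    return False


def resolve_with_context(context, determiner, np_constraints):
    """Resolve a referring expression against the table chosen as context."""
    if determiner == "these":
        # Demonstrative: refer to the contextual table itself, if compatible.
        if contradicts(context, np_constraints):
            return None
        return ("refer", context)
    if determiner == "the":
        # Definite NP: the table is the universe of reference, so narrow it.
        combined = {attr: set(values) for attr, values in context.items()}
        for attr, value in np_constraints.items():
            if attr in combined:
                combined[attr] = combined[attr].intersection({value})
            else:
                combined[attr] = {value}
        return ("new_table", combined)
    raise ValueError("unsupported determiner: " + determiner)


context_table = {"trading_concern": {"Andrews"}, "product": {"Krunchy"}}
print(resolve_with_context(context_table, "these", {"trading_concern": "Andrews"}))
print(resolve_with_context(context_table, "the", {"bmonth": "Jan-90"}))
```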
      <Paragraph position="21"> For completeness, the primary objects arising in the discourse are tracked using the CLE's discourse model, which is based on a salience-ranking view of discourse reference. We track all tables that the user has created and evaluated, whether through direct manipulation or natural language, rather than just those arising from the linguistic dialogue. If a sales table is uniquely salient (because, say, it was the most recently created) in the discourse model, then an anaphoric expression such as &quot;these sales&quot; will be taken as referring to this table (without the need for pointing), and the query &quot;Which of these sales is for Walkers?&quot; will accordingly produce a new table extended with the label &quot;Walkers&quot;. [3] In the future we intend to experiment with schemes where more general gestural actions affect the salience of objects in the discourse model. For example, &quot;lowering&quot; or iconising table windows on the screen could reduce their salience rating, whereas &quot;raising&quot; them would increase it. Although this is an over-simplified view of how to track the user's focus of attention, such schemes would give the user the potential for more explicit control over the working set of objects available for reference. [4] This scheme, and our implemented treatment of multimodal reference, therefore support a flexible form of discourse reference which is harder to attain in teletype-style linguistic dialogue.</Paragraph>
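A salience-ranked table store of the kind discussed above might look like the following sketch. It is not the CLE discourse model: the class, its method names, and the numeric boost and penalty are invented, and the window-raising and iconising operations model the scheme the paper proposes as future work rather than anything it reports as implemented.

```python
class DiscourseModel:
    """Toy salience ranking over the tables the user has created."""

    def __init__(self):
        self._salience = {}

    def create(self, table_id):
        """A newly created table becomes the most salient object."""
        top = max(self._salience.values(), default=0.0)
        self._salience[table_id] = top + 1.0

    def raise_window(self, table_id, boost=0.5):
        """Raising a window increases the salience of its table."""
        self._salience[table_id] += boost

    def iconise(self, table_id, penalty=0.5):
        """Lowering or iconising a window reduces the salience of its table."""
        self._salience[table_id] -= penalty

    def uniquely_salient(self):
        """Return the single most salient table, or None on a tie."""
        if not self._salience:
            return None
        top = max(self._salience.values())
        winners = [t for t, s in self._salience.items() if s == top]
        return winners[0] if len(winners) == 1 else None


model = DiscourseModel()
model.create(1)
model.create(2)
print(model.uniquely_salient())
```

In this sketch an expression such as "these sales" would resolve without pointing only when uniquely_salient returns a table; a tie would call for an explicit pointing action.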
      <Paragraph position="22"> Conclusion The previous sections have explored the consequences of a thorough integration of a gestural method of query and natural language query in the context of a specific database application. We considered the case where the coverage of the two styles is similar, in terms of the range of expressible queries, and have demonstrated that several benefits accrue from this integration. First, by translating a user's NL query into a graphical query, we support a flexible approach to the specification of the query, in which the user can employ whichever combination of modes is best suited to the &quot;micro&quot; tasks involved in query specification. Second, the visual interface gives implicit guidance to the user as to the coverage of the natural language interface. Third, the use of direct manipulation to focus discourse reference offers a more flexible dialogue structure than found in a pure NL interface. In the future it would be profitable to empirically assess these present implemented proposals, and investigate further computational issues, such as the generation of linguistic descriptions of gestural actions.</Paragraph>
      <Paragraph position="23"> [3] Currently, language can add but not alter table labels, as would be required for the elliptical reading in which Walkers replaces an existing store name. [4] Cohen et al. (1989) advocate a similar technique in which the user can directly manipulate the visually displayed tree structure of the discourse. However, in our proposal the discourse structure is inherent in the visual layout of the screen, without the presentation of meta-level information about the dialogue.</Paragraph>
    </Section>
  </Section>
</Paper>