<?xml version="1.0" standalone="yes"?>
<Paper uid="A88-1001">
  <Title>The Multimedia Articulation of Answers in a Natural Language Database Query System</Title>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
2 System overview
</SectionTitle>
    <Paragraph position="0"> The system described here functions as a conversational human/computer interface to database query systems. It consists of a natural language front end and a module which articulates multimedia answers.</Paragraph>
    <Paragraph position="1"> The system accepts well-formed strings as input; these sentences are interpreted by an HPSG-based parser \[PS87\] which produces a parse tree. After further processing by a semantics module, a pragmatics processor \[BFP87\] and a disambiguator, a logical formula in the language NFLT \[CP85\] is produced. This formula is transduced into a database query. Two database query formats are currently supported: a frame-based representation language, HPRL, and the standard relational database query language, SQL.</Paragraph>
    <Paragraph position="2"> Answers returned from the database are then packaged appropriately by the articulator for presentation to the user.</Paragraph>
    <Paragraph position="3"> The two database applications currently supported are a database of people and equipment (a subset of which we have proposed as a natural language evaluation test suite \[FNSW87\]), and a database of paintings by 19th century Dutch artist Vincent Van Gogh and his contemporaries. The latter database was based on the index to a commercially available videodisc \[Nim82\] and augmented from other sources.</Paragraph>
    <Paragraph position="4"> Both applications can be run on workstations configured with or without multimedia output devices.</Paragraph>
  </Section>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3 Database answer format
</SectionTitle>
    <Paragraph position="0"> The driver of a database query application (i.e. the domain dependent part of the system) is responsible for returning answers in a list format which consists of a keyword specifying the type of the answer, followed by the answer itself. The answer types expected by the articulator are boolean, number, item, set, quantity, and table.</Paragraph>
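    <Paragraph> The answer list format described above can be sketched as a small data structure. This is a minimal illustration, not the system's own code: the names ANSWER_TYPES, Answer and split_answer are hypothetical, and a Python list stands in for the list format that the database driver returns.

```python
from typing import NamedTuple

# Answer types the articulator expects, as listed in the text.
ANSWER_TYPES = {"boolean", "number", "item", "set", "quantity", "table"}


class Answer(NamedTuple):
    kind: str      # type keyword, e.g. "number"
    payload: list  # everything that follows the keyword


def split_answer(answer_list: list) -> Answer:
    """Separate the leading type keyword from the rest of an answer list."""
    if not answer_list:
        raise ValueError("empty answer list: the database returned nothing")
    kind = str(answer_list[0]).lower()
    if kind not in ANSWER_TYPES:
        raise ValueError(f"unknown answer type: {kind}")
    return Answer(kind, list(answer_list[1:]))


print(split_answer(["NUMBER", 4]))  # Answer(kind='number', payload=[4])
```

Keeping the keyword separate from the payload lets later stages choose a presentation from the type alone.</Paragraph>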
    <Paragraph position="1"> In deciding how to package a response, the articulator uses the answer type along with additional information provided by the parser which identifies the illocutionary act of a query as imperative, declarative, yes/no question, or wh-question. An answer is presented textually as a single phrase, as a complete sentence which parallels the user's query, or as a table. In addition, depending on answer type and the system's hardware configuration, an answer may include videodisc images, text-to-speech, icons and maps. While a user can request answers in a particular medium via menus, a default strategy is in place which yields a fairly satisfying style of human/computer interaction.</Paragraph>
  </Section>
  <Section position="5" start_page="0" end_page="2" type="metho">
    <SectionTitle>
4 Text answers
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="0" end_page="2" type="sub_section">
      <SectionTitle>
4.1 Style
</SectionTitle>
      <Paragraph position="0"> Questions and answers are a common kind of adjacency pair in human language use. The preferred style of an answer is often elliptical and shows parallelism with the surface syntactic structure of the preceding question \[CC77\]. In addition, lexical choice in the answer is constrained by that in the question.</Paragraph>
      <Paragraph position="1"> An answer which is articulated using different lexical entries than its projecting question may lead the user to infer that the system is making a distinction when it is in fact only using a synonym.</Paragraph>
      <Paragraph position="2"> Although elliptical answers may be the norm in human/human conversation, the articulator described here defaults to &quot;verbose mode&quot;; it responds to most queries with complete sentence answers. The motivation for this approach arose when we noticed that shorter answers were unsatisfying in certain situations. When additional textual material intervenes on the user's screen after the input query is typed in and before the answer appears, and in other cases where the user is distracted or not watching the screen when the textual answer arrives, a short answer takes on something of the character of a non sequitur. This problem manifested itself in an early version of our system, in which users sent queries over the network via electronic mail to a single natural language server that in due time mailed its responses back, and also in the current system, which returns most answers in a few seconds but can be operated in a mode which prints modular timing and status information during processing. Even more unsatisfying was the articulation of answers using text-to-speech hardware. Generated speech is often hard for users to understand \[TRC84\] and in our system, short answers delivered this way often failed even to attract a user's attention as information-bearing. To echo the query audibly seemed confusing; what was needed was the capability to frame the answer in a complete sentence based on the query. The final impetus for the verbose articulator was our desire to approximate some of the effects that real natural language generation capability might provide in a question-answering human/computer interface, before committing resources to a full-scale natural language generation effort.</Paragraph>
      <Paragraph position="3"> In verbose mode, a sentential answer consists simply of a string derived from the formatted database answer with constituents of the user's original query wrapped around it. Articulation achieves the dual purposes of satisfying the user's request for information while preserving a conversational style of interaction (figure 1). It is interesting to compare these answers with the kind of paraphrasing capacity that one finds in some other systems which are commercially available (figure 2).</Paragraph>
      <Paragraph position="4"> To paraphrase a user's query in a form that reflects the actual database access method (figure 2) can be extremely helpful in identifying misinterpretations of the query. However, that approach may interfere with natural interaction by insisting that the user confirm his or her every conversational move. Furthermore, whether or not the system's interpretation of what the user meant by the query is a correct mapping onto the database, the user is forced to reformulate his or her question in a program-like or logical form. Such an interface imposes a significant cognitive load on the user. Presumably, a central motivation for providing a natural language interface to a database is to avoid forcing the user to use a foreign language. This strategy pays homage to Grice's maxim of manner, &quot;avoid obscurity of expression&quot;. On the other hand, the argument has been made that separate, non-equivalent representations providing different views of the world should be maintained by the system \[Spa83\]; each of these views should be available to the user at appropriate times. Thus logical paraphrases, desirable in establishing initial system credibility, should be available upon specific request by the user.</Paragraph>
      <Paragraph position="5"> Figure 1: User: Who has a terminal? System: DAN FLICKINGER HAS A TERMINAL.
Figure 2: User: Who has a terminal? System: Shall I do the following? Create a report showing the full name and the manager and the equipment from the forms on which the equipment</Paragraph>
    </Section>
    <Section position="2" start_page="2" end_page="2" type="sub_section">
      <SectionTitle>
4.2 &quot;Namely&quot; answers
</SectionTitle>
      <Paragraph position="0"> Grice's maxim of quantity for cooperative communication is a reminder that it is frequently desirable to provide more information in an answer than was literally requested. For example, when a user asks &quot;Are there any secretaries?&quot; the best answer may be not &quot;Yes&quot;, but &quot;Yes - namely, X, Y, and Z&quot; (where X, Y, and Z are the names of the secretaries). Several question-answering systems have addressed issues of this sort \[WMJB83\] \[WJMM82\]. While our system does not explicitly model the user's goals or know anything about indirect speech acts, it provides extended answers to some queries via a list containing the keyword namely, which appears as the last item in the answer list passed to the articulator (figure 3). Extended answer lists are constructed as follows. When an answer is of type number and its cardinality is below a certain threshold, or else when it is both of type boolean and affirmative, the articulator makes an additional query to the database which returns information for constructing the &quot;namely&quot; answer. This additional information is combined with the short answer to the user's original query, to create an extended answer. In this way we attempt to comply with Grice's maxims of manner and quantity: to &quot;be brief&quot; and to &quot;make your contribution as informative as is required&quot;.</Paragraph>
      <Paragraph position="1"> Figure 3: Sentence: How many employees are there?
Answer list: (NUMBER 4 (NAMELY {abrams} {chiang} {devito} {browne}))
Articulated: THERE ARE 4 EMPLOYEES -NAMELY, IRA ABRAMS, LYN CHIANG, KAT DEVITO, AND DEREK BROWNE.</Paragraph>
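      <Paragraph> The rule for building an extended answer list can be sketched as follows. This is only an illustration under stated assumptions: the threshold value is not given in the text, so NAMELY_THRESHOLD is a made-up number, and fetch_names is a hypothetical stand-in for the additional database query.

```python
NAMELY_THRESHOLD = 5  # hypothetical cutoff; the text does not state the value


def extend_answer(answer, fetch_names, threshold=NAMELY_THRESHOLD):
    """Append a NAMELY sublist when the rule described in the text applies.

    answer      -- list such as ["NUMBER", 4] or ["BOOLEAN", True]
    fetch_names -- stand-in for the extra database query; returns names
    """
    kind, value = answer[0], answer[1]
    small_number = kind == "NUMBER" and threshold > value
    affirmative = kind == "BOOLEAN" and value is True
    if small_number or affirmative:
        return answer + [["NAMELY"] + list(fetch_names())]
    return answer


names = lambda: ["abrams", "chiang", "devito", "browne"]
print(extend_answer(["NUMBER", 4], names))
# ['NUMBER', 4, ['NAMELY', 'abrams', 'chiang', 'devito', 'browne']]
```

The NAMELY sublist is appended as the last item, which matches the position the text gives for it in the answer list.</Paragraph>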
    </Section>
    <Section position="3" start_page="2" end_page="2" type="sub_section">
      <SectionTitle>
4.3 Verbose mode
</SectionTitle>
      <Paragraph position="0"> Verbose mode works as follows. Initially, a short answer string is created from the formatted list that the database returns. First, the type keyword is stripped off the answer list. Depending on the type, the remaining short answer list is transformed into a string which is a textual phrase consisting of one of the following: a name or names (for type set or item the database is queried and returns appropriate nouns or proper names), a string containing an integer (for type number), a string containing a number followed by units of measure (for type quantity), or the strings &quot;yes&quot; or &quot;no&quot; (for type boolean). Set answers are expanded into coordinated noun phrases with appropriate punctuation. If the type is table, a table is produced.</Paragraph>
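      <Paragraph> The conversion from a typed answer list to a short phrase might look roughly like this. The sketch makes several assumptions that the text does not confirm: set members follow the keyword as a flat list, a quantity is stored as an amount followed by a unit, and lookup_name is a hypothetical stand-in for the database query that supplies nouns or proper names. Table answers are left out.

```python
def coordinate(names):
    """Join names into a coordinated noun phrase: A / A and B / A, B, and C."""
    names = list(names)
    if len(names) == 0:
        return ""
    if len(names) == 1:
        return names[0]
    if len(names) == 2:
        return names[0] + " and " + names[1]
    return ", ".join(names[:-1]) + ", and " + names[-1]


def short_answer(answer_list, lookup_name=str):
    """Turn a typed answer list into a short textual phrase."""
    kind, rest = answer_list[0].lower(), answer_list[1:]
    if kind == "boolean":
        return "yes" if rest[0] else "no"
    if kind == "number":
        return str(rest[0])
    if kind == "quantity":  # assumed layout: amount, then unit of measure
        return f"{rest[0]} {rest[1]}"
    if kind == "item":
        return lookup_name(rest[0]) if rest else ""
    if kind == "set":
        return coordinate(lookup_name(x) for x in rest)
    raise ValueError(f"no short phrase for answer type {kind!r}")


print(short_answer(["SET", "Ira Abrams", "Lyn Chiang", "Kat Devito"]))
# Ira Abrams, Lyn Chiang, and Kat Devito
```

An empty set or item yields the empty string here, which leaves room for a later step to substitute a more suitable phrase.</Paragraph>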
      <Paragraph position="1"> In constructing the short answers to wh-questions, some simple additional heuristics are used. First, if the short answer string was derived from a null set or null item, the answer is converted from the empty string to an appropriate string: &quot;nowhere&quot; if the wh-question word is &quot;where&quot;, &quot;never&quot; for &quot;when&quot;, &quot;nobody's&quot; for &quot;whose&quot;, and either &quot;none&quot; or else &quot;no&quot; plus the string corresponding to the modified NP head for &quot;which&quot;, &quot;what&quot; or &quot;how many&quot; phrases. Otherwise, when the answer is not an empty set and the wh-question word is &quot;whose&quot;, &quot;'s&quot; is appended to the answer. When &quot;whose&quot; modifies the head of a noun phrase, the noun phrase is appended to the answer.</Paragraph>
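      <Paragraph> These wh-question heuristics amount to a small lookup plus a possessive rule. The sketch below is a hedged reading of the paragraph above; the function names and the np_head parameter (the noun phrase head modified by the wh-word, if any) are hypothetical.

```python
def null_answer(wh_word, np_head=None):
    """Replacement for an empty short answer, keyed on the wh-question word."""
    fixed = {"where": "nowhere", "when": "never", "whose": "nobody's"}
    wh = wh_word.lower()
    if wh in fixed:
        return fixed[wh]
    if wh in ("which", "what", "how many"):
        return "no " + np_head if np_head else "none"
    return ""  # the text gives no rule for other wh-words


def fix_short_answer(short, wh_word, np_head=None):
    """Apply the empty-answer and possessive heuristics to a short answer."""
    if short == "":
        return null_answer(wh_word, np_head)
    if wh_word.lower() == "whose":
        short = short + "'s"
        if np_head:  # "whose" modifies the head of a noun phrase
            short = short + " " + np_head
    return short


print(fix_short_answer("", "which", "terminals"))                  # no terminals
print(fix_short_answer("Dan Flickinger", "whose", "terminal"))     # Dan Flickinger's terminal
```
</Paragraph>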
      <Paragraph position="2"> Then, once the short answer has been produced, if the query is not an imperative (and the answer is not a table), the input query's parse tree representation is transformed into a template with which to frame the short answer. Four functions traverse the parse tree and return strings corresponding to constituents from the input query: these constituents are subject, auxiliary verb (if there is one), main verb phrase, and preposition (if the wh-question word is within a prepositional phrase or fills a trace in one).</Paragraph>
      <Paragraph position="3"> An end-of-sentence string is created which contains, simply, terminating punctuation, or else an expanded phrase consisting of &quot;namely,&quot; followed by a coordinated noun phrase with appropriate punctuation.</Paragraph>
      <Paragraph position="4"> This expanded phrase is constructed whenever a short to medium-length namely list is available at the end of an answer list, as shown in figure 3.</Paragraph>
      <Paragraph position="5"> Finally, the verbose answer string is constructed using one of two strategies. If the wh-question word is in subject position in the query, the constituents are positioned in the answer as follows (the items in parentheses may or may not be present): answer (aux-verb) (main-verb-phrase) (preposition) end-of-sentence. If the wh-question word is in non-subject position, the positioning is: subject (aux-verb) main-verb-phrase (preposition) answer end-of-sentence.</Paragraph>
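      <Paragraph> The two ordering strategies can be expressed as a short function. This is a sketch only: the constituents are passed in as plain strings rather than extracted from a parse tree, the function name and parameters are hypothetical, and the upper-casing simply imitates the style of the example answers shown in the figures.

```python
def verbose_answer(answer, wh_in_subject, subject="", aux="", vp="",
                   prep="", end="."):
    """Order the query constituents around the short answer.

    Empty strings stand for optional constituents that are absent.
    """
    if wh_in_subject:
        parts = [answer, aux, vp, prep]
    else:
        parts = [subject, aux, vp, prep, answer]
    sentence = " ".join(p for p in parts if p)
    return sentence.upper() + end


# "Who has a terminal?" with the wh-word in subject position:
print(verbose_answer("Dan Flickinger", True, vp="has a terminal"))
# DAN FLICKINGER HAS A TERMINAL.
```

For a non-subject wh-word, a made-up call such as verbose_answer("building 3", False, subject="the printer", aux="is", vp="located", prep="in") yields THE PRINTER IS LOCATED IN BUILDING 3.</Paragraph>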
      <Paragraph position="6"> If the query is a declarative or a yes/no question, a boolean answer results. When a boolean answer is affirmative, the string &quot;yes,&quot; with the modified input string appended, is articulated. For negative boolean answers, if the input string contains an auxiliary verb, the following sequence is articulated: &quot;No,&quot; subject aux-verb &quot;not&quot; main-verb-phrase end-of-sentence (figure 4). If there is no auxiliary verb in the negative answer, the canned phrase &quot;No, it is not true that,&quot; with the original input string appended, is articulated. In addition, a some/any transformation is applied to yes/no questions. &quot;Any of&quot; is replaced by &quot;some of&quot; when the answer is affirmative and by &quot;none of&quot; when it is negative. If the input query contains an auxiliary verb and the word &quot;any&quot; without &quot;of&quot;, &quot;any&quot; is likewise replaced by &quot;some&quot; or &quot;no&quot; (figure 5). If the constructed answer template contains successive double negatives (as might result from a query containing a negation), these are removed.</Paragraph>
      <Paragraph position="7"> Finally, contrast the situation where the answer list is (BOOLEAN NIL) with the one where the answer list is simply NIL (which means the database failed to return an answer). In the latter case, the system answers &quot;I don't know whether&quot; with the modified input query appended (figure 6).</Paragraph>
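      <Paragraph> The boolean templates and the some/any substitution can be sketched as two small functions. Several things here are assumptions: None stands in for NIL, the statement argument is the input query already recast as a declarative string, and the text does not spell out how the substitution interacts with the negative template or how double negatives are removed, so the two functions are kept separate and neither step is attempted.

```python
import re


def some_any(text, affirmative):
    """Replace 'any of' and bare 'any' according to the polarity of the answer."""
    of_word, bare = ("some of", "some") if affirmative else ("none of", "no")
    text = re.sub(r"\bany of\b", of_word, text, flags=re.IGNORECASE)
    return re.sub(r"\bany\b", bare, text, flags=re.IGNORECASE)


def boolean_answer(answer_list, statement, subject="", aux="", vp=""):
    """Frame a yes/no answer from an answer list such as ["BOOLEAN", True]."""
    if answer_list is None:  # NIL: the database returned no answer at all
        return "I don't know whether " + statement + "."
    if answer_list[1]:
        return "Yes, " + statement + "."
    if aux:
        return "No, " + " ".join([subject, aux, "not", vp]) + "."
    return "No, it is not true that " + statement + "."


print(some_any("there are any secretaries", False))  # there are no secretaries
print(boolean_answer(None, "Dan Flickinger has a terminal"))
# I don't know whether Dan Flickinger has a terminal.
```

Distinguishing None from a false boolean keeps a database failure from being reported as a negative answer.</Paragraph>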
      <Paragraph position="8"> The style of the articulator's verbose responses, while somewhat quaint, appears cooperative because the answer is delivered using the same lexical and syntactic forms that the user chooses in the query. Of course, this technique of wrapping the query around the answer works only in very simple question-answering applications, where the system has little of its own to say. Failure in the form of ungrammatical answers to wh-questions sometimes occurs due to lack of agreement; rather than extend the verbose articulator any further, it seems a better strategy to simply detect those cases and suppress an ungrammatical verbose answer in favor of a short one. Pragmatic failures that are still syntactically well-formed may also occur, particularly in negative boolean answers and empty set answers; we have not arrived at a consistently successful strategy for detecting and treating presuppositional failures (figure 7). Our implementation also does not take into account syntactic constraints on given/new information in framing the answer in the query. Despite these limitations, the appeal of verbose articulation argues for integrating a real generation capability with a natural language interface to database query.</Paragraph>
    </Section>
  </Section>
  <Section position="6" start_page="2" end_page="2" type="metho">
    <SectionTitle>
5 Multimedia
</SectionTitle>
    <Paragraph position="0"> While the articulator always manages to produce some sort of textual answer, it is often desirable to respond with an answer in a different medium (figure 8). Visual images from a videodisc can be displayed whenever item or set answers are associated with videodisc frames in the database, and also whenever an imperative is used to explicitly request images.</Paragraph>
    <Paragraph position="1"> The articulator consults a module called circus which contains the drivers and methods pertaining to the videodisc player and the text-to-speech hardware.</Paragraph>
    <Paragraph position="2"> This module queries the database application to discover whether any entities in the answer list can be displayed as videodisc images. These images are represented and accessed by videodisc frame numbers which are stored in the database in SQL tables or in HPRL slots.</Paragraph>
    <Paragraph position="3"> When the system is configured with the text-to-speech generator and the items in a set answer are associated with videodisc images, the entire textual answer is displayed first. Then a synchronizing function in circus articulates the items in the set by displaying the appropriate image on the video monitor and speaking the corresponding items, one at a time.</Paragraph>
    <Paragraph position="4"> Thus the user hears the name of an item spoken immediately after it comes up on the video monitor; videodisc images are displayed for a few seconds each. We have not synchronized the textual answers with the videodisc answers, since these media are displayed on two separate screens at somewhat different rates and it would be difficult for a user to attend simultaneously to both. Laser videodiscs in CAV format (constant angular velocity) advertise fast, random access to still images, yet with most videodisc players there is some time cost to searching for frames on a disc and for changing search direction. We minimize this cost by reordering the items in the set according to their videodisc frame numbers, which correspond to their ordering on the disc.</Paragraph>
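    <Paragraph> The reordering step is essentially a sort on frame number. A minimal sketch, assuming a hypothetical frame_of lookup in place of the SQL tables or HPRL slots that hold the frame numbers; the frame numbers in the example are invented for illustration.

```python
def playback_order(items, frame_of):
    """Sort displayable items by videodisc frame number.

    frame_of maps an item to its frame number, or None when the item has
    no image; such items are left out of the sequence.
    """
    with_frames = [(frame_of(i), i) for i in items if frame_of(i) is not None]
    return [item for _, item in sorted(with_frames, key=lambda pair: pair[0])]


frames = {"Sunflowers": 1200, "Starry Night": 310, "Irises": 845}  # invented numbers
print(playback_order(["Sunflowers", "Starry Night", "Irises"], frames.get))
# ['Starry Night', 'Irises', 'Sunflowers']
```

Visiting frames in ascending order means the player always seeks in one direction, which is the cost the text aims to reduce.</Paragraph>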
    <Paragraph position="5"> It seems appropriate to mention here that videodisc imagery, like sex and violence, can be either gratuitous or meaningful. In the course of our project, we have demonstrated both. In the context of our people and equipment database, the articulator is capable of displaying a picture of a featureless cubicle or a slide show of nervously posed employees in conjunction with a textual answer. On the other hand, the database of Van Gogh paintings has proven to be a very appealing application for visual articulation.</Paragraph>
    <Paragraph position="6"> With visually articulated answers, we were provided with an opportunity to begin to experiment with deictic reference. While personal pronouns are interpreted by the pragmatics processor using a discourse model which takes a centering approach \[Gro77\] \[Sid79\] \[JW81\] \[GJW83\] \[BFP87\], demonstrative pronouns are interpreted via a rudimentary environment model that knows which painting is currently displayed on the video screen. Note that the displayed image may not be the one currently under discussion in the discourse, but may be left over from an earlier query if no intervening queries elicited videodisc answers. Since imagery can be such a salient part of the user's environment, it is necessary to support deictic references to the current image. At present in our system, &quot;this&quot; and &quot;that&quot; have the same interpretation, but we are exploring alternatives such as interpreting &quot;that&quot; as referring to the previously displayed image when it appears contrastively in the same context as &quot;this&quot;. A more thorough treatment should of course integrate spatial, temporal and discourse perspective \[Lin79\]. We are attempting to model more of the visual environment, including graphic elements on the screen, and to integrate deictic information more fully into the discourse.</Paragraph>
    <Paragraph position="7"> By now it should be evident that one should not consider articulation of answers entirely independently from discourse. A natural language interface to a database query application can provide textual feedback about the discourse apart from the literal answer. Our articulator makes explicit the interpretation of the user's pronominal reference by substituting the phrase it cospecifies for the pronoun in the verbose answer (figure 9). Thus the user is likely to discover any misunderstanding instantly.</Paragraph>
    <Paragraph position="8"> On the other hand, since verbose answers rely on more or less blindly applied heuristics to wrap text around the answer, the articulator is not a full partner in the discourse and is not capable of achieving subtle but nevertheless critical discourse functions through syntactic choices. A true generation component would presumably exercise lexical and syntactic choices, thus avoiding eccentric as well as ungrammatical exchanges.</Paragraph>
    <Paragraph position="9"> Figure 9: Q: What did Gauguin paint? A: GAUGUIN PAINTED VINCENT PAINTING.
Q: How many pictures of Van Gogh were not painted by him? A: 8 PICTURES OF VAN GOGH WERE NOT PAINTED BY GAUGUIN.</Paragraph>
  </Section>
</Paper>