<?xml version="1.0" standalone="yes"?>
<Paper uid="P92-1004">
  <Title>THE REPRESENTATION OF MULTIMODAL USER INTERFACE DIALOGUES USING DISCOURSE PEGS</Title>
  <Section position="1" start_page="0" end_page="0" type="metho">
    <SectionTitle>
THE REPRESENTATION OF MULTIMODAL USER INTERFACE DIALOGUES
USING DISCOURSE PEGS
</SectionTitle>
    <Paragraph position="0"/>
  </Section>
  <Section position="2" start_page="0" end_page="0" type="metho">
    <SectionTitle>
ABSTRACT
</SectionTitle>
    <Paragraph position="0"> The three-tiered discourse representation defined in (Luperfoy, 1991) is applied to multimodal human-computer interface (HCI) dialogues. In the applied system the three tiers are (1) a linguistic analysis (morphological, syntactic, sentential semantic) of input and output communicative events including keyboard-entered command language atoms, NL strings, mouse clicks, output text strings, and output graphical events; (2) a discourse model representation containing one discourse object, called a peg, for each construct (each guise of an individual) under discussion; and (3) the knowledge base (KB) representation of the computer agent's 'belief' system which is used to support its interpretation procedures.</Paragraph>
    <Paragraph position="1"> I present evidence to justify the added complexity of this three-tiered system over standard two-tiered representations, based on (A) cognitive processes that must be supported for any non-idealized dialogue environment (e.g., the agents can discuss constructs not present in their current belief systems), including information decay, and the need for a distinction between understanding a discourse and believing the information content of a discourse; (B) linguistic phenomena, in particular, context-dependent NPs, which can be partially or totally anaphoric; and (C) observed requirements of three implemented HCI dialogue systems that have employed this three-tiered discourse representation.</Paragraph>
  </Section>
  <Section position="3" start_page="0" end_page="22" type="metho">
    <SectionTitle>
THE THREE-TIERED FRAMEWORK
</SectionTitle>
    <Paragraph position="0"> This paper argues for a three-tiered computational model of discourse and reports on its use in knowledge based human-computer interface (HCI) dialogue. The first tier holds a linguistic analysis of surface forms. At this level there is a unique object (called a linguistic object or LO) for each linguistic referring expression or non-linguistic communicative gesture issued by either participant in the interface dialogue. The intermediate tier is the discourse model, a tier with one unique object corresponding to each concept or guise of a concept, being discussed in the dialogue. These objects are called pegs after Landman's theoretical construct (Landman, 1986a). 1 The third tier is the knowledge base (KB) that describes the belief system of one agent in the dialogue, namely, the backend system being interfaced to. Figure 1 diagrams a partitioning of the information available to a dialogue processing agent.</Paragraph>
    <Paragraph position="1"> This partitioning gives rise to the three discourse tiers proposed, and is motivated, in part, by the distinct processes that transfer information between tiers.</Paragraph>
    <Paragraph position="3"> The linguistic tier is similar to the linguistic representation of Grosz and Sidner (1985) and its LO's are like Sidner's NP bundles (Sidner, 1979), i.e., both encode the syntactic and semantic analyses of surface forms. One difference, however, is that NP bundles specify database objects directly whereas LOs are instead &amp;quot;anchored&amp;quot; to pegs in the discourse model tier and make no direct connection to entries in the static  namesake but the term provides the suitable metaphor (also suggested by Webber): an empty hook on which to hang properties of the real object. For more background on the Data Semantics framework itself see (Landman 1986b) and (Veltman, 1981).</Paragraph>
    <Paragraph position="4">  knowledge representation. LOs are also like Discourse Referents (Karttunen, 1968), Discourse Entities ((Webber, 1978), (Dahl and Ball, 1990), (Ayuso, 1989), and others), File Cards (Heim, 1982), and Discourse Markers (Kamp, 1981) in at least two ways. First, they arise from a meaning representation of the surface linguistic form based on a set of generation rules which consider language-specific features, and facts about the logical form representation: quantifier scope assignments, syntactic number and gender markings, distributive versus collective reading information, ordering of modifiers, etc. Janus (Ayuso, 1989) allows for DE's introduced into the discourse context through a non-linguistic (the haptic) channel. But in Janus, a mouse click on a screen icon is assigned honorary linguistic status via the logical form representation of a definite NP, and that introduces a new DE into the context.</Paragraph>
    <Paragraph position="5"> WML, the intensional language used, also includes time and possible world parameters to situate DE's.</Paragraph>
    <Paragraph position="6"> These innovations are all important attributes of objects at what I have called the linguistic tier.</Paragraph>
    <Paragraph position="7"> Secondly, the discourse constructs listed above all correspond either directly (Discourse Referents, File Cards, Discourse Entities of Webber) or indirectly after collapsing of referential equivalence classes (Discourse Markers, DE's of Janus) with referents or surrogates in some representation of the reference world, and it is by virtue of this mapping that they either are assigned denotations or fail to refer. While I am not concerned here with referential semantics I view this linguistic tier as standing in a similar relation to the reference world of its surface forms.</Paragraph>
    <Paragraph position="8"> The pegs discourse model represents the world as the current discourse assumes it to be only, apart from how the description was formulated, apart from the true state of the reference world, and apart from how either participant believes it to be. This statement is similar to those of both Landman and Webber. The discourse model is also the locus of the objects of discourse structuring techniques, e.g., both intentional and attentional structures of Grosz and Sidner (1985) are superimposed on the discourse model tier. A peg has links to every LO that &amp;quot;mentions&amp;quot; it, the mentioning being either verbal or non-verbal and originating with either dialogue participant.</Paragraph>
    <Paragraph position="9"> Pegs, like File Cards, are created on the fly as needed in the current discourse and amount to dynamically defined guises of individuals. These guises differ from File Cards in that they do not necessarily correspond I:1 to individuals they represent, i.e., a single individual can be treated as two pegs in the discourse model, if for example the purpose is to contrast guises such as Superman and Clark Kent, without requiring that there also be two individuals in the knowledge structure. In comparing the proposed representation to those of Heim, Webber, and others it is also helpful to note a difference in emphasis. Heim's theory of definiteness defines semantic values for NPs based on their ability to add new File Cards to the discourse state, their &amp;quot;file change potential.&amp;quot; Similarly, Webber's goal is to define the set of DE's justified by a segment of text.</Paragraph>
    <Paragraph position="10"> Examples of a wide range of anaphoric phenomena are used as evidence of which DEs had to have been generated for the antecedent utterance. Thus, the definition of Invoking Descriptions but no labels for subsequent mention of a DE or discussion of their affect on the DE.</Paragraph>
    <Paragraph position="11"> In contrast, my emphasis is in tracking these representations over the course of a long dialogue; I have nothing to contribute to the theory of how they are originally generated by the logical form representation of a sentence. I am also concerned with how the subsequent utterance is processed given a possibly flawed or incomplete representation of the prior discourse, a possibly flawed or incomplete linguistic representation of the new utterance, and/or a mismatch between KB and discourse. The purpose here is to manage communicative acts encountered in real dialogue and, in particular, HCI dialogues in which the interpreter is potentially receiving information from the other dialogue participant with the intended result of an altered belief structure. So I include no discussion of the referential value of referring expressions or discourse segments, in terms of truth conditions, possible worlds, or sets of admissible models. Neither is the aim a descriptive representation of the dialogue as a whole; rather, the purpose is to define the minimal representation of one agent's egocentric view of a dialogue needed to support appropriate behavior of that agent in real-time dialogue interaction.</Paragraph>
    <Paragraph position="12"> The remainder of this paper argues for the additional representational complexity of the separate discourse pegs tier being proposed. Evidence for this innovation is divided into three classes (A) cognitive requirements for processing dialogue, (B) linguistic phenomena involving context-dependent NPs, and (C) implementation-based arguments.</Paragraph>
  </Section>
  <Section position="4" start_page="22" end_page="25" type="metho">
    <SectionTitle>
EVIDENCE FOR THREE TIERS
A. COGNITIVE PROCESSING CONSTRAINTS
</SectionTitle>
    <Paragraph position="0"> This section discusses four requirements of discourse representation based on the cognitive limitations and pressures faced by any dialogue participant.</Paragraph>
    <Paragraph position="1"> 1.Incompleteness: The information available to a dialogue agent is always incomplete; the belief system, the linguistic interpretation, the prior discourse representation are partial and potentially flawed representations of the world, the input  utterances, and the information content of the discourse, respectively. The distinction between discourse pegs and KB objects is important because it allows for a clear separation between what occurs in the discourse, and what is encoded as beliefs in the KB. The KB is viewed as a source of information consulted by one agent during language processing, not as the locus of referents or referent surrogates. Belief system incompleteness means it is common in dialogue to discuss ideas one is unfamiliar with or does not believe to be true, and to reason based on a partial understanding of the discourse. So it often happens that a discourse peg fails to correspond to anything familiar to the interpreting agent. Therefore, no link to the KB is required or entailed by the occurrence of a peg in the discourse model.</Paragraph>
    <Paragraph position="2"> There are two occasions where the interpreter is unable to map the discourse model to the KB, The first is where the class referenced is unfamiliar to the interpreting agent, e.g., when an unknown common noun occurs and the interpreter cannot map to any class named by that common noun, e.g., &amp;quot;The picara walked in.&amp;quot; The second is where the class is understood but the particular instance being referenced cannot be identified at the time the NP occurs. I.e., the interpreter may either not know of any instances of the familiar class, Picaras, or it may not be able to determine which of those picara instances that it knows of is the single individual indicated by the current NP. The pegs model allows the interpreter to leave the representation in a partial state until further information arrives; an underspecified peg for the unknown class is created and, when possible, linked to the appropriate class. As the dialogue progresses subsequent utterances or inferences add properties to the peg and clarify the link to the KB which becomes gradually more precise. But that is a matter between the peg and the KB; the original LO is considered complete at NP processing time and cannot be revisited.</Paragraph>
    <Paragraph position="3">  2. Contradiction: Direct conflicts between what an  agent believes about the world (the KB) and what the agent understands of the current discourse (the discourse model) are also common. Examples include failed interpretation, misunderstanding, disagreement between two negotiating parties, a learning system being trained or corrected by the user, a tutorial system that has just recognized that the user is confused, errors, lies, and other hypothetical or counterfactual discourse situations. But it is often an important service of a user interface (UI) to identity just this sort of discrepancy between its own KB information and the user's expressed beliefs. How the 15I responds to recognized conflicts will depend on its assigned task; a tutoring system may leave its own beliefs unchanged and engage the user in an instructional dialogue whereas a knowledge  acquisition tool might simply correct its internal information by assimilating the user's assertion.</Paragraph>
    <Paragraph position="4"> To summarize 1 and 2, since dialogue in general involves transmission of information the interpreting agent is often unfamiliar with individuals being spoken about. In other cases, familiar individuals will receive new, unfamiliar, and/or controversial attributes over the course of the dialogue. Thirdly, on the generation side, it is clear that an agent may choose to produce NL descriptions that do not directly reflect that agent's belief system (generating simplified descriptions for a novice user, testing, game playing, etc.). In all cases, in order to distinguish what is said from what is believed, KB objects must not be created or altered as an automatic side effect of discourse processing, nor can the KB be required to be in a form that is compatible with all possible input utterances. In cases of incompleteness or contradiction the underspecified discourse peg holds a tentative set of properties that highlight salient existing properties of the KB object, and/or others that add to or override properties encoded in the KB. 3. Dynamic Guises: Landman's analysis of identity statements suggests a model (in a model-theoretic semantics) that contains pre-defined guises of individuals. In the system I propose, these guises are instead defined dynamically as needed in the discourse and updated non-monotonically. These are the pegs in the discourse model. Grosz (1977) introduced the notion of focus spaces and vistas in a semantic net representation for the similar purpose of representing the different perspectives of nodes in the semantic net that come into focus and affect the interpretation of subsequent NPs. What is in attentional focus in Grosz's system and in mine, are not individuals in the static belief system but selected views on those individuals and these are unpredictable, defined dynamically as the discourse progresses. I.e., it is impossible to know at KB creation time which guises of known individuals a speaker will present to the discourse. My system differs from the semantic net model in the separation it posits between static knowledge and discourse representation; focus spaces are, in effect, pulled out of the static memory and placed in the discourse model as a smactudng of pegs.</Paragraph>
    <Paragraph position="5"> This eliminates the need to ever undo individual effects of discourse processing on the KB; the entire discourse model can be studied and either cast away after the dialogue or incorporated into the KB by an independent operation we might call &amp;quot;belief incorporation.&amp;quot; 4. Information Decay: In addition to monotonic information growth and non-monotonic changes to the discourse model, the agent participating in a dialogue experiences information decay over the course of the conversation. But information from the linguistic, discourse, and belief system tiers decays at different rates and in response to different cognitive forces/limitations. (1) LOs become old and vanish at an approximately linear rate as a function of time counted from the point of their introduction into the discourse history, i.e., as LOs get older, they fade from the discourse and can no longer serve as linguistic sponsors 2 for anaphors; (2) discourse pegs decay as a function of attentional focus, so that as long as an individual or concept is being attended to in the dialogue, the discourse peg will remain near the top of the focus stack and available as a potential discourse sponsor for upcoming dependent referring expressions; (3) decay of static information in the KB is analogous to more general forgetting of stored beliefs/information which occurs as a result of other cognitive processes, not as an immediate side-effect of discourse processing or the simple passing of time.</Paragraph>
    <Paragraph position="6"> kinds (signalled by a bare plural NP in English) to sponsor dependent references to indefinite instances.</Paragraph>
    <Paragraph position="7"> (Substitute &amp;quot;picaras&amp;quot; for &amp;quot;racoons&amp;quot; in Carlson's example to demonstrate the independence of this phenomenon from world knowledge about the referent of the NP.) 3 This holds for mass or count nouns and applies in either direction, i.e., the peg for a specific exemplar can sponsor mention of the generic kind.</Paragraph>
    <Paragraph position="8"> Nancy ate her oatmeal this morning because she heard that il lowers cholesterol.</Paragraph>
    <Paragraph position="9"> The two parameters, partial/total dependence and linguistic/discourse sponsoring, classify all anaphoric phenomena (independently of the three-tiered framework) and yield as one result a characterization of indefinite NPs as potentially partially anaphoric in exactly the same way that definite NPs are.</Paragraph>
    <Paragraph position="10"> B. LINGUISTIC EVIDENCE This section sketches an analysis of context-dependent NPs to help argue for the separation of linguistic and discourse tiers. (Luperfoy, 1991) defines four types of context-dependent NPs and uses the pegs discourse framework to represent them: a dependent (anaphoric) LO must be linguistically sponsored by another LO in the linguistic tier or discourse sponsored by a peg in the discourse model and these two categories are subdivided into total anaphors and partial anaphors. Total anaphors are typified by coreferential, (totally dependent), definite pronouns, such as &amp;quot;himself TM and &amp;quot;he&amp;quot; below, both of which are sponsored by &amp;quot;Karl.&amp;quot; Karl saw himself in the mirror. He started to laugh.</Paragraph>
    <Paragraph position="11"> I stopped the car and when I opened the hoodI saw that a spark plug wire was missing.</Paragraph>
    <Paragraph position="12"> The distinction between discourse sponsoring and linguistic sponsoring, plus the differential information decay rates for the three tiers discussed in Section A, together predict acceptability conditions and semantic interpretation of certain context-dependent NP forms. For example, the strict locality of one-anaphoric references is predicted by two facts: (a) one-anaphors must always have a linguistic sponsor (i.e., an LO in the linguistic tier).</Paragraph>
    <Paragraph position="13"> (b) these linguistic sponsor candidates decay more rapidly than pegs in the discourse model tier.</Paragraph>
    <Paragraph position="14"> Partial anaphors depend on but do not corefer with their sponsors. Examples of partial anaphors have been discussed widely under other labels, by Karttunen, Sidner, Heim, and others, in examples like this one from (Karttunen, 1968) I stopped the car and when I opened the hoodl saw that the radiator was boiling.</Paragraph>
    <Paragraph position="15"> where knowledge about the world is required in order to make the connection between dependent and sponsor, and others like Carlson's (1977) In contrast, definite NPs can be discourse sponsored.</Paragraph>
    <Paragraph position="16"> And the sponsoring peg may have been first introduced into the discourse model by a much earlier LO mention and kept active by sustained attentional focus. Thus, discourse- versus linguistic sponsoring helps explain why definite NPs can reach back to distant segments of the discourse history while one-anaphors cannot. 4 Figure 2 illustrates the four possible discourse configurations for context-dependent NPs. The KB interface is omitted in the diagrams in order to show only the interaction between linguistic and discourse Nancy hates racoons because t.hey ate her corn last year.</Paragraph>
    <Paragraph position="17"> where associating dependent to sponsor requires no specific world knowledge, only a general discourse principle about the ability of generic references to  tiers, and dark arrows indicate the sponsorship relation. In each case, LO-1 is non-anaphoric and mentions Peg-A, its anchor in the discourse model.</Paragraph>
    <Paragraph position="18"> For the two examples in the top row LO-2 is linguistically sponsored by LO-1. Discourse sponsorship (bottom row) means that the anaphoric LO-2 depends directly on a peg in the discourse model and does not require sponsoring by a linguistic form.</Paragraph>
    <Paragraph position="19"> The left column illustrates total dependence, LO-1 and LO-2 are co-anchored to Peg-A. Whereas, in partial anaphor cases (fight column), a new peg, Peg-B, gets introduced into the discourse model by the partially anaphoric LO-2.</Paragraph>
  </Section>
  <Section position="5" start_page="25" end_page="28" type="metho">
    <SectionTitle>
TOTAL ANAPHORA PARTIAL ANAPHORA
</SectionTitle>
    <Paragraph position="0"> Search for a button. Delete it.</Paragraph>
    <Paragraph position="1"> a button, it.</Paragraph>
    <Paragraph position="2"> Search for a button.</Paragraph>
    <Paragraph position="3"> a button, the new icon Search for all buttons.</Paragraph>
    <Paragraph position="4"> Display one.</Paragraph>
    <Paragraph position="5"> all buttons, one Search for a button.</Paragraph>
    <Paragraph position="6"> Delete the label a button the label FIGURE 2. Four Possible Discourse Configurations For Anaphoric NPs The classification of context-dependence is made explicit in the three-tiered discourse representation which also distinguishes incidental coreference from true anaphoric dependence. It supports uniform analysis of context-dependent NPs as diverse as reflexive pronouns and partially anaphoric indefinite NPs. The resulting relationship encodings are important for long-term tracking of the fate of discourse pegs. In File Change Semantics this would amount to recording the relation that justifies accommodation of the new File Card as a permanent fact about the discourse.</Paragraph>
    <Paragraph position="7"> Furthermore, relationships between objects at different levels inform each other and allow application of standard constraints. The three tiers allow you to uphold linguistic constraints on coreference (e.g., syntactic number and gender agreement) at the LO level but mark them as overridden by discourse or pragmatic constraints at the discourse model level., i.e. apparent violations of constraints are explained as transfer of control to another tier where those constraints have no jurisdiction. In a two-tiered model coreferential LOs must be equated (or collapsed into one) or else they are distinct. Here, the discourse tier is not simply a richer analysis of linguistic tier information nor a conflation of equivalence classes of LOs partitioned by referential identity.</Paragraph>
    <Paragraph position="8"> C. EVIDENCE BASED ON AN IMPLEMENTED SYSTEM The discourse pegs approach has been implemented as the discourse component of the Human Interface Tool Suite (HITS) project (Hollan, et al. 1988) of the MCC Human Interface Lab and applied to three user interface (UI) designs: a knowledge editor for the Cyc KB (Guha and Lenat, 1990), an icon editor for designing display panels for photocopy machines, and an information retrieval (IR) tool for preparing multi-media presentations. All three UIs are knowledge based with Cyc as their supporting KB. An input utterance is normally a command language operator followed by its arguments. And an argument can be formulated as an NL string representation of an NP, or as a mouse click on presented screen objects that stand for desired arguments. Output utterances can be listed names of Cyc units retrieved from the knowledge base in response to a search query, self-narration statements simultaneous with changes to the screen display, and repair dialogues initiated by the NL interpretation system.</Paragraph>
    <Paragraph position="9"> Input and output communicative events of any modality are captured and represented as pegs in the discourse model and LOs in the linguistic history so that either dialogue participant can make anaphoric reference to pegs introduced by the other, while the source agent of each assertion is retained on the associated LO.</Paragraph>
    <Paragraph position="10"> The HITS UIs endeavor to use NL only when the added expressive power is called for and allow input mouse clicks and output graphic gestures for occasions when these less costly modalities are sufficient. The respective strengths of the various UI modalities are reviewed in (P. Cohen et al., 1989) which reports on a similar effort to construct UIs that make maximal benefit of NL by using it in conjunction with other modalities.</Paragraph>
    <Paragraph position="11"> Two other systems which combine NL and mouse gestures, XTRA (Wahlster, 1989) and CUBRICON (Neal, et al., 1989), differ from the current system in two ways. First, they take on the challenge of ambiguous mouse clicks, their primary goal being to use the strengths of NL (text and speech) to disambiguate these deictic references. In the HITS system described here only presented icons can be clicked on and all uninterpretable mouse input is ignored. A second, related difference is the assumption by CUBRICON and XTRA of a closed  world defined by the knowledge base representation of the current screen state. This makes it a reasonable strategy to attempt to coerce any uninterpretable mouse gesture into its nearest approximation from the finite set of target icons. In rejecting the closed world assumption I give up the constraining power it offers, in exchange for the ability to tolerate a partially specified discourse representation that is not fully aligned with the KB. In general, NL systems assume a closed world, in part because the task is often information retrieval or because in order for NL input to be of use it must resolve to one of a finite set of objects that can be acted upon. Because the HITS systems intended to generate and receive new information from the user, it is not possible to follow the approach taken in Janus for example, and resolve the NP &amp;quot;a button&amp;quot; to a sole instance of the class #%Buttons in the KB. Ayuso notes that this does not * reflect the semantics of indefinite NPs but it is a shortcut that makes sense given the UI task undertaken.</Paragraph>
    <Paragraph position="12"> In human-human dialogue many extraneous behaviors have no intended communicative value (scratching one's ear, picking up a glass, etc.). Similarly, many UI events detectable by the dialogue system are not intended by either agent as communicative and should not be included in the discourse representation, e.g., the user moving the mouse cursor across the screen, or the backend system updating a variable. In the implemented system NL and non-NL knowledge sources exchange information via the HITS blackboard (R. Cohen et al., 1991) and when a knowledge source communicates with the user a statement is put on the blackboard. Only those statements are captured from the blackboard and recorded in the dialogue. In this way, all non-communicative events are ignored by the dialogue manager.</Paragraph>
    <Paragraph position="13"> Many of the interesting properties of this system arise from the fact that it is a knowledge-based system for editing the same KB it is based on. The three-tiered representation suits the needs of such a system. The HITS knowledge editor is itself represented in the KB and the UI can make reference to itself and its components, e.g., #%Inspector3 is the KB unit for a pane in the window display and can be referred to in the UI dialogue. Secondly, ambiguous reference to a KB unit versus the object in the real world is possible. For example, the unit #%Joseph and the person Joseph are both reasonable referents of an NP: e.g., &amp;quot;When was he born?&amp;quot; requests the value in the #%birthdate slot of the KB unit #%Joseph, whereas &amp;quot;When was it created?&amp;quot; would access a bookkeeping slot in that same unit. Finally, the need to refer to units not yet created or those already deleted would occur in requests such as, &amp;quot;I didn't mean to delete them&amp;quot; which require that a peg persist in focus in the  discourse model independent of the status of the corresponding KB unit. These example queries are not part of the implementation but do exemplify reference problems that motivate use of the three-tiered discourse representation for such systems. The dialogue history is the sequences of input and output utterances in the linguistic tier and is structured according to (Clark and Shaeffer 1987) as a list of contributions each of which comprises a presentation and an acceptance. This underlying structure can be displayed to the user on demand. The following example dialogue shows a question-answer sequence in which queries are command language atom followed by NL string or mouse click.</Paragraph>
    <Paragraph position="14">  Here, output utterances are not true generated English but rather canned text string templates whose blanks are filled in with pointers to KB units. The whole output utterance gets captured from the HITS blackboard and placed in the discourse history. The objects filling template slots generate LOs and discourse pegs which are then used by discourse updating algorithms to modify the focus stack. For example, output-template: #%Holm displayed in #%Inspector3.</Paragraph>
    <Paragraph position="15"> causes the introduction of LOs and pegs for #%Holm and #%Inspector3. Those objects generated as system output can now sponsor anaphoric reference by the user.</Paragraph>
    <Paragraph position="16"> A collection of discourse knowledge sources update data structures and help interpret context dependent utterances. In this particular application of the three-tiered representation, context-dependence is exclusively a fact about the arguments to commands since command names are never context-sensitive.</Paragraph>
    <Paragraph position="17"> Input NPs are first processed by morphological, syntactic, and semantic knowledge sources, the result being a 'context-ignorant' (sentential) semantic analysis with relative scope assignments to quantifiers in NPs such as &amp;quot;Every Lisp programmer who owns a dog.&amp;quot; This analysis would in principle use the DE generation rules of Webber and Ayuso for introducing its LOs. Discourse knowledge sources use the stored discourse representation to interpret context-dependent LO's, including definite pronouns, contrastive oneanaphors, 5 reference with indexical pronouns (e.g. you, my, I, mouse-clicks on the desktop icons), and totally anaphoric definite NPs. 6 The discourse module augments the logical form output of semantic processing and passes the result to the pragmatics processor whose task is to translate the logical form interpretation into a command in the language of the backend system, in this case Cycl, the language of the Cyc knowledge base system.</Paragraph>
    <Paragraph position="18"> Productive dialogue includes subdialogues for repairs, requests for confLrrnations, and requests for clarification (Oviatt et al., 1990). The implemented multimodal discourse manager detects one form of interpretation failure, namely, when a sponsor cannot be identified for an input pronoun. The discourse system initiates its own clarification subdialogue and asks the user to select from a set of possible sponsors or to issue a new NP description as in the example user: EDIT it.</Paragraph>
    <Paragraph position="19"> system: The meaning of &amp;quot;it&amp;quot; is unclear.</Paragraph>
    <Paragraph position="20"> Do you mean one of the following? &lt;#%Ebihara&gt; &lt;#%Inspector3&gt; user: (mouse clicks on #%Inspector3) system: #%Inspector3 displayed in #%Inspector3 The user could instead type &amp;quot;yes&amp;quot; followed by a mouse click at the system's further prompting or &amp;quot;no&amp;quot; in which case the system prompts for an alternative descriptive NP which receives from-scratch NL processing. During the subdialogue, pegs for the actual LO &lt;LO-it&gt; (the topic of the subdialogue) and for the two screen icons for #%Ebihara and #%Inspector3 are in focus in the discourse model.</Paragraph>
    <Paragraph position="21"> Figure 3 illustrates the arrangement of information structures in one multimodal HCI dialogue setting. 7 In this example, the user requests creation of a new button. Peg-A represents that hypothetical object.</Paragraph>
    <Paragraph position="22"> The system responds by (1) creating Button-44, (2) displaying it on the screen, and (3) generating a self-narration statement &amp;quot;Button-44 created.&amp;quot; After the non-verbal event a followup deictic pronoun or mouse click, e.g., &amp;quot;Destroy that (button)&amp;quot; or &amp;quot;Destroy &lt;mouse-click on Button-44&gt;,&amp;quot; could access the peg directly, but a pronominal reference, e.g., &amp;quot;Destroy it&amp;quot; would require linguistic sponsoring by the LO from</Paragraph>
    <Paragraph position="24"/>
    <Section position="1" start_page="27" end_page="28" type="sub_section">
      <SectionTitle>
Design Tool
</SectionTitle>
      <Paragraph position="0"> the system's previous output statement. Because the system responded with both a graphical result and simultaneous self-narration statement in this example, either dependent reference type is possible. The knowledge based graphical knowledge source creates the KB unit #%Button44 as an instance of #%Buttons, but in this 15I the user is unaware of the underlying KB and so cannot make or see references to KB units directly.</Paragraph>
      <Paragraph position="1"> Note that Pegs A and B cannot be merged in the discourse model. The followup examples above only refer to that new Button-44 that was created.</Paragraph>
      <Paragraph position="2"> Alternatively (in some other UI) the user might have made total- and partial anaphoric re-mention of Peg-A by saying &amp;quot;Create a button. And make it a round 0ng.&amp;quot; The relationship between the two pegs is not identity. However this is not just a fact about knowledge acquisition interfaces, since the IR system might have allowed similar elaborated queries, &amp;quot;Search for a button, and make sure it'.__~s a round one. ''8 The relationship between Pegs A and B arises from their being objects in a question-response pair in the structured dialogue history.</Paragraph>
      <Paragraph position="3"> Finally, if the system is unable to map the word, say it were &amp;quot;knob,&amp;quot; to any KB class then that constitutes a missing lexical item. Peg-A still gets created but it is not hooked up to #%Buttons (yet). In response to a 'floating' peg a UI system could choose to engage the user in a lexical acquisition dialogue, leave Peg-A underspecified until later (especially appropriate for text understanding applications), or associate it with the most specific possible node 8Analogous to the issue in Karttunen's John wants to catch a fish and eat it for supper.</Paragraph>
      <Paragraph position="4">  temporarily (e.g., #%Icons or #%PhysicalObjects).</Paragraph>
      <Paragraph position="5"> The eventual response may be to acquire a new class, #%Knobs, as a subclass of icons, or acquire a new lexical mapping from &amp;quot;knob&amp;quot; to the class #%Buttons. The implemented systems which test the discourse representation were built primarily to demonstrate other things, i.e., to show the value of combining independent knowledge sources via a centralized blackboard mechanism and to explore options for combining NL with other UI modalities.</Paragraph>
      <Paragraph position="6"> Consequently, the NL systems were exercised on only a subset of their capabilities, namely, NP arguments to commands, which could be interpreted by most NLU systems. The dialogue situation itself is what argues for the separation of tiers.</Paragraph>
  </Section>
class="xml-element"></Paper>