File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/83/e83-1031_metho.xml
Size: 28,695 bytes
Last Modified: 2025-10-06 14:11:35
<?xml version="1.0" standalone="yes"?> <Paper uid="E83-1031"> <Title>Wolfgang Wahlster FB 10 - Angewandte Mathematik</Title> <Section position="4" start_page="188" end_page="188" type="metho"> <SectionTitle> Figure labels: FAMILIAR WITH SCENE BUT CANNOT SEE IT; PDP-10; NL DIALOG SYSTEM HAM-ANS; IMAGE SEQUENCE ANALYSIS SYSTEM NAOS/MORIO; STREET INTERSECTION </SectionTitle> <Paragraph position="0"> selection of the deep case slots for an unmarked extended response: Select the deep case slots which contain the concepts necessary for the perceptual verification of the motion described by the verb.</Paragraph> <Paragraph position="1"> In order to verify a stop-event it is necessary to determine the end point of the motion (cf. (4a)) but not the cause (cf. (4b)). For a turn-off event a change of direction between source and goal must be established (cf. (5a)). It is not essential to determine whether other objects make this change of direction at the same time (cf. (5b)).</Paragraph> <Paragraph position="2"> Hence case role filling for the construction of an extended response can be regarded as a side effect of the visual search necessary to answer the question. This also appears plausible when seen in the light of the beliefs that the questioner imputes to the answerer. The questioner believes that the answerer will fill in the case slots necessary for answering the question and that it is therefore unnecessary to explicitly mention these in the question. Additionally the questioner believes that the answerer believes that the questioner expects an extended reply and for this reason did not explicitly request the additional information. A cooperative dialog system fulfills this user expectation by applying the heuristic formulated above.</Paragraph> <Paragraph position="3"> A prerequisite for the application of this heuristic is that the system have knowledge about which deep case slots are relevant for the verification of a movement. 
This prerequisite is not met by most natural language (NL) systems since they simply represent events in the domain of discourse in fully instantiated form using case frames, e.g. as part of a semantic net or frame hierarchy. In contrast, the German language dialog system HAM-ANS (Hamburg application-oriented natural language system) [6], which we have developed, can apply this heuristic because in addition to the case frame of each verb the system includes a representation of the referential semantics of predications associated with that verb which makes it possible to evaluate the visual input data for the movement in question.</Paragraph> <Paragraph position="4"> The goal of this article is to elucidate the representation constructions for case frames and referential semantics of verbs of motion used in HAM-ANS and to illustrate their use in generating unmarked extended responses.</Paragraph> </Section> <Section position="5" start_page="188" end_page="189" type="metho"> <SectionTitle> 2. A SHORT OVERVIEW OF HAM-ANS </SectionTitle> <Paragraph position="0"> HAM-ANS is a large German natural language dialog system of both considerable depth and breadth which presently provides access to three different application classes, namely an expert system (hotel reservation situation), a database system (fishery data) and a scene analysis system (traffic scene).</Paragraph> <Paragraph position="1"> The communicative situations the system handles are characterized as follows: In the hotel reservation situation the system takes the role of a hotel manager, who tries to persuade the user to book a room. The caller is assumed to have the overall goal of determining whether the room offered meets his requirements. 
The system must attempt to recognize the user's specific desires concerning the room as they are revealed - usually indirectly - in his utterances and to make use of the various devices available in natural language that permit the room in question to be presented in a particularly favorable light (e.g. the generation of tendentious descriptions using hedged relative adjectives).</Paragraph> <Paragraph position="2"> In the database application [8] HAM-ANS provides marine scientists with NL access to a fishery database, which contains data collected during international expeditions in the South Atlantic and the Indian Ocean. The complex structured data sets, which consist of cruise data concerning the vessels' voyages, krill biology data, relevant net hauls and oceanographic data, are stored in a relational database (76 relations, 11 MByte) with a PASCAL/R database management system.</Paragraph> <Paragraph position="3"> In the traffic domain we presuppose the following situational context: The system has observed a street intersection and, over the phone, supplies the user, who is familiar with the scene but cannot see it from his remote location, with information about the traffic at that intersection (cf. fig.</Paragraph> <Paragraph position="4"> 1). A geometrical description of the observed scene is provided by the image sequence analysis system NAOS/MORIO [9]. In this application our research efforts are concerned with the handling of time (tense and temporal adverbs) and the semantics of locomotion verbs (cf. fig. 2).</Paragraph> <Paragraph position="5"> ? How many trucks have you seen? TWO.</Paragraph> <Paragraph position="6"> ? Which trucks have gone by?</Paragraph> </Section> <Section position="6" start_page="189" end_page="189" type="metho"> <SectionTitle> A LIGHT-COLORED ONE ON HARTUNGSTREET. </SectionTitle> <Paragraph position="0"> ? Did a yellow car and a green station wagon turn off first? NO. 
A YELLOW CAR TURNED OFF HARTUNGSTREET ONTO SCHLUETERSTREET, BUT THE GREEN STATION DID NOT TURN OFF FIRST.</Paragraph> <Paragraph position="1"> ? Were a young man and an old woman walking down Schlueterstreet during this time? YES. AN OLD WOMAN AND A YOUNG MAN.</Paragraph> <Paragraph position="2"> ? Did the green station wagon stop on Schlueterstreet? YES.</Paragraph> <Paragraph position="3"> ? How many station wagons altogether have stopped? ONE OF THE THREE ON SCHLUETERSTREET. ? What did I just ask you? YOU ASKED: HOW MANY STATION WAGONS STOPPED. ? Did a station wagon drive off? YES, ONE ON SCHLUETERSTREET.</Paragraph> <Paragraph position="4"> ? By the way, the light-colored truck: Did it drive down Schlueterstreet too? YES.</Paragraph> <Paragraph position="5"> How many cars are there in all? SIXTEEN.</Paragraph> <Paragraph position="6"> ? Is one of these cars blue? YES. TWO OF THEM ARE KIND OF BLUE. ? Did a blue car drive down both streets? YES, ONE BLUE CAR.</Paragraph> <Paragraph position="7"> So long! Fig. 2: A transcript of an actual dialog (translated from German) 3. REPRESENTATION FORMALISMS FOR THE SEMANTICS OF</Paragraph> </Section> <Section position="7" start_page="189" end_page="189" type="metho"> <SectionTitle> LOCOMOTION VERBS 3.1. THE GEOMETRICAL SCENE DESCRIPTION </SectionTitle> <Paragraph position="0"> A basic requirement for answering questions about movements that have occurred in real sequences of scenes is an adequate representation of these sequences. Not only the shape, the centers of gravity, color, etc. of objects must be represented, but also the trajectories of moving objects.</Paragraph> <Paragraph position="1"> This geometrical scene description consists of a combination of automatically generated outputs of the scene analysis processes (insofar as this is presently possible) and a number of manual augmentations. The length in time of the scene under consideration is ca. 14 sec., which corresponds to ca. 360 single TV images. 
From these 360 images 72 snapshots are coded in a relational formalism, denoting which objects were observed, the shape of these objects, their current center of gravity and some other properties (e.g. color). The representation of the first snapshot contains information about all objects that are visible at that time.</Paragraph> <Paragraph position="2"> For the successive snapshots only changes with respect to the predecessors are recorded, i.e.</Paragraph> <Paragraph position="3"> objects and their descriptions are only entered if they have changed location or appeared in the scene. A trajectory of an object is determined by its different centers of gravity relative to an underlying coordinate system. In contrast to the real TV image sequence this representation is only two-dimensional and thus provides a bird's-eye view of the scene.</Paragraph> <Paragraph position="4"> 3.2. THE REPRESENTATION LANGUAGES SURF AND DEEP The logic-oriented semantic representation languages SURF and DEEP are the central representation formalisms used in HAM-ANS. These languages are designed to be declarative and easily extendable. SURF is the target language of the analysis components and source language for the generation components and thus as close as possible to NL utterances, whereas DEEP is better suited for the evaluation of utterances on the basis of the system's domain-specific knowledge sources.</Paragraph> <Paragraph position="5"> Originally SURF and DEEP were designed to represent term and predicate structures which serve as a representation formalism for state descriptions occurring typically in the hotel reservation situation. For an adequate representation of the semantics of questions containing verbs, the definition of SURF and DEEP was augmented by meta-predicates for marking deep cases, tense and voice adapted from Fillmore's deep case theory [3]. 
Events can be existentially quantified as in (6) or explicitly quantified as in (7): (6) Did John fly to Hamburg? (7) Did John fly to Hamburg three times last week? SURF and DEEP therefore provide a means of representing quantification of events. A special quantifier E-ACT denotes an existential quantification of events. Other quantifiers like those in (7) are currently not available but can easily be included. Examples of SURF and DEEP expressions are shown in the annotated example (cf. fig. 8).</Paragraph> <Paragraph position="6"> In this paper only some of the features of SURF and DEEP are discussed; see [6] for a more detailed description.</Paragraph> </Section> <Section position="8" start_page="189" end_page="190" type="metho"> <SectionTitle> 3.3. THE CASE-FRAME LEXICON </SectionTitle> <Paragraph position="0"> The case frames for verbs used in the system are stored in the case-frame lexicon [5]. Each entry in the word lexicon for a verb contains a pointer to its applicable case frame which describes the semantics of that verb in terms of case relations.</Paragraph> <Paragraph position="1"> A case frame is represented as a combination of deep case descriptions specifying for each deep case its name, a marker indicating whether the deep case is obligatory (O) or optional (F), and the semantic restrictions which are required from a syntactic substructure to fill the deep case (cf. fig. 3).</Paragraph> <Paragraph position="2"> This pointer technique permits the use of a specific case frame for several verbs during the analysis phase without predetermining a single process for these verbs during the evaluation of whole utterances. For verbs with different referential semantics, e.g. 
'to accelerate' and 'to stop', a single case frame, namely that specifying an obligatory AGENT of type 'vehicle' and an optional LOCATIVE of type 'thoroughfare', is applied during the analysis phase.</Paragraph> <Paragraph position="3"> Case frames are formulated in SURF so that the checking of the semantic restrictions can be accomplished by the inference rules usually applied during the evaluation of a complete utterance. The selectional restriction that, e.g., the NP 'a car' describe an object of the class of vehicles, and therefore be a possible candidate to fill the agent role of the verb 'to stop', can be verified because of the transitivity of the superset relation in the conceptual semantic net.</Paragraph> <Paragraph position="4"> In the case-frame lexicon the case frames are not recorded in the form shown in fig. 3, but rather are represented as constructor calls for building a case frame according to the actual syntax definition of SURF. This guarantees that all possible modifications of SURF are immediately present in the case frames.</Paragraph> <Paragraph position="5"> [rl-s: agent: [d-l: role-marker: O restrictions: [lambda: x1 [af-a: ISA x1 VEHICLE]]] objective: source: locative: [d-l: role-marker: F restrictions: [lambda: x1 [af-a: ISA x1 THOROUGHFARE]]] goal: time: path: instrument:] Fig. 3: Case frames for verbs of type 'to stop' 3.4. OBJECT-ORIENTED REPRESENTATION OF MOTION</Paragraph> </Section> <Section position="9" start_page="190" end_page="190" type="metho"> <SectionTitle> CONCEPTS </SectionTitle> <Paragraph position="0"> In object-oriented programming languages programming is more or less the activity of creating a world of entities called objects and of specifying a set of generic operations that can be performed on them. Objects can communicate with each other by sending and receiving messages. 
Essentially, running a program means that an object sends a message to an object (possibly to itself), which in turn sends a message, etc., until the required task is fulfilled. An important benefit of the object-oriented style is that it lends itself to a particularly simple and lucid kind of modularity.</Paragraph> </Section> <Section position="10" start_page="190" end_page="190" type="metho"> <SectionTitle> 3.4.1. THE FLAVOR SYSTEM </SectionTitle> <Paragraph position="0"> The Flavor system [2] [13] is an implementation of the language features that support object-oriented programming. Two kinds of objects exist in a Flavor system, namely one called flavor and the other instance of a flavor. A flavor represents a generic object and an instance an individual realization of a generic object. It is possible to send messages to both kinds of objects. Flavors are organized in a directed graph called the flavor graph. There is one designated flavor, the vanilla flavor, which corresponds to the thing frame in FRL [10]. Since the inheritance of information for each flavor is provided by the flavor graph, it is necessary to specify for each newly defined flavor its location in the graph by naming its direct predecessors (its superflavors). The information contained in a flavor is a combination of all the information inherited from its superflavors and the added information given by its own definition. The added information can also override, augment or modify the inherited information.</Paragraph> <Paragraph position="1"> This is one dimension of the information contained in a flavor: owned or inherited. Another is the declarative/procedural distinction. The declarative knowledge of a flavor is stored in variables of different kinds whereas procedural knowledge is encoded in so-called methods. One kind of variable - the instance variable - is used to give instances of the same generic object their individual information. 
The other kind - the class variable - is owned by a flavor, can be 'bequeathed' to other flavors, and accessed by any object in the flavor system. However, a flavor is only allowed to change the value of a class variable if it owns this variable.</Paragraph> <Paragraph position="2"> Methods are function definitions that implement the operations defined for each flavor. The combination of methods from different flavors is called mixing flavors.</Paragraph> <Paragraph position="3"> In comparison with FRL the Flavor system has three main distinguishing features: The 'A kind of' slot in FRL serves both for establishing an inheritance hierarchy and for connecting instances to superclasses, i.e. no clear distinction is made between generic frames and instances. In the Flavor system, on the other hand, the flavor graph is built by specifying the superflavors for each flavor, and instances are created by the make-instance method.</Paragraph> <Paragraph position="4"> Because the distinction between generic frames and instances is not made in FRL there is also no distinction between instance variables and class variables. In the Flavor system the semantics of variables is more clearly defined in that instance variables can only be modified in instances and class variables can only be modified in flavors. Frames in FRL are passive data structures, whereas flavors can be (re-)activated, created and modified; they are autonomous; they are declarative and procedural at the same time and hence are entities which are better suited as formalisms for representing common knowledge (cf. [2]).</Paragraph> <Paragraph position="5"> Although the flavor system is a tool for the development of large software systems and not a knowledge representation language, it includes the basic concepts for the rapid design of specific knowledge representation formalisms. 
In contrast to a full-fledged knowledge representation language this approach requires some additional programming in the beginning, but it avoids any permanent overhead for features which are superfluous for the task at hand.</Paragraph> </Section> <Section position="11" start_page="190" end_page="191" type="metho"> <SectionTitle> 3.4.2. THE MOTION CONCEPT HIERARCHY </SectionTitle> <Paragraph position="0"> The Flavor system is used in HAM-ANS for representing a specialization hierarchy of motion concepts (cf. fig. 4). The root flavor of this hierarchy is the motion concept MOVE. Descendants in the tree, e.g. GO_BY and TURN, inherit the declarative and procedural information contained</Paragraph> <Paragraph position="2"> in their parents. Instance variables comprise information about the deep cases associated with the motion concept as well as information needed and extracted by methods. The methods are responsible for checking the referential semantics of the motion concepts. Instances of a flavor denote specific events in the domain of discourse that could be verified by the application of the methods.</Paragraph> <Paragraph position="3"> The methods of the additionally defined flavors TIME and SPACE are responsible for temporal and spatial computations. Instances of these flavors determine the temporal and spatial description of the actual scene: the length of the scene in time, the number of snapshots, the spatial extent, etc.</Paragraph> <Paragraph position="4"> The task of checking the truth value of the proposition in a user's question is accomplished through message passing. These messages include: creating instances of motion concepts, e.g.</Paragraph> <Paragraph position="5"> TURN120, instantiating deep case slots specified in the question, and activating appropriate methods.</Paragraph> <Paragraph position="6"> Let's now consider the example given in fig. 5 in more detail. 
Since only the AGENT was specified in the question, the selected method is ONLY_AGENT_SLOT_FILLED. After determining an interval of consideration this method calls further methods, namely FIND_A_SOURCE, DIRECTION_CHANGE and FIND_A_GOAL_NEQ_SOURCE.</Paragraph> <Paragraph position="7"> DIRECTION_CHANGE is a special method of the flavor TURN. The first and last methods are inherited (cf. fig. 5) from flavor GO_BY because they are also needed in that flavor for answering questions like: 'Has the yellow car driven from Biberstreet to Hartungstreet?'.</Paragraph> <Paragraph position="8"> FIND_A_SOURCE identifies the first entry of the agent's trajectory in the interval of consideration and checks which of the objects of the static background these coordinates belong to. For this test only those static objects are selected that satisfy the selectional restrictions for the source slot specified in the case-frame lexicon.</Paragraph> <Paragraph position="9"> If the test succeeds for an object, the name of this object is stored in the source slot. DIRECTION_CHANGE now follows the agent's trajectory looking for a significant change of direction. If this test is also positive,</Paragraph> </Section> <Section position="12" start_page="191" end_page="191" type="metho"> <SectionTitle> FIND_A_GOAL_NEQ_SOURCE is tried. This method </SectionTitle> <Paragraph position="0"> searches for a point on the trajectory which is not inside the object identified in the source slot. If there is such a point, the same selectional check as for the source slot is executed for the possible goal object. The successful application of these methods yields a fully instantiated flavor instance, e.g. TURN120 (cf.</Paragraph> <Paragraph position="1"> fig. ?).</Paragraph> </Section> <Section position="13" start_page="191" end_page="194" type="metho"> <SectionTitle> 4. 
AN EXAMPLE OF THE PROCESSING OF AN UTTERANCE </SectionTitle> <Paragraph position="0"> The processing of a user's utterance may be illustrated by an example taken from the dialog in fig.</Paragraph> <Paragraph position="1"> [figure caption fragment: motion concept hierarchy] The following discussion of some of the processing phases can best be understood if continual reference is made to fig. 8, which shows a traced version of the example.</Paragraph> <Paragraph position="2"> The processing of a user's NL input starts with a rather elaborate lexical and morphological analysis - a process which on the one hand reduces single words to their canonical forms with their morphological and syntactic features (e.g. gender, person, number) and on the other hand recognizes syntagmatic groups of words and discontinuous verb constituents, transforming them according to predefined rules. The generated structure - the preterminal string (not shown in fig. 8) - forms the input to the parser. The syntactic analysis consists of two different strategies, both of which use the same ATN-definitions of syntactic categories, e.g. for noun phrases and prepositional phrases. One of these strategies - always applied for sentences with copula verbs - uses a surface grammar to cope with word order variations. The other is a case-driven analysis strategy which is used for sentences containing verbs with an associated case frame.</Paragraph> <Paragraph position="3"> Since in the example the verb 'to go by' has a case frame the second strategy is applied. After an access to the case-frame lexicon the case frame is constructed. This case frame is used to guide the parsing in the following manner: The algorithm first attempts to recognize those syntactic constituents that are possible candidates for a deep case marked obligatory, and then to recognize those constituents that are possible candidates for optional deep cases. 
When the input is completely consumed and all obligatory deep cases are filled the process ends.</Paragraph> <Paragraph position="4"> The test for determining if a syntactic constituent is a possible candidate to fill a specific deep case is divided into a syntactic and a semantic check. The syntactic check requires, e.g., that in order to fill the agent role a constituent must contain the attribute 'nominative' (sentence in active voice) and that its number must correspond to that of the verb. The semantic check requires that the noun of the constituent fulfill the semantic restrictions specified for the specific deep case. This is accomplished through the building of a SURF expression for the constituent, the transformation of this expression into a DEEP expression, and the evaluation of the DEEP expression on the basis of the conceptual net.</Paragraph> <Paragraph position="5"> In our example only the agent case is marked as obligatory and the noun phrase 'which trucks' fulfills both the syntactic and semantic requirements to fill this slot. Since no other syntactic constituents are encountered, the complete SURF representation is constructed.</Paragraph> <Paragraph position="6"> The structure is normalized into a DEEP structure.</Paragraph> <Paragraph position="7"> One of the main tasks of this process is the determination of the scope of quantifiers. The algorithm used for this purpose is modelled after the one described by Hendrix [4]; it takes into account the relative strength of natural language quantifiers (e.g. 'a', 'both') and question operators (e.g. 'which', 'how many'). The strength is determined by a numeric value, which in some cases is modified by the degree of generality of the noun. E.g. 
the existential quantifier 'a' is weaker than the more specific quantifier 'both'.</Paragraph> <Paragraph position="8"> Since, in the example discussed, the question operator 'which' is stronger than the existential quantifier for verbs 'E-ACT', the structure is rearranged.</Paragraph> <Paragraph position="9"> The task of evaluating a DEEP formula is governed by a generate and test strategy. Generate and test procedures can be viewed as being activated by pattern-directed invocation and differ from each other in that the generate procedures assign internal object identifiers to variables in DEEP formulas, while the test procedures yield two values, the first of which is either a fully instantiated formula equivalent to the input formula or a modified formula, and the second of which indicates the truth value of the input formula in the range [0,1]. In the interpretation phase these two processes interact in such a way that a test attempt activates generate procedures which in turn call test procedures and so on.</Paragraph> <Paragraph position="10"> A closer look at our example shows that after the first test attempt has discovered a structure containing a variable - in this case the term representing the noun phrase 'which trucks' - a package of generate procedures is activated to produce the set of object identifiers denoting the referential set of objects that are trucks - here TRUCK1 and TRUCK2. The rest of the formula is then recursively sent to a test process with the variable 'w14' replaced by elements of the reference set for trucks one after the other.</Paragraph> <Paragraph position="11"> The next formula to be tested requires the generation of a set of instances of the type GO_BY.</Paragraph> <Paragraph position="12"> Since events are not represented in fully instantiated form but rather must be extracted from the geometrical scene description, a special set of procedures - the methods specified in the verb flavor hierarchy - is activated. 
(See section 3.4.2 for how this process functions.) A verification of an event GO_BY is possible only for TRUCK2. The additional information extracted during the process of visual search - the specific location of the event - is recorded in the locative slot.</Paragraph> <Paragraph position="13"> During the formation of the result of the evaluation, the system, guided by general heuristics, decides whether the additional detail will cause too great a complexity in the answer or not [11].</Paragraph> <Paragraph position="14"> In this case the complexity is suitable and the location will be mentioned in the answer. The word 'which' is defined as a quantifier that causes a description of a set of objects to be returned (instead of a truth value). Thus the set of reference objects for which the proposition in question could be verified, i.e. TRUCK2, is substituted for the noun phrase 'which trucks'.</Paragraph> <Paragraph position="15"> The resulting DEEP expression is transformed by the inverse normalization process into a SURF expression. In order to verbalize extended responses in a manner as informative and concise as possible, the ellipsis generation process elides those parts of the semantic representation of complete responses that are identical to the stored representation of the question [?].</Paragraph> <Paragraph position="16"> The verbalization component produces a string of canonical words and their grammatical features using translation rules attached to the various categories of SURF expressions. A special subcomponent provides for the generation of noun phrases as descriptions of domain individuals, in our example TRUCK2. In this case the NP-generator decides not to generate a definite description since neither the system nor the user has already referred to TRUCK2 in the previous dialog and the existence of TRUCK2 as a moving object is not implied by the existential assumptions supplied by the a priori user model (cf. [?]). 
Instead, the indefinite NP 'a light-colored truck' is generated, using the property 'light-colored' as an initial characterization.</Paragraph> <Paragraph position="17"> Finally the 'surface transformation' component [1] pronominalizes the noun 'truck' and yields a standard word order of the utterance and the correctly inflected forms of the canonical words.</Paragraph> </Section> </Paper>