<?xml version="1.0" standalone="yes"?>
<Paper uid="W97-1414">
  <Title>report, German Research Center for Artificial Intelligence (DFKI).</Title>
  <Section position="5" start_page="0" end_page="0" type="metho">
    <SectionTitle>
2 A Multimodal Interpretation System
</SectionTitle>
    <Paragraph position="0"> This section defines the syntax and semantics of the languages L and G used to express the multimodal message of Figure 2. The language L is designed to produce expressions that refer to objects, properties and relations commonly found in discourse about maps. In particular, the natural language expressions of Figure 2 can be constructed compositionally within L. The language G, on the other hand, is expressive enough to refer to geometrical objects, properties and relations found in drawings. The definitions of L and G closely follow the general guidelines of Montague's semiotic programme. As a first step of the syntactic definition, the set of categories or types is stated. For each type of one language a corresponding type in the other language is defined. Basic constants of the source language can be mapped either to basic or to composite expressions of the corresponding type in the object language; similarly, a composite expression of the source language can be mapped into a basic or composite expression of the object language. A number of basic constants for each of these types are defined and the combination rules for producing composite expressions are stated. Each syntactic rule has an associated translation rule that maps the expression formed by the rule to its translation in the other language. In the same way that the interpretation of natural language expressions in the PTQ system (Dowty, 1985) is given indirectly through the translation to intensional logic, which has a model-theoretic semantic interpretation, the interpretation of expressions of L is given indirectly through their translation to expressions of G, as shown in Figure 3.</Paragraph>
    <Paragraph position="1"> The interpretation of expressions of G, in turn, is explicitly given through the model Me. The interpretation function FL states the normal meaning for English words, and Fp is determined by transitivity once the translation function between G and L is defined; no further formalization of FL and Fp is presented in this paper. Another simplifying assumption is that the interpretations of all expressions included in these languages depend only on the current graphical state, so no intensional types are included in the definition of L and G. However, this analysis can be extended along the lines of intensional logic if a more comprehensive fragment of English needs to be handled. Section 2.1 presents the definition of L, and Section 2.2 formally defines the language G and its interpretation function.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.1 Definition of the Language L
</SectionTitle>
      <Paragraph position="0"> The language L is designed to produce expressions like Saarbrücken lies at the intersection between the border between France and Germany and a line from Paris to Frankfurt in a compositional fashion. This means that all basic constants like France and Germany, and also all subexpressions of the former sentence, like the border between France and Germany or a line from Paris to Frankfurt, can also be produced.</Paragraph>
      <Paragraph position="1"> In addition, language L can produce expressions like France is a country, Frankfurt is a city of Germany or Germany is to the east of France which express common sense knowledge required in the interpretation of maps. Next, the definition of L is presented.</Paragraph>
      <Paragraph position="2"> The set of syntactic categories of L is as follows:  1. The basic syntactic categories of L are t, IV and CN, where t is the category of sentences, IV is the category of intransitive verbs and CN is the category of common nouns.</Paragraph>
      <Paragraph position="3"> 2. If A and B are syntactic categories, then A/B is a category.</Paragraph>
      <Paragraph position="4"> Traditional syntactic categories of natural language like transitive verbs (TV), terms (T), prepositional phrases (PP) and determiners (T/CN) can be derived from the basic categories.</Paragraph>
      <Paragraph position="5"> For each syntactic category of L there is a corresponding type in G. The correspondence between linguistic categories and geometrical types resembles the translation from English to Intensional Logic (Dowty, 1985) and it is defined in terms of the function f as follows:  1. f(t) = t.</Paragraph>
      <Paragraph position="6"> 2. f(CN) = f(IV) = &lt;e, t&gt;.</Paragraph>
      <Paragraph position="7"> 3. For any categories A and B, f(A/B) = &lt;f(B), f(A)&gt;.  The following table illustrates the basic constants of L with their category names, category definitions and the corresponding types in the graphical language:  As can be seen in Figure 5, simple terms like the names of cities and countries translate into characteristic functions of sets of individuals. This graphical type is interpreted as the set of properties (or predicates holding in the interpretation state) that an individual named by the term has. So, as a city is represented through a dot in the graphical domain, the translation of Paris, for instance, is the set of properties that the dot representing Paris has in the interpretation state. Common nouns are translated into graphical predicates: city translates into the set of dots representing cities. Transitive verbs are translated into functions taking predicates as their arguments and producing sets of individuals as their values: the verb phrase be a city, for instance, translates into a set of dots representing cities. Determiners, prepositional phrases and intransitive verbs function in a similar fashion, although there are no basic constants of the last two categories, as prepositional words are introduced syncategorematically and intransitive verb phrases are always composite expressions in this grammar. In Figure 6 the translation of all basic constants of L into G is presented. The interpretation of the expressions in the column for G is clarified later in this paper, when G is formally defined.</Paragraph>
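      <Paragraph> The correspondence f between categories and types can be sketched in a few lines of Python. This is only an illustration: the class name Slash and the tuple encoding of functional types are hypothetical, and the definitions T = t/IV and TV = IV/T are assumed from the PTQ fragment, since the table defining the derived categories is not reproduced in the text. Only T/CN for determiners is stated above.</Paragraph>

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Slash:
    """Derived category A/B (hypothetical encoding)."""
    result: "Category"    # A
    argument: "Category"  # B


Category = Union[str, Slash]


def f(cat):
    """Map a category of L to a type of G.

    Types are 'e', 't', or a pair (a, b) standing for the functional type from a to b.
    """
    if cat == "t":
        return "t"                                 # clause 1
    if cat in ("CN", "IV"):
        return ("e", "t")                          # clause 2
    if isinstance(cat, Slash):
        return (f(cat.argument), f(cat.result))    # clause 3
    raise ValueError(f"unknown category: {cat!r}")


# Derived categories. T and TV follow the PTQ definitions (an assumption here).
T = Slash("t", "IV")
TV = Slash("IV", T)
DET = Slash(T, "CN")

print(f(T))    # type of terms
print(f(TV))   # type of transitive verbs
print(f(DET))  # type of determiners
```

      <Paragraph> Under these assumptions a term receives the type of a set of properties, which is the raised type that the text attributes to names such as Paris.</Paragraph>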
      <Paragraph position="8"> Next, the syntactic rules of L and the translation rules to G are presented. Each rule is presented in a box containing the purpose of the rule, the syntactic rule itself with examples of expressions that can be formed with the rule, and finally the translation rule from expressions formed by the rule to their corresponding expressions in G. Following Montague, the syntactic rules and the syntactic operations for combining symbols (for instance, F1) associated with each rule are kept separate. In the following, P_C is the set of expressions of category C.  S1L. If α ∈ P_T and δ ∈ P_IV, then F1(α, δ) ∈ P_t, where F1(α, δ) = α δ″, and δ″ is the result of replacing the first verb in δ by its third person singular present form. Examples: -Paris is a city of France -Germany is to the east of France -Saarbrücken lies at the intersection between the border between France and Germany and a line from Paris to Frankfurt T1L. If α ∈ P_T and δ ∈ P_IV, and α, δ translate into α′, δ′, respectively, then F1(α, δ) translates into α′(δ′).</Paragraph>
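      <Paragraph> As a rough illustration of how a syntactic operation and its translation rule work together, the sentence rule can be sketched as follows. The tiny verb lexicon and the assumption that the first verb of the phrase is its first word are hypothetical simplifications, not part of the grammar as defined here.</Paragraph>

```python
# Hypothetical lexicon: base form -> third person singular present form.
THIRD_SINGULAR = {"be": "is", "lie": "lies"}


def F1(alpha, delta):
    """Syntactic operation of S1L: the term followed by the inflected verb phrase."""
    words = delta.split()
    # Assumption: the first verb of the phrase is its first word.
    words[0] = THIRD_SINGULAR.get(words[0], words[0])
    return " ".join([alpha] + words)


def T1L(alpha_tr, delta_tr):
    """Translation rule T1L: apply the term translation to the verb phrase translation."""
    return alpha_tr(delta_tr)


print(F1("Paris", "be a city of France"))
print(F1("Germany", "be to the east of France"))
```

      <Paragraph> Keeping F1 and T1L as separate functions mirrors the separation between syntactic operations and translation rules described above.</Paragraph>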
    </Section>
  </Section>
  <Section position="6" start_page="0" end_page="0" type="metho">
    <SectionTitle>
TRANSITIVE VERB PHRASES
</SectionTitle>
    <Paragraph position="0"> S2L. If δ ∈ P_TV and β ∈ P_T, then F2(δ, β) ∈ P_IV, where F2(δ, β) = δ β.</Paragraph>
    <Paragraph position="1"> Examples: -be a city -be to the east of France T2L. If δ ∈ P_TV and β ∈ P_T, and δ, β translate into δ′, β′, respectively, then F2(δ, β) translates into δ′(β′).  -border between France and Germany -line from Paris to Frankfurt -intersection between the border between France and Germany and a line from Paris to Frankfurt T4L. If β ∈ P_PP and δ ∈ P_CN, and δ, β translate into δ′, β′, respectively, then F2(δ, β) translates into β′(δ′). 106 L.A. Pineda and G. Garza of PREPOSITIONAL PHRASES S5L. If α ∈ P_T, then F4(α) ∈ P_PP, where F4(α) = of α. Example: of France T5L. If α ∈ P_T, and α translates into α′, then F4(α) translates into of*(α′), where of* is a short-hand definition for either of the expressions (1) or (2) of G: (1) λx_&lt;&lt;e, t&gt;, t&gt; λy_&lt;e, t&gt; λz_e [y(z) ∧ inside(x)(z)] (2) λx_&lt;&lt;e, t&gt;, t&gt; λy_&lt;e, t&gt; λz_e [zone(x)(y)(z)].</Paragraph>
    <Paragraph position="2">  For instance, of*(α′) is obtained by applying of* to α′ as follows: λx_&lt;&lt;e, t&gt;, t&gt; λy_&lt;e, t&gt; λz_e [y(z) ∧ inside(x)(z)] (α′), which can be reduced to λy_&lt;e, t&gt; λz_e [y(z) ∧ inside(α′)(z)].</Paragraph>
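    <Paragraph> The reduction above can be mimicked with nested Python closures, where applying the outermost function to the translation of a term plays the role of the substitution step. This is a sketch under stated assumptions: the containment facts, the helper names lift and lower, and the choice of r1 as the region for France are all hypothetical and are not taken from the figures of the paper.</Paragraph>

```python
INDIVIDUALS = ["d1", "d2", "d3", "r1", "r2"]
# Hypothetical containment facts: which dot lies in which region.
CONTAINED_IN = {"d1": "r1", "d2": "r2", "d3": "r2"}


def lift(a):
    """Raised-type term: the set of properties that hold of individual a."""
    return lambda prop: prop(a)


def lower(term):
    """Recover the individual whose 'being a' property belongs to the term."""
    return next(a for a in INDIVIDUALS if term(lambda k: k == a))


def inside(x):
    """Geometrical function: the predicate true of objects inside the region named by x."""
    region = lower(x)
    return lambda z: CONTAINED_IN.get(z) == region


# Expression (1) for the preposition 'of', written as curried closures.
of_star = lambda x: lambda y: lambda z: y(z) and inside(x)(z)

dot = lambda z: z.startswith("d")   # graphical predicate for 'city'
france = lift("r1")                 # assumed translation of 'France'

# Applying of_star to the term first corresponds to the reduced expression.
city_of_france = of_star(france)(dot)
print([z for z in INDIVIDUALS if city_of_france(z)])
```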
    <Paragraph position="3"> Although of has been introduced syncategorematically in L for simplicity, it could have been defined as a basic constant of some category of L and its translation to G would have been a composite expression of some graphical type.</Paragraph>
    <Paragraph position="4"> between PREPOSITIONAL PHRASES S6L. If α, β ∈ P_T, then F5(α, β) ∈ P_PP, where F5(α, β) = between α and β. Examples: -between France and Germany -between the border between France and Germany and a line from Paris to Frankfurt T6L. If α, β ∈ P_T, and α, β translate into α′, β′, respectively, then F5(α, β) translates into between*(α′)(β′), where between* is a short-hand definition for either of the expressions (1) or (2) of G:  (1) λx_&lt;&lt;e, t&gt;, t&gt; λy_&lt;&lt;e, t&gt;, t&gt; λz_&lt;e, t&gt; λu_e [z(u) ∧ curve_between(x)(y)(u)] (2) λx_&lt;&lt;e, t&gt;, t&gt; λy_&lt;&lt;e, t&gt;, t&gt; λz_&lt;e, t&gt; λu_e [z(u) ∧ intersection_between(x)(y)(u)].</Paragraph>
    <Paragraph position="6"> S7L. If α, β ∈ P_T, then F6(α, β) ∈ P_PP, where F6(α, β) = from α to β.</Paragraph>
    <Paragraph position="7"> Example: from Paris to Frankfurt T7L. If α, β ∈ P_T, and α, β translate into α′, β′, respectively, then F6(α, β) translates into from_to*(α′)(β′), where from_to* is a short-hand definition for the following expression of G: λx_&lt;&lt;e, t&gt;, t&gt; λy_&lt;&lt;e, t&gt;, t&gt; λz_&lt;e, t&gt; λu_e [z(u) ∧ line_from_to(x)(y)(u)] A Model for Multimodal Reference Resolution 107</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.2 Definition of the Language G
</SectionTitle>
      <Paragraph position="0"> In this section, the syntax and semantics of the graphical language are formally defined, as well as the rules for translating graphical expressions back into natural language. The types of language G are defined  as follows: (1) e is a type (graphical objects).</Paragraph>
      <Paragraph position="1"> (2) t is a type (truth values).</Paragraph>
      <Paragraph position="2"> (3) If a and b are any types, then &lt;a, b&gt; is a type. (4) Nothing else is a type.</Paragraph>
      <Paragraph position="3">  Let V_s be the set of variables of type s, C_s the set of basic constants of type s, and E_s the set of well-formed expressions of graphical type s. The basic constants are presented in Figure 7.</Paragraph>
      <Paragraph position="4"> Basic constant</Paragraph>
      <Paragraph position="6"> Figure 7 (recoverable entries, given as constant: type, category of L):
dot, region, curve, line, intersection, right
(constant names not recovered): &lt;&lt;&lt;e, t&gt;, t&gt;, &lt;e, t&gt;&gt;, TV
=: &lt;e, &lt;e, t&gt;&gt;, none
∧, ~: &lt;t, &lt;t, t&gt;&gt;, none
inside: &lt;&lt;&lt;e, t&gt;, t&gt;, &lt;e, t&gt;&gt;, TV
curve_between: &lt;&lt;&lt;e, t&gt;, t&gt;, &lt;&lt;&lt;e, t&gt;, t&gt;, &lt;e, t&gt;&gt;&gt;, none
intersection_between: &lt;&lt;&lt;e, t&gt;, t&gt;, &lt;&lt;&lt;e, t&gt;, t&gt;, &lt;e, t&gt;&gt;&gt;, none
line_from_to: &lt;&lt;&lt;e, t&gt;, t&gt;, &lt;&lt;&lt;e, t&gt;, t&gt;, &lt;e, t&gt;&gt;&gt;, none
zone: &lt;&lt;&lt;e, t&gt;, t&gt;, &lt;&lt;e, t&gt;, e&gt;&gt;, none
of*: &lt;&lt;&lt;e, t&gt;, t&gt;, &lt;&lt;e, t&gt;, &lt;e, t&gt;&gt;&gt;, none
between*, from_to*: &lt;&lt;&lt;e, t&gt;, t&gt;, &lt;&lt;&lt;e, t&gt;, t&gt;, &lt;&lt;e, t&gt;, &lt;e, t&gt;&gt;&gt;&gt;, none
The symbols of*, between* and from_to* are abbreviations for the corresponding expressions in G, as mentioned above in the definition of rules T5L, T6L and T7L.</Paragraph>
      <Paragraph position="7"> In the same way that the translations of basic constants of L into G were given in order to make the translation rules T1L to T7L understandable, the reciprocal translations are given in Figures 8 and 9. Note that constants of G in Figure 8 translate into basic constants of L; however, the translations shown in Figure 9 are more complex, as composite expressions of G can translate into basic or composite expressions of L. Consider that expressions 1 to 6 of G in Figure 9 represent the graphical objects d1, d2, d3, r1, r2 and c1 in Figure 4. The reason for representing these objects with a raised type is that the language G is designed to allow quantification over the graphical  representing Paris, the corresponding expression denotes the set of geometrical properties that the dot representing Paris has. According to this, expression 1 (of G) in Figure 9 denotes the set containing all sets of dots of the drawing in which d1 is included; thus, if P is the set of all dots representing cities, d1 is included in P (that is to say, P is a property of d1), but if P is the  1. If α ∈ C_s, then α ∈ E_s.</Paragraph>
      <Paragraph position="8"> 2. If μ ∈ V_s, then μ ∈ E_s.</Paragraph>
      <Paragraph position="9"> 3. If α ∈ E_&lt;a, b&gt; and β ∈ E_a, then α(β) ∈ E_b.</Paragraph>
      <Paragraph position="10"> 4. If α ∈ E_a and u ∈ V_b, then λu[α] ∈ E_&lt;b, a&gt;.</Paragraph>
      <Paragraph position="11"> 5. If λu[α] ∈ E_&lt;a, b&gt; and β ∈ E_a, then λu[α](β) ∈ E_b. 6. If μ ∈ V_s and β ∈ E_t, then ∃μ(β) ∈ E_t.</Paragraph>
      <Paragraph position="12"> 7. If μ ∈ V_s and β ∈ E_t, then ∀μ(β) ∈ E_t.</Paragraph>
      <Paragraph position="13">  Note that all expressions of L can be translated into G; however, G is a very expressive language and only a subset of the well-formed expressions of G has a translation into L. For the definition of this subset, the format used for introducing L is used again.</Paragraph>
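      <Paragraph> The formation rules of G amount to a small type checker. The sketch below is one possible reading, not the implementation used by the authors: the tuple encoding of expressions and the names type_of, CONSTANTS and VARIABLES are hypothetical. An abstraction over a variable of type b with a body of type a is given the functional type from b to a, which is the reading required by the semantic rule for abstraction.</Paragraph>

```python
def type_of(expr, constants, variables):
    """Return the type of a well-formed expression of G, or raise TypeError."""
    kind = expr[0]
    if kind == "const":                      # rule 1
        return constants[expr[1]]
    if kind == "var":                        # rule 2
        return variables[expr[1]]
    if kind == "app":                        # rules 3 and 5
        fun = type_of(expr[1], constants, variables)
        arg = type_of(expr[2], constants, variables)
        if isinstance(fun, tuple) and fun[0] == arg:
            return fun[1]
        raise TypeError(f"cannot apply {fun!r} to {arg!r}")
    if kind == "lam":                        # rule 4
        _, u, body = expr
        return (variables[u], type_of(body, constants, variables))
    if kind in ("exists", "forall"):         # rules 6 and 7
        _, u, body = expr
        if type_of(body, constants, variables) != "t":
            raise TypeError("a quantified body must have type t")
        return "t"
    raise ValueError(f"unknown expression: {expr!r}")


E_T = ("e", "t")            # predicates of individuals
TERM = (E_T, "t")           # raised type of terms
CONSTANTS = {"dot": E_T, "inside": (TERM, E_T)}
VARIABLES = {"x": TERM, "y": E_T, "z": "e"}

# inside(x)(z), and its abstraction over z
inside_x_z = ("app", ("app", ("const", "inside"), ("var", "x")), ("var", "z"))
print(type_of(inside_x_z, CONSTANTS, VARIABLES))
print(type_of(("lam", "z", inside_x_z), CONSTANTS, VARIABLES))
```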
    </Section>
  </Section>
  <Section position="7" start_page="0" end_page="0" type="metho">
    <SectionTitle>
SENTENCES
</SectionTitle>
    <Paragraph position="0"/>
    <Paragraph position="2"> translates into α′ δ″, where δ″ is the result of replacing the first verb in δ′ by its third person singular present form.</Paragraph>
    <Paragraph position="3"> Translation of the Examples:  - Paris is a city of France - Germany is to the east of France - Saarbrücken lies at the intersection between the border between France and Germany and a line from Paris to Frankfurt</Paragraph>
    <Paragraph position="5"> If δ ∈ E_&lt;&lt;e, t&gt;, &lt;&lt;e, t&gt;, t&gt;&gt;, ζ ∈ E_&lt;e, t&gt;, and δ, ζ translate into δ′, ζ′, respectively, then δ(ζ) translates into δ″ ζ′, where δ″ is δ′ except in the case where δ′ is a and the first word in ζ′ begins with a vowel; here, δ″ is an.</Paragraph>
    <Paragraph position="7"> T4G. If β ∈ E_&lt;&lt;e, t&gt;, &lt;e, t&gt;&gt; and δ ∈ E_&lt;e, t&gt;, and β, δ translate into β′, δ′, respectively, then β(δ) translates into δ′ β′.</Paragraph>
    <Paragraph position="8"> Translation of the Examples: -city of France -east of France -border between France and Germany -line from Paris to Frankfurt -intersection between the border between France and Germany and a line from Paris to  If α, β ∈ E_&lt;&lt;e, t&gt;, t&gt; and α, β translate into α′, β′, respectively, then from_to*(α)(β) translates into from α′ to β′.</Paragraph>
    <Paragraph position="9"> Translation of the Example: from Paris to Frankfurt. T7G.</Paragraph>
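    <Paragraph> The surface side of these translation rules from G back into L is simple string composition. A minimal sketch follows, with hypothetical function names, covering the determiner rule (a becomes an before a vowel), the modifier rule T4G (the modifier follows the noun) and the from_to* rule.</Paragraph>

```python
def with_determiner(det, noun_phrase):
    """Determiner rule: 'a' is realised as 'an' before a vowel-initial noun phrase."""
    if det == "a" and noun_phrase[:1].lower() in tuple("aeiou"):
        det = "an"
    return f"{det} {noun_phrase}"


def modify(modifier, noun):
    """T4G: the translation of the modifier follows the translation of the noun."""
    return f"{noun} {modifier}"


def from_to(origin, destination):
    """Translation of from_to* applied to two terms."""
    return f"from {origin} to {destination}"


line = modify(from_to("Paris", "Frankfurt"), "line")
print(with_determiner("a", line))
print(with_determiner("a", modify("between France and Germany", "intersection")))
```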
    <Paragraph position="10"> The semantics for the language is given in a model-theoretic fashion as follows.</Paragraph>
    <Paragraph position="11"> Let A be the set of graphical individuals A = {d1, d2, d3, r1, r2, r3, r4, c1, c2, c3, c4, c5, c6, l1, right-side, right-r1, right-r2, right-r3, right-r4}, where d1 to l1 are the graphical entities shown in Figure 4, right-side is "the right" and right-r_i is the zone at the right side of region r_i. Let D_x be the set of possible denotations for expressions of type x, such that D_e = A, D_t = {1, 0}, and, for any types a and b, D_&lt;a, b&gt; is the set of all functions from D_a to D_b. Let F be an interpretation function that assigns to each constant of type a a member of D_a. The interpretations (assigned by F) of the constants dot, region, curve, line, right and intersection are the sets containing the corresponding graphical objects. The interpretations of the constants lie_at, in_zone, inside, curve_between, intersection_between, line_from_to and zone (whose types are shown in Figure 6) are geometrical functions. If the arguments of these functions have an appropriate geometrical type (dot, region, curve, etc.), expressions containing these constants can be properly interpreted through a geometrical algorithm; however, if some of the arguments are not the right kind of geometrical object, then expressions containing these constants have no denotation in G and, as a consequence, their translation into L lacks a denotation too. These conditions can be computed with the help of the type predicates for geometrical objects in G.</Paragraph>
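    <Paragraph> Since the set of functions from one domain to another has as many members as the size of the target raised to the size of the source, the size of each denotation domain follows directly from the number of individuals. The helper below is a hypothetical illustration using the nineteen individuals listed in A; types are encoded as 'e', 't', or a pair of types.</Paragraph>

```python
def domain_size(ty, n_individuals):
    """Number of possible denotations for type ty over a domain of n_individuals."""
    if ty == "e":
        return n_individuals
    if ty == "t":
        return 2
    a, b = ty
    # Functions from D_a to D_b: size of D_b raised to the size of D_a.
    return domain_size(b, n_individuals) ** domain_size(a, n_individuals)


N = 19  # individuals listed in the set A
print(domain_size(("e", "t"), N))   # number of predicates of individuals
```

    <Paragraph> The rapid growth of these domains for higher types is one reason why the geometrical functions are computed by algorithms rather than stored as explicit tables.</Paragraph>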
    <Paragraph position="12"> Following Montague, the interpretation of variables is defined in terms of an assignment function g. The notational convention is also adopted by which the semantic value or denotation of an expression α with respect to a model M and a value assignment g is written [[α]]^{M,g}. The semantic rules for interpreting the language G are the following:  1. If α ∈ C_a, then [[α]]^{M,g} = F(α).</Paragraph>
    <Paragraph position="13"> 2. If μ ∈ V_a, then [[μ]]^{M,g} = g(μ).</Paragraph>
    <Paragraph position="14"> 3. If α ∈ E_&lt;a, b&gt; and β ∈ E_a, then [[α(β)]]^{M,g} = [[α]]^{M,g}([[β]]^{M,g}). 4. If α ∈ E_a and u ∈ V_b, then [[λu[α]]]^{M,g} is that function h from D_b into D_a such that for all objects k in D_b, h(k) is equal to [[α]]^{M,g′}, where g′ is exactly like g except that it assigns k to u. 5. If α ∈ E_a, u ∈ V_b and β ∈ E_b, then [[λu[α](β)]]^{M,g} is equal to [[α(u/β)]]^{M,g}, where α(u/β) is the result of replacing all occurrences of u by β in α.</Paragraph>
    <Paragraph position="15"> 6. If μ ∈ V_s and β ∈ E_t, then [[∃μ(β)]]^{M,g} = 1 iff for some value assignment g′ such that g′ is exactly like g except possibly for the individual assigned to μ by g′, [[β]]^{M,g′} = 1.</Paragraph>
    <Paragraph position="16"> 7. If μ ∈ V_s and β ∈ E_t, then [[∀μ(β)]]^{M,g} = 1 iff for every value assignment g′ such that g′ is exactly like g except possibly for the individual assigned to μ by g′, [[β]]^{M,g′} = 1. With this, the specification of the system of multimodal interpretation presented in Figure 3 is concluded. In this system it is possible to express natural language and graphics and to translate expressions from one into the other, as stated in Section 1. It is also possible to interpret multimodal messages in which part of the information is expressed in one modality and the rest is carried by the other. One advantage of the system is that a natural language question can be answered by considering the graphics; for instance, if one asks (with a suitable extension to the language) what is the distance between Paris and Frankfurt? the answer could be obtained by translating the question into the graphical domain, where the distance sought could be computed in terms of the geometry (assuming that the map is drawn at a given scale), and the numerical value could be translated back into natural language. In addition, if reasoning models acting upon representations of each of the modalities were stated, problems could be solved in the modality requiring the lower reasoning effort.</Paragraph>
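    <Paragraph> The semantic rules can be read as a recursive evaluator. The following sketch assumes a hypothetical tuple encoding of expressions and a toy interpretation function over three individuals, and it restricts quantified variables to individuals. Rule 5 needs no separate case here because applying a Python closure already has the effect of the substitution.</Paragraph>

```python
def evaluate(expr, F, g, individuals):
    """Denotation of expr under interpretation function F and assignment g."""
    kind = expr[0]
    if kind == "const":                      # rule 1
        return F[expr[1]]
    if kind == "var":                        # rule 2
        return g[expr[1]]
    if kind == "app":                        # rule 3
        fun = evaluate(expr[1], F, g, individuals)
        return fun(evaluate(expr[2], F, g, individuals))
    if kind == "lam":                        # rule 4: the body is read under g with u set to k
        _, u, body = expr
        return lambda k: evaluate(body, F, {**g, u: k}, individuals)
    if kind in ("exists", "forall"):         # rules 6 and 7
        _, u, body = expr
        values = [evaluate(body, F, {**g, u: k}, individuals) for k in individuals]
        return int(any(values)) if kind == "exists" else int(all(values))
    raise ValueError(f"unknown expression: {expr!r}")


# Toy model (hypothetical): two dots and one region.
INDIVIDUALS = ["d1", "d2", "r1"]
F = {
    "dot": lambda z: int(z in {"d1", "d2"}),
    "region": lambda z: int(z == "r1"),
}

dot_z = ("app", ("const", "dot"), ("var", "z"))
some_dot = ("exists", "z", dot_z)
all_dots = ("forall", "z", dot_z)

print(evaluate(some_dot, F, {}, INDIVIDUALS))
print(evaluate(all_dots, F, {}, INDIVIDUALS))
```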
    <Paragraph position="17"> Another advantage of the system is that it makes it possible to rule out natural language expressions which are syntactically well-formed but nevertheless meaningless. The expression a city of France is well-formed and has a well-defined reference; however, the expression a city of Paris, with the same syntactic form, is not well-defined semantically. This last expression can be ruled out as ungrammatical, as its translation into the graphical language does not have an interpretation in terms of the geometry. If a condition is imposed to the effect that an expression of L is grammatical only if its translation into G has a well-defined denotation, the graphical domain imposes a kind of selectional restriction which greatly simplifies the syntactic definition of L.</Paragraph>
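    <Paragraph> This selectional restriction can be illustrated by treating a geometrical function as partial, so that it yields no value when its argument is the wrong kind of object. Everything in the sketch is hypothetical: the object kinds, the containment facts and the pairing of Paris with d1 and France with r1 are chosen only for the example.</Paragraph>

```python
# Hypothetical drawing: kinds of objects, containment facts and names.
KIND = {"d1": "dot", "d2": "dot", "r1": "region"}
CONTAINED_IN = {"d1": "r1"}
NAMES = {"Paris": "d1", "France": "r1"}


def inside(container):
    """Predicate 'lies inside container', or None when container is not a region."""
    if KIND[container] != "region":
        return None
    return lambda z: CONTAINED_IN.get(z) == container


def city_of(name):
    """Denotation of 'city of NAME' as a set of dots, or None if undefined."""
    predicate = inside(NAMES[name])
    if predicate is None:
        return None
    return {z for z in KIND if KIND[z] == "dot" and predicate(z)}


def is_grammatical(name):
    """An expression of L is accepted only if its translation has a denotation."""
    return city_of(name) is not None


print(is_grammatical("France"))  # 'a city of France'
print(is_grammatical("Paris"))   # 'a city of Paris'
```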
    <Paragraph position="18"> At this point, one warning note is in order: the graphical language is probably too expressive and more complex than required. However, having explicit quantification in the graphical domain can pose some interesting questions. Traditionally graphics is considered appropriate for representing concrete situations and natural language is better to express abstractions. However, this is not necessarily the case: consider the example of Figure 1 in which the drawings of a man, a car and a bucket were taken to represent concrete individuals. This is so because the man and the car were taken to be the antecedents of the anaphoric pronouns he and it. In a suitable graphical language of the kind developed here a definite reference to such graphical objects can be made as follows:</Paragraph>
    <Paragraph position="20"> However, some presuppositions are involved in this interpretation choice, as the graphical symbols could also be taken to represent any individual or even the set of all individuals of the kind. In the language G all of these readings of the drawings can be expressed, as shown in Figure 10.</Paragraph>
    <Paragraph position="21"> The multimodal interpretation system developed here does not settle the question of which interpretation should be preferred, and this question can most probably be resolved only at a pragmatic level; however, the distinction can be made and it should be taken into consideration in a general theory of graphical interpretation. Consider, for instance, whether the drawing of a car on a road sign prohibiting parking should be interpreted as a particular car, any car or all cars.</Paragraph>
    <Paragraph position="22">  The theory also suggests an intriguing path of exploration related to interactive issues. In the same way that natural language expressions can be input directly through the interface, concrete graphical symbols are normally placed on the screen by graphical input devices. The question is, however, whether it is possible and useful to input expressions of G, like the ones in Figure 10, directly. At the moment this issue is left for further research.</Paragraph>
    <Paragraph position="23"> One last consideration is that the system of multimodal interpretation presented here provides a sound representational scheme for referring in a uniform way to symbols on the screen and to the objects in the world that they represent. In particular, the ambiguity of natural language expressions making interwoven reference to objects on the screen and their interpretation can be placed in a clear representational setting.</Paragraph>
  </Section>
  <Section position="8" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3 Incremental Interpretation
</SectionTitle>
    <Paragraph position="0"> In the theory developed in Section 2 it was assumed that the translations of the basic constants of all categories from L to G and vice versa were known, so that multimodal interpretation and reasoning were possible; however, in the interpretation of multimodal messages, natural language and graphics are input from different sources, and working out the meaning of a multimodal message is by no means trivial. As was discussed in Section 1, resolving graphical anaphora, finding the reference of deictic pronouns and inducing the translation function are related problems that need to be solved for the interpretation of multimodal messages. Consider, for instance, the situation of reading a book with words and pictures: when the associations between text and graphical symbols are realized by the reader, the message as a whole has been properly understood. However, it cannot be expected that such an association is known beforehand.</Paragraph>
    <Paragraph position="1"> Inducing this translation function is similar to the computer vision problem of interpreting drawings. A related antecedent is the work on the logic of depiction (Reiter et al., 1987), in which a logic for the interpretation of maps, to be applied in computer vision and intelligent graphics, is developed. It is argued that any adequate representation scheme for visual (and computer graphics) knowledge must maintain the distinction between knowledge of the image (the graphics) and knowledge of the scene (its interpretation), and about the depiction relation. In Reiter's system two sets of first order logical sentences representing the scene and the image are employed; they express, respectively, the conceptual and geometrical knowledge about hand drawn sketch maps of geographical regions. The depiction relation corresponds to the translation function between constants of L and G discussed above. An interpretation in Reiter's system is defined as a model, in the logical sense, of both sets of sentences and the depiction relation, and interpreting a drawing consists in finding all possible models of such sets of sentences.</Paragraph>
    <Paragraph position="2"> Although computing the set of models of a set of first order sentences is a computationally intractable problem, the entities constituting a drawing normally form a finite set which is often small. So, the possibility of computing the set of models of a drawing is a matter for empirical research. In particular, Reiter's system employs a constraint satisfaction algorithm to find all possible interpretations of maps, and the output of his system is a set of labels for curves or chains as rivers, roads or shores, and for areas as land regions or water regions. As was mentioned above, finding the translation functions between G and L is a similar problem with the same kind of complexity.</Paragraph>
    <Paragraph position="3"> As a side effect of working out the translation between basic graphical and linguistic constants, a method for generating natural language expressions that refer to graphical objects and configurations is at hand. Consider that a natural language description can have both simple and composite referring expressions that translate into basic graphical constants, and inducing the linguistic translation of a graphical term which has not been named is the same as generating a linguistic description for such an object. Next, an algorithm for constructing the translations is illustrated.</Paragraph>
    <Paragraph position="4"> As a preliminary consideration, it is important to highlight that such translations cannot be built solely from the overt information expressed through the multimodal message. For working out the interpretation of Figure 2, for instance, knowledge about the geography of Europe and about the interpretation conventions of maps is required in addition to the text and graphics. Such knowledge must be employed to find the translation.</Paragraph>
    <Paragraph position="5"> The conventions about the interpretation of maps are expressed as a correspondence between graphical and conceptual types, for instance, that dots represent cities and regions represent countries. General knowledge about maps, either geometrical or conceptual, will constrain the possible translations.</Paragraph>
    <Paragraph position="6"> The algorithm for computing the translation function has two parts; the purpose of the first is to assign a graphical constant to all terms in the overt textual message according to the grammar of L, and the second is to assign a referring expression to the remaining graphical constants of the drawing. For the example in Figure 2, the output of the first part is shown in Figure 11.</Paragraph>
    <Paragraph position="7"> Figure 11 (expression of L: graphical constant). the border between France and Germany: c1; a line from Paris to Frankfurt: l1; the intersection between the border between France and Germany and a line from Paris to Frankfurt: d2. The second part would assign a description to the remaining graphical constants as shown in Figure 12.</Paragraph>
    <Paragraph position="8">  For the definition of the algorithm, a function table is defined that represents the set of possible functions from graphical to linguistic constants of the corresponding semantic types. The interpretation conventions for maps are stated through the ordered pairs in the following set: I = {&lt;dot, city&gt;, &lt;region, country&gt;, &lt;curve, border&gt;, &lt;line, line&gt;}. The function table for the first pair in relation to Figures 2 and 4 is shown in Figure 13.  As can be seen, the function table in Figure 13 relates all dots to all cities in the text. As any dot can represent any city, but different cities are represented by different dots, each dot must be associated with one city by filling the box in which the dot intersects the corresponding city in the function table. Once a dot has been assigned to a city, the row corresponding to that city cannot be filled out for the other dots. Accordingly, if there are n cities, the first dot receiving an interpretation can be assigned in n different ways (it can represent any one of the n cities), the second in n-1 different ways, etc. As a consequence, each function table represents n! possible interpretation functions (if all cities are represented).</Paragraph>
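    <Paragraph> The counting argument can be checked by enumerating the one-to-one assignments that a function table stands for. The sketch below is illustrative only; the function name is hypothetical, and the dot and city names are those used in the running example.</Paragraph>

```python
from itertools import permutations
from math import factorial


def interpretation_functions(dots, cities):
    """Yield every one-to-one assignment of cities to dots."""
    for ordering in permutations(cities, len(dots)):
        yield dict(zip(dots, ordering))


DOTS = ["d1", "d2", "d3"]
CITIES = ["Paris", "Frankfurt", "Saarbrücken"]

functions = list(interpretation_functions(DOTS, CITIES))
print(len(functions), factorial(len(CITIES)))
```

    <Paragraph> Each dictionary corresponds to one way of filling the table with exactly one box per row and per column.</Paragraph>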
    <Paragraph position="9"> The first step in the algorithm is to identify all graphical and linguistic basic constants from the overt message and to draw the function tables for the interpretation conventions set I. In our example, the basic linguistic constants referring to graphical objects (proper names) name cities and countries, so only a function table for regions representing countries (in addition to Figure 13) is considered, as shown in Figure 14.</Paragraph>
    <Paragraph position="10">  The next step is to fill out the tables in Figures 13 and 14 with all possible interpretations that are consistent with the overt knowledge and also with the background knowledge about the interpretation task, in this case knowledge about the geography of Europe. The general knowledge to be considered for this example is shown in Figure 15. Note that clauses 1 to 6 are general knowledge of geography, but clause 7 is introduced explicitly in the multimodal message. Note as well that there might be a considerable amount of knowledge about the geography of Europe which is not included in Figure 15. However, how knowledge is brought to bear on the interpretation process is beyond the scope of this paper. The only consideration is that the range values of the function tables are the main indices that somehow retrieve the information from memory.</Paragraph>
    <Paragraph position="11">  1. France is a country 2. Germany is a country 3. Paris is a city of France 4. Frankfurt is a city of Germany 5. i Saarbrticken is a city of Germany 6. i Germany is to the east of France 7. :Saarbrticken lies at the intersection  between the border between France and Germany and a line from Paris to Frankfurt Figure 15 For filling out the function tables th set of all constraints should be considered. As shown by Reiter (Reiter et al., 1987) all possible models can be found for a finite set of graphical symbols-with a constraint satisfaction algorithm. Along this line, we are exploring strategies to find out the set of models incrementally extending the function tables filling out one column of one table at a time by considering one constraint at a time. This is done in a way that the extended model satisfies all constraints considered so far. The process continues until all function tables are filled out.</Paragraph>
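    <Paragraph> The effect of considering one constraint at a time can be sketched with a brute-force filter over candidate models. This is a simple stand-in for the incremental table-filling strategy described above, not the strategy itself, and the geometry (which dot lies in which region, which region is to the east, which dot lies at the intersection) is hypothetical, since the drawing is not reproduced here.</Paragraph>

```python
from itertools import permutations

DOTS, REGIONS = ["d1", "d2", "d3"], ["r1", "r2"]
CITIES, COUNTRIES = ["Paris", "Frankfurt", "Saarbrücken"], ["France", "Germany"]

# Hypothetical geometry of the drawing.
INSIDE = {"d1": "r1", "d2": "r2", "d3": "r2"}
EAST_OF = {("r2", "r1")}       # r2 lies to the east of r1
AT_INTERSECTION = "d2"         # dot at the crossing of the border and the line


def candidate_models():
    """Every joint filling of the dot/city and region/country function tables."""
    for dots in permutations(DOTS):
        for regions in permutations(REGIONS):
            yield {**dict(zip(CITIES, dots)), **dict(zip(COUNTRIES, regions))}


def city_of(city, country):
    return lambda m: INSIDE[m[city]] == m[country]


CONSTRAINTS = [
    city_of("Paris", "France"),                           # proposition 3
    city_of("Frankfurt", "Germany"),                      # proposition 4
    city_of("Saarbrücken", "Germany"),                    # proposition 5
    lambda m: (m["Germany"], m["France"]) in EAST_OF,     # proposition 6
    lambda m: m["Saarbrücken"] == AT_INTERSECTION,        # proposition 7
]


def solve(constraints):
    """Apply constraints one at a time, recording how many models survive each."""
    models, history = list(candidate_models()), []
    for constraint in constraints:
        models = [m for m in models if constraint(m)]
        history.append(len(models))
    return models, history


models, history = solve(CONSTRAINTS)
print(history)
print(models)
```

    <Paragraph> With this invented geometry, dropping the last constraint leaves two admissible models, which matches the remark that a second assignment of dots would be admissible if proposition 7 were not considered.</Paragraph>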
    <Paragraph position="12"> For the example, propositions 3 to 7 are considered to produce the function tables illustrated in Figure 16. Note, in particular, that if proposition 7 were not considered, a diagonal model for the interpretation of dots would also be admissible.</Paragraph>
    <Paragraph position="13">  The next step of the algorithm consists in identifying the translations of composite referring expressions to complete the table in Figure 11. This can be done with the help of the grammar, the translation functions, the semantic interpretation of G and the translations already computed in Figure 16. In fact, once the translation of basic constants is known, the translation of composite terms using those constants can be found compositionally in terms of the translation rules from L to G and the interpretation rules of G. Consider, additionally, that the translation of basic constants of other categories is given beforehand, as shown in Figure 6. For instance, the translation of the composite expression the border between France and Germany into G is</Paragraph>
    <Paragraph position="15"> to produce the final value which is, due to the type raising of terms, the set of properties that c1 has; for simplicity we take it to be the constant c1.</Paragraph>
    <Paragraph position="16"> Next, the second part of the algorithm is described; it produces the translation of the set UNNAMED of graphical constants that have no name, as shown in Figure 12. In order to carry out this process, the first step is to identify the types of all constants in UNNAMED with the help of the type predicates of G. For each constant it is required to identify all geometrical functions producing objects of the constant type. So, the same constant may be produced in a number of ways, depending on the geometrical functions available for producing objects of the constant type. The next step consists in identifying all combinations of expressions that can be used as arguments of geometrical functions, forming in this way a set of expressions that can produce the constant at hand. Each of these expressions is interpreted. From this process two kinds of outputs can be expected: either the expression has a well-defined value or it does not.</Paragraph>
    <Paragraph position="17"> Expressions having a proper value must be combined with a graphical quantifier, and possibly with other expressions of G (to produce the between* term from the geometrical function curve_between). The resulting term can be translated back into natural language, producing in this way the corresponding description.</Paragraph>
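    <Paragraph> The enumeration of argument combinations described above can be sketched as follows. This is a minimal illustration under stated assumptions: the tables CONSTANTS_BY_TYPE, BORDERS and PRODUCERS, the treatment of curve_between as symmetric in its arguments, and the helper name candidate_expressions are hypothetical, and the quantifier step is left out.

```python
from itertools import product

# Hypothetical interpretation of the drawing (assumed, for illustration only).
CONSTANTS_BY_TYPE = {"region": ["r1", "r2"], "curve": ["c1", "c2"]}
BORDERS = {frozenset({"r1", "r2"}): "c1"}  # the curve shared by two regions


def curve_between(a, b):
    """Return the curve separating regions a and b, or None when undefined."""
    return BORDERS.get(frozenset({a, b}))


# Geometrical functions indexed by the type of object they produce.
PRODUCERS = {"curve": [("curve_between", curve_between, ("region", "region"))]}


def candidate_expressions(target, target_type):
    """List the function applications whose value is the target constant."""
    found = []
    for name, function, arg_types in PRODUCERS.get(target_type, []):
        pools = [CONSTANTS_BY_TYPE[t] for t in arg_types]
        for args in product(*pools):
            if function(*args) == target:  # undefined values (None) never match
                found.append(f"{name}({', '.join(args)})")
    return found


expressions = candidate_expressions("c1", "curve")
```

Applications without a well-defined value, such as curve_between applied twice to the same region, are discarded, so only expressions that actually produce the constant remain as candidates for description.</Paragraph>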
    <Paragraph position="18"> Suppose it is desired to find a referring expression for the constant c1 of graphical type curve. Considering all geometrical functions denoted by the basic constants of G, only curve and curve_between can produce curves. The constant curve denotes the set of curves on the drawing, so the expressions λPλQ∃x[P(x) ∧ Q(x)](curve) (a border) and λPλQ∃y[∀x[P(x) ↔ x=y] ∧ Q(y)](curve) (the border) can be formed. The interpretation of the former expression results in the set of properties that one curve or another has, including the properties of curve c1; however, the interpretation of the latter expression results in an empty set, as there is more than one curve in the drawing. So, only the former expression generated by the constant curve is a possible description of c1. If the constant curve_between is considered, the expressions shown in Figure 16 can be obtained (the expressions denoting empty sets, such as between*(λP[P(r1)])(λP[P(r1)])(curve), were omitted). As can be seen, only the first of these expressions refers to the curve denoted by c1.</Paragraph>
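    <Paragraph> The contrast between the indefinite and the definite description of a curve can be illustrated with a small sketch. It assumes a finite model in which a property is represented by the set of objects that have it, and a drawing with exactly two curves named c1 and c2; the function names a, the, powerset and denotation are hypothetical and do not belong to G.

```python
from itertools import combinations


def powerset(universe):
    """All subsets of the universe; each subset stands for one property."""
    items = sorted(universe)
    return [frozenset(c) for n in range(len(items) + 1)
            for c in combinations(items, n)]


def a(P):
    """Indefinite quantifier: true of Q when some object has both P and Q."""
    return lambda Q: any(x in Q for x in P)


def the(P):
    """Definite quantifier: true of Q when P has exactly one member and it has Q."""
    return lambda Q: len(P) == 1 and next(iter(P)) in Q


def denotation(term, universe):
    """The set of properties (subsets of the universe) that the term accepts."""
    return {Q for Q in powerset(universe) if term(Q)}


CURVES = frozenset({"c1", "c2"})  # assumed: the drawing contains two curves
a_border = denotation(a(CURVES), CURVES)
the_border = denotation(the(CURVES), CURVES)
```

In this toy model a_border contains every property held by at least one curve, including those of c1, whereas the_border is empty because the uniqueness condition fails when there are two curves.</Paragraph>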
    <Paragraph position="19">  Thus, the expressions a border, a border between France and Germany and the border between France and Germany are possible descriptions for c1. The pragmatic choice of which expression is the most appropriate in an interpretation context goes beyond the scope of this paper; here we are only concerned with the computation of the set of expressions that can be correctly produced in terms of the multimodal representation.</Paragraph>
    <Paragraph position="20"> This procedure has one additional complication that has to be taken into account. Once a graphical object has been produced by the procedure mentioned above it can also extend the set of the argument combinations of the graphical functions referring to other well-formed graphical objects.</Paragraph>
    <Paragraph position="21"> Note that it is possible to produce very large expressions, and even infinite ones, with the unconstrained recursive application of the procedure. It could be possible to produce, for instance, the border between France and Germany between France and Germany between France and Germany. This expression is well-formed and denotes the border between France and Germany. In order to prevent this kind of very large expression, an algorithm for generating the set of simplest but maximally expressive expressions of a graphical language has been proposed (Santana, 1995).</Paragraph>
  </Section>
  <Section position="9" start_page="0" end_page="0" type="metho">
    <SectionTitle>
4 Multimodal Discourse Representation Theory
</SectionTitle>
    <Paragraph position="0"> The ability to interpret individual multimodal messages is a prerequisite for interpreting sequences of multimodal messages occurring in the normal flow of interactive conversations. In the same sense that discourse theories, like DRT, are designed to interpret sequences of sentences, it is desirable to have a theory in which sequences of multimodal messages can be understood. Such a theory would have to support anaphoric and deictic resolution models in an integrated fashion, and would have to be placed in a larger pragmatic setting in which intentions and presuppositions are considered, and in which mechanisms to retrieve knowledge from memory are also taken into account. To work out such a theory is quite an ambitious goal; however, in the same way that DRT focuses on the internal structural processes that govern anaphoric resolution, it is plausible to consider a multimodal discourse representation theory (MDRT) to cope with referential aspects of multimodal communication. In the same way that DRT postulates discourse representation structures in which referents and conditions are introduced incrementally, through the application of construction rules to the incoming natural language discourse, it is plausible to conceive similar multimodal discourse representation structures (MDRS) whose referents and conditions would be introduced by modality-dependent construction rules acting upon the expressions of the corresponding modality. The definition of such an extension of DRT is a long-term goal of this work.</Paragraph>
    <Paragraph position="1"> A consequence of the notion of modality that has been developed so far is that expressions referring to graphical objects and relations are well-defined in a suitable language, and could be included as referents and conditions in the proposed MDRS. In these structures, DRS-conditions extracted from different modalities would be kept in separate partitions, but the discourse referents would be abstract objects common to the whole MDRS. The formalization of modalities in terms of representation languages would make it possible to extend DRT to handle different modalities, as long as the conditions and referents were introduced by construction rules triggered by specific syntactic configurations of the representation language of the modality in question.</Paragraph>
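    <Paragraph> One way to picture the proposed structure is the data-structure sketch below. It is purely illustrative and assumes details the text leaves open: the class name MDRS, the method names add_condition and conditions_on, the modality labels and the example conditions are all hypothetical, and no construction rules or simplification of conditions are modelled.

```python
from dataclasses import dataclass, field


@dataclass
class MDRS:
    """Hypothetical multimodal DRS: shared referents, per-modality conditions."""
    referents: set = field(default_factory=set)
    partitions: dict = field(default_factory=dict)  # modality name -> conditions

    def add_condition(self, modality, predicate, *args):
        """Record a condition in its modality's partition and share its referents."""
        self.referents.update(args)
        self.partitions.setdefault(modality, []).append((predicate, args))

    def conditions_on(self, referent):
        """Return (modality, condition) pairs that mention the referent."""
        return [(modality, cond)
                for modality, conds in self.partitions.items()
                for cond in conds if referent in cond[1]]


mdrs = MDRS()
mdrs.add_condition("language", "border_between", "x", "france", "germany")
mdrs.add_condition("graphics", "curve", "x")
```

Here the referent x is shared by a linguistic condition and a graphical condition, while each condition stays in the partition of the modality that introduced it.</Paragraph>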
    <Paragraph position="2"> In summary, the definition of this kind of structure would be possible if the following three questions could be answered: how information of different modalities can be incorporated into an MDRS, how discourse referents common to expressions of different modalities can be identified, and lastly, how simplification of conditions involving different modalities can be carried out.</Paragraph>
    <Paragraph position="3"> The suggestion is that these three problems can be solved in terms of the scheme shown in Figure 3 and Section 2, and the interpretation process illustrated in Section 3. For the moment these issues are left for further work.</Paragraph>
  </Section>
</Paper>