File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/97/w97-1414_intro.xml

Size: 7,698 bytes

Last Modified: 2025-10-06 14:06:26

<?xml version="1.0" standalone="yes"?>
<Paper uid="W97-1414">
  <Title>report, German Research Center for Artificial Intelligence (DFKI).</Title>
  <Section position="4" start_page="0" end_page="0" type="intro">
    <SectionTitle>
FL
</SectionTitle>
    <Paragraph position="0"> Figure 3 of graphical symbols constituting the graphical modality proper (i.e. the actual symbols on a piece of paper or on the screen). Note that two sets of expressions are considered for the graphical modality: the expressions in G belong to the formal language in which graphics are represented and reasoned about, and are thought of as the &quot;form&quot; of the overt graphical symbols, whose &quot;substance&quot; (in the Saussurean sense), contained in P, cannot be manipulated directly. The set W stands for the world and, together with the functions FP, FG and FL, constitutes a multimodal system of interpretation. The ordered pair &lt;W, FL&gt; defines the model ML for the natural language, and the ordered pair &lt;W, FG o FP&gt; defines the model MG for the graphical language.</Paragraph>
    <Paragraph position="1"> In order to illustrate how this multimodal system of interpretation works, consider, for instance, that the denotation of the picture of a man and that of the word he in Figure 1 are the same individual in the world; in the same way, the denotations of the word Saarbrücken and of the dot at the intersection between the straight line and the lower segment of the curve representing the border between France and Germany in Figure 2 are also the same, namely the city of Saarbrücken itself. So, if one asks who is he? while looking at Figure 1, the answer is found by computing PL(he), whose value is the picture of the man in the figure. Once this computation is performed, the picture can be highlighted or signaled by other graphical means. If, on the other hand, one points at the middle dot in Figure 2 while asking what is this?, the answer is found by applying the function PG to the dot pointed at, whose value is the word Saarbrücken.</Paragraph>
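    <Paragraph> The lookup just described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the formalization developed in the paper: the dictionaries F_L, F_G, P_L and P_G stand for the functions FL, FG, PL and PG, the composition of FG with FP is collapsed into a single table, and all entity names (the_man, man_picture, middle_dot and so on) are hypothetical.

```python
# Toy world W: individuals are plain strings (hypothetical names).
W = {"the_man", "saarbruecken_city"}

# Interpretation functions: constants of each language denote individuals in W.
# Assumption: the composition of F_G with F_P is collapsed into one table.
F_L = {"he": "the_man", "Saarbrücken": "saarbruecken_city"}
F_G = {"man_picture": "the_man", "middle_dot": "saarbruecken_city"}

# Translation functions between the two languages, written by hand here.
P_L = {"he": "man_picture", "Saarbrücken": "middle_dot"}  # from L to G
P_G = {"man_picture": "he", "middle_dot": "Saarbrücken"}  # from G to L


def who_is(word):
    """Answer 'who is he?' by translating a word of L into a symbol of G."""
    return P_L[word]


def what_is(symbol):
    """Answer 'what is this?' by translating a pointed-at symbol of G into L."""
    return P_G[symbol]


def translations_preserve_denotation():
    """Check that each expression and its translation denote the same individual."""
    l_ok = all(F_G[P_L[word]] == F_L[word] for word in P_L)
    g_ok = all(F_L[P_G[symbol]] == F_G[symbol] for symbol in P_G)
    return l_ok and g_ok
```

Here who_is("he") picks out the picture to be highlighted and what_is("middle_dot") yields the word to be uttered. The final check encodes the assumption, taken from the examples above, that an expression and its translation denote the same individual in W.</Paragraph>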
    <!-- 102 L.A. Pineda and G. Garza -->
    <Paragraph position="2"> It should be clear that if all theoretical elements illustrated in Figure 3 are given, questions about multimodal scenarios can be answered through the interpretation process; that is to say, by evaluating expressions of a given modality in terms of the interpreters of the languages involved and the translation functions.</Paragraph>
    <Paragraph position="3"> However, when one is instructed to interpret a multimodal message, like those in Figures 1 and 2, not all the information in the scheme of Figure 3 is available. In particular, the translation functions PL and PG are not known, and the goal of the crucial inference of the interpretation process is to induce these functions. Such an inference can be thought of as the same process as the one involved in resolving the so-called linguistic anaphor with pictorial antecedent and the pictorial anaphor with linguistic antecedent. It would also be equivalent to resolving deictic references if &quot;the visual world&quot; is thought of as represented through expressions of the graphical representation modality.</Paragraph>
    <Paragraph position="4"> These are three different ways of looking at the same problem.</Paragraph>
    <Paragraph position="5"> It is important to highlight that the information overtly provided in the multimodal message is usually not enough to induce PL and PG. As will be discussed below in this paper, such a process also requires considering the grammatical structure of the languages involved, the definition of translation rules between languages, and conceptual knowledge stored in memory about the interpretation domain.</Paragraph>
    <Paragraph position="6"> Another consequence of the scheme in Figure 3 is that it provides the basis for generating referring expressions of a given modality in terms of information provided in other modalities. Consider that basic constants or composite expressions of the languages G and L can be translated to basic or composite expressions of the other language, depending on the definition of the translation function. So, if one needs to refer linguistically to a graphical configuration, for instance, it would only be required to find an expression of G which expresses all graphical attributes of the desired object in the simplest fashion, and then translate it to its corresponding expression in L. The resulting natural language expression could be used directly or embedded in a larger natural language expression containing words that refer to abstract objects or properties. To illustrate this point, consider the natural language text Saarbrücken lies at the intersection between the border between France and Germany and a line from Paris to Frankfurt. This sentence contains the definite description the intersection between the border between France and Germany and a line from Paris to Frankfurt, which in turn contains a number of simpler (basic and composite) referring expressions.</Paragraph>
    <Paragraph position="7"> Finding the graphical referents of these expressions requires the identification of dots, lines and curves (and parts of curves) in the map that have the same referents. However, the map in Figure 2 has graphical entities that have an interpretation but are not named in the text (consider Figure 4, in which the graphical entities of Figure 2 have been labeled). For instance, Belgium is represented by region r4, and the curve c6 represents the border between France and Belgium.</Paragraph>
    <Paragraph position="8"> Once the picture has been interpreted, one would be entitled to ask not only about graphical objects that have been named, but also about any meaningful graphical object. So, if one points to the curve c6 in Figure 2, one answer provided could be The border between France and Belgium. As some graphical objects named by constants of the graphical language do not have a proper natural language name, the translation function PG must associate a basic constant of G with a composite description of L. The process of inducing such a translation function produces the corresponding referring expressions too.</Paragraph>
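    <Paragraph> The association of a basic constant of G with a composite description of L can be illustrated with a small Python sketch. It is only a sketch under stated assumptions: the tables REGION_NAMES and BORDERS and the function describe are hypothetical, they are filled in by hand with the two facts mentioned above rather than induced, and the description template is fixed for brevity.

```python
# Hypothetical facts read off the labeled map; only r4 and c6 are taken
# from the text, and the tables are written by hand rather than induced.
REGION_NAMES = {"r4": "Belgium"}          # regions that have a proper name in L
BORDERS = {"c6": ("France", "Belgium")}   # curves and the countries they separate


def describe(constant):
    """Translate a basic constant of G into a basic or composite expression of L."""
    if constant in REGION_NAMES:
        # A proper natural language name exists: translate to a basic constant.
        return REGION_NAMES[constant]
    if constant in BORDERS:
        # No proper name exists: build a composite definite description.
        first, second = BORDERS[constant]
        return "the border between {} and {}".format(first, second)
    raise KeyError("no translation known for " + constant)
```

Pointing at c6 would then be answered with describe("c6"), a composite description, whereas describe("r4") returns a proper name.</Paragraph>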
    <Paragraph position="10"> In the rest of this paper, some preliminary results on how this programme can be carried out are presented.</Paragraph>
    <!-- A Model for Multimodal Reference Resolution 103 -->
    <Paragraph position="11"> In Section 2, a formalization of the languages L and G with their corresponding translation functions and semantic interpretation, along the lines of Montague's general semiotic programme, is presented. In that section the process of multimodal interpretation and reasoning is explained, and the translation of expressions of one modality in terms of the other is illustrated. However, such a process can be carried out only if the translation functions are known and, as was mentioned above, that is not normally the case. In Section 3, some initial results on how such functions can be induced in terms of the multimodal message, constraints on the interpretation conventions of the modalities, and constraints on the general knowledge about the domain, are presented. In that section the process of generating graphical and linguistic referring expressions, which is associated with the induction of the translation functions, is also illustrated. In Section 4, a preliminary discussion of the feasibility of extending Kamp's DRS with multimodal structures is presented. Finally, in the conclusion, a tentative reflection on the relation between anaphora and spatial deixis in the light of such a theory is advanced.</Paragraph>
  </Section>
</Paper>