<?xml version="1.0" standalone="yes"?> <Paper uid="P84-1106"> <Title>NAtural Language driven Image Generation</Title> <Section position="1" start_page="0" end_page="495" type="abstr"> <SectionTitle> ABSTRACT </SectionTitle> <Paragraph position="0"> This paper discusses the experience gained in developing a NAtural Language driven Image Generation system. The system is able to imagine a static scene described by means of a sequence of simple phrases. In particular, a theory for equilibrium and support is outlined, together with the problem of object positioning.</Paragraph> <Paragraph position="1"> 1. Introduction A challenging application of AI techniques is the generation of 2D projections of 3D scenes starting from a possibly unformalized input, such as a natural language description. Apart from the practically unlimited simulation capabilities that a tool of this kind could give people working in show business, a better modeling of the cognitive processes involved is important not only from the point of view of story understanding (Wa80a,Wa81a), but also for a more effective approach to a number of AI-related problems, such as vision or robot planning (So76a). In this paper we discuss some of the ideas underlying a NAtural Language driven Image Generation system (NALIG from here on), which has been developed for experimental purposes at the University of Genoa. 
This system is currently able to reason about static scenes described by means of a set of simple phrases of the form: &lt;subject&gt; &lt;preposition&gt; &lt;object&gt; [ &lt;reference&gt; ] (*).</Paragraph> <Paragraph position="2"> The understanding process in NALIG flows through several steps (distinguishable only from a logical point of view), which perform object instantiation, relation inheritance, translation of the surface expression into unambiguous primitives, (*) NALIG has been developed for the Italian language; the prepositions it can presently analyze are: su, sopra, sotto, a destra, a sinistra, vicino, davanti, dietro, in. A second, deeply revised release is currently under design.</Paragraph> <Paragraph position="3"> This work has been supported by the Italian Department of Education under Grant M.P.I.-27430.</Paragraph> <Paragraph position="4"> consistency checking, object positioning and so on, up to the drawing of the &quot;imagined&quot; scene on a screen. A general overview of NALIG is given in the paper, which however is mainly concerned with the role of common sense physical reasoning in consistency checking and object instantiation.</Paragraph> <Paragraph position="5"> Qualitative reasoning about physical processes is a promising tool which is attracting the interest of an increasing number of A.I. researchers (Fo83a,Fo83b,Fo83c), (Ha78a,Ha79a), (Kl79a,Kl83a). It plays a central role in the scene description understanding process for several reasons: i. naive physics, following Hayes' definition (Ha78a), is an attempt to represent the common sense knowledge that people have about the physical world. Sharing this knowledge between the speaker and the listener (the A.I. system, in our case) is the only feasible way to let the latter make realistic hypotheses about the assumptions underlying the speaker's utterances; ii. 
it allows conclusions to be reached about problems for which very little information is available and which are consequently hard to formalize using quantitative models; iii. qualitative reasoning can be much more effective in reaching approximate conclusions which are sufficient in everyday life. It allows a hierarchy of models to be built, so that the minimal required amount of information is used each time and unnecessary details are not computed.</Paragraph> <Paragraph position="6"> Within the framework of naive physics, most of the current literature is devoted to dynamic processes. Since we are concerned with the description of static scenes, other concepts are relevant, such as equilibrium, support, structural robustness, containment and so on. With few exceptions (Ha78a), qualitative theories addressing these problems are not yet available, even if some useful suggestions for approaching statics can be found in (By80a). In this paper, a theory for equilibrium and support will be outlined. An important aspect of the scene description understanding process is that some amount of quantitative analysis can never be avoided, since a well defined position must be computed for every object in order to draw the image of the scene on a screen. This computation must not result in an overspecification that masks the degree of fuzziness intrinsic in object positions (Wa79a), so as not to constrain unnecessarily all the subsequent reasoning activities. The last section of the paper will be devoted to the object positioning problem.</Paragraph> <Paragraph position="7"> 2. Object taxonomy and spatial primitives Spatial prepositions in natural language are often ambiguous, and each one may convey several different meanings (Bo79a,He80a). 
Therefore, the first step is to disambiguate descriptions through the definition of a proper number of &quot;primitive relationships&quot;.</Paragraph> <Paragraph position="8"> The selection of the primitive relation representing the meaning of the input phrase is based mainly, but not only, on a taxonomy of the objects involved, in which they are classified according to attributes that, in turn, depend on the actual spatial preposition. An example is given by the rules to select the relation H_SUPPORT(A,B) (that is, A is horizontally supported by B) from the phrase &quot;A on B&quot;.</Paragraph> <Paragraph position="9"> This meaning is chosen by default when some conditions are satisfied. First of all, A must not belong to that special category of objects which, when properly used, are flying, such as aircraft, unless B is an object expressly devoted to supporting them in some special case: so, &quot;the airplane on the runway&quot; is likely to be imagined touching the ground, while for &quot;the airplane on the desert&quot; a flying state is probably inferred (of course, the authors cannot exclude that NALIG default reasoning is biased by their personal preferences).</Paragraph> <Paragraph position="10"> FLYING(A) and REPOSITORY(A,B) predicates are used to formalize these facts. To be able to give horizontal support, B must have a free upper surface (FREETOP(B)); walls, ceilings or closed doors in an indoor view do not belong to this category. Geographic objects (GEO(X)) require special care: &quot;the mountains on the lake&quot; cannot be interpreted as the lake supporting the mountains, and even if only B is a geographic object but A can fly, physical contact seems not to be the most common inference (&quot;the birds on the garden&quot;). 
Hence, a first tentative rule is the following (the actual rule is much more complex): not GEO(A) and not (FLYING(A) and not REPOSITORY(A,B)) and ((FREETOP(B) and not GEO(B)) or (GEO(B) and not CANFLY(A)))</Paragraph> </Section> </Paper>