<?xml version="1.0" standalone="yes"?> <Paper uid="C92-4181"> <Title>ON THE INTERPRETATION OF NATURAL LANGUAGE INSTRUCTIONS</Title> <Section position="3" start_page="0" end_page="0" type="metho"> <SectionTitle> 2 Problems and Proposed Solutions </SectionTitle> <Paragraph position="0"> The following conclusions can be drawn from the observations in the previous section: 1. NL action descriptions are fairly complex, including modifiers of many different types--see also \[WD90\].</Paragraph> <Paragraph position="1"> An action representation formalism must be able to deal with complex descriptions, such as carry it carefully with both hands; with descriptions at different levels of abstraction, such as go and walk to, or such as cut the square in half and cut the square in half along the diagonal in (2).</Paragraph> <Paragraph position="2"> 2. NL instructions include a wide variety of constructions, such as purpose clauses and temporal clauses. Instruction interpretation systems must be able to deal with complex imperatives and with the relations between actions that they express.</Paragraph> <Paragraph position="3"> 3. An instruction interpretation system cannot assume that the descriptions of the actions to be performed are equivalent to the logical forms computed by the parser: such logical forms have to be constrained in various ways, e.g. by computing assumptions, as in (1a), or more specific action descriptions, as in (2) 2. Notice that these constraints derive from the interaction between the actions to be executed and the goals 2 In this paper we will only discuss the former type of constraint computation; the latter is discussed in \[Di 92b\].</Paragraph> <Paragraph position="4"> the agent adopts. 
It is essential that this interaction is taken into account by such systems.</Paragraph> <Paragraph position="5"> Work done in the past on understanding instructions has generally concentrated on simple positive commands, and has failed to address some of the desiderata listed above: \[VB90\] limits the interaction between new and preexisting goals to inserting the new goals in the list of goals if their execution does not violate preexisting constraints; otherwise they are rejected. \[Cha91\] proposes a model of instruction interpretation which seems useful at the level of the basic skills an agent is endowed with, but in which there is no internal structure to actions, and no distinction between the agent's actions and goals. \[AZC91\] instead does assume a rich relation between instructions and pre-existing goal(s). However, instructions are not continually integrated into the plan the agent is developing; instead they are used as a resource when the stored knowledge about plans cannot be adapted to the situation at hand.</Paragraph> <Paragraph position="6"> Turning now to our proposal, our approach to these problems includes: 1. An action representation formalism based on Jackendoff's Conceptual Structures \[Jac90\].</Paragraph> <Paragraph position="7"> 2. An action KB that contains simple plans that represent common sense knowledge about actions.</Paragraph> <Paragraph position="8"> 3. A plan graph that represents the structure of the agent's intentions.</Paragraph> </Section> <Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> 3 Action representation </SectionTitle> <Paragraph position="0"> We have chosen to use Jackendoff's Conceptual Structures \[Jac90\] for two reasons. First, as our point of departure is NL, there are the obvious benefits of using a linguistically motivated representational theory, e.g. 
easing the burden upon the parser to produce such representations \[Whi92\].</Paragraph> <Paragraph position="1"> Second, there is significant mileage to be gained from using a decompositional theory of meaning, insofar as the primitives effectively capture important generalizations. In this section we introduce the notation and some minor modifications to the theory as presented in \[Jac90\]. We use Go into the other room as a representative example.</Paragraph> <Paragraph position="2"> In Jackendoff's theory, an entity may be of ontological type Thing, Place, Path, Event, State, Manner or Property.</Paragraph> <Paragraph position="3"> The conceptual structure for a room is shown in (3a) below: (3a) \[Thing ROOM\] (3b) \[Thing KITCHEN\] Square brackets indicate an entity of type Thing meeting the enclosed featural description. Small caps indicate atoms in conceptual structure, which serve as links to other systems of representation; for example, the conceptual structure for a kitchen (3b) differs from that of a ACTES DE COLING-92, NANTES, 23-28 AOÛT 1992 1148 PROC. OF COLING-92, NANTES, AUG. 23-28, 1992 to a body that generates a header. The annotations on the body specify the relations between the subactions; such relations include partial temporal ordering, enablement, and possibly others.</Paragraph> <Paragraph position="4"> From the planning tradition, we retain the notions of qualifiers and effects. Qualifiers are conditions that make an action relevant: for example, unplug x is relevant only if x is plugged.</Paragraph> <Paragraph position="5"> Notice the importance of using a representation such as Jackendoff's: it helps us capture the common characteristics of different actions, e.g. get and carry. 
The semantic representation for carry would also match the generic move-action template, and would add to it a qualification such as (10) \[Manner WITH(\[Thing HANDS\])\] Having such a representation is also useful for computing qualifiers and effects in a systematic way: they can be precompiled from the representation itself. For example, for every action including a component δ such as we know that after δ, j must be at l, therefore we can include this in the effects of the action. Given the further restriction that j cannot be in two places at once, we may infer that j cannot be at l now, and thus precompute the qualifier 8.</Paragraph> </Section> <Section position="5" start_page="0" end_page="0" type="metho"> <SectionTitle> 4 The plan graph </SectionTitle> <Paragraph position="0"> The plan graph represents the structure of the intentions that the agent adopts as a response to the instructions. It keeps track of the goals the agent is pursuing, of the hierarchical relations between the goals and the actions whose execution achieves such goals, and of various relations between the actions. It also helps interpret the instructions that follow. In (1), establishing the initial goal get the urn of coffee provides the context in which the two following instructions have to be interpreted--a similar strategy is adopted for example by \[Kau90\]. In Fig. 2, we show the complete structure built after interpreting (1).</Paragraph> <Paragraph position="1"> A node in a plan graph contains the Conceptual Structure representation of an action, augmented with the consequent state achieved after the execution of that action 9. The arcs represent relations between actions; among them, those relevant to our example are: temporal, such as precedes in Fig. 
2; enablement; generation, and its generalization substep, used when α belongs to a sequence of more than one action that generates β. 8 Jackendoff suggests something analogous with his inference rules, which have yet to be formalized. 9 In Fig. 2 the labels on the nodes are only mnemonics, and do not represent their real contents.</Paragraph> <Paragraph position="2"> A1: assum. IN(\[other-room\])</Paragraph> <Paragraph position="4"> scribing that action--A2 in Fig. 2; if it is derived while inferring a relation between two actions, it is associated with the corresponding arc--A1.</Paragraph> <Paragraph position="5"> The plan graph is built by an interpretation algorithm that takes as its input the logical form constructed by the parser. The algorithm works by keeping track of the active nodes, which include the goal currently in focus, and the nodes just added to the tree. The topmost level of the algorithm invokes different procedures, according to the particular syntactic construction at hand--e.g. the construction Do α to do β will trigger the hypothesis that α either generates or enables β \[Di 92b\]. These procedures retrieve the plan(s) associated with the goal currently in focus, and then expand such plans in a hierarchical fashion. These procedures embody various inference processes, which can be characterized either as planning--e.g., plan expansion, subgoaling--or as plan inference--e.g., inferring assumptions, inferring the more abstract goal some actions are supposed to achieve. 
Space doesn't allow us to go into further details about the algorithm or the inference processes; rather, in the next section we will give an example of how assumptions are computed.</Paragraph> </Section> <Section position="6" start_page="0" end_page="0" type="metho"> <SectionTitle> 5 Making an Assumption </SectionTitle> <Paragraph position="0"> We will now show how the assumption that the urn is to be found in the other room is made while processing (1a), Go into the other room to get the urn of coffee.</Paragraph> <Paragraph position="1"> The process begins with the following representation constructed by the parser, where the FOR-function (derived from the to-phrase) encodes the contributes relation holding between the go-action α and the get-action β: room only in its choice of constant, leaving the determination of their similarities and differences to a system of representation better suited to the task 3.</Paragraph> <Paragraph position="2"> To distinguish instances of a type, we follow \[ZV91\] in requiring every conceptual structure to have an index:</Paragraph> </Section> <Section position="7" start_page="0" end_page="0" type="metho"> <SectionTitle> (4) \[Thing ROOM\]I </SectionTitle> <Paragraph position="0"> Conceptual structures may also contain complex features generated by conceptual functions over other conceptual structures. For example, the conceptual function IN: Thing → Place may be used to represent the location in the room as shown in (5a) below. 
Likewise, the function TO: Place → Path describes a path that ends in the specified place, as shown in (5b) -- (5c) is an equivalent representation of (5b), where the index 1 stands for the entire constituent 4: As there is no subject in our clause, the constituent i (pragmatically, the AGENT) in (6) is left unspecified.</Paragraph> <Paragraph position="1"> To distinguish Walk into the other room from (6), we include an indication of manner 6: (7) \[GO(i, m) \[Manner WALKING\]\] Finally, semantic fields, such as Spatial and Possessional, are intended to capture the similarities between sentences like Jack went into the other room and The gift went to Bill, as shown in (8) below: (8a) \[GOsp(\[JACK\], \[TO(\[IN(\[OTHER-ROOM\])\])\])\] (8b) \[GOposs(\[GIFT\], \[TO(\[AT(\[BILL\])\])\])\] The idea is that verbs like go leave the semantic field underspecified, whereas verbs like donate specify a particular field. In addition to these semantic fields, we propose to add a new one called Control. It is intended to represent the functional notion of having control over some object. For example, in sports, the meanings of having the ball, keeping the ball, and getting the ball embody this notion, and are clearly quite distinct from their Spatial and Possessional counterparts; (9) represents Jack got the ball: and ontological types, in order to lighten the typographical burden of representing large conceptual structures. 5 Ignoring, of course, the meaning of other for now.</Paragraph> <Paragraph position="2"> 6 Though this is clearly intended, Jackendoff never explicitly represents such a distinction.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.1 The action KB </SectionTitle> <Paragraph position="0"> The action KB contains simple plans that represent common sense knowledge about actions, and whose components are expressed in terms of Jackendoff's semantic primitives. 
To discuss the characteristics of these plans, we will refer to the move-action KB entry shown in Fig. 1, which might be described as follows: go to where j is, get control over it, then take it to l 7.</Paragraph> <Paragraph position="1"> Actions have a header and a body. This terminology is reminiscent of planning operators; however, we express the relations between these components in terms of enablement and generation--e.g., the body generates its header.</Paragraph> <Paragraph position="2"> The representation does not employ preconditions, because it is very difficult to draw the line between what is a precondition and what is part of the body of an action.</Paragraph> <Paragraph position="3"> One could say that having control over the object to be moved is a precondition for a move-action. However, if the object is heavy, the agent will start exerting force to lift it, and then carry it to the other location. It is not obvious whether the lifting action is still part of achieving the precondition, or already part of the body. Therefore, we don't have preconditions, but only actions which are sub-steps in executing another action, that is, they may belong 7 This do-it-yourself method is but one way to move something from where it is to somewhere else. Other methods would be listed separately in the action KB.</Paragraph> <Paragraph position="4"> Given the presence of the to phrase, we know that α may be part of a sequence of actions that generate β. To pursue this hypothesis, we begin by looking up β in the action KB. β matches the general move-action shown in Fig. 1 if the object to be moved j is bound to the urn of coffee: j \[URN-OF-COFFEE\] Next we try to match α with some subaction γ of β. α matches the first action γ1 in β if we take \[AT(j)\] and \[IN(\[OTHER-ROOM\])\] to be the same place. 
This is tantamount to making the following assumption: (11) \[BEsp(j, \[IN(\[OTHER-ROOM\])\])\] Once the instruction is understood in this way, the two actions may be incorporated into the plan graph as shown in Fig. 2.</Paragraph> <Paragraph position="5"> One should mention that assumption (11) could of course be wrong, say if there were a note in the next room saying ha ha, it's not really in this room but the next.</Paragraph> <Paragraph position="6"> Notice that even if there is already an urn of coffee in the current room, the instruction Go into the other room to get the urn of coffee is still understood to refer to an urn in the other room. This contrasts sharply with Go into the other room to wash out the urn of coffee, where the most likely urn is the currently visible one. In the current framework, this difference would be captured in the following way.</Paragraph> <Paragraph position="7"> Unlike in the case of the get-action, the go-action matches the following subaction of wash-out: \[GOsp(i, \[TO(\[AT(\[WASHING-MATERIALS\])\])\])\]. Therefore, assumption (11) will not be derived, permitting the possibility of the urn being in the current room.</Paragraph> </Section> </Section> </Paper>