File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/02/c02-1113_metho.xml
Size: 20,227 bytes
Last Modified: 2025-10-06 14:07:50
<?xml version="1.0" standalone="yes"?> <Paper uid="C02-1113"> <Title>Natural Language and Inference in a Computer Game</Title> <Section position="4" start_page="0" end_page="1" type="metho"> <SectionTitle> 3 The World Model </SectionTitle> <Paragraph position="0"> Now we will look at the way that the state of the world is represented in the game, which will be important in the language processing modules described in Sections 4 and 5. We will first give a short overview of description logic (DL) and the theorem prover we use and then discuss some aspects of the world model in more detail.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.1 Description Logic </SectionTitle> <Paragraph position="0"> Description logic (DL) is a family of logics in the tradition of knowledge representation formalisms such as KL-ONE (Woods and Schmolze, 1992). DL is a fragment of first-order logic which only allows unary and binary predicates (concepts and roles) and only very restricted quantification. A knowledge base consists of a T-Box, which contains axioms relating the concepts and roles, and one or more A-Boxes, which state that individuals belong to certain concepts, or are related by certain roles.</Paragraph> <Paragraph position="1"> Theorem provers for description logics support a range of different reasoning tasks. Among the most common are consistency checking, subsumption checking, and instance and relation checking. Consistency checks decide whether a combination of T-Box and A-Box can be satisfied by some model; subsumption checks decide, for two concepts, whether all individuals that belong to one must necessarily belong to the other; and instance and relation checks test whether an individual belongs to a certain concept and whether a certain relation holds between a pair of individuals, respectively. 
In addition to these basic reasoning tasks, description logic systems usually also provide some retrieval functionality, which makes it possible, for example, to compute all concepts that a given individual belongs to or all individuals that belong to a given concept.</Paragraph> <Paragraph position="2"> There is a wide range of different description logics today which add different extensions to a common core. Of course, the more expressive these extensions become, the more complex the reasoning problems are. &quot;Traditional&quot; DL systems have concentrated on very weak logics with simple reasoning tasks. In the last few years, however, new systems such as FaCT (Horrocks et al., 1999) and RACER (Haarslev and Möller, 2001) have shown that it is possible to achieve surprisingly good average-case performance for very expressive (but still decidable) logics. In this paper, we employ the RACER system, mainly because it allows for A-Box inferences.</Paragraph> </Section> <Section position="2" start_page="0" end_page="1" type="sub_section"> <SectionTitle> 3.2 The World Model </SectionTitle> <Paragraph position="0"> The T-Box we use in the game specifies the concepts and roles in the world and defines some useful complex concepts, e.g. the concept of all objects the player can see. This T-Box is shared by two different A-Boxes representing the state of the world and what the player knows about it respectively.</Paragraph> <Paragraph position="1"> The player A-Box will typically be a sub-part of the game A-Box because the player will not have explored the world completely and will therefore not have encountered all individuals or know about all of their properties. Sometimes, however, it may also be useful to deliberately hide effects of an action from the user, e.g. if pushing a button has an effect in a room that the player cannot see. 
In this case, the player A-Box can contain information that is inconsistent with the world A-Box.</Paragraph> <Paragraph position="2"> A fragment of the A-Box describing the state of the world is shown in Fig. 3; Fig. 4 gives a graphical representation. The T-Box specifies that the world is partitioned into three parts: rooms, objects, and players. The individual 'myself' is the only instance that we ever define of the concept 'player'. Individuals are connected to their locations (i.e. rooms, container objects, or players) via the 'has-location' role; the A-Box also specifies what kind of object an individual is (e.g. 'apple') and what properties it has ('red'). The T-Box then contains axioms such as 'apple ⊑ object', 'red ⊑ colour', etc., which establish a taxonomy among concepts.</Paragraph> <Paragraph position="3"> These definitions allow us to add axioms to the T-Box which define more complex concepts. One is the concept 'here', which contains the room in which the player currently is, i.e. every individual which can be reached over a 'has-location' role from a player object.</Paragraph> <Paragraph position="4"> complex concept from a concept and a role: ∃R.C is the concept containing all individuals which are linked via an R role to some individual in C. In the example in Fig. 3, 'here' denotes the singleton set {kitchen}.</Paragraph> <Paragraph position="5"> Another useful concept is 'accessible', which contains all individuals which the player can manipulate.</Paragraph> <Paragraph position="7"> All objects in the same room as the player are accessible; if such an object is an open container, its contents are also accessible. The T-Box contains axioms that express that some concepts (e.g.</Paragraph> <Paragraph position="8"> 'table', 'bowl', and 'player') contain only 'open'</Paragraph> <Paragraph position="10"> objects. This permits access to the player's inventory. 
In the simple scenario above, 'accessible' denotes the set {myself, t1, a1, a2, b1, b2}. Finally, we can define the concept 'visible' in a similar way as 'accessible'. The definition is a bit more complex, including more individuals, and is intended to denote all individuals that the player can &quot;see&quot; from his position in the game world. (Remember that &quot;seeing&quot; in our application does not involve any graphical representations. The player acquires knowledge about the world only through the textual output generated by the game engine. This allows us to simplify the DL modeling of the world because we don't have to specify all (e.g. spatial) relations that would implicitly be present in a picture.)</Paragraph> </Section> </Section> <Section position="5" start_page="1" end_page="1" type="metho"> <SectionTitle> 4 Referring Expressions </SectionTitle> <Paragraph position="0"> The interaction between the game and the player revolves around performing actions on objects in the game world and the effects that these actions have on the objects. This means that the resolution and generation of referring expressions, which identify those objects to the user, are central tasks in our application. Our implementation illustrates how useful an inference system such as RACER is for accessing the world model, once such an infrastructure is available. The inference engine is complemented by a simple discourse model, which keeps track of available referents.</Paragraph> <Section position="1" start_page="1" end_page="1" type="sub_section"> <SectionTitle> 4.1 The Discourse Model </SectionTitle> <Paragraph position="0"> Our discourse model (DM) is based on Strube's (1998) salience list approach, due to its simplicity. The DM is a data structure that stores an ordered list of the most salient discourse entities according to their &quot;information status&quot; and text position and provides methods for retrieving and inserting elements. Following Strube, hearer-old discourse entities (which include definites) are ranked</Paragraph>
<Paragraph position="1"> higher in the DM (i.e. are more available for reference) than hearer-new discourse entities (including indefinites). Within these categories, elements are sorted according to their position in the currently processed sentence. For example, the ranking of discourse entities for the sentence take a banana, the red apple, and the green apple would look as follows: [red apple ≺ green apple]_old ≺ [banana]_new. The DM is built incrementally and updated after each input sentence. Updating removes all discourse entities from the DM which are not realized in the current utterance. That is, there is an assumption that referents mentioned in the previous utterance are much more salient than older ones.</Paragraph> </Section> <Section position="2" start_page="1" end_page="1" type="sub_section"> <SectionTitle> 4.2 Resolving Referring Expressions </SectionTitle> <Paragraph position="0"> The task of the resolution module is to map definite and indefinite noun phrases and pronouns to individuals in the world. This task is simplified in the adventure setting by the fact that the communication is situated in a sense: Players will typically only refer to objects which they can &quot;see&quot; in the virtual environment, as modeled by the concept 'visible' above. Furthermore, they should not refer to objects they haven't seen yet. 
Hence, we perform all RACER queries in this section on the player knowledge A-Box, avoiding unintended ambiguities when the player's expression would e.g.</Paragraph> <Paragraph position="1"> not refer uniquely with respect to the true state of the world.</Paragraph> <Paragraph position="2"> Resolving a definite description means finding a unique entity which, according to the player's knowledge, is visible and matches the description.</Paragraph> <Paragraph position="3"> To compute such an entity, we construct a DL concept expression corresponding to the description and then send a query to RACER asking for all instances of this concept. In the case of the apple, for instance, we would retrieve all instances of the concept apple ⊓ visible from the player A-Box. The query concept for the apple with the worm would be apple ⊓ (∃has-detail.worm) ⊓ visible. If this yields only one entity ({a2} for the apple with the worm for the A-Box in Fig. 3), the reference has been unambiguous and we are done. It may, however, also be the case that more than one entity is returned; e.g. the query for the apple would return the set {a1,a2}. We will show in the next section how we deal with this kind of ambiguity. We reject input sentences with an error message indicating a failed reference if we cannot resolve an expression at all, i.e. when no object in the player knowledge matches the description.</Paragraph> <Paragraph position="4"> We resolve indefinite NPs, such as an apple, by querying the player knowledge in the same way as described above for definites. Unlike in the definite case, however, we do not require unique reference.</Paragraph> <Paragraph position="5"> Instead, we assume that the player did not have a particular object in mind and arbitrarily choose one of the possible referents. 
The reply of the game will automatically inform the player which one was chosen, as a unique definite reference will be generated (see below).</Paragraph> <Paragraph position="6"> Pronouns are simply resolved to the most salient entity in the DM that matches their agreement constraints. The restrictions our grammar imposes on the player input (no embeddings, no reflexive pronouns) allow us to analyze sentences including intra-sentential anaphora like take the apple and eat it. The incremental construction of the DM ensures that by the time we encounter the pronoun it, the apple has already been processed and can serve as a possible antecedent.</Paragraph> </Section> <Section position="3" start_page="1" end_page="1" type="sub_section"> <SectionTitle> 4.3 Generating Referring Expressions </SectionTitle> <Paragraph position="0"> The converse task occurs when we generate the feedback to show to the player: we need to construct descriptions of individuals in the game world that enable the player to identify them.</Paragraph> <Paragraph position="1"> This task is quite simple for objects which are new to the player. In this case, we generate an indefinite NP containing the type and (if it has one) color of the object, as in the bowl contains a red apple.</Paragraph> <Paragraph position="2"> We use RACER's retrieval functionality to extract this information from the knowledge base.</Paragraph> <Paragraph position="3"> To refer to an object that the player has already encountered, we try to construct a definite description that, given the player knowledge, uniquely identifies this object. For this purpose we use a variant of Dale and Reiter's (1995) incremental algorithm, extended to deal with relations between objects (Dale and Haddock, 1991). The properties of the target referent are considered in some predefined order (e.g. first its type, then its color, its location, parts it may have, ...). 
A property is added to the description if it excludes at least one other object (a distractor) that doesn't share this property. This is done until the description uniquely identifies the target referent.</Paragraph> <Paragraph position="4"> The algorithm uses RACER's reasoning and retrieval functionality to access the relevant information about the context, which includes, e.g., computing the properties of the target referent and finding the distracting instances. Assuming, for example, that we want to refer to entity a1 in the A-Box in Fig. 3, we first have to retrieve all concepts and roles of a1 from the player A-Box. This gives us {apple(a1), red(a1), has-location(a1,b1)}. As we have to have at least one property specifying the type of a1, we use RACER's subsumption checks to extract all those properties that match this requirement; in this case, 'apple'. Then we retrieve all instances of the concept 'apple' to determine the set of distractors, which is {a1, a2}. Hence, 'apple' alone is not enough to uniquely identify a1. So, we consider the apple's color. Again using subsumption checks, we filter the colors from the properties of a1 (i.e. 'red') and then retrieve all instances belonging to the concept apple ⊓ red to check whether and how the set of distractors gets reduced by adding this property. This concept has only one member in the example, so we generate the expression the red apple.</Paragraph> </Section> </Section> <Section position="6" start_page="1" end_page="1" type="metho"> <SectionTitle> 5 Ambiguity Resolution </SectionTitle> <Paragraph position="0"> The other aspect of the game engine which we want to highlight here is how we deal with referential and syntactic ambiguity. 
We handle the former by a combination of inference and discourse information, and the latter by taking psycholinguistically motivated preferences into account.</Paragraph> <Section position="1" start_page="1" end_page="1" type="sub_section"> <SectionTitle> 5.1 Resolving Referential Ambiguities </SectionTitle> <Paragraph position="0"> When the techniques for reference resolution described in the previous section are not able to map a definite description to a single entity in the player knowledge, the resolution module returns a set of possible referents. We then try to narrow this set down in two steps.</Paragraph> <Paragraph position="1"> First, we filter out individuals which are completely unsalient according to the discourse model. In our (simplified) model, these are all individuals that haven't been mentioned in the previous sentence. This heuristic permits the game to deal with the following dialogue, as the red but not the green apple is still accessible in the final turn, and is therefore chosen as the patient of the 'eat' action.</Paragraph> <Paragraph position="2"> Game: . . . red apple . . . green apple.</Paragraph> <Paragraph position="3"> Player: Take the red apple.</Paragraph> <Paragraph position="4"> Game: You have the red apple.</Paragraph> <Paragraph position="5"> Player: Eat the apple.</Paragraph> <Paragraph position="6"> Game: You eat the red apple.</Paragraph> <Paragraph position="7"> If this narrows down the possible referents to just one, we are done. Otherwise (i.e. if several or none of the referents were mentioned in the previous sentence), we check whether the player's knowledge rules out some of them. The rationale is that an intelligent player would not try to perform an action on an object on which she knows it cannot be performed. Assume, by way of example, that the player knows about the worm in the green apple. This violates a precondition of the 'eat' action for apples. 
Thus if both apples were equally salient, we would read eat the apple as eat the red apple. We can test if a combination of referents for the various referring expressions of a sentence violates preconditions by first instantiating the appropriate action with these referents. Then we independently add each instantiated precondition to fresh copies of the player knowledge A-Box and test them for consistency. If one of the A-Boxes becomes inconsistent, we conclude that the player knows this precondition would fail, and hence that this is not the intended combination of referents.</Paragraph> <Paragraph position="8"> If neither of these heuristics manages to pick out a unique entity, we consider the definite description to be truly ambiguous and return an error message to the user, indicating the ambiguity.</Paragraph> </Section> <Section position="2" start_page="1" end_page="1" type="sub_section"> <SectionTitle> 5.2 Resolving Syntactic Ambiguities </SectionTitle> <Paragraph position="0"> Another class of ambiguities which we consider are syntactic ambiguities, especially of PP attachment.</Paragraph> <Paragraph position="1"> We try to resolve them, too, by taking referential information into account.</Paragraph> <Paragraph position="2"> In the simplest case, the referring expressions in some of the syntactic readings have no possible referent in the player A-Box at all. If this happens, we filter these readings out and only continue with the others (Schuler, 2001). For example, the sentence unlock the toolbox with the key is ambiguous. In a scenario where there is a toolbox and a key, but the key is not attached to the toolbox, resolution fails for one of the analyses and thereby resolves the syntactic ambiguity.</Paragraph> <Paragraph position="3"> If more than one syntactic reading survives this first test, we perform the same computations as above to filter out possible referents which are either unsalient or violate the player's knowledge. 
Sometimes, only one syntactic reading will have a referent in this narrower sense; in this case, we are done. Otherwise, i.e. if more than one syntactic reading has referents, we remove those readings which are referentially ambiguous. Consider once more the example scenario depicted in Fig. 4. The sentence put the apple in the bowl on the table has two different syntactic analyses: In the first, the bowl on the table is the target of the put action whereas in the second, in the bowl modifies the apple. Now, note that in the first reading, we will get two possible referents for the apple, whereas in the second reading the apple in the bowl is unique. In cases like this we pick out the reading which only includes unique references (reading 2 in the present example). This approach assumes that the players are cooperative and try to refer unambiguously. It is furthermore similar to what people seem to do. Psycholinguistic eye-tracking studies (Chambers et al., 2000) indicate that people prefer interpretations with unambiguous references: subjects who are faced with scenarios similar to Fig. 4 and hear the sentence put the apple in the bowl on the table do not look at the bowl on the table at all but only at the apple in the bowl (which is unique) and the table.</Paragraph> <Paragraph position="4"> At this point, there can still be more than one syntactic reading left; if so, all of these will have unambiguous, unique referents. In such a case we cannot decide which syntactic reading the player meant, and ask the player to give the game a less ambiguous command.</Paragraph> </Section> </Section> </Paper>
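<!--
The reading-selection steps of Section 5.2 can be sketched as follows. This is a minimal illustration in Python, not the game's actual implementation: the names Reading, has_referents, is_unique and select_reading are hypothetical, and the candidate sets are assumed to have been computed beforehand (by RACER queries on the player A-Box, followed by the salience and precondition filters of Section 5.1). Returning None when zero or several readings remain is likewise an assumption; in that case the caller would ask the player for a less ambiguous command.

```python
from typing import Dict, FrozenSet, List, Optional

# Hypothetical representation: a syntactic reading maps each of its
# referring expressions to the set of candidate referents found for it.
Reading = Dict[str, FrozenSet[str]]


def has_referents(reading: Reading) -> bool:
    """True if every referring expression has at least one candidate."""
    return all(len(cands) != 0 for cands in reading.values())


def is_unique(reading: Reading) -> bool:
    """True if every referring expression has exactly one candidate."""
    return all(len(cands) == 1 for cands in reading.values())


def select_reading(readings: List[Reading]) -> Optional[Reading]:
    """Return the single preferred reading, or None if none or several remain."""
    # Step 1: drop readings in which some expression cannot be resolved.
    survivors = [r for r in readings if has_referents(r)]
    if len(survivors) == 1:
        return survivors[0]
    # Step 2: among the rest, keep only readings whose references are unique.
    unique = [r for r in survivors if is_unique(r)]
    if len(unique) == 1:
        return unique[0]
    # Assumption: zero or several readings left means the caller asks the player.
    return None


# Illustrative candidate sets for "put the apple in the bowl on the table";
# the individual names are assumptions loosely based on the running example.
reading_1 = {"the apple": frozenset({"a1", "a2"}),
             "the bowl on the table": frozenset({"b1"})}
reading_2 = {"the apple in the bowl": frozenset({"a1"}),
             "the table": frozenset({"t1"})}
chosen = select_reading([reading_1, reading_2])
```

With these illustrative sets, select_reading picks the second reading, because only there every referring expression has exactly one candidate referent.
-->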