<?xml version="1.0" standalone="yes"?> <Paper uid="W05-1627"> <Title>Spatial descriptions as referring expressions in the MapTask domain</Title> <Section position="3" start_page="0" end_page="1" type="metho"> <SectionTitle> 2 Spatial descriptions as referring expressions </SectionTitle> <Paragraph position="0"> In the Map Task dialogues, a subject gives route directions to another subject, involving the production of descriptions such as 'at the left-hand side of the banana tree' and 'about three quarters up on the page, to the extreme left'. 16 maps and 32 subjects (8 groups of 4 speakers) were used to produce 128 dialogues. The subjects were not able to see each other's maps and thus had to resort to verbal descriptions of the map contents. There are some (intentional) mismatches between the subjects' maps, such as additional landmarks, changed names, etc.</Paragraph> <Paragraph position="1"> This makes the corpus a good source of spatial descriptions, for example: 'have you got gorillas? ... well, they're in the bottom left-hand corner.' Figure 1 shows one of the 'giver' maps, which, in contrast to the corresponding 'follower' map, shows the route the follower is supposed to take. A main characteristic of both giver and follower maps is the display of named landmarks. These names typically refer to the type of the landmark. With a few exceptions, for example the 'great viewpoint' in figure 1, most of the landmark names occur only once. This seems to make the MapTask dialogues difficult to use from the perspective of GRE: the names/types rule out most distractors, leaving little for a GRE algorithm to do. However, as can be seen in figure 1, the routes do not lead through the landmarks but rather around them, along featureless points. 
The subjects of the MapTask experiments therefore often refer to points on the route, for example to those where the next turn has to be taken.</Paragraph> <Paragraph position="2"> These observations can be used to frame the generation of spatial descriptions as a GRE task in which target points on the map are distinguished from all other points (the distractors). Since most points are featureless, two of the properties commonly used in GRE, types and attributes, cannot be used in many cases. This leaves the third property, relations, which a GRE algorithm can use to relate the target position to surrounding landmarks on the maps. This is also what the subjects in the MapTask experiments do.</Paragraph> <Paragraph position="3"> Our current, ongoing work addresses the generation of descriptions referring to individual points on the maps.</Paragraph> <Paragraph position="4"> Ultimately, we hope to move on to the generation of descriptions of (straight) paths encompassing start and end points, and a description of how to travel between them.</Paragraph> <Paragraph position="5"> Looking at the MapTask corpus from the perspective of GRE, we make the following observations: * The MapTask corpus consists of transcriptions of spoken language and contains many disfluencies. A spatial description can even span more than one turn, for example: 'okay ... fine move ... upwards ... an- ... so that you're to the left of the broken gate .' TURN 'and just level to the gatepost .' We expect more polished, written language as output of our generator.</Paragraph> <Paragraph position="6"> * GRE only deals with a subset of the corpus. 
We need to find ways of making use of the appropriate parts of the data while ignoring the rest.</Paragraph> <Paragraph position="7"> * Spatial descriptions containing some form of vagueness seem to be frequent: 'quite close to the mountain on its right-hand side', 'just before the middle of the page', 'a bit up from the fallen pillars on the left', 'about two inches southwest'. There even seem to be rare cases of vague types: 'a sort of curve'.</Paragraph> <Paragraph position="8"> * Discourse context is of crucial importance, i.e. many spatial descriptions do not mention a particular point for the first time. Furthermore, already established information is not always given explicitly, for example 'two inches to the left [from where I am]'. Speakers also switch from an inside-the-map perspective ('beneath the tree') to using the physical maps as a perspective ('a couple of centimeters ... from the bottom of the page'). The latter may be caused by the lack of a grid indicating the distances in miles or kilometers.</Paragraph> <Paragraph position="9"> In sum, the MapTask corpus contains a wealth of data combined with a domain model (the maps). The challenge is to make best use of these resources. In the following sections, we report on work-in-progress on hybrid rule-based/instance-based generation of spatial referring expressions.</Paragraph> </Section> <Section position="4" start_page="1" end_page="1" type="metho"> <SectionTitle> 3 Overgenerating spatial descriptions </SectionTitle> <Paragraph position="0"> Following the inferential approach to GRE [Varges, 2004; Varges and van Deemter, 2005], we are implementing a system that finds all combinations of properties that are true of a given target referent and distinguish it from its distractors. The approach pairs the logical forms derived from a domain representation with the corresponding 'extensions', the sets of objects that the logical forms denote. 
We represent spatial domains as grids and domain objects (landmarks) as sets of coordinates on those grids. For example, the telephone kiosk in figure 1 is represented as the set {(2, 18), (2, 19)}. The grid resolution has implications for the definition of targets, which are, in fact, target areas, i.e. they often consist of more than one coordinate.</Paragraph> <Paragraph position="1"> We implemented a number of content determination rules that recursively build up descriptions, starting from (NPs realizing) the landmarks of the domain model: 1. spatial functions: Prep NP -> PP: 'above the west lake'. The spatial functions used so far are: 'above', 'below', 'to the left of', 'to the right of'.</Paragraph> <Paragraph position="2"> 2. intersection: PP and PP -> PP: 'above the west lake and to the left of the great viewpoint'.</Paragraph> <Paragraph position="3"> 3. union: NP or NP -> NP: 'the stile or the ruined monastery'.</Paragraph> <Paragraph position="4"> The following examples do not always refer to the map shown in figure 1.</Paragraph> <Paragraph position="6"> The descriptions generated by these rules are all associated with an extension. For example, 'above X' denotes all the points above X (this may be changed to those points that are also 'close' to X). The grid in figure 2 is a graphical depiction of an extension associated with the disjoined NPs shown on the right. All generated descriptions can be visualized in this way.</Paragraph> <Paragraph position="7"> The three content determination rules listed above are not sufficient for singling out all areas of the maps. For example, the corners of the maps are typically not 'reachable'. Therefore, we define a further rule: 4. recursive spatial functions: Prep PP -> PP: 'above the points to the left of the farmer's gate'.</Paragraph> <Paragraph position="8"> We intend to generate a wide variety of realization candidates. 
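A minimal sketch of such a grid model and of rules 1-3 might look as follows. This is our illustration, not the authors' implementation: the grid size and the position of the farmer's gate are invented; only the telephone kiosk coordinates are taken from the text.

```python
# Sketch of the grid domain model and content-determination rules.
# Grid size and the farmer's gate position are hypothetical.

GRID_W, GRID_H = 20, 20
ALL_POINTS = {(x, y) for x in range(GRID_W) for y in range(GRID_H)}

LANDMARKS = {
    "the telephone kiosk": {(2, 18), (2, 19)},  # from the text
    "the farmer's gate": {(6, 10)},             # hypothetical
}

# Direction vectors for the four spatial functions used so far.
OFFSETS = {
    "above": (0, 1), "below": (0, -1),
    "to the left of": (-1, 0), "to the right of": (1, 0),
}

def apply_prep(prep, arg_descr, arg_ext):
    """Rule 1 (and, applied to a PP, rule 4): Prep NP -> PP.
    Each description is paired with its extension, the set of grid
    points it denotes, e.g. 'above X' denotes the points above X."""
    dx, dy = OFFSETS[prep]
    ext = set()
    for (lx, ly) in arg_ext:
        x, y = lx + dx, ly + dy
        while (x, y) in ALL_POINTS:  # walk outward until the grid edge
            ext.add((x, y))
            x, y = x + dx, y + dy
    return (f"{prep} {arg_descr}", ext)

def intersect(pp1, pp2):
    """Rule 2: PP and PP -> PP (intersection of extensions)."""
    return (f"{pp1[0]} and {pp2[0]}", pp1[1] & pp2[1])

def union(np1, np2):
    """Rule 3: NP or NP -> NP (union of extensions)."""
    return (f"{np1[0]} or {np2[0]}", np1[1] | np2[1])

# 'to the left of the telephone kiosk' denotes the points directly to
# the left of the kiosk's two coordinates.
descr, ext = apply_prep("to the left of", "the telephone kiosk",
                        LANDMARKS["the telephone kiosk"])
print(descr, len(ext))
```

Rule 4 then falls out of the same function: feeding a PP's extension back into `apply_prep` yields descriptions such as 'above the points to the left of ...'.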
For example, rule 4 could also produce the more fluent (and realistic) 'above the farmer's gate slightly to the left', which will require us to model vagueness more precisely. An alternative is to use non-recursive PPs like 'southeast of X'.</Paragraph> </Section> <Section position="5" start_page="1" end_page="1" type="metho"> <SectionTitle> 4 Ranking spatial descriptions </SectionTitle> <Paragraph position="0"> As candidates for ranking we use those NPs and PPs that contain at least one target coordinate, a loose definition that increases the number of candidates. We always generate the NP 'the points' as a candidate; it refers to all coordinates but should be ranked low because it does not rule out any distractors. We currently use the ratio of extension size to described target coordinates as our first (non-empirical) ranking criterion ('e/t' below). The second (non-empirical) criterion is brevity, measured by the number of characters ('chars'). Here are some ranked output candidates for the target area labeled '#' in figure 2 (which contains two points):

e/t | chars | realization                                                                              | extension size
----|-------|------------------------------------------------------------------------------------------|---------------
2   | 34    | to the left of the telephone kiosk                                                       | 4
2   | 71    | to the left of the farmer's gate and to the left of the telephone kiosk                  | 2
2   | 88    | above the points to the left of the farmer's gate and to the left of the telephone kiosk | 2
10  | 32    | to the left of the farmer's gate                                                         | 10
225 | 10    | the points                                                                               | 450

The first candidate is preferred because it has the same e/t ratio as some of its competitors but, in addition, is shorter than these. In fact, this is how one of the subjects refers to the starting point in one of the dialogues. The third candidate requires the use of appropriate bracketing to yield the desired reading. 
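The two ranking criteria can be sketched as a sort key over (realization, extension) pairs. Again this is our illustration, not the authors' code: the candidate extensions below are made up for a small grid, and only the realizations come from the table.

```python
# Sketch of Section 4's ranking: sort candidates by the ratio of
# extension size to described target coordinates ('e/t'), breaking
# ties by brevity in characters. Extensions here are invented.

def rank(candidates, target):
    """candidates: list of (realization, extension) pairs."""
    def key(cand):
        realization, ext = cand
        covered = len(ext & target)       # target coordinates described
        return (len(ext) / covered,       # e/t: lower is better
                len(realization))         # tie-break: fewer characters
    # keep only candidates containing at least one target coordinate
    return sorted((c for c in candidates if c[1] & target), key=key)

target = {(1, 18), (1, 19)}  # a hypothetical two-point target area '#'

candidates = [
    ("to the left of the telephone kiosk",
     {(0, 18), (1, 18), (0, 19), (1, 19)}),     # covers both: e/t = 4/2
    ("to the left of the farmer's gate",
     {(x, 10) for x in range(9)} | {(1, 18)}),  # covers one: e/t = 10/1
    ("the points",                              # large catch-all extension
     {(x, y) for x in range(10) for y in range(15, 19)} | target),
]

for realization, ext in rank(candidates, target):
    print(realization)
```

With these invented extensions the order comes out as in the table: the kiosk description first, the farmer's gate description next, and 'the points' last.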
For example, the generator could introduce a comma after 'gate'.</Paragraph> </Section> <Section position="6" start_page="1" end_page="1" type="metho"> <SectionTitle> 5 Toward using empirical data for ranking </SectionTitle> <Paragraph position="0"> The generation rules sketched above produce non-redundant spatial descriptions, i.e. the generator is 'economical' [Stone, 2003] and follows the 'local brevity' interpretation of the Gricean Maxims [Dale and Reiter, 1995]. The candidate of least 'complexity' is the 'full brevity' solution. A word similarity-based ranker could align the generation output (i.e. the highest-ranked candidate) with previous utterances in the discourse context. To increase choice, we intend to also generate additional candidates that include a limited amount of redundant information. One could furthermore generate candidates that, by themselves, do not rule out all distractors. In contrast to the inclusion of redundant information, these candidates would only be safe to use in combination with, for example, a reliable model of discourse salience that reduces the set of possible distractors. It is possible (but not without difficulty) to annotate parts of the corpus with map coordinates. For example, we can annotate the turn 'on the right side of the tree' with coordinates (15,9), (15,10) in figure 2. Further markup could be applied to 'redundant' information (in the GRE sense) or highlight available discourse context.</Paragraph> <Paragraph position="1"> However, for obvious reasons it is preferable to use corpus data without any additional annotation for ranking. The maps enable us to determine how much we gain from the availability of a domain model.</Paragraph> </Section> </Paper>