<?xml version="1.0" standalone="yes"?>
<Paper uid="J03-1003">
  <Title>Graph-Based Generation of Referring Expressions</Title>
  <Section position="3" start_page="54" end_page="61" type="metho">
    <SectionTitle>
2. Generating Referring Expressions
</SectionTitle>
    <Paragraph position="0"> There are many different algorithms for the generation of referring expressions, each with its own objectives: Some aim at producing the shortest possible description (e.g., Dale's [1992] full brevity algorithm), others focus on psychological realism (e.g., Dale and Reiter's [1995] incremental algorithm) or realistic output (e.g., Horacek 1997). The degree of detail in which the various algorithms are described differs considerably.</Paragraph>
    <Paragraph position="1"> Some algorithms are fully formalized and come with explicit characterizations of their complexity (e.g., Dale and Reiter 1995; van Deemter 2000); others are more conceptual and concentrate on exploring new directions (e.g., Stone and Webber 1998). Despite such differences, most algorithms deal with the same problem definition. They take as input a single object v (the target object) for which a referring expression is to be generated and a set of objects (the distractors) from which the target object needs to be distinguished (we use the terminology from Dale and Reiter [1995]). The task of the algorithm is to determine which set of properties is needed to single out the target object v from the distractors. This is known as the content determination problem for referring expressions. On the basis of this set of properties a distinguishing description for v can be generated. Most algorithms do not address the surface realization problem (how the selected properties should be realized in natural language) in much detail; it is usually assumed that once the content for a referring expression has been determined, a standard realizer such as KPML (Bateman 1997) or SURGE (Elhaded and Robin 1997) can convert the meaning representation to natural language.</Paragraph>
    <Paragraph position="2"> Consider the example scene in Figure 1. In this scene, as in any other scene, we see a finite domain of entities D with properties P. In this particular scene, D =  } is the set of entities and P = { dog, cat, brown, black+white, large, small } is the set of properties. A scene is usually represented as a database (or knowledge base) listing the properties of each element in D. Thus:  Figure 1 A simple example scene consisting of some domestic animals.</Paragraph>
    <Paragraph position="3"> In what is probably the key reference on the topic, Dale and Reiter (1995) describe and discuss a number of algorithms for the generation of referring expressions. One of these is the full brevity algorithm (originally due to Dale 1992). This algorithm first tries to generate a distinguishing description for the target object v using one single property. If this fails, it considers all possible combinations of two properties to see if any of these suffices for the generation of a distinguishing description, and so on. It is readily seen that this algorithm will output the shortest possible description, if one exists. Suppose the full brevity algorithm is used to generate a description for d  }. But when considering all pairs of properties the algorithm will find that one such pair rules out all distractors, namely, small and dog; &amp;quot;the small dog&amp;quot; is a successful and minimal distinguishing description for d  .</Paragraph>
    <Paragraph position="4"> Dale and Reiter point out that the full brevity algorithm is both computationally infeasible (NP hard) and psychologically unrealistic. They offer the incremental algorithm as an alternative. The incremental algorithm considers properties for selection in a predetermined order, based on the idea that human speakers and listeners prefer certain kinds of properties (or attributes) when describing objects from a given domain. For instance, when discussing domestic animals, it seems likely that a human speaker would first describe an animal by its type (is it a dog? is it a cat?). If that does not suffice, first absolute attributes like color are tried, followed by relative ones such as size. In sum: The list of preferred attributes for our example domain would be &lt; type, color, size &gt; . Essentially, the incremental algorithm iterates through this list, and for each property it encounters, it determines whether adding this property to the properties selected so far would rule out any of the remaining distractors. If so, it is included in the list of selected properties. There is one exception to this general strategy: Type information is always included, even if it rules out no distractors. The algorithm stops when all distractors are ruled out (success) or when the end of the list of preferred attributes is reached (failure).</Paragraph>
    <Paragraph position="5"> Suppose we apply the incremental algorithm to d  from Figure 1 with &lt; type, color, size &gt; as preferred attributes. The type of d  listed in the database is dog. This property is selected (since type information is always selected). It rules out d  (which is a cat).</Paragraph>
    <Paragraph position="6"> Next we consider the color of d  ; the animal is brown. This property rules out d  (which is a black and white dog) and is selected. Finally, we consider the size of our target object, which is small. This properly rules out the remaining distractor d  (which is a large brown dog) and hence is included as well. At this point, all distractors are ruled out (success!), and the set of selected properties is {dog, brown, small}, which a linguistic realizer might express as &amp;quot;the small brown dog.&amp;quot; This is a successful distinguishing  Another scene: Two dogs and two doghouses (from Krahmer and Theune [2002]). description but not a minimal one: The property brown is, strictly speaking, made redundant by the later inclusion of the property small. Since there is no backtracking in the incremental algorithm however, every selected property is realized (hence &amp;quot;incremental&amp;quot;). This aspect is largely responsible for the computational efficiency of the algorithm (it has a polynomial complexity), but Dale and Reiter (1995, page 248) also claim that it is &amp;quot;psychologically realistic.&amp;quot; They point out that sometimes people may describe an object as &amp;quot;the white bird&amp;quot; even though the simpler &amp;quot;the bird&amp;quot; would have been sufficient (cf. Pechmann [1989]; see, however, Krahmer and Theune [2002] for discussion).</Paragraph>
    <Paragraph position="7"> Even though there are various useful and interesting algorithms for the generation of referring expressions, a number of open questions remain. Recently there has been an increased interest in statistical approaches to natural language generation. For example, Malouf (2000) has shown that large corpora can be used to determine the order of realization of sequences of prenominal adjectives. It is unclear how such statistical work on generation can be combined with older, rule-based work such as the algorithms just discussed. In addition, many algorithms still have difficulties with the generation of relational descriptions (descriptions that include references to other objects to single out the target object from its distractors). To illustrate the problem, consider the scene depicted in Figure 2. In this scene we again see a finite domain of entities D with certain properties P. Here, D = {d  } is the set of entities, and P = { dog, doghouse, small, large, brown, white } is the set of properties. Clearly no algorithm can generate a distinguishing description referring to d  on this basis.</Paragraph>
    <Paragraph position="8"> Intuitively, d  can be distinguished from d  only using its relation to the doghouse d  .</Paragraph>
    <Paragraph position="9"> To facilitate this we extend the scene description with a set of relations R = { left of, right of, contain, in }.</Paragraph>
    <Paragraph position="10"> A few algorithms have been developed that address the issue of relational descriptions. The earliest is from Dale and Haddock (1992), who offer an extension of the full brevity algorithm. The Dale and Haddock algorithm has a problem with infinite recursions; it may produce descriptions like &amp;quot;the dog in the doghouse that contains a dog that is inside a doghouse....&amp;quot; Dale and Haddock, somewhat ad hoc, solve this problem by stipulating that a property or relation may be used only once. Krahmer and Theune (2002) (see also Theune [2000]) describe an extension of the incremental algorithm that allows for relational descriptions. Their extension suffers from what may be called the problem of forced incrementality: When a first relation fails to rule out all remaining distractors, additional relations will be tried incrementally. Although it could be argued that incremental selection of properties is psychologically plausible, it seems less plausible for relations. It is unlikely that someone would describe an  object as &amp;quot;the dog next to the tree in front of the garage&amp;quot; in a situation in which &amp;quot;the dog in front of the garage&amp;quot; would suffice. As we shall argue, the graph perspective provides a clean solution for these problems.</Paragraph>
    <Paragraph position="11"> 3. Generating Referring Expressions Using Graphs  In the previous section we saw that a scene can be described in terms of a domain of entities D with properties P and relations R. Such a scene can be represented as a labeled directed graph (see, e.g., Wilson [1996] for a gentle introduction or Berge [1985] for a more specialized one). Let L = P [?] R be the set of labels with P and R disjoint (i.e., P [?] R = [?]). Then G = &lt;V</Paragraph>
    <Paragraph position="13"> is the set of labeled directed edges (or arcs). Where this can be done without creating confusion, the graph subscript is omitted. Throughout this article we use the following notations. If G = &lt;V, E&gt; is a graph and e = &lt;v, l, w&gt; an edge (with l [?] L), then the extension of G with e, denoted as G + e, is the graph &lt;V [?]{v, w}, E [?]{e}&gt; . Moreover, with E G (v, w) we refer to the set of edges in E  The scene given in Figure 2, for example, can now be represented by the graph in Figure 3. This graph models the respective spatial relations between the two chihuahuas, between the two doghouses, and between each dog and the nearest doghouse. For the sake of transparency we have not modeled the relations between the dogs and the distant doghouses (i.e., between d  and d  and between d  and d  ). (It is worth stressing that adding these edges would not result in different outcomes in the discussion below). Note that properties (such as dog) are always modeled as loops,  Some graphs for referring expressions, with circles around the intended referent. that is, as edges that start and end in the same vertex. Relations may have different start and end vertices, but they do not have to (consider potentially reflexive relations such as shave). Finally, note that the graph sometimes contains properties of various levels of specificity (e.g., chihuahua and dog). This aspect of scene graphs will be further discussed in Section 5.</Paragraph>
    <Paragraph position="14"> Now the content determination problem for referring expressions can be formulated as a graph construction problem. To decide which information to include in a referring expression for an object v [?] V, we construct a connected directed labeled graph over the set of labels L and an arbitrary set of vertices, but including v. A graph is connected iff there is a path (a list of vertices in which each vertex has an edge from itself to the next vertex) between each pair of vertices. Informally, we say that a vertex (&amp;quot;the intended referent&amp;quot;) from a graph H refers to a given entity in the scene graph G iff the graph H can be &amp;quot;placed over&amp;quot; the scene graph G in such a way that the vertex being referred to is &amp;quot;placed over&amp;quot; the vertex of the given entity in G and each edge from H with label l can be &amp;quot;placed over&amp;quot; an edge from G with the same label. Furthermore, a vertex-graph pair is distinguishing iff it refers to exactly one vertex in the scene graph.</Paragraph>
    <Paragraph position="15"> Consider the three vertex-graph pairs in Figure 4, in which circled vertices stand for the intended referent. Graph (i) refers to all vertices of the graph in Figure 3 (every object in the scene is next to some other object), graph (ii) can refer to both d  . Note that the graphs might be realized as something next to something else, a chihuahua, and the dog in the doghouse, respectively. Here we concentrate on the generation of distinguishing vertex-graph pairs.</Paragraph>
    <Paragraph position="16"> Formally, the notion that a graph H = &lt;V</Paragraph>
    <Paragraph position="18"> In words: The bijective function p maps all the vertices in H to corresponding vertices in G prime in such a way that any edge with label l between vertices v and w in H is matched with an edge with the same label between the G prime counterparts of v and w (i.e., p.v and p.w, respectively). When H is isomorphic to some subgraph of G by an isomorphism p, we write H subsetsqequal p G.</Paragraph>
    <Paragraph position="19"> Given a graph H and a vertex v in H, and a graph G and a vertex w in G,we define that the pair (v, H) refers to the pair (w, G) iff H is connected and H subsetsqequal p G and p.v = w. Furthermore, (v, H) uniquely refers to (w, G) (i.e., (v, H) is distinguishing) iff (v, H) refers to (w, G) and there is no vertex w prime in G different from w such that (v, H) refers to (w prime , G). The problem considered in this article can now be formalized as follows: Given a graph G and a vertex w in G, find a pair (v, H) such that (v, H) uniquely refers to (w, G).</Paragraph>
    <Paragraph position="20"> Consider, for instance, the task of finding a pair (v, H) that uniquely refers to the vertex labeled d  in Figure 3. It is easily seen that there are a number of such pairs, three of which are depicted in Figure 5. We would like to have a mechanism that allows us to give certain solutions to this kind of task preference over other solutions. For this purpose we shall use cost functions. In general, a cost function is a function that assigns to each subgraph of a scene graph a non-negative number. As we shall see, by defining cost functions in different ways, we can mimic various algorithms for the generation of referring expressions known from the literature.</Paragraph>
    <Section position="1" start_page="60" end_page="61" type="sub_section">
      <SectionTitle>
3.1 A Note on the Problem Complexity
</SectionTitle>
      <Paragraph position="0"> The basic decision problem for subgraph isomorphism (i.e., testing whether a graph H is isomorphic to a subgraph of G) is known to be NP-complete (see, e.g., Garey and Johnson [1979]). Here we are interested in connected H, but unfortunately that restriction does not reduce the theoretical complexity. Note that this characterization of the worst-case complexity holds for graphs in which all edges have the same label; in that case each edge from H can potentially be matched to any edge from G. The best-case complexity is given when each edge is uniquely labeled. In practice, the situation will most often be somewhere between these extremes. In general, we can say that the more diverse the labeling of edges in the graph of a particular scene is, the sooner a distinguishing vertex-graph pair will be found.</Paragraph>
      <Paragraph position="1"> It is worth pointing out that there are various alternatives to full subgraph isomorphism that have a lower complexity. For instance, as soon as an upper bound K is defined on the number of edges in a distinguishing graph, the problem loses its intractability (for relatively small K) and becomes solvable, in the worst case, in</Paragraph>
      <Paragraph position="3"> ) time, where n is number of edges in the graph G. Restricting the problem in such a way is rather harmless for our current purposes, as it prohibits the generation only of distinguishing descriptions with more than K properties, and for all practical purposes K can be small (referring expressions usually express a limited number of properties).</Paragraph>
      <Paragraph position="4"> Defining an upper bound K, however, does have a disadvantage: We lose completeness (see van Deemter [2002]). In particular, the algorithm will fail for objects that can be uniquely described only with K + 1 (or more) edges. Of course, one could argue that in such cases objects should be distinguished using other means (e.g., by pointing). Nevertheless, it is worthwhile to look for classes of graphs for which the subgraph isomorphism problem can be solved more efficiently, without postulating upper bounds. For instance, if G and H are planar (simple) graphs the problem can be solved in time linear in the number of vertices of G (Eppstein 1999). Basically, a planar graph is one that can be drawn on a plane in such a way that there are no crossing edges (thus, for instance, the graph in Figure 3 is planar, as is any graph with only four vertices). In general, there is no a priori reason to assume that our scene representations will be planar. Yet every nonplanar graph can be modified into a closely related planar one. We briefly address planarization of scene graphs in the Appendix.</Paragraph>
      <Paragraph position="5"> A final alternative is worth mentioning. The general approach to the problem of subgraph isomorphism detection assumes that both graphs are given on-line. For our current purposes, however, it may happen that the scene graph is fixed and known beforehand, and only the referring graph is unknown and given on-line. Messmer and Bunke (1995, 1998) describe a method that converts the known graph (or model graph, as they call it) into a decision tree. At run time, the input graph is classified by the decision tree, which detects subgraph isomorphisms. The disadvantage of this approach is that the decision tree may contain, in the worst case, an exponential number of nodes. But the main advantage is that the complexity of the new subgraph isomorphism algorithm is only quadratic in the number of vertices of the input referring graph. Note that with this approach we do not lose information from the scene graph, nor do we lose completeness.</Paragraph>
      <Paragraph position="6"> In sum, the basic approach to subgraph isomorphisms is NP-complete, but there exist various reformulations of the problem that can be solved more efficiently. Deciding which (combination) of these is the most suitable in practice, however, is beyond the scope of this article. Finally, it is worth stressing that the NP-completeness is due to the presence of edges representing relations between different vertices. If we re- null Krahmer, van Erk, and Verleg Graph-Based Generation strict the approach to properties (looping edges), testing for subgraph isomorphisms becomes trivial.</Paragraph>
    </Section>
  </Section>
  <Section position="4" start_page="61" end_page="68" type="metho">
    <SectionTitle>
4. A Sketch of a Branch and Bound Generation Algorithm
</SectionTitle>
    <Paragraph position="0"> In this section we give a high-level sketch of the graph-based generation algorithm.</Paragraph>
    <Paragraph position="1"> The algorithm (called makeReferringExpression) consists of two main components, a subgraph construction algorithm (called findGraph) and a subgraph isomorphism testing algorithm (called matchGraphs). For expository reasons we do not address optimization strategies (but see Section 6).</Paragraph>
    <Section position="1" start_page="61" end_page="61" type="sub_section">
      <SectionTitle>
4.1 The Basic Idea
</SectionTitle>
      <Paragraph position="0"> We assume that a scene graph G = &lt;V</Paragraph>
      <Paragraph position="2"> &gt; is given. The algorithm systematically tries all relevant subgraphs H of the scene graph G by starting with the subgraph containing only the vertex v (the target object) and expanding it recursively by trying to add edges from G that are adjacent to the subgraph H constructed up to that point. In this way we know that the results will be a connected subgraph. We refer to this set of adjacent edges as the H neighbors in G (denoted as G.neighbors(H)). Formally:</Paragraph>
      <Paragraph position="4"> The algorithm returns the cheapest (least expensive) distinguishing subgraph H that refers to v, if such a distinguishing graph exists; otherwise it returns the undefined null graph [?].</Paragraph>
    </Section>
    <Section position="2" start_page="61" end_page="61" type="sub_section">
      <SectionTitle>
4.2 Cost Functions
</SectionTitle>
      <Paragraph position="0"> We use cost functions to guide the search process and to give preference to some solutions over others. If H = &lt;V</Paragraph>
      <Paragraph position="2"> &gt; is a subgraph of G, then the costs of H, denoted as cost(H), can be given by summing over the costs associated with the vertices and edges of H. Formally:</Paragraph>
      <Paragraph position="4"> In fact, this is only one possible way to define a cost function. The only hard requirement cost functions have to fulfill is monotonicity. That is, adding an edge e to a graph G should never result in a graph cheaper than G. Formally:</Paragraph>
      <Paragraph position="6"> The monotonicity assumption helps reduce the search space, since extensions of sub-graphs with a cost greater than the best subgraph found up to that point can safely be ignored. Naturally, the costs of the undefined graph (cost([?])) are not defined. It is worth stressing that the cost function is global: It determines the costs of entire graphs.</Paragraph>
      <Paragraph position="7"> This implies that the cheapest distinguishing graph is not necessarily the smallest distinguishing graph; a graph consisting of two or more edges may be cheaper than a graph containing one expensive edge.</Paragraph>
    </Section>
    <Section position="3" start_page="61" end_page="63" type="sub_section">
      <SectionTitle>
4.3 Worked Example
</SectionTitle>
      <Paragraph position="0"> We now illustrate the algorithm with an example. Suppose the scene graph G is as given in Figure 3 and that we want to generate a referring expression for object d</Paragraph>
      <Paragraph position="2"> Sketch of the main function (makeReferringExpression) and the subgraph construction function (findGraph).</Paragraph>
      <Paragraph position="3"> in this graph. Let us assume for the sake of illustration that the cost function is defined in such a way that adding a vertex or an edge always costs one point. Thus,</Paragraph>
      <Paragraph position="5"> and for each e [?] E H : cost(v)=cost(e)=1. (In the next section we describe a number of more interesting cost functions and discuss the impact these have on the output of the algorithm.) We call the function makeReferringExpression (given in Figure 6) with d  as parameter. In this function the variable bestGraph (for the best solution found up to that point) is initialized as the null graph (there is no best solution yet), and the variable H (for the distinguishing subgraph under construction) is initialized as the graph containing only vertex d</Paragraph>
      <Paragraph position="7"> the function findGraph (see also Figure 6) is called, with parameters d</Paragraph>
      <Paragraph position="9"> and H.</Paragraph>
      <Paragraph position="10"> To begin with, whether a first non-null bestGraph has been found is checked and, if one has, whether the costs of H (the graph under construction) are higher than the costs of the bestGraph found up to that point. If the costs of H are higher, it is not worth extending H since, due to the monotonicity constraint, it will never end up being cheaper than the current bestGraph. During the first iteration we have no non-null bestGraph, so we continue. Next the set of distractors is calculated. In terms of the graph perspective, this is the set of vertices in the scene graph G (other than the target vertex v) to which the graph H refers. It is easily seen that the initial value of H refers to every vertex in G. Hence, as one would expect, the initial set of distrac- null }. Then the current set of distractors is checked to determine whether it is empty. If it is, we have managed to find a distinguishing graph, which is subsequently stored in the variable bestGraph. In the first iteration, this is obviously not the case, and we continue, recursively trying to extend H by adding  Three values for H in the generation process for d</Paragraph>
      <Paragraph position="12"> adjacent (neighboring) edges until either a distinguishing graph has been constructed (all distractors are ruled out) or the costs of H exceed the costs of the bestGraph found so far.</Paragraph>
      <Paragraph position="13"> While bestGraph is still the null graph, the algorithm continues until H is a distinguishing graph. Which is the first distinguishing graph to be found (if more than one exists) depends on the order in which the adjacent edges are tried (see also Section 5.1). Suppose for the sake of argument that the first distinguishing graph to be found is (ii) in Figure 7. This graph is returned and stored in bestGraph. The costs associated with this graph are five points (two vertices and three edges). At this stage in the generation process only graphs with costs lower than five points are worth investigating. In fact, there are only a few distinguishing graphs that cost less than this. After a number of iterations the algorithm will find the cheapest solution (given this particular, simple definition of the cost function), which is (iii) in Figure 7. That this distinguishing graph does not include type information does not necessarily mean that such information should not be realized. It means only that type information is, strictly speaking, not necessary to distinguish the intended referent from the distractors. We return to this issue in Section 5.2.</Paragraph>
    </Section>
    <Section position="4" start_page="63" end_page="64" type="sub_section">
      <SectionTitle>
4.4 Subgraph Isomorphism Testing
</SectionTitle>
      <Paragraph position="0"> Figure 8 sketches the part of the algorithm that tests for subgraph isomorphism, match-Graphs. This function is called each time the distractor set is calculated. It tests whether the pair (v, H) can refer to (w, G), or put differently, it checks whether there exists an isomorphism p such that H subsetsqequal p G with p.v = w. The function matchGraphs first determines whether the looping edges starting from vertex v match those of w.IfE</Paragraph>
      <Paragraph position="2"> is not a subset of E G (w, w) (e.g., v is a dog and w is a doghouse), we can immediately discard the matching. Otherwise we start with the matching p.v = w and try to expand it recursively. At each recursion step a fresh and as yet unmatched vertex y from V H is selected that is adjacent to one of the vertices in the current domain of p (notated dom(p)). For each y we calculate the set Z of possible vertices in G to which y can be  Computational Linguistics Volume 29, Number 1</Paragraph>
      <Paragraph position="4"> if z is a valid extension of the mapping then if matchHelper(p [?]{y mapsto- z}, Y, G, H) then return true</Paragraph>
      <Paragraph position="6"> Sketch of the function testing for subgraph isomorphism (matchGraphs).</Paragraph>
      <Paragraph position="7"> matched. This set consists of all the vertices in G that have the same looping edges as y and the same edges to and from other vertices in the domain of the current matching function p. Formally:</Paragraph>
      <Paragraph position="9"> (The H.neighbors(y) are the vertices in H that are adjacent to y, that is, those vertices that are connected to y by an edge.) The matching can now possibly be extended with p.y = z, for each z [?] Z. The algorithm then branches over all these possibilities. Once a mapping p has been found that has exactly as many elements as H has vertices, we have found a subgraph isomorphism. If there are still unmatched vertices in H, or if all possible extensions with vertex y have been checked and no matching can be found, the test for subgraph isomorphism has failed.</Paragraph>
    </Section>
    <Section position="5" start_page="64" end_page="65" type="sub_section">
      <SectionTitle>
4.5 A Note on the Implementation
</SectionTitle>
      <Paragraph position="0"> The algorithm outlined in Figures 6 and 8 has been implemented in Java 2 (J2SE, version 1.4). The implemented version of the algorithm is actually more efficient than the sketch suggests, because various calculations need not be repeated in each iteration (the set of distractors and the set G.neighbors(H), for example). In addition, the user has the possibility of specifying the cost function in whatever way he or she sees fit.</Paragraph>
      <Paragraph position="1"> A full-fledged performance analysis of the current implementation is beyond the scope of this article. Such an analysis would be complicated by the fact that there  Krahmer, van Erk, and Verleg Graph-Based Generation Table 1 Average times needed to find the cheapest distinguishing referring graphs for objects in seven test scene graphs of increasing complexity.</Paragraph>
      <Paragraph position="2">  are many kinds of graphs (dense, sparse, (un)connected, etc.), and the performance results will vary with the properties of the scene graph. If the scene graph is fully connected, finding distinguishing graphs is much harder then when it is fully unconnected. Nevertheless, to give the reader some insight into the performance of the implementation, we applied it to seven test scene graphs of a specific form. The first is our running example: the graph in Figure 3. The other six are obtained by scaling up this graph and permutating (plus, if necessary, adding) properties to make sure that each object can be uniquely described. This implies that only the first test graph is fully connected. The other graphs consist of n  connected subgraphs, where n is the number of vertices in the graph. Thus, the bigger graphs are relatively less complex than the smaller ones. Each graph consists of 4n looping edges (representing properties) and 3n nonlooping edges (representing spatial relations), again with n the number of vertices.</Paragraph>
      <Paragraph position="3"> The test was performed on a Windows 2000 PC with a 900 mHz AMD Athlon Processor and 128 Mb RAM. The results are given in Table 1. We measured the system time right before and right after the call of the main function makeReferringExpression. We computed the differences between these two times for a number of target objects from the scene graphs (4 and 8 objects for the first two graphs, 16 for the remaining ones). Note that this measurement does not include the time Java requires for initialization or background activities such as garbage collection. Table 1 shows that even for the larger graphs, the program is able to find minimal distinguishing graphs relatively quickly. The current implementation is a straightforward one (see also Section 6), so optimization, possibly in combination with heuristics, is likely to show further improvements in performance.</Paragraph>
      <Paragraph position="4">  5. Cost Functions and Search Strategies</Paragraph>
    </Section>
    <Section position="6" start_page="65" end_page="66" type="sub_section">
      <SectionTitle>
5.1 Full (Relational) Brevity Algorithm and Greedy Heuristics
</SectionTitle>
      <Paragraph position="0"> The algorithm described in the previous section (with the uniform cost function assigning one point to each edge and vertex) can be seen as a generalization of Dale's (1992) full brevity algorithm, in the sense that there is a guarantee that the algorithm will output the shortest possible description, if one exists. It is also an extension of the full brevity algorithm, since it allows for relational descriptions. In this respect it is comparable to the Dale and Haddock (1991) algorithm, granted that here the problems with infinite recursions do not arise, since a particular edge is either present in a graph or not. Moreover, the approach is fully general and applies to n-ary relations (and relation/property combinations) as well.</Paragraph>
      <Paragraph position="1">  Computational Linguistics Volume 29, Number 1 It is worth noting that Dale's (1992) greedy heuristic algorithm (also discussed in Dale and Reiter [1995]) can be cast in the graph framework as well. In fact, this would give us a handle on the order in which different adjacent edges could be tried. The edges associated with the intended referent should be sorted on their descriptive power, which is inversely proportional to the number of occurrences of that particular edge in the scene graph. The algorithm then adds the most discriminating edge (i.e., the one removing most distractors) first. If there are various equally distinguishing edges, the cheapest one is added. This process is then repeated until a distinguishing graph is found. In fact, such a greedy strategy could be used to produce a first nonminimal distinguishing graph. Subsequently we could call findGraph with this graph as initial value of bestGraph, instead of the null graph [?]. In this way, we would be able to find minimal graphs more efficiently.</Paragraph>
    </Section>
    <Section position="7" start_page="66" end_page="67" type="sub_section">
      <SectionTitle>
5.2 Incremental Algorithm
</SectionTitle>
      <Paragraph position="0"> The characteristic properties of Dale and Reiter's (1995) incremental algorithm can be incorporated into the graph framework as follows. First, the list of preferred attributes can be modeled in terms of the cost function: All type edges should be cheaper than all other edges. In fact, (basic level, see below) type edges could be free. Moreover, the edges corresponding to absolute properties (color) should cost less than those corresponding to relative ones (size). This gives us exactly the effect of having a list of preferred attributes &lt; type, color, size &gt; . It also implies that the type of an object is always included if it is in any way distinguishing. That by itself does not guarantee that type is always included. The incremental nature of the incremental algorithm can be obtained by ordering edges with respect to their costs. Now the cheapest edges (i.e, those expressing type information) should be tried first, and more expensive edges should be tried later. In addition, the algorithm should terminate as soon as it has found a distinguishing graph. This would guarantee that bargain type loops are always included, and the algorithm would output (iii) from Figure 4 instead of (iii) from  from Figure 3.</Paragraph>
      <Paragraph position="1"> Another characteristic property of the original incremental algorithm (not discussed in section 2) is the use of a subsumption hierarchy. A chihuahua, for instance, can be referred to as either a chihuahua or a dog. The latter has a special status and is called the basic level value (see, e.g., Rosch [1978]). According to Dale and Reiter (1995) and Reiter (1990), human speakers have a general preference for basic level values and move to more specific (subsumed) values only if these are more informative.</Paragraph>
      <Paragraph position="2"> This notion of a subsumption hierarchy can be modeled using the cost function. For a given attribute, the basic level edges should be assigned the lowest costs, and those farthest away from the basic level edge should have the highest costs. This implies that adding an edge labeled dog is cheaper than adding an edge labeled chihuahua.</Paragraph>
      <Paragraph position="3"> Hence a chihuahua edge will be selected only when there are fewer (or less expensive) additional edges required to construct a distinguishing graph than would be the case for a graph including a dog edge. Note that (assuming that the scene representation is well defined) a distinguishing graph can never contain both a dog and a chihuahua edge, since there will always be a cheaper distinguishing graph omitting one of the two edges.</Paragraph>
      <Paragraph position="4"> In sum, we can recast the incremental algorithm quite easily in terms of graphs.</Paragraph>
      <Paragraph position="5"> The original incremental algorithm operates only on properties (looped edges in graph terminology). Recall that when all edges in a scene graph are of the looping variety, testing for subgraph isomorphism becomes trivial. The graph-theoretical reformulation of the incremental algorithm does not fully exploit the possibilities offered by the graph framework and the use of cost functions. First, from the graph-theoretical perspective,  Krahmer, van Erk, and Verleg Graph-Based Generation the generation of relational descriptions poses no problems. Note that the use of a cost function to simulate subsumption hierarchies for properties carries over directly to relations; for instance, the costs of adding an edge labeled next to should be less than those of adding one labeled left of or right of. Hence, next to will be preferred, unless using left of or right of requires fewer (or less expensive) additional edges for the construction of a distinguishing graph. Another advantage of the way the graph-based algorithm models the list of preferred attributes is that finer-grained distinctions can be made than can with the incremental algorithm. In particular, we are not forced to say that values of the attribute type are always cheaper than values of the attribute color. Instead of assigning costs to attributes, we can assign costs to values of attributes. This gives us the freedom to assign edges labeled with a common color value (e.g., brown) a lower cost than edges labeled with obscure type values, such as Polish owczarek nizinny sheepdog. This implies that it will be cheaper to construct a distinguishing graph referring to an object using two cheap edges (the brown dog) than with one particularly expensive edge (the Polish owczarek nizinny sheepdog).</Paragraph>
    </Section>
    <Section position="8" start_page="67" end_page="68" type="sub_section">
      <SectionTitle>
5.3 Aspects of Other Algorithms
</SectionTitle>
      <Paragraph position="0"> Various aspects of other algorithms can be captured in the graph-based algorithm as well. To further illustrate the flexibility of the graph perspective, we briefly discuss two such aspects.</Paragraph>
      <Paragraph position="1">  scriptions (such as the dogs) can be modeled using graphs in the following way. Van Deemter's algorithm takes as input a set of target objects, which, in our case, translates into a set of vertices W from the scene graph (W [?] V G ). Now the algorithm tries to generate a vertex-graph pair (v, H) that uniquely refers to (W, G). The definition of &amp;quot;uniquely referring graphs&amp;quot; has to be generalized slightly to accommodate plurals. The constructed subgraph should refer to each of the vertices in the set W, but not to any of the vertices in the scene graph outside this set. Formally, (v, H) uniquely refers to (W, G) iff H is connected, and for each w [?] W there is a bijection p such that H subsetsqequal p G, with p.v = w and there is no w prime [?] G\W such that (v, H) refers to (w prime , G). Observe that the singular case defined in Section 4 is obtained by restricting W to singleton sets. In this way, the basic algorithm can generate both singular and plural distinguishing descriptions.</Paragraph>
      <Paragraph position="2">  linguistically salient and hence can often be referred to using fewer properties; an animal that is first described as &amp;quot;the large black dog with the hanging ears&amp;quot; may subsequently be referred to using an anaphoric description such as &amp;quot;the dog.&amp;quot; Krahmer and Theune (2002) model this phenomenon by assigning salience weights (sws) to objects. For this purpose they use a version of centering theory (Grosz, Joshi, and Weinstein 1995) augmented with a recency effect essentially due to HajiVcov'a (1993). Krahmer and Theune then define the set of distractors as the set of objects with a salience weight higher than or equal to that of the target object. In terms of the graph-theoretical framework, this would go as follows. First, we assign salience weights to the vertices in the scene graph using the salience weights definition proposed by Krahmer and Theune. Subsequently, in the sketch of the basic algorithm (Figure 6), the set of distractors should be redefined as follows:</Paragraph>
      <Paragraph position="4"> [?] matchGraphs(v, H, n, G)[?] n negationslash= v [?]sw(n) [?] sw(v)}  Computational Linguistics Volume 29, Number 1 That is, the distractor set is restricted to those vertices n in the scene graph G that currently are at least as salient as the target object v. For target objects that are linguistically salient, this will typically lead to a reduction of the distractor set. Consequently, distinguishing graphs for these target objects will generally be smaller than those for nonsalient objects. Moreover, we will be able to find distinguishing graphs for a salient object v relatively fast, since we already have a distinguishing graph (constructed for the first definite reference to v) and we can use this graph as our initial value of bestGraph.</Paragraph>
    </Section>
    <Section position="9" start_page="68" end_page="68" type="sub_section">
      <SectionTitle>
5.4 Stochastic Cost Functions
</SectionTitle>
      <Paragraph position="0"> One of the important open questions in natural language generation is how the common rule-based approaches to generation can be combined with recent insights from statistical natural language processing (see, e.g., Langkilde and Knight [1998] and Malouf [2000] for partial answers). The approach proposed in this article makes it possible to combine graph reformulations of well-known rule-based generation algorithms with stochastic cost functions (the result resembles a Markov model). Such a cost function could be derived from a sufficiently large corpus. For instance, as a first approximation we could define the costs of adding an edge e in terms of the probability P(e) that e occurs in a distinguishing description (estimated by counting occurrences):  Thus, properties that occur frequently are cheap; properties that are relatively rare are expensive. In this way, we would probably derive that polish owczarek nizinny sheepdog indeed costs more (and is thus less likely to be selected) than brown.</Paragraph>
      <Paragraph position="1"> Even though this first approximation already has some interesting consequences, it is probably not enough to obtain a plausible and useful cost function. For instance, it is unlikely that the co-occurrence of edges is fully independent; a husky is likely to be white, and a chihuahua is not. Such dependencies are not modeled by the definition given above. In addition, properties referring to size such as small and large probably occur more often in a corpus than properties referring to colors such as brown or yellow, which at first sight appears to run counter to the earlier observation that speakers generally prefer absolute properties over relative ones. The reason for this, however, is probably that there are simply fewer ways to describe the size than there are to describe the color of objects. Searching for a more sophisticated method of defining stochastic cost functions is therefore an interesting line of future research.</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="68" end_page="70" type="metho">
    <SectionTitle>
6. Concluding Remarks and Future Research
</SectionTitle>
    <Paragraph position="0"> In this article, we have presented a new approach to the content determination problem for referring expressions. We proposed to model scenes as labeled directed graphs, in which objects are represented as vertices and the properties and relations of these objects are represented as edges. The problem of finding a referring expression for an object is treated as finding a subgraph of the scene graph that is isomorphic to the intended referent but not to any other object. The theoretical complexity of this reformulation of the content determination problem is NP-complete, but there exist various restrictions (planar graphs, decision trees for fixed scene graphs, upper bound to the number of edges in a distinguishing graph) that have a polynomial complexity.</Paragraph>
    <Paragraph position="1"> We have described a general and fully implemented algorithm, based on the sub-graph isomorphism idea, consisting of two main functions: one that constructs referring graphs and one that tests for subgraph isomorphisms. Cost functions are used to  Krahmer, van Erk, and Verleg Graph-Based Generation guide the search process and to give preference to some solutions over others. Optimization has not been the focus of this article, but we came across various heuristic strategies that would speed up the algorithm. For instance, we can try edges in the order determined by the cost function (from cheap to more expensive), and we can use a greedy algorithm to find a first distinguishing graph quickly. In general, one of the advantages of the graph perspective is that many efficient algorithms for dealing with graph structures are known. We can use those algorithms to formulate more efficient versions of the subgraph construction component (perhaps using the method of Tarjan [1972]; see also Sedgewick [1988]) and of the subgraph isomorphism testing component (e.g., using the aforementioned approach of Messmer and Bunke [1995, 1998]).</Paragraph>
    <Paragraph position="2"> The graph perspective has a number of attractive properties. (1) By reformulating the content determination problem as a graph construction problem, we can directly apply the many techniques and algorithms for dealing with graph structures. (2) The use of cost functions allows us to model different search methods, each restricting the search space in its own way. By defining cost functions in different ways, we can mimic and extend various well-known algorithms from the literature (see also Krahmer, van Erk, and Verleg [2001]). (3) The generation of relational descriptions is straightforward; the problems that plague some other algorithms for the generation of relational descriptions do not arise. Moreover, the approach to relations proposed here is fully general: It applies to all n-ary relations, not just binary ones. (4) The use of cost functions paves the way for integrating statistical information directly into the generation process. In fact, performing experiments with various ways to estimate stochastic cost functions from corpora is one path for future research that we have identified.</Paragraph>
    <Paragraph position="3"> Besides looking for graph-based optimizations and performing experiments with stochastic cost functions, there are three other lines for future research we would like to mention. The first concerns the construction of scene graphs. How should the decision be made as to which aspects of a scene to represent in the graph? Naturally, the algorithm can only refer to entities that are modeled in the scene graph, but representing every possible object in a single graph will lead to an explosion of edges and vertices. Perhaps some notion of focus of attention can be used to restrict the scene graph. It would also be interesting to look for automatic methods for the construction of scene graphs. We might use computer vision algorithms (see, e.g., Faugeras [1993]), which are often graph-based themselves, for this purpose. For example, Bauckhage et al. (1999) describe an assembly system in which computer vision is used to convert a workspace with various building blocks into a labeled directed scene graph. Note that this approach is also able to deal with dynamic scenes; it can track changes in the workspace (which is required for handling the assembly process).</Paragraph>
    <Paragraph position="4"> Another issue that we have not discussed in much detail is linguistic realization.</Paragraph>
    <Paragraph position="5"> How should the information contained in a referring graph be expressed in natural language? So far, we have assumed that a distinguishing graph can simply be constructed first and subsequently fed into a realization engine. There may, however, be certain dependencies between content selection and realization (see, e.g., Horacek [1997] and Krahmer and Theune [2002]). One way to take these dependencies into account would be to reformulate the cost function in such a way that it promotes graphs that can easily be realized and punishes graphs that are more difficult to realize.</Paragraph>
    <Paragraph position="6"> A final aspect of the graph model that deserves further investigation is based on the fact that we can look at a graph such as that in Figure 3 as a Kripke model.</Paragraph>
    <Paragraph position="7"> Kripke models are used in model-theoretic semantics for modal logics. The advantage of looking at graphs such as that in Figure 3 as Kripke models is that we can use tools from modal logic to reason about these structures. For example, we can reformulate  Computational Linguistics Volume 29, Number 1 the problem of determining the content of a distinguishing description in terms of hybrid logic (see, e.g., Blackburn [2000]) as follows:</Paragraph>
    <Paragraph position="9"> In words: When we want to refer to vertex i, we are looking for that distinguishing formula ph that is true of (&amp;quot;at&amp;quot;) i but not of any j different from i. One advantage of this logical perspective is that logical properties that are not covered by most generation algorithms (such as not having a certain property; see van Deemter [2002]) fit in very well with this perspective.</Paragraph>
    <Paragraph position="10"> Appendix: Planarizing Scene Graphs Planar graphs may be relevant for our current purposes, since subgraph isomorphism can be tested more efficiently on planar graphs than on arbitrary graphs. There are two ways in which a nonplanar graph G can be turned into a planar one G prime (see Liebers [2001] for a recent overview of planarization algorithms): Either the graph G can be pruned (using vertex or edge deletion) or it can be extended (for instance, using vertex splitting or by inserting vertices at crossings). A disadvantage of the extension approach is that we lose the intuitive one-to-one correspondence between potential target objects and vertices in the scene graph, since the additional vertices only serve the purpose of planarizing the graph and do not represent objects in a scene. A disadvantage of the pruning approach is that we lose information. The presence of a cost function, however, is potentially very useful, since it allows us to avoid eliminating comparatively cheap (and thus more frequently selected) edges.</Paragraph>
    <Paragraph position="11"> Here, for the sake of illustration, we briefly describe a weighted greedy pruning algorithm that turns an arbitrary scene graph G = &lt;V  +e is planar (e.g., using the algorithm from Hopcroft and Tarjan [1974]). If it is, e is added to E G prime. The algorithm terminates when R G = [?]. The result is a maximal planar subgraph G prime of the scene graph G that differs from G only possibly in the deletion of certain relatively expensive nonlooping (relational) edges.</Paragraph>
  </Section>
class="xml-element"></Paper>
Download Original XML