<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-1411">
  <Title>Group-based Generation of Referring Expressions (Sydney, July 2006. ©2006 Association for Computational Linguistics)</Title>
  <Section position="4" start_page="73" end_page="74" type="metho">
    <SectionTitle>
2 SOG representation
</SectionTitle>
    <Paragraph position="0"> Funakoshi et al. (2004) proposed an intermediate representation between a referring expression and the situation it refers to. This representation describes the process of narrowing down to the target as a sequence of groups, from the group of all objects to the singleton group of the target object. Hence it is called SOG (Sequence Of Groups).</Paragraph>
    <Paragraph position="1"> The following example shows an expression describing the target x in Figure 2 with the corresponding SOG representation below it. Since Japanese is a head-final language, the order of groups in the SOG representation can be retained in the linguistic expression.</Paragraph>
    <Paragraph position="2"> hidari oku ni aru ... (located at the back left ...)
SOG: {a,b,c,d,e,f,x} {a,b,x} {x}
where {a,b,c,d,e,f,x} denotes all objects in the situation, {a,b,x} denotes the three objects at the back left, and {x} denotes the target.</Paragraph>
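To make the notation concrete, the following Python sketch (our own illustration, not code from the paper) represents an SOG as a list of frozensets. It checks the defining property stated above: the sequence starts with the set of all objects, ends with the target's singleton, and each group is a proper subset of the one before it. The function name is_valid_sog is hypothetical.

```python
from typing import FrozenSet, List

Group = FrozenSet[str]


def is_valid_sog(sog: List[Group], all_objects: Group, target: str) -> bool:
    """Check that sog narrows down from all objects to the target's singleton."""
    if not sog or sog[0] != all_objects or sog[-1] != frozenset({target}):
        return False
    # Each group must be a proper subset of the group that precedes it.
    return all(
        nxt.issubset(prev) and nxt != prev
        for prev, nxt in zip(sog, sog[1:])
    )


all_objects = frozenset("abcdefx")          # objects a, b, c, d, e, f, x
sog = [all_objects, frozenset("abx"), frozenset("x")]
print(is_valid_sog(sog, all_objects, "x"))  # True
```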
    <Section position="1" start_page="73" end_page="74" type="sub_section">
      <SectionTitle>
2.1 Extended SOG
</SectionTitle>
      <Paragraph position="0"> As mentioned above, Funakoshi et al. (2004) assumed limited situations in which only homogeneous objects are randomly arranged, and considered only spatial subsumption relations between consecutive groups. Therefore, relations between groups are not explicitly denoted in the original SOGs, as shown below.</Paragraph>
      <Paragraph position="1"> In this paper, however, we also consider other types of relations between groups. We propose an extended SOG representation in which the types of relations are explicitly denoted, as shown below. In the rest of this paper, we refer to this extended SOG representation simply as &quot;SOG&quot;.</Paragraph>
      <Paragraph position="3"> a certain focused feature. The feature can be an attribute of objects or a relation between objects.</Paragraph>
      <Paragraph position="4"> There are two types of relations between groups: intra-group relations and inter-group relations.</Paragraph>
      <Paragraph position="6"> Intra-group relations are further classified into the following subcategories according to the feature used to narrow down $G_i$.</Paragraph>
      <Paragraph position="8"> We denote these subcategories with the following symbols:
$\supset_{space}$: spatial subsumption
$\supset_{type}$: the object type
$\supset_{shape}$: the shape of objects
$\supset_{color}$: the color of objects
$\supset_{size}$: the size of objects
With respect to this classification, Funakoshi et al. (2004) dealt only with the $\supset_{space}$ relation.</Paragraph>
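      <Paragraph position="10"> An inter-group relation holds between consecutive groups $G_i$ and $G_{i+1}$ such that $G_i \cap G_{i+1} = \emptyset$. An inter-group relation is a spatial relation and is denoted by the symbol =.</Paragraph>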
      <Paragraph position="11"> We show a referring expression indicating object b1 and the corresponding SOG in the situation of Figure 3. In the SOG, {all} denotes the total set of objects in the situation. The indexed underlines denote the correspondence between the SOG and the linguistic expression. As shown in the figure, we allow objects to be on other objects.</Paragraph>
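A possible encoding of the extended SOG is given below as a hypothetical Python sketch, not the authors' implementation. Groups alternate with relation tuples. Intra-group relations carry one of the five labels and must narrow the group, and an inter-group relation is written as =. We assume here that an inter-group relation links two disjoint groups. The object ids t1, t2, and b1 are made up.

```python
from typing import FrozenSet, List, Tuple, Union

Group = FrozenSet[str]
Relation = Tuple[str, str]          # ("intra", label) or ("inter", "=")
Element = Union[Group, Relation]

INTRA_LABELS = {"space", "type", "shape", "color", "size"}


def check_extended_sog(sog: List[Element]) -> bool:
    """Check that groups and relations alternate and that each relation fits its groups."""
    groups, relations = sog[0::2], sog[1::2]
    if len(groups) != len(relations) + 1:
        return False
    for prev, rel, nxt in zip(groups, relations, groups[1:]):
        kind, label = rel
        if kind == "intra":
            # Intra-group relation: the next group is narrowed down from the previous one.
            if label not in INTRA_LABELS or not (nxt.issubset(prev) and nxt != prev):
                return False
        elif kind == "inter":
            # Assumption: an inter-group relation links two disjoint groups.
            if label != "=" or not prev.isdisjoint(nxt):
                return False
        else:
            return False
    return True


# Hypothetical objects: two tables t1, t2 and a ball b1 lying on t1.
everything = frozenset({"t1", "t2", "b1"})
sog = [everything, ("intra", "type"), frozenset({"t1", "t2"}),
       ("intra", "space"), frozenset({"t1"}), ("inter", "="), frozenset({"b1"})]
print(check_extended_sog(sog))  # True
```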
    </Section>
  </Section>
  <Section position="5" start_page="74" end_page="78" type="metho">
    <SectionTitle>
3 Generation
</SectionTitle>
    <Paragraph position="0"> The generation algorithm proposed in this section consists of four steps: perceptual grouping, SOG generation, surface realization, and scoring. In the rest of this section, we describe these four steps using Figure 3 as an example.</Paragraph>
    <Section position="1" start_page="74" end_page="74" type="sub_section">
      <SectionTitle>
3.1 Step 1: Perceptual grouping
</SectionTitle>
      <Paragraph position="0"> Our algorithm starts with identifying groups of objects that are naturally recognized by humans.</Paragraph>
      <Paragraph position="1"> We adopt Thórisson's perceptual grouping algorithm (Thórisson, 1994) for this purpose. Perceptual grouping is performed on the objects in the situation with respect to each of the following features: type, shape, color, size, and proximity. Three special features, total, singleton, and closure, are used to recognize the total set of objects, groups consisting of a single object, and objects bounded in perceptually significant regions (table tops in the domain of this paper), respectively. These three features are handled not by Thórisson's algorithm but by individual procedures. Type is the most dominant feature because humans rarely recognize objects of different types as a group. Thus, we first group objects with respect to type, and then group objects of the same type with respect to the other features (except for total).</Paragraph>
      <Paragraph position="2"> Although we adopt Thórisson's grouping algorithm, our grouping strategies differ from the original ones. Thórisson (1994) lists the following three combinations of features as possible strategies for perceptual grouping.</Paragraph>
      <Paragraph position="3"> * shape and proximity
* color and proximity
* size and proximity
These strategies, however, are inappropriate for generating referring expressions. For example, because the two blue balls b1 and b2 in Figure 3 are too far apart, Thórisson's algorithm cannot recognize the group consisting of b1 and b2 with the original strategies. Yet an expression such as &quot;the left blue ball&quot; can naturally refer to b1, and using such an expression presupposes an implicit group consisting of b1 and b2. Hence, we do not combine features but use each of them separately.</Paragraph>
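The grouping order described above can be sketched as follows: objects are grouped by type first, and then by each remaining feature separately within each type. This is a deliberately simplified stand-in. It groups objects by exact equality of a single attribute value and does not reproduce Thórisson's proximity-based algorithm. The shape, proximity, and closure features are omitted, and the object table is hypothetical.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, List, Set

# Hypothetical objects: id -> attribute values (shape and positions omitted).
OBJECTS: Dict[str, Dict[str, str]] = {
    "t1": {"type": "table", "color": "brown", "size": "big"},
    "t2": {"type": "table", "color": "brown", "size": "small"},
    "b1": {"type": "ball", "color": "blue", "size": "small"},
    "b2": {"type": "ball", "color": "blue", "size": "small"},
    "b3": {"type": "ball", "color": "red", "size": "big"},
}


def group_by(ids: Iterable[str], feature: str) -> List[FrozenSet[str]]:
    """Partition ids by equality of one feature value (a stand-in, not Thorisson's algorithm)."""
    buckets: Dict[str, Set[str]] = defaultdict(set)
    for i in ids:
        buckets[OBJECTS[i][feature]].add(i)
    return [frozenset(b) for b in buckets.values()]


def perceptual_groups() -> Dict[str, List[FrozenSet[str]]]:
    everything = frozenset(OBJECTS)
    result: Dict[str, List[FrozenSet[str]]] = {
        "total": [everything],
        "singleton": [frozenset({i}) for i in OBJECTS],
    }
    # Type first: objects of different types are never put into one group.
    result["type"] = group_by(everything, "type")
    # Then every other feature on its own inside each type group (no feature combination).
    for feature in ("color", "size"):
        result[feature] = [g for tg in result["type"] for g in group_by(tg, feature)]
    return result


groups = perceptual_groups()
print(sorted(sorted(g) for g in groups["color"]))  # [['b1', 'b2'], ['b3'], ['t1', 't2']]
```

Because color is used on its own, the distant blue balls b1 and b2 still form a group, which is the behavior the text argues for.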
      <Paragraph position="4"> The results of perceptual grouping for the situation in Figure 3 are shown below. Relation labels are assigned to the recognized groups according to the features used in perceptual grouping. We define six labels: all, type, shape, color, size, and space. The features singleton, proximity, and closure share the same label, space. A group may have several labels.</Paragraph>
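Label assignment can be sketched as a feature-to-label mapping. Singleton, proximity, and closure all map to space, and a group found under several features accumulates several labels. The mapping follows the text above. The function name and the example groups are hypothetical.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Set

FEATURE_TO_LABEL = {
    "total": "all", "type": "type", "shape": "shape", "color": "color",
    "size": "size", "singleton": "space", "proximity": "space", "closure": "space",
}


def assign_labels(groups_by_feature: Dict[str, List[FrozenSet[str]]]) -> Dict[FrozenSet[str], Set[str]]:
    """Collect, for every recognized group, the labels of all features that produced it."""
    labels: Dict[FrozenSet[str], Set[str]] = defaultdict(set)
    for feature, groups in groups_by_feature.items():
        for g in groups:
            labels[g].add(FEATURE_TO_LABEL[feature])
    return dict(labels)


example = {
    "type": [frozenset({"b1", "b2", "b3"})],
    "color": [frozenset({"b1", "b2"}), frozenset({"b3"})],
    "singleton": [frozenset({"b3"})],
}
labels = assign_labels(example)
print(sorted(labels[frozenset({"b3"})]))  # ['color', 'space']
```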
      <Paragraph position="5"> [Table: feature, label, recognized groups]
Figure 4: makeSOG
makeSOG()
  SOG = [];                                  # list of groups and symbols
  All = getAll();                            # total set
  add(All, SOG);                             # add All to SOG
  TypeList = getAllTypes(All);               # list of all object types
  TargetType = getType(Target);              # type of the target
  TargetSaliency = saliency(TargetType);     # saliency of the target type
  for each Type in TypeList do               # {Table, Plant, Ball}
    if saliency(Type) ≥ TargetSaliency then  # saliency: Table &gt; Plant &gt; Ball
      Group = getTypeGroup(Type);            # get the type group of Type
      extend(SOG, Group);
    end if
  end for</Paragraph>
    </Section>
    <Section position="2" start_page="74" end_page="74" type="sub_section">
      <SectionTitle>
3.2 Step 2: SOG generation
</SectionTitle>
      <Paragraph position="0"> The next step is generating SOGs, which corresponds to so-called content planning in natural language generation. Figures 4, 5, and 6 show the algorithm for making SOGs.</Paragraph>
      <Paragraph position="1"> The three variables Target, AllGroups, and SOGList defined in Figure 4 are global. Target holds the target object to which the referring expression refers. AllGroups holds the set of all groups recognized in Step 1. Given Target and AllGroups, function makeSOG enumerates possible SOGs in a depth-first manner and stores them in SOGList.</Paragraph>
      <Paragraph position="2"> makeSOG (Figure 4): makeSOG starts with a list (SOG) that contains the total set of objects in the domain. It chooses the groups of objects that are as salient as or more salient than the target object, and calls function extend for each of these groups.</Paragraph>
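The saliency filter in makeSOG keeps the type groups whose type is at least as salient as the target's. The sketch below is minimal. It assumes the ranking Table, Plant, Ball from the comment in Figure 4, encoded as arbitrary integers, and the function name is hypothetical.

```python
from typing import Dict, FrozenSet, List

# Order taken from Figure 4 (Table, then Plant, then Ball); the numbers only encode that order.
SALIENCY = {"table": 3, "plant": 2, "ball": 1}


def starting_type_groups(type_groups: Dict[str, FrozenSet[str]], target_type: str) -> List[str]:
    """Types whose groups makeSOG would pass to extend: saliency at least that of the target type."""
    threshold = SALIENCY[target_type]
    return [t for t in type_groups if SALIENCY[t] >= threshold]


type_groups = {
    "table": frozenset({"t1", "t2"}),
    "plant": frozenset({"p1"}),
    "ball": frozenset({"b1", "b2", "b3"}),
}
print(starting_type_groups(type_groups, "plant"))  # ['table', 'plant']
```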
      <Paragraph position="3"> extend (Figure 5): Given an SOG and a group to be added to it, function extend extends the SOG with the group once for each label attached to the group. Each extension is done by creating a copy of the given SOG and appending to it the intra-group relation symbol defined in Section 2.2 that corresponds to the label, followed by the group. Finally, it calls search with the copy.</Paragraph>
      <Paragraph position="4"> search (Figure 6): This function takes an SOG as its argument. Depending on the last group in the SOG (LastGroup), it extends the SOG as described below.</Paragraph>
      <Paragraph position="5"> 1. If LastGroup is a singleton of the target object, append SOG to SOGList and return.</Paragraph>
      <Paragraph position="6"> 2. If LastGroup is a singleton of a non-target object, find the groups that contain the target object and satisfy the following three conditions: (a), (b) and (c).</Paragraph>
      <Paragraph position="7"> (a) All objects in the group are located in the same direction from the object of LastGroup (the reference). The possible directions are &quot;back&quot;, &quot;back right&quot;, &quot;right&quot;, &quot;front right&quot;, &quot;front&quot;, &quot;front left&quot;, &quot;left&quot;, &quot;back left&quot;, and &quot;on&quot;. The direction is determined from the coordinate values of the objects and is assigned to the group for use in surface realization.</Paragraph>
      <Paragraph position="8"> (b) No object of the same type is located between the group and the reference.</Paragraph>
      <Paragraph position="9"> (c) The group is not the total set of objects of some type.</Paragraph>
      <Paragraph position="10"> Then, for each of these groups, make a copy of the SOG, append &quot;=&quot; and the group to the copy, and call search recursively with the new SOG.</Paragraph>
      <Paragraph position="11"> 3. If LastGroup contains the target object together with other objects, let NewG be the intersection of LastGroup and each group in AllGroups, and copy the labels of that group to NewG. If NewG contains the target object, call function extend unless Checked already contains NewG.</Paragraph>
      <Paragraph position="12"> 4. If LastGroup contains only non-target objects, call function extend for each group in AllGroups that is contained in LastGroup.
Figure 6: search
search(SOG)
  LastGroup = getLastElement(SOG);               # get the rightmost group in SOG
  Card = getCardinality(LastGroup);
  if Card == 1 then
    if containsTarget(LastGroup) then            # check if LastGroup contains the target
      add(SOG, SOGList);
    else
      GroupList = searchTargetGroups(LastGroup); # find groups containing the target
      for each Group in GroupList do
        SOGcopy = copy(SOG);
        add(=, SOGcopy);
        add(Group, SOGcopy);
        search(SOGcopy);
      end for
    end if
  elsif containsTarget(LastGroup) then
    Checked = [ ];
    for each Group in AllGroups do
      NewG = Intersect(Group, LastGroup);        # make intersection
      Labels = getLabels(Group);
      setLabels(Labels, NewG);                   # copy labels from Group to NewG
      if containsTarget(NewG) &amp; !contains(Checked, NewG) then
        add(NewG, Checked);
        extend(SOG, NewG);                       # extend with the labeled intersection
      end if
    end for
  else
    for each Group in AllGroups do
      if contains(LastGroup, Group) then
        extend(SOG, Group);
      end if
    end for
  end if</Paragraph>
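The four cases of search, together with extend and the makeSOG loop, can be sketched as a runnable depth-first enumerator. This is a hypothetical Python reading of Figures 4 to 6, not the authors' code. The candidate groups for case 2 (conditions (a) to (c)) come from a caller-supplied function. extend is called with the intersection NewG in case 3, as the prose describes. We also add proper-subset checks in cases 3 and 4, as our own assumption, so that the recursion always terminates.

```python
from typing import Callable, Dict, FrozenSet, List, Set, Tuple, Union

Group = FrozenSet[str]
Relation = Tuple[str, str]          # ("intra", label) or ("inter", "=")
SOG = List[Union[Group, Relation]]


def enumerate_sogs(target: str,
                   all_groups: Dict[Group, Set[str]],
                   total: Group,
                   start_groups: List[Group],
                   inter_candidates: Callable[[str], List[Group]]) -> List[SOG]:
    """Depth-first enumeration of SOGs, loosely following makeSOG, extend, and search."""
    sog_list: List[SOG] = []

    def extend(sog: SOG, group: Group, labels: Set[str]) -> None:
        # One new SOG per label attached to the group (sorted for a stable order).
        for label in sorted(labels):
            search(sog + [("intra", label), group])

    def search(sog: SOG) -> None:
        last = sog[-1]
        if len(last) == 1:
            if target in last:                  # case 1: reached the target
                sog_list.append(sog)
            else:                               # case 2: last group is a reference object
                (reference,) = last
                for group in inter_candidates(reference):
                    search(sog + [("inter", "="), group])
        elif target in last:                    # case 3: narrow down by intersection
            checked: Set[Group] = set()
            for group, labels in all_groups.items():
                new_g = group.intersection(last)
                # Our addition: require a proper subset so the recursion terminates.
                if target in new_g and new_g != last and new_g not in checked:
                    checked.add(new_g)
                    extend(sog, new_g, labels)
        else:                                   # case 4: narrow down a non-target group
            for group, labels in all_groups.items():
                if group.issubset(last) and group != last:
                    extend(sog, group, labels)

    for group in start_groups:                  # the makeSOG loop
        extend([total], group, all_groups[group])
    return sog_list


# Hypothetical scene: one table t1, two balls b1 and b2; the target b1 lies on t1.
t1, b1, b2 = "t1", "b1", "b2"
total = frozenset({t1, b1, b2})
groups = {
    total: {"all"},
    frozenset({t1}): {"type", "space"},
    frozenset({b1, b2}): {"type"},
    frozenset({b1}): {"color", "space"},
    frozenset({b2}): {"space"},
}
sogs = enumerate_sogs(b1, groups, total, [frozenset({t1}), frozenset({b1, b2})],
                      lambda ref: [frozenset({b1})] if ref == t1 else [])
print(len(sogs))  # 4
```

In this toy scene the sketch yields four SOGs. Two reach b1 through the table ({t1} under the type or space label, followed by =), and two narrow {b1, b2} down to {b1} (color or space label).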
    </Section>
    <Section position="3" start_page="74" end_page="77" type="sub_section">
      <SectionTitle>
3.3 Step 3: Surface realization
</SectionTitle>
      <Paragraph position="0"> A referring expression is generated by deterministically assigning a linguistic expression to each element in an SOG according to Rules 1 and 2.</Paragraph>
      <Paragraph position="1"> As Japanese is a head-final language, simple concatenation of element expressions yields a well-formed noun phrase. Rule 1 generates expressions for groups, and Rule 2 generates expressions for relations. Each rule consists of several subrules, which are applied in the order listed.</Paragraph>
      <Paragraph position="2"> [Rule 1]: Realization of groups
Rule 1.1 The total set ({all}) is not realized. Funakoshi et al. (2004) collected referring expressions from human subjects through experiments and found that humans rarely mentioned the total set. Following this observation, we do not realize the total set.
Rule 1.2 Realize the type name for a singleton. Type is realized as a noun, and only for a singleton, because the type feature is used first to narrow down the group, and the succeeding groups consist of objects of the same type until the singleton is reached. When the singleton is not the last element of the SOG, the particle &quot;no&quot; is added.</Paragraph>
      <Paragraph position="3"> Rule 1.3 The total set of the same type objects is not realized.</Paragraph>
      <Paragraph position="4"> The reason is the same as for Rule 1.1.</Paragraph>
      <Paragraph position="5"> Rule 1.4 A group followed by the relation $\supset_{space}$ is realized as &quot;[cardinality] [type] no-uti (among)&quot;, e.g., &quot;futatu-no (two) tukue (desk) no-uti (among)&quot;. A group followed by the relation = is realized as &quot;[cardinality] [type] no&quot;.</Paragraph>
      <Paragraph position="6"> (Footnote) Although different languages require different surface realization rules, we presume that perceptual grouping and SOG generation (Steps 1 and 2) are applicable to other languages as well.</Paragraph>
      <Paragraph position="7"> When consecutive groups are connected by relations other than the spatial ones ($\supset_{space}$ and =), they can be realized as a sequence of relation expressions preceding the noun (type name). For example, the expression &quot;the red ball among big balls&quot; can be simplified to &quot;the big red ball&quot;.</Paragraph>
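Rule 1 can be sketched as a function that realizes one group given the relation that follows it. Several parts are hypothetical: the GroupInfo record, the romanized type names, and the small COUNTERS table, where only futatu-no is attested in the text. A group followed by a non-spatial relation returns an empty string, on the assumption that its relations are folded into modifiers before the noun.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GroupInfo:
    size: int                     # cardinality of the group
    type_name: str                # romanized Japanese noun for the objects' type
    is_total: bool = False        # the total set {all}
    is_type_total: bool = False   # all objects of this type

# Hypothetical counter words; extend as needed (a missing size raises KeyError).
COUNTERS = {2: "futatu-no", 3: "mittu-no"}


def realize_group(g: GroupInfo, next_relation: Optional[str]) -> str:
    """next_relation is 'space', '=', another label, or None if the group is last."""
    if g.is_total or g.is_type_total:          # Rules 1.1 and 1.3
        return ""
    if g.size == 1:                            # Rule 1.2
        return g.type_name if next_relation is None else g.type_name + " no"
    if next_relation == "space":               # Rule 1.4, spatial subsumption
        return f"{COUNTERS[g.size]} {g.type_name} no-uti"
    if next_relation == "=":                   # Rule 1.4, inter-group relation
        return f"{COUNTERS[g.size]} {g.type_name} no"
    return ""  # assumption: other relations become modifiers before the noun


print(realize_group(GroupInfo(2, "tukue"), "space"))  # futatu-no tukue no-uti
```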
      <Paragraph position="9"> geometric relations among objects, generate one of four directional expressions &amp;quot;{migi, hidari, temae, oku} no ({right, left, front, back})&amp;quot;.</Paragraph>
      <Paragraph position="11"> geometric relations among objects, generate one of eight directional expressions &amp;quot;itiban {migi, hidari, temae, oku, migi temae, hidari temae, migi oku, hidari oku} no ({right, left, front, back, front right, front left, back right, back left}-most)&amp;quot; if applicable. If none of these expressions is applicable, generate expression &amp;quot;mannaka no (middle)&amp;quot; if applicable. Otherwise, generate one of four expressions &amp;quot;{hidari, migi, temae, oku} kara j-banme no (j-th from {left, right, front, back})&amp;quot;.</Paragraph>
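The order of the superlative, middle, and ordinal expressions in the subrule above can be sketched along a single left-right axis. This simplification is ours. The rule itself also covers the front-back axis and the diagonal directions, while this sketch always counts from the left and ignores ties.

```python
from typing import Dict


def horizontal_expression(target: str, xs: Dict[str, float]) -> str:
    """Left-right axis only: 'itiban ... no', then 'mannaka no', then 'hidari kara j-banme no'."""
    order = sorted(xs, key=xs.get)          # object ids from left to right
    j = order.index(target) + 1             # 1-based position counted from the left
    if j == 1:
        return "itiban hidari no"
    if j == len(order):
        return "itiban migi no"
    if len(order) % 2 == 1 and j == (len(order) + 1) // 2:
        return "mannaka no"
    return f"hidari kara {j}-banme no"


print(horizontal_expression("b", {"a": 0.1, "b": 0.5, "c": 0.9}))            # mannaka no
print(horizontal_expression("c", {"a": 0.0, "b": 1.0, "c": 2.0, "d": 3.0}))  # hidari kara 3-banme no
```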
      <Paragraph position="12"> If $|G_{i+1}| \geq 2$, based on the geometric relations among objects, generate one of eight directional expressions &quot;{migi, hidari, temae, oku, migi temae, hidari temae, migi oku, hidari oku} no ({right, left, front, back, front right, front left, back right, back left})&quot;.</Paragraph>
      <Paragraph position="14"> $|G_i| = 1$ should hold because of function search in Step 2. According to the direction assigned by search, generate one of nine expressions: &quot;{migi, hidari, temae, oku, migi temae, hidari temae, migi oku, hidari oku, ue} no ({right, left, front, back, front right, front left, back right, back left, on})&quot;.</Paragraph>
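Condition (a) of search and the nine inter-group expressions can be sketched together. The vector from the reference to each object is mapped onto eight 45-degree sectors, and a group qualifies only if all its members fall into the same sector or lie on the reference. The coordinate convention, the sector boundaries, and the on_reference flag are assumptions. The paper states only that directions are derived from coordinate values.

```python
import math
from typing import Iterable, Optional, Tuple

# Assumed convention: x grows to the right, y grows toward the back of the scene.
SECTORS = ["right", "back right", "back", "back left",
           "left", "front left", "front", "front right"]

# Romanized Japanese for the nine directions listed in the rule above.
PHRASES = {
    "right": "migi no", "left": "hidari no", "front": "temae no", "back": "oku no",
    "front right": "migi temae no", "front left": "hidari temae no",
    "back right": "migi oku no", "back left": "hidari oku no", "on": "ue no",
}


def direction(ref: Tuple[float, float], obj: Tuple[float, float]) -> str:
    """Classify the vector from ref to obj into one of eight 45-degree sectors."""
    angle = math.degrees(math.atan2(obj[1] - ref[1], obj[0] - ref[0])) % 360
    return SECTORS[int(((angle + 22.5) % 360) // 45)]


def common_direction(ref: Tuple[float, float], group: Iterable[Tuple[float, float]],
                     on_reference: bool = False) -> Optional[str]:
    """Return the single direction shared by every object in the group, else None."""
    if on_reference:
        return "on"
    dirs = {direction(ref, p) for p in group}
    return dirs.pop() if len(dirs) == 1 else None


d = common_direction((0.0, 0.0), [(1.0, 1.0), (2.0, 2.5)])
print(d, PHRASES[d])  # back right migi oku no
```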
      <Paragraph position="15"> Figure 8 shows the expressions generated from the first three SOGs shown in Figure 7. The numbers in parentheses denote coindexes of fragments between the SOGs and the realized expressions.</Paragraph>
    </Section>
    <Section position="4" start_page="77" end_page="77" type="sub_section">
      <SectionTitle>
3.4 Step 4: Scoring
</SectionTitle>
      <Paragraph position="0"> We assign a score to each expression by taking into account the relations used in the expression and the length of the expression.</Paragraph>
      <Paragraph position="1"> First, we assign a cost in the range [0, 1] to each relation in the given SOG. The costs of relations are determined as follows. These costs conform to the priorities of features described by Dale and Reiter (1995).</Paragraph>
      <Paragraph position="2"> $\supset_{type}$: no cost (ignored)
$\supset_{shape}$: 0.2
$\supset_{color}$: 0.4
$\supset_{size}$: big(est) 0.6, small(est) 0.8, middle 1.0
$\supset_{space}$ and =: cost functions are defined according to the potential functions proposed by Tokunaga et al. (2005). The cost of the relation &quot;on&quot; is fixed to 0.</Paragraph>
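The cost table above maps directly onto a lookup, and the relation cost $C_{rel}$ described next is the average over the non-type relations. This sketch does not reproduce the spatial costs from the potential functions of Tokunaga et al. (2005), so they must be passed in. Returning 0.0 when only type relations are present is our own assumption.

```python
from typing import List, Optional, Tuple

FIXED_COSTS = {"shape": 0.2, "color": 0.4}
# "big(est)" and "small(est)" in the table cover both the plain and superlative forms.
SIZE_COSTS = {"big": 0.6, "biggest": 0.6, "small": 0.8, "smallest": 0.8, "middle": 1.0}

# A relation is (label, value, spatial_cost); spatial_cost is only used for spatial relations.
Rel = Tuple[str, Optional[str], Optional[float]]


def relation_cost(label: str, value: Optional[str] = None,
                  spatial_cost: Optional[float] = None) -> Optional[float]:
    """Cost of one relation, or None for type relations, which are ignored."""
    if label == "type":
        return None
    if label in FIXED_COSTS:
        return FIXED_COSTS[label]
    if label == "size":
        return SIZE_COSTS[value]
    if value == "on":
        return 0.0
    if spatial_cost is None:
        raise ValueError("spatial relations need a cost from a potential function")
    return spatial_cost


def c_rel(relations: List[Rel]) -> float:
    """Relation cost C_rel: the average over all non-ignored relations."""
    costs = [c for c in (relation_cost(*r) for r in relations) if c is not None]
    return sum(costs) / len(costs) if costs else 0.0


example = [("type", None, None), ("color", None, None), ("size", "small", None)]
print(round(c_rel(example), 3))  # 0.6
```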
      <Paragraph position="3"> Then, the relation cost $C_{rel}$ is obtained by averaging the costs of the relations. The cost of surface length, $C_{len}$, is calculated by</Paragraph>
      <Paragraph position="5"> where the length of an expression is measured by the number of characters.</Paragraph>
      <Paragraph position="6"> Using these costs, the score of an expression is</Paragraph>
    </Section>
    <Section position="5" start_page="77" end_page="78" type="sub_section">
      <SectionTitle>
4.1 Experiments
</SectionTitle>
      <Paragraph position="0"> We conducted two experiments to evaluate the expressions generated by the proposed method.</Paragraph>
      <Paragraph position="1"> Both experiments used the same 18 subjects and the same 20 object arrangements, which were generated automatically. For each arrangement, all factors (number of objects, positions of objects, attributes of objects, and the target object) were decided randomly in advance so as to satisfy the following conditions: (1) the proposed method can generate more than five expressions for the given target, and (2) there are more than two other objects of the same type as the target.</Paragraph>
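The paper does not say how arrangements satisfying both conditions were obtained. One simple possibility is rejection sampling, sketched below under heavy simplification. An arrangement is reduced to a map from hypothetical object ids to types, and the number of generable expressions comes from a stand-in counter rather than from the actual generator.

```python
import random
from typing import Callable, Dict, List, Tuple

# Hypothetical, simplified arrangement: object id -> type (positions and attributes omitted).
Arrangement = Dict[str, str]
Counter = Callable[[Arrangement, str], int]


def acceptable(arr: Arrangement, target: str, count_expressions: Counter) -> bool:
    """Condition (1): more than five expressions; condition (2): more than two same-type others."""
    same_type_others = sum(1 for o, t in arr.items() if o != target and t == arr[target])
    return count_expressions(arr, target) > 5 and same_type_others > 2


def make_random(rng: random.Random) -> Tuple[Arrangement, str]:
    n = rng.randint(4, 10)
    arr = {f"o{i}": rng.choice(["table", "plant", "ball"]) for i in range(n)}
    return arr, rng.choice(sorted(arr))


def sample_arrangements(n: int, count_expressions: Counter,
                        rng: random.Random) -> List[Tuple[Arrangement, str]]:
    """Draw random arrangements until n of them satisfy both conditions."""
    accepted: List[Tuple[Arrangement, str]] = []
    while len(accepted) != n:
        arr, target = make_random(rng)
        if acceptable(arr, target, count_expressions):
            accepted.append((arr, target))
    return accepted


def stub_counter(arr: Arrangement, target: str) -> int:
    """Stand-in for illustration only; a real count would come from Steps 1 to 3."""
    return len(arr)


picked = sample_arrangements(20, stub_counter, random.Random(0))
print(len(picked))  # 20
```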
      <Paragraph position="2"> Experiment 1 was designed to evaluate the ability of the expressions to identify the targets. Subjects were presented with one arrangement at a time together with the highest-scoring generated referring expression, and were instructed to choose the object referred to by the expression. Figure 9 is an example of the visual stimuli used in Experiment 1. Each subject responded to all 20 arrangements.</Paragraph>
      <Paragraph position="3"> Experiment 2 was designed to evaluate the validity of the scoring function described in Section 3.4. Subjects were presented with one arrangement at a time, with the target marked, together with the five highest-scoring generated expressions referring to it, and were asked to choose the best of the five. Figure 10 is an example of the visual stimuli used in Experiment 2. Each subject responded to all 20 arrangements. The expressions used in Experiment 2 include those used in Experiment 1.</Paragraph>
    </Section>
  </Section>
</Paper>