File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/06/p06-2033_metho.xml
Size: 22,524 bytes
Last Modified: 2025-10-06 14:10:22
<?xml version="1.0" standalone="yes"?> <Paper uid="P06-2033"> <Title>Conceptual Coherence in the Generation of Referring Expressions</Title> <Section position="4" start_page="255" end_page="259" type="metho"> <SectionTitle> 2 Empirical evidence </SectionTitle> <Paragraph position="0"> We take as paradigmatic the case where a plural reference involves disjunction/union, that is, has the logical form λx(p(x) ∨ q(x)), realised as a description of the form the N1 and the N2. By hypothesis, the case where all referents can be described using identical properties (logically, a conjunction) is a limiting case of conceptual coherence (CC).</Paragraph> <Paragraph position="1"> Previous work on plural anaphor processing has shown that pronoun resolution is easier when antecedents are ontologically similar (e.g. all humans) (Kaup et al., 2002; Koh and Clifton, 2002).</Paragraph> <Paragraph position="2"> Reference to a heterogeneous set increases processing difficulty.</Paragraph> <Paragraph position="3"> Our experiments extended these findings to full definite NP reference. Throughout, we used a distributional definition of similarity, as defined by Lin (1998), which was found to be highly correlated with people's preferences for disjunctive descriptions (Gatt and van Deemter, 2005). The similarity of two arbitrary objects a and b is a function of the information gained by giving a joint description of a and b in terms of what they have in common, compared to describing a and b separately.</Paragraph> <Paragraph position="4"> The relevant data in the lexical domain are the grammatical environments in which words occur.</Paragraph> <Paragraph position="5"> This information is represented as a set of triples ⟨rel, w, w′⟩, where rel is a grammatical relation, w the word of interest and w′ its co-argument in rel (e.g. ⟨premodifies, dog, domestic⟩). Let F(w) be the set of such triples for w. The information content of this set is defined as the mutual information I(F(w)) (Church and Hanks, 1990). The similarity of two words w1 and w2, of the same grammatical category, is:</Paragraph> <Paragraph position="6"> s(w1, w2) = 2 × I(F(w1) ∩ F(w2)) / (I(F(w1)) + I(F(w2))) (1)</Paragraph> <Paragraph position="7"> For example, if premodifies is one of the relevant grammatical relations, then dog and cat might occur several times in a corpus with the same premodifiers (tame, domestic, etc.). Thus, s(dog, cat) is large because the two words often occur in the same contexts, and there is considerable information gain in a description of their common features.</Paragraph> <Paragraph position="8"> Rather than using a hand-crafted ontology to infer similarity, this definition looks at real language use. It covers ontological similarity to the extent that ontologically similar objects are talked about in the same contexts, but also cuts across ontological distinctions (for example, newspaper and journalist might turn out to be very similar). [Figure 2.1: Example materials. Columns: Condition, a, b, c, distractor. HDS: spanner, chisel, plug, thimble. LDS: toothbrush, knife, ashtray, clock.]</Paragraph> <Paragraph position="9"> We use the information contained in the SketchEngine database (Kilgarriff, 2003), a large-scale implementation of Lin's theory based on the BNC, which contains grammatical triples in the form of Word Sketches for each word, with each triple accompanied by a salience value indicating the likelihood of occurrence of the word with its argument in a grammatical relation.
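To make definition (1) concrete, the following minimal Python sketch computes Lin-style similarity over grammatical triples. The toy TRIPLES list and the raw-count PMI estimate are invented for illustration; a real system would draw counts from a parsed corpus such as the BNC, as the Sketch Engine does.

```python
import math

# Toy stand-in for a parsed corpus of grammatical triples <rel, w, w'>;
# these five records are invented purely for illustration.
TRIPLES = [
    ("premodifies", "dog", "tame"),
    ("premodifies", "dog", "domestic"),
    ("premodifies", "cat", "tame"),
    ("premodifies", "cat", "domestic"),
    ("premodifies", "spanner", "adjustable"),
]

def features(w):
    """F(w): the (rel, co-argument) contexts in which w occurs."""
    return {(rel, x) for (rel, ww, x) in TRIPLES if ww == w}

def mi(w, feat):
    """Pointwise mutual information of w with a context, estimated from raw
    counts within the relation: log p(w, x | rel) / (p(w | rel) p(x | rel))."""
    rel, x = feat
    n = sum(1 for t in TRIPLES if t[0] == rel)                   # C(rel, *, *)
    n_wx = sum(1 for t in TRIPLES if t == (rel, w, x))           # C(rel, w, x)
    n_w = sum(1 for t in TRIPLES if t[0] == rel and t[1] == w)   # C(rel, w, *)
    n_x = sum(1 for t in TRIPLES if t[0] == rel and t[2] == x)   # C(rel, *, x)
    return math.log(n_wx * n / (n_w * n_x)) if n_wx else 0.0

def lin_similarity(w1, w2):
    """s(w1, w2): information in the contexts shared by w1 and w2,
    normalised by the information in each word's contexts separately."""
    shared = features(w1) & features(w2)
    common = sum(mi(w1, f) + mi(w2, f) for f in shared)
    total = (sum(mi(w1, f) for f in features(w1))
             + sum(mi(w2, f) for f in features(w2)))
    return common / total if total else 0.0

print(lin_similarity("dog", "cat"))      # 1.0 on this toy data: identical contexts
print(lin_similarity("dog", "spanner"))  # 0.0: no shared contexts
```

Dog and cat come out maximally similar here only because the toy corpus gives them identical contexts; on realistic data the score falls strictly between 0 and 1.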
Each word also has a thesaurus entry, containing a ranked list of words of the same category, ordered by their similarity to the head word.</Paragraph> <Section position="1" start_page="255" end_page="256" type="sub_section"> <SectionTitle> 2.1 Experiment 1 </SectionTitle> <Paragraph position="0"> In Experiment 1, participants were placed in a situation where they were buying objects from an on-line store. They saw scenarios containing four pictures of objects, three of which (the targets) were identically priced. Participants referred to them by completing a 2-sentence discourse: S1: The ⟨object1⟩ and the ⟨object2⟩ cost ⟨amount⟩.</Paragraph> <Paragraph position="1"> S2: The ⟨object3⟩ also costs ⟨amount⟩.</Paragraph> <Paragraph position="2"> If similarity is a constraint on referential coherence in plural references, then when two targets are similar (and dissimilar to the third), a plural reference to them in S1 should be more likely, with the third entity referred to in S2.</Paragraph> <Paragraph position="3"> Materials, design and procedure All the pictures were artefacts selected from a set of drawings normed in a picture-naming task with British English speakers (Barry et al., 1997).</Paragraph> <Paragraph position="4"> Each trial consisted of the four pictures arranged in an array on a screen. Of the three targets (a, b, c), c was always an object whose name in the norms was dissimilar to that of a and b. The semantic similarity of (nouns denoting) a and b was manipulated as a factor with two levels: High Distributional Similarity (HDS) meant that b occurred among the top 50 most similar items in a's thesaurus entry; Low Distributional Similarity (LDS) meant that b did not occur in the top 500 entries for a. Examples are shown in Figure 2.1.</Paragraph> <Paragraph position="5"> Visual Similarity (VS) of a and b was also controlled. Pairs of pictures were first normed with a group who rated them on a 10-point scale based on their visual properties. High-VS (HVS) pairs had a mean rating ≥ 6; Low-VS (LVS) pairs had mean ratings ≤ 2. Two sets of materials were constructed, for a total of 2 (DS) × 2 (VS) × 2 = 8 trials.</Paragraph> <Paragraph position="6"> 29 self-reported native or fluent speakers of English completed the experiment over the web. To complete the sentences, participants clicked on the objects in the order they wished to refer to them. Nouns appeared in the next available space.[2]</Paragraph> <Paragraph position="7"> Results and discussion Responses were coded according to whether objects a and b were referred to in the plural subject of S1 (a + b responses) or not (a − b responses). If our hypothesis is correct, there should be a higher proportion of a + b responses in the HDS condition. We did not expect an effect of VS. In what follows, we report by-subjects Friedman analyses (χ²₁), by-items analyses (χ²₂), and by-subjects sign tests (Z) on proportions of responses for pairwise comparisons.</Paragraph> <Paragraph position="8"> Response frequencies across conditions differed reliably by subjects (χ²₁ = 46.124, p < .001).</Paragraph> <Paragraph position="9"> The frequency of a + b responses in S1 was reliably higher than that of a − b responses in the HDS condition (χ²₂ = 41.371, p < .001), but not in the HVS condition (χ²₂ = 1.755, ns).
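As an aside on the design just described, the rank criterion behind the HDS and LDS conditions is straightforward to operationalise. The sketch below assumes ranked thesaurus entries of the kind the Sketch Engine exposes; the THESAURUS data and the name ds_condition are ours, invented for illustration.

```python
# Hypothetical thesaurus entries: for each head word, other words of the
# same category ranked by distributional similarity (the Sketch Engine
# provides such entries; the neighbours listed here are invented).
THESAURUS = {
    "spanner": ["wrench", "screwdriver", "chisel", "hammer", "pliers"],
    "toothbrush": ["comb", "flannel", "razor", "sponge", "towel"],
}

def ds_condition(a, b, thesaurus=THESAURUS):
    """Classify the pair (a, b) by the experiment's rank criterion:
    HDS if b is among a's top 50 neighbours, LDS if b is absent from
    a's top 500, and neither (unusable as a stimulus) otherwise."""
    entry = thesaurus.get(a, [])
    if b in entry[:50]:
        return "HDS"
    if b not in entry[:500]:
        return "LDS"
    return None

print(ds_condition("spanner", "chisel"))    # HDS: within the top 50
print(ds_condition("toothbrush", "knife"))  # LDS: not in the top 500
```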
Pairwise comparisons between HDS and LDS showed a significantly higher proportion of a + b responses in the former (Z = 4.48, p < .001); the difference was only barely significant across VS conditions.</Paragraph> <Paragraph position="11"> The results show that, given a clear choice of entities to refer to in a plurality, people are more likely to group similar entities in a plural description. However, these results raise two further questions. First, given a choice of distinguishing properties for the individuals making up a target set, will participants follow the predictions of the CC? (In other words, is distributional similarity relevant for content determination?) Second, does the similarity effect carry over to modifiers, such as adjectives, or is the CC exclusively a constraint on types? [2] Earlier replications involving typing yielded parallel results and close agreement between the words used and those predicted by the picture norms.</Paragraph> <Paragraph position="12"> [Figure 2.2 (excerpt): Three millionaires with a passion for antiques were spotted dining at a London restaurant.]</Paragraph> </Section> <Section position="2" start_page="256" end_page="258" type="sub_section"> <SectionTitle> 2.2 Experiment 2 </SectionTitle> <Paragraph position="0"> Experiment 2 was a sentence continuation task, designed to closely approximate content determination in GRE. Participants saw a series of discourses in which three entities (e1, e2, e3) were introduced, each with two distinguishing properties. The final sentence in each discourse had a missing plural subject NP referring to two of these.</Paragraph> <Paragraph position="1"> The context made it clear which of the three entities had to be referred to. Our hypothesis was that participants would prefer to use semantically similar properties for the plural reference, even if dissimilar properties were also available.</Paragraph> <Paragraph position="2"> Materials, design and procedure Materials consisted of 24 discourses, such as those in Figure 2.2. After an initial introductory sentence, the three entities were introduced in separate sentences.</Paragraph> <Paragraph position="3"> In all discourses, the pairs {e1, e2} and {e2, e3} could be described using either pairwise similar or dissimilar properties (similar pairs are coindexed in the figure). In half the discourses, the distinguishing properties of each entity were nouns; thus, although all three entities belonged to the same ontological category (e.g. all human), they had distinct types (e.g. duke, prince, bachelor). In the other half, entities were of the same type, that is, the NPs introducing them had the same nominal head, but had distinguishing adjectival modifiers.</Paragraph> <Paragraph position="4"> For counterbalancing, two versions of each discourse were constructed, such that, if {e1, e2} was the target set in Version 1, then {e2, e3} was the target in Version 2. Twelve filler items requiring singular reference in the continuation were also included. The order in which the entities were introduced was randomised across participants, as was the order of trials. The experiment was completed by 18 native speakers of English, selected from the Aberdeen NLG Group database. They were randomly assigned to either Version 1 or 2.</Paragraph> <Paragraph position="5"> Results and discussion Responses were coded 1 if the semantically similar properties were used (e.g. the prince and the duke in Fig. 2.2); 2 if the similar properties were used together with other properties (e.g.
the prince and the bachelor duke); 3 if a superordinate term was used to replace the similar properties (e.g. the noblemen); 4 otherwise (e.g. The duke and the collector).</Paragraph> <Paragraph position="6"> Response types differed significantly in the nominal condition both by subjects (χ²₁ = 45.89, p < .001) and by items (χ²₂ = 287.9, p < .001). Differences were also reliable in the modifier condition (χ²₁ = 36.3, p < .001; χ²₂ = 199.2, p < .001). However, the trends across conditions were opposed, with more items in the 1 response category in the nominal condition (53.7%) and more in the 4 category in the modifier condition (47.2%). Recoding responses as binary ('similar' = 1, 2, 3; 'dissimilar' = 4) showed a significant difference in proportions for the nominal category, but not for the modifier category.</Paragraph> <Paragraph position="8"> Pairwise comparisons showed a significantly larger proportion of 1 (Z = 2.7, p = .007) and 2 responses (Z = 2.54, p = .01) in the nominal compared to the modifier condition.</Paragraph> <Paragraph position="9"> The results suggest that in a referential task, participants are likely to conform to the CC, but that the CC operates mainly on nouns, and less so on (adjectival) modifiers. Nouns (or types, as we shall sometimes call them) have the function of categorising objects; thus similar types facilitate the mental representation of a plurality in a conceptually coherent way. According to the definition in (1), this is because similarity of two types implies a greater likelihood of their being used in the same predicate-argument structures. As a result, it is easier to map the elements of a plurality to a common role in a sentence. A related proposal has been made by Moxey and Sanford (1995), whose Scenario Mapping Principle holds that a plural reference is licensed to the extent that the elements of the plurality can be mapped to a common role in the discourse. This is influenced by how easy it is to conceive of such a role for the referents. Our results can be viewed as providing a handle on the notion of 'ease of conception of a common role'; in particular, we propose that likelihood of occurrence in the same linguistic contexts directly reflects the extent to which two types can be mapped to a single plural role.</Paragraph> <Paragraph position="10"> As regards modifiers, while it is probably premature to suggest that CC plays no role in modifier selection, it is likely that modifiers play a different role from nouns. [Table 3: the example domain. Columns: id, base type, occupation, specialisation, girth.] Previous work has shown that restrictions on the plausibility of adjective-noun combinations exist (Lapata et al., 1999), and that using unlikely combinations (e.g. the immaculate kitchen rather than the spotless kitchen) impacts processing in online tasks (Murphy, 1984). Unlike types, which have a categorisation function, modifiers have the role of adding information about an element of a category. This would partially explain the experimental results: when elements of a plurality have identical types (as in the modifier version of our experiment), the CC is already satisfied, and selection of modifiers would presumably depend on respecting adjective-noun combination restrictions. Further research is required to verify this, although the algorithm presented below makes use of the Sketch Engine database to take modifier-noun combinations into account.</Paragraph> <Paragraph position="11"> 3 An algorithm for referring to sets Our next task is to port the results to GRE.
The main ingredient for achieving conceptual coherence is the definition of semantic similarity. In what follows, all examples are drawn from the domain in Table 3.</Paragraph> <Paragraph position="12"> We make the following assumptions. There is a set U of domain entities, whose properties are specified in a KB as attribute-value pairs. We assume a distinction between types, that is, any property that can be realised as a noun, and modifiers, or non-types. Given a set of target referents R ⊆ U, the algorithm described below generates a description D in Disjunctive Normal Form (DNF), having the following properties: 1. Any disjunct in D contains a 'type' property, i.e. a property realisable as a head noun.</Paragraph> <Paragraph position="13"> 2. If D has two or more disjuncts, each a conjunction containing at least one type, then the disjoined types should be as similar as possible, given the information in the KB and the completeness requirement: that the algorithm find a distinguishing description whenever one exists.</Paragraph> <Paragraph position="14"> We first make our interpretation of the CC more precise. Let T be the set of types in the KB, and let s(t, t′) be the (symmetrical) similarity between any two types t and t′. These determine a semantic space S = ⟨T, s⟩. We define the notion of a perspective as follows.</Paragraph> <Paragraph position="15"> Definition 1. Perspective A perspective P is a convex subset of S; that is, any type at least as similar to two members of P as they are to each other is itself in P:</Paragraph> <Paragraph position="16"> ∀t, t′ ∈ P, ∀t″ ∈ T : (s(t, t″) ≥ s(t, t′) ∧ s(t′, t″) ≥ s(t, t′)) → t″ ∈ P</Paragraph> <Paragraph position="17"> The aims of the algorithm are to describe the elements of R using types from the same perspective or, failing that, to minimise the distance between the perspectives from which types are selected in the disjunctions of D. Distance between perspectives is defined below.</Paragraph> </Section> <Section position="3" start_page="258" end_page="258" type="sub_section"> <SectionTitle> 3.1 Finding perspectives </SectionTitle> <Paragraph position="0"> The system makes use of the SketchEngine database as its primary knowledge source. Since the definition of similarity applies to words rather than properties, the first step is to generate all possible lexicalisations of the available attribute-value pairs in the domain. In this paper, we simplify by assuming a one-to-one mapping between properties and words.</Paragraph> <Paragraph position="1"> Another requirement is to distinguish between type properties (the set T) and non-types (M).[3]</Paragraph> <Paragraph position="2"> The Thesaurus is used to find pairwise similarity of types in order to group them into related clusters. Word Sketches are used to find, for each type, the modifiers in the KB that are appropriate to the type, on the basis of the associated salience values.</Paragraph> <Paragraph position="3"> For example, in Table 3, e3 has plump as the value for girth, which combines more felicitously with man than with biologist.</Paragraph> <Paragraph position="4"> Types are clustered using the algorithm described in Gatt (2006). For each type t, the algorithm finds its nearest neighbour nt in semantic space. Clusters are then found by recursively grouping elements with their nearest neighbours.</Paragraph> <Paragraph position="5"> If t and t′ have a common nearest neighbour n, then {t, t′, n} is a cluster. Clearly, the resulting sets are convex in the sense of Definition 1. Each modifier is assigned to a cluster by finding in its Word Sketch the type with which it co-occurs with the greatest salience value.
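A rough Python sketch of this clustering step follows, assuming a pairwise similarity function s (for instance lin_similarity above). The merge strategy and helper names are our simplification of the recursive nearest-neighbour grouping, not the original implementation of Gatt (2006).

```python
# Sketch of perspective-finding by nearest-neighbour grouping, plus
# modifier assignment by salience; the type list and salience table
# are invented for illustration.
TYPES = ["lecturer", "professor", "woman", "man",
         "geologist", "physicist", "biologist", "chemist"]

def nearest_neighbour(t, types, s):
    """n_t: the type closest to t in the semantic space."""
    return max((t2 for t2 in types if t2 != t), key=lambda t2: s(t, t2))

def find_perspectives(types, s):
    """Group each type with its nearest neighbour; types sharing a
    common nearest neighbour fall into the same cluster."""
    clusters = []
    for t in types:
        n = nearest_neighbour(t, types, s)
        for c in clusters:
            if t in c or n in c:      # extend an existing cluster
                c.update({t, n})
                break
        else:
            clusters.append({t, n})   # start a new cluster
    return clusters

def assign_modifier(m, perspectives, salience):
    """Attach modifier m to the perspective of the type it combines
    with most felicitously, Word-Sketch style."""
    best_type = max(salience[m], key=salience[m].get)
    return next(p for p in perspectives if best_type in p)

# e.g. assign_modifier("plump", ps, {"plump": {"man": 12.3, "biologist": 0.4}})
```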
Thus, a cluster is a pair ⟨P, M′⟩, where P is a perspective and M′ ⊆ M. [3] T and M need not be disjoint, and entities can have more than one type property. [Figure 3.1: Cluster graph for the domain in Table 3. Clusters include T: {lecturer, professor}; T: {woman, man} with M: {plump, thin}; and T: {geologist, physicist, biologist, chemist}. Edges are weighted by the distance between perspectives.] The distance d(A, B) between two clusters A and B is defined straightforwardly in terms of the distance between their perspectives PA and PB:</Paragraph> <Paragraph position="6"> d(A, B) = min{1 − s(t, t′) : t ∈ PA, t′ ∈ PB} (2)</Paragraph> <Paragraph position="7"> Finally, a weighted, connected graph G = ⟨V, E, d⟩ is created, where V is the set of clusters and E is the set of edges, with edge weights defined as the semantic distance between perspectives. Figure 3.1 shows the graph constructed for the domain in Table 3.</Paragraph> <Paragraph position="8"> We now define the coherence of a description more precisely. Given a DNF description D, we shall say that a perspective P is realised in D if there is at least one type t ∈ P which occurs in D. Let PD be the set of perspectives realised in D. Since G is connected, PD determines a connected subgraph of G. The total weight of D, w(D), is the sum of the weights of the edges of this subgraph.</Paragraph> <Paragraph position="9"> Definition 2. Maximal coherence A description D is maximally coherent iff there is no description D′ coextensive with D such that w(D) > w(D′).</Paragraph> <Paragraph position="10"> (Note that several descriptions of the same referent may all be maximally coherent.)</Paragraph> </Section> <Section position="4" start_page="258" end_page="259" type="sub_section"> <SectionTitle> 3.2 Content determination </SectionTitle> <Paragraph position="0"> The core of the content determination procedure maintains the DNF description D as an associative array, such that for any r ∈ R, D[r] is a conjunction of properties true of r. Given a cluster ⟨P, M⟩, the procedure searches incrementally, first through P and then through M, selecting properties that are true of at least one referent and exclude some distractors, as in the IA (Dale and Reiter, 1995).</Paragraph> <Paragraph position="1"> By Definition 2, the task of the algorithm is to minimise the total weight w(D). If PD is the set of perspectives represented in D on termination, then maximal coherence would require PD to be the subgraph of G with the lowest total cost from which a distinguishing description can be constructed. Under this interpretation, PD corresponds to a Shortest Connection, or Steiner, Network. Finding such networks is known to be NP-hard. Therefore, we adopt a weaker (greedy) interpretation. Under the new definition, if D is the only description for R, then it trivially satisfies maximal coherence. Otherwise, the algorithm aims to maximise local coherence.</Paragraph> <Paragraph position="2"> Definition 3. Local coherence A description D is locally coherent iff: a. either D is maximally coherent, or b. there is no D′ coextensive with D, obtained by replacing types from some perspective in PD with types from another perspective, such that w(D) > w(D′).</Paragraph> <Paragraph position="3"> Our implementation of this idea begins the search for distinguishing properties by identifying the vertex of G which contains the greatest number of referents in its extension. This constitutes the root node of the search path. For each node of the graph it visits, the algorithm searches for properties that are true of some subset of R and that remove some distractors, maintaining a set N of the perspectives which are represented in D up to the current point.
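Before turning to the choice points, the graph construction and the weight w(D) can be sketched as follows. Two readings here are our assumptions: equation (2) taken as a single-link (closest-pair) distance, and w(D) taken as the total edge weight of the subgraph induced by the realised perspectives.

```python
from itertools import combinations

def perspective_distance(pa, pb, s):
    """d(A, B): distance between the closest pair of types across two
    perspectives, taking distance as 1 - similarity (our assumption)."""
    return min(1 - s(t, t2) for t in pa for t2 in pb)

def build_graph(perspectives, s):
    """Weighted complete graph over clusters; vertices are indices into
    the perspectives list, edge weights are inter-perspective distances."""
    return {(i, j): perspective_distance(perspectives[i], perspectives[j], s)
            for i, j in combinations(range(len(perspectives)), 2)}

def weight(realised, graph):
    """w(D): total weight of the edges of the subgraph induced by the
    set of perspectives (vertex indices) realised in the description."""
    return sum(w for (i, j), w in graph.items()
               if i in realised and j in realised)
```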
The crucial choice points arise when a new node (perspective) needs to be visited in the graph. At each such point, the next node n to be visited is the one which minimises the total weight of N, that is:</Paragraph> <Paragraph position="4"> n = arg min over n′ ∈ V ∖ N of w(N ∪ {n′})</Paragraph> <Paragraph position="5"> The results of this procedure closely approximate maximal coherence, because the algorithm starts with the vertex most likely to distinguish the referents, and then greedily proceeds to the nodes which minimise w(D) given the current state, that is, taking all previously used nodes into account.</Paragraph> <Paragraph position="6"> As an example of the output, we take R = {e1, e3, e4} as the intended referents in Table 3. First, the algorithm determines the cluster with the greatest number of referents in its extension. In this case, there is a tie between clusters 2 and 3 in Figure 3.1, since all three entities have type properties in these clusters. In either case, the entities are distinguishable from a single cluster.</Paragraph> <Paragraph position="7"> If cluster 3 is selected as the root, the output is λx[physicist(x) ∨ biologist(x) ∨ chemist(x)].</Paragraph> <Paragraph position="8"> If the algorithm selects cluster 2 as the root node, the final output is the logical form λx[man(x) ∨ (woman(x) ∧ plump(x))].</Paragraph> <Paragraph position="9"> There is an alternative description that the algorithm does not consider. An algorithm that aimed for conciseness would generate λx[professor(x) ∨ man(x)] (the professor and the men), which does not satisfy local coherence.</Paragraph> <Paragraph position="10"> These examples therefore highlight the possible tension between the avoidance of redundancy and the achievement of coherence. It is to an investigation of this tension that we now turn.</Paragraph> </Section> </Section> </Paper>
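To close, here is a greedy Python sketch of the content-determination loop described above. It reuses weight() from the previous sketch and assumes two hypothetical helpers, ext(p) (the set of entities property p is true of) and props_of(i) (the properties of cluster i, types before modifiers); this is our reading of the procedure, not the authors' code.

```python
def describe(R, U, perspectives, graph, ext, props_of):
    """Greedy content determination: D[r] is a conjunction of properties
    true of r; new perspectives are entered in the order that minimises
    the running total weight of the visited-perspective set N."""
    D = {r: [] for r in R}
    distractors = {r: set(U) - {r} for r in R}
    done = lambda: all(not distractors[r] for r in R)
    unvisited = set(range(len(perspectives)))
    # Root node: the cluster with the most referents in its extension.
    node = max(unvisited,
               key=lambda i: len({r for r in R
                                  for p in props_of(i) if r in ext(p)}))
    N = set()
    while not done() and unvisited:
        N.add(node)
        unvisited.discard(node)
        for p in props_of(node):              # IA-style incremental scan
            for r in R:
                if r in ext(p) and distractors[r] - ext(p):
                    D[r].append(p)            # p is true of r and excludes
                    distractors[r] &= ext(p)  # some remaining distractors
        if not done() and unvisited:
            # Next perspective: the one minimising the weight of N plus it.
            node = min(unvisited, key=lambda v: weight(N | {v}, graph))
    return D
```

On the worked example, starting from the cluster covering all of {e1, e3, e4} lets the loop terminate without entering a second perspective, which is exactly why the concise but incoherent professor-and-men description is never considered.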