<?xml version="1.0" standalone="yes"?> <Paper uid="W06-1409"> <Title>measuring the benefits for readers</Title> <Section position="5" start_page="55" end_page="56" type="metho"> <SectionTitle> 3 Obstacles for resolution </SectionTitle> <Paragraph position="0"> Generating a uniquely referring expression is not always enough, because such an expression can leave the hearer with an unnecessarily large search space. But the issue becomes even starker when the locations of speaker and hearer are taken into account. (For simplicity, we assume that the locations coincide.) Suppose a hierarchically ordered domain D contains only one entity whose TYPE is LIBRARY.</Paragraph> <Paragraph position="1"> Consider the following noun phrases, uttered in the position marked by d in Figure 1. (The first three have the same intended referent.)
(2a) the library, in room 120 in the Cockcroft bld.
(2b) the library, in room 120
(2c) the library
(2d) room 120
Utterances like (2a) and (2b) make use of the hierarchical structure of the domain. Their content can be modelled as a list L = <(x1,P1),(x2,P2),...,(xn,Pn)>, where x1 = r is the referent of the referring expression and, for every j > 1, xj is an ancestor (not necessarily the parent) of xj-1 in D. For every j, Pj is a set of properties that jointly identify xj within xj+1 or, if j = n, within the whole domain. For example, (2a) is modelled as</Paragraph> <Paragraph position="2"> L = <(r, {library}), (x2, {room, 120}), (x3, {Cockcroft bld.})> </Paragraph> <Paragraph position="3"> We focus on the search for xn because, under the assumptions that were just made, this is the only place where problems can occur (since no parent node is available).</Paragraph> <Paragraph position="4"> Even though each of (2a)-(2d) succeeds in characterising its intended referent uniquely, some of these descriptions can be problematic for the hearer. One such problem occurs in (2d). The expression is logically sufficient. But, intuitively speaking, it creates an expectation that the referent may be found nearby, within the Watts building, whereas, in fact, a match can only be found in another building. In this case we will speak of Lack of Orientation (LO).</Paragraph> <Paragraph position="5"> Even more confusion might occur if another library was added to our example, e.g., in Watts 110, while the intended referent was kept constant. In this case, (2c) would fail to identify the referent, of course. The expression (2b), however, would succeed, by using each of two parts of the description ('the library' and 'room 120') to identify the other: there are two libraries, and two rooms numbered 120, but there is only one pair (a,b) such that a is a library and b is a room numbered 120, while a is located in b. Such cases of mutual identification are unproblematic in small, transparent domains where search is not an issue, but not in large hierarchical domains. For, like (2d), (2b) would force a reader to search through an unnecessarily large part of the domain; even worse, the search 'path' that the reader is likely to follow leads via an obstacle, namely room 120 Watts, that matches a part of the description while not being the intended referent of the relevant part of the description (i.e., room 120 Cockcroft). Confusion could easily result. In cases like this, we speak of a Dead End (DE).</Paragraph> <Paragraph position="6"> In section 5 we will present evidence suggesting that instances of Dead End and Lack of Orientation may disrupt search in a sufficiently large or complex domain. For a theoretical discussion we refer to Paraboni and van Deemter (2002).</Paragraph> </Section>
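To make the list representation concrete, the sketch below (ours, not the authors' code) encodes a small hierarchical domain in Python and resolves description lists against it. The toy domain, which already includes the extra library used in the Dead End discussion, only loosely mirrors Figures 1 and 2, and all entity and property names are invented.

```python
# A toy illustration of the list representation L = <(x1,P1), ..., (xn,Pn)>
# over a hierarchical domain. Names are invented.

DOMAIN = {          # entity -> parent (None marks the root)
    "campus": None,
    "watts": "campus", "cockcroft": "campus",
    "room110w": "watts", "room120w": "watts", "room120c": "cockcroft",
    "library1": "room120c", "library2": "room110w",
}
PROPS = {           # properties true of each entity
    "campus": {"campus"},
    "watts": {"building", "Watts"}, "cockcroft": {"building", "Cockcroft"},
    "room110w": {"room", "110"}, "room120w": {"room", "120"},
    "room120c": {"room", "120"},
    "library1": {"library"}, "library2": {"library"},
}

def descendants(node):
    """All entities strictly dominated by node in the hierarchy."""
    out = set()
    for x in DOMAIN:
        p = DOMAIN[x]
        while p is not None:
            if p == node:
                out.add(x)
                break
            p = DOMAIN[p]
    return out

def resolve(L):
    """Entities satisfying L = [(x1,P1), ..., (xn,Pn)]: they match P1 and
    have ancestors matching P2, ..., Pn, innermost first."""
    candidates = {x for x in DOMAIN if L[-1][1] <= PROPS[x]}
    for _, props in reversed(L[:-1]):
        candidates = {x for c in candidates
                      for x in descendants(c) if props <= PROPS[x]}
    return candidates

# 'the library'             -> ambiguous here (two libraries in this toy domain)
print(resolve([("r", {"library"})]))
# 'room 120'                -> ambiguous (Watts and Cockcroft each have one)
print(resolve([("r", {"room", "120"})]))
# 'the library in room 120' -> unique, by mutual identification (a DE risk)
print(resolve([("r", {"library"}), ("x2", {"room", "120"})]))
# (2a) 'the library in room 120 in the Cockcroft bld.' -> unique
print(resolve([("r", {"library"}), ("x2", {"room", "120"}),
               ("x3", {"Cockcroft"})]))
```

Note that logical uniqueness, as computed here, says nothing about how far from the utterance location d the first match lies, which is precisely the LO/DE concern.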
<Section position="6" start_page="56" end_page="57" type="metho"> <SectionTitle> 4 Generation algorithms </SectionTitle> <Paragraph position="0"> What kinds of expression would existing GRE algorithms produce in the situations of interest? Since hierarchies involve relations, the first algorithm that comes to mind is the one proposed by Dale and Haddock (1991). Essentially, this algorithm combines one- and two-place predicates until a combination is found that pins down the target referent. A standard example involves a domain containing two tables and two bowls, while only one of the two tables has a bowl on it. In this situation, the combination {bowl(x),on(x,y),table(y)} identifies x (and, incidentally, also y) uniquely, since only one value of x can be used to verify the three predicates; this justifies the description 'the bowl on the table'. This situation can be 'translated' directly into our university domain. Consider Figure 2, with one additional library in room 110 of the Watts building. In this situation, the combination {library(x), in(x,y), room(y), number(y,120)} identifies x uniquely: x is the only library located in a room with number 120 (and no other room numbered 120 contains a library). Thus, the standard approach to relational descriptions allows precisely the kinds of situation that we have described as DE. Henceforth, we shall describe this as the Minimal Description (MD) approach to reference because, in the situations of interest, it uses the minimum number of properties by which the referent can be distinguished.</Paragraph> <Paragraph position="3"> Paraboni and van Deemter (2002) have sketched two GRE algorithms, both of which are guaranteed to prevent DE and LO by including logically redundant information in the generated descriptions so as to reduce the reader's search space. These algorithms, called Full Inclusion (FI) and Scope-Limited (SL), are not the only ways in which resolution may be aided, but we will see that they represent two natural options. Both take as input a hierarchical domain D, a location d where the referring expression will materialise, and an intended referent r.</Paragraph> <Paragraph position="4"> Briefly, the FI algorithm represents a straightforward way of reducing the length of search paths, without particular attention to DE or LO. It lines up properties that identify the referent uniquely within its parent node, then moves up to identify this parent node within its parent node, and so on until reaching a subtree that includes the starting point d. Applied to our earlier example of a reference to room 120, FI first builds up the list</Paragraph> <Paragraph position="5"> L = <(r, {room, 120})> </Paragraph> <Paragraph position="6"> then expands it to</Paragraph> <Paragraph position="7"> L = <(r, {room, 120}), (x2, {Cockcroft bld.})> </Paragraph> <Paragraph position="8"> Now that Parent(x2) includes d, r has been identified uniquely within D and we reach STOP. L might be realised as, e.g., 'room 120 in Cockcroft'.</Paragraph> <Paragraph position="9"> FI gives maximal weight to ease of resolution.</Paragraph> <Paragraph position="10"> But something has to give, and that is brevity: by conveying logical redundancy, descriptions are lengthened, and this can have drawbacks. The second algorithm in Paraboni and van Deemter (2002), called SCOPE-LIMITED (SL), constitutes a compromise between brevity and ease of resolution. SL prevents DE and LO but opts for brevity when DE and LO do not occur.</Paragraph>
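The following sketch gives one possible reading of the FI strategy just described. It is not the published algorithm: it reuses the DOMAIN and PROPS structures from the earlier sketch, and the greedy property selection is only a stand-in for whatever identification step FI actually uses.

```python
# A sketch of the Full Inclusion (FI) idea: identify r within its parent, then
# keep adding ancestors until the last ancestor's parent also dominates the
# location d where the expression is uttered. Reuses DOMAIN and PROPS above.

def siblings(x):
    return [y for y, p in DOMAIN.items() if p == DOMAIN[x] and y != x]

def identify_within_parent(x):
    """Greedily pick properties of x that rule out its siblings."""
    props, rivals = set(), set(siblings(x))
    for p in sorted(PROPS[x]):
        if not rivals:
            break
        if any(p not in PROPS[r] for r in rivals):
            props.add(p)
            rivals = {r for r in rivals if p in PROPS[r]}
    return props or set(PROPS[x])

def dominates(a, x):
    while x is not None:
        if x == a:
            return True
        x = DOMAIN[x]
    return False

def fi(r, d):
    """Build L = [(x1,P1), ..., (xn,Pn)] in the Full Inclusion fashion."""
    L, x = [], r
    while True:
        L.append((x, identify_within_parent(x)))
        parent = DOMAIN[x]
        if parent is None or dominates(parent, d):
            return L
        x = parent

# The running example: referring to room 120 in Cockcroft from inside Watts.
print(fi("room120c", d="room120w"))
# -> [('room120c', {'room', '120'}), ('cockcroft', {'Cockcroft'})]
#    (set order may vary) i.e. 'room 120 in Cockcroft', as in the text.
```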
<Paragraph position="11"> This is done by making use of the notion of SCOPE (a natural extension of Krahmer and Theune's treatment of salience in GRE: see Paraboni and van Deemter (2002)), hence the name of the algorithm. The difference between FI and SL becomes evident when we consider a case in which the minimally distinguishing description does not lead to DE or LO. For example, a reference to r = library would be realised by FI as 'the library in room 120 in Cockcroft'. The same referent, however, would be realised by the SL algorithm simply as 'the library', since there is no risk of DE or LO. With the addition of a second library in the Watts building, the behaviour of the SL algorithm would change accordingly, producing 'the library in Cockcroft'. Similarly, had we instead included the second library under another room of Cockcroft, SL would describe r as 'the library in room 120 of Cockcroft', just like FI. For details of both algorithms we refer to Paraboni and van Deemter (2002).</Paragraph> </Section>
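For concreteness, applying the fi() sketch above to the library example reproduces the FI realisation quoted in the text; SL's SCOPE-based behaviour is not modelled here, so the comments merely restate what the text says SL would do.

```python
# Continuing the toy sketch: FI applied to r = the library, uttered in Watts.
L = fi("library1", d="room120w")
print(L)
# -> [('library1', {'library'}), ('room120c', {'room', '120'}),
#     ('cockcroft', {'Cockcroft'})]   (set order may vary)
# i.e. 'the library in room 120 in Cockcroft', the FI output quoted above.
# According to the text, SL would say just 'the library' when the domain has a
# single library, and 'the library in Cockcroft' once the second (Watts)
# library -- present in this toy domain -- is added.
```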
<Section position="7" start_page="57" end_page="59" type="metho"> <SectionTitle> 5 The new experiment </SectionTitle> <Paragraph position="0"> In Paraboni and van Deemter (2002) an experiment was described to find out what types of references are favoured by human judges when their opinion about these references is asked. As an example of a hierarchically ordered domain, the experiment made use of a document structured in sections and subsections. This allowed Paraboni and van Deemter (2002) to show their subjects the domain itself, rather than, for example, a pictorial representation (as would be necessary in most other cases, such as that of a university campus, which motivated many of our examples so far).</Paragraph> <Paragraph position="1"> The experiment investigated the choice of so-called document-deictic references, such as 'the picture in part x of section y', made by authors of documents, to check whether they choose to avoid potential DE and LO situations by adding redundant properties (favouring ease of resolution) and, conversely, whether they choose shorter descriptions when there is no such risk (favouring ease of interpretation). The results suggested that human authors often prefer logically redundant references, particularly when DE and LO can arise.</Paragraph> <Paragraph position="2"> While this approach had the advantage that subjects could compare different expressions (perhaps balancing ease of interpretation with ease of resolution), the method is limited in other respects. For example, meta-linguistic judgements are sometimes thought to be an unreliable predictor of people's linguistic behaviour (e.g., van Deemter 2004). Perhaps more seriously, the experiment fails to tell us how difficult a given type of reference (for example, one of the DE type) would actually be for a reader. Therefore, in this paper we report on a second experiment investigating the effect of the presence or absence of logical redundancy on the performance of readers. We are primarily interested in understanding the search process, i.e. in resolution rather than interpretation.</Paragraph> <Section position="1" start_page="58" end_page="59" type="sub_section"> <SectionTitle> 5.1 Experiment design </SectionTitle> <Paragraph position="0"> Subjects: Forty-two computing science students participated in the experiment, as part of a scheduled practical.</Paragraph> <Paragraph position="1"> Procedure: A within-subjects design was used. Each subject was shown twenty on-line documents, in a random order. The entire document structure was always visible, and so was the content of the current document part. A screenshot of an example document providing this level of information is shown in Figure 3. Each document was initially opened in Part B of either Section 2 or 3, where a task was given of the form "Let's talk about [topic]. Please click on [referring expression]". For instance, "Let's talk about elephants. Please click on picture 5 in part A". Subjects could navigate through the document by clicking on the names of the parts (e.g. Part A as visible under Section 3). As soon as the subject had correctly clicked on the picture indicated, the next document was presented. Subjects were reminded throughout the document of the task to be accomplished and the location at which the task was given. All navigation actions were recorded.</Paragraph> <Paragraph position="2"> At the start of the experiment, subjects were instructed to try to accomplish the task with a minimal number of navigation actions.</Paragraph> <Paragraph position="3"> We assume that readers do not have complete knowledge of the domain. So, they do not know which pictures are present in each part of each section. If readers had complete knowledge, then a minimal description would suffice. We do not, however, assume readers to be completely ignorant either: we allowed them to see the current document part (where the question is asked) and its content, as well as the hierarchical structure (sections and parts) of the remainder of the document, as in Figure 3 above.</Paragraph> <Paragraph position="4"> Research Questions: We want to test whether longer descriptions indeed help resolution, particularly in so-called problematic situations. Table 1 shows the types of situation (potential DE, LO, and non-problematic), reader and referent location, and descriptions used. In DE situations, there is another picture with the same number as the referent, but not in a part with the same name as the part in which the referent is. In LO situations, there is no other picture with the same number as the referent, and the reader location contains pictures. In non-problematic situations, there is another picture with the same number as the referent, but not in a part with the same name as the part in which the referent is.</Paragraph> <Paragraph position="5"> Hypothesis 1: In a problematic (DE/LO) situation, the number of navigation actions required for a long (FI/SL) description is smaller than that required for a short (MD) description.</Paragraph> <Paragraph position="6"> We will use the DE and LO situations in Table 1 to test this hypothesis, comparing for each situation the number of navigation actions of the short, that is, minimally distinguishing (MD), and long (FI/SL) expressions. In Paraboni and van Deemter (2002) there was an additional hypothesis about non-problematic situations, stating that MD descriptions would be preferred to long descriptions in non-problematic situations. We cannot use this hypothesis in this experiment, as it is highly unlikely that a shorter description will lead to fewer navigation actions. (Note that the experiment in Paraboni and van Deemter (2002) looked at the combination of interpretation and resolution, while we are now focussing on resolution only.)</Paragraph> <Paragraph position="7"> Instead, we will look at gain: the number of navigation actions required for a short description minus the number required for a long description.</Paragraph>
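As a small illustration of how the recorded navigation actions could be turned into this gain measure, here is a sketch with invented column names (it is not the authors' analysis code):

```python
# Aggregate click logs into 'gain' = clicks(short) - clicks(long).
import pandas as pd

clicks = pd.DataFrame({          # one row per subject x situation x length
    "subject":   [1, 1, 2, 2],
    "situation": [1, 1, 1, 1],
    "length":    ["short", "long", "short", "long"],
    "clicks":    [5, 2, 4, 1],
})

wide = clicks.pivot_table(index=["subject", "situation"],
                          columns="length", values="clicks")
wide["gain"] = wide["short"] - wide["long"]   # clicks saved by the long form
print(wide.reset_index())
```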
<Paragraph position="8"> Table 1: Types of situation, reader and referent locations, and the descriptions used.
Sit. | Type | Reader Loc. | Referent Loc. | Short (MD) | Long (FI/SL) | Long (other)
1 | DE | Part B Sec 3 | Part A Sec 2 | Pic 3 in Part A | Pic 3 in Part A Sec 2 | -
2 | DE | Part B Sec 2 | Part C Sec 3 | Pic 4 in Part C | Pic 4 in Part C Sec 3 | -
3 | LO | Part B Sec 3 | Part A Sec 3 | Pic 5 | Pic 5 in Part A | Pic 5 in Part A Sec 3
4 | LO | Part B Sec 2 | Part C Sec 2 | Pic 4 | Pic 4 in Part C | Pic 4 in Part C Sec 2
5 | LO | Part B Sec 3 | Part A Sec 4 | Pic 5 | Pic 5 in Part A Sec 4 | Pic 5 in Part A
6 | LO | Part B Sec 2 | Part C Sec 1 | Pic 4 | Pic 4 in Part C Sec 1 | Pic 4 in Part C
7 | NONE | Part B Sec 2 | Part A Sec 2 | Pic 3 in Part A | Pic 3 in Part A Sec 2 | -
8 | NONE | Part B Sec 3 | Part C Sec 3 | Pic 4 in Part C | Pic 4 in Part C Sec 3 | -
</Paragraph> <Paragraph position="9"> Hypothesis 2: The gain of a long (FI/SL) description over an MD description will be larger in a problematic situation than in a non-problematic situation.</Paragraph> <Paragraph position="10"> We will use the DE and non-problematic situations in Table 1 to test this hypothesis, comparing the gain of situation 1 with that of situation 7, and the gain of situation 2 with that of situation 8.</Paragraph> <Paragraph position="11"> Longer descriptions may always lead to fewer navigation actions, and it can be expected that complete descriptions of the form 'picture x in Part y of Section z' will outperform shorter descriptions in any situation. So, from a resolution point of view, an algorithm that always gives a complete description may produce better results than the algorithms we proposed, which do not always give complete descriptions (e.g. situation 3 in Table 1).</Paragraph> <Paragraph position="12"> The aim of our algorithms is to make the descriptions complete enough to prevent DE and LO in resolution, but not overly redundant, as this may affect interpretation. We would like to show that the decisions taken by FI and SL are sensible, i.e. that they produce descriptions that are neither too short nor too long. Therefore:</Paragraph> <Paragraph position="13"> S1: We want to consider situations in which FI and SL have produced an incomplete description, and investigate how much gain could have been made by using a complete description in those cases. We would like this gain to be negligible. We will use situations 3 and 4 for this, calculating the gain of the long, complete descriptions (namely, long (other) in Table 1) over the shorter, incomplete descriptions generated by our algorithms (long (FI/SL) in Table 1).</Paragraph> <Paragraph position="14"> S2: We want to consider situations in which FI and SL have produced a complete description, and investigate how much gain has been made by using this compared to a less complete description that is still more complete than MD. We would like this gain to be large. We will use situations 5 and 6 for this, calculating the gain of the long, complete descriptions generated by our algorithms (long (FI/SL) in Table 1) over the less complete descriptions (long (other) in Table 1).</Paragraph> <Paragraph position="15"> Introducing separate hypotheses for cases S1 and S2 poses the problem of defining when a gain is 'negligible' and when a gain is 'large'.</Paragraph>
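For later reference, the comparisons just described can be written out as data; the encoding and labels below are ours, following Table 1 and the text.

```python
# The comparisons behind Hypotheses 1-2 and cases S1/S2 (labels are ours).
COMPARISONS = {
    # Hypothesis 1: short (MD) vs long (FI/SL) clicks in DE/LO situations
    "H1": {"situations": [1, 2, 3, 4, 5, 6],
           "gain_of": "long (FI/SL)", "over": "short (MD)"},
    # Hypothesis 2: that gain, DE situations vs their non-problematic twins
    "H2": {"pairs": [(1, 7), (2, 8)],
           "gain_of": "long (FI/SL)", "over": "short (MD)"},
    # S1: gain of the fully complete form over the incomplete form FI/SL chose
    "S1": {"situations": [3, 4],
           "gain_of": "long (other)", "over": "long (FI/SL)"},
    # S2: gain of the complete form FI/SL chose over the less complete form
    "S2": {"situations": [5, 6],
           "gain_of": "long (FI/SL)", "over": "long (other)"},
}
```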
<Paragraph position="16"> Instead, we will compare the gain achieved in S1 with the gain achieved in S2, expecting that the gain in S2 (which we believe to be large) will be larger than the gain in S1 (which we believe to be negligible).</Paragraph> <Paragraph position="17"> Hypothesis 3: The gain of a complete description over a less complete one will be larger for situations in which FI and SL generated the complete one than for situations in which they generated the less complete one.</Paragraph> <Paragraph position="18"> Materials: Twenty on-line documents were produced, with the same document structure (sections 1 to 5, each with parts A to C) and each containing 10 pictures. Documents had a unique background colour, a title, and pictures appropriate for the title. The number of pictures in a section or part varied per document. All of this was done to prevent subjects from relying on memory.</Paragraph> <Paragraph position="19"> Documents were constructed specifically for the experiment. Using real-world documents might have made the tasks more realistic, but would have posed a number of problems. Firstly, documents needed to be similar enough in structure to allow a fair comparison between longer and shorter descriptions. However, the structure should not allow subjects to learn where pictures are likely to be (for instance, in patient information leaflets most pictures tend to be at the beginning). Secondly, the content of documents should not help subjects find a picture: e.g., if we were using a real document on animals, subjects might expect a picture of a tiger to be near a picture of a lion. So, we do not want subjects to use semantic information or their background knowledge of the domain. Thirdly, real documents might not have the right descriptions in them, so we would need to change their sentences by hand.</Paragraph> </Section> <Section position="2" start_page="60" end_page="60" type="sub_section"> <SectionTitle> 5.2 Results and discussion </SectionTitle> <Paragraph position="0"> Forty subjects completed the experiment. Table 2 shows descriptive statistics for the number of clicks subjects made to complete each task. To analyse the results with respect to Hypothesis 1, we used a General Linear Model (GLM) with repeated measures. We used two repeated factors: Situation (sit. 1 to 6) and Description Length (short and long (FI/SL)). We found a highly significant effect of Description Length on the number of clicks used to complete the task (p<.001).</Paragraph> <Paragraph position="1"> In all potentially problematic situations the number of clicks is smaller for the long than for the short description. This confirms Hypothesis 1.</Paragraph> <Paragraph position="2"> Table 3 shows descriptive statistics for the gain as used for Hypothesis 2. We again used a GLM with repeated measures, using two repeated factors: Descriptions Content (that of situations 1 and 7, and that of situations 2 and 8) and Situation Type (potential DE and non-problematic). We found a highly significant effect of Situation Type on the gain (p<.001). In the non-problematic situations the gain is smaller than in the potential DE situations. This confirms Hypothesis 2.</Paragraph>
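For readers who want to reproduce this kind of analysis, the sketch below shows one way to run a comparable repeated-measures test in Python; the paper does not name its statistics software, and the file and column names here are hypothetical.

```python
# One possible repeated-measures analysis (illustration only, not the
# authors' original GLM setup).
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Long-format data: one row per subject x situation x description length,
# assuming every subject saw every combination (AnovaRM needs balanced data).
df = pd.read_csv("clicks_long_format.csv")   # hypothetical file

res = AnovaRM(df, depvar="clicks", subject="subject",
              within=["situation", "length"]).fit()
print(res)   # F tests for Situation, Description Length, and their interaction
```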
<Paragraph position="3"> Table 4 shows descriptive statistics for the gain as used for Hypothesis 3.
Sit. | FI Decision | Mean | STDEV
3 | NOT COMPLETE | 0.70 | 1.40
5 | COMPLETE | 4.50 | 6.67
4 | NOT COMPLETE | 0.23 | 2.51
6 | COMPLETE | 2.83 | 2.16
</Paragraph> <Paragraph position="4"> We again used a GLM with repeated measures, using two repeated factors: Descriptions Content (that of situations 3 and 5, and that of situations 4 and 6) and FI Decision (with two levels: complete and not complete). We found a highly significant effect of FI Decision on the gain (p<.001). The gain is smaller in situations where our algorithm decided to use an incomplete description than in situations where it chose a complete description. This confirms Hypothesis 3.</Paragraph> </Section> </Section> </Paper>