<?xml version="1.0" standalone="yes"?>
<Paper uid="W05-0307">
  <Title>A Framework for Annotating Information Structure in Discourse</Title>
  <Section position="5" start_page="45" end_page="47" type="metho">
    <SectionTitle>
3 Information Status
</SectionTitle>
    <Paragraph position="0"> Information Status describes how available an entity is in the discourse. We define this in terms of the speaker's assumptions about the hearer's knowledge/beliefs, and we express it by the well-known old/new distinction.2</Paragraph>
    <Section position="1" start_page="45" end_page="46" type="sub_section">
      <SectionTitle>
3.1 Annotation Scheme
</SectionTitle>
      <Paragraph position="0"> Our annotation scheme for the discourse layer mainly builds on (Prince, 1992) and (Eckert and Strube, 2001), as well as on related work on the annotation of anaphoric links (Passonneau, 1996; Hirschman and Chinchor, 1997; Davies et al., 1998; Poesio, 2000). Prince defines old and new with respect to the discourse model as well as the hearer's point of view. Considering the interaction of both these aspects, we define as new an entity which has not been previously referred to and is yet unknown to the hearer, and as mediated an entity that is newly mentioned in the dialogue but that the hearer can infer from the prior context.3 This is mainly the case for generally known entities (such as the sun , or the Pope (Löbner, 1985)), and for bridging (Clark, 1975), where an entity is related to a previously introduced one. Any entity that is neither new nor mediated is considered old.</Paragraph>
      <Paragraph position="1"> Because finer-grained distinctions (e.g. (Prince, 1981; Lambrecht, 1994)) have proved hard to distinguish reliably in practice, we organise our scheme hierarchically: we use the three main classes described above as top-level categories, for which more specific subtypes can be assigned. This approach preserves a high-level, more reliable distinction while allowing a finer-grained classification that can be exploited for specific tasks.</Paragraph>
      <Paragraph position="2"> Besides the main categories, we introduce two more classes. A category non-applicable is used for wrongly extracted markables (such as course in of course ), for idiomatic occurrences, and for expletive uses of it . Traces are automatically extracted as markables, but are left unannotated. In the rare event that the annotators find some fragments too difficult to understand, a category not-understood can be assigned. Entities marked as non-applicable or not-understood are excluded from any further annotation. For all other markables, the annotators must choose between old, mediated, and new. For the first two, subtypes can also be specified: subtype assignment is encouraged but not compulsory.</Paragraph>
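A hierarchically organised tagset of this kind is easy to represent in code. The sketch below is our own illustration, not part of the annotation tooling described in the paper: it records the categories and subtypes from this section (the slash-separated tag strings and underscore spellings are our convention), validates a tag, and recovers the more reliable top-level category.

```python
# Hierarchical information-status tagset as described in Section 3.1.
# Top-level categories are reliable; subtypes are optional refinements.
TAGSET = {
    "old": {"identity", "event", "general", "generic", "ident_generic", "relative"},
    "mediated": {"general", "bound", "part", "situation", "event", "set",
                 "poss", "func_value", "aggregation"},
    "new": set(),            # no subtypes are specified for new
    "non-applicable": set(),
    "not-understood": set(),
}

def is_valid(tag: str) -> bool:
    """Check a tag such as 'mediated/part' or plain 'old' against the scheme."""
    category, _, subtype = tag.partition("/")
    if category not in TAGSET:
        return False
    return subtype == "" or subtype in TAGSET[category]

def back_off(tag: str) -> str:
    """Collapse a subtyped tag to its top-level category."""
    return tag.partition("/")[0]
```

Because subtype assignment is optional, `is_valid` accepts a bare category, and `back_off` implements the fall-back that the hierarchical design makes possible.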
      <Paragraph position="3"> New The category new is assigned to entities that have not yet been introduced in the dialogue and that the hearer cannot infer from previously mentioned entities. No subtypes are specified for this category.</Paragraph>
      <Paragraph position="4"> Mediated Mediated entities are inferrable from previously mentioned ones, or generally known to the hearer. We specify nine subtypes: general, bound, part, situation, event, set, poss, func-value, aggregation.4 Generally known entities such as the moon or Italy are assigned a subtype general. Most proper nouns fall into this subclass, but the annotator could opt for a different tag, depending on the context. Also mediated are bound pronouns, such as them in (1), which are assigned a subtype bound.5 (1) [. . . ] it's hard to raise one child without them thinking they're the pivot point of the universe.</Paragraph>
      <Paragraph position="5"> A subtype poss is used to mark all kinds of intraphrasal possessive relations (pre- and postnominal). Four subtypes (part, situation, event, and set) are used to mark instances of bridging. The subtype part is used to mark part-whole relations for physical objects, both as intra- and inter-phrasal relations. (This category is to be preferred to poss whenever applicable.) The occurrence of the door in (2), for instance, is annotated as mediated/part.</Paragraph>
      <Paragraph position="6"> (2) When I come home in the evenings my dog greets me at the door.</Paragraph>
      <Paragraph position="7"> For similar relations that do not involve physical objects, i.e. if an entity is part of a situation set up by a previously introduced entity, we use the subtype situation,6 as for the NP the specifications in (3).</Paragraph>
      <Paragraph position="8"> 4All examples are taken from the Switchboard corpus. The markable in question is typed in boldface; antecedents or trigger entities, where present, are in italics. For the sake of space we do not provide examples for each category (see (Nissim, 2003)).</Paragraph>
      <Paragraph position="9"> (3) I guess I don't really have a problem with capital punishment. I'm not really sure what the exact specifications are for Texas.</Paragraph>
      <Paragraph position="10"> The subtype event is applied whenever an entity is related to a previously mentioned verb phrase (VP).</Paragraph>
      <Paragraph position="11"> In (4), e.g., the bus is triggered by travelling around Yucatan.</Paragraph>
      <Paragraph position="12"> (4) We were travelling around Yucatan, and the bus was really full.</Paragraph>
      <Paragraph position="13"> Whenever an entity referred to is a subset of, a superset of, or a member of the same set as a previously mentioned entity, the subtype set is applied.</Paragraph>
      <Paragraph position="14"> Rarely, an entity refers to a value of a previously mentioned function, as zero and ten in (5). In such cases a subtype func-value is assigned.</Paragraph>
      <Paragraph position="15"> (5) I had kind of gotten used to centigrade temperature [. . . ] if it's between zero and ten it's cold. Lastly, a subtype aggregation is used to classify coordinated NPs. Two old or med entities, for instance, do not give rise to an old coordinated NP unless it has been previously introduced as such; a mediated/aggregation tag is assigned instead.</Paragraph>
      <Paragraph position="16"> Old An entity is old when it is neither new nor mediated. This is usually the case if an entity is coreferential with an already introduced entity, if it is a generic pronoun, or if it is a personal pronoun referring to the dialogue participants. Six different subtypes are available for old entities: identity, event, general, generic, ident generic, relative. In (6), for instance, us would be marked as old because it corefers with we , and a subtype identity would also be assigned.</Paragraph>
      <Paragraph position="17">  (6) [. . . ] we camped in a tent, and uh there were two other couples with us.</Paragraph>
      <Paragraph position="18">  In addition, a coreference link is marked up between anaphor and antecedent, thus creating anaphoric chains (see also (Carletta et al., 2004)). The subtype event applies whenever the antecedent is a VP. In (7), it is old/event, as its antecedent is the VP educate three . As we do not extract VPs as markables, no link can be marked up.</Paragraph>
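Coreference links of this kind compose into chains by following each anaphor back to its antecedent. A minimal sketch of that bookkeeping, assuming links come as (anaphor, antecedent) pairs over markable identifiers (the union-find implementation is our own illustration, not the paper's tooling):

```python
from collections import defaultdict

def build_chains(links):
    """Group markables into anaphoric chains from (anaphor, antecedent) links.

    The links form a forest: following each anaphor to the root of its
    antecedent yields one chain per discourse entity.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps lookups cheap
            x = parent[x]
        return x

    for anaphor, antecedent in links:
        parent[find(anaphor)] = find(antecedent)

    chains = defaultdict(list)
    for markable in parent:
        chains[find(markable)].append(markable)
    return [sorted(chain) for chain in chains.values()]
```

For example, the links ("us", "we") and ("they", "us") collapse into a single three-element chain, mirroring how marked-up anaphor-antecedent pairs accumulate into the chains described above.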
      <Paragraph position="19"> (7) I most certainly couldn't educate three. I don't know how my parents did it.</Paragraph>
      <Paragraph position="20"> 6This includes elements of the thematic grid of an already introduced entity. It subsumes Passonneau's (1996) class arg . Also classified as old are personal pronouns referring to the dialogue participants, as well as generic pronouns. In the first case, a subtype general is specified, whereas the subtype for the second is generic. An instance of old/generic is you in (8).</Paragraph>
      <Paragraph position="21"> (8) up here you got to wait until Aug- August until the water warms up.</Paragraph>
      <Paragraph position="22"> In a chain of generic references, the subtype ident generic is assigned, and a coreference link is marked up. Coreference is also marked up for relative pronouns: they receive a subtype relative and are linked back to their head.</Paragraph>
      <Paragraph position="23"> The guidelines contain a decision tree the annotators use to establish priority in case more than one class is appropriate for a given entity. For example, if a mediated/general entity is also old/identity, the latter is to be preferred to the former. Similar precedence relations hold among subtypes.</Paragraph>
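The full decision tree lives in the guidelines rather than in the paper, but the precedence it encodes can be sketched. The ordering below is an assumption inferred from the example above (old outranks mediated, which outranks new); subtype-level precedence is not modelled:

```python
# Assumed precedence, highest priority first: an entity that is both
# old/identity and mediated/general is tagged old/identity.
PRECEDENCE = ["old", "mediated", "new"]

def resolve(candidates):
    """Pick the highest-priority tag when several classes apply.

    `candidates` is an iterable of tags such as {'old/identity',
    'mediated/general'}.
    """
    return min(candidates, key=lambda tag: PRECEDENCE.index(tag.split("/")[0]))
```

A first-match rule like this gives annotators a deterministic answer whenever categories overlap, which is precisely what the decision tree is for.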
      <Paragraph position="24"> To provide more robust and reliable clues in annotating bridging types (e.g. for distinguishing between poss and part), we provided replacement tests and referred to relations encoded in knowledge bases such as WordNet (Fellbaum, 1998) (for part) and FrameNet (Baker et al., 1998) (for situation).</Paragraph>
    </Section>
    <Section position="2" start_page="46" end_page="47" type="sub_section">
      <SectionTitle>
3.2 Validation of the Scheme
</SectionTitle>
      <Paragraph position="0"> Three Switchboard dialogues (for a total of 1738 markables) were marked up by two different annotators for assessing the validity of the scheme. We evaluated annotation reliability using the Kappa statistic (Carletta, 1996). Good quality annotation of discourse phenomena normally yields a kappa (κ) of about .80. We assessed the validity of the scheme on the four-way classification into the three main categories (old, mediated and new) and the non-applicable category. We also evaluated the annotation including the subtypes. All cases where at least one annotator assigned a not-understood tag were excluded from the agreement evaluation (14 markables). Also excluded were all traces (222 markables), which the annotators left unmarked. The total number of markables considered for evaluation over the three dialogues was therefore 1502.</Paragraph>
      <Paragraph position="1"> The annotation of the three dialogues yielded κ = .845 for the high-level categories, and κ = .788 when including subtypes (N = 1502; k = 2). These results show that overall the annotation is reliable and that therefore the scheme has good reproducibility. When including subtypes agreement decreases, but backing off to the high-level categories is always possible, thus showing the virtues of a hierarchically organised scheme. Reliability tests for single categories showed that mediated and new are more difficult to apply than old, for which agreement was measured at κ = .902, although they are still quite reliable (κ = .800 and κ = .794, respectively). Agreement for non-applicable was κ = .84. The annotators found the decision tree very useful when more than one subtype was applicable, and we believe it has a significant impact on the reliability of the scheme.</Paragraph>
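For k = 2 annotators, the statistic used here is Cohen's kappa, which is short to compute from raw labels. A sketch with our own function names (not the paper's evaluation code), including the back-off to high-level categories that the hierarchical scheme permits:

```python
from collections import Counter

def kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators over the same markables."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    # Observed agreement: fraction of markables with identical tags.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's marginal label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

def high_level_kappa(labels_a, labels_b):
    """Agreement after backing off subtyped tags to top-level categories."""
    strip = lambda tag: tag.split("/")[0]
    return kappa([strip(t) for t in labels_a], [strip(t) for t in labels_b])
```

Two annotators who disagree only at the subtype level (say, old/identity vs. old/event) still agree perfectly under `high_level_kappa`, which is why subtype agreement can drop while the high-level figure stays high.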
      <Paragraph position="2"> The scheme was then applied to the annotation of a total of 147 Switchboard dialogues. This amounts to 43358 sentences with 69004 annotated markables, 35299 of which are old, 23816 mediated and 9889 new (a further 8127 markables were excluded as non-applicable, and 160 were not understood); 16324 coreference links were marked up.</Paragraph>
      <Paragraph position="3"> In Section 6 we use this scheme to annotate the Pie-in-the-Sky text.</Paragraph>
    </Section>
    <Section position="3" start_page="47" end_page="47" type="sub_section">
      <SectionTitle>
3.3 Related Work
</SectionTitle>
      <Paragraph position="0"> To our knowledge, (Eckert and Strube, 2001) is the only other work that explicitly refers to IS annotation. They also use a Prince's (1992)-based old/med/new distinction for annotating Switchboard dialogues. However, their IS annotation is specifically designed for salience ranking of candidate antecedents for anaphora resolution, and is not described in detail. They do not report figures on inter-annotator agreement, so that a proper comparison with our experiment is not feasible. Among the schemes that deal with annotation of anaphoric NPs, our scheme is especially comparable with DRAMA (Passonneau, 1996) and MATE (Davies et al., 1998). Both schemes have a hierarchical structure. In DRAMA, types of inferrables can be specified, within a division into conceptual (pragmatically determined) vs. linguistic (based on argument structure) inference. No annotation experiment with inter-annotator agreement figures is however reported. MATE provides subtypes for bridging relations, but they were not applied in any annotation exercise, so that reliability and distribution of categories are based only on the core scheme (true coreference). For a detailed comparison of our approach with related efforts on the annotation of anaphoric relations, see (Nissim et al., 2004).</Paragraph>
      <Paragraph position="1"> 7k is the number of annotators. Unless otherwise specified, k = 2 for all κ scores reported in Section 3.</Paragraph>
    </Section>
  </Section>
  <Section position="6" start_page="47" end_page="49" type="metho">
    <SectionTitle>
4 Information Structure
</SectionTitle>
    <Paragraph position="0"> We have seen that information status describes how available an entity is in a discourse. Generally, old entities are available and new entities are not. In prosody we find that newness is highly correlated with pitch accenting, and oldness with deaccenting (Cutler et al., 1997). However, this is only one aspect of information structure. We also need to describe how speakers signal the organisation and salience of elements in discourse. Building on the work of (Vallduví and Vilkuna, 1998), as developed by (Steedman, 2000), we define two notions: theme/rheme structure and background/kontrast.</Paragraph>
    <Paragraph position="1"> Theme/rheme structure guides how an element fits into the discourse model: if it relates back it is thematic; if it advances the discourse it is rhematic.</Paragraph>
    <Paragraph position="2"> Steedman claims that intonational phrases can mark information units (theme and rheme - though not all boundaries are realised and a unit may contain more than one phrase). The pitch contour associated with nuclear accents in themes is distinct from that in rhemes (which he identifies as L+H*LH% and H*LH% re ToBI (Beckman and Elam, 1997)), so that, where present, such boundaries disambiguate information structure (see (9)).8  The second dimension, kontrast, relates to salience.9 We expect new entities to be salient and old entities not. Therefore, if an old element is salient, or a new one especially salient, an extra meaning is implied.</Paragraph>
    <Paragraph position="3"> 8Annotation is as in Section 3. Words in SMALL CAPS are accented, parentheses indicate intonation phrases, including boundary tones if present. See website to hear some examples from this section.</Paragraph>
    <Paragraph position="4"> 9We use kontrast to distinguish it from the everyday use of contrast and the sometimes conflicting uses of contrast in the literature. Annotators, however, will not be given this term.  These are largely subsumed by kontrast, i.e. distinguishing an element from alternatives made available by the context (see (9)).</Paragraph>
    <Section position="1" start_page="48" end_page="49" type="sub_section">
      <SectionTitle>
4.1 Annotation Scheme
</SectionTitle>
      <Paragraph position="0"> As we have seen, in English, information structure is primarily conveyed by intonation. We therefore think it is vital for annotators to listen to the speech while annotating this structure.</Paragraph>
      <Paragraph position="1">  We have claimed that prosodic phrasing can divide utterances into information units. However, often theme material is entirely background, i.e., mutually known and without contrasting alternatives. Therefore, for both model-theoretic and practical purposes, it is the same as the background of the rheme. Accordingly, we work with a test for themehood, defining the rheme as any prosodic phrase that is not identifiable as a theme.</Paragraph>
      <Paragraph position="2"> Annotators will mark each prosodic phrase as a theme if it only contains information which links the utterance to the preceding context, i.e. setting up what is being said in relation to what has been said before. Even if this is not the tune the speaker actually used, the phrase must, in the annotators' judgement, sound appropriate when said with a highly marked tune, such as L+H* LH%. For example, in (10), the phrase where I lived links was a town called Newmarket to the statement that the speaker lived in England (accenting not shown). It would be appropriate to utter it with an L+H* accent on Where and/or lived, and a final LH%. So it is a theme. The same accent on town and/or Newmarket sounds inappropriate, and it advances the discussion, so it is a rheme.</Paragraph>
      <Paragraph position="3">  Although there is a clear link between prosodic prominence and kontrast, there are a number of disagreements about how this works which this annotation effort seeks to resolve. Some, including (Steedman, 2000), have claimed that kontrast within theme and kontrast within rheme are marked by categorically distinct pitch accents. Another view is that kontrast, also called contrastive focus or topic, only applies to themes that are contrastive; since the head of a rheme phrase always attracts a pitch accent, it is redundant to call one part kontrastive. Further, some consider that kontrast within a rheme phrase occurs only when there is a clear alternative set, i.e. the distinction between broad and narrow focus, as in (9), where daffodil contrasts with other bulbs the speaker might grow. Again, there is controversy on whether there is an intonational difference between broad and narrow focus (Calhoun, 2004a). If these distinctions are marked prosodically, it is disputed whether this is with different pitch accents (Steedman), or by the relative height of different accents in a phrase (Rump and Collier, 1996; Calhoun, 2004b).</Paragraph>
      <Paragraph position="4"> Rather than using the abstract notion of kontrast directly, annotators will identify discourse scenarios which commonly invoke kontrast (drawing on functions of emphatic accents from (Brenier et al., 2005)).10 This addresses the disagreements above, while making our annotation more constrained and robust. In each case, using the full discourse context including the speech, annotators mark each content word (noun, verb, adjective, adverb and demonstrative pronoun) for the first category that applies. If none apply, they mark it as background.</Paragraph>
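Operationally, the marking procedure is a first-match cascade over an ordered list of scenarios. A schematic rendering (our own sketch; the predicates merely stand in for the human judgements described in the guidelines, and nothing here automates them):

```python
# Ordered kontrast scenarios from the scheme, checked first-to-last;
# words matching none of them are marked 'background'.
CATEGORIES = ["correction", "contrastive", "subset", "adverbial", "answer"]

def mark_word(word, context, tests):
    """Return the first kontrast category whose test fires, else 'background'.

    `tests` maps each category name to a predicate(word, context) that
    stands in for the annotator's judgement in the full discourse context.
    """
    for category in CATEGORIES:
        if tests[category](word, context):
            return category
    return "background"
```

The fixed ordering makes the scheme deterministic when several scenarios could plausibly apply to the same word.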
      <Paragraph position="5"> correction The speaker's intent is to correct or clarify a word just used by themselves or the other speaker. In (11), e.g., the speaker wishes to clarify whether her interlocutor really meant hyacinths .</Paragraph>
      <Paragraph position="6"> (11) (now are you sure they're HYACINTHS) (because that is a BULB) contrastive The speaker intends to contrast the word with a previous one which was (a) a current topic; (b) semantically related to the contrastive word, such that they belong to a natural set. In (12), B contrasts recycling in her town San Antonio , with A's town Garland , from the set places where the speakers live.</Paragraph>
      <Paragraph position="7">  (12) (A) I live in Garland, and we're just beginning to build a real big recycling center...</Paragraph>
      <Paragraph position="8"> (B) (YEAH there's been) (NO emphasis on recycling at ALL) (in San ANTONIO) 10Emphasis can occur for two major reasons, both identified by Brenier: emphasis of a particular word or phrase, i.e. kontrast, or emphasis over a larger span of speech, conveying affective connotations such as excitement (Ladd, 1996); the latter is not included here.</Paragraph>
      <Paragraph position="9">  subset The speaker highlights one member of a more general set that has been mentioned and is a  current topic. In (13), the speaker introduces three day cares , and then gives a fact about each.</Paragraph>
      <Paragraph position="10"> (13) (THIS woman owns THREE day cares) (TWO in Lewisville) (and ONE in Irving) (and she had to open the SECOND one up) (because her WAIT-ING list was) (a YEAR long)  adverbial The speaker uses a focus-sensitive adverb, i.e. only, even, always or especially to highlight that word, and not another in the natural set. The adverb and/or the word can be marked. In (14), B didn't even like the previews of 'The Hard Way', let alone the movie.</Paragraph>
      <Paragraph position="11">  (14) (A) I like Michael J Fox, though I thought he was crummy in 'The Hard Way'.</Paragraph>
      <Paragraph position="12"> (B) (I didn't even like) (the PREVIEWS ) answer The word (or its syntactic phrase, e.g. an NP), and no other, fills an open proposition set up in the context. It must make sense if they had said only that word or phrase. In (15), A sets up the blooms she can't identify, and B answers lily .</Paragraph>
      <Paragraph position="13"> (15) (A) We have these blooms, I'm not sure what they are but they come in all different colours yellow, purple, white...</Paragraph>
      <Paragraph position="14"> (B) (I BET you) (that that's a LILY)  Again, in Section 6 we apply the scheme to the Pie-in-the-Sky text.</Paragraph>
    </Section>
  </Section>
  <Section position="7" start_page="49" end_page="49" type="metho">
    <SectionTitle>
4.2 Related Work
</SectionTitle>
    <Paragraph position="0"> Annotator agreement for pitch accents and prosodic boundaries, re ToBI, is about 80% and 90% respectively (Pitrelli et al., 1994). Automatic performance, using acoustic and textual features, is now above 85% accuracy (Shriberg et al., 2000). However, this does not distinguish prosodic events which occur for structural or rhythmical reasons from those which mark information structure (Ladd, 1996). (Heldner et al., 1999) try to predict focal accents, which they define minimally as the most prominent word in a three-word phrase. (Hirschberg, 1993) got 80-98% accuracy using only text-based features. However, her definition of contrast was not as thorough as ours.</Paragraph>
    <Paragraph position="1"> (Hedberg and Sosa, 2001) looked at the marking of ratified and unratified (old and new) and contrastive topics and foci (theme and rheme) with ToBI pitch accents.</Paragraph>
    <Paragraph position="2"> (Baumann et al., 2004) annotated a simpler information structure and prosodic events in a small German corpus.</Paragraph>
  </Section>
  <Section position="8" start_page="49" end_page="50" type="metho">
    <SectionTitle>
5 Information Structure and Prosodic
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="49" end_page="50" type="sub_section">
      <SectionTitle>
Structure
</SectionTitle>
      <Paragraph position="0"> Much previous work, not corpus-based, draws a direct correspondence between information structure, prosodic phrasing and pitch accent type. However, in real speech there are many non-semantic influences on prosody, including phrase length, speaking rate and rhythm. Information structure is rather a strong constraint on the realisation of prosodic structure (Calhoun, 2004a). Contrary to the assumption of ToBI, this structure is metrical, highly structured and linguistically relevant both within and across prosodic phrases (Ladd, 1996; Truckenbrodt, 2002).</Paragraph>
      <Paragraph position="1"> One of our main aims is to test how such evidence can be reconciled with the theories presented earlier about the relationship between information structure and prosody. Local prominence levels have been shown to aid in the disambiguation of focal adverbs, anaphoric links, and global discourse structures marked as elaboration, continuation, and contrast (Dogil et al., 1997). Global measures of prominence level have been linked to topic structure, corrections, and turn-taking cues (Ayers, 1994). (Brenier et al., 2005) found that emphatic accents realised special discourse functions such as assessment, clarification, contrast, negation and protest in child-directed speech. Most of these functions can be seen as conversational implicatures of kontrast, i.e. if an element is unexpectedly highlighted, this implies an added meaning. Brenier found that while pitch accents can be detected using both acoustic and textual cues, textual features are not useful in detecting emphatic pitch accents, showing that there is added meaning not available from the text.</Paragraph>
      <Paragraph position="2"> As noted in Section 4.2, inter-annotator agreement for the identification of prosodic phrase boundaries with ToBI is reasonably good. We will therefore label ToBI break indices 3 and 4 (conflated) (Beckman and Elam, 1997). Annotators will also mark the perceived level of prosodic prominence on each word using a defined scale. We are currently running a pilot experiment to identify a reasonable number of gradations of prosodic prominence, from completely unstressed and/or reduced to highly emphatic, to use for the final annotation.</Paragraph>
    </Section>
  </Section>
  <Section position="9" start_page="50" end_page="50" type="metho">
    <SectionTitle>
6 Pie-in-the-Sky annotation
</SectionTitle>
    <Paragraph position="0"> Pie in the Sky is a joint effort to annotate two sentences with as much semantic/pragmatic information as possible (see http://nlp.cs.nyu.edu/meyers/pie-in-the-sky.html). Information structure is one of the desired annotation layers.</Paragraph>
    <Paragraph position="2"> And, as standards are not yet established, our proposal contributes to defining annotation guidelines for this structure. Figure 1 reports the Pie-in-the-Sky sentences enriched with our annotation. The context prior to these sentences is as follows: a 12-year-old boy reports seeing a man launch a rubber boat from a car parked at the harbor. fbi officials find what they believe may be explosives in the car. yemeni police trace the car to a nearby house. the fbi finds traces of explosives on clothes found. neighbors say they saw two men who they describe as arab-looking living there for several weeks. police also find a second house where authorities believe two others may have assembled the bomb, possibly doing some welding. passports found in one of the houses identify the men as from a privilege convenience province noted for lawless tribes. but the documents turn out to be fakes. meantime, analysts at the fbi crime lab try to discover what the bomb was made from. no conclusions yet, u.s. officials say. but a working theory, plastic explosive.</Paragraph>
    <Paragraph position="3"> We identified 14 NPs markable for information status (see Figure 1).11 Most annotations were straightforward. Some comments, though: Yemen is annotated as med/general, although it could also be med/sit, as Yemeni was previously mentioned. Our decision tree was used for such cases. The explosive material is med/set, not old/identity, since it refers to the kind of explosive used rather than to a specific entity previously mentioned.</Paragraph>
    <Paragraph position="4"> In the absence of any prosodic annotation in the transcript, these sentences are slightly ambiguous as to information structure. The most likely interpretation is given in Figure 1.12 For example, Yemen's President contrasts with US officials in the set of people talking about what the bomb is made of. Since both words are contrastive, either or both could have L+H* accents, whereas say could not. The inclusion of the latter in the theme is consistent with the possibility of a rising boundary LH% after it. The FBI has told him is thematic because it links Yemen's President's opinion to the previous discourse. It would also sound appropriate with an L+H*LH% tune. As can be seen, although theme/rheme and prosodic phrase boundaries align, in both cases the VP is split between information/intonation phrases. The independence of information structure and intonation structure from traditional surface structure is a major reason behind our use of 'stand-off' markup.</Paragraph>
    <Paragraph position="5"> 11Square brackets are used to mark annotation boundaries.</Paragraph>
    <Paragraph position="6"> 12Kontrast is marked with the relevant category; unmarked words are background.</Paragraph>
  </Section>
</Paper>