<?xml version="1.0" standalone="yes"?> <Paper uid="C04-1034"> <Title>Resolving Individual and Abstract Anaphora in Texts and Dialogues</Title> <Section position="3" start_page="1" end_page="6" type="metho"> <SectionTitle> 2 Background for DAR </SectionTitle> <Paragraph position="0"> In most applied approaches, resolving pronominal anaphors mainly consists of the following steps: 1: determining the anaphoric antecedent domain; 2: choosing the most prominent or salient antecedent among the possible candidates.</Paragraph> <Paragraph position="1"> Thus determining the degree of salience of discourse elements, henceforth DEs, is essential to anaphor resolution. Although there is not always an identity relation between linguistic antecedents and referents, we also follow this strategy, well aware that it is particularly problematic for resolving APAs (see especially (Webber, 1991)). Nearly all salience-based algorithms identify a high degree of salience with a high degree of givenness of DEs. In fact, although different criteria are used for ranking DEs, such as linear order, hierarchy of grammatical roles, information structure and Prince's Familiarity Scale (Prince, 1981), all algorithms assign the highest prominence to the DEs which are most topical, known, bound, familiar and thus given, e.g.</Paragraph> <Paragraph position="2"> (Grosz et al., 1995; Brennan et al., 1987; Strube, 1998). Analysing the Danish data, we found a restricted number of cases where a high degree of salience did not correspond to a high degree of topicality, as is the case in example (1). (1) A: hvem...hvem arbejdede [din mor]</Paragraph> <Paragraph position="4"> (with whom... whom did [your mother]</Paragraph> <Paragraph position="6"> var enke ... havde tre sønner [bysoc]</Paragraph> <Paragraph position="8"> was a widow...
had three sons) In (1) the antecedent of the second occurrence of the pronoun hun (she) is the object vores nabo (our neighbour), which provides the information requested in the preceding question. This nominal is assigned lower prominence than the subject pronoun hun (she) in most salience models.</Paragraph> <Paragraph position="9"> The only salience model which departs from the givenness assumption was proposed by Hajičová et al. (1990). Hajičová et al., in fact, assign the highest degree of salience to DEs in the focal part of an utterance in information structure terms (Sgall et al., 1986). These entities often represent new information. Hajičová et al.'s approach is original and can account for the data in (1), but it is problematic from an applied point of view. In the first place, it is difficult to determine the information structure of all utterances. Secondly, focal candidate antecedents are ranked highest in Hajičová et al.'s model, but they still compete with given candidate antecedents in their system. Finally, the data does not confirm that all entities in the focal part of an utterance have the highest degree of accessibility.</Paragraph> <Paragraph position="10"> We agree with Hajičová's insight, but in order to operationalise the influence of focality in a reliable way we propose the following. Accessibility is by default connected with givenness, as assumed in most algorithms. However, when speakers explicitly change the degree of accessibility of entities in discourse by marking them as salient with information structure related devices, these entities get the highest degree of salience and are proposed as the preferred antecedents of anaphors. In cases of explicit focus marking, the shift of focus of attention is as coherent as continuing to speak about the same entities, because it is preannounced to the addressee.
On the basis of the data we propose a list of identifiable constructions in which explicit focus marking occurs.</Paragraph> <Paragraph position="11"> Examples from the list are the following: a: Entities referred to by NPs which in Danish are focally marked structurally (clefts, existential and topicalised constructions).</Paragraph> <Paragraph position="12"> b: Entities referred to by NPs that follow focusing adverbs.</Paragraph> <Paragraph position="13"> c: Entities focally marked by the prosody (if this information is available) and/or entities providing information requested in questions, as in (1).</Paragraph> <Paragraph position="14"> Givenness here subsumes concepts such as topicality and familiarity.</Paragraph> <Paragraph position="15"> Many of these constructions are also studied in the information structure literature and in some of the anaphora resolution literature, e.g. (Sidner, 1983).</Paragraph> <Paragraph position="16"> Givenness preference in Danish can be modelled by the hierarchy of verbal complements. In addition to salience preferences, we found that parallelism can account for numerous uses of Danish anaphors.</Paragraph> <Paragraph position="17"> Inspired by the work of Kameyama (1996), we have defined a preference interaction model to be used in resolution. Our model is given in figure 1.</Paragraph> <Paragraph position="18"> The interaction model states that givenness preferences are overridden by the focality preference when they are in conflict, and that both are overridden by parallelism. DAR also accounts for reference differences between Danish demonstrative and personal pronouns. Weak (cliticised and unstressed) pronouns usually refer to the most salient entity in the utterance.
Strong (stressed and demonstrative) pronouns emphasise or contrast the entities they refer to and/or indicate that their antecedents are not the most expected ones.</Paragraph> <Paragraph position="19"> Demonstratives preferentially refer to abstract entities, while personal pronouns preferentially refer to individual entities in ambiguous contexts. All these differences are also accounted for in the literature on anaphora. However, we also found more language-specific peculiarities in our data. Two examples of these peculiarities are the following. The Danish demonstratives denne/dette/disse (this common gender/this neuter gender/these) never corefer with a subject antecedent intrasententially. In the few cases where they have a subject antecedent in a preceding clause, there are no other antecedent competitors. The abstract anaphor dette, furthermore, is often used to refer to the last mentioned situation in the previous sentence, often expressed in a subordinate clause, and not to the whole sentence or to an abstract anaphor in the preceding sentence. The particular phenomena are also accounted for in DAR. According to parallelism, in adjacent utterances with parallel grammatical complements, the preferred antecedent of an anaphor in the second utterance is the linguistic expression in the first utterance with the same grammatical function.</Paragraph> <Paragraph position="20"> Commonsense preferences, which override all the other preferences, are not implemented.</Paragraph> <Paragraph position="21"> The most frequent Danish third person singular neuter gender pronoun det can be both a personal pronoun (corresponding to it) and a demonstrative pronoun (corresponding to this/that). In the latter case it is always stressed.</Paragraph> <Paragraph position="22"> Approximately half of the APA occurrences in our dialogues refer to entities evoked by larger discourse segments (spanning several turn takings).
Thus we follow Eckert and Strube's approach of marking the structure of dialogues and searching for APA antecedents in the right frontier of the discourse tree (Webber, 1991). DAR presupposes different discourse structures for texts and dialogues. DAR follows the ES00 and PHORA strategy of discriminating between IPAs and APAs by rules looking at the semantic constraints on the predication contexts in which the anaphors occur. DAR relies on more discriminating rules than ES00; these rules were defined on the basis of large amounts of data and of the encodings of a large computational lexicon.</Paragraph> <Paragraph position="23"> DAR uses language-specific rules to account for Danish APAs. These occur in many more contexts than in English, where elliptical constructions or other anaphors such as too and so are used. Examples of Danish-specific uses of abstract anaphors are given in (2)-(3).</Paragraph> <Paragraph position="24"> (2) Han var sulten. Det var jeg ikke. [pid] (lit. He was hungry. That was I not) (He was hungry. I wasn't.) (3) Han kunne svømme, men det kunne hun ikke (lit. He could swim, but it could she not) (He could swim, but she couldn't) A language-specific rule recognising APAs is the following: constructions with modal verbs and an object, such as x skal man (lit. x shall one) (one shall), x vil man (lit. x will one) (one will).</Paragraph> </Section> <Section position="4" start_page="6" end_page="7" type="metho"> <SectionTitle> 3 The DAR Algorithm </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="6" end_page="7" type="sub_section"> <SectionTitle> 3.1 Search Space and DE lists </SectionTitle> <Paragraph position="0"> DAR presupposes the discourse structure described by Grosz and Sidner (1986). The minimal discourse unit is the utterance U. Paragraphs correspond to discourse segments in texts. In dialogues, discourse segments were manually marked (see section 4).
The dialogues were structured with Synchronising Units (SU) according to the definitions in ES00.</Paragraph> <Paragraph position="1"> The immediate antecedent search space of a pronoun x in utterance U_n is the previous utterance, U_{n-1}</Paragraph> <Paragraph position="3"> in dialogues the immediate search space for x is</Paragraph> <Paragraph position="5"> . DAR assumes two antecedent domains depending on whether the pronoun has or has not been recognised as an IPA. The antecedent domain for IPAs is first U_{n-1} and then the preceding utterances in the right frontier of the discourse tree, searched in recency order. The antecedent domain for APAs or anaphors which</Paragraph> <Paragraph position="7"> DAR operates on two lists of DEs, the Ilist and the Alist. The Ilist contains the NPs referred to in U_{n-1}, ranked according to their degree of salience and enriched with information on gender, number, animacy and other simple semantic types necessary to implement selectional restrictions. In the Ilist, information about the grammatical role of nominals is provided and strongly focally marked elements are indicated. The leftmost element in the Ilist is the most salient one. Givenness and focality preferences are accounted for in the Ilist, as illustrated in figure 2. Focally marked entities are put at the front of the list, while the remaining DEs are ordered according to verbal complement order. Inside verbal complements, nominals are ordered according to their occurrence order, as illustrated in the second row of the figure. The abstract entities which are referred to by an APA in U</Paragraph> <Paragraph position="9"> are encoded in the Alist. They are removed from the list after a new utterance (SU in dialogues) has been processed if they have not been mentioned in it.
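The Ilist ordering and the Alist clean-up described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class DE, its field names, the function names and the concrete complement hierarchy in COMPLEMENT_ORDER are hypothetical, since the actual hierarchy is only given in figure 2, which is not reproduced in this text.

```python
from dataclasses import dataclass

# Hypothetical complement hierarchy; the text only says that DEs are
# ordered "according to verbal complement order".
COMPLEMENT_ORDER = ["subject", "object", "prepositional", "other"]


@dataclass
class DE:
    text: str            # surface form of the nominal
    role: str            # grammatical role, one of COMPLEMENT_ORDER
    position: int        # occurrence order within the utterance
    focal: bool = False  # True if the entity is explicitly focally marked


def build_ilist(des):
    """Rank DEs: focally marked first, then complement order, then occurrence."""
    def key(de):
        # False sorts before True, so focal entities come first.
        return (not de.focal, COMPLEMENT_ORDER.index(de.role), de.position)
    return sorted(des, key=key)


def prune_alist(alist, mentioned):
    """Keep only the abstract entities mentioned in the unit just processed."""
    return [entity for entity in alist if entity in mentioned]


# Second utterance of example (1): the subject pronoun hun and the
# focally marked vores nabo, which answers the preceding question.
ilist = build_ilist([
    DE("hun (= din mor)", "subject", 0),
    DE("vores nabo", "object", 1, focal=True),
])
print([de.text for de in ilist])  # ['vores nabo', 'hun (= din mor)']
```

Encoding the three criteria as one sort key keeps the leftmost element of the returned list the most salient one, which is the property the resolution functions rely on.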
The context ranking for abstract entities is that proposed by Eckert and Strube (2000).</Paragraph> </Section> <Section position="2" start_page="7" end_page="7" type="sub_section"> <SectionTitle> 3.2 The Algorithm and Its Functions </SectionTitle> <Paragraph position="0"> DAR consists of two different functions, ResolveDet and ResolveIpa. The former is applied if the current pronoun x is third person singular neuter, while the latter is applied in all the remaining cases: if x is singular and neuter [...] cating possible reference ambiguities resembles that proposed by Kameyama (1996). The main structure of the function ResolveDet is inspired by ES00. ResolveDet tests the pronoun x using the IPA and APA discriminating rules discussed in section 2. ResolveDet is simplified in figure 4. ResolveIpa-neu is like ResolveIpa except that it returns if no NP antecedents are found in U</Paragraph> <Paragraph position="2"> The search space in ES00 is the preceding utterance for all pronouns.</Paragraph> <Paragraph position="3"> ResolveApa distinguishes between types of pronoun. If x is weak, the preferred antecedent is searched for among the elements indicated in the context ranking, unless x is the object of the verb gøre (do), of modals or of have (have), or the abstract subject in copula constructions. In these cases the pronoun is resolved to the VP of the element in the Alist or in the context ranking. If x is strong, ResolveApa attempts to resolve it or classifies it as vague, depending on the type of pronoun. This part of the algorithm is specific to Danish and accounts for the fact that different strong pronouns preferentially refer to different abstract entities in the data. Resolved APAs are inserted into the Alist. In case of failure, ResolveApa returns so that ResolveIpa-neu can be applied.
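The fallback behaviour of ResolveDet can be sketched as follows. This is a hedged illustration, not the algorithm of figure 4: the function name resolve_det, the label values and the resolver signatures are hypothetical, and treating every pronoun not recognised as an APA in the same way (individual antecedent first, abstract antecedent second) is an assumption based on the worked example in section 3.3.

```python
from typing import Callable, Optional

# A resolver takes a pronoun and returns an antecedent, or None on failure.
Resolver = Callable[[str], Optional[str]]


def resolve_det(x: str, label: str,
                resolve_ipa_neu: Resolver,
                resolve_apa: Resolver) -> str:
    """Try both resolvers in an order chosen by the discriminating rules.

    label is a hypothetical name for the outcome of the IPA/APA
    discriminating rules: "APA", or anything else for the other cases.
    """
    if label == "APA":
        order = [resolve_apa, resolve_ipa_neu]  # abstract first, individual as fallback
    else:
        order = [resolve_ipa_neu, resolve_apa]  # individual first, abstract as fallback
    for resolver in order:
        antecedent = resolver(x)
        if antecedent is not None:
            return antecedent
    return "vague"  # both functions failed


# Stub resolvers mimicking example (4): no agreeing NP is available,
# so the abstract resolver proposes the previous utterance.
print(resolve_det("det", "neither",
                  lambda x: None,
                  lambda x: "previous utterance"))  # previous utterance
```

Passing the two resolvers in as callables keeps the sketch independent of how the Ilist, the Alist and the context ranking are actually represented.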
If both functions fail, the pronoun is classified as vague.</Paragraph> </Section> <Section position="3" start_page="7" end_page="7" type="sub_section"> <SectionTitle> 3.3 Some Examples </SectionTitle> <Paragraph position="0"> In the following we look at the resolution of example (1) from section 2. The simplified Ilists and Alists after each utterance has been processed are given in figure 5. (1) contains three SUs. The Ilist, after U has been processed, contains one element, din mor (your mother). In U the personal pronoun hun (she) occurs, thus ResolveIpa is applied. It resolves hun to the compatible NP in the Ilist, din mor. After U has been processed, the Ilist contains two elements in this order: the focally marked entity vores nabo (our neighbour) and the pronoun hun (= din mor). ResolveIpa resolves the occurrence of the pronoun hun (she) in U to the most salient candidate NP in the Ilist, vores nabo. Here focal preference overrides pronominal chain preference. Example (4) contains the APA det. How do you manage it/this (neuter gender))? The simplified Ilists and Alists after the two utterances in (4) have been processed are presented in figure 6. After U has been processed there are two common gender singular NPs in the Ilist, musemarkøren (the mouse cursor) and skærmen (the screen). In U the singular neuter gender pronoun det (it) occurs, thus ResolveDet is applied. The pronoun is neither IPA nor APA according to the discriminating rules. ResolveDet attempts to find an individual antecedent of the weak pronoun, applying the function ResolveIpa-neu. ResolveIpa-neu fails because the two DEs in the Ilist do not agree with the pronoun. Then the function ResolveApa resolves x by looking at the context ranking. Since the Alist is empty, U is proposed as antecedent. The resolved APA is added to the Alist.</Paragraph> </Section> </Section> </Paper>