<?xml version="1.0" standalone="yes"?> <Paper uid="W97-1312"> <Title>United Kingdom</Title> <Section position="3" start_page="0" end_page="82" type="intro"> <SectionTitle> 1. Introduction </SectionTitle> <Paragraph position="0"> Annotated anaphoric links in language corpora play an important role in teaching and research.</Paragraph> <Paragraph position="1"> Research roles may include investigation into the distribution of the different types of anaphors, or into the location or distance of the antecedent, and also development of rules or heuristics for anaphora resolution and the testing of anaphora-related hypotheses/theories on the basis of numerous real-life examples.</Paragraph> <Paragraph position="2"> Annotation of referential links has not yet been able to benefit from the level of automation enjoyed by its lexical, syntactic and semantic &quot;counterparts&quot;. Part-of-speech tagging has shown remarkable accuracy (99.2%, see [Voutilainen 95]), robust parsing in corpora has delivered very good results and even word sense tagging has reported a considerable improvement. However, &quot;referential tagging&quot; has not been fully explored (and developed), and this is probably due to the complexity of automatic anaphora resolution.</Paragraph> <Paragraph position="3"> One of the best known tools for anaphoric annotation is XANADU, an X-windows interactive editor written by Roger Garside, which offers the user an easy-to-navigate environment for manually marking anaphor-antecedent pairs ([Fligelstone 92]). Manual annotation, however, imposes a considerable demand on human time and labour.</Paragraph> <Paragraph position="4"> In this paper we put forward the idea of incorporating a practical, knowledge-poor approach to anaphora resolution ([Mitkov 97]) within a larger architecture for rough automatic referential annotation of corpora. 
At this stage our proposal deals with pronominal anaphora only, and &quot;rough annotation&quot; implies that a follow-up manual correction would be necessary. Nevertheless, we believe that this partial solution brings us somewhat closer to the automatic annotation of all types of anaphoric links.</Paragraph> <Paragraph position="5"> 2. Outline of our practical pronoun resolution approach With a view to avoiding complex syntactic, semantic and discourse analysis (vital for real-world applications), we have developed a practical approach to pronoun resolution ([Mitkov 97]) which does not parse and analyse the input in order to identify antecedents of anaphors. It uses only a part-of-speech tagger, plus simple noun phrase rules (sentence constituents are identified at the level of noun phrase at most) and operates on the basis of antecedent-tracking preferences (referred to hereafter as &quot;antecedent indicators&quot;).</Paragraph> <Section position="1" start_page="82" end_page="82" type="sub_section"> <SectionTitle> 2.1 Antecedent indicators </SectionTitle> <Paragraph position="0"> Our empirical study (restricted to computer and hi-fi technical manuals) enabled us to develop efficient preferences for antecedent tracking in this sublanguage/genre. (We studied more than 400 different documents which had been hand-annotated; referential links were marked by human experts.) 
These antecedent indicators are described in detail in [Mitkov 97]; we shall outline here those which are most frequently used as a supplement to gender and number agreement:</Paragraph> <Paragraph position="2"> NPs representing terms in the field are more likely to be the antecedent than NPs which are not terms (scores 1 if the NP is a term and</Paragraph> <Paragraph position="4"> If the verb is a member of the Verb_set = {discuss, present, illustrate, summarise, examine, describe, define, show, check, develop, review, report, outline, consider, investigate, explore, assess, analyse, synthesise, study, survey, deal, cover}, then consider the first NP following it as the preferred antecedent (scores 1 and 0).</Paragraph> <Paragraph position="5"> (Empirical evidence suggests that, because of their salience, the NPs following the verbs listed above are particularly likely candidates for antecedent.) These two preferences can be illustrated by the example: This table shows a minimal configuration_i; it_i does not leave much room for additional applications or other software for which you may require additional swap space.</Paragraph> </Section> </Section> </Paper>