
File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/abstr/97/w97-1309_abstr.xml
Size: 17,720 bytes
Last Modified: 2025-10-06 13:49:09
<?xml version="1.0" standalone="yes"?>
<Paper uid="W97-1309">
  <Title>Towards Reliable Partial Anaphora Resolution</Title>
  <Section position="1" start_page="0" end_page="0" type="abstr">
    <SectionTitle>
Abstract
</SectionTitle>
    <Paragraph position="0"> This paper assumes that, at present, anaphora resolution at a desired level of reliability has to remain partial. It presents the thesis that multiple small (&quot;expert&quot;) procedures of known reliability, each conceived for partial analysis, have to be developed and combined in order to increase coverage. These resolution experts will be specific to style, domain, task, etc. The paper describes a corpus analysis that suggests such experts and their potential ordering. A quick and partial implementation of the ideas is evaluated on Wall Street Journal articles.</Paragraph>
    <Paragraph position="1"> Introduction. Totally correct anaphora resolution requires full natural language understanding, since anaphoric relations can be hidden in the context. At present, only partial natural language understanding is possible. This paper claims that one way to increase the reliability of anaphora resolution (or at least to assess it) lies in acknowledging and making use of this limitation.</Paragraph>
    <Paragraph position="2"> Strategies of anaphora resolution depend on the genre and style of text under consideration, as the different style manuals for major newspapers show. Since many practical applications are limited to a certain genre, it is legitimate to optimize results by studying peculiarities of the genre.</Paragraph>
    <Paragraph position="3"> We focus our attention on the Wall Street Journal corpus available on CD-ROM from the Association for Computational Linguistics. Our main interest at the outset was in assessing the lexical complexity of NP coreference to guide us in the development of a lexicon. We reported initial corpus analysis results that show the relative frequency of semantic relations that hold between elements in coreference chains [Bergler and Knoll, 1996]. Analyzing the reference chains¹ of 79 articles (28,798 words) from the Wall Street Journal, we found that 35% of the subsequent references are actually equal to the first reference to that entity and 23% are close variations of the first reference (i.e., they retain at least the same headword). Pronouns and appositions account for 22%, and systematic lexical relations (synonymy and hyponymy) for 7%. We consider the remaining 13% to be tough cases that might require full syntactic, lexical, and semantic processing; we expect that the other 87% can be addressed with a subset of these tools.</Paragraph>
    <Paragraph position="4"> ¹ Reference chains in this study contain all (partial) noun phrases that corefer in a text. They thus differ from those of [Morris and Hirst, 1991], who do not limit their reference chains to NPs.
* This work is funded in part by the Natural Sciences and Engineering Research Council of Canada and Fonds pour la formation de chercheurs et l'aide à la recherche.</Paragraph>
    <Paragraph position="5"> This paper presents further results of this corpus analysis that suggest several resolution strategies, and describes an experiment implementing some of these strategies in a knowledge-poor system.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
Corpus Analysis Results
</SectionTitle>
      <Paragraph position="0"> The corpus study of 79 articles from the Wall Street Journal was performed manually by a single analyst, so the results are as consistent as a manual analysis allows. The analyst separated all NPs from the text (when appropriate, separating referring sub-NPs from larger NPs) and placed each NP either in a chain that contained a coreferent or, when there was none, in a new chain. Chains were additionally annotated for a few features of interest, such as the length of the article (short or long), the textual designator (see below), and whether the chain was part of the topic of the article (see Topicality below). These annotations were ultimately left to the judgement of the analyst, within a strict set of rules.</Paragraph>
      <Paragraph position="1"> Particularly encouraging are the first two lines in Figure 1. It turns out that over a third of the coreferring NPs are identical to the first reference and can therefore be recovered reliably and correctly without linguistic tools. Almost a quarter of the coreferring NPs are very close to the first NP in the reference chain; that is, they share at least the headword, if not a larger substring, with the first reference. Thus, in theory, almost 60% of coreferring NPs should be identifiable with very simple techniques once the NPs have been identified.</Paragraph>
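The "equal" and "close" categories can be checked mechanically once NP boundaries are known. The following minimal Python sketch assumes NPs are given as plain strings and approximates the headword by the last token of the NP; this head rule is an assumption, not taken from the paper, and it fails for post-modified NPs such as "vice president for manufacturing".

```python
def tokens(np: str) -> list[str]:
    """Lowercased, whitespace-normalized tokens of an NP string."""
    return np.lower().split()


def head(np: str) -> str:
    """Assumed headword: the last token (a crude approximation)."""
    toks = tokens(np)
    return toks[-1] if toks else ""


def classify(first_ref: str, later_ref: str) -> str:
    """Relate a subsequent reference to the first reference of its chain."""
    if tokens(first_ref) == tokens(later_ref):
        return "equal"   # plain string matching, no linguistic tools
    if head(first_ref) and head(first_ref) == head(later_ref):
        return "close"   # shares at least the (assumed) headword
    return "other"       # needs pronoun, lexical, or deeper processing


print(classify("its Houston work force", "its Houston work force"))  # equal
print(classify("the personnel changes", "these changes"))            # close
print(classify("Telxon Corp.", "The company"))                       # other
```

The third call shows where such simple matching stops: relating "The company" to "Telxon Corp." needs some other expert.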
      <Paragraph position="2"> To put things into perspective, let us reconsider the numbers in Figure 1 with some additional information.</Paragraph>
      <Paragraph position="3"> Figure 1:
Semantic relation | Total
Equal to first reference | 1,424
Close to first reference | 955
…
The 4,076 NPs analyzed there constitute roughly half of the 8,027 NPs counted in total in that collection of texts. The 3,951 NPs that are not analyzed in Figure 1 are NPs that do not corefer with any other NP in the text. These singular occurrences account for 49% of all NPs. One obvious question is whether singular occurrences of entities can be singled out; this remains an open question. We address an easier problem: how can we determine NPs that are likely to corefer and that are most important to the text overall?
Topicality. The same study showed that NP chains considered to be in the topic of an article usually require anaphora resolution and are lexically more complex than non-topical reference chains. For this study we defined a topic to be one of the NPs that occur in the headline or the first sentence² (see [Lundquist, 1989] for a motivation of this heuristic). A text can have no more than 4 topics (this limit was chosen intuitively). The analyst decided how many topics each article had according to her understanding of it. The text in Figure 3 was assigned a single topical chain containing NPs 1, 2.1, 10, and 11. There are 185 topical chains in 79 articles, averaging 2.4 topics per article. 17% of the topical reference chains are singular occurrences, i.e., NPs that do not corefer.³ This establishes that the topic of an article is usually referred to more than once. Intuitively, we assume that the topic of an article is more important to resolve than non-topical NPs. One partial strategy, consequently, is to establish potentially topical NPs in the first n sentences of a newspaper article and to resolve coreference only to these NPs. This strategy has the advantage of considerably reducing the search space and of focusing on important (topical) chains.</Paragraph>
      <Paragraph position="4"> ² The heuristic to find potential topics of an article is easy to implement: consider all NPs in the headline and the first sentence.</Paragraph>
      <Paragraph position="5"> ³ There are two possible explanations for why this number is relatively high. First, the Wall Street Journal contains several very short segments of only a few sentences, in which a topic may well be mentioned only once. Second, headlines often use a summarizing term that never corefers with another NP but rather corefers with the article as a whole.</Paragraph>
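The topic heuristic of footnote 2 and the focused partial resolution strategy can be sketched together. This Python illustration assumes each article arrives as a list of headline NPs plus a list of sentences, each a list of NP strings; the matcher `same_head` (equality of the last token) is a hypothetical placeholder for whatever resolution procedures a real system would plug in, and the headline NP in the example is invented.

```python
from typing import Callable


def potential_topics(headline_nps: list[str], sentences: list[list[str]], n: int = 1) -> list[str]:
    """All NPs in the headline and the first n sentences, duplicates removed."""
    nps = list(headline_nps) + [np for sent in sentences[:n] for np in sent]
    return list(dict.fromkeys(nps))  # keeps first-occurrence order


def same_head(a: str, b: str) -> bool:
    """Hypothetical matcher: case-insensitive equality of the last token."""
    return a.lower().split()[-1:] == b.lower().split()[-1:]


def focused_resolution(headline_nps, sentences, n=1,
                       matches: Callable[[str, str], bool] = same_head):
    """Resolve NPs after the first n sentences only against potential topics."""
    topics = potential_topics(headline_nps, sentences, n)
    chains = {topic: [topic] for topic in topics}
    for sent in sentences[n:]:
        for np in sent:
            for topic in topics:
                if matches(topic, np):
                    chains[topic].append(np)
                    break  # attach to the first matching topic only
            # NPs matching no topic are simply left unresolved
    return chains


# Toy input loosely modeled on Figure 3; the headline NP is invented.
headline = ["Telxon"]
sentences = [
    ["Telxon Corp.", "its vice president for manufacturing", "its Houston work force"],
    ["The company", "a successor"],
    ["its Houston work force"],
]
chains = focused_resolution(headline, sentences)
print(chains["its Houston work force"])  # ['its Houston work force', 'its Houston work force']
```

Only NPs that match a potential topic are ever compared, which is where the reduction of the search space comes from; "The company" stays unlinked here because the placeholder matcher is too weak, not because of the focusing itself.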
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
Textual Designators
</SectionTitle>
      <Paragraph position="0"> We feel that NP resolution in newspaper articles is straightforward (by design) for human readers because of recurrent terms that designate an entity. To assess this intuition quantitatively, [Bergler and Knoll, 1996] report the sum of distinct words within a chain over all non-singular chains. This count eliminates recurrent words, whether empty (the) or descriptive (company).⁴ The results show the significance of this phenomenon, which we call the textual designator. A textual designator in this study is the first non-pronominal reference to an entity. Consider the text in Figure 3.</Paragraph>
      <Paragraph position="1"> One chain consists of two identical NPs, its Houston work force (NPs 3 and 15). Other textual designators in that text are NP1, NP6, and NP16. Counting the number of different words in each chain allows us to assess the lexical diversity within a chain and the contribution of the textual designator to that diversity. The results are summarized in Figure 2.</Paragraph>
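The per-chain counts can be illustrated with a small Python sketch. It assumes a chain is an ordered list of NP strings, splits words on whitespace, and uses a hypothetical pronoun list to find the first non-pronominal reference, i.e. the textual designator. The example is chain one of Figure 3 (NPs 1, 2.1, 10, and 11).

```python
from typing import Optional

# Hypothetical pronoun list; the study's exact criteria are not given.
PRONOUNS = {"it", "its", "he", "him", "his", "she", "her", "they", "them", "their"}


def distinct_words(chain: list[str]) -> set[str]:
    """All different (lowercased) words used anywhere in a reference chain."""
    return {word for np in chain for word in np.lower().split()}


def textual_designator(chain: list[str]) -> Optional[str]:
    """First non-pronominal reference to the entity, if any."""
    for np in chain:
        if np.lower() not in PRONOUNS:
            return np
    return None


def words_outside_designator(chain: list[str]) -> set[str]:
    """Different words in the chain that do not occur in its textual designator."""
    designator = textual_designator(chain)
    excluded = distinct_words([designator]) if designator else set()
    return distinct_words(chain) - excluded


chain_one = ["Telxon Corp.", "its", "The company", "it"]
print(len(distinct_words(chain_one)))               # 6
print(sorted(words_outside_designator(chain_one)))  # ['company', 'it', 'its', 'the']
```

For the chain its Houston work force (NPs 3 and 15), every word belongs to the designator, so the count outside it is zero.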
      <Paragraph position="2">  We find 1,632 different words in topical chains and 19,449 different words in non-topical chains. When we consider only the chains that actually involve coreference, the sums reduce to 1,544 different words for topical chains and 5,672 different words for non-topical chains. Removing the words that are part of the textual designator, we observe a drastic reduction to 981 different words for topical chains and 2,808 different words for non-topical chains.</Paragraph>
      <Paragraph position="3"> The 185 topical chains average about 9 different words per chain while non-topical chains average 4.</Paragraph>
      <Paragraph position="4"> Removing the words of the textual designator reduces the different-word count by 40% for topical and 86% for non-topical chains when counting all chains. If we count only chains that involve coreference, the reductions are 36% for topical chains and 50% for non-topical chains. On average, the number of different words excluding the words of the textual designator is 5 for topical chains and 0.6 for non-topical chains.⁵ These numbers suggest that the textual designator, defined as the first reference to an entity, leads to a strong resolution heuristic, one that is in fact stronger for non-topical chains. The numbers also show how surprisingly small the lexical diversity of words outside a textual designator is: the sum for both types of chains is 3,789. The textual designators are made up of 563 different words for topical non-singular chains and 2,864 different words for non-topical non-singular chains. The total number of different words for singular chains is 13,865. Thus the sum total of different words for first references is 17,292, a surprisingly high number considering that the overall corpus has only 28,798 words, including all duplicates.</Paragraph>
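The percentages and sums in this paragraph follow directly from the different-word counts reported for topical and non-topical chains. The short Python check below recomputes them; the variable names are only illustrative.

```python
# Different-word counts reported above.
all_chains = {"topical": 1632, "non_topical": 19449}
coref_chains = {"topical": 1544, "non_topical": 5672}       # chains with coreference
outside_designator = {"topical": 981, "non_topical": 2808}  # words outside the designator


def reduction(before: int, after: int) -> int:
    """Percentage reduction from `before` to `after`, rounded to a whole percent."""
    return round(100 * (before - after) / before)


for kind in ("topical", "non_topical"):
    print(kind,
          reduction(all_chains[kind], outside_designator[kind]),    # all chains
          reduction(coref_chains[kind], outside_designator[kind]))  # coreferring only
# topical 40 36
# non_topical 86 50

# Words of the textual designators of non-singular chains, and of singular chains.
designator_words = {k: coref_chains[k] - outside_designator[k] for k in coref_chains}
singular_words = sum(all_chains.values()) - sum(coref_chains.values())
print(designator_words, singular_words)                 # {'topical': 563, 'non_topical': 2864} 13865
print(sum(designator_words.values()) + singular_words)  # 17292
print(sum(outside_designator.values()))                 # 3789
```

The singular-chain figure is the difference between the all-chain and coreferring-chain counts, since the per-chain sums are additive over chains.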
    </Section>
    <Section position="3" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
Resolution Strategies
</SectionTitle>
      <Paragraph position="0"> While we are still mining the results of our study for more data, some strategies for resolution in general are already emerging.</Paragraph>
      <Paragraph position="1"> We believe that there is strong evidence for an approach to anaphora resolution in multiple passes, where earlier passes implement more reliable,⁶ less knowledge-intensive, and computationally less complex strategies with faster tools than later ones. Ideally, an expected reliability should be known for each pass.</Paragraph>
      <Paragraph position="2"> The expectation is that early, knowledge-poor, and highly reliable passes can be used for almost any task.</Paragraph>
      <Paragraph position="3"> Matching equal NPs, for instance, can be done with basic, fast tools independently of further linguistic processing. Anaphora resolution could then be tailored to particular needs by determining which levels of reliability are acceptable for the task and using the passes⁷ up to that threshold. For the remaining resolution task, a domain- and genre-specific set of procedures has to be developed.</Paragraph>
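The multi-pass idea can be sketched as an ordered list of passes, each tagged with an expected reliability, applied until the task's acceptable threshold is reached. In this Python illustration the two passes and their reliability values are invented placeholders, not measurements from the study; NPs that no pass links simply stay unresolved.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Pass:
    name: str
    reliability: float                # expected precision (placeholder values below)
    link: Callable[[str, str], bool]  # does the later NP corefer with the earlier one?


def identical(a: str, b: str) -> bool:
    return a.lower().split() == b.lower().split()


def common_head(a: str, b: str) -> bool:
    return a.lower().split()[-1:] == b.lower().split()[-1:]


PASSES = [Pass("common head", 0.90, common_head), Pass("identical", 0.99, identical)]


def run_passes(nps: list[str], passes: list[Pass], threshold: float) -> dict[int, tuple[int, str]]:
    """Apply passes from most to least reliable, stopping below the threshold.

    Returns {anaphor index: (antecedent index, pass name)}; the earliest
    matching antecedent is chosen, since NPs often resolve to the first reference.
    """
    links: dict[int, tuple[int, str]] = {}
    for p in sorted(passes, key=lambda q: q.reliability, reverse=True):
        if threshold > p.reliability:
            break  # all remaining passes are even less reliable
        for j in range(len(nps)):
            if j in links:
                continue  # already resolved by a more reliable pass
            for i in range(j):
                if p.link(nps[i], nps[j]):
                    links[j] = (i, p.name)
                    break
    return links


nps = ["Telxon Corp.", "its Houston work force", "The company",
       "its Houston work force", "the work force"]
print(run_passes(nps, PASSES, threshold=0.95))  # {3: (1, 'identical')}
print(run_passes(nps, PASSES, threshold=0.80))  # {3: (1, 'identical'), 4: (1, 'common head')}
```

Raising the threshold trades coverage for reliability without touching the passes themselves, which is the tailoring the paragraph describes.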
      <Paragraph position="4"> We argue that such a multi-pass approach has advantages over a monolithic approach, be it statistical or symbolic. While most symbolic anaphora resolution systems probably identify identical NPs as coreferring correctly, making this a separate first step that uses very fast, low-level tools can pre-process a text faster. The modular approach allows for the use of the tools of choice at each level. Moreover, a text can be left partly resolved, potentially allowing anaphora resolution to be interleaved with other linguistic processing as required.</Paragraph>
      <Paragraph position="5"> Another interesting result of our study is the fact that many NPs correctly resolve to more than one coreferring NP, and often resolve to the first reference.</Paragraph>
      <Paragraph position="6"> This provides support for the viability of partial parsing methods, because a missing link does not mean that the rest of the chain is unresolvable. As mentioned above, this also allows for a focused partial resolution strategy that attempts to resolve subsequent NPs only to a set of NPs determined at the outset (e.g., topical NPs, predetermined subjects or persons, ...). This provides the basis for a series of principled heuristics. These partial resolution strategies are of great importance where the amount of text to be processed is large but the depth of processing is shallow, as for certain text annotation tasks.
⁷ … course be combined into a single pass, which is the case in our experimental system presented below.</Paragraph>
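Because an NP may link to any earlier member of its chain, chains can be kept as disjoint sets: a missed link between two particular NPs does not split a chain as long as some other link connects them. The following Python union-find sketch is one standard way to do this; the paper does not say how ERS stores its chains. The NP ids follow Figure 3, and the missed 10 to 11 link is hypothetical.

```python
class Chains:
    """Reference chains as disjoint sets (union-find) over NP ids."""

    def __init__(self) -> None:
        self.parent: dict[str, str] = {}

    def find(self, x: str) -> str:
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def link(self, a: str, b: str) -> None:
        """Record that NPs a and b corefer, merging their chains."""
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra

    def groups(self) -> list[set[str]]:
        out: dict[str, set[str]] = {}
        for x in list(self.parent):
            out.setdefault(self.find(x), set()).add(x)
        return list(out.values())


chains = Chains()
# Suppose the link between NPs 10 and 11 is missed: both still link to NP 1.
chains.link("1", "2.1")
chains.link("1", "10")
chains.link("1", "11")
chains.link("6", "9.1")
print(len(chains.groups()))  # 2
chains.link("1", "6")        # one extra link merges the two chains
print(len(chains.groups()))  # 1
```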
      <Paragraph position="7"> The study presented above suggests the following strategies:
1. Identical NPs. Prerequisites: NP boundaries. Procedure: string matching.
2. Focused partial resolution. Prerequisites: NP boundaries; procedures that identify the NP chains of interest. Procedure: according to the desired resolution strategies.
3. Common head. … Procedure: as described in the literature.
These resolution strategies are not exhaustive, nor can an optimal ordering be assigned in general; the desired level of reliability, the genre and style features, and any possible additional linguistic processing will determine different combinations, extensions, and orderings.</Paragraph>
    </Section>
    <Section position="4" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
Knowledge Poor Resolution
</SectionTitle>
      <Paragraph position="0"> We implemented an experimental resolution system (ERS) based on these ideas. The system uses as input the parse trees provided by the Penn Treebank on the ACL CD-ROM. These parse trees have been corrected manually for inconsistencies.</Paragraph>
      <Paragraph position="1"> The system includes a (partial) implementation of all of the six strategies except Focused Partial Resolution. The most carefully worked-out strategy is pronoun resolution, which makes use of the parse trees and follows the ideas in [Lappin and Leass, 1994; Hobbs, 1978]. The other strategies have only been partially or crudely implemented. The Common Head strategy, for instance, uses a crude heuristic to determine the head of a complex noun phrase, which fails in certain cases. Extended Head Matching is limited to very few lexical items, such as company, which receive special treatment. No lexicon is used; the required lexical knowledge is provided as a list of gendered items. An additional limitation is that the system considers coreference only within a sentence and between adjacent sentences.</Paragraph>
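A crude head heuristic, special treatment for company, and a list of gendered items might look roughly as follows. All word lists in this Python sketch are hypothetical stand-ins, since the paper does not give them, and the head rule (last token before the first preposition) is only one possible crude heuristic; as the example shows, it fails on NPs with relative clauses.

```python
from typing import Optional

# Hypothetical word lists; the actual ERS lists are not given in the paper.
PREPOSITIONS = {"of", "for", "in", "on", "at", "to", "with", "by"}
CORPORATE_SUFFIXES = {"corp.", "inc.", "co.", "ltd."}
GENDERED_ITEMS = {"he": "m", "him": "m", "his": "m", "mr.": "m",
                  "she": "f", "her": "f", "ms.": "f", "mrs.": "f",
                  "it": "n", "its": "n", "company": "n"}


def crude_head(np: str) -> str:
    """Last token before the first preposition (crude; see the examples)."""
    head = ""
    for tok in np.lower().split():
        if tok in PREPOSITIONS:
            break
        head = tok
    return head


def extended_head_match(antecedent: str, anaphor: str) -> bool:
    """Special case: 'the company' may corefer with a name such as 'X Corp.'."""
    return crude_head(anaphor) == "company" and crude_head(antecedent) in CORPORATE_SUFFIXES


def gender(np: str) -> Optional[str]:
    """Gender from a leading title or from the head; None if unknown."""
    toks = np.lower().split()
    if toks and toks[0].endswith(".") and toks[0] in GENDERED_ITEMS:
        return GENDERED_ITEMS[toks[0]]
    return GENDERED_ITEMS.get(crude_head(np))


def agrees(a: str, b: str) -> bool:
    """Unknown gender is treated as compatible with anything."""
    ga, gb = gender(a), gender(b)
    return ga is None or gb is None or ga == gb


print(crude_head("its vice president for manufacturing"))        # president
print(crude_head("the firm that makes computers"))               # computers (wrong head)
print(extended_head_match("Telxon Corp.", "The company"))        # True
print(agrees("Mr. Burton", "it"), agrees("Ronald Burton", "it"))  # False True
```

The last line shows the cost of having no lexicon: without a title, a person's name has unknown gender and is not ruled out as an antecedent of it.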
      <Paragraph position="2"> Algorithm. For every NP in the text:
1. Determine candidate referents within the sentence. If none are found (i.e., because of lack of agreement), determine candidate referents in the previous sentence.</Paragraph>
      <Paragraph position="4"> 2. Test each candidate referent for actual coreference using: (a) Common Head (with slight modifications); (b) Extended Head Matching (limited to a few cases); (c) Appositions; (d) Copula.</Paragraph>
      <Paragraph position="5"> 3. If there is more than one possible coreference, select the best.</Paragraph>
      <Paragraph position="6"> 4. Merge the new coreference pair with existing reference chains or start a new chain.</Paragraph>
      <Paragraph position="7"> (1 Telxon Corp. 1) said (2 (2.1&lt;&lt;ref=1 its 2.1) vice president for manufacturing 2) resigned and (3 (3.1&lt;ref=1 its 3.1) Houston work force 3) has been trimmed by (4 40 people 4), or about …
(6&lt;ref=1 The maker of hand-held computers and computer systems 6) said (7 the personnel changes 7) were needed to improve (8 the efficiency 8) of (9 (9.1&lt;ref=s its 9.1) manufacturing operation 9).
(10&lt;&lt;ref=1 The company 10) said (11&lt;&lt;ref=10 it 11) hasn't named (12 a successor 12) to (13&gt;ref=14 Ronald Burton 13), …
Figure 3: … text from the Wall Street Journal</Paragraph>
      <Paragraph position="8"> Sample Output. This algorithm is clearly too constrained ever to achieve full resolution, but, except for pronoun resolution, it was implemented quickly and performs surprisingly well.</Paragraph>
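The four steps of the algorithm can be approximated in a few dozen lines of Python. This is a hedged sketch, not ERS itself. It works on plain NP strings rather than Penn Treebank parse trees, so Appositions and Copula are omitted. Pronouns simply take the closest agreeing candidate instead of a Lappin and Leass style procedure, "select the best" is assumed to mean "closest", and the gender and suffix lists are hypothetical.

```python
from typing import Optional

# Hypothetical lists standing in for ERS's list of gendered items.
PRONOUN_GENDER = {"he": "m", "him": "m", "his": "m", "she": "f", "her": "f",
                  "it": "n", "its": "n"}
NOUN_GENDER = {"company": "n", "corp.": "n", "inc.": "n"}
CORP_SUFFIXES = {"corp.", "inc.", "co."}

NP = tuple[int, int]  # (sentence index, NP index within the sentence)


def head(np: str) -> str:
    toks = np.lower().split()
    return toks[-1] if toks else ""


def gender(np: str) -> Optional[str]:
    h = head(np)
    return PRONOUN_GENDER.get(h) or NOUN_GENDER.get(h)


def agrees(a: str, b: str) -> bool:
    ga, gb = gender(a), gender(b)
    return ga is None or gb is None or ga == gb


def corefers(antecedent: str, anaphor: str) -> bool:
    """Step 2, reduced to Common Head and Extended Head Matching."""
    if head(anaphor) in PRONOUN_GENDER:
        return True  # stand-in: a pronoun accepts any agreeing candidate
    if head(antecedent) == head(anaphor):
        return True
    return head(anaphor) == "company" and head(antecedent) in CORP_SUFFIXES


def resolve(sentences: list[list[str]]) -> list[set[NP]]:
    parent: dict[NP, NP] = {}

    def find(x: NP) -> NP:
        parent.setdefault(x, x)
        while parent[x] != x:
            x = parent[x]
        return x

    for s, sent in enumerate(sentences):
        for p, np in enumerate(sent):
            find((s, p))  # every NP starts in its own chain
            # Step 1: agreeing candidates in this sentence, else the previous one.
            cands = [(s, q) for q in range(p) if agrees(sent[q], np)]
            if not cands and s:
                prev = sentences[s - 1]
                cands = [(s - 1, q) for q in range(len(prev)) if agrees(prev[q], np)]
            # Step 2: keep candidates that pass a coreference test.
            ok = [c for c in cands if corefers(sentences[c[0]][c[1]], np)]
            # Step 3: "select the best", assumed here to mean the closest candidate.
            if ok:
                # Step 4: merge the new pair into the candidate's chain.
                parent[find((s, p))] = find(ok[-1])
    chains: dict[NP, set[NP]] = {}
    for x in list(parent):
        chains.setdefault(find(x), set()).add(x)
    return sorted(chains.values(), key=min)


sentences = [
    ["Telxon Corp.", "its vice president"],
    ["The company", "it", "a successor"],
]
print([sorted(c) for c in resolve(sentences)])
# [[(0, 0), (1, 0), (1, 1)], [(0, 1)], [(1, 2)]]
```

As in ERS, the bias towards intrasentential resolution is built in: a pronoun takes the closest agreeing NP in its own sentence before the previous sentence is even considered.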
      <Paragraph position="9"> Both strengths and limitations are best illustrated with a short example. Consider the text in Figure 3, which has been annotated with manually determined coreference links following the Lancaster notation [Fligelstone, 1992], with slight modifications. The annotation (2.1&lt;ref=1 means that NP2.1 (a sub-NP of NP2) starts at this point and that it refers backwards to NP1. The &lt;&lt; sign indicates that this reference has also been detected by ERS.</Paragraph>
      <Paragraph position="10"> ERS determined 4 reference chains in this article. The first chain consists of NPs 1, 2.1, 10, and 11. The second chain contains NPs 6 and 9.1, the third contains NPs 3 and 15, and chain number four contains NPs 16 and 15.1. The coreference link stipulated for chain four is wrong (an artifact of a strong bias towards intrasentential resolution). All other stipulated coreference links are correct. There are six coreferring NPs whose coreference link has not been identified. Two of the stipulated chains (chains one and two) could be merged.</Paragraph>
    </Section>
  </Section>
</Paper>