<?xml version="1.0" standalone="yes"?> <Paper uid="A00-1020"> <Title>Multilingual Coreference Resolution</Title> <Section position="4" start_page="142" end_page="143" type="metho"> <SectionTitle> 2 COCKTAIL </SectionTitle> <Paragraph position="0"> Currently, some of the best-performing and most robust coreference resolution systems employ knowledge-based techniques. Traditionally, these techniques have combined extensive syntactic, semantic, and discourse knowledge. The acquisition of such knowledge is time-consuming, difficult, and error-prone. Nevertheless, recent results show that knowledge-poor methods perform with remarkable accuracy (cf. (Mitkov, 1998), (Kennedy and Boguraev, 1996), (Kameyama, 1997)). For example, CogNIAC (Baldwin, 1997), a system based on seven ordered heuristics, generates high-precision resolution (over 90%) for some cases of pronominal reference. For this research, we used a coreference resolution system (Harabagiu and Maiorano, 1999) that implements different sets of heuristics corresponding to various forms of coreference. This system, called COCKTAIL, resolves coreference by exploiting several textual cohesion constraints (e.g. term repetition) combined with lexical and textual coherence cues (e.g. subjects of communication verbs are more likely to refer to the last person mentioned in the text). These constraints are implemented as a set of heuristics ordered by their priority. Moreover, the COCKTAIL framework uniformly addresses the problem of interaction between different forms of coreference, thus making the extension to multilingual coreference very natural.</Paragraph> <Section position="1" start_page="142" end_page="143" type="sub_section"> <SectionTitle> 2.1 Data-Driven Coreference Resolution </SectionTitle> <Paragraph position="0"> In general, we define a data-driven methodology as a sequence of actions that captures the data patterns capable of resolving a problem with both high precision and high recall. The data-driven methodology reported here generated sets of heuristics for the coreference resolution problem. Precision is the number of correct references out of the total number of coreferences resolved, whereas recall measures the number of resolved references out of the total number of keys, i.e., the annotated coreference data.</Paragraph> <Paragraph position="1"> The data-driven methodology used in COCKTAIL is centered around the notion of a coreference chain.</Paragraph> <Paragraph position="2"> Due to the transitivity of coreference relations, k coreference relations having at least one common argument generate k + 1 coreferring expressions. The text position induces an order among coreferring expressions. A coreference structure is created when a set of coreferring expressions is connected in an oriented graph such that each node is related to exactly one of its preceding nodes. In turn, a coreference chain is the coreference structure in which every node is connected to its immediately preceding node. Clearly, multiple coreference structures for the same set of coreferring expressions can be mapped to a single coreference chain. As an example, both coreference structures illustrated in Figure 1(a) and (c) are cast into the coreference chain illustrated in Figure 1(b). Given a corpus annotated with coreference data, the data-driven methodology first generates all coreference chains in the data set and then considers all possible combinations of coreference relations that would generate the same coreference chains.
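To make the chain/structure distinction concrete, the following short Python sketch (ours, not part of COCKTAIL) enumerates every coreference structure over a set of coreferring expressions and verifies that all of them collapse to the same coreference chain; the count it checks anticipates the l! argument developed next.

from itertools import product
from math import factorial

def structures(l):
    # For a chain of length l over nodes n_1 .. n_{l+1}, node n_k
    # (k = 2 .. l+1) may take any of the k - 1 preceding nodes as its
    # antecedent; a structure fixes one such choice per node.
    return list(product(*[range(1, k) for k in range(2, l + 2)]))

def collapse(structure, l):
    # Collapsing a structure into a chain: the coreferring nodes, read
    # in text order, are relinked each to its immediate predecessor.
    nodes = sorted(set(range(2, l + 2)) | set(structure))
    return tuple(zip(nodes, nodes[1:]))

l = 3
all_structures = structures(l)
print(len(all_structures) == factorial(l))            # True: l! structures
print(len({collapse(s, l) for s in all_structures}))  # 1: a single chain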
For a coreference chain of length l with nodes n_1, n_2, ..., n_{l+1}, each node n_k (1 &lt; k &lt;= l+1) can be connected to any of the k - 1 nodes preceding it. From this observation, we find that a number of 1 x 2 x ... x (k - 1) x ... x l = l! coreference structures can generate the same coreference chain. This result is very important, since it allows for the automatic generation of coreference data. For each coreference relation R from an annotated corpus we created a median of (l - 1)! new coreference relations, where l is the length of the coreference chain containing relation R. This observation gave us the possibility of expanding the test data provided by the coreference keys available in the MUC-6 and MUC-7 competitions (MUC-6 1996), (MUC-7 1998). The MUC-6 coreference annotated corpus contains 1626 coreference relations, while the MUC-7 corpus has 2245 relations. The average length of a coreference chain is 7.21 for the MUC-6 data, and 8.57 for the MUC-7 data. We were able to expand the number of annotated coreference relations to 6,095,142 for the MUC-6 corpus and to 8,269,403 relations for the MUC-7 corpus; this represents an expansion factor of 3,710. We are not aware of any other automated way of creating coreference annotated data, and we believe that much of COCKTAIL's performance is due to the plethora of data provided by this method.</Paragraph> <Paragraph position="3"> Heuristics for 3rd person pronouns o Heuristic 1-Pronoun (H1Pron) Search in the same sentence for the same 3rd person pronoun Pron' if (Pron' belongs to coreference chain CC) and there is an element from CC which is closest to Pron in Text, Pick that element.</Paragraph> <Paragraph position="4"> else Pick Pron'.</Paragraph> <Paragraph position="5"> o Heuristic 2-Pronoun (H2Pron) Search for PN, the closest proper name from Pron if (PN agrees in number and gender with Pron) if (PN belongs to coreference chain CC) then Pick the element from CC which is closest to Pron in Text.</Paragraph> <Paragraph position="6"> else Pick PN.</Paragraph> <Paragraph position="7"> o Heuristic 3-Pronoun (H3Pron) Search for Noun, the closest noun from Pron if (Noun agrees in number and gender with Pron) if (Noun belongs to coreference chain CC) and there is an element from CC which is closest to Pron in Text, Pick that element.</Paragraph> <Paragraph position="8"> else Pick Noun. Heuristics for nominal reference o Heuristic 1-Nominal (H1Nom) if (Noun is the head of an appositive) then Pick the preceding NP.</Paragraph> <Paragraph position="9"> o Heuristic 2-Nominal (H2Nom) if (Noun belongs to an NP, Search for NP' such that Noun' = same_name(head(NP), head(NP')) or Noun' = same_name(adjunct(NP), adjunct(NP'))) then if (Noun' belongs to coreference chain CC) then Pick the element from CC which is closest to Noun in Text.</Paragraph> <Paragraph position="10"> else Pick Noun'.</Paragraph> <Paragraph position="11"> o Heuristic 3-Nominal (H3Nom) if Noun is the head of an NP then Search for proper name PN such that head(PN) = Noun if (PN belongs to coreference chain CC) and there is an element from CC which is closest to Noun in Text, Pick that element.</Paragraph> </Section> <Section position="2" start_page="143" end_page="143" type="sub_section"> <SectionTitle> 2.2 Knowledge-Poor Coreference Resolution </SectionTitle> <Paragraph position="0"> The result of our data-driven methodology is the set of heuristics implemented in COCKTAIL, which cover both nominal and pronoun coreference.
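As one concrete illustration, the sketch below shows how a pronominal heuristic such as H2Pron above might be realized in code; the Mention record and its attributes are our assumptions for the example, not COCKTAIL's actual data structures.

from dataclasses import dataclass

@dataclass
class Mention:
    text: str
    pos: int            # token offset in the document
    kind: str           # "pronoun", "name", or "noun"
    number: str         # "sg" or "pl"
    gender: str         # "m", "f", or "n"
    chain: list = None  # coreference chain this mention belongs to, if any

def h2_pron(pron, mentions):
    # H2Pron: find PN, the closest preceding proper name that agrees
    # in number and gender with the pronoun.
    candidates = [m for m in mentions
                  if m.kind == "name" and m.pos < pron.pos
                  and m.number == pron.number and m.gender == pron.gender]
    if not candidates:
        return None
    pn = max(candidates, key=lambda m: m.pos)
    # If PN already belongs to a coreference chain CC, pick instead the
    # element of CC closest to the pronoun in the text.
    if pn.chain:
        return max((m for m in pn.chain if m.pos < pron.pos),
                   key=lambda m: m.pos, default=pn)
    return pn

Most of the heuristics above can be read the same way: a candidate filter followed by a chain-based tie-break.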
Each heuristic represents a pattern of coreference that was mined from the large set of coreference data.</Paragraph> <Paragraph position="1"> COCKTAIL uses knowledge-poor methods because (a) it is based only on a limited number of heuristics and (b) text processing is limited to part-of-speech tagging, named-entity recognition, and approximate phrasal parsing. The heuristics from COCKTAIL can be classified along two directions. First, they can be grouped according to the type of coreference they resolve, e.g., heuristics that resolve the anaphors of reflexive pronouns operate differently than those resolving bare nominals. Currently, COCKTAIL contains heuristics that resolve five types of pronouns (personal, possessive, reflexive, demonstrative and relative) and three forms of nominals (definite, bare and indefinite).</Paragraph> <Paragraph position="2"> Secondly, for each type of coreference, there are three classes of heuristics categorized according to their suitability to resolve coreference. The first class is comprised of strong indicators of coreference.</Paragraph> <Paragraph position="3"> This class resulted from the analysis of the distribution of the antecedents in the MUC annotated data. For example, repetitions of named entities and appositives account for the majority of the nominal coreferences and therefore represent anchors for the first class of heuristics.</Paragraph> <Paragraph position="4"> The second class of heuristics covers cases in which the arguments are recognized to be semantically consistent. COCKTAIL's test of semantic consistency blends together information available from WordNet and statistics gathered from Treebank.</Paragraph> <Paragraph position="5"> Different consistency checks are modeled for each of the heuristics.</Paragraph> <Paragraph position="6"> Example of the application of heuristic H2Pron: Mr. Adams1, 69 years old, is the retired chairman of Canadian-based Emco Ltd., a maker of plumbing and petroleum equipment; he1 has served on the Woolworth board since 1981.</Paragraph> <Paragraph position="7"> Example of the application of heuristic H3Pron: &quot;We have got to stop pointing our fingers at these kids2 who have no future,&quot; he said, &quot;and reach our hands out to them2.&quot;</Paragraph> <Paragraph position="8"> Example of the application of heuristic H2Nom: The chairman and the chief executive officer3 of Woolworth Corp. have temporarily relinquished their posts while the retailer conducts its investigation into alleged accounting irregularities4. Woolworth's board named John W. Adams, an outsider, to serve as interim chairman and executive officer3, while a special committee, appointed by the board last week and led by Mr. Adams, investigates the alleged irregularities4.</Paragraph> <Paragraph position="9"> In these examples, the same annotated index indicates coreference.</Paragraph> <Paragraph position="10"> The third class of heuristics resolves coreference by coercing nominals. Sometimes coercions involve only derivational morphology, linking verbs with their nominalizations. On other occasions, coercions are obtained as paths of meronyms (i.e. is-part relations) and hypernyms (i.e. is-a relations). Consistency checks implemented for this class of coreference are conservative: either the adjuncts must be identical or the adjunct of the referent must be less specific than the antecedent. Table 1 lists the top performing heuristics of COCKTAIL for pronominal and nominal coreference.
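To give the flavor of such consistency tests, here is a sketch written against the NLTK WordNet interface; the bounded-path criterion and the relation choices are our assumptions, and COCKTAIL's actual check also folds in Treebank statistics.

from nltk.corpus import wordnet as wn

def semantically_consistent(noun_a, noun_b, max_depth=4):
    # Two nominals may corefer if some sense of one can be coerced into
    # some sense of the other along hypernym (is-a) or holonym
    # (is-part) paths of bounded length.
    for sa in wn.synsets(noun_a, pos=wn.NOUN):
        for sb in wn.synsets(noun_b, pos=wn.NOUN):
            if reachable(sa, sb, max_depth) or reachable(sb, sa, max_depth):
                return True
    return False

def reachable(src, dst, depth):
    if src == dst:
        return True
    if depth == 0:
        return False
    parents = src.hypernyms() + src.part_holonyms() + src.member_holonyms()
    return any(reachable(p, dst, depth - 1) for p in parents)

# usage: semantically_consistent("company", "firm")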
Examples of the heuristics' operation on the MUC data are presented in Table 2. Details of the top performing heuristics of COCKTAIL were reported in (Harabagiu and Maiorano, 1999).</Paragraph> </Section> <Section position="3" start_page="143" end_page="143" type="sub_section"> <SectionTitle> 2.3 Bootstrapping for Coreference Resolution </SectionTitle> <Paragraph position="0"> One of the major drawbacks of existing coreference resolution systems is their inability to recognize many forms of coreference displayed in real-world texts. Recall measures of current systems range between 36% and 59% for both knowledge-based and statistical techniques. Knowledge-based systems would perform better if more coreference constraints were available, whereas statistical methods would be improved if more annotated data were available. Since knowledge-based techniques outperform inductive methods, we used high-precision coreference heuristics as knowledge seeds for machine learning techniques that operate on large amounts of unlabeled data. One such technique is bootstrapping, which was recently presented in (Riloff and Jones 1999), (Jones et al. 1999) as an ideal framework for text learning tasks that have knowledge seeds. The method does not require large training sets. We extended COCKTAIL by using meta-bootstrapping of both new heuristics and clusters of nouns that display semantic consistency for coreference. The coreference heuristics are the seeds of our bootstrapping framework for coreference resolution.</Paragraph> <Paragraph position="1"> When applied to large collections of texts, the heuristics determine classes of coreferring expressions. By generating coreference chains out of all these coreferring expressions, new heuristics are often uncovered. For example, Figure 2 illustrates the application of three heuristics and the generation of data for a new heuristic rule. In COCKTAIL, after a heuristic is applied, a new coreference chain is calculated. For the example illustrated in Figure 2, if the reference of expression A is sought, heuristic H1 indicates expression B to be the antecedent. When the coreference chain is built, expression A is directly linked to expression D, thus uncovering a new heuristic H0.</Paragraph> <Paragraph position="2"> As a rule of thumb, we do not consider a new heuristic unless there is massive evidence of its coverage in the data. To measure the coverage we use the FOIL_Gain measure, as introduced by the FOIL inductive algorithm (Cameron-Jones and Quinlan 1993). Let H0 be the new heuristic and H1 a heuristic that is already in the seed set. Let p0 be the number of positive coreference examples of H0 (i.e. the number of coreference relations produced by the heuristic that can be found in the test data) and n0 the number of negative examples of H0 (i.e. the number of relations generated by the heuristic which cannot be found in the test data). Similarly, p1 and n1 are the positive and negative examples of H1.</Paragraph> <Paragraph position="3"> The new heuristics are scored by their FOIL_Gain distance to the existing set of heuristics, and the best scoring one is added to the COCKTAIL system. The FOIL_Gain formula is: FOIL_Gain(H1, H0) = k (log2(p1/(p1+n1)) - log2(p0/(p0+n0))), where k is the number of positive examples covered by both H1 and H0.
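Read as code, the scoring step looks roughly as follows; the example counts are invented for illustration.

from math import log2

def foil_gain(k, p1, n1, p0, n0):
    # FOIL_Gain(H1, H0): k is the number of positive examples covered
    # by both the seed heuristic H1 and the candidate H0; p and n are
    # the positive and negative example counts of each heuristic.
    return k * (log2(p1 / (p1 + n1)) - log2(p0 / (p0 + n0)))

# Score a candidate H0 against every seed heuristic:
seeds = {"H1Pron": (450, 50), "H2Pron": (300, 60)}   # (p1, n1) per seed
p0, n0, k = 120, 30, 90                              # counts for H0
scores = {name: foil_gain(k, p1, n1, p0, n0)
          for name, (p1, n1) in seeds.items()}
print(max(scores, key=scores.get))                   # closest seed heuristic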
Heuristic H0 is added to the seed set if there is no other heuristic providing a larger FOIL_Gain to any of the seed heuristics.</Paragraph> <Paragraph position="5"> Since, in COCKTAIL, semantic consistency of coreferring expressions is checked by comparing the similarity of noun classes, each new heuristic determines an adjustment of the similarity threshold of all known coreferring noun classes. The steps of the bootstrapping algorithm that learns new heuristics and adjusts the similarity threshold of coreferential expressions are:</Paragraph> </Section> </Section> <Section position="5" start_page="143" end_page="145" type="metho"> <SectionTitle> MUTUAL BOOTSTRAPPING LOOP </SectionTitle> <Paragraph position="0"> 1. Score all candidate heuristics with FOIL_Gain 2. Best_h = closest candidate to heuristics(COCKTAIL) 3. Add Best_h to heuristics(COCKTAIL) 4. Adjust the semantic similarity threshold for semantic consistency of coreferring nouns 5. Go to step 1 if precision and recall have not degraded below minimal performance.</Paragraph> <Paragraph position="1"> (Riloff and Jones 1999) note that the bootstrapping algorithm works well but that its performance can deteriorate rapidly when non-coreferring data enter as candidate heuristics. To make the algorithm more robust, a second level of bootstrapping can be introduced. The outer bootstrapping mechanism, called meta-bootstrapping, compiles the results of the inner (mutual) bootstrapping process and identifies the k most reliable heuristics, where k is a number determined experimentally. These k heuristics are retained and the rest are discarded.</Paragraph> </Section> <Section position="6" start_page="145" end_page="147" type="metho"> <SectionTitle> 3 SWIZZLE </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="145" end_page="145" type="sub_section"> <SectionTitle> 3.1 Multilingual Coreference Data </SectionTitle> <Paragraph position="0"> To study the performance of a data-driven multilingual coreference resolution system, we prepared a corpus of Romanian texts by translating the MUC-6 and MUC-7 coreference training texts. The translations were performed by a group of four Romanian native speakers, and were checked for style by a certified translator from Romania. In addition, the Romanian texts were annotated with coreference keys.</Paragraph> <Paragraph position="1"> Two rules were followed when the annotations were done: o 1: Whenever an expression ER represents a translation of an expression EE from the corresponding English text, if EE is tagged as a coreference key with identification number ID, then the Romanian expression ER is also tagged with the same ID number. This rule allows for translations in which the textual positions of the referent and the antecedent have been swapped.</Paragraph> <Paragraph position="2"> o 2: Since the translations often introduce new coreferring expressions in the same chain, the new expressions are given new, unused ID numbers.</Paragraph> <Paragraph position="3"> For example, Table 3 lists corresponding English and Romanian fragments of coreference chains from the original MUC-6 Wall Street Journal document DOCNO: 930729-0143.</Paragraph> <Paragraph position="4"> Table 3 also shows the original MUC coreference SGML annotations.
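For readers unfamiliar with that format, each key is marked as a COREF element; the fragment below is illustrative rather than copied from the corpus:

<COREF ID="500" TYPE="IDENT" REF="499" MIN="package">the package</COREF>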
Whenever present, the REF field indicates the ID of the antecedent, whereas the MIN field indicates the minimal reference expression.</Paragraph> </Section> <Section position="2" start_page="145" end_page="146" type="sub_section"> <SectionTitle> 3.2 Lexical Resources </SectionTitle> <Paragraph position="0"> The multilingual coreference resolution method implemented in SWIZZLE incorporates the heuristics derived from COCKTAIL's monolingual coreference resolution processing in both languages. To this end, COCKTAIL required both sets of texts to be tagged for part of speech and to have their noun phrases recognized.</Paragraph> <Paragraph position="1"> The English texts were parsed with Brill's part-of-speech tagger (Brill 1992), and the noun phrases were identified by the grammar rules implemented in the phrasal parser of FASTUS (Appelt et al., 1993). Corresponding resources are not available for Romanian.</Paragraph> <Paragraph position="2"> To minimize the reconfiguration of COCKTAIL for processing Romanian texts, we implemented a rule-based Romanian part-of-speech tagger that uses the same tags as those generated by the Brill tagger. In addition, we implemented rules that identify noun phrases in Romanian.</Paragraph> <Paragraph position="3"> In Table 3, the English and Romanian texts are annotated for coreference; the elements of a coreference chain in the respective texts are underlined. The English text has only two elements in the coreference chain, whereas the Romanian text contains four different elements. The two additional elements of the Romanian coreference chain are due to (1) the need to translate the relative clause of the English fragment into a separate sentence in Romanian; and (2) the reordering of words in the second sentence.</Paragraph> <Paragraph position="4"> To take advantage of the aligned corpus, SWIZZLE also relied on bilingual lexical resources that help translate the referential expressions. For this purpose, we used a core Romanian WordNet (Harabagiu, 1999) which encoded, wherever possible, links between the English synsets and their Romanian counterparts. This resource also incorporated knowledge derived from several bilingual dictionaries (e.g. (Bantam, 1969)).</Paragraph> <Paragraph position="5"> Given the parallel coreference annotations, translations can easily be identified because they share the same coreference key. In the example in Table 3, the expression &quot;legii&quot;, with ID=500, is the translation of the expression &quot;package&quot;, which has the same ID in the English text. However, in the test set the REF fields are intentionally voided, entrusting COCKTAIL with identifying the antecedents. The bilingual coreference resolution performed in SWIZZLE, however, requires the translations of the English and Romanian antecedents. The principles guiding the translations of the English and Romanian antecedents (A_ER and A_RE, respectively) are: * Circularity: Given an English antecedent, due to semantic ambiguity, it can belong to several English WordNet synsets. For each such synset S_E we consider the corresponding Romanian synset(s) S_R. We filter out all S_R that do not contain A_ER. If only one Romanian synset is left, then we have identified a translation. Otherwise, we start from the Romanian antecedent, find all synsets S_R to which it belongs, and obtain the corresponding English synsets S_E. Similarly, all English synsets not containing the English antecedent are filtered out. If only one synset remains, we have again identified a translation.
Finally, in the last case, the intersection of the multiple synsets in either language generates a legal translation. For example, the English synset S_E = {bill, measure} translates into the Romanian synset S_R = {lege}. First, none of the dictionary translations of bill into Romanian (e.g. poliţă, bancnotă, afiş) translates back into any of the elements of S_E. However, the translation of measure into the Romanian lege translates back into bill, its synonym.</Paragraph> <Paragraph position="6"> * Semantic density: Given an English and a Romanian antecedent, to establish whether they are translations of one another, we disambiguate them by first collapsing all synsets that have common elements.</Paragraph> <Paragraph position="7"> Then we apply the circularity principle, relying on the semantic alignment encoded in the Romanian WordNet. When this core lexical database was first implemented, several other principles were applied.</Paragraph> <Paragraph position="8"> In our experiment, we were satisfied with the quality of the translations recognized by following only these two principles.</Paragraph>
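The circularity check can be sketched over a hypothetical synset alignment as below; the dictionaries en_synsets, ro_synsets, en2ro, ro2en, and members stand in for the aligned core Romanian WordNet and are invented for the example.

def circular_translation(ant_en, ant_ro, en_synsets, ro_synsets,
                         en2ro, ro2en, members):
    # Forward: English synsets of ant_en, mapped to their aligned
    # Romanian synsets, keeping those that contain ant_ro.
    fwd = [r for e in en_synsets.get(ant_en, [])
           for r in en2ro.get(e, []) if ant_ro in members[r]]
    if len(fwd) == 1:
        return True                    # a unique translation is found
    # Backward: start from the Romanian antecedent instead.
    bwd = [e for r in ro_synsets.get(ant_ro, [])
           for e in ro2en.get(r, []) if ant_en in members[e]]
    if len(bwd) == 1:
        return True
    # Last case: the surviving synsets in the two directions must meet.
    return bool(set(fwd) & {r for e in bwd for r in en2ro.get(e, [])})

# The bill/lege example above, encoded as illustrative data:
en_synsets = {"bill": ["E1"]}          # E1 = {bill, measure}
ro_synsets = {"lege": ["R1"]}          # R1 = {lege}
en2ro, ro2en = {"E1": ["R1"]}, {"R1": ["E1"]}
members = {"E1": {"bill", "measure"}, "R1": {"lege"}}
print(circular_translation("bill", "lege", en_synsets, ro_synsets,
                            en2ro, ro2en, members))   # True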
</Section> <Section position="3" start_page="146" end_page="147" type="sub_section"> <SectionTitle> 3.3 Multilingual Coreference Resolution </SectionTitle> <Paragraph position="0"> The SWIZZLE system was run on a corpus of 2335 referential expressions in English (927 from MUC-6 and 1408 from MUC-7) and 2851 Romanian expressions (1219 from MUC-6 and 1632 from MUC-7). Initially, the heuristics implemented in COCKTAIL were applied separately to the two textual collections. Several special cases arose.</Paragraph> <Paragraph position="1"> Case 1, the ideal case, is shown in Figure 3. It occurs when two referential expressions have antecedents that are translations of one another. This situation occurred for 63.3% of the referential expressions from MUC-6 and for 58.7% of the MUC-7 references. Over 50% of these are pronouns or named entities. However, the non-ideal cases are more interesting for SWIZZLE, since they carry knowledge that enhances system performance.</Paragraph> <Paragraph position="2"> Case 2 occurs when the antecedents are not translations of one another, but belong to or corefer with elements of coreference chains that were already established. Moreover, one of the antecedents is textually closer to its referent. Figure 4 illustrates the case in which the English antecedent is closer to the referent than the Romanian one.</Paragraph> <Paragraph position="3"> SWIZZLE solutions: (1) If the heuristic H(E) used to resolve the reference in the English text has higher priority than H(R), which was used to resolve the reference in the Romanian text, then we first search for RT, the Romanian translation of EA, the English antecedent. In the next step, we add heuristic H1, which resolves RR into RT, and give it a higher priority than H(R). Finally, we also add heuristic H2, which links RT to RA when there is at least one translation between the elements of the coreference chains containing EA and ET, respectively.</Paragraph> <Paragraph position="4"> (2) If H(R) has higher priority than H(E), heuristic H3 is added while H(E) is removed. We also add H4, which relates ER to ET, the English translation of RA.</Paragraph> <Paragraph position="5"> Case 3 occurs when at least one of the antecedents starts a new coreference chain (i.e., no coreferring antecedent can be found in the current chains).</Paragraph> <Paragraph position="6"> SWIZZLE solution: If one of the antecedents corefers with an element from a coreference chain, then the antecedent in the opposite language is its translation. Otherwise, SWIZZLE chooses the antecedent returned by the heuristic with the highest priority.</Paragraph> </Section> </Section> </Paper>