<?xml version="1.0" standalone="yes"?>
<Paper uid="P03-1023">
<Title>Coreference Resolution Using Competition Learning Approach</Title>
<Section position="4" start_page="3" end_page="3" type="metho">
<Paragraph position="0"> ... CF_CS1 &lt; CF_CS2, if the anaphors whose candidates all belong to CS2 take the majority in the training data set. In this case, a candidate in CS2 would be assigned a larger confidence value than a candidate in CS1. This nevertheless contradicts the ranking rules. If, during resolution, the candidates of an anaphor all come from CS ...</Paragraph>
<Paragraph position="1"> (Suppose we use the C4.5 algorithm and the confidence value takes the smoothed ratio (p+1)/(t+2), where p is the number of positive instances and t is the total number of instances contained in the corresponding leaf node.)</Paragraph>
</Section>
<Section position="6" start_page="3" end_page="7" type="metho">
<SectionTitle> 3 The Twin-Candidate Model </SectionTitle>
<Paragraph position="0"> Different from the single-candidate model, the twin-candidate model aims to learn the competition criterion for candidates. In this section, we introduce the structure of the model in detail.</Paragraph>
<Section position="1" start_page="3" end_page="7" type="sub_section">
<SectionTitle> 3.1 Training Instances Creation </SectionTitle>
<Paragraph position="0"> Consider an anaphor ana and its candidate set, candidate_set = {C1, C2, ..., Ck}, where Cj is closer to ana than Ci if j > i. Suppose positive_set is the set of candidates that occur in the coreferential chain of ana, and negative_set is the set of candidates not in the chain, that is, negative_set = candidate_set - positive_set. The set of training instances based on ana, inst_set, is defined as follows:</Paragraph>
<Paragraph position="1"> inst_set = { inst(Ci, Cj, ana) | Ci ∈ positive_set, Cj ∈ negative_set, i > j } ∪ { inst(Ci, Cj, ana) | Ci ∈ negative_set, Cj ∈ positive_set, i > j }</Paragraph>
<Paragraph position="2"> From the above definition, an instance is formed by an anaphor, one positive candidate and one negative candidate. For each instance inst(Ci, Cj, ana), the candidate at the first position, Ci, is closer to the anaphor than the candidate at the second position, Cj; one of the two comes from positive_set and the other from negative_set. An instance is labeled as positive if Ci belongs to positive_set and Cj to negative_set, and as negative if Ci belongs to negative_set and Cj to positive_set.</Paragraph>
<Paragraph position="3"> See the following example: ... try to block China's WTO accession, that will not be popular and will fail to win the support of other countries ... &quot;provocative and reckless&quot; and other countries said they could threaten Asian stability ... In the above text segment, the antecedent candidate set of the pronoun &quot;them&quot; consists of six candidates highlighted in italics. Among the candidates, Candidates 1 and 6 are in the coreferential chain of &quot;them&quot;, while Candidates 2, 3, 4 and 5 are not. Thus, eight instances are formed for &quot;them&quot;:</Paragraph>
<Paragraph position="4"> inst(C2, C1, ana), inst(C3, C1, ana), inst(C4, C1, ana), inst(C5, C1, ana) inst(C6, C2, ana), inst(C6, C3, ana), inst(C6, C4, ana), inst(C6, C5, ana)</Paragraph>
<Paragraph position="5"> Here the instances in the first line are negative, while those in the second line are all positive.</Paragraph>
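<Paragraph position="6"> The pairing scheme above can be made concrete with a short sketch. The following Python fragment is only an illustration of the instance-creation step, not the implementation used in this work; the function name and data structures are assumptions, and feature extraction is left out.

# Build twin-candidate training instances for one anaphor.
# candidates:   [C1, ..., Ck], ordered so that a larger index means the
#               candidate is closer to the anaphor
# positive_set: candidates on the coreferential chain of the anaphor
# anaphor:      the anaphoric NP under consideration
def create_training_instances(candidates, positive_set, anaphor):
    instances = []
    for i in range(len(candidates)):
        for j in range(i):                      # enforce i > j
            ci, cj = candidates[i], candidates[j]
            ci_pos = ci in positive_set
            cj_pos = cj in positive_set
            if ci_pos and not cj_pos:
                instances.append(((ci, cj, anaphor), 1))   # positive instance
            elif cj_pos and not ci_pos:
                instances.append(((ci, cj, anaphor), 0))   # negative instance
            # pairs drawn from the same set are not used for training
    return instances

# On the example above (positives C1 and C6, negatives C2-C5) this yields
# exactly the eight instances listed: four negative and four positive.
cands = ['C1', 'C2', 'C3', 'C4', 'C5', 'C6']
assert len(create_training_instances(cands, {'C1', 'C6'}, 'them')) == 8
</Paragraph>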
</Section>
<Section position="2" start_page="7" end_page="7" type="sub_section">
<SectionTitle> 3.2 Features Definition </SectionTitle>
<Paragraph position="0"> A feature vector is specified for each training or testing instance. Similar to those in the single-candidate model, the features may describe the lexical, syntactic, semantic and positional relationships between an anaphor and any one of its candidates. Besides, the feature set may also contain inter-candidate features characterizing the relationships between the pair of candidates, e.g. the distance between the two candidates in the number of sentences or paragraphs.</Paragraph>
</Section>
<Section position="3" start_page="7" end_page="7" type="sub_section">
<SectionTitle> 3.3 Classifier Generation </SectionTitle>
<Paragraph position="0"> Based on the feature vectors generated for each anaphor encountered in the training data set, a classifier can be trained using a certain machine learning algorithm, such as C4.5, RIPPER, etc.</Paragraph>
<Paragraph position="1"> Given the feature vector of a test instance inst(Ci, Cj, ana) (i > j), the classifier returns the positive class, indicating that Ci is preferred to Cj as the antecedent of ana, or the negative class, indicating that Cj is preferred.</Paragraph>
</Section>
<Section position="4" start_page="7" end_page="7" type="sub_section">
<SectionTitle> 3.4 Antecedent Identification </SectionTitle>
<Paragraph position="0"> Let CR(inst(Ci, Cj, ana)) denote the classification result for an instance inst(Ci, Cj, ana). The antecedent of an anaphor is identified using the algorithm shown in Figure 1.</Paragraph>
</Section>
</Section>
<Section position="7" start_page="7" end_page="7" type="metho">
<SectionTitle> Algorithm ANTE-SEL </SectionTitle>
<Paragraph position="0"> [Figure 1: Algorithm ANTE-SEL. Input: ana, the anaphor under consideration; candidate_set, the set of antecedent candidates of ana, {C1, C2, ..., Ck}. Output: one candidate selected as the antecedent of ana.]</Paragraph>
<Paragraph position="1"> Algorithm ANTE-SEL takes as input an anaphor and its candidate set candidate_set, and returns one candidate as its antecedent. In the algorithm, each candidate is compared against every other candidate, with the classifier acting as a judge in each comparison. The score of a candidate increases by one each time it wins, so its final score records the total number of times it has won. The candidate with the maximal score is singled out as the antecedent. If two or more candidates have the same maximal score, the one closest to the anaphor is selected.</Paragraph>
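<Paragraph position="2"> The round-robin selection can be sketched as follows. This Python fragment is an illustration rather than the authors' code; the classify argument stands in for the trained twin-candidate classifier and is assumed to return 'positive' when the candidate in the first position is preferred.

# Round-robin antecedent selection (in the spirit of Algorithm ANTE-SEL).
# candidate_set: [C1, ..., Ck], a larger index meaning closer to the anaphor
# classify(ci, cj, ana) -> 'positive' if ci is preferred to cj, else 'negative'
def ante_sel(anaphor, candidate_set, classify):
    if not candidate_set:
        return None                      # the NP is left unresolved
    score = [0] * len(candidate_set)
    for i in range(len(candidate_set)):
        for j in range(i):               # test instances use i > j
            if classify(candidate_set[i], candidate_set[j], anaphor) == 'positive':
                score[i] += 1            # the winner of each comparison
            else:                        # gains one point
                score[j] += 1
    # Maximal score wins; ties are broken in favour of the candidate
    # closest to the anaphor, i.e. the one with the largest index.
    best = max(range(len(candidate_set)), key=lambda k: (score[k], k))
    return candidate_set[best]
</Paragraph>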
</Section>
<Section position="8" start_page="7" end_page="7" type="metho">
<SectionTitle> 3.5 Single-Candidate Model: A Special Case of the Twin-Candidate Model? </SectionTitle>
<Paragraph position="0"> While the realization and the structure of the twin-candidate model are significantly different from those of the single-candidate model, the single-candidate model can in fact be regarded as a special case of the twin-candidate model.</Paragraph>
<Paragraph position="1"> To illustrate this, consider a virtual &quot;blank&quot; candidate C0, such that we could convert an instance inst(Ci, ana) in the single-candidate model to an instance inst(Ci, C0, ana) in the twin-candidate model. Let inst(Ci, C0, ana) have the same class label as inst(Ci, ana); that is, inst(Ci, C0, ana) is positive if Ci is the antecedent of ana, and negative if not.</Paragraph>
<Paragraph position="2"> Apparently, the classifier trained on the instance set {inst(Ci, ana)}, T1, is equivalent to that trained on {inst(Ci, C0, ana)}, T2. T1 and T2 would assign the same class label to the test instances inst(Ci, ana) and inst(Ci, C0, ana), respectively. That is to say, determining in the single-candidate model whether Ci is coreferential to ana is equivalent to determining in the twin-candidate model whether Ci is preferred to C0, where C0 can be regarded as a &quot;standard candidate&quot;.</Paragraph>
<Paragraph position="3"> While the classification in the single-candidate model can find its interpretation in the twin-candidate model, the reverse is not true. Consequently, we can safely draw the conclusion that the twin-candidate model is more powerful than the single-candidate model in characterizing the relationships among an anaphor and its candidates.</Paragraph>
</Section>
<Section position="9" start_page="7" end_page="10" type="metho">
<SectionTitle> 4 The Competition Learning Approach </SectionTitle>
<Paragraph position="0"> Our competition learning approach adopts the twin-candidate model introduced in Section 3. The main process of the approach is as follows:</Paragraph>
<Paragraph position="1"> 1. The raw input documents are preprocessed to obtain most, if not all, of the possible NPs. 2. During training, for each anaphoric NP, we create a set of candidates and then generate the training instances as described in Section 3. 3. Based on the training instances, we make use of the C5.0 learning algorithm (Quinlan, 1993) to train a classifier. 4. During resolution, for each NP encountered, we also construct a candidate set. If the set is empty, we leave this NP unresolved; otherwise we apply the antecedent identification algorithm to choose the antecedent and then link the NP to it.</Paragraph>
<Section position="1" start_page="7" end_page="7" type="sub_section">
<SectionTitle> 4.1 Preprocessing </SectionTitle>
<Paragraph position="0"> To determine the boundaries of the noun phrases, a pipeline of Natural Language Processing components is applied to the input raw text: (1) tokenization and sentence segmentation; (2) named entity recognition; (3) part-of-speech tagging; (4) noun phrase chunking. Among them, named entity recognition, part-of-speech tagging and text chunking apply the same Hidden Markov Model (HMM) based engine with error-driven learning capability (Zhou and Su, 2000 and 2002). The named entity recognition component recognizes various types of MUC-style named entities, i.e., organization, location, person, date, time, money and percentage.</Paragraph>
</Section>
<Section position="2" start_page="7" end_page="7" type="sub_section">
<SectionTitle> 4.2 Features Selection </SectionTitle>
<Paragraph position="0"> For our study, in this paper we only select those features that can be obtained with low annotation cost and high reliability. All features are listed in Table 1 together with their respective possible values. [Table 1, which lists the features describing the candidates and the features describing the anaphor together with their possible values, is not reproduced here; the features defined on the second candidate Cj are not used in the single-candidate model.]</Paragraph>
</Section>
<Section position="3" start_page="7" end_page="10" type="sub_section">
<SectionTitle> 4.3 Candidates Filtering </SectionTitle>
<Paragraph position="0"> For an NP under consideration, all of its preceding NPs could be antecedent candidates. Nevertheless, since in the twin-candidate model the number of instances for a given anaphor is about the square of the number of its antecedent candidates, the computational cost would be prohibitively large if we included all the preceding NPs in the candidate set. Moreover, many of the preceding NPs are irrelevant or even invalid with regard to the anaphor. Such data noise may hamper the training of a good-performing classifier, and also damage the accuracy of antecedent selection: too many comparisons are made between incorrect candidates. Therefore, in order to reduce the computational cost and the data noise, an effective candidate filtering strategy must be applied in our approach.</Paragraph>
<Paragraph position="1"> During training, we create the candidate set for each anaphor with the following filtering algorithm: 1. If the anaphor is a pronoun, (a) add to the initial candidate set all the preceding NPs in the current and the previous two sentences; (b) remove from the candidate set those that disagree in number, gender, and person; (c) if the candidate set is empty, add the NPs in an earlier sentence and go to 1(b). 2. If the anaphor is a non-pronoun, (a) add all the non-pronominal antecedents to the initial candidate set; (b) for each candidate added in 2(a), add the non-pronouns in the current, the previous and the next sentences into the candidate set.</Paragraph>
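<Paragraph position="2"> The pronoun branch of the filtering algorithm can be sketched as follows. This Python fragment is only an illustration under assumed data structures: each NP is assumed to carry a sent_index attribute, and agrees(np, pronoun) stands in for the number, gender and person checks of step 1(b); neither helper is part of the original system description.

# Candidate filtering for a pronominal anaphor (step 1 of the algorithm).
# preceding_nps: NPs occurring before the anaphor, each with .sent_index
# agrees(np, pronoun): True if np and the pronoun agree in number,
#                      gender and person (assumed helper, step 1(b))
def filter_pronoun_candidates(pronoun, preceding_nps, agrees, window=2):
    if not preceding_nps:
        return []
    earliest = min(np.sent_index for np in preceding_nps)
    lower = pronoun.sent_index - window
    while True:
        # step 1(a): NPs in the current and the preceding `window` sentences
        in_window = [np for np in preceding_nps if np.sent_index >= lower]
        # step 1(b): drop candidates that disagree in number, gender, person
        candidates = [np for np in in_window if agrees(np, pronoun)]
        if candidates or earliest >= lower:
            return candidates
        lower -= 1   # step 1(c): admit the NPs of one earlier sentence, redo 1(b)
</Paragraph>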
<Paragraph position="3"> During resolution, we filter the candidates for each encountered pronoun in the same way as during training. That is, we only consider the NPs in the current and the preceding two sentences. Such a context window is reasonable, as the distance between a pronominal anaphor and its antecedent is generally short. In the MUC-6 data set, for example, the immediate antecedents of 95% of the pronominal anaphors can be found within this distance.</Paragraph>
<Paragraph position="4"> Comparatively, candidate filtering for non-pronouns during resolution is more complicated. A potential problem is that for each non-pronoun under consideration, the twin-candidate model always chooses a candidate as the antecedent, even when all of the candidates are &quot;low-qualified&quot;, that is, unlikely to be coreferential to the non-pronoun under consideration.</Paragraph>
<Paragraph position="5"> In fact, the twin-candidate model in itself can identify the qualification of a candidate. We can compare every candidate with the virtual &quot;standard candidate&quot; C0: only the candidates that win over C0 are deemed qualified and allowed to enter the &quot;round robin&quot;, whereas the losers are eliminated. As we have discussed in Section 3.5, the classifier on the pairs of a candidate and C0 is just a single-candidate classifier. Thus, we can safely adopt the single-candidate classifier as our candidate filter. The candidate filtering algorithm during resolution is as follows: 1. If the current NP is a pronoun, construct the candidate set in the same way as during training. 2. If the current NP is a non-pronoun, (a) add all the preceding non-pronouns to the initial candidate set; (b) calculate the confidence value for each candidate using the single-candidate classifier; (c) remove the candidates with a confidence value less than 0.5.</Paragraph>
</Section>
</Section>
</Paper>