<?xml version="1.0" standalone="yes"?> <Paper uid="N04-4009"> <Title>Competitive Self-Trained Pronoun Interpretation</Title> <Section position="3" start_page="0" end_page="0" type="metho"> <SectionTitle> 2 The Supervised Algorithm </SectionTitle> <Paragraph position="0"> The supervised model was trained using the improved iterative scaling algorithm for Maximum Entropy (MaxEnt) models described by Berger et al. (1996) with binary-valued features. As is standard, the model was trained as a binary coreference classifier: for each possible antecedent of each pronoun, a training instance was created that consisted of the pronoun, the possible antecedent phrase, and a binary coreference outcome. (Such a model can be seen as providing a probabilistic measure of antecedent salience.) Because we are ultimately interested in identifying the correct antecedent among a set of possible ones, during testing the antecedent assigned the highest probability is chosen.</Paragraph> <Paragraph position="1"> The algorithm receives as input the results of SRI's Textpro system, a shallow parser that recognizes low-level constituents (noun groups, verb groups, etc.). No difficult syntactic attachments are attempted, and the results are errorful. There was no human-annotated linguistic information in the input.</Paragraph> <Paragraph position="2"> The training corpus consists of 2773 annotated third-person pronouns from the newspaper and newswire segments of the Automatic Content Extraction (ACE) program training corpus. The annotated blind corpus used for evaluation consists of 762 annotated third-person pronouns from the ACE February 2002 evaluation set. The annotated pronouns in both sets include only those that are ACE &quot;markables&quot;, i.e., ones that refer to entities of the following types: Persons, Organizations, GeoPoliticalEntities (politically defined geographical regions, their governments, or their people), Locations, and Facilities.</Paragraph> <Paragraph position="3"> The system employs a set of hard constraints and soft features. The hard constraints filter out those noun groups that fail conservative number and gender agreement checks before training, whereas the soft features are used by the MaxEnt algorithm. A set of forty soft features were developed and optimized manually; they fall into five categories that have become fairly standard in the literature: Gender Agreement: Includes features to test a strict match of gender (e.g., a masculine pronoun and a masculine antecedent), as well as mere compatibility (e.g., a masculine pronoun with an antecedent of unknown gender). These features are more liberal than the gender-based hard constraint mentioned above.</Paragraph> <Paragraph position="4"> Number Agreement: Includes features to test a strict match of number (e.g., a singular pronoun and a singular antecedent), as well as mere compatibility (e.g., a singular pronoun with an antecedent of unknown number). These features are likewise more liberal than the number-based hard constraint mentioned above.</Paragraph> <Paragraph position="5"> Distance: Includes features pertaining to the distance between the pronoun and the potential antecedent. 
<Section position="4" start_page="0" end_page="0" type="metho">
<SectionTitle> 3 The Self-Trained Algorithm </SectionTitle>
<Paragraph position="0"> The self-trained algorithm likewise uses MaxEnt, with the same feature set and shallow parser. The two systems differ in the training data utilized.</Paragraph>
<Paragraph position="1"> Instead of the training corpus of 2773 annotated pronouns used in the supervised experiments, the self-trained algorithm creates training data from pronouns found in a raw corpus, specifically the newswire segment of the Topic Detection and Tracking (TDT-2) corpus. The system was evaluated on the same annotated set of 762 pronouns as the supervised system; the performance statistics reported herein are from the only time an evaluation with this data was carried out.</Paragraph>
<Paragraph position="2"> The self-trained system embeds the MaxEnt algorithm in an iterative loop during which the training examples are acquired. The first phase of the algorithm builds an initial model as follows:
1. For each third-person pronoun:
(a) Collect possible antecedents, that is, all of the noun groups found in the previous two sentences and to the left of the pronoun in the current sentence.
(b) Filter them by applying the hard constraints.
(c) If only one possible antecedent remains, create a pronoun-antecedent pair and label the coreference outcome as True.
(d) Otherwise, with some probability (0.2 in our experiments), create a pronoun-antecedent pair for each possible antecedent and label the coreference outcome as False.
2. Train a MaxEnt classifier on this training data.</Paragraph>
<Paragraph position="3"> The simplification assumed above - that coreference holds for all and only those pronouns for which TextPro and the hard constraints find a single possible antecedent - is obviously false, but it nonetheless yields a model to seed the iterative part of the algorithm, which goes as follows:
3. For each pronoun in the training data acquired in step 1:
(a) Apply the current MaxEnt model to each pronoun-antecedent pair.
(b) Label the pair to which the model assigns the highest probability with the coreference outcome True; label all other pairs (if any) for that pronoun False.
4. Retrain the MaxEnt model with this new training data.
5. Repeat steps 3 and 4 until the training data reaches a steady state, that is, until there are no pronouns for which the current model changes its preference to a different potential antecedent than it favored during the previous iteration.</Paragraph>
<Paragraph position="4"> The hope is that improved predictions about which potential antecedents of ambiguous pronouns are correct will yield iteratively better models (note that the &quot;unambiguous&quot; pronoun-antecedent pairs collected in step 1c are considered correct throughout). This hope stands notwithstanding the fact that the algorithm is based on a simplifying assumption - that each pronoun is associated with exactly one correct antecedent - that is clearly false for a variety of reasons: (i) there will be cases in which there is more than one coreferential antecedent in the search window, all but one of which will be labeled as not coreferential during any given iteration; (ii) there will be cases in which the (perhaps only) correct antecedent was misparsed or incorrectly weeded out by the hard constraints, and thus never seen by the learning algorithm (presumably some of the &quot;unambiguous&quot; cases identified in step 1c will be incorrect for this reason); and (iii) some of the pronouns found will not even be referential, e.g., pleonastic pronouns. The empirical question remains, however, of how good a system can be trained under such an assumption. After all, the model probabilities need not be accurate in an absolute sense, but only in a relative one: that is, good enough that the antecedent assigned the highest probability tends to be correct.</Paragraph>
</Section>
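<!-- Illustrative sketch (not from the paper): one way to render the self-training loop of Section 3, reusing the hypothetical passes_hard_constraints, soft_features, and train helpers from the sketch after Section 2. The 0.2 sampling rate for ambiguous pronouns and the steady-state stopping test follow the description above; the data structures, the max_iters safety cap, and all names are assumptions.

import random

def seed_training_data(pronoun_contexts, neg_sample_rate=0.2):
    # pronoun_contexts: (pronoun, candidate noun groups) pairs gathered from raw
    # text, where the candidates come from the previous two sentences plus the
    # noun groups to the left of the pronoun in its own sentence.
    seed, ambiguous = [], []
    for pron, cands in pronoun_contexts:
        survivors = [c for c in cands if passes_hard_constraints(pron, c)]   # step 1b
        if len(survivors) == 1:
            seed.append((pron, survivors[0], True))                          # step 1c
        elif len(survivors) > 1 and random.random() < neg_sample_rate:
            seed.extend((pron, c, False) for c in survivors)                 # step 1d
            ambiguous.append((pron, survivors))
    return seed, ambiguous

def self_train(pronoun_contexts, max_iters=50):
    seed, ambiguous = seed_training_data(pronoun_contexts)                   # step 1
    fixed = [ex for ex in seed if ex[2]]   # "unambiguous" positives stay True throughout
    vec, model = train(seed)                                                 # step 2
    prev_best = {}
    for _ in range(max_iters):
        relabeled, changed = [], False
        for i, (pron, cands) in enumerate(ambiguous):
            feats = [soft_features(pron, c) for c in cands]                  # step 3a
            probs = model.predict_proba(vec.transform(feats))[:, 1]
            best = max(range(len(cands)), key=lambda j: probs[j])            # step 3b
            if prev_best.get(i) != best:
                changed = True
            prev_best[i] = best
            relabeled.extend((pron, c, j == best) for j, c in enumerate(cands))
        vec, model = train(fixed + relabeled)                                # step 4
        if not changed:                                                      # step 5
            break
    return vec, model
-->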
<Section position="5" start_page="0" end_page="0" type="metho">
<SectionTitle> 4 Hobbs Baseline </SectionTitle>
<Paragraph position="0"> For comparison purposes, we also implemented a version of Hobbs's (1978) well-known pronoun interpretation algorithm, in which no machine learning is involved. This algorithm takes the syntactic representations of the sentences up to and including the current sentence as input, and performs a search for an antecedent noun phrase on these trees. Since TextPro does not build full syntactic trees for the input, we developed a version that does a simple search through the list of noun groups recognized.</Paragraph>
<Paragraph position="1"> In accordance with Hobbs's search procedure, noun groups are searched in the following order: (i) in the current sentence from right-to-left, starting with the first noun group to the left of the pronoun, (ii) in the previous sentence from left-to-right, (iii) in the sentence two prior from left-to-right, (iv) in the current sentence from left-to-right, starting with the first noun group to the right of the pronoun (for cataphora).</Paragraph>
<Paragraph position="2"> The first noun group encountered that agrees with the pronoun with respect to number, gender, and person is chosen as the antecedent.</Paragraph>
</Section>
</Paper>
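<!-- Illustrative sketch (not from the paper): the shallow, noun-group-based variant of Hobbs's search described in Section 4 could be coded roughly as follows. The NounGroup record, the treatment of unknown gender/number/person as compatible, and the convention that each sentence is a left-to-right list of noun groups (with the pronoun itself excluded) are assumptions for illustration.

from dataclasses import dataclass

@dataclass
class NounGroup:
    text: str
    gender: str = "unknown"   # "masc", "fem", "neut", or "unknown"
    number: str = "unknown"   # "sg", "pl", or "unknown"
    person: str = "3"         # "1", "2", "3", or "unknown"

def agrees(pron, cand):
    # Number, gender, and person agreement; "unknown" counts as compatible.
    def compat(a, b):
        return "unknown" in (a, b) or a == b
    return (compat(pron.number, cand.number)
            and compat(pron.gender, cand.gender)
            and compat(pron.person, cand.person))

def hobbs_baseline(pron, sentences, pron_sent, groups_left_of_pron):
    # sentences: list of sentences, each a left-to-right list of NounGroups.
    # The pronoun sits in sentences[pron_sent], with groups_left_of_pron noun
    # groups preceding it in that sentence.
    current = sentences[pron_sent]
    search_order = list(reversed(current[:groups_left_of_pron]))   # (i) right-to-left
    if pron_sent >= 1:
        search_order += sentences[pron_sent - 1]                   # (ii) previous sentence
    if pron_sent >= 2:
        search_order += sentences[pron_sent - 2]                   # (iii) two sentences prior
    search_order += current[groups_left_of_pron:]                  # (iv) cataphora
    for cand in search_order:
        if agrees(pron, cand):
            return cand
    return None

# Example usage:
# s0 = [NounGroup("Mary", gender="fem", number="sg"),
#       NounGroup("John", gender="masc", number="sg")]
# s1 = []   # the sentence containing the pronoun "she", with no other noun groups
# hobbs_baseline(NounGroup("she", gender="fem", number="sg"), [s0, s1], 1, 0)
# returns the NounGroup for "Mary".
-->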