<?xml version="1.0" standalone="yes"?>
<Paper uid="P06-1005">
  <Title>Bootstrapping Path-Based Pronoun Resolution</Title>
  <Section position="5" start_page="33" end_page="35" type="metho">
    <SectionTitle>
3 Path Coreference
</SectionTitle>
    <Paragraph position="0"> We define a dependency path as the sequence of nodes and dependency labels between two potentially coreferent entities in a dependency parse tree. We use the structure induced by the minimalist parser Minipar (Lin, 1998) on sentences from the news corpus described in Section 4. Figure 1 gives the parse tree of (2). As a short-form, we write the dependency path in this case as Noun needs pronoun's support. The path itself does not include the terminal nouns John and his.</Paragraph>
    <Paragraph position="1"> Our algorithm finds the likelihood of coreference along dependency paths by counting the number of times they occur with terminals that are either likely coreferent or non-coreferent. In the simplest version, we count paths with terminals that are both pronouns. We partition pronouns into seven groups of matching gender, number, and person; for example, the first person singular group contains I, me, my, mine, and myself. If the two terminal pronouns are from the same group, coreference along the path is likely. If they are from different groups, like I and his, then they are non-coreferent. Let NS(p) be the number of times the two terminal pronouns of a path, p, are from the same pronoun group, and let ND(p) be the number of times they are from different groups.</Paragraph>
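The seven pronoun groups lend themselves to a simple lookup. Only the first-person-singular membership is given in the text, so the remaining group memberships below are plausible assumptions, not the paper's exact partition.

```python
# Illustrative partition of English pronouns into agreement groups
# (gender/number/person). Only the 1sg membership comes from the text;
# the other groups are assumed.
PRONOUN_GROUPS = {
    "1sg": {"i", "me", "my", "mine", "myself"},
    "1pl": {"we", "us", "our", "ours", "ourselves"},
    "2": {"you", "your", "yours", "yourself", "yourselves"},
    "3masc": {"he", "him", "his", "himself"},
    "3fem": {"she", "her", "hers", "herself"},
    "3neut": {"it", "its", "itself"},
    "3pl": {"they", "them", "their", "theirs", "themselves"},
}

def group_of(pronoun):
    """Return the agreement group of a pronoun, or None if unknown."""
    p = pronoun.lower()
    for name, members in PRONOUN_GROUPS.items():
        if p in members:
            return name
    return None

def same_group(p1, p2):
    """True if two pronouns share gender, number, and person."""
    g1, g2 = group_of(p1), group_of(p2)
    return g1 is not None and g1 == g2
```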
    <Paragraph position="2"> We define the coreference of p as:</Paragraph>
    <Paragraph position="3"> C(p) = NS(p) / (NS(p) + ND(p))</Paragraph>
    <Paragraph position="4"> Our statistics indicate the example path, Noun needs pronoun's support, has a low C(p) value.</Paragraph>
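The NS/ND counting and the resulting C(p) score can be sketched as follows; the observe/coreference helper names are hypothetical, not from the paper.

```python
from collections import defaultdict

# Accumulators for path coreference statistics. Each observation is a
# path string plus the pronoun groups of its two terminals.
NS = defaultdict(int)  # terminals from the same pronoun group
ND = defaultdict(int)  # terminals from different groups

def observe(path, g1, g2):
    """Record one occurrence of a path with terminal pronoun groups g1, g2."""
    if g1 == g2:
        NS[path] += 1
    else:
        ND[path] += 1

def coreference(path):
    """C(p) = NS(p) / (NS(p) + ND(p)); None if the path was never seen."""
    total = NS[path] + ND[path]
    return NS[path] / total if total else None

# Invented observations for illustration only.
observe("Noun needs pronoun's support", "1sg", "3masc")
observe("Noun needs pronoun's support", "3fem", "3masc")
observe("Noun lost pronoun's job", "3masc", "3masc")
```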
    <Paragraph position="5"> We could use this fact to prevent us from resolving his to John when John needs his support is presented to a pronoun resolution system.</Paragraph>
    <Paragraph position="6"> To mitigate data sparsity, we represent the path with the root form of the verbs and nouns. Also, we use Minipar's named-entity recognition to replace named-entity nouns by the semantic category of their named-entity, when available. All modifiers not on the direct path, such as adjectives, determiners and adverbs, are not considered. We limit the maximum path length to eight nodes.</Paragraph>
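A minimal sketch of this path normalization, assuming each path node arrives as a small record with its lemma (root form), dependency label, and optional named-entity category; the actual Minipar output format differs.

```python
MAX_PATH_NODES = 8  # paths longer than eight nodes are discarded

def normalize_path(nodes):
    """Normalize a dependency path, assuming each node is a dict with
    'lemma' (root form of the verb/noun), 'dep' (dependency label), and
    an optional 'ne' (named-entity semantic category). Off-path
    modifiers (adjectives, determiners, adverbs) are assumed to have
    been excluded already."""
    if len(nodes) > MAX_PATH_NODES:
        return None
    out = []
    for node in nodes:
        # Replace named entities by their semantic category when available.
        token = node.get("ne") or node["lemma"]
        out.append((token, node["dep"]))
    return tuple(out)
```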
    <Paragraph position="7"> Tables 1 and 2 give examples of coreferent and non-coreferent paths learned by our algorithm and identified in our test sets. Coreferent paths are defined as paths with a C(p) value (and overall number of occurrences) above a certain threshold, indicating the terminal entities are highly likely to corefer. Examples from Table 1:
1. Noun left ... to pronoun's wife (Buffett will leave the stock to his wife.)
2. Noun says pronoun intends ... (The newspaper says it intends to file a lawsuit.)
3. Noun was punished for pronoun's crime. (The criminal was punished for his crime.)
4. ... left Noun to fend for pronoun-self (They left Jane to fend for herself.)
5. Noun lost pronoun's job. (Dick lost his job.)</Paragraph>
    <Paragraph position="8"> 6. ... created Noun and populated pronoun. (Nzame created the earth and populated it.)
7. Noun consolidated pronoun's power. (The revolutionaries consolidated their power.)
8. Noun suffered ... in pronoun's knee ligament. (The leopard suffered pain in its knee ligament.)
Non-coreferent paths have a C(p) below a certain cutoff; the terminals are highly unlikely to corefer. Especially note the challenge of resolving most of the examples in Table 2 without path coreference information. Although these paths encompass some cases previously covered by Binding Theory (e.g. Mary suspended her, where her cannot refer to Mary by Principle B (Haegeman, 1994)), most have no syntactic justification for non-coreference per se. Likewise, although Binding Theory (Principle A) could identify the reflexive pronominal relationship of Example 4 in Table 1, most cases cannot be resolved through syntax alone. Our analysis shows that successfully handling cases that may have been handled with Binding Theory constitutes only a small portion of the total performance gain from using path coreference.</Paragraph>
    <Paragraph position="9"> In any case, Binding Theory remains a challenge with a noisy parser. Consider: Alex gave her money. Minipar parses her as a possessive, when it is more likely an object, as in Alex gave money to her. Without a correct parse, we cannot rule out the link between her and Alex through Binding Theory. Our algorithm, however, learns that the path Noun gave pronoun's money is non-coreferent. In a sense, it corrects for parser errors by learning when coreference should be blocked, given any consistent parse of the sentence.</Paragraph>
    <Paragraph position="10"> We obtain path coreference for millions of paths from our parsed news corpus (Section 4). While Tables 1 and 2 give test set examples, many other interesting paths are obtained. We learn coreference is unlikely between the nouns in Bob married his mother, or Sue wrote her obituary. The fact you don't marry your own mother or write your own obituary is perhaps obvious, but this is the first time this kind of knowledge has been made available computationally. Naturally, exceptions to the coreference or non-coreference of some of these paths can be found; our patterns represent general trends only. And, as mentioned above, reliable path coreference is somewhat dependent on consistent parsing.</Paragraph>
    <Paragraph position="11"> Paths connecting pronouns to pronouns are different from paths connecting both nouns and pronouns to pronouns, the case we are ultimately interested in resolving. Consider Company A gave its data on its website. The pronoun-pronoun path coreference algorithm described above would learn the terminals in Noun's data on pronoun's website are often coreferent. But if we see the phrase Company A gave Company B's data on its website, then its is not likely to refer to Company B, even though we identified this as a coreferent path! We address this problem with a two-stage extraction procedure. We first bootstrap gender/number information using the pronoun-pronoun paths as described in Section 4.1. We then use this gender/number information to count paths where an initial noun (with probabilistically-assigned gender/number) and following pronoun are connected by the dependency path, recording the agreement or disagreement of their gender/number category.1 These superior paths are then used to re-bootstrap our final gender/number information used in the evaluation (Section 6).</Paragraph>
    <Paragraph position="12"> We also bootstrap paths where the nodes in the path are replaced by their grammatical category. This allows us to learn general syntactic constraints not dependent on the surface forms of the words (including, but not limited to, the Binding Theory principles). A separate set of these non-coreferent paths is also used as a feature in our system. We also tried expanding our coverage by using paths similar to paths with known path coreference (based on distributionally similar words), but this did not generally increase performance. (Footnote 1: As desired, this modification allows the first example to provide two instances of noun-pronoun paths with terminals from the same gender/number group, linking each its to the subject noun Company A, rather than to each other.)</Paragraph>
    <Paragraph position="13"> Examples from Table 2 (Pattern / Example):
1. Noun thanked ... for pronoun's assistance (John thanked him for his assistance.)
2. Noun wanted pronoun to lie. (The president wanted her to lie.)
3. ... Noun into pronoun's pool (Max put the floaties into their pool.)
4. ... use Noun to pronoun's advantage (The company used the delay to its advantage.)
5. Noun suspended pronoun (Mary suspended her.)</Paragraph>
    <Paragraph position="14"> 6. Noun was pronoun's relative. (The Smiths were their relatives.)
7. Noun met pronoun's demands (The players' association met its demands.)
8. ... put Noun at the top of pronoun's list. (The government put safety at the top of its list.)</Paragraph>
  </Section>
  <Section position="6" start_page="35" end_page="37" type="metho">
    <SectionTitle>
4 Bootstrapping in Pronoun Resolution
</SectionTitle>
    <Paragraph position="0"> Our determination of path coreference can be considered a bootstrapping procedure. Furthermore, the coreferent paths themselves can serve as the seed for bootstrapping additional coreference information. In this section, we sketch previous approaches to bootstrapping in coreference resolution and explain our new ideas.</Paragraph>
    <Paragraph position="1"> Coreference bootstrapping works by assuming resolutions in unlabelled text, acquiring information from the putative resolutions, and then making inferences from the aggregate statistical data. For example, we assumed two pronouns from the same pronoun group were coreferent, and deduced path coreference from the accumulated counts.</Paragraph>
    <Paragraph position="2"> The potential of the bootstrapping approach can best be appreciated by imagining millions of documents with coreference annotations. With such a set, we could extract fine-grained features, perhaps tied to individual words or paths. For example, we could estimate the likelihood each noun belongs to a particular gender/number class by the proportion of times this noun was labelled as the antecedent for a pronoun of this particular gender/number.</Paragraph>
    <Paragraph position="3"> Since no such corpus exists, researchers have used coarser features learned from smaller sets through supervised learning (Soon et al., 2001; Ng and Cardie, 2002), manually-defined coreference patterns to mine specific kinds of data (Bean and Riloff, 2004; Bergsma, 2005), or accepted the noise inherent in unsupervised schemes (Ge et al., 1998; Cherry and Bergsma, 2005).</Paragraph>
    <Paragraph position="4"> We address the drawbacks of these approaches by using coreferent paths as the assumed resolutions in the bootstrapping. Because we can vary the threshold for defining a coreferent path, we can trade off coverage for precision. We now outline two potential uses of bootstrapping with coreferent paths: learning gender/number information (Section 4.1) and augmenting a semantic compatibility model (Section 4.2). We bootstrap this data on our automatically-parsed news corpus. The corpus comprises 85 GB of news articles taken from the world wide web over a 1-year period.</Paragraph>
    <Section position="1" start_page="35" end_page="36" type="sub_section">
      <SectionTitle>
4.1 Probabilistic Gender/Number
</SectionTitle>
      <Paragraph position="0"> Bergsma (2005) learns noun gender (and number) from two principal sources: 1) mining it from manually-defined lexico-syntactic patterns in parsed corpora, and 2) acquiring it on the fly by counting the number of pages returned for various gender-indicating patterns by the Google search engine. The web-based approach outperformed the corpus-based approach, while a system that combined the two sets of information resulted in the highest performance (Table 3). The combined gender-classifying system is a machine-learned classifier with 20 features.</Paragraph>
      <Paragraph position="1"> The time delay of using an Internet search engine within a large-scale anaphora resolution effort is currently impractical. Thus we attempted a corpus-based extraction of gender and number, where the information can be stored in advance in a table, but using a much larger data set. Bergsma ran his extraction on roughly 6 GB of text; we used roughly 85 GB.</Paragraph>
      <Paragraph position="2"> Using the test set from Bergsma (2005), we were only able to boost performance from an F-Score of 85.4% to one of 88.0% (Table 3). This result led us to re-examine the high performance of Bergsma's web-based approach. We realized that the corpus-based and web-based approaches are not exactly symmetric. The corpus-based approaches, for example, would not pick out gender from a pattern such as John and his friends... because Noun and pronoun's NP is not one of the manually-defined gender extraction patterns. The web-based approach, however, would catch this instance with the John * his/her/its/their template, where * is the Google wild-card operator. Clearly, there are patterns useful for capturing gender and number information beyond the pre-defined set used in the corpus-based extraction. We thus decided to capture gender/number information from coreferent paths. If a noun is connected to a pronoun of a particular gender along a coreferent path, we count this as an instance of that noun being that gender. In the end, the probability that the noun is a particular gender is the proportion of times it was connected to a pronoun of that gender along a coreferent path. Gender information becomes a single intuitive, accessible feature (i.e. the probability of the noun being that gender) rather than Bergsma's 20-dimensional feature vector requiring search-engine queries to instantiate. We acquire gender and number data for over 3 million nouns. We use add-one smoothing for data sparsity. Some example gender/number probabilities are given in Table 4 (cf. (Ge et al., 1998; Cherry and Bergsma, 2005)). We get a performance of 90.3% (Table 3), again meeting our requirements of high performance and allowing for a fast, practical implementation.
This is lower than Bergsma's top score of 92.2% (Table 3), but again, Bergsma's top system relies on Google search queries for each new word, while ours are all pre-stored in a table for fast access.</Paragraph>
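The gender/number estimate described in this subsection — the proportion of coreferent-path links to pronouns of each class, with add-one smoothing — can be sketched as below. The four class labels are an assumption about how the classes are partitioned.

```python
from collections import defaultdict

# Assumed gender/number classes: masculine, feminine, neutral, plural.
CLASSES = ("masc", "fem", "neut", "plur")

# counts[noun][cls]: times `noun` was connected to a pronoun of class
# `cls` along a coreferent path.
counts = defaultdict(lambda: defaultdict(int))

def observe(noun, cls):
    counts[noun][cls] += 1

def gender_prob(noun, cls):
    """P(cls | noun) with add-one smoothing over the four classes."""
    total = sum(counts[noun].values())
    return (counts[noun][cls] + 1) / (total + len(CLASSES))

# Invented observations for illustration only.
observe("actress", "fem")
observe("actress", "fem")
observe("actress", "neut")
```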
      <Paragraph position="3"> We are pleased to be able to share our gender and number data with the NLP community.2 In Section 6, we show the benefit of this data as a probabilistic feature in our pronoun resolution system. Probabilistic data is useful because it allows us to rapidly prototype resolution systems without incurring the overhead of large-scale lexical databases such as WordNet (Miller et al., 1990).</Paragraph>
    </Section>
    <Section position="2" start_page="36" end_page="37" type="sub_section">
      <SectionTitle>
4.2 Semantic Compatibility
</SectionTitle>
      <Paragraph position="0"> Researchers since Dagan and Itai (1990) have variously argued for and against the utility of collocation statistics between nouns and parents for improving the performance of pronoun resolution.</Paragraph>
      <Paragraph position="1"> For example, can the verb parent of a pronoun be used to select antecedents that satisfy the verb's selectional restrictions? If the verb phrase was shatter it, we would expect it to refer to some kind of brittle entity. Like path coreference, semantic compatibility can be considered a form of world knowledge needed for more challenging pronoun resolution instances.</Paragraph>
      <Paragraph position="2"> We encode the semantic compatibility between a noun and its parse tree parent (and grammatical relationship with the parent) using mutual information (MI) (Church and Hanks, 1989). Suppose we are determining whether ham is a suitable antecedent for the pronoun it in eat it. We calculate the MI as: MI(eat:obj, ham) = log [ Pr(eat:obj:ham) / ( Pr(eat:obj) Pr(ham) ) ]. Although semantic compatibility is usually only computed for possessive-noun, subject-verb, and verb-object relationships, we include 121 different kinds of syntactic relationships as parsed in our news corpus.3 We collected 4.88 billion parent:rel:node triples, including over 327 million possessive-noun values, 1.29 billion subject-verb, and 877 million verb-direct-object values. We use small probability values for unseen Pr(parent:rel:node), Pr(parent:rel), and Pr(node) cases, as well as a default MI when no relationship is parsed, roughly optimized for performance on the training set. We include both the MI between the noun and the pronoun's parent as well as the MI between the pronoun and the noun's parent as features in our pronoun resolution classifier.</Paragraph>
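A toy version of the MI computation over parent:rel:node counts might look like this. The counts are invented, and the natural log is an assumption (the paper does not state a base).

```python
import math
from collections import Counter

# Invented (parent, rel, node) triple counts standing in for the
# billions of triples collected from the parsed corpus.
triples = Counter({
    ("eat", "obj", "ham"): 50,
    ("eat", "obj", "car"): 1,
    ("drive", "obj", "car"): 40,
    ("drive", "obj", "ham"): 1,
})
N = sum(triples.values())

def mi(parent, rel, node):
    """MI(parent:rel, node) = log [ P(parent:rel:node) /
    (P(parent:rel) * P(node)) ]. A real system would substitute small
    default probabilities for unseen events instead of -inf."""
    p_triple = triples[(parent, rel, node)] / N
    p_parent_rel = sum(c for (p, r, _), c in triples.items()
                       if (p, r) == (parent, rel)) / N
    p_node = sum(c for (_, _, n), c in triples.items() if n == node) / N
    if p_triple == 0:
        return float("-inf")
    return math.log(p_triple / (p_parent_rel * p_node))
```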
      <Paragraph position="3"> Kehler et al. (2004) saw no apparent gain from using semantic compatibility information, while Yang et al. (2005) saw about a 3% improvement with compatibility data acquired by searching on the world wide web. Section 6 analyzes the contribution of MI to our system.</Paragraph>
      <Paragraph position="4"> Bean and Riloff (2004) used bootstrapping to extend their semantic compatibility model, which they called contextual-role knowledge, by identifying certain cases of easily-resolved anaphors and antecedents. They give the example Mr. Bush disclosed the policy by reading it. Once we identify that it and policy are coreferent, we include read:obj:policy as part of the compatibility model.</Paragraph>
      <Paragraph position="5"> Rather than using manually-defined heuristics to bootstrap additional semantic compatibility information, we wanted to enhance our MI statistics automatically with coreferent paths. Consider the phrase, Saddam's wife got a Jordanian lawyer for her husband. It is unlikely we would see wife's husband in text; in other words, we would not know that husband:gen:wife is, in fact, semantically compatible, and we would thereby discourage selection of wife as the antecedent at resolution time.</Paragraph>
      <Paragraph position="6"> However, because Noun gets ... for pronoun's husband is a coreferent path, we could capture the above relationship by adding a parent:rel:node triple for every pronoun connected to a noun phrase along a coreferent path in text.</Paragraph>
      <Paragraph position="7"> We developed context models with and without these path enhancements, but ultimately we could find no subset of coreferent paths that improve the semantic compatibility's contribution to training set accuracy. A mutual information model trained on 85 GB of text is fairly robust on its own, and any kind of bootstrapped extension seems to cause more damage by increased noise than can be compensated by increased coverage. Although we like knowing audiences have noses, e.g. the audience turned up its nose at the performance, such phrases are apparently quite rare in actual test sets.</Paragraph>
    </Section>
  </Section>
  <Section position="7" start_page="37" end_page="38" type="metho">
    <SectionTitle>
5 Experimental Design
</SectionTitle>
    <Paragraph position="0"> The noun-pronoun path coreference can be used directly as a feature in a pronoun resolution system. However, path coreference is undefined for cases where there is no path between the pronoun and the candidate noun, for example when the candidate is in the previous sentence. Therefore, rather than using path coreference directly, we have features that are true if C(p) is above or below certain thresholds. The features are thus set when coreference between the pronoun and candidate noun is likely (a coreferent path) or unlikely (a non-coreferent path).</Paragraph>
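The thresholded features might be derived as below. The cutoff values and frequency floor are hypothetical, since the paper does not report its exact thresholds.

```python
# Hypothetical thresholds; the paper does not report the exact cutoffs.
COREF_THRESHOLD = 0.8
NONCOREF_THRESHOLD = 0.2
MIN_OCCURRENCES = 25  # assumed minimum path frequency

def path_features(c_value, occurrences):
    """Map C(p) to two binary features. Both stay False when the path
    is undefined (no path between pronoun and candidate, e.g. the
    candidate is in the previous sentence) or too rare."""
    coref = noncoref = False
    if c_value is not None and occurrences >= MIN_OCCURRENCES:
        coref = c_value >= COREF_THRESHOLD
        noncoref = c_value <= NONCOREF_THRESHOLD
    return {"coreferent_path": coref, "non_coreferent_path": noncoref}
```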
    <Paragraph position="1"> We now evaluate the utility of path coreference within a state-of-the-art machine-learned resolution system for third-person pronouns with nominal antecedents. A standard set of features is used along with the bootstrapped gender/number, semantic compatibility, and path coreference information. We refer to these features as our probabilistic features (Prob. Features) and run experiments using the full system trained and tested with each absent, in turn (Table 5). We have 29 features in total, including measures of candidate distance, frequency, grammatical role, and different kinds of parallelism between the pronoun and the candidate noun. Several reliable features are used as hard constraints, removing candidates before consideration by the scoring algorithm.</Paragraph>
    <Paragraph position="2"> All of the parsing, noun-phrase identification, and named-entity recognition are done automatically with Minipar. Candidate antecedents are considered in the current and previous sentence only. We use SVMlight (Joachims, 1999) to learn a linear-kernel classifier on pairwise examples in the training set. When resolving pronouns, we select the candidate with the farthest positive distance from the SVM classification hyperplane.</Paragraph>
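Candidate selection from the classifier's decision values can be sketched as follows; the function name is hypothetical, and the signed distances are assumed to come from a linear classifier such as SVMlight.

```python
def resolve(candidates, decision_values):
    """Select the antecedent with the farthest positive distance from
    the classification hyperplane. Returns None when no candidate
    scores on the positive side (i.e. none is classified coreferent)."""
    best, best_score = None, 0.0
    for cand, score in zip(candidates, decision_values):
        if score > best_score:
            best, best_score = cand, score
    return best
```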
    <Paragraph position="3"> Our training set is the anaphora-annotated portion of the American National Corpus (ANC) used in Bergsma (2005), containing 1270 anaphoric pronouns. We test on the ANC Test set (1291 instances) also used in Bergsma (2005) (highest resolution accuracy reported: 73.3%), the anaphora-labelled portion of AQUAINT used in Cherry and Bergsma (2005) (1078 instances, highest accuracy: 71.4%), and the anaphoric pronoun subset of the MUC7 (1997) coreference evaluation formal test set (169 instances, highest precision of 62.1 reported on all pronouns in (Ng and Cardie, 2002)). These particular corpora were chosen so we could test our approach using the same data as comparable machine-learned systems exploiting probabilistic information sources. Parameters were set using cross-validation on the training set; test sets were used only once to obtain the final performance values.</Paragraph>
    <Paragraph position="4"> Evaluation Metric: We report results in terms of accuracy: Of all the anaphoric pronouns in the test set, the proportion we resolve correctly.</Paragraph>
  </Section>
</Paper>