<?xml version="1.0" standalone="yes"?>
<Paper uid="W03-1023">
  <Title>Using the Web in Machine Learning for Other-Anaphora Resolution</Title>
  <Section position="4" start_page="3" end_page="4" type="metho">
    <SectionTitle>
3 The Algorithm
</SectionTitle>
    <Paragraph position="0"> We use a Naive Bayes classifier, specifically the implementation in the Weka ML library.</Paragraph>
    <Paragraph position="1">  The training data was generated following the procedure employed by Soon et al. (2001) for coreference resolution. Every pair of an anaphor and its closest preceding antecedent created a positive training instance. To generate negative training instances, we paired anaphors with each of the NPs that intervene between the anaphor and its antecedent. This procedure produced a set of 3,084 antecedent-anaphor pairs, of which 500 (16%) were positive training instances.</Paragraph>
    <Paragraph position="2"> The classifier was trained and tested using 10-fold cross-validation. We follow the general practice of ML algorithms for coreference resolution and compute precision (P), recall (R),andF-measure (BY)on all possible anaphor-antecedent pairs.</Paragraph>
    <Paragraph position="3"> As a first approximation of the difficulty of our task, we developed a simple rule-based baseline algorithm which takes into account the fact that the lemmatised head of an other-anaphor is sometimes the same as that of its antecedent, as in (5).</Paragraph>
    <Paragraph position="4">  http://www.cs.waikato.ac.nz/AOml/weka/.</Paragraph>
    <Paragraph position="5"> We also experimented with a decision tree classifier, with Neural Networks and Support Vector Machines with Sequential Minimal Optimization (SMO), all available from Weka. These classifiers achieved worse results than NB on our data set.  same, compatible, incompatible, unknown Semantic RELATION Type of relation between anaphor and antecedent same-predicate, hypernymy, meronymy, compatible, incompatible, unknown (5) These three countries aren't completely off the hook, though. They will remain on a lowerpriority list that includes other countries [...] For each anaphor, the baseline string-compares its last (lemmatised) word with the last (lemmatised) word of each of its possible antecedents. If the words match, the corresponding antecedent is chosen as the correct one. If several antecedents produce a match, the baseline chooses the most recent one among them. If string-comparison returns no antecedent, the baseline chooses the antecedent closest to the anaphor among all antecedents. The baseline assigns &amp;quot;yes&amp;quot; to exactly one antecedent per anaphor. Its P, R and BY-measure are 27.8%.</Paragraph>
  </Section>
  <Section position="5" start_page="4" end_page="6" type="metho">
    <SectionTitle>
4 Naive Bayes without the Web
</SectionTitle>
    <Paragraph position="0"> First, we trained and tested the NB classifier with a set of 9 features motivated by our own work on other-anaphora (Modjeska, 2002) and previous ML research on coreference resolution (Aone and Bennett, 1995; McCarthy and Lehnert, 1995; Soon et al., 2001; Ng and Cardie, 2002; Strube et al., 2002).</Paragraph>
    <Section position="1" start_page="4" end_page="6" type="sub_section">
      <SectionTitle>
4.1 Features
</SectionTitle>
      <Paragraph position="0"> A set of 9 features, F1, was automatically acquired from the corpus and from additional external resources (see summary in Table 1).</Paragraph>
      <Paragraph position="1"> Non-semantic features. NP FORM is based on the POS tags in the Wall Street Journal corpus and heuristics. RESTR SUBSTR matches lemmatised strings and checks whether the antecedent string contains the anaphor string. This allows to resolve examples such as &amp;quot;one woman ringer . . . another woman&amp;quot;. The values for GRAM FUNC were approximated from the parse trees and Penn Treebank annotation. The feature SYN PAR captures syntactic parallelism between anaphor and antecedent. The feature SDIST measures the distance between anaphor and antecedent in terms of sentences.</Paragraph>
      <Paragraph position="2">  Semantic features. GENDER AGR captures agreement in gender between anaphor and antecedent, gender having been determined using gazetteers, kinship and occupational terms, titles, and Word-Net. Four values are possible: &amp;quot;same&amp;quot;, if both NPs have same gender; &amp;quot;compatible&amp;quot;, if antecedent and anaphor have compatible gender, e.g., &amp;quot;lawyer . . . other women&amp;quot;; &amp;quot;incompatible&amp;quot;, e.g., &amp;quot;Mr. Johnson . . . other women&amp;quot;; and &amp;quot;unknown&amp;quot;, if one of the NPs is undifferentiated, i.e., the gender value is &amp;quot;unknown&amp;quot;. SEMCLASS: Proper names were classified using ANNIE, part of the GATE2 software package (http://gate.ac.uk). Common nouns were looked up in WordNet, considering only the most frequent sense of each noun (the first sense in Word-Net). In each case, the output was mapped onto one of the values in Table 1. The SEMCLASS AGR fea- null We also experimented with a feature MDIST that measures intervening NP units. This feature worsened the overall performance of the classifier.</Paragraph>
      <Paragraph position="3"> ture compares the semantic class of the antecedent with that of the anaphor NP and returns &amp;quot;yes&amp;quot; if they belong to the same class; &amp;quot;no&amp;quot;, if they belong to different classes; and &amp;quot;unknown&amp;quot; if the semantic class of either the anaphor or antecedent has not been determined. The RELATION between other-anaphors and their antecedents can partially be determined by string comparison (&amp;quot;same-predicate&amp;quot;)  or WordNet (&amp;quot;hypernymy&amp;quot; and &amp;quot;meronymy&amp;quot;). As other relations, e.g. &amp;quot;redescription&amp;quot; (Ex. (3), cannot be readily determined on the basis of the information in WordNet, the following values were used: &amp;quot;compatible&amp;quot;, for NPs with compatible semantic classes, e.g., &amp;quot;woman . . . other leaders&amp;quot;; and &amp;quot;incompatible&amp;quot;, e.g., &amp;quot;woman . . . other economic indicators&amp;quot;. Compatibility can be defined along a variety of parameters. The notion we used roughly corresponds to the root level of the WordNet hierarchy. Two nouns are compatible if they have the same SEMCLASS value, e.g., &amp;quot;person&amp;quot;. &amp;quot;Unknown&amp;quot; was used if the type of relation could not be determined.</Paragraph>
    </Section>
  </Section>
  <Section position="6" start_page="6" end_page="7" type="metho">
    <SectionTitle>
4.2 Results
</SectionTitle>
    <Paragraph position="0"> Table 2 shows the results for the Naive Bayes classifier using F1 in comparison to the baseline.</Paragraph>
    <Paragraph position="1">  Our algorithm performs significantly better than the baseline.</Paragraph>
    <Paragraph position="2">  While these results are encouraging, there were several classification errors. Word sense ambiguity is one of the reasons for misclassifications. Antecedents were looked up in WordNet for their most frequent sense for a context-independent assignment of the values of semantic class and relations. However, in many cases either the anaphor or antecedent or both are used in a sense that is ranked as less frequent in Wordnet. This might even be a quite frequent sense for a specific corpus, e.g., the word &amp;quot;issue&amp;quot; in the sense of &amp;quot;shares, stocks&amp;quot; in the WSJ. Therefore there is a strong inter- null Same-predicate is not really a relation. We use it when the head noun of the anaphor and antecedent are the same.  We used a t-test with confidence level 0.05 for all significance tests.</Paragraph>
    <Paragraph position="3"> action between word sense disambiguation and reference resolution (see also (Preiss, 2002)). Named Entity resolution is another weak link.</Paragraph>
    <Paragraph position="4"> Several correct NE antecedents were classified as &amp;quot;antecedentBPno&amp;quot; (false negatives) because the NER module assigned the wrong class to them.</Paragraph>
    <Paragraph position="5"> The largest class of errors is however due to insufficient semantic knowledge. Problem examples can roughly be classified into five partially overlapping groups: (a) examples that suffer from gaps in Word-Net, e.g., (2); (b) examples that require domain-, situation-specific, or general world knowledge, e.g., (3); (c) examples involving bridging phenomena (sometimes triggered by a metonymic or metaphoric antecedent or anaphor), e.g., (6); (d) redescriptions and paraphrases, often involving semantically vague anaphors and/or antecedents, e.g., (7) and (3); and  (e) examples with ellipsis, e.g., (8).</Paragraph>
    <Paragraph position="6"> (6) The Justice Department's view is shared by other lawyers [...] (7) While Mr. Dallara and Japanese officials say the question of investors' access to the U.S.</Paragraph>
    <Paragraph position="7"> and Japanese markets may get a disproportionate share of the public's attention, a number of other important economic issues will be on the table at next week's talks.</Paragraph>
    <Paragraph position="8"> (8) He sees flashy sports as the only way the last null place network can cut through the clutter of cable and VCRs, grab millions of new viewers and tell them about other shows premiering a few weeks later.</Paragraph>
    <Paragraph position="9"> In (6), the antecedent is an organization-for-people metonymy. In (7), the question of investors' access to the U.S. and Japanese markets is characterized as an important economic issue. Also, the head &amp;quot;issues&amp;quot; is lexically uninformative to sufficiently constrain the search space for the antecedent. In (8), the antecedent is not the flashy sports, but rather flashy sport shows, and thus an important piece of information is omitted. Alternatively, the antecedent is a content-for-container metonymy.</Paragraph>
    <Paragraph position="10"> Overall, our approach misclassifies antecedents whose relation to the other-anaphor is based on similarity, property-sharing, causality, or is constrained to a specific domain. These relation types are not -and perhaps should not be -- encoded in WordNet.</Paragraph>
  </Section>
  <Section position="7" start_page="7" end_page="14" type="metho">
    <SectionTitle>
5 Naive Bayes with the Web
</SectionTitle>
    <Paragraph position="0"> With its approximately 3033M pages  the Web is the largest corpus available to the NLP community. Building on our approach in (Markert et al., 2003), we suggest using the Web as a knowledge source for anaphora resolution. In this paper, we show how to integrate Web counts for lexico-syntactic patterns specific to other-anaphora into our ML approach.</Paragraph>
    <Section position="1" start_page="8" end_page="11" type="sub_section">
      <SectionTitle>
5.1 Basic Idea
</SectionTitle>
      <Paragraph position="0"> In the examples we consider, the relation between anaphor and antecedent is implicitly expressed, i.e., anaphor and antecedent do not stand in a structural relationship. However, they are linked by a strong semantic relation that is likely to be structurally ex- null plicitly expressed in other texts. We exploit this insight by adopting the following procedure: 1. In other-anaphora, a hyponymy/similarity relation between the lexical heads of anaphor and antecedent is exploited or stipulated by the context, null  e.g. that &amp;quot;schools&amp;quot; is an alternative term for universities in Ex. (2) or that age is viewed as a risk factor in Ex. (3).</Paragraph>
      <Paragraph position="1">  2. We select patterns that structurally explicitly express the same lexical relations. E.g., the list-context NP BD and other NP BE (as Ex. (4)) usually expresses hyponymy/similarity relations between the hyponym NP BD and its hypernym NP BE (Hearst, 1992).</Paragraph>
      <Paragraph position="2"> 3. If the implicit lexical relationship between anaphor and antecedent is strong, it is likely that anaphor and antecedent also frequently cooccur in the selected explicit patterns. We instantiate the explicit pattern for all anaphor-antecedent pairs. In (2) the pattern NP BD and other NP BE is instantiated with e.g., counterparts and other schools, sports and other schools and universities and other schools.</Paragraph>
      <Paragraph position="3">  In the Web feature context, we will often use &amp;quot;anaphor/antecedent&amp;quot; instead of the more cumbersome &amp;quot;lexical heads of the anaphor/antecedent&amp;quot;.  These simplified instantiations serve as an example and are neither exhaustive nor the final instantiations we use; see Section 5.3.</Paragraph>
      <Paragraph position="4"> searched in any corpus to determine their frequencies. The rationale is that the most frequent of these instantiated patterns is a good clue for the correct antecedent.</Paragraph>
      <Paragraph position="5"> 4. As the patterns can be quite elaborate, most corpora will be too small to determine the corresponding frequencies reliably. The instantiation universities and other schools, e.g., does not occur at all in the British National Corpus (BNC), a 100M words corpus of British English.</Paragraph>
      <Paragraph position="6">  Therefore we use the largest corpus available, the Web. We submit all instantiated patterns as queries making use of the Google API technology. Here, universities and other schools yields over 700 hits, whereas the other two instantiations yield under 10 hits each. High frequencies do not only occur for synonyms; the corresponding instantiation for the correct antecedent in Ex. (3) age and other risk factors yields over 400 hits on the Web and again none in the BNC.</Paragraph>
    </Section>
    <Section position="2" start_page="11" end_page="12" type="sub_section">
      <SectionTitle>
5.2 Antecedent Preparation
</SectionTitle>
      <Paragraph position="0"> In addition to the antecedent preparation described in Section 2, further processing is necessary. First, pronouns can be antecedents of other-anaphors but they were not used as Web query input as they are lexically empty. Second, all modification was eliminated and only the rightmost noun of compounds was kept, to avoid data sparseness. Third, using patterns containing NEs such as &amp;quot;Will Quinlan&amp;quot; in (9) also leads to data sparseness (see also the use of NE recognition for feature SEMCLASS).</Paragraph>
      <Paragraph position="1"> (9) [...] Will Quinlan had not inherited a damaged retinoblastoma supressor gene and, therefore, faced no more risk than other children [...] We resolved NEs in two steps. In addition to GATE's classification into ENAMEX and NU-MEX categories, we used heuristics to automatically obtain more fine-grained distinctions for the categories LOCATION, ORGANIZATION, DATE and MONEY, whenever possible. No further distinctions were made for the category PERSON.We classified LOCATIONS into COUNTRY,(US)STATE, CITY, RIVER, LAKE and OCEAN, using mainly  If an entity classified by GATE as ORGANIZATION contained an indication of the organization type, we used this as a subclassification; therefore &amp;quot;Bank of America&amp;quot; is classified as BANK.ForDATE and MONEY entities we used simple heuristics to classify them further into DAY, MONTH, YEAR as well as DOLLAR.</Paragraph>
      <Paragraph position="2"> From now on we call BT the list of possible antecedents and CPD2CP the anaphor. For (2), this list</Paragraph>
      <Paragraph position="4"/>
    </Section>
    <Section position="3" start_page="12" end_page="14" type="sub_section">
      <SectionTitle>
5.3 Queries and Scoring Method
</SectionTitle>
      <Paragraph position="0"> We use the list-context pattern:  . This is a consequence of the substitution of the antecedent (&amp;quot;Will Quinlan&amp;quot;)  They were extracted from the Web. Small gazetteers, containing in all about 500 entries, are sufficient. This is the only external knowledge collected for the Web feature.  In all patterns in this paper, &amp;quot;OR&amp;quot; is the boolean operator,  Common noun instantiations are marked by a superscript &amp;quot;c&amp;quot; and proper name instantiations by a superscript &amp;quot;p&amp;quot;. with its NE category (&amp;quot;person&amp;quot;); such an instantiation is not frequent, since it violates standard relations within (O1). Therefore, we also instantiate  in Table 3).</Paragraph>
      <Paragraph position="1"> Patterns and instantiations are summarised in Table 3. We submit these instantiations as queries to the Google search engine.</Paragraph>
      <Paragraph position="2"> For each antecedent CPD2D8 in BT we obtain the raw frequencies of all instantiations it occurs in (C1  other antecedents (including pronouns) get the feature value &amp;quot;webrest&amp;quot;. We chose this method instead of e.g., giving score intervals for two reasons. First, since score intervals are unique for each anaphor, it is not straightforward to incorporate them into a ML framework in a consistent manner. Second, this method introduces an element of competition between several antecedents (see also (Connolly et al., 1997)), which the individual scores do not reflect. We trained and tested the NB classifier with the feature set F1, plus the Web feature. The last row in Table 4 shows the results. We obtained a 9.1 percentage point improvement in precision (an 18% improvement relative to the F1 feature set) and a 12.8 percentage point improvement in recall (32% improvement relative to F1), which amounts to an 11.4 percentage point improvement in BY-measure (25% improvement relative to F1 feature set). In particular, all the examples in this paper were resolved. Our algorithm still misclassified several antecedents. Sometimes even the Web is not large enough to contain the instantiated pattern, especially when this is situation or speaker specific. Another problem is the high number of NE antecedents (39.6%) in our corpus. While our NER module is quite good, any errors in NE classification lead to incorrect instantiations and thus to incorrect classifications. In addition, the Web feature does not yet take into account pronouns (7.43% of all correct and potential antecedents in our corpus).</Paragraph>
    </Section>
  </Section>
  <Section position="8" start_page="14" end_page="14" type="metho">
    <SectionTitle>
6 Related Work and Discussion
</SectionTitle>
    <Paragraph position="0"> Modjeska (2002) presented two hand-crafted algorithms, SAL and LEX, which resolve the anaphoric references of other-NPs on the basis of grammatical salience and lexical information from WordNet, respectively. In our own previous work (Markert et  al., 2003) we presented a preliminary symbolic approach that uses Web counts and a recency-based tie-breaker for resolution of other-anaphora and bridging descriptions. (For another Web-based symbolic approach to bridging see (Bunescu, 2003).) The approach described in this paper is the first machine learning approach to other-anaphora. It is not directly comparable to the symbolic approaches above for two reasons. First, the approaches differ in the data and the evaluation metrics they used. Second, our algorithm does not yet constitute a full resolution procedure. As the classifier operates on the whole set of antecedent-anaphor pairs, more than one potential antecedent for each anaphor can be classified as &amp;quot;antecedentBPyes&amp;quot;. This can be amended by e.g. incremental processing. Also, the classifier does not know that each other-NP is anaphoric and therefore has an antecedent. (This contrasts with e.g. definite NPs.) Thus, it can classify all antecedents as &amp;quot;antecedentBPno&amp;quot;. This can be remedied by using a back-off procedure, or a competition learning approach (Connolly et al., 1997). Finally, the full resolution procedure will have to take into account other factors, e.g., syntactic constraints on antecedent realization.</Paragraph>
    <Paragraph position="1"> Our approach is the first ML approach to any kind of anaphora that integrates the Web. Using the Web as a knowledge source has considerable advantages.</Paragraph>
    <Paragraph position="2"> First, the size of the Web almost eliminates the problem of data sparseness for our task. For this reason, using the Web has proved successful in several other fields of NLP, e.g., machine translation (Grefenstette, 1999) and bigram frequency estimation (Keller et al., 2002). In particular, (Keller et al., 2002) have shown that using the Web handles data sparseness better than smoothing. Second, we do not process the returned Web pages in any way (tagging, parsing, e.g.), unlike e.g. (Hearst, 1992; Poesio et al., 2002). Third, the linguistically motivated patterns we use reduce long-distance dependencies between anaphor and antecedent to local dependencies. By looking up these patterns on the Web we obtain semantic information that is not and perhaps should not be encoded in an ontology (redescriptions, vague relations, etc.). Finally, these local dependencies also reduce the need for prior word sense disambiguation, as the anaphor and the antecedent constrain each other's sense within the context of the pattern.</Paragraph>
  </Section>
class="xml-element"></Paper>