<?xml version="1.0" standalone="yes"?>
<Paper uid="W01-0716">
  <Title>Learning to identify animate references</Title>
  <Section position="6" start_page="3" end_page="6" type="evalu">
    <SectionTitle>
4 Evaluation and discussion
</SectionTitle>
    <Paragraph position="0"> In this section we examine the performance of the system, particularly with respect to the classification of nouns; investigate sources of errors; and highlight directions for future research and improvements to the system.</Paragraph>
    <Section position="1" start_page="3" end_page="4" type="sub_section">
      <SectionTitle>
4.1 The performance of the system
</SectionTitle>
      <Paragraph position="0"> The system was evaluated with respect to two corpora. The first consists of the files selected from the SEMCOR corpus, stripped of their sense annotation. The second is a selection of texts from Amnesty International (AI) used in our previous research. These texts were selected because they include a relatively large number of references to animate entities. By including the texts from the second corpus we could compare the results of our previous system with those obtained here. In addition, we can assess the results of the algorithm on data which was not used to determine the animacy of the senses. The characteristics of the two corpora are presented in Table 2. In this research, three measures were used to assess the performance of the algorithm: accuracy, precision and recall. Accuracy is the ratio between the number of items correctly classified and the total number of items to be classified. This measure assesses the performance of the classification algorithm, but can be slightly misleading because of the greater number of inanimate entities in texts. In order to alleviate this problem, we computed the precision and recall for each type of classification. The precision with which the method classifies animate entities is defined as the ratio between the number of entities it correctly classifies as animate and the total number of entities it classifies as animate (including the ones wrongly assigned to this class). The method's recall over this task is defined as the ratio between the number of entities correctly classified as animate by the method and the total number of animate entities to be classified. The precision and recall for inanimate entities are defined in a similar manner.</Paragraph>
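These three measures can be made concrete with a short sketch; the function name and the label strings ("anim"/"inanim") below are illustrative, not taken from the paper:

```python
def evaluate(gold, predicted, label):
    """Accuracy plus per-class precision and recall, as defined in the
    text.  `gold` and `predicted` are parallel lists of class labels;
    `label` selects the class being scored (e.g. "anim")."""
    assert len(gold) == len(predicted)
    correct = sum(g == p for g, p in zip(gold, predicted))
    accuracy = correct / len(gold)
    tp = sum(g == p == label for g, p in zip(gold, predicted))  # true positives
    predicted_as = sum(p == label for p in predicted)           # everything labelled `label`
    actual = sum(g == label for g in gold)                      # everything truly `label`
    precision = tp / predicted_as if predicted_as else 0.0
    recall = tp / actual if actual else 0.0
    return accuracy, precision, recall
```

With gold labels ["anim", "anim", "inanim", "inanim"] and predictions ["anim", "inanim", "inanim", "inanim"], accuracy is 0.75 while precision and recall for "anim" are 1.0 and 0.5, showing how per-class figures expose errors that overall accuracy hides.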
      <Paragraph position="1"> We consider that by using recall and precision for each type of entity we can better assess the performance of the algorithms, mainly because the large number of inanimate entities is considered separately from the smaller number of animate entities. In addition, by separating the evaluation of the classification of animate entities from that of inanimate entities, we can assess the difficulty of each classification.</Paragraph>
      <Paragraph position="2"> Table 3 presents the results of the method on the two data sets. For the experiment with the SEMCOR corpus, we evaluated it using five-fold cross-validation. We randomly split the whole corpus into five disjoint parts, using four parts for training and one for evaluation. We repeated the training-evaluation cycle five times, making sure that the whole corpus was used. Note that for each iteration of the cross-validation, the learning process begins from scratch. The results reported were obtained by averaging the error rates from each of the 5 runs. In the second experiment, all 52 files from the SEMCOR corpus were used for training and the texts from Amnesty International for testing.</Paragraph>
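The cross-validation protocol described above can be sketched as follows; `train_and_eval` is a hypothetical callback that trains from scratch on one split and returns the error rate on the other:

```python
import random

def five_fold_error(items, train_and_eval, seed=0):
    """Randomly split `items` into five disjoint parts, train on four and
    evaluate on the fifth, rotating so that every part is held out once,
    then average the five error rates."""
    shuffled = items[:]
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::5] for i in range(5)]      # five disjoint parts covering the corpus
    errors = []
    for held_out in range(5):
        test = folds[held_out]
        train = [x for i, f in enumerate(folds) if i != held_out for x in f]
        errors.append(train_and_eval(train, test))  # learning restarts from scratch each time
    return sum(errors) / 5.0
```

Because every item appears in exactly one held-out fold, the averaged error reflects a prediction for each item in the corpus exactly once.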
      <Paragraph position="3"> In addition to the results of the method presented in this paper, Table 3 presents the results of a baseline method and of the method previously proposed in (Evans and Orasan, 2000). In the baseline method, the probability that an entity is classified as animate is proportional to the number of animate third person singular pronouns in the text.</Paragraph>
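One minimal reading of that baseline is sketched below; the pronoun lists and the use of a uniform random draw are our assumptions, since the paper does not spell them out:

```python
import random

ANIMATE_PRONOUNS = {"he", "she", "him", "her", "his", "hers", "himself", "herself"}
INANIMATE_PRONOUNS = {"it", "its", "itself"}

def baseline_classify(tokens, rng=random.random):
    """Label an entity animate with probability equal to the share of
    animate third-person singular pronouns among all third-person
    singular pronouns in the text (a sketch of the baseline)."""
    animate = sum(t.lower() in ANIMATE_PRONOUNS for t in tokens)
    inanimate = sum(t.lower() in INANIMATE_PRONOUNS for t in tokens)
    total = animate + inanimate
    p_animate = animate / total if total else 0.5  # back off when no pronouns are seen
    return "animate" if rng() < p_animate else "inanimate"
```

In a text whose pronouns are all inanimate, such a baseline almost never labels anything animate, which is consistent with its very low accuracy on animate entities.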
      <Paragraph position="4"> As can be seen in Table 3 the accuracy of the baseline is very low. The results of our previous method are considerably higher, but still poor in the case of animate entities with many of these being classified as inanimate.</Paragraph>
      <Paragraph position="5"> This can be explained by the fact that most of the unique beginners were classified as inanimate, and therefore there is a tendency to classify entities as inanimate. (Due to time constraints and the large amount of effort required to transform the input data into a format usable by the previous method, it was not possible to assess its performance with respect to the SEMCOR corpus.) The best results were obtained by the new method on both corpora, the main improvement being noticed in the classification of animate entities.</Paragraph>
      <Paragraph position="6"> Throughout this section we have referred to the classification of ambiguous nouns without trying to assess how successful the classification of the synsets in WordNet was. Such an assessment would be interesting, but it would require manual classification of the nodes in WordNet and would therefore be rather time-consuming.</Paragraph>
      <Paragraph position="7"> Even though this evaluation was not carried out, the high accuracy of the system suggests that the current classification is useful.</Paragraph>
    </Section>
    <Section position="3" start_page="4" end_page="6" type="sub_section">
      <SectionTitle>
4.2 Comments and error analysis
</SectionTitle>
      <Paragraph position="0"> During the training phase of TiMBL, the program computes the importance of each feature for the classification. The most important feature according to the gain ratio is the number of animate senses of a noun, followed by the number of inanimate senses of the noun. This was expected, given that our method is based on the idea that in most cases the number of animate and inanimate senses determines the animacy of a noun. However, this would mean that the same noun is classified in the same way regardless of the text.</Paragraph>
      <Paragraph position="1"> Therefore, three text-dependent features were introduced: the number of animate and of inanimate senses of the predicate of the sentence when the noun is its subject, and the ratio between the number of animate and of inanimate third-person singular pronouns in the text. In terms of importance, gain ratio ranks them fourth, fifth and sixth, respectively, after the lemma of the noun. The lemma of the noun was included because it was noticed that this improves the accuracy of the method.</Paragraph>
      <Paragraph position="2"> During the early stages of the evaluation, the classification of personal names proved to be a constant source of errors. Further investigation showed that the system performed poorly on all types of named entities. For the named entities referring to companies, products, etc. this can be explained by the fact that in many cases they are not found in WordNet. However, in most cases the system correctly classified them as inanimate, having learned that most unknown words belong to this class. Entities denoted by personal names were constantly misclassified either because the names were not in WordNet or else they appeared with a substantial number of inanimate senses (e.g. the names Bob and Maria do not have any senses in WordNet which could relate them to animate entities). In light of these errors we decided not to present our system with named entities. With no access to more accurate techniques, we considered non-sentence-initial capitalised words as named entities and removed them from the evaluation data. Even when this crude filtering was applied, we still presented a significant number of proper names to our system. This partially explains its lower accuracy with respect to the classification of animate entities.</Paragraph>
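The crude filter described above, treating non-sentence-initial capitalised words as named entities, can be approximated in a few lines (the sentence splitter here is a simplification of whatever tokenisation the authors actually used):

```python
import re

def filter_capitalised(text):
    """Drop capitalised words that are not sentence-initial, returning
    the kept tokens and the removed (presumed named-entity) words."""
    kept, removed = [], []
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        for i, token in enumerate(sentence.split()):
            word = token.strip(".,;:!?")
            if i > 0 and word[:1].isupper():
                removed.append(word)   # treated as a named entity
            else:
                kept.append(word)
    return kept, removed
```

Note that sentence-initial names such as "Bob" in "Bob saw Maria." survive the filter, which is consistent with the observation that a significant number of proper names were still presented to the system.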
      <Paragraph position="3"> Because proper names were filtered out in this way, we could not compare the new system with the one referred to as the extended algorithm in (Evans and Orasan, 2000). In the future, we plan to address the problem of named entities by using gazetteers or, alternatively, by developing more sophisticated named entity recognition methods.</Paragraph>
      <Paragraph position="4"> Another source of errors is the unusual usage of senses. For example, someone can refer to their pet as he or she, and therefore, according to our definition, the pet should be considered animate.</Paragraph>
      <Paragraph position="5"> However, given the way the algorithm is designed, there is no way to take these special uses into consideration. (It would be possible to reclassify the nodes in WordNet using an annotated corpus in which pets are animate, but this would make the system consider animate all the animals which can be pets.)</Paragraph>
      <Paragraph position="6"> Another problem with the method is the fact that all the senses have the same weight. This means that a word like pupil, which has two animate senses and one inanimate sense, is highly unlikely to be classified as inanimate, even when it is used to refer to a specific part of the eye. (In fact, the only way this word would be classified as inanimate is if it appears in subject position and most of the senses of its main verb are inanimate; this follows from the way the senses are weighted by the machine learning algorithm.)</Paragraph>
      <Paragraph position="7">  The ideal solution to this problem would be to disambiguate the words, but this would require an accurate disambiguation method. An alternative solution is to weight the senses with respect to the text. In this way, if a sense is more likely to be used in a text, its animacy/inanimacy will have greater influence on the classification process. At present, we are trying to integrate the word sense disambiguation method proposed in (Resnik, 1995) into our system. We hope that this will particularly improve the classification of animate entities.</Paragraph>
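The difference between equal and text-weighted senses can be illustrated with a toy classifier (the sense tags and weight values are hypothetical; in the paper the weights would come from a disambiguation method such as Resnik's):

```python
def classify_by_senses(sense_tags, weights=None):
    """Classify a noun from per-sense animacy tags ("anim"/"inanim").
    With no weights every sense counts equally; text-derived weights let
    a contextually dominant sense override the raw sense counts."""
    if weights is None:
        weights = [1.0] * len(sense_tags)
    animate = sum(w for t, w in zip(sense_tags, weights) if t == "anim")
    inanimate = sum(w for t, w in zip(sense_tags, weights) if t == "inanim")
    return "animate" if animate > inanimate else "inanimate"
```

For pupil (two animate senses, one inanimate), equal weights always give "animate"; weighting the eye sense heavily, e.g. [0.1, 0.1, 0.9], flips the decision to "inanimate".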
    </Section>
  </Section>
  <Section position="7" start_page="6" end_page="6" type="evalu">
    <SectionTitle>
5 Related work
</SectionTitle>
    <Paragraph position="0"> Most of the work on animacy/gender recognition has been done in the field of anaphora resolution.</Paragraph>
    <Paragraph position="1"> The automatic recognition of NP gender on the basis of statistical information has been attempted before (Hale and Charniak, 1998).</Paragraph>
    <Paragraph position="2"> That method operates by counting the frequency with which an NP is identified as the antecedent of a gender-marked pronoun by a simplistic pronoun resolution system. It is reported that, by using the syntactic Hobbs algorithm (Hobbs, 1976) for pronoun resolution, the method was able to assign the correct gender to proper nouns in a text with 68.15% precision, though the method was not evaluated with respect to the recognition of gender in common NPs. The method has two main drawbacks. Firstly, it is likely to be ineffective over small texts. Secondly, it seems that the approach assumes that anaphora resolution is already effective, even though, in general, anaphora resolution systems rely on gender filtering.</Paragraph>
    <Paragraph position="5"> In (Denber, 1998), WordNet was used to determine the animacy of nouns and associate them with gender-marked pronouns. The details presented are sparse and no evaluation is given. Cardie and Wagstaff (1999) combined the use of WordNet with proper name gazetteers in order to obtain information on the compatibility of coreferential NPs in their clustering algorithm. Again, no evaluation was presented with respect to the accuracy of this animacy classification task.</Paragraph>
  </Section>
</Paper>