<?xml version="1.0" standalone="yes"?>
<Paper uid="J02-2003">
  <Title>Class-Based Probability Estimation Using a Semantic Hierarchy</Title>
  <!-- (c) 2002 Association for Computational Linguistics -->
  <Section position="2" start_page="0" end_page="188" type="abstr">
    <SectionTitle>
1. Introduction
</SectionTitle>
    <Paragraph position="0"> This article concerns the problem of how to estimate the probabilities of noun senses appearing as particular arguments of predicates. Such probabilities can be useful for a variety of natural language processing (NLP) tasks, such as structural disambiguation and statistical parsing, word sense disambiguation, anaphora resolution, and language modeling. To see how such knowledge can be used to resolve structural ambiguities, consider the following prepositional phrase attachment ambiguity:

Example 1
Fred ate strawberries with a spoon.</Paragraph>
    <Paragraph position="1"> The ambiguity arises because the prepositional phrase with a spoon can attach to either strawberries or ate. The ambiguity can be resolved by noting that the correct sense of spoon is more likely to be an argument of &quot;ate-with&quot; than &quot;strawberries-with&quot; (Li and Abe 1998; Clark and Weir 2000).</Paragraph>
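The comparison described above can be sketched as follows. This is a toy illustration with invented counts; the article's contribution is precisely how to estimate these probabilities reliably from a corpus and a semantic hierarchy, not from a hand-built table.

```python
# Hypothetical counts of (head, "with", noun) co-occurrence triples.
counts = {
    ("ate", "with", "spoon"): 30,
    ("ate", "with", "fork"): 25,
    ("strawberries", "with", "spoon"): 1,
    ("strawberries", "with", "cream"): 40,
}

def attach_prob(head, prep, noun):
    """P(noun | head, prep), estimated by relative frequency."""
    total = sum(c for (h, p, _), c in counts.items()
                if h == head and p == prep)
    return counts.get((head, prep, noun), 0) / total if total else 0.0

verb_score = attach_prob("ate", "with", "spoon")          # 30/55
noun_score = attach_prob("strawberries", "with", "spoon")  # 1/41
attachment = "verb" if verb_score > noun_score else "noun"
print(attachment)  # verb
```

Since "spoon" is far more likely as an argument of "ate-with" than of "strawberries-with", the verb attachment is chosen.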
    <Paragraph position="2"> The problem with estimating a probability model defined over a large vocabulary of predicates and noun senses is that this involves a huge number of parameters, which results in a sparse-data problem. In order to reduce the number of parameters, we propose to define a probability model over senses in a semantic hierarchy and to exploit the fact that senses can be grouped into classes consisting of semantically similar senses.</Paragraph>
    <Paragraph position="3"> Author affiliations: Division of Informatics, University of Edinburgh, 2 Buccleuch Place, Edinburgh, EH8 9LW, UK (e-mail: stephenc@cogsci.ed.ac.uk); School of Cognitive and Computing Sciences, University of Sussex, Brighton, BN1 9QH, UK (e-mail: david.weir@cogs.susx.ac.uk).</Paragraph>
    <Paragraph position="4"> The assumption underlying this approach is that the probability of a particular noun sense can be approximated by a probability based on a suitably chosen class. For example, it seems reasonable to suppose that the probability of (the food sense of) chicken appearing as an object of the verb eat can be approximated in some way by a probability based on a class such as FOOD.</Paragraph>
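One simple way to realize this approximation is to spread the probability mass observed on a class uniformly over its members. This is only an illustrative sketch with a hand-built toy class and invented counts; the paper's actual estimator over the WordNet hierarchy is developed in the sections that follow.

```python
# A toy class of senses (the paper uses classes from the WordNet 1.6
# noun hierarchy, e.g. something like FOOD).
FOOD = {"chicken", "rice", "bread", "soup"}

# Hypothetical counts of senses observed as the object of "eat".
object_counts = {"rice": 20, "bread": 15, "soup": 5}
total = sum(object_counts.values())

def class_based_prob(sense, cls):
    """P(sense | eat) approximated as P(cls | eat) / |cls|:
    the class's observed mass, spread uniformly over its members."""
    class_mass = sum(object_counts.get(s, 0) for s in cls)
    return (class_mass / total) / len(cls)

# "chicken" was never observed as an object of "eat", yet receives
# nonzero probability because its class FOOD was observed there.
print(class_based_prob("chicken", FOOD))  # 0.25
```

This is the sense in which class-based estimation combats sparse data: unseen senses inherit probability from semantically similar, observed ones.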
    <Paragraph position="5"> There are two elements involved in the problem of using a class to estimate the probability of a noun sense. First, given a suitably chosen class, how can that class be used to estimate the probability of the sense? And second, given a particular noun sense, how can a suitable class be determined? This article offers novel solutions to both problems, and there is a particular focus on the second question, which can be thought of as how to find a suitable level of generalization in the hierarchy.</Paragraph>
    <Paragraph position="6">  The semantic hierarchy used here is the noun hierarchy of WordNet (Fellbaum 1998), version 1.6. Previous work has considered how to estimate probabilities using classes from WordNet in the context of acquiring selectional preferences (Resnik 1998; Ribas 1995; Li and Abe 1998; McCarthy 2000), and this previous work has also addressed the question of how to determine a suitable level of generalization in the hierarchy. Li and Abe use the minimum description length principle to obtain a level of generalization, and Resnik uses a simple technique based on a statistical measure of selectional preference. (The work by Ribas builds on that by Resnik, and the work by McCarthy builds on that by Li and Abe.) We compare our estimation method with those of Resnik and Li and Abe, using a pseudo-disambiguation task. Our method outperforms these alternatives on the pseudo-disambiguation task, and an analysis of the results shows that the generalization methods of Resnik and Li and Abe appear to be overgeneralizing, at least for this task.</Paragraph>
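The pseudo-disambiguation task mentioned above can be sketched as follows: for each test triple (verb, noun, confounder), the model scores both nouns as the verb's argument and is credited when the originally observed noun scores higher. The scoring function and triples here are placeholders, not the paper's data.

```python
def evaluate(score, triples):
    """Pseudo-disambiguation accuracy: fraction of triples where the
    observed noun outscores the randomly chosen confounder."""
    correct = sum(1 for v, n, conf in triples if score(v, n) > score(v, conf))
    return correct / len(triples)

# Hypothetical model scores; in the paper these come from the
# class-based probability estimates over the WordNet hierarchy.
scores = {("eat", "chicken"): 0.40, ("eat", "gravel"): 0.01,
          ("drink", "water"): 0.50, ("drink", "chair"): 0.00}
triples = [("eat", "chicken", "gravel"), ("drink", "water", "chair")]

acc = evaluate(lambda v, n: scores.get((v, n), 0.0), triples)
print(acc)  # 1.0
```

The attraction of the task is that it needs no hand-annotated test set: confounders can be generated automatically from held-out data.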
    <Paragraph position="7"> Note that the problem being addressed here is the engineering problem of estimating predicate argument probabilities, with the aim of producing estimates that will be useful for NLP applications. In particular, we are not addressing the problem of acquiring selectional restrictions in the way this is usually construed (Resnik 1993; Ribas 1995; McCarthy 1997; Li and Abe 1998; Wagner 2000). The purpose of using a semantic hierarchy for generalization is to overcome the sparse data problem, rather than find a level of abstraction that best represents the selectional restrictions of some predicate. This point is considered further in Section 5.</Paragraph>
    <Paragraph position="8"> The next section describes the noun hierarchy from WordNet and gives a more precise description of the probabilities to be estimated. Section 3 shows how a class from WordNet can be used to estimate the probability of a noun sense. Section 4 shows how a chi-square test is used as part of the generalization procedure, and Section 5 describes the generalization procedure. Section 6 describes the alternative class-based estimation methods used in the pseudo-disambiguation experiments, and Section 7 presents those experiments.</Paragraph>
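To preview the role of the chi-square test in generalization: if the children of a candidate class do not differ significantly in how often they occur with the verb, little is lost by generalizing to the parent. The contingency table below and the 3.84 critical value (df = 1, alpha = 0.05) are illustrative; the actual procedure is the subject of Sections 4 and 5.

```python
def chi_square(table):
    """Pearson chi-square statistic for a 2 x K contingency table."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    grand = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_tot[i] * col_tot[j] / grand
            stat += (obs - exp) ** 2 / exp
    return stat

# Rows: counts with the verb / with all other verbs.
# Columns: two hypothetical child classes of a candidate parent class.
similar_children = [[30, 28], [70, 72]]
print(chi_square(similar_children) < 3.84)  # True: generalize to the parent
```

A table with sharply different columns (say, [[90, 10], [10, 90]]) would yield a statistic well above the critical value, signaling that the children behave differently and that generalizing to the parent would overgeneralize.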
  </Section>
</Paper>