<?xml version="1.0" standalone="yes"?>
<Paper uid="W98-0711">
  <Title>Automatic Adaptation of WordNet to Sublanguages and to Computational Tasks</Title>
  <Section position="3" start_page="0" end_page="83" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Lexical learning methods based on the use of semantic categories are faced with the problem of overambiguity and the entangled structures of thesauri and dictionaries. WordNet and Roget's Thesaurus, despite their success among researchers in lexical statistics, were not initially conceived as tools for automatic language processing. The purpose was rather to provide linguists with a very refined, general-purpose, linguistically motivated source of taxonomic knowledge. As a consequence, in most on-line thesauri words are extremely ambiguous, with very subtle distinctions among senses. High ambiguity, entangled nodes, and asymmetry have already been emphasized in (Hearst and Schütze, 1993) as being an obstacle to the effective use of on-line thesauri in corpus linguistics. In most cases, the noise introduced by overambiguity almost overrides the positive effect of semantic clustering. For example, in (Brill and Resnik, 1994) clustering PP heads according to WordNet synsets produced only a 1% improvement in a PP disambiguation task with respect to the non-clustered method. A subsequent paper (Resnik, 1997) reports a 40% precision in a sense disambiguation task, always based on generalization through WordNet synsets. Context-based sense disambiguation becomes a prohibitive task on a wide-scale basis, because when words in the context of an ambiguous word are replaced by their synsets, there is a multiplication of possible contexts, rather than a generalization. In (Agirre and Rigau,</Paragraph>
    <Paragraph position="1"> 1996) a method called Conceptual Distance is proposed to reduce this problem, but the reported performance in disambiguation still does not reach 50%. On the other hand, (Dolan,</Paragraph>
    <Paragraph position="2"> 1994) and (Krovetz and Croft, 1992) claim that fine-grained semantic distinctions are unlikely to be of practical value for many applications.</Paragraph>
    <Paragraph position="3"> Our experience supports this claim: often, what matters is to be able to distinguish among contrastive (Pustejovsky, 1995) ambiguities of the bank_river vs. bank_organisation flavor. The problem, however, is that the notion of &amp;quot;contrastive&amp;quot; is domain-dependent. Depending upon the sublanguage (e.g. medicine, finance, computers,</Paragraph>
    <Paragraph position="4"> etc.) and upon the specific NLP application (e.g. Information Extraction, Dialogue, etc.) a given semantic label may be too general or too specific for the task at hand. For example, the word line has 27 senses in WordNet, many of which draw subtle distinctions, e.g. line of work (sense 26) and line of products (sense 19). In an</Paragraph>
    <Paragraph position="6"> application aimed at extracting information on new products in an economic domain, we would be interested in identifying occurrences of such senses, but perhaps all the other senses could be clustered into one or two categories, for example Artifact, grouping senses such as telephone line, railway, and cable, and Abstraction, grouping senses such as series, conformity, and indication. Vice versa, if the sublanguage is technical handbooks in computer science, we would like to distinguish the cable and the string of words senses (7 and 5, respectively), while any other distinction may not have any practical interest.</Paragraph>
    <Paragraph position="7"> The research described in this paper is aimed at providing some principled, and algorithmic, methods to tune a general purpose taxonomy to specific sublanguages and domains.</Paragraph>
    <Paragraph position="8"> In this paper, we propose a method by which we select a set of core semantic nodes in the WordNet taxonomy that &amp;quot;optimally&amp;quot; describe the semantics of a sublanguage, according to a scoring function defined as a linear combination of general and corpus-dependent performance factors. The selected categories are used to prune WordNet branches that appear, according to our scoring function, less pertinent to the given sublanguage, thus reducing the initial ambiguity. Then, we learn from the application corpus a statistical model of the core categories and use this model to further tune the initial taxonomy. Tuning implies two actions: The first is to attempt a reclassification of relevant words in the corpus that are not covered by the selected categories, i.e., words belonging exclusively to pruned branches. Often, these words have domain-dependent senses that are not captured in the initial WordNet classification (e.g. the software sense of release in a software-handbooks sublanguage). The decision to assign an unclassified word to one of the selected categories is based on a strong detected similarity between the contexts in which the word occurs and the statistical model of the core categories.</Paragraph>
    <Paragraph position="9"> The second is to further reduce the ambiguity of words that still have a high ambiguity with respect to the other words in the corpus. For example, the word stock in a financial domain still preserved the gunstock sense, because instrumentality was one of the selected core categories for the domain.</Paragraph>
    <Paragraph position="10"> The expectation of this sense may be lowered, as before, by comparing the typical contexts of stock with the acquired model of instrumentality.</Paragraph>
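The similarity test described above — comparing a word's typical contexts with the statistical model of a core category — can be sketched as a cosine comparison over context vectors. This is an illustration only: the vector representation, function names, and the 0.5 threshold are assumptions, not the paper's actual implementation.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse context vectors (dicts)."""
    num = sum(u[k] * v.get(k, 0.0) for k in u)
    den = (math.sqrt(sum(x * x for x in u.values()))
           * math.sqrt(sum(x * x for x in v.values())))
    return num / den if den else 0.0

def best_category(word_context, category_models, threshold=0.5):
    """Assign an unclassified (or overambiguous) word to the core
    category whose context model its own contexts most resemble.
    Returns None when no model is similar enough; the threshold is
    a made-up placeholder."""
    best, best_sim = None, threshold
    for cat, model in category_models.items():
        sim = cosine(word_context, model)
        if sim > best_sim:
            best, best_sim = cat, sim
    return best
```

The same comparison, run against the model of a category such as instrumentality, could also be used to lower the expectation of a residual sense like the gunstock reading of stock.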
    <Paragraph position="11"> In the next sections, we first describe the algorithm for selecting core categories. Then, we describe the method for redistributing relevant words among the nodes of the pruned hierarchy.</Paragraph>
    <Paragraph position="12"> Finally, we discuss an evaluation experiment.</Paragraph>
    <Paragraph position="13"> 2 Selection of core categories from WordNet The first step of our method is to select from WordNet an inventory of core categories that appear particularly appropriate for the domain,</Paragraph>
    <Paragraph position="14"> and prune all the hierarchy branches that do not belong to such core categories. This choice is performed as follows: Creation of alternative sets of balanced categories First, an iterative method is used to create alternative sets of balanced categories, using information on words and word frequencies in the application corpus. Sets of categories have an increasing level of generality. The set-generation algorithm is an iterative application of the algorithm proposed in (Hearst and Schütze, 1993) for creating WordNet categories of a fixed average size. In short, the algorithm works as follows: Let C be a set of WordNet synsets s, W the set of different words (nouns) in the corpus, P(C) the number of words in W that are instances of C, weighted by their frequency in the corpus, and UB and LB the upper and lower bounds for P(C). At each iteration step i, a new synset s is added to the current category set Ci iff the weight of s lies within the current boundaries, that is, P(s) &lt;= UBi and P(s) &gt;= LBi. If P(s) &gt; UBi, s is replaced in Ci by its descendants, for which the same constraints are verified. If P(s) &lt; LBi, s is added to a list of &amp;quot;small&amp;quot; categories SCT(Ci). In fact, when replacing an overpopulated category by its sons,</Paragraph>
    <Paragraph position="15"> it may well be the case that some of its sons are underpopulated.</Paragraph>
    <Paragraph position="16"> 1 The procedure new_cat(S) is almost the same as in (Hearst and Schütze, 1993). For the sake of brevity, the algorithm is not explained in much detail here.</Paragraph>
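The balancing loop just described can be sketched as follows. The synset representation (objects carrying a weight and a children list) and all names are illustrative assumptions; only the UB/LB logic mirrors the text.

```python
from collections import deque

class Syn:
    """Minimal stand-in for a WordNet synset node (illustrative only)."""
    def __init__(self, name, weight, children=()):
        self.name, self.weight, self.children = name, weight, list(children)

def balanced_categories(roots, ub, lb):
    """One iteration of the balancing step: accept synsets whose
    frequency-weighted population P(s) lies in [lb, ub], expand
    overpopulated synsets into their descendants, and collect the
    underpopulated leftovers as SCT(Ci)."""
    categories, small = [], []
    queue = deque(roots)
    while queue:
        s = queue.popleft()
        if s.weight > ub:
            queue.extend(s.children)   # replace s by its sons, re-check them
        elif s.weight < lb:
            small.append(s)            # underpopulated: goes to SCT(Ci)
        else:
            categories.append(s)       # lb <= P(s) <= ub: accept
    return categories, small
```

Running successive iterations with progressively larger &lt;UB, LB&gt; pairs would yield the increasingly general alternative category sets that the method then compares.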
    <Paragraph position="18"> Scoring Alternative Sets of Categories Second, a scoring function is applied to the alternative sets to identify the core set. The core set is modeled as a linear function of four performance factors: generality, coverage of the domain, average ambiguity, and discrimination power. For a formal definition of these four measures, see (Cucchiarelli and Velardi, 1997). We provide here an intuitive description of these factors: Generality (G): In principle, we would like to represent the semantics of the domain using the highest possible level of generalization. A small number of categories allows a compact representation of the semantic knowledge base and makes word sense disambiguation simpler. On the other hand, overgeneral categories fail to capture important distinctions. Generality is a Gaussian measure that mediates between overgenerality and overambiguity.</Paragraph>
    <Paragraph position="19"> Coverage (CO): This is a measure of the coverage that a given category set Ci has over the words in the corpus. The algorithm for balanced category selection does not allow full coverage of the words in the domain: given a selected pair &lt;UB, LB&gt;, it may well be the case that several words are not assigned to any category, because when branching from an overpopulated category to its descendants, some of the descendants may be underpopulated. Each iterative step that creates a Ci also creates a set of underpopulated categories SCT(Ci). Clearly, a &amp;quot;good&amp;quot; selection of Ci is one that minimizes this problem (and therefore has a &amp;quot;high&amp;quot; coverage). Discrimination Power (DP): A certain selection of categories may not allow full discrimination of the lowest-level senses of a word (leaf synsets hereafter). For example, if psychological_feature is one of the core categories, and if we choose to tag a corpus only with core categories, it would be impossible to discriminate between the business-target and business-concern senses. Though nothing can be said about the practical importance of discriminating between two such synsets, in general a good choice of Ci is one that allows as much as possible the discrimination between low-level senses of ambiguous words.</Paragraph>
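Informally, the Coverage factor can be read as the frequency-weighted fraction of corpus nouns that fall under some category in Ci. The sketch below follows that intuitive reading, not the paper's formal definition; all names are invented.

```python
def coverage(covered_words, corpus_freq):
    """Frequency-weighted fraction of corpus nouns covered by a
    category set Ci.  covered_words: nouns reachable from some
    category in Ci; corpus_freq: noun -> corpus frequency."""
    total = sum(corpus_freq.values())
    covered = sum(f for w, f in corpus_freq.items() if w in covered_words)
    return covered / total if total else 0.0
```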
    <Paragraph position="20"> Average Ambiguity (A): Each choice of Ci in general reduces the initial ambiguity of the corpus. In part this is because there are leaf synsets that converge into a single category of the set,</Paragraph>
    <Paragraph position="21"> and in part because there are leaf synsets of a word that do not reach any of these categories.</Paragraph>
    <Paragraph position="22"> Though in general we do not know whether, by cutting out a node, we are removing a set of senses interesting (or not) for the domain, in principle a good choice of categories is one that reduces the initial ambiguity as much as possible. The cumulative scoring function for a set of categories Ci is defined as the linear combination of the performance parameters described above:</Paragraph>
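The equation itself did not survive extraction (the paragraph that held it is missing). A plausible form, consistent with the surrounding description of a linear combination in which residual ambiguity acts as a cost, is sketched below; the weight names and the sign convention for A are assumptions, not the paper's formula.

```python
def score(G, CO, DP, A, w_g=1.0, w_co=1.0, w_dp=1.0, w_a=1.0):
    """Cumulative score of a category set Ci as a linear combination
    of the four performance factors.  The weights are placeholders:
    the paper estimates them by interpolation against SemCor.  Average
    ambiguity A is assumed to enter negatively, since lower residual
    ambiguity is better."""
    return w_g * G + w_co * CO + w_dp * DP - w_a * A
```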
    <Paragraph position="24"> Estimation of model parameters and refinements An interpolation method is adopted to estimate the parameters of the model against a reference, correctly tagged corpus (SemCor, the WordNet semantic concordance). The performance of alternative inventories of core categories is evaluated in terms of effective reduction of overambiguity. This measure is a combination of the system's precision at pruning out spurious (for the domain) senses and the global reduction of ambiguity. Notice that we are not measuring the precision of sense disambiguation in context, but simply the precision at reducing a priori the set of possible senses for a word in a given domain.</Paragraph>
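One plausible formalization of the two quantities being combined — not the paper's exact definitions — works from per-word sense sets: all WordNet senses, the senses valid for the domain (per the reference tagging), and the senses kept after pruning.

```python
def pruning_stats(words):
    """words: list of (all_senses, valid_senses, kept_senses) per word,
    each a set.  Returns (precision, ambiguity_reduction): the fraction
    of pruned senses that were indeed spurious for the domain, and the
    relative shrinkage of the average sense inventory."""
    correct = total = before = after = 0
    for all_s, valid, kept in words:
        pruned = all_s - kept
        correct += len(pruned - valid)   # spurious senses correctly removed
        total += len(pruned)
        before += len(all_s)
        after += len(kept)
    precision = correct / total if total else 1.0
    reduction = 1 - after / before if before else 0.0
    return precision, reduction
```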
    <Paragraph position="25"> The method above is weakly supervised: the parameters estimated have been used without re-estimation to capture core categories in other domains such as Natural Science and a UNIX manual. Details on the portability of this choice are in (Cucchiarelli and Velardi, forthcoming 1998). In the different experiments, the best-performing choice of core categories is the one with an upper population of 62,000 words (frequency-weighted). This corresponds to the following list of 14 categories: num_cat=14 t=61 UB=62000 LB=24800 N=2000 k=1000 h=0.40 person, individual, someone, mortal, human, soul</Paragraph>
    <Paragraph position="27"> message, content, subject_matter, substance; measure, quantity, amount, quantum This selection of core categories is measured to have the following performance: Precision: 77.6%.</Paragraph>
    <Paragraph position="28"> Reduction of Ambiguity: 37%</Paragraph>
    <Paragraph position="30"> In (Cucchiarelli and Velardi, forthcoming 1998) a method is proposed to automatically increase the coverage of the core set with an additional set of categories, selected from the set of underpopulated categories SCT(Ci) (see step 1 of the algorithm). With the extension: With some manual refinement of the extended set, the precision rises to over 80%. Obtaining a higher precision is difficult because neither SemCor nor WordNet can be considered a gold standard. In a recent workshop on semantic text tagging (TAGWS 1997), the difficulty of providing comprehensible guidelines for semantic annotators in order to avoid disagreement and inconsistencies was highlighted. On the other hand, there are many redundancies and some inconsistencies in WordNet that make the task of (manual) classification very complex. For example, one of the detected classification errors in our Wall Street Journal experiment was the selection of two possible core senses for the word market: organization and activity. Vice versa, in the economic fragment of SemCor, market is consistently classified as socio-economic-class, which happens not to be a descendant of either of these two categories.</Paragraph>
    <Paragraph position="31"> Our intuition when observing the specific examples was more in agreement with the automatic classification than with SemCor. Our feeling was that the selected core categories could, in many cases, represent a good model of classification for words that remained unclassified with respect to the &amp;quot;not pruned&amp;quot; WordNet, or appeared misclassified in our evaluation experiment. In the next section we describe a method to verify this hypothesis and, at the same time, to further tune WordNet to a domain.</Paragraph>
  </Section>
</Paper>