<?xml version="1.0" standalone="yes"?>
<Paper uid="W95-0105">
  <Title>Disambiguating Noun Groupings with Respect to WordNet Senses</Title>
  <Section position="4" start_page="54" end_page="58" type="metho">
    <SectionTitle>
2 Disambiguation of Word Groups
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="54" end_page="55" type="sub_section">
      <SectionTitle>
2.1 Problem statement
</SectionTitle>
      <Paragraph position="0"> Let us state the problem as follows. We are given a set of words W = {wl,. *., wn}, with each word wi having an associated set Si = {si,1,..., si,m} of possible senses. We assume that there exists some set W' C_ U Si, representing the set of word senses that an ideal human judge would conclude belong to the group of senses corresponding to the word grouping W. The goal is then to define a membership function qo that takes si,j, wi, and W as its arguments and computes a value in \[0, 1\], representing the confidence with which one can state that sense si,j belongs in sense grouping W'.4 Note that, in principle, nothing precludes the possibility that multiple senses of a word are included in W'.</Paragraph>
      <Paragraph position="1"> Example. Consider the following word group: 5 burglars thief rob mugging stray robbing lookout chase crate thieves  Restricting our attention to noun senses in WordNet, only lookout and crate are polysemous. Treating this word group as W, one would expect ~ to assign a value of 1 to the unique senses of the monosemous words, and to assign a high value to lookout's sense as lookout, lookout man, sentinel, sentry, watch, scout: a person employed to watch for something to happen.</Paragraph>
      <Paragraph position="2"> Low (or at least lower) values of q; would be expected for the senses of lookout that correspond to an observation tower, or to the activity of watching. Crate's two WordNet senses correspond to the physical object and the quantity (i.e., crateful, as in &amp;quot;a crateful of oranges&amp;quot;); my own intuition is that the first of these would more properly be included in W' than the second, and should therefore receive a higher value of ~, though of course neither I nor any other individual really constitutes an &amp;quot;ideal human judge.&amp;quot;</Paragraph>
    </Section>
    <Section position="2" start_page="55" end_page="56" type="sub_section">
      <SectionTitle>
2.2 Computation of Semantic Similarity
</SectionTitle>
      <Paragraph position="0"> The core of the disambiguation algorithm is a computation of semantic similarity using the WordNet taxonomy, a topic recently investigated by a number of people (Leacock and Chodorow, 1994; Resnik, 1995; Sussna, 1993). In this paper, I restrict my attention to WordNet's IS-A taxonomy for nouns, and take an approach in which semantic similarity is evaluated on the basis of the information content shared by the items being compared.</Paragraph>
      <Paragraph position="1"> The intuition behind the approach is simple: the more similar two words are, the more informative will be the most specific concept that subsumes them both. (That is, their least upper bound in the taxonomy; here a concept corresponds to a WordNet synset.) The traditional method of evaluating similarity in a semantic network by measuring the path length between two nodes (Lee et al., 1993; Rada et al., 1989) also captures this, albeit indirectly, when the semantic network is just an IS-A hierarchy: if the minimal path of IS-A links between two nodes is long, that means it is necessary to go high in the taxonomy, to more abstract concepts, in order to find their least upper bound. However, there are problems with the simple path-length definition of semantic similarity, and experiments using WordNet show that other measures of semantic similarity, such as the one employed here, provide a better match to human similarity judgments than simple path length does (Resnik, 1995).</Paragraph>
      <Paragraph position="2"> Given two words wl and w2, their semantic similarity is calculated as sim(wl,WE) = max \[- logPr(c)\], (1) c e subsumers(wl,w2) where subsumers(wl, WE) is the set of WordNet synsets that subsume (i.e., are ancestors of) both w~ and w2, in any sense of either word. The concept e that maximizes the expression in (1) will be referred to as the most informative subsumer of Wl and w2. Although there are many ways to associate probabilities with taxonomic classes, it is reasonable to require that concept probability be non-decreasing as one moves higher in the taxonomy; i.e., that el IS-A c2 implies Pr(c2) _&gt; Pr(el). This guarantees that &amp;quot;more abstract&amp;quot; does indeed mean &amp;quot;less informative,&amp;quot; defining informativeness in the traditional way in terms of log likelihood. Probability estimates are derived from a corpus by computing</Paragraph>
      <Paragraph position="4"> where words(c) is the set of nouns having a sense subsumed by concept c. Probabilities are then computed simply as relative frequency:</Paragraph>
      <Paragraph position="6"> where N is the total number of noun instances observed. Singular and plural forms are counted as the same noun, and nouns not covered by WordNet are ignored. Although the WordNet noun taxonomy has multiple root nodes, a single, &amp;quot;virtual&amp;quot; root node is assumed to exist, with the original root nodes as its children.</Paragraph>
      <Paragraph position="7"> Note that by equations (1) through (3), if two senses have the virtual root node as their only upper bound then their similarity value is 0.</Paragraph>
      <Paragraph position="8"> Example. The following table shows the semantic similarity computed for several word pairs, in each case shown with the most informative subsumer. 6 Probabifities were estimated using the Penn Treebank version of the Brown corpus. The pairs come from an example given by Church and Hanks (1989), illustrating the words that human subjects most frequently judged as being associated with the word doctor. (The word sick also appeared on the list, but is excluded here because it is not a noun.)  Doctors are minimally similar to medicine and hospitals, since these things are all instances of &amp;quot;something having concrete existence, riving or nonliving&amp;quot; (WordNet class (ent +-ty)), but they are much more similar to lawyers, since both are kinds of professional people, and even more similar to nurses, since both are professional people working specifically within the health professions. Notice that similarity is a more specialized notion than association or relatedness: doctors and sickness may be highly associated, but one would not judge them to be particularly similar.</Paragraph>
    </Section>
    <Section position="3" start_page="56" end_page="58" type="sub_section">
      <SectionTitle>
2.3 Disambiguation Algorithm
</SectionTitle>
      <Paragraph position="0"> The disambiguation algorithm for noun groups is inspired by the observation that when two polysemous words are similar, their most informative subsumer provides information about which sense of each word is the relevant one. In the above table, for example, both doctor and nurse are polysemous: WordNet records doctor not only as a kind of health professional, but also as someone who holds a Ph.D., and nurse can mean not only a health professional but also a nanny. When the two words are considered together, however, the shared element of meaning for the two relevant senses emerges in the form of the most informative subsumer. It may be that other pairings of possible senses also share elements of meaning (for example, doctor~Ph.D, and nurse~nanny are both descendants of (person, individual}). However, in cases like those illustrated above, the more specific or informative the shared ancestor is, the more strongly it suggests which senses come to mind when the words are considered together. The working hypothesis in this paper is that this holds U'ue in general.</Paragraph>
      <Paragraph position="1"> Turning that observation into an algorithm requires two things: a way to assign credit to word senses based on similarity with co-occurring words, and a tractable way to generalize to the case where more than two polysemous words are involved. The algorithm given in Figure 1 does both quite slraighfforwardly.</Paragraph>
      <Paragraph position="2">  Algorithm. Given W = {w\[1\] ..... w\[n\]}, a set of nouns: for i and j = 1 to n, with i &lt; j { vii, j\] = sirn(w\[i\], w\[j\]) e\[i, j\] = the most informative subsumer for w\[i\] and w\[j\] for k = 1 to num_senses(w\[i\]) if c\[i, j\] is an ancestor of sense\[i, k\] increment support\[i, k\] by v\[i, j\] for k' = 1 to num_senses(w\[j\]) if e\[i, j\] is an ancestor of sense\[j, k'\] increment support\[j, k'\] by vii, j\] increment normalization\[i\] by v\[i, j\] increment normalization\[j\] by v \[i, j\] fori= 1 ton for k = 1 to num_senses(w\[i\]) { if (normalization\[il &gt; 0.0)  phi\[i, k\] = support\[i, k\] / normalization\[i\] else phi\[i, k\] = 1 / num_senses(w\[i\]) }  This algorithm considers the words in W pairwise, avoiding the tractability problems in considering all possible combinations of senses for the group (0 (m ~) if each word had m senses). For each pair considered, the most informative subsumer is identified, and this pair is only considered as supporting evidence for those senses that are descendants of that concept. Notice that by equation (1), support \[i, k\] is a sum of log probabilities, and therefore preferring senses with high support is equivalent to optimizing a product of probabilities. Thus considering words pairwise in the algorithm reflects a probabilistic independence assumption.</Paragraph>
      <Paragraph position="3"> Example. The most informative subsumer for doctor and nurse is &lt;health professional), and therefore that pairing contributes support to the sense of doctor as an M.D., but not a Ph.D. Similarly, it contributes support to the sense of nurse as a health professional, but not a nanny. The amount of support contributed by a pairwise comparison is proportional to how informative the most informative subsumer is. Therefore the evidence for the senses of a word will be influenced more by more similar words and less by less similar words. By the time this process is completed over all pairs, each sense of each word in the group has had the potential of receiving supporting evidence from a pairing with every other word in the group. The value assigned to that sense is then the proportion of support it did receive, out of the support possible. (The latter is kept track of by array normalization in the pseudocode.) Discussion. The intuition behind this algorithm is essentially the same intuition exploited by Lesk (1986), Sussna (1993), and others: the most plausible assignment of senses to multiple co-occurring words is the  one that maximizes relatedness of meaning among the senses chosen. Here I make an explicit comparison with Sussna's approach, since it is the most similar of previous work.</Paragraph>
      <Paragraph position="4"> Sussna gives as an example of the problem he is solving the following paragraph from the corpus of 1963 Time magazine articles used in information retrieval research (uppercase in the Time corpus, lowercase here for readability; punctuation is as it appears in the original corpus): the allies after nassau in december 1960, the u.s. first proposed to help nato develop its own nuclear strike force, but europe made no attempt to devise a plan. last week, as they studied the nassau accord between president kennedy and prime minister macmillan, europeans saw emerging the first outlines of the nuclear nato that the u.s. wants and will support, it all sprang from the anglo-u.s, crisis over cancellation of the bug-ridden skybolt missile, and the u.s. offer to supply britain and france with the proved polaris (time, dec. 28) From this, Sussna extracts the following noun grouping to disambiguate: allies strike force attempt plan week accord president prime minister outlines support crisis cancellation bug missile france polaris time These are the non-stopword nouns in the paragraph that appear in WordNet (he used version 1.2). The description of Sussna's algorithm for disambiguating noun groupings like this one is similar to the one proposed here, in a number of ways: relatedness is characterized in terms of a semantic network (specifically WordNet); the focus is on nouns only; and evaluations of semantic similarity (or, in Sussna's case, semantic distance) are the basis for sense selection. However, there are some important differences, as well. First, unlike Sussna's proposal, this algorithm aims to disambiguate groupings of nouns already established (e.g. by clustering, or by manual effort) to be related, as opposed to groupings of nouns that happen to appear near each other in running text (which may or may not reflect relatedness based on meaning). This provides some justification for restricting attention to similarity (reflected by the scaffolding of IS-A links in the taxonomy), as opposed to the more general notion of association. Second, this difference is reflected algonthmically by the fact that Sussna uses not only IS-A links but also other WordNet links such as PART-OF. Third, unlike Sussna's algorithm, the semantic similarity/distance computation here is not based on path length, but on information content, a choice that I have argued for elsewhere (Resnik, 1993; Resnik, 1995). Fourth, the combinatorics are handled differently: Sussna explores analyzing all sense combinations (and living with the exponential complexity), as well as the alternative of sequentially &amp;quot;freezing&amp;quot; a single sense for each of Wl,..., W~_l and using those choices, assumed to be correct, as the basis for disambiguating wi. The algorithm presented here falls between those two alternatives. A final, important difference between this algorithm and previous algorithms for sense disambiguation is that it offers the possibility of assigning higher-level WordNet categories rather than lowest-level sense labels. It is a simple modification to the algorithm to assign values of ~ not only to synsets directly containing words in W, but to any anccestors of those synsets -- one need only let the list of synsets associated with each word wi (i.e,, Si in the problem statement of Section 2.1) also include any synset that is an ancestor of any synset containing word wi. 
Assuming that num_senses(w[i]) and sense[i, k] are reinterpreted accordingly, the algorithm will compute φ not only for the synsets directly including words in W, but also for any higher-level abstractions of them.</Paragraph>
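      <Paragraph> Under the same toy assumptions as the earlier sketches, the modification just described amounts to expanding each word's candidate list with the ancestors of its synsets before running the algorithm; for example:

def expand_senses(senses, ancestors):
    """Extend each S_i with all ancestors of its synsets, so that phi is
    also computed for higher-level abstractions."""
    return {w: sorted({a for s in ss for a in ancestors(s)})
            for w, ss in senses.items()}

# phi_all = disambiguate(W, expand_senses(senses, ancestors), sim,
#                        lambda c, s: c in ancestors(s))
</Paragraph>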
      <Paragraph position="5"> Example. Consider the word group doctor, nurse, lawyer. If one were to include all subsuming concepts for each word, rather than just the synsets of which they are directly members, the concepts with non-zero values of ~ would be as follows:  nurse: one skilled in caring for the sick health professional: subconcept of professional professional: a person engaged in one of the learned professions lawyer, attorney: a professional person authorized to practice law professional: a person engaged in one of the learned professions Given assignments of ~ at all levels of abstraction, one obvious method of semantic annotation is to assign the highest-level concept for which ~ is at least as large as the sense-specific value of ~. For instance, in the previous example, one would assign the annotation (health professional) to both doctor and nurse (thus explicitly capturing a generalization about their presence in the word group, at the appropriate level of abstraction), and the annotation (professional) to lawyer.</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="58" end_page="65" type="metho">
    <SectionTitle>
3 Examples
</SectionTitle>
    <Paragraph position="0"> In this section I present a number of examples for evaluation by inspection. In each case, I give the source of the noun grouping, the grouping itself, and for each word a description of word senses together with their values of ~.</Paragraph>
    <Section position="1" start_page="58" end_page="63" type="sub_section">
      <SectionTitle>
3.1 Distributionally derived groupings
</SectionTitle>
      <Paragraph position="0"> Distributional cluster (Brown et al., 1992): head, body, hands, eye, voice, arm, seat, hair, mouth Word 'head' (17 alternatives) 0.0000 crown, peak, summit, head, top: subconceptofupperbound 0.0000 principaL school principal, head teacher, head: educator who has executive authority 0.0000 head, chief, top dog: subeoncept of leader 0.0000 head: a user of (usually soft) drugs 0.1983 head: &amp;quot;the head of the page&amp;quot;; &amp;quot;the head of the fist&amp;quot; 0.1983 beginning, head, origin, root, source: the point or place where something begins 0.0000 pass, head, straits: a difficult juncture; &amp;quot;a pretty pass&amp;quot; 0.0000 headway, head: subconcept of progress, progression, advance 0.0903 point, hod: a V-shaped mark at one end of an arrow pointer 0.0000 heading, head: a line of text serving to indicate what the passage below it is about 0.0000 mind, head, intellect, psyche: that which is responsible for your thoughts and feelings 0.5428 head: the upper or front part of the body that contains the faee and brains 0.0000 toilet, lavatory, can, head, facility, john, privy, bathroom 0.0000 head: the striking part of a tool; &amp;quot;hammerhead&amp;quot; 0.1685 head: a part that projects out from the rest; &amp;quot;the head of the nail&amp;quot;, &amp;quot;pinhead&amp;quot;  hand: subconeept of linear unit hired hand, hand, hired man: a hired laborer on a farm or ranch bridge player, hand: &amp;quot;we need a 4th hand for bridge&amp;quot; hand, deal: the cards held in a card game by a given player at any given time hand: a round of applause to signify approval; &amp;quot;give the little lady a great big hand&amp;quot; handwriting, cursive, hand, script: something written by hand hand: ability; &amp;quot;he wanted to try his hand at singing&amp;quot; hand, manus, hook, mauler, mitt, paw: the distal extremity of the superior limb hand: subconcept of pointer hand: physical assistance; &amp;quot;give me a hand with the chores&amp;quot;  voice: the relation of the subject of a verb to the action that the verb denotes spokesperson, spokesman, interpreter, representative, mouthpiece, voice voice, vocalization: the sound made by the vibration of vocal folds articulation, voice: expressing in coherent verbal form; &amp;quot;I gave voice to my feelings&amp;quot; part, voice: the melody carried by a particular voice or instrument in polyphonic music voice: the ability to speak; &amp;quot;he lost his voice&amp;quot; voice: the distinctive sound of a person's speech; &amp;quot;I recognized her voice&amp;quot;  Word 'arm' (6 alternatives) 0.0000 branch, subdivision, arm: an administrative division: &amp;quot;a branch of Congress&amp;quot; 0.6131 arm: eornrnonly used to refer to the whole superior limb 0.0346 weapon, arm, weapon system: used in fighting or hunting 0.2265 sleeve, arm: attached at armhole 0.1950 arm: any proj~tion that is thought to resemble an arm; &amp;quot;the arm of the record player&amp;quot; 0.0346 arm: the part of an armchair that supports the elbow and forearm of a seated person Word 'seat' (6 alternatives) 0.0000 seat: a city from which authority is exercised 0.0000 seat, place: a space reserved for sitting 0.7369 buttocks, arse, butt, backside, burn, buns, can ....</Paragraph>
      <Paragraph position="1"> 0.2631 seat: covers the buttocks 0.0402 seat: designed for sitting on 0.0402 seat: where one sits  hair, pilus: threadlike keratinous filaments growing from the skin of mammals hair, tomentum: filamentous hairlike growth on a plant hair, follicular growth: subeoncept of externalbody part hair, mane, head of hair: hair on the head hair: hairy covering of an animal or body part  draw, standoff, tie, stalemate affiliation, association, tie, tie-up: a social or business relationship tie, crosstie, sleeper: subconcept of brace, bracing necktie, tie link, linkup, tie, tie-in: something that serves to join or link drawstring, string, tie: cord used as a fastener tie, tie beam: used to prevent two rafters, e.g., from spreading apart  lookout, lookout man. sentinel, sentry, watch, scout lookout, observation post: an elevated post affording a wide view lookout, observation tower, lookout g.ation, observatory: lookout, outlook: wabconcept of look. looking at  As noted in Section 2.1, this group represents a set of words similar to burglar, according to Schtltze's method for deriving vector representation from corpus behavior. In this case, words rob and robbing were excluded because they were not nouns in WordNet. The word stray probably should be excluded also, since it most likely appears on this list as an adjective (as in &amp;quot;stray bullet&amp;quot;). Machine-generated thesaurus entry (Grefenstette, 1994): method, test, mean, procedure, technique Word 'method' (2 alternatives) 1.0000 method: a way of doing something, esp. a systematic one 0.0000 wise, method: a way of doing or being: &amp;quot;in no wise&amp;quot;; &amp;quot;in this wise&amp;quot; Word 'test' (7 alternatives) 0.6817 trial, test, tryout: trying something to find out about it; &amp;quot;ten days free trial&amp;quot; 0.6817 assay, check, test: subeoncept of appraisal assessanent 0.0000 examination, exam, test: a set of questions or exercises evaluating skill or knowledge 0.3183 test, mental test, mental testing, psychometric test 0.0000 test: a hard outer covering as of some amoebas and sea urchins  0.3183 test, trial: the act ofundergoingtesting; &amp;quot;he survived the great test of battle&amp;quot; 0.3183 test, trial run: the act of testing something Word 'mean' (1 alternatives) 1.0000 mean: an average ofn numbers computed by...</Paragraph>
      <Paragraph position="2"> Word 'proeedure' (4 alternatives) 1.0000 procedure, process: a particular course of action intended to achieve a results 1.0000 operation, procedure: a process or series of acts ,.. involved in a particular form of work 0.0000 routine, subroutine, subprogram, procedure, function 0.0000 procedure: a mode of conducting legal and parliamentary proceedings Word 'technique' (2 alternatives) 1.0000 technique: a tecfiniealmethod 0.0000 profieieney, facility, technique: skillfulness deriving from practice and familiarity I chose this grouping at random from a thesaurus created automatically by Grefenstette's syntacticodistributional methods, using the MED corpus of medical abstracts as its source. The group comes from from the thesaurus entry for the word method. Note that mean probably should be means.</Paragraph>
    </Section>
    <Section position="2" start_page="63" end_page="65" type="sub_section">
      <SectionTitle>
3.2 Thesaurus Classes
</SectionTitle>
      <Paragraph position="0"> There is a tradition in sense disambiguation of taking particularly ambiguous words and evaluating a system's performance on those words. Here I look at one such case, the word line; the goal is to see what sense the algorithm chooses when considering the word in the contexts of each of the Roget's Thesaurus classes in which it appears, where a &amp;quot;class&amp;quot; includes all the nouns in one of the numbered categories.7 The following list provides brief descriptions of the 25 senses of line in WordNet:  1. wrinkle, furrow, crease, crinkle, seam, line: &amp;quot;His faeehas many wrinkles&amp;quot; 2. line: a length (straight or curved) without breadth or thickness 3. line, dividing line: &amp;quot;there is a narrow line between sanity and insanity&amp;quot; 4. agate line, line: space for one line of print used to measure advertising 5. credit line, line of credit, line: the maximum credit that a customer is allowed 6. line: in games or sports; a mark indicating positions or bounds of the playing area 7. line: a spatial location defined by a real or imaginary unidimensional extent 8. eourse, line: a connected series of events or actions or developments 9. fine: a formation of people or things one after (or beside) another 10. lineage, line, line of descent, descent, bloodline, blood line, blood, pedigree I 1. tune, melody, air, strain, melodic fine, line, melodic phrase: a succession of notes 12. line: a linear string of words expressing some idea 13. line: a mark that is long relative to its width; &amp;quot;He drew a line on the chart&amp;quot; 14. note, short letter, line: &amp;quot;drop me a line when you get there&amp;quot; 15. argumentation, logical argument, fine of thought, fine of reasoning, fine 16. telephone fine, phone line, fine: a telephone connection 71 am grateful to Mark Lauer for his kind assistance with the thesaurus.</Paragraph>
      <Paragraph position="1">  17. production fine, assembly fine, fine: a factory system 18. pipeline, line: a long pipeused to transport liquids or gases 19. line: a cornmereial organization serving as a common carrier 20. fine, railway fine, rail line: railroad track and roadbed 21. fine: something long and thin and flexible 22. cable, line, transmission fine: electrical conductor connecting telephones or television 23. line, product fine, line of products, line of merchandise, business fine, fine of business 24. fine: acting in conformity; &amp;quot;in fine with&amp;quot; or &amp;quot;he got out of line&amp;quot; or &amp;quot;toe the fine&amp;quot; 25. occupation, business, line of work, line: the principal activity in your life Since line appears in 13 of the numbered categories in Roget's thesaurus, a full description of the values of qo would be too large for the present paper. Indeed, showing all the nouns in the numbered categories would take up too much space: they average about 70 nouns apiece. Instead, I identify the numbered category, and give the three WordNet senses of line for which ~o was greatest.</Paragraph>
      <Paragraph position="2">  note, short letter, line: &amp;quot;drop me a line when you get there&amp;quot; agate line, line: space for one line of print used to measure advertising tune, melody, air, strain, melodic line, line, melodic phrase  note, short letter, line: &amp;quot;drop me a line when you get there&amp;quot; tune, melody, air, strain, melodic line, line, melodic phrase line: a linear string of words expressing some idea \[#625.\] Business.</Paragraph>
      <Paragraph position="3"> 0.4684 occupation, business, line of work, line: the principal activity in your life 0.1043 line: a commercial organization serving as a common carrier 0.0790 tune, melody, air, strain, melodic line, line, melodic phrase Qualitatively, the algorithm does a good job in most of the categories. The reader might find it an interesting exercise to try to decide which of the 25 senses he or she would choose, especially in the cases where the algorithm did less well (e.g. categories #200, #203, #466).</Paragraph>
    </Section>
  </Section>
  <Section position="6" start_page="65" end_page="66" type="metho">
    <SectionTitle>
4 Formal Evaluation
</SectionTitle>
    <Paragraph position="0"> The previous section provided illustrative examples, demonstrating the performance of the algorithm on some interesting cases. In this section, I present experimental results using a more rigorous evaluation methodology.</Paragraph>
    <Paragraph position="1"> Input for this evaluation came from the numbered categories of Roget's. Test instances consisted of a noun group (i.e., all the nouns in a numbered category) together with a single word in that group to be disambiguated. To use an example from the previous section, category #590 (&amp;quot;Writing&amp;quot;) contains the following: writing, chirography, penman ship, quill driving, typewriting, writing, manuscript, MS, these presents, stroke of the pen, dash of the pen, coupe de plume, line, headline, pen and ink, letter, uncial writing, cuneiform character, arrowhead, Ogham, Runes, hieroglyphic, contraction, Devanagari, Nagari, script, shorthand, stenography, secret writing, writing in cipher, cryptography, stenography, copy, transcript, rescript, rough copy, fair copy, handwriting, signature, sign manual, autograph, monograph, holograph, hand, fist, calligraphy, good hand, running hand, flowing hand, cursive hand, legible hand, bold hand, bad hand, cramped hand, crabbed hand, illegible hand, scribble, ill-formed letters, pothooks and hangers, stationery, pen, quill, goose quill, pencil, style, paper, foolscap, parchment, veUum, papyrus, tablet, slate, marble, pillar, table, blackboard, ink bottle, ink horn, ink pot, ink stand, ink well, typewriter, transcription, inscription, superscription, graphology, composition, authorship, writer, scribe, amanuensis, scrivener, secretary, clerk, penman, copyist, transcriber, quill driver, stenographer, typewriter, typist, writer for the press Any word or phrase in that group that appears in the noun taxonomy for WordNet would be a candidate as a test instance -- for example, line, or secret writing.</Paragraph>
    <Paragraph position="2"> The test set, chosen at random, contained 125 test cases. (Note that because of the random choice, there were some cases where more than one test instance came from the same numbered category.) Two human judges were independently given the test cases to disambiguate. For each case, they were given the full set of nouns in the numbered category (as shown above) together with descriptions of the WordNet senses for  the word to be disambiguated (as, for example, the list of 25 senses for line given in the previous section, though thankfully few words have that many senses!). It was a forced-choice task; that is, the judge was required to choose exactly one sense. In addition, for each judgment, the judge was required to provide a confidence value for this decision, ranging from 0 (not at all confident) to 4 (highly confident).</Paragraph>
    <Paragraph position="3"> Results are presented here individually by judge. For purposes of evaluation, test instances for which the judge had low confidence (i.e. confidence ratings of 0 or 1) were excluded.</Paragraph>
    <Paragraph position="4"> For Judge 1, there were 99 test instances with sufficiently high confidence to be considered. As a baseline, ten runs were done selecting senses by random choice, with the average percent correct being 34.8%, standard deviation 3.58. As an upper bound, Judge 2 was correct on 65.7% of those test instances.</Paragraph>
    <Paragraph position="5"> The disambiguation algorithm shows considerable progress toward this upper bound, with 58.6% correct.</Paragraph>
    <Paragraph position="6"> For Judge 2, there were 86 test instances with sufficiently high confidence to be considered. As a baseline, ten runs were done selecting senses by random choice, with the average percent correct being 33.3%, standard deviation 3.83. As an upper bound, Judge 1 was correct on 68.6% of those test instances.</Paragraph>
    <Paragraph position="7"> Again, the disambiguation algorithm performs well, with 60.5% correct.</Paragraph>
  </Section>
</Paper>