<?xml version="1.0" standalone="yes"?>
<Paper uid="C92-2070">
  <Title>Walker, Donald (1987), &amp;quot;Knowledge Resource Tools for Accessing</Title>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
2.1 Step 1: Collect contexts which are representative of the Roget category
</SectionTitle>
    <Paragraph position="0"> A sample of the concordance collected for TOOLS/MACHINERY (Category 348):

 1.  CARVING .SB The gutter           adz has a concave blade for form
 2.  uipment such as a hydraulic      shovel capable of lifting 26 cubic
 3.  on .SB Resembling a power        shovel mounted on a floating hul
 4.  uipment, valves for nuclear      generators, oil-refinery turbines
 5.  00 BC, flint-edged wooden        sickles were used to gather wild
 6.  l-penetrating carbide-tipped     drills forced manufacturers to fi
 7.  lt heightens the colors .SB      Drills live in the forests of equa
 8.  traditional ABC method and       drill were unchanged, and dissa
 9.  nter of rotation .PP A tower     crane is an assembly of fabricat
10.  rshy areas .SB The crowned       crane, however, occasionally

For optimal training, the concordance set should only include references to the given category. But in practice it will unavoidably include spurious examples, since many of the words are polysemous (such as drill and crane in lines 7, 8, and 10 above).</Paragraph>
    <Paragraph position="1"> While the level of noise introduced through polysemy is substantial, it can usually be tolerated because the spurious senses are distributed through tile 1(}41 other categories, whereas the signal is coneenwated in just one. Only if several words had secondary senses in the state category would context typical for the other category appear significant in this context.</Paragraph>
    <Paragraph position="2"> However, if one of these spurious senses was frexluent and dominated the set of examples, file situation could be disastrous. An attempt is made to weight the concordance data to minim~e this effect and to make the sample representative of all tools attd tnachinery, not just the more common ones. If a word such as drill occurs k tinies in the coqms, all words ill the context of drill contribnte weight 1/k to frequency sunos.</Paragraph>
    <Paragraph position="3"> Despite its flaws, this weighted matrix will serve as a representative, albeit noisy, sample of the typical context of'IYOOI.S/MACItlNERY in Grolier's encyclopedia.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.2 Step 2: Identify salient words in the collective context, and weight appropriately
</SectionTitle>
      <Paragraph position="0"> collective cmttext, and weight appropriately Intuitively, a salient word 2 is one which appears siguificantly more often in the context of a category than at other txfints in the corpus, and hence is a better than average indicator for the category. We formalize this wifl\] a nmtual-in formation-like estimate: Pr(wlRCat) / Pr(w), tile probability of a word (w) appearing in the context of a Roget category divided by its overall probability in rile corpus.</Paragraph>
      <Paragraph position="1"> It is imlmrtant to exerci~ some care in estimating Pr(wlRCat). In principle, one could situply count tile number of times that w appears in the collective contexL However, this estimate, which is known as the tuaximnnt likelih(x',d estimate (MLE), can be unreliable, especially when w does not apl~-~ar vely often in the collective coutexl. We have smoothed file local estimates of Pr(wlRCat) with global estinmtes of Pr(w) to obtain a more reliable estimate. Estimates obtained from the local context are subject to measurement errors whereas estimates obtained li'om the global context are subject to being irrelevant. By interpoiathlg between the two, we attempt to find a compromise between the two sources of error, qllis procexlure is b~sed on recent work pioneewM by Willimn Gale, attd is explained in detail in another paper (Gale, Church and Yarowsky, 1992). Space does not permit a complete description here.</Paragraph>
      <Paragraph position="2"> Below are salient words tor Roget categories 348 and 414. *lllose ~lected are tile ntosl important 1o rite models, where importance is delined as the product of salience and local fi'equency. That is to say important words ate distinctive and fi~equcat.</Paragraph>
      <Paragraph position="3"> The nnmhers in parentheses are the log of the salience (logPr(wlRCat) /Pr(w)), which we will henceforth refer to as the word's weight in the statistical model of the category.</Paragraph>
      <Paragraph position="4"> 2. Fo~ illustrative simplicity, we will refer to words in context, In pnlctice, all op~\]lil~t$ ale ac~uMly p~rfonned on the Iemma~ of the words (eal/V = eat,eatg.elling,ate,elae~l), lind inflecdonml dlnincdons tire igltored. While thi* achieves more concentrated and bclter estimated ttttiUics, it throws away uneful information which natty be ext~loited in future work.</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="0" end_page="0" type="metho">
    <SectionTitle>
ANIMAL,INSECT (Category 414):
</SectionTitle>
    <Paragraph position="0"> species (2.3), family (1.7), bird (2.6), fish (2.4), breed (2.2), cm (2.2), animal (1.7), tail (2.7), egg (2.2), wild (2.6), common (1.3), coat (2.5), female (2.0), inhabit (2.2), eat (2,2), nest (2.5) ....</Paragraph>
  </Section>
  <Section position="6" start_page="0" end_page="0" type="metho">
    <SectionTitle>
TOOLS/MACHINERY (Category 348):
</SectionTitle>
    <Paragraph position="0"> tool (3.1), machine (2.7), engine (2.6), blade (3.8), cut (2.6), saw (5.1), lever (4.1), pump (3.5), device (2.2), gear (3.5), knife(3.8), wheel (2.8), shaft(3.3), wood(2.0), tooth(2.5), piston(3.6) ....</Paragraph>
    <Paragraph position="1"> Notice that these are not a list of members of the category; they are the words which are likely to co-occur with the members of the category. The complete list for TOOLS/MACH1NFJI.Y includes a broad set of relations, such as meronomy (blade, engine, gear, wheel, shaft, tooth, piston and cylinder), typical functions of machines (cut, rotate, move, turn, pull), typical objecls of those actions (wood, metal), as well as typical modifiers for machines (electric, mechanical, pneumatic). The list for a category typically contains over 3000 words, and is far richer than can be derived from a dictionary definition.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.3 Step 3: Use the resulting weights to predict
</SectionTitle>
      <Paragraph position="0"> the appropriate category for a word in novel text When any of the salient words derived in step 2 appear in the context of an ambiguous word, there is evidence that the word belongs to the indicated category. If several such words appear, the evidence is compounded. Using Bayes' rule, we sum their weights, over all words in context, and determine the category for which the sum is greatest ~.</Paragraph>
      <Paragraph position="1"> ARGMAX ~ log Pr(w\[RCat) x Pr(RCat) Rca: w i~ co,,~1 Pr(w) The context is defined to extend 50 words to the left and 50 words to the right of the polysemous word. This range was shown by Gale, Church and Yarowsky (1992) to be useful for this type of broad topic classification, in contrast to the relatively narrow (+3-6 word) window used in previous studies (e.g. Black, 1988). The 3. The reader may have noticed that the Pr(w) factor can be omitted since it will not change the results of the maximization. It is included here for expository convenience so that it is possible to ~-,npare results across words with very different probabilities, 'nae factor also become* impoc.ant when an incomplete tet of indicators iJ stored be, cause of comlmtational spac~ constraints. Currently we assume a uniform prior- probability for each Roget category (Pr(Rcal)). i,e. tense classification is based exclusively on otmte~tual information, independent of the underlying prd3abillt y of a given Re*el category appearing at any point in the colpos. maximization over RCats is constrained to consider only those categories under which the polysemous word is listed, generally on the order of a half dozen or so. 4 For example the word crane appears 74 times in Groliers; 36 occurrences refer to the animal sense and 38 refer to the heavy machinery sense. The system correctly classified all but one of the machinery senses, yielding 99% overall accuracy. The one miselassified case had a low score for all models, indicating a lack of confidence in any classification.</Paragraph>
      <Paragraph position="2"> It is useful to look at one example in some more detail.</Paragraph>
      <Paragraph position="3"> Consider the following instance of crane and its context of + l0 words: 5 lift water and to grind grain .PP Treadmills attached to cranes were used to lift heavy objects from Roman times, The table below shows the strongest indicators identified for the two categories in the sentence above. The model weights, as noted above, are equivalent to log Pr(wlRCat ) / Pr(w). Several indicators were found for the TOOLS/MACHtNE class. There is very little evidence for the ANIMAL sense of crane, with the possible exception of water. The preponderance of evidence favors the former classification, which happens to be correct. The difference between the two total scores indicate strong confidence in the answer.</Paragraph>
      <Paragraph position="4">  Encydopndia contains 54 instances of the card-playing sense of suit, all of which ale mislabeled if the search is limited to just those categories of suit that are listed in RogeCs. However, if we open up the search to consider all 1042 care*odes, then we find that all 54 instances of su//are correctly labeled ils/oC//usE,~,~cr, and mo~over. the scca~ is large in all 54 instances, indicating great confidence in the assignment. It is poJsiblc that the unrestricted search mode might be * good way to attemps to fill in omisfions in the *ha*auras. In any case. when suit is added to the ,oa~t/s~E~rr category, overall accuracy improves from 68% to 92%.</Paragraph>
      <Paragraph position="5"> 5, &amp;quot;Ibis narrower window is used for iaust rative simplicity.  1) N refers to the total number of each sense obseawed in the test corpus. Corr. indicates file percemage of those tagged correctly.</Paragraph>
      <Paragraph position="6"> 2) Because thexe is no independent ground truth to indicate which is the &amp;quot;correct&amp;quot; Roget category for a given word, the decision is a subjective judgement made by a single human judge, in this case the author.</Paragraph>
      <Paragraph position="7"> 3) As previously noted, the Roger index is incomplete, hi four cases, identified by *, one missing category has been added to the list of possibilities for a word. These ontissions in the lexicon have been identified as outlined in Footnote 4. Without these additions, overall system performance would decrease by 5%.</Paragraph>
      <Paragraph position="8"> 4) Uses which an English speaker may consider a single sense are often realized by several Roget categories. For the purposes of succinct representation, such categories have been merged, and the name of file dominant category used in the table. As of this writing, the process has not been fully automated.</Paragraph>
      <Paragraph position="9"> For many applications such as speech synthesis and assignment to an established dictionary sense number or possible French translations, this merging of Roget classes is not necessary.</Paragraph>
      <Paragraph position="10"> The primary criterion for success is that words are partitioned into pure sense clusters. Words having a different sense from the majority sense of a partition are graded as errors.</Paragraph>
      <Paragraph position="11"> 5) Examples with the ammtation 'speech synthesis' have multiple pronunciations corresponding to sense distinctions. Their disambiguafion is important in speech processing.</Paragraph>
    </Section>
  </Section>
  <Section position="7" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3. Evaluation
</SectionTitle>
    <Paragraph position="0"> The algorithm described above was applied to 12 polysemous words previously discussed in the sense disambignation literature. Table 1 (previous l~lge) shows the systenl's performance. Authors who have discussed these words are listed in parentheses, along with the reported accuracy of their systems. Direct comparisons of performance between researchers is difficult, compounded by variances in corpora and grading criteria; using the same words is an attempt to minimize these differences.</Paragraph>
    <Paragraph position="1"> Regrettably, most authors have reported their results in qualitative terms. The exceptions include Zemik (1990) who cited &amp;quot;recall and precision of over 70%&amp;quot; for one word (interest) and observed that results for other words, including /ssue, were &amp;quot;less positive.&amp;quot; Clear (1989) reported results for two words (65% and 67%), apparently at 85% recall. Leak (1986) claimed overall &amp;quot;50-70%&amp;quot; accuracies, although it is unclear under which parameters and constraints. In a 5 word test set, Black (1988) observed 75% mean accuracy using his optimal method on high entropy, 4-way sense distinctions. Hearst (1991) achieved 84% on simpler 2-way distinctions, editing out additional senses from the test set. Gale, Church and Yarowsky (1992) reported 92% accuracy, also on 2-way distinctions.</Paragraph>
    <Paragraph position="2"> Out eun'ent work compares favorably with these results, with 92% accuracy on a mean 3-way sense distinction 6.</Paragraph>
    <Paragraph position="3"> The performance is especially promising given that no hand tagging or special corpora were required in training, unlike all other systems considered.</Paragraph>
    <Paragraph position="4"> 4. Limitations of the Method The procedure described here is based on broad context models. It performs best on words with senses which can be distinguished by their broad context. These are most typically concrete nouns. Performance is weaker on the following: Topic Independent Distinctions: One of the reasons that interest is disambiguated poorly is that it can appear in almost any context. While its &amp;quot;curiosity&amp;quot; sense is often indicated by the presence of an academic subject or hobbie, the &amp;quot;advantage&amp;quot; sense (to be in one's interests) has few topic constraints. Distinguishing between two such abstractions is difficult. 7 However, the financial 6. This result is a fair ra~lure of pedorr~nee on words used in p~vi{ms studies, and may he useful for comparison acmsl systems. However, as wolrd$ pmvioully discuJscd in the literature may not he t~preu~tafive of typical English polyk-my, mean performance on * eomlTletely random u~ of words should differ, 7. Black (1988) has noted that this disfnction for interest is strongly corrected with th(c) ~urality (~&amp;quot; the word, a future we cura~ntly don't utilize. sense of interest is readily identifiable, and can be distinguished from the non-financial uses with 92% accuracy. Other distinctions between topic independent and topic constrained senses appear successful as well (e.g. taste, issue, duty and sentence).</Paragraph>
    <Paragraph position="5"> Minor Sense Distinctions within a Category: Distinctions between the medicinal and narcotic senses of drug ate not captured by the system because they both belong to the same Roget category (REMEDY). Similar problems occur with the musical senses of bass. Roget's Thesaurus offers a rich sub-hierarchy within each category, however. Future implementations will likely use this information, which is currently ignored.</Paragraph>
    <Paragraph position="6"> Verbs: Verbs have not been considered in this particular study, and it appears that they may benefit from more local models of their typical arguments. The unmodified system does seem to perform well on verbs which show clear topic distinctions such as fire. It's weapon, engine, furnace, employee, imagination and pottery senses have been disambiguated with 85% accuracy.</Paragraph>
    <Paragraph position="7"> Pre-Nominal Modifiers: The disambiguation of pre-nominal modifiers (adjectives and compound nominals) is heavily dependent on the noun modified, and much less so on distant context. While class-based Bayesian discrimination may be useful here as well, the optimal window size is much narrower.</Paragraph>
    <Paragraph position="8"> Idioms: These broad context, topic-based discriminators are also less successful in dealing with a word like hand, which is usually found in fixed expressions such us on the other hand and close at hand. These fixed expressions have more function than content, and therefore, they do not lend themselves to a method that depends on differences in content. The situation is far from hopeless, as many idioms are listed directly in Roget's Thesaurus and can be associated with a category through simple table lookup. Other research, such as Smadja and McKeown (1990), have shown more general ways of identifying and handling these fixed expressions and collocations.</Paragraph>
    <Paragraph position="9"> Given the broad set of issues involved in sense disambiguation, it is reasonable to use several specialized tools in cooperation. We akeady handle part of speech distinctions through other methods; an efficient idiom recognizer would be an appropriate addition as well.</Paragraph>
  </Section>
  <Section position="8" start_page="0" end_page="0" type="metho">
    <SectionTitle>
5. Linking Roget Categories with other
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
Sense Representations
</SectionTitle>
      <Paragraph position="0"> The Roget category names tend to be highly mnemonic and may well suffice as sense tags. However, one may want to link the Roget tags with an established reference such as the sense numbers one finds in a dictionary. We accomplish this by applying the models described above to the text of the definitions in a dictionary, creating a table of correspondences between Roget categories and ACRES DE COLING-92, NANTES, 23-28 AOUT 1992 4 5 8 PROC. OF COLING-92, NANTES, AUG. 23-28, 1992 sense numbers. Results for the word crane are illustrated below for two dictionaries: (1) COBUILD (Sinclair, 1987), and (2) Collins English Dictionary, First Edition  a machine with a long movable large bird with a long neck and any large long-necked long-leg any similar bird, such as a her a device for lifting and moving a large trolley carrying a boom It may also be possible to link Roget category tags with &amp;quot;natural&amp;quot; sense tags, such as translations in a foreign language. We use a word-aligned parallel bilingual corpus such as the French-English Canadian Hansards for this purpose. For example, consider the polysemous word duty which can be translated into French its devoir or droit, depending on the sense (obligation or tax, respectively). When the Grolier-trained models are applied to the English side of the Hansards, the words tagged PRICE.FI~ most commonly aligned with the French words droits (256), droit (96) and douane (67). Words labeled OUT'/(the Roget category for Obligation)  most frequently aligned with devoir (205). These correlations may have useful implications for machine translation and bilingual lexicography.</Paragraph>
      <Paragraph position="1"> 6. Other Sense Disambiguation Methods:</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
The Knowledge Acquisition Bottleneck
</SectionTitle>
      <Paragraph position="0"> Word sense disambiguation is a long-standing problem in computational linguistics (Kaplan, 1950 ; Yngve, 1955; Bar-Hillel, 1960), with important implications for a variety of practical applications including speech synthesis, information retrieval, and machine translation.</Paragraph>
      <Paragraph position="1"> Most approaches may be characterized by the lollowing generalizations: 1) They tend to focus on the search for sets of word-specific features or indicators (typically words in context) which can disambignate the senses of a word. 2) Efforts to acquire these indicators have faced a knowledge acquisition bottleneck, characterized by either substantial human involvement for each word, and/or incomplete vocalmlary coverage.</Paragraph>
      <Paragraph position="2"> The AI community has enjoyed some success hand-coding detailed &amp;quot;word experts&amp;quot; (Small and Rieger, 1982; HirsL 1987), but this labor intensive process has severely limited coverage beyond small vocabularies.</Paragraph>
      <Paragraph position="3"> Others such as Lesk (1986), Walker (1987), Veronis and Ide (1990), and Guthrie et al. (1991) have turned to machine readable dictionaries (MRD's) in an effort to achieve broad vocabulary coverage. MRD's have the useful property that some indicative words for each sense are directly available in numbered definitions and examples. However, definitions arc often too short to provide an adequate set of indicators, and those words which are found lack significance weights to identify which are crucial and which are merely chaff.</Paragraph>
      <Paragraph position="4"> Dictionaries provide well structured but incomplete information.</Paragraph>
      <Paragraph position="5"> Recently, many have turned to text corpora to broaden the range and volume of available examples. Unlike dictionaries, however, raw corpora do not indicate which sense of a word occurs at a given instance. Several researchers (Kelly and Stone, 1975; Black, 1988) have overcome this through hand tagging of training examples, and were able to discover useful discriminatory patterns from the partitioned contexts. This also has proved labor intensive. Others (Weiss, 1973; Zeroik, 1990; Hearst, 1991) have attempted to partially automate the hand-tagging process through bootstrapping. Yet this has still required significant human intervention for each word in the vocabulary.</Paragraph>
      <Paragraph position="6"> Brown et al. (1991), Dagan (1991), and Gale ct at. (1992) have looked to parallel bilingual corpora to further automate training set acquisition. By identifying word correspondences in a bilingual text such as the Canadian Parliamentary Proceedings (Hansards), the translations found fur each English word may serve as sense tags. For example, the senses of sentence may be identified through their correspondence in the French to phrase (grammatical sentence) or peine (legal sentence). While this method has been used successfully on a portion of the vocabulary, its coverage is also limited. Currently available bilingual corpora lack size or diversity: over hulf of the words considered in this study either never appear in the Hansards or lack examples of secondary senses. More fundamentally, many words are mutually ambiguous across languages. French would be of little use in disambiguating the word interest, as all major senses translate as int~rdt. More promising is a non-lndo European language such as Japanese, which should avoid such mutual ambiguity for etymological reasons. Until more diverse, large bilingual corpora become available, the coverage of these methods will remain limited.</Paragraph>
      <Paragraph position="7"> Each of these approaches have faced a fundamental obstacle: word sense is an abstract concept that is not identified in natural texL Hence any system which hopes to acquire discriminators for specific senses of a word will need to isolate samples of those senses. While this process has been partially automated, it appears to require substantial human intervention to handle an unrestricted vocabulary.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>
Download Original XML