<?xml version="1.0" standalone="yes"?>
<Paper uid="C00-2094">
  <Title>Using a Probabilistic Class-Based Lexicon for Lexical Ambiguity Resolution</Title>
  <Section position="5" start_page="651" end_page="653" type="evalu">
    <SectionTitle>
4 Evaluation
</SectionTitle>
    <Paragraph position="0"> We evaluated our resolution methods on a pseudo-disambiguation task similar to that used in Rooth et al. (1999) for evaluating clustering models. We used a test set of 298 (v, n, n') triples where (v, n) is chosen randomly from a test corpus of pairs, and n' is chosen randomly according to the marginal noun distribution for the test corpus. Precision was calculated as the number of times the disambiguation method decided for the non-random target noun (n̂ = n).</Paragraph>
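The evaluation protocol above can be sketched as follows. This is an illustrative reconstruction, not the paper's implementation: the `score` function stands in for any verb-object plausibility model (e.g. ProbLex or a clustering model), and all names are assumptions.

```python
import random

def pseudo_disambiguation_precision(test_pairs, noun_distribution, score, seed=0):
    """Pseudo-disambiguation: for each observed (v, n) pair, draw a
    distractor noun n2 from the marginal noun distribution and count
    how often the model prefers the original noun, i.e. score(v, n)
    exceeds score(v, n2)."""
    rng = random.Random(seed)
    nouns, weights = zip(*noun_distribution.items())
    correct = 0
    for v, n in test_pairs:
        n2 = rng.choices(nouns, weights=weights, k=1)[0]  # random distractor
        if score(v, n) > score(v, n2):
            correct += 1
    return correct / len(test_pairs)
```

Note that a distractor drawn from the marginal distribution may occasionally coincide with the original noun, in which case the trial counts against the model.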
    <Paragraph position="1"> As shown in Fig. 4 (evaluation on the pseudo-disambiguation task for noun-ambiguity), we obtained 88 % precision for the class-based lexicon (ProbLex), which is a gain of 9 % over the best clustering model and a gain of 15 % over the human baseline. The results of the pseudo-disambiguation could be confirmed in a further evaluation on a large number of randomly selected examples of a real-world bilingual corpus. The corpus consists of sentence-aligned debates of the European parliament (mlcc = multilingual corpus for cooperation) with ca. 9 million tokens for German and English. From this corpus we prepared a gold standard as follows. We gathered word-to-word translations from online-available dictionaries and eliminated German nouns for which we could not find at least two English translations in the mlcc-corpus. The resulting 35 word dictionary is shown in Fig. 5. Based on this dictionary, we extracted all bilingual sentence pairs from the corpus which included both the source-noun and the target-noun. We restricted the resulting ca. 10,000 sentence pairs to those which included a source-noun from this dictionary in the object position of a verb. (Footnote: Similar results for pseudo-disambiguation were obtained for a simpler approach which avoids another EM application for probabilistic class labeling. Here n̂ (and ĉ) was chosen such that f(v, n̂) = max ((f_LC(v, n) + 1) p_LC(c | v, n)). However, the sensitivity to class-parameters was lost in this approach.)</Paragraph>
    <Paragraph position="2"> Furthermore, the target-object was required to be included in our dictionary and had to appear in a similar verb-object position as the source-object for an acceptable English translation of the German verb. We marked the German noun n_g in the source-sentence, its English translation n_e as appearing in the corpus, and the English lexical verb v_e. For the 35 word dictionary of Fig. 5 this semi-automatic procedure resulted in a test corpus of 1,340 examples. The average ambiguity in this test corpus is 8.63 translations per source-word. Furthermore, we took the semantically most distant translations for 25 words which occurred with a certain frequency in the evaluation corpus. This gave a corpus of 814 examples with an average ambiguity of 2.83 translations. The entries belonging to this dictionary are highlighted in bold face in Fig. 5.</Paragraph>
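The extraction step described above can be sketched roughly as follows. This is a simplified sketch under stated assumptions: the actual procedure was semi-automatic with manual marking, and `find_objects` is a hypothetical parser hook, not part of the original setup.

```python
def build_test_corpus(sentence_pairs, dictionary, find_objects):
    """Collect (n_g, v_e, n_e) test examples from aligned sentence pairs:
    keep a pair when a dictionary source noun n_g fills the object slot
    of a verb in the German sentence and one of its listed translations
    n_e fills a verb-object slot (verb v_e) in the English sentence.
    find_objects(sentence) returns the (verb, object_noun) pairs of a
    sentence; dictionary maps each German noun to its set of English
    translation alternatives."""
    examples = []
    for de_sent, en_sent in sentence_pairs:
        for v_g, n_g in find_objects(de_sent):
            if n_g not in dictionary:
                continue
            for v_e, n_e in find_objects(en_sent):
                if n_e in dictionary[n_g]:
                    examples.append((n_g, v_e, n_e))
    return examples
```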
    <Paragraph position="3"> The dictionaries and the related test corpora are available on the web.</Paragraph>
    <Paragraph position="4"> We believe that an evaluation on these test corpora is a realistic simulation of the hard task of target-language disambiguation in real-world machine translation. The translation alternatives are selected from online dictionaries, correct translations are determined as the actual translations found in the bilingual corpus, no examples are omitted, the average ambiguity is high, and the translations are often very close to each other. In contrast to this, most other evaluations are based on frequent uses of only two clearly distant senses that were determined as interesting by the experimenters.</Paragraph>
    <Paragraph position="5"> Fig. 6 shows the results of lexical ambiguity resolution with probabilistic lexica in comparison to simpler methods. The rows show the results for evaluations on the two corpora with average ambiguity of 8.63 and 2.83 respectively. Column 2 shows the percentage of correct translations found by disambiguation by random choice. Column 3 presents as another baseline disambiguation with the major sense, i.e., always choose the most frequent target-noun as translation of the source-noun. In column 4, the empirical distribution of (v, n) pairs in the training corpus extracted from the BNC is used as disambiguator. Note that this method yields good results in terms of precision (P = #correct / (#correct + #incorrect)), but is much worse in terms of effectiveness (E = #correct / (#correct + #incorrect + #don't know)). The reason for this is that even if the distribution of (v, n) pairs is estimated quite precisely for the pairs in the large training corpus, there are still many pairs which receive the same or no positive probability at all. These effects can be overcome by a clustering approach to disambiguation (column 5). Here the class-smoothed probability of a (v, n) pair is used to decide between alternative target-nouns. Since the clustering model assigns a more fine-grained probability to nearly every pair in its domain, there are no don't know cases for comparable precision values. However, the semantically smoothed probability of the clustering models is still too coarse-grained when compared to a disambiguation with a probabilistic lexicon. Here a further gain in precision and equally in effectiveness of ca. 7 % is obtained on both corpora (column 6).</Paragraph>
    [Fig. 5: the 35-word dictionary of German nouns with their English translation alternatives; the entries of the reduced 25-word dictionary are highlighted in bold face.]
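The precision/effectiveness distinction drawn above amounts to whether abstentions (don't-know cases) count against the system. A minimal sketch of the two measures:

```python
def precision_and_effectiveness(decisions):
    """decisions: list of "correct", "incorrect", or "dont_know".
    Precision ignores abstentions; effectiveness counts them as misses:
        P = #correct / (#correct + #incorrect)
        E = #correct / (#correct + #incorrect + #dont_know)"""
    c = decisions.count("correct")
    i = decisions.count("incorrect")
    d = decisions.count("dont_know")
    p = c / (c + i) if c + i else 0.0
    e = c / (c + i + d) if decisions else 0.0
    return p, e
```

A system that abstains often can thus post high precision while its effectiveness stays low, which is exactly the weakness of the raw empirical distribution noted above.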
    <Paragraph position="7"> We conjecture that this gain can be attributed to the combination of frequency information of the nouns and the fine-tuned distribution on the selection classes of the nominal arguments of the verbs. We believe that including the set of translation alternatives in the ProbLex distribution is important for increasing effectiveness, because it gives the disambiguation model the opportunity to choose among unseen alternatives. Furthermore, it seems that the higher precision of ProbLex cannot be attributed to filling in zeroes in the empirical distribution. Rather, we speculate that ProbLex intelligently filters the empirical distribution by reducing maximal counts for observations which do not fit into classes. This might help in cases where the empirical distribution has equal values for two alternatives. A further evaluation was performed with probabilistic lexica for five sample words with two translations each. For this dictionary, a test corpus of 219 sentences was extracted, 200 of which were additionally labeled with acceptable translations. Precision is 78 % for finding correct translations and 90 % for finding acceptable translations.</Paragraph>
    <Paragraph position="8"> Furthermore, in a subset of 100 test items with average ambiguity 8.6, a human judge having access only to the English verb and the set of candidates for the target-noun, i.e. the information used by the model, selected among translations. On this set, human precision was 39 %.</Paragraph>
  </Section>
  <Section position="6" start_page="653" end_page="654" type="evalu">
    <SectionTitle>
5 Discussion
</SectionTitle>
    <Paragraph position="0"> Fig. 8 shows a comparison of our approach to state-of-the-art unsupervised algorithms for word sense disambiguation. Column 2 shows the number of test examples used to evaluate the various approaches. The range is from ca. 100 examples to ca. 37,000 examples. Our method was evaluated on test corpora of sizes 219, 814, and 1,340. Column 3 gives the average number of senses/translations for the different disambiguation methods. Here the range of the ambiguity rate is from 2 to about 9 senses. (Footnote: The ambiguity factor 2.27 attributed to Dagan and Itai's (1994) experiment is calculated by dividing their average of 3.27 alternative translations by their average of 1.44 correct translations. Furthermore, we calculated the ambiguity factor 3.51 for Resnik's (1997) experiment from his random baseline of 28.5 % by taking 100/28.5; conversely, Dagan and Itai's (1994) random baseline can be calculated as 100/2.27 = 44.05. The ambiguity factor for SENSEVAL is calculated for the noun task in the English SENSEVAL test set.) Column 4 shows the random baselines cited for the respective experiments, ranging from ca. 11 % to 50 %.</Paragraph>
    <Paragraph position="1"> Precision values are given in column 5. In order to compare these results, which were computed for different ambiguity factors, we standardized the measures to an evaluation for binary ambiguity. This is achieved by calculating p^(1/log2(amb)) for precision p and ambiguity factor amb. The consistency of this "binarization" can be seen by a standardization of the different random baselines, which yields a value of ca. 50 % for all approaches. (Footnote: Note that we are guaranteed to get exactly 50 % standardized random baseline if random · amb = 100 %.) The standardized precision of our approach is ca. 79 % on all test corpora. The most direct point of comparison is the method of Dagan and Itai (1994), which gives 91.4 % precision (92.7 % standardized) and 62.1 % effectiveness (66.8 % standardized) on 103 test examples for target word selection in the transfer of Hebrew to English. However, compensating this high precision measure for the low effectiveness gives values comparable to our results. Dagan and Itai's (1994) method is based on a large variety of grammatical relations for verbal, nominal, and adjectival predicates, but no class-based information or slot-labeling is used. Resnik (1997) presented a disambiguation method which yields 44.3 % precision (63.8 % standardized) for a test set of 88 verb-object tokens. His approach is comparable to ours in terms of informedness of the disambiguator. He also uses a class-based selection measure, but based on WordNet classes.</Paragraph>
    <Paragraph position="2"> However, the task of his evaluation was to select WordNet-senses for the objects rather than the objects themselves, so the results cannot be compared directly. The same is true for the SENSEVAL evaluation exercise (Kilgarriff and Rosenzweig, 2000)--there word senses from the HECTOR-dictionary had to be disambiguated.</Paragraph>
    <Paragraph position="3"> The precision results for the ten unsupervised systems taking part in the competitive evaluation ranged from 20-65 % at effectiveness values from 3-54 %. The SENSEVAL standard is clearly beaten by the earlier results of Yarowsky (1995) (96.5 % precision) and Schütze (1992) (92 % precision).</Paragraph>
    <Paragraph position="4"> However, a comparison to these results is again somewhat difficult. Firstly, these approaches were evaluated on words with two clearly distant senses which were determined by the experimenters. In contrast, our method was evaluated on randomly selected actual translations of a large bilingual corpus. Furthermore, these approaches use large amounts of information in terms of linguistic categorizations, large context windows, or even manual intervention such as initial sense seeding (Yarowsky, 1995).</Paragraph>
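The binarization used in the comparison maps a precision p measured at ambiguity factor amb onto an equivalent two-way choice via p^(1/log2(amb)). A small sketch, checked against the paper's own numbers:

```python
import math

def standardize(p, amb):
    """Binarize a precision value: precision p (as a fraction) measured
    at ambiguity factor amb (> 1) is mapped onto the equivalent
    precision for a binary (two-way) ambiguity."""
    return p ** (1.0 / math.log2(amb))
```

For example, standardize(0.443, 3.51) is about 0.638, matching the 63.8 % standardized value cited for Resnik (1997), and any random baseline of 1/amb standardizes to exactly 50 %, which is the consistency check mentioned above.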
    <Paragraph position="5"> Such information is easily obtainable, e.g., in IR applications, but often burdensome to gather or simply unavailable in situations such as incremental parsing or translation.</Paragraph>
  </Section>
</Paper>