<?xml version="1.0" standalone="yes"?>
<Paper uid="C92-4189">
  <Title>Genus Disambiguation: A Study in Weighted Preference*</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1. Introduction
</SectionTitle>
    <Paragraph position="0"> Much of the previous research on the construction of networks of genus terms from machine-readable dictionaries (MRDs) (Amsler and White 1979; Chodorow et al. 1985; Nakamura and Nagao 1988; Vossen 1990) required human intervention to distinguish the senses.</Paragraph>
    <Paragraph position="1"> Recently, several researchers (Veronis and Ide 1990; Klavans et al. 1990; Copestake 1990; Vossen 1991) have suggested techniques for the automatic disambiguation of these taxonomies based on neural nets, word overlap, or bilingual dictionaries. The techniques we have used to construct a network of noun senses automatically from the Longman Dictionary of Contemporary English (LDOCE) differ substantially from all of these methods. [* This research was supported by NSF Grant No. IRI8811108.]</Paragraph>
    <Paragraph position="2"> In (Guthrie et al. 1990), we suggested an algorithm for disambiguating the genus terms of noun definitions in LDOCE. The procedure we used was based on the assumption that the semantic relationship between the headword and its genus should be reflected in their LDOCE semantic categories. In other words, the semantic category of the genus word should be identical to, or an ancestor of, the semantic category of the headword (an ancestor is a superordinate term in the hierarchy of semantic codes). We tested this assumption on a random sample of 520 noun word senses from LDOCE.</Paragraph>
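    <Paragraph> The assumption test described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the category names and parent links form a hypothetical toy hierarchy rather than the actual LDOCE semantic codes, and the helper names ancestors and satisfies_assumption, as well as the three sample pairs, are invented for the example.

```python
# Hypothetical toy hierarchy (child -> parent) standing in for the LDOCE
# semantic categories; these names and links are illustrative only.
PARENT = {
    "Human": "Animate",
    "Animal": "Animate",
    "Animate": "Concrete",
    "Solid": "Inanimate",
    "Liquid": "Inanimate",
    "Inanimate": "Concrete",
    "Concrete": None,
    "Abstract": None,
}


def ancestors(category):
    """Return the superordinate categories above `category`, nearest first."""
    chain = []
    parent = PARENT.get(category)
    while parent is not None:
        chain.append(parent)
        parent = PARENT.get(parent)
    return chain


def satisfies_assumption(headword_cat, genus_cat):
    """True if the genus category equals, or is an ancestor of, the headword's."""
    return genus_cat == headword_cat or genus_cat in ancestors(headword_cat)


# Invented (headword category, genus category) pairs, not data from the paper.
sample = [("Human", "Human"), ("Human", "Animate"), ("Liquid", "Animate")]
hits = sum(satisfies_assumption(h, g) for h, g in sample)
print(f"{hits}/{len(sample)} pairs satisfy the assumption")  # prints "2/3 pairs satisfy the assumption"
```

Because the check walks up the parent links from the headword's category, "identical to, or an ancestor of" reduces to membership in that chain of superordinates.</Paragraph>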
    <Paragraph position="3"> The semantic categories used (there are thirty-four in all) were defined by the LDOCE lexicographers, who placed sixteen of the basic categories in a hierarchy. The notion of a "more general semantic category" was somewhat subjective, as the next section illustrates.</Paragraph>
    <Paragraph position="4"> The disambiguation algorithm presented in (Guthrie et al. 1990) used three factors to determine the correct genus sense. The algorithm is stated as follows:
• Choose the genus sense with the same semantic category as the headword (or the closest more general category if this is not possible).</Paragraph>
    <Paragraph position="5"> • In the case of a tie, choose the sense that has the same pragmatic code as the headword.
• If there is still a tie, or no genus sense meets the above criteria, choose the most frequently used sense‡ of the genus word.</Paragraph>
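    <Paragraph> The three-step procedure listed above can be expressed as a short Python function. This is a hedged sketch under stated assumptions, not the original implementation: the toy category hierarchy, the Sense record, the helper names generality_distance and choose_genus_sense, and the sample senses and pragmatic codes are all hypothetical. Following the footnoted LDOCE convention that senses are listed in order of frequency, the sketch uses a sense's position in its entry (rank 1 = listed first) as the frequency signal.

```python
from collections import namedtuple

# One candidate sense of the genus word. `rank` is the sense's position in
# its dictionary entry (1 = listed first), used here as a frequency proxy.
Sense = namedtuple("Sense", "sense_id semantic pragmatic rank")

# Hypothetical toy hierarchy (child -> parent); not LDOCE's actual codes.
PARENT = {"Human": "Animate", "Animal": "Animate", "Animate": "Concrete", "Concrete": None}


def generality_distance(headword_cat, genus_cat):
    """0 if the categories match, k if genus_cat is k levels above, else None."""
    distance, current = 0, headword_cat
    while current is not None:
        if current == genus_cat:
            return distance
        current = PARENT.get(current)
        distance += 1
    return None


def choose_genus_sense(headword_sem, headword_prag, senses):
    # Step 1: same semantic category, or else the closest more general one.
    scored = [(generality_distance(headword_sem, s.semantic), s) for s in senses]
    scored = [(d, s) for d, s in scored if d is not None]
    candidates = []
    if scored:
        best = min(d for d, _ in scored)
        candidates = [s for d, s in scored if d == best]
    # Step 2: break a tie by matching the headword's pragmatic code.
    if len(candidates) > 1:
        same_prag = [s for s in candidates if s.pragmatic == headword_prag]
        if same_prag:
            candidates = same_prag
    # Step 3: if still tied, or nothing matched, take the most frequent sense
    # (lowest rank) among the remaining candidates or, failing that, all senses.
    pool = candidates if candidates else senses
    return min(pool, key=lambda s: s.rank)


# Invented senses of a genus word for a headword coded Human / "Education".
senses = [
    Sense("sense_1", "Animal", "Sport", 1),
    Sense("sense_2", "Animate", "Sport", 2),
    Sense("sense_3", "Animate", "Education", 3),
]
print(choose_genus_sense("Human", "Education", senses).sense_id)  # prints "sense_3"
```

In this example, sense_1 is excluded because Animal is neither the headword's category nor one of its ancestors, sense_2 and sense_3 tie at one level of generality, and the pragmatic code breaks the tie even though sense_3 is listed last.</Paragraph>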
    <Paragraph position="6"> The algorithm was successful about 80% of the time. [‡ In the 2nd edition of LDOCE, the publishers state that the order in which word senses are listed corresponds to the frequency with which each sense is used (i.e. the first sense listed is the most commonly used, etc.). We have observed] [PROC. OF COLING-92, NANTES, AUG. 23-28, 1992, page 1187]</Paragraph>
    <Paragraph position="7"> In an effort to improve the disambiguation algorithm, we conducted a series of experiments designed to identify more completely the contribution of each factor considered in the algorithm. Since we considered three factors in determining the correct genus sense (the semantic code relationship, the pragmatic code relationship, and the frequency information), we designed experiments to test each factor first separately and then in combination, weighting each input according to its individual predictive value. Below we describe those experiments, beginning with the formulation of each factor and ending with the assignment of weights to the contribution of each input in the final disambiguation algorithm.</Paragraph>
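    <Paragraph> The idea of weighting each input by its individual predictive value can be sketched as follows. This rests on an assumption of ours, not the paper's stated formula: each factor's weight is taken to be its stand-alone accuracy normalized so the weights sum to 1, and each candidate sense is scored by a weighted sum of per-factor scores. FACTOR_ACCURACY, weights_from_accuracy, and weighted_choice are hypothetical names, and every number below is a placeholder rather than an experimental result.

```python
# Placeholder accuracies for each factor tested on its own; these are
# invented for illustration and are not results reported in the paper.
FACTOR_ACCURACY = {"semantic": 0.6, "pragmatic": 0.3, "frequency": 0.5}


def weights_from_accuracy(accuracy):
    """Scale each factor's accuracy so that the weights sum to 1."""
    total = sum(accuracy.values())
    return {name: acc / total for name, acc in accuracy.items()}


def weighted_choice(factor_scores, weights):
    """Return the sense id with the highest weighted sum of factor scores.

    factor_scores maps sense id -> {factor name: score between 0 and 1}.
    """
    def combined(sense_id):
        scores = factor_scores[sense_id]
        return sum(weights[f] * scores.get(f, 0.0) for f in weights)

    return max(factor_scores, key=combined)


weights = weights_from_accuracy(FACTOR_ACCURACY)
# Invented scores: 1.0 for a semantic or pragmatic match, 1/rank for frequency.
factor_scores = {
    "sense_1": {"semantic": 1.0, "pragmatic": 0.0, "frequency": 1.0},
    "sense_2": {"semantic": 1.0, "pragmatic": 1.0, "frequency": 0.5},
}
print(weighted_choice(factor_scores, weights))  # prints "sense_2"
```

Unlike the strict priority order of the earlier algorithm, where the pragmatic code and frequency only break ties, a weighted sum lets every factor contribute to every decision; here the pragmatic match of sense_2 outweighs the higher frequency of sense_1.</Paragraph>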
  </Section>
</Paper>