<?xml version="1.0" standalone="yes"?>
<Paper uid="C90-2067">
  <Title>Word Sense Disambiguation with Very Large Neural Networks Extracted from Machine Readable Dictionaries</Title>
  <Section position="3" start_page="0" end_page="390" type="intro">
    <SectionTitle>
2. Previous work
</SectionTitle>
    <Paragraph position="0"> 2.1. Machine-readable dictionaries for WSD There have been several attempts to exploit the information in machine-readable versions of everyday dictionaries (see, for instance, Amsler, 1980; Calzolari, 1984; Chodorow, Byrd and Heidorn, 1985; Markowitz, Ahlswede and Evens, 1986; Byrd et al., 1987; Véronis, Ide and Wurbel, 1989), in which an enormous amount of lexical and semantic knowledge is already &amp;quot;encoded&amp;quot;. Such information is not systematic or even complete, and its extraction from machine-readable dictionaries is not always straightforward. However, it has been shown that even in its base form, information from machine-readable dictionaries can be used, for example, to assist in the disambiguation of prepositional phrase attachment (Jensen and Binot, 1987), or to find subject domains in texts (Walker and Amsler, 1986).</Paragraph>
    <Paragraph position="1"> The most general and well-known attempt to utilize information in machine-readable dictionaries for WSD is that of Lesk (1986), which computes the degree of overlap--that is, number of shared words--in definition texts of words that appear in a ten-word window of</Paragraph>
    <Paragraph position="3"> context. The sense of a word with the greatest number of overlaps with senses of other words in the window is chosen as the correct one. For example, consider the definitions of pen and sheep from the Collins English Dictionary, the dictionary used in our experiments, in figure 1. [Figure 1. Definitions of PEN, SHEEP, GOAT and PAGE in the Collins English Dictionary] pen 1 1. an implement for writing or drawing using ink, formerly consisting of a sharpened and split quill, and now of a metal nib attached to a holder. 2. the writing end of such an implement; nib. 3. style of writing. 4. the pen. a. writing as an occupation. b. the written word. 5. the long horny internal shell of a squid. 6. to write or compose.</Paragraph>
    <Paragraph position="4"> pen 2 1. an enclosure in which domestic animals are kept. 2. any place of confinement. 3. a dock for servicing submarines. 4. to enclose or keep in a pen.</Paragraph>
    <Paragraph position="5"> pen 3 short for penitentiary.</Paragraph>
    <Paragraph position="6"> pen 4 a female swan.</Paragraph>
    <Paragraph position="7"> sheep 1. any of various bovid mammals of the genus Ovis and related genera having transversely ribbed horns and a narrow face. There are many breeds of domestic sheep, raised for their wool and for meat. 2. Barbary sheep. 3. a meek or timid person. 4. separate the sheep from the goats. to pick out the members of any group who are superior in some respects.</Paragraph>
    <Paragraph position="8"> goat 1. any sure-footed agile bovid mammal of the genus Capra, naturally inhabiting rough stony ground in Europe, Asia, and N Africa, typically having a brown-grey colouring and a beard. Domesticated varieties (C. hircus) are reared for milk, meat, and wool. 3. a lecherous man. 4. a bad or inferior member of any group. 6. act (or play) the (giddy) goat. to fool around. 7. get (someone's) goat. to cause annoyance to (someone). page 1 1. one side of one of the leaves of a book, newspaper, letter, etc. or the written or printed matter it bears. 2. such a leaf considered as a unit. 3. an episode, phase, or period. 4. Printing. the type as set up for printing a page. 6. to look through (a book, report, etc.); leaf through.</Paragraph>
    <Paragraph position="9"> page 2 1. a boy employed to run errands, carry messages, etc., for the guests in a hotel, club, etc. 2. a youth in attendance at official functions or ceremonies. 3. a. a boy in training for knighthood in personal attendance on a knight. b. a youth in the personal service of a person of rank. 4. an attendant at Congress or other legislative body. 5. a boy or girl employed in the debating chamber of the House of Commons, the Senate, or a legislative assembly to carry messages for members. 6. to call out the name of (a person). 7. to call (a person) by an electronic device, such as bleep. 8. to act as a page to or attend as a page.</Paragraph>
    <Paragraph position="10"> If these two words appear together in context, the appropriate senses of pen (2.1: &amp;quot;enclosure&amp;quot;) and sheep (1: &amp;quot;mammal&amp;quot;) will be chosen because the definitions of these two senses have the word domestic in common.</Paragraph>
    <Paragraph position="11"> However, with one word as a basis, the relation is tenuous and wholly dependent upon a particular dictionary's wording. The method also fails to take into account less immediate relationships between words.</Paragraph>
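    <Paragraph> The overlap counting described above can be sketched as follows. This is a minimal illustration and not Lesk's original program: the stopword list, the tokenizer, the function names, and the abridged definitions (shortened from the Collins entries quoted above) are all assumptions made for the example, and no lemmatization is performed.</Paragraph>
    <Paragraph><![CDATA[
```python
# Minimal sketch of Lesk-style overlap counting (illustrative, not Lesk's code).
# STOPWORDS, tokenize() and the abridged definitions are assumptions.

STOPWORDS = {"a", "an", "the", "of", "in", "for", "and", "or", "to",
             "which", "are", "their", "any"}

def tokenize(definition):
    """Lowercase, strip punctuation, drop stopwords; return a set of words."""
    words = (w.strip(".,;:()").lower() for w in definition.split())
    return {w for w in words if w and w not in STOPWORDS}

def overlap(def_a, def_b):
    """Number of distinct content words shared by two definition texts."""
    return len(tokenize(def_a) & tokenize(def_b))

def best_sense_pair(senses_a, senses_b):
    """Return (overlap, sense_a, sense_b) for the pair with the most overlap.
    On ties, the first pair encountered is returned."""
    scored = [(overlap(da, db), sa, sb)
              for sa, da in senses_a.items()
              for sb, db in senses_b.items()]
    return max(scored, key=lambda item: item[0])

# Abridged definitions, shortened from the entries quoted in the text.
PEN = {
    "1.1": "an implement for writing or drawing using ink",
    "2.1": "an enclosure in which domestic animals are kept",
    "3": "short for penitentiary",
    "4": "a female swan",
}
SHEEP = {
    "1": "any of various bovid mammals having ribbed horns; there are many "
         "breeds of domestic sheep, raised for their wool and for meat",
    "3": "a meek or timid person",
}
GOAT = {
    "1": "any sure-footed agile bovid mammal of the genus Capra; "
         "domesticated varieties are reared for milk, meat, and wool",
    "3": "a lecherous man",
}

pen_sheep = best_sense_pair(PEN, SHEEP)  # the single shared word is "domestic"
pen_goat = best_sense_pair(PEN, GOAT)    # no sense pair shares a content word
```
]]></Paragraph>
    <Paragraph> In this toy setting the pair pen 2.1 and sheep 1 is selected on the strength of one shared word, while every pairing of pen and goat senses scores zero, which illustrates how strongly the result depends on the exact wording of the definitions.</Paragraph>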
    <Paragraph position="12"> As a result, it will not determine the correct sense of pen in the context of goat. The correct sense of pen (2.1: &amp;quot;enclosure&amp;quot;) and the correct sense of goat (1: &amp;quot;mammal&amp;quot;) do not share any words in their definitions in the Collins English Dictionary; however, a strategy which takes into account a longer path through definitions will find that animal is in the definition of pen 2.1, that mammal and animal each appear in the definition of the other, and that mammal is in the definition of goat 1.</Paragraph>
    <Paragraph position="13"> Similarly, Lesk's method would also be unable to determine the correct sense of pen (1.1: &amp;quot;writing utensil&amp;quot;) in the context of page, because seven of the thirteen senses of pen have the same number of overlaps with senses of page. Six of the senses of pen share only the word write with the correct sense of page (1.1: &amp;quot;leaf of a book&amp;quot;). However, pen 1.1 also contains words such as draw and ink, and page 1.1 contains book, newspaper, letter, and print. These other words are heavily interconnected in a complex network which cannot be discovered by simply counting overlaps.</Paragraph>
    <Paragraph position="14"> Wilks et al. (forthcoming) build on Lesk's method by computing the degree of overlap for related word-sets constructed using co-occurrence data from definition texts, but their method suffers from the same problems, in addition to combinatorial problems that prevent disambiguating more than one word at a time.</Paragraph>
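    <Paragraph> The idea of following a longer path through definitions can be sketched with a breadth-first search. The graph below is a hypothetical, hand-simplified fragment that encodes only the relations mentioned in the text (animal in pen 2.1, mammal and animal in each other's definitions, mammal in goat 1); the names DEFINITION_WORDS, build_undirected and shortest_path are assumptions for this example.</Paragraph>
    <Paragraph><![CDATA[
```python
from collections import deque

# Hypothetical fragment: each key lists some words used in its definition.
DEFINITION_WORDS = {
    "pen/2.1": ["enclosure", "domestic", "animal"],
    "animal": ["mammal"],
    "mammal": ["animal"],
    "goat/1": ["mammal"],
}

def build_undirected(definition_words):
    """Link every entry to the words in its definition, in both directions."""
    graph = {}
    for head, words in definition_words.items():
        for word in words:
            graph.setdefault(head, set()).add(word)
            graph.setdefault(word, set()).add(head)
    return graph

def shortest_path(graph, start, goal):
    """Breadth-first search; returns the list of nodes on a shortest path,
    or None if the two nodes are not connected."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in sorted(graph.get(node, ())):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None

graph = build_undirected(DEFINITION_WORDS)
path = shortest_path(graph, "pen/2.1", "goat/1")
```
]]></Paragraph>
    <Paragraph> Here the two senses share no definition word directly, yet they are connected through animal and mammal, a relation that simple overlap counting cannot see.</Paragraph>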
    <Paragraph position="15"> 2.2. Neural networks for WSD Neural network approaches to WSD have been suggested (Cottrell and Small, 1983; Waltz and Pollack, 1985). These models consist of networks in which the nodes (&amp;quot;neurons&amp;quot;) represent words or concepts, connected by &amp;quot;activatory&amp;quot; links: the words activate the concepts to which they are semantically related, and vice versa. In addition, &amp;quot;lateral&amp;quot; inhibitory links usually interconnect competing senses of a given word.</Paragraph>
    <Paragraph position="16"> Initially, the nodes corresponding to the words in the sentence to be analyzed are activated. These words activate their neighbors in the next cycle; in turn, these neighbors activate their immediate neighbors, and so on. After a number of cycles of this parallel, analog relaxation process, the network stabilizes in a state in which one sense for each input word is more activated than the others.</Paragraph>
    <Paragraph position="17"> Neural network approaches to WSD seem able to capture most of what cannot be handled by overlap strategies such as Lesk's. However, the networks used in experiments so far are hand-coded and thus necessarily very small (at most, a few dozen words and concepts). Due to a lack of real-size data, it is not clear that the same neural net models will scale up for realistic application. Further, some approaches rely on &amp;quot;context-setting&amp;quot; nodes to prime particular word senses in order to force the correct interpretation. But as Waltz and Pollack point out, it is possible that such words (e.g., writing in the context of pen) are not explicitly present in the text under analysis, but may be inferred by the reader from the presence of other, related words (e.g., page, book, inkwell, etc.). To solve this problem, words in such networks have been represented by sets of semantic &amp;quot;microfeatures&amp;quot; (Waltz and Pollack, 1985; Bookman, 1987) which correspond to fundamental semantic distinctions (animate/inanimate, edible/inedible, threatening/safe, etc.), characteristic duration of events (second, minute, hour, day, etc.), locations (city, country, continent, etc.), and other similar distinctions that humans typically make about situations in the world. To be comprehensive, the authors suggest that these features must number in the thousands. Each concept in the network is linked, via bidirectional activatory or inhibitory links, to only a subset of the complete microfeature set. A given concept theoretically shares several microfeatures with concepts to which it is closely related, and will therefore activate the nodes corresponding to closely related concepts when it is activated itself.</Paragraph>
    <Paragraph position="18"> However, such schemes are problematic due to the difficulties of designing an appropriate set of microfeatures, which in essence consists of designing semantic primitives. This becomes clear when one examines the sample microfeatures given by Waltz and Pollack: they specify microfeatures such as CASINO and CANYON, but it is obviously questionable whether such concepts constitute fundamental semantic distinctions.</Paragraph>
    <Paragraph position="19"> More practically, it is simply difficult to imagine how vectors of several thousands of microfeatures for each one of the tens of thousands of words and hundreds of thousands of senses can be realistically encoded by hand.</Paragraph>
    <Paragraph position="20"> 3. Word sense disambiguation with VLNNs Our approach to WSD takes advantage of both strategies outlined above, but enables us to address their shortcomings. This work has been carried out in the context of a joint project of Vassar College and the Groupe Représentation et Traitement des Connaissances of the Centre National de la Recherche Scientifique (CNRS), which is concerned with the construction and exploitation of a large lexical data base of English and French. At present, the Vassar/CNRS data base includes, through the courtesy of several editors and research institutions, several English and French dictionaries (the Collins English Dictionary, the Oxford Advanced Learner's Dictionary, the COBUILD Dictionary, the Longman Dictionary of Contemporary English, the Webster's 9th Dictionary, and the ZYZOMYS CD-ROM dictionary from Hachette Publishers) as well as several other lexical and textual materials (the Brown Corpus of American English, the CNRS BDLex data base, the MRC Psycholinguistic Data Base, etc.).</Paragraph>
    <Paragraph position="21"> We build VLNNs utilizing definitions in the Collins English Dictionary. Like Lesk and Wilks, we assume that there are significant semantic relations between a word and the words used to define it. The connections in the network reflect these relations. All of the knowledge represented in the network is automatically generated from a machine-readable dictionary, and therefore no hand coding is required. Further, the lexicon and the knowledge it contains potentially cover all of English (90,000 words), and as a result this information can potentially be used to help disambiguate unrestricted text.</Paragraph>
    <Paragraph position="22"> 3.1. Topology of the network In our model, words are complex units. Each word in the input is represented by a word node connected by excitatory links to sense nodes (figure 2) representing the different possible senses for that word in the Collins English Dictionary. Each sense node is in turn connected by excitatory links to word nodes representing the words in the definition of that sense.</Paragraph>
    <Paragraph position="23"> This process is repeated a number of times, creating an increasingly complex and interconnected network.</Paragraph>
    <Paragraph position="24"> Ideally, the network would include the entire dictionary, but for practical reasons we limit the number of repetitions and thus restrict the size of the network to a few thousand nodes and 10 to 20 thousand transitions.</Paragraph>
    <Paragraph position="25"> All words in the network are reduced to their lemmas, and grammatical words are excluded. The different sense nodes for a given word are interconnected by lateral inhibitory links.</Paragraph>
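    <Paragraph> The construction described in this subsection can be sketched as an iterative expansion from the input words. The code below is only an illustration under stated assumptions: TOY_DICT is a hypothetical, already lemmatized dictionary with grammatical words removed (it is not Collins data), and build_network and its parameters are names invented for the example rather than the authors' implementation.</Paragraph>
    <Paragraph><![CDATA[
```python
from itertools import combinations

# Hypothetical toy dictionary: word -> {sense id: lemmas in its definition}.
TOY_DICT = {
    "pen": {"pen/1.1": ["implement", "write", "ink"],
            "pen/2.1": ["enclosure", "domestic", "animal"]},
    "goat": {"goat/1": ["mammal", "milk", "wool"]},
    "animal": {"animal/1": ["mammal", "organism"]},
    "mammal": {"mammal/1": ["animal", "milk"]},
}

def build_network(input_words, dictionary, repetitions):
    """Expand the network outward from the input words.

    Returns (excitatory, inhibitory): excitatory holds word-to-sense and
    sense-to-definition-word links; inhibitory holds links between the
    competing senses of one word."""
    excitatory, inhibitory = set(), set()
    frontier, expanded = set(input_words), set()
    for _ in range(repetitions):
        next_frontier = set()
        for word in frontier - expanded:
            expanded.add(word)
            senses = dictionary.get(word, {})
            for sense, definition_words in senses.items():
                excitatory.add((word, sense))            # word -> its sense
                for def_word in definition_words:
                    excitatory.add((sense, def_word))    # sense -> defining word
                    next_frontier.add(def_word)
            # Lateral inhibition between every pair of senses of this word.
            for sense_a, sense_b in combinations(sorted(senses), 2):
                inhibitory.add((sense_a, sense_b))
        frontier = next_frontier
    return excitatory, inhibitory

exc1, inh1 = build_network(["pen", "goat"], TOY_DICT, repetitions=1)
exc2, inh2 = build_network(["pen", "goat"], TOY_DICT, repetitions=2)
```
]]></Paragraph>
    <Paragraph> Raising the number of repetitions pulls in the definitions of the defining words, so the network grows quickly; this is the parameter that the size limit described above would cap.</Paragraph>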
    <Paragraph position="26">  When the network is run, the input word nodes are activated first. Then each input word node sends activation to its sense nodes, which in turn send activation to the word nodes to which they are connected, and so on throughout the network for a number of cycles. At each cycle, word and sense nodes receive feedback from connected nodes. Competing sense nodes send inhibition to one another. Feedback and inhibition cooperate in a &amp;quot;winner-take-all&amp;quot; strategy to activate increasingly related word and sense nodes and deactivate the unrelated or weakly related nodes.</Paragraph>
    <Paragraph position="27"> Eventually, after a few dozen cycles, the network stabilizes in a configuration where only the sense nodes with the strongest relations to other nodes in the network are activated. Because of the &amp;quot;winner-take-all&amp;quot; strategy, at most one sense node per word will ultimately be activated.</Paragraph>
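    <Paragraph> A run of such a network can be sketched with a simple synchronous update. The text above does not give the activation formula, so everything numeric here is an assumption: the update rule, the constants DECAY, EXC and INH, the clamping of input word nodes at full activation, and the small hand-built link lists (in which the sense nodes of animal and mammal are collapsed into single nodes for brevity).</Paragraph>
    <Paragraph><![CDATA[
```python
# Sketch of spreading activation with lateral inhibition.
# The update rule and all constants are assumptions, not the authors' model.

EXCITATORY = [
    ("pen", "pen/1.1"), ("pen", "pen/2.1"),
    ("pen/1.1", "write"), ("pen/1.1", "ink"),
    ("pen/2.1", "enclosure"), ("pen/2.1", "animal"),
    ("goat", "goat/1"), ("goat/1", "mammal"),
    ("animal", "mammal"),
]
INHIBITORY = [("pen/1.1", "pen/2.1")]   # competing senses of "pen"
DECAY, EXC, INH = 0.5, 0.2, 0.5          # assumed constants

def neighbors(links):
    """Turn a list of undirected links into a node -> neighbor-set table."""
    table = {}
    for a, b in links:
        table.setdefault(a, set()).add(b)
        table.setdefault(b, set()).add(a)
    return table

def run_network(input_words, cycles):
    """Run a fixed number of synchronous update cycles; return activations."""
    exc, inh = neighbors(EXCITATORY), neighbors(INHIBITORY)
    nodes = set(exc) | set(inh)
    act = {n: (1.0 if n in input_words else 0.0) for n in nodes}
    for _ in range(cycles):
        new_act = {}
        for n in nodes:
            if n in input_words:
                new_act[n] = 1.0          # assumption: inputs stay clamped
                continue
            net = (DECAY * act[n]
                   + EXC * sum(act[m] for m in exc.get(n, ()))
                   - INH * sum(act[m] for m in inh.get(n, ())))
            new_act[n] = min(1.0, max(0.0, net))   # keep within [0, 1]
        act = new_act
    return act

activation = run_network({"pen", "goat"}, cycles=50)
```
]]></Paragraph>
    <Paragraph> In this toy network the enclosure sense of pen receives extra support through animal and mammal from goat, so it ends up more active than the writing sense, and the inhibitory link amplifies that difference from cycle to cycle.</Paragraph>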
    <Paragraph position="28"> Our model does not use microfeatures, because, as we will show below, the context is taken into account by the number of nodes in the network and the extent to which they are heavily interconnected. So far, we do not consider the syntax of the input sentence, in order to focus on the semantic properties of the model. However, it is clear that syntactic information can assist in the disambiguation process in certain cases, and a network including a syntactic layer, such as that proposed by Waltz and Pollack, would undoubtedly enhance the model's behavior.</Paragraph>
  </Section>
</Paper>