File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/abstr/88/a88-1020_abstr.xml
Size: 6,306 bytes
Last Modified: 2025-10-06 13:46:30
<?xml version="1.0" standalone="yes"?> <Paper uid="A88-1020"> <Title>A TOOL FOR INVESTIGATING THE SYNONYMY RELATION IN A SENSE DISAMBIGUATED THESAURUS</Title> <Section position="2" start_page="0" end_page="144" type="abstr"> <SectionTitle> Abstract </SectionTitle> <Paragraph position="0"> This paper describes an exploration of the implicit synonymy relationship expressed by synonym lists in an on-line thesaurus. A series of automatic steps was taken to properly constrain this relationship. The resulting groupings of semantically related word senses are believed to constitute a useful tool for natural language processing and for work in lexicography.</Paragraph> <Paragraph position="1"> Introduction The importance of semantic processing of natural language is generally acknowledged (Grishman 1986) and needs no justification.</Paragraph> <Paragraph position="2"> Work on applications such as information retrieval or machine translation has consistently focused on semantic analysis. A wide range of models has been suggested, based on semantic networks, on fuzzy logic, on conceptual dependencies and more. Common to all these models, however, is the researchers' reliance on hand-built semantic databases. These databases tend to be rather limited in scope and often restricted to narrow domains. If the process of constructing them remains manual, broad-coverage semantic analysis by computers will be severely handicapped for quite a long time. It is our goal, therefore, to explore automatic and semi-automatic ways of constructing these semantic databases, through the manipulation of machine-readable semantic sources. In this paper, we concentrate on heuristics for the automatic manipulation of synonyms found in an on-line thesaurus.</Paragraph> <Paragraph position="3"> First, we should clarify what we mean by &quot;synonyms&quot;. The definition of synonymy and the existence of synonyms have long been debated in linguistics. 
Some believe it is impossible to capture the meaning of even the most concrete terms in natural language. Consequently, it is impossible to define synonymy or to identify synonymous terms (Quine 1960).</Paragraph> <Paragraph position="4"> Others believe it is possible to give full semantic representations of meaning and therefore to define synonymy formally and to identify true synonyms (Katz and Fodor 1963). According to this view, synonymy is a relationship of sameness of meaning between words, which is defined as the identity of their semantic representations. We have chosen an operational approach to synonymy: The synonyms of a headword w are whatever words are listed in the entry for w in an on-line version of The New Collins Thesaurus (1984) (CT). 1 According to the authors, &quot;...no synonym is entered unless it is fully substitutable for the headword in a sensible English sentence&quot; (Collins 1984:v). 1 We have stored CT as a DAM file (Byrd, et al., 1986) with 16,794 keyed records containing a total of 287,136 synonym tokens. It has been supplemented with part-of-speech information from the UDICT computerized lexicon system (Byrd, 1986).</Paragraph> <Paragraph position="5"> This may suggest that each entry (i.e., a headword and its synonym list) contains all and only words that are closely related semantically. But the same synonyms appear in several lists, and headwords are themselves synonyms of other headwords, so that the lists in CT are implicitly interconnected. We seek algorithms to process all the words that are interconnected in the thesaurus into sets which share crucial semantic features.</Paragraph> <Paragraph position="6"> In the first section of this paper, we characterize the properties of the CT interconnections that we discovered in our manipulation of the CT links. 
Because of the asymmetric and intransitive nature of these links, our main difficulty has been to devise proper means of control to keep the computed sets of words closely related in meaning. In the second section, we describe our first control measure - our manipulation of senses of words rather than of words themselves. In the third section, we describe automatic ways of pruning the semantic trees we obtain. In the final section, we illustrate how this work can benefit various natural language applications by providing automatic access to semantically related word senses and an automatic means for measuring semantic distance.</Paragraph> <Paragraph position="7"> In the context of CT, a strong criterion for defining a set of words which share crucial semantic features is a criterion which requires every member of the set to be a synonym of every other member. The words in such a set would exhibit symmetric and transitive links.</Paragraph> <Paragraph position="8"> There are 27 sets of words in CT which are symmetric and transitive. Within the context of the thesaurus, these may be considered to have identical meaning. 26 out of the 27 are word pairs - the 27th is a triple - and all have a single sense and a unique part of speech. 2 These sets are given below.</Paragraph> <Paragraph position="10"> Most of the synonymy links in CT are markedly different from these. 62% are asymmetric (e.g., part has department as a synonym, but department does not have part); and 65% are non-transitive (e.g., part has piece as a synonym; piece has chunk as a synonym; but part does not have chunk as a synonym). This asymmetry and non-transitivity have been noted by others (Dewdney 1987). Thus, in order to obtain semantic sets for most of the words in the thesaurus, symmetry and transitivity are too strict. An algorithm which permits asymmetric and non-transitive links must be developed. (See Warnesson 1985 for a different approach.) 
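The asymmetry and non-transitivity described above can be illustrated with a minimal sketch. It assumes a thesaurus represented as a mapping from each headword to the set of synonyms listed in its entry. The toy data contains only the part, department, piece and chunk examples from the text; the dictionary layout and the function names are hypothetical and are not taken from CT or from the authors' implementation.

```python
# Toy thesaurus: headword mapped to the set of synonyms listed in its entry.
# Hypothetical data built only from the examples in the text, not CT itself.
toy = {
    "part": {"department", "piece"},
    "department": set(),
    "piece": {"chunk", "part"},
    "chunk": {"piece"},
}


def links(thesaurus):
    """Return every directed link (headword, synonym) in the thesaurus."""
    return [(w, s) for w, syns in thesaurus.items() for s in sorted(syns)]


def is_symmetric(thesaurus, w, s):
    """True if s, taken as a headword, lists w back as a synonym."""
    return w in thesaurus.get(s, set())


def non_transitive_triples(thesaurus):
    """Return triples (a, b, c): a lists b, b lists c, but a does not list c."""
    triples = []
    for a, b in links(thesaurus):
        for c in sorted(thesaurus.get(b, set())):
            if c != a and c not in thesaurus[a]:
                triples.append((a, b, c))
    return triples


# Links that are not reciprocated by the synonym's own entry.
asymmetric = [(w, s) for w, s in links(toy) if not is_symmetric(toy, w, s)]
```

On this toy data, the link from part to department comes out asymmetric and the chain part, piece, chunk comes out non-transitive, matching the two examples in the text. The percentages reported for CT depend on the full thesaurus and cannot be reproduced from this sketch.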
According to the substitutability definition of synonymy adopted by Collins, links should always be symmetric since if it is possible to substitute b for a in a &quot;sensible&quot; English context, then it is always possible to reintroduce a 2 It should be noted that CT's vocabulary is limited. Thus, it does not contain the verb &quot;perk&quot; or the noun &quot;saw&quot; as an instrument of cutting. The list of transitive and symmetric sets will vary with the size of the on-line source.</Paragraph> </Section> </Paper>