File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/93/h93-1061_intro.xml
Size: 4,914 bytes
Last Modified: 2025-10-06 14:05:28
<?xml version="1.0" standalone="yes"?> <Paper uid="H93-1061"> <Title>A SEMANTIC CONCORDANCE</Title> <Section position="3" start_page="0" end_page="303" type="intro"> <SectionTitle> 2. WORDNET: A LEXICAL DATABASE </SectionTitle> <Paragraph position="0"> The lexical component of the universal semantic concordance that we are constructing is WordNet, an on-line lexical resource inspired by current psycholinguistic theories of human lexical memory [1, 2]. A standard, handheld dictionary is organized alphabetically; it puts together words that are spelled alike and scatters words with related meanings.</Paragraph> <Paragraph position="1"> Although on-line versions of such standard dictionaries can relieve a user of alphabetical searches, it is clearly inefficient to use a computer merely as a rapid page-turner. WordNet is an example of a more efficient combination of traditional lexicography and modern computer science.</Paragraph> <Paragraph position="2"> The most ambitious feature of WordNet is the attempt to organize lexical information in terms of word meanings, rather than word forms. WordNet is organized by semantic relations (rather than by semantic components) within the open-class categories of noun, verb, adjective, and adverb; closed-class categories of words (pronouns, prepositions, conjunctions, etc.) are not included in WordNet. The semantic relations among open-class words include: synonymy and antonymy (which are semantic relations between words and which are found in all four syntactic categories); hyponymy and hypernymy (which are semantic relations between concepts and which organize nouns into a categorical hierarchy); meronymy and holonymy (which represent part-whole relations among noun concepts); and troponymy (manner relations) and entailment relations between verb concepts. 
These semantic relations were chosen to be intuitively obvious to nonlinguists and to have broad applicability throughout the lexicon.</Paragraph> <Paragraph position="3"> The basic elements of WordNet are sets of synonyms (or synsets), which are taken to represent lexicalized concepts.</Paragraph> <Paragraph position="4"> A synset is a group of words that are synonymous, in the sense that there are contexts in which they can be interchanged without changing the meaning of the statement.</Paragraph> <Paragraph position="5"> For example, WordNet distinguishes between the synsets: {board, plank, (a stout length of sawn timber)} {board, committee, (a group with supervisory powers)} In the context, &quot;He nailed a board across the entrance,&quot; the word &quot;plank&quot; can be substituted for &quot;board.&quot; In the context, &quot;The board announced last quarter's dividend,&quot; the word &quot;committee&quot; can be substituted for &quot;board.&quot; WordNet also provides sentence frames for each sense of every verb, indicating the kinds of simple constructions into which the verb can enter.</Paragraph> <Paragraph position="6"> WordNet contains only uninflected (or base) forms of words, so the interface to WordNet includes morphy, a morphological analyzer that is applied to input strings to generate the base forms. For example, given &quot;went&quot; as the input string, morphy returns &quot;go&quot;; given &quot;children,&quot; it returns &quot;child,&quot; etc. morphy first checks an exception list; if the input string is not found, it then uses standard rules of detachment.</Paragraph> <Paragraph position="7"> Words (like &quot;fountain pen&quot;) that are composed of two or more simpler words with spaces between them are called collocations. Since collocations are less polysemous than are individual words, their inclusion in WordNet promises to simplify the task of sense resolution. However, the morphology of collocations poses certain problems. 
Special algorithms are required for inflected forms of some collocations: for example, &quot;standing astride of&quot; will return the phrasal verb, &quot;stand astride of.&quot; As of the time this is written, WordNet contains more than 83,800 entries (unique character strings, words and collocations) and more than 63,300 lexicalized concepts (synsets, plus defining glosses); altogether there are more than 118,600 entry-concept pairs. The semantic relations are represented by more than 87,600 pointers between concepts.</Paragraph> <Paragraph position="8"> Approximately 43% of the entries are collocations. Approximately 63% of the synsets include definitional glosses. And approximately 14% of the nouns and 25% of the verbs are polysemous.</Paragraph> <Paragraph position="9"> WordNet continues to grow at a rate of almost 1,000 concepts a month. The task of semantic tagging has provided a useful stimulus to improve both coverage and precision.</Paragraph> </Section> </Paper>
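
The two-stage lookup that the section attributes to morphy (exception list first, then rules of detachment) might be sketched roughly as follows. This is an illustration under stated assumptions, not WordNet's implementation: the names `EXCEPTIONS`, `DETACHMENT_RULES`, `LEXICON`, and `base_form` are hypothetical, the tiny tables are invented samples, and checking each candidate against a stored lexicon is an assumption about how a detached form would be validated.

```python
# Illustrative sketch only: the exception list, suffix rules, and lexicon
# below are small hypothetical samples, not WordNet's actual data.

# Irregular forms that no suffix rule could recover.
EXCEPTIONS = {
    "went": "go",
    "children": "child",
    "geese": "goose",
}

# (suffix to detach, replacement) pairs, tried in order.
DETACHMENT_RULES = [
    ("ies", "y"),
    ("ches", "ch"),
    ("shes", "sh"),
    ("xes", "x"),
    ("ses", "s"),
    ("ing", ""),
    ("ing", "e"),
    ("ed", ""),
    ("ed", "e"),
    ("s", ""),
]

# Stand-in for the base forms stored in the lexical database.
LEXICON = {
    "go", "child", "goose", "board", "plank",
    "box", "fly", "nail", "announce", "stand",
}


def base_form(word):
    """Return a base form for `word`, or None if no candidate is in LEXICON."""
    word = word.lower()
    # Stage 1: the exception list takes priority.
    if word in EXCEPTIONS:
        return EXCEPTIONS[word]
    # Assumption: a string that is already a base form is returned unchanged.
    if word in LEXICON:
        return word
    # Stage 2: detach a suffix and accept the first candidate found in LEXICON.
    for suffix, replacement in DETACHMENT_RULES:
        if word.endswith(suffix):
            candidate = word[: len(word) - len(suffix)] + replacement
            if candidate in LEXICON:
                return candidate
    return None


if __name__ == "__main__":
    for token in ["went", "children", "boards", "announced", "standing"]:
        print(token, "->", base_form(token))
```

The sketch handles single words only. An inflected collocation such as "standing astride of" would need the additional handling the section mentions, which is not attempted here.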