<?xml version="1.0" standalone="yes"?> <Paper uid="W04-2110"> <Title>A Very Large Dictionary with Paradigmatic, Syntagmatic, and Paronymic Links between Entries</Title> <Section position="3" start_page="0" end_page="0" type="intro"> <SectionTitle> 2 Types and Features of Entries </SectionTitle> <Paragraph position="0"> Wordforms are grouped into dictionary entries based on several important features: Parts of speech An entry can belong to any of the four main POS. The POS is defined according to the syntactic role: participles are considered adjectival; a prepositional phrase can be adjectival or adverbial, e.g., in substance is adjectival in unconstitutionality in substance (≈ 'substantial') and adverbial in to verify in substance (≈ 'substantially'). We consider such two-functional entries homonymous.</Paragraph> <Paragraph position="1"> Grammemes For a Russian noun, the singular and plural have their own collocations. Printed dictionaries mark this with labels such as 'mainly plural'. So we split the morphoparadigm of a noun into singular and plural, calling such sub-paradigms grammemes.</Paragraph> <Paragraph position="2"> Based on their syntactic roles, we divide morphoparadigms of verbs into the grammemes of participles (adjectival), gerunds (adverbial), and personal forms plus infinitives (predicates).</Paragraph> <Paragraph position="3"> Russian verbs have two aspects differing in their combinability: the perfect tends to collocate with singular nouns, the imperfect being indifferent to number; the perfect is usually modified with 'concentrated' adverbials like suddenly, at once or straightway, the imperfect preferring 'spread' adverbials like gradually, continuously or repeatedly.</Paragraph> <Paragraph position="4"> So we split verbs into aspectual grammemes.</Paragraph> <Paragraph position="5"> Homonyms We consider various homonyms separately. 
Their combinatorial differences are especially useful for word sense disambiguation.</Paragraph> <Paragraph position="6"> Idioms Idiomatic collocations like point of view are entries, since the combinability of an idiom always differs from that of its head. Their components can, however, also be entries on their own.</Paragraph> <Paragraph position="7"> Multiwords If a non-idiomatic multiword has a single-word synonym, we treat it as an entry, since its combinability differs from that of its head.</Paragraph> <Paragraph position="8"> E.g., Rus. puti soobscenija 'routes of communication' has the synonym kommunikacii. Cf. a similar problem in EuroWordNet (Vossen, 2000).</Paragraph> <Paragraph position="9"> Absolute synonyms, abbreviations, and morphological variants Absolute synonyms (sofa = settee) are very rare in any language, but there are other types of equivalence: abbreviations (United States of America = USA = United States) and the so-called morphological variants (e.g., Rus. nul' = nol' 'zero' or mucat' = mucit' 'to torture'). Since all their collocations are the same, we store them as one entry, selecting one of them as a representative.</Paragraph> <Paragraph position="10"> Paste-ups Many Russian noun-headed concepts are used in two equivalent forms: (1) a bi-gram consisting of a modifier with the stem S1 plus its head noun with the stem S2, or (2) a single noun containing the stems S1 and S2, or their initial parts, or only S1: elektriceskij tok 'electrical current' = elektrotok; fiziceskij fakul'tet 'physical faculty' = fizfak; komiceskij akter 'comical actor' = komik.</Paragraph> <Paragraph position="11"> The number of paste-ups is growing, especially in newswire and everyday speech, but they are scarce in dictionaries. 
Our dictionary stores about three thousand of them in both forms.</Paragraph> <Paragraph position="12"> Compound pairs Russian has numerous stable pairs of nouns joined by a hyphen, usually with both parts declinable in parallel: strana-ucastnica 'participant country', letcik-ispytatel' 'test pilot', zavod-izgotovitel' 'manufacturing plant'. A compound pair is considered an entry.</Paragraph> <Paragraph position="13"> Coordinated pairs Dependency links within multiwords can be of a stable coordinative type: mother and father, safe and sound, sooner or later.</Paragraph> <Paragraph position="14"> We treat such pairs both as collocations (with syntagmatic links) and as separate entries. E.g., each bracketed item of the term [[[probability] [theory]] and [[mathematical] [statistics]]] is an entry.</Paragraph> <Paragraph position="15"> Synonyms, hyperonyms/hyponyms, and antonyms These are semantically paradigmatic links.</Paragraph> <Paragraph position="16"> We take their participants as entries.</Paragraph> <Paragraph position="17"> Proper names We consider as entries those names that are a part of everyday life and encyclopaedic knowledge: names of geographic objects, countries, famous persons, large organizations, etc.</Paragraph> <Paragraph position="18"> They are linked to their hyperonyms: country, mountains, island, writer, organization, etc.</Paragraph> <Paragraph position="19"> Semantic derivates These are lexemes of any POS with the same basic meaning, e.g., to marry, marriage, bride, bridegroom, and matrimonial (XPOS in WordNet). We take such words as entries.</Paragraph> <Paragraph position="20"> Idiomaticity in general All complete idioms are included as collocations, e.g., sest' |v galosu 'to get |into a fix', lit. 'to sit |into a galosh'. In rarer cases of tripartite idioms the dichotomy was merely a practical step; e.g., in byt' |bez carja v golove 'to be stupid', lit. 'to be |without the Tsar in one's head', we regard the right part as a modifier. 
Two marks are used: idiom and possible idiom, the latter for collocations with both figurative and direct senses, e.g., sest' v luzu means 'to get into a mess' or 'to sit down into a puddle'.</Paragraph> <Paragraph position="21"> Usage marks Special, bookish or obsolete: the use in writing is recommended if the meaning is clear to the writer; colloquial: the use in official writing is not recommended; vulgar: both written and oral use are prohibited; and incorrect: sometimes used but contradicts language norms.</Paragraph> </Section> </Paper>