<?xml version="1.0" standalone="yes"?>
<Paper uid="W04-2110">
  <Title>A Very Large Dictionary with Paradigmatic, Syntagmatic, and Paronymic Links between Entries</Title>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3 Types of Syntagmatic Links
</SectionTitle>
    <Paragraph position="0"> We define a collocation as a syntactically connected and semantically compatible pair of content words, like full-length dress, well expressed, to briefly expose, to pick up the knife or to listen to the radio (collocation components are underlined).</Paragraph>
    <Paragraph position="1"> Syntactic connectedness is understood as in dependency grammars (Mel'cuk, 1995), possibly through an auxiliary word, and not as co-occurrence (Bentivogli and Pianta, 2002); the components can be distant in the sentence. We consider collocations ranging from absolutely free to purely idiomatic. The collocation types are as follows.</Paragraph>
    <Paragraph position="2"> Modifiers These are modifying or attributive components: great - country; man - of letters; eat - quickly; enormously - big; very - well.</Paragraph>
    <Paragraph position="3"> Verbs with their subjects The subject is a specific dependent of a predicate verb: soldier - died; bus - arrives. A specifically Russian type of the subject-to-predicate link is a predicate containing the copula byt' 'to be' (omitted in Russian in the present tense) and an adjectival in short form: god - zaversen (participle) 'the year is over'; vek - korotok (adjective) 'the lifetime is short'.</Paragraph>
    <Paragraph position="4"> Verbs with their noun complements Noun complements of a verb are all nouns dependent on the verb as direct, indirect, or prepositional objects: to read a book; to strive for peace. We also consider as complements circumstantial phrases like to travel by train. A word can have several complements; each collocation reflects one of them, while the omission of other obligatory complement(s) is marked with an ellipsis: to give ... to the boy.</Paragraph>
    <Paragraph position="5"> Nouns / adjectivals with their noun complements All POS can have noun complements, e.g., nouns: the capital of the country, the struggle against poverty; adjectivals: blind with rage, mentioned by the observer, or going to the cinema.</Paragraph>
    <Paragraph position="6"> Verbs / nouns / adjectivals with their infinitive complements E.g. to stop to talk or to permit to enter; permission to enter or cream to protect; forced to return or ready to appeal.</Paragraph>
    <Paragraph position="7"> Adverbials with their infinitive complements These are purely Russian collocations: xolodno idti 'it is cold to go', lit. 'coldly to go'; reshiv (gerund) idti 'after having decided to go'. They are possible only with some predicative adverbs or gerunds.</Paragraph>
    <Paragraph position="8"> Adverbials with noun complements Purely Russian: xolodno (adverb) bez pal'to 'it is cold without a coat', lit. 'coldly without a coat'; pobyvav (gerund) v centre 'after visiting the center'.</Paragraph>
    <Paragraph position="9"> Verbs / adjectivals with their adjectival complements E.g., to remain silent or to consider ... stupid; remaining silent or considering ... stupid.</Paragraph>
    <Paragraph position="10"> Coordinated pairs E.g. mom and dad, safe and sound, or sooner or later, cf. (Bolshakov et al., [...] Synonymy groups have a dominant member and may include member(s) marked as its absolute synonyms. Synonyms can be periphrastic multiwords or even short definitions: to help [?] to give help; fall [?] quick descent; suffocation [?] lack of fresh air. Non-absolute synonyms can be used for heuristic inferences of new collocations from those existing in the dictionary (Bolshakov and Gelbukh, 2002a).</Paragraph>
    <Paragraph position="11"> Hyponyms vs. hyperonyms Hyperonyms are also used for such inferences.</Paragraph>
    <Paragraph position="12"> Antonyms Together with standard antonyms (good - bad, vanguard - rearguard), we consider opposite notions: missiles - antimissiles.</Paragraph>
    <Paragraph position="13"> Meronyms vs. holonyms E.g. finger - hand, motor - car.</Paragraph>
    <Paragraph position="14"> Semantic derivates They connect parts of the morphoparadigms split into grammemes. Also, they describe the same idea from various aspects, thus compensating for the lack of glosses.</Paragraph>
  </Section>
  <Section position="5" start_page="0" end_page="0" type="metho">
    <SectionTitle>
5 Types of Paronymic Links
</SectionTitle>
    <Paragraph position="0"> The types of such links are as follows: Literal paronyms They are at a distance of a few editing operations (replacement, omission, insertion, permutation of adjacent letters) from each other. E.g., for sign: sigh, sin, sing. They are useful, e.g., for correcting malapropisms (Bolshakov and Gelbukh, 2003a).</Paragraph>
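The notion of literal paronyms can be illustrated with a minimal sketch. It assumes that "a few editing operations" is measured with the optimal string alignment distance (a restricted Damerau-Levenshtein distance), which counts exactly the four operations listed above. The function names, the toy vocabulary, and the threshold of one operation are illustrative assumptions, not part of CrossLexica.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance: replacement, omission,
    insertion, and permutation of adjacent letters each cost 1."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i letters of a
    for j in range(cols):
        d[0][j] = j  # insert all j letters of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # omission
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # replacement (or match)
            )
            # permutation of two adjacent letters
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[rows - 1][cols - 1]


def literal_paronyms(word, vocabulary, max_distance=1):
    """Return vocabulary words within max_distance edits of word (hypothetical helper)."""
    return sorted(
        w for w in vocabulary
        if w != word and max_distance >= osa_distance(word, w)
    )


# Toy vocabulary, assumed for illustration only.
vocabulary = ["sigh", "sin", "sing", "song", "signal"]
print(literal_paronyms("sign", vocabulary))
```

With this toy vocabulary, sigh, sin, and sing are each one operation away from sign, while song and signal need two and are filtered out.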
    <Paragraph position="1"> Morphemic paronyms They are of the same POS and radix but have different prefixes and/or suffixes, e.g., sens-ation-al, sens-ible, in-sens-ible, sens-itive, sens-less, sens-ual. Foreigners' malapropisms are often confusions of morphemic paronyms, so we can immediately propose candidates for correcting such errors.</Paragraph>
    <Paragraph position="2"> Auxiliary parts of CrossLexica are a Russian-English-Russian dictionary (e.g., by two English words, the user can find a fluent Russian collocation) and a generator of all morphological forms.</Paragraph>
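As a rough illustration of the definition of morphemic paronyms, the sketch below groups words that share both POS and radix. It assumes each word already comes with a hand-assigned POS tag and radix; the tuple layout and the function name are hypothetical and say nothing about how CrossLexica stores or derives this information.

```python
from collections import defaultdict


def morphemic_paronym_groups(entries):
    """Group words sharing POS and radix; entries are (word, pos, radix) tuples."""
    groups = defaultdict(list)
    for word, pos, radix in entries:
        groups[(pos, radix)].append(word)
    # A word with no partner in its group has no morphemic paronyms.
    return {key: sorted(words) for key, words in groups.items() if len(words) > 1}


# Hand-annotated toy data, assumed for illustration only.
entries = [
    ("sensible", "adj", "sens"),
    ("sensitive", "adj", "sens"),
    ("sensual", "adj", "sens"),
    ("sensation", "noun", "sens"),
    ("sign", "noun", "sign"),
]
print(morphemic_paronym_groups(entries))
```

Given a suspicious word, the other members of its group would be the immediate correction candidates.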
  </Section>
  <Section position="6" start_page="0" end_page="0" type="metho">
    <SectionTitle>
6 Interlingual Structural Universality
</SectionTitle>
    <Paragraph position="0"> The system operates with two main data structures: a list of entries and a set of links between them. An entry contains a list of its morphological categories.</Paragraph>
    <Paragraph position="1"> This structure is language-independent.</Paragraph>
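The two data structures can be pictured with the following minimal sketch. Only the overall shape (a list of entries, each with its morphological categories, plus a set of typed links between entries) comes from the text; the class names, field names, and link-type labels are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Link:
    source: str     # headword of the entry the link starts from
    target: str     # headword of the linked entry
    link_type: str  # hypothetical labels, e.g. "modifier", "antonym"


@dataclass
class Dictionary:
    # headword -> list of morphological categories
    entries: dict = field(default_factory=dict)
    links: set = field(default_factory=set)

    def add_entry(self, headword, categories):
        self.entries[headword] = list(categories)

    def add_link(self, source, target, link_type):
        # Links are only allowed between existing entries.
        if source not in self.entries or target not in self.entries:
            raise KeyError("both ends of a link must be dictionary entries")
        self.links.add(Link(source, target, link_type))

    def linked(self, headword, link_type):
        return sorted(
            link.target for link in self.links
            if link.source == headword and link.link_type == link_type
        )


d = Dictionary()
d.add_entry("country", ["noun"])
d.add_entry("great", ["adjective"])
d.add_link("country", "great", "modifier")
print(d.linked("country", "modifier"))
```

Nothing in this layout depends on a particular language; language-specific behaviour would enter only through the inventory of link types and morphological categories.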
    <Paragraph position="2"> The specific links between entries can, however, be language specific. Let us outline grammatical peculiarities of Russian that influence these links.</Paragraph>
    <Paragraph position="3"> Nouns and adjectivals declinable In English this problem does not exist.</Paragraph>
    <Paragraph position="4"> Too few tenses Russian verbs have only three tenses, whereas English has many.</Paragraph>
    <Paragraph position="5"> No articles For other languages, it is important to specify the forms of articles in collocations.</Paragraph>
    <Paragraph position="6"> Nouns cannot modify nouns In English, collocations like book review are quite common. A special attributive type of syntagmatic links should be introduced for such English collocations.</Paragraph>
    <Paragraph position="7"> Thus the CrossLexica structure is (almost) linguistically universal.</Paragraph>
  </Section>
  <Section position="7" start_page="0" end_page="0" type="metho">
    <SectionTitle>
7 CrossLexica Statistics and Some Discussion
</SectionTitle>
    <Paragraph position="0"> As of April 2004, the dictionary contains more than 120,000 entries. Collocations are divided into three classes: primary, secondary, and inferred.</Paragraph>
    <Paragraph position="1"> The primary collocations are collected manually.</Paragraph>
    <Paragraph position="2"> The secondary collocations result from automatic morphological transformations of the primary ones.</Paragraph>
    <Paragraph position="3"> For example, verbs with their noun complements are transformed into adjectivals with their noun complements, e.g., to participate in the meeting gives participating in the meeting.</Paragraph>
    <Paragraph position="4"> Table 1 shows the statistics of the collocations.</Paragraph>
    <Paragraph position="5"> The inferences are performed with constraints (Bolshakov and Gelbukh, 2002a), e.g., the source collocation cannot be an idiom, to avoid inferences like (hot dog)_idiom &amp; (poodle IS_A dog) → *(hot poodle). The total of the inferred collocations has never exceeded 6 to 8% of the primaries and is declining because the rare species are getting a full description within the primaries.</Paragraph>
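The idiom constraint can be sketched as follows. This is a simplified illustration under stated assumptions: collocations are (modifier, head, is_idiom) triples, the hyponym relation is a plain mapping, and only the single constraint mentioned above is applied. The actual constraint set of Bolshakov and Gelbukh (2002a) is not reproduced here, and the collocation big dog is an invented example.

```python
def infer_collocations(collocations, is_a):
    """Transfer non-idiomatic collocations from a hyperonym to its hyponyms.

    collocations: list of (modifier, head, is_idiom) triples
    is_a: dict mapping a hyponym to its hyperonym
    """
    known = {(modifier, head) for modifier, head, _ in collocations}
    inferred = []
    for modifier, head, is_idiom in collocations:
        if is_idiom:
            continue  # constraint: an idiom cannot be a source of inference
        for hyponym, hyperonym in is_a.items():
            if hyperonym == head and (modifier, hyponym) not in known:
                inferred.append((modifier, hyponym))
    return sorted(inferred)


# Toy data, assumed for illustration only.
collocations = [("hot", "dog", True), ("big", "dog", False)]
is_a = {"poodle": "dog"}
print(infer_collocations(collocations, is_a))
```

Because hot dog is flagged as an idiom, hot poodle is never produced, whereas the free collocation big dog is carried over to poodle.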
    <Paragraph position="6"> In Table 2, other link statistics are given. All links are counted, e.g., n antonym pairs give 2n unilateral links, and a group of n synonyms gives n(n-1)/2. The total is more than 1.2 million. Thus, the total of links of the three classes is 3.6 million.</Paragraph>
  </Section>
</Paper>