<?xml version="1.0" standalone="yes"?> <Paper uid="W04-2711"> <Title>Valency Frames of Czech Verbs in VALLEX 1.0</Title> <Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> 3 Logical Structure of the VALLEX Data </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.1 Word Entries </SectionTitle> <Paragraph position="0"> At the topmost level, VALLEX 1.0 is divided into word entries (the HTML 'graphical' layout of a word entry is depicted in Fig. 1). Each word entry relates to one or more headword lemmas⁵ (Sec. 3.2). The word entry consists of a sequence of frame entries (Sec. 3.5) relevant for the lemma(s) in question (where each frame entry usually corresponds to one of the lemma's meanings).</Paragraph> <Paragraph position="1"> Information about the aspect (Sec. 3.16) of the lemma(s) is assigned to each word entry as a whole.</Paragraph> <Paragraph position="2"> Most of the word entries correspond to lemmas in a simple one-to-one manner, but the following two non-trivial situations (and even combinations of them) also appear in VALLEX 1.0: • lemma variants (Sec. 3.3) • homonyms (Sec. 3.4) The content of a word entry roughly corresponds to the traditional notion of a lexeme.</Paragraph> <Paragraph position="3"> ⁵ Remark on terminology: The terms used here either belong to broadly accepted linguistic terminology, or come from the Functional Generative Description (FGD), which we have used as the background theory, or are defined elsewhere in this text.</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.2 Lemmas </SectionTitle> <Paragraph position="0"> By the lemma of a verb we understand the infinitive form of the respective verb, in the case of a homonym (Sec. 
3.4) followed by a Roman numeral in superscript (which is to be considered an inseparable part of the lemma in VALLEX 1.0!).</Paragraph> <Paragraph position="1"> The reflexive particles se or si are part of the infinitive only if the verb is a reflexive tantum, whether primary (e.g. bát se) or derived (e.g. zabít se, šířit se, vrátit se).</Paragraph> </Section> <Section position="3" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.3 Lemma Variants </SectionTitle> <Paragraph position="0"> Lemma variants are groups of two (or more) lemmas that are interchangeable in any context without any change of meaning (e.g. dovědět se/dozvědět se). Usually the only difference is a small alternation in the morphological stem, which might be accompanied by a subtle stylistic shift (e.g. myslet/myslit, the latter being bookish).</Paragraph> <Paragraph position="1"> Moreover, although the infinitive forms of the variants differ in spelling, some of their conjugated forms are often identical (mysli (imper.sg.) both for myslet and myslit).</Paragraph> <Paragraph position="2"> The term 'lemma variants' should not be confused with the term 'synonymy'.</Paragraph> </Section> <Section position="4" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.4 Homonyms </SectionTitle> <Paragraph position="0"> There are pairs of word entries in VALLEX 1.0 whose lemmas have the same spelling but differ considerably in their meanings (there is no obvious semantic relation between them). They might also differ in their etymology (e.g. nakupovatᴵ - to buy vs. nakupovatᴵᴵ - to heap), aspect (Sec. 3.16) (e.g. stačitᴵ pf. - to be enough vs. stačitᴵᴵ impf. - to catch up with), or conjugated forms (žilo (past.sg.fem) for žítᴵ - to live vs. žalo (past.sg.fem) for žítᴵᴵ - to mow). Such lemmas (homonyms)⁶ are distinguished by Roman numbering in superscript. 
These numbers should be understood as an inseparable part of the lemma in VALLEX 1.0.</Paragraph> </Section> </Section> <Section position="5" start_page="0" end_page="0" type="metho"> <SectionTitle> 3.5 Frame Entries </SectionTitle> <Paragraph position="0"> Each word entry consists of a non-empty sequence of frame entries, typically corresponding to the individual meanings (senses) of the headword lemma(s) (from this point of view, VALLEX 1.0 can be classified as a Sense Enumerated Lexicon).</Paragraph> <Paragraph position="1"> ⁶ Note on terminology: we have adopted the term 'homonyms' from Czech linguistic literature, where it traditionally stands for what was stated above (words identical in spelling but considerably different in meaning); in English literature the term 'homographs' is sometimes used to express the same notion.</Paragraph> <Paragraph position="2"> The frame entries are numbered within each word entry; in the VALLEX 1.0 notation, the frame numbers are attached to the lemmas as subscripts.</Paragraph> <Paragraph position="3"> The ordering of frames is not completely random, but it is not perfectly systematic either. So far it is based only on the following weak intuition: primary and/or the most frequent meanings should go first, whereas rare and/or idiomatic meanings should go last. (We do not guarantee that the ordering of meanings in this version of VALLEX 1.0 exactly matches their frequency of occurrence in contemporary language.) Each frame entry⁷ contains a description of the valency frame itself (Sec. 3.6) and of the frame attributes (Sec. 3.13).</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.6 Valency Frames </SectionTitle> <Paragraph position="0"> In VALLEX 1.0, a valency frame is modeled as a sequence of frame slots. 
Each frame slot corresponds to one (either required or specifically permitted) complementation⁸ of the given verb.</Paragraph> <Paragraph position="1"> The following attributes are assigned to each slot: • functor (Sec. 3.7) • list of possible morphemic forms (realizations) (Sec. 3.8) • type of complementation (Sec. 3.11) Some slots tend to systematically occur together. In order to capture this type of regularity, we introduced the mechanism of slot expansion (Sec. 3.12) (the full valency frame is obtained after performing these expansions).</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.7 Functors </SectionTitle> <Paragraph position="0"> In VALLEX 1.0, functors (labels of 'deep roles'; similar to theta-roles) are used for expressing types of relations between verbs and their complementations. According to FGD, functors are divided into inner participants (actants) and free modifications (this division roughly corresponds to the argument/adjunct dichotomy). In VALLEX 1.0, we also distinguish an additional group of quasi-valency complementations.</Paragraph> <Paragraph position="1"> Functors which occur in VALLEX 1.0 are listed in the following tables (for Czech sample sentences see (Lopatková et al., 2002), page 43): Inner participants: [functor tables not preserved] In addition, the value DIR occurs in the VALLEX 1.0 data. It is used only as a special symbol for slot expansion (Sec. 3.12). The set of functors in FGD is richer than that shown above; moreover, it is still being elaborated within the Prague Dependency Treebank. We do not use its full (current) set in VALLEX 1.0 for several reasons. Some functors do not occur with a verb at all (e.g. APP - appurtenance, 'my.APP dog'); some other functors can occur there, but represent a relation other than dependency (e.g. coordination, 'Jim or.CONJ Jack'). And still others can occur with verbs as well, but their behaviour is absolutely independent of the head verb, thus they have nothing to do with valency frames (e.g. 
ATT - attitude, 'He did it willingly.ATT').</Paragraph> </Section> <Section position="3" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.8 Morphemic Forms </SectionTitle> <Paragraph position="0"> In a sentence, each frame slot can be expressed by a limited set of morphemic means, which we call forms. In VALLEX 1.0, the set of possible forms is defined either explicitly (Sec. 3.9) or implicitly (Sec. 3.10). In the former case, the forms are enumerated in a list attached to the given slot. In the latter case, no such list is specified, because the set of possible forms is implied by the functor of the respective slot (in other words, all forms that can express the given functor may appear).</Paragraph> </Section> <Section position="4" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.9 Explicitly Declared Forms </SectionTitle> <Paragraph position="0"> The list of forms attached to a frame slot may contain values of the following types: • Pure (prepositionless) case. There are seven morphological cases in Czech. In the VALLEX 1.0 notation, we use their traditional numbering: 1 - nominative, 2 - genitive, 3 - dative, 4 - accusative, 5 - vocative, 6 - locative, and 7 - instrumental.</Paragraph> <Paragraph position="1"> • Prepositional case. The lemma of the preposition (i.e., the preposition without vocalization) and the number of the required morphological case are specified (e.g., z+2, na+4, o+6, ...). The prepositions occurring in VALLEX 1.0 are the following: bez, do, jako, k, kolem, kvůli, mezi, místo, na, nad, na úkor, o, od, ohledně, okolo, oproti, po, pod, podle, pro, proti, před, přes, při, s, u, v, ve prospěch, vůči, v zájmu, z, za. ('jako' is traditionally considered a conjunction, but it is included in this list, as it requires a particular morphological case in some valency frames.) • Subordinating conjunction. The lemma of the conjunction is specified. 
The following subordinating conjunctions occur in VALLEX 1.0: aby, ať, až, jak, zda,⁹ že.</Paragraph> <Paragraph position="2"> • Infinitive construction. The abbreviation 'inf' stands for an infinitive verbal complementation. 'inf' can appear together with a preposition (e.g. 'než+inf'), but this happens very rarely in Czech.</Paragraph> <Paragraph position="4"> • Construction with adjectives. The abbreviation 'adj-digit' stands for an adjective complementation in the given case, e.g. adj-1 (Cítím se slabý - I feel weak). • Constructions with 'být'. The infinitive of the verb 'být' (to be) may combine with some of the types above, e.g. být+adj-1 (e.g. zdá se to být dostatečné - it seems to be sufficient).</Paragraph> <Paragraph position="6"> • Part of phraseme. If the set of possible lexical values of the given complementation is very small (often one-element), we list these values directly (e.g. 'napospas' for the phraseme 'ponechat napospas' - to expose).</Paragraph> <Paragraph position="7"> If no forms are listed explicitly for a frame slot, then the list of possible forms implicitly results from the functor of the slot according to the following (as yet incomplete) lists: [lists not preserved]</Paragraph> <Paragraph position="9"> ⁹ Note: the form 'zda' is in fact an abbreviation for the pair of conjunctions 'zda' and 'jestli'.</Paragraph> <Paragraph position="10"> Within the FGD framework, valency frames (in a narrow sense) consist only of inner participants (both obligatory¹⁰ and optional, 'obl' and 'opt' for short) and obligatory free modifications; the dialogue test was introduced by Panevová as a criterion for obligatoriness. In VALLEX 1.0, valency frames are enriched with quasi-valency complementations. Moreover, a few non-obligatory free modifications occur in valency frames too, since they are typically ('typ') related to some verbs (or even to whole classes of them) and not to others. 
(The other free modifications can occur with the given verb too, but are not contained in the valency frame, as mentioned above (Sec. 3.7).) The attribute 'type' is attached to each frame slot and can have one of the following values: 'obl' or 'opt' for inner participants and quasi-valency complementations, and 'obl' or 'typ' for free modifications.</Paragraph> <Paragraph position="11"> Some slots tend to occur together systematically. For instance, verbs of motion can often be modified with a direction-to and/or direction-through and/or direction-from modifier. We decided to capture this type of regularity by introducing the abbreviation flag for a slot. If this flag is set (in the VALLEX 1.0 notation it is marked with an upward arrow), the full valency frame is obtained after slot expansion.</Paragraph> <Paragraph position="12"> If one of the frame slots is marked with the upward arrow (in the XML data, attribute 'abbrev' is set to 1), then the full valency frame is obtained by replacing this slot with a sequence of slots as follows: [expansion rules not preserved]</Paragraph> <Paragraph position="14"> ¹⁰ It should be emphasized that in this context the term obligatoriness is related to the presence of the given complementation in the deep (tectogrammatical) structure, and not to its (surface) deletability in a sentence (moreover, the relation between deep obligatoriness and surface deletability is not at all straightforward in Czech).</Paragraph> <Paragraph position="15"> In VALLEX 1.0, frame attributes (more exactly, attribute-value pairs) are either obligatory or optional. The former have to be filled in every frame. The latter might be empty, either because they are not applicable (e.g. some verbs have no aspectual counterparts), or because the annotation was not finished (e.g. attribute class (Sec. 
3.15) is filled in only roughly one third of frames).</Paragraph> <Paragraph position="16"> Obligatory frame attributes: • gloss - a verb or paraphrase roughly synonymous with the given frame/meaning; this attribute is not meant to serve as a source of synonyms, let alone as a genuine lexicographic definition - it should be used just as a clue for fast orientation within the word entry! • example - sentence(s) or sentence fragment(s) containing the given verb used with the given valency frame.</Paragraph> </Section> </Section> <Section position="6" start_page="0" end_page="0" type="metho"> <SectionTitle> 3.14 Control </SectionTitle> <Paragraph position="0"> The term 'control' relates in this context to a certain type of predicates (verbs of control)¹¹ and two coreferential expressions, a 'controller' and a 'controllee'. In VALLEX 1.0, control is captured in the data only in the situation where a verb has an infinitive modifier (regardless of its functor). Then the controllee is the element that would be the 'subject' of the infinitive (which is structurally excluded on the surface), and the controller is the co-indexed expression. 
In VALLEX 1.0, the type of control is stored in the frame attribute 'control' as follows: • if there is a coreferential relation between the (unexpressed) subject ('controllee') of the infinitive verb and one of the frame slots of the head verb, then the attribute is filled with the functor of this slot ('controller'); ¹¹ Note on terminology: in English literature the terms 'equi verbs' and 'raising verbs' are used in a similar context.</Paragraph> </Section> <Section position="7" start_page="0" end_page="0" type="metho"> <SectionTitle> 3.15 Class </SectionTitle> <Paragraph position="0"> Some frames are assigned semantic classes like 'motion', 'exchange', 'communication', 'perception', etc.</Paragraph> <Paragraph position="1"> However, we admit that this classification is tentative and should be understood merely as an intuitive grouping of frames, rather than a properly defined ontology.</Paragraph> <Paragraph position="2"> The motivation for introducing such a semantic classification in VALLEX 1.0 was that it simplifies systematic consistency checking and allows more general observations to be made about the data.</Paragraph> <Paragraph position="3"> 3.16 Aspect, Aspectual Counterparts Perfective verbs (in VALLEX 1.0 marked as 'pf.' for short) and imperfective verbs (marked as 'impf.') are distinguished in Czech; this characteristic is called aspect. In VALLEX 1.0, the value of aspect is attached to each word entry as a whole (i.e., it is the same for all its frames and it is shared by the lemma variants, if any).</Paragraph> <Paragraph position="4"> Some verbs (e.g. informovat - to inform, charakterizovat - to characterize) can be used in different contexts either as perfective or as imperfective (obouvidová slovesa, 'biasp.' for short).</Paragraph> <Paragraph position="5"> Within imperfective verbs, there is a subclass of iterative verbs (iter.). 
Czech iterative verbs are derived in a more or less regular way by affixes such as -va- or -iva-, and express extended and repetitive actions (e.g. čítávat, chodívat). In VALLEX 1.0, iterative verbs containing the double affix -va- (e.g. chodívávat) are completely disregarded, whereas the remaining iterative verbs occur as aspectual counterparts in frame entries of the corresponding non-iterative verbs (but still have no word entries of their own).</Paragraph> <Paragraph position="6"> A verb in a particular meaning can have aspectual counterpart(s) - verbs whose meaning is almost the same except for the difference in aspect (that is why the counterparts constitute a single lexical unit on the tectogrammatical level of FGD; however, each of them has its own word entry in VALLEX 1.0, because they have different morphemic forms). The aspectual counterpart(s) need not be the same for all the meanings of the given verb, e.g., odpovědět is a counterpart of odpovídat - to answer, but not of odpovídat - to correspond. Therefore the aspectual counterparts (if any) are listed in the frame attribute 'asp. counterparts' in VALLEX 1.0. Moreover, for perfective or imperfective counterparts, not only the lemmas are specified within the list, but (more specifically) also the frame numbers of the counterpart frames (which is of course not the case for the iterative counterparts, for they have no word entries of their own as stated above).</Paragraph> <Paragraph position="7"> One frame might have more than one counterpart for two reasons. Either there are two counterparts with the same aspect (impf. působit and impf. způsobovat for pf. způsobit), or there are two counterparts with different aspects (impf. scházet, pf. sejít, iter. 
scházívat).</Paragraph> </Section> <Section position="8" start_page="0" end_page="0" type="metho"> <SectionTitle> 3.17 Idiomatic frames </SectionTitle> <Paragraph position="0"> When building VALLEX 1.0, we focused mainly on primary or usual meanings of verbs. We also noted many frames corresponding to peripheral usages of verbs; however, their coverage in VALLEX 1.0 is not exhaustive. We call such frames idiomatic and mark them with the label 'idiom'.</Paragraph> <Paragraph position="1"> An idiomatic frame is tentatively characterized either by a substantial shift in meaning (with respect to the primary sense), or by a small and strictly limited set of possible lexical values in one of its complementations, or by the occurrence of other types of irregularity or anomaly.</Paragraph> </Section> </Paper>