<?xml version="1.0" standalone="yes"?> <Paper uid="C92-3132"> <Title>SURFACE AND DEEP CASES</Title> <Section position="1" start_page="0" end_page="0" type="metho"> <SectionTitle> SURFACE AND DEEP CASES JARMILA PANEVOVA </SectionTitle> <Paragraph position="0"/> </Section> <Section position="2" start_page="0" end_page="0" type="metho"> <SectionTitle> Abstract </SectionTitle> <Paragraph position="0"> In this paper we show the relation between the &quot;surface (morphological) cases&quot; and &quot;deep cases&quot; (participants), and a possible way to automate the creation of a syntactic dictionary whose frames contain information about the deep cases and their morphemic counterparts for particular lexical items (Czech verbs).</Paragraph> <Paragraph position="1"> Introduction In the project MATRACE¹ (MAchine TRAnslation between Czech and English) the first aim is to create two parallel text corpora (Czech and English), morphologically and syntactically tagged. It will then be possible to use these corpora not only for creating an MT system but also for other linguistic research, needed e.g. for systems of NL understanding. For these purposes we try to make the syntactic representation &quot;broader&quot; so that further work will be easier.</Paragraph> <Paragraph position="2"> ¹ Project MATRACE, a research project of the Institute of Applied and Formal Linguistics and the Institute of Theoretical and Computational Linguistics, is carried out within the IBM Academic Initiative project in Czechoslovakia.</Paragraph> <Paragraph position="3"> In the syntactic representation of a sentence, based on dependency grammar, we will specify not only the dependency and syntactic roles of the modifications but also their underlying counterparts (i.e. &quot;deep cases&quot;). 
For this sort of tagging we need a dictionary with morphological and syntactic information, which consists of morphological paradigms of single words and their valency frames containing both syntactic and underlying roles of their members. As there is no such dictionary in machine-readable form we have to create it. Unfortunately we cannot even extract the words with their frames from an existing corpus, as we are only now creating it. What we have is a morphological dictionary, which is to be enriched with syntactic information. The linguist adding this information should enter the surface frame and specify its underlying counterpart. We try to help him/her by automating the choice of the appropriate correspondence between &quot;surface&quot; and &quot;deep&quot; cases.</Paragraph> <Paragraph position="4"> In this paper we concentrate on the verb and its valency slots. Generalizing our method to nouns and adjectives will not be difficult, as in many cases the syntactic frame of these words is simply derived from the corresponding verb.</Paragraph> <Paragraph position="5"> ACTES DE COLING-92, NANTES, 23-28 AOÛT 1992 885 PROC. OF COLING-92, NANTES, AUG. 23-28, 1992 Theoretical background Using the framework of the functional generative description (FGD, see Sgall et al. 1986), slightly simplified for the purpose of this paper, we distinguish two levels: a level of underlying structure (US, with the participants or &quot;deep cases&quot;) and a level of surface structure (SS, morphemic units as parts of this are used here). As for the modifications of verbs we distinguish inner participants and free modifications (see Panevová 1974-5). This can be understood as the paradigmatic classification of all possible verbal modifications. The other dimension of their classification (the combinatoric or syntagmatic dimension) concerns their obligatoriness or optionality with a particular lexical item within the verbal frame. 
The verbal frame contains slots for obligatory and optional inner participants (which will be filled by the labels for &quot;deep cases&quot; and the corresponding morphemic forms) and for obligatory free modifications. The difference between an obligatory and an optional participant is important for a parser; however, we leave this dichotomy aside in this contribution.</Paragraph> <Paragraph position="6"> The following operational criteria for distinguishing between inner participants and free modifications are used: If the verbal modification can occur only once with a single verb token and if the governing verbs for a particular modification can be listed, the modification is considered an &quot;inner participant&quot;. There are five participants: Actor, Objective, Addressee, Origin and Effect. The other modifications (Time, Locative, Direction, Aim, Reason, Instrument, Regard, Manner etc.) can reoccur with a single verb token and may modify any verb.</Paragraph> <Paragraph position="7"> With some verbs free modifications can also enter the respective verb frame: either the construction is ungrammatical without them (to behave HOW, to last HOW LONG, to live WHERE etc.) or they are semantically obligatory, although they can be omitted on the SS level. This can be tested by a dialogue of the following type: A. My friend came.</Paragraph> <Paragraph position="8"> B. Where? A. *I don't know.</Paragraph> <Paragraph position="9"> The unacceptability of the answer &quot;I don't know&quot; indicates that the modification where is part of the verbal frame of the verb to come.</Paragraph> <Paragraph position="10"> According to the theory proposed by Panevová (1974-5, esp. § 5) the following consequences are accepted here: If a verb has only one inner participant then this participant is Actor. If a verb has two participants then these are Actor and Objective. As for the 1st and 2nd participants our approach is similar to Tesnière's (1959). 
However, if three or even more slots of a verbal frame are occupied then semantic considerations are involved. This is different from Tesnière's solution and does not fully coincide with Fillmore's proposals (Fillmore 1968, 1970).</Paragraph> <Paragraph position="11"> Determining the Addressee, Origin and Effect is rather difficult and requires taking into account the combination of surface cases in the frame (including the form of the Objective), the animacy of single members of the frame etc. Though there is no one-to-one mapping between &quot;deep cases&quot; and &quot;surface cases&quot;, we are able to discover certain regularities and provide some generalization reflected in an algorithm.</Paragraph> <Paragraph position="12"> Observation In inflectional languages with (morphological) cases it is apparent that some cases are typical for certain participants.</Paragraph> <Paragraph position="13"> Objective is typically realized as the Accusative and Addressee as the Dative case. In Czech there are other typical (prepositional) cases. Thus z+Genitive (out of sb, st) or od+Genitive (from sb, st) are typical for Origin, na+Accusative (at st), do+Genitive (to st) or v+Accusative (into sb, st) are typical for Effect etc. This well-known fact led us to the idea of creating a program as a tool for introducing verbal frames (to be used even by researchers without deep linguistic training) based on correspondences between surface and deep cases. At first we sorted the Czech verbs into four groups: 1. Verbs without Nominative in their frames.</Paragraph> <Paragraph position="14"> Examples: prší [(it) rains] hučí mi (Act (Dat)) v hlavě [(it) is buzzing to me in head] (my head is buzzing) This group contains verbs with empty frames but also a few verbs with very untypical frames. If the frame contains only one participant, then this is obviously an Actor. 
If there are at least two participants in the frame and one of them is Dative, then this is the Actor. If, beside this, only one more participant occurs in the frame, it is necessarily the Objective. All other verbs must be treated individually by a linguist as a kind of exception.</Paragraph> <Paragraph position="15"> [from a seed grew a tree] to (Obj (Nom)) se mi (Act (Dat)) líbí [it to me appeals] (I like it)</Paragraph> <Paragraph position="16"/> <Paragraph position="17"> According to the theory, if the frame contains only one participant, it is Actor; if it contains two participants, one of them is Actor and the other is Objective. Nominative usually represents the Actor but there is an exception to this rule: if the other participant is in Dative, then this participant is the Actor and the Nominative represents the Objective. The reasonableness of this exception can be shown by translating particular verbs into other languages, in which the surface frames are different while there is no obvious reason why the deep frames should differ. Thus e.g. the verb líbit se has Nominative/Clause and Dative in its surface frame while in the frame of the corresponding English verb to like there are Subject and Object/Clause, where Subject corresponds to Czech Dative and Object to Nominative.</Paragraph> <Paragraph position="18"> 3. 
Verbs with Nominative and two or more other inner participants, which occur only in &quot;typical&quot; cases (i.e. Accusative, Dative, z+Genitive, od+Genitive, na+Accusative, do+Accusative, v+Accusative). A verb belongs to this group even if some of the slots for inner participants can be occupied either by a typical case or by any other (prepositional) case or a clause or infinitive.</Paragraph> <Paragraph position="19"> In this group Nominative always represents Actor but for determining other participants it is necessary to take into account an additional aspect, namely the prototypical character of the animacy of the participants; this enables us to distinguish the difference between deep frames of the two last examples jmenovat and obklopit. The surface frames are identical: Nominative, Accusative and Instrumental, but while the verb jmenovat has Accusative standing for the Objective and Instrumental for the Effect, the verb obklopit has Accusative standing for the function of Addressee and Instrumental for the function of Objective.</Paragraph> <Paragraph position="20"> Algorithmization The algorithms for the verbs of the first two groups were described in the previous paragraph. The possible algorithmization of determining the correspondences between &quot;surface&quot; and &quot;deep&quot; cases of the verbs of the last two groups can be seen from the following table of several We can see that the prepositional cases &quot;typical&quot; for Origin occur only in the position of Origin, and Dative occurs only in the position of Addressee. After these members of the surface frame are determined, in most cases only one undetermined participant remains, which must be Objective. If two or three participants remain, we have to take into account the animacy 
(typical for Addressee) and inanimacy of the participants and the set of prepositional cases which are typical for Effect.</Paragraph> <Paragraph position="21"> This algorithm is used in a program which reads Czech verbs from an input file and asks a linguist (in interactive mode) to fill in the surface verbal frame.</Paragraph> <Paragraph position="22"> Conclusions Some general linguistic statements concerning relations between &quot;centre&quot; (prototypes) and &quot;periphery&quot; (marginality) in the domain of the verb and its valency could be inferred from an application of the rules presented in our paper. In &quot;nominative&quot; languages the verbal frame Act Obj Addr can be considered as central (while e.g. Act (Obj) Addr is not typical). Moreover, the correspondences between US and SS such as Act -> Nom, Obj -> Acc, Addr -> Dat can be treated as prototypes (while e.g. the correspondences Act -> Dat, Addr -> Acc, Obj -> Instr occur in Czech as marginal). The strategy of our algorithm is based principally on an observation of this type. We assume that this method can be easily adapted for any other inflectional language and perhaps also for such languages as English. Languages may differ as to correspondences between a particular deep case (US) and its surface (morphemic) form, but the idea of prototypical and marginal relations seems to be valid and is supported by the algorithmic procedure for determining these correspondences.</Paragraph> </Section></Paper>