<?xml version="1.0" standalone="yes"?> <Paper uid="C90-2001"> <Title>A Grammar Combining Phrase Structure and Field Structure</Title> <Section position="2" start_page="0" end_page="0" type="metho"> <SectionTitle> 1 Motivations for field structure </SectionTitle> <Paragraph position="0"> Descriptive grammatical works on Germanic languages often refer to notions such as field and schema, some early, major works being [Dider46] and [Drach37]. Recently, [Rue87] and [Togeby88] have argued that field grammars in Diderichsen's tradition are useful for the computational analysis of Danish. If they are right, the same is obviously true for the other Scandinavian languages and possibly other Germanic languages as well.</Paragraph> <Paragraph position="1"> A major motivation for field structure in the Scandinavian languages is the correlation between the position of a constituent and grammatical function. For instance, a NP occurring after the finite verb but before the sentence adverbs, i.e. in the field that Diderichsen termed Neksusfelt, is a subject, while NPs appearing after the sentence adverbs, in the Indholdsfelt (content field), are objects. As main clauses have no surface VP-node consisting solely of the verb and its complements, a configurational definition of subjects and objects is less appealing.</Paragraph> <Paragraph position="2"> There is a correlation also between positions and thematic functions, the classical example being Diderichsen's Fundament (foundation), the position before the finite verb, which holds thematically prominent constituents of various kinds.</Paragraph> <Paragraph position="3"> A second motivation is that the word order regularities of clause types are better described if we have access to field structure. 
In a phrase structure grammar we either have to write a separate rule, or rule schema, for each clause type, or else introduce powerful rules such as transformations or meta-rules to capture differences and similarities. Field structure can be used to express the common traits directly: the schema in figure 1 applies to virtually all Swedish clause types. Moreover, variation can be accounted for in terms of constraints on what may occur in the fields, and such constraints may be expressed by regular expressions. Thus, the incorporation of field structure into a formalism does not add to its computational complexity.</Paragraph> <Paragraph position="4"> 2 Field structure vs. phrase structure It is obvious that schemas such as that of figure 1 can be defined by context-free rewrite rules, each of which specifies a number of subfield relations and a sequential order for the subfields. The rules below together define the schema in figure 1, which we name PS.</Paragraph> <Paragraph position="6"> The simplest way of formalizing a field grammar is to define an appropriate set of rules of this kind and, if we want to derive a functional structure, associate the rules and lexical entries with functional information. This is essentially the approach taken by [Rue87] and by [Togeby88]. As a result the field notion is merged with the notion of constituent. It</Paragraph> <Paragraph position="7"> is indeed often said that an advantage of Diderichsen's analysis is that it offers a better constituent analysis of Danish than, say, traditional TG. This is not so, however. On the contrary, it is one of the</Paragraph> <Paragraph position="9"> [Telem72, Braunm86]). 
Instead, field structure is better conceived of as a level that accounts for the linearization of independently defined constituents.</Paragraph> <Paragraph position="10"> While such a conception of field structure is more restricted, it is more motivated and equally amenable to formalization. The formalism must deal with two types of information for a constituent, however: its category and the field(s) it occurs in. Also, we need to distinguish carefully the dominance relations for fields (superfield) and categories (dominates), as they differ in their logical properties. Here only two important differences will be noted: [1] Fields transmit expressions, categories don't. Given an expression, e, that is situated in a field, f, it will also be situated in every field that is a superfield of f. Conversely, fields generally allow multiple occurrences of constituents (incl. none; cf. figure 1), while categories categorize exactly one constituent at a time. [2] The superfield relation is non-recursive, which means that schemas have a finite number of elements. The dominates relation, on the other hand, allows recursion in the usual way.</Paragraph> </Section> <Section position="3" start_page="0" end_page="0" type="metho"> <SectionTitle> 3 Field-and-Category Grammars </SectionTitle> <Paragraph position="0"> Field-and-Category Grammars (henceforth FCG) may, with functional schemas included, be regarded as a generalization of Lexical-Functional Grammars [LFG82]. There is then the usual division between two structural levels, but as the surface level includes information about the position of a constituent in a relevant schema, and not just category, we refer to it as a topological structure, or t-structure. For the purposes of this paper the f-structure may be taken as in LFG.</Paragraph> <Paragraph position="1"> A t-structure for a simple sentence is illustrated in figure 2. 
The rules necessary for the generation of t-structures form a Basic FCG.</Paragraph> <Paragraph position="2"> A schema is defined as a maximal field. A position is a terminal field. An identifier position is a position that admits an identifier of a phrase, such as a lexical head.</Paragraph> <Paragraph position="3"> Categories are ordered by a subsumption relation. An actual category is a category that does not subsume any other category; an abstract category is one that does. Abstract categories express what is common to a class of actual categories.</Paragraph> <Paragraph position="4"> Figure 2: Hann du inte träffa Peter? (Didn't you manage to see Peter?). Nodes are labelled C∈p, where C indicates the category and p the position of the dominated string.</Paragraph> <Paragraph position="5"> A configuration is a triple [D, C, p] where D is an actual category, C is an actual category or a word, and p is a position. A configuration corresponds to a branch of a local tree of a t-structure. D is the category of the dominating node, C the category of a dominated node and p the position of the latter in the schema associated with the former. Conversely, a local tree can be represented as a multiset of configurations that have a common first element. For instance, the top local tree of figure 2 corresponds to the set {[PolS, V, v], [PolS, NP, nex], [PolS, SA, nex], [PolS, InfS, comp]}. Multisets are required in the general case as there may be several daughters with the same category and position.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.1 Rule types </SectionTitle> <Paragraph position="0"> 3.1.1 Field structure rules Field structure rules define the internal structure of fields in terms of subfields. In addition, they assign each position an occurrence index, stating the maximum number of fillers it takes. 
I will write p* to indicate a position with any number of fillers, p^n for a position with a maximum of n fillers, (p) for a position with one optional filler, and simply p for a position with one obligatory filler. The rules in (1) may be expanded as below, where a simplified rule for the noun phrase schema is also stated.</Paragraph> <Paragraph position="1"> Table 1: Basic topological constraints: p must not be empty; p must be empty; p must contain an A; p must contain word w; p may only contain an A; p may only contain word w; p must not contain an A. Category definitions define necessary properties of categories. They may be written as 4-tuples (C, C', T, F) where C is defined as a subcategory of C' meeting the topological constraints T and the functional constraints F.</Paragraph> <Paragraph position="2"> Basic topological constraints state what must or may occur in a specific position. A list of basic topological constraints is found in table 1. The element T is a conjunction of such basic constraints, or a schema symbol. In the latter case the definition includes a category-schema association, which says that the category, and, by implication, all its subcategories, are linearized by the named schema. The other constraints give information about what occurs in specific positions of that schema.</Paragraph> <Paragraph position="3"> The functional constraints are written as conjunctions of attribute-value assignments and value constraints. A single attribute name indicates that this attribute must have a value at f-structure. Some examples of category definitions are given below. Together they define an inheritance hierarchy of constituent categories, where properties of a category high up in the hierarchy are shared by categories below it. 
Topological properties that are common to a set of actual categories are expressed at their common ancestors, and, in particular, by the common schema they inherit.</Paragraph> <Paragraph position="5"> PolS is defined as a verb-first clause (V1S), which in (def3) is defined as a main clause (MainS), which in turn is defined as a clause (S). Being a clause it is linearized by E according to (csa1) and its f-structure must have a subject, a semantic form and a verbal property. Being a main clause it has a verb in position v (def1). Being a verb-first clause it has an empty foundation. In distinction to other verb-first</Paragraph> <Paragraph position="6"> clauses it has a finite verb form, and an expressed subject in position nex.</Paragraph> <Paragraph position="7"> While category definitions state what must hold of a given category, configuration rules state what may hold of any category. Each configuration of the language is defined by some configuration rule. A configuration rule may be written as a list of the form (CS, F, i) where CS is a description of a set of configurations, F is a conjunction of functional constraints and i is an occurrence index. We take advantage of the category hierarchy and use abstract categories in the description of configuration sets. Three illustrations are given below: conf1: ([S, NP, F], ↑SUBJ=↓, 1) conf2: ([S, NP, nex], ↑SUBJ=↓, 1) conf3: ([S, SA, nex], ↑=↓, *) The arrows, ↑ and ↓, are used as in LFG: the up-arrow identifies the f-structure of the dominating node of the configuration, whereas the down-arrow identifies the f-structure of the dominated node.</Paragraph> <Paragraph position="8"> The first two rules state the possible subject configurations of Swedish. They apply to all subcategories of S and NP, unless this contradicts the definitions of these categories. For instance, (conf1) 
does not apply to a V1S as defined in (def3).</Paragraph> <Paragraph position="9"> The last two rules both define fillers of position 'nex' without ordering them. The third rule defines an iterative configuration, as indicated by its occurrence index. Thus, the subject is allowed to take different relative positions w.r.t. the sentence adverbs, in agreement with the facts illustrated in (4)-(6). In this way fields serve to define borders for local word order variation.</Paragraph> <Paragraph position="10"> (4) I natt var katten nog inte ute.</Paragraph> <Paragraph position="11"> last-night was the-cat nog not out &quot;Probably, the cat wasn't outdoors last night&quot; (5) I natt var nog katten inte ute. last-night was nog the-cat not out (6) I natt var nog inte katten ute.</Paragraph> <Paragraph position="12"> A lexical rule may be written in the form (w, C, T, F) where the lexical item w is assigned a category, a (usually null) topological constraint and some functional information. Three illustrations are given in In order to be well-formed an expression of a FCG must have both a well-formed t-structure and a well-formed f-structure. 
We omit the requirements of well-formed f-structures as they can be taken to coincide with those of a LFG.</Paragraph> <Paragraph position="13"> A topological structure, T, is well-formed according to a FCG, G, iff the following conditions hold: (i) Each node of T is assigned an actual category and every node apart from the top-node is assigned a position; (ii) Any local tree, L, of T, with top-node category, C, satisfies the following conditions: (a) for each branch of L there is a configuration rule, or a lexical rule, in G that licenses it; (b) if C is non-terminal, there is a schema, σ, associated with C, such that the sequential order of the branches is in agreement with the sequential order of their positions in σ; (c) all restrictions on σ imposed by C in its definition are satisfied by L.</Paragraph> </Section> </Section> <Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> 4 Properties of Basic FCGs </SectionTitle> <Paragraph position="0"> By removing all functional information from a FCG we obtain a Basic FCG. It is the Basic FCG that is responsible for the expression of dominance and precedence relations in the grammar, i.e. it has the same role as the phrase-structure rules of a LFG.</Paragraph> <Paragraph position="1"> This section is concerned with some interesting properties of Basic FCGs. First I show that a Basic FCG is weakly equivalent to a context-free grammar.</Paragraph> <Paragraph position="2"> Let G be a Basic FCG. Let A be the set of actual categories, Z the set of schemas, and P the set of positions, all finite sets. For any C ∈ A let L(C) denote the set of strings dominated by C. The language of G, L(G), is defined as a union of such sets for some suitable subset A' ⊆ A, e.g. by the set of subcategories of S.</Paragraph> <Paragraph position="3"> Let W be the set of words that occur in configuration rules and category definitions. 
Let K be the set A ∪ W.</Paragraph> <Paragraph position="4"> For any σ ∈ Z we may, by expansion of the relevant field structure rules, derive a positional structure for σ. Call this structure e_σ. For instance, from (2) we may derive the positional structure (F) (v) nex* (v') obj^2 pobj* (comp) adv*. A positional structure can be given the form of a regular expression over P. This is guaranteed, since fields are non-recursive objects.</Paragraph> <Paragraph position="5"> Let D be any actual category that is linearized by σ, and let p be a position that occurs in e_σ. The category definitions associate with D and p a conjunction of topological conditions, D_{p,T}, where each conjunct has one of the forms in table 1.</Paragraph> <Paragraph position="6"> For given D and p the configuration rules allow us to derive the constituent strings that may occur in p under D. There is only a finite number of applicable configuration rules. Each rule gives a disjunction of actual categories and an occurrence index for that disjunction. If all occurrence indices are finite, or if the occurrence index of p is finite, the constituent strings may be represented by a finite language over K. If some occurrence indices are '*', and p itself has occurrence index '*', we may first form a finite sublanguage over K that represents all strings of non-iterative constituent categories, and then extend it by introducing the iterative constituents. In either case, the result is a regular language over K. We call this language L_{D,p}.</Paragraph> <Paragraph position="7"> For instance, assuming that (conf2) and (conf3) are the only rules pertaining to position nex, and that NP has three actual subcategories, CNP, PNP and ProNP, we have L_{PolS,nex} = L_{S,nex} = SA*(CNP + PNP + ProNP)SA*.</Paragraph> <Paragraph position="8"> Given L_{D,p} we want to derive the sublanguage of constituent strings that satisfy D_{p,T}. Call this language L_{D,p,T}. Consider first the primitive cases: 1. 
If D_{p,T} = e∈p then L_{D,p,T} = {e}.</Paragraph> <Paragraph position="9"> 2. If D_{p,T} = x∈p then L_{D,p,T} = L_{D,p} − {e}.</Paragraph> <Paragraph position="10"> 3. If D_{p,T} = A∈p where A is actual, then L_{D,p,T} = L_{D,p} ∩ K*AK*.</Paragraph> <Paragraph position="11"> 4. If D_{p,T} = A∈p where A is abstract, then L_{D,p,T} = L_{D,p} ∩ (K*A_1K* ∪ ... ∪ K*A_nK*) where A_1, ..., A_n are actual subcategories of A.</Paragraph> <Paragraph position="12"> 5. If D_{p,T} = (A)∈p where A is actual, then L_{D,p,T} = L_{D,p} ∩ (K*AK* ∪ {e}).</Paragraph> <Paragraph position="13"> 6. If D_{p,T} = (A)∈p where A is abstract, then L_{D,p,T} = L_{D,p} ∩ (K*A_1K* ∪ ... ∪ K*A_nK* ∪ {e}), where A_1, ..., A_n are actual subcategories of A.</Paragraph> <Paragraph position="14"> 7. If D_{p,T} = ¬A∈p then L_{D,p,T} = L_{D,p} ∩ (K* − K*AK*). 8. If D_{p,T} = w∈p then L_{D,p,T} = L_{D,p} ∩ K*wK*.</Paragraph> <Paragraph position="15"> 9. If D_{p,T} = (w)∈p then L_{D,p,T} = L_{D,p} ∩ (K*wK* ∪ {e}).</Paragraph> <Paragraph position="16"> In all cases L_{D,p,T} is a regular set. As D_{p,T} in the general case is a conjunction of such primitive constraints, it follows that L_{D,p,T} will always be a regular set over K.</Paragraph> <Paragraph position="17"> Let L_D be the totality of constituent strings that D may dominate. Then L_D is obtained by substitution of L_{D,p,T} for p in e_σ. As the class of regular sets is closed under substitution, L_D will also be a regular set over K. As D itself may occur in L_D, we may have recursive categories in L(D), however. In any case, L(D), and by implication, L(G), is a context-free language.</Paragraph> <Paragraph position="18"> It is interesting to note that many simple context-free languages cannot be given a simple Basic FCG. For example, if a certain category, C, takes one obligatory daughter, H, and two optional daughters A, B, according to the CF-grammar G1, there is no Basic FCG for L(G1) that has C as an actual category. If there is such a FCG, it 
must employ at least three positions, since otherwise alternative orders must be allowed. Thus it takes three configuration rules pertaining to three different positions to account for the string [A H B]. But as these are independent the strings [A H] and [H B] can also be generated, contradicting the assumption.</Paragraph> <Paragraph position="19"> In a Basic FCG a category behaving as C in G1 must be abstract and its different realizations must be divided among a number of actual subcategories. A Basic FCG weakly equivalent to G1 is G2:</Paragraph> <Paragraph position="21"> What languages can FCGs describe well? Intuitively it seems that complex constituents that share a set of potential daughters should obey the same constraints as regards their relative order and occurrence. In particular, the occurrence of one daughter should be independent of the occurrence of other daughters. Where there is a difference in these properties, there must be a categorial distinction in the grammar, as the example above illustrates. We may call this property category-dependent fixed ordering.</Paragraph> <Paragraph position="22"> It seems, however, that this property is significant for natural languages, at least for those, like the Germanic languages, that distinguish clause types on topological grounds.</Paragraph> <Paragraph position="23"> 5 Field structure and partial orderings If the (surface) syntactic structures of a natural language are specified by means of a context-free grammar as in LFG, there is no chance of expressing any generalizations pertaining to word order. LFG admits a number of notational devices to facilitate the writing of c-structure rules, but has made few claims about possible word order restrictions. [GPSG85], on the other hand, makes the strong claim that natural languages obey Exhaustive Constant Partial Ordering (ECPO), i.e. that the possible linearizations of a set 
of sister constituents are the same in any local tree irrespective of the categories of the mother and other sisters. Such linearizations are expressed by means of partial orderings, or LP-rules, of the form A &lt; B. It is obvious that this assumption is more naturally made in a framework that works with local trees that have only two or three branches than in a framework which employs flat structures. For instance, the existence of unmarked and inverted clauses does not contradict the ECPO hypothesis, if the subject is regarded as a sister of the finite verb only in the inverted case. However, there are constructions that speak against it as a universal, such as the order of object and verb in German main and subordinate clauses: Ich kaufte ein Auto (I bought a car) vs. Ich habe ein Auto gekauft (I have a car bought = I have bought a car), and the order of verb participles and their complements in Swedish predicative and attributive constructions: Rapporten är beställd av Borg (The report is ordered by Borg) vs. Den av Borg beställda rapporten (The by Borg ordered report = The report that Borg ordered). These constructions are not problematic for FCGs, however, although they necessitate a categorial split.</Paragraph> <Paragraph position="24"> Although the number of categorial splits can be many in a FCG, one would not like the number of schemas to be very high. For a language like Swedish it seems possible to limit the description to five schemas, one for each type of projection (V, N, A, P) and one for coordinated structures [Ahrenb89]. LP-rules are used also in frameworks which do not subscribe to the ECPO-property, such as HPSG [PolSag87]. However, they need to be complemented by something, as they miss an important aspect of word order. As they apply to sister constituents, they fail to give any information on the position of a daughter relative to the phonological span of the mother. 
For instance, as a speaker of English I know that the definite article appears at the very beginning of an NP and that relative clauses appear at the end. Given a set of LP-rules ordering determiners, relative clauses and other NP-constituents we may possibly infer this information, but this is a roundabout way of doing it. To express such facts directly we need a device that will impose a sequential structure on phonological spans, and it is for this purpose that the topological schema is useful.</Paragraph> <Paragraph position="25"> On the other hand partial orderings seem better suited to describe category-independent word order regularities. Consider the case of complements to a head. In the Germanic languages the normal order would be the one expressed in (10): NP-complements precede PP-complements which precede verbal complements whatever the category of the head [GPSG85, p. 110].</Paragraph> </Section> <Section position="5" start_page="0" end_page="0" type="metho"> <SectionTitle> (10) NP &lt; PP &lt; VP </SectionTitle> <Paragraph position="0"> The rule in (2) defining the complement field (ObjF), repeated here for convenience, specifies three positions, one for bare objects, one for prepositional objects and one for verbal and adjectival complements. ObjF → obj^2 pobj* (comp) Even if we could appeal to the same or a similar field structure rule in the case of complements to the adjective, it seems natural in this case to explain the ordering in terms of the difference in category between different complements. Thus, with the introduction of (10) ObjF could be regarded as a position, i.e. as a terminal of the schema in figure 1.</Paragraph> <Paragraph position="1"> Note however that in a FCG LP-rules receive a slightly different interpretation. They apply to positions rather than to local trees.</Paragraph> </Section> </Paper>