File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/96/c96-1034_metho.xml
Size: 22,253 bytes
Last Modified: 2025-10-06 14:14:05
<?xml version="1.0" standalone="yes"?> <Paper uid="C96-1034"> <Title>S I S Vr N VO CI VO Figure 3 Figure 4 s N Vr /\</Title> <Section position="4" start_page="194" end_page="194" type="metho"> <SectionTitle> 3 Existing solutions </SectionTitle> <Paragraph position="0"> A few solutions have been proposed for the problems described above. Solutions to the redundancy problem make use of two tools for lexicon representation: inheritance networks and lexical rules. Vijay-Shanker and Schabes (92) were the first to propose a scheme for the efficient representation of LTAGs, more precisely of the tree schemata of an LTAG. They proposed a monotonic inheritance network to represent the elementary trees, using partial descriptions of trees (Rogers and Vijay-Shanker, 92 and 94) (see section</Paragraph> <Section position="1" start_page="194" end_page="194" type="sub_section"> <SectionTitle> 4.1 for further detail). They also propose to use </SectionTitle> <Paragraph position="0"> &quot;lexical and syntactic rules&quot; to derive new entries.</Paragraph> <Paragraph position="1"> The core hierarchy should represent the &quot;canonical trees&quot;, and the rules derive the ones with redistribution of the functions of arguments (passive, dative shift...) and the ones with an extracted argument. Becker (93; 95) also proposes a hybrid system with the same dichotomy: an inheritance network for the dimension of canonical subcategorization frames and &quot;meta-rules&quot; for redistribution or extraction (or both).
The language for expressing the meta-rules is very close to the elementary tree language, except that meta-rules use meta-variables standing for subtrees. He proposes to integrate the meta-rules into the XTAG system, which would lead to an efficient maintenance and extension tool.</Paragraph> <Paragraph position="2"> (Evans et al., 95) have proposed to use DATR to represent an LTAG for English in a compact and efficient way, using (default) inheritance (and thus full trees instead of partial descriptions) and lexical rules to link tree structures. They argue for the advantage of using already existing software. But some information is not taken into account: the lexical rules do not update the argument index. For instance, the dative shift rule for English changes the second complement - the PP - into an NP, which is not semantically satisfying. The passive rule simply discards the first complement (representing the canonical direct object), the other complements moving up. But then the relation between the active object and the passive subject is lost.</Paragraph> <Paragraph position="3"> The three cited solutions give an efficient representation (without redundancy) of an LTAG, but have in our opinion two major deficiencies.</Paragraph> <Paragraph position="4"> First, these solutions use inheritance networks and lexical rules in a purely technical way. They give no principle about the form of the hierarchy or the lexical rules 2, whereas we believe that addressing the practical problem of redundancy should give the opportunity of formalizing the well-formedness of elementary trees and of tree families.</Paragraph> <Paragraph position="5"> Second, the generative aspect of these solutions is not developed.
Certainly the lexical rules are proposed as a tool for the generation of new schemata or new classes in an inheritance network.</Paragraph> <Paragraph position="6"> But the automatic triggering, ordering and bounding of the lexical rules are not discussed.</Paragraph> </Section> </Section> <Section position="5" start_page="194" end_page="195" type="metho"> <SectionTitle> 4 Proposed solution: efficient representation and semi-automatic generation </SectionTitle> <Paragraph position="0"> We propose a system for the writing and/or the updating of an LTAG. It comprises a principled and hierarchical representation of lexico-syntactic structures. Using this hierarchy and principles of well-formedness, the tool carries out all the relevant crossings of linguistic phenomena to generate the tree families.</Paragraph> <Paragraph position="1"> This solution not only addresses the problem of redundancy but also gives a more principle-based representation of an LTAG. The implementation of the principles gives a real generative power to the tool. So in a sense, our work can be related to (Kasper et al., 95), which describes an algorithm to translate a Head-driven Phrase Structure Grammar (HPSG) into an LTAG. The inheritance hierarchy of HPSG and its principles are &quot;flattened&quot; into a lexicalized formalism such as LTAG. The idea is to benefit from a principle-based formalism such as HPSG and from the computational properties of an LTAG.</Paragraph> <Section position="1" start_page="195" end_page="195" type="sub_section"> <SectionTitle> 4.1 Hierarchical representation of an </SectionTitle> <Paragraph position="0"/> </Section> </Section> <Section position="6" start_page="195" end_page="197" type="metho"> <SectionTitle> LTAG </SectionTitle> <Paragraph position="0"> inheritance network, without meta-rules Like the solutions described in section 3, our system uses a multiple inheritance network. Yet, it does not use meta-rules.
Though they could be a further step of factorization, it seemed interesting to &quot;get the whole picture&quot; of the grammar within the hierarchy, and not only the base trees.</Paragraph> <Paragraph position="1"> Further, we have chosen monotonic inheritance, especially as far as syntactic descriptions are concerned. Default inheritance does not seem to be justified to represent tree schemata, from the linguistic point of view. Default inheritance is often necessary to deal with exceptions: one may want to express generalizations despite a few more specific exceptions. Now the set of tree schemata we intend to describe hierarchically is free of lexical idiosyncrasies, which are in the syntactic lexicon (cf. section 1). The set of tree schemata represents syntactic phenomena that are all productive enough to allow monotonicity. The resulting hierarchy will then be more transparent and more declarative.</Paragraph> <Paragraph position="2"> Technically, monotonicity in syntactic descriptions is made possible by the use of partial descriptions of trees (Rogers and Vijay-Shanker, 92; 94), as was proposed in (Vijay-Shanker and Schabes, 92) (see section 4.1.3).</Paragraph> <Paragraph position="3"> Section 1 briefly described the organization of an LTAG in families of trees. The rules for the organization of a family, its coherence and completeness, are flattened into the different trees. With the approach of an automatic generation of TAG trees, we have found it necessary to make these rules explicit; they are defined using the notions of argument and syntactic function.</Paragraph> <Paragraph position="4"> Following a functional approach to subcategorization (see for instance Lexical Functional Grammar, (Bresnan, 82)), we clearly separate the &quot;redistributions&quot; of syntactic functions of the arguments from the different realizations of a given syntactic function (in canonical, extracted, cliticized... position).
We use the term redistribution in a broad sense, for any manipulation of the number and functions of arguments. It includes cases of reduction of arguments (e.g. agentless passive), restructuring (dative shift for English) or even augmentation of arguments (some causative constructions 3, introducing an agent whose function is subject). Redistribution is represented in our system by pairing arguments and functions, and not in terms of movement. So the proposed hierarchy of syntactic descriptions (for the family anchored by a verb) comprises the three following dimensions: dimension 1: the canonical subcategorization frame. This dimension defines the types of canonical subcategorization. Its classes contain information on the arguments of a predicate, their index, their possible categories and their canonical syntactic function.</Paragraph> <Paragraph position="5"> dimension 2: the redistribution of syntactic functions. This dimension defines the types of redistribution of functions (including the case of no redistribution at all). The association of a canonical subcategorization frame and a compatible redistribution gives an actual subcategorization, namely a list of argument-function pairs that have to be locally realized. dimension 3: the syntactic realizations of the functions. It expresses the way the different syntactic functions are positioned at the phrase-structure level (in canonical position or in cliticized or extracted position). This last dimension is itself partitioned according to two parameters: the syntactic function and the syntactic construction. 3 We talk about some causative constructions analysed as complex predicates with co-anchors in French, as in: Jean a fait s'assoir les enfants. *Jean made sit the children. (Jean made the children sit.)</Paragraph> <Paragraph position="6"> descriptions of trees The hierarchy is a strict multiple inheritance network whose terminal classes represent the elementary trees of the LTAG.
These terminal classes are not written by hand but automatically generated following principles of well-formedness, either technical or linguistic.</Paragraph> <Paragraph position="7"> A partial description is a set of constraints that characterizes a set of trees. Adding information to the description monotonically reduces the set of satisfying trees. The partial descriptions of Rogers and Vijay-Shanker (94) 4 use three relations: left-of, parent and dominance (represented with a dashed line). A dominance link can be further specified as a path of length greater than or equal to zero. These links are useful to underspecify a relation between two nodes at a general level, which will be specified at either a lower or a lateral level. Figure 3 shows a partial description representing a sentence with a nominal subject in canonical position, giving no other information about possible other complements. The link between the S and V nodes is underspecified, allowing either presence or absence of a cliticized complement on the verb. In the case of a clitic, the path between the S and V nodes can be specified with the description of figure 4. Then, if we have the information that the nodes labelled respectively S and V of figures 3 and 4 are the same, the conjunction of the two descriptions is equivalent to the description of figure 5.</Paragraph> <Paragraph position="8"> 4 Vijay-Shanker and Schabes (92) have used the partial descriptions introduced in (Rogers and Vijay-Shanker, 92), but we have used the more recent version of (Rogers and Vijay-Shanker, 94). The difference between the two versions lies principally in the definition of quasi-trees, first seen as partial models of trees and later as distinguished sets of constraints.</Paragraph> <Paragraph position="9"> In the hierarchy of syntactic descriptions we propose, the partial description associated with a class is the unification of the class's own description with all inherited partial descriptions.
As shown in the above example, the conjunction of two descriptions may require statements of identity of nodes. Rogers and Vijay-Shanker (94) foresee, in the case of an application to TAG, the systematic identity of lexical anchors. Further, Vijay-Shanker and Schabes (92) also make use of a particular function to state identity of argumental nodes. But this is not enough, as one might need to state equality of any type of nodes (like the S nodes in the above example). To achieve this in our system, one simply needs to &quot;name&quot; both nodes in the same way. Remember that we are talking about descriptions of trees. In these objects, nodes are referred to by constants. Two nodes, in two conjunct descriptions, referred to by the same constant are the same node, and two nodes referred to by different constants can either be equal or different. Equality of nodes can also be inferred, mainly using the fact that a tree node has only one direct parent node.</Paragraph> <Paragraph position="10"> We have added atomic features associated with each constant, such as category, index, quality (i.e. foot, anchor or substitution node), canonical syntactic function and actual syntactic function.</Paragraph> <Paragraph position="11"> These features belong to the meta-formalism of LTAG hierarchical organization. We will call them meta-features (as opposed to the features attached to the nodes of the TAG trees). In the conjunction of two descriptions, the identification of two nodes known to be the same (either by inference or because they have the same constant) requires the unification of such meta-features.
In case of failure, the whole conjunction fails, or rather, leads to an unsatisfiable description.</Paragraph> <Paragraph position="12"> dimension 3: realization of syntactic functions</Paragraph> <Section position="1" start_page="196" end_page="197" type="sub_section"> <SectionTitle> 4.2 Automatic generation of elementary trees </SectionTitle> <Paragraph position="0"> The three dimensions introduced in section 4.1.2 constitute the core hierarchy. Out of this syntactic database and following principles of well-formedness, the generator creates elementary trees. This is a two-step process: it first creates some terminal classes with inherited properties only - they are totally defined by their list of super-classes. Then it translates these terminal classes into the relevant elementary tree schemata, in the XTAG 5 format, so that they can be used for parsing.</Paragraph> <Paragraph position="1"> The tree schemata are generated grouped in families. This is simply achieved by fixing a canonical subcat frame (dimension 1), associating with it all relevant redistributions (dimension 2) and relevant realizations of functions (dimension 3). 5 XTAG (Paroubek et al., 92) is a tool for writing and using LTAGs, including among other things a tree editor and a syntactic parser.</Paragraph> <Paragraph position="2"> At the development stage, generation can also be done following other criteria. For instance, one can generate all the passive trees, or all trees with extracted complements...</Paragraph> <Paragraph position="3"> The generation of elementary trees from more abstract data needs the characterization of what is a well-formed elementary tree in the framework of LTAG.
The factor common to the various expressions of linguistic principles made for LTAGs is the argument-predicate co-occurrence principle (Kroch and Joshi, 85; Abeillé, 91): the trees for a predicative item contain positions for all its arguments.</Paragraph> <Paragraph position="4"> But for a given predicate, we expect the canonical arguments to remain constant through redistribution of functions. The canonical subject (argument 0) in a passive construction, even when unexpressed, is still an argument of the predicate. So the principle should be a principle of predicate-functions co-occurrence: the trees for a predicative item contain positions for all the functions of its actual subcategorization. In the solution we propose, this principle is translated as: 1- subcat principle: a terminal class must inherit a canonical subcategorization (dimension 1) and a compatible redistribution, including the case of no redistribution at all (dimension 2). This pair of super-classes defines an actual subcategorization.</Paragraph> <Paragraph position="5"> 2- completeness/coherence/unicity principle: the terminal class must inherit exactly one type of realization for each function of the actual subcategorization 6.</Paragraph> <Paragraph position="6"> Well-formedness of elementary trees is also expressed through the form of the hierarchy itself (the content of the classes, the inheritance links, the inheritance modes for the different slots...). This information, spread throughout the hierarchy, is used for tree generation following technical principles of well-formedness.
Due to a lack of space we detail only the following principle, which is useful for understanding the next section.</Paragraph> <Paragraph position="7"> 3- unification principle: the unifications of partial descriptions and meta-equations required by inheritance must succeed; the unification of nodes with the same constant is mandatory; moreover two nodes with the same value for the meta-feature &quot;function&quot; must unify.</Paragraph> <Paragraph position="8"> Figure 6 shows an example of the generation of a terminal class, corresponding to the French tree for the full passive of a strict transitive verb in a wh-question on the agent (see figure 7). It can be illustrated by the sentence: (Je me demande) par qui Jean sera accompagné.</Paragraph> <Paragraph position="9"> By whom will Jean be accompanied?</Paragraph> <Paragraph position="10"> The corresponding terminal class W0n0Vn1-pass inherits the canonical subcat STRICT TRANSITIVE and the redistribution PERSONAL FULL PASSIVE. This defines the following actual subcategorization: arg0/par-object; arg1/subject. Then the terminal class inherits the relevant realization for each of the cited functions (SUBJECT</Paragraph> </Section> </Section> <Section position="7" start_page="197" end_page="198" type="metho"> <SectionTitle> IN CANONICAL POSITION and PAR-OBJ- </SectionTitle> <Paragraph position="0"> subcategorization, this principle relates to the principles of well-formedness of functional structures in LFG.</Paragraph> <Paragraph position="1"> trees The terminal classes representing elementary trees inherit a (constructed) partial description of a tree, with meta-equations and equations. To get elementary trees from these classes, we need to translate the partial descriptions into trees. This is done by taking the least tree(s) satisfying the description.
We do not go into the details for reasons of brevity, but intuitively the minimal tree is computed by taking the underspecified links to be paths of length zero when their ends are compatible, and of length one otherwise (figure 8). A description can leave the order of some daughters underspecified, leading to several minimal trees. Rogers and Vijay-Shanker (94) give a formal mechanism to obtain trees from descriptions.</Paragraph> <Paragraph position="2"> After obtaining tree(s) from the partial description, the generator translates the node constants into the concatenation of syntactic category and index (if it exists).</Paragraph> <Paragraph position="3"> 4.2.3 A detailed example Let us go back to the tree of figure 7. The next figure shows in detail the super-classes 7 (introduced in figure 6) for the class W0n0Vn1-pass representing 7 We only show the direct super-classes. They are given with their specific properties and with their inherited properties as well. The &quot;equations&quot; slot is not shown. In the partial descriptions shown, the constants naming the nodes start with ?. The conjunction of the inherited partial descriptions leads to the following description: The nodes with the same constants have unified (?S/?S) and the constants with the same &quot;function&quot; meta-feature have also unified: ?subject/?arg1 and ?quest/?arg0 (cf. principle 3). Then the node constants are translated and the least satisfying tree is computed, leading to the target tree of figure 7.</Paragraph> </Section> <Section position="8" start_page="198" end_page="198" type="metho"> <SectionTitle> 5 Applications </SectionTitle> <Paragraph position="0"> The tool has been used to update and augment the French LTAG developed at Paris 7. A hierarchy has been written that gives a compact and transparent representation of the verbal families already existing in the grammar.
The writing of the hierarchy was an opportunity to update structures and equations, ensuring uniform and coherent handling of phenomena. Furthermore, the automatic generation from the hierarchy guarantees the well-formedness of the families, with all possible conjunctions of phenomena. Extra phenomena such as nominal subject inversion, impersonal middle constructions, some causative constructions or free order of complements have been added.</Paragraph> <Paragraph position="1"> The generative power of the tool is effective: out of about 90 hand-written classes, the tool generates 730 trees for the 17 families for verbs without sentential complements 8, 400 of which were present in the pre-existing grammar. The tool is currently used to add trees for some elliptical coordinations.</Paragraph> <Paragraph position="2"> We see several possible applications of the tool.</Paragraph> <Paragraph position="3"> First, we could try to generate a grammar with weaker constraints, useful for corpora with recurrent ill-formed sentences. Secondly, we could obviously use the tool to build a grammar for another language, either from scratch or using the hierarchy designed for French. Using this already existing hierarchy and the implemented principles of well-formedness will lead to a grammar for another language &quot;compatible&quot; with the French grammar.
This could be an advantage in the perspective of machine translation, for instance.</Paragraph> <Paragraph position="4"> Because the principles of well-formedness implemented are general and capture mainly the extended domain of locality of LTAG, the generator we have presented can very well be used to generate a grammar with different underlying linguistic choices (for instance the GB perspective used in the English grammar cited). 8 By the time of the conference, we will be able to give figures for the families with sentential complements also.</Paragraph> </Section> <Section position="9" start_page="198" end_page="198" type="metho"> <SectionTitle> 6 Conclusion </SectionTitle> <Paragraph position="0"> We have presented a hierarchical and principle-based representation of syntactic information. It ensures transparency and coherence in syntactic descriptions and allows the generation of the elementary trees of an LTAG, with systematic crossing of linguistic phenomena.</Paragraph> </Section> </Paper>