<?xml version="1.0" standalone="yes"?> <Paper uid="P96-1045"> <Title>Generating an LTAG out of a principle-based hierarchical representation</Title> <Section position="4" start_page="0" end_page="342" type="metho"> <SectionTitle> 3 Existing solutions </SectionTitle> <Paragraph position="0"> A few solutions have been proposed for the problems described above. They use two main devices for lexicon representation: inheritance networks and lexical rules. But for LTAG representation, inheritance networks also have to include phrase-structure information, and lexical rules become &quot;lexico-syntactic rules&quot;. Vijay-Shanker and Schabes (92) first proposed a scheme for LTAG representation. Implemented work is also described in (Becker, 93; 95) and (Evans et al., 95).</Paragraph> <Paragraph position="1"> The three cited solutions give an efficient representation (without redundancy) of an LTAG, but have, in our opinion, two major deficiencies. First, these solutions use inheritance networks and lexical rules in a purely technical way. They give no principle governing the form of the hierarchy or of the lexical rules, whereas we believe that addressing the practical problem of redundancy should be an opportunity to formalize the well-formedness of elementary trees and of tree families. Second, the generative aspect of these solutions is not developed. Certainly, the lexical rules are proposed as a tool for generating new schemata or new classes in an inheritance network. But the automatic triggering, ordering and bounding of the lexical rules is not discussed 2.</Paragraph> </Section> <Section position="5" start_page="342" end_page="343" type="metho"> <SectionTitle> 4 Proposed solution: a principle-based representation and a generation system </SectionTitle> <Paragraph position="0"> We propose a system for the writing and/or the updating of an LTAG. It comprises a principled and hierarchical representation of lexico-syntactic structures. 
Using this hierarchy and principles of well-formedness, the tool automatically carries out the relevant crossings of linguistic phenomena to generate the tree families.</Paragraph> <Paragraph position="1"> This solution not only addresses the problem of redundancy but also gives a more principle-based representation of an LTAG. The implementation of the principles gives real generative power to the tool.</Paragraph> <Paragraph position="2"> Due to a lack of space we cannot develop all the aspects of this work 3. After a brief description of the organization of the syntactic hierarchy, we will focus on the use of partial descriptions of trees.</Paragraph> <Section position="1" start_page="342" end_page="342" type="sub_section"> <SectionTitle> 4.1 Organization of the hierarchy </SectionTitle> <Paragraph position="0"> The proposed organization of the hierarchy follows from the linguistic principles of well-formedness of elementary TAG trees, mainly the predicate-arguments co-occurrence principle (Kroch and Joshi, 85; Abeillé, 91): the trees for a predicative item contain positions for all its arguments.</Paragraph> <Paragraph position="1"> But for a given predicate, we expect the canonical arguments to remain constant through redistribution of functions. The canonical subject (argument 0) in a passive construction, even when unexpressed, is still an argument of the predicate. So the principle should be a principle of predicate-functions co-occurrence: the trees for a predicative item contain positions for all the functions of its actual subcategorization.</Paragraph> <Paragraph position="2"> This reformulated principle presupposes the definition of an actual subcategorization, given the canonical subcategorization of a predicate. 
This presupposition and the predicate-functions co-occurrence principle are fulfilled by organizing the hierarchy along the following three dimensions: dimension 1: canonical subcategorization frame. This dimension defines the types of canonical subcategorization. Its classes contain information on the arguments of a predicate, their index, their possible categories and their canonical syntactic function.</Paragraph> <Paragraph position="3"> dimension 2: redistribution of syntactic functions. This dimension defines the types of redistribution of functions (including the case of no redistribution at all). The association of a canonical subcategorization frame and a compatible redistribution gives an actual subcategorization, namely a list of argument-function pairs that have to be locally realized.</Paragraph> <Paragraph position="4"> dimension 3: syntactic realizations of functions. This dimension expresses the way the different syntactic functions are positioned at the phrase-structure level (in canonical, cliticized, extracted position...).</Paragraph> <Paragraph position="5"> These three dimensions constitute the core hierarchy.</Paragraph> <Paragraph position="6"> Out of this syntactic database and following principles of well-formedness, the generator creates elementary trees. This is a two-step process: it first creates some terminal classes with inherited properties only; they are totally defined by their list of super-classes. Then it translates these terminal classes into the relevant elementary tree schemata, in the XTAG 4 format, so that they can be used for parsing.</Paragraph> <Paragraph position="7"> Tree schemata generation respects the predicate-functions co-occurrence principle. Their corresponding terminal classes are created first by associating a canonical subcat (dimension 1) with a compatible redistribution, including the case of no redistribution (dimension 2). 
Then, for each function defined in the actual subcat, exactly one realization of that function is picked from dimension 3.</Paragraph> <Paragraph position="8"> The generation is made family by family. This is simply achieved by fixing the canonical subcat frame (dimension 1). At the development stage, generation can also be done following other criteria. For instance, all passive trees or all trees with extracted complements can be generated.</Paragraph> </Section> <Section position="2" start_page="342" end_page="343" type="sub_section"> <SectionTitle> 4.2 Formal choices: monotonic inheritance network and partial descriptions of trees </SectionTitle> <Paragraph position="0"> The generation process described above is quite powerful in the context of LTAGs, because it automatically carries out all the relevant crossings of linguistic phenomena. These crossings are precisely the major source of redundancy in LTAGs. Because of this generative device, we do not need to introduce lexico-syntactic rules, and thus we do not have to face the problems of ordering and bounding their application.</Paragraph> <Paragraph position="1"> Further, as was mentioned in section 1, lexical idiosyncrasies are handled in the syntactic lexicon, and not in the set of tree schemata. So, to represent this set hierarchically, we do not think that nonmonotonicity is linguistically justified. We have thus chosen monotonicity, which gives more transparency and improves declarativity. We follow here Vijay-Shanker and Schabes (92) and use partial descriptions of trees (Rogers and Vijay-Shanker, 94) 5. A partial description is a set of constraints that characterizes a set of trees. Adding information to the description monotonically reduces the set of satisfying trees. The partial descriptions of Rogers and Vijay-Shanker (94) use three relations: left-of, parent and dominance (represented with a dashed line). A dominance link can be further specified as a path of length greater than or equal to zero. 
These links are obviously useful to underspecify a relation between two nodes at a general level, which will be specified at either a lower or a lateral level. Figure 1 shows a partial description representing a sentence with a nominal subject in canonical position, giving no other information about possible other complements. The underspecified link between the S and V nodes allows for either presence or absence of a cliticized complement on the verb. In the case of a clitic, the path between the S and V nodes can be specified with the description of figure 2. Then, if we have the information that the nodes labelled respectively S and V of figures 1 and 2 are the same, the conjunction of the two descriptions is equivalent to the description of figure 3. This example shows the declarativity obtained with partial descriptions that use large dominance links. The inheritance of the descriptions of figures 1 and 2 is order-independent. Without large dominance links, an order of inheritance of the classes describing a subject in canonical position and a cliticized complement would have to be predefined.</Paragraph> <Paragraph position="2"> In the hierarchy of syntactic descriptions we propose, the partial description associated with a class is the unification of the class's own description with all inherited partial descriptions. Identity of nodes is stated in our system by &quot;naming&quot; both nodes in the same way, since in descriptions of trees, nodes are referred to by constants. Two nodes, in two conjunct descriptions, referred to by the same constant are the same node.</Paragraph> <Paragraph position="3"> Equality of nodes can also be inferred, mainly using the fact that a tree node has only one direct parent node.</Paragraph> <Paragraph position="4"> We have added atomic features associated with each constant, such as category, index, canonical syntactic function and actual syntactic function. 
In the conjunction of two descriptions, the identification of two nodes known to be the same requires the unification of such features. In case of failure, the whole conjunction leads to an unsatisfiable description. 5 Vijay-Shanker and Schabes (92) have used the partial descriptions introduced in (Rogers and Vijay-Shanker, 92), but we have used the more recent version of (Rogers and Vijay-Shanker, 94). The difference lies principally in the definition of quasi-trees, first seen as partial models of trees and later as distinguished sets of constraints.</Paragraph> <Paragraph position="5"> A terminal class is translated into its corresponding elementary tree(s) by taking the minimal satisfying tree(s) of the partial description of the class 6.</Paragraph> </Section> <Section position="3" start_page="343" end_page="343" type="sub_section"> <SectionTitle> 4.3 Application to the French LTAG </SectionTitle> <Paragraph position="0"> The tool was used to generate tree families of the French grammar, using a hand-written hierarchy of syntactic descriptions. This task is facilitated by the guidelines given on the form of the hierarchy. Out of about 90 hand-written classes, the tool generates 730 trees for the 17 families for verbs without sentential complements 7; 400 of these trees were present in the pre-existing grammar. We have added phenomena such as some causative constructions or free order of complements.</Paragraph> <Paragraph position="1"> The proposed type of hierarchy is meant to be universal, and we are currently working on its application to Italian.</Paragraph> </Section> </Section> </Paper>