<?xml version="1.0" standalone="yes"?> <Paper uid="P06-1018"> <Title>Sydney, July 2006. ©2006 Association for Computational Linguistics. Polarized Unification Grammars</Title> <Section position="5" start_page="138" end_page="142" type="metho"> <SectionTitle> 3 Examples of PUGs </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="138" end_page="139" type="sub_section"> <SectionTitle> 3.1 Tree grammars </SectionTitle> <Paragraph position="0"> The first tree grammar belonging to the paradigm of PUGs was proposed by Nasr 1995. The following grammar G allows generating all finite trees (a tree is a connected directed graph such that every node except one is the target of at most one edge); objects are nodes and edges; the initial structure (the box on the left) is reduced to a black node; the grammar has only one other elementary structure, which is composed of a black edge linking a white node to a black node. Each white node must unify with a black node in order to be neutralized, and each black node can unify with any number of white nodes. It can easily be verified that the structures generated by the grammar are trees, because every node has one and only one governor, except the node introduced by the initial structure, which is the root of the tree.</Paragraph> <Paragraph position="1"> does not control the number of dependents of nodes. A grammar like G allows controlling the valence of each node, but it does not ensure that generated structures are trees, because two white nodes can unify and a node can have more than one governor.</Paragraph> <Paragraph position="2"> solves the problem. In fact, G can be viewed as the superimposition of G. With the same principles, one can build a dependency grammar generating the syntactic dependency trees of a fragment of natural language. Grammar G, directly inspired by Nasr 1995, proposes a fragment of grammar for English generating the syntactic tree of Peter eats red beans. 
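The neutralization regime just described (every white node must find a black governor; a black node accepts any number of white dependents) can be sketched in a few lines of Python. This is a minimal illustration with our own ad hoc encoding, not part of the PUG formalism or of the paper's grammar G.

```python
# Minimal sketch (our own encoding) of the black/white polarity algebra
# described in the text. Assumption: a white (unsaturated) node unifies
# with a black (saturated) node to give black; two white nodes stay white
# (still unsaturated); two black nodes cannot unify.

WHITE, BLACK = "white", "black"

def unify_polarities(p, q):
    """Return the polarity of two unified nodes, or None if unification
    is disallowed."""
    if p == BLACK and q == BLACK:
        return None      # a black node is already saturated
    if BLACK in (p, q):
        return BLACK     # white + black -> black (the white node is neutralized)
    return WHITE         # white + white stays white (still needs a governor)

def is_neutral(node_polarities):
    """A final structure is neutral when every node is black."""
    return all(p == BLACK for p in node_polarities)
```

For instance, a structure still containing a white node is not a valid final tree, which is exactly why grammar G's derivations must neutralize every white node.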
Nodes of this grammar are labeled by two label maps, /cat/ and /lex/. Note that the root of the elementary structure of an adjective is a white node, allowing an unlimited number of such structures to adjoin to a noun.</Paragraph> <Paragraph position="3"> Nasr 1995 proposes such a grammar in order to generate trees. He uses an external requirement, which forces, when two structures are combined, the root of one to combine with a node of the other one.</Paragraph> </Section> <Section position="2" start_page="139" end_page="139" type="sub_section"> <SectionTitle> 3.2 Rewriting systems and ordered trees </SectionTitle> <Paragraph position="0"> PUGs can simulate any rewriting system and have the weak generative capacity of Turing machines.</Paragraph> <Paragraph position="1"> We follow ideas developed by Burroni 1993 or Dymetman 1999, themselves following van Kampen 1933's ideas.</Paragraph> <Paragraph position="2"> A sequence abc is represented by a string of labeled edges a, b and c: Intuitively, edges are intervals and nodes model their extremities. This is the simplest way to model linear order and precedence rules: X precedes Y iff the end of X is the beginning of Y. The initial category S of the grammar gives us the initial structure: A terminal symbol a corresponds to a positive edge: A rewriting rule ABC → DE gives us the elementary structure: This elementary structure is a &quot;cell&quot; whose upper frontier is a string of positive edges corresponding to the left part of the rule, while the lower frontier is a string of negative edges corresponding to the right part of the rule. Each positive edge must unify with a negative edge and vice versa, in order to give a black edge. Nodes are grey (= absolutely neutral) and their unification is entirely driven by the unification of edges. Cells will unify with each other to give a final structure representing the derivation structure of a sequence, which is the lower edge of this structure. 
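The interval encoding just described, where edges are intervals and nodes are their extremities, can be sketched concretely. The tuple representation below is our own hypothetical illustration, not the paper's notation.

```python
# Sketch (our own encoding) of the string-as-edges representation above:
# a string is a chain of labeled edges between position nodes, and X
# immediately precedes Y iff the end node of X is the start node of Y.

def string_to_edges(s):
    """Encode a string as a list of (start_node, label, end_node) edges."""
    return [(i, symbol, i + 1) for i, symbol in enumerate(s)]

def precedes(x, y):
    """Immediate precedence: the end node of x is the start node of y."""
    return x[2] == y[0]
```

On the sequence abc this yields three edges sharing their extremities, so that a precedes b and b precedes c without any extra precedence machinery.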
The next figure shows the derivation structure of the sequence Peter eats red beans with a standard phrase structure grammar, which can be reconstructed by the reader. In such a representation, edges represent phrases and correspond to intervals in the cutting of the sequence, while nodes are the bounds of these intervals.</Paragraph> <Paragraph position="3"> For a context-free rewriting system, the grammar generates the derivation tree, which can be represented in a more traditional way by adding the branches of the tree (giving us a 2-graph).</Paragraph> <Paragraph position="4"> Let us recall that a derivation tree for a context-free grammar is an ordered tree. An ordered tree combines two structures on the same set of nodes: a tree structure and a precedence relation on the nodes of the tree. Here the precedence relation is explicitly represented (a &quot;node&quot; of the tree precedes another &quot;node&quot; if the target of the first one is the source of the second one). It is not possible, with a PUG, to generate the derivation tree, including the precedence relation, in a simpler way.</Paragraph> <Paragraph position="5"> Note that the usual representation of ordered trees (where the precedence relation is not explicit, but only deducible from the planarity of the representation) is very misleading from the computational viewpoint. When they calculate the precedence relation, parsers (of the CKY type for instance) in fact calculate a data structure like the one we present here, where beginnings and ends of phrases are explicitly considered as objects.</Paragraph> </Section> <Section position="3" start_page="139" end_page="140" type="sub_section"> <SectionTitle> 3.3 TAG (Tree Adjoining Grammar) </SectionTitle> <Paragraph position="0"> PUG has a clear kinship with TAG, which is the first formalism based on combination of structures to be studied at length. TAGs are generally presented as grammars combining (ordered) trees. 
In fact, as a tree grammar, TAG is not monotonic and cannot be simulated with PUG.</Paragraph> <Paragraph position="1"> The most natural idea would be to encode a rewriting rule with a tree of depth 1 and the precedence relation with edges from a node to its successor. The difficulty is then to propagate the order relation to the descendants of two sister nodes when we apply a rewriting rule by substituting a tree of depth 1. The simplest solution is undeniably the one presented here, consisting in introducing objects representing the beginning and the end of phrases (our grey nodes) and in indicating the relation between a phrase, its beginning and its end by representing the phrase with an edge from the beginning to the end.</Paragraph> <Paragraph position="2"> As shown by Vijay-Shanker 1992, to obtain a monotonic formalism, TAG must be viewed as a grammar combining quasi-trees. Intuitively, a quasi-tree is a tree whose nodes have been split in two parts, each node having an upper part and a lower part, between which another quasi-tree can be inserted (this is the famous adjoining operation of TAG). Formally, a quasi-tree is a tree whose branches have two types: dependency relations and dominance relations (respectively noted by plain lines and dotted lines). Two nodes linked by a negative dominance relation are potentially the two parts of the same node; only the lower part can have dependents.</Paragraph> <Paragraph position="3"> The next figures give an α-tree (= to be substituted) and a β-tree (= to be adjoined) with the corresponding PUG structures.</Paragraph> <Paragraph position="4"> A substitution node (like D↓) gives a negative node, which will unify with the root of an α-tree. A β-tree gives a white root node and a black foot node, which will unify with the upper and the lower part of a split node. 
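The adjoining mechanism just described can be reduced, for illustration, to polarity arithmetic on dominance links: the positive root-to-foot link of an adjoined tree must cancel the negative link between the two halves of a split node. The encoding below is our own simplification, showing only this cancellation.

```python
# Sketch (our own encoding) of dominance-link neutralization in adjoining:
# a split node carries a negative dominance link between its two halves,
# an auxiliary tree carries a positive link between root and foot, and
# unifying root-with-upper and foot-with-lower cancels the two links.

NEG, POS, NEUTRAL = "-", "+", "0"

def unify_dominance(p, q):
    """A positive and a negative dominance link neutralize each other;
    two same-sign links cannot unify."""
    if {p, q} == {POS, NEG}:
        return NEUTRAL
    return None

def derivation_complete(links):
    """The final structure is a plain tree only when no polarized
    dominance link remains."""
    return all(l == NEUTRAL for l in links)
```

This mirrors the final rule discussed below, which forces the two halves of every split node to reunify before the derivation counts as finished.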
This is why the root and the foot node are linked by a positive dominance link, which will unify with a negative dominance link connecting the two parts of a split node.</Paragraph> <Paragraph position="5"> An α-tree and its PUG translation. For the sake of simplicity, we leave aside the precedence relation on sister nodes. It might be encoded in the same way as for context-free rewriting systems, by modeling seminodes of TAG trees by edges. It does not pose any problem but would make the figures difficult to read.</Paragraph> <Paragraph position="6"> A β-tree and its PUG translation. At the end of the derivation, the structure must be a tree and all nodes must be reconstructed: this is why we introduce the next rule, which presents a positive dominance link linking a node to itself and which will force two seminodes to unify by neutralizing the dominance link between them.</Paragraph> <Paragraph position="7"> This last rule again shows the advantage of PUG: the reunification of nodes, which is procedurally ensured in Vijay-Shanker 1992, is given here as a declarative rule.</Paragraph> </Section> <Section position="4" start_page="140" end_page="140" type="sub_section"> <SectionTitle> 3.4 HPSG (Head-driven Phrase Structure Grammar) </SectionTitle> <Paragraph position="0"> There are two ways to translate feature structures (FSs) into PUG. Clearly atomic values must be labels and (embedded) feature structures must be nodes, but features can be translated by maps or by edges, that is, objects. Encoding features by maps ensures their identification in PUG. Encoding them by edges allows us to polarize them and control the number of identifications.</Paragraph> <Paragraph position="1"> For the sake of clarity of HPSG structures, we chose to translate structural features such as HDTR and NHDTR, which give the phrase structure and which never unify with other &quot;features&quot;, by edges, and other features by maps (which will be represented by hashed arrows). 
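Before turning to the combination schema itself, the valence bookkeeping carried by these structural features can be sketched abstractly. The list encoding below is our own illustration of HPSG-style SUBCAT saturation under the schema discussed next (the head daughter's SUBCAT list is the non-head daughter's description concatenated with the mother phrase's SUBCAT list); it is not the paper's PUG translation.

```python
# Sketch (our own encoding) of SUBCAT bookkeeping in the head-daughter
# schema: SUBCAT(HDTR) = [NHDTR description] + SUBCAT(mother phrase),
# so each combination discharges the first element of the head's list.

def head_daughter_subcat(nhdtr, mother_subcat):
    """Reconstruct the head daughter's SUBCAT list from the non-head
    daughter and the mother phrase's SUBCAT list."""
    return [nhdtr] + mother_subcat

def saturated(subcat):
    """A phrase with an empty SUBCAT list (elist) has a saturated valence."""
    return subcat == []
```

For a verb subcategorizing two nominal phrases, two applications of the schema shrink the list to elist, at which point the projection is saturated.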
In any case, the result looks like a dag whose &quot;edges&quot; (true edges and maps) represent features and whose nodes represent values (e.g. Kepser &amp; Mönnich 2003). We exemplify the translation of HPSG in PUG with the schema of combination of a head phrase with a subcategorized sister phrase, namely the head-daughter-phrase. (Perrier 2000 uses a feature-structure based formalism where only features are polarized. Although more or less equivalent, we prefer to polarize the FSs themselves, i.e. the nodes.)</Paragraph> <Paragraph position="2"> This FS gives the following structure, where a list is represented recursively in two pieces: its head (value of H) and its queue (value of Q). A negative node of this FS can be neutralized by the combination with a similar FS representing a phrase or with a lexical entry. The next figure proposes a lexical entry for eat, indicating that eat is a V whose SUBCAT list contains two phrases headed by an N (for the sake of simplicity we deal with the subject as a subcategorized phrase).</Paragraph> <Paragraph position="3"> The combination of two head-daughter-phrases with the lexical entry of eat gives us the previous lexicalized rule, equivalent to the rule for eat of the dependency grammar G (/subj/ is the NHDTR of the maximal projection and /obj/ the NHDTR of the intermediate projection of eat). Numbers in boxes are values shared by several features. The value of SUBCAT (= SC) is a list (the list of subcategorized phrases). The non-head daughter phrase (NHDTR) has a saturated valence and so needs an empty SUBCAT list (elist). The SUBCAT list of the head daughter phrase (HDTR) is the concatenation, noted ⊕, of two lists: a list with one element that is the description of the non-head daughter phrase and the SUBCAT list of the whole phrase. The rest of the description of this phrase (value of HEAD) is equal to the one of the head daughter phrase.</Paragraph> <Paragraph position="5"> Polarization of objects shows exactly what is constructed by each rule and what requests are filed by other rules. Moreover it allows us to force SUBCAT lists to be instantiated (and therefore allows us to control the saturation of the valence), which is ensured in the usual formalism of HPSG by a bottom-up procedural presentation.</Paragraph> </Section> <Section position="5" start_page="140" end_page="142" type="sub_section"> <SectionTitle> 3.5 LFG (Lexical Functional Grammar) and synchronous grammars </SectionTitle> <Paragraph position="0"> We propose a translation of LFG into PUG that makes LFG appear as a synchronous grammar approach (see Shieber &amp; Schabes 1990). LFG synchronizes two structures (a phrase structure or c-structure and a dependency/functional structure or f-structure) and it can be viewed as the synchronization of a phrase structure grammar and a dependency grammar.</Paragraph> <Paragraph position="1"> Let us consider a first LFG rule and its translation in PUG:</Paragraph> <Paragraph position="3"> Equations under phrases (in the right side of [1]) ensure the synchronization between the objects of the c-structure and the f-structure: each phrase is synchronized with a &quot;functional&quot; node. Symbols ↓ and ↑ respectively designate the functional node synchronized with the current phrase and the one synchronized with the mother phrase (here S). Thus the equation ↑ = ↓ means that the current phrase (VP) and its mother (S) are synchronized with the same functional node. 
The expression ↑ SUBJ designates the functional node depending on ↑ by the relation SUBJ.</Paragraph> <Paragraph position="4"> In PUG we model the synchronization of the phrases and the functional nodes by synchronization links (represented by dotted lines with diamond-shaped polarities) (see Bresnan 2001 for non-formalized similar representations). The two synchronizations ensured by the two constraints ↑ SUBJ = ↓ and ↑ = ↓ of [1], and therefore built by this rule, are polarized in black.</Paragraph> <Paragraph position="5"> A phrasal rule such as [1] introduces an f-structure with a totally white polarization. It will be neutralized by lexical rules such as [2]: [2] V → wants</Paragraph> <Paragraph position="7"> The feature Pred is interpreted as the labeling of the functional node, while the valence <SUBJ,VCOMP> gives us two black edges and two white nodes. The functional equation ↑ SUBJ = ↑ VCOMP SUBJ introduces a white edge SUBJ between the nodes ↑ SUBJ and ↑ VCOMP (and is therefore to be interpreted very differently from the constraints of [1], which introduce black synchronization links). PUG allows us to easily split up a rule into more elementary rules. For instance, the rule [1] can be split up into three rules: a phrase structure rule linearizing the daughter phrases and two rules of synchronization indicating the functional link between a phrase and one of its daughter phrases.</Paragraph> <Paragraph position="8"> Our decomposition shows that LFG articulates two different grammars: a classical phrase structure grammar generating the c-structure and an interface grammar between c- and f-structures (and even a third grammar, because the f-structure is really generated only by the lexical rules). 
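The effect of the two kinds of annotations in rule [1] (an equation identifying a phrase's functional node with its mother's, versus one attaching it under the mother's node via a grammatical function such as SUBJ) can be sketched with a toy synchronization procedure. The dict-based encoding and all names below are our own illustration, not the paper's PUG machinery.

```python
# Sketch (our own encoding) of LFG-style synchronization: each phrase is
# mapped to a functional node; an "eq" annotation shares the mother's
# functional node, while an ("attr", F) annotation creates a new
# functional node depending on the mother's node by the relation F.

def synchronize(c_structure, equations):
    """c_structure maps each phrase to its mother phrase; equations maps
    a phrase to ("eq", None) or ("attr", "SUBJ"). Returns a dict from
    phrases to functional nodes (plain dicts, shared where synchronized)."""
    f_nodes = {}

    def f_node(phrase):
        if phrase not in f_nodes:
            mother = c_structure.get(phrase)
            kind, attr = equations.get(phrase, ("root", None))
            if kind == "eq":            # share the mother's functional node
                f_nodes[phrase] = f_node(mother)
            elif kind == "attr":        # new node depending on the mother's
                node = {}
                f_node(mother)[attr] = node
                f_nodes[phrase] = node
            else:                       # a root phrase gets a fresh node
                f_nodes[phrase] = {}
        return f_nodes[phrase]

    for phrase in list(c_structure):
        f_node(phrase)
    return f_nodes
```

Run on a toy version of rule [1] (S with daughters NP and VP), the VP ends up sharing S's functional node while the NP's node becomes its SUBJ dependent.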
With PUG it is easy to join two (or more) grammars: it suffices to add on the objects built by one grammar a white polarity that will be saturated in the other grammar (and vice versa) (Kahane &amp; Lareau 2005).</Paragraph> <Paragraph position="9"> Let us consider another problem, illustrated here by the rule for the topicalization of an object. The unbounded dependency of the object on its functional governor is an undetermined path expressed by a regular expression (here VCOMP* OBJ; functional uncertainty, Kaplan &amp; Zaenen 1989).</Paragraph> <Paragraph position="11"> The path VCOMP* (represented by a dashed arrow) is expanded by the following regular grammar, with two rules, one for the propagation and one for the ending.</Paragraph> <Paragraph position="12"> Again the translation into PUG brings to the fore some fundamental components of the formalism (like synchronization links) and some non-explicit mechanisms, such as the fact that the lexical equation ↑ PRED = 'want <SUBJ,VCOMP>' introduces both resources (a node 'want') and needs (its valence).</Paragraph> </Section> </Section> </Paper>