<?xml version="1.0" standalone="yes"?> <Paper uid="J82-1001"> <Title>Phrase Structure Trees Bear More Fruit</Title> <Section position="3" start_page="0" end_page="0" type="metho"> <SectionTitle> 2. Gazdar's Formulation </SectionTitle> <Paragraph position="0"> Gazdar 1979 has introduced categories with holes and some associated rules in order to allow for the base generation of &quot;unbounded&quot; dependencies. Let V N be the set of basic nonterminal symbols. Then we define a set D(V N) of derived nonterminal symbols as follows.</Paragraph> <Paragraph position="1"> D(V N) = {α/β | α, β ∈ V N} For example, if S and NP are the only two nonterminal symbols, then D(V N) would consist of S/S, S/NP, NP/NP, and NP/S. The intended interpretation of a derived category (slashed category, or a category with a hole) is as follows: a node labeled α/β will dominate subtrees identical to those that can be dominated by α, except that somewhere in every subtree of the α/β type there will occur a node of the form β/β dominating a resumptive pronoun, a trace, or the empty string, and every node linking α/β and β/β will be of the form α/β. Thus α/β labels a node of type α that dominates material containing a hole of the type β (i.e., an extraction site in a movement analysis). For example, S/NP is a sentence that has an NP missing somewhere. The derived rules allow the propagation of a hole and the linking rules allow the introduction of a category with a hole. For example, given the</Paragraph> <Paragraph position="3"/> <Paragraph position="5"> An example of a linking rule is a rule (rule schema) that introduces a category with a hole as needed for topicalization.</Paragraph> <Paragraph position="7"> This rule will induce a structure like (6). 
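The construction of D(V N) and of the derived rules is entirely mechanical, and a small sketch makes it concrete. The sketch below is ours, not Gazdar's (the names VN and derive_rules are illustrative); it enumerates the slashed categories and turns one base rule into the derived rules that pass the hole down to each daughter in turn:

```python
# Sketch (ours, not from the paper): Gazdar-style derived categories.
VN = {"S", "NP", "VP"}

# D(VN) = { a/b : a, b in VN } -- a/b is a category of type a that
# dominates material containing a hole of type b.
derived = {f"{a}/{b}" for a in VN for b in VN}

# A derived rule propagates the hole downward: from a base rule
# lhs -> b1 ... bn we get lhs/h -> b1 ... bi/h ... bn for each
# daughter position i (simplified: every position qualifies here).
def derive_rules(base_rule, hole):
    lhs, rhs = base_rule
    out = []
    for i, b in enumerate(rhs):
        new_rhs = list(rhs)
        new_rhs[i] = f"{b}/{hole}"
        out.append((f"{lhs}/{hole}", tuple(new_rhs)))
    return out
```

With three basic categories the derived set has nine members, and a rule such as S → NP VP yields the two derived rules S/NP → NP/NP VP and S/NP → NP VP/NP, which is exactly the proliferation the text goes on to discuss.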
The technique of categories with holes and the associated derived and linking rules allows unbounded dependencies to be accounted for in the phrase structure representation; however, this is accomplished at the expense of proliferation of categories of the type α/β (see also Karttunen 1980). Later, in Section 3, we will present an alternate way of representing (6) by means of local constraints and some of their generalizations.</Paragraph> <Paragraph position="9"> The notion of categories with holes is not completely new. In his 'String Analysis of Language Structure', Harris 1956, 1962 introduces categories such as S - NP or S_NP (like S/NP of Gazdar) to account for moved constituents. He does not, however, seem to provide, at least not explicitly, machinery for carrying the &quot;hole&quot; downwards. He also has rules in his framework for introducing categories with holes.</Paragraph> <Paragraph position="10"> Thus, in his framework, something like (6) would be accomplished by allowing for a sentence form (a center string) of the form (7) (not entirely his notation).</Paragraph> <Paragraph position="12"> Sager, who has constructed a very substantial parser starting from some of these ideas and extending them significantly, has allowed for the propagation of the 'hole' resulting in structures very similar to those of Gazdar. She has also used the notion of categories with holes in order to carry out some coordinate structure computation. For example, Sager allows for the coordination of S/a and S/a but not S and S/a. (See Sager 1967 for an early reference to her work.) Gazdar is the first, however, to incorporate the notion of categories with holes and the associated rules in a formal framework for his syntactic theory and also to exploit it in a systematic manner for explaining coordinate structure phenomena.</Paragraph> </Section> <Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> 3. 
Local Constraints </SectionTitle> <Paragraph position="0"> In this section we briefly review our work on local constraints. Although this work has already appeared (Joshi and Levy 1977, Joshi, Levy, and Yueh 1980) and attracted some attention recently, the demonstration of our results has remained somewhat inaccessible to many due to the technicalities of tree automata theory. In this paper we present an intuitive account of these results in terms of interacting finite state machines. The method of local constraints is an attempt to describe context-free languages in an apparently context-sensitive form that helps to retain the intuitive insights about the grammatical structure. This form of description, while apparently context-sensitive, is in fact context-free.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.1 Definition of Local Constraints </SectionTitle> <Paragraph position="0"> Context-sensitive grammars, in general, are more powerful (with respect to weak generative capacity) than context-free grammars. A fascinating result of Peters and Ritchie 1969 is that if a context-sensitive grammar G is used for &quot;analysis&quot; then the language &quot;analyzed&quot; by G is context-free. First, we describe what we mean by the use of a context-sensitive grammar G for &quot;analysis&quot;. Given a tree t, we define the set of proper analyses of t. Roughly speaking, a proper analysis of a tree is a slice across the tree. More precisely, the following recursive definition applies: American Journal of Computational Linguistics, Volume 8, Number 1, January-March 1982 3 Aravind K. Joshi and Leon S. Levy Phrase Structure Trees Bear More Fruit Definition 3.1. The set of proper analyses of a tree t, denoted Pt, is defined as follows.</Paragraph> <Paragraph position="2"> If t is a single node labeled A, then Pt = {A}; if t has root labeled A and immediate subtrees t0, t1, ..., tn, then Pt = {A} ∪ P(t0)·P(t1)· ... ·P(tn), where t0, t1, ..., tn are trees and '·' 
denotes concatenation (of sets).</Paragraph> <Paragraph position="3"> Example 3.1 p1πAφp2 (p1, p2 ∈ V*).</Paragraph> <Paragraph position="4"> The contextual condition associated with such a &quot;vertical&quot; proper analysis is called a domination predicate.</Paragraph> <Paragraph position="5"> The general form of a local constraint combines the proper analysis and domination predicates as follows: Definition 3.2. A local constraint rule is a rule of the form A → ω/CA where CA is a Boolean combination of proper analysis and domination predicates.</Paragraph> <Paragraph position="6"> In transformational linguistics the proper analysis and domination predicates are used to describe conditions on transformations; hence we have referred to these local constraints elsewhere as local transformations.</Paragraph> <Paragraph position="7"/> <Paragraph position="8"> </Paragraph> <Paragraph position="9"> Let G be a context-sensitive grammar; i.e., its rules are of the form A → ω/π__φ where A ∈ V − Σ (V is the alphabet and Σ is the set of terminal symbols), ω ∈ V+ (the set of non-null strings on V), and π, φ ∈ V* (the set of all strings on V). If π and φ are both null, then the rule is a context-free rule. A tree t is said to be &quot;analyzable&quot; with respect to G if for each node of t some rule of G &quot;holds&quot;. It is obvious how to check whether a context-free rule holds of a node or not. A context-sensitive rule A → ω/π__φ holds of a node labeled A if the string corresponding to the immediate descendants of that node is ω and there is a proper analysis of t of the form p1πAφp2 that &quot;passes through&quot; the node (p1, p2 ∈ V*). We call the contextual condition π__φ a proper analysis predicate.</Paragraph> <Paragraph position="10"> Similar to these context-sensitive rules, which allow us to specify context on the &quot;right&quot; and &quot;left&quot;, we often need rules to specify context on the &quot;top&quot; or &quot;bottom&quot;. 
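The recursion of Definition 3.1 can be computed directly. The following is a minimal sketch (the tree encoding (label, children) and the function name are ours, not the paper's): a leaf contributes its own label, and an internal node contributes either itself or the set-concatenation of the proper analyses of its subtrees:

```python
# Sketch of Definition 3.1 (our encoding): the set of proper analyses
# ("slices") of a labeled tree t = (label, [subtrees]).
def proper_analyses(t):
    label, children = t
    result = {(label,)}                    # the node itself is a slice
    if children:
        # set concatenation P(t0) . P(t1) . ... . P(tn)
        parts = [proper_analyses(c) for c in children]
        combo = {()}
        for p in parts:
            combo = {x + y for x in combo for y in p}
        result |= combo
    return result

# the tree S(a, T(b)) has slices (S), (a T), and (a b)
t = ("S", [("a", []), ("T", [("b", [])])])
```

A proper analysis predicate π__φ then holds at a node labeled A exactly when some slice of the whole tree contains the contiguous substring πAφ passing through that node.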
Given a node labeled A in a tree t, we say that DOM(π__φ), π, φ ∈ V*, holds of that node if there is a path from the root of the tree to the frontier which passes through the node labeled A and is of the form p1πAφp2 (p1, p2 ∈ V*).</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.2 Results on Local Constraints </SectionTitle> <Paragraph position="0"> Theorem 3.1 (Joshi and Levy 1977) Let G be a finite set of local constraint rules and τ(G) the set of trees analyzable by G. (It is assumed here that the trees in τ(G) are sentential trees; i.e., the root node of a tree in τ(G) is labeled by the start symbol, S, and the terminal nodes are labeled by terminal symbols.) Then the string language L(τ(G)) = {x | x is the yield of t and t ∈ τ(G)} is context-free.</Paragraph> <Paragraph position="1"> Example 3.2 Let V = {S,T,a,b,c,e} and Σ = {a,b,c,e}, and G be a finite set of local constraint rules: 1. S → e 2. S → aT 3. T → aS 4. S → bTc/(a__) ∧ DOM(T) 5. T → bSc/(a__) ∧ DOM(S) In rules 1, 2, and 3, the context is null, and these rules are context-free. In rule 4 (and in rule 5), the constraint requires an 'a' on the left, and that the node be dominated (immediately) by a T (by an S in rule 5). The language generated by G can be derived by G1:</Paragraph> <Paragraph position="3"> S1 → bTc In G1 there are additional nonterminals S1 and T1 that enable the context checking of the local constraints grammar, G, in the generation process.</Paragraph> <Paragraph position="4"> It is easy to see that, under the homomorphism that removes subscripts on the nonterminals T1 and S1, each tree generable in G1 is analyzable in G. 
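A domination predicate such as the DOM(T) of Example 3.2 can be checked mechanically on a tree. The sketch below uses our own encoding and names (not the paper's): the labels above the node must end in π, and some downward label path from the node must begin with φ:

```python
# Sketch (our encoding): checking DOM(pi __ phi) at a node -- some
# root-to-frontier path through the node must read p1 pi A phi p2.
def label_paths_down(t):
    """All label sequences from t down to a frontier node."""
    label, children = t
    if not children:
        return [[label]]
    return [[label] + p for c in children for p in label_paths_down(c)]

def dom_holds(ancestors, node, pi, phi):
    """ancestors: labels from the root down to (excluding) the node."""
    above_ok = ancestors[-len(pi):] == pi if pi else True
    if not phi:
        return above_ok
    below = [p for c in node[1] for p in label_paths_down(c)]
    return above_ok and any(p[:len(phi)] == phi for p in below)

# rule 4 of Example 3.2 demands DOM(T): the S node being rewritten
# must be immediately dominated by a T.
inner_s = ("S", [("b", []), ("T", [("e", [])]), ("c", [])])
```

So dom_holds(["S", "T"], inner_s, ["T"], []) succeeds, while the same node directly under the root S fails the constraint.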
Also, each tree analyzable in G has a homomorphic preimage in G1.</Paragraph> <Paragraph position="5"> The methods used in the proof of the theorem use tree automata to check the local constraint predicates,</Paragraph> </Section> </Section> <Section position="5" start_page="0" end_page="0" type="metho"> <SectionTitle> </SectionTitle> <Paragraph position="0"> since tree automata used as recognizers accept only tree sets whose yield languages are context-free.</Paragraph> <Paragraph position="1"> We now give an informal introduction to the ideas of (bottom-up) tree automata. Tree automata process labeled trees, where there is a left-to-right ordering on the successors of a node in the tree. When all the successors of a node v have been assigned states, then a state is assigned to v by a rule that depends on the label of v and the states of the successors of v considering their left-to-right ordering. Note that the automaton may immediately assign states to the nodes on the frontier of the tree since these nodes have no successors. If the set of states is partitioned into final and non-final states, then a tree is accepted by the automaton if the state assigned to the root is a final state. A set of trees accepted by a tree automaton is called a recognizable set. Note that the automaton may operate non-deterministically, in which case, as usual, a tree is accepted if there is some set of state assignments leading to its acceptance.</Paragraph> <Paragraph position="2"> The importance of tree automata is that they are related to the sets of derivation trees of context-free grammars. Specifically, if T is the set of derivation trees of a context-free grammar, G, then there is a tree automaton that recognizes T. 
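The bottom-up state-assignment procedure just described can be made executable in a few lines. This is a sketch in our own encoding (the transition table, state names, and toy grammar S → aS | e are illustrative, not from the paper):

```python
# Sketch (our encoding): a deterministic bottom-up tree automaton.
# delta maps (label, ordered child states) to a state; a tree is
# accepted if the state assigned to the root is final.
def run(tree, delta):
    label, children = tree
    states = tuple(run(c, delta) for c in children)
    return delta.get((label, states), "dead")   # "dead" = no rule holds

def accepts(tree, delta, finals):
    return run(tree, delta) in finals

# toy table recognizing derivation trees of the grammar S -> aS | e
delta = {
    ("a", ()): "qa",
    ("e", ()): "qe",
    ("S", ("qe",)): "qS",        # S -> e
    ("S", ("qa", "qS")): "qS",   # S -> a S
}
good = ("S", [("a", []), ("S", [("e", [])])])
bad = ("S", [("e", []), ("a", [])])
```

Here accepts(good, delta, {"qS"}) holds while the tree with its daughters reversed is rejected, mirroring the correspondence between recognizable sets and derivation trees stated in the text.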
Conversely, if T is the set of trees recognized by a tree automaton, A, then T may be systematically relabeled as the set of derivation trees of a context-free grammar.</Paragraph> <Paragraph position="3"> The basic idea presented in detail in Joshi and Levy 1977 is that, because tree automata have nice closure properties (closure under union, intersection, and concatenation), they can do the computations required to check the local constraints.</Paragraph> <Paragraph position="4"> Another way of looking at the checking of a labeled tree by a tree automaton is as follows. We imagine a finite state machine sitting at each node of a tree. The role of the finite state machine is to check that a correct rule application is made at the node it is checking. Initially, the nodes on the frontier are turned on and signal their parent nodes. At any other node in the tree, the machine at that node is turned on as soon as all its direct descendants are active. Assuming that at each node the machine for that node has checked that the rule applied there was one of the rules of the context-free grammar we are looking for, then when the root node of the tree signals that it has correctly checked the root we know that the tree is a proper tree for the given context-free grammar.</Paragraph> <Paragraph position="5"> When checking for local constraints, a machine at a given node not only passes information to its parent, as described above, but also passes information about those parts of the local constraints, corresponding to the given node as well as all its descendants, that have not yet been satisfied. The point is that this information is always bounded and hence a finite number of states are adequate to code this information.</Paragraph> <Paragraph position="6"> The fact that the closure properties hold can be seen as follows. Consider a slightly more general situation. We consider an A machine and a B machine at each node. 
Depending on the connections between these A and B machines, we obtain additional results.</Paragraph> <Paragraph position="7"> For example, as each A machine passes information to its parent, it may also pass information to the B machine, but the B machine will not pass information back to the A machine. The tree is accepted if the B machine at the root node of the tree ends up in a final state. Although this seems to be a more complicated model, it can in fact be subsumed in our first model and is the basis of an informal proof that the recognizable sets are closed under intersection, since the A machine and the B machine can check different rules.</Paragraph> <Paragraph position="8"> An important point is that the local constraint on a rule applied at a given node may only be verified by the checking automata at some distant ancestor of that node. In particular, in the case of a proper analysis constraint, it can only be verified at a node sufficiently high in the tree to dominate the entire string specified in the constraint.</Paragraph> <Paragraph position="9"> The perceptive reader may now be wondering what replaces all these hypothetical finite state machines when the set of trees corresponds to a context-free grammar. Well, if we were to convert our local constraints grammar into a standard form context-free grammar, we would require a larger nonterminal set.</Paragraph> <Paragraph position="10"> In effect this larger nonterminal set is an encoding of the finite state information stored at the nodes.</Paragraph> <Paragraph position="11"> The intuitive explanation presented in this section is, in fact, a complete characterization of recognizability. Given a context-free grammar, one can specify the finite state machine to be posted at each node of a tree to check the tree. 
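The A-machine/B-machine picture amounts to a product construction: run both machines at every node and take the pair of their states as the combined state, which is still a finite-state bottom-up computation. The sketch below is ours (a simplified deterministic rendering with two toy node-checking machines, not the paper's construction):

```python
# Sketch (ours): pairing two bottom-up machines at every node.
def run(tree, delta):
    label, children = tree
    return delta(label, tuple(run(c, delta) for c in children))

def product(delta_a, delta_b):
    """Combined machine whose states are (A state, B state) pairs."""
    def delta(label, child_states):
        a = delta_a(label, tuple(s[0] for s in child_states))
        b = delta_b(label, tuple(s[1] for s in child_states))
        return (a, b)
    return delta

# two toy properties: A tracks leaf-count parity, B checks that no
# node has more than two children (both chosen only for illustration).
def delta_a(label, states):
    return sum(states) % 2 if states else 1

def delta_b(label, states):
    return all(states) and len(states) <= 2

t = ("S", [("a", []), ("S", [("e", [])])])
```

Since the pair set is finite, the product is again a tree automaton, which is the informal content of closure under intersection.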
And conversely, given the finite state machine description, one can derive the equivalent context-free grammar.</Paragraph> <Paragraph position="12"> The essence of the local constraints formulation is to paraphrase the finite state checking at the nodes of the tree in terms of patterns or predicates.</Paragraph> </Section> </Section> <Section position="6" start_page="0" end_page="0" type="metho"> <SectionTitle> 4. Some Generalizations </SectionTitle> <Paragraph position="0"> The result of Theorem 3.1 can be generalized in various ways. Generalizations in (i) and (ii) below are immediate.</Paragraph> <Paragraph position="1"> (i) Variables can be included in the constraint. Thus, for example, a local constraint rule can be of the form A → ω/(B C D X E F Y G) where A, B, C, D, E, F, G are nonterminals, ω is a string of terminals and/or nonterminals, and X and Y are variables that range over arbitrary strings of terminals and nonterminals.</Paragraph> <Paragraph position="2"> (ii) Finite tree fragments can be included in the constraint. Thus, for example, a local constraint rule can be of the form. Another useful generalization has the following essential character.</Paragraph> <Paragraph position="3"> (iii) Predicates that relate nodes mentioned in the proper analysis predicates and domination predicates (associated with a rule), as well as nodes in finite tree fragments dominated by these nodes, can be included in the constraint. Unfortunately, at this time we are unable to give a precise characterization of this generalization. 
The following two predicates are special cases of this generalization, and Theorem 3.1 holds for these two cases.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.1 Local Constraints in Semantics </SectionTitle> <Paragraph position="0"> Since a local constraint rule has a context-free part and a contextual constraint part, it is possible to define context-sensitive compositional semantics in the following manner.</Paragraph> <Paragraph position="1"> For a context-free rule of the form A → BC, if σ(A), σ(B), σ(C) are the 'semantic' translations associated with A, B, and C, respectively, then σ(A) is a composition of σ(B) and σ(C).</Paragraph> <Paragraph position="2"> For a local constraint rule of the form A → BC/P, where A → BC is the context-free part and P is the contextual constraint, we can have σ(A) as a composition of σ(B) and σ(C) which depends on P. This idea has been pursued in the context of programming languages (Joshi, Levy, and Yueh 1980). Whether such an approach would be useful for natural language is an open question. (An additional comment appears in Section 5.) 5. Linked Nodes 4 (Peters's and Karttunen's framework) Peters 1980 and Karttunen 1980 have proposed a device for linking nodes to handle unbounded dependencies. Thus, for example, instead of (6) or (7), we have (8).</Paragraph> <Paragraph position="4"> The dotted line that loops from the VP node back to the moved constituent is a way of indicating the location of the gap in the object position under the VP.</Paragraph> <Paragraph position="5"> The link also indicates that there is a certain dependency between the gap and the dislocated element. Both in our approach and in that of Peters and Karttunen, proliferation of categories as in Gazdar's approach is avoided. Further, for Peters and Karttunen, while carrying 4 We give a very informal description of a linked tree. A precise definition can be found in S. Peters and R.W. 
Ritchie, Phrase Linking Grammars, Technical Report, Department of Linguistics, University of Texas at Austin, 1982.</Paragraph> <Paragraph position="6"> out bottom-up semantic translation, the moved constituent is &quot;visible&quot; at the VP node. In our approach, this &quot;visibility&quot; can be obtained if the translation is made to depend on the contextual constraint which, of course, has already been checked prior to the translation. This is the essence of our suggestion in Section 4.1.</Paragraph> <Paragraph position="7"> Karttunen 1980 has constructed a parser incorporating the device of linked nodes. Karttunen also discusses the problem of complex patterns of moved constituents and their associated gaps or resumptive pronouns. This is not easy to handle in Gazdar's framework without multiplying the categories even further, e.g., by providing categories such as S/NP NP, etc. 5 Karttunen handles this problem by essentially incorporating the checking of the patterns of gaps and fillers in the parser, i.e., in the control structure of the parser. Our approach can be regarded as somewhat intermediate between Gazdar's and that of Peters and Karttunen in the following sense. We avoid multiplication of categories, as do Peters and Karttunen. On the other hand, the relationship between the moved constituent and the gap is expressed in the grammar itself (more in the spirit of Gazdar) instead of in the parser (more precisely, in the data structure created by the parser) as in the Peters and Karttunen approach.</Paragraph> <Paragraph position="8"> We have not pursued the topic of multiple gaps and fillers in our framework but, obviously, in it we would opt for Karttunen's suggestion of checking the constraints on the patterns of gaps and fillers in the parser itself. 
It could not be done by local constraints alone because local constraints essentially do the work of the links in the Peters and Karttunen framework.</Paragraph> </Section> </Section> <Section position="7" start_page="0" end_page="0" type="metho"> <SectionTitle> 6. Skeletal Structural Descriptions (Skeletons) </SectionTitle> <Paragraph position="0"> In Section 4, we showed how local constraints allowed us to prevent proliferation of categories. We can dispense with the local constraints and construct an equivalent context-free grammar that would have potentially a very large number of categories. While pursuing the relation between 'structure' and the size of the nonterminal vocabulary (i.e., the syntactic categories), we were led to the following surprising result: the actual labels, in a sense, carry no information.</Paragraph> <Paragraph position="1"> (This result was also used by us in developing some heuristics for converting a context-free grammar into a more compact but equivalent local constraints grammar. We will not describe this use of our result in the present paper; for further information, see Joshi, Levy, and Yueh 1980.) First we need some definitions. A phrase structure tree without labels will be called a skeletal structural</Paragraph> </Section> <Section position="8" start_page="0" end_page="0" type="metho"> <SectionTitle> 5 S/NP NP means an S tree with two NP type holes. </SectionTitle> <Paragraph position="0"> description or a skeleton. A skeleton exhibits all of the grouping structure without naming the syntactic categories. For example, (9) is a skeleton. The structural description is characterized only by the shape of the tree and not the associated labels. 
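The passage from structural descriptions to skeletons is simply label erasure, which can be sketched in a few lines (the tree encoding and the "*" marker for an unlabeled interior node are our own illustrative choices, not the paper's notation):

```python
# Sketch (our encoding): erase nonterminal labels, keeping only the
# grouping structure and the symbols at the frontier (cf. Section 6).
def skeleton(tree):
    label, children = tree
    if not children:                  # frontier: keep the symbol
        return (label, [])
    return ("*", [skeleton(c) for c in children])

# two derivation trees with different labelings but the same shape
t1 = ("S", [("a", []), ("T", [("b", []), ("S", [("e", [])]), ("c", [])])])
t2 = ("T", [("a", []), ("S", [("b", []), ("T", [("e", [])]), ("c", [])])])
```

Both trees map to the same skeleton, which is the sense in which the category labels, taken by themselves, carry no information beyond the grouping.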
The only symbols appearing in the structure are the terminal symbols (more precisely, the preterminal symbols and the terminal symbols, in the linguistic context, as in (10); however, for the rest of the discussion, we will take skeletons to mean trees with terminal symbols only).</Paragraph> <Paragraph position="1"> Let G be a context-free grammar and let T G be the set of sentential derivation trees (structural descriptions) of G. Let S G be the skeletons of G, i.e., all trees in T G with the labels removed.</Paragraph> <Paragraph position="2"> It is possible to show that for every context-free grammar G we can construct a skeletal generating system (consisting of skeletons and skeletal rewriting rules) that generates exactly S G; i.e., all category labels can be eliminated while retaining the structural descriptions. In this system, generation proceeds from an initial skeleton through a sequence of intermediate skeletons to the desired skeleton. Clearly, because of the definition of a skeleton and the nature of the skeletal rewriting rules, the rules must always apply to one of the lowermost configurations in a skeleton that matches with the left-hand side of a rule. Thus the derivation of the skeleton (3) in S G would be as in (11). The configurations encircled by a dotted line are the ones to which the skeletal rule is applied.</Paragraph> <Paragraph position="3"> In the above example, there was only one nonterminal; hence the result is obvious. Following is a somewhat more complicated example.</Paragraph> <Paragraph position="4"> context-free grammar G' that is equivalent to G.</Paragraph> <Paragraph position="5"> Rather than taking a complicated context-free grammar and then exhibiting the equivalent skeletal grammar, we will take the local constraints grammar G and exhibit a skeletal grammar equivalent to G. 
This will allow us to present a complicated example without making the resulting skeletal grammar too unwieldy.</Paragraph> <Paragraph position="6"> Also, this example will give some idea about the relationship between local constraints grammars and skeletal grammars; in particular, the skeletal rewriting rules indirectly encode the local constraints in the rules in Example 6.2.</Paragraph> <Paragraph position="7"> We have eliminated all labels by introducing structural rewriting rules and defining the derivation as proceeding from skeleton to skeleton rather than from string to string. This result clearly brings out the relationship between the grouping structure and the syntactic categories labeling the nodes.</Paragraph> <Paragraph position="9"> </Paragraph> <Paragraph position="10"> Since skeletons pay attention to grouping only, this result may be psycholinguistically important because our first intuition about the structure of a sentence is more likely to be in terms of the grouping structure and not in terms of the corresponding syntactic categories, especially those beyond the preterminal categories. The theory of skeletons may also provide some insight into the problem of grammatical inference. For a finite state string automaton, it is well known that if the number of states is k then, if we are presented with all acceptable strings of length ≤ 2k, the finite state automaton is completely determined. We have a similar situation with the skeletons. First, it can be shown that for each skeletal set S G (i.e., the set of skeletons of a context-free grammar) we can construct a bottom-up tree automaton that recognizes precisely S G (Levy and Joshi 1978). Further, if the number of states of this automaton is k, then the set of all acceptable skeletons of depth ≤ 2k completely determines S G (Levy and Joshi 1979). 
Using skeletons (i.e., strings with their grouping structure) rather than just strings as input to a grammatical inference machine is an idea worth pursuing further.</Paragraph> </Section> </Paper>