<?xml version="1.0" standalone="yes"?> <Paper uid="J89-3001"> <Title>PRACTICAL PARSING OF GENERALIZED PHRASE STRUCTURE GRAMMARS</Title> <Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> 3 A PARSING ALGORITHM FOR GPSGs </SectionTitle> <Paragraph position="0"> The algorithm belongs to the class of algorithms that obtain a grammar G', variously called a skeleton grammar or an underlying grammar, from a given grammar G and then parse according to G'. In these algorithms the skeleton grammar G' is chosen such that L(G) ⊆ L(G'), so, if the parse according to G' fails, the sentence can be rejected immediately. If the parse succeeds, it is necessary to check some additional constraints, typically by examining the parse tree, to ensure that the sentence is indeed acceptable to the more restrictive given grammar G. The extra checking process typically annotates the parse tree with extra information but does not change its shape. At the end of the checking process, either the sentence is rejected as not conforming to G, or the sentence is accepted, in which case the annotated parse tree is the parse tree of the sentence according to G. Wegner's (1980) algorithm for VWGs belongs to this class.</Paragraph> <Paragraph position="1"> In the present algorithm, the skeleton grammar G' is a GPSG that is obtained from a given GPSG G by neglecting some of the FCRs and the percolating feature propagation constraints. The skeleton grammar can be parsed by a simple modification of Earley's (1970) algorithm. The algorithm comprises a precompilation phase, in which the skeleton grammar G' is obtained from the given grammar G, followed by three parse-time phases that are executed one after the other.</Paragraph> <Paragraph position="2"> Precompilation. 
Given a GPSG G = (V_P, V_T, X_0, R, F, F_P, F_T), first define the side-effect-free FCR set F' algorithmically in the following manner.</Paragraph> <Paragraph position="3"> Computational Linguistics, Volume 15, Number 3, September 1989 141 Anthony J. Fisher Practical Parsing of Generalized Phrase Structure Grammars 1. Convert F to clausal form, in other words a conjunction of disjunctions of terms, each of which is either an unnegated literal feature or a negated literal feature. (This can be done uniquely, apart from questions of ordering of terms; see, e.g., Loveland (1978:32ff).) 2. Remove from the clause set all clauses (i.e., disjunctions) that contain one or more unnegated literals, leaving behind only those clauses that contain only negated literals.</Paragraph> <Paragraph position="4"> The resulting set of clauses represents the side-effect-free FCR set F'. Since F' was obtained from F by removing clauses, the resulting function F' cannot be more restrictive than F; in other words, F ⊇ F'.</Paragraph> <Paragraph position="5"> The reason for removing clauses that contain unnegated literals is that the evaluation of FCRs can, in general, cause the instantiation of new features on a node. This can be viewed as a &quot;side effect&quot; of the evaluation, whose primary function is to filter out inadmissible parses. Side effects are difficult to handle, because they interact with each other and with other aspects of the grammar, in particular with the propagation constraints. For example, a new feature added as a result of a &quot;non-side-effect-free&quot; FCR clause might cause some other clause, which was satisfied before the new feature was added, to become false. 
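The two precompilation steps can be sketched in a few lines of Python. The clause representation here, a clause as a set of (feature, negated) pairs, is this sketch's own assumption rather than the paper's data structure:

```python
def side_effect_free(fcr_clauses):
    """Step 2: keep only the clauses all of whose literals are negated;
    such clauses can filter parses but can never instantiate a feature."""
    return [clause for clause in fcr_clauses
            if all(negated for (_feature, negated) in clause)]

# [VFORM] ⊃ [V +] in clausal form is ¬[VFORM] ∨ [V +]: it contains an
# unnegated literal, so it is dropped; the group clause
# ¬[PAST +] ∨ ¬[PAST -] survives into F'.
clauses = [
    {("VFORM", True), ("V +", False)},      # has a side effect
    {("PAST +", True), ("PAST -", True)},   # side-effect-free
]
assert side_effect_free(clauses) == [{("PAST +", True), ("PAST -", True)}]
```

Step 1, the conversion to clausal form itself, is standard and omitted from the sketch.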
This is not possible, however, if no clause contains an unnegated literal: each clause can be satisfied only by the absence of one or more stated features, and if the required features are not absent, the clause yields false; there is no mechanism in GPSG for removing features from a category.</Paragraph> <Paragraph position="6"> Now define the skeleton grammar G' of G by G' = (V_P, V_T, X_0, R, F', ∅, F_T).</Paragraph> <Paragraph position="7"> Clearly L(G) ⊆ L(G'), since whenever conditions 4 and 5 of section 2 hold for a derivation according to G, they will also hold for a corresponding derivation according to G'. (Remember that F ⊇ F'.) Informally, G' is more permissive than G. For the same reason, each parse according to G has a corresponding parse according to G', differing only in the distribution of features among categories on nodes. In other words, each parse tree according to G is the same &quot;shape&quot; as some parse tree (of the same sentence) according to G'.</Paragraph> <Paragraph position="8"> Phase 1. We now parse G' by applying a modified form of Earley's algorithm (Earley 1970; see also Pulman 1985, Ritchie and Thompson 1984). (The reader is assumed to be familiar with Earley's algorithm, in particular with the rôle played by the predictor in adding new states to a state set.) The algorithm is extended so that it creates a parse tree as the parse progresses. A method for doing this is described briefly by Earley (1970) and in more detail by Earley (1968).</Paragraph> <Paragraph position="9"> There is no need to handle the percolating feature propagation constraint at this stage, because in G' F_P is empty. There is no need to consider what happens when the evaluation of an FCR causes a new feature to be added to a category, since the FCR set F' is side-effect-free. 
We can therefore treat G' as a CFG whose non-terminals are categories, provided that we allow for extension (condition 3 of section 2) and the trickling feature constraint (condition 5(ii)). This is done in the following way.</Paragraph> <Paragraph position="10"> A category appears on a node by virtue of the appearance of a category c1 on the right-hand side of a rule R1 and the appearance of a &quot;matching&quot; category c2 on the left-hand side of a rule R2. The extension condition permits the free addition of features to c1 and to c2 to generate the category c which appears on the node. By the extension condition, c1 ⊆ c and c2 ⊆ c, which implies that c ⊇ (c1 ∪ c2). Now let us neglect for the moment the trickling feature constraint. Since we are ignoring the non-side-effect-free FCRs and the propagation constraints, any superset of c1 ∪ c2 which satisfies F'(c) will suffice; consequently, we take the smallest superset, namely c1 ∪ c2, which is the least upper bound of c1 and c2 under the ordering relation of extension (see Gazdar et al. 
1985:39).</Paragraph> <Paragraph position="11"> Now we consider the trickling feature constraint.</Paragraph> <Paragraph position="12"> The effect of this constraint is to instantiate extra features on certain categories: those features that belong to a mother category and which are also members of F_T must be instantiated on each daughter category.</Paragraph> <Paragraph position="13"> The category that is instantiated on the node of the parse tree is the smallest superset of c1 ∪ c2 which contains all of its mother's trickling features, which is c1 ∪ c2 ∪ (c0 ∩ F_T), where c0 is the category of the mother node.</Paragraph> <Paragraph position="14"> To determine the category to place on a node of the parse tree, therefore, the algorithm needs to know: * the a priori category c1 on the right-hand side of a rule; * the a priori category c2 on the left-hand side of the rule which &quot;matches&quot; c1; * the fully-evaluated a posteriori category c0 on the mother of the node to which a category is currently being assigned.</Paragraph> <Paragraph position="15"> All of this information is available to the predictor in Earley's algorithm. This follows from the fact that Earley's algorithm is &quot;top down&quot;, which means that the full category on a node is known before any of that node's daughters are considered.</Paragraph> <Paragraph position="16"> We now consider how Earley's algorithm can be modified to parse the skeleton grammar in the manner outlined above. A state in Earley's standard algorithm can be written as X → α · β, which signifies that the algorithm is considering the rule X → αβ, and has successfully matched the α with some portion of the sentence being parsed. The predictor is applied to states X → α · Yβ with a non-terminal Y immediately to the right of the dot.</Paragraph> <Paragraph position="18"> The predictor adds new states Y → · γ for each rule Y → γ with matching non-terminal Y.</Paragraph> <Paragraph position="19"> In the new algorithm, a state is written (c0) c → α · β. 
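The category computed for a node, c1 ∪ c2 ∪ (c0 ∩ F_T), can be expressed directly over sets. A minimal Python sketch, in which the feature names and the membership of F_T are invented for illustration:

```python
def node_category(c1, c2, c0, f_t):
    """Smallest category satisfying extension (c ⊇ c1 ∪ c2) and the
    trickling constraint: c1 ∪ c2 ∪ (c0 ∩ F_T)."""
    return c1 | c2 | (c0 & f_t)

F_T = {"PER", "PLU"}              # trickling features (assumed)
c1 = {"N +", "V -"}               # a priori category from the rule's RHS
c2 = {"N +", "BAR 2"}             # matching a priori category from a LHS
c0 = {"N +", "V -", "PER"}        # a posteriori category on the mother node
assert node_category(c1, c2, c0, F_T) == {"N +", "V -", "BAR 2", "PER"}
```

Only PER trickles down here, because PLU is absent from the mother and the remaining features of c0 are not in F_T.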
As in the standard algorithm, this signifies that the algorithm is considering the rule c → αβ, and has successfully matched the α. The extra category c0 contains the features that are passed from mother to daughter and which will ultimately appear on a node of the parse tree.</Paragraph> <Paragraph position="20"> The predictor in the new algorithm is applied to states (c0) c → α · c1β with a non-terminal category c1 to the right of the dot. The predictor adds new states (c1 ∪ c2 ∪ (c0 ∩ F_T)) c2 → · γ for each rule c2 → γ such that F'(c1 ∪ c2 ∪ (c0 ∩ F_T)) holds. (At first sight it appears that this entails a search through all of the rules of the grammar, but a means of avoiding a full search is presented in section 4.) The rôles of the scanner and of the completer in Earley's standard algorithm are unchanged in the new algorithm. The initial state that is entered to start the parse is (∅) φ → · X_0. The associated &quot;dummy&quot; rule φ → X_0 is not considered part of the grammar, and is exempt from being matched by the predictor.</Paragraph> <Paragraph position="21"> It would be quite possible to use a &quot;bottom up&quot; algorithm in place of Earley's algorithm, in which the rôles of trickling and percolating features would be reversed. It is not possible to handle both percolating and trickling features in phase 1, since a provisional decision at some point deep down in the parse tree to instantiate a feature on a certain category would in general cause changes to the membership of categories in remote parts of the tree.</Paragraph> <Paragraph position="22"> Phase 2. 
The &quot;parse tree&quot; that is generated by Earley's algorithm is in general not a tree at all; it is a directed graph. Besides non-terminal nodes and terminal nodes, the graph will in general contain branching nodes that point to alternative daughters of a non-terminal node. It is by this means that multiple parses, arising from an ambiguous sentence, are represented. If the degree of ambiguity of the sentence with respect to the skeleton grammar is infinite, the finite graph must represent infinitely many distinct parse trees; in this case the graph is cyclic. We assume that the degree of ambiguity is finite, in which case the graph is a directed acyclic graph (DAG). A DAG differs from a tree in that whereas each node in a tree (except the root) has precisely one parent, a node in a DAG may have more than one parent. In other words, a DAG represents common sub-trees only once; a single sub-tree may be descended from several parents. DAGs are often used in the construction of compilers for computer programming languages.</Paragraph> <Paragraph position="23"> Let p be the degree of ambiguity of the sentence with respect to the skeleton grammar, i.e., the number of distinct parse trees represented by the DAG. We expand the DAG, generating p distinct parse trees. This can easily be done by means of conventional tree processing techniques, provided that p is finite, in other words if the graph is acyclic.</Paragraph> <Paragraph position="24"> Phase 3. Each distinct parse tree is examined in turn.</Paragraph> <Paragraph position="25"> For each tree, sufficient features are added to the categories on each node of the tree to cause the tree to reflect a parse according to the original GPSG G. This entails the evaluation of F and of the propagation constraints on each category, and the construction of a category on each node which satisfies all of the constraints. Once again, the smallest possible category is constructed. 
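The phase 2 expansion admits a short recursive statement. A sketch, assuming a packed representation (invented here) in which a node is a leaf, an &quot;and&quot; node holding a sequence of daughters, or an &quot;or&quot; branching node holding alternatives:

```python
from itertools import product

def expand(node):
    """Return the list of distinct parse trees a DAG node represents;
    terminates precisely when the graph is acyclic (p finite)."""
    kind, payload = node
    if kind == "leaf":
        return [node]
    if kind == "or":                     # branching node: alternatives add
        return [tree for alt in payload for tree in expand(alt)]
    # "and" node: alternatives in distinct daughters multiply
    return [("and", list(kids)) for kids in product(*map(expand, payload))]

a, b = ("leaf", "a"), ("leaf", "b")
alt = ("or", [a, b])
assert len(expand(alt)) == 2
assert len(expand(("and", [alt, alt]))) == 4   # p multiplies across daughters
```

The second assertion shows why p can grow so quickly: independent branching nodes under one rule multiply the tree count.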
That is, if a category c satisfies all of the constraints, and so does a larger category c ∪ c', we choose c. It is debatable whether this is the correct behaviour; some might argue that separate parse trees ought to be constructed in which all possible legal extensions are shown. However, the resulting set of parse trees would then in general be very large, and it is difficult to believe that this behaviour is desirable. Our smallest category is similar to the most general unifier of a set of expressions in mathematical logic; as in logic, particular, less general instances can be derived from the most general case, but it is the most general (least fully specified) case that is of most interest.</Paragraph> <Paragraph position="26"> We assume that the FCR set F is expressed in clausal form and that each clause (i.e., each disjunction) is a Horn clause (a clause with either zero or one unnegated literal; see, e.g., Loveland (1978:99)). Number the clauses F1, ..., FM.</Paragraph> <Paragraph position="27"> We denote by M(N) the mother of the node N, if it exists (i.e. unless N is the root of its tree). We denote by C(N) the category on the node N.</Paragraph> <Paragraph position="28"> Let the distinct parse trees produced by phase 2 be T1, ..., Tp. The algorithm unify, defined below, is applied to each Ti in turn, for i = 1, ..., p.</Paragraph> <Paragraph position="29"> unify (T): Let the non-terminal nodes in T be N1, ..., Nq.</Paragraph> <Paragraph position="31"> 1. again := false; 2. for j = 1, ..., q do if M(Nj) exists then begin 2.1 if (C(M(Nj)) ∩ F_T) ⊄ C(Nj) then begin 2.1.1 add C(M(Nj)) ∩ F_T to C(Nj); 2.1.2 again := true end; 2.2 if (C(Nj) ∩ F_P) ⊄ C(M(Nj)) then begin 2.2.1 add C(Nj) ∩ F_P to C(M(Nj)); 2.2.2 again := true end end; 3. for j = 1, ..., q do 3.1 for k = 1, ..., M do begin 3.1.1 let f+ be the set of unnegated literals in Fk; 3.1.2 let f- be the set of negated literals in Fk; 3.1.3 if f- ⊆ C(Nj) then 3.1.3.1 if f+ = ∅ then fail 3.1.3.2 else begin 3.1.3.2.1 add the feature in f+ to C(Nj); 3.1.3.2.2 if it was not already in C(Nj) then again := true end end;</Paragraph> <Paragraph position="33"> 4. if again then go to step 1; output T.</Paragraph> <Paragraph position="34"> Proof of the algorithm. First notice that the flag again is set (in steps 2.1.2, 2.2.2, and 3.1.3.2.2) whenever a feature is added to a category that is not already in that category, and at no other time. 
Since there are only finitely many features, steps 1 to 4 are repeated only finitely many times, so the procedure terminates.</Paragraph> <Paragraph position="35"> Next observe that on successful termination, again is false, so steps 2.1 and 2.2 must have been obeyed for each node with the conditions in 2.1 and 2.2 false each time. Consequently, on successful termination, the propagation constraints hold for each node.</Paragraph> <Paragraph position="36"> Finally, on successful termination, the FCR set F also holds for each node, for the following reasons. Step 3.1.3 checks the negated literals in the clause Fk against the category C(Nj). If the condition in 3.1.3 is false, there is at least one negated literal in Fk which is indeed absent from C(Nj), so the clause Fk is satisfied. If, on the other hand, the condition in 3.1.3 is true, none of the negated literals can possibly be satisfied, since features may not be removed from a category, only added. Since Fk is a Horn clause, there is at most one unnegated literal in Fk, so f+ is either empty or has one member. If f+ is empty, the clause can not be satisfied, so the algorithm fails. If f+ is not empty, the feature is added to the category if it is not already there, and the clause is thereby satisfied. The addition of the new feature might invalidate previously satisfied clauses or propagation constraints, so the flag again is set which causes the propagation constraints and FCR clauses to be checked afresh. 
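The fixpoint behaviour just described can be seen in a compact sketch of unify over Python sets. The tree encoding (dicts keyed by node) and the clause encoding (a pair of the negated features and the optional unnegated head feature) are assumptions of this sketch, not the paper's representation:

```python
def unify(categories, mother, f_t, f_p, clauses):
    """categories: node -> set of features (mutated in place);
    mother: node -> its mother (roots absent); clauses: Horn clauses as
    (negated_features, unnegated_feature_or_None). Returns the categories
    on success, None on failure."""
    again = True
    while again:
        again = False
        for n, c in categories.items():            # step 2: propagation
            m = mother.get(n)
            if m is None:
                continue
            down = (categories[m] & f_t) - c       # trickling features
            if down:
                c |= down; again = True
            up = (c & f_p) - categories[m]         # percolating features
            if up:
                categories[m] |= up; again = True
        for n, c in categories.items():            # step 3: FCR clauses
            for f_neg, f_pos in clauses:
                if f_neg <= c:                     # no negated literal helps
                    if f_pos is None:
                        return None                # empty head: fail
                    if f_pos not in c:
                        c.add(f_pos); again = True # satisfy via head literal
    return categories

# [PER] ⊃ [N +] as the Horn clause ¬[PER] ∨ [N +]; PER trickles.
cats = {"m": {"PER"}, "d": set()}
result = unify(cats, {"d": "m"}, {"PER"}, set(), [(frozenset({"PER"}), "N +")])
assert result["d"] == {"PER", "N +"} and result["m"] == {"PER", "N +"}
```

In the example, PER first trickles from the mother to the daughter, and the clause then forces N + onto both nodes; a second sweep finds nothing new and the loop terminates.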
As noted, the process will eventually terminate with all propagation constraints and all FCR clauses satisfied, or else the algorithm will fail, in which case the sentence does not belong to the language generated by the original grammar G.</Paragraph> <Paragraph position="37"> End of proof.</Paragraph> <Paragraph position="38"> Now consider what would happen if one of the clauses were not a Horn clause. The algorithm would not know which of the several features from f+ to add in step 3.1.3.2.1 in order to satisfy the clause. The only solution would seem to be to generate copies of the parse tree, and to follow through each choice of feature from f+ on a different copy of the parse tree, finally presenting the user of the parsing system with all of the parse trees. This would cause a combinatorial explosion, since the splitting and copying would have to be done at each level of the parse tree at which the particular feature in question is instantiated.</Paragraph> <Paragraph position="39"> The linguistic consequences of the Horn clause restriction are not clear, but experience with the parsing system suggests that they are not severe. The Horn clause restriction prohibits the grammar writer from writing FCRs such as [PRD +] ∧ [VFORM] ⊃ [VFORM PAS] ∨ [VFORM PRP] (Gazdar et al. 1985:111), in which a disjunction of unnegated literals appears on the right of ⊃. It is in such FCRs that the Horn clause restriction appears in its true colours, as a mechanism for curbing a combinatorial explosion or, to put it another way, a mechanism for prohibiting a source of non-determinism. If the consequences of forbidding such FCRs later appear too severe, the possibility will be investigated of moving the non-determinism from the FCRs into the rules of the grammar, by replacing an FCR like the one above by a new FCR [PRD +] ∧ [VFORM] ⊃ [F] where F is a new feature, and adding appropriate rules to the grammar. 
The details of this have yet to be worked out; it is presented as a possible solution to a problem that has not yet arisen.</Paragraph> <Paragraph position="40"> Time and space bounds. The following parameters are relevant to a consideration of time and space bounds for the algorithm: n, the length of the sentence; G, the size of the grammar; K, the number of features; and p, the degree of ambiguity of the sentence with respect to the skeleton grammar. Earley's algorithm, as is well known, operates in time order G²n³. Earley's proof of the time complexity of his algorithm (Earley 1970) is in no way affected by the elaboration of the predictor to handle feature matching. In particular, the number of states in a state set does not increase with K. Although K features may in principle be combined to construct 2^K different categories, the algorithm generates new categories by extension only when they are required. In fact, if a new feature specification is added to a rule that is previously unspecified for that feature, the state sets will either remain the same size or become smaller, since adding a feature restricts the range of rules that the rule in question will &quot;match&quot;. Speaking informally, it is under-specified rules that cause the problems; the more fully specified the rules are, the closer the GPSG is to a CFG, and the fewer states are needed.</Paragraph> <Paragraph position="41"> The factor K does, however, enter into the time bound for phase 1 in the following manner. Although the number of &quot;primitive steps&quot; (Earley's terminology) that are executed by the modified algorithm is independent of K, the time taken to complete certain primitive steps, in particular the addition of a state to a state set and the feature matching operation in the predictor, is proportional to K. The overall time bound is therefore KG²n³. The expansion of the DAG to yield p distinct parse trees can be done by conventional tree processing techniques in time proportional to p, the number of nodes in each tree, and the size of a node (which affects the time taken to copy a node). 
This gives a bound of order pKGn² for phase 2.</Paragraph> <Paragraph position="42"> The algorithm unify contains three nested loops (steps 3 and 3.1, and the again loop). An upper bound on the number of nodes is a constant times Gn², and an upper bound on the number of times round the again loop is the number of features, K. To simplify the analysis, we take M ≤ G. (Formally, we define G to be the sum of the number of rules in the grammar and the number of clauses in the FCR set.) Moreover, the set operation ⊆ in step 3.1.3 can realistically be expected to take time proportional to K, although the operations involving f+ can be done in constant time, since f+ has either zero or one member. A time bound for unify is therefore K²G²n². Finally, unify is obeyed p times, which gives a time bound for phase 3 of order pK²G²n².</Paragraph> <Paragraph position="43"> It is unfortunate that, as noted earlier, p is not in general independent of n. To see why this is so, consider the following grammar.</Paragraph> <Paragraph position="44"> It has been shown (Church and Patil 1982) that the number of distinct trees generated, for a sentence of length n, grows factorially with n. This means that the algorithm as a whole will take factorial time to parse a sentence of length n according to this grammar. This is a matter of concern, because constructions similar to this example are commonly used to handle coordination. It is even possible in principle for p to be infinite, in which case the algorithm will not terminate (although the advertised time bound of pK²G²n³ will still hold!). In practice, however, no grammar has been encountered which unavoidably has infinite p. (Self-referential rules of the form X → X have occasionally appeared, but these were always traced to an error in the grammar.) 
Considerable effort has been expended in an attempt to improve the theoretical worst-case performance of the algorithm when p is a finite-valued but rapidly increasing function of n. It might be possible to combine phases 2 and 3, employing &quot;lazy evaluation&quot; (a technique often used in functional programming) to expand the DAG only when necessary. If this were done, much of the DAG might remain unexpanded, with consequent savings in time and space. The problem with this approach is that some features are required to percolate right to the root of their tree, and a given branching point might have different (and incompatible) features percolating to it from each of its alternative descendants. It often turns out to be necessary to expand the DAG all the way back to the root, in which case little is saved by using lazy evaluation. It is worth pointing out that, in cases (such as the example) in which the algorithm is least efficient, the output is often very large, consisting of many parse trees. In many (but not all) of these cases, the time taken is asymptotically linear in the length of the output, i.e., the number of nodes in the set of parse trees displayed. Surely, no algorithm can ever behave sublinearly in the length of its output. Furthermore, as discussed later, in cases in which this problem does not arise, the execution time is dominated by phase 1. We therefore have an algorithm that: * behaves as well as one of the best general CF parsing algorithms, for all unambiguous grammars and for many ambiguous grammars; * takes time that is linear in the length of the output for some &quot;problem&quot; grammars; and * takes a very long time in a small number of really awkward cases.</Paragraph> <Paragraph position="45"> The time bounds for the three phases are KG²n³, pKGn², and pK²G²n². 
This gives an overall worst-case time bound of order pK²G²n³.</Paragraph> <Paragraph position="46"> The space bound is of order pKGn³ in the worst case, for the following reasons. Earley's algorithm requires space proportional to KGn² to hold the states, n for the state sets (that is, the list-processing overhead), and KGn³ for the DAG. The grammar itself requires space proportional to KG. The p distinct parse trees require space proportional to KGn² each. Phase 3 does not require any working storage. The worst-case space bound is therefore of order pKGn³.</Paragraph> </Section> <Section position="5" start_page="0" end_page="0" type="metho"> <SectionTitle> 4 IMPLEMENTATION </SectionTitle> <Paragraph position="0"> A practical GPSG parsing system has been constructed, based on the algorithm just described. The system comprises a table generator and a parser. The system was originally written in the programming language BCPL, and ran on a VAX 780 computer under the Unix operating system. The system has recently been re-implemented in C to run on a Sun 3/50 workstation. The Sun version generally runs several times faster than the VAX version. The parse times given below relate to the slower VAX implementation.</Paragraph> <Paragraph position="1"> The table generator performs the precompilation phase of the algorithm. It generates a tabular representation of the skeleton grammar, which the parser can interpret more efficiently than it could the &quot;raw&quot; rules, and it converts the FCR set into clausal form. The table generator also performs various checks to ensure, as far as possible, that the grammar is well formed. Besides the obvious syntactic checks (to detect such errors as a comma in the wrong place), the table generator checks that the FCR set is not identically false, that there are no obvious blind alleys or non-reachable categories (this is not checked rigorously), and that various other subtle &quot;well-formedness&quot; conditions are satisfied. 
This error checking has proved very useful in practice, since GPSGs are notoriously difficult to debug.</Paragraph> <Paragraph position="2"> The input grammar is written in the notation of Gazdar et al. (1985), with a few concessions to the limitations of the typical computer input device. In particular, features have values, and what we have referred to as a feature is, in the notation accepted by the table generator, a feature-value pair, written [f v]. Each distinct feature-value pair is associated by the table generator with a particular bit in a computer word.</Paragraph> <Paragraph position="3"> A category is represented by a set of bits, i.e., by a word with several bits set, one for each feature-value pair in the set. Category-valued features correspond to trees, and a distinct bit is allocated to each terminal node of the tree. For example, the feature PAST, with two values + and -, would have two bits allocated to it, and for a category-valued feature SLASH, the values [N +, V -] and [N -, V +] would be allocated four bits. Note that, in any given grammar, the depth of the tree induced by a category-valued feature is finite; furthermore, the range of possible values of a category-valued feature is known at table generation time, so it is known at this stage how many bits to allocate to the feature.</Paragraph> <Paragraph position="4"> The representation of feature-value pairs by bit positions in a computer word allows the very efficient logical instructions of the computer (∩, ∪, ¬), which operate on a whole word of bits at a time, to be used.</Paragraph> <Paragraph position="5"> As explained earlier, a feature in GPSG 85 may take at most one value at a time, since a GPSG 85 feature is in fact a function. This restriction is expressed by conjoining an FCR, known as a group FCR, to the FCR set. 
For example, if the grammar contains the two-valued feature PAST referred to above, the FCR ¬([PAST +] ∧ [PAST -]) would be conjoined to the FCR set. In general, the presence of an n-valued feature f, with values v1, ..., vn, entails the addition of the FCRs ¬([f vi] ∧ [f vj]) for each i, j = 1, ..., n, i < j.</Paragraph> <Paragraph position="6"> When converted to clausal form these FCRs become the n(n - 1)/2 clauses ¬[f vi] ∨ ¬[f vj] for each i, j = 1, ..., n, i < j, whose inclusion in the set of clauses presented to the parser would make the table very large. Consequently, these group clauses are abbreviated. For each n-valued feature f with n ≥ 2, a group mask is included in the parser table which has one bit set for each feature-value pair whose conjunction is to be prohibited. The parser checks these group masks whenever it consults the FCR clause set. If g is a group mask and c is a mask representing a category, the parser has only to check that (g ∩ c) has not more than one bit set.</Paragraph> <Paragraph position="7"> It has been observed that, in practice, it is likely that the explicit FCR set supplied by the grammar writer will contain mostly non-side-effect-free clauses. However, the (notional) group FCRs are, by definition, side-effect-free. Because of this, the algorithm is modified for implementation in the following way. The FCR set F' which is used in the definition of the skeleton grammar is taken to be just the set of notional group FCRs; any &quot;genuine&quot; FCRs, be they side-effect-free or not, are excluded from F'. Furthermore, the table generator ensures that the full FCR set F is satisfied on each node at table generation time. For example, if the grammar contains the FCR [NOM] ⊃ [NFORM NORM], then a category [NOM +] occurring in a phrase structure rule would be rewritten by the table generator as [NOM +, NFORM NORM]. 
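The group-mask check has a constant-time realization with ordinary machine logic. A sketch in Python, with the bit allocation invented for illustration of how a table generator might assign it:

```python
# One bit per feature-value pair (allocation assumed).
PAST_PLUS, PAST_MINUS, N_PLUS, V_MINUS = 1 << 0, 1 << 1, 1 << 2, 1 << 3

def group_ok(g, c):
    """True iff (g ∩ c) has not more than one bit set; x & (x - 1)
    clears the lowest set bit, so the result is zero exactly when x
    has at most one bit set."""
    x = g & c
    return x & (x - 1) == 0

past_group = PAST_PLUS | PAST_MINUS        # prohibits [PAST +] ∧ [PAST -]
assert group_ok(past_group, PAST_PLUS | N_PLUS)
assert not group_ok(past_group, PAST_PLUS | PAST_MINUS)
```

The `x & (x - 1)` trick replaces the n(n - 1)/2 group clauses for a feature with a single mask and two machine instructions per check.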
These modifications increase the efficiency of the implementation, and enable certain errors to be detected at table generation time.</Paragraph> <Paragraph position="8"> In practice, most GPSGs closely resemble traditional CFGs, with most categories fully specified for the &quot;major&quot; features N, V, and perhaps BAR. Consequently, the group FCR constraints ensure that the skeleton grammar also resembles a traditional CFG, and is certainly not, in practice, massively ambiguous. Indeed, the table generator insists that categories in rules are written as X[Y], where X is a name (a traditional non-terminal), and Y is a category. The non-terminal X is defined (by the grammar writer) to stand for some set of major features. This convention is perhaps controversial, but Gazdar et al. (1985) is full of such rules, and the linguists who use the parsing system have not grumbled yet. The convention does allow the table generator to check the grammar more stringently than would otherwise be the case, and it enables the parser to be made considerably more efficient, by dividing the set of all categories (which must be searched by the predictor) into disjoint subsets. The convention has no theoretical significance; the program would work without it. The head feature convention. The grammar writer is able to denote certain non-terminals on the right-hand side of a rule as head non-terminals, which correspond to the head symbols of traditional X-bar syntax. This is done by prefixing the name of the non-terminal in the rule by a star. The percolation and trickling of features can be restricted to occur only between a mother and a head daughter. 
There are thus nine possible propagation behaviours for any feature: one of (i) not trickling, (ii) trickling, but only to head daughters, or (iii) trickling to all daughters, together with one of (i) not percolating, (ii) percolating, but only from head daughters, or (iii) percolating from any daughter. The head feature convention is simulated by defining head features to trickle, but only to heads. This is adequate in most situations, but it falls short of the behaviour postulated by Gazdar et al. (1985:94ff). In particular, the notion of free feature specification sets is not accommodated. This causes problems in, for example, the treatment of conjunctions, in which the conjoined constituents are conventionally all heads. GPSG 85 allows a rule that in our notation would be written NP: *NP [CONJ and], *NP [CONJ NIL].</Paragraph> <Paragraph position="9"> In the present implementation, any PER feature (for example) which happens to be present on the mother would trickle to both head daughters, thereby forcing agreement between the daughters. Our solution has been to make the daughters non-heads, which is unattractive, but which has been made to work.</Paragraph> <Paragraph position="10"> The foot feature principle. The foot feature principle is more of a problem than the head feature convention. It is clear that foot features ought to percolate, but the situation is more complicated than this. In a rule such as S: NP, S [SLASH NP] the SLASH feature (which is a foot feature) must be prevented from percolating from the node that is generated by extension from the right-hand-side S. This is achieved by forbidding any feature that has been declared to be a foot feature (e.g., SLASH and WH) to percolate from a node on which the feature appears by virtue of its appearance on the right-hand side of a phrase structure rule. 
This is easy to implement.</Paragraph> <Paragraph position="11"> This is only a partial solution to the problem, however. The rule given above correctly generates the telephone Carol tested.</Paragraph> <Paragraph position="12"> It is not possible, however, by this mechanism to prevent * the telephone Carol tested the telephone, in which &quot;Carol tested the telephone&quot; is correctly parsed as an S, but in which a SLASH NP specification is &quot;gratuitously&quot; instantiated in order to satisfy the extension conditions imposed by the rule given above.</Paragraph> <Paragraph position="13"> To solve this and other problems, a tree is now defined to be admissible only if each non-terminal node of the tree satisfies the foot condition, which is related to the original FFP of GPSG 85. The foot condition is defined as follows.</Paragraph> <Paragraph position="14"> Define a lexical node of a parse tree as a node that immediately dominates a terminal node. (A gap, which is explicitly denoted in the grammar by the word GAP, is a terminal node.) Define an interior node as a node that is neither terminal nor lexical. An interior node is said to meet the foot condition (FC) iff each foot feature that it contains appears also on at least one daughter from which it can legally percolate. A lexical node is said to meet the FC iff each foot feature that it contains appears also on the left-hand side of the lexical rule that gave rise to the lexical node.</Paragraph> <Paragraph position="15"> This definition implies that the FC cannot cause the instantiation of any features. In this respect, the FC differs from the propagation conventions, which add the necessary features to make the conditions hold. The FC mechanism operates on the tree as it is after FCRs and propagation conditions have been enforced. It does not alter the tree; it merely checks that the foot condition is true on each node. Note that all of this follows from the definition.
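Although the FC is stated declaratively, a per-node check follows directly from the definition. The sketch below is illustrative Python, not the paper's code; the tree representation is assumed, and the `can_percolate` predicate stands in for the test "from which it can legally percolate":

```python
FOOT_FEATURES = {"SLASH", "WH"}  # assumed set of declared foot features

class Node:
    def __init__(self, feats=(), children=(), lexical_lhs=None):
        self.feats = set(feats)              # features instantiated on the node
        self.children = list(children)
        # For a lexical node: the features on the left-hand side of the
        # lexical rule that gave rise to it; None marks an interior node.
        self.lexical_lhs = set(lexical_lhs) if lexical_lhs is not None else None

def meets_fc(node, can_percolate=lambda f, daughter: True):
    """Check the foot condition at one node; the tree is never altered."""
    foot = node.feats & FOOT_FEATURES
    if node.lexical_lhs is not None:
        # Lexical node: each foot feature must come from the lexical rule's LHS.
        return foot <= node.lexical_lhs
    # Interior node: each foot feature must appear on at least one daughter
    # from which it can legally percolate.
    return all(
        any(f in d.feats and can_percolate(f, d) for d in node.children)
        for f in foot
    )
```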
It is not necessary to put forward a procedural definition of the FC, which would fit ill with the non-procedural definition of GPSG. In contrast to the GPSG 85 FFP, the FC readily permits a straightforward, efficient, and deterministic implementation. The control agreement principle. A mechanism has been provided for specifying horizontal propagation of features in a way similar to that implied by the control agreement principle of Gazdar et al. (1985). Sister categories in a rule may be designated control sisters (by prefixing the name of the non-terminal by a dollar). A set of control features is defined by the grammar writer, analogous to the sets of trickling and percolating features. Each non-terminal node N has associated with it an extra node N'. If a node N0 has daughters N1, ..., Nn, then N0' is called the stepmother of each of N1, ..., Nn.</Paragraph> <Paragraph position="16"> If Ni is a control sister, then any control features in C(Ni) are required to percolate to the stepmother, and any features on the stepmother are required to trickle to each stepdaughter that is a control sister. The effect is that control features present on a control sister are forced to appear on each other control sister (which has the same mother).</Paragraph> <Paragraph position="17"> One consequence of this modified CAP is that agreement is mutual, or bidirectional, whereas in the CAP of GPSG 85 it is unidirectional. Another consequence is that, in the present implementation, it is impossible by these means to express agreement between (for example) the daughter NP and the NP &quot;under the slash&quot; in S: NP, S \[SLASH NP\].</Paragraph> <Paragraph position="18"> This has not yet proved to be a problem; such agreement can easily be accommodated by defining appropriate propagation constraints for those features (such as PER, PLU and NFORM) that must agree.</Paragraph> <Paragraph position="19"> Metarules.
Despite the misgivings expressed earlier concerning the possible exponential growth in grammar size, a form of metarule mechanism has been incorporated. Metarules are implemented by precompilation by the table generator. In fact, there is a separate metarule preprocessing program, called metagee, which runs as a Unix filter, passing the expanded set of rules to the table generator proper. It would be possible to process separated ID/LP rules by means of a similar preprocessor. This has not been done.</Paragraph> <Paragraph position="20"> Form of the parser table. The output from the table generator, the table which is interpreted by the parser, comprises: * an encoded list of rules, with a pointer from each occurrence of a non-terminal on the right-hand side of a rule to a list of rules with matching left-hand side non-terminals; * an encoded lexicon; * a list of non-terminal names, feature names and feature value names; * a set of group masks; * a set of FCR clauses, each comprising two bit masks.</Paragraph> <Paragraph position="21"> One mask (word of bits) represents the category f/ and the other represents the category f_.</Paragraph> <Paragraph position="22"> Performance. The parsing system has been tested with a grammar for a subset of English. The grammar contains 512 rules after metarule expansion, comprising 228 non-lexical rules and 284 lexical rules. There are 107 feature value pairs. There are 18 FCRs, which, when converted to clausal form, yield only 39 clauses. The size of the parser table is about 63,000 bytes, about 93% of which is occupied by the encoded rules. The remaining 7% (4,500 bytes) comprises the tables of bit masks that represent the FCRs and the propagation masks, a table of non-terminal names, and the lexicon.
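Since each side-effect-free clause is a disjunction of negated literals, a category encoded as a bit mask of feature specifications violates a clause exactly when it contains every specification the clause negates. A minimal one-mask sketch of that per-clause test follows (illustrative Python; the paper's actual two-mask encoding may carry additional information, such as feature values, not modelled here):

```python
def satisfies_clause(category_mask: int, clause_mask: int) -> bool:
    """True unless the category contains every feature specification
    negated by the clause (not f1) or (not f2) or ..., in which case
    the disjunction of negated literals is false."""
    return (category_mask & clause_mask) != clause_mask
```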
The table generator takes about two minutes to compile the grammar.</Paragraph> <Paragraph position="23"> The table shows that, for simple short sentences (unambiguous sentences of fewer than 15 words), phase 1 consistently takes more time than phases 2 and 3 together. For sentences of moderate ambiguity, the times for phase 1 and phase 2+3 are comparable. The 15-word sentence for which a time is given in the table is which number ought Carol to have dialed on the telephone the happy engineer was testing? which, the parser correctly reports, is ambiguous (it has two parses). Phase 1 yields a DAG that represents four parses. Phase 2 expands this into four distinct trees, two of which are then ruled out by phase 3. The figures for phase 2+3 include the time taken to format and print the trees, which for the longer sentences is not insignificant, amounting to almost half of the processing time for the 15-word sentence.</Paragraph>