<?xml version="1.0" standalone="yes"?> <Paper uid="J01-2005"> <Title>Squibs and Discussions Nonminimal Derivations in Unification-based Parsing</Title> <Section position="3" start_page="278" end_page="279" type="metho"> <SectionTitle> 3. The Abstract Parsing Algorithm </SectionTitle> <Paragraph position="0"> Based on the logic described above, Shieber defines an abstract parsing algorithm as a set of four logical deduction rules. Each rule derives a new item, from previous items and/or productions in the grammar. An item is a 5-tuple {i,j, p, M, d), where i and j are indices into the sentence and specify which words in the sentence have been used to construct the item; p is the production used to construct the item; M is a model; and d is the position of the &quot;dot&quot;; i.e., how many subconstituents in p have been completed so far.</Paragraph> <Paragraph position="1"> The logical rules of the abstract algorithm are shown in Figure 2. The Initial Item rule produces the first item, and is constructed from the start production P0. It spans none of the input (i and j are both 0), and its model is the minimal model (ram) of P0.</Paragraph> <Paragraph position="2"> The Prediction rule is essentially the top-down rewriting of the expectation (a subconstituent just after the dot) in a prior item. In this rule, the extraction of M/(d + 1 / retrieves the d + 1st submodel in M (i.e., expectation). The function p, which is left underspecified as a parameter in the abstract algorithm, filters out some features predefined in the various instantiations of the algorithm. Here, it is applied to the expectation, by which it effectively controls the top-down predictive power of the Computational Linguistics Volume 27, Number 2 INITIAL ITEM: {O,O, po, mm(~o),O) PREDICTION: SCANNING: li, j,p = la, ~l,M,d) (j,j, p', p(M/(d+l)) t3 mm(~'), 0) ' where d K a and p' = (a',O') * P (i,j,p = (a, ~},M,d} {i,j+lip, M t_l (mm(~2') \ {d+l)),d+l} ' where d < a and (wj+l, O'} * P COMPLETION: li'j'P = la' ~l'M'd) (j,k,p' = (a',/I~'),M',a' / where d < a I {i, kip, M El (M' \ {d+l) ),d+l) Figure 2 Shieber's parsing operations.</Paragraph> <Paragraph position="4"> Items produced in the parse of John sleeps, and the final parse.</Paragraph> <Paragraph position="5"> algorithm and provides flexibility to the instantiated algorithms. Then the expectation is unified with a production (~'), which can consistently rewrite it. By this operation, some features in the expectation may be propagated down in the production.</Paragraph> <Paragraph position="6"> The remaining two rules advance the dot in a prior item, by unifying the sub-constituent to the right of the dot with either a lexical item from the input string (the Scanning rule) or some other completed higher-level item (the Completion rule). Both rules perform the correct unification by utilizing the embedding operator (signified by \), which places a model M under a path p (M\p).</Paragraph> <Paragraph position="7"> We illustrate these operators with a simple step-by-step example parse. Consider the grammar that consists of the rules presented in Figure 1. Using this grammar, Figure 3 shows the parse of the sentence John sleeps. First, the Initial Item operator is applied, producing item I0, whose model is mm(~o). Next, the Scanning operator scans the word John, producing 11. The Prediction operator then produces 12. Next, the word sleeps is scanned (since the first subconstituent of the model in 12 is a V), producing 13. 
</Section> <Section position="4" start_page="279" end_page="282" type="metho"> <SectionTitle> 4. Nonminimal Derivations </SectionTitle> <Paragraph position="0"> In Section 2, we noted that Shieber's definition of parse trees allows them to be nonminimal. We consider these to be invalid based on the principle that, since the unification operation as set union preserves minimality (as proved in Shieber [1992]), repeated applications of unification using licensing productions should result in parses that contain features only from those productions and nothing more. In this section, we formally define minimal and nonminimal parse trees, and show an example in which nonminimal parse trees are produced by Shieber's algorithm. Our definition of a minimal parse tree is to a large extent similar to Shieber's, but to ensure minimality, our definition uses the equality relation instead of ⊒, and inductively specifies a minimal parse tree bottom-up.</Paragraph> <Paragraph position="2"> Figure 4: A phrasal production that results in a nonminimal derivation.</Paragraph> <Paragraph position="4"> Figure 5: Nonminimal derivation of John sleeps.</Paragraph> <Paragraph position="5"> Definition. Given a grammar G, a minimal parse tree τ admitted by G is a model that is a member of the infinite union of sets of bounded-depth parse trees Π′ = ∪_{i≥0} Π′_i, where each Π′_i is defined as follows:</Paragraph> <Paragraph position="7"> For each lexical production p = ⟨w, φ⟩ ∈ G, mm(φ) ∈ Π′_0.</Paragraph> <Paragraph position="8"> For each phrasal production p = ⟨a, φ⟩ ∈ G, let τ1, ..., τa ∈ ∪_{j<i} Π′_j. If τ = mm(φ) ⊔ (τ1 \ ⟨1⟩) ⊔ ... ⊔ (τa \ ⟨a⟩) is defined, then τ ∈ Π′_i.</Paragraph> <Paragraph position="10"> It is obvious that Π′ is a subset of Π in Shieber's definition. Then, a nonminimal parse tree is defined as a model that is a member of the difference of the two sets (Π − Π′).3 Here is a simple example in which a nonminimal parse is produced by Shieber's algorithm. Say that we add the production in Figure 4 to the grammar in the previous section. The intent of this production is to mark the verb with the feature modified if an adverb follows. Using this grammar, Shieber's algorithm will produce a nonminimal parse for the sentence John sleeps, in addition to the minimal parse shown in the previous section.4 The nonminimal parse, shown in Figure 5, arises as follows: after scanning John, Prediction can produce items I2′ and I2″, first using production p4 (thus inserting ⟨head modified⟩ = true into the model), and then p2. Scanning the word sleeps then produces I3′ from I2″. Completion can then be applied directly to I1 and I3′, skipping a completion using I2′ and I2″, and thereby producing item I4′. The feature modified remains in I4′, even though an adverb was never encountered in the sentence. The final parse M4′, shown in Figure 5, is clearly nonminimal according to our definition because of this feature.</Paragraph> <Paragraph position="11"> 3 Note that using subsumption (which we will discuss in Section 5) here does not work, for instance by saying &quot;a model τ″ is a nonminimal parse tree if τ″ ∈ Π and there exists τ′ ∈ Π such that τ′ ≤ τ″&quot;, because some such τ″'s are minimal. See the example in Section 5.</Paragraph> <Paragraph position="12"> 4 Here, we are assuming that the filtering function ρ is the identity function.</Paragraph>
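<Paragraph> The definition above can be made concrete with a short bottom-up enumeration. The sketch below is ours and purely illustrative: it reuses the unify and embed helpers from the sketch in Section 3, represents each production directly by its minimal model, and enumerates Π′_0 through Π′_max_depth; since the enumeration is exponential, it is usable only for toy grammars and small depth bounds.

from itertools import product

def minimal_trees(lexical, phrasal, max_depth):
    # Pi'_0: the minimal models of the lexical productions.
    levels = [[mm_phi for _, mm_phi in lexical]]
    for _ in range(1, max_depth + 1):
        below = [t for level in levels for t in level]
        new = []
        for arity, mm_phi in phrasal:
            # Try every tuple (tau_1, ..., tau_a) of smaller minimal trees.
            for subs in product(below, repeat=arity):
                tau = mm_phi                            # mm(phi)
                for k, sub in enumerate(subs, 1):
                    tau = unify(tau, embed(sub, [k]))   # ... |_| (tau_k \ <k>)
                    if tau is None:
                        break
                if tau is not None and tau not in below and tau not in new:
                    new.append(tau)                     # tau is in Pi'_i
        levels.append(new)
    return levels

def is_minimal(tree, lexical, phrasal, max_depth):
    """Membership in Pi', checked up to the given depth bound."""
    return any(tree in level
               for level in minimal_trees(lexical, phrasal, max_depth))</Paragraph>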
<Paragraph position="13"> Note that the example grammar can be changed to prevent the nonminimal parse by moving the feature modified off of the head path in φ4 (i.e., ⟨modified⟩ = true instead of ⟨head modified⟩ = true).5 However, the point of the example is not to argue whether or not well-designed grammars will produce erroneous parses. A formally defined parser (see the discussion below) should in principle produce correct parses regardless of the grammar used; otherwise, the grammar formalism (i.e., Shieber's logic for unification grammars) must be revised and properly constrained to allow only the kinds of productions with which the parser produces correct results.</Paragraph> <Paragraph position="14"> In general, nonminimal derivations may arise whenever two or more predictions that are not mutually exclusive can be produced at the same point in the sentence; i.e., two prediction items ⟨i, i, p, M, 0⟩ and ⟨i, i, p′, M′, 0⟩ are produced such that M ≠ M′ and M and M′ are unifiable (see the sketch at the end of this section). In the example, items I2 = ⟨1, 1, p2, M2, 0⟩ and I2′ = ⟨1, 1, p4, M2′, 0⟩ (as well as I2 and I2″ = ⟨1, 1, p2, M2″, 0⟩) are two such items. Since the two predictions did not have any conflicting features from the beginning, a situation may occur in which a completion generated from one prediction can fill the other prediction without causing a conflict. When this happens, features that were in the other prediction but not the original one become nonminimal in the resulting model.</Paragraph> <Paragraph position="15"> As to what causes nonminimal situations, we speculate that there are a number of possibilities. First, nonminimal derivations occur when a prediction is filled by a complete item that was not generated from that prediction. This mismatch will not happen if parsing is done in one direction only (e.g., purely top-down or bottom-up parsing). Thus, the mixed-direction parsing strategy is a contributing factor.</Paragraph> <Paragraph position="16"> Second, wrong complete items are retrieved because Shieber's item-based algorithm makes all partial results available during parsing, as if they were kept in a global structure (such as a chart in chart parsing). If the accessibility of items were somehow restricted, the prediction-completion mismatch would not happen. In this respect, other chart-based algorithms for unification grammars that adopt a mixed-direction parsing strategy, including head-corner parsing (van Noord 1997) and left-corner parsing (Alshawi 1992), are subject to the same problem.</Paragraph> <Paragraph position="17"> Third, extra features can appear only when the grammar contains rules that interact in a certain way (such as rules p2 and p4 above). If the grammar contained no such rules, or if ρ (the filtering function applied in Prediction) filtered out those features, even the prediction-completion mismatch would not produce nonminimal derivations.</Paragraph> <Paragraph position="18"> As we stated at the beginning of this section, we consider nonminimal parses to be invalid on the basis of minimality. It then immediately follows that any parsing algorithm that produces nonminimal parses is unsound; in particular, Shieber's algorithm is unsound. However, since nonminimal parse trees have the same yield as their minimal counterparts, his algorithm does indeed recognize exactly the language of a given grammar. So, Shieber's algorithm is sound as a recognizer,6 but not as a transducer or parser (as in van Noord [1997]), where the correctness of output models (i.e., parse trees) is critical. In other words, Shieber's algorithm is correct up to licensing, but incorrect on the basis of the stronger criterion of minimality. Thus, to guarantee correctness based on minimality, we need another algorithm; such an algorithm is exactly the solution to the nonminimal derivation problem.</Paragraph> <Paragraph> 5 Note that adding ⟨head modified⟩ = false to φ2 (VP → V) or φ3 (sleeps) is not feasible, because they cannot specify the modified feature at their level.</Paragraph> <Paragraph> 6 In fact, Shieber hints at this: &quot;The process of parsing (more properly, recognition)...&quot; (Shieber 1992, 78).</Paragraph>
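<Paragraph> The risky configuration described above can be checked mechanically. The following sketch is ours, reusing the unify helper from the sketch in Section 3: it flags pairs of prediction items ⟨i, i, p, M, 0⟩ and ⟨i, i, p′, M′, 0⟩ with M ≠ M′ but M and M′ unifiable, the situation in which a completion generated from one prediction may fill the other and leave behind nonminimal features.

def risky_prediction_pairs(items):
    # Prediction items span no input (i == j) and have the dot at 0.
    predictions = [it for it in items if it[0] == it[1] and it[4] == 0]
    pairs = []
    for n, item1 in enumerate(predictions):
        for item2 in predictions[n + 1:]:
            (i1, _, _, m1, _), (i2, _, _, m2, _) = item1, item2
            # Same point in the sentence, distinct but unifiable models.
            if i1 == i2 and m1 != m2 and unify(m1, m2) is not None:
                pairs.append((item1, item2))
    return pairs</Paragraph>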
</Section> <Section position="5" start_page="282" end_page="282" type="metho"> <SectionTitle> 5. Practical Techniques </SectionTitle> <Paragraph position="0"> Before presenting our solution to the nonminimal derivation problem, we discuss several possible practical techniques for getting around the problem in implemented systems. These are known techniques, which have been applied to solve other problems in unification-based systems. However, most of them offer only partial solutions to the nonminimal derivation problem. First, whenever Shieber's algorithm produces a nonminimal derivation, it also produces a corresponding minimal derivation (Tomuro 1999). Thus, one possible solution is to use subsumption to discard items that are more specific than any other items that are produced (a sketch of this check is given below). Subsumption has often been used in unification-based systems to pack items or models (e.g., Alshawi 1992). However, simple subsumption may filter out valid parses for some grammars, thus sacrificing completeness.7 Another possibility is to filter out problematic features in the Prediction step by using the function ρ. However, automatic detection of such features (i.e., automatic derivation of ρ) is undecidable, for the same reason as the prediction nontermination problem (caused by left recursion) for unification grammars (Shieber 1985). Manual detection is also problematic: when a grammar is large, particularly if semantic features are included, complete detection is nearly impossible. As for the techniques developed so far that (partially) solve prediction nontermination (e.g., Shieber 1985; Haas 1989; Samuelsson 1993), they do not apply to nonminimal derivations, because nonminimal derivations may arise without left recursion, or indeed without recursion in general.8 One way is to define ρ to filter out all features except the context-free backbone of predictions. However, this severely restricts the range of possible instantiations of Shieber's algorithm.9 A third possibility is to manually fix the grammar so that nonminimal derivations do not occur, as we noted in Section 4. However, this approach is problematic for the same reason as the manual derivation of ρ mentioned above.</Paragraph>
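<Paragraph> The subsumption check mentioned above can be sketched as follows. This is our illustration, not a complete packing mechanism: subsumes ignores reentrancy, and, as just noted, discarding subsumed items in this way may sacrifice completeness for some grammars.

def subsumes(general, specific):
    """True if every feature of `general` appears, compatibly, in `specific`."""
    if not isinstance(general, dict):
        return general == specific
    if not isinstance(specific, dict):
        return False
    return all(f in specific and subsumes(v, specific[f])
               for f, v in general.items())

def filter_by_subsumption(items):
    # Discard an item when another item with the same span, production,
    # and dot position carries a strictly more general model.
    kept = []
    for (i, j, p, m, d) in items:
        dominated = any((i2, j2, p2, d2) == (i, j, p, d)
                        and m2 != m and subsumes(m2, m)
                        for (i2, j2, p2, m2, d2) in items)
        if not dominated:
            kept.append((i, j, p, m, d))
    return kept</Paragraph> </Section> </Paper>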