<?xml version="1.0" standalone="yes"?>
<Paper uid="P83-1021">
  <Title>PARSING AS DEDUCTION</Title>
  <Section position="4" start_page="0" end_page="138" type="metho">
    <SectionTitle>
2. Basic Notions
2.1. Definite Clauses
</SectionTitle>
    <Paragraph position="0"> A definite clause has the form P:Q~&amp;... &amp;Q..</Paragraph>
    <Paragraph position="1"> to be read as &amp;quot;P is true if Q1 and ... and Qa are true&amp;quot;. If n --~ 0, the clause is a unit clause and is written simply as P.</Paragraph>
    <Paragraph position="2"> P and QI ..... Qn are literals. P is the positive literal or head of the clause; Ql .... , Qn are the negative literals, forming the body of the clause. Literals have the forn~ pit I ..... tk), where p is the predicate of arity k and the t i the arguments. The arguments are terms. A term may be: a variable {variable names start with capital letters); a constant; a compound term J~tl,...,t m) where f is a functor of arit$ m and the t i are terms. All the variables in a clause are implicitly universally quantified.</Paragraph>
    <Paragraph position="3"> A set of definite clauses forms a program, and the clauses in a program are called input clauses. A program defines the relations denoted by the predicates appearing in the heads of clauses. When using a definite-clause proof procedure, such as Prolog (Roussel. 1975), a goal statement requests the proof procedure to find provable instances of P.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.2. Definite Clause Grammars
</SectionTitle>
      <Paragraph position="0"> Any context-free rule i'~or 1 ... O n can be translated into a definite clause</Paragraph>
      <Paragraph position="2"> The variables S i are the string arguments, representing positions m the input string. For example, the context-free rule &amp;quot;S ~ NP VP&amp;quot; is translated into &amp;quot;s(S0,S2) np{,qO.Sl} k&amp;quot; vp(S1,S2),&amp;quot; which can be paraphrased as &amp;quot;'there is an S from SO to $2 in the input string if there is an NP from SO to S1 and a V'P from S1 to 82.&amp;quot; Given the translation of a context-free grammar G with start symbol S into a set of definite clauses G&amp;quot; with corresponding predicate s, to say that a string w is in the grammar's language is equivalent to saying that the start goal S{po,pj is a consequence of G&amp;quot; U W, where Po and p represent the left and right endpoints of u,, and W is a set of unit clauses that represents w.</Paragraph>
      <Paragraph position="3"> It is easy to generalize the above notions to define DCGs. DCG nonterminals have arguments in the same way that predicates do. A DCG nonterminal with u arguments is translated into a predicate of n+2 arguments, the last two of which are the string points, as in the translation of context-free rules into definite clauses. The context-free grammar obtained from a DCG by dropping all nonterminal arguments is the context-free skeleton of the DCG.</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="138" type="sub_section">
      <SectionTitle>
2.3. Deduction in Definite Clauses
</SectionTitle>
      <Paragraph position="0"> The fundamental inference rule for definite clauses is the following resolution rule: From the clauses  B C/= A l PS: ... &amp; A m . (l) C: D 1 &amp; ,.. &amp; D i &amp; ... &amp; D n. (2} when B and D i are unifiable by substitution a, infer</Paragraph>
      <Paragraph position="2"> (2).</Paragraph>
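As a worked instance of the rule (our example), take

    (1) vp(S1,S2) :- v(S1,S2).
    (2) ans :- np(0,1) &amp; vp(1,T).

Here B = vp(S1,S2) unifies with D2 = vp(1,T) under σ = {S1 → 1, S2 → T}, so the rule infers the resolvent

    ans :- np(0,1) &amp; v(1,T).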
      <Paragraph position="3"> The proof procedure of Prolog is just a particular embedding of the resolution rule in a search procedure, in which a goal clause like (2) is successively rewritten by the res,qution rule using clauses from the program (1). The Prolog proof procedure can be implemented very efficiently, but it has the same theoretical problems of the top-dC/.wn backtrack parsing algorithms after which it is motif?led. These problems do not preclude its use for creating uniquely efficient parsers for suitably constructed grammars (Warren and Pereira, 1983: Pereira, 1982), but the broader questions of the relation between parsing and deduction and of the derivation of online parsing algorithms for unification formalisms require that we look at a more generally applicable class of proof procedures.</Paragraph>
      <Paragraph position="4"> 2.4. Chart Parsing and the Earley Algorithm Chart parsing is a general framework for constructing parsing algorithms for context-free grammars and related formalisms. The Earley context-free parsing algorithm, although independently developed, can be seen as a particular case ,)f chart parsing. We will give here just the basic terminolog-y of chart parsing and of the Eartey algorithm. Full accounts can be found in the articles by Kay (Kay. l.qS0} and Earley/Earley, 1970).</Paragraph>
      <Paragraph position="5"> The state of a chart parser is represented by the chart.</Paragraph>
      <Paragraph position="6"> which is a directed graph. The nodes of the chart represent positions in the string being analyzed. Each odge in Ihe chart is either active or passive. Both types of edges are labeled. A passive edge with label ,V links node r to node .~ if the string between r and s h,~ been analyzed as a phr,'tse of type N. Initially, the only edges are passive edges that link consecutive nodes and are labeh,d with Ihe words of the input string (see Figure I}. Active edges represent partially applied grammar rules.</Paragraph>
      <Paragraph position="7"> In the siml)le~.t case, active edges are labeled by dotted rules. A dolled rule is a grammar rule with a dot inserted some~vhcre on its right-hand side X--- % ... ~i-I * ~i-'&amp;quot; % {4) An edge with this label links node r to node s if the sentential form ~! ... o%1 is an analysis of the input string between r and s. An active edge that links a node to  itself is called empty and acts like a top-down prediction. Chart-parsing procedures start with a chart containing the passive edges for the input string. New edges are added in two distinct ways. First, an active edge from r to s labeled with a dotted rule {4) combines with a passive edge from s to t with label a i to produce a new edge from r to t, which will be a passive edge with label X if a i is the last symbol in the right-hand side of the dotted rule; otherwise it will be an active edge with the dot advanced over cr i. Second, the parsing strategy must place into the chart, at appropriate points, new empty active edges that will be used to combine existing passive edges. The exact method used determines whether the parsing method is seen as top-down, bottom*up, or a combination of the two.</Paragraph>
      <Paragraph position="8"> The Earley parsing algorithm can be seen as a special case of chart parsing in which new empty active edges are introduced top-down and, for all k, the edge combinations involving only the first k nodes are done before any combinations that involve later nodes. This particular strategy allows certain simplifications to be made in the general algorithm.</Paragraph>
      <Paragraph position="9"> 3. DCGs and LFG We would like to make a few informal observations at this point, to clarify the relationship between DCGs and other unification grammar formalisms -- LFG in particular. A more detailed discussion would take us beyond the intended scope of this paper.</Paragraph>
      <Paragraph position="10"> The diffl,rcnt nolational conventions of DCGs and LFG make the two formalisms less similar on the surface than the), actually are from the computational point of view. The object~ that appear ,as arguments in DCG rules are tree fragments every node of which has a number of children predetermined by the functor that labels the node. Explicit variables mark unspecified parts of the tree. In contrast, the functional structure nodes that are implicitly mentioned in LFG equations do not have a pred(,fined number of children, and unspecified parts are either omitted or defined implicitly through equations.</Paragraph>
      <Paragraph position="11"> As a first approximation, a DCG rule such as s(s(Subj,Obj)) ~ np(Subj) vp(Obj} (5) might correspond to the LFG rule</Paragraph>
      <Paragraph position="13"> is an np with structure Subj followed by a vp with structure Obj.&amp;quot; The LFG rule can be read as &amp;quot;an S is an NP followed by a V'P, where the value of the subj attribute of the S is the functional structure of the NP and the value of the attribute obj of the S is the functional structure of the VP.&amp;quot; For those familiar with the details of the mapping from functional descriptions to functional structures in LFG, DCG variables are just &amp;quot;placeholder&amp;quot; symbols (Bresnan and Kaplan, 1982).</Paragraph>
      <Paragraph position="14"> As we noted above, an apparent difference between LFG and DCGs is that LFG functional structure nodes, unlike DCG function symbols, do not have a definite number of children. Although we mu~t leave to a separate paper the details of the application to LFG of the unification algorithms from theorem proving, we will note here that the formal properties of logical and LFG or UG unification are similar, and there are adaptations to LFG and UG of the algorithms and data structures used in the logical case.</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="138" end_page="139" type="metho">
    <SectionTitle>
4. Earley Deduction
</SectionTitle>
    <Paragraph position="0"> The Earley Deduction proof procedure schema is named after Earley's context-free parsing algorithm (Earley, 1970), on which it is based Earley Deduction provides for definite clauses the same kind of mixed top-down bottom-up mechanism that the Earley parsing algorithm provides for context-free grammars.</Paragraph>
    <Paragraph position="1"> Earley Deduction operates on two sets of definite clauses called the program and the state. The program is just the set of input clauses and remains fixed. The state consists of a set of derived clauses, where each nonunit .:Iause has one of its negative literals selected; the state is continually being added to. Whenever a nonunit clause is added to the state, one of its negative literals is selected. Initially tile state contains just the goal statement (with one of its negative \[iterals selected}.</Paragraph>
    <Paragraph position="2"> There are two inference rules, called instantiation and reduction, which can map the current state into a new one by adding a new derived clause. For an instantiation step, there is some clause in the current state whose selected literal unifies with the positive literal of a ,onunit clause C in the program. In this case, the derived clause is a\[C\], where cr is a most general unifier (\[~obinson, 1965} of the two literals concerned. The selected literal is said to instantiate C to a\[C\].</Paragraph>
    <Paragraph position="3"> For a reduction step, there is some clause C in the current state whose selected literal unifies with a unit clause from either the program or the current state. In this case, tile derived clause is siC'l, where a is a most general unifier of the two Iiterals concerned, and C&amp;quot; is C minus its selected literal. Thus, the deriydd clause is just the res,)lvent of C with the unit clause and the latter is said to reduce C to a(C&amp;quot; I.</Paragraph>
    <Paragraph position="4"> Before a derived clause is added to the state, a check is made to see whether the derived clause is subsumed by any clause already in the state. \[f the derived clause is subsumed, it is not added to the state, and that inference step is said to be blocked.</Paragraph>
    <Paragraph position="5"> In the examples that follow, we assume that the selected literal in a derived clause is always the leftmost literal in the body. This choice is not optimal (Kowalski, 1980), but it is sufficient for our purposes.</Paragraph>
    <Paragraph position="6"> For example, given the program</Paragraph>
    <Paragraph position="8"> here is a sequence of clauses derived by Early Deduction</Paragraph>
    <Paragraph position="10"> At this point, all further steps are blocked, so the computation terminates.</Paragraph>
    <Paragraph position="11"> Earley Deduction generalizes Earley parsing in a direct and natural way. \[nstantiation is analogous to the &amp;quot;predictor&amp;quot; operation of Earley's algorithm, while reduction corresponds to the &amp;quot;scanner&amp;quot; and &amp;quot;completer&amp;quot; operations. The &amp;quot;scanner&amp;quot; operation amounts to reduction with an input unit clause representing a terminal symbol occurrence, while the &amp;quot;completer&amp;quot; operation amounts to reduction with a derived unit clause representing a nonterminal symbol occurrence.</Paragraph>
  </Section>
  <Section position="6" start_page="139" end_page="139" type="metho">
    <SectionTitle>
5. Chart Parsing and Earley Deduction
</SectionTitle>
    <Paragraph position="0"> Chart parsing {Kay, I980) and other tabular parsing algorithms (Aho and Ullman, 1972; Graham et al., I980) are usually presented in terms of certain (abstract) data structures that keep a record of the alternatives being explored by the parser. Looking at parsing procedures as proof procedures has the following advantages: (i) unification, ~aps and unbounded dependencies are automatically handled: (ii} parsing strategies become possible that cannot be formulated in chart parsing.</Paragraph>
    <Paragraph position="1"> The chart represents completed nonterminals {passive edges) and partially applied rules {active edges). From the standpoint of Earley Deduction, both represent derived clauses that have been proved in the course of an attempt to deduce a goal statement whose meaning is that a string belongs to the language generated by the grammar. An active edge corresponds to a nonunit clause, a passive edge to a unit clause. Nowhere in this definition is there mention of i.he &amp;quot;endpoints&amp;quot; of the edges. The endpoints correspond to certain literal arguments, and are of no concern to the (abstract) proof procedure. Endpoints are just a convenient way of indexing derived clauses in an implementalion to reduce the number of nonproductive (nonunifying) attempts at applying the reduction rule.</Paragraph>
    <Paragraph position="2"> We shall give now an example of the application of Earley Deduction to parsing, corresponding to the chart  corresponds to the following definite-clause program:</Paragraph>
    <Paragraph position="4"> The lexical categories of the sentence oAg ath~ 1 's2h usband3hit4 Ulrich s (26) can be represented by the unit clauses</Paragraph>
    <Paragraph position="6"> Thus. the t~k of determining whether (26) is a sentence can be represented by the goal statement ans ~ s(0.5). (32) If the sentence is in the language, the unit clause ass will be derived in the course of an Eariey Deduction proof. S.ch a pro(_)f could proceed as follows:</Paragraph>
    <Paragraph position="8"> Note how subsumption is used to curtail the left recursion of rules (21) and (22), by stopping extraneous instantiation steps from the derived clauses (35) and (36).</Paragraph>
    <Paragraph position="9"> As we have seen in the example of the previous section, this mechanism is a general one, capable of handling complex grammar symbols within certain constraints that will be discussed later.</Paragraph>
    <Paragraph position="10"> The Earley Deduction derivation given above corresponds directly to the chart in Figure 1.</Paragraph>
    <Paragraph position="11"> In general, chart parsing cannot support strategies that would create active edges by reducing the symbols in the right-hand side of a rule in any arbitrary order. This is because an active edge must correspond to a contiguous sequence of analyzed symbols. Definite clause proof procedures do not have this limitation. For example, it is very simple t.o define a strategy, &amp;quot;head word narC/,ng (NlgCord, 19801, which would use the&amp;quot; reduction rule to</Paragraph>
    <Paragraph position="13"> Each arc in tile chart is labeled with the number of a clause in the proof. In each clause that, corresponds to a chart arc, two literal arguments correspond to the two endpoints of the arc. These arguments have been underlined in the derivation. Notice how the endpoint arguments are tile two string arguments in the head for unit clauses {passive edges) but, in the case of nonunit clauses (passive edges), are the first string argument in the head and the first in the leftmost literal in the body.</Paragraph>
    <Paragraph position="14"> As we noted before, our view of parsing as deduction makes it possible to derive general parsing mechanisms for augmented phraso-structure grammars with gaps and unbounded dependencies. It is difficult (especially in the case of pure bottom-up parsing strategies} to augment chart parser~ to handle gaps and dependencies (Thompson, 1981}. However, if gaps and dependencies are specified by extra predicate arguments in the clauses that correspond to the rules, the general proof procedures will handle those phenomena without further change.</Paragraph>
    <Paragraph position="15"> This is the technique used in DCGs and is the basis of the specialized extra.position grammar formalism (Pereira, t081).</Paragraph>
    <Paragraph position="16"> The increased generality of our approach in the area of parsing strategy stems from the fact that chart parsing strategies correspond to specialized proof procedures for definite clauses with string arguments. In other words, the origin of these proof procedures means that string arguments are treated differently from other arguments, as they correspond to the chart nodes.</Paragraph>
  </Section>
  <Section position="7" start_page="139" end_page="142" type="metho">
    <SectionTitle>
6. Implementing Earley Deduction
</SectionTitle>
    <Paragraph position="0"> To implement Earley Deduction with an efficiency comparable, say. to Prolog, presents some challenging problems. The main issues are *tlow to represent the derived clauses, especially the substitutions involved.</Paragraph>
    <Paragraph position="1"> * ttow to avoid the very heavy computational cost of subsunlption.</Paragraph>
    <Paragraph position="2"> * How to recognize when derived clauses are no longer 2This particular strategy could be implemented ia a chart parser, by changing the rules for combining edges but the generality demonstrated here would be lost.</Paragraph>
    <Paragraph position="3"> ihl needed and space can be recovered.</Paragraph>
    <Paragraph position="4"> There are two basic methods for representing derived clauses in resolution systems: the more direct copying method, in which substitutions are applied explicitly; the structure-shaelng method of Bayer and Moore, which avoids copying by representing derived clauses implicitly with the aid of variable binding environments. A promising strategy for Earley Deduction might be to use copying for derived unit clauses, structure sharing for other derived clauses. When copying, care should be taken not to copy variable-free subterms, but to copy just pointers to those subterrns instead.</Paragraph>
    <Paragraph position="5"> It is very costly to implement subsumption in its full generality. To keep the cost within reasonable bounds, it will be essential to index the derived clauses on at least the predicate symbols they contain -- and probably also. on symbols in certain key argument positions. A simpfification of full subsumption checking that would appear adequate to block most redundant steps is to keep track of selected literals that have been used exhaustively to generate instantiation steps. If another selected literal is an instance of one that has been exhaustively explored, there is no need to consider using it as a candidate for instantiation steps, Subsuvnption would then be only applied to derived unit clauses.</Paragraph>
    <Paragraph position="6"> A major efficiency problem with Earley deduction is that it is difficult to recognize situations in which derived clauses are no longer needed and space can be reclaimed. There is a marked contrast with purely top-down proof procedures, such as Prolog, to which highly effective ~pace recovery techniques can be applied relatively easily. The Eartey algorithm pursues all possible parses in parallel, indexed by string position. In principle, this permits space to be recovered, as parsing progresses, by deleting information relating to earlier string positions, l't amy be possible to generalize this technique to Earley Deduction. by recognizing, either automatically or manually, certain special properties of the input clauses. 7. Decidability and Computational Complexity It is not at. all obvious that grammar formalisms based on unification can be parsed within reasonable bounds of time and space. \[n fact, unrestricted DCGs have Turing machine power, and LFG, although decidable, seems capable of encoding exponentially hard problems.</Paragraph>
    <Paragraph position="7"> llowever, we need not give up our interest in the complexity analysis of unification-based parsing. Whether for interesting subclasses of, grammars or specific ~rammars of interest, it is still important to determine how efficient parsing can be. A basic step in that direction is to estimale the cost added by unification to the operation of combining {reducing or expanding) a nontcrmin.~l in a derivation with a nonterminal in a grammar rule.</Paragraph>
    <Paragraph position="8"> Because definite clauses are only semidecidable, general proof procedures may not terminate for some sets of definite clauses. However, the specialized proof procedures we have derived from parsing algorithms are stable: if a set of definite clauses G is the translation of a context-free grammar, the procedure will always terminate (in success or failure) when to proving any start goal for G. More interesting in this context is the notion of strong stability, which depends on the following notion of off'line parsability. A DCG is offline-parsable if its context-free skeleton is not infinitely ambiguous. Using different terminology, Bresnan and Kaplan (Bresnan and Kaplan, 1982) have shown that the parsing problem for LFG is decidable because LFGs are offline parsable. This result can be adapted easily to DCGs, showing that the parsing problem for offline-parsable DCGs is decidable. Strong stability can now be defined: a parsing algorithm is strongly stable if it always terminates for offline-parsab\[e grammars. For example, a direct DCG version of the Earley parsing algorithm is stable but not strongly so.</Paragraph>
    <Paragraph position="9"> In the following complexity arguments, we restrict ourselves to offline-parsable grammars. This is a reasonable restriction for two reasons: (i) since general DCGs have Turing machine power, there is no useful notion of computational complexity for the parser on its own; (ii) (.here are good reasons to believe that linguistically relevant grammars must be offliae-parsable {Bresnan and Kaplaa, 1982).</Paragraph>
    <Paragraph position="10"> In estimating the added complexity of doing online unification, we start from the fact that the length of any derivation of a terminal string in a finitely ambiguous context-free grammar is linearly bounded by the length of the termin:fi string. The proof of this fact is omitted for lack of spa~.e, but can be found elsewhere (Pereira and Warren, 1.q83).</Paragraph>
    <Paragraph position="11"> General definite-clause proof procedures need to access ttle values of variables {bindings} in derived clauses. The strueture-sh:lring method of representation makes the lime to access a variable binding at worst linear in the length of 1he derivation. Furthermore, the number of variables to be looked up in a derivation step is at worst linear in the size of tile derivation. Finally, the time (and space) to finish a derivation step, once all the relevant bindings are known, does not depend on the size of the derivation. Therefore, using this method for parsing offline-parsable grammars makes the time complexity of each step at worst oIn 2) in the length of the input.</Paragraph>
    <Paragraph position="12"> Some simplifications are possible that improve that time bound. First, it, is possible to use a value array rcpresenta~i(m of hinding~ (Bayer and Moore. 1972} while exploring any given derivation path. reducing to a constant the variable lookup time at the cost of having to save and restore o(n} variable bindings from the value array each time the parsing procedure moves to explore a different derivation path. Secondly, the unification cost can be mode independent of the derivation length, if we for~o the occurs check that prevents a variable from being bound to a term containing it. Finally, the combination of structure sharing and copying suggested in the last section eliminates the overhead of switching to a different derivation path in the value array method at the cost of a uniform o(log n) time to look up or create a variabl, binding in a balanced binary tree.</Paragraph>
    <Paragraph position="13"> When adding a new edge to the chart, a chart parser  must verify that no edge with the same label between the same nodes is already present. In general DCG parsing (and therefore in online parsing with any unification-based formalism}, we cannot check for the &amp;quot;same label&amp;quot; (same lemma), because lemmas in general will contain variables. \Ve must instead check for subsumption of the new lemma by some old lemma. The obvious subsumption checking mechanism has an o(n 3) worst case cost, but the improved binding representations described above, together with the other special techniques mentioned in the previous section, can be used to reduce this cost in practice.</Paragraph>
    <Paragraph position="14"> We do not yet have a full complexity comparison between online and offline parsing, but it is easy to envisage situations in which the number of edges created by an online algorithm is much smaller than that for the corresponding offline algorithm, whereas the cost of applying the unification constraints is the same for both algorithms.</Paragraph>
  </Section>
class="xml-element"></Paper>