<?xml version="1.0" standalone="yes"?>
<Paper uid="J89-4001">
  <Title>A PARSING ALGORITHM FOR UNIFICATION GRAMMAR</Title>
  <Section position="5" start_page="0" end_page="0" type="metho">
    <SectionTitle>
4 THE PARSER WITHOUT EMPTY SYMBOLS
</SectionTitle>
    <Paragraph position="0"> Our first parser does not allow rules with empty right sides, since these create complications that obscure the main ideas. Therefore, throughout this section let G be a ground grammar in which no rule has an empty right side.</Paragraph>
    <Paragraph position="1"> When we say that α derives β we mean that α derives β in G. Thus α ⇒ ε iff α = ε.</Paragraph>
    <Paragraph position="2"> A dotted rule in G is a rule of G with the right side divided into two parts by a dot. The symbols to the left of the dot are said to be before the dot, those to the right are after the dot. DR is the set of all dotted rules in G. A dotted rule (A → α.β) derives a string if α derives that string. To compute symbolic products on sets of rules or dotted rules, we must represent them as g-expressions.</Paragraph>
    <Paragraph position="3"> We represent the rule (A → B C) as the list (A B C), and the dotted rule (A → B.C) as the pair [(A B C) (C)].</Paragraph>
    <Paragraph position="4"> We write A ⇒+ B if A derives B by a tree with more than one node. The parser relies on a chain table -- a table of all pairs [A B] such that A ⇒+ B. Let C_d be the set of all [A B] such that A ⇒+ B by a derivation tree of depth d. Clearly C_1 is the set of all [A B] such that (A → B) is a rule of G. If S_1 and S_2 are sets of pairs of terms, define link(S_1,S_2) = {[A C] | (∃ B. [A B] ∈ S_1 ∧ [B C] ∈ S_2)}. The function link is equal to the symbolic product defined by f_1 = cdr, f_2 = car, and g = (λ x y . cons(car(x), cdr(y))).</Paragraph>
    <Paragraph position="5"> Therefore we can compute link(S_1, S_2) by applying Theorem 2.1. Clearly C_{d+1} = link(C_d, C_1). Since the grammar is depth-bounded, there exists a number D such that every derivation tree whose yield contains exactly one symbol has depth less than D. Then C_D is empty. The algorithm for building the chain table is this: compute C_n for increasing values of n until C_n is empty. Then the union of all the C_n's is the chain table.</Paragraph>
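For a finite ground grammar this construction can be phrased as plain set operations. The sketch below is a minimal illustration in Python, assuming rules are given as (left side, right-side tuple) pairs; it replaces the paper's symbolic products over g-expressions with set comprehensions, and the rule list is reconstructed from the worked examples later in this section, since the original rule listing is missing from the extracted text.

```python
# Minimal sketch of the chain-table construction for a finite ground grammar.

def link(s1, s2):
    """All pairs [A C] such that [A B] is in s1 and [B C] is in s2."""
    return {(a, c) for (a, b) in s1 for (b2, c) in s2 if b == b2}

def chain_table(rules):
    """Union of C_1, C_2, ... computed until some C_d is empty.
    Termination relies on the grammar being depth-bounded."""
    c1 = {(lhs, rhs[0]) for (lhs, rhs) in rules if len(rhs) == 1}  # C_1
    table, cd = set(), c1
    while cd:
        table |= cd
        cd = link(cd, c1)  # C_(d+1) = link(C_d, C_1)
    return table

# Rules reconstructed from the worked examples in this section (the original
# listing is elided in the extraction, so this is an assumption).
rules = [("a", ("b",)), ("b", ("c",)), ("c", ("d",)),
         ("d", ("k", "f")), ("k", ("g",)), ("f", ("h",))]
print(chain_table(rules))
```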
    <Paragraph position="6"> We give an example from a finite ground grammar.</Paragraph>
    <Paragraph position="7"> Suppose the rules are</Paragraph>
    <Paragraph position="9"> The terminal symbols are g and h. Then C_1 = {[a b], [b c], [c d]}, C_2 = {[a c], [b d]}, and C_3 = {[a d]}. C_4 is empty.</Paragraph>
    <Paragraph position="10"> Definitions. ChainTable is the set of all [A B] such that A ⇒+ B. If S is a set of pairs of symbols and S' a set of symbols, ChainUp(S,S') is the set of symbols A such that [A B] ∈ S for some B ∈ S'. ChainUp is clearly a symbolic product. If S is a set of symbols, close(S) is the union of S and ChainUp(ChainTable,S).</Paragraph>
    <Paragraph position="11"> By the definition of ChainTable, close(S) is the set of symbols that derive a symbol of S.</Paragraph>
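For a finite ground grammar, ChainUp and close are again plain set operations. A small sketch, reusing the chain-table sketch above:

```python
def chain_up(pairs, symbols):
    """Symbols A such that [A B] is in `pairs` for some B in `symbols`."""
    return {a for (a, b) in pairs if b in symbols}

def close(chain, symbols):
    """Symbols that derive some symbol of `symbols`: the symbols themselves
    plus everything chained up to them through the chain table."""
    return set(symbols) | chain_up(chain, symbols)

chain = chain_table(rules)       # from the sketch above
print(close(chain, {"d"}))       # {'a', 'b', 'c', 'd'} for the example grammar
```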
    <Paragraph position="12"> In the example grammar, ChainTable is the union of C_1, C_2, and C_3 -- that is, the set {[a b], [b c], [c d], [a c], [b d], [a d]}. ChainUp(ChainTable, {a}) = {}, but ChainUp(ChainTable, {d}) = {a,b,c}; close({a}) = {a}, while close({d}) = {a,b,c,d}. Let α be an input string of length L > 0. For each α[i k] the parser will construct the set of dotted rules that derive α[i k]. The start symbol appears on the left side of one of these rules iff α[i k] is a sentence of G. By lemma 2.5 this can be tested, so we have a recognizer for the language generated by G. With a small modification the algorithm can find the set of derivation trees of α. We omit details and speak of the algorithm as a parser when strictly speaking it is a recognizer only.</Paragraph>
    <Paragraph position="13"> The dotted rules that derive α[i k] can be partitioned into two sets: rules with many symbols before the dot and rules with exactly one. For each α[i k], the algorithm will carry out three steps. First it collects all dotted rules that derive α[i k] and have many symbols before the dot. From this set it constructs the set of all symbols that derive α[i k], and from these symbols it constructs the set of all dotted rules that derive α[i k] with one symbol before the dot. The union of the two sets of dotted rules is the set of all dotted rules that derive α[i k]. Note that a dotted rule derives α[i k] with more than one symbol before the dot iff it can be written in the form (A → βB.β') where β ⇒ α[i j], B ⇒ α[j k], and i < j < k (this follows because a nonempty string β can never derive the empty string in G).</Paragraph>
    <Paragraph position="14"> If (A → B . C) derives α[i j] and C derives α[j k], then (A → B C .) derives α[i k]. This observation motivates the following.</Paragraph>
    <Paragraph position="15"> Definition. If S is a set of dotted rules and S' a set of symbols, AdvanceDot(S,S') is the set of rules (A → αB.β) such that (A → α.Bβ) ∈ S and B ∈ S'. Clearly AdvanceDot is a symbolic product.</Paragraph>
    <Paragraph position="16"> For example, AdvanceDot({(d → k . f)}, {f}) equals {(d → k f .)}.</Paragraph>
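A hedged sketch of AdvanceDot over an explicit set of dotted rules, using a (left side, right side, dot position) triple as the representation; this representation is an illustrative choice, not the paper's pair-of-lists encoding.

```python
def advance_dot(dotted, symbols):
    """Dotted rules (A -> a B . b) such that (A -> a . B b) is in `dotted`
    and B is in `symbols`."""
    return {(lhs, rhs, dot + 1)
            for (lhs, rhs, dot) in dotted
            if dot < len(rhs) and rhs[dot] in symbols}

print(advance_dot({("d", ("k", "f"), 1)}, {"f"}))
# {('d', ('k', 'f'), 2)}, i.e. (d -> k f .)
```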
    <Paragraph position="17"> Suppose that for each proper substring of α[i k] we have already found the dotted rules and symbols that derive that substring. The following lemma tells us that we can then find the set of dotted rules that derive α[i k] with many symbols before the dot.</Paragraph>
    <Paragraph position="18"> Lemma 3.1. For i < j < k, let S(i,j) be the set of dotted rules that derive α[i j], and S'(j,k) the set of symbols that derive α[j k]. The set of dotted rules that derive α[i k] with many symbols before the dot is the union over i < j < k of AdvanceDot(S(i,j), S'(j,k)). Definition. If S is a set of dotted rules, finished(S) is the set of symbols A such that (A → β .) ∈ S for some β. When the dot reaches the end of the right side of a rule, the parser has finished building the symbol on the left side -- hence the name finished. For example, finished({(d → k f .), (a → . b)}) is the set {d}.</Paragraph>
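In the same (left side, right side, dot) representation, finished is a one-line set comprehension:

```python
def finished(dotted):
    """Left-hand sides of dotted rules whose dot has reached the end."""
    return {lhs for (lhs, rhs, dot) in dotted if dot == len(rhs)}

print(finished({("d", ("k", "f"), 2), ("a", ("b",), 0)}))   # {'d'}
```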
    <Paragraph position="19"> The next lemma tells us that if we have the set of dotted rules that derive α[i k] with many symbols before the dot, we can construct the set of symbols that derive α[i k].</Paragraph>
    <Paragraph position="20"> Lemma 3.2. Suppose length(α) > 1 and S is the set of dotted rules that derive α with more than one symbol before the dot. The set of symbols that derive α is close(finished(S)).</Paragraph>
    <Paragraph position="21"> Proof. Suppose first that A ∈ close(finished(S)).</Paragraph>
    <Paragraph position="22"> Then for some B, A ⇒ B, (B → β.) is a dotted rule, and β ⇒ α. Then A ⇒ α. Suppose next that A derives α. We show by induction that if t is a derivation tree in G and A ⇒ α by t, then A ∈ close(finished(S)). t contains more than one node because length(α) > 1, so there is a rule ... In our example grammar, the set of dotted rules deriving α[0 2] = gh with more than one symbol before the dot is {(d → k f .)}, finished({(d → k f .)}) is {d}, and close({d}) = {a,b,c,d}. It is easy to check that these are all the symbols that derive gh. □ Definitions. RuleTable is the set of dotted rules (A → . α) such that (A → α) is a rule of G. If S is a set of symbols, NewRules(S) is AdvanceDot(RuleTable, S).</Paragraph>
    <Paragraph position="23"> In our example grammar, NewRules({k}) = {(d → k . f)}. Lemma 3.3. If S is the set of symbols that derive α, the set of dotted rules that derive α with one symbol before the dot is NewRules(S).</Paragraph>
    <Paragraph position="24"> Proof. Expanding the definitions gives AdvanceDot({(A → . β) | (A → β) ∈ P}, {C | C ⇒ α}) = {(A → C . β') | (A → C β') ∈ P ∧ C ⇒ α}. This is the set of dotted rules that derive α with one symbol before the dot.</Paragraph>
    <Paragraph position="25"> Let terminals(i,k) be the set of terminals that derive α[i k]; that is, if i + 1 = k then terminals(i,k) = {α[i k]}, and otherwise terminals(i,k) = ∅. Let α be a string of length L > 0. For 0 ≤ i < k ≤ L, define</Paragraph>
    <Paragraph position="27"> Theorem 3.1. For 0 ≤ i < k ≤ L, dr(i,k) is the set of dotted rules that derive α[i k].</Paragraph>
    <Paragraph position="28"> Proof. By induction on the length of α[i k]. If the length is 1, then i + 1 = k. The algorithm returns NewRules(close({α[i i+1]})). close({α[i i+1]}) is the set of symbols that derive α[i i+1] (by the definition of ChainTable), and NewRules(close({α[i i+1]})) is the set of dotted rules that derive α[i i+1] with one symbol before the dot (by lemma 3.3). No rule can derive α[i i+1] with more than one symbol before the dot, because no nonempty string derives the empty string in G.</Paragraph>
    <Paragraph position="30"> Suppose α[i k] has a length greater than 1. If i < j < k, dr(i,j) contains the dotted rules that derive α[i j] and dr(j,k) contains the dotted rules that derive α[j k], by induction hypothesis. Then finished(dr(j,k)) is the set of nonterminals that derive α[j k], and terminals(j,k) is the set of terminals that derive α[j k], so the union of these two sets is the set of all symbols that derive α[j k]. By lemma 3.1, rules1 is the set of dotted rules that derive α[i k] with many symbols before the dot. By lemma 3.2, close(finished(rules1)) is the set of symbols that derive α[i k], so by lemma 3.3 rules2 is the set of dotted rules that derive α[i k] with one symbol before the dot. The union of rules1 and rules2 is the set of dotted rules that derive α[i k], and this completes the proof. □ Suppose we are parsing the string gh with our example grammar. We have dr(0,1) = {(k → g .), (d → k . f)}</Paragraph>
    <Paragraph position="32"/>
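The defining equations for dr(i,k) are missing from the extracted text (the display that should follow "define" above). The sketch below reconstructs the recurrence from the proof of Theorem 3.1, reusing the helper sketches given earlier; it is an illustration for finite ground grammars with no empty rules, not the paper's implementation.

```python
# Reconstructed recurrence (an assumption based on the proof of Theorem 3.1):
#   dr(i,i+1) = NewRules(close({a[i]}))
#   dr(i,k)   = rules1 | rules2, where
#   rules1 = union over i<j<k of AdvanceDot(dr(i,j), finished(dr(j,k)) | terminals(j,k))
#   rules2 = NewRules(close(finished(rules1)))
from functools import lru_cache

def recognize(rules, start, a):
    chain = chain_table(rules)                       # sketches defined earlier
    rule_table = {(lhs, rhs, 0) for (lhs, rhs) in rules}

    def new_rules(symbols):
        return advance_dot(rule_table, symbols)

    def terminals(i, k):
        return {a[i]} if k == i + 1 else set()

    @lru_cache(maxsize=None)
    def dr(i, k):
        if k == i + 1:
            return frozenset(new_rules(close(chain, {a[i]})))
        rules1 = set()
        for j in range(i + 1, k):
            rules1 |= advance_dot(dr(i, j), finished(dr(j, k)) | terminals(j, k))
        rules2 = new_rules(close(chain, finished(rules1)))
        return frozenset(rules1 | rules2)

    return start in finished(dr(0, len(a)))

print(recognize(rules, "a", "gh"))   # True: gh is a sentence of the example grammar
```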
  </Section>
  <Section position="6" start_page="0" end_page="0" type="metho">
    <SectionTitle>
5 THE PARSER WITH EMPTY SYMBOLS
</SectionTitle>
    <Paragraph position="0"> Throughout this section, G is an arbitrary depth-bounded unification grammar, which may contain rules whose right side is empty. If there are empty rules in the grammar, the parser will require a table of symbols that derive the empty string, which we also call the table of empty symbols. Let E_d be the set of symbols that derive the empty string by a derivation of depth d, and let E'_d be the set of symbols that derive the empty string by a derivation of depth d or less. Since the grammar is depth-bounded, it suffices to construct E_d for successive values of d until a D > 0 is found such that E_D is the empty set.</Paragraph>
    <Paragraph position="1"> E_1 is the set of symbols that immediately derive the empty string; that is, the set of all A such that (A → ε) is a rule. A ∈ E_{d+1} iff there is a rule (A → B_1 ... B_n) such that for each i, B_i ⇒ ε at depth d_i, and d is the maximum of the d_i's. In other words: A ∈ E_{d+1} iff there is a rule</Paragraph>
    <Paragraph position="3"> If DR is the set of ground instances of a finite set of rules with variables, there is a finite bound on the length of the right sides of rules in DR (because the right side of a ground instance of a rule r has the same length as the right side of r). If L is the length of the right side of the longest rule in DR, then AdvanceDot*(DR,S) is well defined because the recursion stops at depth L or before. Clearly AdvanceDot*(DR,S) is the set of rules</Paragraph>
    <Paragraph position="5"> S_1 is the set of dotted rules (A → α.β0) such that every symbol of α is in E'_d. S_2 is then the set of dotted rules (A → αB.β0) such that B ∈ E_d and every symbol of α is in E'_d. Therefore S_3 is the set of dotted rules (A → αBβ.β2) such that B ∈ E_d and every symbol of α and β is in E'_d.</Paragraph>
    <Paragraph position="6"> Finally S_4 is the set of symbols A such that for some rule (A → αBβ), B ∈ E_d and every symbol of α and β is in E'_d. Then S_4 is E_{d+1}. In this way we can construct E_d for increasing values of d until the table of empty symbols is complete.</Paragraph>
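For a finite ground grammar the table of empty symbols can be computed with a simple fixed-point loop. The sketch below merges the E_d and E'_d bookkeeping into one growing set, which yields the same union; the example rules are a plausible reconstruction of the (elided) example grammar of this section, and the rule for c in particular is a guess.

```python
def empty_table(rules):
    """Symbols that derive the empty string, built level by level."""
    table = {lhs for (lhs, rhs) in rules if len(rhs) == 0}   # E_1
    new = set(table)
    while new:
        # next level: left sides whose whole right side is already known to derive e
        new = {lhs for (lhs, rhs) in rules
               if rhs and all(b in table for b in rhs)} - table
        table |= new
    return table

# Hypothetical reconstruction of the example rules (the listing is missing):
# a -> e, b -> e, c -> a b, f -> r, g -> s, k -> c f c g c.
empty_rules = [("a", ()), ("b", ()), ("c", ("a", "b")),
               ("f", ("r",)), ("g", ("s",)), ("k", ("c", "f", "c", "g", "c"))]
print(empty_table(empty_rules))   # {'a', 'b', 'c'}
```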
    <Paragraph position="7"> Here is an example grammar with symbols that derive the empty string:</Paragraph>
    <Paragraph position="9"> The terminal symbols are r and s. In this grammar, E_1 = {a,b}, E_2 = {c}, and E_3 = ∅.</Paragraph>
    <Paragraph position="10"> Definitions. Let EmptyTable be the set of symbols that derive the empty string. If S is a set of dotted rules, let SkipEmpty(S) be AdvanceDot*(S, EmptyTable).</Paragraph>
    <Paragraph position="11"> Note that SkipEmpty(S) is the set of dotted rules (A → αβ1.β2) such that (A → α.β1β2) ∈ S and β1 ⇒ ε.</Paragraph>
    <Paragraph position="12"> SkipEmpty(S) contains every dotted rule that can be formed from a rule in S by moving the dot past zero or more symbols that derive the empty string. In the example grammar EmptyTable = {a,b,c}, so SkipEmpty({(k → . cfcgc)}) = {(k → . cfcgc), (k → c . fcgc)}. If the dotted rules in S all derive α, then the dotted rules in SkipEmpty(S) also derive α.</Paragraph>
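A sketch of SkipEmpty as iterated dot movement over an explicit set of empty symbols, in the same (left side, right side, dot) representation used earlier:

```python
def skip_empty(dotted, empties):
    """Every dotted rule obtainable from one in `dotted` by moving the dot
    past zero or more right-side symbols that derive the empty string."""
    result = set(dotted)
    frontier = set(dotted)
    while frontier:
        frontier = {(l, r, d + 1) for (l, r, d) in frontier
                    if d < len(r) and r[d] in empties} - result
        result |= frontier
    return result

k_rule = ("k", ("c", "f", "c", "g", "c"), 0)        # (k -> . c f c g c)
print(skip_empty({k_rule}, {"a", "b", "c"}))
# the rules (k -> . cfcgc) and (k -> c . fcgc), as in the example above
```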
    <Paragraph position="13"> Let C_d be the set of pairs [A B] such that A ⇒+ B by a derivation tree in which the unique leaf labelled B is at depth d (note: this does not imply that the tree is of depth d). C_1 is the set of pairs [A B] such that (A → αBβ) is a rule and every symbol of α and β derives the empty string. C_1 is easily computed using SkipEmpty. Also C_{d+1} = link(C_d, C_1), so we can construct the chain table as before.</Paragraph>
    <Paragraph position="14"> In the example grammar there are no A and B such that A ⇒+ B, but if we added the rule (k → cfc), we would have k ⇒+ f. Note that k derives f by a tree of depth 3, but the path from the root of this tree to the leaf labeled f is of length one. Therefore the pair [k f] is in C_1.</Paragraph>
    <Paragraph position="15"> The parser of Section 4 relied on the distinction between dotted rules with one and many symbols before the dot. If empty symbols are present, we need a slightly more complex distinction. We say that the string α derives β using one symbol if there is a derivation of β from α in which exactly one symbol of α derives a nonempty string. We say that α derives β using many symbols if there is a derivation of β from α in which more than one symbol of α derives a nonempty string. If a string α derives a string β, then α derives β using one symbol, or α derives β using many symbols, or both. In the example grammar, cfc derives r using one symbol, and cfcg derives rs using many symbols.</Paragraph>
    <Paragraph position="16"> We say that a dotted rule derives β using one (or many) symbols if the string before the dot derives β using one (or many) symbols. Note that a dotted rule derives α[i k] using many symbols iff it can be written as (A → βBβ'.β1) where β ⇒ α[i j], B ⇒ α[j k], β' ⇒ ε, and i < j < k. This is true because whenever a dotted rule derives a string using many symbols, there must be a last symbol before the dot that derives a nonempty string. Let B be that symbol; it is followed by a β' that derives the empty string, and preceded by a β that must contain at least one more symbol deriving a nonempty string.</Paragraph>
    <Paragraph position="17"> We prove lemmas analogous to 3.1, 3.2, and 3.3.</Paragraph>
    <Paragraph position="18"> Lemma 4.1. For i < j < k let S(i,j) be the set of dotted rules that derive α[i j] and S'(j,k) the set of symbols that derive α[j k]. The set of dotted rules that derive α[i k] using many symbols is</Paragraph>
    <Paragraph position="20"> Proof. Expanding definitions and using the argument of lemma 3.3 we have</Paragraph>
    <Paragraph position="22"> This in turn is equal to {(B → βAβ'.β1) ∈ DR | (∃ j. i < j < k ∧ β ⇒ α[i j] ∧ A ⇒ α[j k]) ∧ β' ⇒ ε}. This is the set of rules that derive α[i k] using many symbols, as noted above.</Paragraph>
    <Paragraph position="23"> If we have α = rs, then the set of dotted rules that derive α[0 1] is {(f → r .), (k → cf . cgc), (k → cfc . gc)}. The set of symbols that derive α[1 2] is {g,s}. The set of dotted rules that derive α[0 2] using many symbols is {(k → cfcg . c), (k → cfcgc .)}. Lemma 4.1 tells us that to compute this set we must apply SkipEmpty to the output of AdvanceDot. If we failed to apply SkipEmpty we would omit the dotted rule (k → cfcgc .) from our answer.</Paragraph>
    <Paragraph position="24"> Lemma 4.2. Suppose length(α) > 1 and S is the set of dotted rules that derive α using many symbols. The set of symbols that derive α is close(finished(S)).</Paragraph>
    <Paragraph position="25"> Proof. By induction as in Lemma 3.2.</Paragraph>
    <Paragraph position="26"> Definitions. Let RuleTable' be SkipEmpty({(A → .α) | (A → α) is a rule of G}). If S is a set of symbols, let NewRules'(S) be SkipEmpty(AdvanceDot(RuleTable', S)).</Paragraph>
    <Paragraph position="29"> RuleTable' is like the RuleTable defined in Section 4, except that we apply SkipEmpty. In the example grammar, this means adding the following dotted rules:</Paragraph>
    <Paragraph position="31"> NewRules'({f}) is equal to {(k → cf . cgc), (k → cfc . gc)}.</Paragraph>
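In terms of the earlier sketches, RuleTable' and NewRules' just wrap SkipEmpty around the Section 4 definitions; a small illustration for a finite ground grammar:

```python
def rule_table_prime(rules, empties):
    """RuleTable': initial dotted rules closed under SkipEmpty."""
    return skip_empty({(lhs, rhs, 0) for (lhs, rhs) in rules}, empties)

def new_rules_prime(symbols, rules, empties):
    """NewRules'(S): SkipEmpty applied to the output of AdvanceDot."""
    return skip_empty(advance_dot(rule_table_prime(rules, empties), symbols), empties)
```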
    <Paragraph position="33"> The following lemma tells us that NewRules' will perform the task that NewRules performed in Section 4. Lemma 4.3. If S is the set of symbols that derive α, the set of dotted rules that derive α using one symbol is NewRules'(S).</Paragraph>
    <Paragraph position="34"> Proof. Expanding definitions gives</Paragraph>
    <Paragraph position="36"> This is the set of dotted rules that derive α using one symbol, by definition.</Paragraph>
    <Paragraph position="37"> Let α be a string of length L. For 0 ≤ i < k ≤ L,</Paragraph>
    <Paragraph position="39"> Theorem 4.1. dr(i,k) is the set of dotted rules that derive α[i k].</Paragraph>
    <Paragraph position="40"> Proof. By induction on the length of α[i k] as in the proof of theorem 3.1, but with lemmas 4.1, 4.2, and 4.3 replacing 3.1, 3.2, and 3.3, respectively. □ If α = rs we find that</Paragraph>
    <Paragraph position="42"/>
  </Section>
  <Section position="7" start_page="0" end_page="229" type="metho">
    <SectionTitle>
6 THE PARSER WITH TOP-DOWN FILTERING
</SectionTitle>
    <Paragraph position="0"> We have described two parsers that set dr(i,k) to the set of dotted rules that derive α[i k]. We now consider a parser that uses top-down filtering to eliminate some useless rules from dr(i,k). Let us say that A follows β if the start symbol derives a string beginning with βA. A dotted rule (A → χ) follows β if A follows β. The new algorithm will set dr(i,k) to the set of dotted rules that derive α[i k] and follow α[0 i].</Paragraph>
    <Paragraph position="1"> If A derives a string beginning with B, we say that A can begin with B. The new algorithm requires a prediction table, which contains all pairs [A B] such that A can begin with B. Let P_1 be the set of pairs [A B] such that (A → βBβ') is a rule and β ⇒ ε. Let P_{n+1} be P_n ∪ Link(P_n, P_1).</Paragraph>
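For a finite ground grammar the P_n construction is a straightforward fixed point; a sketch, reusing link from the chain-table sketch. For unification grammars the analogous construction over terms can produce infinitely many pairs, which is the problem the weak prediction table addresses below.

```python
def prediction_table(rules, empties):
    """All [A B] such that A can begin with B, for a finite ground grammar."""
    # P_1: [A B] where B is a right-side symbol of an A-rule preceded only by
    # symbols known to derive the empty string.
    p1 = {(lhs, rhs[i]) for (lhs, rhs) in rules
          for i in range(len(rhs))
          if all(x in empties for x in rhs[:i])}
    table = p1
    while True:
        bigger = table | link(table, p1)     # P_(n+1) = P_n | Link(P_n, P_1)
        if bigger == table:
            return table
        table = bigger
```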
    <Paragraph position="2"> Lemma 5.1. The set of pairs [A B] such that A can begin with B is the union of P_n for all n ≥ 1.</Paragraph>
    <Paragraph position="3"> Proof. By induction on the tree by which A derives a string beginning with B. Details are left to the reader. □ Our problem is to construct a finite representation for the prediction table. To see why this is difficult, consider a grammar containing the rule</Paragraph>
    <Paragraph position="5"> Thus if we try to build the prediction table in the obvious way, we get an infinite set of pairs of terms.</Paragraph>
    <Paragraph position="6"> The key to this problem is to recognize that it is not necessary or even useful to predict every possible feature of the next input. It makes sense to predict the presence of traces, but predicting the subcategorization frame of a verb will cost more than it saves. To avoid predicting certain features, we use a weak prediction table; that is, a set of pairs of symbols that properly contains the set of all [A B] such that A can begin with B. This weak prediction table is guaranteed to eliminate no more than what the ideal prediction table eliminates. It may leave some dotted rules in dr(i,k) that the ideal prediction table would remove, but it may also cost less to use.</Paragraph>
    <Paragraph position="7"> Sato and Tamaki (1984) proposed to analyze the behavior of Prolog programs, including parsers, by using something much like a weak prediction table. To guarantee that the table was finite, they restricted the depth of terms occurring in the table. Shieber (1985b) offered a more selective approach -- his program predicts only those features chosen by the user as most useful for prediction. Pereira and Shieber (1987) discuss both approaches. We will present a variation of Shieber's ideas that depends on using a sorted language.</Paragraph>
    <Paragraph position="8"> To build a weak prediction table we begin with a set</Paragraph>
    <Paragraph position="10"> ground(Q_1)). Let Q_{i+1} equal Q_i ∪ LP(Q_i, Q_1). Then by lemma 2.3 and induction, ∪_{i≥1} P_i ⊆ ground(∪_{i≥1} Q_i). That is, the union of the Q_i's represents a weak prediction table. Thus we have shown that if a weak prediction table is adequate, we are free to choose any Q_1 such that</Paragraph>
    <Paragraph position="12"> Since ground(Q_{i+1}) is a function of ground(Q_i) for all i, it follows that ground(Q_i) = ground(Q_D) for all i ≥ D, so ground(Q_D) = ∪_{i≥1} ground(Q_i). That is, Q_D is a finite representation of a weak prediction table. Our problem is to choose Q_1 so that Q_D subsumes Q_{D+1} for some D.</Paragraph>
    <Paragraph position="13"> Let s_1 and s_2 be sorts. In Section 2 we defined s_1 > s_2 if there is a function letter of sort s_1 that has an argument of sort s_2. Let >* be the transitive closure of >; a sort t is cyclic if t >* t, and a term is cyclic if its sort is cyclic. P_1 is equal to {[A B] | (A → β.Bβ') ∈ RuleTable'} so we can build a representation for P_1. Let us form Q_1 from this representation by replacing all cyclic terms with new variables. More exactly, we apply the following recursive transformation to each term t in the representation of P_1:</Paragraph>
    <Paragraph position="15"> if the sort of f is cyclic then new-variable() else f(transform(t_1) ... transform(t_n)), where new-variable is a function that returns a new variable each time it is called.</Paragraph>
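A sketch of this transformation under an assumed term representation (variables as strings beginning with '?', compound terms as tuples); the representation and the helper name sort_of_letter are illustrative assumptions, not the paper's encoding.

```python
import itertools

_fresh = itertools.count()

def transform(term, cyclic_sorts, sort_of_letter):
    """Replace every subterm headed by a function letter of cyclic sort
    with a fresh variable; leave everything else intact."""
    if isinstance(term, str) and term.startswith("?"):        # a variable
        return term
    f, *args = term
    if sort_of_letter(f) in cyclic_sorts:
        return "?v{}".format(next(_fresh))                    # new-variable()
    return (f,) + tuple(transform(t, cyclic_sorts, sort_of_letter) for t in args)
```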
    <Paragraph position="16"> Then P_1 ⊆ ground(Q_1), and Q_1 contains no function letters of cyclic sorts. For example, if the function letter s belongs to a cyclic sort, we will turn</Paragraph>
    <Paragraph position="18"> Q_1 subsumes Q_2, and Q_1 is already a finite representation of a weak prediction table. The following lemma shows that in general, the Q_1 defined above allows us to build a finite representation of a weak prediction table.</Paragraph>
    <Paragraph position="19"> Lemma 5.2. Let Q_1 be a set of pairs of terms that contains no function letters of cyclic sorts, and let Q_i be as defined above for all i > 1. Then for some D, Q_D subsumes LP(Q_D, Q_1).</Paragraph>
    <Paragraph position="20"> Proof. Note first that since unification never introduces a function letter that did not occur in the input, Q_i contains no function letters of cyclic sort for any i ≥ 1.</Paragraph>
    <Paragraph position="21"> Let C be the number of noncyclic sorts in the language. Then the maximum depth of a term that contains no function letters of cyclic sorts is C + 1.</Paragraph>
    <Paragraph position="22"> Consider a term as a labeled tree, and consider any path from the root of such a tree to one of its leaves. The path can contain at most one variable or function letter of each noncyclic sort, plus one variable of a cyclic sort.</Paragraph>
    <Paragraph position="23"> Then its length is at most C + 1.</Paragraph>
    <Paragraph position="24"> Consider the set S of all pairs of terms in L that contain no function letters of cyclic sorts. Let us partition this set into equivalence classes, counting two terms equivalent if they are alphabetic variants. We claim that the number of equivalence classes is finite.</Paragraph>
    <Paragraph position="25"> Since there is a finite bound on the depths of terms in S, and a finite bound on the number of arguments of a function letter in S, there is a finite bound V on the number of variables in any term of S. Let v_1...v_K be a list of variables containing V variables from each sort.</Paragraph>
    <Paragraph position="26"> Then there is a finite number of pairs in S that use only variables from v_1...v_K; let S' be the set of all such pairs. Now each pair p in S is an alphabetic variant of a pair in S', for we can replace the variables of p one-for-one with variables from v_1...v_K. Therefore the number of equivalence classes is no more than the number of elements in S'. We call this number E. We claim that Q_D subsumes LP(Q_D, Q_1) for some D ≤ E.</Paragraph>
    <Paragraph position="27"> To see this, suppose that Q_i does not subsume LP(Q_i, Q_1) for all i < E. If Q_i does not subsume LP(Q_i, Q_1), then Q_{i+1} intersects more equivalence classes than Q_i does. Since Q_1 intersects at least one equivalence class, Q_E intersects all the equivalence classes. Therefore Q_E subsumes LP(Q_E, Q_1), which was to be proved. □ This lemma tells us that we can build a weak prediction table for any grammar by throwing away all subterms of cyclic sort. In the worst case, such a table might be too weak to be useful, but our experience suggests that for natural grammars a prediction table of this kind is very effective in reducing the size of the dr(i,k)'s. In the following discussion we will assume that we have a complete prediction table; at the end of this section we will once again consider weak prediction tables.</Paragraph>
    <Paragraph position="28"> Definitions. If S is a set of symbols, let first(S) = S ∪ {B | (∃ A ∈ S. [A B] ∈ PredTable)}. If PredTable is indeed a complete prediction table, first(S) is the set of symbols B such that some symbol in S can begin with B.</Paragraph>
    <Paragraph position="29"> The terminal symbols are r and s. In this grammar first({start}) = {start, a, r}, and next({(a → r . g)}) = {g}. The following lemma shows that we can find the set of symbols that follow α[0 j] if we have a prediction table and the sets of dotted rules that derive α[i j] for all i < j.</Paragraph>
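first(S) over an explicit prediction table is a one-line set operation. The defining display for next is missing from the extracted text; the reading below, the symbols immediately after the dot, is an assumption based on the worked example next({(a → r . g)}) = {g}.

```python
def first(symbols, pred_table):
    """S together with every B such that [A B] is in the prediction table
    for some A in S."""
    return set(symbols) | {b for (a, b) in pred_table if a in symbols}

def next_symbols(dotted):
    """'next' in the text (renamed to avoid Python's builtin): assumed to be
    the symbols immediately after the dot in rules of the set."""
    return {rhs[dot] for (lhs, rhs, dot) in dotted if dot < len(rhs)}

print(next_symbols({("a", ("r", "g"), 1)}))   # {'g'}
```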
    <Paragraph position="30"> Lemma 5.3. Let j satisfy 0 ≤ j ≤ length(α). Suppose that for 0 ≤ i < j, S(i) is the set of dotted rules that follow α[0 i] and derive α[i j] (if j = 0 this is vacuous). Let start be the start symbol of the grammar. Then the set of symbols that follow α[0 j] is</Paragraph>
    <Paragraph position="32"> Proof. We show first that every member of the given set follows α[0 j]. If j = 0, certainly every member of first({start}) follows α[0 0] = ε. If j > 0, suppose that C follows α[0 i], (C → βBβ') is a rule, and β ⇒ α[i j]; then clearly B follows α[0 j].</Paragraph>
    <Paragraph position="33"> Next we show that if A follows α[0 j], A is in the given set. We prove by induction on d that if start ⇒ α[0 j]Aα' by a tree t, and the leaf corresponding to the occurrence of A after α[0 j] is at depth d in t, then A belongs to the given set. If d = 0, then A = start, and j = 0. We must prove that start ∈ first({start}), which is obvious.</Paragraph>
    <Paragraph position="34"> If d > 0 there are two cases. Suppose first that the leaf n corresponding to the occurrence of A after α[0 j] has younger brothers dominating a nonempty string (younger brothers of n are children of the same father occurring to the left of n). Then the father of n is admitted by a rule of the form (C → βAβ'). C is the label of the father of n, and β consists of the labels of the younger brothers of n in order. Then β ⇒ α[i j], where 0 ≤ i < j. Removing the descendants of n's father from t gives a tree t' whose yield is α[0 i]Cα'. Therefore C follows α[0 i]. We have shown that (C → βAβ') is a rule, C follows α[0 i], and β ⇒ α[i j]. Then (C → β.Aβ') ∈ S(i), A ∈ next(S(i)), and A ∈ (∪_{0≤i<j} next(S(i))).</Paragraph>
    <Paragraph position="35"> Finally suppose that the younger brothers of n dominate the empty string in t. Then if C is the label of n's father, C can begin with A. Removing the descendants of n's father from t gives a tree t' whose yield begins with α[0 j]C. Then C belongs to the given set by induction hypothesis. If C ∈ first(X) and C can begin with A, then A ∈ first(X). Therefore A belongs to the given set. This completes the proof.</Paragraph>
    <Paragraph position="36"> As an example, let α = rs. Then the set of dotted rules that derive α[0 1] and follow α[0 0] is {(a → r . g)}. The dotted rule (c → r . h) derives α[0 1], but it does not follow α[0 0] because c is not an element of first({start}). We are finally ready to present the analogs of lemmas 3.1, 3.2, and 3.3 for the parser with prediction. Where the earlier lemmas mentioned the set of symbols (or dotted rules) that derive α[i j], these lemmas mention the set of symbols (or dotted rules) that follow α[0 i] and derive α[i j].</Paragraph>
    <Paragraph position="37"> Lemma 5.4. Let α be a nonempty string. Suppose that for i < j < k, S(i,j) is the set of dotted rules that follow α[0 i] and derive α[i j], while S'(j,k) is the set of symbols that follow α[0 j] and derive α[j k]. The set of dotted rules that follow α[0 i] and derive α[i k] using many symbols is ... then A follows α[0 j]. Therefore the statement that A follows α[0 j] is redundant and can be deleted, giving SkipEmpty({(B → βA.β2) ∈ DR | B follows α[0 i] ∧ (∃ j. i < j < k ∧ β ⇒ α[i j] ∧ A ⇒ α[j k])}). This in turn is equal to {(B → βAβ'.β3) ∈ DR | B follows α[0 i] ∧ (∃ j. i < j < k ∧ β ⇒ α[i j] ∧ A ⇒ α[j k]) ∧ β' ⇒ ε}. This is the set of dotted rules that follow α[0 i] and derive α[i k] using many symbols. □ Lemma 5.5. Suppose length(α[i j]) > 1, S is the set of symbols that follow α[0 i], and S' is the set of dotted rules that follow α[0 i] and derive α[i j] using many symbols. Then S ∩ close(finished(S')) is the set of symbols that follow α[0 i] and derive α[i j].</Paragraph>
    <Paragraph position="38"> Proof. S' is a subset of the set of dotted rules that derive α[i j], so by lemma 4.2 and monotonicity, close(finished(S')) is a subset of the set of symbols that derive α[i j]. Therefore every symbol in S ∩ close(finished(S')) derives α[i j] and follows α[0 i]. This proves inclusion in one direction.</Paragraph>
    <Paragraph position="39"> For the other direction, suppose A follows α[0 i] and derives α[i j]. Then by lemma 4.2 there is a dotted rule (B → β.) such that β ⇒ α[i j] using many symbols and A ⇒ B. Then B follows α[0 i], so B is in finished(S'), which means that A is in S ∩ close(finished(S')). □ Definition. If S is a set of symbols and R a set of dotted rules, filter(S,R) is the set of rules in R whose left sides are in S. In other words, filter(S,R) = {(A → β.β') ∈ R | A ∈ S}.</Paragraph>
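filter is again a one-line set operation in the dotted-rule representation used earlier (renamed here to avoid Python's builtin filter):

```python
def filter_rules(symbols, dotted):
    """Dotted rules in `dotted` whose left-hand side is in `symbols`."""
    return {(lhs, rhs, dot) for (lhs, rhs, dot) in dotted if lhs in symbols}
```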
    <Paragraph position="40"> Lemma 5.6. Suppose S is the set of symbols that follow α[0 i], and S' is the set of symbols that follow α[0 i] and derive α[i j]. Then the set of rules that follow α[0 i] and derive α[i j] using one symbol is filter(S, NewRules'(S')).</Paragraph>
    <Paragraph position="41"> Proof: S' is a subset of the set of symbols that derive α[i j]. By lemma 4.3 and monotonicity, we know that every dotted rule in NewRules'(S') derives α[i j] using one symbol. Therefore every dotted rule in filter(S, NewRules'(S')) follows α[0 i] and derives α[i j] using one symbol. This proves inclusion in one direction.</Paragraph>
    <Paragraph position="42"> For the other direction, consider any dotted rule that follows α[0 i] and derives α[i j] using one symbol; it can be written in the form (A → βBβ'.β1), where β and β' derive ε, B derives α[i j], and A follows α[0 i]. Since β ⇒ ε, B follows α[0 i]. Therefore B ∈ S' and (A → βBβ'.β1) is in NewRules'(S'). Since A follows α[0 i], (A → βBβ'.β1) is in filter(S, NewRules'(S')).</Paragraph>
    <Paragraph position="43"> Let α be a string of length L. For 0 ≤ i < k ≤ L, define</Paragraph>
    <Paragraph position="45"> Note that the new version of dr(i,k) is exactly like the previous version except that we filter the output of close by intersecting it with pred(i), and we filter the output of NewRules' by applying the function filter.</Paragraph>
    <Paragraph position="46"> Theorem 5.6. For 0 ≤ k ≤ L, pred(k) is the set of symbols that follow α[0 k], and if 0 ≤ i < k, dr(i,k) is the set of dotted rules that follow α[0 i] and derive α[i k].</Paragraph>
    <Paragraph position="47"> Proof. This proof is similar to the proof of theorem 3.4, but it is more involved because we must show that pred(k) has the desired values. Once more we argue by induction, but this time it is a double induction: an outer induction on k, and an inner induction on the length of strings that end at k.</Paragraph>
    <Paragraph position="48"> We show by induction on k that pred(k) has the desired value and for 0 ≤ i < k, dr(i,k) has the desired value. If k = 0, lemma 5.3 tells us that pred(0) is the set of symbols that follow α[0 0], and the second part of the induction hypothesis is vacuously true.</Paragraph>
    <Paragraph position="49"> If k > 0, we first show by induction on the length of α[i k] that dr(i,k) has the desired value for 0 ≤ i < k. This part of the proof is much like the proof of 3.4. If α[i k] has length 1, then pred(i) is the set of symbols that follow α[0 i] by the hypothesis of the induction on k.</Paragraph>
    <Paragraph position="50"> Then pred(i) ∩ close({α[i k]}) is the set of symbols that follow α[0 i] and derive α[i k], so lemma 5.6 tells us that filter(pred(i), NewRules'(pred(i) ∩ close({α[i k]}))) is the set of dotted rules that follow α[0 i] and derive α[i k].</Paragraph>
    <Paragraph position="51"> If length(α[i k]) > 1, consider any j such that i < j < k.</Paragraph>
    <Paragraph position="52"> dr(i,j) and dr(j,k) have the desired values by induction hypothesis. Then lemma 5.4 tells us that rules1 is the set of dotted rules that follow α[0 i] and derive α[i k] using many symbols. pred(i) is the set of symbols that follow α[0 i], so pred(i) ∩ close(finished(rules1)) is the set of symbols that follow α[0 i] and derive α[i k], by lemma 5.5. Therefore rules2 is the set of dotted rules that follow α[0 i] and derive α[i k] using one symbol, by lemma 5.6. The union of rules1 and rules2 is the set of dotted rules that follow α[0 i] and derive α[i k], and this completes the inner induction.</Paragraph>
    <Paragraph position="53"> To complete the outer induction, we use lemma 5.3 to show that pred(k) is the set of symbols that follow α[0 k]. This completes the proof. □ Corollary: start ∈ finished(dr(0,L)) iff α is a sentence of the language generated by G.</Paragraph>
    <Paragraph position="54"> Suppose we are parsing the string rs using the example grammar. Then we have</Paragraph>
    <Paragraph position="56"> We have proved the correctness of the parser when it uses an ideal prediction table. We must still consider what happens when the parser uses a weak prediction table.</Paragraph>
    <Paragraph position="57"> Theorem 5.7. If PredTable is a superset of the set of all [A B] such that A can begin with B, then start ∈ finished(dr(0,L)) iff α is a sentence of the language generated by G.</Paragraph>
    <Paragraph position="58"> Proof. Note that the parser with filtering always builds a smaller dr(i,k) than the parser without filtering. Since all the operations of the parser are monotonic, this is an easy induction. So if the parser with filtering puts the start symbol in dr(0,L), the parser without filtering will do this also, implying that α is a sentence. Note also that the parser with filtering produces a larger dr(i,k) given a larger PredTable (again, this follows easily because all operations in the parser are monotonic). So if α is a sentence, the parser with the ideal prediction table includes start in dr(0,L), and so does the parser with the weak prediction table. □</Paragraph>
  </Section>
  <Section position="8" start_page="229" end_page="230" type="metho">
    <SectionTitle>
7 DISCUSSION AND IMPLEMENTATION NOTES
7.1 RELATED WORK AND POSSIBLE EXTENSIONS
</SectionTitle>
    <Paragraph position="0"> The chief contribution of the present paper is to define a class of grammars on which bottom-up parsers always halt, and to give a semi-decision procedure for this class. This in turn makes it possible to prove a completeness theorem, which is impossible if one considers arbitrary unification grammars. One can obtain similar results for the class of grammars whose context-free backbone is finitely ambiguous--what Pereira and Warren (1983) called the offline-parsable grammars. However, as Shieber (1985b) observed, this class of grammars excludes many linguistically interesting grammars that do not use atomic category symbols.</Paragraph>
    <Paragraph position="1"> The present parser (as opposed to the table-building algorithm) is much like those in the literature. Like nearly all parsers using term unification, it is a special case of Earley deduction (Pereira and Warren 1985).</Paragraph>
    <Paragraph position="2"> The tables are simply collections of theorems proved in advance and added to the program component of Earley deduction. Earley deduction is a framework for parsing rather than a parser. Among implemented parsers, BUP (Matsumoto et al. 1983) is particularly close to the present work. It is a bottom-up left-corner parser using term unification. It is written in Prolog and uses backtracking, but by recording its results as clauses in the Prolog database it avoids most backtracking, so that it is close to a chart parser. It also includes top-down filtering, although it uses only category symbols in filtering. The paper includes suggestions for handling rules with empty right sides as well. The main difference from the present work is that the authors do not describe the class of grammars on which their algorithm halts, and as a result they cannot prove completeness.</Paragraph>
    <Paragraph position="3"> The grammar formalism presented here is much simpler than many formalisms called "unification grammars." There are no meta-rules, no default values of features, no general agreement principles (Gazdar et al.</Paragraph>
    <Paragraph position="4"> 1986). We have found this formalism adequate to describe a substantial part of English syntax--at least, substantial by present-day standards. Our grammar currently contains about 300 syntactic rules, not counting simple rules that introduce single terminals. It includes a thorough treatment of verb subcategorization and less thorough treatments of noun and adjective subcategorization. It covers major construction types: raising, control, passive, subject-aux inversion, imperatives, wh-movement (both questions and relative clauses), determiners, and comparatives. It assigns parses to 85% of a corpus of 791 sentences. See Ayuso et al. 1988 for a description of the grammar.</Paragraph>
    <Paragraph position="5"> It is clear that some generalizations are being missed.</Paragraph>
    <Paragraph position="6"> For example, to handle passive we enumerate by hand the rules that other formalisms would derive by metarule. We are certainly missing a generalization here, but we have found this crude approach quite practical -- our coverage is wide and our grammar is not hard to maintain. Nevertheless, we would like to add metarules and probably some general feature-passing principles. We hope to treat them as abbreviation mechanisms -- we would define the semantics of a general feature-passing principle by showing how a grammar using that principle can be translated into a grammar written in our original formalism. We also hope to add feature disjunction to our grammar (see Kasper 1987; Kasper and Rounds 1986).</Paragraph>
    <Paragraph position="7"> Though our formalism is limited, it has one property that is theoretically interesting: a sharp separation between the details of unification and the parsing mechanism. We proved in Section 3 that unification allows us to compute certain functions and predicates on sets of grammatical expressions -- symbolic products, unions, and so forth. In Sections 4 and 5 we assumed that these functions were available as primitives and used them to build bottom-up parsers. Nothing in Sections 4 and 5 depends on the details of unification. If we replace standard unification with another mechanism, we have only to re-prove the results of Section 3 and the correctness theorems of Sections 4 and 5 follow at once. To see that this is not a trivial result, notice that we failed to maintain this separation in Section 6. To show that one can build a complete prediction table, we had to consider the details of unification: we mentioned terms like "alphabetic variant" and "subsumption." We have presented a theory of bottom-up parsing that is general in the sense that it does not rely on a particular pattern-matching mechanism -- it applies to any mechanism for which the results of Section 3 hold. We claim that these results should hold for any reasonable pattern-matching mechanism; the reader must judge this claim by his or her own intuition.</Paragraph>
    <Paragraph position="8"> One drawback of this work is that depth-boundedness is undecidable. To prove this, show that any Turing machine can be represented as a unification grammar, and then show that an algorithm that decides depth-boundedness can also solve the halting problem.</Paragraph>
    <Paragraph position="9"> This result raises the question: is there a subset of the depth-bounded grammars that is strong enough to describe natural language, and for which membership is decidable? Recall the context-free backbone of a grammar, described in the Introduction. One can form a context-free backbone for a unification grammar by keeping only the topmost function letters in each rule. There is an algorithm to decide whether this backbone is depth-bounded, and if the backbone is depth-bounded, so is the original grammar (because the backbone admits every derivation tree that the original grammar admits).</Paragraph>
    <Paragraph position="10"> Unfortunately this class of grammars is too restricted -- it excludes rules like (major-category(n,2) → major-category(n,1)), which may well be needed in grammars for natural language.</Paragraph>
    <Paragraph position="11"> Erasing everything but the top function letter of each term is drastic. Instead, let us form a "backbone" by applying the transformation of Section 6, which eliminates cyclic function letters. We can call the resulting grammar the acyclic backbone of the original grammar.</Paragraph>
    <Paragraph position="12"> We showed in Section 6 that if we eliminate cyclic function letters, then the relation of alphabetic variance will partition the set of all terms into a finite number of equivalence classes. We used this fact to prove that the algorithm for building a weak prediction table always halts. By similar methods we can construct an algorithm that decides depth-boundedness for grammars without cyclic function letters. Then the grammars whose acyclic backbones are depth-bounded form a decidable subset of the depth-bounded grammars. One can prove that this class of grammars generates the same languages as the off-line parsable grammars. Unlike the off-line parsable grammars, they do not require atomic category symbols. A forthcoming paper will discuss these matters in detail.</Paragraph>
  </Section>
</Paper>