File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/87/e87-1037_metho.xml

Size: 22,485 bytes

Last Modified: 2025-10-06 14:12:00

<?xml version="1.0" standalone="yes"?>
<Paper uid="E87-1037">
  <Title>A Comparison of Rule-Invocation Strategies in Context-Free Chart Parsing</Title>
  <Section position="3" start_page="226" end_page="229" type="metho">
    <SectionTitle>
2 A Survey of
Rule-Invocation Strategies
</SectionTitle>
    <Paragraph position="0"> This section surveys the fundamental rule-invocation strategies in context-flee chart parsing. 3 In a chart-parsing framework, different rule-invocation strategies correspond to different conditions for and ways of predicting new edges 4. This section will therefore in effect constitute a survey of different methods for predicting new edges.</Paragraph>
    <Section position="1" start_page="226" end_page="227" type="sub_section">
      <SectionTitle>
2.1 Top-Down Strategies
</SectionTitle>
      <Paragraph position="0"> The principle of top-down parsing is to use the rules of the grammar to generate a sentence that matches the one being analyzed.</Paragraph>
      <Paragraph position="1">  A strategy for top-down chart parsing 5 is given below. Assume a context-free grammar G. Also, we make the usual assumption that G is cycle-free, i.e., it does not contain derivations of the form A1 --* A~,</Paragraph>
      <Paragraph position="3"> Whenever an active edge is added to the chart, if its first required constituent is C, then add an empty active C edge for every rule in G which expands C. 7 This principle will apply to itself recursively, ensuring that all subsidiary active edges also get produced. null  Realistic natural-language grammars are likely to be highly branching. A weak point of the ~normal = top-down strategy above will then be the excessive number of predictions typically made: in the beginning of a phrase new edges will be introduced for all constituents, and constituents within those constituents, that the phrase can possibly start with. One way of limiting the number of predictions is by making the strategy %elective = (Griffiths aI assume a basic familiarity with chart parsing. For an excellent introduction, see Thompson and Ritchie (1984).  going into an infinite loop, this strategy needs a redundancy check which prevents more than one identical active edge from being added to the chart.</Paragraph>
      <Paragraph position="4">  and Petrick 1965:291): by looking at the category/categories of the next word, it is possible to rule out some proposed edges that are known not to combine with the corresponding inactive edge(s). Given that top-down chart parsing starts with a scanning phase, the adoption of this filter is straightforward. The strategy makes use of a reachability relation where A\]~B holds if there exists some derivation from A to B such that B is the first element in a string dominated by A. Given preterminal look-ahead symbol(s) py corresponding to the next word, the processor can then ask if the first required constituent of a predicted active edge (say, C) can somehow start with (some) p~.. In practice, the relation is implemented as a precompiled table. Determining if holds can then be made very fast and in constant time. (Cf. Pratt 1975:424.) The strategy presented here corresponds to Kay's adirected top-down&amp;quot; strategy (Kay 1982:338) and can be specified in the following manner.</Paragraph>
      <Paragraph position="5"> Strategy 2 {TD0) Let r(X} be the first required constituent of the (active) edge X. Let u be the vertex to which the active edge about to be proposed extends.</Paragraph>
      <Paragraph position="6"> Let Pl,..., Pn be the preterminal categories of the edges extending from v that correspond to the next word. -- Whenever an active edge . is added to the chart, if its first required constituent is C, then for every rule in G which expands C add an empty active C edge if for some \] r(C) = pj or r(O)~pj.</Paragraph>
    </Section>
    <Section position="2" start_page="227" end_page="229" type="sub_section">
      <SectionTitle>
2.2 Bottom-Up Strategies
</SectionTitle>
      <Paragraph position="0"> The principle of bottom-up parsing is to reduce a sequence of phrases whose types match the right-hand side of a grammar rule to a phrase of the type of the left-hand side of the rule. To make a reduction possible, all the right-hand-side phrases have to be present. This can be ensured by matching from right to left in the right-hand side of the grammar rule; this is for example the case with the Cocke--Kasami-Younger algorithm (Aho and Ullman 1972).</Paragraph>
      <Paragraph position="1"> A problem with this approach is that the analysis of the first part of a phrase has no influence on the analysis of the latter parts until the results from them are combined. This problem can be met by adopting left-corner parsing.</Paragraph>
      <Paragraph position="2">  Left-corner parsing is a bottom-up technique where the right-hand-side symbols of the rules are matched from left to right, s Once the left-corner symbol has been found, the grammar rule can be used to predict what may come next.</Paragraph>
      <Paragraph position="3"> A basic strategy for left-corner chart parsing is given below.</Paragraph>
      <Paragraph position="4"> Strategy 3 g (LC) Whenever an inactive edge is added to the chart, if its category is T, then for every rule in G with T as left-corner symbol add an empty active edge. 1deg Note that this strategy will make aminimal&amp;quot; predictions, i.e., it will only predict the nezt higher-level phrases which a given constituent can begin.</Paragraph>
      <Paragraph position="5">  Kilbury (1985) presents a modified left-corner strategy. Basically it amounts to this: instead of predicthag empty active edges, edges which subsume the inactive edge that provoked the new edge are predicted. A predicted new edge may then be either active or inactive depending on the contents of the inactive edge and on what is required by the new edge.</Paragraph>
      <Paragraph position="6"> This strategy has two clear advantages: First, it saves many edges compared to the anormal&amp;quot; left corner because it never produces empty active edges. Secondly (and not pointed out by Kilbury), the usual redundancy check is not needed here since the strategy itself avoids the risk of predicting more than one identical edge. The reason for this is that a predicted edge always subsumes the triggering (inactive) edge. Since the triggering edge is guaranteed to be unique, the subsuming edge will also be unique. By virtue of this, Kilbury's prediction strategy is actually the simplest of all the strategies considered here.</Paragraph>
      <Paragraph position="7"> The price one has to pay for this is that rules with empty-string productions (or e-productions, i.e. rules of the form A -* e), cannot be handled. This might look like a serious limitation since most current linguistic theories (e.g., LFG, GPSG) make explicit use of e-productions, typically for the handling of gaps. On the other hand, context-free grammars can be converted into grammars without e-productions (Aho and Ullman 1972:150).</Paragraph>
      <Paragraph position="8"> In practice however, e-productions can be handled in various ways which circumvent the problem. For example, Karttunen's D-PATR system SThe left corner of a rule is the leftmost symbol of its right-hand side.</Paragraph>
      <Paragraph position="9"> degThis formulation is again equivalent to the one in Thompson (1981:4). Thompson however refers to it a8 &amp;quot;bottom-up&amp;quot;. *degIn this case, left-recursive rules will not lead to infinite loops. The redundancy check is still needed to prevent superfluotm analyses from being generated, though.</Paragraph>
      <Paragraph position="10">  does not allow empty productions. Instead, it takes care of fillers and gaps through a ~threading&amp;quot; technique (Karttunen 1986:77). Indeed, the system has been successfully used for writing LFG-style grammars (e.g., Dyvik 1986).</Paragraph>
      <Paragraph position="11"> Kilbury's left-corner strategy can be specified in the following manner.</Paragraph>
      <Paragraph position="12"> Strategy 4 (LCK) Whenever an inactive edge is added to the chart, if its category is T, then for every rule in G with T as left-corner symbol add an edge that subsumes the T edge.</Paragraph>
      <Paragraph position="13">  As often pointed out, bottom-up and left-corner strategies encounter problems with sets of rules like A ~ BC and A --* C (right common factors). For example, assuming standard grammar rules, when parsing the phrase athe birds fly&amp;quot; an unwanted sentence ~birds fly&amp;quot; will be discovered.</Paragraph>
      <Paragraph position="14"> This problem can be met by adopting top-dowN j~tering, a technique which can be seen as the dual of the selective top-down strategy. Descriptions of top-down filtering are given for example in Kay (1982) (~directed bottom-up parsing&amp;quot;) and in Slocum (1981:2). Also, the aoracle&amp;quot; used by Pratt (1975:424) is a top-down filter.</Paragraph>
      <Paragraph position="15"> Essentially top-down filtering is like running a top-down parser in parallel with a bottom-up parser. The (simulated} top-down parser rejects some of the edges that the bottom-up parser proposes, vis. those that the former would not discover. The additional question that the top-down filter asks is then: is there any place in a higher-level structure for the phrase about to be built by the bottom-up parser? On the chart, this corresponds to asking if any (active) edge ending in the starting vertex of the proposed edge needs this this kind of edge, directly or indirectly. The procedure for computing the answer to this again makes use of the reachability relation (cf. section 2.1.2). 11 Adding top-down filtering to the LC strategy above produces the following strategy.</Paragraph>
      <Paragraph position="16"> Strategy 5 (Let) Let v be the vertex from which the triggering edge T extends. Let At, ..., Am be the active edges incident to v, and let r(A~) be their l*Kilbury (1985:10) actually makes use of a similar relation encoding the left-branchings of the grammar (the &amp;quot;firstrelation&amp;quot;), but he uses it only for speeding up grammar-rule access (by indexing rules from left corners) and not for the purpose of filtering out unwanted edges.</Paragraph>
      <Paragraph position="17"> respective first required constituents. -- Whenever an inactive edge is added to the chart, if its category is T, then for every rule C in G with T as left-corner symbol add an empty active C edge if for some i r(A,) = C or r(A,)~C.</Paragraph>
      <Paragraph position="18"> Analogously, adding top-down filtering to Kilbury's strategy LCK results in the following.</Paragraph>
      <Paragraph position="19"> Strategy 6 (LCKt) (Same preconditions as above.) -- Whenever an inactive edge is added to the chart, if its category is T, then for every rule C in G with T as left-corner symbol add a C edge subsuming the T edge if for some i r(A,) = C or r(A~)~C.</Paragraph>
      <Paragraph position="20"> One of the advantages with chart parsing is direction independence: the words of a sentence do not have to be parsed strictly from left to right but can be parsed in any order. Although this is still possible using top-down filtering, processing becomes somewhat less straightforward (cf. Kay 1982:352). The simplest way of meeting this problem, and also the solution adopted here, is to presuppose left-to-right parsing.</Paragraph>
      <Paragraph position="21">  By again adopting a kind of lookahead and by utilizing the reachability relation )~, it is possible to limit the number of edges built even further. This lookahead can be realized by performing a dictionary lookup of the words before actually building the corresponding inactive edges, storing the results in a table. Being analogous to the filter used in the directed top-down strategy, this filter makes sure that a predicted edge can somehow be extended given the category/categories of the next word. Note that this filter only affects active predicted edges.</Paragraph>
      <Paragraph position="22"> Adding selectivity to Kilbury's strategy LCK results in the following.</Paragraph>
      <Paragraph position="23">  Let pl,..., p,, be the categories of the word corresponding to the preterminal edges extending from the vertex to which the T edge is incident.</Paragraph>
      <Paragraph position="24"> Let r(C) be defined as above. -- Whenever an inactive edge is added to the chart, if its category is T, then for every rule C in G with T as left-corner symbol add a C edge subsuming the T edge if for some \] r(C) = py or r(C)~py.</Paragraph>
      <Paragraph position="25">  The final step is to combine the two previous strategies to arrive at a maximally directed version of Kil- null bury's strategy. Again, left-to-right processing is presupposed.</Paragraph>
      <Paragraph position="26"> Strategy 8 (LCK,t) Let r(A,), r(C), and pj be defined analogously to the previous. -- Whenever an inactive edge is added to the chart, if its category is T, then for every rule C in G with T as left-corner symbol add a C edge subsuming the T edge if for some i r(A,) = C or r(A,)~C and for some i r(C) = py or r(C)\]~pj.</Paragraph>
    </Section>
  </Section>
  <Section position="4" start_page="229" end_page="5278" type="metho">
    <SectionTitle>
3 Empirical Results
</SectionTitle>
    <Paragraph position="0"> In order to assess the practical behaviour of the strategies discussed above, a test bench was developed where it was made possible in effect to switch between eight different parsers corresponding to the eight strategies above, and also between different grammars, dictionaries, and sentence sets.</Paragraph>
    <Paragraph position="1"> Several experiments were conducted along the way. The test grammars used were first partly based on a Swedish D-PATR grammar by Merkel (1986).</Paragraph>
    <Paragraph position="2"> Later on, I decided to use (some of) the data compiled by Tomita (1986) for the testings of his extended LR parser.</Paragraph>
    <Paragraph position="3"> This section presents the results of the latter experiments. null</Paragraph>
    <Section position="1" start_page="229" end_page="229" type="sub_section">
      <SectionTitle>
3.1 Grammars and Sentence Sets
</SectionTitle>
      <Paragraph position="0"> The three grammars and two sentence sets used in these experiments have been obtained from Masaru Tomita and can be found in his book (Tomita 1986).</Paragraph>
      <Paragraph position="1"> Grammars I and II are toy grammars consisting of 8 and 43 rules, respectively. Grammar III with 224 rules is constructed to fit sentence set I which is a collection of 40 sentences collected from authentic texts. (Grammar IV with 394 rules was not used here.) Because grammar Ill contains one empty production, not all sentences of sentence set I will be correctly parsed by Kilbury's algorithm. For the purpose of these experiments, I collected 21 sentences out of the sentence set. This reduced set will henceforth be referred to as sentence set I. 12 The sentences in this set vary in length between 1 and 27 words.</Paragraph>
      <Paragraph position="2"> Sentence set II was made systematically from the schema noun verb det noun (prep det noun) &amp;quot;-z.</Paragraph>
      <Paragraph position="3"> 12The sentences in the set are 1-3, 9, 13-15, 19-25, 29, and 35-40 (cf. Tomita 1986:152).</Paragraph>
      <Paragraph position="4"> An example of a sentence with this structure is ~I saw the man in the park with a telescope...'. In these experiments n = 1, ..., 7 was used.</Paragraph>
      <Paragraph position="5"> The dictionary was constructed from the category sequences given by Tomita together with the sentences (Tomita 1986 pp. 185-189).</Paragraph>
    </Section>
    <Section position="2" start_page="229" end_page="229" type="sub_section">
      <SectionTitle>
3.2 Efficiency Measures
</SectionTitle>
      <Paragraph position="0"> A reasonable efficiency measure in chart parsing is the number of edges produced. The motivation for this is that the working of a chart parser is tightly centered around the production and manipulation of edges, and that much of its work can somehow be reduced to this. For example, a measure of the amount of work done at each vertex by the procedure which implements ~the fundamental rule&amp;quot; (Thompson 1981:2) can be expressed as the product of the number of incoming active edges and the number of outgoing inactive edges. In addition, the number of chart edges produced is a measure which is independent of implementation and machine.</Paragraph>
      <Paragraph position="1"> On the other hand, the number of edges does not give any indication of the overhead costs involved in various strategies. Hence I also provide figures of the parsing times, albeit with a warning for taking them too seriously, zs The experiments were run on Xerox 1186 Lisp machines. The time measures were obtained using the Interlisp-D function TIMEALL. The time figures below give the CPU time in seconds (garbage-collection time and swapping time not included; the latter was however almost non-existent).</Paragraph>
    </Section>
    <Section position="3" start_page="229" end_page="5278" type="sub_section">
      <SectionTitle>
3.3 Experiments
</SectionTitle>
      <Paragraph position="0"> This section presents the results of the experiments.</Paragraph>
      <Paragraph position="1"> In the tables, the fourth column gives the accumulated number of edges over the sentence set. The second and third columns give the corresponding numbers of active and inactive edges, respectively. The fifth column gives the accumulated CPU time in seconds. The last column gives the rank of the strategies with respect to the number of edges produced and, in parentheses, with respect to time consumed (ff differing from the former).</Paragraph>
      <Paragraph position="2"> Table 1 shows the results of the first experiment: running grammar I (8 rules) with sentence set II (7 sentences). There were 625 parses for every strategy (1, 2, 5, 14, 42, 132, and 429).</Paragraph>
      <Paragraph position="3"> iSThe parsers are experimental in character and were not coded for maximal efficiency. For example, edges at a given vertex are being searched linearly. On the other hand, grammar rules (llke reachability relations) are indexed through pre-compiled hashtables.</Paragraph>
      <Paragraph position="4">  Table 2 shows the results of the second experiment: grammar II with sentence set II. This grammar handles PP attachment in a way different from grammars I and III which leads to fewer parses: 322 for every strategy.</Paragraph>
      <Paragraph position="5"> Table 3 shows the results of the third experiment: grammar III (224 rules) with sentence set II. Again, there were 625 parses for every strategy.</Paragraph>
      <Paragraph position="6"> Table 4 shows the results of the fourth experiment: running grammar III with sentence set I (21 sentences}. There were 885 parses for every strategy.</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="5278" end_page="5278" type="metho">
    <SectionTitle>
4 Discussion
</SectionTitle>
    <Paragraph position="0"> This section summarizes and discusses the results of the experiments.</Paragraph>
    <Paragraph position="1"> As for the three undirected methods, and with respect to the number of edges produced, the top-down (Earley-style) strategy performs best while the standard left-corner strategy is the worst alternative. Kilbury's strategy, by saving active looping edges, produces somewhat fewer edges than the standard left-corner strategy. More apparent is its time advantage, due to the basic simplicity of the strategy. For example, it outperforms the top-down strategy in experiments 2 and 3.</Paragraph>
    <Paragraph position="2"> Results like those above are of course strongly grammar dependent. If, for example, the branching factor of the grammar increases, top-down overpredictions will soon dominate superfluous bottom-up substring generation. This was clearly seen in some of the early experiments not showed here. In cases like this, bottom-up parsing becomes advantageous and, in particular, Kilbury's strategy will outperform the two others.</Paragraph>
    <Paragraph position="3"> Thus, although Wang (1985:7) seems to be right in claiming that ~... Earley's algorithm is better than Kilbury's in general.&amp;quot;, in practice this can often be different (as Wang himself recognizes). Incidentally, Wang's own example (:4), aimed at showing that Kilbury's algorithm handles right recursion worse than Earley's algorithm, illustrates this: Assume a grammar with rules S --* Ae, A --* aA, A -* b and a sentence aa a a a b c&amp;quot; to be parsed. Here a bottom-up parser such as Kilbury's will obviously do some useless work in predicting several unwanted S edges. But even so the top-down overpredictions will actually dominate: the Earley-style strategy gives 16 active and 12 inactive edges, totailing 28 edges, whereas Kilbury's strategy gives 9 and 16, respectively, totalling 25 edges.</Paragraph>
    <Paragraph position="4"> The directed methods -- those based on selectivity or top-down filtering -- reduce the number of edges very significantly. The selectivity filter here  turned out to be much more time efficient, though.</Paragraph>
    <Paragraph position="5"> Selectivity testing is also basically a simple operation, seldom involving more than a few lookups (depending on the degree of lexical ambiguity).</Paragraph>
    <Paragraph position="6"> Paradoxically, the effect of top-down filtering was to degrade time performance as the grammars grew larger. To a large extent this is likely to have been caused by implementation idiosyncrasies: active edges incident to a vertex were searched linearly; when the number of edges increases, this gets very costly. After all, top-down filtering is generally considered beneficial (e.g. Slocum 1981:4).</Paragraph>
    <Paragraph position="7"> The maximally directed strategy m Kilbury's algorithm with selectivity and top-down filtering remained the most efficient one throughout all the experiments, both with respect to edges produced and time consumed (but more so with respect to the former). Top-down filtering did not degrade time performance quite as much in this case, presumably because of the great number of active edges cut off by the selectivity filter.</Paragraph>
    <Paragraph position="8"> Finally, it should be mentioned that bottom-up parsing enjoys a special advantage not shown here, namely in being able to detect ungrammatical sentences much more effectively than top-down methods (cf. Kay 1982:342).</Paragraph>
  </Section>
class="xml-element"></Paper>