<?xml version="1.0" standalone="yes"?>
<Paper uid="J89-4001">
  <Title>A PARSING ALGORITHM FOR UNIFICATION GRAMMAR</Title>
  <Section position="9" start_page="230" end_page="230" type="concl">
    <SectionTitle>
7.2 THE IMPLEMENTATION
</SectionTitle>
    <Paragraph position="0"> Our implementation is a Common Lisp program on a Symbolics Lisp Machine. The algorithm as stated is recursive, but the implementation is a chart parser. It builds a matrix called &quot;rules&quot; and sets rules[i k] equal to dr(i,k), considering pairs [i k] in the same order used for the induction argument in the proof. It also builds a matrix &quot;symbols&quot; and sets symbols[i k] to the set of symbols that derive a[i k], and a matrix pred with pred[i] equal to the set of symbols that follow a[0 i].</Paragraph>
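As a rough illustration of how a matrix like "symbols" can be filled, the following minimal Python sketch handles only a context-free grammar with binary rules, which is far simpler than the unification grammar and dotted rules described here. The names fill_symbols, lexicon, and binary_rules are hypothetical, and visiting spans in order of increasing length is an assumption about the induction order, not something this section states.

```python
from collections import defaultdict


def fill_symbols(words, lexicon, binary_rules):
    """Return symbols[(i, k)]: the set of symbols deriving words[i:k].

    lexicon maps a word to the set of symbols that rewrite to it;
    binary_rules maps a pair (B, C) to the set of A with a rule A -> B C.
    Both are hypothetical inputs for this sketch.
    """
    n = len(words)
    symbols = defaultdict(set)
    # Spans of length 1 come straight from the lexicon.
    for i in range(n):
        symbols[(i, i + 1)] |= lexicon.get(words[i], set())
    # Longer spans are built from shorter ones, so every cell read
    # here has already been filled (assumed order: increasing length).
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            k = i + length
            for j in range(i + 1, k):
                for b in symbols[(i, j)]:
                    for c in symbols[(j, k)]:
                        symbols[(i, k)] |= binary_rules.get((b, c), set())
    return symbols


# Toy usage with a two-word sentence and one rule S -> NP VP.
toy_lexicon = {"dogs": {"NP"}, "bark": {"VP"}}
toy_rules = {("NP", "VP"): {"S"}}
toy_chart = fill_symbols(["dogs", "bark"], toy_lexicon, toy_rules)
```

In this toy run, the cell for the whole sentence, toy_chart[(0, 2)], contains S. Empty substrings get no cells, in line with the implementation described here.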
    <Paragraph position="1"> Currently the standard parser does not incorporate prediction. We have found that prediction reduces the size of the chart dramatically, but the cost of prediction is so great that a purely bottom-up parser runs faster.</Paragraph>
    <Paragraph position="2"> Table 1 presents the results of predicting different features on a sample of 11 sentences. It describes parsing without prediction, with prediction of categories only, with traces and categories, and finally with categories, traces, and verb form information. In each case it lists the total number of entries in the matrices &quot;rules&quot; and &quot;symbols&quot; for every sentence, and the total time to parse the 11 sentences. The reader should compare this table with the one in Shieber (1985). Shieber tried predicting subcategorization information along with categories. In our grammar there is a separate VP rule for each subcategorization frame, and this rule gives the categories of all arguments of the verb.</Paragraph>
    <Paragraph position="3"> Shieber eliminated these multiple VP rules by making the list of arguments a feature of the verb. Therefore, by predicting categories alone, we get the same information that Shieber got by predicting subcategorization information. The table shows that for our grammar, prediction reduces the chart size drastically, but it is so costly that a straight bottom-up parser runs faster than any version of prediction.</Paragraph>
    <Paragraph position="4"> The parsing tables for the present grammar are quite tractable. The largest table is the table of chain rules, which has 2,270 entries and takes under ten minutes to build. A prediction table that predicts categories, traces, and verb forms has 1,510 entries and takes six minutes to build.</Paragraph>
    <Paragraph position="5"> Computational Linguistics, Volume 15, Number 4, December 1989 231 Andrew Haas A Parsing Algorithm for Unification Grammar In the special case of a context-free grammar, our parsing program is essentially the same as the parser of Graham et al. (1980), in particular algorithm 2.2 of that paper. The only significant differences are that their chart includes entries for empty substrings, which we omit, and that we record symbols while they record only dotted rules. When running on a context-free grammar, the parser takes time proportional to the cube of the length n of the input string, because the number of symbolic products is proportional to n^3, and the time for a symbolic product is independent of the input string. This result also holds for a grammar without cyclic function letters. If there are cyclic function letters, the size of the nonterminals built by the parser depends on the length of the input, so the time for unifications and symbolic products is no longer independent of the input, and the parsing time is not bounded by n^3.</Paragraph>
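The cubic count can be checked with a small Python sketch. It assumes that one symbolic product is formed for each triple of positions (i, j, k) in which j lies strictly between i and k, with no empty substrings. That is an assumption about how the products are counted, and the function name count_products is hypothetical. Under it, the total is the binomial coefficient C(n+1, 3) = (n+1)n(n-1)/6, which grows as n^3.

```python
from math import comb


def count_products(n):
    """Count triples (i, j, k) with j strictly between i and k, k at most n."""
    total = 0
    for i in range(n):
        for k in range(i + 2, n + 1):
            # Each split point j strictly between i and k gives one product.
            total += k - i - 1
    return total


# For a nine-word sentence the count agrees with the closed form C(10, 3).
nine_word_count = count_products(9)
closed_form = comb(10, 3)
```

The two values agree, and the same check can be repeated for any n by comparing count_products(n) with comb(n + 1, 3).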
    <Paragraph position="6"> To save storage we use a simplified version of structure-sharing (Boyer and Moore 1972). Following the suggestion of Pereira and Warren (1983), we use structure-sharing only for dotted rules with symbols remaining after the dot. When the dot reaches the end of the right side of a rule, we translate the left side of the rule back to standard representation. This method guarantees that in each resolution only one resolvent is in structure-sharing representation. Instead of general resolution we are doing what the theorem-proving literature calls input resolution. This allows us to represent a substitution as a simple association list, using the function assoc to retrieve the substitutions that have been made for variables.</Paragraph>
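A substitution kept as an association list can be sketched in Python as a list of (variable, value) pairs searched from the front, in the spirit of the Lisp function assoc. This is an illustrative simplification and not the parser's structure-sharing code: the convention that variables are strings beginning with "?" and the helper names bind and walk are assumptions made for the example.

```python
def is_var(term):
    """Variables are strings starting with '?' (a convention of this sketch)."""
    return isinstance(term, str) and term.startswith("?")


def assoc(key, alist):
    """Return the first (key, value) pair whose key matches, or None."""
    for pair in alist:
        if pair[0] == key:
            return pair
    return None


def bind(var, value, alist):
    """Return a new association list with var bound to value."""
    return [(var, value)] + alist


def walk(term, alist):
    """Follow bindings until a non-variable or an unbound variable is reached."""
    while is_var(term):
        pair = assoc(term, alist)
        if pair is None:
            return term
        term = pair[1]
    return term


# Toy usage: ?x is bound to ?y, and ?y is bound to the atom np.
subst = bind("?y", "np", bind("?x", "?y", []))
resolved = walk("?x", subst)
```

Each lookup is a linear scan, so its cost grows with the number of bindings made so far. That is cheap as long as the list stays short.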
    <Paragraph position="7"> Pereira (1985) describes a more sophisticated version of structure-sharing. This method has two advantages over our version. First, the time to retrieve a substitution is O(log n), where n is the length of the derivation, compared to O(n) for Boyer-Moore. Second, only symbols that derive the empty string need to be translated from structure-sharing form to the standard representation, and this saves storage. The first advantage may not be important, for two reasons. By using a single assoc to retrieve a substitution, we reduce the constant factor in O(n). Also, by eliminating the structure-sharing each time the dot reaches the end of a rule, we keep our derivations short: n is no more than the length of the right side of the longest rule. The second advantage of Pereira's method is more important, since our current parser uses a lot of storage.</Paragraph>
    <Paragraph position="8"> The other optimizations are fairly obvious. As usual we skip the occur check in our unifications (as long as there are no cyclic sorts, this is guaranteed to be safe). In each symbolic product, one set is indexed by the topmost function letter of the term to be matched, which saves a good number of failed unifications. These simple techniques gave us adequate performance for some time, but as the grammar grew the parser slowed down, and we decided to rewrite the program in C. This version, running on a Sun 4, is much more efficient. It parses a corpus of 790 sentences, with an average length of nine words, in half an hour.</Paragraph>
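The two optimizations named in this paragraph can be illustrated with a minimal Python sketch: unification that skips the occur check, and an index keyed by the topmost function letter so that terms with a different functor are never tried. The term encoding (tuples with the functor first, variables as strings starting with "?") and the names build_index and matches are assumptions for the example, not the representation used in the Lisp or C programs.

```python
from collections import defaultdict


def is_var(t):
    return isinstance(t, str) and t.startswith("?")


def walk(t, subst):
    """Follow variable bindings in the dict subst."""
    while is_var(t) and t in subst:
        t = subst[t]
    return t


def unify(a, b, subst):
    """Unify two terms with no occur check; return a dict or None."""
    a, b = walk(a, subst), walk(b, subst)
    if a == b:
        return subst
    if is_var(a):
        return {**subst, a: b}
    if is_var(b):
        return {**subst, b: a}
    if isinstance(a, tuple) and isinstance(b, tuple):
        if a[0] != b[0] or len(a) != len(b):
            return None
        for x, y in zip(a[1:], b[1:]):
            subst = unify(x, y, subst)
            if subst is None:
                return None
        return subst
    return None


def build_index(terms):
    """Group compound terms by their topmost function letter."""
    index = defaultdict(list)
    for t in terms:
        index[t[0]].append(t)
    return index


def matches(query, index):
    """Unify query only against terms that share its topmost functor."""
    found = []
    for candidate in index.get(query[0], []):
        s = unify(query, candidate, {})
        if s is not None:
            found.append((candidate, s))
    return found


toy_index = build_index([("np", "sing"), ("np", "plur"), ("vp", "sing")])
np_matches = matches(("np", "?n"), toy_index)
```

Here the query with functor np is unified against only the two np terms. The vp term is skipped without any unification attempt, which is the saving the index is meant to provide.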
  </Section>
</Paper>