<?xml version="1.0" standalone="yes"?>
<Paper uid="C92-1058">
  <Title>The Primordial Soup Algorithm: A Systematic Approach to the Specification of Parallel Parsers</Title>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
2 The Primordial Soup
</SectionTitle>
    <Paragraph position="0"> The Primordial Soup Algorithm will be introduced after some remarks about notation and parsers. We show that the algorithm is a generalization of well-known parsing strategies.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.1 Preliminaries
</SectionTitle>
      <Paragraph position="0"> We use the following notational conventions.</Paragraph>
      <Paragraph position="1"> Nonterminals are denoted by $A, B, \ldots \in N$; terminals are denoted by $a, b, \ldots \in \Sigma$. We write $V$ for $N \cup \Sigma$, with typical elements $X, Y, \ldots$. Terminal strings are denoted by $s, t, u, v, w, \ldots \in \Sigma^*$,</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="0" end_page="0" type="metho">
    <SectionTitle>
ACTES DE COLING-92, NANTES, 23-28 AOÛT 1992 373 PROC. OF COLING-92, NANTES, AUG. 23-28, 1992
</SectionTitle>
    <Paragraph position="0"> arbitrary strings by $\alpha, \beta, \ldots \in V^*$.</Paragraph>
    <Paragraph position="1"> Let $G = (N, \Sigma, P, S)$ be a context-free grammar. Let $w = a_1 \ldots a_n \in \Sigma^*$ be the sentence. While executing an arbitrary parsing algorithm, we maintain a set of trees that might be subtrees of a parse for $w$. Let $\mathcal{F}_{bt}$ be the class of finitely branching trees, in which all nodes have a label from some universal class of symbols. Let $\mathcal{T}(G) \subset \mathcal{F}_{bt}$ be the class of trees that can be constructed from $P$; i.e., if some node is labelled $A$ and its children $X_1, \ldots, X_k$, then $A \rightarrow X_1 \ldots X_k \in P$. We will usually write $\mathcal{T}$ for $\mathcal{T}(G)$; individual trees are denoted $\rho, \sigma, \tau, \ldots \in \mathcal{T}$.</Paragraph>
    <Paragraph position="2"> We write $\mathit{root}(\tau)$ for the label of the root of a tree $\tau$. The yield of a tree $\tau$, denoted by $\mathit{yield}(\tau)$, is defined as the concatenation of the labels of the leaves. Clearly, $\mathit{yield}(\tau) \in V^*$. Note that leaves labelled $\varepsilon$ (generated by empty productions) are not visible in the yield, as $\varepsilon$ disappears in concatenation. A tree $\tau$ is a parse tree for $w$ if $\mathit{root}(\tau) = S$ and $\mathit{yield}(\tau) = w$. For arbitrary $w \in \Sigma^*$ a subclass $\mathcal{T}_w \subset \mathcal{T}$ is defined that contains trees $\tau$ with $\mathit{yield}(\tau) = a_{i+1} \ldots a_j$ for some substring $a_{i+1} \ldots a_j$ of $w$. $\mathcal{T}_w$ is called the set of subparses of $w$. The root of a subparse need not be $S$; it can be any nonterminal $A \in N$.</Paragraph>
    <Paragraph position="3"> As a convenient notation for trees we write $\tau = \langle A \rightsquigarrow \alpha \rangle$ for an arbitrary tree with $A = \mathit{root}(\tau)$ and $\alpha = \mathit{yield}(\tau)$. In general $\langle A \rightsquigarrow \alpha \rangle$ is not uniquely determined, as every derivation $A \Rightarrow^+ \alpha$ defines a tree $\langle A \rightsquigarrow \alpha \rangle$. If we want to stress that a derivation $A \Rightarrow^+ \alpha\beta\gamma$ can be obtained as $A \Rightarrow^+ \alpha B \gamma \Rightarrow^+ \alpha\beta\gamma$, we write $\langle A \rightsquigarrow \alpha \langle B \rightsquigarrow \beta \rangle \gamma \rangle$ for the tree $\langle A \rightsquigarrow \alpha\beta\gamma \rangle$. Thus the tree notation is generalized into $\langle A \rightsquigarrow \xi_1 \ldots \xi_k \rangle$, where each $\xi_i$ is either a leaf or a subtree. This simple tree notation is extended with the following conventions: * A tree $\langle A \rightsquigarrow \alpha \rangle$ corresponding to a single-step derivation $A \Rightarrow \alpha$ is also denoted as $\langle A \rightarrow \alpha \rangle$.</Paragraph>
    <Paragraph position="4"> This corresponds to a production $A \rightarrow \alpha \in P$.</Paragraph>
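The tree vocabulary above can be made concrete with a small Python sketch. It is an illustration under our own assumptions, not the authors' code: `Tree` is a hypothetical immutable node type, the empty string stands for the $\varepsilon$ label, and `yield_of` (so named because `yield` is a Python keyword), `is_parse_tree` and `is_subparse` mirror $\mathit{yield}$, parse trees and subparses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tree:
    """A node with a label and a (possibly empty) tuple of subtrees."""
    label: str
    children: tuple = ()


EPSILON = ""  # hypothetical label for the leaf of an empty production


def root(t):
    return t.label


def yield_of(t):
    """Concatenate the leaf labels left to right; epsilon leaves disappear."""
    if not t.children:
        return () if t.label == EPSILON else (t.label,)
    return tuple(s for c in t.children for s in yield_of(c))


def is_parse_tree(t, start, w):
    return root(t) == start and yield_of(t) == tuple(w)


def is_subparse(t, w):
    """True if yield(t) equals some substring a_{i+1} ... a_j of w."""
    y, w = yield_of(t), tuple(w)
    return any(w[i:i + len(y)] == y for i in range(len(w) - len(y) + 1))


# Example: S over NP(the bird) and VP(flies).
np = Tree("NP", (Tree("the"), Tree("bird")))
parse = Tree("S", (np, Tree("VP", (Tree("flies"),))))
```

Here `parse` is a parse tree for "the bird flies", while `np` is only a subparse of it.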
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.2 Various bottom-up parsers
</SectionTitle>
      <Paragraph position="0"> Our basic approach results from a generalization of various bottom-up parsing algorithms. The oldest and perhaps best known of these is the Cocke-Younger-Kasami (CYK) algorithm [You].</Paragraph>
      <Paragraph position="1"> It requires the grammar to be in Chomsky Normal Form, i.e., productions have the form $A \rightarrow BC$ or $A \rightarrow a$. If we have trees $\tau_1 = \langle B \rightsquigarrow a_{i+1} \ldots a_k \rangle$ and $\tau_2 = \langle C \rightsquigarrow a_{k+1} \ldots a_j \rangle$ and if there is a production $A \rightarrow BC \in P$, we can construct a larger tree $\langle A \rightsquigarrow a_{i+1} \ldots a_j \rangle$ from $\tau_1$ and $\tau_2$. This can be continued until $\langle S \rightsquigarrow a_1 \ldots a_n \rangle$ has been derived, or no new trees can be constructed.</Paragraph>
      <Paragraph position="2"> The CYK algorithm is usually described as a recognizer, rather than a parser. A recognition algorithm collects a set of items that denote the existence of trees, rather than trees themselves.</Paragraph>
      <Paragraph position="3"> If it is deduced that $A \Rightarrow^* a_{i+1} \ldots a_j$ (without having constructed a corresponding tree), this will be denoted by an item $[A \rightsquigarrow a_{i+1} \ldots a_j]$. In general, an item $[A \rightsquigarrow \alpha]$ denotes the existence of one or more trees $\langle A \rightsquigarrow \alpha \rangle$. The string $w$ is grammatically correct if and only if an item $[S \rightsquigarrow w]$ can be recognized.</Paragraph>
      <Paragraph position="4"> The CYK algorithm recognizes items of the form $[A \rightsquigarrow a_{i+1} \ldots a_j]$. For notational convenience, such an item is usually written as $[i, A, j]$. Thus we get the conventional description of CYK recognition: an item $[i, A, j]$ can be recognized iff $[i, B, k]$ and $[k, C, j]$ have been recognized previously for some $i &lt; k &lt; j$ and $A \rightarrow BC \in P$.</Paragraph>
      <Paragraph position="5"> Several recognition and parsing algorithms deal with arbitrary context-free grammars along the same lines as CYK, involving some more technicalities for handling productions of arbitrary length, including $\varepsilon$-productions. For example, a bottom-up variant of Earley's recognition algorithm [Ear, GHR] recognizes items of the form $[i, A \rightarrow \alpha \bullet \beta, j]$ denoting the fact that $\alpha \Rightarrow^* a_{i+1} \ldots a_j$. That is, the first part of a production has been recognized. If $\beta = \varepsilon$, i.e., the item is of the form $[i, A \rightarrow \alpha \bullet, j]$, the entire production has been recognized; such an item denotes the existence of a tree $\langle A \rightsquigarrow a_{i+1} \ldots a_j \rangle$. We call this algorithm Bottom-Up Earley (BUE) in the sequel; the top-down filter of Earley's algorithm has been deleted so as to allow parallel bottom-up, rather than left-to-right, processing of the string.</Paragraph>
      <Paragraph position="6"> Still, BUE recognizes each individual nonterminal in a left-to-right manner, for which there is no a priori reason. De Vreught and Honig [dVH] describe a similar, more general algorithm (which we abbreviate VH), using double-dotted items $[i, A \rightarrow \alpha \bullet \beta \bullet \gamma, j]$ where $\beta \Rightarrow^* a_{i+1} \ldots a_j$. In this case $\beta$ corresponds to a part of the string that has been recognized, whereas $\alpha$ and $\gamma$ still need to be recognized.</Paragraph>
      <Paragraph position="7"> Both BUE and VH can easily be extended to parsing algorithms, producing partial parse trees of the form $\langle A \rightarrow \langle \alpha \rightsquigarrow a_{i+1} \ldots a_j \rangle \beta \rangle$ and $\langle A \rightarrow \alpha \langle \beta \rightsquigarrow a_{i+1} \ldots a_j \rangle \gamma \rangle$, respectively.</Paragraph>
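A minimal sketch of the item-based CYK recognizer just described follows. It assumes a grammar in Chomsky Normal Form given as a Python dictionary; that format and the names `cyk_recognize` and `GRAMMAR` are illustrative. Items are triples `(i, A, j)`, and the base case for productions $A \rightarrow a$, which the description leaves implicit, is added first.

```python
def cyk_recognize(grammar, start, words):
    """Item-based CYK recognition (sketch).

    grammar maps each nonterminal to a list of right-hand sides in Chomsky
    Normal Form: a pair (B, C) of nonterminals or a 1-tuple (a,) holding a
    terminal. An item (i, A, j) records that A derives words[i:j], i.e. the
    item [i, A, j] for a_{i+1} ... a_j.
    """
    n = len(words)
    items = set()
    # Base case: [i-1, A, i] whenever A -> a_i is a production.
    for i, a in enumerate(words):
        for lhs, rhss in grammar.items():
            if (a,) in rhss:
                items.add((i, lhs, i + 1))
    # [i, B, k] and [k, C, j] give [i, A, j] for A -> BC; shorter spans first.
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):  # i, k, j strictly increasing
                for lhs, rhss in grammar.items():
                    for rhs in rhss:
                        if (len(rhs) == 2 and (i, rhs[0], k) in items
                                and (k, rhs[1], j) in items):
                            items.add((i, lhs, j))
    return (0, start, n) in items


GRAMMAR = {"S": [("NP", "VP")], "NP": [("Det", "N")],
           "Det": [("the",)], "N": [("bird",)], "VP": [("flies",)]}
```

Processing spans by increasing length guarantees that both sub-items of a split are available before they are combined.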
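The sketch below of BUE recognition is one possible reading of the item rules, not a definitive implementation. The tuple encoding of $[i, A \rightarrow \alpha \bullet \beta, j]$, the convention that terminals are exactly the symbols without productions, and the name `bue_recognize` are assumptions. Without the top-down filter, every production is started at every position, and $\varepsilon$-productions are complete from the start.

```python
def bue_recognize(grammar, start, words):
    """Bottom-Up Earley recognition without the top-down filter (sketch).

    An item (i, A, rhs, d, j) stands for [i, A -> alpha . beta, j], where
    alpha = rhs[:d], beta = rhs[d:] and alpha derives words[i:j]. A symbol is
    a terminal iff it is not a key of grammar.
    """
    n = len(words)
    # No top-down filter: every production may start at every position.
    items = {(i, a, rhs, 0, i)
             for i in range(n + 1)
             for a, rhss in grammar.items() for rhs in rhss}
    while True:
        done = {(i, a, j) for (i, a, rhs, d, j) in items if d == len(rhs)}
        new = set()
        for (i, a, rhs, d, j) in items:
            if d == len(rhs):
                continue
            x = rhs[d]
            if x not in grammar:
                # Scan: the terminal x must be the next word a_{j+1}.
                if n > j and words[j] == x:
                    new.add((i, a, rhs, d + 1, j + 1))
            else:
                # Complete: move the dot over a finished x spanning words[j:k].
                for (j2, b, k) in done:
                    if j2 == j and b == x:
                        new.add((i, a, rhs, d + 1, k))
        if new.issubset(items):
            return (0, start, n) in done
        items |= new


EPS_GRAMMAR = {"S": [("A", "B")], "A": [("a",), ()], "B": [("b",)]}
```

Items are recomputed until a round adds nothing new; a practical version would use an agenda of fresh items instead of rescanning the whole set.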
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.3 The Primordial Soup Algorithm
</SectionTitle>
      <Paragraph position="0"> VH is by no means the most general algorithm.</Paragraph>
      <Paragraph position="1"> As the ultimate generalization we can allow any tree in $\mathcal{T}$. The root is a nonterminal and the leaves can be any symbol in $V$; a tree may or may not be part of a parse for $w$.</Paragraph>
      <Paragraph position="2"> Initially we start with elementary trees that correspond to the productions in our grammar.</Paragraph>
      <Paragraph position="3"> New trees can be added by merging (copies of) existing trees that agree on their common parts. This can be seen as a kind of unification process on parse trees. The string is parsed when a tree $\tau = \langle S \rightsquigarrow a_1 \ldots a_n \rangle$ is produced; the algorithm terminates when no new trees can be added. Metaphorically speaking, one can think of the initial set of trees as a primordial soup in which small structures react with each other, creating ever larger and more complicated structures. We therefore call it the Primordial Soup Algorithm. Superficially, it may resemble the unification space of Vosse and Kempen [VK], who think of molecules floating in a test-tube and entering into chemical bonds with other molecules.</Paragraph>
      <Paragraph position="4"> The paradigms are different, however: in the primordial soup, unlike the test-tube, raw material abounds and multiple copies of any structure can be created.</Paragraph>
      <Paragraph position="5"> The most general version of the Primordial Soup Algorithm, which allows trees to be combined by unification of arbitrary overlapping parts, is a formalism in which a wide variety of parsing algorithms can be specified with great ease; it is presented in section 3. Before that, we first formalize a slightly limited, but somewhat easier, version of the Primordial Soup Algorithm.</Paragraph>
      <Paragraph position="6"> The algorithm starts off with an initial set $\mathcal{S}$ of recognized trees, consisting of trees corresponding to the productions in our grammar. New trees can be added to $\mathcal{S}$ by taking combinations of existing trees. The simplest way to combine trees is the following.</Paragraph>
      <Paragraph position="7"> Let $\sigma = \langle A \rightsquigarrow \alpha B \gamma \rangle \in \mathcal{S}$ and $\tau = \langle B \rightsquigarrow \beta \rangle \in \mathcal{S}$. We can unify the leaf $B$ in $\sigma$ with the root $B$ in $\tau$, yielding a new tree $\langle A \rightsquigarrow \alpha \langle B \rightsquigarrow \beta \rangle \gamma \rangle$. This tree is denoted by $\sigma \lhd \tau$. The (partial) function $\lhd : \mathcal{F}_{bt} \times \mathcal{F}_{bt} \rightarrow \mathcal{F}_{bt}$ is called composition. Note that there can be multiple occurrences of $B$ in $\mathit{yield}(\sigma)$, which means that $\sigma \lhd \tau$ need not be determined uniquely. Also, we will use the operator $\lhd$ in a liberal way, allowing more than one extension to be made at the same time. Let</Paragraph>
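      <Paragraph position="9"> $\sigma = \langle A \rightsquigarrow \alpha_0 B_1 \alpha_1 B_2 \alpha_2 \rangle$, $\tau_1 = \langle B_1 \rightsquigarrow \beta_1 \rangle$ and $\tau_2 = \langle B_2 \rightsquigarrow \beta_2 \rangle$. We write $\sigma \lhd \tau_1, \tau_2$ for the tree $\langle A \rightsquigarrow \alpha_0 \langle B_1 \rightsquigarrow \beta_1 \rangle \alpha_1 \langle B_2 \rightsquigarrow \beta_2 \rangle \alpha_2 \rangle$, using $\lhd$ as a polyadic operator with one left-hand argument and an arbitrary number of right-hand arguments.</Paragraph>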
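Composition, including its polyadic form, can be sketched as follows. Trees are a hypothetical frozen dataclass, and a leaf is addressed by the path of child indices leading to it. Because composition need not be unique when a label occurs at several leaves, `compose(sigma, tau_1, ..., tau_k)` returns the set of all possible results. As an assumption, the right-hand arguments are attached to distinct leaves in left-to-right order, as in the example with $B_1$ and $B_2$ above.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Tree:
    label: str
    children: tuple = ()


def leaf_paths(t, path=()):
    """Paths (tuples of child indices) to the leaves of t, left to right."""
    if not t.children:
        yield path
    for i, c in enumerate(t.children):
        yield from leaf_paths(c, path + (i,))


def node_at(t, path):
    for i in path:
        t = t.children[i]
    return t


def graft(t, path, new):
    """Copy of t in which the node at path is replaced by new."""
    if not path:
        return new
    kids = list(t.children)
    kids[path[0]] = graft(kids[path[0]], path[1:], new)
    return Tree(t.label, tuple(kids))


def compose(sigma, *taus):
    """All compositions of sigma with taus: distinct leaves of sigma, taken
    left to right, are unified with the roots of the taus."""
    result = set()
    for chosen in combinations(list(leaf_paths(sigma)), len(taus)):
        if all(node_at(sigma, p).label == t.label for p, t in zip(chosen, taus)):
            tree = sigma
            for p, t in zip(chosen, taus):
                tree = graft(tree, p, t)  # paths of the other leaves stay valid
            result.add(tree)
    return result


def yield_of(t):
    if not t.children:
        return () if t.label == "" else (t.label,)
    return tuple(s for c in t.children for s in yield_of(c))


sigma = Tree("A", (Tree("B"), Tree("x"), Tree("B")))
tau = Tree("B", (Tree("b"),))
```

With these example trees, `compose(sigma, tau)` has two results (either `B` leaf may be extended), whereas `compose(sigma, tau, tau)` has exactly one, with yield `b x b`.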
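      <Paragraph position="10"> As initial contents of the primordial soup, we take the trees $\langle A \rightarrow \alpha \rangle$ corresponding to productions $A \rightarrow \alpha \in P$. Such a tree $\langle A \rightarrow \alpha \rangle$ is called a production tree, or a production for short. We define an operator $\mathcal{A} : 2^{\mathcal{T}} \rightarrow 2^{\mathcal{T}}$ that yields all new trees that can be composed from the contents of the soup by $$\mathcal{A}(\mathcal{S}) \stackrel{\mathrm{def}}{=} \{\sigma \lhd \tau_1, \ldots, \tau_k \in \mathcal{T} \mid \{\sigma, \tau_1, \ldots, \tau_k\} \subseteq \mathcal{S}\}.$$</Paragraph>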
      <Paragraph position="11"> This definition of $\mathcal{A}$ has one shortcoming, however. Rather than all parses for all sentences, we only want the parses for one particular sentence $w \in \Sigma^*$. In general, this problem is tackled by redefining $\mathcal{A}$ as $$\mathcal{A}(\mathcal{S}) \stackrel{\mathrm{def}}{=} \{\sigma \lhd \tau_1, \ldots, \tau_k \in \mathcal{T} \mid \{\sigma, \tau_1, \ldots, \tau_k\} \subseteq \mathcal{S} \wedge \mathit{allowed}(\sigma \lhd \tau_1, \ldots, \tau_k)\},$$ in which a predicate $\mathit{allowed}$ specifies which trees are allowed to be added. Which trees can be discarded right away, and which ones should be added to the soup? As we are only interested in trees that can be extended to parses for some specific sentence $w$, the terminal part of the yield should be extendable to $w$. That is, a substring of $w$ can be produced from $\mathit{yield}(\tau)$ by replacing every nonterminal in it with some string of terminals. Formally, for terminal strings $s \in \Sigma^*$ we define $$\mathit{extends}(s, t) \stackrel{\mathrm{def}}{=} \exists u, v \in \Sigma^*\, (t = usv),$$ i.e., $s$ is a substring of $t$. For strings in $V^*$ containing at least one nonterminal, we define $\mathit{extends}$ recursively as $$\mathit{extends}(\alpha A \beta, t) \stackrel{\mathrm{def}}{=} \exists s \in \Sigma^*\, (\mathit{extends}(\alpha s \beta, t)).$$ Finally we define $\mathit{allowed}(\tau) \stackrel{\mathrm{def}}{=} \mathit{extends}(\mathit{yield}(\tau), w)$, in accordance with the informal definition given above. Note, however, that we still may create an infinite number of useless trees, simply by not adding terminals to the yield! If $\mathit{yield}(\tau) \in N^*$ then $\mathit{allowed}(\tau)$ holds: each leaf can be extended to $\varepsilon$, and the empty string is indeed a substring of $w$. In 3.2 we will see how this problem can be tackled in general; here we will only regard a subclass of $\mathcal{T}$ that does not contain trees with arbitrarily large nonterminal yields.</Paragraph>
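The $\mathit{extends}$ predicate can be checked by treating each nonterminal as a wildcard that matches any, possibly empty, run of terminals, and by padding the pattern with a wildcard on each side so that matching a substring of $w$ becomes matching all of $w$. This wildcard encoding and the function name are our own choices; $\mathit{allowed}(\tau)$ would then amount to `extends(yield_of(tau), w, nonterminals)`.

```python
from functools import lru_cache


def extends(alpha, w, nonterminals):
    """True if the nonterminals in alpha can be replaced by terminal strings
    (possibly empty) so that the result is a substring of w."""
    w = tuple(w)
    # None marks a wildcard. The leading and trailing wildcards allow any
    # context around the substring; every nonterminal becomes a wildcard too.
    pat = (None,) + tuple(None if x in nonterminals else x for x in alpha) + (None,)

    @lru_cache(maxsize=None)
    def match(p, i):
        """Can pat[p:] be matched against exactly w[i:]?"""
        if p == len(pat):
            return i == len(w)
        if pat[p] is None:
            return any(match(p + 1, k) for k in range(i, len(w) + 1))
        return len(w) > i and w[i] == pat[p] and match(p + 1, i + 1)

    return match(0, 0)


W = ("the", "bird", "flies")
NTS = {"NP", "VP", "N"}
# extends(("the", "N"), W, NTS) holds; extends(("the", "flies"), W, NTS) does not.
# Any all-nonterminal yield passes, e.g. ("NP", "VP", "NP", "VP", "NP").
```

The last example illustrates the shortcoming noted above: a yield consisting only of nonterminals is always allowed, however long it is.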
      <Paragraph position="12"> This finally allows us to define the Primordial Soup Algorithm.</Paragraph>
      <Paragraph position="14"/>
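A minimal sketch of the resulting algorithm follows, assuming that the soup is repeatedly extended with all newly allowed compositions until no new trees appear. Two simplifications are ours: only binary composition ($k = 1$) is used, which in most cases reaches the same trees through intermediate steps, and, as a stand-in for the restriction mentioned above, trees whose yield is longer than $w$ are rejected (the length bound of section 3.2). All names are hypothetical.

```python
from dataclasses import dataclass
from functools import lru_cache


@dataclass(frozen=True)
class Tree:
    label: str
    children: tuple = ()


def yield_of(t):
    if not t.children:
        return () if t.label == "" else (t.label,)
    return tuple(s for c in t.children for s in yield_of(c))


def leaf_paths(t, path=()):
    if not t.children:
        yield path
    for i, c in enumerate(t.children):
        yield from leaf_paths(c, path + (i,))


def label_at(t, path):
    for i in path:
        t = t.children[i]
    return t.label


def graft(t, path, new):
    """Copy of t with the node at path replaced by new."""
    if not path:
        return new
    kids = list(t.children)
    kids[path[0]] = graft(kids[path[0]], path[1:], new)
    return Tree(t.label, tuple(kids))


def extends(alpha, w, nts):
    """Nonterminals act as wildcards; the outer wildcards allow any context."""
    pat = (None,) + tuple(None if x in nts else x for x in alpha) + (None,)

    @lru_cache(maxsize=None)
    def match(p, i):
        if p == len(pat):
            return i == len(w)
        if pat[p] is None:
            return any(match(p + 1, k) for k in range(i, len(w) + 1))
        return len(w) > i and w[i] == pat[p] and match(p + 1, i + 1)

    return match(0, 0)


def primordial_soup(grammar, w):
    w, nts = tuple(w), set(grammar)

    def allowed(t):
        y = yield_of(t)
        # Length bound borrowed from section 3.2; for acyclic grammars it
        # keeps the soup finite.
        return len(w) >= len(y) and extends(y, w, nts)

    # One production tree per production; A -> epsilon gets an epsilon leaf.
    soup = {Tree(a, tuple(Tree(x) for x in rhs) or (Tree(""),))
            for a, rhss in grammar.items() for rhs in rhss}
    while True:
        # All binary compositions of soup members, keeping new allowed ones.
        new = {graft(s, p, t)
               for s in soup for t in soup for p in leaf_paths(s)
               if label_at(s, p) == t.label}
        new = {t for t in new - soup if allowed(t)}
        if not new:
            return soup
        soup |= new


GRAMMAR = {"S": [("NP", "VP")], "NP": [("Det", "N")],
           "Det": [("the",)], "N": [("bird",)], "VP": [("flies",)]}
```

For the sentence "the bird flies", the final soup contains the parse tree with root `S`; the recursive grammar `S -> S S | a` also yields a finite soup for "a a", because trees with more than two leaves are rejected.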
    </Section>
    <Section position="3" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.4 Specifying parse strategies
</SectionTitle>
      <Paragraph position="0"> More specific and more useful instances of the algorithm can be defined by imposing restrictions on the trees to be added. A strategy is a characterization of trees that are to be added to the primordial soup $\mathcal{S}$ under some additional constraints. Different constraints specify different strategies. We call it a strategy, rather than an algorithm, as no control structure is specified explicitly. For the sake of simplicity we assume that $\mathcal{A}(\mathcal{S})$ is added all at once, but it should be understood that, if so desired, only a subset of $\mathcal{A}(\mathcal{S})$ need be added at each step. A strategy can be refined into a (parallel or sequential) algorithm by adding control structure and data structures so as to keep track of intermediate results in an efficient manner. For examples of the design of parsing algorithms from such strategies, see [JPSZ].</Paragraph>
      <Paragraph position="1"> Parsing strategies can be characterized by two types of restrictions: on the types of trees allowed in the soup and on the operators that create new trees from existing ones. Both kinds of restrictions are interchangeable most of the time; if trees are allowed to combine only in some specific way, the set of generated trees will be restricted, and vice versa.</Paragraph>
      <Paragraph position="2"> As a simple example, we will specify a strategy for the CYK parser. To that end, we define an additional predicate $\mathit{complete}(\tau) \stackrel{\mathrm{def}}{=} \mathit{yield}(\tau) \in \Sigma^*$, i.e., a tree is complete if its yield does not contain any nonterminal. Such a tree can only be used as a right-hand argument of a composition. Recalling that the CYK algorithm is defined only for grammars in Chomsky Normal Form (i.e., productions are of the type $A \rightarrow BC$ and $A \rightarrow a$), we can define the CYK strategy by</Paragraph>
      <Paragraph position="4"> Apart from the initial production trees, $\mathcal{S}$ will only contain trees of the form $\langle A \rightsquigarrow a_{i+1} \ldots a_j \rangle$.</Paragraph>
      <Paragraph position="5"> The $\mathit{complete}$ predicate specifies that newly created trees have a terminal yield; this must be a substring of $w$ due to the $\mathit{allowed}$ predicate. It is straightforward to verify that all such trees are added to $\mathcal{S}$ in due course. Hence the specification of CYK is sound and complete.</Paragraph>
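The CYK strategy can be sketched on trees as follows. The sketch assumes a step that composes a production tree $\langle A \rightarrow BC \rangle$ with two complete trees $\tau_1$ and $\tau_2$ and keeps the result only if it is allowed; since the yield is then terminal, $\mathit{allowed}$ reduces to a substring test. The function name and the grammar encoding are illustrative, and the grammar is assumed to have no $\varepsilon$-productions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tree:
    label: str
    children: tuple = ()


def yield_of(t):
    if not t.children:
        return (t.label,)
    return tuple(s for c in t.children for s in yield_of(c))


def cyk_strategy(grammar, w):
    """Soup closure under the assumed CYK step (CNF grammar, no epsilon)."""
    w, nts = tuple(w), set(grammar)

    def complete(t):
        return not any(x in nts for x in yield_of(t))

    def allowed(t):  # terminal yield: extends reduces to "substring of w"
        y = yield_of(t)
        return any(w[i:i + len(y)] == y for i in range(len(w) - len(y) + 1))

    productions = {Tree(a, tuple(Tree(x) for x in rhs))
                   for a, rhss in grammar.items() for rhs in rhss}
    soup = set(productions)
    while True:
        done = [t for t in soup if complete(t)]
        new = set()
        for sigma in productions:
            if len(sigma.children) != 2:
                continue  # a tree for A -> a is complete already
            b, c = (leaf.label for leaf in sigma.children)
            for t1 in done:
                for t2 in done:
                    if t1.label == b and t2.label == c:
                        cand = Tree(sigma.label, (t1, t2))  # ternary composition
                        if allowed(cand):
                            new.add(cand)
        if new.issubset(soup):
            return soup
        soup |= new


GRAMMAR = {"S": [("NP", "VP")], "NP": [("Det", "N")],
           "Det": [("the",)], "N": [("bird",)], "VP": [("flies",)]}
```

As the text states, apart from the production trees every tree in the resulting soup has a terminal yield that is a substring of $w$.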
    </Section>
  </Section>
  <Section position="6" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3 Other parse strategies
</SectionTitle>
    <Paragraph position="0"> We redefine the Primordial Soup Algorithm from section 2 in a more general manner, and show its power and elegance by specifying the parsing strategies of Bottom-Up Earley, De Vreught &amp; Honig and some variants of CYK.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.1 Unification and superposition
</SectionTitle>
      <Paragraph position="0"> In section 2 we used only the composition operator $\lhd$ to create new trees from existing ones.</Paragraph>
      <Paragraph position="1"> Composition can be seen as a specific case of superposition, in which arbitrary overlapping parts of trees can be unified.</Paragraph>
      <Paragraph position="2"> We will first define unification, which is a special case of superposition in which the roots of two trees are mapped onto each other. For the definition of unification, we use the derivation operator $\Rightarrow$ for trees. If $\tau = \langle A \rightsquigarrow \alpha B \gamma \rangle$ and $\sigma = \langle A \rightsquigarrow \alpha \langle B \rightarrow \beta \rangle \gamma \rangle$, we write $\tau \Rightarrow \sigma$. A tree $\sigma$ is called an extension of $\tau$ if $\tau \Rightarrow^* \sigma$, where $\Rightarrow^*$ means applying the derivation $\Rightarrow$ zero or more times. Now two trees $\tau$ and $\sigma$ unify if a tree $\rho$ exists that is an extension of both $\sigma$ and $\tau$, i.e., $$\mathit{unify}(\sigma, \tau) \stackrel{\mathrm{def}}{=} \exists \rho \in \mathcal{T}\, (\tau \Rightarrow^* \rho \wedge \sigma \Rightarrow^* \rho).$$</Paragraph>
      <Paragraph position="3"> $\rho$ is called an upper bound of $\tau$ and $\sigma$. Furthermore, if $\sigma$ and $\tau$ unify, there is a unique least upper bound, denoted by $\tau \sqcup \sigma$, satisfying $$\text{if } \tau \Rightarrow^* \rho \text{ and } \sigma \Rightarrow^* \rho \text{ then } \tau \sqcup \sigma \Rightarrow^* \rho.$$</Paragraph>
      <Paragraph position="4"> $\tau \sqcup \sigma$ is called the unification of $\tau$ and $\sigma$. Note that the roots of $\tau$ and $\sigma$ coincide in $\tau \sqcup \sigma$. Unification can be generalized to superposition by allowing the root of one tree to be unified with an arbitrary node of the other tree, under the constraint that the overlapping parts of both trees are identical; see Figure 1. This superposition operator is denoted by $\oplus$. Note that, in general, superposition is not uniquely determined. Hence it is defined as a function $\oplus : \mathcal{F}_{bt} \times \mathcal{F}_{bt} \rightarrow 2^{\mathcal{F}_{bt}}$, whereas unification is defined as a partial function $\sqcup : \mathcal{F}_{bt} \times \mathcal{F}_{bt} \rightarrow \mathcal{F}_{bt}$. For a more formal definition, see [JPSZ].</Paragraph>
      <Paragraph position="6"> if $\sigma \sqcup \tau' = \rho$ for a subtree $\tau'$ of $\tau$, then $\tau'$ is replaced by $\rho$.</Paragraph>
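Unification and superposition can be sketched as follows. The sketch relies on the observation that an unexpanded leaf may still be extended by derivation, whereas an expanded node is fixed by the production used; `unify` therefore merges two trees node by node. `superpose` implements only the direction in which the root of `sigma` is unified with some node of `tau`, as in the fragment above; the complete definition is in [JPSZ], and all names here are hypothetical. When the chosen node is a leaf of `tau`, the result is the composition of `tau` with `sigma` from section 2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tree:
    label: str
    children: tuple = ()


def unify(s, t):
    """Least upper bound of s and t, or None if they do not unify.

    An unexpanded leaf can still be derived further, so it unifies with any
    tree carrying the same label; two expanded nodes must agree on the
    production used, i.e. on the number and labels of their children.
    """
    if s.label != t.label:
        return None
    if not s.children:
        return t
    if not t.children:
        return s
    if len(s.children) != len(t.children):
        return None
    kids = tuple(unify(a, b) for a, b in zip(s.children, t.children))
    return None if any(k is None for k in kids) else Tree(s.label, kids)


def node_paths(t, path=()):
    """Paths to every node of t (the root has the empty path)."""
    yield path
    for i, c in enumerate(t.children):
        yield from node_paths(c, path + (i,))


def node_at(t, path):
    for i in path:
        t = t.children[i]
    return t


def graft(t, path, new):
    if not path:
        return new
    kids = list(t.children)
    kids[path[0]] = graft(kids[path[0]], path[1:], new)
    return Tree(t.label, tuple(kids))


def superpose(sigma, tau):
    """Trees obtained by unifying the root of sigma with some node tau' of
    tau and replacing tau' by that unification."""
    result = set()
    for p in node_paths(tau):
        u = unify(sigma, node_at(tau, p))
        if u is not None:
            result.add(graft(tau, p, u))
    return result


s = Tree("A", (Tree("B"), Tree("C", (Tree("c"),))))
t = Tree("A", (Tree("B", (Tree("b"),)), Tree("C")))
```

For the example trees, `unify(s, t)` combines the expanded `C` of `s` with the expanded `B` of `t`, which is exactly their least upper bound.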
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.2 Some general restrictions
</SectionTitle>
      <Paragraph position="0"> As discussed in 2.3, we do not want to recognize all trees leading to parses of arbitrary strings.</Paragraph>
      <Paragraph position="1"> We introduced the general idea that a tree is allowed only if the terminal part of the yield extends to the sentence. For the CYK algorithm, this simple criterion is fine. In general, however, it is too restrictive, in the sense that some familiar parsing algorithms cannot handle it. Suppose, for example, that a tree $\langle A \rightsquigarrow aB \rangle$ is extended with a production $\langle B \rightarrow bCd \rangle$ into $\langle A \rightsquigarrow abCd \rangle$. In principle, this should only be allowed if $ab$ and $d$ occur in $w$ in this order. A parser that uses only local information, e.g., an LR(1) parser, cannot determine whether a terminal $d$ occurs somewhere in the string, perhaps after a large substring produced by $C$.</Paragraph>
      <Paragraph position="2"> We will use a rather more subtle scheme to match the yield of a tree against the sentence, so as to allow for refinement into arbitrary parsing algorithms. Having a tree $\langle A \rightsquigarrow aB \rangle$ we can check that $a$ occurs in $w$ and mark the leaf $a$ accordingly. Marking a leaf is denoted by underlining the terminal symbol. The tree $\langle A \rightsquigarrow \underline{a}B \rangle$ can be extended to $\langle A \rightsquigarrow \underline{a} \langle B \rightarrow bCd \rangle \rangle = \langle A \rightsquigarrow \underline{a}bCd \rangle$ and then to $\langle A \rightsquigarrow \underline{a}\,\underline{b}Cd \rangle$, irrespective of whether $d$ occurs in the string at all.</Paragraph>
      <Paragraph position="3"> The notion of marking terminals with occurrences in the string is particularly suited to parsing natural languages, as opposed to uninterpreted context-free grammars. In practical NL parsing, the word categories rather than the individual words are used as terminals, although they are in fact pre-terminals. Using the word categories as terminals, a marked terminal is a word category applied to a word from the sentence.</Paragraph>
      <Paragraph position="4"> As an example, consider the sentence "the bird flies". The initial soup might contain:</Paragraph>
      <Paragraph position="6"> Word categories need not be unique: in this case the word "flies" fits into two categories. A tree $\langle \mathit{NP} \rightsquigarrow \mathit{the}\ \mathit{noun} \rangle$ could be combined with $\langle \mathit{noun} \rightsquigarrow \mathit{flies} \rangle$, yielding a noun phrase "the flies".</Paragraph>
      <Paragraph position="7"> This tree is ruled out by $\mathit{extends}$, however, as "the flies" does not extend to "the bird flies".</Paragraph>
      <Paragraph position="8"> In summary, we distinguish two types of initial trees:</Paragraph>
      <Paragraph position="10"> The $\mathit{extends}$ predicate can be defined so as to apply to strings of markings (i.e., words) rather than terminals. Furthermore, if we do not want to construct arbitrarily large trees with a non-marked yield, we can define $$\mathit{allowed}(\tau) \stackrel{\mathrm{def}}{=} \mathit{extends}(\mathit{yield}(\tau), w) \wedge |\mathit{yield}(\tau)| \leq |w|.$$</Paragraph>
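To illustrate the marked version, the following sketch represents a marked leaf as a (category, word) pair and every other symbol as a plain string. As an assumption, unmarked symbols, whether nonterminals or word categories not yet matched, are treated as gaps of arbitrary length, which is the loosest reading of $\mathit{extends}$ on markings. The length bound is the one in the definition of $\mathit{allowed}$ above; the function names are illustrative.

```python
from functools import lru_cache


def extends_marked(yield_syms, w):
    """Marked leaves are (category, word) pairs and must match words of w in
    order; every unmarked symbol (a plain string) is a gap of any length."""
    pat = (None,) + tuple(s[1] if isinstance(s, tuple) else None
                          for s in yield_syms) + (None,)
    w = tuple(w)

    @lru_cache(maxsize=None)
    def match(p, i):
        if p == len(pat):
            return i == len(w)
        if pat[p] is None:
            return any(match(p + 1, k) for k in range(i, len(w) + 1))
        return len(w) > i and w[i] == pat[p] and match(p + 1, i + 1)

    return match(0, 0)


def allowed(yield_syms, w):
    """extends on markings plus the bound: |yield| at most |w|."""
    return extends_marked(yield_syms, w) and len(w) >= len(yield_syms)


W = ("the", "bird", "flies")
```

With $w$ = "the bird flies", the marked yield for "the flies" is rejected, as in the example above, while a yield of four unmarked symbols passes $\mathit{extends}$ but fails the length bound.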
      <Paragraph position="11"> Finally, allowing arbitrary tree construction with superposition ($\oplus$) rather than composition ($\lhd$), a general version of the operator $\mathcal{A}$ is given. For acyclic grammars (i.e., grammars that do not allow a derivation $A \Rightarrow^+ A$), only a finite number of trees can be constructed; hence the algorithm is guaranteed to halt. When a grammar is cyclic, an infinite number of parses exist, but every finite (subtree of a) parse will be found within a finite number of steps.</Paragraph>
      <Paragraph position="12"> From the point of view of efficiency, the above algorithm is not sensible at all. Its strength, however, derives from the fact that a very large class of parallel parsing algorithms can be defined as specializations, by constraining the general algorithm in various ways. Some examples will be given shortly.</Paragraph>
      <Paragraph position="13"> We have concentrated on context-free grammars for the sake of simplicity. It should be clear, though, that extension to various types of unification grammars is straightforward.</Paragraph>
    </Section>
    <Section position="3" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.3 Different breeds of trees
</SectionTitle>
      <Paragraph position="0"> As we have seen in the CYK example, complete trees are an important class of trees. Having introduced markers, however, it is natural to consider a tree complete only if its entire yield has been marked. Therefore we redefine $\mathit{complete}(\tau) \stackrel{\mathrm{def}}{=} \mathit{yield}(\tau) \in \underline{\Sigma}^*$.</Paragraph>
      <Paragraph position="1"> Note that all marker trees are complete, and that production trees are complete iff they correspond to an $\varepsilon$-production.</Paragraph>
      <Paragraph position="2"> Palm trees consist of a roof (corresponding to a single production) and a trunk (consisting of a number of adjacent complete trees). They are the result of composing production trees and complete trees. We can define them as</Paragraph>
      <Paragraph position="4"> By notational convention, $A \rightarrow \alpha\beta\gamma$ is a production and $v \in \underline{\Sigma}^*$. Note that in general $\beta$ is a sequence of symbols $X_1 \ldots X_k$; each $X_i$ is the root of a complete tree. Degenerate cases, with only a trunk ($\alpha\gamma = \varepsilon$) or only a roof</Paragraph>
      <Paragraph position="6"> As a generalization of palm trees, we may consider trees with more than one trunk. This type of tree is denoted by $\mathit{baobab}$.¹ ¹The baobab is an African tree that has branches from which roots originate, supporting the roof. Such roots grow out to additional trunks.</Paragraph>
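The predicates of this subsection can be sketched as follows, assuming a hypothetical `marked` flag on leaves. A child of the root counts as trunk if it is complete and as roof if it is an unmarked, unexpanded leaf; a palm has at most one contiguous run of trunk children, and a baobab may have several. Counting the degenerate cases (a bare production, a complete tree) as palms is our assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tree:
    label: str
    children: tuple = ()
    marked: bool = False  # leaf whose terminal has been matched to a word of w


def leaves(t):
    """Visible leaves, left to right (epsilon leaves, label "", are skipped)."""
    if not t.children:
        return [] if t.label == "" else [t]
    return [leaf for c in t.children for leaf in leaves(c)]


def complete(t):
    return all(leaf.marked for leaf in leaves(t))


def trunk_runs(t):
    """Number of maximal runs of complete children below the root, or None if
    some child is neither complete nor an unexpanded leaf of the roof."""
    if not t.children:
        return None
    runs, inside = 0, False
    for c in t.children:
        if complete(c):
            runs += 0 if inside else 1
            inside = True
        elif not c.children:
            inside = False  # unmarked roof symbol, still to be recognized
        else:
            return None  # a partially built subtree
    return runs


def palm(t):
    r = trunk_runs(t)
    return r is not None and 1 >= r


def baobab(t):
    return trunk_runs(t) is not None


def marked_leaf(label):
    return Tree(label, marked=True)


np = Tree("NP", (marked_leaf("det"), marked_leaf("noun")))
two_trunks = Tree("X", (Tree("A"), marked_leaf("b"), Tree("C"), marked_leaf("d")))
```

Here `np` is complete (and hence a degenerate palm), while `two_trunks` is a baobab but not a palm.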
    </Section>
    <Section position="4" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.4 CYK revisited
</SectionTitle>
      <Paragraph position="0"> The only addition to our previous specification of CYK is that it should produce trees with marked yields. To that end, we can define an initial step $$\mathcal{A}_0(\mathcal{S}) \stackrel{\mathrm{def}}{=} \{\sigma \lhd \tau \in \mathcal{T} \mid \sigma, \tau \in \mathcal{S} \wedge \mathit{production}(\sigma) \wedge \mathit{marker}(\tau)\}.$$</Paragraph>
      <Paragraph position="1"> For the remainder of the algorithm, production trees $\sigma = \langle A \rightarrow BC \rangle$ are composed with two complete trees $\tau_1$ and $\tau_2$ as usual, denoting ternary composition by $\sigma \lhd \tau_1, \tau_2$.</Paragraph>
      <Paragraph position="2"> To keep in line with other algorithms to follow, we could alternatively define CYK with a binary composition operator. As a consequence, a new tree is created in two steps. First, a production tree is combined with a complete tree, giving a palm. In the second step, the palm is combined with a second complete tree, giving a new complete tree. We define two functions, $\mathcal{A}_1$ for the first step and $\mathcal{A}_2$ for the second. A more liberal approach would be to allow the intermediate results to be in the soup: $\mathcal{A}(\mathcal{S}) \stackrel{\mathrm{def}}{=} \mathcal{A}_1(\mathcal{S}) \cup \mathcal{A}_2(\mathcal{S})$. For grammars in Chomsky Normal Form this hardly seems sensible. But when CYK is extended to arbitrary CFGs, a complete tree can be created from a production tree through an intermediate series of palm trees. If symbols in the right-hand side of a production can be recognized in arbitrary order, the condition $\mathit{palm}(\sigma)$ in the definition of $\mathcal{A}_2$ should be replaced by $\mathit{baobab}(\sigma)$.</Paragraph>
    </Section>
  </Section>
  <Section position="7" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3.5 Bottom-Up Earley
</SectionTitle>
    <Paragraph position="0"> The BUE algorithm is defined for arbitrary context-free grammars. It is usually described as a recognition algorithm. An item $[i, A \rightarrow \alpha \bullet \beta, j]$ denotes the fact that $\alpha \Rightarrow^* a_{i+1} \ldots a_j$ has been recognized. From $[i, A \rightarrow \alpha \bullet B \gamma, j]$ and $[j, B \rightarrow \beta \bullet, k]$ a new item $[i, A \rightarrow \alpha B \bullet \gamma, k]$ can be derived. We will define the algorithm on trees, rather than items.</Paragraph>
    <Paragraph position="1"> Trees of the form $\langle A \rightarrow \langle \alpha \rightsquigarrow v \rangle \beta \rangle$ are recognized for $v = a_{i+1} \ldots a_j$ a substring of $w$.</Paragraph>
    <Paragraph position="2"> We define the set of Earley trees $\mathcal{E} \subset \mathcal{T}$ as $$\mathcal{E} \stackrel{\mathrm{def}}{=} \{\langle A \rightarrow \langle \alpha \rightsquigarrow v \rangle \beta \rangle \in \mathcal{T} \mid A \rightarrow \alpha\beta \in P \wedge v \in \underline{\Sigma}^*\}.$$</Paragraph>
    <Paragraph position="3"> Note that productions ($\alpha = \varepsilon$) and complete trees ($\beta = \varepsilon$) are also included in $\mathcal{E}$. The operation of the algorithm is described by $$\mathcal{A}_{\mathit{BUE}}(\mathcal{S}) \stackrel{\mathrm{def}}{=} \{\sigma \oplus \tau \in \mathcal{E} \mid \sigma, \tau \in \mathcal{S} \wedge \mathit{allowed}(\sigma \oplus \tau)\}.$$</Paragraph>
    <Paragraph position="4"> From the definition of $\mathcal{E}$ it follows that $\sigma \oplus \tau \in \mathcal{E}$ iff $\mathit{complete}(\tau)$ and the leftmost unmarked symbol of $\mathit{yield}(\sigma)$ is $\mathit{root}(\tau)$. Soundness follows from the definitions, and completeness is easily proven by induction on the size of the tree; hence the algorithm is correct.</Paragraph>
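The BUE operation can be sketched on trees as follows, under several assumptions of ours: marker trees are modelled as a word category above a marked word, a hypothetical lexicon maps words to categories, and the superposition of $\sigma$ and $\tau$ is restricted to what the text says keeps the result an Earley tree, namely grafting a complete $\tau$ at the leftmost unmarked leaf of $\sigma$ when the labels agree. The `allowed` check uses the marked words, with unmarked leaves as gaps, plus the length bound of section 3.2.

```python
from dataclasses import dataclass
from functools import lru_cache


@dataclass(frozen=True)
class Tree:
    label: str
    children: tuple = ()
    marked: bool = False  # leaf holding a word of w


def leaves(t):
    if not t.children:
        return () if t.label == "" else (t,)
    return tuple(x for c in t.children for x in leaves(c))


def complete(t):
    return all(x.marked for x in leaves(t))


def first_unmarked(t, path=()):
    """Path to the leftmost unmarked visible leaf of t, or None."""
    if not t.children:
        return None if (t.marked or t.label == "") else path
    for i, c in enumerate(t.children):
        p = first_unmarked(c, path + (i,))
        if p is not None:
            return p
    return None


def node_at(t, path):
    for i in path:
        t = t.children[i]
    return t


def graft(t, path, new):
    if not path:
        return new
    kids = list(t.children)
    kids[path[0]] = graft(kids[path[0]], path[1:], new)
    return Tree(t.label, tuple(kids), t.marked)


def allowed(t, w):
    """Marked words, with unmarked leaves as gaps, must fit a substring of w."""
    ys = leaves(t)
    pat = (None,) + tuple(x.label if x.marked else None for x in ys) + (None,)

    @lru_cache(maxsize=None)
    def match(p, i):
        if p == len(pat):
            return i == len(w)
        if pat[p] is None:
            return any(match(p + 1, k) for k in range(i, len(w) + 1))
        return len(w) > i and w[i] == pat[p] and match(p + 1, i + 1)

    return len(w) >= len(ys) and match(0, 0)


def bottom_up_earley(grammar, lexicon, w):
    w = tuple(w)
    soup = {Tree(a, tuple(Tree(x) for x in rhs) or (Tree(""),))
            for a, rhss in grammar.items() for rhs in rhss}
    # Marker trees: a word category above a marked word of w.
    soup |= {Tree(cat, (Tree(word, marked=True),))
             for word in w for cat in lexicon.get(word, ())}
    while True:
        new = set()
        for sigma in soup:
            p = first_unmarked(sigma)
            if p is None:
                continue  # sigma is complete
            for tau in soup:
                if complete(tau) and tau.label == node_at(sigma, p).label:
                    cand = graft(sigma, p, tau)
                    if cand not in soup and allowed(cand, w):
                        new.add(cand)
        if not new:
            return soup
        soup |= new
```

For "the bird flies", with "flies" listed as both a noun and a verb, the sketch builds a complete `S` tree, and the noun phrase for "the flies" is never added because it fails `allowed`.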
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.6 De Vreught and Honig's algorithm
</SectionTitle>
      <Paragraph position="0"> The VH algorithm also uses complete trees and palm trees, with the difference that the trunk of a palm tree does not necessarily cover the leftmost part of the roof. We define a set $\mathcal{V}$ of trees, analogously to the set of Earley trees, by</Paragraph>
      <Paragraph position="2"> The functions $\mathcal{A}_1$ and $\mathcal{A}_2$ that combine trees are defined differently, however. The first operation was originally called inclusion, the second concatenation. The former combines a nonterminal tree and a complete tree into a palm tree, whereas the latter combines two palm trees into a palm tree with a wider trunk, using unification. It cannot result in a proper baobab because of the definition of $\mathcal{V}$. A subtle difference from the original algorithm is that we allow the trunks of $\sigma$ and $\tau$ to overlap, which is prohibited in their approach. It is not difficult to add this condition, if required.</Paragraph>
      <Paragraph position="3"> A similar result is obtained by replacing the functions $\mathcal{A}_1$ and $\mathcal{A}_2$ by a function similar to the one used for Earley's algorithm (but now for trees in $\mathcal{V}$ instead of in $\mathcal{E}$). Thus a generalized bottom-up Earley parser, for which left-to-right parsing of a constituent is not necessary, is defined by</Paragraph>
    </Section>
  </Section>
  <Section position="8" start_page="0" end_page="0" type="metho">
    <SectionTitle>
4 Conclusions
</SectionTitle>
    <Paragraph position="0"> The Primordial Soup paradigm facilitates the specification of parsing strategies, i.e., high-level specifications of parsing algorithms, without explicit control flow and data structures.</Paragraph>
    <Paragraph position="1"> A specification without control flow is a good basis for the design of a parallel implementation, as it allows a further refinement of the design before any decision on architecture is taken. For more details, see [JPSZ], where this has been exemplified with a design for a parallel CYK parser, using the Primordial Soup paradigm and the formalism introduced in [JPZ].</Paragraph>
    <Paragraph position="2"> The Primordial Soup framework can be used to design new parsing algorithms by mixing features of existing algorithms. For example, the Earley operator for tree composition in combination with the De Vreught &amp; Honig set of allowed trees yields a generalized Earley parser that has been rigorously defined in only two lines.</Paragraph>
    <Paragraph position="3"> The specification of parsing strategies is given in a formalism closely resembling predicate logic.</Paragraph>
    <Paragraph position="4"> This makes it almost trivial to derive prototype implementations in (parallel) logic programming languages like Prolog or Parlog [JPSZ].</Paragraph>
  </Section>
</Paper>