<?xml version="1.0" standalone="yes"?>
<Paper uid="C90-2053">
  <Title>A New Parallel Algorithm for Generalized LR Parsing</Title>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
2 Generalized LR Parsing Algorithm
</SectionTitle>
    <Paragraph position="0"> gorithm The execution of the generalized LR algorithm is controlled by an LR parsing table generated from predetermined grammar rules. Figure 1 shows an ambiguous English grammar structure, and Figure 2 an LR parsing table generated from Figure 1.</Paragraph>
    <Paragraph position="1"> Action table entries are determined by a parser's; state (the row of the table) and a look-ahead preterminal (the column of the table) of an input sentence. There are two kinds of stack operations: shift and reduce operations. Some entries in the LR table contain more than two operations and are thus in conflict. In such cases, a parser must conduct more than two operations simultaneously.</Paragraph>
    <Paragraph position="2"> The symbol 'sh N' in some entries indicates that the generalized LR parser has to push a look-ahead preterminal on the LR stack and go to 'state N'. The symbol 're N' means that the generalized LR parser has to reduce from the top of the stack the number of elements equivalent to that of the right-hand side of the rule numbered 'N'. The symbol 'ace' means that the generalized LR parser has success-Nlly completed parsing. If an entry contains no operation, the generalized LR parser will detect an error.</Paragraph>
    <Paragraph position="3"> The right-hand table entry indicates which state the parser should enter after a reduce op- null eration. The LR ta.ble shown in Figure 2 has two conflicts at state 11 (row no. 11) and ste~te (1) S -+ NP, VP.</Paragraph>
    <Paragraph position="4"> (2) S ~ S, PP.</Paragraph>
    <Paragraph position="5"> (3) NP --. NP, PP.</Paragraph>
    <Paragraph position="6"> (4) NP ~ det, noun.</Paragraph>
    <Paragraph position="7"> (5) NP --* pron.</Paragraph>
    <Paragraph position="8"> (6) VP --+ v, NP.</Paragraph>
    <Paragraph position="9"> (7) PP --+ p, NP.</Paragraph>
    <Paragraph position="10"> Fig.l: An Ambiguous English Grammar 12 for the 'p' column. Each of the conflicting  two entries contains a shift and a reduce operation and is called a shift-reduce conflict. When parser encounters a conflict, it cannot determine which operation should be carried out first. In our parser, conflicts will be resolved using a parallel processing technique such that the order of the operations in conflict is; of no concern.</Paragraph>
  </Section>
  <Section position="4" start_page="0" end_page="11" type="metho">
    <SectionTitle>
3 Brief Introduction to GHC
</SectionTitle>
    <Paragraph position="0"> Before explaining the details of our algorithm, we will give a brief introduction to GHC, typical statements of which are given in Figure 3.</Paragraph>
    <Paragraph position="1"> Roughly speaking, the vertical }::~ar i,l a. GttC statement (Fig.3) functions as a cut symbol of Prolog. When goal 'a' is executed, a process of statement (1) is activated and the body becomes a new goal in which 'b(Stream)' and 'c(Stream)' are executed simultaneously. In GHC, this is cMled AND-parallel execution.</Paragraph>
    <Paragraph position="2"> In other words, subprocesses 'b(Stream)' and  det noun pron v</Paragraph>
    <Paragraph position="4"> 'c(Strea,m)' are created by a parent process 'a' and they run in parallel. Note that the definition of process 'c' in statement (3) is going to instantiate the variable 'Stream' in 'c(Stre~m)' with '\[a \[ Streaml\]'. In such a case the execution of process 'c' will be suspended until 'Stream' has been instantiated by process 'b(Stream)'. By the recursive process ,:'all in the body of definition (2), process 'b' continues to produce the atom 'x' and places it on stream. The atom 'x' is sent to process 'c' by the GIIC stream communication; process )c' continues to consume atom 'x' on stream.</Paragraph>
    <Paragraph position="5">  (1) a:- true I b (Stream) , c (Stream) .</Paragraph>
    <Paragraph position="6"> (2) b(Stream):- true \[  is given in the form of a list data structure in GHC. Consider the following generalized LR parsing, using for the input sentence, the grammar and the table in Figure 1 and Figure 2 respectively. After the parser has shifted the word 'with', the following two stacks with the same top state '6' will be obtained:  Sentence : &amp;quot; I open the door with a key &amp;quot; (1) top &lt; \[ 3,s,0 \] (2) top &lt; \[ 6,p,12,np,8,v,4,np,0 \]  Fig.4: Two Stacks to be Merged We will merge these two stacks and get the following TSS:</Paragraph>
    <Paragraph position="8"> Figure 5 shows an image of the TSS above.</Paragraph>
  </Section>
  <Section position="5" start_page="11" end_page="11" type="metho">
    <SectionTitle>
4 New Parallel Generalized LR Parsing Algorithm
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="11" end_page="11" type="sub_section">
      <Paragraph position="0"> q'he new parallel parsing algorithm is a.</Paragraph>
      <Paragraph position="1"> ~a.tural extension of Tomita's algorithm _oml~.a 86\] and enables us to achieve greater paral\]el performance. In our algorithm, if a parsing sentence contains syntactic ambiguities, two or more parsing processes will run in parallel.</Paragraph>
      <Paragraph position="2"> 4:.1 Tree Structured Stacks 'lb avoid tile duplication of parsing processes, the new algorithm makes use of Tree Struc-</Paragraph>
    </Section>
    <Section position="2" start_page="11" end_page="11" type="sub_section">
      <SectionTitle>
4.2 Stack Operations on Stream
</SectionTitle>
      <Paragraph position="0"> In order to merge the stacks, Tomita's algorithm must synchronize the parsing processes for shift operations, thereby reducing parallel performance. &amp;quot;ib solve this problem, we have developed an improved parallel generalized LR algorithm that involves no waiting for shift operations before merging many stacks.</Paragraph>
      <Paragraph position="1"> The new algorithm is made possible by a GHC stream communication mechanism.</Paragraph>
      <Paragraph position="2"> Through this stream communication mechanism, a process that has completed a 'shift N' first has the privilege of proceeding to subsequent actions and continuing to do so until a reduce action pops an element with state 'N'  into the stack. If other parsing processes carry out the same 'shift N' actions, their stacks will be merged into the position in which the &amp;quot;privileged&amp;quot; process had, by the 'shift N' action, inserted an element. The merging of stacks is tlhus greatly facilitated by the GHC stream communication mechanism.</Paragraph>
      <Paragraph position="3"> To begin parsing, we will create a sequence of !goal processes, namely pl,p2,... ,pn,p$, each of which corresponds to a look-ahead preterminal of an input sentence (referred to hereafter as a parent process). The stack information is sent from process pl to process p$ using the GtIC communication mechanism.</Paragraph>
      <Paragraph position="4"> Each parsing process receives the TSS from its input stream, changes the TSS in parallel according to the LR table entry, and sends the results as the output stream -- which in turn becomes the input stream of the adjacent process. The stream structure is as follows: \[ Stackl,Stack2,...,Stackn \] Stream \] where Stacki is a TSS like (3) o1&amp;quot; a simple stack like (1).</Paragraph>
      <Paragraph position="5"> Consider the case where a shift-reduce conflict occurs and the parent process produces two subprocesses which create stacks (1) and (2) (Fig.4). In order to merge both stacks, Tornita's parser forces the parent process to wait until the two subprocesses have returned the stacks (1) and (2). Our algorithm attempts to avoid such synchronization: even though only one subprocess has returned stack (2), the parent process does not wait for stack (1), but generates the following stack structure and sends it on to the output stream (which in turn becomes the input stream of the adjacent process). The adjacent process can then perform its own operations for the top of stack (2) on the input stream. Thus the new algorithm achieves greater parallel performance than its predecessor.</Paragraph>
      <Paragraph position="6"> Output Stream of Parent Process : \[ \[6,p I Tail\] I Stream \] where '6,p' are the top two elements of the stack (2).</Paragraph>
      <Paragraph position="7"> Note that 'Tail' and 'Stream' remain undefined until the other subprocess returns stack (1). If the adjacent process wants to retrieve 'Tail' and 'Stream' after processing the top of stack (2), the process will be suspended until 'Tail' and 'Stream' have been instantiated by the rest of stacks (2) and (1).</Paragraph>
      <Paragraph position="8"> This kind of synchronization is supported by GItC. Let's suppose the adjacent process receives the above output stream from the pa.rent process. Before the parent process has generated stack (1), the adjacent process can execute 5 steps for the top two elements of stack (2) ( see Figure 6 ). During the execution of the adjacent process, the parent process will be able to run in parallel.</Paragraph>
      <Paragraph position="9"> As soon as the parent process receives stack (1) with the same top elements '6,p' of stack (2), it instantiates the variables 'Tail' and 'Stream' and merges '6,p', getting the same TSS shown in Figure 5:</Paragraph>
      <Paragraph position="11"> We need to consider the case where the top element of stack (1) is different from that of stack (2). For example, suppose that stack (1)  is \[ 8,p,3,s,o \], then the variables 'Tail' and 'Stream' will be instantiated as follows:</Paragraph>
      <Paragraph position="13"> In this case, we have two simple stacks in the stream.</Paragraph>
    </Section>
  </Section>
  <Section position="6" start_page="11" end_page="11" type="metho">
    <SectionTitle>
5 Comparison of Parallel Parsing Performance
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="11" end_page="11" type="sub_section">
      <Paragraph position="0"> In this section, we will show by way of a simple example that our algorithm has greater parallel parsing performance than Tomita's original algorithm. Consider the parallel parsing of the input sentence &amp;quot; I open the door with a key &amp;quot;, using a grammar in Figure 1 and a table in Figure 2. As the input sentence has two syntactic ambiguities, the parsing process encounters a shift-reduce conflict of tile LR table and is broken down into two subprocesses.</Paragraph>
      <Paragraph position="1">  and grammatical symbols which are put into a stack. When the process has the state 12 and tile look-aheM preterminal 'p', the process encounters a 'sh 6/re 6' conflict. Then it is broken down into two subprocesses: the first process performs the 'sh 6' operation and goes to state 6, and the other performs the 're 6' operation. The second process also goes to the state 6 after performing 'goto 9','re l','goto 3', and 'sh 6' operations. 'File processes that run according to the simple parallel LR parsing algorithm are shown in Figure 7(a).</Paragraph>
      <Paragraph position="2"> We can see that the two processes perform the same operations after performing the 'sh 6' operations. If we do not merge these kinds of processes, we will face an explosion in the number of processes. Tomita's algorithm ( shown in Figure 7(b) ) can avoid the duplication of parsing processes by merging them into one process. However, tile algorithm needs a synchronization that decreases the number of processes which are able to run in parallel.</Paragraph>
      <Paragraph position="3"> On the other tiand, our algorithm ( shown in Figure 7(c) ) does not require such synchronization as long as these processes do not try to reduce the incomplete part of a stack. In this example, two processes run in parallel after a 'sh 6/re 6' conflict has occurred. Then, an incomplete stack like \[6,plTail\] is created, with tile upper process in Figure 7(c)  performing the 'sh 1', 'sh 5', and 're 4' stack operations while the lower process calculates its incomplete part. After finishing the 'sh 6' operation of the lower process, the incomplete part 'Tail' will be instantiated and thus we obtain the following tree structured stack:</Paragraph>
      <Paragraph position="5"> It is remarkable that our algorithm takes less time to than either the simple algorithm or Tomita's to generate the first result of parsing.</Paragraph>
      <Paragraph position="6"> The reason is that our algorithm can analyze two or more positions of an input sentence in parallel, which is a merit when parsing with incomplete stacks.</Paragraph>
      <Paragraph position="7"> The complexity of our algorithm is identical to that of Tomita's \[Johnson 89\]. The only difference between the two is the number of processes that run in parallel. So if we simulate the parsing of our algorithm and that of Tomita's on a single processor, the time of parsing will be exactly the same.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>