File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/88/c88-2158_metho.xml

Size: 21,424 bytes

Last Modified: 2025-10-06 14:12:13

<?xml version="1.0" standalone="yes"?>
<Paper uid="C88-2158">
  <Title>Object-Oriented Parallel Parsing for Context-Free Grammars</Title>
  <Section position="4" start_page="0" end_page="776" type="metho">
    <SectionTitle>
2 The Basic Scheme
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="0" end_page="774" type="sub_section">
      <SectionTitle>
2.1 A Symbol as a Computing Agent
</SectionTitle>
      <Paragraph position="0"> Our approach is basically bottom-up. Suppose we have a context-free grammar rule such as: VP --&gt; V NP (1)</Paragraph>
      <Paragraph position="2"> In bottom-up parsing, the usual interpretation of this kind of rule is: if the first portion of a substring of an input string can be reduced to a category (terminal or non-terminal symbol) V, and the remaining portion can be reduced to a category NP, then the whole substring can be reduced to a category VP.</Paragraph>
      <Paragraph position="3"> This interpretation is implicitly based upon the following two assumptions about the parsing process: * a single computing agent (processor or process) works on the input string, and * non-terminal and terminal symbols such as VP, V, and NP are viewed as passive tokens or data.</Paragraph>
      <Paragraph position="5"> Instead, we take a radically different approach, in which * not one but many computing agents work concurrently, each performing a rather simple task, * a computing agent is assigned to each occurrence of a non-terminal or terminal symbol in the grammar rules, * such a computing agent receives data (messages), manipulates and stores data in its local memory, and can send data (messages) asynchronously to other computing agents that correspond to non-terminal or terminal symbols, and * the data passed around among these computing agents are partial parse trees.</Paragraph>
      <Paragraph position="6"> Suppose that the computing agent acting for the V symbol in Rule (1) has received (a token that represents) a partial parse tree t1, and that the computing agent acting for the NP symbol in Rule (1) has received a partial parse tree t2. If the terminal symbol at the right boundary of t1 is adjacent, in the original input string, to the terminal symbol at the left boundary of t2, then t1 and t2 can be put together to form a larger partial parse tree corresponding to the VP symbol in Rule (1). See Figure 1.</Paragraph>
      <Paragraph position="7"> For example, let us consider the input string: I saw a girl with a telescope.</Paragraph>
      <Paragraph position="8"> If t1 is a parse tree constructed from 'saw' and t2 is a parse tree constructed from 'a girl', then the right boundary of t1 is adjacent to the left boundary of t2. But if t2 is a parse tree constructed from 'a telescope', then t1 and t2 are not adjacent, and a larger parse tree cannot be constructed from them.</Paragraph>
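The boundary adjacency test and the joining of two partial trees can be sketched in Python. This is a minimal illustration, not the authors' code: it assumes the positional data form introduced in Subsection 2.5, written here as a tuple (start, end, tree...) with words numbered from 0, and the names adjacent and concatenate are hypothetical.

```python
# A data form is a tuple (start, end, tree...), mirroring the paper's (3 4 girl).

def adjacent(left, right):
    """Boundary adjacency test: left's right boundary meets right's left boundary."""
    return left[1] == right[0]


def concatenate(left, right):
    """Join two adjacent data forms into one form spanning both."""
    if not adjacent(left, right):
        raise ValueError("data forms are not adjacent")
    return (left[0], right[1]) + left[2:] + right[2:]


# Positions for "I saw a girl with a telescope": I=0-1, saw=1-2, ..., telescope=6-7.
t1 = (1, 2, ("V", "saw"))
a_girl = (2, 4, ("NP", ("DET", "a"), ("N", "girl")))
a_telescope = (5, 7, ("NP", ("DET", "a"), ("N", "telescope")))

print(adjacent(t1, a_girl))       # True
print(adjacent(t1, a_telescope))  # False
print(concatenate(t1, a_girl))
# (1, 4, ('V', 'saw'), ('NP', ('DET', 'a'), ('N', 'girl')))
```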
      <Paragraph position="9"> Now, which computing agent should check the boundary adjacency, and which one should perform the tree-constructing task? In our scheme, it is natural for the computing agent acting for the NP symbol to do the boundary checking because, in many simple cases, the NP agent receives t2 after the V agent receives t1 (due to the left-to-right nature of on-line processing). For the NP agent to perform this task, the V agent must send t1 to the NP agent. Upon receiving t1 from the V agent, the NP agent checks the boundary adjacency between t1 and t2 if it has already received t2. If t2 has not arrived yet, the NP agent postpones the boundary checking until t2 arrives, storing t1 in its local memory. If the two boundaries are not adjacent, the NP agent also stores t1 in its local memory for future reference; later, when the NP agent receives further partial parse trees, their left boundaries are checked against the right boundary of t1. When the adjacency test succeeds, the NP agent concatenates t1 and t2 and sends them to the computing agent acting for the non-terminal symbol VP in Rule (1). The VP agent constructs, out of t1 and t2, a partial parse tree whose root-node tag is the non-terminal symbol VP. This newly constructed partial parse tree is then distributed by the VP agent to all the computing agents each of which acts for an occurrence of the symbol VP in the right-hand side of a rule. The distributed tree in turn serves as data (a message) to those computing agents in exactly the same way as t1 and t2 serve as data to the V and NP agents above.</Paragraph>
      <Paragraph position="10"> This is the basic idea of our parsing scheme, and it is very simple. Naturally, every computing agent acting for a non-terminal or terminal symbol can work independently, in parallel, and asynchronously. Rule (1) is represented as the computing agent network illustrated in Figure 1 (part of a larger network), where boxes denote computing agents and arrows denote flows of trees.</Paragraph>
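The asynchronous behavior of the Rule (1) network can be approximated with Python's asyncio, one coroutine per agent and one asyncio.Queue per mailbox. This is a sketch, not the authors' implementation: the source labels "from_V" and "from_NP", the use of queue joins to detect that all messages have been handled, and the results list standing in for distribution to every VP occurrence are all assumptions.

```python
import asyncio


def adjacent(left, right):
    return left[1] == right[0]


async def v_agent(inbox, np_inbox):
    """Type-2 agent for V in Rule (1): forwards each tree to the NP agent."""
    while True:
        tree = await inbox.get()
        await np_inbox.put(("from_V", tree))
        inbox.task_done()


async def np_agent(inbox, vp_inbox):
    """Type-3 agent for NP in Rule (1): pairs adjacent trees from its two sources."""
    from_v, from_np = [], []          # local memory for trees awaiting a partner
    while True:
        source, tree = await inbox.get()
        if source == "from_V":
            from_v.append(tree)
            pairs = [(tree, t2) for t2 in from_np if adjacent(tree, t2)]
        else:                         # an NP tree distributed by some Type-1 agent
            from_np.append(tree)
            pairs = [(t1, tree) for t1 in from_v if adjacent(t1, tree)]
        for t1, t2 in pairs:
            await vp_inbox.put((t1[0], t2[1]) + t1[2:] + t2[2:])
        inbox.task_done()


async def vp_agent(inbox, results):
    """Type-1 agent for VP: builds a VP-rooted tree (appending to `results`
    stands in for distributing it to every VP occurrence)."""
    while True:
        form = await inbox.get()
        results.append((form[0], form[1], ("VP",) + form[2:]))
        inbox.task_done()


async def main():
    v_in, np_in, vp_in = asyncio.Queue(), asyncio.Queue(), asyncio.Queue()
    results = []
    tasks = [asyncio.create_task(v_agent(v_in, np_in)),
             asyncio.create_task(np_agent(np_in, vp_in)),
             asyncio.create_task(vp_agent(vp_in, results))]
    # NP trees may arrive before or after the V tree; order does not matter.
    await np_in.put(("from_NP", (2, 4, ("NP", ("DET", "a"), ("N", "girl")))))
    await v_in.put((1, 2, ("V", "saw")))
    await np_in.put(("from_NP", (5, 7, ("NP", ("DET", "a"), ("N", "telescope")))))
    for q in (v_in, np_in, vp_in):    # wait until every message has been handled
        await q.join()
    for t in tasks:
        t.cancel()
    await asyncio.gather(*tasks, return_exceptions=True)
    return results


result = asyncio.run(main())
print(result)
```

Because the NP agent stores trees from both sources, the outcome does not depend on whether the V tree or the NP tree reaches it first.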
    </Section>
    <Section position="2" start_page="774" end_page="774" type="sub_section">
      <SectionTitle>
2.2 A Set of Rules as a Network of Computing Agents
</SectionTitle>
      <Paragraph position="0"> It should be clear from the previous subsection that a set of context-free grammar rules (even a singleton grammar) is represented as a network of computing agents, each of which acts for an occurrence of a non-terminal or terminal symbol in a grammar rule. More precisely, the correspondence between the set of computing agents and the set of occurrences of symbols in the set of grammar rules is one-to-one: for each occurrence of a symbol in a rule, there is one distinct computing agent. For example, the following set of rules (including Rule (1)) is represented as the network depicted in Figure 2.</Paragraph>
      <Paragraph position="2"> A white box corresponds to the computing agent acting for a symbol in the right-hand side of a grammar rule, and a dark box corresponds to the computing agent acting for the non-terminal symbol in the left-hand side of a grammar rule. Note that the dark box labeled 'NP' (at the bottom of the figure) is linked to three boxes labeled 'NP.' This means that a partial parse tree constructed by the computing agent acting for the left symbol NP in Rule (4) is distributed to the three computing agents acting for the three occurrences of the symbol NP in Rules (1), (2), and (5). Note also that Rule (3) is left-recursive, which is represented as the feedback link in Figure 2.</Paragraph>
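The compilation of a rule set into a network can be sketched as follows. Only Rules (1) and (4) can be read off the text (Rule (4), NP → DET N, appears in Subsection 2.6); Rules (2), (3), and (5) below are hypothetical stand-ins chosen to match the description above, with NP occurring on the right-hand side of Rules (1), (2), and (5) and Rule (3) left-recursive. Naming agents by (rule number, position) is also an assumption of the sketch.

```python
from collections import defaultdict

# Rules as (lhs, rhs). Rules (1) and (4) come from the text; the others are
# hypothetical stand-ins for the rule set of Figure 2.
RULES = {
    1: ("VP", ["V", "NP"]),
    2: ("S", ["NP", "VP"]),        # hypothetical
    3: ("VP", ["VP", "PP"]),       # hypothetical left-recursive rule
    4: ("NP", ["DET", "N"]),
    5: ("PP", ["P", "NP"]),        # hypothetical
}


def compile_network(rules):
    """Return one agent per symbol occurrence plus the distribution links.

    Agents are named (rule_no, pos): pos 0 is the left symbol, pos 1, 2, ...
    are the right symbols. links[sym] lists every right-hand occurrence of
    sym, i.e. where a tree built by any left-symbol agent for sym is sent.
    """
    agents = {}
    links = defaultdict(list)
    for no, (lhs, rhs) in rules.items():
        agents[(no, 0)] = lhs
        for pos, sym in enumerate(rhs, start=1):
            agents[(no, pos)] = sym
            links[sym].append((no, pos))
    return agents, dict(links)


agents, links = compile_network(RULES)
print(len(agents))    # 15: one agent per symbol occurrence
print(links["NP"])    # [(1, 2), (2, 1), (5, 2)]
print(links["VP"])    # [(2, 2), (3, 1)]
```

The entry (3, 1) in the VP list is the feedback link: a VP tree built by the left-symbol agent of Rule (3) is sent back to that same rule's left-corner agent.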
    </Section>
    <Section position="3" start_page="774" end_page="774" type="sub_section">
      <SectionTitle>
2.3 Three Types of Computing Agents
</SectionTitle>
      <Paragraph position="0"> There are three types of computing agents: Type-1 corresponds to the left symbol in a grammar rule, Type-2 corresponds to the left-corner (i.e., left-most) right symbol, and Type-3 corresponds to the other right symbols. (If a grammar rule has more than two right symbols, each right symbol except the left-corner symbol is represented as a Type-3 agent.) For example, in Rule (1), VP is Type-1, V is Type-2, and NP is Type-3.</Paragraph>
      <Paragraph position="1"> 1 This subsection may be skipped if the idea of the scheme is already clear.</Paragraph>
      <Paragraph position="3"> A Type-1 computing agent A1 receives a concatenation of parse trees from the Type-3 agent acting for the right-most right symbol of its rule (e.g., NP in Rule (1)). A1 constructs a new parse tree whose root node is the non-terminal symbol that A1 acts for, and distributes it to all the Type-2 or Type-3 agents acting for occurrences of the same non-terminal symbol (e.g., VP in the case of Rule (1)).</Paragraph>
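A Type-1 agent's behavior can be sketched as a function from one received concatenation to a list of outgoing messages. The function name, the tuple-based data forms, and the link table (whose positions for Rules (2) and (5) are assumed) are illustrative only.

```python
def type1_receive(lhs, form, links):
    """Type-1 agent acting for `lhs`.

    `form` is the concatenation (start, end, child...) received from the last
    right-symbol agent of the rule. The agent builds a tree rooted in `lhs`
    and returns one (destination, tree) message per right-hand occurrence
    of `lhs` in the grammar.
    """
    start, end, *children = form
    tree = (start, end, (lhs, *children))
    return [(dest, tree) for dest in links.get(lhs, [])]


# Hypothetical link table: agents named (rule number, position in the right-hand side).
links = {"NP": [(1, 2), (2, 1), (5, 2)]}
messages = type1_receive("NP", (2, 4, ("DET", "a"), ("N", "girl")), links)
for dest, tree in messages:
    print(dest, tree)
```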
      <Paragraph position="4"> A Type-2 computing agent A2 receives a partial parse tree from some computing agent acting for the same symbol that A2 acts for, and simply passes it to the computing agent acting for the symbol occurrence right-adjacent to the one A2 is acting for. In the case of Rule (1), the Type-2 agent acting for V simply passes the received partial parse tree to the computing agent acting for NP. In the case where a grammar rule has just one right symbol, as in NP --&gt; N,</Paragraph>
      <Paragraph position="6"> a Type-2 agent acting for N sends the partial parse tree directly to the Type-1 agent acting for NP.</Paragraph>
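The type assignment and the forwarding rule for Type-2 and Type-3 agents reduce to positional arithmetic within a rule. In this hypothetical sketch a symbol occurrence is identified by (rule number, position), with position 0 for the left symbol; neither the naming scheme nor the function names come from the paper.

```python
def agent_type(pos):
    """Agent type for a symbol occurrence: pos 0 is the left symbol,
    pos 1 the left-corner right symbol, pos 2, 3, ... the other right symbols."""
    if pos == 0:
        return 1
    return 2 if pos == 1 else 3


def next_hop(rule_no, pos, rhs_len):
    """Destination of the trees a Type-2 or Type-3 agent passes on: the
    right-adjacent occurrence, or the rule's own Type-1 agent (position 0)
    when `pos` is the right-most right symbol."""
    if pos == 0:
        raise ValueError("Type-1 agents distribute trees instead of forwarding them")
    return (rule_no, 0) if pos == rhs_len else (rule_no, pos + 1)


# Rule (1), VP → V NP: the types of VP, V, NP and where V and NP forward.
print([agent_type(p) for p in range(3)])      # [1, 2, 3]
print(next_hop(1, 1, 2), next_hop(1, 2, 2))   # (1, 2) (1, 0)
```

For a rule with a single right symbol, such as NP → N, the Type-2 agent is also the right-most one, so next_hop sends its tree straight to the Type-1 agent.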
      <Paragraph position="7"> A Type-3 computing agent receives parse trees from two kinds of sources: Type-1 agents, and the Type-2 or Type-3 agent acting for its left-adjacent symbol occurrence. In the case of Rule (1), the Type-3 agent acting for NP receives partial parse trees from the Type-1 agents acting for NP as the left symbol of other rules (such as Rule (4)), and also from the Type-2 agent acting for V in Rule (1). Upon receiving a partial parse tree t1 from one of the sources, a Type-3 agent A3 checks whether it has already received, from the other kind of source, a partial parse tree that passes the boundary adjacency test against t1. If such a parse tree t2 has already arrived at A3, then A3 concatenates t1 and t2 and passes them to the computing agent acting for the symbol occurrence right-adjacent to the one A3 is acting for. If no such parse tree has arrived yet, A3 stores t1 in its local memory for future use. If there is no right-adjacent symbol in the grammar rule (that is, the symbol occurrence A3 is acting for is the right-most right symbol of the rule), A3 sends the concatenated trees to the Type-1 computing agent acting for the left symbol of the rule.</Paragraph>
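A Type-3 agent can be sketched as a small class with one memory per kind of source. This is a sequential sketch under assumptions: message sends become appends to an outbox list, and the class and method names are hypothetical.

```python
class Type3Agent:
    """Sketch of a Type-3 agent. Message sends are modelled as appends to
    `outbox`; a real implementation would send them asynchronously."""

    def __init__(self, next_hop):
        self.next_hop = next_hop   # right-adjacent agent, or the rule's Type-1 agent
        self.from_left = []        # trees from the left-adjacent Type-2/Type-3 agent
        self.from_type1 = []       # trees distributed by Type-1 agents
        self.outbox = []

    def receive_from_left(self, left):
        self.from_left.append(left)          # keep it: later trees may still match
        for right in self.from_type1:
            self._try(left, right)

    def receive_from_type1(self, right):
        self.from_type1.append(right)
        for left in self.from_left:
            self._try(left, right)

    def _try(self, left, right):
        # Boundary adjacency test; on success pass the concatenation on.
        if left[1] == right[0]:
            combined = (left[0], right[1]) + left[2:] + right[2:]
            self.outbox.append((self.next_hop, combined))


# N in Rule (4), NP → DET N: its next hop is the Type-1 agent for NP, named (4, 0).
n2 = Type3Agent(next_hop=(4, 0))
n2.receive_from_type1((3, 4, ("N", "girl")))   # arrives first: stored, no partner yet
n2.receive_from_left((2, 3, ("DET", "a")))     # passes the test against the stored tree
print(n2.outbox)
# [((4, 0), (2, 4, ('DET', 'a'), ('N', 'girl')))]
```

One design choice differs slightly from a literal reading of the prose: the sketch stores every arriving tree, not only unmatched ones, because a tree that has already found one partner may find another later (for example, when a longer NP such as 'a girl with a telescope' arrives after 'a girl'). Keeping every tree is what allows the network to produce all possible parses.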
    </Section>
    <Section position="4" start_page="774" end_page="774" type="sub_section">
      <SectionTitle>
2.4 Terminal Symbols as Computing Agents
</SectionTitle>
      <Paragraph position="0"> It should be noted that in our basic scheme we make no distinction between non-terminal symbols and terminal symbols. In fact, this uniform treatment contributes to the conceptual simplicity of our parsing scheme. No special treatment is needed for grammar rules such as:</Paragraph>
      <Paragraph position="2"> where the lower-case symbol 'and' is a terminal symbol. This uniformity implies that a word of a natural language that has more than one grammatical category, say 'fly' in English, should be described as follows: V --&gt; fly (8) N --&gt; fly (9) where Rules (8) and (9) indicate that the word 'fly' can be a verb or a noun. The two rules are represented by two Type-1 agents acting for V and N, and two Type-2 agents acting for the two occurrences of 'fly' in Rules (8) and (9). Thus, in our parsing scheme, the grammatical categories of every word in the vocabulary in use are described by grammar rules with a single right symbol. This means that, conceptually, one or more computing agents exist for each word. (Readers concerned about the number of computing agents acting for words should read Subsection 4.2.)</Paragraph>
    </Section>
    <Section position="5" start_page="774" end_page="775" type="sub_section">
      <SectionTitle>
2.5 Input to the Network
</SectionTitle>
      <Paragraph position="0"> In our parsing scheme, a given set of grammar rules is compiled into a network of computing agents in the manner described above. How, then, is an input string fed into the network of computing agents? We assume that an input string is a sequence of words (namely, terminal symbols).</Paragraph>
      <Paragraph position="1"> In feeding an input string into the network, two things have to be taken into account. The first is that, for each word in an input string, the appropriate computing agents to which the word should be sent must be found. Of course, these are the computing agents that act for the occurrences of the word in the grammar rules. Notice that there can be more than one such computing agent for each word, due to the multiplicity of grammatical categories and to multiple occurrences of the same symbol in the grammar rules. Since the set of appropriate computing agents is known when a given set of grammar rules is compiled, this information should be kept in a special computing agent that does the managerial work for the network. Let us call it the manager agent. The manager agent receives an input string and sends (or distributes) each word in the input string to the corresponding agents in the network in an on-line manner.</Paragraph>
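The manager agent's lookup table can be computed when the rules are compiled, which is where the uniform treatment of terminals in Subsection 2.4 pays off: lexical rules are ordinary single-right-symbol rules. This sketch uses Rules (8) and (9) and Rules (10) and (11) of Subsection 2.6; treating lower-case symbols as terminals follows the convention of Subsection 2.4, and the function names are hypothetical.

```python
from collections import defaultdict

# Lexical rules written as ordinary single-right-symbol rules.
RULES = {
    8: ("V", ["fly"]),
    9: ("N", ["fly"]),
    10: ("DET", ["a"]),
    11: ("N", ["girl"]),
}


def build_manager_table(rules):
    """At compile time, map each terminal to every agent acting for one of its occurrences."""
    table = defaultdict(list)
    for no, (_lhs, rhs) in rules.items():
        for pos, sym in enumerate(rhs, start=1):
            if sym.islower():            # assumption: terminals are lower-case words
                table[sym].append((no, pos))
    return dict(table)


def feed(table, words):
    """Manager agent: send each word, in input order, to all agents acting for it.
    Words with no agent are silently dropped in this sketch."""
    return [(agent, word) for word in words for agent in table.get(word, [])]


table = build_manager_table(RULES)
print(table["fly"])                 # [(8, 1), (9, 1)]
print(feed(table, ["a", "girl"]))   # [((10, 1), 'a'), ((11, 1), 'girl')]
```

Note that 'fly' is sent to two agents, one per grammatical category.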
      <Paragraph position="2"> The second thing to be considered in feeding the input is that information about the order of words in an input string must be provided to the computing agents in the network in an appropriate manner, because Type-3 computing agents need this information to perform the boundary adjacency test. For this, each word sent (or distributed) to computing agents in the network should be accompanied by its position in the input string. Suppose the input string is I saw a girl with a telescope. Then the word girl should be sent with the pair of its starting position and its ending position.</Paragraph>
      <Paragraph position="4"> The actual form of data (message) for the word girl may look like (3 4 girl). See Figure 3. The same data-form convention is adopted for more general parse trees. (In fact, a single word (terminal symbol) is the simplest case of a parse tree.)</Paragraph>
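Attaching positions to the words of an input string is straightforward; the sketch below assumes whitespace-separated words and produces the (start, end, word) forms described above. The function name is hypothetical.

```python
def to_data_forms(sentence):
    """Attach start and end positions to each word, as in (3 4 girl).
    Words are separated by whitespace (an assumption of this sketch)."""
    return [(i, i + 1, word) for i, word in enumerate(sentence.split())]


forms = to_data_forms("I saw a girl with a telescope")
print(forms[3])    # (3, 4, 'girl')
print(forms[-1])   # (6, 7, 'telescope')
```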
    </Section>
    <Section position="6" start_page="775" end_page="776" type="sub_section">
      <SectionTitle>
2.6 How Partial Parse Trees Flow
</SectionTitle>
      <Paragraph position="0"> To get a more concrete sense of how symbols are processed in the network, let us look at the flows of the words a and girl in the initial phase (see Figure 4). Assuming that the following rules are compiled in addition to Rules (1)</Paragraph>
      <Paragraph position="2"> the manager agent sends (2 3 a) and (3 4 girl) to the Type-2 computing agent acting for a in Rule (10) and to the Type-2 computing agent acting for girl in Rule (11), respectively. These are in turn sent to a Type-1 agent Det1 acting for DET in Rule (10) and a Type-1 agent N1 acting for N in Rule (11), respectively. These Type-1 agents construct parse trees whose root-node labels are DET and N, respectively.</Paragraph>
      <Paragraph position="3"> Then the parse tree constructed by Det1 is sent to a Type-2 agent Det2 acting for DET in Rule (4). Similarly, the parse tree constructed by N1 is sent to a Type-3 agent N2 acting for N in Rule (4). In both cases, the positional information goes along with the tree; that is, the actual data forms sent are (2 3 (DET a)) and (3 4 (N girl)).</Paragraph>
      <Paragraph position="4"> Agent Det2 simply passes the parse tree to agent N2.</Paragraph>
      <Paragraph position="5"> N2 performs the boundary adjacency test between (2 3 (DET a)) and (3 4 (N girl)) and finds that it succeeds. N2 therefore concatenates the two data forms into a new single data form: (2 4 (DET a) (N girl)) This new data form is then sent to the Type-1 agent acting for NP in Rule (4). This agent constructs a data form of the parse tree for NP, which looks like: (2 4 (NP (DET a) (N girl))) This data form is distributed among the Type-2 and Type-3 computing agents acting for the symbol NP in the network. (See Figure 4.) Finally, when a computing agent acting for S receives a message (0 7 (S ... )), we can say that a complete parse tree for the input string has been constructed as part of the message.</Paragraph>
      <Paragraph position="6"> Note that the actions taken by computing agents such as Det1, Det2, N1, and N2 are all performed in parallel. Also note that these computing agents keep being activated as long as data forms continue to arrive, and the computing agents acting for S keep receiving messages containing (partial) parse trees whose root-node label is S.</Paragraph>
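The walk-through above can be replayed with a small sequential simulation. It is a sketch under stated assumptions, not the authors' implementation: agents are named (rule number, position) with position 0 for the Type-1 agent, the manager's routing and all Type-2 and Type-3 behavior are inlined into one loop, and a single FIFO queue stands in for parallel, asynchronous delivery. Rules (4), (10), and (11) are those used in this subsection.

```python
from collections import deque

# Rules (4), (10), (11): NP → DET N, DET → a, N → girl.
RULES = {4: ("NP", ["DET", "N"]), 10: ("DET", ["a"]), 11: ("N", ["girl"])}


def run_network(rules, forms):
    """Sequential simulation of the agent network for `rules`.

    Pending messages sit in one FIFO queue instead of being processed in
    parallel. Returns every tree built by a Type-1 agent, in build order.
    """
    links = {}                                   # symbol → right-hand occurrences
    for no, (_lhs, rhs) in rules.items():
        for pos, sym in enumerate(rhs, start=1):
            links.setdefault(sym, []).append((no, pos))
    memory = {}                                  # Type-3 agent → (lefts, others)
    built = []
    queue = deque()
    for form in forms:                           # the manager agent feeds the words
        for agent in links.get(form[2], []):
            queue.append((agent, "dist", form))
    while queue:
        (no, pos), source, form = queue.popleft()
        lhs, rhs = rules[no]
        if pos == 0:                             # Type-1: build and distribute
            tree = (form[0], form[1], (lhs,) + form[2:])
            built.append(tree)
            for agent in links.get(lhs, []):
                queue.append((agent, "dist", tree))
            continue
        hop = (no, 0) if pos == len(rhs) else (no, pos + 1)
        if pos == 1:                             # Type-2: pass it on
            queue.append((hop, "left", form))
            continue
        lefts, others = memory.setdefault((no, pos), ([], []))   # Type-3
        if source == "left":
            lefts.append(form)
            pairs = [(form, o) for o in others]
        else:
            others.append(form)
            pairs = [(lt, form) for lt in lefts]
        for lt, rt in pairs:
            if lt[1] == rt[0]:                   # boundary adjacency test
                queue.append((hop, "left", (lt[0], rt[1]) + lt[2:] + rt[2:]))
    return built


words = [(2, 3, "a"), (3, 4, "girl")]
print(run_network(RULES, words)[-1])
# (2, 4, ('NP', ('DET', 'a'), ('N', 'girl')))
print(run_network(RULES, list(reversed(words)))[-1] == run_network(RULES, words)[-1])  # True
```

Feeding the words in reverse order yields the same NP tree, because each Type-3 agent stores trees from both sources until a partner arrives.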
    </Section>
  </Section>
  <Section position="5" start_page="776" end_page="776" type="metho">
    <SectionTitle>
3 Applications
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="776" end_page="776" type="sub_section">
      <SectionTitle>
3.1 On-Line Parsing and Overlap Parsing
</SectionTitle>
      <Paragraph position="0"> Our scheme does not require the network of computing agents to be fed any token indicating the end of an input string before parsing can start. That is, an input string can be processed word by word from the beginning in an on-line fashion. Even if feeding an input string to the network is suspended in the middle of the string, partial parse trees can be constructed from the part of the input string fed so far, and feeding of the rest of the input string can be resumed at any moment.</Paragraph>
      <Paragraph position="1"> Thus, our parsing scheme is quite useful in real-time applications such as interpreting telephony (simultaneous interpretation). Notice that our scheme does not require the input string to be fed from left to right; words in the input string can be fed in any order as long as each word is accompanied by its position in the input string (cf. Subsection 2.5). Our parsing scheme also has no difficulty when more than one input string is fed to the network simultaneously, as long as the different input strings are kept separate. The separation can easily be made by attaching the same tag (or token) to each word of the same input string. The tag is copied to, and inherited by, the partial parse trees constructed from that input string. When a Type-3 computing agent tests the boundary adjacency between two partial parse trees, it additionally checks that their tags are the same. This capability of handling multiple input strings is useful in processing the overlapping utterances of two or more persons engaged in conversation.</Paragraph>
      <Paragraph position="2"> This way of handling the multiplicity of input strings is similar to the idea of color tokens used in data-flow computer architectures.</Paragraph>
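Tagged input strings need only a small change to the adjacency test. The sketch assumes a hypothetical layout (tag, start, end, tree...) for data forms, with the tag identifying the utterance; the speaker tags 'A' and 'B' and the sample words are illustrative.

```python
# Tagged data forms: (tag, start, end, tree...). The tag identifies the input
# string (for example, the speaker); this layout is an assumption of the sketch.

def adjacent(left, right):
    """Boundary adjacency test that also requires the same input-string tag."""
    return left[0] == right[0] and left[2] == right[1]


def concatenate(left, right):
    """Join two adjacent tagged forms; the result inherits the shared tag."""
    return (left[0], left[1], right[2]) + left[3:] + right[3:]


a_saw = ("A", 1, 2, ("V", "saw"))
a_np = ("A", 2, 4, ("NP", ("DET", "a"), ("N", "girl")))
b_np = ("B", 2, 4, ("NP", ("DET", "a"), ("N", "dog")))

print(adjacent(a_saw, a_np))   # True
print(adjacent(a_saw, b_np))   # False: same positions, different utterances
print(concatenate(a_saw, a_np))
```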
      <Paragraph position="4"/>
    </Section>
    <Section position="2" start_page="776" end_page="776" type="sub_section">
      <SectionTitle>
3.2 Unparsing
</SectionTitle>
      <Paragraph position="0"> Suppose the user is typing an input string on a keyboard and hits the 'backspace' key to correct previously typed words. If these incorrect words have already been fed to the network, our parsing scheme is able to unparse the incorrect portion of the input string and allow the user to retype it. Furthermore, the user can continue to type the rest of the originally intended input string.</Paragraph>
      <Paragraph position="1"> This unparsing capability is realized by the use of anti-messages. The anti-message /Jefferson85/ of a message M sent to a computing agent A is a message sent to A in order to cancel the effects caused by M. The actual task of cancelling the effects is carried out by A. (Thus A must have been programmed beforehand so that it can accept cancelling messages and perform the cancelling task.) If necessary, A must in turn send anti-messages to cancel the effects caused by the messages A itself has already sent. In implementing the unparsing capability, the express-mode message passing of ABCL/1 /Yonezawa86/ is useful; it is a kind of interrupt-like, high-priority message passing.</Paragraph>
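Anti-message handling in a Type-3 agent might look like the sketch below. It is hypothetical throughout: the class name, the ("msg", form) and ("anti", form) message encoding, and the bookkeeping are assumptions, not taken from ABCL/1 or from /Jefferson85/. The design choice is that the agent records which outputs each stored tree took part in, so that cancelling a tree yields anti-messages for exactly those outputs and for no output that was already cancelled.

```python
class UndoableType3:
    """Type-3 agent sketch that also accepts anti-messages.

    It remembers which outputs each stored tree took part in, so that
    cancelling a tree yields anti-messages for exactly those outputs."""

    def __init__(self):
        self.lefts, self.others = [], []
        self.produced = {}       # input form → outputs it contributed to
        self.live = set()        # outputs sent and not yet cancelled

    def receive(self, form, from_left):
        mine, theirs = (self.lefts, self.others) if from_left else (self.others, self.lefts)
        mine.append(form)
        sends = []
        for other in theirs:
            left, right = (form, other) if from_left else (other, form)
            if left[1] == right[0]:
                out = (left[0], right[1]) + left[2:] + right[2:]
                self.produced.setdefault(left, []).append(out)
                self.produced.setdefault(right, []).append(out)
                self.live.add(out)
                sends.append(("msg", out))
        return sends

    def anti(self, form, from_left):
        """Cancel `form`: forget it and return anti-messages for its live outputs."""
        (self.lefts if from_left else self.others).remove(form)
        cancelled = [out for out in self.produced.pop(form, []) if out in self.live]
        self.live.difference_update(cancelled)
        return [("anti", out) for out in cancelled]


agent = UndoableType3()
agent.receive((2, 3, ("DET", "a")), from_left=True)
print(agent.receive((3, 4, ("N", "girl")), from_left=False))
# [('msg', (2, 4, ('DET', 'a'), ('N', 'girl')))]
print(agent.anti((3, 4, ("N", "girl")), from_left=False))   # the user hit backspace
# [('anti', (2, 4, ('DET', 'a'), ('N', 'girl')))]
```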
    </Section>
    <Section position="3" start_page="776" end_page="776" type="sub_section">
      <SectionTitle>
3.3 Pipe-Lining to Semantic Processing Agents
</SectionTitle>
      <Paragraph position="0"> Our parsing scheme produces all the possible (partial) parse trees for a given input string. In fact, if each Type-1 computing agent in the network stores in its local memory all the parse trees it constructs, all the components of the triangular matrix used in the CKY parsing method (i.e., all the possible parse trees) are stored among the Type-1 agents in the network in a distributed manner. If semantic processing is required, these partial or complete parse trees can be sent to computing agents that do semantic processing.</Paragraph>
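Gathering the trees stored by the Type-1 agents by span recovers the cells of the CKY triangular matrix. The memory layout and function name in this sketch are hypothetical; the agent names follow Subsections 2.6 and 3.3.

```python
from collections import defaultdict


def cky_chart(type1_memories):
    """Collect the trees stored by all Type-1 agents into CKY-style cells.

    `type1_memories` maps an agent name to the list of (start, end, tree)
    forms it has built; cell (i, j) lists the root labels spanning words i..j-1.
    """
    chart = defaultdict(list)
    for forms in type1_memories.values():
        for start, end, tree in forms:
            chart[(start, end)].append(tree[0])
    return dict(chart)


# Hypothetical memories after parsing "a girl".
memories = {
    "Det1": [(2, 3, ("DET", "a"))],
    "N1": [(3, 4, ("N", "girl"))],
    "Np1": [(2, 4, ("NP", ("DET", "a"), ("N", "girl")))],
}
print(cky_chart(memories))
# {(2, 3): ['DET'], (3, 4): ['N'], (2, 4): ['NP']}
```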
      <Paragraph position="1"> Actually, parse trees can be sent to semantic processing agents in a pipe-lining manner. Suppose a Type-1 computing agent Np1 is acting for an occurrence of the non-terminal symbol NP. Instead of letting Np1 distribute the parse trees it constructs directly to the Type-2 or Type-3 agents acting for occurrences of the symbol NP, we can let Np1 send the parse trees to a semantic processing agent, which checks their semantic validity in a pipe-lining manner. After filtering by the semantic processing agent, only semantically valid parse trees (possibly with semantic information attached) are distributed to the Type-2 or Type-3 computing agents acting for NP. See Figure 5.</Paragraph>
      <Paragraph position="2"> These semantic filtering agents can be inserted on any link between Type-1 agents and Type-2 or Type-3 agents.</Paragraph>
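A semantic filtering agent placed on a link can be modelled as a wrapper around the link's delivery function. Both the wrapper and the toy validity check (a fixed set of known nouns) are hypothetical; a real semantic agent would apply whatever checks the application requires.

```python
KNOWN_NOUNS = {"girl", "telescope"}   # hypothetical semantic lexicon


def np_is_valid(form):
    """Toy semantic check: accept an NP tree whose last child is a known noun."""
    _start, _end, (_label, *children) = form
    return children[-1][1] in KNOWN_NOUNS


def with_semantic_filter(distribute, is_valid):
    """Insert a semantic agent on the link leaving a Type-1 agent: only trees
    that pass `is_valid` are handed on to `distribute`."""
    def link(form):
        if is_valid(form):
            distribute(form)
    return link


delivered = []                       # stands in for the Type-2/Type-3 NP agents
np1_out = with_semantic_filter(delivered.append, np_is_valid)
np1_out((2, 4, ("NP", ("DET", "a"), ("N", "girl"))))
np1_out((5, 7, ("NP", ("DET", "a"), ("N", "unicorn"))))
print(delivered)                     # only the 'girl' NP gets through
```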
      <Paragraph position="3"> The complete separation of the semantic processing phase from the syntactic processing phase in usual natural language processing systems corresponds to placing semantic processing agents only after the Type-1 computing agents that act for the non-terminal symbol S, which stands for correct sentences.</Paragraph>
    </Section>
  </Section>
</Paper>