File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/84/p84-1052_metho.xml
Size: 22,588 bytes
Last Modified: 2025-10-06 14:11:36
<?xml version="1.0" standalone="yes"?> <Paper uid="P84-1052"> <Title>Simplifying Deterministic Parsing</Title> <Section position="3" start_page="0" end_page="0" type="metho"> <SectionTitle> 2. CONSTITUENT BUFFER </SectionTitle> <Paragraph position="0"> DPARSER maintains a constituent buffer which is manipulated by the grammar to derive the constituent structure of the input sentence. Each node of the buffer contains a constituent consisting of a set of feature-type, feature-value pairs, and a set of subconstituents. When parsing begins the constituent buffer contains a single node with an associated subgrammar for parsing sentence constituents. As the subgrammar of the sentence node examines the buffer positions to its right, words are brought in from the input sentence to fill the empty positions. When the grammar discovers a subconstituent phrase to be parsed, it performs a PUSH operation specifying a subgrammar for parsing the constituent and the position of the rightmost word in the constituent phrase. The PUSIt operation inserts a new node into the buffer immediately preceding the constituent phrase and begins executing the specified Isupported in part by an I.W. Killaw Predoctoral Fellowship 2supported in part by the Blum-Kovler Foundation, Chicago, Ill.</Paragraph> <Paragraph position="1"> subgrammar. This subgrammar may of course perform its own PUSH operations and the same process will be repeated. Once the subeonstituent is complete control returns to the sentence node and the buffer will contain the parsed constituent in place of those which made up the constituent phrase. The sentence node can now attach the parsed constituent removing it from the buffer. When all the subconstituents of the sentence node have been attached the parsing is complete.</Paragraph> <Paragraph position="2"> To familiarize the reader with the form of the constituent buffer we consider the processing of the sentence Jones teaches the course, as the final NP is about to be parsed. Figure 1 shows the current state of each buffer node giving its position, state of execution, essential syntactic features, and the phrase which it dominates so far. Following the terminology of Marcus we refer to the nodes which have associated subgrammars as active nodes and the one currently executing is called the current active node. All buffer positions are given relative to the current active node whose position is labeled **&quot; .</Paragraph> <Paragraph position="3"> The buffer in its current state contains two active nodes: the original sentence node and a new node which was created to parse the sentence predicate (i.e. verb phrase and its complements}. The next modification of the buffer takes place when the subgrammar for the predicate node examines its first position causing the word the to be inserted in that position. At this point a bottom-up parsing mechanism recognizes that this is the beginning of a noun phrase and a PUSH is performed to parse it; this leaves the buffer in the state shown in Figure 2 .</Paragraph> <Paragraph position="4"> The subgrammar for the noun phrase now executes and attaches the words the and course. It then examines the buffer for modifiers of the simple NP which causes the final punctuation, &quot;.&quot;, to be inserted into the buffer. Since the period can not be part of a noun phrase, the subgrammar ends its execution, the PUSH is</Paragraph> </Section> <Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> POSITION -1 active SYNCLASS S SENT-TYPE DECL (Jooc,) POSITION * current active SYNCLASS PHED VTYPE ONE-OBJ {te~cAe,) UNSEEN WORDS: the course. </SectionTitle> <Paragraph position="0"/> </Section> <Section position="5" start_page="0" end_page="239" type="metho"> <SectionTitle> SYNCLASS DET WORD THE EXT DEF (,he) UNSEEN WORDS: course. </SectionTitle> <Paragraph position="0"> pxrsin~ the noun phrase completed, and the predicate node again becomes the current active node. The resulting state of the buffer is shown in Figure 3; the words the and course have been replaced by the noun phrase constituent which dominates them.</Paragraph> <Paragraph position="1"> Aside from PUSH and ATTACH, the following three operations are commonly used by the grammar to manipulate the constituent buffer.</Paragraph> <Paragraph position="2"> LABEL label a constituent with a syntactic feature MOVE move a constituent from one position to another INSERT insert a word into a specified position Examples of these actions are presented in the following section. The differences between the data structures maintained by PARSIFAL and DPARSER are for the most part conceptual. PARSIFAL's active nodes are stored in an active node stack which is separate from the constituent buffer. To allow active nodes to parse constituent phrases which are not at the front of the buffer an offset into the buffer can be associated with an active node. The control of which active node is currently executing is affected through operations which explicitly manipulate the active node stack. Church's deterministic parser, YAP \[Church 80\], uses a constituent buffer consisting of two halls: an upper buffer and a lower</Paragraph> </Section> <Section position="6" start_page="239" end_page="239" type="metho"> <SectionTitle> SYNCLASS FINAL-PUNCT WORD . (.) </SectionTitle> <Paragraph position="0"> Figure3. after the push is completed buffer. The grammar rules try to attach nodes from the lower buffer to those in the upper buffer. While this structure is similar to PARSIFAL's, it does not draw such a rigid distinction between active and inactive nodes. There are no separate subgrammars associated with the nodes which constituents are being attached to, and nodes may be moved freely from one buffer to the other allowing them to be attached before they are complete. While our constituent structure does maintain active nodes with separate subgrammars, the control of the parsing process is similar to that used by Church in that it is possible for incomplete nodes to be attached.</Paragraph> <Paragraph position="1"> As will be seen in a latter section this is an essential feature of DPARSER's constituent buffer.</Paragraph> </Section> <Section position="7" start_page="239" end_page="240" type="metho"> <SectionTitle> 3. SEQUENCES </SectionTitle> <Paragraph position="0"> In DPARSER each constituent is assigned a sequence. Each sequence consists of n list of steps which are applied to the buffer in the order specified by the sequence. A step operator indicates how many times each step can apply: steps marked with ~+&quot; need never apply, those marked by &quot;=&quot; must apply once, and those marked by &quot;*&quot; can apply any number of times. A step may call another sequence which has the effect of inserting immediately following that step, the steps of the named sequence.</Paragraph> <Paragraph position="1"> Each step consists of a list of rules where the priority of the rules are made explicit by their ordering in the list. Each rule is of the form \[Pl\] \[P~\] &quot;&quot;\[PJ --> (al){a2)'&quot;(a) Each precondition, p,. tests a buffer node for the presence or absence of specified feature-type, feature-value pairs. When a rule is applied each action, a c is evaluated in the specified order. In attempting to apply a step each of the step's rules is tested in order, the first one whose preconditions match the current buffer state is performed.</Paragraph> <Paragraph position="2"> In order to recognize certain constituent types bottom-up, sequences may be associated with a bottom-up precondition. When the parser encounters a node which matches such a precondition, a PUSH to the sequence is performed. This mechanism is equivalent to PARSIFAL's attention shifting rules and is used primarily for parsing noun phrases.</Paragraph> <Paragraph position="3"> In order to clarify the form of a sequence, the example sequence TRANS-MAJOR-S shown in Figure 4 is discussed in detail.</Paragraph> <Paragraph position="4"> This sequence is associated with the initial sentence node of every input sentence. It performs the operations necessary to reduce the task of parsing an input sentence to that of parsing a normal sentence constituent as would occur in a relative clause or a sentence complement. While this sequence will misanMyze certain sentences it does handle a large number through a small set of rules.</Paragraph> <Paragraph position="5"> STEP 1 handles the words which and who which behave differently when they appear at the beginning of a sentence. The first rule determines if which is the first word; if it is then it labels it as a determiner. The second rule handles who which is labels as a STEP 2 examines the initial constituents of the sentence to determine whether the sentence is imperative, interrogative, declarative, etc.. Since each sentence must be analyzed as one of these types the step is modified by the &quot;-----&quot; operator indicating that one of the step's rules must apply. The first rule tests whether the initial constituent of the sentence is a WH type NP; NP's like who, which professor, what time, etc. fall into this category. If this precondition succeeds then the sentence is labeled as a question whose focus is a noun phrase. The second rule tests for a leading NP and, if it is found, the sentence is labeled as declarative. Note that this rule will not be tested if the first rule is successful and the step depends on this feature of step evaluation. The following rule tries to determine if have, appearing as the first word in a sentence, is a displaced auxiliary or is the main verb in an imperative sentence. If the rule succeeds then the sentence is labeled as imperative, otherwise the following rule will label any sentence beginning with an auxiliary as a yes/no type question. The final rule of the step labels sentences which begin with a tenseless verb as imperatives.</Paragraph> <Paragraph position="6"> STEP 3 picks up a constituent which has been displaced to the front of the sentence and places it in the special WH-COMP register. Generally a constituent must have been displaced if it is a WH type Nap followed by an auxiliary followed by another NaP; however, an exception to this is sentences like Who is the professor? in which the entire sentence consists of these three constituents.</Paragraph> <Paragraph position="7"> STEP 4 undoes any interrogative or imperative transformations. The first rule moves a displaced auxiliary around the NP in sentences like Has Jones taught Lisp ~ and When did Jones teach Lisp f. Note that for the latter sentence the previous step would have picked up when and hence did would be at the front of the buffer. The second rule of this step inserts you into the buffer in front of imperative sentences.</Paragraph> <Paragraph position="8"> Like DPARSER, PARSIFAL's grammar language is composed of a large set of production rules. The major difference between the two languages is how the rules are organized. PARSIFAL's rules are divided into packets several of which may be active at once. At any point in the parsing each of the rules in each active packet may execute if its precondition is matched. In contrast to this organization, DPARSER's sequences impose a much stronger control on the order of execution of the productions.</Paragraph> <Paragraph position="9"> Aside from the bottom up parsing mechanism the only competition between rules is between those in the individual steps. The purpose of constraining the order of execution of the productions is to reflect the fact that the parsing of a particular constituent type is essentially a sequential process. Most of the rules involved in the parsing of a constituent can only apply at a particular point in the parsing process. This is particularly true of transformational rules and rules which attach constituents. Those rules which can apply at various points in the parsing may be repeated within the sequence so that they will only be tested when it is possible for them to apply and they will not be allowed to apply at points where they should not. Clearly the necessity to repeat rules at different points in a sequence can increase the size of the grammar; however, it is felt that a grammar which clearly specifies the possible set of actions at each point can be more easily understood and modified.</Paragraph> </Section> <Section position="8" start_page="240" end_page="240" type="metho"> <SectionTitle> 4. SEMANTIC PROCESSING </SectionTitle> <Paragraph position="0"> While semantic processing was outside Marcus's central concern a semantic system was developed which operates in parallel with PARSIFAL , constructing the semantic representation as its subconstituents were attached. In order to deal with syntactic ambiguities the action part of rules can contain semantic tests which compare the semantic well-formedness of interpretations resulting from a set of possible attachments. Such comparative tests can choose between one or more constituents to attach in a particular syntactic role; for example a rule for attaching a direct object can use such a test to choose whether to attach a displaced constituent or the next constituent in the buffer. Comparative tests can also be used to decide whether to attach an optional modifier (such as a prepositional phrase) or leave it because it better modifies a higher level node. Unfortunately this latter class of tests requires each rule which attaches an optional modifier to determine each node which it is syntactically possible to attach the node to. Once this set of syntactically possible nodes is found, semantics must be called to determine which is the best semantic choice. Such tests complicate the grammar by destroying the modularity between the subgrammars which parse different constituent types.</Paragraph> <Paragraph position="1"> For the LUNAR system \[Woods 73\] Woods added an experimental facility to the basic ATN framework which allowed an ATN to perform such comparative tests without requiring them to be explicitly coded in the grammar. The Selective Modifier Placement mechanism was invoked upon completion of an optional modifier such as a PP. It then collected all the constituents which could attach the modifier and performed the attachment it determined to be the best semantic fit. A mechanism similar to this is incorporated as a central part of DPARSER and is intended to be used whenever an attachment is locally optional. Before giving the details of this mechanism we discuss the semantic interface in general.</Paragraph> <Paragraph position="2"> In DPARSER a narrow interface is maintained between syntax and semantics which alleviates the grammar writer of any responsibility for semantic processing. The interface consists of the ATTACH action which immediately performs the specified attachment and the W-ATTACH test which only succeeds if the attachment can be performed in light of the other constituents which may want to attach it.</Paragraph> <Paragraph position="3"> Both ATTACH and IF-ATTACH have the same parameters: the buffer position of the constituent to be attached and a label identifying the syntactic relationship between the constituent and its parent. Such a label is equivalent to a &quot;functional label&quot; of the BUS system \[Bobrow & Webber 80\]. When an attachment is performed the semantic system is passed the parameters of the attachment which it then uses to recompute the interpretation of the current active node.</Paragraph> <Paragraph position="4"> W-ATTACH tests are included as the final precondition of those grammar rules which wish to attach a trailing modifier; the test returns true if it is syntactically possible for the modifier to be attached and the modifier best semantically modifies that node. If the test is true then the attachment is performed as a side effect of the test.</Paragraph> <Paragraph position="5"> To the grammar writer the IF-ATTACH test has the prescient capability to foresee which active node should be allowed to attach the modifier and immediately returns true or false. However, the implementation requires that when an IF-ATTACH test is performed, the current active node must be suspended and the node which pushed to it restarted. This node can then execute normally with the suspended active node appearing like any other node in the buffer. The node continues executing until it either completes, in which case the process continues with the next higher active node, or it encounters the IF-ATTACHed node. If, at this point, the active node issues another IF-ATTACH then this new request is recorded with the previous ones and the process continues with the next higher active node. This sequence of suspensions will end if an active node becomes blocked because it expects a different constituent type than the one in the position of the IF-ATTACHed node. When this occurs the interpretations which would result from each of the pending IF-ATTACH tests are computed and the attachment whose interpretation the semantic system considers to be the most plausible is performed. Alternately, a sequence of suspensions may be terminated when an active node ATTACHes the node that the suspended active nodes had tried to IF-ATTACH. Such a situation, an example of which occurs in the parsing of the sentence Is the block in the boar, indicates that the pending W-ATTACH requests are syntactically impossible and so must fail.</Paragraph> <Paragraph position="6"> The following example shows how the IF-ATTACH mechanism is used to handle sentences where the attachment of a prepositional phrase is in question. We consider the parsing of the sentence Jones teaches the course in Lisp. We start the example immediately following the parsing of the PP (Figure 5). At this point the sequence Fi\[~ure 6. after the completion of 'in Lisp' for the noun phrase is about to apply the rule shown in Figure 6&quot; which tries to attach PP modifiers. Since the precondition preceding the IF-ATTACH test is true the IF-ATTACH test is made. This causes the current active node to be suspended until it can be decided whether the attachment can be performed {Figure 7).</Paragraph> <Paragraph position="7"> Control now returns to the predicate node which attaches the suspended NP as the object of the verb. As normally occurs after an attachment, the NP node is removed from the buffer; however, because the node will eventually be restarted it retains a virtual buffer position. The sequence for parsing the predicate now applies the same IF-ATTACH rule (Figure 6) to attach any prepositional phrase modifiers. Again since the PP is the first constituent in the buffer the IF-ATTACH test is performed and the predicate node is suspended returning control to the sentence active node (Figure 8).</Paragraph> <Paragraph position="8"> When the sentence node restarts it execution, it attaches the predicate of the sentence leaving the buffer as shown in Figure 9.</Paragraph> </Section> <Section position="9" start_page="240" end_page="241" type="metho"> <SectionTitle> \[I SYNCLASS PP\]~F-ATTACH I PP-MOD\] -~ </SectionTitle> <Paragraph position="0"> Figure0. after the subject and predicate have been attached Ilaving found a complete sentence the sentence node executes a final step which expects to find the final punctuation; since there is none the step fails. This failure triggers the arbitration of the set of pending IF-ATTACH requests for the attachment of the PP. In this case the semantic system determines that the PP should modify the NP. The parser then restarts the NP node at the point where it i~ued the IF-ATTACH and allows it to make the attachment (Figure 10). The NP node then tries again to attach a PP but seeing only the period it realizes that its constituent is complete and terminates. null Next the monitor restarts the predicate active node but does not allow it to make the attachment. This results in the node eventually terminating without performing any more actions. At this point each of the IF-ATTACH requests have been processed and the step whose failure caused the processing of the requests is retried. This time it is successful in finding the final punctuation and attaches it. The parse is now complete {Figure 11).</Paragraph> <Paragraph position="1"> Aside from prepositional phrase attachment there are many other situations where optional modifiers can arise. For example in</Paragraph> </Section> <Section position="10" start_page="241" end_page="241" type="metho"> <SectionTitle> POSITION -I active SYNCLASS S SENT-TYPE DECL </SectionTitle> <Paragraph position="0"> (Jones teaches the course in lisp)</Paragraph> </Section> <Section position="11" start_page="241" end_page="241" type="metho"> <SectionTitle> DELETED suspended active SYNCLASS PRED VTYPE ONE-OBJ </SectionTitle> <Paragraph position="0"> (teaches the course in Lisp)</Paragraph> </Section> <Section position="12" start_page="241" end_page="241" type="metho"> <SectionTitle> SYNCLA3S FINAL-PUNCT WORD. </SectionTitle> <Paragraph position="0"/> </Section> <Section position="13" start_page="241" end_page="241" type="metho"> <SectionTitle> POSITION * current active SYNCLASS S SENT-TYPE DECL </SectionTitle> <Paragraph position="0"> (Jones teaches the course ia Lisp .) Fisure It.</Paragraph> <Paragraph position="1"> the sentence ! saw the boy using the telescope the phrase using the telescope may modify boy as a relative clause where the relative pronoun has been deleted, or it may modify saw where the preposition by has been deleted. Another example is the sentence Is the block in the boz?. In this sentence the PP in the b0z must, for syntactic reasons, complement the verb; however, in the local context of parsing the NP the block, it is possible for the PP to modify it. IF-ATTACH can easily be extended to attach optional pre-modifiers; it could then be used to derive the internal structure of such complex noun phrases a8 the Lisp course programming assignment.</Paragraph> <Paragraph position="2"> The IF-ATTACH test is proposed as a mechanism to solve this general cl~s of problems without requiring the grammar writer to explicitly list all constituents to which an unattached constituent can be attached. Instead, it is sufficient to indicate that a trailing modifier is optional and the monitor does the work in determining whether the attachment should be made.</Paragraph> </Section> class="xml-element"></Paper>