<?xml version="1.0" standalone="yes"?>
<Paper uid="E83-1017">
  <Title>AN ISLAND PARSING INTERPRETER FOR THE FULL AUGMENTED TRANSITION NETWORK FORMALISM</Title>
  <Section position="3" start_page="0" end_page="101" type="metho">
    <SectionTitle>
I INTRODUCTION
A. Island Parsing
</SectionTitle>
    <Paragraph position="0"> In an ordinary ATN parser, the parsing of a sentence is performed unidirectionally (normally left-to-right); the parser traverses each arc in the directed graph of the grammar in the same direction, starting from the initial state.</Paragraph>
    <Paragraph position="1"> An island ATN parser, on the other hand, can start at any point in the transition network with a word match from anywhere in the input string, not just at the left end, and parse the rest of the string working outwards to the left and right, adding words to each end of the 'island' formed. Indeed, any number of islands can be built, the parser merging the islands together as their boundaries meet. Clearly, in speech processing, island parsing is well suited to gearing sentence processing to the most solid inputs from the acoustic analyser.</Paragraph>
    <Paragraph position="2"> The main problems with previous implementations of island parsing for ATNs have been with scope clauses and LIFTR and SENDR actions; essentially, these problems arise because in island parsing structure determination has to work from right-to-left as well as in the more usual left-to-right direction, i.e. against the normal parsing flow.</Paragraph>
    <Paragraph position="3"> B. Scope Clauses The ATN formalism provides for actions on the arcs of the network which can set and modify the contents of 'registers', and arbitrary tests on an arc to determine whether that arc is to be followed.</Paragraph>
    <Paragraph position="4"> In an island parser, an action or test is referred to as being context-sensitive when it either requires the value of a register that is set somewhere to the left, or changes the value of a register that is used somewhere also to the left. For each context-sensitive action or test, there exists a set of states to its left such that the action can safely be performed if its execution is delayed until the parse has passed through one of these states. This list of states must be expressed, and in the HWIM system (Woods, 1976), this is done when writing the grammar by using a scope clause. The form of a scope clause is (SCOPE &lt;scope specification&gt; &lt;list of context-sensitive actions&gt;), where the scope specification is the list of precursor states. This requirement for prior specification of scope clauses clearly adds to the burden of the grammar writer.</Paragraph>
    <Paragraph position="5"> I have implemented a more satisfactory treatment of scope clauses. This is described below, following the discussion of LIFTR and SENDR actions, which require special handling in scoping.</Paragraph>
    <Paragraph position="6"> II LIFTR AND SENDR ACTIONS Two important actions (indeed it is difficult to write a grammar of any substantial subset of English without them) defined by Woods (1970), namely LIFTR and SENDR, present implementation difficulties in an island parsing interpreter. These actions were evidently excluded from the HWIM parser since there is no mention of them by Woods (1976).</Paragraph>
    <Paragraph position="7"> The action LIFTR can occur on any arc in the network, to transmit the value of a register up to the next higher level in the network, whereas SENDR can only occur on a PUSH arc, to transmit the value of a register down to a lower level.</Paragraph>
    <Paragraph position="8">  A. LIFTR The same mechanism can be used to implement LIFTR actions as is used to transmit the result of each lower level computation up to the next higher level as the value of the special register '*'.</Paragraph>
    <Paragraph position="9"> However, LIFTR presents problems with scope clauses in an island parsing ATN interpreter: if an action</Paragraph>
  </Section>
  <Section position="4" start_page="101" end_page="101" type="metho">
    <SectionTitle>
(LIFTR &lt;register&gt; ...)
</SectionTitle>
    <Paragraph position="0"> occurs in a sub-network, any action using that register in any higher sub-network that PUSHes for the one containing the LIFTR must be scoped so that the action is not performed in a right-to-left parse at least until after the PUSH has been executed. See figure 1.</Paragraph>
    <Paragraph position="2"> action using &lt;register&gt; here must be scoped to before the PUSH arc</Paragraph>
    <Paragraph position="4"> Figure 1. Scoping LIFTR actions.</Paragraph>
    <Paragraph position="5"> So, for example, when parsing English from right to left, tests that the verb and subject agree in person and number (if this information is carried in registers) must be postponed until the PUSH for the beginning of the subject noun phrase. Section III describes how my interpreter takes care of this scoping problem.</Paragraph>
    <Paragraph position="6"> B. SENDR 1. Treatment of actions using SENDRed registers Since in a right-to-left parse, lower level sub-networks are traversed before the PUSH to them is performed, there is no way of knowing the value of a register that is being SENDRed at least until after the PUSH. Thus all actions involving registers whose values depend on the value of that register must be saved to be executed at the higher level.</Paragraph>
    <Paragraph position="7"> I have dealt with this by putting such actions into SCOPE clauses containing a special new scope specification, which I call scope SENDR. Actions with scope SENDR are never executed at the current level in the network, but are saved and incorporated into the next higher level subnetwork (possibly with a changed scope specification) during processing of the PUSH at that higher level, as follows: (1) The form on the POP arc to be returned as the value of the special register '*' on return to the next higher level is put into an explicit LIFTR action.</Paragraph>
    <Paragraph position="8"> (2) The scopes of all the saved actions are changed to the same as those of the SENDR actions on the PUSH arc.</Paragraph>
    <Paragraph position="9"> (3) All LIFTR actions are changed to highlvl-setr actions (see below).</Paragraph>
    <Paragraph position="10"> (4) Scoped calls to lowlvl-start and lowlvl-finish (see below) are put respectively before and after the saved actions.</Paragraph>
    <Paragraph position="11"> (5) All the SENDR actions on the PUSH arc are put  in front of the lower level saved actions.</Paragraph>
    <Paragraph position="12"> The rest of the actions on the PUSH arc are then processed as normal. The purposes of the actions lowlvl-start and lowlvl-finish are, respectively, to set up and to restore a stack of register contexts (hold-regs), each level in the stack holding the register contents of one level in the network, with the base of the stack representing the highest level of saved actions. The action highlvl-setr performs a SETR at the next higher level of register contexts on the stack.</Paragraph>
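    <Paragraph> The behaviour of hold-regs described above can be sketched as follows. This is a minimal illustration in Python, not the interpreter's own code: the class name HoldRegs, the method names and the use of one dictionary per level are all assumptions made for the example.

```python
class HoldRegs:
    """Stack of register contexts; index 0 is the highest network level."""

    def __init__(self):
        # The base of the stack holds the registers of the highest
        # level of saved actions.
        self.stack = [{}]

    def lowlvl_start(self):
        # Entering saved lower-level actions: push a fresh context.
        self.stack.append({})

    def lowlvl_finish(self):
        # Leaving the lower level: restore the previous context.
        if len(self.stack) == 1:
            raise RuntimeError("no lower level to finish")
        self.stack.pop()

    def setr(self, register, value):
        # Ordinary SETR: acts on the current (topmost) context.
        self.stack[-1][register] = value

    def getr(self, register):
        # Ordinary GETR: reads the current context, None if unset.
        return self.stack[-1].get(register)

    def highlvl_setr(self, register, value):
        # SETR performed one level up from the current context.
        if len(self.stack) == 1:
            raise RuntimeError("already at the highest level")
        self.stack[-2][register] = value


# Hypothetical run: a lower level sets a register locally and passes a
# result upwards, which is visible only after lowlvl_finish.
regs = HoldRegs()
regs.lowlvl_start()
regs.setr("reg1", "nphrase")
regs.highlvl_setr("*", "NP")
regs.lowlvl_finish()
result = regs.getr("*")
```

Keeping one context per level means that a register set inside the lower level disappears when lowlvl_finish runs, while a value written with highlvl_setr survives in the restored context.</Paragraph>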
  </Section>
  <Section position="5" start_page="101" end_page="102" type="metho">
    <SectionTitle>
2. An Example
</SectionTitle>
    <Paragraph position="0"> A typical sequence of actions in a fragment of an ATN network might be as in figure 2.</Paragraph>
    <Paragraph position="1"> [Figure 2, partly lost in extraction: a PUSH arc carrying the action (SENDR reg1 'nphrase), and a POP arc with a form.]</Paragraph>
    <Paragraph position="3"> This would be translated into the list of saved actions on the left of figure 3, and when control had passed through a set of states such that the actions' scope specifications were satisfied, execution would produce the sequence of operations shown on the right of the figure.</Paragraph>
  </Section>
  <Section position="6" start_page="102" end_page="102" type="metho">
    <SectionTitle>
3. Scope Problems
</SectionTitle>
    <Paragraph position="0"> As with LIFTR, SENDR actions need special scoping treatment: since there can be any type of interaction on a lower level between registers SENDRed and registers to be LIFTRed, the only safe execution time for actions using these registers and for actions referencing registers whose values depend on them (without engaging in full symbolic execution) is when the higher level sub-network has been fully traversed. There is a special scope specification for this: scope T.</Paragraph>
  </Section>
  <Section position="7" start_page="102" end_page="102" type="metho">
    <SectionTitle>
III AUTOMATIC SCOPE COMPUTATION
</SectionTitle>
    <Paragraph position="0"> The process of writing scope clauses into the grammar for an island parser is laborious, and therefore prone to error. The implementation described here can automatically detect all context-sensitive actions and tests and put them into scope clauses containing suitable (and usually optimal) scope specifications. Thus the parser can interpret straight off an ATN grammar that has been written for an ordinary left-to-right parser.</Paragraph>
    <Paragraph position="1"> The scoping algorithm consists of five passes over the grammar, the first four dealing with the exceptional scoping required by LIFTR and SENDR actions, and the fifth with the rest of the actions and tests in the network. Comments on the algorithm follow the necessarily technical account of it.</Paragraph>
    <Paragraph position="2"> A. The Scoping Algorithm The five passes of the scoping algorithm will now be described, actions and tests in the network being treated identically.</Paragraph>
    <Paragraph position="3"> 1. Pass 1 Pass one takes care of the scoping problem with LIFTR actions mentioned in the previous section, namely that a register being LIFTRed must be scoped back at the higher level to at least before the PUSH arc. But if the register is used on the PUSH arc itself, the scoping algorithm should produce correct scope specifications without needing to treat this as a special case. Thus the solution I have adopted is for the algorithm to check whether the register appears on the PUSH arc, and if not, the dummy action (SETR &lt;register&gt; (GETR &lt;register&gt;)) is added to the actions on the PUSH arc.</Paragraph>
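    <Paragraph> Pass one can be illustrated with a short Python sketch. The representation of actions as nested tuples, and the names registers_in and pass_one, are assumptions made for this example; the original interpreter works on Lisp forms.

```python
def registers_in(form):
    """Collect register names mentioned in a nested action form.

    Forms are tuples such as ("SETR", "num", ("GETR", "num")); the
    second element of a SETR, GETR, LIFTR or SENDR form is taken to
    be a register name (an assumption of this sketch).
    """
    found = set()
    if isinstance(form, tuple) and form:
        if form[0] in ("SETR", "GETR", "LIFTR", "SENDR") and len(form) > 1:
            found.add(form[1])
        for part in form[1:]:
            found |= registers_in(part)
    return found


def pass_one(push_arc_actions, liftred_registers):
    """Return the PUSH arc actions, with a dummy SETR added for every
    LIFTRed register that does not already appear on the arc."""
    present = set()
    for action in push_arc_actions:
        present |= registers_in(action)
    result = list(push_arc_actions)
    for reg in liftred_registers:
        if reg not in present:
            # Dummy action: sets the register to its own value, so the
            # later passes see the register being used on this arc.
            result.append(("SETR", reg, ("GETR", reg)))
            present.add(reg)
    return result


# Hypothetical PUSH arc: "subj" already appears, "num" does not.
arc = [("SETR", "subj", ("GETR", "*"))]
patched = pass_one(arc, ["num", "subj"])
```

The dummy action leaves the register value unchanged; its only purpose is to make the register visible on the PUSH arc to the later passes.</Paragraph>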
  </Section>
  <Section position="8" start_page="102" end_page="103" type="metho">
    <SectionTitle>
2. Pass 2
</SectionTitle>
    <Paragraph position="0"> The second pass finds, for each sub-network, the names of all the registers whose values depend on other registers (for use in the subsequent scoping passes). It does this by finding the registers used in each register-setting action (SETR, LIFTR, or SENDR), using knowledge of the register usage of each function used, and for each register which is not being assigned to, it appends onto the property-list of the register the name of the register being set in the current action, and a pointer to that register's property-list.</Paragraph>
    <Paragraph position="1"> Thus in the end, each register is associated with  a list of all the registers in the sub-network which depend on the value of that register.</Paragraph>
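    <Paragraph> The dependency bookkeeping of pass two can be sketched in Python as follows. The input format (pairs of the register being set and the registers used to compute it) and the function names are assumptions; a dictionary of sets stands in for the property-lists, and following entries transitively stands in for following the property-list pointers.

```python
def direct_dependents(setting_actions):
    """Map each register to the registers set directly from it.

    setting_actions is a list of (register_set, registers_used) pairs,
    one per register-setting action in a sub-network.
    """
    deps = {}
    for target, used in setting_actions:
        deps.setdefault(target, set())
        for reg in used:
            if reg != target:
                # reg is used but not assigned: record that target
                # depends on it.
                deps.setdefault(reg, set()).add(target)
    return deps


def all_dependents(deps, register):
    """All registers whose values depend, directly or indirectly,
    on the given register."""
    seen = set()
    todo = list(deps.get(register, ()))
    while todo:
        reg = todo.pop()
        if reg not in seen:
            seen.add(reg)
            todo.extend(deps.get(reg, ()))
    return seen


# Hypothetical sub-network: b is set from a, c from b, d from x and d.
deps = direct_dependents([("b", ["a"]), ("c", ["b"]), ("d", ["x", "d"])])
dependents_of_a = all_dependents(deps, "a")
```

In this example dependents_of_a contains both b and c, since c depends on a through b.</Paragraph>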
    <Paragraph position="2"> 3. Pass 3 Pass three deals with scoping SENDR actions, giving them the treatment described at the end of the last section - it assigns the scope specification T to all actions which reference registers whose values depend on any of the registers used in actions on the same PUSH arc as a SENDR action.</Paragraph>
    <Paragraph position="3"> 4. Pass 4 Pass four finds all actions that use registers that have been passed down from a higher level by a SENDR, and also actions which use registers dependent on those SENDRed registers, giving the actions scope SENDR.</Paragraph>
    <Paragraph position="4"> 5. Pass 5 The rest of the scoping is performed in pass five. Each action is considered in turn, collecting the names of all registers it uses, and the names of those whose values depend on them. The scope specification is then computed depending on the common part of all possible paths from the start of the current sub-network to any action which is dependent on the action under consideration. This list of states ('left-states') is the intersection of the states to the left of each action which uses any of the collected registers.</Paragraph>
    <Paragraph position="5"> The algorithm distinguishes the following four cases for the contents of 'left-states': (1) If NIL - there are at least two non-intersecting paths from the left to the arc containing the action which reference registers dependent on those in the action, so return scope specification T.</Paragraph>
    <Paragraph position="6"> (2) All states in 'left-states' are in loops in the network - it is very difficult to compute the optimal scope specification, so return T (which will always be correct though perhaps not optimal). The problem with loops is that no register should be changed or referenced in a right-to-left parse until control has finally passed out of the loop.</Paragraph>
    <Paragraph position="7"> (3) The left state of the arc containing the action being scoped is in 'left-states', and the state is not in a loop - all dependent actions are to the right of the arc, so return NIL.</Paragraph>
    <Paragraph position="8"> (4) Otherwise - return as scope specification a list of all states in 'left-states' that are not in loops.</Paragraph>
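    <Paragraph> The four cases above can be written out as a small decision function. This Python sketch assumes that the states to the left of each relevant action and the set of states lying in loops have already been computed; the function names and the use of None for a NIL result are choices made for the example.

```python
def left_states_of(states_left_of_each_action):
    """Intersect the sets of states to the left of each action that
    uses any of the collected registers."""
    sets = [set(states) for states in states_left_of_each_action]
    return set.intersection(*sets) if sets else set()


def scope_for(left_states, arc_left_state, loop_states):
    """Return "T", None (for NIL, no scoping needed) or a list of states."""
    if not left_states:
        return "T"      # case 1: the paths share no state
    outside_loops = sorted(s for s in left_states if s not in loop_states)
    if not outside_loops:
        return "T"      # case 2: every candidate state is in a loop
    if arc_left_state in outside_loops:
        return None     # case 3: dependent actions are all to the right
    return outside_loops  # case 4: states in left-states not in loops


# Hypothetical state names in the style of ATN grammars.
common = left_states_of([["S/", "S/1"], ["S/", "S/2"]])
spec = scope_for(common, "S/3", set())
```

Sorting the states in case four is only there to make the result deterministic; the text does not say that the order of the states in a scope specification matters.</Paragraph>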
    <Paragraph position="9"> If an action does not use any registers, it obviously does not need scoping, and the algorithm bypasses it. If a scope specification is returned for an action that is already scoped, whether the new scope 'overwrites' the old one depends on what is already there: scope SENDR overwrites scope T; scope T overwrites scope &lt;list of states&gt;; and scope &lt;list of states&gt; is appended to an existing scope &lt;list of states&gt;. B. Discussion of the Scoping Algorithm The algorithm does not produce totally optimal scope specifications in all circumstances: that is, actions may sometimes be scoped so they are saved for longer in the parse before they are executed than may strictly be necessary. The main shortcoming is in dealing with networks where there are two or more alternative separate paths containing actions using registers computed to be interdependent; for example in scoping the network fragment in figure 4,  (NP/) but the paths through them are independent and the register is not used elsewhere, so the actions do not need to be scoped at all. There does not seem to be any way around this problem by modifying the algorithm, but fortunately scope specifications that are not entirely optimal (as in this case) should only minimally affect the performance of the interpreter when parsing a sentence.</Paragraph>
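    <Paragraph> The overwrite rules for an action that is already scoped, given at the end of the account of the algorithm, can be sketched as a merge function. The text states only three rules; this Python sketch assumes in addition that a stronger existing scope is never weakened (SENDR is kept over T, and T over a list of states) and that an unscoped action simply takes the new scope. The name merge_scope and the use of None for 'unscoped' are hypothetical.

```python
def merge_scope(old, new):
    """Combine an existing scope with a newly computed one.

    A scope is "SENDR", "T" or a list of state names; None means
    that no scope is present.
    """
    if old is None:
        return new
    if new is None:
        return old
    if old == "SENDR" or new == "SENDR":
        return "SENDR"  # scope SENDR overwrites scope T (and, assumed, lists)
    if old == "T" or new == "T":
        return "T"      # scope T overwrites a list of states
    # Two lists of states: the new list is appended to the old one.
    return list(old) + list(new)


# Hypothetical sequence of scopes computed for one action.
scope = None
for computed in (["S/1"], ["S/2"], "T", "SENDR"):
    scope = merge_scope(scope, computed)
```

After the loop the scope is SENDR, having passed through a two-state list and then T on the way.</Paragraph>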
    <Paragraph position="10"> configurations ('Sconfigs', footnote 1) at the boundaries of each island that are compatible, and then splice those that completely cover a sub-network into as many successively higher levels as possible (by calling Woods' 'Complete-right' function as many times as possible). In a real-time speech understanding system (depending on the strategy it employed), the time saved by this method could be critical to the success of the system.</Paragraph>
    <Paragraph position="11"> V OBSERVATIONS ON THE INTERPRETER IN USE The parser has been tested (Carroll, 1982) with various sized (purely syntactic) grammars, simulating speech processing by the arbitrary selection of one or more words in a typed string as parsing starting points, and the arbitrary addition of words to the left and right of these.</Paragraph>
    <Paragraph position="12"> It has been observed that the more complex the structure of the sentence being parsed, the more Sconfigs get generated, and consequently the longer the parse takes. There are, however, other less obvious factors influencing the number of Sconfigs generated.</Paragraph>
    <Section position="1" start_page="103" end_page="103" type="sub_section">
      <SectionTitle>
Saved Tests
</SectionTitle>
      <Paragraph position="0"> Sconfigs tend to proliferate embarrassingly when there are many possible paths of JUMP arcs between states on the same level of the grammar, due to scoped tests having to be saved and not being immediately executable.</Paragraph>
      <Paragraph position="1"> If there are no SENDR actions down to the sub-network containing the JUMPs, then none of the saved tests will have to be carried up to a higher level, and so many of the Sconfigs will be filtered out when the POP arc at that level is processed. But if there are SENDR actions, the Sconfigs will not be filtered so effectively, will be carried up to higher levels, and at each higher level the number of Sconfigs will multiply.</Paragraph>
      <Paragraph position="2"> This Sconfig proliferation and resulting combinatorial explosion will always be associated with island parsing using large complex grammars that are purely syntactic (footnote 2); unfortunately LIFTR and SENDR actions aggravate the problem. However, the utility of these actions more than outweighs the consequent decrease in parse-time efficiency.</Paragraph>
    </Section>
  </Section>
  <Section position="9" start_page="103" end_page="104" type="metho">
    <SectionTitle>
IV MERGING PARTIALLY BUILT ISLANDS
</SectionTitle>
    <Paragraph position="0"> In the HWIM system, to join together two adjacent islands to make one island covering them both, the smaller island was broken up and the words from it added onto the end of the larger. This obviously wastes all the effort expended in building the smaller island.</Paragraph>
    <Paragraph position="1"> A more efficient method of joining two islands, which I have implemented, is to merge all the segment 1 The state of the parse in an island parser is held as a list of segment configurations, each of which represents a partial parse covering one or more words in the utterance.</Paragraph>
    <Paragraph position="2"> 2 It seems that the HWIM parser also encountered these problems; their solution was to employ semantic grammars, with a large number of WRD arcs, to use both syntactic and semantic categories on CAT arcs, and to expand the set of constituents pushed for to include &quot;semantic constituents&quot;.  B. Differing Word-Orders Parsing the same sentence with differing orders of adding the words in it to islands usually results in differing numbers of Sconfigs being created. For example, two parses of the sentence</Paragraph>
  </Section>
  <Section position="10" start_page="104" end_page="104" type="metho">
    <SectionTitle>
JOHN IS EAGER TO PLEASE.
</SectionTitle>
    <Paragraph position="0"> gave the results: run 1 generated 388 Sconfigs with a parse time of 1.77 secs., and run 2 generated 182 Sconfigs with a parse time of 1.08 secs. The difference was caused by the fact that in the first run, 'IS' was used as an initial island, setting up expectations for more possible distinct final sentence structures than in the second run, which started with the word 'PLEASE'. This difference in expectation status reflects the different structuring potential of the two words.</Paragraph>
  </Section>
  <Section position="11" start_page="104" end_page="104" type="metho">
    <SectionTitle>
VI SOME FUTURE DIRECTIONS FOR RESEARCH
A. Parsing Conjunctions
</SectionTitle>
    <Paragraph position="0"> Island parsing appears to offer a promising solution to the problem of parsing written as well as spoken sentences containing conjunctions; although the ATN formalism is quite powerful in expressing natural language grammars, it faces problems dealing with sentences containing conjunctions: (WRD AND ...) arcs need to be inserted almost everywhere since AND can conjoin any two constituents of the same type. Boguraev (1982) has suggested that this problem might be overcome by building islands at each conjunction and parsing outwards from them.</Paragraph>
    <Paragraph position="1"> ATN. For this reason, restrictions might have to be placed on the ATN grammars used, but this requires further investigation.</Paragraph>
  </Section>
  <Section position="12" start_page="104" end_page="104" type="metho">
    <SectionTitle>
VII ACKNOWLEDGEMENTS
</SectionTitle>
    <Paragraph position="0"> I would like to thank Bran Boguraev for his guidance during the writing of the interpreter, and for supplying the ATN grammars I have used. Thanks also to Karen Sparck Jones and John Tait for their comments on earlier drafts of this paper.</Paragraph>
  </Section>
</Paper>