<?xml version="1.0" standalone="yes"?> <Paper uid="C90-1010"> <Title>The translation of constituent structure into connectionist networks</Title> <Section position="3" start_page="0" end_page="0" type="metho"> <SectionTitle> 2. Earley's Representation </SectionTitle> <Paragraph position="0"> Let us first summarize the essentials of Earley's algorithm.</Paragraph> <Paragraph position="1"> It operates in two stages: in the first stage a parse list is computed, and in the second stage the correct parse is filtered out from the parse list. For the string aab, the information contained in the parse list can be represented as in figure 1 by a superposition of the possible sub-trees found applicable in going through the string from left to right. The correct parse &quot;filtered out&quot; is represented in figure 2.</Paragraph> <Paragraph position="2"> Earley uses another way of representing parse lists and correct parses. He represents them by means of dotted rule symbols and dominance scope numbers entered in lists, one for each input interval. The parse list containing the same information given in the superposition of the trees is as in figure 3. [Figure 1: Parse list tree. Figure 2: Correct parse tree.] The symbols of the input string represented at the bottom exist in the intervals &lt;0,1&gt;, &lt;1,2&gt;, &lt;2,3&gt;. At each completed interval, the rules which have found application so far are entered in the corresponding lists, together with a number indicating the number of intervals dominated by the head symbol of the rule.</Paragraph> <Paragraph position="3"> [Figure 3 (caption partly lost): ... symbols according to Earley. Columns: List 0, List 1, List 2, List 3. Entries: &lt;S->aA., 2&gt;, &lt;A->aa., 2&gt;, &lt;A->a., 1&gt;, &lt;A->a., 1&gt;, &lt;S->Ab., 3&gt;.] Let us indicate a feature which is essential in view of our connectionist implementation: each piece of information in Earley's system is in fact a triple &lt;list number, dotted symbol, length of dominance&gt;.</Paragraph> <Paragraph position="4"> The representation in figure 3 is, however, not yet complete as a representation of the parse list. 
In fact, the parsing process as defined by Earley makes use of further dotted symbols derived from the rules of the underlying constituent structure, namely all dotted rule symbols which can be obtained by placing exactly one dot between symbols to the right of the arrow. The system of dotted rule symbols for our grammar is presented in figure 4. All dotted rule symbols are needed for controlling the parse process.</Paragraph> <Paragraph position="5"> [Figure 4: the system of dotted rule symbols.] S->aA., S->a.A, S->.aA, S->Ab., S->A.b, S->.Ab, A->aa., A->a.a, A->.aa, A->a., A->.a, .a., .b., .S.</Paragraph> <Paragraph position="7"> The complete parse is computed list by list from left to right as the input string is read in. In principle, many dotted rule symbols in the list could be placed simultaneously, but only in a parallel system like the one we shall present, not in Earley's completely sequential implementation on a von Neumann machine.</Paragraph> <Paragraph position="8"> 3. Our representation How are we going to implement Earley's algorithm in a connectionist net? We follow the localist principle of connectionist implementation, one concept, one unit, but we apply it to the triples in Earley's representation: one triple, one unit. This principle, applied to our example of three intervals and, correspondingly, to 3 as the longest possible dominance and to 14 dotted rules (as enumerated in figure 4), yields 3*14*3 = 126 units. In general, a system with n dotted rules and an input string of length l would have n*l^2 units. The connectivities between the units must be defined in such a way that they generate activity patterns over the three-dimensional system of units (each member of a triple indicating a dimension), such that a unit becomes active (1) exactly when the corresponding triple is specified in the Earley algorithm. All other units not specified in the algorithm must remain inactive (0). 
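The one-triple-one-unit mapping can be illustrated with a short Python sketch. It is not part of the original system: the function names dotted_rules and build_units are hypothetical, the four grammar rules are inferred from the dotted symbols used in the text, and the assignment of the figure 3 entries to particular lists is an assumption, since the figure layout is not preserved here.

```python
from itertools import product

# Toy grammar inferred from the dotted rule symbols used in the text (assumption).
GRAMMAR = [("S", "aA"), ("S", "Ab"), ("A", "aa"), ("A", "a")]
# Special symbols listed alongside the dotted rules in figure 4.
EXTRA = [".a.", ".b.", ".S."]


def dotted_rules(grammar):
    """Return every rule with exactly one dot placed in its right-hand side."""
    rules = []
    for head, body in grammar:
        for i in range(len(body) + 1):
            rules.append(f"{head}->{body[:i]}.{body[i:]}")
    return rules


def build_units(symbols, n_lists, max_span):
    """Create one inactive unit (0) per triple (list number, symbol, dominance)."""
    lists = range(1, n_lists + 1)
    spans = range(1, max_span + 1)
    return {triple: 0 for triple in product(lists, symbols, spans)}


# Triples read off figure 3 for the string "aab". Which list each entry
# belongs to is an assumption, derived by hand from the grammar above.
PARSE_LIST = [
    (1, "A->a.", 1),
    (2, "A->a.", 1),
    (2, "A->aa.", 2),
    (2, "S->aA.", 2),
    (3, "S->Ab.", 3),
]

symbols = dotted_rules(GRAMMAR) + EXTRA
units = build_units(symbols, n_lists=3, max_span=3)
for triple in PARSE_LIST:
    units[triple] = 1  # a unit is active exactly when its triple occurs

# Number of dotted symbols, number of units, number of active units.
print(len(symbols), len(units), sum(units.values()))
```

With n symbols and an input of length l, build_units(symbols, l, l) creates n*l*l entries, which matches the n*l^2 count given above. The dictionary only records the target activity pattern; it says nothing about the connectivities that would produce it.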
The parse list given in figure 3 would be represented by the activity pattern over the units in a three-dimensional space indicated in figure 5.</Paragraph> <Paragraph position="9"> [Figure caption, partly lost: ... parse tree given in figure 2.]</Paragraph> <Paragraph position="10"> The representation outlined so far seems to have an essential disadvantage: the space built by the units which represent the parse list structures seems to be unlimited, since it depends on the length of the input string. This is indeed the case.</Paragraph> <Paragraph position="11"> However, the structurally essential feature is not the space used for representing the complete parse list structure but only the space in which the process of generating the parse list structure is executed. Our system can indeed be subdivided architectonically into the representation spaces (one for the parse list, one for the correct parse) and a limited space containing the units which generate the representations. It is only this latter space, comprising grammar units (0,Y,0), (-1,Y,0) and control units (0,Y,-1), (-1,Y,-1) for all dotted rules Y, which has an inhomogeneous connectivity structure whose specificity is determined by the constituent structure rule system from which it is compiled. Obviously, this space of inhomogeneous connectivity is limited in our implementation: its size is 2*2*n units (where n is the number of dotted rules).</Paragraph> <Paragraph position="12"> In this space 2*n units are control bit units, whereas 2*n units correspond directly to dotted rule symbols of the original grammar, such that their connectivities represent the logical and procedural interdependencies between these symbols in Earley's algorithm. The extension of this space is thus independent of the length of the input string to be parsed.</Paragraph> <Paragraph position="14"> figure 2 in space 4. 
Input representations in spaces I and II) In contrast to this, the units in the representation space have a homogeneous connectivity among them, which is completely independent of the grammar implemented. Instead, this connectivity corresponds to the circuit connectivity of a shift register implemented as an integrated circuit.</Paragraph> <Paragraph position="15"> The overall architecture which derives from our automatic compilation process applied to a given constituent structure is given in figure 6. Spaces I and II contain the representations of the input string, the units in space III represent the parse list under construction and after completion, and space IV represents the same for the correct parse. [Figure caption fragment: processing space derived from our simple grammar.] Space IX (resp. X) is the inhomogeneous processing space whose connectivity corresponds strictly to the structure of the grammar from which it is compiled.</Paragraph> <Paragraph position="16"> The inhomogeneous internal connectivity within space IX is represented in figure 7. The units represented are also connected to the neighbouring units in the representation space III and to control bits which determine the shifting processes in the representation space.</Paragraph> <Paragraph position="17"> [Figure caption fragment: control bit unit (0, .S., -1) forces the parser to shift the input string in the next step.] 4. An outline of the connectionist parsing process The computational process is as follows: Initially the input string is in space I (or is transferred to this space from a word recognizer array analysing acoustic or graphic input). The first input symbol is read into the processing space, or more correctly into a connected buffer place of space VII, i.e. the unit (-2, .a., 1) is activated and simultaneously the unit (0, .S., 0), i.e. the initializer unit. (Cp. figure 8.) [Figure 8 caption fragment: ... first symbol.] Due to the connectivities in position 0 (i.e. 
in space IX) the units (0, S->.aA, 0) and (0, S->.Ab, 0) become simultaneously active, and then, depending on them, simultaneously the units (0, A->.aa, 0) and (0, A->.a, 0). To scan in the first terminal, the complete pattern of activity has to be shifted one step to the left, with the exception of the activation of unit (-2, .t., 1). The activity of this unit is transferred to the unit (0, .t., 1). (This is done because the units located at X=-1 are used as a temporary buffer by the parser.) Figure 9 shows the state after this shifting process has been carried out. But simultaneously the parser has to perform the computation of the parse list for the terminal just read. Since the units (0, A->.a, 0), (0, A->.aa, 0) and (0, S->.aA, 0) were active while the terminal &quot;a&quot; was read, the parser must activate the units (0, A->a., 1), (0, A->a.a, 1) and (0, S->a.A, 1). The activity of the unit (0, A->a., 1) in turn forces the unit (0, S->A.b, 1) to become active. These actions take place according to the connectivities in space IX of figure 6, represented in figure 7.</Paragraph> <Paragraph position="18"> It should be clear by now how, in principle, the parsing process develops over the connectionist space until the final stage represented schematically in figure 5 is reached. It should also be clear, in principle, how the complete parse is generated in space IV through the operation of the units in space X. They determine the &quot;filtering out&quot; of certain unconfirmed parse tree information in the parse list in a process of stepwise information shift from III to IV. We shall not discuss this process here.</Paragraph> </Section> <Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> 5. 
Perspectives for further research </SectionTitle> <Paragraph position="0"> From a linguistic point of view, it is important to be able to generate connectionist networks for more complicated grammars, in particular for unification-based grammars and for principles-and-parameters-based approaches such as those recently developed by Chomsky. So far we have been able to define the appropriate representation space, i.e. the extension of our spaces III and IV, and to develop first ideas about the connectivities derived from symbolic definitions of grammatical properties, i.e. the structures in our spaces IX and X. We are optimistic about the possibilities of translating any unification-based formalism working with feature structures into a corresponding connectionist network.</Paragraph> </Section> </Paper>