File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/92/c92-1021_metho.xml
Size: 22,312 bytes
Last Modified: 2025-10-06 14:12:55
<?xml version="1.0" standalone="yes"?> <Paper uid="C92-1021"> <Title>Hopfield Models as Nondeterministic Finite-State Machines</Title> <Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> 3 An Input-Driven Sequencing </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> Hopfield Model </SectionTitle> <Paragraph position="0"> In this section a noiseless Hopfield model is proposed that is tailored to implement NDAs on. The model is based on the associative memory described by Buhmann et al. \[2\] and the theory of delayed synap~s from \[8\]. We chose the Itopfield model because of its analytical transparency, and its capability of sequence traversing, which agrees well with the sequential nature of language use at a pbenomenologica\[ level. The Hopfield model proposed is a memory for temporal transitions extended with externa|-input synapses. Figure 1 shows the architecture involved.</Paragraph> <Paragraph position="1"> Ill this network only those neurons are active upon which a combined local field operates that transcends the threshold. The activity generated by such a lo~ cal field is the active overlap of the temporal image of past activity provided by so-called temporal synapses, and the image of input external activity provided by so-called input synapses. By the temporal synapses, this activity will later generate another (subthreshold) temporal image, so network activity may be considered a transition mechanism that brings the network from one temporal image to another. Active overlaps are unique with high probability if the activity patterns are chosen at random and represent low nrean network activity. This uniqueness makes tile selectivity of the network very plausible: if an external activity pattern is presented that does not match the current temporal image, then there will not he activity of any significance; tile input is not recognized.</Paragraph> <Paragraph position="2"> When an NDA is mapped onto this network, pairs of NDA-state q and input-symbol x, such that 6(q, x) yPS {~, are mapped onto activity patterns. Temporal relations in the network then serve to implement NDA transitions. Note that single NDA transitions arc mapped onto single network transitions.</Paragraph> <Paragraph position="3"> This results in complex representations of the NDA states and the input symbols. An NDA state is rap resented by all activity patterns that represent a pair containing that state, and input patterns are represented by a component-wise OR over all activity patterns containing that input symbol. A consequence is that mixed temporal images, the subthreshold analogue of mixture states, are a very natural phenomenon in this network, because tile temporal image of an active overlap comprises at least all activity patterns representing a successor state. But this is not all. Also the network will act as if it implements the deterministic equivalent of the NDA, i.e. it will trace all paths through state space the input allows for, concurrently. The representations of the states of this deterministic finite-state automaton (FSA) are dynamically constructed along the way; they are mixed temporal images. 
The concept of a &quot;dynamically constructed representation&quot; is borrowed from Touretzky \[9\], who, by the way, argued that they could not exist in the current generation of neural networks, such ms ltopfield models.</Paragraph> <Paragraph position="4"> A time cycle of the network can be described as follows: 1~ The network is allowed to evolve into a stable activity pattern that is the active overlap of a temporal image of past activity, and the input image of external input for a pe~'iod tr (= relaxation time), when an external activity pattern is presented to the network; 2. After some time the network reaches a state of stable activity and starts to construct a new temporal image. It is allowed to do this for a period t, (= active time)', 3. Then the input is removed, and the network evolves towards inactivity. This takes again about a period t~; Acids DE COLING-92, NANTES, 23-28 AOt~T 1992 1 1 4 PRO(:. OF COL\]NG-92, NANTI~S, AUG. 23-28. 1992 4. Not before a period ta (= delay time) has passed, a new input arrives. The new temporal image is forwarded by tile slow synapses during a period ta +iv, starting when td ends. The slow synapses have forgotten the old temporal image while the network was in its td.</Paragraph> <Paragraph position="5"> The synapses modeled in the network collect the incoming activity over a period ~ + tr, and emit the time average over again a period ta + tr after having waited a period ta + t~. In the network this is modeled by computing a time average over prior neuronal activity, and then multiply it by the synoptic efficacy. Tile time average ranges over a period (2t~ + la + 31~)/N - (l, + I~)/N. The first argument is the total time span in the network, covering two active periods and an intervening delay time, including their transition times. The second argument is tile current period of network is activity, activity that cannot directly interfere with the network's dyna\[lliCS. null More formally the network can be described as follows: null</Paragraph> <Paragraph position="7"> the external input term, The Si are neuronal variables (5&quot;,'. is a neuron in another network), hi is the total input on Si, U is a threshold value which is equal for all Si, Jij is the synaptic efficacy of the synapse connecting S i to Si, and A is the relative magnitude of the synapses.</Paragraph> <Paragraph position="8"> The average at time ~ is expressed by ~/(~), where r =- (2t. + ta + 3tr)/N and ~ -~ (ta + t~)/N. The function w(t) determines over which period activity is averaged. The input synapses are nonzero only in case i = j. These synapses carry a negative ground signal -A'a, which is equivalent to an extra threshold generated by the input synapses. The activity patterns {~'} ({~&quot;} ~ (~\],~ ..... ~N) ) are statist,tally independent, and satisfy the same probability distribution as the patterns in the model of Buhmann et al. \[2\]:</Paragraph> <Paragraph position="10"> models consist of very many neurons. The arced arrows denote temporal synapses. The straiyht alT'ows denote input synapses.</Paragraph> </Section> </Section> <Section position="5" start_page="0" end_page="0" type="metho"> <SectionTitle> 4 Estimation of Parameters </SectionTitle> <Paragraph position="0"> A number of system parameters need to be related in order to make the model work correctly.</Paragraph> <Paragraph position="1"> Timing; is fairly important ill this network. Tile time the network is active (to) should not exceed tile delay time t~. 
<Paragraph position="2"> The choice for a transition time $t_r$ depends on the probability with which one requires the network to get into the next stable activity state. This subject will be dealt with in section 7.</Paragraph> <Paragraph position="3"> In the estimates of $\lambda^t$ and the storage capacity below, an expression of the temporal transition term in terms of the overlap parameter $m^\mu$ is used, which will be introduced here first. The overlap parameter $m^\mu(t)$ measures the overlap of a network state $\{S\} \equiv (S_1, S_2, \ldots, S_N)$ at time $t$ with stored pattern $\{\xi^\mu\}$, and is defined by:</Paragraph> <Paragraph position="5"> Assuming that $N \to \infty$ while $p$ remains fixed this is, after expansion of the $J_{ij}$ and ignoring infinitesimal terms, approximated by:</Paragraph> <Paragraph position="7"> If the temporal image is $\{\xi^{\mu+1}\}$ then $h_i$ is about ($N \to \infty$):</Paragraph> <Paragraph position="9"> If a number of patterns in a mixture state have the same successor, that pattern may be activated.</Paragraph> <Paragraph position="10"> To prevent this, $\lambda^t$ will be chosen such that the slow synapses do not induce activity in the network autonomously, not even if all the neurons in the network are active. On average, the activity in the network is given by the parameter $a$. The total activity in a network is a quantity $x$ such that $x = 1/a$, so what we require is that $x h_i < U$, i.e. that: $$\frac{\lambda^t}{a}\,(\xi_i^{\mu+1} - a)\,\frac{1}{\tau - \theta}\int_\theta^\tau w(t)\,dt < U.$$ The interesting case is $\xi_i^{\mu+1} = 1$. Since the integral is at most $\theta/(\tau - \theta)$, which is the strongest condition on the left side, the left expression can be written as $\lambda^t(1 - a)/a$. It was earlier demanded that only a combined local field can transcend the threshold, which implies that the external input satisfies $\lambda^e(1 - a) < U$, so we can take $\lambda^t < \lambda^e a$ safely. This is small because $a$ is small.</Paragraph> <Paragraph position="11"> Next a value for the threshold that optimizes storage capacity is estimated by signal-to-noise ratio analysis, following [1] and [2], for $N, p \to \infty$. Temporal effects are neglected because they affect signal and noise equally. It is also assumed that external input is present, so that the critical noise effects can be studied. In this model the external input synapses do not add noise; they do not contain any information apart from the magnitude of the incoming signal. Now suppose the system is in state $\{S\} = \{\xi^\mu\}$. The signal is that part of the input that excites the current pattern: $s = \lambda^e(\xi_i^\mu - a)$.</Paragraph> <Paragraph position="12"> The noise is the part of the input that excites other patterns. It can be seen as a random variable with zero mean, and it is estimated by its variance $\lambda^t\sqrt{\alpha a}$, where $\alpha = p/N$. We want that, given the right input, $h_i > U$ if both the temporal and the external input excite $S_i$, and that $h_i < U$ if the temporal input does not excite $S_i$. This gives signal-to-noise ratios:</Paragraph> <Paragraph position="14"> Substituted in either $\rho_0$ or $\rho_1$, it results in $\rho_{opt}$. This result is the same as obtained by Buhmann et al. [2], and they found a storage capacity $\alpha_c \approx -(a \ln a)^{-1}$, where $\alpha_c = p_{max}/N$. The storage capacity is large for small $a$, so $a$ will be chosen $a \ll 0.5$. A last remark concerns an initial input for the network. In case there has been no input for the network for some time, it does not contain a temporal image anymore, and consequently has to be restarted.</Paragraph>
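Since the defining formula for the overlap parameter is not reproduced in this extraction, the sketch below uses a normalisation common for low-activity Buhmann-style patterns (an assumption), chosen so that perfect recall of a stored pattern gives the value $1 - a$ used later in definition 5.7. It also reproduces the storage-capacity estimate $\alpha_c \approx -(a \ln a)^{-1}$.

```python
import numpy as np

rng = np.random.default_rng(1)
N, a, p = 800, 0.05, 64                 # section 6 values

# Patterns drawn as in Buhmann et al.: each component is 1 with probability a.
xi = (rng.random((p, N)) < a).astype(float)

def overlap(S, mu):
    """m^mu(t): overlap of network state S with stored pattern mu.
    Normalisation assumed such that perfect recall gives m^mu = 1 - a (cf. definition 5.7)."""
    return float((xi[mu] - a) @ S) / (a * N)

S = xi[0].copy()
print(round(overlap(S, 0), 2))          # about 1 - a = 0.95 for the recalled pattern
print(round(overlap(S, 1), 2))          # about 0 for an unrelated pattern

alpha_c = -1.0 / (a * np.log(a))        # storage capacity estimate, ~6.7
print(round(alpha_c, 2), p <= alpha_c * N)
```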
<Paragraph position="15"> This can be done by preceding a sequence by an extra strong first input, a kind of warning signal of magnitude e.g. $\lambda^e + \lambda^t$. This input creates a memory of previous activity and serves as a starting point for the temporal sequences stored in the network.</Paragraph> </Section> <Section position="6" start_page="0" end_page="0" type="metho"> <SectionTitle> 5 Neural-Network Acceptors </SectionTitle> <Paragraph position="0"> In this section it is shown how NDAs from definition 2.1 can be mapped onto networks as described in sections 3 and 4. Such networks can only be used for cyclic recognition runs, where &quot;cyclic&quot; indicates that both the initial and the accepting state of an NDA are mapped onto the same network state. If this were not done, the accepting state would not be assigned an activity vector, since no transition departs from it; see definition 5.2 below. Cyclic recognition in its turn can only be done correctly for grammars that generate end-of-sentence markers. Any grammar can be extended to do this.</Paragraph> <Paragraph position="1"> An NDA is related to a network by a parameter list.</Paragraph> <Paragraph position="2"> Definition 5.1 Let $M = (Q, \Sigma, \delta, Q_0, F)$ be an NDA. A parameter list defined for an NDA $M$ is a list of the form $(a, \lambda^e, \lambda^t, N, p, p_{max}, t_a, t_d, t_r, U)$, where: 1. $a \in [0,1] \subset \mathbb{R}$; 2. $t_a < t_d$, with $t_a, t_d \in \mathbb{N}$; 3. $\lambda^e \in \mathbb{R}^+$, where $\mathbb{R}^+ = \{x \mid x \in \mathbb{R} \wedge x > 0\}$; 4. $0 < \lambda^t < \lambda^e a$; 5. $p = \sum_{q,q' \in Q} |\{x \in \Sigma \mid q' \in \delta(q,x)\}| \times |\{y \in \Sigma \mid \delta(q',y) \neq \emptyset\}|$; 6. $p_{max} \geq p$; 7. $N > (-a \ln a)\, p_{max}$; 8. $t_r$: see section 7; 9. $U = \lambda^e(1-a) + \lambda^t(1/2 - a)$.</Paragraph> <Paragraph position="3"> Note that there are an infinite number of parameter lists for each NDA.</Paragraph> <Paragraph position="4"> The mapping of an NDA onto a network starts by mapping basic entities of the NDA onto activity patterns. Such a pattern is called a code. Definition 5.2 Let $M = (Q, \Sigma, \delta, Q_0, F)$ be an NDA, let $P = (a, \lambda^e, \lambda^t, N, p, p_{max}, t_a, t_d, t_r, U)$ be a parameter list defined according to definition 5.1. The coding function $c$ is given by:</Paragraph> <Paragraph position="6"> such that for $q \in Q$, and $x \in \Sigma$:</Paragraph> <Paragraph position="8"> The set of codes is then partitioned: into sets of activity patterns corresponding to NDA states, and into sets of patterns corresponding to input symbols.</Paragraph> <Paragraph position="9"> Definition 5.3 Let $M = (Q, \Sigma, \delta, Q_0, F)$ be an NDA, let $P = (a, \lambda^e, \lambda^t, N, p, p_{max}, t_a, t_d, t_r, U)$ be a parameter list defined according to definition 5.1.</Paragraph> <Paragraph position="10"> The set $P_q$ of activity patterns for $q \in Q$ is:</Paragraph> <Paragraph position="12"> Then a network transition is defined as a matrix operator specified by the network's storage prescription, and related to NDA transitions using the previously defined partition of the set of codes.</Paragraph> <Paragraph position="13"> Definition 5.4 Let $M = (Q, \Sigma, \delta, Q_0, F)$ be an NDA, let $P = (a, \lambda^e, \lambda^t, N, p, p_{max}, t_a, t_d, t_r, U)$ be a parameter list defined according to definition 5.1, and let $a$ be an $N$-dimensional vector, with each component $a$. The set $Tr$ of network transitions is:</Paragraph> <Paragraph position="15"> where each $J^t$ is an $N \times N$ matrix.</Paragraph>
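A minimal sketch of definitions 5.1 (item 5), 5.2 and 5.3, using a toy transition function with hypothetical state and symbol names: every pair $(q, x)$ with $\delta(q, x) \neq \emptyset$ receives a random sparse code, the codes are partitioned per state and per symbol, and $p$ counts the stored temporal relations between single codes.

```python
import numpy as np

rng = np.random.default_rng(2)
N, a = 800, 0.05

# Toy NDA transition function delta: (state, symbol) -> set of successor states (hypothetical).
delta = {("q0", "the"): {"q1"}, ("q1", "baby"): {"q2"}, ("q2", "cried"): {"q0"}}
states = {q for q, _ in delta} | set().union(*delta.values())
symbols = {x for _, x in delta}

# Coding function c (definition 5.2): a random sparse code for every defined pair (q, x).
codes = {pair: (rng.random(N) < a).astype(int) for pair in delta}

# Partition of the codes (definition 5.3): per NDA state and per input symbol.
P_state = {q: [v for (q2, _), v in codes.items() if q2 == q] for q in states}
P_symbol = {x: [v for (_, x2), v in codes.items() if x2 == x] for x in symbols}

# p (definition 5.1, item 5): stored temporal relations between pairs of single codes.
p = sum(
    len([x for x in symbols if q2 in delta.get((q1, x), set())])
    * len([y for y in symbols if delta.get((q2, y))])
    for q1 in states for q2 in states
)
print(p, len(codes))   # 3 temporal relations between 3 codes for this cyclic toy NDA
```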
<Paragraph position="16"> This suffices to define a neural-network acceptor. Definition 5.5 Let $M = (Q, \Sigma, \delta, Q_0, F)$ be an NDA, let $P = (a, \lambda^e, \lambda^t, N, p, p_{max}, t_a, t_d, t_r, U)$ be a parameter list defined according to definition 5.1. A neural-network acceptor (NNA) defined for an NDA $M$ that takes its parameters from $P$ is a quadruple $H = (T, f, U, S)$, where: 1. the topology $T$, a list of neurons per layer, is: $(N)$; 2. the activation function $f$ is given by: $S_i = 1$ if $\sum_j J_{ij}\hat{S}_j + \lambda^e(S'_i - a) > U$, and $S_i = 0$ if $\sum_j J_{ij}\hat{S}_j + \lambda^e(S'_i - a) \leq U$; 3. the update procedure is a Monte Carlo random walk; 4. the synaptic coefficients are given by: $J^t = \sum J^t_{(q,x),(q',y)}$, summing over the transitions in $Tr$, and $J^e = \lambda^e I$, where $I$ is the identity matrix.</Paragraph> <Paragraph position="18"> In order to construct activity patterns that can serve as external input $x$ for the network, a component-wise OR operation is performed over the set $P_x$ as defined in definition 5.3.</Paragraph> <Paragraph position="19"/> At last a formal definition can be given of a temporal image as the set of all activity patterns for which there is a network input that makes such an activity pattern the network's next quasi-stable state.</Paragraph> <Paragraph position="20"> Definition 5.7 Let $M = (Q, \Sigma, \delta, Q_0, F)$ be an NDA, let $P = (a, \lambda^e, \lambda^t, N, p, p_{max}, t_a, t_d, t_r, U)$ be a parameter list defined according to definition 5.1, and let $H = (T, f, U, S)$ be an NNA defined according to definition 5.5 that takes its parameters from $P$. A temporal image is a set $\{c(q, x) \mid \text{input } OR(P_x) \text{ for } H \text{ implies } m^{c(q,x)} = 1 - a\}$; a set $P_{q'}$ is a temporal image of a quasi-stable state $\{S\} = c(q, x)$ of $H$ if and only if $J^t_{(q,x),(q',y)}$ is a transition of $H$.</Paragraph> <Paragraph position="21"> Now that we have a neural-network acceptor, we may also want to use it to judge the legality of strings against a given grammar.</Paragraph> <Paragraph position="22"> Definition 5.8 Let $M = (Q, \Sigma, \delta, Q_0, F)$ be an NDA, let $P = (a, \lambda^e, \lambda^t, N, p, p_{max}, t_a, t_d, t_r, U)$ be a parameter list defined according to definition 5.1, let $H = (T, f, U, S)$ be an NNA defined according to definition 5.5 that takes its parameters from $P$, and let $a_i \in \Sigma$, $q \in Q_0$, and $q' \in F$. $H$ is said to accept a string $w = a_1 \cdots a_n$ if and only if $H$ evolves through a series of temporal images that ends at $P_{q'}$ when started at $P_q$, if $OR(P_{a_1}), \ldots, OR(P_{a_n})$ appears as external input for the network.</Paragraph> <Paragraph position="23"> Next the correctness of an NNA is to be proven.</Paragraph> <Paragraph position="24"> Since an NNA is essentially a stochastic machine, this raises some extra problems. What we propose is to let various network parameters approach values for which the NNA has only exact properties, and to prove that the network is correct in the limit. Those &quot;various network parameters&quot; are defined below. Definition 5.9 Let $M = (Q, \Sigma, \delta, Q_0, F)$ be an NDA, let $P = (a, \lambda^e, \lambda^t, N, p, p_{max}, t_a, t_d, t_r, U)$ be a parameter list defined according to definition 5.1. A list of large parameters is a parameter list $P$ such that: 1. $a \equiv c_1/N$ where $c_1 \ll N$ is a constant; 2. $\lambda^e \equiv c_2 N / c_1$, where $c_2$ is a small constant; 3. $0 < \lambda^t < \lambda^e a$, i.e. $0 < \lambda^t < c_2$; 4. $p_{max} \equiv -(a \ln a)^{-1} N$; 5. $N \to \infty$; 6. $t_r/N \to \infty$.</Paragraph>
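The storage prescription itself is not reproduced in this extraction, so the sketch below uses a low-activity Hebbian outer product as a stand-in for one stored transition $c(q,x) \to c(q',y)$ (an assumption), together with the OR input construction and the Monte Carlo single-neuron update of definition 5.5. With the section 6 parameter values the successor code becomes the next quasi-stable state.

```python
import numpy as np

rng = np.random.default_rng(3)
N, a, lam_e, lam_t, U = 800, 0.05, 1.5, 0.07, 1.46   # section 6 parameter values

# Two consecutive codes c(q, x) -> c(q', y): one stored temporal transition.
cur = (rng.random(N) < a).astype(float)
nxt = (rng.random(N) < a).astype(float)

# Stand-in storage prescription for a single transition (assumed, not the paper's formula).
J_t = lam_t * np.outer(nxt - a, cur - a) / (a * (1 - a) * N)

def or_input(patterns):
    """OR(P_x): component-wise OR over all codes containing a given input symbol."""
    return np.clip(np.sum(patterns, axis=0), 0, 1)

def monte_carlo_step(S, temporal_field, external):
    """Update one randomly selected neuron with the threshold rule of definition 5.5."""
    i = rng.integers(N)
    S[i] = 1.0 if temporal_field[i] + lam_e * (external[i] - a) > U else 0.0
    return S

S = np.zeros(N)
field = J_t @ cur                    # temporal image generated by the current code
external = or_input([nxt])           # external input matching the successor code
for _ in range(20 * N):              # roughly 20 sweeps of single-neuron updates
    S = monte_carlo_step(S, field, external)
print(int(S @ nxt), int(nxt.sum()))  # nearly all successor neurons have become active
```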
<Paragraph position="25"> The following lemma states that for neural-network acceptors that take their parameters from a list of large parameters, both the probability that the network reaches the next stable state within relaxation time, and the probability that only the patterns that are temporally related to the previous activity pattern will become active, tend to unity. Essentially it means that such networks operate exactly as prescribed by the synapses. Such networks are intrinsically correct.</Paragraph> <Paragraph position="26"> Lemma 5.10 (intrinsic correctness) Let $M = (Q, \Sigma, \delta, Q_0, F)$ be an NDA, let $P = (a, \lambda^e, \lambda^t, N, p, p_{max}, t_a, t_d, t_r, U)$ be a parameter list defined according to definitions 5.1 and 5.9, and let $H = (T, f, U, S)$ be an NNA defined according to definition 5.5 that takes its parameters from $P$; then $H$ is such that: 1. for all neurons $S_i$ in $H$, $P(S_i \text{ is selected}) \to 1$ during network evolution; and 2. for all activity patterns $\{c\} \in \bigcup P_y$, $y \in Q \cup \Sigma$, that are not in the temporal image of the current activity pattern, $P(S_i = c_i = 1) \to 0$, where $i = 1, \ldots, N$.</Paragraph> <Paragraph position="27"> Then the correctness of an NNA follows.</Paragraph> <Paragraph position="28"> Theorem 5.11 (correctness of the NNA) Let $M = (Q, \Sigma, \delta, Q_0, F)$ be an NDA, let $P = (a, \lambda^e, \lambda^t, N, p, p_{max}, t_a, t_d, t_r, U)$ be a parameter list defined according to definitions 5.1 and 5.9, let $H = (T, f, U, S)$ be an NNA defined according to definition 5.5 that takes its parameters from $P$, and let $w \in \Sigma^+$; then the probability that $M$ accepts a string $w$ if and only if $H$ accepts $w$ tends to unity.</Paragraph> <Paragraph position="29"> The proof of the theorem is given in [4].</Paragraph> </Section> <Section position="7" start_page="0" end_page="0" type="metho"> <SectionTitle> 6 Simulation Results </SectionTitle> <Paragraph position="0"> As an example we constructed an NNA that accepts the language generated by a small example grammar. It takes its parameters from the list: $(0.05, 1.5, 0.07, 800, 64, 6.68N, 5, 5, 5, 1.46)$. It was tested with the sentence &quot;. the baby the woman comforted cried .&quot; The preceding full stop is a first input that awakens the network. The graph below shows the time evolution of the network.</Paragraph> </Section>
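The parameter list quoted in section 6 can be checked directly against definition 5.1. The short sketch below (plain arithmetic, no further assumptions) reproduces the threshold $U = 1.46$ and the $p_{max}$ coefficient $6.68$ from the other values.

```python
import math

# Parameter list from section 6: (a, lam_e, lam_t, N, p, p_max, t_a, t_d, t_r, U).
a, lam_e, lam_t, N, p = 0.05, 1.5, 0.07, 800, 64
t_a, t_d, t_r, U = 5, 5, 5, 1.46

assert 0 < lam_t < lam_e * a                              # definition 5.1, item 4
assert p <= -1 / (a * math.log(a)) * N                    # p does not exceed p_max
print(round(lam_e * (1 - a) + lam_t * (0.5 - a), 2))      # item 9: reproduces U = 1.46
print(round(-1 / (a * math.log(a)), 2))                   # alpha_c ~ 6.68, i.e. p_max = 6.68 N
```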
<Section position="8" start_page="0" end_page="0" type="metho"> <SectionTitle> 7 Complexity Aspects </SectionTitle> <Paragraph position="0"> If a neural-network acceptor has to process a sequence of $n$ input patterns, it (worst case) first has to construct its initial temporal image, when awakened by an initial input (that is not considered a part of the sequence), and then has to build $n$ further temporal images. The time required to process a string of length $n$, as a function of the length of the input sequence, is thus $(\tau - \theta)(n + 1)$. The constant $\tau$ also depends on $t_r$, which is chosen to let the network satisfy a certain probability that it reaches the next state within relaxation time. This probability is a function of $\beta = t_r/N$, and it tends to unity as $\beta$ grows. The time complexity of the neural-network acceptor is $O(n)$.</Paragraph> <Paragraph position="1"> The upper limit on the number $p$ of stored temporal relations between single activity patterns is $|Q|^2 \times |\Sigma|^2$. The number of neurons in a network is then $c \times |Q|^2 \times |\Sigma|^2$, where $c$ depends on the storage capacity and the chosen (low) probability that selection errors occur. The randomly chosen activity patterns overlap, so if a large number of patterns is active they may constitute, by overlap, other unselected activity patterns that will create their own causal consequences. This is called a selection error. The probability that this can happen can be estimated by $P_{error}(n) \approx 1 - P(S_n = 0)$, where $P(S_n = 0)$ is given in closed form in terms of the following quantities: $p$ is defined in terms of $\nu$, the number of activity patterns stored in the network, and $m$, the number of patterns that were supposed to be present in the mixture state; the probability $q = 1 - p$; and $\eta$, a function of $aN$, is the number of patterns that can be constructed from the active neurons in the mix. $S_n$ is the number of wrongly selected activity patterns for a given $n$. $P_{error}(n)$ decreases with increasing $N$ if the other parameters remain fixed.</Paragraph> <Paragraph position="2"> The space complexity of the network, expressed as the number of neurons and as a function of the number of NDA states, is $O(|Q|^2)$. This is large because $Q$ is derived from the configurations of some PDA $M$. However, things could have been worse. Not using mixed temporal images to represent FSA states would necessitate the use of a number of temporal images of order $2^{|Q|}$. So compared to a more conventional use of Hopfield models, this approach yields a reduction of the space complexity of the network.</Paragraph> </Section> </Paper>
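A rough space-complexity bookkeeping along the lines of section 7, with hypothetical NDA sizes; the proportionality constant is taken from the storage-capacity estimate of section 4 and ignores the extra margin needed to keep selection errors rare.

```python
import math

Q, Sigma, a = 10, 8, 0.05            # hypothetical NDA state and alphabet sizes
p_upper = Q**2 * Sigma**2            # upper limit on stored temporal relations
alpha_c = -1 / (a * math.log(a))     # storage capacity per neuron, ~6.68 patterns
neurons = math.ceil(p_upper / alpha_c)
print(p_upper, neurons)              # 6400 relations need on the order of 10^3 neurons
```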