<?xml version="1.0" standalone="yes"?>
<Paper uid="C67-1036">
  <Title>THE ENTROPY OF RECURSIVE MARKOV PROCESSES</Title>
  <Section position="1" start_page="0" end_page="0" type="metho">
    <SectionTitle>
BENNY BRODDA
</SectionTitle>
    <Paragraph position="0"> The work reported in this paper has been sponsored by Humanistiska forskningsr~det, Tekniska forskningsr~det and Riksbankens Jubileumsfond, Stockholm, Sweden. '.</Paragraph>
    <Paragraph position="1"> \</Paragraph>
  </Section>
  <Section position="2" start_page="0" end_page="0" type="metho">
    <SectionTitle>
THE ENTROPY OF RECURSIVE MARKOV PROCESSES
By
BENNY BRODDA
</SectionTitle>
    <Paragraph position="0"> KVAL, Fack, Stockholm 40, Sweden Summary The aim of this communication is to obtain an explicit formula for calculating the entropy of a source which behaves in accordance with the rules of an arbitrary Phrase Structure Grammar, in which relative probabilities are attached to the rules in the grammar. With this aim in mind we introduce an alte~rnative definition of the concept of a PSG as a set of self-embedded (re-Cursive) Finite State Grammars; when the probabilities are taken into account in such a grammar we call it a Recursive Markov Process.</Paragraph>
    <Paragraph position="1"> 1. In the first section we give a more detailed definition of what kind of Markov Processes we are going to generalize later on (in sec. 3), and we also outline the concept of entropy in an ordinary Markov source. More details &amp;quot;of information may be foupd~ e.g., in Khinchins &amp;quot;Mathematical Foundations of Information Theory&amp;quot;, N.Y. ~ 1957~ or &amp;quot;Information Theory&amp;quot; by R. Ash, N. Y. , 1965.</Paragraph>
    <Paragraph position="2"> A Markov Grammar is defined as a Markov Source with the following propertie s : Assume that there are n+ 1 states, say S O , S1, ..., Sn, in the source. S O is defined as the initial state and S is defined as the final state and the other n states are called intermediate states. We shall, of course, also have a transition matrix, M = (Pij), containing the, transition probabilities of the source. a) A transition from state S i to state S k is always accompanied by a production of a (non-zero) letter aik from a given finite alphabet. Transition to different states from one given state alway s produce different letters.</Paragraph>
    <Paragraph position="3"> b) From the&amp;quot; initial state, S0~ direct or indirect transitions should be possible to any other state in the source. From no state is a transition to S O allowed. c) From any state, direct or indirect transitions to the final state S should n be possible. From S n no transition is allowed to any other state (S n is an &amp;quot;absorbing state&amp;quot;).</Paragraph>
    <Paragraph position="4"> The work reported in this paper has been sponsored by Humanistiska forskningsr~det, Tekniska forskningsr~det and Riksbankens Jubileumsfond, Stockholm, Swederi.</Paragraph>
    <Paragraph position="5"> A (grammatical) sente'nce should now be defined as the (left-to-right) concatenation of the letters produced by the source, when passing from the initial state to the final state.</Paragraph>
    <Paragraph position="6"> The length of a sentence is defined as the number of letters in the sentence. To simplify matters without dropping much of generality we also require that d) The greatest common divisor for all the possible lengths of sentences is = l (i.e., the source becomes an aperiodic source, if it is short-circuited by identifying the final and initial states). ~-With the properties a - d above, the source obtained by identifying the final and initial states is an indecomposable, ergodic Markov process (cf. Feller, &amp;quot;Probability Theory and Its Applications&amp;quot;, ch. 15, N. Y. s 1950). In the transition matrix M for a Markov grammar of our type all elements in the first column are zero, and in the last row all elements are zero except the last one which is = 1. For a given Markov grammar we define the uncertainty or entropy, Hi, for each state S i, i = 0, 1 .... , n, as:</Paragraph>
    <Paragraph position="8"> We also define the entropy, H or H(M), for the grammar as</Paragraph>
    <Paragraph position="10"> where x = (x0, x z, ..., Xn_l) is defined as the stationary distribution-~ the source obtained when S O and S n are identified; thus x is defined as the (unique) solution to the set of simultaneous equations</Paragraph>
    <Paragraph position="12"> where M 1 is formed by shifting the last and first columns and then omitting the last row and column. The mean sentence length. ~, of the set of grammatical sentences can now be easily calculated as</Paragraph>
    <Paragraph position="14"> (cf. Feller, op. tit.)</Paragraph>
  </Section>
  <Section position="3" start_page="0" end_page="6" type="metho">
    <SectionTitle>
2. Embedded Grammars
</SectionTitle>
    <Paragraph position="0"> We now assume that we have two Markov grammars, M and M1, with states S O , S 1 .... , S n, and T o , T I, ..., T m, respectively, where S O ands n, T O and T m are the corresponding initial and final states. Now consider two states S i and S k in the grammar M; assume that the corresponding transition probability is = Pik&amp;quot; We now transform the grammar, M1, into a new one, M\], by embedding the grammar M 2 in M 1 between the states S i and Sk, an operation which is performed by identifying the states T O and T with the m states S i and S k respectively. Or, to be more precise, assume that in the grammar M 1 the transitions to the states Tj, j~l, has the probabilities q0j&amp;quot; Then, in the grammar M', transitions to a state T. from the state S. will 3 1 take place with the probability =.Pikq0 j. A return to the state S k in the &amp;quot;main&amp;quot; grammar from an intermediate state Tj in M 1 takes place with the probability qjm&amp;quot; With the conditions above fulfilled, we propose that the entropy for the. composed grammar be calculated according to the formula:</Paragraph>
    <Paragraph position="2"> the inherent probability of being in the state S. under the same conditions.</Paragraph>
    <Paragraph position="3">  ~1 is the mean sentence length of the sentences produced by the grammar M 1 alone. (It is quite natural that this number appears as a weight in the formula, since if one is producing a sentence according to the grammar M and arrives at the state S i and from there &amp;quot;dives&amp;quot; into the grammar M1, then ~1 is the expected waiting time for emerging again in the main grammar M.) The factor xiPik may be interpreted as the combined probability of ever arriving at.S i and there choosing the path over to M 1 (you may, of course, choose quite another path from Si).</Paragraph>
    <Paragraph position="4"> The proof of formula (4) is very'straightforward, once the premises according to the above have been given, and we omit it here, as it does not give much extra insight to the theory. THe formula may be extended to the case when there are:more than one sub-grammar embedded in the grammar M', by adding similar terms as the one standing, to the right in the numerator and the denominator. The important thing here is that the factors of the type x.p.~ depend only on the probability matrix for the grammar M and are de- 1  1 pendent of the sub-grammars involved.</Paragraph>
    <Paragraph position="5"> 3. Recur sive or Self-embedded Sources ~-. ....</Paragraph>
    <Paragraph position="6">  It is now quite natural to allow a grammar to have itself as a sub-grammar or to allow a grammar M 1&amp;quot; to contain a grammar M~. which, in its turn, contains M 1, and so on. The grammars thus obtained cannot, howeverB be re-written as an ordinary Markov grammar. The relation between an ordinary Markov grammar and a recursive one is~exactly similar to the relation between Finite state Languages and Phrase Structure Languages.</Paragraph>
    <Paragraph position="7"> To be more precise, assume that we have a set of Markov grammars M~ M l ..... M~ where MI 0 is called the main grammar and in the sense that the process always starts at the initial state in M ~ and ceases when it reaches the final state in M 0. Each of the grammars may contain any number of the others (and itself) as sub-grammars. The only restriction is that from any state in any one of the grammars there should exist a path which ends up at the final state of M O.</Paragraph>
    <Section position="1" start_page="1" end_page="6" type="sub_section">
      <SectionTitle>
Remark
</SectionTitle>
      <Paragraph position="0"> If we interpret a source of our kind as a Phrase Structure Language, the re-writing rules are all of the following kind: (5) S i -* Aik + Sk o..r_r S n -, #; where the S' s are all non-terminal symbols. (They stand for the names of the states in the sources - M~, l~i I ..... M~and where S O is assumed to be the initial symbol /the Chomskyan S/ and S n is the terminating state which produces the sentence delimiter #. The symbols Aik ar e either terminal symbols /letters from a finite alphabet/ or non-terminal symbols equal to the name of the initial state in one of the grammars M~, Ni~ ..... M~ /one may  also say that Aik grammar/.) :i: stands as an abbreviation for an arbitrary sentence of that We associate each grammar M! with the grammar M., j = 0, 12 .... , N, by 3 3 just considering it as a non-recursive one, thaf is, we consider all the symbols Aik as terminal symbols (even if they are:'not). The grammars thus obtained are ordinarily Markov grammars according to our definition, and the entropies Hj = H(Mj) are easily computed according to formula (1), as are the stationary distributions /formula (2)/. The follwoing theorem shows how the entropies H! for the fully recursive grammars M! are connected with the J 3 numbers H.. J Theorem The entropy H! for a set of recursive Markov grammar Mj, j = 0, 1, J can be calculated according to the formula</Paragraph>
      <Paragraph position="2"> j=0, 1 .... ,N.</Paragraph>
      <Paragraph position="3"> Here the factors Yjk are dependent only of the probability matrix of the * grammar and the numbers ~k defined as the mean sentence length of the sentences of the grammar M~, k = 0, 1, .... N, and computable according to lemma below.</Paragraph>
      <Paragraph position="4"> H~ is the entropy for the grammar.</Paragraph>
      <Paragraph position="5"> The theorem above is a direct application for the grammar of formula (4), sec. 2.</Paragraph>
      <Paragraph position="6"> The coefficients Yjk in formula (6) can, more precisely, be calculated as a sum of terms of the type xiPim with the indices (i, m) are where the gram!&amp;quot; x i and are the components the sta- mar M~ appears in the grammar M3~ Pim tionary distribution and probability matrix for the grammar M.o , J Assume now that we have a Markov grammar of our type, but for which each transition will take a certain amount of time. A very natural question is then: &amp;quot;What is the expected time to produce a sentence in that language ?&amp;quot; The answer is in the following lemma.</Paragraph>
      <Paragraph position="7"> Lemma Let M be a_MMarkov grammar with states Si, i= O, S are the initial and final states respectively, n 1 .... , n, where S O and Assume that each transition S i -. S k will take Ylk time units. Denote the expected time for arrival at S given that the grammar is in state n S i by ti, i = 0, I, ...~ n~ (thus t o is the expected time for producing asentence). The times t I will then fulfill the following set of simultaneously linear</Paragraph>
      <Paragraph position="9"> With more convenient notations we can write (7) as</Paragraph>
      <Paragraph position="11"> where E is the unit matrix, P is the probability matrix (with P = 0) and nn Pt is the vector with components Pi (t) =~ Pim tim' i = 0, 1 .... , n.</Paragraph>
      <Paragraph position="12"> m The application of ~he lemma for computing the numbers ~k in formula (6) is now the following.</Paragraph>
      <Paragraph position="13"> The transition times of the lemma are, of course, the expected time (or &amp;quot;lengths&amp;quot; as we have called it earlier) for passing via a sub-grammar of the grammar under consideration. Thus the number tik i-~\]itself the unknown entitle s ~k&amp;quot;  For each of the sub-grammars M~, j : 0, I, ..., N, we geta set of linear J equations of type (7) for determining the vectors t of 1emma. The first component of this vector, i.e.j the number t O , is then equal to the expected length, ~, of the sentences of that g~ammar. (Unfortunately, we have to compute extra the expected time for going from any state of the sub-grammars to the corresponding final state.) The total number of unknowns involved when computing the entropy of our grammar (i. e. , the entropy H~) is equal to (the total number of states in all our sub-grammars) plus (the number of sub-grammars).</Paragraph>
      <Paragraph position="14"> This is also the number of equatior~,_for we haven + 1 e~uations from formula (6) and then (n + 1) sets of equations of the type (7). We assert that all these simultaneous equations a~e solvable, if the grammar fulfills the conditions we earlier stated for the grammar, i.e., that from 'each state in any sub-grammar exists at least one path to the final state of that grammar.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>