<?xml version="1.0" standalone="yes"?> <Paper uid="C04-1013"> <Title>Partially Distribution-Free Learning of Regular Languages from Positive Samples</Title>
<Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> 3 Preliminaries </SectionTitle>
<Paragraph position="0"> We will write $\sigma$ for letters and $s$ for strings.</Paragraph>
<Paragraph position="1"> We have a finite alphabet $\Sigma$, and $\Sigma^*$ is the free monoid generated by $\Sigma$, i.e. the set of all strings with letters from $\Sigma$, with the empty string as identity. For $s \in \Sigma^*$ we define $|s|$ to be the length of $s$. The subset of $\Sigma^*$ of strings of length $d$ is denoted by $\Sigma^d$. A distribution or stochastic language $D$ over $\Sigma^*$ is a function $D : \Sigma^* \to [0,1]$ such that $\sum_{s \in \Sigma^*} D(s) = 1$. The $L_\infty$ norm between two distributions is defined as $L_\infty(D_1, D_2) = \max_s |D_1(s) - D_2(s)|$. For a multiset of strings $S$ we write $\hat{S}$ for the empirical distribution defined by that multiset, i.e. the maximum likelihood estimate of the probability of each string.</Paragraph>
<Paragraph position="2"> A probabilistic deterministic finite state automaton (PDFA) is a mathematical object that stochastically generates strings of symbols.</Paragraph>
<Paragraph position="3"> It has a finite number of states, one of which is a distinguished start state. Parsing or generating starts in the start state; at each step the automaton makes a transition, with a certain probability, to another state and emits a symbol. There is a particular symbol and a particular state which correspond to finishing.</Paragraph>
<Paragraph position="4"> A PDFA $A$ is a tuple $(Q, \Sigma, q_0, q_f, \zeta, \tau, \gamma)$, where $Q$ is a finite set of states, $\Sigma$ is the alphabet, a finite set of symbols, $q_0 \in Q$ is the single initial state, $q_f \notin Q$ is the final state, $\zeta \notin \Sigma$ is the final symbol, $\tau : Q \times (\Sigma \cup \{\zeta\}) \to Q \cup \{q_f\}$ is the transition function and $\gamma : Q \times (\Sigma \cup \{\zeta\}) \to [0,1]$ is the next-symbol probability function. $\gamma(q, \sigma) = 0$ when $\tau(q, \sigma)$ is not defined.</Paragraph>
<Paragraph position="5"> We will sometimes refer to automata by their set of states. All transitions that emit $\zeta$ go to the final state. In the following, $\tau$ and $\gamma$ will be extended to strings recursively in the normal way.</Paragraph>
<Paragraph position="6"> The sum of the output transition probabilities from each state must be one: so for all $q \in Q$</Paragraph>
<Paragraph position="7"> $$\sum_{\sigma \in \Sigma \cup \{\zeta\}} \gamma(q, \sigma) = 1 \quad (1)$$</Paragraph>
<Paragraph position="8"> Assuming further that there is a non-zero probability of reaching the final state from each state, i.e.</Paragraph>
<Paragraph position="9"> $$\forall q \in Q \; \exists s \in \Sigma^* : \tau(q, s\zeta) = q_f \wedge \gamma(q, s\zeta) > 0 \quad (2)$$ the PDFA then defines a probability distribution over $\Sigma^*$, where the probability of generating a string $s \in \Sigma^*$ is $P_A(s) = \gamma(q_0, s\zeta)$. We will write $L(A)$ for the support of this distribution, $L(A) = \{ s \in \Sigma^* : P_A(s) > 0 \}$.</Paragraph>
<Paragraph position="11"> For each state $q \in Q$ we write $P_q$ for the distribution obtained by taking $q$ as the start state, $P_q(s) = \gamma(q, s\zeta)$; this is the suffix distribution of the state $q$.</Paragraph>
<Paragraph position="12"> We say that two states $q, q'$ are $\mu$-distinguishable if $L_\infty(P_q, P_{q'}) > \mu$ for some $\mu > 0$. An automaton is $\mu$-distinguishable iff every pair of distinct states is $\mu$-distinguishable. Since we can merge states $q, q'$ which have $L_\infty(P_q, P_{q'}) = 0$, we can assume without loss of generality that every PDFA has a non-zero distinguishability.</Paragraph>
<Paragraph position="13"> Note that $\gamma(q_0, s)$, where $s \in \Sigma^*$, is the prefix probability of the string $s$, i.e. the probability that the automaton will generate a string that starts with $s$.</Paragraph>
<Paragraph position="14"> We will use a similar notation, neglecting the probability function $\gamma$, for (non-probabilistic) deterministic finite-state automata (DFAs).</Paragraph>
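To make these definitions concrete, the following is a minimal Python sketch of a PDFA, of the string probability $P_A(s) = \gamma(q_0, s\zeta)$, of the prefix probability $\gamma(q_0, s)$, and of the $L_\infty$ distance used to define $\mu$-distinguishability. It is not code from the paper: the class, the dictionary-based encoding of $\tau$ and $\gamma$, and the example automaton are illustrative assumptions.

import random

# A minimal illustration of the PDFA definition above (not code from the paper):
# tau and gamma are dictionaries keyed by (state, symbol); FINAL stands in for
# the final symbol zeta and is assumed not to occur in the alphabet Sigma.
FINAL = "$"

class PDFA:
    def __init__(self, q0, tau, gamma):
        self.q0 = q0          # initial state q_0
        self.tau = tau        # transition function tau: (state, symbol) -> state
        self.gamma = gamma    # next-symbol probabilities gamma: (state, symbol) -> [0, 1]

    def _walk(self, s):
        """Follow s from the start state; return (probability so far, state reached)."""
        p, q = 1.0, self.q0
        for a in s:
            p *= self.gamma.get((q, a), 0.0)
            if p == 0.0:
                return 0.0, None
            q = self.tau[(q, a)]
        return p, q

    def prefix_prob(self, s):
        """gamma(q0, s): probability of generating a string that starts with s."""
        return self._walk(s)[0]

    def prob(self, s):
        """P_A(s) = gamma(q0, s zeta): probability of generating exactly s."""
        p, q = self._walk(s)
        return 0.0 if q is None else p * self.gamma.get((q, FINAL), 0.0)

    def sample(self):
        """Generate one string from the distribution defined by the PDFA."""
        q, out = self.q0, []
        while True:
            symbols = [a for (state, a) in self.gamma if state == q]
            weights = [self.gamma[(q, a)] for a in symbols]
            a = random.choices(symbols, weights)[0]
            if a == FINAL:
                return "".join(out)
            out.append(a)
            q = self.tau[(q, a)]

def l_inf(d1, d2):
    """L_infinity distance between two distributions given as string -> probability dicts."""
    support = set(d1) | set(d2)
    return max((abs(d1.get(s, 0.0) - d2.get(s, 0.0)) for s in support), default=0.0)

# Example: a PDFA over {a, b} generating a b^k with probability 2^-(k+1).
tau = {("q0", "a"): "q1", ("q1", "b"): "q1", ("q1", FINAL): "qf"}
gamma = {("q0", "a"): 1.0, ("q1", "b"): 0.5, ("q1", FINAL): 0.5}
A = PDFA("q0", tau, gamma)
print(A.prob("ab"), A.prefix_prob("ab"))   # 0.25 0.5

Here the two states q0 and q1 have different suffix distributions, so for any $\mu$ below their $L_\infty$ distance the automaton is $\mu$-distinguishable.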
</Section>
<Section position="5" start_page="0" end_page="0" type="metho"> <SectionTitle> 4 Algorithm </SectionTitle>
<Paragraph position="0"> We shall first state our main result.</Paragraph>
<Paragraph position="1"> Theorem 1 For any regular language $L$, when samples are generated by a PDFA $A$ where $L(A) = L$, with distinguishability $\mu$ and number of states $n$, for any $\epsilon, \delta > 0$, the algorithm LearnDFA will with probability at least $1 - \delta$ return a DFA $H$ which defines a language $L(H)$ that is a subset of $L$ with $P_A(L(A) \setminus L(H)) < \epsilon$.</Paragraph>
<Paragraph position="2"> The algorithm will draw a number of samples bounded by a polynomial in $|\Sigma|$, $n$, $1/\epsilon$, $1/\delta$, $1/\mu$, and the computation is bounded by a polynomial in the number of samples and the total length of the strings in the sample.</Paragraph>
<Paragraph position="3"> We now define the algorithm LearnDFA. We incrementally construct a sequence of DFAs that generate subsets of the target language. Each state of the hypothesis automaton will represent a state of the target and will have attached a multiset of strings that approximates the distribution of strings generated from that state. We calculate two quantities from the input parameters: a multiset-size threshold $m_0$ and a sample size $N$.</Paragraph>
<Paragraph position="5"> We start with an automaton that consists of a single state and no transitions, and the attached multiset is a sample of strings from the target.</Paragraph>
<Paragraph position="6"> At each step we sample $N$ strings from the target distribution. This re-sampling ensures the independence of all of the samples, and allows us to apply bounds in a straightforward way.</Paragraph>
<Paragraph position="7"> For each state $u$ in the hypothesis automaton and each letter $\sigma$ in the alphabet such that there is no arc labelled with $\sigma$ out of $u$, we construct a candidate node $(u, \sigma)$, which represents the state reached from $u$ by the transition labelled with $\sigma$.</Paragraph>
<Paragraph position="8"> For each string in the sample, we trace the corresponding path through the hypothesis. When we reach a candidate node, we remove the preceding part of the string and add the rest to the multiset of the candidate node. Otherwise, when the string terminates within the hypothesis automaton, we discard it.</Paragraph>
<Paragraph position="9"> After we have done this for every string in the sample, we select a candidate node $(u, \sigma)$ that has a multiset of size at least $m_0$. If there is no such candidate node, the algorithm terminates. Otherwise we compare this candidate node with each of the nodes already in the hypothesis. The comparison calculates the $L_\infty$-norm between the empirical distributions of the two multisets and declares them similar if this distance is less than $\mu/4$. We will make sure that with high probability these empirical distributions are close in the $L_\infty$-norm to the suffix distributions of the states they represent.</Paragraph>
<Paragraph position="10"> Since we know that the suffix distributions of different states will be at least $\mu$ apart, we can be confident that we will only rarely make mistakes. If there is a node $v$ which is similar, then we conclude that $v$ and $(u, \sigma)$ represent the same state, and we add an arc labelled with $\sigma$ leading from $u$ to $v$. If the candidate is not similar to any node in the hypothesis, then we conclude that it represents a new state: we create a new node $u'$, add an arc labelled with $\sigma$ leading from $u$ to $u'$, and attach the multiset of the candidate node to the new node in the hypothesis. Intuitively, this multiset will be a sample from the suffix distribution of the target state that it represents. We then discard all of the candidate nodes and their associated multisets, but keep the multisets attached to the states of the hypothesis, and repeat.</Paragraph>
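Read as pseudocode, one iteration of the procedure just described might look as follows. This is a hedged sketch rather than the authors' implementation: it reuses l_inf from the earlier sketch, sample_target() is an assumed routine that draws one string from the target PDFA, and the dictionary-based representation of the hypothesis (states, arcs and attached multisets) is an illustrative choice.

from collections import defaultdict

# Sketch of a single LearnDFA iteration as described above; not the authors'
# implementation. l_inf comes from the previous sketch, sample_target() is an
# assumed sampler for the target PDFA, and the data layout is illustrative.

def empirical(multiset):
    """Empirical distribution (maximum likelihood estimate) of a multiset of strings."""
    counts = defaultdict(int)
    for s in multiset:
        counts[s] += 1
    return {s: c / len(multiset) for s, c in counts.items()}

def one_iteration(states, arcs, multisets, sample_target, N, m0, mu):
    """states[0] is the start state; arcs maps (state, letter) -> state;
    multisets maps each hypothesis state to the suffix strings attached to it."""
    # Route each sampled string to the candidate node (u, letter) at which it
    # leaves the hypothesis, keeping only the suffix after the missing transition.
    candidates = defaultdict(list)
    for _ in range(N):
        s = sample_target()
        q = states[0]
        for i, a in enumerate(s):
            if (q, a) in arcs:
                q = arcs[(q, a)]
            else:
                candidates[(q, a)].append(s[i + 1:])
                break
        # Strings parsed entirely within the hypothesis are discarded.

    # Select a candidate node whose multiset is large enough.
    big = [c for c, ms in candidates.items() if len(ms) >= m0]
    if not big:
        return False              # no candidate reaches m0: the algorithm terminates
    u, a = big[0]
    cand_dist = empirical(candidates[(u, a)])

    # Compare with every node already in the hypothesis: similar means L_inf < mu/4.
    for v in states:
        if l_inf(cand_dist, empirical(multisets[v])) < mu / 4:
            arcs[(u, a)] = v      # same target state: just add the arc labelled a
            return True

    # Otherwise create a new node, add the arc, and attach the candidate's multiset.
    new_state = "q%d" % len(states)
    states.append(new_state)
    arcs[(u, a)] = new_state
    multisets[new_state] = candidates[(u, a)]
    return True                   # candidate multisets are discarded, hypothesis multisets kept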
</Section> </Paper>