<?xml version="1.0" standalone="yes"?> <Paper uid="C04-1013"> <Title>Partially Distribution-Free Learning of Regular Languages from Positive Samples</Title> <Section position="7" start_page="0" end_page="0" type="concl"> <SectionTitle> 6 Discussion </SectionTitle>
<Paragraph position="0"> The convergence of these sorts of algorithms has been studied before in the identification in the limit framework, but previous proofs have not been completely convincing (Carrasco and Oncina, 1999), and this criterion gives no guide to the practical utility of the algorithms since it applies only asymptotically. The partially distribution-free learning problem we study here is novel, as is the extension of the results of (Ron et al., 1995) to cyclic automata and thus to infinite languages.</Paragraph>
<Paragraph position="1"> Before we examine our results critically, we would like to point out some positive aspects of the algorithm. First, this class of algorithms is in practice efficient and reliable. This particular algorithm is designed to have provably good worst-case performance, and thus we anticipate that its average performance on naturally occurring data will be marginally worse than that of comparable algorithms. We have established that we can learn an exponentially large family of infinite languages using polynomial amounts of data and computation. Mild properties of the input distributions suffice to guarantee learnability. The algorithm we present here is however not intended to be efficient or cognitively plausible: our intention was to find one that allowed a simple proof.</Paragraph>
<Paragraph position="2"> The major weakness of this approach in our opinion is that the parameter n in the sample complexity polynomial is the number of states in the PDFA generating the distribution, and not the number of states in the minimal FA generating the language. Since determinisation of finite automata can cause exponential blow-ups, this is potentially a serious problem, depending on the application domain. A second problem is the need for a distinguishability parameter, which again in specific cases could be exponentially small. An alternative to this is to define a class of μ-distinguishable automata where the distinguishability μ is bounded by an inverse polynomial in the number of states. Formally this is equivalent, but it has the effect of removing the parameter from the sample complexity polynomial at the cost of having a further restriction on the class of distributions. Indeed we can deal with the previous objection in the same way if necessary, by requiring the number of states in the generating PDFA to be bounded by a polynomial in the minimal number of states needed to generate the target language. However, both of these limitations are unavoidable given the negative results previously discussed.</Paragraph>
<Paragraph position="3"> Appendix: Proof of Lemma 1.</Paragraph>
<Paragraph position="4"> We write p(s) for the true probability of a string s and p̂(s) = c(s)/m for its empirical probability in the sample, i.e. the maximum likelihood estimate. We want a bound that holds uniformly over an infinite number of strings, which rules out a naive application of Hoeffding bounds. It will suffice to show that every string with probability less than ε₀/2 will have empirical probability less than ε₀, and that all other (frequent) strings will have empirical probability within ε₀ of their true values. The latter is straightforward, since there are at most 2/ε₀ of these frequent strings.</Paragraph>
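<Paragraph> A minimal Python sketch of the frequent-string half of this argument follows; the function names and the example parameters are ours, not the paper's. It computes the maximum likelihood estimates p̂(s) = c(s)/m and the union-bound failure probability (4/ε₀)e^(−2mε₀²) derived below.

    import math
    from collections import Counter

    def empirical_probs(sample):
        """Maximum likelihood estimates p_hat(s) = c(s)/m for the strings in a sample."""
        m = len(sample)
        return {s: c / m for s, c in Counter(sample).items()}

    def frequent_string_error_bound(m, eps0):
        """Probability that some frequent string (true probability at least eps0/2,
        hence at most 2/eps0 such strings) deviates from its true probability by
        more than eps0: a union bound over per-string Hoeffding bounds,
        (2/eps0) * 2 * exp(-2 * m * eps0**2)."""
        return (4.0 / eps0) * math.exp(-2.0 * m * eps0 ** 2)

For example, frequent_string_error_bound(10000, 0.05) is about 1.5e-20, illustrating how quickly the failure probability decays with the sample size m.</Paragraph>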
<Paragraph position="5"> For any given frequent string s, by Hoeffding bounds, Pr[|p̂(s) − p(s)| > ε₀] ≤ 2e^(−2mε₀²).</Paragraph>
<Paragraph position="6"> So the probability of making an error on any frequent string is less than (4/ε₀)e^(−2mε₀²).</Paragraph>
<Paragraph position="7"> Consider all of the strings whose probability lies in the interval [2^(−(k+1))ε₀, 2^(−k)ε₀), for k ≥ 1; call this set S_k.</Paragraph>
<Paragraph position="9"> We define S_rare = ∪_{k=1}^∞ S_k. The Chernoff bound says that, for any δ > 0, the sum X of n Bernoulli variables with probability p and mean μ = np satisfies Pr[X > (1 + δ)μ] < (e^δ / (1 + δ)^(1+δ))^μ.</Paragraph>
<Paragraph position="11"> Now we bound each group separately, using the binomial Chernoff bound with n = m and (1 + δ)μ = mε₀ > mp (which is true since p < ε₀): Pr[c(s) ≥ mε₀] < (e^δ / (1 + δ)^(1+δ))^(mp) = e^(m(ε₀ − p)) (p/ε₀)^(mε₀).</Paragraph>
<Paragraph position="13"> This bound increases with p, so for every string in S_k we can replace p with the upper end of its interval, 2^(−k)ε₀, and we can replace the size of S_k with its upper bound 2^(k+1)/ε₀, giving Pr[p̂(s) ≥ ε₀ for some s ∈ S_rare] ≤ Σ_{k=1}^∞ (2^(k+1)/ε₀) e^(mε₀(1 − 2^(−k))) (2^(−k))^(mε₀).</Paragraph>
<Paragraph position="15"> This series converges and is exponentially small in mε₀, which establishes the result.</Paragraph>
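<Paragraph> As a sanity check on the rare-string half of the proof, the sketch below evaluates this series numerically in log space; the grouping of S_k and the per-string bound follow the reconstruction above, and the function name is ours.

    import math

    def rare_string_error_bound(m, eps0, k_max=200):
        """Sum of per-group Chernoff bounds over the groups S_k of rare strings
        (strings with probability between eps0 * 2**-(k+1) and eps0 * 2**-k,
        so each group holds at most 2**(k+1) / eps0 strings).  Per string,
        Pr[c(s) >= m * eps0] is at most exp(m*(eps0 - p)) * (p / eps0)**(m * eps0),
        evaluated here at the top of each group's interval, p = eps0 * 2**-k."""
        a = m * eps0  # the threshold count m * eps0 exceeds the mean m * p
        total = 0.0
        for k in range(1, k_max + 1):
            log_group_size = (k + 1) * math.log(2) - math.log(eps0)
            log_per_string = a * (1.0 - 2.0 ** -k) - a * k * math.log(2)
            # cap the exponent so degenerate parameters cannot overflow a float
            total += math.exp(min(log_group_size + log_per_string, 700.0))
        return total

With the same illustrative parameters as before, rare_string_error_bound(10000, 0.05) is on the order of 1e-40, so the rare strings contribute even less to the failure probability than the frequent ones.</Paragraph>
</Section></Paper>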