File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/evalu/06/n06-1044_evalu.xml
Size: 3,788 bytes
Last Modified: 2025-10-06 13:59:39
<?xml version="1.0" standalone="yes"?> <Paper uid="N06-1044"> <Title>for Psycholinguistics</Title> <Section position="6" start_page="346" end_page="347" type="evalu"> <SectionTitle> 5 Consistency </SectionTitle> <Paragraph position="0"> In this section we prove the main results of this paper, namely that all of the estimation methods discussed in Section 3 always provide consistent PCFGs. We start with a technical lemma, central to our results, showing that a PCFG that minimizes the cross-entropy with a distribution over any set of derivations must be consistent.</Paragraph> <Paragraph position="1"> Lemma 2 Let G = (G,pG) be a proper PCFG and let pD be a probability distribution defined over some set D [?] D(G). If G minimizes function H(pD ||pG), then G is consistent.</Paragraph> <Paragraph position="2"> Proof. LetG = (N,S,S,R), and assume thatG is not consistent. We establish a contradiction. SinceG is not consistent, we must havesummationtextd,w pG(S d= w) < 1. Let then R(G) = (G,pR) be the renormalization of G, defined as in (15). For any derivation S d= w, w [?] S[?], with d in D, we can use Lemma 1 and</Paragraph> <Paragraph position="4"> In words, every complete derivation d in D has a probability in R(G) that is strictly greater than in G. But this means H(pD ||pR) < H(pD ||pG), against our hypothesis. Therefore, G is consistent and pG is a probability distribution over set D(G).</Paragraph> <Paragraph position="5"> Thus function H(pD ||pG) can be interpreted as the cross-entropy. (Observe that in the statement of the lemma we have avoided the term 'cross-entropy', sincecross-entropiesareonlydefinedforprobability distributions.) Lemma 2 directly implies that the cross-entropy minimization method in (12) always provides a consistent PCFG, since it minimizes cross-entropy for a distribution defined over a subset of D(G). We have already seen in Section 3 that the supervised MLE method is a special case of the cross-entropy minimization method. Thus we can also conclude that a PCFG trained with the supervised MLE method is alwaysconsistent. Thisprovidesanalternativeproof of a property that was first shown in (Chaudhuri et al., 1983), as discussed in Section 1.</Paragraph> <Paragraph position="6"> We now prove the same result for the unsupervised MLE method, without any restrictive assumption on the rules of our CFGs. This solves a problem that was left open in the literature (Chi and Geman, 1998); see again Section 1 for discussion. Let C and C be defined as in Section 3. We define the empirical distribution of C as</Paragraph> <Paragraph position="8"> Let G = (N,S,S,R) be a CFG such that C [?] L(G). Let D(C) be the set of all complete derivations for G that generate sentences in C, that is,</Paragraph> <Paragraph position="10"> Further, assume some probabilistic extensionG = (G,pG) of G, such that pG(d) > 0 for every d [?] D(C). We define a distribution over D(C) by</Paragraph> <Paragraph position="12"> We now apply to G the estimator in (12), in order to obtain a new PCFG ^G = (G, ^pG) that minimizes the cross-entropy betweenpD(C) and ^pG. According to Lemma 2, we have that ^G is a consistent PCFG.</Paragraph> <Paragraph position="13"> Distribution ^pG is specified by</Paragraph> <Paragraph position="15"/> <Paragraph position="17"> Since distribution pG was arbitrarily chosen, sub-ject to the only restriction that pG(d) > 0 for every d [?] D(C), we have that (23) is the growth estimator (10) already discussed in Section 3. In fact, for each w [?] L(G) and d [?] 
<Paragraph position="18"> We conclude with the desired result, namely that a general form PCFG obtained at any iteration of the EM method for the unsupervised MLE is always consistent.</Paragraph>
</Section>
</Paper>