File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/evalu/06/n06-1044_evalu.xml
Size: 3,788 bytes
Last Modified: 2025-10-06 13:59:39
<?xml version="1.0" standalone="yes"?> <Paper uid="N06-1044"> <Title>for Psycholinguistics</Title> <Section position="6" start_page="346" end_page="347" type="evalu"> <SectionTitle> 5 Consistency </SectionTitle> <Paragraph position="0"> In this section we prove the main results of this paper, namely that all of the estimation methods discussed in Section 3 always provide consistent PCFGs. We start with a technical lemma, central to our results, showing that a PCFG that minimizes the cross-entropy with a distribution over any set of derivations must be consistent.</Paragraph> <Paragraph position="1"> Lemma 2 Let G = (G,pG) be a proper PCFG and let pD be a probability distribution defined over some set D [?] D(G). If G minimizes function H(pD ||pG), then G is consistent.</Paragraph> <Paragraph position="2"> Proof. LetG = (N,S,S,R), and assume thatG is not consistent. We establish a contradiction. SinceG is not consistent, we must havesummationtextd,w pG(S d= w) < 1. Let then R(G) = (G,pR) be the renormalization of G, defined as in (15). For any derivation S d= w, w [?] S[?], with d in D, we can use Lemma 1 and</Paragraph> <Paragraph position="4"> In words, every complete derivation d in D has a probability in R(G) that is strictly greater than in G. But this means H(pD ||pR) < H(pD ||pG), against our hypothesis. Therefore, G is consistent and pG is a probability distribution over set D(G).</Paragraph> <Paragraph position="5"> Thus function H(pD ||pG) can be interpreted as the cross-entropy. (Observe that in the statement of the lemma we have avoided the term 'cross-entropy', sincecross-entropiesareonlydefinedforprobability distributions.) Lemma 2 directly implies that the cross-entropy minimization method in (12) always provides a consistent PCFG, since it minimizes cross-entropy for a distribution defined over a subset of D(G). We have already seen in Section 3 that the supervised MLE method is a special case of the cross-entropy minimization method. Thus we can also conclude that a PCFG trained with the supervised MLE method is alwaysconsistent. Thisprovidesanalternativeproof of a property that was first shown in (Chaudhuri et al., 1983), as discussed in Section 1.</Paragraph> <Paragraph position="6"> We now prove the same result for the unsupervised MLE method, without any restrictive assumption on the rules of our CFGs. This solves a problem that was left open in the literature (Chi and Geman, 1998); see again Section 1 for discussion. Let C and C be defined as in Section 3. We define the empirical distribution of C as</Paragraph> <Paragraph position="8"> Let G = (N,S,S,R) be a CFG such that C [?] L(G). Let D(C) be the set of all complete derivations for G that generate sentences in C, that is,</Paragraph> <Paragraph position="10"> Further, assume some probabilistic extensionG = (G,pG) of G, such that pG(d) > 0 for every d [?] D(C). We define a distribution over D(C) by</Paragraph> <Paragraph position="12"> We now apply to G the estimator in (12), in order to obtain a new PCFG ^G = (G, ^pG) that minimizes the cross-entropy betweenpD(C) and ^pG. According to Lemma 2, we have that ^G is a consistent PCFG.</Paragraph> <Paragraph position="13"> Distribution ^pG is specified by</Paragraph> <Paragraph position="15"/> <Paragraph position="17"> Since distribution pG was arbitrarily chosen, sub-ject to the only restriction that pG(d) > 0 for every d [?] D(C), we have that (23) is the growth estimator (10) already discussed in Section 3. In fact, for each w [?] L(G) and d [?] 
<Paragraph position="18"> We conclude with the desired result, namely that a general form PCFG obtained at any iteration of the EM method for the unsupervised MLE is always consistent.</Paragraph>
</Section>
</Paper>