<?xml version="1.0" standalone="yes"?>
<Paper uid="N06-1043">
  <Title>Cross-Entropy and Estimation of Probabilistic Context-Free Grammars</Title>
  <Section position="3" start_page="335" end_page="335" type="intro">
    <SectionTitle>
2 Preliminaries
</SectionTitle>
    <Paragraph position="0"> Throughout this paper we use standard notation and definitions from the literature on formal languages and probabilistic grammars, which we briefly summarize below. We refer the reader to (Hopcroft and Ullman, 1979) and (Booth and Thompson, 1973) for a more precise presentation.</Paragraph>
    <Paragraph position="1"> A context-free grammar (CFG) is a tuple G = (N,S,R,S), where N is a finite set of nonterminal symbols, S is a finite set of terminal symbols disjoint from N, S [?] N is the start symbol and R is a finite set of rules. Each rule has the form A - a, where A [?] N and a [?] (S [?] N)[?]. We denote by L(G) and T(G) the set of all strings, resp., trees, generated by G. For t [?] T(G), the yield of t is denoted by y(t).</Paragraph>
    <Paragraph position="2"> For a nonterminal A and a string a, we write f(A,a) to denote the number of occurrences of A in a. For a rule (A - a) [?] R and a tree t [?] T(G), f(A - a,t) denotes the number of occurrences of A - a in t. We let f(A,t) =summationtexta f(A - a,t). A probabilistic context-free grammar (PCFG) is a pair G = (G,pG), with G a CFG and pG a function from R to the real numbers in the interval [0,1]. A PCFG is proper if for every A [?] N we havesummationtext a pG(A - a) = 1. The probability of t [?] T(G) is the product of the probabilities of all rules in t, counted with their multiplicity, that is,</Paragraph>
    <Paragraph position="4"> The probability of w [?] L(G) is the sum of the probabilities of all the trees that generate w, that is,</Paragraph>
    <Paragraph position="6"> A PCFG is consistent ifsummationtextt[?]T(G) pG(t) = 1.</Paragraph>
    <Paragraph position="7"> In this paper we write log for logarithms in base 2 and ln for logarithms in the natural base e. We also assume 0 * log0 = 0. We write Ep to denote the expectation operator under distribution p. In case G is proper and consistent, we can define the derivational entropy of G as the expectation of the information of parse trees in T(G), computed under distribution pG, that is,</Paragraph>
    <Paragraph position="9"> Similarly, for each A [?] N we also define the non-terminal entropy of A as</Paragraph>
    <Paragraph position="11"/>
  </Section>
class="xml-element"></Paper>