<?xml version="1.0" standalone="yes"?>
<Paper uid="N06-1044">
  <Title>for Psycholinguistics</Title>
  <Section position="3" start_page="343" end_page="343" type="intro">
    <SectionTitle>
2 Preliminaries
</SectionTitle>
    <Paragraph position="0"> In this paper we use mostly standard notation, as for instance in (Hopcroft and Ullman, 1979) and (Booth and Thompson, 1973), which we summarize below.</Paragraph>
    <Paragraph position="1"> A context-free grammar (CFG) is a 4-tuple $G = (N, \Sigma, S, R)$, where $N$ and $\Sigma$ are finite disjoint sets of nonterminal and terminal symbols, respectively, $S \in N$ is the start symbol, and $R$ is a finite set of rules. Each rule has the form $A \rightarrow \alpha$, where $A \in N$ and $\alpha \in (\Sigma \cup N)^*$. We write $V$ for the set $\Sigma \cup N$.</Paragraph>
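As an illustration only, the 4-tuple above could be represented in Python along the following lines. The names Rule, CFG, vocabulary, is_well_formed, and the toy grammar are assumptions made for this sketch and do not come from the paper.

```python
from dataclasses import dataclass


# A rule A -> alpha: a left-hand-side nonterminal and a tuple of right-hand-side symbols.
@dataclass(frozen=True)
class Rule:
    lhs: str
    rhs: tuple


@dataclass(frozen=True)
class CFG:
    nonterminals: frozenset  # N
    terminals: frozenset     # Sigma
    start: str               # S
    rules: tuple             # R, a finite collection of Rule objects

    def vocabulary(self):
        # V is the union of Sigma and N.
        return self.terminals | self.nonterminals

    def is_well_formed(self):
        # N and Sigma must be disjoint, S must be in N, and every rule
        # must rewrite a nonterminal into a string over V.
        v = self.vocabulary()
        return (
            self.nonterminals.isdisjoint(self.terminals)
            and self.start in self.nonterminals
            and all(
                r.lhs in self.nonterminals and all(x in v for x in r.rhs)
                for r in self.rules
            )
        )


# Hypothetical toy grammar: S -> a S b | (empty string)
toy = CFG(
    nonterminals=frozenset({"S"}),
    terminals=frozenset({"a", "b"}),
    start="S",
    rules=(Rule("S", ("a", "S", "b")), Rule("S", ())),
)
```

Frozen dataclasses are used here so that rules are hashable and can later serve as dictionary keys, for instance when attaching probabilities to them.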
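    <Paragraph position="2"> Each CFG $G$ is associated with a left-most derive relation $\Rightarrow$, defined on triples consisting of two strings $\gamma, \delta \in V^*$ and a rule $\pi \in R$. We write $\gamma \stackrel{\pi}{\Rightarrow} \delta$ if and only if $\gamma = u A \gamma'$ and $\delta = u \alpha \gamma'$, for some $u \in \Sigma^*$, $\gamma' \in V^*$, and $\pi = (A \rightarrow \alpha)$. A left-most derivation for $G$ is a string $d = \pi_1 \cdots \pi_m$, $m \geq 0$, such that $\gamma_0 \stackrel{\pi_1}{\Rightarrow} \gamma_1 \stackrel{\pi_2}{\Rightarrow} \cdots \stackrel{\pi_m}{\Rightarrow} \gamma_m$, for some $\gamma_0, \ldots, \gamma_m \in V^*$; $d = \epsilon$ (where $\epsilon$ denotes the empty string) is also a left-most derivation. In the remainder of this paper, we will let the term derivation always refer to left-most derivation. If $\gamma_0 \stackrel{\pi_1}{\Rightarrow} \cdots \stackrel{\pi_m}{\Rightarrow} \gamma_m$ for some $\gamma_0, \ldots, \gamma_m \in V^*$, then we say that $d = \pi_1 \cdots \pi_m$ derives $\gamma_m$ from $\gamma_0$ and we write $\gamma_0 \stackrel{d}{\Rightarrow} \gamma_m$; $d = \epsilon$ derives any $\gamma_0 \in V^*$ from itself.</Paragraph>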
    <Paragraph position="3"> A (left-most) derivation $d$ such that $S \stackrel{d}{\Rightarrow} w$, $w \in \Sigma^*$, is called a complete derivation. If $d$ is a complete derivation, we write $y(d)$ to denote the (unique) string $w \in \Sigma^*$ such that $S \stackrel{d}{\Rightarrow} w$. We define $D(G)$ to be the set of all complete derivations for $G$. The language generated by $G$ is the set of all strings derived by complete derivations, i.e., $L(G) = \{ y(d) \mid d \in D(G) \}$.</Paragraph>
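The left-most derive relation and the yield of a complete derivation can be sketched as follows. This is a minimal illustration in which rules are plain (lhs, rhs) tuples; the function names leftmost_step, derive, and yield_of, as well as the toy grammar, are hypothetical and not taken from the paper.

```python
def leftmost_step(form, rule, nonterminals):
    """Apply rule = (lhs, rhs) to the left-most nonterminal of a sentential form.

    Returns the new form, or None if the left-most nonterminal is not lhs
    (or if the form contains no nonterminal at all).
    """
    lhs, rhs = rule
    for i, symbol in enumerate(form):
        if symbol in nonterminals:
            if symbol != lhs:
                return None
            # Everything before position i is a terminal prefix u.
            return form[:i] + tuple(rhs) + form[i + 1:]
    return None


def derive(start_form, derivation, nonterminals):
    """Apply the rules of d = pi_1 ... pi_m in order; None if d is not applicable."""
    form = tuple(start_form)
    for rule in derivation:
        form = leftmost_step(form, rule, nonterminals)
        if form is None:
            return None
    return form


def yield_of(derivation, start, nonterminals):
    """Return y(d) if d is a complete derivation from the start symbol, else None."""
    form = derive((start,), derivation, nonterminals)
    if form is None or any(x in nonterminals for x in form):
        return None
    return form


# Hypothetical toy grammar: S -> a S b | (empty string)
N = {"S"}
r1 = ("S", ("a", "S", "b"))
r2 = ("S", ())
```

With this toy grammar, yield_of([r1, r1, r2], "S", N) gives the string a a b b as a tuple of symbols, while the empty derivation leaves any sentential form unchanged.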
    <Paragraph position="5"> There is a one-to-one correspondence between complete derivations and parse trees for strings in $L(G)$.</Paragraph>
    <Paragraph position="6"> For $X \in V$ and $\alpha \in V^*$, we write $f(X, \alpha)$ to denote the number of occurrences of $X$ in $\alpha$. For $(A \rightarrow \alpha) \in R$ and a derivation $d$, $f(A \rightarrow \alpha, d)$ denotes the number of occurrences of $A \rightarrow \alpha$ in $d$.</Paragraph>
    <Paragraph position="7"> We let $f(A, d) = \sum_{\alpha} f(A \rightarrow \alpha, d)$.</Paragraph>
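The three counting functions can be written down directly. The sketch below assumes that rules are (lhs, rhs) tuples and that a derivation is a list of such rules; the names count_symbol, count_rule, and count_lhs are chosen for this example only.

```python
from collections import Counter


def count_symbol(x, alpha):
    """f(X, alpha): number of occurrences of the symbol X in the string alpha."""
    return sum(1 for symbol in alpha if symbol == x)


def count_rule(rule, derivation):
    """f(A -> alpha, d): number of occurrences of the rule in the derivation d."""
    return Counter(derivation)[rule]


def count_lhs(a, derivation):
    """f(A, d): the sum over alpha of f(A -> alpha, d), i.e. how often A is rewritten in d."""
    return sum(1 for lhs, _rhs in derivation if lhs == a)


# Hypothetical toy grammar and derivation: S -> a S b | (empty string)
r1 = ("S", ("a", "S", "b"))
r2 = ("S", ())
d = [r1, r1, r2]
```

For this derivation, count_rule(r1, d) is 2, count_rule(r2, d) is 1, and count_lhs("S", d) is their sum, 3.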
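    <Paragraph position="8"> A probabilistic CFG (PCFG) is a pair $\mathcal{G} = (G, p_{\mathcal{G}})$, where $G$ is a CFG and $p_{\mathcal{G}}$ is a function from $R$ to real numbers in the interval $[0,1]$. We say that $\mathcal{G}$ is proper if, for every $A \in N$, we have $\sum_{\alpha} p_{\mathcal{G}}(A \rightarrow \alpha) = 1$.</Paragraph>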
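A properness check might look like the sketch below. It assumes the usual reading of properness, namely that the probabilities of all rules sharing a left-hand side sum to one, and it represents the rule probabilities as a dictionary keyed by (lhs, rhs) tuples. The name is_proper and the tolerance parameter tol are assumptions of this example.

```python
import math
from collections import defaultdict


def is_proper(rule_probs, nonterminals, tol=1e-9):
    """Check that, for every nonterminal A, the probabilities of the rules A -> alpha sum to 1.

    rule_probs maps (lhs, rhs) tuples to probabilities; tol absorbs rounding error.
    """
    totals = defaultdict(float)
    for (lhs, _rhs), p in rule_probs.items():
        totals[lhs] += p
    # A nonterminal without any rule has total 0.0 and therefore fails the check.
    return all(math.isclose(totals[a], 1.0, abs_tol=tol) for a in nonterminals)


# Hypothetical toy PCFG: S -> a S b (0.4) | empty string (0.6)
probs = {("S", ("a", "S", "b")): 0.4, ("S", ()): 0.6}
```

Here is_proper(probs, {"S"}) holds, whereas lowering either probability without raising the other makes the check fail.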
    <Paragraph position="10"> Function $p_{\mathcal{G}}$ can be used to assign probabilities to derivations of the underlying CFG $G$, in the following way. For $d = \pi_1 \cdots \pi_m \in R^*$, $m \geq 0$, we define $p_{\mathcal{G}}(d) = \prod_{i=1}^{m} p_{\mathcal{G}}(\pi_i)$.</Paragraph>
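Assuming the standard definition, in which the probability of a derivation is the product of the probabilities of the rules it uses, the computation is a one-liner. The name derivation_probability and the toy probabilities are hypothetical.

```python
import math


def derivation_probability(derivation, rule_probs):
    """Product of the rule probabilities along d = pi_1 ... pi_m.

    The empty derivation (m = 0) gets probability 1, the empty product.
    """
    return math.prod(rule_probs[rule] for rule in derivation)


# Hypothetical toy PCFG: S -> a S b (0.4) | empty string (0.6)
r1 = ("S", ("a", "S", "b"))
r2 = ("S", ())
probs = {r1: 0.4, r2: 0.6}
```

For the derivation [r1, r1, r2], which yields a a b b, this multiplies 0.4, 0.4, and 0.6.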
    <Paragraph position="12"> Consistency implies that the PCFG defines a probability distribution over both sets D(G) and L(G).</Paragraph>
    <Paragraph position="13"> If a PCFG is proper, then consistency means that no probability mass is lost in derivations of infinite length. All PCFGs in this paper are implicitly assumed to be proper, unless otherwise stated.</Paragraph>
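To see how a proper PCFG can still lose probability mass, consider a hypothetical grammar, not taken from the paper, with the two rules S -> S S (probability q) and S -> a (probability 1 - q). Assuming that consistency means the probabilities of all complete derivations sum to one, the sketch below approximates that sum by iterating over parse trees of bounded height. The function name finite_derivation_mass and the iteration count are assumptions of this example.

```python
def finite_derivation_mass(q, iterations=10000):
    """Approximate the total probability of all complete derivations of the
    proper PCFG  S -> S S (probability q),  S -> a (probability 1 - q).

    z_k is the probability mass of derivations whose parse tree has height
    at most k. It satisfies z_{k+1} = q * z_k**2 + (1 - q), with z_0 = 0,
    and increases towards the total mass of finite derivations.
    """
    z = 0.0
    for _ in range(iterations):
        # Either stop with S -> a, or expand with S -> S S and let both
        # children finish within the remaining height.
        z = q * z * z + (1.0 - q)
    return z
```

For q = 0.25 the iteration approaches 1, so no mass is lost. For q = 0.75 it approaches 1/3, the smaller root of z = q z^2 + (1 - q), which means two thirds of the mass goes to derivations that never terminate: the grammar is proper but not consistent.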
  </Section>
</Paper>