<?xml version="1.0" standalone="yes"?>
<Paper uid="P04-1069">
  <Title>Probabilistic Parsing Strategies</Title>
  <Section position="3" start_page="3" end_page="3" type="intro">
    <SectionTitle>
2 Preliminaries
</SectionTitle>
    <Paragraph position="0"> In this paper we assume some familiarity with definitions of (P)CFGs and (P)PDAs. We refer the reader to standard textbooks and publications, for instance (Harrison, 1978; Booth and Thompson, 1973; Santos, 1972).</Paragraph>
    <Paragraph position="1"> A CFG $G$ is a tuple $(\Sigma, N, S, R)$, with $\Sigma$ and $N$ the sets of terminals and nonterminals, respectively, $S$ the start symbol and $R$ the set of rules. In this paper we only consider left-most derivations, represented as strings $d \in R^*$ and simply called derivations. For $\alpha, \beta \in (\Sigma \cup N)^*$, we write $\alpha \Rightarrow^{d} \beta$ with the usual meaning. If $\alpha = S$ and $\beta = w \in \Sigma^*$, we call $d$ a complete derivation of $w$. We say a CFG is reduced if each rule in $R$ occurs in some complete derivation.</Paragraph>
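    <Paragraph> The representation of left-most derivations as strings of rules can be illustrated with a minimal sketch. The toy grammar, the rule names r1 and r2, and the helper functions below are assumptions made for illustration only; they do not appear in the paper.

```python
# Hypothetical toy grammar, not from the paper: S -> a S b | a b
RULES = {
    "r1": ("S", ("a", "S", "b")),
    "r2": ("S", ("a", "b")),
}
NONTERMINALS = {"S"}


def apply_leftmost(derivation, start="S"):
    """Replay a derivation (a sequence of rule names) from the start symbol.

    Each rule rewrites the left-most nonterminal of the current sentential
    form; a ValueError is raised when the rule does not fit.
    """
    form = [start]
    for name in derivation:
        lhs, rhs = RULES[name]
        # Locate the left-most nonterminal in the sentential form.
        pos = next((i for i, sym in enumerate(form) if sym in NONTERMINALS), None)
        if pos is None or form[pos] != lhs:
            raise ValueError(f"rule {name} cannot rewrite the left-most nonterminal")
        form[pos:pos + 1] = rhs
    return form


def is_complete(derivation):
    """True if replaying the derivation from S leaves only terminals."""
    return all(sym not in NONTERMINALS for sym in apply_leftmost(derivation))
```

Under these assumptions, the string r1 r2 is a complete derivation of the word a a b b, whereas r1 alone is a derivation that is not complete.</Paragraph>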
    <Paragraph position="2"> A PCFG is a pair $(G, p)$ consisting of a CFG $G$ and a probability function $p$ from $R$ to real numbers in the interval $[0, 1]$. A PCFG is proper if $\sum_{\pi = (A \rightarrow \alpha) \in R} p(\pi) = 1$ for each $A \in N$.</Paragraph>
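    <Paragraph> The properness condition can be checked mechanically. The following is a minimal sketch, assuming a PCFG stored as a dictionary from hypothetical rule names to triples (left-hand side, right-hand side, probability); the grammar and the function name is_proper are illustrative and not taken from the paper.

```python
from collections import defaultdict
from math import isclose

# Hypothetical toy PCFG: rule name maps to (left-hand side, right-hand side, probability).
PCFG_RULES = {
    "r1": ("S", ("a", "S", "b"), 0.4),
    "r2": ("S", ("a", "b"), 0.6),
}


def is_proper(rules, nonterminals):
    """Check that, for every nonterminal A, the rules with left-hand side A sum to 1."""
    totals = defaultdict(float)
    for lhs, _rhs, prob in rules.values():
        if prob > 1.0 or 0.0 > prob:
            raise ValueError("probabilities must lie in [0, 1]")
        totals[lhs] += prob
    # A nonterminal without any rule has total 0 and therefore fails the check.
    return all(isclose(totals[a], 1.0) for a in nonterminals)
```

The set of nonterminals is passed explicitly because the condition is stated for each nonterminal in N, including any that happens to have no rule. The comparison uses isclose rather than exact equality to tolerate floating-point rounding.</Paragraph>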
    <Paragraph position="3"> The probability of a (left-most) derivation $d = \pi_1 \cdots \pi_m$, $\pi_i \in R$ for $1 \le i \le m$, is $p(d) = \prod_{i=1}^{m} p(\pi_i)$.</Paragraph>
    <Paragraph position="5"> In this paper we will mainly consider push-down transducers rather than push-down automata. Push-down transducers not only compute derivations of the grammar while processing an input string, but they also explicitly produce output strings from which these derivations can be obtained. We use transducers for two reasons. First, constraints on the output strings allow us to restrict our attention to "reasonable" parsing strategies. Those strategies that cannot be formalized within these constraints are unlikely to be of practical interest. Secondly, mappings from input strings to derivations, such as those realized by push-down transducers, turn out to be a very powerful abstraction and allow direct proofs of several general results.</Paragraph>
    <Paragraph position="6"> Contrary to many textbooks, our push-down devices do not possess states in addition to stack symbols.</Paragraph>
    <Paragraph position="7"> This is without loss of generality, since states can be encoded into the stack symbols, given the types of transitions that we allow. Thus, a PDT $A$ is a 6-tuple $(\Sigma_1, \Sigma_2, Q, X_{in}, X_{fin}, \Delta)$, with $\Sigma_1$ and $\Sigma_2$ the input and output alphabets, respectively, $Q$ the set of stack symbols, including the initial and final stack symbols $X_{in}$ and $X_{fin}$, respectively, and $\Delta$ the set of transitions. Each transition has one of the following three forms: $X \mapsto XY$, called a push transition, $YX \mapsto Z$, called a pop transition, or $X \stackrel{x,y}{\mapsto} Y$, called a swap transition; here $X, Y, Z \in Q$, $x \in \Sigma_1 \cup \{\varepsilon\}$ is the input read by the transition and $y \in \Sigma_2^*$ is the written output. Note that in our notation, stacks grow from left to right, i.e., the top-most stack symbol will be found at the right end. A configuration of a PDT is a triple $(\alpha, w, v)$, where $\alpha \in Q^*$ is a stack, $w \in \Sigma_1^*$ is the remaining input, and $v \in \Sigma_2^*$ is the output generated so far. Computations are represented as strings $c \in \Delta^*$. For configurations $(\alpha, w, v)$ and $(\beta, w', v')$, we write $(\alpha, w, v) \vdash^{c} (\beta, w', v')$ with the usual meaning, and write $(\alpha, w, v) \vdash^{*} (\beta, w', v')$ when $c$ is of no importance. If $(X_{in}, w, \varepsilon) \vdash^{c} (X_{fin}, \varepsilon, v)$, then $c$ is a complete computation of $w$, and the output string $v$ is denoted $out(c)$. A PDT is reduced if each transition in $\Delta$ occurs in some complete computation. Without loss of generality, we assume that combinations of different types of transitions are not allowed for a given stack symbol. More precisely, for each stack symbol $X \neq X_{fin}$, the PDT can only take transitions of a single type (push, pop or swap). A PDT can easily be brought into this form by introducing for each $X$ three new stack symbols $X_{push}$, $X_{pop}$ and $X_{swap}$ and new swap transitions $X \stackrel{\varepsilon,\varepsilon}{\mapsto} X_{push}$, $X \stackrel{\varepsilon,\varepsilon}{\mapsto} X_{pop}$ and $X \stackrel{\varepsilon,\varepsilon}{\mapsto} X_{swap}$. In each existing transition that operates on top-of-stack $X$, we then replace $X$ by one of $X_{push}$, $X_{pop}$ or $X_{swap}$, depending on the type of that transition.
We also assume that $X_{fin}$ does not occur in the left-hand side of a transition, again without loss of generality. A PPDT is a pair $(A, p)$ consisting of a PDT $A$ and a probability function $p$ from $\Delta$ to real numbers in the interval $[0, 1]$. A PPDT is proper if</Paragraph>
    <Paragraph position="9"> such that there is at least one transition $X \mapsto$</Paragraph>
    <Paragraph position="11"> such that there is at least one transition $YX \mapsto Z$, $Z \in Q$.</Paragraph>
    <Paragraph position="12"> The probability of a computation $c = \tau_1 \cdots \tau_m$, $\tau_i \in \Delta$ for $1 \le i \le m$, is $p(c) = \prod_{i=1}^{m} p(\tau_i)$. The probability of a string $w$ is $p(w) = \sum_{(X_{in}, w, \varepsilon) \vdash^{c} (X_{fin}, \varepsilon, v)} p(c)$. A PPDT is consistent if $\sum_{w \in \Sigma_1^*} p(w) = 1$. A PPDT $(A, p)$ is reduced if $A$ is reduced.</Paragraph>
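    <Paragraph> The definitions of configurations, complete computations, out(c) and p(c) can be made concrete with a small simulator. This is a minimal sketch under stated assumptions: the toy PPDT, its transition names t1 to t4, its probabilities, and the functions step and run are all hypothetical and serve only to illustrate the three transition forms. Stacks are tuples whose right end is the top, following the convention above.

```python
from math import prod

# Hypothetical toy PPDT; all names and probabilities are illustrative.
X_IN, X_FIN = "in", "fin"
TRANSITIONS = {
    "t1": ("push", "in", "A"),            # in |-> in A
    "t2": ("swap", "A", "a", "1", "B"),   # A |-> B, reading "a", writing "1"
    "t3": ("swap", "A", "", "0", "B"),    # A |-> B, reading nothing, writing "0"
    "t4": ("pop", "in", "B", "fin"),      # in B |-> fin
}
PROB = {"t1": 1.0, "t2": 0.7, "t3": 0.3, "t4": 1.0}


def step(config, name):
    """Apply one transition to a configuration (stack, remaining input, output)."""
    stack, rest, out = config
    kind, *args = TRANSITIONS[name]
    if kind == "push":
        x, y = args
        if stack[-1:] == (x,):
            return stack + (y,), rest, out
    elif kind == "pop":
        y, x, z = args
        if stack[-2:] == (y, x):
            return stack[:-2] + (z,), rest, out
    elif kind == "swap":
        x, read, written, y = args
        if stack[-1:] == (x,) and rest.startswith(read):
            return stack[:-1] + (y,), rest[len(read):], out + written
    raise ValueError(f"transition {name} is not applicable")


def run(computation, w):
    """Return (out(c), p(c)) if c is a complete computation of w, else None."""
    config = ((X_IN,), w, "")
    try:
        for name in computation:
            config = step(config, name)
    except ValueError:
        return None
    stack, rest, out = config
    if stack == (X_FIN,) and rest == "":
        return out, prod(PROB[name] for name in computation)
    return None
```

In this toy device each stack symbol takes transitions of a single type, as assumed above, and the two swap transitions on A have probabilities summing to 1. The computation t1 t2 t4 is complete for the input a, and t1 t3 t4 is complete for the empty input; since these are the only complete computations, the string probabilities sum to 1 and the toy PPDT is consistent.</Paragraph>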
  </Section>
</Paper>