<?xml version="1.0" standalone="yes"?>
<Paper uid="C94-2150">
  <Title>Weakly Restricted Stochastic Grammars</Title>
  <Section position="3" start_page="0" end_page="929" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> If we consider a natural language as a structure modelled by a formal grammar, we no longer consider it as a language that is used. Formal (context-free) grammars are often advocated as a model for the "linguistic competence" of an ideal natural language user. It has also been noticed that this mathematical concept is far from a sufficient model for describing all aspects of the language. What cannot be expressed by this model is the fact that some sentences or phrases are more likely to occur than others. This notion of occurrence refers to the use of language, and therefore considering this kind of statistical knowledge about language has to do with the pragmatics of language laid down in a corpus of the language. With a particular context of use in mind, a syntactically ambiguous sentence will often have a most likely meaning and hence a most likely analysis. Some of the shortcomings of the pure (context-free) grammar model can perhaps be solved by stochastic grammars, a model that makes it possible to incorporate certain statistical facts about language use into a model of the possible structures of sentences as we conceive them from a mathematical, formal point of view. Natural languages are now seen as stochastic; a user of a language is seen as a stochastic source producing sentences. A stochastic language over some alphabet $\Sigma$ is simply a formal language $L$ over $\Sigma$ together with a probability function $\phi$ assigning to each string $x$ in the language a real number $\phi(x)$ in $[0, 1]$. Since $\phi(x)$ is interpreted as the chance that the event $x$, or the event that a language source produces $x$, will occur, it will be clear that the sum of $\phi(x)$, where $x$ ranges over all possible sentences, is equal to one. The stochastic language is called context-free if the language $L$ is context-free.</Paragraph>
    <Paragraph position="1"> The usual grammatical model for a stochastic context-free language is a context-free grammar together with a probability function $f$ that assigns a real number in $[0, 1]$ to each of the productions of the grammar. The meaning of this function is the following. A step in a derivation of a sentential form, in which a nonterminal $A$ is rewritten using production $p$, has chance $f(p)$ to occur, independent of which $A$ is rewritten in the sentential form and independent of the history of the process that produced the sentential form. The probability of a derivation (tree) is the product of the probabilities of the derivation steps that produce the tree. The probability of a sentence generated by the grammar is the sum of the probabilities of all the trees of the sentence. So given a stochastic grammar we can compute the probabilities of all its sentences. The distribution language generated by a stochastic grammar $G$, $DL(G)$, is defined as the set of all derivation trees with their probabilities. The stochastic language generated by a stochastic grammar $G$, $SL(G)$, is defined as the set of all sentences generated by the grammar with their probabilities.</Paragraph>
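The product and sum rules described above can be illustrated with a minimal Python sketch. The toy grammar (S -> S S with probability q, S -> a with probability 1 - q) and the function names tree_probability and sentence_probability are assumptions chosen for illustration, not definitions taken from the paper. The sketch computes the probability of one derivation tree of the string a^n as a product of rule probabilities, and the probability of the sentence as the sum over all of its trees.

```python
from functools import lru_cache

# Hypothetical toy grammar (an assumption for illustration):
#   S -> S S  with probability q
#   S -> a    with probability 1 - q


def tree_probability(n, q):
    """Probability of ONE derivation tree of a^n (product of its step probabilities)."""
    # Every tree of a^n applies S -> S S exactly n - 1 times and S -> a exactly n times.
    return q ** (n - 1) * (1 - q) ** n


def sentence_probability(n, q):
    """Probability of the sentence a^n (sum over all of its derivation trees)."""

    @lru_cache(maxsize=None)
    def inside(length):
        # A substring of length 1 can only be derived by S -> a.
        if length == 1:
            return 1 - q
        # Otherwise the top step is S -> S S; sum over every split point.
        total = 0.0
        for left in range(1, length):
            total += q * inside(left) * inside(length - left)
        return total

    return inside(n)


if __name__ == "__main__":
    q = 0.25
    for n in range(1, 6):
        print(n, tree_probability(n, q), sentence_probability(n, q))
```

Because the string contains only the terminal a, the dynamic program depends on substring length alone; for a grammar with several terminals the same recursion would be indexed by substring positions.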
    <Paragraph position="2"> A stochastic grammar $G$ is an adequate model of a language $L$ if on its basis we can correctly compute the probabilities of the sentences in the language $L$.</Paragraph>
    <Paragraph position="3"> Of course this assumes a statistical analysis of a language corpus. A stochastic grammar that generates a stochastic language is called consistent.</Paragraph>
    <Paragraph position="4"> Definition 1.1 A stochastic grammar $G$ is called consistent if for the probability measure $p$ induced by $G$ onto the language generated by its underlying grammar, the probabilities of all sentences sum to one: $\sum_{x \in L(G)} p(x) = 1$. Otherwise the grammar is called inconsistent.</Paragraph>
    <Paragraph position="5"> Not all stochastic grammars generate a stochastic language. Even proper and reduced grammars do not necessarily generate a stochastic language. (A grammar is called proper if for all nonterminals $A$, the sum of the probabilities assigned to the rules for $A$ is 1. A grammar is called reduced if all nonterminals are reachable and can produce a terminal string.)</Paragraph>
    <Paragraph position="6"> This is illustrated in the following example. Example 1.1 Consider the stochastic grammar $G$ with nonterminal set $V_N = \{S\}$ and terminal set $V_T = \{a\}$. The productions with their probabilities are given by: $S \to SS$ with probability $q$, and $S \to a$ with probability $1 - q$. Following the technique presented in [2] we find that the production generating function is given by $g_1(s_1) = q s_1^2 + 1 - q$, and that the first moment matrix $E$ is given by $[2q]$. We can conclude that the grammar is consistent if and only if $q \le 1/2$. For details we refer to [5]. Notice that all the different trees of the string $a^n$ have the same probability. Hence, they cannot be distinguished according to their probabilities. $\Box$ It has been noticed that the usual model of a stochastic grammar as presented above, which we from now on call the unrestricted stochastic grammar model, has some disadvantages for modelling "real" languages. In this paper we present a more adequate model, the weakly restricted stochastic grammar model. We give necessary and sufficient conditions to test in an efficient way whether such a grammar defines a stochastic language. Moreover, we will show that these grammars can be transformed into an equivalent model of the usual type. The nice thing about the new model is that it models "context-dependent" probabilities of production rules directly in terms of the grammar specification of the language and not in terms of some particular implementation of the grammar as a parser. The latter is done by Briscoe and Carroll [3] by assigning probabilities to the transitions of the LR-parser constructed for the grammar.
In section 2 weakly restricted grammars are introduced; in section 3 conditions for their consistency are investigated; in section 4 it is proven that weakly restricted grammars and unrestricted grammars generate the same class of stochastic languages; and section 5 presents the inside-outside algorithm for weakly restricted grammars.</Paragraph>
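The consistency condition of Example 1.1 can be checked numerically with a short Python sketch. It relies only on the generating function g(s) = q s^2 + 1 - q and the first moment [2q] stated in the example; the function names and the fixed iteration count are assumptions for illustration, and the fixed-point iteration is a standard branching-process argument rather than the procedure of [2] or [5]. Iterating g from s = 0 approaches the smallest non-negative fixed point, which is the total probability mass of all finite derivations.

```python
def is_consistent(q):
    """Consistency test of Example 1.1: the single first moment 2q must not exceed 1."""
    return 1 >= 2 * q


def termination_probability(q, iterations=10000):
    """Approximate the total probability of all finite derivations.

    Iterates the generating function g(s) = q * s**2 + (1 - q) starting from 0.
    The iteration count is an arbitrary illustrative choice; convergence is
    slow when q is very close to 1/2.
    """
    s = 0.0
    for _ in range(iterations):
        s = q * s * s + (1 - q)
    return s


if __name__ == "__main__":
    for q in (0.25, 0.75):
        print(q, is_consistent(q), termination_probability(q))
```

For q above 1/2 the limit is (1 - q) / q, which is below one: part of the probability mass is lost to derivations that never terminate, so the grammar does not define a stochastic language.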
  </Section>
</Paper>