<?xml version="1.0" standalone="yes"?>
<Paper uid="J82-3004"> <Title>Coping with Syntactic Ambiguity or How to Put the Block in the Box on the Table</Title> <Section position="2" start_page="0" end_page="0" type="abstr"> <SectionTitle> 2. Formal Power Series </SectionTitle>
<Paragraph position="0"> This section will make the linear systems analogy more precise by relating context-free grammars to formal power series (polynomials). Formal power series are a well-known device in the formal language literature (e.g., Salomaa 1973) for developing the algebraic properties of context-free grammars. We introduce them here to establish a formal basis for our upcoming discussion of processing issues.</Paragraph>
<Paragraph position="1"> The power series for grammar (5a) is (5b).</Paragraph>
<Paragraph position="2"> (5a) $\mathrm{NP} \rightarrow \text{John} \mid \mathrm{NP}\ \text{and}\ \mathrm{NP}$
(5b) $\mathrm{NP} = \text{John} + \text{John and John} + 2\,\text{John and John and John} + 5\,\text{John and John and John and John} + 14\,\text{John and John and John and John and John} + \cdots$</Paragraph>
<Paragraph position="3"> Each term consists of a sentence generated by the grammar and an ambiguity coefficient[3], which counts how many ways the sentence can be generated. For example, the sentence &quot;John&quot; has one parse tree because the zeroth coefficient of the power series is one: (6a) [John] (1 tree). Similarly, the sentence &quot;John and John&quot; also has one tree because its coefficient is one: (6b) [John and John] (1 tree). The sentence &quot;John and John and John&quot; has two trees because its coefficient is two: (6c) [[John and John] and John], [John and [John and John]] (2 trees). The sentence &quot;John and John and John and John&quot; has five: (6d) [John and [[John and John] and John]], [John and [John and [John and John]]], [[[John and John] and John] and John], [[John and [John and John]] and John], [[John and John] and [John and John]] (5 trees). And so on.
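The coefficients 1, 1, 2, 5, 14 are the first five Catalan numbers. As a minimal sketch (not a method from the paper), the counts in (6a)-(6d) can be checked by brute-force enumeration in Python, assuming each John is an atomic leaf: the top-level conjunction of n Johns splits them into a left NP of k Johns and a right NP of n - k Johns, so the number of trees for n is the sum, over k from 1 to n - 1, of the product of the counts for k and for n - k. The helper names bracketings and ambiguity_coefficient are hypothetical.

```python
from functools import lru_cache


# Hypothetical helper (not from the paper): every parse tree that grammar
# (5a), NP -> John | NP and NP, assigns to n conjoined Johns.
@lru_cache(maxsize=None)
def bracketings(n):
    if n == 1:
        return ('John',)  # NP -> John: a single leaf
    trees = []
    # NP -> NP and NP: the top-level 'and' puts k Johns in the left NP
    # and n - k Johns in the right NP, for k = 1, ..., n - 1.
    for k in range(1, n):
        for left in bracketings(k):
            for right in bracketings(n - k):
                trees.append(f'[{left} and {right}]')
    return tuple(trees)


# Ambiguity coefficient of the term John (and John)^(n-1) in (5b).
def ambiguity_coefficient(n):
    return len(bracketings(n))


print([ambiguity_coefficient(n) for n in range(1, 6)])  # [1, 1, 2, 5, 14]
for tree in bracketings(4):  # the five trees of (6d), in another order
    print(tree)
```

Memoizing bracketings means each shorter conjunction is enumerated only once, but the number of trees itself grows exponentially with n, so listing them is practical only for short sentences.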
The reader can verify for himself that &quot;John and John and John and John and John&quot; has fourteen trees.</Paragraph>
<Paragraph position="4"> Note that the power series encapsulates the ambiguity response of the system (grammar) to all possible input sentences. In this way, the power series is analogous to the impulse response in electrical engineering, which encapsulates the response of the system (circuit) to all possible input frequencies. (Ambiguity coefficients bear a strong resemblance to frequency coefficients in Fourier analysis.) All of these transformed representation systems (e.g., power series, impulse response, and Fourier series) provide a complete description of the system with no loss of information[4] (and no heuristic approximations such as search strategies (Kaplan 1972)). Transforms are often very useful because they provide a different point of view: certain observations are more easily seen in the transform space than in the original space, and vice versa. [3] The formal language literature (Harrison 1978, Salomaa 1973) uses the term support instead of ambiguity coefficient. American Journal of Computational Linguistics, Volume 8, Number 3-4, July-December 1982, p. 140. Kenneth Church and Ramesh Patil, Coping with Syntactic Ambiguity.</Paragraph>
<Paragraph position="5"> This paper will discuss several ways to generate the power series, beginning with successive approximation. Of all the techniques presented here, successive approximation most closely resembles the approach taken by most current chart parsers, including EQSP (Martin, Church, and Patil 1981). The alternative approaches exploit certain regularities in the power series to produce the same results more efficiently.</Paragraph>
<Paragraph position="6"> Successive approximation works as follows. First we translate grammar (5a) into the equation
(7) $\mathrm{NP} = \text{John} + \mathrm{NP} \cdot \text{and} \cdot \mathrm{NP}$
where &quot;+&quot; connects two ways of generating an NP and &quot;·&quot; concatenates two parts of an NP. In some sense, we want to &quot;solve&quot; this equation for NP. This can be accomplished by refining successive approximations. An initial approximation $\mathrm{NP}_0$ is formed by taking NP to be the empty language:
(8a) $\mathrm{NP}_0 = 0$</Paragraph>
<Paragraph position="8"> Then we form the next approximation by substituting the previous approximation into equation (7) and simplifying according to the usual rules of algebra (e.g., assuming distributivity, associativity[5], an identity element, and a zero element):
(8b) $\mathrm{NP}_1 = \text{John} + \mathrm{NP}_0 \cdot \text{and} \cdot \mathrm{NP}_0 = \text{John}$</Paragraph>
<Paragraph position="10"> We continue refining the approximation in this way.</Paragraph>
<Paragraph position="11"> (8c) $\mathrm{NP}_2 = \text{John} + \mathrm{NP}_1 \cdot \text{and} \cdot \mathrm{NP}_1 = \text{John} + \text{John and John}$
(8d) $\mathrm{NP}_3 = \text{John} + \mathrm{NP}_2 \cdot \text{and} \cdot \mathrm{NP}_2 = \text{John} + (\text{John} + \text{John and John}) \cdot \text{and} \cdot (\text{John} + \text{John and John}) = \text{John} + \text{John and John} + \text{John and John and John} + \text{John and John and John} + \text{John and John and John and John}$
[4] This needs a qualification. It is true that the power series provides a complete description of the ambiguity response to any input sentence. However, the power series representation may lose some information that would be useful for parsing. In particular, as we will see, there may be cases where the parse trees cannot be recovered exactly, though this may not be a serious problem for many practical applications: it is often possible to recover most (if not all) of the structure, which may be adequate.</Paragraph>
<Paragraph position="12"> [5] The careful reader may correctly object to this assumption. We include it here for expository convenience, as it greatly simplifies the derivations, though many of the results could be derived without it. Furthermore, this assumption is valid for counting ambiguity.
That is, $|A \cdot B|$ ...</Paragraph>
<Paragraph position="14"> Collecting like terms, (8d) simplifies to $\mathrm{NP}_3 = \text{John} + \text{John and John} + 2\,\text{John and John and John} + \text{John and John and John and John}$. At this stage only the coefficients of terms with at most three Johns are final (the four-John term has coefficient 1 rather than 5), and each further iteration makes one more coefficient final. In the limit, we have NP expressed as an infinitely long polynomial, (5b) above. This expression can be simplified by introducing a notation for exponentiation: let $x^i$ abbreviate the product $x \cdot x \cdots x$ of $i$ factors.</Paragraph>
<Paragraph position="15"> (9) $\mathrm{NP} = \text{John} + \text{John and John} + 2\,\text{John}\,(\text{and John})^2 + 5\,\text{John}\,(\text{and John})^3 + 14\,\text{John}\,(\text{and John})^4 + \cdots$</Paragraph>
<Paragraph position="16"> Note that parentheses are interpreted differently in algebraic equations than in context-free rules. In context-free rules, parentheses denote optionality, whereas in equations they denote precedence relations among algebraic operations.</Paragraph> </Section></Paper>