<?xml version="1.0" standalone="yes"?> <Paper uid="C90-3020"> <Title>Linear Encodings of Linguistic Analyses</Title> <Section position="1" start_page="0" end_page="2" type="abstr"> <SectionTitle> USA epstein@ flash.bellcore.com 1. Introduction </SectionTitle> <Paragraph position="0"> Natural languages contain families of expressions such that the number of readings of an expression is an exponential function of the length of the expression.</Paragraph> <Paragraph position="1"> Two well-known cases involve prepositional phrase attachment and coordination. Other cases discussed below involve anaphora and relative operator scope.</Paragraph> <Paragraph position="2"> For example, in (1) John said that Bill said that ... that Harry said that he thinks that he thinks that ... that he thinks that it is raining.</Paragraph> <Paragraph position="3"> each he can refer to any of John, Bill, ..., Harry.1 Thus if (1) contains n names and m occurrences of he, this sentence has n^m readings (assuming that all anaphoric relationships are intrasentential). We discuss below families of expressions whose ambiguities grow as various exponential functions (factorial, Fibonacci, Catalan) of expression length.</Paragraph> <Paragraph position="4"> It appears, then, that exhaustive linear-time processing of natural language is impossible. An exponentially long answer cannot be produced in linear time.2 On the other hand, human processing of natural language seems to be at least linearly fast. The ready explanation for this is that people do not recover all readings of ambiguous expressions. This is clearly correct, as far as it goes.</Paragraph> <Paragraph position="5"> This paper shows how to encode in linear space the exponentially large numbers of readings associated with various families of expressions. The availability of these encodings removes an apparent obstacle to exhaustive analyses of these expressions in linear time. 
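The n^m growth can be checked with a minimal Python sketch (the function name reading_count is illustrative, not from the paper): each of the m occurrences of he independently chooses one of the n names as antecedent.

```python
def reading_count(n_names: int, m_pronouns: int) -> int:
    # Each of the m pronouns independently selects one of the n names,
    # so the readings multiply: n * n * ... * n, m times.
    return n_names ** m_pronouns

# With 3 names, each additional "he" triples the number of readings.
for m in range(1, 5):
    print(m, reading_count(3, m))
```

The loop makes the exponential growth concrete: with three names, four pronouns already yield 81 global readings.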
The encodings may thus be useful for practical computational purposes. They may also provide a better basis than exponential-space encodings for explanations of how humans process language. (Footnote 1: (1) is of course highly unnatural in a sense. However, it effectively isolates for study a phenomenon that is intrinsic to natural language. Similar observations apply to the examples below.) (Footnote 2: It is of course also the case that an exponentially long answer cannot be produced in polynomial time. If the problem cannot be reformulated so that answers are not exponentially long, the question of tractability does not arise. See [Garey and Johnson 79] and [Barton, Berwick, and Ristad 87] for related discussions.)</Paragraph> <Paragraph position="6"> For each of the linguistic constructions discussed in this paper, there is a simple program that generates analyses of the construction. If there are no constraints on what counts as a linguistic analysis, then a specification of a program, which requires constant space, together with a specification of an input expression, which requires linear space, could count as a linear encoding of an analysis of the input. Intuitively, there is a vast qualitative divide between a (program,input) pair on one hand and, for instance, a forest of constituent structure trees on the other hand. More generally, a question arises of how to distinguish analyses from procedures that yield analyses. This paper will not attempt to answer this question definitively. The analyses presented in Sections 2-4 all satisfy a notion of &quot;legal&quot; analysis that excludes (program,input) pairs. Sections 2 and 3 discuss polynomial-space analyses. Section 4 adds a representational device to the repertory of Sections 2 and 3, so that linear-space analyses are possible. Section 5 informally discusses a variety of issues, including the distinction between analysis and procedure.</Paragraph> <Paragraph position="7"> 2. 
Analyses in Conjunctive Normal Form Assume that example (1) involves no ambiguities except for antecedents of pronouns. Assume further that the length of the analysis of (1), aside from the specification of antecedents of pronouns, grows linearly.3 Let the proposition q comprise all aspects of the analysis of (1), aside from specifications of antecedents of pronouns.</Paragraph> <Paragraph position="8"> Let the proposition p_{i,j} comprise the specification that the j-th name in (1) is the antecedent of the i-th occurrence of he. (For example, p_{1,2} comprises the specification that Bill is the antecedent of the most shallowly embedded he.) (Footnote 3: These assumptions, and similar assumptions for other examples below, permit a briefer discussion than would otherwise be possible. Reservations about these assumptions do not affect the substance of the discussion. Our concern with (1) focuses on exponentially growing possibilities for assigning antecedents to pronouns.)</Paragraph> <Paragraph position="9"> Let n be the number of names in (1) and let m be the number of occurrences of he.</Paragraph> <Paragraph position="10"> Then an exhaustive analysis of (1) can take the following form:</Paragraph> <Paragraph position="12"> (1-a), which contains n^m disjuncts, is in Disjunctive Normal Form (DNF). Each disjunct fully specifies a possible interpretation of (1). It is an implicit assumption in much of the literature that the proper form for linguistic analyses is DNF. An analysis in DNF amounts to a listing of possible global interpretations.</Paragraph> <Paragraph position="13"> (1-a) is logically equivalent to the following statement in Conjunctive Normal Form (CNF): (1-b) q & (p_{1,1} v p_{1,2} v ... v p_{1,n}) & (p_{2,1} v p_{2,2} v ... v p_{2,n}) & ... & (p_{m,1} v p_{m,2} v ... v p_{m,n}) (3) the block in the box on the table ... in the kitchen As [Church and Patil 82] discuss, examples like (3) are similar to other structures with systematic attachment ambiguities, such as coordination structures. 
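The size gap between the DNF form (1-a) and the CNF form (1-b) can be made concrete with a small sketch (the function names are illustrative assumptions): the DNF listing spells out each of the n^m global readings, while the CNF statement needs only one conjunct of n disjuncts per pronoun.

```python
def dnf_literals(n_names: int, m_pronouns: int) -> int:
    # (1-a): n**m disjuncts, each spelling out q plus one p_{i,j} per pronoun.
    return n_names ** m_pronouns * (m_pronouns + 1)

def cnf_literals(n_names: int, m_pronouns: int) -> int:
    # (1-b): q plus m conjuncts, each disjoining the n candidate antecedents.
    return 1 + m_pronouns * n_names

print(dnf_literals(3, 3), cnf_literals(3, 3))
```

The CNF count reflects the paper's point that a total of m×n literals suffices to specify the anaphoric possibilities.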
While the number of readings of (3) is thus exponential in the length of (3), (3) has an O(n^4) length analysis in CNF as follows:</Paragraph> <Paragraph position="15"> (1-b) contains m+1 conjuncts. The length of an exhaustive analysis of (1) is exponential in the number of pronouns in (1) when the analysis is given in DNF, but linear in the number of pronouns when the analysis is given in CNF. However, (1-b) is not linear in the length of (1), because each of m conjuncts contains n disjuncts, so that a total of m×n literals is required to specify anaphoric possibilities.</Paragraph> <Paragraph position="16"> The following example has an analysis in DNF that grows as the factorial of the length of the input: (2) John told Bill that Tom told him that Fred told him that ... that Jim told him that Harry told him that it is raining.</Paragraph> <Paragraph position="17"> The first occurrence of him can have John or Bill as antecedent. The second occurrence of him can have John or Bill or Tom as antecedent, and so on. (2) has an obvious analysis in CNF whose length is a quadratic function of the length of the input, namely</Paragraph> <Paragraph position="19"> where the notation follows the same conventions as in (1-a,b).</Paragraph> <Paragraph position="20"> The number of readings for the following noun phrase grows as the Catalan of the number of prepositional phrases: In (3-a), p_{i,j} comprises the specification that constituent i attaches to constituent j, where the block is constituent 0, in the box is constituent 1, on the table is constituent 2, and so on. Constituent k must attach to some constituent that lies to its left. If constituent k attaches to constituent m, then the constituents between constituent m and constituent k cannot attach to constituents to the left of constituent m.4 For each pair (k,m), the number of atoms in (3-a-k,m) is f1(k,m) = Σ i, summed over the intervening constituents; f1(k,m) is quadratic in k. 
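The Catalan growth claimed for (3) can be computed with the standard recurrence C_0 = 1, C_k = Σ_{i&lt;k} C_i · C_{k-1-i} (a sketch; the paper itself gives no code).

```python
def catalan(n: int) -> int:
    # Standard Catalan recurrence: c[k] = sum over i of c[i] * c[k-1-i].
    c = [1] * (n + 1)
    for k in range(1, n + 1):
        c[k] = sum(c[i] * c[k - 1 - i] for i in range(k))
    return c[n]

# Reading counts grow rapidly with the number of PPs, while the CNF
# analysis (3-a) grows only as a quartic polynomial in the input length.
print([catalan(k) for k in range(1, 8)])
```

Seven prepositional phrases already give 429 attachment readings, which is the contrast the polynomial-size encoding exploits.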
For each k, then, the number of atoms in (3-a-k) is f2(k) = Σ_i f1(k,i), a cubic function in k. The number of atoms in (3-a) (excluding atoms hidden in q) is thus Σ_i f2(i), a quartic function in n. (3-a) is certainly not the most compressed CNF analysis of (3). It is, however, easy to describe.</Paragraph> <Paragraph position="21"> Given an exhaustive analysis in DNF, choosing a global interpretation requires exactly one operation of selecting a disjunct. For (1-b) and (2-a), choosing a global interpretation requires a number of selections that is linear in the length of the input. I am aware of no other reason for preferring DNF to CNF for analyses of examples like (1) and (2). In favor of preferring CNF there is the practical advantage of polynomial-space output, with its implications for speed of processing. There is also the possibility of more accurate psycholinguistic modeling. It seems likely that people make decisions on antecedents for pronouns in examples like (1) and (2) locally, on a pronoun-by-pronoun basis, and that they do not choose among global analyses.5 In contrast, the conjuncts of (3-a) clearly do not correspond one-to-one with processing decisions. Section 4 discusses an analysis of (3) whose components may correspond to local decisions on attachment sites.</Paragraph> <Paragraph position="22"> 3. Encodings with non-atomic propositional constants It is possible to get a cubic length analysis of (3) by introducing constants for non-atomic propositions. For m<k, let r_{k,m} be (p_{k-1,m+1} v p_{k-1,m+2} v ... v p_{k-1,k-2}). Of course, the space required to define the r_{k,m} must figure in the space required to encode an analysis of (3) along the lines of (3-b-k,m). Note that r_{k,m-1} ≡ (p_{k-1,m} v r_{k,m}). (Footnote 4: (p_1 ⊃ (p_2 v ... v p_j)) is equivalent to (¬p_1 v p_2 v ... v p_j), so that (3-a) is in CNF.) Footnote 5: This is not to suggest that people produce an exhaustive analysis in CNF prior to choosing a reading. 
The hypothesis is rather that fragments of a CNF representation are produced (in some sense) during processing.</Paragraph> <Paragraph position="23"> It follows that quadratic space suffices to define all the r_{k,m}. A revised version of (3) with (3-b-k,m) in place of (3-a-k,m) throughout requires cubic space.6 Tree representations of single readings for examples like (3) may be viewed as follows: edges correspond to atomic propositions that comprise specifications like &quot;constituent i attaches to constituent j&quot; or &quot;constituent i projects to constituent j.&quot;7 A non-terminal node A corresponds to a constituent, but also corresponds to the conjunction of the atomic propositions that correspond to edges that A dominates. Thus the root node of the tree corresponds to a proposition that comprises a full specification of constituent structure.</Paragraph> <Paragraph position="24"> The situation is essentially the same for shared forests. ([Tomita 87] discusses shared forests and packed shared forests.) Edges in shared forests correspond to atomic propositions, and non-terminal nodes correspond to non-atomic propositions. To extend this perspective, shared forests compress the information in non-shared forests by exploiting the introduction of constants for non-atomic propositions. In a shared forest, the subtree that a node dominates is written only once. In effect, then, a constant is introduced that represents the conjunction that corresponds to the node. 
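The "written only once" compression can be mimicked by interning subtrees, a hash-consing sketch of my own (not the paper's mechanism): equal subtrees become one shared object, playing the role of a constant for the corresponding non-atomic proposition.

```python
_interned = {}

def node(label, *children):
    # Intern each distinct (label, children) combination exactly once, so a
    # repeated subtree is stored a single time and shared by its parents.
    key = (label, children)
    return _interned.setdefault(key, key)

# Two structurally identical subtrees collapse into one shared constant.
a = node("attach", node("PP1"), node("NP0"))
b = node("attach", node("PP1"), node("NP0"))
print(a is b)
```

Because parents refer to the shared object rather than re-spelling its conjunction, the space cost of a repeated subformula is paid only once.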
This constant is a constituent of the formulas that correspond to superior nodes.</Paragraph> <Paragraph position="25"> While shared forests are more compressed than unshared forests, the number of nodes in the shared forest representation of (3) is still exponential in the length of (3).</Paragraph> <Paragraph position="26"> In a packed shared forest, a packed node that does not dominate any packed nodes corresponds to a disjunction of conjunctions of atomic propositions.</Paragraph> <Paragraph position="27"> Packed nodes that dominate other packed nodes correspond to disjunctions of conjunctions of atomic and non-atomic propositions. In effect, for each node (packed or non-packed), a constant is introduced that abbreviates the formula that corresponds to the node.</Paragraph> <Paragraph position="28"> Exploitation of constants for non-atomic propositions permits more significant compression for packed shared forests than for shared forests. The packed root node of a packed shared forest for (3) corresponds to a disjunction of conjunctions whose size in atoms is exponential in the length of (3). However, the number of nodes of a packed shared forest for (3) goes up as the square of the length of (3). The number of edges of the packed shared forest (a more authentic measure of the size of the forest) goes up as the cube of the length.</Paragraph> <Paragraph position="29"> 6. Further compression is possible if we allow quantification over subscript indices. However, quantification over artifacts of representation may uncontroversially involve crossing the divide between analysis and procedure.</Paragraph> <Paragraph position="30"> 7. Details of constituent structure are not relevant to the discussion here. For example, we will not distinguish &quot;X attaches to V&quot; from &quot;X attaches to VP.&quot; 4. Encodings that introduce structural constants A linear length encoding of an analysis of (1) is possible if we use the constant A = {John, Bill, ..., 
Harry} in the encoding as follows:</Paragraph> <Paragraph position="32"> Note that &quot;x ∈ Y&quot; is short-hand for the disjunction of the statements &quot;x = y,&quot; where y ranges over Y, so that (1-c) is not very different from (1-b). Examples below involve freer use of constants that correspond to sets of linguistic entities; I will call such constants &quot;structural.&quot; A linear analysis of (2) is possible if we introduce constants A_1, ..., A_m, where A_1 = {John, Bill}, A_2 = A_1 ∪ {Tom}, and so on</Paragraph> <Paragraph position="34"> quantifier Q_i takes scope over Q_{i-1} to its immediate left, then the quantifier Q_{i+1} to the immediate right of Q_i cannot take scope over Q_i. (See [Epstein 88] for a discussion of relative operator scope.) It follows that the number of relative operator scope readings for (4) grows as the Fibonacci of the length of (4).8 However, a linear encoding of an exhaustive analysis of (4) is as follows:</Paragraph> <Paragraph position="36"> Because A_i can be defined in terms of A_{i-1}, only linear space is required to define these constants. It is convenient to mix definitions of constants with other aspects of the encoding of (2), as follows: Here q represents aspects of the analysis of (4) aside from the specification of relative operator scope, and Q_i represents the i-th quantifier in (4), reading from the left. The L_i are introduced constants corresponding to quantifiers that can have lower scope than some more deeply embedded quantifier. &quot;Q_i > Q_j&quot; means that Q_i has higher scope than Q_j. For all Q, &quot;Q < T&quot; is true and &quot;Q > T&quot; is false.9 Note that if we delete from (4-a) propositions that assign values to introduced constants, such as &quot;(L_1 = Q_2),&quot; the resulting statement is in CNF. Section 3 discussed cubic length analyses of (3) with propositional constants. 
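The Fibonacci growth for (4) follows the recurrence in footnote 8: the deepest clause has 2 scope readings, the next has 3, and each further clause has the sum of the counts of the two clauses embedded within it. A quick sketch (the function name is an assumption):

```python
def scope_readings(depth: int) -> int:
    # Footnote 8's recurrence: 2 readings for the deepest clause, 3 for the
    # next, and readings(k+1) = readings(k) + readings(k-1) thereafter.
    a, b = 2, 3
    if depth == 1:
        return a
    for _ in range(depth - 2):
        a, b = b, a + b
    return b

print([scope_readings(d) for d in range(1, 8)])
```

The counts trace the Fibonacci sequence shifted by two places, confirming the exponential growth that the linear encoding (4-a) sidesteps.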
(3) has a linear analysis with structural constants as follows:</Paragraph> <Paragraph position="38"> For (2), the introduction of structural constants permits a linear encoding. For the following example, the introduction of structural constants likewise permits a linear encoding: (4) Many teachers expect several students to expect many teachers to expect several students to ... to expect many teachers to expect several students to read some book.</Paragraph> <Paragraph position="39"> Each quantifier in (4) can take scope over the quantifier to its immediate left (if any), and can take scope over the quantifier to its immediate right (if any). However, if a (Footnote 8: The most deeply embedded clause in (4) has 2 possible relative scope readings. The second most deeply embedded clause in (4) has 3 possible relative scope readings (many>several>some, many>some>several, several>many>some). Let S_k be the k-th most deeply embedded clause in (4). (S_k is immediately embedded in S_{k+1}.) Given that S_k has a total of n (relative operator) scope readings, and that S_{k-1} has a total of m scope readings, the subject of S_{k+1} can take scope over all the quantifiers in S_k, accounting for n global readings over S_{k+1}. Alternatively, the subject of S_k can take scope over the subject of S_{k+1}. Then both these subjects take scope over all the quantifiers in S_{k-1}. The second alternative thus accounts for m additional global readings over S_{k+1}.)</Paragraph> <Paragraph position="40"> (Footnote 9: (4-a) does not explicitly state, for example, that Q_1 > Q_3, but this fact can be derived from (4-a) through application of the transitivity of relative operator scope. Generally speaking, linguistic representations don't explicitly include all their consequences.)</Paragraph> <Paragraph position="42"> Here q represents aspects of the analysis of (3) aside from the specification of attachment points for the prepositional phrases. 
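The right-edge bookkeeping used for (3) can be sketched operationally: after a PP attaches at some constituent, the constituents to the right of that attachment point become unavailable to later PPs. This is an interpretation of the paper's right-edge idea; numbering constituents left to right and keeping the newly attached PP on the edge are assumptions of the sketch.

```python
def attach(right_edge: set, attach_point: int, new_pp: int) -> set:
    # Constituents are numbered left to right: the block is 0, the first PP
    # is 1, and so on. Attaching new_pp at attach_point removes everything
    # to the right of attach_point from the right edge; the new PP itself
    # becomes available for later attachment (an assumption of this sketch).
    assert attach_point in right_edge
    return {new_pp, attach_point} | {x for x in right_edge if x < attach_point}

edge = {0}                    # just "the block"
edge = attach(edge, 0, 1)     # PP1 attaches to the block
edge = attach(edge, 1, 2)     # PP2 attaches to PP1
edge = attach(edge, 0, 3)     # PP3 attaches to the block: PP1, PP2 drop out
print(sorted(edge))
```

Each attachment decision is local, which is the sense in which the components of this encoding may correspond to local processing decisions.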
The desired solutions consist of specifications of attachment possibilities, stated in the form &quot;ap(PP_k) ∈ X&quot; (&quot;the attachment point of the k-th PP is one of the elements of X&quot;) in (3-c). The AP_k and RE_k are introduced constants. AP_k is the attachment point of PP_k.10 RE_k represents the right edge of a constituent structure tree for the string consisting of the block and the first k PP's. (3-c) is in a sort of relaxed CNF, as discussed above in connection with (4-a), and in Section 5 below. &quot;T&quot; in (3-c) is defined so that RE T AP = {AP} ∪ {X ∈ RE | X precedes AP}. (When PP_k to the right of PP_i attaches above PP_i, PP_i is not in the right edge of the resulting structure, and is unavailable for attachment by material to the right of PP_k.) As for (3), the number of readings of the following example (from [Church and Patil 82]) grows as the Catalan of the number of prepositional phrases: (5) Put the block in the box on the table ... in the kitchen.</Paragraph> <Paragraph position="43"> However, there is an important difference between (3) and (5). In (3), any number of PP's can attach to the block, any number of PP's can attach to the box, and so on. No NP in (3) requires complements. (In the box must attach to the block, but only because the block is the only NP that lies to the left of in the box.) In (5), on the other hand, put requires one NP argument and one PP argument, and cannot accept any other complements.11 An analysis of (5) along the lines of (3-c) would incorrectly include readings where more than one PP attaches to put, and readings where no PP attaches to put. A linear analysis of (5) is as follows: (Footnote 10: &quot;PP_i attaches to PP_k&quot; really means that PP_i attaches to the object of the preposition head of PP_k. This usage permits a briefer discussion than would otherwise be possible.)</Paragraph> <Paragraph position="44"> Footnote 11: 
This characterization of put is not strictly speaking correct, but the necessary qualifications are irrelevant to the discussion here.</Paragraph> <Paragraph position="46"> (5-a) is similar to (3-c), but includes the additional constants OG_k. OG_k is the open (theta-)grid for the substructure corresponding to put the block followed by the first k PP's. OG_k is either {VP}, if none of the first k PP's is attached to V, or is empty. Non-empty OG_k indicates that for each constituent X in OG_k, some PP_i, k<i≤n, must attach to X. [] in (5-a) is defined so that A [] B is equal to A if A is non-empty, and is otherwise equal to B. The final conjunct in (5-a) captures the requirement that if none of the first n-1 PP's attaches to put, then the final PP must attach to this verb.</Paragraph> </Section></Paper>