<?xml version="1.0" standalone="yes"?> <Paper uid="P02-1008"> <Title>Comprehension and Compilation in Optimality Theory</Title> <Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> 3 A General Presentation of OT </SectionTitle> <Paragraph position="0"> This section (graphically summarized in Fig. 1) lays out a generalized version of OT's theory of production, introducing some notational and representational conventions that may be useful to others and will be important below. In particular, all objects are represented as strings, or as functions that map strings to strings. This will enable us to use finite-state techniques later.</Paragraph> <Paragraph position="1"> The underlying form x and surface form z are represented as strings. We often refer to these strings as input and output. Following Eisner (1997), each candidate (x, z) is also represented as a string y.</Paragraph> <Paragraph position="2"> The notation (x, z) that we have been using so far for candidates is actually misleading, since in fact the candidates y that are compared encode more than just x and z. They also encode a particular alignment or correspondence between x and z. For example, if x = abdip and z = a[di][bu], then a typical candidate would be encoded</Paragraph> <Paragraph position="4"> which specifies that a corresponds to a, b was deleted (has no surface correspondent), voiceless p surfaces as voiced b, etc. The harmony of y might depend on this alignment as well as on x and z (just as an outfit might fit worse when worn backwards).</Paragraph> <Paragraph position="5"> Because we are distinguishing underlying and surface material by using disjoint alphabets Σ = {a, b, ...} and Δ = {[, ], a, b, ...} (footnote 2), it is easy to extract the underlying and surface forms (x and z) from y.</Paragraph> <Paragraph position="6"> Although the above example assumes that x and z are simple strings of phonemes and brackets, nothing herein depends on that assumption. 
Autosegmental representations too can be encoded as strings (Eisner, 1997).</Paragraph> <Paragraph position="7"> In general, an OT grammar consists of four components: a constraint ranking, a harmony ordering, and generating and pronouncing functions. The constraint ranking is the language-specific part of the grammar; the other components are often supposed to be universal across languages.</Paragraph> <Paragraph position="8"> The generating function GEN maps any x ∈ Σ* to the (nonempty) set of candidates y whose underlying form is x. In other words, GEN just inserts arbitrary substrings from Δ* amongst the characters of x, subject to any restrictions on what constitutes a legitimate candidate y (footnote 3). (Legitimacy might for instance demand that y's surface material z have matched, non-nested left and right brackets, or even that z be similar to x in terms of edit distance.)</Paragraph> <Paragraph position="9"> Footnote 2: An alternative would be to distinguish them by odd and even positions in the string.</Paragraph> <Paragraph position="11"> [Figure 1: graphic not recovered. Surviving label: &quot;underlying form x ∈ Σ*&quot;. Surviving caption text: &quot;... starrings are pruned away, and finally the ⋆'s are removed from the survivors.&quot;]</Paragraph> <Paragraph position="12"> A constraint ranking is simply a sequence C1, C2, ..., Cn of constraints. Let us take each Ci to be a function that scores candidates y by annotating them with violation marks ⋆. For example, a NODELETE constraint would map y =</Paragraph> <Paragraph position="14"> aab⋆0c⋆0[ddii][pb0u], inserting a ⋆ after each underlying phoneme that does not correspond to any surface phoneme. This unconventional formulation is needed for new approaches that care about the exact location of the ⋆'s. In traditional OT only the number of ⋆'s is important, although the locations are sometimes shown for readability.</Paragraph> <Paragraph position="15"> Finally, OT requires a harmony ordering ≻ on scored candidates y ∈ (Σ ∪ Δ ∪ {⋆})*. In traditional OT, y is most harmonic when it contains the fewest ⋆'s. 
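</Paragraph> <Paragraph> As a rough illustration of the string encoding and of violation marking, the following Python sketch may help. It rests on assumptions that are not in the text: underlying symbols are written as lowercase letters, surface symbols as uppercase letters or brackets, each underlying symbol is immediately followed by its surface correspondent if it has one, and * stands in for the violation mark ⋆. The helper names (underlying, surface, no_delete, most_harmonic) are hypothetical.

```python
STAR = "*"  # ASCII stand-in for the violation mark (an assumption of this sketch)


def underlying(y):
    """Extract x: keep only underlying symbols (lowercase by assumption)."""
    return "".join(ch for ch in y if ch.islower())


def surface(y):
    """Extract z: drop underlying symbols and any violation marks."""
    return "".join(ch for ch in y if not ch.islower() and ch != STAR)


def no_delete(y):
    """Toy NODELETE: put a mark after each underlying symbol that is not
    immediately followed by an uppercase surface correspondent."""
    scored = []
    for i, ch in enumerate(y):
        scored.append(ch)
        following = y[i + 1:i + 2]  # empty string at the end of y
        if ch.islower() and not following.isupper():
            scored.append(STAR)
    return "".join(scored)


def most_harmonic(scored_candidates):
    """Traditional ordering: keep every candidate tied for the fewest marks."""
    fewest = min(s.count(STAR) for s in scored_candidates)
    return [s for s in scored_candidates if s.count(STAR) == fewest]


y = "aAb[dDiI][pBU]"        # hypothetical encoding of x = abdip, z = a[di][bu]
print(underlying(y))         # abdip
print(surface(y))            # A[DI][BU]
print(no_delete(y))          # aAb*[dDiI][pBU]
```

Because the two alphabets are disjoint, both extractions are simple character filters, and the scoring step only adds marks without disturbing either form.</Paragraph> <Paragraph> 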
For example, among candidates scored by NODELETE, the most harmonic ones are the ones with the fewest deletions; many candidates may tie for this honor. §6 considers other harmony orderings, a possibility recognized by Prince and Smolensky (1993) (≻ corresponds to their H-EVAL). In general ≻ may be a partial order: two competing candidates may be equally harmonic or incomparable (in which case both can survive), and candidates with different underlying forms never compete at all.</Paragraph> <Paragraph position="16"> Production under such a grammar is a matter of successive filtering by the constraints C1, ..., Cn.</Paragraph> <Paragraph position="17"> Given an underlying form x, let</Paragraph> <Paragraph position="19"> Footnote 3: It is never really necessary for GEN to enforce such restrictions, since they can equally well be enforced by the top-ranked constraint C1 (see below).</Paragraph> <Paragraph position="21"> The set of optimal candidates is now Yn(x). Extracting z from each y ∈ Yn(x) gives the set Z(x) or PRODUCE(x) of acceptable surface forms:</Paragraph> <Paragraph position="23"> PRON denotes the simple pronunciation function that extracts z from y. It is the counterpart to GEN: just as GEN fleshes out x ∈ Σ* into y by inserting symbols of Δ, PRON slims y down to z ∈ Δ* by removing symbols of Σ.</Paragraph> <Paragraph position="24"> Notice that Yn ⊆ Yn−1 ⊆ ... ⊆ Y0. The only candidates y ∈ Yi−1 that survive filtering by Ci are the ones that Ci considers most harmonic.</Paragraph> <Paragraph position="25"> The above notation is general enough to handle some of the important variations of OT, such as Paradigm Uniformity and Sympathy Theory. In particular, one can define GEN so that each candidate y encodes not just an alignment between x and z, but an alignment among x, z, and some other strings that are neither underlying nor surface. 
These other strings may represent the surface forms for other members of the same morphological paradigm, or intermediate throwaway candidates to which z is sympathetic. Production still optimizes y, which means that it simultaneously optimizes z and the other strings.</Paragraph> </Section> <Section position="5" start_page="0" end_page="0" type="metho"> <SectionTitle> 4 Comprehension in Finite-State OT </SectionTitle> <Paragraph position="0"> This section assumes OT's traditional harmony ordering, in which the candidates that survive filtering by Ci are the ones into which Ci inserts fewest ⋆'s.</Paragraph> <Paragraph position="1"> Much computational work on OT has been conducted within a finite-state framework (Ellison, 1994), in keeping with a tradition of finite-state phonology (Johnson, 1972; Kaplan and Kay, 1994) (footnote 4). Finite-state OT is a restriction of the formalism discussed above. It specifically assumes that GEN, C1, ..., Cn, and PRON are all regular relations, meaning that they can be described by finite-state transducers. GEN is a nondeterministic transducer that maps each x to multiple candidates y. The other transducers map each y to a single scored candidate ȳ or a single surface form z.</Paragraph> <Paragraph position="2"> These finite-state assumptions were proposed (in a different and slightly weaker form) by Ellison (1994). Their empirical adequacy has been defended by Eisner (1997).</Paragraph> <Paragraph position="3"> In addition to having the right kind of power linguistically, regular relations are closed under various relevant operations and allow (efficient) parallel processing of regular sets of strings. Ellison (1994) exploited such properties to give a production algorithm for finite-state OT. Given x and a finite-state OT grammar, he used finite-state operations to construct the set Yn(x) of optimal candidates, represented as a finite-state automaton.</Paragraph> <Paragraph position="4"> Ellison's construction demonstrates that Yn(x) is always a regular set. 
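</Paragraph> <Paragraph> The successive filtering that defines Yn(x) can be sketched by brute force over a small finite candidate set. This is not Ellison's automaton construction. The toy GEN, the constraint no_surface_b, the lowercase/uppercase encoding of underlying and surface symbols, and * as the violation mark are all assumptions made for illustration.

```python
from itertools import product

STAR = "*"  # ASCII stand-in for the violation mark (assumption)


def gen(x):
    """Toy GEN: each underlying symbol is either realized faithfully
    (lowercase followed by its uppercase copy) or deleted (lowercase alone)."""
    options = [(ch + ch.upper(), ch) for ch in x]
    return {"".join(choice) for choice in product(*options)}


def no_delete(y):
    """Mark each underlying symbol that has no surface correspondent."""
    scored = []
    for i, ch in enumerate(y):
        scored.append(ch)
        if ch.islower() and not y[i + 1:i + 2].isupper():
            scored.append(STAR)
    return "".join(scored)


def no_surface_b(y):
    """Hypothetical markedness constraint: mark every surface B."""
    return "".join(ch + STAR if ch == "B" else ch for ch in y)


def pron(y):
    """PRON: remove underlying (lowercase) symbols, leaving surface z."""
    return "".join(ch for ch in y if not ch.islower())


def filter_step(constraint, candidates):
    """Keep the candidates into which the constraint inserts fewest marks."""
    marks = {y: constraint(y).count(STAR) for y in candidates}
    fewest = min(marks.values())
    return {y for y in candidates if marks[y] == fewest}


def produce(x, ranking):
    """Y0 = GEN(x); filter by each constraint in rank order; apply PRON."""
    survivors = gen(x)
    for constraint in ranking:
        survivors = filter_step(constraint, survivors)
    return {pron(y) for y in survivors}


print(produce("abab", [no_surface_b, no_delete]))   # {'AA'}
print(produce("abab", [no_delete, no_surface_b]))   # {'ABAB'}
```

Swapping the two constraints changes the output, which is the sense in which the ranking is the language-specific part of the grammar. The enumeration is exponential in the length of x, which is one reason automaton-based constructions are preferred in practice.</Paragraph> <Paragraph> 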
Since PRON is regular, it follows that PRODUCE(x) = Z(x) is also a regular set.</Paragraph> <Paragraph position="5"> We now show that COMPREHEND(z), in contrast, need not be a regular set. Let Σ = {a, b}, Δ = {[, ], a, b, ...} and suppose that GEN allows candidates like the ones in §3, in which parts of the string may be bracketed between [ and ]. The crucial grammar consists of two finite-state constraints. C2 penalizes a's that fall between brackets (by inserting ⋆ next to each one) and also penalizes b's that fall outside of brackets. It is dominated by C1, which penalizes brackets that do not fall at either edge of the string. Note that this grammar is completely permissive as to the number and location of surface characters other than brackets.</Paragraph> <Paragraph position="6"> If x contains more a's than b's, then PRODUCE(x) is the set Δ̂* of all unbracketed surface forms, where Δ̂ is Δ minus the bracket symbols. If x contains fewer a's than b's, then PRODUCE(x) = [Δ̂*].</Paragraph> <Paragraph position="7"> And if a's and b's appear equally often in x, then PRODUCE(x) is the union of the two sets.</Paragraph> <Paragraph position="8"> Thus, while the x-to-z mapping is not a regular relation under this grammar, at least PRODUCE(x) is a regular set for each x--just as finite-state OT guarantees. But for any unbracketed z ∈ Δ̂*, such as z = abc, COMPREHEND(z) is not regular: it is the set of underlying strings with # of a's ≥ # of b's. Footnote 4: ... constraints, notably Koskenniemi's (1983) two-level model, which like OT used finite-state constraints on candidates y that encoded an alignment between underlying x and surface z.</Paragraph> <Paragraph position="9"> This result seems to eliminate any hope of handling OT comprehension in a finite-state framework. It is interesting to note that both OT and current speech recognition systems construct finite-state models of production and define comprehension as the inverse of production. 
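</Paragraph> <Paragraph> The two-constraint grammar of this section can be checked by brute force for short inputs. The sketch below assumes that C1 leaves only two candidate shapes (no brackets at all, or a single bracket pair at the two edges of the string) and that C2 then counts underlying a's or b's accordingly. The function names are hypothetical, and the enumeration is only a finite spot check, not a proof of non-regularity.

```python
from itertools import product


def c2_marks(x, bracketed):
    """Marks from C2: a's inside brackets, or b's outside brackets."""
    return x.count("a") if bracketed else x.count("b")


def surviving_shapes(x):
    """Shapes (after C1) that C2 leaves optimal for underlying form x."""
    marks = {"unbracketed": c2_marks(x, False), "bracketed": c2_marks(x, True)}
    fewest = min(marks.values())
    return {shape for shape, n in marks.items() if n == fewest}


def comprehend_unbracketed(max_len):
    """All x over {a, b} up to max_len that can surface without brackets."""
    found = set()
    for n in range(max_len + 1):
        for letters in product("ab", repeat=n):
            x = "".join(letters)
            if "unbracketed" in surviving_shapes(x):
                found.add(x)
    return found


forms = comprehend_unbracketed(4)
# Membership depends on comparing two unbounded counts.
print(all(x.count("a") >= x.count("b") for x in forms))   # True
print("aab" in forms, "abb" in forms)                      # True False
```

Deciding membership requires comparing the number of a's with the number of b's, which is the kind of unbounded counting that a finite-state automaton cannot do.</Paragraph> <Paragraph> 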
Speech recognizers do correctly implement comprehension via finite-state optimization (Pereira and Riley, 1997).</Paragraph> <Paragraph position="10"> But this is impossible in OT because OT has a more complicated production model. (In speech recognizers, the most probable phonetic or phonological surface form is not presumed to have suppressed its competitors.) One might try to salvage the situation by barring constraints like C1 or C2 from the theory as linguistically implausible. Unfortunately this is unlikely to succeed. Primitive OT (Eisner, 1997) already restricts OT to something like a bare minimum of constraints, allowing just two simple constraint families that are widely used by practitioners of OT. Yet even these primitive constraints retain enough power to simulate any finite-state constraint. In any case, C1 and C2 themselves are fairly similar to &quot;domain&quot; constraints used to describe tone systems (Cole and Kisseberth, 1994). While C2 is somewhat odd in that it penalizes two distinct configurations at once, one would obtain the same effect by combining three separately plausible constraints: C2 requires a's between brackets (i.e., in a tone domain) to receive surface high tones, C3 requires b's outside brackets to receive surface high tones, and C4 penalizes all surface high tones (footnote 5). Another obvious if unsatisfying hack would impose heuristic limits on the length of x, for example by allowing the comprehension system to return the approximation COMPREHEND(z) ∩ {x : |x| [...] perhaps it can be produced by some finite-state method, although the automaton to describe the set might be large in some cases. Footnote 5: [...] b's in the underlying form, COMPREHEND(z) is actually a finite set in this version, hence regular. But the non-regularity argument does go through if the tonal information in z is not available to the comprehension system (as when reading text without diacritics); we cover this case in §5. (One can assume that some lower-ranked constraints require a special suffix before ], so that the bracket information need not be directly available to the comprehension system either.)</Paragraph> <Paragraph position="11"> Recent efforts to force OT into a fully finite-state mold are more promising. As we will see, they identify the problem as the harmony ordering ≻, rather than the space of constraints or the potential infinitude of the answer set.</Paragraph> </Section> <Section position="6" start_page="0" end_page="0" type="metho"> <SectionTitle> 5 Regular-Relation Comprehension </SectionTitle> <Paragraph position="0"> Since COMPREHEND(z) need not be a regular set in traditional OT, a corollary is that COMPREHEND and its inverse PRODUCE are not regular relations.</Paragraph> <Paragraph position="1"> That much was previously shown by Markus Hiller and Paul Smolensky (Frank and Satta, 1998), using similar examples.</Paragraph> <Paragraph position="2"> However, at least some OT grammars ought to describe regular relations. It has long been hypothesized that all human phonologies are regular relations, at least if one omits reduplication, and this is necessarily true of phonologies that were successfully described with pre-OT formalisms (Johnson, 1972; Koskenniemi, 1983).</Paragraph> <Paragraph position="3"> Regular relations are important for us because they are computationally tractable. Any regular relation can be implemented as a finite-state transducer T, which can be inverted and used for comprehension as well as production. PRODUCE(x) = T(x) = range(x ∘ T), and COMPREHEND(z) = T⁻¹(z) = domain(T ∘ z).</Paragraph> <Paragraph position="4"> We are therefore interested in compiling OT grammars into finite-state transducers--by hook or by crook. §6 discusses how; but first let us see how such compilation is useful in realistic situations.</Paragraph> <Paragraph position="5"> Any practical comprehension strategy must recognize that the hearer does not really perceive the entire surface form. 
After all, the surface form contains phonetically invisible material (e.g., syllable and foot boundaries) and makes phonetically imperceptible distinctions (e.g., two copies of a tone versus one doubly linked copy). How to comprehend in this case? The solution is to modify PRON to &quot;go all the way&quot;--to delete not only underlying material but also phonetically invisible material. Indeed, PRON can also be made to perform any purely phonetic processing. Each output z of PRODUCE is now not a phonological surface form but a string of phonemes or spectrogram segments. So long as PRON is a regular relation (perhaps a nondeterministic or probabilistic one that takes phonetic variation into account), we will still be able to construct T and use it for production and comprehension as above (footnote 6). How about the lexicon? When the phonology can be represented as a transducer, COMPREHEND(z) is a regular set. It contains all inputs x that could have produced output z. In practice, many of these inputs are not in the lexicon, nor are they possible novel words. One should restrict to inputs that appear in the lexicon (also a regular set) by intersecting COMPREHEND(z) with the lexicon. For novel words this intersection will be empty; but one can find the possible underlying forms of the novel word, for learning's sake, by intersecting COMPREHEND(z) with a larger (infinite) regular set representing all forms satisfying the language's lexical constraints.</Paragraph> <Paragraph position="6"> There is an alternative treatment of the lexicon.</Paragraph> <Paragraph position="7"> GEN can be extended &quot;backwards&quot; to incorporate morphology just as PRON was extended &quot;forwards&quot; to incorporate phonetics. On this view, the input x is a sequence of abstract morphemes, and GEN performs morphological preprocessing to turn x into possible candidates y. 
GEN looks up each abstract morpheme's phonological string ∈ Σ* from the lexicon (footnote 7), then combines these phonological strings by concatenation or template merger, then nondeterministically inserts surface material from Δ*. Such a GEN can plausibly be built up (by composition) as a regular relation from abstract morpheme sequences to phonological candidates. This regularity, as for PRON, is all that is required.</Paragraph> <Paragraph position="8"> Representing a phonology as a transducer T has additional virtues. T can be applied efficiently to any input string x, whereas Ellison (1994) or Eisner (1997) requires a fresh automaton construction for each x. Footnote 6: Pereira and Riley (1997) build a speech recognizer by composing a probabilistic finite-state language model, a finite-state pronouncing dictionary, and a probabilistic finite-state acoustic model. These three components correspond precisely to the input to GEN, the traditional OT grammar, and PRON, so we are simply suggesting the same thing in different terminology. Footnote 7: Nondeterministically in the case of phonologically conditioned allomorphs: INDEFINITE APPLE ↦ {aepl, aenaepl}. This yields competing candidates that differ even in their underlying phonological material.</Paragraph> <Paragraph position="9"> A nice trick is to build T without PRON and apply it to all conceivable x's in parallel, yielding the complete set of all optimal candidates Yn(Σ*) = ⋃_{x ∈ Σ*} Yn(x). If Y and Y′ denote the sets of optimal candidates under two grammars, then (Y ∩ ¬Y′) ∪ (Y′ ∩ ¬Y) yields the candidates that are optimal under only one grammar. 
Applying GEN⁻¹ or PRON to this set finds the regular set of underlying or surface forms that the two grammars would treat differently; one can then look for empirical cases in this set, in order to distinguish between the two grammars.</Paragraph> </Section> <Section position="7" start_page="0" end_page="0" type="metho"> <SectionTitle> 6 Theorem on Compiling OT </SectionTitle> <Paragraph position="0"> Why are OT phonologies not always regular relations? The trouble is that inputs may be arbitrarily long, and so may accrue arbitrarily large numbers of violations. Traditional OT (§4) is supposed to distinguish all such numbers. Consider syllabification in English, which prefers to syllabify the long input bibambam...bam (with k copies of bam) as [bi][bam][bam]...[bam] (with k codas) rather than [bib][am][bam]...[bam] (with k + 1 codas). NOCODA must therefore distinguish annotated candidates y with k ⋆'s (which are optimal) from those with k + 1 ⋆'s (which are not). It requires a (k + 2)-state automaton to make this distinction by looking only at the ⋆'s in y. And if k can be arbitrarily large, then no finite-state automaton will handle all cases.</Paragraph> <Paragraph position="1"> Thus, constraints like NOCODA do not allow an upper bound on k for all x ∈ Σ*. Of course, the minimal number of violations k of a constraint is fixed given the underlying form x, which is useful in production (footnote 8). But comprehension is less fortunate: we cannot bound k given only the surface form z. In the grammar of §4, COMPREHEND(abc) included underlying forms whose optimal candidates had arbitrarily large numbers of violations k.</Paragraph> <Paragraph position="2"> Now, in most cases, the effect of an OT grammar can be achieved without actually counting anything. (This is to be expected since rewrite-rule grammars were previously written for the same phonologies, and they did not use counting!) Footnote 8: Ellison (1994) was able to construct PRODUCE(x) from x. One can even build a transducer for PRODUCE that is correct on all inputs that can achieve ≤ K violations and returns ∅ on other inputs (signalling that the transducer needs to be recompiled with increased K). Simply use the construction of (Frank and Satta, 1998; Karttunen, 1998), composed with a hard constraint that the answer must have ≤ K violations.</Paragraph> <Paragraph position="3"> This is possible despite the above arguments because for some grammars, the distinction between optimal and suboptimal y can be made by looking at the non-⋆ symbols in y rather than trying to count the ⋆'s. In our NOCODA example, a surface substring such as ...ib⋆][a... might signal that y is suboptimal because it contains an &quot;unnecessary&quot; coda. Of course, the validity of this conclusion depends on the grammar and specifically the constraints C1, ..., Ci−1 ranked above NOCODA, since whether that coda is really unnecessary depends on whether Yi−1 also contains the competing candidate ...i][ba... with fewer codas.</Paragraph> <Paragraph position="4"> But as we have seen, some OT grammars do have effects that overstep the finite-state boundary (§4).</Paragraph> <Paragraph position="5"> Recent efforts to treat OT with transducers have therefore tried to remove counting from the formalism. We now unify such efforts by showing that they all modify the harmony ordering ≻.</Paragraph> <Paragraph position="6"> §4 described finite-state OT grammars as ones where GEN, PRON, and the constraints are regular relations. 
We claim that if the harmony ordering ≻ is also a regular relation on strings of (Σ ∪ Δ ∪ {⋆})*, then the entire grammar (PRODUCE) is also regular.</Paragraph> <Paragraph position="7"> We require harmony orderings to be compatible with GEN: an ordering must treat ȳ′, ȳ as incomparable (neither is ≻ the other) if they were produced from different underlying forms (footnote 9). To make the notation readable let us denote the ≻ relation by the letter H. Thus, a transducer for H accepts the pair (ȳ′, ȳ) if ȳ′ ≻ ȳ.</Paragraph> <Paragraph position="8"> The construction is inductive. Y0 = GEN is regular by assumption. If Yi−1 is regular, then so is Yi since (as we will show) Yi = (Ȳi ∘ ¬range(Ȳi ∘ H)) ∘ D (4)</Paragraph> <Paragraph position="10"> where Ȳi ≝ Yi−1 ∘ Ci and maps x to the set of starred candidates that Ci will prune; ¬ denotes the complement of a regular language; and D is a transducer that removes all ⋆'s. Therefore PRODUCE =</Paragraph> <Paragraph position="12"> Footnote 9: ... fewer ⋆'s than, ȳ}. If we were allowed to drop the same-underlying-form condition then the ordering would become regular, and then our claim would falsely imply that all traditional finite-state OT grammars were regular relations.</Paragraph> <Paragraph position="13"> It remains to derive (4). Equation (2) implies</Paragraph> <Paragraph position="15"> One can read H(Ȳi(x)) as &quot;starred candidates that are worse than other starred candidates,&quot; i.e., suboptimal. The set difference (7) leaves only the optimal candidates. We now see</Paragraph> <Paragraph position="17"> and composing both sides with D yields (4). To justify (9), (10) we must show when ȳ ∈ Ȳi(x) that ȳ ∈ H(Ȳi(x)) ⇔ (∃z) ȳ ∈ H(Ȳi(z)). For the ⇒ direction, just take z = x. 
For ⇐, ȳ ∈ H(Ȳi(z)) means that (∃ȳ′ ∈ Ȳi(z)) ȳ′ ≻ ȳ; but then x = z (giving ȳ ∈ H(Ȳi(x))), since if not, our compatibility requirement on H would have made ȳ′ ∈ Ȳi(z) incomparable with ȳ ∈ Ȳi(x).</Paragraph> <Paragraph position="18"> Extending the pretty notation of (Karttunen, 1998), we may use (4) to define a left-associative generalized optimality operator ooH: Y ooH C ≝ (Y ∘ C ∘ ¬range(Y ∘ C ∘ H)) ∘ D (14). Then for any regular OT grammar, PRODUCE = GEN ooH C1 ooH C2 ... ooH Cn ∘ PRON and can be inverted to get COMPREHEND. More generally, different constraints can usefully be applied with different H's (Eisner, 2000).</Paragraph> <Paragraph position="19"> The algebraic construction above is inspired by a version that Gerdemann and van Noord (2000) give for a particular variant of OT. Their regular expressions can be used to implement it, simply replacing their add_violation by our H.</Paragraph> <Paragraph position="20"> Typically, H ignores surface characters when comparing starred candidates. So H can be written as elim(Δ) ∘ G ∘ elim(Δ)⁻¹ where elim(Δ) is a transducer that removes all characters of Δ. To satisfy the compatibility requirement on H, G should be a subset of the relation (Σ | ⋆ | (ε:⋆) | (⋆:ε))* (footnote 10). We now summarize the main proposals from the literature (see §1), propose operator names, and cast them in the general framework. Footnote 10: This transducer regexp says to map any symbol in Σ ∪ {⋆} to itself, or insert or delete ⋆--and then repeat.</Paragraph> <Paragraph position="21"> Y ∘ C: Inviolable constraint (Koskenniemi, 1983; Bird, 1995), implemented by composition.</Paragraph> <Paragraph position="22"> Y o+ C: Counting constraint (Prince and Smolensky, 1993): more violations is more disharmonic. No finite-state implementation possible. Y oo C: Binary approximation (Karttunen, 1998; Frank and Satta, 1998). All candidates with any violations are equally disharmonic. Implemented by G = (Σ* (ε:⋆) Σ*)+, which relates underlying forms without violations to the same forms with violations.</Paragraph> <Paragraph position="23"> Y oo3 C: 3-bounded approximation (Karttunen, 1998; Frank and Satta, 1998). Like o+, but all candidates with ≥ 3 violations are equally disharmonic. G is most easily described with a transducer that keeps count of the input and output ⋆'s so far, on a scale of 0, 1, 2, ≥3. Final states are those whose output count exceeds their input count on this scale. Y o⊂ C: Matching or subset approximation (Gerdemann and van Noord, 2000). A candidate is more disharmonic than another if it has stars in all the same locations and some more besides (footnote 11). Here [...]</Paragraph> <Paragraph position="25"> Y o> C: Left-to-right directional evaluation (Eisner, 2000). A candidate is more disharmonic than another if in the leftmost position where they differ (ignoring surface characters), it has a ⋆. This revises OT's &quot;do only when necessary&quot; mantra to &quot;do only when necessary and then as late as possible&quot; (even if delaying ⋆'s means suffering more of them later).</Paragraph> <Paragraph position="26"> Here G = (Σ|⋆)* (( :⋆) | (( :⋆)(Σ|⋆)*)). Unlike the other proposals, here two forms can both be optimal only if they have exactly the same pattern of violations with respect to their underlying material.</Paragraph> <Paragraph position="27"> Y &lt;o C: Right-to-left directional evaluation.</Paragraph> <Paragraph position="28"> &quot;Do only when necessary and then as early as possible.&quot; Here G is the reverse of the G used in o>. The novelty of the matching and directional proposals is their attention to where the violations fall.</Paragraph> <Paragraph position="29"> Eisner's directional proposal (o>, &lt;o) is the only one defended on linguistic as well as computational grounds. He argues that violation counting (o+) is a bug in OT rather than a feature worth approximating, since it predicts unattested phenomena such as &quot;majority assimilation&quot; (Baković, 1999; Lombardi, 1999). Conversely, he argues that comparing violations directionally is not a hack but a desirable feature, since it naturally predicts &quot;iterative phenomena&quot; whose description in traditional OT (via Generalized Alignment) is awkward from both a linguistic and a computational point of view. Fig. 2 contrasts the traditional and directional harmony orderings. Footnote 11: Many candidates are incomparable under this ordering, so Gerdemann and van Noord also showed how to weaken the notion of &quot;same location&quot; in order to approximate o+ better. [Figure 2: graphic not recovered. Surviving caption text: &quot;... candidate; NOCODA dislikes syllable codas. (a) Surface material of the candidates. (b) Scored candidates for G to compare. Surface characters but not ⋆'s have been removed by elim(Δ). (c) In traditional evaluation o+, G counts the ⋆'s. (d) Directional evaluation o> gets a different result, as if NOCODA were split into 4 constraints evaluating the syllables separately. More accurately, it is as if NOCODA were split into one constraint per underlying letter, counting the number of ⋆'s right after that letter.&quot;]</Paragraph> <Paragraph position="30"> Eisner (2000) proved that o> was a regular operator for directional H, by making use of a rather different insight, but that machine-level construction was highly technical. The new algebraic construction is simple and can be implemented with a few regular expressions, as for any other H.</Paragraph> </Section> </Paper>