<?xml version="1.0" standalone="yes"?> <Paper uid="P97-1055"> <Title>Paradigmatic Cascades: a Linguistically Sound Model of Pronunciation by Analogy</Title> <Section position="4" start_page="428" end_page="431" type="intro"> <SectionTitle> 2 The Paradigmatic Cascades Model </SectionTitle> <Paragraph position="0"> In this section, we introduce the paradigmatic cascades model. We first formalize the concept of a paradigmatic relationship. We then go through the details of the learning procedure, which essentially consists in an extensive search for such relationships.</Paragraph> <Paragraph position="1"> We finally explain how these patterns are used in the pronunciation procedure.</Paragraph> <Section position="1" start_page="428" end_page="428" type="sub_section"> <SectionTitle> 2.1 Paradigmatic Relationships and Alternations </SectionTitle> <Paragraph position="0"> The paradigmatic cascades model crucially relies upon the existence of numerous paradigmatic relationships in lexical databases. A paradigmatic relationship involves four lexical entries a, b, c, d, and expresses that these forms are involved in an analogical (in the Saussurian (de Saussure, 1916) sense) proportion: a is to b as c is to d (henceforth abbreviated as a : b = c : d; see also (Lepage and Shin-Ichi, 1996) for another use of this kind of proportion). Morphologically related pairs provide us with numerous examples of orthographical proportions, as in: reactor : reaction = factor : faction (1)</Paragraph> <Paragraph position="2"> Considering these proportions in terms of orthographical alternations, that is in terms of partial functions in the graphemic domain, we can see that each proportion involves two alternations. The first one transforms reactor into reaction (and factor into faction), and consists in exchanging the suffixes or and ion. The second one transforms reactor into factor (and reaction into faction), and consists in exchanging the prefixes re and f. 
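The two alternations above can be sketched as partial functions over strings (a minimal Python sketch; the helper names `exchange_prefix` and `exchange_suffix` are ours, not from the paper):

```python
def exchange_suffix(word, old, new):
    """Partial function exchanging the suffix `old` for `new`;
    returns None outside its domain."""
    return word[: len(word) - len(old)] + new if word.endswith(old) else None

def exchange_prefix(word, old, new):
    """Partial function exchanging the prefix `old` for `new`."""
    return new + word[len(old):] if word.startswith(old) else None

# The proportion reactor : reaction = factor : faction involves two alternations:
g = lambda w: exchange_suffix(w, "or", "ion")  # reactor -> reaction, factor -> faction
f = lambda w: exchange_prefix(w, "re", "f")    # reactor -> factor, reaction -> faction

assert g("reactor") == "reaction" and g("factor") == "faction"
assert f("reactor") == "factor" and f("reaction") == "faction"
assert f("blob") is None  # partial: undefined outside dom(f)
```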
These alternations are represented on figure 1.</Paragraph> <Paragraph position="3"> Formally, we define the notion of a paradigmatic relationship as follows. Given Σ, a finite alphabet, and ℒ, a finite subset of Σ*, we say that (a, b) ∈ ℒ × ℒ is paradigmatically related to (c, d) ∈ ℒ × ℒ iff there exist two partial functions f and g from Σ* to Σ*, where f exchanges prefixes and g exchanges suffixes, and: f(a) = c, f(b) = d, g(a) = b, g(c) = d</Paragraph> <Paragraph position="5"> f and g are termed the paradigmatic alternations associated with the relationship a : b =f,g c : d.</Paragraph> <Paragraph position="6"> The domain of an alternation f will be denoted by dom(f).</Paragraph> </Section> <Section position="2" start_page="428" end_page="429" type="sub_section"> <SectionTitle> 2.2 The Learning Procedure </SectionTitle> <Paragraph position="0"> The main purpose of the learning procedure is to extract from a pronunciation lexicon, presumably structured by multiple paradigmatic relationships, the most productive paradigmatic alternations.</Paragraph> <Paragraph position="1"> Let us start with some notations: Given G a graphemic alphabet and P a phonetic alphabet, a pronunciation lexicon ℒ is a subset of G* × P*. The restriction of ℒ to G* (resp. P*) will be noted ℒG (resp. ℒP). Given two strings x and y, pref(x, y) (resp. suff(x, y)) denotes their longest common prefix (resp. suffix). For two strings x and y having a non-empty common prefix (resp. suffix) u, fxy (resp. gxy) denotes the function which transforms x into y: as x = uv, and as y = ut, fxy substitutes a final v with a final t. ε denotes the empty string.</Paragraph> <Paragraph position="2"> Given ℒ, the learning procedure searches ℒG for every 4-tuple (a, b, c, d) of graphemic strings such that a : b =f,g c : d. Each match increments the productivity of the related alternations f and g. 
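This counting step can be sketched as follows (a simplified Python sketch restricted to suffix alternations; the helper names and the pairwise scan are ours, not the paper's table 1 algorithm):

```python
from collections import Counter
from itertools import combinations

def common_prefix(x, y):
    """pref(x, y): the longest common prefix of x and y."""
    i = 0
    while i < min(len(x), len(y)) and x[i] == y[i]:
        i += 1
    return x[:i]

def suffix_alternation(x, y):
    """g_xy as an (old_suffix, new_suffix) pair: with u = pref(x, y),
    x = uv and y = ut, the alternation substitutes v with t."""
    u = common_prefix(x, y)
    return x[len(u):], y[len(u):]

def learn_suffix_alternations(lexicon_g):
    """Count how often each suffix alternation maps one lexical item onto
    another (its raw productivity); symmetry lets us scan unordered pairs."""
    productivity = Counter()
    for x, y in combinations(sorted(set(lexicon_g)), 2):
        v, t = suffix_alternation(x, y)
        if v != x:  # require a non-empty common prefix
            productivity[(v, t)] += 1
    return productivity

counts = learn_suffix_alternations(["reactor", "reaction", "factor", "faction"])
assert counts[("ion", "or")] == 2  # reaction->reactor and faction->factor
```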
This search is performed using a slightly modified version of the algorithm presented in (Federici, Pirrelli, and Yvon, 1995), which applies to every word x in ℒG the procedure detailed in table 1.</Paragraph> <Paragraph position="3"> In fact, the properties of paradigmatic relationships, notably their symmetry, dramatically reduce the cost of this procedure, since not all 4-tuples of strings in ℒG need to be examined during that stage.</Paragraph> <Paragraph position="4"> For each graphemic alternation, we also record its correlated alternation(s) in the phonological domain, and accordingly increment their productivity. For instance, assuming that factor and reactor respectively receive the pronunciations /fæktɔr/ and /ri:æktɔr/, the discovery of the relationship expressed in (1) will lead our algorithm to record that the graphemic alternation f → re correlates in the phonemic domain with the alternation /f/ → /ri:/.</Paragraph> <Paragraph position="5"> Note that the discovery of phonemic correlates does not require any sort of alignment between the orthographic and the phonemic representations: the procedure simply records the changes in the phonemic domain when the alternation applies in the graphemic domain.</Paragraph> <Paragraph position="6"> At the end of the learning stage, we have in hand a set A = {Ai} of functions exchanging suffixes or prefixes in the graphemic domain, and for each Ai in A: (i) a statistical measure pi of its productivity, defined as the likelihood that the transform of a lexical item be another lexical item: pi = |{x ∈ ℒG ∩ dom(Ai) : Ai(x) ∈ ℒG}| / |ℒG ∩ dom(Ai)|</Paragraph> <Paragraph position="8"> (ii) a set {Bi,j}, j ∈ {1...ni}, of correlated functions in the phonemic domain, and a statistical measure pi,j of their conditional productivity, i.e. 
of the likelihood that the phonetic alternation Bi,j correlates with Ai.</Paragraph> <Paragraph position="9"> Table 2 gives the list of the phonological correlates of the alternation which consists in adding the suffix ly, corresponding to a productive rule for deriving adverbs from adjectives in English. If the first lines of table 2 are indeed &quot;true&quot; phonemic correlates of the derivation, corresponding to various classes of adjectives, a careful examination of the last lines reveals that the extraction procedure is easily fooled by accidental pairs like imp-imply, on-only or ear-early. A simple pruning rule was used to get rid of these alternations on the basis of their productivity, and only alternations which were observed at least twice were retained.</Paragraph> <Paragraph position="10"> It is important to realize that A allows us to specify lexical neighbourhoods in ℒG: given a lexical entry x, its nearest neighbour is simply f(x), where f is the most productive alternation applying to x. Lexical neighbourhoods in the paradigmatic cascades model are thus defined with respect to the locally most productive alternations. As a consequence, the definition of neighbourhoods implicitly incorporates a great deal of linguistic knowledge extracted from the lexicon, especially regarding morphological processes and phonotactic constraints, which makes it much more relevant for grounding the notion of analogy between lexical items than, say, any neighbourhood based on the string edit metric.</Paragraph> </Section> <Section position="3" start_page="429" end_page="431" type="sub_section"> <SectionTitle> 2.3 The Pronunciation of Unknown Words </SectionTitle> <Paragraph position="0"> Suppose now that we wish to infer the pronunciation of a word x, which does not appear in the lexicon.</Paragraph> <Paragraph position="1"> This goal is achieved by exploring the neighbourhood of x defined by A, in order to find one or several analogous lexical entry(ies) y. 
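The neighbourhood notion used here, where the nearest neighbour of x is the image of x under the most productive applicable alternation, can be sketched as (a Python sketch; the representation of alternations as (function, productivity) pairs is ours):

```python
def nearest_neighbour(x, alternations):
    """Return Ai(x) for the most productive alternation Ai whose domain
    contains x, i.e. the nearest neighbour of x; None if no Ai applies.
    `alternations` is a list of (partial_function, productivity) pairs."""
    candidates = [(p, fn(x)) for fn, p in alternations if fn(x) is not None]
    if not candidates:
        return None
    return max(candidates)[1]  # pick the highest-productivity derivative

# e.g. a suffix alternation adding "ly" (productivity 0.8, an invented figure)
# beats one adding "ness" (0.5):
alts = [(lambda w: w + "ly", 0.8), (lambda w: w + "ness", 0.5)]
assert nearest_neighbour("quick", alts) == "quickly"
```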
The second stage of the pronunciation procedure is to adapt the known pronunciation of y, and derive a suitable pronunciation for x: the idea here is to mirror in the phonemic domain the series of alternations which transform x into y in the graphemic domain, using the statistical pairing between alternations that is extracted during the learning stage. The complete pronunciation procedure is represented on figure 2.</Paragraph> <Paragraph position="2"> Let us examine carefully how these two aspects of the pronunciation procedure are implemented. The first stage is to find a lexical entry in the neighbourhood of x defined by A.</Paragraph> <Paragraph position="3"> The basic idea is to generate A(x), defined as {Ai(x), for Ai ∈ A, x ∈ dom(Ai)}, which contains all the words that can be derived from x using a function in A. This set, better viewed as a stack, is ordered according to the productivity of the Ai: the topmost element in the stack is the nearest neighbour of x, etc. The first lexical item found in A(x) is the analog of x. If A(x) does not contain any known word, we iterate the procedure, using x′, the top-ranked element of A(x), instead of x.</Paragraph> <Paragraph position="4"> This expands the set of possible analogs, which is accordingly reordered, etc. This basic search strategy, which amounts to the exploration of a derivation tree, is extremely resource consuming (every expansion stage typically adds about a hundred new virtual analogs), and is, in theory, not guaranteed to terminate. 
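This expand-and-match exploration of the derivation tree can be sketched with a priority queue, scoring each derivative by the product of the productivities used to reach it (a simplified Python sketch; the function name, the heap-based ordering and the threshold value are ours):

```python
import heapq

def find_analog(x, alternations, lexicon, threshold=1e-3):
    """Explore the derivation tree of x, best candidates first.
    `alternations` pairs a partial function with its productivity pi (< 1);
    a derivative reached through alternations i1..ik is scored by the
    product pi1 * ... * pik, which decreases along every derivation path."""
    stack = [(-1.0, x)]  # negated scores: heapq pops the best candidate first
    seen = {x}
    while stack:
        neg_score, word = heapq.heappop(stack)
        if -neg_score < threshold:
            break  # every remaining candidate scores below the threshold
        if word != x and word in lexicon:
            return word  # the analog of x
        for fn, p in alternations:
            y = fn(word)
            if y is not None and y not in seen:
                seen.add(y)
                heapq.heappush(stack, (neg_score * p, y))
    return None

# e.g. the prefix alternation re -> f, with an invented productivity of 0.5:
alts = [(lambda w: "f" + w[2:] if w.startswith("re") else None, 0.5)]
assert find_analog("reactor", alts, {"factor"}) == "factor"
```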
In fact, the search problem is equivalent to the problem of parsing with an unrestricted Phrase Structure Grammar, which is known to be undecidable.</Paragraph> <Paragraph position="5"> We have evaluated two different search strategies, which implement various ways to alternate between expansion stages (the stack is expanded by generating the derivatives of the topmost element) and matching stages (elements in the stack are looked for in the lexicon). The first strategy implements a depth-first search of the analog set: each time the topmost element of the stack is searched, but not found, in the lexicon, its derivatives are immediately generated, and added to the stack. In this approach, the position of an analog in the stack is assessed as a function of the &quot;distance&quot; between the original word x and the analog y = Aik(Aik-1(...Ai1(x))), according to: d(x, y) = pi1 × pi2 × ... × pik (5)</Paragraph> <Paragraph position="7"> The search procedure is stopped as soon as an analog is found in ℒG, or else, when the distance between x and the topmost element of the stack, which monotonically decreases (∀i, pi < 1), falls below a pre-defined threshold.</Paragraph> <Paragraph position="8"> The second strategy implements a kind of compromise between depth-first and breadth-first exploration of the derivation tree, and is best understood if we first look at a concrete example. Most alternations substituting one initial consonant are very productive, in English as in many other languages.</Paragraph> <Paragraph position="9"> Therefore, a word starting with, say, a p, is very likely to have a very close derivative where the initial p has been replaced by, say, an r. Now suppose that this word starts with pl: the alternation will derive an analog starting with rl, and will assess it with a very high score. This analog will, in turn, derive many more virtual analogs starting with rl, once its suffixes have been substituted during another expansion phase. 
This should be avoided, since there are in fact very few words starting with the prefix rl: we would therefore like these words to be very poorly ranked. The second search strategy has been devised precisely to cope with this problem.</Paragraph> <Paragraph position="10"> The idea is to rank the stack of analogs according to the expectation of the number of lexical derivatives a given analog may have. This expectation is computed by summing up the productivities of all the alternations that can be applied to an analog y, according to: Σ{i | y ∈ dom(Ai)} pi (6) This ranking will necessarily assess any analog starting in rl with a low score, as very few alternations will substitute its prefix. However, the computation of (6) is much more complex than that of (5), since it requires examining a given derivative before it can be positioned in the stack. This led us to bring forward the lexical matching stage: during the expansion of the topmost stack element, all its derivatives are looked for in the lexicon. If several derivatives are simultaneously found, the search procedure halts and returns more than one analog.</Paragraph> <Paragraph position="11"> The expectation (6) does not decrease as more derivatives are added to the stack; consequently, it cannot be used to define a stopping criterion.</Paragraph> <Paragraph position="12"> The search procedure is therefore stopped when all derivatives up to a given depth (2 in our experiments) have been generated, and unsuccessfully looked for in the lexicon. This termination criterion is very restrictive, in comparison to the one implemented in the depth-first strategy, since it makes it impossible to pronounce very long derivatives, for which a significant number of alternations need to be applied before an analog is found. An example is the word synergistically, for which the &quot;breadth-first&quot; search terminates unsuccessfully, whereas the depth-first search manages to retrieve the &quot;analog&quot; energy. 
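The expectation (6) can be sketched as follows (a Python sketch; the domain predicates stand for dom(Ai), and the productivity figures are invented for illustration):

```python
def expectation(y, alternations):
    """Eq. (6): expected number of lexical derivatives of the analog y,
    i.e. the sum of the productivities pi of all alternations applicable
    to y. `alternations` pairs a domain predicate with a productivity pi."""
    return sum(p for in_domain, p in alternations if in_domain(y))

# An analog starting in "rl" is matched by almost no prefix alternation,
# so it is ranked low, as desired:
alts = [
    (lambda w: w.startswith("pl"), 0.4),  # pl- prefix alternations
    (lambda w: w.startswith("re"), 0.3),  # re- prefix alternations
    (lambda w: w.endswith("or"), 0.2),    # -or suffix alternations
]
assert expectation("rlactor", alts) == 0.2
assert expectation("reactor", alts) > expectation("rlactor", alts)
```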
Nonetheless, the results reported hereafter have been obtained using this &quot;breadth-first&quot; strategy, mainly because this search was associated with a more efficient procedure for reconstructing pronunciations (see below).</Paragraph> <Paragraph position="13"> Various pruning procedures have also been implemented in order to control the exponential growth of the stack. For example, one pruning procedure detects the most obvious derivation cycles, which generate the same derivatives in loops; another pruning procedure tries to detect commuting alternations: substituting the prefix p, and then the suffix s, often produces the same analog as when the alternations apply in the reverse order, etc. More details regarding implementational aspects are given in (Yvon, 1996b).</Paragraph> <Paragraph position="14"> If the search procedure returns an analog y = Aik(Aik-1(...Ai1(x))) in ℒ, we can build a pronunciation for x, using the known pronunciation φ(y) of y. For this purpose, we will use our knowledge of the Bi,j, for i ∈ {i1...ik}, and generate every possible transform of φ(y) in the phonological domain: {B⁻¹ik,jk(B⁻¹ik-1,jk-1(...(φ(y))))}, with jk ∈ {1...nik}, and order this set using some function of the pi,j. The top-ranked element in this set is the pronunciation of x. Of course, when the search fails, this procedure fails to propose any pronunciation.</Paragraph> <Paragraph position="15"> In fact, the results reported hereafter use a slightly extended version of this procedure, where the pronunciations of more than one analog are used for generating and selecting the pronunciation of the unknown word. The reason for using multiple analogs is twofold: first, it obviates the risk of being wrongly influenced by one very exceptional analog; second, it enables us to model conspiracy effects more accurately. 
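This reconstruction step can be sketched as follows (a Python sketch: each step of the graphemic chain contributes its phonemic correlates Bi,j with conditional productivities pi,j, and candidates are ranked by the product of the pi,j; the example mirrors the f → re / /f/ → /ri:/ correlate of section 2.2, with an invented probability):

```python
from itertools import product

def candidate_pronunciations(phi_y, correlate_steps):
    """Mirror in the phonemic domain the chain of graphemic alternations
    linking x to its analog y. `correlate_steps` lists, for each step of
    the chain, the phonemic correlates as (transform, productivity) pairs;
    candidates are ordered by the product of the conditional productivities."""
    ranked = []
    for combo in product(*correlate_steps):  # one correlate per step
        pron, score = phi_y, 1.0
        for transform, p in combo:
            pron, score = transform(pron), score * p
        ranked.append((score, pron))
    ranked.sort(reverse=True)
    return [pron for _, pron in ranked]

# Analog factor for x = reactor: mirror f -> re by /f/ -> /ri:/ (p invented)
steps = [[(lambda s: "ri:" + s[1:], 0.9)]]
assert candidate_pronunciations("fæktɔr", steps)[0] == "ri:æktɔr"
```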
Psychological models of reading aloud indeed assume that the pronunciation of an unknown word is not influenced by just one analog, but rather by its entire lexical neighbourhood.</Paragraph> </Section> </Section> </Paper>