<?xml version="1.0" standalone="yes"?> <Paper uid="C88-1046"> <Title>Word Boundary Identification from Phoneme Sequence Constraints in Automatic Continuous Speech Recognition</Title> <Section position="2" start_page="0" end_page="225" type="metho"> <SectionTitle> TEACH TEA </SectionTitle> <Paragraph position="0"> [Figure residue: word hypotheses TEACH, TEA and TEE above a phoneme string ending in /oi ng w i l/. ERROR: cannot be parsed.] In this case, a left-to-right chart-parsing strategy would break off at /ch/ because /ch oi ng/ is unparsable: there are no words that end in /ch oi/ or begin with /oi ng/, and /oi/ is not usually considered to be a word (aside from an exclamation) in the English language. Since the strategy works from left to right, the phonemes which lie to the right of this error would also remain unparsed: thus the word will would not be derived from /w i l/ unless the chart-parsing strategy were modified in some way to cope with this kind of error. If, on the other hand, phoneme sequence constraints had been applied, a word boundary would have been inserted between /ng/ and /w/. This would enable immediate recovery from the kind of error described above: if the chart-parsing strategy is unable to continue parsing phonemes at a particular point (from /ch/ to /oi/ to /ng/), it can continue parsing from the following word boundary (between /ng/ and /w/) that had been automatically inserted by phoneme sequence constraints. The prior application of phoneme sequence constraints, therefore, breaks up a single string of phonemes into smaller units which, from the point of view of the left-to-right chart-parsing strategy, are independent of each other. 
A by-product of the prior insertion of word boundaries in this way is that the chart-parsing strategy could parse each of these units in parallel (Figure 3).</Paragraph> <Paragraph position="2"> [Figure 3 residue: a phoneme string with pre-identified boundaries, corresponding to # many # thanks # for sending # me the copy of # your letter.]</Paragraph> <Paragraph position="3"> [Figure 3 caption, beginning lost in extraction:] ... would enable the chart-parsing strategy to apply in parallel from all the pre-identified word boundaries.</Paragraph> <Paragraph position="4"> Such a parallel strategy may be computationally faster than one which parses the string strictly from left to right.</Paragraph> <Paragraph position="5"> As in Harrington and Johnstone (1988), sentences transcribed by a trained phonetician are used as the input data. The experiment therefore does not take account of any errors which may arise from inaccuracies in the automatic extraction of the phonemes from the acoustic signal by the phonetic rule component of a continuous speech recogniser.</Paragraph> </Section> <Section position="3" start_page="225" end_page="227" type="metho"> <SectionTitle> 2 Method I </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="225" end_page="226" type="sub_section"> <SectionTitle> 2.1 Word boundary sequences </SectionTitle> <Paragraph position="0"> In order to identify phoneme sequences which are excluded word-internally (and which therefore signal the presence of a word boundary), it is necessary to determine a priori the complete set of three-phoneme sequences which can occur across word boundaries.</Paragraph> <Paragraph position="1"> For this purpose, a 'Word-lexicon' of the 23,000 most frequent words (including many derivational and inflectional morphological variants and compounds) in part of the Lancaster-Oslo-Bergen corpus (Johannson, Leech and Goodluck, 1978) was used, with each word keyed to one citation form and zero or more reduced-form pronunciations. 
The citation form entry, which is often identical to the one given in Gimson (1984), corresponds to a phonemicisation of an isolated production of the word at a moderately slow tempo. The reduced forms include variant phonemicisations of the same words which might occur in faster speech productions. In general, three different kinds of reduction rules are included: alternation rules in which segments are in free variation (e.g. /oo k sh @ n/, /o k sh @ n/, auction); deletion rules in which single segments may be deleted (/o k sh n/ from /o k sh @ n/, auction); and word-internal assimilation rules (/g u b b ai/ from /g u d b ai/, good-bye). The rules do not take into account phonological assimilation across word boundaries (see Harrington, Laver and Cutting (1986) for further details of the reduction rules). The reduced forms were derived from the citation forms by rule using a software package running on Xerox-1100 workstations in Interlisp-D (Cutting and Harrington, 1986). After the application of the reduction rules to the 23,000-word lexicon, around 70,000 reduced forms were derived (on average, therefore, each word is associated with 4 different pronunciations).</Paragraph> <Paragraph position="2"> In order to derive the complete set of possible three-phoneme sequences that occur across word boundaries, all final two phonemes (PP#) were paired with all initial phonemes (#P) of all citation and reduced forms, thus deriving the complete set of PP#P sequences (where P is any phoneme); and all final phonemes (P#) were paired with the first two phonemes (#PP) of all citation and reduced forms, thus deriving the complete set of P#PP sequences. 
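The pairing of word-final and word-initial phonemes can be illustrated with a short sketch. It is an illustration under stated assumptions, not the procedure actually used: pronunciations are represented as tuples of phoneme symbols, the two-entry lexicon is invented, the function name boundary_sequences is hypothetical, and one-phoneme forms are assumed to contribute only to the P# and #P sets.

```python
from itertools import product


def boundary_sequences(pronunciations):
    """Return every PP#P and P#PP sequence derivable from the pronunciations.

    Each pronunciation is a tuple of phoneme symbols (an assumed encoding).
    """
    finals2 = {p[-2:] for p in pronunciations if len(p) >= 2}    # PP#
    finals1 = {p[-1:] for p in pronunciations}                    # P#
    initials1 = {p[:1] for p in pronunciations}                   # #P
    initials2 = {p[:2] for p in pronunciations if len(p) >= 2}    # #PP

    # Pair every word ending with every word beginning, with # in between.
    pp_p = {left + ("#",) + right for left, right in product(finals2, initials1)}
    p_pp = {left + ("#",) + right for left, right in product(finals1, initials2)}
    return pp_p | p_pp


# Invented toy lexicon, for illustration only.
toy_lexicon = [("k", "a", "t"), ("d", "o", "g")]
for seq in sorted(boundary_sequences(toy_lexicon)):
    print(" ".join(seq))
```

With the toy lexicon the sketch yields eight sequences, four of each type. Using sets means that duplicate sequences arising from different words are counted once, which matches the notion of 'different' sequences in the text.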
The pairing operation produced a total of 62,670 different three-phoneme sequences.</Paragraph> <Paragraph position="3"> Subsequently, it was necessary to take into account some of the modifications to word boundary sequences which occur as a result of assimilatory processes since, as stated above, these were not included in the reduction rules. In order to take into account the realisation of /r/ in phrases such as /dh e@ r aa m e n i/ (there are many) and 'intrusive /r/' (/dh ii ai d i @ r i z/, the idea is), the sequences in (1) were paired with all word-initial vowel phonemes that occurred in the Word-lexicon: (1) /u@ r, e@ r, i@ r, @ r, @@ r, oo r, aa r/ thus deriving, for example, /@ r # i/ (measure is), /aa r # au/ (far out) etc. In addition, /r/ was paired with all #VP sequences in the Word-lexicon, where V is any word-initial vowel and P is any phoneme. This pairing operation results in sequences such as /r # i z/ (measure is), /r # au t/ (far out) etc.</Paragraph> <Paragraph position="4"> In order to account for the assimilation of alveolars to bilabials preceding bilabials, all PPt # sequences (where P is any phoneme and Pt is one of /t, d, n/) were extracted from the Word-lexicon. Final /t/, /d/, /n/ were then changed to /p/, /b/ and /m/ respectively (thus the PPt # sequences /ii t #/, /ii d #/, /ii n #/ were changed to /ii p #/, /ii b #/, /ii m #/). The changed sequences were then paired with the labial consonants /p, b, m, f, v, w/. 
This pairing operation produces sequences such as /ii p # b/ (eat by), /ou m # f/ (shown few), /@@ m # w/ (burn wood).</Paragraph> <Paragraph position="5"> A similar procedure was used to take account of the instability of some of the alveolars before palatals and velars, as shown in Table I below.</Paragraph> <Paragraph position="6"> Assimilation: PP#P sequence; P#PP sequence (example)
/s/ to /sh/: /oo sh # sh/; /sh # sh uu/ (horse shoe)
/z/ to /zh/: /i zh # sh/; /zh # sh u@/ (is sure)
/t/ to /ch/: /a ch # y/; /ch # y oo/ (at your)
/d/ to /jh/: /i jh # y/; /jh # y uu/ (did you)
/t/ to /k/: /ai k # k/; /k # k uh/ (might come)
/d/ to /g/: /ii g # k/; /g # k l/ (need cleaning)
/n/ to /ng/: /e ng # k/; /ng # k a/ (when can)
Table I: Some of the word boundary assimilation cases considered in the derivation of word boundary sequences.</Paragraph> <Paragraph position="7"> Consideration was given to some deletion rules across word boundaries, such as the deletion of the alveolar stop in /f aa s # s p ii ch/ (fast speech). In this case, a complete list of three-phoneme sequences occurring word-finally was made from the Word-lexicon where the penultimate consonant was a fricative and the final consonant an alveolar stop. The final alveolar stop was deleted and the resulting two-phoneme sequence was paired with all members of #P (thus /aa s t #/ (fast) -> /aa s #/ (fast) -> /aa s # s/, fast speech). 
All word boundary sequences which resulted from the inclusion of these assimilation rules were added to the previously derived P#PP and PP#P sequences, thus producing a total of 69,819 word boundary sequences.</Paragraph> </Section> <Section position="2" start_page="226" end_page="227" type="sub_section"> <SectionTitle> 2.2 Word boundary sequences excluded word-internally </SectionTitle> <Paragraph position="0"> We now wished to determine which word boundary sequences do not occur word-internally (since these enable the automatic detection of a word boundary). However, it is clear from the phonology literature (Fudge, 1969; Clements and Keyser, 1983) that sequential constraints on phonemes are not upheld across many morpheme boundaries. For example, it is well documented (Rockey, 1973) that only alveolars and palato-alveolars may follow /au/ (town, howl, couch). But such a constraint is not upheld word-internally across the morpheme boundary in a compound such as cowboy, /k au b oi/. Similarly, /uu au t/ does not occur morpheme-internally, but does occur in compounds such as throughout. Since the Word-lexicon includes compounds, sequences such as /uu au t/ would be considered to occur word-internally and would therefore be excluded from the list of phoneme sequence constraints that enable the automatic detection of a word boundary from a string of phonemes. But this has the unfortunate effect that a word boundary would not be inserted in the sequence through outer, /th r uu au t @/. Since in fact we prefer word boundaries to be inserted wherever possible, all compounds were removed from the Word-lexicon, as a result of which /uu au t/ would be included as a possible phoneme sequence constraint. Consequently, we would expect a word boundary to be inserted in both through outer and throughout. 
This implies either that throughout must be stored as /th r uu # au t/ in the lexicon which the chart-parsing strategy matches against the phonemic string, or else that morphological rules must apply after the phoneme sequence constraint processor to remove the medial # in throughout.</Paragraph> <Paragraph position="1"> A similar argument applies to inflectional morpheme boundaries. For example, /n th s/ is excluded morpheme-internally but does occur across stem/inflectional suffix boundaries (months). For the reasons outlined above, morphological variants with regular inflections (plurals, present and past tense suffixes) were removed from the Word-lexicon. Excluding these inflectional morphological variants has the (undesirable) effect that a boundary will be inserted between /th/ and /s/ in three months time, /th r ii m uh n th # s t ai m/. However, some inflectional morphological rules, which apply after the phoneme sequence constraint processor, are designed to convert these boundaries into morpheme (M) boundaries (see section 4 below).</Paragraph> <Paragraph position="2"> Finally, it is also the case that many sequences that are excluded monomorphemically (e.g. /m ei sh/) can occur word-internally in derived morphological variants (/k o n f @ m ei sh @ n/, confirmation). A similar case could be made for removing derivational variants from the Word-lexicon and applying morphological rules to remove the # boundary from sequences such as /k o n f @ m # ei sh @ n/ which would result after the application of the phoneme sequence constraint processor. 
However, derivational variants were not removed, in part due to the complexity of the interaction between the inflectional and derivational morphological rules that would have to apply after word boundaries had been inserted automatically.</Paragraph> <Paragraph position="3"> Only compounds and regular morphologically inflected variants were removed from the Word-lexicon; henceforth, the resulting lexicon with such entries removed will be referred to as the Morpheme-lexicon. The Morpheme-lexicon contained around 12,000 lexical entries after these morphological variants had been removed from the 23,000-word Word-lexicon.</Paragraph> <Paragraph position="4"> All word boundary sequences, including those which account for the assimilatory processes described in 2.1, were placed in one file and the medial word boundary symbol was removed. After all duplicate entries had been removed, the resulting file was matched against the Morpheme-lexicon in order to determine which boundary sequences do not occur 'morpheme'-internally. The matching algorithm for this purpose was a UNIX shell script running on a 12 MB Masscomp: it outputs the frequency with which the word boundary sequences occur word-internally in a given lexicon.</Paragraph> <Paragraph position="5"> 2.3 The word boundary identification algorithm. All word boundary sequences which did not occur 'morpheme'-internally were compiled into a discrimination tree in which, working from left to right, common phonemes share identical branches. At the end of each branch, an instruction is included for where the boundary should be inserted if the sequence is found in an input phonemic string (Figure 4).</Paragraph> <Paragraph position="6"> [Figure 4 caption, beginning lost in extraction:] ... sequence constraints is matched against a phonemic input.</Paragraph> <Paragraph position="7"> In the case of /d b a/, for example, the boundary must be inserted after /d/, since there are no entries in the Morpheme-lexicon with final /d b/. 
However, since there are entries that both end in /dh @/ and begin with /@ d/, /dh @ d/ cannot be unambiguously parsed: in this case a '?' is inserted after the first phoneme of the word boundary sequence. /dh ? @ d/ means, therefore, that a word boundary occurs either after /dh/ or after /@/.</Paragraph> <Paragraph position="8"> For any given input phonemic string, the algorithm matches three phonemes at a time against the tree (Figure 4), working from left to right through the string. If they match, a boundary is inserted at the appropriate place. Subsequently, the fixed window of three phonemes shifts one phoneme to the right and the new sequence is matched in the same way. Thus, the matching algorithm steps through the input string one phoneme at a time with a window width of three phonemes until the end of the string is reached.</Paragraph> <Paragraph position="9"> Phonemic transcriptions (excluding stress or boundary symbols) were made by a trained phonetician of 145 sentences produced by one RP speaker. The average numbers of words per utterance and phonemes per word were 10.73 and 4.04 respectively. The sentences were taken from a 'phonemically balanced' passage constructed for the speech recognition project at Edinburgh University; sentences from Section H of the Lancaster-Oslo-Bergen corpus (Johannson, Leech and Goodluck, 1978); and sentences from a corpus of business dictation collected at CSTR. The transcribed sentences, which clearly do not contain any errors that could have arisen as a result of phonetic processing of the acoustic waveform by a speech recogniser, were input to the algorithm schematically outlined in Figure 4.</Paragraph> </Section> </Section> <Section position="4" start_page="227" end_page="227" type="metho"> <SectionTitle> 3. Results I </SectionTitle> <Paragraph position="0"> The statistics on the automatically inserted # boundaries are shown in Table II.</Paragraph> <Paragraph position="1"> [Table II residue: only a caption fragment survives: ... phonemically transcribed utterances.] 
The results show that 523/1411 (37%) of the target word boundaries were correctly detected. However, there were 69 automatically inserted # boundaries which did not correspond to word boundaries in the original utterances. Of these, 14 were incorrectly inserted because of the presence of reduced phonological forms in the utterances (e.g. /w @ dh/ for with) which we had failed to generate by rule; and 7 were inserted because some words occurred in the utterances that had not been included in the Word-lexicon (most of these were proper names). 44 # boundaries were inserted at morpheme boundaries, both in compounds (/h au # e v @/ for however) and preceding inflectional suffixes (/s ii m # z/ for seems). In the next section, some morphology rules are described which attempt to convert the # at stem/suffix boundaries in cases such as /s ii m # z/ into morpheme boundaries. Finally, 244 '?' were inserted at appropriate points (i.e. for each /P?QR/, where /PQR/ are phonemes, either /P#QR/ or /PQ#R/ occurred in the original utterances). The next section also describes rules for converting some of these '?' boundaries into definite # boundaries.</Paragraph> </Section> <Section position="5" start_page="227" end_page="228" type="metho"> <SectionTitle> 4. Method II </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="227" end_page="228" type="sub_section"> <SectionTitle> 4.1 Morphology rules </SectionTitle> <Paragraph position="0"> The phonemic strings with the word boundaries inserted by the matching algorithm in Figure 4 are input to a second stage of processing which uses four additional sources of knowledge: PHON1 and PHON2 (lists of all one- and two-phoneme words in the Morphology-lexicon) and #PP and PP# (lists of all legal word-initial and word-final two-phoneme sequences). 
Since these data are extracted from the Morphology-lexicon, they take account of phonologically reduced variants, but not the morphological variants that were excluded from the Word-lexicon.</Paragraph> <Paragraph position="1"> The morphology rules test whether the two phonemes that occur to the right of an automatically inserted # are legal with respect to PHON1, PHON2, #PP and PP#. If they are not, the assumption is made that the # occurs across a stem/inflectional morpheme boundary. Morphological rules are then applied to shift the # to the correct place, if possible. Consider, for example, the phrase boys and girls in... which, after the application of the first stage of processing, was analysed as: (2) /b oi # z a n ? g @@ l # z i n/ The insertion of the word boundaries at this first stage of processing is attributable to the fact that neither /b oi z/ nor /g @@ l z/ occurred in the Morphology-lexicon. Furthermore, since there are no words that begin with /oi z/ nor /l z/, the relevant sequences would be stored as /b oi # z/ and /@@ l # z/ in the tree in [text lost in extraction] to the right of the first # in (2): (3) If /z a/ is not in #PP rewrite /oi # z a/ as /oi M z # a/ else rewrite /oi # z a/ as /oi M? z a/.</Paragraph> <Paragraph position="2"> Informally, (3) states that if /z a/ cannot begin words (according to the Morphology-lexicon), /z/ must be an inflectional suffix of the previous word: therefore place an 'M' (morpheme boundary) before /z/ and shift the # symbol to the right of /z/. Alternatively, if /z a/ does begin words in the Morphology-lexicon, it is impossible to determine whether /z/ is a plural suffix or the first phoneme of a following word. In this case, M? is used to denote these two possibilities: it is an abbreviation for either /oi M z # a/ or /oi # z a/. In fact, since there are no words that begin with /z a/, (2) is analysed as /M z # a/. A solution with M? 
would occur if the phrase boys are were analysed at the first stage of processing as: (4) /b oi # z aa/ since in this case /z/ can also be the first phoneme of a word (czar). A test is often performed with respect to PHON1 and/or PHON2 rather than #PP. This occurs in the following example, in which two # symbols have been automatically inserted in close proximity at the first stage of processing: (5) /b i g i n # z @ # t ai p #/ (begins a type) In this case, a test is made to determine whether /z @/ occurs in PHON2 (i.e. whether it is a two-phoneme word). Since it is not, (5) is reanalysed as /b i g i n M z # @ # t ai p #/.</Paragraph> <Paragraph position="3"> The test in (3) above is only made if the phonemes to the left and right of the # meet certain structural conditions. Specifically, the test is performed in contexts such as those given in Table III.</Paragraph> </Section> </Section> <Section position="6" start_page="228" end_page="228" type="metho"> <SectionTitle> PAST TENSE </SectionTitle> <Paragraph position="0"> {p, k, f, th, s, sh} # t (tapped, missed, wished); voiced phonemes excluding /d/ # d (paved, seemed)</Paragraph> </Section> <Section position="7" start_page="228" end_page="228" type="metho"> <SectionTitle> PLURALS/PRESENT TENSE </SectionTitle> <Paragraph position="0"> {p, t, k, f, th} # s (mats, picks, meets); voiced phonemes excluding /z, zh, jh/ # z (tabs, sings) Table III: Some of the contexts in which the morphology rules apply.</Paragraph> <Section position="1" start_page="228" end_page="228" type="sub_section"> <SectionTitle> 4.2 Resolving Ambiguities </SectionTitle> <Paragraph position="0"> The four sets of data PHON1, PHON2, #PP and PP# are also used to convert some '?' symbols into definite (#) word boundaries. 
In order to resolve the hypothetical ambiguity /ABC?DEF/, for example, it is first expanded into the two possible cases it represents in (7) and (8) below: (7) ABC#DEF (8) ABCD#EF An attempt is then made to prove that either (7) or (8) is illegal (on the basis that, if (7) is illegal, ABC?DEF must correspond to the representation in (8), and vice versa). (7) can be proved illegal if (9) is true: (9) Either C is not in PHON1 and BC is not in PP#, Or D is not in PHON1 and DE is not in #PP. An informal interpretation of (9) is the following. If C is not a one-phoneme word, test whether BC is a legal two-phoneme sequence that can end words; if C is not a one-phoneme word and BC cannot end words, then (7) must be illegal. Otherwise, if (7) cannot be shown to be illegal on the basis of the phonemes that precede #, the phonemes that follow # are considered. In this case, if D is not a one-phoneme word and if DE cannot begin a word, (7) must be illegal. Otherwise, (7) cannot be shown to be illegal and so the following (similar) test is applied to (8): (10) (8) is illegal if: Either D is not in PHON1 and CD is not in PP#, Or E is not in PHON1 and EF is not in #PP.</Paragraph> <Paragraph position="1"> If neither (7) nor (8) can be proved illegal, the '?' cannot be resolved into #.</Paragraph> <Paragraph position="2"> When two '?' symbols occur in close proximity, an expansion is made into four alternatives. If three of the alternatives can be proved illegal, both '?' symbols can be resolved as definite # symbols. 
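Tests (9) and (10) for a single '?' amount to one check applied to two candidate splits. The following is a minimal sketch under stated assumptions: phonemes are strings, PHON1 is a set of phonemes, and PP# and #PP are sets of two-phoneme tuples. The function names and the toy data are hypothetical, and returning None when both readings are illegal is an assumption, since that case is not discussed in the text.

```python
def split_is_illegal(left, right, phon1, pp_final, pp_initial):
    """Apply the test in (9) to the split left # right.

    left and right are tuples of phonemes on either side of the #.
    """
    last, first = left[-1], right[0]
    # The word to the left is ruled out if its last phoneme is not a word
    # by itself and its last two phonemes cannot end a word.
    left_bad = last not in phon1 and left[-2:] not in pp_final
    # The word to the right is ruled out in the mirror-image way.
    right_bad = first not in phon1 and right[:2] not in pp_initial
    return left_bad or right_bad


def resolve_question_mark(before, after, phon1, pp_final, pp_initial):
    """Resolve ABC?DEF, with before = (A, B, C) and after = (D, E, F)."""
    early = (before, after)                    # reading (7): ABC#DEF
    late = (before + after[:1], after[1:])     # reading (8): ABCD#EF
    early_bad = split_is_illegal(*early, phon1, pp_final, pp_initial)
    late_bad = split_is_illegal(*late, phon1, pp_final, pp_initial)
    if early_bad and not late_bad:
        return late
    if late_bad and not early_bad:
        return early
    return None  # the '?' cannot be resolved into #


# Invented toy knowledge sources, for illustration only.
PHON1 = {"@"}
PP_FINAL = {("dh", "@")}
PP_INITIAL = {("g", "uh")}
print(resolve_question_mark(("m", "dh", "@"), ("g", "uh", "n"),
                            PHON1, PP_FINAL, PP_INITIAL))
```

In the toy case the late reading is ruled out because /g/ is not a one-phoneme word and /@ g/ cannot end a word, so the boundary is placed before /g/.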
For example, after the first stage of processing, measuring the gun was analysed as: (11) /m e zh r i ng # dh ? @ ? g uh n/ This expands into the following alternatives: (12) /m e zh r i ng # dh # @ # g uh n/</Paragraph> <Paragraph position="3"> (13) /m e zh r i ng # dh # @ g # uh n/ (14) /m e zh r i ng # dh @ ## g uh n/</Paragraph> <Paragraph position="4"> (15) /m e zh r i ng # dh @ # g # uh n/</Paragraph> <Paragraph position="5"> (12) and (13) must be illegal since /dh/ is not a one-phoneme word ((13) is additionally illegal since /@ g/ is not a possible two-phoneme word). (15) is illegal since /g/ is not a one-phoneme word. Therefore (14) is the only possible analysis of (11). This type of expansion into four possibilities is only made when three phonemes or fewer occur between the two '?' symbols: if more than three phonemes intervene, the result of resolving both '?' symbols together is the same as if each '?' symbol were considered separately.</Paragraph> <Paragraph position="6"> Finally, the example with two '?' symbols in (11) is extended to the general case in which n '?' symbols occur in close proximity to one another (i.e. a series of n '?' symbols with three or fewer phonemes between successive '?' symbols). These expand into 2^n alternatives. As in the example above, if 2^n - 1 alternatives can be proved illegal, all n '?' symbols can be converted to # symbols.</Paragraph> </Section> <Section position="2" start_page="228" end_page="228" type="sub_section"> <SectionTitle> 4.3 Order of rules </SectionTitle> <Paragraph position="0"> After the application of the first stage of the word boundary insertion rules, expansion rules apply in which each '?' symbol is expanded into two alternatives. The morphology rules apply to each of these expanded alternatives and at all other points in the utterance at which their structural description is met. Only after the morphology rules have applied can any of the alternatives be eliminated. 
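The expansion step can be sketched as follows. This is an illustrative sketch, not the original implementation: it assumes the string is a list of tokens in which each '?' stands for a boundary either at that position or one phoneme later, and the function name expand is hypothetical.

```python
from itertools import product


def expand(tokens):
    """Expand n '?' symbols into the 2**n alternatives they represent."""
    slots = [i for i, tok in enumerate(tokens) if tok == "?"]
    alternatives = []
    for choice in product((0, 1), repeat=len(slots)):
        out = list(tokens)
        for i, late in zip(slots, choice):
            if late:
                # Boundary one phoneme later: swap '?' with the next phoneme.
                out[i], out[i + 1] = out[i + 1], "#"
            else:
                # Boundary at the position of the '?' itself.
                out[i] = "#"
        alternatives.append(out)
    return alternatives


# The ambiguous part of 'measuring the gun', as transcribed in (11).
for alt in expand("dh ? @ ? g uh n".split()):
    print(" ".join(alt))
```

Applied to the ambiguous part of (11), the sketch produces the four readings corresponding to (12) to (15), including the reading in which both '?' symbols resolve to the same position and two # symbols coincide. Morphology rules and the elimination tests would then be applied to each alternative in turn.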
The morphology rules must apply before eliminating alternatives, otherwise some alternatives might be incorrectly eliminated. This can be illustrated with the example boys and girls which, after the first stage of processing, was analysed as /b oi # z a n ? g @@ l # z/. This expands into: (16) /b oi # z a n # g @@ l # z/ (17) /b oi # z a n g # @@ l # z/ If the elimination rules applied prior to the morphological rules, both (16) and (17) would be eliminated, since /z a/ is not in #PP (and (17) is illegal since /n g/ is not in PP#). If, on the other hand, the morphology rules apply first, (18) and (19) would be derived from (16) and (17) respectively: (18) /b oi M z # a n # g @@ l M z #/ (19) /b oi M z # a n g # @@ l M z #/ Only (19) would be eliminated, on the grounds that /n g/ is not a legal two-phoneme sequence occurring word-finally.</Paragraph> <Paragraph position="1"> A further illustration of the interaction between the expansion rules, morphological rules and elimination of alternatives is shown in (20 - 33) below. After the first stage of processing, months tie (from a sentence in a gardening manual, 'after a few months, tie in more growth') was analysed as /m uh n th ? s t ? ai/. This expands to four alternatives: [examples (20) to (33) lost in extraction] In eliminating the alternatives, a slight modification has to be made to the rules: rather than referring to two segments to the left and right of #, they refer to the two segments to the left of an M symbol (if present) and to two segments to the right of #. The segments that intervene between an M and # are ignored. The following test would therefore be made to test the legality of (29): (34) (29) is illegal if: Either /th/ is not in PHON1 and /n th/ is not in PP#, Or /t ai/ is not in PHON2. It is possible to eliminate (28) since /t/ is not in PHON1. (31), (32) and (33) can be eliminated since /th s/ does not occur in PP# (final /th s/ occurring only across a stem/inflectional suffix boundary). (29) and (30) remain, and are collapsed into one representation in (35) using the M? 
notation: (35) /m uh n th M? s t ai # i n/ The analysis shows therefore that /m uh n th ? s t ? ai/ corresponds to either months tie in or month sty in.</Paragraph> </Section> </Section> </Paper>