<?xml version="1.0" standalone="yes"?> <Paper uid="A83-1032"> <Title>APPLICATION OF THE LIBERMAN-PRINCE STRESS RULES TO COMPUTER SYNTHESIZED SPEECH</Title> <Section position="1" start_page="0" end_page="0" type="metho"> <SectionTitle> APPLICATION OF THE LIBERMAN-PRINCE STRESS RULES TO COMPUTER SYNTHESIZED SPEECH </SectionTitle> <Paragraph position="0"/> </Section> <Section position="2" start_page="0" end_page="0" type="metho"> <SectionTitle> ABSTRACT </SectionTitle> <Paragraph position="0"> Computer synthesized speech is and will continue to be an important feature of many artificially intelligent systems. Although current computer synthesized speech is intelligible, it cannot yet pass a Turing test. One avenue for improving the intelligibility of computer synthesized speech and for making it more human-like is to incorporate stress patterns on words. But to achieve this improvement, a set of stress prediction rules amenable to computer implementation is needed.</Paragraph> <Paragraph position="1"> This paper evaluates one such theory for predicting stress, that of Liberman and Prince. It first gives an overview of the theory and then discusses modifications which were necessary for computer implementation. It then describes an experiment which was performed to determine the model's strengths and shortcomings. The paper concludes with the results of that study.</Paragraph> </Section> <Section position="3" start_page="0" end_page="192" type="metho"> <SectionTitle> I INTRODUCTION </SectionTitle> <Paragraph position="0"> Since speech is such an important component of human activities, it is essential that it be included in computer systems simulating human behavior or performing human tasks. Advantages of interacting with a computer system capable of speech include the following: a) special equipment (e.g. 
a terminal) is unnecessary for receiving output from the device.</Paragraph> <Paragraph position="1"> b) the output may be communicated to several people simultaneously.</Paragraph> <Paragraph position="2"> c) it may be used to gain someone's attention. d) it is useful in communicating information in an emergency.</Paragraph> <Paragraph position="3"> *Current address: Bell Laboratories, Indianapolis, Indiana 46219.</Paragraph> <Paragraph position="4"> The primary methods for generating computer synthesized speech are 1) to use a lexicon of word pronunciations and then assemble a message from these stored words or 2) to use a letter-to-sound translator. A shortcoming common to both methods, and of interest to linguists and more recently computer scientists, is the absence of English prosody from computer synthesized speech (see, e.g., Klatt [6], Lehiste [8], Witten et al. [11] and Hill [5]). Of the three primary components of English prosody, this paper considers only stress (the other two are intonation and pause). It applies the theory for stress prediction proposed by linguists Mark Liberman and Alan Prince [9] to computer synthesized speech. Their theory was chosen primarily because it has received widespread attention since its introduction (see Paradis [10], Yip [12], Fujimura [3 and 4] and Basboll [2]).</Paragraph> </Section> <Section position="4" start_page="192" end_page="194" type="metho"> <SectionTitle> II THE LIBERMAN-PRINCE MODEL </SectionTitle> <Paragraph position="0"> In addition to the attention it received, the Liberman-Prince model [9] (hereafter referred to as the LP model) is attractive for computer application for two other reasons. 
First, the majority of its rules can be applied without knowledge of the lexical category (part of speech) of the word being processed, since the rules are based only on the sequences and attributes of letters in a word.</Paragraph> <Paragraph position="1"> This feature is especially important in an unrestricted text-to-speech translation system.</Paragraph> <Paragraph position="2"> Secondly, since the metrical trees that define the prominence relations are a common data structure, a computer model may be designed which remains very close to the foundations and intentions of the theoretical model.</Paragraph> <Paragraph position="3"> This section will summarize the LP theory as presented in [9]. The LP method of predicting stress focuses on two attributes of vowels: + or - long and + or - low. The ~ of b~e is +long while the PS of ~ is -long. Each of the vowels has both a + and a - long pronunciation. For example: state, sat, pint, pin, snow, pot, cute, and cup.</Paragraph> <Paragraph position="4"> The attribute + or - low is named for the height of the tongue in the mouth during articulation of the sound (see Figure 1). During production of a +low vowel, the tongue is low in the mouth, while it is high for a -low vowel. Speaking aloud the words in the figure demonstrates this difference.</Paragraph> <Paragraph position="5"> TABLE I. Examples of the ESR.</Paragraph> <Paragraph position="7"> Figure 1 (axis labels: front, back). The relative position of the highest points of the tongue in the vowels in 1 heed, 2 hid, 3 head, 4 had, 5 father, 6 good, 7 food [7].</Paragraph> <Paragraph position="8"> (a) America, canonical, Everest, asparagus, polygamous, elephant. (b) aroma, Carddna, hormonal, horizon, desirous, adjacent. (c) defective, referendum, amalgam, erector, anarthrous, Charybdis. (d) negate, repute, erode, balloon, ballyhoo, exploit. Stress is not inherent to vowels in isolation but is present only within words. 
Stress of a vowel phoneme within a word is a relative quality that is noticeable only by contrast with surrounding phonemes. Consonant phonemes may also be defined in terms of several different attributes, but within this theory their main purpose is to combine with vowels to complete the syllable structure of the words.</Paragraph> <Paragraph position="9"> In English, each syllable of a word must contain at least one vowel. A syllable can be a single vowel, rode-o; it may be an open syllable with the vowel at a syllable boundary, po-lice, ar-ticulate; or it may be a closed syllable with the vowel surrounded by consonants, Mon-tana. The term 'vowel' in this context means vowel phoneme and not orthographic vowel; the same is true for consonants. The th in thing is considered a single consonant phoneme.</Paragraph> <Paragraph position="10"> The LP model defines context-sensitive rules that can be used to predict which vowels within a word should be stressed. The three rule types are: 1) English Stress Rule and the Stress Retraction Rule - ESR and SRR, 2) English Destressing Rule - EDR, and 3) Exceptionless Vowel Lengthening Rule - EVL. As the names imply, the first and second rules deal with assignment of + or - stress, while the third predicts which vowels should be long. All three rules operate within a word from right to left. In the first stage, the shape of the penultimate (next-to-last) syllable determines the assignment of the +stress attribute using the ESR. &quot;If the penultimate vowel is short and followed by (at most) one consonant, then stress falls on the preceding syllable,&quot; [9] as in Table 1(a). &quot;If the penultimate vowel is long [Table 1(b)] or followed by two or more consonants [Table 1(c)] then it must bear stress itself.&quot; [9] Each of the previous statements assumes the final vowel is short. The fourth case of the ESR says that if the final vowel is long then it must bear stress, Table 1(d). 
(See [9] for exceptions to this first stage.) In the second stage, the +stress attribute is assigned based on the position of the leftmost +stress vowel in the word. Since the rule retracts stress across the word, it is called the Stress Retraction Rule (SRR).</Paragraph> <Paragraph position="11"> The ESR and SRR mark certain vowels to be stressed; this, however, does not imply that when the word is spoken, each of the vowels will be stressed. There are instances, depending on the characteristics of the word, where vowels will lose their stress through the application of the English Destressing Rule (EDR).</Paragraph> <Paragraph position="12"> The EDR depends on the notion of metrical trees, whose purpose is to give an alternating rhythm to the syllables of a word and define the relative prominence of each syllable within the word. Rhythm is reflected by the assignment of the attribute s, strong, to stressed syllables and w, weak, to unstressed syllables. For the words labor, caprice, and Pamela the trees are simple (see Figure 2). The first rule in building the tree is: if the vowel is -stress then its attribute is w; if the vowel is +stress then it may be s or w. The root node of any independent subtree or the root node of the final tree is not labeled.</Paragraph> <Paragraph position="13"> The s w labeling defines a contrast between two adjacent components of a word; therefore, a solitary s or w would have no meaning.</Paragraph> <Paragraph position="14"> Each time a +stress is assigned by either the ESR or the SRR, an attempt is made to add to the tree. As in the word labor, a node is added to the tree and the vowels are marked s or w according to their stress markings, + or -. Next, any unattached vowels to the right of the new node are added, as with Pamela. This builds a series of binary subtrees that are necessarily left branching (see Figure 3). There are some situations where nothing can be added to the tree after the assignment of +stress. 
Such words cause a rephrasing of the second step above to become: next attach any vowels to the right of the present vowel that have not been attached during the operation of a previous rule.</Paragraph> <Paragraph position="15"> These two steps allow trees such as those in Figure 4 to be formed. Two questions remain. How is the tree completed? How are the s, w relations defined above the vowel level? To answer the first question: after all unattached vowels to the right have been attached into a left branching subtree, this subtree is joined to the highest node of the subtree immediately to the right, if it exists (see Figure 5). The s, w assignment is made by the Lexical Category Prominence Rule (LCPR). In its simplest form it states: In the configuration [N1, N2] within a lexical category, N2 is s if and only if it branches. The LCPR has already been used in the stress assignments of teleological, Pamela, and execute, to connect unattached vowels to the right of the + - sequences. The LCPR also follows the convention that no -stress vowel is assigned s. To ensure that all vowels are included in the tree, one final step is necessary, as illustrated by the word Monongahela. Following the rules as previously outlined will generate a stress assignment and tree such as that in Figure 6(a). The first vowel must be included in the tree to produce Figure 6(b). This is done as the last stage of tree building. The LCPR is used in this case to join the vowel and the tree structure and to</Paragraph> <Paragraph position="17"> The English Destressing Rule (EDR) is used to determine which vowels should be reduced. Generally two things happen when a vowel is reduced.</Paragraph> <Paragraph position="18"> First, it will lose its +stress attribute and secondly, the vowel sound will be reduced to a schwa (an indeterminate sound in many unstressed syllables, e.g. the leading A in America). 
The rule is based on the tree prominence relations of the metrical trees, and is restricted to operating on only those vowels that have been marked +stress by either the ESR or SRR (see [9]).</Paragraph> <Paragraph position="19"> Finally, the Exceptionless Vowel Lengthening Rule (see [9]) is applied to handle apparent exceptions in the operation of the ESR, e.g. words such as alien, simultaneous, radium and labia, which contain a vowel sequence preceding the vowel to be stressed.</Paragraph> <Paragraph position="20"> III IMPLEMENTATION Converting a theoretical model such as that proposed by LP into a computerized implementation poses problems. One concern is whether the rules and definitions of the theory are well suited to a computer implementation or whether they must be transformed to such an extent that they no longer resemble the originals. Fortunately, the LP theory is expressed in rules and definitions that easily lend themselves to an implementation.</Paragraph> <Paragraph position="21"> Overcoming other problems while remaining close to the LP theory involves a careful combination of three factors. First, certain modifications must be made to the application of the rules for locating the +stress attribute and building metrical trees. Second, several assumptions must be made about the exact definitions of terms such as VOWEL and CONSONANT. Third, some of the rules which are too general must be restricted.</Paragraph> <Paragraph position="22"> None of these modifications causes a drastic reshaping of the model.</Paragraph> <Paragraph position="23"> Three outcomes exist for a word being processed by such a system. One, the stress pattern of the word will be correctly predicted. Two, the stress pattern of the word will be incorrectly predicted. Three, the word will drop through without the system being able to predict any stress. 
Any modifications, assumptions or restrictions imposed should be made with the primary intent of reducing the number of words for which an incorrect stress pattern is predicted, even if this means increasing the number of words which drop through.</Paragraph> <Paragraph position="24"> One modification was to use a phonetic translation of the word instead of its standard spelling. This meant working from an underlying representation rather than the surface representation. By working from the underlying representation, the attributes + or - stress and + or - low could be differentiated from the phonetic alphabet character directly, because a +long vowel and a -long vowel would be represented by two different characters in the phonetic alphabet. Four immediate results occur from making this modification. First, single consonant sounds such as the th in thing are represented by a single character. However, the same is not true for diphthongs. Both IPA symbols and VOTRAX codes (a VOTRAX ML-I speech synthesizer was used to output the results of the stress prediction) for diphthongs are multiple-character codes. Second, in a phonetic translation all reduced vowels are already reduced. Therefore, for the most part, the EDR is of little value. It only retains its usefulness for initial syllables that are not stressed but whose vowel is not schwa.</Paragraph> <Paragraph position="25"> Such a syllable will draw stress by the SRR, creating a situation for the EDR to apply. Third, the ESR and SRR also operate less freely because they will not apply stress to a schwa. Fourth, a new rule is required to operate in conjunction with the EVL.</Paragraph> <Paragraph position="26"> This rule must give a final +long vowel, such as the y in story, the -long attribute so that the ESR can correctly assign stress.</Paragraph> <Paragraph position="27"> A second change was that the SRR could be applied in accordance with the principle of disjunctive ordering. 
This situation results from the fact that a translator system has no lexicon.</Paragraph> <Paragraph position="28"> Although the words therefore cannot be marked for a particular type of stress retraction (SRR), this does not cause a major problem.</Paragraph> <Paragraph position="29"> One implication of these modifications is the sequential ordering of the rules, which group words into classes based solely on the characteristics of their phonetic translation. Therefore any set of stress rules should be organized in terms of a 'best fit' mode of application. Secondly, the stress rules cannot be defined in a way that can differentiate syllable boundaries, so no rule can be based on the concept of a 'light' or 'heavy' syllable. Although the stress rule input form does allow an affix option, it should be kept in mind that the en of enforce is considered a prefix as well as the En of English. Finally, there can be no distinction between words based on the word stem or the word origin, except, in the case of word origin, if it can be defined in terms of a distinct affix. For example, the Greek prefix hetero in heterodox, heteronym, or heterosexual is a candidate for long retraction by the SRR.</Paragraph> <Paragraph position="30"> Although the application model is a modified version of the LP model, it still operates in keeping with their original intent.</Paragraph> </Section> </Paper>