<?xml version="1.0" standalone="yes"?> <Paper uid="J79-1057"> <Title>Strong First Syllable Rule - A ;</Title> <Section position="8" start_page="34" end_page="34" type="concl"> <SectionTitle> IV Reliability </SectionTitle> <Paragraph position="0"> Two studies have been made to determine the accuracy of phonological and stress placement rules and to select a minimal set of rules which will produce accurate results in as many cases as possible.</Paragraph> <Paragraph position="1"> The set of letter-to-phoneme rules used in the first testing procedure contained 534 rules: included were 127 consonant rules, 46 prefix rules (giving pronunciations for 40 prefixes), 155 suffix rules (covering 96 suffixes) and 206 vowel rules. The Trisyllabic Shortening Rule was included in the set of stress rules. A sample of 4,725 words from the Third International Dictionary, either preferred or alternate. Of the 2,375 4- or 5-letter words which received acceptable pronunciations, 2,135 were given preferred pronunciations, 228 were given alternate pronunciations and 12 received the verbal pronunciation of noun/verb pairs. A table of frequency of use and statistical accuracy of each rule was derived from this study. These results led to the removal of the Trisyllabic Shortening Rule and to the formulation of eight sets of phonological rules ranging from a maximal set of 557 rules to a minimal set of 277 rules.</Paragraph> <Paragraph position="2"> In the second study, these eight sets of rules were each applied to a new group of test words which was composed of a random sampling of six-letter words from the Brown Corpus (250 words), the Heritage English Dictionary (150 words) and Stedman's Medical Dictionary (100 words). 
Results of this study are as follows: Number of Rules; Percentage given acceptable pronunciation. Note: The addition of special medical prefixes would increase the accuracy of rules applied to the sample from Stedman's Medical Dictionary by approximately ten per cent.</Paragraph> <Paragraph position="3"> The set of rules currently being used in the text-to-speech system is the set containing 413 rules. A list of the maximal set of 557 rules together with instructions for extracting the other sets of rules is given in the appendix.</Paragraph> <Paragraph position="4"> There are a number of problem areas, many of which derive from the lack of a lexicon. Problems of this type include incorrect suffix or prefix recognition and the treatment of compounds as single morphs.</Paragraph> <Paragraph position="5"> Some examples from each problem area are given below: The pronunciations of the underlined vowels in the contexts above are encountered infrequently, and, in most cases, are not predictable. In the word international, the context which determines the pronunciation of [a] is the right-hand context [c+~v]. A long vowel almost always is found in this context, as in nation, station, explanation, observational, sensational. A short [e] is usually found preceding [ic], e.g., malefic, angelic, systemic, photogenic, and is long only in a few words, e.g., strategic, scenic, and the suffix -plegic. There are very few words ending in the vowel [u]; most are either low-frequency words or proper names. The palatalization in menu is not found in other words with final [u], e.g., flu, emu, gnu, impromptu. The word two is very irregular in pronunciation. Most words ending in [o], such as no, so, calico, echo, have the sound /o/. It may be noted, however, that two other words which, like two, are very high-frequency words have the same pronunciation of final [o] as two, i.e., do, to. 
The mispronunciation of the [e] in modeled is due to the assumption, lacking a lexicon, that the morphemic analysis is model + ed.</Paragraph> <Paragraph position="6"> Mispronunciation of vowel digraph: The reasons for mispronunciation of the vowel digraphs underlined above fall into a number of categories. There are very high-frequency words, said, should and would, which do not follow letter-to-sound rules. Said may be contrasted with the words laid, maid, paid, and raid; the words should and would contrast with mould, shoulder and boulder. The sequence [eir] as in theirs, heir, weir is not found frequently in English, nor is the sequence [feit] as in forfeit, surfeit and counterfeit. Rules for [ei] in these two contexts were considered unproductive. Final [oe] in English is usually pronounced as in oboe, toe and foe; the pronunciation found in shoe, and also in canoe, is rare. Rules governing the pronunciation of [ow] (endowed) and [ui] (guitars) are statistically based. Although there are many words in which similar non-context-dependent pronunciations occur, e.g., cow, allow, eyebrow and build, guilt, guinea, other pronunciations are statistically more likely, e.g., those found in shadow, glow, follow and bruise, juice, nuisance. The pronunciation of break is not predictable: the word steak has the same digraph pronunciation, but other similar words such as creak, freak and streak are pronounced like the majority of words containing the digraph [ea]. Mispronunciation of single consonant: of - /f/; corps - /p/; eager - /j/; exhaust - /h/; two - /w/; physiological - /s/; deserts - /s/; schizophrenic - /z/. The consonants underlined above are either silent or have unpredictable or unusual pronunciations. Silent consonants are found in two, corps and exhaust. The word two is a high-frequency word in which both the [w] and [o] have unusual pronunciations. Silent [w] is rare, although it is also found in the word sword. Final silent [p], as found in corps, is also rare. 
(This word is considered in this section because the pronunciation of both the [r] and the [p] are determined by rules for single consonants.) There are a few words, like exhaust, in which [h] is silent following [x], e.g., exhibit, exhilarate, exhort and exhume. However, this rule is not sufficiently productive to merit inclusion.</Paragraph> <Paragraph position="7"> The letter [g] preceding [e], [i] and [y] in English usually has a soft sound as in integer and wager. In particular, many words ending in [ger] are a combination of a root with final [e] and the suffix [er], e.g., manager, merger, all of which have a soft [g] sound. The pronunciation of the [g] in eager is unusual and not predictable. Another pronunciation which is frequently unpredictable, i.e., not context-dependent, is that of the letter [s] between vowels. The rule for this context predicts the more frequent sound /s/, whereas the sound /z/ is found in deserts and physiological. The letter [z] in schizophrenic has the rare pronunciation /ts/, and the word of, as previously discussed, is the only English word in which a final [f] is pronounced /v/. Consonant clusters are infrequently mispronounced. The cluster [ch] is the most frequent problem in this category, its pronunciation being determined, in many cases, by the Greek or Latin origin of the word in which it appears. The pronunciation /ʃ/ as in chef and cliches is less frequent than either /č/, e.g., church, or /k/, e.g., chemical. Morph-final [gh] may be pronounced either /f/ as in laugh, enough and cough or, with slightly higher probability, not pronounced, as in high, weigh and dough.</Paragraph> <Paragraph position="8"> Unusual and rare pronunciations of the clusters ts, ss, ld and lf are found in other words above. The pronunciation of [ss] as /ʃ/ is found preceding certain suffixes, e.g., depression, fissure, but rarely within a morph (tissue, above). 
Russian orthography is still reflected in the English spelling of tsar even though the pronunciation has been Anglicized. A silent [l] appears in could, would and calf. The words could and would are high-frequency words and also differ from regular pronunciation in the vowel digraph [ou]. Although half, like calf, also has a silent [l], in most words a final [lf] is pronounced /lf/, e.g., elf, shelf, self, gulf.</Paragraph> <Paragraph position="9"> The high-frequency-word pronunciation of morph-initial [th] as /ð/, e.g., these, then, the, has been discussed previously. Almost all problems in this category arise from the lack of a morph lexicon. Words are pronounced incorrectly because letter strings in a root which appear to be suffixes are converted to phonemes using rules for suffix pronunciation. It may be seen in the examples above that a mistake in morph analysis can cause obvious errors in pronunciation. There are many technical prefixes which have not been included in the prefix list.</Paragraph> <Paragraph position="10"> These may be added by a user with particular technical needs. A few prefixes such as [a] in apart and [e] in eject have not been included because a high error rate would result, i.e., all words beginning in a or e would be incorrectly analyzed. In the remaining cases, prefixes were incorrectly analyzed as part of a root after suffixes were incorrectly removed. Errors in pronunciation, and particularly in stress, are the result. Incorrect stress: Most of the words in this category have unusual stress patterns which are unpredictable. 
A comparison with similar words shows the regular stress pattern: erratic, fanatic, aromatic; brigade, serenade, marinade. The word selects is stressed incorrectly due to lack of information concerning its part of speech (cf. discussion of modifications in the Main Stress Rule).</Paragraph> <Paragraph position="11"> The results of this study indicate that the letter-to-phoneme system is quite powerful, even in isolation. When considered in the domain of the over-all text-to-speech system in which a lexicon is available for high-frequency words and compounds, the letter-to-phoneme system should be highly reliable.</Paragraph> </Section> </Paper>