<?xml version="1.0" standalone="yes"?> <Paper uid="C67-1018"> <Title>WITH CONSIDERABLE SUPPORT AND ENTHUSIASM FROM THE PROVINCIAL GOVERNMENTS, AND FROM THE NATIONAL AND LOCAL VOLUNTARY WELFARE AGENCIES. ONE MILLION DOLLARS</Title> <Section position="1" start_page="0" end_page="0" type="metho"> <SectionTitle> SEGMENTING NATURAL LANGUAGE BY ARTICULATORY FEATURES </SectionTitle> <Paragraph position="0"> ENGLAND.</Paragraph> <Paragraph position="1"> 1. For many purposes it is necessary to segment text into units convenient for handling. The sentence has been generally accepted as the natural unit, since there was no obvious alternative other than the word - which by itself tells us too little - or the paragraph - which is a vague and shifting unit, unless redefined. But the sentence is not satisfactory either: it is very variable in length; studies of speech show that in its conventional form it is not always recognizably present¹; it may depend semantically upon its context up to at least paragraph length; and in any case what constitutes a sentence is not consistently defined (Fries² indicates more than 200 definitions).</Paragraph> <Paragraph position="2"> 2. There is another way of segmenting text, which does not suffer from these limitations, being based upon the rhythmical features of articulated speech.</Paragraph> <Paragraph position="3"> This use of the term &quot;articulated&quot; results from a view of language as basically speech, that is as skilled bodily movement. We have found it possible to bridge the gap between spoken language and written language by using features which both the writer and the reader of language tend to adopt from speech.</Paragraph> <Paragraph position="4"> 3. 
Studies of spoken language, particularly in relation to foreign language teaching, show agreement on at least the terminal boundary of the &quot;tone group&quot;, which Crystal &amp; Quirk³ call &quot;the most striking prosodic unit in English speech&quot;, and on which they have found experimentally a high rate of agreement by informants. Many different teaching books* exemplify this agreed feature, despite the lack of satisfactory instrumental evidence on continuous speech (into which research is now being planned).</Paragraph> <Paragraph position="5"> 4. Less agreement is found on the configuration of the whole unit which terminates in the &quot;nucleus&quot;. Some authors refer to &quot;tone groups&quot; or &quot;tone units&quot;, some to &quot;sense groups&quot;, some use both terms: this overlapping category of tone and sense suggested a field for further study, which has been proceeding at C.L.R.U. for some time. Syntax is not usually brought into the treatment of this subject, since the approach is phonological; but among the authors</Paragraph> <Paragraph position="6"> * Work supported by Canadian National Research Council.</Paragraph> <Paragraph position="7"> referred to⁴, MacCarthy does indicate that syntactic criteria determine the structure of his &quot;intonation groups&quot;. Our studies support the work of those who suggest that what is commonly called &quot;stress&quot; has a semantic function⁵, and what can be analysed in terms of intonation is the syntactic feature - a kind of audible syntactic bracketing.</Paragraph> <Paragraph position="8"> 5. It is common practice in the teaching of English as a foreign language (see Baird⁷) to use tone groups of two stresses (head and nucleus) as examples, but this configuration is not usually formalized. In my own use of such drill material for the foreign learner, I have for many years adopted this unit, marked it with a musical phrase-mark, and called it, since my 1954 publication⁸, a &quot;phrasing&quot;. 
My drill use of this unit gives a minimal context of not less than one sentence - a sentence being segmentable into one or more phrasings, the phrasing being thus a unit between the word and the sentence but not necessarily coterminous with the clause or grammatical phrase. (The musical analogy shows phrasing as a category distinct from the note, the bar, and the section.) 6. Ten years after publication of these drills, my work was called upon by Margaret Masterman⁹ in relation to her own semantic approach, for which the two stress-points of the phrasing were seen to correspond to two information points. In the meantime I had been led by teaching experience to consider the difficulty of foreign learners with adequate vocabulary and adequate syntax but no adequate speech-experience of English. They were unable to read a piece of current English (e.g. a &quot;Times&quot; leading article) with understanding, whereas the native English reader, even if momentarily puzzled by perhaps a hastily-worded sentence, would immediately feed back into his reading of it (i.e. &quot;in his mind's ear&quot;) the natural speech form (i.e. the phrasing) with which the writer had written it.</Paragraph> <Paragraph position="9"> 7. From this the conception of &quot;stress-point&quot; became differentiated from precise syllabic location of stress (which is itself a complex of amplitude, frequency, and duration) and was defined as the word or words centred, in stress-and-tone prominence, on the nuclear tone, and the word or words centred (in the same sense of &quot;prominence&quot;) on that head tone which predominates above any other head or heads which might follow the preceding nucleus.</Paragraph> <Paragraph position="11"> This method of dealing with tone groups which apparently have more than one head proves to be operationally satisfactory. 
It gives us a consistent phrasing of two beats, the second of which consists, in certain cases, of a &quot;silent stress&quot; (a phenomenon vouched for by many phoneticians¹⁰). It also helps to meet the difficulty of differently timed languages, referred to in para. 13 below.</Paragraph> <Paragraph position="12"> 8. It follows from the treatment of stress-points indicated in para. 7 above, that spread stress will occur in regular compounds, such as &quot;semi+readiness&quot;, and it also occurs very frequently in cases of a noun with its qualifier, whether true adjective or noun acting as adjective, e.g. &quot;political+requirements&quot;, or &quot;staff+planning&quot;, and in general where we find intimately associated words on which the stress falls with virtually equal emphasis.</Paragraph> <Paragraph position="13"> 9. The silent beat may or may not be a perceptible pause, but tends to occur in certain typical locations, e.g. where some expression of significant semantic content is about to follow. It would also be possible in many cases to imagine the phrasing re-written using relevant syllables instead of the silent beat, e.g.</Paragraph> <Paragraph position="14"> &quot;in a review of progress&quot; instead of &quot;in a review ( )&quot;.</Paragraph> <Paragraph position="15"> In marking phrasings on text two symbols are used in addition to the + sign for spread stress and the ( ) sign for silent beat. They are the well-known tonetic marker ` (originally representing a high falling tone) used for the nuclear stress, and the stress-mark ' used for the head stress. These may also be referred to as primary and secondary stress-points, the nucleus being primary because in general it indicates the topic of the utterance and the head being secondary because in general it indicates the comment. 
Thus reading down all the nuclear stress-points of a text printed as a series of phrasings one below the other, we have an index of the topic of the whole text.</Paragraph> <Paragraph position="16"> 10. A piece of text reading &quot;Politically Canada is divided into ten provinces and two territories&quot; can be phrased-up either as</Paragraph> <Paragraph position="17">
`Politically ( )
'Canada is `divided
into 'ten `provinces
and 'two `territories
or as `Politically ( ) 'Canada is `divided into 'ten `provinces and 'two `territories.</Paragraph> <Paragraph position="18"> The &quot;quatrain&quot; form into which this falls proves to be very frequent, particularly at the beginning of a passage. This passage continues in two more quatrains:
'Each+province is `sovereign
in its 'own `sphere
and 'administers its `own
'natural `resources,
and upon 'such `resources
as 'related to `topography,
'position and `climate
is 'based the `economy/of/the/province.</Paragraph> <Paragraph position="19"> A straightforward text of this kind offers, if not a word-for-word, at least something like a phrasing-for-phrasing possibility in translation. But the translation correspondence, for French for example, is often not direct but expanded (e.g. 2 or more French for 1 English), or transposed in order. Apart from these considerations, there are many cases in which the phrasing structure resolves syntactic or semantic uncertainty. Here is a case where the lack of such a means of segmentation led to a serious mistranslation:
It 'may be `assumed
that an 'international `force
on a 'standby `basis
will 'take+shape as a `development
out of 'practice which has already `begun.</Paragraph> <Paragraph position="20"> The published translation has turned the last two lines into &quot;prendra une forme assez singulière, ce qu'elle a déjà commencé à faire&quot; (i.e. &quot;will take a rather singular form, which it has already begun to do&quot;).</Paragraph> <Paragraph position="21"> 11. 
Passages of text in various styles and of various lengths have been analysed by hand, and show a consistent tendency for this rhythm to be found. There may be physiological reasons for this. Neurological studies* show persistence of tone and rhythm in cases where normal articulation is impaired¹¹. Good reasons for this rhythm to be binary include the fact that the</Paragraph> <Paragraph position="22"> * For neurological literature I am indebted to Dr. Violet MacDermot.</Paragraph> <Paragraph position="24"> rhythm of the mother's heart-beat is present even to the unborn child, and the in/out rhythm of respiration and the left/right rhythm of walking are basic to human life in general. Studies in articulatory phonetics support the belief that some form of kinaesthetic activity is involved in silent reading, as well as in listening to live speech, which is why we can legitimately refer to &quot;the rhythm of the prose&quot; in spite of the lack, up to the present, of acoustic instrumental documentation of this.</Paragraph> <Paragraph position="25"> 12. Though intonation supplies the contour on which the phrasing is founded, the rhythm of stress is the more essential factor. As Tibbitts¹² says: &quot;The correct basic stressing is mandatory while the intonation is variable within as yet undefined limits&quot;. This is the reason why the phrasing hypothesis is unaffected by differences of dialect or accent. The question of isochronicity in English prose has a literature stretching back to Joshua Steele in 1775, through Coventry Patmore in 1856, and on to its thorough experimental (though not instrumental) examination by André Classe in 1939 and discussion by Abercrombie in 1951. There is evidence for at least a strong tendency towards a normal regular periodicity of stress-points. 
Our observations suggest that a speaker tends to select and order his words so as to distribute them about these pulsations of stress in such a way that points of emphasis fall naturally upon them.</Paragraph> <Paragraph position="26"> 13. The question of whether the phrasing can be equally well observed in languages other than English is not included in the present paper, except by the observation that when parallel texts in English and French are analysed in this way, the French equivalent of the English phrasing, as clearly delimited by the French nuclear tone (and notwithstanding the difference between a syllable-timed and a stress-timed language), supplies a form of &quot;translation unit&quot; with a measurable rate of correspondence with the English.</Paragraph> <Paragraph position="27"> 13. Examination of given phrasings in a text of 377 phrasings, followed by another of over 900 phrasings, led Dolby¹⁹ to say: &quot;Phrasing length, as measured by the number of syllables, appears to be a reasonably behaved statistic when viewed in isolation with routine statistical tools&quot;. (See Appendix I) 14. A method of observing the phonological configuration of phrasings is to turn written text into spoken prose on magnetic tape, pass this through a suitable pitch detector and intensity detector (such as that of</Paragraph> <Paragraph position="28"> the University of Grenoble or the University of Copenhagen), and record the result on mingograph scrolls. Research now being started at C.L.R.U. is comparing the output of these two sets of apparatus with that of apparatus developed in England, with a view to finding the best selection of acoustic data by which to observe the terminal point of the phrasing (frequently a steep fall or rise in pitch), and the two stress-points as peaks of frequency-plus-amplitude-plus-duration. 15. 
An extension of the usefulness of this unit of segmentation can be seen in algorithmic production by computer of a form of phrasing, based on observation of the criteria used in making articulatory phrasings.</Paragraph> <Paragraph position="29"> This has been done at C.L.R.U. by J.E. Dobson²⁰ in a form which, while not in every single case identical with hand-marked phrasings, nevertheless provides a new and operational segmentation of continuous text.</Paragraph> <Paragraph position="30"> As part of the work done under contract to the National Research Council of Canada, this programme is now being applied to the phrasing of a text of 20,000 words from the Canada Year Book of 1962.</Paragraph> <Paragraph position="31"> 16. The normal rhythmical stress can also be provided algorithmically. This makes possible a computerized ordering of the phrasings of a text alphabetically according to four different valuations, i.e.</Paragraph> <Paragraph position="32"> (i) the primary (= nuclear) stress; (ii) the secondary (= head) stress; (iii) pendants (= unstressed strings attached) to primary stress; (iv) pendants (= unstressed strings attached) to secondary stress.</Paragraph> <Paragraph position="33"> This gives a semantic concordance (called SEMCO) from which statistical and other information can be derived. The computer can process text in this way as it could not do using the sentence as a unit, and both more economically and with more information than it could by merely cutting the text into lines of the length of the computer print-out.</Paragraph> <Paragraph position="34"> 17. The patterning of stressed and unstressed words, i.e. of stress-points and unstressed words, can be expressed as a calculus of ordered pairs, on which research is proceeding.</Paragraph> <Paragraph position="36"/> </Section> </Paper>