ANATOMICAL AND SPECTROGRAPHIC ANALYSIS OF THE VOICE IN DISEASE: A REPORT OF FIVE CASES

W.A. KERR, M.R.C.S., D.L.O., Department of Otorhinolaryngology, Johannesburg Hospital (Head: D.R. Haynes)
and
L.W. LANHAM, PH.D., Head, Department of Phonetics & General Linguistics, University of the Witwatersrand, Johannesburg

SUMMARY

Five cases are presented. One is a case of ventricular phonation of iatrogenic origin and the remaining four had undergone laryngectomy for carcinoma of the larynx. Points of interest are discussed, particularly the constant ventricular fold phonation of the first case and the clear harmonic structure present in the voice of one of the laryngectomy cases who has both esophageal speech and pharyngeal phonation.

OPSOMMING (Afrikaans summary, translated)

Five cases are presented. One is a case of ventricular phonation of iatrogenic origin, while the remaining four cases underwent laryngectomy for carcinoma of the larynx. Aspects of interest are discussed, especially the presence of harmonic structure in the voice of the first case with ventricular phonation, as well as in the voice of one of the laryngectomy cases who uses both esophageal and pharyngeal phonation.

The authors are aware that much has been written on this subject and that a comprehensive reference to the many authors would be impossible in this article. They have, however, recorded some publications in the list of references and acknowledge the work of such authorities as: Negus, C. and C.L. Jackson, Kallen, Bateman, Kirchner, Arnold, Huizinga, van den Berg, Moore, Norris, Conley, Tapia, Tato, Moolenaar Bijl and the Bell Telephone Company. This paper comprises two sections: first (Part I), the discussion of the spectrographic evidence; second (Part II), the anatomical analysis and case histories.
Editor's note: We gratefully acknowledge a grant-in-aid for the publication of this article from The National Cancer Association of South Africa.

Tydskrif van die Suid-Afrikaanse Vereniging vir Spraak- en Gehoorheelkunde, Vol. 20, Desember 1973
Journal of the South African Speech and Hearing Association, Vol. 20, December 1973
Reproduced by Sabinet Gateway under licence granted by the Publisher (dated 2012)

It is hoped that the association of the disciplines of Acoustic Phonetics and Otorhinolaryngology, as well as the method of presentation, may be of interest in South Africa; and, if so, this report will be a fitting tribute to Professor P. de V. Pienaar, who has worked for many years in this country in Speech Therapy on the foundations of Phonetics.

PART I

THE PHYSICAL ANALYSIS OF THE DIFFERENT "VOICES"

"Voice" refers to the manner in which the upper vocal tract is made to function as a resonating system bringing out the distinctive qualities of vowels and vowel-like speech sounds (i.e. oral and nasal resonants such as l and r, m and n), whose qualities depend on the resonances of the vocal tract. In this section of the paper the acoustic properties of the different voices are analysed in an attempt to correlate this analysis with states of the esophageal sphincter, the pharynx and the organs of normal speech. The acoustic properties are extracted by means of a spectrographic analysis. The measurement of pharyngeal pressure is resorted to in one case in order to confirm pharyngeal constriction.
In their physical properties the voices of the five cases each differ in their own way. They are examined in the following order: first the case of ventricular phonation, then the four cases of esophageal speech, including Case D, an interesting case who has two distinct "voices". The aim of this analysis is to identify features which may contribute to a classificatory framework of voice without vocal folds, and to an understanding of some of the compensatory mechanisms involved in producing intelligible speech.

SPECTROGRAPHIC ANALYSIS OF SPEECH SOUNDS

The Kay Sonagraph Model 6061-B was used to provide spectrograms showing the acoustic properties of the speech sounds made by our five cases. The spectrograms show frequency on the vertical axis from 0 to 8 kHz, amplitude in varying degrees of darkness in the lines and smudges made by the stylus of the Sonagraph, and frequency/amplitude changes in time on the horizontal axis (12.33 cm = 1 sec.). The Sonagraph provides fine-grained analysis on the vertical axis in a narrow-band display, in which the instrument registers conflated intensity in bandwidths of 45 cps; frequency components lying close together are therefore registered separately. The wide-band display conflates intensity over bandwidths of 300 cps and provides fine-grained analysis in the time dimension. The analysis of variations in time in narrow-band spectrograms is coarse, as is that of variations in frequency in wide-band displays.

In the analysis of speech sounds which depend on the glottal note, a narrow-band spectrogram highlights harmonic structure. The spectrogram marked Normal II shows a harmonic structure with a fundamental of approximately 120 cps in the region of 19 to 20 on the horizontal cm scale.
The resonances of the supra-glottal vocal tract selectively amplify harmonics in the glottal note, and these amplified harmonics are seen to be darker and thicker at a/b and c in the vertical scale. Resonance bars appear much more clearly at equivalent points on the wide-band Normal I spectrogram. Wide-band spectrograms highlight the resonance properties on which the distinctive quality of vowels and resonant consonants depends, and such resonance properties are the products of the cavities of the supra-glottal vocal tract. Wide-band displays obliterate harmonic structure. Resonance bars are normally termed "formants", and the formant structure for the vowel [ɜ:] of Kerr emerges as F1 = 550, F2 = 1340, F3 = 2250 cps (approximate centre frequencies of the lowest three formants). The fine structure of the glottal note, i.e. its individual pulses, shows in wide-band spectrograms as thin parallel vertical lines rising through most of the visible frequency scale wherever vowels are articulated. Each of the highly regular (in time and amplitude) pressure pulses from the vibrating vocal folds is shown by a vertical line which also thickens and darkens as it enters the bandwidth of a formant.

[Spectrograms: WHISPER I (wide) and WHISPER II (narrow); utterances: her (high pitch), her (low pitch)]
Glottal pulses up to 500 per second are discernible in wide-band spectrograms. Aperiodic noise, at whatever point in the vocal tract it is created, shows as striations in wide-band displays, i.e. as irregular vertical lines of varying lengths, as seen at Normal I 10. Here the consonant [s] is seen as randomly distributed energy concentrated mainly above 3.5 kHz which, in time, has a relatively gradual onset and continues for approximately 16 csec. Noise in the form of a burst, i.e. with instantaneous onset and rapid decay, is seen at Case AI 7, corresponding to the release of the [t] of to. Aperiodic high-intensity noise is also clearly shown by Whisper I, where random, irregular frequency/amplitude components are seen to be amplified by resonances; compare the ill-defined, but discernible, Formant 2 of [ɜ:] in her in Whisper I and [ɜ:] of Kerr in Normal I. An aperiodic noise component in narrow-band spectrograms appears as a horizontally elongated smudge rather than a thin vertical line. The representations of noise in the two different spectrographic samplings are clearly shown by Whisper I (wide-band) and Whisper II (narrow-band).

Areas on the spectrograms presented here are identified in terms of coordinates on the horizontal and vertical scales. In illustration, the square box in the centre of spectrogram Whisper I is 7 to 8 - g to j. The title over each spectrogram should be interpreted thus: CASE DV Esophageal (wide) is a wide-band spectrogram identified by the number V and representing the voice of Case D, who has esophageal voice. Note that V and VI are wide-band and narrow-band spectrograms of the same utterance. The means of interpreting the pharyngeal pressure graphs is discussed below.
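As an aid to reading the spectrograms, the scale and bandwidth figures given above can be turned into a few lines of arithmetic. The following Python sketch is illustrative only. The function names are hypothetical; the reciprocal relation between analysis bandwidth and time resolution is a common rule of thumb, not a specification of the Sonagraph; and pairing the 120 cps fundamental of Normal II with the formant centres quoted for [ɜ:] is an assumption made purely for the example.

```python
# Figures quoted in the text.
CM_PER_SECOND = 12.33      # horizontal scale: 12.33 cm = 1 sec
NARROW_BAND_CPS = 45.0     # narrow-band analysis bandwidth
WIDE_BAND_CPS = 300.0      # wide-band analysis bandwidth


def cm_to_seconds(cm):
    """Convert a distance on the horizontal scale to a duration in seconds."""
    return cm / CM_PER_SECOND


def seconds_to_cm(seconds):
    """Convert a duration in seconds to a distance on the horizontal scale."""
    return seconds * CM_PER_SECOND


def approx_time_resolution_ms(bandwidth_cps):
    """Rule of thumb (an assumption, not an instrument specification):
    time resolution is roughly the reciprocal of the analysis bandwidth."""
    return 1000.0 / bandwidth_cps


def nearest_harmonic(fundamental_cps, target_cps):
    """Return (harmonic number, frequency) of the harmonic nearest target_cps."""
    number = max(1, round(target_cps / fundamental_cps))
    return number, number * fundamental_cps


if __name__ == "__main__":
    # The [s] at Normal I 10 lasts approximately 16 csec.
    print("16 csec =", round(seconds_to_cm(0.16), 2), "cm")

    for name, bandwidth in (("narrow", NARROW_BAND_CPS), ("wide", WIDE_BAND_CPS)):
        print(name, "band: about",
              round(approx_time_resolution_ms(bandwidth), 1), "ms")

    # Which harmonics of a 120 cps fundamental lie nearest the formant centres?
    for label, centre in (("F1", 550), ("F2", 1340), ("F3", 2250)):
        print(label, nearest_harmonic(120, centre))
```

On these assumptions the 16 csec of [s] occupy a little under 2 cm of the horizontal scale, and the harmonics nearest the three formant centres are the 5th, 11th and 19th. These are the harmonics one would expect to appear darkest in a narrow-band display.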
VOICE TYPES IN THIS STUDY

The five cases range, in their speech, from high to relatively low levels of intelligibility (with some fluctuation in individual cases), and a correlation of these differences with the different means of exciting the resonators is attempted below.

Resonators function in response to an input in which a driving force is involved. In the human voice the driving force is always an air flow which passes a point of constriction. The resonators of the upper vocal tract can be excited in three different ways.

The air flow may be interrupted at the point of constriction by rapidly alternating closed and open phases. In normal voice lung air is interrupted in highly regular and rhythmical vibratory cycles by the true vocal folds which are drawn medially and, in varying degrees, laterally, into a state of vibration. These vibrations are a consequence of the Bernoulli Effect, in which an accelerating air flow through a narrowing channel sucks the elastic edges of the constrictor together. Closure brings a change in pressure at the point of constriction and is followed by rapid opening. The vibrations of normal phonation set up pressure pulses which are highly regular in time and amplitude, and only vibrators having the properties of the vocal folds, violin strings, etc. can produce a stream of pressure pulses of this kind.

[Spectrogram: CASE AII Ventricular (narrow); utterance: parents to see doctor Kerr]
These pulses create a periodic sound wave with an inherent harmonic structure, i.e. the energy in the vibrations is largely concentrated at points in the frequency scale which are multiples of the fundamental (the lowest frequency component). Very little turbulence occurs in this type of interrupted air flow, even as the closed phase of the vibratory cycle is approached, and there is, therefore, little concomitant aperiodic noise. A vowel as a relatively "pure" note or tone is produced by resonators linked to a vibratory source of this kind. The resonators amplify those harmonics set up by the glottal note which fall within their bandwidth, and four clear resonance bars (formants) can be seen in wide-band spectrograms of the normal voice (wide-band Normal I 3 to 4, in the articulation of the first vowel of parents). To the extent that irregularities develop in time (frequency) and amplitude in the vibratory pattern of the vocal folds, the ear identifies harshness or roughness. A variation of frequency of as little as 1 cps can give rise to perceived roughness (ref. 24). Changes in the harmonic structure give rise to perceived differences in pitch (intonation). In the narrow-band spectrogram Normal II 11 - a to i, a higher pitch corresponding to roughly 160 cps falls to a lower pitch of 123 cps at 21 - a to p.

A resonating system can, however, also function by receiving impulses in the form of sharp raps or taps. The forefinger flicking the throat just above the superior edge of the right lamina of the thyroid produces a spectrogram such as that labelled Finger-flicks. Each rap sets up a noise burst of very brief duration with aperiodic noise properties, in which energy is distributed over the 8 kHz visible on the spectrogram. There is no inherent harmonic structure in the noise burst, and even a rapid succession of such raps at rates of over 100 per second does not set up a harmonic structure.
There is, therefore, no possibility of frequency modulation which would be perceived as pitch variation or intonation, even if the rate of rapping is varied significantly.*

* Our discussion implicitly rejects Scripture's (1906; ref. 19) theory that no overtones emanate from glottal vibrations, only air puffs which cause the resonators to sound with their own frequencies.

The vocal tract functions as a resonating system in response to the "rapping" input by amplifying frequencies in the noise bursts falling within the bandwidth of the resonators. The energy at such amplified frequencies decays relatively slowly, and the emergent formants can be discerned at Finger-flicks 1 and 2 - a and b as horizontally extended smudges.

[Spectrograms: CASE CII Esophageal (narrow) and CASE CIII Esophageal (wide); utterance: make her a home]

A third driving force for exciting resonators is continuous turbulence causing friction of relatively high amplitude, emanating from a point of narrow constriction at which air escapes without any interruption of the air flow. Whispering is an excellent example of resonators functioning with this type of input. The point of constriction is a small V-shaped opening at the posterior end of the glottis, and the vocal folds do not vibrate.
Whisper I (wide-band) and II (narrow-band) show strong, transient, aperiodic noise components over the whole of the visible frequency scale and, in particular, strong amplification of those frequencies falling in the bandwidths of resonators; see Formant 2 in Whisper I 11 to 14 - c and the corresponding area in the narrow-band display of Whisper II. This is spectrographic evidence of the discernible quality differences of whispered vowels. Continuous turbulence is, for the human body as a sound producer, highly uneconomical in that it rapidly drains the air reservoir. For esophageal speech, with its small air reservoir, it is impracticable. One reason for discussing whisper here is that esophageal "burping" (see below), with a high level of concomitant friction, is seemingly far more effective than esophageal "rapping" with little friction.

A point of interest connected with whisper is that, although it totally lacks harmonic structure (see Whisper II), an auditory impression of pitch variation can be created by so altering the constriction at source that the energy is differently distributed in the frequency scale. The difference between "low-pitch whisper" and "high-pitch whisper" in Whisper I and II (representing a conscious effort by the normal voice recorded on these spectrograms) is that Formant 1 (at level a/b) and Formant 2 (at level c) have, respectively, less and more energy concentrated at these relatively low frequencies. Formant 3 (at e/f, i.e. 2.5 kHz) shows the approximate point where the low-high energy distribution becomes inverted.

In esophageal "voice" air flow from a reservoir in the upper esophagus is interrupted by crude vibrations of the esophageal sphincter, which alternately opens and closes the esophageal lumen. In esophageal sphincter vibration the closure of the valve-like exit to the air reservoir is probably brought about by muscle tension after the opening caused by air pressure.
If this is the case, then this vibration is not a consequence of the Bernoulli Effect. Each opening releases a burst of noise into the resonating system at irregular, variable rates. Muscle tension and air pressure would regulate the rate of vibration. Case B illustrates how close this type of vibration can come to "rapping" or "tapping"; compare Case BI 1 to 3 with Finger-flicks. Here rapping is seen to be irregular in time, with an average rate of roughly 42 per second. Case CI 8 to 14 shows a somewhat more rapid, regular rate of rapping at approximately 44 per second. The corresponding sections on the narrow-band spectrograms BII and CII show a total absence of harmonic structure, and the formants appear as smudges darkened in the vertical bars of the raps. Spectrographic evidence attests to the absence of frequency modulation in esophageal voice.

A significant variation in esophageal sphincter vibration begins to show in comparing BI and II with CI and II (particularly 1 to 4) and is clearest when EI enters the comparison. The actual noise bursts recede in prominence, and continuing aperiodic noise components over wide bandwidths are the main input to the resonators. The raps apparently smooth out into a succession of air puffs, somewhat more rapid and regular, and lower in amplitude (hereafter referred to as "burping" in contradistinction to "rapping").

[Spectrogram: CASE CIV Esophageal (narrow)]
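The rates of rapping quoted above (roughly 42 per second and irregular for Case B; approximately 44 per second and more regular for Case C) are of the kind obtained by timing successive noise bursts on a wide-band spectrogram. A minimal Python sketch of such a calculation follows. The inter-burst intervals are invented for illustration and are not measurements from the spectrograms; the function names, and the use of the coefficient of variation as an index of irregularity, are likewise assumptions.

```python
from itertools import accumulate
from statistics import mean, pstdev


def rap_statistics(onsets_s):
    """Return (bursts per second, coefficient of variation of the intervals).

    onsets_s: burst onset times in seconds, in ascending order.
    """
    if len(onsets_s) < 3:
        raise ValueError("need at least three burst onsets")
    intervals = [later - earlier
                 for earlier, later in zip(onsets_s, onsets_s[1:])]
    mean_interval = mean(intervals)
    rate = 1.0 / mean_interval
    irregularity = pstdev(intervals) / mean_interval
    return rate, irregularity


def onsets_from_intervals_ms(intervals_ms):
    """Build onset times (seconds) from successive inter-burst intervals (ms)."""
    return [t / 1000.0 for t in accumulate(intervals_ms, initial=0)]


# Invented intervals in milliseconds, chosen only to resemble the two patterns
# described in the text; they are NOT data from Cases B and C.
IRREGULAR_MS = [18, 30, 20, 28, 22, 26]   # uneven spacing, mean 24 ms
REGULAR_MS = [23, 23, 22, 23, 23, 22]     # nearly even spacing

if __name__ == "__main__":
    for label, intervals in (("irregular", IRREGULAR_MS),
                             ("regular", REGULAR_MS)):
        rate, irregularity = rap_statistics(onsets_from_intervals_ms(intervals))
        print(label, "rate per second:", round(rate, 1),
              "coefficient of variation:", round(irregularity, 2))
```

Two trains of bursts may thus have nearly the same average rate and yet differ markedly in regularity, which is the distinction drawn above between the rapping of Case B and that of Case C.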
The distinction between rapping and burping is significant because the voice of Case B is very "croaky" and of low intelligibility; that of Case C is unpleasantly croaky, but of high intelligibility; that of Case E is not at all croaky: it is breathy, but pleasant and highly intelligible. Although our postulation as to the site of constriction which produces random aperiodic noise components is unconfirmed experimentally, we suggest that friction noise emanates from the esophageal sphincter, where the manner in which the air flow is interrupted ranges from vibratory cycles with tight closure, high esophageal pressure and a "clean" break on opening (rapping), to burping with weaker, probably incomplete, closure and considerable concomitant friction. A third possible state is a still weaker interruption with a more or less open lumen allowing a continuous air flow. This would seem to be the nature of the esophageal air mechanism in oral fricatives such as [s].

If turbulence is not a consequence of esophageal vibration, then pharyngeal constriction is the next most likely source. Pharyngeal pressure measurements show that Case D uses this mechanism and provide evidence of a vibratory mechanism involving the pharynx. We are doubtful, however, whether Case C has pharyngeal constriction. Case C burps without embarrassment and makes no attempt to cover the esophageal voice by pharyngeal constriction. Compare spoken and sung make in CI, II at 5 to 6 and CIII, IV at 5 to 8. The prominence of the raps recedes as the vibrations speed up* and the quantity of random, aperiodic noise is greater. In Case E the significant absence of strong noise bursts in EI is worth investigating.

Two significant dimensions of esophageal voice emerge from this discussion: (a) the manner in which the esophageal sphincter controls the release of air; (b) the presence and nature of pharyngeal constriction.
In classifying cases of the types dealt with here it is useful to distinguish the three types of source input to the resonating system:

Type X  Pressure pulses with a harmonic structure and true frequency modulation (friction noise may be present, but is a minor contributor).
Type Y  Vibrations of low frequency in the form of noise bursts lacking a harmonic structure (rapping and burping).
Type Z  Continuous, relatively uninterrupted friction noise.

None of our cases is classified as Z, but experiments with a normal vocal tract show that pharyngeal constriction could set up a "voice" of this kind. Obviously there is overlap within this categorisation. Z can clearly overlap with X and Y; X and Y are, as a rule, mutually exclusive.

* The superior-inferior branches of the laryngeal nerve seem to convey the same instruction to the cricopharyngeus muscle as they did to muscles of the larynx.

[Spectrograms: CASE DIV Esophageal (narrow) and CASE DV Esophageal (wide)]

The phonation of a normal voice is only Type X; a succession of glottal stops or catches cannot be produced at a rate fast enough to provide any real semblance of Type Y. Ventricular phonation in the one case discussed below, is