Papers in Physics, vol. 5, art. 050004 (2013) Received: 2 March 2013, Accepted: 5 June 2013 Edited by: G. Mindlin Licence: Creative Commons Attribution 3.0 DOI: http://dx.doi.org/10.4279/PIP.050004 www.papersinphysics.org ISSN 1852-4249 Revisiting the two-mass model of the vocal folds M. F. Assaneo,1∗ M. A. Trevisan1† Realistic mathematical modeling of voice production has been recently boosted by ap- plications to different fields like bioprosthetics, quality speech synthesis and pathological diagnosis. In this work, we revisit a two-mass model of the vocal folds that includes accu- rate fluid mechanics for the air passage through the folds and nonlinear properties of the tissue. We present the bifurcation diagram for such a system, focusing on the dynamical properties of two regimes of interest: the onset of oscillations and the normal phonation regime. We also show theoretical support to the nonlinear nature of the elastic properties of the folds tissue by comparing theoretical isofrequency curves with reported experimental data. I. Introduction In the last decades, a lot of effort was devoted to de- velop a mathematical model for voice production. The first steps were made by Ishizaka and Flana- gan [1], approximating each vocal fold by two cou- pled oscillators, which provide the basis of the well known two-mass model. This simple model repro- duces many essential features of the voice produc- tion, like the onset of self sustained oscillation of the folds and the shape of the glottal pulses. Early analytical treatments were restricted to small amplitude oscillations, allowing a dimen- sional reduction of the problem. In particular, a two dimensional approximation known as the flap- ping model was widely adopted by the scientific community, based on the assumption of a transver- sal wave propagating along the vocal folds [2, 3]. Moreover, this model was also used to successfully ∗E-mail: florencia@df.uba.ar †E-mail: marcos@df.uba.ar 1 Laboratorio de Sistemas Dinámicos, Depto. de F́ısica, FCEN, Universidad de Buenos Aires. Pabellón I, Ciudad Universitaria, 1428EGA Buenos Aires, Argentina. explain most of the features present in birdsong [4, 5]. Faithful modeling of the vocal folds has recently found new challenges: realistic articulatory speech synthesis [6–8], diagnosis of pathological behavior of the folds [9, 10] and bioprosthetic applications [11]. Within this framework, the 4-dimensional two-mass model was revisited and modified. Two main improvements are worth noting: a realistic description of the vocal fold collision [13,14] and an accurate fluid mechanical description of the glottal flow, allowing a proper treatment of the hydrody- namical force acting on the folds [8, 15]. In this work, we revisit the two-mass model de- veloped by Lucero and Koenig [7]. This choice rep- resents a good compromise between mathematical simplicity and diversity of physical phenomena act- ing on the vocal folds, including the main mechani- cal and fluid effects that are partially found in other models [13, 15]. It was also successfully used to reproduce experimental temporal patterns of glot- tal airflow. Here, we extend the analytical study of this system: we present a bifurcation diagram, explore the dynamical aspects of the oscillations at the onset and normal phonation and study the 050004-1 Papers in Physics, vol. 5, art. 050004 (2013) / M. F. Assaneo et al. isofrequency curves of the model. This work is organized as follows: in the second section, we describe the model. In the third sec- tion, we present the bifurcation diagram, compare our solutions with those of the flapping model ap- proximation and analyze the isofrecuency curves. In the fourth and last section, we discuss our re- sults. II. The model Each vocal fold is modeled as two coupled damped oscillators, as sketched in Fig. 1. Figure 1: Sketch of the two-mass model of the vo- cal folds. Each fold is represented by masses m1 and m2 coupled to each other by a restitution force kc and to the laryngeal walls by K1 and K2 (and dampings B1 and B2), respectively. The displace- ment of each mass from the resting position x0 is represented by x1 and x2. The different aerody- namic pressures P acting on the folds are described in the text. Assuming symmetry with respect to the saggital plane, the left and right mass systems are identical (Fig. 1) and the equation of motion for each mass reads ẋi = yi (1) ẏi = 1 mi [fi −Ki(xi) −Bi(xi,yi) −kc(xi −xj)] , for i,j = 1 or 2 for lower and upper masses, re- spectively. K and B represent the restitution and damping of the folds tissue, f the hydrodynamic force, m is the mass and kc the coupling stiffness. The horizontal displacement from the rest position x0 is represented by x. We use a cubic polynomial for the restitution term [Eq. (2)], adapted from [1, 7]. The term with a derivable step-like function Θ [Eq. (3)] accounts for the increase in the stiffness introduced by the collision of the folds. The restitution force reads Ki(xi) = kixi(1 + 100xi 2) (2) + Θ ( xi + x0 x0 ) 3ki(xi + x0)[1 + 500(xi + x0) 2 ], with Θ(x) = { 0 if x ≤ 0 x2 8 10−4+x2 if x > 0 , (3) where x0 is the rest position of the folds. For the damping force, we have adapted the ex- pression proposed in [7], making it derivable, arriv- ing at the following equation: Bi(xi) = (4)[ 1 + Θ ( xi + x0 x0 ) 1 �i ] ri(1 + 850xi 2)yi, where ri = 2�i √ kimi, and �i is the damping ratio. In order to describe the hydrodynamic force that the airflow exerts on the vocal folds, we have adopted the standard assumption of small inertia of the glottal air column and the model of the bound- ary layer developed in [7, 11, 15]. This model as- sumes a one-dimensional, quasi-steady incompress- ible airflow from the trachea to a separation point. At this point, the flow separates from the tissue surface to form a free jet where the turbulence dis- sipates the airflow energy. It has been experimen- tally shown that the position of this point depends on the glottal profile. As described in [15], the separation point located at the glottal exit shifts down to the boundary between masses m1 and m2 when the folds profile becomes more divergent than a threshold [Eq. (7)]. Viscous losses are modeled according to a bi- dimensional Poiseuille flow [Eqs. (6) and (7)]. The equations for the pressure inside the glottis are 050004-2 Papers in Physics, vol. 5, art. 050004 (2013) / M. F. Assaneo et al. Pin = Ps + ρu2g 2a21 , (5) P12 = Pin − 12µugd1l 2 g a31 , (6) P21 = { 12µugd2l 2 g a32 + Pout if a2 > ksa1 0 if a2 ≤ ksa1 , (7) Pout = 0. (8) As sketched in Fig. 1, the pressures exerted by the airflow are: Pin at the entrance of the glottis, P12 at the upper edge of m1, P21 at the lower edge of m2, Pout at the entrance of the vocal tract and Ps the subglottal pressure. The width of the folds (in the plane normal to Fig. 1) is lg; d1 and d2 are the lengths of the lower and upper masses, respectively. ai are the cross- sections of the glottis, ai = 2lg(xi + x0); µ and ρ are the viscosity and density coefficient of the air; ug is the airflow inside the glottis, and ks = 1.2 is an experimental coefficient. We also assume no losses at the glottal entrance [Eq. (5)], and zero pressure at the entrance of the vocal tract [Eq. (8)]. The hydrodynamic force acting on each mass reads: f1 = { d1lgPs if x1 ≤−x0 or x2 ≤−x0 Pin+P12 2 in other case (9) f2 =   d2lgPs if x1 > −x0 and x2 ≤−x0 0 if x1 ≤−x0 P21+Pout 2 in other case (10) Following [1, 7, 10], these functions represent opening, partial closure and total closure of the glottis. Throughout this work, piecewise functions P21, f1 and f2 are modeled using the derivable step- like function Θ defined in Eq. (3). III. Analysis of the model i. Bifurcation diagram The main anatomical parameters that can be ac- tively controlled during the vocalizations are the subglottal pressure Ps and the folds tension con- trolled by the laryngeal muscles. In particular, the action of the thyroarytenoid and the cricothyroid muscles control the thickness and the stiffness of folds. Following [1], this effect is modeled by a pa- rameter Q that scales the mechanic properties of the folds by a cord-tension parameter: kc = Qkc0, ki = Qki0 and mi = mi0 Q . We therefore performed a bifurcation diagram using these two standard con- trol parameters Ps and Q. Five main regions of different dynamic solutions are shown in Fig. 2. At low pressure values (re- gion I), the system presents a stable fixed point. Reaching region II, the fixed point becomes un- stable and there appears an attracting limit cycle. At the interface between regions I and II, three bifurcations occur in a narrow range of subglot- tal pressure (Fig. 3, left panel), all along the Q axis. The right panel of Fig. 3 shows the oscilla- tion amplitude of x2. At point A, oscillations are born in a supercritical Hopf bifurcation. The am- plitude grows continuously for increasing Ps until point B, where it jumps to the upper branch. If the pressure is then decreased, the oscillations persist even for lower pressure values than the onset in A. When point C is reached, the oscillations suddenly stop and the system returns to the rest position. This onset-offset oscillation hysteresis was already reported experimentally in [12]. The branch AB depends on the viscosity. De- creasing µ, points A and B approach to each other until they collide at µ = 0, recovering the result re- ported in [3, 10, 14], where the oscillations occur as the combination of a subcritical Hopf bifurcation and a cyclic fold bifurcation. On the other hand, the branch BC depends on the separation point of the jet formation. In par- ticular, for increasing ks, the folds become stiffer and the separation point moves upwards toward the output of the glottis. From a dynamical point of view, points C and B approach to each other until they collapse. In this case, the oscillations are born at a supercritical Hopf bifurcation and the system presents no hysteresis, as in the standard flapping model [17]. Regions II and III of Fig. 2 are separated by a saddle-repulsor bifurcation. Although this bifur- cation does not represent a qualitative dynamical change for the oscillating folds, its effects are rele- vant when the complete mechanism of voiced sound 050004-3 Papers in Physics, vol. 5, art. 050004 (2013) / M. F. Assaneo et al. Figure 2: Bifurcation diagram in the plane of sub- glottal pressure and fold tension (Q,Ps). The in- sets are two-dimensional projections of the flow on the (v1,x1) plane, the red crosses represent unsta- ble fixed points and the dotted lines unstable limit cycles. Normal voice occurs at (Q,Ps) ∼ (1, 800). The color code represents the linear correlation be- tween (x1 − x2) and (y1 + y2): from dark red for R = 1 to dark blue for R = 0.6. This diagram was developed with the help of AUTO continuation software [20]. The rest of the parameters were fixed at m1 = 0.125 g, m2 = 0.025 g, k10 = 80 N/m, k20 = 8 N/m, kc = 25 N/m, �1 = 0.1, �2 = 0.6, lg = 1.4 cm, d1 = 0.25 cm, d2 = 0.05 cm and x0 = 0.02 cm. production is considered. Voiced sounds are gener- ated as the airflow disturbance produced by the oscillation of the vocal folds is injected into the se- ries of cavities extending from the laryngeal exit to the mouth, a non-uniform tube known as the vocal tract. The disturbance travels back and forth along the vocal tract, that acts as a filter for the origi- nal signal, enhancing the frequencies of the source that fall near the vocal tract resonances. Voiced sounds are in fact perceived and classified accord- ing to these resonances, as in the case of vowels [18]. Consequently, one central aspect in the generation Figure 3: Hysteresis at the oscillation onset-offset. Left panel: zoom of the interface between regions I and II. The blue and green lines represent folds of cycles (saddle-node bifurcations in the map). The red line is a supercritical Hopf bifurcation. Right panel: the oscillation amplitude of x2 as a function of the subglottal pressure Ps, at Q = 1.71. The continuation of periodic solutions was realized with the AUTO software package [20]. of voiced sounds is the production of a spectrally rich signal at the sound source level. Interestingly, normal phonation occurs in the re- gion near the appearance of the saddle-repulsor bi- furcation. Although this bifurcation does not al- ter the dynamical regime of the system or its time scales, we have observed that part of the limit cy- cle approaches the stable manifold of the new fixed point (as displayed in Fig. 4), therefore changing its shape. This deformation is not restricted to the appearance of the new fixed point but rather oc- curs in a coarse region around the boundary be- tween II and III, as the flux changes smoothly in a vicinity of the bifurcation. In order to illustrate this effect, we use the spectral content index SCI [21], an indicator of the spectral richness of a sig- nal: SCI = ∑ k Akfk/( ∑ k Akf0), where Ak is the Fourier amplitude of the frequency fk and f0 is the fundamental frequency. As the pressure is in- creased, the SCI of x1(t) increases (upper right panel of Fig. 4), observing a boost in the vicinity of the saddle-repulsor bifurcation that stabilizes after the saddle point is generated. Thus, the appearance of this bifurcation near the region of normal phonation could indicate a possi- ble mechanism to further enhance the spectral rich- ness of the sound source, on which the production of voiced sounds ultimately relies. 050004-4 Papers in Physics, vol. 5, art. 050004 (2013) / M. F. Assaneo et al. Figure 4: A projection of the limit cycle for x1 and the stable manifold of the saddle point, for param- eters consistent with normal phonatory conditions, (Q,Ps) = (1, 850) (region III). Left inset: projec- tion in the 3-dimensional space (y1, x1, x2). Right inset: Spectral content index of x1(t) as a function of Ps for a fixed value of Q = 0.95. In green, the value at which the saddle-repulsor bifurcation takes place. In the boundary between regions III and IV, one of the unstable points created in the saddle-repulsor bifurcation undergoes a subcritical Hopf bifurca- tion, changing stability as an unstable limit cycle is created [19]. Finally, entering region V, the sta- ble and the unstable cycles collide and disappear in a fold of cycles where no oscillatory regimes exist. In Fig. 2, we also display a color map that quan- tifies the difference between the solutions of the model and the flapping approximation. The flap- ping model is a two dimensional model that, instead of two masses per fold, assumes a wave propagating along a linear profile of the folds, i.e., the displace- ment of the upper edge of the folds is delayed 2τ with respect to the lower. The cross sectional areas at glottal entry and exit (a1 and a2) are approxi- mated, in terms of the position of the midpoint of the folds, by { a1 = 2lg(x0 + x + τẋ) a2 = 2lg(x0 + x− τẋ) , (11) where x is the midpoint displacement from equilib- rium x0, and τ is the time that the surface wave takes to travel half the way from bottom to top. Equation (11) can be rewritten as (x1 − x2) = τ(y1 + y2). We use this expression to quantify the difference between the oscillations obtained with the two-mass model solutions and the ones gener- ated with the flapping approximation, computing the linear correlation coefficient between (x1 −x2) and (y1 + y2). As expected, the correlation coeffi- cient R decreases for increasing Ps or decreasing Q. In the region near normal phonation, the approx- imation is still relatively good, with R ∼ 0.8. As expected, the approximation is better for increasing x0, since the effect of colliding folds is not included in the flapping model. ii. Isofrequency curves One basic perceptual property of the voice is the pitch, identified with the fundamental frequency f0 of the vocal folds oscillation. The production of different pitch contours is central to language, as they affect the semantic content of speech, carry- ing accent and intonation information. Although experimental data on pitch control is scarce, it was reported that it is actively controlled by the laryn- geal muscles and the subglottal pressure. In par- ticular, when the vocalis or interarytenoid muscle activity is inactive, a raise of the subglottal pres- sure produces an upraising of the pitch [16]. Compatible with these experimental results, we performed a theoretical analysis using Ps as a sin- gle control parameter for pitch. In the upper pan- els of Fig. 5, we show isofrequency curves in the range of normal speech for our model of Eqs. (1) to (10). Following the ideas developed in [22] for the avian case, we compare the behavior of the fundamental frequency with respect to pressure Ps in the two most usual cases presented in the lit- erature: the cubic [1, 7] and the linear [10, 14] restitutions. In the lower panels of Fig. 5, we show the isofrequency curves that result from re- placing the cubic restitution by a linear restitution Ki(xi) = kixi + Θ( xi+x0 x0 )3ki(xi + x0). Although the curves f0(Ps) are not affected by the type of restitution at the very beginning of os- cillations, the changes become evident for higher values of Ps, with positive slopes for the cubic case and negative for the linear case. This result sug- gests that a nonlinear cubic restitution force is a 050004-5 Papers in Physics, vol. 5, art. 050004 (2013) / M. F. Assaneo et al. Figure 5: Relationship between pitch and restitu- tion forces. Left panels: isofrequency curves in the plane (Q,Ps). Right panels: Curves f0(Ps) for Q=0.9, Q=0.925 and Q=0.95. In the upper panels, we used the model with the cubic nonlin- ear restitution of Eq. (2). In the lower panels, we show the curves obtained with a linear restitution, Ki(xi) = kixi + Θ( xi+x0 x0 )3ki(xi + x0). good model for the elastic properties of the oscil- lating tissue. IV. Conclusions In this paper, we have analyzed a complete two- mass model of the vocal folds integrating collisions, nonlinear restitution and dissipative forces for the tissue and jets and viscous losses of the air-stream. In a framework of growing interest for detailed modeling of voice production, the aspects studied here contribute to understanding the role of the different physical terms in different dynamical be- haviors. We calculated the bifurcation diagram, focusing in two regimes: the oscillation onset and normal phonation. Near the parameters of normal phona- tion, a saddle repulsor bifurcation takes place that modifies the shape of the limit cycle, contributing to the spectral richness of the glottal flow, which is central to the production of voiced sounds. With respect to the oscillation onset, we showed how jets and viscous losses intervene in the hysteresis phe- nomenon. Many different models for the restitution prop- erties of the tissue have been used across the liter- ature, including linear and cubic functional forms. Yet, its specific role was not reported. Here we showed that the experimental relationship between subglottal pressure and pitch is fulfilled by a cubic term. Acknowledgements - This work was partially funded by UBA and CONICET. [1] K Ishizaka, J L Flanagan, Synthesis of voiced sounds from a two-mass model of the vocal cords, Bell Syst. Tech. J. 51, 1233 (1972). [2] I R Titze, The physics of smallamplitude os- cillation of the vocal folds, J. Acoust. Soc. Am. 83, 1536 (1988). [3] M A Trevisan, M C Eguia, G Mindlin, Nonlin- ear aspects of analysis and synthesis of speech time series data, Phys. Rev. E 63, 026216 (2001). [4] Y S Perl, E M Arneodo, A Amador, F Goller, G B Mindlin, Reconstruction of physiological instructions from Zebra finch song, Phys. Rev. E 84, 051909 (2011). [5] E M Arneodo, Y S Perl, F Goller, G B Mindlin, Prosthetic avian vocal organ controlled by a freely behaving bird based on a low dimensional model of the biomechanical periphery, PLoS Comput. Biol. 8, e1002546 (2012). [6] B H Story, I R Titze Voice simulation with a bodycover model of the vocal folds, J. Acoust. Soc. Am. 97, 1249 (1995). [7] J C Lucero, L Koening Simulations of tempo- ral patterns of oral airflow in men and women using a two-mass model of the vocal folds un- der dynamic control, J. Acoust. Soc. Am. 117, 1362 (2005). [8] X Pelorson, X Vescovi, C Castelli, E Hirschberg, A Wijnands, A P J Bailliet, H M 050004-6 Papers in Physics, vol. 5, art. 050004 (2013) / M. F. Assaneo et al. A Hirschberg, Description of the flow through in-vitro models of the glottis during phonation. Application to voiced sounds synthesis, Acta Acust. 82, 358 (1996). [9] M E Smith, G S Berke, B R Gerratt, Laryn- geal paralyses: Theoretical considerations and effects on laryngeal vibration, J. Speech Hear. Res. 35, 545 (1992). [10] I Steinecke, H Herzel Bifurcations in an asym- metric vocalfold model, J. Acoust. Soc. Am. 97, 1874 (1995). [11] N J C Lous, G C J Hofmans, R N J Veldhuis, A Hirschberg, A symmetrical two-mass vocal-fold model coupled to vocal tract and trachea, with application to prosthesis design, Acta Acust. United Ac. 84, 1135 (1998). [12] T Baer, Vocal fold physiology , University of Tokyo Press, Tokyo, (1981). [13] T Ikeda, Y Matsuzak, T Aomatsu, A nu- merical analysis of phonation using a two- dimensional flexible channel model of the vocal folds, J. Biomech. Eng. 123, 571 (2001). [14] J C Lucero, Dynamics of the two-mass model of the vocal folds: Equilibria, bifurcations, and oscillation region, J. Acoust. Soc. Am. 94, 3104 (1993). [15] X Pelorson, A Hirschberg, R R van Hassel, A P J Wijnands, Y Auregan, Theoretical and ex- perimental study of quasisteadyflow separation within the glottis during phonation. Applica- tion to a modified twomass model, J. Acoust. Soc. Am. 96, 3416 (1994). [16] T Baer, Reflex activation of laryngeal muscles by sudden induced subglottal pressure changes, J. Acoust. Soc. Am. 65, 1271 (1979). [17] J C Lucero, A theoretical study of the hystere- sis phenomenon at vocal fold oscillation onset- offset, J. Acoust. Soc. Am. 105, 423 (1999). [18] I Titze, Principles of voice production, Pren- tice Hall, (1994). [19] J Guckenheimer, P Holmes, Nonlinear oscil- lations, dynamical systems and bifurcations of vector fields, Springer, (1983). [20] E Doedel, AUTO: Software for continuation and bifurcation problems in ordinary differen- tial equations, AUTO User Manual, (1986). [21] J Sitt, A Amador, F Goller, G B Mindin, Dy- namical origin of spectrally rich vocalizations in birdsong, Phys. Rev. E 78, 011905 (2008). [22] A Amador, F Goller, G B Mindlin, Frequency modulation during song in a suboscine does not require vocal muscles, J. Neurophysiol. 99, 2383 (2008). 050004-7