J Acoust Soc Am. 2007 Jun;121(6):3770-89.
doi: 10.1121/1.2730621.

Time dependence of vocal tract modes during production of vowels and vowel sequences

Brad H Story. J Acoust Soc Am. 2007 Jun.

Abstract

Vocal tract shaping patterns based on articulatory fleshpoint data from four speakers in the University of Wisconsin x-ray microbeam (XRMB) database [J. Westbury, UW-Madison, (1994)] were determined with a principal component analysis (PCA). Midsagittal cross-distance functions representative of approximately the front 6 cm of the oral cavity for each of 11 vowels and vowel-vowel (VV) sequences were obtained from the pellet positions and the hard palate profile for the four speakers. A PCA was independently performed on each speaker's set of cross-distance functions representing static vowels only, and again with time-dependent cross-distance functions representing vowels and VV sequences. In all cases, results indicated that the first two orthogonal components (referred to as modes) accounted for more than 97% of the variance in each speaker's set of cross-distance functions. In addition, the shape of each mode was shown to be similar across the speakers, suggesting that the modes represent common patterns of vocal tract deformation. Plots of the resulting time-dependent coefficient records showed that the four speakers activated each mode similarly during production of the vowel sequences. Finally, a procedure was described for using the time-dependent mode coefficients obtained from the XRMB data as input for an area function model of the vocal tract.
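As a rough illustration of the analysis summarized above, the sketch below runs a standard PCA (via singular value decomposition) on a matrix of cross-distance functions and reports how much variance the first two modes capture. It is a minimal sketch under stated assumptions, not the paper's procedure: the function name `pca_modes`, the grid of 25 points over 6 cm, and the synthetic data are all hypothetical stand-ins, since no XRMB measurements appear in this text.

```python
import numpy as np

def pca_modes(cross_distance):
    """PCA of cross-distance functions.

    cross_distance: array of shape (n_frames, n_points), one function per row.
    Returns the mean function, the modes (rows, orthonormal), the per-frame
    coefficients, and the fraction of variance explained by each mode.
    """
    mean = cross_distance.mean(axis=0)
    centered = cross_distance - mean
    # Rows of vt are the orthonormal principal directions (the "modes").
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    coeffs = centered @ vt.T  # projection of each frame onto each mode
    return mean, vt, coeffs, explained

# Synthetic stand-in data (hypothetical, NOT XRMB measurements):
# 11 "vowels" built from two smooth shapes plus a little noise.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 6.0, 25)            # assumed position grid, in cm
shape1 = np.sin(np.pi * x / 6.0)
shape2 = np.cos(np.pi * x / 6.0)
q = rng.normal(size=(11, 2))
data = (1.0 + q[:, [0]] * shape1 + q[:, [1]] * shape2
        + 0.01 * rng.normal(size=(11, x.size)))

mean, modes, coeffs, explained = pca_modes(data)
two_mode_share = explained[:2].sum()
print(f"variance captured by first two modes: {two_mode_share:.4f}")
```

Running the same function separately on each speaker's data, once for static vowels and once for time-dependent frames, would mirror the per-speaker structure of the analysis described in the abstract.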

Figures

FIG. 1
Modes and mean area function determined from a 10-vowel set of area functions for an adult male (from Story et al. (1996)) shown as a function of the distance from the lips; a negative sign is used because of the leftward orientation of the glottis to the lips. (a) Modes φ1 (solid) and φ2; the vertical lines denote points at which the modes intersect in the front half of the vocal tract; and (b) Mean area function.
FIG. 2
Mapping between the mode coefficients (q1 and q2) in (a) and formant frequencies (F1 and F2) in (b). The dark solid line and the dashed line in (a) indicate the range of coefficient values for each of the two modes, respectively; the [F1, F2] pairs produced by the coefficients (via an area function) along each line are shown in (b) by the same line styles. The curved light lines in the left plot are the coefficient variations that correspond to the triangular, hypothetical [F1, F2] trajectory for [iɑui] shown on the right.
FIG. 3
(a) Time-dependent mode coefficients for [iɑui]. (b) Time-dependent area function generated by the coefficients in (a) and a time-dependent version of Eqn. (1).
FIG. 4
Demonstration of determining a cross-distance function from XRMB data. (a) Sagittal view of a time frame representative of JW26’s [ɪ] vowel. Superior and inferior vocal tract boundaries are generated based on the tongue points (T1–T4), palatal outline, pharyngeal wall, and four “phantom” points (open circles) related to the mandible and lips. (b) Bisection method of determining initial centerline points and cross-distances. (c) Result of multiple iterations of the bisection technique. The lines extending across the vocal tract are perpendicular to the centerline and comprise the cross-distance measurements. (d) Cross-distance function.
FIG. 5
Temporal variation of the cross-distance function measured over the course of [əmɑ] spoken by JW26. Note the bilabial closure for [m] occurs at 0.14 seconds.
FIG. 6
Modes, φ1 and φ2, and mean cross-distance functions, Ω, for female speakers JW26 and JW56. The vertical lines indicate points at which φ1 and φ2 intersect; these are comparable to vertical lines shown in Fig. 1a. (a) φ1 and φ2 for JW26, (b) φ1 and φ2 for JW56, (c) Ω for JW26, and (d) Ω for JW56.
FIG. 7
Modes, φ1 and φ2, and mean cross-distance functions, Ω, for male speakers JW12 and JW61. The vertical lines indicate points at which φ1 and φ2 intersect; these are comparable to vertical lines shown in Fig. 1a. (a) φ1 and φ2 for JW12, (b) φ1 and φ2 for JW61, (c) Ω for JW12, and (d) Ω for JW61.
FIG. 8
Reconstructions of three vowels from JW26. The q1 and q2 coefficients used to reconstruct each cross-distance function are shown below the plots. In each plot the thin solid line denotes the original cross-distance function, the thick dashed line is the reconstruction with only one mode, and the thick solid line is the reconstruction with two modes. (a) vowel [i], (b) vowel [ɔ], and (c) vowel [A].
FIG. 9
Coefficient and formant spaces for JW26 based on principal component analysis and formant frequency measurements. (a) [q1, q2] space based on single data frames of the isolated vowels, (b) [F1, F2] space for isolated vowels, (c) [q1, q2] space based on time-dependent vowels and VV sequences, and (d) [F1, F2] space corresponding to the vowels and VVs in (c). In (c) and (d) the time-dependent vowels are represented as a series of open circles whereas the VV sequences are shown with solid dots connected by lines. The IPA labels for each VV are located, when practical, near the beginning of the transition.
FIG. 10
Coefficient and formant spaces for JW56 based on principal component analysis and formant frequency measurements. The (a), (b), (c), and (d) subplots are denoted the same as in Fig. 9.
FIG. 11
Coefficient and formant spaces for JW12 based on principal component analysis and formant frequency measurements. The (a), (b), (c), and (d) subplots are denoted the same as in Fig. 9.
FIG. 12
Coefficient and formant spaces for JW61 based on principal component analysis and formant frequency measurements. The (a), (b), (c), and (d) subplots are denoted the same as in Fig. 9.
FIG. 13
F1 and F2 formant frequencies (top), and mode coefficients (bottom) shown over the time course of six vowel sequences for JW26. The areas with white background indicate time periods where formant frequencies could be estimated from the audio signal, whereas gray areas denote periods of silence between the production of the vowel sequences.
FIG. 14
F1 and F2 formant frequencies (top), and mode coefficients (bottom) shown over the time course of six vowel sequences for JW56.
FIG. 15
F1 and F2 formant frequencies (top), and mode coefficients (bottom) shown over the time course of six vowel sequences for JW12.
FIG. 16
F1 and F2 formant frequencies (top), and mode coefficients (bottom) shown over the time course of six vowel sequences for JW61.
FIG. 17
Coefficient and formant trajectories of vowel sequences for female speakers JW26 and JW56 relative to an area function model. The coefficient trajectories are transformed versions of those in Figs. 9c and 10c (via Eqns. 4–7). The formant trajectories were calculated from area functions generated by Eqn. 9. The background grids in (a) and (c) represent the possible coefficient space based on the area function model, whereas the grids in (b) and (d) result from the coefficient-to-formant mapping. The solid and dashed lines represent the effects of each mode in isolation.
FIG. 18
Transformed coefficient and formant trajectories of vowel sequences for male speakers JW12 and JW61 relative to an area function model. The coefficient trajectories are transformed versions of those in Figs. 11c and 12c (via Eqns. 4–7). The formant trajectories were calculated from area functions generated by Eqn. 9. Further description of the figure is the same as Fig. 17.
FIG. 19
Time-varying area function for JW12’s [ɑi] generated with Eqn. 11.
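
The one-mode and two-mode reconstructions described for Fig. 8, and the time-dependent coefficient records shown in several of the figures, can be sketched as a linear combination of the mean cross-distance function Ω and the modes φ1 and φ2, weighted by q1 and q2. This is a minimal sketch that assumes the usual PCA reconstruction (mean plus weighted modes); the equations referenced in the captions are not reproduced in this text, and the helper `reconstruct`, the mode shapes, and the coefficient trajectories below are hypothetical placeholders rather than measured values.

```python
import numpy as np

def reconstruct(mean, modes, q, n_modes=2):
    """Rebuild cross-distance functions from mode coefficients.

    mean:  (n_points,) mean cross-distance function (Omega).
    modes: (n_available_modes, n_points), one mode per row (phi1, phi2, ...).
    q:     (n_frames, n_available_modes) coefficients, one row per time frame.
    Only the first n_modes modes are used in the reconstruction.
    """
    q = np.atleast_2d(q)
    return mean + q[:, :n_modes] @ modes[:n_modes]

# Hypothetical mean and modes on an assumed 6 cm grid (not measured data).
x = np.linspace(0.0, 6.0, 25)
mean = np.full(x.size, 1.0)
phi1 = np.sin(np.pi * x / 6.0)
phi1 /= np.linalg.norm(phi1)
phi2 = np.cos(np.pi * x / 6.0)
phi2 /= np.linalg.norm(phi2)
modes = np.vstack([phi1, phi2])

# Hypothetical coefficient trajectories q1(t), q2(t) for a vowel-vowel glide.
t = np.linspace(0.0, 0.3, 31)
q_t = np.column_stack([np.linspace(-1.0, 1.0, t.size),
                       np.linspace(0.5, -0.5, t.size)])

d_two = reconstruct(mean, modes, q_t, n_modes=2)  # two-mode reconstruction
d_one = reconstruct(mean, modes, q_t, n_modes=1)  # one-mode reconstruction
print(d_two.shape)  # one cross-distance function per time frame
```

The difference between `d_two` and `d_one` at each frame is exactly the contribution of the second mode, which is the quantity the dashed and solid thick lines in a Fig. 8 style plot would let a reader compare.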
