SciSpace (formerly Typeset)
Topic

Formant

About: Formant is a research topic. Over its lifetime, 6,860 publications have been published within this topic, receiving 141,000 citations.


Papers
Journal Article
Hynek Hermansky
TL;DR: A new technique for the analysis of speech, the perceptual linear predictive (PLP) technique, which uses three concepts from the psychophysics of hearing to derive an estimate of the auditory spectrum, and yields a low-dimensional representation of speech.
Abstract: A new technique for the analysis of speech, the perceptual linear predictive (PLP) technique, is presented and examined. This technique uses three concepts from the psychophysics of hearing to derive an estimate of the auditory spectrum: (1) the critical-band spectral resolution, (2) the equal-loudness curve, and (3) the intensity-loudness power law. The auditory spectrum is then approximated by an autoregressive all-pole model. A 5th-order all-pole model is effective in suppressing speaker-dependent details of the auditory spectrum. In comparison with conventional linear predictive (LP) analysis, PLP analysis is more consistent with human hearing. The effective second formant F2' and the 3.5-Bark spectral-peak integration theories of vowel perception are well accounted for. PLP analysis is computationally efficient and yields a low-dimensional representation of speech. These properties are found to be useful in speaker-independent automatic-speech recognition.
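The three psychophysical steps and the all-pole fit described above can be sketched as below. This is a minimal illustration, assuming the Bark warping, equal-loudness weighting, and 0.33 exponent as they are commonly quoted from the PLP literature; the function names are hypothetical, and a full PLP front end would also need critical-band integration of a power spectrum and an inverse DFT of the auditory spectrum to obtain the autocorrelation that the Levinson-Durbin step consumes.

```python
import math


def hz_to_bark(f_hz):
    """Bark warping as commonly quoted for PLP:
    6*ln(w/1200pi + sqrt((w/1200pi)^2 + 1)) with w = 2*pi*f.
    Since w/1200pi == f/600, this equals 6*asinh(f/600)."""
    return 6.0 * math.asinh(f_hz / 600.0)


def equal_loudness(f_hz):
    """Equal-loudness weighting approximating the unequal sensitivity of hearing
    across frequency (formula as commonly quoted; treat as an assumption)."""
    w2 = (2.0 * math.pi * f_hz) ** 2
    return ((w2 + 56.8e6) * w2 ** 2) / ((w2 + 6.3e6) ** 2 * (w2 + 0.38e9))


def intensity_to_loudness(power):
    """Intensity-loudness power law: cube-root-like compression (exponent 0.33)."""
    return power ** 0.33


def levinson_durbin(r, order):
    """All-pole fit from autocorrelation r[0..order].
    Returns predictor coefficients a[1..order], with x[n] ~ sum(a[k] * x[n-k]),
    and the residual prediction-error energy."""
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err  # reflection coefficient for stage i
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:], err
```

In a PLP-style pipeline, `levinson_durbin(r, 5)` would correspond to the 5th-order model mentioned in the abstract, where `r` is the autocorrelation derived from the compressed auditory spectrum.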

2,969 citations

Journal Article
TL;DR: Analysis of the formant data shows numerous differences between the present data and those of PB, both in terms of average frequencies of F1 and F2, and the degree of overlap among adjacent vowels.
Abstract: This study was designed as a replication and extension of the classic study of vowel acoustics by Peterson and Barney (PB) [J. Acoust. Soc. Am. 24, 175–184 (1952)]. Recordings were made of 50 men, 50 women, and 50 children producing the vowels /i, ɪ, ɛ, æ, ɝ, ʌ, ɑ, ɔ, ʊ, u/ in h–V–d syllables. Formant contours for F1–F4 were measured from LPC spectra using a custom interactive editing tool. For comparison with the PB data, formant patterns were sampled at a time that was judged by visual inspection to be maximally steady. Preliminary analysis shows numerous differences between the present data and those of PB, both in terms of average formant frequencies for vowels and the degree of overlap among adjacent vowels. As in the original study, listening tests showed that the signals were nearly always identified as the vowel intended by the talker.
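The study measured formants from LPC spectra with a custom interactive tool; a common automatic alternative, shown here only as an illustration and not as that tool, is to solve for the roots of the LPC polynomial and read each complex pole's angle as a frequency and its radius as a bandwidth. The function name and the 90 Hz / 400 Hz candidate thresholds are hypothetical heuristics chosen for this sketch.

```python
import numpy as np


def formants_from_lpc(a, fs, max_bandwidth=400.0, min_freq=90.0):
    """Candidate formants from predictor coefficients a[1..p], where
    x[n] ~ sum(a[k] * x[n-k]). Roots of A(z) = 1 - sum(a[k] z^-k) are resonances:
    pole angle -> frequency (Hz), pole radius -> 3-dB bandwidth (Hz)."""
    poly = np.concatenate(([1.0], -np.asarray(a, dtype=float)))
    roots = np.roots(poly)
    roots = roots[np.imag(roots) > 0]  # keep one root from each conjugate pair
    freqs = np.angle(roots) * fs / (2.0 * np.pi)
    bws = -np.log(np.abs(roots)) * fs / np.pi
    # Hypothetical screening: drop near-DC poles and very broad resonances.
    keep = (freqs > min_freq) & (bws < max_bandwidth)
    order = np.argsort(freqs[keep])
    return freqs[keep][order], bws[keep][order]
```

Sampling at a "maximally steady" point, as the study did by visual inspection, would then amount to choosing one analysis frame and reporting its lowest few candidates as F1, F2, and so on.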

1,891 citations

Journal Article
TL;DR: A software formant synthesizer is described that can generate synthetic speech using a laboratory digital computer and a control program lets the user specify variable control parameter data, such as formant frequencies as a function of time, as a sequence of 〈time, value〉 points.
Abstract: A software formant synthesizer is described that can generate synthetic speech using a laboratory digital computer. A flexible synthesizer configuration permits the synthesis of sonorants by either a cascade or parallel connection of digital resonators, but frication spectra must be synthesized by a set of resonators connected in parallel. A control program lets the user specify variable control parameter data, such as formant frequencies as a function of time, as a sequence of 〈time, value〉 points. The synthesizer design is described and motivated in Secs. I–III, and Fortran listings for the synthesizer and control program are provided in an appendix. Computer requirements and necessary support software are described in Sec. IV. Strategies for the imitation of any speech utterance are described in Sec. V, and suggested values of control parameters for the synthesis of many English sounds are presented in tabular form.
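The two ideas in this abstract, digital resonators connected in cascade and control parameters given as 〈time, value〉 points, can be sketched as follows. This is a minimal illustration, not the published Fortran: the resonator uses the widely cited second-order form y[n] = A·x[n] + B·y[n-1] + C·y[n-2] with unity gain at 0 Hz, the piecewise-linear interpolation between control points is an assumption, and the class and function names are hypothetical. For simplicity the sketch recomputes coefficients every sample, whereas practical synthesizers typically update parameters at a coarser frame rate.

```python
import math
from bisect import bisect_right


class Resonator:
    """Second-order digital resonator: y[n] = A*x[n] + B*y[n-1] + C*y[n-2]."""

    def __init__(self, fs):
        self.fs = fs
        self.y1 = 0.0  # y[n-1]
        self.y2 = 0.0  # y[n-2]
        self.a = self.b = self.c = 0.0

    def set(self, freq, bw):
        """Set centre frequency and bandwidth (Hz); A is chosen for unity gain at 0 Hz."""
        t = 1.0 / self.fs
        self.c = -math.exp(-2.0 * math.pi * bw * t)
        self.b = 2.0 * math.exp(-math.pi * bw * t) * math.cos(2.0 * math.pi * freq * t)
        self.a = 1.0 - self.b - self.c

    def step(self, x):
        y = self.a * x + self.b * self.y1 + self.c * self.y2
        self.y2, self.y1 = self.y1, y
        return y


def track_value(points, t_ms):
    """Linear interpolation over sorted (time_ms, value) points,
    held constant before the first point and after the last."""
    times = [p[0] for p in points]
    if t_ms <= times[0]:
        return points[0][1]
    if t_ms >= times[-1]:
        return points[-1][1]
    i = bisect_right(times, t_ms)
    (t0, v0), (t1, v1) = points[i - 1], points[i]
    return v0 + (v1 - v0) * (t_ms - t0) / (t1 - t0)


def synthesize_cascade(source, fs, formant_tracks, bandwidths):
    """Pass a source signal through resonators in cascade, one per formant track."""
    resonators = [Resonator(fs) for _ in formant_tracks]
    out = []
    for n, x in enumerate(source):
        t_ms = 1000.0 * n / fs
        y = x
        for res, track, bw in zip(resonators, formant_tracks, bandwidths):
            res.set(track_value(track, t_ms), bw)
            y = res.step(y)
        out.append(y)
    return out
```

For example, `synthesize_cascade(pulses, 10000.0, [[(0, 500), (200, 700)], [(0, 1500)]], [60.0, 90.0])` would glide F1 from 500 Hz to 700 Hz over 200 ms while holding F2 at 1500 Hz.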

1,662 citations

Journal Article
TL;DR: Perceptual validation of the relative importance of acoustic cues for signaling a breathy voice quality has been accomplished using a new voicing source model for synthesis of more natural male and female voices.
Abstract: Voice quality variations include a set of voicing sound source modifications ranging from laryngealized to normal to breathy phonation. Analysis of reiterant imitations of two sentences by ten female and six male talkers has shown that the potential acoustic cues to this type of voice quality variation include: (1) increases to the relative amplitude of the fundamental frequency component as open quotient increases; (2) increases to the amount of aspiration noise that replaces higher frequency harmonics as the arytenoids become more separated; (3) increases to lower formant bandwidths; and (4) introduction of extra pole-zero pairs in the vocal-tract transfer function associated with tracheal coupling. Perceptual validation of the relative importance of these cues for signaling a breathy voice quality has been accomplished using a new voicing source model for synthesis of more natural male and female voices. The new formant synthesizer, KLSYN88, is fully documented here. Results of the perception study indicate that, contrary to previous research which emphasizes the importance of increased amplitude of the fundamental component, aspiration noise is perceptually most important. Without its presence, increases to the fundamental component may induce the sensation of nasality in a high-pitched voice. Further results of the acoustic analysis include the observations that: (1) over the course of a sentence, the acoustic manifestations of breathiness vary considerably, tending to increase for unstressed syllables, in utterance-final syllables, and at the margins of voiceless consonants; (2) on average, females are more breathy than males, but there are very large differences between subjects within each gender; (3) many utterances appear to end in a "breathy-laryngealized" type of vibration; and (4) diplophonic irregularities in the timing of glottal periods occur frequently, especially at the end of an utterance. Diplophonia and other deviations from perfect periodicity may be important aspects of naturalness in synthesis.
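The first cue, a stronger fundamental component relative to the higher harmonics, is often quantified as H1-H2: the level difference in dB between the first and second harmonics. The abstract does not name this measure, so treat it as an assumption; the sketch below, with hypothetical function names, estimates each harmonic amplitude by correlating the signal with a cosine and a sine at that frequency, which is exact when the analysis window spans a whole number of pitch periods.

```python
import math


def harmonic_amplitude(signal, fs, freq):
    """Amplitude of the sinusoidal component at `freq` (Hz), estimated by
    correlation with cos/sin over the whole window (a single DFT bin)."""
    n = len(signal)
    re = sum(x * math.cos(2.0 * math.pi * freq * i / fs) for i, x in enumerate(signal))
    im = sum(x * math.sin(2.0 * math.pi * freq * i / fs) for i, x in enumerate(signal))
    return 2.0 * math.hypot(re, im) / n


def h1_minus_h2(signal, fs, f0):
    """H1-H2 in dB; larger values indicate a relatively stronger fundamental."""
    h1 = harmonic_amplitude(signal, fs, f0)
    h2 = harmonic_amplitude(signal, fs, 2.0 * f0)
    return 20.0 * math.log10(h1 / h2)
```

A real measurement would also need an f0 estimate and, for formant-influenced spectra, a correction for the vocal-tract filter, both of which are outside this sketch.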

1,656 citations

Journal Article
TL;DR: This paper overviews emotional speech recognition with three goals in mind, the first being to provide an up-to-date record of the available emotional speech data collections; it also examines classification techniques that exploit timing information separately from those that ignore it.
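The distinction between classifiers that ignore timing and those that exploit it can be illustrated with two hypothetical feature extractors over a per-frame pitch track. Utterance-level statistics, as a static classifier might use, discard frame order entirely, while frame-to-frame differences preserve it, as a sequence model would require. The function names and the choice of pitch as the feature are assumptions for illustration, not taken from the paper.

```python
import statistics


def global_stats(track):
    """Timing-free representation: summary statistics over all frames."""
    return (statistics.fmean(track), statistics.pstdev(track), min(track), max(track))


def delta_sequence(track):
    """Timing-aware representation: frame-to-frame differences in order."""
    return [b - a for a, b in zip(track, track[1:])]
```

Reversing a contour leaves `global_stats` unchanged but alters `delta_sequence`, which is exactly the information a timing-blind classifier cannot see.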

907 citations


Network Information
Related Topics (5)
Speech perception: 12.3K papers, 545K citations (87% related)
Speech processing: 24.2K papers, 637K citations (85% related)
Microphone: 39.9K papers, 337.3K citations (83% related)
Word recognition: 10.8K papers, 470.2K citations (81% related)
Noise: 110.4K papers, 1.3M citations (79% related)
Performance Metrics
No. of papers in the topic in previous years
Year    Papers
2023    149
2022    323
2021    139
2020    154
2019    180
2018    173