Journal ArticleDOI

Perceptual invariance and onset spectra for stop consonants in different vowel environments

01 Nov 1976, Journal of the Acoustical Society of America (Acoustical Society of America), Vol. 67, Iss. 2, pp. 648-662
TL;DR: A series of perception experiments was conducted to determine whether a brief stimulus in which only the spectral information at onset is preserved provides sufficient cues for identifying place of articulation across vowel contexts and, if so, to further define the nature and size of the spectral window.
Abstract: In this series of perception experiments, we have attempted (a) to determine if a brief stimulus in which only the spectral information at onset is preserved provides sufficient cues for identification of place of articulation across vowel contexts, and (b) if it does, to define further the nature and size of the spectral window. Subjects were randomly presented with synthetically produced stimuli consisting of a 5‐ or 10‐msec noise burst followed by a brief voiced interval containing three formant transitions with onset and offset characteristics appropriate to the consonants [b, d, g] in the environment of the vowels [a, i, u], as well as stimuli with steady second‐ and third‐formant transitions. The length of the voiced interval was systematically varied from 40 to 5 msec. The results indicate that an onset spectrum consisting of the burst plus the initial 5 or 10 msec of voicing provides sufficient cues for the identification of the stop consonant, and that vocalic information can be reliably derived from these brief stimuli containing only one or two glottal pulses. [Research supported by an NIH grant.]
Citations
Book
01 Jan 1995
TL;DR: The authors present a time-delay neural network (TDNN) approach to phoneme recognition which is characterized by two important properties: using a three-layer arrangement of simple computing units, a hierarchy can be constructed that allows for the formation of arbitrary nonlinear decision surfaces, which the TDNN learns automatically using error backpropagation.
Abstract: The authors present a time-delay neural network (TDNN) approach to phoneme recognition which is characterized by two important properties: (1) using a three-layer arrangement of simple computing units, a hierarchy can be constructed that allows for the formation of arbitrary nonlinear decision surfaces, which the TDNN learns automatically using error backpropagation; and (2) the time-delay arrangement enables the network to discover acoustic-phonetic features and the temporal relationships between them independently of position in time and therefore not blurred by temporal shifts in the input. As a recognition task, the speaker-dependent recognition of the phonemes B, D, and G in varying phonetic contexts was chosen. For comparison, several discrete hidden Markov models (HMM) were trained to perform the same task. Performance evaluation over 1946 testing tokens from three speakers showed that the TDNN achieves a recognition rate of 98.5% correct while the rate obtained by the best of the HMMs was only 93.7%.

2,512 citations
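The time-delay arrangement the abstract describes amounts to sliding one shared weight window along the time axis of a spectral input, so the same detector fires regardless of when a feature occurs. A minimal sketch of that idea follows; the layer sizes, frame counts, and the use of ReLU are illustrative assumptions, not the configuration of the original TDNN (which used sigmoid units and specific melscale filterbank dimensions):

```python
import numpy as np

rng = np.random.default_rng(0)

def tdnn_layer(x, w, b):
    """One time-delay layer: x is (frames, coeffs), w is
    (delay, coeffs, units), b is (units,). The same weights are
    applied at every time step, giving shift invariance."""
    delay = w.shape[0]
    n_out = x.shape[0] - delay + 1
    out = np.empty((n_out, w.shape[2]))
    for t in range(n_out):
        # Contract the (delay, coeffs) window against the weights.
        out[t] = np.einsum('dc,dcu->u', x[t:t + delay], w) + b
    return np.maximum(out, 0.0)  # ReLU here; the original used sigmoids

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Toy input: 15 spectral frames of 16 coefficients each.
x = rng.standard_normal((15, 16))
h1 = tdnn_layer(x, rng.standard_normal((3, 16, 8)) * 0.1, np.zeros(8))
h2 = tdnn_layer(h1, rng.standard_normal((5, 8, 3)) * 0.1, np.zeros(3))

# Integrate evidence over time, then score the three phoneme classes.
scores = softmax(h2.sum(axis=0))  # one score each for B, D, G
```

Structurally this is a 1-D convolution over time followed by temporal pooling, which is why the TDNN is often cited as a forerunner of convolutional networks for speech.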

Journal ArticleDOI
TL;DR: In this article, the authors presented a time-delay neural network (TDNN) approach to phoneme recognition, which is characterized by two important properties: (1) using a three-layer arrangement of simple computing units, a hierarchy can be constructed that allows for the formation of arbitrary nonlinear decision surfaces, which the TDNN learns automatically using error backpropagation; and (2) the time delay arrangement enables the network to discover acoustic-phonetic features and the temporal relationships between them independently of position in time and therefore not blurred by temporal shifts in the input
Abstract: The authors present a time-delay neural network (TDNN) approach to phoneme recognition which is characterized by two important properties: (1) using a three-layer arrangement of simple computing units, a hierarchy can be constructed that allows for the formation of arbitrary nonlinear decision surfaces, which the TDNN learns automatically using error backpropagation; and (2) the time-delay arrangement enables the network to discover acoustic-phonetic features and the temporal relationships between them independently of position in time and therefore not blurred by temporal shifts in the input. As a recognition task, the speaker-dependent recognition of the phonemes B, D, and G in varying phonetic contexts was chosen. For comparison, several discrete hidden Markov models (HMM) were trained to perform the same task. Performance evaluation over 1946 testing tokens from three speakers showed that the TDNN achieves a recognition rate of 98.5% correct while the rate obtained by the best of the HMMs was only 93.7%.

2,319 citations

Journal ArticleDOI
TL;DR: This work found that listening to synthesized speech stimuli varying in small and acoustically equal steps evoked distinct and invariant cortical population response patterns that were organized by their sensitivities to critical acoustic features.
Abstract: Speech perception requires the rapid and effortless extraction of meaningful phonetic information from a highly variable acoustic signal. A powerful example of this phenomenon is categorical speech perception, in which a continuum of acoustically varying sounds is transformed into perceptually distinct phoneme categories. We found that the neural representation of speech sounds is categorically organized in the human posterior superior temporal gyrus. Using intracranial high-density cortical surface arrays, we found that listening to synthesized speech stimuli varying in small and acoustically equal steps evoked distinct and invariant cortical population response patterns that were organized by their sensitivities to critical acoustic features. Phonetic category boundaries were similar between neurometric and psychometric functions. Although speech-sound responses were distributed, spatially discrete cortical loci were found to underlie specific phonetic discrimination. Our results provide direct evidence for acoustic-to-higher order phonetic level encoding of speech sounds in human language receptive cortex.

520 citations

Journal ArticleDOI
TL;DR: Based in part on previous studies of the speech of the hearing impaired, a profile has been designed to direct research on the acoustic and physiologic correlates of intelligibility impairment in dysarthria, and a word intelligibility test is proposed for use with dysarthric speakers.
Abstract: The measurement of intelligibility in dysarthric individuals is a major concern in clinical assessment and management and in research on dysarthria. The measurement objective is complicated by the ...

483 citations