Selective cortical representation of attended speaker in multi-talker speech perception
Nima Mesgarani, Edward F. Chang
TLDR
It is demonstrated that population responses in non-primary human auditory cortex encode critical features of attended speech: speech spectrograms reconstructed based on cortical responses to the mixture of speakers reveal the salient spectral and temporal features of the attended speaker, as if subjects were listening to that speaker alone.

Abstract
The neural correlates of how attended speech is internally represented are described, shedding light on the 'cocktail party problem'.

The 'cocktail party problem' — the question of what goes on in our brains when we listen selectively for one person's voice while ignoring many others — has puzzled researchers from various disciplines for years. Using electrophysiological recordings from neurosurgery patients listening to two speakers simultaneously, Nima Mesgarani and Edward Chang determine the neural correlates associated with the internal representation of attended speech. They find that neural responses in the auditory cortex represent the attended voice robustly, almost as if the second voice were not there. With these patterns established, a simple algorithm trained on various speakers predicts which stimulus a subject is attending to, on the basis of the patterns emerging in secondary auditory cortex. These results suggest that speech representation in the brain reflects not only the acoustic environment but also the listener's understanding of these signals. As well as shedding light on a long-standing neurobiological problem, this work may give clues as to how automatic speech recognition might be improved to cope with more than one talker.

Humans possess a remarkable ability to attend to a single speaker's voice in a multi-talker background1,2,3. How the auditory system manages to extract intelligible speech under such acoustically complex and adverse listening conditions is not known, and, indeed, it is not clear how attended speech is internally represented4,5.
Here, using multi-electrode surface recordings from the cortex of subjects engaged in a listening task with two simultaneous speakers, we demonstrate that population responses in non-primary human auditory cortex encode critical features of attended speech: speech spectrograms reconstructed based on cortical responses to the mixture of speakers reveal the salient spectral and temporal features of the attended speaker, as if subjects were listening to that speaker alone. A simple classifier trained solely on examples of single speakers can decode both attended words and speaker identity. We find that task performance is well predicted by a rapid increase in attention-modulated neural selectivity across both single-electrode and population-level cortical responses. These findings demonstrate that the cortical representation of speech does not merely reflect the external acoustic environment, but instead gives rise to the perceptual aspects relevant for the listener's intended goal.
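The reconstruct-then-compare logic described in the abstract can be sketched on simulated data: a linear decoder is fit on responses to a single speaker, a spectrogram is reconstructed from responses to the mixture, and the reconstruction is correlated against each speaker's clean spectrogram to identify the attended one. Everything below is an illustrative assumption — the array sizes, the random "spectrograms", the linear encoding model, and the ridge decoder are stand-ins, not the paper's actual recordings or pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes (assumed): time samples, electrodes, spectrogram channels.
n_t, n_elec, n_freq = 4000, 32, 16

spec_att = rng.standard_normal((n_t, n_freq))   # attended speaker's spectrogram
spec_ign = rng.standard_normal((n_t, n_freq))   # ignored speaker's spectrogram

# Simulated population response: dominated by the attended spectrogram,
# standing in for attention-modulated high-gamma cortical activity.
enc = rng.standard_normal((n_freq, n_elec))
resp = spec_att @ enc + 0.2 * spec_ign @ enc \
     + 0.5 * rng.standard_normal((n_t, n_elec))

def ridge(X, Y, alpha=10.0):
    """Closed-form ridge regression: W = (X'X + aI)^-1 X'Y."""
    return np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ Y)

half = n_t // 2
W = ridge(resp[:half], spec_att[:half])         # fit decoder on training half
recon = resp[half:] @ W                         # reconstruct spectrogram on test half

def corr(a, b):
    """Pearson correlation between two flattened spectrograms."""
    a, b = a.ravel() - a.mean(), b.ravel() - b.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

r_att = corr(recon, spec_att[half:])
r_ign = corr(recon, spec_ign[half:])
print(r_att > r_ign)  # reconstruction matches the attended speaker
```

Comparing `r_att` and `r_ign` is the classifier step in miniature: because the simulated response weights the attended stream more heavily, the reconstruction correlates far better with the attended spectrogram, so a simple correlation comparison decodes which speaker was attended.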
Citations
Journal Article
Cortical oscillations and sensory predictions
TL;DR: It is argued that neural rhythms offer distinct and adapted computational solutions to predicting 'what' is going to happen in the sensory environment and 'when'.
Journal Article
Mechanisms Underlying Selective Neuronal Tracking of Attended Speech at a “Cocktail Party”
Elana Zion Golumbic, Nai Ding, Stephan Bickel, Peter Lakatos, Catherine A. Schevon, Guy M. McKhann, Robert R. Goodman, Ronald G. Emerson, Ashesh D. Mehta, Jonathan Z. Simon, David Poeppel, Charles E. Schroeder
TL;DR: It is found that brain activity dynamically tracks speech streams using both low-frequency phase and high-frequency amplitude fluctuations and that optimal encoding likely combines the two.
Journal Article
Emergence of neural encoding of auditory objects while listening to competing speakers
Nai Ding, Jonathan Z. Simon
TL;DR: Recording from subjects selectively listening to one of two competing speakers using magnetoencephalography indicates that concurrent auditory objects, even if spectrotemporally overlapping and not resolvable at the auditory periphery, are neurally encoded individually in auditory cortex and emerge as fundamental representational units for top-down attentional modulation and bottom-up neural adaptation.
Journal Article
The Ease of Language Understanding (ELU) model: Theoretical, empirical, and clinical advances
Jerker Rönnberg, Thomas Lunner, Adriana A. Zekveld, Patrik Sörqvist, Henrik Danielsson, Björn Lyxell, Örjan Dahlström, Carine Signoret, Stefan Stenfelt, M. Kathleen Pichora-Fuller, Mary Rudner
TL;DR: This paper examines the Ease of Language Understanding model in light of new behavioral and neural findings concerning the role of working memory capacity (WMC) in uni-modal and bimodal language processing.
Journal Article
Attentional Selection in a Cocktail Party Environment Can Be Decoded from Single-Trial EEG
James O’Sullivan, Alan J. Power, Nima Mesgarani, Siddharth Rajaram, John J. Foxe, Barbara G. Shinn-Cunningham, Malcolm Slaney, Shihab A. Shamma, Edmund C. Lalor
TL;DR: It is shown that single-trial unaveraged EEG data can be decoded to determine attentional selection in a naturalistic multispeaker environment and a significant correlation between the EEG-based measure of attention and performance on a high-level attention task is shown.
References
Journal Article
Some Experiments on the Recognition of Speech, with One and with Two Ears
TL;DR: In this paper, the relation between the messages received by the two ears was investigated, and two types of test were reported: (a) the behavior of a listener presented with two speech signals simultaneously (a statistical filtering problem), and (b) the behavior of a listener presented with a different speech signal at each ear.
Book
Auditory Scene Analysis: The Perceptual Organization of Sound
TL;DR: Auditory Scene Analysis addresses the problem of hearing in complex auditory environments, using a series of creative analogies to describe the process required of the human auditory system as it analyzes mixtures of sounds to recover descriptions of individual sounds.
Dataset
TIMIT Acoustic-Phonetic Continuous Speech Corpus
John S. Garofolo, Lori Lamel, William M. Fisher, Jonathan G. Fiscus, David S. Pallett, Nancy L. Dahlgren, Victor W. Zue
TL;DR: The TIMIT corpus contains broadband recordings of 630 speakers of eight major dialects of American English, each reading ten phonetically rich sentences, including time-aligned orthographic, phonetic and word transcriptions as well as a 16-bit, 16 kHz speech waveform file for each utterance.
Journal Article
Reading a neural code
TL;DR: Here the neural code was characterized from the point of view of the organism, culminating in algorithms for real-time stimulus estimation based on a single example of the spike train, applied to an identified movement-sensitive neuron in the fly visual system.