Selective cortical representation of attended speaker in multi-talker speech perception
Nima Mesgarani, Edward F. Chang
TLDR
It is demonstrated that population responses in non-primary human auditory cortex encode critical features of attended speech: speech spectrograms reconstructed based on cortical responses to the mixture of speakers reveal the salient spectral and temporal features of the attended speaker, as if subjects were listening to that speaker alone.

Abstract
The neural correlates of how attended speech is internally represented are described, shedding light on the 'cocktail party problem'.

The 'cocktail party problem' — the question of what goes on in our brains when we listen selectively for one person's voice while ignoring many others — has puzzled researchers from various disciplines for years. Using electrophysiological recordings from neurosurgery patients listening to two speakers simultaneously, Nima Mesgarani and Edward Chang determine the neural correlates associated with the internal representation of attended speech. They find that neural responses in the auditory cortex represent the attended voice robustly, almost as if the second voice were not there. With these patterns established, a simple algorithm trained on various speakers predicts which stimulus a subject is attending to, on the basis of the patterns emerging in secondary auditory cortex. These results suggest that speech representation in the brain reflects not only the acoustic environment but also the listener's understanding of these signals. As well as shedding light on a long-standing neurobiological problem, this work may give clues as to how automatic speech recognition might be improved to cope with more than one talker.

Humans possess a remarkable ability to attend to a single speaker's voice in a multi-talker background1,2,3. How the auditory system manages to extract intelligible speech under such acoustically complex and adverse listening conditions is not known, and, indeed, it is not clear how attended speech is internally represented4,5.
Here, using multi-electrode surface recordings from the cortex of subjects engaged in a listening task with two simultaneous speakers, we demonstrate that population responses in non-primary human auditory cortex encode critical features of attended speech: speech spectrograms reconstructed based on cortical responses to the mixture of speakers reveal the salient spectral and temporal features of the attended speaker, as if subjects were listening to that speaker alone. A simple classifier trained solely on examples of single speakers can decode both attended words and speaker identity. We find that task performance is well predicted by a rapid increase in attention-modulated neural selectivity across both single-electrode and population-level cortical responses. These findings demonstrate that the cortical representation of speech does not merely reflect the external acoustic environment, but instead gives rise to the perceptual aspects relevant for the listener's intended goal.
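The reconstruct-then-compare logic described in the abstract can be sketched on simulated data: a linear decoder is fit on responses to a single speaker, a spectrogram is reconstructed from responses to the mixture, and the reconstruction is correlated against each speaker's clean spectrogram to identify the attended one. Everything below is an illustrative assumption — the array sizes, the random "spectrograms", the linear encoding model, and the ridge decoder are stand-ins, not the paper's actual recordings or pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes (assumed): time samples, electrodes, spectrogram channels.
n_t, n_elec, n_freq = 4000, 32, 16

spec_att = rng.standard_normal((n_t, n_freq))   # attended speaker's spectrogram
spec_ign = rng.standard_normal((n_t, n_freq))   # ignored speaker's spectrogram

# Simulated population response: dominated by the attended spectrogram,
# standing in for attention-modulated high-gamma cortical activity.
enc = rng.standard_normal((n_freq, n_elec))
resp = spec_att @ enc + 0.2 * spec_ign @ enc \
     + 0.5 * rng.standard_normal((n_t, n_elec))

def ridge(X, Y, alpha=10.0):
    """Closed-form ridge regression: W = (X'X + aI)^-1 X'Y."""
    return np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ Y)

half = n_t // 2
W = ridge(resp[:half], spec_att[:half])         # fit decoder on training half
recon = resp[half:] @ W                         # reconstruct spectrogram on test half

def corr(a, b):
    """Pearson correlation between two flattened spectrograms."""
    a, b = a.ravel() - a.mean(), b.ravel() - b.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

r_att = corr(recon, spec_att[half:])
r_ign = corr(recon, spec_ign[half:])
print(r_att > r_ign)  # reconstruction matches the attended speaker
```

Comparing `r_att` and `r_ign` is the classifier step in miniature: because the simulated response weights the attended stream more heavily, the reconstruction correlates far better with the attended spectrogram, so a simple correlation comparison decodes which speaker was attended.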
Citations
Journal Article
Cortical oscillations and sensory predictions
TL;DR: It is argued that neural rhythms offer distinct and adapted computational solutions to predicting 'what' is going to happen in the sensory environment and 'when'.
Journal Article
Mechanisms Underlying Selective Neuronal Tracking of Attended Speech at a “Cocktail Party”
Elana Zion Golumbic, Nai Ding, Stephan Bickel, Peter Lakatos, Catherine A. Schevon, Guy M. McKhann, Robert R. Goodman, Ronald G. Emerson, Ashesh D. Mehta, Jonathan Z. Simon, David Poeppel, Charles E. Schroeder
TL;DR: It is found that brain activity dynamically tracks speech streams using both low-frequency phase and high-frequency amplitude fluctuations and that optimal encoding likely combines the two.
Journal Article
Emergence of neural encoding of auditory objects while listening to competing speakers
Nai Ding, Jonathan Z. Simon
TL;DR: Recording from subjects selectively listening to one of two competing speakers using magnetoencephalography indicates that concurrent auditory objects, even if spectrotemporally overlapping and not resolvable at the auditory periphery, are neurally encoded individually in auditory cortex and emerge as fundamental representational units for top-down attentional modulation and bottom-up neural adaptation.
Journal Article
The Ease of Language Understanding (ELU) model: Theoretical, empirical, and clinical advances
Jerker Rönnberg, Thomas Lunner, Adriana A. Zekveld, Patrik Sörqvist, Henrik Danielsson, Björn Lyxell, Örjan Dahlström, Carine Signoret, Stefan Stenfelt, M. Kathleen Pichora-Fuller, Mary Rudner
TL;DR: This paper examines the Ease of Language Understanding model in light of new behavioral and neural findings concerning the role of working memory capacity (WMC) in uni-modal and bimodal language processing.
Journal Article
Attentional Selection in a Cocktail Party Environment Can Be Decoded from Single-Trial EEG
James O’Sullivan, Alan J. Power, Nima Mesgarani, Siddharth Rajaram, John J. Foxe, Barbara G. Shinn-Cunningham, Malcolm Slaney, Shihab A. Shamma, Edmund C. Lalor
TL;DR: It is shown that single-trial unaveraged EEG data can be decoded to determine attentional selection in a naturalistic multispeaker environment and a significant correlation between the EEG-based measure of attention and performance on a high-level attention task is shown.
References
Journal Article
Some Experiments on the Recognition of Speech, with One and with Two Ears
TL;DR: In this paper, the relation between the messages received by the two ears was investigated, and two types of test were reported: (a) the behavior of a listener presented with two speech signals simultaneously (a statistical filtering problem), and (b) the behavior of a listener presented with a different speech signal at each ear.
Book
Auditory Scene Analysis: The Perceptual Organization of Sound
TL;DR: Auditory Scene Analysis addresses the problem of hearing in complex auditory environments, using a series of creative analogies to describe the process required of the human auditory system as it analyzes mixtures of sounds to recover descriptions of individual sounds.
Dataset
TIMIT Acoustic-Phonetic Continuous Speech Corpus
John S. Garofolo, Lori Lamel, William M. Fisher, Jonathan G. Fiscus, David S. Pallett, Nancy L. Dahlgren, Victor W. Zue
TL;DR: The TIMIT corpus contains broadband recordings of 630 speakers of eight major dialects of American English, each reading ten phonetically rich sentences, including time-aligned orthographic, phonetic and word transcriptions as well as a 16-bit, 16 kHz speech waveform file for each utterance.
Journal Article
Reading a neural code
TL;DR: Here the neural code was characterized from the point of view of the organism, culminating in algorithms for real-time stimulus estimation based on a single example of the spike train, applied to an identified movement-sensitive neuron in the fly visual system.