Open Access Journal Article (DOI)

Selective cortical representation of attended speaker in multi-talker speech perception

Nima Mesgarani, Edward Chang
10 May 2012 · Vol. 485, Iss. 7397, pp. 233-236
TLDR
It is demonstrated that population responses in non-primary human auditory cortex encode critical features of attended speech: speech spectrograms reconstructed based on cortical responses to the mixture of speakers reveal the salient spectral and temporal features of the attended speaker, as if subjects were listening to that speaker alone.
Abstract
The neural correlates of how attended speech is internally represented are described, shedding light on the ‘cocktail party problem’. The ‘cocktail party problem’ (the question of what goes on in our brains when we listen selectively for one person's voice while ignoring many others) has puzzled researchers from various disciplines for years. Using electrophysiological recordings from neurosurgery patients listening to two speakers simultaneously, Nima Mesgarani and Edward Chang determine the neural correlates associated with the internal representation of attended speech. They find that the neural responses in the auditory cortex represent the attended voice robustly, almost as if the second voice were not there. With these patterns established, a simple algorithm trained on various speakers predicts which stimulus a subject is attending to, on the basis of the patterns emerging in the secondary auditory cortex. These results suggest that speech representation in the brain reflects not only the acoustic environment, but also the listener's understanding of these signals. As well as shedding light on a long-standing neurobiological problem, this work may give clues as to how automatic speech recognition might be improved to cope with more than one talker.

Humans possess a remarkable ability to attend to a single speaker's voice in a multi-talker background [1-3]. How the auditory system manages to extract intelligible speech under such acoustically complex and adverse listening conditions is not known, and, indeed, it is not clear how attended speech is internally represented [4,5]. Here, using multi-electrode surface recordings from the cortex of subjects engaged in a listening task with two simultaneous speakers, we demonstrate that population responses in non-primary human auditory cortex encode critical features of attended speech: speech spectrograms reconstructed based on cortical responses to the mixture of speakers reveal the salient spectral and temporal features of the attended speaker, as if subjects were listening to that speaker alone. A simple classifier trained solely on examples of single speakers can decode both attended words and speaker identity. We find that task performance is well predicted by a rapid increase in attention-modulated neural selectivity across both single-electrode and population-level cortical responses. These findings demonstrate that the cortical representation of speech does not merely reflect the external acoustic environment, but instead gives rise to the perceptual aspects relevant for the listener's intended goal.
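To make the reconstruction-and-decoding approach concrete, the sketch below illustrates the general family of methods the abstract describes: a time-lagged linear (ridge-regression) reverse model that maps multi-electrode cortical responses back to a speech spectrogram, followed by a correlation-based rule that labels the attended speaker. This is a minimal sketch of the technique, not the authors' exact pipeline; the array shapes, lag range, regularization strength, and the decode_attended decision rule are illustrative assumptions.

```python
import numpy as np

def lagged_design(neural, lags):
    """Stack time-lagged copies of every electrode's response.

    neural : (T, E) array, e.g. high-gamma power at E electrodes.
    lags   : iterable of integer sample lags (positive = past samples).
    Returns a (T, E * len(lags)) design matrix with zero-padded edges.
    """
    T, E = neural.shape
    cols = []
    for lag in lags:
        shifted = np.zeros((T, E))
        if lag > 0:
            shifted[lag:] = neural[:-lag]
        elif lag < 0:
            shifted[:lag] = neural[-lag:]
        else:
            shifted[:] = neural
        cols.append(shifted)
    return np.hstack(cols)

def fit_reverse_model(neural, spectrogram, lags, ridge=1.0):
    """Ridge regression from lagged neural responses to the stimulus
    spectrogram (T x F, F frequency bands); returns the weight matrix."""
    X = lagged_design(neural, lags)
    XtX = X.T @ X + ridge * np.eye(X.shape[1])
    return np.linalg.solve(XtX, X.T @ spectrogram)

def reconstruct(neural, weights, lags):
    """Apply a fitted reverse model to new neural responses."""
    return lagged_design(neural, lags) @ weights

def decode_attended(recon, spec_a, spec_b):
    """Illustrative decision rule: the speaker whose clean spectrogram
    (same shape as the reconstruction) correlates best with the
    reconstruction is labelled as attended."""
    r_a = np.corrcoef(recon.ravel(), spec_a.ravel())[0, 1]
    r_b = np.corrcoef(recon.ravel(), spec_b.ravel())[0, 1]
    return ("speaker A", r_a) if r_a >= r_b else ("speaker B", r_b)
```

In the spirit of the experiment described above, the reverse model would be fitted on single-speaker trials, and decode_attended would then be applied to spectrograms reconstructed from the mixture trials, mirroring the classifier trained solely on examples of single speakers.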



Citations
Journal Article

Cortical oscillations and sensory predictions

TL;DR: It is argued that neural rhythms offer distinct and adapted computational solutions to predicting 'what' is going to happen in the sensory environment and 'when'.
Journal Article

Emergence of neural encoding of auditory objects while listening to competing speakers

TL;DR: Magnetoencephalography recordings from subjects selectively listening to one of two competing speakers indicate that concurrent auditory objects, even if spectrotemporally overlapping and not resolvable at the auditory periphery, are neurally encoded individually in auditory cortex and emerge as fundamental representational units for top-down attentional modulation and bottom-up neural adaptation.
Journal Article

The Ease of Language Understanding (ELU) model: Theoretical, empirical, and clinical advances

TL;DR: This paper examines the Ease of Language Understanding model in light of new behavioral and neural findings concerning the role of working memory capacity (WMC) in uni-modal and bimodal language processing.
Journal Article

Attentional Selection in a Cocktail Party Environment Can Be Decoded from Single-Trial EEG

TL;DR: It is shown that single-trial, unaveraged EEG data can be decoded to determine attentional selection in a naturalistic multi-speaker environment, and that this EEG-based measure of attention correlates significantly with performance on a high-level attention task.
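As a rough illustration of how such a single-trial decoder can operate (not the study's exact method), the sketch below assumes a backward model of the same time-lagged ridge-regression form as the cortical sketch near the abstract has already been trained to reconstruct a speech envelope from EEG; attention is then assigned to the speaker whose clean envelope correlates best with the reconstruction. The envelope extraction and window sizes are simplifying assumptions made here.

```python
import numpy as np

def broadband_envelope(audio, rate=16000, win_ms=40, hop_ms=10):
    """Crude amplitude envelope: rectify, moving-average over ~win_ms,
    then downsample by hop_ms. Illustrative, not the paper's filterbank."""
    win = int(rate * win_ms / 1000)
    hop = int(rate * hop_ms / 1000)
    env = np.convolve(np.abs(audio), np.ones(win) / win, mode="same")
    return env[::hop]

def attended_from_envelopes(reconstructed_env, env_a, env_b):
    """Label the attended speaker as the one whose clean envelope
    correlates best with the envelope reconstructed from EEG."""
    n = min(len(reconstructed_env), len(env_a), len(env_b))
    r_a = np.corrcoef(reconstructed_env[:n], env_a[:n])[0, 1]
    r_b = np.corrcoef(reconstructed_env[:n], env_b[:n])[0, 1]
    return "A" if r_a >= r_b else "B"
```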
References
Journal Article

Some Experiments on the Recognition of Speech, with One and with Two Ears

TL;DR: In this paper, the relation between the messages received by the two ears was investigated, and two types of test were reported: (a) the behavior of a listener presented with two speech signals simultaneously (a statistical filtering problem) and (b) the behavior of a listener presented with a different speech signal at each ear.
Book

Auditory Scene Analysis: The Perceptual Organization of Sound

TL;DR: Auditory Scene Analysis addresses the problem of hearing in complex auditory environments, using a series of creative analogies to describe the process required of the human auditory system as it analyzes mixtures of sounds to recover descriptions of individual sounds.
Dataset

TIMIT Acoustic-Phonetic Continuous Speech Corpus

TL;DR: The TIMIT corpus contains broadband recordings of 630 speakers of eight major dialects of American English, each reading ten phonetically rich sentences; it includes time-aligned orthographic, phonetic, and word transcriptions as well as a 16-bit, 16 kHz speech waveform file for each utterance.
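For readers who want to work with the corpus directly, a minimal loading sketch follows. It assumes the usual directory layout of the LDC release (one path stem per utterance with .WAV, .PHN, and .WRD files, each transcription line reading start_sample end_sample label) and that the waveforms are readable through libsndfile's NIST SPHERE support; file-name casing and audio container can vary between distributions.

```python
import soundfile as sf  # libsndfile can read TIMIT's NIST SPHERE .WAV files

def read_segments(path):
    """Parse a TIMIT .PHN or .WRD file: 'start_sample end_sample label' per line."""
    with open(path) as f:
        rows = [line.split() for line in f if line.strip()]
    return [(int(start), int(end), label) for start, end, label in rows]

def load_utterance(stem):
    """Load one utterance given its path stem, e.g.
    'TIMIT/TRAIN/DR1/FCJF0/SA1' (hypothetical location).
    Returns the 16 kHz waveform plus phone and word alignments in samples."""
    audio, rate = sf.read(stem + ".WAV")   # 16-bit, 16 kHz mono
    phones = read_segments(stem + ".PHN")  # time-aligned phonetic transcription
    words = read_segments(stem + ".WRD")   # time-aligned word transcription
    return audio, rate, phones, words
```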
Journal Article

Reading a neural code

TL;DR: The neural code is characterized from the point of view of the organism, culminating in algorithms for real-time stimulus estimation from a single example of the spike train, applied to an identified movement-sensitive neuron in the fly visual system.
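The first-order linear decoding idea from that line of work can be sketched as follows: estimate the stimulus by superposing a decoding kernel at every spike time, with the kernel itself fitted by least squares on a training stretch of simultaneously recorded spikes and stimulus. The kernel length, centering, and regularization below are illustrative assumptions rather than the original study's parameters.

```python
import numpy as np

def fit_decoding_kernel(spikes, stimulus, length=129, ridge=1e-3):
    """Least-squares decoding kernel K(lag) from a training stretch of
    simultaneously recorded spikes (0/1 per time bin) and stimulus.
    Lags run from -length//2 to +length//2 (acausal); np.roll wraps the
    edges, a small artifact acceptable for a sketch."""
    lags = np.arange(length) - length // 2
    X = np.stack([np.roll(spikes, lag) for lag in lags], axis=1)
    XtX = X.T @ X + ridge * np.eye(length)
    return np.linalg.solve(XtX, X.T @ stimulus)

def decode_stimulus(spikes, kernel):
    """First-order linear reconstruction: s_est(t) = sum_i K(t - t_i),
    i.e. superpose the kernel at every spike time, using the same
    centered-lag convention as fit_decoding_kernel."""
    center = len(kernel) // 2
    full = np.convolve(spikes, kernel, mode="full")
    return full[center:center + len(spikes)]
```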