Journal ArticleDOI

Neural coding of continuous speech in auditory cortex during monaural and dichotic listening

01 Jan 2012-Journal of Neurophysiology (American Physiological Society)-Vol. 107, Iss: 1, pp 78-89
TL;DR: These findings characterize how the spectrotemporal features of speech are encoded in human auditory cortex and establish a single-trial-based paradigm to study the neural basis underlying the cocktail party phenomenon.
Abstract: The cortical representation of the acoustic features of continuous speech is the foundation of speech perception. In this study, noninvasive magnetoencephalography (MEG) recordings are obtained from human subjects actively listening to spoken narratives, in both simple and cocktail party-like auditory scenes. By modeling how acoustic features of speech are encoded in ongoing MEG activity as a spectrotemporal response function, we demonstrate that the slow temporal modulations of speech in a broad spectral region are represented bilaterally in auditory cortex by a phase-locked temporal code. For speech presented monaurally to either ear, this phase-locked response is always more faithful in the right hemisphere, but with a shorter latency in the hemisphere contralateral to the stimulated ear. When different spoken narratives are presented to each ear simultaneously (dichotic listening), the resulting cortical neural activity precisely encodes the acoustic features of both of the spoken narratives, but slightly weakened and delayed compared with the monaural response. Critically, the early sensory response to the attended speech is considerably stronger than that to the unattended speech, demonstrating top-down attentional gain control. This attentional gain is substantial even during the subjects' very first exposure to the speech mixture and therefore largely independent of knowledge of the speech content. Together, these findings characterize how the spectrotemporal features of speech are encoded in human auditory cortex and establish a single-trial-based paradigm to study the neural basis underlying the cocktail party phenomenon.
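The core analysis described above models how acoustic features of speech map onto ongoing MEG activity as a spectrotemporal response function (STRF). As a rough illustration of that idea only, the sketch below fits a linear STRF by ridge regression on time-lagged spectrogram features; the toy data, sampling rate, lag range, and regularization value are assumptions for illustration and not the authors' actual estimation procedure.

```python
# Minimal sketch of spectrotemporal response function (STRF) estimation by
# ridge regression, assuming a precomputed auditory spectrogram and a single
# MEG channel. All data here are random placeholders.
import numpy as np

rng = np.random.default_rng(0)
fs = 100                        # sampling rate of spectrogram/MEG features (Hz)
n_t, n_freq = 6000, 8           # 60 s of data, 8 spectral bands
spectrogram = rng.random((n_t, n_freq))     # stimulus representation S(t, f)
meg = rng.standard_normal(n_t)              # one MEG sensor/component r(t)

max_lag = int(0.5 * fs)         # model lags 0-500 ms

# Build the lagged design matrix X[t, (lag, f)] = S(t - lag, f)
X = np.zeros((n_t, n_freq * max_lag))
for lag in range(max_lag):
    X[lag:, lag * n_freq:(lag + 1) * n_freq] = spectrogram[: n_t - lag]

# Ridge-regularized least squares: w = (X'X + lambda*I)^-1 X'r
lam = 1e2
w = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ meg)
strf = w.reshape(max_lag, n_freq)           # lag x frequency response function

# Predictive power: correlation between predicted and measured response
pred = X @ w
print("prediction r =", np.corrcoef(pred, meg)[0, 1])
```

In practice the model is fit and evaluated on separate data, and its predictive power (the correlation between predicted and measured responses) quantifies how faithfully the cortical response tracks the stimulus.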

Citations
Journal ArticleDOI
TL;DR: It is found that, during listening to connected speech, cortical activity of different timescales concurrently tracked the time course of abstract linguistic structures at different hierarchical levels, such as words, phrases and sentences.
Abstract: The most critical attribute of human language is its unbounded combinatorial nature: smaller elements can be combined into larger structures on the basis of a grammatical system, resulting in a hierarchy of linguistic units, such as words, phrases and sentences. Mentally parsing and representing such structures, however, poses challenges for speech comprehension. In speech, hierarchical linguistic structures do not have boundaries that are clearly defined by acoustic cues and must therefore be internally and incrementally constructed during comprehension. We found that, during listening to connected speech, cortical activity of different timescales concurrently tracked the time course of abstract linguistic structures at different hierarchical levels, such as words, phrases and sentences. Notably, the neural tracking of hierarchical linguistic structures was dissociated from the encoding of acoustic cues and from the predictability of incoming words. Our results indicate that a hierarchy of neural processing timescales underlies grammar-based internal construction of hierarchical linguistic structure.

749 citations

Journal ArticleDOI
06 Mar 2013-Neuron
TL;DR: It is found that brain activity dynamically tracks speech streams using both low-frequency phase and high-frequency amplitude fluctuations and that optimal encoding likely combines the two.

730 citations


Cites background or result from "Neural coding of continuous speech ..."

  • ...This selectivity itself seems to sharpen as a sentence unfolds....

  • ...…in regions closer to auditory cortex are consistent with findings that sensory areas maintain representations for both attended and ignored speech (Ding and Simon, 2012b), as well as with classic findings for modulation of simple sensory responses by attention (Hillyard et al., 1973; Woldorff et…...

  • ...We speculate that this reflects contamination from onset responses which occur during the beginning of the sentence (Ding and Simon, 2012b) as a larger response during the first epoch is found across both electrode groups and frequency bands....

  • ...Using converging analytic approaches we confirm that both low-frequency phase (Ding and Simon, 2012b; Kerlin et al., 2010) and high gamma power (Mesgarani and Chang, 2012) concurrently track the envelope of attended speech....

  • ...This is in line with previous findings that both LF and HGp speech tracking responses are modulated by attention and are not simply a reflection of global acoustical input (Ding and Simon 2012a, 2012b; Kerlin et al. 2010; Mesgarani and Chang 2012)....

Journal ArticleDOI
TL;DR: Recording from subjects selectively listening to one of two competing speakers using magnetoencephalography indicates that concurrent auditory objects, even if spectrotemporally overlapping and not resolvable at the auditory periphery, are neurally encoded individually in auditory cortex and emerge as fundamental representational units for top-down attentional modulation and bottom-up neural adaptation.
Abstract: A visual scene is perceived in terms of visual objects. Similar ideas have been proposed for the analogous case of auditory scene analysis, although their hypothesized neural underpinnings have not yet been established. Here, we address this question by recording from subjects selectively listening to one of two competing speakers, either of different or the same sex, using magnetoencephalography. Individual neural representations are seen for the speech of the two speakers, with each being selectively phase locked to the rhythm of the corresponding speech stream and from which can be exclusively reconstructed the temporal envelope of that speech stream. The neural representation of the attended speech dominates responses (with latency near 100 ms) in posterior auditory cortex. Furthermore, when the intensity of the attended and background speakers is separately varied over an 8-dB range, the neural representation of the attended speech adapts only to the intensity of that speaker but not to the intensity of the background speaker, suggesting an object-level intensity gain control. In summary, these results indicate that concurrent auditory objects, even if spectrotemporally overlapping and not resolvable at the auditory periphery, are neurally encoded individually in auditory cortex and emerge as fundamental representational units for top-down attentional modulation and bottom-up neural adaptation.

696 citations

Journal ArticleDOI
TL;DR: It is shown that single-trial unaveraged EEG data can be decoded to determine attentional selection in a naturalistic multispeaker environment and a significant correlation between the EEG-based measure of attention and performance on a high-level attention task is shown.
Abstract: How humans solve the cocktail party problem remains unknown. However, progress has been made recently thanks to the realization that cortical activity tracks the amplitude envelope of speech. This has led to the development of regression methods for studying the neurophysiology of continuous speech. One such method, known as stimulus-reconstruction, has been successfully utilized with cortical surface recordings and magnetoencephalography (MEG). However, the former is invasive and gives a relatively restricted view of processing along the auditory hierarchy, whereas the latter is expensive and rare. Thus it would be extremely useful for research in many populations if stimulus-reconstruction was effective using electroencephalography (EEG), a widely available and inexpensive technology. Here we show that single-trial (≈60 s) unaveraged EEG data can be decoded to determine attentional selection in a naturalistic multispeaker environment. Furthermore, we show a significant correlation between our EEG-based measure of attention and performance on a high-level attention task. In addition, by attempting to decode attention at individual latencies, we identify neural processing at ∼200 ms as being critical for solving the cocktail party problem. These findings open up new avenues for studying the ongoing dynamics of cognition using EEG and for developing effective and natural brain–computer interfaces.
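The stimulus-reconstruction approach described above inverts the forward mapping: a linear decoder reconstructs the speech envelope from lagged, multichannel neural recordings, and attention is read out by comparing the reconstruction with each speaker's envelope. The following is a minimal sketch with random placeholder data; the dimensions, lag range, and ridge parameter are assumptions for illustration only.

```python
# Minimal sketch of stimulus reconstruction (backward model) for attention
# decoding: reconstruct a speech envelope from lagged multichannel neural
# data, then label the trial by whichever speaker's envelope correlates
# better with the reconstruction.
import numpy as np

rng = np.random.default_rng(1)
fs, n_t, n_ch = 64, 3840, 32              # 60 s at 64 Hz, 32 EEG channels
eeg = rng.standard_normal((n_t, n_ch))
env_attended = rng.random(n_t)            # envelope of the attended speaker
env_ignored = rng.random(n_t)             # envelope of the ignored speaker

def lagged(x, max_lag):
    """Stack channels at lags 0..max_lag-1 into a design matrix."""
    n_t, n_ch = x.shape
    X = np.zeros((n_t, n_ch * max_lag))
    for lag in range(max_lag):
        X[lag:, lag * n_ch:(lag + 1) * n_ch] = x[: n_t - lag]
    return X

max_lag = int(0.25 * fs)                  # use 0-250 ms of post-stimulus lags
X = lagged(eeg, max_lag)

# Train the decoder on the attended envelope (in practice on held-out trials)
lam = 1e3
g = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ env_attended)

recon = X @ g
r_att = np.corrcoef(recon, env_attended)[0, 1]
r_ign = np.corrcoef(recon, env_ignored)[0, 1]
print("decoded as attended" if r_att > r_ign else "decoded as ignored")
```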

620 citations


Cites background or methods or result from "Neural coding of continuous speech ..."

  • ...These data suggest, as has been done before (Ding and Simon 2012a), that attended speech is not simply more strongly represented by the same neural generators, but rather that both speech streams are represented separately in the neural data....

  • ...Previous research (Ding and Simon 2012a) has shown that attended and unattended speech can be extracted separately from neural data, implying that it is not just the case that attended speech is more strongly represented by the same neural generators....

  • ...Using magnetoencephalography (MEG), which is a more global measure of cortical activity, Ding and Simon (2012a) showed that responses to a single-trial speech mixture could be decoded to give an estimate of the envelope of the input speech stream, and that this estimate typically had a greater…...

  • ...Recent research in this area has focused on changes in cortical activity that track the dynamic changes in the speech stimulus (Kerlin et al. 2010; Ding and Simon 2012a; Koskinen et al. 2012; Mesgarani and Chang 2012; Power et al. 2012; Zion Golumbic et al. 2013)....

  • ...This stimulus-reconstruction approach has been shown to be exquisitely sensitive to selective attention in a multispeaker environment (Ding and Simon 2012a, 2012b; Zion Golumbic et al. 2013)....

Journal ArticleDOI
TL;DR: A neuroimaging study reveals how coupled brain oscillations at different frequencies align with quasi-rhythmic features of continuous speech such as prosody, syllables, and phonemes.
Abstract: Cortical oscillations are likely candidates for segmentation and coding of continuous speech. Here, we monitored continuous speech processing with magnetoencephalography (MEG) to unravel the principles of speech segmentation and coding. We demonstrate that speech entrains the phase of low-frequency (delta, theta) and the amplitude of high-frequency (gamma) oscillations in the auditory cortex. Phase entrainment is stronger in the right and amplitude entrainment is stronger in the left auditory cortex. Furthermore, edges in the speech envelope phase reset auditory cortex oscillations thereby enhancing their entrainment to speech. This mechanism adapts to the changing physical features of the speech envelope and enables efficient, stimulus-specific speech sampling. Finally, we show that within the auditory cortex, coupling between delta, theta, and gamma oscillations increases following speech edges. Importantly, all couplings (i.e., brain-speech and also within the cortex) attenuate for backward-presented speech, suggesting top-down control. We conclude that segmentation and coding of speech relies on a nested hierarchy of entrained cortical oscillations.
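A common way to quantify the kind of phase entrainment described above is a phase-locking value between the speech envelope and band-limited neural activity. The sketch below is a generic illustration with placeholder signals; the band edges, filter settings, and sampling rate are assumptions, not the parameters used in that study.

```python
# Minimal sketch of envelope-to-brain phase locking: band-pass both signals
# into the delta-theta range, extract instantaneous phase via the Hilbert
# transform, and compute the phase-locking value (PLV) of their phase lag.
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

rng = np.random.default_rng(2)
fs = 200
n_t = fs * 60                                   # 60 s of data
envelope = rng.random(n_t)                      # speech envelope
meg = rng.standard_normal(n_t)                  # one MEG channel

def band_phase(x, lo, hi, fs, order=3):
    sos = butter(order, [lo, hi], btype="band", fs=fs, output="sos")
    return np.angle(hilbert(sosfiltfilt(sos, x)))

# Delta-theta band (1-8 Hz) phase of both signals
phi_env = band_phase(envelope, 1.0, 8.0, fs)
phi_meg = band_phase(meg, 1.0, 8.0, fs)

# PLV: 1 = perfectly consistent phase lag, 0 = no phase locking
plv = np.abs(np.mean(np.exp(1j * (phi_meg - phi_env))))
print("PLV =", plv)
```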

514 citations


Cites background from "Neural coding of continuous speech ..."

  • ...These mechanisms could lead to changes in oscillatory phase dynamics [26,50,51]....

References
Book
01 Jan 1991
TL;DR: The author examines the role of entropy, inequality, and randomness in the design of codes and the construction of codes in the rapidly changing environment.
Abstract: Preface to the Second Edition. Preface to the First Edition. Acknowledgments for the Second Edition. Acknowledgments for the First Edition. 1. Introduction and Preview. 1.1 Preview of the Book. 2. Entropy, Relative Entropy, and Mutual Information. 2.1 Entropy. 2.2 Joint Entropy and Conditional Entropy. 2.3 Relative Entropy and Mutual Information. 2.4 Relationship Between Entropy and Mutual Information. 2.5 Chain Rules for Entropy, Relative Entropy, and Mutual Information. 2.6 Jensen's Inequality and Its Consequences. 2.7 Log Sum Inequality and Its Applications. 2.8 Data-Processing Inequality. 2.9 Sufficient Statistics. 2.10 Fano's Inequality. Summary. Problems. Historical Notes. 3. Asymptotic Equipartition Property. 3.1 Asymptotic Equipartition Property Theorem. 3.2 Consequences of the AEP: Data Compression. 3.3 High-Probability Sets and the Typical Set. Summary. Problems. Historical Notes. 4. Entropy Rates of a Stochastic Process. 4.1 Markov Chains. 4.2 Entropy Rate. 4.3 Example: Entropy Rate of a Random Walk on a Weighted Graph. 4.4 Second Law of Thermodynamics. 4.5 Functions of Markov Chains. Summary. Problems. Historical Notes. 5. Data Compression. 5.1 Examples of Codes. 5.2 Kraft Inequality. 5.3 Optimal Codes. 5.4 Bounds on the Optimal Code Length. 5.5 Kraft Inequality for Uniquely Decodable Codes. 5.6 Huffman Codes. 5.7 Some Comments on Huffman Codes. 5.8 Optimality of Huffman Codes. 5.9 Shannon-Fano-Elias Coding. 5.10 Competitive Optimality of the Shannon Code. 5.11 Generation of Discrete Distributions from Fair Coins. Summary. Problems. Historical Notes. 6. Gambling and Data Compression. 6.1 The Horse Race. 6.2 Gambling and Side Information. 6.3 Dependent Horse Races and Entropy Rate. 6.4 The Entropy of English. 6.5 Data Compression and Gambling. 6.6 Gambling Estimate of the Entropy of English. Summary. Problems. Historical Notes. 7. Channel Capacity. 7.1 Examples of Channel Capacity. 7.2 Symmetric Channels. 7.3 Properties of Channel Capacity. 7.4 Preview of the Channel Coding Theorem. 7.5 Definitions. 7.6 Jointly Typical Sequences. 7.7 Channel Coding Theorem. 7.8 Zero-Error Codes. 7.9 Fano's Inequality and the Converse to the Coding Theorem. 7.10 Equality in the Converse to the Channel Coding Theorem. 7.11 Hamming Codes. 7.12 Feedback Capacity. 7.13 Source-Channel Separation Theorem. Summary. Problems. Historical Notes. 8. Differential Entropy. 8.1 Definitions. 8.2 AEP for Continuous Random Variables. 8.3 Relation of Differential Entropy to Discrete Entropy. 8.4 Joint and Conditional Differential Entropy. 8.5 Relative Entropy and Mutual Information. 8.6 Properties of Differential Entropy, Relative Entropy, and Mutual Information. Summary. Problems. Historical Notes. 9. Gaussian Channel. 9.1 Gaussian Channel: Definitions. 9.2 Converse to the Coding Theorem for Gaussian Channels. 9.3 Bandlimited Channels. 9.4 Parallel Gaussian Channels. 9.5 Channels with Colored Gaussian Noise. 9.6 Gaussian Channels with Feedback. Summary. Problems. Historical Notes. 10. Rate Distortion Theory. 10.1 Quantization. 10.2 Definitions. 10.3 Calculation of the Rate Distortion Function. 10.4 Converse to the Rate Distortion Theorem. 10.5 Achievability of the Rate Distortion Function. 10.6 Strongly Typical Sequences and Rate Distortion. 10.7 Characterization of the Rate Distortion Function. 10.8 Computation of Channel Capacity and the Rate Distortion Function. Summary. Problems. Historical Notes. 11. Information Theory and Statistics. 11.1 Method of Types. 11.2 Law of Large Numbers. 
11.3 Universal Source Coding. 11.4 Large Deviation Theory. 11.5 Examples of Sanov's Theorem. 11.6 Conditional Limit Theorem. 11.7 Hypothesis Testing. 11.8 Chernoff-Stein Lemma. 11.9 Chernoff Information. 11.10 Fisher Information and the Cramér-Rao Inequality. Summary. Problems. Historical Notes. 12. Maximum Entropy. 12.1 Maximum Entropy Distributions. 12.2 Examples. 12.3 Anomalous Maximum Entropy Problem. 12.4 Spectrum Estimation. 12.5 Entropy Rates of a Gaussian Process. 12.6 Burg's Maximum Entropy Theorem. Summary. Problems. Historical Notes. 13. Universal Source Coding. 13.1 Universal Codes and Channel Capacity. 13.2 Universal Coding for Binary Sequences. 13.3 Arithmetic Coding. 13.4 Lempel-Ziv Coding. 13.5 Optimality of Lempel-Ziv Algorithms. Compression. Summary. Problems. Historical Notes. 14. Kolmogorov Complexity. 14.1 Models of Computation. 14.2 Kolmogorov Complexity: Definitions and Examples. 14.3 Kolmogorov Complexity and Entropy. 14.4 Kolmogorov Complexity of Integers. 14.5 Algorithmically Random and Incompressible Sequences. 14.6 Universal Probability. 14.7 Kolmogorov Complexity. 14.9 Universal Gambling. 14.10 Occam's Razor. 14.11 Kolmogorov Complexity and Universal Probability. 14.12 Kolmogorov Sufficient Statistic. 14.13 Minimum Description Length Principle. Summary. Problems. Historical Notes. 15. Network Information Theory. 15.1 Gaussian Multiple-User Channels. 15.2 Jointly Typical Sequences. 15.3 Multiple-Access Channel. 15.4 Encoding of Correlated Sources. 15.5 Duality Between Slepian-Wolf Encoding and Multiple-Access Channels. 15.6 Broadcast Channel. 15.7 Relay Channel. 15.8 Source Coding with Side Information. 15.9 Rate Distortion with Side Information. 15.10 General Multiterminal Networks. Summary. Problems. Historical Notes. 16. Information Theory and Portfolio Theory. 16.1 The Stock Market: Some Definitions. 16.2 Kuhn-Tucker Characterization of the Log-Optimal Portfolio. 16.3 Asymptotic Optimality of the Log-Optimal Portfolio. 16.4 Side Information and the Growth Rate. 16.5 Investment in Stationary Markets. 16.6 Competitive Optimality of the Log-Optimal Portfolio. 16.7 Universal Portfolios. 16.8 Shannon-McMillan-Breiman Theorem (General AEP). Summary. Problems. Historical Notes. 17. Inequalities in Information Theory. 17.1 Basic Inequalities of Information Theory. 17.2 Differential Entropy. 17.3 Bounds on Entropy and Relative Entropy. 17.4 Inequalities for Types. 17.5 Combinatorial Bounds on Entropy. 17.6 Entropy Rates of Subsets. 17.7 Entropy and Fisher Information. 17.8 Entropy Power Inequality and Brunn-Minkowski Inequality. 17.9 Inequalities for Determinants. 17.10 Inequalities for Ratios of Determinants. Summary. Problems. Historical Notes. Bibliography. List of Symbols. Index.

45,034 citations


"Neural coding of continuous speech ..." refers background in this paper

  • ...The decoding accuracy is limited by Fano's inequality (Cover and Thomas 1991): $H(P_e) + P_e \log(N - 1) \ge \log N - I(s, r)$, where $P_e$ is the percentage of correct decoding and $H(P_e) = -P_e \log(P_e) - (1 - P_e)\log(1 - P_e)$....

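For concreteness, the Fano bound quoted above limits how well any decoder can do: given the mutual information $I(s, r)$ (in bits) that the response carries about $N$ equiprobable stimuli, it sets a floor on the achievable error rate. The function below is a generic numerical check of that bound, not code from the paper.

```python
# Smallest decoding error rate Pe consistent with Fano's inequality
#     H(Pe) + Pe*log2(N - 1) >= log2(N) - I(s, r)
# for N equiprobable stimuli and mutual information I (in bits).
import numpy as np

def binary_entropy(p):
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

def fano_min_error(mutual_info_bits, n_stimuli):
    """Grid search for the smallest error probability allowed by Fano."""
    pe = np.linspace(0.0, 1.0, 100001)
    lhs = binary_entropy(pe) + pe * np.log2(n_stimuli - 1)
    rhs = np.log2(n_stimuli) - mutual_info_bits
    feasible = pe[lhs >= rhs]
    return feasible[0] if feasible.size else None

# e.g., identifying which of 4 narratives was heard from a response that
# carries 1.5 bits of information about the stimulus:
print(fano_min_error(mutual_info_bits=1.5, n_stimuli=4))
```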

Journal ArticleDOI
TL;DR: A dual-stream model of speech processing is outlined that assumes that the ventral stream is largely bilaterally organized — although there are important computational differences between the left- and right-hemisphere systems — and that the dorsal stream is strongly left-hemisphere dominant.
Abstract: Despite decades of research, the functional neuroanatomy of speech processing has been difficult to characterize. A major impediment to progress may have been the failure to consider task effects when mapping speech-related processing systems. We outline a dual-stream model of speech processing that remedies this situation. In this model, a ventral stream processes speech signals for comprehension, and a dorsal stream maps acoustic speech signals to frontal lobe articulatory networks. The model assumes that the ventral stream is largely bilaterally organized--although there are important computational differences between the left- and right-hemisphere systems--and that the dorsal stream is strongly left-hemisphere dominant.

4,234 citations


"Neural coding of continuous speech ..." refers background in this paper

  • ...Because speech comprehension is a complex hierarchical process involving multiple brain regions, it is unclear whether the attentional effect seen in the auditory cortex directly modulates feedforward auditory processing or reflects only feedback from language areas, or even motor areas (Hickok and Poeppel 2007)....

Journal ArticleDOI
TL;DR: In this paper, the relation between the messages received by the two ears was investigated, and two types of test were reported: (a) the behavior of a listener when presented with two speech signals simultaneously (statistical filtering problem) and (b) behavior when different speech signals are presented to his two ears.
Abstract: This paper describes a number of objective experiments on recognition, concerning particularly the relation between the messages received by the two ears. Rather than use steady tones or clicks (frequency or time‐point signals) continuous speech is used, and the results interpreted in the main statistically. Two types of test are reported: (a) the behavior of a listener when presented with two speech signals simultaneously (statistical filtering problem) and (b) behavior when different speech signals are presented to his two ears.

3,562 citations

Journal ArticleDOI
13 Oct 1995-Science
TL;DR: Nearly perfect speech recognition was observed under conditions of greatly reduced spectral information; the presentation of a dynamic temporal pattern in only a few broad spectral regions is sufficient for the recognition of speech.
Abstract: Nearly perfect speech recognition was observed under conditions of greatly reduced spectral information. Temporal envelopes of speech were extracted from broad frequency bands and were used to modulate noises of the same bandwidths. This manipulation preserved temporal envelope cues in each band but restricted the listener to severely degraded information on the distribution of spectral energy. The identification of consonants, vowels, and words in simple sentences improved markedly as the number of bands increased; high speech recognition performance was obtained with only three bands of modulated noise. Thus, the presentation of a dynamic temporal pattern in only a few broad spectral regions is sufficient for the recognition of speech.
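The manipulation described in this abstract, often called noise vocoding, can be sketched as follows: split the signal into a few broad bands, extract each band's slow temporal envelope, and use that envelope to modulate band-limited noise. The band edges, envelope cutoff, and placeholder input below are illustrative assumptions rather than the exact parameters of the original study.

```python
# Minimal sketch of a noise vocoder: preserve per-band temporal envelopes
# while replacing spectral fine structure with noise.
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

rng = np.random.default_rng(3)
fs = 16000
speech = rng.standard_normal(fs * 2)        # placeholder for a 2 s waveform

def bandpass(x, lo, hi, fs, order=4):
    sos = butter(order, [lo, hi], btype="band", fs=fs, output="sos")
    return sosfiltfilt(sos, x)

def lowpass(x, cutoff, fs, order=4):
    sos = butter(order, cutoff, btype="low", fs=fs, output="sos")
    return sosfiltfilt(sos, x)

# Three broad bands roughly spanning the speech range (assumed edges)
bands = [(100, 800), (800, 2500), (2500, 6000)]
vocoded = np.zeros_like(speech)
for lo, hi in bands:
    band = bandpass(speech, lo, hi, fs)
    env = lowpass(np.abs(hilbert(band)), 16, fs)      # slow (<16 Hz) envelope
    carrier = bandpass(rng.standard_normal(len(speech)), lo, hi, fs)
    vocoded += env * carrier                          # envelope-modulated noise

# "vocoded" keeps the temporal envelope in each band but discards spectral
# detail, as in the few-band conditions reported above.
```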

2,865 citations


"Neural coding of continuous speech ..." refers background in this paper

  • ...The slow temporal modulations and coarse spectral modulations reflect the rhythm of speech and contain syllabic and phrasal level segmentation information (Greenberg 1999) and are particularly important for speech intelligibility (Shannon et al. 1995)....

  • ...In quiet, these slow modulations, in concert with even a very coarse spectral modulation, accomplish high speech intelligibility (Shannon et al. 1995)....

Journal ArticleDOI
12 Oct 1973-Science
TL;DR: Auditory evoked potentials were recorded from the vertex of subjects who listened selectively to a series of tone pips in one ear and ignored concurrent tone pips in the other ear, to study the response set established to recognize infrequent, higher-pitched tone pips in the attended series.
Abstract: Auditory evoked potentials were recorded from the vertex of subjects who listened selectively to a series of tone pips in one ear and ignored concurrent tone pips in the other ear. The negative component of the evoked potential peaking at 80 to 110 milliseconds was substantially larger for the attended tones. This negative component indexed a stimulus set mode of selective attention toward the tone pips in one ear. A late positive component peaking at 250 to 400 milliseconds reflected the response set established to recognize infrequent, higher pitched tone pips in the attended series.

1,839 citations


"Neural coding of continuous speech ..." refers background in this paper

  • ...It is known that even without any rhythmic cues, the auditory evoked response to an attended stimulus can be enhanced (Hillyard et al. 1973)....

  • ...Experiments using dichotically presented tone sequences have demonstrated that the effect of attention on the M100 (N1) is observed for stimuli with some kinds of rhythm (typically fast) (Ahveninen et al. 2011; Hillyard et al. 1973; Power et al. 2011; Rif et al. 1991; Woldorff et al. 1993), but not others (Hari et al. 1989; Ross et al. 2010)....
