Author

Michael J. Crosse

Other affiliations: Trinity College, Dublin
Bio: Michael J. Crosse is an academic researcher from Albert Einstein College of Medicine. The author has contributed to research in the topics of Multisensory integration and Neurocomputational speech processing. The author has an h-index of 9 and has co-authored 21 publications receiving 813 citations. Previous affiliations of Michael J. Crosse include Trinity College, Dublin.

Papers
Journal ArticleDOI
TL;DR: A new open-source toolbox is introduced for deriving temporal response functions that describe a mapping between stimulus and response in both directions; the importance of regularizing the analysis is explained, along with how this regularization can be optimized for a particular dataset.
Abstract: Understanding how brains process sensory signals in natural environments is one of the key goals of 21st century neuroscience. While brain imaging and invasive electrophysiology will play key roles in this endeavor, there is also an important role to be played by noninvasive, macroscopic techniques with high temporal resolution such as electro- and magnetoencephalography. But challenges exist in determining how best to analyze such complex, time-varying neural responses to complex, time-varying and multivariate natural sensory stimuli. There has been a long history of applying system identification techniques to relate the firing activity of neurons to complex sensory stimuli and such techniques are now seeing increased application to EEG and MEG data. One particular example involves fitting a filter – often referred to as a temporal response function – that describes a mapping between some feature(s) of a sensory stimulus and the neural response. Here, we first briefly review the history of these system identification approaches and describe a specific technique for deriving temporal response functions known as regularized linear regression. We then introduce a new open-source toolbox for performing this analysis. We describe how it can be used to derive (multivariate) temporal response functions describing a mapping between stimulus and response in both directions. We also explain the importance of regularizing the analysis and how this regularization can be optimized for a particular dataset. We then outline specifically how the toolbox implements these analyses and provide several examples of the types of results that the toolbox can produce. Finally, we consider some of the limitations of the toolbox and opportunities for future development and application.
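At its core, the technique the abstract describes is ridge (regularized linear) regression on a time-lagged design matrix. The following is a minimal NumPy sketch of a forward-model TRF estimator, for illustration only; the function names and default parameters are hypothetical, not the toolbox's actual API (the mTRF-Toolbox itself is a MATLAB package).

```python
import numpy as np

def lag_matrix(stim, lags):
    """Design matrix of time-lagged copies of a 1-D stimulus feature."""
    n = len(stim)
    X = np.zeros((n, len(lags)))
    for j, lag in enumerate(lags):
        if lag >= 0:
            X[lag:, j] = stim[:n - lag]
        else:
            X[:lag, j] = stim[-lag:]
    return X

def fit_trf(stim, resp, fs, tmin=-0.1, tmax=0.4, lam=1e2):
    """Estimate a forward-model TRF by ridge regression.

    stim : 1-D stimulus feature (e.g., speech envelope) at the EEG rate
    resp : (n_samples, n_channels) neural response
    lam  : ridge parameter; in practice chosen by cross-validation
    """
    lags = np.arange(int(np.floor(tmin * fs)), int(np.ceil(tmax * fs)) + 1)
    X = lag_matrix(stim, lags)
    # Closed-form ridge solution: w = (X'X + lam * I)^-1 X'y
    w = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ resp)
    return lags / fs, w  # lag times in seconds, weights (n_lags, n_channels)

# Running the same regression in the opposite direction (time-lagged EEG
# predicting the stimulus) gives the backward/decoding model the abstract
# refers to.
```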

437 citations

Journal ArticleDOI
TL;DR: These findings demonstrate that, when successfully comprehending natural speech, the human brain responds to the contextual semantic content of each word in a relatively time-locked fashion.

286 citations

Journal ArticleDOI
TL;DR: It is demonstrated that the brain's representation of auditory speech is enhanced when the accompanying visual speech signal shares the same timing, and this enhancement is most pronounced at a time scale that corresponds to mean syllable length.
Abstract: Congruent audiovisual speech enhances our ability to comprehend a speaker, even in noise-free conditions. When incongruent auditory and visual information is presented concurrently, it can hinder a listener's perception and even cause him or her to perceive information that was not presented in either modality. Efforts to investigate the neural basis of these effects have often focused on the special case of discrete audiovisual syllables that are spatially and temporally congruent, with less work done on the case of natural, continuous speech. Recent electrophysiological studies have demonstrated that cortical response measures to continuous auditory speech can be easily obtained using multivariate analysis methods. Here, we apply such methods to the case of audiovisual speech and, importantly, present a novel framework for indexing multisensory integration in the context of continuous speech. Specifically, we examine how the temporal and contextual congruency of ongoing audiovisual speech affects the cortical encoding of the speech envelope in humans using electroencephalography. We demonstrate that the cortical representation of the speech envelope is enhanced by the presentation of congruent audiovisual speech in noise-free conditions. Furthermore, we show that this is likely attributable to the contribution of neural generators that are not particularly active during unimodal stimulation and that it is most prominent at the temporal scale corresponding to syllabic rate (2–6 Hz). Finally, our data suggest that neural entrainment to the speech envelope is inhibited when the auditory and visual streams are incongruent both temporally and contextually. SIGNIFICANCE STATEMENT Seeing a speaker's face as he or she talks can greatly help in understanding what the speaker is saying. This is because the speaker's facial movements relay information about what the speaker is saying, but also, importantly, when the speaker is saying it. Studying how the brain uses this timing relationship to combine information from continuous auditory and visual speech has traditionally been methodologically difficult. Here we introduce a new approach for doing this using relatively inexpensive and noninvasive scalp recordings. Specifically, we show that the brain's representation of auditory speech is enhanced when the accompanying visual speech signal shares the same timing. Furthermore, we show that this enhancement is most pronounced at a time scale that corresponds to mean syllable length.
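Two preprocessing steps underpin this kind of analysis: extracting the amplitude envelope of the speech signal and isolating the syllabic-rate (2–6 Hz) band. Below is a minimal sketch of both using SciPy; the sampling rates and filter order are illustrative assumptions, not the study's exact parameters.

```python
import numpy as np
from scipy.signal import hilbert, butter, filtfilt, resample

def speech_envelope(audio, fs_audio, fs_eeg=128):
    """Broadband amplitude envelope of speech, downsampled to the EEG rate."""
    env = np.abs(hilbert(audio))  # analytic-signal magnitude
    return resample(env, int(len(audio) * fs_eeg / fs_audio))

def syllabic_band(x, fs, lo=2.0, hi=6.0, order=3):
    """Zero-phase band-pass filter isolating the 2-6 Hz syllabic-rate band."""
    b, a = butter(order, [lo / (fs / 2), hi / (fs / 2)], btype="band")
    return filtfilt(b, a, x)

# One way to index integration (the additive-model criterion) is then to ask
# whether encoding of this envelope in the congruent audiovisual condition
# exceeds that of a model combining the unimodal (A + V) responses.
```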

149 citations

Journal ArticleDOI
TL;DR: This work examines inverse effectiveness in the human brain using natural speech-in-noise stimuli designed specifically to maximize the behavioral benefit of audiovisual (AV) speech, finding that this benefit arises from the ability to integrate multimodal information over longer periods of time.
Abstract: Speech comprehension is improved by viewing a speaker's face, especially in adverse hearing conditions, a principle known as inverse effectiveness. However, the neural mechanisms that help to optimize how we integrate auditory and visual speech in such suboptimal conversational environments are not yet fully understood. Using human EEG recordings, we examined how visual speech enhances the cortical representation of auditory speech at a signal-to-noise ratio that maximized the perceptual benefit conferred by multisensory processing relative to unisensory processing. We found that the influence of visual input on the neural tracking of the audio speech signal was significantly greater in noisy than in quiet listening conditions, consistent with the principle of inverse effectiveness. Although envelope tracking during audio-only speech was greatly reduced by background noise at an early processing stage, it was markedly restored by the addition of visual speech input. In background noise, multisensory integration occurred at much lower frequencies and was shown to predict the multisensory gain in behavioral performance at a time lag of ∼250 ms. Critically, we demonstrated that inverse effectiveness, in the context of natural audiovisual (AV) speech processing, relies on crossmodal integration over long temporal windows. Our findings suggest that disparate integration mechanisms contribute to the efficient processing of AV speech in background noise. SIGNIFICANCE STATEMENT The behavioral benefit of seeing a speaker's face during conversation is especially pronounced in challenging listening environments. However, the neural mechanisms underlying this phenomenon, known as inverse effectiveness, have not yet been established. Here, we examine this in the human brain using natural speech-in-noise stimuli that were designed specifically to maximize the behavioral benefit of audiovisual (AV) speech. We find that this benefit arises from our ability to integrate multimodal information over longer periods of time. Our data also suggest that the addition of visual speech restores early tracking of the acoustic speech signal during excessive background noise. These findings support and extend current mechanistic perspectives on AV speech perception.
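The dependence on long temporal windows can be probed with a backward (decoding) model whose integration window is varied, reconstructing the speech envelope from progressively longer stretches of EEG. The sketch below assumes a simple ridge decoder with hypothetical names, and omits the cross-validation a real analysis would require.

```python
import numpy as np

def decoder_accuracy_by_window(stim, resp, fs, windows_ms=(50, 100, 250, 500),
                               lam=1e2):
    """Stimulus-reconstruction accuracy of a ridge decoder as a function of
    its temporal integration window (EEG samples from t to t + window are
    pooled to reconstruct the stimulus at time t).

    stim : 1-D speech envelope at the EEG sampling rate
    resp : (n_samples, n_channels) EEG
    """
    n, n_ch = resp.shape
    accs = {}
    for w_ms in windows_ms:
        lags = np.arange(0, int(w_ms / 1000 * fs) + 1)
        # Design matrix of time-lagged EEG channels (decoder direction)
        X = np.zeros((n, n_ch * len(lags)))
        for j, lag in enumerate(lags):
            X[:n - lag, j * n_ch:(j + 1) * n_ch] = resp[lag:]
        w = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ stim)
        # NOTE: scoring on the training data, as here, overestimates accuracy;
        # a real analysis would cross-validate across trials.
        accs[w_ms] = np.corrcoef(X @ w, stim)[0, 1]
    return accs
```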

121 citations

Journal ArticleDOI
01 Mar 2018
TL;DR: The availability of prior information enhanced the perceived clarity of degraded speech, an effect that was positively correlated with changes in phoneme-level encoding across subjects, suggesting that prior information affects the early encoding of natural speech in a dual manner.
Abstract: In real-world environments, humans comprehend speech by actively integrating prior knowledge (P) and expectations with sensory input. Recent studies have revealed effects of prior information in temporal and frontal cortical areas and have suggested that these effects are underpinned by enhanced encoding of speech-specific features, rather than a broad enhancement or suppression of cortical activity. However, in terms of the specific hierarchical stages of processing involved in speech comprehension, the effects of integrating bottom-up sensory responses and top-down predictions are still unclear. In addition, it is unclear whether the predictability that comes with prior information may differentially affect speech encoding relative to the perceptual enhancement that comes with that prediction. One way to investigate these issues is through examining the impact of P on indices of cortical tracking of continuous speech features. Here, we did this by presenting participants with degraded speech sentences that either were or were not preceded by a clear recording of the same sentences while recording non-invasive electroencephalography (EEG). We assessed the impact of prior information on an isolated index of cortical tracking that reflected phoneme-level processing. Our findings suggest the possibility that prior information affects the early encoding of natural speech in a dual manner. Firstly, the availability of prior information, as hypothesized, enhanced the perceived clarity of degraded speech, which was positively correlated with changes in phoneme-level encoding across subjects. In addition, P induced an overall reduction of this cortical measure, which we interpret as resulting from the increase in predictability.
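A plausible reading of the "isolated index" is a model comparison: the gain in cross-validated EEG prediction accuracy when phonetic features are added to an acoustic-only model, which isolates the phoneme-level contribution. The sketch below is hypothetical in its names and details; only the across-subject correlation with perceived clarity is taken from the abstract.

```python
import numpy as np

def phoneme_level_index(r_acoustic_phonetic, r_acoustic):
    """Per-subject index of phoneme-level encoding: the gain in EEG
    prediction accuracy of a TRF trained on acoustic + phonetic features
    over one trained on acoustic features alone (hypothetical measure)."""
    return np.asarray(r_acoustic_phonetic) - np.asarray(r_acoustic)

def clarity_correlation(index_change, clarity_change):
    """Across-subject Pearson correlation between the change in the
    phoneme-level index (with vs. without prior information) and the
    change in perceived clarity of the degraded speech."""
    return np.corrcoef(index_change, clarity_change)[0, 1]
```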

52 citations


Cited by
01 Jan 2016
An Introduction to the Event-Related Potential Technique

2,445 citations

Journal ArticleDOI
TL;DR: The concept of entrainment is discussed; it offers both a putative neurophysiological mechanism for selective listening and a versatile window onto the neural basis of hearing and speech comprehension.

257 citations

Journal ArticleDOI
TL;DR: While responses to acoustic features reflect attention through the selective amplification of attended speech, responses consistent with a lexical processing model reveal categorically selective processing, including a response reflecting the incremental integration of phonetic information for word identification.

198 citations