
Showing papers by "Paul Sajda published in 2021"


Journal ArticleDOI
TL;DR: This work develops CNN architectures that demonstrate robust detection of glaucoma in optical coherence tomography images, uses testing with concept activation vectors (TCAV) to infer what image concepts the CNNs use to generate predictions, and compares TCAV results to eye fixations of clinicians to identify common decision-making features used by both AI and human experts.
Abstract: Recent studies suggest that deep learning systems can now achieve performance on par with medical experts in diagnosis of disease. A prime example is in the field of ophthalmology, where convolutional neural networks (CNNs) have been used to detect retinal and ocular diseases. However, this type of artificial intelligence (AI) has yet to be adopted clinically due to questions regarding robustness of the algorithms to datasets collected at new clinical sites and a lack of explainability of AI-based predictions, especially relative to those of human expert counterparts. In this work, we develop CNN architectures that demonstrate robust detection of glaucoma in optical coherence tomography (OCT) images and use testing with concept activation vectors (TCAV) to infer what image concepts CNNs use to generate predictions. Furthermore, we compare TCAV results to eye fixations of clinicians, to identify common decision-making features used by both AI and human experts. We find that employing fine-tuned transfer learning and CNN ensemble learning creates end-to-end deep learning models with superior robustness compared to previously reported hybrid deep-learning/machine-learning models, and TCAV/eye-fixation comparison suggests the importance of three OCT report sub-images that are consistent with areas of interest fixated upon by OCT experts to detect glaucoma. The pipeline described here for evaluating CNN robustness and validating interpretable image concepts used by CNNs with eye movements of experts has the potential to help standardize the acceptance of new AI tools for use in the clinic.
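The TCAV idea described above can be sketched in a few lines: learn a concept activation vector (CAV) as the normal of a linear classifier separating concept activations from random activations, then measure how often the class logit increases along that direction. The sketch below is a toy illustration on synthetic activations and gradients, not the authors' pipeline; the array shapes and the logistic-regression trainer are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def learn_cav(concept_acts, random_acts, lr=0.1, steps=200):
    """Learn a concept activation vector (CAV): the unit normal of a
    linear separator (logistic regression, gradient descent) between
    concept and random activations."""
    X = np.vstack([concept_acts, random_acts])
    y = np.concatenate([np.ones(len(concept_acts)), np.zeros(len(random_acts))])
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        w -= lr * (X.T @ (p - y)) / len(y)
        b -= lr * np.mean(p - y)
    return w / np.linalg.norm(w)

def tcav_score(logit_grads, cav):
    """Fraction of examples whose class logit increases along the CAV
    (positive directional derivative)."""
    return np.mean(logit_grads @ cav > 0)

# Synthetic layer activations: the "concept" shifts the first dimension.
concept = rng.normal(size=(100, 8)); concept[:, 0] += 2.0
random_acts = rng.normal(size=(100, 8))
cav = learn_cav(concept, random_acts)

# Synthetic gradients of the class logit w.r.t. the activations: here
# the logit depends positively on the concept dimension.
grads = rng.normal(size=(200, 8)); grads[:, 0] += 1.0
score = tcav_score(grads, cav)
```

Because the class gradient and the concept share a direction, the TCAV score comes out well above the 0.5 chance level.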

28 citations


Journal ArticleDOI
James R. McIntosh, Jiaang Yao, Linbi Hong, Josef Faller, Paul Sajda
TL;DR: A novel method for BCG artifact suppression using recurrent neural networks (RNNs) that can be used to reduce BCG related artifacts in EEG-fMRI recordings without the use of additional hardware is presented.
Abstract: Objective: The concurrent recording of electroencephalography (EEG) and functional magnetic resonance imaging (fMRI) is a technique that has received much attention due to its potential for combined high temporal and spatial resolution. However, the ballistocardiogram (BCG), a large-amplitude artifact caused by cardiac induced movement contaminates the EEG during EEG-fMRI recordings. Removal of BCG in software has generally made use of linear decompositions of the corrupted EEG. This is not ideal as the BCG signal propagates in a manner which is non-linearly dependent on the electrocardiogram (ECG). In this paper, we present a novel method for BCG artifact suppression using recurrent neural networks (RNNs). Methods: EEG signals were recovered by training RNNs on the nonlinear mappings between ECG and the BCG corrupted EEG. We evaluated our model's performance against the commonly used Optimal Basis Set (OBS) method at the level of individual subjects, and investigated generalization across subjects. Results: We show that our algorithm can generate larger average power reduction of the BCG at critical frequencies, while simultaneously improving task relevant EEG based classification. Conclusion: The presented deep learning architecture can be used to reduce BCG related artifacts in EEG-fMRI recordings. Significance: We present a deep learning approach that can be used to suppress the BCG artifact in EEG-fMRI without the use of additional hardware. This method may have scope to be combined with current hardware methods, operate in real-time and be used for direct modeling of the BCG.
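As a rough illustration of the data flow (not the authors' trained architecture), the sketch below runs an ECG sequence through a minimal vanilla RNN to produce a per-sample artifact estimate, which is then subtracted from the BCG-corrupted EEG. The network here is untrained, with random weights; the hidden size, weight scaling, and toy signals are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

class TinyRNN:
    """Minimal vanilla RNN regressor: maps an ECG sequence to a
    per-sample artifact estimate. Untrained (random weights) --
    shown only to illustrate the data flow of the approach."""
    def __init__(self, hidden=16):
        self.Wx = rng.normal(scale=0.3, size=(hidden, 1))
        self.Wh = rng.normal(scale=0.3, size=(hidden, hidden))
        self.Wo = rng.normal(scale=0.3, size=(1, hidden))

    def forward(self, ecg):
        h = np.zeros(self.Wh.shape[0])
        out = np.empty_like(ecg)
        for t, x in enumerate(ecg):
            h = np.tanh(self.Wx[:, 0] * x + self.Wh @ h)
            out[t] = (self.Wo @ h).item()
        return out

T = 500
ecg = np.sin(2 * np.pi * np.arange(T) / 50)   # toy cardiac trace
bcg = 0.8 * np.roll(ecg, 5)                   # artifact lags the ECG nonuniformly in reality
eeg_corrupted = 0.1 * rng.normal(size=T) + bcg

model = TinyRNN()
bcg_estimate = model.forward(ecg)             # would come from a trained RNN
eeg_recovered = eeg_corrupted - bcg_estimate
```

In the paper the mapping is learned; here the subtraction step only shows where a trained model's output would enter the pipeline.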

16 citations


Journal ArticleDOI
TL;DR: In this paper, the authors used deep learning to uncover nonlinear representations that link the electrophysiological and hemodynamic measurements in EEG-fMRI systems, which yielded new insights into brain function that are not possible when each modality is acquired separately.
Abstract: Advances in the instrumentation and signal processing for simultaneously acquired electroencephalography and functional magnetic resonance imaging (EEG-fMRI) have enabled new ways to observe the spatiotemporal neural dynamics of the human brain. Central to the utility of EEG-fMRI neuroimaging systems are the methods for fusing the two data streams, with machine learning playing a key role. These methods can be dichotomized into those that are symmetric and asymmetric in terms of how the two modalities inform the fusion. Studies using these methods have shown that fusion yields new insights into brain function that are not possible when each modality is acquired separately. As technology improves and methods for fusion become more sophisticated, the future of EEG-fMRI for noninvasive measurement of brain dynamics includes mesoscale mapping at ultrahigh magnetic resonance fields, targeted perturbation-based neuroimaging, and using deep learning to uncover nonlinear representations that link the electrophysiological and hemodynamic measurements.
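An asymmetric fusion step of the kind surveyed above can be made concrete: convolve a per-TR EEG-derived feature (e.g. a single-trial discriminant amplitude) with a canonical haemodynamic response function (HRF) to form a regressor for the fMRI GLM. This is a generic textbook construction, not code from the review; the double-gamma HRF parameters and the 2 s TR are common defaults assumed for the example.

```python
import numpy as np
from math import gamma

def hrf(t, a1=6.0, a2=16.0, ratio=1 / 6.0):
    """Canonical double-gamma haemodynamic response function."""
    t = np.asarray(t, dtype=float)
    h = (t ** (a1 - 1) * np.exp(-t) / gamma(a1)
         - ratio * t ** (a2 - 1) * np.exp(-t) / gamma(a2))
    h[t < 0] = 0.0
    return h

def eeg_informed_regressor(eeg_feature, tr=2.0, hrf_len=32.0):
    """Asymmetric EEG->fMRI fusion: convolve a per-TR EEG-derived
    feature with the HRF to get a GLM regressor."""
    t = np.arange(0, hrf_len, tr)
    return np.convolve(eeg_feature, hrf(t))[: len(eeg_feature)]

feature = np.zeros(100); feature[::20] = 1.0   # toy EEG amplitude per TR
regressor = eeg_informed_regressor(feature)
```

The regressor peaks roughly 6 s (three TRs) after each EEG event, as expected for the canonical HRF.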

12 citations


Journal ArticleDOI
TL;DR: Investigation of the relationship between raphe 5-HT1A binding and brain-wide network dynamics of negative emotion suggests increased hippocampal network inhibition in MDD is linked to hippocampal serotonergic dysfunction, which may in turn arise from disrupted linkage in raphe-to-hippocampus serotonergic circuitry.
Abstract: Serotonergic dysfunction is implicated in major depressive disorder (MDD), but the mechanisms of this relationship remain elusive. Serotonin 1A (5-HT1A) autoreceptors regulate brain-wide serotonin neuron firing and are positioned to exert large-scale effects on negative emotion. Here we investigated the relationship between raphe 5-HT1A binding and brain-wide network dynamics of negative emotion. 22 healthy volunteers (HV) and 27 medication-free participants with MDD underwent positron emission tomography (PET) using [11C]CUMI-101 (CUMI) to quantify 5-HT1A binding in midbrain raphe nuclei, and functional magnetic resonance imaging (fMRI) scanning during emotionally negative picture viewing. Causal connectivity across regions responsive to negative emotion was estimated in the fMRI data using a multivariate dynamical systems model. During negative picture viewing, MDD subjects demonstrated significant hippocampal inhibition of the amygdala, basal ganglia, thalamus, orbital frontal cortex, inferior frontal gyrus (IFG) and dorsomedial prefrontal cortex (dmPFC). MDD-related connectivity was not associated with raphe 5-HT1A binding. However, greater hippocampal inhibition of the amygdala, thalamus, IFG and dmPFC correlated with hippocampal 5-HT1A binding. The correlation between hippocampal 5-HT1A binding and the hippocampal inhibition network was specific to MDD but not HV. MDD and HV groups also differed with respect to the correlation between raphe and hippocampal 5-HT1A binding, which was more pronounced in HV. These findings suggest that increased hippocampal network inhibition in MDD is linked to hippocampal serotonergic dysfunction, which may in turn arise from disrupted linkage in raphe-to-hippocampus serotonergic circuitry.
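A minimal linear stand-in for the kind of directed connectivity estimate described above is a first-order multivariate autoregressive model fit by least squares; a directed inhibitory influence then shows up as a negative off-diagonal coefficient. The sketch below is illustrative only (the paper's multivariate dynamical systems model is richer); the toy three-region system and noise level are assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

def fit_mar(X):
    """Least-squares fit of a first-order multivariate autoregressive
    model x_t = A x_{t-1} + e. X: (time, regions). Off-diagonal
    A[i, j] is read as the directed influence of region j on region i."""
    past, present = X[:-1], X[1:]
    A, *_ = np.linalg.lstsq(past, present, rcond=None)
    return A.T

# Toy 3-region system in which region 0 inhibits region 1.
T = 2000
A_true = np.array([[0.5, 0.0, 0.0],
                   [-0.4, 0.5, 0.0],
                   [0.0, 0.0, 0.5]])
X = np.zeros((T, 3))
for t in range(1, T):
    X[t] = A_true @ X[t - 1] + 0.1 * rng.normal(size=3)

A_hat = fit_mar(X)   # A_hat[1, 0] recovers the negative (inhibitory) coupling
```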

11 citations


Proceedings ArticleDOI
13 Apr 2021
TL;DR: In this article, a hybrid 3D-2D convolutional neural network (CNN) was used for classification of non-AMD, non-neovascular AMD, and neovascular AMD eyes.
Abstract: With the availability of, and increasing reliance on, the noninvasive optical coherence tomography angiography (OCTA) imaging technique for detection of vascular diseases of the retina, such as age-related macular degeneration (AMD), clinicians now have access to more data than they can effectively parse and digest. Artificial intelligence in the form of convolutional neural networks (CNNs) has shown successful detection of AMD vs. no AMD from fundus images as well as from OCT structural images. In this work, we address a novel classification problem: automated detection of the late stage of the disease, neovascular AMD, visualized through the presence of choroidal neovascularization (CNV) and its sequelae. We describe hybrid 3D-2D CNNs that achieve accuracy up to 77.8% at multi-class categorical classification of non-AMD eyes, eyes having non-neovascular AMD, and eyes having neovascular AMD, offering a first-of-its-kind deep learning approach for differentiating progression in AMD.

8 citations


Journal ArticleDOI
TL;DR: In this article, the authors developed and evaluated methods to improve the generalizability of convolutional neural networks (CNNs) trained to detect glaucoma from optical coherence tomography retinal nerve fiber layer probability maps.
Abstract: Purpose To develop and evaluate methods to improve the generalizability of convolutional neural networks (CNNs) trained to detect glaucoma from optical coherence tomography retinal nerve fiber layer probability maps, as well as optical coherence tomography circumpapillary disc (circle) b-scans, and to explore impact of reference standard (RS) on CNN accuracy. Methods CNNs previously optimized for glaucoma detection from retinal nerve fiber layer probability maps, and newly developed CNNs adapted for glaucoma detection from optical coherence tomography b-scans, were evaluated on an unseen dataset (i.e., data collected at a different site). Multiple techniques were used to enhance CNN generalizability, including augmenting the training dataset, using multimodal input, and training with confidently rated images. Model performance was evaluated with different RS. Results Training with data augmentation and training on confident images enhanced the accuracy of the CNNs for glaucoma detection on a new dataset by 5% to 9%. CNN performance was optimal when a similar RS was used to establish labels both for the training and the testing sets. However, interestingly, the CNNs described here were robust to variation in the RS. Conclusions CNN generalizability can be improved with data augmentation, multiple input image modalities, and training on images with confident ratings. CNNs trained and tested with the same RS achieved best accuracy, suggesting that choosing a thorough and consistent RS for training and testing improves generalization to new datasets. Translational relevance Strategies for enhancing CNN generalizability and for choosing optimal RS should be standard practice for CNNs before their deployment for glaucoma detection.
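The data-augmentation ingredient named in the Methods can be sketched generically: random flips, small translations, and additive noise applied to a 2D input map. This is a hedged illustration of the general technique, not the paper's exact augmentation set; the flip probability, shift range, and noise level are assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)

def augment(image, flip_p=0.5, noise_sd=0.02, max_shift=4):
    """Simple augmentations of the kind used to improve CNN
    generalization: random horizontal flip, small translation, and
    additive Gaussian noise. image: (H, W) array with values in [0, 1]."""
    out = image.copy()
    if rng.random() < flip_p:
        out = out[:, ::-1]                     # horizontal flip
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    out = np.roll(out, (dy, dx), axis=(0, 1))  # small circular shift
    out = out + rng.normal(scale=noise_sd, size=out.shape)
    return np.clip(out, 0.0, 1.0)

img = rng.random((64, 64))   # stand-in for an RNFL probability map
aug = augment(img)
```

Applying a fresh `augment` call per training example effectively enlarges the training set, one of the generalization strategies the abstract reports.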

6 citations


Journal ArticleDOI
TL;DR: The study of avalanches in human brain activity provides a tool to assess cognitive variability and is supportive of the emerging theoretical idea that the dynamics of an active human brain operate close to a critical-like region and not a singular critical state.
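Avalanche analysis of the kind referenced here is commonly operationalized by thresholding a population-activity time series and treating each contiguous suprathreshold run as one avalanche whose size is the summed activity. A minimal sketch on synthetic data (the threshold choice and toy signal are assumptions, not the study's):

```python
import numpy as np

rng = np.random.default_rng(4)

def avalanche_sizes(activity, threshold):
    """Segment a population-activity time series into avalanches:
    contiguous runs above threshold. Size = summed suprathreshold
    activity within each run."""
    sizes, current = [], 0.0
    for a in activity:
        if a > threshold:
            current += a
        elif current > 0.0:
            sizes.append(current)
            current = 0.0
    if current > 0.0:
        sizes.append(current)
    return np.array(sizes)

sig = np.abs(rng.normal(size=5000))           # toy population activity
sizes = avalanche_sizes(sig, threshold=np.median(sig))
```

Criticality analyses then examine whether the distribution of `sizes` follows a power law and how its exponent varies across cognitive states.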

5 citations


Proceedings ArticleDOI
01 Jun 2021
TL;DR: This article proposed an automated video editing model called contextual and multimodal video editing (CMVE), which leverages visual and textual metadata describing videos, integrating essential information from both modalities, and uses a learned editing style from a single example video to coherently combine clips.
Abstract: We propose an automated video editing model, which we term contextual and multimodal video editing (CMVE). The model leverages visual and textual metadata describing videos, integrating essential information from both modalities, and uses a learned editing style from a single example video to coherently combine clips. The editing model is useful for tasks such as generating news clip montages and highlight reels given a text query that describes the video storyline. The model exploits the perceptual similarity between video frames, objects in videos and text descriptions to emulate coherent video editing. Amazon Mechanical Turk participants made judgements comparing CMVE to expert human editing. Experimental results showed no significant difference in the CMVE vs human edited video in terms of matching the text query and the level of interest each generates, suggesting CMVE is able to effectively integrate semantic information across visual and textual modalities and create perceptually coherent quality videos typical of human video editors. We publicly release an online demonstration of our method.
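One building block such a model needs, matching clips to a text query by embedding similarity, can be sketched simply: rank candidate clips by cosine similarity between the query embedding and per-clip embeddings. The sketch below uses random vectors in place of real visual/textual embeddings; it illustrates only this matching step, not CMVE itself.

```python
import numpy as np

rng = np.random.default_rng(5)

def rank_clips(query_vec, clip_vecs):
    """Rank candidate clips by cosine similarity between a text-query
    embedding and per-clip (visual + textual) embeddings -- the kind
    of cross-modal matching an automated editor can use to pick clips
    for a storyline."""
    q = query_vec / np.linalg.norm(query_vec)
    C = clip_vecs / np.linalg.norm(clip_vecs, axis=1, keepdims=True)
    sims = C @ q
    return np.argsort(-sims), sims

query = rng.normal(size=32)
clips = rng.normal(size=(10, 32))
clips[7] = query + 0.1 * rng.normal(size=32)  # one clip matches the query
order, sims = rank_clips(query, clips)        # order[0] picks the matching clip
```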

5 citations



Journal ArticleDOI
TL;DR: Using a spatiospectral-based inter- and intra-network connectivity analysis, it is found that improvisers showed a variety of differences in connectivity within and between large-scale cortical networks compared to classically trained musicians, as a function of deviant type.

1 citations


Journal ArticleDOI
TL;DR: In this article, a deep recurrent neural network (RNN) model was used to predict pupil dilation in real-world environments using sequence measures such as fixation position, duration, saccades, and blink-related information.
Abstract: There is increasing interest in how the pupil dynamics of the eye reflect underlying cognitive processes and brain states. Problematic, however, is that pupil changes can be due to non-cognitive factors, for example luminance changes in the environment, accommodation and movement. In this paper we consider how, by modeling the response of the pupil in real-world environments, we can capture the non-cognitive related changes and remove these to extract a residual signal which is a better index of cognition and performance. Specifically, we utilize sequence measures such as fixation position, duration, saccades, and blink-related information as inputs to a deep recurrent neural network (RNN) model for predicting subsequent pupil diameter. We build and evaluate the model for a task where subjects are watching educational videos and subsequently asked questions based on the content. Compared to commonly-used models for this task, the RNN had the lowest error rates in predicting subsequent pupil dilation given sequence data. Most important was how the model output related to subjects' cognitive performance as assessed by a post-viewing test. Consistent with our hypothesis that the model captures non-cognitive pupil dynamics, we found (1) the model's root-mean square error was less for lower performing subjects than for those having better performance on the post-viewing test, (2) the residuals of the RNN (LSTM) model had the highest correlation with subject post-viewing test scores and (3) the residuals had the highest discriminability (assessed via area under the ROC curve, AUC) for classifying high and low test performers, compared to the true pupil size or the RNN model predictions. This suggests that deep learning sequence models may be good for separating components of pupil responses that are linked to luminance and accommodation from those that are linked to cognition and arousal.
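Finding (3), residual discriminability measured by AUC, can be sketched with the rank-sum identity for the area under the ROC curve. The synthetic "residual power" scores and the assumed separation between high and low performers below are illustrative, not the study's data:

```python
import numpy as np

rng = np.random.default_rng(6)

def auc(scores, labels):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U)
    identity: AUC = (R_pos - n_pos(n_pos+1)/2) / (n_pos * n_neg)."""
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    pos = labels == 1
    n_pos, n_neg = pos.sum(), (~pos).sum()
    return (ranks[pos].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

# Toy setup: residual = true pupil minus model prediction; suppose
# high test performers carry a larger cognition-linked residual.
n = 200
labels = rng.integers(0, 2, size=n)            # 1 = high test performer
residual_power = rng.normal(size=n) + 1.5 * labels
score = auc(residual_power, labels)            # well above the 0.5 chance level
```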

Posted ContentDOI
25 Feb 2021-bioRxiv
TL;DR: In this paper, a spatiospectral-based inter- and intra-network connectivity analysis was performed on a group of musicians who spanned a range of improvisational and classically trained experience, and the authors found that improvisers showed a variety of differences in connectivity within and between large-scale cortical networks compared to classically trained musicians.
Abstract: Musical improvisers are trained to categorize certain musical structures into functional classes, which is thought to facilitate improvisation. Using a novel auditory oddball paradigm (Goldman et al., 2020), which enables us to dissociate a deviant (i.e. a musical chord inversion) from a consistent functional class, we recorded scalp EEG from a group of musicians who spanned a range of improvisational and classical training. Using a spatiospectral-based inter- and intra-network connectivity analysis, we found that improvisers showed a variety of differences in connectivity within and between large-scale cortical networks compared to classically trained musicians, as a function of deviant type. Inter-network connectivity in the alpha band, for a time window leading up to the behavioural response, was strongly linked to improvisation experience, with the default mode network acting as a hub. Spatiospectral networks post response were substantially different between improvisers and classically trained musicians, with greater inter-network connectivity (specific to the alpha and beta bands) seen in improvisers, whereas those with more classical training had largely reduced inter-network activity (mostly in the gamma band). More generally, we interpret our findings in the context of network-level correlates of expectation violation as a function of subject expertise, and we discuss how these may generalize to other and more ecologically valid scenarios.
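Band-limited connectivity of the kind used here can be sketched crudely: band-pass each channel to the alpha range, take a rectified-amplitude envelope, and correlate envelopes across channels. The FFT-mask filter and rectification below are simplifications (real pipelines use proper filters and the Hilbert envelope); the sampling rate and toy signals are assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)

def bandpass_fft(x, fs, lo, hi):
    """Crude FFT-domain band-pass: zero all bins outside [lo, hi] Hz."""
    X = np.fft.rfft(x, axis=-1)
    freqs = np.fft.rfftfreq(x.shape[-1], d=1.0 / fs)
    X[..., (freqs < lo) | (freqs > hi)] = 0.0
    return np.fft.irfft(X, n=x.shape[-1], axis=-1)

def alpha_connectivity(signals, fs=250.0):
    """Band-limited amplitude connectivity: correlate alpha-band
    (8-12 Hz) envelopes (approximated here by the rectified filtered
    signal) across channels/networks."""
    alpha = bandpass_fft(signals, fs, 8.0, 12.0)
    return np.corrcoef(np.abs(alpha))

# Two channels sharing an amplitude-modulated alpha source, plus one
# independent noise channel.
t = np.arange(0, 4, 1 / 250.0)
src = np.sin(2 * np.pi * 10 * t) * (1 + 0.5 * np.sin(2 * np.pi * 0.5 * t))
sig = np.vstack([src + 0.2 * rng.normal(size=t.size),
                 src + 0.2 * rng.normal(size=t.size),
                 rng.normal(size=t.size)])
C = alpha_connectivity(sig)   # C[0, 1] high, C[0, 2] near zero
```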