Linear Modeling of Neurophysiological Responses to
Naturalistic Stimuli: Methodological Considerations for
Applied Research
Michael J. Crosse 1,2,*, Nathaniel J. Zuk 3,*, Giovanni M. Di Liberto 4,5, Aaron R. Nidiffer 6, Sophie Molholm 2, and Edmund C. Lalor 6,†

1 X, The Moonshot Factory, Mountain View, CA
2 Department of Pediatrics and Department of Neuroscience, Albert Einstein College of Medicine, Bronx, NY
3 Edmond & Lily Safra Center for Brain Sciences, Hebrew University, Jerusalem, Israel
4 Trinity Centre for Biomedical Engineering, Trinity College Institute of Neuroscience, Dept of Mechanical, Manufacturing and Biomedical Engineering, Trinity College, The University of Dublin, Ireland
5 School of Electrical and Electronic Engineering and UCD Centre for Biomedical Engineering, University College Dublin, Ireland
6 Department of Biomedical Engineering and Department of Neuroscience, University of Rochester, Rochester, NY
Abstract
Cognitive neuroscience has seen an increase in the use of linear modeling techniques for studying the processing of natural, environmental stimuli. The availability of such computational tools has prompted similar investigations in many clinical domains, facilitating the study of cognitive and sensory deficits within an ecologically relevant context. However, studying clinical (and often highly heterogeneous) cohorts introduces an added layer of complexity to such modeling procedures, leading to an increased risk of improper usage of such techniques and, as a result, inconsistent conclusions. Here, we outline some
key methodological considerations for applied research and include worked examples of both simulated
and empirical electrophysiological (EEG) data. In particular, we focus on experimental design, data
preprocessing and stimulus feature extraction, model design, training and evaluation, and interpretation
of model weights. Throughout the paper, we demonstrate how to implement each stage in MATLAB using
the mTRF-Toolbox and discuss how to address issues that could arise in applied cognitive neuroscience
research. In doing so, we highlight the importance of understanding these more technical points for
experimental design and data analysis, and provide a resource for applied and clinical researchers
investigating sensory and cognitive processing using ecologically rich stimuli.
Keywords: temporal response function, TRF, neural encoding, neural decoding, clinical and translational
neurophysiology, electrophysiology, EEG.
* These authors contributed equally to this work.
† E-mail: edmund_lalor@urmc.rochester.edu (E.C.L.)
Introduction
A core focus of cognitive neuroscience is to identify neural correlates of human behavior, with the
intention of understanding cognitive and sensory processing. Such correlates can be used to explicitly
model the functional relationship between some “real world” parameters describing a stimulus or a person’s behavior and the related brain activity. In particular, linear modeling techniques have become ubiquitous within cognitive neuroscience because they provide a means of studying the processing of dynamic sensory inputs such as natural scenes and sounds (Wu et al., 2006; Holdgraf et al., 2017). Unlike
event-related potentials (ERPs) – which are a direct measurement of the average neural response to a
discrete event – linear models seek to capture how changes in a stimulus dimension or cognitive state are
linearly reflected in the recorded brain activity. In other words, we model the outputs as a linear
combination (i.e., weighted sum) of the inputs. This enables researchers to conduct experiments using
ecologically relevant stimuli that are more engaging and more representative of real-world scenarios. This
contrasts with current standard practices in which discrete stimuli are presented repeatedly in a highly
artificial manner. Moreover, the simplicity of linear models enables researchers to interpret the model
weights neurophysiologically, providing insight into the neural encoding process of naturalistic stimuli
(Haufe et al., 2014; Kriegeskorte and Douglas, 2019).
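The “weighted sum of inputs” idea can be made concrete with a small simulation. The sketch below uses Python/NumPy purely for illustration (the toolbox discussed later is MATLAB-based); all signals, the lag window, and the ridge parameter are invented values. It builds a design matrix of time-lagged stimulus samples, generates a response as a weighted sum of those lags plus noise, and recovers the weights with ridge regression, the estimator underlying temporal response function (TRF) analysis.

```python
import numpy as np

rng = np.random.default_rng(0)
fs = 64                   # sampling rate (Hz); illustrative value
n = fs * 60               # one minute of synthetic data
lags = np.arange(16)      # 0-250 ms of time lags at 64 Hz

# Synthetic univariate stimulus feature (stand-in for, e.g., a speech envelope)
stim = rng.standard_normal(n)

# Ground-truth TRF: a damped oscillation over the lag window
w_true = np.sin(2 * np.pi * lags / 16) * np.exp(-lags / 8)

# Design matrix of time-lagged copies of the stimulus
X = np.column_stack([np.roll(stim, L) for L in lags])
X[:lags.max(), :] = 0     # discard samples corrupted by wrap-around

# The linear model: response = weighted sum of lagged inputs, plus noise
resp = X @ w_true + 0.5 * rng.standard_normal(n)

# Ridge regression: w = (X'X + lambda*I)^-1 X'y
lam = 1.0
w_hat = np.linalg.solve(X.T @ X + lam * np.eye(len(lags)), X.T @ resp)

print(round(np.corrcoef(w_true, w_hat)[0, 1], 2))
```

With a minute of clean simulated data the recovered weights correlate almost perfectly with the ground truth; with real EEG, noise and correlated regressors make regularization and cross-validation essential, as discussed in later sections.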
The uptake of linear modeling techniques in cognitive neuroscience has led to a similar adoption in the
applied and translational neurosciences. This has greatly facilitated the study of naturalistic sensory
processing in various clinical cohorts such as individuals with autism spectrum disorder (Frey et al., 2013)
and dyslexia (Power et al., 2013; Di Liberto et al., 2018b). However, studying clinical cohorts raises
important issues when constructing and interpreting linear models. For example, particular care is
required when performing group comparisons of the model weights and evaluating model performance.
Furthermore, linear modeling poses challenges and considerations that are not typical for other types of
electrophysiology analysis. As a model, it is meant first and foremost to quantify the functional
relationship between the stimulus features of interest and the recorded neural response. Modeling
electrophysiological data is non-trivial because neighboring time samples and channels are not
independent of each other, so standard methods for quantifying the significance of the fit cannot be used.
Furthermore, the interpretation of the results must carefully take into account the particular preprocessing steps applied, which can have major effects on the response patterns obtained with linear modeling, especially with respect to filtering, normalization and stimulus representation (Holdgraf et al.,
2017; de Cheveigné and Nelken, 2019). Here, we wish to provide guidance and intuition on such
procedures and, in particular, to promote best practices in applying these methods in clinical studies.
In this review, we will step through the stages involved in designing and implementing neuroscientific
experiments with linear modeling in mind. First, we discuss experimental design considerations for
optimizing model performance. Second, we discuss data preprocessing and stimulus feature extraction
techniques relevant to linear modeling. Third, we discuss model design choices and their use cases.
Fourth, we review how to appropriately train and test models as well as evaluate the significance of model
performance. Fifth, we discuss considerations for comparing models generated using multiple stimulus
representations. Sixth, we discuss the neurophysiological interpretation of linear model weights. Finally,
we discuss what can go wrong when using linear models for applied neurophysiology research.
In each section, via an example experiment, we will also introduce issues that are relevant to clinical
research. Because linear modeling is commonly used to study the neural processing of natural speech (for
reviews, see Ding and Simon, 2014; Holdgraf et al., 2017; Obleser and Kayser, 2019), these examples are
based on a speech study previously conducted by some of the authors, but the methods we describe
generalize to many other clinical groups, paradigms, and stimulus types. The researcher should modify
the experimental design, preprocessing and model design steps according to their own research
questions. Likewise, our focus will be on the linear modeling of EEG data, but these methods can be
applied to other neurophysiological data types, such as MEG, ECoG and fMRI. When discussing model
implementation, we will make specific reference to the mTRF-Toolbox, which can be found on github
(https://github.com/mickcrosse/mTRF-Toolbox). All functions referenced in this article are from version 3.0. While we do not elaborate on the technical details of the mTRF-Toolbox (for that we point the reader
to Crosse et al. (2016a)), we do provide example code and briefly walk the reader through its
implementation.
Example Experiment
The example experiment we will describe is based on a previous study performed by some of the co-authors of this review (Di Liberto et al., 2018b). Individuals with dyslexia (our clinical group) display a
specific behavioral deficit in the processing of speech sounds (i.e., a phonological deficit), while having
intact general acoustic processing (Vellutino et al., 2004; Di Liberto et al., 2018b). We hypothesize that
observed phonological deficits can be explained by weaker phonetic encoding.
To test our hypothesis, we plan to measure how well phonetic features are represented in the ongoing
brain activity of participants with dyslexia compared to a control group. More specifically, we will quantify
how much a model that represents phonetic features improves the ability to predict EEG data over a
model based on acoustic features alone (i.e., the spectrogram). We hypothesize that the predictive
contribution from the phonetic model is reduced in participants with dyslexia, reflective of impaired
neural tracking of phonetic features, while the contributions of acoustics are comparable between groups.
To be clear, while it is inspired by a real study, the example experiment we discuss in this paper is merely
a toy experiment for didactic purposes.
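The planned analysis, quantifying how much phonetic features improve EEG prediction over acoustics alone, amounts to comparing nested linear models on held-out data. The toy Python sketch below illustrates that logic with fully synthetic data; the feature counts, weights, and noise level are invented, and real analyses would use time-lagged features and cross-validation as described in later sections.

```python
import numpy as np

rng = np.random.default_rng(2)
n_train, n_test = 4000, 1000
n_acoustic, n_phonetic = 8, 4   # e.g., spectrogram bands and phonetic features

def fit_ridge(X, y, lam=1.0):
    # Ridge solution: w = (X'X + lam*I)^-1 X'y
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

# Synthetic features; the "EEG" is driven by both feature sets plus noise
X = rng.standard_normal((n_train + n_test, n_acoustic + n_phonetic))
w = rng.uniform(0.5, 1.5, n_acoustic + n_phonetic)
eeg = X @ w + 2.0 * rng.standard_normal(n_train + n_test)

Xtr, Xte = X[:n_train], X[n_train:]
ytr, yte = eeg[:n_train], eeg[n_train:]

# Acoustic-only model vs. combined acoustic + phonetic model
w_a = fit_ridge(Xtr[:, :n_acoustic], ytr)
w_ap = fit_ridge(Xtr, ytr)

# Evaluate both on held-out data
r_a = np.corrcoef(Xte[:, :n_acoustic] @ w_a, yte)[0, 1]
r_ap = np.corrcoef(Xte @ w_ap, yte)[0, 1]

# The gain r_ap - r_a indexes the unique predictive contribution of phonetics
print(round(r_a, 3), round(r_ap, 3))
```

In the dyslexia example, the group comparison would be performed on this predictive gain: a smaller gain in the clinical group would be consistent with weaker phonetic encoding, provided the acoustic-only performance is comparable between groups.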
Experimental design
One of the benefits of employing linear models for EEG analysis is the ability to use dynamic and naturalistic stimuli. This allows the experimenter to study sensory processing in an ecologically relevant context and gives researchers the opportunity to design experiments that are more engaging for participants, potentially improving both the quality of the data collected and the reliability of the findings. Certain factors should be considered when designing naturalistic experiments.
Use subject-relevant stimulus material. This is primarily relevant to speech studies and is important for
ensuring subject compliance with the task, particularly when studying younger cohorts and individuals
with neurological disorders or developmental disabilities. For example, when choosing an audiobook or movie, it is important that it is 1) age-relevant (e.g., a children’s story versus an adult’s podcast), 2) content-
relevant (a quantum physics lecture may not be everyone’s cup of tea), and 3) language-relevant (speaker
dialect and even accent may impact early-stage processing across participants/groups differentially). It
may in some situations be necessary to create such content from scratch by recording a native speaker
reading the chosen material aloud. However, there are also publicly available stimulus databases such as
MUSAN: an annotated corpus of continuous speech, music and noise (Snyder et al., 2015), and TCD-TIMIT:
a phonetically rich corpus of continuous audiovisual speech (Harte and Gillen, 2015).
Use a well-balanced stimulus set. It is important to consider the frequency of occurrence of particular
stimulus features that are relevant to the study (e.g., spectral or phonetic features). For example, choosing
stimulus material that contains only a few instances of particular phonemes will make it difficult to reliably
model the neural response to such phonemes without overfitting to the noise on those examples. This
can be avoided by employing phonetically balanced stimuli, such as the aforementioned TCD-TIMIT corpus
(Harte and Gillen, 2015), or in a post hoc manner by focusing the analysis on a subset of the data, i.e., only
the features that are equally represented or only the time segments where the stimuli are well balanced.
It is also best to work with longer stimuli that are preferably broadband or quasi-periodic (e.g., speech or
music recordings). Linear modeling can produce ambiguous results if the stimulus is perfectly periodic: periodicity can produce artificially periodic-looking evoked responses and also makes it more difficult to quantify the accuracy of the model.
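The ambiguity introduced by periodic stimuli can be seen directly in the lagged design matrix: time-shifted copies of a pure sinusoid are linearly dependent, so the regression has no unique solution, whereas a broadband stimulus yields a full-rank design. A small Python check (illustrative sampling rate, frequency, and lag window only):

```python
import numpy as np

fs = 64                         # sampling rate (Hz)
t = np.arange(fs * 10) / fs     # 10 s of samples
lags = np.arange(16)            # 0-250 ms of time lags

def lagged_design(stim, lags):
    # Design matrix of time-lagged stimulus copies, wrap-around zeroed
    X = np.column_stack([np.roll(stim, L) for L in lags])
    X[:lags.max(), :] = 0
    return X

# Perfectly periodic stimulus: an 8 Hz sinusoid
periodic = np.sin(2 * np.pi * 8 * t)
# Broadband stimulus: white noise
rng = np.random.default_rng(1)
broadband = rng.standard_normal(len(t))

# All shifted sinusoids lie in the 2-D space spanned by one sine and one
# cosine, so the periodic design is massively rank-deficient
rank_periodic = np.linalg.matrix_rank(lagged_design(periodic, lags))
rank_broadband = np.linalg.matrix_rank(lagged_design(broadband, lags))
print(rank_periodic, rank_broadband)   # periodic rank is 2, broadband is 16
```

In the rank-deficient case, infinitely many weight vectors fit the data equally well, which is why the resulting model weights cannot be interpreted; broadband or quasi-periodic stimuli avoid this degeneracy.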
In addition, to enhance the model’s ability to disambiguate these response types and better generalize to
novel stimuli, one might consider how to incorporate additional acoustic variability in one’s stimuli,
independent of the linguistic content. This could be accomplished by including multiple speakers with
substantially different spectral profiles (e.g., both male and female speakers), as well as speakers who
provide a more dynamic range in prosody and intonation across the speech content (e.g., trained actors
or media presenters). Models that are trained on a broader range of stimuli are less likely to overfit to
stimulus features that are not of interest to the researcher (such as speaker identity, sex, or location), but
may perform slightly worse on average. Such decisions should be based on the researcher’s overall goals.
When considering your stimuli, we also suggest adopting an open mind with respect to possible future
analyses. Choosing materials that are rich in other features that can be modeled (e.g., semantic content,
prosody, temporal statistics) can provide fruitful opportunities for re-using your data to tackle new
questions beyond those planned in your current study (fans of Dr. Seuss and James Joyce beware!).
Collect enough training data. In order to train a model that generalizes well to new data, it is crucial to consider how much training data is required, or in other words, how much stimulus material is necessary. For most purposes, we recommend collecting a minimum of 10 to 20 minutes of data
per condition, although more data may be required for larger, multivariate models (e.g., spectrogram
models) or when features are sparsely represented (e.g., the onsets of content words). While it is feasible
to construct high-quality models from many short (<5 s) stimulus sequences, such as individual words or
sentences, it is preferable to use longer (>30 s) stimulus passages because doing so reduces the number of large
stimulus onset responses in the neural data, which tend to obscure feature-specific responses of interest
(see EEG preprocessing for tips on avoiding this).
While more data is always desirable for model training, longer recording sessions can cause subject
fatigue, compromising their ability to concentrate, particularly in children, older adults, or clinical cohorts.
Reduced attentional states can negatively impact the neural tracking of stimuli and as a result model