
Using connectome-based predictive modeling to predict individual behavior from brain connectivity.

Xilin Shen¹, Emily S. Finn², Dustin Scheinost¹, Monica D. Rosenberg³, Marvin M. Chun²,³,⁴, Xenophon Papademetris¹,⁵, and R. Todd Constable¹,²,⁶,*

¹ Department of Radiology and Biomedical Imaging, Yale School of Medicine, New Haven CT, USA
² Interdepartmental Neuroscience Program, Yale School of Medicine, New Haven CT, USA
³ Department of Psychology, Yale University, New Haven CT, USA
⁴ Department of Psychology, Yale University, New Haven CT, USA
⁵ Department of Biomedical Engineering, Yale University, New Haven CT, USA
⁶ Department of Neurosurgery, Yale School of Medicine, New Haven CT, USA
Abstract
Neuroimaging is a fast-developing research area in which anatomical and functional images of human brains are collected using techniques such as functional magnetic resonance imaging (fMRI), diffusion tensor imaging (DTI), and electroencephalography (EEG). Technical advances and large-scale datasets have allowed for the development of models capable of predicting individual differences in traits and behavior using brain connectivity measures derived from neuroimaging data. Here, we present connectome-based predictive modeling (CPM), a data-driven protocol for developing predictive models of brain-behavior relationships from connectivity data using cross-validation. This protocol includes the following steps: 1) feature selection, 2) feature summarization, 3) model building, and 4) assessment of prediction significance. We also include suggestions for visualizing the most predictive features (i.e., brain connections). The final result should be a generalizable model that takes brain connectivity data as input and generates predictions of behavioral measures in novel subjects, accounting for a considerable amount of the variance in these measures. It has been demonstrated that the CPM protocol performs as well as or better than many of the existing approaches in brain-behavior prediction. Because CPM focuses on linear modeling and a purely data-driven approach, neuroscientists with limited or no experience in machine learning or optimization should find the protocol easy to implement. Depending on the volume of data to be processed, the protocol can take 10–100 minutes for model building, 1–48 hours for permutation testing, and 10–20 minutes for visualization of results.

* Corresponding Author: R. Todd Constable, todd.constable@yale.edu.
Author contributions statement. XS, ESF, DS, XP, and RTC conceptualized the study. XS developed this protocol with help from ESF and DS. ESF developed the prediction framework with help from XS and MDR. ESF, XP, and XS contributed previously unpublished tools. XP developed the online visualization tools with help from XS and DS. XP, MMC, and RTC provided support and guidance with data interpretation. All authors made significant comments on the manuscript.
Supplementary Information
Supplementary Table 1
Supplementary Table 2
HHS Public Access
Author manuscript
Nat Protoc. Author manuscript; available in PMC 2018 March 01.
Published in final edited form as: Nat Protoc. 2017 March; 12(3): 506–518. doi:10.1038/nprot.2016.178.
INTRODUCTION
Establishing the relationship between individual differences in brain structure and function
and individual differences in behavior is a major goal of modern neuroscience. Historically,
many neuroimaging studies of individual differences have focused on establishing
correlational relationships between brain measurements and cognitive traits such as
intelligence, memory, and attention, or disease symptoms.
Note, however, that the term “predicts” is often used loosely as a synonym for “correlates with”: for example, it is common to say that brain property x “predicts” behavioral variable y, where x may be an fMRI-derived measure of univariate activity or functional connectivity, and y may be a measure of task performance, symptom severity, or another continuous variable. Yet, in the strict sense of the word, this is not prediction but correlation. Correlation and similar regression models tend to overfit the data and, as a result, often fail to generalize to novel data. The vast majority of brain-behavior studies do not perform cross-validation, which makes it difficult to evaluate the generalizability of their results. In the worst case, circularity in selection and selective analysis can lead to completely erroneous results, as Kriegeskorte et al. [1] demonstrated. Proper cross-validation is key to ensuring independence between feature selection and prediction or classification, thus eliminating spurious effects and incorrect population-level inferences [2]. There are at least two important reasons to test the predictive power of brain-behavior correlations discovered in the course of basic neuroimaging research:
1. From the standpoint of scientific rigor, cross-validation is a more conservative way than correlation to infer the presence of a brain-behavior relationship. Cross-validation is designed to protect against overfitting by testing the strength of the relationship in a novel sample, increasing the likelihood of replication in future studies.
2. From a practical standpoint, establishing predictive power is necessary to translate neuroimaging findings into tools with practical utility [3]. fMRI has struggled as a diagnostic tool in part because its results generalize poorly to novel subjects. Testing and reporting performance in independent samples will facilitate evaluation of a result's generalizability and the eventual development of useful neuroimaging-based biomarkers with real-world applicability.
Nevertheless, the design and construction of predictive models remain a challenge. Recently, we developed connectome-based predictive modeling (CPM), a method with built-in cross-validation for extracting and summarizing the most relevant features from brain connectivity data in order to construct predictive models [4]. Using both resting-state and task-based fMRI, we have shown that cognitive traits such as fluid intelligence and sustained attention can be successfully predicted in novel subjects using this method [4,5]. Although CPM was developed with fMRI-derived functional connectivity as the input, we believe it could be adapted to work with structural connectivity data measured with diffusion tensor imaging (DTI) or related methods, or with functional connectivity data derived from other modalities such as electroencephalography (EEG).
Here, we present a protocol for developing predictive models of brain-behavior relationships
from connectivity data using CPM, which includes the following steps: 1) feature selection,
2) feature summarization, 3) model building and application, and 4) assessment of prediction
significance. We also include suggestions for visualization of results. This protocol is
designed to serve as a framework illustrating how to construct and test predictive models,
and to encourage investigators to perform these types of analyses.
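The four steps above can be sketched in Python with NumPy. This is a minimal, hypothetical sketch, not the authors' Matlab implementation. It assumes each subject's connectivity matrix has already been vectorized into one row of an `edges` array (subjects × edges); it selects edges whose Pearson correlation with behavior in the training subjects exceeds a hypothetical threshold `r_thresh` (an illustrative stand-in for whatever selection criterion is chosen); it summarizes each subject by the summed strength of the selected positively and negatively correlated edges (within a fold, the mean differs from the sum only by a constant factor, so the fitted linear model is equivalent); and it fits y = mx + b with leave-one-subject-out cross-validation. The permutation p-value convention (counting the observed result as one permutation) is one common choice, assumed here.

```python
import numpy as np


def edge_behavior_corr(edges, behavior):
    """Pearson r between each edge (column of `edges`) and `behavior`."""
    x = edges - edges.mean(axis=0)
    y = behavior - behavior.mean()
    with np.errstate(invalid="ignore", divide="ignore"):
        # Zero-variance edges yield NaN, which never passes the threshold below.
        return (x * y[:, None]).sum(axis=0) / np.sqrt((x ** 2).sum(axis=0) * (y ** 2).sum())


def cpm_loocv(edges, behavior, r_thresh=0.3):
    """Leave-one-subject-out CPM sketch.

    edges: (n_subjects, n_edges) array; behavior: (n_subjects,) array.
    Returns out-of-sample predictions from the positive and negative networks.
    """
    n_subjects = edges.shape[0]
    pred_pos = np.zeros(n_subjects)
    pred_neg = np.zeros(n_subjects)
    for test in range(n_subjects):
        train = np.arange(n_subjects) != test  # every subject except the held-out one
        # Step 1: feature selection, using training subjects only.
        r = edge_behavior_corr(edges[train], behavior[train])
        for mask, pred in ((r > r_thresh, pred_pos), (r < -r_thresh, pred_neg)):
            if not mask.any():
                # No edge passed the threshold: fall back to the training mean.
                pred[test] = behavior[train].mean()
                continue
            # Step 2: feature summarization (summed strength of selected edges).
            strength = edges[:, mask].sum(axis=1)
            # Step 3: fit y = m*x + b on training subjects, apply to the held-out one.
            m, b = np.polyfit(strength[train], behavior[train], deg=1)
            pred[test] = m * strength[test] + b
    return pred_pos, pred_neg


def permutation_p(edges, behavior, n_perm=100, r_thresh=0.3, seed=0):
    """Step 4: compare the observed r(predicted, observed) for the positive
    network with r values obtained after shuffling behavior across subjects."""
    rng = np.random.default_rng(seed)
    r_obs = np.corrcoef(cpm_loocv(edges, behavior, r_thresh)[0], behavior)[0, 1]
    null = np.empty(n_perm)
    for i in range(n_perm):
        shuffled = rng.permutation(behavior)
        null[i] = np.corrcoef(cpm_loocv(edges, shuffled, r_thresh)[0], shuffled)[0, 1]
    # Counting the observed value as one permutation keeps p strictly above zero.
    return r_obs, (np.sum(null >= r_obs) + 1) / (n_perm + 1)
```

For example, `permutation_p(edges, behavior, n_perm=1000)` returns the observed prediction correlation and its permutation p-value. Because every permutation reruns the entire cross-validation loop, this step dominates run time, consistent with the hours quoted for permutation testing in the abstract.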
Development of the protocol
In this protocol, we describe an algorithm to build predictive models based on a set of
single-subject connectivity matrices, and test these models using cross-validation on novel
data (shown as a schematic in Figure 1). We also discuss a number of options in model
building, including selecting features from pre-defined networks rather than from the whole
brain. We then describe how to assess the significance of predictive power using permutation tests. Finally, we provide examples of how to visualize the features (in this case, brain connections) that contribute the most predictive power. This protocol has been designed for users familiar with connectivity analysis and neuroimaging data processing. Data preprocessing and related issues are outside the scope of this protocol, as the methods presented in Finn et al. [4] and Rosenberg et al. [5] generalize to any set of connectivity matrices. Therefore, we assume that individual data have been fully preprocessed and that the input to this protocol is a set of M × M connectivity matrices, where M is the number of distinct brain regions, or nodes, under consideration, and each element of a matrix is a continuous value representing the strength of the connection between two nodes.
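A minimal NumPy sketch of preparing that input follows. It assumes the matrices are symmetric (undirected connectivity) and that self-connections on the diagonal are ignored, so each subject's M × M matrix is reduced to its M(M-1)/2 upper-triangle values, giving the subjects × edges array on which feature selection operates. The function name `vectorize_connectomes` is hypothetical.

```python
import numpy as np


def vectorize_connectomes(matrices):
    """Stack M x M connectivity matrices into a (subjects x edges) array.

    Only the upper triangle above the diagonal is kept: a symmetric matrix
    stores each node pair twice, and the diagonal holds self-connections.
    """
    stack = np.asarray(matrices, dtype=float)  # shape (n_subjects, M, M)
    if stack.ndim != 3 or stack.shape[1] != stack.shape[2]:
        raise ValueError("expected a sequence of square M x M matrices")
    if not np.allclose(stack, stack.transpose(0, 2, 1)):
        raise ValueError("connectivity matrices are assumed to be symmetric")
    rows, cols = np.triu_indices(stack.shape[1], k=1)
    return stack[:, rows, cols]  # shape (n_subjects, M*(M-1)/2)
```

Keeping the `np.triu_indices(M, k=1)` ordering fixed matters later, because the same ordering is needed to map selected edges back onto node pairs for visualization.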
Applications of the method
Human neuroimaging studies routinely collect behavioral variables along with structural and functional imaging. Additionally, open-source datasets, including the Human Connectome Project (HCP) [6], the NKI-Rockland sample [7], the ADHD-200 [8], and the Philadelphia Neurodevelopmental Cohort (PNC) [9], include large samples of subjects (N > 500) with both imaging data and many behavioral variables. Therefore, vast amounts of data exist to explore which brain connections predict individual differences in behavior. Further, as demonstrated in Rosenberg et al. [5], these open-source datasets can be pooled or combined with local datasets to test whether a predictive model generalizes across different scanners, different subject populations, and even different measures of the underlying phenotype of interest.
We have applied the CPM protocol in our research and demonstrated robust relationships between brain connectivity and fluid intelligence in Finn et al. [4] and between brain connectivity and sustained attention in Rosenberg et al. [5]. Here, we aim to provide a user-friendly guide for predicting a behavioral variable in novel subjects from connectivity data. The models described in this protocol offer a rigorous way to establish a brain-behavior relationship using cross-validation.

Comparison with other methods
The strengths of CPM include its use of linear operations and its purely data-driven approach. Linear operations allow for fast computation (for example, roughly 60 seconds to run leave-one-subject-out cross-validation on 100 subjects), easy software implementation (<100 lines of Matlab code), and straightforward interpretation of feature weights. Although state-of-the-art brain parcellation methods typically divide the brain into ~300 regions, resulting in ~45,000 unique connections, or edges [10–13], many hypothesis-driven approaches focus on a single edge, region, or network of interest. These approaches ignore a large number of connections and may limit predictive power. In contrast, CPM searches for the most relevant features (edges) across the whole brain and summarizes these selected features for prediction.
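Assuming undirected connections and no self-connections, the edge count quoted above follows from counting node pairs in an M-node parcellation:

```latex
\binom{M}{2} = \frac{M(M-1)}{2}, \qquad
\binom{300}{2} = \frac{300 \times 299}{2} = 44{,}850 \approx 45{,}000
```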
The simplest and most popular methods for establishing brain-behavior relationships using neuroimaging data are correlation and regression models [14]. As mentioned in the Introduction, these methods often overfit the data, which limits their generalizability to novel data. These correlational relationships are often tested in a priori regions of interest, but they may also be tested in a whole-brain, data-driven manner. Importantly, a cross-validated approach helps guard against the potential for false positives inherent in a whole-brain, data-driven analysis, and removes the need for traditional correction for multiple comparisons.
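To make the overfitting point concrete, a small simulation (hypothetical data, not from the paper) compares a circular analysis, in which edges are selected using all subjects and the resulting summary is correlated with behavior in those same subjects, against a leave-one-out estimate in which selection never sees the held-out subject. Edges and behavior are pure noise here, so any apparent relationship is spurious. The selection rule (positive edges with r above a hypothetical `r_thresh`) is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(1)
n_subjects, n_edges, r_thresh = 50, 2000, 0.3
edges = rng.normal(size=(n_subjects, n_edges))  # noise "connectivity"
behavior = rng.normal(size=n_subjects)          # unrelated noise "behavior"


def corr_with(x, y):
    """Pearson r between each column of x and the vector y."""
    xc, yc = x - x.mean(axis=0), y - y.mean()
    return (xc * yc[:, None]).sum(axis=0) / np.sqrt((xc ** 2).sum(axis=0) * (yc ** 2).sum())


# Circular analysis: edges are selected and evaluated on the same subjects.
selected = corr_with(edges, behavior) > r_thresh
circular_r = np.corrcoef(edges[:, selected].sum(axis=1), behavior)[0, 1]

# Cross-validated analysis: selection and model fitting exclude the test subject.
predictions = np.empty(n_subjects)
for test in range(n_subjects):
    train = np.arange(n_subjects) != test
    sel = corr_with(edges[train], behavior[train]) > r_thresh
    strength = edges[:, sel].sum(axis=1)
    slope, intercept = np.polyfit(strength[train], behavior[train], deg=1)
    predictions[test] = slope * strength[test] + intercept
cv_r = np.corrcoef(predictions, behavior)[0, 1]

print(f"circular r = {circular_r:.2f}, cross-validated r = {cv_r:.2f}")
```

The circular estimate is large even though no true relationship exists, whereas the cross-validated estimate stays near or below zero.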
The most directly comparable method to CPM may be the multivariate-prediction and univariate-regression method used by the HCP Netmats MegaTrawl release [15] (https://db.humanconnectome.org/megatrawl/index.html). This set of algorithms uses independent component analysis (ICA) and partial correlation to generate connectivity matrices from resting-state fMRI data. These matrices are then related to behavior using elastic-net feature selection and prediction with 10-fold cross-validation (an inner loop for parameter optimization and an outer loop for prediction evaluation). The main differences between this approach and the proposed CPM approach are: (1) use of group-wise ICA to derive subject-specific functional brain subunits (and associated time courses) versus use of an existing functional brain atlas registered to each subject; (2) use of partial correlation versus Pearson correlation to measure connectivity; (3) use of the elastic-net algorithm versus Pearson correlation with the behavioral measure to select meaningful edges; and (4) use of the elastic-net algorithm for prediction versus use of a linear model on mean connectivity strength. This approach is computationally more complex and requires substantial expertise in optimization. We focus on a purely linear model that can be easily implemented with basic programming skills. Although no direct comparison has been made, both methods perform similarly for predicting fluid intelligence (see Finn et al. [4] and Smith et al. [16]).
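Difference (2) can be illustrated with a short sketch on simulated time series (not HCP data). Pearson connectivity is the ordinary correlation matrix of node time courses. One standard way to obtain partial correlations is from the precision matrix P (the inverse covariance), using -P_ij / sqrt(P_ii P_jj). This unregularized estimate is for illustration only and is not necessarily the estimator used by the MegaTrawl pipeline; the function names are hypothetical.

```python
import numpy as np


def pearson_connectivity(ts):
    """ts: (timepoints x nodes) array; returns the nodes x nodes Pearson matrix."""
    return np.corrcoef(ts, rowvar=False)


def partial_connectivity(ts):
    """Partial correlation of each node pair, controlling for all other nodes."""
    precision = np.linalg.inv(np.cov(ts, rowvar=False))
    d = np.sqrt(np.diag(precision))
    pcorr = -precision / np.outer(d, d)
    np.fill_diagonal(pcorr, 1.0)  # the formula gives -1 on the diagonal; set to 1
    return pcorr


# Simulated chain A -> B -> C: A and C are related only through B.
rng = np.random.default_rng(0)
a = rng.normal(size=5000)
b = a + rng.normal(size=5000)
c = b + rng.normal(size=5000)
ts = np.column_stack([a, b, c])

print(round(float(pearson_connectivity(ts)[0, 2]), 2))  # theoretical value 1/sqrt(3)
print(round(float(partial_connectivity(ts)[0, 2]), 2))  # theoretical value 0
```

The two measures can disagree sharply for the same node pair: Pearson correlation reports the indirect A-C coupling, whereas partial correlation removes the part explained by B.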
Another alternative method for developing predictive models from brain connectivity data is support vector regression (SVR) [17], an extension of the support vector machine classification framework to continuous data. In this approach, rather than performing mass univariate calculations to select relevant features (edges) and combining these into a single statistic for each subject, a supervised learning algorithm considers all features simultaneously and generates a model that assigns different weights to different features in order to best approximate each observation (distinct behavioral measurement) in the training set. Features from the test subject(s) are then combined using the same weights, and the trained model outputs a predicted behavioral score. See Dosenbach et al. [18] for an example of SVR applied to functional connectivity data to predict subject age. A comparison between CPM and SVR in terms of performance and running time is provided in the Supplementary Information and Supplementary Table 1.
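The contrast between CPM's univariate selection and SVR's joint weighting can be sketched with a linear SVR trained by plain subgradient descent on the ε-insensitive loss. This is a hypothetical, minimal NumPy sketch (the hyperparameters `epsilon`, `lam`, `lr`, and `n_iter` are assumptions); it is not the implementation behind the Supplementary comparison. In practice, a library implementation such as scikit-learn's `SVR` with a linear kernel would normally be used.

```python
import numpy as np


def fit_linear_svr(x, y, epsilon=0.1, lam=1e-3, lr=0.5, n_iter=3000):
    """Minimize lam/2*||w||^2 + mean(max(0, |y - (x @ w + b)| - epsilon))
    by subgradient descent with a 1/sqrt(t) step size."""
    w, b = np.zeros(x.shape[1]), 0.0
    for t in range(n_iter):
        resid = y - (x @ w + b)
        # Subgradient of the epsilon-insensitive loss: nonzero only outside the tube.
        g = np.where(np.abs(resid) > epsilon, -np.sign(resid), 0.0)
        grad_w = lam * w + (g[:, None] * x).mean(axis=0)
        grad_b = g.mean()
        step = lr / np.sqrt(t + 1)
        w -= step * grad_w
        b -= step * grad_b
    return w, b


# Simulated training and test "subjects" (hypothetical data).
rng = np.random.default_rng(0)
n_train, n_test, n_features = 200, 50, 20
w_true = rng.normal(size=n_features)
x_all = rng.normal(size=(n_train + n_test, n_features))
y_all = x_all @ w_true + 0.1 * rng.normal(size=n_train + n_test)

w, b = fit_linear_svr(x_all[:n_train], y_all[:n_train])
y_pred = x_all[n_train:] @ w + b  # same weights applied to unseen test subjects
print(f"held-out r = {np.corrcoef(y_pred, y_all[n_train:])[0, 1]:.3f}")
```

Every feature receives its own weight in `w`, in contrast to CPM, which reduces each selected network to a single summary statistic per subject.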
Finally, many studies have used similar machine-learning techniques in a classification framework to distinguish healthy control participants from patients using connectivity data. Reliable classification of patients has been shown in several disorders, including ADHD [19], autism [20,21], schizophrenia [22], Alzheimer's disease [23], and depression [24]. A fundamental difference between most classification methods and CPM (or the multivariate-prediction and univariate-regression method, or the SVR-based methods described above) is that in classification the outcome variable is discrete (often binary) rather than continuous. Prediction of individual differences in a continuous measure across a healthy sample is considerably more challenging than binary classification of disease state. Variations in behavior among healthy participants generally have substantially smaller effect sizes than differences due to pathology. In addition, accurate prediction of continuous variables requires accurate modeling over the whole range of the variable, whereas accurate binary classification largely requires accurate grouping of participants near the margin. When subsets of participants lie far from the margin, their correct classification is often essentially guaranteed.
While SVR and related multivariate methods can provide good predictive power, in our experience predictions generated using CPM are often as good as or better than those generated using SVR, and CPM has at least two advantages over multivariate methods. First, from a practical standpoint, CPM is simpler to implement and requires less expertise in machine learning. This makes it more accessible to the general neuroimaging community. It is our hope that, in providing this protocol, we can encourage researchers to perform cross-validated analyses of the brain-behavior relationships they discover, which will set more rigorous statistical standards for the field and improve replicability across studies.
The second major advantage of the CPM approach compared to multivariate methods is that the predictive networks obtained by CPM can be clearly interpreted. A frequently overlooked problem in the literature is that interpreting the weights generated by multivariate regression models, even linear ones, is not straightforward [25]. For example, researchers often erroneously equate large weights with greater importance, and nonlinear models are even harder to interpret. CPM allows researchers to rigorously test the predictive value of a brain-behavior relationship while still providing a one-to-one mapping back to the original feature space, so that researchers can visualize and investigate the underlying brain connections contributing to the model. This is critical for comparing results with the existing literature, generating new hypotheses about network structure and function, and advancing our understanding of functional brain organization in general.
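The one-to-one mapping back to the original feature space can be sketched as follows. The sketch assumes edges were vectorized from the upper triangle of each M × M matrix in `np.triu_indices(M, k=1)` order, and that a boolean mask marks the selected edges. How that mask is obtained (for example, by keeping edges selected in every cross-validation fold via `fold_masks.all(axis=0)`) is a hypothetical choice here. The mask is placed back into a symmetric M × M matrix and summarized per node by its degree, i.e., the number of selected edges touching each node. The function names are hypothetical.

```python
import numpy as np


def mask_to_matrix(edge_mask, m):
    """Place a boolean edge mask of length M*(M-1)/2 back into an M x M matrix."""
    rows, cols = np.triu_indices(m, k=1)
    if edge_mask.shape != rows.shape:
        raise ValueError("edge mask length does not match M*(M-1)/2")
    matrix = np.zeros((m, m), dtype=bool)
    matrix[rows, cols] = edge_mask
    matrix[cols, rows] = edge_mask  # mirror so the matrix is symmetric
    return matrix


def node_degree(matrix):
    """Number of selected edges incident on each node."""
    return matrix.sum(axis=1)


# Hypothetical example: 4 nodes, edges ordered (0,1),(0,2),(0,3),(1,2),(1,3),(2,3).
mask = np.array([True, False, True, False, False, True])
adj = mask_to_matrix(mask, 4)
print(node_degree(adj))  # selected edges 0-1, 0-3, 2-3 -> [2 1 1 2]
```

The resulting matrix can be passed to any connectome viewer, and node degree offers a simple way to highlight hub regions of a predictive network.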
Limitations
CPM is based on linear relationships, typically with a slope and an intercept (i.e., y = mx + b). These models may not be optimal for capturing complex, nonlinear relationships between connectivity and behavior. Higher-order polynomial terms could be added to the model (i.e.,