
HAL Id: inria-00588898
https://hal.inria.fr/inria-00588898v2
Submitted on 19 Feb 2012
HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.
Multi-subject dictionary learning to segment an atlas of
brain spontaneous activity
Gaël Varoquaux, Alexandre Gramfort, Fabian Pedregosa, Vincent Michel,
Bertrand Thirion
To cite this version:
Gaël Varoquaux, Alexandre Gramfort, Fabian Pedregosa, Vincent Michel, Bertrand Thirion. Multi-subject dictionary learning to segment an atlas of brain spontaneous activity. Information Processing in Medical Imaging, Gábor Székely, Horst Hahn, Jul 2011, Kaufbeuren, Germany. pp. 562-573, doi:10.1007/978-3-642-22092-0_46. inria-00588898v2

Multi-subject dictionary learning to segment an atlas of brain spontaneous activity

G. Varoquaux (1,2,3), A. Gramfort (2,3), F. Pedregosa (2,3), V. Michel (2,3), B. Thirion (2,3)

1 INSERM U992 Cognitive Neuroimaging unit
2 INRIA, Parietal team, Saclay, France
3 LNAO/NeuroSpin, CEA Saclay, Bât. 145, 91191 Gif-sur-Yvette cedex, France
Abstract. Fluctuations in brain on-going activity can be used to reveal its intrinsic functional organization. To mine this information, we give a new hierarchical probabilistic model for brain activity patterns that does not require an experimental design to be specified. We estimate this model in the dictionary learning framework, learning simultaneously latent spatial maps and the corresponding brain activity time-series. Unlike previous dictionary learning frameworks, we introduce an explicit difference between subject-level spatial maps and their corresponding population-level maps, forming an atlas. We give a novel algorithm using convex optimization techniques to solve efficiently this problem with non-smooth penalties well-suited to image denoising. We show on simulated data that it can recover population-level maps as well as subject specificities. On resting-state fMRI data, we extract the first atlas of spontaneous brain activity and show how it defines a subject-specific functional parcellation of the brain in localized regions.
1 Introduction
The study of intrinsic brain functional organization via distant correlations in the fluctuations of brain signals measured by functional Magnetic Resonance Imaging (fMRI) is receiving increasing interest. In particular, the 1000 Functional Connectomes project aims at parceling the brain in functional regions and then at studying the correlation structure of brain function across these nodes [5]. Independent Component Analysis (ICA) is the most popular data-driven approach to analyze spontaneous activity, as it has been shown to extract interpretable spatial patterns [3] that are reproducible across subjects [9]. They form networks of functional regions that are also found in task-driven studies [23].

From a medical point of view, the development of statistically-controlled analysis of brain spontaneous activity is interesting, as it can lead to new diagnostic or prognostic tools applicable to impaired patients. In particular, correlations in the functional signal between predefined regions have been shown to contain markers of post-stroke functional reorganization [25]. However, inferences drawn from these regions depend on the targeted regions and on their precise delineation. In addition, subject-to-subject local variability, for instance in functional topography, may confound the changes in long-distance interactions.

We address the segmentation of functional regions directly from the fMRI signal. The challenge stems from the lack of salient features in the original signal, as well as the lack of a controlled experimental design to perform model fitting as in task-driven fMRI experiments. In particular, it is difficult to optimize the parameters (dimension and regularization) of the models, hence to obtain an arguably faithful and meaningful representation of this data. ICA tackles these difficulties by estimating a mixing matrix to minimize the mutual information between the resulting spatial components. Departing from ICA, [12] performs segmentation by clustering the time series through a mixture model. However, these approaches lack an explicit noise model and do not take into account the subject-to-subject variability nor the spatial structure of the signal. In this paper, we formulate the problem in the dictionary learning framework and reject observation noise based on the assumption that the relevant patterns are spatially sparse [10, 26], and we focus on the choice of the involved parameters. The paper is organized as follows: we give in section 2 a two-level probabilistic model that involves subject-specific spatial maps as well as population-level latent maps, and in section 3 an associated efficient learning algorithm. In section 4 we describe how to set the model parameters from the data. In section 5, we study different learning schemes on synthetic data with simulated inter-individual variability. Finally, in section 6 we apply the method to learning a detailed population-level atlas of regions describing spontaneous activity as recorded in fMRI.
2 Multi-subject decomposition model for brain activity
Problem statement We consider a dataset of brain signal time series of length n for S subjects, measured on p voxels: {Y^s ∈ ℝ^{n×p}, s = 1…S}. We stipulate that the corresponding 3D images are the observation of k spatial latent factors V^s ∈ ℝ^{p×k}, that characterize functional processes or structured measurement artifacts, and associated time series U^s ∈ ℝ^{n×k}: Y^s ≈ U^s V^{sT}. We are interested in the study of resting state, or on-going activity, for which no experimental design can be used to model time-courses; thus we propose to learn U^s and V^s simultaneously, a problem known as dictionary learning, or linear signal decomposition [7, 17].
Generative model In the case of a multi-subject dataset, we give a hierarchical probabilistic model for dictionary learning. Following the standard dictionary learning model, the data observed for each subject is written as the linear combination of subject-specific dictionary elements, that are spatial maps V^s. For resting-state brain activity, we do not model the loadings U^s themselves, but their covariance.

∀s ∈ {1…S},  Y^s = U^s V^{sT} + E^s,  E^s ∼ N(0, σI),  U^s ∼ N(0, Σ_U)   (1)

In addition, the subject-specific maps V^s are generated from population-level latent factors, the spatial patterns written as brain maps V:

∀s ∈ {1…S},  V^s = V + F^s,  F^s ∼ N(0, ζI)   (2)
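To fix ideas, the sampling process defined by Eq. (1) and (2) can be sketched in a few lines of NumPy. The dimensions, noise levels, and the isotropic choice Σ_U = I below are illustrative placeholders, not values used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
S, n, p, k = 4, 50, 200, 5     # subjects, time points, voxels, latent maps (illustrative)
sigma, zeta = 0.10, 0.05       # observation- and subject-level noise variances (illustrative)

# Population-level maps V, made sparse to mimic the sparsity prior on V
V = rng.normal(size=(p, k)) * (rng.random(size=(p, k)) > 0.8)

Y = []
for s in range(S):
    Fs = np.sqrt(zeta) * rng.normal(size=(p, k))
    Vs = V + Fs                                    # Eq. (2): subject-specific maps
    Us = rng.normal(size=(n, k))                   # loadings, here with Sigma_U = I
    Es = np.sqrt(sigma) * rng.normal(size=(n, p))
    Y.append(Us @ Vs.T + Es)                       # Eq. (1): observed time series

print(len(Y), Y[0].shape)  # 4 (50, 200)
```

Each Y[s] plays the role of one subject's fMRI time series matrix; the estimation problem below is to recover V, the V^s, and the U^s from the Y^s alone.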

Finally, we specify the prior distribution on V: P(V) ∝ exp(−ξ Ω(V)), where Ω is typically a norm or a quasi-norm.
Relation to existing models With ζ = 0, the model identifies V^s with V: all latent factors are the same across subjects. In this context, if the prior on V is un-informative, the model boils down to a principal component analysis (PCA) on the concatenated Y^s. For a Laplace prior, we recover the probabilistic formulation of a standard sparse ℓ1-penalized PCA [22]. More generally, in this framework, sparsity-inducing priors give rise to a family of probabilistic projection models [1]. Our multi-subject model however differs from generalized canonical correlation analysis [16], and its sparse variants [1], as these approaches do not model subject-specific latent factors and thus do not allow for two levels of variance. Note that, for multi-subject studies, non-hierarchical models based on PCA and ICA impose orthogonality constraints on the loadings at the group level, and thus introduce an unnatural constraint on the U^s across the different subjects. ICA can be formulated in a maximum likelihood approach [4] and thus falls in the same general class of non-hierarchical dictionary learning models [17]. However, as ICA disregards explained variance, it leads to improper priors on V and requires the use of a PCA pre-processing step to estimate the noise⁴ [3]. In neuroimaging, multi-subject dictionary learning using a fixed group model (ζ = 0) in combination with ICA is popular, and called concatenated ICA [6]. In the experimental section of this paper, we will focus on the use of proper priors on V based on sparsity-inducing norms, such as the ℓ1 norm. They are known to be efficient in terms of separating signal from noise in the supervised settings [27], and lead to tractable optimizations that are convex, though non-smooth.
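The single-level special case (ζ = 0, one subject, ℓ1 prior on the maps) corresponds to sparse PCA models available off the shelf. For instance, with scikit-learn's SparsePCA (illustrative sizes; this sketch of course ignores the multi-subject hierarchy that is the point of MSDL):

```python
import numpy as np
from sklearn.decomposition import SparsePCA

rng = np.random.default_rng(0)
Y = rng.normal(size=(50, 80))            # one subject: 50 time points, 80 voxels

spca = SparsePCA(n_components=5, alpha=1.0, random_state=0)
U = spca.fit_transform(Y)                # loadings (time series), shape (50, 5)
V = spca.components_.T                   # sparse spatial maps, shape (80, 5)
print(U.shape, V.shape)
```

Here the ℓ1 penalty (alpha) is placed on the components, i.e. the spatial maps, matching the spatial-sparsity assumption above.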
3 Optimization strategy for efficient learning
We now present a new algorithm to efficiently estimate from the data at hand the model specified by Eq. (1) and (2). In the following, we call this problem Multi-Subject Dictionary Learning (MSDL). In the maximum a posteriori (MAP) estimation framework, we learn the parameters from the data by maximizing the sum of the log-likelihood of the data given the model, and penalization terms that express our hierarchical priors. In addition, as the variance of the group-level residuals in Eq. (2) could be arbitrarily shrunk by shrinking the norm of V, we impose an upper bound on the norm of the columns of U^s:

(U^s, V^s)_{s∈{1…S}}, V = argmin_{U^s, V^s, V} E(U^s, V^s, V),  s.t. ‖u^s_l‖₂² ≤ 1   (3)

with  E(U^s, V^s, V) = Σ_{s=1}^{S} [ ½ ‖Y^s − U^s V^{sT}‖²_Fro + µ ‖V^s − V‖²_Fro ] + λ Ω(V),

where µ = σ/ζ and λ = σ/ξ. The optimization problem given by Eq. (3) is not jointly convex in U^s, V^s, and V; however, it is separately convex in V^s and (U^s, V). Our
⁴ There exist noisy ICA approaches, but they all assume that the contribution of the noise to the observed data is small.

optimization strategy relies on alternating optimizations of V^s, U^s, and V, keeping the other parameters constant. In the following we give the mathematical analysis of the optimization procedure; the exact operations are detailed in algorithm 1. Following [18], we use a block coordinate descent to minimize E as a function of U^s. Solving Eq. (3) as a function of V^s corresponds to a ridge regression problem on the variable (V^s − V)^T, the solution of which can be computed efficiently (line 9, algorithm 1). Minimizing E as a function of V corresponds to minimizing Σ_{s=1}^{S} ½ ‖v^s − v‖₂² + (λ/µ) Ω(v) for all column vectors v of V. The solution is a proximal operator [8], as detailed in lemma 1.

Lemma 1.  argmin_v ( Σ_{s=1}^{S} ½ ‖v^s − v‖₂² + γ Ω(v) ) = prox_{(γ/S)Ω}(v̄),  where v̄ = (1/S) Σ_{s=1}^{S} v^s.

The proof of lemma 1 follows from the fact that Σ_{s=1}^{S} ‖v^s − v‖₂² = S ‖v̄ − v‖₂² + Σ_{s=1}^{S} ‖v̄ − v^s‖₂². As the second term on the right-hand side is independent of v, the minimization problem simplifies to minimizing the first term, which corresponds to the problem solved by the proximal operator on v̄.
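For Ω = ℓ1, the proximal operator in lemma 1 is plain soft-thresholding, so the V-update reduces to soft-thresholding the average subject map at level γ/S. A small numerical check of the lemma (sizes and γ arbitrary); since the objective is convex, the closed-form solution can be validated by verifying that random perturbations never decrease the energy:

```python
import numpy as np

def prox_l1(v, t):
    """Soft-thresholding: proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

rng = np.random.default_rng(0)
S, p, gamma = 5, 20, 0.7
vs = rng.normal(size=(S, p))       # one column v^s per subject

def energy(v):
    return 0.5 * ((vs - v) ** 2).sum() + gamma * np.abs(v).sum()

v_star = prox_l1(vs.mean(axis=0), gamma / S)   # lemma 1: prox at gamma/S of the mean

# v_star should be the global minimum of the convex energy:
for _ in range(200):
    dv = 1e-3 * rng.normal(size=p)
    assert energy(v_star + dv) >= energy(v_star)
```

The quadratic term grows as (S/2)‖dv‖² away from the minimizer, so the check is numerically robust.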
Algorithm 1 Solving the optimization problem given in Eq. (3)
Input: {Y^s ∈ ℝ^{n×p}, s = 1,…,S}, the time series for each subject; k, the number of maps; an initial guess for V.
Output: V ∈ ℝ^{p×k}, the group-level spatial maps; {V^s ∈ ℝ^{p×k}}, the subject-specific spatial maps; {U^s ∈ ℝ^{n×k}}, the associated time series.
1:  E_0, E_1, i ← 1 (initialize variables)
2:  V^s ← V, U^s ← Y^s V (V^T V)^{−1}, for s = 1…S
3:  while E_{i−1} − E_i > ε E_{i−1} do
4:    for s = 1 to S do
5:      for l = 1 to k do
6:        Update U^s: u^s_l ← u^s_l + ‖v^s_l‖₂^{−2} (Y^s − U^s V^{sT}) v^s_l  (following [15])
7:        u^s_l ← u^s_l / max(‖u^s_l‖₂, 1)
8:      end for
9:      Update V^s (ridge regression): V^s ← V + (Y^s − U^s V^T)^T U^s (U^{sT} U^s + µI)^{−1}
10:   end for
11:   Update V using lemma 1: V ← prox_{λΩ/(Sµ)} ( (1/S) Σ_{s=1}^{S} V^s )
12:   Compute value of energy: E_i ← E(U^s, V^s, V)
13:   i ← i + 1
14: end while
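Algorithm 1 can be transcribed compactly in NumPy for the case Ω = ℓ1, where the prox of line 11 is soft-thresholding. This is a sketch under simplifying assumptions (random rather than ICA initialization, illustrative µ, λ, ε), not the authors' reference implementation:

```python
import numpy as np

def soft_thresh(V, t):
    return np.sign(V) * np.maximum(np.abs(V) - t, 0.0)

def msdl(Ys, k, mu=1.0, lam=0.1, eps=1e-4, max_iter=50, seed=0):
    """Alternating minimization for Eq. (3) with Omega = l1 (sketch of algorithm 1)."""
    rng = np.random.default_rng(seed)
    S, (n, p) = len(Ys), Ys[0].shape
    V = rng.normal(size=(p, k))                        # random init; the paper uses ICA maps
    Vs = [V.copy() for _ in range(S)]
    Us = [Y @ V @ np.linalg.inv(V.T @ V) for Y in Ys]  # line 2
    energies = []
    for _ in range(max_iter):
        for s in range(S):
            for l in range(k):                         # lines 5-8: block coordinate descent on U^s
                v_l = Vs[s][:, l]
                R = Ys[s] - Us[s] @ Vs[s].T
                u_l = Us[s][:, l] + (R @ v_l) / max(v_l @ v_l, 1e-12)
                Us[s][:, l] = u_l / max(np.linalg.norm(u_l), 1.0)  # line 7: unit-ball projection
            # line 9: ridge-regression update of the subject maps
            G = Us[s].T @ Us[s] + mu * np.eye(k)
            Vs[s] = V + (Ys[s] - Us[s] @ V.T).T @ Us[s] @ np.linalg.inv(G)
        # line 11: proximal update of the group maps (soft-thresholding for l1)
        V = soft_thresh(np.mean(Vs, axis=0), lam / (S * mu))
        # lines 12-13: energy of Eq. (3), used as the stopping criterion
        E = sum(0.5 * np.linalg.norm(Ys[s] - Us[s] @ Vs[s].T, 'fro') ** 2
                + mu * np.linalg.norm(Vs[s] - V, 'fro') ** 2
                for s in range(S)) + lam * np.abs(V).sum()
        energies.append(E)
        if len(energies) > 1 and energies[-2] - energies[-1] <= eps * energies[-2]:
            break
    return V, Vs, Us, energies
```

On random data, e.g. three subjects of shape (30, 40), msdl(Ys, k=4) returns group maps of shape (40, 4), with the recorded energy non-increasing across iterations.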
Choice of initialization The optimization problem given by Eq. (3) is not convex, and thus the output of algorithm 1 depends on the initialization. As ICA applied to fMRI data extracts super-Gaussian signals and thus can be used for sparsity recovery [26], we initialize V with maps extracted with the fastICA algorithm [14], initialized with a random mixing matrix. However, as not all spatial maps estimated by ICA are super-Gaussian, we run ICA with an

References

- Beck, A., Teboulle, M.: A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems.
- Chen, S.S., Donoho, D.L., Saunders, M.A.: Atomic Decomposition by Basis Pursuit.
- Donoho, D.L.: De-noising by Soft-Thresholding.
- Bell, A.J., Sejnowski, T.J.: An Information-Maximization Approach to Blind Separation and Blind Deconvolution.
- Hyvärinen, A., Oja, E.: Independent Component Analysis: Algorithms and Applications.