
HAL Id: inria-00588898
https://hal.inria.fr/inria-00588898v2
Submitted on 19 Feb 2012
HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.
Multi-subject dictionary learning to segment an atlas of
brain spontaneous activity
Gaël Varoquaux, Alexandre Gramfort, Fabian Pedregosa, Vincent Michel,
Bertrand Thirion
To cite this version:
Gaël Varoquaux, Alexandre Gramfort, Fabian Pedregosa, Vincent Michel, Bertrand Thirion. Multi-subject dictionary learning to segment an atlas of brain spontaneous activity. Information Processing in Medical Imaging, Gábor Székely, Horst Hahn, Jul 2011, Kaufbeuren, Germany. pp. 562-573, doi:10.1007/978-3-642-22092-0_46. inria-00588898v2

Multi-subject dictionary learning to segment an atlas of brain spontaneous activity

G. Varoquaux (1,2,3), A. Gramfort (2,3), F. Pedregosa (2,3), V. Michel (2,3), B. Thirion (2,3)

1 INSERM U992 Cognitive Neuroimaging unit
2 INRIA, Parietal team, Saclay, France
3 LNAO/NeuroSpin, CEA Saclay, Bât. 145, 91191 Gif-sur-Yvette cedex, France
Abstract. Fluctuations in brain on-going activity can be used to reveal its intrinsic functional organization. To mine this information, we give a new hierarchical probabilistic model for brain activity patterns that does not require an experimental design to be specified. We estimate this model in the dictionary learning framework, learning simultaneously latent spatial maps and the corresponding brain activity time-series. Unlike previous dictionary learning frameworks, we introduce an explicit difference between subject-level spatial maps and their corresponding population-level maps, forming an atlas. We give a novel algorithm using convex optimization techniques to solve efficiently this problem with non-smooth penalties well-suited to image denoising. We show on simulated data that it can recover population-level maps as well as subject specificities. On resting-state fMRI data, we extract the first atlas of spontaneous brain activity and show how it defines a subject-specific functional parcellation of the brain in localized regions.
1 Introduction
The study of intrinsic brain functional organization via distant correlations in the fluctuations of brain signals measured by functional Magnetic Resonance Imaging (fMRI) is receiving increasing interest. In particular, the 1000 Functional Connectomes project aims at parceling the brain in functional regions and then at studying the correlation structure of brain function across these nodes [5]. Independent Component Analysis (ICA) is the most popular data-driven approach to analyze spontaneous activity, as it has been shown to extract interpretable spatial patterns [3] that are reproducible across subjects [9]. They form networks of functional regions that are also found in task-driven studies [23].

From a medical point of view, the development of statistically-controlled analysis of brain spontaneous activity is interesting, as it can lead to new diagnostic or prognostic tools applicable to impaired patients. In particular, correlations in the functional signal between predefined regions have been shown to contain markers of post-stroke functional reorganization [25]. However, inferences drawn from these regions depend on the targeted regions and on their precise delineation. In addition, subject-to-subject local variability, for instance in functional topography, may confound the changes in long-distance interactions.

We address the segmentation of functional regions directly from the fMRI signal. The challenge stems from the lack of salient features in the original signal, as well as the lack of a controlled experimental design to perform model fitting as in task-driven fMRI experiments. In particular, it is difficult to optimize the parameters (dimension and regularization) of the models, hence to obtain an arguably faithful and meaningful representation of this data. ICA tackles these difficulties by estimating a mixing matrix to minimize the mutual information between the resulting spatial components. Departing from ICA, [12] performs segmentation by clustering the time series through a mixture model. However, these approaches lack an explicit noise model and do not take into account the subject-to-subject variability nor the spatial structure of the signal. In this paper, we formulate the problem in the dictionary learning framework and reject observation noise based on the assumption that the relevant patterns are spatially sparse [10, 26], and we focus on the choice of the involved parameters. The paper is organized as follows: we give in section 2 a two-level probabilistic model that involves subject-specific spatial maps as well as population-level latent maps, and in section 3 an associated efficient learning algorithm. In section 4 we describe how to set the model parameters from the data. In section 5, we study different learning schemes on synthetic data with simulated inter-individual variability. Finally, in section 6 we apply the method to learning a detailed population-level atlas of regions describing spontaneous activity as recorded in fMRI.
2 Multi-subject decomposition model for brain activity
Problem statement We consider a dataset of brain signal time series of length n for S subjects, measured on p voxels: {Y^s ∈ ℝ^{n×p}, s = 1…S}. We stipulate that the corresponding 3D images are the observation of k spatial latent factors V^s ∈ ℝ^{p×k}, that characterize functional processes or structured measurement artifacts, and associated time series U^s ∈ ℝ^{n×k}: Y^s ≈ U^s V^{sT}. We are interested in the study of resting state, or on-going activity, for which no experimental design can be used to model time-courses; thus we propose to learn U^s and V^s simultaneously, a problem known as dictionary learning, or linear signal decomposition [7, 17].
Generative model In the case of a multi-subject dataset, we give a hierarchical probabilistic model for dictionary learning. Following the standard dictionary learning model, the data observed for each subject is written as the linear combination of subject-specific dictionary elements, that are spatial maps V^s. For resting-state brain activity, we do not model the loadings U^s themselves, but their covariance.

∀s ∈ {1…S},  Y^s = U^s V^{sT} + E^s,  E^s ∼ N(0, σI),  U^s ∼ N(0, Σ_U)   (1)

In addition, the subject-specific maps V^s are generated from population-level latent factors, the spatial patterns written as brain maps V:

∀s ∈ {1…S},  V^s = V + F^s,  F^s ∼ N(0, ζI)   (2)
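To fix ideas, the sampling process defined by Eq. (1) and (2) can be sketched in a few lines of NumPy. The dimensions, noise levels, and the isotropic choice Σ_U = I below are illustrative placeholders, not values used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
S, n, p, k = 4, 50, 200, 5     # subjects, time points, voxels, latent maps (illustrative)
sigma, zeta = 0.10, 0.05       # observation- and subject-level noise variances (illustrative)

# Population-level maps V, made sparse to mimic the sparsity prior on V
V = rng.normal(size=(p, k)) * (rng.random(size=(p, k)) > 0.8)

Y = []
for s in range(S):
    Fs = np.sqrt(zeta) * rng.normal(size=(p, k))
    Vs = V + Fs                                    # Eq. (2): subject-specific maps
    Us = rng.normal(size=(n, k))                   # loadings, here with Sigma_U = I
    Es = np.sqrt(sigma) * rng.normal(size=(n, p))
    Y.append(Us @ Vs.T + Es)                       # Eq. (1): observed time series

print(len(Y), Y[0].shape)  # 4 (50, 200)
```

Each Y[s] plays the role of one subject's fMRI time series matrix; the estimation problem below is to recover V, the V^s, and the U^s from the Y^s alone.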

Finally, we specify the prior distribution on V: P(V) ∝ exp(−ξ Ω(V)), where Ω is typically a norm or a quasi-norm.
Relation to existing models With ζ = 0, the model identifies V^s with V: all latent factors are the same across subjects. In this context, if the prior on V is un-informative, the model boils down to a principal component analysis (PCA) on the concatenated Y^s. For a Laplace prior, we recover the probabilistic formulation of a standard sparse ℓ1-penalized PCA [22]. More generally, in this framework, sparsity-inducing priors give rise to a family of probabilistic projection models [1]. Our multi-subject model however differs from generalized canonical correlation analysis [16], and its sparse variants [1], as these approaches do not model subject-specific latent factors and thus do not allow for two levels of variance. Note that, for multi-subject studies, non-hierarchical models based on PCA and ICA impose orthogonality constraints on the loadings at the group level, and thus introduce an unnatural constraint on the U^s across the different subjects. ICA can be formulated in a maximum likelihood approach [4] and thus falls in the same general class of non-hierarchical dictionary learning models [17]. However, as ICA disregards explained variance, it leads to improper priors on V and requires the use of a PCA pre-processing step to estimate the noise⁴ [3]. In neuroimaging, multi-subject dictionary learning using a fixed group model (ζ = 0) in combination with ICA is popular, and called concatenated ICA [6]. In the experimental section of this paper, we will focus on the use of proper priors on V based on sparsity-inducing norms, such as the ℓ1 norm. They are known to be efficient in terms of separating signal from noise in the supervised settings [27], and lead to tractable optimizations that are convex, though non-smooth.
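The single-level special case (ζ = 0, one subject, ℓ1 prior on the maps) corresponds to sparse PCA models available off the shelf. For instance, with scikit-learn's SparsePCA (illustrative sizes; this sketch of course ignores the multi-subject hierarchy that is the point of MSDL):

```python
import numpy as np
from sklearn.decomposition import SparsePCA

rng = np.random.default_rng(0)
Y = rng.normal(size=(50, 80))            # one subject: 50 time points, 80 voxels

spca = SparsePCA(n_components=5, alpha=1.0, random_state=0)
U = spca.fit_transform(Y)                # loadings (time series), shape (50, 5)
V = spca.components_.T                   # sparse spatial maps, shape (80, 5)
print(U.shape, V.shape)
```

Here the ℓ1 penalty (alpha) is placed on the components, i.e. the spatial maps, matching the spatial-sparsity assumption above.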
3 Optimization strategy for efficient learning
We now present a new algorithm to efficiently estimate from the data at hand the model specified by Eq. (1) and (2). In the following, we call this problem Multi-Subject Dictionary Learning (MSDL). In the maximum a posteriori (MAP) estimation framework, we learn the parameters from the data by maximizing the sum of the log-likelihood of the data given the model, and penalization terms that express our hierarchical priors. In addition, as the variance of the group-level residuals in Eq. (2) could be arbitrarily shrunk by shrinking the norm of V, we impose an upper bound on the norm of the columns of U^s:

(U^s, V^s)_{s∈{1…S}}, V = argmin_{U^s, V^s, V} E(U^s, V^s, V),  s.t. ‖u^s_l‖₂² ≤ 1   (3)

with  E(U^s, V^s, V) = Σ_{s=1}^{S} [ ½ ‖Y^s − U^s V^{sT}‖²_Fro + µ ‖V^s − V‖²_Fro ] + λ Ω(V),

where µ = σ/ζ and λ = σ/ξ. The optimization problem given by Eq. (3) is not jointly convex in U^s, V^s, and V; however, it is separately convex in V^s and (U^s, V). Our
⁴ There exist noisy ICA approaches, but they all assume that the contribution of the noise to the observed data is small.

optimization strategy relies on alternating optimizations of V^s, U^s, and V, keeping the other parameters constant. In the following we give the mathematical analysis of the optimization procedure; the exact operations are detailed in algorithm 1. Following [18], we use a block coordinate descent to minimize E as a function of U^s. Solving Eq. (3) as a function of V^s corresponds to a ridge regression problem on the variable (V^s − V)^T, the solution of which can be computed efficiently (line 9, algorithm 1). Minimizing E as a function of V corresponds to minimizing Σ_{s=1}^{S} ½ ‖v^s − v‖₂² + (λ/µ) Ω(v) for all column vectors v of V. The solution is a proximal operator [8], as detailed in lemma 1.

Lemma 1.  argmin_v ( Σ_{s=1}^{S} ½ ‖v^s − v‖₂² + γ Ω(v) ) = prox_{(γ/S)Ω}(v̄),  where v̄ = (1/S) Σ_{s=1}^{S} v^s.

The proof of lemma 1 follows from the fact that Σ_{s=1}^{S} ‖v^s − v‖₂² = S ‖v̄ − v‖₂² + Σ_{s=1}^{S} ‖v̄ − v^s‖₂². As the second term on the right-hand side is independent of v, the minimization problem simplifies to minimizing the first term, which corresponds to the problem solved by the proximal operator on v̄.
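For Ω = ℓ1, the proximal operator in lemma 1 is plain soft-thresholding, so the V-update reduces to soft-thresholding the average subject map at level γ/S. A small numerical check of the lemma (sizes and γ arbitrary); since the objective is convex, the closed-form solution can be validated by verifying that random perturbations never decrease the energy:

```python
import numpy as np

def prox_l1(v, t):
    """Soft-thresholding: proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

rng = np.random.default_rng(0)
S, p, gamma = 5, 20, 0.7
vs = rng.normal(size=(S, p))       # one column v^s per subject

def energy(v):
    return 0.5 * ((vs - v) ** 2).sum() + gamma * np.abs(v).sum()

v_star = prox_l1(vs.mean(axis=0), gamma / S)   # lemma 1: prox at gamma/S of the mean

# v_star should be the global minimum of the convex energy:
for _ in range(200):
    dv = 1e-3 * rng.normal(size=p)
    assert energy(v_star + dv) >= energy(v_star)
```

The quadratic term grows as (S/2)‖dv‖² away from the minimizer, so the check is numerically robust.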
Algorithm 1 Solving the optimization problem given in Eq. (3)
Input: {Y^s ∈ ℝ^{n×p}, s = 1,…,S}, the time series for each subject; k, the number of maps; an initial guess for V.
Output: V ∈ ℝ^{p×k}, the group-level spatial maps; {V^s ∈ ℝ^{p×k}}, the subject-specific spatial maps; {U^s ∈ ℝ^{n×k}}, the associated time series.
1:  E_0, E_1, i ← 1 (initialize variables)
2:  V^s ← V, U^s ← Y^s V (V^T V)^{−1}, for s = 1…S
3:  while E_{i−1} − E_i > ε E_{i−1} do
4:    for s = 1 to S do
5:      for l = 1 to k do
6:        Update U^s: u^s_l ← u^s_l + ‖v^s_l‖₂^{−2} (Y^s − U^s V^{sT}) v^s_l  (following [15])
7:        u^s_l ← u^s_l / max(‖u^s_l‖₂, 1)
8:      end for
9:      Update V^s (ridge regression): V^s ← V + (Y^s − U^s V^T)^T U^s (U^{sT} U^s + µI)^{−1}
10:   end for
11:   Update V using lemma 1: V ← prox_{λΩ/(Sµ)} ( (1/S) Σ_{s=1}^{S} V^s )
12:   Compute value of energy: E_i ← E(U^s, V^s, V)
13:   i ← i + 1
14: end while
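Algorithm 1 can be transcribed compactly in NumPy for the case Ω = ℓ1, where the prox of line 11 is soft-thresholding. This is a sketch under simplifying assumptions (random rather than ICA initialization, illustrative µ, λ, ε), not the authors' reference implementation:

```python
import numpy as np

def soft_thresh(V, t):
    return np.sign(V) * np.maximum(np.abs(V) - t, 0.0)

def msdl(Ys, k, mu=1.0, lam=0.1, eps=1e-4, max_iter=50, seed=0):
    """Alternating minimization for Eq. (3) with Omega = l1 (sketch of algorithm 1)."""
    rng = np.random.default_rng(seed)
    S, (n, p) = len(Ys), Ys[0].shape
    V = rng.normal(size=(p, k))                        # random init; the paper uses ICA maps
    Vs = [V.copy() for _ in range(S)]
    Us = [Y @ V @ np.linalg.inv(V.T @ V) for Y in Ys]  # line 2
    energies = []
    for _ in range(max_iter):
        for s in range(S):
            for l in range(k):                         # lines 5-8: block coordinate descent on U^s
                v_l = Vs[s][:, l]
                R = Ys[s] - Us[s] @ Vs[s].T
                u_l = Us[s][:, l] + (R @ v_l) / max(v_l @ v_l, 1e-12)
                Us[s][:, l] = u_l / max(np.linalg.norm(u_l), 1.0)  # line 7: unit-ball projection
            # line 9: ridge-regression update of the subject maps
            G = Us[s].T @ Us[s] + mu * np.eye(k)
            Vs[s] = V + (Ys[s] - Us[s] @ V.T).T @ Us[s] @ np.linalg.inv(G)
        # line 11: proximal update of the group maps (soft-thresholding for l1)
        V = soft_thresh(np.mean(Vs, axis=0), lam / (S * mu))
        # lines 12-13: energy of Eq. (3), used as the stopping criterion
        E = sum(0.5 * np.linalg.norm(Ys[s] - Us[s] @ Vs[s].T, 'fro') ** 2
                + mu * np.linalg.norm(Vs[s] - V, 'fro') ** 2
                for s in range(S)) + lam * np.abs(V).sum()
        energies.append(E)
        if len(energies) > 1 and energies[-2] - energies[-1] <= eps * energies[-2]:
            break
    return V, Vs, Us, energies
```

On random data, e.g. three subjects of shape (30, 40), msdl(Ys, k=4) returns group maps of shape (40, 4), with the recorded energy non-increasing across iterations.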
Choice of initialization The optimization problem given by Eq. (3) is not convex, and thus the output of algorithm 1 depends on the initialization. As ICA applied to fMRI data extracts super-Gaussian signals and thus can be used for sparsity recovery [26], we initialize V with maps extracted with the fastICA algorithm [14], initialized with a random mixing matrix. However, as not all spatial maps estimated by ICA are super-Gaussian, we run ICA with an

References

- Beck, A., Teboulle, M.: A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems.
- Chen, S.S., Donoho, D.L., Saunders, M.A.: Atomic Decomposition by Basis Pursuit.
- Donoho, D.L.: De-noising by Soft-Thresholding.
- Bell, A.J., Sejnowski, T.J.: An Information-Maximization Approach to Blind Separation and Blind Deconvolution.
- Hyvärinen, A., Oja, E.: Independent Component Analysis: Algorithms and Applications.