What is the effect of the randomization procedure on the comparisons between maps?

The full topographic information of the map is being reduced to a labeling, which as a consequence reduces the comparisons among maps from a continuous and parametric range of similarity or dissimilarity to a binary statement of same or different.

Why is the individual assignment method not able to identify the interactions?

The fact that, in the sample analysis, the individual assignment method failed to identify the interactions may be explained by this problem, especially because these effects occurred in periods of relatively low GFP, where the SNR is typically lower and common map features are more likely to be obscured by noise.

What was the use of the GFP analysis?

For the topographic analysis, a generalized measure of map differences was used (Koenig et al. 2011), whereas the GFP analysis employed the difference in GFP of the same maps.

What is the probability of the null hypothesis being compatible with the data?

This is done by simple rank statistics, and the probability of the data being compatible with the null hypothesis is defined by the proportion of quantifiers obtained under the null hypothesis that were larger than or equal to the quantifier obtained in the real data.

(Open Access) A tutorial on data-driven methods for statistically assessing ERP topographies. (2014) | Thomas Koenig

Q: What are the contributions mentioned in the paper "A tutorial on data-driven methods for statistically assessing erp topographies" ?

The authors therefore propose a randomization-based procedure that works without assigning grand-mean microstate prototypes to individual data. In addition, the authors propose a new criterion to select the optimal number of microstate prototypes based on cross-validation across subjects. The authors conclude that the proposed method is well-suited for the assessment of timing differences in cognitive processes.

Q: What is the way to assess the generalizability of the model?

Their proposal is that in ERP microstate models, the generalizability of the model can be assessed by testing its consistency across subjects; the parts of the data that can be observed independently of the individual subjects belong to the optimal microstate model, while those parts of the data that depend on the individual subjects should not be part of the model.

ORIGINAL PAPER

A Tutorial on Data-Driven Methods for Statistically

Assessing ERP Topographies

Thomas Koenig · Maria Stein · Matthias Grieder · Mara Kottlow

Received: 2 April 2012 / Accepted: 14 August 2013 / Published online: 29 August 2013

© Springer Science+Business Media New York 2013

Abstract Dynamic changes in ERP topographies can be

conveniently analyzed by means of microstates, the so-called

“atoms of thoughts”, that represent brief periods of

quasi-stable synchronized network activation. Comparing

temporal microstate features such as on- and offset or

duration between groups and conditions therefore allows a

precise assessment of the timing of cognitive processes. So

far, this has been achieved by assigning the individual

time-varying ERP maps to spatially deﬁned microstate

templates obtained from clustering the grand mean data

into predetermined numbers of topographies (microstate

prototypes). Features obtained from these individual

assignments were then statistically compared. This has the

problem that the individual noise dilutes the match between

individual topographies and templates leading to lower

statistical power. We therefore propose a randomization-

based procedure that works without assigning grand-mean

microstate prototypes to individual data. In addition, we

propose a new criterion to select the optimal number of

microstate prototypes based on cross-validation across

subjects. After a formal introduction, the method is applied

to a sample data set of an N400 experiment and to simu-

lated data with varying signal-to-noise ratios, and the

results are compared to existing methods. In a ﬁrst com-

parison with previously employed statistical procedures,

the new method showed an increased robustness to noise,

and a higher sensitivity for more subtle effects of micro-

state timing. We conclude that the proposed method is

well-suited for the assessment of timing differences in

cognitive processes. The increased statistical power allows

identifying more subtle effects, which is particularly

important in small and scarce patient populations.

Keywords Microstates · Timing · Statistics · Randomization · Topography · Model selection

Introduction

Scalp recorded evoked potentials permit the non-invasive

mapping of human brain functions at an excellent tem-

poral resolution. This allows for the decomposition of

complex cognitive processes into a sequence of process-

ing stages, each with a different functional signiﬁcance

(Lehmann 1990; Murray et al. 2008). Importantly, an

unequivocal distinction of ERP components originating

from different brain regions can be obtained by com-

paring the topographies of scalp electromagnetic ﬁelds of

the ERP (McCarthy and Wood 1985; Michel et al. 2009).

By identifying and comparing ERP scalp topographies, it

is thus possible to track changes of brain functional

states, where a state is deﬁned globally by a speciﬁc

distribution of one or several simultaneously active brain

regions. Spatial analysis of scalp electromagnetic ﬁelds

(Lehmann and Skrandies 1984) has moreover the

advantage of being reference independent, as topographic

conﬁgurations are not inﬂuenced by a reference electrode

(Lehmann 1987).

T. Koenig (✉) · M. Stein · M. Grieder · M. Kottlow

Department of Psychiatric Neurophysiology, University Hospital

of Psychiatry, University of Bern, Bern, Switzerland

e-mail: thomas.koenig@puk.unibe.ch

M. Stein

Department of Clinical Psychology and Psychotherapy, Institute

of Psychology, University of Bern, Bern, Switzerland

M. Kottlow

Institute of Pharmacology and Toxicology, University of Zurich,

Zurich, Switzerland

Brain Topogr (2014) 27:72–83

DOI 10.1007/s10548-013-0310-1

A commonly used way to compare multichannel ERP

data between groups or conditions is to quantify the dif-

ference of the topography in a given time range and to test

it for signiﬁcance. Various such methods exist and have

been proven to allow for a sound assessment of topo-

graphic differences in ERPs (Koenig et al. 2011; Lehmann

1987; Lehmann et al. 1993; Nishida et al. 2013; Strik et al.

1998). If the topography of a certain process is known, it is

also possible to quantify the amount of ERP variance that

can be attributed to this process and compare different

datasets based on this quantiﬁer (Brandeis et al. 1992).

Another approach to multichannel ERP analysis is the use of

various kinds of data-driven spatio-temporal factor analyses,

such as principal component analysis (PCA), independent

component analysis (ICA), or as discussed in more

detail below, cluster analysis. Factor analyses of multi-

channel ERP data describe an ERP as composed of a

limited set of constant topographies, each with a speciﬁc

time course. The comparison of ERPs among different

groups or conditions is then primarily based on a com-

parison of the time-course of selected factors. A good

overview of spatial factor analysis methods (PCA, ICA,

microstates) in comparison to classical ERP approaches is

provided by Pourtois et al. (2008).

While PCA and ICA were primarily based on statistical

arguments such as independence among the factors, the

rationale for using cluster analysis emerged from the

observation of periods of stable ﬁeld conﬁgurations typi-

cally separated by brief moments of rapid transitions

(Lehmann 1990; Wackermann et al. 1993). These periods

of quasi-stable ﬁeld conﬁgurations were called microstates

(Lehmann and Skrandies 1980). They offered a natural,

data-driven and bottom-up deﬁnition of a brain functional

state as a period where a quasi-stable ﬁeld conﬁguration

was observed. Meanwhile, microstate analysis has become

a widely accepted tool for the assessment of the sequence

of functional states in ERPs (see Murray et al. 2008, for a

review). Microstates could also be observed in the elec-

trocorticogram of mice (Megevand et al. 2008). In addition,

it is also possible to identify microstates in the ongoing

resting EEG (Koenig et al. 2002; Lehmann 1990) and

microstate analyses of single trial ERP data have been

proven to be a sensitive and unique tool to track cognitive

processes on a single subject level (De Lucia et al. 2010,

2012; Tzovara et al. 2012a, b, 2013).

Technically, ERP microstate analysis based on spatial

clustering identiﬁes a small set of prototypical ERP

topographies that can be observed in the measured data (so

called microstate class maps) and assigns each time period

of the ERP to exactly one of these microstate class maps

based on a best ﬁt criterion (Murray et al. 2008; Pascual-

Marqui et al. 1995). Whereas the microstate maps corre-

spond to the forward solution of all sources contributing to

a microstate class, the assignment step yields the time of

the on- and offset of the microstates in the ERP. If this

algorithm is used to identify microstates in data consisting

of several experimental conditions or groups, the assign-

ment can be used to identify differences in the timing of a

given microstate class (i.e. onset, offset and duration),

which is a very elegant and efﬁcient way to exploit the

information yielded by the high temporal resolution of the

data.
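The assignment step described above can be illustrated with a minimal sketch. It assumes that the best fit criterion is the signed spatial correlation between average-referenced maps; the function names (`spatial_correlation`, `assign_maps`, `timing_features`) are hypothetical and are not taken from the paper or from any microstate toolbox.

```python
import numpy as np

def spatial_correlation(data, maps):
    """Correlate each time point (row of data) with each class map (row of maps).

    data: (n_times, n_channels), maps: (n_maps, n_channels).
    Returns an (n_times, n_maps) array of Pearson correlations across channels.
    """
    d = data - data.mean(axis=1, keepdims=True)   # remove the mean across channels
    m = maps - maps.mean(axis=1, keepdims=True)
    d = d / np.linalg.norm(d, axis=1, keepdims=True)
    m = m / np.linalg.norm(m, axis=1, keepdims=True)
    return d @ m.T

def assign_maps(data, maps):
    """Assign every time point to the class map with the highest correlation."""
    return np.argmax(spatial_correlation(data, maps), axis=1)

def timing_features(labels, k):
    """Onset, offset and duration (in samples) of class k, or None if absent."""
    idx = np.flatnonzero(labels == k)
    if idx.size == 0:
        return None
    return int(idx[0]), int(idx[-1]), int(idx.size)
```

Applied to a grand mean ERP and a set of class maps, `assign_maps` yields one label per time point, from which `timing_features` reads the onset, offset and duration of a given class. Whether map polarity should be respected or ignored is a design choice; this sketch respects it.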

On the level of statistics, the microstate analyses per-

formed so far have been done by identifying the microstate

maps in ERP datasets averaged over a group of subjects

(grand mean ERPs), but the assignment was then done in

the ERPs of the individuals. From these individual assignments,

several parameters were extracted for a given

microstate map, such as the variance explained by the map,

the time when the ﬁrst or last assignment to the map was

observed, or the total number of time points assigned to the

map. These individual parameters were then entered into

classical, usually parametric, univariate test statistics such

as t tests or ANOVAs (Michel et al.

2009).

While this approach has been applied successfully in a

series of studies (Arzy et al. 2007; Chouiter et al. 2013;

Darque et al. 2012; Knebel and Murray 2012; Kottlow

et al. 2011; Kovalenko et al. 2012; Laganaro and Perret

2011; Overney et al. 2005; Pannekamp et al. 2011; Pegna

et al. 1997; Perret and Laganaro 2012; Pourtois 2011;

Spierer et al. 2007; Stevenson et al. 2012; Taha et al. 2013),

it appeared to the authors that the method can still be

improved to increase statistical power and decrease the

effects of individual variance. Our criticism is that in the

above described approach, the microstate maps are com-

pared to data that has not been directly available to the

clustering algorithm, which obviously impoverishes the

amount of variance explained by the microstate maps.

Furthermore, the individual data contains individual vari-

ance that is usually of little interest, but reduces the topo-

graphic similarity to the microstate maps. We suspect that

this loss of similarity resulting from comparing microstate

maps obtained in grand mean data to individual ERPs may

negatively affect the resulting statistical power.

Our proposal is thus to develop a statistical test for

microstate features where the assignment procedure

remains on the level of the grand mean data. This is

expected to improve the similarity between the microstate

maps and the data these maps are assigned to, and thus

increase the statistical power of the results. For this pur-

pose, we will employ randomization techniques, which

(although computationally expensive) allow custom-tai-

loring statistical tests to such speciﬁc problems.

A further aim of the paper is to propose a solution to the

problem of selecting the appropriate number of microstate

maps. This selection has so far been made on criteria

extracted from grand mean data (Pascual-Marqui et al.

1995), and the individual variance has been neglected. In

general, the aim of model selection procedures (such as

selecting a number of microstate maps) is to choose a

model that captures as much of that part of the data that

follows some generalizable rules, while it is oblivious to

random noise. Our proposal is that in ERP microstate

models, the generalizability of the model can be assessed

by testing its consistency across subjects; the parts of the

data that can be observed independently of the individual

subjects belong to the optimal microstate model, while

those parts of the data that depend on the individual sub-

jects should not be part of the model. The optimal model

(i.e. the optimal number of microstate maps) should thus

maximize the amount of explained variance that is inde-

pendent of individual attributes. This criterion can be

evaluated using cross-validation procedures across subjects

(Devijver and Kittler 1982).

In the following methods and results sections, we will

give a detailed explanation of the procedures and apply them

to a real sample dataset and a series of simulated datasets

with deﬁned signal to noise ratios (SNRs). We will then

also analyze the same dataset with the established meth-

odology and compare the results.

As a sample data set, we chose data from healthy US

American subjects staying in Switzerland for a German

language exchange. EEG was measured while subjects

performed a sentence reading task once at the beginning

and once at a later phase of their stay (Stein et al. 2006).

These sentences ended with semantically correct or incor-

rect endings. Incorrect versus correct sentence endings

have been found to induce a so-called N400 effect which

was described by Kutas and Hillyard (1980).

Methods

Selection of the Optimal Microstate Model

As outlined in the introduction, we aimed to identify a

microstate model that is sufﬁciently complex to accom-

modate the part of the data variance that is common across

subjects, without accounting for variance that

appears to be tied to individual attributes. This type of

problem is typically addressed using cross-validation,

where models of different complexity are constructed

based on a subset of the available data, and the resulting

models are then used to make predictions for the remaining

data. Therein, the optimal model is the one that minimizes

the prediction error (Devijver and Kittler 1982).

In the context of microstate modeling, we propose to

implement microstate model selection through cross-vali-

dation by computing microstate models with different

numbers of microstate classes based on ERPs averaged

over a subset of the subjects (training data). These micro-

state models are then tested for their predictive value

(mean correlation) in the ERPs averaged over the subjects

not included in the construction of the microstate model

(test data). Since the mean correlation of the test data with

a model will depend on the division of the data into

training- and test-sets, this procedure has to be repeated

with different, randomly created subsets of training and test

data. For each number of microstates, the mean correlation

of the test data with the model is then averaged across the

results obtained in the different subsets. The optimal

number of microstates is selected where this grand mean

correlation is maximal.

Note that this procedure contains no measures to mini-

mize the total number of microstates per se, but only

minimizes the number of microstates that cannot be found

consistently across subjects. The encountered number of

microstates therefore does not represent something that

necessarily generalizes across studies, but rather something

that is optimally suited for a dataset with a limited size.

Computationally, the procedure is illustrated in Fig. 1

and is as follows:

1. The algorithm randomly subdivides the subjects into a

training and a test dataset. If the subjects belonged to dif-

ferent groups, each dataset must contain members of all

groups.

2. Grand mean ERPs are computed in the training and

test datasets as a function of group and condition.

3. Spatio-temporal microstate models with different

numbers of microstate maps are computed in the grand

means of the training dataset. This model contains both the

Fig. 1 Flow-chart illustrating the procedure for the selection of the

optimal microstate model

topographies of the microstate maps as well as the time

instances when these microstate maps are observed.

4. The mean correlation of the test data with each

microstate model is computed (Eq. 1)

$$\text{Mean correlation} = \frac{1}{n_t} \sum_{t=1}^{n_t} \mathrm{Corr}(V_t, T_t) \qquad (1)$$

where $t$ is time, $n_t$ is the number of time points, Corr is the

correlation function, $V_t$ is the voltage vector of the test data

at time $t$, and $T_t$ is the voltage vector of the microstate class

observed in the training data at time t. If several conditions

or groups are available, the mean correlation is computed

in each condition and group and averaged.

5. Steps 1–4 are repeated a sufficient number of

times, and the mean correlations from each run are

retained.

6. The mean correlations are averaged across repetitions

and the number of microstate classes yielding the maxi-

mum mean correlation is identiﬁed. This represents the

optimal number of microstate classes for the analysis of the

given dataset.

7. The microstate templates with the optimal number of

classes are now computed using the grand mean ERPs of

all available subjects and conditions.

Once the optimal microstate model has been identiﬁed,

we can proceed to the statistical evaluation of the experi-

mental manipulations in the entire dataset.

Statistical Testing of Differences in Microstate Models

As in any statistical testing, an analysis of ERP microstate

features needs to compare an effect (e.g. a difference in the

onset of a given microstate class in the ERPs of two

groups) against the distribution of this effect under the null-

hypothesis. While in classical statistics, this distribution is

estimated based on the variance of the individual data, and

on assumptions about the nature of the distribution, ran-

domization statistics determine this distribution based on

simulations of the effect under the null hypothesis. For our

purposes, the important point here is that with randomi-

zation statistics, we can simulate ERP data under the null-

hypothesis and still compute grand mean ERPs, and

therefore still assess microstate effects based on these

grand means while the null-hypothesis is true.

In general, randomization based statistics consist of the

following three steps (Manly 2007):

1. Quantiﬁcation of an effect of interest in the measured

data.

2. Creation of cases of the same quantiﬁer compatible

with the null hypothesis. This is achieved by repeat-

edly applying the quantiﬁer to the measured data after

randomizing it in a way that eliminates the suspected

structure in the data.

3. Comparison of the distribution of the quantiﬁer

obtained in the real data with the distribution of the

quantiﬁer under the null-hypothesis.

We will follow this scheme for our microstate statistics,

with the constraint that the assignment procedure shall

always be applied on the level of the grand mean data. The

proposed procedure is also illustrated in Fig. 2.

To quantify the effect of interest (step 1), we propose to

use the previously employed features extracted from the

established microstate assignment procedures (Murray

et al. 2008). These features are speciﬁc for a given

microstate map and for the given ERP and include, among

others, the amount of variance explained by the map, the

time point of the ﬁrst (onset) or last (offset) assignment of

the ERP to that map, or the count of time-points assigned to

the maps. The important difference to the previously pro-

posed method is that in our procedure, these features are

extracted after the microstate maps have been assigned to

group and/or condition speciﬁc grand mean data and not to

the individual data. The quantiﬁer of the effect of interest is

then deﬁned by the variance of the feature extracted from

the different groups and/or conditions. For example, in an

analysis of the onset of a language related microstate under

two different conditions, the quantiﬁer of the effect of

interest could be deﬁned as the difference of onset of the

ﬁrst occurrence of the language related microstate map

between the two conditions (the difference here is equiv-

alent to the variance of the two onsets). If we would

hypothesize that the language related microstate system-

atically differs between three groups of subjects, our

quantiﬁer could for example be the variance among the

onsets obtained from the grand means of each of the three

groups.
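The remark that the difference of two onsets is equivalent to their variance can be made explicit with a short derivation. It assumes the population variance (dividing by the number of values); the sample variance differs only by a constant factor. The symbols $o_1$, $o_2$ and $\bar{o}$ are introduced here for illustration and denote the two onsets and their mean.

```latex
\bar{o} = \frac{o_1 + o_2}{2}, \qquad
\operatorname{Var}(o_1, o_2)
  = \frac{(o_1 - \bar{o})^2 + (o_2 - \bar{o})^2}{2}
  = \frac{(o_1 - o_2)^2}{4}
```

Because this is a monotonic function of $|o_1 - o_2|$, ranking the randomization runs by variance or by absolute difference gives the same result. The variance has the advantage that it extends directly to three or more groups or conditions.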

For the creation of instances of the chosen quantiﬁer

under the null hypothesis (step 2), we propose to randomize

the ERP data such that the possible suspected structure of

interest in the data is eliminated. For example, if we sup-

pose that semantically expected and unexpected sentence

endings systematically lead to different responses in a

group of subjects, we would construct data with two ran-

dom conditions R1 and R2 and randomly assign, in each

subject, the ERPs of expected sentence endings to either

R1 or R2, and the ERPs of unexpected sentence endings to

the remaining random condition. If we expected that two

groups of subjects (e.g. good and weak learners) differ

systematically, we would randomly shufﬂe the ERPs of

each subject among the two groups. Once this randomi-

zation has been done, the random group and/or condition

“specific” grand mean ERPs can be computed, and the

quantiﬁer of interest can again be computed as above. The

important difference to the previously employed procedure

is again that the microstate assignment necessary for the

feature extraction is computed in grand mean data.
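The two randomization schemes described above (shuffling conditions within subjects, and shuffling subjects among groups) can be sketched as follows. The array layout and the function names are assumptions made for illustration only.

```python
import numpy as np

def shuffle_conditions(erps, rng):
    """Randomly reassign the conditions within each subject.

    erps: (n_subjects, n_conditions, n_times, n_channels). With two conditions,
    each subject's ERPs are either kept or swapped between R1 and R2.
    """
    out = np.empty_like(erps)
    for s in range(erps.shape[0]):
        out[s] = erps[s, rng.permutation(erps.shape[1])]
    return out

def shuffle_groups(groups, rng):
    """Randomly reassign subjects to groups while keeping the group sizes."""
    return rng.permutation(groups)

def grand_means(erps, groups):
    """Group-wise grand means: dict mapping group label to an array of shape
    (n_conditions, n_times, n_channels)."""
    return {g: erps[groups == g].mean(axis=0) for g in np.unique(groups)}
```

In each randomization run, the shuffled data would be passed to `grand_means`, and the microstate assignment and feature extraction would then be repeated on these random grand means, never on the individual ERPs.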

Finally, the quantiﬁer obtained in the measured data in

step 1 is compared to the distribution of the quantiﬁer

obtained under the null hypothesis (step 3). This is done by

simple rank statistics, and the probability of the data being

compatible with the null hypothesis is deﬁned by the pro-

portion of quantiﬁers obtained under the null-hypothesis

that were larger than or equal to the quantifier obtained in the

real data. As an example, let us assume that in our first

example above, the difference of onset obtained from the

randomized data was larger than the difference obtained

from the real data in 7 out of 500 cases. The probability

p that the observed difference is compatible with the null

hypothesis is then 7/500 = 0.014, which would (given an

alpha-level of 0.05) indicate that it is signiﬁcant. If the

variance of the onset of the three groups obtained after

randomizing the data would be larger than the variance

obtained in the real data in 1,293 out of 5,000 randomi-

zation runs, the probability p that the observed group dif-

ferences were obtained by chance is estimated to be 1,293/

5,000 = 0.259, which would typically be considered as

not-signiﬁcant. Note that the distribution of the quantiﬁer

under the null-hypothesis depends on the precise random

permutations and assignments and may thus vary. The

resulting p value is thus not an exact value, but an estimate.

The literature suggests that for a reliable rejection of the

null-hypothesis on a 5 % level, 1,000 randomization runs are

necessary, and for an estimate at the 1 % level, 5,000

randomization runs are recommended (Manly 2007). In

contrast to parametric methods, statistical tests such as the one

described here are ultimately based on rank statistics.

Therefore, they can be expected to be more robust against

false positive results due to biases and outliers in individual

data.
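The rank-based p value of step 3 reduces to a simple count. The sketch below reproduces the arithmetic of the two examples given above; the function name `randomization_p` and the toy null distributions are hypothetical.

```python
import numpy as np

def randomization_p(observed, null_values):
    """Proportion of null quantifiers larger than or equal to the observed one."""
    null_values = np.asarray(null_values, dtype=float)
    return float(np.mean(null_values >= observed))

# Toy null distributions built only to match the counts in the text:
# 7 of 500 runs and 1,293 of 5,000 runs reach or exceed the observed value.
null_a = np.concatenate([np.full(7, 10.0), np.zeros(493)])
null_b = np.concatenate([np.full(1293, 10.0), np.zeros(3707)])
p_a = randomization_p(5.0, null_a)   # 7 / 500
p_b = randomization_p(5.0, null_b)   # 1293 / 5000
```

Since the null distribution depends on the random permutations drawn, repeating the test with a different random seed will generally give a slightly different estimate of p.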

Sample Data Analysis and Simulations

The sample data and analysis are based on an experiment

that has previously been used to demonstrate statistical

procedures of the analysis of ERPs (Koenig et al. 2008,

2011). These data consist of ERPs recorded in 16 healthy

young English-speaking exchange students that spent a

year in the German-speaking part of Switzerland and that

participated in a larger study on the neurobiology of

training-related changes of the language system (Koenig

et al. 2008; Stein et al. 2006). Participants passively viewed

on a computer screen word-by-word presented German

sentences with semantically expected or unexpected sen-

tence endings. This is a typical setup to elicit the so-called

N400; an ERP component that is associated with the vio-

lation of semantic expectancy and characterized by a

parietal negativity peaking around 400 ms after stimulus

presentation (Brandeis et al. 1995; Kutas and Hillyard

1980). Subjects were recorded twice, once at the beginning

of their stay, and once after having lived about 3 months in

Switzerland. The aim of the experiment was to track the

progress of semantic integration in the acquired foreign

language using an N400 paradigm. The measured data

Fig. 2 Flow-chart depicting the

proposed statistical testing of

the microstate models

References

Randomization, Bootstrap and Monte Carlo Methods in Biology

Reading senseless sentences: brain potentials reflect semantic incongruity

Pattern recognition : a statistical approach

Scalp distributions of event-related potentials: An ambiguity associated with analysis of variance models

Reference-free identification of components of checkerboard-evoked multichannel potential fields
