Journal ArticleDOI

ICA: a potential tool for BCI systems

01 Jan 2008-IEEE Signal Processing Magazine (IEEE)-Vol. 25, Iss: 1, pp 57-68
TL;DR: A comparative study of widely used ICA algorithms in the BCI community, conducted on simulated electroencephalography (EEG) data, shows that an appropriate selection of an ICA algorithm may significantly improve the capabilities of BCI systems.
Abstract: Several studies dealing with independent component analysis (ICA)-based brain-computer interface (BCI) systems have been reported. Most of them have only explored a limited number of ICA methods, mainly FastICA and INFOMAX. The aim of this article is to help the BCI community researchers, especially those who are not familiar with ICA techniques, to choose an appropriate ICA method. For this purpose, the concept of ICA is reviewed and different measures of statistical independence are reported. Then, the application of these measures is illustrated through a brief description of the widely used algorithms in the ICA community, namely SOBI, COM2, JADE, ICAR, FastICA, and INFOMAX. The implementation of these techniques in the BCI field is also explained. Finally, a comparative study of these algorithms, conducted on simulated electroencephalography (EEG) data, shows that an appropriate selection of an ICA algorithm may significantly improve the capabilities of BCI systems.

Summary (3 min read)

A. The concept of ICA

  • Such BCI systems generally exploit EEG, which has a high time resolution (below 100 ms).
  • This task is dealt with in section III-E.
  • When people move their hands, a brain wave called the Mu wave gets blocked and disappears completely.
  • In such an example, when noninvasive measurements such as EEG are used, the surface sensors record the result of the Mu wave diffusion from the motor cortex towards the scalp, corrupted by artifacts such as eye movements.
  • Eventually for this instance, only the statistical independence between the Mu wave and the other sources is crucial, since only the Mu wave is of interest for this BCI application.

B. A class of statistical tools to perform ICA

  • One of the fundamental questions one should ask oneself in order to choose the most appropriate ICA method in a BCI context is how to characterize the statistical independence of a set of P random signals {y_p[m]}_m when one realization of each signal is available.
  • Note that moments do not enjoy these two key properties.
  • Lastly, iv) moments and cumulants satisfy the multi-linearity property [32].
  • The SOBI, COM2, JADE and ICAR methods perform ICA from the cumulants of the data.

C. Numerical complexity

  • This section aims at giving some insights into the numerical complexity of the six ICA algorithms studied in this paper, for given values of N, P and data length M.
  • A flop is defined as a multiplication followed by an addition; according to the usual practice, only multiplies are counted, which does not affect the order of magnitude of the computational complexity.
  • But one can say that for a comparable performance, SOBI requires a smaller amount of calculations (when the requested assumptions are satisfied), the iterative algorithms INFOMAX and FastICA generally require a larger amount of calculations, whereas COM2, ICAR, and JADE appear close to each other in the picture.
  • Regarding the number of samples M, the authors can say that M is generally low (between 300 and 5000 snapshots).
  • Most BCI systems exploit four types of neurophysiological signals, namely the P-300 ERP, the SSVEP, the EEG rhythms and the ERS/ERD phenomenon.

III. WHY ICA-BASED BCI SYSTEMS: A BIBLIOGRAPHICAL SURVEY

  • Promising results have been reported in biomedical signal processing using ICA techniques.
  • They include fetal ECG extraction, Evoked Potentials (EP) enhancement, categorized brain signals detection, spindles detection and estimation, and EEG/MEG artifacts reduction.
  • Therefore, it appears natural to consider ICA techniques as a potential tool for building BCI systems.
  • Four types of neurophysiological signals have been mainly investigated in this context: the P-300 ERP, the SSVEP, the EEG oscillation rhythms and the ERS/ERD phenomenon.
  • The aim of this section is to provide an overview of ICA-based BCI systems, to show how the ICA technique can be applied and how the informative independent components can be automatically chosen.

A. P-300 evoked potentials

  • P-300 is a positive ERP, which occurs over the parietal cortex with a latency of about 300 ms after rare or task-relevant stimuli.
  • The first point was considered by Bayliss et al. in [5].
  • They showed that an ICA technique was able to separate the background EEG signal and eye movements from the P-300 signal.
  • Flashes occurred at about 10 Hz and the desired letter flashed twice in every set of twelve flashes.
  • They showed that the proposed algorithm for P-300 detection based on ICA provided a perfect accuracy (100%) in the competition.

B. Auditory event-related potentials

  • The Auditory Event-Related Potential (AERP) is the brain response time-locked to an auditory stimulus.
  • AERPs are very small electrical potentials [35] (2–10 µV for cortical AERPs down to much less than 1 µV for the deeper structures).
  • The use of AERPs in BCI systems is motivated by some particular problems encountered in communication with patients suffering from severe motor disabilities.
  • In some severe cases, the eyes are completely immobile and the pyramidal cells of the motor cortex are degenerated.
  • Before starting the SVM classifier, ICA was applied to the training set and a (40 × 40) mixing matrix A was obtained.

C. Steady-state visual evoked potentials

  • The visual system can be studied non-invasively by recording scalp EEG overlying the visual cortex.
  • The SSVEP is the response to a continuous rapid visual stimulus [35].
  • Indeed, a typical power spectrum of an SSVEP wave, elicited by an Fs Hz stimulation, presents its fundamental and second harmonic at Fs and 2Fs, respectively.
  • ICA was then applied to EEG signals and 13 ICs were derived, and the associated mixing matrix was then estimated.
  • The power spectrum of each IC was analyzed: the four most significant powers at the stimulation frequency were assumed to be related to SSVEP signals, while the remaining powers were considered as the contribution of background noise.

D. Mu rhythm and other activities from sensorimotor cortex

  • EEG contains a fairly wide frequency spectrum.
  • Several factors suggest that the EEG rhythms could be good signal features for BCI systems.
  • After a preprocessing step, ICA was used to extract ICs related to the left and the right motor imagery task.
  • The estimated mixing matrix was then sorted based on the norm of columns in ascending order.

E. How to select the informative independent components?

  • One of the challenging tasks in BCI is to reliably detect, enhance, and localize very weak brain activities corrupted by noise and various interfering artifacts of physiological and non-physiological origin.
  • One important problem that arises when ICA is used in practical BCI systems, is to automatically select and classify independent sources of interest.
  • The solution to these problems can be decomposed into two stages.
  • Another method allowing to select the components of interest consists in exploiting the spectral information of particular sources.
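This spectral-selection strategy can be sketched in a few lines. The sketch below is illustrative only: the synthetic ICs, the sampling rate, and the choice of the 8–12 Hz Mu band as the band of interest are all hypothetical values, not taken from the paper.

```python
import numpy as np

def band_power(ic, fs, f_lo, f_hi):
    """Fraction of an IC's spectral power inside [f_lo, f_hi] Hz (periodogram estimate)."""
    spec = np.abs(np.fft.rfft(ic)) ** 2
    freqs = np.fft.rfftfreq(ic.size, d=1.0 / fs)
    in_band = (freqs >= f_lo) & (freqs <= f_hi)
    return spec[in_band].sum() / spec.sum()

def select_ics(ics, fs, f_lo, f_hi, n_keep=1):
    """Return the indices of the n_keep ICs with the largest relative band power."""
    scores = [band_power(ic, fs, f_lo, f_hi) for ic in ics]
    return list(np.argsort(scores)[::-1][:n_keep])

# Toy check: a 10 Hz "Mu-like" IC should outrank broadband noise in the 8-12 Hz band.
rng = np.random.default_rng(0)
fs, t = 256.0, np.arange(0, 4, 1 / 256.0)
ics = np.vstack([rng.standard_normal(t.size),   # broadband noise IC
                 np.sin(2 * np.pi * 10 * t),    # Mu-like IC at 10 Hz
                 rng.standard_normal(t.size)])  # broadband noise IC
print(select_ics(ics, fs, 8.0, 12.0))           # the 10 Hz IC ranks first
```

In practice the band and the number of retained components depend on the targeted neurophysiological signal (e.g. the stimulation frequency and its second harmonic for SSVEP).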

A. Data generation

  • The main goal of this subsection is to explain how to obtain realistic data for comparing ICA methods in the context of BCI systems based on the Mu rhythm (see section III-D) when seven surface electrodes are used to achieve EEG recordings.
  • In such a context, as depicted in section II-A, the surface observations can be considered as a noisy mixture of one source of interest, namely the Mu wave, and artifact sources such as the ocular and cardiac activities.
  • The intracerebral Mu wave, located in the motor cortex, is simulated using the parametric model of Jansen [28], whose parameters are selected to derive a Mu-like activity.
  • The ocular and cardiac signals are issued from a polysomnographic database [38].
  • A Gaussian vector process is used to simulate the instrumental noise, while a brain volume conduction of 200 independent EEG sources, generated using the Jansen model [28], is simulated in order to produce a surface background EEG activity.
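The generation procedure above can be sketched as follows. This is an illustrative surrogate only: the Mu-like, ocular and cardiac waveforms below are simple stand-ins for the Jansen-model and polysomnographic signals used in the paper, and all dimensions, amplitudes and seeds are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
fs, dur, n_sensors = 256.0, 4.0, 7
t = np.arange(0, dur, 1 / fs)
m = t.size

# Source of interest and artifact sources (simple surrogates, not the
# Jansen-model / polysomnographic signals of the paper).
mu = np.sin(2 * np.pi * 10 * t) * (1 + 0.3 * np.sin(2 * np.pi * 0.5 * t))  # Mu-like
eog = np.cumsum(rng.standard_normal(m)) / np.sqrt(m)                        # drift-like ocular
ecg = (np.mod(t, 1.0) < 0.05).astype(float)                                 # spike-train cardiac
s = np.vstack([mu, eog, ecg])

# Background EEG activity: volume conduction of many independent weak sources.
bg_sources = rng.standard_normal((200, m)) * 0.05
B = rng.standard_normal((n_sensors, 200))
background = B @ bg_sources

# Surface observations: x[m] = A s[m] + background + instrumental noise.
A = rng.standard_normal((n_sensors, s.shape[0]))
noise = 0.1 * rng.standard_normal((n_sensors, m))
x = A @ s + background + noise
print(x.shape)   # (7, 1024)
```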

B. Performance criterion

  • Two separators, W_o^(1) and W_o^(2), can be compared with the help of the criterion introduced by Chevalier [12].
  • The quality of the extracted component is directly related to its Signal to Interference-plus-Noise Ratio (SINR).
  • Here π_p represents the power of the p-th source, w_o^(i) the i-th column of the separator W_o, and R_{ν_p} the total noise covariance matrix for the p-th source, corresponding to the estimated data covariance matrix R_x in the absence of component p.
  • The criterion given by (9) allows for a quantification of the component analysis performed by ICA algorithms.
  • It is shown in [12] that the optimal source separator corresponds to the separator Wo(SMF) whose columns are the Spatial Matched Filters (SMF) associated with the different sources.
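The criterion can be sketched numerically as below. This follows only the verbal description above (π_p, w_o and R_{ν_p} are the quantities named in the bullets; the paper's equation (9) may normalize differently), and the toy matrices are hypothetical.

```python
import numpy as np

def sinr(w, a_p, power_p, R_noise_p):
    """SINR of source p for a separator column w, following the description above:
    pi_p * (w^T a_p)^2 / (w^T R_{nu_p} w)."""
    signal = power_p * (w @ a_p) ** 2
    interference_plus_noise = w @ R_noise_p @ w
    return signal / interference_plus_noise

# Toy check: the spatial matched filter w = R_noise^{-1} a_p maximizes this
# generalized Rayleigh quotient, so it scores at least as high as a random separator.
rng = np.random.default_rng(2)
n = 7
a_p = rng.standard_normal(n)
M = np.eye(n) + 0.5 * np.outer(rng.standard_normal(n), rng.standard_normal(n))
R_noise = M @ M.T                       # symmetric positive definite noise covariance
w_smf = np.linalg.solve(R_noise, a_p)   # spatial matched filter (up to scale)
w_rand = rng.standard_normal(n)
print(sinr(w_smf, a_p, 1.0, R_noise) >= sinr(w_rand, a_p, 1.0, R_noise))  # True
```

This matches the statement that the optimal separator is built from the Spatial Matched Filters associated with the different sources.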

C. Computer results

  • To conduct a comparative performance study of the six ICA algorithms presented in section II-B, two experiments are envisaged from the data described in section IV-A.
  • Regarding the SOBI and ICAR algorithms, they are less effective than the three previous methods for all values of SNR.


This material is presented to ensure timely dissemination of scholarly and technical work.
Copyright and all rights therein are retained by authors or by other copyright holders.
All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright.
In most cases, these works may not be reposted without the explicit permission of the copyright holder.
ICA: a potential tool for BCI systems
Amar Kachenoura(1,2), Laurent Albera(1,2), Lotfi Senhadji(1,2) and Pierre Comon(3)
(1)INSERM, U642, Rennes, F-35000, France;
(2) Université de Rennes 1, LTSI, Rennes, F-35000, France;
(3) I3S, Université de Nice Sophia-Antipolis, CNRS, F-06903, France.
For Correspondence: Laurent Albera, LTSI, Campus de Beaulieu, Université de Rennes 1, 263 Avenue du General Leclerc - CS 74205 - 35042 Rennes Cedex, France.
Tel: (33) - 2 23 23 50 58, E-Mail: laurent.albera@univ-rennes1.fr
HAL author manuscript inserm-00202706, version 1
IEEE Signal Processing Magazine 2008;25(1):57-68

Abstract—Several studies dealing with ICA-based BCI systems have been reported. Most of them have only explored a limited number of ICA methods, mainly FastICA and INFOMAX. The aim of this paper is to help the BCI community researchers, especially those who are not familiar with ICA techniques, to choose an appropriate ICA method. For this purpose, the concept of ICA is reviewed and different measures of statistical independence are reported. Then, the application of these measures is illustrated through a brief description of the widely used algorithms in the ICA community, namely SOBI, COM2, JADE, ICAR, FastICA and INFOMAX. The implementation of these techniques in the BCI field is also explained. Finally, a comparative study of these algorithms, conducted on simulated EEG data, shows that an appropriate selection of an ICA algorithm may significantly improve the capabilities of BCI systems.
I. INTRODUCTION
Brain Computer Interface (BCI) technology is a research
field that has emerged and grown rapidly over the past 15
years (see [44], [31] and [8] for a review). The BCI system
is a set of sensors and signal processing components that
allows acquiring and analyzing brain activities with the goal
of establishing a reliable communication channel directly
between the brain and an external device such as a computer,
neuroprosthesis, etc. More precisely, the basic design and
functioning of any BCI system are depicted in figure 1. The
brain activity is recorded by means of electrodes located on the
scalp (non-invasive BCI systems) or by implanted electrodes
placed, in general, in the motor cortex (invasive BCI systems)
[8]. A preprocessing step is applied to enhance the Signal to
Noise Ratio (SNR) and to remove artifacts, such as power
line noise, electrode movements and broken wire contacts, but
also interfering physiological signals as those related to ocular,
muscular and cardiac activities. Then the feature extraction
step is conducted to detect the specific patterns in brain activity
that encode the user’s commands or reflect the patient’s motor
intentions [44] [31]. The last step is aimed at translating (i.e.
associating) specific features into useful control signals to be
sent to an external device. Several existing brain monitoring
technologies have been tested in BCI fields for acquiring data.
They can be divided in two subcategories: i) non-invasive
procedures such as the ElectroEncephaloGraphy (EEG), Mag-
netoEncephaloGraphy (MEG), functional Magnetic Resonance
Imaging (fMRI), Positron Emission Tomography (PET), and
Near Infrared Spectroscopy (NIRS) and ii) invasive ap-
proaches such as the ElectroCorticoGraphy (ECoG) where the
signal is recorded from intracranial microelectrodes [31]. Up
to now, a majority of practical BCI systems exploit EEG
signals and ECoG signals [44] [31]. Indeed, since MEG,
fMRI and PET are expensive and bulky, and as fMRI, PET
and NIRS present long time constants (they do not measure
neural activity directly but rely on the hemodynamic coupling
between neural activity and regional changes in blood flow),
they cannot be deployed as ambulatory BCI systems. Several
varieties of neurological phenomena are used by BCI systems.
They include EEG rhythms such as Mu, Alpha, Beta, Event-
Related Synchronization/Desynchronization (ERS/ERD) phe-
nomena, P-300 component of the Evoked-Related Potentials
(ERPs), Slow Cortical Potentials (SCPs), Steady-State Visual
Evoked Potentials (SSVEPs), etc. (see [31, table 3] for details).
Fast and reliable signal processing tools for preprocessing the
recorded data and for extracting significant features are crucial
in the development of practical BCI systems. Independent
Component Analysis (ICA) [14] is one of the popular signal
processing tools, which has been widely studied during the
last twenty years. Indeed, a great number of algorithms is
available and ICA received a broad attention in various fields
such as biomedical signal analysis and processing [33], image
recognition [18] and wireless communications [20]. In this
Fig. 1. Basic design and operation of any BCI system: signal acquisition delivers a digitized signal, which undergoes preprocessing, feature extraction and feature translation to produce device commands for applications such as a neuroprosthesis, a spelling device or a wheelchair.
paper we focus on the use of ICA in BCI systems. Several
studies dealing with ICA-based BCI systems have been re-
ported during the last decade [31]. Nevertheless, most of these
studies have only explored a limited number of ICA methods,
and mainly FastICA [25] and INFOMAX [30]. In addition, the
performance of ICA algorithms for arbitrary electrophysiolog-
ical sources is still almost unknown. This prevents us from
choosing the best method for a given application, and may
limit the role of these methods in BCI systems. To overcome
these limitations, the purpose of our study is i) to show the
interest of ICA in BCI, ii) to identify ICA techniques that are
appropriate to BCI, iii) to present a comparative performance
analysis of six algorithms in BCI operational context, and iv)
to build a reference for BCI community researchers, especially
for those who are not experts of ICA techniques.
II. TWO DECADES OF ICA
Hérault and Jutten seem to be the first (around 1983) to use
informally the concept of ICA, especially in order to solve
the BSS problem [4]. A few years later, Comon presents
a mathematical formulation of ICA and shows how Higher
Order (HO) cumulants can be used to solve the problem of
ICA: the HO contrast-based method COM2 arises from this
work (see [14] and references therein). In parallel, Cardoso
and Souloumiac develop the JADE algorithm [10], based
on a Joint Approximate Diagonalization (JAD). While these
two approaches use both Second Order (SO) and Fourth
Order (FO) statistics, other approaches attempt to exploit SO
statistics only. This is made possible thanks to the color of
the sources, assumed unknown but different. Fety is the first to
exploit covariance matrices at two different delay lags [21]; the
complete theoretical background is given only a few years later
by Comon et al. [15]. The same kind of approach is developed
independently several years later by Tong [40], Belouchrani et
al. [7] and Ziehe and Müller [49], who give rise to the so-called
AMUSE, SOBI and TDSEP methods, respectively. In 1999,
Müller et al. propose a modified version of JADE, which uses
the color of sources through both SO and FO statistics. More
recently Albera et al. present an extension of SOBI to FO
statistics [20], called FOBIUM, dealing with ICA especially
in underdetermined contexts (more components than observa-
tions). Authors also propose an algebraic method [1], named
ICAR, using the matrix redundancies of the FO covariance
matrix, well-known as the quadricovariance matrix. As pointed
out by Parra and Sajda in [36], under the assumption of non-
Gaussian, non-white or non-stationary sources, ICA can be
easily reformulated as a generalized eigenvalue problem.
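The two-lag second-order idea mentioned above (covariance matrices at two different delay lags) can be illustrated with a minimal AMUSE-style sketch. This is not the SOBI algorithm itself, only the underlying principle, and the toy signals, lag and seed are hypothetical; it works when the sources are colored with distinct spectra.

```python
import numpy as np

def amuse(x, lag=1):
    """AMUSE-style separation sketch: whiten with the zero-lag covariance,
    then eigendecompose the symmetrized lagged covariance of the whitened data."""
    x = x - x.mean(axis=1, keepdims=True)
    # Spatial whitening from R_x(0)
    d, E = np.linalg.eigh(np.cov(x))
    V = E @ np.diag(1.0 / np.sqrt(d)) @ E.T
    z = V @ x
    # Symmetrized covariance at the chosen lag
    R_lag = z[:, :-lag] @ z[:, lag:].T / (z.shape[1] - lag)
    R_lag = 0.5 * (R_lag + R_lag.T)
    _, U = np.linalg.eigh(R_lag)
    return U.T @ V          # separator: y[m] = (U^T V) x[m]

# Toy check: two colored sources with different spectra, mixed by a random matrix.
rng = np.random.default_rng(3)
t = np.arange(4096)
s = np.vstack([np.sin(2 * np.pi * 0.01 * t), np.sin(2 * np.pi * 0.04 * t + 1.0)])
A = rng.standard_normal((2, 2))
x = A @ s
y = amuse(x) @ x
# Each recovered component should correlate strongly with one of the sources.
C = np.abs(np.corrcoef(np.vstack([y, s]))[:2, 2:])
print((C.max(axis=1) > 0.9).all())
```

SOBI generalizes this single-lag diagonalization to a joint approximate diagonalization over several lags, which is more robust in practice.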
Whereas the previous methods identify simultaneously the
independent components, Delfosse and Loubaton [16] propose
to extract one component at a time, which is now referred to as
a deflation procedure. A few years later, Hyvärinen et al. propose
the FastICA method, which iteratively maximizes a FO
contrast. While the first version of this algorithm is of deflation
type, as that of Delfosse and Loubaton [16], Hyvärinen et al.
[25] later propose a "simultaneous" version of FastICA whose
joint orthonormalization step is similar to the one presented
by Moreau [34]. Instead of exploiting, explicitly or implicitly,
the SO and the FO statistics to solve the problem of ICA,
some approaches use directly the independence assumption. In
fact, Lee et al. present an information maximization approach
[30] based on parameterized probability distributions that
have sub- and super-gaussian regimes to derive a general
learning rule, which is optimized using a natural gradient
algorithm proposed by Amari et al. [3]. Pham proposes to
use non-parametric estimates of the likelihood or the mutual
information [37]. Another algorithm, based on a minimization
of a non-parametric estimator of Renyi’s mutual information
as a criterion for ICA is introduced by Erdogmus et al. [17].
Note that Renyi’s entropy is not yet proved to be better than
Shannon’s to address the BSS problem [41].
The previous list of ICA methods is not exhaustive, which
shows that a great deal has been written on the subject.
However, most of the ICA-based BCI systems presently use
only FastICA [25] or INFOMAX [30]. So, after a brief survey
of the concept of ICA, we propose hereafter to help the BCI
scientists to choose the most appropriate ICA method among
a class of six algorithms, namely INFOMAX [30], FastICA
[25], COM2 [14], JADE [10], SOBI [7] and ICAR [1].
A. The concept of ICA
As will be presented in section III, ICA is very useful
in the case of non-invasive BCI systems. Such BCI systems
generally exploit EEG, which has a high time resolution (below
100 ms). This temporal precision makes it possible to explore the timing
of basic neural processes at the level of cell assemblies.
More particularly, EEG consists of measurements of a set
of N electric potential differences between pairs of scalp
electrodes. The sensors may be either directly glued to the skin
at selected locations right above cortical regions of interest,
as the motor area for instance, or fitted in an elastic cap for
rapid attachment with near uniform coverage of the entire
scalp. Research protocols can use up to 256 electrodes. Then the N-dimensional set of recorded signals can be viewed as one realization of a random vector process {x[m]}_m. The ICA of {x[m]}_m consists in looking for an overdetermined (N×P) mixing matrix A (i.e. P is smaller than or equal to N) and a P-dimensional source vector process {s[m]}_m whose components are as statistically independent as possible, so that the linear observation model below holds:

∀m, x[m] = A s[m] + ν[m]   (1)

where {ν[m]}_m is an N-dimensional noise vector process independent from the source process; bold faced lowercases denote vectors, whereas bold uppercases denote matrices. In other words, ICA consists of searching for an (N×P) separator matrix W_o such that y_o[m] = W_o^T x[m] is an estimate of the source vector s[m]. It is worth noting that once y_o[m] is computed, only its components of interest for the considered BCI application have to be selected. This task is dealt with in section III-E. Now how can one justify model (1) in practice?
Let’s consider the instance of subjects who would learn how
to control the amplitude of Mu waves by visualizing motor
activities, such as smiling, chewing, or swallowing. When
people move their hands a brain wave called the Mu wave
gets blocked and disappears completely. Such a suppression
also occurs when a person watches someone else waving his
hand, but not if he/she watches a similar movement of an
inanimate object. These people could thus learn how to drive
a cursor up or down on a computer screen by controlling
the amplitude of Mu waves. In such an example, when non-
invasive measurements as EEG are used, the surface sensors
record the result of the Mu wave diffusion from the motor
cortex towards the scalp, corrupted by artifacts such as eye
movements. The diffusion of electromagnetic waves in the
head is now well-known by the biomedical community and
can be modeled as a linear static transformation [2]. As far as
the artifacts are concerned, they can be considered as additive
perturbations whose weightings depend on the physiological
nature of the artifacts. Eventually for this instance, only the
statistical independence between the Mu wave and the other
sources is crucial since only the Mu wave is of interest for this
BCI application. In fact, it can be justified by the physiological
independence between the Mu wave and the other sources such
as ocular and cardiac activities.
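The roles of A and W_o in model (1) can be illustrated with a toy numerical example; all sizes and distributions below are hypothetical. In the oracle case sketched here A is known, so the Moore-Penrose pseudo-inverse is one valid separator; the whole point of ICA is to estimate such a separator blindly, without knowing A.

```python
import numpy as np

# Model (1) in miniature: N = 4 sensors observe P = 2 independent sources
# through a static mixing matrix A (noiseless case for clarity).
rng = np.random.default_rng(4)
m = 1000
s = np.vstack([rng.uniform(-1, 1, m), np.sign(rng.standard_normal(m))])
A = rng.standard_normal((4, 2))
x = A @ s                       # surface observations

# With A known, the pseudo-inverse gives one valid separator W_o^T.
W_o_T = np.linalg.pinv(A)
y_o = W_o_T @ x
print(np.allclose(y_o, s))      # True: y_o[m] recovers s[m] exactly
```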
B. A class of statistical tools to perform ICA
One of the fundamental questions one should ask oneself
in order to choose the most appropriate ICA method in a BCI
context is how to characterize the statistical independence of
a set of P random signals {y_p[m]}_m when one realization of each signal is available.
Entropy and information. First, recall that a random vector y = [y_1, ..., y_P]^T has mutually independent components if and only if its Probability Density Function (PDF) p_y can be decomposed as the product of the P marginal PDFs p_{y_p}, where p_{y_p} denotes the PDF of the p-th component y_p of y.
Then a natural way of checking whether y has independent components is to measure a pseudo-distance between p_y and the product of the marginals, ∏_p p_{y_p}. Such a measure can be chosen among the large class of f-divergences. If the Kullback divergence is used, we get the Mutual Information (MI) of y:

MI(p_y) = ∫_{ℝ^N} p_y(u) log( p_y(u) / ∏_{p=1}^{P} p_{y_p}(u_p) ) du   (2)
It can be shown that the MI vanishes if and only if the P
components of y are mutually independent, and is strictly
positive otherwise.
Another measure based on the PDF of y is the Differential Entropy (DE) of y:

S(y) = −∫_{ℝ^N} p_y(u) log(p_y(u)) du = −E[log(p_y(y))]   (3)
sometimes referred to as Shannon’s joint entropy, where E[·]
denotes the mathematical expectation. This entropy is not
invariant by an invertible change of coordinates, but only by
orthogonal transforms. A fundamental result in information
theory is that the DE can be used as a measure of non-
gaussianity. Indeed, among the random vectors having an
invertible covariance matrix, the Gaussian vector is the one
that has the largest entropy. Then, to obtain a measure of non-
gaussianity of y that is i) zero only for a Gaussian vector,
ii) always positive and iii) invariant by any linear invertible
transformation, one often uses a normalized version of the DE,
called negentropy, and given by:
J(y) = S(z) − S(y)   (4)
where z stands for the Gaussian vector with the same mean
and covariance matrix as y. Since MI and negentropy are
simply related to each other [14], estimating the negentropy allows one to estimate the MI. However, even if consistent estimators of PDFs are at hand (such as Parzen estimators [47]), the integral computation in (3) is time consuming.
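One classical workaround, in the spirit of Hyvärinen's negentropy approximations (this particular formula is a standard textbook approximation, not one taken from this paper), replaces the integral by cumulant-based terms of a standardized signal:

```python
import numpy as np

def negentropy_approx(y):
    """Classical cumulant-based approximation of the negentropy of a
    standardized scalar signal: J(y) ~ E[y^3]^2 / 12 + kurt(y)^2 / 48."""
    y = (y - y.mean()) / y.std()
    skew_term = np.mean(y ** 3) ** 2 / 12.0
    kurt = np.mean(y ** 4) - 3.0          # FO marginal cumulant of unit-variance y
    return skew_term + kurt ** 2 / 48.0

# Toy check: the approximation is near zero for Gaussian data and larger for a
# sub-Gaussian (uniform) signal, consistent with negentropy measuring non-gaussianity.
rng = np.random.default_rng(5)
gauss = rng.standard_normal(100_000)
unif = rng.uniform(-1, 1, 100_000)
print(negentropy_approx(gauss) < negentropy_approx(unif))  # True
```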
The INFOMAX and FastICA methods succeed in avoiding this exact computation. On one hand, INFOMAX solves the ICA problem by maximizing the DE of the output of an invertible non-linear transform of y[m] = W^T x[m] with respect to W using the natural gradient algorithm [3]. In practice, non-linearities whose derivatives are sub-Gaussian (resp. super-Gaussian) PDFs are sufficient for sub-Gaussian (resp. super-Gaussian) sources [30]. On the other hand, in its deflationary implementation, FastICA extracts the p-th (1 ≤ p ≤ P) source by maximizing an approximation of the negentropy J(w_p^T x[m]) with respect to the (N×1) vector w_p. This maximization is achieved using an approximate Newton iteration. To prevent all vectors w_p from converging to the same maximum (which would yield several times the same source), the p-th output has to be decorrelated from the previously estimated sources after every iteration. A simple way to do this is a deflation scheme based on a Gram-Schmidt orthogonalization.
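The deflationary scheme just described can be sketched as follows: a minimal kurtosis-contrast FastICA with Gram-Schmidt decorrelation on whitened data. This is an illustrative sketch, not the actual FastICA implementation [25], which offers more refined non-linearities and stopping rules; the toy sources and seeds are hypothetical.

```python
import numpy as np

def fastica_deflation(x, n_sources, n_iter=200, seed=0):
    """Minimal deflationary FastICA sketch: whiten, then extract sources one at a
    time with the cubic (kurtosis) contrast and Gram-Schmidt decorrelation."""
    rng = np.random.default_rng(seed)
    x = x - x.mean(axis=1, keepdims=True)
    d, E = np.linalg.eigh(np.cov(x))
    z = (E @ np.diag(1.0 / np.sqrt(d)) @ E.T) @ x     # spatial whitening
    W = np.zeros((n_sources, z.shape[0]))
    for p in range(n_sources):
        w = rng.standard_normal(z.shape[0])
        w /= np.linalg.norm(w)
        for _ in range(n_iter):
            # Approximate Newton step for the kurtosis contrast
            w_new = (z * (w @ z) ** 3).mean(axis=1) - 3.0 * w
            # Gram-Schmidt: decorrelate from previously extracted rows
            w_new -= W[:p].T @ (W[:p] @ w_new)
            w_new /= np.linalg.norm(w_new)
            if abs(abs(w_new @ w) - 1.0) < 1e-10:
                break
            w = w_new
        W[p] = w
    return W, z

# Toy check: unmix two independent non-Gaussian (sub-Gaussian) sources.
rng = np.random.default_rng(6)
s = np.vstack([np.sign(rng.standard_normal(5000)),
               rng.uniform(-np.sqrt(3), np.sqrt(3), 5000)])
x = rng.standard_normal((2, 2)) @ s
W, z = fastica_deflation(x, 2)
C = np.abs(np.corrcoef(np.vstack([W @ z, s]))[:2, 2:])
print((C.max(axis=1) > 0.9).all())
```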
Another way to avoid the exact computation of the negen-
tropy consists in using another measure of statistical indepen-
dence less natural but easier to compute. The contrast function
[14, definition 5] built from the data cumulants satisfies this
condition. Let’s recall the definition of cumulants and let’s
show why they are so attractive tools in the ICA framework.
Cumulants. Let Φ_x(u) = E[exp(iu^T x)] be the first characteristic function of a random vector x. Since Φ_x(0) = 1 and Φ_x is continuous, there exists an open neighborhood of the origin in which Ψ_x(u) = log(Φ_x(u)) can be defined. Recall that the r-th order moments are the coefficients of the Taylor expansion of Φ_x about the origin; similarly, cumulants, denoted by C_{i,j,...,ℓ,x}, are the coefficients of the second characteristic function Ψ_x. For an N-dimensional random vector x, SO cumulants can be arranged in an (N×N) matrix, which is the well-known covariance matrix, denoted by R_x. In the same way, it is possible to store the FO cumulants of x in an (N²×N²) matrix, Q_x, called the quadricovariance matrix.
But why are cumulants useful to build a good optimization criterion dedicated to the extraction of independent components? Why are they more appropriate than moments? This comes essentially from two important properties: i) if at least two components or groups of components of x are statistically independent, then all cumulants involving these components are null. For instance, if all components of x are mutually independent, then C_{i,j,...,ℓ,x} = δ[i, j, ..., ℓ] C_{i,i,...,i,x}, where the Kronecker delta δ[i, j, ..., ℓ] equals 1 when all its arguments are equal and is null otherwise. And ii) if x is Gaussian, then all its HO cumulants are null, so HO cumulants may be seen as a distance to normality. Note that moments do not enjoy these two key properties. On the other hand, moments and cumulants share two other useful properties: iii) they are both symmetric arrays, since the value of their entries does not change by permutation of their indices; consequently, R_x and Q_x are necessarily symmetric matrices. Lastly, iv) moments and cumulants satisfy the multi-linearity property [32]. To illustrate this, let x be a random vector satisfying x = As, where A is an (N×P) matrix and s a random vector
with statistically independent components. We know that the (P×P) covariance matrix of s, R_s, is diagonal, and that the covariance matrix of x may be written as:

R_x = A R_s A^T   (5)

Actually, this is nothing else but the expression of the multi-linearity property at order 2. Similarly at order 4, one can define a (P×P) diagonal matrix ζ_s containing the marginal source cumulants C_{p,p,p,p,s}. Then, from properties i), iii) and iv), one can deduce that Q_x has the following algebraic structure:

Q_x = (A ⊙ A) ζ_s (A ⊙ A)^T   (6)

where ⊙ denotes the column-wise Kronecker product [1].
Now, r-th order cumulants can be related to moments of order smaller than or equal to r using the Leonov-Shiryaev formula [32]. For instance, for any zero-mean random vector x symmetrically distributed, we have:

C_{i,j,x} = E[x_i x_j]
C_{i,j,k,ℓ,x} = E[x_i x_j x_k x_ℓ] − E[x_i x_j] E[x_k x_ℓ] − E[x_i x_k] E[x_j x_ℓ] − E[x_i x_ℓ] E[x_j x_k]   (7)

Now r-th order moments of a stationary-ergodic process do not depend on time and can be easily estimated using sample statistics [32]. The SOBI, COM2, JADE and ICAR methods
perform ICA from the cumulants of the data. More precisely,
SOBI uses the SO cumulants while COM2 and JADE use
both the SO and FO cumulants. As far as ICAR is concerned,
it uses only the FO cumulants of the data. Next, SOBI,
JADE and ICAR take advantage of the algebraic structure
of the covariance (5) and/or quadricovariance matrices (6);
they consider the problem of ICA as a generalized eigenvalue
problem [36], while COM2 explicitly maximizes a contrast
function based on the FO cumulants of the data by rooting
successive polynomials. Finally, SOBI, JADE and ICAR use
the JAD method to extract the independent components.
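The generalized-eigenvalue viewpoint can be illustrated with two matrices sharing the structure (5): for R1 = A D1 A^T and R2 = A D2 A^T, the eigenvalues of R2^{-1} R1 are the ratios diag(D1)/diag(D2), regardless of the unknown mixing A. The matrices A, D1, D2 below are arbitrary illustration choices:

```python
import numpy as np

rng = np.random.default_rng(3)
P = 3
A = rng.standard_normal((P, P))          # arbitrary invertible mixing matrix
D1 = np.diag([1.0, 2.0, 3.0])            # two different diagonal "source" matrices,
D2 = np.diag([4.0, 1.0, 0.5])            # e.g. source covariances at two delay lags
R1, R2 = A @ D1 @ A.T, A @ D2 @ A.T

# R2^{-1} R1 = A^{-T} (D2^{-1} D1) A^T, so its eigenvalues are the
# ratios diag(D1)/diag(D2) = {0.25, 2, 6}, independent of A.
w = np.linalg.eigvals(np.linalg.solve(R2, R1))
assert np.allclose(np.sort(w.real), [0.25, 2.0, 6.0], atol=1e-6)
```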
To conclude this section, let us compare the six ICA methods
to each other. First, the four cumulant-based algorithms
constitute a semi-algebraic solution to the ICA problem, in
the sense that they terminate within a finite number of
iterations. On the contrary, INFOMAX and FastICA are
iterative methods that may converge to local optima. Moreover,
all these methods except ICAR can extract components whose
FO marginal cumulants have different signs; the latter scenario
may occur in biomedical contexts. Another difference is the
need for a spatial whitening (also called standardization)
[14, section 2.2]. This preprocessing, based on SO cumulants,
is mandatory for FastICA, COM2, JADE and SOBI. It is not
necessary but recommended for INFOMAX, in order to improve
its speed of convergence [25, Chapter 9]. ICAR, in contrast,
uses only FO cumulants without any standardization;
consequently, it is asymptotically insensitive to the presence
of a Gaussian noise with a non-diagonal covariance matrix. In
addition, contrary to the five other methods, SOBI requires
that the sources not be temporally white, a condition generally
satisfied in BCI systems. Finally, these six methods, namely
INFOMAX, FastICA, COM2, JADE, SOBI and ICAR, require the
stationarity-ergodicity assumption to ensure asymptotic
mean-square convergence. However, this assumption is very
rarely fulfilled in biomedical contexts, and a consistency
analysis is difficult in the presence of such complex biomedical
signals. Nevertheless, the good behavior of FastICA, COM2 and
JADE in recent computer experiments [29] shows that the
stationarity-ergodicity assumption is not absolutely necessary.
As far as COM2 and JADE are concerned, even if the sample
statistics do not estimate the cumulants of the data accurately,
they still satisfy reasonably well the basic properties i) to iv)
mentioned above.
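The spatial whitening (standardization) step mentioned above can be sketched as follows. This symmetric-whitening variant via the eigendecomposition of the sample covariance is one common implementation choice, assuming zero-mean data; it is not specific to any of the six algorithms:

```python
import numpy as np

def whiten(x):
    """Spatial whitening (standardization): decorrelate and rescale the
    channels so that the whitened data have identity covariance.
    x is (N x M), assumed zero-mean."""
    R = (x @ x.T) / x.shape[1]                 # SO statistics (sample covariance)
    d, U = np.linalg.eigh(R)
    W = U @ np.diag(1.0 / np.sqrt(d)) @ U.T    # symmetric whitening matrix
    return W @ x, W

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 5))
x = A @ rng.standard_normal((5, 100_000))      # correlated mixtures
z, W = whiten(x)
assert np.allclose((z @ z.T) / z.shape[1], np.eye(5), atol=1e-8)
```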
C. Numerical complexity
This section aims at giving some insights into the numerical
complexity of the six ICA algorithms studied in this paper,
for given values of N, P and data length M. The numerical
complexity of the methods is measured in number of floating
point operations (flops). A flop is defined as a multiplication
followed by an addition; according to the usual practice, only
multiplications are counted, which does not affect the order of
magnitude of the computational complexity. Now denote by
f_4(P) = P(P+1)(P+2)(P+3)/24 the number of free entries
in a fourth-order cumulant tensor of dimension P enjoying
all symmetries, It the number of sweeps required by a joint
diagonalization process (SOBI, JADE, ICAR) or by contrast-
function optimization algorithms (COM2), T the number of
delay lags used in SOBI, J the maximal number of iterations
considered in iterative algorithms (FastICA, INFOMAX), Q
the complexity required to compute the roots of a real 4th-
degree polynomial by Ferrari's technique (we may take Q ≈
30 flops), and B = min{MN^2/2 + 4N^3/3 + PNM, 2MN^2}
the number of flops required to perform spatial whitening.
Then, for given values of N, P, M, It, T, J and B, the
computational complexities are given in Table I. It is difficult
TABLE I
NUMERICAL COMPLEXITY OF THE SIX ANALYZED ICA METHODS

Algorithm   Flops
SOBI        T M N^2/2 + 4N^3/3 + (T-1) N^3/2 + It P^2 [4P(T-1) + 17(T-1) + 4P + 75]/2
COM2        B + min{12 It f_4(P) P^2 + 2 It P^3 + 3M f_4(P) + M P^2, 13 It M P^2/2} + It P^2 Q/2
JADE        B + min{4P^6/3, 8P^3 (P^2 + 3)} + 3M f_4(P) + It P^2 (75 + 21P + 4P^2)/2 + M P^2
ICAR        3M f_4(P) + 2N^6/3 + P^2 (3N^2 - P)/3 + N^2 P + P^2 N^3 + 7P^2 N^2 + It P^2 (4N^4 - 8N^3 + 25N^2)/2
FastICA     B + J [2P(P + M) + 5M P^2/2]
INFOMAX     B + J [P^3 + P^2 + P(5M + 4)]
to compare computational complexities because the input
parameters are different. But one can say that, for a comparable
performance, SOBI requires a smaller amount of calculations
(when the requested assumptions are satisfied), the iterative
algorithms INFOMAX and FastICA generally require a larger
amount of calculations, whereas COM2, ICAR and JADE appear
close to each other. See [13], [48] for more details. The number
of electrodes used in BCI systems can vary, for example from
one electrode [27] to 41 electrodes [24]. However, in the
perspective of using BCI systems in