LETTER Communicated by Michael Lewicki
Blind Source Separation by Sparse Decomposition in a Signal
Dictionary
Michael Zibulevsky
Department of Computer Science, University of New Mexico, Albuquerque, NM 87131,
U.S.A.
Barak A. Pearlmutter
Department of Computer Science and Department of Neurosciences, University of New Mexico, Albuquerque, NM 87131, U.S.A.
Neural Computation 13, 863–882 (2001). © 2001 Massachusetts Institute of Technology
The blind source separation problem is to extract the underlying source
signals from a set of linear mixtures, where the mixing matrix is unknown.
This situation is common in acoustics, radio, medical signal and image
processing, hyperspectral imaging, and other areas. We suggest a two-
stage separation process: a priori selection of a possibly overcomplete
signal dictionary (for instance, a wavelet frame or a learned dictionary) in
which the sources are assumed to be sparsely representable, followed by
unmixing the sources by exploiting their sparse representability. We
consider the general case of more sources than mixtures, but also derive a
more efficient algorithm in the case of a nonovercomplete dictionary and
an equal number of sources and mixtures. Experiments with artificial
signals and musical sounds demonstrate significantly better separation
than other known techniques.
1 Introduction
In blind source separation an N-channel sensor signal x(t) arises from M unknown scalar source signals s_i(t), linearly mixed together by an unknown N × M matrix A, and possibly corrupted by additive noise ξ(t),

x(t) = As(t) + ξ(t). (1.1)
We wish to estimate the mixing matrix A and the M-dimensional source
signal s(t). Many natural signals can be sparsely represented in a proper
signal dictionary:
s_i(t) = Σ_{k=1}^{K} C_ik ϕ_k(t). (1.2)
The scalar functions ϕ_k(t) are called atoms or elements of the dictionary.
These elements do not have to be linearly independent and instead may
form an overcomplete dictionary. Important examples are wavelet-related
dictionaries (e.g., wavelet packets, stationary wavelets; see Chen, Donoho,
& Saunders, 1996; Mallat, 1998) and learned dictionaries (Lewicki & Se-
jnowski, in press; Lewicki & Olshausen, 1998; Olshausen & Field, 1996,
1997). Sparsity means that only a small number of the coefficients C
ik
differ
significantly from zero.
We suggest a two-stage separation process: a priori selection of a possibly
overcomplete signal dictionary in which the sources are assumed to be
sparsely representable and then unmixing the sources by exploiting their
sparse representability.
In the discrete-time case t = 1, 2, ..., T we use matrix notation. X is an N × T matrix, with the ith component x_i(t) of the sensor signal in row i, S is an M × T matrix with the signal s_j(t) in row j, and Φ is a K × T matrix with basis function ϕ_k(t) in row k. Equations 1.1 and 1.2 then take the following simple form:

X = AS + ξ (1.3)
S = CΦ. (1.4)

Combining them, we get the following when the noise is small:

X ≈ ACΦ.

Our goal therefore can be formulated as follows: Given the sensor signal matrix X and the dictionary Φ, find a mixing matrix A and a matrix of coefficients C such that X ≈ ACΦ and C is as sparse as possible.
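As a concrete check on the dimensions involved, the factorization X ≈ ACΦ can be sketched in a few lines of NumPy; the sizes, the random dictionary, and the sparsity level below are illustrative choices, not values from the paper:

```python
import numpy as np

# Illustrative dimensions (not taken from the paper):
N, M, K, T = 2, 3, 64, 1000    # mixtures, sources, atoms, time samples

rng = np.random.default_rng(0)
Phi = rng.standard_normal((K, T))        # dictionary: atom phi_k in row k
C = np.where(rng.random((M, K)) < 0.1,   # sparse coefficients: ~10% nonzero
             rng.standard_normal((M, K)), 0.0)
A = rng.standard_normal((N, M))          # unknown N x M mixing matrix

S = C @ Phi                              # equation 1.4: S = C Phi
X = A @ S                                # equation 1.3 with noise omitted
assert X.shape == (N, T)                 # so X ≈ A C Phi
```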
We should mention other problems of sparse representation studied in the literature. The basic problem is to sparsely represent a scalar signal in a given dictionary (see Chen et al., 1996). Another problem is to adapt the dictionary to a given class of signals¹ (Lewicki & Sejnowski, 1998; Lewicki & Olshausen, 1998; Olshausen & Field, 1997). This problem is shown to be equivalent to the problem of blind source separation when the sources are sparse in time (Lee, Lewicki, Girolami, & Sejnowski, 1999; Lewicki & Sejnowski, in press). Our problem is different, but we will use and generalize some techniques presented in these works.
Independent factor analysis (Attias, 1999) and Bayesian blind source separation (Rowe, 1999) also consider the case of more sources than mixtures. In our approach, we take advantage of the sources being sparsely representable. In the extreme case, when the decomposition coefficients are very sparse, the separation becomes practically ideal (see section 3.2 and the six flutes example in Zibulevsky, Pearlmutter, Bofill, & Kisilev, in press). Nevertheless, detailed comparison of the methods on real-world signals remains open for future research.
¹ Our dictionary Φ may be obtained in this way.

In section 2 we give some motivating examples, which demonstrate how
sparsity helps to separate sources. Section 3 gives the problem formulation in a probabilistic framework and presents the maximum a posteriori approach, which is applicable to the case of more sources than mixtures.
In section 4 we derive another objective function, which provides more ro-
bust computations when there is an equal number of sources and mixtures.
Section 5 presents sequential source extraction using quadratic program-
ming with nonconvex quadratic constraints. Finally, in section 6 we derive
a faster method for nonovercomplete dictionaries and demonstrate high-
quality separation of synthetically mixed musical sounds.
2 Separation of Sparse Signals
In this section we present two examples that demonstrate how sparsity of
source signals in the time domain helps to separate them. Many real-world
signals have sparse representations in a proper signal dictionary but not in
the time domain. The intuition here carries over to that situation, as shown
in section 3.1.
2.1 Example: Two Sources and Two Mixtures. Two synthetic sources
are shown in Figures 1a and 1b. The first source has two nonzero samples,
and the second has three. The mixtures, shown in Figures 1c and 1d, are less
sparse: they have five nonzero samples each. One can use this observation
to recover the sources. For example, we can express one of the sources as
s̃_i(t) = x_1(t) + µ x_2(t)

and choose µ so as to minimize the number of nonzero samples ‖s̃_i‖_0, that is, the ℓ_0 norm of s̃_i.
This objective function yields perfect separation. As shown in Figure 2a,
when µ is not optimal, the second source interferes, and the total number
of nonzero samples remains five. Only when the first source is recovered
perfectly, as in Figure 2b, does the number of nonzero samples drop to two
and the objective function achieve its minimum.
Note that the function ‖s̃_i‖_0 is discontinuous and may be difficult to optimize. It is also very sensitive to noise: even a tiny bit of noise would make all the samples nonzero. Fortunately, in many cases the ℓ_1 norm ‖s̃_i‖_1 is a good substitute for this objective function. In this example, it too yields perfect separation.
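The µ-scan of this example is easy to reproduce in a few lines; the sources below mimic Figure 1, the mixing matrix is an arbitrary illustrative choice, and a grid search stands in for a proper line search:

```python
import numpy as np

# Two sparse sources (cf. Figure 1) and two mixtures; the mixing
# matrix is an arbitrary illustrative choice, not taken from the paper.
s = np.zeros((2, 10))
s[0, [2, 7]] = [1.0, -0.8]                 # first source: two nonzero samples
s[1, [1, 4, 8]] = [0.5, 1.2, -0.6]         # second source: three nonzero samples
A = np.array([[1.0, 0.7],
              [0.4, 1.0]])
x = A @ s                                  # mixtures: five nonzero samples each

# Scan mu and keep the value minimizing the l1 norm of x1 + mu*x2.
mus = np.linspace(-3.0, 3.0, 6001)
l1 = [np.abs(x[0] + mu * x[1]).sum() for mu in mus]
mu_best = mus[int(np.argmin(l1))]
s_est = x[0] + mu_best * x[1]              # one source, up to scaling
print(mu_best)                             # ~ -0.7: cancels source 2 for this A
print(np.count_nonzero(np.round(s_est, 6)))  # back to two nonzero samples
```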
2.2 Example: Three Sources and Two Mixtures. The signals are pre-
sented in Figure 3. These sources have about 10% nonzero samples. The
nonzero samples have random positions and are zero-mean unit-variance
gaussian distributed in amplitude. Figure 4 shows a scatter plot of the mixtures. The directions of the columns of the mixing matrix are clearly visible.

Figure 1: Coefficients of signals, with coefficient identity on the x-axis (10 coefficients, arbitrarily ordered) and magnitude on the y-axis (arbitrarily scaled). Sources (a and b) are sparse. Mixtures (c and d) are less sparse.

Figure 2: Coefficients of signals, with coefficient identity on the x-axis (10 coefficients, arbitrarily ordered) and magnitude on the y-axis (arbitrarily scaled). (a) Imperfect separation. Since the second source is not completely removed, the total number of nonzero samples remains five. (b) Perfect separation. When the source is recovered perfectly, the number of nonzero samples drops to two, and the objective function achieves its minimum.
This phenomenon can be used in clustering approaches to source separa-
tion (Pajunen, Hyvärinen, & Karhunen, 1996; Zibulevsky et al., in press). In
this work we will explore a maximum a posteriori approach.
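A sketch of how data like that of this example might be generated, and of the geometry behind the scatter plot; the mixing matrix, seed, and sparsity level are illustrative choices matching the description above:

```python
import numpy as np

rng = np.random.default_rng(0)
M, T = 3, 300
mask = rng.random((M, T)) < 0.1            # ~10% nonzero, random positions
S = np.where(mask, rng.standard_normal((M, T)), 0.0)
A = rng.standard_normal((2, M))            # two mixtures of three sources
X = A @ S

# At samples where only source j happens to be active, the column
# X[:, t] is proportional to A[:, j]; a scatter plot of X[0] versus
# X[1] therefore shows the columns of A as dense rays through the
# origin, which is what clustering approaches to separation exploit.
angles = np.arctan2(X[1], X[0])            # cluster these to estimate A
```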

Figure 3: Coefficients of signals, with coefficient identity on the x-axis (300 coefficients, arbitrarily ordered) and magnitude on the y-axis (arbitrarily scaled). (Top three panels) Sparse sources (sparsity is 10%). (Bottom two panels) Mixtures.
3 Probabilistic Framework
In order to derive a maximum a posteriori solution, we consider the blind source separation problem in a probabilistic framework (Belouchrani & Cardoso, 1995; Pearlmutter & Parra, 1996). Suppose that the coefficients C_ik in a source decomposition (see equation 1.4) are independent random variables with a probability density function (pdf) of an exponential type,

p_i(C_ik) ∝ exp(−β_i h(C_ik)). (3.1)
This kind of distribution is widely used for modeling sparsity (Lewicki &
Sejnowski, in press; Olshausen & Field, 1997). A reasonable choice of h(c)
may be
h(c) = |c|^{1/γ}, γ ≥ 1. (3.2)
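Assuming this reconstruction of equation 3.2 is right, the penalty is easy to probe numerically: γ = 1 recovers the ℓ_1 penalty |c|, while larger γ makes the penalty more concave and hence more strongly sparsity promoting. A minimal sketch:

```python
import numpy as np

def h(c, gamma=2.0):
    # h(c) = |c|**(1/gamma) with gamma >= 1 (equation 3.2 as read above)
    return np.abs(c) ** (1.0 / gamma)

c = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(h(c, gamma=1.0))   # [2.  0.5 0.  0.5 2. ]  -- the l1 penalty |c|
print(h(c, gamma=2.0))   # sqrt(|c|): rises steeply near zero, favoring
                         # coefficients that are exactly zero
```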

References

Chen, S. S., Donoho, D. L., & Saunders, M. A. (1996). Atomic decomposition by basis pursuit.

Lee, D. D., & Seung, H. S. (1999). Learning the parts of objects by non-negative matrix factorization. Nature, 401, 788–791.

Mallat, S. (1998). A wavelet tour of signal processing. San Diego: Academic Press.
Frequently Asked Questions (15)
Q1. What contributions have the authors mentioned in the paper "Blind source separation by sparse decomposition in a signal dictionary" ?

The authors suggest a two-stage separation process: a priori selection of a possibly overcomplete signal dictionary (for instance, a wavelet frame or a learned dictionary) in which the sources are assumed to be sparsely representable, followed by unmixing the sources by exploiting their sparse representability. The authors consider the general case of more sources than mixtures, but also derive a more efficient algorithm in the case of a nonovercomplete dictionary and an equal number of sources and mixtures.

It would be interesting to compare these possibilities to the other methods presented in this article. In this case, the matrices A and W will have linear filters as elements, and multiplication by an element corresponds to convolution.

The main difficulty in a maximization problem like equation 3.9 is the bilinear term ACΦ, which destroys the convexity of the objective function and makes convergence unstable when optimization starts far from the solution.

The easiest way to perform sparse decomposition of such sources is to compute a spectrogram, the coefficients of a time-windowed discrete Fourier transform. 


Considering the mixing matrix A as a parameter, the authors can estimate it by maximizing the probability of the observed signal X:

max_A P(X|A), where P(X|A) = ∫ P(X|A,C) P(C) dC.


Another possibility for ensuring the nonsingularity of W is to subtract K log|det W| from the objective:

min_{W,C} −K log|det W| + ½‖CΦ − WX‖²_F + µ Σ_{j,k} β_j h(C_jk), (4.3)

which (Bell & Sejnowski, 1995; Pearlmutter & Parra, 1996) can be viewed as a maximum likelihood term.

The first approach is to force each row A_i of the mixing matrix A to be bounded in norm,

‖A_i‖ ≤ 1, i = 1, ..., N. (3.10)

The second way is to restrict the norm of the rows

we have

P(X|A,C) ∝ ∏_{i,t} exp(−(X_it − (ACΦ)_it)² / 2σ²). (3.6)

By the independence of the coefficients C_jk and equation 3.1, the prior pdf of C is

P(C) ∝ ∏_{j,k} exp(−β_j h(C_jk)).

The authors also suppose a priori that the mixing matrix A is uniformly distributed over the range of interest and that the noise ξ(t) in equation 1.3 is a spatially and temporally uncorrelated gaussian process² with zero mean and variance σ². 3.1 Maximum A Posteriori Approach.

By sparsity, the authors mean the ability of the signal to be approximated by a linear combination of a small number of dictionary elements ϕ_k, as s ≈ c^T Φ.

When the noise is small and the matrix A is far from singular, WX gives a reasonable estimate of the source signals S. Taking into account equation 1.4, the authors obtain a least-squares term ‖CΦ − WX‖²_F, so the separation objective may be written as

min_{W,C} ½‖CΦ − WX‖²_F + µ Σ_{j,k} β_j h(C_jk).
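A direct NumPy transcription of this objective may make it concrete (a sketch only: the function name and the default penalty h = |·| are illustrative choices, and the paper's optimizer is not reproduced here):

```python
import numpy as np

def separation_objective(W, C, X, Phi, mu, beta, h=np.abs):
    # 0.5 * ||C Phi - W X||_F^2 + mu * sum_{j,k} beta_j * h(C_jk).
    # Equation 4.3 (quoted above) would additionally subtract
    # K * log|det W| to keep W away from singularity.
    residual = C @ Phi - W @ X                   # C Phi should match W X
    fit = 0.5 * np.sum(residual ** 2)            # squared Frobenius norm
    sparsity = mu * np.sum(beta[:, None] * h(C))
    return fit + sparsity
```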


For this reason, reliable convergence was achieved only when the search started randomly within a 10% to 20% distance of the actual solution (in order to get such an initial guess, one can use a clustering algorithm, as in Pajunen et al., 1996, or Zibulevsky et al., in press).