LETTER Communicated by Michael Lewicki
Blind Source Separation by Sparse Decomposition in a Signal
Dictionary
Michael Zibulevsky
Department of Computer Science, University of New Mexico, Albuquerque, NM 87131,
U.S.A.
Barak A. Pearlmutter
Department of Computer Science and Department of Neurosciences, University of New Mexico, Albuquerque, NM 87131, U.S.A.
Neural Computation 13, 863–882 (2001). © 2001 Massachusetts Institute of Technology
The blind source separation problem is to extract the underlying source
signals from a set of linear mixtures, where the mixing matrix is unknown.
This situation is common in acoustics, radio, medical signal and image
processing, hyperspectral imaging, and other areas. We suggest a two-
stage separation process: a priori selection of a possibly overcomplete
signal dictionary (for instance, a wavelet frame or a learned dictionary) in
which the sources are assumed to be sparsely representable, followed by
unmixing the sources by exploiting their sparse representability. We
consider the general case of more sources than mixtures, but also derive a
more efficient algorithm in the case of a nonovercomplete dictionary and
an equal number of sources and mixtures. Experiments with artificial
signals and musical sounds demonstrate significantly better separation
than other known techniques.
1 Introduction
In blind source separation an N-channel sensor signal x(t) arises from M unknown scalar source signals s_i(t), linearly mixed together by an unknown N × M matrix A, and possibly corrupted by additive noise ξ(t),

x(t) = As(t) + ξ(t). (1.1)
We wish to estimate the mixing matrix A and the M-dimensional source
signal s(t). Many natural signals can be sparsely represented in a proper
signal dictionary:
s_i(t) = Σ_{k=1}^{K} C_ik ϕ_k(t). (1.2)
The scalar functions ϕ_k(t) are called atoms or elements of the dictionary.
These elements do not have to be linearly independent and instead may
form an overcomplete dictionary. Important examples are wavelet-related
dictionaries (e.g., wavelet packets, stationary wavelets; see Chen, Donoho,
& Saunders, 1996; Mallat, 1998) and learned dictionaries (Lewicki & Se-
jnowski, in press; Lewicki & Olshausen, 1998; Olshausen & Field, 1996,
1997). Sparsity means that only a small number of the coefficients C
ik
differ
significantly from zero.
We suggest a two-stage separation process: a priori selection of a possibly
overcomplete signal dictionary in which the sources are assumed to be
sparsely representable and then unmixing the sources by exploiting their
sparse representability.
In the discrete-time case t = 1, 2, ..., T we use matrix notation. X is an N × T matrix, with the ith component x_i(t) of the sensor signal in row i, S is an M × T matrix with the signal s_j(t) in row j, and Φ is a K × T matrix with basis function ϕ_k(t) in row k. Equations 1.1 and 1.2 then take the following simple form:

X = AS + ξ (1.3)
S = CΦ. (1.4)

Combining them, we get the following when the noise is small:

X ≈ ACΦ.

Our goal therefore can be formulated as follows: Given the sensor signal matrix X and the dictionary Φ, find a mixing matrix A and a matrix of coefficients C such that X ≈ ACΦ and C is as sparse as possible.
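As a concrete check on the dimensions involved, the factorization X ≈ ACΦ can be sketched in a few lines of NumPy; the sizes, the random dictionary, and the sparsity level below are illustrative choices, not values from the paper:

```python
import numpy as np

# Illustrative dimensions (not taken from the paper):
N, M, K, T = 2, 3, 64, 1000    # mixtures, sources, atoms, time samples

rng = np.random.default_rng(0)
Phi = rng.standard_normal((K, T))        # dictionary: atom phi_k in row k
C = np.where(rng.random((M, K)) < 0.1,   # sparse coefficients: ~10% nonzero
             rng.standard_normal((M, K)), 0.0)
A = rng.standard_normal((N, M))          # unknown N x M mixing matrix

S = C @ Phi                              # equation 1.4: S = C Phi
X = A @ S                                # equation 1.3 with noise omitted
assert X.shape == (N, T)                 # so X ≈ A C Phi
```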
We should mention other problems of sparse representation studied in the literature. The basic problem is to sparsely represent a scalar signal in a given dictionary (see Chen et al., 1996). Another problem is to adapt the dictionary to a given class of signals¹ (Lewicki & Sejnowski, 1998; Lewicki & Olshausen, 1998; Olshausen & Field, 1997). This problem is shown to be equivalent to the problem of blind source separation when the sources are sparse in time (Lee, Lewicki, Girolami, & Sejnowski, 1999; Lewicki & Sejnowski, in press). Our problem is different, but we will use and generalize some techniques presented in these works.
Independent factor analysis (Attias, 1999) and Bayesian blind source separation (Rowe, 1999) also consider the case of more sources than mixtures. In our approach, we take advantage of the sources being sparsely representable. In the extreme case, when the decomposition coefficients are very sparse, the separation becomes practically ideal (see section 3.2 and the six flutes example in Zibulevsky, Pearlmutter, Bofill, & Kisilev, in press). Nevertheless, detailed comparison of the methods on real-world signals remains open for future research.
¹ Our dictionary Φ may be obtained in this way.

In section 2 we give some motivating examples, which demonstrate how
sparsity helps to separate sources. Section 3 gives the problem formulation in a probabilistic framework and presents the maximum a posteriori approach, which is applicable to the case of more sources than mixtures.
In section 4 we derive another objective function, which provides more ro-
bust computations when there is an equal number of sources and mixtures.
Section 5 presents sequential source extraction using quadratic program-
ming with nonconvex quadratic constraints. Finally, in section 6 we derive
a faster method for nonovercomplete dictionaries and demonstrate high-
quality separation of synthetically mixed musical sounds.
2 Separation of Sparse Signals
In this section we present two examples that demonstrate how sparsity of
source signals in the time domain helps to separate them. Many real-world
signals have sparse representations in a proper signal dictionary but not in
the time domain. The intuition here carries over to that situation, as shown
in section 3.1.
2.1 Example: Two Sources and Two Mixtures. Two synthetic sources
are shown in Figures 1a and 1b. The first source has two nonzero samples,
and the second has three. The mixtures, shown in Figures 1c and 1d, are less
sparse: they have five nonzero samples each. One can use this observation
to recover the sources. For example, we can express one of the sources as
s̃_i(t) = x_1(t) + µ x_2(t)

and choose µ so as to minimize the number of nonzero samples ‖s̃_i‖_0, that is, the ℓ_0 norm of s̃_i.
This objective function yields perfect separation. As shown in Figure 2a,
when µ is not optimal, the second source interferes, and the total number
of nonzero samples remains five. Only when the first source is recovered
perfectly, as in Figure 2b, does the number of nonzero samples drop to two
and the objective function achieve its minimum.
Note that the function ‖s̃_i‖_0 is discontinuous and may be difficult to optimize. It is also very sensitive to noise: even a tiny bit of noise would make all the samples nonzero. Fortunately, in many cases the ℓ_1 norm ‖s̃_i‖_1 is a good substitute for this objective function. In this example, it too yields perfect separation.
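The µ-scan of this example is easy to reproduce in a few lines; the sources below mimic Figure 1, the mixing matrix is an arbitrary illustrative choice, and a grid search stands in for a proper line search:

```python
import numpy as np

# Two sparse sources (cf. Figure 1) and two mixtures; the mixing
# matrix is an arbitrary illustrative choice, not taken from the paper.
s = np.zeros((2, 10))
s[0, [2, 7]] = [1.0, -0.8]                 # first source: two nonzero samples
s[1, [1, 4, 8]] = [0.5, 1.2, -0.6]         # second source: three nonzero samples
A = np.array([[1.0, 0.7],
              [0.4, 1.0]])
x = A @ s                                  # mixtures: five nonzero samples each

# Scan mu and keep the value minimizing the l1 norm of x1 + mu*x2.
mus = np.linspace(-3.0, 3.0, 6001)
l1 = [np.abs(x[0] + mu * x[1]).sum() for mu in mus]
mu_best = mus[int(np.argmin(l1))]
s_est = x[0] + mu_best * x[1]              # one source, up to scaling
print(mu_best)                             # ~ -0.7: cancels source 2 for this A
print(np.count_nonzero(np.round(s_est, 6)))  # back to two nonzero samples
```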
2.2 Example: Three Sources and Two Mixtures. The signals are pre-
sented in Figure 3. These sources have about 10% nonzero samples. The
nonzero samples have random positions and are zero-mean unit-variance
gaussian distributed in amplitude. Figure 4 shows a scatter plot of the mixtures. The directions of the columns of the mixing matrix are clearly visible.

Figure 1: Coefficients of signals, with coefficient identity on the x-axis (10 coefficients, arbitrarily ordered) and magnitude on the y-axis (arbitrarily scaled). Sources (a and b) are sparse. Mixtures (c and d) are less sparse.

Figure 2: Coefficients of signals, with coefficient identity on the x-axis (10 coefficients, arbitrarily ordered) and magnitude on the y-axis (arbitrarily scaled). (a) Imperfect separation. Since the second source is not completely removed, the total number of nonzero samples remains five. (b) Perfect separation. When the source is recovered perfectly, the number of nonzero samples drops to two, and the objective function achieves its minimum.
This phenomenon can be used in clustering approaches to source separa-
tion (Pajunen, Hyvärinen, & Karhunen, 1996; Zibulevsky et al., in press). In
this work we will explore a maximum a posteriori approach.
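A sketch of how data like that of this example might be generated, and of the geometry behind the scatter plot; the mixing matrix, seed, and sparsity level are illustrative choices matching the description above:

```python
import numpy as np

rng = np.random.default_rng(0)
M, T = 3, 300
mask = rng.random((M, T)) < 0.1            # ~10% nonzero, random positions
S = np.where(mask, rng.standard_normal((M, T)), 0.0)
A = rng.standard_normal((2, M))            # two mixtures of three sources
X = A @ S

# At samples where only source j happens to be active, the column
# X[:, t] is proportional to A[:, j]; a scatter plot of X[0] versus
# X[1] therefore shows the columns of A as dense rays through the
# origin, which is what clustering approaches to separation exploit.
angles = np.arctan2(X[1], X[0])            # cluster these to estimate A
```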

Figure 3: Coefficients of signals, with coefficient identity on the x-axis (300 coefficients, arbitrarily ordered) and magnitude on the y-axis (arbitrarily scaled). (Top three panels) Sparse sources (sparsity is 10%). (Bottom two panels) Mixtures.
3 Probabilistic Framework
In order to derive a maximum a posteriori solution, we consider the blind source separation problem in a probabilistic framework (Belouchrani & Cardoso, 1995; Pearlmutter & Parra, 1996). Suppose that the coefficients C_ik in a source decomposition (see equation 1.4) are independent random variables with a probability density function (pdf) of an exponential type,

p_i(C_ik) ∝ exp(−β_i h(C_ik)). (3.1)
This kind of distribution is widely used for modeling sparsity (Lewicki &
Sejnowski, in press; Olshausen & Field, 1997). A reasonable choice of h(c)
may be
h(c) = |c|^{1/γ}, γ ≥ 1. (3.2)
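Assuming this reconstruction of equation 3.2 is right, the penalty is easy to probe numerically: γ = 1 recovers the ℓ_1 penalty |c|, while larger γ makes the penalty more concave and hence more strongly sparsity promoting. A minimal sketch:

```python
import numpy as np

def h(c, gamma=2.0):
    # h(c) = |c|**(1/gamma) with gamma >= 1 (equation 3.2 as read above)
    return np.abs(c) ** (1.0 / gamma)

c = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(h(c, gamma=1.0))   # [2.  0.5 0.  0.5 2. ]  -- the l1 penalty |c|
print(h(c, gamma=2.0))   # sqrt(|c|): rises steeply near zero, favoring
                         # coefficients that are exactly zero
```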

References

Chen, S. S., Donoho, D. L., & Saunders, M. A. (1996). Atomic decomposition by basis pursuit.

Lee, D. D., & Seung, H. S. (1999). Learning the parts of objects by non-negative matrix factorization. Nature, 401, 788–791.

Mallat, S. (1998). A wavelet tour of signal processing. San Diego: Academic Press.
Frequently Asked Questions (15)
Q1. What contributions have the authors mentioned in the paper "Blind source separation by sparse decomposition in a signal dictionary" ?

The authors suggest a two-stage separation process: a priori selection of a possibly overcomplete signal dictionary (for instance, a wavelet frame or a learned dictionary) in which the sources are assumed to be sparsely representable, followed by unmixing the sources by exploiting their sparse representability. The authors consider the general case of more sources than mixtures, but also derive a more efficient algorithm in the case of a nonovercomplete dictionary and an equal number of sources and mixtures.

It would be interesting to compare these possibilities to the other methods presented in this article. In this case, the matrices A and W will have linear filters as elements, and multiplication by an element corresponds to convolution.

The main difficulty in a maximization problem like equation 3.9 is the bilinear term ACΦ, which destroys the convexity of the objective function and makes convergence unstable when optimization starts far from the solution.

The easiest way to perform sparse decomposition of such sources is to compute a spectrogram, the coefficients of a time-windowed discrete Fourier transform. 


Considering the mixing matrix A as a parameter, the authors can estimate it by maximizing the probability of the observed signal X:

max_A P(X|A), where P(X|A) = ∫ P(X|A,C) P(C) dC.


Another possibility for ensuring the nonsingularity of W is to subtract K log|det W| from the objective:

min_{W,C} −K log|det W| + ½‖CΦ − WX‖²_F + µ Σ_{j,k} β_j h(C_jk), (4.3)

which (Bell & Sejnowski, 1995; Pearlmutter & Parra, 1996) can be viewed as a maximum likelihood term.

The first approach is to force each row A_i of the mixing matrix A to be bounded in norm,

‖A_i‖ ≤ 1, i = 1, ..., N. (3.10)

The second way is to restrict the norm of the rows

we have

P(X|A,C) ∝ ∏_{i,t} exp(−(X_it − (ACΦ)_it)² / 2σ²). (3.6)

By the independence of the coefficients C_jk and equation 3.1, the prior pdf of C is

P(C) ∝ ∏_{j,k} exp(−β_j h(C_jk)).

The authors also suppose a priori that the mixing matrix A is uniformly distributed over the range of interest and that the noise ξ(t) in equation 1.3 is a spatially and temporally uncorrelated gaussian process² with zero mean and variance σ². 3.1 Maximum A Posteriori Approach.

By sparsity, the authors mean the ability of the signal to be approximated by a linear combination of a small number of dictionary elements ϕ_k, as s ≈ c^T Φ.

When the noise is small and the matrix A is far from singular, WX gives a reasonable estimate of the source signals S. Taking into account equation 1.4, the authors obtain a least-squares term ‖CΦ − WX‖²_F, so the separation objective may be written as

min_{W,C} ½‖CΦ − WX‖²_F + µ Σ_{j,k} β_j h(C_jk).
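A direct NumPy transcription of this objective may make it concrete (a sketch only: the function name and the default penalty h = |·| are illustrative choices, and the paper's optimizer is not reproduced here):

```python
import numpy as np

def separation_objective(W, C, X, Phi, mu, beta, h=np.abs):
    # 0.5 * ||C Phi - W X||_F^2 + mu * sum_{j,k} beta_j * h(C_jk).
    # Equation 4.3 (quoted above) would additionally subtract
    # K * log|det W| to keep W away from singularity.
    residual = C @ Phi - W @ X                   # C Phi should match W X
    fit = 0.5 * np.sum(residual ** 2)            # squared Frobenius norm
    sparsity = mu * np.sum(beta[:, None] * h(C))
    return fit + sparsity
```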


For this reason, reliable convergence was achieved only when the search started randomly within a 10% to 20% distance of the actual solution (in order to get such an initial guess, one can use a clustering algorithm, as in Pajunen et al., 1996, or Zibulevsky et al., in press).