Support Recovery for Sparse
Deconvolution of Positive Measures
Quentin Denoyelle
∗
, Vincent Duval
†
and Gabriel Peyr´e
‡
September 1, 2016
Abstract
We study sparse spikes deconvolution over the space of Radon mea-
sures on R or T when the input measure is a finite sum of positive Dirac
masses using the BLASSO convex program. We fo c u s o n th e recovery
properties of the support and the a mp l it u d es of the initial m ea su re in the
presence of noise as a function of the minimum separation t of t h e input
measure (the minimum distance between two spikes). We show that when
w/ λ , w/t
2N −1
and λ/t
2N −1
are small enough (where λ is the regula riz a -
tion parameter, w the noise an d N the number of spikes), which corre-
sponds roughly to a sufficient signal-to-noise ratio and a noise level small
enough with respect to the minimum separation, there exists a unique
solution to the BLASSO program with exa c tl y the same number of spikes
as the original measure. We show that the amplitudes and positions of
the spikes of the solution both converge toward those of the input measure
when the noise and the regulariza t io n parameter drops to zero faster than
t
2N −1
.
1 Introduction
1.1 Super-Resolution and Spar s e Spikes Deconvolution
Super-resolution consists i n retrieving fine scale details of a possibly noisy
signal from coarse scale information. The importance of recovering the high
frequencies of a signal comes from the fact that there is often a physical blur in
the acquisition pr ocess, such as diffraction in optical systems, wave reflection in
seismic imaging or spikes recording from neuronal activity.
In resolution theory, the two-point resolution criterion defines the ability of
a system to resolve two points of equal intensities. As a point source produces
a diffraction pattern which is centered about the geometrical image point of the
∗
CEREMADE, Univ. Paris-Dauphine, denoyelle@ceremade.dauphine.fr
†
INRIA Rocquencourt, MOKAPLAN, vincent.duval@inria.fr
‡
CNRS & CEREMADE, Univ. Paris-Dauphine, gabriel.peyre@ceremade.dauphine.fr
1
arXiv:1506.08264v2 [cs.IT] 31 Aug 2016
point source, it is often admitted that two poi nts are re sol ved by a system if the
central maximum of the intensity diffraction of one poi nt source coinci de s with
the first zero of the intensity diffraction pattern of the other point. This defines
a distance that only depends on the system and which is called the Rayleigh
Length. In the case of the ideal low-pass fi lt e r (meaning that the input signal is
convolved with the Dirichlet kernel, see (9) for the exact definition) with cu t off
frequency f
c
, the Rayleigh Length is 1/f
c
. We refer to [9] for more details
about resolution the or y. Super-resol u ti on in signal processi ng thus consi st s in
developing techniques t hat enable t o retrieve information below the Rayleigh
Length.
Let us introduce in a more formal way the problem which will be the core
of this article. Let X be the real line R or the 1-D torus T = R/Z and M(X)
the Banach space of bounded Radon measur es on X, which can be seen as the
topological dual of the space C
X
where C
X
is either the space of continuous
functions on R that vanish at infinity when X = R or the space of continuous
functions on T when X = T. We consider a given integral operator Φ : M(X) →
H, where H is a separable Hilbert space, whose kernel ϕ is s up posed to be a
smooth funct i on (see Definition 1 for the technical assumptions made on ϕ), i.e.
∀m ∈ M(X), Φm =
Z
X
ϕ(x)dm(x). (1)
Φ represe nts the acquisition operator and can for instance account for a blu r in
the measurements. In the sp e ci al case of ϕ(x) = ˜ϕ(· − x), Φ is a convolution
operator. We denote by m
a
0
,tz
0
=
P
N
i=1
a
0,i
δ
tz
0,i
our input sparse spikes train
where the a
0,i
∈ R
∗
+
are the amplitudes of the Dirac masses at positions tz
0,i
∈
X. Let y
t
= Φm
a
0
,tz
0
be the noiseless observation. The parameter t > 0 controls
the minimum separation distance between the spikes, and this paper aims at
studying the recovery of m
a
0
,tz
0
from y
t
+ w (where w ∈ H is some noise) when
t is small.
1.2 From the LASSO to the BLASSO
LASSO. ℓ
1
regularization techniques were first introduced in geophysics (see
[6, 17, 20]) for seismic prospecting. Indeed, the density changes in the un-
derground can be modeled as a sparse spikes train. ℓ
1
reconstruction property
provides solutions with few non ze ro coefficients and can be solved efficiently with
convex optimization methods. Donoho theoretically studied and justified these
techniques in [11]. In statistics, the ℓ
1
norm is used in the Lasso method [23]
which consists in minimizing a quad rat i c error subject to a ℓ
1
penalization. As
the au th or s remarked it, it retains both the features of subset selection (by set-
ting to zero some co effic i ents, thanks to the property of the ℓ
1
norm to favor
sparse solutions) and ridge regression (by shrinking the other coefficients). In
signal processing, the basis pursuit metho d [22] uses t h e ℓ
1
norm to decompose
signals into overcomplete dictionaries.
2
BLASSO. Following recent works (see for instance [1, 3, 5, 8, 12]), the sparse
deconvolution method that we consider in this article operates over a continuous
domain, i.e. wit h out resorting to some sort of discretization on a grid. The in-
verse prob l em is solved over the space of Radon measu r es which is a non-reflexive
Banach space. This c ontinuous “grid free” setting makes the mathematical anal-
ysis easier and all ows us to make precise statement about the location of the
recovered spikes locations.
The te chnique that we study in this paper consists in solving a convex opti-
mization problem that uses the t ot al variation norm which is the equivalent of
the ℓ
1
norm for measures. The ℓ
1
norm is known to be particularly well fitted
for the recovery of sparse signals. Thus the use of the total variation norm
favors the emergence of spikes in the solution.
The total variation norm is defined by
∀m ∈ M(X), |m|(X)
def.
= sup
ψ∈C
X
Z
X
ψdm ; kψk
L
∞
(X)
6 1
.
In particular,
|m
a
0
,tz
0
|(X) = ka
0
k
1
,
which shows in a way that the total variation norm generalizes the ℓ
1
norm to
the continuous set t i ng of measures (i.e. no discretization grid i s required).
The first method that we are interested in is the classical basis pursuit,
defined originally in [22] in a finite dimensional setting, and written her e over
the space of Radon measures
min
m∈M(X)
{|m|(X) ; Φm = y
t
}. (P
0
(y
t
))
This is the problem studied in [5], in the case where Φ is an ideal low-pass filter
on the torus (i.e. X = T).
When th e signal is noisy, i.e. when we observe y
t
+ w instead, with w ∈ H,
we may rather consider the problem
min
m∈M(X)
1
2
kΦm − (y
t
+ w)k
2
H
+ λ|m|(X). (P
λ
(y
t
+ w) )
Here λ > 0 is a parameter that should adapted to the noise level kwk
H
. This
problem is coined “BLASSO” in [8].
While this is not the focus of this article, we note that there exist algori t h ms
to solve the infinite dimensional convex pr ob le ms (P
0
(y
t
)) and (P
λ
(y
t
+ w) ), see
for instance [3, 5].
1.3 Previous Works
LASSO/BLASSO performance anal y si s. In order to quantify the recov-
ery performan ce of the methods P
0
(y
t
) and P
λ
(y
t
+ w), the following two ques-
tions arise:
3
1. Does the solutions of P
0
(y
t
) recover the input measure m
a
0
,tz
0
?
2. How close is the solution of P
λ
(y
t
+ w) to the solution of P
0
(y
t
) ?
When the amplitudes of the spikes are arbitrary compl ex numbers, the an-
swers to the above questions require a large enough minimum separation dis-
tance ∆(tz
0
) between t he spikes where
∆(tz
0
)
def.
= min
i6=j
d
X
(tz
0,i
, tz
0,j
). (2)
where d
X
is either the canonical distance on R i.e.
∀x, y ∈ R, d
X
(x, y) = |x − y|, (3)
when X = R, or the canonical induced distance on T i.e.
∀x, y ∈ R, d
X
(x + Z, y + Z) = min
k∈Z
|x − y + k|, (4)
when X = T. The fi r st question is addressed in [5] where the authors showed,
in the case of Φ being the ideal low-pass filter on the torus (see (9)), i.e. when
H = T, that m
a
0
,tz
0
is the unique solution of P
0
(y
t
) provided that ∆(tz
0
) >
C
f
c
where C > 0 is a universal constant and f
c
the cutoff frequency of the ideal
low-pass filter. In the same paper, it is shown that C 6 2 when a
0
∈ C
N
and
C 6 1.87 when a
0
∈ R
N
. In [12], the authors show that ne ce ssar i l y C >
1
2
.
The second question receives partial answers in [14, 3, 4, 13]. In [3], it is
shown that if the solution of P
0
(y
t
) is unique then the measures recovered by
P
λ
(y
t
+ w) converge in the weak-* sense to the solution of P
0
(y
t
) when λ → 0
and kwk
H
/λ → 0. In [4], the authors measure the reconstructi on error usin g
the L
2
norm of an ideal low-pass filtered version of the recovered measures.
In [14, 13], error bounds are given on the locat i ons of the recovered spi kes with
respect to those of the input measure m
a
0
,tz
0
. However, those works provide
little information about the structure of the measures recovered by P
λ
(y
t
+ w).
That point is addressed in [12] where the authors show that under the Non
Degenerate Source Condition (see Section 2 for more details), the r e exist s a
unique solution to P
λ
(y
t
+ w) with the exact same numb e r of spikes as the
original measure provided t h at λ and kwk
H
/λ are small enough. Moreover
in that re gi me, this solution converges to the original measure when the noise
drops to zero.
LASSO/BLASSO for positive spikes. For positive spikes (i.e. a
0,i
> 0),
the picture is radically diffe re nt, since exact recovery of m
a
0
,tz
0
without noise
(i.e. (w, λ) = (0, 0)) holds for all t > 0, see for instanc e [8]. Stability con-
stants however explode as t → 0. A recent work [19] shows however that stable
recovery is obtained if the signal -t o-n oi se ratio grows f ast er than O(1/t
2N
),
closely matching optimal lower bounds of O(1/t
2N− 1
) obtained by combinato-
rial methods, as also proved recently [10] . Our main contribution is to show
that the same O(1/t
2N− 1
) signal-to-noise scaling in fact guarantees a perfect
4
support recovery of the spikes under a certain non-degeneracy condi t ion on the
filter. This extends, for positive measures, the initial results of [12] by providing
an asymptotic analysis wh en t → 0.
MUSIC and related methods. There is a large body of literature in signal
processing on spectral methods to perform spikes location from low frequency
measurements. One of the most popular methods is MUSIC (for MUltiple Signal
Classification) [21] and we refer to [15] for an overview of its use in sign al pro c es s-
ing for line spect ral estimation. In the noiseless case, exact reconstructi on of the
initial si gn al is guaranteed as long as there are enough observations compared
to the numbe r of distinct frequencies [18]. Stabi l i ty to noise is known to h old
under a minimum separation di s t anc e similar to the one of the BLASSO [18].
However, on sharp contrast with the behavior of the BLASSO, numerical obser-
vations (see for instance [7]), as well as a recent work of Demanet and Nguyen,
show that this stability continues to hold regardless of the sign of the ampli-
tudes a
0,i
, as soon as the signal-to-noise ratio scales l ike O(1/t
2N− 1
). Note that
this m at ches (when w is a Gaussian white noise) the Cramer-Rao lower bound
achievable by any unbiased es t im at or [2]. This class of methods are thus more
efficient than BLASSO for arbitrary measures, but they are restricted to oper-
ators Φ that are convolution with a low-pass filter, which is not the case of our
analysis for the BLASSO.
1.4 Contributions
Main results. From these p re v ious works, one can ask whet h er exact support
estimation by BLASSO for positive spikes is achievable when t tends to 0. Our
main result, Theorem 2, shows t hat this is indeed the case. It states, under some
non-degeneracy condition on Φ, that there exists a unique solution t o P
λ
(y
t
+w)
with the exact same number of spikes as the original measure provided that
kwk
H
/λ, kwk
H
/t
2N− 1
and λ/t
2N− 1
are small enough. Moreover we give an
upper bound, in that regime, on the error of the recovered measure with respect
to the initial measure. As a by-product, we show that the amplitudes and
positions of the spikes of the solution both converge towards those of the input
measure when the noise and the regular iz at i on parameter tend to zero faster
than t
2N− 1
.
Extensions. We consider in this article t he case where all the spikes locations
tz
0
cluster near zero. Following for instance [19], it is possible to consider a
more general model w it h several cluster points, where the sign of the Diracs
is the same around each of these points. Our analysis, although more difficult
to perform, could be e x t en de d to this set ti n g, at the pr ic e of modifying the
definition of η
W
(see Definition 3) to account for several cluster poi nts.
Lastly, if the kernel Φ under consideration has some specific scale σ (such
as the standard deviation of a Gaussian kernel, or σ = 1/f
c
for the ideal low-
pass filter in the case of the deconvolution on the torus), then it is possible to
5