
HAL Id: inria-00544945
https://hal.inria.fr/inria-00544945
Submitted on 11 Dec 2010
On the exponential convergence of Matching Pursuits in quasi-incoherent dictionaries
Rémi Gribonval, Pierre Vandergheynst

To cite this version:
Rémi Gribonval, Pierre Vandergheynst. On the exponential convergence of Matching Pursuits in quasi-incoherent dictionaries. IEEE Transactions on Information Theory, Institute of Electrical and Electronics Engineers, 2006, 52 (1), pp. 255–261. DOI: 10.1109/TIT.2005.860474. HAL: inria-00544945.

IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 52, NO. 1, JANUARY 2006
On the Exponential Convergence of Matching Pursuits in Quasi-Incoherent Dictionaries

Rémi Gribonval, Member, IEEE, and Pierre Vandergheynst, Member, IEEE

Abstract—The purpose of this correspondence is to extend results by Villemoes and Temlyakov about exponential convergence of Matching Pursuit (MP) with some structured dictionaries for "simple" functions in finite or infinite dimension. The results are based on an extension of Tropp's results about Orthogonal Matching Pursuit (OMP) in finite dimension, with the observation that the argument works not only for OMP but also for MP. The main contribution is a detailed analysis of the approximation and stability properties of MP with quasi-incoherent dictionaries, and a bound on the number of steps sufficient to reach an error no larger than a penalization factor times the best $m$-term approximation error.

Index Terms—Dictionary, greedy algorithm, matching pursuit (MP), nonlinear approximation, sparse representation.
I. INTRODUCTION

In a Hilbert space $\mathcal{H}$ of finite or infinite dimension, we consider the problem of getting $m$-term approximants of a function $f$ from a possibly redundant dictionary $\mathcal{D} = \{g_k,\ k \in \Gamma\}$ of unit-norm basis functions, also called atoms. It will often be convenient to see a dictionary as a synthesis operator (or, in finite dimension, as a matrix)

$$\mathbf{D}:\ \mathbf{c} = (c_k) \mapsto \mathbf{D}\mathbf{c} = \sum_k c_k g_k$$

that maps sequences to vectors in $\mathcal{H}$. A special class of dictionaries that is widely used in signal and image processing is the family of frames: a dictionary $\mathcal{D}$ is a frame for $\mathcal{H}$ if, and only if, $\mathbf{D}$ is a bounded operator from $\ell^2$ onto $\mathcal{H}$ [2]. However, in this correspondence, we consider dictionaries that may not be frames, hence $\mathbf{D}$ shall be defined essentially on sequences $\mathbf{c}$ with a finite number of nonzero entries. For any index set $I$ (not necessarily finite) we will also consider the restricted synthesis operator

$$\mathbf{D}_I:\ \mathbf{c} \mapsto \mathbf{D}_I\mathbf{c} = \sum_{k \in I} c_k g_k$$

that corresponds to the subset $\mathcal{D}_I = \{g_k,\ k \in I\}$ of the full dictionary.

Manuscript received April 9, 2004; revised October 13, 2005.
R. Gribonval is with IRISA-INRIA, Campus de Beaulieu, 35042 Rennes Cedex, France (e-mail: remi.gribonval@irisa.fr).
P. Vandergheynst is with the Signal Processing Institute, the Swiss Federal Institute of Technology (EPFL), CH-1015 Lausanne, Switzerland (e-mail: pierre.vandergheynst@epfl.ch).
Communicated by G. Battail, Associate Editor At Large.
Digital Object Identifier 10.1109/TIT.2005.860474
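As a concrete illustration (not from the paper), the synthesis operator and its restriction translate directly into a few lines of numpy, with the dictionary stored as a matrix whose unit-norm columns are the atoms. The toy dictionary below is an assumption chosen for the example:

```python
import numpy as np

# A toy dictionary in H = R^4: the canonical (Dirac) basis plus one extra
# normalized "flat" atom. Columns are unit-norm atoms g_k, so the synthesis
# operator maps a coefficient sequence c to D @ c = sum_k c_k g_k.
N = 4
diracs = np.eye(N)
extra = np.ones((N, 1)) / np.sqrt(N)          # one extra flat atom
D = np.hstack([diracs, extra])                # shape (N, 5): D as a matrix

c = np.array([1.0, 0.0, -2.0, 0.0, 3.0])      # a finitely supported sequence
f = D @ c                                     # synthesized vector in H

# Restricted synthesis operator D_I: keep only the columns indexed by I.
I = [0, 4]
D_I = D[:, I]
f_I = D_I @ c[I]
```

In finite dimension this matrix view is all that is needed to experiment with the Pursuits discussed below.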
When $\mathcal{D}$ is an orthonormal basis for $\mathcal{H}$, it is well known how to get the best $m$-term approximant to any $f$: the solution is to keep the $m$ atoms of the basis which have the largest inner products $|\langle f, g_k\rangle|$ with $f$. However, for arbitrary redundant dictionaries, the problem becomes NP-hard [3]. In recent years, much effort has been made to understand what structure should be imposed on $f$ (for a given dictionary) or on the dictionary itself so that good approximants can be obtained with computationally feasible algorithms.
One of the first algorithms that appeared in the signal processing community for approximating signals from a redundant dictionary was the Matching Pursuit (MP) algorithm of Mallat and Zhang [25], which iteratively decomposes the analyzed function $f$ into an $m$-term approximant $f_m = \sum_{n=1}^{m} \alpha_n g_{k_n}$ and a residual $r_m = f - f_m$. MP is also known as Projection Pursuit in the statistics community [10], [22] and as a Pure Greedy Algorithm [27] in the approximation community. In finite dimension, MP is known to converge exponentially, i.e., for some $0 < \gamma < 1$,

$$\|r_m\|^2 = \|f_m - f\|^2 \le \gamma^m \cdot \|f\|^2, \quad m \ge 1.$$

In infinite-dimensional Hilbert spaces, Jones [24] proved that MP is still convergent, i.e., $\|f_m - f\| \to 0$, but gave no estimate of the speed of convergence. DeVore and Temlyakov [4] exhibited a "bad" dictionary $\mathcal{D}$ where there exists a "simple" function (sum of two dictionary elements) for which MP gives "bad" approximations (i.e., with a slow convergence $\|f_m - f\| \ge C m^{-1/2}$). On the positive side, Villemoes [30] showed that for Walsh wavelet packets, MP on "simple" functions ($f = c_i g_i + c_j g_j$, any sum of any two wavelet packets) was exponentially convergent (just as MP in finite dimension), with $\|f_m - f\|^2 \le (3/4)^m \|f\|^2$. Temlyakov obtained similar results [26]: in particular, for $f$ a function on the interval $[0, 1)$ taking constant values on a partition of $[0, 1)$ into $n$ disjoint intervals, and $\mathcal{D}$ a highly redundant dictionary containing all (normalized) characteristic functions of intervals $I \subset [0, 1)$,

$$\|f_m - f\|^2 \le (1 - 1/n)^{m/2} \|f\|^2.$$
In this correspondence, we extend Villemoes's and Temlyakov's results about MP to more general dictionaries and "simple" functions, as stated in the following featured theorem.

Featured Theorem 1: Let $\mathcal{D}$ be a dictionary in a finite- or infinite-dimensional Hilbert space and $I$ an index set such that the stability condition (SC)

$$\mu(I) := \sup_{k \notin I} \|(\mathbf{D}_I)^\dagger g_k\|_1 < 1 \quad (1)$$

is met, where $(\cdot)^\dagger$ denotes pseudoinversion.$^1$ Then, for any $f = \sum_{k \in I} c_k g_k \in \mathrm{span}(g_k,\ k \in I)$, MP

1) picks up only correct atoms at each step: $\forall n,\ k_n \in I$;
2) if $I$ is a finite set, then the residual $r_m$ converges exponentially to zero.

The stability condition (1) may look fairly abstract, but for so-called quasi-incoherent dictionaries, one can obtain more explicit sufficient conditions [28]. For such dictionaries, we derive estimates of the rate of exponential convergence of MP, and we obtain the following featured theorem.

$^1$ Basic reminders on pseudoinversion are given in Section II-E.
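For a finite dictionary, the stability constant in (1) is directly computable. The numpy sketch below (a hypothetical toy setup, not from the paper) evaluates $\mu(I)$ for an orthonormal basis augmented with one flat atom:

```python
import numpy as np

def stability_constant(D, I):
    """mu(I) = sup_{k not in I} ||pinv(D_I) g_k||_1, the quantity in (1)."""
    I = list(I)
    pinv_DI = np.linalg.pinv(D[:, I])                 # card(I)-by-N matrix
    others = [k for k in range(D.shape[1]) if k not in I]
    return max(np.abs(pinv_DI @ D[:, k]).sum() for k in others)

# Orthonormal basis plus one flat atom: here D_I is orthonormal, so
# pinv(D_I) = D_I^T and mu(I) = max_k sum_{l in I} |<g_l, g_k>|.
N = 16
D = np.hstack([np.eye(N), np.ones((N, 1)) / np.sqrt(N)])
I = [0, 1, 2]
mu_I = stability_constant(D, I)
# each |<e_l, flat atom>| = 1/sqrt(N), summed over the 3 atoms of I:
# mu(I) = 3/sqrt(16) = 0.75 < 1, so (1) holds for this I
```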
Featured Theorem 2: Let $\mathcal{D}$ be a dictionary in a finite- or infinite-dimensional Hilbert space and let $\mu := \max_{k \ne l} |\langle g_k, g_l\rangle|$ be its coherence. For any finite index set $I$ of size $\mathrm{card}(I) = m < (1 + 1/\mu)/2$ and any $f = \sum_{k \in I} c_k g_k \in \mathrm{span}(g_k,\ k \in I)$, MP

1) picks up only correct atoms at each step: $\forall n,\ k_n \in I$;
2) converges exponentially:

$$\|f_n - f\|^2 \le \big((1 - 1/m)(1 + \mu)\big)^n \,\|f\|^2.$$
The proof of this theorem is based on an argument given by Tropp [28], where the condition (1) is called the "Exact Recovery Condition" (ERC) because it ensures that Orthogonal Matching Pursuit (OMP) and Basis Pursuit (BP) exactly recover any $f = \sum_{k \in I} c_k g_k \in \mathrm{span}(g_k,\ k \in I)$. We have chosen to rename the ERC a "stability condition." Indeed, for MP one cannot strictly speak about recovery; however, the theorem is definitely a stability result, since all residuals remain in the subspace $\mathrm{span}(g_k,\ k \in I) \subset \mathcal{H}$. Tropp's result was the last of a series of "recovery" results: first with the BP "algorithm"—which was introduced [1] as an alternative to MP since the latter cannot resolve close atoms—under some assumptions on both the analyzed function and the dictionary [6]–[9], [20], [19]; then with variants of the MP [12], [13]. After the first draft of this manuscript was submitted for publication, it came to our attention that Donoho, Elad, and Temlyakov also consider stability and recovery properties of MP in incoherent dictionaries [5]. We discuss in more detail in Section V how our results are connected to other approaches.
The previous theorems only explain the behavior of MP on exact expansions, i.e., they require that the approximated function $f$ be exactly expressed as an expansion from a "good" set of atoms. However, real signals or images almost never have such a simple expansion in practical dictionaries. Fortunately, just as for OMP [28], the analysis of MP as an approximation algorithm can be carried out by taking into account how well a function is approximated by an expansion from a good set of atoms. In particular, our results lead to the following theorem (with the notations of Featured Theorem 2).
Featured Theorem 3: Let $\{f_n\}$ be a sequence of approximants to $f \in \mathcal{H}$ produced with MP, with $g_{k_n}$ the corresponding atoms. Let $m < (1 + 1/\mu)/4$ and let $f^\star_m = \sum_{k \in I^\star_m} c_k g_k$ be a best $m$-term approximant to $f$ from $\mathcal{D}$, i.e.,

$$\|f^\star_m - f\| = \sigma_m(f) := \inf\big\{\|f - \mathbf{D}_I \mathbf{c}\|,\ \mathrm{card}(I) \le m\big\}.$$

Then, there is a number $N_m$ such that

1) the error after $N_m$ steps of MP satisfies $\|f_{N_m} - f\| \le \sqrt{1 + 4m}\;\sigma_m$;
2) during the first $N_m$ steps, MP picks up atoms from the best $m$-term approximant: $k_n \in I^\star_m$;
3) if $\sigma_{2m}$ is small enough relative to $\sigma_m$, then $N_m$ is bounded explicitly in terms of $m$ and the ratio $\sigma_m/\sigma_{2m}$, through a bound of the form $N_m \le 2 + C\, m \ln(\sigma_m/\sigma_{2m})$ (the precise condition and constant are given in Section IV).
In the course of this correspondence, we actually prove slightly more general results (Theorems 1–4) and particularize them to get our featured results (Featured Theorems 1–3). The structure of this correspondence is as follows. In Section II, we recall the definition of MP and several variants thereof, and prove the stability result (Featured Theorem 1). In Section III, we particularize this result to a special class of dictionaries, quasi-incoherent dictionaries. This allows us to obtain constraints on the dictionary so that the stability condition is met, and we also give estimates on the rate of convergence of MP in these cases (Featured Theorem 2). Finally, in Section IV, we explore the approximation properties of various flavors of MP. In particular, we show that greedy algorithms may robustly select atoms participating in a near-best $m$-term approximation, and give the resulting approximation bounds (Featured Theorem 3).

The proof of Featured Theorem 1 is merely a rewriting of Tropp's proof, with the observation that it works not only for OMP but also for MP. Thus, the main contribution of this correspondence is in the study of the approximation and stability properties of greedy algorithms with quasi-incoherent dictionaries.
II. MATCHING PURSUIT(S) ON "SIMPLE" EXPANSIONS

In this section, we first recall the definition of MP and several variants thereof, then we prove the stability of all these variants, in the sense of Featured Theorem 1.

A. Matching Pursuit (MP)

MP is an iterative algorithm that builds $n$-term approximants $f_n$ and residuals $r_n = f - f_n$ by adding one term at a time to the approximant. It works as follows. At the beginning, we set $f_0 = 0$ and $r_0 = f$; assuming $f_n$ and $r_n$ are defined, we select an atom $g_{k_n}$ such that

$$|\langle r_n, g_{k_n}\rangle| = \sup_k |\langle r_n, g_k\rangle| \quad (2)$$

set

$$f_{n+1} = f_n + \langle r_n, g_{k_n}\rangle\, g_{k_n} \quad (3)$$

and compute a new residual as $r_{n+1} = f - f_{n+1}$.
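For a finite real dictionary stored as a matrix of unit-norm columns, the selection rule (2) and the update (3) translate directly into code. This is a minimal sketch assuming real-valued data (complex dictionaries would need conjugated inner products):

```python
import numpy as np

def matching_pursuit(f, D, n_steps):
    """Plain MP: greedy selection (2) followed by the rank-one update (3)."""
    approx = np.zeros_like(f)
    residual = f.copy()
    picked = []
    for _ in range(n_steps):
        corr = D.T @ residual                 # <r_n, g_k> for every atom
        k = int(np.argmax(np.abs(corr)))      # selection rule (2)
        approx = approx + corr[k] * D[:, k]   # update rule (3)
        residual = f - approx
        picked.append(k)
    return approx, residual, picked
```

On an orthonormal basis, this recovers any vector exactly in as many steps as it has nonzero coefficients; on redundant dictionaries it generally only converges, as discussed above.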
B. Weak MPs

When the dictionary is infinite, the supremum in (2) may not be attained, so one may have to consider the so-called weak selection rule

$$|\langle r_n, g_{k_n}\rangle| \ge \alpha \sup_k |\langle r_n, g_k\rangle| \quad (4)$$

with some fixed $0 < \alpha \le 1$ independent of $n$. Corresponding variants of MP will be called Weak MP with weakness parameter $\alpha$, or in short Weak($\alpha$) MP, or even Weak MP when the value of $\alpha$ does not need to be specified.
C. Orthogonal MP (OMP)

Moreover, once $m$ atoms have been selected, the approximant

$$f_m = \sum_{n=0}^{m-1} \langle r_n, g_{k_n}\rangle\, g_{k_n}$$

is generally not the best approximant to $f$ from the finite-dimensional subspace $V_m := \mathrm{span}(g_{k_0}, \ldots, g_{k_{m-1}})$. OMP, respectively Weak($\alpha$) OMP, replaces the update rule (3) with

$$f_{n+1} = P_{V_{n+1}} f \quad (5)$$

where $P_V$ is the orthogonal projector onto the finite-dimensional subspace $V$.
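A sketch of the orthogonal variant: the selection step is unchanged, but the update (5) re-solves a least-squares problem over all atoms picked so far (same assumptions as before: a finite real dictionary stored as a matrix of unit-norm columns):

```python
import numpy as np

def orthogonal_matching_pursuit(f, D, n_steps):
    """OMP: selection as in (2), but update (5) re-projects f onto the
    span of all atoms selected so far."""
    approx = np.zeros_like(f)
    residual = f.copy()
    picked = []
    for _ in range(n_steps):
        k = int(np.argmax(np.abs(D.T @ residual)))
        if k not in picked:
            picked.append(k)
        D_sel = D[:, picked]
        coeffs, *_ = np.linalg.lstsq(D_sel, f, rcond=None)
        approx = D_sel @ coeffs               # P_{V_{n+1}} f
        residual = f - approx
    return approx, residual, picked
```

The least-squares solve makes each residual orthogonal to every previously selected atom, which is why OMP never re-selects an atom.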
D. General MPs

More generally, one can consider the family of approximation algorithms based on the repeated application of two steps:

1) a (weak) selection step according to (4);
2) an update step where a new approximant $f_{n+1} \in V_{n+1}$ is chosen.

Algorithms from this larger family will be called General MP, Weak($\alpha$) General MP, or Weak General MP. Examples of Weak($\alpha$) General MP algorithms include the High Resolution Pursuits [23], [16], which were introduced to attenuate the lack of resolution of plain MP with time–frequency dictionaries in the time domain.
E. Stability of Weak($\alpha$) General MP

The major result of Tropp [28] is what he calls the "Exact Recovery Condition"

$$\mu(I) := \sup_{k \notin I} \|(\mathbf{D}_I)^\dagger g_k\|_1 < \alpha \quad (6)$$

(where $(\cdot)^\dagger$ denotes pseudoinversion, see below): when the Exact Recovery Condition is met, Weak($\alpha$) OMP "exactly recovers" any linear combination of atoms from the subdictionary $\mathcal{D}_I$, which means that Weak($\alpha$) OMP can only pick up correct atoms at each step. Tropp's proof indeed works for Weak($\alpha$) General MP, with the only difference that we no longer get exact recovery but only stability of the Pursuit, as stated in the following theorem.

Theorem 1: Let $I$ be an index set (finite or infinite) with $\mu(I) < 1$. For any $f = \sum_{k \in I} c_k g_k$ and $\alpha > \mu(I)$, Weak($\alpha$) General MP picks up a correct atom at each step, i.e., for all $n \ge 1$, $k_n \in I$.

Note that we do not assume that the elements $\{g_k\}_{k \in I}$ are linearly independent for the result to hold true. Before giving the proof of the theorem, let us give a quick reminder on the notion of pseudoinverse. Most of this material can be found in the usual suspects [14], [21].
Let $A$ be a linear operator and let $\mathrm{Range}\,A$ be its range. The pseudoinverse $A^\dagger$ is the left inverse that is zero on $(\mathrm{Range}\,A)^\perp$. It is also the left inverse of minimal sup norm. In the case of general $p \times q$ matrices, we will make use of the Moore–Penrose pseudoinverse. It is the unique $q \times p$ matrix that satisfies the following properties:

$$A A^\dagger A = A, \quad A^\dagger A A^\dagger = A^\dagger, \quad (A A^\dagger)^* = A A^\dagger, \quad (A^\dagger A)^* = A^\dagger A$$

where $(\cdot)^*$ denotes the adjoint. In particular, $A A^\dagger$ is the orthogonal projection onto $\mathrm{Range}\,A$. If the inverse of $A^* A$ exists, the Moore–Penrose pseudoinverse can simply be written

$$A^\dagger = (A^* A)^{-1} A^*.$$
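These properties are easy to check numerically; the sketch below (an illustration, not part of the paper) verifies the four identities, the projection property, and the explicit formula on a random full-column-rank matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))       # a generic p-by-q matrix (p=5, q=3)
A_pinv = np.linalg.pinv(A)            # the q-by-p Moore-Penrose pseudoinverse

P = A @ A_pinv                        # orthogonal projection onto Range(A)

# A random Gaussian matrix has full column rank almost surely, so the
# explicit formula (A* A)^{-1} A* applies here:
explicit = np.linalg.inv(A.T @ A) @ A.T
```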
Proof of Theorem 1: Just as in the proof of exactness of OMP by Tropp (which is a special case), we can show by induction that at each step MP picks up an atom $k_n \in I$, so the residual $r_n$ remains in the finite-dimensional space $V_I = \mathrm{span}(g_k,\ k \in I)$. Initially, we have by assumption $r_0 = f \in V_I$. Assuming that $r_n \in V_I$, we notice that the inner products $\{\langle r_n, g_k\rangle\}_{k \in I}$ between $r_n$ and $\{g_k,\ k \in I\}$ are listed in the vector $\mathbf{D}_I^* r_n$, while those with $\{g_k,\ k \notin I\}$ are listed in $\mathbf{D}_{\bar I}^* r_n$, where $\bar I = \{k,\ k \notin I\}$. Thus, the selected atom $g_{k_{n+1}}$ is a correct one (i.e., $k_{n+1} \in I$) if

$$\rho(I, r_n) := \frac{\|\mathbf{D}_{\bar I}^* r_n\|_\infty}{\|\mathbf{D}_I^* r_n\|_\infty} < \alpha.$$

The core of the proof of [28, Theorem 3.1] yields $\rho(I, r_n) \le \mu(I)$. From the assumption $\mu(I) < \alpha$, we can infer that $k_{n+1} \in I$ and $r_{n+1} \in V_I$, and we get the theorem.
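The key inequality of the proof, namely that the selection ratio never exceeds $\mu(I)$ for residuals in $V_I$, can be probed numerically; the toy dictionary below is an assumption chosen for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 16
D = np.hstack([np.eye(N), np.ones((N, 1)) / np.sqrt(N)])  # unit-norm atoms
I = [0, 1, 2]
others = [k for k in range(D.shape[1]) if k not in I]

# mu(I) = sup_{k not in I} ||pinv(D_I) g_k||_1, as in (6)
mu_I = max(np.abs(np.linalg.pinv(D[:, I]) @ D[:, k]).sum() for k in others)

# draw random residuals in V_I and record the worst selection ratio
worst = 0.0
for _ in range(100):
    r = D[:, I] @ rng.standard_normal(len(I))           # r in V_I
    ratio = np.abs(D[:, others].T @ r).max() / np.abs(D[:, I].T @ r).max()
    worst = max(worst, ratio)
```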
F. Recovery and Convergence

Suppose that the analyzed function $f$ belongs to $\mathrm{span}(g_k,\ k \in I)$ where $I$ satisfies $\mu(I) < 1$, and that we perform some Weak($\alpha$) General MP with $\alpha > \mu(I)$: Theorem 1 states that the Pursuit will only pick up correct atoms.

In the particular case of an Orthogonal Pursuit, since each residual $r_n$ is orthogonal to the previously selected atoms $g_{k_0}, \ldots, g_{k_{n-1}}$, any atom can only be picked up once by the Pursuit. As a result, if in addition $I$ is a finite set of cardinality $m$, the Orthogonal Pursuit exactly recovers $f$ in $m$ iterations: this is the main result formalized by Tropp, and already present—though not with such a clear statement—in the results of Gilbert et al. [12], [13].

If the Pursuit we are performing on $f$ is not orthogonal, it is known that convergence does not generally occur in a finite number of steps. However, if $I$ is a finite set, the stability condition implies that the Pursuit is actually performed in the finite-dimensional space $V_I$. In the case of Weak MP, it follows [25] that we have exponential convergence, just as stated in Featured Theorem 1. In the next section, we provide some tools to estimate the rate of this convergence, and it will turn out that they also make it possible to estimate the speed of convergence of (Weak) OMP.
III. MP IN QUASI-INCOHERENT DICTIONARIES

In the previous section, we have given fairly abstract conditions to ensure stability of Weak General MP, exact recovery with Weak OMP, and exponential convergence of Weak MP toward the approximated function. However, the quantity $\mu(I)$ that appears in the stability condition (6) is not very explicit, and we did not yet provide estimates for the rate of exponential convergence.

In this section, we will show that we can use the so-called cumulative coherence function$^2$ of the dictionary to estimate $\mu(I)$—and check the Stability Condition—as well as the rate of exponential convergence of plain MP.
A. Cumulative Coherence Function and Coherence

Definition 1: Let $\mathcal{D}$ be a dictionary. Its cumulative coherence function is defined for each integer $m \ge 1$ as

$$\mu_1(m) := \max_{I:\ \mathrm{card}(I) \le m}\ \max_{k \notin I}\ \sum_{l \in I} |\langle g_l, g_k\rangle|. \quad (7)$$

As a special case, for $m = 1$, the value of the cumulative coherence function is the so-called coherence of the dictionary

$$\mu = \mu_1(1) = \max_{k \ne l} |\langle g_l, g_k\rangle|. \quad (8)$$

One easily observes that the cumulative coherence function is subadditive,

$$\mu_1(k + l) \le \mu_1(k) + \mu_1(l), \quad \forall k, l$$

hence we have $\mu_1(m) \le \mu\, m$, $m \ge 1$. A dictionary is called incoherent if $\mu$ is small: typically, in finite dimension $N$, any dictionary that strictly contains an orthonormal basis has coherence $\mu \ge 1/\sqrt{N}$. The union of the Dirac and the Fourier bases is an incoherent dictionary where indeed $\mu = 1/\sqrt{N}$ and $\mu_1(m) = \mu\, m$. When the cumulative coherence function grows no faster than $\mu\, m$, we say that the dictionary is quasi-incoherent.
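Definition (7) can be evaluated by brute force from the Gram matrix: for each atom $k$, sum its $m$ largest off-diagonal correlations and maximize over $k$. A sketch for the Dirac–Fourier example (the function name is ours, not the paper's):

```python
import numpy as np

def coherence_profile(D, m_max):
    """mu_1(m) for m = 1..m_max: for each atom k, sum the m largest
    off-diagonal correlations |<g_l, g_k>| and maximize over k."""
    G = np.abs(D.conj().T @ D)
    np.fill_diagonal(G, 0.0)
    G_sorted = -np.sort(-G, axis=1)          # each row in descending order
    return [G_sorted[:, :m].sum(axis=1).max() for m in range(1, m_max + 1)]

N = 8
dirac = np.eye(N)
fourier = np.fft.fft(np.eye(N)) / np.sqrt(N)   # unit-norm Fourier atoms
D = np.hstack([dirac, fourier])                # the Dirac + Fourier dictionary

profile = coherence_profile(D, 4)
# coherence mu = mu_1(1) = 1/sqrt(N), and mu_1(m) = m/sqrt(N) as in the text
```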
B. Explicit Stability Condition and Rate of Convergence

Using Neumann series, Tropp proved that whenever $I$ is of size $m$ such that $\mu_1(m-1) < 1$, we have the upper bound

$$\mu(I) \le \frac{\mu_1(m)}{1 - \mu_1(m-1)}. \quad (9)$$

From this estimate, we can derive the following theorem, which shows that the cumulative coherence function $\mu_1$ can provide both a practical Stability Condition for Weak General MP and an estimate of the rate of exponential convergence for Weak MP.

$^2$ Formerly known as the Babel function.
Theorem 2: Let $m$ be an integer such that

$$\mu_1(m) + \mu_1(m-1) < 1. \quad (10)$$

Then for any index set $I$ of size at most $m$, any $f \in \mathrm{span}(g_k,\ k \in I)$, and any $\alpha > \mu_1(m)/(1 - \mu_1(m-1))$:

1) Weak($\alpha$) General MP picks up a correct atom at each step, i.e., for all $n \ge 1$, $k_n \in I$.
2) Weak($\alpha$) MP/OMP converge exponentially to $f$: more precisely, we have

$$\|f - f_n\|^2 \le (\kappa_m(\alpha))^n \cdot \|f\|^2$$

with

$$\kappa_m(\alpha) := 1 - \alpha^2 \big(1 - \mu_1(m-1)\big)/m. \quad (11)$$
Before we prove the theorem, we need a few lemmas.

Lemma 1: For any index set $I$ with $\mathrm{card}(I) = m$, the squared singular values of $\mathbf{D}_I$ exceed $1 - \mu_1(m-1)$.

The proof relies on the Gershgorin Disc Theorem and can be found in [12], [6], [15], [28]; see for example [28, Lemma 2.3]. The second important lemma is due to DeVore and Temlyakov [4]; it gives a lower estimate on the amount of energy of a signal that can be removed in one step of MP.

Lemma 2 (DeVore, Temlyakov): For any $I$ and $\mathbf{c}$,

$$\sup_{k \in I} |\langle \mathbf{D}_I \mathbf{c}, g_k\rangle| \ge \frac{\|\mathbf{D}_I \mathbf{c}\|_2^2}{\|\mathbf{c}\|_1}.$$
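Both lemmas are easy to test numerically on a random subdictionary; the setup below (real Gaussian atoms, normalized) is an assumption chosen for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
N, m = 64, 4
D_I = rng.standard_normal((N, m))
D_I /= np.linalg.norm(D_I, axis=0)            # m unit-norm atoms

# mu_1(m-1) restricted to this subdictionary: max row sum of off-diagonal
# Gram entries (each row has exactly m-1 of them)
G = np.abs(D_I.T @ D_I)
np.fill_diagonal(G, 0.0)
mu1_m1 = G.sum(axis=1).max()

# Lemma 1 (via Gershgorin): squared singular values exceed 1 - mu_1(m-1)
smallest_sq_sv = np.linalg.svd(D_I, compute_uv=False).min() ** 2

# Lemma 2: one MP step removes a guaranteed fraction of the energy
c = rng.standard_normal(m)
r = D_I @ c
lhs = np.abs(D_I.T @ r).max()                 # sup_k |<D_I c, g_k>|
rhs = (r @ r) / np.abs(c).sum()               # ||D_I c||^2 / ||c||_1
```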
We can now prove Theorem 2.

Proof of Theorem 2: The stability result is trivial using the estimate (9) together with Theorem 1. Let us proceed with the exponential convergence of Weak($\alpha$) MP/OMP. From the stability part, we know that at each step the residual $r_n = f - f_n$ of Weak($\alpha$) MP/OMP is in $V_I$. Thus, $r_n = \mathbf{D}_I \mathbf{c}_n$ for some sequence $\mathbf{c}_n$ with at most $m$ nonzero elements. Denoting $\sigma$ the smallest nonzero singular value of $\mathbf{D}_I$, it follows using Lemma 1 that

$$\|\mathbf{c}_n\|_1^2 \le m\, \|\mathbf{c}_n\|_2^2 \le \frac{m}{\sigma^2}\, \|\mathbf{D}_I \mathbf{c}_n\|_2^2 \le \frac{m}{1 - \mu_1(m-1)}\, \|r_n\|^2.$$

Then, by Lemma 2, we obtain

$$\sup_{k \in I} |\langle r_n, g_k\rangle| \ge \frac{\|r_n\|^2}{\|\mathbf{c}_n\|_1} \ge \|r_n\|\, \sqrt{\frac{1 - \mu_1(m-1)}{m}}.$$

We conclude by noticing that

$$\|r_{n+1}\|^2 \overset{(a)}{\le} \|r_n\|^2 - |\langle r_n, g_{k_n}\rangle|^2 \le \|r_n\|^2 - \alpha^2 \sup_k |\langle r_n, g_k\rangle|^2 \le \big(1 - \alpha^2 (1 - \mu_1(m-1))/m\big) \cdot \|r_n\|^2 = \kappa_m(\alpha) \cdot \|r_n\|^2 \le \cdots \le (\kappa_m(\alpha))^{n+1}\, \|r_0\|^2 = (\kappa_m(\alpha))^{n+1}\, \|f\|^2.$$

Notice that (a) is an equality for MP and an inequality for OMP.

The above estimate is valid for the whole range of admissible weakness parameters $\alpha$: $\alpha = 1$ corresponds to the standard "full search" Pursuit, while $\alpha = \mu_1(m)/(1 - \mu_1(m-1))$ gives the worst case estimate corresponding to the limiting case of the weakest allowable Pursuit. To avoid carrying unnecessarily heavy notation throughout the rest of the correspondence, from now on we will only consider the case of a full search Pursuit.
C. Estimates Based on the Coherence

For any dictionary, we have seen that the cumulative coherence function can be bounded using the coherence as $\mu_1(m) \le \mu\, m$, $m \ge 1$. Thus, a sufficient condition to get the stability condition (10) with the cumulative coherence function becomes a condition based on the coherence:

$$m < \frac{1}{2}\left(1 + \frac{1}{\mu}\right). \quad (12)$$

If the dictionary is a union of incoherent orthonormal bases in finite dimension $N$ [20], then indeed $\mu_1(m) = \mu\, m$ for $1 \le m \le N$, and (12) is equivalent to (10). In any case, the rate $\kappa_m = \kappa_m(1)$ of exponential convergence of a (full search) MP is estimated from above by

$$\kappa_m := \kappa_m(1) = 1 - \big(1 - \mu_1(m-1)\big)/m \le (1 - 1/m)(1 + \mu). \quad (13)$$

The combination of (13) with Theorem 2 yields our Featured Theorem 2.
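To see Featured Theorem 2 in action, one can run a full-search MP on an explicit incoherent dictionary and compare the measured residual decay with the bound $((1-1/m)(1+\mu))^n \|f\|^2$. The Dirac-plus-Hadamard dictionary and the coefficients below are assumptions chosen for this experiment:

```python
import numpy as np

def hadamard(n):
    # Sylvester construction; n must be a power of two
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H

N = 64
D = np.hstack([np.eye(N), hadamard(N) / np.sqrt(N)])  # coherence mu = 1/8
mu = 1.0 / np.sqrt(N)

I = [0, 3, 64, 70]                     # m = 4 < (1 + 1/mu)/2 = 4.5
coef = np.array([1.0, -2.0, 1.5, 1.0])
f = D[:, I] @ coef                     # a "simple" function in span(D_I)

residual = f.copy()
approx = np.zeros_like(f)
norms, picked = [], []
for n in range(40):                    # full-search MP, 40 steps
    corr = D.T @ residual
    k = int(np.argmax(np.abs(corr)))
    picked.append(k)
    approx += corr[k] * D[:, k]
    residual = f - approx
    norms.append(np.linalg.norm(residual))

m = len(I)
rate = (1 - 1.0 / m) * (1 + mu)        # the bound of Featured Theorem 2
```

The theorem predicts that every selected index lies in $I$ and that the residual energy decays at least geometrically at rate `rate`.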
IV. MP AS AN APPROXIMATION ALGORITHM

So far, we have considered the behavior of (Weak) MP on exact sparse expansions in the dictionary. However, the set of functions with an exact sparse expansion, $f \in \mathrm{Range}(\mathbf{D}_I)$ with $\mathrm{card}(I) < \dim \mathcal{H}$, is negligible; hence, it is more interesting to know what is the behavior of Pursuits on more general vectors, typically on $f$ "close enough" to some $f^\star$ with an exact sparse expansion.
A. Best $m$-Term Approximation

For any $f \in \mathcal{H}$ and $m$, the error of best $m$-term approximation to $f$ from the dictionary is

$$\sigma_m(f) := \inf\big\{\|f - \mathbf{D}_I \mathbf{c}\|,\ \mathrm{card}(I) \le m\big\}. \quad (14)$$

When there is no ambiguity about which $f$ is considered, we will simply write $\sigma_m$. For $f \in \mathcal{H}$, let $f^\star_m = \sum_{k \in I_m} c_k g_k$ be a best $m$-term approximation to $f$, i.e., with $\mathrm{card}(I_m) \le m$ and $\|f - f^\star_m\| = \sigma_m$. If a best $m$-term approximant does not exist (because the infimum in the definition of $\sigma_m$ is not reached), one can consider a near-best $m$-term approximant by letting $\epsilon > 0$ and only requiring $\|f - f^\star_m\| \le (1 + \epsilon)\, \sigma_m$. In any case, without loss of generality, we can assume that

1) the atoms $\{g_k,\ k \in I_m\}$ are linearly independent;
2) $f^\star_m$ is the orthogonal projection of $f$ onto $\mathrm{span}(g_k,\ k \in I_m)$;

else, we could easily replace $f^\star_m$ with a better $m$-term approximant to $f$ by either changing the coefficients $c_k$ or selecting a subset $I \subset I_m$ corresponding to linearly independent atoms with

$$\mathrm{span}(g_k,\ k \in I) = \mathrm{span}(g_k,\ k \in I_m).$$
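For small finite dictionaries, $\sigma_m(f)$ in (14) can be computed exactly by exhausting all index sets and projecting orthogonally onto each span. This brute-force sketch (exponential cost, toy use only) is ours, not the paper's:

```python
import numpy as np
from itertools import combinations

def sigma_m(f, D, m):
    """Best m-term approximation error (14) by exhaustive search: for each
    index set I, the optimal coefficients come from the orthogonal
    projection of f onto span(D_I), computed via least squares."""
    best = np.linalg.norm(f)          # m = 0 baseline (zero coefficients)
    best_I = ()
    for I in combinations(range(D.shape[1]), m):
        D_I = D[:, list(I)]
        c, *_ = np.linalg.lstsq(D_I, f, rcond=None)
        err = np.linalg.norm(f - D_I @ c)
        if err < best:
            best, best_I = err, I
    return best, best_I
```

For instance, with a Dirac basis plus one flat atom, the constant vector has $\sigma_1(f) = 0$: the flat atom alone represents it exactly.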
B. Robustness Theorem

From Theorem 1, we know that if $I_m$ satisfies the stability condition, then General MP performed on $f^\star_m$ is stable. The following theorem is a robustness result which shows that if $f$ is "close enough" to $f^\star_m$, the atoms selected during "the first iterations" of a Pursuit will coincide with those which would be selected by a Pursuit on $f^\star_m$, which can be considered as the correct ones.
