What is the way to estimate the coefficients of the polynomial?

Instead of using the Chebyshev polynomial to estimate the coefficients of the polynomial that arises in the proof, one might use the nonnegative polynomial of least deviation from zero on the interval [0, 1].

What is the simplest way to calculate the m indices?

Partition $T$ into at most $2|T|/m$ disjoint blocks, each containing no more than $m$ indices: $T = T_1 \cup T_2 \cup \dots \cup T_{2|T|/m}$. Apply (2.3) to calculate that

$$\|F_{\Omega T}\|^2 \le \frac{2|T|}{m} \max_k \|F_{\Omega T_k}\|^2 \le |T| \cdot \frac{2 s \log^5 n}{c\,|\Omega|} \cdot \frac{3|\Omega|}{2n} \le \frac{1}{2}.$$

Adjusting constants, the authors obtain the result when $|\Omega|$ is not too small.

What is the average spectral norm of a random submatrix?

Given a value of δ ∈ (0, 0.5), the authors formed one hundred random submatrices with dimensions δn × δn and computed the average spectral norm of these matrices.

(Open Access) On the Linear Independence of Spikes and Sines (2008) | Joel A. Tropp

Q: What is the main result of this paper?

The major result of this paper is to show that a random collection of spikes and sines is extremely likely to be strongly linearly independent, provided that the total number of spikes and sines does not exceed a constant proportion of the ambient dimension.

Q: How common are linearly dependent collections?

The linearly dependent collections form a vanishing proportion of all collections, provided that the total number of spikes and sines is slightly smaller than the dimension n of the vector space.

Q: What is the contrapositive of Theorem 1?

Theorem 1 states that if $|T||\Omega| < n$, then $\|F_{\Omega T}\| < 1$. The contrapositive of Theorem 1 is usually interpreted as a discrete uncertainty principle: a vector and its discrete Fourier transform cannot simultaneously be sparse.

Q: How can one construct examples related to the Dirac comb?

One can construct examples related to the Dirac comb which show that the failure probability is constant unless the logarithmic factor is present.

Q: What is the analytic principle of the large sieve?

Donoho and Logan [DL92] study this case using the analytic principle of the large sieve, a powerful technique from number theory that can be traced back to the 1930s.

arXiv:0709.0517v2 [math.FA] 17 Apr 2008

ON THE LINEAR INDEPENDENCE OF SPIKES AND SINES

JOEL A. TROPP

Abstract. The purpose of this work is to survey what is known about the linear independence of spikes and sines. The paper provides new results for the case where the locations of the spikes and the frequencies of the sines are chosen at random. This problem is equivalent to studying the spectral norm of a random submatrix drawn from the discrete Fourier transform matrix. The proof depends on an extrapolation argument of Bourgain and Tzafriri.

1. Introduction

An investigation central to sparse approximation is whether a given collection of impulses and complex exponentials is linearly independent. This inquiry appears in the early paper of Donoho and Stark on uncertainty principles [DS89], and it has been repeated and amplified in the work of subsequent authors. Indeed, researchers in sparse approximation have developed a much deeper understanding of general dictionaries by probing the structure of the unassuming dictionary that contains only spikes and sines.

The purpose of this work is to survey what is known about the linear independence of spikes and sines and to provide some new results on random subcollections chosen from this dictionary. The method is adapted from a paper of Bourgain–Tzafriri [BT91]. The advantage of this approach is that it avoids some of the complicated combinatorial arguments that are used in related works, e.g., [CRT06]. The proof also applies to other types of dictionaries, although we do not pursue this line of inquiry here.

1.1. Spikes and Sines. Let us shift to formal discussion. We work in the inner-product space $\mathbb{C}^n$, and we use the symbol ${}^*$ for the conjugate transpose. Define the Hermitian inner product $\langle x, y \rangle = y^* x$ and the $\ell_2$ vector norm $\|x\| = |\langle x, x \rangle|^{1/2}$. We also write $\|\cdot\|$ for the spectral norm, i.e., the operator norm for linear maps from $(\mathbb{C}^n, \ell_2)$ to itself.

We consider two orthonormal bases for $\mathbb{C}^n$. The standard basis $\{e_j : j = 1, 2, \dots, n\}$ is given by

$$e_j(t) = \begin{cases} 1, & t = j \\ 0, & t \ne j \end{cases} \qquad \text{for } t = 1, 2, \dots, n.$$

We often refer to the elements of the standard basis as spikes or impulses. The Fourier basis $\{f_j : j = 1, 2, \dots, n\}$ is given by

$$f_j(t) = \frac{1}{\sqrt{n}}\, e^{2\pi i j t / n} \qquad \text{for } t = 1, 2, \dots, n.$$

We often refer to the elements of the Fourier basis as sines or complex exponentials.

Date: 4 September 2007. Revised 15 April 2008.

2000 Mathematics Subject Classification. Primary: 46B07, 47A11, 15A52. Secondary: 41A46.

Key words and phrases. Fourier analysis, local theory, random matrix, sparse approximation, uncertainty principle.

The author is with Applied & Computational Mathematics, MC 217-50, California Institute of Technology, 1200 E. California Blvd., Pasadena, CA 91125-5000. E-mail: jtropp@acm.caltech.edu. Supported by NSF 0503299.

The discrete Fourier transform (DFT) is the $n \times n$ matrix $F$ whose rows are $f_1^*, f_2^*, \dots, f_n^*$. The matrix $F$ is unitary. In particular, its spectral norm $\|F\| = 1$. Moreover, the entries of the DFT matrix are bounded in magnitude by $n^{-1/2}$. Let $T$ and $\Omega$ be subsets of $\{1, 2, \dots, n\}$. We write $F_{\Omega T}$ for the restriction of $F$ to the rows listed in $\Omega$ and the columns listed in $T$. Since $F_{\Omega T}$ is a submatrix of the DFT matrix, its spectral norm does not exceed one.
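The definitions above are easy to check numerically. The following is a minimal sketch, assuming NumPy; the helper names `dft_matrix` and `restrict` and the particular index sets are illustrative choices, not part of the paper. It builds $F$ with the 1-based indexing used here, confirms that it is unitary with entries of magnitude $n^{-1/2}$, and evaluates the spectral norm of one submatrix.

```python
import numpy as np

def dft_matrix(n):
    """Unitary DFT matrix whose row j is the conjugate transpose of f_j (1-based j, t)."""
    idx = np.arange(1, n + 1)
    return np.exp(-2j * np.pi * np.outer(idx, idx) / n) / np.sqrt(n)

def restrict(F, omega, T):
    """Submatrix F_{Omega T}: rows listed in omega, columns listed in T (both 1-based)."""
    rows = np.array(sorted(omega)) - 1
    cols = np.array(sorted(T)) - 1
    return F[np.ix_(rows, cols)]

n = 12
F = dft_matrix(n)

# Distance of F^* F from the identity (should be at rounding level).
unitary_error = np.linalg.norm(F.conj().T @ F - np.eye(n))
# Largest entry magnitude (should equal n^{-1/2}).
max_entry = np.abs(F).max()
# Spectral norm of an arbitrary submatrix (should not exceed one).
sub_norm = np.linalg.norm(restrict(F, {1, 4, 7}, {2, 3, 5, 8}), 2)

print(unitary_error, max_entry, sub_norm)
```

Passing `2` as the second argument of `np.linalg.norm` selects the spectral norm; the default for matrices is the Frobenius norm.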

We use the analysts' convention that upright letters represent universal constants. We reserve c for small constants and C for large constants. The value of a constant may change at each appearance.

1.2. Linear Independence. Let $T$ and $\Omega$ be subsets of $\{1, 2, \dots, n\}$. Consider the collection of spikes and sines listed in these sets:

$$X = X(T, \Omega) = \{e_j : j \in T\} \cup \{f_j : j \in \Omega\}.$$

Today, we will discuss methods for determining when $X$ is linearly independent. Since a linearly independent collection in $\mathbb{C}^n$ contains at most $n$ vectors, we obtain a simple necessary condition $|T| + |\Omega| \le n$. Developing sufficient conditions, however, requires more sophistication.

We approach the problem by studying the Gram matrix $G = G(X)$, whose entries are the inner products between pairs of elements from $X$. It is easy to check that the Gram matrix can be expressed as

$$G = \begin{bmatrix} I_{|\Omega|} & F_{\Omega T} \\ (F_{\Omega T})^* & I_{|T|} \end{bmatrix}$$

where $I_m$ denotes an $m \times m$ identity matrix and $|\cdot|$ denotes the cardinality of a set.

It is well known that the collection $X$ is linearly independent if and only if its Gram matrix is nonsingular. The Gram matrix is nonsingular if and only if its eigenvalues are nonzero. A basic (and easily confirmed) fact of matrix analysis is that the extreme eigenvalues of $G$ are $1 \pm \|F_{\Omega T}\|$. Therefore, the collection $X$ is linearly independent if and only if $\|F_{\Omega T}\| < 1$.

One may also attempt to quantify the extent to which the collection $X$ is linearly independent. To that end, define the condition number $\kappa$ of the Gram matrix, which is the ratio of its largest eigenvalue to its smallest eigenvalue:

$$\kappa(G) = \frac{1 + \|F_{\Omega T}\|}{1 - \|F_{\Omega T}\|}.$$

If $\|F_{\Omega T}\|$ is bounded away from one, then the condition number is bounded by a constant. One may interpret this statement as evidence that the collection $X$ is strongly linearly independent. The reason is that the condition number is the reciprocal of the relative spectral-norm distance between $G$ and the nearest singular matrix [Dem97, p. 33]. As we have mentioned, $G$ is singular if and only if $X$ is linearly dependent.
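The eigenvalue claim and the condition-number formula can be confirmed on a small instance. The sketch below assumes NumPy; the sets `omega` and `T` are arbitrary illustrations chosen so that $|T||\Omega| < n$. It assembles the block Gram matrix with the sines listed first and compares its extreme eigenvalues with $1 \pm \|F_{\Omega T}\|$.

```python
import numpy as np

n = 16
idx = np.arange(1, n + 1)
F = np.exp(-2j * np.pi * np.outer(idx, idx) / n) / np.sqrt(n)

omega = np.array([2, 5, 11])   # frequencies of the sines (illustrative)
T = np.array([1, 3, 4, 9])     # locations of the spikes (illustrative)
F_sub = F[np.ix_(omega - 1, T - 1)]

# Gram matrix in block form: identity blocks on the diagonal,
# F_{Omega T} and its conjugate transpose off the diagonal.
G = np.block([
    [np.eye(len(omega)), F_sub],
    [F_sub.conj().T, np.eye(len(T))],
])

eigenvalues = np.linalg.eigvalsh(G)   # real, sorted in ascending order
s = np.linalg.norm(F_sub, 2)          # spectral norm of F_{Omega T}
kappa = (1 + s) / (1 - s)             # condition number predicted by the formula

print(eigenvalues[0], 1 - s)
print(eigenvalues[-1], 1 + s)
print(eigenvalues[-1] / eigenvalues[0], kappa)
```

`np.linalg.eigvalsh` is used because $G$ is Hermitian, so its eigenvalues are real and returned in sorted order.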

This article focuses on statements about linear independence, rather than conditioning. Nevertheless, many results can be adapted to obtain precise information about the size of $\|F_{\Omega T}\|$.

1.3. Summary of Results. The major result of this paper is to show that a random collection of spikes and sines is extremely likely to be strongly linearly independent, provided that the total number of spikes and sines does not exceed a constant proportion of the ambient dimension. We also provide a result which shows that the norm of a properly scaled random submatrix of the DFT is at most constant with high probability. For a more detailed statement of these theorems, turn to Section 2.3.

1.4. Outline. The next section provides a survey of bounds on the norm of a submatrix of the DFT matrix. It concludes with detailed new results for the case where the submatrix is random. Section 3 contains a proof of the new results. Numerical experiments are presented in Section 4, and Section 5 describes some additional research directions. Appendix A contains a proof of the key background result.

2. History and Results

The strange, eventful history of our problem can be viewed as a sequence of bounds on the norm of the matrix $F_{\Omega T}$. Results in the literature can be divided into two classes: the case where the sets $\Omega$ and $T$ are fixed and the case where one of the sets is random. In this work, we investigate what happens when both sets are chosen randomly.

2.1. Bounds for fixed sets. An early result, due to Donoho and Stark [DS89], asserts that an arbitrary collection of spikes and sines is linearly independent, provided that the collection is not too big.

Theorem 1 (Donoho–Stark). Suppose that $|T||\Omega| < n$. Then $\|F_{\Omega T}\| < 1$.

The original argument relies on the fact that F is a Vandermonde matrix. We present a short

proof that is completely analytic. A similar argument using an inequality of Schur yields the more

general result of Elad and Bruckstein [EB02, Thm. 1].

Proof. The entries of the $|\Omega| \times |T|$ matrix $F_{\Omega T}$ are uniformly bounded by $n^{-1/2}$. Since the Frobenius norm dominates the spectral norm, $\|F_{\Omega T}\|^2 \le \|F_{\Omega T}\|_F^2 \le |\Omega||T|/n$. Under the hypothesis of the theorem, this quantity is strictly less than one. $\square$
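The two inequalities in the proof can be observed directly. The following sketch assumes NumPy and draws index sets at random purely as test cases (the randomness is a convenience for the check, not part of Theorem 1). For each pair with $|T||\Omega| < n$ it compares the squared spectral norm, the squared Frobenius norm, and the value $|\Omega||T|/n$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20
idx = np.arange(1, n + 1)
F = np.exp(-2j * np.pi * np.outer(idx, idx) / n) / np.sqrt(n)

worst_spectral_sq = 0.0      # largest squared spectral norm seen
max_frobenius_error = 0.0    # largest gap between ||.||_F^2 and |Omega||T|/n
always_dominated = True      # spectral norm never exceeded Frobenius norm

for _ in range(200):
    size_T = int(rng.integers(1, n))                  # 1, ..., n - 1
    max_omega = (n - 1) // size_T                     # keeps |T||Omega| < n
    size_omega = int(rng.integers(1, max_omega + 1))
    T = rng.choice(n, size=size_T, replace=False)         # 0-based positions
    omega = rng.choice(n, size=size_omega, replace=False)
    F_sub = F[np.ix_(omega, T)]

    spectral_sq = np.linalg.norm(F_sub, 2) ** 2
    frobenius_sq = np.linalg.norm(F_sub, "fro") ** 2

    always_dominated &= bool(spectral_sq <= frobenius_sq + 1e-12)
    max_frobenius_error = max(max_frobenius_error,
                              abs(frobenius_sq - size_omega * size_T / n))
    worst_spectral_sq = max(worst_spectral_sq, spectral_sq)

print(always_dominated, max_frobenius_error, worst_spectral_sq)
```

Because every entry of $F$ has magnitude exactly $n^{-1/2}$, the squared Frobenius norm equals $|\Omega||T|/n$, so the second inequality in the proof holds with equality.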

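Theorem 1 has an elegant corollary that follows immediately from the basic inequality for geometric and arithmetic means: since $|T||\Omega| \le ((|T| + |\Omega|)/2)^2$, the hypothesis $|T| + |\Omega| < 2\sqrt{n}$ forces $|T||\Omega| < n$.

Corollary 2 (Donoho–Stark). Suppose that $|T| + |\Omega| < 2\sqrt{n}$. Then $\|F_{\Omega T}\| < 1$.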

The contrapositive of Theorem 1 is usually interpreted as a discrete uncertainty principle: a vector and its discrete Fourier transform cannot simultaneously be sparse. To express this claim quantitatively, we define the $\ell_0$ "quasinorm" of a vector by $\|\alpha\|_0 = |\{j : \alpha_j \ne 0\}|$.

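Corollary 3 (Donoho–Stark). Fix a vector $x \in \mathbb{C}^n$. Consider the representations of $x$ in the standard basis and the Fourier basis:

$$x = \sum_{j=1}^{n} \alpha_j e_j \quad \text{and} \quad x = \sum_{j=1}^{n} \beta_j f_j.$$

Then $\|\alpha\|_0 \|\beta\|_0 \ge n$.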
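The uncertainty principle is simple to test on examples. The sketch below assumes NumPy and applies to nonzero vectors; the helper `support_size` and its tolerance `tol` are illustrative choices for counting nonzero coefficients in floating point. Since the rows of $F$ are the vectors $f_j^*$, the Fourier coefficients are $\beta = Fx$.

```python
import numpy as np

def support_size(v, tol=1e-9):
    """Number of entries of v whose magnitude exceeds tol (numerical l0 'quasinorm')."""
    return int(np.sum(np.abs(v) > tol))

n = 16
idx = np.arange(1, n + 1)
F = np.exp(-2j * np.pi * np.outer(idx, idx) / n) / np.sqrt(n)
rng = np.random.default_rng(1)

# Example 1: a random vector supported on three coordinates.
x = np.zeros(n, dtype=complex)
support = rng.choice(n, size=3, replace=False)
x[support] = rng.standard_normal(3) + 1j * rng.standard_normal(3)
alpha = x            # coefficients in the standard basis
beta = F @ x         # beta_j = <x, f_j> = f_j^* x
product_sparse = support_size(alpha) * support_size(beta)

# Example 2: a single spike, whose Fourier coefficients all have magnitude n^{-1/2}.
spike = np.zeros(n, dtype=complex)
spike[2] = 1.0
product_spike = support_size(spike) * support_size(F @ spike)

print(product_sparse, product_spike)
```

The single spike attains the bound with equality, because one nonzero coefficient in the standard basis is paired with $n$ nonzero Fourier coefficients.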

The example of the Dirac comb shows that Theorem 1 and its corollaries are sharp. Suppose that $n$ is a square, and let $T = \Omega = \{\sqrt{n}, 2\sqrt{n}, 3\sqrt{n}, \dots, n\}$. On account of the Poisson summation formula,

$$\sum_{j \in T} e_j = \sum_{j \in \Omega} f_j.$$

Therefore, the set of vectors $X(T, \Omega)$ is linearly dependent and $|T||\Omega| = n$.
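The Dirac comb identity can be verified numerically for a small square dimension. The sketch below assumes NumPy and takes $n = 16$ as an illustrative choice. It forms the sum of the spikes and the sum of the sines indexed by the comb and also evaluates $\|F_{\Omega T}\|$, which should equal one for a linearly dependent collection.

```python
import numpy as np

n = 16                                   # must be a perfect square
r = int(np.sqrt(n))
idx = np.arange(1, n + 1)
F = np.exp(-2j * np.pi * np.outer(idx, idx) / n) / np.sqrt(n)
fourier_basis = F.conj().T               # column j - 1 holds the vector f_j

comb = np.arange(r, n + 1, r)            # T = Omega = {sqrt(n), 2 sqrt(n), ..., n}

# Left-hand side: sum of the spikes e_j over j in T.
sum_of_spikes = np.zeros(n, dtype=complex)
sum_of_spikes[comb - 1] = 1.0

# Right-hand side: sum of the sines f_j over j in Omega.
sum_of_sines = fourier_basis[:, comb - 1].sum(axis=1)

# Spectral norm of the corresponding submatrix of the DFT.
comb_norm = np.linalg.norm(F[np.ix_(comb - 1, comb - 1)], 2)

print(np.allclose(sum_of_spikes, sum_of_sines), comb_norm)
```

Here every entry of $F_{\Omega T}$ equals $n^{-1/2}$, because the product of two multiples of $\sqrt{n}$ is a multiple of $n$.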

The substance behind this example is that the abelian group $\mathbb{Z}/n\mathbb{Z}$ contains nontrivial subgroups when $n$ is composite. The presence of these subgroups leads to arithmetic cancelations for properly chosen $T$ and $\Omega$. See [DS89] for additional discussion.

One way to eradicate the cancelation phenomenon is to require that $n$ be prime. In this case, the group $\mathbb{Z}/n\mathbb{Z}$ has no nontrivial subgroup. As a result, much larger collections of spikes and sines are linearly independent. Compare the following result with Corollary 2.

Theorem 4 (Tao [Tao05, Thm. 1.1]). Suppose that $n$ is prime. If $|T| + |\Omega| \le n$, then $\|F_{\Omega T}\| < 1$.

The proof of Theorem 4 is algebraic in nature, and it does not provide information about conditioning. Indeed, one expects that some submatrices have norms very near to one.

When $n$ is composite, subgroups of $\mathbb{Z}/n\mathbb{Z}$ exist, but they have a very rigid structure. Consequently, one can also avoid cancelations by choosing $T$ and $\Omega$ with care. In particular, one may consider the situation where $T$ is clustered and $\Omega$ is spread out. Donoho and Logan [DL92] study this case using the analytic principle of the large sieve, a powerful technique from number theory that can be traced back to the 1930s. See the lecture notes [Jam06] for an engaging introduction and references.

Here, we simply restate the (sharp) large sieve inequality [Jam06, LS1.1] in a manner that exposes its connection with our problem. The spread of a set is measured as the difference (modulo $n$) between the closest pair of indices. Formally, define

$$\operatorname{spread}(\Omega) = \min\{|j - k \bmod n| : j, k \in \Omega,\ j \ne k\}$$

with the convention that the modulus returns values in the symmetric range $\{-\lceil n/2 \rceil + 1, \dots, \lfloor n/2 \rfloor\}$. Observe that $|\Omega| \cdot \operatorname{spread}(\Omega) \le n$.

Theorem 5 (Large Sieve Inequality). Suppose that $T$ is a block of adjacent indices:

$$T = \{m + 1, m + 2, \dots, m + |T|\} \quad \text{for an integer } m. \tag{2.1}$$

For each set $\Omega$, we have

$$\|F_{\Omega T}\|^2 \le \frac{|T| + n/\operatorname{spread}(\Omega) - 1}{n}.$$

In particular, when $T$ has form (2.1), the bound $|T| + n/\operatorname{spread}(\Omega) < n + 1$ implies that $\|F_{\Omega T}\| < 1$.
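The spread and the large sieve bound can be computed for a concrete pair of sets. The sketch below assumes NumPy; the function name `spread` mirrors the definition above, and the dimension, the equally spaced set `omega`, and the block `T` are illustrative choices. The circular distance `min((j - k) % n, (k - j) % n)` is the absolute value of the difference reduced to the symmetric range.

```python
import numpy as np
from itertools import combinations

def spread(omega, n):
    """Smallest circular distance (mod n) between two distinct indices in omega."""
    return min(min((j - k) % n, (k - j) % n) for j, k in combinations(omega, 2))

n = 32
idx = np.arange(1, n + 1)
F = np.exp(-2j * np.pi * np.outer(idx, idx) / n) / np.sqrt(n)

omega = np.array([1, 9, 17, 25])   # spread-out frequencies
T = np.arange(5, 11)               # block {5, ..., 10}: form (2.1) with m = 4, |T| = 6

gap = spread(omega.tolist(), n)
norm_sq = np.linalg.norm(F[np.ix_(omega - 1, T - 1)], 2) ** 2
bound = (len(T) + n / gap - 1) / n  # right-hand side of the large sieve inequality

print(gap, norm_sq, bound)
```

Swapping the roles of the two sets (a block of frequencies and spread-out spike locations) only requires transposing the submatrix, which leaves the spectral norm unchanged.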

Of course, we can reverse the roles of $T$ and $\Omega$ in this theorem on account of duality. The same observation applies to other results where the two sets do not participate in the same way.

The discussion above shows that there are cases where delicately constructed sets $T$ and $\Omega$ lead to linearly dependent collections of spikes and sines. Explicit conditions that rule out the bad examples are unknown, but nevertheless the bad examples turn out to be quite rare. To quantify this intuition, we must introduce probability.

2.2. Bounds when one set is random. In their work [DS89, Sec. 7.3], Donoho and Stark discuss numerical experiments designed to study what happens when one of the sets of spikes or sines is drawn at random. They conjecture that the situation is vastly different from the case where the spikes and sines are chosen in an arbitrary fashion. Within the last few years, researchers have made substantial theoretical progress on this question. Indeed, we will see that the linearly dependent collections form a vanishing proportion of all collections, provided that the total number of spikes and sines is slightly smaller than the dimension $n$ of the vector space.

First, we describe a probability model for random sets. Fix a number $m \le n$, and consider the class $\mathcal{S}_m$ of index sets that have cardinality $m$:

$$\mathcal{S}_m = \{S : S \subset \{1, 2, \dots, n\} \text{ and } |S| = m\}.$$

We may construct a random set $\Omega$ by drawing an element from $\mathcal{S}_m$ uniformly at random. That is,

$$\mathbb{P}\{\Omega = S\} = |\mathcal{S}_m|^{-1} \quad \text{for each } S \in \mathcal{S}_m.$$

In the sequel, we substitute the symbol $|\Omega|$ for the letter $m$, and we say "$\Omega$ is a random set with cardinality $|\Omega|$" to describe this type of random variable. This phrase should cause no confusion, and it allows us to avoid extra notation for the cardinality.
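This probability model corresponds to sampling without replacement. The sketch below assumes NumPy; the helper name `random_index_set`, the dimensions, the fixed set `T`, and the number of trials are all illustrative assumptions rather than the experimental setup of the paper. It draws random sets $\Omega$ of a given cardinality and records how often $\|F_{\Omega T}\|^2 \ge 0.5$ for one fixed $T$.

```python
import numpy as np

def random_index_set(n, m, rng):
    """Draw a set of cardinality m uniformly from all m-subsets of {1, ..., n}."""
    return np.sort(rng.choice(n, size=m, replace=False)) + 1

rng = np.random.default_rng(2008)
n, m = 64, 16
idx = np.arange(1, n + 1)
F = np.exp(-2j * np.pi * np.outer(idx, idx) / n) / np.sqrt(n)

T = np.arange(1, 9)          # a fixed set of spike locations (illustrative)
trials = 200
norms_sq = np.empty(trials)
for i in range(trials):
    omega = random_index_set(n, m, rng)
    norms_sq[i] = np.linalg.norm(F[np.ix_(omega - 1, T - 1)], 2) ** 2

# Empirical frequency of the event that the squared norm is at least 0.5.
failure_rate = np.mean(norms_sq >= 0.5)

print(norms_sq.mean(), failure_rate)
```

Sampling without replacement makes every $m$-subset equally likely, which is exactly the uniform distribution on $\mathcal{S}_m$. The empirical frequency is only a rough indication; it is not a substitute for the probability bounds stated in the theorems.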

In the sparse approximation literature, the first rigorous result on random sets is due to Candès and Romberg. They study the case where one of the sets is arbitrary and the other set is chosen at random. Their proof draws heavily on their prior work with Tao [CRT06].

Theorem 6 (Candès–Romberg [CR06, Thm. 3.2]). Fix a number $s \ge 1$. Suppose that

$$|T| + |\Omega| \le \frac{cn}{\sqrt{(s + 1) \log n}}. \tag{2.2}$$

If $T$ is an arbitrary set with cardinality $|T|$ and $\Omega$ is a random set with cardinality $|\Omega|$, then

$$\mathbb{P}\{\|F_{\Omega T}\|^2 \ge 0.5\} \le C((s + 1) \log n)^{1/2}\, n^{-s}.$$

The numerical constant $c \ge 0.2791$, provided that $n \ge 512$.

One should interpret this theorem as follows. Fix a set $T$, and consider all sets $\Omega$ that satisfy (2.2). Of these, the proportion that are not strongly linearly independent is only about $n^{-s}$. One should be aware that the logarithmic factor in (2.2) is intrinsic when one of the sets is arbitrary. Indeed, one can construct examples related to the Dirac comb which show that the failure probability is constant unless the logarithmic factor is present. We omit the details.

The proof of Theorem 6 ultimately involves a variation of the moment method for studying random matrices, which was initiated by Wigner. The key point of the argument is a bound on the expected trace of a high power of the random matrix $(n/|\Omega|) \cdot F_{\Omega T}^* F_{\Omega T} - I_{|T|}$. The calculations involve delicate combinatorial techniques that depend heavily on the structure of the matrix $F$.

This approach can also be used to establish that the smallest singular value of $F_{\Omega T}$ is bounded well away from zero [CRT06, Thm. 2.2]. This lower bound is essential in many applications, but we do not need it here. For extensions of these ideas, see also the work of Rauhut [Rau07].

Another result, similar to Theorem 6, suggests that the arbitrary set and the random set do not contribute equally to the spectral norm. We present one version, whose derivation is adapted from [Tro07, Thm. 10 et seq.].

Theorem 7. Fix a number $s \ge 1$. Suppose that

$$|T| \log n + |\Omega| \le \frac{cn}{s}.$$

If $T$ is an arbitrary set of cardinality $|T|$ and $\Omega$ is a random set of cardinality $|\Omega|$, then

$$\mathbb{P}\{\|F_{\Omega T}\|^2 \ge 0.5\} \le n^{-s}.$$

The proof of this theorem uses Rudelson's selection lemma [Rud99, Sec. 2] in an essential way. This lemma in turn hinges on the noncommutative Khintchine inequality [LP86, Buc01]. For a related application of this approach, see [CR07].

Theorems 6 and 7 are interesting, but they do not predict that a far more striking phenomenon occurs. A random collection of sines has the following property with high probability. To this collection, one can add an arbitrary set of spikes without sacrificing linear independence.

Theorem 8. Fix a number $s \ge 1$, and assume $n \ge N(s)$. Except with probability $n^{-s}$, a random set $\Omega$ whose cardinality $|\Omega| \le n/3$ has the following property. For each set $T$ whose cardinality

$$|T| \le \frac{cn}{s \log^5 n},$$

it holds that $\|F_{\Omega T}\|^2 \le 0.5$.

This result follows from the (deep) fact that a random row-submatrix of the DFT matrix satisfies the restricted isometry property (RIP) with high probability. More precisely, a random set $\Omega$ with cardinality $|\Omega|$ verifies the following condition, except with probability $n^{-s}$:

$$\frac{|\Omega|}{2n} \le \|F_{\Omega T}\|^2 \le \frac{3|\Omega|}{2n} \quad \text{when } |T| \le \frac{c\,|\Omega|}{s \log^5 n}. \tag{2.3}$$

This result is adapted from [RV06, Thm. 2.2 et seq.].

The bound (2.3) was originally established by Candès and Tao [CT06] for sets $T$ whose cardinality $|T| \le c\,|\Omega| / (s \log^6 n)$. Rudelson and Vershynin developed a simpler proof and reduced the exponent on the logarithm [RV06]. Experts believe that the correct exponent is just one or two, but this conjecture is presently out of reach.

Proof. Let $c$ be the constant in (2.3). Abbreviate $m = c\,|\Omega| / (s \log^5 n)$, and assume that $m \ge 1$ for now. Draw a random set $\Omega$ with cardinality $|\Omega|$, so relation (2.3) holds except with probability $n^{-s}$. Select an arbitrary set $T$ whose cardinality $|T| \le cn / (6 s \log^5 n)$. We may assume that $2|T|/m \ge 1$


References

Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information

Near-Optimal Signal Recovery From Random Projections: Universal Encoding Strategies?

Applied Numerical Linear Algebra

Sparsity and incoherence in compressive sampling

Theory of Approximation of Functions of a Real Variable
