
On the Linear Independence of Spikes and Sines

Abstract
The purpose of this work is to survey what is known about the linear independence of spikes and sines. The paper provides new results for the case where the locations of the spikes and the frequencies of the sines are chosen at random. This problem is equivalent to studying the spectral norm of a random submatrix drawn from the discrete Fourier transform matrix. The proof depends on an extrapolation argument of Bourgain and Tzafriri.


arXiv:0709.0517v2 [math.FA] 17 Apr 2008
ON THE LINEAR INDEPENDENCE OF SPIKES AND SINES
JOEL A. TROPP
1. Introduction
An investigation central to sparse approximation is whether a given collection of impulses and
complex exponentials is linearly independent. This inquiry appears in the early paper of Donoho
and Stark on uncertainty principles [DS89], and it has been repeated and amplified in the work
of subsequent authors. Indeed, researchers in sparse approximation have developed a much deeper
understanding of general dictionaries by probing the structure of the unassuming dictionary that
contains only spikes and sines.
The purpose of this work is to survey what is known about the linear independence of spikes
and sines and to provide some new results on random subcollections chosen from this dictionary.
The method is adapted from a paper of Bourgain–Tzafriri [BT91]. The advantage of this approach
is that it avoids some of the complicated combinatorial arguments that are used in related works,
e.g., [CRT06]. The proof also applies to other types of dictionaries, although we do not pursue this
line of inquiry here.
1.1. Spikes and Sines. Let us shift to formal discussion. We work in the inner-product space
$\mathbb{C}^n$, and we use the symbol $*$ for the conjugate transpose. Define the Hermitian inner product
$\langle x, y \rangle = y^* x$ and the $\ell_2$ vector norm $\|x\| = |\langle x, x \rangle|^{1/2}$. We also write $\|\cdot\|$ for the spectral norm,
i.e., the operator norm for linear maps from $(\mathbb{C}^n, \ell_2)$ to itself.
We consider two orthonormal bases for $\mathbb{C}^n$. The standard basis $\{e_j : j = 1, 2, \dots, n\}$ is given by
$$e_j(t) = \begin{cases} 1, & t = j \\ 0, & t \neq j \end{cases} \quad \text{for } t = 1, 2, \dots, n.$$
We often refer to the elements of the standard basis as spikes or impulses. The Fourier basis
$\{f_j : j = 1, 2, \dots, n\}$ is given by
$$f_j(t) = \frac{1}{\sqrt{n}} \, e^{2\pi \mathrm{i} jt/n} \quad \text{for } t = 1, 2, \dots, n.$$
We often refer to the elements of the Fourier basis as sines or complex exponentials.
Date: 4 September 2007. Revised 15 April 2008.
2000 Mathematics Subject Classification. Primary: 46B07, 47A11, 15A52. Secondary: 41A46.
Key words and phrases. Fourier analysis, local theory, random matrix, sparse approximation, uncertainty principle.
The author is with Applied & Computational Mathematics, MC 217-50, California Institute of Technology, 1200
E. California Blvd., Pasadena, CA 91125-5000. E-mail: jtropp@acm.caltech.edu. Supported by NSF 0503299.
The discrete Fourier transform (DFT) is the $n \times n$ matrix $F$ whose rows are $f_1^*, f_2^*, \dots, f_n^*$. The
matrix $F$ is unitary. In particular, its spectral norm $\|F\| = 1$. Moreover, the entries of the DFT
matrix are bounded in magnitude by $n^{-1/2}$. Let $T$ and $\Omega$ be subsets of $\{1, 2, \dots, n\}$. We write
$F_{\Omega T}$ for the restriction of $F$ to the rows listed in $\Omega$ and the columns listed in $T$. Since $F_{\Omega T}$ is a
submatrix of the DFT matrix, its spectral norm does not exceed one.
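The objects defined in this subsection can be checked numerically on small examples. The following is a minimal sketch in plain Python, not part of the original paper. The helper names `dft_matrix`, `submatrix`, and `spectral_norm` are hypothetical, row $j$ of the matrix is taken to be the conjugate transpose of $f_j$ (an assumption about orientation that does not affect any norm), and the spectral norm is only estimated from below by power iteration.

```python
import cmath
import math
import random


def dft_matrix(n):
    """Return the n x n unitary DFT matrix as a list of rows.

    Row j holds the conjugated entries of f_j: exp(-2*pi*i*j*t/n) / sqrt(n).
    """
    scale = 1.0 / math.sqrt(n)
    return [[scale * cmath.exp(-2j * cmath.pi * j * t / n)
             for t in range(1, n + 1)]
            for j in range(1, n + 1)]


def submatrix(F, omega, T):
    """Restrict F to the rows in omega and the columns in T (1-based indices)."""
    return [[F[j - 1][t - 1] for t in sorted(T)] for j in sorted(omega)]


def spectral_norm(A, iters=300, seed=0):
    """Estimate the largest singular value of A by power iteration on A* A.

    The returned value is ||A x|| for a unit vector x, so it never
    exceeds the true spectral norm.
    """
    rng = random.Random(seed)
    rows, cols = len(A), len(A[0])
    x = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(cols)]
    sigma = 0.0
    for _ in range(iters):
        length = math.sqrt(sum(abs(v) ** 2 for v in x))
        if length == 0.0:
            return 0.0
        x = [v / length for v in x]                       # unit vector
        y = [sum(A[i][k] * x[k] for k in range(cols))     # y = A x
             for i in range(rows)]
        sigma = math.sqrt(sum(abs(v) ** 2 for v in y))    # ||A x||
        x = [sum(A[i][k].conjugate() * y[i] for i in range(rows))  # x = A* y
             for k in range(cols)]
    return sigma
```

For instance, `spectral_norm(submatrix(dft_matrix(8), {1, 3}, {2, 5, 6}))` estimates the norm of a $2 \times 3$ submatrix of the $8 \times 8$ DFT matrix.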
We use the analysts’ convention that upright letters represent universal constants. We reserve
$\mathrm{c}$ for small constants and $\mathrm{C}$ for large constants. The value of a constant may change at each
appearance.
1.2. Linear Independence. Let $T$ and $\Omega$ be subsets of $\{1, 2, \dots, n\}$. Consider the collection of
spikes and sines listed in these sets:
$$X = X(T, \Omega) = \{e_j : j \in T\} \cup \{f_j : j \in \Omega\}.$$
Today, we will discuss methods for determining when $X$ is linearly independent. Since a linearly
independent collection in $\mathbb{C}^n$ contains at most $n$ vectors, we obtain a simple necessary condition
$|T| + |\Omega| \leq n$. Developing sufficient conditions, however, requires more sophistication.
We approach the problem by studying the Gram matrix $G = G(X)$, whose entries are the inner
products between pairs of elements from $X$. It is easy to check that the Gram matrix can be
expressed as
$$G = \begin{bmatrix} I_{|\Omega|} & F_{\Omega T} \\ (F_{\Omega T})^* & I_{|T|} \end{bmatrix}$$
where $I_m$ denotes an $m \times m$ identity matrix and $|\cdot|$ denotes the cardinality of a set.
It is well known that the collection $X$ is linearly independent if and only if its Gram matrix is
nonsingular. The Gram matrix is nonsingular if and only if its eigenvalues are nonzero. A basic
(and easily confirmed) fact of matrix analysis is that the extreme eigenvalues of $G$ are $1 \pm \|F_{\Omega T}\|$.
Therefore, the collection $X$ is linearly independent if and only if $\|F_{\Omega T}\| < 1$.
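The eigenvalue fact can be confirmed with a short computation. The sketch below is an illustration, not taken from the paper. It abbreviates $A = F_{\Omega T}$ and assumes that $\sigma$ is a singular value of $A$ with left and right singular vectors $u$ and $v$, so that $A v = \sigma u$ and $A^* u = \sigma v$.

```latex
\[
\begin{bmatrix} I_{|\Omega|} & A \\ A^* & I_{|T|} \end{bmatrix}
\begin{bmatrix} u \\ \pm v \end{bmatrix}
= \begin{bmatrix} u \pm A v \\ A^* u \pm v \end{bmatrix}
= \begin{bmatrix} (1 \pm \sigma)\, u \\ (\sigma \pm 1)\, v \end{bmatrix}
= (1 \pm \sigma) \begin{bmatrix} u \\ \pm v \end{bmatrix}.
\]
```

Each singular value $\sigma$ of $A$ therefore contributes the pair of eigenvalues $1 \pm \sigma$, and the remaining eigenvalues equal one. Taking $\sigma = \|A\|$ gives the extreme eigenvalues $1 \pm \|F_{\Omega T}\|$.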
One may also attempt to quantify the extent to which the collection $X$ is linearly independent.
To that end, define the condition number $\kappa$ of the Gram matrix, which is the ratio of its largest
eigenvalue to its smallest eigenvalue:
$$\kappa(G) = \frac{1 + \|F_{\Omega T}\|}{1 - \|F_{\Omega T}\|}.$$
If $\|F_{\Omega T}\|$ is bounded away from one, then the condition number is bounded by a constant. One may interpret
this statement as evidence that the collection $X$ is strongly linearly independent. The reason is that
the condition number is the reciprocal of the relative spectral-norm distance between $G$ and the
nearest singular matrix [Dem97, p. 33]. As we have mentioned, $G$ is singular if and only if $X$ is
linearly dependent.
This article focuses on statements about linear independence, rather than conditioning. Nevertheless,
many results can be adapted to obtain precise information about the size of $\|F_{\Omega T}\|$.
1.3. Summary of Results. The major result of this paper is to show that a random collection of
spikes and sines is extremely likely to be strongly linearly independent, provided that the total
number of spikes and sines does not exceed a constant proportion of the ambient dimension. We
also provide a result which shows that the norm of a properly scaled random submatrix of the DFT
is at most constant with high probability. For a more detailed statement of these theorems, turn
to Section 2.3.
1.4. Outline. The next section provides a survey of bounds on the norm of a submatrix of the
DFT matrix. It concludes with detailed new results for the case where the submatrix is random.
Section 3 contains a proof of the new results. Numerical experiments are presented in Section 4,
and Section 5 describes some additional research directions. Appendix A contains a proof of the
key background result.

2. History and Results
The strange, eventful history of our problem can be viewed as a sequence of bounds on the norm of
the matrix $F_{\Omega T}$. Results in the literature can be divided into two classes: the case where the sets
$\Omega$ and $T$ are fixed and the case where one of the sets is random. In this work, we investigate what
happens when both sets are chosen randomly.
2.1. Bounds for fixed sets. An early result, due to Donoho and Stark [DS89], asserts that an
arbitrary collection of spikes and sines is linearly independent, provided that the collection is not
too big.
Theorem 1 (Donoho–Stark). Suppose that $|T| \, |\Omega| < n$. Then $\|F_{\Omega T}\| < 1$.
The original argument relies on the fact that F is a Vandermonde matrix. We present a short
proof that is completely analytic. A similar argument using an inequality of Schur yields the more
general result of Elad and Bruckstein [EB02, Thm. 1].
Proof. The entries of the $|\Omega| \times |T|$ matrix $F_{\Omega T}$ are uniformly bounded by $n^{-1/2}$. Since the Frobenius
norm dominates the spectral norm, $\|F_{\Omega T}\|^2 \leq \|F_{\Omega T}\|_F^2 \leq |\Omega| \, |T| / n$. Under the hypothesis of the
theorem, this quantity is strictly less than one.
Theorem 1 has an elegant corollary that follows immediately from the basic inequality for geometric
and arithmetic means.
Corollary 2 (Donoho–Stark). Suppose that $|T| + |\Omega| < 2\sqrt{n}$. Then $\|F_{\Omega T}\| < 1$.
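The step from Theorem 1 to Corollary 2 can be written out in one line. The following worked inequality is a sketch of the standard argument. It applies the inequality between the geometric and arithmetic means to $|T|$ and $|\Omega|$, then uses the hypothesis $|T| + |\Omega| < 2\sqrt{n}$.

```latex
\[
|T| \, |\Omega|
\;\leq\; \left( \frac{|T| + |\Omega|}{2} \right)^{2}
\;<\; \left( \frac{2\sqrt{n}}{2} \right)^{2}
\;=\; n .
\]
```

The product condition of Theorem 1 therefore holds, which yields $\|F_{\Omega T}\| < 1$.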
The contrapositive of Theorem 1 is usually interpreted as a discrete uncertainty principle: a
vector and its discrete Fourier transform cannot simultaneously be sparse. To express this claim
quantitatively, we define the $\ell_0$ “quasinorm” of a vector by $\|\alpha\|_0 = |\{j : \alpha_j \neq 0\}|$.
Corollary 3 (Donoho–Stark). Fix a vector $x \in \mathbb{C}^n$. Consider the representations of $x$ in the
standard basis and the Fourier basis:
$$x = \sum_{j=1}^{n} \alpha_j e_j \quad \text{and} \quad x = \sum_{j=1}^{n} \beta_j f_j.$$
Then $\|\alpha\|_0 \, \|\beta\|_0 \geq n$.
The example of the Dirac comb shows that Theorem 1 and its corollaries are sharp. Suppose
that $n$ is a square, and let $T = \Omega = \{\sqrt{n}, 2\sqrt{n}, 3\sqrt{n}, \dots, n\}$. On account of the Poisson summation
formula,
$$\sum_{j \in T} e_j = \sum_{j \in \Omega} f_j.$$
Therefore, the set of vectors $X(T, \Omega)$ is linearly dependent and $|T| \, |\Omega| = n$.
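The Dirac comb identity is easy to confirm numerically for small squares. The sketch below is an illustration under the definitions of Section 1.1 (spikes $e_j$ and sines $f_j(t) = n^{-1/2} e^{2\pi \mathrm{i} jt/n}$, with indices running from 1 to $n$). The function name `dirac_comb_gap` is hypothetical.

```python
import cmath
import math


def dirac_comb_gap(n):
    """Largest entrywise gap between sum_{j in T} e_j and sum_{j in Omega} f_j.

    Here T = Omega = {sqrt(n), 2*sqrt(n), ..., n}, so n must be a perfect square.
    """
    r = math.isqrt(n)
    if r * r != n:
        raise ValueError("n must be a perfect square")
    comb = [k * r for k in range(1, r + 1)]  # the index set T = Omega
    gap = 0.0
    for t in range(1, n + 1):
        # Value at coordinate t of the sum of spikes e_j, j in T.
        spikes = sum(1.0 for j in comb if j == t)
        # Value at coordinate t of the sum of sines f_j, j in Omega.
        sines = sum(cmath.exp(2j * cmath.pi * j * t / n) for j in comb) / math.sqrt(n)
        gap = max(gap, abs(spikes - sines))
    return gap
```

A gap at the level of floating-point rounding error, for example for `dirac_comb_gap(16)`, indicates that the two sums coincide, so the corresponding collection of $2\sqrt{n}$ vectors is linearly dependent.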
The substance behind this example is that the abelian group $\mathbb{Z}/\mathbb{Z}_n$ contains nontrivial subgroups
when $n$ is composite. The presence of these subgroups leads to arithmetic cancelations for properly
chosen $T$ and $\Omega$. See [DS89] for additional discussion.
One way to eradicate the cancelation phenomenon is to require that $n$ be prime. In this case,
the group $\mathbb{Z}/\mathbb{Z}_n$ has no nontrivial subgroup. As a result, much larger collections of spikes and sines
are linearly independent. Compare the following result with Corollary 2.
Theorem 4 (Tao [Tao05, Thm. 1.1]). Suppose that $n$ is prime. If $|T| + |\Omega| \leq n$, then $\|F_{\Omega T}\| < 1$.
The proof of Theorem 4 is algebraic in nature, and it does not provide information about conditioning.
Indeed, one expects that some submatrices have norms very near to one.
When $n$ is composite, subgroups of $\mathbb{Z}/\mathbb{Z}_n$ exist, but they have a very rigid structure. Consequently,
one can also avoid cancelations by choosing $T$ and $\Omega$ with care. In particular, one may
consider the situation where $T$ is clustered and $\Omega$ is spread out. Donoho and Logan [DL92] study
this case using the analytic principle of the large sieve, a powerful technique from number theory
that can be traced back to the 1930s. See the lecture notes [Jam06] for an engaging introduction
and references.
Here, we simply restate the (sharp) large sieve inequality [Jam06, LS1.1] in a manner that
exposes its connection with our problem. The spread of a set is measured as the difference (modulo
$n$) between the closest pair of indices. Formally, define
$$\mathrm{spread}(\Omega) = \min\{|j - k \bmod n| : j, k \in \Omega, \ j \neq k\}$$
with the convention that the modulus returns values in the symmetric range $\{-\lceil n/2 \rceil + 1, \dots, \lfloor n/2 \rfloor\}$.
Observe that $|\Omega| \cdot \mathrm{spread}(\Omega) \leq n$.
Theorem 5 (Large Sieve Inequality). Suppose that $T$ is a block of adjacent indices:
$$T = \{m + 1, m + 2, \dots, m + |T|\} \quad \text{for an integer } m. \tag{2.1}$$
For each set $\Omega$, we have
$$\|F_{\Omega T}\|^2 \leq \frac{|T| + n/\mathrm{spread}(\Omega) - 1}{n}.$$
In particular, when $T$ has form (2.1), the bound $|T| + n/\mathrm{spread}(\Omega) < n + 1$ implies that $\|F_{\Omega T}\| < 1$.
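The spread and the resulting bound are simple to evaluate. The following sketch is an illustration only. The function names are hypothetical, indices are assumed to be integers, and `large_sieve_bound` returns the right-hand side of the inequality in Theorem 5 (a bound on the squared norm) for a block $T$ of the given size.

```python
from itertools import combinations


def symmetric_residue(d, n):
    """Reduce d modulo n into the symmetric range {-ceil(n/2)+1, ..., floor(n/2)}."""
    r = d % n
    return r - n if r > n // 2 else r


def spread(omega, n):
    """Smallest circular distance between two distinct indices in omega."""
    if len(omega) < 2:
        raise ValueError("spread needs at least two indices")
    return min(abs(symmetric_residue(j - k, n))
               for j, k in combinations(sorted(omega), 2))


def large_sieve_bound(block_size, omega, n):
    """Right-hand side of Theorem 5: (|T| + n/spread(omega) - 1) / n."""
    return (block_size + n / spread(omega, n) - 1) / n
```

For example, with $n = 16$ and $\Omega = \{4, 8, 12, 16\}$, a call such as `large_sieve_bound(3, {4, 8, 12, 16}, 16)` evaluates the bound for a block of three adjacent spikes. Values below one certify linear independence by Theorem 5.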
Of course, we can reverse the roles of $T$ and $\Omega$ in this theorem on account of duality. The same
observation applies to other results where the two sets do not participate in the same way.
The discussion above shows that there are cases where delicately constructed sets $T$ and $\Omega$ lead
to linearly dependent collections of spikes and sines. Explicit conditions that rule out the bad
examples are unknown, but nevertheless the bad examples turn out to be quite rare. To quantify
this intuition, we must introduce probability.
2.2. Bounds when one set is random. In their work [DS89, Sec. 7.3], Donoho and Stark discuss
numerical experiments designed to study what happens when one of the sets of spikes or sines is
drawn at random. They conjecture that the situation is vastly different from the case where the
spikes and sines are chosen in an arbitrary fashion. Within the last few years, researchers have made
substantial theoretical progress on this question. Indeed, we will see that the linearly dependent
collections form a vanishing proportion of all collections, provided that the total number of spikes
and sines is slightly smaller than the dimension $n$ of the vector space.
First, we describe a probability model for random sets. Fix a number $m \leq n$, and consider the
class $\mathcal{S}_m$ of index sets that have cardinality $m$:
$$\mathcal{S}_m = \{S : S \subset \{1, 2, \dots, n\} \text{ and } |S| = m\}.$$
We may construct a random set $\Omega$ by drawing an element from $\mathcal{S}_m$ uniformly at random. That is,
$$\mathbb{P}\{\Omega = S\} = |\mathcal{S}_m|^{-1} \quad \text{for each } S \in \mathcal{S}_m.$$
In the sequel, we substitute the symbol $|\Omega|$ for the letter $m$, and we say “$\Omega$ is a random set with
cardinality $|\Omega|$” to describe this type of random variable. This phrase should cause no confusion,
and it allows us to avoid extra notation for the cardinality.
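This probability model corresponds to sampling without replacement. The sketch below is one way to draw such a set with the Python standard library. The function name `random_index_set` and its `rng` parameter are hypothetical, and `random.sample` is assumed to be an adequate stand-in for an exactly uniform draw.

```python
import random
from math import comb


def random_index_set(n, m, rng=random):
    """Draw a subset of {1, ..., n} with exactly m elements, uniformly at random."""
    if not 0 <= m <= n:
        raise ValueError("need 0 <= m <= n")
    # Sampling without replacement gives every m-element subset the same
    # probability, namely 1 / comb(n, m).
    return frozenset(rng.sample(range(1, n + 1), m))


def class_size(n, m):
    """Number of index sets of cardinality m, i.e. the size of the class S_m."""
    return comb(n, m)
```

Passing a seeded generator, as in `random_index_set(512, 40, random.Random(0))`, makes an experiment reproducible.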
In the sparse approximation literature, the first rigorous result on random sets is due to Candès
and Romberg. They study the case where one of the sets is arbitrary and the other set is chosen
at random. Their proof draws heavily on their prior work with Tao [CRT06].
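Theorem 6 (Candès–Romberg [CR06, Thm. 3.2]). Fix a number $s \geq 1$. Suppose that
$$|T| + |\Omega| \leq \frac{\mathrm{c} n}{\sqrt{(s + 1) \log n}}. \tag{2.2}$$
If $T$ is an arbitrary set with cardinality $|T|$ and $\Omega$ is a random set with cardinality $|\Omega|$, then
$$\mathbb{P}\left\{ \|F_{\Omega T}\|^2 \geq 0.5 \right\} \leq \mathrm{C} ((s + 1) \log n)^{1/2} \, n^{-s}.$$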

The numerical constant $\mathrm{c} \geq 0.2791$, provided that $n \geq 512$.
One should interpret this theorem as follows. Fix a set $T$, and consider all sets $\Omega$ that satisfy (2.2).
Of these, the proportion that are not strongly linearly independent is only about $n^{-s}$. One should
be aware that the logarithmic factor in (2.2) is intrinsic when one of the sets is arbitrary. Indeed,
one can construct examples related to the Dirac comb which show that the failure probability is
constant unless the logarithmic factor is present. We omit the details.
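To get a feel for the size of the budget in (2.2), one can evaluate its right-hand side directly. The sketch below is an illustration only. The function name is hypothetical, the logarithm is assumed to be natural, and the default constant 0.2791 is the lower bound on $\mathrm{c}$ quoted with Theorem 6 for $n \geq 512$.

```python
import math


def candes_romberg_budget(n, s, c=0.2791):
    """Right-hand side of (2.2): c * n / sqrt((s + 1) * log n).

    Assumes the natural logarithm; c defaults to the lower bound quoted
    with Theorem 6 for n >= 512.
    """
    if n < 2 or s < 1:
        raise ValueError("need n >= 2 and s >= 1")
    return c * n / math.sqrt((s + 1) * math.log(n))


def donoho_stark_budget(n):
    """Threshold 2 * sqrt(n) on |T| + |Omega| from Corollary 2, for comparison."""
    return 2.0 * math.sqrt(n)
```

Comparing `candes_romberg_budget(n, 1)` with `donoho_stark_budget(n)` for growing $n$ shows how the randomized budget, which scales like $n/\sqrt{\log n}$, eventually dwarfs the deterministic $2\sqrt{n}$ threshold.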
The proof of Theorem 6 ultimately involves a variation of the moment method for studying
random matrices, which was initiated by Wigner. The key point of the argument is a bound on the
expected trace of a high power of the random matrix $\sqrt{n/|\Omega|} \cdot F_{\Omega T}^* F_{\Omega T} - I_{|T|}$. The calculations
involve delicate combinatorial techniques that depend heavily on the structure of the matrix $F$.
This approach can also be used to establish that the smallest singular value of $F_{\Omega T}$ is bounded
well away from zero [CRT06, Thm. 2.2]. This lower bound is essential in many applications, but
we do not need it here. For extensions of these ideas, see also the work of Rauhut [Rau07].
Another result, similar to Theorem 6, suggests that the arbitrary set and the random set do not
contribute equally to the spectral norm. We present one version, whose derivation is adapted from
[Tro07, Thm. 10 et seq.].
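Theorem 7. Fix a number $s \geq 1$. Suppose that
$$|T| \log n + |\Omega| \leq \frac{\mathrm{c} n}{s}.$$
If $T$ is an arbitrary set of cardinality $|T|$ and $\Omega$ is a random set of cardinality $|\Omega|$, then
$$\mathbb{P}\left\{ \|F_{\Omega T}\|^2 \geq 0.5 \right\} \leq n^{-s}.$$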
The proof of this theorem uses Rudelson’s selection lemma [Rud99, Sec. 2] in an essential way.
This lemma in turn hinges on the noncommutative Khintchine inequality [LP86, Buc01]. For a
related application of this approach, see [CR07].
Theorems 6 and 7 are interesting, but they do not predict that a far more striking phenomenon
occurs. A random collection of sines has the following property with high probability. To this
collection, one can add an arbitrary set of spikes without sacrificing linear independence.
Theorem 8. Fix a number $s \geq 1$, and assume $n \geq N(s)$. Except with probability $n^{-s}$, a random
set $\Omega$ whose cardinality $|\Omega| \leq n/3$ has the following property. For each set $T$ whose cardinality
$$|T| \leq \frac{\mathrm{c} n}{s \log^5 n},$$
it holds that $\|F_{\Omega T}\|^2 \leq 0.5$.
This result follows from the (deep) fact that a random row-submatrix of the DFT matrix satisfies
the restricted isometry property (RIP) with high probability. More precisely, a random set $\Omega$ with
cardinality $|\Omega|$ verifies the following condition, except with probability $n^{-s}$.
$$\frac{|\Omega|}{2n} \leq \|F_{\Omega T}\|^2 \leq \frac{3 |\Omega|}{2n} \quad \text{when} \quad |T| \leq \frac{\mathrm{c} |\Omega|}{s \log^5 n}. \tag{2.3}$$
This result is adapted from [RV06, Thm. 2.2 et seq.].
The bound (2.3) was originally established by Candès and Tao [CT06] for sets $T$ whose cardinality
$|T| \leq \mathrm{c} |\Omega| / (s \log^6 n)$. Rudelson and Vershynin developed a simpler proof and reduced the exponent
on the logarithm [RV06]. Experts believe that the correct exponent is just one or two, but this
conjecture is presently out of reach.
Proof. Let $\mathrm{c}$ be the constant in (2.3). Abbreviate $m = \mathrm{c} |\Omega| / (s \log^5 n)$, and assume that $m \geq 1$ for
now. Draw a random set $\Omega$ with cardinality $|\Omega|$, so relation (2.3) holds except with probability $n^{-s}$.
Select an arbitrary set $T$ whose cardinality $|T| \leq \mathrm{c} n / (6 s \log^5 n)$. We may assume that $2 |T| / m \geq 1$

References
[CRT06] Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information. Journal article.
[CT06] Near-optimal signal recovery from random projections: universal encoding strategies? Journal article.
[Dem97] Applied Numerical Linear Algebra. Book.
[CR07] Sparsity and incoherence in compressive sampling. Journal article.