APPLIED & COMPUTATIONAL MATHEMATICS
CALIFORNIA INSTITUTE OF TECHNOLOGY
Mail Code 217-50, Pasadena, CA 91125

Technical Report No. 2010-01
April 2010

USER-FRIENDLY TAIL BOUNDS
FOR SUMS OF RANDOM MATRICES

JOEL A. TROPP

USER-FRIENDLY TAIL BOUNDS
FOR SUMS OF RANDOM MATRICES
J. A. TROPP
Abstract. This work presents probability inequalities for sums of independent, random, self-
adjoint matrices. The results frame simple, easily verifiable hypotheses on the summands, and they
yield strong conclusions about the large-deviation behavior of the maximum eigenvalue of the sum.
Tail bounds for the norm of a sum of rectangular matrices follow as an immediate corollary, and
similar techniques yield information about matrix-valued martingales.
In other words, this paper provides noncommutative generalizations of the classical bounds
associated with the names Azuma, Bennett, Bernstein, Chernoff, Hoeffding, and McDiarmid. The
matrix inequalities promise the same ease of use, diversity of application, and strength of conclusion
that have made the scalar inequalities so valuable.
1. Introduction
Random matrices have come to play a significant role in computational mathematics. This line
of research has advanced by using established methods from random matrix theory, but it has also
generated difficult questions that cannot be addressed without new tools. Let us summarize some
of the challenges that arise.
- For numerical applications, it is important to obtain detailed quantitative information about random matrices of finite order. Asymptotic theory has limited value.
- Many problems require explicit large-deviation bounds for the extreme eigenvalues of a random matrix. In other cases, we are concerned not with the eigenvalue spectrum but rather with the action of a random operator on some class of vectors or matrices.
- In numerical analysis, it is essential to compute effective constants to ensure that an algorithm is provably correct in practice.
- We often encounter highly structured matrices that involve a limited amount of randomness. One important example is the randomized DFT, which consists of a diagonal matrix of signs multiplied by a discrete Fourier transform matrix (see the sketch after this list).
- Other problems involve a sparse matrix sampled from a fixed matrix or a random submatrix drawn from a fixed matrix. These applications lead to random matrices whose distribution varies by coordinate, in contrast to the classical ensembles of random matrices that have i.i.d. entries or i.i.d. columns.
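For concreteness, here is a minimal numerical sketch of the randomized DFT just mentioned: a diagonal matrix of random signs multiplied by a unitary DFT matrix. The dimension and the NumPy realization are our illustrative choices, not part of the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8

# Diagonal matrix of independent random signs: the only randomness is n sign bits.
D = np.diag(rng.choice([-1.0, 1.0], size=n))

# Unitary discrete Fourier transform matrix.
F = np.fft.fft(np.eye(n)) / np.sqrt(n)

# The randomized DFT: highly structured, with a limited amount of randomness.
randomized_dft = D @ F
```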
We have encountered these issues in a wide range of problems from computational mathemat-
ics: smoothed analysis of Gaussian elimination [SST06]; semidefinite relaxation and rounding of
quadratic maximization problems [Nem07, So09]; construction of maps for dimensionality reduc-
tion [AC09]; matrix approximation by sparsification [AM07] and by sampling submatrices [RV07];
Date: 25 April 2010. Corrected: 29 April 2010.
Key words and phrases. Discrete-time martingale, large deviation, random matrix, sum of independent random variables.
2010 Mathematics Subject Classification. Primary: 60B20. Secondary: 60F10, 60G50, 60G42.
JAT is with Applied and Computational Mathematics, MC 305-16, California Inst. Technology, Pasadena, CA
91125. E-mail: jtropp@acm.caltech.edu. Research supported by ONR award N00014-08-1-0883, DARPA award
N66001-08-1-2065, and AFOSR award FA9550-09-1-0643.

analysis of sparse approximation [Tro08] and compressive sampling [CR07] problems; random-
ized schemes for low-rank matrix factorization [HMT09]; and analysis of algorithms for comple-
tion [Gro09, Rec09] and decomposition [CSPW09, CLMW09] of low-rank matrices. And this list
is by no means comprehensive!
In these applications, the methods currently invoked to study random matrices are often cum-
bersome, and they require a substantial amount of practice to use effectively. These frustrations
have led us to search for simpler techniques that still yield detailed quantitative information about
finite random matrices.
Inspired by the work of Ahlswede–Winter [AW02] and Rudelson–Vershynin [Rud99, RV07], we
study sums of independent, random, self-adjoint matrices. Our results place simple and easily
verifiable hypotheses on the summands that allow us to reach strong conclusions about the large-
deviation behavior of the maximum eigenvalue of the sum. These bounds can be viewed as matrix
analogs of the probability inequalities associated with the names Azuma, Bennett, Bernstein, Chernoff,
Hoeffding, and McDiarmid. We hope that these new matrix inequalities will offer researchers
the same ease of use, diversity of application, and strength of conclusion that have made the scalar
inequalities so indispensable.
1.1. Roadmap. The rest of the paper is organized as follows. Section 2 provides an overview of
our main results and a discussion of related work. Section 3 introduces the background required
for our proofs, which ranges from the elementary to the esoteric. Section 4 contains the main
technical innovations. Sections 5–8 complete the proofs of the matrix probability inequalities.
Section 9 describes some complementary results, including the extension to rectangular matrices.
We conclude in Section 10 with some open questions.
2. Main Results and Discussion
Our goal has been to extend the most useful of the classical tail bounds to the matrix case, rather
than to produce a complete catalog of matrix inequalities. This approach allows us to introduce
several different techniques that are useful for making the translation from the scalar to the matrix
setting. This section summarizes the main results for easy reference. Section 2.6 describes some
additional theorems that may be found deeper inside the paper.
2.1. Technical Approach. Consider a finite sequence $\{X_k\}$ of independent, random, self-adjoint matrices. We wish to bound the probability
\[
\mathbb{P}\Bigl\{ \lambda_{\max}\Bigl( \sum\nolimits_k X_k \Bigr) \geq t \Bigr\}.
\]
Here and elsewhere, $\lambda_{\max}$ denotes the algebraically largest eigenvalue of a self-adjoint matrix. This
formulation is more general than it may appear because we can exploit the same ideas to explore
several related problems:
- We can study the smallest eigenvalue of the sum.
- We can bound the largest singular value of a sum of random rectangular matrices.
- We can extend these methods to matrix-valued martingales.
- We can investigate the probability that the sum satisfies other semidefinite relations.
In the matrix setting, the structure of the main argument parallels established proofs of the
classical inequalities. See [McD98, Lug09] for accessible surveys in the scalar setting. First, we
describe a suitable generalization of Bernstein’s argument, which is sometimes known as the Laplace
transform method. In the matrix setting, this approach yields the bound
\[
\mathbb{P}\Bigl\{ \lambda_{\max}\Bigl( \sum\nolimits_k X_k \Bigr) \geq t \Bigr\}
\leq \inf_{\theta > 0} \Bigl\{ e^{-\theta t} \cdot \operatorname{tr} \exp\Bigl( \sum\nolimits_k \log \mathbb{E}\, e^{\theta X_k} \Bigr) \Bigr\}.
\]
In words, the probability of a large deviation is controlled by the “cumulant generating functions”
of the random matrices. Although this inequality superficially resembles the classical Laplace

transform bound for real random variables, the proof is no longer elementary. Our argument relies
on a deep inequality of Lieb [Lie73, Thm. 6]. This part of the reasoning appears in Section 4.
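To make the bound concrete, the following sketch evaluates the matrix Laplace transform bound for a small Rademacher sum, where $\mathbb{E}\, e^{\theta \varepsilon_k A_k} = \cosh(\theta A_k)$, and compares it against a Monte Carlo estimate of the tail probability. The dimensions, coefficient matrices, and grid of $\theta$ values are our illustrative choices, not taken from the paper.

```python
import numpy as np
from scipy.linalg import expm, logm

rng = np.random.default_rng(0)
d, n = 4, 10

# Illustrative self-adjoint (symmetric) coefficient matrices.
As = []
for _ in range(n):
    G = rng.standard_normal((d, d))
    As.append((G + G.T) / 6)

def cosh_matrix(A, theta):
    # E exp(theta * eps * A) = cosh(theta * A) for a Rademacher sign eps.
    w, V = np.linalg.eigh(A)
    return (V * np.cosh(theta * w)) @ V.T

def laplace_transform_bound(t):
    # inf over a grid of theta of exp(-theta*t) * tr exp( sum_k log E exp(theta X_k) )
    best = np.inf
    for theta in np.linspace(0.05, 5.0, 100):
        S = sum(logm(cosh_matrix(A, theta)) for A in As)
        best = min(best, np.exp(-theta * t) * np.trace(expm(S)).real)
    return best

# Monte Carlo estimate of P{ lambda_max(sum_k eps_k A_k) >= t }.
t, trials, hits = 1.5, 20000, 0
for _ in range(trials):
    eps = rng.choice([-1.0, 1.0], size=n)
    S = sum(e * A for e, A in zip(eps, As))
    hits += np.linalg.eigvalsh(S)[-1] >= t
print("empirical tail:", hits / trials, "  Laplace bound:", laplace_transform_bound(t))
```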
As in the scalar case, the second stage of the development uses information about each random
matrix to obtain bounds for the “cumulant generating functions.” Certain classical methods extend
directly to the matrix case, but they usually require additional care. Other proofs do not generalize
at all, and we have to identify alternative approaches. Sections 5–8 present these arguments.
Let us emphasize that many of the ideas in this work have appeared in the literature. The primary
precedent is the important paper of Ahlswede and Winter [AW02], which develops a matrix analog
of the Laplace transform method; see also [Gro09, Rec09]. We have been influenced strongly by
Rudelson and Vershynin’s approach [Rud99, RV07] to random matrices via the noncommutative
Khintchine inequality [LP86, Buc01]. Finally, the recent work of Oliveira [Oli10b] persuaded us
that it might be possible to combine the best qualities of these two approaches.
2.2. Rademacher and Gaussian Series. For motivation, we begin with the simplest example
of a sum of independent random variables: a series with real coefficients modulated by random
signs. This discussion illustrates some new phenomena that arise when we try to translate scalar
tail bounds to the matrix setting.
Consider a finite sequence $\{a_k\}$ of real numbers and a finite sequence $\{\varepsilon_k\}$ of independent Rademacher variables. (A Rademacher random variable is uniformly distributed on $\{\pm 1\}$.) A classical result, due to Bernstein, shows that
\[
\mathbb{P}\Bigl\{ \sum\nolimits_k \varepsilon_k a_k \geq t \Bigr\} \leq e^{-t^2/2\sigma^2}
\quad\text{where}\quad \sigma^2 = \sum\nolimits_k a_k^2. \tag{2.1}
\]
In words, a real Rademacher series exhibits normal concentration with variance equal to the sum
of the squared coefficients. The central limit theorem guarantees that there are Rademacher series
where this estimate is essentially sharp.
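As a quick sanity check on (2.1), the following sketch compares the bound with an empirical tail estimate; the coefficient sequence, threshold, and sample size are our arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)
a = rng.standard_normal(50)       # arbitrary real coefficients
sigma2 = np.sum(a ** 2)           # variance parameter in (2.1)
t = 2.5 * np.sqrt(sigma2)

# Empirical tail of the Rademacher series versus the Bernstein bound.
eps = rng.choice([-1.0, 1.0], size=(200_000, a.size))
empirical = np.mean(eps @ a >= t)
print("empirical:", empirical, "  bound:", np.exp(-t ** 2 / (2 * sigma2)))
```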
What is the correct generalization of (2.1) to random matrices? The approach of Ahlswede and
Winter [AW02] suggests the bound
\[
\mathbb{P}\Bigl\{ \lambda_{\max}\Bigl( \sum\nolimits_k \varepsilon_k A_k \Bigr) \geq t \Bigr\} \leq d \cdot e^{-t^2/2\sigma^2}
\quad\text{where}\quad \sigma^2 = \sum\nolimits_k \bigl\| A_k^2 \bigr\|. \tag{2.2}
\]
The symbol $\|\cdot\|$ denotes the usual norm for operators on a Hilbert space, which returns the largest
singular value of its argument. Although the statement (2.2) identifies a plausible generalization for
the variance, this result can be improved dramatically in most cases. Indeed, a matrix Rademacher
series satisfies a fundamentally stronger tail bound.
Theorem 2.1 (Matrix Rademacher and Gaussian Series). Consider a finite sequence $\{A_k\}$ of fixed self-adjoint matrices with dimension $d$, and let $\{\varepsilon_k\}$ be a sequence of independent Rademacher variables. Compute the norm of the sum of squared coefficient matrices:
\[
\sigma^2 = \Bigl\| \sum\nolimits_k A_k^2 \Bigr\|. \tag{2.3}
\]
For all $t \geq 0$,
\[
\mathbb{P}\Bigl\{ \lambda_{\max}\Bigl( \sum\nolimits_k \varepsilon_k A_k \Bigr) \geq t \Bigr\} \leq d \cdot e^{-t^2/2\sigma^2}. \tag{2.4}
\]
In particular,
\[
\mathbb{P}\Bigl\{ \Bigl\| \sum\nolimits_k \varepsilon_k A_k \Bigr\| \geq t \Bigr\} \leq 2d \cdot e^{-t^2/2\sigma^2}. \tag{2.5}
\]
The same bounds hold when we replace $\{\varepsilon_k\}$ by a sequence of independent, standard normal random variables.
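In computations, Theorem 2.1 reduces to two steps: form the matrix variance (2.3) and plug it into (2.4) or (2.5). A minimal sketch follows; the function names are ours, not the paper's.

```python
import numpy as np

def matrix_variance(As):
    """The parameter sigma^2 of (2.3): spectral norm of the sum of squares."""
    return np.linalg.norm(sum(A @ A for A in As), ord=2)

def rademacher_series_tail(As, t):
    """Theorem 2.1: upper-tail bounds at level t >= 0 for sum_k eps_k A_k."""
    d = As[0].shape[0]
    sigma2 = matrix_variance(As)
    lambda_max_bound = d * np.exp(-t ** 2 / (2 * sigma2))    # (2.4)
    norm_bound = 2 * d * np.exp(-t ** 2 / (2 * sigma2))      # (2.5)
    return lambda_max_bound, norm_bound
```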

When the dimension d = 1, the bound (2.4) reduces to the classical result (2.1). Of course, one
may still wonder whether the formula (2.3) for the variance is sharp and whether the dimensional
dependence is necessary. Remarks 2.2, 2.3, and 2.4 demonstrate that Theorem 2.1 cannot be
improved without changing its form. A casual reader may bypass this discussion without loss of
continuity.
The technology required to prove Theorem 2.1 has been available for some time now. One
argument applies sharp noncommutative Khintchine inequalities, [Buc01, Thm. 5] and [Buc05,
Thm. 5], to bound the moment generating function of the maximum eigenvalue of the random sum.
Very recently, Oliveira has developed a different approach [Oli10b, Lem. 2] using a clever variation
of Ahlswede and Winter’s techniques. We present our proof in Section 7.
Remark 2.2. The matrix variance $\sigma^2$ given by (2.3) is truly the correct quantity for controlling large deviations of a matrix Gaussian series. Indeed, it follows from general principles [LT91, Cor. 3.2] that
\[
\lim_{t \to \infty} \frac{1}{t^2} \log \mathbb{P}\Bigl\{ \Bigl\| \sum\nolimits_k \gamma_k A_k \Bigr\| \geq t \Bigr\} = -\frac{1}{2\sigma^2},
\]
where $\{\gamma_k\}$ is a sequence of independent, standard normal variables. By the (scalar) central limit theorem, we can construct Rademacher series that exhibit essentially the same large-deviation behavior by repeating each matrix $A_k$ multiple times. (Of course, a finite Rademacher series is almost surely bounded!)
In contrast to a Gaussian series, a Rademacher series can have a constant operator norm. Nev-
ertheless, the matrix variance in (2.3) always provides a lower bound for the supremal norm of the
series:
\[
\sigma \leq \sup_{\varepsilon} \Bigl\| \sum\nolimits_k \varepsilon_k A_k \Bigr\|.
\]
This fact follows easily from the statement of the noncommutative Khintchine inequality in [Rud99, Sec. 3]. A simple example shows that the lower bound is sharp. Let $E_{ij}$ be the matrix with a unit entry in the $(i, j)$ position and zeros elsewhere, and consider the Rademacher series with coefficients $A_k = E_{kk}$ for $k = 1, 2, \ldots, d$. This example also demonstrates that the bound (2.2) is fundamentally worse than Theorem 2.1.
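The diagonal example makes the gap between (2.2) and (2.3) easy to quantify: with $A_k = E_{kk}$, the Ahlswede–Winter variance $\sum_k \|A_k^2\|$ equals $d$, while the matrix variance $\|\sum_k A_k^2\|$ equals $1$. A quick numerical check, with an arbitrarily chosen dimension:

```python
import numpy as np

d = 100
Es = [np.zeros((d, d)) for _ in range(d)]
for k in range(d):
    Es[k][k, k] = 1.0   # E_kk: unit entry in position (k, k)

aw_variance = sum(np.linalg.norm(E @ E, ord=2) for E in Es)       # (2.2): equals d
thm_variance = np.linalg.norm(sum(E @ E for E in Es), ord=2)      # (2.3): equals 1
print(aw_variance, thm_variance)   # 100.0 vs 1.0
```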
Remark 2.3. In general, we cannot remove the factor d from the probability bound in Theorem 2.1.
Consider the Gaussian series
\[
\Bigl\| \sum\nolimits_{k=1}^{d} \gamma_k E_{kk} \Bigr\| = \max\nolimits_k |\gamma_k| \geq c \sqrt{\log d}
\quad\text{with high probability.}
\]
Since the variance parameter $\sigma^2 = 1$, Theorem 2.1 yields
\[
\mathbb{P}\Bigl\{ \Bigl\| \sum\nolimits_{k=1}^{d} \gamma_k E_{kk} \Bigr\| \geq t \Bigr\} \leq d \cdot e^{-t^2/2}.
\]
We need the factor $d$ to ensure that the probability bound does not become effective until $t \geq \sqrt{2 \log d}$. The dimensional factor is also necessary in the tail bound for Rademacher series because of the central limit theorem.
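A direct simulation, with an arbitrary dimension of our choosing, confirms that the maximum of $d$ independent standard normals concentrates near $\sqrt{2 \log d}$, so the bound in Remark 2.3 only becomes active at essentially the right scale:

```python
import numpy as np

rng = np.random.default_rng(2)
d, trials = 10_000, 200
maxima = [np.max(np.abs(rng.standard_normal(d))) for _ in range(trials)]
print(np.mean(maxima), np.sqrt(2 * np.log(d)))   # both land near sqrt(2 log d) ~ 4.3
```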
Remark 2.4. The dimensional dependence does not appear in standard bounds for Rademacher
series in Banach space because they concern the deviation of the norm of the sum above its mean
value. For example, Ledoux [Led96, Eqn. (1.9)] proves that
\[
\mathbb{P}\Bigl\{ \Bigl\| \sum\nolimits_k \varepsilon_k A_k \Bigr\| \geq \mathbb{E}\Bigl\| \sum\nolimits_k \varepsilon_k A_k \Bigr\| + t \Bigr\} \leq e^{-t^2/8\sigma^2},
\]
where $\sigma^2$ is given by (2.3). Unfortunately, this formula provides no information about the size of the expectation. In contrast, we can always bound the expectation by integrating (2.5), although the estimate may not be sharp.
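To spell out the last sentence, here is one way to carry out the integration; the computation is ours, using the standard split of the integral at the point $t_0 = \sigma\sqrt{2\log(2d)}$ where the bound (2.5) equals one:
\[
\mathbb{E}\Bigl\| \sum\nolimits_k \varepsilon_k A_k \Bigr\|
= \int_0^\infty \mathbb{P}\Bigl\{ \Bigl\| \sum\nolimits_k \varepsilon_k A_k \Bigr\| \geq t \Bigr\}\, \mathrm{d}t
\leq t_0 + \int_{t_0}^\infty 2d\, e^{-t^2/2\sigma^2}\, \mathrm{d}t
\leq \sigma\sqrt{2\log(2d)} + \sigma\sqrt{\pi/2},
\]
where the last step writes $t = t_0 + s$ and uses $t^2 \geq t_0^2 + s^2$, so that the integrand is at most $e^{-s^2/2\sigma^2}$.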

References

R. A. Horn and C. R. Johnson. Matrix Analysis. Cambridge University Press.
R. A. Horn and C. R. Johnson. Topics in Matrix Analysis. Cambridge University Press.
R. Motwani and P. Raghavan. Randomized Algorithms. Cambridge University Press.
H. Chernoff. A measure of asymptotic efficiency for tests of a hypothesis based on the sum of observations. Annals of Mathematical Statistics.
N. Halko, P.-G. Martinsson, and J. A. Tropp. Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions.
Frequently Asked Questions (1)
Q1. What have the authors contributed in "User-friendly tail bounds for sums of random matrices" ?

This work presents probability inequalities for sums of independent, random, selfadjoint matrices. Tail bounds for the norm of a sum of rectangular matrices follow as an immediate corollary, and similar techniques yield information about matrix-valued martingales. In other words, this paper provides noncommutative generalizations of the classical bounds associated with the names Azuma, Bennett, Bernstein, Chernoff, Hoeffding, and McDiarmid.