Multiplying matrices in O(n^2.373) time
Virginia Vassilevska Williams, Stanford University
July 1, 2014
Abstract
We develop new tools for analyzing matrix multiplication constructions similar to the Coppersmith-
Winograd construction, and obtain a new improved bound on ω < 2.372873.
1 Introduction
The product of two matrices is one of the most basic operations in mathematics and computer science. Many
other essential matrix operations can be efficiently reduced to it, such as Gaussian elimination, LUP decom-
position, the determinant or the inverse of a matrix [1]. Matrix multiplication is also used as a subroutine in
many computational problems that, on the face of it, have nothing to do with matrices. As a small sample
illustrating the variety of applications, there are faster algorithms relying on matrix multiplication for graph
transitive closure (see e.g. [1]), context-free grammar parsing [21], and even learning juntas [13].
Until the late 1960s it was believed that computing the product C of two n × n matrices requires
essentially a cubic number of operations, as the fastest algorithm known was the naive algorithm which
indeed runs in O(n^3) time. In 1969, Strassen [19] excited the research community by giving the first
subcubic time algorithm for matrix multiplication, running in O(n^2.808) time. This amazing discovery
spawned a long line of research which gradually reduced the matrix multiplication exponent ω over time. In
1978, Pan [14] showed ω < 2.796. The following year, Bini et al. [4] introduced the notion of border rank
and obtained ω < 2.78. Schönhage [17] generalized this notion in 1981, proved his τ-theorem (also called
the asymptotic sum inequality), and showed that ω < 2.548. In the same paper, combining his work with
ideas by Pan, he also showed ω < 2.522. The following year, Romani [15] found that ω < 2.517. The first
result to break 2.5 was by Coppersmith and Winograd [9] who obtained ω < 2.496. In 1986, Strassen [20]
introduced his laser method which allowed for an entirely new attack on the matrix multiplication problem.
He also decreased the bound to ω < 2.479. Three years later, Coppersmith and Winograd [10] combined
Strassen’s technique with a novel form of analysis based on large sets avoiding arithmetic progressions and
obtained the famous bound of ω < 2.376 which has remained unchanged for more than twenty years.
In 2003, Cohn and Umans [8] introduced a new, group-theoretic framework for designing and analyzing
matrix multiplication algorithms. In 2005, together with Kleinberg and Szegedy [7], they obtained several
novel matrix multiplication algorithms using the new framework, however they were not able to beat 2.376.
Many researchers believe that the true value of ω is 2. In fact, both Coppersmith and Winograd [10]
and Cohn et al. [7] presented conjectures which if true would imply ω = 2. Recently, Alon, Shpilka and
Umans [2] showed that both the Coppersmith-Winograd conjecture and one of the Cohn et al. [7] conjectures
contradict a variant of the widely believed sunflower conjecture of Erdős and Rado [12]. Nevertheless, it
could be that at least the remaining Cohn et al. conjecture could lead to a proof that ω = 2.

The Coppersmith-Winograd Algorithm. In this paper we revisit the Coppersmith-Winograd (CW) ap-
proach [10]. We give a very brief summary of the approach here; we will give a more detailed account in
later sections.
One first constructs an algorithm A which, given Q-length vectors x and y for constant Q, computes Q
values of the form $z_k = \sum_{i,j} t_{ijk} x_i y_j$, say with $t_{ijk} \in \{0, 1\}$, using a smaller number of products than would
naively be necessary. The values z_k do not necessarily have to correspond to entries from a matrix product.
Then, one considers the algorithm A^n obtained by applying A to vectors x, y of length Q^n, recursively n
times as follows. Split x and y into Q subvectors of length Q^{n-1}. Then run A on x and y treating them
as vectors of length Q with entries that are vectors of length Q^{n-1}. When the product of two entries is
needed, use A^{n-1} to compute it. This algorithm A^n is called the nth tensor power of A. Its running time is
essentially O(r^n) if r is the number of multiplications performed by A.
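To make the recursive definition concrete, the following is a minimal sketch (in Python, with our own illustrative names and data layout, not anything taken from the paper) of applying the nth tensor power of a bilinear algorithm A. Here A is assumed to be given explicitly by coefficient lists alpha, beta, gamma in the bilinear-algorithm representation recalled in Section 1.1 below, and it performs r multiplications on Q-length vectors.

def apply_tensor_power(alpha, beta, gamma, x, y, n):
    """Apply A^n to vectors x, y of length Q**n and return z of length Q**n.
    A is given by r-by-Q coefficient lists alpha, beta, gamma."""
    r, Q = len(alpha), len(alpha[0])
    if n == 0:
        return [x[0] * y[0]]                           # length-1 vectors: a scalar product
    m = Q ** (n - 1)
    xs = [x[i * m:(i + 1) * m] for i in range(Q)]      # Q subvectors of length Q^(n-1)
    ys = [y[j * m:(j + 1) * m] for j in range(Q)]
    z = [0] * (Q ** n)
    for l in range(r):
        # entrywise linear combinations of the subvectors, as dictated by A
        u = [sum(alpha[l][i] * xs[i][t] for i in range(Q)) for t in range(m)]
        v = [sum(beta[l][j] * ys[j][t] for j in range(Q)) for t in range(m)]
        P = apply_tensor_power(alpha, beta, gamma, u, v, n - 1)   # recursive product
        for k in range(Q):
            for t in range(m):
                z[k * m + t] += gamma[l][k] * P[t]
    return z

The recursion performs r^n scalar multiplications in total, matching the O(r^n) running time stated above.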
The goal of the approach is to show that for very large n one can set enough variables x_i, y_j, z_k to 0 so
that running A^n on the resulting vectors x and y actually computes a matrix product. That is, as n grows,
some subvectors x′ of x and y′ of y can be thought to represent square matrices and when A^n is run on x
and y, a subvector of z is actually the matrix product of x′ and y′.
If A^n can be used to multiply m × m matrices in O(r^n) time, then this implies that ω ≤ log_m(r^n), so
that the larger m is, the better the bound on ω.
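As a familiar sanity check (our example, not one from the paper): taking A to be Strassen's algorithm, which multiplies 2 × 2 matrices with r = 7 products, the nth tensor power multiplies m × m = 2^n × 2^n matrices using 7^n products, giving ω ≤ log_{2^n}(7^n) = log_2 7 ≈ 2.807.

import math
r, n = 7, 10
m = 2 ** n                     # A^n multiplies m x m matrices using r^n products
print(math.log(r ** n, m))     # 2.807..., i.e. log_2(7), independent of n for this base case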
Coppersmith and Winograd [10] introduced techniques which, when combined with previous techniques
by Schönhage [17], allowed them to effectively choose which variables to set to 0 so that one can compute
very large matrix products using A^n. Part of their techniques rely on partitioning the index triples i, j, k ∈ [Q]^n
into groups and analyzing how “similar” each group g computation $\{z_{k,g} = \sum_{i,j:\,(i,j,k)\in g} t_{ijk} x_i y_j\}_k$ is
to a matrix product. The similarity measure used is called the value of the group.
Depending on the underlying algorithm A, the partitioning into groups varies and can affect the final
bound on ω. Coppersmith and Winograd analyzed a particular algorithm A which resulted in ω < 2.39.
Then they noticed that if one uses A^2 as the basic algorithm (the “base case”) instead, one can obtain the
better bound ω < 2.376. They left as an open problem what happens if one uses A^3 as the basic algorithm
instead.
Our contribution. We give a new way to more tightly analyze the techniques behind the Coppersmith-
Winograd (CW) approach [10]. We demonstrate the effectiveness of our new analysis by showing that the
8th tensor power of the CW algorithm [10] in fact gives ω < 2.3729. (The conference version of this paper
claimed ω < 2.3727, but due to an error, this turned out to be incorrect in the fourth decimal place.)
There are two main theorems behind our approach. The first theorem takes any tensor power A^n of a
basic algorithm A, picks a particular group partitioning for A^n and derives a procedure computing formulas
for (lower bounds on) the values of these groups.
The second theorem assumes that one knows the values for A^n and derives an efficient procedure which
outputs a (nonlinear) constraint program on O(n^2) variables, the solution of which gives a bound on ω.
We then apply the procedures given by the theorems to the second, fourth and eighth tensor powers of
the Coppersmith-Winograd algorithm, obtaining improved bounds with each new tensor power.
Similar to [10], our proofs apply to any starting algorithm that satisfies a simple uniformity requirement
which we formalize later. The upshot of our approach is that now any such algorithm and its higher tensor
powers can be analyzed entirely by computer. (In fact, our analysis of the 8th tensor power of the CW
algorithm is done this way.) The burden is now entirely offloaded to constructing base algorithms satisfying
the requirement. We hope that some of the new group-theoretic techniques can help in this regard.

Why wasn’t an improvement on CW found in the 1990s? After all, the CW paper explicitly posed the
analysis of the third tensor power as an open problem.
The answer to this question is twofold. Firstly, several people have attempted to analyze the third tensor
power (from personal communication with Umans, Kleinberg and Coppersmith). As the author found out
from personal experience, analyzing the third tensor power turns out to be very disappointing: no
improvement whatsoever can be found. This finding led some to believe that 2.376 may be the final answer,
at least for the CW algorithm.
The second issue is that with each new tensor power, the number of new values that need to be analyzed
grows quadratically. For the eighth tensor power for instance, 30 separate analyses are required! Prior to
our work, each of these analyses would require a separate application of the CW techniques. It would have
required an enormous amount of patience to analyze larger tensor powers, and since the third tensor power
does not give any improvement, the prospects looked bleak.
Stothers’ work. We were recently made aware of the thesis work of A. Stothers [18] in which he claims an
improvement to ω. (More recently, a journal paper by Davie and Stothers provides a more detailed account of
Stothers’ work [11]). Stothers argues that ω < 2.3737 by analyzing the 4th tensor power of the Coppersmith-
Winograd construction. Our approach can be seen as a vast generalization of the Coppersmith-Winograd
analysis. In the special case of even tensor powers, part of our proof has benefited from an observation of
Stothers’ which we will point out in the main text.
There are several differences between our approach and Stothers’. The first is relatively minor: the CW
approach requires the use of some hash functions; ours are different and simpler than Stothers’. The main
difference is that because of the generality of our analysis, we do not need to fully analyze all groups of
each tensor power construction. Instead we can just apply our formulas in a mechanical way. Stothers, on
the other hand, did a completely separate analysis of each group.
Finally, Stothers’ approach only works for tensor powers up to 4. Starting with the 5th tensor power,
the values of some of the groups begin to depend on more variables and a more careful analysis is needed.
(Incidentally, we also obtain a better bound from the 4th tensor power, ω < 2.37293, however we believe
this is an artifact of our optimization software, as we end up solving an equivalent constraint program.)
Acknowledgments. The author would like to thank Satish Rao for encouraging her to explore the matrix
multiplication problem more thoroughly and Ryan Williams for his support. The author is extremely grateful
to François Le Gall who alerted her to Stothers’ work, suggested the use of NLOPT, and pointed out that
the feasible solution obtained by Stothers for his 4th tensor power constraint program can be improved to
ω < 2.37294 with a different setting of the parameters. François also uncovered a flaw in a prior version of
the paper, which we have fixed in the current version. He was also recently able to improve our bound on ω
slightly to 2.37287.
Preliminaries. We use the following notation: [n] := {1, . . . , n}, and the multinomial coefficient
$$\binom{N}{[a_i]_{i\in[k]}} := \binom{N}{a_1,\ldots,a_k}.$$
We define ω ≥ 2 to be the infimum over the set of all reals r such that n × n matrix multiplication
over Q can be computed in n^r additions and multiplications for some natural number n. (However, the CW
approach and our extensions work over any ring.)
A three-term arithmetic progression is a sequence of three integers a ≤ b ≤ c so that b − a = c − b, or
equivalently, a + c = 2b. An arithmetic progression is nontrivial if a < b < c.
The following is a theorem by Behrend [3] improving on Salem and Spencer [16]. The subset A com-
puted by the theorem is called a Salem-Spencer set.

Theorem 1. There exists an absolute constant c such that for every N ≥ exp(c^2), one can construct in
poly(N) time a subset A ⊆ [N] with no three-term arithmetic progressions and $|A| > N \exp(-c\sqrt{\log N})$.
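As a toy illustration of progression-free sets (not Behrend's construction, which is far denser): integers whose base-3 expansion uses only the digits 0 and 1 contain no nontrivial three-term arithmetic progression. The sketch below builds such a set and brute-force checks the property; it yields only about N^{0.63} elements, much smaller than what Theorem 1 guarantees.

def base3_digits(x):
    ds = []
    while x:
        ds.append(x % 3)
        x //= 3
    return ds

def toy_ap_free_set(N):
    # integers in [N] whose base-3 digits are all 0 or 1
    return [x for x in range(1, N + 1) if all(d in (0, 1) for d in base3_digits(x))]

def has_three_term_ap(A):
    # is there a nontrivial progression a < (a+c)/2 < c inside A?
    S, A = set(A), sorted(A)
    return any((a + c) % 2 == 0 and (a + c) // 2 in S
               for i, a in enumerate(A) for c in A[i + 1:])

A = toy_ap_free_set(1000)
print(len(A), has_three_term_ap(A))   # a few dozen elements, and no 3-term AP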
The following lemma is needed in our analysis.
Lemma 1. Let k be a constant. Let B_i be fixed for i ∈ [k]. Let a_i for i ∈ [k] be variables such that a_i ≥ 0
and $\sum_i a_i = 1$. Then, as N goes to infinity, the quantity
$$\binom{N}{[a_i N]_{i\in[k]}} \prod_{i=1}^{k} B_i^{a_i N}$$
is maximized for the choices $a_i = B_i / \sum_{j=1}^{k} B_j$ for all i ∈ [k], and for these choices it is at least
$$\Big(\sum_{j=1}^{k} B_j\Big)^{N} \Big/ (N + 1)^{k}.$$
Proof. We will prove the lemma by induction on k. Suppose that k = 2 and consider
$$\binom{N}{aN} x^{aN} y^{N(1-a)} = y^{N} \binom{N}{aN} (x/y)^{aN},$$
where x ≤ y.
When (x/y) ≤ 1, the function $f(a) = \binom{N}{aN} (x/y)^{aN}$ of a is concave for a ≤ 1/2. Hence its maximum
is achieved when ∂f(a)/∂a = 0. Consider f(a): it is N!/((aN)!(N(1 − a))!) · (x/y)^{aN}. We can take the
logarithm to obtain ln f(a) = ln(N!) + Na ln(x/y) − ln((aN)!) − ln((N(1 − a))!). f(a) grows exactly
when a ln(x/y) − ln((aN)!)/N − ln((N(1 − a))!)/N does. Taking Stirling’s approximation, we obtain
$$a \ln(x/y) - \ln((aN)!)/N - \ln((N(1-a))!)/N = a \ln(x/y) - a \ln(a) - (1-a)\ln(1-a) - \ln N - O((\log N)/N).$$
Since N is large, the O((log N)/N) term is negligible. Thus we are interested in when g(a) =
a ln(x/y) − a ln(a) − (1 − a) ln(1 − a) is maximized. Because of concavity, for a ≤ 1/2 and x ≤ y,
the function is maximized when ∂g(a)/∂a = 0, i.e. when
$$0 = \ln(x/y) - \ln(a) - 1 + \ln(1-a) + 1 = \ln(x/y) - \ln\big(a/(1-a)\big).$$
Hence a/(1 − a) = x/y and so a = x/(x + y).
Furthermore, since the maximum is attained for this value of a, we get that for each t ∈ {0, . . . , N}
we have that
$$\binom{N}{t} x^{t} y^{N-t} \le \binom{N}{aN} x^{aN} y^{N(1-a)},$$
and since $\sum_{t=0}^{N} \binom{N}{t} x^{t} y^{N-t} = (x + y)^{N}$, we obtain that for a = x/(x + y),
$$\binom{N}{aN} x^{aN} y^{N(1-a)} \ge (x + y)^{N} / (N + 1).$$
Now let’s consider the case k > 2. First assume that the B_i are sorted so that B_i ≥ B_{i+1}. Since
$\sum_i a_i = 1$, we obtain
$$\binom{N}{[a_i N]_{i\in[k]}} \prod_{i=1}^{k} B_i^{a_i N} = \Big(\sum_i B_i\Big)^{N} \binom{N}{[a_i N]_{i\in[k]}} \prod_{i=1}^{k} b_i^{a_i N},$$

where $b_i = B_i / \sum_j B_j$. We will prove the claim for $\binom{N}{[a_i N]_{i\in[k]}} \prod_{i=1}^{k} b_i^{a_i N}$, and the lemma will follow for the
B_i as well. Hence we can assume that $\sum_i b_i = 1$.
Suppose that we have proven the claim for k − 1. This means that in particular
$$\binom{N - a_1 N}{[a_j N]_{j\ge 2}} \prod_{j=2}^{k} b_j^{a_j N} \ge \Big(\sum_{j=2}^{k} b_j\Big)^{N - a_1 N} \Big/ (N + 1)^{k-1},$$
and the quantity is maximized for $a_j N/(N - a_1 N) = b_j / \sum_{j\ge 2} b_j$ for all j ≥ 2.
Now consider $\binom{N}{a_1 N} b_1^{a_1 N} \big(\sum_{j=2}^{k} b_j\big)^{N - a_1 N}$. By our base case we get that this is maximized and is at
least $\big(\sum_{j=1}^{k} b_j\big)^{N} / (N + 1)$ for the setting a_1 = b_1. Hence, we will get
$$\binom{N}{[a_j N]_{j\in[k]}} \prod_{j=1}^{k} b_j^{a_j N} \ge \Big(\sum_{j=1}^{k} b_j\Big)^{N} \Big/ (N + 1)^{k},$$
for the setting a_1 = b_1 and, for j ≥ 2, $a_j N/(N - a_1 N) = b_j / \sum_{j\ge 2} b_j$, which implies $a_j/(1 - b_1) = b_j/(1 - b_1)$
and hence a_j = b_j. We have proven the lemma.
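A quick numerical sanity check of Lemma 1 (our own, with arbitrarily chosen B_i and a value of N for which the a_i N are integers; the rounding step is a convenience the asymptotic statement ignores):

from math import factorial, prod

def multinomial(N, parts):
    assert sum(parts) == N
    return factorial(N) // prod(factorial(p) for p in parts)

B = [3.0, 2.0, 1.0]
k, N = len(B), 60
a = [Bi / sum(B) for Bi in B]            # the maximizing choice a_i = B_i / sum_j B_j
parts = [round(ai * N) for ai in a]
parts[-1] = N - sum(parts[:-1])          # force the rounded parts to sum to N
value = multinomial(N, parts) * prod(Bi ** p for Bi, p in zip(B, parts))
bound = sum(B) ** N / (N + 1) ** k
print(value >= bound, value / bound)     # True, with room to spare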
1.1 A brief summary of the techniques used in bilinear matrix multiplication algorithms
A full exposition of the techniques can be found in the book by Bürgisser, Clausen and Shokrollahi [6]. The
lecture notes by Bläser [5] are also a nice read.
Bilinear algorithms and trilinear forms. Matrix multiplication is an example of a trilinear form. n × n
matrix multiplication, for instance, can be written as
$$\sum_{i,j\in[n]} \sum_{k\in[n]} x_{ik}\, y_{kj}\, z_{ij},$$
which corresponds to the equalities $z_{ij} = \sum_{k\in[n]} x_{ik} y_{kj}$ for all i, j ∈ [n]. In general, a trilinear form has the
form $\sum_{i,j,k} t_{ijk}\, x_i y_j z_k$ where i, j, k are indices in some range and t_{ijk} are the coefficients which define the
trilinear form; t_{ijk} is also called a tensor. The trilinear form for the product of a k × m by an m × n matrix
is denoted by ⟨k, m, n⟩.
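For concreteness, a short sketch (with our own encoding conventions, chosen only for illustration) that builds the tensor t_{ijk} of ⟨k, m, n⟩ with matrix positions flattened in row-major order, and evaluates a trilinear form given as such a coefficient dictionary:

def mat_mult_tensor(k, m, n):
    # t[(i, j, l)] = 1 exactly when x_i, y_j, z_l encode x_{ab}, y_{bc}, z_{ac}
    t = {}
    for a in range(k):
        for b in range(m):
            for c in range(n):
                t[(a * m + b, b * n + c, a * n + c)] = 1
    return t

def trilinear(t, x, y, z):
    # evaluates sum_{i,j,k} t_{ijk} x_i y_j z_k
    return sum(coef * x[i] * y[j] * z[l] for (i, j, l), coef in t.items())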
Strassen’s algorithm for matrix multiplication is an example of a bilinear algorithm which computes a
trilinear form. A bilinear algorithm is equivalent to a representation of a trilinear form of the following form:
$$\sum_{i,j,k} t_{ijk}\, x_i y_j z_k = \sum_{\lambda=1}^{r} \Big(\sum_i \alpha_{\lambda,i} x_i\Big)\Big(\sum_j \beta_{\lambda,j} y_j\Big)\Big(\sum_k \gamma_{\lambda,k} z_k\Big).$$
Given the above representation, the algorithm is then to first compute the r products $P_\lambda = \big(\sum_i \alpha_{\lambda,i} x_i\big)\big(\sum_j \beta_{\lambda,j} y_j\big)$
and then for each k to compute $z_k = \sum_\lambda \gamma_{\lambda,k} P_\lambda$.
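Equivalently, since the monomials x_i y_j z_k are linearly independent, such a representation exists exactly when $t_{ijk} = \sum_\lambda \alpha_{\lambda,i}\, \beta_{\lambda,j}\, \gamma_{\lambda,k}$ for all i, j, k, which can be checked coefficient by coefficient. A small checker (our own helper, written for this note, not code from the paper):

def realizes(alpha, beta, gamma, t, nx, ny, nz):
    # does sum_lambda alpha[l][i] * beta[l][j] * gamma[l][k] equal t_{ijk} everywhere?
    for i in range(nx):
        for j in range(ny):
            for k in range(nz):
                s = sum(a[i] * b[j] * c[k] for a, b, c in zip(alpha, beta, gamma))
                if s != t.get((i, j, k), 0):
                    return False
    return True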
For instance, Strassen’s algorithm for 2 × 2 matrix multiplication can be represented as follows:
$$(x_{11}y_{11} + x_{12}y_{21})z_{11} + (x_{11}y_{12} + x_{12}y_{22})z_{12} + (x_{21}y_{11} + x_{22}y_{21})z_{21} + (x_{21}y_{12} + x_{22}y_{22})z_{22} =$$
$$(x_{11} + x_{22})(y_{11} + y_{22})(z_{11} + z_{22}) + (x_{21} + x_{22})y_{11}(z_{21} - z_{22}) + x_{11}(y_{12} - y_{22})(z_{12} + z_{22}) +$$