Multiplying matrices in O(n^2.373) time
Virginia Vassilevska Williams, Stanford University
July 1, 2014
Abstract
We develop new tools for analyzing matrix multiplication constructions similar to the Coppersmith-
Winograd construction, and obtain a new improved bound on ω < 2.372873.
1 Introduction
The product of two matrices is one of the most basic operations in mathematics and computer science. Many
other essential matrix operations can be efficiently reduced to it, such as Gaussian elimination, LUP decom-
position, the determinant or the inverse of a matrix [1]. Matrix multiplication is also used as a subroutine in
many computational problems that, on the face of it, have nothing to do with matrices. As a small sample
illustrating the variety of applications, there are faster algorithms relying on matrix multiplication for graph
transitive closure (see e.g. [1]), context-free grammar parsing [21], and even learning juntas [13].
Until the late 1960s it was believed that computing the product C of two n × n matrices requires
essentially a cubic number of operations, as the fastest algorithm known was the naive algorithm which
indeed runs in O(n^3) time. In 1969, Strassen [19] excited the research community by giving the first
subcubic time algorithm for matrix multiplication, running in O(n^2.808) time. This amazing discovery
spawned a long line of research which gradually reduced the matrix multiplication exponent ω over time. In
1978, Pan [14] showed ω < 2.796. The following year, Bini et al. [4] introduced the notion of border rank
and obtained ω < 2.78. Schönhage [17] generalized this notion in 1981, proved his τ-theorem (also called
the asymptotic sum inequality), and showed that ω < 2.548. In the same paper, combining his work with
ideas by Pan, he also showed ω < 2.522. The following year, Romani [15] found that ω < 2.517. The first
result to break 2.5 was by Coppersmith and Winograd [9] who obtained ω < 2.496. In 1986, Strassen [20]
introduced his laser method which allowed for an entirely new attack on the matrix multiplication problem.
He also decreased the bound to ω < 2.479. Three years later, Coppersmith and Winograd [10] combined
Strassen’s technique with a novel form of analysis based on large sets avoiding arithmetic progressions and
obtained the famous bound of ω < 2.376 which has remained unchanged for more than twenty years.
In 2003, Cohn and Umans [8] introduced a new, group-theoretic framework for designing and analyzing
matrix multiplication algorithms. In 2005, together with Kleinberg and Szegedy [7], they obtained several
novel matrix multiplication algorithms using the new framework, however they were not able to beat 2.376.
Many researchers believe that the true value of ω is 2. In fact, both Coppersmith and Winograd [10]
and Cohn et al. [7] presented conjectures which if true would imply ω = 2. Recently, Alon, Shpilka and
Umans [2] showed that both the Coppersmith-Winograd conjecture and one of the Cohn et al. [7] conjectures
contradict a variant of the widely believed sunflower conjecture of Erdős and Rado [12]. Nevertheless, it
could be that at least the remaining Cohn et al. conjecture could lead to a proof that ω = 2.

The Coppersmith-Winograd Algorithm. In this paper we revisit the Coppersmith-Winograd (CW) ap-
proach [10]. We give a very brief summary of the approach here; we will give a more detailed account in
later sections.
One first constructs an algorithm A which, given Q-length vectors x and y for constant Q, computes Q
values of the form $z_k = \sum_{i,j} t_{ijk} x_i y_j$, say with $t_{ijk} \in \{0, 1\}$, using a smaller number of products than would
naively be necessary. The values z_k do not necessarily have to correspond to entries from a matrix product.
Then, one considers the algorithm A^n obtained by applying A to vectors x, y of length Q^n, recursively n
times as follows. Split x and y into Q subvectors of length Q^{n-1}. Then run A on x and y treating them
as vectors of length Q with entries that are vectors of length Q^{n-1}. When the product of two entries is
needed, use A^{n-1} to compute it. This algorithm A^n is called the nth tensor power of A. Its running time is
essentially O(r^n) if r is the number of multiplications performed by A.
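To make the recursive definition concrete, the following is a minimal sketch (in Python, with our own illustrative names and data layout, not anything taken from the paper) of applying the nth tensor power of a bilinear algorithm A. Here A is assumed to be given explicitly by coefficient lists alpha, beta, gamma in the bilinear-algorithm representation recalled in Section 1.1 below, and it performs r multiplications on Q-length vectors.

def apply_tensor_power(alpha, beta, gamma, x, y, n):
    """Apply A^n to vectors x, y of length Q**n and return z of length Q**n.
    A is given by r-by-Q coefficient lists alpha, beta, gamma."""
    r, Q = len(alpha), len(alpha[0])
    if n == 0:
        return [x[0] * y[0]]                           # length-1 vectors: a scalar product
    m = Q ** (n - 1)
    xs = [x[i * m:(i + 1) * m] for i in range(Q)]      # Q subvectors of length Q^(n-1)
    ys = [y[j * m:(j + 1) * m] for j in range(Q)]
    z = [0] * (Q ** n)
    for l in range(r):
        # entrywise linear combinations of the subvectors, as dictated by A
        u = [sum(alpha[l][i] * xs[i][t] for i in range(Q)) for t in range(m)]
        v = [sum(beta[l][j] * ys[j][t] for j in range(Q)) for t in range(m)]
        P = apply_tensor_power(alpha, beta, gamma, u, v, n - 1)   # recursive product
        for k in range(Q):
            for t in range(m):
                z[k * m + t] += gamma[l][k] * P[t]
    return z

The recursion performs r^n scalar multiplications in total, matching the O(r^n) running time stated above.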
The goal of the approach is to show that for very large n one can set enough variables x_i, y_j, z_k to 0 so
that running A^n on the resulting vectors x and y actually computes a matrix product. That is, as n grows,
some subvectors x′ of x and y′ of y can be thought to represent square matrices and when A^n is run on x
and y, a subvector of z is actually the matrix product of x′ and y′.
If A^n can be used to multiply m × m matrices in O(r^n) time, then this implies that ω ≤ log_m(r^n), so
that the larger m is, the better the bound on ω.
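As a familiar sanity check (our example, not one from the paper): taking A to be Strassen's algorithm, which multiplies 2 × 2 matrices with r = 7 products, the nth tensor power multiplies m × m = 2^n × 2^n matrices using 7^n products, giving ω ≤ log_{2^n}(7^n) = log_2 7 ≈ 2.807.

import math
r, n = 7, 10
m = 2 ** n                     # A^n multiplies m x m matrices using r^n products
print(math.log(r ** n, m))     # 2.807..., i.e. log_2(7), independent of n for this base case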
Coppersmith and Winograd [10] introduced techniques which, when combined with previous techniques
by Schönhage [17], allowed them to effectively choose which variables to set to 0 so that one can compute
very large matrix products using A^n. Part of their techniques rely on partitioning the index triples i, j, k ∈ [Q]^n
into groups and analyzing how “similar” each group g computation $\{z_{k,g} = \sum_{i,j:\,(i,j,k)\in g} t_{ijk} x_i y_j\}_k$ is
to a matrix product. The similarity measure used is called the value of the group.
Depending on the underlying algorithm A, the partitioning into groups varies and can affect the final
bound on ω. Coppersmith and Winograd analyzed a particular algorithm A which resulted in ω < 2.39.
Then they noticed that if one uses A^2 as the basic algorithm (the “base case”) instead, one can obtain the
better bound ω < 2.376. They left as an open problem what happens if one uses A^3 as the basic algorithm
instead.
Our contribution. We give a new way to more tightly analyze the techniques behind the Coppersmith-
Winograd (CW) approach [10]. We demonstrate the effectiveness of our new analysis by showing that the
8th tensor power of the CW algorithm [10] in fact gives ω < 2.3729. (The conference version of this paper
claimed ω < 2.3727, but due to an error, this turned out to be incorrect in the fourth decimal place.)
There are two main theorems behind our approach. The first theorem takes any tensor power A^n of a
basic algorithm A, picks a particular group partitioning for A^n and derives a procedure computing formulas
for (lower bounds on) the values of these groups.
The second theorem assumes that one knows the values for A^n and derives an efficient procedure which
outputs a (nonlinear) constraint program on O(n^2) variables, the solution of which gives a bound on ω.
We then apply the procedures given by the theorems to the second, fourth and eighth tensor powers of
the Coppersmith-Winograd algorithm, obtaining improved bounds with each new tensor power.
Similar to [10], our proofs apply to any starting algorithm that satisfies a simple uniformity requirement
which we formalize later. The upshot of our approach is that now any such algorithm and its higher tensor
powers can be analyzed entirely by computer. (In fact, our analysis of the 8th tensor power of the CW
algorithm is done this way.) The burden is now entirely offloaded to constructing base algorithms satisfying
the requirement. We hope that some of the new group-theoretic techniques can help in this regard.

Why wasn’t an improvement on CW found in the 1990s? After all, the CW paper explicitly posed the
analysis of the third tensor power as an open problem.
The answer to this question is twofold. Firstly, several people have attempted to analyze the third tensor
power (from personal communication with Umans, Kleinberg and Coppersmith). As the author found out
from personal experience, analyzing the third tensor power turns out to be very disappointing: no
improvement whatsoever can be found. This finding led some to believe that 2.376 may be the final answer,
at least for the CW algorithm.
The second issue is that with each new tensor power, the number of new values that need to be analyzed
grows quadratically. For the eighth tensor power for instance, 30 separate analyses are required! Prior to
our work, each of these analyses would require a separate application of the CW techniques. It would have
required an enormous amount of patience to analyze larger tensor powers, and since the third tensor power
does not give any improvement, the prospects looked bleak.
Stothers’ work. We were recently made aware of the thesis work of A. Stothers [18] in which he claims an
improvement to ω. (More recently, a journal paper by Davie and Stothers provides a more detailed account of
Stothers’ work [11]). Stothers argues that ω < 2.3737 by analyzing the 4th tensor power of the Coppersmith-
Winograd construction. Our approach can be seen as a vast generalization of the Coppersmith-Winograd
analysis. In the special case of even tensor powers, part of our proof has benefited from an observation of
Stothers’ which we will point out in the main text.
There are several differences between our approach and Stothers’. The first is relatively minor: the CW
approach requires the use of some hash functions; ours are different and simpler than Stothers’. The main
difference is that because of the generality of our analysis, we do not need to fully analyze all groups of
each tensor power construction. Instead we can just apply our formulas in a mechanical way. Stothers, on
the other hand, did a completely separate analysis of each group.
Finally, Stothers’ approach only works for tensor powers up to 4. Starting with the 5th tensor power,
the values of some of the groups begin to depend on more variables and a more careful analysis is needed.
(Incidentally, we also obtain a better bound from the 4th tensor power, ω < 2.37293, however we believe
this is an artifact of our optimization software, as we end up solving an equivalent constraint program.)
Acknowledgments. The author would like to thank Satish Rao for encouraging her to explore the matrix
multiplication problem more thoroughly and Ryan Williams for his support. The author is extremely grateful
to François Le Gall who alerted her to Stothers’ work, suggested the use of NLOPT, and pointed out that
the feasible solution obtained by Stothers for his 4th tensor power constraint program can be improved to
ω < 2.37294 with a different setting of the parameters. François also uncovered a flaw in a prior version of
the paper, which we have fixed in the current version. He was also recently able to improve our bound on ω
slightly to 2.37287.
Preliminaries. We use the following notation: [n] := {1, . . . , n}, and the multinomial coefficient
$$\binom{N}{[a_i]_{i\in[k]}} := \binom{N}{a_1,\ldots,a_k}.$$
We define ω ≥ 2 to be the infimum over the set of all reals r such that n × n matrix multiplication
over Q can be computed in n^r additions and multiplications for some natural number n. (However, the CW
approach and our extensions work over any ring.)
A three-term arithmetic progression is a sequence of three integers a ≤ b ≤ c so that b − a = c − b, or
equivalently, a + c = 2b. An arithmetic progression is nontrivial if a < b < c.
The following is a theorem by Behrend [3] improving on Salem and Spencer [16]. The subset A com-
puted by the theorem is called a Salem-Spencer set.

Theorem 1. There exists an absolute constant c such that for every N ≥ exp(c^2), one can construct in
poly(N) time a subset A ⊆ [N] with no three-term arithmetic progressions and $|A| > N \exp(-c\sqrt{\log N})$.
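As a toy illustration of progression-free sets (not Behrend's construction, which is far denser): integers whose base-3 expansion uses only the digits 0 and 1 contain no nontrivial three-term arithmetic progression. The sketch below builds such a set and brute-force checks the property; it yields only about N^{0.63} elements, much smaller than what Theorem 1 guarantees.

def base3_digits(x):
    ds = []
    while x:
        ds.append(x % 3)
        x //= 3
    return ds

def toy_ap_free_set(N):
    # integers in [N] whose base-3 digits are all 0 or 1
    return [x for x in range(1, N + 1) if all(d in (0, 1) for d in base3_digits(x))]

def has_three_term_ap(A):
    # is there a nontrivial progression a < (a+c)/2 < c inside A?
    S, A = set(A), sorted(A)
    return any((a + c) % 2 == 0 and (a + c) // 2 in S
               for i, a in enumerate(A) for c in A[i + 1:])

A = toy_ap_free_set(1000)
print(len(A), has_three_term_ap(A))   # a few dozen elements, and no 3-term AP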
The following lemma is needed in our analysis.
Lemma 1. Let k be a constant. Let B_i be fixed for i ∈ [k]. Let a_i for i ∈ [k] be variables such that a_i ≥ 0
and $\sum_i a_i = 1$. Then, as N goes to infinity, the quantity
$$\binom{N}{[a_i N]_{i\in[k]}} \prod_{i=1}^{k} B_i^{a_i N}$$
is maximized for the choices $a_i = B_i / \sum_{j=1}^{k} B_j$ for all i ∈ [k], and for these choices it is at least
$$\Big(\sum_{j=1}^{k} B_j\Big)^{N} \Big/ (N + 1)^{k}.$$
Proof. We will prove the lemma by induction on k. Suppose that k = 2 and consider
$$\binom{N}{aN} x^{aN} y^{N(1-a)} = y^{N} \binom{N}{aN} (x/y)^{aN},$$
where x ≤ y.
When (x/y) ≤ 1, the function $f(a) = \binom{N}{aN} (x/y)^{aN}$ of a is concave for a ≤ 1/2. Hence its maximum
is achieved when ∂f(a)/∂a = 0. Consider f(a): it is N!/((aN)!(N(1 − a))!) · (x/y)^{aN}. We can take the
logarithm to obtain ln f(a) = ln(N!) + Na ln(x/y) − ln((aN)!) − ln((N(1 − a))!). f(a) grows exactly
when a ln(x/y) − ln((aN)!)/N − ln((N(1 − a))!)/N does. Taking Stirling’s approximation, we obtain
$$a \ln(x/y) - \ln((aN)!)/N - \ln((N(1-a))!)/N = a \ln(x/y) - a \ln(a) - (1-a)\ln(1-a) - \ln N - O((\log N)/N).$$
Since N is large, the O((log N)/N) term is negligible. Thus we are interested in when g(a) =
a ln(x/y) − a ln(a) − (1 − a) ln(1 − a) is maximized. Because of concavity, for a ≤ 1/2 and x ≤ y,
the function is maximized when ∂g(a)/∂a = 0, i.e. when
$$0 = \ln(x/y) - \ln(a) - 1 + \ln(1-a) + 1 = \ln(x/y) - \ln\big(a/(1-a)\big).$$
Hence a/(1 − a) = x/y and so a = x/(x + y).
Furthermore, since the maximum is attained for this value of a, we get that for each t ∈ {0, . . . , N}
we have that
$$\binom{N}{t} x^{t} y^{N-t} \le \binom{N}{aN} x^{aN} y^{N(1-a)},$$
and since $\sum_{t=0}^{N} \binom{N}{t} x^{t} y^{N-t} = (x + y)^{N}$, we obtain that for a = x/(x + y),
$$\binom{N}{aN} x^{aN} y^{N(1-a)} \ge (x + y)^{N} / (N + 1).$$
Now let’s consider the case k > 2. First assume that the B_i are sorted so that B_i ≥ B_{i+1}. Since
$\sum_i a_i = 1$, we obtain
$$\binom{N}{[a_i N]_{i\in[k]}} \prod_{i=1}^{k} B_i^{a_i N} = \Big(\sum_i B_i\Big)^{N} \binom{N}{[a_i N]_{i\in[k]}} \prod_{i=1}^{k} b_i^{a_i N},$$

where $b_i = B_i / \sum_j B_j$. We will prove the claim for $\binom{N}{[a_i N]_{i\in[k]}} \prod_{i=1}^{k} b_i^{a_i N}$, and the lemma will follow for the
B_i as well. Hence we can assume that $\sum_i b_i = 1$.
Suppose that we have proven the claim for k − 1. This means that in particular
$$\binom{N - a_1 N}{[a_j N]_{j\ge 2}} \prod_{j=2}^{k} b_j^{a_j N} \ge \Big(\sum_{j=2}^{k} b_j\Big)^{N - a_1 N} \Big/ (N + 1)^{k-1},$$
and the quantity is maximized for $a_j N/(N - a_1 N) = b_j / \sum_{j\ge 2} b_j$ for all j ≥ 2.
Now consider $\binom{N}{a_1 N} b_1^{a_1 N} \big(\sum_{j=2}^{k} b_j\big)^{N - a_1 N}$. By our base case we get that this is maximized and is at
least $\big(\sum_{j=1}^{k} b_j\big)^{N} / (N + 1)$ for the setting a_1 = b_1. Hence, we will get
$$\binom{N}{[a_j N]_{j\in[k]}} \prod_{j=1}^{k} b_j^{a_j N} \ge \Big(\sum_{j=1}^{k} b_j\Big)^{N} \Big/ (N + 1)^{k},$$
for the setting a_1 = b_1 and, for j ≥ 2, $a_j N/(N - a_1 N) = b_j / \sum_{j\ge 2} b_j$, which implies $a_j/(1 - b_1) = b_j/(1 - b_1)$
and hence a_j = b_j. We have proven the lemma.
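A quick numerical sanity check of Lemma 1 (our own, with arbitrarily chosen B_i and a value of N for which the a_i N are integers; the rounding step is a convenience the asymptotic statement ignores):

from math import factorial, prod

def multinomial(N, parts):
    assert sum(parts) == N
    return factorial(N) // prod(factorial(p) for p in parts)

B = [3.0, 2.0, 1.0]
k, N = len(B), 60
a = [Bi / sum(B) for Bi in B]            # the maximizing choice a_i = B_i / sum_j B_j
parts = [round(ai * N) for ai in a]
parts[-1] = N - sum(parts[:-1])          # force the rounded parts to sum to N
value = multinomial(N, parts) * prod(Bi ** p for Bi, p in zip(B, parts))
bound = sum(B) ** N / (N + 1) ** k
print(value >= bound, value / bound)     # True, with room to spare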
1.1 A brief summary of the techniques used in bilinear matrix multiplication algorithms
A full exposition of the techniques can be found in the book by Bürgisser, Clausen and Shokrollahi [6]. The
lecture notes by Bläser [5] are also a nice read.
Bilinear algorithms and trilinear forms. Matrix multiplication is an example of a trilinear form. n × n
matrix multiplication, for instance, can be written as
$$\sum_{i,j\in[n]} \sum_{k\in[n]} x_{ik}\, y_{kj}\, z_{ij},$$
which corresponds to the equalities $z_{ij} = \sum_{k\in[n]} x_{ik} y_{kj}$ for all i, j ∈ [n]. In general, a trilinear form has the
form $\sum_{i,j,k} t_{ijk}\, x_i y_j z_k$ where i, j, k are indices in some range and t_{ijk} are the coefficients which define the
trilinear form; t_{ijk} is also called a tensor. The trilinear form for the product of a k × m by an m × n matrix
is denoted by ⟨k, m, n⟩.
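For concreteness, a short sketch (with our own encoding conventions, chosen only for illustration) that builds the tensor t_{ijk} of ⟨k, m, n⟩ with matrix positions flattened in row-major order, and evaluates a trilinear form given as such a coefficient dictionary:

def mat_mult_tensor(k, m, n):
    # t[(i, j, l)] = 1 exactly when x_i, y_j, z_l encode x_{ab}, y_{bc}, z_{ac}
    t = {}
    for a in range(k):
        for b in range(m):
            for c in range(n):
                t[(a * m + b, b * n + c, a * n + c)] = 1
    return t

def trilinear(t, x, y, z):
    # evaluates sum_{i,j,k} t_{ijk} x_i y_j z_k
    return sum(coef * x[i] * y[j] * z[l] for (i, j, l), coef in t.items())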
Strassen’s algorithm for matrix multiplication is an example of a bilinear algorithm which computes a
trilinear form. A bilinear algorithm is equivalent to a representation of a trilinear form of the following form:
$$\sum_{i,j,k} t_{ijk}\, x_i y_j z_k = \sum_{\lambda=1}^{r} \Big(\sum_i \alpha_{\lambda,i} x_i\Big)\Big(\sum_j \beta_{\lambda,j} y_j\Big)\Big(\sum_k \gamma_{\lambda,k} z_k\Big).$$
Given the above representation, the algorithm is then to first compute the r products $P_\lambda = \big(\sum_i \alpha_{\lambda,i} x_i\big)\big(\sum_j \beta_{\lambda,j} y_j\big)$
and then for each k to compute $z_k = \sum_\lambda \gamma_{\lambda,k} P_\lambda$.
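Equivalently, since the monomials x_i y_j z_k are linearly independent, such a representation exists exactly when $t_{ijk} = \sum_\lambda \alpha_{\lambda,i}\, \beta_{\lambda,j}\, \gamma_{\lambda,k}$ for all i, j, k, which can be checked coefficient by coefficient. A small checker (our own helper, written for this note, not code from the paper):

def realizes(alpha, beta, gamma, t, nx, ny, nz):
    # does sum_lambda alpha[l][i] * beta[l][j] * gamma[l][k] equal t_{ijk} everywhere?
    for i in range(nx):
        for j in range(ny):
            for k in range(nz):
                s = sum(a[i] * b[j] * c[k] for a, b, c in zip(alpha, beta, gamma))
                if s != t.get((i, j, k), 0):
                    return False
    return True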
For instance, Strassen’s algorithm for 2 × 2 matrix multiplication can be represented as follows:
$$(x_{11}y_{11} + x_{12}y_{21})z_{11} + (x_{11}y_{12} + x_{12}y_{22})z_{12} + (x_{21}y_{11} + x_{22}y_{21})z_{21} + (x_{21}y_{12} + x_{22}y_{22})z_{22} =$$
$$(x_{11} + x_{22})(y_{11} + y_{22})(z_{11} + z_{22}) + (x_{21} + x_{22})y_{11}(z_{21} - z_{22}) + x_{11}(y_{12} - y_{22})(z_{12} + z_{22}) +$$