Author

Oded Schwartz

Bio: Oded Schwartz is an academic researcher at the Hebrew University of Jerusalem. His research focuses on matrix multiplication and the Strassen algorithm. He has an h-index of 28 and has co-authored 73 publications receiving 2,540 citations. His previous affiliations include the Technical University of Berlin and Tel Aviv University.


Papers
Journal ArticleDOI
TL;DR: This work generalizes lower bounds on the amount of communication needed to perform dense n-by-n matrix multiplication using the conventional O(n^3) algorithm to a much wider variety of algorithms, including LU factorization, Cholesky factorization, LDL^T factorization, QR factorization, the Gram–Schmidt algorithm, and algorithms for eigenvalues and singular values.
Abstract: In 1981 Hong and Kung [HK81] proved a lower bound on the amount of communication (amount of data moved between a small, fast memory and a large, slow memory) needed to perform dense n-by-n matrix multiplication using the conventional O(n^3) algorithm, where the input matrices were too large to fit in the small, fast memory. In 2004 Irony, Toledo and Tiskin [ITT04] gave a new proof of this result and extended it to the parallel case (where communication means the amount of data moved between processors). In both cases the lower bound may be expressed as Ω(#arithmetic operations / √M), where M is the size of the fast memory (or local memory in the parallel case). Here we generalize these results to a much wider variety of algorithms, including LU factorization, Cholesky factorization, LDL^T factorization, QR factorization, algorithms for eigenvalues and singular values, i.e., essentially all direct methods of linear algebra. The proof works for dense or sparse matrices, and for sequential or parallel algorithms. In addition to lower bounds on the amount of data moved (bandwidth) we get lower bounds on the number of messages required to move it (latency). We illustrate how to extend our lower bound technique to compositions of linear algebra operations (like computing powers of a matrix), to decide whether it is enough to call a sequence of simpler optimal algorithms (like matrix multiplication) to minimize communication, or if we can do better. We give examples of both. We also show how to extend our lower bounds to certain graph theoretic problems. We point out recently designed algorithms for dense LU, Cholesky, QR, eigenvalue and SVD problems that attain these lower bounds; implementations of LU and QR show large speedups over conventional linear algebra algorithms in standard libraries like LAPACK and ScaLAPACK. Many open problems remain.
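
For readability, the bandwidth and latency bounds described in the abstract can be restated compactly in LaTeX. The symbol G for the number of arithmetic operations is notation introduced here, and the latency bound follows from the bandwidth bound by dividing by the maximum message size M.

    % Communication lower bounds, with G = #arithmetic operations and
    % M = size of the fast memory (or local memory per processor).
    \[
      \#\text{words moved} \;=\; \Omega\!\left(\frac{G}{\sqrt{M}}\right),
      \qquad
      \#\text{messages} \;=\; \Omega\!\left(\frac{G}{M^{3/2}}\right).
    \]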

257 citations

Journal ArticleDOI
TL;DR: Hong and Kung proved a lower bound on the amount of communication needed for dense matrix multiplication with the conventional O(n^3) algorithm; this work extends that lower bound to a much wider variety of linear algebra algorithms, including LU factorization, Cholesky factorization, and LDL^T factorization.
Abstract: In 1981 Hong and Kung proved a lower bound on the amount of communication (amount of data moved between a small, fast memory and a large, slow memory) needed to perform dense, n-by-n matrix multiplication using the conventional O(n^3) algorithm, where the input matrices were too large to fit in the small, fast memory. In 2004 Irony, Toledo, and Tiskin gave a new proof of this result and extended it to the parallel case (where communication means the amount of data moved between processors). In both cases the lower bound may be expressed as Ω(#arithmetic operations/√M), where M is the size of the fast memory (or local memory in the parallel case). Here we generalize these results to a much wider variety of algorithms, including LU factorization, Cholesky factorization, LDL^T factorization, QR factorization, the Gram–Schmidt algorithm, and algorithms for eigenvalues and singular values, i.e., essentially all direct methods of linear algebra. The proof works for dense or sparse matrices and for sequential or parallel algorithms.
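
To make the memory-hierarchy model concrete, the following minimal sketch (not taken from the paper) shows the classical blocked algorithm that attains the Ω(n^3/√M) bandwidth bound sequentially: with tile size b chosen so that three b-by-b tiles fit in fast memory (b ≈ √(M/3)), the total traffic is O(n^3/b) = O(n^3/√M) words. The function name and the NumPy dependency are illustrative assumptions.

    import numpy as np

    def blocked_matmul(A, B, b):
        """Classical tiled matrix multiplication, returning C = A @ B.

        b is the tile size; picking b ~ sqrt(M/3), so that one tile each of
        A, B, and C fits in fast memory of size M, gives O(n^3 / sqrt(M))
        words moved between slow and fast memory.
        """
        n = A.shape[0]
        C = np.zeros((n, n))
        for i in range(0, n, b):
            for j in range(0, n, b):
                for k in range(0, n, b):
                    # Each b-by-b tile multiply touches O(b^2) words and there
                    # are (n/b)^3 of them, so the traffic is O(n^3 / b) words.
                    C[i:i+b, j:j+b] += A[i:i+b, k:k+b] @ B[k:k+b, j:j+b]
        return C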

212 citations

Journal ArticleDOI
TL;DR: It is proved that the Maximum k-Set Packing problem cannot be efficiently approximated to within a factor of Ω(k/ln k) unless P = NP, which improves the previous hardness of approximation factor of k/2^O(√(ln k)) by Trevisan.
Abstract: Given a k-uniform hypergraph, the Maximum k-Set Packing problem is to find a maximum collection of pairwise disjoint edges. We prove that this problem cannot be efficiently approximated to within a factor of Ω(k/ln k) unless P = NP. This improves the previous hardness of approximation factor of k/2^O(√(ln k)) by Trevisan. This result extends to the problem of k-Dimensional Matching.
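
For contrast with the Ω(k/ln k) hardness above, the trivial greedy algorithm already gives a factor-k approximation, since each edge it picks intersects at most k edges of an optimal packing (one per vertex). A minimal sketch of that folklore greedy routine follows; it is not from the paper, and representing the hypergraph as a list of vertex sets is an assumption.

    def greedy_set_packing(edges):
        """Greedy k-approximation for Maximum k-Set Packing.

        edges: iterable of sets of vertices (each of size k).
        Returns a maximal collection of pairwise disjoint edges; each chosen
        edge blocks at most k edges of an optimal packing, so the result is
        within a factor k of optimal (folklore bound).
        """
        used = set()          # vertices already covered by chosen edges
        packing = []
        for e in edges:
            if used.isdisjoint(e):
                packing.append(e)
                used |= set(e)
        return packing

    # Example: a 3-uniform hypergraph.
    example = [{1, 2, 3}, {3, 4, 5}, {6, 7, 8}]
    print(greedy_set_packing(example))   # [{1, 2, 3}, {6, 7, 8}]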

211 citations

Journal ArticleDOI
TL;DR: This paper surveys lower bounds on communication in linear algebra, including lower bounds for Strassen-like algorithms and for iterative methods, in particular Krylov subspace methods applied to sparse matrices.
Abstract: The traditional metric for the efficiency of a numerical algorithm has been the number of arithmetic operations it performs. Technological trends have long been reducing the time to perform an arithmetic operation, so it is no longer the bottleneck in many algorithms; rather, communication, or moving data, is the bottleneck. This motivates us to seek algorithms that move as little data as possible, either between levels of a memory hierarchy or between parallel processors over a network. In this paper we summarize recent progress in three aspects of this problem. First we describe lower bounds on communication. Some of these generalize known lower bounds for dense classical (O(n^3)) matrix multiplication to all direct methods of linear algebra, to sequential and parallel algorithms, and to dense and sparse matrices. We also present lower bounds for Strassen-like algorithms, and for iterative methods, in particular Krylov subspace methods applied to sparse matrices. Second, we compare these lower bounds to widely used versions of these algorithms, and note that these widely used algorithms usually communicate asymptotically more than is necessary. Third, we identify or invent new algorithms for most linear algebra problems that do attain these lower bounds, and demonstrate large speed-ups in theory and practice.
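
Since the abstract highlights Strassen-like algorithms (and Strassen's algorithm is one of the author's main research topics), here is a minimal recursive Strassen sketch for square matrices whose dimension is a power of two. The cutoff to a classical base case and the NumPy dependency are simplifying assumptions, and the communication-optimal parallelization discussed in the paper is not shown.

    import numpy as np

    def strassen(A, B, cutoff=64):
        """Strassen's matrix multiplication (7 recursive multiplies).

        Assumes A and B are square with dimension a power of two; below the
        cutoff it falls back to the classical product.
        """
        n = A.shape[0]
        if n <= cutoff:
            return A @ B
        h = n // 2
        A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
        B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]
        M1 = strassen(A11 + A22, B11 + B22, cutoff)
        M2 = strassen(A21 + A22, B11, cutoff)
        M3 = strassen(A11, B12 - B22, cutoff)
        M4 = strassen(A22, B21 - B11, cutoff)
        M5 = strassen(A11 + A12, B22, cutoff)
        M6 = strassen(A21 - A11, B11 + B12, cutoff)
        M7 = strassen(A12 - A22, B21 + B22, cutoff)
        C = np.empty((n, n), dtype=A.dtype)
        C[:h, :h] = M1 + M4 - M5 + M7
        C[:h, h:] = M3 + M5
        C[h:, :h] = M2 + M4
        C[h:, h:] = M1 - M2 + M3 + M6
        return C

The seven recursive products give the O(n^(log2 7)) ≈ O(n^2.81) arithmetic count in terms of which the communication lower bounds for Strassen-like algorithms are stated.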

138 citations

Proceedings ArticleDOI
20 May 2013
TL;DR: This work obtains the first communication-optimal algorithm for matrix multiplication with rectangular matrices of all dimensions by combining a dimension-splitting technique with a recursive BFS/DFS approach, and shows significant speedups over existing parallel linear algebra libraries both on a 32-core shared-memory machine and on a distributed-memory supercomputer.
Abstract: Communication-optimal algorithms are known for square matrix multiplication. Here, we obtain the first communication-optimal algorithm for all dimensions of rectangular matrices. Combining the dimension-splitting technique of Frigo, Leiserson, Prokop and Ramachandran (1999) with the recursive BFS/DFS approach of Ballard, Demmel, Holtz, Lipshitz and Schwartz (2012) allows for a communication-optimal as well as cache and network-oblivious algorithm. Moreover, the implementation is simple: approximately 50 lines of code for the shared-memory version. Since the new algorithm minimizes communication across the network, between NUMA domains, and between levels of cache, it performs well in practice on both shared and distributed-memory machines. We show significant speedups over existing parallel linear algebra libraries both on a 32-core shared-memory machine and on a distributed-memory supercomputer.
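
The dimension-splitting recursion described above can be sketched sequentially in a few lines: always split the largest of the three dimensions m, k, n in half and recurse, which yields a cache-oblivious schedule. This is an illustrative sketch under that description, not the paper's implementation; the parallel BFS/DFS choice (whether to split the processors across the two subproblems or reuse them) is omitted, and the NumPy dependency is an assumption.

    import numpy as np

    def rec_matmul(A, B, C, cutoff=64):
        """Cache-oblivious rectangular matrix multiplication: C += A @ B.

        A is m-by-k, B is k-by-n, C is m-by-n. At each step the largest of
        m, k, n is split in half; a parallel version would additionally choose
        between BFS and DFS at each split.
        """
        m, k = A.shape
        n = B.shape[1]
        if max(m, k, n) <= cutoff:
            C += A @ B          # in-place update of the slice of C
            return
        if m >= k and m >= n:                 # split rows of A and C
            h = m // 2
            rec_matmul(A[:h], B, C[:h], cutoff)
            rec_matmul(A[h:], B, C[h:], cutoff)
        elif n >= k:                          # split columns of B and C
            h = n // 2
            rec_matmul(A, B[:, :h], C[:, :h], cutoff)
            rec_matmul(A, B[:, h:], C[:, h:], cutoff)
        else:                                 # split the shared dimension k
            h = k // 2
            rec_matmul(A[:, :h], B[:h], C, cutoff)
            rec_matmul(A[:, h:], B[h:], C, cutoff)

The caller passes C pre-initialized to zeros, e.g. C = np.zeros((A.shape[0], B.shape[1])); rec_matmul(A, B, C).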

132 citations


Cited by
Proceedings ArticleDOI
22 Jan 2006
TL;DR: Some of the major results in random graphs and some of the more challenging open problems are reviewed, including those related to the WWW.
Abstract: We will review some of the major results in random graphs and some of the more challenging open problems. We will cover algorithmic and structural questions. We will touch on newer models, including those related to the WWW.

7,116 citations

01 Jan 2009
TL;DR: The text indexed for this record is a proceedings table of contents covering verification of parameterized and infinite-state systems, diagnostic and test generation, efficient model checking, and model-checking tools.
Abstract (table of contents):
Abstracting WS1S Systems to Verify Parameterized Networks (Kai Baukus, Saddek Bensalem, Yassine Lakhnech, and Karsten Stahl)
FMona: A Tool for Expressing Validation Techniques over Infinite State Systems (J.-P. Bodeveix and M. Filali)
Transitive Closures of Regular Relations for Verifying Infinite-State Systems (Bengt Jonsson and Marcus Nilsson)
Diagnostic and Test Generation:
Using Static Analysis to Improve Automatic Test Generation (Marius Bozga, Jean-Claude Fernandez, and Lucian Ghirvu)
Efficient Diagnostic Generation for Boolean Equation Systems (Radu Mateescu)
Efficient Model-Checking:
Compositional State Space Generation with Partial Order Reductions for Asynchronous Communicating Systems (Jean-Pierre Krimm and Laurent Mounier)
Checking for CFFD-Preorder with Tester Processes (Juhana Helovuo and Antti Valmari)
Fair Bisimulation (Thomas A. Henzinger and Sriram K. Rajamani)
Integrating Low Level Symmetries into Reachability Analysis (Karsten Schmidt)
Model-Checking Tools:
Model Checking Support for the ASM High-Level Language (Giuseppe Del Castillo and Kirsten Winter)

1,687 citations


Proceedings ArticleDOI
07 Apr 2008
TL;DR: The paper identifies neighborhood attacks on published social network data and presents a practical solution to battle them; an empirical study indicates that anonymized social networks generated by the method can still be used to answer aggregate network queries with high accuracy.
Abstract: Recently, as more and more social network data has been published in one way or another, preserving privacy in publishing social network data becomes an important concern. With some local knowledge about individuals in a social network, an adversary may attack the privacy of some victims easily. Unfortunately, most of the previous studies on privacy preservation can deal with relational data only, and cannot be applied to social network data. In this paper, we take an initiative towards preserving privacy in social network data. We identify an essential type of privacy attacks: neighborhood attacks. If an adversary has some knowledge about the neighbors of a target victim and the relationship among the neighbors, the victim may be re-identified from a social network even if the victim's identity is preserved using the conventional anonymization techniques. We show that the problem is challenging, and present a practical solution to battle neighborhood attacks. The empirical study indicates that anonymized social networks generated by our method can still be used to answer aggregate network queries with high accuracy.
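
To illustrate the neighborhood attack described above, the sketch below computes a simple isomorphism-invariant signature of each node's 1-neighborhood and counts how many nodes are re-identifiable from that signature alone. It is an illustrative toy, not the paper's anonymization method; the networkx library and the karate club example graph are assumptions.

    import networkx as nx
    from collections import Counter

    def neighborhood_signature(G, v):
        """Isomorphism-invariant summary of v's 1-neighborhood:
        (degree of v, number of edges among v's neighbors,
         sorted degrees of the neighbors inside the neighborhood subgraph)."""
        nbrs = list(G.neighbors(v))
        H = G.subgraph(nbrs)
        return (len(nbrs),
                H.number_of_edges(),
                tuple(sorted(d for _, d in H.degree())))

    def uniquely_identifiable(G):
        """Nodes whose neighborhood signature is unique in G: an adversary who
        knows only a victim's neighborhood structure can re-identify them."""
        sigs = {v: neighborhood_signature(G, v) for v in G}
        counts = Counter(sigs.values())
        return [v for v, s in sigs.items() if counts[s] == 1]

    G = nx.karate_club_graph()   # a standard example graph shipped with networkx
    print(len(uniquely_identifiable(G)), "of", G.number_of_nodes(),
          "nodes are re-identifiable from their 1-neighborhood alone")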

764 citations

Journal ArticleDOI
TL;DR: A randomized (1 - 1/e)-approximation algorithm is given for maximizing an arbitrary monotone submodular function subject to a matroid constraint in the value oracle model, using a variant of pipage rounding and a continuous greedy process.
Abstract: Let $f:2^X \rightarrow \cal R_+$ be a monotone submodular set function, and let $(X,\cal I)$ be a matroid. We consider the problem ${\rm max}_{S \in \cal I} f(S)$. It is known that the greedy algorithm yields a $1/2$-approximation [M. L. Fisher, G. L. Nemhauser, and L. A. Wolsey, Math. Programming Stud., no. 8 (1978), pp. 73-87] for this problem. For certain special cases, e.g., ${\rm max}_{|S| \leq k} f(S)$, the greedy algorithm yields a $(1-1/e)$-approximation. It is known that this is optimal both in the value oracle model (where the only access to $f$ is through a black box returning $f(S)$ for a given set $S$) [G. L. Nemhauser and L. A. Wolsey, Math. Oper. Res., 3 (1978), pp. 177-188] and for explicitly posed instances assuming $P \neq NP$ [U. Feige, J. ACM, 45 (1998), pp. 634-652]. In this paper, we provide a randomized $(1-1/e)$-approximation for any monotone submodular function and an arbitrary matroid. The algorithm works in the value oracle model. Our main tools are a variant of the pipage rounding technique of Ageev and Sviridenko [J. Combin. Optim., 8 (2004), pp. 307-328], and a continuous greedy process that may be of independent interest. As a special case, our algorithm implies an optimal approximation for the submodular welfare problem in the value oracle model [J. Vondrak, Proceedings of the $38$th ACM Symposium on Theory of Computing, 2008, pp. 67-74]. As a second application, we show that the generalized assignment problem (GAP) is also a special case; although the reduction requires $|X|$ to be exponential in the original problem size, we are able to achieve a $(1-1/e-o(1))$-approximation for GAP, simplifying previously known algorithms. Additionally, the reduction enables us to obtain approximation algorithms for variants of GAP with more general constraints.
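
As a concrete companion, here is a minimal sketch of the greedy 1/2-approximation the abstract mentions (Fisher, Nemhauser, and Wolsey), phrased in the value oracle model. The function names, the coverage-style objective, and the partition matroid in the example are illustrative assumptions; this is not the paper's continuous greedy and pipage rounding algorithm.

    def greedy_matroid(ground, f, independent):
        """Greedy 1/2-approximation for max f(S) s.t. S independent in a matroid.

        f: value oracle for a monotone submodular set function (takes a frozenset).
        independent: membership oracle for the matroid's independent sets.
        """
        S = frozenset()
        while True:
            best, best_gain = None, 0.0
            for x in ground - S:
                if independent(S | {x}):
                    gain = f(S | {x}) - f(S)
                    if gain > best_gain:
                        best, best_gain = x, gain
            if best is None:
                return S
            S = S | {best}

    # Example: a coverage function (monotone submodular) under a partition matroid
    # that allows at most one element from {1, 2} and at most one from {3, 4}.
    sets = {1: {"a", "b"}, 2: {"b", "c"}, 3: {"c", "d"}, 4: {"d", "e"}}
    f = lambda S: len(set().union(*(sets[i] for i in S))) if S else 0
    independent = lambda S: len(S & {1, 2}) <= 1 and len(S & {3, 4}) <= 1
    print(greedy_matroid(frozenset(sets), f, independent))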

761 citations