Journal Article
Elemental: A New Framework for Distributed Memory Dense Matrix Computations
TL;DR: This work reviews lessons learned since the introduction of ScaLAPACK and PLAPACK and proposes a simple yet effective alternative; preliminary performance results show the new solution achieves competitive, if not superior, performance on large clusters.
Abstract:
Parallelizing dense matrix computations for distributed memory architectures is a well-studied subject and generally considered to be among the best understood domains of parallel computing. Two packages, developed in the mid-1990s, still enjoy regular use: ScaLAPACK and PLAPACK. With the advent of many-core architectures, which may very well take the shape of distributed memory architectures within a single processor, these packages must be revisited since the traditional MPI-based approaches will likely need to be extended. Thus, this is a good time to review lessons learned since the introduction of these two packages and to propose a simple yet effective alternative. Preliminary performance results show the new solution achieves competitive, if not superior, performance on large clusters.
Citations
How to… home page
Challenger Tafe, WestOne
TL;DR: In this article, the authors developed a center to address state-of-the-art research, create innovative educational programs, and support technology transfer using commercially viable results, in order to assist the Army Research Laboratory in developing the next-generation Future Combat System in the telecommunications sector, one that assures prevention of perceived threats and provides non-line-of-sight/beyond-line-of-sight lethal support.
Proceedings Article
Dask: Parallel Computation with Blocked algorithms and Task Scheduling
TL;DR: This work couples blocked algorithms with dynamic, memory-aware task scheduling to achieve a parallel and out-of-core NumPy clone, and shows how this extends the effective scale of modern hardware to larger datasets.
Journal Article
BLIS: A Framework for Rapidly Instantiating BLAS Functionality
TL;DR: Preliminary performance of level-2 and level-3 operations is observed to be competitive with two mature open source libraries (OpenBLAS and ATLAS) as well as an established commercial product (Intel MKL).
Journal Article
Detection of lensing substructure using ALMA observations of the dusty galaxy SDP.81
Yashar D. Hezaveh, Neal Dalal, Daniel P. Marrone, Yao-Yuan Mao, Warren R. Morningstar, Di Wen, Roger Blandford, John E. Carlstrom, Christopher D. Fassnacht, Gilbert Holder, Athol J. Kemball, Philip J. Marshall, Norman Murray, Laurence Perreault Levasseur, Joaquin Vieira, Risa H. Wechsler
TL;DR: In this article, the authors used the Weiland Family Stanford Graduate Fellowship (WFG) and the Office of Science of the United States Department of Energy (OSDE) to conduct an experimental study on the HST-HF2-51358.001-A.
Journal Article
The ELPA library: scalable parallel eigenvalue solutions for electronic structure theory and computational science
Andreas Marek, Volker Blum, R. Johanni, Ville Havu, Bruno Lang, T. Auckenthaler, Alexander Heinecke, Hans-Joachim Bungartz, Hermann Lederer
TL;DR: The Eigenvalue soLvers for Petascale Applications (ELPA) library solves symmetric and Hermitian eigenvalue problems for dense matrices with real-valued or complex-valued entries.
References
Book
The algebraic eigenvalue problem
TL;DR: Theoretical background; perturbation theory; error analysis; solution of linear algebraic equations; Hermitian matrices; reduction of a general matrix to condensed form; eigenvalues of matrices of condensed forms; the LR and QR algorithms; iterative methods; bibliography.
Book
LAPACK Users' Guide
TL;DR: The third edition of the LAPACK Users' Guide covers installation and troubleshooting of the routines and gives examples of how to convert from LINPACK or EISPACK to LAPACK.
Book
MPI: The Complete Reference
TL;DR: MPI: The Complete Reference is an annotated manual for the latest 1.1 version of the standard that illuminates the more advanced and subtle features of MPI and covers such advanced issues in parallel computing and programming as true portability, deadlock, high-performance message passing, and libraries for distributed and parallel computing.
Journal Article
A set of level 3 basic linear algebra subprograms
TL;DR: This paper describes an extension to the set of Basic Linear Algebra Subprograms targeted at matrix-matrix operations that should provide for efficient and portable implementations of algorithms for high-performance computers.