
Showing papers on "Sparse matrix published in 2007"




Journal ArticleDOI
TL;DR: In this paper, the authors consider the problem of maximizing the variance explained by a particular linear combination of the input variables while constraining the number of nonzero coefficients in this combination.
Abstract: Given a covariance matrix, we consider the problem of maximizing the variance explained by a particular linear combination of the input variables while constraining the number of nonzero coefficients in this combination. This problem arises in the decomposition of a covariance matrix into sparse factors or sparse principal component analysis (PCA), and has wide applications ranging from biology to finance. We use a modification of the classical variational representation of the largest eigenvalue of a symmetric matrix, where cardinality is constrained, and derive a semidefinite programming-based relaxation for our problem. We also discuss Nesterov's smooth minimization technique applied to the semidefinite program arising in the semidefinite relaxation of the sparse PCA problem. The method has complexity $O(n^4 \sqrt{\log(n)}/\epsilon)$, where $n$ is the size of the underlying covariance matrix and $\epsilon$ is the desired absolute accuracy on the optimal value of the problem.

699 citations
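The cardinality-constrained variance maximization at the heart of sparse PCA can be illustrated with a much simpler heuristic than the paper's SDP relaxation: power iteration with hard thresholding to the k largest coefficients. The sketch below is illustrative only; the function name and the toy covariance are ours, not the paper's.

```python
import numpy as np

def truncated_power_iteration(C, k, iters=200):
    """Heuristic for max x^T C x subject to ||x||_2 = 1 and card(x) <= k:
    power iteration with hard thresholding to the k largest-magnitude
    entries. (Illustrative only; the paper instead solves an SDP
    relaxation of this problem.)"""
    n = C.shape[0]
    x = np.ones(n) / np.sqrt(n)          # deterministic start, mass everywhere
    for _ in range(iters):
        x = C @ x
        small = np.argsort(np.abs(x))[:n - k]   # indices of the n-k smallest
        x[small] = 0.0
        x /= np.linalg.norm(x)
    return x

# Toy covariance with one strong 3-sparse factor plus isotropic noise
n, k = 10, 3
u = np.zeros(n)
u[:3] = 1.0 / np.sqrt(3.0)
C = 5.0 * np.outer(u, u) + 0.1 * np.eye(n)

x = truncated_power_iteration(C, k)
explained = float(x @ C @ x)   # variance captured by the sparse component
```

On this toy problem the heuristic recovers the planted 3-sparse factor and captures essentially all of its variance (5.1 here), which is exactly the trade-off the cardinality constraint formalizes.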


Journal ArticleDOI
TL;DR: In this article, a three-phase ac-ac sparse matrix converter with no energy storage elements and employing only 15 IGBTs, as opposed to 18 IGBT switches, was proposed.
Abstract: A novel three-phase ac-ac sparse matrix converter having no energy storage elements and employing only 15 IGBTs, as opposed to 18 IGBTs of a functionally equivalent conventional ac-ac matrix converter, is proposed. It is shown that the realization effort could be further reduced to only nine IGBTs in an ultra sparse matrix converter (USMC) in the case where only unidirectional power flow is required and the fundamental phase displacement at the input and at the output is limited to ±π/6. The dependency of the voltage and current transfer ratios of the sparse matrix converters on the operating parameters is analyzed and a space vector modulation scheme is described in combination with a zero current commutation method. Finally, the sparse matrix concept is verified by simulation and experimentally using a 6.8-kW/400-V very sparse matrix converter, which is implemented with 12 IGBT switches, and USMC prototypes.

398 citations


Proceedings ArticleDOI
26 Aug 2007
TL;DR: It is shown that Toeplitz-structured matrices with entries drawn independently from the same distributions are also sufficient to recover x from y with high probability, and the performance of such matrices is compared with that of fully independent and identically distributed ones.
Abstract: The problem of recovering a sparse signal x ∈ R^n from a relatively small number of its observations of the form y = Ax ∈ R^k, where A is a known matrix and k « n, has recently received a lot of attention under the rubric of compressed sensing (CS) and has applications in many areas of signal processing such as data compression, image processing, dimensionality reduction, etc. Recent work has established that if A is a random matrix with entries drawn independently from certain probability distributions then exact recovery of x from these observations can be guaranteed with high probability. In this paper, we show that Toeplitz-structured matrices with entries drawn independently from the same distributions are also sufficient to recover x from y with high probability, and we compare the performance of such matrices with that of fully independent and identically distributed ones. The use of Toeplitz matrices in CS applications has several potential advantages: (i) they require the generation of only O(n) independent random variables; (ii) multiplication with Toeplitz matrices can be efficiently implemented using fast Fourier transform, resulting in faster acquisition and reconstruction algorithms; and (iii) Toeplitz-structured matrices arise naturally in certain application areas such as system identification.

339 citations
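Two of the advantages named in the abstract are easy to demonstrate: a k × n Toeplitz measurement matrix is determined by only k + n - 1 random entries, and its matrix-vector product is a convolution that can be computed via FFT. A small sketch (variable names are ours):

```python
import numpy as np
from scipy.linalg import toeplitz
from scipy.signal import fftconvolve

rng = np.random.default_rng(1)
n, k = 64, 16                        # signal length, number of measurements

# Only k + n - 1 random variables generate the whole k x n matrix
t = rng.choice([-1.0, 1.0], size=k + n - 1)
col, row = t[:k], np.concatenate(([t[0]], t[k:]))
A = toeplitz(col, row)               # first column col, first row row

# A sparse signal and its direct measurement
x = np.zeros(n)
x[rng.choice(n, size=4, replace=False)] = rng.standard_normal(4)
y_direct = A @ x

# FFT-based multiplication: a Toeplitz matvec is a convolution with the
# vector of diagonals of A, read from the top-right to the bottom-left
g = np.concatenate((row[::-1], col[1:]))
y_fft = fftconvolve(g, x)[n - 1:n - 1 + k]
```

Both products agree to floating-point accuracy, while the FFT route costs O((n + k) log(n + k)) instead of O(kn).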


Journal ArticleDOI
TL;DR: A new algorithm, "local K" (LK), is proposed for fast evaluation of the exchange Fock matrix in case the Cholesky decomposition of the electron repulsion integrals is used; it can also be used within the class of methods that employ auxiliary basis set expansions for representing the electron repulsion integrals.
Abstract: The authors propose a new algorithm, “local K” (LK), for fast evaluation of the exchange Fock matrix in case the Cholesky decomposition of the electron repulsion integrals is used. The novelty lies in the fact that rigorous upper bounds to the contribution from each occupied orbital to the exchange Fock matrix are employed. By formulating these inequalities in terms of localized orbitals, the scaling of computing the exchange Fock matrix is reduced from quartic to quadratic with only negligible prescreening overhead and strict error control. Compared to the unscreened Cholesky algorithm, the computational saving is substantial for systems of medium and large sizes. By virtue of its general formulation, the LK algorithm can be used also within the class of methods that employ auxiliary basis set expansions for representing the electron repulsion integrals.

287 citations


Journal ArticleDOI
TL;DR: Two fourth-order cumulant-based techniques for the estimation of the mixing matrix in underdetermined independent component analysis based on a simultaneous matrix diagonalization and a simultaneous off-diagonalization are studied.
Abstract: In this paper we study two fourth-order cumulant-based techniques for the estimation of the mixing matrix in underdetermined independent component analysis. The first method is based on a simultaneous matrix diagonalization. The second is based on a simultaneous off-diagonalization. The number of sources that can be allowed is roughly quadratic in the number of observations. For both methods, explicit expressions for the maximum number of sources are given. Simulations illustrate the performance of the techniques.

283 citations


Proceedings ArticleDOI
24 Sep 2007
TL;DR: A compressive sensing scheme with deterministic performance guarantees using expander-graphs-based measurement matrices is proposed and it is shown that the signal recovery can be achieved with complexity O(n) even if the number of nonzero elements k grows linearly with n.
Abstract: Compressive sensing is an emerging technology which can recover a sparse signal vector of dimension n via a much smaller number of measurements than n. However, the existing compressive sensing methods may still suffer from relatively high recovery complexity, such as O(n3), or can only work efficiently when the signal is super sparse, sometimes without deterministic performance guarantees. In this paper, we propose a compressive sensing scheme with deterministic performance guarantees using expander-graphs-based measurement matrices and show that the signal recovery can be achieved with complexity O(n) even if the number of nonzero elements k grows linearly with n. We also investigate compressive sensing for approximately sparse signals using this new method. Moreover, explicit constructions of the considered expander graphs exist. Simulation results are given to show the performance and complexity of the new method.

224 citations


Proceedings ArticleDOI
25 Apr 2007
TL;DR: A novel distributed algorithm based on sparse random projections, which requires no global coordination or knowledge, solves the problem of recovering an approximation of n data values by querying any L sensors, so that the reconstruction error is comparable to the optimal k-term approximation.
Abstract: Consider a large-scale wireless sensor network measuring compressible data, where n distributed data values can be well-approximated using only k

196 citations


Proceedings ArticleDOI
26 Aug 2007
TL;DR: A compressed sensing framework is introduced for efficient representation of multichannel, multiple trial EEG data and a simultaneous orthogonal matching pursuit algorithm is shown to be effective in the joint recovery of the original multiple trial EEG signals from a small number of projections.
Abstract: Many applications in signal processing require the efficient representation and processing of data. The traditional approach to efficient signal representation is compression. In recent years, there has been a new approach to compression at the sensing level. Compressed sensing (CS) is an emerging field which is based on the revelation that a small collection of linear projections of a sparse signal contains enough information for reconstruction. In this paper, we propose an application of compressed sensing in the field of biomedical signal processing, particularly electroencephalogram (EEG) collection and storage. A compressed sensing framework is introduced for efficient representation of multichannel, multiple trial EEG data. The proposed framework is based on the revelation that EEG signals are sparse in a Gabor frame. The sparsity of EEG signals in a Gabor frame is utilized for compressed sensing of these signals. A simultaneous orthogonal matching pursuit algorithm is shown to be effective in the joint recovery of the original multiple trial EEG signals from a small number of projections.

183 citations
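The recovery step named in the abstract is a matching-pursuit algorithm. As a sketch of the basic idea, here is plain (single-signal) orthogonal matching pursuit; the paper's simultaneous variant instead shares one support across multiple EEG trials. All names and the toy data below are ours.

```python
import numpy as np

def omp(A, y, k):
    """Orthogonal matching pursuit: greedily add the column of A most
    correlated with the residual, then re-fit on the chosen support by
    least squares. (Single-signal sketch; the paper applies a
    *simultaneous* OMP across multiple EEG trials.)"""
    support = []
    residual = y.astype(float)
    coef = np.array([])
    for _ in range(k):
        j = int(np.argmax(np.abs(A.T @ residual)))
        if j not in support:
            support.append(j)
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x = np.zeros(A.shape[1])
    x[support] = coef
    return x

rng = np.random.default_rng(2)
m, n, k = 60, 128, 4
A = rng.standard_normal((m, n)) / np.sqrt(m)    # random projection matrix
x_true = np.zeros(n)
support_true = rng.choice(n, size=k, replace=False)
x_true[support_true] = rng.choice([-1.0, 1.0], size=k) * (1.0 + rng.random(k))
y = A @ x_true            # a small number of projections (m << n)
x_hat = omp(A, y, k)      # exact recovery in the noiseless case
```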


Journal ArticleDOI
TL;DR: This study uses performance profiles as a tool for evaluating and comparing the performance of serial sparse direct solvers on an extensive set of symmetric test problems taken from a range of practical applications.
Abstract: In recent years a number of solvers for the direct solution of large sparse symmetric linear systems of equations have been developed. These include solvers that are designed for the solution of positive definite systems as well as those that are principally intended for solving indefinite problems. In this study, we use performance profiles as a tool for evaluating and comparing the performance of serial sparse direct solvers on an extensive set of symmetric test problems taken from a range of practical applications.

174 citations


Journal ArticleDOI
TL;DR: A boundary element method is formulated with basis functions that incorporate the asymptotic behavior of the solution at high frequencies; combining this hybrid method with very effective quadrature rules for oscillatory integrals yields a sparse discretization matrix for the oscillatory problem.
Abstract: We consider two-dimensional scattering problems, formulated as an integral equation defined on the boundary of the scattering obstacle. The oscillatory nature of high-frequency scattering problems necessitates a large number of unknowns in classical boundary element methods. In addition, the corresponding discretization matrix of the integral equation is dense. We formulate a boundary element method with basis functions that incorporate the asymptotic behavior of the solution at high frequencies. The method exhibits the effectiveness of asymptotic methods at high frequencies with only few unknowns, but retains accuracy for lower frequencies. New in our approach is that we combine this hybrid method with very effective quadrature rules for oscillatory integrals. As a result, we obtain a sparse discretization matrix for the oscillatory problem. Moreover, numerical experiments indicate that the accuracy of the solution actually increases with increasing frequency. The sparse discretization applies to problems where the phase of the solution can be predicted a priori, for example in the case of smooth and convex scatterers.

01 Jan 2007
TL;DR: In this article, decay bounds for the entries of f(A), where A is a sparse (in particular, banded) n × n diagonalizable matrix and f is smooth on a subset of the complex plane containing the spectrum of A, are established.
Abstract: We establish decay bounds for the entries of f(A), where A is a sparse (in particular, banded) n × n diagonalizable matrix and f is smooth on a subset of the complex plane containing the spectrum of A. Combined with techniques from approximation theory, the bounds are used to compute sparse (or banded) approximations to f(A), resulting in algorithms that under appropriate conditions have linear complexity in the matrix dimension. Applications to various types of problems are discussed and illustrated by numerical examples.
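The decay phenomenon the bounds describe is easy to observe numerically: for a banded A and smooth f, the entries of f(A) fall off exponentially away from the diagonal, so a banded truncation is a very accurate approximation. A small illustration with f(z) = exp(-z) (our choice of function and matrix, not the paper's examples):

```python
import numpy as np
from scipy.linalg import expm

# A banded (tridiagonal) symmetric matrix: the 1-D discrete Laplacian
n = 50
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)

F = expm(-A)          # f(A) for the smooth function f(z) = exp(-z)

# Entries of f(A) decay exponentially away from the diagonal
i = n // 2
ratio = abs(F[i, i + 20]) / abs(F[i, i])

# So truncating F to a modest bandwidth changes it very little
mask = np.abs(np.subtract.outer(np.arange(n), np.arange(n))) <= 10
rel_err = np.linalg.norm(F - F * mask) / np.linalg.norm(F)
```

Here an entry 20 positions off the diagonal is many orders of magnitude below the diagonal entry, and the bandwidth-10 truncation is accurate to well below single precision, which is what makes linear-complexity sparse approximation of f(A) possible.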

Book ChapterDOI
26 Sep 2007
TL;DR: By combining recent GPU programming techniques with supercomputing strategies, this work implements a sparse general-purpose linear solver which outperforms leading-edge CPU counterparts (MKL / ACML).
Abstract: A wide class of geometry processing and PDE resolution methods needs to solve a linear system, where the non-zero pattern of the matrix is dictated by the connectivity matrix of the mesh. The advent of GPUs with their ever-growing amount of parallel horsepower makes them a tempting resource for such numerical computations. This can be helped by new APIs (CTM from ATI and CUDA from NVIDIA) which give a direct access to the multithreaded computational resources and associated memory bandwidth of GPUs; CUDA even provides a BLAS implementation but only for dense matrices (CuBLAS). However, existing GPU linear solvers are restricted to specific types of matrices, or use non-optimal compressed row storage strategies. By combining recent GPU programming techniques with supercomputing strategies (namely block compressed row storage and register blocking), we implement a sparse general-purpose linear solver which outperforms leading-edge CPU counterparts (MKL / ACML).
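The block compressed row storage the abstract mentions stores small dense blocks instead of individual nonzeros, which is what enables register blocking on the GPU. A minimal CPU-side sketch of a BCSR matrix-vector product (our data layout and names, shown in Python for clarity rather than on a GPU):

```python
import numpy as np

def bcsr_matvec(block_vals, block_cols, row_ptr, x, b=2):
    """y = A @ x for A stored in block compressed sparse row (BCSR)
    format with b x b dense blocks: block_vals[p] is the p-th block,
    block_cols[p] its block-column index, and row_ptr delimits the
    blocks belonging to each block-row."""
    n_block_rows = len(row_ptr) - 1
    y = np.zeros(n_block_rows * b)
    for i in range(n_block_rows):
        for p in range(row_ptr[i], row_ptr[i + 1]):
            j = block_cols[p]
            y[i * b:(i + 1) * b] += block_vals[p] @ x[j * b:(j + 1) * b]
    return y

# A 4 x 4 matrix with two nonzero 2 x 2 blocks, both in block-column 0
A = np.zeros((4, 4))
A[0:2, 0:2] = [[1.0, 2.0], [3.0, 4.0]]
A[2:4, 0:2] = [[5.0, 0.0], [0.0, 6.0]]

block_vals = [np.array([[1.0, 2.0], [3.0, 4.0]]),
              np.array([[5.0, 0.0], [0.0, 6.0]])]
block_cols = [0, 0]
row_ptr = [0, 1, 2]

x = np.ones(4)
y = bcsr_matvec(block_vals, block_cols, row_ptr, x)
```

Compared with scalar CSR, each stored block index amortizes over b² values, and the inner b × b product maps naturally onto registers, which is the design point the paper exploits.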

Journal ArticleDOI
TL;DR: In this article, Chen and Dunson proposed a modified Cholesky decomposition of the form Σ = DLL'D for a covariance matrix, where D is a diagonal matrix with entries proportional to the square roots of the diagonal entries of Σ and L is a unit lower-triangular matrix solely determining its correlation matrix.
Abstract: Chen & Dunson (2003) have proposed a modified Cholesky decomposition of the form Σ = DLL'D for a covariance matrix, where D is a diagonal matrix with entries proportional to the square roots of the diagonal entries of Σ and L is a unit lower-triangular matrix solely determining its correlation matrix. This total separation of variance and correlation is definitely a major advantage over the more traditional modified Cholesky decomposition of the form LD²L' (Pourahmadi, 1999). We show that, though the variance and correlation parameters of the former decomposition are separate, they are not asymptotically orthogonal and that the estimation of the new parameters could be more demanding computationally. We also provide statistical interpretation for the entries of L and D as certain moving average parameters and innovation variances and indicate how the existing likelihood procedures can be employed to estimate the new parameters.
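The "total separation of variance and correlation" can be checked numerically. In the sketch below, L is taken as the lower Cholesky factor of the correlation matrix (so LL' is exactly the correlation matrix) rather than the unit-diagonal L of Chen & Dunson's statistical parametrization; the covariance matrix is an arbitrary example of ours.

```python
import numpy as np

# A small positive-definite covariance matrix
Sigma = np.array([[4.0, 1.2, 0.8],
                  [1.2, 9.0, 2.1],
                  [0.8, 2.1, 1.0]])

# D carries the standard deviations (the "variance" parameters) ...
sd = np.sqrt(np.diag(Sigma))
D = np.diag(sd)

# ... while L is a lower-triangular factor of the correlation matrix,
# so L L' is exactly the correlation matrix of Sigma
R = Sigma / np.outer(sd, sd)
L = np.linalg.cholesky(R)

reconstructed = D @ L @ L.T @ D   # recovers Sigma = D (L L') D
```

Changing D rescales the variances without touching the correlation structure encoded in L, and vice versa, which is the separation the decomposition is designed to achieve.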

Journal ArticleDOI
TL;DR: A new software code for computing selected eigenvalues and associated eigenvectors of a real symmetric matrix is described, which combines the Jacobi–Davidson method with efficient multilevel incomplete LU (ILU) preconditioning.

Journal ArticleDOI
TL;DR: In this paper, the performance of the combination technique is analyzed using a projection framework and the C/S decomposition, and modified combination coefficients are derived which are optimal in a certain sense.

Journal ArticleDOI
TL;DR: A performance model for Cell is introduced and applied to several key numerical kernels: dense matrix multiply, sparse matrix vector multiply, stencil computations, and 1D/2D FFTs; the model is validated by comparing results against published hardware data, as well as the authors' own Cell blade implementations.
Abstract: In this work, we examine the potential of using the recently-released STI Cell processor as a building block for future high-end scientific computing systems. Our work contains several novel contributions. First, we introduce a performance model for Cell and apply it to several key numerical kernels: dense matrix multiply, sparse matrix vector multiply, stencil computations, and 1D/2D FFTs. Next, we validate our model by comparing results against published hardware data, as well as our own Cell blade implementations. Additionally, we compare Cell performance to benchmarks run on leading superscalar (AMD Opteron), VLIW (Intel Itanium2), and vector (Cray X1E) architectures. Our work also explores several different kernel implementations and demonstrates a simple and effective programming model for Cell's unique architecture. Finally, we propose modest microarchitectural modifications that could significantly increase the efficiency of double-precision calculations. Overall results demonstrate the tremendous potential of the Cell architecture for scientific computations in terms of both raw performance and power efficiency.

Journal ArticleDOI
TL;DR: A spectral-element time-domain (SETD) method is proposed to solve 3-D transient electromagnetic problems based on Gauss-Lobatto-Legendre polynomials that requires only a trivial sparse matrix-vector product at each time step, thus significantly reducing CPU time and memory requirement.
Abstract: A spectral-element time-domain (SETD) method is proposed to solve 3-D transient electromagnetic problems based on Gauss-Lobatto-Legendre polynomials. It has the advantages of spectral accuracy and block-diagonal mass matrix. With the inexpensive inversion of the block-diagonal mass matrix, the proposed method requires only a trivial sparse matrix-vector product at each time step, thus significantly reducing CPU time and memory requirement. Galerkin's method is used for spatial discretization, and a fourth-order Runge-Kutta scheme is employed for the time integration. The perfectly matched layer (PML) is employed to truncate the boundary in unbounded problems. The pseudospectral time-domain method is used to simplify the treatment of the PML inside the proposed SETD method. Numerical examples are shown to verify the efficiency and the spectral accuracy with the order of basis functions.

Journal ArticleDOI
TL;DR: This paper proposes a novel natural gradient method for complex sparse representation based on the maximum a posteriori (MAP) criterion, and develops a new CBSS method that works in the frequency domain.
Abstract: Convolutive blind source separation (CBSS) that exploits the sparsity of source signals in the frequency domain is addressed in this paper. We assume the sources follow a complex Laplacian-like distribution for complex random variables, in which the real part and imaginary part of complex-valued source signals are not necessarily independent. Based on the maximum a posteriori (MAP) criterion, we propose a novel natural gradient method for complex sparse representation. Moreover, a new CBSS method is further developed based on complex sparse representation. The developed CBSS algorithm works in the frequency domain, where we assume that the source signals are sufficiently sparse. If the sources are sufficiently sparse in the frequency domain and the filter length of mixing channels is relatively small and can be estimated, we can even achieve underdetermined CBSS. We illustrate the validity and performance of the proposed learning algorithm by several simulation examples.

Proceedings ArticleDOI
15 Oct 2007
TL;DR: This study examines the use of graph ordering algorithms for visual analysis of data sets using visual similarity matrices, considering three general classes of algorithms for generating orderings: simple graph theoretic algorithms, symbolic sparse matrix reordering algorithms, and spectral decomposition algorithms.
Abstract: In this study, we examine the use of graph ordering algorithms for visual analysis of data sets using visual similarity matrices. Visual similarity matrices display the relationships between data items in a dot-matrix plot format, with the axes labeled with the data items and points drawn if there is a relationship between two data items. The biggest challenge for displaying data using this representation is finding an ordering of the data items that reveals the internal structure of the data set. Poor orderings are indistinguishable from noise whereas a good ordering can reveal complex and subtle features of the data. We consider three general classes of algorithms for generating orderings: simple graph theoretic algorithms, symbolic sparse matrix reordering algorithms, and spectral decomposition algorithms. We apply each algorithm to synthetic and real world data sets and evaluate each algorithm for interpretability (i.e., does the algorithm lead to images with usable visual features?) and stability (i.e., does the algorithm consistently produce similar results?). We also provide a detailed discussion of the results for each algorithm across the different graph types and include a discussion of some strategies for using ordering algorithms for data analysis based on these results.
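The effect of the symbolic sparse-matrix reordering algorithms the study evaluates is easy to demonstrate: a good ordering gathers the dots of a similarity matrix near the diagonal so the structure becomes visible. A sketch using reverse Cuthill-McKee on a deliberately scrambled path graph (the graph and seed are our toy example):

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import reverse_cuthill_mckee

# A path graph on 8 vertices, listed in a scrambled order: plotted as a
# dot-matrix, the raw ordering hides the chain structure
n = 8
rng = np.random.default_rng(3)
perm = rng.permutation(n)
A = np.zeros((n, n), dtype=int)
for u in range(n - 1):
    A[perm[u], perm[u + 1]] = A[perm[u + 1], perm[u]] = 1

# A bandwidth-reducing reordering gathers the dots near the diagonal,
# making the underlying chain visible in the visual similarity matrix
order = reverse_cuthill_mckee(csr_matrix(A), symmetric_mode=True)
B = A[np.ix_(order, order)]
bandwidth = max(abs(i - j) for i, j in zip(*np.nonzero(B)))
```

After reordering, all nonzeros lie within a narrow band, whereas the scrambled matrix scatters them; this is exactly the interpretability criterion (usable visual features) that the study measures.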

Journal ArticleDOI
TL;DR: A recently proposed implicit discretisation scheme is used which generalises the standard approach based on gridding and improves the reconstruction quality when the sampling scheme or the weights are less regular.
Abstract: In magnetic resonance imaging (MRI), methods that use a non-Cartesian grid in k-space are becoming increasingly important. In this paper, we use a recently proposed implicit discretisation scheme which generalises the standard approach based on gridding. While the latter succeeds for sufficiently uniform sampling sets and accurate estimated density compensation weights, the implicit method further improves the reconstruction quality when the sampling scheme or the weights are less regular. Both approaches can be solved efficiently with the nonequispaced FFT. Due to several new techniques for the storage of an involved sparse matrix, our examples include also the reconstruction of a large 3D data set. We present four case studies and report on efficient implementation of the related algorithms.

Journal ArticleDOI
TL;DR: In this article, a three-step solution technique is presented for solving two-dimensional and three-dimensional (3D) nonhomogeneous material problems using the multi-domain boundary element method.
Abstract: A three-step solution technique is presented for solving two-dimensional (2D) and three-dimensional (3D) nonhomogeneous material problems using the multi-domain boundary element method. The discretized boundary element formulation expressed in terms of normalized displacements and tractions is written for each sub-domain. The first step is to eliminate internal variables at the individual domain level. The second step is to eliminate boundary unknowns defined over nodes used only by the domain itself. And the third step is to establish the system of equations according to the compatibility of displacements and equilibrium of tractions at common interface nodes. Discontinuous elements are utilized to model the traction discontinuity across corner nodes. The distinct feature of the three-step solver is that only interface displacements are unknowns in the final system of equations and the coefficient matrix is blocked sparse. As a result, large-scale 3D problems can be solved efficiently. Three numerical examples for 2D and 3D problems are given to demonstrate the effectiveness of the presented technique.

Proceedings ArticleDOI
23 Apr 2007
TL;DR: This paper introduces the innovative SpMxV solver designed for FPGAs (SSF), which achieves up to 20x speedup and minimizes the control logic by taking advantage of the data flow via an innovative accumulation circuit which uses pipelined floating point adders.
Abstract: Creating a high throughput sparse matrix vector multiplication (SpMxV) implementation depends on a balanced system design. In this paper, we introduce the innovative SpMxV solver designed for FPGAs (SSF). Besides high computational throughput, system performance is optimized by reducing initialization time and overheads, minimizing and overlapping I/O operations, and increasing scalability. SSF accepts any matrix size and can be easily adapted to different data formats. SSF minimizes the control logic by taking advantage of the data flow via an innovative accumulation circuit which uses pipelined floating point adders. Compared to optimized software codes on a Pentium 4 microprocessor, our design achieves up to 20x speedup.

Journal ArticleDOI
TL;DR: It is demonstrated that the suggested two-step approach for model reduction in flexible multibody dynamics has very good approximation capabilities in the time as well as in the frequency domain and can help to reduce the computation time of a numerical simulation significantly.
Abstract: In this work, a two-step approach for model reduction in flexible multibody dynamics is proposed. This technique is a combination of the Krylov-subspace method and a Gramian matrix based reduction approach that is particularly suited if a small reduced-order model of a system charged with many force-inputs has to be generated. The proposed methodology can be implemented efficiently using sparse matrix techniques and is therefore applicable to large-scale systems too. By a numerical example, it is demonstrated that the suggested two-step approach has very good approximation capabilities in the time as well as in the frequency domain and can help to reduce the computation time of a numerical simulation significantly.

Patent
12 Feb 2007
TL;DR: In this paper, the authors present a sparse matrix processing system and method which uses sparse matrices that are compressed to reduce memory traffic and improve the performance of computations using sparse matrices.
Abstract: The present invention involves a sparse matrix processing system and method which uses sparse matrices that are compressed to reduce memory traffic and improve performance of computations using sparse matrices.

Proceedings ArticleDOI
11 Jun 2007
TL;DR: This paper argues that the single-table approach is a necessary component of self-managing database systems because it frees users from a tedious and potentially ineffective schema-design phase when managing sparse data sets.
Abstract: A "sparse" data set typically has hundreds or even thousands of attributes, but most objects have non-null values for only a small number of these attributes. A popular view about sparse data is that it arises merely as the result of poor schema design. In this paper, we argue that rather than being the result of inept schema design, storing a sparse data set in a single table is the right way to proceed. However, for this to be the case, RDBMSs must provide sparse data management facilities that go beyond the previously studied requirement of storing such data sets efficiently. In particular, an RDBMS must 1) enable users to effectively build ad hoc queries over a very large number of attributes, and 2) support efficient evaluation of these queries over a wide, sparse table. We propose techniques that provide these capabilities, and argue that the single-table approach is a necessary component of self-managing database systems because it frees users from a tedious and potentially ineffective schema-design phase when managing sparse data sets.

Journal ArticleDOI
TL;DR: A diverse set of test calculations shows that the size of system where significant computational savings can be achieved depends strongly on the dimensionality of the system, and the extent of localizability of the molecular orbitals.
Abstract: The scaled opposite spin Møller–Plesset method (SOS-MP2) is an economical way of obtaining correlation energies that are computationally cheaper, and yet, in a statistical sense, of higher quality than standard MP2 theory, by introducing one empirical parameter. But SOS-MP2 still has a fourth-order scaling step that makes the method inapplicable to very large molecular systems. We reduce the scaling of SOS-MP2 by exploiting the sparsity of expansion coefficients and local integral matrices, by performing local auxiliary basis expansions for the occupied-virtual product distributions. To exploit sparsity of 3-index local quantities, we use a blocking scheme in which entire zero-rows and columns, for a given third global index, are deleted by comparison against a numerical threshold. This approach minimizes sparse matrix book-keeping overhead, and also provides sufficiently large submatrices after blocking, to allow efficient matrix–matrix multiplies. The resulting algorithm is formally cubic scaling, and requires only moderate computational resources (quadratic memory and disk space) and, in favorable cases, is shown to yield effective quadratic scaling behavior in the size regime we can apply it to. Errors associated with local fitting using the attenuated Coulomb metric and numerical thresholds in the blocking procedure are found to be insignificant in terms of the predicted relative energies. A diverse set of test calculations shows that the size of system where significant computational savings can be achieved depends strongly on the dimensionality of the system, and the extent of localizability of the molecular orbitals. © 2007 Wiley Periodicals, Inc. J Comput Chem 2007

Journal ArticleDOI
TL;DR: A multiresolution (MR) basis is described for the method of moments (MoM) analysis of a generic 3-D conductor discretized with triangular cells, constructed as linear combinations of Rao-Wilton-Glisson basis functions, thus allowing direct applicability to existing MoM codes.
Abstract: A multiresolution (MR) basis is described for the method of moments (MoM) analysis of a generic 3-D conductor discretized with triangular cells. The MR basis functions are constructed as linear combinations of Rao-Wilton-Glisson (RWG) basis functions, thus allowing direct applicability on existing MoM codes. With respect to previous work by the authors, the generation of the MR functions is significantly simplified, yet keeping similar applicability and performance, in terms of conditioning of the MoM matrix, convergence of the iterative solvers, and possibility of sparsifying the MoM matrix by clipping. Moreover, with the proposed basis the basis-change matrix from RWG to MR is highly sparse at all levels; this allows a more efficient generation of the MR-MoM matrix.

Journal ArticleDOI
TL;DR: In this article, the authors consider the inverse problem of reconstructing the absorption and scattering coefficients of the radiative transfer equation (RTE) from measurements of photon current transmitted through a scattering medium in the frequency domain.
Abstract: In this paper, we consider the inverse problem of reconstructing the absorption and scattering coefficients of the radiative transfer equation (RTE) from measurements of photon current transmitted through a scattering medium in the frequency domain. We consider an output least-squares formulation of this problem and derive the appropriate forward operators and their Fréchet derivatives. For efficient implementation, we use the second-order form of the RTE and discuss its solution using a finite element method. The PN approximation is used to expand the radiance in spherical harmonics, which leads to a large sparse matrix system that can be efficiently solved. Examples are shown in the low-scattering case where the diffusion approximation fails.

Proceedings ArticleDOI
17 Jun 2007
TL;DR: The design of the Matrix Template Library (MTL) has been independently extended to provide recursators, to support block-recursive algorithms, supplementing MTL's iterators, and these techniques generalize to other applications in linear algebra.
Abstract: Positive results from new object-oriented tools for scientific programming are reported. Using template classes, abstractions of matrix representations are available that subsume conventional row-major, column-major, either Z- or I-Morton-order, as well as block-wise combinations of these. Moreover, the design of the Matrix Template Library (MTL) has been independently extended to provide recursators, to support block-recursive algorithms, supplementing MTL's iterators. Data types modeling both concepts enable the programmer to implement both iterative and recursive algorithms (or even both) on all of the aforementioned matrix representations at once for a wide family of important scientific operations. We illustrate the unrestricted applicability of our matrix-recursator on matrix multiplication. The same generic block-recursive function, unaltered, is instantiated on different triplets of matrix types. Within a base block, either a library multiplication or a user-provided, platform-specific code provides excellent performance. We achieve 77% of peak performance using hand-tuned base cases without explicit prefetching. This excellent performance becomes available over a wide family of matrix representations from a single program. The techniques generalize to other applications in linear algebra.