Showing papers on "Sparse matrix published in 2023"


Journal ArticleDOI
TL;DR: In this paper, an adaptive multispace adjustable sparse filtering (AMSASF) method was proposed to automatically capture rich and complementary features across multiple spaces by combining four classical sparse measures, and an attention mechanism was designed to adaptively assign different importance to each sparse space to improve the robustness of the algorithm.

5 citations


Journal ArticleDOI
TL;DR: In this article, an accurate multiobjective low-rank and sparse denoising framework based on a subfitness strategy is proposed for HSIs to achieve accurate modeling.
Abstract: Due to the unavoidable influence of sparse and Gaussian noise during data acquisition, the quality of hyperspectral images (HSIs) is degraded and their applications are greatly limited. It is therefore necessary to restore clean HSIs. In traditional methods, low-rank and sparse matrix decomposition is usually applied to restore the pure data matrix from the observed data matrix. However, because the optimization of the $l_0$-norm for sparse modeling is a nonconvex and NP-hard problem, convex relaxation and regularization parameters are usually introduced. Convex relaxation often leads to inaccurate sparse modeling results, and the sensitive regularization parameters can lead to unstable results. Thus, to address these issues, an accurate multiobjective low-rank and sparse denoising framework is proposed in this article for HSIs to achieve accurate modeling. The $l_0$-norm of the sparse noise is modeled directly and optimized by an evolutionary algorithm, and the denoising problem is converted into a multiobjective optimization problem by simultaneously optimizing the low-rank term, the sparse term, and the data fidelity term, without sensitive regularization parameters. However, since the low-rank clean image and the sparse noise of the HSI are encoded into a single solution, the solution is too long to be optimized directly, so a subfitness strategy is constructed to achieve effective optimization by comparing the objective function values corresponding to each band of each solution. Experiments undertaken with simulated images in 11 noise cases and four real noisy images confirm the effectiveness of the proposed method.
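To make the formulation concrete, the decomposition and the three objectives described above can be written as follows; the notation ($Y$ for the observed data matrix, $L$ for the low-rank clean image, $S$ for the sparse noise, $N$ for the Gaussian noise) is assumed here rather than taken from the paper:

```latex
Y = L + S + N, \qquad
\min_{L,\,S}\ \Bigl(\operatorname{rank}(L),\ \|S\|_{0},\ \|Y - L - S\|_{F}^{2}\Bigr)
```

The three terms form a vector-valued objective traded off by the evolutionary search rather than collapsed into one weighted sum, which is how the method avoids sensitive regularization parameters.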

4 citations


Journal ArticleDOI
TL;DR: Wang et al. as mentioned in this paper proposed a graph regularized sparse nonnegative matrix factorization (GSNMF) algorithm that obtains a cleaner data matrix to approximate the high-dimensional matrix, together with the corresponding inference and an alternating iterative update algorithm for solving the optimization problem.
Abstract: Graph regularized nonnegative matrix factorization (GNMF) algorithms have received a lot of attention in the fields of machine learning and data mining, and the square loss is commonly used to measure the quality of the reconstructed data. However, noise is introduced when data reconstruction is performed, and the square loss is sensitive to noise, which degrades the performance of data analysis tasks. To solve this problem, a novel graph regularized sparse NMF (GSNMF) is proposed in this article. To obtain a cleaner data matrix that approximates the high-dimensional matrix, an $l_1$-norm penalty on the low-dimensional matrix is added to adjust the data eigenvalues in the matrix and impose a sparsity constraint. In addition, the corresponding inference and an alternating iterative update algorithm for solving the optimization problem are given. Then, an extension of GSNMF, namely, graph regularized sparse nonnegative matrix trifactorization (GSNMTF), is proposed, and the detailed inference procedure is also shown. Finally, experimental results on eight different datasets demonstrate that the proposed model performs well.
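One plausible instantiation of the GSNMF objective sketched above, assuming the usual GNMF notation ($X \approx UV$ with nonnegative factors and graph Laplacian $L$) plus the added $l_1$ term; the paper's exact formulation may differ:

```latex
\min_{U \ge 0,\, V \ge 0}\ \|X - UV\|_{F}^{2}
  \;+\; \lambda \operatorname{Tr}\!\bigl(V L V^{\top}\bigr)
  \;+\; \mu \|V\|_{1}
```

Here the trace term keeps nearby data points nearby in the low-dimensional representation, and the $\mu\|V\|_1$ term is the sparsity constraint the abstract describes.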

4 citations


Proceedings ArticleDOI
25 Feb 2023
TL;DR: WISE as discussed by the authors is a machine learning framework that predicts the magnitude of the speedup of different sparse matrix-vector multiplication methods over a baseline method for a given sparse matrix.
Abstract: Sparse Matrix-Vector Multiplication (SpMV) is an essential sparse kernel. Numerous methods have been developed to accelerate SpMV. However, no single method consistently gives the highest performance across a wide range of matrices. For this reason, a performance prediction model is needed to predict the best SpMV method for a given sparse matrix. Unfortunately, predicting SpMV's performance is challenging due to the diversity of factors that impact it. In this work, we develop a machine learning framework called WISE that accurately predicts the magnitude of the speedups of different SpMV methods over a baseline method for a given sparse matrix. WISE relies on a novel feature set that summarizes a matrix's size, skew, and locality traits. WISE can then select the best SpMV method for each specific matrix. With a set of nearly 1,500 matrices, we show that using WISE delivers an average speedup of 2.4× over using Intel's MKL on a 24-core server.
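As a rough illustration of the WISE workflow (not the authors' code; the feature set and the per-method regressor interface below are assumptions), the selection step amounts to extracting structural features from the CSR arrays, asking one trained model per SpMV method for a predicted speedup, and keeping the argmax:

```python
import numpy as np

def extract_features(indptr, indices, n_rows, n_cols):
    """Hypothetical feature vector in the spirit of WISE: size, skew, and
    locality traits of a CSR matrix (not the paper's exact feature set)."""
    row_lens = np.diff(indptr)
    # Crude locality proxy: fraction of consecutive nonzeros with nearby columns.
    local = np.mean(np.abs(np.diff(indices)) <= 16) if indices.size > 1 else 0.0
    return np.array([
        n_rows, n_cols, indices.size,              # size traits
        row_lens.max() / max(row_lens.mean(), 1),  # skew: longest vs. average row
        local,                                     # locality trait
    ])

def pick_spmv_method(models, feats):
    """models: one sklearn-style regressor per SpMV method, each predicting
    speedup over the baseline; return the method with the best prediction."""
    preds = {name: float(m.predict(feats[None, :])[0]) for name, m in models.items()}
    return max(preds, key=preds.get)
```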

2 citations


Journal ArticleDOI
TL;DR: In this paper, a matrix reconstruction method was proposed to alleviate the sparsity of the input matrix by extracting category rates from the user-item matrix according to user preferences and organizing them into vectors.
Abstract: With the development of the Web, users spend more time accessing the information they seek. As a result, recommendation systems have emerged to provide users with preferred content by filtering abundant information, along with providing a means of exposing search results to users more effectively. These recommendation systems operate based on user reactions to items or on various user or item features. Because recommender systems operate according to user responses, recommendation results based on sparse datasets are known to be less reliable. Thus, we propose a method to alleviate the dataset sparsity and increase the accuracy of the prediction results by using item features together with user responses. A method based on the content-based filtering concept is proposed to extract category rates from the user-item matrix according to the user preferences and to organize these into vectors. Thereafter, we present a method to filter the user-item matrix using the extracted vectors and to regenerate the input matrix for collaborative filtering (CF). We compare the prediction results of our approach and conventional CF using the mean absolute error and root mean square error. Moreover, we calculate the sparsity of the regenerated matrix and the existing input matrix, and demonstrate that the regenerated matrix is denser than the existing one. By computing the Jaccard similarity between the item sets in the regenerated and existing matrices, we verify the matrix distinctions. The results of the proposed methods confirm that if the regenerated matrix is used as the CF input, a denser matrix with higher predictive accuracy can be constructed than with conventional methods. The validity of the proposed method was verified by analyzing the effect of an input matrix composed of high average ratings on the CF prediction performance. The low sparsity and high prediction accuracy of the proposed method are verified by comparison with conventional methods: improvements of approximately 16% with K-nearest neighbors and 15% with singular value decomposition are obtained, along with a threefold improvement in sparsity between the regenerated and original matrices. In summary, we propose a matrix reconstruction method that can improve the performance of recommendations.
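A minimal sketch of the "category rate" idea described above, assuming a dense ratings matrix and an item-to-category lookup (the names and the exact rule are illustrative, not the paper's):

```python
import numpy as np

def category_rate_vector(ratings_row, item_category, n_categories):
    """Fraction of a user's rated items that fall into each category --
    a minimal reading of the 'category rate' vectors described above."""
    rated = np.nonzero(ratings_row)[0]
    vec = np.zeros(n_categories)
    for item in rated:
        vec[item_category[item]] += 1.0
    return vec / max(len(rated), 1)

# One user who rated items 0, 2, 3; items map to categories [0, 1, 1, 0].
ratings = np.array([5.0, 0.0, 3.0, 4.0])
print(category_rate_vector(ratings, [0, 1, 1, 0], 2))  # [0.667 0.333]
```

These per-user vectors are what the paper then uses to filter the user-item matrix and regenerate the CF input.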

2 citations


Journal ArticleDOI
TL;DR: SuiteSparse:GraphBLAS as mentioned in this paper is a full parallel implementation of the GraphBLAS standard, which defines a set of sparse matrix operations on an extended algebra of semirings.
Abstract: SuiteSparse:GraphBLAS is a full parallel implementation of the GraphBLAS standard, which defines a set of sparse matrix operations on an extended algebra of semirings using an almost unlimited variety of operators and types. When applied to sparse adjacency matrices, these algebraic operations are equivalent to computations on graphs. A description of the parallel implementation of SuiteSparse:GraphBLAS is given, including its novel parallel algorithms for sparse matrix multiply, addition, element-wise multiply, submatrix extraction and assignment, and the GraphBLAS mask/accumulator operation. Its performance is illustrated by solving the graph problems in the GAP Benchmark and by comparing it with other sparse matrix libraries.
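The semiring idea is the heart of GraphBLAS: replacing (+, ×) with other operator pairs turns the same matrix-multiply skeleton into different graph computations. A self-contained numpy illustration of the (min, +) case, which performs one relaxation step of all-pairs shortest paths (plain dense code for clarity, not the library's implementation):

```python
import numpy as np

INF = np.inf

def min_plus_mxm(A, B):
    """Dense min-plus 'matrix multiply': C[i,j] = min_k A[i,k] + B[k,j].
    On distance matrices this is one step of all-pairs shortest paths --
    the kind of semiring swap that GraphBLAS generalizes and runs sparsely."""
    C = np.full((A.shape[0], B.shape[1]), INF)
    for k in range(A.shape[1]):
        C = np.minimum(C, A[:, [k]] + B[[k], :])
    return C

# Distance matrix of a 3-node path graph (INF = no edge, 0 on the diagonal).
D = np.array([[0, 1, INF],
              [1, 0, 2],
              [INF, 2, 0]], dtype=float)
print(min_plus_mxm(D, D))  # entry (0, 2) becomes 3: the two-hop path
```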

2 citations


Journal ArticleDOI
TL;DR: In this article, the authors proposed a pseudo-inverse-based hard thresholding sparse signal recovery algorithm, called PHT for short, which is O(n²/m²) times faster than both NSIHT and NSHTP, where m and n are the number of rows and columns of the sensing matrix.
Abstract: Acquiring a sparse signal from an underdetermined linear system arises from numerous applications. Several hard thresholding algorithms have been developed for the sparse reconstruction. In particular, the so-called Newton-Step-based Iterative Hard Thresholding (NSIHT) and Newton-Step-based Hard Thresholding Pursuit (NSHTP) algorithms were developed by Meng et al. recently. Although they are efficient, a faster sparse recovery algorithm with a better reconstruction performance is still needed. In this paper, we first propose a Pseudo-inverse-based Hard Thresholding sparse signal recovery algorithm, called PHT for short. Unlike the Iterative Hard Thresholding (IHT) algorithm, which uses the gradient of the objective function to iteratively update the solution, our proposed algorithm utilizes the pseudo-inverse of the sensing matrix to iteratively update the solution. We then analyze the computational complexity of PHT and show that it is O(n²/m²) times faster than both NSIHT and NSHTP if they perform the same number of iterations, where m and n are the number of rows and columns of the sensing matrix A. Furthermore, we establish a sufficient condition of stable recovery of the sparse signal with PHT by using the restricted isometry property (RIP) of the sensing matrix. Finally, extensive experiments are conducted which indicate that our proposed algorithm PHT is much faster than both NSIHT and NSHTP with overall better recovery performance.
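A sketch of the iteration as the abstract describes it, with IHT's gradient step replaced by a pseudo-inverse step (written from the description, not the authors' reference code):

```python
import numpy as np

def pht(A, y, s, iters=100):
    """Pseudo-inverse-based hard thresholding: iterate a pseudo-inverse
    correction step followed by keeping only the s largest entries."""
    A_pinv = np.linalg.pinv(A)              # computed once, reused each iteration
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        x = x + A_pinv @ (y - A @ x)        # pseudo-inverse update step
        small = np.argsort(np.abs(x))[:-s]  # hard threshold: zero all but s largest
        x[small] = 0.0
    return x
```

Computing pinv(A) once up front is what makes the per-iteration cost comparable to IHT despite the heavier operator.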

2 citations


Journal ArticleDOI
TL;DR: In this paper, a graph neural network, SPT-GCN, was proposed to select a suitable tensor sparse format for sparse tensor analysis on CPU-GPU heterogeneous hybrid systems, together with a parallel execution strategy for SpTTM in different sparse formats.
Abstract: Sparse Tensor-Times-Matrix (SpTTM) is the core calculation in tensor analysis. The sparse distributions of different tensors vary greatly, which poses a big challenge to designing an efficient and general SpTTM. In this paper, we describe SpTTM on CPU-GPU heterogeneous hybrid systems and give a parallel execution strategy for SpTTM in different sparse formats. We analyze the theoretical compute power and estimate the number of tasks needed to achieve load balancing between the CPU and the GPU of the heterogeneous system. We describe the sparse structure of a tensor as a graph and design a new graph neural network, SPT-GCN, to select a suitable tensor sparse format. Furthermore, we perform extensive experiments using real datasets to demonstrate the advantages and efficiency of our proposed input-aware slice-wise SpTTM. The experimental results show that our input-aware slice-wise SpTTM can achieve an average speedup of 1.310× compared to the ParTI! library on a CPU-GPU heterogeneous system.
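For readers unfamiliar with the kernel: SpTTM along the last mode computes Y(i,j,r) = Σ_k X(i,j,k)·U(k,r). A minimal COO-based reference in Python, purely illustrative (the paper's slice-wise CPU-GPU implementation is far more involved):

```python
import numpy as np

def spttm_mode3(coords, vals, U, shape):
    """Sparse Tensor-Times-Matrix along the last mode, straight from the
    definition: Y[i, j, :] += X[i, j, k] * U[k, :] for each nonzero.
    coords: (nnz, 3) indices; vals: (nnz,) values; U: (K, R) dense factor."""
    I, J, _ = shape
    Y = np.zeros((I, J, U.shape[1]))
    for (i, j, k), v in zip(coords, vals):
        Y[i, j, :] += v * U[k, :]   # one scaled row of U per nonzero
    return Y

coords = np.array([[0, 0, 1], [1, 2, 0]])   # two nonzeros of a 2x3x2 tensor
vals = np.array([2.0, 3.0])
print(spttm_mode3(coords, vals, np.eye(2), (2, 3, 2))[0, 0])  # [0. 2.]
```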

1 citation


Journal ArticleDOI
TL;DR: In this paper, a block diagonalization method is proposed to accelerate a Poisson solver by taking advantage of s spatial reflection symmetries, which simplifies the task of discretising complex geometries since it only requires meshing a portion of the domain that is then mirrored implicitly by the symmetries' hyperplanes.

1 citation


Journal ArticleDOI
TL;DR: In this article, it was shown that the low-rank plus sparse matrix set is closed provided the incoherence of the low-rank component is upper bounded as μ

1 citation


Proceedings ArticleDOI
27 Jan 2023
TL;DR: In this paper, a combination of three novel techniques for SpGEMM accelerators (Spada) is proposed to efficiently adapt to various sparse patterns, achieving an average 1.44× speedup across a wide range of sparse matrices and neural network models.
Abstract: Sparse matrix-matrix multiplication (SpGEMM) is widely used in many scientific and deep learning applications. The highly irregular structures of SpGEMM limit its performance and efficiency on conventional computation platforms, and thus motivate a large body of specialized hardware designs. Existing SpGEMM accelerators only support specific types of rigid execution dataflow such as inner/outer-product or row-based schemes. Each dataflow is only optimized for certain sparse patterns and fails to generalize with robust performance to the widely diverse SpGEMM workloads across various domains. We propose Spada, a combination of three novel techniques for SpGEMM accelerators to efficiently adapt to various sparse patterns. First, we describe a window-based adaptive dataflow that can be flexibly adapted to different modes to best match the data distributions and realize different reuse benefits. Then, our hardware architecture efficiently supports this dataflow template, with flexible, fast, and low-cost reconfigurability and effective load balancing features. Finally, we use a profiling-guided approach to detect the sparse pattern and determine the optimized dataflow mode to use, based on the key observations of sparse pattern similarity in nearby matrix regions. Our evaluation results demonstrate that Spada is able to match or exceed the best among three state-of-the-art SpGEMM accelerators, and avoid the performance degradation of the others if data distribution and dataflow mismatch. It achieves an average 1.44× speedup across a wide range of sparse matrices and compressed neural network models.

Proceedings ArticleDOI
09 Feb 2023
TL;DR: NeuKron as mentioned in this paper generalizes Kronecker products using a recurrent neural network with a constant number of parameters and reorders the rows and columns of the matrix to facilitate the approximation.
Abstract: Many real-world data are naturally represented as a sparse reorderable matrix, whose rows and columns can be arbitrarily ordered (e.g., the adjacency matrix of a bipartite graph). Storing a sparse matrix in conventional ways requires an amount of space linear in the number of non-zeros, and lossy compression of sparse matrices (e.g., Truncated SVD) typically requires an amount of space linear in the number of rows and columns. In this work, we propose NeuKron for compressing a sparse reorderable matrix into a constant-size space. NeuKron generalizes Kronecker products using a recurrent neural network with a constant number of parameters. NeuKron updates the parameters so that a given matrix is approximated by the product and reorders the rows and columns of the matrix to facilitate the approximation. The updates take time linear in the number of non-zeros in the input matrix, and the approximation of each entry can be retrieved in logarithmic time. We also extend NeuKron to compress sparse reorderable tensors (e.g., multi-layer graphs), which generalize matrices. Through experiments on ten real-world datasets, we show that NeuKron is (a) Compact: requiring up to five orders of magnitude less space than its best competitor with similar approximation errors, (b) Accurate: giving up to 10× smaller approximation error than its best competitors with similar size outputs, and (c) Scalable: successfully compressing a matrix with over 230 million non-zero entries.
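The classical building block NeuKron generalizes is the Kronecker-power model: store a tiny seed matrix and recover any entry of its repeated Kronecker product from digit-wise index arithmetic, giving constant-size storage and logarithmic-time entry lookup. A small sketch (the seed values are arbitrary examples):

```python
import numpy as np

B = np.array([[0.9, 0.5],
              [0.5, 0.1]])     # 2x2 seed
A_model = np.kron(B, B)        # 4x4 approximation built from only 4 parameters

def kron_entry(B, i, j, levels=2):
    """Entry (i, j) of the 'levels'-fold Kronecker power of B, computed in
    O(levels) time from the base-n digits of i and j -- the logarithmic
    entry lookup that NeuKron's recurrent model also provides."""
    n = B.shape[0]
    val = 1.0
    for _ in range(levels):
        val *= B[i % n, j % n]
        i //= n; j //= n
    return val

assert np.isclose(kron_entry(B, 3, 2), A_model[3, 2])
```

NeuKron replaces the fixed seed with an LSTM that emits entry values, so the model can fit matrices that a plain Kronecker power cannot.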

Proceedings ArticleDOI
25 Mar 2023
TL;DR: Dynamic reflexive tiling (DRT) as mentioned in this paper is a sparsity-aware tiling method that improves data reuse over prior art for sparse tensor kernels, unlocking significant performance improvement opportunities.
Abstract: Tensor algebra involving multiple sparse operands is severely memory bound, making it a challenging target for acceleration. Furthermore, irregular sparsity complicates traditional techniques—such as tiling—for ameliorating memory bottlenecks. Prior sparse tiling schemes are sparsity unaware: they carve tensors into uniform coordinate-space shapes, which leads to low-occupancy tiles and thus lower exploitable reuse. To address these challenges, this paper proposes dynamic reflexive tiling (DRT), a novel tiling method that improves data reuse over prior art for sparse tensor kernels, unlocking significant performance improvement opportunities. DRT’s key idea is dynamic sparsity-aware tiling. DRT continuously re-tiles sparse tensors at runtime based on the current sparsity of the active regions of all input tensors, to maximize accelerator buffer utilization while retaining the ability to co-iterate through tiles of distinct tensors. Through an extensive evaluation over a set of SuiteSparse matrices, we show how DRT can be applied to multiple prior accelerators with different dataflows (ExTensor, OuterSPACE, MatRaptor), improving their performance (by 3.3×, 5.1× and 1.6×, respectively) while adding negligible area overhead. We apply DRT to higher-order tensor kernels to reduce DRAM traffic by 3.9× and 16.9× over a CPU implementation and prior-art tiling scheme, respectively. Finally, we show that the technique is portable to software, with an improvement of 7.29× and 2.94× in memory overhead compared to untiled sparse-sparse matrix multiplication (SpMSpM).

Journal ArticleDOI
TL;DR: In this article, the authors provide a structured and comprehensive overview of research on sparse matrix-matrix multiplication (SpGEMM) and highlight future research directions that encourage better designs and implementations.
Abstract: General Sparse Matrix-Matrix Multiplication (SpGEMM) has attracted much attention from researchers in graph analysis, scientific computing, and deep learning. Many optimization techniques have been developed for different applications and computing architectures over the past decades. The objective of this article is to provide a structured and comprehensive overview of research on SpGEMM. Existing studies have been grouped into different categories based on target architectures and design choices. Covered topics include typical applications, compression formats, general formulations, key problems and techniques, architecture-oriented optimizations, and programming models. The rationales of different algorithms are analyzed and summarized. This survey reflects the progress of SpGEMM research up to 2021. Moreover, a thorough performance comparison of existing implementations is presented. Based on our findings, we highlight future research directions, which encourage better designs and implementations in later studies.

Journal ArticleDOI
TL;DR: In this article, a mixed atomic matrix norm was proposed to promote low-rank matrices with sparse factors, and the authors showed that convex programs with this norm as a regularizer provide near-optimal sample complexity and error rate guarantees for sparse phase retrieval and sparse PCA.
Abstract: We present novel analysis and algorithms for solving sparse phase retrieval and sparse principal component analysis (PCA) with convex lifted matrix formulations. The key innovation is a new mixed atomic matrix norm that, when used as regularization, promotes low-rank matrices with sparse factors. We show that convex programs with this atomic norm as a regularizer provide near-optimal sample complexity and error rate guarantees for sparse phase retrieval and sparse PCA. While we do not know how to solve the convex programs exactly with an efficient algorithm, for the phase retrieval case we carefully analyze the program and its dual and thereby derive a practical heuristic algorithm. We show empirically that this practical algorithm performs similarly to existing state-of-the-art algorithms.
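For context, the standard lifting behind such convex formulations: quadratic phase retrieval measurements of $x$ become linear measurements of the rank-one matrix $X = xx^{*}$, so the problem turns into a convex matrix program in which the paper's mixed atomic norm plays the role of the generic regularizer $\mathcal{R}$ below (this is the textbook lifting, not the paper's exact program):

```latex
b_i = \bigl|\langle a_i, x\rangle\bigr|^{2}
    = \bigl\langle a_i a_i^{*},\, x x^{*}\bigr\rangle,
\qquad
\min_{X \succeq 0}\ \mathcal{R}(X)
\quad \text{s.t.} \quad \bigl\langle a_i a_i^{*},\, X\bigr\rangle = b_i,
\quad i = 1, \dots, m
```

A regularizer that promotes rank-one $X$ with a sparse factor recovers exactly the structure sparse phase retrieval and sparse PCA require.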

Journal ArticleDOI
TL;DR: Zhang et al. as discussed by the authors proposed the concept of consensus sparsity (Con-sparsity) and correspondingly built a multi-context sparse image representation (MCSIR) framework to realize it.
Abstract: Sparsity is an attractive property that has been widely and intensively utilized in various image processing fields (e.g., robust image representation, image compression, and image analysis). Its success owes to the exhaustive mining of the intrinsic (or homogeneous) information from data carrying redundant information. From the perspective of image representation, sparsity can successfully find an underlying homogeneous subspace from a collection of training data to represent a given test sample. The famous sparse representation (SR) and its variants embed sparsity by representing the test sample as a linear combination of training samples with $L_0$-norm and $L_1$-norm regularization. However, although these state-of-the-art methods achieve powerful and robust performance, sparsity is not fully exploited in image representation in three respects: 1) the within-sample sparsity, 2) the between-sample sparsity, and 3) the image structural sparsity. In this paper, to make the above multi-context sparsity properties agree and be learned simultaneously in one model, we propose the concept of consensus sparsity (Con-sparsity) and correspondingly build a multi-context sparse image representation (MCSIR) framework to realize it. We theoretically prove that the consensus sparsity can be achieved by the $L_\infty$-induced matrix variate based on Bayesian inference. Extensive experiments and comparisons with state-of-the-art methods (including deep learning) demonstrate the promising performance and properties of the proposed consensus sparsity.

Posted ContentDOI
07 Mar 2023
TL;DR: In this article, a nearly optimal explicitly sparse representation for oscillatory kernels is presented by developing a curvelet-based method: multilevel curvelet-like functions are constructed as a transform of the original nodal basis, and the system matrix is derived in a new non-standard form with respect to the curvelet basis.
Abstract: A nearly optimal explicitly sparse representation for oscillatory kernels is presented in this work by developing a curvelet-based method. Multilevel curvelet-like functions are constructed as the transform of the original nodal basis. The system matrix in a new non-standard form is then derived with respect to the curvelet basis, which is nearly optimally sparse due to the directional low-rank property of the oscillatory kernel. Its sparsity is further enhanced via a posteriori compression. Finally, its nearly optimal log-linear computational complexity with controllable accuracy is demonstrated with numerical results. This explicitly sparse representation is expected to lay the groundwork for future work on fast direct solvers and effective preconditioners for high-frequency problems. It may also be viewed as a generalization of wavelet-based methods to high-frequency cases, and used as a new wideband fast algorithm for wave problems.

Journal ArticleDOI
TL;DR: Zhang et al. as mentioned in this paper proposed a stripe removal model based on a variable weight coefficient and group sparse regularization, which can completely eliminate complex stripes with varying levels of strength.
Abstract: Striping effects are common phenomena in remote sensing images, and they significantly limit subsequent applications. Although many destriping approaches have been developed, few can completely eliminate complex stripes with varying levels of strength. To address this issue, we propose a stripe removal model based on variable weight coefficients and group sparse regularization. Specifically, rather than the single scalar used for the stripes in most approaches, different weights are set for different stripe rows to estimate stripes with varying intensities, and an adaptive method to estimate the weight matrix is proposed. On the other hand, group sparsity regularization is employed to constrain the entire stripe component, and region weights are designed for regions with different stripe characteristics. The alternating direction method of multipliers is employed to solve the proposed model by alternating minimization. Experimental results on simulated and real data demonstrate that the proposed model outperforms other advanced methods in terms of stripe noise removal and image detail preservation.
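One plausible form of such a model, assuming the observed image $f$ decomposes into a clean image $u$ and a stripe component $s$, with per-row weights $w_r$ and a weighted $\ell_{2,1}$ group-sparsity penalty over stripe rows (the paper's exact terms may differ):

```latex
f = u + s, \qquad
\min_{u,\, s}\ \mathcal{R}(u)
  \;+\; \lambda \sum_{r} w_{r}\, \bigl\| s_{r,\cdot} \bigr\|_{2}
```

Here $\mathcal{R}(u)$ stands for a smoothness prior on the clean image, and the row-wise $\ell_2$ norms let whole stripe rows be suppressed or retained together, which is what "group" sparsity means in this setting.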

Proceedings ArticleDOI
17 Jun 2023
TL;DR: SPADE as discussed by the authors is a new SpMM and SDDMM hardware accelerator that avoids data transfers by tightly coupling accelerator processing elements (PEs) with the cores of a multicore, as if the accelerator PEs were advanced functional units.
Abstract: The widespread use of Sparse Matrix Dense Matrix Multiplication (SpMM) and Sampled Dense Matrix Dense Matrix Multiplication (SDDMM) kernels makes them candidates for hardware acceleration. However, accelerator design for these kernels faces two main challenges: (1) the overhead of moving data between CPU and accelerator (often including an address space conversion from the CPU's virtual addresses) and (2) marginal flexibility to leverage the fact that different sparse input matrices benefit from different variations of the SpMM and SDDMM algorithms. To address these challenges, this paper proposes SPADE, a new SpMM and SDDMM hardware accelerator. SPADE avoids data transfers by tightly coupling accelerator processing elements (PEs) with the cores of a multicore, as if the accelerator PEs were advanced functional units---allowing the accelerator to reuse the CPU memory system and its virtual addresses. SPADE attains flexibility and programmability by supporting a tile-based ISA---high level enough to eliminate the overhead of fetching and decoding fine-grained instructions. To prove the SPADE concept, we have taped out a simplified SPADE chip. Further, simulations of a SPADE system with 224--1792 PEs show its high performance and scalability. A 224-PE SPADE system is on average 2.3x, 1.3x and 2.5x faster than a 56-core CPU, a server-class GPU, and an SpMM accelerator, respectively, without accounting for the host-accelerator data transfer overhead. If such overhead is taken into account, the 224-PE SPADE system is on average 43.4x and 52.4x faster than the GPU and the accelerator, respectively. Further, SPADE has a small area and power footprint.

Journal ArticleDOI
TL;DR: In this article, a dynamic sparsity-aware mapping scheme generation method was proposed that models the problem with a sequential decision-making model and optimizes it with the reinforcement learning (RL) algorithm REINFORCE.
Abstract: The sparse representation of graphs has shown great potential for accelerating the computation of graph applications (e.g., social networks and knowledge graphs) on traditional computing architectures (CPU, GPU, or TPU). However, the exploration of large-scale sparse graph computing on processing-in-memory (PIM) platforms (typically with memristive crossbars) is still in its infancy. To implement the computation or storage of large-scale or batch graphs on memristive crossbars, a natural assumption is that a large-scale crossbar is required, albeit with low utilization. Some recent works question this assumption; to avoid wasting storage and computational resources, fixed-size or progressively scheduled "block partition" schemes have been proposed. However, these methods are coarse-grained or static, and are not effectively sparsity-aware. This work proposes a dynamic sparsity-aware mapping scheme generation method that models the problem with a sequential decision-making model and optimizes it with the reinforcement learning (RL) algorithm REINFORCE. Our long short-term memory (LSTM) generating model, combined with the dynamic-fill scheme, achieves remarkable mapping performance on small-scale graph/matrix data (a complete mapping costs 43% of the original matrix area) and on two large-scale matrices (22.5% area on qh882 and 17.1% area on qh1484). Our method may be extended to sparse graph computing on other PIM architectures, not limited to memristive device-based platforms.
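For context, REINFORCE updates the parameters θ of the generating policy along the standard score-function gradient, with a baseline b to reduce variance; the mapping-specific states, actions, and reward are defined by the paper:

```latex
\nabla_{\theta} J(\theta)
  = \mathbb{E}_{\tau \sim \pi_{\theta}}
    \Bigl[\, \sum_{t} \nabla_{\theta} \log \pi_{\theta}(a_t \mid s_t)\,
    \bigl(R(\tau) - b\bigr) \Bigr]
```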

Journal ArticleDOI
TL;DR: CSR-k as mentioned in this paper is a heterogeneous sparse matrix format for sparse matrix-vector multiplication, based on compressed sparse row (CSR), that can be tuned quickly and outperforms the average performance of Intel MKL on Intel Xeon Platinum 838 and AMD Epyc 7742 CPUs, while also outperforming NVIDIA's cuSPARSE and Sandia National Laboratories' KokkosKernels on NVIDIA A100 and V100 GPUs for regular sparse matrices.

Book ChapterDOI
01 Jan 2023
TL;DR: In this article, the authors introduce the parallel sparse matrix-vector computation pattern and present a kernel based on compressed sparse row (CSR) data storage for sparse matrices.
Abstract: This chapter introduces the parallel sparse matrix-vector computation pattern. It starts with the basic concepts of sparse matrices, which are matrices in which most of the elements are zeros. It introduces the coordinate list format as a flexible representation that does not store zero matrix elements. It then introduces a kernel based on compressed sparse row data storage for sparse matrices. The ELL format with data padding is then introduced as a regularization technique for improved memory coalescing. Finally, the jagged diagonal storage format based on sorting is introduced to smooth out variation from one row to the next row, allowing further reduction of control divergence and padding overhead in the regularization process. The chapter shows that the same sparse matrix kernel code can exhibit very different performance behavior on different datasets.
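The CSR kernel the chapter describes fits in a few lines; here is a sequential Python transcription of the classic loop for reference (the chapter itself targets parallel kernels):

```python
import numpy as np

def spmv_csr(values, col_idx, row_ptr, x):
    """Textbook CSR sparse matrix-vector product:
    row_ptr[i]:row_ptr[i+1] delimits the nonzeros of row i."""
    y = np.zeros(len(row_ptr) - 1)
    for i in range(len(y)):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += values[k] * x[col_idx[k]]
    return y

# The 3x3 matrix [[1,0,2],[0,0,3],[4,5,0]] in CSR form:
values  = np.array([1., 2., 3., 4., 5.])
col_idx = np.array([0, 2, 2, 0, 1])
row_ptr = np.array([0, 2, 3, 5])
print(spmv_csr(values, col_idx, row_ptr, np.array([1., 1., 1.])))  # [3. 3. 9.]
```

The variation in inner-loop trip count from row to row is exactly the control divergence and padding issue that the ELL and JDS formats discussed above are designed to mitigate.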

Proceedings ArticleDOI
01 Feb 2023
TL;DR: GROW as discussed by the authors is a row-wise product based sparse-dense GEMM accelerator for GCNs, which reduces the average memory traffic by 2× and achieves an average 2.8× and 2.3× improvement in performance and energy-efficiency, respectively.
Abstract: Graph convolutional neural networks (GCNs) have emerged as a key technology in various application domains where the input data is relational. A unique property of GCNs is that their two primary execution stages, aggregation and combination, exhibit drastically different dataflows. Consequently, prior GCN accelerators tackle this research space by casting the aggregation and combination stages as a series of sparse-dense matrix multiplications. However, prior work frequently suffers from inefficient data movement, leaving significant performance on the table. We present GROW, a GCN accelerator that uses Gustavson's algorithm to architect a row-wise-product-based sparse-dense GEMM datapath. GROW co-designs the software/hardware to strike a balance between locality and parallelism for GCNs, reducing the average memory traffic by 2× and achieving an average 2.8× and 2.3× improvement in performance and energy efficiency, respectively.
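Gustavson's row-wise dataflow, which GROW builds on, computes each output row as a sparse combination of rows of B, one scaled row per nonzero of A. A compact dictionary-based sketch (illustrative data structures, not the accelerator's):

```python
def spgemm_rowwise(A_rows, B_rows):
    """Gustavson's row-wise product: row i of C accumulates a_ik * B[k, :]
    for every nonzero A[i, k]. Rows are {col: value} dicts for brevity."""
    C_rows = []
    for a_row in A_rows:
        acc = {}
        for k, a_ik in a_row.items():
            for j, b_kj in B_rows[k].items():
                acc[j] = acc.get(j, 0.0) + a_ik * b_kj
        C_rows.append(acc)
    return C_rows

A = [{0: 2.0}, {0: 1.0, 1: 3.0}]   # 2x2 sparse matrix, dict-of-rows
B = [{1: 4.0}, {0: 5.0}]
print(spgemm_rowwise(A, B))        # [{1: 8.0}, {1: 4.0, 0: 15.0}]
```

Because each output row touches only rows of B indexed by one row of A, this dataflow trades the all-pairs reuse of outer-product schemes for much better output locality.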

Posted ContentDOI
09 May 2023
TL;DR: In this article, the authors enhance an existing memory-streaming RISC-V ISA extension to accelerate both one- and two-sided operand sparsity on widespread sparse tensor formats such as compressed sparse row (CSR) and compressed sparse fiber (CSF) by accelerating the underlying operations of streaming indirection, intersection, and union.
Abstract: Sparse linear algebra is crucial in many application domains, but challenging to handle efficiently in both software and hardware, with one- and two-sided operand sparsity handled with distinct approaches. In this work, we enhance an existing memory-streaming RISC-V ISA extension to accelerate both one- and two-sided operand sparsity on widespread sparse tensor formats like compressed sparse row (CSR) and compressed sparse fiber (CSF) by accelerating the underlying operations of streaming indirection, intersection, and union. Our extensions enable single-core speedups over an optimized RISC-V baseline of up to 7.0x, 7.7x, and 9.8x on sparse-dense multiply, sparse-sparse multiply, and sparse-sparse addition, respectively, and peak FPU utilizations of up to 80% on sparse-dense problems. On an eight-core cluster, sparse-dense and sparse-sparse matrix-vector multiply using real-world matrices are up to 5.0x and 5.9x faster and up to 2.9x and 3.0x more energy efficient. We explore further applications for our extensions, such as stencil codes and graph pattern matching. Compared to recent CPU, GPU, and accelerator approaches, our extensions enable higher flexibility on data representation, degree of sparsity, and dataflow at a minimal hardware footprint, adding only 1.8% in area to a compute cluster. A cluster with our extensions running CSR matrix-vector multiplication achieves 69x and 2.8x higher peak floating-point utilizations than recent CPU and GPU software, respectively.
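Of the three underlying operations named above, intersection is the one that dominates sparse-sparse products: a two-pointer walk over two sorted index streams. A software sketch of what the extension streams in hardware (assuming CSR-style sorted indices):

```python
def sparse_dot(idx_a, val_a, idx_b, val_b):
    """Sparse-sparse dot product by two-pointer intersection of two
    sorted index streams; only matching indices contribute."""
    i = j = 0
    acc = 0.0
    while i < len(idx_a) and j < len(idx_b):
        if idx_a[i] == idx_b[j]:
            acc += val_a[i] * val_b[j]; i += 1; j += 1
        elif idx_a[i] < idx_b[j]:
            i += 1
        else:
            j += 1
    return acc

print(sparse_dot([0, 3, 7], [1., 2., 3.], [3, 5, 7], [4., 5., 6.]))  # 2*4 + 3*6 = 26.0
```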

Proceedings ArticleDOI
27 Jan 2023
TL;DR: In this paper, a modified version of run-length encoding is proposed that scales aptly with the matrix size and sparsity density of the binary sparse matrix and performs better than many state-of-the-art algorithms.
Abstract: Binary sparse matrix storage is one of the critical problems in embedded system applications, and storing these matrices in memory efficiently is important. An increase in matrix size also has a significant impact on the memory requirement, and sometimes it may not be possible to accommodate the data due to memory constraints. In this work, we analyze some of the state-of-the-art methods deployed for storing these matrices in the system's on-chip memory and demonstrate the shortcomings of each. We therefore propose a modified version of run-length encoding that scales aptly with the matrix size and the sparsity density of the binary sparse matrix. Through simulations, we show that the proposed method performs better than many state-of-the-art algorithms.
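For reference, plain run-length encoding of a binary matrix's row-major bitstream is the baseline the proposed modification improves on; a minimal sketch (the authors' size- and density-aware variant is not reproduced here):

```python
import numpy as np

def rle_encode(matrix):
    """Plain run-length encoding of a binary matrix's row-major bitstream:
    store the lengths of alternating runs, starting with a run of 0s."""
    bits = np.asarray(matrix, dtype=np.uint8).ravel()
    runs, current, length = [], 0, 0
    for b in bits:
        if b == current:
            length += 1
        else:
            runs.append(length); current = b; length = 1
    runs.append(length)
    return runs

M = [[0, 0, 1, 1],
     [1, 0, 0, 0]]
print(rle_encode(M))  # [2, 3, 3]: two 0s, three 1s, three 0s
```

Plain RLE degrades when runs are short (dense regions) or extremely long (huge matrices), which is the scaling behavior the paper's modification targets.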

Journal ArticleDOI
TL;DR: In this paper , an adaptive real-time sensing method for microseismic monitoring from the perspective of systematic design was proposed in which the authors first analyzed noise and signal structure characteristics to construct an over-complete learning dictionary.
Abstract: The large amount of monitoring data has posed enormous challenges to the quick response to and accurate analysis of microseismic events. Compressed sensing (CS) has the advantages of low resource cost, high efficiency, and an excellent data compression ratio (CR) over conventional sensing methods. However, there are still issues to be addressed for its application: 1) the poor quality and complex structure of the signals significantly increase the difficulty of maintaining satisfactory efficiency; 2) the systematic design of the sparse dictionary and the measurement matrix for microseismic signal CS remains immature; and 3) conventional recovery algorithms require prior knowledge of the signal sparsity, which is hardly possible to know or estimate in practice. Therefore, an adaptive real-time sensing method for microseismic monitoring is proposed in this work from the perspective of systematic design. We first analyzed the noise and signal structure characteristics to construct an over-complete learning dictionary. Second, according to the learned dictionary, we analyzed the key performance factors of random projection through comparisons between different matrices. Third, we explored the relationship between the signal sparsity and the residual energy decay during data recovery with greedy pursuit algorithms, and then presented an energy-ratio-based sparsity-adaptive matching algorithm. Finally, we carried out a performance evaluation of the proposed real-time sensing method with synthetic signals and field monitoring data.
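A sketch of the flavor of greedy recovery with a residual-energy-ratio stopping rule, in the spirit of the sparsity-adaptive algorithm described above; this is plain orthogonal matching pursuit with an assumed stopping threshold, not the paper's exact method:

```python
import numpy as np

def omp_energy_ratio(D, y, ratio=1e-3, max_atoms=50):
    """Orthogonal matching pursuit that stops once the residual-to-signal
    energy ratio falls below a threshold, instead of needing the sparsity
    level up front. D: dictionary with unit-norm columns; y: signal."""
    residual, support = y.copy(), []
    y_energy = float(y @ y)
    coef = np.zeros(0)
    while len(support) < max_atoms:
        if float(residual @ residual) / y_energy < ratio:
            break                                    # residual energy is negligible
        support.append(int(np.argmax(np.abs(D.T @ residual))))  # best-matching atom
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef          # re-project orthogonally
    x = np.zeros(D.shape[1])
    x[support] = coef
    return x
```

The energy-ratio test makes the number of selected atoms adapt to each signal, which is the practical point of avoiding a fixed sparsity parameter.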