Optimizing sparse tensor times matrix on multi-core and many-core architectures
Jiajia Li, Yuchen Ma, Chenggang Yan, Richard Vuduc, and 3 more
pp. 26-33
TL;DR
The paper presents an optimized design and implementation of sparse tensor-times-dense matrix multiply (SpTTM) for CPU and GPU platforms, a primitive that is a critical bottleneck in data analysis and mining applications based on tensor methods such as the Tucker decomposition.
Abstract
This paper presents the optimized design and implementation of sparse tensor-times-dense matrix multiply (SpTTM) for CPU and GPU platforms. This primitive is a critical bottleneck in data analysis and mining applications based on tensor methods, such as the Tucker decomposition. We first design and implement a sequential SpTTM that avoids the explicit data transformation between a tensor and a matrix required by the conventional approach. We further optimize SpTTM on multicore CPU and GPU systems by parallelizing, avoiding locks, and exploiting data locality. Our sequential SpTTM is up to 3.5× faster than the SpTTM from the Tensor Toolbox and 1.5× faster than that from the Cyclops Tensor Framework. Our parallel algorithms achieve 4.1× speedup on a multicore Intel Core i7 and 18.8× speedup on an NVIDIA K40c GPU over our sequential SpTTM, respectively.
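The abstract's central primitive, a mode-n product of a sparse tensor with a dense factor matrix, can be sketched directly on a COO (coordinate-format) tensor without first matricizing it, which is the data transformation the paper avoids. The following is a minimal illustrative sketch, not the paper's implementation; all function and variable names here are assumptions of this sketch. The output is semi-sparse: dense only along the product mode, as the paper's SpTTM produces.

```python
import numpy as np

def spttm(inds, vals, U, mode):
    """Sparse tensor-times-dense matrix on a COO tensor (illustrative sketch).

    inds: list of index tuples for the nonzeros of the sparse tensor X
    vals: matching list of nonzero values
    U:    dense factor matrix of shape (X.shape[mode], R)
    Contracts mode `mode` of X against the rows of U and returns a dict
    mapping each remaining index tuple to a dense length-R fiber.
    """
    R = U.shape[1]
    out = {}
    for idx, v in zip(inds, vals):
        key = idx[:mode] + idx[mode + 1:]   # drop the contracted mode
        fiber = out.setdefault(key, np.zeros(R))
        fiber += v * U[idx[mode]]           # accumulate v * corresponding row of U
    return out
```

Because nonzeros sharing the same remaining indices accumulate into one fiber, a parallel version must either privatize these fibers per thread or synchronize updates, which is the lock-avoidance concern the abstract mentions.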
Citations
Posted Content
Extending Sparse Tensor Accelerators to Support Multiple Compression Formats
Eric Qin, Geonhwa Jeong, William Won, Sheng-Chun Kao, Hyoukjun Kwon, Sudarshan Srinivasan, Dipankar Das, Gordon E. Moon, Sivasankaran Rajamanickam, Tushar Krishna, and 9 more
TL;DR: This work proposes hardware extensions to accelerators for supporting numerous format combinations seamlessly and demonstrates ~4× speedup over performing format conversions in software.
Proceedings ArticleDOI
cuTensor-tubal: Optimized GPU Library for Low-tubal-rank Tensors
Tao Zhang, Xiao-Yang Liu, and 1 more
TL;DR: A BLAS-like library for the low-tubal-rank tensor model, called cuTensor-tubal, is developed and optimized; it includes efficient GPU primitives for tensor operations and key processes.
Proceedings ArticleDOI
Space-Efficient k-d Tree-Based Storage Format for Sparse Tensors
TL;DR: This paper presents a new storage format for sparse tensors, called the succinct k-d tree-based tensor (SKTB) format, along with a parallel space-efficient algorithm for converting tensors to the SKTB format.
Journal ArticleDOI
Optimizing the Linear Fascicle Evaluation Algorithm for Multi-core and Many-core Systems
Karan Aggarwal, Uday Bondhugula, and 1 more
TL;DR: In this article, target-independent optimizations are proposed for the SpMV operations of the Linear Fascicle Evaluation (LiFE) algorithm decomposed using the STD technique, followed by target-specific optimizations for CPU and GPU systems.
Journal ArticleDOI
swTensor: Accelerating Tensor Decomposition on Sunway Architecture
TL;DR: Wang et al. propose swTensor, which adapts the canonical polyadic (CP) decomposition to the Sunway processor by leveraging the MapReduce framework for automatic parallelization and the unique Sunway architecture for high performance.
References
Journal ArticleDOI
Tensor Decompositions and Applications
Tamara G. Kolda, Brett W. Bader, and 1 more
TL;DR: This survey provides an overview of higher-order tensor decompositions, their applications, and available software.
Proceedings Article
Toward an architecture for never-ending language learning
Andrew Carlson, Justin Betteridge, Bryan Kisiel, Burr Settles, Estevam R. Hruschka, Tom M. Mitchell, and 5 more
TL;DR: This work proposes an approach and a set of design principles for an intelligent computer agent that runs forever, and describes a partial implementation of such a system that has already learned to extract a knowledge base containing over 242,000 beliefs.
Journal ArticleDOI
Predicting Human Brain Activity Associated with the Meanings of Nouns
Tom M. Mitchell, Svetlana V. Shinkareva, Andrew Carlson, Kai-Min Chang, Vicente L. Malave, Robert A. Mason, Marcel Adam Just, and 6 more
TL;DR: A computational model is presented that predicts the functional magnetic resonance imaging (fMRI) neural activation associated with words for which fMRI data are not yet available, trained with a combination of data from a trillion-word text corpus and observed fMRI data associated with viewing several dozen concrete nouns.
Posted Content
Tensor decompositions for learning latent variable models
TL;DR: A detailed analysis of a robust tensor power method is provided, establishing an analogue of Wedin's perturbation theorem for the singular vectors of matrices, which implies a robust and computationally tractable estimation approach for several popular latent variable models.
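The tensor power method named in this reference generalizes matrix power iteration: for a symmetric third-order tensor T, repeatedly apply the map v ↦ T(I, v, v) and normalize. The following is a minimal sketch, not the paper's robust variant (which adds deflation and restarts); the function name and parameters are assumptions of this sketch.

```python
import numpy as np

def tensor_power_iteration(T, n_iter=100, seed=0):
    """Plain power iteration on a symmetric 3rd-order tensor T of shape (d, d, d).

    Returns an estimated eigenvalue/eigenvector pair (lam, v) with
    T(I, v, v) ≈ lam * v after convergence.
    """
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(T.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(n_iter):
        # Contract two modes of T against v: w_i = sum_{j,k} T[i,j,k] v_j v_k
        w = np.einsum('ijk,j,k->i', T, v, v)
        v = w / np.linalg.norm(w)
    # Eigenvalue estimate: lam = T(v, v, v)
    lam = np.einsum('ijk,i,j,k->', T, v, v, v)
    return lam, v
```

For an orthogonally decomposable tensor T = Σ λ_i v_i⊗v_i⊗v_i, iterates converge to one of the robust eigenvectors v_i; the cited analysis quantifies how this behaves under perturbation of T.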
Journal ArticleDOI
Tensor decompositions for learning latent variable models
TL;DR: In this article, the authors consider a wide class of latent variable models, including Gaussian mixture models, hidden Markov models, and latent Dirichlet allocation, and exploit a certain tensor structure in their low-order observable moments (typically of second and third order).