Optimizing sparse tensor times matrix on multi-core and many-core architectures
Jiajia Li, Yuchen Ma, Chenggang Yan, Richard Vuduc, and 3 more
pp. 26-33
TL;DR
The paper presents an optimized design and implementation of sparse tensor-times-dense matrix multiply (SpTTM) for CPU and GPU platforms, a primitive that is a critical bottleneck in data analysis and mining applications based on tensor methods such as the Tucker decomposition.
Abstract
This paper presents the optimized design and implementation of sparse tensor-times-dense matrix multiply (SpTTM) for CPU and GPU platforms. This primitive is a critical bottleneck in data analysis and mining applications based on tensor methods, such as the Tucker decomposition. We first design and implement a sequential SpTTM that avoids the explicit data transformation between a tensor and a matrix required by the conventional approach. We further optimize SpTTM on multicore CPU and GPU systems by parallelizing, avoiding locks, and exploiting data locality. Our sequential SpTTM is up to 3.5× faster than the SpTTM from the Tensor Toolbox and 1.5× faster than that from the Cyclops Tensor Framework. Our parallel algorithms achieve 4.1× speedup on a multicore Intel Core i7 and 18.8× speedup on an NVIDIA K40c GPU over our sequential SpTTM, respectively.
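The abstract's central primitive, a mode-n product of a sparse tensor with a dense factor matrix, can be sketched directly on a COO (coordinate-format) tensor without first matricizing it, which is the data transformation the paper avoids. The following is a minimal illustrative sketch, not the paper's implementation; all function and variable names here are assumptions of this sketch. The output is semi-sparse: dense only along the product mode, as the paper's SpTTM produces.

```python
import numpy as np

def spttm(inds, vals, U, mode):
    """Sparse tensor-times-dense matrix on a COO tensor (illustrative sketch).

    inds: list of index tuples for the nonzeros of the sparse tensor X
    vals: matching list of nonzero values
    U:    dense factor matrix of shape (X.shape[mode], R)
    Contracts mode `mode` of X against the rows of U and returns a dict
    mapping each remaining index tuple to a dense length-R fiber.
    """
    R = U.shape[1]
    out = {}
    for idx, v in zip(inds, vals):
        key = idx[:mode] + idx[mode + 1:]   # drop the contracted mode
        fiber = out.setdefault(key, np.zeros(R))
        fiber += v * U[idx[mode]]           # accumulate v * corresponding row of U
    return out
```

Because nonzeros sharing the same remaining indices accumulate into one fiber, a parallel version must either privatize these fibers per thread or synchronize updates, which is the lock-avoidance concern the abstract mentions.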
Citations
Posted Content
Extending Sparse Tensor Accelerators to Support Multiple Compression Formats
Eric Qin, Geonhwa Jeong, William Won, Sheng-Chun Kao, Hyoukjun Kwon, Sudarshan Srinivasan, Dipankar Das, Gordon E. Moon, Sivasankaran Rajamanickam, Tushar Krishna, and 9 more
TL;DR: This work proposes hardware extensions to accelerators for supporting numerous format combinations seamlessly and demonstrates ~4× speedup over performing format conversions in software.
Proceedings ArticleDOI
cuTensor-tubal: Optimized GPU Library for Low-tubal-rank Tensors
Tao Zhang, Xiao-Yang Liu, and 1 more
TL;DR: A BLAS-like library for the low-tubal-rank tensor model, called cuTensor-tubal, is developed and optimized; it includes efficient GPU primitives for tensor operations and key processes.
Proceedings ArticleDOI
Space-Efficient k-d Tree-Based Storage Format for Sparse Tensors
TL;DR: This paper presents a new storage format for sparse tensors, called the succinct k-d tree-based tensor (SKTB) format, along with a parallel space-efficient algorithm for converting tensors to the SKTB format.
Journal ArticleDOI
Optimizing the Linear Fascicle Evaluation Algorithm for Multi-core and Many-core Systems
Karan Aggarwal, Uday Bondhugula, and 1 more
TL;DR: In this article, target-independent optimizations are proposed for the SpMV operations of the Linear Fascicle Evaluation (LiFE) algorithm decomposed using the STD technique, followed by target-specific optimizations for CPU and GPU systems.
Journal ArticleDOI
swTensor: Accelerating Tensor Decomposition on Sunway Architecture
TL;DR: Wang et al. propose swTensor, which adapts the canonical polyadic (CP) decomposition to the Sunway processor by leveraging the MapReduce framework for automatic parallelization and the unique Sunway architecture for high performance.
References
Journal ArticleDOI
Tensor Decompositions and Applications
Tamara G. Kolda, Brett W. Bader, and 1 more
TL;DR: This survey provides an overview of higher-order tensor decompositions, their applications, and available software.
Proceedings Article
Toward an architecture for never-ending language learning
Andrew Carlson, Justin Betteridge, Bryan Kisiel, Burr Settles, Estevam R. Hruschka, Tom M. Mitchell, and 5 more
TL;DR: This work proposes an approach and a set of design principles for an intelligent computer agent that runs forever, and describes a partial implementation of such a system that has already learned to extract a knowledge base containing over 242,000 beliefs.
Journal ArticleDOI
Predicting Human Brain Activity Associated with the Meanings of Nouns
Tom M. Mitchell, Svetlana V. Shinkareva, Andrew Carlson, Kai-Min Chang, Vicente L. Malave, Robert A. Mason, Marcel Adam Just, and 6 more
TL;DR: A computational model is presented that predicts the functional magnetic resonance imaging (fMRI) neural activation associated with words for which fMRI data are not yet available, trained with a combination of data from a trillion-word text corpus and observed fMRI data associated with viewing several dozen concrete nouns.
Posted Content
Tensor decompositions for learning latent variable models
TL;DR: A detailed analysis of a robust tensor power method is provided, establishing an analogue of Wedin's perturbation theorem for the singular vectors of matrices, which implies a robust and computationally tractable estimation approach for several popular latent variable models.
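The tensor power method named in this reference generalizes matrix power iteration: for a symmetric third-order tensor T, repeatedly apply the map v ↦ T(I, v, v) and normalize. The following is a minimal sketch, not the paper's robust variant (which adds deflation and restarts); the function name and parameters are assumptions of this sketch.

```python
import numpy as np

def tensor_power_iteration(T, n_iter=100, seed=0):
    """Plain power iteration on a symmetric 3rd-order tensor T of shape (d, d, d).

    Returns an estimated eigenvalue/eigenvector pair (lam, v) with
    T(I, v, v) ≈ lam * v after convergence.
    """
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(T.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(n_iter):
        # Contract two modes of T against v: w_i = sum_{j,k} T[i,j,k] v_j v_k
        w = np.einsum('ijk,j,k->i', T, v, v)
        v = w / np.linalg.norm(w)
    # Eigenvalue estimate: lam = T(v, v, v)
    lam = np.einsum('ijk,i,j,k->', T, v, v, v)
    return lam, v
```

For an orthogonally decomposable tensor T = Σ λ_i v_i⊗v_i⊗v_i, iterates converge to one of the robust eigenvectors v_i; the cited analysis quantifies how this behaves under perturbation of T.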
Journal ArticleDOI
Tensor decompositions for learning latent variable models
TL;DR: In this article, the authors consider a wide class of latent variable models, including Gaussian mixture models, hidden Markov models, and latent Dirichlet allocation, and exploit a certain tensor structure in their low-order observable moments (typically of second and third order).