Open Access Proceedings ArticleDOI

Optimizing sparse tensor times matrix on multi-core and many-core architectures

TLDR
The optimized design and implementation of sparse tensor-times-dense matrix multiply (SpTTM) is presented for CPU and GPU platforms; SpTTM is a critical bottleneck in data analysis and mining applications based on tensor methods, such as the Tucker decomposition.
Abstract
This paper presents the optimized design and implementation of sparse tensor-times-dense matrix multiply (SpTTM) for CPU and GPU platforms. This primitive is a critical bottleneck in data analysis and mining applications based on tensor methods, such as the Tucker decomposition. We first design and implement sequential SpTTM to avoid the explicit data transformations between a tensor and a matrix that the conventional approach requires. We further optimize SpTTM on multicore CPU and GPU systems by parallelizing, avoiding locks, and exploiting data locality. Our sequential SpTTM is up to 3.5× faster than the SpTTM from Tensor Toolbox and 1.5× faster than that from the Cyclops Tensor Framework. Our parallel algorithms achieve 4.1× speedup on a multicore Intel Core i7 and 18.8× speedup on an NVIDIA K40c GPU, respectively, over our sequential SpTTM.
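As a concrete illustration of the kernel, below is a minimal sketch of sequential mode-3 SpTTM for a third-order tensor stored in coordinate (COO) format. It assumes a dense output buffer for simplicity; the paper's implementation instead produces a semi-sparse output and adds the parallelization and locality optimizations described above, and the type and function names here are illustrative, not the paper's.

```c
/* Minimal sketch of sequential mode-3 SpTTM, Y(i,j,r) += X(i,j,k) * U(k,r),
 * for a COO-format third-order tensor. Assumes a dense row-major output
 * Y of size I x J x R; the paper's semi-sparse output format differs. */
typedef struct {
    long    nnz;     /* number of nonzeros                   */
    int    *ind[3];  /* ind[m][p]: mode-m index of nonzero p */
    double *val;     /* val[p]: value of nonzero p           */
} coo_tensor;

/* U is a dense K x R row-major matrix (K = mode-3 dimension of X). */
void spttm_mode3(const coo_tensor *X, const double *U,
                 double *Y, int J, int R)
{
    for (long p = 0; p < X->nnz; p++) {
        int i = X->ind[0][p], j = X->ind[1][p], k = X->ind[2][p];
        double v = X->val[p];
        double       *y = Y + ((long)i * J + j) * R;  /* output fiber Y(i,j,:) */
        const double *u = U + (long)k * R;            /* matrix row U(k,:)     */
        for (int r = 0; r < R; r++)
            y[r] += v * u[r];   /* accumulate along the product mode */
    }
}
```

Because each nonzero only scatters a scaled row of U into its output fiber, the tensor never has to be unfolded into a matrix, which is the explicit data transformation the paper avoids.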

Citations
Posted Content

Extending Sparse Tensor Accelerators to Support Multiple Compression Formats

TL;DR: This work proposes hardware extensions to accelerators for supporting numerous format combinations seamlessly and demonstrates ~4× speedup over performing format conversions in software.
Proceedings ArticleDOI

Cutensor-tubal: Optimized GPU Library for Low-tubal-rank Tensors

TL;DR: A BLAS-like library for the low-tubal-rank tensor model, called cuTensor-tubal, is developed and optimized; it includes efficient GPU primitives for tensor operations and key processes.
Proceedings ArticleDOI

Space-Efficient k-d Tree-Based Storage Format for Sparse Tensors

TL;DR: This paper presents a new storage format for sparse tensors, called the succinct k-d tree-based tensor (SKTB) format, together with a parallel space-efficient algorithm for converting tensors to the SKTB format.
Journal ArticleDOI

Optimizing the Linear Fascicle Evaluation Algorithm for Multi-core and Many-core Systems

TL;DR: In this article, target-independent optimizations are proposed for the SpMV operations of the Linear Fascicle Evaluation (LiFE) algorithm decomposed using the STD technique, followed by target-dependent optimizations for CPU and GPU systems.
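The kernel being tuned in that work is sparse matrix-vector multiply (SpMV); as a reference point only, a textbook CSR SpMV looks like the following. This is a generic baseline, not LiFE's specialized kernel.

```c
/* Textbook CSR sparse matrix-vector multiply, y = A * x, for an
 * m-row matrix A. Shown as a generic baseline; the cited work tunes
 * more specialized SpMV kernels arising from the LiFE model. */
void spmv_csr(int m, const int *rowptr, const int *colind,
              const double *val, const double *x, double *y)
{
    for (int i = 0; i < m; i++) {
        double sum = 0.0;
        for (int p = rowptr[i]; p < rowptr[i + 1]; p++)
            sum += val[p] * x[colind[p]];   /* accumulate row i */
        y[i] = sum;
    }
}
```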
Journal ArticleDOI

swTensor: accelerating tensor decomposition on Sunway architecture

TL;DR: This article proposes swTensor, which adapts the Canonical Polyadic decomposition to the Sunway processor by leveraging the MapReduce framework for automatic parallelization and the unique architecture of Sunway for high performance.
References
Journal ArticleDOI

Tensor Decompositions and Applications

TL;DR: This survey provides an overview of higher-order tensor decompositions, their applications, and available software.
Proceedings Article

Toward an architecture for never-ending language learning

TL;DR: This work proposes an approach and a set of design principles for an intelligent computer agent that runs forever, and describes a partial implementation of such a system that has already learned to extract a knowledge base containing over 242,000 beliefs.
Journal ArticleDOI

Predicting Human Brain Activity Associated with the Meanings of Nouns

TL;DR: A computational model is presented that predicts the functional magnetic resonance imaging (fMRI) neural activation associated with words for which fMRI data are not yet available, trained with a combination of data from a trillion-word text corpus and observed fMRI data associated with viewing several dozen concrete nouns.
Posted Content

Tensor decompositions for learning latent variable models

TL;DR: A detailed analysis of a robust tensor power method is provided, establishing an analogue of Wedin's perturbation theorem for the singular vectors of matrices and implying a robust and computationally tractable estimation approach for several popular latent variable models.
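For context, the core update of a tensor power method maps a unit vector u to T(I, u, u)/||T(I, u, u)||, the third-order analogue of the matrix power iteration. A minimal sketch of one such step for a dense symmetric tensor follows; the cited analysis layers random restarts and deflation on top of this update, which the sketch omits.

```c
#include <math.h>
#include <stdlib.h>

/* One tensor power iteration step for a dense symmetric third-order
 * tensor T (n x n x n, row-major): u <- w / ||w||, where
 * w_i = sum_{j,k} T(i,j,k) * u_j * u_k. Illustrative sketch only. */
void tensor_power_step(const double *T, int n, double *u)
{
    double *w = calloc(n, sizeof *w);
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            for (int k = 0; k < n; k++)
                w[i] += T[((long)i * n + j) * n + k] * u[j] * u[k];

    double norm = 0.0;                       /* Euclidean norm of w */
    for (int i = 0; i < n; i++) norm += w[i] * w[i];
    norm = sqrt(norm);
    for (int i = 0; i < n; i++) u[i] = w[i] / norm;
    free(w);
}
```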
Journal ArticleDOI

Tensor decompositions for learning latent variable models

TL;DR: In this article, the authors consider a wide class of latent variable models, including Gaussian mixture models, hidden Markov models, and latent Dirichlet allocation, which exploit a certain tensor structure in their low-order observable moments (typically of second and third order).