SPLATT: Efficient and Parallel Sparse Tensor-Matrix Multiplication
Frequently Asked Questions (16)
Q2. What have the authors stated for future works in "Splatt: efficient and parallel sparse tensor-matrix multiplication" ?
Their future work includes adapting SPLATT to a distributed-memory model and investigating memory-scalable algorithms for tensor factorization in the context of distributed systems.
Q3. What is the purpose of the parallel version of SPLATT?
The parallel version of SPLATT uses a task decomposition on the rows of M. Since the computation of M(i, :) requires only the nonzeros in slice X(i, :, :), the mode-1 slices of X can be distributed among processes.
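A minimal NumPy sketch of this task decomposition follows, assuming a simple COO layout (coords, vals) and an illustrative helper mttkrp_rows; SPLATT itself uses a compressed fiber structure rather than COO. Each worker owns a block of mode-1 slices and writes only its own rows of M, so no locking is needed.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def mttkrp_rows(rows, coords, vals, B, C, M):
    """Compute M(i, :) for each i in `rows` from the nonzeros of slice X(i, :, :)."""
    for i in rows:
        sel = coords[:, 0] == i                      # nonzeros in slice X(i, :, :)
        js, ks = coords[sel, 1], coords[sel, 2]
        # M(i, :) = sum of X(i, j, k) * (B(j, :) * C(k, :)) over the slice
        M[i] = np.sum(vals[sel, None] * B[js] * C[ks], axis=0)

I, J, K, F = 50, 40, 30, 8
rng = np.random.default_rng(0)
nnz = 500
coords = np.stack([rng.integers(0, d, nnz) for d in (I, J, K)], axis=1)
vals = rng.random(nnz)
B, C = rng.random((J, F)), rng.random((K, F))
M = np.zeros((I, F))

# Distribute the mode-1 slices among workers; rows of M are disjoint per worker.
with ThreadPoolExecutor(max_workers=4) as pool:
    list(pool.map(lambda blk: mttkrp_rows(blk, coords, vals, B, C, M),
                  np.array_split(np.arange(I), 4)))
```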
Q4. What is the drawback of a mode-dependent reordering?
Relabeled mode-2 indices affect spatial locality, allowing a fiber and its neighbors to access consecutive rows of B. A clear drawback of a mode-dependent reordering is the need to construct and partition a hypergraph for each mode.
Q5. What is the objective of a mode-independent reordering?
The objective of a mode-independent reordering is to find a single tensor permutation that results in improved execution time regardless of which mode MTTKRP is being performed on.
Q6. What is the advantage of a shared address space?
Since the authors operate in a shared address space, they are able to evenly distribute the rows of scratch space among threads and do a reduction with only a synchronization at the end.
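A minimal sketch of this scratch-and-reduce idea, assuming T per-thread scratch buffers; the layout is illustrative, not SPLATT's implementation, and the loop below stands in for the T threads:

```python
import numpy as np

# T per-thread scratch buffers holding partial rows of M (stand-ins for
# the partial results each thread produced while processing its work).
I, F, T = 100, 8, 4
rng = np.random.default_rng(1)
scratch = rng.random((T, I, F))

# Reduction: in a shared address space each thread can sum a disjoint
# block of rows across all T buffers, so a single barrier at the end
# replaces per-row locking.
M = np.zeros((I, F))
for rows in np.array_split(np.arange(I), T):   # one block per thread
    M[rows] = scratch[:, rows, :].sum(axis=0)
```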
Q7. What is the way to speed up the execution of SPLATT?
Consider the execution of SPLATT along the first mode: dense regions of the tensor are attractive because they offer increased cache performance while accessing B and C.
Q8. What is the corresponding indices for each fiber?
Analogous to the rows of a CSR matrix, (P+1) integers are required to store the start indices of the fibers, and one integer is used for each fiber id (fid).
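A minimal sketch of this CSR-like fiber storage; the toy data is illustrative and slice pointers are omitted for brevity, so this is not SPLATT's exact layout:

```python
import numpy as np

# Three mode-1 fibers -- X(0, :, 0), X(0, :, 1) and X(1, :, 0) -- holding
# 2, 1 and 2 nonzeros, respectively:
fptr = np.array([0, 2, 3, 5])      # (P+1) = 4 start indices for P = 3 fibers
fids = np.array([0, 1, 0])         # one integer (the fid) per fiber
inds = np.array([1, 3, 2, 0, 4])   # mode-2 index of each nonzero
vals = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# Traverse the nonzeros of fiber f, CSR-style:
f = 0
for n in range(fptr[f], fptr[f + 1]):
    print(f"fid={fids[f]} j={inds[n]} val={vals[n]}")
```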
Q9. What is the common method of computing the least squares problem?
The least squares problem is minimized by A = X(1)(C ⊙ B)(CᵀC ∗ BᵀB)†, where M† denotes the pseudo-inverse of a matrix M. (CᵀC ∗ BᵀB) is an F×F matrix, so computing its pseudo-inverse is a minor computation relative to X(1)(C ⊙ B).
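A minimal NumPy sketch of this update, using a dense stand-in for the unfolding X(1) purely for illustration (SPLATT never forms C ⊙ B explicitly):

```python
import numpy as np

I, J, K, F = 30, 20, 10, 5
rng = np.random.default_rng(2)
X1 = rng.random((I, J * K))            # dense stand-in for the unfolding X(1)
B, C = rng.random((J, F)), rng.random((K, F))

# Khatri-Rao product C ⊙ B: row (k*J + j) holds C(k, :) ∗ B(j, :)
CkB = (C[:, None, :] * B[None, :, :]).reshape(J * K, F)

# A = X(1)(C ⊙ B)(CᵀC ∗ BᵀB)†; the F×F pseudo-inverse is cheap
A = X1 @ CkB @ np.linalg.pinv((C.T @ C) * (B.T @ B))
print(A.shape)                         # (I, F)
```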
Q10. What is the way to store fibers?
By storing fibers along the longer mode, the authors are able to minimize the number of stored fibers and increase the average fiber length.
Q11. What is the tensor of the brainq dataset?
BrainQ is an interesting dataset because its dimensions are relatively small, resulting in a tensor several orders of magnitude denser than the other tensors studied in this work.
Q12. What is the simplest way to divide the mode-1 indices into tubes?
The authors divide the indices into sets of size K′ and arrange the X(i, :, k) fibers into tubes, each with a maximum of I′ mode-1 indices and K′ mode-3 indices.
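A minimal sketch of splitting index ranges into tiles of at most I′ mode-1 and K′ mode-3 indices; the helper name and layout are illustrative assumptions, not SPLATT's code:

```python
import numpy as np

def tile_bounds(dim, size):
    """Return (start, end) pairs covering range(dim) in blocks of `size`."""
    starts = np.arange(0, dim, size)
    return [(s, min(s + size, dim)) for s in starts]

I, K = 10, 7
Ip, Kp = 4, 3                      # I' and K'
tiles = [(ib, kb) for ib in tile_bounds(I, Ip) for kb in tile_bounds(K, Kp)]
print(tiles)   # each tile holds at most I' mode-1 and K' mode-3 indices
```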
Q13. What is the simplest way to factor out the X(1) columns?
First, the authors rewrite (1) to operate on a row of M, then break the columns of X(1) into J and K components to arrive at (3):

M(i, :) = ∑ₖ₌₀ᴷ ∑ⱼ₌₀ᴶ X(i, j, k) (B(j, :) ∗ C(k, :))  (3)

Factoring C(k, :) out of the inner summation then yields (4):

M(i, :) = ∑ₖ₌₀ᴷ C(k, :) ∗ ∑ⱼ₌₀ᴶ X(i, j, k) B(j, :)  (4)
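A minimal NumPy sketch of the factored computation in (4): rows of B are accumulated within each fiber and the result is scaled by C(k, :) once per fiber rather than once per nonzero. The COO slice layout is an illustrative assumption, not SPLATT's compressed structure.

```python
import numpy as np

I, J, K, F = 6, 5, 4, 3
rng = np.random.default_rng(3)
nnz = 40
i_, j_, k_ = (rng.integers(0, d, nnz) for d in (I, J, K))
vals = rng.random(nnz)
B, C = rng.random((J, F)), rng.random((K, F))

M = np.zeros((I, F))
for i in range(I):
    for k in range(K):
        sel = (i_ == i) & (k_ == k)        # nonzeros of fiber X(i, :, k)
        if not sel.any():
            continue
        inner = vals[sel] @ B[j_[sel]]     # inner sum of X(i, j, k) B(j, :)
        M[i] += C[k] * inner               # scale by C(k, :) once per fiber
print(M)
```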
Q14. What is the simplest way to compute a tensor?
Their algorithm computes entire rows of M at a time and as a result only requires a single traversal of the sparse tensor structure.
Q15. What is the problem with the reordering of the mode?
The extremely large dimensions that sparse tensors often exhibit are prohibitive to memory performance, even with a good reordering.
Q16. How many partitions are used to model memory accesses?
The number of partitions in which a hyperedge is found (i.e., its connectivity) exactly models the number of times that its corresponding row in M, B, or C must be fetched from memory.
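A minimal sketch of this connectivity metric: count the distinct partitions touching each hyperedge. The toy data and names are illustrative, not from the paper.

```python
# part[v] = partition assigned to tensor nonzero (vertex) v
part = {0: 0, 1: 0, 2: 1, 3: 2, 4: 1}

# One hyperedge per row of M, B, or C, connecting the nonzeros that use it
hyperedges = {"M_row_0": [0, 1], "B_row_3": [2, 3], "C_row_1": [0, 2, 4]}

# Connectivity = number of distinct partitions a hyperedge spans,
# i.e., the number of times that row must be fetched from memory.
connectivity = {e: len({part[v] for v in vs}) for e, vs in hyperedges.items()}
print(connectivity)   # {'M_row_0': 1, 'B_row_3': 2, 'C_row_1': 2}
```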