Journal Article
An efficient algorithm for out-of-core matrix transposition
Jinwoo Suh, Viktor K. Prasanna
TLDR
This paper proposes an algorithm that accounts for both the index computation time and the I/O time. Eliminating the expensive index computation yields an overall reduction in execution time.
Abstract:
Efficient transposition of out-of-core matrices has been widely studied. These efforts have focused on reducing the number of I/O operations. However, in state-of-the-art architectures, the memory-to-memory data transfer time and the index computation time are also significant components of the overall time. In this paper, we propose an algorithm that considers both the index computation time and the I/O time. Our algorithm reduces the total execution time by reducing the number of I/O operations and eliminating the index computation. In doing so, two techniques are employed: writing the data onto disk in predefined patterns and balancing the number of disk read and write operations. The index computation, an expensive operation involving two divisions and a multiplication, is eliminated by partitioning the memory into read and write buffers. The expensive in-processor permutation is replaced by data collection from the read buffer to the write buffer. Even though this partitioning may increase the number of I/O operations in some cases, it results in an overall reduction in the execution time due to the elimination of the expensive index computation. Our algorithm is analyzed using the well-known linear model and the parallel disk model. The experimental results on a Sun Enterprise, an SGI R12000 and a Pentium III show that our algorithm reduces the overall execution time by up to 50% compared with the best known algorithms in the literature.
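The abstract does not spell out the index computation it refers to. As a rough illustration only, the conventional row-major mapping for a transposed element needs a quotient, a remainder and a multiplication per element, which matches the "two divisions and a multiplication" cost mentioned above. The sketch below contrasts that mapping with a stride-based collection from a read buffer into a write buffer. The function names `transposed_index`, `transpose_by_index` and `transpose_by_collection` are hypothetical, everything runs in memory, and this is not the authors' out-of-core algorithm.

```python
def transposed_index(k, rows, cols):
    """Map linear index k of a row-major rows x cols matrix
    to the linear index of the same element in the transpose."""
    r = k // cols          # first division (quotient gives the row)
    c = k % cols           # second division (remainder gives the column)
    return c * rows + r    # one multiplication


def transpose_by_index(flat, rows, cols):
    """Permute element by element, paying the index computation each time."""
    out = [None] * (rows * cols)
    for k, value in enumerate(flat):
        out[transposed_index(k, rows, cols)] = value
    return out


def transpose_by_collection(read_buf, rows, cols):
    """Collect data column by column from a read buffer into a write buffer.
    The source position advances by a fixed stride, so no per-element
    division or multiplication is needed (assumed illustration only)."""
    write_buf = []
    for c in range(cols):
        src = c
        for _ in range(rows):
            write_buf.append(read_buf[src])
            src += cols    # fixed stride to the next row of the same column
    return write_buf


matrix = list(range(6))    # 2 x 3 matrix stored row-major: [[0, 1, 2], [3, 4, 5]]
by_index = transpose_by_index(matrix, 2, 3)
by_collection = transpose_by_collection(matrix, 2, 3)
```

Both functions produce the same 3 x 2 result. The second one trades the per-element arithmetic for sequential appends to a separate buffer, which is the general kind of trade-off the abstract describes when it splits memory into read and write buffers.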
Citations
Journal Article
vLOD: high-fidelity walkthrough of large virtual environments
Jatin Chhugani, Budirijanto Purnomo, Shankar Krishnan, Jonathan D. Cohen, Suresh Venkatasubramanian, David S. Johnson, Subodh Kumar
TL;DR: A novel feature of this walkthrough system is that it performs work proportional only to the required detail in visible geometry at the rendering time, and uses a precomputation phase that efficiently generates per cell vLOD: the geometry visible from a view-region at the right level of detail.
Book Chapter
Generating SIMD vectorized permutations
Franz Franchetti, Markus Püschel
TL;DR: A method is introduced to generate efficient vectorized implementations of small stride permutations, using only vector load and vector shuffle instructions, for high-performance numerical kernels including the fast Fourier transform.
Journal Article
Efficient parallel out-of-core matrix transposition
TL;DR: An algorithm that directly targets the improvement of overall transposition time is proposed and the I/O characteristics of the system are used to determine the read, write and communication block sizes such that the total execution time is minimised.
Journal Article
Enhancing the matrix transpose operation using Intel AVX instruction set extension
TL;DR: This paper presents a novel vector-based matrix transpose algorithm and its optimized implementation using AVX instructions, and demonstrates a 2.83x speedup over the standard sequential implementation and a maximum 1.53x speedup over the GCC library implementation.
References
Book
Introduction to parallel computing: design and analysis of algorithms
TL;DR: Performance and Scalability of Parallel Systems, General Issues in Mapping Systolic Systems Onto Parallel Computers, and Speedup Anomalies in Parallel Search Algorithms.
Journal Article
RAID: high-performance, reliable secondary storage
TL;DR: A comprehensive overview of disk array technology and implementation topics such as refining the basic RAID levels to improve performance and designing algorithms to maintain data consistency are discussed.
Journal Article
The input/output complexity of sorting and related problems
Alok Aggarwal, Jeffrey S. Vitter
TL;DR: Tight upper and lower bounds are provided for the number of inputs and outputs (I/Os) between internal memory and secondary storage required for five sorting-related problems: sorting, the fast Fourier transform (FFT), permutation networks, permuting, and matrix transposition.
Journal Article
Computability of Recursive Functions
TL;DR: One half of this equivalence, that all functions computable by any finite, discrete, deterministic device supplied with unlimited storage are partial recursive, is relatively straightforward once the elements of recursive function theory have been established.
Journal Article
Algorithms for parallel memory, I: Two-level memories
TL;DR: In this article, the authors provided the first optimal algorithms in terms of the number of input/outputs (I/Os) required between internal memory and multiple secondary storage devices for sorting, FFT, matrix transposition, standard matrix multiplication, and related problems.