
Showing papers by "Uzi Vishkin" published in 2016


Proceedings ArticleDOI
11 Sep 2016
TL;DR: ICE is a new parallel programming language that is easy to program: as a synchronous, lock-step language it needs no programmer-specified synchronization, and it lets programmers draw on the PRAM algorithmic theory's uniquely rich body of parallel algorithms and techniques.
Abstract: Large performance growth for processors requires exploitation of hardware parallelism, which, in turn, requires parallelism in software. In spite of massive efforts, automatic parallelization of serial programs has had limited success, mostly for regular programs with affine accesses, but not for many applications, including irregular ones. It appears that the bare minimum the programmer needs to spell out is which operations can be executed in parallel. However, parallel programming today requires much more. The programmer is expected to partition a task into subtasks (often threads) so as to meet multiple constraints and objectives involving data and computation partitioning, locality, synchronization, race conditions, and limiting and hiding communication latencies. It is no wonder that this makes parallel programming hard, drastically reducing programmers' productivity and performance gains, and hence reducing adoption by programmers and their employers. Suppose, however, that the programmer's effort is reduced to merely stating which operations can be executed in parallel: the 'work-depth' bare-minimum abstraction developed for the PRAM (the lead theory of parallel algorithms). What performance penalty should this incur? Perhaps surprisingly, the upshot of our work is that this can be done with no performance penalty relative to hand-optimized multi-threaded code.

5 citations
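To make the contrast concrete, here is a minimal C sketch of the 'work-depth' level of specification the abstract argues for: the programmer only declares that the loop iterations are independent and leaves partitioning, scheduling, and synchronization to the system. OpenMP is used purely as a familiar stand-in for that single declaration; it is not the ICE syntax described in the paper, and the example is illustrative rather than taken from it.

/* Illustrative only: OpenMP stands in for a lock-step, work-depth style
 * specification; this is NOT the paper's ICE syntax. The point is the
 * abstraction level: one line states that all iterations are independent,
 * with no explicit threads, data partitioning, or synchronization. */
#include <stdio.h>
#include <stdlib.h>

#define N 1000000

int main(void) {
    double *a = malloc(N * sizeof *a);
    double *b = malloc(N * sizeof *b);
    if (!a || !b) return 1;

    for (int i = 0; i < N; i++)
        b[i] = (double)i;

    /* Work-depth view: the N updates below are mutually independent, so
     * declaring that fact is the entire parallel specification. */
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        a[i] = 2.0 * b[i] + 1.0;

    printf("a[N-1] = %f\n", a[N - 1]);
    free(a);
    free(b);
    return 0;
}

A hand-optimized multi-threaded version of the same loop would add explicit thread creation, index-range partitioning, and a join barrier; the paper's claim is that this extra specification can be dispensed with at no performance cost.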


Proceedings ArticleDOI
23 May 2016
TL;DR: Using FFT as an example, this work examines the impact that adoption of some enabling technologies, including silicon photonics, would have on the performance of a many-core architecture, and shows that a single-chip many-core processor could potentially outperform a large high-performance computing cluster.
Abstract: FFT has been a classic computation engine for numerous applications. Its bandwidth-intensive nature has capped its performance on bandwidth-limited, off-the-shelf parallel machines and has forced application researchers to seek easier-to-speed-up alternatives to FFT, even when they are inferior to it. But what if effective support of FFT is feasible? Using FFT as an example, we examine the impact that adoption of some enabling technologies, including silicon photonics, would have on the performance of a many-core architecture. The results show that a single-chip many-core processor could potentially outperform a large high-performance computing cluster.
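For context on why FFT is bandwidth-intensive, the following is a textbook iterative radix-2 Cooley-Tukey FFT in C; it is an illustrative sketch, not the paper's implementation or its many-core mapping. Each of the log2(n) butterfly stages sweeps the whole array, pairing elements whose distance doubles from 1 up to n/2, which is exactly the access pattern that stresses off-chip bandwidth on bandwidth-limited machines.

/* Illustrative textbook radix-2 Cooley-Tukey FFT (in place, n a power of two);
 * not the implementation studied in the paper. */
#include <complex.h>
#include <math.h>
#include <stdio.h>

static void fft(double complex *x, size_t n) {
    /* Bit-reversal permutation. */
    for (size_t i = 1, j = 0; i < n; i++) {
        size_t bit = n >> 1;
        for (; j & bit; bit >>= 1)
            j ^= bit;
        j |= bit;
        if (i < j) { double complex t = x[i]; x[i] = x[j]; x[j] = t; }
    }
    /* log2(n) butterfly stages; every stage reads and writes all n elements. */
    const double pi = acos(-1.0);
    for (size_t len = 2; len <= n; len <<= 1) {
        double complex wlen = cexp(-2.0 * I * pi / (double)len);
        for (size_t i = 0; i < n; i += len) {
            double complex w = 1.0;
            for (size_t k = 0; k < len / 2; k++) {
                double complex u = x[i + k];
                double complex v = w * x[i + k + len / 2];
                x[i + k] = u + v;              /* partners are len/2 apart */
                x[i + k + len / 2] = u - v;
                w *= wlen;
            }
        }
    }
}

int main(void) {
    double complex x[8] = {1, 1, 1, 1, 0, 0, 0, 0};
    fft(x, 8);
    for (int i = 0; i < 8; i++)
        printf("X[%d] = %6.3f %+6.3fi\n", i, creal(x[i]), cimag(x[i]));
    return 0;
}

Because an out-of-cache transform streams all n complex values through memory in each of the log2(n) stages with little reuse, total memory traffic grows roughly as n·log n; the abstract's point is that much higher on-chip bandwidth, e.g. via silicon photonics, could remove that cap.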