Hideki Saito

Researcher at Intel

Publications - 41

Citations - 851

Hideki Saito is an academic researcher at Intel who has contributed to research on topics including compilers and SIMD. The author has an h-index of 18 and has co-authored 40 publications receiving 814 citations. Previous affiliations of Hideki Saito include the University of Illinois at Urbana–Champaign.

Papers

Journal ArticleDOI

Can traditional programming bridge the Ninja performance gap for parallel computing applications

Nadathur Satish, +7 more

TL;DR: It is demonstrated that the otherwise uncontrolled growth of the Ninja gap can be contained, yielding more stable and predictable performance growth over future architectures and offering strong evidence that radical language changes are not required.

Journal ArticleDOI

Can traditional programming bridge the ninja performance gap for parallel computing applications

Nadathur Satish, +7 more

- 23 Apr 2015 -

Communications of The ACM

TL;DR: It is demonstrated that one can contain the otherwise uncontrolled growth of the Ninja gap and offer a more stable and predictable performance growth over future architectures, offering strong evidence that radical language changes are not required.

Proceedings ArticleDOI

SPEC MPI2007—an application benchmark suite for parallel systems using MPI

Matthias S. Müller, +11 more

TL;DR: The benchmark suite SPEC MPI2007, which includes 13 technical computing applications from the fields of computational fluid dynamics, molecular dynamics, electromagnetism, geophysics, ray tracing, and hydrodynamics, is described and compared with other benchmark suites.

Proceedings ArticleDOI

Practical SIMD Vectorization Techniques for Intel® Xeon Phi Coprocessors

Xinmin Tian, +7 more

TL;DR: Several practical SIMD vectorization techniques such as less-than-full-vector loop vectorization, Intel® MIC specific alignment optimization, and small matrix transpose/multiplication 2-D vectorization implemented in the Intel® C/C++ and Fortran production compilers for Intel® Xeon Phi coprocessors are presented.

On the Performance Potential of Different Types of Speculative Thread-Level Parallelism

Arun Kejariwal, +9 more

TL;DR: This study shows that, at the loop-level, the upper bound on the arithmetic mean and geometric mean speedup achievable via TLS across SPEC CPU2000 is 39.16% (standard deviation = 31.23) and 18.18% respectively.
