Hideki Saito

Researcher at Intel

Publications -  41
Citations -  851

Hideki Saito is an academic researcher from Intel. The author has contributed to research in topics: Compiler & SIMD. The author has an h-index of 18 and has co-authored 40 publications receiving 814 citations. Previous affiliations of Hideki Saito include University of Illinois at Urbana–Champaign.

Papers
Journal Article

Can traditional programming bridge the Ninja performance gap for parallel computing applications

TL;DR: It is demonstrated that the otherwise uncontrolled growth of the Ninja gap can be contained, yielding more stable and predictable performance growth over future architectures and offering strong evidence that radical language changes are not required.
Journal Article

Can traditional programming bridge the ninja performance gap for parallel computing applications

TL;DR: It is demonstrated that one can contain the otherwise uncontrolled growth of the Ninja gap and achieve more stable and predictable performance growth over future architectures, offering strong evidence that radical language changes are not required.
Proceedings Article

SPEC MPI2007—an application benchmark suite for parallel systems using MPI

TL;DR: The benchmark suite SPEC MPI2007, which includes 13 technical computing applications from the fields of computational fluid dynamics, molecular dynamics, electromagnetism, geophysics, ray tracing, and hydrodynamics, is described and compared with other benchmark suites.
Proceedings Article

Practical SIMD Vectorization Techniques for Intel® Xeon Phi Coprocessors

TL;DR: Several practical SIMD vectorization techniques, such as less-than-full-vector loop vectorization, Intel® MIC specific alignment optimization, and small matrix transpose/multiplication 2-D vectorization, implemented in the Intel® C/C++ and Fortran production compilers for Intel® Xeon Phi coprocessors, are presented.

On the Performance Potential of Different Types of Speculative Thread-Level Parallelism

TL;DR: This study shows that, at the loop level, the upper bounds on the arithmetic mean and geometric mean speedup achievable via TLS across SPEC CPU2000 are 39.16% (standard deviation = 31.23) and 18.18%, respectively.