Eiman Ebrahimi
Researcher at Nvidia
Publications - 35
Citations - 2387
Eiman Ebrahimi is an academic researcher from Nvidia. The author has contributed to research on topics including shared memory and cache. The author has an h-index of 19 and has co-authored 35 publications receiving 1953 citations. Previous affiliations of Eiman Ebrahimi include the University of Tehran and the University of Texas at Austin.
Papers
Journal ArticleDOI
Fairness via Source Throttling: A Configurable and High-Performance Fairness Substrate for Multicore Memory Systems
TL;DR: The proposed technique, Fairness via Source Throttling (FST), estimates unfairness in the entire memory system, enforces thread priorities/weights, and enables system software to enforce different fairness objectives in the memory system.
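The core FST loop can be sketched as a small simulation. Everything here (the thread records, the 1.4 unfairness threshold, the throttling factors) is an illustrative assumption, not the paper's actual parameters or implementation:

```python
# Hypothetical sketch of FST-style source throttling.
# Thresholds, rates, and data layout are illustrative assumptions.

def estimate_slowdown(shared_cycles, alone_cycles):
    """Slowdown of a thread: time when sharing the memory system
    divided by its estimated alone-run time."""
    return shared_cycles / alone_cycles

def fst_step(threads, unfairness_threshold=1.4):
    """One FST interval: if estimated system unfairness exceeds the
    threshold, throttle the least-slowed (most interfering) thread's
    request rate and let the most-slowed thread recover."""
    slowdowns = {t["name"]: estimate_slowdown(t["shared_cycles"], t["alone_cycles"])
                 for t in threads}
    unfairness = max(slowdowns.values()) / min(slowdowns.values())
    if unfairness > unfairness_threshold:
        victim = max(threads, key=lambda t: slowdowns[t["name"]])
        culprit = min(threads, key=lambda t: slowdowns[t["name"]])
        culprit["request_rate"] = max(0.1, culprit["request_rate"] * 0.5)  # throttle down
        victim["request_rate"] = min(1.0, victim["request_rate"] * 1.1)    # un-throttle
    return unfairness

threads = [
    {"name": "A", "shared_cycles": 300, "alone_cycles": 100, "request_rate": 1.0},
    {"name": "B", "shared_cycles": 120, "alone_cycles": 100, "request_rate": 1.0},
]
u = fst_step(threads)  # A's slowdown 3.0, B's 1.2 -> unfairness 2.5, B throttled
```

The key idea the sketch captures is that throttling happens at the *source* (the interfering thread's request rate) rather than at a single shared resource.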
Journal ArticleDOI
EMOGI: efficient memory-access for out-of-memory graph-traversal in GPUs
Seungwon Min, Vikram Sharma Mailthody, Zaid Qureshi, Jinjun Xiong, Eiman Ebrahimi, Wen-mei W. Hwu +5 more
TL;DR: This paper addresses the open question of whether a sufficiently large number of overlapping cacheline-sized accesses can be sustained to tolerate the long latency to host memory, fully utilize the available bandwidth, and achieve favorable execution performance. It proposes EMOGI, an alternative approach that traverses graphs that do not fit in GPU memory using direct cacheline-sized accesses to data stored in host memory.
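Why cacheline-sized access helps for scattered graph data can be illustrated with a simple traffic count, comparing page-granularity transfers against cacheline-granularity ones. The sizes and the synthetic edge offsets below are made-up assumptions, not EMOGI's actual configuration:

```python
# Illustrative comparison (not EMOGI's implementation): bytes moved when
# fetching scattered neighbor-list entries at page granularity versus
# cacheline granularity. All sizes and offsets are assumptions.

PAGE = 4096       # bytes per page-granularity transfer
CACHELINE = 128   # bytes per cacheline-sized access
EDGE_BYTES = 8    # bytes per edge entry (e.g., a 64-bit neighbor ID)

def bytes_moved(edge_offsets, granularity):
    """Count unique granularity-sized blocks touched, times block size."""
    blocks = {(off * EDGE_BYTES) // granularity for off in edge_offsets}
    return len(blocks) * granularity

# Scattered accesses typical of graph traversal: single distant entries.
offsets = [i * 10_000 for i in range(64)]

page_bytes = bytes_moved(offsets, PAGE)       # 64 pages touched
line_bytes = bytes_moved(offsets, CACHELINE)  # 64 cachelines touched
```

With 64 scattered single-edge reads, page-granularity movement transfers 32x more bytes than cacheline-granularity movement; EMOGI's premise is that enough such small accesses in flight can still saturate the host-memory link.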
Proceedings ArticleDOI
Self-adaptive memetic algorithm: an adaptive conjugate gradient approach
TL;DR: The proposed method adds no extra computational load to the genetic algorithm and also eliminates the computational burden of parameter adjustment for the hill-climbing operator.
Proceedings ArticleDOI
Energy Savings via Dead Sub-Block Prediction
Marco A. Z. Alves, Khubaib, Eiman Ebrahimi, Veynu Narasiman, Carlos Villavieja, Philippe O. A. Navaux, Yale N. Patt +6 more
TL;DR: The Dead Sub-Block Predictor (DSBP) is proposed to predict which sub-blocks of a cache line will actually be used, and how many times each will be used, so that only the necessary sub-blocks are brought into the cache and each is powered off after it has been touched the predicted number of times.
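The predict-then-fetch-partially idea can be sketched as a small prediction table indexed by the fetching instruction's PC. The table structure, sub-block count, and training policy below are illustrative assumptions, not the paper's exact design:

```python
# Toy sketch of a dead sub-block predictor. Structure and sizes are
# assumptions for illustration only.

SUB_BLOCKS = 8  # sub-blocks per cache line (assumed)

class DeadSubBlockPredictor:
    """Predict, per fetching PC, which sub-blocks of a line will be used
    and how many times, so a fill brings in only the predicted sub-blocks
    and each can be powered off after its predicted number of touches."""
    def __init__(self):
        self.table = {}  # pc -> predicted per-sub-block use counts

    def predict(self, pc):
        # Unknown PC: conservatively fetch the whole line (one use each).
        return self.table.get(pc, [1] * SUB_BLOCKS)

    def train(self, pc, observed_counts):
        # Learn the per-sub-block usage pattern observed for this PC.
        self.table[pc] = list(observed_counts)

dsbp = DeadSubBlockPredictor()
dsbp.train(pc=0x400A10, observed_counts=[2, 1, 0, 0, 0, 0, 0, 1])
pred = dsbp.predict(0x400A10)
fetched = [i for i, c in enumerate(pred) if c > 0]  # only sub-blocks 0, 1, 7
```

The energy saving comes from never powering the sub-blocks predicted dead (here, indices 2 through 6) and from turning off live sub-blocks once their predicted touch count is exhausted.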
Journal ArticleDOI
DUCATI: High-performance Address Translation by Extending TLB Reach of GPU-accelerated Systems
TL;DR: The combination of these two mechanisms, DUCATI, is an address translation architecture that improves GPU performance by 81% (up to 4.5×) while requiring minimal changes to the existing system design.