Nacho Navarro

Researcher at Polytechnic University of Catalonia

Publications - 95
Citations - 2320

Nacho Navarro is an academic researcher from the Polytechnic University of Catalonia. The author has contributed to research on topics including Shared memory and Thread (computing). The author has an h-index of 26 and has co-authored 93 publications receiving 2209 citations. Previous affiliations of Nacho Navarro include the Barcelona Supercomputing Center and the University of Illinois at Urbana–Champaign.

Papers
Journal Article

Enabling preemptive multiprogramming on GPUs

TL;DR: This paper argues for preemptive multitasking on GPUs and designs two preemption mechanisms that can be used to implement GPU scheduling policies. It extends an NVIDIA GK110 (Kepler)-like GPU architecture to allow concurrent execution of GPU kernels from different user processes and implements a scheduling policy that dynamically distributes the GPU cores among concurrently running kernels according to their priorities.
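
As a rough illustration of that scheduling policy, the sketch below (plain C; the names and the proportional-share rule are illustrative assumptions, not the paper's mechanism) distributes a fixed pool of SMs among concurrent kernels in proportion to their priorities.

```c
/* Minimal sketch (not the paper's implementation): distribute a fixed
 * number of GPU cores (SMs) among concurrently running kernels in
 * proportion to their priorities. All names are illustrative. */
#include <stdio.h>

#define NUM_SMS 15          /* e.g. a GK110-class GPU has 15 SMX units */

typedef struct {
    const char *owner;      /* user process the kernel belongs to */
    int priority;           /* higher value = higher priority (assumed > 0) */
    int sms_assigned;       /* output: cores granted by the scheduler */
} kernel_t;

static void distribute_sms(kernel_t *kernels, int n)
{
    int total_priority = 0;
    for (int i = 0; i < n; i++)
        total_priority += kernels[i].priority;

    int granted = 0;
    for (int i = 0; i < n; i++) {
        kernels[i].sms_assigned = NUM_SMS * kernels[i].priority / total_priority;
        granted += kernels[i].sms_assigned;
    }
    /* Hand out any leftover SMs round-robin across the kernels. */
    for (int i = 0; granted < NUM_SMS; i = (i + 1) % n) {
        kernels[i].sms_assigned++;
        granted++;
    }
}

int main(void)
{
    kernel_t running[] = {
        { "process-A", 3, 0 },
        { "process-B", 1, 0 },
        { "process-C", 1, 0 },
    };
    distribute_sms(running, 3);
    for (int i = 0; i < 3; i++)
        printf("%s: %d SMs\n", running[i].owner, running[i].sms_assigned);
    return 0;
}
```
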
Journal Article

An asymmetric distributed shared memory model for heterogeneous parallel systems

TL;DR: A new programming model for heterogeneous computing, called Asymmetric Distributed Shared Memory (ADSM), is presented; it maintains a shared logical memory space in which CPUs can access objects held in the accelerator's physical memory, but not vice versa.
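
The sketch below illustrates the ADSM idea only; the adsm_* calls are hypothetical stand-ins (the paper's actual runtime is the GMAC library), and host malloc() plays the role of accelerator memory that the runtime would map into the CPU's address space.

```c
/* Conceptual sketch of the ADSM idea, not the published runtime. A
 * hypothetical adsm_alloc() returns a single pointer that the CPU can
 * dereference directly, while the data is logically owned by the
 * accelerator; the accelerator never reaches back into host memory
 * (hence "asymmetric"). malloc() stands in for accelerator memory. */
#include <stdlib.h>
#include <stdio.h>

/* Hypothetical API: allocate an object in accelerator memory and map it
 * into the host's address space at the same logical address. */
static void *adsm_alloc(size_t bytes)
{
    return malloc(bytes);   /* stand-in: would call the accelerator driver */
}

static void adsm_free(void *ptr)
{
    free(ptr);
}

/* Hypothetical API: run accelerator work on an ADSM-managed object.
 * The same pointer is valid on both sides of the call. */
static void adsm_launch_scale(float *data, size_t n, float factor)
{
    for (size_t i = 0; i < n; i++)   /* stand-in for the accelerator kernel */
        data[i] *= factor;
}

int main(void)
{
    size_t n = 1 << 10;
    float *v = adsm_alloc(n * sizeof *v);

    for (size_t i = 0; i < n; i++)   /* CPU writes through the shared pointer */
        v[i] = (float)i;

    adsm_launch_scale(v, n, 2.0f);   /* accelerator works on the same object */

    printf("v[10] = %.1f\n", v[10]); /* CPU reads the result: no explicit copies */
    adsm_free(v);
    return 0;
}
```
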
Proceedings Article

Decomposable and responsive power models for multicore processors using performance counters

TL;DR: This paper presents a methodology for building decomposable power models based on performance monitoring counters (PMCs) on current multicore architectures, and demonstrates that the proposed methodology produces more accurate and responsive power models.
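
A minimal sketch of what a decomposable PMC-based power model looks like in practice: the counter names and weights below are made-up placeholders, not the paper's fitted coefficients; the point is that total power is a static term plus per-component contributions that can be reported separately.

```c
/* Illustrative decomposable power model: total power = static power +
 * a weighted sum of per-component activity ratios derived from PMCs,
 * so each component's share can be read off individually. */
#include <stdio.h>

#define NUM_COMPONENTS 3

typedef struct {
    const char *name;     /* architectural component the counter tracks */
    double events;        /* PMC reading over the sampling interval */
    double weight_watts;  /* fitted power cost per unit of activity */
} component_t;

int main(void)
{
    double cycles = 2.0e9;          /* unhalted cycles in the interval */
    double static_watts = 18.0;     /* idle/static power (assumed value) */

    component_t comp[NUM_COMPONENTS] = {
        { "fp_units",   1.2e9, 9.0 },
        { "l2_cache",   3.0e8, 6.5 },
        { "memory_bus", 1.0e8, 4.0 },
    };

    double total = static_watts;
    for (int i = 0; i < NUM_COMPONENTS; i++) {
        double activity = comp[i].events / cycles;        /* events per cycle */
        double watts = comp[i].weight_watts * activity;   /* component's share */
        printf("%-10s %6.2f W\n", comp[i].name, watts);
        total += watts;
    }
    printf("total      %6.2f W\n", total);
    return 0;
}
```
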
Book Chapter

Predictive Runtime Code Scheduling for Heterogeneous Architectures

TL;DR: A novel predictive user-level scheduler for heterogeneous systems, based on past performance history, is shown to allow multiple applications to fully utilize all available processing resources in CPU/GPU systems, consistently achieving speedups of 30% to 40% compared to using only the GPU in single-application mode.
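
The sketch below shows the flavor of a history-based predictive scheduler (illustrative, not the paper's algorithm): each task records its average past run time per device, and a new instance goes to the device whose pending queue plus predicted run time finishes first.

```c
/* Toy predictive scheduler for a CPU+GPU system: device choice is driven
 * by per-task performance history and current device load. */
#include <stdio.h>

enum { DEV_CPU = 0, DEV_GPU = 1, NUM_DEVICES = 2 };

typedef struct {
    const char *name;
    double avg_time[NUM_DEVICES];   /* past performance history (seconds) */
} task_t;

static double queue_busy[NUM_DEVICES];  /* time until each device is free */

static int schedule(const task_t *t)
{
    int best = 0;
    double best_finish = queue_busy[0] + t->avg_time[0];
    for (int d = 1; d < NUM_DEVICES; d++) {
        double finish = queue_busy[d] + t->avg_time[d];
        if (finish < best_finish) {
            best = d;
            best_finish = finish;
        }
    }
    queue_busy[best] = best_finish;     /* the chosen device is busy until then */
    return best;
}

int main(void)
{
    task_t matmul = { "matmul", { 4.0, 0.5 } };   /* runs faster on the GPU */
    task_t parse  = { "parse",  { 0.3, 1.0 } };   /* runs faster on the CPU */

    for (int i = 0; i < 4; i++) {
        printf("matmul -> %s\n", schedule(&matmul) == DEV_GPU ? "GPU" : "CPU");
        printf("parse  -> %s\n", schedule(&parse)  == DEV_GPU ? "GPU" : "CPU");
    }
    return 0;
}
```
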
Proceedings Article

DiDi: Mitigating the Performance Impact of TLB Shootdowns Using a Shared TLB Directory

TL;DR: This paper characterizes the impact of TLB shootdowns on multiprocessor performance and scalability, and presents the design of a scalable TLB coherence mechanism that couples a shared TLB directory with load/store-queue support for lightweight TLB invalidation, thereby eliminating the need for costly IPIs.
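
To make the directory idea concrete, here is a toy sketch (software data structures only, not the DiDi hardware design): the directory tracks which cores may be caching each translation, so a shootdown targets only those cores instead of interrupting every core.

```c
/* Illustrative shared TLB directory: per-page sharer bitmask used to
 * narrow a TLB shootdown to the cores that may hold the translation. */
#include <stdint.h>
#include <stdio.h>

#define NUM_CORES   16
#define DIR_ENTRIES 1024

typedef struct {
    uintptr_t vpn;       /* virtual page number tracked by this entry */
    uint32_t  sharers;   /* bit i set => core i may hold the translation */
} dir_entry_t;

static dir_entry_t directory[DIR_ENTRIES];

static dir_entry_t *lookup(uintptr_t vpn)
{
    return &directory[vpn % DIR_ENTRIES];   /* direct-mapped for simplicity */
}

/* Called when core 'core' fills its TLB with a translation for 'vpn'. */
static void tlb_fill(int core, uintptr_t vpn)
{
    dir_entry_t *e = lookup(vpn);
    if (e->vpn != vpn) {                    /* entry reused: start fresh */
        e->vpn = vpn;
        e->sharers = 0;
    }
    e->sharers |= 1u << core;
}

/* Called when the OS changes the mapping for 'vpn': invalidate only on
 * the cores the directory lists as possible sharers. */
static void tlb_shootdown(uintptr_t vpn)
{
    dir_entry_t *e = lookup(vpn);
    if (e->vpn != vpn)
        return;                             /* nobody caches it: no IPIs at all */
    for (int c = 0; c < NUM_CORES; c++)
        if (e->sharers & (1u << c))
            printf("invalidate vpn 0x%lx on core %d\n", (unsigned long)vpn, c);
    e->sharers = 0;
}

int main(void)
{
    tlb_fill(2, 0x1234);
    tlb_fill(7, 0x1234);
    tlb_fill(3, 0x9999);
    tlb_shootdown(0x1234);   /* only cores 2 and 7 are interrupted */
    return 0;
}
```
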