Daniel Cederman
Researcher at Chalmers University of Technology
Publications - 24
Citations - 674
Daniel Cederman is an academic researcher at Chalmers University of Technology. The author has contributed to research in the topics of data structures and graphics. The author has an h-index of 14 and has co-authored 24 publications that have received 643 citations.
Papers
Journal Article
GPU-Quicksort: A practical Quicksort algorithm for graphics processors
Daniel Cederman, Philippas Tsigas +1 more
TL;DR: GPU-Quicksort, an efficient Quicksort algorithm suited to highly parallel multicore graphics processors, is described. Implemented in CUDA, NVIDIA's programming platform for general-purpose computation on graphics processors, it is shown to perform better than the fastest known sorting implementations for graphics processors.
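The TL;DR does not spell out GPU-Quicksort's internals. As a rough, hedged illustration of the data-parallel partitioning pattern that GPU sorting commonly builds on, the sequential Python sketch below simulates a few "threads" that each count the keys below and above a pivot in their own chunk, turn those counts into non-overlapping write offsets with an exclusive prefix sum, and then scatter into an auxiliary buffer without conflicts. It is not the paper's CUDA implementation; the function names, the `num_threads` parameter, and the median-of-three pivot are assumptions made for illustration.

```python
from itertools import accumulate
from typing import List


def exclusive_prefix_sum(counts: List[int]) -> List[int]:
    """[a, b, c] -> [0, a, a + b]: the first write slot of each simulated thread."""
    return [0] + list(accumulate(counts))[:-1]


def partition_with_prefix_sums(data: List[int], pivot: int, num_threads: int = 4) -> List[int]:
    """Hypothetical helper: partition `data` around `pivot` in a data-parallel style."""
    n = len(data)
    size = -(-n // num_threads)  # ceiling division: elements per simulated thread
    chunks = [data[t * size:(t + 1) * size] for t in range(num_threads)]

    # Step 1: each simulated thread counts its keys below / above the pivot.
    less = [sum(x < pivot for x in c) for c in chunks]
    greater = [sum(x > pivot for x in c) for c in chunks]

    # Step 2: prefix sums give every thread its own disjoint output range.
    less_off = exclusive_prefix_sum(less)
    greater_off = [n - sum(greater) + o for o in exclusive_prefix_sum(greater)]

    # Step 3: scatter; the gap between the two regions holds pivot-equal keys.
    out = [pivot] * n
    for t, c in enumerate(chunks):
        lo, hi = less_off[t], greater_off[t]
        for x in c:
            if x < pivot:
                out[lo] = x
                lo += 1
            elif x > pivot:
                out[hi] = x
                hi += 1
    return out


def quicksort_with_prefix_sums(data: List[int]) -> List[int]:
    """Hypothetical recursive driver built on the partition step above."""
    if len(data) <= 1:
        return list(data)
    pivot = sorted([data[0], data[len(data) // 2], data[-1]])[1]  # median of three
    part = partition_with_prefix_sums(data, pivot)
    n_less = sum(x < pivot for x in data)
    cut = len(data) - sum(x > pivot for x in data)
    return (quicksort_with_prefix_sums(part[:n_less]) + part[n_less:cut]
            + quicksort_with_prefix_sums(part[cut:]))


print(partition_with_prefix_sums([5, 3, 8, 1, 9, 2, 7], 5))  # [3, 1, 2, 5, 8, 9, 7]
print(quicksort_with_prefix_sums([5, 3, 8, 1, 9, 2, 7]))     # [1, 2, 3, 5, 7, 8, 9]
```

Because every simulated thread writes only inside the range its prefix-sum offset reserves, the scatter step needs no locking, which is what makes this pattern attractive on graphics processors.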
Proceedings Article
On dynamic load balancing on graphics processors
Daniel Cederman, Philippas Tsigas +1 more
TL;DR: Four dynamic load balancing methods are compared to determine which is best suited to the highly parallel world of graphics processors. The results show that lock-free methods achieve better performance than blocking ones and that they can be made to scale as the number of processing units increases.
Book Chapter
A Practical Quicksort Algorithm for Graphics Processors
Daniel Cederman, Philippas Tsigas +1 more
TL;DR: GPU-Quicksort, an efficient Quicksort algorithm suited to highly parallel multi-core graphics processors, is presented. It often performs better than the fastest known sorting implementations for graphics processors, such as radix sort and bitonic sort.
Proceedings Article
Towards a software transactional memory for graphics processors
TL;DR: Two software transactional memories (STMs) for graphics processors, one blocking and one non-blocking, are designed and implemented, and experimental results comparing their performance are presented and explained.
Book Chapter
Dynamic Load Balancing Using Work-Stealing
Daniel Cederman, Philippas Tsigas +1 more
TL;DR: Work-stealing lets an idle core acquire tasks from an overloaded core, so that the total work is distributed evenly among cores. Communication costs stay low because tasks are redistributed only when required.
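The work-stealing mechanism summarized in this TL;DR can be illustrated with a small, single-threaded Python simulation. It is a sketch under assumptions, not the chapter's implementation: the function `simulate_work_stealing`, the round-based scheduling, the random choice of victim, and the convention that an owner takes tasks from one end of its deque while a thief steals from the other end are all hypothetical modelling choices. A worker attempts a steal only when its own deque is empty, mirroring the point that tasks are redistributed only when required.

```python
import random
from collections import deque
from typing import List, Tuple


def simulate_work_stealing(initial: List[List[int]], seed: int = 0) -> Tuple[List[List[int]], int]:
    """Hypothetical round-based simulation of work-stealing.

    Each worker owns a deque of task ids. In every round, a busy worker runs
    one task from the back of its own deque; an idle worker picks a random
    victim and, if the victim has work, steals the task at the front of the
    victim's deque and runs it immediately. Returns the tasks each worker ran
    and the number of successful steals.
    """
    rng = random.Random(seed)
    deques = [deque(tasks) for tasks in initial]
    n = len(deques)
    done: List[List[int]] = [[] for _ in range(n)]
    steals = 0
    while any(deques):
        for w in range(n):
            if deques[w]:
                # Owner works on its own deque: no communication needed.
                done[w].append(deques[w].pop())
                continue
            # Idle worker: only now is work redistributed.
            others = [v for v in range(n) if v != w]
            if not others:
                continue
            victim = rng.choice(others)
            if deques[victim]:
                # Steal from the opposite end and run the task right away.
                done[w].append(deques[victim].popleft())
                steals += 1
    return done, steals


done, steals = simulate_work_stealing([list(range(12)), []])
print(done)    # [[11, 10, 9, 8, 7, 6], [0, 1, 2, 3, 4, 5]]
print(steals)  # 6
```

Starting from a maximally unbalanced assignment (all 12 tasks on worker 0), the idle worker ends up running half of them, while an already balanced input such as `[[1], [2]]` triggers no steals at all.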