Proceedings Article
GPUShare: Fair-Sharing Middleware for GPU Clouds
Anshuman Goswami,Jeffrey Young,Karsten Schwan,Naila Farooqui,Ada Gavrilovska,Matthew Wolf,Greg Eisenhauer +6 more
- pp 1769-1776
TLDR
This paper presents GPUShare, a software-based mechanism that can yield a kernel before all of its threads have run, giving finer control over the time slice for which the GPU is allocated to a process and improving fair GPU sharing across tenants.
Citations
Proceedings Article
CuMF_SGD: Parallelized Stochastic Gradient Descent for Matrix Factorization on GPUs
TL;DR: This paper first designs high-performance GPU computation kernels that accelerate individual SGD updates by exploiting model parallelism, then designs efficient schemes that parallelize SGD updates by exploiting data parallelism, and scales cuMF_SGD to large data sets that cannot fit into one GPU's memory.
Proceedings Article
Dynamic application reconfiguration on heterogeneous hardware
Juan Fumero,Michail Papadimitriou,Foivos S. Zakkak,Maria Xekalaki,James Clarkson,Christos Kotselidis +5 more
TL;DR: Through TornadoVM, a virtual machine capable of reconfiguring applications, at runtime, for hardware acceleration based on the currently available hardware resources, this paper introduces a new level of compilation in which applications can benefit from heterogeneous hardware.
Proceedings Article
A View from ORNL: Scientific Data Research Opportunities in the Big Data Age
Scott Klasky,Matthew Wolf,Mark Ainsworth,Chuck Atkins,Jong Choi,Greg Eisenhauer,Berk Geveci,William F. Godoy,Mark Kim,James Kress,Tahsin Kurc,Qing Liu,Jeremy Logan,Arthur B. Maccabe,Kshitij Mehta,George Ostrouchov,Manish Parashar,Norbert Podhorszki,David Pugmire,E. Suchyta,Lipeng Wan,Ruonan Wang +28 more
TL;DR: A forward-looking research and development plan which centers around the concept of making Input/Output (I/O) intelligent for users in the scientific community, whether they are accessing scalable storage or performing in situ workflow tasks.
Proceedings Article
Wheel: Accelerating CNNs with Distributed GPUs via Hybrid Parallelism and Alternate Strategy
TL;DR: Wheel first partitions the layers of a CNN into two kinds of modules, a convolutional module and a fully-connected module, and deploys them following the proposed hybrid parallelism, which reduces the transmitted data and fully uses the GPUs simultaneously.
Proceedings Article
GLoop: an event-driven runtime for consolidating GPGPU applications
TL;DR: This paper presents GLoop, a software runtime that consolidates GPGPU apps, including GPU eaters. GLoop offers an event-driven programming model, which allows GLoop-based apps to inherit the GPU eater's high functionality while proportionally scheduling them on a shared GPU in an isolated manner.
References
Proceedings Article
The PageRank Citation Ranking : Bringing Order to the Web
TL;DR: This paper describes PageRank, a method for rating Web pages objectively and mechanically, effectively measuring the human interest and attention devoted to them, and shows how to efficiently compute PageRank for large numbers of pages.
Posted Content
Caffe: Convolutional Architecture for Fast Feature Embedding
Yangqing Jia,Evan Shelhamer,Jeff Donahue,Sergey Karayev,Jonathan Long,Ross Girshick,Sergio Guadarrama,Trevor Darrell +7 more
TL;DR: Caffe is a BSD-licensed C++ library with Python and MATLAB bindings for training and deploying general-purpose convolutional neural networks and other deep models efficiently on commodity architectures.
Proceedings Article
Caffe: Convolutional Architecture for Fast Feature Embedding
Yangqing Jia,Evan Shelhamer,Jeff Donahue,Sergey Karayev,Jonathan Long,Ross Girshick,Sergio Guadarrama,Trevor Darrell +7 more
TL;DR: Caffe provides multimedia scientists and practitioners with a clean and modifiable framework for state-of-the-art deep learning algorithms and a collection of reference models for training and deploying general-purpose convolutional neural networks and other deep models efficiently on commodity architectures.
Journal Article
Scheduling Algorithms for Multiprogramming in a Hard-Real-Time Environment
C. L. Liu,James W. Layland +1 more
TL;DR: The problem of multiprogram scheduling on a single processor is studied from the viewpoint of the characteristics peculiar to the program functions that need guaranteed service and it is shown that an optimum fixed priority scheduler possesses an upper bound to processor utilization.
Book
Scheduling algorithms for multiprogramming in a hard real-time environment
C. L. Liu,James W. Layland +1 more
TL;DR: In this paper, the problem of multiprogram scheduling on a single processor is studied from the viewpoint of the characteristics peculiar to the program functions that need guaranteed service, and it is shown that an optimum fixed priority scheduler possesses an upper bound to processor utilization which may be as low as 70 percent for large task sets.
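The "as low as 70 percent" figure in the summary above corresponds to the well-known Liu and Layland least upper bound for rate-monotonic fixed-priority scheduling, n(2^(1/n) - 1), which tends to ln 2 (about 0.693) as the number of tasks n grows. The following is a minimal sketch of that bound and the sufficient (not necessary) schedulability test built on it, assuming independent periodic tasks whose deadlines equal their periods. The function names and the (compute_time, period) task representation are illustrative assumptions, not taken from the listed works.

```python
import math


def rm_utilization_bound(n: int) -> float:
    """Least upper bound n * (2**(1/n) - 1) on utilization for n periodic tasks."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    return n * (2 ** (1.0 / n) - 1)


def passes_sufficient_test(tasks) -> bool:
    """Return True if total utilization is within the bound.

    tasks: list of (compute_time, period) pairs (hypothetical representation).
    A False result is inconclusive: the task set may still be schedulable.
    """
    utilization = sum(c / t for c, t in tasks)
    return utilization <= rm_utilization_bound(len(tasks))


if __name__ == "__main__":
    # The bound decreases toward ln 2 as the task count grows.
    for n in (1, 2, 10, 1000):
        print(n, rm_utilization_bound(n))
    print("ln 2 =", math.log(2))
```

A task set whose utilization exceeds the bound is not necessarily unschedulable; the test only guarantees schedulability when it passes.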