Jeremy Sugerman
Researcher at Stanford University
Publications - 16
Citations - 4654
Jeremy Sugerman is an academic researcher from Stanford University. The author has contributed to research in topics: Virtual machine & Full virtualization. The author has an h-index of 12 and has co-authored 16 publications receiving 4570 citations. Previous affiliations of Jeremy Sugerman include Nvidia & VMware.
Papers
Journal ArticleDOI
Brook for GPUs: stream computing on graphics hardware
Ian Buck, Tim Foley, Daniel Reiter Horn, Jeremy Sugerman, Kayvon Fatahalian, Mike Houston, Pat Hanrahan +6 more
TL;DR: This paper presents Brook for GPUs, a system for general-purpose computation on programmable graphics hardware that abstracts and virtualizes many aspects of graphics hardware, and analyzes the effectiveness of the GPU as a compute engine compared to the CPU.
Proceedings Article
Virtualizing I/O Devices on VMware Workstation's Hosted Virtual Machine Monitor
TL;DR: Results indicate that with optimizations, VMware Workstation’s hosted virtualization architecture can match native I/O throughput on standard PCs.
Journal ArticleDOI
Larrabee: a many-core x86 architecture for visual computing
Larry D. Seiler, Doug Carmean, Eric Sprangle, Tom Forsyth, Michael Abrash, Pradeep Dubey, Stephen Junkins, Adam T. Lake, Jeremy Sugerman, Robert Dale Cavin, Roger Espasa, Ed Grochowski, Toni Juan, Pat Hanrahan +13 more
TL;DR: This article consists of a collection of slides from the author's conference presentation, some of the topics discussed include: architecture convergence; Larrabee architecture; and graphics pipeline.
Journal ArticleDOI
Larrabee: A Many-Core x86 Architecture for Visual Computing
Larry D. Seiler, Douglas M. Carmean, Eric Sprangle, Tom Forsyth, Pradeep Dubey, Stephen Junkins, Adam T. Lake, Robert Dale Cavin, Roger Espasa, Edward T. Grochowski, Toni Juan, Michael Abrash, Jeremy Sugerman, Pat Hanrahan +13 more
TL;DR: The Larrabee many-core visual computing architecture uses multiple in-order x86 cores augmented by wide vector processor units, together with some fixed-function logic, which increases the architecture's programmability as compared to standard GPUs.
Proceedings ArticleDOI
Understanding the efficiency of GPU algorithms for matrix-matrix multiplication
TL;DR: An in-depth analysis of dense matrix-matrix multiplication, which reuses each element of the input matrices O(n) times, finds that even near-optimal GPU implementations are pronouncedly less efficient than current cache-aware CPU approaches.