Proceedings Article
Comparing the power and performance of Intel's SCC to state-of-the-art CPUs and GPUs
Ehsan Totoni, Babak Behzad, Swapnil Ghike, Josep Torrellas +3 more
- pp 78-87
TLDR
The results show that the GPGPU has outstanding results in performance, power consumption and energy efficiency for many applications, but it requires significant programming effort and is not general enough to show the same level of efficiency for all the applications.
Abstract:
Power dissipation and energy consumption are becoming increasingly important architectural design constraints in different types of computers, from embedded systems to large-scale supercomputers. To continue the scaling of performance, it is essential that we build parallel processor chips that make the best use of exponentially increasing numbers of transistors within the power and energy budgets. Intel SCC is an appealing option for future many-core architectures. In this paper, we use various scalable applications to quantitatively compare and analyze the performance, power consumption and energy efficiency of different cutting-edge platforms that differ in architectural build. These platforms include the Intel Single-Chip Cloud Computer (SCC) many-core, the Intel Core i7 general-purpose multi-core, the Intel Atom low-power processor, and the Nvidia ION2 GPGPU. Our results show that the GPGPU has outstanding results in performance, power consumption and energy efficiency for many applications, but it requires significant programming effort and is not general enough to show the same level of efficiency for all the applications. The “light-weight” many-core presents an opportunity for better performance per watt over the “heavy-weight” multi-core, although the multi-core is still very effective for some sophisticated applications. In addition, the low-power processor is not necessarily energy-efficient, since the runtime delay effect can be greater than the power savings.
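The abstract's last point, that a low-power processor is not necessarily energy-efficient, follows from energy being average power multiplied by runtime. The sketch below illustrates that relationship only. The function name and all wattage and runtime figures are hypothetical, chosen for illustration; they are not measurements from the paper.

```python
def energy_joules(avg_power_watts: float, runtime_seconds: float) -> float:
    """Energy consumed by a run: average power multiplied by runtime."""
    return avg_power_watts * runtime_seconds


# Hypothetical figures (NOT from the paper): a low-power chip that draws
# less power but needs much longer to finish the same job.
low_power = energy_joules(avg_power_watts=30.0, runtime_seconds=100.0)
multi_core = energy_joules(avg_power_watts=100.0, runtime_seconds=20.0)

print(f"low-power chip:  {low_power:.0f} J")
print(f"multi-core chip: {multi_core:.0f} J")
```

With these assumed numbers the low-power chip draws less than a third of the power but runs five times longer, so it consumes more total energy for the same job.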
Citations
Journal Article
A Survey of Mobile Device Virtualization: Taxonomy and State of the Art
Junaid Shuja, Abdullah Gani, Kashif Bilal, Atta ur Rehman Khan, Sajjad A. Madani, Samee U. Khan, Albert Y. Zomaya +6 more
TL;DR: Challenges and issues faced in virtualization of CPU, memory, I/O, interrupt, and network interfaces are highlighted and various performance parameters are presented in a detailed comparative analysis to quantify the efficiency of mobile virtualization techniques and solutions.
Journal Article
On the energy efficiency and performance of irregular application executions on multicore, NUMA and manycore platforms
Emilio Francesquini, Márcio Castro, Pedro Henrique Penna, Fabrice Dupros, Henrique Freitas, Philippe O. A. Navaux, Jean-François Méhaut +6 more
TL;DR: This study evaluates the computing and energy performance of two well-known irregular NP-hard problems (the Traveling-Salesman Problem and K-Means clustering) and a numerical seismic wave propagation simulation kernel (Ondes3D) on multicore, NUMA, and manycore platforms.
Journal Article
An Energy-Efficient Coarse-Grained Reconfigurable Processing Unit for Multiple-Standard Video Decoding
TL;DR: A coarse-grained reconfigurable processing unit (RPU) consisting of 16×16 multi-functional processing elements (PEs) interconnected by an area-efficient line-switched mesh connect (LSMC) routing is proposed to reduce the implementation overhead and the energy dissipation spent on fast reconfiguration.
Proceedings Article
Protozoa: adaptive granularity cache coherence
TL;DR: The design of Protozoa, a family of coherence protocols that eliminate unnecessary coherence traffic and match data movement to an application's spatial locality, is presented; Protozoa is demonstrated to consistently reduce miss rate and improve the fraction of transmitted data that is actually utilized.
Proceedings Article
Improving Energy Efficiency through Parallelization and Vectorization on Intel Core i5 and i7 Processors
TL;DR: Results show that software developers should prioritize vectorization over parallelization whenever possible, as it is much better in terms of energy efficiency, and that a more detailed model is needed to predict system power based on on-chip power information.
References
Journal Article
Scalable molecular dynamics with NAMD
James C. Phillips, Rosemary Braun, Wei Wang, James C. Gumbart, Emad Tajkhorshid, Elizabeth Villa, Christophe Chipot, Robert D. Skeel, Laxmikant V. Kale, Klaus Schulten +9 more
TL;DR: NAMD is a parallel molecular dynamics code designed for high-performance simulation of large biomolecular systems. It scales to hundreds of processors on high-end parallel platforms and to tens of processors on low-cost commodity clusters, and it also runs on individual desktop and laptop computers.
Proceedings Article
A 48-Core IA-32 message-passing processor with DVFS in 45nm CMOS
Jason Howard, Saurabh Dighe, Yatin Hoskote, Sriram R. Vangal, D. Finan, G. Ruhl, David Jenkins, H. Wilson, Nitin Borkar, Gerhard Schrom, Fabrice Pailet, Shailendra Jain, Tiju Jacob, Satish Yada, Sravan K. Marella, Praveen Salihundam, Vasantha Erraguntla, Michael Konow, Michael Riepen, Guido Droege, Joerg Lindemann, Matthias Gries, Thomas Apel, Kersten Henriss, Tor Lund-Larsen, Sebastian Steibl, Shekhar Borkar, Vivek De, Rob F. Van der Wijngaart, Timothy G. Mattson +29 more
TL;DR: This paper presents a prototype chip that integrates 48 Pentium™ class IA-32 cores on a 6×4 2D-mesh network of tiled core clusters with high-speed I/Os on the periphery to realize a data-center-on-a-die microprocessor architecture.
Proceedings Article
The 48-core SCC Processor: the Programmer's View
Timothy G. Mattson, Michael Riepen, Thomas Lehnig, Paul Brett, Werner Haas, Patrick Kennedy, Jason Howard, Sriram R. Vangal, Nitin Borkar, Greg Ruhl, Saurabh Dighe +10 more
TL;DR: The programmer's view of this chip is described, along with RCCE, the native message passing model created for the SCC processor. The SCC is presented as an intermediate case, sharing traits of message passing and shared memory architectures.
Proceedings Article
NAS parallel benchmark results
TL;DR: The performance results of various systems using the NAS parallel benchmarks are presented and these results represent the best results that have been reported to the authors for the specific systems listed.