scispace - formally typeset
Journal ArticleDOI

Interconnections in Multi-Core Architectures: Understanding Mechanisms, Overheads and Scaling

Reads0
Chats0
TLDR
Examination of the area, power, performance, and design issues for the on-chip interconnects on a chip multiprocessor shows that designs that treat interconnect as an entity that can be independently architected and optimized would not arrive at the best multi-core design.
Abstract
This paper examines the area, power, performance, and design issues for the on-chip interconnects on a chip multiprocessor, attempting to present a comprehensive view of a class of interconnect architectures. It shows that the design choices for the interconnect have significant effect on the rest of the chip, potentially consuming a significant fraction of the real estate and power budget. This research shows that designs that treat interconnect as an entity that can be independently architected and optimized would not arrive at the best multi-core design. Several examples are presented showing the need for careful co-design. For instance, increasing interconnect bandwidth requires area that then constrains the number of cores or cache sizes, and does not necessarily increase performance. Also, shared level-2 caches become significantly less attractive when the overhead of the resulting crossbar is accounted for. A hierarchical bus structure is examined which negates some of the performance costs of the assumed base-line architecture.

read more

Content maybe subject to copyright    Report

Citations
More filters
Proceedings ArticleDOI

McPAT: an integrated power, area, and timing modeling framework for multicore and manycore architectures

TL;DR: Combining power, area, and timing results of McPAT with performance simulation of PARSEC benchmarks at the 22nm technology node for both common in-order and out-of-order manycore designs shows that when die cost is not taken into account clustering 8 cores together gives the best energy-delay product, whereas when cost is taking into account configuring clusters with 4 cores gives thebest EDA2P and EDAP.
Journal ArticleDOI

Photonic Networks-on-Chip for Future Generations of Chip Multiprocessors

TL;DR: Results confirm the unique benefits for future generations of CMPs that can be achieved by bringing optics into the chip in the form of photonic NoCs, as well as a comparative power analysis of a photonic versus an electronic NoC.
Journal ArticleDOI

Outstanding Research Problems in NoC Design: System, Microarchitecture, and Circuit Perspectives

TL;DR: This paper provides a general description of NoC architectures and applications and enumerates several related research problems organized under five main categories: Application characterization, communication paradigm, communication infrastructure, analysis, and solution evaluation.
Journal ArticleDOI

Corona: System Implications of Emerging Nanophotonic Technology

TL;DR: This work believes that in comparison with an electrically-connected many-core alternative that uses the same on-stack interconnect power, Corona can provide 2 to 6 times more performance on many memory intensive workloads, while simultaneously reducing power.
Journal ArticleDOI

Heterogeneous chip multiprocessors

TL;DR: Heterogeneous (or asymmetric) chip multiprocessors present unique opportunities for improving system throughput, reducing processor power, and mitigating Amdahl's law.
References
More filters
Proceedings ArticleDOI

Route packets, not wires: on-chip interconnection networks

TL;DR: This paper introduces the concept of on-chip networks, sketches a simple network, and discusses some challenges in the architecture and design of these networks.
Journal ArticleDOI

The future of wires

TL;DR: Wires that shorten in length as technologies scale have delays that either track gate delays or grow slowly relative to gate delays, which is good news since these "local" wires dominate chip wiring.
Journal ArticleDOI

The cosmic cube

TL;DR: Cosmic Cube as discussed by the authors is a hardware simulation of a future VLSI implementation that will consist of single-chip nodes, which offers high degrees of concurrency in applications and suggests that future machines with thousands of nodes are both feasible and attractive.
Journal ArticleDOI

The Stanford Dash multiprocessor

TL;DR: The directory architecture for shared memory (Dash) as discussed by the authors allows shared data to be cached, significantly reducing the latency of memory accesses and yielding higher processor utilization and higher overall performance, and a distributed directory-based protocol that provides cache coherence without compromising scalability.
Proceedings ArticleDOI

Single-ISA heterogeneous multi-core architectures: the potential for processor power reduction

TL;DR: This paper proposes and evaluates single-ISA heterogeneousmulti-core architectures as a mechanism to reduceprocessor power dissipation and results indicate a 39% average energy reduction while only sacrificing 3% in performance.