The Combined Input-Output Queued Crossbar Architecture for High-Radix On-Chip Switches

doi:10.1109/MM.2014.56

Journal Article

G. Passas, +2 more

01 Nov 2015

IEEE Micro

Vol. 35, Iss. 6, pp. 38-47

TLDR

This article proposes a novel microarchitecture that allows flat crossbar switches to scale to 128 ports, supporting 32 Gbits per second per port (Gbps/port) while occupying 4.9 mm^2 and consuming 4.2 W, and compares CIOQ with Swizzle Switch prototypes and demonstrates high-radix crossbars' potential for system-on-chip interconnects.
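As a rough sanity check on the headline figures, the 128-port, 32 Gbps/port configuration moves about 4.1 Tbit/s in aggregate, and dividing the reported 4.2 W by that rate gives roughly 1 pJ per switched bit. The sketch below assumes, hypothetically, that the quoted power is drawn with every port fully loaded; the article's own power methodology may differ. It also counts the crosspoints of a flat N x N crossbar, which grow quadratically with the radix. The function names are illustrative only.

```python
def aggregate_tbps(ports: int, gbps_per_port: float) -> float:
    """Aggregate port bandwidth in Tbit/s (one direction)."""
    return ports * gbps_per_port / 1000.0


def energy_pj_per_bit(watts: float, ports: int, gbps_per_port: float) -> float:
    """Power divided by aggregate bandwidth, in pJ/bit.

    Assumption (hypothetical): the reported power is drawn at full load on all ports.
    """
    bits_per_second = ports * gbps_per_port * 1e9
    return watts / bits_per_second * 1e12


def crosspoints(radix: int) -> int:
    """A flat radix x radix crossbar has radix**2 crosspoints."""
    return radix * radix


if __name__ == "__main__":
    print(aggregate_tbps(128, 32))                    # 4.096
    print(round(energy_pj_per_bit(4.2, 128, 32), 2))  # 1.03
    print(crosspoints(64), crosspoints(128))          # 4096 16384
```

Doubling the radix from 64 to 128 quadruples the crosspoint count, which is the scaling pressure that pushes many designers toward hierarchical switches.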

Abstract:

High-radix, single-chip routers have emerged as efficient building blocks for interconnection networks. Researchers believe that hierarchical switch architectures are needed at high radices because crossbar cost grows with the square of the router radix. This article proposes a novel microarchitecture that allows flat crossbar switches to scale to 128 ports, supporting 32 Gbits per second per port (Gbps/port) while occupying 4.9 mm^2 and consuming 4.2 W, or supporting 64 Gbps/port at 7.5 mm^2 and 7.5 W, in 45-nm CMOS. Key features include deep crossbar pipelining to cope with wire delay, a novel cross-scheduler architecture to reduce wiring complexity, and catalytic custom gate placement within standard electronic design automation (EDA) flows. Furthermore, on a chip, crossbar speedup with combined input-output queuing (CIOQ) outperforms hierarchical queuing, providing top performance at orders-of-magnitude lower memory cost. Finally, the authors compare CIOQ with Swizzle Switch prototypes and demonstrate high-radix crossbars' potential for system-on-chip interconnects.
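To make the CIOQ idea concrete, here is a toy, slot-based model. It is a hypothetical illustration, not the article's microarchitecture or its cross-scheduler: inputs hold virtual output queues (VOQs), the crossbar runs `speedup` matchings per time slot, and each output link drains at most one cell per slot from its own queue. With speedup 2, two inputs contending for the same output can both cross in a single slot, so the contention is absorbed at the output rather than at the inputs. The class name `CIOQSwitch` and the greedy rotating-priority matcher are assumptions chosen for brevity.

```python
from collections import deque


class CIOQSwitch:
    """Toy N x N combined input-output queued (CIOQ) switch.

    Hypothetical model: VOQs at inputs, FIFOs at outputs, and a greedy
    rotating-priority matcher run `speedup` times per slot. It is not the
    scheduler described in the article.
    """

    def __init__(self, n: int, speedup: int = 2):
        self.n = n
        self.speedup = speedup
        # voq[i][j] holds cells waiting at input i for output j
        self.voq = [[deque() for _ in range(n)] for _ in range(n)]
        self.outq = [deque() for _ in range(n)]
        self.start = 0  # input that gets first pick in the next matching

    def enqueue(self, inp: int, out: int, cell) -> None:
        self.voq[inp][out].append(cell)

    def input_backlog(self) -> int:
        """Number of cells still waiting at the inputs."""
        return sum(len(q) for row in self.voq for q in row)

    def _match(self):
        """Greedy maximal matching: each input and output used at most once."""
        used_outputs = set()
        pairs = []
        for k in range(self.n):
            i = (self.start + k) % self.n
            for d in range(self.n):
                j = (i + d) % self.n
                if j not in used_outputs and self.voq[i][j]:
                    used_outputs.add(j)
                    pairs.append((i, j))
                    break
        self.start = (self.start + 1) % self.n  # rotate priority between matchings
        return pairs

    def step(self):
        """Advance one time slot; return the cell leaving each output (or None)."""
        # The crossbar runs `speedup` matchings per slot.
        for _ in range(self.speedup):
            for i, j in self._match():
                self.outq[j].append(self.voq[i][j].popleft())
        # Each output link sends at most one cell per slot.
        return [q.popleft() if q else None for q in self.outq]


if __name__ == "__main__":
    sw = CIOQSwitch(2, speedup=2)
    sw.enqueue(0, 0, "a")
    sw.enqueue(1, 0, "b")  # both inputs contend for output 0
    print(sw.step(), sw.input_backlog())  # ['a', None] 0
    print(sw.step(), sw.input_backlog())  # ['b', None] 0
```

With `speedup=1`, the same run leaves "b" waiting at its input after the first slot; moving that backlog to the output side is what crossbar speedup buys in this toy model.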

Citations

Proceedings Article

Enhanced Overloaded CDMA Interconnect (OCI) Bus Architecture for On-Chip Communication

Khaled I. E. Ahmed, +1 more

TL;DR: This work proposes the Difference-Overloaded CDMA Interconnect (D-OCI) bus, which leverages the balancing property of the Walsh codes to increase the number of interconnected elements by 50%, and motivates using Code Division Multiple Access (CDMA) as a bus-sharing strategy that offers many advantages over other topologies.
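For readers unfamiliar with the "balancing property": in a Sylvester-constructed Walsh-Hadamard code set of length n (a power of two), every code except the all-ones code contains equally many +1 and -1 chips, and all codes are mutually orthogonal. The sketch below only generates the codes and checks those two properties; how D-OCI exploits them is not reproduced here, and the function names are hypothetical.

```python
def walsh_codes(n: int) -> list[list[int]]:
    """Rows of the n x n Sylvester-Hadamard matrix, entries +1/-1 (n a power of two)."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    h = [[1]]
    while len(h) < n:
        # H_2k = [[H, H], [H, -H]]
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h


def is_balanced(code: list[int]) -> bool:
    """True if the code has as many +1 chips as -1 chips."""
    return sum(code) == 0


def dot(a: list[int], b: list[int]) -> int:
    """Chip-wise correlation of two codes."""
    return sum(x * y for x, y in zip(a, b))


if __name__ == "__main__":
    codes = walsh_codes(8)
    print([is_balanced(c) for c in codes])
    # [False, True, True, True, True, True, True, True]
    print(all(dot(codes[i], codes[j]) == 0
              for i in range(8) for j in range(8) if i != j))  # True
```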

Journal Article

Ping-lock round robin arbiter

Alireza Monemi, +3 more

01 May 2017

Microelectronics Journal

TL;DR: PLA, an improved IPPA, offers fair arbitration under any distribution of active requests and has the advantage of low execution delay; FPGA and ASIC implementations of PLA show up to 18% and 12% improvement, respectively, in average delay compared with existing round-robin arbiters (RRAs) in the literature.
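As background for this entry, a plain round-robin arbiter grants the first active request at or after a rotating priority pointer, then moves the pointer just past the winner so that the winner has the lowest priority in the next round. This is a generic software model, not PLA or IPPA, whose hardware structures differ; the class name is hypothetical.

```python
from typing import Optional, Sequence


class RoundRobinArbiter:
    """Generic round-robin arbiter (not the PLA design)."""

    def __init__(self, n: int):
        self.n = n
        self.pointer = 0  # requester with the highest priority in the next round

    def grant(self, requests: Sequence[bool]) -> Optional[int]:
        """Return the index of the granted requester, or None if nobody requests."""
        for offset in range(self.n):
            i = (self.pointer + offset) % self.n
            if requests[i]:
                self.pointer = (i + 1) % self.n  # winner becomes lowest priority
                return i
        return None


if __name__ == "__main__":
    arb = RoundRobinArbiter(4)
    print([arb.grant([True, True, False, True]) for _ in range(4)])  # [0, 1, 3, 0]
```

Because the pointer always moves past the last winner, every persistent requester is served within at most n grants, which is the fairness guarantee round-robin arbiters are chosen for.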

Journal Article

Low power hardware implementations for network packet processing elements

Mohamed Asan Basiri M, +1 more

01 Jun 2018

Integration

TL;DR: This paper proposes a new crossbar-switch-based packet classification and a 2-to-1-multiplexer-based buffered crossbar backplane to achieve high performance, and achieves significant power reduction over the existing designs.

Posted Content

SERENADE: A Parallel Randomized Algorithm Suite for Crossbar Scheduling in Input-Queued Switches

Long Gong, +5 more

19 Oct 2017

arXiv: Performance

TL;DR: This work proposes SERENADE (SERENA, the Distributed Edition), a parallel algorithm suite that emulates SERENA in only $O(\log N)$ iterations between input ports and output ports, and hence has a time complexity of only $O(\log N)$ per port.

Proceedings Article

DeepHiR: improving high-radix router throughput with deep hybrid memory buffer microarchitecture

Cunlu Li, +4 more

TL;DR: This work proposes DeepHiR, a novel deep hybrid buffer organization (STT-MRAM and SRAM) combined with a centralized buffer organization, which leverages an alternative memory technology to increase buffer capacity in high-radix routers and provides high performance at minimal cost.

References

Journal Article

The future of wires

R. Ho, +2 more

TL;DR: Wires that shorten in length as technologies scale have delays that either track gate delays or grow slowly relative to gate delays, which is good news since these "local" wires dominate chip wiring.

Proceedings Article

Express virtual channels: towards the ideal interconnection fabric

Amit Kumar, +3 more

TL;DR: This paper proposes express virtual channels (EVCs), a novel flow control mechanism which allows packets to virtually bypass intermediate routers along their path in a completely non-speculative fashion, thereby lowering the energy/delay towards that of a dedicated wire while simultaneously approaching ideal throughput with a practical design suitable for on-chip networks.

Posted Content

The Tiny Tera: A Packet Switch Core

Nick McKeown, +4 more

05 Oct 1998

arXiv: Networking and Internet Architect...

TL;DR: The Tiny Tera is a CMOS-based, input-queued, fixed-size packet switch suitable for a wide range of applications, such as a high-performance ATM switch, the core of an Internet router, or a fast multiprocessor interconnect.

Journal Article

Tiny Tera: a packet switch core

Nick McKeown, +4 more

01 Jan 1997

IEEE Micro

TL;DR: Tiny Tera is an input-buffered switch, which makes it the highest-bandwidth switch possible given a particular CMOS and memory technology.

Journal Article

Virtual Circuit Tree Multicasting: A Case for On-Chip Hardware Multicast Support

Natalie Enright Jerger, +2 more

TL;DR: The proposed Virtual Circuit Tree Multicasting (VCTM) router is flexible enough to improve interconnect performance for a broad spectrum of multicasting scenarios, and achieves these benefits with straightforward and inexpensive extensions to a state-of-the-art packet-switched router.

Related Papers

VLSI micro-architectures for high-radix crossbar schedulers

Comparison of synthesized bus and crossbar interconnection architectures

Design and implementation of a routing switch for on-chip interconnection networks

Microarchitecture of a High-Radix Router

A performance evaluation of 2D-mesh, ring, and crossbar interconnects for chip multi-processors