Journal Article

The Combined Input-Output Queued Crossbar Architecture for High-Radix On-Chip Switches

G. Passas et al.
- 01 Nov 2015
- Vol. 35, Iss. 6, pp. 38-47
TLDR
This article proposes a novel microarchitecture that allows flat crossbar switches to scale to 128 ports, supporting 32 Gbits per second per port (Gbps/port) while occupying 4.9 mm² and consuming 4.2 W. It also compares combined input-output queuing (CIOQ) with Swizzle Switch prototypes and demonstrates the potential of high-radix crossbars for system-on-chip interconnects.
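The quoted design points can be turned into aggregate-throughput and energy-per-bit figures with simple arithmetic. The sketch below assumes, although the article summary does not say so, that the quoted power is drawn with all 128 ports running at full line rate. The function name `summarize` and the derived metrics are illustrative and are not taken from the article.

```python
# Back-of-the-envelope figures derived from the design points quoted above.
# Assumption (not stated in the article): the quoted power is drawn at full
# load, i.e. with every port carrying its full line rate.

PORTS = 128

# (per-port rate in Gbit/s, area in mm^2, power in W), 45-nm CMOS
DESIGN_POINTS = {
    "32 Gbps/port": (32, 4.9, 4.2),
    "64 Gbps/port": (64, 7.5, 7.5),
}


def summarize(rate_gbps, area_mm2, power_w, ports=PORTS):
    """Return (aggregate Tbit/s, energy in pJ/bit, Tbit/s per mm^2)."""
    aggregate_gbps = ports * rate_gbps
    aggregate_tbps = aggregate_gbps / 1000.0
    # W divided by bit/s gives J/bit; multiply by 1e12 for picojoules.
    pj_per_bit = power_w / (aggregate_gbps * 1e9) * 1e12
    return aggregate_tbps, pj_per_bit, aggregate_tbps / area_mm2


if __name__ == "__main__":
    for name, point in DESIGN_POINTS.items():
        tbps, pj, density = summarize(*point)
        print(f"{name}: {tbps:.3f} Tbit/s, {pj:.3f} pJ/bit, "
              f"{density:.3f} Tbit/s per mm^2")
```

Under that assumption, 128 × 32 Gbps gives about 4.1 Tbit/s at roughly 1.03 pJ/bit, and 128 × 64 Gbps gives about 8.2 Tbit/s at roughly 0.92 pJ/bit. Doubling the per-port rate therefore costs less than double the power (4.2 W to 7.5 W) and the area (4.9 mm² to 7.5 mm²).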
Abstract
High-radix, single-chip routers have emerged as efficient building blocks for interconnection networks. Researchers believe that hierarchical switch architectures are needed at high radices because crossbar cost scales with the square of the router radix. This article proposes a novel microarchitecture that allows flat crossbar switches to scale to 128 ports, supporting 32 Gbits per second per port (Gbps/port) while occupying 4.9 mm² and consuming 4.2 W, or supporting 64 Gbps/port at 7.5 mm² and 7.5 W, in 45-nm CMOS. Key features include deep crossbar pipelining to cope with wire delay, a novel cross-scheduler architecture to reduce wiring complexity, and catalytic custom gate placement within standard electronic design automation (EDA) flows. Furthermore, on chip, crossbar speedup together with combined input-output queuing (CIOQ) outperforms hierarchical queuing, providing top performance at orders-of-magnitude lower memory cost. Finally, the authors compare CIOQ with Swizzle Switch prototypes and demonstrate the potential of high-radix crossbars for system-on-chip interconnects.
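To illustrate why crossbar speedup matters in a CIOQ switch, here is a minimal saturation model. It is not the authors' microarchitecture or cross-scheduler: it assumes FIFO input queues (no virtual output queues), saturated uniform random traffic, random per-output arbitration, and unbounded output queues, so it says nothing about the buffer sizes or memory cost the article evaluates. The function `simulate` and its parameters are hypothetical. With speedup 1 the model reduces to a FIFO input-queued switch, whose saturation throughput is limited by head-of-line (HOL) blocking to roughly 59% at 32 ports (tending to 2 − √2 ≈ 58.6% for large radices); with speedup 2 the crossbar moves cells faster than the output links drain them, so the links stay almost fully busy.

```python
import random


def simulate(n_ports=32, speedup=1, slots=5000, seed=1):
    """Saturated-load model of a CIOQ crossbar (simplifying assumptions).

    * Every input always has a head-of-line (HOL) cell waiting, with a
      uniformly random destination (saturated uniform traffic).
    * Input queues are FIFOs, so only the HOL cell can be scheduled.
    * In each crossbar pass, every requested output grants one of its
      contending inputs at random; each input sends at most one cell.
    * The crossbar makes `speedup` passes per slot, while each output link
      sends at most one cell per slot from an unbounded output queue.

    Returns the average fraction of output links busy per slot.
    """
    rng = random.Random(seed)
    hol = [rng.randrange(n_ports) for _ in range(n_ports)]  # HOL dest per input
    out_q = [0] * n_ports  # number of cells waiting in each output queue
    sent = 0
    for _ in range(slots):
        for _ in range(speedup):
            # Group inputs by the output their HOL cell is destined for.
            requests = {}
            for inp, dst in enumerate(hol):
                requests.setdefault(dst, []).append(inp)
            # Each output accepts one cell; losers stay blocked at the HOL.
            for dst, inputs in requests.items():
                winner = rng.choice(inputs)
                out_q[dst] += 1
                hol[winner] = rng.randrange(n_ports)  # next cell moves to HOL
        # Each output link transmits at most one cell per slot.
        for dst in range(n_ports):
            if out_q[dst]:
                out_q[dst] -= 1
                sent += 1
    return sent / (slots * n_ports)


if __name__ == "__main__":
    for s in (1, 2):
        print(f"speedup {s}: output utilization {simulate(speedup=s):.3f}")
```

Because the output queues are unbounded here, speedup 2 lets them grow without limit under saturation; a realistic design would bound them and apply backpressure to the inputs, which this sketch deliberately omits.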

