scispace - formally typeset
Search or ask a question
Author

Sai-Wang Tam

Bio: Sai-Wang Tam is an academic researcher from Marvell Technology Group. The author has contributed to research in topics: Network on a chip & CMOS. The author has an hindex of 16, co-authored 41 publications receiving 1081 citations. Previous affiliations of Sai-Wang Tam include University of California, Los Angeles & University of California.
Topics: Network on a chip, CMOS, Signal, Transmitter, Baseband

Papers
More filters
Proceedings ArticleDOI
24 Oct 2008
TL;DR: This paper explores the use of multi-band radio frequency interconnect (or RF-I) with signal propagation at the speed of light to provide shortcuts in a many core network-on-chip (NoC) mesh topology, and investigates the costs associated with this technology, and examines the latency and bandwidth benefits that it can provide.
Abstract: In this paper, we explore the use of multi-band radio frequency interconnect (or RF-I) with signal propagation at the speed of light to provide shortcuts in a many core network-on-chip (NoC) mesh topology. We investigate the costs associated with this technology, and examine the latency and bandwidth benefits that it can provide. Assuming a 400 mm2 die, we demonstrate that in exchange for 0.13% of area overhead on the active layer, RF-I can provide an average 13% (max 18%) boost in application performance, corresponding to an average 22% (max 24%) reduction in packet latency. We observe that RF access points may become traffic bottlenecks when many packets try to use the RF at once, and conclude by proposing strategies that adapt RF-I utilization at runtime to actively combat this congestion.

276 citations

Proceedings ArticleDOI
20 Sep 2009
TL;DR: This paper proposes a recursive wireless interconnect structure called the WCube that features a single transmit antenna and multiple receive antennas at each micro wireless router and offers scalable performance in terms of latency and connectivity.
Abstract: This paper describes an unconventional way to apply wireless networking in emerging technologies. It makes the case for using a two-tier hybrid wireless/wired architecture to interconnect hundreds to thousands of cores in chip multiprocessors (CMPs), where current interconnect technologies face severe scaling limitations in excessive latency, long wiring, and complex layout. We propose a recursive wireless interconnect structure called the WCube that features a single transmit antenna and multiple receive antennas at each micro wireless router and offers scalable performance in terms of latency and connectivity. We show the feasibility to build miniature on-chip antennas, and simple transmitters and receivers that operate at 100-500 GHz sub-terahertz frequency bands. We also devise new two-tier wormhole based routing algorithms that are deadlock free and ensure a minimum-latency route on a 1000-core on-chip interconnect network. Our simulations show that our protocol suite can reduce the observed latency by 20% to 45%, and consumes power that is comparable to or less than current 2-D wired mesh designs.

220 citations

Proceedings ArticleDOI
08 Nov 2008
TL;DR: A novel interconnect design exploiting dynamic RF-I bandwidth allocation to realize a reconfigurable network-on-chip architecture is proposed, and it is found that the adaptiveRF-I architecture on top of a mesh with 4B links can even outperform the baseline with 16B mesh links by about 1%, and reduces NoC power by approximately 65% including the overhead incurred for supporting RF- I.
Abstract: As chip multiprocessors scale to a greater number of processing cores, on-chip interconnection networks will experience dramatic increases in both bandwidth demand and power dissipation. Fortunately, promising gains can be realized via integration of radio frequency interconnect (RF-I) through on-chip transmission lines with traditional interconnects implemented with RC wires. While prior work has considered the latency advantage of RF-I, we demonstrate three further advantages of RF-I: (1) RF-I bandwidth can be flexibly allocated to provide an adaptive NoC, (2) RF-I can enable a dramatic power and area reduction by simplification of NoC topology, and (3) RF-I provides natural and efficient support for multicast. In this paper, we propose a novel interconnect design, exploiting dynamic RF-I bandwidth allocation to realize a reconfigurable network-on-chip architecture. We find that our adaptive RF-I architecture on top of a mesh with 4B links can even outperform the baseline with 16B mesh links by about 1%, and reduces NoC power by approximately 65% including the overhead incurred for supporting RF-I.

70 citations

Proceedings ArticleDOI
13 Apr 2008
TL;DR: A new way of implementing on-chip global interconnect that would meet stringent challenges of core-to-core communications in latency, data rate, and re-configurability for future chip-microprocessors (CMP) with efficient area and energy overheads is proposed.
Abstract: In this paper, we propose a new way of implementing on-chip global interconnect that would meet stringent challenges of core-to-core communications in latency, data rate, and re-configurability for future chip-microprocessors (CMP) with efficient area and energy overheads. We discuss the limitation of traditional RC-limited interconnects and possible benefits of multi-band RF-interconnect (RF-I) through on-chip differential transmission lines. The physical implementation of RF-I and its projected performance versus overhead as the function of CMOS technology scaling are discussed as well

67 citations

Proceedings ArticleDOI
15 Jun 2008
TL;DR: In this paper, a digital control of the effective dielectric constant of a differential mode transmission line up to 60GHz in standard CMOS technology was shown. But the authors only used MOS switches to dynamically control the phase.
Abstract: Digital control of the effective dielectric constant of a differential mode transmission line is shown up to 60GHz in standard CMOS technology. The effective dielectric constant is shown to increase from 5 to over 50 for the fixed artificial dielectric case. The digital controlled artificial dielectric transmission line (DiCAD) uses MOS switches to dynamically control the phase. DiCAD achieves 50% of the physically available tuning range with effective dielectric constants varying between 7 and 28. Measured results favorably agree with full-wave electromagnetic simulations.

66 citations


Cited by
More filters
Proceedings ArticleDOI
Jung-Il Choi1, Mayank Jain1, Kannan Srinivasan1, Phil Levis1, Sachin Katti1 
20 Sep 2010
TL;DR: In this paper, a single channel full-duplex wireless transceiver is proposed, which uses a combination of RF and baseband techniques to achieve FD with minimal effect on link reliability.
Abstract: This paper discusses the design of a single channel full-duplex wireless transceiver. The design uses a combination of RF and baseband techniques to achieve full-duplexing with minimal effect on link reliability. Experiments on real nodes show the full-duplex prototype achieves median performance that is within 8% of an ideal full-duplexing system. This paper presents Antenna Cancellation, a novel technique for self-interference cancellation. In conjunction with existing RF interference cancellation and digital baseband interference cancellation, antenna cancellation achieves the amount of self-interference cancellation required for full-duplex operation. The paper also discusses potential MAC and network gains with full-duplexing. It suggests ways in which a full-duplex system can solve some important problems with existing wireless systems including hidden terminals, loss of throughput due to congestion, and large end-to-end delays.

1,623 citations

Journal ArticleDOI
TL;DR: This paper provides a general description of NoC architectures and applications and enumerates several related research problems organized under five main categories: Application characterization, communication paradigm, communication infrastructure, analysis, and solution evaluation.
Abstract: To alleviate the complex communication problems that arise as the number of on-chip components increases, network-on-chip (NoC) architectures have been recently proposed to replace global interconnects. In this paper, we first provide a general description of NoC architectures and applications. Then, we enumerate several related research problems organized under five main categories: Application characterization, communication paradigm, communication infrastructure, analysis, and solution evaluation. Motivation, problem description, proposed approaches, and open issues are discussed for each problem from system, microarchitecture, and circuit perspectives. Finally, we address the interactions among these research problems and put the NoC design process into perspective.

733 citations

Proceedings ArticleDOI
26 Apr 2009
TL;DR: In this article, a detailed cycle-accurate interconnection network model (GARNET) is proposed to simulate a CMP architecture with virtual channel (VC) flow control.
Abstract: Until very recently, microprocessor designs were computation-centric. On-chip communication was frequently ignored. This was because of fast, single-cycle on-chip communication. The interconnect power was also insignificant compared to the transistor power. With uniprocessor designs providing diminishing returns and the advent of chip multiprocessors (CMPs) in mainstream systems, the on-chip network that connects different processing cores has become a critical part of the design. Transistor miniaturization has led to high global wire delay, and interconnect power comparable to transistor power. CMP design proposals can no longer ignore the interaction between the memory hierarchy and the interconnection network that connects various elements. This necessitates a detailed and accurate interconnection network model within a full-system evaluation framework. Ignoring the interconnect details might lead to inaccurate results when simulating a CMP architecture. It also becomes important to analyze the impact of interconnection network optimization techniques on full system behavior. In this light, we developed a detailed cycle-accurate interconnection network model (GARNET), inside the GEMS full-system simulation framework. GARNET models a classic five-stage pipelined router with virtual channel (VC) flow control. Microarchitectural details, such as flit-level input buffers, routing logic, allocators and the crossbar switch, are modeled. GARNET, along with GEMS, provides a detailed and accurate memory system timing model. To demonstrate the importance and potential impact of GARNET, we evaluate a shared and private L2 CMP with a realistic state-of-the-art interconnection network against the original GEMS simple network. The objective of the evaluation was to figure out which configuration is better for a particular workload. We show that not modeling the interconnect in detail might lead to an incorrect outcome. We also evaluate Express Virtual Channels (EVCs), an on-chip network flow control proposal, in a full-system fashion. We show that in improving on-chip network latency-throughput, EVCs do lead to better overall system runtime, however, the impact varies widely across applications.

719 citations

Journal ArticleDOI
01 Jun 2008
TL;DR: This work believes that in comparison with an electrically-connected many-core alternative that uses the same on-stack interconnect power, Corona can provide 2 to 6 times more performance on many memory intensive workloads, while simultaneously reducing power.
Abstract: We expect that many-core microprocessors will push performance per chip from the 10 gigaflop to the 10 teraflop range in the coming decade. To support this increased performance, memory and inter-core bandwidths will also have to scale by orders of magnitude. Pin limitations, the energy cost of electrical signaling, and the non-scalability of chip-length global wires are significant bandwidth impediments. Recent developments in silicon nanophotonic technology have the potential to meet these off- and on-stack bandwidth requirements at acceptable power levels. Corona is a 3D many-core architecture that uses nanophotonic communication for both inter-core communication and off-stack communication to memory or I/O devices. Its peak floating-point performance is 10 teraflops. Dense wavelength division multiplexed optically connected memory modules provide 10 terabyte per second memory bandwidth. A photonic crossbar fully interconnects its 256 low-power multithreaded cores at 20 terabyte per second bandwidth. We have simulated a 1024 thread Corona system running synthetic benchmarks and scaled versions of the SPLASH-2 benchmark suite. We believe that in comparison with an electrically-connected many-core alternative that uses the same on-stack interconnect power, Corona can provide 2 to 6 times more performance on many memory intensive workloads, while simultaneously reducing power.

688 citations

Proceedings ArticleDOI
Yan Pan1, Prabhat Kumar1, John Kim2, Gokhan Memik1, Yu Zhang1, Alok Choudhary1 
20 Jun 2009
TL;DR: Firefly is a hybrid, hierarchical network architecture that consists of clusters of nodes that are connected using conventional, electrical signaling while the inter-cluster communication is done using nanophotonics - exploiting the benefits of electrical signaling for short, local communication while nanophotinics is used only for global communication to realize an efficient on-chip network.
Abstract: Future many-core processors will require high-performance yet energy-efficient on-chip networks to provide a communication substrate for the increasing number of cores. Recent advances in silicon nanophotonics create new opportunities for on-chip networks. To efficiently exploit the benefits of nanophotonics, we propose Firefly - a hybrid, hierarchical network architecture. Firefly consists of clusters of nodes that are connected using conventional, electrical signaling while the inter-cluster communication is done using nanophotonics - exploiting the benefits of electrical signaling for short, local communication while nanophotonics is used only for global communication to realize an efficient on-chip network. Crossbar architecture is used for inter-cluster communication. However, to avoid global arbitration, the crossbar is partitioned into multiple, logical crossbars and their arbitration is localized. Our evaluations show that Firefly improves the performance by up to 57% compared to an all-electrical concentrated mesh (CMESH) topology on adversarial traffic patterns and up to 54% compared to an all-optical crossbar (OP XBAR) on traffic patterns with locality. If the energy-delay-product is compared, Firefly improves the efficiency of the on-chip network by up to 51% and 38% compared to CMESH and OP XBAR, respectively.

411 citations