Home
/
Authors
/
Vijaykrishnan Narayanan

Author

Vijaykrishnan Narayanan

Other affiliations: Foundation University, Islamabad, Arizona State University

Bio: Vijaykrishnan Narayanan is an academic researcher from Pennsylvania State University. The author has contributed to research in topics: Speedup & Neuromorphic engineering. The author has an hindex of 27, co-authored 97 publications receiving 4157 citations. Previous affiliations of Vijaykrishnan Narayanan include Foundation University, Islamabad & Arizona State University.

Papers published on a yearly basis

2021
2020
2019
2018
2017
2015
2014
2013
2012
2011
2010
2009
2008
2007
2006
2003

Papers

PDF

Open Access

More filters

Journal Article•DOI•

Leakage current: Moore's law meets static power

[...]

Nam Sung Kim¹, Todd Austin¹, D. Baauw¹, Trevor Mudge¹, Krisztian Flautner, Jie Hu², Mary Jane Irwin², Mahmut Kandemir², Vijaykrishnan Narayanan² - Show less +5 more•Institutions (2)

University of Michigan¹, Pennsylvania State University²

01 Dec 2003-IEEE Computer

TL;DR: The other source of power dissipation in microprocessors, dynamic power, arises from the repeated capacitance charge and discharge on the output of the hundreds of millions of gates in today's chips.

...read moreread less

Abstract: Off-state leakage is static power, current that leaks through transistors even when they are turned off. The other source of power dissipation in today's microprocessors, dynamic power, arises from the repeated capacitance charge and discharge on the output of the hundreds of millions of gates in today's chips. Until recently, only dynamic power has been a significant source of power consumption, and Moore's law helped control it. However, power consumption has now become a primary microprocessor design constraint; one that researchers in both industry and academia will struggle to overcome in the next few years. Microprocessor design has traditionally focused on dynamic power consumption as a limiting factor in system integration. As feature sizes shrink below 0.1 micron, static power is posing new low-power design challenges.

...read moreread less

1,233 citations

Journal Article•DOI•

Design and Management of 3D Chip Multiprocessors Using Network-in-Memory

[...]

Feihui Li¹, Chrysostomos Nicopoulos¹, Thomas E. Richardson¹, Yuan Xie¹, Vijaykrishnan Narayanan¹, Mahmut Kandemir¹ - Show less +2 more•Institutions (1)

Pennsylvania State University¹

01 May 2006

TL;DR: A router architecture and a topology design that makes use of a network architecture embedded into the L2 cache memory are proposed that demonstrate that a 3D L2 memory architecture generates much better results than the conventional two-dimensional designs under different number of layers and vertical connections.

...read moreread less

Abstract: Long interconnects are becoming an increasingly important problem from both power and performance perspectives. This motivates designers to adopt on-chip network-based communication infrastructures and three-dimensional (3D) designs where multiple device layers are stacked together. Considering the current trends towards increasing use of chip multiprocessing, it is timely to consider 3D chip multiprocessor design and memory networking issues, especially in the context of data management in large L2 caches. The overall goal of this paper is to study the challenges for L2 design and management in 3D chip multiprocessors. Our first contribution is to propose a router architecture and a topology design that makes use of a network architecture embedded into the L2 cache memory. Our second contribution is to demonstrate, through extensive experiments, that a 3D L2 memory architecture generates much better results than the conventional two-dimensional (2D) designs under different number of layers and vertical (inter-wafer) connections. In particular, our experiments show that a 3D architecture with no dynamic data migration generates better performance than a 2D architecture that employs data migration. This also helps reduce power consumption in L2 due to a reduced number of data movements.

...read moreread less

397 citations

Proceedings Article•DOI•

A novel dimensionally-decomposed router for on-chip communication in 3D architectures

[...]

Jongman Kim¹, Chrysostomos Nicopoulos¹, Dongkook Park¹, Reetuparna Das¹, Yuan Xie¹, Vijaykrishnan Narayanan¹, Mazin Yousif², Chita R. Das¹ - Show less +4 more•Institutions (2)

Pennsylvania State University¹, Intel²

09 Jun 2007

TL;DR: A novel partially-connected 3D crossbar structure, called the 3D Dimensionally-Decomposed (DimDe) Router, is proposed, which provides a good tradeoff between circuit complexity and performance benefits.

...read moreread less

Abstract: Much like multi-storey buildings in densely packed metropolises, three-dimensional (3D) chip structures are envisioned as a viable solution to skyrocketing transistor densities and burgeoning die sizes in multi-core architectures. Partitioning a larger die into smaller segments and then stacking them in a 3D fashion can significantly reduce latency and energy consumption. Such benefits emanate from the notion that inter-wafer distances are negligible compared to intra-wafer distances. This attribute substantially reduces global wiring length in 3D chips. The work in this paper integrates the increasingly popular idea of packet-based Networks-on-Chip (NoC) into a 3D setting. While NoCs have been studied extensively in the 2D realm, the microarchitectural ramifications of moving into the third dimension have yet to be fully explored. This paper presents a detailed exploration of inter-strata communication architectures in 3D NoCs. Three design options are investigated; a simple bus-based inter-wafer connection, a hop-by-hop standard 3D design, and a full 3D crossbar implementation. In this context, we propose a novel partially-connected 3D crossbar structure, called the 3D Dimensionally-Decomposed (DimDe) Router, which provides a good tradeoff between circuit complexity and performance benefits. Simulation results using (a) a stand-alone cycle-accurate 3D NoC simulator running synthetic workloads, and (b) a hybrid 3D NoC/cache simulation environment running real commercial and scientific benchmarks, indicate that the proposed DimDe design provides latency and throughput improvements of over 20% on average over the other 3D architectures, while remaining within 5% of the full 3D crossbar performance. Furthermore, based on synthesized hardware implementations in 90 nm technology, the DimDe architecture outperforms all other designs -- including the full 3D crossbar -- by an average of 26% in terms of the Energy-Delay Product (EDP).

...read moreread less

285 citations

Proceedings Article•DOI•

Cache revive: architecting volatile STT-RAM caches for enhanced performance in CMPs

[...]

Adwait Jog¹, Asit K. Mishra², Cong Xu¹, Yuan Xie¹, Vijaykrishnan Narayanan¹, Ravishankar Iyer², Chita R. Das¹ - Show less +3 more•Institutions (2)

Pennsylvania State University¹, Intel²

03 Jun 2012

TL;DR: This work forms the relationship between retention-time and write-latency, and finds optimal retention- time for architecting an efficient cache hierarchy using STT-RAM to overcome high write latency and energy problems.

...read moreread less

Abstract: High density, low leakage and non-volatility are the attractive features of Spin-Transfer-Torque-RAM (STT-RAM), which has made it a strong competitor against SRAM as a universal memory replacement in multi-core systems. However, STT-RAM suffers from high write latency and energy which has impeded its widespread adoption. To this end, we look at trading-off STT-RAM's non-volatility property (data-retention-time) to overcome these problems. We formulate the relationship between retention-time and write-latency, and find optimal retention-time for architecting an efficient cache hierarchy using STT-RAM. Our results show that, compared to SRAM-based design, our proposal can improve performance and energy consumption by 18% and 60%, respectively.

...read moreread less

261 citations

Journal Article•DOI•

A Gracefully Degrading and Energy-Efficient Modular Router Architecture for On-Chip Networks

[...]

Jongman Kim¹, Chrysostomos Nicopoulos¹, Dongkook Park¹, Vijaykrishnan Narayanan¹, Mazin Yousif², Chita R. Das¹ - Show less +2 more•Institutions (2)

Pennsylvania State University¹, Intel²

01 May 2006

TL;DR: This work proposes a novel fine-grained modular router architecture that employs decoupled parallel arbiters and uses smaller crossbars for row and column connections to reduce output port contention probabilities as compared to existing designs.

...read moreread less

Abstract: Packet-based on-chip networks are increasingly being adopted in complex System-on-Chip (SoC) designs supporting numerous homogeneous and heterogeneous functional blocks. These Network-on-Chip (NoC) architectures are required to not only provide ultra-low latency, but also occupy a small footprint and consume as little energy as possible. Further, reliability is rapidly becoming a major challenge in deep sub-micron technologies due to the increased prominence of permanent faults resulting from accelerated aging effects and manufacturing/testing challenges. Towards the goal of designing low-latency, energy efficient and reliable on-chip communication networks, we propose a novel fine-grained modular router architecture. The proposed architecture employs decoupled parallel arbiters and uses smaller crossbars for row and column connections to reduce output port contention probabilities as compared to existing designs. Furthermore, the router employs a new switch allocation technique known as "Mirroring Effect" to reduce arbitration depth and increase concurrency. In addition, the modular design permits graceful degradation of the network in the event of permanent faults and also helps to reduce the dynamic power consumption. Our simulation results indicate that in an 8 × 8 mesh network, the proposed architecture reduces packet latency by 4-40% and power consumption by 6-20% as compared to two existing router architectures. Evaluation using a combined performance, energy and fault-tolerance metric indicates that the proposed architecture provides 35-50% overall improvement compared to the two earlier routers.

...read moreread less

215 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20

Collapse

Cited by

PDF

Open Access

More filters

Journal Article•DOI•

Tunnel field-effect transistors as energy-efficient electronic switches

[...]

Adrian M. Ionescu¹, Heike Riel²•Institutions (2)

École Polytechnique¹, IBM²

17 Nov 2011-Nature

TL;DR: Tunnels based on ultrathin semiconducting films or nanowires could achieve a 100-fold power reduction over complementary metal–oxide–semiconductor transistors, so integrating tunnel FETs with CMOS technology could improve low-power integrated circuits.

...read moreread less

Abstract: Power dissipation is a fundamental problem for nanoelectronic circuits. Scaling the supply voltage reduces the energy needed for switching, but the field-effect transistors (FETs) in today's integrated circuits require at least 60 mV of gate voltage to increase the current by one order of magnitude at room temperature. Tunnel FETs avoid this limit by using quantum-mechanical band-to-band tunnelling, rather than thermal injection, to inject charge carriers into the device channel. Tunnel FETs based on ultrathin semiconducting films or nanowires could achieve a 100-fold power reduction over complementary metal-oxide-semiconductor (CMOS) transistors, so integrating tunnel FETs with CMOS technology could improve low-power integrated circuits.

...read moreread less

2,390 citations

Proceedings Article•DOI•

DianNao: a small-footprint high-throughput accelerator for ubiquitous machine-learning

[...]

Tianshi Chen¹, Zidong Du¹, Ninghui Sun¹, Jia Wang¹, Chengyong Wu¹, Yunji Chen¹, Olivier Temam² - Show less +3 more•Institutions (2)

Chinese Academy of Sciences¹, French Institute for Research in Computer Science and Automation²

24 Feb 2014

TL;DR: This study designs an accelerator for large-scale CNNs and DNNs, with a special emphasis on the impact of memory on accelerator design, performance and energy, and shows that it is possible to design an accelerator with a high throughput, capable of performing 452 GOP/s in a small footprint.

...read moreread less

Abstract: Machine-Learning tasks are becoming pervasive in a broad range of domains, and in a broad range of systems (from embedded systems to data centers). At the same time, a small set of machine-learning algorithms (especially Convolutional and Deep Neural Networks, i.e., CNNs and DNNs) are proving to be state-of-the-art across many applications. As architectures evolve towards heterogeneous multi-cores composed of a mix of cores and accelerators, a machine-learning accelerator can achieve the rare combination of efficiency (due to the small number of target algorithms) and broad application scope. Until now, most machine-learning accelerator designs have focused on efficiently implementing the computational part of the algorithms. However, recent state-of-the-art CNNs and DNNs are characterized by their large size. In this study, we design an accelerator for large-scale CNNs and DNNs, with a special emphasis on the impact of memory on accelerator design, performance and energy. We show that it is possible to design an accelerator with a high throughput, capable of performing 452 GOP/s (key NN operations such as synaptic weight multiplications and neurons outputs additions) in a small footprint of 3.02 mm2 and 485 mW; compared to a 128-bit 2GHz SIMD processor, the accelerator is 117.87x faster, and it can reduce the total energy by 21.08x. The accelerator characteristics are obtained after layout at 65 nm. Such a high throughput in a small footprint can open up the usage of state-of-the-art machine-learning algorithms in a broad set of systems and for a broad set of applications.

...read moreread less

1,582 citations

Graph-Based Algorithms for Boolean Function Manipulation

[...]

Sofia Cassel

01 Jan 2012

946 citations

Proceedings Article•DOI•

Optimizing NUCA Organizations and Wiring Alternatives for Large Caches with CACTI 6.0

[...]

Naveen Muralimanohar¹, Rajeev Balasubramonian¹, Norm Jouppi²•Institutions (2)

University of Utah¹, Hewlett-Packard²

01 Dec 2007

TL;DR: This work implements two major extensions to the CACTI cache modeling tool that focus on interconnect design for a large cache, and adopts state-of-the-art design space exploration strategies for non-uniform cache access (NUCA).

...read moreread less

Abstract: A significant part of future microprocessor real estate will be dedicated to L2 or L3 caches. These on-chip caches will heavily impact processor perfor- mance, power dissipation, and thermal management strategies. There are a number of interconnect design considerations that influence power/performance/area characteristics of large caches, such as wire mod- els (width/spacing/repeaters), signaling strategy (RC/differential/transmission), router design, etc. Yet, to date, there exists no analytical tool that takes all of these parameters into account to carry out a design space exploration for large caches and estimate an optimal organization. In this work, we implement two major extensions to the CACTI cache modeling tool that focus on interconnect design for a large cache. First, we add the ability to model different types of wires, such as RC-based wires with different power/delay characteristics and differential low-swing buses. Second, we add the ability to model Non-uniform Cache Access (NUCA). We not only adopt state-of-the-art design space exploration strategies for NUCA, we also enhance this exploration by considering on-chip network contention and a wider spectrum of wiring and routing choices. We present a validation analysis of the new tool (to be released as CACTI 6.0) and present a case study to showcase how the tool can improve architecture research methodologies. Keywords: cache models, non-uniform cache archi- tectures (NUCA), memory hierarchies, on-chip intercon- nects.

...read moreread less

778 citations

Journal Article•DOI•

A subthermionic tunnel field-effect transistor with an atomically thin channel.

[...]

Deblina Sarkar¹, Xuejun Xie¹, Wei Liu¹, Wei Cao¹, Jiahao Kang¹, Yongji Gong², Stephan Kraemer¹, Pulickel M. Ajayan², Kaustav Banerjee¹ - Show less +5 more•Institutions (2)

University of California, Santa Barbara¹, Rice University²

01 Oct 2015-Nature

TL;DR: This paper demonstrates band-to-band tunnel field-effect transistors (tunnel-FETs), based on a two-dimensional semiconductor, that exhibit steep turn-on and is the only planar architecture tunnel-fET to achieve subthermionic subthreshold swing over four decades of drain current, and is also the only tunnel- FET (in any architecture) to achieve this at a low power-supply voltage of 0.1 volts.

...read moreread less

Abstract: A new type of device, the band-to-band tunnel transistor, which has atomically thin molybdenum disulfide as the active channel, operates in a fundamentally different way from a conventional silicon (MOSFET) transistor; it has turn-on characteristics and low-power operation that are better than those of state-of-the-art MOSFETs or any tunnelling transistor reported so far. Traditional transistor technology is fast approaching its fundamental limits, and two-dimensional semiconducting materials such as molybdenum disulfide (MoS2) are seen as possible replacements for silicon in a next generation of high-density, lower-power chip electronics. A particularly promising prospect is their potential in band-to-band tunnel transistors, which operate in a fundamentally different way from conventional silicon (MOSFET) transistors. So far, few such devices with overall characteristics better than silicon transistors have been demonstrated. Now Kaustav Banerjee et al. have built a tunnel transistor by making a vertical structure with atomically thin MoS2 as the active channel and germanium as the source electrode. It has turn-on characteristics and low-power operation that are better than those of existing silicon transistors, and the results will be of interest in a range of electronic applications including low-power integrated circuits, as well as ultra-sensitive bio sensors or gas sensors. The fast growth of information technology has been sustained by continuous scaling down of the silicon-based metal–oxide field-effect transistor. However, such technology faces two major challenges to further scaling. First, the device electrostatics (the ability of the transistor’s gate electrode to control its channel potential) are degraded when the channel length is decreased, using conventional bulk materials such as silicon as the channel. Recently, two-dimensional semiconducting materials1,2,3,4,5,6,7 have emerged as promising candidates to replace silicon, as they can maintain excellent device electrostatics even at much reduced channel lengths. The second, more severe, challenge is that the supply voltage can no longer be scaled down by the same factor as the transistor dimensions because of the fundamental thermionic limitation of the steepness of turn-on characteristics, or subthreshold swing8,9. To enable scaling to continue without a power penalty, a different transistor mechanism is required to obtain subthermionic subthreshold swing, such as band-to-band tunnelling10,11,12,13,14,15,16. Here we demonstrate band-to-band tunnel field-effect transistors (tunnel-FETs), based on a two-dimensional semiconductor, that exhibit steep turn-on; subthreshold swing is a minimum of 3.9 millivolts per decade and an average of 31.1 millivolts per decade for four decades of drain current at room temperature. By using highly doped germanium as the source and atomically thin molybdenum disulfide as the channel, a vertical heterostructure is built with excellent electrostatics, a strain-free heterointerface, a low tunnelling barrier, and a large tunnelling area. Our atomically thin and layered semiconducting-channel tunnel-FET (ATLAS-TFET) is the only planar architecture tunnel-FET to achieve subthermionic subthreshold swing over four decades of drain current, as recommended in ref. 17, and is also the only tunnel-FET (in any architecture) to achieve this at a low power-supply voltage of 0.1 volts. Our device is at present the thinnest-channel subthermionic transistor, and has the potential to open up new avenues for ultra-dense and low-power integrated circuits, as well as for ultra-sensitive biosensors and gas sensors18,19,20,21.

...read moreread less

774 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200

Collapse