Proceedings ArticleDOI

1.1 Computing's energy problem (and what we can do about it)

06 Mar 2014 - pp. 10-14
TL;DR: The drive for performance and the end of voltage scaling have made power, not the number of transistors, the principal factor limiting further improvements in computing performance; continuing to scale performance will require new specialized compute engines and design tools that bring application experts into the process, enabling a new wave of innovative and efficient computing devices.
Abstract: Our challenge is clear: The drive for performance and the end of voltage scaling have made power, and not the number of transistors, the principal factor limiting further improvements in computing performance. Continuing to scale compute performance will require the creation and effective use of new specialized compute engines, and will require the participation of application experts to be successful. If we play our cards right, and develop the tools that allow our customers to become part of the design process, we will create a new wave of innovative and efficient computing devices.
Citations
Journal ArticleDOI
20 Nov 2017
TL;DR: In this paper, the authors provide a comprehensive tutorial and survey of recent advances toward enabling efficient processing of DNNs, discuss the various hardware platforms and architectures that support DNNs, and highlight key trends in reducing the computation cost of DNNs either solely via hardware design changes or via joint hardware design and DNN algorithm changes.
Abstract: Deep neural networks (DNNs) are currently widely used for many artificial intelligence (AI) applications including computer vision, speech recognition, and robotics. While DNNs deliver state-of-the-art accuracy on many AI tasks, it comes at the cost of high computational complexity. Accordingly, techniques that enable efficient processing of DNNs to improve energy efficiency and throughput without sacrificing application accuracy or increasing hardware cost are critical to the wide deployment of DNNs in AI systems. This article aims to provide a comprehensive tutorial and survey about the recent advances toward the goal of enabling efficient processing of DNNs. Specifically, it will provide an overview of DNNs, discuss various hardware platforms and architectures that support DNNs, and highlight key trends in reducing the computation cost of DNNs either solely via hardware design changes or via joint hardware design and DNN algorithm changes. It will also summarize various development resources that enable researchers and practitioners to quickly get started in this field, and highlight important benchmarking metrics and design considerations that should be used for evaluating the rapidly growing number of DNN hardware designs, optionally including algorithmic codesigns, being proposed in academia and industry. The reader will take away the following concepts from this article: understand the key design considerations for DNNs; be able to evaluate different DNN hardware implementations with benchmarks and comparison metrics; understand the tradeoffs between various hardware architectures and platforms; be able to evaluate the utility of various DNN design techniques for efficient processing; and understand recent implementation trends and opportunities.

2,391 citations

Posted Content
TL;DR: A binary matrix multiplication GPU kernel is presented with which the MNIST BNN runs 7 times faster than with an unoptimized GPU kernel, without any loss in classification accuracy.
Abstract: We introduce a method to train Binarized Neural Networks (BNNs), neural networks with binary weights and activations at run-time. At training time the binary weights and activations are used for computing the parameter gradients. During the forward pass, BNNs drastically reduce memory size and accesses, and replace most arithmetic operations with bit-wise operations, which is expected to substantially improve power efficiency. To validate the effectiveness of BNNs, we conduct two sets of experiments on the Torch7 and Theano frameworks. On both, BNNs achieved nearly state-of-the-art results over the MNIST, CIFAR-10 and SVHN datasets. Last but not least, we wrote a binary matrix multiplication GPU kernel with which it is possible to run our MNIST BNN 7 times faster than with an unoptimized GPU kernel, without suffering any loss in classification accuracy. The code for training and running our BNNs is available on-line.
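As a rough illustration of the training scheme described above (binary weights in the forward pass, real-valued weights kept only for accumulating gradient updates), the following NumPy sketch applies deterministic sign binarization and a straight-through estimator to a single linear layer; the function names and the clipping threshold are illustrative, not taken from the authors' released code.

```python
import numpy as np

def binarize(x):
    # Deterministic binarization to {-1, +1} via the sign function.
    return np.where(x >= 0, 1.0, -1.0)

def forward(x, w_real):
    # The forward pass uses the binarized weights; the real-valued weights
    # are kept only for accumulating gradient updates.
    return x @ binarize(w_real)

def weight_grad(x, grad_out, w_real, clip=1.0):
    # Straight-through estimator: treat sign() as the identity for the
    # backward pass, but zero the gradient where |w| exceeds the clip range.
    return (x.T @ grad_out) * (np.abs(w_real) <= clip)

# Toy usage: batch of 4, 8 inputs, 3 outputs, one gradient step.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))
w = 0.1 * rng.standard_normal((8, 3))
y = forward(x, w)
w -= 0.01 * weight_grad(x, np.ones_like(y), w)   # update the real-valued weights
```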

2,320 citations

Journal ArticleDOI
TL;DR: Eyeriss is an accelerator for state-of-the-art deep convolutional neural networks (CNNs) that optimizes for the energy efficiency of the entire system, including the accelerator chip and off-chip DRAM, by reconfiguring its architecture for various CNN shapes.
Abstract: Eyeriss is an accelerator for state-of-the-art deep convolutional neural networks (CNNs). It optimizes for the energy efficiency of the entire system, including the accelerator chip and off-chip DRAM, for various CNN shapes by reconfiguring the architecture. CNNs are widely used in modern AI systems but also bring challenges on throughput and energy efficiency to the underlying hardware, because their computation requires a large amount of data, creating significant on-chip and off-chip data movement that is more energy-consuming than the computation itself. Minimizing the energy cost of data movement for any CNN shape is therefore the key to high throughput and energy efficiency. Eyeriss achieves these goals by using a proposed processing dataflow, called row stationary (RS), on a spatial architecture with 168 processing elements. The RS dataflow reconfigures the computation mapping of a given shape, which optimizes energy efficiency by maximally reusing data locally to reduce expensive data movement, such as DRAM accesses. Compression and data gating are also applied to further improve energy efficiency. Eyeriss processes the convolutional layers at 35 frames/s and 0.0029 DRAM accesses per multiply-and-accumulate (MAC) for AlexNet at 278 mW (batch size N = 4), and at 0.7 frames/s and 0.0035 DRAM accesses/MAC for VGG-16 at 236 mW (N = 3).
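The claim that data movement, not arithmetic, dominates energy rests on how much reuse a convolutional layer offers. The sketch below only counts MACs and unique operands for a stride-1, unpadded layer to show the reuse that a dataflow such as row stationary can exploit; the layer dimensions are made up, and this is not Eyeriss's actual mapping.

```python
def conv_layer_reuse(H, W, R, S, C, M):
    """Count MACs and unique operands for one stride-1, unpadded conv layer.
    Each filter weight is used at every output position and each input value
    feeds many output positions, which is the reuse a dataflow can exploit
    on-chip instead of refetching data from DRAM."""
    E, F = H - R + 1, W - S + 1          # output height and width
    macs = E * F * M * R * S * C         # multiply-accumulates
    weights = M * R * S * C              # unique filter weights
    inputs = H * W * C                   # unique input activations
    return macs, macs / weights, macs / inputs

# Hypothetical 3x3 layer, 64 input and 128 output channels, 56x56 input.
macs, weight_reuse, input_reuse = conv_layer_reuse(56, 56, 3, 3, 64, 128)
print(f"{macs:,} MACs; each weight used ~{weight_reuse:.0f} times, "
      f"each input used ~{input_reuse:.0f} times")
```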

2,165 citations

Posted Content
TL;DR: A binary matrix multiplication GPU kernel is presented with which the MNIST QNN runs 7 times faster than with an unoptimized GPU kernel, without any loss in classification accuracy.
Abstract: We introduce a method to train Quantized Neural Networks (QNNs), neural networks with extremely low-precision (e.g., 1-bit) weights and activations at run-time. At train time the quantized weights and activations are used for computing the parameter gradients. During the forward pass, QNNs drastically reduce memory size and accesses, and replace most arithmetic operations with bit-wise operations. As a result, power consumption is expected to be drastically reduced. We trained QNNs over the MNIST, CIFAR-10, SVHN and ImageNet datasets. The resulting QNNs achieve prediction accuracy comparable to their 32-bit counterparts. For example, our quantized version of AlexNet with 1-bit weights and 2-bit activations achieves 51% top-1 accuracy. Moreover, we quantize the parameter gradients to 6 bits as well, which enables gradient computation using only bit-wise operations. Quantized recurrent neural networks were tested over the Penn Treebank dataset, and achieved accuracy comparable to their 32-bit counterparts using only 4 bits. Last but not least, we programmed a binary matrix multiplication GPU kernel with which it is possible to run our MNIST QNN 7 times faster than with an unoptimized GPU kernel, without suffering any loss in classification accuracy. The QNN code is available online.
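A minimal sketch of uniform quantization to a given bit width, in the spirit of the low-precision activations described above; the exact quantization rule and value ranges used in the paper may differ, so treat this as illustrative only.

```python
import numpy as np

def quantize(x, bits):
    # Uniform quantization of values in [0, 1] onto 2**bits evenly spaced
    # levels; bits=1 gives two levels, bits=2 gives four.
    levels = 2 ** bits - 1
    return np.round(np.clip(x, 0.0, 1.0) * levels) / levels

print(quantize(np.linspace(0.0, 1.0, 9), 2))   # 2-bit activations -> 4 levels
```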

1,232 citations


Cites background from "1.1 Computing's energy problem (and..."

  • ...Horowitz (2014) provides rough numbers for the energy consumed by the computation (the given numbers are for 45nm technology), as summarized in Tables 5 and 6....

  • ...Over the last decade, power has been the main constraint on performance (Horowitz, 2014)....

Journal ArticleDOI
01 Jun 2018
TL;DR: This Review Article examines the development of in-memory computing using resistive switching devices, where the two-terminal structure of the devices, their resistive switching properties, and direct data processing in the memory can enable area- and energy-efficient computation.
Abstract: Modern computers are based on the von Neumann architecture in which computation and storage are physically separated: data are fetched from the memory unit, shuttled to the processing unit (where computation takes place) and then shuttled back to the memory unit to be stored. The rate at which data can be transferred between the processing unit and the memory unit represents a fundamental limitation of modern computers, known as the memory wall. In-memory computing is an approach that attempts to address this issue by designing systems that compute within the memory, thus eliminating the energy-intensive and time-consuming data movement that plagues current designs. Here we review the development of in-memory computing using resistive switching devices, where the two-terminal structure of the devices, their resistive switching properties, and direct data processing in the memory can enable area- and energy-efficient computation. We examine the different digital, analogue, and stochastic computing schemes that have been proposed, and explore the microscopic physical mechanisms involved. Finally, we discuss the challenges in-memory computing faces, including the required scaling characteristics, in delivering next-generation computing. This Review Article examines the development of in-memory computing using resistive switching devices.
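The core idea of analogue in-memory computing with resistive devices can be stated in a few lines: program each conductance to a matrix entry, drive the columns with voltages, and read the row currents. The idealized sketch below assumes perfectly linear devices and ignores wire resistance, noise, and other non-idealities; the conductance and voltage ranges are assumptions, not values from the article.

```python
import numpy as np

# Idealized resistive crossbar: each cross-point stores a conductance G[i, j].
# Driving column j with voltage V[j] and summing the current into row i gives
# I = G @ V in a single step: Ohm's law per device, Kirchhoff's current law
# per row. This is the analogue matrix-vector multiply that in-memory
# computing exploits.
rng = np.random.default_rng(1)
G = rng.uniform(1e-6, 1e-4, size=(4, 8))   # conductances in siemens (assumed range)
V = rng.uniform(0.0, 0.2, size=8)          # read voltages in volts (assumed range)
I = G @ V                                  # row currents in amperes
print(I)
```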

1,193 citations

References
Journal ArticleDOI
TL;DR: This paper considers the design, fabrication, and characterization of very small MOSFET switching devices suitable for digital integrated circuits, using dimensions of the order of 1 μm.
Abstract: This paper considers the design, fabrication, and characterization of very small MOSFET switching devices suitable for digital integrated circuits, using dimensions of the order of 1 μm. Scaling relationships are presented which show how a conventional MOSFET can be reduced in size. An improved small device structure is presented that uses ion implantation to provide shallow source and drain regions and a nonuniform substrate doping profile. One-dimensional models are used to predict the substrate doping profile and the corresponding threshold voltage versus source voltage characteristic. A two-dimensional current transport model is used to predict the relative degree of short-channel effects for different device parameter combinations. Polysilicon-gate MOSFETs with channel lengths as short as 0.5 μm were fabricated, and the device characteristics measured and compared with predicted values. The performance improvement expected from using these very small devices in highly miniaturized integrated circuits is projected.
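The scaling relationships referred to above are the constant-field (Dennard) rules; the small sketch below lists the textbook first-order scaling factors. These are the standard relations, not values taken from the paper itself.

```python
def constant_field_scaling(kappa):
    """First-order constant-field (Dennard) scaling by a factor kappa > 1:
    dimensions and voltage shrink by 1/kappa, gate delay improves by about
    1/kappa, power per circuit drops by about 1/kappa**2, and power density
    stays constant."""
    return {
        "length / width / oxide thickness": 1 / kappa,
        "supply voltage": 1 / kappa,
        "gate delay": 1 / kappa,
        "power per circuit": 1 / kappa ** 2,
        "circuit density": kappa ** 2,
        "power density": 1.0,
    }

for quantity, factor in constant_field_scaling(2.0).items():
    print(f"{quantity}: x{factor:g}")
```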

3,008 citations

Book
30 Jun 1995
TL;DR: This book surveys low-power digital design, from the hierarchy of limits of power (J.D. Meindl) and the sources of power consumption to voltage scaling approaches, adiabatic switching, DC power supply design (coauthored with A.J. Stratakos et al.), and low-power programmable computation (coauthored with M.B. Srivastava).
Abstract: 1. Introduction. 2. Hierarchy of Limits of Power J.D. Meindl. 3. Sources of Power Consumption. 4. Voltage Scaling Approaches. 5. DC Power Supply Design in Portable Systems coauthored with A.J. Stratakos, et al. 6. Adiabatic Switching L. Svensson. 7. Minimizing Switched Capacitance. 8. Computer Aided Design Tools. 9. A Portable Multimedia Terminal. 10. Low Power Programmable Computation coauthored with M.B. Srivastava. 11. Conclusions. Subject Index.
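The voltage-scaling chapters listed above rest on the first-order switching-power model P = alpha * C * Vdd^2 * f; the short sketch below, with made-up example values, shows the quadratic benefit of lowering the supply voltage.

```python
def dynamic_power(alpha, c_load, vdd, freq):
    """Switching power of CMOS logic, P = alpha * C * Vdd**2 * f: the
    first-order model behind voltage-scaling approaches. The example
    values below are made up."""
    return alpha * c_load * vdd ** 2 * freq

# Halving Vdd cuts switching power 4x at the same frequency and capacitance.
print(dynamic_power(alpha=0.1, c_load=1e-9, vdd=1.0, freq=1e9))   # 0.1 W
print(dynamic_power(alpha=0.1, c_load=1e-9, vdd=0.5, freq=1e9))   # 0.025 W
```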

1,024 citations


"1.1 Computing's energy problem (and..." refers background in this paper

  • ...had been known for decades [7] that if the application was parallel, a parallel...

01 Jan 1974

856 citations


"1.1 Computing's energy problem (and..." refers background in this paper

  • ...Section 2 quickly reviews how computing became power limited, even before Dennard constant-field scaling [3] broke down, and explains the difficulties of using a technology change to fix our problems....

Journal ArticleDOI
09 Jun 2012
TL;DR: This work architects server memory systems using mobile DRAM devices, trading peak bandwidth for lower energy consumption per bit and more efficient idle modes, and demonstrates 3-5× lower memory power, better proportionality, and negligible performance penalties for data-center workloads.
Abstract: To increase datacenter energy efficiency, we need memory systems that keep pace with processor efficiency gains. Currently, servers use DDR3 memory, which is designed for high bandwidth but not for energy proportionality. A system using 20% of the peak DDR3 bandwidth consumes 2.3x the energy per bit compared to the energy consumed by a system with fully utilized memory bandwidth. Nevertheless, many datacenter applications stress memory capacity and latency but not memory bandwidth. In response, we architect server memory systems using mobile DRAM devices, trading peak bandwidth for lower energy consumption per bit and more efficient idle modes. We demonstrate 3-5x lower memory power, better proportionality, and negligible performance penalties for datacenter workloads.
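The non-proportionality described above comes from the fixed background power that a DRAM system pays regardless of traffic; the sketch below models energy per bit as a function of bandwidth utilization. The power and bandwidth figures are illustrative assumptions, not values from the paper.

```python
def energy_per_bit(static_power_w, active_power_w, peak_bw_bits, utilization):
    # Energy per transferred bit when a fixed static power is paid no matter
    # how little bandwidth is used; at low utilization the static term
    # dominates, which is the effect behind the 2.3x figure in the abstract.
    throughput = peak_bw_bits * utilization
    return (static_power_w + active_power_w * utilization) / throughput

peak_bw_bits = 8 * 12.8e9                  # one 12.8 GB/s channel, in bits/s (assumed)
for util in (1.0, 0.5, 0.2):
    e_pj = energy_per_bit(1.5, 4.0, peak_bw_bits, util) * 1e12
    print(f"{util:.0%} utilization: {e_pj:.1f} pJ/bit")
```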

249 citations


"1.1 Computing's energy problem (and..." refers background in this paper

  • ...Given that the energy/operation is now critical, it is worth revisiting many of the cache design strategies with a view to minimizing the average memory access energy (AMAE) [8]....

  • ...AMAE minimization must also take into account the DRAM energy of the system....

  • ...Part of this high cost comes from the very energy-inefficient I/O that DRAM systems use [8], which not only takes over 20pJ/bit, but also require static power to keep the I/O active....

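The excerpts above motivate minimizing average memory access energy (AMAE) rather than only average memory access time; a minimal sketch of the analogous recurrence follows. The per-level energies and miss rates are placeholders, not figures from the paper.

```python
def amae(hit_energy_pj, miss_rate, next_level_energy_pj):
    # Average memory access energy for one level of the hierarchy, by analogy
    # with the classic AMAT recurrence:
    #   AMAE = E_hit + miss_rate * AMAE(next level)
    return hit_energy_pj + miss_rate * next_level_energy_pj

dram = 640.0                                                              # pJ per access (placeholder)
l2 = amae(hit_energy_pj=20.0, miss_rate=0.2, next_level_energy_pj=dram)  # 148 pJ
l1 = amae(hit_energy_pj=2.0, miss_rate=0.1, next_level_energy_pj=l2)     # 16.8 pJ
print(f"L1-rooted AMAE: {l1:.1f} pJ (L2-rooted: {l2:.1f} pJ)")
```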