Home
/
Authors
/
Adrian Sampson

Author

Adrian Sampson

Other affiliations: Association for Computing Machinery, Microsoft, University of Washington

Bio: Adrian Sampson is an academic researcher from Cornell University. The author has contributed to research in topics: Compiler & Programming paradigm. The author has an hindex of 18, co-authored 52 publications receiving 2964 citations. Previous affiliations of Adrian Sampson include Association for Computing Machinery & Microsoft.

Topics: Compiler, Programming paradigm, Debugging, Computer science, Probabilistic logic ...read more

Papers published on a yearly basis

2023
2022
2021
2020
2018
2017
2015
2014
2013
2012
2011
2010
2008

Papers

PDF

Open Access

More filters

Journal Article•DOI•

EnerJ: approximate data types for safe and general low-power computation

[...]

Adrian Sampson¹, Werner Dietl¹, Emily Fortuna¹, Danushen Gnanapragasam¹, Luis Ceze¹, Dan Grossman¹ - Show less +2 more•Institutions (1)

University of Washington¹

04 Jun 2011

TL;DR: EnerJ is developed, an extension to Java that adds approximate data types and a hardware architecture that offers explicit approximate storage and computation and allows a programmer to control explicitly how information flows from approximate data to precise data.

...read moreread less

Abstract: Energy is increasingly a first-order concern in computer systems. Exploiting energy-accuracy trade-offs is an attractive choice in applications that can tolerate inaccuracies. Recent work has explored exposing this trade-off in programming models. A key challenge, though, is how to isolate parts of the program that must be precise from those that can be approximated so that a program functions correctly even as quality of service degrades.We propose using type qualifiers to declare data that may be subject to approximate computation. Using these types, the system automatically maps approximate variables to low-power storage, uses low-power operations, and even applies more energy-efficient algorithms provided by the programmer. In addition, the system can statically guarantee isolation of the precise program component from the approximate component. This allows a programmer to control explicitly how information flows from approximate data to precise data. Importantly, employing static analysis eliminates the need for dynamic checks, further improving energy savings. As a proof of concept, we develop EnerJ, an extension to Java that adds approximate data types. We also propose a hardware architecture that offers explicit approximate storage and computation. We port several applications to EnerJ and show that our extensions are expressive and effective; a small number of annotations lead to significant potential energy savings (10%-50%) at very little accuracy cost.

...read moreread less

680 citations

Proceedings Article•DOI•

Neural Acceleration for General-Purpose Approximate Programs

[...]

Hadi Esmaeilzadeh¹, Adrian Sampson¹, Luis Ceze¹, Doug Burger²•Institutions (2)

University of Washington¹, Microsoft²

01 Dec 2012

TL;DR: A programming model is defined that allows programmers to identify approximable code regions -- code that can produce imprecise but acceptable results and is faster and more energy efficient than executing the original code.

...read moreread less

Abstract: This paper describes a learning-based approach to the acceleration of approximate programs. We describe the \emph{Parrot transformation}, a program transformation that selects and trains a neural network to mimic a region of imperative code. After the learning phase, the compiler replaces the original code with an invocation of a low-power accelerator called a \emph{neural processing unit} (NPU). The NPU is tightly coupled to the processor pipeline to accelerate small code regions. Since neural networks produce inherently approximate results, we define a programming model that allows programmers to identify approximable code regions -- code that can produce imprecise but acceptable results. Offloading approximable code regions to NPUs is faster and more energy efficient than executing the original code. For a set of diverse applications, NPU acceleration provides whole-application speedup of 2.3x and energy savings of 3.0x on average with quality loss of at most 9.6%.

...read moreread less

532 citations

Proceedings Article•DOI•

Architecture support for disciplined approximate programming

[...]

Hadi Esmaeilzadeh¹, Adrian Sampson¹, Luis Ceze¹, Doug Burger²•Institutions (2)

University of Washington¹, Microsoft²

03 Mar 2012

TL;DR: An ISA extension that provides approximate operations and storage is described that gives the hardware freedom to save energy at the cost of accuracy and Truffle, a microarchitecture design that efficiently supports the ISA extensions is proposed.

...read moreread less

Abstract: Disciplined approximate programming lets programmers declare which parts of a program can be computed approximately and consequently at a lower energy cost. The compiler proves statically that all approximate computation is properly isolated from precise computation. The hardware is then free to selectively apply approximate storage and approximate computation with no need to perform dynamic correctness checks.In this paper, we propose an efficient mapping of disciplined approximate programming onto hardware. We describe an ISA extension that provides approximate operations and storage, which give the hardware freedom to save energy at the cost of accuracy. We then propose Truffle, a microarchitecture design that efficiently supports the ISA extensions. The basis of our design is dual-voltage operation, with a high voltage for precise operations and a low voltage for approximate operations. The key aspect of the microarchitecture is its dependence on the instruction stream to determine when to use the low voltage. We evaluate the power savings potential of in-order and out-of-order Truffle configurations and explore the resulting quality of service degradation. We evaluate several applications and demonstrate energy savings up to 43%.

...read moreread less

423 citations

Journal Article•DOI•

Neural acceleration for general-purpose approximate programs

[...]

Hadi Esmaeilzadeh¹, Adrian Sampson², Luis Ceze², Doug Burger³•Institutions (3)

Georgia Institute of Technology¹, University of Washington², Microsoft³

23 Dec 2014-Communications of The ACM

TL;DR: NPUs leverage the approximate algorithmic transformation that converts regions of code from a Von Neumann model to a neural model and shows that significant performance and efficiency gains are possible when the abstraction of full accuracy is relaxed in general-purpose computing.

...read moreread less

Abstract: As improvements in per-transistor speed and energy efficiency diminish, radical departures from conventional approaches are needed to continue improvements in the performance and energy efficiency of general-purpose processors. One such departure is approximate computing, where error in computation is acceptable and the traditional robust digital abstraction of near-perfect accuracy is relaxed. Conventional techniques in energy-efficient computing navigate a design space defined by the two dimensions of performance and energy, and traditionally trade one for the other. General-purpose approximate computing explores a third dimension---error---and trades the accuracy of computation for gains in both energy and performance. Techniques to harvest large savings from small errors have proven elusive. This paper describes a new approach that uses machine learning-based transformations to accelerate approximation-tolerant programs. The core idea is to train a learning model how an approximable region of code---code that can produce imprecise but acceptable results---behaves and replace the original code region with an efficient computation of the learned model. We use neural networks to learn code behavior and approximate it. We describe the Parrot algorithmic transformation, which leverages a simple programmer annotation ("approximable") to transform a code region from a von Neumann model to a neural model. After the learning phase, the compiler replaces the original code with an invocation of a low-power accelerator called a neural processing unit (NPU). The NPU is tightly coupled to the processor pipeline to permit profitable acceleration even when small regions of code are transformed. Offloading approximable code regions to NPUs is faster and more energy efficient than executing the original code. For a set of diverse applications, NPU acceleration provides whole-application speedup of 2.3× and energy savings of 3.0× on average with average quality loss of at most 9.6%. NPUs form a new class of accelerators and show that significant gains in both performance and efficiency are achievable when the traditional abstraction of near-perfect accuracy is relaxed in general-purpose computing.

...read moreread less

290 citations

Journal Article•DOI•

Approximate Storage in Solid-State Memories

[...]

Adrian Sampson¹, Jacob Nelson¹, Karin Strauss¹, Luis Ceze¹•Institutions (1)

University of Washington¹

23 Sep 2014-ACM Transactions on Computer Systems

TL;DR: Simulations show that reduced-precision writes in multi-level phase-change memory cells can be 1.7× faster on average and using failed blocks can improve array lifetime by 23% on average with quality loss under 10%.

...read moreread less

Abstract: Memories today expose an all-or-nothing correctness model that incurs significant costs in performance, energy, area, and design complexity But not all applications need high-precision storage for all of their data structures all of the time This article proposes mechanisms that enable applications to store data approximately and shows that doing so can improve the performance, lifetime, or density of solid-state memories We propose two mechanisms The first allows errors in multilevel cells by reducing the number of programming pulses used to write them The second mechanism mitigates wear-out failures and extends memory endurance by mapping approximate data onto blocks that have exhausted their hardware error correction resources Simulations show that reduced-precision writes in multilevel phase-change memory cells can be 17 × faster on average and using failed blocks can improve array lifetime by 23p on average with quality loss under 10p

...read moreread less

236 citations

1
2
3
4
…
5
6
7
8
9
10
11
12

Collapse

Cited by

PDF

Open Access

More filters

The PASCAL Visual Object Classes Challenge

[...]

Jianguo Zhang

01 Jan 2006

3,012 citations

Proceedings Article•DOI•

DianNao: a small-footprint high-throughput accelerator for ubiquitous machine-learning

[...]

Tianshi Chen¹, Zidong Du¹, Ninghui Sun¹, Jia Wang¹, Chengyong Wu¹, Yunji Chen¹, Olivier Temam² - Show less +3 more•Institutions (2)

Chinese Academy of Sciences¹, French Institute for Research in Computer Science and Automation²

24 Feb 2014

TL;DR: This study designs an accelerator for large-scale CNNs and DNNs, with a special emphasis on the impact of memory on accelerator design, performance and energy, and shows that it is possible to design an accelerator with a high throughput, capable of performing 452 GOP/s in a small footprint.

...read moreread less

Abstract: Machine-Learning tasks are becoming pervasive in a broad range of domains, and in a broad range of systems (from embedded systems to data centers). At the same time, a small set of machine-learning algorithms (especially Convolutional and Deep Neural Networks, i.e., CNNs and DNNs) are proving to be state-of-the-art across many applications. As architectures evolve towards heterogeneous multi-cores composed of a mix of cores and accelerators, a machine-learning accelerator can achieve the rare combination of efficiency (due to the small number of target algorithms) and broad application scope. Until now, most machine-learning accelerator designs have focused on efficiently implementing the computational part of the algorithms. However, recent state-of-the-art CNNs and DNNs are characterized by their large size. In this study, we design an accelerator for large-scale CNNs and DNNs, with a special emphasis on the impact of memory on accelerator design, performance and energy. We show that it is possible to design an accelerator with a high throughput, capable of performing 452 GOP/s (key NN operations such as synaptic weight multiplications and neurons outputs additions) in a small footprint of 3.02 mm2 and 485 mW; compared to a 128-bit 2GHz SIMD processor, the accelerator is 117.87x faster, and it can reduce the total energy by 21.08x. The accelerator characteristics are obtained after layout at 65 nm. Such a high throughput in a small footprint can open up the usage of state-of-the-art machine-learning algorithms in a broad set of systems and for a broad set of applications.

...read moreread less

1,582 citations

Proceedings Article•DOI•

DaDianNao: A Machine-Learning Supercomputer

[...]

Yunji Chen¹, Luo Tao¹, Liu Shaoli¹, Zhang Shijin¹, Liqiang He², Jia Wang¹, Ling Li¹, Tianshi Chen¹, Zhiwei Xu¹, Ninghui Sun¹, Olivier Temam³ - Show less +7 more•Institutions (3)

Chinese Academy of Sciences¹, Inner Mongolia University², French Institute for Research in Computer Science and Automation³

13 Dec 2014

TL;DR: This article introduces a custom multi-chip machine-learning architecture, showing that, on a subset of the largest known neural network layers, it is possible to achieve a speedup of 450.65x over a GPU, and reduce the energy by 150.31x on average for a 64-chip system.

...read moreread less

Abstract: Many companies are deploying services, either for consumers or industry, which are largely based on machine-learning algorithms for sophisticated processing of large amounts of data. The state-of-the-art and most popular such machine-learning algorithms are Convolutional and Deep Neural Networks (CNNs and DNNs), which are known to be both computationally and memory intensive. A number of neural network accelerators have been recently proposed which can offer high computational capacity/area ratio, but which remain hampered by memory accesses. However, unlike the memory wall faced by processors on general-purpose workloads, the CNNs and DNNs memory footprint, while large, is not beyond the capability of the on chip storage of a multi-chip system. This property, combined with the CNN/DNN algorithmic characteristics, can lead to high internal bandwidth and low external communications, which can in turn enable high-degree parallelism at a reasonable area cost. In this article, we introduce a custom multi-chip machine-learning architecture along those lines. We show that, on a subset of the largest known neural network layers, it is possible to achieve a speedup of 450.65x over a GPU, and reduce the energy by 150.31x on average for a 64-chip system. We implement the node down to the place and route at 28nm, containing a combination of custom storage and computational units, with industry-grade interconnects.

...read moreread less

1,486 citations

Journal Article•DOI•

PRIME: a novel processing-in-memory architecture for neural network computation in ReRAM-based main memory

[...]

Ping Chi¹, Shuangchen Li¹, Cong Xu², Tao Zhang³, Jishen Zhao¹, Yongpan Liu⁴, Yu Wang⁴, Yuan Xie¹ - Show less +4 more•Institutions (4)

University of California¹, Hewlett-Packard², Nvidia³, Tsinghua University⁴

18 Jun 2016

TL;DR: This work proposes a novel PIM architecture, called PRIME, to accelerate NN applications in ReRAM based main memory, and distinguishes itself from prior work on NN acceleration, with significant performance improvement and energy saving.

...read moreread less

Abstract: Processing-in-memory (PIM) is a promising solution to address the "memory wall" challenges for future computer systems. Prior proposed PIM architectures put additional computation logic in or near memory. The emerging metal-oxide resistive random access memory (ReRAM) has showed its potential to be used for main memory. Moreover, with its crossbar array structure, ReRAM can perform matrix-vector multiplication efficiently, and has been widely studied to accelerate neural network (NN) applications. In this work, we propose a novel PIM architecture, called PRIME, to accelerate NN applications in ReRAM based main memory. In PRIME, a portion of ReRAM crossbar arrays can be configured as accelerators for NN applications or as normal memory for a larger memory space. We provide microarchitecture and circuit designs to enable the morphable functions with an insignificant area overhead. We also design a software/hardware interface for software developers to implement various NNs on PRIME. Benefiting from both the PIM architecture and the efficiency of using ReRAM for NN computation, PRIME distinguishes itself from prior work on NN acceleration, with significant performance improvement and energy saving. Our experimental results show that, compared with a state-of-the-art neural processing unit design, PRIME improves the performance by ~2360× and the energy consumption by ~895×, across the evaluated machine learning benchmarks.

...read moreread less

1,197 citations

Proceedings Article•DOI•

ShiDianNao: shifting vision processing closer to the sensor

[...]

Zidong Du¹, Robert Fasthuber², Tianshi Chen¹, Paolo Ienne², Ling Li¹, Luo Tao¹, Xiaobing Feng¹, Yunji Chen¹, Olivier Temam³ - Show less +5 more•Institutions (3)

Chinese Academy of Sciences¹, École Polytechnique Fédérale de Lausanne², French Institute for Research in Computer Science and Automation³

13 Jun 2015

TL;DR: This paper proposes an accelerator which is 60x more energy efficient than the previous state-of-the-art neural network accelerator, designed down to the layout at 65 nm, with a modest footprint and consuming only 320 mW, but still about 30x faster than high-end GPUs.

...read moreread less

Abstract: In recent years, neural network accelerators have been shown to achieve both high energy efficiency and high performance for a broad application scope within the important category of recognition and mining applications. Still, both the energy efficiency and performance of such accelerators remain limited by memory accesses. In this paper, we focus on image applications, arguably the most important category among recognition and mining applications. The neural networks which are state-of-the-art for these applications are Convolutional Neural Networks (CNN), and they have an important property: weights are shared among many neurons, considerably reducing the neural network memory footprint. This property allows to entirely map a CNN within an SRAM, eliminating all DRAM accesses for weights. By further hoisting this accelerator next to the image sensor, it is possible to eliminate all remaining DRAM accesses, i.e., for inputs and outputs. In this paper, we propose such a CNN accelerator, placed next to a CMOS or CCD sensor. The absence of DRAM accesses combined with a careful exploitation of the specific data access patterns within CNNs allows us to design an accelerator which is 60× more energy efficient than the previous state-of-the-art neural network accelerator. We present a full design down to the layout at 65 nm, with a modest footprint of 4.86mm2 and consuming only 320mW, but still about 30× faster than high-end GPUs.

...read moreread less

1,005 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200

Collapse