Author

Jeffrey Draper

Other affiliations: Information Sciences Institute, University of Texas at Austin

Bio: Jeffrey Draper is an academic researcher from the University of Southern California. The author has contributed to research on the topics of transactional memory and soft errors. The author has an h-index of 25 and has co-authored 137 publications receiving 2654 citations. Previous affiliations of Jeffrey Draper include the Information Sciences Institute and the University of Texas at Austin.

Papers published on a yearly basis

[Chart: number of papers published per year, 1991 to 2019 (no entry for 1995)]

Papers

Proceedings Article

The architecture of the DIVA processing-in-memory chip

Jeffrey Draper¹, Jacqueline Chame¹, Mary Hall¹, Craig S. Steele¹, Tim Barrett¹, Jeff LaCoss¹, John J. Granacki¹, Jaewook Shin¹, Chun Chen¹, Chang Woo Kang¹, Ihn Kim¹, Gokhan Daglikoca¹

Information Sciences Institute¹

22 Jun 2002

TL;DR: The DIVA (Data IntensiVe Architecture) system incorporates a collection of Processing-In-Memory chips as smart-memory co-processors to a conventional microprocessor, and a PIM-based architecture with many such chips yields significantly higher performance than a multiprocessor of a similar scale and at a much reduced hardware cost.

Abstract: The DIVA (Data IntensiVe Architecture) system incorporates a collection of Processing-In-Memory (PIM) chips as smart-memory co-processors to a conventional microprocessor. We have recently fabricated prototype DIVA PIMs. These chips represent the first smart-memory devices designed to support virtual addressing and capable of executing multiple threads of control. In this paper, we describe the prototype PIM architecture. We emphasize three unique features of DIVA PIMs, namely, the memory interface to the host processor, the 256-bit wide datapaths for exploiting on-chip bandwidth, and the address translation unit. We present detailed simulation results on eight benchmark applications. When just a single PIM chip is used, we achieve an average speedup of 3.3X over host-only execution, due to lower memory stall times and increased fine-grain parallelism. These 1-PIM results suggest that a PIM-based architecture with many such chips yields significantly higher performance than a multiprocessor of a similar scale and at a much reduced hardware cost.

363 citations

Journal Article

A Comprehensive Analytical Model for Wormhole Routing in Multicomputer Systems

Jeffrey Draper¹, Joydeep Ghosh¹

University of Texas at Austin¹

01 Nov 1994, Journal of Parallel and Distributed Computing

TL;DR: The model introduced in this paper is accurate and quite simple, and is sufficiently general to be extended for several networks, including k-ary n-cubes, and related routing paradigms, such as virtual cut-through.

241 citations

Proceedings Article

Mapping Irregular Applications to DIVA, a PIM-based Data-Intensive Architecture

Mary Hall¹, Peter M. Kogge², Jeff Koller¹, Pedro C. Diniz¹, Jacqueline Chame¹, Jeffrey Draper¹, Jeff LaCoss¹, John J. Granacki¹, Jay B. Brockman², Apoorv Srivastava¹, William C. Athas¹, Vincent W. Freeh², Jaewook Shin¹, Joonseok Park¹

Information Sciences Institute¹, University of Notre Dame²

01 Jan 1999

TL;DR: The potential of PIM-based architectures in accelerating the performance of three irregular computations (sparse conjugate gradient, a natural-join database operation, and an object-oriented database query) is demonstrated.

Abstract: Processing-in-memory (PIM) chips that integrate processor logic into memory devices offer a new opportunity for bridging the growing gap between processor and memory speeds, especially for applications with high memory-bandwidth requirements. The Data-IntensiVe Architecture (DIVA) system combines PIM memories with one or more external host processors and a PIM-to-PIM interconnect. DIVA increases memory bandwidth through two mechanisms: (1) performing selected computation in memory, reducing the quantity of data transferred across the processor-memory interface; and (2) providing communication mechanisms called parcels for moving both data and computation throughout memory, further bypassing the processor-memory bus. DIVA uniquely supports acceleration of important irregular applications, including sparse-matrix and pointer-based computations. In this paper, we focus on several aspects of DIVA designed to effectively support such computations at very high performance levels: (1) the memory model and parcel definitions; (2) the PIM-to-PIM interconnect; and, (3) requirements for the processor-to-memory interface. We demonstrate the potential of PIM-based architectures in accelerating the performance of three irregular computations, sparse conjugate gradient, a natural-join database operation and an object-oriented database query.

232 citations

Proceedings Article

Parallel double error correcting code design to mitigate multi-bit upsets in SRAMs

R. Naseer¹, Jeffrey Draper¹

University of Southern California¹

18 Nov 2008

TL;DR: A double-error correcting (DEC) ECC implementation technique suitable for SRAM applications is presented; results show that this DEC scheme reduces errors by 98.5%, compared to only a 44% reduction by conventional SEC-DED ECC.

Abstract: The range of SRAM multi-bit upsets (MBU) in sub-100 nm technologies is characterized using irradiation tests on two prototype ICs, developed in 90 nm commercial processes. Results reveal that MBU, as large as 13-bit, can occur in these technologies, limiting the efficacy of conventional SEC-DED error-correcting codes (ECC). A double-error correcting (DEC) ECC implementation technique suitable for SRAM applications is presented. Results show that this DEC scheme reduces errors by 98.5% compared to only 44% reduction by conventional SEC-DED ECC.

147 citations

Proceedings Article

Critical Charge Characterization for Soft Error Rate Modeling in 90nm SRAM

R. Naseer¹, Younes Boulghassoul¹, Jeffrey Draper¹, S. DasGupta², Arthur F. Witulski²

University of Southern California¹, Vanderbilt University²

27 May 2007

TL;DR: The authors investigate the critical charge (Qcrit) required to upset a 6T SRAM cell designed in a commercial 90nm process and characterize Qcrit using different current models and show that there are significant differences in Qcrit values depending on which models are used.

Abstract: Due to continuous technology scaling, the reduction of nodal capacitances and the lowering of power supply voltages result in an ever-decreasing minimal charge capable of upsetting the logic state of memory circuits. In this paper the authors investigate the critical charge (Qcrit) required to upset a 6T SRAM cell designed in a commercial 90nm process. The authors characterize Qcrit using different current models and show that there are significant differences in Qcrit values depending on which models are used. Discrepancies in critical charge characterization are shown to result in under-predictions of the SRAM's associated soft error rate as large as two orders of magnitude. For accurate Qcrit calculation, it is critical that 3D device simulation is used to calibrate the current pulse modeling heavy ion strikes on the circuit, since the stimuli characteristics are technology feature size dependent. Current models with very fast characteristic timing parameters are shown to result in conservative soft error rate predictions and can assertively be used to model ion strikes when 3D simulation data is not available.

112 citations

Cited by

Journal Article

Cooperative spectrum sensing in cognitive radio networks: A survey

Ian F. Akyildiz¹, Brandon F. Lo¹, Ravikumar Balakrishnan¹

Georgia Institute of Technology¹

01 Mar 2011, Physical Communication

TL;DR: A state-of-the-art survey of cooperative sensing is provided to address the issues of cooperation method, cooperative gain, and cooperation overhead.

1,800 citations

Journal Article

PRIME: a novel processing-in-memory architecture for neural network computation in ReRAM-based main memory

Ping Chi¹, Shuangchen Li¹, Cong Xu², Tao Zhang³, Jishen Zhao¹, Yongpan Liu⁴, Yu Wang⁴, Yuan Xie¹

University of California¹, Hewlett-Packard², Nvidia³, Tsinghua University⁴

18 Jun 2016

TL;DR: This work proposes a novel PIM architecture, called PRIME, to accelerate NN applications in ReRAM based main memory, and distinguishes itself from prior work on NN acceleration, with significant performance improvement and energy saving.

Abstract: Processing-in-memory (PIM) is a promising solution to address the "memory wall" challenges for future computer systems. Prior proposed PIM architectures put additional computation logic in or near memory. The emerging metal-oxide resistive random access memory (ReRAM) has shown its potential to be used for main memory. Moreover, with its crossbar array structure, ReRAM can perform matrix-vector multiplication efficiently, and has been widely studied to accelerate neural network (NN) applications. In this work, we propose a novel PIM architecture, called PRIME, to accelerate NN applications in ReRAM-based main memory. In PRIME, a portion of ReRAM crossbar arrays can be configured as accelerators for NN applications or as normal memory for a larger memory space. We provide microarchitecture and circuit designs to enable the morphable functions with an insignificant area overhead. We also design a software/hardware interface for software developers to implement various NNs on PRIME. Benefiting from both the PIM architecture and the efficiency of using ReRAM for NN computation, PRIME distinguishes itself from prior work on NN acceleration, with significant performance improvement and energy saving. Our experimental results show that, compared with a state-of-the-art neural processing unit design, PRIME improves the performance by ~2360× and the energy consumption by ~895×, across the evaluated machine learning benchmarks.

1,197 citations

Journal Article

Outstanding Research Problems in NoC Design: System, Microarchitecture, and Circuit Perspectives

Radu Marculescu¹, Umit Y. Ogras¹, Li-Shiuan Peh², Natalie Enright Jerger³, Yatin Hoskote⁴

Carnegie Mellon University¹, Princeton University², University of Wisconsin-Madison³, Intel⁴

01 Jan 2009, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems

TL;DR: This paper provides a general description of NoC architectures and applications and enumerates several related research problems organized under five main categories: Application characterization, communication paradigm, communication infrastructure, analysis, and solution evaluation.

Abstract: To alleviate the complex communication problems that arise as the number of on-chip components increases, network-on-chip (NoC) architectures have been recently proposed to replace global interconnects. In this paper, we first provide a general description of NoC architectures and applications. Then, we enumerate several related research problems organized under five main categories: Application characterization, communication paradigm, communication infrastructure, analysis, and solution evaluation. Motivation, problem description, proposed approaches, and open issues are discussed for each problem from system, microarchitecture, and circuit perspectives. Finally, we address the interactions among these research problems and put the NoC design process into perspective.

733 citations

Proceedings Article

A scalable processing-in-memory accelerator for parallel graph processing

Junwhan Ahn¹, Sungpack Hong², Sungjoo Yoo¹, Onur Mutlu³, Kiyoung Choi¹

Seoul National University¹, Oracle Corporation², Carnegie Mellon University³

13 Jun 2015

TL;DR: This work argues that the conventional concept of processing-in-memory (PIM) can be a viable solution to achieve memory-capacity-proportional performance and designs a programmable PIM accelerator for large-scale graph processing called Tesseract.

Abstract: The explosion of digital data and the ever-growing need for fast data analysis have made in-memory big-data processing in computer systems increasingly important. In particular, large-scale graph processing is gaining attention due to its broad applicability from social science to machine learning. However, scalable hardware design that can efficiently process large graphs in main memory is still an open problem. Ideally, cost-effective and scalable graph processing systems can be realized by building a system whose performance increases proportionally with the sizes of graphs that can be stored in the system, which is extremely challenging in conventional systems due to severe memory bandwidth limitations. In this work, we argue that the conventional concept of processing-in-memory (PIM) can be a viable solution to achieve such an objective. The key modern enabler for PIM is the recent advancement of the 3D integration technology that facilitates stacking logic and memory dies in a single package, which was not available when the PIM concept was originally examined. In order to take advantage of such a new technology to enable memory-capacity-proportional performance, we design a programmable PIM accelerator for large-scale graph processing called Tesseract. Tesseract is composed of (1) a new hardware architecture that fully utilizes the available memory bandwidth, (2) an efficient method of communication between different memory partitions, and (3) a programming interface that reflects and exploits the unique hardware design. It also includes two hardware prefetchers specialized for memory access patterns of graph processing, which operate based on the hints provided by our programming model. Our comprehensive evaluations using five state-of-the-art graph processing workloads with large real-world graphs show that the proposed architecture improves average system performance by a factor of ten and achieves 87% average energy reduction over conventional systems.

718 citations

Proceedings Article

PipeLayer: A Pipelined ReRAM-Based Accelerator for Deep Learning

Linghao Song¹, Xuehai Qian², Hai Li³, Yi Chen⁴

University of Pittsburgh¹, University of Southern California², Nanyang Technological University³, Chinese Academy of Sciences⁴

01 Feb 2017

TL;DR: PipeLayer, a ReRAM-based PIM accelerator for CNNs that supports both training and testing, is presented. It proposes a highly parallel design based on the notion of parallelism granularity and weight replication, which enables highly pipelined execution of both training and testing without introducing the potential stalls of previous work.

Abstract: Convolutional neural networks (CNNs) are the heart of deep learning applications. Recent works PRIME [1] and ISAAC [2] demonstrated the promise of using resistive random access memory (ReRAM) to perform neural computations in memory. We found that training cannot be efficiently supported with the current schemes. First, they do not consider weight update and complex data dependency in the training procedure. Second, ISAAC attempts to increase system throughput with a very deep pipeline. It is only beneficial when a large number of consecutive images can be fed into the architecture. In training, the notion of batch (e.g. 64) limits the number of images that can be processed consecutively, because the images in the next batch need to be processed based on the updated weights. Third, the deep pipeline in ISAAC is vulnerable to pipeline bubbles and execution stalls. In this paper, we present PipeLayer, a ReRAM-based PIM accelerator for CNNs that supports both training and testing. We analyze data dependency and weight update in training algorithms and propose an efficient pipeline to exploit inter-layer parallelism. To exploit intra-layer parallelism, we propose a highly parallel design based on the notion of parallelism granularity and weight replication. With these design choices, PipeLayer enables highly pipelined execution of both training and testing, without introducing the potential stalls of previous work. The experimental results show that PipeLayer achieves an average speedup of 42.45x compared with a GPU platform. The average energy saving of PipeLayer compared with the GPU implementation is 7.17x.

633 citations
