Author

Steven Swanson

Other affiliations: MRIGlobal, University of Puget Sound, University of Washington
Bio: Steven Swanson is an academic researcher at the University of California, San Diego. His research centers on flash memory and non-volatile memory systems. He has an h-index of 43 and has co-authored 136 publications receiving 7,932 citations. His previous affiliations include MRIGlobal and the University of Puget Sound.


Papers
Proceedings ArticleDOI
05 Mar 2011
TL;DR: NV-heaps, a lightweight, high-performance persistent object system, provides transactional semantics while preventing common pointer-safety errors and offers a persistence model that is easy to use and reason about.
Abstract: Persistent, user-defined objects present an attractive abstraction for working with non-volatile program state. However, the slow speed of persistent storage (i.e., disk) has restricted their design and limited their performance. Fast, byte-addressable, non-volatile technologies, such as phase change memory, will remove this constraint and allow programmers to build high-performance, persistent data structures in non-volatile storage that is almost as fast as DRAM. Creating these data structures requires a system that is lightweight enough to expose the performance of the underlying memories but also ensures safety in the presence of application and system failures by avoiding familiar bugs such as dangling pointers, multiple free()s, and locking errors. In addition, the system must prevent new types of hard-to-find pointer safety bugs that only arise with persistent objects. These bugs are especially dangerous since any corruption they cause will be permanent. We have implemented a lightweight, high-performance persistent object system called NV-heaps that provides transactional semantics while preventing these errors and providing a model for persistence that is easy to use and reason about. We implement search trees, hash tables, sparse graphs, and arrays using NV-heaps, BerkeleyDB, and Stasis. Our results show that NV-heap performance scales with thread count and that data structures implemented using NV-heaps outperform BerkeleyDB and Stasis implementations by 32x and 244x, respectively, by avoiding the operating system and minimizing other software overheads. We also quantify the cost of enforcing the safety guarantees that NV-heaps provide and measure the costs of NV-heap primitive operations.
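To make the transactional model concrete, below is a minimal sketch of crash-atomic persistence via an undo log over a memory-mapped file standing in for byte-addressable NVM. This is our illustration, not the NV-heaps API: the file name, record layout, and persist() helper are all hypothetical, and msync is a portable stand-in for the cache-line flushes a real NVM system would issue.

```cpp
// Hypothetical sketch (not the NV-heaps API): a crash-atomic update of a
// persistent record using an undo log. A file-backed mapping stands in
// for byte-addressable NVM such as phase change memory.
#include <cstdint>
#include <cstdio>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

struct Record  { long balance; };
struct UndoLog {
    Record saved;   // copy of the record taken before the transaction
    int    valid;   // nonzero while a transaction is in flight
};

// Force a range to stable media. msync needs a page-aligned address,
// so round down; real NVM code would use a cache-line flush + fence.
static void persist(void* p, size_t n) {
    uintptr_t pg = (uintptr_t)sysconf(_SC_PAGESIZE);
    uintptr_t a  = (uintptr_t)p & ~(pg - 1);
    msync((void*)a, n + ((uintptr_t)p - a), MS_SYNC);
}

int main() {
    const size_t len = sizeof(Record) + sizeof(UndoLog);
    int fd = open("heap.img", O_RDWR | O_CREAT, 0644);  // hypothetical heap file
    ftruncate(fd, len);
    char* base = (char*)mmap(nullptr, len, PROT_READ | PROT_WRITE,
                             MAP_SHARED, fd, 0);
    Record*  rec = (Record*)base;
    UndoLog* log = (UndoLog*)(base + sizeof(Record));

    // Recovery: an interrupted transaction left valid set; roll it back.
    if (log->valid) { *rec = log->saved; persist(rec, sizeof *rec); }

    // Begin: save the old value, then publish the log entry.
    log->saved = *rec;  persist(&log->saved, sizeof log->saved);
    log->valid = 1;     persist(&log->valid, sizeof log->valid);

    rec->balance += 100;  persist(rec, sizeof *rec);  // the actual update

    log->valid = 0;  persist(&log->valid, sizeof log->valid);  // commit point
    printf("balance = %ld\n", rec->balance);
    munmap(base, len);
    close(fd);
}
```

A crash before the commit point rolls the update back on the next run; a crash after it keeps the update. NV-heaps' transactions provide this all-or-nothing property for whole data structures.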

850 citations

Proceedings ArticleDOI
12 Dec 2009
TL;DR: This work empirically characterizes flash memory technology from five manufacturers by directly measuring performance, power, and reliability, and demonstrates that performance varies significantly between vendors and devices and often deviates from publicly available datasheets.
Abstract: Despite flash memory's promise, it suffers from many idiosyncrasies such as limited durability, data integrity problems, and asymmetry in operation granularity. As architects, we aim to find ways to overcome these idiosyncrasies while exploiting flash memory's useful characteristics. To be successful, we must understand the trade-offs between the performance, cost (in both power and dollars), and reliability of flash memory. In addition, we must understand how different usage patterns affect these characteristics. Flash manufacturers provide only conservative guidelines about these metrics, and this lack of detail makes it difficult to design systems that fully exploit flash memory's capabilities. We have empirically characterized flash memory technology from five manufacturers by directly measuring performance, power, and reliability. We demonstrate that performance varies significantly between vendors and devices and often deviates from publicly available datasheets. We also demonstrate and quantify some unexpected device characteristics and show how we can use them to improve responsiveness and energy consumption of solid state disks by 44% and 13%, respectively, as well as increase flash device lifetime by 5.2x.
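The paper characterizes raw flash chips with custom test hardware, which a few lines of code cannot reproduce; as a rough software-level analogue, the sketch below times synchronous 4 KiB writes and reports latency percentiles, the general shape such a characterization microbenchmark takes. The target path, transfer size, and iteration count are placeholders, not parameters from the paper.

```cpp
// Hedged sketch: a latency microbenchmark in the spirit of an empirical
// characterization. Times synchronous 4 KiB writes to a file and reports
// percentiles; all parameters are illustrative placeholders.
#include <algorithm>
#include <chrono>
#include <cstdio>
#include <vector>
#include <fcntl.h>
#include <unistd.h>

int main() {
    int fd = open("probe.dat", O_RDWR | O_CREAT, 0644);  // placeholder target
    std::vector<char> buf(4096, 0x5A);
    std::vector<double> lat_us;

    for (int i = 0; i < 256; ++i) {
        auto t0 = std::chrono::steady_clock::now();
        pwrite(fd, buf.data(), buf.size(), 0);  // rewrite the same 4 KiB
        fsync(fd);                              // force it to the device
        auto t1 = std::chrono::steady_clock::now();
        lat_us.push_back(
            std::chrono::duration<double, std::micro>(t1 - t0).count());
    }
    std::sort(lat_us.begin(), lat_us.end());
    printf("p50 = %.1f us, p99 = %.1f us\n",
           lat_us[lat_us.size() / 2], lat_us[lat_us.size() * 99 / 100]);
    close(fd);
}
```

Collecting such distributions per vendor and per operation (read, program, erase) is how gaps between measured behavior and datasheet figures become visible.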

483 citations

Proceedings Article
22 Feb 2016
TL;DR: NOVA, a file system designed to maximize performance on hybrid memory systems while providing strong consistency guarantees, adapts conventional log-structured file system techniques to exploit the fast random access that NVMs provide.
Abstract: Fast non-volatile memories (NVMs) will soon appear on the processor memory bus alongside DRAM. The resulting hybrid memory systems will provide software with sub-microsecond, high-bandwidth access to persistent data, but managing, accessing, and maintaining consistency for data stored in NVM raises a host of challenges. Existing file systems built for spinning or solid-state disks introduce software overheads that would obscure the performance that NVMs should provide, but proposed file systems for NVMs either incur similar overheads or fail to provide the strong consistency guarantees that applications require. We present NOVA, a file system designed to maximize performance on hybrid memory systems while providing strong consistency guarantees. NOVA adapts conventional log-structured file system techniques to exploit the fast random access that NVMs provide. In particular, it maintains separate logs for each inode to improve concurrency, and stores file data outside the log to minimize log size and reduce garbage collection costs. NOVA's logs provide metadata, data, and mmap atomicity and focus on simplicity and reliability, keeping complex metadata structures in DRAM to accelerate lookup operations. Experimental results show that in write-intensive workloads, NOVA provides 22% to 216× throughput improvement compared to state-of-the-art file systems, and 3.1× to 13.5× improvement compared to file systems that provide equally strong data consistency guarantees.
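A minimal sketch of the per-inode log idea (our simplification, not NOVA's on-disk layout): each inode owns a private log of fixed-size entries that point at data blocks stored outside the log, so operations on different files never contend on a shared log. The struct names and fields below are illustrative.

```cpp
// Simplified illustration of per-inode logging (not NOVA's actual format):
// each inode appends small log entries that reference file data stored
// outside the log, keeping logs short and garbage collection cheap.
#include <cstdint>
#include <cstdio>
#include <vector>

struct LogEntry {           // one write record in an inode's private log
    uint64_t file_offset;   // where in the file the data lands
    uint64_t data_block;    // block holding the data, kept outside the log
    uint64_t size;          // bytes written
};

struct Inode {
    std::vector<LogEntry> log;  // DRAM vector stands in for an NVM log
    void append_write(uint64_t off, uint64_t blk, uint64_t n) {
        // On NVM, the entry would be written and flushed first, then the
        // log tail pointer advanced with one atomic store to commit it.
        log.push_back({off, blk, n});
    }
};

int main() {
    Inode a, b;
    a.append_write(0, 100, 4096);  // writes to different files append to
    b.append_write(0, 200, 4096);  // different logs: no cross-file contention
    printf("inode a: %zu entries, inode b: %zu entries\n",
           a.log.size(), b.log.size());
}
```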

450 citations

Journal ArticleDOI
13 Mar 2010
TL;DR: A toolchain for automatically synthesizing c-cores from application source code is presented; c-cores significantly reduce energy and energy-delay for a wide range of applications, and patching extends the useful lifetime of individual c-cores to match that of conventional processors.
Abstract: Growing transistor counts, limited power budgets, and the breakdown of voltage scaling are currently conspiring to create a utilization wall that limits the fraction of a chip that can run at full speed at one time. In this regime, specialized, energy-efficient processors can increase parallelism by reducing the per-computation power requirements and allowing more computations to execute under the same power budget. To pursue this goal, this paper introduces conservation cores. Conservation cores, or c-cores, are specialized processors that focus on reducing energy and energy-delay instead of increasing performance. This focus on energy makes c-cores an excellent match for many applications that would be poor candidates for hardware acceleration (e.g., irregular integer codes). We present a toolchain for automatically synthesizing c-cores from application source code and demonstrate that they can significantly reduce energy and energy-delay for a wide range of applications. The c-cores support patching, a form of targeted reconfigurability, that allows them to adapt to new versions of the software they target. Our results show that conservation cores can reduce energy consumption by up to 16.0x for functions and by up to 2.1x for whole applications, while patching can extend the useful lifetime of individual c-cores to match that of conventional processors.
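The gap between the per-function (16.0x) and whole-application (2.1x) figures follows an Amdahl-style relation: if a c-core covers fraction f of an application's energy and reduces that portion by factor r, the whole-application reduction is 1 / ((1 - f) + f / r). The coverage fractions in the sketch below are our own illustrative choices, not numbers from the paper.

```cpp
// Amdahl-style energy arithmetic (our illustration): whole-application
// energy reduction when a c-core covers fraction f of total energy and
// reduces that portion by factor r.
#include <cstdio>

double whole_app_reduction(double f, double r) {
    return 1.0 / ((1.0 - f) + f / r);
}

int main() {
    // With r = 16 (the paper's best per-function figure), a ~2.1x
    // whole-application reduction implies roughly 56% energy coverage:
    // 1 / ((1 - 0.56) + 0.56 / 16) ≈ 2.1.
    printf("f = 0.56, r = 16 -> %.2fx\n", whole_app_reduction(0.56, 16.0));
    printf("f = 0.90, r = 16 -> %.2fx\n", whole_app_reduction(0.90, 16.0));
}
```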

363 citations

Posted Content
TL;DR: This work comprises the first in-depth, scholarly performance review of Intel's Optane DC PMM, exploring its capabilities as a main memory device and as persistent, byte-addressable memory exposed to user-space applications.
Abstract: Scalable nonvolatile memory DIMMs will finally be commercially available with the release of the Intel Optane DC Persistent Memory Module (or just "Optane DC PMM"). This new nonvolatile DIMM supports byte-granularity accesses with access times on the order of DRAM, while also providing data storage that survives power outages. This work comprises the first in-depth, scholarly performance review of Intel's Optane DC PMM, exploring its capabilities as a main memory device and as persistent, byte-addressable memory exposed to user-space applications. This report details the technology's performance under a number of modes and scenarios and across a wide variety of macro-scale benchmarks. Optane DC PMMs can be used as large memory devices with a DRAM cache to hide their lower bandwidth and higher latency. When used in this Memory (or cached) mode, Optane DC memory has little impact on applications with small memory footprints. Applications with larger memory footprints may experience some slow-down relative to DRAM, but are now able to keep much more data in memory. When used under a file system, Optane DC PMMs can result in significant performance gains, especially when the file system is optimized to use the load/store interface of the Optane DC PMM and the application uses many small, persistent writes. For instance, using the NOVA-relaxed NVMM file system, we can improve the performance of Kyoto Cabinet by almost 2x. Optane DC PMMs can also enable user-space persistence where the application explicitly controls its writes into persistent Optane DC media. In our experiments, modified applications that used user-space Optane DC persistence generally outperformed their file system counterparts. For instance, the persistent version of RocksDB performed almost 2x faster than the equivalent program utilizing an NVMM-aware file system.
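A hedged sketch of the user-space persistence pattern the abstract describes: the application maps persistent memory directly and decides when its stores become durable. A regular file plus msync stands in for a DAX mapping with cache-line flushes (CLWB followed by a fence); the file name and single-counter layout are our invention, not from the paper.

```cpp
// Illustrative sketch of application-controlled persistence. A file and
// msync stand in for an Optane DC DAX mapping with CLWB + SFENCE; the
// path and layout are hypothetical.
#include <cstdio>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

int main() {
    // On a real system this file would live on a DAX-mounted NVMM file
    // system (e.g. NOVA) so that loads and stores reach the media directly.
    int fd = open("counter.pmem", O_RDWR | O_CREAT, 0644);
    ftruncate(fd, sizeof(long));
    long* counter = (long*)mmap(nullptr, sizeof(long),
                                PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

    ++*counter;                              // ordinary store to "NVM"
    msync(counter, sizeof(long), MS_SYNC);   // real code: CLWB + SFENCE

    printf("runs so far: %ld\n", *counter);  // the count survives restarts
    munmap(counter, sizeof(long));
    close(fd);
}
```

A plausible reading of the user-space RocksDB result: durability becomes a few instructions on the store path instead of a system call into a file system.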

346 citations


Cited by
Journal ArticleDOI
Jeffrey O. Kephart, David M. Chess
TL;DR: A 2001 IBM manifesto warned of a looming software complexity crisis and noted the almost impossible difficulty of managing current and planned computing systems, which require integrating several heterogeneous environments into corporate-wide computing systems that extend into the Internet.
Abstract: A 2001 IBM manifesto observed that a looming software complexity crisis, caused by applications and environments that number into the tens of millions of lines of code, threatened to halt progress in computing. The manifesto noted the almost impossible difficulty of managing current and planned computing systems, which require integrating several heterogeneous environments into corporate-wide computing systems that extend into the Internet. Autonomic computing, perhaps the most attractive approach to solving this problem, creates systems that can manage themselves when given high-level objectives from administrators. Systems manage themselves according to an administrator's goals. New components integrate as effortlessly as a new cell establishes itself in the human body. These ideas are not science fiction, but elements of the grand challenge to create self-managing computing systems.

6,527 citations

Proceedings ArticleDOI
24 Feb 2014
TL;DR: This study designs an accelerator for large-scale CNNs and DNNs, with a special emphasis on the impact of memory on accelerator design, performance, and energy, and shows that such an accelerator can sustain a high throughput of 452 GOP/s in a small footprint.
Abstract: Machine-Learning tasks are becoming pervasive in a broad range of domains, and in a broad range of systems (from embedded systems to data centers). At the same time, a small set of machine-learning algorithms (especially Convolutional and Deep Neural Networks, i.e., CNNs and DNNs) are proving to be state-of-the-art across many applications. As architectures evolve towards heterogeneous multi-cores composed of a mix of cores and accelerators, a machine-learning accelerator can achieve the rare combination of efficiency (due to the small number of target algorithms) and broad application scope. Until now, most machine-learning accelerator designs have focused on efficiently implementing the computational part of the algorithms. However, recent state-of-the-art CNNs and DNNs are characterized by their large size. In this study, we design an accelerator for large-scale CNNs and DNNs, with a special emphasis on the impact of memory on accelerator design, performance, and energy. We show that it is possible to design an accelerator with a high throughput, capable of performing 452 GOP/s (key NN operations such as synaptic weight multiplications and neuron output additions) in a small footprint of 3.02 mm2 and 485 mW; compared to a 128-bit 2GHz SIMD processor, the accelerator is 117.87x faster and can reduce the total energy by 21.08x. The accelerator characteristics are obtained after layout at 65 nm. Such a high throughput in a small footprint can open up the usage of state-of-the-art machine-learning algorithms in a broad set of systems and for a broad set of applications.
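Two efficiency figures follow directly from the abstract's numbers; the division below is ours (the paper reports the raw values, not these ratios).

```cpp
// Arithmetic only: area and power efficiency derived from the reported
// 452 GOP/s, 3.02 mm^2, and 485 mW figures. The ratios are our derivation.
#include <cstdio>

int main() {
    double gops = 452.0, area_mm2 = 3.02, power_w = 0.485;
    printf("area efficiency:  %.0f GOP/s per mm^2\n", gops / area_mm2); // ~150
    printf("power efficiency: %.0f GOP/s per watt\n", gops / power_w);  // ~932
}
```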

1,582 citations

Journal ArticleDOI
20 Apr 2010
TL;DR: The physics behind the large resistivity contrast between the amorphous and crystalline states of phase change materials is presented, and how that contrast is exploited to create high-density PCM is described.
Abstract: In this paper, recent progress of phase change memory (PCM) is reviewed. The electrical and thermal properties of phase change materials are surveyed with a focus on the scalability of the materials and their impact on device design. Innovations in the device structure, memory cell selector, and strategies for achieving multibit operation and 3-D, multilayer high-density memory arrays are described. The scaling properties of PCM are illustrated with recent experimental results using special device test structures and novel material synthesis. Factors affecting the reliability of PCM are discussed.

1,488 citations

Proceedings ArticleDOI
13 Dec 2014
TL;DR: This article introduces a custom multi-chip machine-learning architecture, showing that, on a subset of the largest known neural network layers, it is possible to achieve a speedup of 450.65x over a GPU, and reduce the energy by 150.31x on average for a 64-chip system.
Abstract: Many companies are deploying services, either for consumers or industry, which are largely based on machine-learning algorithms for sophisticated processing of large amounts of data. The state-of-the-art and most popular such machine-learning algorithms are Convolutional and Deep Neural Networks (CNNs and DNNs), which are known to be both computationally and memory intensive. A number of neural network accelerators have been recently proposed which can offer a high computational capacity/area ratio, but which remain hampered by memory accesses. However, unlike the memory wall faced by processors on general-purpose workloads, the memory footprint of CNNs and DNNs, while large, is not beyond the capability of the on-chip storage of a multi-chip system. This property, combined with the CNN/DNN algorithmic characteristics, can lead to high internal bandwidth and low external communications, which can in turn enable high-degree parallelism at a reasonable area cost. In this article, we introduce a custom multi-chip machine-learning architecture along those lines. We show that, on a subset of the largest known neural network layers, it is possible to achieve a speedup of 450.65x over a GPU and reduce the energy by 150.31x on average for a 64-chip system. We implement the node down to the place and route at 28nm, containing a combination of custom storage and computational units, with industry-grade interconnects.
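The core observation, that a network's weights exceed one chip's storage but fit in the aggregate on-chip storage of many chips, reduces to simple arithmetic. In the sketch below, the layer footprint and per-chip capacity are hypothetical placeholders; only the 64-chip count comes from the abstract.

```cpp
// Back-of-the-envelope (illustrative numbers, not from the paper): a
// weight footprint too large for one chip fits in the aggregate on-chip
// storage of a 64-chip system, so each node keeps its slice resident.
#include <cstdio>

int main() {
    double layer_mb    = 1024.0;  // hypothetical weight footprint (1 GiB)
    double per_chip_mb = 36.0;    // hypothetical on-chip storage per node
    int    chips       = 64;      // the paper's 64-chip configuration

    double aggregate = per_chip_mb * chips;  // total on-chip capacity
    double slice     = layer_mb / chips;     // weights assigned to each chip
    printf("aggregate on-chip storage: %.0f MB\n", aggregate);
    printf("per-chip weight slice:     %.0f MB (%s)\n", slice,
           slice <= per_chip_mb ? "fits on chip" : "spills off chip");
}
```

Keeping every weight slice resident is what turns the memory wall into high internal bandwidth and low external communication.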

1,486 citations