Home
/
Authors
/
Nam Sung Kim

Author

Nam Sung Kim

University of Illinois at Urbana–Champaign

Other affiliations: Advanced Micro Devices, Intel, Wisconsin Alumni Research Foundation ...read more

Bio: Nam Sung Kim is an academic researcher from University of Illinois at Urbana–Champaign. The author has contributed to research in topics: Cache & Multi-core processor. The author has an hindex of 41, co-authored 300 publications receiving 10005 citations. Previous affiliations of Nam Sung Kim include Advanced Micro Devices & Intel.

Papers published on a yearly basis

2023
2022
2021
2020
2019
2018
2017
2016
2015
2014
2013
2012
2011
2010
2009
2008
2007
2006
2005
2004
2003
2002

Papers

PDF

Open Access

More filters

Journal Article•DOI•

Leakage current: Moore's law meets static power

[...]

Nam Sung Kim¹, Todd Austin¹, D. Baauw¹, Trevor Mudge¹, Krisztian Flautner, Jie Hu², Mary Jane Irwin², Mahmut Kandemir², Vijaykrishnan Narayanan² - Show less +5 more•Institutions (2)

University of Michigan¹, Pennsylvania State University²

01 Dec 2003-IEEE Computer

TL;DR: The other source of power dissipation in microprocessors, dynamic power, arises from the repeated capacitance charge and discharge on the output of the hundreds of millions of gates in today's chips.

...read moreread less

Abstract: Off-state leakage is static power, current that leaks through transistors even when they are turned off. The other source of power dissipation in today's microprocessors, dynamic power, arises from the repeated capacitance charge and discharge on the output of the hundreds of millions of gates in today's chips. Until recently, only dynamic power has been a significant source of power consumption, and Moore's law helped control it. However, power consumption has now become a primary microprocessor design constraint; one that researchers in both industry and academia will struggle to overcome in the next few years. Microprocessor design has traditionally focused on dynamic power consumption as a limiting factor in system integration. As feature sizes shrink below 0.1 micron, static power is posing new low-power design challenges.

...read moreread less

1,233 citations

Proceedings Article•DOI•

Razor: a low-power pipeline based on circuit-level timing speculation

[...]

Daniel J. Ernst¹, Nam Sung Kim¹, Shidhartha Das¹, Sanjay Pant¹, Rajeev R. Rao¹, Toan Pham¹, Conrad H. Ziesler¹, David Blaauw¹, Todd Austin¹, Krisztian Flautner, Trevor Mudge¹ - Show less +7 more•Institutions (1)

University of Michigan¹

03 Dec 2003

TL;DR: A solution by which the circuit can be operated even below the ‘critical’ voltage, so that no margins are required and thus more energy can be saved.

...read moreread less

Abstract: With increasing clock frequencies and silicon integration, power aware computing has become a critical concern in the design of embedded processors and systems-on-chip. One of the more effective and widely used methods for power-aware computing is dynamic voltage scaling (DVS). In order to obtain the maximum power savings from DVS, it is essential to scale the supply voltage as low as possible while ensuring correct operation of the processor. The critical voltage is chosen such that under a worst-case scenario of process and environmental variations, the processor always operates correctly. However, this approach leads to a very conservative supply voltage since such a worst-case combination of different variabilities is very rare. In this paper, we propose a new approach to DVS, called Razor, based on dynamic detection and correction of circuit timing errors. The key idea of Razor is to tune the supply voltage by monitoring the error rate during circuit operation, thereby eliminating the need for voltage margins and exploiting the data dependence of circuit delay. A Razor flip-flop is introduced that double-samples pipeline stage values, once with a fast clock and again with a time-borrowing delayed clock. A metastability-tolerant comparator then validates latch values sampled with the fast clock. In the event of timing error, a modified pipeline mispeculation recovery mechanism restores correct program state. A prototype Razor pipeline was designed in a 0.18 /spl mu/m technology and was analyzed. Razor energy overhead during normal operation is limited to 3.1%. Analyses of a full-custom multiplier and a SPICE-level Kogge-Stone adder model reveal that substantial energy savings are possible for these devices (up to 64.2%) with little impact on performance due to error recovery (less than 3%).

...read moreread less

1,137 citations

Journal Article•DOI•

Drowsy caches: simple techniques for reducing leakage power

[...]

Krisztian Flautner, Nam Sung Kim¹, Steve M. Martin¹, David Blaauw¹, Trevor Mudge¹ - Show less +1 more•Institutions (1)

University of Michigan¹

01 May 2002

TL;DR: It is argued that the use of drowsy caches can simplify the design and control of low-leakage caches, and avoid the need to completely turn off selected cache lines and lose their state.

...read moreread less

Abstract: On-chip caches represent a sizable fraction of the total power consumption of microprocessors. Although large caches can significantly improve performance, they have the potential to increase power consumption. As feature sizes shrink, the dominant component of this power loss will be leakage. However, during a fixed period of time the activity in a cache is only centered on a small subset of the lines. This behavior can be exploited to cut the leakage power of large caches by putting the cold cache lines into a state preserving, low-power drowsy mode. Moving lines into and out of drowsy state incurs a slight performance loss. In this paper we investigate policies and circuit techniques for implementing drowsy caches. We show that with simple architectural techniques, about 80%-90% of the cache lines can be maintained in a drowsy state without affecting performance by more than 1%. According to our projections, in a 0.07um CMOS process, drowsy caches will be able to reduce the total energy (static and dynamic) consumed in the caches by 50%-75%. We also argue that the use of drowsy caches can simplify the design and control of low-leakage caches, and avoid the need to completely turn off selected cache lines and lose their state.

...read moreread less

823 citations

Proceedings Article•DOI•

GPUWattch: enabling energy optimizations in GPGPUs

[...]

Jingwen Leng¹, Tayler Hetherington², Ahmed ElTantawy², Syed Zohaib Gilani³, Nam Sung Kim³, Tor M. Aamodt², Vijay Janapa Reddi¹ - Show less +3 more•Institutions (3)

University of Texas at Austin¹, University of British Columbia², University of Wisconsin-Madison³

23 Jun 2013

TL;DR: This work proposes a new GPGPU power model that is configurable, capable of cycle-level calculations, and carefully validated against real hardware measurements, and accurately tracks the power consumption trend over time.

...read moreread less

Abstract: General-purpose GPUs (GPGPUs) are becoming prevalent in mainstream computing, and performance per watt has emerged as a more crucial evaluation metric than peak performance. As such, GPU architects require robust tools that will enable them to quickly explore new ways to optimize GPGPUs for energy efficiency. We propose a new GPGPU power model that is configurable, capable of cycle-level calculations, and carefully validated against real hardware measurements. To achieve configurability, we use a bottom-up methodology and abstract parameters from the microarchitectural components as the model's inputs. We developed a rigorous suite of 80 microbenchmarks that we use to bound any modeling uncertainties and inaccuracies. The power model is comprehensively validated against measurements of two commercially available GPUs, and the measured error is within 9.9% and 13.4% for the two target GPUs (GTX 480 and Quadro FX5600). The model also accurately tracks the power consumption trend over time. We integrated the power model with the cycle-level simulator GPGPU-Sim and demonstrate the energy savings by utilizing dynamic voltage and frequency scaling (DVFS) and clock gating. Traditional DVFS reduces GPU energy consumption by 14.4% by leveraging within-kernel runtime variations. More finer-grained SM cluster-level DVFS improves the energy savings from 6.6% to 13.6% for those benchmarks that show clustered execution behavior. We also show that clock gating inactive lanes during divergence reduces dynamic power by 11.2%.

...read moreread less

558 citations

Journal Article•DOI•

Approximate Computing: A Survey

[...]

Qiang Xu¹, Todd Mytkowicz², Nam Sung Kim³•Institutions (3)

The Chinese University of Hong Kong¹, Microsoft², University of Illinois at Urbana–Champaign³

01 Feb 2016-IEEE Design & Test of Computers

TL;DR: This paper presents a survey of state-of-the-art work in all aspects of approximate computing and highlights future research challenges in this field.

...read moreread less

Abstract: As one of the most promising energy-efficient computing paradigms, approximate computing has gained a lot of research attention in the past few years. This paper presents a survey of state-of-the-art work in all aspects of approximate computing and highlights future research challenges in this field.

...read moreread less

420 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63

Collapse

Cited by

PDF

Open Access

More filters

Journal Article•DOI•

Leakage current mechanisms and leakage reduction techniques in deep-submicrometer CMOS circuits

[...]

Kaushik Roy¹, Saibal Mukhopadhyay¹, H. Mahmoodi-Meimand¹•Institutions (1)

Purdue University¹

29 Apr 2003

TL;DR: Channel engineering techniques including retrograde well and halo doping are explained as means to manage short-channel effects for continuous scaling of CMOS devices and different circuit techniques to reduce the leakage power consumption are explored.

...read moreread less

Abstract: High leakage current in deep-submicrometer regimes is becoming a significant contributor to power dissipation of CMOS circuits as threshold voltage, channel length, and gate oxide thickness are reduced. Consequently, the identification and modeling of different leakage components is very important for estimation and reduction of leakage power, especially for low-power applications. This paper reviews various transistor intrinsic leakage mechanisms, including weak inversion, drain-induced barrier lowering, gate-induced drain leakage, and gate oxide tunneling. Channel engineering techniques including retrograde well and halo doping are explained as means to manage short-channel effects for continuous scaling of CMOS devices. Finally, the paper explores different circuit techniques to reduce the leakage power consumption.

...read moreread less

2,281 citations

Journal Article•DOI•

Designing reliable systems from unreliable components: the challenges of transistor variability and degradation

[...]

Shekhar Borkar¹•Institutions (1)

Intel¹

01 Nov 2005-IEEE Micro

TL;DR: This article discusses effects of variability in transistor performance and proposes microarchitecture, circuit, and testing research that focuses on designing with many unreliable components (transistors) to yield reliable system designs.

...read moreread less

Abstract: As technology scales, variability in transistor performance continues to increase, making transistors less and less reliable. This creates several challenges in building reliable systems, from the unpredictability of delay to increasing leakage current. Finding solutions to these challenges require a concerted effort on the part of all the players in a system design. This article discusses these effects and proposes microarchitecture, circuit, and testing research that focuses on designing with many unreliable components (transistors) to yield reliable system designs.

...read moreread less

1,421 citations

Journal Article•DOI•

Leakage current: Moore's law meets static power

[...]

Nam Sung Kim¹, Todd Austin¹, D. Baauw¹, Trevor Mudge¹, Krisztian Flautner, Jie Hu², Mary Jane Irwin², Mahmut Kandemir², Vijaykrishnan Narayanan² - Show less +5 more•Institutions (2)

University of Michigan¹, Pennsylvania State University²

01 Dec 2003-IEEE Computer

...read moreread less

1,233 citations

Journal Article•DOI•

Edge Intelligence: Paving the Last Mile of Artificial Intelligence With Edge Computing

[...]

Zhi Zhou¹, Xu Chen¹, En Li¹, Liekang Zeng¹, Ke Luo¹, Junshan Zhang² - Show less +2 more•Institutions (2)

Sun Yat-sen University¹, Arizona State University²

12 Jun 2019

TL;DR: A comprehensive survey of the recent research efforts on edge intelligence can be found in this paper, where the authors review the background and motivation for AI running at the network edge and provide an overview of the overarching architectures, frameworks, and emerging key technologies for deep learning model toward training/inference at the edge.

...read moreread less

Abstract: With the breakthroughs in deep learning, the recent years have witnessed a booming of artificial intelligence (AI) applications and services, spanning from personal assistant to recommendation systems to video/audio surveillance. More recently, with the proliferation of mobile computing and Internet of Things (IoT), billions of mobile and IoT devices are connected to the Internet, generating zillions bytes of data at the network edge. Driving by this trend, there is an urgent need to push the AI frontiers to the network edge so as to fully unleash the potential of the edge big data. To meet this demand, edge computing, an emerging paradigm that pushes computing tasks and services from the network core to the network edge, has been widely recognized as a promising solution. The resulted new interdiscipline, edge AI or edge intelligence (EI), is beginning to receive a tremendous amount of interest. However, research on EI is still in its infancy stage, and a dedicated venue for exchanging the recent advances of EI is highly desired by both the computer system and AI communities. To this end, we conduct a comprehensive survey of the recent research efforts on EI. Specifically, we first review the background and motivation for AI running at the network edge. We then provide an overview of the overarching architectures, frameworks, and emerging key technologies for deep learning model toward training/inference at the network edge. Finally, we discuss future research opportunities on EI. We believe that this survey will elicit escalating attentions, stimulate fruitful discussions, and inspire further research ideas on EI.

...read moreread less

977 citations

Journal Article•DOI•

Deep Learning With Edge Computing: A Review

[...]

Jiasi Chen¹, Xukan Ran¹•Institutions (1)

University of California, Riverside¹

15 Jul 2019

TL;DR: This paper will provide an overview of applications where deep learning is used at the network edge, discuss various approaches for quickly executing deep learning inference across a combination of end devices, edge servers, and the cloud, and describe the methods for training deep learning models across multiple edge devices.

...read moreread less

Abstract: Deep learning is currently widely used in a variety of applications, including computer vision and natural language processing. End devices, such as smartphones and Internet-of-Things sensors, are generating data that need to be analyzed in real time using deep learning or used to train deep learning models. However, deep learning inference and training require substantial computation resources to run quickly. Edge computing, where a fine mesh of compute nodes are placed close to end devices, is a viable way to meet the high computation and low-latency requirements of deep learning on edge devices and also provides additional benefits in terms of privacy, bandwidth efficiency, and scalability. This paper aims to provide a comprehensive review of the current state of the art at the intersection of deep learning and edge computing. Specifically, it will provide an overview of applications where deep learning is used at the network edge, discuss various approaches for quickly executing deep learning inference across a combination of end devices, edge servers, and the cloud, and describe the methods for training deep learning models across multiple edge devices. It will also discuss open challenges in terms of systems performance, network technologies and management, benchmarks, and privacy. The reader will take away the following concepts from this paper: understanding scenarios where deep learning at the network edge can be useful, understanding common techniques for speeding up deep learning inference and performing distributed training on edge devices, and understanding recent trends and opportunities.

...read moreread less

793 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200

Collapse