Author

Larkhoon Leem

Other affiliations: Stanford University
Bio: Larkhoon Leem is an academic researcher from Intel. The author has contributed to research in the topics Spintronics and NAND gate. The author has an h-index of 10 and has co-authored 16 publications receiving 1091 citations. Previous affiliations of Larkhoon Leem include Stanford University.

Papers
Proceedings ArticleDOI
11 Nov 2006
TL;DR: This work has implemented a complete programming system, including a compiler and runtime systems for Cell processor-based blade systems and distributed memory clusters, and demonstrates efficient performance running Sequoia programs on both of these platforms.
Abstract: We present Sequoia, a programming language designed to facilitate the development of memory hierarchy aware parallel programs that remain portable across modern machines featuring different memory hierarchy configurations. Sequoia abstractly exposes hierarchical memory in the programming model and provides language mechanisms to describe communication vertically through the machine and to localize computation to particular memory locations within it. We have implemented a complete programming system, including a compiler and runtime systems for Cell processor-based blade systems and distributed memory clusters, and demonstrate efficient performance running Sequoia programs on both of these platforms.
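As a rough illustration of the idea in this abstract (not Sequoia code, which is its own language), the hypothetical Python sketch below blocks a matrix multiply recursively so that each level of the task tree targets a smaller assumed "memory level"; the level capacities and the NumPy leaf kernel are assumptions chosen only for illustration.

```python
# Hypothetical illustration of Sequoia's idea: map a recursive task tree
# onto a memory hierarchy by blocking until the working set fits a level.
# The level capacities below are made-up numbers, not real machine values.
import numpy as np

LEVEL_CAPACITY = [2**20, 2**14]  # assumed element budgets: "main memory", "local store"

def matmul_hier(A, B, C, level=0):
    """Multiply A @ B into C, recursively blocking for each memory level."""
    n = A.shape[0]
    working_set = 3 * n * n  # elements touched by this task
    if level == len(LEVEL_CAPACITY) - 1 or working_set <= LEVEL_CAPACITY[level + 1]:
        # Leaf task: the working set fits the innermost level; run the kernel.
        C += A @ B
        return
    h = n // 2
    # Inner tasks: each child works on quadrant blocks, i.e. a smaller
    # working set "copied down" one level of the hierarchy.
    for i in (0, h):
        for j in (0, h):
            for k in (0, h):
                matmul_hier(A[i:i+h, k:k+h], B[k:k+h, j:j+h],
                            C[i:i+h, j:j+h], level + 1)

if __name__ == "__main__":
    n = 256
    A, B = np.random.rand(n, n), np.random.rand(n, n)
    C = np.zeros((n, n))
    matmul_hier(A, B, C)
    print("max error vs. np.dot:", np.max(np.abs(C - A @ B)))
```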

482 citations

Proceedings ArticleDOI
Larkhoon Leem, Hyungmin Cho, Jason Bau, Quinn Jacobson, Subhasish Mitra
08 Mar 2010
TL;DR: Error Resilient System Architecture (ERSA) is presented: a low-cost, robust system architecture for emerging killer probabilistic applications such as Recognition, Mining and Synthesis (RMS), which may also be adapted for general-purpose applications that are less resilient to errors.
Abstract: There is a growing concern about the increasing vulnerability of future computing systems to errors in the underlying hardware. Traditional redundancy techniques are expensive for designing energy-efficient systems that are resilient to high error rates. We present Error Resilient System Architecture (ERSA), a low-cost robust system architecture for emerging killer probabilistic applications such as Recognition, Mining and Synthesis (RMS) applications. While resilience of such applications to errors in low-order bits of data is well-known, execution of such applications on error-prone hardware significantly degrades output quality (due to high-order bit errors and crashes). ERSA achieves high error resilience to high-order bit errors and control errors (in addition to low-order bit errors) using a judicious combination of 3 key ideas: (1) asymmetric reliability in many-core architectures, (2) error-resilient algorithms at the core of probabilistic applications, and (3) intelligent software optimizations. Error injection experiments on a multi-core ERSA hardware prototype demonstrate that, even at very high error rates of 20,000 errors/second/core or 2×10⁻⁴ error/cycle/core (with errors injected in architecturally-visible registers), ERSA maintains 90% or better accuracy of output results, together with minimal impact on execution time, for probabilistic applications such as K-Means clustering, LDPC decoding and Bayesian networks. Moreover, we demonstrate the effectiveness of ERSA in tolerating high rates of static memory errors that are characteristic of emerging challenges such as Vccmin problems and erratic bit errors. Using the concept of configurable reliability, ERSA platforms may also be adapted for general-purpose applications that are less resilient to errors (but at higher costs).
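As a hedged illustration of the style of experiment described above, the hypothetical Python sketch below injects random bit flips into K-Means centroid values and applies a simple software plausibility check of the kind an error-resilient algorithm can afford; the injection rate, data set, and the specific check are assumptions for illustration and do not reproduce the ERSA prototype.

```python
# Hypothetical sketch of error injection into an error-tolerant kernel.
# Rates, data, and the sanity check are illustrative assumptions only.
import numpy as np

rng = np.random.default_rng(0)

def flip_random_bit(x):
    """Flip one random bit in a float64 value and return the result."""
    bits = np.float64(x).view(np.uint64)
    bits ^= np.uint64(1) << np.uint64(rng.integers(0, 64))
    return bits.view(np.float64)

def kmeans(data, k, iters=20, error_prob=0.0, lo=-10.0, hi=10.0):
    centroids = data[rng.choice(len(data), k, replace=False)].copy()
    for _ in range(iters):
        # Assignment step.
        dists = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step, with optional fault injection into centroid words.
        for c in range(k):
            members = data[labels == c]
            if len(members):
                centroids[c] = members.mean(axis=0)
            for d in range(data.shape[1]):
                if rng.random() < error_prob:
                    corrupted = flip_random_bit(centroids[c, d])
                    # Software sanity check of the kind an error-resilient
                    # algorithm can use: keep only plausible values; discard
                    # non-finite results or values far outside the data range.
                    if np.isfinite(corrupted) and lo <= corrupted <= hi:
                        centroids[c, d] = corrupted
    return centroids

if __name__ == "__main__":
    data = np.vstack([rng.normal(m, 0.3, size=(200, 2)) for m in (-3, 0, 3)])
    clean = kmeans(data, k=3)
    faulty = kmeans(data, k=3, error_prob=0.05)
    print("clean centroids:\n", clean[clean[:, 0].argsort()])
    print("centroids with injected faults + checks:\n", faulty[faulty[:, 0].argsort()])
```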

246 citations

Journal ArticleDOI
TL;DR: Error Resilient System Architecture (ERSA) is presented: a low-cost, robust system architecture for emerging killer probabilistic applications such as Recognition, Mining and Synthesis (RMS), which may also be adapted for general-purpose applications that are less resilient to errors.
Abstract: There is a growing concern about the increasing vulnerability of future computing systems to errors in the underlying hardware. Traditional redundancy techniques are expensive for designing energy-efficient systems that are resilient to high error rates. We present Error Resilient System Architecture (ERSA), a robust system architecture which targets emerging killer applications such as recognition, mining, and synthesis (RMS) with inherent error resilience, and ensures high degrees of resilience at low cost. Using the concept of configurable reliability, ERSA may also be adapted for general-purpose applications that are less resilient to errors (but at higher costs). While resilience of RMS applications to errors in low-order bits of data is well-known, execution of such applications on error-prone hardware significantly degrades output quality (due to high-order bit errors and crashes). ERSA achieves high error resilience to high-order bit errors and control flow errors (in addition to low-order bit errors) using a judicious combination of the following key ideas: 1) asymmetric reliability in many-core architectures; 2) error-resilient algorithms at the core of probabilistic applications; and 3) intelligent software optimizations. Error injection experiments on a multicore ERSA hardware prototype demonstrate that, even at very high error rates of 20 errors/flip-flop/10⁸ cycles (equivalent to 25000 errors/core/s), ERSA maintains 90% or better accuracy of output results, together with minimal impact on execution time, for probabilistic applications such as K-Means clustering, LDPC decoding, and Bayesian network inference. In addition, we demonstrate the effectiveness of ERSA in tolerating high rates of static memory errors that are characteristic of emerging challenges related to SRAM Vccmin problems and erratic bit errors.
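The two error-rate figures in this journal version (20 errors/flip-flop/10⁸ cycles and roughly 25000 errors/core/s) are linked by the clock rate and the number of error-prone flip-flops per core, neither of which is stated on this page. The conference version's figures above imply a clock near 10⁸ cycles/s; the short Python check below works out, under that assumption, the flip-flop count that would make the two numbers consistent. Both derived values are back-of-the-envelope illustrations, not figures from the paper.

```python
# Back-of-the-envelope link between the two quoted error rates. The clock
# rate is inferred from the conference version's figures; the flip-flop
# count is whatever makes the two numbers consistent. Both are assumptions.
errors_per_ff_per_cycle = 20 / 1e8          # 20 errors/flip-flop/10^8 cycles
clock_hz = 20_000 / 2e-4                    # conference figures imply ~10^8 cycles/s
errors_per_core_per_sec = 25_000            # journal figure

flip_flops_per_core = errors_per_core_per_sec / (errors_per_ff_per_cycle * clock_hz)
print(f"implied error-prone flip-flops per core: {flip_flops_per_core:,.0f}")  # ~1,250
```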

199 citations

Patent
Larkhoon Leem, Xin Guo, Ravi H. Motwani, Rosanna Yee, Scott E. Nelson
27 Sep 2013
TL;DR: In this paper, an apparatus, system, and method are presented for performing an error recovery operation on a read of a block of memory cells in a storage device: each decoding iteration combines the current read values with at least one value saved during the previous iteration to form symbols, from which bit reliability metrics are determined and decoded.
Abstract: Provided are an apparatus, system, and method for performing an error recovery operation with respect to a read of a block of memory cells in a storage device. A current iteration of a decoding operation is performed by applying at least one reference voltage for the current iteration to a block of the memory cells in the storage device to determine current read values in response to applying the reference voltage. A symbol is generated for each of the read memory cells by combining the determined current read value with at least one value saved during the previous iteration. The symbols are used to determine bit reliability metrics for the block of memory cells. The bit reliability metrics are decoded. In response to the decoding failing, an additional iteration of the decoding operation is performed.
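The claim describes an iterative read-retry flow: each pass senses the cells at a new reference voltage, combines the new hard read with information saved from earlier passes into a symbol, maps symbols to bit-reliability metrics, and retries decoding. The Python sketch below is a hypothetical toy of that flow for a single-level-cell-like model; the threshold-voltage model, the symbol-to-reliability mapping, and the stand-in repetition "decoder" are all illustrative assumptions, not the patented method.

```python
# Hypothetical toy of iterative soft-read error recovery on an SLC-like cell.
# Voltage model, reliability mapping, and the stand-in decoder are assumptions.
import numpy as np

rng = np.random.default_rng(1)

def sense(vth, vref):
    """Hard read: a cell reads as 1 if its threshold voltage is below vref."""
    return (vth < vref).astype(np.uint8)

def symbols_to_llr(symbols, n_reads):
    """Map the count of 1-reads across passes to a crude reliability metric."""
    return 2.0 * symbols - n_reads      # >0 leans toward 1, magnitude = confidence

def try_decode(llr, n_data_bits):
    """Stand-in decoder: a 3x repetition code per stored bit."""
    per_bit = llr.reshape(n_data_bits, 3).sum(axis=1)
    decoded = (per_bit > 0).astype(np.uint8)
    confident = np.all(np.abs(per_bit) >= 2)   # fail if any bit is still marginal
    return decoded, confident

# Store each data bit 3 times; model cell threshold voltages with noise.
data = rng.integers(0, 2, size=16)
stored = np.repeat(data, 3)
vth = np.where(stored == 1, 1.0, 3.0) + rng.normal(0, 0.8, size=stored.size)

vrefs = [2.0, 1.6, 2.4, 1.2, 2.8]     # reference voltages tried per iteration
symbols = np.zeros(stored.size)       # accumulated information from previous passes
for it, vref in enumerate(vrefs, start=1):
    symbols += sense(vth, vref)       # combine current read with saved values
    llr = symbols_to_llr(symbols, it)
    decoded, ok = try_decode(llr, data.size)
    print(f"pass {it}: decode {'succeeded' if ok else 'failed'}")
    if ok:
        break
print("bit errors after recovery:", int(np.sum(decoded != data)))
```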

59 citations

Journal ArticleDOI
TL;DR: The magnetic coupled spin-torque device (MCSTD), as discussed by the authors, is a new spintronics device architecture that is free from the difficulties of spin transport and spin detection; it uses the spin-torque transfer technique and magnetic coupling to modulate its energy barrier.
Abstract: The magnetic coupled spin-torque device (MCSTD) is a new spintronics device architecture that is free from the difficulties of spin transport and spin detection. It uses the spin-torque transfer technique and magnetic coupling to modulate its energy barrier. Requirements for a logic device (fast switching speed, inherent gain, and inversion capability) and spin interconnection techniques for the MCSTD are presented. Logic functionalities such as NAND, NOR, and NOT are successfully demonstrated in micromagnetic simulations.
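The NAND/NOR/NOT functionality reported here comes from micromagnetic simulations of coupled nanomagnets, which are beyond a short example, but the logic-level behaviour of a coupled-input magnetic gate can be caricatured with a simple threshold model: each magnet's state is ±1, the output follows the sign of the weighted input couplings plus a bias, and the sign of the coupling determines whether the gate inverts. Everything in the Python sketch below (weights, bias values) is a toy assumption, not the MCSTD device physics.

```python
# Toy threshold model of coupled-magnet logic; magnet states are +1/-1.
# Coupling weights and bias terms are illustrative assumptions only.
def magnet_gate(a, b, coupling, bias):
    """Output magnet settles to the sign of its net (coupling + bias) field."""
    field = coupling * (a + b) + bias
    return 1 if field > 0 else -1

GATES = {
    "NAND": dict(coupling=-1, bias=+1),   # inverting coupling, positive bias
    "NOR":  dict(coupling=-1, bias=-1),   # inverting coupling, negative bias
    "AND":  dict(coupling=+1, bias=-1),
    "OR":   dict(coupling=+1, bias=+1),
}

if __name__ == "__main__":
    to_bit = lambda s: (s + 1) // 2       # map -1/+1 back to 0/1 for printing
    for name, params in GATES.items():
        rows = [(to_bit(a), to_bit(b), to_bit(magnet_gate(a, b, **params)))
                for a in (-1, 1) for b in (-1, 1)]
        print(name, rows)
```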

28 citations


Cited by
01 May 1993
TL;DR: Comparing the results to the fastest reported vectorized Cray Y-MP and C90 algorithm shows that the current generation of parallel machines is competitive with conventional vector supercomputers even for small problems.
Abstract: Three parallel algorithms for classical molecular dynamics are presented. The first assigns each processor a fixed subset of atoms; the second assigns each a fixed subset of inter-atomic forces to compute; the third assigns each a fixed spatial region. The algorithms are suitable for molecular dynamics models which can be difficult to parallelize efficiently: those with short-range forces where the neighbors of each atom change rapidly. They can be implemented on any distributed-memory parallel machine which allows for message-passing of data between independently executing processors. The algorithms are tested on a standard Lennard-Jones benchmark problem for system sizes ranging from 500 to 100,000,000 atoms on several parallel supercomputers: the nCUBE 2, Intel iPSC/860 and Paragon, and Cray T3D. Comparing the results to the fastest reported vectorized Cray Y-MP and C90 algorithm shows that the current generation of parallel machines is competitive with conventional vector supercomputers even for small problems. For large problems, the spatial algorithm achieves parallel efficiencies of 90% and an 1840-node Intel Paragon performs up to 165 times faster than a single Cray C90 processor. Trade-offs between the three algorithms and guidelines for adapting them to more complex molecular dynamics simulations are also discussed.
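Of the three decompositions compared above, the spatial one assigns each processor a fixed region of the simulation box, so only atoms near region boundaries need to be exchanged. The Python sketch below is a hypothetical, serial illustration of that assignment and of counting which atoms would need to be communicated as "ghosts" for a given cutoff; the box size, cutoff, and 1-D strip decomposition are assumptions for illustration, not the paper's implementations.

```python
# Serial illustration of spatial decomposition for short-range MD:
# assign atoms to per-processor strips and count ghost atoms to exchange.
# Box size, cutoff, and strip layout are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(2)
box = 10.0          # periodic box edge length (assumed)
cutoff = 1.0        # short-range interaction cutoff (assumed)
nprocs = 4          # processors arranged as strips along x (assumed)
atoms = rng.uniform(0.0, box, size=(2000, 3))

strip_w = box / nprocs
owner = (atoms[:, 0] // strip_w).astype(int)        # which strip owns each atom

for p in range(nprocs):
    mine = atoms[owner == p]
    lo, hi = p * strip_w, (p + 1) * strip_w
    # Ghost atoms: owned atoms within `cutoff` of either strip face must be
    # sent to the neighboring processor so it can compute boundary forces.
    send_left = np.sum(mine[:, 0] - lo < cutoff)
    send_right = np.sum(hi - mine[:, 0] < cutoff)
    print(f"proc {p}: owns {len(mine):4d} atoms, "
          f"ghosts to left {send_left}, to right {send_right}")
```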

29,323 citations

Proceedings ArticleDOI
16 Jun 2013
TL;DR: A systematic model of the tradeoff space fundamental to stencil pipelines is presented, along with a schedule representation that describes concrete points in this space for each stage of an image processing pipeline and an optimizing compiler for the Halide image processing language that synthesizes high-performance implementations from a Halide algorithm and a schedule.
Abstract: Image processing pipelines combine the challenges of stencil computations and stream programs. They are composed of large graphs of different stencil stages, as well as complex reductions, and stages with global or data-dependent access patterns. Because of their complex structure, the performance difference between a naive implementation of a pipeline and an optimized one is often an order of magnitude. Efficient implementations require optimization of both parallelism and locality, but due to the nature of stencils, there is a fundamental tension between parallelism, locality, and introducing redundant recomputation of shared values. We present a systematic model of the tradeoff space fundamental to stencil pipelines, a schedule representation which describes concrete points in this space for each stage in an image processing pipeline, and an optimizing compiler for the Halide image processing language that synthesizes high performance implementations from a Halide algorithm and a schedule. Combining this compiler with stochastic search over the space of schedules enables terse, composable programs to achieve state-of-the-art performance on a wide range of real image processing pipelines, and across different hardware architectures, including multicores with SIMD, and heterogeneous CPU+GPU execution. From simple Halide programs written in a few hours, we demonstrate performance up to 5x faster than hand-tuned C, intrinsics, and CUDA implementations optimized by experts over weeks or months, for image processing applications beyond the reach of past automatic compilers.
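The tradeoff the paper models, between computing a whole intermediate stage (good parallelism and reuse, poor locality) and recomputing it per output tile (good locality, redundant work), can be seen even in a toy two-stage blur. The NumPy sketch below is a hypothetical illustration of those two schedules, not Halide code; the tile size and 3-point blur are assumptions.

```python
# Two schedules for a 1-D two-stage blur: "breadth-first" (materialize the
# whole intermediate) vs. "tiled/fused" (recompute the intermediate per tile
# with overlapping edges). Tile size and stencil are illustrative assumptions.
import numpy as np

def blur3(x):
    """3-point box blur; shrinks the array by 2."""
    return (x[:-2] + x[1:-1] + x[2:]) / 3.0

def breadth_first(inp):
    tmp = blur3(inp)          # entire intermediate stored: poor locality,
    return blur3(tmp)         # but each tmp value is computed exactly once

def tiled_fused(inp, tile=64):
    n_out = len(inp) - 4      # two blur stages each shrink the array by 2
    out = np.empty(n_out)
    for start in range(0, n_out, tile):
        stop = min(start + tile, n_out)
        # Each tile recomputes a slightly enlarged slice of the intermediate
        # (2 elements of overlap): redundant work, but the working set stays
        # small and cache-resident.
        tmp = blur3(inp[start:stop + 4])
        out[start:stop] = blur3(tmp)
    return out

if __name__ == "__main__":
    x = np.random.rand(1 << 16)
    a, b = breadth_first(x), tiled_fused(x)
    print("schedules agree:", np.allclose(a, b))
```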

1,074 citations

Proceedings ArticleDOI
27 May 2013
TL;DR: This paper reviews recent progress in the area, including design of approximate arithmetic blocks, pertinent error and quality measures, and algorithm-level techniques for approximate computing.
Abstract: Approximate computing has recently emerged as a promising approach to energy-efficient design of digital systems. Approximate computing relies on the ability of many systems and applications to tolerate some loss of quality or optimality in the computed result. By relaxing the need for fully precise or completely deterministic operations, approximate computing techniques allow substantially improved energy efficiency. This paper reviews recent progress in the area, including design of approximate arithmetic blocks, pertinent error and quality measures, and algorithm-level techniques for approximate computing.
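One concrete instance of the "approximate arithmetic blocks" and "error and quality measures" mentioned above is a lower-part-OR adder: the low k bits are combined with a bitwise OR instead of a carry chain, trading accuracy in the low-order bits for a shorter critical path. The Python sketch below models a simple variant of that adder functionally and reports two common quality metrics over random inputs; the bit widths and the no-carry-into-upper-part simplification are assumptions for illustration.

```python
# Functional model of a simple lower-part-OR approximate adder, plus two
# common error metrics. Bit widths (16-bit operands, 8 approximate low bits)
# and the omission of a carry into the upper part are illustrative assumptions.
import numpy as np

WIDTH, K = 16, 8                 # operand width, number of approximate low bits
LOW_MASK = (1 << K) - 1

def approx_add(a, b):
    """Upper bits added exactly; low K bits merged with OR (no carry chain)."""
    upper = ((a >> K) + (b >> K)) << K
    lower = (a | b) & LOW_MASK
    return upper + lower

if __name__ == "__main__":
    rng = np.random.default_rng(3)
    a = rng.integers(0, 1 << WIDTH, size=100_000)
    b = rng.integers(0, 1 << WIDTH, size=100_000)
    exact = a + b
    approx = approx_add(a, b)
    err = np.abs(exact - approx)
    print(f"error rate:          {np.mean(err != 0):.3f}")   # fraction of wrong sums
    print(f"mean error distance: {err.mean():.1f}")          # average |exact - approx|
    print(f"worst-case error:    {err.max()}")
```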

921 citations

Journal ArticleDOI
TL;DR: Energy efficiency is the new fundamental limiter of processor performance, way beyond numbers of processors.
Abstract: Energy efficiency is the new fundamental limiter of processor performance, way beyond numbers of processors.

920 citations

Journal ArticleDOI
TL;DR: This work proposes a spintronic device that uses spin at every stage of its operation and shows the five essential characteristics for logic applications: concatenability, nonlinearity, feedback elimination, gain and a complete set of Boolean operations.
Abstract: A spintronic device in which the input, output and internal states are all represented by spin, and that shows the five essential characteristics necessary for logic applications, is proposed.

769 citations