Author

Krit Athikulwongse

Bio: Krit Athikulwongse is an academic researcher from the Georgia Institute of Technology. The author has contributed to research in topics: Three-dimensional integrated circuit & Memory bandwidth. The author has an h-index of 14 and has co-authored 25 publications receiving 990 citations. Previous affiliations of Krit Athikulwongse include the Thailand National Science and Technology Development Agency & NECTEC.

Papers
Proceedings ArticleDOI
02 Nov 2009
TL;DR: A new force-directed 3D gate-level placement that efficiently handles TSVs is presented, along with an algorithm that assigns TSVs to nets to complete routing that involves TSVs.
Abstract: Through-Silicon-Via (TSV) is the enabling technology for the fine-grained 3D integration of multiple dies into a single stack. These TSVs occupy non-negligible silicon area because of their sheer size. This significant silicon area occupied by the TSVs and the interconnections made to the TSVs greatly affect the area, power, performance, and reliability of 3D IC layouts. Well-managed TSVs alleviate congestion, reduce wirelength, and improve performance, whereas excessive TSVs not only increase the die area but also have a negative impact on many design objectives. In this paper, we study the impact of TSVs on various aspects of 3D layouts. We use GDSII layouts of 2D and 3D designs and thoroughly compare the pros and cons of TSV usage. We propose a new force-directed 3D gate-level placement that efficiently handles TSVs. In addition, we present an algorithm that assigns TSVs to nets to complete routing that involves TSVs. This algorithm, together with our 3D placer, is integrated into a commercial P&R tool to generate fully validated GDSII layouts. Our experiments on synthesized benchmarks indicate that our algorithms help generate GDSII layouts of 3D designs that are optimized in terms of area, wirelength, and metal layer count.
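As an illustration of the idea, here is a minimal, self-contained sketch of force-directed placement with centroid-based TSV-to-net assignment; the names (Cell, place_3d) and the toy netlist are illustrative, not the paper's implementation.

```python
# Minimal sketch of force-directed 3D placement with TSV assignment.
# Toy model only: real placers add repulsion, density, and legalization.
import random

class Cell:
    def __init__(self, x, y, tier):
        self.x, self.y, self.tier = x, y, tier

def place_3d(cells, nets, iters=100, step=0.1):
    """cells: list[Cell]; nets: list of index lists into `cells`."""
    for _ in range(iters):
        forces = [[0.0, 0.0] for _ in cells]
        for net in nets:
            # Attractive force: pull each pin toward the net centroid.
            cx = sum(cells[i].x for i in net) / len(net)
            cy = sum(cells[i].y for i in net) / len(net)
            for i in net:
                forces[i][0] += cx - cells[i].x
                forces[i][1] += cy - cells[i].y
        for c, (fx, fy) in zip(cells, forces):
            c.x += step * fx
            c.y += step * fy
    # Assign one TSV per tier-crossing net, placed at the net centroid --
    # a simple stand-in for the paper's TSV-to-net assignment step.
    tsvs = []
    for net in nets:
        if len({cells[i].tier for i in net}) > 1:
            cx = sum(cells[i].x for i in net) / len(net)
            cy = sum(cells[i].y for i in net) / len(net)
            tsvs.append((cx, cy))
    return tsvs

random.seed(0)
cells = [Cell(random.random(), random.random(), i % 2) for i in range(8)]
nets = [[0, 1, 4], [2, 3], [5, 6, 7], [1, 5]]
print("TSV sites:", place_3d(cells, nets))
```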

214 citations

Book ChapterDOI
03 Apr 2012
TL;DR: 3D-MAPS (3D Massively Parallel Processor with Stacked Memory) is a two-tier 3D IC, where the logic die consists of 64 general-purpose processor cores running at 277 MHz, and the memory die contains 256 KB of SRAM.
Abstract: Several recent works have demonstrated the benefits of through-silicon-via (TSV) based 3D integration [1–4], but none of them involves a fully functioning multicore processor and memory stacking. 3D-MAPS (3D Massively Parallel Processor with Stacked Memory) is a two-tier 3D IC, where the logic die consists of 64 general-purpose processor cores running at 277 MHz, and the memory die contains 256 KB of SRAM (see Fig. 10.6.1). Fabrication is done using 130 nm GlobalFoundries device technology and Tezzaron TSV and bonding technology. Packaging is done by Amkor. The processor contains 33M transistors, 50K TSVs, and 50K face-to-face connections in a 5×5 mm² footprint. The chip runs at 1.5 V and consumes up to 4 W, resulting in a 16 W/cm² power density. The core architecture was developed from scratch to benefit from single-cycle access to SRAM.
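The quoted power density follows directly from the chip's numbers; a one-line check:

```python
# Back-of-envelope check of the quoted 16 W/cm^2 power density:
# 4 W spread over the 5 mm x 5 mm (0.25 cm^2) footprint.
power_w = 4.0
footprint_cm2 = 0.5 * 0.5          # 5 mm x 5 mm = 0.5 cm x 0.5 cm
print(power_w / footprint_cm2)     # -> 16.0 W/cm^2
```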

181 citations

Proceedings ArticleDOI
13 Jun 2010
TL;DR: A systematic TSV stress-aware timing analysis is proposed, and it is shown that stress-aware perturbation can reduce cell delay by up to 14.0% and critical path delay by 6.5% in a test case.
Abstract: As geometry shrinking faces severe limitations, 3D wafer stacking with through-silicon vias (TSVs) has gained interest for future SOC integration. Since the TSV fill material and silicon have different coefficients of thermal expansion (CTE), TSVs cause silicon deformation owing to the temperature difference between chip manufacturing and operation. The most widely used TSV fill material is copper, which causes tensile stress in the silicon near the TSV. In this paper, we propose a systematic TSV stress-aware timing analysis and show how to optimize the layout for better performance. First, we generate a stress contour map with an analytical radial stress model. Then, the tensile stress is converted to hole and electron mobility variations depending on the geometric relation between TSVs and transistors. A mobility-variation-aware cell library and netlist are generated and incorporated into an industrial timing engine for 3D-IC timing analysis. It is interesting to observe that rise and fall times react differently to stress and to relative locations with respect to TSVs. Overall, TSV stress-induced timing variations can be as much as ±10% for an individual cell. Thus, as an application for layout optimization, we can exploit the stress-induced mobility enhancement to improve timing on critical cells. We show that stress-aware perturbation can reduce cell delay by up to 14.0% and critical path delay by 6.5% in our test case.
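A hedged sketch of the stress-to-mobility chain described above: the 1/r² radial decay is the standard analytical form for a cylindrical inclusion, while the amplitude and piezoresistive coefficient below are placeholders, not the paper's calibrated values.

```python
# Stress -> mobility sketch. The 1/r^2 decay follows the classic Lame
# solution for a cylindrical inclusion; sigma0 and pi_coeff are
# illustrative placeholders, not calibrated values.
def radial_stress(r_um, tsv_radius_um=2.5, sigma0_mpa=250.0):
    """Tensile stress (MPa) at distance r from the TSV center."""
    r = max(r_um, tsv_radius_um)
    return sigma0_mpa * (tsv_radius_um / r) ** 2

def mobility_shift(sigma_mpa, pi_coeff=-3e-4):
    """Fractional mobility change; sign/magnitude differ for holes vs electrons."""
    return pi_coeff * sigma_mpa

for r in (2.5, 5.0, 10.0, 20.0):
    s = radial_stress(r)
    print(f"r={r:5.1f} um  stress={s:6.1f} MPa  dmu/mu={mobility_shift(s):+.2%}")
```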

117 citations

Proceedings ArticleDOI
07 Nov 2010
TL;DR: This paper proposes a new TSV stress-driven force-directed 3D placement that consistently provides placement results with, on average, 21.6% better worst negative slack (WNS) and 28.0% better total negative slack (TNS) than wirelength-driven placement.
Abstract: Through-silicon via (TSV) fabrication causes tensile stress around TSVs, which results in significant carrier mobility variation in nearby devices. A keep-out zone (KOZ) is a conservative way to prevent any devices/cells from being impacted by the TSV-induced stress. However, owing to the already large TSV size, a large KOZ can significantly reduce the placement area available for cells, thus requiring larger dies and negating the wirelength and timing improvements of 3D integration. In this paper, we study the impact of KOZ dimension on stress, carrier mobility variation, area, wirelength, and performance of 3D ICs. We demonstrate that, instead of requiring a large KOZ, 3D-IC placers should exploit TSV stress-induced carrier mobility variation to improve the timing and area objectives during placement. We propose a new TSV stress-driven force-directed 3D placement that consistently provides placement results with, on average, 21.6% better worst negative slack (WNS) and 28.0% better total negative slack (TNS) than wirelength-driven placement.
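To see why a large KOZ inflates die size, here is a rough area model; all dimensions and counts are illustrative assumptions, not the paper's data.

```python
# Rough model of placement area lost to keep-out zones (KOZs).
# All numbers are illustrative assumptions.
def koz_overhead(n_tsvs, tsv_width_um, koz_width_um, die_area_um2):
    # Each TSV blocks a square of (TSV width + KOZ ring on both sides).
    blocked = n_tsvs * (tsv_width_um + 2 * koz_width_um) ** 2
    return blocked / die_area_um2

die = 2500.0 * 2500.0                      # hypothetical 2.5 mm x 2.5 mm die
for koz in (0.0, 5.0, 10.0, 20.0):         # KOZ ring width in um
    frac = koz_overhead(n_tsvs=1000, tsv_width_um=5.0, koz_width_um=koz,
                        die_area_um2=die)
    print(f"KOZ={koz:4.1f} um -> {frac:.1%} of die blocked")
```

Even modest KOZ widths block a sizable fraction of the die once TSV counts reach the thousands, which is the motivation for exploiting the stress rather than fencing it off.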

94 citations

Proceedings ArticleDOI
01 Nov 2010
TL;DR: The design and analysis of 3D-MAPS, a 64-core 3D-stacked memory-on-processor running at 277 MHz with 63 GB/s memory bandwidth, sent for fabrication using Tezzaron's 3D stacking technology, is described.
Abstract: We describe the design and analysis of 3D-MAPS, a 64-core 3D-stacked memory-on-processor running at 277 MHz with 63 GB/s memory bandwidth, sent for fabrication using Tezzaron's 3D stacking technology. We also describe the design flow used to implement it using industrial 2D tools and custom add-ons to handle 3D specifics.
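A hedged sanity check on the bandwidth figure, assuming one 4-byte SRAM access per core per cycle (an assumption, not a quoted spec):

```python
# Raw aggregate bandwidth under the assumed access pattern; the result is
# in the same ballpark as the quoted 63 GB/s sustained figure.
cores, bytes_per_access, freq_hz = 64, 4, 277e6
print(cores * bytes_per_access * freq_hz / 1e9)   # -> ~70.9 GB/s peak
```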

78 citations


Cited by
ReportDOI
08 Dec 1998
TL;DR: In this report, the authors argue that the FCC should take the unique features of UWB technology into account when considering changes to Part 15, given UWB's value for radar and communications uses.
Abstract: In general, Micropower Impulse Radar (MIR) depends on Ultra-Wideband (UWB) transmission systems. UWB technology can supply innovative new systems and products that have an obvious value for radar and communications uses. Important applications include bridge-deck inspection systems, ground penetrating radar, mine detection, and precise distance resolution for such things as liquid level measurement. Most of these UWB inspection and measurement methods have some unique qualities, which need to be pursued. Therefore, in considering changes to Part 15 the FCC needs to take into account the unique features of UWB technology. MIR is applicable to two general types of UWB systems: radar systems and communications systems. Currently LLNL and its licensees are focusing on radar or radar type systems. LLNL is evaluating MIR for specialized communication systems. MIR is a relatively low power technology. Therefore, MIR systems seem to have a low potential for causing harmful interference to other users of the spectrum since the transmitted signal is spread over a wide bandwidth, which results in a relatively low spectral power density.
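The low spectral power density argument can be made concrete with a toy calculation; the transmit power and bandwidths below are hypothetical.

```python
# Spreading a fixed transmit power over a wider band lowers the power
# per unit of spectrum (PSD = P / B). Numbers are hypothetical.
p_watts = 1e-3
for bw_hz in (1e6, 100e6, 2e9):              # narrowband vs UWB-like spreads
    print(f"BW={bw_hz:.0e} Hz -> PSD={p_watts / bw_hz:.1e} W/Hz")
```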

644 citations

Proceedings ArticleDOI
23 Mar 2014
TL;DR: A number of key elements necessary in realizing efficient NDC operation are described and evaluated, including low-EPI cores, long daisy chains of memory devices, and the dynamic activation of cores and SerDes links.
Abstract: While Processing-in-Memory has been investigated for decades, it has not been embraced commercially. A number of emerging technologies have renewed interest in this topic. In particular, the emergence of 3D stacking and the imminent release of Micron's Hybrid Memory Cube device have made it more practical to move computation near memory. However, the literature is missing a detailed analysis of a killer application that can leverage a Near Data Computing (NDC) architecture. This paper focuses on in-memory MapReduce workloads that are commercially important and are especially suitable for NDC because of their embarrassing parallelism and largely localized memory accesses. The NDC architecture incorporates several simple processing cores on a separate, non-memory die in a 3D-stacked memory package; these cores can perform Map operations with efficient memory access and without hitting the bandwidth wall. This paper describes and evaluates a number of key elements necessary in realizing efficient NDC operation: (i) low-EPI cores, (ii) long daisy chains of memory devices, and (iii) the dynamic activation of cores and SerDes links. Compared to a baseline that is heavily optimized for MapReduce execution, the NDC design yields up to 15X reduction in execution time and 18X reduction in system energy.
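A toy roofline-style model of why bandwidth-bound Map phases benefit from in-stack cores; the bandwidth and workload numbers are illustrative assumptions, not the paper's measurements.

```python
# Execution time is bounded by either data movement or compute.
def exec_time(bytes_moved, flops, bw_gbs, gflops):
    return max(bytes_moved / (bw_gbs * 1e9), flops / (gflops * 1e9))

bytes_moved, flops = 64e9, 8e9               # bandwidth-bound Map phase
host = exec_time(bytes_moved, flops, bw_gbs=20, gflops=200)   # off-chip link
ndc  = exec_time(bytes_moved, flops, bw_gbs=320, gflops=50)   # in-stack cores
print(f"host {host:.2f} s, NDC {ndc:.2f} s, speedup {host / ndc:.1f}x")
```

The point is structural: weaker cores win when the phase is limited by bandwidth rather than compute.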

263 citations

Proceedings ArticleDOI
01 Feb 2015
TL;DR: This paper proposes near-DRAM acceleration (NDA) architectures, which process data using accelerators 3D-stacked on DRAM devices comprising off-chip main memory modules, substantially reducing energy consumption and improving performance.
Abstract: Energy consumed for transferring data across the processor memory hierarchy constitutes a large fraction of total system energy consumption, and this fraction has steadily increased with technology scaling. In this paper, we propose near-DRAM acceleration (NDA) architectures, which process data using accelerators 3D-stacked on DRAM devices comprising off-chip main memory modules. NDA transfers most data through high-bandwidth and low-energy 3D interconnects between accelerators and DRAM devices instead of low-bandwidth and high-energy off-chip interconnects between a processor and DRAM devices, substantially reducing energy consumption and improving performance. Unlike previous near-memory processing architectures, NDA is built upon commodity DRAM devices; apart from inserting through-silicon vias (TSVs) to 3D-interconnect DRAM devices and accelerators, NDA requires minimal changes to the commodity DRAM device and standard memory module architectures. This allows NDA to be more easily adopted in both existing and emerging systems. Our experiments demonstrate that, on average, our NDA-based system consumes 46% (68%) lower (data transfer) energy at 1.67× higher performance than a system that integrates the same accelerator logic within the processor itself.
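A sketch of the per-bit energy argument behind NDA; the pJ/bit figures are rough literature ballpark assumptions, not NDA's measured numbers.

```python
# Per-bit energy of off-chip links vs 3D TSV interconnect, applied to a
# hypothetical 1 GB transfer. Energy figures are ballpark assumptions.
off_chip_pj_per_bit = 20.0
tsv_pj_per_bit = 0.5
bits_moved = 8 * 1e9                         # hypothetical 1 GB transferred
for name, e in [("off-chip", off_chip_pj_per_bit), ("TSV", tsv_pj_per_bit)]:
    print(f"{name:8s}: {bits_moved * e / 1e12:.3f} J")
```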

251 citations

Journal ArticleDOI
18 Jun 2016
TL;DR: Extensive evaluations across a variety of modern memory-intensive GPU workloads show that TOM significantly improves performance compared to a baseline GPU system that cannot offload computation to 3D-stacked memories.
Abstract: Main memory bandwidth is a critical bottleneck for modern GPU systems due to limited off-chip pin bandwidth. 3D-stacked memory architectures provide a promising opportunity to significantly alleviate this bottleneck by directly connecting a logic layer to the DRAM layers with high bandwidth connections. Recent work has shown promising potential performance benefits from an architecture that connects multiple such 3D-stacked memories and offloads bandwidth-intensive computations to a GPU in each of the logic layers. An unsolved key challenge in such a system is how to enable computation offloading and data mapping to multiple 3D-stacked memories without burdening the programmer such that any application can transparently benefit from near-data processing capabilities in the logic layer. Our paper develops two new mechanisms to address this key challenge. First, a compiler-based technique that automatically identifies code to offload to a logic-layer GPU based on a simple cost-benefit analysis. Second, a software/hardware cooperative mechanism that predicts which memory pages will be accessed by offloaded code, and places those pages in the memory stack closest to the offloaded code, to minimize off-chip bandwidth consumption. We call the combination of these two programmer-transparent mechanisms TOM: Transparent Offloading and Mapping. Our extensive evaluations across a variety of modern memory-intensive GPU workloads show that, without requiring any program modification, TOM significantly improves performance (by 30% on average, and up to 76%) compared to a baseline GPU system that cannot offload computation to 3D-stacked memories.
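A minimal sketch of a cost-benefit offload test in the spirit of TOM's compiler analysis; the traffic estimates and threshold are illustrative, not the paper's exact model.

```python
# Offload a code block when it saves more off-chip traffic than it adds.
# Traffic estimates and threshold are illustrative assumptions.
def should_offload(bytes_loaded, bytes_stored, bytes_ctx, threshold=1.0):
    saved = bytes_loaded + bytes_stored      # traffic kept inside the stack
    added = bytes_ctx                        # context/results shipped instead
    return saved > threshold * added

print(should_offload(bytes_loaded=4096, bytes_stored=1024, bytes_ctx=256))  # True
print(should_offload(bytes_loaded=64,   bytes_stored=0,    bytes_ctx=256))  # False
```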

234 citations