Author

Martin D. F. Wong

Bio: Martin D. F. Wong is an academic researcher from the University of Illinois at Urbana–Champaign. The author has contributed to research in topics: Static routing & Routing (electronic design automation). The author has an h-index of 31 and has co-authored 206 publications receiving 3,707 citations. Previous affiliations of Martin D. F. Wong include Urbana University & University of Toronto.


Papers
Proceedings ArticleDOI
13 Jun 2010
TL;DR: A new GPU implementation of BFS that uses a hierarchical queue management technique and a three-layer kernel arrangement strategy; it guarantees the same computational complexity as the fastest sequential version and achieves up to 10 times speedup.
Abstract: Breadth-first search (BFS) has wide applications in electronic design automation (EDA) as well as in other fields. Researchers have tried to accelerate BFS on the GPU, but the two published works are both asymptotically slower than the fastest CPU implementation. In this paper, we present a new GPU implementation of BFS that uses a hierarchical queue management technique and a three-layer kernel arrangement strategy. It guarantees the same computational complexity as the fastest sequential version and can achieve up to 10 times speedup.
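The abstract does not spell out the hierarchical queue layout or the kernel arrangement; as a reference point, here is a minimal sequential sketch of the level-synchronous, frontier-queue BFS whose O(|V|+|E|) work bound the GPU version is claimed to match. The adjacency-list format and function name are illustrative assumptions.

```python
from collections import deque

def bfs_levels(adj, source):
    """Level-synchronous BFS over an adjacency list (dict of lists).

    Each vertex enters a frontier queue once and each edge is scanned
    once, giving the O(|V|+|E|) work bound that the paper's GPU version
    preserves with hierarchical per-block queues.
    """
    dist = {source: 0}
    frontier = deque([source])
    while frontier:
        next_frontier = deque()
        for u in frontier:          # one kernel launch per level on the GPU
            for v in adj[u]:        # neighbor expansion, parallel per thread
                if v not in dist:   # claimed atomically in the GPU version
                    dist[v] = dist[u] + 1
                    next_frontier.append(v)
        frontier = next_frontier
    return dist
```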

235 citations

Proceedings ArticleDOI
31 May 2005
TL;DR: Two iterative algorithms are presented, based on node-by-node and row-by-row traversals of the power grid respectively; both can be viewed as efficient implementations of the classical successive over-relaxation (SOR) method for solving linear systems.
Abstract: Due to the extremely large size of power grids, IR drop analysis has become a computationally challenging problem both in terms of runtime and memory usage. Although IR drop analysis can be naturally formulated as the problem of solving a linear system, the system is too large to be solved by existing linear solvers. In this paper, we present two iterative algorithms based on node-by-node traversals and row-by-row traversals of the power grid, respectively. Our algorithms are extremely fast and guarantee convergence to the exact solutions. In fact, they can be considered efficient implementations of the classical successive over-relaxation (SOR) iterative method for solving linear systems. Our methods take full advantage of the special structure of the power grid. Experimental results show that our algorithms outperform the random-walk-based algorithm, the best-known prior method. For a 16-million-node problem, our row-based algorithm took 26.47 minutes while the random-walk-based algorithm took 19.6 hours. Our row-based algorithm produced an exact solution while the random walk produced a solution with a maximum error of 5.7 mV.
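As a point of reference for the SOR connection, here is a textbook dense-matrix sketch of the iteration; the paper's algorithms instead exploit the sparse grid structure by sweeping node by node or row by row, so this baseline, its parameter values, and the function name are illustrative assumptions.

```python
import numpy as np

def sor_solve(G, b, omega=1.5, tol=1e-6, max_iters=10000):
    """Successive over-relaxation on the linear system G x = b.

    Sweeps the unknowns one by one, updating each in place -- the
    classical iteration that the paper's node-by-node and row-by-row
    grid traversals implement efficiently on sparse power grids.
    """
    n = len(b)
    x = np.zeros(n)
    for _ in range(max_iters):
        max_delta = 0.0
        for i in range(n):
            sigma = G[i] @ x - G[i, i] * x[i]        # off-diagonal terms
            x_new = (1 - omega) * x[i] + omega * (b[i] - sigma) / G[i, i]
            max_delta = max(max_delta, abs(x_new - x[i]))
            x[i] = x_new
        if max_delta < tol:                          # converged everywhere
            return x
    return x
```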

105 citations

Proceedings ArticleDOI
07 Jun 2004
TL;DR: This paper proposes a maze routing method that accounts for optical effects within the routing algorithm; by exploiting the symmetry of the optical system, diffraction costs are precomputed in look-up tables, and an effective Lagrangian-relaxation-based algorithm solves the resulting constrained routing problem.
Abstract: As technology migrates into the deep submicron manufacturing (DSM) era, the critical dimension of circuits is becoming smaller than the lithographic wavelength. The unavoidable light diffraction phenomena of sub-wavelength technologies have become one of the major factors limiting yield. Optical proximity correction (OPC) is one of the methods adopted to compensate for the light diffraction effect as a post-layout process. However, the process is time-consuming and the results are still limited by the original layout quality. In this paper, we propose a maze routing method that considers the optical effect within the routing algorithm. By utilizing the symmetry of the optical system, the light diffraction is efficiently calculated and stored in tables. The costs that guide the router to minimize optical interference are obtained from these look-up tables. The problem is first formulated as a constrained maze routing problem, then shown to be a multi-constrained shortest-path problem. Based on the Lagrangian relaxation method, an effective algorithm is designed to solve the problem.
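The abstract gives only the outline of the method; the sketch below shows Lagrangian relaxation on a single aggregated optical-penalty constraint, running one shortest-path search (Dijkstra, standing in for the maze router) per trial multiplier. The paper treats a multi-constrained formulation, so the graph encoding, bounds, and function names here are assumptions.

```python
import heapq

def best_path(adj, s, t, lam):
    """Dijkstra under the combined weight: wire cost + lam * optical penalty.

    adj[u] is a list of (v, cost, penalty) edges; the graph is assumed
    simple and connected, so a path from s to t always exists.
    """
    dist, parent = {s: 0.0}, {s: None}
    pq = [(0.0, s)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist[u]:
            continue
        for v, cost, penalty in adj[u]:
            nd = d + cost + lam * penalty
            if nd < dist.get(v, float("inf")):
                dist[v], parent[v] = nd, u
                heapq.heappush(pq, (nd, v))
    path, node = [], t
    while node is not None:            # walk parents back to the source
        path.append(node)
        node = parent[node]
    path.reverse()
    pen = sum(p for u, v in zip(path, path[1:])
              for w, _, p in adj[u] if w == v)    # raw penalty of the path
    return path, pen

def lagrangian_route(adj, s, t, bound, iters=30, hi=1e3):
    """Bisection on the multiplier of the relaxed penalty constraint."""
    lo = 0.0
    for _ in range(iters):
        lam = (lo + hi) / 2
        _, pen = best_path(adj, s, t, lam)
        if pen > bound:
            lo = lam                   # constraint violated: penalize harder
        else:
            hi = lam                   # feasible: try cheaper paths
    return best_path(adj, s, t, hi)[0]
```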

96 citations

Proceedings ArticleDOI
23 Jan 2007
TL;DR: This paper presents an optimal algorithm that minimizes lithography cost subject to any given coupling capacitance bound; because coupling capacitance is a local effect, the resulting dummy metal insertion achieves a highly uniform density, which automatically ameliorates the chemical mechanical polishing (CMP) problem.
Abstract: As integrated circuit manufacturing technology advances into the 65nm and 45nm nodes, extensive resolution enhancement techniques (RETs) are needed to correctly manufacture a chip design. The widely used RET called off-axis illumination (OAI) introduces forbidden pitches, which lead to very complex design rules. It has been observed that imposing uniformity on layout designs can substantially improve printability under OAI. For metal layers, uniformity can be achieved simply by inserting dummy metal wire segments at all free spaces. Simulation results indeed show significant improvement in printability with such a dummy metal insertion approach. To minimize mask cost, it is advantageous to use dummy metal segments that are of the same size as regular metal wires due to their simple geometry. But these dummy wires are printable and hence increase coupling capacitances and potentially affect yield. The alternative is to replace a printable dummy wire segment with a set of parallel sub-resolution thin wires (which are not printed). These invisible dummy metal segments do not increase coupling capacitances but bring a higher lithography cost, which includes mask cost and RET/process expense. This paper presents a strategy for dummy metal insertion that can optimally trade off lithography cost and coupling capacitance. In particular, we present an optimal algorithm that can minimize lithography cost subject to any given coupling capacitance bound. Moreover, this dummy metal insertion achieves a highly uniform density because of the locality of coupling capacitance, which automatically ameliorates the chemical mechanical polishing (CMP) problem.
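The optimal algorithm itself is not described in the abstract; the following hypothetical knapsack-style dynamic program over the two fill choices per free space (cheap printable dummy that adds coupling capacitance vs. costlier sub-resolution fill that adds none) illustrates the trade-off being optimized. The discretized capacitance units and per-slot costs are assumptions, not the paper's formulation.

```python
def min_litho_cost(slots, cap_bound):
    """Minimize total lithography cost subject to a capacitance bound.

    slots: list of (cheap_cost, cap_units, expensive_cost) tuples, one per
    free space; the printable dummy costs cheap_cost and adds cap_units of
    coupling capacitance, the sub-resolution fill costs expensive_cost and
    adds none.  dp[c] = minimum cost with at most c capacitance units used.
    """
    INF = float("inf")
    dp = [0.0] * (cap_bound + 1)
    for cheap, cap, expensive in slots:
        new = [INF] * (cap_bound + 1)
        for c in range(cap_bound + 1):
            new[c] = dp[c] + expensive          # sub-resolution fill
            if c >= cap:                        # printable dummy fits budget
                new[c] = min(new[c], dp[c - cap] + cheap)
        dp = new
    return dp[cap_bound]

# Example: three free spaces, capacitance budget of 2 units -> cost 6.0.
print(min_litho_cost([(1.0, 1, 4.0), (1.0, 1, 4.0), (1.0, 2, 4.0)], 2))
```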

86 citations

Proceedings ArticleDOI
05 Nov 2012
TL;DR: Although the general TPL layout decomposition problem is NP-hard, the standard cell based row-structure variant is shown to be polynomial-time solvable, and a polynomial-time algorithm is proposed that solves it optimally.
Abstract: As minimum feature size keeps shrinking and next-generation lithography (e.g., EUV) is further delayed, double patterning lithography (DPL) has been widely recognized as a feasible lithography solution at the 20nm technology node. However, as technology continues to scale to 14/10nm, DPL begins to show its limitations and usually generates too many undesirable stitches. Triple patterning lithography (TPL) is a natural extension of DPL to conquer these difficulties and achieve a stitch-free layout decomposition. In this paper, we study the standard cell based row-structure layout decomposition problem in TPL. Although the general TPL layout decomposition problem is NP-hard, we show that the standard cell based TPL layout decomposition problem is polynomial-time solvable. We propose a polynomial-time algorithm to solve the problem optimally, and our approach has the capability to find all stitch-free decompositions. Color balancing is also considered to ensure a balanced triple patterning decomposition. To speed up the algorithm, we further propose a hierarchical algorithm for standard cell based layouts, which reduces the run time by 34.5% on average without sacrificing optimality. We also extend our algorithm to allow stitches for complex circuit designs, and our algorithm is guaranteed to find optimal solutions with the minimum number of stitches.
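To make concrete why a row structure helps, here is a toy dynamic program that three-colors a chain of features where each feature may conflict only with its left neighbor; a real cell row has richer boundary conflict graphs, so this is a simplified illustration, not the paper's algorithm.

```python
def tpl_color_row(conflict_prev):
    """Three-color a chain of features for triple patterning.

    conflict_prev[i] is True if feature i conflicts with feature i-1
    (entry 0 is ignored).  dp[i][c] records whether feature i can take
    mask c in some valid coloring of the prefix; keeping all parents
    instead of one would enumerate every stitch-free decomposition.
    """
    n = len(conflict_prev)
    dp = [[False] * 3 for _ in range(n)]
    dp[0] = [True, True, True]
    for i in range(1, n):
        for c in range(3):
            dp[i][c] = any(dp[i - 1][p] and (not conflict_prev[i] or p != c)
                           for p in range(3))
    if not any(dp[-1]):
        return None                      # no stitch-free decomposition
    colors = [dp[-1].index(True)]        # backtrack one valid assignment
    for i in range(n - 1, 0, -1):
        c = colors[-1]
        for p in range(3):
            if dp[i - 1][p] and (not conflict_prev[i] or p != c):
                colors.append(p)
                break
    colors.reverse()
    return colors

# Four features, each conflicting with its left neighbor.
print(tpl_color_row([False, True, True, True]))  # alternating masks, e.g. [1, 0, 1, 0]
```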

82 citations


Cited by
Journal Article
TL;DR: The phase-shifting mask as mentioned in this paper consists of a normal transmission mask that has been coated with a transparent layer patterned to ensure that the optical phases of nearest apertures are opposite.
Abstract: The phase-shifting mask consists of a normal transmission mask that has been coated with a transparent layer patterned to ensure that the optical phases of nearest apertures are opposite. Destructive interference between waves from adjacent apertures cancels some diffraction effects and increases the spatial resolution with which such patterns can be projected. A simple theory predicts a near doubling of resolution for partially coherent illumination with σ < 0.3, and substantial improvements in resolution for σ < 0.7. Initial results obtained with a phase-shifting mask patterned with typical device structures by electron-beam lithography and exposed using a Mann 4800 10× tool reveal a 40-percent increase in usable resolution, with some structures printed at a resolution of 1000 lines/mm. Phase-shifting mask structures can be used to facilitate proximity printing with larger gaps between mask and wafer. Theory indicates that the increase in resolution is accompanied by a minimal decrease in depth of focus. Thus the phase-shifting mask may be the most desirable device for enhancing optical lithography resolution in the VLSI/VHSIC era.
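A small numerical illustration of the mechanism described above: with a 180° phase shift on one of two adjacent apertures, the coherent image amplitudes subtract rather than add, leaving a dark null between the features. The Gaussian aperture images and unit amplitudes are illustrative assumptions, not the paper's model.

```python
import numpy as np

x = np.linspace(-2.0, 2.0, 1001)          # position, in units of the pitch
d, w = 1.0, 0.45                          # aperture separation, image width
a1 = np.exp(-((x - d / 2) ** 2) / w**2)   # coherent image of aperture 1
a2 = np.exp(-((x + d / 2) ** 2) / w**2)   # coherent image of aperture 2
conventional = (a1 + a2) ** 2             # in phase: the peaks wash together
phase_shifted = (a1 - a2) ** 2            # opposite phase: exact null at x=0
print("intensity midway between features:",
      conventional[500], "vs", phase_shifted[500])
```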

705 citations

01 Jan 2012
TL;DR: By including versions of varying levels of optimization of the same fundamental algorithm, the Parboil benchmarks present opportunities to demonstrate tools and architectures that help programmers get the most out of their parallel hardware.
Abstract: The Parboil benchmarks are a set of throughput computing applications useful for studying the performance of throughput computing architecture and compilers. The name comes from the culinary term for a partial cooking process, which represents our belief that useful throughput computing benchmarks must be “cooked”, or preselected to implement a scalable algorithm with fine-grained parallel tasks. But useful benchmarks for this field cannot be “fully cooked”, because the architectures and programming models and supporting tools are evolving rapidly enough that static benchmark codes will lose relevance very quickly. We have collected benchmarks from throughput computing application researchers in many different scientific and commercial fields including image processing, biomolecular simulation, fluid dynamics, and astronomy. Each benchmark includes several implementations. Some implementations we provide as readable base implementations from which new optimization efforts can begin, and others as examples of the current state-of-the-art targeting specific CPU and GPU architectures. As we continue to optimize these benchmarks for new and existing architectures ourselves, we will also gladly accept new implementations and benchmark contributions from developers to recognize those at the frontier of performance optimization on each architecture. Finally, by including versions of varying levels of optimization of the same fundamental algorithm, the benchmarks present opportunities to demonstrate tools and architectures that help programmers get the most out of their parallel hardware. Less optimized versions are presented as challenges to the compiler and architecture research communities: to develop the technology that automatically raises the performance of simpler implementations to the performance level of sophisticated programmer-optimized implementations, or demonstrate any other performance or programmability improvements. We hope that these benchmarks will facilitate effective demonstrations of such technology.

695 citations

Proceedings ArticleDOI
25 Feb 2012
TL;DR: This work presents a BFS parallelization focused on fine-grained task management constructed from efficient prefix sums, achieving an asymptotically optimal O(|V|+|E|) work complexity.
Abstract: Breadth-first search (BFS) is a core primitive for graph traversal and a basis for many higher-level graph analysis algorithms. It is also representative of a class of parallel computations whose memory accesses and work distribution are both irregular and data-dependent. Recent work has demonstrated the plausibility of GPU sparse graph traversal, but has tended to focus on asymptotically inefficient algorithms that perform poorly on graphs with non-trivial diameter. We present a BFS parallelization focused on fine-grained task management constructed from efficient prefix sums that achieves an asymptotically optimal O(|V|+|E|) work complexity. Our implementation delivers excellent performance on diverse graphs, achieving traversal rates in excess of 3.3 billion and 8.3 billion traversed edges per second using single- and quad-GPU configurations, respectively. This level of performance is several times faster than state-of-the-art implementations on both CPU and GPU platforms.
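As a rough sketch of the prefix-sum scheme, this sequential NumPy rendering computes each frontier vertex's write offset with an exclusive scan, so neighbor gathering is contention-free; the CSR graph layout is standard, but the function name and the np.unique deduplication (handled by atomics on the GPU) are stand-ins for the paper's details.

```python
import numpy as np

def bfs_prefix_sum(indptr, indices, source):
    """BFS over a CSR graph (NumPy arrays), one prefix-sum pass per level.

    The exclusive scan over frontier degrees assigns each vertex a
    disjoint slice of the gather buffer, which is what makes the
    fine-grained parallel expansion race-free in the GPU version.
    """
    n = len(indptr) - 1
    dist = np.full(n, -1, dtype=np.int64)
    dist[source] = 0
    frontier = np.array([source])
    level = 0
    while frontier.size:
        degrees = indptr[frontier + 1] - indptr[frontier]
        offsets = np.concatenate(([0], np.cumsum(degrees)))   # exclusive scan
        gathered = np.empty(offsets[-1], dtype=np.int64)
        for i, u in enumerate(frontier):       # each u handled by one thread
            gathered[offsets[i]:offsets[i + 1]] = indices[indptr[u]:indptr[u + 1]]
        level += 1
        fresh = np.unique(gathered[dist[gathered] == -1])     # dedupe claims
        dist[fresh] = level
        frontier = fresh
    return dist
```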

541 citations