Home
/
Authors
/
Wayne Luk

Author

Wayne Luk

Other affiliations: Fudan University, University of London, University of Oxford

Bio: Wayne Luk is an academic researcher from Imperial College London. The author has contributed to research in topics: Field-programmable gate array & Reconfigurable computing. The author has an hindex of 54, co-authored 703 publications receiving 12517 citations. Previous affiliations of Wayne Luk include Fudan University & University of London.

Papers published on a yearly basis

2023
2022
2021
2020
2019
2018
2017
2016
2015
2014
2013
2012
2011
2010
2009
2008
2007
2006
2005
2004
2003
2002
2001
2000
1999
1998
1997
1996
1995
1994
1993
1992
1991
1990
1989
1988
1987

Papers

PDF

Open Access

More filters

Journal Article•DOI•

Reconfigurable computing: architectures and design methods

[...]

Tim Todman¹, George A. Constantinides¹, Steven J. E. Wilton², Oskar Mencer¹, Wayne Luk¹, Peter Y. K. Cheung¹ - Show less +2 more•Institutions (2)

Imperial College London¹, University of British Columbia²

25 Jul 2005

TL;DR: It is shown that reconfigurable computing designs are capable of achieving up to 500 times speedup and 70% energy savings over microprocessor implementations for specific applications.

...read moreread less

Abstract: Reconfigurable computing is becoming increasingly attractive for many applications. This survey covers two aspects of reconfigurable computing: architectures and design methods. The paper includes recent advances in reconfigurable architectures, such as the Alters Stratix II and Xilinx Virtex 4 FPGA devices. The authors identify major trends in general-purpose and special-purpose design methods. It is shown that reconfigurable computing designs are capable of achieving up to 500 times speedup and 70% energy savings over microprocessor implementations for specific applications.

...read moreread less

414 citations

Journal Article•DOI•

Accuracy-Guaranteed Bit-Width Optimization

[...]

Dong-U Lee¹, A.A. Gaffar², Ray C. C. Cheung², Oskar Mencer², Wayne Luk², George A. Constantinides² - Show less +2 more•Institutions (2)

University of California, Los Angeles¹, Imperial College London²

01 Oct 2006-IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems

TL;DR: An automated static approach for optimizing bit widths of fixed-point feedforward designs with guaranteed accuracy, called MiniBit, is presented and is demonstrated with polynomial approximation, RGB-to-YCbCr conversion, matrix multiplication, B-splines, and discrete cosine transform placed and routed on a Xilinx Virtex-4 FPGA.

...read moreread less

Abstract: An automated static approach for optimizing bit widths of fixed-point feedforward designs with guaranteed accuracy, called MiniBit, is presented. Methods to minimize both the integer and fraction parts of fixed-point signals with the aim of minimizing the circuit area are described. For range analysis, the technique in this paper identifies the number of integer bits necessary to meet range requirements. For precision analysis, a semianalytical approach with analytical error models in conjunction with adaptive simulated annealing is employed to optimize the number of fraction bits. The analytical models make it possible to guarantee overflow/underflow protection and numerical accuracy for all inputs over the user-specified input intervals. Using a stream compiler for field-programmable gate arrays (FPGAs), the approach in this paper is demonstrated with polynomial approximation, RGB-to-YCbCr conversion, matrix multiplication, B-splines, and discrete cosine transform placed and routed on a Xilinx Virtex-4 FPGA. Improvements for a given design reduce the area and the latency by up to 26% and 12%, respectively, over a design using optimum uniform fraction bit widths. Studies show that MiniBit-optimized designs are within 1% of the area produced from the integer linear programming approach

...read moreread less

226 citations

Proceedings Article•DOI•

A comparison of CPUs, GPUs, FPGAs, and massively parallel processor arrays for random number generation

[...]

David B. Thomas¹, Lee Howes¹, Wayne Luk¹•Institutions (1)

Imperial College London¹

22 Feb 2009

TL;DR: For each platform, the most appropriate algorithm for generating each type of number is determined, then the peak generation rate and estimated power efficiency for each device are calculated.

...read moreread less

Abstract: The future of high-performance computing is likely to rely on the ability to efficiently exploit huge amounts of parallelism. One way of taking advantage of this parallelism is to formulate problems as "embarrassingly parallel" Monte-Carlo simulations, which allow applications to achieve a linear speedup over multiple computational nodes, without requiring a super-linear increase in inter-node communication. However, such applications are reliant on a cheap supply of high quality random numbers, particularly for the three main maximum entropy distributions: uniform, used as a general source of randomness; Gaussian, for discrete-time simulations; and exponential, for discrete-event simulations. In this paper we look at four different types of platform: conventional multi-core CPUs (Intel Core2); GPUs (NVidia GTX 200); FPGAs (Xilinx Virtex-5); and Massively Parallel Processor Arrays (Ambric AM2000). For each platform we determine the most appropriate algorithm for generating each type of number, then calculate the peak generation rate and estimated power efficiency for each device.

...read moreread less

202 citations

Journal Article•DOI•

Gaussian random number generators

[...]

David B. Thomas¹, Wayne Luk¹, Philip H. W. Leong², John Villasenor³•Institutions (3)

Imperial College London¹, The Chinese University of Hong Kong², University of California, Los Angeles³

02 Nov 2007-ACM Computing Surveys

TL;DR: The algorithms underlying various GRNGs are described, their computational requirements are compared, and the quality of the random numbers are examined with emphasis on the behaviour in the tail region of the Gaussian probability density function.

...read moreread less

Abstract: Rapid generation of high quality Gaussian random numbers is a key capability for simulations across a wide range of disciplines. Advances in computing have brought the power to conduct simulations with very large numbers of random numbers and with it, the challenge of meeting increasingly stringent requirements on the quality of Gaussian random number generators (GRNG). This article describes the algorithms underlying various GRNGs, compares their computational requirements, and examines the quality of the random numbers with emphasis on the behaviour in the tail region of the Gaussian probability density function.

...read moreread less

202 citations

Journal Article•DOI•

Pipeline vectorization

[...]

Markus Weinhardt¹, Wayne Luk•Institutions (1)

Imperial College London¹

01 Feb 2001-IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems

TL;DR: This paper presents pipeline vectorization, a method for synthesizing hardware pipelines based on software vectorizing compilers that improves efficiency and ease of development of hardware designs, particularly for users with little electronics design experience.

...read moreread less

Abstract: This paper presents pipeline vectorization, a method for synthesizing hardware pipelines based on software vectorizing compilers. The method improves efficiency and ease of development of hardware designs, particularly for users with little electronics design experience. We propose several loop transformations to customize pipelines to meet hardware resource constraints while maximizing available parallelism. For runtime reconfigurable systems, we apply hardware specialization to increase circuit utilization. Our approach is especially effective for highly repetitive computations in digital signal processor (DSP) and multimedia applications. Case studies using field programmable gate arrays (FPGAs)-based platforms are presented to demonstrate the benefits of our approach and to evaluate tradeoffs between alternative implementations. For instance, the loop-tiling transformation, has been found to improve vectorization performance 30-40 times above a PC-based software implementation, depending on whether runtime reconfiguration (RTR) is used.

...read moreread less

185 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148

Collapse

Cited by

PDF

Open Access

More filters

Journal Article•DOI•

I and i

[...]

Kevin Barraclough

08 Dec 2001-BMJ

TL;DR: There is, I think, something ethereal about i —the square root of minus one, which seems an odd beast at that time—an intruder hovering on the edge of reality.

...read moreread less

Abstract: There is, I think, something ethereal about i —the square root of minus one. I remember first hearing about it at school. It seemed an odd beast at that time—an intruder hovering on the edge of reality. Usually familiarity dulls this sense of the bizarre, but in the case of i it was the reverse: over the years the sense of its surreal nature intensified. It seemed that it was impossible to write mathematics that described the real world in …

...read moreread less

33,785 citations

On robust estimation of the location parameter

[...]

Frederick R. Forst

01 Jan 1980

3,652 citations

Journal Article•

An overview of AspectJ

[...]

Gregor Kiczales¹, Erik Hilsdale², Jim Hugunin², Mik Kersten², Jeffrey Palm², William G. Griswold³ - Show less +2 more•Institutions (3)

University of British Columbia¹, PARC², University of California³

01 Jan 2001-Lecture Notes in Computer Science

TL;DR: AspectJ as mentioned in this paper is a simple and practical aspect-oriented extension to Java with just a few new constructs, AspectJ provides support for modular implementation of a range of crosscutting concerns.

...read moreread less

Abstract: Aspect] is a simple and practical aspect-oriented extension to Java With just a few new constructs, AspectJ provides support for modular implementation of a range of crosscutting concerns. In AspectJ's dynamic join point model, join points are well-defined points in the execution of the program; pointcuts are collections of join points; advice are special method-like constructs that can be attached to pointcuts; and aspects are modular units of crosscutting implementation, comprising pointcuts, advice, and ordinary Java member declarations. AspectJ code is compiled into standard Java bytecode. Simple extensions to existing Java development environments make it possible to browse the crosscutting structure of aspects in the same kind of way as one browses the inheritance structure of classes. Several examples show that AspectJ is powerful, and that programs written using it are easy to understand.

...read moreread less

2,947 citations

Journal Article•DOI•

Efficient Processing of Deep Neural Networks: A Tutorial and Survey

[...]

Vivienne Sze¹, Yu-Hsin Chen¹, Tien-Ju Yang¹, Joel Emer¹•Institutions (1)

Massachusetts Institute of Technology¹

20 Nov 2017

TL;DR: In this paper, the authors provide a comprehensive tutorial and survey about the recent advances toward the goal of enabling efficient processing of DNNs, and discuss various hardware platforms and architectures that support DNN, and highlight key trends in reducing the computation cost of deep neural networks either solely via hardware design changes or via joint hardware and DNN algorithm changes.

...read moreread less

Abstract: Deep neural networks (DNNs) are currently widely used for many artificial intelligence (AI) applications including computer vision, speech recognition, and robotics. While DNNs deliver state-of-the-art accuracy on many AI tasks, it comes at the cost of high computational complexity. Accordingly, techniques that enable efficient processing of DNNs to improve energy efficiency and throughput without sacrificing application accuracy or increasing hardware cost are critical to the wide deployment of DNNs in AI systems. This article aims to provide a comprehensive tutorial and survey about the recent advances toward the goal of enabling efficient processing of DNNs. Specifically, it will provide an overview of DNNs, discuss various hardware platforms and architectures that support DNNs, and highlight key trends in reducing the computation cost of DNNs either solely via hardware design changes or via joint hardware design and DNN algorithm changes. It will also summarize various development resources that enable researchers and practitioners to quickly get started in this field, and highlight important benchmarking metrics and design considerations that should be used for evaluating the rapidly growing number of DNN hardware designs, optionally including algorithmic codesigns, being proposed in academia and industry. The reader will take away the following concepts from this article: understand the key design considerations for DNNs; be able to evaluate different DNN hardware implementations with benchmarks and comparison metrics; understand the tradeoffs between various hardware architectures and platforms; be able to evaluate the utility of various DNN design techniques for efficient processing; and understand recent implementation trends and opportunities.

...read moreread less

2,391 citations

[서평]「Applied Cryptography」

[...]

염흥렬

01 Apr 1997

TL;DR: The objective of this paper is to give a comprehensive introduction to applied cryptography with an engineer or computer scientist in mind on the knowledge needed to create practical systems which supports integrity, confidentiality, or authenticity.

...read moreread less

Abstract: The objective of this paper is to give a comprehensive introduction to applied cryptography with an engineer or computer scientist in mind. The emphasis is on the knowledge needed to create practical systems which supports integrity, confidentiality, or authenticity. Topics covered includes an introduction to the concepts in cryptography, attacks against cryptographic systems, key use and handling, random bit generation, encryption modes, and message authentication codes. Recommendations on algorithms and further reading is given in the end of the paper. This paper should make the reader able to build, understand and evaluate system descriptions and designs based on the cryptographic components described in the paper.

...read moreread less

2,188 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200

Collapse