Wayne Luk
Researcher at Imperial College London
Publications - 737
Citations - 13643
Wayne Luk is an academic researcher from Imperial College London. The author has contributed to research on field-programmable gate arrays and reconfigurable computing. The author has an h-index of 54 and has co-authored 703 publications receiving 12517 citations. Previous affiliations of Wayne Luk include Fudan University and the University of London.
Papers
Journal Article
Performance Tuning and Analysis for Stencil-Based Applications on POWER8 Processor
TL;DR: This article demonstrates an approach for combining general tuning techniques with the POWER8 hardware architecture through optimizing three representative stencil benchmarks, and provides useful guidance for optimizing stencil-based scientific applications on POWER systems.
Proceedings Article
ADAM: Automated Design Analysis and Merging for Speeding up FPGA Development
TL;DR: This paper introduces ADAM, an approach for merging multiple FPGA designs into a single hardware design, so that multiple place-and-route tasks can be replaced by a single task. This speeds up functional evaluation of designs, especially during the development process.
Proceedings Article
A Heterogeneous Computing Framework for Computational Finance
TL;DR: The Forward Financial Framework allows the computational finance problem specification to be captured precisely yet succinctly, then automatically creates efficient implementations for heterogeneous platforms, utilising both multi-core CPUs and FPGAs.
Proceedings Article
Pipelined Genetic Propagation
TL;DR: This paper presents Pipelined Genetic Propagation (PGP), a new hardware-oriented approach to genetic algorithms (GAs) that is intrinsically distributed and pipelined. This allows the solution to be scaled to the available resources and the topology to be changed dynamically at run-time to explore different solution strategies.
Proceedings Article
Optimizing residue arithmetic on FPGAs
TL;DR: An extensive comparison between RNS and other number representations at both the arithmetic unit level and the application level shows that, for applications involving a large number of multiplications, the RNS designs can reduce the number of DSP48s by up to half for large bit-width settings.