Showing papers by "Wayne Luk" published in 2000

Open Access

Journal Article•DOI•

Video image processing with the Sonic architecture

S.D. Haynes, J. Stone, Peter Y. K. Cheung¹, Wayne Luk¹•Institutions (1)

01 Apr 2000-IEEE Computer

TL;DR: Sonic is a configurable computing system that performs real-time video image processing; the authors describe how it implements algorithms for two-dimensional linear transforms, fractal image generation, filters, and other video effects.

Abstract: Current industrial video-processing systems use a mixture of high-performance workstations and application-specific integrated circuits. However, video image processing in the professional broadcast environment requires more computational power and data throughput than most of today's general-purpose computers can provide. In addition, using ASICs for video image processing is both inflexible and expensive. Configurable computing offers an appropriate alternative for broadcast video image editing and manipulation by combining the flexibility, programmability, and economy of general-purpose processors with the performance of dedicated ASICs. Sonic is a configurable computing system that performs real-time video image processing. The authors describe how it implements algorithms for two-dimensional linear transforms, fractal image generation, filters, and other video effects. Sonic's flexible and scalable architecture contains configurable processing elements that accelerate software applications and support the use of plug-in software.

85 citations

Proceedings Article•DOI•

Customising graphics applications: techniques and programming interface

H. Styles¹, Wayne Luk•Institutions (1)

Imperial College London¹

17 Apr 2000

TL;DR: This paper identifies opportunities for customising architectures for graphics applications, such as infrared simulation and geometric visualisation, by studying methods for exploiting custom data formats and datapath widths, and for optimising graphics operations such as texture mapping and hidden-surface removal.

Abstract: This paper identifies opportunities for customising architectures for graphics applications, such as infrared simulation and geometric visualisation. We have studied methods for exploiting custom data formats and datapath widths, and for optimising graphics operations such as texture mapping and hidden-surface removal. Techniques for balancing the graphics pipeline and for run-time reconfiguration have been implemented. The customised architectures are captured in Handel-C, a C-like language supporting parallelism and flexible data size, and compiled for Xilinx 4000 and Virtex FPGAs. We have also developed an application programming interface based on the OpenGL standard for automatic speedup of graphics applications, including the Quake 2 action game.

47 citations

Journal Article•DOI•

Framework and tools for run-time reconfigurable designs

Nabeel Shirazi¹, Wayne Luk², Peter Y. K. Cheung²•Institutions (2)

Xilinx¹, Imperial College London²

01 May 2000

TL;DR: A framework and tools for automating the production of designs that can be partially reconfigured at run time are described, which have been used in developing a variety of designs, including arithmetic, video and database applications.

Abstract: The paper describes a framework and tools for automating the production of designs that can be partially reconfigured at run time. The approach involves several stages, including: (i) a partial evaluation stage, which produces configuration files for a given design, where the number of configurations is minimised during the compile-time sequencing stage; (ii) an incremental configuration calculation stage, which takes the output of the partial evaluator and generates an initial configuration file and incremental configuration files that partially update preceding configurations; and (iii) an optimisation stage for devices or systems supporting simultaneous configuration of multiple components. While many of the techniques are independent of the design language and device used, experimental tools have been developed that target Xilinx 6200 devices. Simultaneous configuration, for example, can be used to reduce the time for reconfiguring an adder to a subtractor from time linear with respect to its size to constant time at best and logarithmic time at worst. The tools have been used in developing a variety of designs, including arithmetic, video and database applications.

34 citations

Journal Article•DOI•

Optimal datapath allocation for multiple-wordlength systems

George A. Constantinides¹, Peter Y. K. Cheung¹, Wayne Luk¹•Institutions (1)

Imperial College London¹

17 Aug 2000-Electronics Letters

TL;DR: A formulation of the combined scheduling, binding, and wordlength selection problem is proposed, and integer linear programming (ILP) is used to obtain area-optimal scheduling, binding and wordlength selection for multiple-wordlength systems.

Abstract: High-level synthesis for multiple-wordlength systems is examined. A formulation of the combined scheduling, binding, and wordlength selection problem is proposed. Integer linear programming is used to obtain area-optimal scheduling, binding and wordlength selection for such systems.

19 citations

Proceedings Article•DOI•

Multiple precision for resource minimization

George A. Constantinides¹, Peter Y. K. Cheung¹, Wayne Luk¹•Institutions (1)

Imperial College London¹

17 Apr 2000

TL;DR: It is demonstrated that significant area reductions can be obtained by optimizing signal widths individually, compared to the use of a single uniform signal width.

Abstract: Presents the Synoptix high-level synthesis and precision optimization system for FPGAs. Given abstract specifications in the form of infinite-precision signal flow graphs and a set of error constraints, Synoptix creates hardware descriptions of fixed-point arithmetic implementations. The width of each signal is individually optimized in order to achieve the minimal resource utilization while satisfying user-specified constraints such as signal-to-noise ratio. A heuristic for solving the optimization problem is introduced, and the results of implementations on an Altera Flex10k-based reconfigurable computing platform are reported. It is demonstrated that significant area reductions can be obtained by optimizing signal widths individually, compared to the use of a single uniform signal width.

17 citations

Proceedings Article•DOI•

Roundoff-noise shaping in filter design

George A. Constantinides¹, Peter Y. K. Cheung¹, Wayne Luk•Institutions (1)

Imperial College London¹

28 May 2000

TL;DR: An automated feasibility test is introduced to decide whether a given filter realisation meets user-specified constraints on the roundoff noise power spectrum; the test is used by an algorithm that optimizes individual signal widths within a filter structure.

Abstract: This paper presents a technique for the spectral shaping of roundoff noise in fixed-point implementations of digital filters. An automated feasibility test is introduced, in order to decide whether a given filter realisation meets user-specified constraints on the roundoff noise power spectrum. This feasibility test is used by an algorithm for optimization of individual signal widths within a filter structure. Some results are presented, illustrating how the optimization produces filters closely meeting the specification, leading to significant improvements in implementation area.

16 citations

Book Chapter•DOI•

Multiple-Wordlength Resource Binding

George A. Constantinides¹, Peter Y. K. Cheung¹, Wayne Luk¹•Institutions (1)

Imperial College London¹

27 Aug 2000

TL;DR: It is demonstrated that the multiple-wordlength binding problem is significantly different for addition and multiplication, and techniques to share resources between several operations are examined for FPGA architectures.

Abstract: This paper describes a novel resource binding technique for use in multiple-wordlength systems implemented in FPGAs. It is demonstrated that the multiple-wordlength binding problem is significantly different for addition and multiplication, and techniques to share resources between several operations are examined for FPGA architectures. A novel formulation of the resource binding problem is presented as an optimal colouring problem on a resource conflict graph, and several algorithms are developed to solve this problem. Results collected from many sequencing graphs illustrate the effectiveness of the heuristics developed in this paper, demonstrating significant area reductions over more traditional approaches.

15 citations

Proceedings Article•DOI•

Flexible instruction processors

Shay Ping Seng¹, Wayne Luk¹, Peter Y. K. Cheung¹•Institutions (1)

Imperial College London¹

01 Nov 2000

TL;DR: This paper introduces the notion of a Flexible Instruction Processor (FIP) for systematic customisation of instruction processor design and implementation; the current implementation is based on Handel-C, a high-level parallel language that can be compiled into hardware.

Abstract: This paper introduces the notion of a Flexible Instruction Processor (FIP) for systematic customisation of instruction processor design and implementation. The features of our approach include: (a) a modular framework based on “processor templates” that capture various instruction processor styles, such as stack-based or register-based styles; (b) enhancements of this framework to improve functionality and performance, such as hybrid processor templates and superscalar operation; (c) compilation strategies involving standard compilers and FIP-specific compilers, and the associated design flow; (d) technology-independent and technology-specific optimisations, such as techniques for efficient resource sharing in FPGA implementations. Our current implementation of the FIP framework is based on a high-level parallel language called Handel-C, which can be compiled into hardware. Various customised Java Virtual Machines and MIPS style processors have been developed using existing FPGAs to evaluate the effectiveness and promise of this approach.

13 citations

Book Chapter•DOI•

Static and Dynamic Reconfigurable Designs for a 2D Shape-Adaptive DCT

Jörn Gause¹, Peter Y. K. Cheung¹, Wayne Luk¹•Institutions (1)

Imperial College London¹

27 Aug 2000

TL;DR: Two reconfigurable design approaches for a two-dimensional Shape-Adaptive Discrete Cosine Transform (2D SA-DCT) are presented, and it is demonstrated that the area required for an implementation can be significantly reduced.

Abstract: This paper presents two reconfigurable design approaches for a two-dimensional Shape-Adaptive Discrete Cosine Transform (2D SA-DCT). The SA-DCT is an example of a new type of multimedia video processing algorithm where the computations performed are data dependent. A static design, where the configuration does not change during execution of the task, is presented. The use of a data dependence graph (DDG) is proposed, which represents the computations and input signals required to calculate a particular output signal depending on a variable input parameter. By restructuring the DDG and exploiting possible sharing of FPGA resources for different entities within the SA-DCT, it is demonstrated that the area required for an implementation can be significantly reduced. An alternative dynamic approach is also introduced where the FPGA's configuration may change over time. This is well suited to using dynamically reconfigurable logic but suffers from long reconfiguration time if current FPGAs are used.

12 citations

Book Chapter•DOI•

A Combined Approach to High-Level Synthesis for Dynamically Reconfigurable Systems

Xue-Jie Zhang¹, Kam-Wing Ng¹, Wayne Luk²•Institutions (2)

The Chinese University of Hong Kong¹, Imperial College London²

27 Aug 2000

TL;DR: Two complementary design models and related synthesis techniques are combined to capture behavioral and structural information in modelling and synthesizing a dynamically reconfigurable system; the formulation represents operation-level temporal constraints and dynamic resource constraints in a unified model.

Abstract: In this paper, two complementary design models and related synthesis techniques are combined to capture behavioral and structural information in modelling and synthesizing a dynamically reconfigurable system. The proposed formulation is achieved by using finite domain constraints and related constraint-solving techniques offered by constraint logic programming. Our formulation represents operation-level temporal constraints and dynamic resource constraints in a unified model. Different synthesis tasks, such as temporal partitioning, scheduling and dynamic module allocation, can be modelled in this framework, enabling the discovery of optimal or near-optimal solutions. Experiments have been carried out using a prototype of the high-level synthesis system implemented in CHIP, a constraint logic programming system. Current experimental results show that our approach can provide promising synthesis results in terms of the synthesis time and the number of reconfigurations.

5 citations

Proceedings Article•DOI•

Combining serialisation and reconfiguration for convolver designs

Arran Derbyshire¹, Wayne Luk•Institutions (1)

Imperial College London¹

17 Apr 2000

TL;DR: Techniques for combining serialisation and reconfiguration to produce efficient convolver designs are described and an estimate of the performance of a serial design is given when mapped using a distributed arithmetic core onto a Xilinx Virtex FPGA.

Abstract: This paper describes techniques for combining serialisation and reconfiguration to produce efficient convolver designs. Several optimisation techniques, such as restructuring and pipeline morphing, are presented with an analysis of their impact on performance and resource usage. The proposed techniques do not require the basic processing element to be modified. An estimate of the performance of a serial design is given when mapped using a distributed arithmetic core onto a Xilinx Virtex FPGA. We estimate that a convolver of more than 2000 taps at 470,000 samples per second can be implemented in one quarter of the logic resources of a Virtex XCV300 device.

Proceedings Article•DOI•

Evaluating hardware compilation techniques

Markus Weinhardt¹, Wayne Luk•Institutions (1)

Imperial College London¹

17 Apr 2000

TL;DR: Pipeline vectorization has been found to speed up hardware implementations of vectorizable programs by up to two orders of magnitude, whereas local optimizations only achieve speedup factors smaller than two.

Abstract: Hardware compilation techniques which use high-level programming languages to describe and synthesize hardware are gaining popularity. They are especially useful for reconfigurable computing systems since they provide a fast, easy-to-use, software-like programming environment for users with little hardware design experience. We compare three hardware compilation techniques. First, we study sequential compilation, which produces hardware that evaluates each assignment of the source program in one clock cycle. Next, we evaluate the effects of local parallelizing optimizations. Finally, we apply pipeline vectorization, a method based on software vectorization for synthesizing hardware pipelines, which exploits hardware parallelism globally. Results of all three techniques for several benchmark programs are presented and discussed. Pipeline vectorization has been found to speed up hardware implementations of vectorizable programs by up to two orders of magnitude, whereas local optimizations only achieve speedup factors smaller than two.

Proceedings Article•DOI•

Perspectives on custom computing

Wayne Luk¹, H. Styles•Institutions (1)

Imperial College London¹

29 Oct 2000

TL;DR: A framework for customising designs using appropriate libraries, compilers, validation facilities, application programming interfaces and front-end tools is described and it is shown how circuits can be customised at run time to adapt to changes in the operating conditions.

Abstract: Custom computing involves customising computations for one or more applications in a given implementation technology. We describe a framework for customising designs using appropriate libraries, compilers, validation facilities, application programming interfaces and front-end tools. The development of custom architectures, data formats and operations is presented. We show how circuits can be customised at run time to adapt to changes in the operating conditions. Graphics examples are used throughout the paper to illustrate our approach.

Book Chapter•DOI•

Combining Serialisation and Reconfiguration for FPGA Designs

Arran Derbyshire¹, Wayne Luk¹•Institutions (1)

Imperial College London¹

27 Aug 2000

TL;DR: A tool framework and techniques for combining serialisation and reconfiguration to produce efficient designs are described, and several optimisation techniques, such as restructuring and pipeline morphing, are presented with an analysis of their impact on performance and resource usage.

Abstract: This paper describes a tool framework and techniques for combining serialisation and reconfiguration to produce efficient designs. Convolver and matrix multiplier designs are examined. Several optimisation techniques, such as restructuring and pipeline morphing, are presented with an analysis of their impact on performance and resource usage. The proposed techniques do not require the basic processing element to be modified. An estimate of the performance of the serial designs is given when mapped using distributed arithmetic and constant multiplier cores onto a Xilinx Virtex FPGA.
