Showing papers by "Wayne Luk" published in 2000


Journal Article
TL;DR: Sonic is a configurable computing system that performs real-time video image processing; the paper describes how it implements algorithms for two-dimensional linear transforms, fractal image generation, filters, and other video effects.
Abstract: Current industrial video-processing systems use a mixture of high-performance workstations and application-specific integrated circuits. However, video image processing in the professional broadcast environment requires more computational power and data throughput than most of today's general-purpose computers can provide. In addition, using ASICs for video image processing is both inflexible and expensive. Configurable computing offers an appropriate alternative for broadcast video image editing and manipulation by combining the flexibility, programmability, and economy of general-purpose processors with the performance of dedicated ASICs. Sonic is a configurable computing system that performs real-time video image processing. The authors describe how it implements algorithms for two-dimensional linear transforms, fractal image generation, filters, and other video effects. Sonic's flexible and scalable architecture contains configurable processing elements that accelerate software applications and support the use of plug-in software.

85 citations


Proceedings Article
17 Apr 2000
TL;DR: This paper identifies opportunities for customising architectures for graphics applications, such as infrared simulation and geometric visualisation, by studying methods for exploiting custom data formats and datapath widths, and for optimising graphics operations such as texture mapping and hidden-surface removal.
Abstract: This paper identifies opportunities for customising architectures for graphics applications, such as infrared simulation and geometric visualisation. We have studied methods for exploiting custom data formats and datapath widths, and for optimising graphics operations such as texture mapping and hidden-surface removal. Techniques for balancing the graphics pipeline and for run-time reconfiguration have been implemented. The customised architectures are captured in Handel-C, a C-like language supporting parallelism and flexible data size, and compiled for Xilinx 4000 and Virtex FPGAs. We have also developed an application programming interface based on the OpenGL standard for automatic speedup of graphics applications, including the Quake 2 action game.

47 citations


Journal Article
01 May 2000
TL;DR: A framework and tools for automating the production of designs that can be partially reconfigured at run time are described, which have been used in developing a variety of designs, including arithmetic, video and database applications.
Abstract: The paper describes a framework and tools for automating the production of designs that can be partially reconfigured at run time. The approach involves several stages, including: (i) a partial evaluation stage, which produces configuration files for a given design, where the number of configurations is minimised during the compile-time sequencing stage; (ii) an incremental configuration calculation stage, which takes the output of the partial evaluator and generates an initial configuration file and incremental configuration files that partially update preceding configurations; and (iii) an optimisation stage for devices or systems supporting simultaneous configuration of multiple components. While many of the techniques are independent of the design language and device used, experimental tools have been developed that target Xilinx 6200 devices. Simultaneous configuration, for example, can be used to reduce the time for reconfiguring an adder to a subtractor from time linear with respect to its size to constant time at best and logarithmic time at worst. The tools have been used in developing a variety of designs, including arithmetic, video and database applications.

34 citations
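
The incremental configuration idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' tool: the cell addresses, configuration words, and the helper names `ripple_unit`, `incremental_config`, and `grouped_writes` are all hypothetical, and the grouping step assumes an idealised device in which every cell receiving the same word can be written in a single operation.

```python
def ripple_unit(n, b_mode):
    """Hypothetical configuration of an n-bit ripple-carry unit.

    b_mode "PASS" gives an adder (a + b); "INVERT" gives a subtractor,
    computed as a + ~b + 1, so the carry-in must also change.
    """
    cfg = {}
    for i in range(n):
        cfg[(i, "b_in")] = b_mode
        cfg[(i, "full_adder")] = "FA"
    cfg[(0, "carry_in")] = "ONE" if b_mode == "INVERT" else "ZERO"
    return cfg


def incremental_config(prev, nxt, blank="EMPTY"):
    """Return only the cell writes needed to turn `prev` into `nxt`."""
    writes = {addr: word for addr, word in nxt.items()
              if prev.get(addr, blank) != word}
    # Cells used by `prev` but absent from `nxt` are cleared.
    for addr, word in prev.items():
        if addr not in nxt and word != blank:
            writes[addr] = blank
    return writes


def grouped_writes(writes):
    """Group writes by configuration word (idealised simultaneous write)."""
    groups = {}
    for addr, word in writes.items():
        groups.setdefault(word, []).append(addr)
    return groups


if __name__ == "__main__":
    for n in (8, 32):
        delta = incremental_config(ripple_unit(n, "PASS"),
                                   ripple_unit(n, "INVERT"))
        print(n, len(delta), len(grouped_writes(delta)))
```

In this toy model the number of individual cell writes grows linearly with the bit width, while the number of grouped writes stays fixed, which mirrors the linear-versus-constant contrast the abstract draws for the adder-to-subtractor case.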


Journal Article
TL;DR: In this article, a formulation of the combined scheduling, binding, and wordlength selection problem is proposed, and integer linear programming (ILP) is used to obtain area-optimal scheduling, binding, and wordlength selection.
Abstract: High-level synthesis for multiple-wordlength systems is examined. A formulation of the combined scheduling, binding, and wordlength selection problem is proposed. Integer linear programming is used to obtain area-optimal scheduling, binding and wordlength selection for such systems.

19 citations


Proceedings Article
17 Apr 2000
TL;DR: It is demonstrated that significant area reductions can be obtained by optimizing signal widths individually, compared to the use of a single uniform signal width.
Abstract: Presents the Synoptix high-level synthesis and precision optimization system for FPGAs. Given abstract specifications in the form of infinite-precision signal flow graphs and a set of error constraints, Synoptix creates hardware descriptions of fixed-point arithmetic implementations. The width of each signal is individually optimized in order to achieve the minimal resource utilization while satisfying user-specified constraints such as signal-to-noise ratio. A heuristic for solving the optimization problem is introduced, and the results of implementations on an Altera Flex10k-based reconfigurable computing platform are reported. It is demonstrated that significant area reductions can be obtained by optimizing signal widths individually, compared to the use of a single uniform signal width.

17 citations
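
The Synoptix abstract above does not describe its heuristic, so the following is only an illustrative sketch of per-signal width optimisation under a noise budget. It assumes the standard rounding-noise model, in which a signal with `n` fractional bits contributes variance `2**(-2n) / 12`, scaled by a hypothetical gain to the output. The function names, gains, costs, and budget are made up for illustration.

```python
def noise_power(frac_bits, gains):
    """Output noise under the assumed model: sum of gain * 2^(-2n) / 12."""
    return sum(g * 2.0 ** (-2 * n) / 12.0 for n, g in zip(frac_bits, gains))


def uniform_width(gains, budget, max_bits=64):
    """Smallest single width, applied to every signal, that meets the budget."""
    for n in range(1, max_bits + 1):
        if noise_power([n] * len(gains), gains) <= budget:
            return n
    raise ValueError("budget not reachable within max_bits")


def greedy_widths(gains, costs, budget, start_bits=24):
    """Greedy sketch: repeatedly trim one bit from each signal, most
    expensive signals first, keeping a trim only if the budget still holds."""
    bits = [start_bits] * len(gains)
    if noise_power(bits, gains) > budget:
        raise ValueError("starting widths do not meet the budget")
    improved = True
    while improved:
        improved = False
        for i in sorted(range(len(bits)), key=lambda k: -costs[k]):
            if bits[i] > 1:
                bits[i] -= 1
                if noise_power(bits, gains) <= budget:
                    improved = True
                else:
                    bits[i] += 1  # undo: this trim violates the budget
    return bits


if __name__ == "__main__":
    gains, costs, budget = [1.0, 100.0], [1, 1], 1e-6
    print("uniform:", uniform_width(gains, budget))
    print("individual:", greedy_widths(gains, costs, budget))
```

The signal with the small gain can be made narrower than the uniform width allows, which is the kind of saving the abstract attributes to optimising each width individually.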


Proceedings Article
28 May 2000
TL;DR: An automated feasibility test is introduced, in order to decide whether a given filter realisation meets user-specified constraints on the roundoff noise power spectrum, which is used by an algorithm for optimization of individual signal widths within a filter structure.
Abstract: This paper presents a technique for the spectral shaping of roundoff noise in fixed-point implementations of digital filters. An automated feasibility test is introduced, in order to decide whether a given filter realisation meets user-specified constraints on the roundoff noise power spectrum. This feasibility test is used by an algorithm for optimization of individual signal widths within a filter structure. Some results are presented, illustrating how the optimization produces filters closely meeting the specification, leading to significant improvements in implementation area.

16 citations


Book Chapter
27 Aug 2000
TL;DR: It is demonstrated that the multiple-wordlength binding problem is significantly different for addition and multiplication, and techniques to share resources between several operations are examined for FPGA architectures.
Abstract: This paper describes a novel resource binding technique for use in multiple-wordlength systems implemented in FPGAs. It is demonstrated that the multiple-wordlength binding problem is significantly different for addition and multiplication, and techniques to share resources between several operations are examined for FPGA architectures. A novel formulation of the resource binding problem is presented as an optimal colouring problem on a resource conflict graph, and several algorithms are developed to solve this problem. Results collected from many sequencing graphs illustrate the effectiveness of the heuristics developed in this paper, demonstrating significant area reductions over more traditional approaches.

15 citations
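
The resource binding abstract above casts binding as a colouring problem on a resource conflict graph. The sketch below shows a generic greedy colouring, not any of the paper's algorithms. It assumes two operations conflict when their scheduled intervals overlap, and that a shared resource must be as wide as its widest operation; `Op`, `bind`, and the example schedule are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    name: str
    start: int   # first control step occupied (inclusive)
    end: int     # control step at which the resource is free again (exclusive)
    width: int   # wordlength in bits


def conflicts(a, b):
    """Two operations conflict if their half-open intervals overlap."""
    return a.start < b.end and b.start < a.end


def bind(ops):
    """Greedy colouring: widest operations first, reuse the first
    resource with no conflicting operation, else open a new one."""
    resources = []
    for op in sorted(ops, key=lambda o: -o.width):
        for res in resources:
            if not any(conflicts(op, other) for other in res):
                res.append(op)
                break
        else:
            resources.append([op])
    return resources


def resource_width(res):
    """A shared resource is sized for its widest operation."""
    return max(op.width for op in res)


def total_width(resources):
    return sum(resource_width(r) for r in resources)


if __name__ == "__main__":
    ops = [Op("A", 0, 1, 16), Op("B", 0, 1, 8),
           Op("C", 1, 2, 12), Op("D", 1, 2, 6)]
    for res in bind(ops):
        print(resource_width(res), [op.name for op in res])
```

Sorting by width first tends to group wide operations together so that narrow ones do not force an oversized resource. Summed width is only a rough proxy for area; the abstract notes that adders and multipliers behave differently, which this sketch does not model.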


Proceedings Article
01 Nov 2000
TL;DR: This paper introduces the notion of a Flexible Instruction Processor (FIP) for systematic customisation of instruction processor design and implementation; the current implementation is based on a high-level parallel language called Handel-C, which can be compiled into hardware.
Abstract: This paper introduces the notion of a Flexible Instruction Processor (FIP) for systematic customisation of instruction processor design and implementation. The features of our approach include: (a) a modular framework based on “processor templates” that capture various instruction processor styles, such as stack-based or register-based styles; (b) enhancements of this framework to improve functionality and performance, such as hybrid processor templates and superscalar operation; (c) compilation strategies involving standard compilers and FIP-specific compilers, and the associated design flow; (d) technology-independent and technology-specific optimisations, such as techniques for efficient resource sharing in FPGA implementations. Our current implementation of the FIP framework is based on a high-level parallel language called Handel-C, which can be compiled into hardware. Various customised Java Virtual Machines and MIPS-style processors have been developed using existing FPGAs to evaluate the effectiveness and promise of this approach.

13 citations


Book Chapter
27 Aug 2000
TL;DR: Two reconfigurable design approaches for a two-dimensional Shape-Adaptive Discrete Cosine Transform (2D SA-DCT) are presented and it is demonstrated that the area required for an implementation can be significantly reduced.
Abstract: This paper presents two reconfigurable design approaches for a two-dimensional Shape-Adaptive Discrete Cosine Transform (2D SA-DCT). The SA-DCT is an example of a new type of multimedia video processing algorithm where the computations performed are data dependent. A static design, where the configuration does not change during execution of the task, is presented. The use of a data dependence graph (DDG) is proposed, which represents the computations and input signals required to calculate a particular output signal depending on a variable input parameter. By restructuring the DDG and exploiting possible sharing of FPGA resources for different entities within the SA-DCT, it is demonstrated that the area required for an implementation can be significantly reduced. An alternative dynamic approach is also introduced where the FPGA's configuration may change over time. This is well suited to using dynamically reconfigurable logic but suffers from long reconfiguration time if current FPGAs are used.

12 citations


Book Chapter
27 Aug 2000
TL;DR: Two complementary design models and related synthesis techniques are combined to capture behavioral and structural information in modelling and synthesizing a dynamically reconfigurable system to represent operation-level temporal constraints and dynamic resource constraints in a unified model.
Abstract: In this paper, two complementary design models and related synthesis techniques are combined to capture behavioral and structural information in modelling and synthesizing a dynamically reconfigurable system. The proposed formulation is achieved by using finite domain constraints and related constraint-solving techniques offered by constraint logic programming. Our formulation represents operation-level temporal constraints and dynamic resource constraints in a unified model. Different synthesis tasks, such as temporal partitioning, scheduling and dynamic module allocation, can be modelled in this framework, enabling the discovery of optimal or near-optimal solutions. Experiments have been carried out using a prototype of the high-level synthesis system implemented in CHIP, a constraint logic programming system. Current experimental results show that our approach can provide promising synthesis results in terms of the synthesis time and the number of reconfigurations.

5 citations


Proceedings Article
17 Apr 2000
TL;DR: Techniques for combining serialisation and reconfiguration to produce efficient convolver designs are described and an estimate of the performance of a serial design is given when mapped using a distributed arithmetic core onto a Xilinx Virtex FPGA.
Abstract: This paper describes techniques for combining serialisation and reconfiguration to produce efficient convolver designs. Several optimisation techniques, such as restructuring and pipeline morphing, are presented with an analysis of their impact on performance and resource usage. The proposed techniques do not require the basic processing element to be modified. An estimate of the performance of a serial design is given when mapped using a distributed arithmetic core onto a Xilinx Virtex FPGA. We estimate that a convolver of more than 2000 taps at 470,000 samples per second can be implemented in one quarter of the logic resources of a Virtex XCV300 device.

Proceedings Article
17 Apr 2000
TL;DR: Pipeline vectorization has been found to speed up hardware implementations of vectorizable programs by up to two orders of magnitude, whereas local optimizations only achieve speedup factors smaller than two.
Abstract: Hardware compilation techniques which use high-level programming languages to describe and synthesize hardware are gaining popularity. They are especially useful for reconfigurable computing systems since they provide a fast, easy-to-use, software-like programming environment for users with little hardware design experience. We compare three hardware compilation techniques. First, we study sequential compilation, which produces hardware that evaluates each assignment of the source program in one clock cycle. Next, we evaluate the effects of local parallelizing optimizations. Finally, we apply pipeline vectorization, a method based on software vectorization for synthesizing hardware pipelines, which exploits hardware parallelism globally. Results of all three techniques for several benchmark programs are presented and discussed. Pipeline vectorization has been found to speed up hardware implementations of vectorizable programs by up to two orders of magnitude, whereas local optimizations only achieve speedup factors smaller than two.
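
The contrast between sequential compilation and pipelining in the abstract above can be made concrete with an idealised cycle-count model. This is a sketch under stated assumptions, not the paper's evaluation method: the sequential model charges one cycle per assignment, as the abstract describes, while the pipelined model assumes a loop with no loop-carried dependences, a hypothetical pipeline depth, and one new iteration started every `initiation_interval` cycles. Clock-rate differences between the two circuits are ignored.

```python
def sequential_cycles(iterations, assignments_per_iteration):
    """One clock cycle per assignment, executed strictly in order."""
    return iterations * assignments_per_iteration


def pipelined_cycles(iterations, pipeline_depth, initiation_interval=1):
    """Fill the pipeline once, then finish one iteration per interval."""
    if iterations == 0:
        return 0
    return pipeline_depth + (iterations - 1) * initiation_interval


def cycle_speedup(iterations, assignments_per_iteration, pipeline_depth):
    """Ratio of cycle counts under the two idealised models."""
    return (sequential_cycles(iterations, assignments_per_iteration)
            / pipelined_cycles(iterations, pipeline_depth))


if __name__ == "__main__":
    # Hypothetical loop: 1000 iterations, 8 assignments in the body,
    # mapped to an 8-stage pipeline.
    print(sequential_cycles(1000, 8))
    print(pipelined_cycles(1000, 8))
    print(round(cycle_speedup(1000, 8, 8), 2))
```

For long loops the cycle-count ratio approaches the number of assignments in the loop body. The model counts cycles only and makes no attempt to reproduce the speedups reported in the paper.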

Proceedings Article
29 Oct 2000
TL;DR: A framework for customising designs using appropriate libraries, compilers, validation facilities, application programming interfaces and front-end tools is described and it is shown how circuits can be customised at run time to adapt to changes in the operating conditions.
Abstract: Custom computing involves customising computations for one or more applications in a given implementation technology. We describe a framework for customising designs using appropriate libraries, compilers, validation facilities, application programming interfaces and front-end tools. The development of custom architectures, data formats and operations is presented. We show how circuits can be customised at run time to adapt to changes in the operating conditions. Graphics examples are used throughout the paper to illustrate our approach.

Book Chapter
27 Aug 2000
TL;DR: A tool framework and techniques for combining serialisation and reconfiguration to produce efficient designs and several optimisation techniques, such as restructuring and pipeline morphing, are presented with an analysis of their impact on performance and resource usage.
Abstract: This paper describes a tool framework and techniques for combining serialisation and reconfiguration to produce efficient designs. Convolver and matrix multiplier designs are examined. Several optimisation techniques, such as restructuring and pipeline morphing, are presented with an analysis of their impact on performance and resource usage. The proposed techniques do not require the basic processing element to be modified. An estimate of the performance of the serial designs is given when mapped using distributed arithmetic and constant multiplier cores onto a Xilinx Virtex FPGA.