
Showing papers on "Reconfigurable computing" published in 1994


Proceedings ArticleDOI
10 Apr 1994
TL;DR: This paper presents an architecture that makes it feasible to implement large parallel ANNs with FPGAs, combining stochastic computation techniques with a novel lookup-table-based architecture that fully exploits the lookup-table structure of many FPGAs.
Abstract: Reconfigurable field-programmable gate arrays (FPGAs) provide an effective programmable resource for implementing hardware-based artificial neural networks (ANNs). They are low cost, readily available and reconfigurable, all important advantages for ANN applications. However, FPGAs lack the circuit density necessary to implement large parallel ANNs with many thousands of synapses. This paper presents an architecture that makes it feasible to implement large ANNs with FPGAs. The architecture combines stochastic computation techniques with a novel lookup-table-based architecture that fully exploits the lookup-table structure of many FPGAs. This lookup-table-based architecture is extremely efficient: it is capable of supporting up to two synapses per configurable logic block (CLB). In addition, the architecture is simple to implement, self-contained (weights are stored directly in the synapse), and scales easily across multiple chips.
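
As a rough sketch of the stochastic computation idea the abstract relies on (not the paper's exact lookup-table synapse circuit), the snippet below multiplies two values encoded as random bit streams; in hardware this reduces each synapse multiply to a single AND gate, which is what makes densities like two synapses per CLB plausible. Stream length and seeds are illustrative assumptions.

```python
import random

def stochastic_stream(value, length, seed):
    """Encode a value in [0, 1] as a bit stream whose ones-density equals the value."""
    rng = random.Random(seed)
    return [1 if rng.random() < value else 0 for _ in range(length)]

def stochastic_multiply(x, w, length=4096):
    """Estimate x * w by ANDing two independent stochastic bit streams.

    In an FPGA synapse this is one AND gate per bit; the product is
    recovered by counting ones at the output.
    """
    sx = stochastic_stream(x, length, seed=1)
    sw = stochastic_stream(w, length, seed=2)
    ones = sum(a & b for a, b in zip(sx, sw))
    return ones / length

print(stochastic_multiply(0.5, 0.8))  # roughly 0.4, within stochastic noise
```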

115 citations


Proceedings ArticleDOI
10 Apr 1994
TL;DR: The run-time reconfiguration artificial neural network (RRANN) uses run-time reconfiguration to increase the hardware density of FPGAs and is a flexible realization of the time/space trade-off.
Abstract: Run-time reconfiguration is a way of more fully exploiting the flexibility of reconfigurable FPGAs. The run-time reconfiguration artificial neural network (RRANN) uses run-time reconfiguration to increase the hardware density of FPGAs. The RRANN architecture also allows large amounts of parallelism to be used and is very scalable. RRANN divides the back-propagation algorithm into three sequentially executed stages and configures the FPGAs to execute only one stage at a time. The FPGAs are reconfigured as part of normal execution in order to change stages. Using reconfigurability in this way increases the number of hardware neurons a single Xilinx XC3090 can implement by 500%. Performance is affected by reconfiguration overhead, but this overhead becomes insignificant in large networks and is reduced further by improved configuration methods. Run-time reconfiguration is a flexible realization of the time/space trade-off. The RRANN architecture has been designed and built using commercially available hardware, and its performance has been measured.
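
A software analogue of the three-stage schedule described above, with a placeholder where RRANN would reload the FPGA configuration between stages; the stage boundaries follow the abstract, while the network details (sigmoid units, learning rate, NumPy implementation) are illustrative assumptions only.

```python
import numpy as np

def reconfigure(stage):
    # In RRANN this is where the XC3090s would be reloaded with the
    # bitstream for the next stage; here it only marks the boundary.
    print(f"[reconfigure FPGAs for stage: {stage}]")

def rrann_style_epoch(x, target, w1, w2, lr=0.1):
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

    reconfigure("feed-forward")
    h = sigmoid(w1 @ x)
    y = sigmoid(w2 @ h)

    reconfigure("error backpropagation")
    d_out = (y - target) * y * (1.0 - y)
    d_hid = (w2.T @ d_out) * h * (1.0 - h)

    reconfigure("weight update")
    w2 -= lr * np.outer(d_out, h)
    w1 -= lr * np.outer(d_hid, x)
    return w1, w2

# Tiny usage example with made-up shapes (3 inputs, 4 hidden, 2 outputs).
rng = np.random.default_rng(0)
w1, w2 = rng.normal(size=(4, 3)), rng.normal(size=(2, 4))
w1, w2 = rrann_style_epoch(np.array([0.1, 0.5, 0.9]), np.array([1.0, 0.0]), w1, w2)
```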

114 citations


Journal ArticleDOI
TL;DR: A hardware implementation of a fully digital multilayer perceptron artificial neural network using Xilinx Field Programmable Gate Arrays (FPGAs) and a 1K×8 EPROM is presented.
Abstract: In this paper, the authors present a hardware implementation of a fully digital multilayer perceptron artificial neural network using Xilinx Field Programmable Gate Arrays (FPGAs). Each node is implemented with two XC3042 FPGAs and a 1K×8 EPROM. Training is done offline on a PC. The authors have successfully tested the performance of the network.
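
The abstract does not specify what the 1K×8 EPROM stores; one common arrangement, assumed here purely for illustration, is an 8-bit activation lookup table indexed by the fixed-point net input, as in the sketch below (the table scaling and the neuron function are hypothetical, not taken from the paper).

```python
import math

TABLE_SIZE = 1024  # a hypothetical 1K x 8 table, as an EPROM might hold

# 8-bit sigmoid values over a fixed-point input range (illustrative scaling).
SIGMOID_LUT = [
    round(255.0 / (1.0 + math.exp(-(i - TABLE_SIZE // 2) / 64.0)))
    for i in range(TABLE_SIZE)
]

def neuron(inputs, weights):
    """Integer multiply-accumulate followed by a table lookup for the activation."""
    acc = sum(i * w for i, w in zip(inputs, weights))
    index = max(0, min(TABLE_SIZE - 1, acc // 16 + TABLE_SIZE // 2))
    return SIGMOID_LUT[index]  # 8-bit activation, 0..255

print(neuron([10, 20, 30], [1, -2, 3]))  # example fixed-point inputs and weights
```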

104 citations


Proceedings ArticleDOI
27 Jun 1994
TL;DR: This paper presents the run-time reconfiguration artificial neural network (RRANN), a hardware implementation of the backpropagation algorithm that is extremely scalable and makes efficient use of FPGA resources.
Abstract: Field programmable gate arrays (FPGAs) are an excellent technology for implementing neural network hardware. This paper presents the run-time reconfiguration artificial neural network (RRANN). RRANN is a hardware implementation of the backpropagation algorithm that is extremely scalable and makes efficient use of FPGA resources. One key feature is RRANN's ability to exploit parallelism in all stages of the backpropagation algorithm, including the stage where errors are propagated backward through the network. This architecture has been designed and implemented on Xilinx XC3090 FPGAs, and its performance has been measured.

71 citations


Proceedings ArticleDOI
10 Apr 1994
TL;DR: Four generations of FPGA-based reconfigurable hardware accelerators for Programmable Active Memories have been designed and implemented, and this paper presents the software lessons the authors draw from that collective experience.
Abstract: Digital Equipment's Paris Research Laboratory started to investigate the use of FPGA-based reconfigurable hardware accelerators in 1988. We call them PAMs, for Programmable Active Memories. Over the past six years, we have designed and implemented four generations of PAM hardware and four generations of PAM programming environments. Several dozen people, ranging from inexperienced students to senior hardware designers, have used our systems. A wide range of applications belonging to several important application domains have demonstrated the value of these novel computing devices. In this paper we present the software lessons we draw from this collective experience.

69 citations


Proceedings ArticleDOI
W.S. Carter
10 Oct 1994
TL;DR: Field programmable gate arrays (FPGAs) combine the high-integration benefits of gate arrays with the time-to-market benefits of a user-programmable device.
Abstract: Field programmable gate arrays (FPGAs) combine the high-integration benefits of gate arrays with the time-to-market benefits of a user-programmable device. FPGAs are already a mainstream logic technology, and the growth rate of FPGA usage will continue to exceed that of other ASIC technologies. FPGA technology is having a major impact on electronic system design, especially through the use of FPGAs as reconfigurable computing elements.

57 citations


Proceedings ArticleDOI
10 Oct 1994
TL;DR: An abstract model to investigate the area and delay of field programmable gate array architectures is presented and shows that a system implemented on FPGAs will require as much as 100 times more die area than its custom VLSI implementation and would be about 10 times slower.
Abstract: This paper examines the limitations of integrating programmable logic with a powerful core processor on the same die. An abstract model to investigate the area and delay of field programmable gate array architectures is presented. The model is used to show that a system implemented on FPGAs will require as much as 100 times more die area than its custom VLSI implementation and would be about 10 times slower. Our analysis shows that this high cost, inherent to the current FPGA-based architectures, is a severe limitation to virtual hardware development. A new approach to cell architecture and array organization is needed to deliver high computational speed-ups comparable to multiple processor systems with the same total die area.
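
Combining the two reported factors multiplicatively (a back-of-the-envelope step, not a figure quoted in the abstract) suggests roughly a thousandfold area-delay penalty:

```python
# Rough combination of the reported factors, assuming they compose multiplicatively.
area_factor = 100    # FPGA die area relative to a custom VLSI implementation
delay_factor = 10    # FPGA delay relative to a custom VLSI implementation

print(area_factor * delay_factor)  # ~1000x worse area-delay product
```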

46 citations


Proceedings ArticleDOI
10 Apr 1994
TL;DR: The suitability of FPGA devices for implementing graphics algorithms is analysed by a series of experiments and a new design method (based on virtual memory) is presented that exploits the dynamically reconfigurable nature ofFPGAs.
Abstract: The suitability of FPGA devices for implementing graphics algorithms is analysed by a series of experiments. The performance of simple and complicated graphics algorithms on two kinds of FPGAs is compared with the performance of existing custom graphics chips and against general-purpose processors with specialised instruction sets. Various architectures for incorporating FPGA-based systems into graphics workstations are discussed. Finally, a new design method (based on virtual memory) is presented that exploits the dynamically reconfigurable nature of FPGAs.

45 citations


Proceedings ArticleDOI
10 Apr 1994
TL;DR: A reconfigurable data-driven datapath architecture for ALUs is presented which may be used for custom computing machines, Xputers and other adaptable computer systems as well as for rapid prototyping of high speed datapaths.
Abstract: A reconfigurable data-driven datapath architecture for ALUs is presented which may be used for custom computing machines (CCMs), Xputers (a class of CCMs) and other adaptable computer systems, as well as for rapid prototyping of high speed datapaths. Fine grained parallelism is achieved by using simple reconfigurable processing elements called datapath units (DPUs). The word-oriented datapath simplifies the mapping of applications onto the architecture, and pipelining is supported. The programming environment allows automatic mapping of the operators from high level descriptions. Two implementations, one using FPGAs and one using standard cells, are shown.

43 citations


Proceedings ArticleDOI
01 May 1994
TL;DR: Run-time reconfiguration is a way of more fully exploiting the flexibility of reconfigurable field programmable gate arrays (FPGAs) in order to increase the hardware density of FPGAs.
Abstract: Run-time reconfiguration is a way of more fully exploiting the flexibility of reconfigurable field programmable gate arrays (FPGAs). The run-time reconfiguration artificial neural network (RRANN) uses run-time reconfiguration to increase the hardware density of FPGAs. This is done by dividing the backpropagation algorithm into three sequentially executed stages and configuring the FPGAs to execute only one stage at a time. The FPGAs are reconfigured as part of normal execution in order to change stages. Using reconfigurability in this way increases the number of hardware neurons a single FPGA can implement by 500%. The RRANN architecture has been designed and built using commercially available hardware, and its performance has been measured.

39 citations


Proceedings ArticleDOI
10 Apr 1994
TL;DR: This work describes how a behavioral synthesis system can be used to create designs for FPGA-based computing systems directly from a specification of the desired algorithm, which reduces design times and design errors.
Abstract: We describe how a behavioral synthesis system can be used to create designs for FPGA-based computing systems directly from a specification of the desired algorithm. This higher level of specification reduces design times and design errors. Our target hardware is called the Rasa Board and is composed of three Xilinx FPGAs interconnected with Aptix Field Programmable Interconnect Chips. We address two significant challenges in this research effort: the synthesis of multiple FPGA designs, and the improvement of the design through designer interaction.

Proceedings ArticleDOI
23 May 1994
TL;DR: The present CHAMP implementation is based on Xilinx FPGAs; off-the-shelf development tools have been integrated with a custom library of macros as part of the CHAMP design, allowing development at the algorithm level while retaining preprocessor performance.
Abstract: Programmable preprocessing solutions are often unable to meet the required performance. Custom hardware implementations of preprocessors, however, are seldom reusable, flexible or quickly realized. The Configurable Hardware Algorithm Mappable Preprocessor (CHAMP) technology is a solution to these problems. Developments in field programmable gate array (FPGA) hardware and software have made a reconfigurable preprocessor with custom hardware performance but generic hardware flexibility possible. The key advancements are larger, faster RAM and electrically erasable devices, routers with deadline timers, and synthesis tools with user definable macros. Ongoing work will make reconfigurable preprocessors more powerful. The present CHAMP implementation is based on Xilinx FPGAs. Its architecture consists of multiple reconfigurable processing elements connected through both a ring network and a global crossbar network. It is packaged as a VME 6U×160 slave board with two high speed reconfigurable parallel interfaces. To allow development at the algorithm level while retaining preprocessor performance, off-the-shelf development tools have been integrated with a custom library of macros as part of CHAMP design. As a verification of the technology, an advanced IRMW application was mapped onto the CHAMP architecture achieving greater than 1 BOPS of real time throughput while utilizing 75% of the CHAMP board's processing resources.

Proceedings ArticleDOI
10 Apr 1994
TL;DR: A new custom computing architecture which is specifically designed for efficient implementation of DSP algorithms is described, and it is concluded that custom computers based only on FPGA execution units show little performance improvement over state-of-the-art workstations.
Abstract: When FPGA logic circuits are incorporated within a stored-program computer, the result is a machine where the programmer can design both the software and the hardware that will execute that software. This paper first describes some of the more important custom computers, and their potential weakness as DSP implementation platforms. It then describes a new custom computing architecture which is specifically designed for efficient implementation of DSP algorithms. Finally, it presents a simple performance comparison of a number of DSP implementation alternatives, and concludes that: the new custom computing architecture is worthy of further investigation; and that custom computers based only on FPGA execution units show little performance improvement over state-of-the-art workstations.


Book ChapterDOI
07 Sep 1994
TL;DR: The Helion Fast MD5 core implements the MD5 hash algorithm as described in RFC 1321 and is a high performance core which has been designed especially for use in Xilinx FPGAs.
Abstract: The Helion Fast MD5 core implements the MD5 hash algorithm as described in RFC 1321. It is a high performance core which has been designed especially for use in Xilinx FPGAs. The MD5 algorithm takes as input a message of arbitrary length, processes the message as a series of 512-bit blocks, and produces as output a compressed representation of the message data in the form of a 128-bit message digest.
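
For reference, the block and digest sizes mentioned above can be checked against any standard MD5 implementation; the sketch below uses Python's hashlib with the "abc" test string from the RFC 1321 test suite.

```python
import hashlib

# MD5 maps an arbitrary-length message to a 128-bit (16-byte) digest,
# processing the padded input as a series of 512-bit blocks (RFC 1321).
digest = hashlib.md5(b"abc").digest()
print(len(digest) * 8)   # 128
print(digest.hex())      # 900150983cd24fb0d6963f7d28e17f72 (RFC 1321 test value)
```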

Proceedings ArticleDOI
30 May 1994
TL;DR: A hierarchical interconnection architecture for field programmable gate arrays (FPGAs) is described, in which logic blocks or clusters of logic blocks are connected together with switch blocks.
Abstract: To overcome the low speed and low density problems in FPGAs, we must reduce the number of switches used in the routing paths without sacrificing the routability of an FPGA. A hierarchical interconnection architecture for field programmable gate arrays (FPGAs) is described. In every level of the hierarchy, logic blocks or clusters of logic blocks are connected together with switch blocks. Experiments on benchmark circuits are shown, and significant improvement in performance is achieved.
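
As a toy model of why hierarchy can reduce the number of switches a long connection crosses (for intuition only; this is not the paper's architecture or experimental method), compare a flat island-style estimate with a tree-of-clusters estimate:

```python
import math

def flat_route_switches(manhattan_distance):
    # Island-style estimate: roughly one switch block per channel
    # segment traversed along the route.
    return manhattan_distance

def hierarchical_route_switches(num_blocks, cluster_size):
    # Tree-of-clusters estimate: climb to the common ancestor level and
    # descend again, about one switch block per level in each direction.
    levels = math.ceil(math.log(num_blocks, cluster_size))
    return 2 * levels

print(flat_route_switches(32))               # ~32 switches on a long flat route
print(hierarchical_route_switches(1024, 4))  # ~10 switches across the hierarchy
```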

Book ChapterDOI
02 Mar 1994
TL;DR: This work analyzes in detail some implementations of a challenging yet simple application: CERN's calorimeter.
Abstract: We analyze in detail some implementations of a challenging yet simple application: CERN's calorimeter. We try both general-purpose computer architectures (single and multiprocessor, SIMD and MIMD) and special-purpose electronics (full-custom, gate array, FPGA) on the problem.

Book ChapterDOI
07 Sep 1994
TL;DR: Hardware acceleration of static and delay fault simulation, and the acceleration of simulating new BIST techniques, are examined.
Abstract: Circuit emulation, using dynamically reconfigurable hardware, is a high speed alternative to circuit simulation, especially for large and complex designs. Dynamic reconfiguration enhances the ability to efficiently analyse the test of combinational and sequential circuits by providing statistical information on fault grading, detectability, and signature analysis. In this paper we examine hardware acceleration of static and delay fault simulation, and the acceleration of simulating new BIST techniques.

Patent
11 Jul 1994
TL;DR: In this article, a system for physical emulation of electronic circuits or systems includes a data entry workstation where a user may input data representing the circuit or system configuration, converted to a form suitable for programming an array of programmable gate elements provided with a richly interconnected architecture.
Abstract: A system for physical emulation of electronic circuits or systems includes a data entry workstation where a user may input data representing the circuit or system configuration. This data is converted to a form suitable for programming an array of programmable gate elements provided with a richly interconnected architecture. Provision is made for externally connecting VLSI devices or other portions of a user's circuit or system. A network of internal probing interconnections is made available by utilizing unused circuit paths in the programmable gate arrays.

Proceedings ArticleDOI
19 Sep 1994
TL;DR: This paper describes the second generation Optimized Reconfigurable Cell Array (ORCA) Field-Programmable Gate Arrays (FPGAs), a family of high capacity and high speed FPGAs.
Abstract: This paper describes the second generation Optimized Reconfigurable Cell Array (ORCA) Field-Programmable Gate Arrays (FPGAs). Architectural innovations combined with advanced 0.5 µm process technology result in a family of high capacity and high speed FPGAs. New types of routing resources are included on the FPGA to ensure routing completion. The first ORCA part in the 2C series, the ATT2C15, contains approximately 2.5 million FETs and has a typical logic capability of about 15,000 usable gates. Preliminary benchmark results confirm the speed and logic capacity of the new parts.

Proceedings ArticleDOI
10 Apr 1994
TL;DR: A customized computing platform is described which accelerates most of the compute-intensive calculations of a template deforming image segmentation algorithm and is parameterized so that alternative solutions can be evaluated according to technological advances.
Abstract: We describe a customized computing platform we are developing to accelerate a computer vision application. An FPGA-based coprocessor solution is derived which accelerates most of the compute-intensive calculations of a template deforming image segmentation algorithm. Design issues are identified and performance results reported. The results are parameterized so that alternative solutions can be evaluated according to technological advances. We discuss how implementing such a design with available reconfigurable logic platforms can be a worthwhile tool for the development of customized FPGA-based computing solutions.

Proceedings Article
31 Dec 1994
TL;DR: An algorithm, A2R2, and its implementation on a massively parallel system are presented; the implementation outperforms previously published genetic database scanning hardware and algorithms, and the motif structure it searches can support those of the FAST, BLAST and FLASH algorithms.
Abstract: Homology detection in large databases is probably the most time consuming operation in molecular genetic computing systems. Moreover, the progress made around the world in mapping and sequencing the genome of Homo sapiens and other species has increased the size of databases exponentially. Therefore even the best workstation would not be able to reach the scanning speed required. In order to answer this need we propose an algorithm, A2R2, and its implementation on a massively parallel system. Basically, two kinds of algorithms are used to search molecular genetic databases: the first kind is based on dynamic programming and the second on word processing; A2R2 belongs to the second kind. The structure of the motif (pattern) searched by A2R2 can support those from the FAST, BLAST and FLASH algorithms. After a short presentation of the reconfigurable hardware concept and technology used in our massively parallel accelerator, we present the A2R2 implementation. This parallel implementation outperforms any previously published genetic database scanning hardware or algorithms. We report up to 25 million nucleotides scanned per second as our best result.
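
A2R2 itself is described only at a high level here; the sketch below shows the generic word-based (k-mer) seed matching that FAST/BLAST-family scanners share, which is the class of algorithm the abstract places A2R2 in. The sequences, word length and lack of scoring are illustrative simplifications.

```python
from collections import defaultdict

def build_word_index(database, k=4):
    """Index the start position of every length-k word in the database sequence."""
    index = defaultdict(list)
    for i in range(len(database) - k + 1):
        index[database[i:i + k]].append(i)
    return index

def scan(query, index, k=4):
    """Return database positions that share a word with the query.

    Real word-based scanners extend and score these seed hits; the
    hardware's job is to produce them at very high throughput.
    """
    hits = set()
    for i in range(len(query) - k + 1):
        hits.update(index.get(query[i:i + k], []))
    return sorted(hits)

index = build_word_index("ACGTACGTTGCAACGT")
print(scan("TTGCA", index))  # seed hits at positions 7 and 8
```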

Dissertation
01 Jan 1994
TL;DR: A system has been developed in which computationally intensive software routines are identified and implemented on dedicated, reconfigurable hardware, increasing processing speeds without the cost and permanence of custom ASICs.
Abstract: Reconfigurable logic devices allow for great flexibility in the ways computing tasks are accomplished. With them, operations which require the flexibility of software at the high speeds of custom hardware can be performed. A system has been developed in which computationally intensive software routines are identified and implemented on dedicated, reconfigurable hardware, increasing processing speeds without the cost and permanence of custom ASICs. Hardware subroutines maintain the abstraction, portability, and ease of implementation of software. Several algorithms have been demonstrated on the Virtual Wires board, illustrating the benefits and possibilities of such a system.

Proceedings ArticleDOI
19 Apr 1994
TL;DR: A new custom computing architecture which uses a processing node with three sections: a standard arithmetic chip, static RAM and reconfigurable logic for operand handling is described.
Abstract: Field programmable gate arrays (FPGAs) can be rapidly reconfigured to provide different digital logic functions. When such FPGA logic circuits are incorporated within a stored-program computer, the result is a machine where the programmer can design both the software and the hardware that will execute that software. This paper first surveys this area of custom computing. It then describes a new custom computing architecture which uses a processing node with three sections: a standard arithmetic chip, static RAM and reconfigurable logic for operand handling. Finally an analysis of the suitability of this new approach for implementation of DSP applications shows it to be worthy of further investigation.

Proceedings Article
01 Jan 1994
TL;DR: This research proves that transformable computing is the next frontier in computer science, and the achievements as featured represent the building blocks of a new order of computers.
Abstract: The FPGA (Field Programmable Gate Array) was introduced in 1986 and was created for designers requiring a solution that bridged the gap between PALs (Programmable Array Logic) and ASICs (Application Specific Integrated Circuit). Exploitation of the FPGA's reconfigurable capabilities by independent researchers throughout the world has shown that computationally intensive software algorithms can be transposed directly into hardware design for extreme performance gain. This continuing research has spawned numerous developments in the arena of high performance supercomputing. We believe this research proves that transformable computing is the next frontier in computer science. These achievements as featured represent the building blocks of a new order of computers.

G. Hamid
05 Dec 1994
TL;DR: An overview of the first ever FPGA-based coprocessor for an image processing system that provides the performance of custom hardware with the added flexibility of an equivalent software-based system is presented.
Abstract: Reconfigurable field programmable gate arrays (FPGAs) are a new technology suitable for building fast and flexible processing systems. This paper contains an overview of the first ever FPGA-based coprocessor for an image processing system. The system provides the performance of custom hardware with the added flexibility of an equivalent software-based system. Benchmark results are given for the time taken to perform two typical image processing functions.

X. Yu, D. Dent
09 Mar 1994
TL;DR: The authors present research work on implementing a specific trained neural network in Xilinx XC4000 series FPGAs as a portable digital system prototype for a heart disease classification process.
Abstract: The reconfigurability of certain field programmable gate arrays (FPGAs) has shown their advantage of flexibility in digital system design. With the availability of greater density and higher speed FPGAs, the ability to realise special purpose processors will become possible. The authors present research work on implementing a specific trained neural network in Xilinx XC4000 series FPGAs as a portable digital system prototype for a heart disease classification process.

Proceedings ArticleDOI
21 Jun 1994
TL;DR: This work presents a hardware emulation environment based on dynamically reconfigurable field programmable devices for hardware acceleration of fault simulation in a built-in self-test environment and rapid prototyping of new BIST techniques.
Abstract: Circuit emulation, using dynamically reconfigurable hardware is a high speed alternative to circuit simulation, especially for large and complex designs. Dynamic reconfiguration enhances the ability to efficiently analyze the test of combinational and sequential circuits by providing statistical information on fault grading, detectability, and signature analysis. We present a hardware emulation environment based on dynamically reconfigurable field programmable devices. For this work our main interests are in hardware acceleration of fault simulation in a built-in self-test environment and rapid prototyping of new BIST techniques.

Proceedings ArticleDOI
01 Jan 1994
TL;DR: This paper discusses aspects to consider when mapping array multiplier designs to field programmable gate arrays (FPGAs), and two design examples are developed and mapped to the Xilinx family of FPGAs.
Abstract: This paper discusses aspects to consider when mapping array multiplier designs to field programmable gate arrays (FPGAs). FPGAs provide configurable logic through an array of configurable logic modules interconnected by programmable routing resources and surrounded by programmable input/output blocks. However, due to the lack of consistent structure, most typical designs do not map well to FPGAs. The structure of array multipliers makes them a natural fit for FPGA realization, potentially delivering the performance and utilization originally promised at the introduction of FPGAs. Two design examples are developed and mapped to the Xilinx family of FPGAs. The results of this effort are reported and projections are made as to how the designs' performance varies when they are scaled for larger applications and the speed grades of the components are changed.
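
As a rough illustration of the regular structure the paper exploits (not a reproduction of its Xilinx designs), the sketch below forms one row of AND-gate partial products per multiplier bit and accumulates them, mirroring the repeated adder rows of a hardware array multiplier.

```python
def array_multiply(a, b, width=8):
    """Unsigned multiply written to mirror the regular row structure of
    an array multiplier: one partial-product row per bit of b."""
    a_bits = [(a >> i) & 1 for i in range(width)]
    product = 0
    for j in range(width):
        b_j = (b >> j) & 1
        # A row of AND gates forms the partial product; the shift models the
        # row's positional offset, and the addition models the adder row.
        partial = sum((a_i & b_j) << i for i, a_i in enumerate(a_bits))
        product += partial << j
    return product

assert array_multiply(13, 11) == 143
print(array_multiply(200, 123))  # 24600, within the 8x8 -> 16-bit product range
```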

Journal ArticleDOI
TL;DR: It is expected that more and more digital designs will be built on FPGAs, and FPGA based designs will outnumber the MPGA based designs in the near future.
Abstract: Advances in VLSI technology have brought very complex digital systems onto a very small area of silicon. However, the design cycle times associated with the conventional VLSI designs have called for a technology that will support very rapid system prototyping. Also, the non-recurring engineering (NRE) costs associated with existing VLSI technologies are very high. An ideal design style should support rapid prototyping at almost no or very nominal NRE costs. FPGA is an attempt in this direction. FPGAs have a potential for matching both speed and density of mask programmed gate arrays (MPGAs). It is expected that more and more digital designs will be built on FPGAs, and FPGA based designs will outnumber the MPGA based designs in the near future. The author begins by discussing the nature of FPGAs and their architecture (with design tools). The author goes on to discuss custom computing. >