
Showing papers presented at "Southern Conference Programmable Logic in 2009"


Proceedings ArticleDOI
01 Apr 2009
TL;DR: Cube, a massively parallel FPGA-based platform, is presented; an RC4 key search engine running on a fully implemented Cube can search the full 40-bit key space within 3 minutes, 359 times faster than a multi-threaded software implementation running on a 2.5GHz Intel Quad-Core Xeon processor.
Abstract: Cube, a massively parallel FPGA-based platform, is presented. The machine is made from boards each containing 64 FPGA devices, and eight boards can be connected in a cube structure for a total of 512 FPGA devices. With high-bandwidth systolic inter-FPGA communication and a flexible programming scheme, the result is a low-power, high-density and scalable supercomputing machine suitable for various large-scale parallel applications. An RC4 key search engine was built as a demonstration application. In a fully implemented Cube, the engine can perform a full search of the 40-bit key space within 3 minutes, 359 times faster than a multi-threaded software implementation running on a 2.5GHz Intel Quad-Core Xeon processor.

60 citations
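
The exhaustive key search in the Cube entry above can be illustrated in software at toy scale. The sketch below assumes the standard RC4 algorithm (key scheduling followed by keystream generation) and a known-plaintext check; the function names and the reduced key size are illustrative assumptions, not details taken from the paper. The last helper reproduces the arithmetic behind the headline figure: covering 2^40 keys in 180 seconds requires roughly 6.1 billion key trials per second.

```python
def rc4_keystream(key: bytes):
    """Yield RC4 keystream bytes for the given key (standard KSA + PRGA)."""
    s = list(range(256))
    j = 0
    for i in range(256):  # key-scheduling algorithm
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    i = j = 0
    while True:  # pseudo-random generation algorithm
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        yield s[(s[i] + s[j]) % 256]


def rc4_encrypt(key: bytes, data: bytes) -> bytes:
    """XOR the data with the RC4 keystream (encryption and decryption are identical)."""
    return bytes(b ^ k for b, k in zip(data, rc4_keystream(key)))


def search_key(known_plain: bytes, cipher: bytes, key_bits: int):
    """Try every key of key_bits bits in order; practical only for toy key sizes."""
    n_bytes = (key_bits + 7) // 8
    for candidate in range(1 << key_bits):
        key = candidate.to_bytes(n_bytes, "big")
        if rc4_encrypt(key, known_plain) == cipher[: len(known_plain)]:
            return key
    return None


def required_rate(key_bits: int, seconds: float) -> float:
    """Key trials per second needed to cover the whole key space in the given time."""
    return (1 << key_bits) / seconds
```

A hardware engine would run many such trial pipelines in parallel, one slice of the key space per FPGA; the sequential loop here only shows what each trial computes.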


Proceedings ArticleDOI
01 Apr 2009
TL;DR: The results highlight that PRESENT is well suited for high-speed and high-throughput applications; its hardware efficiency, i.e. the throughput per slice, is especially noteworthy.
Abstract: In this paper we investigate the performance of the block cipher PRESENT on FPGAs. We provide implementation results of a design optimized for efficiency (i.e. throughput per slice) and compare them with other block ciphers. Though PRESENT was originally designed with a minimal hardware footprint in mind, our results also highlight that PRESENT is well suited for high-speed and high-throughput applications. Its hardware efficiency, i.e. the throughput per slice, is especially noteworthy.

40 citations


Proceedings ArticleDOI
01 Apr 2009
TL;DR: A low-cost FPGA-based solution for a real-time moving object tracking system is presented, based on a soft RISC processor capable of running a kernel-based mean shift tracking algorithm within the required time constraint.
Abstract: This paper presents a low-cost FPGA-based solution for a real-time moving object tracking system. A specialized architecture is presented based on a soft RISC processor capable of running a kernel-based mean shift tracking algorithm. The system includes a frame grabber unit that stores the video frame in DDR RAM using direct memory access, a video display unit to monitor the tracking statistics, and a soft processor capable of running the mean shift tracking algorithm within the required time constraint.

35 citations


Proceedings ArticleDOI
01 Apr 2009
TL;DR: This paper presents FuSE, a single fault injection tool which covers multiple domains as well as different fault injection purposes, and has been designed for usage with the SEmulator® - an FPGA-based hardware accelerator.
Abstract: The ongoing miniaturization of digital circuits makes them more and more susceptible to faults, which also complicates the design of fault-tolerant systems. In this context, fault injection plays an important role in the process of fault tolerance validation. As a result, many fault injection tools have emerged during the last decade. However, these tools only operate on specific domains and can therefore be referred to as hardware- or software-based, simulation- or emulation-based techniques. In this paper we present FuSE, a single fault injection tool which covers multiple domains as well as different fault injection purposes. FuSE has been designed for usage with the SEmulator®, an FPGA-based hardware accelerator. The created tool set fully automates the fault injection process and only requires a VHDL description and a testbench of the circuit under test. FuSE can then perform fault injection experiments with a diagnostic resolution that is known from simulation-based approaches, but at a speed that even handles long-running experiments with ease.

34 citations


Proceedings ArticleDOI
01 Apr 2009
TL;DR: Results for big operands show that the decimal adder works faster than an equivalent binary implementation; furthermore, the coding/decoding processes are no longer needed.
Abstract: This paper presents a study of the classical BCD adders from which a carry-chain type adder is redesigned to fit within the Xilinx FPGAs. Some new concepts are presented to compute the P and G functions for carry-chain optimization purposes. Several alternative designs are then presented with the corresponding time performances and area consumption figures. In order to compare the results, the straight implementation of a decimal ripple-carry adder and the FPGA-optimized base 2 adder for the same range are implemented. Results for big operands show that the decimal adder works faster than an equivalent binary implementation; furthermore, the coding/decoding processes are no longer needed.

20 citations
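
The P and G functions mentioned in the BCD adder entry above can be illustrated with the textbook decimal formulation: for each digit position with sum s = a + b, the position generates a carry when s > 9 and propagates an incoming carry when s = 9. The sketch below is a minimal software model under that assumption; it is not the paper's optimized carry-chain mapping, and the function name and digit ordering are illustrative choices.

```python
def bcd_add(a_digits, b_digits, carry_in=0):
    """Add two equal-length lists of decimal digits (least significant first).

    Returns (sum_digits, carry_out).
    """
    if len(a_digits) != len(b_digits):
        raise ValueError("operands must have the same number of digits")
    sums = [a + b for a, b in zip(a_digits, b_digits)]
    gen = [s > 9 for s in sums]    # G: position produces a carry on its own
    prop = [s == 9 for s in sums]  # P: position passes an incoming carry on
    carries = [bool(carry_in)]
    for g, p in zip(gen, prop):
        # Carry recurrence c[i+1] = G[i] or (P[i] and c[i]); this is the part
        # an FPGA carry chain evaluates.
        carries.append(g or (p and carries[-1]))
    result = [(s + int(c)) % 10 for s, c in zip(sums, carries)]
    return result, int(carries[-1])
```

For example, adding 479 and 526 with digits listed least significant first gives the digits of 005 with a carry out of 1, i.e. 1005.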


Proceedings ArticleDOI
01 Apr 2009
TL;DR: This paper proposes the implementation of a flexible interconnection network that supports the dynamic placement of tasks in dynamically reconfigurable circuits; the proposed architecture is fully compliant with present state-of-the-art devices such as the Xilinx Virtex family of FPGAs.
Abstract: Dynamic reconfiguration of FPGAs allows the dynamic management of the various tasks that describe an application. For optimization purposes, this feature permits tasks to be placed online in an available region of the FPGA. Dynamic reconfiguration of tasks leads to some communication problems, since tasks are not present in the matrix during the whole computation time. This dynamicity needs to be supported by the interconnection network. In this paper, we propose the implementation of a flexible interconnection network supporting such dynamicity. The proposed architecture is fully compliant with present state-of-the-art dynamically reconfigurable circuits such as the Xilinx Virtex family of FPGAs.

15 citations


Proceedings ArticleDOI
01 Apr 2009
TL;DR: A codesign methodology is reported for building hardware accelerators that minimize the running time of a protein energy minimization algorithm; significant speedups are obtained by moving core time-consuming functions onto an FPGA.
Abstract: Bioinformatics applications are computationally very expensive programs. They work with large data sets, consume a lot of CPU cycles and often require high degrees of precision. An important application in this area is tertiary structure prediction of proteins. This paper reports a codesign methodology to build hardware accelerators to minimize the running time of a protein energy minimization algorithm. It has been shown that significant speedups can be obtained by moving core time-consuming functions onto an FPGA: a 5-fold decrease in the run time of the application can be achieved by simply moving one core function into hardware. Up to an order of magnitude improvement in run time can be obtained by moving two functions (core functions in many other bioinformatics applications), which consume 99% of the CPU cycles in the chosen application. A generalized speedup analysis using single and multiple FPGA cards has also been presented.

15 citations


Proceedings ArticleDOI
01 Apr 2009
TL;DR: A novel hardware-based XML parsing technique that makes use of a content-addressable memory that must be configured with a skeleton of the XML document being parsed, multiple state machines acting on the multilevel nature of XML, and dual-port memory modules is presented.
Abstract: Efficient XML parsing has become a key requirement in the underlying technologies of web information and distributed systems. Even though new software-based XML parsing techniques have been studied to improve XML processing, the verbose nature of XML makes substantial improvement hard to achieve. Hardware-based solutions can be an obvious choice to parse XML in a very efficient manner. In this paper, a novel hardware-based XML parsing technique is presented. The technique makes use of (1) a content-addressable memory that must be configured with a skeleton of the XML document being parsed, (2) multiple state machines acting on the multilevel nature of XML, and (3) dual-port memory modules. The architecture of an XML parser using this technique and its associated modules is designed, implemented, and tested on an FPGA. Our results show that an average processing rate of at least two bytes of XML data per clock cycle is achieved.

14 citations


Proceedings ArticleDOI
01 Apr 2009
TL;DR: This paper describes the design and implementation of an embedded architecture for real-time PIV based on FPGA technology, and proposes a bus architecture to manage multiple interfaces among the processing modules and external devices.
Abstract: Particle image velocimetry (PIV) allows measuring distributed flow velocity fields. It is well established as an experimental tool in modern fluid dynamics research, being applied to liquid, gas and multiphase flows. Images of tracer particles are processed by means of a statistical strategy, which makes its real-time implementation difficult to achieve. In this paper, we describe the design and implementation of an embedded architecture for real-time PIV based on FPGA technology. The proposed scheme has allowed us to exploit the low-level parallelization in both the direct cross-correlation computation and the handling of interrogation windows. We propose a bus architecture to manage multiple interfaces among the processing modules and external devices. By using this scheme, we achieved design flexibility and improved processing speed. Major benefits of the speed improvement are enhanced experimental capabilities like feedback control, on-line flow regime visualization, and a significant speedup in off-line processing. We show experimental results of a physical velocity field calculated in real time.

13 citations


Proceedings ArticleDOI
01 Apr 2009
TL;DR: A novel double digit decimal multiplication (DDDM) technique is presented that performs 2 digit multiplications simultaneously in one clock cycle, offering low latency and high throughput.
Abstract: Decimal multiplication is an integral part of financial, commercial, and internet-based computations. This paper presents a novel double digit decimal multiplication (DDDM) technique that performs 2 digit multiplications simultaneously in one clock cycle. This design offers low latency and high throughput. When multiplying two n-digit operands to produce a 2n-digit product, the design has a latency of [(n/2) + 1] cycles. The paper presents area and delay comparisons for 7-digit, 16-digit and 34-digit double digit decimal multipliers on different families of Xilinx, Altera, Actel and QuickLogic FPGAs. The multipliers presented can be extended to support decimal floating-point multiplication for the IEEE P754 standard.

12 citations


Proceedings ArticleDOI
01 Apr 2009
TL;DR: The 3SP Design Space Exploration System automatically quantifies acceleration opportunities for programs across a wide range of heterogeneous architectures to allow designers to identify promising implementation platforms before investing in a particular hardware/software codesign.
Abstract: This paper introduces the 3SP Design Space Exploration System. 3SP automatically quantifies acceleration opportunities for programs across a wide range of heterogeneous architectures to allow designers to identify promising implementation platforms before investing in a particular hardware/software codesign. 3SP uses a novel program execution model to integrate comprehensive hardware characteristics including clock speed, number of execution units, issue rates, bandwidths and latencies with software program execution, parallelism, control and data flow measurements to estimate codesign performance for evaluating opportunities for hardware acceleration.

Proceedings ArticleDOI
01 Apr 2009
TL;DR: Software is proposed that automatically generates compilable VHDL modules implementing Ciratefi in Field Programmable Gate Array (FPGA) devices; it reduces the time to process a frame from 7s to 1.06ms, which may lead to cost-effective high-performance co-processing computer vision systems.
Abstract: Template matching is a classical problem in computer vision. It consists of detecting the presence of a given template in a digital image. This task becomes considerably more complex with invariance to rotation, scale, translation, brightness and contrast (RSTBC). A novel RSTBC-invariant robust template matching algorithm named Ciratefi was recently proposed. However, its execution on a conventional computer takes several seconds. Moreover, the implementation of its general version in hardware is difficult, because there are many adjustable parameters. This paper proposes software that automatically generates compilable VHDL (hardware description language) modules that implement Ciratefi in Field Programmable Gate Array (FPGA) devices. The proposed solution reduces the time to process a frame from 7s (on a 3GHz PC) to 1.06ms. This excellent performance (more than that required for a real-time system) may lead to cost-effective high-performance co-processing computer vision systems.

Proceedings ArticleDOI
01 Apr 2009
TL;DR: A secure FPGA is proposed for secure implementation of crypto devices; it is resistant against multiple side channel attacks such as Power Attacks and Fault Attacks, and an AES implementation shows the native resistance of SCAR-FPGA.
Abstract: In the design of embedded systems for security applications, flexibility and tamper-resistance are two important factors to be considered. The high frequency of updates, and the high costs and long design times of ASICs, urge us to use a secure FPGA as an alternative. In this paper a secure FPGA is proposed for secure implementation of crypto devices. The FPGA architecture is based on an asynchronous methodology and is resistant against multiple side channel attacks such as Power Attacks and Fault Attacks. An implementation of the AES algorithm shows the native resistance of SCAR-FPGA.

Proceedings ArticleDOI
01 Apr 2009
TL;DR: The SystemVerilog implementation of the Open Verification Methodology is exercised on an 8b/10b RTL open core design in the hope of being a simple yet complete exercise to expose the key features of OVM.
Abstract: The SystemVerilog implementation of the Open Verification Methodology (OVM) is exercised on an 8b/10b RTL open core design in the hope of being a simple yet complete exercise to expose the key features of OVM. Emphasis is placed on the actual usage of the verification components rather than a complete verification flow, with the aim of helping readers unfamiliar with OVM who seek to apply the methodology to their own designs. A link to the complete code is given to reinforce this aim. We found the methodology easy to use but intimidating at first glance, especially for someone with little experience in object-oriented programming. However, the flexibility, portability and reusability of verification code become clear once you manage to take the first steps.

Proceedings ArticleDOI
01 Apr 2009
TL;DR: This work presents a chopper control written in VHDL (very high speed integrated circuit hardware description language, VHSIC-HDL) to control a PMDC (permanent magnet direct current) motor, simulated using two programs in co-simulation mode: MATLAB/Simulink and ModelSim.
Abstract: In many industrial applications it is necessary to convert a constant-voltage dc (direct current) source into a variable-voltage/variable-current source for the speed control of the dc motor drive. The variable dc voltage is controlled by chopping the input voltage by varying the on and off times of a converter, and the type of converter capable of such a function is known as a chopper. This work presents a chopper control written in VHDL (very high speed integrated circuit hardware description language, VHSIC-HDL) to control a PMDC (permanent magnet direct current) motor. The simulation is performed using two programs, MATLAB/Simulink and ModelSim, working in a co-simulation mode provided by the Link for ModelSim toolbox from Simulink. While the motor and inverter dynamics are simulated in MATLAB, the control algorithm of the chopper runs in the ModelSim program. The simulation results are presented and analyzed in this work.
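
The relationship between on/off times and output voltage described in the chopper entry above can be sketched numerically. The model below assumes an ideal step-down chopper whose average output equals the duty cycle times the input voltage, and a counter-compare pulse generator of the kind a VHDL controller would typically use; both assumptions, along with the function names and parameters, are illustrative and not taken from the paper.

```python
def pwm_waveform(duty_count, period_count, cycles=1):
    """Counter-compare PWM: the switch is on (1) while counter < duty_count."""
    if not 0 <= duty_count <= period_count:
        raise ValueError("duty_count must lie between 0 and period_count")
    samples = []
    for _ in range(cycles):
        for counter in range(period_count):
            samples.append(1 if counter < duty_count else 0)
    return samples


def average_voltage(v_in, waveform):
    """Average output of an ideal chopper: input voltage weighted by on-time."""
    return v_in * sum(waveform) / len(waveform)
```

With a 24 V source and a duty count of 64 out of 256, the switch is on for a quarter of each period and the average output is 6 V.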

Proceedings ArticleDOI
01 Apr 2009
TL;DR: A performance-driven strategy is proposed to find the best unrolling factor for each loop, such that the closer the match of run-time conditions and compile-time parameters, the higher the performance.
Abstract: This paper presents a method for optimising parallelisation and scheduling of task graphs containing representation of loops for implementation in heterogeneous computing systems with both software and hardware processors. The method integrates loop unrolling with task scheduling and determines the extent to which each loop should be unrolled to maximise performance, while meeting size constraints. A performance-driven strategy is proposed to find the best unrolling factor for each loop, such that the closer the match of run-time conditions and compile-time parameters, the higher the performance. Experimental results obtained using a speech recognition system show the proposed method outperforms an approach without unrolling by 2.1 times, and using the processing time of a 2.6GHz microprocessor as a reference, a speed up of 10 times can be achieved when compile-time and run-time parameters are matched, while the performance drops gradually when they are different.

Proceedings ArticleDOI
01 Apr 2009
TL;DR: It is found that, despite improvements through process technology and low-power modes, current devices need further improvements to be sufficiently power efficient for mobile applications.
Abstract: This paper proposes a methodology for characterising power consumption of the fine-grain fabric in reconfigurable architectures. It covers active and inactive power as well as advanced low-power modes. A method based on random number generators is adopted for comparing activity modes. We illustrate our approach using four field-programmable gate arrays (FPGAs) that span a range of process technologies: Virtex-II Pro, Spartan-3E, Spartan-3AN, and Virtex-5. We find that, despite improvements through process technology and low-power modes, current devices need further improvements to be sufficiently power efficient for mobile applications.

Proceedings ArticleDOI
01 Apr 2009
TL;DR: This work compares two hardware development methodologies for designing a Theora video decoder IP core from algorithm down to FPGA; the hand-written VHDL implementation achieves a 56% time reduction in the decoding process when compared to a software library.
Abstract: An important share of the consumer electronics market is focused on devices capable of running multimedia applications, like audio and video decoders. In order to achieve the performance level demanded by these applications, it is important to develop specialized hardware IPs to cope with the most computationally intensive parts. Nowadays, designers are facing the challenge of integrating several components, including processor, memory, and specialized IP cores, into a single chip, giving rise to so-called Systems-on-Chip (SoCs). The high complexity of such systems and the strict time-to-market in the electronics industry have motivated the introduction of new design methodologies in recent years. This work presents a comparison between two hardware development methodologies for designing a Theora video decoder IP core from algorithm down to FPGA. We first implemented it in hand-written RTL code using VHDL, resulting in a 56% time reduction in the decoding process when compared to a software library. The second methodology implements the same hardware using SystemC and behavioral synthesis. The second IP core was developed in 70% less time with satisfactory results. We compare the two approaches in terms of area and latency.

Proceedings ArticleDOI
01 Apr 2009
TL;DR: This paper presents a timing-driven non-uniform depopulation based clustering technique, T-NDPack, that targets critical path delay and channel width constraints simultaneously and adjusts the capacity of the CLB based on the criticality of the logic block.
Abstract: Low-cost FPGAs have a comparable number of Configurable Logic Blocks (CLBs) to resource-rich FPGAs but have far fewer routing tracks. This makes it difficult for CAD tools to successfully and optimally map a circuit onto these devices. Instead of switching to resource-rich FPGAs, designers could employ a depopulation-based clustering technique, which underuses CLBs and hence improves routability by spreading the logic over the architecture. However, all depopulation-based clustering algorithms to date increase critical path delay. In this paper, we present a timing-driven non-uniform depopulation-based clustering technique, T-NDPack, that targets critical path delay and channel width constraints simultaneously. We adjust the capacity of the CLB based on the criticality of the logic block. The paper analyzes the effect of depopulation strategies on area and delay performance. Results show that T-NDPack reduces minimum channel width by 11.07% while increasing the number of CLBs by 13.28%. More importantly, T-NDPack decreases critical path delay by 2.89%.

Proceedings ArticleDOI
01 Apr 2009
TL;DR: In this work, rigid-body transformation is applied on the test image to register it with the reference image; and correlation coefficient is used as the similarity metric between the two images.
Abstract: Reconfigurable computers (RCs) with hardware (FPGA) co-processors can achieve significant performance improvement compared to traditional computers for certain categories of applications. The potential amount of speedup an RC can deliver depends on the intrinsic parallelism of the target application as well as the characteristics of the target platform. In this paper, we use an image registration implementation as a case study to show how a hardware implementation is parameterized by the co-processor architecture, particularly the local memory layout. Image registration is a fundamental task in image processing used to match two or more pictures taken at different times, from different sensors, or from different viewpoints. One of several basic transformations in image registration is rigid-body transformation, which is composed of a combination of a rotation θ, a translation (t_x, t_y), and a scale change (s). In this work, rigid-body transformation is applied to the test image to register it with the reference image, and the correlation coefficient is used as the similarity metric between the two images. Two different algorithms, an exhaustive search algorithm and a Discrete Wavelet Transform (DWT)-based search algorithm, are implemented in hardware (i.e., the FPGA device on a Cray XD1 reconfigurable computer). The hardware implementation of the exhaustive search algorithm is 10× faster than the software implementation. The performance improvement of the DWT-based search algorithm in hardware is roughly 2-fold compared to the corresponding software implementation.
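
The two ingredients named in the image registration entry above, a rigid-body transformation and a correlation-coefficient similarity metric, can be sketched as follows. The order of operations (rotate, then scale, then translate) and the use of the Pearson correlation over flattened pixel lists are assumptions made for illustration; the paper's exact formulation and its hardware mapping are not reproduced here.

```python
import math


def rigid_transform(x, y, theta, tx, ty, s):
    """Map point (x, y): rotate by theta (radians), scale by s, translate by (tx, ty)."""
    c, sn = math.cos(theta), math.sin(theta)
    return (s * (c * x - sn * y) + tx, s * (sn * x + c * y) + ty)


def correlation_coefficient(a, b):
    """Pearson correlation between two equal-length sequences of pixel values."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / math.sqrt(var_a * var_b)
```

An exhaustive search would evaluate the metric for every candidate (θ, t_x, t_y, s) on a grid and keep the parameters with the highest correlation, which is why the inner loop is a natural target for hardware acceleration.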

Proceedings ArticleDOI
01 Apr 2009
TL;DR: A new approach to solving the hardware/software partitioning problem is proposed, using an artificial neural network with a feed-forward architecture trained with the resilient back propagation algorithm.
Abstract: Hardware/software co-design is an important problem nowadays and is involved in many hardware research and development fields. The hardware/software partitioning problem, which defines how each part of a hardware/software system should be implemented, is one of the most important questions in co-design, and a fast and good solution to it is essential. This paper proposes a new approach to solving the hardware/software partitioning problem using an artificial neural network with a feed-forward architecture trained with the resilient back propagation algorithm.

Proceedings ArticleDOI
01 Apr 2009
TL;DR: A high-performance hardware architecture for population information coding based on Gaussian Receptive Fields is proposed; it can be useful for data classification and clustering applications and is a first step towards successfully implementing classifiers such as the SpikeProp algorithm.
Abstract: Recently, Spiking Neural Networks (SNNs) have attracted the interest of Machine Learning researchers due to the rich dynamics shown by these information processing models. One of the most important problems that must be addressed for implementing efficient SNNs is information encoding. In this paper, an implementation of a high-performance hardware architecture for population information coding based on Gaussian Receptive Fields (GRFs) is proposed. This architecture can be useful for data classification and clustering applications, because this coding scheme has been used in the past, and an efficient mapping of this technique in hardware can improve the current performance of these applications. GRF information coding can be efficiently implemented on FPGA technology, because it contains several operations that can be computed in parallel, like the exponential function. The proposed hardware architecture was implemented, tested and validated with several random datasets. The proposed hardware core is the first step towards successfully implementing classifiers such as the SpikeProp algorithm. Synthesis and timing results for the proposed hardware architecture are presented.
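
Population coding with Gaussian Receptive Fields, as named in the entry above, can be sketched in a few lines. The version below assumes evenly spaced field centers over the input range, a field width equal to the center spacing, and a linear mapping from activation to spike time (strong activation fires early); these are common choices in the literature but are assumptions here, and the function name and parameters are hypothetical rather than the paper's.

```python
import math


def grf_encode(x, lo, hi, n_neurons, t_max=10.0):
    """Encode scalar x in [lo, hi] as one spike time per receptive field."""
    if n_neurons < 2:
        raise ValueError("at least two neurons are needed")
    step = (hi - lo) / (n_neurons - 1)
    sigma = step  # assumed width: one center spacing
    times = []
    for i in range(n_neurons):
        center = lo + i * step
        # Each Gaussian is independent of the others, which is the
        # parallelism a hardware implementation can exploit.
        activation = math.exp(-((x - center) ** 2) / (2 * sigma ** 2))
        times.append(t_max * (1.0 - activation))
    return times
```

An input that coincides with a field center makes that neuron fire at time zero, with neighbouring neurons firing progressively later.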

Proceedings ArticleDOI
01 Apr 2009
TL;DR: It is shown that a single core on a 412MHz XC5VLX330T FPGA can evaluate a rigid transformation of a 3D image with 16 million voxels in 35ms; with 30 cores on an FPGA, the system is over 108 times faster than a multi-threaded implementation running on a 2.5GHz Intel Quad-Core Xeon platform.
Abstract: This paper proposes techniques for accelerating a software-based image registration algorithm for 3D medical images targeting a reconfigurable hardware platform. Various methods, including dedicated fixed-point arithmetic, error-model-based bit width analysis, architecture exploration and application-specific memory modules, are applied to address issues from the software algorithm and to maximize the performance of FPGA technology. Based on the reconfigurability of FPGA devices, the system can be extended to swap modules optimized for different parameters, and to adopt more advanced registration algorithms. We show that a single core on a 412MHz XC5VLX330T FPGA can evaluate a rigid transformation of a 3D image with 16 million voxels in 35ms. With 30 cores on an FPGA, it is over 108 times faster than a multi-threaded implementation running on a 2.5GHz Intel Quad-Core Xeon platform.

Proceedings ArticleDOI
01 Apr 2009
TL;DR: The design of a microprocessor, MSP430, in the Balsa language and the functional verification of the controller are presented, and back-end retargeting is performed as a part of the design methodology in Balsa.
Abstract: Balsa, developed by the Advanced Processor Technology (APT) Group of Manchester University, provides a robust design environment that comprises both a framework for synthesizing asynchronous hardware systems and a language for describing such systems. In this paper, the design of a microprocessor, MSP430, in the Balsa language and the functional verification of the controller are presented. Back-end retargeting is performed as a part of the design methodology in Balsa. Through the back-end retargeting procedure, a new technology library including an FPGA cell library is incorporated into the Balsa design environment. Moreover, the circuit area is analyzed and reduced in different implementation styles by replacing helper cells in Balsa with standard cells of the target library.

Proceedings ArticleDOI
01 Apr 2009
TL;DR: The SIFT architecture is extended to support 8- and 16-bit data and kernel-based operators such as dilation/erosion, smoothing, border enhancement, and sharpness filters; its clock frequency and processing parallelism are also considerably improved.
Abstract: The Image Foresting Transform (IFT) is a general tool for the design of image processing operators based on dynamic programming. Silicon Image Forest Transform (SIFT) is a fast 8-bit data architecture for IFT-based operators in FPGA. It can implement queue-based methods such as morphological reconstructions, watershed transforms, shape saliences, distance transforms, skeletonization, edge tracking, with runtime gains from hundreds to thousands over the respective implementations in software. In this paper, we further extend the SIFT architecture to support 8- and 16-bit data and kernel-based operators such as dilation/erosion, smoothing, border enhancement, and sharpness filters. Moreover, we considerably improved the clock operation frequency and processing parallelism of SIFT.

Proceedings ArticleDOI
01 Apr 2009
TL;DR: The experimental results show that the proposed methods can detect or correct 100% of single errors in Switch Modules by imposing area overhead between 2% and 60%, delay overhead between 25% and 100% and power consumption overhead between 1% and 25%.
Abstract: This paper proposes three methods to mitigate and tolerate SEU-caused errors on the configuration bits of SRAM-based field programmable gate arrays. The proposed methods are based on error detection and correction codes which are able to detect or correct SEU-caused errors in Switch Modules. The effects of the proposed methods on various parameters such as area, delay and power consumption for ten ITC'99 benchmark circuits have been evaluated with Synopsys® CAD tools and compared with previous work. The experimental results show that the proposed methods can detect or correct 100% of single errors in Switch Modules by imposing area overhead between 2% and 60%, delay overhead between 25% and 100% and power consumption overhead between 1% and 25%.

Proceedings ArticleDOI
01 Apr 2009
TL;DR: The proposed digit recurrence algorithm has two different architectures: the first for general hardware implementation, and the second optimized for configurable logic (FPGAs).
Abstract: In this paper we present a radix r = 2^k divider for fixed-point operands. The divider operates in radix r = 2^k, producing k bits at each iteration. The proposed digit recurrence algorithm has two different architectures: the first for general hardware implementation, and the second optimized for configurable logic (FPGAs). Results show a speedup of more than three times with respect to a classical non-restoring division implemented in Xilinx devices. Additionally, a throughput-latency-area comparison of pipelined and sequential divider implementations is presented.
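
The idea of producing k quotient bits per iteration, as described in the divider entry above, can be shown with a simple integer digit recurrence. The sketch below is a generic restoring-style recurrence written for clarity, not the paper's algorithm: the quotient digit is chosen with an exact integer division, where hardware would use comparators or a selection table, and the function name and unsigned-integer operand format are assumptions.

```python
def radix_divide(dividend, divisor, width, k):
    """Divide a width-bit unsigned dividend, retiring k quotient bits per step.

    Returns (quotient, remainder). width must be a multiple of k.
    """
    if divisor <= 0:
        raise ValueError("divisor must be positive")
    if width % k != 0:
        raise ValueError("width must be a multiple of k")
    if not 0 <= dividend < (1 << width):
        raise ValueError("dividend does not fit in width bits")
    digit_mask = (1 << k) - 1
    quotient = 0
    remainder = 0
    for step in range(width // k):
        shift = width - k * (step + 1)
        # Shift the partial remainder by one radix-2^k digit and bring in
        # the next k dividend bits.
        remainder = (remainder << k) | ((dividend >> shift) & digit_mask)
        # Quotient digit in [0, 2^k - 1]; the bound holds because the
        # previous remainder was smaller than the divisor.
        digit = remainder // divisor
        remainder -= digit * divisor
        quotient = (quotient << k) | digit
    return quotient, remainder
```

A 12-bit dividend needs 12 iterations at radix 2 but only 4 at radix 8 (k = 3), which is the source of the latency reduction; the cost is a more complex digit selection per step.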

Proceedings ArticleDOI
01 Apr 2009
TL;DR: In this paper, the authors consider the general problem of mapping a given logic circuit onto an SRAM-based FPGA with programmable logic blocks of arbitrary architectures and formulate the problem as a graph matching problem and present an architecture independent algorithm for this purpose.
Abstract: In this paper, we consider the general problem of mapping a given logic circuit onto an SRAM-based FPGA with programmable logic blocks of arbitrary architectures. We formulate the problem as a graph matching problem and present an architecture-independent algorithm for this purpose. This algorithm also obtains a best area saving of 4% compared to architecture-dependent methods.

Proceedings ArticleDOI
Xiaoxuan She
01 Apr 2009
TL;DR: This paper outlines evolvable hardware based on a 2-LUT (2-input lookup table) array, which allows the evolution of large circuits via decomposition, and demonstrates that the proposed scheme improves the evolution of logic circuits in terms of the number of generations, area and delay, reduces computational time and enables the evolution of large circuits.
Abstract: Evolvable hardware (EHW) refers to self-reconfiguring hardware design, where the configuration is under the control of an evolutionary algorithm. One of the main difficulties in using EHW to solve real-world problems is scalability, which limits the size of the circuit that may be evolved. This paper outlines evolvable hardware based on a 2-LUT (2-input lookup table) array, which allows the evolution of large circuits via decomposition. The proposed EHW has been tested with multipliers and logic circuits taken from the Microelectronics Centre of North Carolina (MCNC) benchmark library. The experimental results demonstrate that the proposed scheme improves the evolution of logic circuits in terms of the number of generations, area and delay, reduces computational time and enables the evolution of large circuits.

Proceedings ArticleDOI
01 Apr 2009
TL;DR: The design description and Xilinx Spartan-3 FPGA implementation of a communication manager module, embedded in a dedicated system configurable via the Internet, are presented.
Abstract: The design description and Xilinx Spartan-3 FPGA implementation of a communication manager module, embedded in a dedicated system configurable via the Internet, are presented.