Showing papers presented at "Southern Conference Programmable Logic in 2007"


Proceedings ArticleDOI
18 Jun 2007
TL;DR: A 64-bit FPGA implementation of the AES cipher (128-bit block, 128-bit key), designed by Joan Daemen and Vincent Rijmen, operating at 224 Mbps (maximum throughput).
Abstract: The Rijndael cipher, designed by Joan Daemen and Vincent Rijmen, has been selected as the official Advanced Encryption Standard (AES) and is well suited for hardware use. Its implementation admits several trade-offs between area and speed. This paper presents a 64-bit FPGA implementation of the AES cipher with a 128-bit block and a 128-bit key. The selected FPGA family is Spartan 3. The cipher takes 52 clock cycles for encryption, resulting in a throughput of 120 Mbps. Synthesis results in the use of 1643 slices, 975 flip-flops and 3055 4-input look-up tables, and the design operates at 224 Mbps (maximum throughput). The design target was optimization of speed and cost.
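
The relation between cycle count, clock frequency and throughput quoted above can be checked with a short calculation. This is only a sketch: it assumes one 128-bit block is completed every 52 cycles with no overlap between blocks, and the clock frequencies it prints are derived from the quoted throughputs, not stated in the abstract.

```python
BLOCK_BITS = 128        # AES block size in bits
CYCLES_PER_BLOCK = 52   # cycle count quoted in the abstract


def throughput_mbps(f_clk_mhz):
    """Throughput in Mbps for a given clock in MHz (assumes no pipelining across blocks)."""
    return BLOCK_BITS * f_clk_mhz / CYCLES_PER_BLOCK


def implied_clock_mhz(mbps):
    """Clock frequency in MHz that would yield the given throughput under the same assumption."""
    return mbps * CYCLES_PER_BLOCK / BLOCK_BITS


if __name__ == "__main__":
    for mbps in (120, 224):
        print(f"{mbps} Mbps implies a clock of {implied_clock_mhz(mbps):.2f} MHz")
```

Under this assumption the two figures would correspond to two different clock rates (roughly 49 MHz and 91 MHz), which is one possible reading of why the abstract reports both.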

46 citations


Proceedings ArticleDOI
18 Jun 2007
TL;DR: A reconfigurable platform for sensor networks is presented that has features that allow easy reuse of the node in several applications avoiding redesigning the system from scratch.
Abstract: A reconfigurable platform for sensor networks is presented. This platform has features that allow easy reuse of the node in several applications avoiding redesigning the system from scratch. The node includes an FPGA which is the core of the reconfiguration capabilities of the node. Several hardware interfaces for sensor standard protocols like I2C or PWM have been developed and implemented in the FPGA. Remote reconfiguration is an important feature and sensor networks can take advantage of it in order to improve the global performance.

41 citations


Proceedings ArticleDOI
18 Jun 2007
TL;DR: It will be shown that in spite of the disparity in concepts behind those tools, the methodology will be able to formally uncover the basic differences among them and analytically assess their comparative performance, utilization, and ease-of-use.
Abstract: Most application developers are willing to give up some performance and chip utilization in exchange for productivity. High-level tools for developing reconfigurable computing applications trade performance for ease of use. However, it is hard to know in a general sense how much performance and utilization one is giving up and how much ease of use one is gaining. More importantly, given the lack of standards and the uncertainty generated by sales literature, it is very hard to know the real differences that exist among different high-level programming paradigms. To find out, one needs a formal methodology and/or a framework that uses a common set of metrics and common experiments over a number of representative tools. In this work, we consider three representative high-level tools, Impulse-C, Mitrion-C, and DSPLogic, in the Cray XD1 environment. These tools were selected to represent imperative programming, functional programming and graphical programming, and thereby demonstrate the applicability of our methodology. It will be shown that in spite of the disparity in concepts behind those tools, our methodology will be able to formally uncover the basic differences among them and analytically assess their comparative performance, utilization, and ease-of-use.

28 citations


Proceedings ArticleDOI
18 Jun 2007
TL;DR: This work addresses the challenge introduced by partial dynamic reconfiguration by proposing a novel design flow that uses the CoDeveloper framework to speed up the design process.
Abstract: The design of embedded systems has rapidly changed during the last decade. Two main factors are responsible: hardware/software codesign and dynamic reconfiguration. The work presented in this paper investigates how to consider reconfiguration as an explicit dimension in the design flow for embedded systems. This work addresses the challenge introduced by partial dynamic reconfiguration by proposing a novel design flow that uses the CoDeveloper framework to speed up the design process. The proposed flow allows the designer to define the desired specification using a high-level design language such as C. Finally, the paper provides results showing how the proposed flow gives the designer more information for making the correct decisions during the design of an embedded system.

27 citations


Proceedings ArticleDOI
18 Jun 2007
TL;DR: Two minutiae-based fingerprint matching algorithms have been selected and implemented in an FPGA in order to compare the requirements and performance of software and hardware implementations.
Abstract: Fingerprint is the most widely used and studied biometric technique because of its universality, distinctiveness, and the decreasing cost of the sensing devices. Among the fingerprint identification techniques, minutiae-based algorithms are the most mature. However, these methods are computationally expensive, particularly for comparison with large databases. This work is devoted to studying the performance gains that can be achieved with the use of FPGAs. For this purpose, two minutiae-based fingerprint matching algorithms have been selected and implemented in an FPGA in order to compare the requirements and performance of software and hardware implementations. Experimental results demonstrate the feasibility of implementing fingerprint matching algorithms in current FPGA devices, achieving speed-ups of one or two orders of magnitude. Customization of the proposed implementations can lead to several architectures optimized in size, price, speed or accuracy.

17 citations


Proceedings ArticleDOI
18 Jun 2007
TL;DR: This work focuses on the application of both field programmable analog arrays (FPAAs) and field programmable gate arrays (FPGAs) as a unique system for implementing IEEE 1451.4 sensor interfaces.
Abstract: This work focuses on the application of both field programmable analog arrays (FPAAs) and field programmable gate arrays (FPGAs) as a unique system for implementing IEEE 1451.4 sensor interfaces. The inherent reconfigurability of these two hardware platforms increases the versatility of the overall system, leading to a variety of sensor connectivity and remote measurement and control options.

15 citations


Proceedings ArticleDOI
18 Jun 2007
TL;DR: New heuristic algorithms based on EC scheduling algorithms are introduced and it is shown that they provide up to an order of magnitude improvement in scheduling and execution times.
Abstract: Current work on automatic task partitioning and scheduling for reconfigurable computing (RC) systems strictly addresses the FPGA hardware, and does not take advantage of the synergy between the microprocessor and the FPGA. Partitioning between the microprocessor (µP) and the FPGA remains a manual and laborious effort, as a formal methodology for automatic hardware-software partitioning has not been established. Related fields such as heterogeneous computing (HC) and embedded computing (EC) have an extensive body of work on scheduling for heterogeneous processors. Unlike the HC scheduling algorithms, the EC algorithms take into account the differences in computational capabilities of each processing element. In this work, we adapt EC scheduling algorithms for RC systems, and show that simply adapting the algorithms is not sufficient to take advantage of the reconfigurable hardware. We introduce new heuristic algorithms based on EC scheduling algorithms and show that they provide up to an order of magnitude improvement in scheduling and execution times.

15 citations


Proceedings ArticleDOI
18 Jun 2007
TL;DR: This paper explains some methods used for increasing the resolution of PWMs and proposes a new method based on the resources available in almost every FPGA nowadays.
Abstract: Pulse width modulation (PWM) is a very common technique used in different applications, from the control of motors to switching power converters (power supplies), audio amplifiers and illumination systems. In some of those applications, the pulse frequency has increased so much in recent years that the resolution obtained with classical (counter) techniques is not enough. This paper explains some methods used for increasing the resolution of PWMs and proposes a new method based on the resources available in almost every FPGA nowadays.
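
The resolution limit of the classical counter technique mentioned above can be illustrated with a short calculation: a counter clocked at `f_clk` can only place edges on clock ticks, so one PWM period contains `f_clk / f_pwm` distinct duty-cycle steps. The function names and the example frequencies below are illustrative assumptions, not values from the paper.

```python
import math


def counter_pwm_steps(f_clk_hz, f_pwm_hz):
    """Number of distinct duty-cycle steps of a counter-based PWM."""
    return f_clk_hz // f_pwm_hz


def counter_pwm_resolution_bits(f_clk_hz, f_pwm_hz):
    """Equivalent resolution in bits (log2 of the number of steps)."""
    return math.log2(counter_pwm_steps(f_clk_hz, f_pwm_hz))


if __name__ == "__main__":
    f_clk = 100_000_000  # hypothetical 100 MHz FPGA clock
    for f_pwm in (20_000, 200_000, 1_000_000):
        bits = counter_pwm_resolution_bits(f_clk, f_pwm)
        print(f"PWM at {f_pwm} Hz: {counter_pwm_steps(f_clk, f_pwm)} steps, {bits:.2f} bits")
```

With a fixed clock, every tenfold increase in PWM frequency costs a little over three bits of resolution, which is the motivation the abstract gives for higher-resolution methods.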

14 citations


Proceedings ArticleDOI
18 Jun 2007
TL;DR: The aim of this work is to show that for this kind of application the FPGA technology is not only a viable solution but probably also the best one in some situations, especially when a huge amount of parallel data processing is needed.
Abstract: This paper describes the development of the tip-tilt mirror control for an adaptive optics system based on the use of FPGA technology instead of using the traditional approach with DSP or microprocessor. The aim of this work is to show that for this kind of application the FPGA technology is not only a viable solution but probably also the best one in some situations, especially when a huge amount of parallel data processing is needed. This will probably be the case of adaptive optics systems for very large telescopes. A brief description of the whole adaptive optics system will be given although the paper will focus especially on the tip-tilt mirror control and its implementation with FPGAs. A new version of the deformable mirror control is also being implemented in FPGAs and will be finished in the near future.

14 citations


Proceedings ArticleDOI
18 Jun 2007
TL;DR: Preliminary results indicate that this solution will be able to process HDTV frames in real time, considering the baseline profile of H.264/AVC video compression standard.
Abstract: This paper presents the design of a hardware architecture for the entropy coder of the H.264/AVC video compression standard, considering the baseline profile. The baseline entropy coder is composed of two main blocks: the Exp-Golomb coder and the CAVLC coder. This paper presents the architectural design of these two blocks. These architectures were described in VHDL and synthesized to an Altera Stratix-II FPGA. The synthesis results show a throughput of 15.9 million samples per second for the Exp-Golomb coder and 103.8 million samples per second for the CAVLC coder. The H.264/AVC baseline entropy coder is being designed through the integration of these two coders, and preliminary results indicate that this solution will be able to process HDTV frames in real time.
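
For readers unfamiliar with Exp-Golomb codes, the following is a minimal software sketch of the unsigned and signed order-0 codes as used for H.264 syntax elements. It says nothing about the paper's hardware architecture; the function names are illustrative, and bit strings are used instead of a packed bitstream for readability.

```python
def exp_golomb_ue(code_num):
    """Unsigned order-0 Exp-Golomb code, returned as a string of '0'/'1'."""
    if code_num < 0:
        raise ValueError("code_num must be non-negative")
    info = bin(code_num + 1)[2:]           # binary representation of code_num + 1
    return "0" * (len(info) - 1) + info    # prefix of leading zeros, then the value


def exp_golomb_se(value):
    """Signed order-0 Exp-Golomb code: positive v maps to 2v - 1, non-positive v to -2v."""
    code_num = 2 * value - 1 if value > 0 else -2 * value
    return exp_golomb_ue(code_num)


if __name__ == "__main__":
    for n in range(5):
        print(n, exp_golomb_ue(n))
```

Small values get short codewords (0 is coded as a single bit), which is why this code suits syntax elements whose small values are the most frequent.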

13 citations


Proceedings ArticleDOI
18 Jun 2007
TL;DR: A new dedicated reconfigurable architecture for an efficient Gauss-Jordan matrix inversion algorithm on reconfigurable hardware platforms is proposed and analysed, and the results show a performance improvement of 2x over the previous implementation, using only half of the memory and half of the floating-point units.
Abstract: This paper presents a new architecture for an efficient Gauss-Jordan matrix inversion algorithm on reconfigurable hardware platforms. The results show that currently available reconfigurable computing technology can easily achieve significantly higher floating-point performance than high-end CPUs running state-of-the-art routines for large matrix operations. For common reconfigurable systems, where the FPGAs are directly coupled to the on-board memory, the achievable performance scales directly with the number of realizable simultaneous memory accesses. A new dedicated reconfigurable architecture is proposed and analysed, and the results show a performance improvement of 2x over the previous implementation, using only half of the memory and half of the floating-point units. Benchmarking against Matlab, which features high-performance matrix inversion routines, shows that a 100 MHz FPGA can easily surpass the performance of 3.2 GHz Intel Pentium IV processors. This is possible with only 5 double-port memory banks or 9 single-port memory banks connected to the FPGA.
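
As a software reference for the algorithm named above, here is a minimal sketch of Gauss-Jordan inversion on an augmented matrix. It is a plain sequential version with partial pivoting (an assumption; the abstract does not say how the hardware handles pivoting) and does not reflect the paper's memory-bank organisation.

```python
def gauss_jordan_inverse(matrix):
    """Invert a square matrix (list of rows) by Gauss-Jordan elimination."""
    n = len(matrix)
    # Build the augmented matrix [A | I].
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        # Partial pivoting: pick the row with the largest entry in this column.
        pivot_row = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot_row][col]) < 1e-12:
            raise ValueError("matrix is singular or nearly singular")
        aug[col], aug[pivot_row] = aug[pivot_row], aug[col]
        # Normalise the pivot row so the pivot becomes 1.
        pivot = aug[col][col]
        aug[col] = [v / pivot for v in aug[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    # The right half of the augmented matrix now holds the inverse.
    return [row[n:] for row in aug]


if __name__ == "__main__":
    print(gauss_jordan_inverse([[4, 7], [2, 6]]))
```

The row updates inside the elimination loop are independent of each other, which is the kind of parallelism that a memory-bandwidth-bound hardware design can exploit.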

Proceedings ArticleDOI
18 Jun 2007
TL;DR: A Braille note taker implemented in hardware, including a keyboard controller and a translator, that performs Braille-to-text translation as well as note taking.
Abstract: This paper describes a Braille note taker implemented in hardware. The system is able to perform Braille-to-text translation as well as note taking. A method is presented for achieving Braille note taking using a Braille keyboard. To perform Braille-to-text translation, a translating system has been built based on previous work. Using very high speed integrated circuit hardware description language (VHDL) and a field programmable gate array (FPGA) development platform, a system that includes the keyboard controller and translator has been hierarchically described and implemented.
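
To make the keyboard-to-text path concrete, the sketch below maps six-key Braille chords to cells and then to letters. It is a toy model under stated assumptions: only the letters a to j are covered, no contractions are handled, and the table-lookup structure is illustrative rather than the translator described in the paper. The dot-to-bit layout follows the Unicode Braille Patterns block (dot 1 is the least significant bit).

```python
# Bit assigned to each Braille dot, matching the Unicode Braille Patterns layout.
DOT_BITS = {1: 0x01, 2: 0x02, 3: 0x04, 4: 0x08, 5: 0x10, 6: 0x20}

# Standard dot patterns for the letters a to j.
LETTER_DOTS = {
    "a": (1,), "b": (1, 2), "c": (1, 4), "d": (1, 4, 5), "e": (1, 5),
    "f": (1, 2, 4), "g": (1, 2, 4, 5), "h": (1, 2, 5), "i": (2, 4), "j": (2, 4, 5),
}


def chord_to_cell(dots):
    """Combine the dots pressed together on the keyboard into a 6-bit cell code."""
    code = 0
    for dot in dots:
        code |= DOT_BITS[dot]
    return code


CELL_TO_LETTER = {chord_to_cell(dots): letter for letter, dots in LETTER_DOTS.items()}


def cell_to_unicode(code):
    """Unicode Braille pattern character for a 6-bit cell code."""
    return chr(0x2800 + code)


def translate(chords):
    """Translate a sequence of chords to text; unknown cells become '?'."""
    return "".join(CELL_TO_LETTER.get(chord_to_cell(c), "?") for c in chords)


if __name__ == "__main__":
    typed = [(1, 2), (1,), (1, 4, 5)]
    print(translate(typed), "".join(cell_to_unicode(chord_to_cell(c)) for c in typed))
```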

Proceedings ArticleDOI
18 Jun 2007
TL;DR: An embedded implementation of a Player client for mobile robots using the NIOS II softcore is presented, which has been experimentally tested on an FPGA board and compared to a standard PC implementation.
Abstract: Mobile robotics and embedded systems are two research areas that have been receiving considerable attention in recent years. Combining these two research topics is a very interesting and promising task. Some of the problems of controlling robots using embedded systems are designing device drivers, providing network communication, and developing complex control algorithms under hardware limitations. Player is one of the most used controllers for mobile robots and sensors. It has been widely used by the robotics community. This paper presents an embedded implementation of a Player client for mobile robots using the NIOS II softcore. Our client has been experimentally tested on an FPGA board and compared to a standard PC implementation.

Proceedings ArticleDOI
18 Jun 2007
TL;DR: A platform is described for developing fully FPGA-based embedded systems designed for image and video processing applications, which makes user interaction and run-time customization of processing algorithms feasible.
Abstract: As image and video processing applications move towards consumer markets, there is a clear need to replace PC-based software solutions with embedded processors. In this context, the enhanced characteristics of modern FPGA devices make it possible to build whole systems with improved performance and reduced costs. In this paper we describe a platform for developing fully FPGA-based embedded systems designed for image and video processing applications. It is a hardware/software system which makes the design process easier and faster. It also makes feasible the interaction with the user and the run-time customization of processing algorithms.

Proceedings ArticleDOI
18 Jun 2007
TL;DR: The test results indicate that the hardware-based translator achieves the same results as software-based commercial translators, and moreover, this system achieves superior throughput compared to Blenkhorn's original algorithm.
Abstract: This paper describes a fast text to braille translator based on field programmable gate arrays (FPGAs). Unlike most commercial methods, this translator carries out the translation in hardware instead of software. To achieve fast translation, an FPGA with large programmable resources has been used, and an algorithm proposed by Paul Blenkhorn has been revised. The translator has been described using very high speed integrated circuit hardware description language (VHDL). The test results indicate that the hardware-based translator achieves the same results as software-based commercial translators, and moreover, this system achieves superior throughput compared to Blenkhorn's original algorithm.

Proceedings ArticleDOI
18 Jun 2007
TL;DR: This work presents an area optimized FPGA implementation of an IP core to compute the base-N logarithm providing a very good speed-area ratio and reports the implementation results of a fixed point version of the algorithm.
Abstract: In this work, we present an area optimized FPGA implementation of an IP core to compute the base-N logarithm. We also discuss the area, speed and precision trade-offs. We selected an algorithm that could be implemented on any FPGA, avoiding vendor-specific features like block RAMs, embedded multipliers, etc. We report the implementation results of a fixed point version of the algorithm using various common configurations on Xilinx and Actel devices. This implementation achieved the required area goals, providing a very good speed-area ratio.

Proceedings ArticleDOI
18 Jun 2007
TL;DR: An implementation of a symmetric multiprocessor (SMP) system on an FPGA using a vendor-provided soft-core processor and a new set of software libraries specially developed for writing applications for this kind of system is presented.
Abstract: Advances in FPGA technologies allow designing highly complex systems using on-chip FPGA resources and intellectual property (IP) cores. Furthermore, it is possible to build multiprocessor systems using hard-core or soft-core processors, increasing the range of applications that can be implemented on an FPGA. This paper presents an implementation of a symmetric multiprocessor (SMP) system on an FPGA using a vendor-provided soft-core processor and a new set of software libraries specially developed for writing applications for this kind of system. Experimental results show how this approach can improve the performance of parallelizable software applications.

Proceedings ArticleDOI
18 Jun 2007
TL;DR: A design for an adaptive background modelling algorithm; implementation of the algorithm on an FPGA device; and performance evaluation for the hardware architecture are reported.
Abstract: In this paper we present a hardware architecture for adaptive background modelling. Adaptive background models are used in a variety of computer vision applications, ranging from traffic monitoring to biometric identification. We report (a) a design for an adaptive background modelling algorithm; (b) implementation of the algorithm on an FPGA device; and (c) performance evaluation for our hardware architecture. One of our designs, running on a Xilinx XC2V1000 FPGA at 81 MHz, can process VGA quality 640×480 pixel frames at 132 frames per second using 291 slices and a single memory bank.
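
The abstract does not say which adaptive background model is used, so the sketch below uses an exponential running average, one common choice, purely as an assumption. The parameters `alpha` and `threshold` are hypothetical. It also checks the quoted figures: 640×480 pixels at 132 frames per second is about 40.6 million pixels per second, or roughly two clock cycles per pixel at 81 MHz.

```python
def update_background(background, frame, alpha=0.05, threshold=30):
    """One step of a running-average background model over flat pixel lists.

    Returns the updated background and a foreground mask (True = foreground).
    """
    new_background = []
    mask = []
    for bg, px in zip(background, frame):
        mask.append(abs(px - bg) > threshold)               # pixel differs from model
        new_background.append((1 - alpha) * bg + alpha * px)  # blend frame into model
    return new_background, mask


def cycles_per_pixel(f_clk_hz, width, height, fps):
    """Clock cycles available per pixel at a given frame size and rate."""
    return f_clk_hz / (width * height * fps)


if __name__ == "__main__":
    bg, fg = update_background([100.0, 100.0], [100, 200], alpha=0.5)
    print(bg, fg)
    print(f"{cycles_per_pixel(81_000_000, 640, 480, 132):.3f} cycles per pixel")
```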

Proceedings ArticleDOI
18 Jun 2007
TL;DR: A phase locked loop (PLL) based on digital signal processing and random sampling is proposed; the random sampling scheme allows the implementation of complete-digital high-frequency systems, without limitations imposed by the analog to digital converter and the signal processing unit.
Abstract: A phase locked loop (PLL) based on digital signal processing and random sampling is proposed in this paper. Field programmable gate array (FPGA) technology is used to implement a prototype. The random sampling scheme is used to reduce the sampling frequency requirements without aliasing effects. The possibility of sampling and processing at lower frequencies allows the implementation of complete-digital high-frequency systems, without limitations imposed by the analog to digital converter and the signal processing unit. The basic principles are presented, and the implemented algorithms are described. Experimental results show the PLL performance.

Proceedings ArticleDOI
18 Jun 2007
TL;DR: A hand-held portable application to inspect genetic microarrays is presented, which may be used in the field with important advantages on autonomy and robustness, and ideal for applications in telemedicine.
Abstract: There are many fields in biomedical engineering where portability, compactness and robustness at the application level are important requirements. Genomic microarray tests are becoming a routine exploration in many areas of medicine, biology and pharmacology. These tests have to fulfill strict and cumbersome protocols which prevent them from being universally applied in remote areas or situations where access to hospital facilities is difficult, as in the case of ship crews, military detachments or care for isolated populations in underdeveloped countries. Finding ways to bring sophisticated exploration methods such as these to the field may enable the detection and treatment of different genetically induced illnesses and help extend high-standard medicine to these situations. In the present paper a hand-held portable application to inspect genetic microarrays is presented, which may be used in the field with important advantages in autonomy and robustness. The system is based on CCD microarray image scanning plus advanced image processing software for robust detection and reading, supported by an FPGA. The platform is conceived for stand-alone use with low weight and power consumption, making it ideal for applications in telemedicine. Facilities for data storage in a laptop are also provided.

Proceedings ArticleDOI
18 Jun 2007
TL;DR: In this article, a triple module redundancy (TMR) technique is proposed to protect carry-select adders against SETs, which explores the inherent duplication existing in carry select adders to reduce resource overhead.
Abstract: The drastic shrinking of transistor dimensions is making circuits more susceptible to radiation-induced soft errors. While single-event upsets are beginning to be a concern for electronic systems fabricated with nanometer CMOS technology at sea level, single-event transients (SETs) are also expected to be a serious problem for upcoming technologies. Thanks to their high logic density and fast turnaround time, FPGAs are currently the main fabric used to implement electronic systems. However, to provide high logic density, FPGA devices are also fabricated with state-of-the-art CMOS technology and thus are also susceptible to soft errors. This paper presents a novel technique to protect carry-select adders against SETs. The technique is based on triple module redundancy (TMR) and exploits the inherent duplication existing in carry-select adders to reduce resource overhead.
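
The two building blocks named above can be modelled in a few lines. The sketch shows a bitwise TMR majority voter and a behavioural carry-select adder, whose blocks compute the sum twice (for carry-in 0 and carry-in 1) and select one. How the paper combines the two to save resources is not described in the abstract, so the code keeps them separate; the widths and function names are illustrative assumptions.

```python
def majority(a, b, c):
    """Bitwise 2-out-of-3 majority vote, as used by a TMR voter."""
    return (a & b) | (a & c) | (b & c)


def carry_select_add(x, y, width=8, block=4):
    """Behavioural carry-select adder; returns (sum modulo 2**width, carry out)."""
    mask = (1 << block) - 1
    carry = 0
    result = 0
    for shift in range(0, width, block):
        xa = (x >> shift) & mask
        ya = (y >> shift) & mask
        sum_if_cin0 = xa + ya        # speculative result for carry-in = 0
        sum_if_cin1 = xa + ya + 1    # speculative result for carry-in = 1
        chosen = sum_if_cin1 if carry else sum_if_cin0  # the select multiplexer
        result |= (chosen & mask) << shift
        carry = chosen >> block
    return result, carry


if __name__ == "__main__":
    good, _ = carry_select_add(200, 100)
    faulty = good ^ 0b100            # one replica hit by a transient on bit 2
    print(majority(good, good, faulty) == good)
```

The duplicated speculative sums in each block are the "inherent duplication" the abstract refers to: two of the three copies a TMR scheme needs are, in a sense, already partly present.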

Proceedings ArticleDOI
18 Jun 2007
TL;DR: This work presents a novel, accurate, and fast post-layout logic perturbation method for improving LUT-based FPGA routing without affecting the placement, which can reduce critical path delay by up to 31.74% without disturbing placement or sacrificing area.
Abstract: This work presents a novel, accurate, and fast post-layout logic perturbation method for improving LUT-based FPGA routing without affecting the placement. The ATPG-based rewiring techniques are used to design the rewiring engine, which is embedded into VPR, the most powerful academic FPGA CAD tool currently. Compared with VPR's high-quality results, our method can reduce critical path delay by up to 31.74% (avg. 10%) without disturbing placement or sacrificing area. The CPU time used by the rewiring engine is only 5% of the total time consumed by VPR's placement and routing. All the benchmark circuits can be placed and routed within 3 minutes, which is much faster than the SPFD approach. This paper also analyzes the power of the ATPG-based rewiring techniques in LUT-based FPGAs. Experimental results show that 3% of all nets can be replaced by their alternative wires for FPGA performance improvement.

Proceedings ArticleDOI
18 Jun 2007
TL;DR: This paper explains how to start working with Tcl/Tk using simple examples and two complete applications are presented to show in more detail the capabilities of the language.
Abstract: The Tcl/Tk scripting language has become the de facto standard for EDA tools. This paper explains how to start working with Tcl/Tk using simple examples. Two complete applications are presented to show in more detail the capabilities of the language. One script automates the computation of the average power consumption of a digital system. A second script creates a virtual display driven by the simulation of a graphics card.

Proceedings ArticleDOI
18 Jun 2007
TL;DR: A new protection method for IP cores to be implemented on FPGAs, which protects the author rights of reusable IP cores by means of a digital signature that uniquely identifies both the original design and the design recipient.
Abstract: The intellectual property protection of reusable design modules is becoming a problem with the expansion of this design strategy. This paper proposes a new protection method for IP cores to be implemented on FPGAs. The aim is to protect the author rights of reusable IP cores by means of a digital signature that uniquely identifies both the original design and the design recipient. The technique relies on a procedure that spreads a digital signature across look-up table cells at the HDL design level, without increasing the area of the system. The technique includes a signature extraction procedure that allows ownership rights to be detected without interfering with the normal operation of the system and with minimal modifications to the system. The IPP technique has been implemented on programmable devices, with negligible performance penalties.

Proceedings ArticleDOI
18 Jun 2007
TL;DR: An efficient VHDL implementation on an inexpensive Xilinx Spartan-3 FPGA was achieved that fulfills the performance requirements, adding no dead time due to digital processing; two-layer feed-forward networks with three neurons in the hidden layer reached the precision goal.
Abstract: Estimation of nuclear pulse parameters is needed in many nuclear applications. Its precision and performance requirements are very demanding, especially in PET applications. The quality of PET images depends on the energy and time resolution of gamma pulse detection. Neural network estimators were analyzed in contrast with common methods. Two-layer feed-forward networks with three neurons in the hidden layer reached the precision goal. The chosen estimators allowed the use of a 40 MHz free-running ADC, obtaining a precision of 1 ns in the timestamp determination, exceeding coincidence detection requirements. An efficient VHDL implementation on an inexpensive Xilinx Spartan-3 FPGA was achieved that fulfills the performance requirements, adding no dead time due to digital processing. The estimators and their FPGA implementations were verified on hardware, and characterization was done using nuclear-shaped pulses synthesized with an arbitrary function generator.
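
The network shape mentioned above (two layers, three hidden neurons, one output) is small enough to write out directly. The sketch below is a generic forward pass only: the `tanh` activation, the linear output, the number of input samples and all weight values are assumptions, since the abstract specifies none of them. For context, a 40 MHz ADC samples every 25 ns, so a 1 ns timestamp precision means the estimator resolves time well below one sample period.

```python
import math


def forward(samples, w_hidden, b_hidden, w_out, b_out):
    """Forward pass of a two-layer feed-forward network with one linear output.

    samples:  input vector (e.g. consecutive ADC samples of a pulse)
    w_hidden: one weight row per hidden neuron
    b_hidden: one bias per hidden neuron
    w_out:    one output weight per hidden neuron
    b_out:    output bias
    """
    hidden = [
        math.tanh(sum(w * s for w, s in zip(row, samples)) + b)  # assumed activation
        for row, b in zip(w_hidden, b_hidden)
    ]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out


if __name__ == "__main__":
    # Hypothetical weights for three hidden neurons and four input samples.
    w_h = [[0.1, 0.2, 0.3, 0.4], [-0.1, 0.0, 0.1, 0.2], [0.5, -0.5, 0.5, -0.5]]
    b_h = [0.0, 0.1, -0.1]
    print(forward([0.0, 0.4, 1.0, 0.6], w_h, b_h, [1.0, -1.0, 0.5], 0.0))
    print(f"sample period at 40 MHz: {1e9 / 40e6:.0f} ns")
```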

Proceedings ArticleDOI
18 Jun 2007
TL;DR: This study quantitatively compares different switching techniques widely used in NoCs in terms of power consumption, area overhead and delay with a post-layout gate-level simulation.
Abstract: To ensure low power consumption while maintaining flexibility and performance, future systems-on-chip (SoC) will integrate many processor nodes and memory units. To interconnect these IP nodes, networks-on-chip (NoC) have been proposed as an efficient and scalable alternative to shared buses. One major problem is being able to compare choices and strategies in NoC design. To tackle this problem, we propose a complete, highly configurable framework called Polymorpher which enables a quantitative comparison of the performance and energy consumption of different NoC communication component architectures. Our models are based on a set of basic VHDL communication components that can be reused for different designs. This common test-bed allows us to fairly and accurately compare different types of communication components in terms of energy consumption, delay and area. In particular, the framework enables easy instantiation and exploration of different types of routers. We have chosen to explore different switching strategies and parameters as an example of the possibilities offered by our tool. Our study quantitatively compares different switching techniques widely used in NoCs (store and forward, virtual cut through, wormhole) in terms of power consumption, area overhead and delay with a post-layout gate-level simulation.
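
To give a feel for why the switching technique matters for delay, the sketch below uses the usual first-order, zero-load latency model from the interconnection-network literature. It is not the Polymorpher framework and ignores contention, buffering and power; the parameter names and example values are assumptions. Links are assumed to carry one flit per cycle.

```python
def store_and_forward_latency(hops, packet_flits, router_delay):
    """Zero-load latency in cycles when each router receives the whole packet first."""
    return hops * (router_delay + packet_flits)


def cut_through_latency(hops, packet_flits, router_delay):
    """Zero-load latency in cycles for virtual cut-through or wormhole switching.

    The header is forwarded as soon as it is routed, so the packet's
    serialization time is paid once instead of once per hop.
    """
    return hops * router_delay + packet_flits


if __name__ == "__main__":
    hops, flits, delay = 4, 8, 2   # hypothetical path length, packet size, router delay
    print("store and forward:", store_and_forward_latency(hops, flits, delay))
    print("cut-through/wormhole:", cut_through_latency(hops, flits, delay))
```

In this model virtual cut-through and wormhole have the same zero-load latency; they differ in buffer requirements and behaviour under contention, which is where area and power comparisons such as the one in the paper become relevant.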

Proceedings ArticleDOI
18 Jun 2007
TL;DR: A modulator implemented in an FPGA for power matrix converters, operating as a peripheral unit of a digitally controlled system in order to generate duty cycles for each of the converter switches and provide safe commutation of the switching devices.
Abstract: This work presents a modulator implemented in an FPGA for power matrix converters. Its function is to operate as a peripheral unit of a digitally controlled system in order to generate duty cycles for each of the converter switches and provide safe commutation of the switching devices. The correct generation of duty cycles and sequences was verified, as well as the safe commutation of bi-directional switches. The performance of the modulator in conjunction with the power stage and a DSP, which performs the high-level control layer, was also analyzed.

Proceedings ArticleDOI
18 Jun 2007
TL;DR: The dataflow model and its dynamic dataflow graph (DDFG), which is the basic structure to execute dataflow programs, are described and DDFGs of control flow statements used in the C language, such as do-while and switch, are proposed.
Abstract: Many modern scientific and engineering applications, such as weather forecasting, medical diagnostics, artificial intelligence, and industrial automation, demand increased computational capacity. Current high-performance architectures are focused on the concepts of parallel processing. One of these architectures is the dataflow model, which exploits parallelism in a natural form. This paper briefly describes the dataflow model and its dynamic dataflow graph (DDFG), which is the basic structure to execute dataflow programs. DDFGs of control flow statements used in the C language, such as do-while and switch, are proposed. The results of a "proof-of-concept" for the control flow DDFGs are presented at the end of this paper.

Proceedings ArticleDOI
18 Jun 2007
TL;DR: A genetic algorithm has been developed to solve the problem of dynamically reconfigurable modules allocation and both crossover and mutation processes try to change the previously found location for the new module in order to achieve a better fitness.
Abstract: Advances in programmable hardware have led to new architectures, where the hardware can be dynamically adapted to the application to gain better performance. One of the problems in realizing dynamically reconfigurable systems is the allocation of dynamically reconfigurable modules. In this scenario, when a new module has to be configured in the system, a suitable free place must be found for it. In this work a genetic algorithm has been developed to solve the problem of allocating dynamically reconfigurable modules. The search task has been modeled with a genetic algorithm in which each chromosome represents a configuration status of the programmable devices, and both crossover and mutation processes try to change the previously found location for the new module in order to achieve a better fitness, which represents the quality of the final solution.

Proceedings ArticleDOI
18 Jun 2007
TL;DR: A remote automatic meter reading (AMR) system as a result of the Mirakonta project is presented, offering a hardware-software product that covers most needs of the end users.
Abstract: This article presents a remote automatic meter reading (AMR) system as a result of the Mirakonta project, which involves several university research groups and a private company. The AMR system is composed of reader devices and a group of elements that allow a wireless connection to a central server. The system is based on taking a picture of the fluid meter digit display, preprocessing it and transmitting it to a remote element using an ad hoc radio frequency protocol. The reader device is implemented on an FPGA-based system and the firmware can be updated remotely (remote reconfiguration). In addition, the different parameters of the reader can be configured remotely. To meet the low power requirements, a sleep and wake-up technique is applied. All the requirements of the system have been covered, offering a hardware-software product that covers most needs of the end users.