
Showing papers on "Reconfigurable computing" published in 2009


Book Chapter
14 Oct 2009
TL;DR: This work proposes a real-time implementation of the semi-global matching algorithm with algorithmic extensions for automotive applications on a reconfigurable hardware platform resulting in a low power consumption of under 3W.
Abstract: Many real-time stereo vision systems are available on low-power platforms. They all either use a local correlation-like stereo engine or perform dynamic programming variants on a scan-line. However, for the high-performance global stereo methods listed in the upper third of the Middlebury database, low-power real-time implementations are still missing. We propose a real-time implementation of the semi-global matching algorithm with algorithmic extensions for automotive applications on a reconfigurable hardware platform, resulting in a low power consumption of under 3W. The algorithm runs at 25Hz, processing image pairs of size 750x480 pixels and computing stereo on a 680x400 image region with up to 128 disparities.

292 citations
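The abstract does not spell out semi-global matching itself. As background, standard SGM (Hirschmüller) aggregates a per-pixel matching cost C(p, d) along several 1-D paths r with the recurrence L_r(p, d) = C(p, d) + min(L_r(p-r, d), L_r(p-r, d-1) + P1, L_r(p-r, d+1) + P1, min_i L_r(p-r, i) + P2) - min_k L_r(p-r, k), then sums the path costs and picks the lowest-cost disparity per pixel. The minimal Python sketch below covers a single left-to-right path on one scanline; the cost values and penalties are illustrative assumptions, and the paper's FPGA pipeline and automotive extensions are not reproduced.

```python
def aggregate_path(cost, p1, p2):
    """Aggregate costs along one left-to-right path using the SGM recurrence.

    cost[x][d] is the matching cost C(p, d) for pixel x of a scanline at disparity d.
    p1 penalises disparity changes of +/-1; p2 penalises larger jumps.
    """
    ndisp = len(cost[0])
    agg = [list(cost[0])]                     # the first pixel has no predecessor
    for x in range(1, len(cost)):
        prev = agg[-1]
        prev_min = min(prev)
        row = []
        for d in range(ndisp):
            candidates = [prev[d], prev_min + p2]
            if d > 0:
                candidates.append(prev[d - 1] + p1)
            if d < ndisp - 1:
                candidates.append(prev[d + 1] + p1)
            # Subtracting prev_min keeps values bounded, which matters for fixed-width hardware.
            row.append(cost[x][d] + min(candidates) - prev_min)
        agg.append(row)
    return agg

def winner_takes_all(agg):
    # Pick, per pixel, the disparity with the smallest aggregated cost.
    return [row.index(min(row)) for row in agg]

costs = [[0, 5, 9], [6, 1, 8], [7, 2, 9]]
print(winner_takes_all(aggregate_path(costs, p1=1, p2=4)))   # [0, 1, 1]
```

A full SGM implementation repeats this for several path directions and sums the results before the winner-takes-all step.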


Journal Article
TL;DR: The ReconOS programming model and its execution environment are discussed, implementations based on modern platform FPGAs and the operating systems eCos and Linux are presented, time and area overheads of the proposed mechanisms are evaluated, and the feasibility of the multithreading design approach is demonstrated on several case studies.
Abstract: Rising logic densities together with the inclusion of dedicated processor cores push reconfigurable devices from being applied for glue logic and prototyping towards implementing complete reconfigurable systems-on-chip. The mix of fast CPU cores and fine-grained reconfigurable logic makes it possible to map both sequential, control-dominated code and highly parallel data-centric computations onto one platform. However, traditional design techniques that view specialized hardware circuits as passive coprocessors are ill-suited for programming these reconfigurable computers. In particular, the programming models for software (running on an embedded operating system) and digital hardware (synthesized to an FPGA) lack commonalities, which hinders design space exploration and severely impairs the potential for code reuse. In this article, we present ReconOS, an execution environment based on existing embedded operating systems that extends the multithreaded programming model established in the software domain to reconfigurable hardware. Using threads and common synchronization and communication services as an abstraction layer, ReconOS allows for the creation of portable and flexible multithreaded applications targeting CPU/FPGA systems. This article discusses the ReconOS programming model and its execution environment, presents implementations based on modern platform FPGAs and the operating systems eCos and Linux, evaluates time and area overheads of the proposed mechanisms and, finally, demonstrates the feasibility of the multithreading design approach on several case studies.

155 citations
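The central idea above is that hardware threads use the same synchronization and communication services as software threads. The Python sketch below is only a software analogy of that programming model: a queue.Queue stands in for an OS mailbox and an ordinary thread stands in for a synthesized hardware thread. The function names are hypothetical and this is not the ReconOS API.

```python
import queue
import threading

def hw_thread_square(mbox_in, mbox_out):
    """Stand-in for a hardware thread: it blocks on its input mailbox like a software thread."""
    while True:
        item = mbox_in.get()       # blocking mailbox read
        if item is None:           # sentinel value: terminate the thread
            break
        mbox_out.put(item * item)  # mailbox write back to the software side

def run_pipeline(values):
    mbox_in, mbox_out = queue.Queue(), queue.Queue()
    worker = threading.Thread(target=hw_thread_square, args=(mbox_in, mbox_out))
    worker.start()
    for v in values:
        mbox_in.put(v)
    mbox_in.put(None)
    worker.join()
    return [mbox_out.get() for _ in values]

print(run_pipeline([1, 2, 3]))  # [1, 4, 9]
```

Because both sides only see mailbox operations, the worker could in principle be swapped between a software and a hardware implementation without changing the caller, which is the portability argument the abstract makes.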


Proceedings Article
09 Nov 2009
TL;DR: This paper presents a control flow enforcement technique based on Instruction Based Memory Access Control (IBMAC), implemented in hardware and specifically designed to protect low-cost embedded systems against malicious manipulation of their control flow and against accidental stack overflows.
Abstract: This paper presents a control flow enforcement technique based on an Instruction Based Memory Access Control (IBMAC) implemented in hardware. It is specifically designed to protect low-cost embedded systems against malicious manipulation of their control flow as well as to prevent accidental stack overflows. This is achieved by using a simple hardware modification to divide the stack into a data stack and a control flow stack (or return stack). Moreover, access to the control flow stack is restricted to call and return instructions, which prevents control flow manipulation. Previous solutions tackled the problem of control flow injection on general-purpose computing devices and are rarely applicable to simpler low-cost embedded devices, which lack, for example, a Memory Management Unit (MMU) or execution rings. Our approach is binary compatible with legacy applications and requires only minimal changes to the tool-chain. Additionally, it does not increase memory usage, allows optimal usage of stack memory, and prevents accidental stack corruption at run-time. We have implemented and tested IBMAC on the AVR micro-controller using both a simulator and an implementation of the modified core on an FPGA. The implementation on reconfigurable hardware showed a small overhead in terms of number of gates, and therefore a low overhead in expected production costs.

107 citations
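The key property described above is that return addresses live on a separate control flow stack that only call and return instructions may access. The toy Python model below illustrates that rule; the class, its methods, and the region argument (a stand-in for the per-instruction check the modified core performs) are hypothetical and do not model the AVR instruction set.

```python
class SplitStackCore:
    """Toy core with separate data and control flow (return) stacks."""

    def __init__(self):
        self.data_stack = {}     # address -> value; ordinary loads and stores
        self.return_stack = []   # accessible only through call() and ret()

    def call(self, return_addr, target):
        self.return_stack.append(return_addr)
        return target

    def ret(self):
        return self.return_stack.pop()

    def store(self, addr, value, region="data"):
        # Ordinary store instructions may not write to the control flow stack.
        if region != "data":
            raise PermissionError("store to control flow stack blocked")
        self.data_stack[addr] = value

core = SplitStackCore()
core.call(return_addr=0x0100, target=0x0200)
for addr in range(64):               # an overflowing buffer write on the data stack
    core.store(addr, 0xDEAD)
print(hex(core.ret()))               # 0x100: the return address is untouched
```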


Journal Article
TL;DR: In this paper, the authors present the OpenDF framework and recall that dataflow programming was originally invented to address the problem of parallel computing; they discuss the problems with imperative-style, von Neumann programs and present what they believe are the advantages of using a dataflow programming model.
Abstract: This paper presents the OpenDF framework and recalls that dataflow programming was originally invented to address the problem of parallel computing. We discuss the problems with imperative-style, von Neumann programs, and present what we believe are the advantages of using a dataflow programming model. The CAL actor language is briefly presented and its role in the ISO/MPEG standard is discussed. The Dataflow Interchange Format (DIF) and related tools can be used for analysis of actors and networks, demonstrating the advantages of a dataflow approach. Finally, an overview of a case study implementing an MPEG-4 decoder is given.

81 citations
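In the dataflow model described above, actors communicate only through FIFO token queues and fire when their inputs are available, so there is no shared program counter serializing them. The Python sketch below shows just that firing rule; CAL actors additionally have actions, guards, and state, which are not modelled, and the Actor class and run function are hypothetical.

```python
from collections import deque

class Actor:
    """Fires whenever every input FIFO holds at least one token."""

    def __init__(self, fn, inputs, output):
        self.fn, self.inputs, self.output = fn, inputs, output

    def can_fire(self):
        return all(self.inputs)          # an empty deque is falsy

    def fire(self):
        tokens = [q.popleft() for q in self.inputs]
        self.output.append(self.fn(*tokens))

def run(actors):
    # Keep firing enabled actors until none can fire; the order is data-driven.
    progress = True
    while progress:
        progress = False
        for actor in actors:
            while actor.can_fire():
                actor.fire()
                progress = True

a, b, s, out = deque([1, 2, 3]), deque([10, 20, 30]), deque(), deque()
add = Actor(lambda x, y: x + y, [a, b], s)
double = Actor(lambda x: 2 * x, [s], out)
run([double, add])       # listing order does not change the result
print(list(out))         # [22, 44, 66]
```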


Journal Article
TL;DR: The virtual embedded block scheme is proposed to model embedded blocks using existing field-programmable gate array (FPGA) tools, and the proposed architecture can achieve a fourfold improvement in speed and a 25-fold reduction in area compared with a traditional FPGA device.
Abstract: This paper presents an architecture for a reconfigurable device that is specifically optimized for floating-point applications. Fine-grained units are used for implementing control logic and bit-oriented operations, while parameterized and reconfigurable word-based coarse-grained units incorporating word-oriented lookup tables and floating-point operations are used to implement datapaths. In order to facilitate comparison with existing FPGA devices, the virtual embedded block scheme is proposed to model embedded blocks using existing field-programmable gate array (FPGA) tools. This methodology involves adopting existing FPGA resources to model the size, position, and delay of the embedded elements. The standard design flow offered by FPGA and computer-aided design vendors is then applied, and static timing analysis can be used to estimate the performance of the FPGA with the embedded blocks. On selected floating-point benchmark circuits, our results indicate that the proposed architecture can achieve a fourfold improvement in speed and a 25-fold reduction in area compared with a traditional FPGA device.

76 citations


Journal Article
TL;DR: This survey explores the field of coarse-grained reconfigurable computing on the basis of the hardware aspects of granularity, reconfigurability, and interconnection networks, and identifies two emerging trends: the introduction of asynchronous techniques at the architectural level and, from a technological perspective, the use of nano-electronics.

71 citations


01 Oct 2009
TL;DR: This review focuses on the main differences between software-based and FPGA-based systems, and on the main features of FPGA technology and its real-time applications.
Abstract: This paper reviews the state of the art of field programmable gate arrays (FPGAs), with a focus on FPGA-based systems. The paper starts with an overview of FPGAs in the previous literature and then introduces FPGA programming. FPGA-based neural networks are also covered, in order to highlight the advantages of using FPGAs with this type of intelligent system, followed by a survey of FPGA-based control system design for different applications. We focus on the main differences between software-based and FPGA-based systems, and on the main features of FPGA technology and its real-time applications. FPGA-based robotic system design is also reviewed; finally, the most popular simulation results for FPGA design and implementation are highlighted.

62 citations


Journal Article
TL;DR: This work evaluates and compares the effectiveness of common hiding countermeasures against DPA in FPGA-based designs, using the Whirlpool hash function as a case study and develops a new design flow called Isolated WDDL (IWDDL).
Abstract: Security protocols are frequently accelerated by implementing the underlying cryptographic functions in reconfigurable hardware. However, unprotected hardware implementations are susceptible to side-channel attacks, and Differential Power Analysis (DPA) has been shown to be especially powerful. In this work, we evaluate and compare the effectiveness of common hiding countermeasures against DPA in FPGA-based designs, using the Whirlpool hash function as a case study. In particular, we develop a new design flow called Isolated WDDL (IWDDL). In contrast with previous works, IWDDL isolates the direct and complementary circuit paths, and also provides DPA resistance in the Hamming distance power model. The analysis is supported using actual implementation results.

53 citations
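Under the Hamming-distance power model mentioned above, power is assumed proportional to the number of wires that toggle between consecutive states. The Python sketch below (function names hypothetical) compares an unprotected single-rail register with a WDDL-style dual-rail encoding that is precharged to all zeros before every evaluation: the single-rail toggle count depends on the data, while the dual-rail count is constant. Real circuits can still leak through imbalanced or coupled rails, which is the kind of effect the abstract says IWDDL addresses by isolating the direct and complementary paths.

```python
def hamming_distance(a, b):
    return bin(a ^ b).count("1")

def single_rail_transitions(values):
    # Toggles between consecutive register states (Hamming-distance model).
    return [hamming_distance(x, y) for x, y in zip(values, values[1:])]

def dual_rail(value, width):
    # Each bit travels on a wire pair (true, false); pack both rails into one integer.
    mask = (1 << width) - 1
    return ((~value & mask) << width) | (value & mask)

def wddl_transitions(values, width):
    toggles = []
    for v in values:
        encoded = dual_rail(v, width)
        toggles.append(hamming_distance(0, encoded))  # precharge (all 0) -> evaluate
        toggles.append(hamming_distance(encoded, 0))  # evaluate -> precharge
    return toggles

data = [0b0000, 0b1111, 0b1110]
print(single_rail_transitions(data))   # [4, 1]: depends on the data
print(wddl_transitions(data, 4))       # [4, 4, 4, 4, 4, 4]: constant
```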


Proceedings Article
05 Apr 2009
TL;DR: The CFE system, whose FPGAs are used for sensor processing and for single-event upset (SEU) monitoring and mitigation, is described together with results from several SEU detection circuits run on the spacecraft.
Abstract: The Cibola Flight Experiment (CFE) is an experimental small satellite developed at the Los Alamos National Laboratory to demonstrate the feasibility of using FPGA-based reconfigurable computing for sensor processing in a space environment. The CFE satellite was launched into low Earth orbit on March 8, 2007 and has operated extremely well since its deployment. The nine Xilinx Virtex FPGAs used in the payload have been used for several high-throughput sensor processing applications and for single-event upset (SEU) monitoring and mitigation. This paper describes the CFE system and summarizes its operational results. In addition, it describes the results of several SEU detection circuits that were run on the spacecraft.

52 citations
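The abstract does not describe how the SEU detection circuits work. One common approach, assumed here purely for illustration, is configuration readback: each configuration frame is read back and compared against a checksum of the known-good bitstream. The Python sketch below uses zlib.crc32 per frame; the frame size and helper names are hypothetical.

```python
import zlib

def golden_crcs(frames):
    # Reference CRC per configuration frame, taken from the known-good bitstream.
    return [zlib.crc32(frame) for frame in frames]

def find_upsets(readback_frames, reference):
    # Indices of frames whose read-back CRC no longer matches: candidate SEUs.
    return [i for i, (frame, crc) in enumerate(zip(readback_frames, reference))
            if zlib.crc32(frame) != crc]

def flip_bit(frame, bit):
    # Simulate a single-event upset by inverting one configuration bit.
    data = bytearray(frame)
    data[bit // 8] ^= 1 << (bit % 8)
    return bytes(data)

frames = [bytes([i]) * 16 for i in range(4)]   # four hypothetical 16-byte frames
reference = golden_crcs(frames)
readback = list(frames)
readback[2] = flip_bit(readback[2], 37)
print(find_upsets(readback, reference))        # [2]
```

CRC-32 detects every single-bit error, so a single upset in a frame is always flagged; a scrubber would then rewrite that frame from the golden copy.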


Journal Article
TL;DR: This paper proposes a relocation filter that can be implemented both as a hardware and a software component, and can be customized to meet all the different constraints associated with these different target architectures.
Abstract: The research described in this paper shows how the runtime relocation of a reconfigurable component can be obtained using a system component that is able to update the bitstream information, moving the reconfigurable module to the desired position. This scenario defines the so-called partial bitstream relocation activity. This paper proposes a relocation filter that can be implemented both as a hardware and as a software component. The former is hosted in the static part of the reconfigurable architecture, while the latter is run on the processor placed on the field-programmable gate array (FPGA). The proposed approach has also been validated on different FPGAs, namely Virtex II Pro, Virtex 4, and Virtex 5, providing runtime relocation support that can be customized to meet the different constraints associated with each of these target architectures.

51 citations
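At its core, partial bitstream relocation rewrites the frame addresses in a bitstream so that the same frame data is written to a different location. The Python sketch below assumes a hypothetical frame-address layout (row, column, and minor fields of 8 bits each); real Virtex II Pro, Virtex 4, and Virtex 5 formats differ, and a real filter must also handle device-specific details such as any checksum computed over the rewritten words.

```python
ROW_SHIFT, COL_SHIFT, FIELD_MASK = 16, 8, 0xFF   # hypothetical layout: row | column | minor

def relocate_address(addr, col_offset):
    row = (addr >> ROW_SHIFT) & FIELD_MASK
    col = (addr >> COL_SHIFT) & FIELD_MASK
    minor = addr & FIELD_MASK
    new_col = col + col_offset
    if not 0 <= new_col <= FIELD_MASK:
        raise ValueError("relocated module would fall outside the address space")
    return (row << ROW_SHIFT) | (new_col << COL_SHIFT) | minor

def relocate(bitstream, col_offset):
    # bitstream: list of (frame_address, frame_data); only the addresses are rewritten.
    return [(relocate_address(addr, col_offset), data) for addr, data in bitstream]

module = [(0x010203, b"\x11" * 4), (0x010204, b"\x22" * 4)]
print([hex(addr) for addr, _ in relocate(module, 4)])   # ['0x10603', '0x10604']
```

Whether this runs as a hardware block in the static region or as software on the embedded processor, the transformation is the same; the trade-off the paper explores is where it executes.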


Journal Article
TL;DR: This work investigates the potential of PRTR on HPRC by formally analyzing the execution model and experimentally verifying the analytical findings, enabling PRTR for the first time, to the best of the authors' knowledge, on one of the current HPRC systems, the Cray XD1.
Abstract: Runtime Reconfiguration (RTR) has traditionally been used as a means of exploiting the flexibility of High-Performance Reconfigurable Computers (HPRCs). However, RTR comes at the cost of high configuration overhead, which might negatively impact overall performance. Modern FPGAs offer more advanced mechanisms for reducing configuration overhead, particularly Partial Runtime Reconfiguration (PRTR), and PRTR on HPRC systems has been perceived as a promising way to improve performance. In this work, we investigate the potential of PRTR on HPRC by formally analyzing the execution model and experimentally verifying our analytical findings by enabling PRTR for the first time, to the best of our knowledge, on one of the current HPRC systems, the Cray XD1. Our approach is general and can be applied to any of the available HPRC systems. The paper concludes with recommendations and conditions, based on our conceptual and experimental work, for the optimal utilization of PRTR as well as its possible future usage in HPRC.
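To see why PRTR can hide configuration overhead, consider a simplified execution model (an assumption for illustration, not the paper's formal model): a sequence of dependent tasks, two reconfigurable regions used alternately, and a single configuration port. With full reconfiguration every task pays the full configuration time; with PRTR the next module is loaded into the idle region while the current one runs. Time units and values below are arbitrary.

```python
def full_rtr_time(task_times, t_full):
    # Every task waits for a full-device configuration before running.
    return sum(t_full + t for t in task_times)

def partial_rtr_time(task_times, t_partial):
    # Two regions used alternately and one configuration port.
    # Task i's module can be loaded once the port is free (module i-1 loaded)
    # and its region is idle (task i-2 finished); task i runs after task i-1.
    cfg_end, exe_end = [], []
    for i, t in enumerate(task_times):
        port_free = cfg_end[i - 1] if i >= 1 else 0
        region_free = exe_end[i - 2] if i >= 2 else 0
        cfg_end.append(max(port_free, region_free) + t_partial)
        prev_done = exe_end[i - 1] if i >= 1 else 0
        exe_end.append(max(cfg_end[i], prev_done) + t)
    return exe_end[-1] if exe_end else 0

tasks = [4, 4, 4]
print(full_rtr_time(tasks, t_full=10))        # 42
print(partial_rtr_time(tasks, t_partial=2))   # 14
```

When each task runs at least as long as a partial configuration, only the first load is exposed; otherwise configuration time dominates again, which is why conditions for profitable PRTR matter.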

Journal Article
TL;DR: A new model for the partitioning and scheduling of a specification on partially dynamically reconfigurable hardware is proposed; its partitioning phase is based on a new graph-theoretic approach that aims to obtain near-optimal results even when performed independently of the subsequent scheduling phase.
Abstract: This paper proposes a new model for the partitioning and scheduling of a specification on partially dynamically reconfigurable hardware. Although this problem can be solved optimally only by tackling its subproblems jointly, the excessive complexity of such a task leads to a decomposition into two phases. The partitioning phase is based on a new graph-theoretic approach, which aims to obtain near-optimal results even when performed independently of the subsequent phase. For the scheduling phase, a new integer linear programming formulation and a heuristic approach are developed. Both take into account configuration prefetching and module reuse. The experimental results show that the proposed method compares favorably with existing solutions.

Proceedings Article
09 Dec 2009
TL;DR: This paper presents proof-carrying hardware (PCH) as a novel approach to reconfigurable system security and presents a tool flow and experimental results demonstrating the feasibility and potential of the PCH approach.
Abstract: Dynamically reconfigurable hardware combines hardware performance with software-like flexibility and finds increasing use in networked systems. The capability to load hardware modules at runtime provides these systems with an unparalleled degree of adaptivity, but at the same time poses new challenges for security and safety. In this paper, we present proof-carrying hardware (PCH) as a novel approach to reconfigurable system security. PCH takes a key concept from software security, known as proof-carrying code, into the reconfigurable hardware domain. We outline the PCH concept and discuss runtime combinational equivalence checking as a first verification problem applying the concept. We present a tool flow and experimental results demonstrating the feasibility and potential of the PCH approach.
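Combinational equivalence checking asks whether two circuits produce the same output for every input; it is usually phrased as a miter (XOR of the two outputs) that must never evaluate to 1. The Python sketch below checks a tiny example exhaustively, which only scales to a handful of inputs; practical checkers use SAT solvers, and PCH additionally ships a proof that the consumer verifies, which this sketch does not model. The circuit functions are hypothetical examples.

```python
from itertools import product

def equivalent(f, g, n_inputs):
    """Exhaustive miter check: None if f == g on all inputs, else a counterexample."""
    for bits in product((0, 1), repeat=n_inputs):
        if f(*bits) != g(*bits):      # miter output would be 1 here
            return bits
    return None

def mux_spec(s, a, b):
    # Reference behaviour: 2:1 multiplexer.
    return b if s else a

def mux_netlist(s, a, b):
    # Gate-level form, e.g. extracted from a loaded module.
    return (a & (1 - s)) | (b & s)

def mux_buggy(s, a, b):
    # Faulty variant: b leaks through when s = 0.
    return (a & (1 - s)) | b

print(equivalent(mux_spec, mux_netlist, 3))   # None
print(equivalent(mux_spec, mux_buggy, 3))     # (0, 0, 1)
```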

Proceedings Article
05 Apr 2009
TL;DR: This paper presents a partially reconfigurable FPGA-based architecture and methodology to provide increased WSN flexibility and computational resources, resulting in better power consumption and performance than a microprocessor capable of satisfying similar demands.
Abstract: Wireless sensor networks (WSNs) are typically composed of very small, battery-operated devices (sensor nodes) containing simple microprocessors with few computational resources. However, the rapidly increasing popularity of WSNs has placed increased computational demands upon these systems, due to increasingly complex operating environments and enhanced data-sensing technology. Although introducing more powerful microprocessors into sensor nodes addresses these demands, sensor nodes do not contain sufficient energy reserves to support these microprocessors. In this paper, we present a partially reconfigurable FPGA-based architecture and methodology to provide increased WSN flexibility and computational resources, resulting in better power consumption and performance than a microprocessor capable of satisfying similar demands.

Proceedings Article
20 Jul 2009
TL;DR: Extensions to OpenMP 3.0 that address the offloading of tasks to reconfigurable devices and their interoperability with shared-memory programming are presented along with an implementation in a prototype runtime system, and a hybrid host/device operational mode is proposed to hide some of the resulting overheads, significantly improving application performance.
Abstract: Reconfigurable computing is one of the paths to explore towards low-power supercomputing. However, programming these reconfigurable devices is not an easy task and still requires significant research and development effort to make it truly productive. In addition, using these devices as accelerators in multicore, SMP, and ccNUMA architectures adds another level of programming complexity: specifying the offloading of tasks to the reconfigurable devices and their interoperability with current shared-memory programming paradigms such as OpenMP. This paper presents extensions to OpenMP 3.0 that try to address this second challenge, and an implementation in a prototype runtime system. With these extensions, the programmer can easily express the offloading of already existing reconfigurable binary code (a bitstream), hiding all the complexities related to device configuration, bitstream loading, and data arrangement and movement to the device memory. Our current prototype implementation targets SGI Altix systems with RASC blades (based on the Virtex 4 FPGA). We analyze the overheads introduced in this implementation and propose a hybrid host/device operational mode to hide some of these overheads, significantly improving the performance of the applications. A complete evaluation of the system is done with a matrix multiplication kernel, including an estimation considering different FPGA frequencies.

Proceedings Article
20 Oct 2009
TL;DR: A structural hardware architecture designed for small chip area and high-speed performance is proposed, which provides a low-cost telecommunication security solution while maintaining or increasing the encryption throughput rate.
Abstract: In this paper, we present a new approach for the real-time FPGA implementation of a random-key-based Lorenz chaotic generator for data stream encryption. We propose a structural hardware architecture designed for small chip area and high-speed performance. This architecture is particularly attractive since it provides a low-cost telecommunication security solution while maintaining or increasing the encryption throughput rate. We demonstrate its feasibility through a detailed implementation on a Xilinx Virtex FPGA. This architecture uses only 1926 slices and achieves a random key throughput rate of 124 Mbps with a low system clock frequency of up to 15.5 MHz, allowing low power consumption, especially for embedded applications.
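Since 124 Mbps at 15.5 MHz corresponds to 124 / 15.5 = 8 key bits per clock cycle, the sketch below likewise emits one byte per iteration. It is a minimal floating-point Python model, assuming a forward-Euler discretization with the classic Lorenz parameters (sigma = 10, rho = 28, beta = 8/3) and an initial state that acts as the key; the paper's hardware datapath, step size, and bit-extraction rule are not given in the abstract, and this toy cipher is not a vetted security design.

```python
def lorenz_keystream(x, y, z, n_bytes, dt=0.01, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Key bytes from a forward-Euler discretisation of the Lorenz system.

    The initial state (x, y, z) acts as the key. Taking the low 8 bits of a
    scaled |x| at each step is an illustrative choice, not the paper's.
    """
    out = bytearray()
    for _ in range(n_bytes):
        dx = sigma * (y - x)
        dy = x * (rho - z) - y
        dz = x * y - beta * z
        x, y, z = x + dt * dx, y + dt * dy, z + dt * dz
        out.append(int(abs(x) * 1e6) & 0xFF)   # 8 key bits per iteration
    return bytes(out)

def xor_cipher(data, key_state):
    # Stream cipher: XOR with the keystream; applying it twice restores the data.
    stream = lorenz_keystream(*key_state, n_bytes=len(data))
    return bytes(d ^ k for d, k in zip(data, stream))

key = (0.1, 0.0, 0.0)
ciphertext = xor_cipher(b"reconfigurable", key)
print(xor_cipher(ciphertext, key))   # b'reconfigurable'
```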

Book
17 Feb 2009
TL;DR: Focusing on system-level design and verification techniques, this text allows readers to immediately grasp concepts and put them into practice.
Abstract: Focusing on system-level design and verification techniques, this text allows readers to immediately grasp concepts and put them into practice. It starts with an overview of reconfigurable computing architectures and platforms and demonstrates how to develop reconfigurable systems. This sets up the discussion of the hardware, software, and system techniques that form the core of the text. The authors classify design and verification techniques into primary and secondary categories, allowing the appropriate ones to be easily located and compared. The techniques discussed range from system modeling and system-level design to co-simulation and formal verification. Case studies illustrate real-world applications.

Journal Article
TL;DR: A virtual hardware mechanism, including logic virtualization and hardware device virtualization, is proposed for dynamically partially reconfigurable systems; it can reduce the time required by conventional hardware reuse by up to 26%.
Abstract: Dynamic partial reconfiguration technology enables an embedded system to adapt its hardware functionalities at run-time to changing environmental conditions. However, reconfigurable hardware functions are still managed as conventional hardware devices, so the performance gains from partial reconfiguration remain limited. To further raise the utilization of reconfigurable hardware designs, we propose a virtual hardware mechanism, including logic virtualization and hardware device virtualization, for dynamically partially reconfigurable systems. Using logic virtualization, a hardware function that has been configured in the field-programmable gate array (FPGA) can be virtualized to support more than one software application at run-time. Using hardware device virtualization, a software application can access two or more different hardware functions through the same device node. In a network security reconfigurable system for multimedia applications, our experimental results demonstrate that the utilization of reconfigurable hardware functions can be further raised using the virtual hardware mechanism. Furthermore, the virtual hardware mechanism can reduce the time required by conventional hardware reuse by up to 26%.
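The two ideas can be illustrated with a small Python model (all names hypothetical): a single device object exposes several hardware functions (device virtualization), and any application can reuse whichever function is currently configured without triggering a reconfiguration (in the spirit of logic virtualization). Reconfiguration is simulated by a counter, not by loading a bitstream.

```python
class VirtualHwDevice:
    """One device node multiplexing several hardware functions (hypothetical model)."""

    def __init__(self, functions):
        self.functions = functions   # name -> callable standing in for a circuit
        self.loaded = None           # function currently configured in the region
        self.reconfigurations = 0

    def request(self, name, data):
        if name != self.loaded:      # only reconfigure when a different function is needed
            self.loaded = name
            self.reconfigurations += 1
        return self.functions[name](data)

dev = VirtualHwDevice({
    "invert": lambda d: bytes(b ^ 0xFF for b in d),
    "reverse": lambda d: d[::-1],
})
dev.request("invert", b"\x00")      # application A: configures "invert"
dev.request("invert", b"\x0f")      # application B: reuses the loaded function
dev.request("reverse", b"ab")       # same device node, different function
print(dev.reconfigurations)          # 2
```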

Proceedings Article
29 Sep 2009
TL;DR: This paper proposes a cooperative scheduling technique for reconfigurable hardware threads as a feasible compromise between computational efficiency and implementation complexity, implements it for the multithreaded reconfigurable operating system ReconOS, and evaluates its overheads and performance on a prototype.
Abstract: Preemptive multitasking, a popular technique for timesharing of computational resources in software-based systems, faces considerable difficulties when applied to partially reconfigurable hardware. In this paper, we propose a cooperative scheduling technique for reconfigurable hardware threads as a feasible compromise between computational efficiency and implementation complexity. We have implemented this mechanism for the multithreaded reconfigurable operating system ReconOS and evaluated its overheads and performance on a prototype.
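A cooperative scheduler never interrupts a thread; each thread gives up the reconfigurable slot only at points it chooses, where little state needs saving. The Python sketch below is a software analogy using generators, with each yield marking such a point; it is not the ReconOS implementation, and the function names are hypothetical.

```python
from collections import deque

def hw_thread(name, n_blocks, log):
    # Processes data block by block and yields only between blocks,
    # where its live state is small enough to save cheaply.
    for block in range(n_blocks):
        log.append((name, block))
        yield

def run_cooperative(threads):
    ready = deque(threads)
    while ready:
        thread = ready.popleft()
        try:
            next(thread)            # run until the thread's next yield point
            ready.append(thread)    # it released the slot voluntarily
        except StopIteration:
            pass                    # the thread has finished

log = []
run_cooperative([hw_thread("A", 2, log), hw_thread("B", 2, log)])
print(log)   # [('A', 0), ('B', 0), ('A', 1), ('B', 1)]
```

The price of this simplicity is that a thread that never yields can monopolize the slot, which is the usual trade-off against preemptive scheduling.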

Proceedings Article
22 Feb 2009
TL;DR: An analytical model that relates the architectural parameters of an FPGA to the average prerouting wirelength of an FPGA implementation is described, and two applications of the model to FPGA architectural design are presented.
Abstract: This paper describes an analytical model that relates the architectural parameters of an FPGA to the average prerouting wirelength of an FPGA implementation. Both homogeneous and heterogeneous FPGAs are considered. For homogeneous FPGAs, the model relates the lookup-table size, the cluster size, and the number of inputs per cluster to the expected wirelength. For heterogeneous FPGAs, the number and positioning of the embedded blocks, as well as the number of pins on each embedded block is considered. Two applications of the model to FPGA architectural design are also presented.

Proceedings Article
29 Jul 2009
TL;DR: A platform based on electronic DNA (eDNA) is proposed, and simulation is used to show its capabilities as a new generation of robust reconfigurable hardware platforms.
Abstract: This paper presents the concept of a biologically inspired reconfigurable hardware cell architecture that supports self-organisation and self-healing. Two fundamental processes in biology, namely fertilization-to-birth and cell self-healing, have inspired the development of this cell architecture. In biology, as in our hardware cell architecture, it is the DNA that enables these processes. We propose a platform based on electronic DNA (eDNA) and show through simulation its capabilities as a new generation of robust reconfigurable hardware platforms. We have created a Java-based simulator for our self-organisation and self-healing algorithms, and the results obtained from it look promising.

Proceedings Article
20 Jun 2009
TL;DR: The analyses and optimizations of the CHiMPS compiler that construct many-cache caches are presented; CHiMPS-generated FPGAs show a performance advantage of 7.8x over CPU-only execution of the same source code, FPGA power usage that is on average 4.1x lower, and consequently performance per watt that is greater by 21.3x.
Abstract: Many-cache is a memory architecture that efficiently supports caching in commercially available FPGAs. It facilitates FPGA programming for high-performance computing (HPC) developers by providing them with greater memory performance and lower power consumption than their current CPU platforms, without sacrificing their familiar, C-based programming environment. Many-cache creates multiple, multi-banked caches on top of an FPGA's small, independent memories, each targeting a particular data structure or region of memory in an application and each customized for the memory operations that access it. The caches are automatically generated from C source by the CHiMPS C-to-FPGA compiler. This paper presents the analyses and optimizations of the CHiMPS compiler that construct many-cache caches. An architectural evaluation of CHiMPS-generated FPGAs demonstrates a performance advantage of 7.8x (geometric mean) over CPU-only execution of the same source code, FPGA power usage that is on average 4.1x lower, and consequently performance per watt that is greater by a geometric mean of 21.3x.
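The benefit of giving each data structure its own cache can be illustrated with a toy direct-mapped cache simulation in Python. All parameters are hypothetical, and these caches are neither multi-banked nor compiler-generated as in CHiMPS. Two arrays whose addresses collide in a shared cache thrash on every access, while two per-structure caches with the same total capacity see only compulsory misses.

```python
class DirectMappedCache:
    def __init__(self, n_lines, line_words):
        self.n_lines, self.line_words = n_lines, line_words
        self.tags = [None] * n_lines
        self.hits = self.misses = 0

    def access(self, addr):
        block = addr // self.line_words
        index, tag = block % self.n_lines, block // self.n_lines
        if self.tags[index] == tag:
            self.hits += 1
        else:
            self.misses += 1
            self.tags[index] = tag

def simulate(per_structure, n=64, total_lines=16, line_words=4):
    base_a, base_b = 0, total_lines * line_words   # a[i] and b[i] collide in the shared cache
    if per_structure:
        # Same total capacity, split into one cache per data structure.
        caches = {"a": DirectMappedCache(total_lines // 2, line_words),
                  "b": DirectMappedCache(total_lines // 2, line_words)}
    else:
        shared = DirectMappedCache(total_lines, line_words)
        caches = {"a": shared, "b": shared}
    for i in range(n):                  # interleaved a[i], b[i], as in a dot product
        caches["a"].access(base_a + i)
        caches["b"].access(base_b + i)
    unique = {id(c): c for c in caches.values()}.values()
    return sum(c.hits for c in unique), sum(c.misses for c in unique)

print(simulate(per_structure=False))   # (0, 128)
print(simulate(per_structure=True))    # (96, 32)
```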

Journal Article
TL;DR: A space-efficient, global sequence alignment algorithm and architecture is presented that accelerates the forward scan and traceback in hardware without memory and I/O limitations.
Abstract: Biological sequence alignment is an essential tool used in molecular biology and biomedical applications. The growing volume of genetic data and the complexity of sequence alignment present a challenge in obtaining alignment results in a timely manner. Known methods to accelerate alignment on reconfigurable hardware only address sequence comparison, limit the sequence length, or exhibit memory and I/O bottlenecks. A space-efficient, global sequence alignment algorithm and architecture is presented that accelerates the forward scan and traceback in hardware without memory and I/O limitations. With 256 processing elements in FPGA technology, a performance gain of over 300 times that of a desktop computer is demonstrated on sequence lengths of 16000. For greater performance, the architecture is scalable to more processing elements.
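For reference, the forward scan of a global alignment can be computed in linear space with the standard Needleman-Wunsch recurrence by keeping only one row of the score matrix. The Python sketch below uses illustrative scores (match +1, mismatch -1, gap -1). Because cell (i, j) depends only on (i-1, j-1), (i-1, j), and (i, j-1), all cells on an anti-diagonal are independent, which is what lets an array of processing elements evaluate them in parallel. Recovering the alignment itself (traceback) without storing the full matrix needs extra machinery, and the abstract does not say which method the paper uses.

```python
def nw_score(a, b, match=1, mismatch=-1, gap=-1):
    """Global (Needleman-Wunsch) alignment score using one DP row of memory."""
    prev = [j * gap for j in range(len(b) + 1)]      # row 0: b aligned against gaps
    for i in range(1, len(a) + 1):
        cur = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur[j] = max(diag, prev[j] + gap, cur[j - 1] + gap)
        prev = cur
    return prev[-1]

print(nw_score("ACGT", "ACGT"))   # 4
print(nw_score("AC", "AG"))       # 0
```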

Proceedings Article
22 Feb 2009
TL;DR: A high-performance system architecture based on the Intel® Xeon® platform is described, in which one or more FPGAs, acting as application accelerators, replace one or more processors in a dual/multi-processor (DP/MP) platform.
Abstract: Growing demand for energy-efficient, high-performance systems has resulted in the growth of innovative heterogeneous computing system architectures that use FPGAs. FPGA-based architectures enable designers to implement custom instruction streams executing on potentially thousands of compute elements. Traditionally, FPGAs have been used as compute elements on PCI devices; however, this does not allow the FPGAs to be co-processors. This paper describes a high-performance system architecture that is based on the Intel® Xeon® platform in which one or more FPGAs, acting as application accelerators, replace one or more processors in a dual/multi-processor (DP/MP) platform. The FPGA is thus connected directly to the Front Side Bus (FSB) and enjoys the same privileges as a processor, i.e., full participation in the coherency protocol and unrestricted access to system memory and to other processors via the high-bandwidth, low-latency FSB connection. We also describe a software layer called the "Accelerator Abstraction Layer (AAL)", which provides a uniform, hardware- and/or platform-independent application interface. Applications written on AAL can be ported to multiple platforms that have different types of accelerators without modifying the application. AAL also enables the developer/user to reprogram the FPGA on the fly (analogous to an operating system context switch), thereby exploiting the programmable nature of the FPGA. The resulting hardware/software stack creates a flexible and powerful platform for accelerator innovation and deployment.

Posted Content
TL;DR: The architecture proposed in this paper is an optimal hardware implementation of the algorithm that exploits dynamic partial reconfiguration of the FPGA, and is a good solution for preserving the confidentiality and accessibility of information in digital communication.
Abstract: This paper addresses efficient hardware/software implementation approaches for the AES (Advanced Encryption Standard) algorithm and describes the design and performance testing of the algorithm for embedded systems. With the spread of reconfigurable hardware such as FPGAs (Field Programmable Gate Arrays), embedded cryptographic hardware has become cost-effective. Nevertheless, it is worth noting that nowadays even hardwired cryptographic algorithms are not entirely safe. In addition, a self-reconfiguring platform is reported that enables an FPGA to dynamically reconfigure itself under the control of an embedded microprocessor. Hardware acceleration significantly increases the performance of embedded systems built on programmable logic, and allowing an FPGA-based MicroBlaze processor to select the coprocessors it uses can help reduce area requirements and increase a system's versatility. The architecture proposed in this paper is an optimal hardware implementation of the algorithm that exploits dynamic partial reconfiguration of the FPGA. This implementation is a good solution for preserving the confidentiality and accessibility of information in digital communication.
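The abstract does not detail the AES datapath. As background, SubBytes is the nonlinear AES step that hardware implementations commonly realize as a 256-entry lookup table; the Python sketch below computes that S-box from its FIPS-197 definition (multiplicative inverse in GF(2^8) followed by an affine transform). The function names are illustrative.

```python
def gf_mul(a, b):
    """Multiply in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result

def gf_inv(a):
    # a^254 equals a^-1 in GF(2^8); AES maps 0 to 0.
    if a == 0:
        return 0
    result, power, exp = 1, a, 254
    while exp:
        if exp & 1:
            result = gf_mul(result, power)
        power = gf_mul(power, power)
        exp >>= 1
    return result

def sbox(a):
    # Affine transform: XOR of b with its left-rotations by 1..4 bits, then XOR 0x63.
    b = gf_inv(a)
    s = b
    for shift in range(1, 5):
        s ^= ((b << shift) | (b >> (8 - shift))) & 0xFF
    return s ^ 0x63

SBOX = [sbox(x) for x in range(256)]   # the table a hardware design would store
print(hex(SBOX[0x00]), hex(SBOX[0x01]), hex(SBOX[0x53]))   # 0x63 0x7c 0xed
```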

Journal Article
TL;DR: In this paper, a dynamically variable step search (DVSS) algorithm for processing high definition video formats and a dynamically reconfigurable hardware architecture for efficiently implementing the DVSS algorithm are presented.
Abstract: Motion Estimation (ME) is the most computationally intensive part of video compression and video enhancement systems. For the recently available high definition (HD) video formats, the computational complexity of the full search (FS) ME algorithm is prohibitively high, whereas the PSNR obtained by fast search ME algorithms is low. Therefore, in this paper, we present the dynamically variable step search (DVSS) ME algorithm for processing high definition video formats and a dynamically reconfigurable hardware architecture for efficiently implementing the DVSS algorithm. The simulation results show that the DVSS algorithm performs very close to the FS algorithm while searching far fewer locations, and that it outperforms successful fast search ME algorithms by searching more locations than they do. The proposed hardware is implemented in VHDL and is capable of processing high definition video formats in real time. Therefore, it can be used in consumer electronics products for video compression, frame rate up-conversion, and de-interlacing.
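The abstract does not describe how DVSS chooses its step sizes, so the sketch below shows only the full search (FS) baseline it is compared against: every displacement within a +/- radius window is scored by the sum of absolute differences (SAD), i.e. (2 * radius + 1)^2 candidates per block, which is why FS becomes expensive at HD resolutions. Function names and the synthetic test frame are hypothetical.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    # Sum of absolute differences between the n x n block at (bx, by) in cur
    # and the block displaced by (dx, dy) in ref.
    return sum(abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
               for y in range(n) for x in range(n))

def full_search(cur, ref, bx, by, n, radius):
    """Exhaustive block matching over all (2*radius + 1)**2 displacements."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            if not (0 <= bx + dx and bx + dx + n <= w and 0 <= by + dy and by + dy + n <= h):
                continue                  # candidate block leaves the reference frame
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best                           # (SAD, dx, dy) of the best match

ref = [[(7 * x + 13 * y) % 251 for x in range(16)] for y in range(16)]
cur = [[ref[(y + 1) % 16][(x + 2) % 16] for x in range(16)] for y in range(16)]
print(full_search(cur, ref, bx=4, by=4, n=4, radius=3))   # (0, 2, 1)
```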

Journal Article
TL;DR: The RC Amenability Test, or RAT, a methodology and model developed for this purpose, supporting rapid exploration and prediction of strategic design tradeoffs during the formulation stage of application development is presented.
Abstract: While the promise of achieving speedup and additional benefits such as high performance per watt with FPGAs continues to expand, chief among the challenges with the emerging paradigm of reconfigurable computing is the complexity in application design and implementation. Before a lengthy development effort is undertaken to map a given application to hardware, it is important that a high-level parallel algorithm crafted for that application first be analyzed relative to the target platform, so as to ascertain the likelihood of success in terms of potential speedup. This article presents the RC Amenability Test, or RAT, a methodology and model developed for this purpose, supporting rapid exploration and prediction of strategic design tradeoffs during the formulation stage of application development.
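The kind of pre-implementation estimate RAT is meant to support can be sketched as a simple throughput model: communication time is data volume divided by an effective interconnect rate, computation time is operation count divided by clock rate times operations completed per cycle, and predicted speedup is the software time divided by the resulting FPGA time. The sketch below is an assumption-laden simplification in that spirit, not the article's exact equations; every parameter name and the single- versus double-buffered distinction are illustrative.

```python
from dataclasses import dataclass

@dataclass
class RatInputs:
    # Communication parameters (illustrative names, not the article's notation)
    elements_in: int           # elements sent to the FPGA per iteration
    elements_out: int          # elements returned to the host per iteration
    bytes_per_element: int
    link_bytes_per_s: float    # ideal interconnect throughput
    alpha_write: float         # assumed fraction of ideal throughput, host -> FPGA
    alpha_read: float          # assumed fraction of ideal throughput, FPGA -> host
    # Computation parameters
    ops_per_element: float
    ops_per_cycle: float       # operations the pipelined design completes per clock
    clock_hz: float
    iterations: int
    t_software_s: float        # measured or estimated software baseline

def predict_speedup(p: RatInputs, double_buffered: bool = False) -> float:
    """Predicted speedup = software time / estimated FPGA time."""
    t_write = p.elements_in * p.bytes_per_element / (p.alpha_write * p.link_bytes_per_s)
    t_read = p.elements_out * p.bytes_per_element / (p.alpha_read * p.link_bytes_per_s)
    t_comm = t_write + t_read
    t_comp = p.elements_in * p.ops_per_element / (p.clock_hz * p.ops_per_cycle)
    if double_buffered:
        # Transfers overlap with computation, so the slower of the two dominates.
        t_fpga = p.iterations * max(t_comm, t_comp)
    else:
        # Transfers and computation run back to back.
        t_fpga = p.iterations * (t_comm + t_comp)
    return p.t_software_s / t_fpga

example = RatInputs(elements_in=1000, elements_out=1000, bytes_per_element=4,
                    link_bytes_per_s=1e9, alpha_write=0.5, alpha_read=0.5,
                    ops_per_element=100, ops_per_cycle=10, clock_hz=100e6,
                    iterations=10, t_software_s=0.01)
```

With these made-up inputs, computation (100 µs per iteration) dominates communication (16 µs), so double buffering raises the predicted speedup from about 8.6 to 10. Seeing that balance before writing any HDL is the point of such a test.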

Proceedings ArticleDOI
05 Apr 2009
TL;DR: This paper compares the system performance of three alternate policies for reconfigurable hardware kernel preemption in a multi-process system: block, drop and roll, and finds the best-performing policy is able to achieve on average within 4% of the performance of an idealized, zero-overhead save and restore method on a mixed application workload.
Abstract: preemption in multi-tasking operating systems. Multi-tasking, and by consequence, preemption, is key to effective CPU sharing. However, it is much more expensive to save and restore context data in reconfigurable hardware than it is in traditional software. The configuration and current state comprise a large amount of data, making the transfer a long and expensive operation. In this paper, we explore alternatives to the save and restore operation for hardware multi-tasking. We compare the system performance of three alternative policies for reconfigurable hardware kernel preemption in a multi-process system: block, drop and roll. The best-performing policy achieves, on average, within 4% of the performance of an idealized, zero-overhead save and restore method on a mixed application workload.
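The trade-off between preemption policies can be illustrated with a toy single-slot timing model: kernel A starts at t = 0 and needs `work_a` time units; kernel B requests the fabric at `t_req` and needs `work_b`. "Block" makes B wait for A to finish, "drop" discards A's progress and restarts A after B, and "save_restore" keeps A's progress at a context-transfer cost `t_ctx` (0 for the idealized method). The "roll" policy is left out because the abstract does not describe its mechanism; the model, its parameters, and the function name are all hypothetical.

```python
def preempt(policy, work_a, work_b, t_req, t_reconfig, t_ctx=0):
    """Return (finish_a, finish_b) under a hypothetical single-slot preemption model.
    Kernel A runs from t=0; kernel B requests the fabric at t_req while A is running."""
    if not 0 <= t_req < work_a:
        raise ValueError("model assumes B arrives while A is still running")
    if policy == "block":
        # B waits for A to complete, then the fabric is reconfigured for B.
        finish_a = work_a
        finish_b = work_a + t_reconfig + work_b
    elif policy == "drop":
        # A is discarded immediately; after B, A is reloaded and restarted from scratch.
        finish_b = t_req + t_reconfig + work_b
        finish_a = finish_b + t_reconfig + work_a
    elif policy == "save_restore":
        # A's context is saved and later restored (t_ctx each way); its progress is kept.
        finish_b = t_req + t_ctx + t_reconfig + work_b
        finish_a = finish_b + t_reconfig + t_ctx + (work_a - t_req)
    else:
        raise ValueError(f"unknown policy: {policy}")
    return finish_a, finish_b

for policy in ("block", "drop", "save_restore"):
    print(policy, preempt(policy, work_a=100, work_b=20, t_req=40, t_reconfig=5))
```

In this model, blocking delays the high-priority kernel, dropping wastes the work A had already done, and zero-cost save/restore avoids both. The paper's point is that real save/restore is expensive, so cheaper policies can come close to the ideal.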

Proceedings ArticleDOI
22 Feb 2009
TL;DR: It is shown that timing performed in the FPGA can achieve a resolution that is suitable for small-animal scanners, and will outperform the analog version given a low enough sampling period for the ADC.
Abstract: Modern Field Programmable Gate Arrays (FPGAs) are capable of performing complex discrete signal processing algorithms at clock rates above 100 MHz. This, combined with their low cost, ease of use, and selected dedicated hardware, makes FPGAs an ideal technology for the data acquisition system of a positron emission tomography (PET) scanner. Our laboratory is producing a high-resolution, small-animal PET scanner that uses FPGAs as the core of the front-end electronics. For this next-generation scanner, functions that are typically performed in dedicated circuits, or offline, are being migrated to the FPGA. This will not only simplify the electronics; the features of modern FPGAs can also be utilized to add significant signal processing power and produce higher-resolution images. In this paper, two such processes, sub-clock-rate pulse timing and event localization, are discussed in detail. We show that timing performed in the FPGA can achieve a resolution that is suitable for small-animal scanners, and will outperform the analog version given a low enough sampling period for the ADC. We also show that the position of events in the scanner can be determined in real time using a statistics-based positioning algorithm.
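Sub-clock-rate timing means estimating when a pulse arrived with finer resolution than the ADC sampling period. One generic way to do this digitally is a constant-fraction threshold with linear interpolation: find where the rising edge crosses a fixed fraction of the pulse amplitude, then interpolate between the two bracketing samples. The sketch below shows that generic technique under the assumption of a baseline-subtracted pulse; it is not necessarily the method used in this paper, and the function and parameter names are hypothetical.

```python
def cfd_time(samples, sample_period_ns, fraction=0.5, baseline=0.0):
    """Estimate pulse arrival time with sub-sample resolution.
    Finds where the rising edge crosses `fraction` of the pulse amplitude and
    linearly interpolates between the two samples that bracket the crossing."""
    peak_idx = max(range(len(samples)), key=lambda i: samples[i])
    threshold = baseline + fraction * (samples[peak_idx] - baseline)
    for k in range(peak_idx):
        lo, hi = samples[k], samples[k + 1]
        if lo < threshold <= hi:
            frac = (threshold - lo) / (hi - lo)   # position between samples k and k+1
            return (k + frac) * sample_period_ns
    raise ValueError("no rising-edge crossing found before the peak")

# A pulse sampled every 10 ns; doubling its amplitude leaves the estimate unchanged.
small = [0, 0, 10, 30, 50, 40, 20]
large = [2 * s for s in small]
```

Both `cfd_time(small, 10.0)` and `cfd_time(large, 10.0)` return 27.5 ns, a time that falls between samples. Because the threshold scales with amplitude, the estimate does not depend on pulse height. The linear-interpolation error shrinks as the sampling period shrinks, which is consistent with the abstract's caveat about the ADC sampling period.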

Journal ArticleDOI
TL;DR: This work presents hardware decompression accelerators for widening the bottleneck between slow nonvolatile memories on the one side and high-speed FPGA configuration interfaces and fast softcore CPUs on the other side and discusses different compression algorithms suitable for a hardware accelerated decompression on FPGAs as well as on CPLDs.
Abstract: In this work, we present hardware decompression accelerators for widening the bottleneck between slow nonvolatile memories on the one side and high-speed FPGA configuration interfaces and fast softcore CPUs on the other side. We discuss different compression algorithms suitable for hardware-accelerated decompression on FPGAs as well as on CPLDs. The algorithms are investigated with respect to the achievable compression ratio, throughput, and hardware overhead. This leads to various decompressor implementations, one of which is capable of decompressing at data rates of up to 400 megabytes per second under optimal conditions while requiring only slightly more than a hundred lookup tables. We evaluate how these decompressors perform on configuration bitstreams for different FPGAs as well as on softcore CPU binaries.
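Configuration bitstreams often contain long runs of identical bytes (frequently zeros), which is why very simple schemes can decompress fast in little logic. The sketch below uses run-length encoding (RLE) as one such candidate. The abstract does not name the algorithms the authors evaluated, so the (run_length, value) pair format here is a hypothetical choice for illustration.

```python
def rle_encode(data: bytes) -> bytes:
    """Encode as (run_length, value) byte pairs; each run is capped at 255."""
    out = bytearray()
    i = 0
    while i < len(data):
        value = data[i]
        run = 1
        while i + run < len(data) and data[i + run] == value and run < 255:
            run += 1
        out += bytes((run, value))
        i += run
    return bytes(out)

def rle_decode(encoded: bytes) -> bytes:
    """Expand (run_length, value) pairs. In hardware this maps to a down-counter
    and a value register that emit one byte per clock until the counter hits zero."""
    if len(encoded) % 2:
        raise ValueError("encoded stream must contain whole (run, value) pairs")
    out = bytearray()
    for j in range(0, len(encoded), 2):
        run, value = encoded[j], encoded[j + 1]
        out += bytes([value]) * run
    return bytes(out)

# A toy "bitstream": a long zero run, two payload bytes, then more zeros.
bitstream = bytes(1000) + b"\xAA\x55" + bytes(10)
packed = rle_encode(bitstream)
```

Here 1012 input bytes shrink to 7 pairs (14 bytes), and decoding restores them exactly. The decoder needs only a counter, a register, and a small control FSM, which illustrates why lightweight schemes can fit in on the order of a hundred lookup tables. The compression ratio on real bitstreams depends on the device and design.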