Author

James Coole

Other affiliations: Research Triangle Park
Bio: James Coole is an academic researcher from the University of Florida. The author has contributed to research in topics: Cache & Tree traversal. The author has an h-index of 9 and has co-authored 13 publications receiving 297 citations. Previous affiliations of James Coole include Research Triangle Park.

Papers
Proceedings ArticleDOI
24 Oct 2010
TL;DR: This paper proposes intermediate fabrics: virtual reconfigurable architectures specialized for different application domains and implemented on top of commercial off-the-shelf (COTS) devices. They hide the complexity of fine-grained physical devices and enable circuit portability across all devices that implement the intermediate fabric.
Abstract: Although hardware/software partitioning of embedded applications onto FPGAs is widely known to have performance and power advantages, FPGA usage has been typically limited to hardware experts, due largely to several problems: 1) difficulty of integrating hardware design tools into well-established software tool flows, 2) increasingly lengthy FPGA design iterations due to placement and routing, and 3) a lack of portability and interoperability resulting from device/platform-specific tools and bitfiles. In this paper, we directly address the last two problems by introducing intermediate fabrics, which are virtual reconfigurable architectures specialized for different application domains, implemented on top of commercial-off-the-shelf devices. Such specialization enables near-instantaneous placement and routing by hiding the complexity of fine-grained physical devices, while also enabling circuit portability across all devices that implement the intermediate fabric. When combined with existing work on runtime synthesis from software binaries, intermediate fabrics reduce the effects of all three problems by enabling transparent usage of COTS FPGAs by software designers. In this paper, we explore intermediate fabric architectures using specialization techniques to minimize area and performance overhead of the virtual fabric while maximizing routability and speedup of placement and routing. We present results showing an average placement and routing speedup of 554x, with an average area overhead of 10% and clock overhead of 18%, which corresponds to an average frequency of 195 MHz.

103 citations

Journal ArticleDOI
TL;DR: In this letter, virtual reconfigurable architectures, referred to as intermediate fabrics, are evaluated, which enable near-instant placement and routing of applications for commercial FPGAs.
Abstract: Field-programmable gate arrays (FPGAs) suffer from lower application design productivity than other devices, which is largely due to compilation taking hours or even days. Making FPGA compilation comparable to software compilation is critical for continued FPGA usage due to competitive technologies, such as graphics-processing units, that use languages with runtime compilation models. In this letter, we evaluate virtual reconfigurable architectures, referred to as intermediate fabrics, which enable near-instant placement and routing of applications for commercial FPGAs.

46 citations

Journal ArticleDOI
TL;DR: A back-end synthesis approach for potentially any OpenCL tool that uses virtual coarse-grained reconfiguration contexts to speed up compilation by 4,211× at a cost of 1.8× system resource overhead, while also enabling 144× faster reconfigurations to support different kernels and rapid changes to kernels.
Abstract: High-level synthesis from OpenCL has shown significant potential, but current approaches conflict with mainstream OpenCL design methodologies owing to orders-of-magnitude longer field-programmable gate array compilation times and limited support for changing or adding kernels after system compilation. In this article, the authors introduce a back-end synthesis approach for potentially any OpenCL tool. This approach uses virtual coarse-grained reconfiguration contexts to speed up compilation by 4,211× at a cost of 1.8× system resource overhead, while also enabling 144× faster reconfiguration to support different kernels and rapid changes to kernels.

35 citations

Proceedings ArticleDOI
02 May 2015
TL;DR: This paper introduces a family of overlay architectures called supernets and an associated design methodology that uses datapath merging to provide minimal-overhead support for multiple source netlists, and optionally provides an adjustable amount of source flexibility through a secondary interconnect network.
Abstract: Previous work has shown that virtual architectures, or overlays, can greatly reduce lengthy FPGA compile times by providing application-specialized resources along with a flexible interconnect to support application changes. However, retaining full configurability of interconnect has also required significant area overhead. In this paper, we introduce a family of overlay architectures called supernets and an associated design methodology that uses datapath merging to provide minimal-overhead support for multiple source netlists, and optionally provides an adjustable amount of source flexibility through a secondary interconnect network. We demonstrate that supernets can enable runtime compilation up to 13,000× faster than direct register-transfer level (RTL) implementation, with up to 70% lower area than selectively enabled RTL datapaths. Finally, we explore the design space of this family of overlays and show that it affords significant freedom to trade additional area for increased flexibility to support deviations from the source set, as introduced during development or by optimizations performed at runtime.

31 citations

Proceedings ArticleDOI
07 Oct 2012
TL;DR: This paper introduces Block Place and Route (BPR), an FPGA CAD approach that modifies traditional place-and-route algorithms to operate at a higher level of abstraction by pre-computing the internal placement and routing of reused cores.
Abstract: Numerous studies have shown the advantages of hardware and software co-design using FPGAs. However, increasingly lengthy place-and-route times represent a barrier to the broader adoption of this technology by significantly reducing designer productivity and turns-per-day, especially compared to more traditional design environments offered by competitive technologies such as GPUs. In this paper, we address this challenge by introducing a new approach to FPGA application design that significantly reduces compile times by exploiting the functional reuse common throughout modern FPGA applications, e.g., shared code libraries and modules left unchanged between compiles. To evaluate this approach, we introduce Block Place and Route (BPR), an FPGA CAD approach that modifies traditional placement and routing to operate at a higher level of abstraction by pre-computing the internal placement and routing of reused cores. By extending traditional place-and-route algorithms such as simulated-annealing placement and negotiated-congestion routing to abstract away the detailed implementation of reused cores, we show that BPR is capable of orders-of-magnitude speedup in place-and-route over commercial tools with acceptably low overhead for a variety of applications.

23 citations
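
The Block Place and Route abstract above describes running simulated-annealing placement over whole pre-placed cores rather than individual logic cells. The following is a minimal toy sketch of that general idea, not the authors' implementation: the function names, the square grid of uniform slots, the two-pin nets, the Manhattan wirelength cost, and the cooling schedule are all assumptions chosen for illustration.

```python
import math
import random


def wirelength(pos, nets):
    """Total Manhattan distance between the slots of connected blocks."""
    return sum(abs(pos[a][0] - pos[b][0]) + abs(pos[a][1] - pos[b][1])
               for a, b in nets)


def anneal_blocks(blocks, nets, grid, steps=5000, t0=5.0, cooling=0.999, seed=0):
    """Place opaque blocks on distinct slots of a grid x grid array.

    Each block stands in for a core whose internal placement and routing
    are already fixed, so the annealer only moves whole blocks
    (hypothetical simplification: every block fits in one slot).
    """
    if len(blocks) > grid * grid:
        raise ValueError("more blocks than slots")
    rng = random.Random(seed)
    slots = [(x, y) for x in range(grid) for y in range(grid)]
    rng.shuffle(slots)
    pos = {b: slots[i] for i, b in enumerate(blocks)}
    free = slots[len(blocks):]          # unoccupied slots
    cost = wirelength(pos, nets)
    best_pos, best_cost = dict(pos), cost
    t = t0
    for _ in range(steps):
        a = rng.choice(blocks)
        if free and rng.random() < 0.5:
            # Move block a to a random free slot.
            i = rng.randrange(len(free))
            pos[a], free[i] = free[i], pos[a]
            undo = ("free", i)
        else:
            # Swap block a with another randomly chosen block.
            b = rng.choice(blocks)
            pos[a], pos[b] = pos[b], pos[a]
            undo = ("swap", b)
        new_cost = wirelength(pos, nets)
        delta = new_cost - cost
        # Always accept improvements; accept worse moves with
        # probability exp(-delta / t) (Metropolis criterion).
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            cost = new_cost
            if cost < best_cost:
                best_pos, best_cost = dict(pos), cost
        elif undo[0] == "free":
            i = undo[1]
            pos[a], free[i] = free[i], pos[a]
        else:
            b = undo[1]
            pos[a], pos[b] = pos[b], pos[a]
        t *= cooling
    return best_pos, best_cost
```

Calling `anneal_blocks(["a", "b", "c", "d"], [("a", "b"), ("b", "c"), ("c", "d")], grid=4)` returns a slot assignment and its wirelength. Because the search space contains only a handful of blocks instead of thousands of cells, each annealing run is small, which is the intuition behind the speedup the abstract reports.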


Cited by
Book
01 Jan 2007
TL;DR: Digital Hardware Evolution.
Abstract: Digital Hardware Evolution.- An Online EHW Pattern Recognition System Applied to Sonar Spectrum Classification.- Design of Electronic Circuits Using a Divide-and-Conquer Approach.- Implementing Multi-VRC Cores to Evolve Combinational Logic Circuits in Parallel.- An Intrinsic Evolvable Hardware Based on Multiplexer Module Array.- Estimating Array Connectivity and Applying Multi-output Node Structure in Evolutionary Design of Digital Circuits.- Research on the Online Evaluation Approach for the Digital Evolvable Hardware.- Research on Multi-objective On-Line Evolution Technology of Digital Circuit Based on FPGA Model.- Evolutionary Design of Generic Combinational Multipliers Using Development.- Analog Hardware Evolution.- Automatic Synthesis of Practical Passive Filters Using Clonal Selection Principle-Based Gene Expression Programming.- Research on Fault-Tolerance of Analog Circuits Based on Evolvable Hardware.- Analog Circuit Evolution Based on FPTA-2.- Bio-inspired Systems.- Knowledge Network Management System with Medicine Self Repairing Strategy.- Design of a Cell in Embryonic Systems with Improved Efficiency and Fault-Tolerance.- Design on Operator-Based Reconfigurable Hardware Architecture and Cell Circuit.- Bio-inspired Systems with Self-developing Mechanisms.- Development of a Tiny Computer-Assisted Wireless EEG Biofeedback System.- Steps Forward to Evolve Bio-inspired Embryonic Cell-Based Electronic Systems.- Evolution of Polymorphic Self-checking Circuits.- Mechanical Hardware Evolution.- Sliding Algorithm for Reconfigurable Arrays of Processors.- System-Level Modeling and Multi-objective Evolutionary Design of Pipelined FFT Processors for Wireless OFDM Receivers.- Reducing the Area on a Chip Using a Bank of Evolved Filters.- Evolutionary Design.- Walsh Function Systems: The Bisectional Evolutional Generation Pattern.- Extrinsic Evolvable Hardware on the RISA Architecture.- Evolving and Analysing "Useful" Redundant Logic.- Adaptive Transmission Technique in Underwater Acoustic Wireless Communication.- Autonomous Robot Path Planning Based on Swarm Intelligence and Stream Functions.- Research on Adaptive System of the BTT-45 Air-to-Air Missile Based on Multilevel Hierarchical Intelligent Controller.- The Design of an Evolvable On-Board Computer.- Evolutionary Algorithms in Hardware Design.- Extending Artificial Development: Exploiting Environmental Information for the Achievement of Phenotypic Plasticity.- UDT-Based Multi-objective Evolutionary Design of Passive Power Filters of a Hybrid Power Filter System.- Designing Electronic Circuits by Means of Gene Expression Programming II.- Designing Polymorphic Circuits with Evolutionary Algorithm Based on Weighted Sum Method.- Robust and Efficient Multi-objective Automatic Adjustment for Optical Axes in Laser Systems Using Stochastic Binary Search Algorithm.- Minimization of the Redundant Sensor Nodes in Dense Wireless Sensor Networks.- Evolving in Extended Hamming Distance Space: Hierarchical Mutation Strategy and Local Learning Principle for EHW.- Hardware Implementation of Evolutionary Algorithms.- Adaptive and Evolvable Analog Electronics for Space Applications.- Improving Flexibility in On-Line Evolvable Systems by Reconfigurable Computing.- Evolutionary Design of Resilient Substitution Boxes: From Coding to Hardware Implementation.- A Sophisticated Architecture for Evolutionary Multiobjective Optimization Utilizing High Performance DSP.- FPGA-Based Genetic Algorithm Kernel Design.- Using Systolic Technique to Accelerate an EHW Engine for Lossless Image Compression.

231 citations

Journal ArticleDOI
TL;DR: This work reviews FPGA reconfiguration, looking at architectures built for the purpose and the properties of modern commercial architectures, and investigates design flows to identify the key challenges in making reconfigurable FPGA systems easier to design.
Abstract: Dynamic and partial reconfiguration are key differentiating capabilities of field programmable gate arrays (FPGAs). While they have been studied extensively in academic literature, they find limited use in deployed systems. We review FPGA reconfiguration, looking at architectures built for the purpose, and the properties of modern commercial architectures. We then investigate design flows and identify the key challenges in making reconfigurable FPGA systems easier to design. Finally, we look at applications where reconfiguration has found use, as well as proposing new areas where this capability places FPGAs in a unique position for adoption.

122 citations

Journal ArticleDOI
27 Feb 2018
TL;DR: It is demonstrated how system designers can exploit hybrid and reconfigurable computing on SmallSats to harness these advantages for a variety of purposes, and several recent missions by NASA and industry that feature these principles and technologies are highlighted.
Abstract: Due to the increasing demands of onboard sensor and autonomous processing, one of the principal needs and challenges for future spacecraft is onboard computing. Space computers must provide high performance and reliability (which are often at odds), using limited resources (power, size, weight, and cost), in an extremely harsh environment (due to radiation, temperature, vacuum, and vibration). As spacecraft shrink in size, while assuming a growing role for science and defense missions, the challenges for space computing become particularly acute. For example, processing capabilities on CubeSats (a smaller class of SmallSats) have been extremely limited to date, often featuring microcontrollers with performance and reliability barely sufficient to operate the vehicle, let alone support various sensor and autonomous applications. This article surveys the challenges and opportunities of onboard computers for small satellites (SmallSats) and focuses upon new concepts, methods, and technologies that are revolutionizing their capabilities, in terms of two guiding themes: hybrid computing and reconfigurable computing. These innovations are of particular need and value to CubeSats and other SmallSats. With new technologies, such as CHREC Space Processor (CSP), we demonstrate how system designers can exploit hybrid and reconfigurable computing on SmallSats to harness these advantages for a variety of purposes, and we highlight several recent missions by NASA and industry that feature these principles and technologies.

101 citations

Proceedings ArticleDOI
08 Oct 2018
TL;DR: AMORPHOS is presented, which encapsulates user FPGA logic in morphable tasks, or Morphlets, which provides isolation and protection across mutually distrustful protection domains, extending the guarantees of software processes.
Abstract: Cloud providers such as Amazon and Microsoft have begun to support on-demand FPGA acceleration in the cloud, and hardware vendors will support FPGAs in future processors. At the same time, technology advancements such as 3D stacking, through-silicon vias (TSVs), and FinFETs have greatly increased FPGA density. The massive parallelism of current FPGAs can support not only extremely large applications, but multiple applications simultaneously as well. System support for FPGAs, however, is in its infancy. Unlike software, where resource configurations are limited to simple dimensions of compute, memory, and I/O, FPGAs provide a multi-dimensional sea of resources known as the FPGA fabric: logic cells, floating point units, memories, and I/O can all be wired together, leading to spatial constraints on FPGA resources. Current stacks either support only a single application or statically partition the FPGA fabric into fixed-size slots. These designs cannot efficiently support diverse workloads: the size of the largest slot places an artificial limit on application size, and oversized slots result in wasted FPGA resources and reduced concurrency. This paper presents AMORPHOS, which encapsulates user FPGA logic in morphable tasks, or Morphlets. Morphlets provide isolation and protection across mutually distrustful protection domains, extending the guarantees of software processes. Morphlets can morph, dynamically altering their deployed form based on resource requirements and availability. To build Morphlets, developers provide a parameterized hardware design that interfaces with AMORPHOS, along with a mesh, which specifies external resource requirements. AMORPHOS explores the parameter space, generating deployable Morphlets of varying size and resource requirements. AMORPHOS multiplexes Morphlets on the FPGA in both space and time to maximize FPGA utilization. We implement AMORPHOS on Amazon F1 [1] and Microsoft Catapult [92]. We show that protected sharing and dynamic scalability support on workloads such as DNN inference and blockchain mining improves aggregate throughput up to 4× and 23× on Catapult and F1 respectively.

98 citations
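
The AMORPHOS abstract above argues that statically partitioning the fabric into fixed-size slots both caps application size and wastes resources. A minimal sketch of that accounting follows. It does not model AMORPHOS itself; the function name, the abstract "resource units", and the first-come placement policy are illustrative assumptions.

```python
def fixed_slot_waste(app_sizes, slot_size, num_slots):
    """Assign applications to equal-sized slots in arrival order.

    Sizes are in arbitrary, hypothetical resource units. Returns the
    placed sizes, the rejected sizes, and the total units left idle
    inside occupied slots.
    """
    placed, rejected, wasted = [], [], 0
    for size in app_sizes:
        if size > slot_size:
            # The slot size acts as an artificial limit on application size.
            rejected.append(size)
        elif len(placed) < num_slots:
            placed.append(size)
            # An oversized slot leaves the remainder unusable by others.
            wasted += slot_size - size
        else:
            # No slot left, even if total idle area would have sufficed.
            rejected.append(size)
    return placed, rejected, wasted
```

For example, with two slots of 40 units and applications of sizes 10, 30, and 50, the 50-unit application is rejected outright and the two occupied slots leave 40 units idle in total, so half of the partitioned fabric does no work.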

Proceedings ArticleDOI
01 May 2011
TL;DR: This work presents results from creating a new FPGA design flow based on hard macros, called HMFlow, which is designed for rapid prototyping and has shown speedups of 10-50X over the fastest configuration of the Xilinx tools.
Abstract: The FPGA compilation process (synthesis, map, place, and route) is a time-consuming task that severely limits designer productivity. Compilation time can be reduced by saving implementation data in the form of hard macros. Hard macros consist of previously synthesized, placed and routed circuits that enable rapid design assembly because of the native FPGA circuitry (primitives and nets) which they encapsulate. This work presents results from creating a new FPGA design flow based on hard macros called HMFlow. HMFlow has shown speedups of 10-50X over the fastest configuration of the Xilinx tools. Designed for rapid prototyping, HMFlow achieves these speedups by only utilizing up to 50 percent of the resources on an FPGA and produces implementations that run 2-4X slower than those produced by Xilinx. These speedups are obtained on a wide range of benchmark designs with some exceeding 18,000 slices on a Virtex 4 LX200.

97 citations