Author

Blair Fort

Bio: Blair Fort is an academic researcher from University of Toronto. The author has contributed to research in topics: High-level synthesis & Debugging. The author has an h-index of 6, co-authored 11 publications receiving 542 citations.

Papers
Journal Article
TL;DR: This work uses a first-published methodology to compare one commercial and three academic tools on a common set of C benchmarks, aiming at performing an in-depth evaluation in terms of performance and the use of resources.
Abstract: High-level synthesis (HLS) is increasingly popular for the design of high-performance and energy-efficient heterogeneous systems, shortening time-to-market and addressing today’s system complexity. HLS allows designers to work at a higher level of abstraction by using a software program to specify the hardware functionality. Additionally, HLS is particularly interesting for designing field-programmable gate array circuits, where hardware implementations can be easily refined and replaced in the target device. Recent years have seen much activity in the HLS research community, with a plethora of HLS tool offerings, from both industry and academia. All these tools may have different input languages, perform different internal optimizations, and produce results of different quality, even for the very same input description. Hence, it is challenging to compare their performance and understand which is the best for the hardware to be implemented. We present a comprehensive analysis of recent HLS tools, as well as overview the areas of active interest in the HLS research community. We also present a first-published methodology to evaluate different HLS tools. We use our methodology to compare one commercial and three academic tools on a common set of C benchmarks, aiming at performing an in-depth evaluation in terms of performance and the use of resources.

433 citations

Proceedings Article
24 Apr 2006
TL;DR: A multithreaded (MT) soft processor for area reduction in SoPC implementations is presented, which can achieve an area savings of about 45% for the processor itself in addition to the area savings due to not replicating CI logic blocks.
Abstract: The growth in size and performance of Field Programmable Gate Arrays (FPGAs) has compelled System-on-a-Programmable-Chip (SoPC) designers to use soft processors for controlling systems with large numbers of intellectual property (IP) blocks. Soft processors control IP blocks, which are accessed by the processor as peripheral devices and/or by using custom instructions (CIs). In large systems, chip multiprocessors (CMPs) are used to execute many programs concurrently. When these programs require the use of the same IP blocks which are accessed as peripheral devices, they may have to stall waiting for their turn. In the case of CIs, the FPGA logic blocks that implement the CIs may have to be replicated for each processor. In both of these cases FPGA area is wasted, either by idle soft processors or the replication of CI logic blocks. This paper presents a multithreaded (MT) soft processor for area reduction in SoPC implementations. An MT processor allows multiple programs to access the same IP without the need for logic replication or the replication of whole processors. We first designed a single-threaded processor that is instruction-set compatible with Altera's Nios II soft processor. Our processor is approximately the same size as the Nios II Economy version, with equivalent performance. We augmented our processor to have 4-way interleaved multithreading capabilities. This paper compares the area usage and performance of the MT processor versus two CMP systems, using Altera's and our single-threaded processors, separately. Our results show that we can achieve an area savings of about 45% for the processor itself, in addition to the area savings due to not replicating CI logic blocks.

80 citations

Proceedings Article
29 Sep 2013
TL;DR: This paper presents an overview of the LegUp design methodology and system architecture, and discusses ongoing work on profiling, hardware/software partitioning, hardware accelerator quality improvements, Pthreads/OpenMP support, visualization tools, and debugging support.
Abstract: Embedded system designers can achieve energy and performance benefits by using dedicated hardware accelerators. However, implementing custom hardware accelerators for an application can be difficult and time intensive. LegUp is an open-source high-level synthesis framework that simplifies the hardware accelerator design process [8]. With LegUp, a designer can start from an embedded application running on a processor and incrementally migrate portions of the program to hardware accelerators implemented on an FPGA. The final application then executes on an automatically-generated software/hardware coprocessor system. This paper presents an overview of the LegUp design methodology and system architecture, and discusses ongoing work on profiling, hardware/software partitioning, hardware accelerator quality improvements, Pthreads/OpenMP support, visualization tools, and debugging support.

56 citations

Proceedings Article
26 Aug 2014
TL;DR: The LegUp framework is overviewed and support for an embedded ARM processor, as is available on Altera's recently released SoC FPGA, HLS support for software parallelization schemes -- pthreads and OpenMP, and a preliminary debugging and verification framework providing C source-level debugging of HLS hardware are described.
Abstract: LegUp [1] is an open-source high-level synthesis (HLS) tool that accepts a C program as input and automatically synthesizes it into a hybrid system. The hybrid system comprises an embedded processor and custom accelerators that realize user-designated compute-intensive parts of the program with improved throughput and energy efficiency. In this paper, we overview the LegUp framework and describe several recent developments: 1) support for an embedded ARM processor, as is available on Altera's recently released SoC FPGA, 2) HLS support for software parallelization schemes -- pthreads and OpenMP, 3) enhancements to LegUp's core HLS algorithms that raise the quality of the auto-generated hardware, and, 4) a preliminary debugging and verification framework providing C source-level debugging of HLS hardware. Since its first release in 2011, LegUp has been downloaded over 1000 times by groups around the world, providing a powerful platform for new research in high-level synthesis algorithms and embedded systems design.

31 citations

Proceedings Article
04 Apr 2005
TL;DR: The UT Nios implementation of Altera's Nios architecture is described and a benchmark set appropriate for soft-core processors is defined.
Abstract: Soft-core processors exploit the flexibility of field programmable gate arrays (FPGAs) to allow a system designer to customize the processor to the needs of a target application. This paper describes the UT Nios implementation of Altera's Nios architecture. A benchmark set appropriate for soft-core processors is defined. Using the benchmark set, the performance of UT Nios is explored and compared with the commercial implementation.

25 citations


Cited by
Journal Article
15 Apr 2015
TL;DR: This work surveys the field of reconfigurable computing, providing a guide to the body-of-knowledge accumulated in architecture, compute models, tools, run-time reconfiguration, and applications.
Abstract: Reconfigurable architectures can bring unique capabilities to computational tasks. They offer the performance and energy efficiency of hardware with the flexibility of software. In some domains, they are the only way to achieve the required, real-time performance without fabricating custom integrated circuits. Their functionality can be upgraded and repaired during their operational lifecycle and specialized to the particular instance of a task. We survey the field of reconfigurable computing, providing a guide to the body-of-knowledge accumulated in architecture, compute models, tools, run-time reconfiguration, and applications.

178 citations

Proceedings Article
19 Oct 2008
TL;DR: A SPARC-based processor with predictable timing and instruction-set extensions that provide precise timing control is described, and the effectiveness of this precision-timed (PRET) architecture is demonstrated through example applications running in simulation.
Abstract: In a hard real-time embedded system, the time at which a result is computed is as important as the result itself. Modern processors go to extreme lengths to ensure their function is predictable, but have abandoned predictable timing in favor of average-case performance. Real-time operating systems provide timing-aware scheduling policies, but without precise worst-case execution time bounds they cannot provide guarantees. We describe an alternative in this paper: a SPARC-based processor with predictable timing and instruction-set extensions that provide precise timing control. Its pipeline executes multiple, independent hardware threads to avoid costly, unpredictable bypassing, and its exposed memory hierarchy provides predictable latency. We demonstrate the effectiveness of this precision-timed (PRET) architecture through example applications running in simulation.

171 citations

Proceedings Article
11 Jun 2018
TL;DR: This work describes a new domain-specific language and compiler called Spatial for higher level descriptions of application accelerators, and summarizes the compiler passes required to support these abstractions, including pipeline scheduling, automatic memory banking, and automated design tuning driven by active machine learning.
Abstract: Industry is increasingly turning to reconfigurable architectures like FPGAs and CGRAs for improved performance and energy efficiency. Unfortunately, adoption of these architectures has been limited by their programming models. HDLs lack abstractions for productivity and are difficult to target from higher level languages. HLS tools are more productive, but offer an ad-hoc mix of software and hardware abstractions which make performance optimizations difficult. In this work, we describe a new domain-specific language and compiler called Spatial for higher level descriptions of application accelerators. We describe Spatial's hardware-centric abstractions for both programmer productivity and design performance, and summarize the compiler passes required to support these abstractions, including pipeline scheduling, automatic memory banking, and automated design tuning driven by active machine learning. We demonstrate the language's ability to target FPGAs and CGRAs from common source code. We show that applications written in Spatial are, on average, 42% shorter and achieve a mean speedup of 2.9x over SDAccel HLS when targeting a Xilinx UltraScale+ VU9P FPGA on an Amazon EC2 F1 instance.

154 citations

Proceedings Article
21 Feb 2010
TL;DR: A new design is introduced that efficiently combines block RAMs into multi-ported memories with arbitrary numbers of read and write ports and true random access to any memory location, while achieving significantly higher operating frequencies than conventional approaches.
Abstract: Multi-ported memories are challenging to implement with FPGAs since the provided block RAMs typically have only two ports. We present a thorough exploration of the design space of FPGA-based soft multi-ported memories by evaluating conventional solutions to this problem, and introduce a new design that efficiently combines block RAMs into multi-ported memories with arbitrary numbers of read and write ports and true random access to any memory location, while achieving significantly higher operating frequencies than conventional approaches. For example, we build a 256-location, 32-bit, 12-ported (4-write, 8-read) memory that operates at 281 MHz on Altera Stratix III FPGAs while consuming an area equivalent to 3679 ALMs: a 43% speed improvement and 84% area reduction over a pure ALM implementation, and a 61% speed improvement over a pure "multipumped" implementation, although the pure multipumped implementation is 7.2x smaller.

132 citations