Author

Blair Fort

Bio: Blair Fort is an academic researcher from the University of Toronto. The author has contributed to research on the topics of high-level synthesis and debugging. The author has an h-index of 6 and has co-authored 11 publications receiving 542 citations.

Papers

Journal Article•DOI•

A Survey and Evaluation of FPGA High-Level Synthesis Tools

Razvan Nane¹, Vlad-Mihai Sima¹, Christian Pilato², Jongsok Choi³, Blair Fort³, Andrew Canis³, Yu Ting Chen³, Hsuan Hsiao³, Stephen J. Brown³, Fabrizio Ferrandi², Jason H. Anderson³, Koen Bertels¹

Delft University of Technology¹, Polytechnic University of Milan², University of Toronto³

01 Oct 2016, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems

TL;DR: This work uses a first-published methodology to compare one commercial and three academic tools on a common set of C benchmarks, aiming at performing an in-depth evaluation in terms of performance and the use of resources.

Abstract: High-level synthesis (HLS) is increasingly popular for the design of high-performance and energy-efficient heterogeneous systems, shortening time-to-market and addressing today’s system complexity. HLS allows designers to work at a higher-level of abstraction by using a software program to specify the hardware functionality. Additionally, HLS is particularly interesting for designing field-programmable gate array circuits, where hardware implementations can be easily refined and replaced in the target device. Recent years have seen much activity in the HLS research community, with a plethora of HLS tool offerings, from both industry and academia. All these tools may have different input languages, perform different internal optimizations, and produce results of different quality, even for the very same input description. Hence, it is challenging to compare their performance and understand which is the best for the hardware to be implemented. We present a comprehensive analysis of recent HLS tools, as well as overview the areas of active interest in the HLS research community. We also present a first-published methodology to evaluate different HLS tools. We use our methodology to compare one commercial and three academic tools on a common set of C benchmarks, aiming at performing an in-depth evaluation in terms of performance and the use of resources.

433 citations

Proceedings Article•DOI•

A Multithreaded Soft Processor for SoPC Area Reduction

Blair Fort¹, Davor Capalija¹, Zvonko G. Vranesic¹, Stephen J. Brown¹

University of Toronto¹

24 Apr 2006

TL;DR: A multithreaded (MT) soft processor for area reduction in SoPC implementations is presented, which can achieve an area savings of about 45% for the processor itself in addition to the area savings due to not replicating CI logic blocks.

Abstract: The growth in size and performance of Field Programmable Gate Arrays (FPGAs) has compelled System-on-a-Programmable-Chip (SoPC) designers to use soft processors for controlling systems with large numbers of intellectual property (IP) blocks. Soft processors control IP blocks, which are accessed by the processor either as peripheral devices or/and by using custom instructions (CIs). In large systems, chip multiprocessors (CMPs) are used to execute many programs concurrently. When these programs require the use of the same IP blocks which are accessed as peripheral devices, they may have to stall waiting for their turn. In the case of CIs, the FPGA logic blocks that implement the CIs may have to be replicated for each processor. In both of these cases FPGA area is wasted, either by idle soft processors or the replication of CI logic blocks. This paper presents a multithreaded (MT) soft processor for area reduction in SoPC implementations. An MT processor allows multiple programs to access the same IP without the need for the logic replication or the replication of whole processors. We first designed a single-threaded processor that is instruction-set compatible to Altera's Nios II soft processor. Our processor is approximately the same size as the Nios II Economy version, with equivalent performance. We augmented our processor to have 4-way interleaved multithreading capabilities. This paper compares the area usage and performance of the MT processor versus two CMP systems, using Altera's and our single-threaded processors, separately. Our results show that we can achieve an area savings of about 45% for the processor itself, in addition to the area savings due to not replicating CI logic blocks.

80 citations

Proceedings Article•DOI•

From software to accelerators with LegUp high-level synthesis

Andrew Canis¹, Jongsok Choi¹, Blair Fort¹, Ruolong Lian¹, Qijing Huang¹, Nazanin Calagar¹, Marcel Gort¹, Jia Jun Qin¹, Mark Aldham¹, Tomasz Czajkowski¹, Stephen J. Brown¹, Jason H. Anderson¹

University of Toronto¹

29 Sep 2013

TL;DR: This paper presents an overview of the LegUp design methodology and system architecture, and discusses ongoing work on profiling, hardware/software partitioning, hardware accelerator quality improvements, Pthreads/OpenMP support, visualization tools, and debugging support.

Abstract: Embedded system designers can achieve energy and performance benefits by using dedicated hardware accelerators. However, implementing custom hardware accelerators for an application can be difficult and time intensive. LegUp is an open-source high-level synthesis framework that simplifies the hardware accelerator design process [8]. With LegUp, a designer can start from an embedded application running on a processor and incrementally migrate portions of the program to hardware accelerators implemented on an FPGA. The final application then executes on an automatically-generated software/hardware coprocessor system. This paper presents an overview of the LegUp design methodology and system architecture, and discusses ongoing work on profiling, hardware/software partitioning, hardware accelerator quality improvements, Pthreads/OpenMP support, visualization tools, and debugging support.

56 citations

Proceedings Article•DOI•

Automating the Design of Processor/Accelerator Embedded Systems with LegUp High-Level Synthesis

Blair Fort¹, Andrew Canis¹, Jongsok Choi¹, Nazanin Calagar¹, Ruolong Lian¹, Stefan Hadjis¹, Yu Ting Chen¹, Mathew Hall¹, Bain Syrowik¹, Tomasz Czajkowski, Stephen J. Brown¹, Jason H. Anderson¹

University of Toronto¹

26 Aug 2014

TL;DR: The LegUp framework is overviewed and support for an embedded ARM processor, as is available on Altera's recently released SoC FPGA, HLS support for software parallelization schemes -- pthreads and OpenMP, and a preliminary debugging and verification framework providing C source-level debugging of HLS hardware are described.

Abstract: LegUp [1] is an open-source high-level synthesis (HLS) tool that accepts a C program as input and automatically synthesizes it into a hybrid system. The hybrid system comprises an embedded processor and custom accelerators that realize user-designated compute-intensive parts of the program with improved throughput and energy efficiency. In this paper, we overview the LegUp framework and describe several recent developments: 1) support for an embedded ARM processor, as is available on Altera's recently released SoC FPGA, 2) HLS support for software parallelization schemes -- pthreads and OpenMP, 3) enhancements to LegUp's core HLS algorithms that raise the quality of the auto-generated hardware, and, 4) a preliminary debugging and verification framework providing C source-level debugging of HLS hardware. Since its first release in 2011, LegUp has been downloaded over 1000 times by groups around the world, providing a powerful platform for new research in high-level synthesis algorithms and embedded systems design.

31 citations

Proceedings Article•DOI•

Experiences with soft-core processor design

Franjo Plavec¹, Blair Fort¹, Zvonko G. Vranesic¹, Stephen J. Brown¹

University of Toronto¹

04 Apr 2005

TL;DR: The UT Nios implementation of Altera's Nios architecture is described and a benchmark set appropriate for soft-core processors is defined.

Abstract: Soft-core processors exploit the flexibility of field programmable gate arrays (FPGAs) to allow a system designer to customize the processor to the needs of a target application. This paper describes the UT Nios implementation of Altera's Nios architecture. A benchmark set appropriate for soft-core processors is defined. Using the benchmark set, the performance of UT Nios is explored and compared with the commercial implementation.

25 citations

Cited by

Book•

IEEE transactions on computer-aided design of integrated circuits and systems : a publication of the IEEE Circuits and Systems Society

Ieee Circuits

01 Jan 1982

729 citations

Journal Article•DOI•

Reconfigurable Computing Architectures

Russell Tessier¹, Kenneth L. Pocek², André DeHon³

University of Massachusetts Amherst¹, Intel², University of Pennsylvania³

15 Apr 2015

TL;DR: This work surveys the field of reconfigurable computing, providing a guide to the body-of-knowledge accumulated in architecture, compute models, tools, run-time reconfiguration, and applications.

Abstract: Reconfigurable architectures can bring unique capabilities to computational tasks. They offer the performance and energy efficiency of hardware with the flexibility of software. In some domains, they are the only way to achieve the required, real-time performance without fabricating custom integrated circuits. Their functionality can be upgraded and repaired during their operational lifecycle and specialized to the particular instance of a task. We survey the field of reconfigurable computing, providing a guide to the body-of-knowledge accumulated in architecture, compute models, tools, run-time reconfiguration, and applications.

178 citations

Proceedings Article•DOI•

Predictable programming on a precision timed architecture

Ben Lickly¹, Isaac Liu¹, Sungjun Kim², Hiren D. Patel¹, Stephen A. Edwards², Edward A. Lee¹

University of California, Berkeley¹, Columbia University²

19 Oct 2008

TL;DR: A SPARC-based processor with predictable timing and instruction-set extensions that provide precise timing control is described, and the effectiveness of this precision-timed (PRET) architecture is demonstrated through example applications running in simulation.

Abstract: In a hard real-time embedded system, the time at which a result is computed is as important as the result itself. Modern processors go to extreme lengths to ensure their function is predictable, but have abandoned predictable timing in favor of average-case performance. Real-time operating systems provide timing-aware scheduling policies, but without precise worst-case execution time bounds they cannot provide guarantees. We describe an alternative in this paper: a SPARC-based processor with predictable timing and instruction-set extensions that provide precise timing control. Its pipeline executes multiple, independent hardware threads to avoid costly, unpredictable bypassing, and its exposed memory hierarchy provides predictable latency. We demonstrate the effectiveness of this precision-timed (PRET) architecture through example applications running in simulation.

171 citations

Proceedings Article•DOI•

Spatial: a language and compiler for application accelerators

David Koeplinger¹, Matthew Feldman¹, Raghu Prabhakar¹, Yaqi Zhang¹, Stefan Hadjis¹, Ruben Fiszel², Tian Zhao¹, Luigi Nardi¹, Ardavan Pedram¹, Christos Kozyrakis¹, Kunle Olukotun¹

Stanford University¹, École Polytechnique Fédérale de Lausanne²

11 Jun 2018

TL;DR: This work describes a new domain-specific language and compiler called Spatial for higher level descriptions of application accelerators, and summarizes the compiler passes required to support these abstractions, including pipeline scheduling, automatic memory banking, and automated design tuning driven by active machine learning.

Abstract: Industry is increasingly turning to reconfigurable architectures like FPGAs and CGRAs for improved performance and energy efficiency. Unfortunately, adoption of these architectures has been limited by their programming models. HDLs lack abstractions for productivity and are difficult to target from higher level languages. HLS tools are more productive, but offer an ad-hoc mix of software and hardware abstractions which make performance optimizations difficult. In this work, we describe a new domain-specific language and compiler called Spatial for higher level descriptions of application accelerators. We describe Spatial's hardware-centric abstractions for both programmer productivity and design performance, and summarize the compiler passes required to support these abstractions, including pipeline scheduling, automatic memory banking, and automated design tuning driven by active machine learning. We demonstrate the language's ability to target FPGAs and CGRAs from common source code. We show that applications written in Spatial are, on average, 42% shorter and achieve a mean speedup of 2.9x over SDAccel HLS when targeting a Xilinx UltraScale+ VU9P FPGA on an Amazon EC2 F1 instance.

154 citations

Proceedings Article•DOI•

Efficient multi-ported memories for FPGAs

Charles Eric LaForest¹, J. Gregory Steffan¹

University of Toronto¹

21 Feb 2010

TL;DR: A new design is introduced that efficiently combines block RAMs into multi-ported memories with arbitrary numbers of read and write ports and true random access to any memory location, while achieving significantly higher operating frequencies than conventional approaches.

Abstract: Multi-ported memories are challenging to implement with FPGAs since the provided block RAMs typically have only two ports. We present a thorough exploration of the design space of FPGA-based soft multi-ported memories by evaluating conventional solutions to this problem, and introduce a new design that efficiently combines block RAMs into multi-ported memories with arbitrary numbers of read and write ports and true random access to any memory location, while achieving significantly higher operating frequencies than conventional approaches. For example we build a 256-location, 32-bit, 12-ported (4-write, 8-read) memory that operates at 281 MHz on Altera Stratix III FPGAs while consuming an area equivalent to 3679 ALMs: a 43% speed improvement and 84% area reduction over a pure ALM implementation, and a 61% speed improvement over a pure "multipumped" implementation, although the pure multipumped implementation is 7.2x smaller.

132 citations
