Author

Hsuan Hsiao

Bio: Hsuan Hsiao is an academic researcher from the University of Toronto. The author has contributed to research in topics including high-level synthesis and design space exploration. The author has an h-index of 5 and has co-authored 13 publications receiving 376 citations.

Papers
Journal ArticleDOI
TL;DR: This work presents a first-published methodology for evaluating HLS tools and uses it to compare one commercial and three academic tools on a common set of C benchmarks, performing an in-depth evaluation in terms of performance and resource usage.
Abstract: High-level synthesis (HLS) is increasingly popular for the design of high-performance and energy-efficient heterogeneous systems, shortening time-to-market and addressing today’s system complexity. HLS allows designers to work at a higher level of abstraction by using a software program to specify the hardware functionality. Additionally, HLS is particularly interesting for designing field-programmable gate array circuits, where hardware implementations can be easily refined and replaced in the target device. Recent years have seen much activity in the HLS research community, with a plethora of HLS tool offerings, from both industry and academia. All these tools may have different input languages, perform different internal optimizations, and produce results of different quality, even for the very same input description. Hence, it is challenging to compare their performance and understand which is the best for the hardware to be implemented. We present a comprehensive analysis of recent HLS tools, as well as an overview of the areas of active interest in the HLS research community. We also present a first-published methodology to evaluate different HLS tools. We use our methodology to compare one commercial and three academic tools on a common set of C benchmarks, aiming to perform an in-depth evaluation in terms of performance and resource usage.
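As a concrete illustration of the input style described above, the sketch below shows the kind of plain C function an HLS tool can compile into a hardware datapath. It is a minimal, hypothetical example, not taken from the paper's benchmark set; tool-specific pragmas for pipelining, unrolling, or array partitioning are omitted because their syntax differs between tools.

```c
#include <stdint.h>

#define N 64

/* A simple fixed-size dot product: the kind of C function an HLS tool
 * can turn into a hardware datapath. Tool-specific directives for
 * pipelining, loop unrolling, or memory partitioning would normally be
 * added here, but their exact syntax varies between tools, so none are
 * shown. */
int32_t dot_product(const int16_t a[N], const int16_t b[N])
{
    int32_t acc = 0;
    for (int i = 0; i < N; i++) {
        acc += (int32_t)a[i] * (int32_t)b[i];
    }
    return acc;
}
```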

433 citations

Proceedings ArticleDOI
30 May 2020
TL;DR: In this article, an area- and energy-efficient unary general matrix multiplication (GEMM) architecture is proposed, which relaxes previously imposed constraints on input bit streams, such as low correlation and long stream length.
Abstract: General matrix multiplication (GEMM) is universal in various applications, such as signal processing, machine learning, and computer vision. Conventional GEMM hardware architectures based on binary computing exhibit low area and energy efficiency as they scale due to the spatial nature of number representation and computing. Unary computing, on the other hand, can be performed with extremely simple processing units, often just with a single logic gate. But currently there exist no efficient architectures for unary GEMM. In this paper, we present uGEMM, an area- and energy-efficient unary GEMM architecture enabled by novel arithmetic units. The proposed design relaxes previously-imposed constraints on input bit streams---low correlation and long stream length---and achieves superior area and energy efficiency over existing unary systems. Furthermore, uGEMM's output bit streams exhibit higher accuracy and faster convergence, enabling dynamic energy-accuracy scaling on resource-constrained systems.
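To make the unary representation concrete, the following sketch shows how multiplication of two unipolar (rate-coded) bit streams reduces to a single AND gate per cycle. It is a generic illustration, not the uGEMM architecture itself, and the stream length is a hypothetical value; it also exposes the long-stream and low-correlation assumptions that uGEMM is designed to relax.

```c
#include <stdio.h>
#include <stdlib.h>

/* Unipolar unary (rate-coded) representation: a value p in [0,1] is
 * encoded as a bit stream in which the probability of a 1 equals p.
 * With independent input streams, multiplication reduces to a bitwise
 * AND per cycle, i.e. a single logic gate. This naive scheme relies on
 * low correlation and long streams, the constraints uGEMM relaxes. */

#define STREAM_LEN 1024  /* hypothetical stream length for the demo */

static void encode(double p, unsigned char *stream, int len)
{
    for (int i = 0; i < len; i++)
        stream[i] = ((double)rand() / RAND_MAX) < p;  /* Bernoulli(p) */
}

static double decode(const unsigned char *stream, int len)
{
    int ones = 0;
    for (int i = 0; i < len; i++)
        ones += stream[i];
    return (double)ones / len;  /* fraction of 1s recovers the value */
}

int main(void)
{
    unsigned char a[STREAM_LEN], b[STREAM_LEN], prod[STREAM_LEN];
    encode(0.5, a, STREAM_LEN);
    encode(0.6, b, STREAM_LEN);
    for (int i = 0; i < STREAM_LEN; i++)
        prod[i] = a[i] & b[i];            /* one AND gate per cycle */
    printf("0.5 * 0.6 ~= %.3f\n", decode(prod, STREAM_LEN));
    return 0;
}
```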

36 citations

Proceedings ArticleDOI
20 Oct 2018
TL;DR: This paper proposes the EH model, which characterizes an intermittent system's ability to maximize how much of its available energy is spent on useful processor execution, and parametrizes the energy costs associated with intermittent execution to give an intuitive understanding of how forward progress can change.
Abstract: Energy-harvesting devices, which operate solely on energy collected from their environment, have brought forth a new paradigm of intermittent computing. These devices succumb to frequent power outages that would cause conventional systems to be stuck in a perpetual loop of restarting computation and never making progress. Ensuring forward progress in an intermittent execution model requires saving state in nonvolatile memory (backup) and potentially re-executing from the last saved state upon a power loss (restore). The interplay between spending energy on useful processing and spending energy on these necessary overheads yields unexpected trade-offs. To facilitate early design space exploration, the field of intermittent computing requires better models for 1) generalizing and reasoning about these trade-offs and 2) helping architects and programmers make early-stage design decisions. We propose the EH model, which characterizes an intermittent system's ability to maximize how much of its available energy is spent on useful processor execution. The model parametrizes the energy costs associated with intermittent execution to allow an intuitive understanding of how forward progress can change. We use the EH model to explore how forward progress is impacted by the frequency of backups and the energy cost of backups and restores. We validate the EH model with hardware measurements on an MSP430 and characterize its parameters via simulation. We also demonstrate how architects and programmers can use the model to explore the design space of intermittent processors, derive insights, and model new optimizations that are unique to intermittent processor architectures.
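The abstract does not spell out the model's parameters, so the sketch below is only a back-of-the-envelope illustration in the same spirit, not the paper's actual EH model: forward progress is taken as the fraction of each power-on interval's energy that goes to useful execution after paying for a restore, periodic backups, and work lost since the last backup. All names and numbers are hypothetical.

```c
#include <stdio.h>

/* Hypothetical parameters for an intermittent execution interval. */
typedef struct {
    double e_interval;     /* energy harvested per power-on interval (uJ) */
    double e_restore;      /* cost of restoring state after an outage (uJ) */
    double e_backup;       /* cost of one backup (uJ)                      */
    double e_per_op;       /* energy per unit of useful work (uJ)          */
    double ops_per_backup; /* useful work executed between backups         */
} eh_params_t;

static double forward_progress(const eh_params_t *p)
{
    /* Energy left after the mandatory restore on power-up. */
    double budget = p->e_interval - p->e_restore;
    if (budget <= 0.0)
        return 0.0;  /* never recovers enough energy to make progress */

    /* Each backup period costs the useful work plus one backup. */
    double period_cost = p->ops_per_backup * p->e_per_op + p->e_backup;
    double full_periods = budget / period_cost;

    /* Work in a partial final period is lost and re-executed after the
     * next outage, so only completed periods count as progress. */
    double useful = (double)(long)full_periods * p->ops_per_backup * p->e_per_op;
    return useful / p->e_interval;
}

int main(void)
{
    eh_params_t p = { 100.0, 5.0, 2.0, 0.5, 20.0 };
    printf("forward progress fraction: %.2f\n", forward_progress(&p));
    return 0;
}
```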

23 citations

Book ChapterDOI
01 Jan 2016
TL;DR: This section overviews LegUp, its programming model, and unique aspects of the tool versus other HLS offerings, and concludes with a case study.
Abstract: LegUp is a high-level synthesis (HLS) tool that has been under active development at the University of Toronto since 2011. The tool is on its fourth public release, is open source, and is freely downloadable. LegUp has been the subject of over 15 publications and has been downloaded by over 1500 groups from around the world. In this section, we overview LegUp, its programming model, and unique aspects of the tool versus other HLS offerings, and conclude with a case study.

17 citations

Proceedings ArticleDOI
07 Jul 2021
TL;DR: This paper describes CGRA-ME, a software framework that enables the modelling and exploration of coarse-grained reconfigurable array (CGRA) architectures, as well as research on CGRA CAD algorithms.
Abstract: Coarse-grained reconfigurable arrays (CGRAs) are programmable hardware platforms that can be used to realize application-specific accelerators for higher performance and energy efficiency. A CGRA is a 2D array of configurable logic blocks & interconnect, where the logic blocks are typically large & ALU-like, and the interconnect is word-wide. CGRA-ME is a software framework that enables the modelling and exploration of CGRA architectures, as well as research on CGRA CAD algorithms. With CGRA-ME, an architect can specify a CGRA architecture at a high level of abstraction. A set of applications can be mapped onto the architecture to assess the mappability, power, performance and cost. CGRA-ME also allows one to generate synthesizable Verilog RTL for the modelled CGRA, permitting its implementation as an ASIC or FPGA overlay. In this paper, we describe the CGRA-ME framework [5] and overview its capabilities and current limitations. We discuss ongoing and prior research conducted with the framework, as well as outline future plans. We believe CGRA-ME will be a valuable contribution to the community, enabling new research on CGRA CAD & architectures.

10 citations


Cited by
Proceedings ArticleDOI
11 Jun 2018
TL;DR: This work describes a new domain-specific language and compiler called Spatial for higher level descriptions of application accelerators, and summarizes the compiler passes required to support these abstractions, including pipeline scheduling, automatic memory banking, and automated design tuning driven by active machine learning.
Abstract: Industry is increasingly turning to reconfigurable architectures like FPGAs and CGRAs for improved performance and energy efficiency. Unfortunately, adoption of these architectures has been limited by their programming models. HDLs lack abstractions for productivity and are difficult to target from higher level languages. HLS tools are more productive, but offer an ad-hoc mix of software and hardware abstractions which make performance optimizations difficult. In this work, we describe a new domain-specific language and compiler called Spatial for higher level descriptions of application accelerators. We describe Spatial's hardware-centric abstractions for both programmer productivity and design performance, and summarize the compiler passes required to support these abstractions, including pipeline scheduling, automatic memory banking, and automated design tuning driven by active machine learning. We demonstrate the language's ability to target FPGAs and CGRAs from common source code. We show that applications written in Spatial are, on average, 42% shorter and achieve a mean speedup of 2.9x over SDAccel HLS when targeting a Xilinx UltraScale+ VU9P FPGA on an Amazon EC2 F1 instance.

154 citations

Journal ArticleDOI
TL;DR: This survey concludes that HLS is currently a viable option for fast prototyping and for designs with short time to market, and, to help close the QoR gap, it also surveys literature focused on improving HLS.
Abstract: To increase productivity in designing digital hardware components, high-level synthesis (HLS) is seen as the next step in raising the design abstraction level. However, the quality of results (QoR) of HLS tools has tended to be behind that of manual register-transfer level (RTL) flows. In this paper, we survey the scientific literature published since 2010 about the QoR and productivity differences between the HLS and RTL design flows. Altogether, our survey spans 46 papers and 118 associated applications. Our results show that on average, the QoR of the RTL flow is still better than that of state-of-the-art HLS tools. However, the average development time with HLS tools is only a third of that of the RTL flow, and a designer obtains over four times higher productivity with HLS. Based on our findings, we also present a model case study to sum up the best practices in comparative studies between HLS and RTL. The outcome of our case study is also in line with the survey results, as using an HLS tool is seen to increase productivity by a factor of six. In addition, to help close the QoR gap, we present a survey of literature focused on improving HLS. Our results let us conclude that HLS is currently a viable option for fast prototyping and for designs with short time to market.

99 citations

Journal ArticleDOI
TL;DR: In this article, a survey of the state-of-the-art software-defined radio (SDR) platforms in the context of wireless communication protocols is presented, with a focus on programmability, flexibility, portability, and energy efficiency.

91 citations

Journal ArticleDOI
TL;DR: A collection of optimizing transformations for HLS, targeting scalable and efficient architectures for high-performance computing (HPC) applications, is presented, aiming to establish a common toolbox to guide both performance engineers and compiler engineers in tapping into the performance potential offered by spatial computing architectures using HLS.
Abstract: Spatial computing architectures promise a major stride in performance and energy efficiency over the traditional load/store devices currently employed in large scale computing systems. The adoption of high-level synthesis (HLS) from languages such as C++ and OpenCL has greatly increased programmer productivity when designing for such platforms. While this has enabled a wider audience to target spatial computing architectures, the optimization principles known from traditional software design are no longer sufficient to implement high-performance codes, due to fundamentally distinct aspects of hardware design, such as programming for deep pipelines, distributed memory resources, and scalable routing. To alleviate this, we present a collection of optimizing transformations for HLS, targeting scalable and efficient architectures for high-performance computing (HPC) applications. We systematically identify classes of transformations (pipelining, scalability, and memory), the characteristics of their effect on the HLS code and the resulting hardware (e.g., increasing data reuse or resource consumption), and the objectives that each transformation can target (e.g., resolve interface contention, or increase parallelism). We show how these can be used to efficiently exploit pipelining, on-chip distributed fast memory, and on-chip dataflow, allowing for massively parallel architectures. To quantify the effect of various transformations, we cover the optimization process of a sample set of HPC kernels, provided as open source reference codes. We aim to establish a common toolbox to guide both performance engineers and compiler engineers in tapping into the performance potential offered by spatial computing architectures using HLS.
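As one generic example of the pipelining class of transformations discussed above (a sketch under assumed conditions, not code taken from the paper's reference kernels): a floating-point reduction carries a dependency across loop iterations, and interleaving the accumulation over several partial sums removes that dependency so an HLS tool can pipeline the loop.

```c
#include <stddef.h>

/* Naive reduction: the accumulation into acc creates a loop-carried
 * dependency, which prevents pipelining with an initiation interval
 * of 1 when the floating-point adder has multi-cycle latency. */
float sum_naive(const float *x, size_t n)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++)
        acc += x[i];          /* dependency on acc every iteration */
    return acc;
}

/* Transformed version: interleave the accumulation across PARTIAL
 * independent partial sums so consecutive iterations no longer depend
 * on each other, then reduce the partial sums at the end. PARTIAL is
 * a hypothetical constant that would be sized to the adder latency. */
#define PARTIAL 8

float sum_interleaved(const float *x, size_t n)
{
    float partial[PARTIAL] = {0.0f};
    for (size_t i = 0; i < n; i++)
        partial[i % PARTIAL] += x[i];   /* dependency distance is PARTIAL */
    float acc = 0.0f;
    for (int j = 0; j < PARTIAL; j++)
        acc += partial[j];
    return acc;
}
```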

83 citations