scispace - formally typeset
Author

Tomohiro Ueno

Other affiliations: Utsunomiya University
Bio: Tomohiro Ueno is an academic researcher from Tohoku University. The author has contributed to research in the topics of Scalability and Data stream mining. The author has an h-index of 6 and has co-authored 16 publications receiving 77 citations. Previous affiliations of Tomohiro Ueno include Utsunomiya University.

Papers
Proceedings ArticleDOI
18 May 2020
TL;DR: A Communication Integrated Reconfigurable CompUting System (CIRCUS) is proposed to enable the use of high-speed FPGA interconnects from OpenCL; it forms a fused single pipeline combining computation and communication, hiding communication latency by completely overlapping the two.
Abstract: In recent years, many High Performance Computing (HPC) researchers have been attracted to utilizing Field Programmable Gate Arrays (FPGAs) for HPC applications. Thanks to FPGAs' I/O capabilities, we can use them for communication as well as computation. HPC scientists have been unable to utilize FPGAs for their applications because of the difficulty of FPGA development; however, High Level Synthesis (HLS) allows them to do so at an acceptable cost. In this study, we propose a Communication Integrated Reconfigurable CompUting System (CIRCUS) that enables us to utilize the high-speed interconnection of FPGAs from OpenCL. CIRCUS forms a fused single pipeline combining computation and communication, which hides the communication latency by completely overlapping the two. In this paper, we present the details of the implementation and evaluation results using two benchmarks: a pingpong benchmark and an allreduce benchmark.
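The latency-hiding effect of such a fused pipeline can be illustrated with a toy timing model. This is only a sketch of the general overlap principle, not the CIRCUS implementation; all timing values are hypothetical:

```python
def sequential_time(t_comp, t_comm, n_stages):
    # Without overlap, every stage pays computation plus communication.
    return n_stages * (t_comp + t_comm)

def fused_pipeline_time(t_comp, t_comm, n_stages):
    # In a fused pipeline, the communication of one stage overlaps the
    # computation of the next; the steady-state cost per stage is the
    # slower of the two, so the cheaper one is completely hidden.
    return t_comp + t_comm + (n_stages - 1) * max(t_comp, t_comm)

print(sequential_time(10.0, 4.0, 100))      # 1400.0
print(fused_pipeline_time(10.0, 4.0, 100))  # 1004.0
```

With communication cheaper than computation, the overlapped version approaches pure computation time, which is the behavior the paper exploits.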

20 citations

Journal ArticleDOI
TL;DR: A hardware-based bandwidth compression technique that can be applied to field-programmable gate array (FPGA)-based high-performance computation with a logically wider effective memory bandwidth, and a multichannel serializer and deserializer that enable applications to use multiple channels of computational data with the bandwidth compression.
Abstract: Although computational performance is often limited by insufficient bandwidth to/from an external memory, it is not easy to physically increase off-chip memory bandwidth. In this study, we propose a hardware-based bandwidth compression technique that can be applied to field-programmable gate array (FPGA)-based high-performance computation with a logically wider effective memory bandwidth. Our proposed hardware approach can boost the performance of FPGA-based stream computations by applying a data compression technique to effectively transfer more data streams. To apply this data compression technique to bandwidth compression via hardware, several requirements must first be satisfied, including an acceptable level of compression performance and a sufficiently small hardware footprint. Our proposed hardware-based bandwidth compressor utilizes an efficient prediction-based data compression algorithm. Moreover, we propose a multichannel serializer and deserializer that enable applications to use multiple channels of computational data with the bandwidth compression. The serializer encodes compressed data blocks of multiple channels into a data stream, which is efficiently written to an external memory. Based on a preliminary evaluation, we define an encoding format considering both high compression ratio and small hardware area. As a result, we demonstrate that our area-saving bandwidth compressor increases the performance of an FPGA-based fluid dynamics simulation by deploying more processing elements to exploit spatial parallelism with the enhanced memory bandwidth.
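The general idea of prediction-based compression can be sketched in a few lines. This is an illustrative previous-value predictor, not the paper's actual algorithm or encoding format:

```python
def compress(stream):
    """Predict each sample as the previous one and emit residuals.
    Smooth streams produce many zero residuals, which a short code
    (e.g. a single flag bit) can exploit to shrink the data."""
    prediction = 0
    residuals = []
    for value in stream:
        residuals.append(value - prediction)  # residual vs. prediction
        prediction = value                    # predictor: previous sample
    return residuals

def decompress(residuals):
    """Rebuild the stream by replaying the same predictor."""
    prediction, out = 0, []
    for r in residuals:
        value = prediction + r
        out.append(value)
        prediction = value
    return out

data = [5, 5, 5, 6, 7, 7, 7, 7]
res = compress(data)
assert decompress(res) == data
print(res)  # [5, 0, 0, 1, 1, 0, 0, 0]
```

The hardware version must additionally bound the compressor's area and keep up with the stream rate, which is why the paper trades compression ratio against footprint in its encoding format.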

16 citations

Journal ArticleDOI
TL;DR: The detailed design of a custom computing machine for fully-streamed LBM computation on multiple FPGAs is presented, and its efficiency is evaluated with prototype implementation.
Abstract: This paper presents the detailed design of a custom computing machine for fully-streamed LBM computation on multiple FPGAs, and evaluates its efficiency with a prototype implementation. We design a unit for completely streamed computation, including boundary treatment with a newly introduced cell attribute. Experimental results demonstrate that the proposed machine achieves high utilization of PEs, 99% of the peak performance, for one and two FPGAs computing a large lattice. This is due to our fully-streamed design, which allows all arithmetic units to be efficiently utilized with a constant memory bandwidth, and to the architecture exploiting a low-latency accelerator domain network (ADN) of a tightly-coupled FPGA cluster for scalable computation.
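The role of a cell attribute in streamed boundary treatment can be sketched as follows. The update rules here are stand-ins, not real LBM collide/stream operators; the point is that every cell flows through the same datapath and the attribute merely selects the result, as a mux would in hardware:

```python
FLUID, WALL = 0, 1  # hypothetical attribute encoding

def update(cell):
    """One streamed update step for a (attribute, value) cell.
    Both candidate results are computed; the attribute selects one,
    so the pipeline never branches and never stalls."""
    attr, value = cell
    streamed = value + 1.0   # stand-in for the normal collide/stream result
    bounced = -value         # stand-in for bounce-back at a wall boundary
    return (attr, bounced if attr == WALL else streamed)

lattice = [(FLUID, 1.0), (WALL, 2.0), (FLUID, 3.0)]
print([update(c) for c in lattice])  # [(0, 2.0), (1, -2.0), (0, 4.0)]
```

Carrying the boundary information with the cell itself is what lets the machine keep all arithmetic units busy at a constant stream rate, rather than treating boundary cells out-of-band.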

13 citations

Journal ArticleDOI
TL;DR: This paper presents a scalable architecture of a deeply pipelined stream computing platform, where available parallelism and inter-FPGA link characteristics are investigated to achieve a scaled performance.
Abstract: Since the hardware resources of a single FPGA are limited, one way to scale the performance of FPGA-based HPC applications is to expand the design space with multiple FPGAs. This paper presents a scalable architecture of a deeply pipelined stream computing platform, where available parallelism and inter-FPGA link characteristics are investigated to achieve scaled performance. For a practical exploration of this vast design space, a performance model is presented and verified with the evaluation of a tsunami simulation application implemented on Intel Arria 10 FPGAs. Finally, scalability analysis is performed, where speedup is achieved when increasing the computing pipeline over multiple FPGAs while maintaining the problem size of the computation. Performance is scaled with multiple FPGAs; however, performance degradation occurs with insufficient available bandwidth and large pipeline overhead brought by inadequate data stream size. Tsunami simulation results show that the highest scaled performance for 8 cascaded Arria 10 FPGAs is achieved with a single pipeline of 5 stream processing elements (SPEs), which obtained a scaled performance of 2.5 TFlops and a parallel efficiency of 98%, indicating the strong scalability of the multi-FPGA stream computing platform.
Keywords: tsunami simulation, stream computing, scalability, multiple FPGAs, high-performance computing
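The notion of parallel efficiency under strong scaling can be illustrated with a toy model. Every parameter below (per-SPE performance, fill overhead, lattice size) is a hypothetical stand-in, not a figure from the paper; the model only shows why efficiency stays high when pipeline overhead is small relative to a fixed problem size:

```python
def scaled_performance(n_fpgas, spes_per_fpga, perf_per_spe, fill_overhead, cells):
    """Toy strong-scaling model: cascading more SPEs multiplies the work
    done per streamed pass, but the pipeline-fill cost grows with the
    pipeline depth while the problem size (cells) stays fixed."""
    total_spes = n_fpgas * spes_per_fpga
    overhead = fill_overhead * total_spes  # fill cost grows with depth
    return total_spes * perf_per_spe * cells / (cells + overhead)

p1 = scaled_performance(1, 5, 62.5, 1.0, 10_000)  # single-FPGA baseline
p8 = scaled_performance(8, 5, 62.5, 1.0, 10_000)  # 8 cascaded FPGAs
print(p8)                # aggregate performance, arbitrary units
print(p8 / (8 * p1))     # parallel efficiency: close to 1.0
```

Shrinking `cells` (an inadequate data stream size) inflates the relative overhead and degrades efficiency, matching the trend the abstract describes.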

11 citations

Book ChapterDOI
01 Apr 2020
TL;DR: This paper introduces a scalable platform of indirectly-connected FPGAs, where its Ethernet-switching network allows flexibly customized inter-FPGA connectivity and demonstrates good performance and scalability for large HPC applications.
Abstract: As field programmable gate arrays (FPGAs) become a favorable choice for exploring new computing architectures in the post-Moore era, a flexible network architecture for scalable FPGA clusters becomes increasingly important in high performance computing (HPC). In this paper, we introduce a scalable platform of indirectly-connected FPGAs, whose Ethernet-switching network allows flexibly customized inter-FPGA connectivity. However, for certain applications such as stream computing, it is necessary to establish a connection-oriented datapath with backpressure between FPGAs. Due to the lack of a physical backpressure channel in the network, we utilized our existing credit-based network protocol with flow control to provide receiver-FPGA awareness, and tailored it to minimize the overall communication overhead of the proposed framework. To characterize its performance, we implemented the necessary data transfer hardware on Intel Arria 10 FPGAs, modeled and measured its communication performance, and compared it to a direct network. Results show that our proposed indirect framework achieves approximately 3% higher effective network bandwidth than our existing direct inter-FPGA network, demonstrating good performance and scalability for large HPC applications.
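The credit-based flow control mentioned above can be sketched in a few lines. This is a generic illustration of the credit mechanism, not the authors' protocol; buffer sizes and the packet model are hypothetical:

```python
from collections import deque

class CreditLink:
    """Credit-based flow control: the sender starts with one credit per
    receiver buffer slot, consumes a credit per packet, and stalls at
    zero. The receiver returns a credit when it drains a slot, so the
    buffer can never overflow despite no physical backpressure wire."""

    def __init__(self, buffer_slots):
        self.credits = buffer_slots  # initial credits = receiver buffer size
        self.buffer = deque()        # stands in for the switched network path

    def send(self, packet):
        if self.credits == 0:
            return False             # backpressure: sender must stall
        self.credits -= 1
        self.buffer.append(packet)
        return True

    def receive(self):
        packet = self.buffer.popleft()
        self.credits += 1            # credit flows back to the sender
        return packet

link = CreditLink(buffer_slots=2)
assert link.send("a") and link.send("b")
assert not link.send("c")            # out of credits: sender stalls
assert link.receive() == "a"         # draining a slot returns a credit
assert link.send("c")                # sender may proceed again
```

In the real system the credit return is itself a network message, which is why the paper tailors the protocol to keep that feedback traffic from eating into the communication bandwidth.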

11 citations


Cited by
Journal ArticleDOI
TL;DR: A deep survey of state-of-the-art research and implementations of HPC algorithms is performed; features relevant to each family are extracted and listed as key factors for obtaining higher performance.
Abstract: High performance computing (HPC) systems currently integrate several resources such as multi-cores (CPUs), graphic processing units (GPUs) and reconfigurable logic devices, like field programmable gate arrays (FPGAs). The role of the latter two has traditionally been confined to acting as secondary accelerators rather than as main execution units. We perform a deep survey of state-of-the-art research and implementations of HPC algorithms; we extract features relevant to each family and list them as key factors for obtaining higher performance. Due to the broad spectrum of the survey, we include only the most complete references found. We provide a general classification of the 13 HPC families with respect to their needs and suitability for hardware implementation. In addition, we present an analysis based on current and future technology availability as well as on particular aspects identified in the survey. Finally, we list general guidelines and opportunities to be accounted for in future heterogeneous designs that employ FPGAs for HPC.

41 citations

Journal ArticleDOI
TL;DR: This paper presents an architecture and design for scalable fluid simulation based on data-flow computing with a state-of-the-art FPGA and introduces spatial and temporal parallelism to further scale the performance by adding more stream processing elements (SPEs) in an array.
Abstract: High-performance and low-power computation is required for large-scale fluid dynamics simulation. Due to their inefficient architectures and structures for this workload, CPUs and GPUs have difficulty improving power efficiency for the target application. Although FPGAs have become promising alternatives for power-efficient and high-performance computation thanks to new architectures with floating-point (FP) DSP blocks, their relatively narrow memory bandwidth requires an appropriate approach to fully exploit this advantage. This paper presents an architecture and design for scalable fluid simulation based on data-flow computing with a state-of-the-art FPGA. To exploit the available hardware resources including FP DSPs, we introduce spatial and temporal parallelism to further scale the performance by adding more stream processing elements (SPEs) in an array. Performance modeling and prototype implementation allow us to explore the design space for both the existing Altera Arria 10 and the upcoming Intel Stratix 10 FPGAs. We demonstrate that the Arria 10 10AX115 FPGA achieves 519 GFlops at 9.67 GFlops/W with a stream bandwidth of only 9.0 GB/s, which is 97.9 percent of the peak performance of the 18 implemented SPEs. We also estimate that the Stratix 10 FPGA can scale up to 6844 GFlops by combining spatial and temporal parallelism adequately.
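A quick back-of-envelope check, using only the figures quoted in the abstract, recovers the implied board power and per-SPE peak performance:

```python
# Figures reported in the abstract.
gflops = 519.0            # achieved performance
gflops_per_watt = 9.67    # achieved power efficiency
fraction_of_peak = 0.979  # 97.9 percent of peak
n_spes = 18               # implemented stream processing elements

power_watts = gflops / gflops_per_watt     # implied power draw
peak_gflops = gflops / fraction_of_peak    # implied peak of 18 SPEs
peak_per_spe = peak_gflops / n_spes        # implied per-SPE peak

print(round(power_watts, 1))   # 53.7 (W)
print(round(peak_gflops, 1))   # 530.1 (GFlops)
print(round(peak_per_spe, 1))  # 29.5 (GFlops per SPE)
```

These derived numbers are arithmetic consequences of the abstract's figures, not values stated in the paper itself.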

30 citations

Journal ArticleDOI
TL;DR: A hardware-based bandwidth compression technique that can be applied to field-programmable gate array (FPGA)-based high-performance computation with a logically wider effective memory bandwidth, and a multichannel serializer and deserializer that enable applications to use multiple channels of computational data with the bandwidth compression.
Abstract: Although computational performance is often limited by insufficient bandwidth to/from an external memory, it is not easy to physically increase off-chip memory bandwidth. In this study, we propose a hardware-based bandwidth compression technique that can be applied to field-programmable gate array (FPGA)-based high-performance computation with a logically wider effective memory bandwidth. Our proposed hardware approach can boost the performance of FPGA-based stream computations by applying a data compression technique to effectively transfer more data streams. To apply this data compression technique to bandwidth compression via hardware, several requirements must first be satisfied, including an acceptable level of compression performance and a sufficiently small hardware footprint. Our proposed hardware-based bandwidth compressor utilizes an efficient prediction-based data compression algorithm. Moreover, we propose a multichannel serializer and deserializer that enable applications to use multiple channels of computational data with the bandwidth compression. The serializer encodes compressed data blocks of multiple channels into a data stream, which is efficiently written to an external memory. Based on a preliminary evaluation, we define an encoding format considering both high compression ratio and small hardware area. As a result, we demonstrate that our area-saving bandwidth compressor increases the performance of an FPGA-based fluid dynamics simulation by deploying more processing elements to exploit spatial parallelism with the enhanced memory bandwidth.

16 citations

Journal ArticleDOI
TL;DR: In this article, the wave equation is solved by using the method of separation of variables based on the eigenvalue technique, and the resonance frequencies as well as the E-field distributions in two exemplary small resonators are presented for a variety of modes.
Abstract: Whispering-gallery (WG) modes in photonic microdevices made of dielectric circularly planar resonators are analyzed. The wave equation is solved using the method of separation of variables based on the eigenvalue technique. The resonant frequency at an azimuthal mode is determined iteratively using the bisection method, based on the continuity conditions at the resonator's peripheral boundary. The radial mode is determined from the critical points of the field intensity profile in the radial direction via the first derivative test. The resonance frequencies as well as the E-field distributions in two exemplary small resonators are presented for a variety of modes. A comparison with numerical predictions is conducted, and good agreement is found. The geometric optics method is found to be inappropriate for small resonators.
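The bisection method used to locate the resonant frequency can be sketched generically. The characteristic function below is a toy stand-in with a known root, not the resonator's actual continuity condition:

```python
import math

def bisect(f, lo, hi, tol=1e-12):
    """Bisection root finder: repeatedly halve a bracketing interval
    [lo, hi] with f(lo) and f(hi) of opposite sign, keeping the half
    that still brackets the sign change."""
    flo = f(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if flo * f(mid) <= 0:
            hi = mid                 # root lies in the lower half
        else:
            lo, flo = mid, f(mid)    # root lies in the upper half
    return 0.5 * (lo + hi)

# Toy stand-in for the characteristic (continuity) equation, with a
# known root at pi/2:
root = bisect(math.cos, 1.0, 2.0)
print(round(root, 6))  # 1.570796
```

Bisection is slow but unconditionally convergent on a bracketed sign change, which suits a characteristic equation whose derivative is awkward to evaluate.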

14 citations