S. Nalesh

Researcher at Indian Institute of Science

Publications: 21
Citations: 216

S. Nalesh is an academic researcher from the Indian Institute of Science. The author has contributed to research topics including Fast Fourier transform and Hardware acceleration. The author has an h-index of 8 and has co-authored 19 publications receiving 139 citations. Previous affiliations of S. Nalesh include Cochin University of Science and Technology.

Papers
Journal Article

High-Performance CNN Accelerator on FPGA Using Unified Winograd-GEMM Architecture

TL;DR: A unified architecture named UniWiG is proposed, where both Winograd-based convolution and GEMM can be accelerated using the same set of processing elements, which leads to efficient utilization of FPGA hardware resources while computing all layers in the CNN.
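
As background on the arithmetic involved, the sketch below shows the standard Winograd F(2x2, 3x3) output-tile computation in NumPy; the element-wise product of transformed tiles is the stage that a unified Winograd-GEMM architecture maps onto its shared processing elements. The transform matrices are the usual Winograd ones (as in Lavin and Gray), not values taken from the paper.

```python
import numpy as np

# Standard Winograd F(2x2, 3x3) transform matrices; illustrative only,
# not taken from the UniWiG paper itself.
B_T = np.array([[1,  0, -1,  0],
                [0,  1,  1,  0],
                [0, -1,  1,  0],
                [0,  1,  0, -1]], dtype=float)
G = np.array([[1.0,  0.0, 0.0],
              [0.5,  0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0.0,  0.0, 1.0]])
A_T = np.array([[1, 1,  1,  0],
                [0, 1, -1, -1]], dtype=float)

def winograd_tile(d, g):
    """Compute a 2x2 output tile of a 3x3 convolution over a 4x4 input tile."""
    U = G @ g @ G.T          # transformed filter (4x4)
    V = B_T @ d @ B_T.T      # transformed input tile (4x4)
    M = U * V                # element-wise product: the work shared with the GEMM datapath
    return A_T @ M @ A_T.T   # inverse transform -> 2x2 output tile

# Sanity check against direct 'valid' correlation on one tile.
rng = np.random.default_rng(0)
d = rng.standard_normal((4, 4))
g = rng.standard_normal((3, 3))
direct = np.array([[np.sum(d[i:i+3, j:j+3] * g) for j in range(2)] for i in range(2)])
print(np.allclose(winograd_tile(d, g), direct))  # True
```
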
Journal Article

A Hardware Architecture for Radial Basis Function Neural Network Classifier

TL;DR: A flexible and scalable hardware accelerator for RBFNN-based classification is developed that places no limitation on the dimension of the input data; comparison of results shows that the scalability of the hardware architecture makes it a favorable solution for classifying very large data sets.
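
For context, an RBFNN classifier evaluates Gaussian basis functions around a set of centers and passes the activations through a linear output layer. The minimal NumPy sketch below illustrates that computation with illustrative, untrained parameters; it is not the paper's hardware datapath.

```python
import numpy as np

def rbfnn_classify(X, centers, sigma, W):
    """X: (n, d) inputs, centers: (k, d) RBF centers, W: (k, c) output weights.
    Returns the predicted class index for each input row."""
    # Squared Euclidean distance from every input to every RBF center.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)   # (n, k)
    phi = np.exp(-d2 / (2.0 * sigma ** 2))                           # Gaussian hidden-layer activations
    scores = phi @ W                                                 # linear output layer
    return scores.argmax(axis=1)

# Toy usage: 2 classes, 3 centers, 4-dimensional inputs (shapes are illustrative).
rng = np.random.default_rng(1)
centers = rng.standard_normal((3, 4))
W = rng.standard_normal((3, 2))
X = rng.standard_normal((5, 4))
print(rbfnn_classify(X, centers, sigma=1.0, W=W))
```
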
Proceedings Article

UniWiG: Unified Winograd-GEMM Architecture for Accelerating CNN on FPGAs

TL;DR: A unified architecture named UniWiG is proposed, in which both Winograd-based convolution and general matrix multiplication (GEMM) can be accelerated using the same set of processing elements, enabling efficient utilization of FPGA hardware resources when accelerating all the layers in a CNN.
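
The GEMM half of such a unified design typically relies on lowering convolution to a single matrix multiplication (im2col). The sketch below shows that lowering in NumPy as background; the shapes and loop structure are illustrative and do not reproduce the paper's actual mapping onto processing elements.

```python
import numpy as np

def im2col_conv2d(x, w):
    """Single-image, multi-channel 'valid' convolution lowered to one GEMM.
    x: (C, H, W) input, w: (K, C, r, r) filters -> (K, H-r+1, W-r+1) output."""
    C, H, W = x.shape
    K, _, r, _ = w.shape
    Ho, Wo = H - r + 1, W - r + 1
    # Gather every rxr patch (over all channels) into one column.
    cols = np.empty((C * r * r, Ho * Wo))
    for i in range(Ho):
        for j in range(Wo):
            cols[:, i * Wo + j] = x[:, i:i + r, j:j + r].ravel()
    out = w.reshape(K, -1) @ cols           # the GEMM
    return out.reshape(K, Ho, Wo)

# Quick check against a direct nested-loop convolution.
rng = np.random.default_rng(2)
x = rng.standard_normal((3, 6, 6))
w = rng.standard_normal((4, 3, 3, 3))
ref = np.array([[[np.sum(x[:, i:i+3, j:j+3] * w[k]) for j in range(4)]
                 for i in range(4)] for k in range(4)])
print(np.allclose(im2col_conv2d(x, w), ref))  # True
```
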
Proceedings Article

High throughput, low latency, memory optimized 64K point FFT architecture using novel radix-4 butterfly unit

TL;DR: A fully parallel 64K-point radix-4⁴ FFT processor is proposed that achieves a significant reduction in intermediate memory and reduced latency, at the cost of increased hardware complexity, with comparable throughput and area.
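
For readers unfamiliar with the butterfly arithmetic, the sketch below shows a plain radix-4 decimation-in-time recursion and its four-output butterfly in NumPy, verified against np.fft.fft; the paper's fully parallel 64K-point hardware organization of these butterflies is not reproduced here.

```python
import numpy as np

def fft_radix4(x):
    """Recursive radix-4 DIT FFT; len(x) must be a power of 4.
    Illustrates the four-input butterfly that such architectures realize in hardware."""
    x = np.asarray(x, dtype=complex)
    N = len(x)
    if N == 1:
        return x
    # Four interleaved sub-FFTs of length N/4.
    F0, F1, F2, F3 = (fft_radix4(x[i::4]) for i in range(4))
    k = np.arange(N // 4)
    W = np.exp(-2j * np.pi * k / N)            # twiddle factors
    A, B, C, D = F0, W * F1, W**2 * F2, W**3 * F3
    # Radix-4 butterfly: combine four partial results into four output groups.
    return np.concatenate([A + B + C + D,
                           A - 1j * B - C + 1j * D,
                           A - B + C - D,
                           A + 1j * B - C - 1j * D])

x = np.random.default_rng(3).standard_normal(256)
print(np.allclose(fft_radix4(x), np.fft.fft(x)))  # True
```
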
Proceedings Article

Micro-architectural Enhancements in Distributed Memory CGRAs for LU and QR Factorizations

TL;DR: This paper carries out an extensive micro-architectural exploration for accelerating core kernels such as Matrix Multiplication (MM, a BLAS-3 operation) used in LU and QR factorizations, achieving up to 8x speed-up for MM in a CGRA environment.
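
As a software analogue of the BLAS-3 kernel being accelerated, the sketch below shows a blocked matrix multiplication in NumPy; the tile size and partitioning are illustrative and do not reflect the paper's actual CGRA mapping.

```python
import numpy as np

def tiled_matmul(A, B, tile=4):
    """Blocked (BLAS-3 style) matrix multiplication. The tiling loosely mirrors how
    a distributed-memory CGRA partitions work across processing elements with
    local storage; the tile size here is arbitrary."""
    n, k = A.shape
    k2, m = B.shape
    assert k == k2
    C = np.zeros((n, m))
    for i in range(0, n, tile):
        for j in range(0, m, tile):
            for p in range(0, k, tile):
                # Each tile-level multiply-accumulate is the unit of work a
                # processing element would execute.
                C[i:i+tile, j:j+tile] += A[i:i+tile, p:p+tile] @ B[p:p+tile, j:j+tile]
    return C

rng = np.random.default_rng(4)
A, B = rng.standard_normal((8, 12)), rng.standard_normal((12, 6))
print(np.allclose(tiled_matmul(A, B), A @ B))  # True
```
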