S. Nalesh

Researcher at Indian Institute of Science

Publications: 21
Citations: 216

S. Nalesh is an academic researcher from the Indian Institute of Science. The author has contributed to research topics including Fast Fourier transform and Hardware acceleration. The author has an h-index of 8 and has co-authored 19 publications receiving 139 citations. Previous affiliations of S. Nalesh include Cochin University of Science and Technology.

Papers
Journal Article

High-Performance CNN Accelerator on FPGA Using Unified Winograd-GEMM Architecture

TL;DR: A unified architecture named UniWiG is proposed, where both Winograd-based convolution and GEMM can be accelerated using the same set of processing elements, which leads to efficient utilization of FPGA hardware resources while computing all layers in the CNN.
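
As background on the arithmetic involved, the sketch below shows the standard Winograd F(2x2, 3x3) output-tile computation in NumPy; the element-wise product of transformed tiles is the stage that a unified Winograd-GEMM architecture maps onto its shared processing elements. The transform matrices are the usual Winograd ones (as in Lavin and Gray), not values taken from the paper.

```python
import numpy as np

# Standard Winograd F(2x2, 3x3) transform matrices; illustrative only,
# not taken from the UniWiG paper itself.
B_T = np.array([[1,  0, -1,  0],
                [0,  1,  1,  0],
                [0, -1,  1,  0],
                [0,  1,  0, -1]], dtype=float)
G = np.array([[1.0,  0.0, 0.0],
              [0.5,  0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0.0,  0.0, 1.0]])
A_T = np.array([[1, 1,  1,  0],
                [0, 1, -1, -1]], dtype=float)

def winograd_tile(d, g):
    """Compute a 2x2 output tile of a 3x3 convolution over a 4x4 input tile."""
    U = G @ g @ G.T          # transformed filter (4x4)
    V = B_T @ d @ B_T.T      # transformed input tile (4x4)
    M = U * V                # element-wise product: the work shared with the GEMM datapath
    return A_T @ M @ A_T.T   # inverse transform -> 2x2 output tile

# Sanity check against direct 'valid' correlation on one tile.
rng = np.random.default_rng(0)
d = rng.standard_normal((4, 4))
g = rng.standard_normal((3, 3))
direct = np.array([[np.sum(d[i:i+3, j:j+3] * g) for j in range(2)] for i in range(2)])
print(np.allclose(winograd_tile(d, g), direct))  # True
```
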
Journal Article

A Hardware Architecture for Radial Basis Function Neural Network Classifier

TL;DR: A flexible and scalable hardware accelerator for RBFNN-based classification is developed that places no limitation on the dimension of the input data; comparison of results shows that the scalability of the hardware architecture makes it a favorable solution for classifying very large data sets.
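
For context, an RBFNN classifier evaluates Gaussian basis functions around a set of centers and passes the activations through a linear output layer. The minimal NumPy sketch below illustrates that computation with illustrative, untrained parameters; it is not the paper's hardware datapath.

```python
import numpy as np

def rbfnn_classify(X, centers, sigma, W):
    """X: (n, d) inputs, centers: (k, d) RBF centers, W: (k, c) output weights.
    Returns the predicted class index for each input row."""
    # Squared Euclidean distance from every input to every RBF center.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)   # (n, k)
    phi = np.exp(-d2 / (2.0 * sigma ** 2))                           # Gaussian hidden-layer activations
    scores = phi @ W                                                 # linear output layer
    return scores.argmax(axis=1)

# Toy usage: 2 classes, 3 centers, 4-dimensional inputs (shapes are illustrative).
rng = np.random.default_rng(1)
centers = rng.standard_normal((3, 4))
W = rng.standard_normal((3, 2))
X = rng.standard_normal((5, 4))
print(rbfnn_classify(X, centers, sigma=1.0, W=W))
```
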
Proceedings Article

UniWiG: Unified Winograd-GEMM Architecture for Accelerating CNN on FPGAs

TL;DR: A unified architecture named UniWiG is proposed, in which both Winograd-based convolution and general matrix multiplication (GEMM) can be accelerated using the same set of processing elements, enabling efficient utilization of FPGA hardware resources when accelerating all the layers in a CNN.
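
The GEMM half of such a unified design typically relies on lowering convolution to a single matrix multiplication (im2col). The sketch below shows that lowering in NumPy as background; the shapes and loop structure are illustrative and do not reproduce the paper's actual mapping onto processing elements.

```python
import numpy as np

def im2col_conv2d(x, w):
    """Single-image, multi-channel 'valid' convolution lowered to one GEMM.
    x: (C, H, W) input, w: (K, C, r, r) filters -> (K, H-r+1, W-r+1) output."""
    C, H, W = x.shape
    K, _, r, _ = w.shape
    Ho, Wo = H - r + 1, W - r + 1
    # Gather every rxr patch (over all channels) into one column.
    cols = np.empty((C * r * r, Ho * Wo))
    for i in range(Ho):
        for j in range(Wo):
            cols[:, i * Wo + j] = x[:, i:i + r, j:j + r].ravel()
    out = w.reshape(K, -1) @ cols           # the GEMM
    return out.reshape(K, Ho, Wo)

# Quick check against a direct nested-loop convolution.
rng = np.random.default_rng(2)
x = rng.standard_normal((3, 6, 6))
w = rng.standard_normal((4, 3, 3, 3))
ref = np.array([[[np.sum(x[:, i:i+3, j:j+3] * w[k]) for j in range(4)]
                 for i in range(4)] for k in range(4)])
print(np.allclose(im2col_conv2d(x, w), ref))  # True
```
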
Proceedings Article

High throughput, low latency, memory optimized 64K point FFT architecture using novel radix-4 butterfly unit

TL;DR: A fully parallel 64K-point radix-4⁴ FFT processor is proposed that achieves a significant reduction in intermediate memory and reduced latency, at the cost of increased hardware complexity, with comparable throughput and area.
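
For readers unfamiliar with the butterfly arithmetic, the sketch below shows a plain radix-4 decimation-in-time recursion and its four-output butterfly in NumPy, verified against np.fft.fft; the paper's fully parallel 64K-point hardware organization of these butterflies is not reproduced here.

```python
import numpy as np

def fft_radix4(x):
    """Recursive radix-4 DIT FFT; len(x) must be a power of 4.
    Illustrates the four-input butterfly that such architectures realize in hardware."""
    x = np.asarray(x, dtype=complex)
    N = len(x)
    if N == 1:
        return x
    # Four interleaved sub-FFTs of length N/4.
    F0, F1, F2, F3 = (fft_radix4(x[i::4]) for i in range(4))
    k = np.arange(N // 4)
    W = np.exp(-2j * np.pi * k / N)            # twiddle factors
    A, B, C, D = F0, W * F1, W**2 * F2, W**3 * F3
    # Radix-4 butterfly: combine four partial results into four output groups.
    return np.concatenate([A + B + C + D,
                           A - 1j * B - C + 1j * D,
                           A - B + C - D,
                           A + 1j * B - C - 1j * D])

x = np.random.default_rng(3).standard_normal(256)
print(np.allclose(fft_radix4(x), np.fft.fft(x)))  # True
```
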
Proceedings Article

Micro-architectural Enhancements in Distributed Memory CGRAs for LU and QR Factorizations

TL;DR: This paper carries out an extensive micro-architectural exploration for accelerating core kernels such as Matrix Multiplication (MM, a BLAS-3 operation) used in LU and QR factorizations, achieving up to 8x speed-up for MM in a CGRA environment.
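
As a software analogue of the BLAS-3 kernel being accelerated, the sketch below shows a blocked matrix multiplication in NumPy; the tile size and partitioning are illustrative and do not reflect the paper's actual CGRA mapping.

```python
import numpy as np

def tiled_matmul(A, B, tile=4):
    """Blocked (BLAS-3 style) matrix multiplication. The tiling loosely mirrors how
    a distributed-memory CGRA partitions work across processing elements with
    local storage; the tile size here is arbitrary."""
    n, k = A.shape
    k2, m = B.shape
    assert k == k2
    C = np.zeros((n, m))
    for i in range(0, n, tile):
        for j in range(0, m, tile):
            for p in range(0, k, tile):
                # Each tile-level multiply-accumulate is the unit of work a
                # processing element would execute.
                C[i:i+tile, j:j+tile] += A[i:i+tile, p:p+tile] @ B[p:p+tile, j:j+tile]
    return C

rng = np.random.default_rng(4)
A, B = rng.standard_normal((8, 12)), rng.standard_normal((12, 6))
print(np.allclose(tiled_matmul(A, B), A @ B))  # True
```
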