K Swetha Bharati
Bio: K Swetha Bharati is an academic researcher from the Indian Institute of Technology Madras. The author has contributed to research on software pipelining and assembly language. The author has an h-index of 1 and has co-authored 1 publication receiving 4 citations.
03 May 2015
TL;DR: This paper implements OCR and speech recognition on a DSP and shows how they can be optimized using fixed point routines. It also illustrates how DSP resources such as MAC units, shifters and software pipelining can be exploited through assembly code structuring to reduce the MIPS consumed by the processor.
Abstract: In this paper, we discuss efficient implementation of machine learning algorithms on DSPs. Specifically, we implement OCR and speech recognition on DSP and show how they can be optimized using fixed point routines. We illustrate the optimal usage of DSP resources like MAC units, shifters and software pipelining through assembly code structuring which massively reduces the MIPS consumed by the processor. We also describe how floating point overheads can be reduced by equivalent fixed point routines for real time implementations. Though the Blackfin-533 DSP is chosen for this illustration, the ideas presented here apply to other fixed point DSPs as well.
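The fixed point idea in this abstract can be illustrated with a minimal sketch of a Q15 multiply-accumulate, written in Python for readability. It is not the paper's Blackfin assembly: the Q15 format, the helper names (`to_q15`, `q15_dot`, `from_q15`) and the rounding choice are assumptions made for illustration. The sketch mimics how a MAC unit accumulates wide products and applies a single shift at the end.

```python
# Illustrative Q15 fixed point dot product (assumed format, not from the paper).
Q15_SHIFT = 15
Q15_MAX = (1 << 15) - 1   # 32767
Q15_MIN = -(1 << 15)      # -32768


def saturate(v: int) -> int:
    """Clamp an integer to the signed 16-bit range."""
    return max(Q15_MIN, min(Q15_MAX, v))


def to_q15(x: float) -> int:
    """Quantize a float in [-1, 1) to Q15 with saturation."""
    return saturate(int(round(x * (1 << Q15_SHIFT))))


def from_q15(v: int) -> float:
    """Convert a Q15 integer back to a float."""
    return v / (1 << Q15_SHIFT)


def q15_dot(a, b) -> int:
    """Dot product of two Q15 vectors.

    Each product is Q30; products are summed in a wide accumulator
    (as a hardware MAC unit would) and shifted back to Q15 once,
    with rounding, instead of after every multiply.
    """
    acc = 0
    for x, y in zip(a, b):
        acc += x * y
    acc = (acc + (1 << (Q15_SHIFT - 1))) >> Q15_SHIFT
    return saturate(acc)


a = [to_q15(v) for v in (0.5, -0.25, 0.125)]
b = [to_q15(v) for v in (0.5, 0.5, 0.5)]
print(from_q15(q15_dot(a, b)))  # 0.1875
```

Deferring the shift to the end of the loop keeps intermediate precision and replaces per-element floating point operations with integer multiplies and adds, which is the kind of saving the abstract attributes to fixed point routines.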
01 Jan 2003
TL;DR: Comprehensive in scope, and gentle in approach, this book will help you achieve a thorough grasp of the basics and move gradually to more sophisticated DSP concepts and applications.
Abstract: From the Publisher: This is undoubtedly the most accessible book on digital signal processing (DSP) available to the beginner. Using intuitive explanations and well-chosen examples, this book gives you the tools to develop a fundamental understanding of DSP theory. The author covers the essential mathematics by explaining the meaning and significance of the key DSP equations. Comprehensive in scope, and gentle in approach, the book will help you achieve a thorough grasp of the basics and move gradually to more sophisticated DSP concepts and applications.
01 Jan 2005
01 Apr 2018
TL;DR: In this paper, Gaussian distribution functions are designed using C28x real-time digital signal processor (DSP) that is embedded in the TMS320C2000 modem designed for powerline communication (PLC) at the low voltage distribution end of the smart grid, where numerous devices that generate massive amount of data exist.
Abstract: The smart grid (SG) is a large-scale network and an integral part of the Internet of Things (IoT). For more effective big data analytics in large-scale IoT networks, reliable solutions are being designed such that many real-time decisions will be taken at the edge of the network, close to where data is being generated. Gaussian functions are extensively applied in the fields of statistical machine learning, pattern recognition, adaptive algorithms for function approximation, etc. It is envisaged that soon, some of these machine learning solutions and other Gaussian function based applications that have low computation and low-memory footprint will be deployed for edge analytics in large-scale IoT networks. Hence, it will be of immense benefit if an adaptive, low-cost method of designing Gaussian functions becomes available. In this paper, Gaussian distribution functions are designed using the C28x real-time digital signal processor (DSP) that is embedded in the TMS320C2000 modem designed for powerline communication (PLC) at the low voltage distribution end of the smart grid, where numerous devices that generate massive amounts of data exist. Open-source embedded C programming language is used to program the C28x for real-time Gaussian function generation. The designed Gaussian waveforms are stored in lookup tables (LUTs) in the C28x embedded DSP, and could be deployed for a variety of applications at the edge of the SG and IoT network. The novelty of the design is that the Gaussian functions are designed with a generic, low-cost, fixed-point DSP, unlike the state of the art, in which Gaussian functions are designed using expensive arbitrary waveform generators and other specialized circuits. The C28x DSP is selected for this design since it already exists as an embedded DSP in many smart grid applications and in numerous other industrial systems that are part of the large-scale IoT network; hence it is envisaged that integration of any Gaussian function based solution using this DSP in the smart grid and other IoT systems may not be too challenging.
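The lookup-table approach described in this abstract can be sketched as follows. The paper uses embedded C on the C28x; this Python version only illustrates the idea, and the table size, span, scaling, and function names (`build_gaussian_lut`, `gaussian_lookup`) are assumptions rather than values from the paper. The table stores an unnormalized Gaussian with peak value 1, quantized to integers.

```python
import math


def build_gaussian_lut(n=64, sigma=1.0, span=4.0, frac_bits=15):
    """Sample exp(-x^2 / (2*sigma^2)) on [0, span*sigma] and quantize.

    Only the non-negative half is stored because the curve is symmetric.
    Returns the integer table and the spacing between samples.
    """
    scale = (1 << frac_bits) - 1
    step = span * sigma / (n - 1)
    lut = [
        int(round(scale * math.exp(-((i * step) ** 2) / (2.0 * sigma ** 2))))
        for i in range(n)
    ]
    return lut, step


def gaussian_lookup(lut, step, x, frac_bits=15):
    """Nearest-entry lookup; inputs beyond the table range return 0."""
    idx = int(round(abs(x) / step))
    if idx >= len(lut):
        return 0.0
    return lut[idx] / ((1 << frac_bits) - 1)


lut, step = build_gaussian_lut()
print(gaussian_lookup(lut, step, 1.0), math.exp(-0.5))
```

A nearest-entry lookup trades accuracy for speed; a larger table or linear interpolation between neighbouring entries would reduce the error at the cost of memory or a few extra operations.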
11 Jun 2022
TL;DR: This paper proposes SpEaC, a coarse-grained reconfigurable spatial architecture, as an energy-efficient programmable processor for earable applications. Across all kernels, SpEaC outperforms programmable cores modeled after M4, M7, A53, and HiFi4 DSP by 99.3×, 32.5×, 14.8×, and 9.8× respectively, and is 15.7× to 1087× more energy efficient than a low power Mali T628 MP6 GPU.
Abstract: Earables such as earphones [15, 16, 73], hearing aids, and smart glasses [2, 14] are poised to be a prominent programmable computing platform in the future. In this paper, we ask the question: what kind of programmable hardware would be needed to support earable computing in the future? To understand hardware requirements, we propose EarBench, a suite of representative emerging earable applications with diverse sensor-based inputs and computation requirements. Our analysis of EarBench applications shows that, on average, there is a 13.54×-3.97× performance gap between the computational needs of EarBench applications and the performance of the microprocessors that several of today's programmable earable SoCs are based on; more complex microprocessors have unacceptable energy efficiency for earable applications. Our analysis also shows that EarBench applications are dominated by a small number of digital signal processing (DSP) and machine learning (ML)-based kernels that have significant computational similarity. We propose SpEaC, a coarse-grained reconfigurable spatial architecture, as an energy-efficient programmable processor for earable applications. SpEaC targets earable applications efficiently using a) a reconfigurable fixed-point multiply-and-add augmented reduction tree-based substrate with support for vectorized complex operations that is optimized for the earable ML and DSP kernel code and b) a tightly coupled control core for executing other code (including non-matrix computation, or non-multiply or add operations in the earable DSP kernel code). Unlike other CGRAs that typically target general-purpose computations, the SpEaC substrate is optimized for energy-efficient execution of the earable kernels at the expense of generality. Across all our kernels, SpEaC outperforms programmable cores modeled after M4, M7, A53, and HiFi4 DSP by 99.3×, 32.5×, 14.8×, and 9.8× respectively. At 63 mW in 28 nm, the energy efficiency benefits are 1.55×, 9.04×, 68.3×, and 32.7× respectively; energy efficiency benefits are 15.7× to 1087× over a low power Mali T628 MP6 GPU.
TL;DR: An open source high level synthesis fixed-to-floating and floating-to-fixed conversion tool is presented for embedded design, communication systems, and signal processing applications.
Abstract: An open source high level synthesis fixed-to-floating and floating-to-fixed conversion tool is presented for embedded design, communication systems, and signal processing applications. Many systems use a fixed point number system. Fixed point numbers often need to be converted to floating point numbers for higher accuracy, dynamic range, fixed-length transmission limitations or end user requirements. A similar conversion system is needed to convert floating point numbers to fixed point numbers due to the advantages that fixed point numbers offer when compared with floating point number systems, such as compact hardware, reduced verification time and design effort. The latest embedded and SoC designs use both number systems together to improve accuracy or reduce required hardware in the same design. The proposed open source design and verification tool converts fixed point numbers to floating point numbers, and floating point numbers to fixed point numbers using the IEEE-754 floating point number standard. This open source design tool generates HDL code and its test bench that can be implemented in FPGA and VLSI systems. The design can be compiled and simulated using open source Iverilog/GTKWave and verified using Octave. A high level synthesis tool and GUI are designed using C#. The proposed design tool can increase productivity by reducing the design and verification time, as well as reduce the development cost due to the open source nature of the design tool. The proposed design tool can be used as a standalone block generator or implemented into current designs to improve range, accuracy, and reduce the development cost. The generated design has been implemented on Xilinx FPGAs.
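The two conversions named in this abstract can be illustrated with a small Python sketch of the arithmetic only. The tool itself generates HDL, which is not reproduced here; the Q8.8 format (16 bits total, 8 fractional bits) and the function names (`float_to_fixed`, `fixed_to_float`, `float32_fields`) are assumptions chosen for illustration. The last helper exposes the IEEE-754 single-precision fields that a hardware converter would have to produce or consume.

```python
import struct


def float_to_fixed(x: float, frac_bits: int = 8, total_bits: int = 16) -> int:
    """Scale by 2**frac_bits, round, and saturate to a signed integer."""
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    return max(lo, min(hi, int(round(x * (1 << frac_bits)))))


def fixed_to_float(v: int, frac_bits: int = 8) -> float:
    """Interpret a signed integer as a fixed point value."""
    return v / (1 << frac_bits)


def float32_fields(x: float):
    """Return (sign, biased exponent, mantissa) of x as IEEE-754 binary32."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF


q = float_to_fixed(3.25)
print(q, fixed_to_float(q), float32_fields(3.25))
# 832 3.25 (0, 128, 5242880)
```

Values outside the representable fixed point range saturate rather than wrap, which is a common design choice in signal processing but an assumption here, not a documented behaviour of the tool.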