Proceedings Article

Implementation of machine learning applications on a fixed-point DSP

03 May 2015, pp. 1458-1463
TL;DR: This paper implements OCR and speech recognition on a DSP and shows how fixed-point routines can optimize them. It illustrates how structuring assembly code to make full use of DSP resources such as MAC units, shifters, and software pipelining massively reduces the MIPS consumed by the processor.
Abstract: In this paper, we discuss the efficient implementation of machine learning algorithms on DSPs. Specifically, we implement OCR and speech recognition on a DSP and show how they can be optimized using fixed-point routines. We illustrate how assembly code can be structured to make optimal use of DSP resources such as MAC units, shifters, and software pipelining, which massively reduces the MIPS consumed by the processor. We also describe how floating-point overheads can be reduced by substituting equivalent fixed-point routines in real-time implementations. Although the Blackfin-533 DSP is chosen for this illustration, the ideas presented here apply to other fixed-point DSPs as well.
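The abstract's central idea, replacing floating-point arithmetic with fixed-point routines built around multiply-accumulate (MAC) operations, can be illustrated with a small portable C sketch. This is not the paper's Blackfin assembly: the Q15 format, the type name `q15_t`, and the helpers `q15_saturate` and `q15_dot` are assumptions chosen for illustration. A 64-bit accumulator stands in for the wide hardware accumulator a DSP would provide.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Q15 (assumed format): a signed 16-bit value v represents v / 32768.0,
 * so the representable range is [-1, 1). */
typedef int16_t q15_t;

/* Clamp a wide intermediate result to the Q15 range instead of wrapping. */
static q15_t q15_saturate(int64_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (q15_t)v;
}

/* Dot product of two Q15 vectors.
 * Each loop iteration is one multiply-accumulate, the operation a DSP
 * MAC unit performs in hardware. Products are Q30; they are summed at
 * full precision and converted back to Q15 only once, at the end. */
q15_t q15_dot(const q15_t *a, const q15_t *b, size_t n)
{
    assert(a != NULL && b != NULL);

    int64_t acc = 0;                              /* Q30 accumulator */
    for (size_t i = 0; i < n; ++i)
        acc += (int32_t)a[i] * (int32_t)b[i];     /* MAC step */

    acc += (int64_t)1 << 14;                      /* round to nearest */
    /* Right-shifting a negative value is implementation-defined in C;
     * this sketch assumes the usual arithmetic shift. */
    return q15_saturate(acc >> 15);               /* Q30 -> Q15 */
}
```

For example, with `a = {0.5, 0.5}` and `b = {0.5, -0.25}` in Q15, `q15_dot` yields 4096, which is 0.125. Deferring the shift and saturation to the end keeps the inner loop to a single multiply-add, which is the shape a MAC unit and software pipelining can exploit.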
Citations
Book
01 Jan 2003
TL;DR: Comprehensive in scope, and gentle in approach, this book will help you achieve a thorough grasp of the basics and move gradually to more sophisticated DSP concepts and applications.
Abstract: From the Publisher: This is undoubtedly the most accessible book on digital signal processing (DSP) available to the beginner. Using intuitive explanations and well-chosen examples, this book gives you the tools to develop a fundamental understanding of DSP theory. The author covers the essential mathematics by explaining the meaning and significance of the key DSP equations. Comprehensive in scope, and gentle in approach, the book will help you achieve a thorough grasp of the basics and move gradually to more sophisticated DSP concepts and applications.

162 citations

Book
01 Jan 2005

121 citations

Proceedings Article
01 Apr 2018
TL;DR: In this paper, Gaussian distribution functions are designed using the C28x real-time digital signal processor (DSP) embedded in the TMS320C2000 modem designed for powerline communication (PLC) at the low-voltage distribution end of the smart grid, where numerous devices generate massive amounts of data.
Abstract: The smart grid (SG) is a large-scale network and an integral part of the Internet of Things (IoT). For more effective big data analytics in large-scale IoT networks, reliable solutions are being designed so that many real-time decisions are taken at the edge of the network, close to where data is generated. Gaussian functions are extensively applied in statistical machine learning, pattern recognition, adaptive algorithms for function approximation, etc. It is envisaged that some of these machine learning solutions, and other Gaussian function based applications that have low computation and memory footprints, will soon be deployed for edge analytics in large-scale IoT networks. Hence, an adaptive, low-cost method of designing Gaussian functions would be of immense benefit. In this paper, Gaussian distribution functions are designed using the C28x real-time digital signal processor (DSP) embedded in the TMS320C2000 modem designed for powerline communication (PLC) at the low-voltage distribution end of the smart grid, where numerous devices generate massive amounts of data. The open-source embedded C programming language is used to program the C28x for real-time Gaussian function generation. The designed Gaussian waveforms are stored in lookup tables (LUTs) in the C28x embedded DSP and could be deployed for a variety of applications at the edge of the SG and IoT network. The novelty of the design is that the Gaussian functions are designed with a generic, low-cost, fixed-point DSP, unlike the state of the art, in which Gaussian functions are designed using expensive arbitrary waveform generators and other specialized circuits. The C28x DSP is selected for this design because it already exists as an embedded DSP in many smart grid applications and in numerous other industrial systems that are part of the large-scale IoT network; hence it is envisaged that integrating any Gaussian function based solution using this DSP into the smart grid and other IoT systems may not be too challenging.

8 citations


Cites background from "Implementation of machine learning ..."

  • ...The approximate gaussian function is useful for DSPs that have low memory and low computing power, synonymous with DSPs that are always available at the edges of large scale network....

    [...]

  • ...Future consideration for this method of gaussian distribution function design for real-time DSPs may include devising a method by which this method can be improved to yield an improved gaussian distribution function bell shape with an even lower statistical MSE value....

    [...]

  • ...This indicates that the design of gaussian distribution functions for low-cost, low-computing power and low memory DSPs discussed in this paper could be applied for a lot of edge analytic applications in the SG and other large-scale networks....

    [...]

  • ...With the advent of edge analytics, much research is ongoing so that advanced analytics, including machine learning applications could be shifted to the edge of the network using low memory DSPs [8]....

    [...]

  • ...The case described in [18] regarding an approximate gaussian distribution function would be appropriate for edge analytics in SG and IoT, since memory and computational power is often limited in DSPs at the edges of most large-scale networks such as the SG....

    [...]

Proceedings Article
11 Jun 2022
TL;DR: This paper proposes SpEaC, a coarse-grained reconfigurable spatial architecture, as an energy-efficient programmable processor for earable applications. Across all kernels, SpEaC outperforms programmable cores modeled after M4, M7, A53, and HiFi4 DSP by 99.3×, 32.5×, 14.8×, and 9.8× respectively, and is 15.7× to 1087× more energy efficient than a low-power Mali T628 MP6 GPU.
Abstract: Earables such as earphones [15, 16, 73], hearing aids [28], and smart glasses [2, 14] are poised to be a prominent programmable computing platform in the future. In this paper, we ask the question: what kind of programmable hardware would be needed to support earable computing in future? To understand hardware requirements, we propose EarBench, a suite of representative emerging earable applications with diverse sensor-based inputs and computation requirements. Our analysis of EarBench applications shows that, on average, there is a 13.54×-3.97× performance gap between the computational needs of EarBench applications and the performance of the microprocessors that several of today's programmable earable SoCs are based on; more complex microprocessors have unacceptable energy efficiency for earable applications. Our analysis also shows that EarBench applications are dominated by a small number of digital signal processing (DSP) and machine learning (ML)-based kernels that have significant computational similarity. We propose SpEaC, a coarse-grained reconfigurable spatial architecture, as an energy-efficient programmable processor for earable applications. SpEaC targets earable applications efficiently using a) a reconfigurable fixed-point multiply-and-add augmented reduction tree-based substrate with support for vectorized complex operations that is optimized for the earable ML and DSP kernel code and b) a tightly coupled control core for executing other code (including non-matrix computation, or non-multiply or add operations in the earable DSP kernel code). Unlike other CGRAs that typically target general-purpose computations, the SpEaC substrate is optimized for energy-efficient execution of the earable kernels at the expense of generality. Across all our kernels, SpEaC outperforms programmable cores modeled after M4, M7, A53, and HiFi4 DSP by 99.3×, 32.5×, 14.8×, and 9.8× respectively. At 63 mW in 28 nm, the energy efficiency benefits are 1.55×, 9.04×, 68.3×, and 32.7× respectively; energy efficiency benefits are 15.7× to 1087× over a low-power Mali T628 MP6 GPU.

1 citation

Journal Article
TL;DR: An open source high level synthesis fixed-to-floating and floating-to-fixed conversion tool is presented for embedded design, communication systems, and signal processing applications.
Abstract: An open source high level synthesis fixed-to-floating and floating-to-fixed conversion tool is presented for embedded design, communication systems, and signal processing applications. Many systems use a fixed point number system. Fixed point numbers often need to be converted to floating point numbers for higher accuracy, dynamic range, fixed-length transmission limitations or end user requirements. A similar conversion system is needed to convert floating point numbers to fixed point numbers due to the advantages that fixed point numbers offer when compared with floating point number systems, such as compact hardware, reduced verification time and design effort. The latest embedded and SoC designs use both number systems together to improve accuracy or reduce required hardware in the same design. The proposed open source design and verification tool converts fixed point numbers to floating point numbers, and floating point numbers to fixed point numbers using the IEEE-754 floating point number standard. This open source design tool generates HDL code and its test bench that can be implemented in FPGA and VLSI systems. The design can be compiled and simulated using open source Iverilog/GTKWave and verified using Octave. A high level synthesis tool and GUI are designed using C#. The proposed design tool can increase productivity by reducing the design and verification time, as well as reduce the development cost due to the open source nature of the design tool. The proposed design tool can be used as a standalone block generator or implemented into current designs to improve range, accuracy, and reduce the development cost. The generated design has been implemented on Xilinx FPGAs.

Cites background from "Implementation of machine learning ..."

  • ...Most embedded systems, System-on-Chip (SoC) and transmission systems are implemented using either fixed point, floating point or hybrid number systems wherein fixed [1] [2] and floating point numbers [3] [4] can be used together in the same chip [5]-[7]....

    [...]

References
01 Jan 2009
TL;DR: A method is proposed for implementing different approximations of the sigmoid function in FPGA (Field Programmable Gate Array) circuits; its major benefit is the possibility of designing neural networks from predefined block systems created in the System Generator environment.
Abstract: This paper proposes a method for implementing different approximations of the sigmoid function in FPGA (Field Programmable Gate Array) circuits. Three previously published piecewise linear approximations and one piecewise second-order approximation are analyzed in terms of hardware resource utilization, the errors induced by the approximation function and the bit representation, power consumption, and processing speed. The major benefit of the proposed method lies in the possibility of designing neural networks from predefined block systems created in the System Generator environment, and of creating higher-level design tools for implementing neural networks in logic circuits.

38 citations


"Implementation of machine learning ..." refers methods in this paper

  • ...We have therefore used Alippi approximation of the sigmoid function [12] as given below....

    [...]
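The excerpt above is cut off before the formula, so the expression the paper actually uses is not shown here. As a rough sketch only, one commonly cited piecewise-linear form attributed to Alippi and Storti-Gajani is: for x ≤ 0, f(x) = (1/2 - frac/4) / 2^n, where n and frac are the integer and fractional parts of |x|, and f(x) = 1 - f(-x) for x > 0. The division by 2^n becomes a right shift, which suits a fixed-point DSP. The function name `sigmoid_pwl_q15`, the Q4.11 input format, and the Q15 output format below are assumptions made for illustration.

```c
#include <stdint.h>

/* Assumed input format: Q4.11, i.e. x represents x / 2048.0. */
#define SIG_IN_FRAC_BITS 11

/* Piecewise-linear sigmoid approximation using only shifts and adds.
 * Returns the result in Q15 (value / 32768.0), clamped to [0, 32767]. */
int16_t sigmoid_pwl_q15(int16_t x)
{
    /* Work on |x| in 32 bits so that INT16_MIN negates safely. */
    int32_t mag  = (x < 0) ? -(int32_t)x : (int32_t)x;
    int32_t n    = mag >> SIG_IN_FRAC_BITS;               /* integer part  */
    int32_t frac = mag & ((1 << SIG_IN_FRAC_BITS) - 1);   /* fraction, Q11 */

    /* (1/2 - frac/4) in Q15: 1/2 is 16384; converting frac from Q11 to
     * Q15 is a left shift by 4 and the division by 4 is a right shift
     * by 2, giving a net left shift by 2. The result is positive, and
     * n is at most 16, so the right shift below is well defined. */
    int32_t y = (16384 - (frac << 2)) >> n;               /* f(-|x|) */

    if (x > 0)
        y = 32768 - y;                                    /* f(x) = 1 - f(-x) */
    if (y > 32767)
        y = 32767;                                        /* 1.0 is not representable in Q15 */
    return (int16_t)y;
}
```

With these formats, an input of -2048 (that is, -1.0) returns 8192 (0.25), against a true sigmoid value of about 0.269. The routine needs no multiplier, divider, or exponential, which is the reason such approximations are attractive on fixed-point hardware.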

Book
01 Jan 2007

33 citations

Book Chapter
01 Jan 2009
TL;DR: The ability of neural networks to learn from experience in solving the problems of segmentation and character recognition is exploited, and the grouping of blocks into logical units and the determination of reading order within each logical unit helped in reconstructing automatically the document image in an editable format.
Abstract: We present a complete optical character recognition (OCR) system for Tamil magazines/documents. All the standard elements of OCR process like de-skewing, preprocessing, segmentation, character recognition, and reconstruction are implemented. Experience with OCR problems teaches that for most subtasks of OCR, there is no single technique that gives perfect results for every type of document image. We exploit the ability of neural networks to learn from experience in solving the problems of segmentation and character recognition. Text segmentation of Tamil newsprint poses a new challenge owing to its italic-like font type; problems that arise in recognition of touching and close characters are discussed. Character recognition efficiency varied from 94 to 97% for this type of font. The grouping of blocks into logical units and the determination of reading order within each logical unit helped us in reconstructing automatically the document image in an editable format.

8 citations


"Implementation of machine learning ..." refers methods in this paper

  • ...The techniques explained can be adopted to port the systems on other Digital Signal processors as well....

    [...]

  • ...The emphasis is more on various methods to reduce computational overhead on the DSP and speed up the implementation, while keeping the recognition results accurate enough to fulfil the task....

    [...]