Author

Harvinder Singh

Bio: Harvinder Singh is an academic researcher from STMicroelectronics. The author has contributed to research in topics: Multiplexer & Bandwidth (signal processing). The author has an h-index of 3 and has co-authored 4 publications receiving 140 citations.

Papers
Proceedings ArticleDOI
01 Feb 2017
TL;DR: A booming number of computer vision, speech recognition, and signal processing applications are increasingly benefiting from deep convolutional neural networks, with a DCNN significantly outperforming classical approaches for the first time in 2012.
Abstract: A booming number of computer vision, speech recognition, and signal processing applications are increasingly benefiting from the use of deep convolutional neural networks (DCNN) stemming from the seminal work of Y. LeCun et al. [1] and others that led to winning the 2012 ImageNet Large Scale Visual Recognition Challenge with AlexNet [2], a DCNN significantly outperforming classical approaches for the first time. In order to deploy these technologies in mobile and wearable devices, hardware acceleration plays a critical role in achieving real-time operation with very limited power consumption, with embedded memory overcoming the limitations of fully programmable solutions.

143 citations

Patent
29 Aug 2005
TL;DR: In this article, a minimal-area integrated polyphase interpolation filter that exploits coefficient symmetry for a channel of input data is proposed. It includes an input interface block that synchronizes the input signal to a first internal clock signal, a memory block that provides multiple delayed output signals, and a multiplexer input interface block that outputs a selected plurality of signals for generating mirror-image coefficient sets in response to a second set of internal control signals.
Abstract: A minimal-area integrated polyphase interpolation filter uses the symmetry of coefficients for a channel of input data. The filter includes an input interface block for synchronizing the input signal to a first internal clock signal; a memory block for providing multiple delayed output signals; a multiplexer input interface block for outputting a selected plurality of signals for generating mirror-image coefficient sets in response to a second set of internal control signals; a coefficient block for generating mirror-image and/or symmetric coefficient sets and outputting a plurality of filtered signals; an output multiplexer block for performing selection, gain control, and data-width control on said plurality of filtered signals; an output register block for synchronizing the filtered signals; and a control block for generating the clock signals that realize the filter and the delay between the two channels for accessing a coefficient set, thereby minimizing hardware in the filter.
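The saving the patent exploits is that a linear-phase FIR filter's impulse response is symmetric, so only half the coefficients need to be stored; the other half can be recreated by address reversal. The Python sketch below illustrates that idea in software under illustrative assumptions (tap count, cutoff, and all function names are ours, not from the patent): a polyphase interpolator is driven from only the first half of a symmetric, even-length coefficient set and checked against direct zero-stuffing with the full set.

```python
import numpy as np

def symmetric_lowpass(num_taps, cutoff):
    # Windowed-sinc prototype; linear phase, so h[n] == h[N-1-n].
    n = np.arange(num_taps) - (num_taps - 1) / 2
    h = np.sinc(2 * cutoff * n) * np.hamming(num_taps)
    return h / h.sum()

def polyphase_interpolate(x, half_taps, factor):
    # Rebuild the full even-length tap set by mirroring the stored half:
    # the address-reversal trick that halves coefficient memory.
    h = np.concatenate([half_taps, half_taps[::-1]])
    y = np.zeros(len(x) * factor)
    for p in range(factor):  # one polyphase subfilter per output phase
        y[p::factor] = np.convolve(x, h[p::factor])[:len(x)]
    return y

M, N = 4, 32                               # interpolation factor, taps (even)
h = symmetric_lowpass(N, cutoff=0.5 / M)
x = np.random.default_rng(0).standard_normal(256)

# Reference: zero-stuff by M, then filter with the full coefficient set.
up = np.zeros(len(x) * M)
up[::M] = x
ref = np.convolve(up, h)[:len(x) * M]

y = polyphase_interpolate(x, h[:N // 2], M)   # only N/2 taps are stored
assert np.allclose(y, ref)
```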

10 citations

Proceedings ArticleDOI
29 May 2009
TL;DR: Introducing digital processing in this RF-dominated application to sort and assemble user channels removes the need for in-band SAW filters, offers full flexibility of channel selection, and supports up to 50 users simultaneously.
Abstract: Satellite digital TV broadcast reception today requires a multiple Low-Noise Block (multi-LNB) head on the dish, as well as a multi-tuner set-top box (STB). Connecting multiple OutDoor Units (ODU) to the set-top boxes traditionally required multiple cables. A first step has been achieved with so-called satellite Channel Stacking Switch™ (CSS) technology, able to deliver the full suite of TV programs to all STBs in a single home through a reduced number of cables. However, this purely analog/RF technology does not offer enough flexibility in terms of the number of simultaneous users (12 users maximum) and requires multiple external components such as SAW filters, significantly increasing the cost of the solution [1]. Introducing digital processing in this RF-dominated application to sort and assemble user channels removes the need for in-band SAW filters, offers full flexibility of channel selection, and supports up to 50 users simultaneously.
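Channel stacking amounts to digitally extracting each user's channel from the wideband input and re-placing it at an assigned slot on the single cable. As a rough illustration only (the sample rate, frequencies, bandwidth, and filter are our assumptions, not the paper's design), here is a minimal Python sketch of one such sort-and-assemble step:

```python
import numpy as np

fs = 1e9                                  # assumed sample rate, 1 GS/s
t = np.arange(4096) / fs

def lowpass(num_taps, cutoff_hz):
    # Windowed-sinc channel-selection filter (digital stand-in for the
    # in-band SAW filter that the digital approach removes).
    n = np.arange(num_taps) - (num_taps - 1) / 2
    h = np.sinc(2 * cutoff_hz / fs * n) * np.hamming(num_taps)
    return h / h.sum()

def stack_channel(x, f_in, f_out, bw):
    # Sort: mix the selected user channel down to DC and filter it out.
    base = x * np.exp(-2j * np.pi * f_in * t)
    base = np.convolve(base, lowpass(129, bw / 2), mode="same")
    # Assemble: re-place it at its assigned slot on the output cable.
    return base * np.exp(2j * np.pi * f_out * t)

# Wideband input: the wanted channel at 210 MHz plus another user at 350 MHz.
x = (np.exp(2j * np.pi * 210e6 * t)
     + 0.5 * np.exp(2j * np.pi * 350e6 * t))
y = stack_channel(x, f_in=210e6, f_out=80e6, bw=40e6)
```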

3 citations

Journal ArticleDOI
TL;DR: In this article, a digital channel multiplexer for a satellite outdoor unit running at a 1 GHz clock frequency is implemented in 65 nm CMOS mixed-oxide dual-voltage technology, based on a 1 GS/s digital signal processor (DSP) approach with 500 MHz of input and output bandwidth.
Abstract: A digital channel multiplexer for a satellite outdoor unit running at a 1 GHz clock frequency is implemented in 65 nm CMOS mixed-oxide dual-voltage technology. This multiplexer, based on a 1 GS/s digital signal processor (DSP) approach with 500 MHz of input and output bandwidth, embeds two 8-bit 1 GS/s analog-to-digital converters (ADCs) and two 8-bit 1 GS/s digital-to-analog converters (DACs). It consumes less than 1022 mW at ambient temperature while achieving noise rejection of up to 42.5 dB on a single tone and > 37 dB on modulated satellite channels.

2 citations


Cited by
Journal ArticleDOI
23 Jan 2018
TL;DR: This comprehensive review summarizes the state of the art, challenges, and prospects of neuro-inspired computing with emerging nonvolatile memory devices and presents a device-circuit-algorithm codesign methodology to evaluate the impact of nonideal device effects on system-level performance.
Abstract: This comprehensive review summarizes the state of the art, challenges, and prospects of neuro-inspired computing with emerging nonvolatile memory devices. First, we discuss the demand for developing neuro-inspired architecture beyond today’s von Neumann architecture. Second, we summarize the various approaches to designing the neuromorphic hardware (digital versus analog, spiking versus nonspiking, online training versus offline training) and discuss why emerging nonvolatile memory is attractive for implementing the synapses in the neural network. Then, we discuss the desired device characteristics of the synaptic devices (e.g., multilevel states, weight update nonlinearity/asymmetry, variation/noise), and survey a few representative material systems and device prototypes reported in the literature that show the analog conductance tuning. These candidates include phase change memory, resistive memory, ferroelectric memory, floating-gate transistors, etc. Next, we introduce the crossbar array architecture to accelerate the weighted sum and weight update operations that are commonly used in neuro-inspired machine learning algorithms, and review recent progress in array-level experimental demonstrations for pattern recognition tasks. In addition, we discuss the peripheral neuron circuit design issues and present a device-circuit-algorithm codesign methodology to evaluate the impact of nonideal device effects on the system-level performance (e.g., learning accuracy). Finally, we give an outlook on the customization of the learning algorithms for efficient hardware implementation.
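The crossbar idea the review describes maps each synaptic weight to a device conductance, so a matrix-vector product is computed in one step as summed column currents. The toy Python sketch below shows one common way to model this for codesign studies; the conductance range, differential weight mapping, and Gaussian variation model are illustrative assumptions, not taken from any specific device in the review.

```python
import numpy as np

G_MIN, G_MAX = 1e-6, 1e-4     # assumed device conductance range, in siemens

def weights_to_conductances(W):
    # Map signed weights onto a differential pair of conductance arrays,
    # since a single device cannot hold a negative weight.
    Wn = W / np.max(np.abs(W))                 # normalize to [-1, 1]
    g_pos = G_MIN + (G_MAX - G_MIN) * np.clip(Wn, 0, None)
    g_neg = G_MIN + (G_MAX - G_MIN) * np.clip(-Wn, 0, None)
    return g_pos, g_neg

def crossbar_matvec(W, v, sigma=0.05, rng=np.random.default_rng(0)):
    # Weighted sum as a Kirchhoff current sum per column, with Gaussian
    # device-to-device conductance variation of relative width sigma.
    g_pos, g_neg = weights_to_conductances(W)
    g_pos = g_pos * (1 + sigma * rng.standard_normal(g_pos.shape))
    g_neg = g_neg * (1 + sigma * rng.standard_normal(g_neg.shape))
    return v @ (g_pos - g_neg)

rng = np.random.default_rng(1)
W = rng.standard_normal((8, 4))            # 8 inputs, 4 output columns
v = rng.standard_normal(8)                 # input voltage vector
ideal = (v @ W) / np.max(np.abs(W)) * (G_MAX - G_MIN)
noisy = crossbar_matvec(W, v)
print("relative error:", np.linalg.norm(noisy - ideal) / np.linalg.norm(ideal))
```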

730 citations

Journal ArticleDOI
01 Aug 2018
TL;DR: This Perspective argues that electronics is poised to enter a new era of scaling – hyper-scaling – driven by advances in beyond-Boltzmann transistors, embedded non-volatile memories, monolithic three-dimensional integration and heterogeneous integration techniques.
Abstract: In the past five decades, the semiconductor industry has gone through two distinct eras of scaling: the geometric (or classical) scaling era and the equivalent (or effective) scaling era. As transistor and memory features approach 10 nanometres, it is apparent that room for further scaling in the horizontal direction is running out. In addition, the rise of data abundant computing is exacerbating the interconnect bottleneck that exists in conventional computing architecture between the compute cores and the memory blocks. Here we argue that electronics is poised to enter a new, third era of scaling — hyper-scaling — in which resources are added when needed to meet the demands of data abundant workloads. This era will be driven by advances in beyond-Boltzmann transistors, embedded non-volatile memories, monolithic three-dimensional integration and heterogeneous integration techniques.

343 citations

Proceedings ArticleDOI
14 Oct 2017
TL;DR: The CirCNN architecture is proposed: a universal DNN inference engine that can be implemented on various hardware/software platforms with a configurable network architecture (e.g., layer type, size, scales), in which the FFT serves as the key computing kernel, ensuring universal and small-footprint implementations.
Abstract: Large-scale deep neural networks (DNNs) are both compute and memory intensive. As the size of DNNs continues to grow, it is critical to improve their energy efficiency and performance while maintaining accuracy. For DNNs, the model size is an important factor affecting performance, scalability and energy efficiency. Weight pruning achieves good compression ratios but suffers from three drawbacks: 1) the irregular network structure after pruning, which affects performance and throughput; 2) the increased training complexity; and 3) the lack of a rigorous guarantee of compression ratio and inference accuracy. To overcome these limitations, this paper proposes CirCNN, a principled approach to represent weights and process neural networks using block-circulant matrices. CirCNN utilizes Fast Fourier Transform (FFT)-based fast multiplication, simultaneously reducing the computational complexity (both in inference and training) from $O(n^2)$ to $O(n \log n)$ and the storage complexity from $O(n^2)$ to $O(n)$, with negligible accuracy loss. Compared to other approaches, CirCNN is distinct due to its mathematical rigor: DNNs based on CirCNN can converge to the same "effectiveness" as DNNs without compression. We propose the CirCNN architecture, a universal DNN inference engine that can be implemented on various hardware/software platforms with a configurable network architecture (e.g., layer type, size, scales, etc.). In the CirCNN architecture: 1) due to its recursive property, the FFT can be used as the key computing kernel, which ensures universal and small-footprint implementations; 2) the compressed but regular network structure avoids the pitfalls of network pruning and facilitates high performance and throughput with a highly pipelined and parallel design. To demonstrate the performance and energy efficiency, we test CirCNN on FPGA, ASIC and embedded processors. Our results show that the CirCNN architecture achieves very high energy efficiency and performance with a small hardware footprint. Based on the FPGA implementation and ASIC synthesis results, CirCNN achieves 6-102X energy efficiency improvements compared with the best state-of-the-art results.
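The complexity reduction rests on the identity that multiplying by a circulant matrix is a circular convolution, which the FFT diagonalizes. A minimal Python check of that identity (the paper's block partitioning, training procedure, and hardware kernel are not reproduced; the function names here are ours):

```python
import numpy as np

def circulant(c):
    # Full n x n circulant matrix with first column c: C[i, j] = c[(i - j) % n].
    return np.array([np.roll(c, k) for k in range(len(c))]).T

def circulant_matvec_fft(c, x):
    # O(n log n): a circulant matvec is a circular convolution, i.e. an
    # elementwise product in the FFT domain.
    return np.real(np.fft.ifft(np.fft.fft(c) * np.fft.fft(x)))

rng = np.random.default_rng(0)
c = rng.standard_normal(8)          # one block stores n values, not n^2
x = rng.standard_normal(8)

dense = circulant(c) @ x            # O(n^2) reference
fast = circulant_matvec_fft(c, x)   # O(n log n) FFT path
assert np.allclose(dense, fast)
```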

262 citations

Journal ArticleDOI
Jinmook Lee, Changhyeon Kim, Sanghoon Kang, Dongjoo Shin, Sangyeob Kim, Hoi-Jun Yoo
TL;DR: An energy-efficient deep neural network (DNN) accelerator, unified neural processing unit (UNPU), is proposed for mobile deep learning applications and is the first DNN accelerator ASIC that can support fully variable weight bit precision from 1 to 16 bit.
Abstract: An energy-efficient deep neural network (DNN) accelerator, the unified neural processing unit (UNPU), is proposed for mobile deep learning applications. The UNPU can support both convolutional layers (CLs) and recurrent or fully connected layers (FCLs) to accommodate versatile workload combinations and accelerate various mobile deep learning applications. In addition, the UNPU is the first DNN accelerator ASIC that supports fully variable weight bit precision from 1 to 16 bit, enabling it to operate at the accuracy-energy optimal point. Moreover, the lookup table (LUT)-based bit-serial processing element (LBPE) in the UNPU reduces energy consumption compared to a conventional fixed-point multiply-and-accumulate (MAC) array by 23.1%, 27.2%, 41%, and 53.6% for 16-, 8-, 4-, and 1-bit weight precision, respectively. Besides the energy efficiency improvement, the unified DNN core architecture of the UNPU improves the peak performance for CLs by 1.15 $\times$ compared to the previous work, allowing the UNPU to operate at a lower voltage and frequency for a given DNN to increase energy efficiency. The UNPU is implemented in 65-nm CMOS technology and occupies a $4 \times 4$ mm$^2$ die area. It operates from a 0.63- to 1.1-V supply voltage with a maximum frequency of 200 MHz, and has a peak performance of 345.6 GOPS for 16-bit weight precision and 7372 GOPS for 1-bit weight precision. This wide operating range lets the UNPU achieve a power efficiency of 3.08 TOPS/W for 16-bit weight precision and 50.6 TOPS/W for 1-bit weight precision. The functionality of the UNPU is successfully demonstrated on a verification system using an ImageNet deep CNN (VGG-16).
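The bit-serial idea behind fully variable weight precision is that the datapath consumes one weight bit-plane per step and shifts-and-accumulates the partial sums, so the same hardware serves any precision from 1 to 16 bit. The Python sketch below illustrates that arithmetic for two's-complement weights; it models the bit-serial schedule only, not the LUT-based LBPE circuit, and all names are ours.

```python
import numpy as np

def bit_serial_dot(acts, weights, bits):
    # Dot product computed one weight bit-plane per step, the way a
    # bit-serial datapath would; the two's-complement MSB is subtracted.
    w = weights.astype(np.int64)
    acc = np.int64(0)
    for b in range(bits):
        plane = (w >> b) & 1                    # current weight bit-plane
        partial = np.dot(acts, plane)           # only 1-bit "multiplies"
        acc = acc - (partial << b) if b == bits - 1 else acc + (partial << b)
    return acc

rng = np.random.default_rng(0)
acts = rng.integers(0, 16, size=64).astype(np.int64)
for bits in (1, 4, 8, 16):                      # fully variable precision
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    w = rng.integers(lo, hi + 1, size=64).astype(np.int64)
    assert bit_serial_dot(acts, w, bits) == np.dot(acts, w)
```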

225 citations

Journal ArticleDOI
TL;DR: Thinker is an energy-efficient reconfigurable hybrid-NN processor fabricated in 65-nm technology; its fused data-pattern-based multi-bank memory system exploits data reuse and guarantees parallel data access, improving computing throughput and energy efficiency.
Abstract: Hybrid neural networks (hybrid-NNs) have been widely used and have brought new challenges to NN processors. Thinker is an energy-efficient reconfigurable hybrid-NN processor fabricated in 65-nm technology. To achieve high energy efficiency, three optimization techniques are proposed. First, each processing element (PE) supports bit-width-adaptive computing to meet the various bit-widths of neural layers, which raises computing throughput by 91% and improves energy efficiency by $1.93 \times $ on average. Second, the PE array supports on-demand array partitioning and reconfiguration for processing different NNs in parallel, which results in a 13.7% improvement in PE utilization and improves energy efficiency by $1.11 \times $. Third, a fused data-pattern-based multi-bank memory system is designed to exploit data reuse and guarantee parallel data access, which improves computing throughput and energy efficiency by $1.11 \times $ and $1.17 \times $, respectively. Measurement results show that this processor achieves a peak energy efficiency of 5.09 TOPS/W.

185 citations