Author

Hantao Huang

Other affiliations: MediaTek
Bio: Hantao Huang is an academic researcher from Nanyang Technological University. The author has contributed to research in topics: Artificial neural network & Speedup. The author has an h-index of 10 and has co-authored 38 publications receiving 323 citations. Previous affiliations of Hantao Huang include MediaTek.

Papers
Journal ArticleDOI
Leibin Ni1, Hantao Huang1, Zichuan Liu1, Rajiv V. Joshi2, Hao Yu1 
TL;DR: Numerical results for fingerprint matching mapped onto the proposed RRAM crossbar show 2.86x faster speed, 154x better energy efficiency, and 100x smaller area than the same design implemented as a CMOS-based ASIC.
Abstract: The recently emerging resistive random-access memory (RRAM) can provide not only nonvolatile memory storage but also intrinsic computing for matrix-vector multiplication, which is ideal for a low-power, high-throughput data analytics accelerator performed in memory. However, existing RRAM crossbar-based computing is mainly assumed to be multilevel analog computing, whose result is sensitive to process nonuniformity and which incurs additional overhead from AD-conversion and I/O. In this article, we explore a matrix-vector multiplication accelerator on a binary RRAM crossbar with adaptive 1-bit-comparator-based parallel conversion. Moreover, a distributed in-memory computing architecture is developed with a corresponding control protocol. Both the memory array and the logic accelerator are implemented on the binary RRAM crossbar, where the logic-memory pairs can be distributed using the control bus protocol. Experimental results show that, compared to the analog RRAM crossbar, the proposed binary RRAM crossbar achieves significant area savings with better calculation accuracy. Moreover, significant speedup can be achieved for matrix-vector multiplication in neural network-based machine learning, so that both the overall training time and the testing time are reduced. In addition, large energy savings can be achieved compared to the traditional CMOS-based out-of-memory computing architecture.

65 citations

Journal ArticleDOI
TL;DR: Results show that compared to the traditional static and centralized energy-management system (EMS), and the recent multiagent EMS using price-demand competition, the proposed uncertainty-aware MG-EMS can achieve up to 50x and 145x utilization rate improvements, respectively, along with balanced energy allocation improvements.
Abstract: This paper presents a cyber-physical management of smart buildings based on a smart-gateway network with distributed and real-time energy data collection and analytics. We consider a building with multiple rooms supplied by one main electricity grid and one additional solar energy grid. Based on the smart-gateway network, energy signatures of rooms are first extracted with consideration of uncertainty and further classified as different types of agents. Then, a multiagent minority-game (MG)-based demand-response management is introduced to reduce peak demand on the main electricity grid and also to fairly allocate solar energy on the additional grid. Experimental results show that compared to the traditional static and centralized energy-management system (EMS), and the recent multiagent EMS using price-demand competition, the proposed uncertainty-aware MG-EMS can achieve up to $50\times$ and $145\times$ utilization rate improvements, respectively, with regard to the fairness of solar energy resource allocation. More importantly, the peak load from the main electricity grid is reduced by 38.50% in summer and 15.83% in winter based on benchmarked building energy data. Lastly, uncertainty is reduced by 23% on average, with a corresponding 37% improvement in balanced energy allocation, compared to the MG-EMS without consideration of uncertainty.

55 citations

Journal ArticleDOI
TL;DR: A three-dimensional (3-D) multilayer CMOS-RRAM accelerator for a tensorized neural network is introduced; tensorization achieves 14.85x model compression with acceptable accuracy loss.
Abstract: It is a grand challenge to develop a highly parallel yet energy-efficient machine learning hardware accelerator. This paper introduces a three-dimensional (3-D) multilayer CMOS-RRAM accelerator for a tensorized neural network. Highly parallel matrix–vector multiplication can be performed with low power in the proposed 3-D multilayer CMOS-RRAM accelerator. The adoption of tensorization can significantly compress the weight matrix of a neural network using far fewer parameters. Simulation results using the benchmark MNIST show that the proposed accelerator has $1.283\times$ speed-up, $4.276\times$ energy-saving, and $9.339\times$ area-saving compared to the 3-D CMOS-ASIC implementation; and $6.37\times$ speed-up and $2612\times$ energy-saving compared to the 2-D CPU implementation. In addition, $14.85\times$ model compression can be achieved by tensorization with acceptable accuracy loss.

33 citations

Proceedings ArticleDOI
Wei Han1, Hantao Huang1, Tao Han1
01 Dec 2020
TL;DR: This paper proposes a localization-aware answer prediction network (LaAP-Net) that not only generates the answer to the question but also predicts a bounding box as evidence of the generated answer.
Abstract: Image text carries essential information for understanding the scene and performing reasoning. The text-based visual question answering (text VQA) task focuses on visual questions that require reading text in images. Existing text VQA systems generate an answer by selecting from optical character recognition (OCR) texts or a fixed vocabulary. Positional information of text is underused and there is a lack of evidence for the generated answer. As such, this paper proposes a localization-aware answer prediction network (LaAP-Net) to address this challenge. Our LaAP-Net not only generates the answer to the question but also predicts a bounding box as evidence of the generated answer. Moreover, a context-enriched OCR representation (COR) for multimodal fusion is proposed to facilitate the localization task. Our proposed LaAP-Net outperforms existing approaches on three benchmark datasets for the text VQA task by a noticeable margin.

29 citations

Proceedings ArticleDOI
12 Mar 2012
TL;DR: The proposed MG-EMS can significantly reduce the cost and improve the stability of micro-grid of smart buildings with a high utilization rate of solar energy, and reduce peak energy demand for main power-grid by 30.6%.
Abstract: Real-time and decentralized energy resource allocation has become the main feature to develop for the next-generation energy management system (EMS). In this paper, a minority game (MG)-based EMS (MG-EMS) is proposed for smart buildings with hybrid energy sources: the main energy resource from the electrical power-grid and a renewable energy resource from solar photovoltaic (PV) cells. Compared to the traditional static and centralized EMS (SC-EMS), and the recent multi-agent-based EMS (MA-EMS) based on price-demand competition, our proposed MG-EMS can achieve up to 51x and 147x utilization rate improvements, respectively, with regard to the fairness of solar energy resource allocation. In addition, the proposed MG-EMS can also reduce peak energy demand on the main power-grid by 30.6%. As such, one can significantly reduce the cost and improve the stability of the micro-grid of smart buildings with a high utilization rate of solar energy.

28 citations


Cited by
Journal ArticleDOI
TL;DR: The aim of this survey is to enable researchers and system designers to get insights into the working and applications of CPSs and motivate them to propose novel solutions for making wide-scale adoption of CPS a tangible reality.
Abstract: Cyberphysical systems (CPSs) are a new class of engineered systems that offer close interaction between cyber and physical components. The field of CPS has been identified as a key area of research, and CPSs are expected to play a major role in the design and development of future systems. In this paper, we survey recent advancements made in the development and applications of CPSs. We classify the existing research work based on their characteristics and identify the future challenges. We also discuss examples of CPS prototypes. The aim of this survey is to enable researchers and system designers to get insights into the working and applications of CPSs and motivate them to propose novel solutions for making wide-scale adoption of CPS a tangible reality.

653 citations

Journal ArticleDOI
20 Mar 2020
TL;DR: This article reviews the mainstream compression approaches such as compact model, tensor decomposition, data quantization, and network sparsification, answers the question of how to leverage these methods in the design of neural network accelerators, and presents the state-of-the-art hardware architectures.
Abstract: Domain-specific hardware is becoming a promising topic against the backdrop of slowing improvement in general-purpose processors due to the foreseeable end of Moore’s Law. Machine learning, especially deep neural networks (DNNs), has become the most dazzling domain witnessing successful applications in a wide spectrum of artificial intelligence (AI) tasks. The incomparable accuracy of DNNs is achieved at the cost of heavy memory consumption and high computational complexity, which greatly impedes their deployment in embedded systems. Therefore, the DNN compression concept was naturally proposed and widely used for memory saving and compute acceleration. In the past few years, a tremendous number of compression techniques have sprung up to pursue a satisfactory tradeoff between processing efficiency and application accuracy. Recently, this wave has spread to the design of neural network accelerators for gaining extremely high performance. However, the volume of related work is huge and the reported approaches are quite divergent. This research chaos motivates us to provide a comprehensive survey on the recent advances toward the goal of efficient compression and execution of DNNs without significantly compromising accuracy, involving both the high-level algorithms and their applications in hardware design. In this article, we review the mainstream compression approaches such as compact model, tensor decomposition, data quantization, and network sparsification. We explain their compression principles, evaluation metrics, sensitivity analysis, and joint-way use. Then, we answer the question of how to leverage these methods in the design of neural network accelerators and present the state-of-the-art hardware architectures. In the end, we discuss several existing issues such as fair comparison, testing workloads, automatic compression, influence on security, and framework/hardware-level support, and give promising topics in this field and the possible challenges as well. This article attempts to enable readers to quickly build up a big picture of neural network compression and acceleration, clearly evaluate various methods, and confidently get started in the right way.

499 citations

Journal ArticleDOI
TL;DR: To bridge the gap between theory and practicality of CS, different CS acquisition strategies and reconstruction approaches are elaborated systematically in this paper.
Abstract: Compressive Sensing (CS) is a new sensing modality, which compresses the signal being acquired at the time of sensing. Signals can have sparse or compressible representation either in original domain or in some transform domain. Relying on the sparsity of the signals, CS allows us to sample the signal at a rate much below the Nyquist sampling rate. Also, the varied reconstruction algorithms of CS can faithfully reconstruct the original signal back from fewer compressive measurements. This fact has stimulated research interest toward the use of CS in several fields, such as magnetic resonance imaging, high-speed video acquisition, and ultrawideband communication. This paper reviews the basic theoretical concepts underlying CS. To bridge the gap between theory and practicality of CS, different CS acquisition strategies and reconstruction approaches are elaborated systematically in this paper. The major application areas where CS is currently being used are reviewed here. This paper also highlights some of the challenges and research directions in this field.

334 citations

Journal ArticleDOI
TL;DR: In this paper, the authors present a taxonomy of CNN acceleration methods at three levels (structure, algorithm, and implementation), covering CNN architecture compression, algorithm optimization, and hardware-based improvement.

233 citations

Journal ArticleDOI
TL;DR: Experimental results show that the proposed hybrid dimensionality reduction method with the ensemble of the base learners contributes more critical features and significantly outperforms individual approaches, achieving high accuracy and low false alarm rates.

200 citations