Author

Kaisheng Ma

Bio: Kaisheng Ma is an academic researcher from Tsinghua University. The author has contributed to research in topics: Computer science & Pruning (decision trees). The author has an h-index of 20 and has co-authored 73 publications receiving 1513 citations. Previous affiliations of Kaisheng Ma include Pennsylvania State University & Peking University.


Papers
Proceedings ArticleDOI
01 Oct 2019
TL;DR: A general training framework named self distillation that notably enhances the performance of convolutional neural networks by shrinking the size of the network rather than enlarging it, and also provides flexibility for depth-wise scalable inference on resource-limited edge devices.
Abstract: Convolutional neural networks have been widely deployed in various application scenarios. In order to extend the applications' boundaries to some accuracy-crucial domains, researchers have been investigating approaches to boost accuracy through either deeper or wider network structures, which brings an exponential increase in computational and storage cost and delays the response time. In this paper, we propose a general training framework named self distillation, which notably enhances the performance (accuracy) of convolutional neural networks by shrinking the size of the network rather than enlarging it. Different from traditional knowledge distillation - a knowledge transfer method between networks, which forces student neural networks to approximate the softmax layer outputs of pre-trained teacher neural networks - the proposed self distillation framework distills knowledge within the network itself. The network is first divided into several sections, and the knowledge in the deeper sections is then squeezed into the shallow ones. Experiments further demonstrate the generality of the proposed self distillation framework: the average accuracy improvement is 2.65%, ranging from a minimum of 0.61% on ResNeXt to a maximum of 4.07% on VGG19. In addition, it also provides the flexibility of depth-wise scalable inference on resource-limited edge devices. Our code has been released on GitHub.
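A minimal sketch of the self-distillation loss described above, assuming a PyTorch model whose sections each expose an auxiliary classifier; the function name, temperature T, and weight alpha are illustrative choices, not taken from the released code:

```python
# Illustrative sketch of self distillation (not the authors' released code).
# The network is split into sections; each shallow section has an auxiliary
# classifier and learns from both the labels and the deepest classifier.
import torch
import torch.nn.functional as F

def self_distillation_loss(section_logits, labels, T=3.0, alpha=0.5):
    """section_logits: list of logits ordered shallow -> deep.
    The deepest classifier acts as the in-network 'teacher'."""
    teacher_logits = section_logits[-1]
    soft_teacher = F.softmax(teacher_logits.detach() / T, dim=1)

    # Deepest section trains on the hard labels only.
    loss = F.cross_entropy(teacher_logits, labels)

    # Shallow sections combine hard-label loss with KL to the teacher.
    for logits in section_logits[:-1]:
        ce = F.cross_entropy(logits, labels)
        kd = F.kl_div(F.log_softmax(logits / T, dim=1),
                      soft_teacher, reduction="batchmean") * (T * T)
        loss = loss + (1 - alpha) * ce + alpha * kd
    return loss
```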

550 citations

Proceedings ArticleDOI
09 Mar 2015
TL;DR: The simulation platform in this paper is calibrated using measured results from a fabricated nonvolatile processor and used to explore the design space for a nonvolatile processor with different architectures, different input power sources, and policies for maximizing forward progress.
Abstract: Energy harvesting has been widely investigated as a promising method of providing power for ultra-low-power applications. Such energy sources include solar energy, radio-frequency (RF) radiation, piezoelectricity, thermal gradients, etc. However, the power supplied by these sources is highly unreliable and dependent upon ambient environmental factors. Hence, it is necessary to develop specialized systems that are tolerant of this power variation and capable of making forward progress on their computation tasks. The simulation platform in this paper is calibrated using measured results from a fabricated nonvolatile processor and used to explore the design space for a nonvolatile processor with different architectures, different input power sources, and policies for maximizing forward progress.
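A toy model of the forward-progress question the paper studies, under intermittent harvested power; the power trace, backup cost, and threshold policy below are invented parameters for illustration, not the calibrated simulation platform from the paper:

```python
# Toy model of a nonvolatile processor running from a harvested-power trace.
# Thresholds, backup cost, and the random power trace are assumptions made
# only to illustrate the trade-off between doing work and checkpointing.
import random

BACKUP_ENERGY = 5.0      # energy units needed to checkpoint state to NVM
WORK_PER_STEP = 1.0      # energy units consumed per unit of forward progress
BACKUP_THRESHOLD = 6.0   # keep this much margin before doing real work

def simulate(steps=1000, seed=0):
    random.seed(seed)
    energy, progress, stalled = 0.0, 0, 0
    for _ in range(steps):
        energy += random.uniform(0.0, 2.0)      # unreliable harvested input
        if energy >= WORK_PER_STEP + BACKUP_THRESHOLD:
            energy -= WORK_PER_STEP             # enough margin: do real work
            progress += 1
        elif energy >= BACKUP_ENERGY:
            energy -= BACKUP_ENERGY             # low margin: checkpoint to NVM
        else:
            stalled += 1                        # power too low, stall this step
    return progress, stalled

print(simulate())
```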

225 citations

Proceedings ArticleDOI
07 Jun 2015
TL;DR: This paper proposes new metrics for nonvolatile processors that, for the first time, account for energy-harvesting factors, and explores nonvolatile processor design from the circuit level to the system level.
Abstract: Energy harvesting is gaining increasing attention due to its promise of ultra-long operation time without maintenance. However, frequent unpredictable power failures from energy harvesters bring performance and reliability challenges to traditional processors. Nonvolatile processors are a promising solution to this problem due to their zero leakage and efficient backup and restore operations. To optimize nonvolatile processor design, this paper proposes new metrics for nonvolatile processors that, for the first time, take energy-harvesting factors into account. Furthermore, we explore nonvolatile processor design from the circuit level to the system level. A prototype energy-harvesting nonvolatile processor is set up, and experimental results show that the proposed performance metric matches the measured results with less than 6.27% average error. Finally, the energy consumption of the nonvolatile processor is analyzed under different benchmarks.

127 citations

Proceedings ArticleDOI
05 Jun 2016
TL;DR: This work proposes a 2-transistor (2T) FEFET-based nonvolatile memory with separate read and write paths that achieves non-destructive read and lower write power at iso-write speed compared to standard FE-RAM.
Abstract: Ferroelectric FETs (FEFETs) offer intriguing possibilities for the design of low power nonvolatile memories by virtue of their three-terminal structure coupled with the ability of the ferroelectric (FE) material to retain its polarization in the absence of an electric field. Utilizing the distinct features of FEFETs, we propose a 2-transistor (2T) FEFET-based nonvolatile memory with separate read and write paths. With proper co-design at the device, cell and array levels, the proposed design achieves non-destructive read and lower write power at iso-write speed compared to standard FE-RAM. In addition, the FEFET-based memory exhibits high distinguishability with six orders of magnitude difference in the read currents corresponding to the two states. Comparative analysis based on experimentally calibrated models shows significant improvement of access energy-delay. For example, at a fixed write time of 550 ps, the write voltage and energy are 58.5% and 67.7% lower than FERAM, respectively. These benefits are achieved with 2.4 times the area overhead. Further exploration of the proposed FEFET memory in energy harvesting nonvolatile processors shows an average improvement of 27% in forward progress over FERAM.

99 citations

Proceedings ArticleDOI
01 Oct 2019
TL;DR: The authors propose a framework of concurrent adversarial training and weight pruning that enables model compression while preserving adversarial robustness, tackling the dilemma of adversarial training: deep neural networks are well known to be vulnerable to adversarial attacks, implemented by adding crafted perturbations onto benign examples, yet robustness demands significantly larger model capacity.
Abstract: It is well known that deep neural networks (DNNs) are vulnerable to adversarial attacks, which are implemented by adding crafted perturbations onto benign examples. Min-max robust optimization based adversarial training can provide a notion of security against adversarial attacks. However, adversarial robustness requires a significantly larger network capacity than natural training with only benign examples. This paper proposes a framework of concurrent adversarial training and weight pruning that enables model compression while still preserving adversarial robustness, and essentially tackles the dilemma of adversarial training. Furthermore, this work studies two hypotheses about weight pruning in the conventional setting and finds that weight pruning is essential for reducing the network model size in the adversarial setting; training a small model from scratch, even with initialization inherited from the large model, can achieve neither adversarial robustness nor high standard accuracy. Code is available at https://github.com/yeshaokai/Robustness-Aware-Pruning-ADMM.
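A condensed sketch of the general idea, combining PGD adversarial training with a magnitude-based pruning mask; the linked repository uses an ADMM-based formulation, so the functions and hyperparameters below are illustrative rather than the authors' method:

```python
# Sketch: adversarial (PGD) training while keeping a fixed sparsity mask.
# Hyperparameters (eps, alpha, steps, sparsity) are illustrative only.
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=7):
    # Generate adversarial examples within an L-infinity ball around x.
    x_adv = x + torch.empty_like(x).uniform_(-eps, eps)
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = (x_adv + alpha * grad.sign()).detach()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv

def prune_masks(model, sparsity=0.9):
    # Magnitude pruning: zero out the smallest weights in conv/linear layers.
    masks = {}
    for name, p in model.named_parameters():
        if p.dim() > 1:
            k = max(1, int(sparsity * p.numel()))
            threshold = p.abs().flatten().kthvalue(k).values
            masks[name] = (p.abs() > threshold).float()
    return masks

def robust_pruned_step(model, optimizer, x, y, masks):
    # One training step on adversarial examples, then re-apply the masks
    # so pruned weights stay at zero.
    model.train()
    x_adv = pgd_attack(model, x, y)
    optimizer.zero_grad()
    F.cross_entropy(model(x_adv), y).backward()
    optimizer.step()
    with torch.no_grad():
        for name, p in model.named_parameters():
            if name in masks:
                p.mul_(masks[name])
```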

97 citations


Cited by
Journal ArticleDOI
TL;DR: This paper provides a comprehensive survey of knowledge distillation from the perspectives of knowledge categories, training schemes, teacher-student architectures, distillation algorithms, performance comparisons, and applications.
Abstract: In recent years, deep neural networks have been successful in both industry and academia, especially for computer vision tasks. The great success of deep learning is mainly due to its scalability to encode large-scale data and to handle billions of model parameters. However, it is a challenge to deploy these cumbersome deep models on devices with limited resources, e.g., mobile phones and embedded devices, not only because of the high computational complexity but also because of the large storage requirements. To this end, a variety of model compression and acceleration techniques have been developed. As a representative type of model compression and acceleration, knowledge distillation effectively learns a small student model from a large teacher model. It has received rapidly increasing attention from the community. This paper provides a comprehensive survey of knowledge distillation from the perspectives of knowledge categories, training schemes, teacher-student architectures, distillation algorithms, performance comparison and applications. Furthermore, challenges in knowledge distillation are briefly reviewed and directions for future research are discussed.
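For concreteness, a minimal Hinton-style teacher-student distillation loss, the basic setting the survey builds on; the temperature and weighting below are illustrative defaults, not values from the survey:

```python
# Minimal teacher-student knowledge distillation loss (Hinton-style),
# shown only to make the surveyed idea concrete.
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    # Soft targets from the teacher, scaled by temperature T.
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits.detach() / T, dim=1),
                    reduction="batchmean") * (T * T)
    # Hard-label cross entropy on the ground truth.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```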

1,027 citations

Journal ArticleDOI
20 Mar 2020
TL;DR: This article reviews the mainstream compression approaches such as compact models, tensor decomposition, data quantization, and network sparsification, answers the question of how to leverage these methods in the design of neural network accelerators, and presents the state-of-the-art hardware architectures.
Abstract: Domain-specific hardware is becoming a promising topic against the backdrop of slowing improvement in general-purpose processors due to the foreseeable end of Moore's Law. Machine learning, especially deep neural networks (DNNs), has become the most dazzling domain, witnessing successful applications in a wide spectrum of artificial intelligence (AI) tasks. The incomparable accuracy of DNNs is achieved at the cost of heavy memory consumption and high computational complexity, which greatly impedes their deployment in embedded systems. Therefore, the DNN compression concept was naturally proposed and widely used for memory saving and compute acceleration. In the past few years, a tremendous number of compression techniques have sprung up to pursue a satisfactory tradeoff between processing efficiency and application accuracy. Recently, this wave has spread to the design of neural network accelerators for gaining extremely high performance. However, the number of related works is enormous and the reported approaches are quite divergent. This research chaos motivates us to provide a comprehensive survey of the recent advances toward the goal of efficient compression and execution of DNNs without significantly compromising accuracy, involving both the high-level algorithms and their applications in hardware design. In this article, we review the mainstream compression approaches such as compact models, tensor decomposition, data quantization, and network sparsification. We explain their compression principles, evaluation metrics, sensitivity analysis, and joint use. Then, we answer the question of how to leverage these methods in the design of neural network accelerators and present the state-of-the-art hardware architectures. In the end, we discuss several open issues such as fair comparison, testing workloads, automatic compression, influence on security, and framework/hardware-level support, and give promising topics in this field and the possible challenges as well. This article attempts to enable readers to quickly build up a big picture of neural network compression and acceleration, clearly evaluate the various methods, and confidently get started in the right way.
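As a concrete instance of one surveyed technique, a toy symmetric per-tensor int8 weight quantizer; real deployment flows add calibration, per-channel scales, and fused kernels, so this is only a sketch:

```python
# Toy symmetric per-tensor int8 weight quantization (data quantization is one
# of the compression families reviewed in the article).
import torch

def quantize_int8(w: torch.Tensor):
    scale = w.abs().max() / 127.0                     # one scale per tensor
    q = torch.clamp(torch.round(w / scale), -128, 127).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor):
    return q.float() * scale

w = torch.randn(64, 128)
q, s = quantize_int8(w)
print((dequantize(q, s) - w).abs().max())             # quantization error
```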

499 citations

Journal ArticleDOI
01 Jan 2019-Nature
TL;DR: A scalable spintronic logic device operating via spin–orbit transduction and magnetoelectric switching and using advanced quantum materials shows non-volatility and improved performance and energy efficiency compared with CMOS devices.
Abstract: Since the early 1980s, most electronics have relied on the use of complementary metal–oxide–semiconductor (CMOS) transistors. However, the principles of CMOS operation, involving a switchable semiconductor conductance controlled by an insulating gate, have remained largely unchanged, even as transistors are miniaturized to sizes of 10 nanometres. We investigated which dimensionally scalable logic technology beyond CMOS could provide improvements in efficiency and performance for von Neumann architectures and enable growth in emerging computing such as artificial intelligence. Such a computing technology needs to allow progressive miniaturization, reduce switching energy, improve device interconnection and provide a complete logic and memory family. Here we propose a scalable spintronic logic device that operates via spin–orbit transduction (the coupling of an electron’s angular momentum with its linear momentum) combined with magnetoelectric switching. The device uses advanced quantum materials, especially correlated oxides and topological states of matter, for collective switching and detection. We describe progress in magnetoelectric switching and spin–orbit detection of state, and show that in comparison with CMOS technology our device has superior switching energy (by a factor of 10 to 30), lower switching voltage (by a factor of 5) and enhanced logic density (by a factor of 5). In addition, its non-volatility enables ultralow standby power, which is critical to modern computing. The properties of our device indicate that the proposed technology could enable the development of multi-generational computing. A scalable spintronic device operating via spin–orbit transduction and magnetoelectric switching and using advanced quantum materials shows non-volatility and improved performance and energy efficiency compared with CMOS devices.

482 citations