Author

Kaisheng Ma

Bio: Kaisheng Ma is an academic researcher from Tsinghua University. The author has contributed to research in topics: Computer science & Pruning (decision trees). The author has an h-index of 20 and has co-authored 73 publications receiving 1513 citations. Previous affiliations of Kaisheng Ma include Pennsylvania State University & Peking University.


Papers
Proceedings ArticleDOI
01 Oct 2019
TL;DR: A general training framework named self distillation that notably enhances the performance of convolutional neural networks by shrinking the size of the network rather than enlarging it, and also provides flexibility for depth-wise scalable inference on resource-limited edge devices.
Abstract: Convolutional neural networks have been widely deployed in various application scenarios. In order to extend the applications' boundaries to some accuracy-crucial domains, researchers have been investigating approaches to boost accuracy through either deeper or wider network structures, which brings an exponential increase in computational and storage cost and delays the response time. In this paper, we propose a general training framework named self distillation, which notably enhances the performance (accuracy) of convolutional neural networks by shrinking the size of the network rather than enlarging it. Different from traditional knowledge distillation - a knowledge transfer method between networks, which forces student neural networks to approximate the softmax layer outputs of pre-trained teacher neural networks - the proposed self distillation framework distills knowledge within the network itself. The network is first divided into several sections, and the knowledge in the deeper sections is then squeezed into the shallow ones. Experiments further demonstrate the generality of the proposed self distillation framework: the average accuracy improvement is 2.65%, ranging from a minimum of 0.61% on ResNeXt to a maximum of 4.07% on VGG19. In addition, it also provides the flexibility of depth-wise scalable inference on resource-limited edge devices. Our code has been released on GitHub.
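A minimal sketch of the self-distillation loss described above, assuming a PyTorch model whose sections each expose an auxiliary classifier; the function name, temperature T, and weight alpha are illustrative choices, not taken from the released code:

```python
# Illustrative sketch of self distillation (not the authors' released code).
# The network is split into sections; each shallow section has an auxiliary
# classifier and learns from both the labels and the deepest classifier.
import torch
import torch.nn.functional as F

def self_distillation_loss(section_logits, labels, T=3.0, alpha=0.5):
    """section_logits: list of logits ordered shallow -> deep.
    The deepest classifier acts as the in-network 'teacher'."""
    teacher_logits = section_logits[-1]
    soft_teacher = F.softmax(teacher_logits.detach() / T, dim=1)

    # Deepest section trains on the hard labels only.
    loss = F.cross_entropy(teacher_logits, labels)

    # Shallow sections combine hard-label loss with KL to the teacher.
    for logits in section_logits[:-1]:
        ce = F.cross_entropy(logits, labels)
        kd = F.kl_div(F.log_softmax(logits / T, dim=1),
                      soft_teacher, reduction="batchmean") * (T * T)
        loss = loss + (1 - alpha) * ce + alpha * kd
    return loss
```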

550 citations

Proceedings ArticleDOI
09 Mar 2015
TL;DR: The simulation platform in this paper is calibrated using measured results from a fabricated nonvolatile processor and used to explore the design space for a nonvolatile processor with different architectures, different input power sources, and policies for maximizing forward progress.
Abstract: Energy harvesting has been widely investigated as a promising method of providing power for ultra-low-power applications. Such energy sources include solar energy, radio-frequency (RF) radiation, piezoelectricity, thermal gradients, etc. However, the power supplied by these sources is highly unreliable and dependent upon ambient environmental factors. Hence, it is necessary to develop specialized systems that are tolerant of this power variation and capable of making forward progress on their computation tasks. The simulation platform in this paper is calibrated using measured results from a fabricated nonvolatile processor and used to explore the design space for a nonvolatile processor with different architectures, different input power sources, and policies for maximizing forward progress.
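A toy model of the forward-progress question the paper studies, under intermittent harvested power; the power trace, backup cost, and threshold policy below are invented parameters for illustration, not the calibrated simulation platform from the paper:

```python
# Toy model of a nonvolatile processor running from a harvested-power trace.
# Thresholds, backup cost, and the random power trace are assumptions made
# only to illustrate the trade-off between doing work and checkpointing.
import random

BACKUP_ENERGY = 5.0      # energy units needed to checkpoint state to NVM
WORK_PER_STEP = 1.0      # energy units consumed per unit of forward progress
BACKUP_THRESHOLD = 6.0   # keep this much margin before doing real work

def simulate(steps=1000, seed=0):
    random.seed(seed)
    energy, progress, stalled = 0.0, 0, 0
    for _ in range(steps):
        energy += random.uniform(0.0, 2.0)      # unreliable harvested input
        if energy >= WORK_PER_STEP + BACKUP_THRESHOLD:
            energy -= WORK_PER_STEP             # enough margin: do real work
            progress += 1
        elif energy >= BACKUP_ENERGY:
            energy -= BACKUP_ENERGY             # low margin: checkpoint to NVM
        else:
            stalled += 1                        # power too low, stall this step
    return progress, stalled

print(simulate())
```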

225 citations

Proceedings ArticleDOI
07 Jun 2015
TL;DR: This paper proposes new metrics for nonvolatile processors that, for the first time, account for energy-harvesting factors, and explores nonvolatile processor design from the circuit level to the system level.
Abstract: Energy harvesting is gaining increasing attention due to its promise of ultra-long operation time without maintenance. However, frequent unpredictable power failures from energy harvesters bring performance and reliability challenges to traditional processors. Nonvolatile processors are a promising solution to this problem due to their zero leakage and efficient backup and restore operations. To optimize nonvolatile processor design, this paper proposes new metrics for nonvolatile processors that, for the first time, take energy-harvesting factors into account. Furthermore, we explore nonvolatile processor design from the circuit level to the system level. A prototype energy-harvesting nonvolatile processor is set up, and experimental results show that the proposed performance metric matches the measured results with less than 6.27% average error. Finally, the energy consumption of the nonvolatile processor is analyzed under different benchmarks.

127 citations

Proceedings ArticleDOI
05 Jun 2016
TL;DR: This work proposes a 2-transistor (2T) FEFET-based nonvolatile memory with separate read and write paths that achieves non-destructive read and lower write power at iso-write speed compared to standard FE-RAM.
Abstract: Ferroelectric FETs (FEFETs) offer intriguing possibilities for the design of low power nonvolatile memories by virtue of their three-terminal structure coupled with the ability of the ferroelectric (FE) material to retain its polarization in the absence of an electric field. Utilizing the distinct features of FEFETs, we propose a 2-transistor (2T) FEFET-based nonvolatile memory with separate read and write paths. With proper co-design at the device, cell and array levels, the proposed design achieves non-destructive read and lower write power at iso-write speed compared to standard FE-RAM. In addition, the FEFET-based memory exhibits high distinguishability with six orders of magnitude difference in the read currents corresponding to the two states. Comparative analysis based on experimentally calibrated models shows significant improvement of access energy-delay. For example, at a fixed write time of 550 ps, the write voltage and energy are 58.5% and 67.7% lower than FERAM, respectively. These benefits are achieved with 2.4 times the area overhead. Further exploration of the proposed FEFET memory in energy harvesting nonvolatile processors shows an average improvement of 27% in forward progress over FERAM.

99 citations

Proceedings ArticleDOI
01 Oct 2019
TL;DR: The authors propose a framework of concurrent adversarial training and weight pruning that enables model compression while preserving adversarial robustness, tackling the dilemma of adversarial training: deep neural networks are well known to be vulnerable to adversarial attacks, implemented by adding crafted perturbations onto benign examples, yet robustness demands significantly larger model capacity.
Abstract: It is well known that deep neural networks (DNNs) are vulnerable to adversarial attacks, which are implemented by adding crafted perturbations onto benign examples. Min-max robust optimization based adversarial training can provide a notion of security against adversarial attacks. However, adversarial robustness requires a significantly larger network capacity than natural training with only benign examples. This paper proposes a framework of concurrent adversarial training and weight pruning that enables model compression while still preserving adversarial robustness, and essentially tackles the dilemma of adversarial training. Furthermore, this work studies two hypotheses about weight pruning in the conventional setting and finds that weight pruning is essential for reducing the network model size in the adversarial setting; training a small model from scratch, even with initialization inherited from the large model, can achieve neither adversarial robustness nor high standard accuracy. Code is available at https://github.com/yeshaokai/Robustness-Aware-Pruning-ADMM.
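A condensed sketch of the general idea, combining PGD adversarial training with a magnitude-based pruning mask; the linked repository uses an ADMM-based formulation, so the functions and hyperparameters below are illustrative rather than the authors' method:

```python
# Sketch: adversarial (PGD) training while keeping a fixed sparsity mask.
# Hyperparameters (eps, alpha, steps, sparsity) are illustrative only.
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=7):
    # Generate adversarial examples within an L-infinity ball around x.
    x_adv = x + torch.empty_like(x).uniform_(-eps, eps)
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = (x_adv + alpha * grad.sign()).detach()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv

def prune_masks(model, sparsity=0.9):
    # Magnitude pruning: zero out the smallest weights in conv/linear layers.
    masks = {}
    for name, p in model.named_parameters():
        if p.dim() > 1:
            k = max(1, int(sparsity * p.numel()))
            threshold = p.abs().flatten().kthvalue(k).values
            masks[name] = (p.abs() > threshold).float()
    return masks

def robust_pruned_step(model, optimizer, x, y, masks):
    # One training step on adversarial examples, then re-apply the masks
    # so pruned weights stay at zero.
    model.train()
    x_adv = pgd_attack(model, x, y)
    optimizer.zero_grad()
    F.cross_entropy(model(x_adv), y).backward()
    optimizer.step()
    with torch.no_grad():
        for name, p in model.named_parameters():
            if name in masks:
                p.mul_(masks[name])
```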

97 citations


Cited by
Journal ArticleDOI
TL;DR: This paper provides a comprehensive survey of knowledge distillation from the perspectives of knowledge categories, training schemes, teacher-student architectures, distillation algorithms, performance comparisons, and applications.
Abstract: In recent years, deep neural networks have been successful in both industry and academia, especially for computer vision tasks. The great success of deep learning is mainly due to its scalability to encode large-scale data and to handle billions of model parameters. However, it is a challenge to deploy these cumbersome deep models on devices with limited resources, e.g., mobile phones and embedded devices, not only because of the high computational complexity but also because of the large storage requirements. To this end, a variety of model compression and acceleration techniques have been developed. As a representative type of model compression and acceleration, knowledge distillation effectively learns a small student model from a large teacher model. It has received rapidly increasing attention from the community. This paper provides a comprehensive survey of knowledge distillation from the perspectives of knowledge categories, training schemes, teacher-student architectures, distillation algorithms, performance comparison and applications. Furthermore, challenges in knowledge distillation are briefly reviewed and directions for future research are discussed.
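For concreteness, a minimal Hinton-style teacher-student distillation loss, the basic setting the survey builds on; the temperature and weighting below are illustrative defaults, not values from the survey:

```python
# Minimal teacher-student knowledge distillation loss (Hinton-style),
# shown only to make the surveyed idea concrete.
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    # Soft targets from the teacher, scaled by temperature T.
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits.detach() / T, dim=1),
                    reduction="batchmean") * (T * T)
    # Hard-label cross entropy on the ground truth.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```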

1,027 citations

Journal ArticleDOI
20 Mar 2020
TL;DR: This article reviews the mainstream compression approaches such as compact models, tensor decomposition, data quantization, and network sparsification, answers the question of how to leverage these methods in the design of neural network accelerators, and presents the state-of-the-art hardware architectures.
Abstract: Domain-specific hardware is becoming a promising topic against the backdrop of slowing improvement in general-purpose processors due to the foreseeable end of Moore's Law. Machine learning, especially deep neural networks (DNNs), has become the most dazzling domain, witnessing successful applications in a wide spectrum of artificial intelligence (AI) tasks. The incomparable accuracy of DNNs is achieved at the cost of heavy memory consumption and high computational complexity, which greatly impedes their deployment in embedded systems. Therefore, the DNN compression concept was naturally proposed and widely used for memory saving and compute acceleration. In the past few years, a tremendous number of compression techniques have sprung up to pursue a satisfactory tradeoff between processing efficiency and application accuracy. Recently, this wave has spread to the design of neural network accelerators for gaining extremely high performance. However, the number of related works is enormous and the reported approaches are quite divergent. This research chaos motivates us to provide a comprehensive survey of the recent advances toward the goal of efficient compression and execution of DNNs without significantly compromising accuracy, involving both the high-level algorithms and their applications in hardware design. In this article, we review the mainstream compression approaches such as compact models, tensor decomposition, data quantization, and network sparsification. We explain their compression principles, evaluation metrics, sensitivity analysis, and joint use. Then, we answer the question of how to leverage these methods in the design of neural network accelerators and present the state-of-the-art hardware architectures. In the end, we discuss several open issues such as fair comparison, testing workloads, automatic compression, influence on security, and framework/hardware-level support, and give promising topics in this field and the possible challenges as well. This article attempts to enable readers to quickly build up a big picture of neural network compression and acceleration, clearly evaluate the various methods, and confidently get started in the right way.
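As a concrete instance of one surveyed technique, a toy symmetric per-tensor int8 weight quantizer; real deployment flows add calibration, per-channel scales, and fused kernels, so this is only a sketch:

```python
# Toy symmetric per-tensor int8 weight quantization (data quantization is one
# of the compression families reviewed in the article).
import torch

def quantize_int8(w: torch.Tensor):
    scale = w.abs().max() / 127.0                     # one scale per tensor
    q = torch.clamp(torch.round(w / scale), -128, 127).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor):
    return q.float() * scale

w = torch.randn(64, 128)
q, s = quantize_int8(w)
print((dequantize(q, s) - w).abs().max())             # quantization error
```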

499 citations

Journal ArticleDOI
01 Jan 2019-Nature
TL;DR: A scalable spintronic logic device operating via spin–orbit transduction and magnetoelectric switching and using advanced quantum materials shows non-volatility and improved performance and energy efficiency compared with CMOS devices.
Abstract: Since the early 1980s, most electronics have relied on the use of complementary metal–oxide–semiconductor (CMOS) transistors. However, the principles of CMOS operation, involving a switchable semiconductor conductance controlled by an insulating gate, have remained largely unchanged, even as transistors are miniaturized to sizes of 10 nanometres. We investigated which dimensionally scalable logic technology beyond CMOS could provide improvements in efficiency and performance for von Neumann architectures and enable growth in emerging computing such as artificial intelligence. Such a computing technology needs to allow progressive miniaturization, reduce switching energy, improve device interconnection and provide a complete logic and memory family. Here we propose a scalable spintronic logic device that operates via spin–orbit transduction (the coupling of an electron’s angular momentum with its linear momentum) combined with magnetoelectric switching. The device uses advanced quantum materials, especially correlated oxides and topological states of matter, for collective switching and detection. We describe progress in magnetoelectric switching and spin–orbit detection of state, and show that in comparison with CMOS technology our device has superior switching energy (by a factor of 10 to 30), lower switching voltage (by a factor of 5) and enhanced logic density (by a factor of 5). In addition, its non-volatility enables ultralow standby power, which is critical to modern computing. The properties of our device indicate that the proposed technology could enable the development of multi-generational computing. A scalable spintronic device operating via spin–orbit transduction and magnetoelectric switching and using advanced quantum materials shows non-volatility and improved performance and energy efficiency compared with CMOS devices.

482 citations