Home
/
Authors
/
Zhijian Liu

Author

Zhijian Liu

Bio: Zhijian Liu is an academic researcher from Massachusetts Institute of Technology. The author has contributed to research in topics: Deep learning & Speedup. The author has an hindex of 21, co-authored 36 publications receiving 2993 citations.

Papers

PDF

Open Access

More filters

Book Chapter•DOI•

AMC: AutoML for Model Compression and Acceleration on Mobile Devices

[...]

Yihui He¹, Ji Lin², Zhijian Liu², Hanrui Wang², Li-Jia Li³, Song Han² - Show less +2 more•Institutions (3)

Carnegie Mellon University¹, Massachusetts Institute of Technology², Google³

08 Sep 2018

TL;DR: This paper proposes AutoML for Model Compression (AMC) which leverages reinforcement learning to efficiently sample the design space and can improve the model compression quality and achieves state-of-the-art model compression results in a fully automated way without any human efforts.

...read moreread less

Abstract: Model compression is an effective technique to efficiently deploy neural network models on mobile devices which have limited computation resources and tight power budgets. Conventional model compression techniques rely on hand-crafted features and require domain experts to explore the large design space trading off among model size, speed, and accuracy, which is usually sub-optimal and time-consuming. In this paper, we propose AutoML for Model Compression (AMC) which leverages reinforcement learning to efficiently sample the design space and can improve the model compression quality. We achieved state-of-the-art model compression results in a fully automated way without any human efforts. Under 4\(\times \) FLOPs reduction, we achieved 2.7% better accuracy than the hand-crafted model compression method for VGG-16 on ImageNet. We applied this automated, push-the-button compression pipeline to MobileNet-V1 and achieved a speedup of 1.53\(\times \) on the GPU (Titan Xp) and 1.95\(\times \) on an Android phone (Google Pixel 1), with negligible loss of accuracy.

...read moreread less

1,086 citations

Posted Content•

AMC: AutoML for Model Compression and Acceleration on Mobile Devices.

[...]

Yihui He¹, Ji Lin², Zhijian Liu², Hanrui Wang², Li-Jia Li³, Song Han² - Show less +2 more•Institutions (3)

Carnegie Mellon University¹, Massachusetts Institute of Technology², Google³

10 Feb 2018-arXiv: Computer Vision and Pattern Recognition

TL;DR: This paper proposed AutoML for Model Compression (AMC) which leverages reinforcement learning to provide the model compression policy, which outperforms conventional rule-based compression policy by having higher compression ratio, better preserving the accuracy and freeing human labor.

...read moreread less

Abstract: Model compression is a critical technique to efficiently deploy neural network models on mobile devices which have limited computation resources and tight power budgets. Conventional model compression techniques rely on hand-crafted heuristics and rule-based policies that require domain experts to explore the large design space trading off among model size, speed, and accuracy, which is usually sub-optimal and time-consuming. In this paper, we propose AutoML for Model Compression (AMC) which leverage reinforcement learning to provide the model compression policy. This learning-based compression policy outperforms conventional rule-based compression policy by having higher compression ratio, better preserving the accuracy and freeing human labor. Under 4x FLOPs reduction, we achieved 2.7% better accuracy than the handcrafted model compression policy for VGG-16 on ImageNet. We applied this automated, push-the-button compression pipeline to MobileNet and achieved 1.81x speedup of measured inference latency on an Android phone and 1.43x speedup on the Titan XP GPU, with only 0.1% loss of ImageNet Top-1 accuracy.

...read moreread less

538 citations

Proceedings Article•DOI•

HAQ: Hardware-Aware Automated Quantization With Mixed Precision

[...]

Kuan Wang¹, Zhijian Liu¹, Yujun Lin¹, Ji Lin¹, Song Han¹ - Show less +1 more•Institutions (1)

Massachusetts Institute of Technology¹

15 Jun 2019

TL;DR: Huang et al. as discussed by the authors introduced the Hardware-Aware Automated Quantization (HAQ) framework which leverages the reinforcement learning to automatically determine the quantization policy, and took the hardware accelerator's feedback in the design loop.

...read moreread less

Abstract: Model quantization is a widely used technique to compress and accelerate deep neural network (DNN) inference. Emergent DNN hardware accelerators begin to support mixed precision (1-8 bits) to further improve the computation efficiency, which raises a great challenge to find the optimal bitwidth for each layer: it requires domain experts to explore the vast design space trading off among accuracy, latency, energy, and model size, which is both time-consuming and sub-optimal. There are plenty of specialized hardware for neural networks, but little research has been done for specialized neural network optimization for a particular hardware architecture. Conventional quantization algorithm ignores the different hardware architectures and quantizes all the layers in a uniform way. In this paper, we introduce the Hardware-Aware Automated Quantization (HAQ) framework which leverages the reinforcement learning to automatically determine the quantization policy, and we take the hardware accelerator's feedback in the design loop. Rather than relying on proxy signals such as FLOPs and model size, we employ a hardware simulator to generate direct feedback signals (latency and energy) to the RL agent. Compared with conventional methods, our framework is fully automated and can specialize the quantization policy for different neural network architectures and hardware architectures. Our framework effectively reduced the latency by 1.4-1.95x and the energy consumption by 1.9x with negligible loss of accuracy compared with the fixed bitwidth (8 bits) quantization. Our framework reveals that the optimal policies on different hardware architectures (i.e., edge and cloud architectures) under different resource constraints (i.e., latency, energy and model size) are drastically different. We interpreted the implication of different quantization policies, which offer insights for both neural network architecture design and hardware architecture design.

...read moreread less

467 citations

Book Chapter•DOI•

Deep Leakage from Gradients

[...]

Ligeng Zhu¹, Zhijian Liu¹, Song Han¹•Institutions (1)

Massachusetts Institute of Technology¹

21 Jun 2019

TL;DR: In this paper, the authors show that they can obtain the private training set from the publicly shared gradients, which is called deep leakage from gradient and practically validate the effectiveness of their algorithm on both computer vision and natural language processing tasks.

...read moreread less

Abstract: Passing gradient is a widely used scheme in modern multi-node learning system (e.g, distributed training, collaborative learning). In a long time, people used to believe that gradients are safe to share: i.e, the training set will not be leaked by gradient sharing. However, in this paper, we show that we can obtain the private training set from the publicly shared gradients. The leaking only takes few gradient steps to process and can obtain the original training set instead of look-alike alternatives. We name this leakage as \textit{deep leakage from gradient} and practically validate the effectiveness of our algorithm on both computer vision and natural language processing tasks. We empirically show that our attack is much stronger than previous approaches and thereby and raise people's awareness to rethink the gradients' safety. We also discuss some possible strategies to defend this deep leakage.

...read moreread less

450 citations

Posted Content•

HAQ: Hardware-Aware Automated Quantization with Mixed Precision

[...]

Kuan Wang¹, Zhijian Liu¹, Yujun Lin¹, Ji Lin¹, Song Han¹ - Show less +1 more•Institutions (1)

Massachusetts Institute of Technology¹

21 Nov 2018-arXiv: Computer Vision and Pattern Recognition

TL;DR: The Hardware-Aware Automated Quantization (HAQ) framework is introduced which leverages the reinforcement learning to automatically determine the quantization policy, and takes the hardware accelerator's feedback in the design loop to generate direct feedback signals to the RL agent.

...read moreread less

Abstract: Model quantization is a widely used technique to compress and accelerate deep neural network (DNN) inference. Emergent DNN hardware accelerators begin to support mixed precision (1-8 bits) to further improve the computation efficiency, which raises a great challenge to find the optimal bitwidth for each layer: it requires domain experts to explore the vast design space trading off among accuracy, latency, energy, and model size, which is both time-consuming and sub-optimal. Conventional quantization algorithm ignores the different hardware architectures and quantizes all the layers in a uniform way. In this paper, we introduce the Hardware-Aware Automated Quantization (HAQ) framework which leverages the reinforcement learning to automatically determine the quantization policy, and we take the hardware accelerator's feedback in the design loop. Rather than relying on proxy signals such as FLOPs and model size, we employ a hardware simulator to generate direct feedback signals (latency and energy) to the RL agent. Compared with conventional methods, our framework is fully automated and can specialize the quantization policy for different neural network architectures and hardware architectures. Our framework effectively reduced the latency by 1.4-1.95x and the energy consumption by 1.9x with negligible loss of accuracy compared with the fixed bitwidth (8 bits) quantization. Our framework reveals that the optimal policies on different hardware architectures (i.e., edge and cloud architectures) under different resource constraints (i.e., latency, energy and model size) are drastically different. We interpreted the implication of different quantization policies, which offer insights for both neural network architecture design and hardware architecture design.

...read moreread less

401 citations

1
2
3
4
…
5
6
7
8
9

Collapse

Cited by

PDF

Open Access

More filters

Posted Content•

EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks

[...]

Mingxing Tan¹, Quoc V. Le¹•Institutions (1)

Google¹

28 May 2019-arXiv: Learning

TL;DR: A new scaling method is proposed that uniformly scales all dimensions of depth/width/resolution using a simple yet highly effective compound coefficient and is demonstrated the effectiveness of this method on scaling up MobileNets and ResNet.

...read moreread less

Abstract: Convolutional Neural Networks (ConvNets) are commonly developed at a fixed resource budget, and then scaled up for better accuracy if more resources are available. In this paper, we systematically study model scaling and identify that carefully balancing network depth, width, and resolution can lead to better performance. Based on this observation, we propose a new scaling method that uniformly scales all dimensions of depth/width/resolution using a simple yet highly effective compound coefficient. We demonstrate the effectiveness of this method on scaling up MobileNets and ResNet. To go even further, we use neural architecture search to design a new baseline network and scale it up to obtain a family of models, called EfficientNets, which achieve much better accuracy and efficiency than previous ConvNets. In particular, our EfficientNet-B7 achieves state-of-the-art 84.3% top-1 accuracy on ImageNet, while being 8.4x smaller and 6.1x faster on inference than the best existing ConvNet. Our EfficientNets also transfer well and achieve state-of-the-art accuracy on CIFAR-100 (91.7%), Flowers (98.8%), and 3 other transfer learning datasets, with an order of magnitude fewer parameters. Source code is at this https URL.

...read moreread less

6,222 citations

Journal Article•

“Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks”の学習報告

[...]

杉山拓海

12 Sep 2017-Computers & Graphics

3,940 citations

Proceedings Article•

EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks

[...]

Mingxing Tan¹, Quoc V. Le¹•Institutions (1)

Google¹

24 May 2019

TL;DR: EfficientNet-B7 as discussed by the authors proposes a new scaling method that uniformly scales all dimensions of depth/width/resolution using a simple yet highly effective compound coefficient, which achieves state-of-the-art accuracy on ImageNet, while being 8.4x smaller and 6.1x faster on inference.

...read moreread less

3,445 citations

The theory of affordances

[...]

博之三嶋

01 Nov 2008

2,686 citations

Proceedings Article•DOI•

MnasNet: Platform-Aware Neural Architecture Search for Mobile

[...]

Mingxing Tan¹, Bo Chen¹, Ruoming Pang¹, Vijay K. Vasudevan¹, Mark Sandler¹, Andrew Howard¹, Quoc V. Le¹ - Show less +3 more•Institutions (1)

Google¹

01 Jun 2019

TL;DR: In this article, the authors propose an automated mobile neural architecture search (MNAS) approach, which explicitly incorporates model latency into the main objective so that the search can identify a model that achieves a good trade-off between accuracy and latency.

...read moreread less

Abstract: Designing convolutional neural networks (CNN) for mobile devices is challenging because mobile models need to be small and fast, yet still accurate. Although significant efforts have been dedicated to design and improve mobile CNNs on all dimensions, it is very difficult to manually balance these trade-offs when there are so many architectural possibilities to consider. In this paper, we propose an automated mobile neural architecture search (MNAS) approach, which explicitly incorporate model latency into the main objective so that the search can identify a model that achieves a good trade-off between accuracy and latency. Unlike previous work, where latency is considered via another, often inaccurate proxy (e.g., FLOPS), our approach directly measures real-world inference latency by executing the model on mobile phones. To further strike the right balance between flexibility and search space size, we propose a novel factorized hierarchical search space that encourages layer diversity throughout the network. Experimental results show that our approach consistently outperforms state-of-the-art mobile CNN models across multiple vision tasks. On the ImageNet classification task, our MnasNet achieves 75.2% top-1 accuracy with 78ms latency on a Pixel phone, which is 1.8× faster than MobileNetV2 with 0.5% higher accuracy and 2.3× faster than NASNet with 1.2% higher accuracy. Our MnasNet also achieves better mAP quality than MobileNets for COCO object detection. Code is at https://github.com/tensorflow/tpu/tree/master/models/official/mnasnet.

...read moreread less

1,841 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200

Collapse