scispace - formally typeset
Search or ask a question
Author

Zhenyu Liao

Other affiliations: University at Buffalo
Bio: Zhenyu Liao is an academic researcher from Boston University. The author has contributed to research in topics: Quantization (signal processing) & Artificial neural network. The author has an hindex of 9, co-authored 20 publications receiving 212 citations. Previous affiliations of Zhenyu Liao include University at Buffalo.

Papers
More filters
Proceedings ArticleDOI
14 Jun 2015
TL;DR: In this paper, a linear-sized spectral sparsifier with regret minimization was constructed in O(n2+e) running time using matrix multiplicative weight updates (MWU), which is an instance of the Follow-the-Regularized Leader framework.
Abstract: In this paper, we provide a novel construction of the linear-sized spectral sparsifiers of Batson, Spielman and Srivastava [11]. While previous constructions required Ω(n4) running time [11, 45], our sparsification routine can be implemented in almost-quadratic running time O(n2+e). The fundamental conceptual novelty of our work is the leveraging of a strong connection between sparsification and a regret minimization problem over density matrices. This connection was known to provide an interpretation of the randomized sparsifiers of Spielman and Srivastava [39] via the application of matrix multiplicative weight updates (MWU) [17, 43]. In this paper, we explain how matrix MWU naturally arises as an instance of the Follow-the-Regularized-Leader framework and generalize this approach to yield a larger class of updates. This new class allows us to accelerate the construction of linear-sized spectral sparsifiers, and give novel insights on the motivation behind Batson, Spielman and Srivastava [11].

66 citations

Proceedings ArticleDOI
14 Jun 2020
TL;DR: In this paper, the authors investigate a novel option to achieve this goal by enabling adaptive bit-widths of weights and activations in the model, and propose a new technique named Switchable Clipping Level (S-CL) to further improve quantized models at the lowest bitwidth.
Abstract: Deep neural networks with adaptive configurations have gained increasing attention due to the instant and flexible deployment of these models on platforms with different resource budgets. In this paper, we investigate a novel option to achieve this goal by enabling adaptive bit-widths of weights and activations in the model. We first examine the benefits and challenges of training quantized model with adaptive bit-widths, and then experiment with several approaches including direct adaptation, progressive training and joint training. We discover that joint training is able to produce comparable performance on the adaptive model as individual models. We also propose a new technique named Switchable Clipping Level (S-CL) to further improve quantized models at the lowest bit-width. With our proposed techniques applied on a bunch of models including MobileNet V1/V2 and ResNet50, we demonstrate that bit-width of weights and activations is a new option for adaptively executable deep neural networks, offering a distinct opportunity for improved accuracy-efficiency trade-off as well as instant adaptation according to the platform constraints in real-world applications.

46 citations

Posted Content
TL;DR: It is demonstrated that bit-width of weights and activations is a new option for adaptively executable deep neural networks, offering a distinct opportunity for improved accuracy-efficiency trade-off as well as instant adaptation according to the platform constraints in real-world applications.
Abstract: Deep neural networks with adaptive configurations have gained increasing attention due to the instant and flexible deployment of these models on platforms with different resource budgets. In this paper, we investigate a novel option to achieve this goal by enabling adaptive bit-widths of weights and activations in the model. We first examine the benefits and challenges of training quantized model with adaptive bit-widths, and then experiment with several approaches including direct adaptation, progressive training and joint training. We discover that joint training is able to produce comparable performance on the adaptive model as individual models. We further propose a new technique named Switchable Clipping Level (S-CL) to further improve quantized models at the lowest bit-width. With our proposed techniques applied on a bunch of models including MobileNet-V1/V2 and ResNet-50, we demonstrate that bit-width of weights and activations is a new option for adaptively executable deep neural networks, offering a distinct opportunity for improved accuracy-efficiency trade-off as well as instant adaptation according to the platform constraints in real-world applications.

46 citations

Journal Article
TL;DR: This work proposes a new formulation that maximizes the $\ell^4$-norm over the orthogonal group, to learn the entire dictionary, and gives a conceptually simple and yet effective algorithm based on `matching, stretching, and projection' (MSP).

41 citations

Posted Content
TL;DR: It is shown that a simple universal perturbation can fool a series of state-of-the-art defenses, and it is verified that regionally homogeneous perturbations can well transfer across different vision tasks.
Abstract: This paper focuses on learning transferable adversarial examples specifically against defense models (models to defense adversarial attacks). In particular, we show that a simple universal perturbation can fool a series of state-of-the-art defenses. Adversarial examples generated by existing attacks are generally hard to transfer to defense models. We observe the property of regional homogeneity in adversarial perturbations and suggest that the defenses are less robust to regionally homogeneous perturbations. Therefore, we propose an effective transforming paradigm and a customized gradient transformer module to transform existing perturbations into regionally homogeneous ones. Without explicitly forcing the perturbations to be universal, we observe that a well-trained gradient transformer module tends to output input-independent gradients (hence universal) benefiting from the under-fitting phenomenon. Thorough experiments demonstrate that our work significantly outperforms the prior art attacking algorithms (either image-dependent or universal ones) by an average improvement of 14.0% when attacking 9 defenses in the transfer-based attack setting. In addition to the cross-model transferability, we also verify that regionally homogeneous perturbations can well transfer across different vision tasks (attacking with the semantic segmentation task and testing on the object detection task). The code is available here: this https URL.

33 citations


Cited by
More filters
Journal ArticleDOI
TL;DR: Van Kampen as mentioned in this paper provides an extensive graduate-level introduction which is clear, cautious, interesting and readable, and could be expected to become an essential part of the library of every physical scientist concerned with problems involving fluctuations and stochastic processes.
Abstract: N G van Kampen 1981 Amsterdam: North-Holland xiv + 419 pp price Dfl 180 This is a book which, at a lower price, could be expected to become an essential part of the library of every physical scientist concerned with problems involving fluctuations and stochastic processes, as well as those who just enjoy a beautifully written book. It provides an extensive graduate-level introduction which is clear, cautious, interesting and readable.

3,647 citations

Journal ArticleDOI
TL;DR: This tutorial-style overview highlights the important role of statistical models in enabling efficient nonconvex optimization with performance guarantees and reviews two contrasting approaches: two-stage algorithms, which consist of a tailored initialization step followed by successive refinement; and global landscape analysis and initialization-free algorithms.
Abstract: Substantial progress has been made recently on developing provably accurate and efficient algorithms for low-rank matrix factorization via nonconvex optimization. While conventional wisdom often takes a dim view of nonconvex optimization algorithms due to their susceptibility to spurious local minima, simple iterative methods such as gradient descent have been remarkably successful in practice. The theoretical footings, however, had been largely lacking until recently. In this tutorial-style overview, we highlight the important role of statistical models in enabling efficient nonconvex optimization with performance guarantees. We review two contrasting approaches: (1) two-stage algorithms, which consist of a tailored initialization step followed by successive refinement; and (2) global landscape analysis and initialization-free algorithms. Several canonical matrix factorization problems are discussed, including but not limited to matrix sensing, phase retrieval, matrix completion, blind deconvolution, and robust principal component analysis. Special care is taken to illustrate the key technical insights underlying their analyses. This article serves as a testament that the integrated consideration of optimization and statistics leads to fruitful research findings.

369 citations

Journal Article
TL;DR: Katyusha as mentioned in this paper is a primal-only stochastic gradient method with negative momentum on top of Nesterov's momentum, which can be incorporated into a variance reduction based algorithm and speed it up, both in terms of sequential and parallel performance.
Abstract: Nesterov's momentum trick is famously known for accelerating gradient descent, and has been proven useful in building fast iterative algorithms. However, in the stochastic setting, counterexamples exist and prevent Nesterov's momentum from providing similar acceleration, even if the underlying problem is convex and finite-sum. We introduce $\mathtt{Katyusha}$, a direct, primal-only stochastic gradient method to fix this issue. In convex finite-sum stochastic optimization, $\mathtt{Katyusha}$ has an optimal accelerated convergence rate, and enjoys an optimal parallel linear speedup in the mini-batch setting. The main ingredient is $\textit{Katyusha momentum}$, a novel "negative momentum" on top of Nesterov's momentum. It can be incorporated into a variance-reduction based algorithm and speed it up, both in terms of $\textit{sequential and parallel}$ performance. Since variance reduction has been successfully applied to a growing list of practical problems, our paper suggests that in each of such cases, one could potentially try to give Katyusha a hug.

273 citations

Proceedings ArticleDOI
19 Jun 2017
TL;DR: Katyusha as discussed by the authors is a primal-only stochastic gradient method with negative momentum on top of Nesterov's momentum that can be incorporated into a variance reduction based algorithm and speed it up.
Abstract: Nesterov's momentum trick is famously known for accelerating gradient descent, and has been proven useful in building fast iterative algorithms. However, in the stochastic setting, counterexamples exist and prevent Nesterov's momentum from providing similar acceleration, even if the underlying problem is convex. We introduce Katyusha, a direct, primal-only stochastic gradient method to fix this issue. It has a provably accelerated convergence rate in convex (off-line) stochastic optimization. The main ingredient is Katyusha momentum, a novel "negative momentum" on top of Nesterov's momentum that can be incorporated into a variance-reduction based algorithm and speed it up. Since variance reduction has been successfully applied to a growing list of practical problems, our paper suggests that in each of such cases, one could potentially give Katyusha a hug.

241 citations