Open Access · Posted Content
Optimization Algorithm Inspired Deep Neural Network Structure Design.
TLDR
In this article, the authors propose the hypothesis that neural network structure design can be inspired by optimization algorithms, and that a faster optimization algorithm may lead to a better neural network architecture. They prove that propagation in a feedforward neural network with the same linear transformation in different layers is equivalent to minimizing some function using the gradient descent algorithm.

Abstract:
Deep neural networks have been one of the dominant machine learning approaches in recent years. Several new network structures have been proposed and achieve better performance than the traditional feedforward neural network structure. Representative examples include the skip connections in ResNet and the dense connections in DenseNet. However, a unified guiding principle for neural network structure design is still lacking. In this paper, we propose the hypothesis that neural network structure design can be inspired by optimization algorithms, and that a faster optimization algorithm may lead to a better neural network structure. Specifically, we prove that propagation in a feedforward neural network with the same linear transformation in different layers is equivalent to minimizing some function using the gradient descent algorithm. Based on this observation, we replace the gradient descent algorithm with the heavy-ball algorithm and Nesterov's accelerated gradient descent algorithm, which are faster and inspire us to design new and better network structures. ResNet and DenseNet can be considered as two special cases of our framework. Numerical experiments on CIFAR-10, CIFAR-100 and ImageNet verify the advantage of our optimization-algorithm-inspired structures over ResNet and DenseNet.
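The correspondence the abstract describes can be sketched numerically: a plain feedforward layer plays the role of one gradient-descent step, and adding a heavy-ball momentum term produces a skip-connection structure. The following NumPy sketch is illustrative only — the ReLU activation, layer width, and momentum coefficient are assumptions, not the paper's exact formulation.

```python
import numpy as np

def act(z):
    # generic elementwise nonlinearity (ReLU chosen for illustration)
    return np.maximum(z, 0.0)

def feedforward_layer(x, W):
    # plain propagation x_{k+1} = act(W x_k); the paper shows this matches
    # one gradient-descent step on some underlying objective
    return act(W @ x)

def heavy_ball_layer(x, x_prev, W, beta=0.9):
    # heavy-ball-inspired update: the momentum term beta * (x_k - x_{k-1})
    # introduces a skip connection between non-adjacent layers
    return act(W @ x) + beta * (x - x_prev)

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 4)) * 0.1
x_prev = np.zeros(4)
x = rng.standard_normal(4)
for _ in range(3):
    # unroll three "layers" of the heavy-ball-inspired network
    x, x_prev = heavy_ball_layer(x, x_prev, W), x
```

With `beta = 0` the momentum term vanishes and the update reduces to the plain feedforward layer, which is how the framework recovers standard architectures as special cases.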
Citations
Journal ArticleDOI
A Review on Deep Learning in Medical Image Reconstruction
Haimiao Zhang, Bin Dong +1 more
TL;DR: In this paper, the authors provide a conceptual review of some recent works on deep modeling from the unrolling dynamics viewpoint, which stimulate new designs of neural network architectures with inspirations from optimization algorithms and numerical differential equations.
Posted Content
Simple and Deep Graph Convolutional Networks
TL;DR: This article proposes GCNII, an extension of the vanilla GCN model with two simple yet effective techniques, initial residual and identity mapping, and provides theoretical and empirical evidence that the two techniques effectively relieve the problem of over-smoothing.
Posted Content
Differentiable Linearized ADMM
TL;DR: This paper is the first to provide a convergence analysis for a learning-based optimization method on constrained problems, and rigorously proves that there exists a set of learnable parameters for D-LADMM to generate globally converged solutions.
Posted Content
ISTA-NAS: Efficient and Consistent Neural Architecture Search by Sparse Coding.
TL;DR: In this paper, the authors formulate neural architecture search as a sparse coding problem and propose a differentiable search on a compressed lower-dimensional space that has the same validation loss as the original sparse solution space.
Journal ArticleDOI
Do We Really Need a Learnable Classifier at the End of Deep Neural Network?
TL;DR: The analytical work based on the layer-peeled model indicates that the feature learning with a fixed ETF classifier naturally leads to the neural collapse state even when the dataset is imbalanced among classes.
References
Proceedings Article
Very Deep Convolutional Networks for Large-Scale Image Recognition
Karen Simonyan, Andrew Zisserman +1 more
TL;DR: In this paper, the authors investigated the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting and showed that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 layers.
Posted Content
Deep Residual Learning for Image Recognition
TL;DR: This work presents a residual learning framework to ease the training of networks that are substantially deeper than those used previously, and provides comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth.
Journal ArticleDOI
Gradient-based learning applied to document recognition
Yann LeCun, Léon Bottou, Yoshua Bengio, Patrick Haffner
TL;DR: In this article, a graph transformer network (GTN) is proposed for handwritten character recognition, which can be used to synthesize a complex decision surface that can classify high-dimensional patterns, such as handwritten characters.
Proceedings ArticleDOI
Going deeper with convolutions
Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, Andrew Rabinovich
TL;DR: Inception, the deep convolutional neural network architecture proposed in this paper, achieved the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14).
Proceedings ArticleDOI
Fully convolutional networks for semantic segmentation
TL;DR: The key insight is to build “fully convolutional” networks that take input of arbitrary size and produce correspondingly-sized output with efficient inference and learning.