Open Access · Posted Content

Optimization Algorithm Inspired Deep Neural Network Structure Design.

TLDR
In this article, the authors propose the hypothesis that neural network structure design can be inspired by optimization algorithms and that a faster optimization algorithm may lead to a better neural network architecture. They prove that the propagation in a feedforward neural network with the same linear transformation in different layers is equivalent to minimizing some function using the gradient descent algorithm.
Abstract
Deep neural networks have been one of the dominant machine learning approaches in recent years. Several new network structures have been proposed and achieve better performance than the traditional feedforward neural network structure; representative ones include the skip connection structure in ResNet and the dense connection structure in DenseNet. However, a unified guidance for neural network structure design is still lacking. In this paper, we propose the hypothesis that the neural network structure design can be inspired by optimization algorithms and that a faster optimization algorithm may lead to a better neural network structure. Specifically, we prove that the propagation in the feedforward neural network with the same linear transformation in different layers is equivalent to minimizing some function using the gradient descent algorithm. Based on this observation, we replace the gradient descent algorithm with the heavy ball algorithm and Nesterov's accelerated gradient descent algorithm, which are faster and inspire us to design new and better network structures. ResNet and DenseNet can be considered as two special cases of our framework. Numerical experiments on CIFAR-10, CIFAR-100 and ImageNet verify the advantage of our optimization algorithm inspired structures over ResNet and DenseNet.
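To make the hypothesis concrete, the following minimal NumPy sketch (illustrative only, not the paper's exact construction; the momentum coefficient and extrapolation weight are arbitrary choices) contrasts the plain feedforward update with heavy-ball- and Nesterov-style updates. The extra terms involving x_k - x_{k-1} are exactly the kind of cross-layer connections that resemble skip connections.

import numpy as np

# phi plays the role of the per-layer nonlinearity; W is the shared
# linear transformation used in every layer.
rng = np.random.default_rng(0)
d, depth = 16, 8
W = rng.standard_normal((d, d)) / np.sqrt(d)
phi = np.tanh

def feedforward(x):
    """Plain propagation x_{k+1} = phi(W x_k): the gradient-descent analogue."""
    for _ in range(depth):
        x = phi(W @ x)
    return x

def heavy_ball(x, beta=0.9):
    """Adds a momentum term beta * (x_k - x_{k-1}), yielding skip-like paths."""
    x_prev = x.copy()
    for _ in range(depth):
        x_new = phi(W @ x) + beta * (x - x_prev)
        x_prev, x = x, x_new
    return x

def nesterov(x):
    """Applies the transformation at an extrapolated point y_k."""
    x_prev = x.copy()
    for k in range(1, depth + 1):
        theta = (k - 1) / (k + 2)      # standard Nesterov extrapolation weight
        y = x + theta * (x - x_prev)
        x_prev, x = x, phi(W @ y)
    return x

x0 = rng.standard_normal(d)
print(feedforward(x0)[:3], heavy_ball(x0)[:3], nesterov(x0)[:3])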


Citations
Journal Article

A Review on Deep Learning in Medical Image Reconstruction

TL;DR: In this paper, the authors provide a conceptual review of some recent works on deep modeling from the unrolling dynamics viewpoint, which stimulate new designs of neural network architectures with inspirations from optimization algorithms and numerical differential equations.
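As a toy illustration of the unrolling viewpoint (our own sketch, not taken from the review), K iterations of gradient descent on a least-squares objective can be read as a K-layer network; making the per-layer step sizes learnable parameters turns the algorithm into an architecture.

import numpy as np

def unrolled_gd(A, b, step_sizes):
    # One loop iteration corresponds to one "layer"; the step size t is the
    # layer parameter that a learned variant would train.
    x = np.zeros(A.shape[1])
    for t in step_sizes:
        x = x - t * A.T @ (A @ x - b)   # gradient step on 0.5 * ||A x - b||^2
    return x

rng = np.random.default_rng(1)
A, b = rng.standard_normal((20, 5)), rng.standard_normal(20)
print(unrolled_gd(A, b, step_sizes=[0.05] * 10))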
Posted Content

Simple and Deep Graph Convolutional Networks

TL;DR: This article proposes GCNII, an extension of the vanilla GCN model with two simple yet effective techniques, initial residual and identity mapping, and provides theoretical and empirical evidence that the two techniques effectively relieve the problem of over-smoothing.
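For reference, a hedged sketch of one GCNII-style layer, assuming the commonly cited update H_{l+1} = relu(((1 - alpha) * P_hat @ H_l + alpha * H_0) @ ((1 - beta) * I + beta * W_l)); the coefficient values and names below are illustrative.

import numpy as np

def gcnii_layer(P_hat, H_l, H_0, W_l, alpha=0.1, beta=0.5):
    # P_hat: normalized adjacency with self-loops (n x n)
    # H_l: current features (n x d), H_0: initial features (n x d), W_l: weights (d x d)
    d = W_l.shape[0]
    residual = (1 - alpha) * P_hat @ H_l + alpha * H_0    # initial residual
    mapping = (1 - beta) * np.eye(d) + beta * W_l         # identity mapping
    return np.maximum(residual @ mapping, 0.0)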
Posted Content

Differentiable Linearized ADMM

TL;DR: This paper is the first to provide a convergence analysis for a learning-based optimization method on constrained problems, and it rigorously proves that there exists a set of learnable parameters for D-LADMM to generate globally converged solutions.
Posted Content

ISTA-NAS: Efficient and Consistent Neural Architecture Search by Sparse Coding.

TL;DR: In this paper, the authors formulate neural architecture search as a sparse coding problem and propose a differentiable search on a compressed lower-dimensional space that has the same validation loss as the original sparse solution space.
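The architecture-search machinery itself is beyond a snippet, but the sparse-coding workhorse the method is named after, ISTA, is easy to state (a generic sketch, not the paper's search procedure): a gradient step on the data-fitting term followed by soft-thresholding, which drives most coefficients to zero.

import numpy as np

def ista(A, b, lam=0.1, step=None, iters=200):
    if step is None:
        step = 1.0 / np.linalg.norm(A, 2) ** 2            # 1 / Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        g = x - step * A.T @ (A @ x - b)                  # gradient step on 0.5 * ||A x - b||^2
        x = np.sign(g) * np.maximum(np.abs(g) - step * lam, 0.0)   # soft-thresholding
    return x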
Journal Article

Do We Really Need a Learnable Classifier at the End of Deep Neural Network?

TL;DR: The analytical work based on the layer-peeled model indicates that feature learning with a fixed ETF classifier naturally leads to the neural collapse state even when the dataset is imbalanced among classes.
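A hedged sketch of the fixed classifier in question, assuming the standard simplex-ETF construction M = sqrt(K/(K-1)) * U @ (I - ones/K) with U having orthonormal columns; the resulting class vectors have unit norm and pairwise cosine -1/(K-1), and the matrix is kept frozen during training.

import numpy as np

def simplex_etf(d, K, seed=0):
    rng = np.random.default_rng(seed)
    U, _ = np.linalg.qr(rng.standard_normal((d, K)))      # d x K with orthonormal columns
    M = np.sqrt(K / (K - 1)) * U @ (np.eye(K) - np.ones((K, K)) / K)
    return M                                              # d x K fixed classifier

W = simplex_etf(d=64, K=10)
print(np.round(W.T @ W, 2))   # ~1 on the diagonal, ~-1/9 off the diagonal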
References
Proceedings Article

Very Deep Convolutional Networks for Large-Scale Image Recognition

TL;DR: In this paper, the authors investigated the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting and showed that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 layers.
Posted Content

Deep Residual Learning for Image Recognition

TL;DR: This work presents a residual learning framework to ease the training of networks that are substantially deeper than those used previously, and provides comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth.
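In the same spirit as the framework above, the core of a residual block can be sketched as y = x + F(x); the snippet below uses plain matrices rather than the convolution-plus-batch-normalization blocks of the actual ResNet.

import numpy as np

def residual_block(x, W1, W2):
    h = np.maximum(W1 @ x, 0.0)   # first transformation + ReLU
    return x + W2 @ h             # identity shortcut added to the learned residual F(x)

rng = np.random.default_rng(2)
d = 32
x = rng.standard_normal(d)
W1 = rng.standard_normal((d, d)) / np.sqrt(d)
W2 = rng.standard_normal((d, d)) / np.sqrt(d)
print(residual_block(x, W1, W2)[:3])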
Journal Article

Gradient-based learning applied to document recognition

TL;DR: In this article, a graph transformer network (GTN) is proposed for handwritten character recognition; it can be used to synthesize a complex decision surface that classifies high-dimensional patterns such as handwritten characters.
Proceedings Article

Going deeper with convolutions

TL;DR: Inception, the deep convolutional neural network architecture proposed in this paper, achieves a new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14).
Proceedings Article

Fully convolutional networks for semantic segmentation

TL;DR: The key insight is to build “fully convolutional” networks that take input of arbitrary size and produce correspondingly-sized output with efficient inference and learning.