Open Access · Journal Article · DOI

Scalable training of artificial neural networks with adaptive sparse connectivity inspired by network science

TLDR
This article proposes sparse evolutionary training (SET) of artificial neural networks, an algorithm which evolves an initial sparse topology (Erdős–Rényi random graph) connecting two consecutive layers of neurons into a scale-free topology during learning.
Abstract
Through the success of deep learning in various domains, artificial neural networks are currently among the most used artificial intelligence methods. Taking inspiration from the network properties of biological neural networks (e.g. sparsity, scale-freeness), we argue that (contrary to general practice) artificial neural networks, too, should not have fully-connected layers. Here we propose sparse evolutionary training of artificial neural networks, an algorithm which evolves an initial sparse topology (Erdős–Rényi random graph) of two consecutive layers of neurons into a scale-free topology, during learning. Our method replaces artificial neural networks' fully-connected layers with sparse ones before training, quadratically reducing the number of parameters with no decrease in accuracy. We demonstrate our claims on restricted Boltzmann machines, multi-layer perceptrons, and convolutional neural networks for unsupervised and supervised learning on 15 datasets. Our approach has the potential to enable artificial neural networks to scale up beyond what is currently possible.
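
The prune-and-regrow procedure summarized in the abstract is simple enough to sketch directly. The following is a minimal NumPy illustration, not the authors' implementation: it initializes an Erdős–Rényi sparse mask between two layers and, between training epochs, removes the fraction zeta of the active weights closest to zero and regrows the same number of connections at random. The names epsilon and zeta follow the paper's notation; the layer sizes, initialization scale, and hyperparameter values here are arbitrary choices for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def erdos_renyi_mask(n_in, n_out, epsilon=20):
    """Sparse binary mask with density ~ epsilon * (n_in + n_out) / (n_in * n_out)."""
    p = min(1.0, epsilon * (n_in + n_out) / (n_in * n_out))
    return (rng.random((n_in, n_out)) < p).astype(np.float32)

def evolve_connectivity(weights, mask, zeta=0.3):
    """One rewiring step: drop the fraction zeta of the active weights closest to
    zero, then regrow the same number of connections at random positions."""
    active = np.flatnonzero(mask)
    k = int(zeta * active.size)
    # indices of the weakest (smallest-magnitude) active connections
    weakest = active[np.argsort(np.abs(weights.ravel()[active]))[:k]]
    mask.ravel()[weakest] = 0.0
    weights.ravel()[weakest] = 0.0
    # regrow k connections uniformly at random among currently inactive positions
    inactive = np.flatnonzero(mask == 0.0)
    regrow = rng.choice(inactive, size=k, replace=False)
    mask.ravel()[regrow] = 1.0
    weights.ravel()[regrow] = rng.normal(0.0, 0.01, size=k)
    return weights * mask, mask

# Example usage between ordinary training epochs, for one sparse layer:
mask = erdos_renyi_mask(784, 300)
weights = rng.normal(0.0, 0.01, size=(784, 300)).astype(np.float32) * mask
weights, mask = evolve_connectivity(weights, mask)
```

Repeating this step after each epoch is what, according to the abstract, gradually reshapes the initially random topology into a scale-free one while keeping the parameter count a small fraction of the fully-connected layer.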



Citations
Posted Content

The State of Sparsity in Deep Neural Networks

TL;DR: It is shown that unstructured sparse architectures learned through pruning cannot be trained from scratch to the same test set performance as a model trained with joint sparsification and optimization, and the need for large-scale benchmarks in the field of model compression is highlighted.
Posted Content

TabNet: Attentive Interpretable Tabular Learning

TL;DR: It is demonstrated that TabNet outperforms other neural network and decision tree variants on a wide range of non-performance-saturated tabular datasets and yields interpretable feature attributions plus insights into the global model behavior.
Proceedings Article

Pruning neural networks without any data by iteratively conserving synaptic flow

TL;DR: The data-agnostic pruning algorithm challenges the existing paradigm that, at initialization, data must be used to quantify which synapses are important, and consistently competes with or outperforms existing state-of-the-art pruning algorithms at initialization over a range of models, datasets, and sparsity constraints.
Posted Content

Picking Winning Tickets Before Training by Preserving Gradient Flow

TL;DR: This work argues that efficient training requires preserving the gradient flow through the network, and proposes a simple but effective pruning criterion called Gradient Signal Preservation (GraSP), which achieves significantly better performance than the baseline at extreme sparsity levels.
Posted Content

Sparse Networks from Scratch: Faster Training without Losing Performance

TL;DR: This work develops sparse momentum, an algorithm which uses exponentially smoothed gradients (momentum) to identify layers and weights which reduce the error efficiently and shows that the benefits of momentum redistribution and growth increase with the depth and size of the network.
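The last entry above names its growth criterion: exponentially smoothed gradients (momentum) decide where new connections appear. As a rough, hedged sketch of that idea for a single layer (not the cited paper's implementation, which additionally redistributes pruned weights across layers), with `momentum` standing in for the optimizer's momentum buffer:

```python
import numpy as np

rng = np.random.default_rng(0)

def momentum_rewire(weights, mask, momentum, prune_frac=0.2):
    """Prune the active weights closest to zero, then grow the same number of
    currently inactive connections where the momentum magnitude is largest."""
    active = np.flatnonzero(mask)
    k = int(prune_frac * active.size)
    drop = active[np.argsort(np.abs(weights.ravel()[active]))[:k]]
    mask.ravel()[drop] = 0.0
    weights.ravel()[drop] = 0.0
    inactive = np.flatnonzero(mask == 0.0)
    # largest-momentum inactive positions, in descending order
    grow = inactive[np.argsort(np.abs(momentum.ravel()[inactive]))[::-1][:k]]
    mask.ravel()[grow] = 1.0   # grown weights start at zero and are learned from here on
    return weights * mask, mask

# Example usage between epochs (shapes and sparsity level are arbitrary):
m = (rng.random((300, 100)) < 0.1).astype(np.float32)
w = rng.normal(0.0, 0.01, (300, 100)) * m
mom = rng.normal(0.0, 0.001, (300, 100))  # stand-in for the optimizer's momentum buffer
w, m = momentum_rewire(w, m, mom)
```

The contrast with the random regrowth sketched under the abstract is the point: here the gradient history, rather than chance, decides which dormant connections are worth activating.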
References
Journal ArticleDOI

Deep learning

TL;DR: Deep learning is making major advances in solving problems that have resisted the best attempts of the artificial intelligence community for many years, and will have many more successes in the near future because it requires very little engineering by hand and can easily take advantage of increases in the amount of available computation and data.
Journal ArticleDOI

Gradient-based learning applied to document recognition

TL;DR: In this article, gradient-based learning is used to synthesize a complex decision surface that can classify high-dimensional patterns, such as handwritten characters, and a new learning paradigm, graph transformer networks (GTNs), is proposed for globally training multi-module document recognition systems.
Journal ArticleDOI

Collective dynamics of small-world networks

TL;DR: Simple models of networks that can be tuned through the middle ground between regular and random are explored: regular networks 'rewired' to introduce increasing amounts of disorder. These systems can be highly clustered, like regular lattices, yet have small characteristic path lengths, like random graphs.
Book

Deep Learning

TL;DR: Deep learning is a form of machine learning that enables computers to learn from experience and understand the world in terms of a hierarchy of concepts; it is used in many applications such as natural language processing, speech recognition, computer vision, online recommendation systems, bioinformatics, and videogames.
Journal ArticleDOI

Emergence of Scaling in Random Networks

TL;DR: A model based on two ingredients, growth and preferential attachment, reproduces the observed stationary scale-free distributions, which indicates that the development of large networks is governed by robust self-organizing phenomena that go beyond the particulars of the individual systems.
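
The growth-plus-preferential-attachment process in this last reference is the classic route to the scale-free topologies that the evolved sparse layers of this paper are reported to approach. A minimal sketch of that process (assumed standard Barabási–Albert form; the node count and attachment parameter m are arbitrary here):

```python
import numpy as np

rng = np.random.default_rng(0)

def barabasi_albert_degrees(n_nodes=1000, m=2):
    """Degree sequence of a graph grown by preferential attachment."""
    degrees = np.zeros(n_nodes, dtype=np.int64)
    targets = list(range(m))   # the first new node attaches to the m seed nodes
    repeated = []              # every node appears here once per unit of degree
    for new in range(m, n_nodes):
        for t in targets:
            degrees[t] += 1
        degrees[new] += m
        repeated.extend(targets)
        repeated.extend([new] * m)
        # next round: sample m distinct nodes with probability proportional to degree
        chosen = set()
        while len(chosen) < m:
            chosen.add(repeated[rng.integers(len(repeated))])
        targets = list(chosen)
    return degrees

deg = barabasi_albert_degrees()
print("max degree:", deg.max(), "mean degree:", round(float(deg.mean()), 2))
# The heavy tail (a few hubs with very high degree) is the scale-free signature.
```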