Journal ArticleDOI

Pruning algorithms-a survey

R. Reed1
01 Sep 1993-IEEE Transactions on Neural Networks (IEEE Trans Neural Netw)-Vol. 4, Iss: 5, pp 740-747
TL;DR: The approach taken by the methods described here is to train a network that is larger than necessary and then remove the parts that are not needed.
Abstract: A rule of thumb for obtaining good generalization in systems trained by examples is that one should use the smallest system that will fit the data. Unfortunately, it usually is not obvious what size is best; a system that is too small will not be able to learn the data while one that is just big enough may learn very slowly and be very sensitive to initial conditions and learning parameters. This paper is a survey of neural network pruning algorithms. The approach taken by the methods described here is to train a network that is larger than necessary and then remove the parts that are not needed.
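
As a concrete illustration of the train-then-prune idea summarised above, the following sketch removes the smallest-magnitude weights from a trained weight matrix. This is a generic magnitude-pruning heuristic written for illustration, not one of the specific algorithms covered by the survey; the pruning fraction and the absence of retraining are simplifying assumptions.

import numpy as np

def magnitude_prune(weights: np.ndarray, fraction: float = 0.5) -> np.ndarray:
    """Zero out roughly the given fraction of smallest-magnitude weights.

    A minimal sketch of 'train a network larger than necessary, then
    remove the parts that are not needed'; practical pruning methods use
    better sensitivity estimates and retrain after each pruning step.
    """
    flat = np.abs(weights).ravel()
    k = int(fraction * flat.size)
    if k == 0:
        return weights.copy()
    threshold = np.partition(flat, k - 1)[k - 1]   # k-th smallest magnitude
    pruned = weights.copy()
    pruned[np.abs(pruned) <= threshold] = 0.0      # ties may prune slightly more than k weights
    return pruned

# Example: prune half of a randomly initialised 4x3 weight matrix.
w = np.random.randn(4, 3)
print(magnitude_prune(w, fraction=0.5))
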
Citations
Journal ArticleDOI
TL;DR: A new neural network model, called the graph neural network (GNN) model, extends existing neural network methods for processing data represented in graph domains and implements a function τ(G, n) ∈ ℝ^m that maps a graph G and one of its nodes n into an m-dimensional Euclidean space.
Abstract: Many underlying relationships among data in several areas of science and engineering, e.g., computer vision, molecular chemistry, molecular biology, pattern recognition, and data mining, can be represented in terms of graphs. In this paper, we propose a new neural network model, called graph neural network (GNN) model, that extends existing neural network methods for processing the data represented in graph domains. This GNN model, which can directly process most of the practically useful types of graphs, e.g., acyclic, cyclic, directed, and undirected, implements a function τ(G, n) ∈ ℝ^m that maps a graph G and one of its nodes n into an m-dimensional Euclidean space. A supervised learning algorithm is derived to estimate the parameters of the proposed GNN model. The computational cost of the proposed algorithm is also considered. Some experimental results are shown to validate the proposed learning algorithm, and to demonstrate its generalization capabilities.
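
To make the mapping τ(G, n) ∈ ℝ^m concrete, here is a small, hypothetical message-passing sketch in the spirit of the GNN model: each node's state is repeatedly updated from its neighbours' states, and the final state of node n serves as its m-dimensional embedding. The transition function, the number of iterations and the random weights below are illustrative assumptions, not the parameterisation used in the paper.

import numpy as np

def gnn_embed(adjacency: np.ndarray, features: np.ndarray, m: int = 8,
              iterations: int = 10, seed: int = 0) -> np.ndarray:
    """Toy sketch of tau(G, n): returns one m-dimensional state per node.

    adjacency: (n_nodes, n_nodes) 0/1 matrix; features: (n_nodes, d).
    The update x_v <- tanh(W_f f_v + W_x * mean of neighbour states) is an
    illustrative transition function, not the one defined in the paper.
    """
    rng = np.random.default_rng(seed)
    n_nodes, d = features.shape
    W_f = rng.normal(scale=0.1, size=(m, d))
    W_x = rng.normal(scale=0.1, size=(m, m))
    states = np.zeros((n_nodes, m))
    degree = adjacency.sum(axis=1, keepdims=True).clip(min=1)
    for _ in range(iterations):                    # iterate towards a fixed point
        neighbour_mean = (adjacency @ states) / degree
        states = np.tanh(features @ W_f.T + neighbour_mean @ W_x.T)
    return states                                  # row n is the embedding of node n

# Example: a 3-node path graph with 2-dimensional node features.
A = np.array([[0., 1., 0.], [1., 0., 1.], [0., 1., 0.]])
F = np.array([[1., 0.], [0., 1.], [1., 1.]])
print(gnn_embed(A, F, m=4).shape)                  # (3, 4)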

5,701 citations


Cites background from "Pruning algorithms-a survey"

  • ...Thus, second-order learning algorithms [62], pruning [63], and growing learning algorithms [64]–[66] designed for static networks cannot be directly applied to GNNs....


Book
01 Jan 1996
TL;DR: Professor Ripley brings together two crucial ideas in pattern recognition, statistical methods and machine learning via neural networks, in this self-contained account.
Abstract: From the Publisher: Pattern recognition has long been studied in relation to many different (and mainly unrelated) applications, such as remote sensing, computer vision, space research, and medical imaging. In this book Professor Ripley brings together two crucial ideas in pattern recognition: statistical methods and machine learning via neural networks. Unifying principles are brought to the fore, and the author gives an overview of the state of the subject. Many examples are included to illustrate real problems in pattern recognition and how to overcome them. This is a self-contained account, ideal both as an introduction for non-specialist readers and as a handbook for the more expert reader.

5,632 citations

Journal ArticleDOI
TL;DR: In this paper, the authors present a state-of-the-art survey of ANN applications in forecasting and provide a synthesis of published research in this area, insights on ANN modeling issues, and future research directions.

3,680 citations

Journal ArticleDOI
TL;DR: This paper presents a general introduction and discussion of recent applications of the multilayer perceptron, one type of artificial neural network, in the atmospheric sciences.

2,389 citations

Journal ArticleDOI
TL;DR: The steps that should be followed in the development of artificial neural network models are outlined, including the choice of performance criteria, the division and pre-processing of the available data, the determination of appropriate model inputs and network architecture, optimisation of the connection weights (training) and model validation.
Abstract: Artificial Neural Networks (ANNs) are being used increasingly to predict and forecast water resources variables. In this paper, the steps that should be followed in the development of such models are outlined. These include the choice of performance criteria, the division and pre-processing of the available data, the determination of appropriate model inputs and network architecture, optimisation of the connection weights (training) and model validation. The options available to modellers at each of these steps are discussed and the issues that should be considered are highlighted. A review of 43 papers dealing with the use of neural network models for the prediction and forecasting of water resources variables is undertaken in terms of the modelling process adopted. In all but two of the papers reviewed, feedforward networks are used. The vast majority of these networks are trained using the backpropagation algorithm. Issues in relation to the optimal division of the available data, data pre-processing and the choice of appropriate model inputs are seldom considered. In addition, the process of choosing appropriate stopping criteria and optimising network geometry and internal network parameters is generally described poorly or carried out inadequately. All of the above factors can result in non-optimal model performance and an inability to draw meaningful comparisons between different models. Future research efforts should be directed towards the development of guidelines which assist with the development of ANN models and the choice of when ANNs should be used in preference to alternative approaches, the assessment of methods for extracting the knowledge that is contained in the connection weights of trained ANNs and the incorporation of uncertainty into ANN models.
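
The development steps listed in the abstract can be sketched end to end in a few dozen lines; the version below divides the data, standardises the inputs with training-set statistics, trains a single-hidden-layer feedforward network by gradient descent, and stops on a validation criterion. Network size, learning rate and patience are arbitrary illustrative choices, not recommendations from the paper.

import numpy as np

def develop_ann_model(X, y, hidden=8, val_fraction=0.2, max_epochs=500,
                      lr=0.01, patience=20, seed=0):
    """Sketch of the steps above: data division, pre-processing, architecture,
    weight optimisation and validation-based stopping (all choices illustrative)."""
    rng = np.random.default_rng(seed)
    # 1. Data division.
    idx = rng.permutation(len(X))
    n_val = int(val_fraction * len(X))
    val, train = idx[:n_val], idx[n_val:]
    # 2. Pre-processing: standardise with statistics from the training set only.
    mu, sd = X[train].mean(axis=0), X[train].std(axis=0) + 1e-8
    Xs = (X - mu) / sd
    # 3. Architecture: one hidden layer of tanh units, linear output.
    W1 = rng.normal(scale=0.1, size=(X.shape[1], hidden))
    W2 = rng.normal(scale=0.1, size=(hidden, 1))
    best, best_err, wait = (W1, W2), np.inf, 0
    for _ in range(max_epochs):
        # 4. Optimisation of the connection weights (plain batch gradient descent).
        H = np.tanh(Xs[train] @ W1)
        err = H @ W2 - y[train].reshape(-1, 1)
        gW2 = H.T @ err / len(train)
        gW1 = Xs[train].T @ ((err @ W2.T) * (1 - H ** 2)) / len(train)
        W1, W2 = W1 - lr * gW1, W2 - lr * gW2
        # 5. Validation: track the error on held-out data and stop when it stalls.
        val_err = np.mean((np.tanh(Xs[val] @ W1) @ W2 - y[val].reshape(-1, 1)) ** 2)
        if val_err < best_err:
            best, best_err, wait = (W1.copy(), W2.copy()), val_err, 0
        else:
            wait += 1
            if wait > patience:
                break
    return best, best_err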

2,181 citations


Cites methods from "Pruning algorithms-a survey"

  • ...A review of pruning algorithms is given by Reed (1993)....


References
Proceedings ArticleDOI
05 Nov 1984
TL;DR: This paper regards learning as the phenomenon of knowledge acquisition in the absence of explicit programming, and gives a precise methodology for studying this phenomenon from a computational viewpoint.
Abstract: Humans appear to be able to learn new concepts without needing to be programmed explicitly in any conventional sense. In this paper we regard learning as the phenomenon of knowledge acquisition in the absence of explicit programming. We give a precise methodology for studying this phenomenon from a computational viewpoint. It consists of choosing an appropriate information gathering mechanism, the learning protocol, and exploring the class of concepts that can be learnt using it in a reasonable (polynomial) number of steps. We find that inherent algorithmic complexity appears to set serious limits to the range of concepts that can be so learnt. The methodology and results suggest concrete principles for designing realistic learning systems.
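
The "reasonable (polynomial) number of steps" requirement is what later became known as PAC (probably approximately correct) learnability. In standard modern notation (not the paper's own), a concept class C is learnable if for every target c ∈ C, every distribution D and every ε, δ > 0 the learner, given a sample S of size m drawn from D, outputs a hypothesis h_S satisfying

\[
\Pr_{S \sim D^{m}}\big[\operatorname{err}_{D}(h_S) \le \varepsilon\big] \;\ge\; 1 - \delta,
\qquad
m \;\le\; \operatorname{poly}\!\left(\tfrac{1}{\varepsilon}, \tfrac{1}{\delta}, \operatorname{size}(c)\right).
\]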

5,311 citations

Journal ArticleDOI
TL;DR: This paper shows that the essential condition for distribution-free learnability is finiteness of the Vapnik-Chervonenkis dimension, a simple combinatorial parameter of the class of concepts to be learned.
Abstract: Valiant's learnability model is extended to learning classes of concepts defined by regions in Euclidean space En. The methods in this paper lead to a unified treatment of some of Valiant's results, along with previous results on distribution-free convergence of certain pattern recognition algorithms. It is shown that the essential condition for distribution-free learnability is finiteness of the Vapnik-Chervonenkis dimension, a simple combinatorial parameter of the class of concepts to be learned. Using this parameter, the complexity and closure properties of learnable classes are analyzed, and the necessary and sufficient conditions are provided for feasible learnability.
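
For reference, the finiteness condition can be read through the standard distribution-free sample-complexity bound: if the concept class has VC dimension d, then (in the usual textbook form, with constants not taken from this abstract)

\[
m \;=\; O\!\left(\frac{1}{\varepsilon}\left(d \log\frac{1}{\varepsilon} + \log\frac{1}{\delta}\right)\right)
\]

examples suffice for any consistent learner to achieve error at most ε with probability at least 1 - δ, while an infinite VC dimension rules out any such finite bound.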

1,967 citations

Journal ArticleDOI
01 Jan 1988
TL;DR: It is shown that if m ≥ O(W/ε log N/ε) random examples can be loaded on a feedforward network of linear threshold functions with N nodes and W weights, so that at least a fraction 1 - ε/2 of the examples are correctly classified, then one has confidence approaching certainty that the network will correctly classify a fraction 1 - ε of future test examples drawn from the same distribution.
Abstract: We address the question of when a network can be expected to generalize from m random training examples chosen from some arbitrary probability distribution, assuming that future test examples are drawn from the same distribution. Among our results are the following bounds on appropriate sample vs. network size. Assume 0 < ε ≤ 1/8. We show that if m ≥ O(W/ε log N/ε) random examples can be loaded on a feedforward network of linear threshold functions with N nodes and W weights, so that at least a fraction 1 - ε/2 of the examples are correctly classified, then one has confidence approaching certainty that the network will correctly classify a fraction 1 - ε of future test examples drawn from the same distribution. Conversely, for fully-connected feedforward nets with one hidden layer, any learning algorithm using fewer than Ω(W/ε) random training examples will, for some distributions of examples consistent with an appropriate weight choice, fail at least some fixed fraction of the time to find a weight choice that will correctly classify more than a 1 - ε fraction of the future test examples.
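
Restated in display form, with ε as in the abstract, N nodes and W weights, the two directions of the result are:

\[
m \;\ge\; O\!\left(\frac{W}{\varepsilon}\,\log\frac{N}{\varepsilon}\right)
\;\text{ and a fraction} \ge 1 - \tfrac{\varepsilon}{2} \text{ of the training examples fit correctly}
\;\;\Longrightarrow\;\;
\text{test error} \le \varepsilon \text{ with high confidence},
\]

\[
m \;<\; \Omega\!\left(\frac{W}{\varepsilon}\right)
\;\;\Longrightarrow\;\;
\text{some admissible distributions defeat any learning algorithm (one-hidden-layer case)}.
\]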

1,649 citations

Journal ArticleDOI
TL;DR: Shadow arrays are introduced which keep track of the incremental changes to the synaptic weights during a single pass of back-propagating learning and are ordered by decreasing sensitivity numbers so that the network can be efficiently pruned by discarding the last items of the sorted list.
Abstract: The sensitivity of the global error (cost) function to the inclusion/exclusion of each synapse in the artificial neural network is estimated. Introduced are shadow arrays which keep track of the incremental changes to the synaptic weights during a single pass of back-propagating learning. The synapses are then ordered by decreasing sensitivity numbers so that the network can be efficiently pruned by discarding the last items of the sorted list. Unlike previous approaches, this simple procedure does not require a modification of the cost function, does not interfere with the learning process, and demands a negligible computational overhead.
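
A highly simplified version of the shadow-array bookkeeping is sketched below: alongside the weights, an array of the same shape accumulates a per-weight sensitivity score from the updates applied during backpropagation, and the least sensitive weights are discarded afterwards. The accumulation rule used here (sum of |update x weight|) is an illustrative stand-in, not the exact sensitivity estimate defined in the paper.

import numpy as np

class ShadowArray:
    """Toy sketch of the shadow-array idea: accumulate per-weight sensitivity
    during training, then keep only the most sensitive weights.  The
    accumulation rule is a simplification of the paper's estimate."""

    def __init__(self, shape):
        self.sensitivity = np.zeros(shape)

    def record(self, weights: np.ndarray, update: np.ndarray) -> None:
        # Call once per backpropagation step with the weight update just applied.
        self.sensitivity += np.abs(update * weights)

    def prune_mask(self, keep_fraction: float) -> np.ndarray:
        # Boolean mask keeping the weights with the largest accumulated sensitivity.
        k = int(keep_fraction * self.sensitivity.size)
        if k == 0:
            return np.zeros_like(self.sensitivity, dtype=bool)
        flat = self.sensitivity.ravel()
        threshold = np.partition(flat, flat.size - k)[flat.size - k]
        return self.sensitivity >= threshold

# Usage inside a training loop (gradient computation not shown):
#   update = -learning_rate * gradient
#   shadow.record(weights, update)
#   weights += update
# After training:
#   weights *= shadow.prune_mask(keep_fraction=0.5)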

684 citations

Journal ArticleDOI
TL;DR: A more complicated penalty term is proposed in which the distribution of weight values is modeled as a mixture of multiple Gaussians, which allows the parameters of the mixture model to adapt at the same time as the network learns.
Abstract: One way of simplifying neural networks so they generalize better is to add an extra term to the error function that will penalize complexity. Simple versions of this approach include penalizing the sum of the squares of the weights or penalizing the number of nonzero weights. We propose a more complicated penalty term in which the distribution of weight values is modeled as a mixture of multiple Gaussians. A set of weights is simple if the weights have high probability density under the mixture model. This can be achieved by clustering the weights into subsets with the weights in each cluster having very similar values. Since we do not know the appropriate means or variances of the clusters in advance, we allow the parameters of the mixture model to adapt at the same time as the network learns. Simulations on two different problems demonstrate that this complexity term is more effective than previous complexity terms.
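
In the usual notation for this kind of complexity penalty (mixing proportions π_j, means μ_j and variances σ_j², all adapted together with the weights w_i), the cost being minimised can be written, up to constants, as

\[
C \;=\; E_{\text{data}} \;-\; \lambda \sum_{i} \log \sum_{j} \pi_j \,\mathcal{N}\!\left(w_i \mid \mu_j, \sigma_j^{2}\right),
\]

where E_data is the original error function; the penalty is small when the weights cluster tightly around the mixture means. The exact weighting of the two terms in the paper may differ from this generic form.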

683 citations