Journal ArticleDOI

Pruning algorithms-a survey

R. Reed1
01 Sep 1993-IEEE Transactions on Neural Networks (IEEE Trans Neural Netw)-Vol. 4, Iss: 5, pp 740-747
TL;DR: The approach taken by the methods described here is to train a network that is larger than necessary and then remove the parts that are not needed.
Abstract: A rule of thumb for obtaining good generalization in systems trained by examples is that one should use the smallest system that will fit the data. Unfortunately, it usually is not obvious what size is best; a system that is too small will not be able to learn the data while one that is just big enough may learn very slowly and be very sensitive to initial conditions and learning parameters. This paper is a survey of neural network pruning algorithms. The approach taken by the methods described here is to train a network that is larger than necessary and then remove the parts that are not needed.
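
As a concrete illustration of the train-then-prune idea summarised above, the following sketch removes the smallest-magnitude weights from a trained weight matrix. This is a generic magnitude-pruning heuristic written for illustration, not one of the specific algorithms covered by the survey; the pruning fraction and the absence of retraining are simplifying assumptions.

import numpy as np

def magnitude_prune(weights: np.ndarray, fraction: float = 0.5) -> np.ndarray:
    """Zero out roughly the given fraction of smallest-magnitude weights.

    A minimal sketch of 'train a network larger than necessary, then
    remove the parts that are not needed'; practical pruning methods use
    better sensitivity estimates and retrain after each pruning step.
    """
    flat = np.abs(weights).ravel()
    k = int(fraction * flat.size)
    if k == 0:
        return weights.copy()
    threshold = np.partition(flat, k - 1)[k - 1]   # k-th smallest magnitude
    pruned = weights.copy()
    pruned[np.abs(pruned) <= threshold] = 0.0      # ties may prune slightly more than k weights
    return pruned

# Example: prune half of a randomly initialised 4x3 weight matrix.
w = np.random.randn(4, 3)
print(magnitude_prune(w, fraction=0.5))
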
Citations
Journal ArticleDOI
TL;DR: A new neural network model, called the graph neural network (GNN) model, extends existing neural network methods for processing data represented in graph domains and implements a function τ(G, n) ∈ ℝ^m that maps a graph G and one of its nodes n into an m-dimensional Euclidean space.
Abstract: Many underlying relationships among data in several areas of science and engineering, e.g., computer vision, molecular chemistry, molecular biology, pattern recognition, and data mining, can be represented in terms of graphs. In this paper, we propose a new neural network model, called graph neural network (GNN) model, that extends existing neural network methods for processing the data represented in graph domains. This GNN model, which can directly process most of the practically useful types of graphs, e.g., acyclic, cyclic, directed, and undirected, implements a function τ(G, n) ∈ ℝ^m that maps a graph G and one of its nodes n into an m-dimensional Euclidean space. A supervised learning algorithm is derived to estimate the parameters of the proposed GNN model. The computational cost of the proposed algorithm is also considered. Some experimental results are shown to validate the proposed learning algorithm, and to demonstrate its generalization capabilities.
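
To make the mapping τ(G, n) ∈ ℝ^m concrete, here is a small, hypothetical message-passing sketch in the spirit of the GNN model: each node's state is repeatedly updated from its neighbours' states, and the final state of node n serves as its m-dimensional embedding. The transition function, the number of iterations and the random weights below are illustrative assumptions, not the parameterisation used in the paper.

import numpy as np

def gnn_embed(adjacency: np.ndarray, features: np.ndarray, m: int = 8,
              iterations: int = 10, seed: int = 0) -> np.ndarray:
    """Toy sketch of tau(G, n): returns one m-dimensional state per node.

    adjacency: (n_nodes, n_nodes) 0/1 matrix; features: (n_nodes, d).
    The update x_v <- tanh(W_f f_v + W_x * mean of neighbour states) is an
    illustrative transition function, not the one defined in the paper.
    """
    rng = np.random.default_rng(seed)
    n_nodes, d = features.shape
    W_f = rng.normal(scale=0.1, size=(m, d))
    W_x = rng.normal(scale=0.1, size=(m, m))
    states = np.zeros((n_nodes, m))
    degree = adjacency.sum(axis=1, keepdims=True).clip(min=1)
    for _ in range(iterations):                    # iterate towards a fixed point
        neighbour_mean = (adjacency @ states) / degree
        states = np.tanh(features @ W_f.T + neighbour_mean @ W_x.T)
    return states                                  # row n is the embedding of node n

# Example: a 3-node path graph with 2-dimensional node features.
A = np.array([[0., 1., 0.], [1., 0., 1.], [0., 1., 0.]])
F = np.array([[1., 0.], [0., 1.], [1., 1.]])
print(gnn_embed(A, F, m=4).shape)                  # (3, 4)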

5,701 citations


Cites background from "Pruning algorithms-a survey"

  • ...Thus, second-order learning algorithms [62], pruning [63], and growing learning algorithms [64]–[66] designed for static networks cannot be directly applied to GNNs....


Book
01 Jan 1996
TL;DR: Professor Ripley brings together two crucial ideas in pattern recognition, statistical methods and machine learning via neural networks, in this self-contained account.
Abstract: From the Publisher: Pattern recognition has long been studied in relation to many different (and mainly unrelated) applications, such as remote sensing, computer vision, space research, and medical imaging. In this book Professor Ripley brings together two crucial ideas in pattern recognition: statistical methods and machine learning via neural networks. Unifying principles are brought to the fore, and the author gives an overview of the state of the subject. Many examples are included to illustrate real problems in pattern recognition and how to overcome them. This is a self-contained account, ideal both as an introduction for non-specialist readers and as a handbook for the more expert reader.

5,632 citations

Journal ArticleDOI
TL;DR: In this paper, the authors present a state-of-the-art survey of ANN applications in forecasting and provide a synthesis of published research in this area, insights on ANN modeling issues, and future research directions.

3,680 citations

Journal ArticleDOI
TL;DR: This paper presents a general introduction and discussion of recent applications of the multilayer perceptron, one type of artificial neural network, in the atmospheric sciences.

2,389 citations

Journal ArticleDOI
TL;DR: The steps that should be followed in the development of artificial neural network models are outlined, including the choice of performance criteria, the division and pre-processing of the available data, the determination of appropriate model inputs and network architecture, optimisation of the connection weights (training) and model validation.
Abstract: Artificial Neural Networks (ANNs) are being used increasingly to predict and forecast water resources variables. In this paper, the steps that should be followed in the development of such models are outlined. These include the choice of performance criteria, the division and pre-processing of the available data, the determination of appropriate model inputs and network architecture, optimisation of the connection weights (training) and model validation. The options available to modellers at each of these steps are discussed and the issues that should be considered are highlighted. A review of 43 papers dealing with the use of neural network models for the prediction and forecasting of water resources variables is undertaken in terms of the modelling process adopted. In all but two of the papers reviewed, feedforward networks are used. The vast majority of these networks are trained using the backpropagation algorithm. Issues in relation to the optimal division of the available data, data pre-processing and the choice of appropriate model inputs are seldom considered. In addition, the process of choosing appropriate stopping criteria and optimising network geometry and internal network parameters is generally described poorly or carried out inadequately. All of the above factors can result in non-optimal model performance and an inability to draw meaningful comparisons between different models. Future research efforts should be directed towards the development of guidelines which assist with the development of ANN models and the choice of when ANNs should be used in preference to alternative approaches, the assessment of methods for extracting the knowledge that is contained in the connection weights of trained ANNs and the incorporation of uncertainty into ANN models.
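
The development steps listed in the abstract can be sketched end to end in a few dozen lines; the version below divides the data, standardises the inputs with training-set statistics, trains a single-hidden-layer feedforward network by gradient descent, and stops on a validation criterion. Network size, learning rate and patience are arbitrary illustrative choices, not recommendations from the paper.

import numpy as np

def develop_ann_model(X, y, hidden=8, val_fraction=0.2, max_epochs=500,
                      lr=0.01, patience=20, seed=0):
    """Sketch of the steps above: data division, pre-processing, architecture,
    weight optimisation and validation-based stopping (all choices illustrative)."""
    rng = np.random.default_rng(seed)
    # 1. Data division.
    idx = rng.permutation(len(X))
    n_val = int(val_fraction * len(X))
    val, train = idx[:n_val], idx[n_val:]
    # 2. Pre-processing: standardise with statistics from the training set only.
    mu, sd = X[train].mean(axis=0), X[train].std(axis=0) + 1e-8
    Xs = (X - mu) / sd
    # 3. Architecture: one hidden layer of tanh units, linear output.
    W1 = rng.normal(scale=0.1, size=(X.shape[1], hidden))
    W2 = rng.normal(scale=0.1, size=(hidden, 1))
    best, best_err, wait = (W1, W2), np.inf, 0
    for _ in range(max_epochs):
        # 4. Optimisation of the connection weights (plain batch gradient descent).
        H = np.tanh(Xs[train] @ W1)
        err = H @ W2 - y[train].reshape(-1, 1)
        gW2 = H.T @ err / len(train)
        gW1 = Xs[train].T @ ((err @ W2.T) * (1 - H ** 2)) / len(train)
        W1, W2 = W1 - lr * gW1, W2 - lr * gW2
        # 5. Validation: track the error on held-out data and stop when it stalls.
        val_err = np.mean((np.tanh(Xs[val] @ W1) @ W2 - y[val].reshape(-1, 1)) ** 2)
        if val_err < best_err:
            best, best_err, wait = (W1.copy(), W2.copy()), val_err, 0
        else:
            wait += 1
            if wait > patience:
                break
    return best, best_err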

2,181 citations


Cites methods from "Pruning algorithms-a survey"

  • ...A review of pruning algorithms is given by Reed (1993)....


References
Proceedings ArticleDOI
05 Nov 1984
TL;DR: This paper regards learning as the phenomenon of knowledge acquisition in the absence of explicit programming, and gives a precise methodology for studying this phenomenon from a computational viewpoint.
Abstract: Humans appear to be able to learn new concepts without needing to be programmed explicitly in any conventional sense. In this paper we regard learning as the phenomenon of knowledge acquisition in the absence of explicit programming. We give a precise methodology for studying this phenomenon from a computational viewpoint. It consists of choosing an appropriate information gathering mechanism, the learning protocol, and exploring the class of concepts that can be learnt using it in a reasonable (polynomial) number of steps. We find that inherent algorithmic complexity appears to set serious limits to the range of concepts that can be so learnt. The methodology and results suggest concrete principles for designing realistic learning systems.
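
The "reasonable (polynomial) number of steps" requirement is what later became known as PAC (probably approximately correct) learnability. In standard modern notation (not the paper's own), a concept class C is learnable if for every target c ∈ C, every distribution D and every ε, δ > 0 the learner, given a sample S of size m drawn from D, outputs a hypothesis h_S satisfying

\[
\Pr_{S \sim D^{m}}\big[\operatorname{err}_{D}(h_S) \le \varepsilon\big] \;\ge\; 1 - \delta,
\qquad
m \;\le\; \operatorname{poly}\!\left(\tfrac{1}{\varepsilon}, \tfrac{1}{\delta}, \operatorname{size}(c)\right).
\]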

5,311 citations

Journal ArticleDOI
TL;DR: This paper shows that the essential condition for distribution-free learnability is finiteness of the Vapnik-Chervonenkis dimension, a simple combinatorial parameter of the class of concepts to be learned.
Abstract: Valiant's learnability model is extended to learning classes of concepts defined by regions in Euclidean space En. The methods in this paper lead to a unified treatment of some of Valiant's results, along with previous results on distribution-free convergence of certain pattern recognition algorithms. It is shown that the essential condition for distribution-free learnability is finiteness of the Vapnik-Chervonenkis dimension, a simple combinatorial parameter of the class of concepts to be learned. Using this parameter, the complexity and closure properties of learnable classes are analyzed, and the necessary and sufficient conditions are provided for feasible learnability.
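
For reference, the finiteness condition can be read through the standard distribution-free sample-complexity bound: if the concept class has VC dimension d, then (in the usual textbook form, with constants not taken from this abstract)

\[
m \;=\; O\!\left(\frac{1}{\varepsilon}\left(d \log\frac{1}{\varepsilon} + \log\frac{1}{\delta}\right)\right)
\]

examples suffice for any consistent learner to achieve error at most ε with probability at least 1 - δ, while an infinite VC dimension rules out any such finite bound.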

1,967 citations

Journal ArticleDOI
01 Jan 1988
TL;DR: It is shown that if m ≥ O(W/ε log N/ε) random examples can be loaded on a feedforward network of linear threshold functions with N nodes and W weights, so that at least a fraction 1 - ε/2 of the examples are correctly classified, then one has confidence approaching certainty that the network will correctly classify a fraction 1 - ε of future test examples drawn from the same distribution.
Abstract: We address the question of when a network can be expected to generalize from m random training examples chosen from some arbitrary probability distribution, assuming that future test examples are drawn from the same distribution. Among our results are the following bounds on appropriate sample vs. network size. Assume 0 < ε ≤ 1/8. We show that if m ≥ O(W/ε log N/ε) random examples can be loaded on a feedforward network of linear threshold functions with N nodes and W weights, so that at least a fraction 1 - ε/2 of the examples are correctly classified, then one has confidence approaching certainty that the network will correctly classify a fraction 1 - ε of future test examples drawn from the same distribution. Conversely, for fully-connected feedforward nets with one hidden layer, any learning algorithm using fewer than Ω(W/ε) random training examples will, for some distributions of examples consistent with an appropriate weight choice, fail at least some fixed fraction of the time to find a weight choice that will correctly classify more than a 1 - ε fraction of the future test examples.
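
Restated in display form, with ε as in the abstract, N nodes and W weights, the two directions of the result are:

\[
m \;\ge\; O\!\left(\frac{W}{\varepsilon}\,\log\frac{N}{\varepsilon}\right)
\;\text{ and a fraction} \ge 1 - \tfrac{\varepsilon}{2} \text{ of the training examples fit correctly}
\;\;\Longrightarrow\;\;
\text{test error} \le \varepsilon \text{ with high confidence},
\]

\[
m \;<\; \Omega\!\left(\frac{W}{\varepsilon}\right)
\;\;\Longrightarrow\;\;
\text{some admissible distributions defeat any learning algorithm (one-hidden-layer case)}.
\]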

1,649 citations

Journal ArticleDOI
TL;DR: Shadow arrays are introduced which keep track of the incremental changes to the synaptic weights during a single pass of back-propagating learning and are ordered by decreasing sensitivity numbers so that the network can be efficiently pruned by discarding the last items of the sorted list.
Abstract: The sensitivity of the global error (cost) function to the inclusion/exclusion of each synapse in the artificial neural network is estimated. Introduced are shadow arrays which keep track of the incremental changes to the synaptic weights during a single pass of back-propagating learning. The synapses are then ordered by decreasing sensitivity numbers so that the network can be efficiently pruned by discarding the last items of the sorted list. Unlike previous approaches, this simple procedure does not require a modification of the cost function, does not interfere with the learning process, and demands a negligible computational overhead.
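
A highly simplified version of the shadow-array bookkeeping is sketched below: alongside the weights, an array of the same shape accumulates a per-weight sensitivity score from the updates applied during backpropagation, and the least sensitive weights are discarded afterwards. The accumulation rule used here (sum of |update x weight|) is an illustrative stand-in, not the exact sensitivity estimate defined in the paper.

import numpy as np

class ShadowArray:
    """Toy sketch of the shadow-array idea: accumulate per-weight sensitivity
    during training, then keep only the most sensitive weights.  The
    accumulation rule is a simplification of the paper's estimate."""

    def __init__(self, shape):
        self.sensitivity = np.zeros(shape)

    def record(self, weights: np.ndarray, update: np.ndarray) -> None:
        # Call once per backpropagation step with the weight update just applied.
        self.sensitivity += np.abs(update * weights)

    def prune_mask(self, keep_fraction: float) -> np.ndarray:
        # Boolean mask keeping the weights with the largest accumulated sensitivity.
        k = int(keep_fraction * self.sensitivity.size)
        if k == 0:
            return np.zeros_like(self.sensitivity, dtype=bool)
        flat = self.sensitivity.ravel()
        threshold = np.partition(flat, flat.size - k)[flat.size - k]
        return self.sensitivity >= threshold

# Usage inside a training loop (gradient computation not shown):
#   update = -learning_rate * gradient
#   shadow.record(weights, update)
#   weights += update
# After training:
#   weights *= shadow.prune_mask(keep_fraction=0.5)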

684 citations

Journal ArticleDOI
TL;DR: A more complicated penalty term is proposed in which the distribution of weight values is modeled as a mixture of multiple Gaussians, which allows the parameters of the mixture model to adapt at the same time as the network learns.
Abstract: One way of simplifying neural networks so they generalize better is to add an extra term to the error function that will penalize complexity. Simple versions of this approach include penalizing the sum of the squares of the weights or penalizing the number of nonzero weights. We propose a more complicated penalty term in which the distribution of weight values is modeled as a mixture of multiple Gaussians. A set of weights is simple if the weights have high probability density under the mixture model. This can be achieved by clustering the weights into subsets with the weights in each cluster having very similar values. Since we do not know the appropriate means or variances of the clusters in advance, we allow the parameters of the mixture model to adapt at the same time as the network learns. Simulations on two different problems demonstrate that this complexity term is more effective than previous complexity terms.
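
In the usual notation for this kind of complexity penalty (mixing proportions π_j, means μ_j and variances σ_j², all adapted together with the weights w_i), the cost being minimised can be written, up to constants, as

\[
C \;=\; E_{\text{data}} \;-\; \lambda \sum_{i} \log \sum_{j} \pi_j \,\mathcal{N}\!\left(w_i \mid \mu_j, \sigma_j^{2}\right),
\]

where E_data is the original error function; the penalty is small when the weights cluster tightly around the mixture means. The exact weighting of the two terms in the paper may differ from this generic form.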

683 citations