
When Networks Disagree: Ensemble Methods for Hybrid Neural Networks

TL;DR: Experimental results show that the ensemble method dramatically improves neural network performance on difficult real-world optical character recognition tasks.
Abstract: This paper presents a general theoretical framework for ensemble methods of constructing significantly improved regression estimates. Given a population of regression estimators, the authors construct a hybrid estimator that is as good or better in the mean square error (MSE) sense than any estimator in the population. They argue that the ensemble method presented has several properties: (1) it efficiently uses all the networks of a population -- none of the networks need to be discarded; (2) it efficiently uses all of the available data for training without over-fitting; (3) it inherently performs regularization by smoothing in functional space, which helps to avoid over-fitting; (4) it utilizes local minima to construct improved estimates, whereas other neural network algorithms are hindered by local minima; (5) it is ideally suited for parallel computation; (6) it leads to a very useful and natural measure of the number of distinct estimators in a population; and (7) the optimal parameters of the ensemble estimator are given in closed form. Experimental results show that the ensemble method dramatically improves neural network performance on difficult real-world optical character recognition tasks.
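
The closed-form weighting the abstract refers to can be sketched compactly. Below is a minimal Python sketch, not the paper's code: for misfits m_i(x) = f(x) - f_i(x) with correlation matrix C_ij = E[m_i m_j], the MSE-optimal weights that sum to one are w = C^(-1)1 / (1^T C^(-1) 1); the function name, the ridge safeguard, and the toy data are illustrative assumptions.

    import numpy as np

    def ensemble_weights(preds, targets, ridge=1e-8):
        """MSE-optimal averaging weights for a population of regression
        estimators, computed from the misfit correlation matrix
        C_ij = E[m_i m_j] with m_i = target - prediction_i.
        Closed form (weights constrained to sum to 1):
            w = C^{-1} 1 / (1^T C^{-1} 1)
        The small ridge term is an added safeguard against a singular C
        (e.g., when two networks are nearly identical)."""
        misfits = targets[None, :] - preds          # (n_nets, n_samples)
        C = misfits @ misfits.T / misfits.shape[1]  # sample estimate of C
        C += ridge * np.eye(C.shape[0])
        c_inv_one = np.linalg.solve(C, np.ones(C.shape[0]))
        return c_inv_one / c_inv_one.sum()

    # Toy usage: three noisy estimators of the same target.
    rng = np.random.default_rng(0)
    targets = rng.normal(size=500)
    noise = rng.normal(scale=[[0.3], [0.5], [0.4]], size=(3, 500))
    preds = targets[None, :] + noise
    w = ensemble_weights(preds, targets)
    print(w, np.mean((w @ preds - targets) ** 2))

Because the weights come from C, strongly correlated (nearly redundant) networks are down-weighted rather than discarded, consistent with property (1) in the abstract.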


Citations
Book
01 Jan 1995
TL;DR: This is the first comprehensive treatment of feed-forward neural networks from the perspective of statistical pattern recognition, and is designed as a text, with over 100 exercises, to benefit anyone involved in the fields of neural computation and pattern recognition.
Abstract: From the Publisher: This is the first comprehensive treatment of feed-forward neural networks from the perspective of statistical pattern recognition. After introducing the basic concepts, the book examines techniques for modelling probability density functions and the properties and merits of the multi-layer perceptron and radial basis function network models. Also covered are various forms of error functions, principal algorithms for error function minimization, learning and generalization in neural networks, and Bayesian techniques and their applications. Designed as a text, with over 100 exercises, this fully up-to-date work will benefit anyone involved in the fields of neural computation and pattern recognition.

19,056 citations


Cites background from "When Networks Disagree: Ensemble Methods for Hybrid Neural Networks"

  • ...These drawbacks can be overcome by combining the networks together to form a committee (Perrone and Cooper, 1993; Perrone, 1994)....


Journal ArticleDOI
TL;DR: The objective of this review paper is to summarize and compare some of the well-known methods used in various stages of a pattern recognition system and identify research topics and applications which are at the forefront of this exciting and challenging field.
Abstract: The primary goal of pattern recognition is supervised or unsupervised classification. Among the various frameworks in which pattern recognition has been traditionally formulated, the statistical approach has been most intensively studied and used in practice. More recently, neural network techniques and methods imported from statistical learning theory have been receiving increasing attention. The design of a recognition system requires careful attention to the following issues: definition of pattern classes, sensing environment, pattern representation, feature extraction and selection, cluster analysis, classifier design and learning, selection of training and test samples, and performance evaluation. In spite of almost 50 years of research and development in this field, the general problem of recognizing complex patterns with arbitrary orientation, location, and scale remains unsolved. New and emerging applications, such as data mining, web searching, retrieval of multimedia data, face recognition, and cursive handwriting recognition, require robust and efficient pattern recognition techniques. The objective of this review paper is to summarize and compare some of the well-known methods used in various stages of a pattern recognition system and identify research topics and applications which are at the forefront of this exciting and challenging field.

6,527 citations

Book
01 Jan 1996
TL;DR: In this self-contained account, Professor Ripley brings together two crucial ideas in pattern recognition: statistical methods and machine learning via neural networks.
Abstract: From the Publisher: Pattern recognition has long been studied in relation to many different (and mainly unrelated) applications, such as remote sensing, computer vision, space research, and medical imaging. In this book Professor Ripley brings together two crucial ideas in pattern recognition: statistical methods and machine learning via neural networks. Unifying principles are brought to the fore, and the author gives an overview of the state of the subject. Many examples are included to illustrate real problems in pattern recognition and how to overcome them. This is a self-contained account, ideal both as an introduction for non-specialist readers and as a handbook for the more expert reader.

5,632 citations

Book
17 May 2013
TL;DR: This research presents a novel and scalable approach called “Smartfitting” that automates the labor-intensive, time-consuming, and therefore expensive process of designing and implementing statistical regression models.
Abstract: General Strategies.- Regression Models.- Classification Models.- Other Considerations.- Appendix.- References.- Indices.

3,672 citations


Cites methods from "When Networks Disagree: Ensemble Methods for Hybrid Neural Networks"

  • ...As an alternative, several models can be created using different starting values and averaging the results of these models to produce a more stable prediction (Perrone and Cooper 1993; Ripley 1995; Tumer and Ghosh 1996)....

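A minimal sketch of the restart-and-average idea in the excerpt above, assuming scikit-learn's MLPRegressor as a stand-in base network (the helper name and hyperparameters are illustrative):

    import numpy as np
    from sklearn.neural_network import MLPRegressor

    def restart_ensemble(X, y, n_restarts=10):
        """Train the same network from different random starting values
        and average the fitted predictors for a more stable estimate."""
        models = [
            MLPRegressor(hidden_layer_sizes=(20,), max_iter=2000,
                         random_state=seed).fit(X, y)
            for seed in range(n_restarts)
        ]
        return lambda X_new: np.mean([m.predict(X_new) for m in models], axis=0)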

Journal ArticleDOI
TL;DR: Experimental results with real data sets indicate that the combined model can be an effective way to improve forecasting accuracy achieved by either of the models used separately.

3,155 citations


Cites background from "When Networks Disagree: Ensemble Methods for Hybrid Neural Networks"

  • ...In general, it has been observed that it is more effective to combine individual forecasts that are based on different information sets [15,31]....

  • ...will have lower generalization variance or error [15,20,31]....

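The variance claim in these excerpts follows from a standard identity; a short derivation under the idealized assumption of N unbiased forecasts f_i = f + e_i with uncorrelated errors of common variance sigma^2:

    \operatorname{Var}\!\left(\frac{1}{N}\sum_{i=1}^{N} e_i\right)
      = \frac{1}{N^2}\sum_{i=1}^{N}\operatorname{Var}(e_i)
      = \frac{\sigma^2}{N}

Correlated errors shrink this 1/N gain, which is why combining forecasts based on different information sets is more effective than combining near-duplicates.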

References
Book
01 Jan 1987
TL;DR: The delta method and the influence function; cross-validation, jackknife and bootstrap; balanced repeated replication (half-sampling); random subsampling; and nonparametric confidence intervals, as mentioned in this paper.
Abstract: The Jackknife Estimate of Bias; The Jackknife Estimate of Variance; Bias of the Jackknife Variance Estimate; The Bootstrap; The Infinitesimal Jackknife; The Delta Method and the Influence Function; Cross-Validation, Jackknife and Bootstrap; Balanced Repeated Replications (Half-Sampling); Random Subsampling; Nonparametric Confidence Intervals.

7,007 citations


"When Networks Disagree: Ensemble Me..." refers methods in this paper

  • ...The statistical resampling techniques of jackknifing, bootstrapping and cross validation have proven useful for generating improved regression estimates through bias reduction (Efron, 1982; Miller, 1974; Stone, 1974; Gray and Schucany, 1972; Härdle, 1990; Wahba, 1990, for review)....

  • ...In general, with this framework we can now easily extend the statistical jackknife, bootstrap and cross validation techniques (Efron, 1982; Miller, 1974; Stone, 1974) to find better regression functions. The cross-validatory hold-out set is a subset of the total data available to us and is used to…...

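A minimal sketch of the bootstrap extension these excerpts point to; the helper name and the pluggable fit function are illustrative assumptions, not the paper's code:

    import numpy as np

    def bootstrap_ensemble(X, y, fit_fn, n_members=25, seed=0):
        """Fit one estimator per bootstrap resample of the training data;
        the resulting population can then be averaged (or jackknifed)
        to reduce the variance of the final regression estimate."""
        rng = np.random.default_rng(seed)
        members = []
        for _ in range(n_members):
            idx = rng.integers(0, len(y), size=len(y))  # draw with replacement
            members.append(fit_fn(X[idx], y[idx]))
        return members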

Book
01 Mar 1990
TL;DR: In this book, a theory and practice for the estimation of functions from noisy data on functionals is developed; convergence properties, data-based smoothing parameter selection, confidence intervals, and numerical methods are established which are appropriate to a number of problems within this framework.
Abstract: This book serves well as an introduction to the more theoretical aspects of the use of spline models. It develops a theory and practice for the estimation of functions from noisy data on functionals. The simplest example is the estimation of a smooth curve, given noisy observations on a finite number of its values. Convergence properties, data-based smoothing parameter selection, confidence intervals, and numerical methods are established which are appropriate to a number of problems within this framework. Methods for including side conditions and other prior information in solving ill-posed inverse problems are provided. Data which involve samples of random variables with Gaussian, Poisson, binomial, and other distributions are treated in a unified optimization context. Experimental design questions, i.e., which functionals should be observed, are studied in a general context. Extensions to distributed parameter system identification problems are made by considering implicitly defined functionals.
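
For a concrete sense of the estimation problem the abstract describes, a minimal smoothing-spline fit with SciPy; fixing the smoothing factor s by hand is a simplification here, since the book's subject is precisely choosing such parameters from the data (e.g., by generalized cross-validation):

    import numpy as np
    from scipy.interpolate import UnivariateSpline

    # Noisy observations of a smooth curve at a finite number of points.
    rng = np.random.default_rng(1)
    x = np.linspace(0.0, 1.0, 200)
    y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.size)

    # s trades off fidelity against smoothness (bias vs. variance).
    spline = UnivariateSpline(x, y, s=x.size * 0.04)
    smooth_estimate = spline(x)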

6,120 citations


"When Networks Disagree: Ensemble Me..." refers methods in this paper

  • ...The statistical resampling techniques of jackknifing, bootstrapping and cross validation have proven useful for generating improved regression estimates through bias reduction (Efron, 1982; Miller, 1974; Stone, 1974; Gray and Schucany, 1972; Härdle, 1990; Wahba, 1990, for review)....

  • ...Experimental results are provided which show that the ensemble method dramatically improves neural network performance on difficult real-world optical character recognition tasks. 1 Introduction: Hybrid or multi-neural network systems have been frequently employed to improve results in classification and regression problems (Cooper, 1991; Reilly et al., 1988; Reilly et al., 1987; Scofield et al., 1991; Baxt, 1992; Bridle and Cox, 1991; Buntine and Weigend, 1992; Hansen and Salamon, 1990; Intrator et al., 1992; Jacobs et al., 1991; Lincoln and Skrzypek, 1990; Neal, 1992a; Neal, 1992b; Pearlmutter and Rosenfeld, 1991; Wolpert, 1990; Xu et al., 1992; Xu et al., 1990)....


Journal ArticleDOI
TL;DR: The conclusion is that for almost any real-world generalization problem one should use some version of stacked generalization to minimize the generalization error rate.

5,834 citations


"When Networks Disagree: Ensemble Me..." refers background or methods in this paper

  • ...…al., 1987; Scofield et al., 1991; Baxt, 1992; Bridle and Cox, 1991; Buntine and Weigend, 1992; Hansen and Salamon, 1990; Intrator et al., 1992; Jacobs et al., 1991; Lincoln and Skrzypek, 1990; Neal, 1992a; Neal, 1992b; Pearlmutter and Rosenfeld, 1991; Wolpert, 1990; Xu et al., 1992; Xu et al., 1990)....

  • ...A further extension is to use a nonlinear network (Jacobs et al., 1991; Reilly et al., 1987; Wolpert, 1990) to learn how to combine the networks with weights that vary over the feature space and then to average an ensemble of such networks....

  • ...An alternative approach (Wolpert, 1990) which avoids the potential singularities in C is to allow a perceptron to learn the appropriate averaging weights....

  • ...Given a population of regression estimators, we construct a hybrid estimator which is as good or better in the MSE sense than any estimator in the population....

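A compact sketch of the alternative these excerpts mention: rather than inverting a possibly singular C, a simple linear combiner learns the averaging weights by gradient descent on held-out predictions (names, hyperparameters, and the unconstrained weights are illustrative choices):

    import numpy as np

    def learn_combiner_weights(preds_holdout, y_holdout, lr=0.01, epochs=500):
        """Learn combination weights by minimizing held-out MSE with
        gradient descent, sidestepping the explicit inversion of the
        misfit correlation matrix C; weights are left unconstrained."""
        n_nets = preds_holdout.shape[0]
        w = np.full(n_nets, 1.0 / n_nets)            # start from the plain mean
        for _ in range(epochs):
            err = w @ preds_holdout - y_holdout      # combined misfit per sample
            w -= lr * (preds_holdout @ err) / y_holdout.size
        return w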

Journal ArticleDOI
TL;DR: A new supervised learning procedure for systems composed of many separate networks, each of which learns to handle a subset of the complete set of training cases; the procedure is demonstrated to divide a vowel discrimination task into subtasks, each of which can be solved by a very simple expert network.
Abstract: We present a new supervised learning procedure for systems composed of many separate networks, each of which learns to handle a subset of the complete set of training cases. The new procedure can be viewed either as a modular version of a multilayer supervised network, or as an associative version of competitive learning. It therefore provides a new link between these two apparently different approaches. We demonstrate that the learning procedure divides up a vowel discrimination task into appropriate subtasks, each of which can be solved by a very simple expert network.
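
A schematic forward pass for the procedure the abstract describes, with linear experts and a softmax gating network (dimensions and names are illustrative; the actual procedure also trains gate and experts jointly so that experts specialize on subtasks):

    import numpy as np

    def moe_forward(x, expert_ws, gate_w):
        """Each expert maps x to an output; the gating network decides,
        per input, how much each expert contributes to the blend."""
        expert_outs = np.array([W @ x for W in expert_ws])  # (n_experts, d_out)
        logits = gate_w @ x                                 # (n_experts,)
        gates = np.exp(logits - logits.max())
        gates /= gates.sum()                                # softmax over experts
        return gates @ expert_outs                          # gated combination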

4,338 citations

Journal ArticleDOI
TL;DR: It is shown that the remaining residual generalization error can be reduced by invoking ensembles of similar networks, which helps improve the performance and training of neural networks for classification.
Abstract: Several means for improving the performance and training of neural networks for classification are proposed. Cross-validation is used as a tool for optimizing network parameters and architecture. It is shown that the remaining residual generalization error can be reduced by invoking ensembles of similar networks.
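
A minimal sketch of the ensemble step the abstract describes, as a plurality vote over similarly trained classifiers (the predict interface and integer class labels are assumptions):

    import numpy as np

    def ensemble_vote(classifiers, X):
        """Plurality vote over an ensemble of similar networks; members'
        residual errors that disagree across the ensemble tend to cancel."""
        votes = np.stack([clf.predict(X) for clf in classifiers]).astype(int)
        # Most frequent label per sample (columns of the vote matrix).
        return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)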

3,891 citations