
When Networks Disagree: Ensemble Methods for Hybrid Neural Networks

TL;DR: Experimental results show that the ensemble method dramatically improves neural network performance on difficult real-world optical character recognition tasks.
Abstract: This paper presents a general theoretical framework for ensemble methods of constructing significantly improved regression estimates. Given a population of regression estimators, the authors construct a hybrid estimator that is as good or better in the mean square error (MSE) sense than any estimator in the population. They argue that the ensemble method presented has several properties: (1) it efficiently uses all the networks of a population -- none of the networks need to be discarded; (2) it efficiently uses all of the available data for training without over-fitting; (3) it inherently performs regularization by smoothing in functional space, which helps to avoid over-fitting; (4) it utilizes local minima to construct improved estimates, whereas other neural network algorithms are hindered by local minima; (5) it is ideally suited for parallel computation; (6) it leads to a very useful and natural measure of the number of distinct estimators in a population; and (7) the optimal parameters of the ensemble estimator are given in closed form. Experimental results show that the ensemble method dramatically improves neural network performance on difficult real-world optical character recognition tasks.
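
The closed-form weighting the abstract refers to can be sketched compactly. Below is a minimal Python sketch, not the paper's code: for misfits m_i(x) = f(x) - f_i(x) with correlation matrix C_ij = E[m_i m_j], the MSE-optimal weights that sum to one are w = C^(-1)1 / (1^T C^(-1) 1); the function name, the ridge safeguard, and the toy data are illustrative assumptions.

    import numpy as np

    def ensemble_weights(preds, targets, ridge=1e-8):
        """MSE-optimal averaging weights for a population of regression
        estimators, computed from the misfit correlation matrix
        C_ij = E[m_i m_j] with m_i = target - prediction_i.
        Closed form (weights constrained to sum to 1):
            w = C^{-1} 1 / (1^T C^{-1} 1)
        The small ridge term is an added safeguard against a singular C
        (e.g., when two networks are nearly identical)."""
        misfits = targets[None, :] - preds          # (n_nets, n_samples)
        C = misfits @ misfits.T / misfits.shape[1]  # sample estimate of C
        C += ridge * np.eye(C.shape[0])
        c_inv_one = np.linalg.solve(C, np.ones(C.shape[0]))
        return c_inv_one / c_inv_one.sum()

    # Toy usage: three noisy estimators of the same target.
    rng = np.random.default_rng(0)
    targets = rng.normal(size=500)
    noise = rng.normal(scale=[[0.3], [0.5], [0.4]], size=(3, 500))
    preds = targets[None, :] + noise
    w = ensemble_weights(preds, targets)
    print(w, np.mean((w @ preds - targets) ** 2))

Because the weights come from C, strongly correlated (nearly redundant) networks are down-weighted rather than discarded, consistent with property (1) in the abstract.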


Citations
Book
01 Jan 1995
TL;DR: This is the first comprehensive treatment of feed-forward neural networks from the perspective of statistical pattern recognition, and is designed as a text, with over 100 exercises, to benefit anyone involved in the fields of neural computation and pattern recognition.
Abstract: From the Publisher: This is the first comprehensive treatment of feed-forward neural networks from the perspective of statistical pattern recognition. After introducing the basic concepts, the book examines techniques for modelling probability density functions and the properties and merits of the multi-layer perceptron and radial basis function network models. Also covered are various forms of error functions, principal algorithms for error function minimization, learning and generalization in neural networks, and Bayesian techniques and their applications. Designed as a text, with over 100 exercises, this fully up-to-date work will benefit anyone involved in the fields of neural computation and pattern recognition.

19,056 citations


Cites background from "When Networks Disagree: Ensemble Methods for Hybrid Neural Networks"

  • ...These drawbacks can be overcome by combining the networks together to form a committee (Perrone and Cooper, 1993; Perrone, 1994)....


Journal ArticleDOI
TL;DR: The objective of this review paper is to summarize and compare some of the well-known methods used in various stages of a pattern recognition system and identify research topics and applications which are at the forefront of this exciting and challenging field.
Abstract: The primary goal of pattern recognition is supervised or unsupervised classification. Among the various frameworks in which pattern recognition has been traditionally formulated, the statistical approach has been most intensively studied and used in practice. More recently, neural network techniques and methods imported from statistical learning theory have been receiving increasing attention. The design of a recognition system requires careful attention to the following issues: definition of pattern classes, sensing environment, pattern representation, feature extraction and selection, cluster analysis, classifier design and learning, selection of training and test samples, and performance evaluation. In spite of almost 50 years of research and development in this field, the general problem of recognizing complex patterns with arbitrary orientation, location, and scale remains unsolved. New and emerging applications, such as data mining, web searching, retrieval of multimedia data, face recognition, and cursive handwriting recognition, require robust and efficient pattern recognition techniques. The objective of this review paper is to summarize and compare some of the well-known methods used in various stages of a pattern recognition system and identify research topics and applications which are at the forefront of this exciting and challenging field.

6,527 citations

Book
01 Jan 1996
TL;DR: In this self-contained account, Professor Ripley brings together two crucial ideas in pattern recognition: statistical methods and machine learning via neural networks.
Abstract: From the Publisher: Pattern recognition has long been studied in relation to many different (and mainly unrelated) applications, such as remote sensing, computer vision, space research, and medical imaging. In this book Professor Ripley brings together two crucial ideas in pattern recognition: statistical methods and machine learning via neural networks. Unifying principles are brought to the fore, and the author gives an overview of the state of the subject. Many examples are included to illustrate real problems in pattern recognition and how to overcome them. This is a self-contained account, ideal both as an introduction for non-specialist readers and as a handbook for the more expert reader.

5,632 citations

Book
17 May 2013
TL;DR: This research presents a novel and scalable approach called “Smartfitting” that automates the labor-intensive, time-consuming, and therefore expensive process of designing and implementing statistical regression models.
Abstract: General Strategies.- Regression Models.- Classification Models.- Other Considerations.- Appendix.- References.- Indices.

3,672 citations


Cites methods from "When Networks Disagree: Ensemble Methods for Hybrid Neural Networks"

  • ...As an alternative, several models can be created using different starting values and averaging the results of these models to produce a more stable prediction (Perrone and Cooper 1993; Ripley 1995; Tumer and Ghosh 1996)....

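A minimal sketch of the restart-and-average idea in the excerpt above, assuming scikit-learn's MLPRegressor as a stand-in base network (the helper name and hyperparameters are illustrative):

    import numpy as np
    from sklearn.neural_network import MLPRegressor

    def restart_ensemble(X, y, n_restarts=10):
        """Train the same network from different random starting values
        and average the fitted predictors for a more stable estimate."""
        models = [
            MLPRegressor(hidden_layer_sizes=(20,), max_iter=2000,
                         random_state=seed).fit(X, y)
            for seed in range(n_restarts)
        ]
        return lambda X_new: np.mean([m.predict(X_new) for m in models], axis=0)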

Journal ArticleDOI
TL;DR: Experimental results with real data sets indicate that the combined model can be an effective way to improve forecasting accuracy achieved by either of the models used separately.

3,155 citations


Cites background from "When Networks Disagree: Ensemble Methods for Hybrid Neural Networks"

  • ...In general, it has been observed that it is more effective to combine individual forecasts that are based on different information sets [15,31]....

  • ...will have lower generalization variance or error [15,20,31]....

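The variance claim in these excerpts follows from a standard identity; a short derivation under the idealized assumption of N unbiased forecasts f_i = f + e_i with uncorrelated errors of common variance sigma^2:

    \operatorname{Var}\!\left(\frac{1}{N}\sum_{i=1}^{N} e_i\right)
      = \frac{1}{N^2}\sum_{i=1}^{N}\operatorname{Var}(e_i)
      = \frac{\sigma^2}{N}

Correlated errors shrink this 1/N gain, which is why combining forecasts based on different information sets is more effective than combining near-duplicates.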

References
Book
01 Jan 1987
TL;DR: The delta method and the influence function; cross-validation, jackknife and bootstrap; balanced repeated replication (half-sampling); random subsampling; and nonparametric confidence intervals, as mentioned in this paper.
Abstract: The Jackknife Estimate of Bias; The Jackknife Estimate of Variance; Bias of the Jackknife Variance Estimate; The Bootstrap; The Infinitesimal Jackknife; The Delta Method and the Influence Function; Cross-Validation, Jackknife and Bootstrap; Balanced Repeated Replications (Half-Sampling); Random Subsampling; Nonparametric Confidence Intervals.

7,007 citations


"When Networks Disagree: Ensemble Me..." refers methods in this paper

  • ...The statistical resampling techniques of jackknifing, bootstrapping and cross validation have proven useful for generating improved regression estimates through bias reduction (Efron, 1982; Miller, 1974; Stone, 1974; Gray and Schucany, 1972; Härdle, 1990; Wahba, 1990, for review)....

  • ...In general, with this framework we can now easily extend the statistical jackknife, bootstrap and cross validation techniques (Efron, 1982; Miller, 1974; Stone, 1974) to find better regression functions. The cross-validatory hold-out set is a subset of the total data available to us and is used to…...

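A minimal sketch of the bootstrap extension these excerpts point to; the helper name and the pluggable fit function are illustrative assumptions, not the paper's code:

    import numpy as np

    def bootstrap_ensemble(X, y, fit_fn, n_members=25, seed=0):
        """Fit one estimator per bootstrap resample of the training data;
        the resulting population can then be averaged (or jackknifed)
        to reduce the variance of the final regression estimate."""
        rng = np.random.default_rng(seed)
        members = []
        for _ in range(n_members):
            idx = rng.integers(0, len(y), size=len(y))  # draw with replacement
            members.append(fit_fn(X[idx], y[idx]))
        return members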

Book
01 Mar 1990
TL;DR: In this book, a theory and practice for the estimation of functions from noisy data on functionals is developed; convergence properties, data-based smoothing parameter selection, confidence intervals, and numerical methods are established which are appropriate to a number of problems within this framework.
Abstract: This book serves well as an introduction to the more theoretical aspects of the use of spline models. It develops a theory and practice for the estimation of functions from noisy data on functionals. The simplest example is the estimation of a smooth curve, given noisy observations on a finite number of its values. Convergence properties, data-based smoothing parameter selection, confidence intervals, and numerical methods are established which are appropriate to a number of problems within this framework. Methods for including side conditions and other prior information in solving ill-posed inverse problems are provided. Data which involve samples of random variables with Gaussian, Poisson, binomial, and other distributions are treated in a unified optimization context. Experimental design questions, i.e., which functionals should be observed, are studied in a general context. Extensions to distributed parameter system identification problems are made by considering implicitly defined functionals.
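
For a concrete sense of the estimation problem the abstract describes, a minimal smoothing-spline fit with SciPy; fixing the smoothing factor s by hand is a simplification here, since the book's subject is precisely choosing such parameters from the data (e.g., by generalized cross-validation):

    import numpy as np
    from scipy.interpolate import UnivariateSpline

    # Noisy observations of a smooth curve at a finite number of points.
    rng = np.random.default_rng(1)
    x = np.linspace(0.0, 1.0, 200)
    y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.size)

    # s trades off fidelity against smoothness (bias vs. variance).
    spline = UnivariateSpline(x, y, s=x.size * 0.04)
    smooth_estimate = spline(x)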

6,120 citations


"When Networks Disagree: Ensemble Me..." refers methods in this paper

  • ...The statistical resampling techniques of jackknifing, bootstrapping and cross validation have proven useful for generating improved regression estimates through bias reduction (Efron, 1982; Miller, 1974; Stone, 1974; Gray and Schucany, 1972; Härdle, 1990; Wahba, 1990, for review)....

  • ...Experimental results are provided which show that the ensemble method dramatically improves neural network performance on difficult real-world optical character recognition tasks. 1 Introduction: Hybrid or multi-neural network systems have been frequently employed to improve results in classification and regression problems (Cooper, 1991; Reilly et al., 1988; Reilly et al., 1987; Scofield et al., 1991; Baxt, 1992; Bridle and Cox, 1991; Buntine and Weigend, 1992; Hansen and Salamon, 1990; Intrator et al., 1992; Jacobs et al., 1991; Lincoln and Skrzypek, 1990; Neal, 1992a; Neal, 1992b; Pearlmutter and Rosenfeld, 1991; Wolpert, 1990; Xu et al., 1992; Xu et al., 1990)....


Journal ArticleDOI
TL;DR: The conclusion is that for almost any real-world generalization problem one should use some version of stacked generalization to minimize the generalization error rate.

5,834 citations


"When Networks Disagree: Ensemble Me..." refers background or methods in this paper

  • ...…al., 1987; Scofield et al., 1991; Baxt, 1992; Bridle and Cox, 1991; Buntine and Weigend, 1992; Hansen and Salamon, 1990; Intrator et al., 1992; Jacobs et al., 1991; Lincoln and Skrzypek, 1990; Neal, 1992a; Neal, 1992b; Pearlmutter and Rosenfeld, 1991; Wolpert, 1990; Xu et al., 1992; Xu et al., 1990)....

  • ...A further extension is to use a nonlinear network (Jacobs et al., 1991; Reilly et al., 1987; Wolpert, 1990) to learn how to combine the networks with weights that vary over the feature space and then to average an ensemble of such networks....

  • ...An alternative approach (Wolpert, 1990) which avoids the potential singularities in C is to allow a perceptron to learn the appropriate averaging weights....

  • ...Given a population of regression estimators, we construct a hybrid estimator which is as good or better in the MSE sense than any estimator in the population....

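A compact sketch of the alternative these excerpts mention: rather than inverting a possibly singular C, a simple linear combiner learns the averaging weights by gradient descent on held-out predictions (names, hyperparameters, and the unconstrained weights are illustrative choices):

    import numpy as np

    def learn_combiner_weights(preds_holdout, y_holdout, lr=0.01, epochs=500):
        """Learn combination weights by minimizing held-out MSE with
        gradient descent, sidestepping the explicit inversion of the
        misfit correlation matrix C; weights are left unconstrained."""
        n_nets = preds_holdout.shape[0]
        w = np.full(n_nets, 1.0 / n_nets)            # start from the plain mean
        for _ in range(epochs):
            err = w @ preds_holdout - y_holdout      # combined misfit per sample
            w -= lr * (preds_holdout @ err) / y_holdout.size
        return w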

Journal ArticleDOI
TL;DR: A new supervised learning procedure for systems composed of many separate networks, each of which learns to handle a subset of the complete set of training cases; the procedure is demonstrated to divide a vowel discrimination task into subtasks, each of which can be solved by a very simple expert network.
Abstract: We present a new supervised learning procedure for systems composed of many separate networks, each of which learns to handle a subset of the complete set of training cases. The new procedure can be viewed either as a modular version of a multilayer supervised network, or as an associative version of competitive learning. It therefore provides a new link between these two apparently different approaches. We demonstrate that the learning procedure divides up a vowel discrimination task into appropriate subtasks, each of which can be solved by a very simple expert network.
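
A schematic forward pass for the procedure the abstract describes, with linear experts and a softmax gating network (dimensions and names are illustrative; the actual procedure also trains gate and experts jointly so that experts specialize on subtasks):

    import numpy as np

    def moe_forward(x, expert_ws, gate_w):
        """Each expert maps x to an output; the gating network decides,
        per input, how much each expert contributes to the blend."""
        expert_outs = np.array([W @ x for W in expert_ws])  # (n_experts, d_out)
        logits = gate_w @ x                                 # (n_experts,)
        gates = np.exp(logits - logits.max())
        gates /= gates.sum()                                # softmax over experts
        return gates @ expert_outs                          # gated combination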

4,338 citations

Journal ArticleDOI
TL;DR: It is shown that the remaining residual generalization error can be reduced by invoking ensembles of similar networks, which helps improve the performance and training of neural networks for classification.
Abstract: Several means for improving the performance and training of neural networks for classification are proposed. Cross-validation is used as a tool for optimizing network parameters and architecture. It is shown that the remaining residual generalization error can be reduced by invoking ensembles of similar networks.
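
A minimal sketch of the ensemble step the abstract describes, as a plurality vote over similarly trained classifiers (the predict interface and integer class labels are assumptions):

    import numpy as np

    def ensemble_vote(classifiers, X):
        """Plurality vote over an ensemble of similar networks; members'
        residual errors that disagree across the ensemble tend to cancel."""
        votes = np.stack([clf.predict(X) for clf in classifiers]).astype(int)
        # Most frequent label per sample (columns of the vote matrix).
        return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)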

3,891 citations