Open AccessProceedings Article
Neural Network Ensembles, Cross Validation, and Active Learning
Anders Krogh,Jesper Vedelsby +1 more
- Vol. 7, pp 231-238
TLDR
It is shown how to estimate the optimal weights of the ensemble members using unlabeled data and how the ambiguity can be used to select new training data to be labeled in an active learning scheme.Abstract:
Learning of continuous valued functions using neural network ensembles (committees) can give improved accuracy, reliable estimation of the generalization error, and active learning. The ambiguity is defined as the variation of the output of ensemble members averaged over unlabeled data, so it quantifies the disagreement among the networks. It is discussed how to use the ambiguity in combination with cross-validation to give a reliable estimate of the ensemble generalization error, and how this type of ensemble cross-validation can sometimes improve performance. It is shown how to estimate the optimal weights of the ensemble members using unlabeled data. By a generalization of query by committee, it is finally shown how the ambiguity can be used to select new training data to be labeled in an active learning scheme.read more
Citations
More filters
Book
Neural networks for pattern recognition
TL;DR: This is the first comprehensive treatment of feed-forward neural networks from the perspective of statistical pattern recognition, and is designed as a text, with over 100 exercises, to benefit anyone involved in the fields of neural computation and pattern recognition.
Journal ArticleDOI
Wrappers for feature subset selection
Ron Kohavi,George H. John +1 more
TL;DR: The wrapper method searches for an optimal feature subset tailored to a particular algorithm and a domain and compares the wrapper approach to induction without feature subset selection and to Relief, a filter approach tofeature subset selection.
Journal ArticleDOI
Statistical pattern recognition: a review
TL;DR: The objective of this review paper is to summarize and compare some of the well-known methods used in various stages of a pattern recognition system and identify research topics and applications which are at the forefront of this exciting and challenging field.
Journal ArticleDOI
On combining classifiers
TL;DR: A common theoretical framework for combining classifiers which use distinct pattern representations is developed and it is shown that many existing schemes can be considered as special cases of compound classification where all the pattern representations are used jointly to make a decision.
Book
Pattern recognition and neural networks
Brian D. Ripley,N. L. Hjort +1 more
TL;DR: Professor Ripley brings together two crucial ideas in pattern recognition; statistical methods and machine learning via neural networks in this self-contained account.
References
More filters
Journal ArticleDOI
Original Contribution: Stacked generalization
TL;DR: The conclusion is that for almost any real-world generalization problem one should use some version of stacked generalization to minimize the generalization error rate.
Journal ArticleDOI
Neural network ensembles
Lars Kai Hansen,Peter Salamon +1 more
TL;DR: It is shown that the remaining residual generalization error can be reduced by invoking ensembles of similar networks, which helps improve the performance and training of neural networks for classification.
Journal ArticleDOI
Neural networks and the bias/variance dilemma
TL;DR: It is suggested that current-generation feedforward neural networks are largely inadequate for difficult problems in machine perception and machine learning, regardless of parallel-versus-serial hardware or other implementation issues.
Proceedings ArticleDOI
Query by committee
TL;DR: It is suggested that asymptotically finite information gain may be an important characteristic of good query algorithms, in which a committee of students is trained on the same data set.
Proceedings Article
Information, Prediction, and Query by Committee
TL;DR: It is shown that if the two-member committee algorithm achieves information gain with positive lower bound, then the prediction error decreases exponentially with the number of queries, and this exponential decrease holds for query learning of thresholded smooth functions.