Topic

Overfitting

About: Overfitting is a research topic. Over its lifetime, 11,696 publications have been published within this topic, receiving 441,877 citations.


Papers
Book Chapter
21 Jun 2000
TL;DR: Some previous studies comparing ensemble methods are reviewed, and some new experiments are presented to uncover the reasons that Adaboost does not overfit rapidly.
Abstract: Ensemble methods are learning algorithms that construct a set of classifiers and then classify new data points by taking a (weighted) vote of their predictions. The original ensemble method is Bayesian averaging, but more recent algorithms include error-correcting output coding, Bagging, and boosting. This paper reviews these methods and explains why ensembles can often perform better than any single classifier. Some previous studies comparing ensemble methods are reviewed, and some new experiments are presented to uncover the reasons that Adaboost does not overfit rapidly.

5,679 citations
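The abstract above describes combining classifiers by a (weighted) vote, as in Bagging. Below is a minimal sketch of bagging with an unweighted majority vote, assuming scikit-learn and NumPy are available; the synthetic dataset, the decision-tree base learner and the ensemble size are illustrative choices, not taken from the paper.

```python
# A minimal sketch of bagging with majority voting (assumes scikit-learn
# and NumPy; the synthetic dataset is purely illustrative).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

n_members = 25
members = []
for _ in range(n_members):
    # Bootstrap sample: draw training points with replacement.
    idx = rng.integers(0, len(X_train), size=len(X_train))
    tree = DecisionTreeClassifier()  # unpruned tree = high-variance base learner
    tree.fit(X_train[idx], y_train[idx])
    members.append(tree)

# Unweighted majority vote over the members' predictions.
votes = np.stack([m.predict(X_test) for m in members])   # shape: (n_members, n_test)
ensemble_pred = (votes.mean(axis=0) >= 0.5).astype(int)

print("single tree accuracy:", members[0].score(X_test, y_test))
print("ensemble accuracy:   ", (ensemble_pred == y_test).mean())
```

Because each tree sees a different bootstrap sample, the vote averages away much of the variance of the individual trees, which is one reason ensembles often outperform any single classifier.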

Posted Content
TL;DR: The authors argue that the primary cause of neural networks' vulnerability to adversarial perturbation is their linear nature, which is supported by new quantitative results while giving the first explanation of the most intriguing fact about adversarial examples: their generalization across architectures and training sets.
Abstract: Several machine learning models, including neural networks, consistently misclassify adversarial examples---inputs formed by applying small but intentionally worst-case perturbations to examples from the dataset, such that the perturbed input results in the model outputting an incorrect answer with high confidence. Early attempts at explaining this phenomenon focused on nonlinearity and overfitting. We argue instead that the primary cause of neural networks' vulnerability to adversarial perturbation is their linear nature. This explanation is supported by new quantitative results while giving the first explanation of the most intriguing fact about them: their generalization across architectures and training sets. Moreover, this view yields a simple and fast method of generating adversarial examples. Using this approach to provide examples for adversarial training, we reduce the test set error of a maxout network on the MNIST dataset.

4,967 citations
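The "simple and fast method of generating adversarial examples" mentioned in the abstract is commonly known as the fast gradient sign method (FGSM): perturb the input by epsilon times the sign of the gradient of the loss with respect to the input. Below is a minimal NumPy sketch on a toy logistic (linear) model; the weights, input and epsilon value are illustrative assumptions, not from the paper.

```python
# A minimal sketch of FGSM: x_adv = x + eps * sign(grad_x loss).
# The logistic model and random weights below are illustrative only.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm_perturb(x, y, w, b, eps):
    """Perturb one input x (true label y in {0, 1}) for a logistic model."""
    p = sigmoid(w @ x + b)
    # Gradient of the cross-entropy loss with respect to the input x.
    grad_x = (p - y) * w
    return x + eps * np.sign(grad_x)

rng = np.random.default_rng(0)
w = rng.normal(size=100)           # toy weight vector
b = 0.0
x = rng.normal(size=100)           # a "clean" input
y = 1.0 if sigmoid(w @ x + b) >= 0.5 else 0.0

x_adv = fgsm_perturb(x, y, w, b, eps=0.1)
print("clean score:      ", sigmoid(w @ x + b))
print("adversarial score:", sigmoid(w @ x_adv + b))
```

Because every input component is nudged in the direction that increases the loss, even a small epsilon can shift the linear score by a large amount in high dimensions, which is exactly the linearity argument the abstract makes.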

Journal Article
TL;DR: This paper describes developments that reduce the computational costs of the underlying maximum a posteriori (MAP) algorithm, as well as statistical considerations that yield new insights into the accuracy with which the relative orientations of individual particles may be determined.

4,554 citations
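The TL;DR refers to a maximum a posteriori (MAP) algorithm. As a generic, hedged illustration of how a MAP estimate regularizes a fit and curbs overfitting relative to plain maximum likelihood, the sketch below uses the standard equivalence between a zero-mean Gaussian prior on the weights and ridge (L2-regularized) regression; it is not the cryo-EM refinement algorithm from the paper, and all numbers are illustrative.

```python
# Generic illustration (not the paper's algorithm): a MAP estimate with a
# zero-mean Gaussian prior on the weights is ridge regression, which
# shrinks the coefficients and typically overfits less than the
# maximum-likelihood (ordinary least-squares) fit.
import numpy as np

rng = np.random.default_rng(0)
n, d = 30, 25                        # few samples, many features: easy to overfit
X = rng.normal(size=(n, d))
w_true = np.zeros(d)
w_true[:3] = [2.0, -1.0, 0.5]
y = X @ w_true + 0.5 * rng.normal(size=n)

def fit(X, y, lam):
    # lam = 0  -> maximum likelihood (least squares)
    # lam > 0  -> MAP with a zero-mean Gaussian prior on the weights
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

X_test = rng.normal(size=(1000, d))
y_test = X_test @ w_true + 0.5 * rng.normal(size=1000)

for lam in [0.0, 10.0]:
    w_hat = fit(X, y, lam)
    mse = np.mean((X_test @ w_hat - y_test) ** 2)
    print(f"lambda={lam:5.1f}  test MSE={mse:.3f}")
```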

Posted Content
TL;DR: With enhanced local modeling via the micro network, the proposed deep network structure NIN is able to utilize global average pooling over feature maps in the classification layer, which is easier to interpret and less prone to overfitting than traditional fully connected layers.
Abstract: We propose a novel deep network structure called "Network In Network" (NIN) to enhance model discriminability for local patches within the receptive field. The conventional convolutional layer uses linear filters followed by a nonlinear activation function to scan the input. Instead, we build micro neural networks with more complex structures to abstract the data within the receptive field. We instantiate the micro neural network with a multilayer perceptron, which is a potent function approximator. The feature maps are obtained by sliding the micro networks over the input in a similar manner to CNN; they are then fed into the next layer. Deep NIN can be implemented by stacking multiple of the structures described above. With enhanced local modeling via the micro network, we are able to utilize global average pooling over feature maps in the classification layer, which is easier to interpret and less prone to overfitting than traditional fully connected layers. We demonstrate state-of-the-art classification performance with NIN on CIFAR-10 and CIFAR-100, and reasonable performance on the SVHN and MNIST datasets.

3,905 citations
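The NIN abstract contrasts global average pooling with fully connected classification layers. Here is a minimal NumPy sketch of global average pooling, in which each class is assigned one feature map whose spatial average serves directly as its score; the shapes and random feature maps are illustrative assumptions.

```python
# A minimal sketch of global average pooling (GAP) used in place of a
# fully connected classification layer: one feature map per class, and
# its spatial average is the class score. Shapes are illustrative.
import numpy as np

rng = np.random.default_rng(0)
n_classes, height, width = 10, 8, 8

# Output of the last conv layer: one feature map per class.
feature_maps = rng.normal(size=(n_classes, height, width))

# Global average pooling: average each map over its spatial dimensions.
class_scores = feature_maps.mean(axis=(1, 2))     # shape: (n_classes,)

# Softmax over the pooled scores gives class probabilities; there are
# no extra fully connected parameters to overfit.
probs = np.exp(class_scores - class_scores.max())
probs /= probs.sum()
print("predicted class:", int(np.argmax(probs)))
```

Because the pooled scores come straight from the feature maps, this layer adds no trainable parameters, which is why the abstract calls it less prone to overfitting than fully connected layers.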

Journal Article
30 Jan 1992 - Nature
TL;DR: In this article, a statistical quantity (R_free) is defined to measure the agreement between observed and computed structure factor amplitudes for a 'test' set of reflections that is omitted in the modelling and refinement process.
Abstract: The determination of macromolecular structure by crystallography involves fitting atomic models to the observed diffraction data [1]. The traditional measure of the quality of this fit, and presumably the accuracy of the model, is the R value. Despite stereochemical restraints [2], it is possible to overfit or 'misfit' the diffraction data: an incorrect model can be refined to fairly good R values, as several recent examples have shown [3]. Here I propose a reliable and unbiased indicator of the accuracy of such models. By analogy with the cross-validation method [4,5] of testing statistical models, I define a statistical quantity (R_free) that measures the agreement between observed and computed structure factor amplitudes for a 'test' set of reflections that is omitted in the modelling and refinement process. As examples show, there is a high correlation between R_free and the accuracy of the atomic model phases. This is useful because experimental phase information is usually inaccurate, incomplete or unavailable. I expect that R_free will provide a measure of the information content of recently proposed models of thermal motion and disorder [6-8], time-averaging [9] and bulk solvent [10].

3,714 citations
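The abstract defines a cross-validation statistic computed on a 'test' set of reflections excluded from refinement. Below is a small sketch of how such an R value could be computed from observed and calculated structure-factor amplitudes; the synthetic amplitudes, the function name and the 5% test fraction are illustrative assumptions, not values from the paper.

```python
# A minimal sketch of the free R value idea: flag a random "test" set of
# reflections, exclude it from refinement, and compute the R value on
# that set only. The synthetic amplitudes below are illustrative.
import numpy as np

def r_value(f_obs, f_calc):
    """Crystallographic R value for a set of reflections (amplitudes > 0)."""
    return np.sum(np.abs(f_obs - f_calc)) / np.sum(f_obs)

rng = np.random.default_rng(0)
n_reflections = 10_000
f_obs = rng.uniform(1.0, 100.0, size=n_reflections)          # observed amplitudes
f_calc = f_obs * rng.normal(1.0, 0.05, size=n_reflections)   # model amplitudes

# Randomly flag ~5% of reflections as the held-out ("free") test set.
is_free = rng.random(n_reflections) < 0.05

r_work = r_value(f_obs[~is_free], f_calc[~is_free])   # refined-against reflections
r_free = r_value(f_obs[is_free], f_calc[is_free])     # held-out reflections
print(f"R_work = {r_work:.3f}   R_free = {r_free:.3f}")
```

In an actual refinement the model would be fitted only against the working reflections, so a large gap between the free and working R values signals overfitting; here both sets are synthetic, so the two values are similar by construction.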


Network Information
Related Topics (5)
Deep learning: 79.8K papers, 2.1M citations, 92% related
Support vector machine: 73.6K papers, 1.7M citations, 92% related
Convolutional neural network: 74.7K papers, 2M citations, 91% related
Artificial neural network: 207K papers, 4.5M citations, 91% related
Feature extraction: 111.8K papers, 2.1M citations, 89% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    2,065
2022    3,968
2021    2,035
2020    1,973
2019    1,503
2018    1,120