Open Access Proceedings Article
How Many Samples are Needed to Learn a Convolutional Neural Network
Simon S. Du, Yining Wang, Xiyu Zhai, Sivaraman Balakrishnan, Ruslan Salakhutdinov, Aarti Singh
Vol. 31, pp. 371–381
TLDR
The study of rigorously characterizing the sample complexity of estimating CNNs is initiated, showing that for an $m$-dimensional convolutional filter with linear activation acting on a $d$-dimensional input, the sample complexity of achieving population prediction error $\epsilon$ is $\widetilde{O}(m/\epsilon^2)$, whereas the sample complexity for its FNN counterpart is lower bounded by $\Omega(d/\epsilon^2)$.
Abstract:
A widespread folklore for explaining the success of the convolutional neural network (CNN) is that CNN is a more compact representation than the fully connected neural network (FNN) and thus requires fewer samples for learning. We initiate the study of rigorously characterizing the sample complexity of learning convolutional neural networks. We show that for learning an $m$-dimensional convolutional filter with linear activation acting on a $d$-dimensional input, the sample complexity of achieving population prediction error $\epsilon$ is $\widetilde{O}(m/\epsilon^2)$, whereas its FNN counterpart needs at least $\Omega(d/\epsilon^2)$ samples. Since $m \ll d$, this result demonstrates the advantage of using the CNN. We further consider the sample complexity of learning a one-hidden-layer CNN with linear activation, where both the $m$-dimensional convolutional filter and the $r$-dimensional output weights are unknown. For this model, we show the sample complexity is $\widetilde{O}((m+r)/\epsilon^2)$ when the ratio between the stride size and the filter size is a constant. For both models, we also present lower bounds showing our sample complexities are tight up to logarithmic factors. Our main tools for deriving these results are a localized empirical process analysis and a new lemma characterizing the convolutional structure. We believe these tools may inspire further developments in understanding CNNs.
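The key structural point in the first result can be illustrated with a minimal simulation. The sketch below is not the paper's estimator or proof; it only shows why a linear convolutional filter has $m$ free parameters rather than $d$: with a linear activation, the response depends on the input only through an $m$-dimensional summary of its patches, so the filter can be recovered by ordinary least squares on that summary. The dimensions, stride, and noise level are assumed for illustration.

```python
import numpy as np

# Minimal illustration (not the authors' estimator): with linear activation,
# the CNN output y = sum over patches of <w, patch> equals <w, sum of patches>,
# so estimating the m-dim filter w reduces to least squares with m unknowns,
# independent of the input dimension d.
rng = np.random.default_rng(0)
d, m, stride = 32, 4, 4          # input dim, filter size, stride (assumed values)
n = 200                          # number of samples
w_true = rng.standard_normal(m)  # ground-truth filter

def conv_summary(x, m, stride):
    """Sum the length-m patches of x; linearity lets us sum before the dot product."""
    patches = np.stack([x[i:i + m] for i in range(0, len(x) - m + 1, stride)])
    return patches.sum(axis=0)   # m-dimensional summary of a d-dimensional input

X = rng.standard_normal((n, d))
Phi = np.stack([conv_summary(x, m, stride) for x in X])   # n x m design matrix
y = Phi @ w_true + 0.1 * rng.standard_normal(n)           # noisy linear-CNN responses

w_hat, *_ = np.linalg.lstsq(Phi, y, rcond=None)           # only m unknowns, not d
print(np.linalg.norm(w_hat - w_true))                     # estimation error shrinks with n
```

An FNN fit to the same data would estimate all $d$ weights, which is the source of the $\Omega(d/\epsilon^2)$ versus $\widetilde{O}(m/\epsilon^2)$ gap the paper formalizes.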
Citations
Proceedings Article
Fine-grained analysis of optimization and generalization for overparameterized two-layer neural networks
TL;DR: In this paper, a simple two-layer ReLU network with random initialization is analyzed, and a generalization bound independent of the network size is established.
Posted Content
Generalization bounds for deep convolutional neural networks
Philip M. Long, Hanie Sedghi
TL;DR: Bounds on the generalization error of convolutional networks are proved in terms of the training loss, the number of parameters, the Lipschitz constant of the loss, and the distance from the weights to the initial weights.
Journal ArticleDOI
Using convolutional neural network for predicting cyanobacteria concentrations in river water
TL;DR: This study successfully demonstrated the capability of the CNN model for cyanobacterial bloom prediction using high temporal frequency images and characterized its performance variations across the studied river reach.
Posted Content
Size-free generalization bounds for convolutional neural networks
Philip M. Long, Hanie Sedghi
TL;DR: In this article, the authors prove bounds on the generalization error of convolutional networks in terms of the training loss, the number of parameters, the Lipschitz constant of the loss, and the distance from the weights to the initial weights.
Posted Content
Why Are Convolutional Nets More Sample-Efficient than Fully-Connected Nets?
TL;DR: This work describes a natural task on which a provable sample complexity gap can be shown for standard training algorithms, and exhibits a single target function such that learning it over all possible input distributions leads to an $O(1)$ vs. $\Omega(d^2/\varepsilon)$ gap.
References
Proceedings Article
ImageNet Classification with Deep Convolutional Neural Networks
TL;DR: State-of-the-art ImageNet performance was achieved by a deep convolutional neural network consisting of five convolutional layers, some of which are followed by max-pooling layers, and three fully connected layers with a final 1000-way softmax.
Journal ArticleDOI
ImageNet classification with deep convolutional neural networks
TL;DR: A large, deep convolutional neural network was trained to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes and employed a recently developed regularization method called "dropout" that proved to be very effective.
Journal ArticleDOI
Mastering the game of Go with deep neural networks and tree search
David Silver, Aja Huang, Chris J. Maddison, Arthur Guez, Laurent Sifre, George van den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, Sander Dieleman, Dominik Grewe, John Nham, Nal Kalchbrenner, Ilya Sutskever, Timothy P. Lillicrap, Madeleine Leach, Koray Kavukcuoglu, Thore Graepel, Demis Hassabis
TL;DR: Using this search algorithm, the program AlphaGo achieved a 99.8% winning rate against other Go programs and defeated the human European Go champion by 5 games to 0, the first time that a computer program has defeated a human professional player in the full-sized game of Go.
Book ChapterDOI
Probability Inequalities for Sums of Bounded Random Variables
TL;DR: In this article, upper bounds for the probability that the sum S of n independent random variables exceeds its mean ES by a positive number nt are derived for certain sums of dependent random variables such as U statistics.
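This reference is Hoeffding's inequality, a core tool behind the sample complexity bounds above. As it is commonly stated for independent variables bounded in $[0,1]$ with sum $S$, it gives $P(S - \mathbb{E}S \ge nt) \le \exp(-2nt^2)$. A small empirical check of that bound, with the sample size, threshold, and trial count chosen for illustration:

```python
import math
import random

# Empirical check of Hoeffding's inequality for i.i.d. variables in [0, 1]:
# P(S - E[S] >= n*t) <= exp(-2*n*t**2), where S is the sum of n variables.
random.seed(0)
n, t, trials = 100, 0.1, 20000
bound = math.exp(-2 * n * t * t)   # analytic upper bound, exp(-2) ~= 0.135

# Count how often the sum of n Uniform[0,1] draws exceeds its mean n/2 by n*t.
exceed = sum(
    (sum(random.random() for _ in range(n)) - n * 0.5) >= n * t
    for _ in range(trials)
)
print(exceed / trials, "<=", bound)   # empirical frequency stays below the bound
```

Here the empirical exceedance frequency is far below the Hoeffding bound, which is expected: the inequality holds for any bounded distribution and is loose for this particular one.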