Random search for hyper-parameter optimization

Open Access Journal Article

James Bergstra, +1 more

01 Mar 2012

Journal of Machine Learning Research

Vol. 13, Iss. 1, pp. 281-305

TLDR

This paper shows empirically and theoretically that randomly chosen trials are more efficient for hyper-parameter optimization than trials on a grid, and shows that random search is a natural baseline against which to judge progress in the development of adaptive (sequential) hyper-parameter optimization algorithms.

Abstract:

Grid search and manual search are the most widely used strategies for hyper-parameter optimization. This paper shows empirically and theoretically that randomly chosen trials are more efficient for hyper-parameter optimization than trials on a grid. Empirical evidence comes from a comparison with a large previous study that used grid search and manual search to configure neural networks and deep belief networks. Compared with neural networks configured by a pure grid search, we find that random search over the same domain is able to find models that are as good or better within a small fraction of the computation time. Granting random search the same computational budget, random search finds better models by effectively searching a larger, less promising configuration space. Compared with deep belief networks configured by a thoughtful combination of manual search and grid search, purely random search over the same 32-dimensional configuration space found statistically equal performance on four of seven data sets, and superior performance on one of seven. A Gaussian process analysis of the function from hyper-parameters to validation set performance reveals that for most data sets only a few of the hyper-parameters really matter, but that different hyper-parameters are important on different data sets. This phenomenon makes grid search a poor choice for configuring algorithms for new data sets. Our analysis casts some light on why recent "High Throughput" methods achieve surprising success--they appear to search through a large number of hyper-parameters because most hyper-parameters do not matter much. We anticipate that growing interest in large hierarchical models will place an increasing burden on techniques for hyper-parameter optimization; this work shows that random search is a natural baseline against which to judge progress in the development of adaptive (sequential) hyper-parameter optimization algorithms.
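The abstract's central argument, that a grid wastes trials when only a few hyper-parameters matter, can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the paper's experiments: the two-dimensional search space, the `objective` function in which only `x` affects the score, and the helper names `grid_trials`, `random_trials`, `best_score`, and `distinct_x` are all hypothetical.

```python
import itertools
import random


def objective(x, y):
    # Hypothetical validation score: only x matters, y has no effect.
    return -(x - 0.37) ** 2


def grid_trials(points_per_axis):
    # Evenly spaced cell centers on [0, 1] for each axis, then all combinations.
    axis = [(i + 0.5) / points_per_axis for i in range(points_per_axis)]
    return list(itertools.product(axis, axis))


def random_trials(n, seed=0):
    # n independent uniform draws from the unit square.
    rng = random.Random(seed)
    return [(rng.random(), rng.random()) for _ in range(n)]


def best_score(trials):
    # Highest objective value found among the trials.
    return max(objective(x, y) for x, y in trials)


def distinct_x(trials):
    # Number of different values tried for the one dimension that matters.
    return len({x for x, _ in trials})


grid = grid_trials(3)              # 3 x 3 = 9 trials
rand = random_trials(len(grid))    # same budget of 9 trials

print("grid:  ", distinct_x(grid), "distinct x values, best", best_score(grid))
print("random:", distinct_x(rand), "distinct x values, best", best_score(rand))
```

With a budget of nine trials, the grid tests only three distinct values of the important dimension, while random sampling tests nine. Whether random search finds the better score on a particular run still depends on the seed and on the objective, so this sketch shows the coverage argument rather than a guaranteed outcome.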
