Abstract: Sherpa is a free, open-source hyperparameter optimization library for machine learning models. It is designed for problems with computationally expensive, iterative function evaluations, such as the hyperparameter tuning of deep neural networks. With Sherpa, scientists can quickly optimize hyperparameters using a variety of powerful and interchangeable algorithms. Additionally, the framework makes it easy to implement custom algorithms. Sherpa can be run on either a single machine or a cluster via a grid scheduler with minimal configuration. Finally, an interactive dashboard enables users to view the progress of models as they are trained, cancel trials, and explore which hyperparameter combinations are working best. Sherpa empowers machine learning researchers by automating the tedious aspects of model tuning and by providing an extensible framework for developing automated hyperparameter-tuning strategies. Its source code and documentation are available at https://github.com/LarsHH/sherpa and https://parameter-sherpa.readthedocs.io/, respectively. A demo can be found at https://youtu.be/L95sasMLgP4.

1 Existing Hyperparameter Optimization Libraries

Hyperparameter optimization algorithms for machine learning models have previously been implemented in software packages such as Spearmint [15], HyperOpt [2], Auto-WEKA 2.0 [9], and Google Vizier [5], among others. Spearmint is a Python library based on Bayesian optimization using a Gaussian process. Hyperparameter search spaces are specified using the markup language YAML, and runs are distributed on a grid via SGE and MongoDB. Overall, it combines Bayesian optimization with support for distributed training. HyperOpt is a hyperparameter optimization framework that uses MongoDB to allow parallel computation. The user manually starts workers, which receive tasks from the HyperOpt instance. It offers Random Search and Bayesian optimization based on a Tree of Parzen Estimators.
Auto-WEKA 2.0 implements the SMAC [6] algorithm for automatic model selection and hyperparameter optimization within the WEKA machine learning framework. It provides a graphical user interface and supports parallel runs on a single machine. It is meant to be accessible to novice users and specifically targets the problem of choosing a model. Auto-WEKA is related to Auto-Sklearn [4] and Auto-Net [11], which specifically focus on tuning Scikit-Learn models and fully-connected neural networks in Lasagne, respectively. Auto-WEKA, Auto-Sklearn, and Auto-Net take an end-to-end automatic approach. This makes them easy for novice users to adopt, but restricts the user to the respective machine learning library and the models it implements. In contrast, our work aims to give the user more flexibility in the choice of library, model, and hyperparameter optimization algorithm.

32nd Conference on Neural Information Processing Systems (NIPS 2018), Montréal, Canada.

Table 1: Comparison to Existing Libraries

                   Spearmint   Auto-WEKA   HyperOpt   Google Vizier   Sherpa
Early Stopping     No          No          No         Yes             Yes
Dashboard/GUI      Yes         Yes         No         Yes             Yes
Distributed        Yes         No          Yes        Yes             Yes
Open Source        Yes         Yes         Yes        No              Yes
# of Algorithms    2           1           2          3               5

Google Vizier is a service provided by Google for its cloud machine learning platform. It incorporates recent innovations in Bayesian optimization, such as transfer learning, and provides visualizations via a dashboard. Google Vizier offers many key features of a modern hyperparameter optimization tool to Google Cloud users and Google engineers, but it is not available in an open-source version. A similar situation occurs with other cloud-based platforms such as Microsoft Azure Hyperparameter Tuning 1 and Amazon SageMaker's Hyperparameter Optimization 2.

2 Need for a new library

The field of machine learning has experienced massive growth over recent years.
Access to open-source machine learning libraries such as Scikit-Learn [14], Keras [3], TensorFlow [1], PyTorch [13], and Caffe [8] has allowed machine learning research to be widely reproduced by the community, making it easy for practitioners to apply state-of-the-art methods to real-world problems. The field of hyperparameter optimization for machine learning has also seen many recent innovations, such as Hyperband [10], Population Based Training [7], Neural Architecture Search [17], and advances in Bayesian optimization such as [16]. While the basic implementation of some of these algorithms can be trivial, evaluating trials in a distributed fashion and keeping track of results quickly becomes cumbersome, which makes it difficult for users to apply these algorithms to real problems. In short, Sherpa aims to curate implementations of these algorithms while providing the infrastructure to run them in a distributed way. The aim is for the platform to scale from a laptop to a computation grid.
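To make the "trivial basic implementation" point concrete, the following is a minimal sketch (not Sherpa's actual API; the function names, search space, and toy objective are illustrative assumptions) of the suggest/observe trial loop that such libraries mediate, using plain random search. It is the bookkeeping around this loop — distributing trials and tracking their results — that a library like Sherpa takes over.

```python
import random

def random_search(space, objective, num_trials=20, seed=0):
    """Sample hyperparameters uniformly from `space` and keep the best trial.

    `space` maps each hyperparameter name to a (low, high) range;
    `objective` stands in for an expensive model-training run.
    """
    rng = random.Random(seed)
    best = None
    for _ in range(num_trials):
        # "Suggest" step: draw one value per hyperparameter.
        params = {name: rng.uniform(lo, hi) for name, (lo, hi) in space.items()}
        # "Observe" step: evaluate the trial and record its objective.
        loss = objective(params)
        if best is None or loss < best[1]:
            best = (params, loss)
    return best

# Toy quadratic objective with a known optimum, standing in for validation loss.
space = {"lr": (1e-4, 1e-1), "dropout": (0.0, 0.5)}
best_params, best_loss = random_search(
    space, lambda p: (p["lr"] - 0.01) ** 2 + (p["dropout"] - 0.2) ** 2
)
```

A real deployment would replace the lambda with a full training run and, crucially, would need to farm out each iteration of the loop to a cluster node and persist every observation — exactly the infrastructure concerns described above.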