Big Data applications are typically associated with systems involving large numbers of users, massive complex software systems, and large-scale heterogeneous computing and storage architectures. The construction of such systems involves many distributed design choices. The end products (e.g., recommendation systems, medical analysis tools, real-time game engines, speech recognizers) thus involve many tunable configuration parameters. These parameters are often specified and hard-coded into the software by various developers or teams. If optimized jointly, these parameters can result in significant improvements. Bayesian optimization is a powerful tool for the joint optimization of design choices that is gaining great popularity in recent years. It promises greater automation so as to increase both product quality and human productivity. This review paper introduces Bayesian optimization, highlights some of its methodological aspects, and showcases a wide range of applications.

/pdf/taking-the-human-out-of-the-loop-a-review-of-bayesian-4olcznoudw.pdf

Taking the Human Out of the Loop: A Review of Bayesian Optimization

A multi-armed bandit problem - or, simply, a bandit problem - is a sequential allocation problem defined by a set of actions. At each time step, a unit resource is allocated to an action and some observable payoff is obtained. The goal is to maximize the total payoff obtained in a sequence of allocations. The name bandit refers to the colloquial term for a slot machine (a "one-armed bandit" in American slang). In a casino, a sequential allocation problem is obtained when the player is facing many slot machines at once (a "multi-armed bandit"), and must repeatedly choose where to insert the next coin. Multi-armed bandit problems are the most basic examples of sequential decision problems with an exploration-exploitation trade-off. This is the balance between staying with the option that gave highest payoffs in the past and exploring new options that might give higher payoffs in the future. Although the study of bandit problems dates back to the 1930s, exploration-exploitation trade-offs arise in several modern applications, such as ad placement, website optimization, and packet routing. Mathematically, a multi-armed bandit is defined by the payoff process associated with each option. In this book, the focus is on two extreme cases in which the analysis of regret is particularly simple and elegant: independent and identically distributed payoffs and adversarial payoffs. Besides the basic setting of finitely many actions, it also analyzes some of the most important variants and extensions, such as the contextual bandit model. This monograph is an ideal reference for students and researchers with an interest in bandit problems.

Regret Analysis of Stochastic and Nonstochastic Multi-Armed Bandit Problems

The collection of digital information by governments, corporations, and individuals has created tremendous opportunities for knowledge- and information-based decision making. Driven by mutual benefits, or by regulations that require certain data to be published, there is a demand for the exchange and publication of data among various parties. Data in its original form, however, typically contains sensitive information about individuals, and publishing such data will violate individual privacy. The current practice in data publishing relies mainly on policies and guidelines as to what types of data can be published and on agreements on the use of published data. This approach alone may lead to excessive data distortion or insufficient protection. Privacy-preserving data publishing (PPDP) provides methods and tools for publishing useful information while preserving data privacy. Recently, PPDP has received considerable attention in research communities, and many approaches have been proposed for different data publishing scenarios. In this survey, we will systematically summarize and evaluate different approaches to PPDP, study the challenges in practical data publishing, clarify the differences and requirements that distinguish PPDP from other related problems, and propose future research directions.

/pdf/privacy-preserving-data-publishing-a-survey-of-recent-4o22vc981u.pdf

Privacy-preserving data publishing: A survey of recent developments

We introduce a new, efficient, principled and backpropagation-compatible algorithm for learning a probability distribution on the weights of a neural network, called Bayes by Backprop. It regularises the weights by minimising a compression cost, known as the variational free energy or the expected lower bound on the marginal likelihood. We show that this principled kind of regularisation yields comparable performance to dropout on MNIST classification. We then demonstrate how the learnt uncertainty in the weights can be used to improve generalisation in non-linear regression problems, and how this weight uncertainty can be used to drive the exploration-exploitation trade-off in reinforcement learning.

Weight Uncertainty in Neural Networks

/pdf/weight-uncertainty-in-neural-network-3viy2x7fhw.pdf

Weight Uncertainty in Neural Network

The multi-armed bandit problem is a popular model for studying exploration/exploitation trade-off in sequential decision problems. Many algorithms are now available for this well-studied problem. One of the earliest algorithms, given by W. R. Thompson, dates back to 1933. This algorithm, referred to as Thompson Sampling, is a natural Bayesian algorithm. The basic idea is to choose an arm to play according to its probability of being the best arm. Thompson Sampling algorithm has experimentally been shown to be close to optimal. In addition, it is efficient to implement and exhibits several desirable properties such as small regret for delayed feedback. However, theoretical understanding of this algorithm was quite limited. In this paper, for the first time, we show that Thompson Sampling algorithm achieves logarithmic expected regret for the stochastic multi-armed bandit problem. More precisely, for the stochastic two-armed bandit problem, the expected regret in timeT isO( lnT + 1 3 ). And, for the stochasticN -armed bandit problem, the expected regret in time T is O( h ( P N 1 2 ) 2 i lnT ). Our bounds are optimal but for the dependence on i and the

/pdf/analysis-of-thompson-sampling-for-the-multi-armed-bandit-2zbnp2qyn5.pdf

Analysis of Thompson Sampling for the Multi-armed Bandit Problem

Thompson Sampling is one of the oldest heuristics for multi-armed bandit problems. It is a randomized algorithm based on Bayesian ideas, and has recently generated significant interest after several studies demonstrated it to have better empirical performance compared to the state-of-the-art methods. However, many questions regarding its theoretical performance remained open. In this paper, we design and analyze a generalization of Thompson Sampling algorithm for the stochastic contextual multi-armed bandit problem with linear payoff functions, when the contexts are provided by an adaptive adversary. This is among the most important and widely studied version of the contextual bandits problem. We prove a high probability regret bound of O(d2/e√T1+e) in time T for any 0 < e < 1, where d is the dimension of each context vector and e is a parameter used by the algorithm. Our results provide the first theoretical guarantees for the contextual version of Thompson Sampling, and are close to the lower bound of Ω(d√T) for this problem. This essentially solves a COLT open problem of Chapelle and Li [COLT 2012].

/pdf/thompson-sampling-for-contextual-bandits-with-linear-payoffs-1u65b6udmm.pdf

Thompson Sampling for Contextual Bandits with Linear Payoffs

Thompson Sampling is one of the oldest heuristics for multi-armed bandit problems. It is a randomized algorithm based on Bayesian ideas, and has recently generated significant interest after several studies demonstrated it to have comparable or better empirical performance compared to the state of the art methods. In this paper, we provide a novel regret analysis for Thompson Sampling that proves the first near-optimal problem-independent bound of O( √ NT lnT ) on the expected regret of this algorithm. Our novel martingale-based analysis techniques are conceptually simple, and easily extend to distributions other than the Beta distribution. For the version of Thompson Sampling that uses Gaussian priors, we prove a problem-independent bound of O( √ NT lnN) on the expected regret, and demonstrate the optimality of this bound by providing a matching lower bound. This lower bound of Ω( √ NT lnN) is the first lower bound on the performance of a natural version of Thompson Sampling that is away from the general lower bound of O( √ NT ) for the multi-armed bandit problem. Our near-optimal problem-independent bounds for Thompson Sampling solve a COLT 2012 open problem of Chapelle and Li. Additionally, our techniques simultaneously provide the optimal problem-dependent bound of (1+ ǫ) ∑ i lnT d(μi,μ1) +O(Nǫ2 ) on the expected regret. The optimal problem-dependent regret bound for this problem was first proven recently by Kaufmann et al. [2012b]. Appearing in Proceedings of the 16 International Conference on Artificial Intelligence and Statistics (AISTATS) 2013, Scottsdale, AZ, USA. Volume 31 of JMLR: W&CP 31. Copyright 2013 by the authors.

/pdf/further-optimal-regret-bounds-for-thompson-sampling-1bpja2rtxz.pdf

Further Optimal Regret Bounds for Thompson Sampling

A natural optimization model that formulates many online resource allocation problems is the online linear programming LP problem in which the constraint matrix is revealed column by column along with the corresponding objective coefficient. In such a model, a decision variable has to be set each time a column is revealed without observing the future inputs, and the goal is to maximize the overall objective function. In this paper, we propose a near-optimal algorithm for this general class of online problems under the assumptions of random order of arrival and some mild conditions on the size of the LP right-hand-side input. Specifically, our learning-based algorithm works by dynamically updating a threshold price vector at geometric time intervals, where the dual prices learned from the revealed columns in the previous period are used to determine the sequential decisions in the current period. Through dynamic learning, the competitiveness of our algorithm improves over the past study of the same problem. We also present a worst case example showing that the performance of our algorithm is near optimal.

/pdf/a-dynamic-near-optimal-algorithm-for-online-linear-2bfi2mpktc.pdf

A Dynamic Near-Optimal Algorithm for Online Linear Programming

The multi-armed bandit problem is a popular model for studying exploration/exploitation trade-off in sequential decision problems. Many algorithms are now available for this well-studied problem. One of the earliest algorithms, given by W. R. Thompson, dates back to 1933. This algorithm, referred to as Thompson Sampling, is a natural Bayesian algorithm. The basic idea is to choose an arm to play according to its probability of being the best arm. Thompson Sampling algorithm has experimentally been shown to be close to optimal. In addition, it is efficient to implement and exhibits several desirable properties such as small regret for delayed feedback. However, theoretical understanding of this algorithm was quite limited. In this paper, for the first time, we show that Thompson Sampling algorithm achieves logarithmic expected regret for the multi-armed bandit problem. More precisely, for the two-armed bandit problem, the expected regret in time $T$ is $O(\frac{\ln T}{\Delta} + \frac{1}{\Delta^3})$. And, for the $N$-armed bandit problem, the expected regret in time $T$ is $O([(\sum_{i=2}^N \frac{1}{\Delta_i^2})^2] \ln T)$. Our bounds are optimal but for the dependence on $\Delta_i$ and the constant factors in big-Oh.

Shipra Agrawal

Papers

Analysis of Thompson Sampling for the Multi-armed Bandit Problem

Thompson Sampling for Contextual Bandits with Linear Payoffs

Further Optimal Regret Bounds for Thompson Sampling

A Dynamic Near-Optimal Algorithm for Online Linear Programming

Analysis of Thompson Sampling for the multi-armed bandit problem