Open Access Proceedings Article

Second-order quantile methods for experts and combinatorial games

TLDR
In this article, the authors study second-order regret bounds and quantile bounds for sequential decision making, and design strategies that adjust to the difficulty of the learning problem, both in the setting of prediction with expert advice and for more general combinatorial decision tasks.
Abstract
We aim to design strategies for sequential decision making that adjust to the difficulty of the learning problem. We study this question both in the setting of prediction with expert advice, and for more general combinatorial decision tasks. We are not satisfied with just guaranteeing minimax regret rates, but we want our algorithms to perform significantly better on easy data. Two popular ways to formalize such adaptivity are second-order regret bounds and quantile bounds. The underlying notions of ‘easy data’, which may be paraphrased as “the learning problem has small variance” and “multiple decisions are useful”, are synergetic. But even though there are sophisticated algorithms that exploit one of the two, no existing algorithm is able to adapt to both. The difficulty in combining the two notions lies in tuning a parameter called the learning rate, whose optimal value behaves non-monotonically. We introduce a potential function for which (very surprisingly!) it is sufficient to simply put a prior on learning rates; an approach that does not work for any previous method. By choosing the right prior we construct efficient algorithms and show that they reap both benefits by proving the first bounds that are both second-order and incorporate quantiles.
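The abstract's central idea — mixing over a prior on learning rates inside the potential, rather than tuning a single learning rate — can be illustrated with a loose sketch. This is not the paper's exact potential or prior (those differ in important details); the function name and the discrete prior grid here are hypothetical, chosen only to show the shape of the idea.

```python
import numpy as np

def squint_style_weights(R, V, etas, prior_eta, prior_expert):
    """Expert weights from a potential that mixes over a discrete prior
    on learning rates. Illustrative sketch only; the paper's actual
    potential and prior differ in the details.

    R[k] : cumulative regret of expert k w.r.t. the algorithm's loss
    V[k] : cumulative squared (second-order) regret of expert k
    """
    mix = np.zeros_like(R, dtype=float)
    for eta, p in zip(etas, prior_eta):
        # each candidate learning rate contributes exp(eta*R - eta^2*V),
        # so no single eta has to be tuned in advance
        mix += p * np.exp(eta * R - eta**2 * V)
    w = prior_expert * mix  # a non-uniform expert prior is what yields quantile-style bounds
    return w / w.sum()
```

An expert whose cumulative regret `R[k]` is large (it has been outperforming the algorithm) gains weight, while a large second-order term `V[k]` tempers the update; the mixture over `etas` stands in for the single, hard-to-tune learning rate.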



Citations
Journal ArticleDOI

Online learning: A comprehensive survey

TL;DR: Online learning, as surveyed in this paper, is a family of machine learning methods in which a learner tackles a predictive (or other decision-making) task by learning from a sequence of data instances one at a time.
Posted Content

A Modern Introduction to Online Learning.

Francesco Orabona
31 Dec 2019
TL;DR: This monograph introduces the basic concepts of Online Learning through a modern view of Online Convex Optimization, and presents first-order and second-order algorithms for online learning with convex losses, in Euclidean and non-Euclidean settings.
Posted Content

More Adaptive Algorithms for Adversarial Bandits

TL;DR: The main idea of the algorithm is to apply optimism and adaptivity techniques to the well-known Online Mirror Descent framework with a special log-barrier regularizer, yielding appropriate optimistic predictions and correction terms within this framework.
Proceedings Article

Learning in Games: Robustness of Fast Convergence

TL;DR: It is shown that learning algorithms satisfying a low approximate regret property experience fast convergence to approximate optimality in a large class of repeated games.
Posted Content

Mixture Martingales Revisited with Applications to Sequential Tests and Confidence Intervals

TL;DR: New deviation inequalities that are valid uniformly in time under adaptive sampling in a multi-armed bandit model are presented, allowing us to analyze stopping rules based on generalized likelihood ratios for a large class of sequential identification problems, and to construct tight confidence intervals for some functions of the means of the arms.
References
Journal ArticleDOI

A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting

TL;DR: The model studied can be interpreted as a broad, abstract extension of the well-studied on-line prediction model to a general decision-theoretic setting, and it is shown that the multiplicative weight-update Littlestone–Warmuth rule can be adapted to this model, yielding bounds that are slightly weaker in some cases, but applicable to a considerably more general class of learning problems.
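The multiplicative weight-update rule referred to above can be sketched in a few lines. This is a minimal illustration of one round of the update with a fixed learning rate `eta` (an assumed parameter here), not the paper's full decision-theoretic construction.

```python
import numpy as np

def hedge_update(weights, losses, eta):
    """One round of the multiplicative weight-update rule: scale each
    expert's weight by exp(-eta * loss) and renormalize."""
    w = np.asarray(weights, dtype=float) * np.exp(-eta * np.asarray(losses, dtype=float))
    return w / w.sum()
```

Experts with smaller losses retain proportionally more weight, so the algorithm's weighted-average prediction concentrates on the better-performing experts over time.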
Book

Prediction, learning, and games

TL;DR: In this paper, the authors provide a comprehensive treatment of the problem of predicting individual sequences using expert advice, a general framework within which many related problems can be cast and discussed, such as repeated game playing, adaptive data compression, sequential investment in the stock market, sequential pattern analysis, and several other problems.
Journal ArticleDOI

The weighted majority algorithm

TL;DR: A simple and effective method, based on weighted voting, is introduced for constructing a compound algorithm, which is robust in the presence of errors in the data, and is called the Weighted Majority Algorithm.
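The weighted-voting scheme described above can be sketched as follows. This is a minimal illustration of the idea for binary predictions, not the paper's exact formulation; the penalty factor `beta` is an assumed parameter.

```python
def weighted_majority_predict(weights, predictions):
    """Predict the binary label (0/1) that carries the larger total weight."""
    vote_one = sum(w for w, p in zip(weights, predictions) if p == 1)
    vote_zero = sum(w for w, p in zip(weights, predictions) if p == 0)
    return 1 if vote_one >= vote_zero else 0

def weighted_majority_update(weights, predictions, outcome, beta=0.5):
    """Multiply the weight of every expert that erred by beta < 1."""
    return [w * (beta if p != outcome else 1.0)
            for w, p in zip(weights, predictions)]
```

Because erring experts lose a constant fraction of their weight each round, the compound algorithm's mistakes can be bounded in terms of the best expert's mistakes, which is the robustness property the summary highlights.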
Book

Online Learning and Online Convex Optimization

TL;DR: A modern overview of online learning is provided to give the reader a sense of some of the interesting ideas and in particular to underscore the centrality of convexity in deriving efficient online learning algorithms.
Proceedings ArticleDOI

Aggregating strategies