Proceedings ArticleDOI

How to use expert advice

TLDR
This work analyzes algorithms that predict a binary value by combining the predictions of several prediction strategies, called 'experts', and shows how this leads to certain kinds of pattern recognition/learning algorithms with performance bounds that improve on the best results currently known in this context.
Abstract
We analyze algorithms that predict a binary value by combining the predictions of several prediction strategies, called 'experts'. Our analysis is for worst-case situations, i.e., we make no assumptions about the way the sequence of bits to be predicted is generated. We measure the performance of the algorithm by the difference between the expected number of mistakes it makes on the bit sequence and the expected number of mistakes made by the best expert on this sequence, where the expectation is taken with respect to the randomization in the predictions. We show that the minimum achievable difference is on the order of the square root of the number of mistakes of the best expert, and we give efficient algorithms that achieve this. Our upper and lower bounds have matching leading constants in most cases. We then show how this leads to certain kinds of pattern recognition/learning algorithms with performance bounds that improve on the best results currently known in this context. We also extend our analysis to the case in which log loss is used instead of the expected number of mistakes.
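As a concrete illustration of the style of algorithm analyzed here, below is a minimal sketch of a randomized exponentially weighted forecaster over a pool of experts. The function name, the learning rate eta, and its default value are illustrative assumptions, not the paper's exact algorithm or tuning.

    import math
    import random

    def predict_with_expert_advice(expert_preds, outcomes, eta=0.5):
        """Randomized exponential-weights forecaster for a binary sequence.

        expert_preds: list of T rounds, each a list of 0/1 predictions, one per expert.
        outcomes:     list of T observed bits.
        Returns the algorithm's (random) number of mistakes.
        """
        weights = [1.0] * len(expert_preds[0])
        mistakes = 0
        for preds, y in zip(expert_preds, outcomes):
            total = sum(weights)
            # Predict 1 with probability equal to the weighted fraction of experts voting 1.
            p_one = sum(w for w, p in zip(weights, preds) if p == 1) / total
            guess = 1 if random.random() < p_one else 0
            mistakes += int(guess != y)
            # Multiplicatively shrink the weight of every mistaken expert.
            weights = [w * math.exp(-eta) if p != y else w
                       for w, p in zip(weights, preds)]
        return mistakes

With eta tuned on the order of sqrt(ln(n) / L*) for n experts and a best-expert mistake count L*, the gap between the forecaster's expected mistakes and the best expert's mistakes grows as the square root of L*, matching the order stated in the abstract.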


Citations
Journal ArticleDOI

A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting

TL;DR: The model studied can be interpreted as a broad, abstract extension of the well-studied on-line prediction model to a general decision-theoretic setting, and it is shown that the multiplicative weight-update rule of Littlestone and Warmuth can be adapted to this model, yielding bounds that are slightly weaker in some cases, but applicable to a considerably more general class of learning problems.
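For reference, the multiplicative weight-update rule mentioned in this TL;DR can be sketched in a few lines; the name hedge_update and the parameter beta follow common expositions of the Hedge algorithm and are assumptions here, not the paper's notation.

    def hedge_update(weights, losses, beta=0.9):
        """One round of a multiplicative weight-update (Hedge-style) rule.

        weights: current expert weights (positive floats).
        losses:  this round's per-expert losses, each in [0, 1].
        beta:    update factor in (0, 1); smaller values react more aggressively.
        Returns the updated weights.
        """
        return [w * (beta ** loss) for w, loss in zip(weights, losses)]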
Book

Pattern recognition and neural networks

TL;DR: Professor Ripley brings together two crucial ideas in pattern recognition; statistical methods and machine learning via neural networks in this self-contained account.

Journal ArticleDOI

Selection of relevant features and examples in machine learning

TL;DR: This survey reviews work in machine learning on methods for handling data sets containing large amounts of irrelevant information and describes the advances that have been made in both empirical and theoretical work in this area.
Journal ArticleDOI

The Nonstochastic Multiarmed Bandit Problem

TL;DR: A solution is given to the bandit problem in which an adversary, rather than a well-behaved stochastic process, has complete control over the payoffs.
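The flavor of that adversarial bandit setting can be conveyed by an Exp3-style round, sketched below under assumed names and conventions (uniform exploration mixed into an exponential-weights distribution, with an importance-weighted reward estimate); this is an illustration, not the paper's exact algorithm.

    import math
    import random

    def exp3_step(weights, gamma, draw_reward):
        """One round of an Exp3-style strategy for the adversarial bandit problem.

        weights:     current arm weights (positive floats).
        gamma:       exploration rate in (0, 1].
        draw_reward: callable mapping the chosen arm index to a reward in [0, 1].
        Returns (chosen arm, updated weights).
        """
        k = len(weights)
        total = sum(weights)
        # Mix the exponential-weights distribution with uniform exploration.
        probs = [(1 - gamma) * w / total + gamma / k for w in weights]
        arm = random.choices(range(k), weights=probs)[0]
        reward = draw_reward(arm)
        # Importance-weighted estimate: only the pulled arm's weight is updated.
        new_weights = list(weights)
        new_weights[arm] *= math.exp(gamma * (reward / probs[arm]) / k)
        return arm, new_weights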
References
Book

Neural Networks: A Comprehensive Foundation

Simon Haykin
TL;DR: Thorough, well-organized, and completely up to date, this book examines all the important aspects of this emerging technology, including the learning process, back-propagation learning, radial-basis function networks, self-organizing systems, modular networks, temporal processing and neurodynamics, and VLSI implementation of neural networks.
Journal ArticleDOI

Modeling by shortest data description

Jorma Rissanen
01 Sep 1978
TL;DR: The number of digits it takes to write down an observed sequence x1, ..., xN of a time series depends on the model with its parameters that one assumes to have generated the observed data.
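The description-length idea can be made concrete with a toy two-part code for a binary sequence under a Bernoulli model; the function name and the fixed parameter-precision cost param_bits are assumptions for illustration only.

    import math

    def two_part_code_length(bits, param_bits=8):
        """Two-part description length (in bits) of a 0/1 sequence.

        First part:  encode the estimated Bernoulli parameter p to a fixed precision.
        Second part: encode the data with the code matched to that p.
        A model that fits the data well yields a shorter second part.
        """
        n = len(bits)
        ones = sum(bits)
        p = ones / n
        if p in (0.0, 1.0):  # a constant sequence costs only the parameter
            return param_bits
        data_bits = -(ones * math.log2(p) + (n - ones) * math.log2(1 - p))
        return param_bits + data_bits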
Proceedings ArticleDOI

A theory of the learnable

TL;DR: This paper regards learning as the phenomenon of knowledge acquisition in the absence of explicit programming, and gives a precise methodology for studying this phenomenon from a computational viewpoint.
Book

Estimation of Dependences Based on Empirical Data

TL;DR: A foundational monograph on estimating functional dependences from empirical data, advocating direct inference of the quantity of interest rather than generalization through intermediate modeling steps.
Journal ArticleDOI

The weighted majority algorithm

TL;DR: A simple and effective method, based on weighted voting, is introduced for constructing a compound algorithm from a pool of prediction algorithms; the method, called the Weighted Majority Algorithm, is robust in the presence of errors in the data.
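The weighted-voting rule summarized above admits a very short sketch; the function name below is an illustrative choice, with the penalty factor 1/2 being the classic halving penalty in the Weighted Majority Algorithm.

    def weighted_majority_round(weights, preds, outcome, penalty=0.5):
        """One round of a deterministic Weighted Majority-style rule.

        weights: current expert weights (positive floats).
        preds:   each expert's 0/1 prediction this round.
        outcome: the observed bit.
        Returns (algorithm's prediction, updated weights).
        """
        vote_one = sum(w for w, p in zip(weights, preds) if p == 1)
        vote_zero = sum(w for w, p in zip(weights, preds) if p == 0)
        guess = 1 if vote_one >= vote_zero else 0
        # Multiply the weight of every mistaken expert by the penalty factor.
        new_weights = [w * penalty if p != outcome else w
                       for w, p in zip(weights, preds)]
        return guess, new_weights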