Michal Valko

Researcher at École Normale Supérieure

Publications -  171
Citations -  6266

Michal Valko is an academic researcher at École Normale Supérieure. The author has contributed to research on topics including regret and reinforcement learning. The author has an h-index of 26 and has co-authored 169 publications receiving 3088 citations. Previous affiliations of Michal Valko include the University of Pittsburgh and the French Institute for Research in Computer Science and Automation.

Papers
Posted Content

Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning

TL;DR: This work introduces Bootstrap Your Own Latent (BYOL), a new approach to self-supervised image representation learning that performs on par with or better than the current state of the art on both transfer and semi-supervised benchmarks.
Proceedings Article

Fixed-Confidence Guarantees for Bayesian Best-Arm Identification

TL;DR: In this article, the authors investigate and provide new insights on the sampling rule called Top-Two Thompson Sampling (TTTS), and justify its use for fixed-confidence best-arm identification.
Posted Content

Finite-Time Analysis of Kernelised Contextual Bandits

TL;DR: This work proposes KernelUCB, a kernelised UCB algorithm, and gives a cumulative regret bound through a frequentist analysis. The bound improves on the regret bound of GP-UCB for the agnostic case, both in terms of the kernel-dependent quantity and the RKHS norm of the reward function.
Journal Article

Outlier detection for patient monitoring and alerting

TL;DR: The hypothesis is that a patient-management decision that is unusual with respect to past patient care may be due to an error, and that it is worthwhile to generate an alert when such a decision is encountered. The study concludes that outlier-based alerting can lead to promising true alert rates.
Proceedings Article

Efficient learning by implicit exploration in bandit problems with side observations

TL;DR: This work proposes the first algorithm that enjoys near-optimal regret guarantees without having to know the observation system before selecting its actions and defines a new partial information setting that models online combinatorial optimization problems where the feedback received by the learner is between semi-bandit and full feedback.