Wei Chen
Researcher at Microsoft
Publications - 226
Citations - 14625
Wei Chen is an academic researcher at Microsoft. He has contributed to research on topics including maximization and greedy algorithms. The author has an h-index of 47 and has co-authored 226 publications receiving 12843 citations. Previous affiliations of Wei Chen include University of British Columbia and Stony Brook University.
Papers
Journal Article
Combinatorial multi-armed bandit and its extension to probabilistically triggered arms
TL;DR: In this article, the authors define a general framework for combinatorial multi-armed bandit (CMAB) problems, where subsets of base arms with unknown distributions form super arms and the reward of the super arm depends on the outcomes of all played arms.
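The CMAB framework pairs base arms (with unknown reward distributions) with an offline oracle that selects a super arm from per-arm estimates. A minimal sketch of one round of a CUCB-style algorithm, assuming illustrative names and a toy top-k oracle (this is not the paper's code):

```python
import math

def cucb_round(counts, means, t, oracle, k):
    """One round of a CUCB-style combinatorial bandit (illustrative sketch).

    counts[i], means[i]: play count and empirical mean of base arm i.
    oracle: offline oracle mapping adjusted means to a super arm (arm subset).
    """
    # Upper confidence bound for each base arm, capped at 1.0;
    # unplayed arms get the optimistic value 1.0.
    ucb = [
        min(1.0, means[i] + math.sqrt(1.5 * math.log(t) / counts[i]))
        if counts[i] > 0 else 1.0
        for i in range(k)
    ]
    # The oracle picks the super arm from the inflated estimates.
    return oracle(ucb)

# Toy oracle: the super arm is the top-2 base arms by adjusted mean.
def top2(vals):
    return sorted(range(len(vals)), key=lambda i: -vals[i])[:2]

super_arm = cucb_round([50, 10, 30], [0.2, 0.9, 0.5], t=100, oracle=top2, k=3)  # → [1, 2]
```

The confidence radius shrinks with each arm's play count, so under-explored arms are favored until their estimates tighten; the regret guarantees in the paper apply to the full framework, not this toy.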
Proceedings Article
Combinatorial Pure Exploration of Multi-Armed Bandits
TL;DR: This paper presents general learning algorithms that work for all decision classes admitting offline maximization oracles, in both the fixed-confidence and fixed-budget settings, and establishes a general problem-dependent lower bound for the CPE problem.
Proceedings Article
Combinatorial multi-armed bandit: general framework, results and applications
Wei Chen, Yajun Wang, Yang Yuan +2 more
TL;DR: The regret analysis is tight in that it matches the bound for the classical MAB problem up to a constant factor, and it significantly improves the regret bound in a recent paper on combinatorial bandits with linear rewards.
Book ChapterDOI
Dual supervised learning
TL;DR: Dual supervised learning trains the models of two dual tasks simultaneously and explicitly exploits the probabilistic correlation between them to regularize the training process, which can improve the practical performance of both tasks.
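The probabilistic correlation exploited here is that, ideally, P(x)P(y|x) = P(y)P(x|y) for dual tasks. A hedged numerical sketch of the resulting duality regularizer, with illustrative names (not the paper's implementation):

```python
def duality_gap(log_px, log_py_given_x, log_py, log_px_given_y):
    """Squared duality regularizer sketch: penalizes disagreement between
    log P(x) + log P(y|x) and log P(y) + log P(x|y), which should be
    equal for perfectly consistent dual models."""
    gap = (log_px + log_py_given_x) - (log_py + log_px_given_y)
    return gap * gap

# Consistent dual models incur zero penalty: (-1.0 - 0.5) == (-0.7 - 0.8).
duality_gap(-1.0, -0.5, -0.7, -0.8)  # → 0.0
```

In training, such a term would be added to each task's supervised loss so that the two directions regularize each other.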
Posted Content
Combinatorial Multi-Armed Bandit and Its Extension to Probabilistically Triggered Arms
TL;DR: The regret analysis is tight in that it matches the bound of the UCB1 algorithm (up to a constant factor) for the classical MAB problem, and it significantly improves the regret bound in an earlier paper on combinatorial bandits with linear rewards.