Open Access Proceedings Article

Learning with Average Top-k Loss

TLDR
In this paper, the average top-k loss is introduced as a new ensemble loss for supervised learning. It is shown that the average top-k loss leads to convex optimization problems that can be solved effectively with conventional sub-gradient based methods.
Abstract
In this work, we introduce the average top-$k$ (\atk) loss as a new ensemble loss for supervised learning. The \atk loss provides a natural generalization of the two widely used ensemble losses, namely the average loss and the maximum loss. Furthermore, the \atk loss combines their advantages and can alleviate their corresponding drawbacks to better adapt to different data distributions. We show that the \atk loss affords an intuitive interpretation that reduces the penalty of continuous and convex individual losses on correctly classified data. The \atk loss can lead to convex optimization problems that can be solved effectively with conventional sub-gradient based methods. We further study the statistical learning theory of \matk by establishing its classification calibration and statistical consistency, which provide useful insights into the practical choice of the parameter $k$. We demonstrate the applicability of \matk learning combined with different individual loss functions for binary and multi-class classification and regression using synthetic and real datasets.
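As a concrete illustration of the ensemble loss, the minimal sketch below computes the \atk loss over a batch of per-sample losses, both directly as the mean of the $k$ largest losses and via the equivalent form $\frac{1}{k}\min_{\lambda}\{k\lambda + \sum_i[\ell_i - \lambda]_+\}$ that underlies the convexity argument. The hinge individual loss and the function names are illustrative assumptions, not the paper's reference implementation.

```python
import numpy as np

def individual_hinge_losses(w, b, X, y):
    """Per-sample hinge losses for a linear scorer with labels in {-1, +1};
    an illustrative choice of individual loss, not mandated by the AT_k framework."""
    margins = y * (X @ w + b)
    return np.maximum(0.0, 1.0 - margins)

def average_top_k_loss(losses, k):
    """Average of the k largest individual losses (the AT_k ensemble loss).
    k = 1 recovers the maximum loss; k = n recovers the average loss."""
    top_k = np.sort(losses)[-k:]      # the k largest per-sample losses
    return top_k.mean()

def average_top_k_via_lambda(losses, k):
    """Equivalent reformulation AT_k = (1/k) * min_lambda {k*lambda + sum_i [l_i - lambda]_+},
    whose minimum is attained at lambda equal to the k-th largest loss."""
    lam = np.sort(losses)[-k]         # k-th largest loss
    return (k * lam + np.maximum(losses - lam, 0.0).sum()) / k
```

For any vector of individual losses the two routines agree, and setting k = 1 or k = n recovers the maximum and average losses, matching the two special cases noted in the abstract.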



Citations
Posted Content

Long-tail learning via logit adjustment

TL;DR: This work revisits the classic idea of logit adjustment based on label frequencies, either applied post-hoc to a trained model or enforced in the loss during training, to encourage a large relative margin between the logits of rare and dominant labels.
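Assuming that "logit adjustment based on label frequencies" refers to offsetting each class logit by a scaled log-prior, the sketch below illustrates both the post-hoc and the training-time variants described in the TL;DR; the function names and the temperature parameter tau are hypothetical.

```python
import numpy as np

def posthoc_logit_adjustment(logits, class_priors, tau=1.0):
    """Post-hoc variant: subtract a scaled log-prior from each class logit
    before taking the argmax, enlarging the effective margin for rare classes."""
    return logits - tau * np.log(class_priors)

def logit_adjusted_cross_entropy(logits, label, class_priors, tau=1.0):
    """Training-time variant: add the scaled log-prior to the logits inside
    the softmax cross-entropy, which has the same margin-enlarging effect."""
    adjusted = logits + tau * np.log(class_priors)
    m = adjusted.max()                                   # stabilize the softmax
    log_probs = adjusted - (m + np.log(np.exp(adjusted - m).sum()))
    return -log_probs[label]
```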
Posted Content

Large-Scale Methods for Distributionally Robust Optimization

TL;DR: This work proposes and analyzes algorithms for distributionally robust optimization of convex losses with conditional value at risk (CVaR) and $\chi^2$ divergence uncertainty sets, and proves that they require a number of gradient evaluations independent of the training set size and the number of parameters, making them suitable for large-scale applications.
Posted Content

When Do Curricula Work

TL;DR: The experiments demonstrate that curriculum, but not anti-curriculum, ordering can indeed improve performance with a limited training time budget or in the presence of noisy data, suggesting that any benefit is entirely due to the dynamic training set size.
Posted Content

Coping with Label Shift via Distributionally Robust Optimisation

TL;DR: This paper proposes a model that minimises an objective based on distributionally robust optimisation (DRO), designs and analyses a gradient descent-proximal mirror ascent algorithm tailored to large-scale problems to optimise the proposed objective, and establishes its convergence.
Posted Content

Optimal Epoch Stochastic Gradient Descent Ascent Methods for Min-Max Optimization

TL;DR: This is the first result showing that Epoch-GDA can achieve the optimal rate of O(1/T) for the duality gap of general strongly-convex-strongly-concave (SCSC) min-max problems, leading to a nearly optimal complexity without resorting to smoothness or other structural conditions.