Open Access · Proceedings Article
Learning with Average Top-k Loss
Yanbo Fan, Siwei Lyu, Yiming Ying, Bao-Gang Hu
Advances in Neural Information Processing Systems, Vol. 30, pp. 497–505
TLDR
In this paper, the average top-k loss is introduced as a new ensemble loss for supervised learning. It is shown that the average top-k loss can lead to convex optimization problems that can be solved effectively with conventional sub-gradient based methods.
Abstract:
In this work, we introduce the average top-$k$ (AT$_k$) loss as a new ensemble loss for supervised learning. The AT$_k$ loss provides a natural generalization of the two widely used ensemble losses, namely the average loss and the maximum loss. Furthermore, the AT$_k$ loss combines their advantages and can alleviate their corresponding drawbacks to better adapt to different data distributions. We show that the AT$_k$ loss affords an intuitive interpretation that reduces the penalty of continuous and convex individual losses on correctly classified data. The AT$_k$ loss can lead to convex optimization problems that can be solved effectively with conventional sub-gradient based methods. We further study the statistical learning theory of MAT$_k$ learning by establishing its classification calibration and statistical consistency, which provide useful insights on the practical choice of the parameter $k$. We demonstrate the applicability of MAT$_k$ learning combined with different individual loss functions for binary and multi-class classification and regression using synthetic and real datasets.
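The aggregation itself is simple: average the $k$ largest per-sample losses, so $k=1$ recovers the maximum loss and $k=n$ the average loss. A minimal PyTorch sketch (the function name and tensor layout are our own illustration, not code from the paper):

```python
import torch

def average_top_k_loss(individual_losses: torch.Tensor, k: int) -> torch.Tensor:
    """Average of the k largest individual losses in a batch (the AT_k ensemble loss).

    k = 1 recovers the maximum loss; k = n recovers the average loss.
    The aggregate stays convex in the model parameters whenever each
    individual loss is convex, so sub-gradient methods apply directly.
    """
    top_k, _ = torch.topk(individual_losses, k)
    return top_k.mean()

# Usage with any per-sample individual loss (e.g. hinge or logistic):
losses = torch.tensor([0.1, 2.3, 0.0, 1.7, 0.4])
print(average_top_k_loss(losses, k=2))  # mean of the 2 largest: (2.3 + 1.7) / 2
```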
Citations
Posted Content
Long-tail learning via logit adjustment
Aditya Krishna Menon, Sadeep Jayasumana, Ankit Singh Rawat, Himanshu Jain, Andreas Veit, Sanjiv Kumar
TL;DR: This work revisits the classic idea of logit adjustment based on label frequencies, either applied post-hoc to a trained model or enforced in the loss during training, to encourage a large relative margin between the logits of rare versus dominant labels.
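A hedged sketch of the two variants the TL;DR names, assuming `class_priors` holds the empirical label frequencies (function names are ours for illustration):

```python
import torch
import torch.nn.functional as F

def posthoc_adjusted_logits(logits: torch.Tensor,
                            class_priors: torch.Tensor,
                            tau: float = 1.0) -> torch.Tensor:
    """Post-hoc variant: subtract tau * log(prior) from a trained model's
    logits, so rare classes need a smaller raw logit to win the argmax."""
    return logits - tau * torch.log(class_priors)

def logit_adjusted_loss(logits: torch.Tensor,
                        targets: torch.Tensor,
                        class_priors: torch.Tensor,
                        tau: float = 1.0) -> torch.Tensor:
    """Training-time variant: add tau * log(prior) inside the softmax
    cross-entropy, enforcing a larger relative margin on rare labels."""
    return F.cross_entropy(logits + tau * torch.log(class_priors), targets)
```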
Posted Content
Large-Scale Methods for Distributionally Robust Optimization
TL;DR: This work proposes and analyzes algorithms for distributionally robust optimization of convex losses with conditional value at risk (CVaR) and $\chi^2$-divergence uncertainty sets, and proves that they require a number of gradient evaluations independent of the training set size and number of parameters, making them suitable for large-scale applications.
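On a finite batch, the CVaR objective coincides with an average top-k loss over the worst $\alpha$-fraction of examples; a minimal batch-level sketch under that reading (an illustration of the objective, not the paper's stochastic large-scale algorithm):

```python
import torch

def batch_cvar_loss(individual_losses: torch.Tensor, alpha: float) -> torch.Tensor:
    """CVaR at level alpha over a batch: the mean of the worst
    ceil(alpha * n) individual losses, i.e. an average top-k loss
    with k tied to the uncertainty-set size rather than fixed."""
    n = individual_losses.numel()
    k = max(1, int(alpha * n))
    worst, _ = torch.topk(individual_losses, k)
    return worst.mean()
```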
Posted Content
When Do Curricula Work?
TL;DR: The experiments demonstrate that curricula, but not anti-curricula, can indeed improve performance either with a limited training time budget or in the presence of noisy data, suggesting that any benefit in standard settings is entirely due to the dynamic training set size.
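The "dynamic training set size" refers to a pacing schedule that grows the visible training subset over time; a sketch of one such schedule (our illustrative example, not code from the paper):

```python
def pacing_linear(step: int, total_steps: int, n_examples: int,
                  start_frac: float = 0.1) -> int:
    """Linear pacing function: the visible training subset grows from
    start_frac * n to the full dataset over training. A curriculum orders
    examples easiest-first before applying this schedule; the TL;DR's
    point is that the growing subset size, not the ordering, drives
    the gains in standard settings."""
    frac = start_frac + (1.0 - start_frac) * min(1.0, step / total_steps)
    return max(1, int(frac * n_examples))
```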
Posted Content
Coping with Label Shift via Distributionally Robust Optimisation
Jingzhao Zhang, Aditya Krishna Menon, Andreas Veit, Srinadh Bhojanapalli, Sanjiv Kumar, Suvrit Sra
TL;DR: This paper proposes a model that minimises an objective based on distributionally robust optimisation (DRO), designs and analyses a gradient descent-proximal mirror ascent algorithm tailored for large-scale problems to optimise the proposed objective, and establishes its convergence.
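A simplified sketch of the ascent half of such a min-max scheme, using an exponentiated-gradient (mirror) step on per-class weights over the simplex (our simplification for illustration; the paper's algorithm is a proximal variant):

```python
import torch

def mirror_ascent_weights(weights: torch.Tensor,
                          per_class_losses: torch.Tensor,
                          eta: float = 0.1) -> torch.Tensor:
    """Exponentiated-gradient (mirror) ascent step on the adversarial
    class weights: classes with higher current loss are up-weighted,
    then the weights are renormalized onto the probability simplex."""
    w = weights * torch.exp(eta * per_class_losses)
    return w / w.sum()

# Inner loop (our simplification):
#   1. descend on the model parameters w.r.t. (weights * per_class_losses).sum()
#   2. weights = mirror_ascent_weights(weights, per_class_losses.detach())
```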
Posted Content
Optimal Epoch Stochastic Gradient Descent Ascent Methods for Min-Max Optimization
TL;DR: This result is the first to show that Epoch-GDA can achieve the optimal rate of O(1/T) for the duality gap of general strongly-convex-strongly-concave (SCSC) min-max problems, leading to a nearly optimal complexity without resorting to smoothness or other structural conditions.
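A rough sketch of an epoch-wise GDA loop of this flavor (our reading of the TL;DR; the step-size schedule and restart rule are illustrative assumptions, not the paper's exact method):

```python
import torch

def epoch_gda(grad_x, grad_y, x, y, eta0=0.1, epochs=5, t0=100):
    """Epoch-wise gradient descent ascent sketch: within an epoch, run
    plain GDA with a fixed step size; between epochs, restart from the
    averaged iterates, halve the step size, and double the epoch length."""
    eta, T = eta0, t0
    for _ in range(epochs):
        xs, ys = [], []
        for _ in range(T):
            gx, gy = grad_x(x, y), grad_y(x, y)
            x, y = x - eta * gx, y + eta * gy  # descent in x, ascent in y
            xs.append(x)
            ys.append(y)
        x = torch.stack(xs).mean(dim=0)  # restart from averaged iterates
        y = torch.stack(ys).mean(dim=0)
        eta, T = eta / 2, T * 2
    return x, y
```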