Proceedings ArticleDOI

Extreme Multi-label Loss Functions for Recommendation, Tagging, Ranking & Other Missing Label Applications

TL;DR: In this article, the authors propose propensity scored loss functions for extreme multi-label learning which prioritize predicting the few relevant labels over the large number of irrelevant ones and provide unbiased estimates of the true loss function even when ground truth labels go missing under arbitrary probabilistic label noise models.
Abstract
The choice of the loss function is critical in extreme multi-label learning where the objective is to annotate each data point with the most relevant subset of labels from an extremely large label set. Unfortunately, existing loss functions, such as the Hamming loss, are unsuitable for learning, model selection, hyperparameter tuning and performance evaluation. This paper addresses the issue by developing propensity scored losses which: (a) prioritize predicting the few relevant labels over the large number of irrelevant ones; (b) do not erroneously treat missing labels as irrelevant but instead provide unbiased estimates of the true loss function even when ground truth labels go missing under arbitrary probabilistic label noise models; and (c) promote the accurate prediction of infrequently occurring, hard to predict, but rewarding tail labels. Another contribution is the development of algorithms which efficiently scale to extremely large datasets with up to 9 million labels, 70 million points and 2 million dimensions and which give significant improvements over the state-of-the-art. This paper's results also apply to tagging, recommendation and ranking which are the motivating applications for extreme multi-label learning. They generalize previous attempts at deriving unbiased losses under the restrictive assumption that labels go missing uniformly at random from the ground truth. Furthermore, they provide a sound theoretical justification for popular label weighting heuristics used to recommend rare items. Finally, they demonstrate that the proposed contributions align with real world applications by achieving superior clickthrough rates on sponsored search advertising in Bing.
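As a concrete illustration of point (b) of the abstract, the sketch below implements a propensity-scored precision@k estimator in Python. Each observed relevant label among the top-k predictions is up-weighted by the inverse of its propensity, the probability that the label is observed given that it is truly relevant; this inverse-propensity weighting is what makes the estimate unbiased under the missing-label model. The function name, array layout, and the assumption that per-label propensities are supplied precomputed are ours for illustration, not the paper's published code.

```python
import numpy as np

def psp_at_k(y_observed, scores, propensities, k=5):
    """Propensity-scored precision@k (illustrative sketch).

    y_observed   : 0/1 vector of observed (possibly incomplete) labels
    scores       : model scores, one entry per label
    propensities : propensities[l] ~ P(label l observed | l is relevant)
    """
    top_k = np.argsort(-scores)[:k]  # indices of the k highest-scored labels
    # E[y_observed[l] / p_l] = y_true[l], so the sum is unbiased for
    # true precision@k under the assumed per-label missingness model.
    return np.sum(y_observed[top_k] / propensities[top_k]) / k
```

With all propensities equal to 1 this reduces to ordinary precision@k; rare tail labels, whose propensities are small, earn more reward when predicted correctly, which is exactly the emphasis described in point (c).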


Citations
Book

An Introduction to Neural Information Retrieval

TL;DR: The monograph provides a complete picture of neural information retrieval techniques, culminating in supervised neural learning-to-rank models, including deep neural network architectures trained end-to-end for ranking tasks.
Journal ArticleDOI

Binary relevance for multi-label learning: an overview

TL;DR: This paper reviews the state of the art of binary relevance from three perspectives and introduces recent studies on binary relevance that address issues beyond label correlation exploitation.
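Since binary relevance is the baseline this survey builds on, a minimal sketch may help: it trains one independent binary classifier per label. The toy data and the use of scikit-learn's OneVsRestClassifier are illustrative assumptions, not code from the paper.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

X = np.array([[0, 1], [1, 0], [1, 1], [0, 0]], dtype=float)
Y = np.array([[1, 0], [0, 1], [1, 1], [0, 0]])  # binary indicator matrix, one column per label

# Binary relevance: fit an independent LogisticRegression per label column.
clf = OneVsRestClassifier(LogisticRegression()).fit(X, Y)
print(clf.predict([[1.0, 0.5]]))  # per-label 0/1 predictions
```

Because each label is fit independently, this decomposition ignores label correlations, which is precisely the limitation the surveyed extensions try to address.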
Proceedings ArticleDOI

Learning Tree-based Deep Model for Recommender Systems

TL;DR: This paper proposes a novel tree-based method that provides logarithmic complexity w.r.t. corpus size even with more expressive models such as deep neural networks.
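The logarithmic-complexity claim is easiest to see in code: retrieval walks an item tree with beam search, scoring only about beam-width times depth nodes rather than all items. The Node class and score_fn below are our illustrative stand-ins; in the paper the node scorer is a learned deep network.

```python
import heapq
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    item: str
    children: List["Node"] = field(default_factory=list)

def beam_retrieve(root, score_fn, beam=2, topk=2):
    """Beam search down an item tree: each level scores at most
    beam * branching-factor nodes, so total work is O(beam * b * log N)
    instead of O(N) over all leaves."""
    frontier = [root]
    while any(n.children for n in frontier):
        # Expand internal nodes; leaves already reached stay in play.
        candidates = [c for n in frontier for c in (n.children or [n])]
        frontier = heapq.nlargest(beam, candidates, key=score_fn)
    return heapq.nlargest(topk, frontier, key=score_fn)
```

Growing the corpus only deepens the tree, so retrieval cost grows logarithmically with corpus size.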
Proceedings ArticleDOI

DiSMEC: Distributed Sparse Machines for Extreme Multi-label Classification

TL;DR: DiSMEC is a large-scale distributed framework for learning one-versus-rest linear classifiers coupled with explicit capacity control of model size; it can learn classifiers for datasets consisting of hundreds of thousands of labels within a few hours.
Proceedings ArticleDOI

Taming Pretrained Transformers for Extreme Multi-label Text Classification

TL;DR: X-Transformer is proposed as the first scalable approach to fine-tuning deep transformer models for the XMC problem; it achieves new state-of-the-art results on four XMC benchmark datasets.
References
Journal ArticleDOI

The central role of the propensity score in observational studies for causal effects

Paul R. Rosenbaum, Donald B. Rubin
01 Apr 1983
TL;DR: The authors discuss the central role of propensity scores and balancing scores in the analysis of observational studies and show that adjustment for the scalar propensity score is sufficient to remove bias due to all observed covariates.
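The core definitions behind this claim fit in two lines; the notation below is a standard paraphrase, not a quotation from the paper.

```latex
% Propensity score for binary treatment Z and observed covariates X:
e(x) = \Pr(Z = 1 \mid X = x)
% Balancing property: conditional on e(X), treatment assignment is
% independent of the observed covariates,
Z \perp X \mid e(X),
% so matching, subclassification, or adjustment on the scalar e(X)
% removes bias due to all observed covariates.
```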
Journal Article

LIBLINEAR: A Library for Large Linear Classification

TL;DR: LIBLINEAR is an open source library for large-scale linear classification that supports logistic regression and linear support vector machines and provides easy-to-use command-line tools and library calls for users and developers.
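As a hedged illustration of the "library calls" route, the snippet below uses scikit-learn's LinearSVC, which wraps the LIBLINEAR solver; the toy data is an assumption for illustration, and LIBLINEAR itself also ships standalone train/predict command-line tools.

```python
import numpy as np
from sklearn.svm import LinearSVC  # backed by the LIBLINEAR solver

X = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [0.0, 0.0]])
y = np.array([1, 0, 1, 0])

clf = LinearSVC(C=1.0)  # L2-regularized linear SVM
clf.fit(X, y)
print(clf.predict([[0.5, 1.0]]))
```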
Proceedings ArticleDOI

Image-Based Recommendations on Styles and Substitutes

TL;DR: The approach is not based on fine-grained modeling of user annotations but rather on capturing the largest dataset possible and developing a scalable method for uncovering human notions of the visual relationships within.
Proceedings ArticleDOI

Hidden factors and hidden topics: understanding rating dimensions with review text

TL;DR: This paper combines latent rating dimensions (such as those of latent-factor recommender systems) with latent review topics (such as those learned by topic models like LDA), yielding more accurate product rating predictions by harnessing the information present in review text.
Journal ArticleDOI

Solving the apparent diversity-accuracy dilemma of recommender systems

TL;DR: This paper introduces a new algorithm specifically to address the challenge of diversity and shows how it can be used to resolve this apparent dilemma when combined in an elegant hybrid with an accuracy-focused algorithm.