Proceedings ArticleDOI

Extreme Multi-label Loss Functions for Recommendation, Tagging, Ranking & Other Missing Label Applications

TL;DR: In this article, the authors propose propensity scored loss functions for extreme multi-label learning which prioritize predicting the few relevant labels over the large number of irrelevant ones and provide unbiased estimates of the true loss function even when ground truth labels go missing under arbitrary probabilistic label noise models.
Abstract
The choice of the loss function is critical in extreme multi-label learning where the objective is to annotate each data point with the most relevant subset of labels from an extremely large label set. Unfortunately, existing loss functions, such as the Hamming loss, are unsuitable for learning, model selection, hyperparameter tuning and performance evaluation. This paper addresses the issue by developing propensity scored losses which: (a) prioritize predicting the few relevant labels over the large number of irrelevant ones; (b) do not erroneously treat missing labels as irrelevant but instead provide unbiased estimates of the true loss function even when ground truth labels go missing under arbitrary probabilistic label noise models; and (c) promote the accurate prediction of infrequently occurring, hard to predict, but rewarding tail labels. Another contribution is the development of algorithms which efficiently scale to extremely large datasets with up to 9 million labels, 70 million points and 2 million dimensions and which give significant improvements over the state-of-the-art. This paper's results also apply to tagging, recommendation and ranking which are the motivating applications for extreme multi-label learning. They generalize previous attempts at deriving unbiased losses under the restrictive assumption that labels go missing uniformly at random from the ground truth. Furthermore, they provide a sound theoretical justification for popular label weighting heuristics used to recommend rare items. Finally, they demonstrate that the proposed contributions align with real world applications by achieving superior clickthrough rates on sponsored search advertising in Bing.
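As a concrete illustration of point (b) of the abstract, the sketch below implements a propensity-scored precision@k estimator in Python. Each observed relevant label among the top-k predictions is up-weighted by the inverse of its propensity, the probability that the label is observed given that it is truly relevant; this inverse-propensity weighting is what makes the estimate unbiased under the missing-label model. The function name, array layout, and the assumption that per-label propensities are supplied precomputed are ours for illustration, not the paper's published code.

```python
import numpy as np

def psp_at_k(y_observed, scores, propensities, k=5):
    """Propensity-scored precision@k (illustrative sketch).

    y_observed   : 0/1 vector of observed (possibly incomplete) labels
    scores       : model scores, one entry per label
    propensities : propensities[l] ~ P(label l observed | l is relevant)
    """
    top_k = np.argsort(-scores)[:k]  # indices of the k highest-scored labels
    # E[y_observed[l] / p_l] = y_true[l], so the sum is unbiased for
    # true precision@k under the assumed per-label missingness model.
    return np.sum(y_observed[top_k] / propensities[top_k]) / k
```

With all propensities equal to 1 this reduces to ordinary precision@k; rare tail labels, whose propensities are small, earn more reward when predicted correctly, which is exactly the emphasis described in point (c).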


Citations
Book

An Introduction to Neural Information Retrieval

TL;DR: The monograph provides a complete picture of neural information retrieval techniques, culminating in supervised neural learning-to-rank models, including deep neural network architectures trained end-to-end for ranking tasks.
Journal ArticleDOI

Binary relevance for multi-label learning: an overview

TL;DR: This paper reviews the state of the art of binary relevance from three perspectives and introduces recent studies on binary relevance that address issues beyond label correlation exploitation.
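Since binary relevance is the baseline this survey builds on, a minimal sketch may help: it trains one independent binary classifier per label. The toy data and the use of scikit-learn's OneVsRestClassifier are illustrative assumptions, not code from the paper.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

X = np.array([[0, 1], [1, 0], [1, 1], [0, 0]], dtype=float)
Y = np.array([[1, 0], [0, 1], [1, 1], [0, 0]])  # binary indicator matrix, one column per label

# Binary relevance: fit an independent LogisticRegression per label column.
clf = OneVsRestClassifier(LogisticRegression()).fit(X, Y)
print(clf.predict([[1.0, 0.5]]))  # per-label 0/1 predictions
```

Because each label is fit independently, this decomposition ignores label correlations, which is precisely the limitation the surveyed extensions try to address.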
Proceedings ArticleDOI

Learning Tree-based Deep Model for Recommender Systems

TL;DR: This paper proposes a novel tree-based method that provides logarithmic complexity w.r.t. corpus size even with more expressive models such as deep neural networks.
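The logarithmic-complexity claim is easiest to see in code: retrieval walks an item tree with beam search, scoring only about beam-width times depth nodes rather than all items. The Node class and score_fn below are our illustrative stand-ins; in the paper the node scorer is a learned deep network.

```python
import heapq
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    item: str
    children: List["Node"] = field(default_factory=list)

def beam_retrieve(root, score_fn, beam=2, topk=2):
    """Beam search down an item tree: each level scores at most
    beam * branching-factor nodes, so total work is O(beam * b * log N)
    instead of O(N) over all leaves."""
    frontier = [root]
    while any(n.children for n in frontier):
        # Expand internal nodes; leaves already reached stay in play.
        candidates = [c for n in frontier for c in (n.children or [n])]
        frontier = heapq.nlargest(beam, candidates, key=score_fn)
    return heapq.nlargest(topk, frontier, key=score_fn)
```

Growing the corpus only deepens the tree, so retrieval cost grows logarithmically with corpus size.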
Proceedings ArticleDOI

DiSMEC: Distributed Sparse Machines for Extreme Multi-label Classification

TL;DR: DiSMEC is a large-scale distributed framework for learning one-versus-rest linear classifiers coupled with explicit capacity control of model size; it can learn classifiers for datasets consisting of hundreds of thousands of labels within a few hours.
Proceedings ArticleDOI

Taming Pretrained Transformers for Extreme Multi-label Text Classification

TL;DR: X-Transformer is proposed as the first scalable approach to fine-tuning deep transformer models for the XMC problem; it achieves new state-of-the-art results on four XMC benchmark datasets.
References
Journal ArticleDOI

The central role of the propensity score in observational studies for causal effects

Paul R. Rosenbaum, Donald B. Rubin
01 Apr 1983
TL;DR: The authors discuss the central role of propensity scores and balancing scores in the analysis of observational studies and show that adjustment for the scalar propensity score is sufficient to remove bias due to all observed covariates.
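The core definitions behind this claim fit in two lines; the notation below is a standard paraphrase, not a quotation from the paper.

```latex
% Propensity score for binary treatment Z and observed covariates X:
e(x) = \Pr(Z = 1 \mid X = x)
% Balancing property: conditional on e(X), treatment assignment is
% independent of the observed covariates,
Z \perp X \mid e(X),
% so matching, subclassification, or adjustment on the scalar e(X)
% removes bias due to all observed covariates.
```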
Journal Article

LIBLINEAR: A Library for Large Linear Classification

TL;DR: LIBLINEAR is an open source library for large-scale linear classification that supports logistic regression and linear support vector machines and provides easy-to-use command-line tools and library calls for users and developers.
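As a hedged illustration of the "library calls" route, the snippet below uses scikit-learn's LinearSVC, which wraps the LIBLINEAR solver; the toy data is an assumption for illustration, and LIBLINEAR itself also ships standalone train/predict command-line tools.

```python
import numpy as np
from sklearn.svm import LinearSVC  # backed by the LIBLINEAR solver

X = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [0.0, 0.0]])
y = np.array([1, 0, 1, 0])

clf = LinearSVC(C=1.0)  # L2-regularized linear SVM
clf.fit(X, y)
print(clf.predict([[0.5, 1.0]]))
```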
Proceedings ArticleDOI

Image-Based Recommendations on Styles and Substitutes

TL;DR: The approach is not based on fine-grained modeling of user annotations but rather on capturing the largest dataset possible and developing a scalable method for uncovering human notions of the visual relationships within.
Proceedings ArticleDOI

Hidden factors and hidden topics: understanding rating dimensions with review text

TL;DR: This paper combines latent rating dimensions (such as those of latent-factor recommender systems) with latent review topics (such as those learned by topic models like LDA), yielding more accurate product rating predictions by harnessing the information present in review text.
Journal ArticleDOI

Solving the apparent diversity-accuracy dilemma of recommender systems

TL;DR: This paper introduces a new algorithm specifically to address the challenge of diversity and shows how it can be used to resolve this apparent dilemma when combined in an elegant hybrid with an accuracy-focused algorithm.