Book Chapter

Accelerating Deep Learning with Millions of Classes

TLDR
This work proposes a training framework for extreme classification tasks based on Random Projection and demonstrates that it can train deep learning models with millions of classes, achieving above 10× speedup compared to existing approaches.
Abstract
Deep learning has achieved remarkable success in many classification tasks because of its great power of representation learning for complex data. However, it remains challenging to extend it to classification tasks with millions of classes. Previous studies focus on solving this problem in a distributed fashion or on sampling-based approaches that reduce the computational cost of the softmax layer. However, these approaches still require high GPU memory to work with large models, and it is non-trivial to extend them to parallel settings. To address these issues, we propose an efficient training framework for extreme classification tasks based on Random Projection. The key idea is to first train a slimmed model with a randomly projected softmax classifier and then recover the original classifier from it. We also give a theoretical guarantee that the recovered classifier approximates the original classifier with a small error. We further extend our framework to parallel settings by adopting a communication-reduction technique. Our experiments demonstrate that the proposed framework can train deep learning models with millions of classes and achieve above 10× speedup compared to existing approaches.
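As a minimal illustration of the key idea, here is a PyTorch sketch in which the slimmed classifier acts on randomly projected features and the full classifier is recovered as V R. This is one plausible reading of the abstract, not the authors' reference implementation; all shapes and names are assumptions.

```python
# Minimal sketch of training with a randomly projected softmax classifier and
# recovering a full-width classifier afterwards. An interpretation of the
# abstract, not the authors' code; shapes are illustrative.
import torch
import torch.nn.functional as F

d, k, K = 256, 32, 100_000   # feature dim, projected dim, #classes (millions in the paper)

R = torch.randn(k, d) / k ** 0.5           # fixed Gaussian projection, not trained
V = torch.randn(K, k, requires_grad=True)  # slim classifier: K x k instead of K x d

def slim_logits(h):
    """Logits of the slimmed softmax classifier: V (R h)."""
    return F.linear(F.linear(h, R), V)      # (batch, K)

# ... train V (and the backbone producing h) with cross-entropy as usual ...
h = torch.randn(8, d)
y = torch.randint(0, K, (8,))
F.cross_entropy(slim_logits(h), y).backward()

# Recovery: V (R h) = (V R) h, so a full d-dimensional classifier is
with torch.no_grad():
    W_recovered = V @ R                     # (K, d)
```

Under this reading, the slim classifier stores K×k parameters instead of K×d, which is where the GPU-memory and speed savings would come from.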


Citations
Proceedings Article

DeepXML: A Deep Extreme Multi-Label Learning Framework Applied to Short Text Documents

TL;DR: DeepXML as discussed by the authors decomposes the deep extreme multi-label task into four simpler sub-tasks, each of which can be trained accurately and efficiently, and chooses different components for each sub-task to generate algorithms with varying trade-offs between accuracy and scalability.
Posted Content

Federated Deep AUC Maximization for Heterogeneous Data with a Constant Communication Complexity

TL;DR: This paper proposes improved FDAM algorithms for heterogeneous data that solve the popular non-convex strongly-concave min-max formulation of deep AUC maximization (DAM) in a distributed fashion, and shows them to be effective on benchmark datasets and on medical chest X-ray images from different organizations.
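For readers unfamiliar with the min-max structure mentioned above, the following toy sketch shows generic stochastic gradient descent-ascent on min_x max_y f(x, y); it illustrates the formulation only and is not the paper's FDAM algorithm.

```python
# Generic gradient descent-ascent for min_x max_y f(x, y): descend on the
# primal x, ascend on the dual y. A toy illustration, not the FDAM method.
import torch

x = torch.randn(10, requires_grad=True)  # primal variables (e.g., model weights)
y = torch.zeros((), requires_grad=True)  # dual variable of the concave player

def f(x, y):
    # toy objective: non-convex in x, strongly concave in y
    return torch.sin(x).sum() * y - 0.5 * y ** 2

eta_x, eta_y = 1e-2, 1e-1
for _ in range(100):
    gx, gy = torch.autograd.grad(f(x, y), [x, y])
    with torch.no_grad():
        x -= eta_x * gx   # descent step on x
        y += eta_y * gy   # ascent step on y
```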
Proceedings Article

Massively Scaling Heteroscedastic Classifiers

TL;DR: This article proposed HET-XL, a heteroscedastic classifier whose parameter count, compared to a standard classifier, scales independently of the number of classes, and scaled it up to settings such as contrastive learning, which can be viewed as a classification problem with up to 3.5 billion classes.
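One way to keep noise parameters independent of the class count, sketched below, is to sample Gaussian noise in the embedding space and push it through the shared classifier; this is a simplified stand-in for the idea in the TL;DR, and the exact HET-XL parameterization differs.

```python
# Simplified heteroscedastic softmax: input-dependent noise lives in the
# d-dimensional embedding space, so its parameters do not grow with K.
# A sketch of the general idea, not the HET-XL parameterization.
import torch
import torch.nn as nn
import torch.nn.functional as F

d, K, S = 256, 100_000, 8      # embedding dim, #classes, Monte Carlo samples

classifier = nn.Linear(d, K, bias=False)
scale_net = nn.Linear(d, d)    # diagonal noise scale: O(d^2) params, no K dependence

def het_probs(h):              # h: (batch, d)
    sigma = F.softplus(scale_net(h))           # (batch, d)
    eps = torch.randn(S, *h.shape) * sigma     # (S, batch, d)
    logits = classifier(h.unsqueeze(0) + eps)  # (S, batch, K)
    return logits.softmax(dim=-1).mean(dim=0)  # MC-averaged class probabilities
```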
Journal Article

A Survey on Extreme Multi-label Learning

TL;DR: This survey clarifies a formal definition of XML from the perspective of supervised learning and proposes possible research directions in XML, such as new evaluation metrics, the tail-label problem, and weakly supervised XML.

References
Proceedings Article

Deep Residual Learning for Image Recognition

TL;DR: In this article, the authors proposed a residual learning framework to ease the training of networks that are substantially deeper than those used previously; it won 1st place in the ILSVRC 2015 classification task.
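The building block behind that framework is compact enough to sketch directly; below is the standard basic residual block, in which the stacked layers learn a residual F(x) that is added back to the input through an identity shortcut.

```python
# Basic residual block: the convolutional stack learns F(x) and the input x
# is added back via an identity skip connection, easing optimization of
# very deep networks.
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)  # F(x) + x
```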
Proceedings Article

ImageNet Classification with Deep Convolutional Neural Networks

TL;DR: State-of-the-art performance was achieved by the deep convolutional neural network, as discussed by the authors, which consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax.
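The described architecture is straightforward to sketch; the layer sizes below follow the original AlexNet approximately (for 224×224 inputs), and exact hyperparameters vary across reimplementations.

```python
# AlexNet-style network: five convolutional layers, some followed by
# max-pooling, then three fully-connected layers ending in 1000-way logits.
import torch.nn as nn

alexnet_like = nn.Sequential(
    nn.Conv2d(3, 96, 11, stride=4, padding=2), nn.ReLU(), nn.MaxPool2d(3, 2),
    nn.Conv2d(96, 256, 5, padding=2), nn.ReLU(), nn.MaxPool2d(3, 2),
    nn.Conv2d(256, 384, 3, padding=1), nn.ReLU(),
    nn.Conv2d(384, 384, 3, padding=1), nn.ReLU(),
    nn.Conv2d(384, 256, 3, padding=1), nn.ReLU(), nn.MaxPool2d(3, 2),
    nn.Flatten(),
    nn.Linear(256 * 6 * 6, 4096), nn.ReLU(), nn.Dropout(),
    nn.Linear(4096, 4096), nn.ReLU(), nn.Dropout(),
    nn.Linear(4096, 1000),  # logits for the final 1000-way softmax
)
```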
Journal Article

Long Short-Term Memory

TL;DR: A novel, efficient, gradient based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units.
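The constant error carousel is visible in code: the cell state is updated additively, which lets gradients flow across long time lags. The sketch below is a modern LSTM cell (the 1997 original did not yet include the forget gate, which was added in later work).

```python
# Minimal LSTM cell. The additive cell-state update c = f*c + i*g is the
# "constant error carousel" that preserves gradient flow over long lags.
import torch
import torch.nn as nn

class LSTMCell(nn.Module):
    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.gates = nn.Linear(input_size + hidden_size, 4 * hidden_size)

    def forward(self, x, state):
        h, c = state
        i, f, g, o = self.gates(torch.cat([x, h], dim=-1)).chunk(4, dim=-1)
        i, f, o = i.sigmoid(), f.sigmoid(), o.sigmoid()
        c = f * c + i * g.tanh()  # additive update: the error carousel
        h = o * c.tanh()
        return h, (h, c)
```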
Proceedings Article

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

TL;DR: BERT as mentioned in this paper pre-trains deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, which can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
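The "one additional output layer" recipe amounts to a linear head on the encoder's [CLS] representation; in the sketch below, `encoder` is a stand-in for a pretrained bidirectional Transformer and its interface is an assumption.

```python
# Fine-tuning sketch: a single linear layer over the [CLS] position of a
# pretrained encoder; `encoder` and its call signature are assumptions.
import torch.nn as nn

class ClassifierHead(nn.Module):
    def __init__(self, encoder, hidden_size, num_labels):
        super().__init__()
        self.encoder = encoder                          # fine-tuned end to end
        self.head = nn.Linear(hidden_size, num_labels)  # the one additional layer

    def forward(self, token_ids):
        hidden = self.encoder(token_ids)  # (batch, seq_len, hidden_size)
        return self.head(hidden[:, 0])    # logits from the [CLS] position
```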
Proceedings Article

Mask R-CNN

TL;DR: This work presents a conceptually simple, flexible, and general framework for object instance segmentation, which extends Faster R-CNN by adding a branch for predicting an object mask in parallel with the existing branch for bounding box recognition.
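The parallel-branch design can be sketched as two heads over shared pooled RoI features: the existing box classification/regression branch and the added mask FCN. Backbone, RoIAlign, and losses are omitted; shapes are illustrative assumptions.

```python
# RoI heads in the Mask R-CNN style: box branch and mask branch run in
# parallel on the same pooled RoI features. A structural sketch only.
import torch.nn as nn

class RoIHeads(nn.Module):
    def __init__(self, in_channels, num_classes):
        super().__init__()
        self.box_head = nn.Sequential(  # existing Faster R-CNN branch
            nn.Flatten(), nn.Linear(in_channels * 7 * 7, 1024), nn.ReLU())
        self.cls_score = nn.Linear(1024, num_classes)
        self.bbox_pred = nn.Linear(1024, 4 * num_classes)
        self.mask_head = nn.Sequential(  # added branch: small FCN for masks
            nn.Conv2d(in_channels, 256, 3, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(256, 256, 2, stride=2), nn.ReLU(),
            nn.Conv2d(256, num_classes, 1))  # per-class mask logits

    def forward(self, roi_feats):  # (N, C, 7, 7) pooled RoI features
        x = self.box_head(roi_feats)
        return self.cls_score(x), self.bbox_pred(x), self.mask_head(roi_feats)
```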