Book Chapter

Accelerating Deep Learning with Millions of Classes

TLDR
This work proposes a training framework for extreme classification tasks based on Random Projection and demonstrates that it can train deep learning models with millions of classes, achieving above 10× speedup compared to existing approaches.
Abstract
Deep learning has achieved remarkable success in many classification tasks because of its great power of representation learning for complex data. However, it remains challenging to extend it to classification tasks with millions of classes. Previous studies focus on solving this problem in a distributed fashion or on sampling-based approaches that reduce the computational cost of the softmax layer. However, these approaches still require high GPU memory to work with large models, and it is non-trivial to extend them to parallel settings. To address these issues, we propose an efficient training framework for extreme classification tasks based on Random Projection. The key idea is to first train a slimmed model with a randomly projected softmax classifier and then recover the original classifier from it. We also give a theoretical guarantee that the recovered classifier approximates the original classifier with a small error. We further extend our framework to parallel settings by adopting a communication-reduction technique. Our experiments demonstrate that the proposed framework can train deep learning models with millions of classes and achieve above 10× speedup compared to existing approaches.
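As a minimal illustration of the key idea, here is a PyTorch sketch in which the slimmed classifier acts on randomly projected features and the full classifier is recovered as V R. This is one plausible reading of the abstract, not the authors' reference implementation; all shapes and names are assumptions.

```python
# Minimal sketch of training with a randomly projected softmax classifier and
# recovering a full-width classifier afterwards. An interpretation of the
# abstract, not the authors' code; shapes are illustrative.
import torch
import torch.nn.functional as F

d, k, K = 256, 32, 100_000   # feature dim, projected dim, #classes (millions in the paper)

R = torch.randn(k, d) / k ** 0.5           # fixed Gaussian projection, not trained
V = torch.randn(K, k, requires_grad=True)  # slim classifier: K x k instead of K x d

def slim_logits(h):
    """Logits of the slimmed softmax classifier: V (R h)."""
    return F.linear(F.linear(h, R), V)      # (batch, K)

# ... train V (and the backbone producing h) with cross-entropy as usual ...
h = torch.randn(8, d)
y = torch.randint(0, K, (8,))
F.cross_entropy(slim_logits(h), y).backward()

# Recovery: V (R h) = (V R) h, so a full d-dimensional classifier is
with torch.no_grad():
    W_recovered = V @ R                     # (K, d)
```

Under this reading, the slim classifier stores K×k parameters instead of K×d, which is where the GPU-memory and speed savings would come from.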


Citations
Proceedings Article

DeepXML: A Deep Extreme Multi-Label Learning Framework Applied to Short Text Documents

TL;DR: DeepXML as discussed by the authors decomposes the deep extreme multi-label task into four simpler sub-tasks, each of which can be trained accurately and efficiently, and chooses different components for each sub-task to generate algorithms with varying trade-offs between accuracy and scalability.
Posted Content

Federated Deep AUC Maximization for Heterogeneous Data with a Constant Communication Complexity

TL;DR: This paper proposes improved FDAM algorithms for heterogeneous data that solve the popular non-convex strongly-concave min-max formulation of deep AUC maximization (DAM) in a distributed fashion, and shows them to be effective on benchmark datasets and on medical chest X-ray images from different organizations.
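For readers unfamiliar with the min-max structure mentioned above, the following toy sketch shows generic stochastic gradient descent-ascent on min_x max_y f(x, y); it illustrates the formulation only and is not the paper's FDAM algorithm.

```python
# Generic gradient descent-ascent for min_x max_y f(x, y): descend on the
# primal x, ascend on the dual y. A toy illustration, not the FDAM method.
import torch

x = torch.randn(10, requires_grad=True)  # primal variables (e.g., model weights)
y = torch.zeros((), requires_grad=True)  # dual variable of the concave player

def f(x, y):
    # toy objective: non-convex in x, strongly concave in y
    return torch.sin(x).sum() * y - 0.5 * y ** 2

eta_x, eta_y = 1e-2, 1e-1
for _ in range(100):
    gx, gy = torch.autograd.grad(f(x, y), [x, y])
    with torch.no_grad():
        x -= eta_x * gx   # descent step on x
        y += eta_y * gy   # ascent step on y
```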
Proceedings Article

Massively Scaling Heteroscedastic Classifiers

TL;DR: This article proposed HET-XL, a heteroscedastic classifier whose parameter count, compared to a standard classifier, scales independently of the number of classes, and scaled it up to settings such as contrastive learning, which can be viewed as a classification problem with up to 3.5 billion classes.
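One way to keep noise parameters independent of the class count, sketched below, is to sample Gaussian noise in the embedding space and push it through the shared classifier; this is a simplified stand-in for the idea in the TL;DR, and the exact HET-XL parameterization differs.

```python
# Simplified heteroscedastic softmax: input-dependent noise lives in the
# d-dimensional embedding space, so its parameters do not grow with K.
# A sketch of the general idea, not the HET-XL parameterization.
import torch
import torch.nn as nn
import torch.nn.functional as F

d, K, S = 256, 100_000, 8      # embedding dim, #classes, Monte Carlo samples

classifier = nn.Linear(d, K, bias=False)
scale_net = nn.Linear(d, d)    # diagonal noise scale: O(d^2) params, no K dependence

def het_probs(h):              # h: (batch, d)
    sigma = F.softplus(scale_net(h))           # (batch, d)
    eps = torch.randn(S, *h.shape) * sigma     # (S, batch, d)
    logits = classifier(h.unsqueeze(0) + eps)  # (S, batch, K)
    return logits.softmax(dim=-1).mean(dim=0)  # MC-averaged class probabilities
```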
Journal Article

A Survey on Extreme Multi-label Learning

TL;DR: This survey clarifies a formal definition of XML from the perspective of supervised learning and proposes possible research directions in XML, such as new evaluation metrics, the tail-label problem, and weakly supervised XML.

References
Proceedings Article

Deep Residual Learning for Image Recognition

TL;DR: In this article, the authors proposed a residual learning framework to ease the training of networks that are substantially deeper than those used previously; it won 1st place in the ILSVRC 2015 classification task.
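The building block behind that framework is compact enough to sketch directly; below is the standard basic residual block, in which the stacked layers learn a residual F(x) that is added back to the input through an identity shortcut.

```python
# Basic residual block: the convolutional stack learns F(x) and the input x
# is added back via an identity skip connection, easing optimization of
# very deep networks.
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)  # F(x) + x
```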
Proceedings Article

ImageNet Classification with Deep Convolutional Neural Networks

TL;DR: State-of-the-art performance was achieved by the deep convolutional neural network, as discussed by the authors, which consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax.
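The described architecture is straightforward to sketch; the layer sizes below follow the original AlexNet approximately (for 224×224 inputs), and exact hyperparameters vary across reimplementations.

```python
# AlexNet-style network: five convolutional layers, some followed by
# max-pooling, then three fully-connected layers ending in 1000-way logits.
import torch.nn as nn

alexnet_like = nn.Sequential(
    nn.Conv2d(3, 96, 11, stride=4, padding=2), nn.ReLU(), nn.MaxPool2d(3, 2),
    nn.Conv2d(96, 256, 5, padding=2), nn.ReLU(), nn.MaxPool2d(3, 2),
    nn.Conv2d(256, 384, 3, padding=1), nn.ReLU(),
    nn.Conv2d(384, 384, 3, padding=1), nn.ReLU(),
    nn.Conv2d(384, 256, 3, padding=1), nn.ReLU(), nn.MaxPool2d(3, 2),
    nn.Flatten(),
    nn.Linear(256 * 6 * 6, 4096), nn.ReLU(), nn.Dropout(),
    nn.Linear(4096, 4096), nn.ReLU(), nn.Dropout(),
    nn.Linear(4096, 1000),  # logits for the final 1000-way softmax
)
```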
Journal Article

Long Short-Term Memory

TL;DR: A novel, efficient, gradient based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units.
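The constant error carousel is visible in code: the cell state is updated additively, which lets gradients flow across long time lags. The sketch below is a modern LSTM cell (the 1997 original did not yet include the forget gate, which was added in later work).

```python
# Minimal LSTM cell. The additive cell-state update c = f*c + i*g is the
# "constant error carousel" that preserves gradient flow over long lags.
import torch
import torch.nn as nn

class LSTMCell(nn.Module):
    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.gates = nn.Linear(input_size + hidden_size, 4 * hidden_size)

    def forward(self, x, state):
        h, c = state
        i, f, g, o = self.gates(torch.cat([x, h], dim=-1)).chunk(4, dim=-1)
        i, f, o = i.sigmoid(), f.sigmoid(), o.sigmoid()
        c = f * c + i * g.tanh()  # additive update: the error carousel
        h = o * c.tanh()
        return h, (h, c)
```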
Proceedings Article

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

TL;DR: BERT as mentioned in this paper pre-trains deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, which can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
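The "one additional output layer" recipe amounts to a linear head on the encoder's [CLS] representation; in the sketch below, `encoder` is a stand-in for a pretrained bidirectional Transformer and its interface is an assumption.

```python
# Fine-tuning sketch: a single linear layer over the [CLS] position of a
# pretrained encoder; `encoder` and its call signature are assumptions.
import torch.nn as nn

class ClassifierHead(nn.Module):
    def __init__(self, encoder, hidden_size, num_labels):
        super().__init__()
        self.encoder = encoder                          # fine-tuned end to end
        self.head = nn.Linear(hidden_size, num_labels)  # the one additional layer

    def forward(self, token_ids):
        hidden = self.encoder(token_ids)  # (batch, seq_len, hidden_size)
        return self.head(hidden[:, 0])    # logits from the [CLS] position
```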
Proceedings Article

Mask R-CNN

TL;DR: This work presents a conceptually simple, flexible, and general framework for object instance segmentation, which extends Faster R-CNN by adding a branch for predicting an object mask in parallel with the existing branch for bounding box recognition.
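The parallel-branch design can be sketched as two heads over shared pooled RoI features: the existing box classification/regression branch and the added mask FCN. Backbone, RoIAlign, and losses are omitted; shapes are illustrative assumptions.

```python
# RoI heads in the Mask R-CNN style: box branch and mask branch run in
# parallel on the same pooled RoI features. A structural sketch only.
import torch.nn as nn

class RoIHeads(nn.Module):
    def __init__(self, in_channels, num_classes):
        super().__init__()
        self.box_head = nn.Sequential(  # existing Faster R-CNN branch
            nn.Flatten(), nn.Linear(in_channels * 7 * 7, 1024), nn.ReLU())
        self.cls_score = nn.Linear(1024, num_classes)
        self.bbox_pred = nn.Linear(1024, 4 * num_classes)
        self.mask_head = nn.Sequential(  # added branch: small FCN for masks
            nn.Conv2d(in_channels, 256, 3, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(256, 256, 2, stride=2), nn.ReLU(),
            nn.Conv2d(256, num_classes, 1))  # per-class mask logits

    def forward(self, roi_feats):  # (N, C, 7, 7) pooled RoI features
        x = self.box_head(roi_feats)
        return self.cls_score(x), self.bbox_pred(x), self.mask_head(roi_feats)
```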