Book Chapter
Accelerating Deep Learning with Millions of Classes
Zhuoning Yuan, Zhishuai Guo, Xiaotian Yu, Xiaoyu Wang, Tianbao Yang, et al.
pp. 711–726
TL;DR: This work proposes a training framework for extreme classification tasks based on Random Projection and demonstrates that it can train deep learning models with millions of classes while achieving above \(10\times\) speedup over existing approaches.

Abstract: Deep learning has achieved remarkable success in many classification tasks because of its great power of representation learning for complex data. However, it remains challenging to extend it to classification tasks with millions of classes. Previous studies have focused on solving this problem in a distributed fashion or using sampling-based approaches to reduce the computational cost caused by the softmax layer. However, these approaches still require large amounts of GPU memory to work with large models, and it is non-trivial to extend them to parallel settings. To address these issues, we propose an efficient training framework for extreme classification tasks based on Random Projection. The key idea is to first train a slimmed model with a randomly projected softmax classifier and then recover the original classifier from it. We also give a theoretical guarantee that the recovered classifier approximates the original classifier with a small error. We further extend our framework to parallel settings by adopting a communication-reduction technique. In our experiments, we demonstrate that the proposed framework can train deep learning models with millions of classes and achieve above \(10\times\) speedup compared to existing approaches.
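The abstract's key idea can be illustrated with a small numpy sketch. This is not the authors' implementation — in the paper the slimmed classifier is trained directly in the projected space and the full classifier is recovered afterwards with a proven error bound — but the sketch below (with illustrative dimensions of our choosing) shows why a Gaussian random projection lets the softmax layer operate with far fewer parameters while still approximately preserving the logit of the correct class:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes; the paper targets millions of classes, but K is
# kept small here so the sketch runs in a moment.
d, m, K = 512, 256, 1000   # feature dim, projected dim, number of classes

# A feature vector h and a "full" classifier W (K x d) whose class 0
# is aligned with h, so class 0 is the correct prediction.
h = rng.normal(size=d)
W = rng.normal(size=(K, d)) / np.sqrt(d)   # roughly unit-norm random rows
W[0] = h / np.linalg.norm(h)

# Gaussian random projection R (m x d), scaled so that
# <R w, R h> approximates <w, h> (a Johnson-Lindenstrauss-style sketch).
R = rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, d))

# Slimmed softmax layer: K x m parameters instead of K x d,
# roughly a (d / m)-fold memory reduction for the classifier.
V = W @ R.T          # projected classifier weights
z = R @ h            # projected features

logits_full = W @ h
logits_slim = V @ z  # computed entirely in the m-dimensional space

# The projection preserves the dominant logit and the predicted class.
print(np.argmax(logits_full), np.argmax(logits_slim))
rel_err = abs(logits_slim[0] - logits_full[0]) / abs(logits_full[0])
print(f"relative error on the top logit: {rel_err:.3f}")
```

Note that inner products between *aligned* vectors (here, the correct class weight and the feature) are preserved to within a relative error on the order of \(1/\sqrt{m}\); this is what makes training against the slimmed classifier and then recovering the full one plausible.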
Citations
Proceedings Article
DeepXML: A Deep Extreme Multi-Label Learning Framework Applied to Short Text Documents
Kunal Dahiya, Deepak Saini, Anshul Mittal, Ankush Shaw, Kushal Dave, Akshay Soni, Himanshu Jain, Sumeet Agarwal, Manik Varma, et al.
TL;DR: DeepXML decomposes the deep extreme multi-label task into four simpler sub-tasks, each of which can be trained accurately and efficiently, and chooses different components for each sub-task to generate algorithms with varying trade-offs between accuracy and scalability.
Posted Content
Federated Deep AUC Maximization for Heterogeneous Data with a Constant Communication Complexity
TL;DR: Improved FDAM algorithms for heterogeneous data by solving the popular non-convex strongly-concave min-max formulation of DAM in a distributed fashion are proposed and shown to be effective on benchmark datasets, and on medical chest X-ray images from different organizations.
Proceedings Article
Massively Scaling Heteroscedastic Classifiers
Mark Patrick Collier, Rodolphe Jenatton, Basil Mustafa, Neil Houlsby, Jesse Berent, Effrosyni Kokiopoulou, et al.
TL;DR: This article proposed HET-XL, a heteroscedastic classifier whose parameter count scales independently of the number of classes, and applied it at scales up to a contrastive-learning task that can be viewed as a 3.5-billion-class classification problem.
Journal Article
A Survey on Extreme Multi-label Learning
TL;DR: A formal definition for XML from the perspective of supervised learning is clarified, and possible research directions in XML, such as new evaluation metrics, the tail label problem, and weakly supervised XML are proposed.
References
Proceedings Article
Deep Residual Learning for Image Recognition
TL;DR: Proposed a residual learning framework to ease the training of networks that are substantially deeper than those used previously, which won 1st place on the ILSVRC 2015 classification task.
Proceedings Article
ImageNet Classification with Deep Convolutional Neural Networks
TL;DR: A deep convolutional neural network consisting of five convolutional layers, some followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax achieved state-of-the-art performance on ImageNet classification.
Journal Article
Long short-term memory
TL;DR: A novel, efficient, gradient based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units.
Proceedings Article
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
TL;DR: BERT pre-trains deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, and can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
Proceedings Article
Mask R-CNN
TL;DR: This work presents a conceptually simple, flexible, and general framework for object instance segmentation, which extends Faster R-CNN by adding a branch for predicting an object mask in parallel with the existing branch for bounding box recognition.
Related Papers (5)
Learning A Task-Specific Deep Architecture For Clustering
Deep Ensemble Bayesian Active Learning : Addressing the Mode Collapse issue in Monte Carlo dropout via Ensembles
Remus Pop, Patric Fulop, et al.