Deep Neural Networks for YouTube Recommendations
Paul Covington, Jay Adams, Emre Sargin +2 more
- pp 191-198
TLDR
This paper details a deep candidate generation model, describes a separate deep ranking model, and provides practical lessons and insights derived from designing, iterating on, and maintaining a massive recommendation system with enormous user-facing impact.
Abstract:
YouTube represents one of the largest scale and most sophisticated industrial recommendation systems in existence. In this paper, we describe the system at a high level and focus on the dramatic performance improvements brought by deep learning. The paper is split according to the classic two-stage information retrieval dichotomy: first, we detail a deep candidate generation model and then describe a separate deep ranking model. We also provide practical lessons and insights derived from designing, iterating and maintaining a massive recommendation system with enormous user-facing impact.
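The two-stage split described in the abstract can be illustrated with a minimal sketch. This is not the paper's model: the deep networks are replaced by a dot product (candidate generation) and a logistic score (ranking), and the names `generate_candidates`, `rank`, `recommend`, the `freshness` feature, and the toy vectors are all hypothetical, chosen only to show how a cheap first stage narrows the corpus before a richer second stage orders the survivors.

```python
import math

def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def generate_candidates(user_vec, video_vecs, k):
    """Stage 1 (stand-in): score every video cheaply and keep the top k ids."""
    by_score = sorted(video_vecs, key=lambda vid: dot(user_vec, video_vecs[vid]), reverse=True)
    return by_score[:k]

def rank(user_vec, candidates, video_vecs, freshness):
    """Stage 2 (stand-in): score only the candidates, using an extra feature."""
    def score(vid):
        # Hypothetical richer score: affinity plus a freshness bonus, squashed by a sigmoid.
        logit = dot(user_vec, video_vecs[vid]) + freshness[vid]
        return 1.0 / (1.0 + math.exp(-logit))
    return sorted(candidates, key=score, reverse=True)

def recommend(user_vec, video_vecs, freshness, k):
    """Run both stages: retrieve k candidates, then order them with the ranker."""
    candidates = generate_candidates(user_vec, video_vecs, k)
    return rank(user_vec, candidates, video_vecs, freshness)

# Toy data (assumed for illustration only).
VIDEOS = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0], "d": [-1.0, 0.0]}
FRESHNESS = {"a": 0.0, "b": 0.5, "c": 0.0, "d": 0.0}
USER = [1.0, 0.2]

TOP = recommend(USER, VIDEOS, FRESHNESS, k=2)
```

With this toy data, the first stage keeps `a` and `b`, and the ranker then places `b` ahead of `a` because of its freshness bonus. The point of the split is that the expensive scorer only ever sees the short candidate list, not the full corpus.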
Citations
Journal Article
AIRC: Attentive Implicit Relation Recommendation Incorporating Content Information for Bipartite Graphs
TL;DR: This paper proposes the attentive implicit relation recommendation incorporating content information (AIRC) framework, which is designed for bipartite graphs and based on the GC–MC algorithm, and shows that the framework performs better than other state-of-the-art recommendation algorithms.
Proceedings Article
A field study of related video recommendations: newest, most similar, or most relevant?
TL;DR: A field study of related video recommendations is described, in which algorithms are deployed to recommend related movie trailers; the results suggest the potential to design non-personalized yet effective related-item recommendation strategies.
Proceedings Article
Hippie: A Data-Paralleled Pipeline Approach to Improve Memory-Efficiency and Scalability for Large DNN Training
TL;DR: Hippie integrates pipeline parallelism and data parallelism to improve the memory efficiency and scalability of large DNN training on distributed platforms by hiding gradient communication and introducing last-stage pipeline scheduling and recomputation for specific layers.
Proceedings Article
Tensor Casting: Co-Designing Algorithm-Architecture for Personalized Recommendation Training
TL;DR: Tensor Casting proposes a generic accelerator architecture for tensor gather-scatter that encompasses all the key primitives of training embedding layers and provides $1.9$ to $21\times$ improvements in training throughput compared to state-of-the-art approaches.
Posted Content
Click-Through Rate Prediction with the User Memory Network
TL;DR: The proposed MA-DNN is as simple as a DNN but, like an RNN, has some ability to exploit useful information contained in users' historical behaviors, and it can be added to other models as well.
References
Proceedings Article
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
Sergey Ioffe, Christian Szegedy +1 more
TL;DR: Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin.
Proceedings Article
Distributed Representations of Words and Phrases and their Compositionality
TL;DR: This paper presents a simple method for finding phrases in text, and shows that learning good vector representations for millions of phrases is possible and describes a simple alternative to the hierarchical softmax called negative sampling.
Posted Content
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
Sergey Ioffe, Christian Szegedy +1 more
TL;DR: Batch Normalization normalizes layer inputs for each training mini-batch to reduce internal covariate shift in deep neural networks, and achieves state-of-the-art performance on ImageNet.
Posted Content
Distributed Representations of Words and Phrases and their Compositionality
TL;DR: In this paper, the Skip-gram model is used to learn high-quality distributed vector representations that capture a large number of precise syntactic and semantic word relationships and improve both the quality of the vectors and the training speed.