Deep Neural Networks for YouTube Recommendations
Paul Covington, Jay Adams, Emre Sargin
Proceedings of the 10th ACM Conference on Recommender Systems (RecSys '16), pp. 191-198
TL;DR
This paper details a deep candidate generation model, then describes a separate deep ranking model, and provides practical lessons and insights derived from designing, iterating, and maintaining a massive recommendation system with enormous user-facing impact.
Abstract
YouTube represents one of the largest scale and most sophisticated industrial recommendation systems in existence. In this paper, we describe the system at a high level and focus on the dramatic performance improvements brought by deep learning. The paper is split according to the classic two-stage information retrieval dichotomy: first, we detail a deep candidate generation model and then describe a separate deep ranking model. We also provide practical lessons and insights derived from designing, iterating and maintaining a massive recommendation system with enormous user-facing impact.
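The two-stage dichotomy the abstract describes can be sketched as follows: a cheap candidate generator narrows millions of videos to a few hundred, then a more expensive ranker orders that short list. This is a minimal illustrative sketch only; the scoring functions here are toy stand-ins, not the paper's actual deep models.

```python
import numpy as np

rng = np.random.default_rng(0)

N_VIDEOS, DIM = 10_000, 16
# In the real system these embeddings are learned; here they are random.
video_embeddings = rng.normal(size=(N_VIDEOS, DIM))

def generate_candidates(user_vec, k=200):
    """Stage 1: coarse retrieval — take the k highest dot-product scores."""
    scores = video_embeddings @ user_vec
    return np.argpartition(-scores, k)[:k]

def rank(user_vec, candidate_ids, top_n=10):
    """Stage 2: a richer (here: toy) scoring model over the short list only."""
    feats = video_embeddings[candidate_ids]
    scores = feats @ user_vec + 0.1 * feats.sum(axis=1)  # stand-in for a deep ranker
    order = np.argsort(-scores)
    return candidate_ids[order[:top_n]]

user_vec = rng.normal(size=DIM)
candidates = generate_candidates(user_vec)
recommendations = rank(user_vec, candidates)
```

The key design point the paper emphasizes is the asymmetry: stage 1 must be fast enough to scan the full corpus, while stage 2 can afford many more features per item because it only sees a few hundred candidates.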
Citations
Proceedings Article
A Minimax Game for Instance based Selective Transfer Learning
Bo Wang, Minghui Qiu, Xisen Wang, Yaliang Li, Yu Gong, Xiaoyi Zeng, Jun Huang, Bo Zheng, Deng Cai, Jingren Zhou
TL;DR: This work proposes a general minimax-game-based model for selective transfer learning that outperforms competing methods by a large margin and speeds up training in the target domain compared with traditional transfer learning methods.
Proceedings Article
Temporal Hierarchical Attention at Category- and Item-Level for Micro-Video Click-Through Prediction
TL;DR: This paper investigates how to model users' historical behaviors in order to predict their click-throughs on micro-videos, and proposes a Temporal Hierarchical Attention at Category- and Item-Level (THACIL) network for user behavior modeling.
Proceedings Article
DAPPLE: a pipelined data parallel approach for training large models
Shiqing Fan, Yi Rong, Chen Meng, Zongyan Cao, Siyu Wang, Zhen Zheng, Chuan Wu, Guoping Long, Jun Yang, Lixue Xia, Lansong Diao, Xiaoyong Liu, Wei Lin
TL;DR: DAPPLE is a synchronous training framework that combines data parallelism and pipeline parallelism for large DNN models; it features a novel parallelization strategy planner to solve the partition and placement problems.
Posted Content
Interact and Decide: Medley of Sub-Attention Networks for Effective Group Recommendation
TL;DR: This paper proposes the Medley of Sub-Attention Networks (MoSAN), a novel neural architecture for group recommendation in which each sub-attention module represents a single group member and models that user's preference with respect to all other members.
Journal Article
User-Personalized Review Rating Prediction Method Based on Review Text Content and User-Item Rating Matrix
TL;DR: This work proposes a user-personalized review rating prediction method that integrates the review text with the user-item rating matrix, and shows that it significantly outperforms state-of-the-art methods.
References
Proceedings Article
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
Sergey Ioffe, Christian Szegedy
TL;DR: Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin.
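The transform this reference introduces can be sketched in a few lines: normalize each feature over the mini-batch, then apply a learned scale (gamma) and shift (beta). This is a minimal training-time sketch; the running averages that real implementations use at inference are omitted.

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Batch normalization over a mini-batch x of shape (batch, features)."""
    mean = x.mean(axis=0)                    # per-feature mini-batch mean
    var = x.var(axis=0)                      # per-feature mini-batch variance
    x_hat = (x - mean) / np.sqrt(var + eps)  # normalize to ~zero mean, ~unit variance
    return gamma * x_hat + beta              # learned scale and shift

x = np.random.default_rng(0).normal(loc=5.0, scale=3.0, size=(64, 8))
y = batch_norm(x, gamma=np.ones(8), beta=np.zeros(8))
```

With gamma = 1 and beta = 0, each output feature has approximately zero mean and unit variance regardless of the input distribution, which is what reduces the "internal covariate shift" the title refers to.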
Proceedings Article
Distributed Representations of Words and Phrases and their Compositionality
TL;DR: This paper presents a simple method for finding phrases in text, shows that learning good vector representations for millions of phrases is feasible, and describes negative sampling, a simple alternative to the hierarchical softmax.
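The negative-sampling objective mentioned above can be sketched as follows: instead of a full softmax over the vocabulary, score one true (center, context) pair against a few sampled "negative" words. This is a toy illustration; the embedding sizes are arbitrary, and real implementations draw negatives from a smoothed unigram distribution rather than uniformly.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, DIM = 1000, 32
W_in = rng.normal(scale=0.1, size=(VOCAB, DIM))   # center-word embeddings
W_out = rng.normal(scale=0.1, size=(VOCAB, DIM))  # context-word embeddings

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def neg_sampling_loss(center, context, k=5):
    """Negative log-likelihood of one positive pair plus k sampled negatives."""
    v = W_in[center]
    pos = np.log(sigmoid(W_out[context] @ v))      # push true pair together
    negs = rng.integers(0, VOCAB, size=k)          # uniform here for simplicity
    neg = np.log(sigmoid(-(W_out[negs] @ v))).sum()  # push sampled pairs apart
    return -(pos + neg)

loss = neg_sampling_loss(center=3, context=7)
```

The cost per training pair is O(k · DIM) rather than O(VOCAB · DIM) for the full softmax, which is what makes training on billions of words practical.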