Open Access · Proceedings ArticleDOI

Deep Neural Networks for YouTube Recommendations

Paul Covington, Jay Adams, Emre Sargin
- RecSys 2016, pp. 191-198
TLDR
This paper details a deep candidate generation model, then describes a separate deep ranking model, and provides practical lessons and insights derived from designing, iterating on, and maintaining a massive recommendation system with enormous user-facing impact.
Abstract: 
YouTube represents one of the largest scale and most sophisticated industrial recommendation systems in existence. In this paper, we describe the system at a high level and focus on the dramatic performance improvements brought by deep learning. The paper is split according to the classic two-stage information retrieval dichotomy: first, we detail a deep candidate generation model and then describe a separate deep ranking model. We also provide practical lessons and insights derived from designing, iterating and maintaining a massive recommendation system with enormous user-facing impact.
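
As a rough sketch of the two-stage dichotomy the abstract describes: candidate generation winnows millions of videos to a few hundred, and a separate ranking model orders that small set. Everything below (the embedding sizes, the dot-product scorer, the function names) is an illustrative assumption, not the production system.

```python
import numpy as np

# Illustrative stand-ins for learned embeddings; shapes and scoring are assumptions.
rng = np.random.default_rng(0)
VIDEO_EMB = rng.normal(size=(10_000, 64))            # one row per video
VIDEO_EMB /= np.linalg.norm(VIDEO_EMB, axis=1, keepdims=True)

def generate_candidates(user_emb, k=100):
    """Stage 1 (candidate generation): reduce the full corpus to k videos,
    here via a brute-force dot-product top-k standing in for an approximate
    nearest-neighbor lookup."""
    scores = VIDEO_EMB @ user_emb
    return np.argpartition(-scores, k)[:k]

def rank(user_emb, candidates):
    """Stage 2 (ranking): re-score only the candidates with a richer model
    (here the same dot product, for brevity) and sort best-first."""
    scores = VIDEO_EMB[candidates] @ user_emb
    return candidates[np.argsort(-scores)]

user_emb = rng.normal(size=64)
print(rank(user_emb, generate_candidates(user_emb))[:10])
```

The point of the split is cost: the cheap first stage touches every item, while the expensive second stage touches only the survivors.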


Citations
Proceedings ArticleDOI

Aspect-Aware Latent Factor Model: Rating Prediction with Ratings and Reviews

TL;DR: This paper applies a proposed aspect-aware topic model (ATM) to review text to model user preferences and item features from different aspects, estimates the aspect importance of a user towards an item, and introduces a weighted matrix that associates those latent factors with the same set of aspects discovered by ATM.
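
A minimal sketch of the prediction step this TL;DR describes, with made-up values standing in for quantities the model would learn; the variable names are illustrative, not the paper's notation:

```python
import numpy as np

rng = np.random.default_rng(1)
n_aspects, n_factors = 5, 16

# Hypothetical learned quantities for one user-item pair.
importance = rng.dirichlet(np.ones(n_aspects))   # aspect importance, sums to 1
W = rng.uniform(size=(n_aspects, n_factors))     # weights tying latent factors to aspects
p_u = rng.normal(size=n_factors)                 # user latent factors
q_i = rng.normal(size=n_factors)                 # item latent factors

# Each aspect's score gates the factor interaction p_u * q_i through the
# weight matrix; the overall rating combines aspect scores by importance.
aspect_scores = W @ (p_u * q_i)
r_hat = float(importance @ aspect_scores)
print(round(r_hat, 3))
```
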
Proceedings ArticleDOI

Recommending what video to watch next: a multitask ranking system

TL;DR: This paper introduces a large-scale multi-objective ranking system for recommending what video to watch next on an industrial video-sharing platform, and explores a variety of soft-parameter-sharing techniques such as Multi-gate Mixture-of-Experts to efficiently optimize for multiple ranking objectives.
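
A tiny sketch of the Multi-gate Mixture-of-Experts idea named above, with toy dimensions and random weights standing in for trained parameters:

```python
import numpy as np

rng = np.random.default_rng(2)
d_in, d_expert, n_experts, n_tasks = 32, 16, 4, 2

# Hypothetical parameters for a tiny MMoE layer.
expert_W = rng.normal(size=(n_experts, d_in, d_expert)) * 0.1
gate_W = rng.normal(size=(n_tasks, d_in, n_experts)) * 0.1

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def mmoe(x):
    """All tasks share the expert networks; each task's own gate produces
    the mixture weights, giving soft parameter sharing across objectives."""
    experts = np.stack([np.tanh(x @ expert_W[e]) for e in range(n_experts)])
    return [softmax(x @ gate_W[t]) @ experts for t in range(n_tasks)]

outputs = mmoe(rng.normal(size=d_in))
print([o.shape for o in outputs])   # one representation per ranking objective
```
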
Posted Content

MOReL : Model-Based Offline Reinforcement Learning

TL;DR: Theoretically, it is shown that MOReL is minimax optimal (up to log factors) for offline RL; experimentally, it matches or exceeds state-of-the-art results on widely studied offline RL benchmarks.
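
The TL;DR reports the guarantees; the construction behind them, per the MOReL paper, is a pessimistic MDP built from a learned dynamics-model ensemble. Below is a minimal sketch of that idea with toy models; the threshold and penalty values are assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy ensemble of learned dynamics models (near-identity linear maps).
ensemble = [np.eye(4) + 0.05 * rng.normal(size=(4, 4)) for _ in range(5)]
DISAGREEMENT_THRESHOLD = 0.1   # assumed; tuned in practice
HALT_PENALTY = -100.0          # large negative reward for unknown regions

def pessimistic_step(state):
    """One step of the pessimistic MDP: where the ensemble disagrees, the
    state is treated as unknown and the episode halts with a penalty,
    discouraging the planner from exploiting model error."""
    preds = np.stack([m @ state for m in ensemble])
    if preds.std(axis=0).max() > DISAGREEMENT_THRESHOLD:
        return None, HALT_PENALTY            # absorbing HALT state
    return preds.mean(axis=0), 0.0           # reward model omitted for brevity

print(pessimistic_step(np.ones(4)))
```
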
Journal ArticleDOI

Spatial-Aware Hierarchical Collaborative Deep Learning for POI Recommendation

TL;DR: The extensive experimental analysis shows that the proposed Spatial-Aware Hierarchical Collaborative Deep Learning model outperforms the state-of-the-art recommendation models, especially in out-of-town and cold-start recommendation scenarios.
Posted Content

Product-based Neural Networks for User Response Prediction

TL;DR: Product-based Neural Networks (PNN), as described in this paper, use an embedding layer to learn a distributed representation of the categorical data, a product layer to capture interactive patterns between inter-field categories, and further fully connected layers to explore high-order feature interactions.
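
A short sketch of the inner-product variant of PNN's product layer, with random vectors standing in for the learned field embeddings:

```python
import numpy as np

rng = np.random.default_rng(4)
n_fields, d_emb = 4, 8

# Hypothetical embedded categorical fields for one example; in PNN these
# come from a learned embedding layer over the raw categorical features.
fields = rng.normal(size=(n_fields, d_emb))

# Product layer: pairwise dot products between field embeddings expose
# second-order inter-field interactions.
pairs = [(i, j) for i in range(n_fields) for j in range(i + 1, n_fields)]
products = np.array([fields[i] @ fields[j] for i, j in pairs])

# First-order (flattened embeddings) and second-order (products) signals are
# concatenated and fed to fully connected layers for higher-order interactions.
hidden_in = np.concatenate([fields.ravel(), products])
print(hidden_in.shape)   # (n_fields*d_emb + n_fields*(n_fields-1)//2,)
```
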
References

Proceedings Article

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

TL;DR: Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin.
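
A minimal, self-contained sketch of the batch normalization transform the TL;DR refers to (training-time statistics only; a full layer would also track running statistics for inference):

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Training-time batch normalization: standardize each feature using the
    mini-batch mean and variance, then apply the learned scale (gamma) and
    shift (beta)."""
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    return gamma * (x - mu) / np.sqrt(var + eps) + beta

batch = np.random.default_rng(5).normal(loc=3.0, scale=2.0, size=(32, 10))
normed = batch_norm(batch)
print(normed.mean(axis=0).round(6).max(), normed.std(axis=0).round(3).max())
```
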
Proceedings Article

Distributed Representations of Words and Phrases and their Compositionality

TL;DR: This paper presents a simple method for finding phrases in text, shows that learning good vector representations for millions of phrases is possible, and describes a simple alternative to the hierarchical softmax called negative sampling.
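
A small sketch of the negative-sampling objective for one training pair, with random vectors standing in for word embeddings:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def negative_sampling_loss(center, context, negatives):
    """Skip-gram with negative sampling for one (center, context) pair:
    raise the true pair's score and lower the scores of k sampled noise
    words, avoiding a softmax over the entire vocabulary."""
    pos = np.log(sigmoid(center @ context))
    neg = sum(np.log(sigmoid(-center @ n)) for n in negatives)
    return -(pos + neg)

rng = np.random.default_rng(6)
d = 16
center, context = rng.normal(size=d), rng.normal(size=d)
noise = [rng.normal(size=d) for _ in range(5)]    # k = 5 negative samples
print(round(negative_sampling_loss(center, context, noise), 3))
```
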
Posted Content

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

Sergey Ioffe, Christian Szegedy
- 11 Feb 2015
TL;DR: Batch Normalization, as described in this paper, normalizes layer inputs for each training mini-batch to reduce internal covariate shift in deep neural networks, and achieves state-of-the-art performance on ImageNet.
Posted Content

Distributed Representations of Words and Phrases and their Compositionality

TL;DR: In this paper, the Skip-gram model is used to learn high-quality distributed vector representations that capture a large number of precise syntactic and semantic word relationships, together with extensions that improve both the quality of the vectors and the training speed.