Proceedings Article•DOI•

Deep Neural Networks for YouTube Recommendations

Paul Covington, Jay Adams, Emre Sargin
07 Sep 2016 - pp 191-198
TL;DR: This paper details a deep candidate generation model, describes a separate deep ranking model, and provides practical lessons and insights derived from designing, iterating on, and maintaining a massive recommendation system with enormous user-facing impact.
Abstract: YouTube represents one of the largest scale and most sophisticated industrial recommendation systems in existence. In this paper, we describe the system at a high level and focus on the dramatic performance improvements brought by deep learning. The paper is split according to the classic two-stage information retrieval dichotomy: first, we detail a deep candidate generation model and then describe a separate deep ranking model. We also provide practical lessons and insights derived from designing, iterating and maintaining a massive recommendation system with enormous user-facing impact.
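
The two-stage split described above (a candidate generation model that narrows a huge corpus to a few hundred videos, followed by a separate ranking model over that short list) can be sketched roughly as follows. This is a toy illustration, not the paper's implementation: the corpus size, embedding dimension, dot-product scoring, and names such as generate_candidates are all assumptions.

    import numpy as np

    # Hypothetical two-stage pipeline: stage 1 retrieves a few hundred candidates
    # from a large corpus with a cheap score; stage 2 re-scores only that short list.
    rng = np.random.default_rng(0)
    NUM_VIDEOS, EMB_DIM = 100_000, 32
    video_emb = rng.normal(size=(NUM_VIDEOS, EMB_DIM)).astype(np.float32)

    def generate_candidates(user_emb, k=200):
        """Stage 1 (candidate generation): top-k videos by dot product."""
        scores = video_emb @ user_emb
        return np.argpartition(-scores, k)[:k]

    def rank(user_emb, candidates, n=10):
        """Stage 2 (ranking): re-score the short list and keep the best n."""
        scores = video_emb[candidates] @ user_emb
        return candidates[np.argsort(-scores)[:n]]

    user_emb = rng.normal(size=EMB_DIM).astype(np.float32)
    print(rank(user_emb, generate_candidates(user_emb)))

In the real system the two stages use different features and objectives; the point of the sketch is only that the expensive ranking model never has to score the full corpus.
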
Citations
Proceedings Article•DOI•
19 Jul 2018
TL;DR: This paper develops a novel method based on highly efficient random walks to structure the convolutions, along with a novel training strategy that relies on harder-and-harder training examples to improve the robustness and convergence of the model.
Abstract: Recent advancements in deep neural networks for graph-structured data have led to state-of-the-art performance on recommender system benchmarks. However, making these methods practical and scalable to web-scale recommendation tasks with billions of items and hundreds of millions of users remains an unsolved challenge. Here we describe a large-scale deep recommendation engine that we developed and deployed at Pinterest. We develop a data-efficient Graph Convolutional Network (GCN) algorithm, which combines efficient random walks and graph convolutions to generate embeddings of nodes (i.e., items) that incorporate both graph structure as well as node feature information. Compared to prior GCN approaches, we develop a novel method based on highly efficient random walks to structure the convolutions and design a novel training strategy that relies on harder-and-harder training examples to improve robustness and convergence of the model. We also develop an efficient MapReduce model inference algorithm to generate embeddings using a trained model. Overall, we can train on and embed graphs that are four orders of magnitude larger than typical GCN implementations. We show how GCN embeddings can be used to make high-quality recommendations in various settings at Pinterest, which has a massive underlying graph with 3 billion nodes representing pins and boards, and 17 billion edges. According to offline metrics, user studies, as well as A/B tests, our approach generates higher-quality recommendations than comparable deep learning based systems. To our knowledge, this is by far the largest application of deep graph embeddings to date and paves the way for a new generation of web-scale recommender systems based on graph convolutional architectures.
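
The random-walk idea in this abstract can be sketched in a few lines: short walks from a node estimate which neighbours matter most, and those visit counts become importance weights for one aggregation (convolution) step. The graph, walk lengths, and function names below are illustrative assumptions, not Pinterest's implementation.

    import numpy as np
    from collections import Counter

    rng = np.random.default_rng(0)

    def importance_neighbourhood(adj, node, walk_len=2, num_walks=100, top_t=3):
        """Return (neighbour, normalised visit weight) pairs estimated by random walks."""
        visits = Counter()
        for _ in range(num_walks):
            current = node
            for _ in range(walk_len):
                nbrs = adj[current]
                if not nbrs:
                    break
                current = nbrs[rng.integers(len(nbrs))]
                if current != node:
                    visits[current] += 1
        top = visits.most_common(top_t)
        total = sum(c for _, c in top) or 1
        return [(n, c / total) for n, c in top]

    def convolve(adj, features, node):
        """One aggregation step: importance-weighted neighbour average mixed with self."""
        nbrs = importance_neighbourhood(adj, node)
        agg = sum(w * features[n] for n, w in nbrs)
        return (features[node] + agg) / 2.0

    adj = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1]}          # toy item graph
    features = {i: rng.normal(size=4) for i in adj}
    print(convolve(adj, features, 0))

The deployed system layers learned transformations, the harder-and-harder example curriculum mentioned above, and MapReduce inference on top of this neighbourhood-selection idea.
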

2,647 citations


Cites background from "Deep Neural Networks for YouTube Re..."

  • ...We also do not consider non-deep learning approaches for generating item/content embeddings, since other works have already proven state-of-the-art performance of deep learning approaches for generating such embeddings [9, 12, 24]....


Proceedings Article•DOI•
Huifeng Guo, Ruiming Tang, Yunming Ye, Zhenguo Li, Xiuqiang He
19 Aug 2017
TL;DR: This paper shows that it is possible to derive an end-to-end learning model that emphasizes both low- and high-order feature interactions, and combines the power of factorization machines for recommendation and deep learning for feature learning in a new neural network architecture.
Abstract: Learning sophisticated feature interactions behind user behaviors is critical in maximizing CTR for recommender systems. Despite great progress, existing methods seem to have a strong bias towards low- or high-order interactions, or require expert feature engineering. In this paper, we show that it is possible to derive an end-to-end learning model that emphasizes both low- and high-order feature interactions. The proposed model, DeepFM, combines the power of factorization machines for recommendation and deep learning for feature learning in a new neural network architecture. Compared to the latest Wide & Deep model from Google, DeepFM has a shared input to its "wide" and "deep" parts, with no need for feature engineering besides raw features. Comprehensive experiments are conducted to demonstrate the effectiveness and efficiency of DeepFM over the existing models for CTR prediction, on both benchmark data and commercial data.
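
A compact way to see the shared-input idea is a forward pass in which the FM term and the MLP read the same embedding table, and their outputs are summed before the sigmoid. The shapes, initialisation, and the one-active-feature-per-field simplification below are assumptions for illustration only.

    import numpy as np

    rng = np.random.default_rng(0)
    NUM_FIELDS, VOCAB, EMB_DIM, HIDDEN = 5, 1000, 8, 16

    emb = rng.normal(0, 0.01, size=(VOCAB, EMB_DIM))           # embeddings shared by both parts
    w_first = rng.normal(0, 0.01, size=VOCAB)                  # first-order FM weights
    W1 = rng.normal(0, 0.01, size=(NUM_FIELDS * EMB_DIM, HIDDEN))
    W2 = rng.normal(0, 0.01, size=HIDDEN)

    def deepfm_forward(feature_ids):
        """feature_ids: one active feature id per field (shape [NUM_FIELDS])."""
        e = emb[feature_ids]                                    # [fields, emb]
        # FM part: first-order term + pairwise interactions via the sum/square trick.
        fm = w_first[feature_ids].sum()
        fm += 0.5 * ((e.sum(0) ** 2) - (e ** 2).sum(0)).sum()
        # Deep part: MLP over the concatenated embeddings.
        h = np.maximum(0, e.reshape(-1) @ W1)                   # ReLU hidden layer
        deep = h @ W2
        return 1.0 / (1.0 + np.exp(-(fm + deep)))               # predicted CTR

    print(deepfm_forward(np.array([3, 207, 511, 640, 999])))

The sum/square identity used for the pairwise term is the standard O(k) factorization-machine computation; the point of sharing the embedding table is that no separate cross-feature engineering is needed for the "wide" part.
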

1,695 citations

Proceedings Article•DOI•
Guorui Zhou, Xiaoqiang Zhu, Chenru Song, Ying Fan, Han Zhu, Xiao Ma, Yanghui Yan, Junqi Jin, Han Li, Kun Gai
19 Jul 2018
TL;DR: A novel model: Deep Interest Network (DIN) is proposed which tackles this challenge by designing a local activation unit to adaptively learn the representation of user interests from historical behaviors with respect to a certain ad.
Abstract: Click-through rate prediction is an essential task in industrial applications, such as online advertising. Recently, deep learning based models have been proposed that follow a similar Embedding&MLP paradigm: large-scale sparse input features are first mapped into low-dimensional embedding vectors, then transformed into fixed-length vectors in a group-wise manner, and finally concatenated together to be fed into a multilayer perceptron (MLP) that learns the nonlinear relations among features. In this way, user features are compressed into a fixed-length representation vector, regardless of which candidate ads are being scored. This fixed-length vector becomes a bottleneck, making it difficult for Embedding&MLP methods to capture users' diverse interests effectively from rich historical behaviors. In this paper, we propose a novel model, Deep Interest Network (DIN), which tackles this challenge by designing a local activation unit to adaptively learn the representation of user interests from historical behaviors with respect to a certain ad. This representation vector varies over different ads, greatly improving the expressive ability of the model. In addition, we develop two techniques, mini-batch aware regularization and a data adaptive activation function, which help train industrial deep networks with hundreds of millions of parameters. Experiments on two public datasets as well as an Alibaba real production dataset with over 2 billion samples demonstrate the effectiveness of the proposed approaches, which achieve superior performance compared with state-of-the-art methods. DIN has now been successfully deployed in the online display advertising system at Alibaba, serving the main traffic.
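
The local activation unit amounts to ad-conditioned weighted pooling of the behaviour embeddings, so the pooled user vector changes with the candidate ad. Below is a minimal sketch with assumed shapes and a deliberately simplified relevance score; the paper learns the score with a small feed-forward unit over the behaviour, the ad, and their interaction, and it keeps the weights unnormalised.

    import numpy as np

    rng = np.random.default_rng(0)
    EMB_DIM = 8

    def activation_weight(behavior, ad):
        """Toy relevance score; DIN learns this with a small feed-forward unit."""
        return float(behavior @ ad)

    def user_representation(behaviors, ad):
        """Sum-pool behaviours with weights conditioned on the candidate ad."""
        w = np.array([activation_weight(b, ad) for b in behaviors])
        w = np.exp(w - w.max())          # softmax only to keep this toy example stable;
        w /= w.sum()                     # the paper keeps the weights unnormalised
        return (w[:, None] * behaviors).sum(axis=0)

    behaviors = rng.normal(size=(10, EMB_DIM))   # embeddings of past clicks/views
    ad = rng.normal(size=EMB_DIM)                # candidate ad embedding
    print(user_representation(behaviors, ad))
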

1,317 citations


Cites background or methods from "Deep Neural Networks for YouTube Re..."

  • ...Most of the popular model structures [3, 4, 21] share a similar Embedding&MLP paradigm, which we refer to as base model, as shown in the left of Fig....


  • ...Deep Crossing [21], Wide&Deep Learning [4] and YouTube Recommendation CTR model [3] extend LS-PLM and FM by replacing the transformation function with complex MLP network, which enhances the model capability greatly....


  • ...As fully connected networks can only handle fixed-length inputs, it is a common practice [3, 4] to transform the list of embedding vectors via a pooling layer to get a fixed-length vector:...


  • ...Recently, inspired by the success of deep learning in computer vision [14] and natural language processing [1], deep learning based methods have been proposed for CTR prediction task [3, 4, 21, 26]....


  • ..., searched terms or watched videos in YouTube recommender system [3]....


Journal Article•DOI•
TL;DR: A comprehensive review of recent research efforts on deep learning-based recommender systems is provided in this paper, along with a comprehensive summary of the state-of-the-art.
Abstract: With the growing volume of online information, recommender systems have been an effective strategy to overcome information overload. The utility of recommender systems cannot be overstated, given their widespread adoption in many web applications, along with their potential impact to ameliorate many problems related to over-choice. In recent years, deep learning has garnered considerable interest in many research fields such as computer vision and natural language processing, owing not only to stellar performance but also to the attractive property of learning feature representations from scratch. The influence of deep learning is also pervasive, recently demonstrating its effectiveness when applied to information retrieval and recommender systems research. The field of deep learning in recommender systems is flourishing. This article aims to provide a comprehensive review of recent research efforts on deep learning-based recommender systems. More concretely, we devise a taxonomy of deep learning-based recommendation models, along with a comprehensive summary of the state of the art. Finally, we expand on current trends and provide new perspectives pertaining to this new and exciting development of the field.

1,070 citations

Posted Content•
TL;DR: This work proposes a new model named LightGCN, including only the most essential component in GCN -- neighborhood aggregation -- for collaborative filtering, and is much easier to implement and train, exhibiting substantial improvements over Neural Graph Collaborative Filtering (NGCF) under exactly the same experimental setting.
Abstract: Graph Convolution Network (GCN) has become the new state-of-the-art for collaborative filtering. Nevertheless, the reasons for its effectiveness in recommendation are not well understood. Existing work that adapts GCN to recommendation lacks thorough ablation analyses of GCN, which was originally designed for graph classification tasks and is equipped with many neural network operations. However, we empirically find that the two most common designs in GCNs -- feature transformation and nonlinear activation -- contribute little to the performance of collaborative filtering. Even worse, including them adds to the difficulty of training and degrades recommendation performance. In this work, we aim to simplify the design of GCN to make it more concise and appropriate for recommendation. We propose a new model named LightGCN, including only the most essential component in GCN -- neighborhood aggregation -- for collaborative filtering. Specifically, LightGCN learns user and item embeddings by linearly propagating them on the user-item interaction graph, and uses the weighted sum of the embeddings learned at all layers as the final embedding. This simple, linear, and neat model is much easier to implement and train, exhibiting substantial improvements (about 16.0% relative improvement on average) over Neural Graph Collaborative Filtering (NGCF) -- a state-of-the-art GCN-based recommender model -- under exactly the same experimental setting. Further analyses are provided towards the rationality of the simple LightGCN from both analytical and empirical perspectives.
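
The propagation rule described here is simple enough to write out directly: no feature transformation, no nonlinearity, just repeated multiplication by the symmetrically normalised adjacency matrix and a combination over layers (the paper allows general layer weights; a uniform mean is used below). All sizes and the random interaction matrix are assumptions for illustration.

    import numpy as np

    rng = np.random.default_rng(0)
    NUM_USERS, NUM_ITEMS, DIM, LAYERS = 4, 6, 8, 3

    R = (rng.random((NUM_USERS, NUM_ITEMS)) < 0.4).astype(float)   # user-item interactions
    N = NUM_USERS + NUM_ITEMS
    A = np.zeros((N, N))                                            # bipartite adjacency
    A[:NUM_USERS, NUM_USERS:] = R
    A[NUM_USERS:, :NUM_USERS] = R.T

    deg = A.sum(axis=1)
    d_inv_sqrt = np.zeros(N)
    d_inv_sqrt[deg > 0] = deg[deg > 0] ** -0.5
    A_hat = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]           # D^-1/2 A D^-1/2

    E = rng.normal(size=(N, DIM))                                    # layer-0 embeddings
    layers = [E]
    for _ in range(LAYERS):
        layers.append(A_hat @ layers[-1])                            # pure linear propagation

    final = np.mean(layers, axis=0)                                  # combine all layers
    users, items = final[:NUM_USERS], final[NUM_USERS:]
    print((users @ items.T).round(2))                                # user-item preference scores
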

1,020 citations


Cites background or methods from "Deep Neural Networks for YouTube Re..."

  • ...Collaborative Filtering (CF) is a prevalent technique in modern recommender systems [7, 45]....


  • ...To alleviate information overload on the web, recommender system has been widely deployed to perform personalized information filtering [7, 45, 46]....


References
More filters
Proceedings Article•
14 Jun 2011
TL;DR: This paper shows that rectifying neurons are an even better model of biological neurons and yield equal or better performance than hyperbolic tangent networks in spite of the hard non-linearity and non-differentiability.
Abstract: While logistic sigmoid neurons are more biologically plausible than hyperbolic tangent neurons, the latter work better for training multi-layer neural networks. This paper shows that rectifying neurons are an even better model of biological neurons and yield equal or better performance than hyperbolic tangent networks in spite of the hard non-linearity and non-differentiability.
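
For reference, the rectifier compared against tanh here is simply max(0, x); a minimal illustration with made-up inputs:

    import numpy as np

    x = np.linspace(-3, 3, 7)
    relu = np.maximum(0.0, x)   # hard non-linearity, exactly zero for negative inputs
    tanh = np.tanh(x)           # smooth, saturating alternative discussed in the abstract
    print(np.column_stack([x, relu, tanh]))
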

6,790 citations


"Deep Neural Networks for YouTube Re..." refers background in this paper

  • ...Features are concatenated into a wide first layer, followed by several layers of fully connected Rectified Linear Units (ReLU) [6]....


Proceedings Article•
03 Dec 2012
TL;DR: This paper considers the problem of training a deep network with billions of parameters using tens of thousands of CPU cores and develops two algorithms for large-scale distributed training, Downpour SGD and Sandblaster L-BFGS, which increase the scale and speed of deep network training.
Abstract: Recent work in unsupervised feature learning and deep learning has shown that being able to train large models can dramatically improve performance. In this paper, we consider the problem of training a deep network with billions of parameters using tens of thousands of CPU cores. We have developed a software framework called DistBelief that can utilize computing clusters with thousands of machines to train large models. Within this framework, we have developed two algorithms for large-scale distributed training: (i) Downpour SGD, an asynchronous stochastic gradient descent procedure supporting a large number of model replicas, and (ii) Sandblaster, a framework that supports a variety of distributed batch optimization procedures, including a distributed implementation of L-BFGS. Downpour SGD and Sandblaster L-BFGS both increase the scale and speed of deep network training. We have successfully used our system to train a deep network 30x larger than previously reported in the literature, achieving state-of-the-art performance on ImageNet, a visual object recognition task with 16 million images and 21k categories. We show that these same techniques dramatically accelerate the training of a more modestly sized deep network for a commercial speech recognition service. Although we focus on and report performance of these methods as applied to training large neural networks, the underlying algorithms are applicable to any gradient-based machine learning algorithm.
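
Downpour SGD's core loop (replicas fetch possibly stale parameters, compute a gradient on their own data shard, and push an update back without waiting for each other) can be mimicked in a single process. The least-squares problem, learning rate, and round-robin scheduling below are stand-ins chosen for illustration; the real system runs replicas truly asynchronously against a sharded parameter server.

    import numpy as np

    rng = np.random.default_rng(0)
    true_w = np.array([2.0, -3.0, 0.5])
    X = rng.normal(size=(300, 3))
    y = X @ true_w + 0.01 * rng.normal(size=300)

    shards = np.array_split(np.arange(300), 4)   # one data shard per model replica
    params = np.zeros(3)                          # "parameter server" state
    lr = 0.05

    for step in range(200):
        replica = step % len(shards)              # replicas take turns here, standing in
        idx = shards[replica]                     # for truly asynchronous pushes
        w = params.copy()                         # fetch possibly-stale parameters
        grad = 2 * X[idx].T @ (X[idx] @ w - y[idx]) / len(idx)
        params -= lr * grad                       # push the update back to the server

    print(params.round(2), "vs", true_w)
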

3,475 citations


"Deep Neural Networks for YouTube Re..." refers methods in this paper

  • ...Our system is built on Google Brain [4] which was recently open sourced as TensorFlow [1]....


Journal Article•DOI•
TL;DR: From basic techniques to the state-of-the-art, this paper attempts to present a comprehensive survey for CF techniques, which can be served as a roadmap for research and practice in this area.
Abstract: As one of the most successful approaches to building recommender systems, collaborative filtering (CF) uses the known preferences of a group of users to make recommendations or predictions of the unknown preferences of other users. In this paper, we first introduce CF tasks and their main challenges, such as data sparsity, scalability, synonymy, gray sheep, shilling attacks, privacy protection, etc., and their possible solutions. We then present three main categories of CF techniques: memory-based, model-based, and hybrid CF algorithms (which combine CF with other recommendation techniques), with examples of representative algorithms in each category and an analysis of their predictive performance and their ability to address the challenges. From basic techniques to the state of the art, we attempt to present a comprehensive survey of CF techniques, which can serve as a roadmap for research and practice in this area.
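
The memory-based category the survey lists first is easy to make concrete: predict a missing rating as a similarity-weighted average of other users' ratings for that item. The toy rating matrix and the cosine-over-co-rated-items similarity below are illustrative assumptions, not the survey's specific algorithm.

    import numpy as np

    ratings = np.array([
        [5, 3, 0, 1],     # 0 = unrated
        [4, 0, 0, 1],
        [1, 1, 0, 5],
        [1, 0, 0, 4],
    ], dtype=float)

    def cosine(u, v):
        """Cosine similarity computed only over items both users rated."""
        mask = (u > 0) & (v > 0)
        if not mask.any():
            return 0.0
        return u[mask] @ v[mask] / (np.linalg.norm(u[mask]) * np.linalg.norm(v[mask]))

    def predict(user, item):
        """Similarity-weighted average of other users' ratings for the item."""
        sims, vals = [], []
        for other in range(len(ratings)):
            if other != user and ratings[other, item] > 0:
                sims.append(cosine(ratings[user], ratings[other]))
                vals.append(ratings[other, item])
        if not sims:
            return 0.0
        sims, vals = np.array(sims), np.array(vals)
        return float(sims @ vals / (np.abs(sims).sum() + 1e-8))

    print(predict(user=0, item=2))   # nobody rated item 2, so no prediction is possible
    print(predict(user=1, item=1))   # weighted by similar users' ratings
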

3,406 citations


"Deep Neural Networks for YouTube Re..." refers background in this paper

  • ...YouTube is the world’s largest platform for creating, sharing and discovering video content....


Proceedings Article•DOI•
10 Aug 2015
TL;DR: This paper proposes a hierarchical Bayesian model called collaborative deep learning (CDL), which jointly performs deep representation learning for the content information and collaborative filtering for the ratings (feedback) matrix.
Abstract: Collaborative filtering (CF) is a successful approach commonly used by many recommender systems. Conventional CF-based methods use the ratings given to items by users as the sole source of information for learning to make recommendations. However, the ratings are often very sparse in many applications, causing CF-based methods to degrade significantly in their recommendation performance. To address this sparsity problem, auxiliary information such as item content information may be utilized. Collaborative topic regression (CTR) is an appealing recent method that takes this approach and tightly couples the two components that learn from two different sources of information. Nevertheless, the latent representation learned by CTR may not be very effective when the auxiliary information is very sparse. To address this problem, we generalize recent advances in deep learning from i.i.d. input to non-i.i.d. (CF-based) input and propose in this paper a hierarchical Bayesian model called collaborative deep learning (CDL), which jointly performs deep representation learning for the content information and collaborative filtering for the ratings (feedback) matrix. Extensive experiments on three real-world datasets from different domains show that CDL can significantly advance the state of the art.
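
The tight coupling in CDL comes from a joint objective in which the item latent factors are pulled both toward the ratings (the CF term) and toward an encoding of the item content (the representation-learning term). The sketch below collapses the Bayesian model and its stacked denoising autoencoder into a single quadratic penalty, so it should be read as the shape of the objective rather than the method itself; all names and sizes are assumptions.

    import numpy as np

    rng = np.random.default_rng(0)
    U = rng.normal(size=(4, 3))                 # user latent factors
    V = rng.normal(size=(5, 3))                 # item latent factors
    R = rng.integers(0, 2, size=(4, 5))         # implicit feedback matrix
    content_repr = rng.normal(size=(5, 3))      # e.g. output of an item-content encoder
    lam_v = 1.0                                  # strength of the content coupling

    def joint_loss(U, V, R, content_repr):
        rating_term = ((R - U @ V.T) ** 2).sum()                 # collaborative filtering fit
        content_term = lam_v * ((V - content_repr) ** 2).sum()   # tie V to the content encoding
        return rating_term + content_term

    print(joint_loss(U, V, R, content_repr))
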

1,546 citations

Proceedings Article•DOI•
26 Sep 2010
TL;DR: The video recommendation system in use at YouTube, the world's most popular online video community, is discussed, with details on the experimentation and evaluation framework used to test and tune new algorithms.
Abstract: We discuss the video recommendation system in use at YouTube, the world's most popular online video community. The system recommends personalized sets of videos to users based on their activity on the site. We discuss some of the unique challenges that the system faces and how we address them. In addition, we provide details on the experimentation and evaluation framework used to test and tune new algorithms. We also present some of the findings from these experiments.

1,069 citations


"Deep Neural Networks for YouTube Re..." refers methods in this paper

  • ...Furthermore, this design enables blending candidates generated by other sources, such as those described in an earlier work [3]....
