Proceedings ArticleDOI

Deep User Match Network for Click-Through Rate Prediction

11 Jul 2021-pp 1890-1894
TL;DR: The authors propose a novel Deep User Match Network (DUMN) that measures user-to-user relevance for CTR prediction by matching the target user with users who have interacted with the candidate item and modeling their similarities in the user representation space.
Abstract: Click-through rate (CTR) prediction is a crucial task in many applications (e.g. recommender systems). Recently, deep learning-based models have been proposed and successfully applied to CTR prediction by focusing on feature interaction or user interest based on the item-to-item relevance between user behaviors and the candidate item. However, these existing models neglect the user-to-user relevance between the target user and those who like the candidate item, which can reflect the preference of the target user. To this end, in this paper, we propose a novel Deep User Match Network (DUMN) which measures the user-to-user relevance for CTR prediction. Specifically, in DUMN, we design a User Representation Layer to learn a unified user representation which contains user latent interest based on user behaviors. Then, a User Match Layer is designed to measure the user-to-user relevance by matching the target user with those who have interacted with the candidate item and modeling their similarities in the user representation space. Extensive experimental results on three public real-world datasets validate the effectiveness of DUMN compared with state-of-the-art methods.
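The core matching idea can be sketched in a few lines: represent each user as a vector, then score the target user against the users who interacted with the candidate item. This is a minimal NumPy illustration assuming cosine similarity and mean pooling; the paper's actual representation and match layers are learned end to end.

```python
import numpy as np

def cosine(a, b):
    # cosine similarity between two user representation vectors
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def user_match_score(target_user, interacted_users):
    """User-to-user relevance: average similarity between the target user
    and the users who previously interacted with the candidate item."""
    sims = [cosine(target_user, u) for u in interacted_users]
    return sum(sims) / len(sims)

rng = np.random.default_rng(0)
target = rng.normal(size=8)
others = [rng.normal(size=8) for _ in range(5)]
score = user_match_score(target, others)
```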
Citations
Proceedings ArticleDOI
06 Jul 2022
TL;DR: A transformer-based multi-representational item network is proposed, consisting of a multi-CLS representation submodule and a contextualized global item representation submodule, and time information is decoupled from item behavior to avoid overwhelming the behavior signal.
Abstract: Click-through rate (CTR) prediction is essential in the modelling of a recommender system. Previous studies mainly focus on user behavior modelling, while few of them consider candidate item representations. This makes the models strongly dependent on user representations and less effective when user behavior is sparse. Furthermore, most existing works regard the candidate item as one fixed embedding and ignore the multi-representational characteristics of the item. To handle the above issues, we propose a Deep multi-Representational Item NetworK (DRINK) for CTR prediction. Specifically, to tackle the sparse user behavior problem, we construct a sequence of interacting users and timestamps to represent the candidate item; to dynamically capture the characteristics of the item, we propose a transformer-based multi-representational item network consisting of a multi-CLS representation submodule and a contextualized global item representation submodule. In addition, we propose to decouple the time information and item behavior to avoid information overwhelming. Outputs of the above components are concatenated and fed into an MLP layer to fit the CTR. We conduct extensive experiments on real-world Amazon datasets and the results demonstrate the effectiveness of the proposed model.
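The decoupling idea can be illustrated with a toy sketch: the candidate item is described by the users who interacted with it, while time information flows through a separate stream and is merged only at the end. Mean pooling here is a deliberate simplification standing in for the paper's transformer submodules; the function name is hypothetical.

```python
import numpy as np

def item_representation(user_embs, time_embs):
    """Toy decoupled item representation: behavior (interacting-user)
    and time streams are pooled separately, then merged, so timestamps
    do not overwhelm the behavior signal."""
    behavior = user_embs.mean(axis=0)   # item-behavior stream
    time = time_embs.mean(axis=0)       # time stream, kept separate
    return np.concatenate([behavior, time])
```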

8 citations

Journal ArticleDOI
TL;DR: Adaptive mixture of experts (AdaMoE), a new framework that alleviates the concept drift problem by adaptive filtering in the data stream of CTR prediction, is proposed and significantly outperforms all incremental learning frameworks considered.
Abstract: Click-through rate (CTR) prediction is a crucial task in web search, recommender systems, and online advertisement displaying. In practical applications, CTR models often serve high-speed user-generated data streams whose underlying distribution changes rapidly over time. The concept drift problem inevitably exists in such streaming data and can lead to performance degradation due to the timeliness issue. To ensure model freshness, incremental learning has been widely adopted in real-world production systems. However, it is hard for incremental updates to balance the adaptability of CTR models to capture fast-changing trends against the generalization ability to retain common knowledge. In this paper, we propose adaptive mixture of experts (AdaMoE), a new framework to alleviate the concept drift problem by adaptive filtering in the data stream of CTR prediction. Extensive experiments on an offline industrial dataset and online A/B tests show that our AdaMoE significantly outperforms all incremental learning frameworks considered.
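As a generic illustration of the mixture-of-experts machinery such a framework builds on (not the paper's specific filtering mechanism, which is an assumption here), a softmax gate can assign input-dependent weights to each expert's prediction:

```python
import numpy as np

def moe_predict(x, experts, gate_w):
    """Mixture of experts: a softmax gate produces one weight per
    expert for this input, and the final prediction is the weighted
    sum of the experts' predictions."""
    logits = gate_w @ x                  # one gating logit per expert
    g = np.exp(logits - logits.max())
    g = g / g.sum()
    return float(sum(gi * e(x) for gi, e in zip(g, experts)))

x = np.array([1.0, -0.5])
experts = [lambda v: 0.0, lambda v: 1.0]
gate_w = np.zeros((2, 2))                # uniform gate for the demo
mix = moe_predict(x, experts, gate_w)    # equal mix of 0.0 and 1.0
```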

6 citations

Journal ArticleDOI
TL;DR: In this article, a drift-aware incremental learning framework based on ensemble learning is proposed to address catastrophic forgetting in CTR prediction on industrial data streams, which arises when the model simply adapts to the new data distribution all the time.
Abstract: Click-through rate (CTR) prediction is of great importance in recommendation systems and online advertising platforms. When served in industrial scenarios, the user-generated data observed by the CTR model typically arrives as a stream. Streaming data has the characteristic that the underlying distribution drifts over time and may recur. This can lead to catastrophic forgetting if the model simply adapts to the new data distribution all the time. It is also inefficient to relearn distributions that have already occurred. Due to memory constraints and the diversity of data distributions in large-scale industrial applications, conventional strategies against catastrophic forgetting, such as replay, parameter isolation, and knowledge distillation, are difficult to deploy. In this work, we design a novel drift-aware incremental learning framework based on ensemble learning to address catastrophic forgetting in CTR prediction. With explicit error-based drift detection on streaming data, the framework further strengthens well-adapted ensembles and freezes ensembles that do not match the input distribution, avoiding catastrophic interference. Both offline experiments and an online A/B test show that our method outperforms all baselines considered.
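A toy stand-in for the explicit error-based drift detection the framework relies on: compare the model's mean error over a recent window with its long-run mean and flag drift when the gap exceeds a threshold. The window size and threshold are illustrative assumptions, not values from the paper.

```python
def drift_detected(errors, window=50, threshold=0.1):
    """Flag concept drift when the mean error over the most recent
    window exceeds the long-run mean error by more than threshold."""
    if len(errors) < 2 * window:
        return False                     # not enough history yet
    recent = sum(errors[-window:]) / window
    past = sum(errors[:-window]) / (len(errors) - window)
    return recent - past > threshold

stable = [0.20] * 200                    # error stays flat: no drift
shifted = [0.20] * 150 + [0.45] * 50     # error jumps: drift
```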

1 citation

Book ChapterDOI
TL;DR: The authors propose a Deep User Multi-Interest Network (DUMIN), which designs a Self-Interest Extraction Network (SIEN) and a User-User Interest Extraction Network (UIEN) to capture users' different interests.
Abstract: Click-through rate (CTR) prediction is widely used in recommendation systems. Accurately modeling user interest is the key to improving the performance of the CTR prediction task. Existing methods model user interest from a single perspective to reflect user preferences, ignoring users' different interests in different aspects and thus limiting the expressive ability of user interest. In this paper, we propose a novel Deep User Multi-Interest Network (DUMIN), which designs a Self-Interest Extraction Network (SIEN) and a User-User Interest Extraction Network (UIEN) to capture users' different interests. First, SIEN uses an attention mechanism and a sequential network to focus on different parts of self-interest, while an auxiliary loss network brings extra supervision for model training. Next, UIEN adopts a multi-headed self-attention mechanism to learn a unified interest representation for each user who interacted with the candidate item. An attention mechanism is then introduced to adaptively aggregate these interest representations into a user-user interest, which reflects the collaborative filtering information among users. Extensive experimental results on public real-world datasets show that the proposed DUMIN outperforms various state-of-the-art methods.
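The adaptive aggregation step described above is, at its core, softmax attention over per-user interest representations. A minimal NumPy sketch, assuming the query is something like the candidate item embedding (the paper's learned scoring function is more elaborate):

```python
import numpy as np

def attention_aggregate(query, reps):
    """Score each user's interest representation against a query and
    return the softmax-weighted average of the representations."""
    scores = reps @ query                # one relevance score per user
    w = np.exp(scores - scores.max())    # numerically stable softmax
    w = w / w.sum()
    return w @ reps                      # adaptively aggregated interest

reps = np.array([[1.0, 0.0], [0.0, 1.0]])
query = np.array([10.0, 0.0])            # strongly matches the first rep
agg = attention_aggregate(query, reps)
```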

1 citation

Journal ArticleDOI
TL;DR: A unified multi-type user behavior sequence modeling framework named MBIN (Multi-feedback Behavior-based Interest modeling Network) is proposed to cope with uncertainties in noisy data, and an interest fusion layer is introduced to effectively model and fuse various types of users' interest representations for personalized interest fusion.
Abstract: With the rapid development of the Internet, recommendation systems are becoming more and more important in people's lives. Click-through rate prediction is a crucial task in a recommendation system, which directly determines the effect of the recommendation system. Recently, researchers have found that considering the user behavior sequence can greatly improve the accuracy of a click-through rate prediction model. However, existing prediction models usually use only the user click behavior sequence as input, which makes it difficult for the model to obtain a comprehensive user interest representation. In this paper, a unified multi-type user behavior sequence modeling framework named MBIN (Multi-feedback Behavior-based Interest modeling Network) is proposed to cope with uncertainties in noisy data. The proposed adaptive model uses deep learning technology: it obtains user interest representations through multi-head attention, denoises them using a vector projection method, and fuses user interests using adaptive dropout technology. First, an interest denoising layer is proposed in MBIN, which can effectively mitigate the noise problem in user behavior sequences to obtain more accurate user interests. Second, an interest fusion layer is introduced to effectively model and fuse the various types of interest representations of users and achieve personalized interest fusion. Then, auxiliary losses based on behavior sequences are used to enhance the effect of behavior sequence modeling and improve the effectiveness of user interest characterization. Finally, we conduct extensive experiments on real-world, large-scale datasets to validate the effectiveness of our approach in CTR prediction tasks.
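Vector-projection denoising can be sketched as follows; this is a hypothetical reading of the idea (keep only the component of the interest vector aligned with an anchor direction, treating the orthogonal remainder as noise), with both function names and the choice of anchor being illustrative assumptions rather than the paper's exact formulation.

```python
import numpy as np

def project(u, v):
    # component of u along v; the orthogonal remainder is treated as noise
    return (u @ v) / (v @ v) * v

def denoise_interest(interest, anchor):
    """Projection-based denoising sketch: keep only the part of the
    interest vector aligned with an anchor direction (e.g. a target
    item embedding)."""
    return project(interest, anchor)

interest = np.array([3.0, 4.0])
anchor = np.array([1.0, 0.0])
denoised = denoise_interest(interest, anchor)
```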
References
Proceedings ArticleDOI
07 Dec 2015
TL;DR: In this paper, a Parametric Rectified Linear Unit (PReLU) was proposed to improve model fitting with nearly zero extra computational cost and little overfitting risk, which achieved a 4.94% top-5 test error on the ImageNet 2012 classification dataset.
Abstract: Rectified activation units (rectifiers) are essential for state-of-the-art neural networks. In this work, we study rectifier neural networks for image classification from two aspects. First, we propose a Parametric Rectified Linear Unit (PReLU) that generalizes the traditional rectified unit. PReLU improves model fitting with nearly zero extra computational cost and little overfitting risk. Second, we derive a robust initialization method that particularly considers the rectifier nonlinearities. This method enables us to train extremely deep rectified models directly from scratch and to investigate deeper or wider network architectures. Based on the learnable activation and advanced initialization, we achieve 4.94% top-5 test error on the ImageNet 2012 classification dataset. This is a 26% relative improvement over the ILSVRC 2014 winner (GoogLeNet, 6.66% [33]). To our knowledge, our result is the first to surpass the reported human-level performance (5.1%, [26]) on this dataset.
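The two contributions are compact enough to state directly: PReLU replaces the fixed negative slope of a rectifier with a learnable coefficient a, and the initialization sets the weight standard deviation to sqrt(2 / ((1 + a^2) * fan_in)), which reduces to the plain-ReLU case at a = 0. A NumPy sketch (here a is a fixed scalar; in training it is a learned per-channel parameter):

```python
import numpy as np

def prelu(x, a=0.25):
    # PReLU: identity for positive inputs, slope a for negative inputs
    return np.where(x > 0, x, a * x)

def he_init(fan_in, fan_out, a=0.0, seed=0):
    """Rectifier-aware initialization: std = sqrt(2 / ((1 + a^2) * fan_in));
    a = 0 recovers the standard He init for plain ReLU."""
    std = np.sqrt(2.0 / ((1.0 + a * a) * fan_in))
    return np.random.default_rng(seed).normal(0.0, std, size=(fan_in, fan_out))
```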

11,732 citations

Proceedings ArticleDOI
Yehuda Koren1
24 Aug 2008
TL;DR: The factor and neighborhood models can now be smoothly merged, thereby building a more accurate combined model and a new evaluation metric is suggested, which highlights the differences among methods, based on their performance at a top-K recommendation task.
Abstract: Recommender systems provide users with personalized suggestions for products or services. These systems often rely on Collaborative Filtering (CF), where past transactions are analyzed in order to establish connections between users and products. The two most successful approaches to CF are latent factor models, which directly profile both users and products, and neighborhood models, which analyze similarities between products or users. In this work we introduce some innovations to both approaches. The factor and neighborhood models can now be smoothly merged, thereby building a more accurate combined model. Further accuracy improvements are achieved by extending the models to exploit both explicit and implicit feedback by the users. The methods are tested on the Netflix data. Results are better than those previously published on that dataset. In addition, we suggest a new evaluation metric, which highlights the differences among methods, based on their performance at a top-K recommendation task.
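One ingredient of the combined model, the SVD++-style latent factor prediction that folds implicit feedback into the user factor, can be sketched as follows (a simplified illustration, not the full merged factor-plus-neighborhood model):

```python
import numpy as np

def svdpp_predict(mu, b_u, b_i, p_u, q_i, y_N):
    """SVD++-style rating prediction: the rows of y_N are implicit-
    feedback factors for the items the user interacted with; their
    normalized sum augments the explicit user factor p_u."""
    implicit = y_N.sum(axis=0) / np.sqrt(len(y_N))
    return mu + b_u + b_i + float(q_i @ (p_u + implicit))
```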

3,975 citations

Proceedings ArticleDOI
13 Dec 2010
TL;DR: Factorization Machines (FM) are introduced which are a new model class that combines the advantages of Support Vector Machines (SVM) with factorization models and can mimic these models just by specifying the input data (i.e. the feature vectors).
Abstract: In this paper, we introduce Factorization Machines (FM), which are a new model class that combines the advantages of Support Vector Machines (SVM) with factorization models. Like SVMs, FMs are a general predictor working with any real valued feature vector. In contrast to SVMs, FMs model all interactions between variables using factorized parameters. Thus they are able to estimate interactions even in problems with huge sparsity (like recommender systems) where SVMs fail. We show that the model equation of FMs can be calculated in linear time and thus FMs can be optimized directly. So unlike nonlinear SVMs, a transformation in the dual form is not necessary and the model parameters can be estimated directly without the need for any support vector in the solution. We show the relationship to SVMs and the advantages of FMs for parameter estimation in sparse settings. On the other hand, there are many different factorization models like matrix factorization, parallel factor analysis or specialized models like SVD++, PITF or FPMC. The drawback of these models is that they are not applicable for general prediction tasks but work only with special input data. Furthermore their model equations and optimization algorithms are derived individually for each task. We show that FMs can mimic these models just by specifying the input data (i.e. the feature vectors). This makes FMs easily applicable even for users without expert knowledge in factorization models.
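The linear-time claim rests on a standard algebraic identity: the pairwise interaction term sum over i&lt;j of &lt;v_i, v_j&gt; x_i x_j equals 0.5 * sum over factors f of [(sum_i V[i,f] x_i)^2 - sum_i (V[i,f] x_i)^2], which costs O(k*n) instead of O(k*n^2). A dense NumPy sketch of the degree-2 model equation:

```python
import numpy as np

def fm_predict(x, w0, w, V):
    """Degree-2 Factorization Machine evaluated in O(k*n):
    y = w0 + <w, x> + 0.5 * sum_f [(V.T x)_f^2 - ((V**2).T x**2)_f]."""
    s = V.T @ x                  # (k,) per-factor weighted sums
    s2 = (V ** 2).T @ (x ** 2)   # (k,) per-factor sums of squares
    return w0 + float(w @ x) + 0.5 * float(np.sum(s * s - s2))

rng = np.random.default_rng(1)
n, k = 5, 3
x, w, V = rng.normal(size=n), rng.normal(size=n), rng.normal(size=(n, k))
y = fm_predict(x, 0.3, w, V)
```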

2,460 citations

Proceedings ArticleDOI
15 Sep 2016
TL;DR: Wide & Deep learning is presented---jointly trained wide linear models and deep neural networks---to combine the benefits of memorization and generalization for recommender systems and is open-sourced in TensorFlow.
Abstract: Generalized linear models with nonlinear feature transformations are widely used for large-scale regression and classification problems with sparse inputs. Memorization of feature interactions through a wide set of cross-product feature transformations is effective and interpretable, while generalization requires more feature engineering effort. With less feature engineering, deep neural networks can generalize better to unseen feature combinations through low-dimensional dense embeddings learned for the sparse features. However, deep neural networks with embeddings can over-generalize and recommend less relevant items when the user-item interactions are sparse and high-rank. In this paper, we present Wide & Deep learning---jointly trained wide linear models and deep neural networks---to combine the benefits of memorization and generalization for recommender systems. We productionized and evaluated the system on Google Play, a commercial mobile app store with over one billion active users and over one million apps. Online experiment results show that Wide & Deep significantly increased app acquisitions compared with wide-only and deep-only models. We have also open-sourced our implementation in TensorFlow.
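The joint-training idea reduces to a simple combination rule: the wide (linear, cross-product) logit and the deep (final hidden layer) logit are summed before a single sigmoid, so both parts receive gradients from the same loss. A minimal sketch, assuming the deep network's final activations are given:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def wide_deep_prob(x_wide, w_wide, a_deep, w_deep, b):
    """Wide & Deep joint prediction: one sigmoid over the sum of the
    wide linear logit and the deep final-layer logit, trained jointly."""
    return float(sigmoid(x_wide @ w_wide + a_deep @ w_deep + b))

# with all weights zero, the combined logit is 0 and the probability 0.5
p = wide_deep_prob(np.zeros(3), np.zeros(3), np.zeros(2), np.zeros(2), 0.0)
```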

2,454 citations

Proceedings ArticleDOI
19 Aug 2017
TL;DR: This paper shows that it is possible to derive an end-to-end learning model that emphasizes both low- and high-order feature interactions, and combines the power of factorization machines for recommendation and deep learning for feature learning in a new neural network architecture.
Abstract: Learning sophisticated feature interactions behind user behaviors is critical in maximizing CTR for recommender systems. Despite great progress, existing methods seem to have a strong bias towards low- or high-order interactions, or require expert feature engineering. In this paper, we show that it is possible to derive an end-to-end learning model that emphasizes both low- and high-order feature interactions. The proposed model, DeepFM, combines the power of factorization machines for recommendation and deep learning for feature learning in a new neural network architecture. Compared to the latest Wide & Deep model from Google, DeepFM has a shared input to its "wide" and "deep" parts, with no need for feature engineering besides raw features. Comprehensive experiments are conducted to demonstrate the effectiveness and efficiency of DeepFM over the existing models for CTR prediction, on both benchmark data and commercial data.
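The shared-input design can be sketched as follows: one embedding table feeds both the FM part (first-order weights plus pairwise dot products of embeddings) and the deep part, with no manually engineered cross features. The `deep_fn` stand-in for the MLP is a hypothetical placeholder in this sketch.

```python
import numpy as np

def deepfm_logit(feat_ids, w, E, deep_fn):
    """DeepFM sketch: the FM part and the deep part read the SAME
    embedding table E for the active feature ids; their outputs are
    summed into one logit."""
    emb = E[feat_ids]                              # (m, k) shared embeddings
    first_order = float(w[feat_ids].sum())         # FM first-order term
    s = emb.sum(axis=0)                            # FM second-order term in O(m*k)
    second_order = 0.5 * float(s @ s - (emb * emb).sum())
    return first_order + second_order + deep_fn(emb.reshape(-1))

E = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
w = np.zeros(3)
# pairwise dot products: e0·e1 = 1, e0·e2 = 0, e1·e2 = 0
logit = deepfm_logit([0, 1, 2], w, E, lambda v: 0.0)
```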

1,695 citations