scispace - formally typeset
Search or ask a question
Author

Ruiming Tang

Bio: Ruiming Tang is an academic researcher from Huawei. The author has contributed to research in topics: Computer science & Recommender system. The author has an hindex of 19, co-authored 116 publications receiving 2294 citations. Previous affiliations of Ruiming Tang include The Chinese University of Hong Kong & National University of Singapore.


Papers
More filters
Proceedings ArticleDOI
19 Aug 2017
TL;DR: This paper shows that it is possible to derive an end-to-end learning model that emphasizes both low- and high-order feature interactions, and combines the power of factorization machines for recommendation and deep learning for feature learning in a new neural network architecture.
Abstract: Learning sophisticated feature interactions behind user behaviors is critical in maximizing CTR for recommender systems. Despite great progress, existing methods seem to have a strong bias towards low- or high-order interactions, or require expertise feature engineering. In this paper, we show that it is possible to derive an end-to-end learning model that emphasizes both low- and high-order feature interactions. The proposed model, DeepFM, combines the power of factorization machines for recommendation and deep learning for feature learning in a new neural network architecture. Compared to the latest Wide & Deep model from Google, DeepFM has a shared input to its "wide" and "deep" parts, with no need of feature engineering besides raw features. Comprehensive experiments are conducted to demonstrate the effectiveness and efficiency of DeepFM over the existing models for CTR prediction, on both benchmark data and commercial data.

1,695 citations

Posted Content
TL;DR: DeepFM as mentioned in this paper combines the power of factorization machines for recommendation and deep learning for feature learning in a new neural network architecture, which has a shared input to its "wide" and "deep" parts.
Abstract: Learning sophisticated feature interactions behind user behaviors is critical in maximizing CTR for recommender systems. Despite great progress, existing methods seem to have a strong bias towards low- or high-order interactions, or require expertise feature engineering. In this paper, we show that it is possible to derive an end-to-end learning model that emphasizes both low- and high-order feature interactions. The proposed model, DeepFM, combines the power of factorization machines for recommendation and deep learning for feature learning in a new neural network architecture. Compared to the latest Wide \& Deep model from Google, DeepFM has a shared input to its "wide" and "deep" parts, with no need of feature engineering besides raw features. Comprehensive experiments are conducted to demonstrate the effectiveness and efficiency of DeepFM over the existing models for CTR prediction, on both benchmark data and commercial data.

504 citations

Journal ArticleDOI
TL;DR: Zhang et al. as discussed by the authors proposed Product-based Neural Network (PIN), which adopts a feature extractor to explore feature interactions and generalizes the kernel product to a net-in-net architecture.
Abstract: User response prediction is a crucial component for personalized information retrieval and filtering scenarios, such as recommender system and web search. The data in user response prediction is mostly in a multi-field categorical format and transformed into sparse representations via one-hot encoding. Due to the sparsity problems in representation and optimization, most research focuses on feature engineering and shallow modeling. Recently, deep neural networks have attracted research attention on such a problem for their high capacity and end-to-end training scheme. In this article, we study user response prediction in the scenario of click prediction. We first analyze a coupled gradient issue in latent vector-based models and propose kernel product to learn field-aware feature interactions. Then, we discuss an insensitive gradient issue in DNN-based models and propose Product-based Neural Network, which adopts a feature extractor to explore feature interactions. Generalizing the kernel product to a net-in-net architecture, we further propose Product-network in Network (PIN), which can generalize previous models. Extensive experiments on four industrial datasets and one contest dataset demonstrate that our models consistently outperform eight baselines on both area under curve and log loss. Besides, PIN makes great click-through rate improvement (relatively 34.67%) in online A/B test.

147 citations

Proceedings ArticleDOI
Bin Liu1, Ruiming Tang1, Yingzhi Chen2, Jinkai Yu1, Huifeng Guo1, Yuzhou Zhang1 
13 May 2019
TL;DR: A novel Feature Generation by Convolutional Neural Network (FGCNN) model with two components: Feature Generation and Deep Classifier, which significantly outperforms nine state-of-the-art models on three large-scale datasets.
Abstract: Click-Through Rate prediction is an important task in recommender systems, which aims to estimate the probability of a user to click on a given item. Recently, many deep models have been proposed to learn low-order and high-order feature interactions from original features. However, since useful interactions are always sparse, it is difficult for DNN to learn them effectively under a large number of parameters. In real scenarios, artificial features are able to improve the performance of deep models (such as Wide & Deep Learning), but feature engineering is expensive and requires domain knowledge, making it impractical in different scenarios. Therefore, it is necessary to augment feature space automatically. In this paper, We propose a novel Feature Generation by Convolutional Neural Network (FGCNN) model with two components: Feature Generation and Deep Classifier. Feature Generation leverages the strength of CNN to generate local patterns and recombine them to generate new features. Deep Classifier adopts the structure of IPNN to learn interactions from the augmented feature space. Experimental results on three large-scale datasets show that FGCNN significantly outperforms nine state-of-the-art models. Moreover, when applying some state-of-the-art models as Deep Classifier, better performance is always achieved, showing the great compatibility of our FGCNN model. This work explores a novel direction for CTR predictions: it is quite useful to reduce the learning difficulties of DNN by automatically identifying important features.

127 citations

Journal ArticleDOI
17 Jul 2019
TL;DR: Wang et al. as discussed by the authors propose a tree-structured policy gradient recommendation (TPGR) framework, where a balanced hierarchical clustering tree is built over the items and picking an item is formulated as seeking a path from the root to a certain leaf of the tree.
Abstract: Reinforcement learning (RL) has recently been introduced to interactive recommender systems (IRS) because of its nature of learning from dynamic interactions and planning for long-run performance. As IRS is always with thousands of items to recommend (i.e., thousands of actions), most existing RL-based methods, however, fail to handle such a large discrete action space problem and thus become inefficient. The existing work that tries to deal with the large discrete action space problem by utilizing the deep deterministic policy gradient framework suffers from the inconsistency between the continuous action representation (the output of the actor network) and the real discrete action. To avoid such inconsistency and achieve high efficiency and recommendation effectiveness, in this paper, we propose a Tree-structured Policy Gradient Recommendation (TPGR) framework, where a balanced hierarchical clustering tree is built over the items and picking an item is formulated as seeking a path from the root to a certain leaf of the tree. Extensive experiments on carefully-designed environments based on two real-world datasets demonstrate that our model provides superior recommendation performance and significant efficiency improvement over state-of-the-art methods.

100 citations


Cited by
More filters
01 Jan 2002

9,314 citations

Proceedings ArticleDOI
Guorui Zhou1, Xiaoqiang Zhu1, Chenru Song1, Ying Fan1, Han Zhu1, Xiao Ma1, Yan Yanghui1, Junqi Jin1, Han Li1, Kun Gai1 
19 Jul 2018
TL;DR: A novel model: Deep Interest Network (DIN) is proposed which tackles this challenge by designing a local activation unit to adaptively learn the representation of user interests from historical behaviors with respect to a certain ad.
Abstract: Click-through rate prediction is an essential task in industrial applications, such as online advertising. Recently deep learning based models have been proposed, which follow a similar Embedding&MLP paradigm. In these methods large scale sparse input features are first mapped into low dimensional embedding vectors, and then transformed into fixed-length vectors in a group-wise manner, finally concatenated together to fed into a multilayer perceptron (MLP) to learn the nonlinear relations among features. In this way, user features are compressed into a fixed-length representation vector, in regardless of what candidate ads are. The use of fixed-length vector will be a bottleneck, which brings difficulty for Embedding&MLP methods to capture user's diverse interests effectively from rich historical behaviors. In this paper, we propose a novel model: Deep Interest Network (DIN) which tackles this challenge by designing a local activation unit to adaptively learn the representation of user interests from historical behaviors with respect to a certain ad. This representation vector varies over different ads, improving the expressive ability of model greatly. Besides, we develop two techniques: mini-batch aware regularization and data adaptive activation function which can help training industrial deep networks with hundreds of millions of parameters. Experiments on two public datasets as well as an Alibaba real production dataset with over 2 billion samples demonstrate the effectiveness of proposed approaches, which achieve superior performance compared with state-of-the-art methods. DIN now has been successfully deployed in the online display advertising system in Alibaba, serving the main traffic.

1,317 citations

Proceedings Article
29 Apr 2018
TL;DR: It is shown that the graph convolution of the GCN model is actually a special form of Laplacian smoothing, which is the key reason why GCNs work, but it also brings potential concerns of over-smoothing with many convolutional layers.
Abstract: Many interesting problems in machine learning are being revisited with new deep learning tools. For graph-based semi-supervised learning, a recent important development is graph convolutional networks (GCNs), which nicely integrate local vertex features and graph topology in the convolutional layers. Although the GCN model compares favorably with other state-of-the-art methods, its mechanisms are not clear and it still requires considerable amount of labeled data for validation and model selection. In this paper, we develop deeper insights into the GCN model and address its fundamental limits. First, we show that the graph convolution of the GCN model is actually a special form of Laplacian smoothing, which is the key reason why GCNs work, but it also brings potential concerns of over-smoothing with many convolutional layers. Second, to overcome the limits of the GCN model with shallow architectures, we propose both co-training and self-training approaches to train GCNs. Our approaches significantly improve GCNs in learning with very few labels, and exempt them from requiring additional labels for validation. Extensive experiments on benchmarks have verified our theory and proposals.

1,273 citations