Author

Shuokai Li

Bio: Shuokai Li is an academic researcher from the Chinese Academy of Sciences. The author has contributed to research in topics including computer science and artificial intelligence. The author has an h-index of 2 and has co-authored 4 publications receiving 69 citations.

Papers
Proceedings ArticleDOI
18 Jul 2019
TL;DR: Experimental results showed that Meta-Embedding can significantly improve both the cold-start and warm-up performance of six existing CTR prediction models, ranging from lightweight models such as Factorization Machines to complicated deep models such as PNN and DeepFM.
Abstract: Click-through rate (CTR) prediction has been one of the most central problems in computational advertising. Lately, embedding techniques that produce low-dimensional representations of ad IDs drastically improve CTR prediction accuracies. However, such learning techniques are data demanding and work poorly on new ads with little logging data, which is known as the cold-start problem. In this paper, we aim to improve CTR predictions during both the cold-start phase and the warm-up phase when a new ad is added to the candidate pool. We propose Meta-Embedding, a meta-learning-based approach that learns to generate desirable initial embeddings for new ad IDs. The proposed method trains an embedding generator for new ad IDs by making use of previously learned ads through gradient-based meta-learning. In other words, our method learns how to learn better embeddings. When a new ad comes, the trained generator initializes the embedding of its ID by feeding its contents and attributes. Next, the generated embedding can speed up the model fitting during the warm-up phase when a few labeled examples are available, compared to the existing initialization methods. Experimental results on three real-world datasets showed that Meta-Embedding can significantly improve both the cold-start and warm-up performances for six existing CTR prediction models, ranging from lightweight models such as Factorization Machines to complicated deep models such as PNN and DeepFM. All of the above apply to conversion rate (CVR) predictions as well.

111 citations
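The abstract above maps naturally onto a short two-phase meta-training objective. Below is a minimal PyTorch sketch of that idea, assuming a single new ad: `ctr_model` stands in for any base CTR model (FM, PNN, DeepFM, ...) and is assumed to take the ad-ID embedding plus the remaining features and return a click probability; the two-mini-batch split, `cold_lr`, and `alpha` are illustrative choices, not the paper's exact recipe.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EmbeddingGenerator(nn.Module):
    """Maps a new ad's content/attribute features to an initial ID embedding."""
    def __init__(self, feature_dim, emb_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(feature_dim, emb_dim), nn.Tanh())

    def forward(self, ad_features):
        return self.net(ad_features)

def meta_embedding_loss(generator, ctr_model, ad_features, batch_a, batch_b,
                        cold_lr=0.1, alpha=0.5):
    # Phase 1: score the generated embedding "cold" on one mini-batch.
    emb = generator(ad_features)
    x_a, y_a = batch_a
    loss_a = F.binary_cross_entropy(ctr_model(emb, x_a), y_a)
    # Phase 2: simulate one warm-up gradient step on the embedding alone,
    # then score the adapted embedding on a second mini-batch.
    (grad,) = torch.autograd.grad(loss_a, emb, create_graph=True)
    emb_warm = emb - cold_lr * grad
    x_b, y_b = batch_b
    loss_b = F.binary_cross_entropy(ctr_model(emb_warm, x_b), y_b)
    # Backpropagating this loss trains the generator to emit embeddings that
    # are both good immediately (cold) and fast to fine-tune (warm-up).
    return alpha * loss_a + (1 - alpha) * loss_b
```

The `create_graph=True` flag is what makes this meta-learning: the simulated warm-up step stays differentiable, so the generator is trained for fast adaptability rather than cold accuracy alone.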

Posted Content
Feiyang Pan, Shuokai Li, Xiang Ao, Pingzhong Tang, Qing He
TL;DR: Meta-Embedding as discussed by the authors is a meta-learning-based approach that learns to generate desirable initial embeddings for new ad IDs by making use of previously learned ads through gradient-based meta-learning.

24 citations

Proceedings ArticleDOI
20 Apr 2022
TL;DR: This paper proposes an entity debiasing framework (ENDEF) which generalizes fake news detection models to the future data by mitigating entity bias from a cause-effect perspective and demonstrates that the proposed framework can largely improve the performance of base fake news detectors, and online tests verify its superiority in practice.
Abstract: The wide dissemination of fake news is increasingly threatening both individuals and society. Fake news detection aims to train a model on past news and detect fake news in the future. Though great efforts have been made, existing fake news detection methods have overlooked the unintended entity bias in real-world data, which seriously harms models' ability to generalize to future data. For example, 97% of news pieces in 2010-2017 containing the entity 'Donald Trump' are real in our data, but the percentage falls to merely 33% in 2018. A model trained on the former set would therefore hardly generalize to the latter, as it tends to predict news pieces about 'Donald Trump' as real in order to lower its training loss. In this paper, we propose an entity debiasing framework (ENDEF) which generalizes fake news detection models to future data by mitigating entity bias from a cause-effect perspective. Based on the causal graph among entities, news contents, and news veracity, we separately model the contribution of each cause (entities and contents) during training. In the inference stage, we remove the direct effect of the entities to mitigate entity bias. Extensive offline experiments on English and Chinese datasets demonstrate that the proposed framework can largely improve the performance of base fake news detectors, and online tests verify its superiority in practice. To the best of our knowledge, this is the first work to explicitly improve the generalization ability of fake news detection models to future data. The code has been released at https://github.com/ICTMCG/ENDEF-SIGIR2022.

16 citations
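The cause-effect recipe in the abstract can be sketched in a few lines: model the entity contribution and the content contribution with separate heads during training, then drop the direct entity effect at inference. The encoder modules and names below are hypothetical stand-ins; the authors' released code at the link above is authoritative.

```python
import torch
import torch.nn as nn

class EntityDebiasedDetector(nn.Module):
    """Sketch: model both causes (entities, contents) during training, then
    remove the direct entity effect at inference to mitigate entity bias."""
    def __init__(self, content_encoder, entity_encoder, hidden_dim):
        super().__init__()
        self.content_encoder = content_encoder  # any text encoder -> hidden_dim
        self.entity_encoder = entity_encoder    # encodes entity tokens -> hidden_dim
        self.content_head = nn.Linear(hidden_dim, 1)
        self.entity_head = nn.Linear(hidden_dim, 1)

    def forward(self, content, entities, inference=False):
        content_logit = self.content_head(self.content_encoder(content))
        if inference:
            # inference stage: keep only the content cause
            return torch.sigmoid(content_logit)
        entity_logit = self.entity_head(self.entity_encoder(entities))
        # training stage: both causes contribute, so each branch gets fitted
        return torch.sigmoid(content_logit + entity_logit)
```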

Journal ArticleDOI
TL;DR: The authors designed a search space over augmentation policies by integrating several common augmentation operations, and adopted a population-based training method to search for the best augmentation schedule for text classification and machine translation tasks.

4 citations
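Only the TL;DR is available for this paper, so the following is merely a toy illustration of population-based training over augmentation policies; the policy format (a list of operations with application probabilities) and all names are invented for the sketch.

```python
import copy
import random

def pbt_search(population, train_step, evaluate, generations=10):
    """Toy population-based training: each member carries an augmentation
    policy; weak members copy a strong member's policy (exploit) and perturb
    it (explore)."""
    quarter = max(1, len(population) // 4)
    for _ in range(generations):
        for member in population:
            train_step(member)                  # train briefly with member["policy"]
            member["score"] = evaluate(member)  # e.g. validation accuracy
        population.sort(key=lambda m: m["score"], reverse=True)
        for loser in population[-quarter:]:
            winner = random.choice(population[:quarter])
            loser["policy"] = copy.deepcopy(winner["policy"])  # exploit
            for op in loser["policy"]:                         # explore
                op["prob"] = min(1.0, op["prob"] * random.uniform(0.8, 1.2))
    return population[0]
```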

Journal ArticleDOI
TL;DR: Li et al. as mentioned in this paper explored the intrinsic correlations of different types of knowledge by self-supervised learning, and proposed the model SSCR, which stands for Self-Supervised learning for Conversational Recommendation.
Abstract: Conversational recommender systems (CRS) aim to model user preferences through interactive conversations. Although some works exist, they still have two drawbacks: (1) they rely on large amounts of training data and suffer from the data sparsity problem; and (2) they do not fully leverage the different types of knowledge extracted from dialogues. To address these issues in CRS, we explore the intrinsic correlations of different types of knowledge by self-supervised learning, and propose the model SSCR, which stands for Self-Supervised learning for Conversational Recommendation. The main idea is to jointly consider both the semantic and structural knowledge via three self-supervision signals in both the recommendation and dialogue modules. First, we carefully design two auxiliary self-supervised objectives, a token-level task and a sentence-level task, to explore the semantic knowledge. Then, we extract the structural knowledge from external knowledge graphs based on user-mentioned entities. Finally, we model the inter-information between the semantic and structural knowledge with the advantages of contrastive learning. As existing similarity functions fail to achieve this goal, we propose a novel similarity function based on a negative log-likelihood loss. Comprehensive experimental results on two real-world CRS datasets (covering both English and Chinese, with about 10,000 dialogues) show the superiority of our proposed method. Concretely, in recommendation, SSCR achieves an improvement of about 5%∼15% over state-of-the-art baselines on hit rate, mean reciprocal rank, and normalized discounted cumulative gain. In dialogue generation, SSCR outperforms baselines on both automatic evaluations (distinct n-gram, BLEU, and perplexity) and human evaluations (fluency and informativeness).

4 citations
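The abstract does not spell out the proposed negative log-likelihood-based similarity function, so the sketch below substitutes a standard InfoNCE-style contrastive loss purely to illustrate aligning the semantic (dialogue) and structural (knowledge-graph) representations of the same dialogue.

```python
import torch
import torch.nn.functional as F

def contrastive_alignment_loss(semantic, structural, temperature=0.1):
    # semantic:   (batch, dim) dialogue-side representations
    # structural: (batch, dim) knowledge-graph-side representations, paired row-wise
    semantic = F.normalize(semantic, dim=-1)
    structural = F.normalize(structural, dim=-1)
    logits = semantic @ structural.t() / temperature  # pairwise similarities
    targets = torch.arange(semantic.size(0), device=semantic.device)
    return F.cross_entropy(logits, targets)  # positive pairs sit on the diagonal
```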


Cited by
Proceedings ArticleDOI
23 Aug 2020
TL;DR: This work proposes a novel semantic-enhanced tasks constructor and a co-adaptation meta-learner to address two questions: how to capture HIN-based semantics in the meta-learning setting, and how to learn general knowledge that can easily be adapted to multifaceted semantics.
Abstract: Cold-start recommendation has been a challenging problem due to sparse user-item interactions for new users or items. Existing efforts have alleviated the cold-start issue to some extent, most of which approach the problem at the data level. Earlier methods often incorporate auxiliary data as user or item features, while more recent methods leverage heterogeneous information networks (HIN) to capture richer semantics via higher-order graph structures. On the other hand, the recent meta-learning paradigm sheds light on addressing cold-start recommendation at the model level, given its ability to rapidly adapt to new tasks with scarce labeled data, or, in the context of cold-start recommendation, to new users and items with very few interactions. Thus, we are inspired to develop a novel meta-learning approach named MetaHIN to address cold-start recommendation on HINs, exploiting the power of meta-learning at the model level and of HINs at the data level simultaneously. The solution is non-trivial: how to capture HIN-based semantics in the meta-learning setting, and how to learn general knowledge that can easily be adapted to multifaceted semantics, remain open questions. In MetaHIN, we propose a novel semantic-enhanced tasks constructor and a co-adaptation meta-learner to address these two questions. Extensive experiments demonstrate that MetaHIN significantly outperforms the state of the art in various cold-start scenarios. (Code and dataset are available at https://github.com/rootlu/MetaHIN.)

166 citations
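MetaHIN builds on gradient-based meta-learning of the MAML family. The generic outer loop below (PyTorch >= 2.0 for `torch.func.functional_call`) shows only that shared skeleton; MetaHIN's semantic-enhanced task construction over meta-paths and its semantic- and task-wise co-adaptation are deliberately omitted, and all names are illustrative.

```python
import torch
import torch.nn.functional as F
from torch.func import functional_call

def maml_outer_loss(model, tasks, inner_lr=0.01):
    """tasks: list of ((x_support, y_support), (x_query, y_query)) pairs, one
    per cold-start user. Returns the meta-loss whose gradient updates the
    shared initialization."""
    names = [n for n, _ in model.named_parameters()]
    params = [p for _, p in model.named_parameters()]
    meta_loss = 0.0
    for (x_s, y_s), (x_q, y_q) in tasks:
        # inner loop: one adaptation step on the user's support interactions
        loss_s = F.mse_loss(model(x_s), y_s)
        grads = torch.autograd.grad(loss_s, params, create_graph=True)
        adapted = {n: p - inner_lr * g for n, p, g in zip(names, params, grads)}
        # outer loop: evaluate the adapted parameters on the query interactions
        meta_loss = meta_loss + F.mse_loss(functional_call(model, adapted, (x_q,)), y_q)
    return meta_loss / len(tasks)
```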

Proceedings ArticleDOI
25 Jul 2020
TL;DR: The key insight is that the satisfied recommendations triggered by an exploration recommendation can be viewed as an exploration bonus (delayed reward) for its contribution to improving the quality of the user profile.
Abstract: In this paper, we study collaborative filtering in an interactive setting, in which the recommender agents iterate between making recommendations and updating the user profile based on the interactive feedback. The most challenging problem in this scenario is how to suggest items when the user profile has not been well established, i.e., how to recommend for cold-start users or warm-start users with drifting tastes. Existing approaches either rely on an overly pessimistic linear exploration strategy or adopt meta-learning-based algorithms in a full-exploitation way. In this work, to quickly catch up with the user's interests, we propose to represent the exploration policy with a neural network and directly learn it from the feedback data. Specifically, the exploration policy is encoded in the weights of multi-channel stacked self-attention neural networks and trained with efficient Q-learning by maximizing users' overall satisfaction in the recommender system. The key insight is that the satisfied recommendations triggered by an exploration recommendation can be viewed as an exploration bonus (delayed reward) for its contribution to improving the quality of the user profile. Therefore, the proposed exploration policy, which balances learning the user profile against making accurate recommendations, can be directly optimized by maximizing users' long-term satisfaction with reinforcement learning. Extensive experiments and analysis conducted on three benchmark collaborative filtering datasets have demonstrated the advantage of our method over state-of-the-art methods.

82 citations
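A bare-bones DQN-style temporal-difference update illustrates the Q-learning component described above. In the paper the state is encoded by multi-channel stacked self-attention networks over the interaction history; here `q_net` is any module mapping a profile state to per-item Q-values, and all names are illustrative.

```python
import torch
import torch.nn.functional as F

def q_learning_update(q_net, target_net, batch, optimizer, gamma=0.9):
    """One temporal-difference update. q_net maps a profile state to Q-values
    over candidate items; target_net is a slowly updated copy for stability."""
    state, action, reward, next_state = batch
    # Q-value of the item actually recommended in each logged transition
    q_sa = q_net(state).gather(1, action.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        # delayed reward: later satisfaction flows back through the bootstrap term
        target = reward + gamma * target_net(next_state).max(dim=1).values
    loss = F.smooth_l1_loss(q_sa, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The delayed-reward insight shows up in the bootstrap term: a satisfying later recommendation raises the target, crediting the exploratory action that improved the profile.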

Proceedings ArticleDOI
25 Jul 2020
TL;DR: This work proposes a new training method, aiming to abandon the historical data during retraining through learning to transfer the past training experience, and designs a neural network-based transfer component, which transforms the old model to a new model that is tailored for future recommendations.
Abstract: Practical recommender systems need to be periodically retrained to refresh the model with new interaction data. To pursue high model fidelity, it is usually desirable to retrain the model on both historical and new data, since this accounts for both long-term and short-term user preference. However, a full model retraining can be very time-consuming and memory-costly, especially when the scale of historical data is large. In this work, we study the model retraining mechanism for recommender systems, a topic of high practical value that has been relatively little explored in the research community. Our first belief is that retraining the model on historical data is unnecessary, since the model has been trained on it before. Nevertheless, normal training on new data only may easily cause overfitting and forgetting issues, since the new data is of a smaller scale and contains less information about long-term user preference. To address this dilemma, we propose a new training method that aims to abandon the historical data during retraining by learning to transfer the past training experience. Specifically, we design a neural network-based transfer component, which transforms the old model into a new model that is tailored for future recommendations. To learn the transfer component well, we optimize the "future performance" -- i.e., the recommendation accuracy evaluated in the next time period. Our Sequential Meta-Learning (SML) method offers a general training paradigm that is applicable to any differentiable model. We demonstrate SML on matrix factorization and conduct experiments on two real-world datasets. Empirical results show that SML not only achieves a significant speed-up, but also outperforms full model retraining in recommendation accuracy, validating the effectiveness of our proposals. We release our code at: https://github.com/zyang1580/SML.

71 citations
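The core of the abstract is the transfer component: rather than retraining on history, a small learned network maps the previous model's parameters, together with parameters freshly fitted on the new period, to the refreshed model, and is itself trained on next-period accuracy. The sketch below shows this for embedding tables; the mixing architecture is a placeholder, and the authors' released code at the link above is authoritative.

```python
import torch
import torch.nn as nn

class TransferComponent(nn.Module):
    """Combine last period's parameters with parameters freshly fitted on new
    data, instead of retraining on the full history. Trained by optimizing
    next-period ("future") recommendation accuracy."""
    def __init__(self, emb_dim):
        super().__init__()
        self.mix = nn.Sequential(nn.Linear(2 * emb_dim, emb_dim), nn.Tanh())

    def forward(self, old_emb, new_emb):
        # old_emb: embeddings from the previous model (long-term preference)
        # new_emb: embeddings fitted on the new period only (short-term signal)
        return self.mix(torch.cat([old_emb, new_emb], dim=-1))
```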

Proceedings ArticleDOI
11 Jul 2021
TL;DR: Wang et al. as mentioned in this paper proposed Meta Scaling and Shifting Networks to generate a scaling and a shifting function for each item; the scaling function can directly transform cold item ID embeddings into the warm feature space, which fits the model better.
Abstract: Recently, embedding techniques have achieved impressive success in recommender systems. However, embedding techniques are data-demanding and suffer from the cold-start problem. In particular, for a cold-start item that has only limited interactions, it is hard to train a reasonable item ID embedding (called a cold ID embedding), which is a major challenge for embedding techniques. A cold item ID embedding has two main problems: (1) a gap exists between the cold ID embedding and the deep model, and (2) the cold ID embedding is seriously affected by noisy interactions. However, most existing methods do not consider both issues in the cold-start problem simultaneously. To address these problems, we adopt two key ideas: (1) speed up model fitting for the cold item ID embedding (fast adaptation), and (2) alleviate the influence of noise. Along this line, we propose Meta Scaling and Shifting Networks, which generate a scaling and a shifting function for each item, respectively. The scaling function can directly transform cold item ID embeddings into the warm feature space, which fits the model better, and the shifting function is able to produce stable embeddings from the noisy embeddings. With these two meta networks, we propose the Meta Warm Up Framework (MWUF), which learns to warm up cold ID embeddings. Moreover, MWUF is a general framework that can be applied on top of various existing deep recommendation models. The proposed model is evaluated on three popular benchmarks, including both recommendation and advertising datasets. The evaluation results demonstrate its superior performance and compatibility.

52 citations
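The two meta networks from the abstract can be sketched directly: a scaling function conditioned on item content features pulls a cold ID embedding toward the warm feature space, and a shifting function conditioned on, for instance, features of the item's interacted users stabilizes it against noise. Inputs and dimensions here are assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class MetaScalingShifting(nn.Module):
    """Sketch of MWUF's two meta networks for warming up cold ID embeddings."""
    def __init__(self, item_feat_dim, user_feat_dim, emb_dim):
        super().__init__()
        self.scale_net = nn.Sequential(nn.Linear(item_feat_dim, emb_dim), nn.Sigmoid())
        self.shift_net = nn.Linear(user_feat_dim, emb_dim)

    def forward(self, cold_id_emb, item_feats, user_feats):
        scale = self.scale_net(item_feats)  # element-wise scaling function
        shift = self.shift_net(user_feats)  # additive shifting function
        return scale * cold_id_emb + shift  # warmed-up embedding
```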

Proceedings ArticleDOI
TL;DR: In this paper, the exploration policy is encoded in the weights of multi-channel stacked self-attention neural networks and trained with efficient Q-learning by maximizing users' overall satisfaction in the recommender systems.

46 citations