Long Xia
Researcher at York University
Publications - 31
Citations - 1690
Long Xia is an academic researcher at York University. His research focuses on reinforcement learning and recommender systems. He has an h-index of 17 and has co-authored 30 publications receiving 905 citations. Previous affiliations of Long Xia include the Chinese Academy of Sciences.
Papers
Proceedings Article
Recommendations with Negative Feedback via Pairwise Deep Reinforcement Learning
TL;DR: Zhang et al. model the sequential interactions between users and a recommender system as a Markov Decision Process (MDP) and leverage reinforcement learning (RL) to automatically learn optimal recommendation strategies: items are recommended in a trial-and-error fashion, and reinforcement signals are derived from users' feedback on those items.
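The MDP framing above can be illustrated with a minimal tabular Q-learning sketch. This is a toy illustration of the general idea (states, candidate items as actions, positive/negative feedback as reward), not the paper's actual pairwise deep RL model; the simulated user and all parameters are assumptions for demonstration.

```python
import random

# Toy MDP for recommendation: the state is the last recommended item,
# actions are candidate items, reward is +1 (positive feedback) or
# -1 (negative feedback). Tabular Q-learning stands in for the deep
# model; the simulated user below is purely illustrative.

N_ITEMS = 5
random.seed(0)

def user_feedback(last_item, recommended):
    # Hypothetical user: likes the item one index above the last one shown.
    return 1.0 if recommended == (last_item + 1) % N_ITEMS else -1.0

Q = {(s, a): 0.0 for s in range(N_ITEMS) for a in range(N_ITEMS)}
alpha, gamma, eps = 0.1, 0.9, 0.2

state = 0
for step in range(5000):
    # epsilon-greedy trial-and-error over candidate items
    if random.random() < eps:
        action = random.randrange(N_ITEMS)
    else:
        action = max(range(N_ITEMS), key=lambda a: Q[(state, a)])
    reward = user_feedback(state, action)
    next_state = action
    best_next = max(Q[(next_state, a)] for a in range(N_ITEMS))
    # standard Q-learning update from the user's reinforcement signal
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
    state = next_state

# Greedy policy after training: recommend the item the user prefers next.
policy = {s: max(range(N_ITEMS), key=lambda a: Q[(s, a)]) for s in range(N_ITEMS)}
```

Under this toy user model, the learned greedy policy recovers the preferred next item for every state.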
Proceedings Article
Deep reinforcement learning for page-wise recommendations
TL;DR: A principled approach is proposed to jointly generate a set of complementary items and the corresponding strategy for displaying them on a 2-D page. The resulting framework, DeepPage, is a novel page-wise recommendation system based on deep reinforcement learning that optimizes a page of items and their display using real-time feedback from users.
Proceedings Article
Reinforcement Learning to Optimize Long-term User Engagement in Recommender Systems
TL;DR: Extensive experiments on synthetic data and a large-scale real-world dataset show that FeedRec effectively optimizes long-term user engagement and outperforms state-of-the-art methods.
Proceedings Article
Pseudo Dyna-Q: A Reinforcement Learning Framework for Interactive Recommendation
TL;DR: The proposed PDQ avoids both the unstable convergence and the high computational cost of existing approaches while providing unlimited interactions without involving real customers. A proven upper bound on the empirical error of the reward function guarantees that the learned offline policy has lower bias and variance.
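The general Dyna-Q scheme that Pseudo Dyna-Q builds on can be sketched in a few lines: a learned model of the environment supplies simulated transitions ("planning"), so the policy improves without additional real interactions. This is a minimal tabular illustration of that scheme under an assumed toy environment, not the paper's PDQ framework or its user simulator.

```python
import random

# Minimal tabular Dyna-Q sketch: real experience updates Q directly AND
# trains a world model; extra "planning" updates then replay simulated
# transitions from that model, reducing the real interactions needed.
# The toy chain environment and all parameters are illustrative.

random.seed(1)
N_STATES, N_ACTIONS = 4, 2

def env_step(s, a):
    # Toy chain: action 1 moves right, action 0 moves left; reward at the end.
    ns = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
    return ns, (1.0 if ns == N_STATES - 1 else 0.0)

Q = {(s, a): 0.0 for s in range(N_STATES) for a in range(N_ACTIONS)}
model = {}  # (state, action) -> (next_state, reward): the learned world model
alpha, gamma, eps, planning_steps = 0.1, 0.9, 0.5, 10

s = 0
for _ in range(3000):
    a = random.randrange(N_ACTIONS) if random.random() < eps else \
        max(range(N_ACTIONS), key=lambda x: Q[(s, x)])
    ns, r = env_step(s, a)
    # direct RL update from real experience
    Q[(s, a)] += alpha * (r + gamma * max(Q[(ns, x)] for x in range(N_ACTIONS)) - Q[(s, a)])
    model[(s, a)] = (ns, r)
    # planning: replay simulated transitions drawn from the learned model
    for _ in range(planning_steps):
        ps, pa = random.choice(list(model))
        pns, pr = model[(ps, pa)]
        Q[(ps, pa)] += alpha * (pr + gamma * max(Q[(pns, x)] for x in range(N_ACTIONS)) - Q[(ps, pa)])
    s = 0 if ns == N_STATES - 1 else ns  # restart an episode at the goal

policy = {st: max(range(N_ACTIONS), key=lambda x: Q[(st, x)]) for st in range(N_STATES)}
```

The planning loop is what lets value estimates propagate from a handful of real successes, which is the motivation for learning an offline environment model in interactive recommendation.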