
Long Xia

Researcher at York University

Publications: 31
Citations: 1690

Long Xia is an academic researcher at York University whose work focuses on reinforcement learning and recommender systems. The author has an h-index of 17 and has co-authored 30 publications receiving 905 citations. Previous affiliations of Long Xia include the Chinese Academy of Sciences.

Papers
Proceedings ArticleDOI

Recommendations with Negative Feedback via Pairwise Deep Reinforcement Learning

TL;DR: The authors model the sequential interactions between users and a recommender system as a Markov Decision Process (MDP) and leverage reinforcement learning (RL) to automatically learn the optimal recommendation strategy by recommending items in a trial-and-error fashion and receiving reinforcement from users' feedback on those items.
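
For intuition only, here is a minimal, self-contained sketch of the MDP framing described above; it is not the paper's implementation. The simulated user, item catalogue, and tabular epsilon-greedy Q-learning loop are all assumptions made for illustration, whereas the paper learns deep Q-networks with pairwise updates over positive and negative feedback.

```python
# Toy sketch of recommendation as an MDP: state = the user's last k clicked
# items, action = next item to recommend, reward = +1 for a click, -1 for a skip.
import random
from collections import defaultdict

N_ITEMS, K, EPISODES = 20, 3, 500
GAMMA, ALPHA, EPS = 0.9, 0.1, 0.1

def user_feedback(item):
    # Hypothetical simulated user who prefers even-numbered items.
    return 1.0 if item % 2 == 0 else -1.0

Q = defaultdict(float)  # Q[(state, action)] -> estimated value

def act(state):
    if random.random() < EPS:
        return random.randrange(N_ITEMS)
    return max(range(N_ITEMS), key=lambda a: Q[(state, a)])

for _ in range(EPISODES):
    state = tuple([-1] * K)             # no interaction history yet
    for _ in range(10):                 # one short browsing session
        action = act(state)
        reward = user_feedback(action)
        # Only positive feedback changes the user's interaction history.
        next_state = state[1:] + (action,) if reward > 0 else state
        best_next = max(Q[(next_state, a)] for a in range(N_ITEMS))
        Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
        state = next_state
```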
Proceedings ArticleDOI

Deep reinforcement learning for page-wise recommendations

TL;DR: The authors propose a principled approach to jointly generate a set of complementary items and the strategy for displaying them in a 2-D page, along with DeepPage, a novel page-wise recommendation framework based on deep reinforcement learning that can optimize a page of items and their display based on real-time feedback from users.
Proceedings ArticleDOI

Deep Reinforcement Learning for Page-wise Recommendations

TL;DR: The authors propose DeepPage, a novel page-wise recommendation framework based on deep reinforcement learning that can optimize a page of items and their display based on real-time feedback from users.
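
The following sketch illustrates the page-wise idea under stated assumptions; it is not the DeepPage implementation. A slot-specific linear "actor" (the per-slot matrices `W` are a made-up stand-in for the paper's deep actor network) maps the user state to a prototype embedding for each page slot, and the nearest candidate item fills that slot, so a whole 2-D page is generated jointly rather than item by item.

```python
# Minimal page-generation sketch: map a user state to a ROWS x COLS grid of items.
import numpy as np

rng = np.random.default_rng(0)
N_ITEMS, DIM, ROWS, COLS = 100, 8, 2, 5

item_emb = rng.normal(size=(N_ITEMS, DIM))          # candidate item embeddings
W = rng.normal(size=(ROWS * COLS, DIM, DIM)) * 0.1  # one linear "head" per page slot

def generate_page(user_state):
    """Return item ids laid out as a ROWS x COLS page for this user state."""
    page = []
    for slot in range(ROWS * COLS):
        proto = W[slot] @ user_state                 # slot-specific prototype embedding
        scores = item_emb @ proto                    # match candidates to the prototype
        scores[page] = -np.inf                       # no duplicate items on the page
        page.append(int(scores.argmax()))
    return np.array(page).reshape(ROWS, COLS)

user_state = rng.normal(size=DIM)                    # e.g., encoded from browsing history
print(generate_page(user_state))
```

In a full RL setup, the reward for the whole page (clicks, dwell time, purchases) would be used to update the actor, which is what lets the layout itself be optimized from real-time feedback.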
Proceedings ArticleDOI

Reinforcement Learning to Optimize Long-term User Engagement in Recommender Systems

TL;DR: Extensive experiments on synthetic data and a large-scale real-world dataset show that FeedRec effectively optimizes long-term user engagement and outperforms state-of-the-art methods.
Proceedings ArticleDOI

Pseudo Dyna-Q: A Reinforcement Learning Framework for Interactive Recommendation

TL;DR: The proposed PDQ not only avoids the convergence instability and high computation cost of existing approaches but also provides unlimited interactions without involving real customers; a proven upper bound on the empirical error of the reward function guarantees that the learned offline policy has lower bias and variance.
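
As a rough illustration of the Dyna-Q-style idea behind this line of work (assumptions only, not the paper's PDQ algorithm), the sketch below alternates direct Q-learning on logged transitions with planning steps that replay experience from a learned world model; a memorised transition/reward table stands in for the paper's learned user simulator, which is what allows policy improvement offline without querying real customers.

```python
# Simplified Dyna-Q loop: learn from logged data, fit a world model, plan with it.
import random
from collections import defaultdict

GAMMA, ALPHA, N_ACTIONS, PLAN_STEPS = 0.9, 0.1, 5, 20
Q = defaultdict(float)
model = {}  # world model: (state, action) -> (reward, next_state)

def q_update(s, a, r, s2):
    best_next = max(Q[(s2, b)] for b in range(N_ACTIONS))
    Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])

def learn(logged_transitions):
    for s, a, r, s2 in logged_transitions:
        q_update(s, a, r, s2)          # direct RL from real (logged) experience
        model[(s, a)] = (r, s2)        # fit the world model from the same data
        for _ in range(PLAN_STEPS):    # planning: replay simulated experience
            ps, pa = random.choice(list(model))
            pr, ps2 = model[(ps, pa)]
            q_update(ps, pa, pr, ps2)

# Hypothetical logged data: (state, action, reward, next_state) tuples.
learn([(0, 1, 1.0, 1), (1, 3, -1.0, 1), (1, 0, 1.0, 2)])
```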