Lantao Yu
Researcher at Stanford University
Publications - 41
Citations - 4493
Lantao Yu is an academic researcher at Stanford University. He has contributed to research on topics including reinforcement learning and generative models. He has an h-index of 15 and has co-authored 38 publications receiving 3,063 citations. Previous affiliations of Lantao Yu include Shanghai Jiao Tong University.
Papers
Proceedings Article
SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient
TL;DR: SeqGAN models the data generator as a stochastic policy in reinforcement learning (RL); the RL reward signal comes from the discriminator's judgment of a complete sequence and is passed back to the intermediate state-action steps using Monte Carlo search.
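A minimal NumPy sketch of the mechanism described above: the generator is a stochastic policy, the discriminator scores only complete sequences, and Monte Carlo rollouts credit intermediate actions. The vocabulary, the stand-in discriminator, and all hyperparameters are invented for illustration (the paper's generator is an RNN).

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, SEQ_LEN, N_ROLLOUTS = 4, 5, 16

# Toy generator: one logit vector per timestep, i.e. a stochastic policy
# over next tokens (just the mechanism, not the paper's RNN model).
logits = np.zeros((SEQ_LEN, VOCAB))

def policy(t):
    e = np.exp(logits[t] - logits[t].max())
    return e / e.sum()

def discriminator(seq):
    # Hypothetical stand-in discriminator: "real" sequences are mostly 0s.
    return float(np.mean(np.asarray(seq) == 0))

def rollout_value(prefix, t):
    # Monte Carlo search: complete the sequence N_ROLLOUTS times and average
    # the discriminator's score, crediting the intermediate action at step t.
    total = 0.0
    for _ in range(N_ROLLOUTS):
        seq = list(prefix)
        for u in range(t + 1, SEQ_LEN):
            seq.append(rng.choice(VOCAB, p=policy(u)))
        total += discriminator(seq)
    return total / N_ROLLOUTS

def reinforce_step(lr=0.5):
    # Sample one sequence and apply a policy-gradient (REINFORCE) update,
    # using rollout values as rewards for intermediate steps and the
    # discriminator score for the finished sequence.
    seq = []
    for t in range(SEQ_LEN):
        p = policy(t)
        tok = rng.choice(VOCAB, p=p)
        seq.append(tok)
        r = discriminator(seq) if t == SEQ_LEN - 1 else rollout_value(seq, t)
        grad = -p
        grad[tok] += 1.0              # d log pi(tok) / d logits[t]
        logits[t] += lr * r * grad
    return seq
```

In the full algorithm the discriminator is itself retrained on real vs. generated sequences between generator updates; here it is frozen to keep the sketch short.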
Proceedings ArticleDOI
IRGAN: A Minimax Game for Unifying Generative and Discriminative Information Retrieval Models
TL;DR: A unified framework that takes advantage of both schools of thinking in information retrieval modelling; the generative model learns to fit the relevance distribution over documents via signals from the discriminative model, achieving a better estimate for document ranking.
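A minimal sketch of the generator side of this minimax game: the generative retriever keeps a softmax relevance distribution over documents and is updated by policy gradient, using the discriminator's score as its reward. The document pool, the frozen discriminator scores, and the hyperparameters are invented for illustration (a real IRGAN discriminator is trained jointly).

```python
import numpy as np

rng = np.random.default_rng(0)
N_DOCS = 6

# Generative retriever: a softmax relevance distribution over candidate docs.
gen_logits = np.zeros(N_DOCS)
# Hypothetical frozen discriminator scores for one query; in IRGAN the
# discriminator is itself trained to separate true pairs from generated ones.
disc_scores = np.array([2.0, 1.5, 0.1, 0.0, -1.0, -2.0])

def gen_probs():
    e = np.exp(gen_logits - gen_logits.max())
    return e / e.sum()

def generator_step(lr=0.1, n_samples=32):
    # Sample documents from the current relevance distribution and apply a
    # policy-gradient update with the discriminator score as the reward.
    p = gen_probs()
    for d in rng.choice(N_DOCS, size=n_samples, p=p):
        grad = -p.copy()
        grad[d] += 1.0                       # d log p(d) / d logits
        gen_logits[:] += lr * disc_scores[d] * grad / n_samples

for _ in range(200):
    generator_step()
```

After training, the generator's distribution shifts probability mass toward the documents the discriminator scores as relevant, which is the "fit the relevance distribution via signals from the discriminative model" step in the summary above.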
Proceedings Article
MOPO: Model-based Offline Policy Optimization
Tianhe Yu, Garrett Thomas, Lantao Yu, Stefano Ermon, James Zou, Sergey Levine, Chelsea Finn, Tengyu Ma +7 more
TL;DR: Model-based offline policy optimization (MOPO) modifies existing model-based RL methods by artificially penalizing rewards with the uncertainty of the learned dynamics, and theoretically shows that the algorithm maximizes a lower bound of the policy's return under the true MDP.
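The core reward modification can be sketched in a few lines. Here ensemble disagreement stands in for the uncertainty estimator u(s, a); the array shapes and the penalty coefficient are illustrative assumptions, not the paper's exact choices.

```python
import numpy as np

def uncertainty(preds):
    # preds: (n_models, state_dim) next-state predictions from a learned
    # dynamics ensemble; disagreement from the ensemble mean serves as a
    # stand-in for the dynamics-error estimator u(s, a).
    mean = preds.mean(axis=0)
    return float(np.max(np.linalg.norm(preds - mean, axis=1)))

def penalized_reward(r_hat, preds, lam=1.0):
    # MOPO's key modification: subtract lam * u(s, a) from the model reward,
    # so that maximizing the penalized return lower-bounds the policy's
    # return under the true MDP.
    return r_hat - lam * uncertainty(preds)
```

Training then proceeds with any standard model-based RL method, simply substituting the penalized reward for the model's predicted reward, so the policy is steered away from state-action regions where the dynamics model is unreliable.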