Lantao Yu

Researcher at Stanford University

Publications -  41
Citations -  4493

Lantao Yu is an academic researcher from Stanford University. The author has contributed to research in topics including reinforcement learning and generative models. The author has an h-index of 15 and has co-authored 38 publications receiving 3063 citations. Previous affiliations of Lantao Yu include Shanghai Jiao Tong University.

Papers
Proceedings Article

SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient

TL;DR: SeqGAN models the data generator as a stochastic policy in reinforcement learning (RL); the RL reward signal comes from the discriminator judging a complete sequence and is passed back to the intermediate state-action steps using Monte Carlo search.
Posted Content

SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient

TL;DR: By modeling the data generator as a stochastic policy in reinforcement learning (RL), SeqGAN bypasses the generator differentiation problem by directly performing a policy gradient update.
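
The summaries above describe SeqGAN's core training loop: the generator is treated as an RL policy, complete sequences are scored by the discriminator, and intermediate tokens receive rewards via Monte Carlo rollouts. The sketch below illustrates that idea in PyTorch-style pseudocode; the `generator`, `discriminator`, and `rollout` objects and their methods are hypothetical placeholders, not the authors' released implementation.

```python
import torch

def policy_gradient_step(generator, discriminator, rollout, optimizer,
                         batch_size=64, seq_len=20, n_rollouts=16):
    """One REINFORCE-style update: sample sequences from the generator,
    score them with the discriminator, and use Monte Carlo rollouts to
    assign a reward to every intermediate token."""
    # Sample complete sequences from the generator's stochastic policy.
    sequences, log_probs = generator.sample(batch_size, seq_len)  # (B, T), (B, T)

    # Estimate per-step rewards: complete each prefix n_rollouts times and
    # average the discriminator's probability that the completion is real.
    with torch.no_grad():
        rewards = torch.zeros(batch_size, seq_len)
        for t in range(1, seq_len + 1):
            if t < seq_len:
                completions = rollout.complete(sequences[:, :t], n_rollouts)  # (R, B, T)
                rewards[:, t - 1] = discriminator(completions).mean(dim=0)
            else:
                # The full sequence needs no rollout; score it directly.
                rewards[:, t - 1] = discriminator(sequences.unsqueeze(0)).squeeze(0)

    # REINFORCE: maximize expected reward, i.e. minimize -E[R * log pi].
    loss = -(rewards * log_probs).sum(dim=1).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Because the reward only reaches the generator through sampled tokens and the REINFORCE term, no gradient ever needs to flow through the discrete sampling step, which is the "generator differentiation problem" the TL;DR refers to.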
Proceedings ArticleDOI

IRGAN: A Minimax Game for Unifying Generative and Discriminative Information Retrieval Models

TL;DR: A unified framework takes advantage of both schools of thinking in information retrieval modelling: the generative model learns to fit the relevance distribution over documents via signals from the discriminative model, achieving a better estimate for document ranking.
Proceedings ArticleDOI

IRGAN: A Minimax Game for Unifying Generative and Discriminative Information Retrieval Models

TL;DR: In this paper, a game-theoretical minimax formulation is proposed to iteratively optimise both generative and discriminative models for document ranking, with the generative model trained to fit the relevance distribution over documents via signals from the discriminator.
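
The two IRGAN entries describe a minimax game in which a generative retrieval model samples documents for a query and a discriminative model scores query-document relevance. The following is a schematic sketch of one alternating update under that framing; all names (`generator`, `discriminator`, their `sample` and scoring interfaces) are illustrative assumptions rather than the paper's code.

```python
import torch
import torch.nn.functional as F

def irgan_step(query, true_docs, candidate_docs, generator, discriminator,
               g_opt, d_opt, n_samples=5):
    # --- Discriminator step: separate labeled relevant docs from generator samples.
    with torch.no_grad():
        fake_docs, _ = generator.sample(query, candidate_docs, n_samples)
    d_real = discriminator(query, true_docs)   # relevance scores for true docs
    d_fake = discriminator(query, fake_docs)   # relevance scores for sampled docs
    d_loss = (F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real))
              + F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake)))
    d_opt.zero_grad()
    d_loss.backward()
    d_opt.step()

    # --- Generator step: document sampling is discrete, so the generator is
    # updated by policy gradient, using the discriminator's score as the reward.
    fake_docs, log_probs = generator.sample(query, candidate_docs, n_samples)
    reward = torch.sigmoid(discriminator(query, fake_docs)).detach()
    g_loss = -(reward * log_probs).mean()
    g_opt.zero_grad()
    g_loss.backward()
    g_opt.step()
    return d_loss.item(), g_loss.item()
```

In this view the generator gradually shifts probability mass toward documents the discriminator judges relevant, which is the "fitting the relevance distribution" behaviour the TL;DRs describe.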
Proceedings Article

MOPO: Model-based Offline Policy Optimization

TL;DR: Model-based offline policy optimization (MOPO) modifies existing model-based RL methods by applying them with rewards artificially penalized by the uncertainty of the dynamics, and theoretically shows that the algorithm maximizes a lower bound of the policy's return under the true MDP.
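
The MOPO summary centers on a single mechanism: rewards generated by a learned dynamics model are reduced in proportion to the model's predictive uncertainty, so the policy avoids regions the model cannot be trusted in. Below is a condensed sketch of that penalized model step, assuming a probabilistic dynamics ensemble with a hypothetical `predict` interface; the names and shapes are illustrative, not the authors' implementation.

```python
import numpy as np

def penalized_step(ensemble, state, action, penalty_coef=1.0):
    """Predict the next state and reward with a model ensemble, then subtract
    a penalty proportional to the predictive uncertainty, discouraging the
    policy from exploiting regions where the learned model is unreliable."""
    # Each ensemble member predicts a Gaussian over (next_state, reward).
    means, stds = ensemble.predict(state, action)   # shapes: (E, d+1), (E, d+1)

    # Sample one member's mean prediction for the rollout transition.
    member = np.random.randint(len(means))
    next_state, reward = means[member][:-1], means[member][-1]

    # Uncertainty estimate: largest predictive standard deviation across members.
    uncertainty = stds.max()
    penalized_reward = reward - penalty_coef * uncertainty
    return next_state, penalized_reward
```

Penalizing rewards this way is what allows the return under the learned model to lower-bound the return under the true MDP, which is the theoretical guarantee mentioned in the TL;DR.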