
Chenjun Xiao

Researcher at University of Alberta

Publications - 24
Citations - 300

Chenjun Xiao is an academic researcher from the University of Alberta. The author has contributed to research in topics including computer science and the softmax function, has an h-index of 7, and has co-authored 15 publications receiving 157 citations. Previous affiliations of Chenjun Xiao include Northeastern University (China).

Papers
Proceedings Article

On the Global Convergence Rates of Softmax Policy Gradient Methods

TL;DR: It is shown that, with the true gradient, policy gradient with a softmax parametrization converges at an $O(1/t)$ rate, with constants depending on the problem and initialization, which significantly expands recent asymptotic convergence results.
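A minimal sketch of the setting, assuming a one-state (bandit) problem: the policy is the softmax of a logit vector and is updated with the exact gradient of expected reward, the regime in which the $O(1/t)$ rate is stated. The rewards, step size, and iteration count below are illustrative, not from the paper.

# Minimal sketch (not the paper's code): softmax policy gradient with the
# true gradient on a 3-armed bandit, using the parametrization pi = softmax(theta).
import numpy as np

r = np.array([1.0, 0.8, 0.2])   # true mean rewards (illustrative)
theta = np.zeros(3)             # policy logits
eta = 0.4                       # step size (assumed)

for t in range(1000):
    pi = np.exp(theta - theta.max())
    pi /= pi.sum()
    # exact gradient of expected reward for a bandit: pi_i * (r_i - pi . r)
    grad = pi * (r - pi @ r)
    theta += eta * grad

print(pi, pi @ r)  # pi concentrates on the best arm; pi @ r approaches max(r)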
Posted Content

Learning to Combat Compounding-Error in Model-Based Reinforcement Learning

TL;DR: Experimental results show that the proposed method can successfully adapt the planning horizon to account for state-dependent model accuracy, significantly improving the efficiency of policy learning compared to model-based and model-free baselines.
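A hedged sketch of the general idea described above, not the paper's algorithm: model rollouts are truncated where the learned model appears less accurate, so planning relies on the model only in well-modelled regions. The dynamics, error proxy, and threshold below are invented for illustration.

def model(state, action):
    """Toy learned dynamics model (illustrative stand-in)."""
    return state + action, -abs(state)

def error_estimate(state):
    """Toy proxy for state-dependent model error (illustrative)."""
    return abs(state) / 10.0

def policy(state):
    """Simple policy that pushes the state toward zero."""
    return -0.5 if state > 0 else 0.5

def rollout(state, max_horizon=10, tol=0.4):
    """Unroll the model, truncating the horizon where estimated error is high."""
    trajectory = []
    for _ in range(max_horizon):
        if error_estimate(state) > tol:   # state-dependent accuracy check
            break
        action = policy(state)
        state, reward = model(state, action)
        trajectory.append((state, action, reward))
    return trajectory

print(len(rollout(3.0)), len(rollout(6.0)))  # longer rollouts where estimated error is low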
Proceedings Article

Escaping the Gravitational Pull of Softmax

TL;DR: The softmax, the standard transformation used in machine learning to map real-valued vectors to categorical distributions, poses serious drawbacks for gradient descent (ascent) optimization; an alternative transformation, the escort mapping, is investigated and demonstrates better optimization properties.
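For reference, a small sketch contrasting the softmax with the escort mapping as I understand it from the abstract, namely |theta_i|^p normalized by the sum over |theta_j|^p; the exponent p and the example logits are illustrative assumptions.

# Sketch comparing the two transformations; the escort form follows my reading
# of the abstract (|theta_i|**p normalized by the sum), with p chosen arbitrarily.
import numpy as np

def softmax(theta):
    z = np.exp(theta - theta.max())
    return z / z.sum()

def escort(theta, p=2.0):
    z = np.abs(theta) ** p
    return z / z.sum()

theta = np.array([2.0, 1.0, -1.0])
print(softmax(theta))  # approx [0.705, 0.259, 0.035]
print(escort(theta))   # approx [0.667, 0.167, 0.167]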
Proceedings Article

Memory-Augmented Monte Carlo Tree Search

TL;DR: Experimental results show that MMCTS outperforms the original MCTS with the same number of simulations, and it is shown that the memory-based value approximation is better than the vanilla Monte Carlo estimation with high probability under mild conditions.
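A hedged sketch of the memory-based value idea only, not the paper's MCTS integration: the value of a query state is estimated as a similarity-weighted average over stored (feature, value) pairs. The features, kernel, and data here are invented for illustration.

import numpy as np

rng = np.random.default_rng(0)
memory_features = rng.normal(size=(50, 4))                          # stored state features
memory_values = memory_features @ np.array([1.0, -0.5, 0.2, 0.0])   # stored value estimates

def memory_value(query, temperature=1.0):
    """Similarity-weighted average of stored values (softmax over negative distances)."""
    dists = np.linalg.norm(memory_features - query, axis=1)
    w = np.exp(-dists / temperature)
    w /= w.sum()
    return w @ memory_values

query = np.array([0.5, -0.2, 0.1, 0.0])
print(memory_value(query))  # memory-based value estimate for the queried state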
Proceedings ArticleDOI

On principled entropy exploration in policy optimization

TL;DR: Experimental evaluations demonstrate that the proposed ECPO method significantly improves practical exploration and surpasses the empirical performance of state-of-the-art policy optimization methods on a set of benchmark tasks.
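An illustrative sketch of entropy-regularized policy optimization on a bandit, showing the general principle of adding an entropy bonus rather than the ECPO algorithm itself; the temperature, rewards, and step size are assumptions.

# Maximize pi . r + tau * H(pi) over softmax logits with exact gradients.
import numpy as np

r = np.array([1.0, 0.8, 0.2])   # illustrative rewards
tau = 0.1                       # entropy temperature (assumed)
theta = np.zeros(3)
eta = 0.4

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

for _ in range(2000):
    pi = softmax(theta)
    entropy = -(pi * np.log(pi)).sum()
    # gradient of pi.r + tau*H(pi) with respect to the logits
    grad = pi * (r - tau * np.log(pi) - (pi @ r + tau * entropy))
    theta += eta * grad

print(softmax(theta))  # keeps mass on suboptimal arms, unlike the greedy solution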