Chenjun Xiao
Researcher at University of Alberta
Publications - 24
Citations - 300
Chenjun Xiao is an academic researcher from the University of Alberta. The author has contributed to research in the topics of computer science and the softmax function. The author has an h-index of 7, having co-authored 15 publications that received 157 citations. Previous affiliations of Chenjun Xiao include Northeastern University (China).
Papers
Proceedings Article
On the Global Convergence Rates of Softmax Policy Gradient Methods
TL;DR: It is shown that, with the true gradient, policy gradient with a softmax parametrization converges at an $O(1/t)$ rate, with constants depending on the problem and initialization, significantly expanding recent asymptotic convergence results.
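The $O(1/t)$ rate above concerns exact (true-gradient) policy gradient under the softmax parametrization. A minimal sketch of that setting on a three-armed bandit, where the gradient of the expected reward $\pi \cdot r$ with respect to $\theta_i$ is $\pi_i (r_i - \pi \cdot r)$; the reward vector, step size, and iteration count below are illustrative assumptions, not values from the paper:

```python
import math

def softmax(theta):
    # Numerically stable softmax: exp(theta_i) / sum_j exp(theta_j).
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    z = sum(exps)
    return [e / z for e in exps]

def softmax_pg_bandit(r, eta=0.4, steps=5000):
    # True-gradient ascent on the expected reward pi . r of a bandit,
    # with the policy parametrized as pi = softmax(theta).
    theta = [0.0] * len(r)
    for _ in range(steps):
        pi = softmax(theta)
        v = sum(p * ri for p, ri in zip(pi, r))  # expected reward pi . r
        # d(pi . r)/d(theta_i) = pi_i * (r_i - pi . r)
        theta = [t + eta * p * (ri - v) for t, p, ri in zip(theta, pi, r)]
    return softmax(theta)

# With enough exact-gradient steps the policy concentrates on the best arm.
pi = softmax_pg_bandit([1.0, 0.8, 0.2])
```

Under this sketch the suboptimality gap shrinks on the order of $1/t$, so the returned policy places almost all of its mass on the first (optimal) arm.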
Posted Content
Learning to Combat Compounding-Error in Model-Based Reinforcement Learning
TL;DR: Experimental results show that the proposed method can successfully adapt the planning horizon to account for state-dependent model accuracy, significantly improving the efficiency of policy learning compared to model-based and model-free baselines.
Proceedings Article
Escaping the Gravitational Pull of Softmax
TL;DR: The softmax, the standard transformation used in machine learning to map real-valued vectors to categorical distributions, poses serious drawbacks for gradient descent (ascent) optimization; an alternative transformation, the escort mapping, is investigated and demonstrates better optimization properties.
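The escort mapping referenced above normalizes powers of absolute values rather than exponentials, i.e. it sends $\theta$ to $|\theta_i|^p / \sum_j |\theta_j|^p$. A minimal sketch of that transform next to the standard softmax; the choice $p=2$ below is an illustrative default, not a value prescribed here:

```python
import math

def softmax(theta):
    # Standard softmax: exp(theta_i) / sum_j exp(theta_j), shifted for stability.
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    z = sum(exps)
    return [e / z for e in exps]

def escort(theta, p=2):
    # Escort transform: |theta_i|^p / sum_j |theta_j|^p.
    pows = [abs(t) ** p for t in theta]
    z = sum(pows)
    return [w / z for w in pows]

# Both map a real vector to a categorical distribution, but the escort
# mapping depends polynomially, not exponentially, on the parameters.
probs = escort([2.0, 1.0, 1.0])  # -> [2/3, 1/6, 1/6]
```

Because probabilities scale polynomially in the parameters, a near-deterministic escort policy is easier to move away from by gradient ascent than its softmax counterpart, which is the optimization advantage the summary alludes to.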
Proceedings Article
Memory-Augmented Monte Carlo Tree Search
TL;DR: Experimental results show that MMCTS outperforms the original MCTS with the same number of simulations, and it is shown that the memory-based value approximation is better than the vanilla Monte Carlo estimation with high probability under mild conditions.
Proceedings ArticleDOI
On principled entropy exploration in policy optimization
TL;DR: Experimental evaluations demonstrate that the proposed ECPO method significantly improves practical exploration and surpasses the empirical performance of state-of-the-art policy optimization methods on a set of benchmark tasks.