Mohammad Ghavamzadeh
Researcher at Google
Publications - 207
Citations - 8517
Mohammad Ghavamzadeh is an academic researcher at Google whose work focuses on reinforcement learning and Markov decision processes. He has an h-index of 45 and has co-authored 186 publications receiving 6307 citations. His previous affiliations include the University of Alberta and the University of Massachusetts Amherst.
Papers
Journal ArticleDOI
A Review of Uncertainty Quantification in Deep Learning: Techniques, Applications and Challenges
Moloud Abdar, Farhad Pourpanah, Sadiq Hussain, Dana Rezazadegan, Li Liu, Mohammad Ghavamzadeh, Paul Fieguth, Xiaochun Cao, Abbas Khosravi, U. Rajendra Acharya, Vladimir Makarenkov, Saeid Nahavandi +13 more
TL;DR: This study reviews recent advances in uncertainty quantification (UQ) methods used in deep learning, investigates the application of these methods in reinforcement learning (RL), and outlines several important applications of UQ methods.
Journal ArticleDOI
Natural actor-critic algorithms
TL;DR: Four new reinforcement learning algorithms based on actor-critic, natural-gradient, and function-approximation ideas are presented. These are the first actor-critic algorithms of this kind with convergence proofs and the first that are fully incremental.
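The actor-critic idea behind these algorithms can be illustrated with a minimal sketch (not the paper's exact algorithms): a softmax policy (the actor) updated along the policy gradient, scaled by a TD error from a value estimate (the critic). Here the "MDP" is a two-armed bandit with hypothetical payoffs, chosen only to keep the example self-contained.

```python
import numpy as np

rng = np.random.default_rng(0)
theta = np.zeros(2)   # policy parameters (actor)
v = 0.0               # value estimate (critic); single state
alpha, beta = 0.1, 0.05  # actor and critic step sizes

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

for _ in range(2000):
    probs = softmax(theta)
    a = rng.choice(2, p=probs)
    # Hypothetical rewards: arm 1 pays ~1.0, arm 0 pays ~0.2.
    r = rng.normal(1.0 if a == 1 else 0.2, 0.1)
    td_error = r - v                        # TD error (bandit: no next state)
    v += beta * td_error                    # critic update
    grad_log = -probs
    grad_log[a] += 1.0                      # grad of log softmax policy
    theta += alpha * td_error * grad_log    # actor update along the gradient

# The policy should come to strongly prefer the better arm.
```
The natural-gradient variants in the paper precondition the actor update with the inverse Fisher information matrix; the sketch above uses the plain ("vanilla") gradient for brevity.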
Proceedings Article
Best Arm Identification: A Unified Approach to Fixed Budget and Fixed Confidence
TL;DR: A performance bound is proved for the two versions of the UGapE algorithm showing that the two problems are characterized by the same notion of complexity.
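A simplified sketch of the gap-based index at the heart of UGapE (not the paper's exact algorithm): each arm k gets an index B_k = max over other arms of their upper confidence bounds, minus arm k's lower confidence bound, and the arm with the smallest index is the current best-arm candidate. The confidence radius formula below is a standard Hoeffding-style choice used here for illustration.

```python
import math

def gap_indices(means, counts, delta=0.05):
    """B_k = max_{i != k} U_i - L_k from empirical means and counts."""
    radius = [math.sqrt(math.log(1.0 / delta) / (2 * n)) for n in counts]
    upper = [m + r for m, r in zip(means, radius)]
    lower = [m - r for m, r in zip(means, radius)]
    indices = []
    for k in range(len(means)):
        best_other_upper = max(u for i, u in enumerate(upper) if i != k)
        indices.append(best_other_upper - lower[k])
    return indices

# Hypothetical data: arm 0 looks clearly best after 100 pulls each.
idx = gap_indices([0.9, 0.5, 0.4], [100, 100, 100])
best = min(range(3), key=idx.__getitem__)
```
The same index drives both settings in the paper: with a fixed budget the algorithm stops when the budget runs out, and with fixed confidence it stops once the smallest index drops below the target accuracy.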
Posted Content
Risk-Constrained Reinforcement Learning with Percentile Risk Criteria
TL;DR: In this paper, the authors present efficient reinforcement learning algorithms for risk-constrained Markov decision processes (MDPs), where risk is represented via a chance constraint or a constraint on the conditional value-at-risk (CVaR) of the cumulative cost.
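The risk measure constrained in this work, CVaR, can be estimated from cost samples as the average of the worst (1 - alpha) fraction of outcomes. A minimal sketch (the function name and data are hypothetical; this is the empirical estimate, not the paper's optimization algorithm):

```python
import numpy as np

def cvar(costs, alpha=0.95):
    """Empirical CVaR: mean cost over the worst (1 - alpha) tail."""
    costs = np.sort(np.asarray(costs, dtype=float))
    var = np.quantile(costs, alpha)   # value-at-risk at level alpha
    tail = costs[costs >= var]        # the worst (1 - alpha) fraction
    return tail.mean()

# Hypothetical cumulative costs: mostly 1.0, with a rare 10.0 outcome.
samples = [1.0] * 95 + [10.0] * 5
tail_risk = cvar(samples, alpha=0.95)
```
A chance constraint bounds the *probability* that the cumulative cost exceeds a threshold, whereas a CVaR constraint bounds the *expected cost in the tail*, which is why the two criteria lead to different algorithms in the paper.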
Proceedings Article
High confidence off-policy evaluation
TL;DR: This paper proposes an off-policy method for computing a lower confidence bound on the expected return of a policy, providing confidence guarantees about the accuracy of the estimate.
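The general recipe can be sketched as follows (an illustrative simplification, not the paper's exact estimator): importance-weight each behavior-policy trajectory's return toward the target policy, then subtract a Hoeffding-style deviation term to get a lower confidence bound. This assumes the weighted returns are bounded in [0, b]; all names and the fake data are hypothetical.

```python
import math
import random

def lower_bound(weighted_returns, b, delta=0.05):
    """Empirical mean minus a Hoeffding deviation for values in [0, b]."""
    n = len(weighted_returns)
    mean = sum(weighted_returns) / n
    dev = b * math.sqrt(math.log(1.0 / delta) / (2 * n))
    return mean - dev

random.seed(0)
# Fake per-trajectory importance-weighted returns, bounded in [0, 2].
ys = [random.uniform(0.4, 1.2) for _ in range(500)]
lb = lower_bound(ys, b=2.0, delta=0.05)
```
The bound holds with probability at least 1 - delta; the paper's contribution includes tighter concentration inequalities than plain Hoeffding, which matter because importance weights can make the range b very large.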