Francisco S. Melo
Researcher at Instituto Superior Técnico
Publications - 148
Citations - 2336
Francisco S. Melo is an academic researcher at Instituto Superior Técnico. The author has contributed to research in the topics of reinforcement learning and Markov decision processes. The author has an h-index of 21 and has co-authored 138 publications receiving 1,844 citations. Previous affiliations of Francisco S. Melo include Carnegie Mellon University and the Technical University of Lisbon.
Papers
Proceedings ArticleDOI
An analysis of reinforcement learning with function approximation
TL;DR: The convergence properties of several variations of Q-learning when combined with function approximation are analyzed, extending the analysis of TD-learning in (Tsitsiklis & Van Roy, 1996a) to stochastic control settings.
Book ChapterDOI
Active Learning for Reward Estimation in Inverse Reinforcement Learning
TL;DR: An algorithm is proposed that allows the agent to query the demonstrator for samples at specific states, rather than relying only on samples provided at "arbitrary" states. It estimates the reward function with accuracy similar to other methods from the literature while reducing the number of policy samples required from the expert.
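The core idea above — asking the expert to act precisely where the learner is least certain — can be sketched with a simple disagreement criterion. This is an illustrative sketch, not the paper's actual (Bayesian, posterior-based) criterion: it keeps a set of candidate reward hypotheses, computes the greedy policy each one induces, and queries the state where those policies disagree most. The function name and array layout are assumptions.

```python
import numpy as np

def most_informative_state(policies):
    """Hypothetical query-selection rule for active IRL.

    policies: array of shape (n_hypotheses, n_states), where
    policies[h, s] is the greedy action at state s under reward
    hypothesis h.  Returns the state whose action distribution
    across hypotheses has the highest entropy, i.e. where the
    candidate reward functions disagree most about what to do.
    """
    n_states = policies.shape[1]
    disagreement = np.empty(n_states)
    for s in range(n_states):
        _, counts = np.unique(policies[:, s], return_counts=True)
        p = counts / counts.sum()
        disagreement[s] = -(p * np.log(p)).sum()  # action entropy
    return int(np.argmax(disagreement))

# Four reward hypotheses over two states: all agree at state 0,
# split evenly at state 1 -> state 1 is the most informative query.
policies = np.array([[0, 0], [0, 1], [0, 1], [0, 0]])
```

Querying `most_informative_state(policies)` here returns state 1; in the passive setting the expert might instead spend most of its samples on uncontested states like state 0.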
Proceedings ArticleDOI
Affordance-based imitation learning in robots
TL;DR: An imitation learning algorithm for a humanoid robot is built on top of a general world model provided by learned object affordances; the model is used to recognize a demonstration by another agent and to infer the task to be learned.
Journal ArticleDOI
Decentralized MDPs with sparse interactions
Francisco S. Melo, Manuela Veloso +1 more
TL;DR: A new decision-theoretic model for decentralized sparse-interaction multiagent systems, Dec-SIMDPs, is contributed that explicitly distinguishes the situations in which the agents in the team must coordinate from those in which they can act independently.
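The distinction the Dec-SIMDP model draws — act independently by default, coordinate only in designated interaction situations — can be illustrated with a minimal control-flow sketch. All names here are hypothetical; this is only the switching structure, not the paper's model or solution algorithm.

```python
def act(agent_state, other_state, interaction_states,
        local_policy, joint_policy):
    """Sketch of sparse-interaction control (names are assumptions):
    outside the interaction states, the agent follows a policy of its
    own local MDP; inside them, it switches to a coordinated policy
    that also conditions on the other agent's state."""
    if agent_state in interaction_states:
        return joint_policy(agent_state, other_state)
    return local_policy(agent_state)

# Toy policies: act 0 when alone, act 1 when coordination is needed.
local_policy = lambda s: 0
joint_policy = lambda s, o: 1
```

With interaction states `{2, 3}`, `act(2, 5, {2, 3}, local_policy, joint_policy)` coordinates (returns 1), while `act(0, 5, {2, 3}, local_policy, joint_policy)` acts independently (returns 0).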
Book ChapterDOI
Q-learning with linear function approximation
TL;DR: A set of conditions is identified that implies convergence of Q-learning with linear function approximation, with probability 1, when a fixed learning policy is used.
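The setting analyzed above — Q-learning where Q(s, a) is approximated as a linear function of state features, with experience generated by a fixed learning policy — can be sketched as follows. The toy chain MDP, one-hot features (the simplest linear architecture, under which the method reduces to the tabular case), and all constants are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Hypothetical 5-state chain MDP: states 0..4, actions 0 = left,
# 1 = right; reaching state 4 pays reward 1 and ends the episode.
N_STATES, N_ACTIONS, GAMMA, ALPHA = 5, 2, 0.9, 0.1

def phi(s):
    """One-hot state features for the linear approximation
    Q(s, a) ~ w[a] . phi(s)."""
    f = np.zeros(N_STATES)
    f[s] = 1.0
    return f

def step(s, a):
    s_next = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
    done = s_next == N_STATES - 1
    return s_next, (1.0 if done else 0.0), done

rng = np.random.default_rng(0)
w = np.zeros((N_ACTIONS, N_STATES))  # one weight vector per action
s = 0
for _ in range(20_000):
    a = int(rng.integers(N_ACTIONS))     # fixed (uniform) learning policy
    s_next, r, done = step(s, a)
    # Q-learning target: r + gamma * max_a' w[a'] . phi(s')
    target = r if done else r + GAMMA * np.max(w @ phi(s_next))
    w[a] += ALPHA * (target - w[a] @ phi(s)) * phi(s)
    s = 0 if done else s_next
```

After training, the greedy policy `argmax_a w[a] . phi(s)` moves right from every non-terminal state, and `w[1] . phi(3)` approaches the true value 1 of the final rewarding step.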