Francisco S. Melo

Researcher at Instituto Superior Técnico

Publications - 148
Citations - 2336

Francisco S. Melo is an academic researcher at Instituto Superior Técnico. The author has contributed to research on topics including reinforcement learning and Markov decision processes. The author has an h-index of 21 and has co-authored 138 publications receiving 1844 citations. Previous affiliations of Francisco S. Melo include Carnegie Mellon University and Technical University of Lisbon.

Papers
Proceedings Article

An analysis of reinforcement learning with function approximation

TL;DR: The convergence properties of several variations of Q-learning when combined with function approximation are analyzed, extending the analysis of TD-learning in (Tsitsiklis & Van Roy, 1996a) to stochastic control settings.
Book Chapter

Active Learning for Reward Estimation in Inverse Reinforcement Learning

TL;DR: An algorithm is proposed that allows the agent to query the demonstrator for samples at specific states, instead of relying only on samples provided at "arbitrary" states. It estimates the reward function with accuracy similar to that of other methods from the literature while reducing the number of policy samples required from the expert.
Proceedings Article

Affordance-based imitation learning in robots

TL;DR: An imitation learning algorithm for a humanoid robot is built on top of a general world model provided by learned object affordances; the model is used to recognize the demonstration by another agent and infer the task to be learned.
Journal Article

Decentralized MDPs with sparse interactions

TL;DR: A new decision-theoretic model for decentralized sparse-interaction multiagent systems, Dec-SIMDPs, is introduced that explicitly distinguishes the situations in which the agents in the team must coordinate from those in which they can act independently.
Book Chapter

Q-learning with linear function approximation

TL;DR: A set of conditions is identified that implies the convergence, with probability 1, of Q-learning with linear function approximation when a fixed learning policy is used.