This paper presents control and coordination algorithms for groups of vehicles. The focus is on autonomous vehicle networks performing distributed sensing tasks where each vehicle plays the role of a mobile tunable sensor. The paper proposes gradient descent algorithms for a class of utility functions which encode optimal coverage and sensing policies. The resulting closed-loop behavior is adaptive, distributed, asynchronous, and verifiably correct.

Coverage control for mobile sensing networks
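To make the gradient-descent idea in the abstract above concrete, here is a minimal sketch of a Lloyd-style coverage step in one dimension. It rests on assumptions not stated in the source: sensors live on the interval [0, 1], the density is uniform, and the function name `lloyd_step` and its `gain` parameter are hypothetical. It illustrates the general technique and is not the paper's algorithm.

```python
def lloyd_step(positions, gain=1.0):
    """Move each sensor on [0, 1] toward the centroid of its Voronoi cell.

    Assumes a uniform density, so the centroid of a cell is its midpoint.
    """
    order = sorted(range(len(positions)), key=lambda i: positions[i])
    pts = [positions[i] for i in order]
    new_positions = list(positions)
    for rank, i in enumerate(order):
        # Cell boundaries are midpoints to the neighbours (or the interval ends).
        left = 0.0 if rank == 0 else 0.5 * (pts[rank - 1] + pts[rank])
        right = 1.0 if rank == len(pts) - 1 else 0.5 * (pts[rank] + pts[rank + 1])
        centroid = 0.5 * (left + right)
        new_positions[i] = positions[i] + gain * (centroid - positions[i])
    return new_positions


positions = [0.1, 0.2, 0.9]
for _ in range(200):
    positions = lloyd_step(positions)
```

Each sensor needs only its neighbours' positions to compute its own cell, which is what makes a step of this kind distributed.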

Recent years have witnessed significant advances in reinforcement learning (RL), which has registered tremendous success in solving various sequential decision-making problems in machine learning. Most of the successful RL applications, e.g., the games of Go and Poker, robotics, and autonomous driving, involve more than one agent and so fall naturally into the realm of multi-agent RL (MARL), a domain with a relatively long history that has recently re-emerged due to advances in single-agent RL techniques. Though MARL is empirically successful, its theoretical foundations are relatively lacking in the literature. In this chapter, we provide a selective overview of MARL, with a focus on algorithms backed by theoretical analysis. More specifically, we review the theoretical results of MARL algorithms mainly within two representative frameworks, Markov/stochastic games and extensive-form games, in accordance with the types of tasks they address, i.e., fully cooperative, fully competitive, and a mix of the two. We also introduce several significant but challenging applications of these algorithms. Orthogonal to the existing reviews on MARL, we highlight several new angles and taxonomies of MARL theory, including learning in extensive-form games, decentralized MARL with networked agents, MARL in the mean-field regime, (non-)convergence of policy-based methods for learning in games, etc. Some of the new angles extrapolate from our own research endeavors and interests. Our overall goal with this chapter is, beyond providing an accurate assessment of the current state of the field, to identify fruitful future research directions on theoretical studies of MARL. We expect this chapter to serve as a continuing stimulus for researchers interested in working on this exciting yet challenging topic.

Multi-Agent Reinforcement Learning: A Selective Overview of Theories and Algorithms

A survey of distributed optimization

Recursive algorithms where random observations enter are studied in a fairly general framework. An important feature is that the observations may depend on previous "outputs" of the algorithm. The considered class of algorithms contains, e.g., stochastic approximation algorithms, recursive identification algorithms, and algorithms for adaptive control of linear systems. It is shown how a deterministic differential equation can be associated with the algorithm. Problems like convergence with probability one, possible convergence points, and asymptotic behavior of the algorithm can all be studied in terms of this differential equation. Theorems stating the precise relationships between the differential equation and the algorithm are given, as well as examples of applications of the results to problems in identification and adaptive control.

Analysis of Recursive Stochastic Algorithms

Federated learning is a distributed framework according to which a model is trained over a set of devices, while keeping data localized. This framework faces several systems-oriented challenges which include (i) communication bottleneck since a large number of devices upload their local updates to a parameter server, and (ii) scalability as the federated network consists of millions of devices. Due to these systems challenges as well as issues related to statistical heterogeneity of data and privacy concerns, designing a provably efficient federated learning method is of significant importance yet it remains challenging. In this paper, we present FedPAQ, a communication-efficient Federated Learning method with Periodic Averaging and Quantization. FedPAQ relies on three key features: (1) periodic averaging where models are updated locally at devices and only periodically averaged at the server; (2) partial device participation where only a fraction of devices participate in each round of the training; and (3) quantized message-passing where the edge nodes quantize their updates before uploading to the parameter server. These features address the communications and scalability challenges in federated learning. We also show that FedPAQ achieves near-optimal theoretical guarantees for strongly convex and non-convex loss functions and empirically demonstrate the communication-computation tradeoff provided by our method.

FedPAQ: A Communication-Efficient Federated Learning Method with Periodic Averaging and Quantization
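The quantized message-passing and server-side averaging described in the abstract above might look roughly like the sketch below. It is an illustration under assumptions, not the paper's implementation: the quantizer is a generic unbiased randomized-rounding scheme, the server update averages quantized model differences, and the names `stochastic_quantize`, `average_quantized_updates`, and `levels` are hypothetical. Local training and device sampling are left out.

```python
import random


def stochastic_quantize(x, levels, rng):
    """Unbiased low-precision quantizer (illustrative assumption).

    Each coordinate becomes sign(x_i) * norm * (l / levels), where the integer
    l is chosen by randomized rounding so that the expectation equals x_i.
    """
    norm = sum(v * v for v in x) ** 0.5
    if norm == 0.0:
        return [0.0] * len(x)
    out = []
    for v in x:
        scaled = abs(v) / norm * levels   # lies in [0, levels]
        lower = int(scaled)               # floor, since scaled >= 0
        # Round up with probability equal to the fractional part.
        level = lower + (1 if rng.random() < scaled - lower else 0)
        sign = 1.0 if v >= 0 else -1.0
        out.append(sign * norm * level / levels)
    return out


def average_quantized_updates(server_model, local_models, levels, rng):
    """Server step: add the average of the quantized local differences."""
    total = [0.0] * len(server_model)
    for local in local_models:
        diff = [l - s for l, s in zip(local, server_model)]
        q = stochastic_quantize(diff, levels, rng)
        total = [t + qi for t, qi in zip(total, q)]
    return [s + t / len(local_models) for s, t in zip(server_model, total)]
```

Here `local_models` stands for the models returned by the devices that took part in a round. Quantizing the difference from the server model, rather than the model itself, keeps the transmitted vector small in norm.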

Finite-Time Analysis of Distributed TD(0) with Linear Function Approximation on Multi-Agent Reinforcement Learning

In this paper, we consider the model-free reinforcement learning problem and study the popular Q-learning algorithm with linear function approximation for estimating the optimal policy. Despite its popularity, it is known that Q-learning with linear function approximation may diverge in general due to off-policy sampling. Our main contribution is to provide a finite-time bound and the convergence rate on the performance of Q-learning with linear function approximation under an assumption on the behavior policy. Unlike some prior work in the literature, we do not need to make the unnatural assumption that the samples are i.i.d. (since they are Markovian), and do not require an additional projection step in the algorithm. To show this result, we first consider a more general nonlinear stochastic approximation algorithm with Markovian noise, and derive a finite-time bound on the mean-square error, which we believe is of independent interest. Our proof is based on Lyapunov drift arguments and exploits the geometric mixing of the underlying Markov chain. We also provide numerical simulations to illustrate the effectiveness of our assumption on the behavior policy, and demonstrate the rate of convergence of Q-learning with linear function approximation.

Performance of Q-learning with Linear Function Approximation: Stability and Finite-Time Analysis
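For readers unfamiliar with the algorithm analysed in the abstract above, the following is a minimal sketch of the standard Q-learning update with a linear approximation Q(s, a) = phi(s, a)^T theta. The function name `q_learning_step` and its argument layout are hypothetical. The behavior-policy assumption studied in the paper is not modelled here, and no projection step is applied.

```python
def q_learning_step(theta, phi_sa, reward, next_phis, step_size, discount):
    """One Q-learning update with linear function approximation.

    phi_sa:    feature vector of the visited state-action pair
    next_phis: feature vectors phi(s', a') for every action a' at the next state
    """
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    q_current = dot(phi_sa, theta)
    q_next = max(dot(p, theta) for p in next_phis)   # greedy bootstrap target
    td_error = reward + discount * q_next - q_current
    # Move theta along the features of the visited pair, scaled by the TD error.
    return [t + step_size * td_error * f for t, f in zip(theta, phi_sa)]
```

Because the target takes a maximum over actions while the samples come from a separate behavior policy, the update is off-policy, which is the source of the possible divergence the abstract mentions.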

We study distributed optimization problems over a network when the communication between the nodes is constrained, and therefore, information that is exchanged between the nodes must be quantized. Recent advances using the distributed gradient algorithm with a quantization scheme at a fixed resolution have established convergence, but at rates significantly slower than when the communications are unquantized. In this article, we introduce a novel quantization method, which we refer to as adaptive quantization, that allows us to match the convergence rates under perfect communications. Our approach adjusts the quantization scheme used by each node as the algorithm progresses: as we approach the solution, we become more certain about where the state variables are localized and adapt the quantizer codebook accordingly. We bound the convergence rates of the proposed method as a function of the communication bandwidth, the underlying network topology, and structural properties of the constituent objective functions. In particular, we show that if the objective functions are convex or strongly convex, then using adaptive quantization does not affect the rate of convergence of the distributed subgradient methods when the communications are quantized, except for a constant that depends on the resolution of the quantizer. To the best of our knowledge, the rates achieved in this article are better than any existing work in the literature for distributed gradient methods under finite communication bandwidths. We also provide numerical simulations that compare convergence properties of the distributed gradient methods with and without quantization for solving distributed regression problems for both quadratic and absolute loss functions.

Fast Convergence Rates of Distributed Subgradient Methods With Adaptive Quantization
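The intuition behind adapting the quantizer, as described in the abstract above, can be illustrated with a plain uniform quantizer whose range is re-centred and shrunk over time. This is only a sketch of the idea under assumptions: the paper's actual codebook update rule is not reproduced, and `uniform_quantize`, `center`, `radius`, and `bits` are hypothetical names.

```python
def uniform_quantize(value, center, radius, bits):
    """Map value to the nearest of 2**bits evenly spaced points in
    [center - radius, center + radius]; values outside the range are clipped.

    Assumes radius > 0 and bits >= 1.
    """
    n_points = 2 ** bits
    low = center - radius
    step = 2.0 * radius / (n_points - 1)
    clipped = min(max(value, low), center + radius)
    index = round((clipped - low) / step)
    return low + index * step
```

For a value inside the range, the error is at most `radius / (2**bits - 1)`. With the number of bits fixed, shrinking `radius` around a good estimate of where the state lies therefore tightens the worst-case error without using more bandwidth.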

Motivated by applications in reinforcement learning (RL), we study a nonlinear stochastic approximation (SA) algorithm under Markovian noise, and establish its finite-sample convergence bounds under various stepsizes. Specifically, we show that when using a constant stepsize (i.e., $\epsilon_k\equiv \epsilon$), the algorithm achieves exponentially fast convergence with asymptotic accuracy $\mathcal{O}(\epsilon\log(1/\epsilon))$. When using diminishing stepsizes with an appropriate decay rate, the algorithm converges with rate $\mathcal{O}(\log(k)/k)$. Our proof is based on Lyapunov drift arguments, and to handle the Markovian noise, we exploit the fast mixing of the underlying Markov chain. To demonstrate the generality of our theoretical results on Markovian SA, we use them to derive finite-sample bounds for the popular $Q$-learning algorithm with linear function approximation, under a condition on the behavior policy. Importantly, we do not need to make the unrealistic assumption that the samples are i.i.d., and do not require an additional projection step in the algorithm to maintain the boundedness of the iterates. Numerical simulations corroborate our theoretical findings.
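As a reading aid, the two rates in the abstract above can be written against a generic SA iteration. The notation below is an assumption, not taken from the source: $x_k$ is the iterate, $Y_k$ the Markovian noise, $F$ the update map, $x^*$ the target point, and $c_1, c_2, c_3, c_4$ are unspecified positive constants. Measuring the error in mean square is likewise assumed.

```latex
% Generic SA iteration (assumed notation)
x_{k+1} = x_k + \epsilon_k F(x_k, Y_k)

% Constant stepsize \epsilon_k \equiv \epsilon: geometric decay to an
% O(\epsilon \log(1/\epsilon)) neighbourhood
\mathbb{E}\,\|x_k - x^*\|^2 \le c_1 (1 - c_2 \epsilon)^k + c_3\, \epsilon \log(1/\epsilon)

% Diminishing stepsizes with suitable decay
\mathbb{E}\,\|x_k - x^*\|^2 \le c_4\, \frac{\log k}{k}
```

Read this way, a constant stepsize trades a residual error of order $\epsilon\log(1/\epsilon)$ for geometric decay of the initial error, while diminishing stepsizes remove the residual at the cost of a slower $\log(k)/k$ rate.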

Thinh T. Doan

Papers

Finite-Time Analysis of Distributed TD(0) with Linear Function Approximation on Multi-Agent Reinforcement Learning

Performance of Q-learning with Linear Function Approximation: Stability and Finite-Time Analysis

Fast Convergence Rates of Distributed Subgradient Methods With Adaptive Quantization

Finite-Sample Analysis of Nonlinear Stochastic Approximation with Applications in Reinforcement Learning

Distributed resource allocation on dynamic networks in quadratic time