Wireless sensor networks (WSNs) monitor dynamic environments that change rapidly over time. This dynamic behavior is either caused by external factors or initiated by the system designers themselves. To adapt to such conditions, sensor networks often adopt machine learning techniques to eliminate the need for unnecessary redesign. Machine learning also inspires many practical solutions that maximize resource utilization and prolong the lifespan of the network. In this paper, we present an extensive literature review over the period 2002–2013 of machine learning methods that were used to address common issues in WSNs. The advantages and disadvantages of each proposed algorithm are evaluated against the corresponding problem. We also provide a comparative guide to aid WSN designers in developing suitable machine learning solutions for their specific application challenges.

https://ink.library.smu.edu.sg/cgi/viewcontent.cgi?article=3963&context=sis_research

Machine Learning in Wireless Sensor Networks: Algorithms, Strategies, and Applications

Recent years have witnessed significant advances in reinforcement learning (RL), which has achieved tremendous success in solving various sequential decision-making problems in machine learning. Most successful RL applications, e.g., the games of Go and Poker, robotics, and autonomous driving, involve more than one agent and thus naturally fall into the realm of multi-agent RL (MARL), a domain with a relatively long history that has recently re-emerged due to advances in single-agent RL techniques. Though MARL is empirically successful, its theoretical foundations are relatively lacking in the literature. In this chapter, we provide a selective overview of MARL, with a focus on algorithms backed by theoretical analysis. More specifically, we review the theoretical results of MARL algorithms mainly within two representative frameworks, Markov/stochastic games and extensive-form games, according to the types of tasks they address, i.e., fully cooperative, fully competitive, and a mix of the two. We also introduce several significant but challenging applications of these algorithms. Orthogonal to existing reviews on MARL, we highlight several new angles and taxonomies of MARL theory, including learning in extensive-form games, decentralized MARL with networked agents, MARL in the mean-field regime, and (non-)convergence of policy-based methods for learning in games. Some of the new angles extrapolate from our own research endeavors and interests. Our overall goal with this chapter is, beyond assessing the current state of the field, to identify fruitful future research directions for theoretical studies of MARL. We expect this chapter to serve as a continuing stimulus for researchers interested in this exciting yet challenging topic.

Multi-Agent Reinforcement Learning: A Selective Overview of Theories and Algorithms
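The (non-)convergence of policy-based methods in games can be made concrete with a standard toy example. This is not an algorithm from the chapter. The minimal Python sketch below runs simultaneous gradient descent-ascent on the bilinear zero-sum game f(x, y) = x * y, whose unique equilibrium is (0, 0). The function name `simultaneous_gda`, the starting point, and the step size are illustrative assumptions.

```python
import math


def simultaneous_gda(x0: float, y0: float, eta: float, steps: int) -> list[tuple[float, float]]:
    """Simultaneous gradient descent-ascent on f(x, y) = x * y.

    Player x minimizes f and player y maximizes f; the equilibrium is (0, 0).
    """
    trajectory = [(x0, y0)]
    x, y = x0, y0
    for _ in range(steps):
        grad_x, grad_y = y, x                       # df/dx = y, df/dy = x
        x, y = x - eta * grad_x, y + eta * grad_y   # both players update at once
        trajectory.append((x, y))
    return trajectory


traj = simultaneous_gda(1.0, 0.0, eta=0.1, steps=50)
radii = [math.hypot(x, y) for x, y in traj]
# Each step multiplies the squared distance to (0, 0) by (1 + eta**2),
# so the iterates spiral outward instead of converging to the equilibrium.
print(f"start radius {radii[0]:.3f}, final radius {radii[-1]:.3f}")
```

Expanding the update shows why this happens: (x - eta*y)^2 + (y + eta*x)^2 = (1 + eta^2)(x^2 + y^2). Any constant step size therefore drifts away from the equilibrium.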

Currently, network traffic control systems are mainly composed of the Internet core and wired/wireless heterogeneous backbone networks. Recently, these packet-switched systems have been experiencing explosive traffic growth due to the rapid development of communication technologies. Existing network policies are not sophisticated enough to cope with the continually varying network conditions arising from this tremendous traffic growth. Deep learning, following recent breakthroughs in machine learning/intelligence, appears to be a viable approach for network operators to configure and manage their networks in a more intelligent and autonomous fashion. While deep learning has received significant research attention in a number of other domains, such as computer vision, speech recognition, and robotics, its applications in network traffic control systems are relatively recent and have garnered rather little attention. In this paper, we address this gap and argue for the need to survey the scattered works on deep learning applications across various aspects of network traffic control. To this end, we provide an overview of state-of-the-art deep learning architectures and algorithms relevant to network traffic control systems. We also discuss the deep learning enablers for network systems. In addition, we discuss in detail a new use case, deep learning-based intelligent routing, and demonstrate the effectiveness of this routing approach compared with the conventional routing strategy. Furthermore, we discuss a number of open research issues that researchers may find useful in the future.

State-of-the-Art Deep Learning: Evolving Machine Intelligence Toward Tomorrow’s Intelligent Network Traffic Control Systems

When function approximation is used, solving the Bellman optimality equation with stability guarantees has remained a major open problem in reinforcement learning for decades. The fundamental difficulty is that the Bellman operator may become an expansion in general, resulting in oscillating and even divergent behavior of popular algorithms like Q-learning. In this paper, we revisit the Bellman equation, and reformulate it into a novel primal-dual optimization problem using Nesterov’s smoothing technique and the Legendre-Fenchel transformation. We then develop a new algorithm, called Smoothed Bellman Error Embedding, to solve this optimization problem where any differentiable function class may be used. We provide what we believe to be the first convergence guarantee for general nonlinear function approximation, and analyze the algorithm’s sample complexity. Empirically, our algorithm compares favorably to state-of-the-art baselines in several benchmark control problems.

/pdf/sbeed-convergent-reinforcement-learning-with-nonlinear-48y52nu0v0.pdf

SBEED: Convergent Reinforcement Learning with Nonlinear Function Approximation
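The instability described above shows up in the classic two-state "w → 2w" example of off-policy TD(0) with linear features, a close relative of Q-learning. The Python sketch below reproduces that textbook illustration of divergence. It is not SBEED itself, and the function name and parameter values are assumptions chosen for illustration.

```python
def semi_gradient_td_two_state(w0: float, alpha: float, gamma: float, steps: int) -> list[float]:
    """Off-policy semi-gradient TD(0) on the two-state 'w -> 2w' example.

    State A has feature 1 and state B has feature 2, so v(A) = w and v(B) = 2w.
    Only the transition A -> B (reward 0) is ever used for updates, and the
    true values are 0, which w = 0 represents exactly.
    """
    phi_a, phi_b = 1.0, 2.0
    w = w0
    history = [w]
    for _ in range(steps):
        td_error = 0.0 + gamma * phi_b * w - phi_a * w   # r + gamma * v(B) - v(A)
        w += alpha * td_error * phi_a                     # semi-gradient step on v(A)
        history.append(w)
    return history


ws = semi_gradient_td_two_state(w0=1.0, alpha=0.1, gamma=0.9, steps=100)
# Each update multiplies w by 1 + alpha * (2 * gamma - 1) = 1.08, so |w| grows without bound.
print(ws[0], ws[10], ws[-1])
```

For any gamma > 0.5 the multiplier exceeds 1, even though w = 0 is representable and is the correct answer. This is the kind of expansive behavior that the primal-dual reformulation in the abstract is designed to avoid.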

Temporal difference (TD) learning is a simple iterative algorithm widely used for policy evaluation in Markov reward processes. Bhandari et al. prove finite-time convergence rates for TD learning w...

A Finite Time Analysis of Temporal Difference Learning with Linear Function Approximation
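For reference, TD(0) with linear features has a simple expected form, w ← w + α (b − A w), with A = Φᵀ D (Φ − γ P Φ) and b = Φᵀ D r. Here D is the diagonal matrix of the stationary distribution. The Python sketch below uses a hypothetical 3-state Markov reward process and feature matrix, and all the numbers are assumptions. It averages out the sampling noise and checks that the expected update converges to the TD fixed point w* solving A w* = b. It does not reproduce any finite-time analysis.

```python
import numpy as np

# Hypothetical 3-state Markov reward process (all numbers are illustrative).
P = np.array([[0.5, 0.5, 0.0],
              [0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5]])        # transition matrix (doubly stochastic)
r = np.array([1.0, 0.0, -1.0])         # expected reward in each state
gamma = 0.9
Phi = np.array([[1.0, 0.0],
                [1.0, 1.0],
                [0.0, 1.0]])           # 2 features for 3 states: values are approximated

# Stationary distribution mu: left eigenvector of P for eigenvalue 1, normalized to sum to 1.
eigvals, eigvecs = np.linalg.eig(P.T)
mu = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1.0))])
mu = mu / mu.sum()
D = np.diag(mu)

# Expected (noise-free) TD(0) quantities for linear function approximation.
A = Phi.T @ D @ (Phi - gamma * P @ Phi)
b = Phi.T @ D @ r


def expected_td0(alpha: float, steps: int) -> np.ndarray:
    """Iterate the expected TD(0) update w <- w + alpha * (b - A w) from w = 0."""
    w = np.zeros(Phi.shape[1])
    for _ in range(steps):
        w = w + alpha * (b - A @ w)
    return w


w_td = expected_td0(alpha=0.5, steps=1000)
w_star = np.linalg.solve(A, b)                       # TD fixed point
v_true = np.linalg.solve(np.eye(3) - gamma * P, r)   # exact state values
print("TD weights:", w_td, "fixed point:", w_star)
print("approximate values:", Phi @ w_td, "true values:", v_true)
```

Sampled TD(0) replaces b − A w with a single-transition estimate φ(s)(r + γ φ(s')ᵀw − φ(s)ᵀw). In that setting, finite-time analyses bound how far the iterates stay from w*.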

We apply diffusion strategies to develop a fully-distributed cooperative reinforcement learning algorithm in which agents in a network communicate only with their immediate neighbors to improve predictions about their environment. The algorithm can also be applied to off-policy learning, meaning that the agents can predict the response to a behavior different from the actual policies they are following. The proposed distributed strategy is efficient, with linear complexity in both computation time and memory footprint. We provide a mean-square-error performance analysis and establish convergence under constant step-size updates, which endow the network with continuous learning capabilities. The results show a clear gain from cooperation: when the individual agents can estimate the solution, cooperation increases stability and reduces bias and variance of the prediction error; but, more importantly, the network is able to approach the optimal solution even when none of the individual agents can (e.g., when the individual behavior policies restrict each agent to sample a small portion of the state space).

Distributed Policy Evaluation Under Multiple Behavior Strategies

Principal component analysis is a powerful technique for data analysis and compression, with a wide range of potential applications in wireless sensor networks. However, its centralized implementation, with a fusion center collecting all the samples, is inefficient in terms of energy consumption, scalability, and fault tolerance. Previous distributed approaches reduce the communication cost but do not overcome the lack of flexibility, as they require multi-hop communication if the network is not fully connected. We present two fully distributed consensus-based algorithms that are guaranteed to converge to the global results using only local communications among neighbors, regardless of the data distribution or the sparsity of the network: CB-DPCA is based on finding the eigenvectors of local covariance matrices, while CB-EM-DPCA is a distributed version of the expectation-maximization algorithm. Both offer a flexible trade-off between the tightness of the achieved approximation and the associated communication cost.

Consensus-based distributed principal component analysis in wireless sensor networks
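The consensus idea works like this: each node repeatedly averages its local statistics with its neighbors, and every node converges to the network-wide average. From that average, each node can compute the global covariance and its leading eigenvector without a fusion center. The Python sketch below is a simplified illustration in that spirit, not the exact CB-DPCA or CB-EM-DPCA procedure. It assumes synthetic data, equal sample counts per node, and a ring topology with equal weights of 1/3, which coincide with Metropolis weights on a ring.

```python
import numpy as np

rng = np.random.default_rng(0)
n_nodes, n_local, dim = 6, 50, 3

# Hypothetical sensor data: each node holds n_local samples of a 3-D signal
# with one dominant direction of variance and a nonzero mean.
scales = np.array([3.0, 1.0, 0.5])
local_data = [rng.normal(size=(n_local, dim)) * scales + 1.0 for _ in range(n_nodes)]

# Local statistics: mean vector and uncentered second-moment matrix.
means = np.stack([X.mean(axis=0) for X in local_data])          # shape (nodes, dim)
moments = np.stack([X.T @ X / n_local for X in local_data])     # shape (nodes, dim, dim)

# Ring topology: each node weights itself and its two neighbors by 1/3.
W = np.zeros((n_nodes, n_nodes))
for k in range(n_nodes):
    for j in (k - 1, k, k + 1):
        W[k, j % n_nodes] = 1.0 / 3.0


def consensus(values: np.ndarray, iterations: int) -> np.ndarray:
    """Repeated neighbor averaging: values[k] <- sum_j W[k, j] * values[j]."""
    for _ in range(iterations):
        values = np.tensordot(W, values, axes=1)
    return values


avg_means = consensus(means, 200)
avg_moments = consensus(moments, 200)

# Any single node can now form the global covariance and its leading eigenvector.
node = 0
cov_node = avg_moments[node] - np.outer(avg_means[node], avg_means[node])
pc_node = np.linalg.eigh(cov_node)[1][:, -1]     # eigh sorts eigenvalues ascending

# Centralized reference, as if a fusion center had collected every sample.
all_data = np.vstack(local_data)
cov_central = np.cov(all_data, rowvar=False, bias=True)
pc_central = np.linalg.eigh(cov_central)[1][:, -1]
print("alignment |<pc_node, pc_central>| =", abs(pc_node @ pc_central))
```

Eigenvectors are defined only up to sign, so the comparison uses the absolute inner product. With unequal sample counts, nodes would also need to run consensus on their counts to weight the averages correctly.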

In a noncooperative dynamic game, multiple agents operating in a changing environment aim to optimize their utilities over an infinite time horizon. Time-varying environments make it possible to model more realistic scenarios (e.g., mobile devices equipped with batteries or wireless communications over a fading channel). However, solving a dynamic game is a difficult task that requires dealing with multiple coupled optimal control problems. We focus our analysis on a class of problems, named dynamic potential games, whose solution can be found through a single multivariate optimal control problem. Our analysis generalizes previous studies by considering that the set of environment states and the set of players' actions are constrained, as required by many applications. We also show that the theoretical results are the natural extension of the analysis for static potential games. We apply the analysis and provide numerical methods to solve four example problems, each with different features: i) energy demand control in a smart-grid network; ii) network flow optimization in which the relays have bounded link capacity and limited battery life; iii) uplink multiple access communication with users who have to optimize the use of their batteries; and iv) two optimal scheduling games with time-varying channels.

Dynamic Potential Games With Constraints: Fundamentals and Applications in Communications
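The static case that this analysis extends can be illustrated with a small exact potential game. Each player's unilateral change in utility equals the change in a single potential function. As a result, sequential best responses perform coordinate ascent on that potential and reach a Nash equilibrium that maximizes it. The Python sketch below uses a hypothetical quadratic game with box constraints; the utilities, parameters, and function names are assumptions. It does not cover the dynamic, optimal-control setting of the paper.

```python
import numpy as np

# Hypothetical quadratic game: player i picks a_i in [0, 1] and receives
#   u_i(a) = theta_i * a_i - a_i**2 - beta * a_i * sum_{j != i} a_j.
# It admits the exact potential
#   Phi(a) = sum_i (theta_i * a_i - a_i**2) - beta * sum_{i < j} a_i * a_j,
# so any unilateral change in u_i equals the same change in Phi.
theta = np.array([1.5, 0.8, 0.3])
beta = 0.5
low, high = 0.0, 1.0


def utility(i: int, a: np.ndarray) -> float:
    others = a.sum() - a[i]
    return float(theta[i] * a[i] - a[i] ** 2 - beta * a[i] * others)


def potential(a: np.ndarray) -> float:
    pairwise = (a.sum() ** 2 - (a ** 2).sum()) / 2.0   # sum_{i<j} a_i * a_j
    return float(theta @ a - (a ** 2).sum() - beta * pairwise)


def best_response(i: int, a: np.ndarray) -> float:
    # u_i is concave in a_i, so the constrained maximizer is the clipped stationary point.
    others = a.sum() - a[i]
    return float(np.clip((theta[i] - beta * others) / 2.0, low, high))


a = np.zeros(len(theta))
for _ in range(100):                      # sequential (Gauss-Seidel) best-response sweeps
    for i in range(len(theta)):
        a[i] = best_response(i, a)

print("equilibrium:", a, "potential:", potential(a))
```

In this instance the third player's constraint binds, so the equilibrium lies on the boundary of the action set. That case is why the constrained analysis matters. The dynamic version replaces the single potential maximization with one multivariate optimal control problem.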

We propose a fully distributed actor-critic algorithm approximated by deep neural networks, named Diff-DAC, with application to single-task and to average multitask reinforcement learning (MRL). Each agent has access to data from its local task only, but it aims to learn a policy that performs well on average for the whole set of tasks. During the learning process, agents communicate their value-policy parameters to their neighbors, diffusing the information across the network, so that they converge to a common policy with no need for a central node. The method is scalable, since the computational and communication costs per agent grow only with its number of neighbors. We derive Diff-DAC from duality theory and provide novel insights into the standard actor-critic framework, showing that it is actually an instance of the dual ascent method that approximates the solution of a linear program. Experiments suggest that Diff-DAC can outperform the only previous distributed MRL approach (i.e., Dist-MTLPS) and even the centralized architecture.

Diff-DAC: Distributed Actor-Critic for Average Multitask Deep Reinforcement Learning
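The diffusion mechanism can be sketched on a toy problem: each agent takes a local update step, then averages its intermediate estimate with its neighbors. In the Python example below, hypothetical quadratic losses stand in for each agent's local task. The optimum of the average loss is therefore simply the mean of the targets `c`. This is a generic adapt-then-combine diffusion sketch, not the Diff-DAC actor-critic updates, and the ring topology, targets, and step size are assumptions.

```python
import numpy as np

# Hypothetical setup: 6 agents on a ring; agent k only sees its own task loss
#   J_k(w) = 0.5 * (w - c_k)**2,
# while the network wants the minimizer of the average loss, i.e. mean(c).
c = np.array([4.0, -1.0, 2.5, 0.0, 3.0, -2.5])
n = len(c)

# Symmetric, doubly stochastic combination matrix: self and two neighbors weighted 1/3 each.
A = np.zeros((n, n))
for k in range(n):
    for j in (k - 1, k, k + 1):
        A[k, j % n] = 1.0 / 3.0


def diffusion_atc(mu: float, steps: int) -> np.ndarray:
    """Adapt-then-combine diffusion with constant step size mu, starting from w = 0."""
    w = np.zeros(n)
    for _ in range(steps):
        psi = w - mu * (w - c)   # adapt: local gradient step on J_k
        w = A @ psi              # combine: average intermediate estimates with neighbors
    return w


w = diffusion_atc(mu=0.05, steps=2000)
print("agent estimates:", np.round(w, 3), "target:", c.mean())
```

With a constant step size and a doubly stochastic combination matrix, the network average settles exactly on mean(c). The individual agents keep a small disagreement that shrinks as mu decreases. No agent needs access to the other tasks' data, only to its neighbors' parameters.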

Demand-side management offers significant benefits in reducing the energy load in smart grids by balancing consumption demands or by including energy generation and/or storage devices on the user’s side. These techniques coordinate the energy load so that users minimize their monetary expenditure. However, these methods require accurate predictions of the energy consumption profiles, which makes them inflexible to real demand variations. In this paper, we propose a realistic model that accounts for uncertainty in these variations and calculates a robust price for all users in the smart grid. We analyze the existence of solutions for this novel scenario, propose convergent distributed algorithms to find them, and perform simulations considering energy expenditure. We show that this model can effectively reduce the monetary expenses for all users in a real-time market, while at the same time providing a reliable production cost estimate to the energy supplier.

Robust Worst-Case Analysis of Demand-Side Management in Smart Grids

Sergio Valcarcel Macua

Papers

Distributed Policy Evaluation Under Multiple Behavior Strategies

Consensus-based distributed principal component analysis in wireless sensor networks

Dynamic Potential Games With Constraints: Fundamentals and Applications in Communications

Diff-DAC: Distributed Actor-Critic for Average Multitask Deep Reinforcement Learning

Robust Worst-Case Analysis of Demand-Side Management in Smart Grids