Journal Article

Multi-Agent Reinforcement Learning Based Resource Management in MEC- and UAV-Assisted Vehicular Networks

Haixia Peng, +1 more
01 Jan 2021, Vol. 39, Iss. 1, pp. 131-141
TLDR
In the simulation results, the MADDPG-based method converges within 200 training episodes, comparable to the single-agent DDPG (SADDPG)-based one, and achieves higher delay/QoS satisfaction ratios than the SADDPG-based and random schemes.
Abstract
In this paper, we investigate multi-dimensional resource management for unmanned aerial vehicle (UAV)-assisted vehicular networks. To efficiently provide on-demand resource access, the macro eNodeB and UAV, both mounted with multi-access edge computing (MEC) servers, cooperatively make association decisions and allocate proper amounts of resources to vehicles. Since there is no central controller, we formulate the resource allocation at the MEC servers as a distributive optimization problem to maximize the number of offloaded tasks while satisfying their heterogeneous quality-of-service (QoS) requirements, and then solve it with a multi-agent deep deterministic policy gradient (MADDPG)-based method. After the MADDPG model is trained centrally offline, the MEC servers, acting as learning agents, can rapidly make vehicle association and resource allocation decisions during the online execution stage. In our simulation results, the MADDPG-based method converges within 200 training episodes, comparable to the single-agent DDPG (SADDPG)-based one. Moreover, the proposed MADDPG-based resource management scheme achieves higher delay/QoS satisfaction ratios than the SADDPG-based and random schemes.
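The split between centralized offline training and decentralized online execution described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class names, the linear toy models, and the dimensions `OBS_DIM`, `ACT_DIM`, and `N_AGENTS` are all hypothetical, chosen only to show which inputs each component sees. Each actor maps its own local observation to an action, while each agent's critic scores the joint observations and actions of all agents and is needed only during training.

```python
import math
import random


class LinearActor:
    """Toy deterministic policy: local observation -> action in (0, 1)^act_dim."""

    def __init__(self, obs_dim, act_dim, rng):
        self.w = [[rng.uniform(-0.1, 0.1) for _ in range(obs_dim)]
                  for _ in range(act_dim)]

    def act(self, obs):
        # Sigmoid keeps each output in (0, 1), e.g. a fraction of a resource.
        return [1.0 / (1.0 + math.exp(-sum(w * o for w, o in zip(row, obs))))
                for row in self.w]


class CentralCritic:
    """Toy Q-function over the joint observations and actions of all agents."""

    def __init__(self, joint_dim, rng):
        self.w = [rng.uniform(-0.1, 0.1) for _ in range(joint_dim)]

    def q_value(self, all_obs, all_actions):
        # Concatenate every agent's observation, then every agent's action.
        joint = [x for o in all_obs for x in o] + [x for a in all_actions for x in a]
        return sum(w * x for w, x in zip(self.w, joint))


rng = random.Random(0)
# Hypothetical sizes (assumption): two MEC-server agents, small toy vectors.
OBS_DIM, ACT_DIM, N_AGENTS = 4, 2, 2

actors = [LinearActor(OBS_DIM, ACT_DIM, rng) for _ in range(N_AGENTS)]
critics = [CentralCritic(N_AGENTS * (OBS_DIM + ACT_DIM), rng)
           for _ in range(N_AGENTS)]

local_obs = [[rng.random() for _ in range(OBS_DIM)] for _ in range(N_AGENTS)]

# Online execution: each agent acts on its own observation only.
actions = [actor.act(obs) for actor, obs in zip(actors, local_obs)]

# Offline training: each agent's critic evaluates the joint observation-action pair.
q_values = [critic.q_value(local_obs, actions) for critic in critics]
```

Because the critics are used only while training, the actors alone suffice at execution time, which is consistent with the abstract's point that the MEC servers can decide rapidly without a central controller.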

Citations
Journal Article

Enabling Massive IoT Toward 6G: A Comprehensive Survey

TL;DR: A use case of fully autonomous driving is presented to show how 6G supports massive IoT, and breakthrough 6G technologies such as machine learning and blockchain are introduced, with their motivations, applications, and open issues for massive IoT summarized.
Journal Article

Optimizing Federated Learning in Distributed Industrial IoT: A Multi-Agent Approach

TL;DR: In this article, the authors propose a reinforcement on federated learning (RoF) scheme, based on deep multi-agent reinforcement learning, to jointly decide device selection and the allocation of computing and spectrum resources in distributed industrial IoT networks.
Journal Article

Resource Scheduling in Edge Computing: A Survey

TL;DR: In this article, the authors present the architecture of edge computing, under which different collaborative manners for resource scheduling are discussed, and introduce a unified model before summarizing the current works on resource scheduling from three research issues.
Journal Article

RL/DRL Meets Vehicular Task Offloading Using Edge and Vehicular Cloudlet: A Survey

TL;DR: This work is the first to cover RL/DRL-based vehicular task offloading; it provides lessons learned and open research challenges in this field, and discusses possible trends for future research.
Journal Article

Multiagent Deep Reinforcement Learning for Task Offloading and Resource Allocation in Cybertwin-Based Networks

TL;DR: A hierarchical task offloading strategy is presented for delay-tolerant and delay-sensitive missions by integrating edge computing and artificial intelligence into a Cybertwin-based network to guarantee user Quality of Experience (QoE), low latency, and ultrareliable services.
References
Book

Markov Decision Processes: Discrete Stochastic Dynamic Programming

TL;DR: Puterman provides a uniquely up-to-date, unified, and rigorous treatment of the theoretical, computational, and applied research on Markov decision process models, focusing primarily on infinite horizon discrete time models and models with discrete state spaces while also examining models with arbitrary state spaces, finite horizon models, and continuous time discrete state models.
Proceedings Article

Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping

TL;DR: Conditions under which modifications to the reward function of a Markov decision process preserve the optimal policy are investigated to shed light on the practice of reward shaping, a method used in reinforcement learning whereby additional training rewards are used to guide the learning agent.
Proceedings Article

Continuous control with deep reinforcement learning

TL;DR: In this paper, an actor-critic, model-free algorithm based on the deterministic policy gradient is proposed to operate over continuous action spaces, which is able to find policies whose performance is competitive with those found by a planning algorithm with full access to the dynamics of the domain.
Proceedings Article

Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments

TL;DR: In this article, an actor-critic method was used to learn multi-agent coordination policies in cooperative and competitive multi-player RL games, where agent populations are able to discover various physical and informational coordination strategies.
Posted Content

Learning to Communicate with Deep Multi-Agent Reinforcement Learning

TL;DR: By embracing deep neural networks, this work is able to demonstrate end-to-end learning of protocols in complex environments inspired by communication riddles and multi-agent computer vision problems with partial observability.