
Citation (peer-reviewed author version): Wang, X., Wang, C., Li, X., Leung, V. C. M., & Taleb, T. (2020). Federated Deep Reinforcement Learning for Internet of Things with Decentralized Cooperative Edge Caching. IEEE Internet of Things Journal, 7(10), 9441-9455. https://doi.org/10.1109/JIOT.2020.2986803

Federated Deep Reinforcement Learning for
Internet of Things with Decentralized Cooperative
Edge Caching
Xiaofei Wang, Senior Member, IEEE, Chenyang Wang, Student Member, IEEE, Xiuhua Li, Member, IEEE,
Victor C. M. Leung, Fellow, IEEE, and Tarik Taleb, Senior Member, IEEE
Abstract—Edge caching is an emerging technology for addressing massive content access in mobile networks to support rapidly growing Internet of Things (IoT) services and applications. However, most current optimization-based methods lack self-adaptive ability in dynamic environments. Learning-based approaches can tackle this challenge, but they are generally proposed in a centralized way, so network resources may be overconsumed during the training and data transmission process. To address these complex and dynamic control issues, we propose a FederAted Deep reinforcement learning-based cooperative Edge caching (FADE) framework. FADE enables base stations (BSs) to cooperatively learn a shared predictive model by taking the first-round training parameters of the BSs as the initial input of the local training, and then uploads the near-optimal local parameters to the BSs to participate in the next round of global training. Furthermore, we prove that the proposed FADE achieves expectation convergence. Trace-driven simulation results show that the proposed FADE framework reduces performance loss by 92% and average system payment by 60% compared with the centralized deep reinforcement learning (DRL) algorithm, incurs only a 4% performance loss relative to a desirable omniscient oracle algorithm, and improves network performance by 7%, 11%, and 9% over the existing least recently used (LRU), least frequently used (LFU), and first-in first-out (FIFO) schemes, respectively.

Part of this work was presented at the IEEE Wireless Communications and Networking Conference (WCNC), April 15-18, 2019, Marrakesh, Morocco. This work is supported in part by the National Key R&D Program of China under Grants No. 2019YFB2101901, 2018YFC0809803, 2018YFF0214700, and 2018YFF0214706; China NSFC under Grants 61702364, 61902044, and 61672117; China NSFC GD Joint Fund U1701263; the Chongqing Research Program of Basic Research and Frontier Technology under Grant No. cstc2019jcyj-msxmX0589; the Chinese National Engineering Laboratory for Big Data System Computing Technology; the Canadian NSERC; the European Union's Horizon 2020 Research and Innovation Program through the MonB5G Project under Grant No. 871780; the Academy of Finland 6Genesis project under Grant No. 318927; and the Academy of Finland CSN project under Grant No. 311654. (Corresponding author: Xiuhua Li.)
X. Wang and C. Wang are with the College of Intelligence and Computing, Tianjin University, Tianjin 300072, China (e-mail: {xiaofeiwang, chenyangwang}@tju.edu.cn).
X. Li is with the State Key Laboratory of Power Transmission Equipment and System Security and New Technology, Chongqing University, Chongqing 401331, China, with the School of Big Data & Software Engineering, Chongqing University, Chongqing 401331, China, and with the Key Laboratory of Dependable Service Computing in Cyber Physical Society (Chongqing University), Ministry of Education, China (e-mail: lixiuhua1988@gmail.com).
V. C. M. Leung is with the College of Computer Science & Software Engineering, Shenzhen University, Shenzhen 518060, China, and with Electrical and Computer Engineering, The University of British Columbia, Vancouver, BC V6T 1Z4, Canada (e-mail: vleung@ieee.org).
T. Taleb is with the Department of Communications and Networking, School of Electrical Engineering, Aalto University, 02150 Espoo, Finland, with Information Technology and Electrical Engineering, Oulu University, Pentti Kaiteran katu 1, 90570 Oulu, Finland, and with the Department of Computer and Information Security, Sejong University, 209 Neungdong-ro, Gunja-dong, Gwangjin-gu, Seoul 05006, South Korea (e-mail: Tarik.Taleb@aalto.fi).

Index Terms—Internet of Things, Edge Caching, Cooperative Caching, Hit Rate, Deep Reinforcement Learning, Federated Learning.
I. INTRODUCTION
With the rapid enhancement of wireless access technologies and the Internet of Things (IoT), massive Internet services and applications are gradually migrating to mobile networks. Owing to the extensive deployment of sensors, massive interconnections via IoT devices (e.g., smartphones, tablets, and smartwatches) are embedded in people's daily lives. For instance, as reported in [1], the number of IoT devices will surpass 10 billion by 2020, and these devices impose increasingly stringent real-time service quality requirements (e.g., heart rate monitoring, step counting, and outdoor live video). Facing rapidly rising network traffic loads and growing user demands on Quality of Service/Experience (QoS/QoE), mobile networks and the IoT confront enormous challenges [2]–[5].

In particular, by integrating powerful sensing and computing functions, IoT devices support intelligent identification, behavior tracking, daily management, heart rate monitoring, etc. [6]–[8]. Meanwhile, short-range communication technologies enable these IoT devices to form a variety of ad hoc network application scenarios (e.g., content airdrops and the Apple Edge Cache service). In such scenarios, reliable content transmission may fail due to performance fluctuations of nearby IoT devices. Thus, it is feasible to cache content on multiple nearby IoT devices with sufficient storage capabilities [9]–[11].

Moreover, edge computing has been regarded as a promising technology that brings computation and caching services from the mobile network operator (MNO) or the cloud into proximity with the network edges (e.g., base stations (BSs) and IoT devices). This paper considers IoT devices that are operated by people (e.g., to request or receive messages, or to perform sensing). We consider a general cooperative edge caching-supported IoT architecture, illustrated in Fig. 1. Related applications that improve the resiliency of QoS/QoE and the performance of content services (e.g., system update packages) delivered from Internet service providers (SPs) to IoT devices already exist. For instance, Apple Inc. recently launched the Apple Edge Cache service [12], enabling the delivery of Apple content services directly to equipment within SP partner networks.

Fig. 1. Cooperative edge caching-supported IoT architecture.

Caching requested contents in edge nodes (e.g., BSs) is similar to caching in IoT devices, and improves the efficiency of content access compared with excessive downloading via backhaul links [13]–[15].
Many efforts have been devoted to addressing resource allocation issues in traffic offloading for massive IoT devices (especially mobile devices). The studies in [16]–[19] investigated architectures of collaborative edge caching in mobile networks. The authors in [20]–[23] optimized content access delay through collaborative caching among BSs, thereby improving the QoS of users. Other schemes have been proposed to design and optimize edge caching frameworks from the perspectives of energy consumption [24] and context awareness [25].

Recently, learning-based approaches have also been widely utilized to design and optimize edge caching [26]–[29]. For instance, deep reinforcement learning (DRL) was considered for comprehensive resource allocation in [30]. This approach maximized the long-term reward of energy consumption and required no prior knowledge of the considered networks, but it assumed that devices are sufficiently powerful to train DRL agents independently, whereas IoT devices can only support lightweight neural networks with small-scale data processing. In addition, most traditional DRL algorithms train in BSs or a datacenter by sharing the original data, leading to a large amount of network resource consumption during the data transmission process [31], [32].
To cope with the dynamic environment and keep training data local to IoT devices, we aim to optimize the edge caching problem in a long-term and decentralized fashion. Motivated by the aforementioned observations, we propose a FederAted Deep reinforcement learning-based cooperative Edge caching algorithm (FADE), which enables IoT devices (or user equipment (UE))¹ to cooperatively learn a shared model while keeping all the training data on the individual device. The proposed FADE is performed in a decentralized manner. First, a UE obtains the initial training model from the local BS and improves it by learning from the local data on the device, summarizing the changes as an update. Then, only this update is sent to the BS, where the updates from all participating UEs are averaged to improve the shared model.

¹In this paper, we use UE to denote the IoT device hereafter.
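The round structure described above is essentially federated averaging across a BS's associated UEs. The following minimal Python sketch illustrates one such round; the UE class, the local_update stand-in, and all numeric values are illustrative assumptions of ours, not the paper's implementation (FADE's actual local training is the double-DQN procedure of Sec. IV).

```python
import numpy as np

class UE:
    """Illustrative IoT device holding a private local dataset."""
    def __init__(self, num_samples, rng):
        self.num_samples = num_samples  # size of the private local dataset
        self.rng = rng

    def local_update(self, params, lr=0.01):
        """Stand-in for local training: perturb the shared parameters
        as if a few SGD steps had been taken on the private data."""
        return {k: v - lr * self.rng.normal(size=v.shape) for k, v in params.items()}

def fade_round(global_params, ues):
    """One illustrative round: each UE refines the shared model locally,
    and the BS averages the returned parameters, weighted by dataset size."""
    locals_, sizes = [], []
    for ue in ues:
        fresh_copy = {k: v.copy() for k, v in global_params.items()}  # model download
        locals_.append(ue.local_update(fresh_copy))                   # on-device training
        sizes.append(ue.num_samples)
    total = float(sum(sizes))
    # weighted average of local parameters -> next shared model
    return {k: sum((s / total) * p[k] for s, p in zip(sizes, locals_))
            for k in global_params}

rng = np.random.default_rng(0)
model = {"w": np.zeros((4, 4)), "b": np.zeros(4)}
for _ in range(3):  # three global rounds
    model = fade_round(model, [UE(100, rng), UE(300, rng)])
print(model["b"])
```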
The main contributions of this paper are summarized as follows:
• We investigate the issue of federated DRL for IoT with decentralized cooperative edge caching. In particular, we model the content replacement problem as a Markov decision process (MDP) and propose a federated learning framework based on the double deep Q-network (DQN) to address the problem of data sampling in huge discontinuous spaces.
• We propose the FADE framework, which enables fast training and decouples the learning process from data stored in the cloud in a distributed-centralized way, keeping data training on the local UEs. In addition, we show that the loss function of FADE is L-smooth and µ-strongly convex, and derive its expectation of convergence.
• Trace-driven simulation results show that the proposed FADE framework reduces performance loss by 92% and average system payment by 60% compared with the centralized DRL algorithm, and outperforms the existing LRU, LFU, and FIFO schemes by 7%, 11%, and 9%, respectively.

The remainder of this paper is organized as follows. Sec. II summarizes previous work. We establish the system model and formulate the optimization problem in Sec. III. The framework design of the proposed FADE is presented in Sec. IV. Trace-driven simulation results evaluate the effectiveness of the proposed framework in Sec. V. Finally, Sec. VI concludes this paper.
II. RELATED WORK
For caching-supported IoT networks, existing studies can be
divided into the following two categories.
The first category utilizes traditional methods based on convex optimization or probability modeling to address the content placement problem in IoT networks. For instance, [33] focused on maximizing traffic offloading and reducing system costs by designing a hierarchical edge caching strategy. [34] considered the problem of optimal bandwidth allocation and minimized the average transmission delay by deploying a greedy algorithm for cooperative edge caching. Vural et al. [35] proposed a content replacement strategy with multi-attribute joint optimization in terms of content lifetime, the request rate of users, and the hop information between the source user and the destination. To efficiently optimize the caching resources of IoT devices, [36] exploited probabilistic caching in heterogeneous IoT networks in order to improve traffic offloading. The content caching problem in IoT networks requires continuous optimization; in other words, various attributes (e.g., content popularity and user mobility) in IoT networks are constantly evolving. However, these strategies often struggle to adapt to dynamic environments and are hard to deploy in practice because they require global information.

The second category is based on learning algorithms, such as machine learning/deep learning, which learn key attribute features (e.g., user request behavior, content popularity, and user mobility distribution) in the network to optimize the content caching strategy. The studies in [37], [38] showed that reinforcement learning (RL) has great potential in the design of content caching schemes at BSs. Specifically, [37] proposed a cache replacement strategy based on Q-learning to reduce the traffic load in future cellular networks, and RL was also employed for cache placement [38] by using a multi-armed bandit (MAB). Chen et al. [39] proposed a popularity-based caching strategy for IoT networks that deploys deep neural networks to predict the near-future popularity of IoT data. However, most centralized learning algorithms tend to incur high cache diversity and storage utilization, which leads to excessive consumption of network communication resources. On the other hand, distributed learning methods require large cache and action spaces, which causes the same problems.
III. SYSTEM MODEL AND PROBLEM FORMULATION
In this section, we first introduce the topology of cooperative edge caching-supported IoT systems and then discuss the delay model. Next, the cache replacement model is presented. Finally, we formulate the optimization problem of edge caching for IoT systems. Key parameters are listed in Table I.
A. Topology of Cooperative Edge Caching-Supported IoT Systems

The topology of cooperative edge caching-supported IoT networks is illustrated in Fig. 2. In the considered IoT network, a large number of geographically distributed UEs (e.g., smartphones, tablets, and smartwatches) are served by BSs via wireless cellular links, and the BSs are interconnected via wired optical cables. Each BS is deployed with an edge server for computation and caching; thus, each BS can cache various contents to satisfy the content service demands of IoT devices. As a result, UEs can fetch their requested contents either locally from edge servers or directly by downloading the contents from SPs (in the cloud) to the BSs via the MNO core.

Considering a hierarchical IoT network, $\mathcal{N} = \{1, 2, \ldots, N_b\}$ fully connected BSs, each with a finite cache size $C$, and $\mathcal{U} = \{1, 2, \ldots, N_u\}$ UEs are distributed in the service area. Denote $\mathcal{F} = \{1, 2, \ldots, F\}$ as a library of contents that are provided by Internet SPs and that all UEs may access in the system for a relatively long time. Denote $D_f$, $f \in \mathcal{F}$, as the size of content $f$.²

²We consider $D = \{D_1, D_2, \ldots, D_f, \ldots, D_F\}$ to be the sizes of the local datasets as well, which will be introduced in Section IV-B.
Let $(P_f)_{F\times 1}$ be the global popularity, which indicates the probability distribution of content $f$ requested by all UEs in the network, and let $p_{nf}$ be the local popularity of content $f$ under BS $n$. We consider that $P_f = \sum_{n\in\mathcal{N}} p_{nf}$, and $(P_f)_{F\times 1}$ follows the Mandelbrot-Zipf (MZipf) distribution³ [42] as

$$P_f = \frac{(I_f + \tau)^{-\beta}}{\sum_{i\in\mathcal{F}} (I_i + \tau)^{-\beta}}, \quad f \in \mathcal{F}, \qquad (1)$$

where $I_f$ is the rank of content $f$ in descending order of content popularity, and $\tau$ and $\beta$ denote the plateau factor and skewness factor, respectively.

³Note that it is also widely used in mobile IoT scenarios [40], [41].

TABLE I
KEY PARAMETERS AND NOTATIONS.

$F$ : Total number of contents
$N_b$ : Number of BSs
$N_u$ : Number of UEs
$C$ : Cache size
$\mathcal{F}$ : Library of popular contents
$D_f$ : Size of content $f$
$(P_f)_{F\times 1}$ : Global popularity of content $f$
$\mathcal{M}$ : Wireless channels
$d_n^c$, $d^P$, $d^b$ : Transmission delays between BS $n$ and UE, between BS and SPs, and between BSs, respectively
$v_{u,n}$ : Downlink data rate between BS $n$ and UE $u$
$P_{u,f}$ : Preference of UE $u$ for content $f$
$s_{i,n}^c$ : Content caching state of BS $n$ at decision epoch $i$
$\chi_i$ : State of the BS at decision epoch $i$
$\Phi(\chi_i) = \{a_i^{\mathrm{local}}, a_i^{\mathrm{coBS}}, a_i^{\mathrm{SP}}\}$ : System action at state $\chi_i$
$R(\chi, \Phi)$ : Reward function
$Q(\chi, \Phi; \mathbf{w}_i)$, $\hat{Q}(\chi, \Phi; \hat{\mathbf{w}}_i)$ : Q-values of the MainNet and TargetNet, respectively
$L(\mathbf{w}_i)$ : Loss function of the double DQN
$F_j(w)$ : Loss function of federated DRL
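For a quick numerical illustration of (1), the sketch below builds an MZipf popularity vector; the parameter values are arbitrary examples, not the paper's simulation settings. Note how the plateau factor τ flattens the head of the distribution relative to plain Zipf (τ = 0).

```python
import numpy as np

def mzipf_popularity(F, tau, beta):
    """Mandelbrot-Zipf popularity of Eq. (1): P_f ∝ (I_f + tau)^(-beta),
    where I_f is the popularity rank of content f (1 = most popular)."""
    ranks = np.arange(1, F + 1)
    weights = (ranks + tau) ** (-beta)
    return weights / weights.sum()

P = mzipf_popularity(F=1000, tau=5.0, beta=0.8)  # example values only
print(P[:3], P.sum())  # most-popular head; the vector sums to 1
```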
B. Delay Model

We consider the content access delay for a UE as the round-trip time to receive the requested content. As shown in Fig. 2, $d^b$ denotes the transmission delay of BS cooperation, and $d^P$ is the delay between the BS and the SPs. The wireless transmission delay $d^c$ can be regarded as the period for a UE to obtain the content from the local BS. Considering that $\mathcal{M} = \{1, 2, \ldots, M\}$ wireless channels are deployed, $a_u \in \mathcal{M}$ is the channel assigned to UE $u$ by the BS. Similar to [43], we can obtain the downlink data rate between BS $n$ and UE $u$ as

$$v_{u,n} = B \log_2 \left(1 + \frac{q_u g_{u,n}}{\sigma^2 + \sum_{v\in\mathcal{U}\setminus\{u\}:\, a_v = a_u} q_v g_{v,n}}\right), \qquad (2)$$

where $B$ denotes the channel bandwidth, $\sigma^2$ represents the background noise power, $q_u$ is the transmission power from BS $n$ to UE $u$, and the channel gain $g_{u,n}$ is determined by the distance $l_{u,n}$ between BS $n$ and UE $u$.

Fig. 2. Topology of the caching-supported IoT system.
The quantity $P_{u,f} = r_{u,f}/R_u$, $f \in \mathcal{F}$, reflects the preference of UE $u$ for content $f$, with $\sum_{f\in\mathcal{F}} P_{u,f} = 1$, where $r_{u,f}$ is the number of requests of UE $u$ for content $f$ and $R_u$ is the total number of requests of UE $u$ in the network. Furthermore, we define $P_{u,nf} = P_{u,f} \cdot a_{un}$ as the local preference of UE $u$ for content $f$ under BS $n$, where $a_{un}$ is the association probability of UE $u$ and BS $n$. Thus, the wireless transmission delay $d_n^c$ can be obtained as

$$d_n^c = \sum_{u\in\mathcal{U}} \sum_{f\in\mathcal{F}} P_{u,nf} \frac{D_f}{v_{u,n}}. \qquad (3)$$
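To make (2) and (3) concrete, the following sketch computes a UE's downlink rate under co-channel interference and then the aggregate wireless delay $d_n^c$; the bandwidth, powers, gains, and preference matrix are illustrative assumptions, not values from the paper.

```python
import numpy as np

def downlink_rate(B, q_u, g_u, interferers, sigma2):
    """Eq. (2): Shannon rate with interference from UEs sharing channel a_u.
    interferers: list of (q_v, g_v) pairs with a_v == a_u."""
    interference = sum(q * g for q, g in interferers)
    return B * np.log2(1.0 + q_u * g_u / (sigma2 + interference))

def wireless_delay(P_unf, D, v):
    """Eq. (3): d_n^c = sum_u sum_f P_{u,nf} * D_f / v_{u,n}.
    P_unf: (U, F) local preferences; D: (F,) content sizes in bits;
    v: (U,) downlink rates in bit/s."""
    return float((P_unf * D[None, :] / v[:, None]).sum())

# toy example with 2 UEs and 3 contents (all values illustrative)
v = np.array([downlink_rate(10e6, 1.0, 1e-7, [(1.0, 2e-8)], 1e-9),
              downlink_rate(10e6, 1.0, 5e-8, [], 1e-9)])
P_unf = np.array([[0.3, 0.1, 0.0], [0.2, 0.2, 0.2]])
D = np.array([8e6, 4e6, 2e6])
print(wireless_delay(P_unf, D, v))  # seconds
```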
C. Cache Replacement Model

We model the process of content cache replacement in a BS as a Markov decision process (MDP) [44]. The cache and request states, the system action, and the feedback reward are described as follows.

1) Cache and Request State: During each decision epoch $i$, we define the content cache state as $s_{i,n}^c := \{s_{i,n,f}^c\}$, $n \in \mathcal{N}$, $f \in \mathcal{F}$. Here, $s_{i,n,f}^c$ is the cache state of BS $n$ for content $f \in \mathcal{F}$: $s_{i,n,f}^c = 1$ represents that BS $n$ caches content $f$, and $s_{i,n,f}^c = 0$ otherwise. Furthermore, we use $s_{i,u}^r := \{s_{i,u,f}^r\}$, $u \in \mathcal{U}$, $f \in \mathcal{F}$, to denote the request state of UE $u$, where $s_{i,u,f}^r$ is the request state of $u$ for content $f$. Thus, we derive the cache and request state during each decision epoch $i$ as

$$\chi_i = (s_{i,u}^r, s_{i,n}^c) \in \mathcal{X} \overset{\mathrm{def}}{=} \{1, 2, \ldots, F\} \times \prod_{f\in\mathcal{F}} \mathcal{P}. \qquad (4)$$
2) System Action: To adapt to the continuous changes of the dynamic environment, BSs can choose which contents should be replaced and decide where requests are processed (via the local BS, BS cooperation, or the SPs). We denote $\Phi(\chi_i)$ as the system action at state $\chi_i$, and the action space for all cooperative BSs is defined as

$$\Phi(\chi_i) = \{a_i^{\mathrm{local}}, a_i^{\mathrm{coBS}}, a_i^{\mathrm{SP}}\}, \qquad (5)$$

where the three types of system action $\Phi(\chi_i)$ are as follows:

a) Local Processing Action: We denote $a_i^{\mathrm{local}} \overset{\mathrm{def}}{=} [a_{i,0}^{\mathrm{local}}, a_{i,1}^{\mathrm{local}}, \ldots, a_{i,F}^{\mathrm{local}}]$ as the local processing action when the cache state controlled by the local BS is available, where $a_{i,f}^{\mathrm{local}} \in \{0, 1\}$, $f \in \mathcal{F}$; $a_{i,f}^{\mathrm{local}} = 1$ indicates that content $f$ needs to be replaced by the currently requested content, while $a_{i,f}^{\mathrm{local}} = 0$ is the opposite. In this case, the content request is processed locally.

b) Cooperation Processing Action: If the requested content $f$ is not cached in the local BS, the UE's request needs to be routed to a neighbor BS. We define $a_i^{\mathrm{coBS}} \overset{\mathrm{def}}{=} [a_{i,1}^{\mathrm{coBS}}, \ldots, a_{i,N}^{\mathrm{coBS}}]$ as the cooperation processing action, where $a_{i,n}^{\mathrm{coBS}} \in \{0, 1\}$, and $a_{i,n}^{\mathrm{coBS}} = 1$ denotes that BS $n$ is selected to serve the current UE's request.

c) Remote Processing Action: If the UE can obtain the requested content $f$ from neither the local BS nor its neighbors, the local BS decides whether to forward the request to the SPs, denoted as $a_i^{\mathrm{SP}} \in \{0, 1\}$, where $a_i^{\mathrm{SP}} = 1$ represents that the request will be handled by the SPs. In this case, the UE obtains the requested content $f$ directly from the remote SPs.
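For intuition about how a DQN agent consumes (4) and (5), the sketch below encodes the state $\chi_i$ as a flat vector and maps a discrete action index back to the three action types; this specific encoding is our own illustrative choice, not the paper's exact implementation in Sec. IV.

```python
import numpy as np

F, N = 100, 4  # illustrative library size and number of BSs

def encode_state(requested_f, cache_bits):
    """chi_i = (s^r_{i,u}, s^c_{i,n}): one-hot requested content
    concatenated with the binary cache state of the local BS."""
    s_r = np.zeros(F)
    s_r[requested_f] = 1.0
    return np.concatenate([s_r, cache_bits.astype(np.float32)])

def decode_action(a):
    """Map a flat action index to one of the three action types of Eq. (5)."""
    if a < F:
        return ("local_replace", a)       # a^local_{i,f} = 1: evict content a
    if a < F + N:
        return ("cooperate_bs", a - F)    # a^coBS_{i,n} = 1: fetch from BS n
    return ("forward_to_sp", None)        # a^SP_i = 1: fetch from the SPs

state = encode_state(requested_f=7, cache_bits=np.zeros(F))
print(state.shape, decode_action(F + 2))  # -> (200,) ('cooperate_bs', 2)
```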
3) System Reward: When the local BS takes action $\Phi(\chi_i)$ at state $\chi_i$, it obtains a feedback reward. To satisfy the QoS of UEs, our goal is to minimize the average content access latency of the system.

Because the BS-BS and BS-SP links use fiber communication, $d_n^c$ may be far greater than $d^b$ and $d^P$. Based on the communication model in (3) in Sec. III-B, to achieve the maximum system reward while guaranteeing the objective of minimizing the average content access delay, we use a negative exponential function to normalize the reward. Thus, we derive the reward function as

$$R_n(\chi_i, \Phi(\chi_i)) = \begin{cases} p_{nf}\, e^{-\xi_1 d_n^c}, & \text{Cellular Service} \\ p_{nf}\, e^{-(\xi_1 d_n^c + \xi_2 d^b)}, & \text{BS-BS Cooperation} \\ p_{nf}\, e^{-(\xi_1 d_n^c + \xi_3 d^P)}, & \text{Backhaul Service} \end{cases} \qquad (6)$$

where $\xi_1 + \xi_2 + \xi_3 = 1$ and $\xi_1 \le \xi_2 < \xi_3$. Here, $p_{nf}\, e^{-\xi_1 d_n^c}$ is the reward when a UE obtains content $f$ from the BS only via cellular service; $p_{nf}\, e^{-(\xi_1 d_n^c + \xi_2 d^b)}$ means that the UE is served through BS-BS cooperation; and when a UE has to be served by the MNO core via backhaul links, the reward is $p_{nf}\, e^{-(\xi_1 d_n^c + \xi_3 d^P)}$.
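Once the serving path for a request is known, the reward in (6) can be evaluated directly. A small sketch follows, with ξ weights chosen only to satisfy ξ₁ + ξ₂ + ξ₃ = 1 and ξ₁ ≤ ξ₂ < ξ₃; all delay values are illustrative.

```python
import math

def reward(p_nf, d_c, path, d_b=0.0, d_p=0.0,
           xi=(0.2, 0.3, 0.5)):  # illustrative weights: xi1 <= xi2 < xi3
    """Eq. (6): negative-exponential reward, larger when delay is smaller."""
    xi1, xi2, xi3 = xi
    if path == "cellular":                 # served by the local BS
        return p_nf * math.exp(-xi1 * d_c)
    if path == "cooperation":              # served via a neighbor BS
        return p_nf * math.exp(-(xi1 * d_c + xi2 * d_b))
    return p_nf * math.exp(-(xi1 * d_c + xi3 * d_p))  # backhaul to the SPs

print(reward(0.05, d_c=0.3, path="cooperation", d_b=0.1))
```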
D. Problem Formulation

Based on (6), our optimization objective is to maximize the expected long-term reward from an arbitrary initial state $\chi_1$ as

$$R_{\mathrm{long}} = \max_{\Phi} \mathbb{E}_{\Phi}\left[\lim_{I\to\infty} \frac{1}{I} \sum_{i=1}^{I} R(\chi_i, \Phi(\chi_i)) \,\Big|\, \chi_1 = \chi\right], \qquad (7)$$

where $R(\chi_i, \Phi(\chi_i))$ is the sum of $R_n(\chi_i, \Phi(\chi_i))$ over the BSs.

Moreover, a single-agent infinite-horizon MDP with the discounted utility in (8) can generally be utilized to approximate the expected infinite-horizon undiscounted value, especially when $\gamma \in [0, 1)$ approaches 1:

$$V(\chi, \Phi) = \mathbb{E}_{\Phi}\left[\sum_{i=1}^{\infty} \gamma^{i-1} \cdot R(\chi_i, \Phi(\chi_i)) \,\Big|\, \chi_1 = \chi\right]. \qquad (8)$$
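To see why the discounted value in (8) approximates the undiscounted average-reward objective in (7) as γ → 1, note that (1 − γ)V(χ, Φ) tends to the long-run average reward. The toy sketch below checks this numerically, with a random reward stream standing in for R(χᵢ, Φ(χᵢ)).

```python
import numpy as np

def discounted_value(rewards, gamma):
    """V estimate of Eq. (8) for one trajectory: sum_i gamma^(i-1) * R_i."""
    return sum(gamma ** i * r for i, r in enumerate(rewards))

rng = np.random.default_rng(0)
rewards = rng.uniform(0.0, 1.0, size=10000)  # stand-in reward stream, mean 0.5
for gamma in (0.9, 0.99, 0.999):
    # (1 - gamma) * V approaches the long-run average reward of Eq. (7),
    # i.e., roughly 0.5 here as gamma approaches 1
    print(gamma, (1 - gamma) * discounted_value(rewards, gamma))
```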
