
Citation (peer-reviewed author version): Wang, X., Wang, C., Li, X., Leung, V. C. M., & Taleb, T. (2020). Federated Deep Reinforcement Learning for Internet of Things with Decentralized Cooperative Edge Caching. IEEE Internet of Things Journal, 7(10), 9441-9455. https://doi.org/10.1109/JIOT.2020.2986803

Federated Deep Reinforcement Learning for
Internet of Things with Decentralized Cooperative
Edge Caching
Xiaofei Wang, Senior Member, IEEE, Chenyang Wang, Student Member, IEEE, Xiuhua Li, Member, IEEE,
Victor C. M. Leung, Fellow, IEEE, and Tarik Taleb, Senior Member, IEEE
Abstract—Edge caching is an emerging technology for addressing massive content access in mobile networks to support rapidly growing Internet of Things (IoT) services and applications. However, most current optimization-based methods lack self-adaptive ability in dynamic environments. Learning-based approaches can tackle this challenge, but they are generally proposed in a centralized way, so network resources may be overconsumed during the training and data transmission process. To address these complex and dynamic control issues, we propose a FederAted Deep reinforcement learning-based cooperative Edge caching (FADE) framework. FADE enables base stations (BSs) to cooperatively learn a shared predictive model by taking the first-round training parameters of the BSs as the initial input of the local training, and then uploads the near-optimal local parameters to the BSs to participate in the next round of global training. Furthermore, we prove that the proposed FADE achieves expectation convergence. Trace-driven simulation results show that the proposed FADE framework reduces performance loss by 92% and average system payment by 60% compared with the centralized deep reinforcement learning (DRL) algorithm, incurs only a 4% performance loss relative to a desirable omniscient oracle algorithm, and improves network performance by 7%, 11%, and 9% over the existing least recently used (LRU), least frequently used (LFU), and first-in first-out (FIFO) schemes, respectively.

Part of this work was presented at the IEEE Wireless Communications and Networking Conference (WCNC), April 15-18, 2019, Marrakesh, Morocco. This work is supported in part by the National Key R&D Program of China under Grants No. 2019YFB2101901, 2018YFC0809803, 2018YFF0214700, and 2018YFF0214706; China NSFC under Grants 61702364, 61902044, and 61672117; China NSFC GD Joint Fund U1701263; the Chongqing Research Program of Basic Research and Frontier Technology under Grant No. cstc2019jcyj-msxmX0589; the Chinese National Engineering Laboratory for Big Data System Computing Technology; the Canadian NSERC; the European Union's Horizon 2020 Research and Innovation Program through the MonB5G Project under Grant No. 871780; the Academy of Finland 6Genesis project under Grant No. 318927; and the Academy of Finland CSN project under Grant No. 311654. (Corresponding author: Xiuhua Li.)
X. Wang and C. Wang are with the College of Intelligence and Computing, Tianjin University, Tianjin 300072, China (e-mail: {xiaofeiwang, chenyangwang}@tju.edu.cn).
X. Li is with the State Key Laboratory of Power Transmission Equipment and System Security and New Technology, Chongqing University, Chongqing 401331, China, with the School of Big Data & Software Engineering, Chongqing University, Chongqing 401331, China, and with the Key Laboratory of Dependable Service Computing in Cyber Physical Society (Chongqing University), Ministry of Education, China (e-mail: lixiuhua1988@gmail.com).
V. C. M. Leung is with the College of Computer Science & Software Engineering, Shenzhen University, Shenzhen 518060, China, and with Electrical and Computer Engineering, The University of British Columbia, Vancouver, BC V6T 1Z4, Canada (e-mail: vleung@ieee.org).
T. Taleb is with the Department of Communications and Networking, School of Electrical Engineering, Aalto University, 02150 Espoo, Finland, with Information Technology and Electrical Engineering, Oulu University, Pentti Kaiteran katu 1, 90570 Oulu, Finland, and with the Department of Computer and Information Security, Sejong University, 209 Neungdong-ro, Gunja-dong, Gwangjin-gu, Seoul 05006, South Korea (e-mail: Tarik.Taleb@aalto.fi).

Index Terms—Internet of Things, Edge Caching, Cooperative Caching, Hit Rate, Deep Reinforcement Learning, Federated Learning.
I. INTRODUCTION
With the rapid enhancement of wireless access technologies and the Internet of Things (IoT), massive Internet services and applications are gradually migrating to mobile networks. Owing to the extensive deployment of sensors, massive interconnections via IoT devices (e.g., smartphones, tablets, and smartwatches) are embedded in people's daily lives. For instance, as reported in [1], the number of IoT devices will surpass 10 billion by 2020, and these devices impose increasingly stringent real-time service quality requirements (e.g., heart rate monitoring, step counting, and outdoor live video). Facing rapidly rising network traffic loads and growing user demands on Quality of Service/Experience (QoS/QoE), mobile networks and the IoT confront enormous challenges [2]–[5].

In particular, by integrating powerful sensing and computing functions, IoT devices support intelligent identification, behavior tracking, daily management, heart rate monitoring, etc. [6]–[8]. Meanwhile, short-range communication technologies enable these IoT devices to form a variety of ad hoc network application scenarios (e.g., content airdrops and the Apple Edge Cache service). In such scenarios, reliable content transmission may fail due to performance fluctuations of nearby IoT devices. Thus, it is feasible to cache content on multiple nearby IoT devices with sufficient storage capabilities [9]–[11].

Moreover, edge computing has been regarded as a promising technology that brings computation and caching services from the mobile network operator (MNO) or the cloud into proximity with the network edges (e.g., base stations (BSs) and IoT devices). This paper considers IoT devices that are operated by people (e.g., to request or receive messages, or to perform sensing). We consider a general cooperative edge caching-supported IoT architecture, illustrated in Fig. 1. Related applications that improve the resiliency of QoS/QoE and the performance of content services (e.g., system update packages) delivered from Internet service providers (SPs) to IoT devices already exist. For instance, Apple Inc. recently launched the Apple Edge Cache service [12], enabling the delivery of Apple content services directly to equipment within SP partner networks.

Fig. 1. Cooperative edge caching-supported IoT architecture.

Caching requested contents in edge nodes (e.g., BSs) is similar to caching in IoT devices, and improves the efficiency of content access compared with excessive downloading via backhaul links [13]–[15].
Many efforts have been devoted to addressing resource allocation issues in traffic offloading for massive IoT devices (especially mobile devices). The studies in [16]–[19] investigated architectures of collaborative edge caching in mobile networks. The authors in [20]–[23] optimized content access delay through collaborative caching among BSs, thereby improving the QoS of users. Other schemes have been proposed to design and optimize edge caching frameworks from the perspectives of energy consumption [24] and context awareness [25].

Recently, learning-based approaches have also been widely utilized to design and optimize edge caching [26]–[29]. For instance, deep reinforcement learning (DRL) was considered for comprehensive resource allocation in [30]. This approach maximized the long-term reward of energy consumption and required no prior knowledge of the considered networks, but it assumed that devices are sufficiently powerful to train DRL agents independently, whereas IoT devices can only support lightweight neural networks with small-scale data processing. In addition, most traditional DRL algorithms train in BSs or a datacenter by sharing the original data, leading to a large amount of network resource consumption during the data transmission process [31], [32].
To cope with the dynamic environment and keep training data local to IoT devices, we aim to optimize the edge caching problem in a long-term and decentralized fashion. Motivated by the aforementioned observations, we propose a FederAted Deep reinforcement learning-based cooperative Edge caching algorithm (FADE), which enables IoT devices (or user equipment (UE))¹ to cooperatively learn a shared model while keeping all the training data on the individual device. The proposed FADE is performed in a decentralized manner. First, a UE obtains the initial training model from the local BS and improves it by learning from the local data on the device, summarizing the changes as an update. Then, only this update is sent to the BS, where the updates from all participating UEs are averaged to improve the shared model.

¹In this paper, we use UE to denote the IoT device hereafter.
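The round structure described above is essentially federated averaging across a BS's associated UEs. The following minimal Python sketch illustrates one such round; the UE class, the local_update stand-in, and all numeric values are illustrative assumptions of ours, not the paper's implementation (FADE's actual local training is the double-DQN procedure of Sec. IV).

```python
import numpy as np

class UE:
    """Illustrative IoT device holding a private local dataset."""
    def __init__(self, num_samples, rng):
        self.num_samples = num_samples  # size of the private local dataset
        self.rng = rng

    def local_update(self, params, lr=0.01):
        """Stand-in for local training: perturb the shared parameters
        as if a few SGD steps had been taken on the private data."""
        return {k: v - lr * self.rng.normal(size=v.shape) for k, v in params.items()}

def fade_round(global_params, ues):
    """One illustrative round: each UE refines the shared model locally,
    and the BS averages the returned parameters, weighted by dataset size."""
    locals_, sizes = [], []
    for ue in ues:
        fresh_copy = {k: v.copy() for k, v in global_params.items()}  # model download
        locals_.append(ue.local_update(fresh_copy))                   # on-device training
        sizes.append(ue.num_samples)
    total = float(sum(sizes))
    # weighted average of local parameters -> next shared model
    return {k: sum((s / total) * p[k] for s, p in zip(sizes, locals_))
            for k in global_params}

rng = np.random.default_rng(0)
model = {"w": np.zeros((4, 4)), "b": np.zeros(4)}
for _ in range(3):  # three global rounds
    model = fade_round(model, [UE(100, rng), UE(300, rng)])
print(model["b"])
```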
The main contributions of this paper are summarized as follows:
• We investigate the issue of federated DRL for IoT with decentralized cooperative edge caching. In particular, we model the content replacement problem as a Markov decision process (MDP) and propose a federated learning framework based on the double deep Q-network (DQN) to address the problem of data sampling in huge discontinuous spaces.
• We propose the FADE framework, which enables fast training and decouples the learning process from data stored in the cloud in a distributed-centralized way, keeping data training on the local UEs. In addition, we show that the loss function of FADE is L-smooth and µ-strongly convex, and derive its expectation of convergence.
• Trace-driven simulation results show that the proposed FADE framework reduces performance loss by 92% and average system payment by 60% compared with the centralized DRL algorithm, and outperforms the existing LRU, LFU, and FIFO schemes by 7%, 11%, and 9%, respectively.

The remainder of this paper is organized as follows. Sec. II summarizes previous work. We establish the system model and formulate the optimization problem in Sec. III. The framework design of the proposed FADE is presented in Sec. IV. Trace-driven simulation results evaluate the effectiveness of the proposed framework in Sec. V. Finally, Sec. VI concludes this paper.
II. RELATED WORK
For caching-supported IoT networks, existing studies can be
divided into the following two categories.
The first category utilizes traditional methods based on convex optimization or probability modeling to address the content placement problem in IoT networks. For instance, [33] focused on maximizing traffic offloading and reducing system costs by designing a hierarchical edge caching strategy. [34] considered the problem of optimal bandwidth allocation and minimized the average transmission delay by deploying a greedy algorithm for cooperative edge caching. Vural et al. [35] proposed a content replacement strategy with multi-attribute joint optimization in terms of content lifetime, the request rate of users, and the hop information between the source user and the destination. To efficiently optimize the caching resources of IoT devices, [36] exploited probabilistic caching in heterogeneous IoT networks in order to improve traffic offloading. The content caching problem in IoT networks requires continuous optimization; in other words, various attributes (e.g., content popularity and user mobility) in IoT networks are constantly evolving. However, these strategies often struggle to adapt to dynamic environments and are hard to deploy in practice because they require global information.

The second category is based on learning algorithms, such as machine learning/deep learning, which learn key attribute features (e.g., user request behavior, content popularity, and user mobility distribution) in the network to optimize the content caching strategy. The studies in [37], [38] showed that reinforcement learning (RL) has great potential in the design of content caching schemes at BSs. Specifically, [37] proposed a cache replacement strategy based on Q-learning to reduce the traffic load in future cellular networks, and RL was also employed for cache placement [38] by using a multi-armed bandit (MAB). Chen et al. [39] proposed a popularity-based caching strategy for IoT networks that deploys deep neural networks to predict the near-future popularity of IoT data. However, most centralized learning algorithms tend to incur high cache diversity and storage utilization, which leads to excessive consumption of network communication resources. On the other hand, distributed learning methods require large cache and action spaces, which causes the same problems.
III. SYSTEM MODEL AND PROBLEM FORMULATION
In this section, we first introduce the topology of cooperative edge caching-supported IoT systems and then discuss the delay model. Next, the cache replacement model is presented. Finally, we formulate the optimization problem of edge caching for IoT systems. Key parameters are listed in Table I.
A. Topology of Cooperative Edge Caching-Supported IoT Systems

The topology of cooperative edge caching-supported IoT networks is illustrated in Fig. 2. In the considered IoT network, a large number of geographically distributed UEs (e.g., smartphones, tablets, and smartwatches) are served by BSs via wireless cellular links, and the BSs are interconnected via wired optical cables. Each BS is deployed with an edge server for computation and caching; thus, each BS can cache various contents to satisfy the content service demands of IoT devices. As a result, UEs can fetch their requested contents either locally from edge servers or directly by downloading the contents from SPs (in the cloud) to the BSs via the MNO core.

Considering a hierarchical IoT network, $\mathcal{N} = \{1, 2, \ldots, N_b\}$ fully connected BSs, each with a finite cache size $C$, and $\mathcal{U} = \{1, 2, \ldots, N_u\}$ UEs are distributed in the service area. Denote $\mathcal{F} = \{1, 2, \ldots, F\}$ as a library of contents that are provided by Internet SPs and that all UEs may access in the system for a relatively long time. Denote $D_f$, $f \in \mathcal{F}$, as the size of content $f$.²

²We consider $D = \{D_1, D_2, \ldots, D_f, \ldots, D_F\}$ to be the sizes of the local datasets as well, which will be introduced in Section IV-B.
Let $(P_f)_{F\times 1}$ be the global popularity, which indicates the probability distribution of content $f$ requested by all UEs in the network, and let $p_{nf}$ be the local popularity of content $f$ under BS $n$. We consider that $P_f = \sum_{n\in\mathcal{N}} p_{nf}$, and $(P_f)_{F\times 1}$ follows the Mandelbrot-Zipf (MZipf) distribution³ [42] as

$$P_f = \frac{(I_f + \tau)^{-\beta}}{\sum_{i\in\mathcal{F}} (I_i + \tau)^{-\beta}}, \quad f \in \mathcal{F}, \qquad (1)$$

where $I_f$ is the rank of content $f$ in descending order of content popularity, and $\tau$ and $\beta$ denote the plateau factor and skewness factor, respectively.

³Note that it is also widely used in mobile IoT scenarios [40], [41].

TABLE I
KEY PARAMETERS AND NOTATIONS.

$F$ : Total number of contents
$N_b$ : Number of BSs
$N_u$ : Number of UEs
$C$ : Cache size
$\mathcal{F}$ : Library of popular contents
$D_f$ : Size of content $f$
$(P_f)_{F\times 1}$ : Global popularity of content $f$
$\mathcal{M}$ : Wireless channels
$d_n^c$, $d^P$, $d^b$ : Transmission delays between BS $n$ and UE, between BS and SPs, and between BSs, respectively
$v_{u,n}$ : Downlink data rate between BS $n$ and UE $u$
$P_{u,f}$ : Preference of UE $u$ for content $f$
$s_{i,n}^c$ : Content caching state of BS $n$ at decision epoch $i$
$\chi_i$ : State of the BS at decision epoch $i$
$\Phi(\chi_i) = \{a_i^{\mathrm{local}}, a_i^{\mathrm{coBS}}, a_i^{\mathrm{SP}}\}$ : System action at state $\chi_i$
$R(\chi, \Phi)$ : Reward function
$Q(\chi, \Phi; \mathbf{w}_i)$, $\hat{Q}(\chi, \Phi; \hat{\mathbf{w}}_i)$ : Q-values of the MainNet and TargetNet, respectively
$L(\mathbf{w}_i)$ : Loss function of the double DQN
$F_j(w)$ : Loss function of federated DRL
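For a quick numerical illustration of (1), the sketch below builds an MZipf popularity vector; the parameter values are arbitrary examples, not the paper's simulation settings. Note how the plateau factor τ flattens the head of the distribution relative to plain Zipf (τ = 0).

```python
import numpy as np

def mzipf_popularity(F, tau, beta):
    """Mandelbrot-Zipf popularity of Eq. (1): P_f ∝ (I_f + tau)^(-beta),
    where I_f is the popularity rank of content f (1 = most popular)."""
    ranks = np.arange(1, F + 1)
    weights = (ranks + tau) ** (-beta)
    return weights / weights.sum()

P = mzipf_popularity(F=1000, tau=5.0, beta=0.8)  # example values only
print(P[:3], P.sum())  # most-popular head; the vector sums to 1
```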
B. Delay Model

We consider the content access delay for a UE as the round-trip time to receive the requested content. As shown in Fig. 2, $d^b$ denotes the transmission delay of BS cooperation, and $d^P$ is the delay between the BS and the SPs. The wireless transmission delay $d^c$ can be regarded as the period for a UE to obtain the content from the local BS. Considering that $\mathcal{M} = \{1, 2, \ldots, M\}$ wireless channels are deployed, $a_u \in \mathcal{M}$ is the channel assigned to UE $u$ by the BS. Similar to [43], we can obtain the downlink data rate between BS $n$ and UE $u$ as

$$v_{u,n} = B \log_2 \left(1 + \frac{q_u g_{u,n}}{\sigma^2 + \sum_{v\in\mathcal{U}\setminus\{u\}:\, a_v = a_u} q_v g_{v,n}}\right), \qquad (2)$$

where $B$ denotes the channel bandwidth, $\sigma^2$ represents the background noise power, $q_u$ is the transmission power from BS $n$ to UE $u$, and the channel gain $g_{u,n}$ is determined by the distance $l_{u,n}$ between BS $n$ and UE $u$.

Fig. 2. Topology of the caching-supported IoT system.
The quantity $P_{u,f} = r_{u,f}/R_u$, $f \in \mathcal{F}$, reflects the preference of UE $u$ for content $f$, with $\sum_{f\in\mathcal{F}} P_{u,f} = 1$, where $r_{u,f}$ is the number of requests of UE $u$ for content $f$ and $R_u$ is the total number of requests of UE $u$ in the network. Furthermore, we define $P_{u,nf} = P_{u,f} \cdot a_{un}$ as the local preference of UE $u$ for content $f$ under BS $n$, where $a_{un}$ is the association probability of UE $u$ and BS $n$. Thus, the wireless transmission delay $d_n^c$ can be obtained as

$$d_n^c = \sum_{u\in\mathcal{U}} \sum_{f\in\mathcal{F}} P_{u,nf} \frac{D_f}{v_{u,n}}. \qquad (3)$$
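To make (2) and (3) concrete, the following sketch computes a UE's downlink rate under co-channel interference and then the aggregate wireless delay $d_n^c$; the bandwidth, powers, gains, and preference matrix are illustrative assumptions, not values from the paper.

```python
import numpy as np

def downlink_rate(B, q_u, g_u, interferers, sigma2):
    """Eq. (2): Shannon rate with interference from UEs sharing channel a_u.
    interferers: list of (q_v, g_v) pairs with a_v == a_u."""
    interference = sum(q * g for q, g in interferers)
    return B * np.log2(1.0 + q_u * g_u / (sigma2 + interference))

def wireless_delay(P_unf, D, v):
    """Eq. (3): d_n^c = sum_u sum_f P_{u,nf} * D_f / v_{u,n}.
    P_unf: (U, F) local preferences; D: (F,) content sizes in bits;
    v: (U,) downlink rates in bit/s."""
    return float((P_unf * D[None, :] / v[:, None]).sum())

# toy example with 2 UEs and 3 contents (all values illustrative)
v = np.array([downlink_rate(10e6, 1.0, 1e-7, [(1.0, 2e-8)], 1e-9),
              downlink_rate(10e6, 1.0, 5e-8, [], 1e-9)])
P_unf = np.array([[0.3, 0.1, 0.0], [0.2, 0.2, 0.2]])
D = np.array([8e6, 4e6, 2e6])
print(wireless_delay(P_unf, D, v))  # seconds
```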
C. Cache Replacement Model

We model the process of content cache replacement in a BS as a Markov decision process (MDP) [44]. The cache and request states, the system action, and the feedback reward are described as follows.

1) Cache and Request State: During each decision epoch $i$, we define the content cache state as $s_{i,n}^c := \{s_{i,n,f}^c\}$, $n \in \mathcal{N}$, $f \in \mathcal{F}$. Here, $s_{i,n,f}^c$ is the cache state of BS $n$ for content $f \in \mathcal{F}$: $s_{i,n,f}^c = 1$ represents that BS $n$ caches content $f$, and $s_{i,n,f}^c = 0$ otherwise. Furthermore, we use $s_{i,u}^r := \{s_{i,u,f}^r\}$, $u \in \mathcal{U}$, $f \in \mathcal{F}$, to denote the request state of UE $u$, where $s_{i,u,f}^r$ is the request state of $u$ for content $f$. Thus, we derive the cache and request state during each decision epoch $i$ as

$$\chi_i = (s_{i,u}^r, s_{i,n}^c) \in \mathcal{X} \overset{\mathrm{def}}{=} \{1, 2, \ldots, F\} \times \prod_{f\in\mathcal{F}} \mathcal{P}. \qquad (4)$$
2) System Action: To adapt to the continuous changes of the dynamic environment, BSs can choose which contents should be replaced and decide where requests are processed (via the local BS, BS cooperation, or the SPs). We denote $\Phi(\chi_i)$ as the system action at state $\chi_i$, and the action space for all cooperative BSs is defined as

$$\Phi(\chi_i) = \{a_i^{\mathrm{local}}, a_i^{\mathrm{coBS}}, a_i^{\mathrm{SP}}\}, \qquad (5)$$

where the three types of system action $\Phi(\chi_i)$ are as follows:

a) Local Processing Action: We denote $a_i^{\mathrm{local}} \overset{\mathrm{def}}{=} [a_{i,0}^{\mathrm{local}}, a_{i,1}^{\mathrm{local}}, \ldots, a_{i,F}^{\mathrm{local}}]$ as the local processing action when the cache state controlled by the local BS is available, where $a_{i,f}^{\mathrm{local}} \in \{0, 1\}$, $f \in \mathcal{F}$; $a_{i,f}^{\mathrm{local}} = 1$ indicates that content $f$ needs to be replaced by the currently requested content, while $a_{i,f}^{\mathrm{local}} = 0$ is the opposite. In this case, the content request is processed locally.

b) Cooperation Processing Action: If the requested content $f$ is not cached in the local BS, the UE's request needs to be routed to a neighbor BS. We define $a_i^{\mathrm{coBS}} \overset{\mathrm{def}}{=} [a_{i,1}^{\mathrm{coBS}}, \ldots, a_{i,N}^{\mathrm{coBS}}]$ as the cooperation processing action, where $a_{i,n}^{\mathrm{coBS}} \in \{0, 1\}$, and $a_{i,n}^{\mathrm{coBS}} = 1$ denotes that BS $n$ is selected to serve the current UE's request.

c) Remote Processing Action: If the UE can obtain the requested content $f$ from neither the local BS nor its neighbors, the local BS decides whether to forward the request to the SPs, denoted as $a_i^{\mathrm{SP}} \in \{0, 1\}$, where $a_i^{\mathrm{SP}} = 1$ represents that the request will be handled by the SPs. In this case, the UE obtains the requested content $f$ directly from the remote SPs.
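For intuition about how a DQN agent consumes (4) and (5), the sketch below encodes the state $\chi_i$ as a flat vector and maps a discrete action index back to the three action types; this specific encoding is our own illustrative choice, not the paper's exact implementation in Sec. IV.

```python
import numpy as np

F, N = 100, 4  # illustrative library size and number of BSs

def encode_state(requested_f, cache_bits):
    """chi_i = (s^r_{i,u}, s^c_{i,n}): one-hot requested content
    concatenated with the binary cache state of the local BS."""
    s_r = np.zeros(F)
    s_r[requested_f] = 1.0
    return np.concatenate([s_r, cache_bits.astype(np.float32)])

def decode_action(a):
    """Map a flat action index to one of the three action types of Eq. (5)."""
    if a < F:
        return ("local_replace", a)       # a^local_{i,f} = 1: evict content a
    if a < F + N:
        return ("cooperate_bs", a - F)    # a^coBS_{i,n} = 1: fetch from BS n
    return ("forward_to_sp", None)        # a^SP_i = 1: fetch from the SPs

state = encode_state(requested_f=7, cache_bits=np.zeros(F))
print(state.shape, decode_action(F + 2))  # -> (200,) ('cooperate_bs', 2)
```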
3) System Reward: When the local BS takes action $\Phi(\chi_i)$ at state $\chi_i$, it obtains a feedback reward. To satisfy the QoS of UEs, our goal is to minimize the average content access latency of the system.

Because the BS-BS and BS-SP links use fiber communication, $d_n^c$ may be far greater than $d^b$ and $d^P$. Based on the communication model in (3) in Sec. III-B, to achieve the maximum system reward while guaranteeing the objective of minimizing the average content access delay, we use a negative exponential function to normalize the reward. Thus, we derive the reward function as

$$R_n(\chi_i, \Phi(\chi_i)) = \begin{cases} p_{nf}\, e^{-\xi_1 d_n^c}, & \text{Cellular Service} \\ p_{nf}\, e^{-(\xi_1 d_n^c + \xi_2 d^b)}, & \text{BS-BS Cooperation} \\ p_{nf}\, e^{-(\xi_1 d_n^c + \xi_3 d^P)}, & \text{Backhaul Service} \end{cases} \qquad (6)$$

where $\xi_1 + \xi_2 + \xi_3 = 1$ and $\xi_1 \le \xi_2 < \xi_3$. Here, $p_{nf}\, e^{-\xi_1 d_n^c}$ is the reward when a UE obtains content $f$ from the BS only via cellular service; $p_{nf}\, e^{-(\xi_1 d_n^c + \xi_2 d^b)}$ means that the UE is served through BS-BS cooperation; and when a UE has to be served by the MNO core via backhaul links, the reward is $p_{nf}\, e^{-(\xi_1 d_n^c + \xi_3 d^P)}$.
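Once the serving path for a request is known, the reward in (6) can be evaluated directly. A small sketch follows, with ξ weights chosen only to satisfy ξ₁ + ξ₂ + ξ₃ = 1 and ξ₁ ≤ ξ₂ < ξ₃; all delay values are illustrative.

```python
import math

def reward(p_nf, d_c, path, d_b=0.0, d_p=0.0,
           xi=(0.2, 0.3, 0.5)):  # illustrative weights: xi1 <= xi2 < xi3
    """Eq. (6): negative-exponential reward, larger when delay is smaller."""
    xi1, xi2, xi3 = xi
    if path == "cellular":                 # served by the local BS
        return p_nf * math.exp(-xi1 * d_c)
    if path == "cooperation":              # served via a neighbor BS
        return p_nf * math.exp(-(xi1 * d_c + xi2 * d_b))
    return p_nf * math.exp(-(xi1 * d_c + xi3 * d_p))  # backhaul to the SPs

print(reward(0.05, d_c=0.3, path="cooperation", d_b=0.1))
```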
D. Problem Formulation

Based on (6), our optimization objective is to maximize the expected long-term reward from an arbitrary initial state $\chi_1$ as

$$R_{\mathrm{long}} = \max_{\Phi} \mathbb{E}_{\Phi}\left[\lim_{I\to\infty} \frac{1}{I} \sum_{i=1}^{I} R(\chi_i, \Phi(\chi_i)) \,\Big|\, \chi_1 = \chi\right], \qquad (7)$$

where $R(\chi_i, \Phi(\chi_i))$ is the sum of $R_n(\chi_i, \Phi(\chi_i))$ over the BSs.

Moreover, a single-agent infinite-horizon MDP with the discounted utility in (8) can generally be utilized to approximate the expected infinite-horizon undiscounted value, especially when $\gamma \in [0, 1)$ approaches 1:

$$V(\chi, \Phi) = \mathbb{E}_{\Phi}\left[\sum_{i=1}^{\infty} \gamma^{i-1} \cdot R(\chi_i, \Phi(\chi_i)) \,\Big|\, \chi_1 = \chi\right]. \qquad (8)$$
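To see why the discounted value in (8) approximates the undiscounted average-reward objective in (7) as γ → 1, note that (1 − γ)V(χ, Φ) tends to the long-run average reward. The toy sketch below checks this numerically, with a random reward stream standing in for R(χᵢ, Φ(χᵢ)).

```python
import numpy as np

def discounted_value(rewards, gamma):
    """V estimate of Eq. (8) for one trajectory: sum_i gamma^(i-1) * R_i."""
    return sum(gamma ** i * r for i, r in enumerate(rewards))

rng = np.random.default_rng(0)
rewards = rng.uniform(0.0, 1.0, size=10000)  # stand-in reward stream, mean 0.5
for gamma in (0.9, 0.99, 0.999):
    # (1 - gamma) * V approaches the long-run average reward of Eq. (7),
    # i.e., roughly 0.5 here as gamma approaches 1
    print(gamma, (1 - gamma) * discounted_value(rewards, gamma))
```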
