Federated Deep Reinforcement Learning for Internet of Things With Decentralized Cooperative Edge Caching
Citations
Federated Learning for Internet of Things: A Comprehensive Survey
EEDTO: An Energy-Efficient Dynamic Task Offloading Algorithm for Blockchain-Enabled IoT-Edge-Cloud Orchestrated Computing
When Deep Reinforcement Learning Meets Federated Learning: Intelligent Multitimescale Resource Management for Multiaccess Edge Computing in 5G Ultradense Network
A Machine Learning Security Framework for Iot Systems
References
Human-level control through deep reinforcement learning
Communication-Efficient Learning of Deep Networks from Decentralized Data
Deep reinforcement learning with double Q-learning
Federated Learning: Strategies for Improving Communication Efficiency
Related Papers (5)
In-Edge AI: Intelligentizing Mobile Edge Computing, Caching and Communication by Federated Learning
Human-level control through deep reinforcement learning
Frequently Asked Questions (13)
Q2. What is the first category of approaches for caching in IoT networks?
The first category uses traditional methods based on convex optimization or probability modeling to address the content placement problem in IoT networks.
Q3. What is the simplest way to construct a double DQN?
The double DQN consists of a fully connected feed-forward neural network with one hidden layer of 200 neurons, which is used to construct both the Q network and the target Q̂ network.
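The architecture described above can be sketched as follows; this is a minimal illustration, assuming example input and output sizes (the answer specifies only the single 200-neuron hidden layer shared by the Q network and the target Q̂ network):

```python
import numpy as np

rng = np.random.default_rng(0)

def init_net(n_in, n_hidden, n_out):
    """One fully connected hidden layer, as described in the answer."""
    return {
        "W1": rng.normal(0, 0.1, (n_in, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0, 0.1, (n_hidden, n_out)),
        "b2": np.zeros(n_out),
    }

def forward(net, x):
    h = np.maximum(0.0, x @ net["W1"] + net["b1"])  # ReLU hidden layer
    return h @ net["W2"] + net["b2"]                # one Q-value per action

# Illustrative sizes: 10 state features, 4 candidate caching actions.
q_net = init_net(n_in=10, n_hidden=200, n_out=4)
target_net = {k: v.copy() for k, v in q_net.items()}  # Q̂ starts as a copy of Q

q_values = forward(q_net, rng.normal(size=10))
print(q_values.shape)  # (4,)
```

In a double DQN the Q network selects the greedy action while the periodically synchronized target network Q̂ evaluates it, which reduces the overestimation bias of standard Q-learning.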
Q4. What is the average delay of the proposed algorithm?
The proposed FADE algorithm outperforms the other algorithms: it achieves the lowest average delay of 0.29 s, an improvement of 29%, 27%, and 26% over LRU, FIFO, and LFU, respectively.
Q5. What is the optimal control policy for a single-agent infinite-horizon MDP?
A single-agent infinite-horizon MDP with the discounted utility in (8) can be used to approximate the expected infinite-horizon undiscounted value, especially as γ ∈ [0, 1) approaches 1: V(χ, Φ) = E_Φ[ Σ_{i=1}^{∞} γ^{i−1} · R(χ_i, Φ(χ_i)) | χ_1 = χ ]. (8) Each BS is expected to learn an optimal control policy, denoted Φ*, that maximizes V(χ, Φ) from a random initial state χ.
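The discounted utility in (8) can be illustrated with a truncated sum; the reward sequence below is a placeholder, not from the paper:

```python
def discounted_value(rewards, gamma):
    """Truncated discounted sum approximating V(χ, Φ) in (8):
    sum over i >= 1 of gamma^(i-1) * R(χ_i, Φ(χ_i))."""
    return sum(gamma ** (i - 1) * r for i, r in enumerate(rewards, start=1))

rewards = [1.0, 1.0, 1.0, 1.0]       # illustrative per-step rewards
v = discounted_value(rewards, gamma=0.9)
print(v)  # 1 + 0.9 + 0.81 + 0.729 ≈ 3.439
```

As γ approaches 1, later rewards are discounted less and the sum approaches the undiscounted return, which is why the discounted utility approximates the undiscounted value in that regime.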
Q6. What is the UE’s goal in achieving the maximum system reward?
Based on (3) of the communication model in Section II.B, to achieve the maximum system reward and to meet the objective of minimizing the average content access delay, the authors normalize the reward function with a negative exponential function.
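The idea of a negative-exponential normalization can be sketched as below; the scaling constant is an illustrative assumption, not the paper's exact reward function:

```python
import math

def normalized_reward(delay, scale=1.0):
    """Map a delay in [0, ∞) to a reward in (0, 1] via exp(-scale * delay).
    Smaller delay → reward closer to 1, so minimizing delay maximizes reward.
    The 'scale' factor is a hypothetical tuning constant for illustration."""
    return math.exp(-scale * delay)

print(normalized_reward(0.0))   # 1.0 (zero delay gives the maximum reward)
print(normalized_reward(0.29) > normalized_reward(1.0))  # True
```

This monotone-decreasing mapping makes the reward bounded, which keeps the Q-learning targets well-scaled while aligning reward maximization with delay minimization.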
Q7. How do the authors obtain the Q-function iteration formula?
The authors obtain the Q-function iteration formula as Q_{i+1}(χ, Φ) = Q_i(χ, Φ) + α_i · (R(χ, Φ) + γ · max_{Φ′} Q_i(χ′, Φ′) − Q_i(χ, Φ)), (15) where α_i ∈ [0, 1) is the learning rate.
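Update (15) is the standard Q-learning iteration; a minimal tabular sketch (toy state and action sizes, not the paper's caching state space) looks like this:

```python
import numpy as np

def q_update(Q, s, a, r, s_next, alpha, gamma):
    """One step of update (15) on a table Q[state, action]:
    Q[s, a] += alpha * (r + gamma * max_a' Q[s', a'] - Q[s, a])."""
    td_target = r + gamma * Q[s_next].max()
    Q[s, a] += alpha * (td_target - Q[s, a])
    return Q

Q = np.zeros((3, 2))  # 3 states, 2 actions (illustrative sizes)
Q = q_update(Q, s=0, a=1, r=1.0, s_next=2, alpha=0.5, gamma=0.9)
print(Q[0, 1])  # 0.5, i.e. 0 + 0.5 * (1.0 + 0.9 * 0 - 0)
```

With α_i ∈ [0, 1), each step moves Q(χ, Φ) partway toward the bootstrapped target R(χ, Φ) + γ · max_{Φ′} Q(χ′, Φ′), which is exactly the correction term in (15).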
Q8. What is the effect of the batch size on the network performance?
It can be seen that the batch size has little effect on the network performance due to the stochastic selection mechanism from the transition memory.
Q9. Why does the performance of the proposed FADE decrease when the BS number is 3 or 4?
The performance of the proposed FADE decreases when the BS number is 3 or 4, mainly because the traffic pressure is apportioned among more BSs.
Q10. Why does the proposed FADE outperform the centralized algorithm?
This occurs mainly because the centralized algorithm must transfer a large amount of content to the cloud for training, whereas the proposed FADE shares only the training parameters.
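The parameter-sharing idea can be sketched with a FedAvg-style average; the names and vector sizes are illustrative, not the paper's exact protocol:

```python
import numpy as np

def federated_average(local_params):
    """Element-wise average of the BSs' locally trained parameter vectors.
    Only these (small) parameter vectors cross the network, not the cached
    content itself — the source of FADE's communication savings."""
    return np.mean(np.stack(local_params), axis=0)

# Two BSs upload their local model weights (toy 4-dimensional vectors).
bs_params = [np.full(4, 1.0), np.full(4, 3.0)]
global_params = federated_average(bs_params)
print(global_params)  # [2. 2. 2. 2.]
```

The aggregated global parameters are then broadcast back to the BSs for the next local training round, so the traffic per round scales with the model size rather than with the content volume.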
Q11. What is the performance of the proposed algorithm?
Fig. 11(c) shows that the proposed algorithm outperforms the centralized, LRU, LFU, and FIFO algorithms by up to 21%, 35%, 30%, and 37%, respectively, when the BS number is 2.
Q12. What is the difference between the proposed FADE algorithm and the other algorithms?
In particular, benefiting from its advantage in average delay, the proposed FADE algorithm also achieves a better hit rate than the other algorithms.
Q13. How does the proposed FADE outperform the traditional centralized algorithm?
From Fig. 8, the proposed FADE outperforms the traditional centralized algorithm in terms of system payment, with an average improvement of 60%.