Implications of decentralized Q-learning resource allocation in wireless networks
Frequently Asked Questions (18)
Q2. What are the future works mentioned in the paper "Implications of decentralized q-learning resource allocation in wireless networks" ?
The authors leave for future work the extension of the decentralized approach towards collaborative algorithms that allow neighbouring WNs to reach an equilibrium granting acceptable individual performance; the information required for such collaboration may be directly exchanged or inferred from observations. Other learning approaches are also intended to be analysed in the future for performance comparison in the resource allocation problem.
Q3. What is the purpose of the presented learning approach?
The presented learning approach is intended to operate at the PHY level, allowing current MAC-layer communication standards to keep operating unmodified (e.g., in IEEE 802.11 WLANs, channel access is governed by the CSMA/CA operation, so Stateless Q-learning may contribute to improving spatial reuse at the PHY level).
Q4. How do the authors calculate the maximum throughput of each WN?
Using the received power and the interference, the authors compute the maximum theoretical throughput of each WN i at time t ∈ {1, 2, ...} via the Shannon capacity.
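A minimal Python sketch of that computation, assuming powers expressed in dBm and an explicit noise floor; the function names, bandwidth and power values are illustrative, not taken from the paper:

```python
import math

def dbm_to_mw(dbm: float) -> float:
    """Convert a power level from dBm to milliwatts."""
    return 10 ** (dbm / 10)

def max_throughput_bps(bandwidth_hz: float, p_rx_dbm: float,
                       interference_dbm: float, noise_dbm: float) -> float:
    """Shannon capacity B * log2(1 + SINR), with the SINR computed from
    the received power and the interference (plus an assumed noise floor)."""
    sinr = dbm_to_mw(p_rx_dbm) / (dbm_to_mw(interference_dbm) + dbm_to_mw(noise_dbm))
    return bandwidth_hz * math.log2(1 + sinr)

# e.g., a 20 MHz channel, -65 dBm signal, -85 dBm interference, -95 dBm noise floor
print(max_throughput_bps(20e6, -65, -85, -95))
```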
Q5. What is the way to describe the system?
Since the authors focus on a completely decentralized scenario where no information about the other nodes is available, the system can be fully described by the set of actions and rewards.
Q6. How do the authors extend the decentralized approach?
The authors leave for future work the extension of the decentralized approach towards collaborative algorithms that allow neighbouring WNs to reach an equilibrium granting acceptable individual performance.
Q7. What is the strategy for implementing decentralized learning to the resource allocation problem?
To implement decentralized learning for the resource allocation problem, the authors consider each WN to be an agent running Stateless Q-learning with an ε-greedy action-selection strategy, so that actions a ∈ A are chosen according to their estimated rewards.
Q8. What is the way to learn?
During the learning process, the authors assume that WNs select actions sequentially, so that at each learning iteration every agent takes an action in an ordered way.
Q9. What is the way to maximize the network throughput?
Note, as well, that in order to maximize the aggregate network throughput, two of the WNs sacrifice themselves by choosing a lower transmit power.
Q10. What is the strategy for a decentralized Q-learning?
Each WN applies Stateless Q-learning as follows:
• Initially, it sets the estimates of all its actions k ∈ {1, ..., K} to zero: Q̂(ak) = 0.
• At each iteration, it applies an action following the ε-greedy strategy, i.e., it selects the best-rewarding action with probability 1 − εt, and a uniformly random one the rest of the times.
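A compact Python sketch of this per-WN procedure, also reflecting the sequential, ordered action-taking described in Q8. The update rule Q̂(a) ← Q̂(a) + α(r + γ·max Q̂ − Q̂(a)) and the decay εt = ε0/√t are a standard stateless formulation assumed here, paraphrasing rather than quoting the paper's equations:

```python
import math
import random

class StatelessQAgent:
    """One WN running Stateless Q-learning with an epsilon-greedy policy."""

    def __init__(self, n_actions: int, alpha: float = 0.1,
                 gamma: float = 0.95, eps0: float = 1.0):
        self.q = [0.0] * n_actions      # Q-hat(a_k) initialised to 0
        self.alpha, self.gamma, self.eps0 = alpha, gamma, eps0
        self.t = 1                      # learning iteration counter

    def select_action(self) -> int:
        eps_t = self.eps0 / math.sqrt(self.t)   # assumed decay schedule
        if random.random() < eps_t:             # explore uniformly at random
            return random.randrange(len(self.q))
        return max(range(len(self.q)), key=self.q.__getitem__)

    def update(self, action: int, reward: float) -> None:
        best = max(self.q)              # stateless: max over the same Q-table
        self.q[action] += self.alpha * (reward + self.gamma * best - self.q[action])
        self.t += 1

# Toy usage with a random placeholder reward (stands in for Gamma_{i,t}/Gamma*_i):
agents = [StatelessQAgent(n_actions=8) for _ in range(4)]
for _ in range(1000):
    for agent in agents:                # ordered, sequential actions (cf. Q8)
        a = agent.select_action()
        agent.update(a, random.random())
```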
Q11. What is the way to get the maximum reward?
The reward after choosing an action is set as ri,t = Γi,t / Γ∗i, where Γi,t is the throughput experienced at time t by WN i ∈ {1, ..., n}, n being the number of WNs in the scenario, and Γ∗i = B log2(1 + SNRi) is WN i's maximum achievable throughput (i.e., when it uses the maximum transmission power and there is no interference).
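A short sketch of that normalization in Python, assuming a linear SNR and illustrative values:

```python
import math

def reward(throughput_bps: float, bandwidth_hz: float, snr_linear: float) -> float:
    """r_{i,t} = Gamma_{i,t} / Gamma*_i, with Gamma*_i = B * log2(1 + SNR_i),
    the throughput achieved at maximum transmit power with no interference."""
    gamma_star = bandwidth_hz * math.log2(1 + snr_linear)
    return throughput_bps / gamma_star

# e.g., 40 Mbps experienced on a 20 MHz channel with a linear SNR of 1000
print(reward(40e6, 20e6, 1000))
```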
Q12. What is the path-loss exponent of WN i?
Prx,j = Ptx,i − PL0 − 10 αPL log10(di,j) − Gs − Go, where Ptx,i is the power in dBm transmitted by WN i, Prx,j is the power in dBm received at WN j, PL0 is the path loss at one meter in dB, αPL is the path-loss exponent, di,j is the distance between the transmitter and the receiver in meters, Gs is the shadowing loss in dB, and Go is the obstacles loss in dB.
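A minimal sketch of this log-distance path-loss model in Python; the equation form above is reconstructed from the listed variables, and the numeric values below are illustrative only:

```python
import math

def received_power_dbm(p_tx_dbm: float, pl0_db: float, alpha_pl: float,
                       distance_m: float, g_shadow_db: float,
                       g_obstacles_db: float) -> float:
    """P_rx = P_tx - PL0 - 10*alpha_PL*log10(d) - Gs - Go (dB/dBm units)."""
    path_loss_db = (pl0_db + 10 * alpha_pl * math.log10(distance_m)
                    + g_shadow_db + g_obstacles_db)
    return p_tx_dbm - path_loss_db

# e.g., 20 dBm transmitter, PL0 = 40 dB, exponent 4, 5 m link, 2 dB shadowing, 3 dB obstacles
print(received_power_dbm(20, 40, 4.0, 5.0, 2.0, 3.0))
```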
Q13. What is the effect of a fluctuation in higher layers of the protocol stack?
Such fluctuations can have severe consequences in higher layers of the protocol stack, depending on the time scale at which they occur.
Q14. What is the way to study the network throughput?
The authors first analyse the effects of modifying α (the learning rate), γ (the discount factor) and ε0 (the initial exploration coefficient of the ε-greedy update rule) with respect to the achieved network throughput.
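A hedged sketch of such a parameter sweep; run_simulation is a hypothetical stand-in (here returning a dummy value so the sweep runs) for a full simulation that would drive the agent loop above and report the aggregate throughput:

```python
import random
from itertools import product

def run_simulation(alpha: float, gamma: float, eps0: float) -> float:
    """Placeholder for a full network simulation returning aggregate throughput."""
    return random.random()  # dummy value; a real run would use the agents above

# Grid over candidate values of the learning rate, discount factor and
# initial exploration coefficient (values are illustrative):
for alpha, gamma, eps0 in product([0.1, 0.5, 1.0], repeat=3):
    print(f"alpha={alpha}, gamma={gamma}, eps0={eps0}: "
          f"throughput={run_simulation(alpha, gamma, eps0):.3f}")
```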
Q15. What is the optimal solution for the WNs?
The authors first identify the optimal solutions that maximize: i) the aggregate throughput, and ii) the proportional fairness, computed as the logarithmic sum of the throughputs experienced by the WNs, i.e., PF = max_{k ∈ A} ∑_i log(Γi,k).
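A small Python sketch of how both optima could be found by exhaustive search over joint action profiles; throughput_for and the toy table are hypothetical stand-ins for the per-WN throughputs under each joint configuration:

```python
import math
from itertools import product

def proportional_fairness(throughputs_bps) -> float:
    """Logarithmic sum of the per-WN throughputs for one joint configuration."""
    return sum(math.log(t) for t in throughputs_bps)

def best_profiles(action_sets, throughput_for):
    """Exhaustive search over all joint profiles k in A: returns the profile
    maximizing the aggregate throughput and the one maximizing PF."""
    profiles = list(product(*action_sets))
    best_agg = max(profiles, key=lambda k: sum(throughput_for(k)))
    best_pf = max(profiles, key=lambda k: proportional_fairness(throughput_for(k)))
    return best_agg, best_pf

# Toy usage: two WNs with two actions each and made-up per-WN throughputs.
table = {(0, 0): (1e6, 1e6), (0, 1): (3e6, 0.5e6),
         (1, 0): (0.5e6, 3e6), (1, 1): (2e6, 2e6)}
print(best_profiles([(0, 1), (0, 1)], table.__getitem__))
```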
Q16. How many iterations of the simulations have been done?
The authors run simulations of 10000 iterations and capture the results of the last 5000 iterations to ensure that the initial transitory phase has ended.
Q17. Where can the code used for the simulations be found?
The code used for the simulations can be found at https://github.com/wn-upf/Decentralized_Qlearning_Resource_Allocation_in_WNs.git (Commit: eb4042a1830c8ea30b7eae3d72a51afe765a8d86).
Q18. How can the authors reduce variability in the learning algorithm?
The authors have evaluated the impact of the parameters intrinsic to the learning algorithm on this variability, showing that it can be reduced by decreasing the exploration degree and the learning rate.