
What is the definition of a neural network?

A neural network consists of a number of neurons and weighted connections among them, where the neurons can be regarded as variables and the weights can be viewed as parameters.

What is the key idea of the proposed approach?

The key idea of the proposed approach is to enable each user to forecast the future actions of its opponents based on public knowledge and to proceed by best responding to the predicted joint action profile using some bandit strategy [3 p. 517].

What is the significance of the reinforcement learning model?

It was demonstrated that the compensation strategy based on the reinforcement learning model attained an exceptional performance improvement.

Where did Yong Ren receive his B.S., M.S., and Ph.D. degrees?

Yong Ren [SM’16] received his B.S., M.S., and Ph.D. degrees in electronic engineering from Harbin Institute of Technology, China, in 1984, 1987, and 1994, respectively.

How was the distributed channel selection problem modeled?

This distributed channel selection problem was in harmony with the typical MP-MAB settings, and thus it was modeled as an MP-MAB game.

(Open Access) Machine Learning Paradigms for Next-Generation Wireless Networks (2017) | Chunxiao Jiang


IEEE Wireless Communications • Accepted for Publication

Chunxiao Jiang is with the Tsinghua Space Center.

Y. Ren is with Tsinghua University.

Haijun Zhang is with the University of Science and Technology Beijing, China.

Zhu Han is with the University of Houston.

Kwang-Cheng Chen is with the University of South Florida.

Lajos Hanzo is with the University of Southampton.

Abstract

Next-generation wireless networks are expected to support extremely high data rates and radically new applications, which require a new wireless radio technology paradigm. The challenge is that of assisting the radio in intelligent adaptive learning and decision making, so that the diverse requirements of next-generation wireless networks can be satisfied. Machine learning is one of the most promising artificial intelligence tools, conceived to support smart radio terminals. Future smart 5G mobile terminals are expected to autonomously access the most meritorious spectral bands with the aid of sophisticated spectral efficiency learning and inference, to control their transmission power with the aid of energy efficiency learning/inference, and simultaneously to adjust their transmission protocols with the aid of quality of service learning/inference. Hence we briefly review the rudimentary concepts of machine learning and propose their employment in the compelling applications of 5G networks, including cognitive radios, massive MIMOs, femto/small cells, heterogeneous networks, smart grid, energy harvesting, device-to-device communications, and so on. Our goal is to assist the readers in refining the motivation, problem formulation, and methodology of powerful machine learning algorithms in the context of future networks in order to tap into hitherto unexplored applications and services.

Introduction

Radical and sometimes even unorthodox next-generation networking concepts have received substantial attention in both the academic and the industrial communities. One of their driving forces is that of providing unprecedented data rates for supporting radically new applications. Specifically, next-generation networks are expected to learn the diverse and colorful characteristics of both the users’ ambience and human behavior, in order to autonomously determine the optimal system configurations. These smart mobile terminals have to rely on sophisticated learning and decision-making. Machine learning, as one of the most powerful artificial intelligence tools, constitutes a promising solution [1]. As shown in Fig. 1, we may envision an intelligent radio that is capable of autonomously accessing the available spectrum with the aid of learning, altruistically controlling its transmission power for the sake of conserving energy, and adjusting its transmission protocols.

Machine learning has found wide-ranging applications in image/audio processing, finance and economics, social behavior analysis, project management, and so on [2]. Explicitly, a machine learns to execute a particular task T, with the goal of maintaining a specific performance metric P, based on a particular experience E; the system aims to reliably improve its performance P on task T by exploiting its experience E. Depending on how we specify T, P, and E, the learning might also be referred to as data mining, autonomous discovery, database updating, programming by example, and so on [3]. Machine learning algorithms can be broadly categorized into supervised and unsupervised learning, where the adjectives “supervised/unsupervised” indicate whether there are labeled samples in the database. Later, reinforcement learning emerged as a new category inspired by behavioral psychology. It is concerned with an agent that is connected to its environment via perception and action, and that seeks to maximize some form of reward/utility. The family of machine learning algorithms can also be categorized based on their similarity in terms of their functionality and structure, yielding regression algorithms, instance-based algorithms, regularization algorithms, decision tree algorithms, Bayesian algorithms, clustering algorithms, association rule based learning algorithms, artificial neural networks, deep learning algorithms, dimension reduction algorithms, ensemble algorithms, and so on. In this article, we introduce the basic concepts of machine learning algorithms and the corresponding applications according to the category of supervised, unsupervised, and reinforcement learning.

Chunxiao Jiang, Haijun Zhang, Yong Ren, Zhu Han, Kwang-Cheng Chen, and Lajos Hanzo

Machine Learning Paradigms for Next-Generation Wireless Networks

Digital Object Identifier: 10.1109/MWC.2016.1500356WC

Accepted from Open Call

Machine learning can be widely used in modeling various technical problems of next-generation systems, such as large-scale MIMOs, device-to-device (D2D) networks, heterogeneous networks constituted by femtocells and small cells, and so on. Figure 2 portrays the family-tree of machine learning techniques and their potential applications in 5G. Against this background, we embark on investigating the family of learning techniques. Specifically, in the following sections we consider supervised learning, unsupervised learning, and reinforcement learning. Each section consists of several subsections, discussing specific learning models, such as regression models and the k-nearest neighbor (KNN) algorithm, support vector machines (SVM) and Bayesian learning; k-means clustering, principal and independent component analysis; and partially observed Markov decision processes, Q-learning, and the multi-armed bandit technique. Each section commences with the introduction of the learning model and its applications in 5G networks. Finally, our conclusions are drawn.

Supervised Learning in Wireless Communications

Regression Models, KNN and SVM: MIMO Channel and Energy Learning

Models: Regression analysis relies on a statistical process for estimating the relationships among variables. The goal of regression analysis is to predict the value of one or more continuous-valued estimation targets, given the value of a D-dimensional vector x of input variables. The estimation target is a function of the independent variables. In linear regression, the regression function is linear, while in logistic regression, it is a logistic function, i.e., the common sigmoid curve. The KNN and SVM algorithms are mainly utilized for classification of points/objects. In KNN, an object is classified into a specific category by a majority vote of the object’s neighbors, with the object being assigned to the class that is most common among its k nearest neighbors. When KNN is used for regression, the output is instead a property value of the object, such as the average of the values of its k nearest neighbors. By contrast, the SVM algorithm relies on a nonlinear mapping, which transforms the original training data into a higher dimension where it becomes separable, and then it searches for the optimal linear separating hyperplane that is capable of separating one class from another in this higher dimension. SVMs therefore correspond to non-linear classification methods relying on the family of kernel methods. It was shown that with the aid of an appropriate nonlinear mapping to a sufficiently high dimension, the data from two classes can always be separated by a hyperplane [3 p. 21, 185, 239, 349].
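To make the two uses of KNN described above concrete, here is a minimal pure-Python sketch of majority-vote classification and neighbor-averaging regression. It is our own illustration, not the implementation used in [5]; the function names, the Euclidean distance, and the toy data are assumptions chosen for clarity.

```python
from collections import Counter
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_nearest(train, query, k):
    """Return the k (features, target) pairs in `train` closest to `query`."""
    return sorted(train, key=lambda item: euclidean(item[0], query))[:k]


def knn_classify(train, query, k=3):
    """KNN classification: majority vote among the labels of the k nearest neighbors."""
    labels = [label for _, label in k_nearest(train, query, k)]
    return Counter(labels).most_common(1)[0][0]


def knn_regress(train, query, k=3):
    """KNN regression: average of the target values of the k nearest neighbors."""
    values = [value for _, value in k_nearest(train, query, k)]
    return sum(values) / len(values)


# Hypothetical toy data: 2-D feature vectors with class labels "A" and "B".
samples = [((0.0, 0.0), "A"), ((0.1, 0.2), "A"), ((0.2, 0.1), "A"),
           ((1.0, 1.0), "B"), ((0.9, 1.1), "B"), ((1.1, 0.9), "B")]
print(knn_classify(samples, (0.15, 0.15), k=3))  # A
```

Sorting the whole training set is O(n log n) per query, which is fine for a sketch; larger data sets would normally use a spatial index such as a k-d tree.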

Applications: These models can be used for estimating or predicting radio parameters that are associated with specific users. For example, in massive MIMO systems associated with hundreds of antennas, both detection and channel estimation lead to high-dimensional search problems, which can be addressed by the above-mentioned learning models. In order to generalize the SVM function for employment in data classification problems, its hierarchical version, referred to as H-SVM, was proposed in [4], where each hierarchical level consisted of a finite number of SVM classifiers. This regime was used for the estimation of the Gaussian channel’s noise level in a MIMO-aided wireless network having t transmit antennas and r receive antennas. By exploiting the training data, the H-SVM model was trained for the estimation of the channel noise statistics.

In heterogeneous networks constituted by diverse cells, handovers may be frequent, and both KNN and SVM can be applied to finding the optimal handover solutions. At the application layer, these models can also be used for learning the mobile terminal’s specific usage pattern in diverse spatio-temporal and device contexts, as discussed in [5]. This may then be exploited for predicting the configuration to be used in the location-specific interface. Given a set of contextual input cues, machine learning algorithms are capable of exploiting the learned user context to dynamically classify the cues into a system state, thereby saving energy while maintaining a high level of user satisfaction. Donohoo et al. [5] also conducted experiments using five real user profiles, including the user locations and energy consumption, but their data is not accessible to the public. The experiment showed that energy demand can be predicted with a success rate of up to 90 percent with the aid of KNN algorithms.

Figure 1. Intelligent radio learning paradigm. [Diagram labels: smart antenna module, ADC, DAC, radio learning, observations, learning algorithm, utility and cost evaluation, action selection, control.]

Figure 2. Radio learning architecture. [Diagram “Machine learning in 5G”.
Technologies: massive MIMO, femto/small cells and heterogeneous networks (HetNets), cloud radio access networks, cognitive radio, full duplex, energy harvesting, etc.
Machine learning applications: channel estimation/detection, spectrum sensing/access, cell/user clustering, switching and handover among HetNets, signal dimension reduction, energy modeling, user behavior analysis, location prediction, intrusion/fault/anomaly detection, cell/channel selection and association.
Supervised learning: regression models, KNN, SVM (massive MIMO channel estimation/detection; user location/behavior learning/classification); Bayesian learning (massive MIMO channel estimation; spectrum sensing/detection and learning in CR).
Unsupervised learning: k-means clustering (small cell clustering; WiFi association; device-to-device user clustering; HetNet clustering); PCA and ICA (spectrum sensing; anomaly/fault/intrusion detection; signal dimension reduction; smart grid user classification).
Reinforcement learning: MDP, POMDP, Q-learning, multi-armed bandit (decision making under unknown network conditions; resource competition in femto/small cell channel selection and spectrum sharing for device-to-device networks; energy modeling in energy harvesting; HetNet selection/association).]

Bayesian Learning: Massive MIMO and Cognitive Radio

Models: The philosophy of Bayesian learning is to compute the a posteriori probability distribution of the target variables conditioned on their input signals and on all of the training instances. Some simple examples of generative models that may be learned with the aid of Bayesian techniques include, but are not limited to, the Gaussian mixture model (GM) and hidden Markov models (HMM), whose parameters are typically estimated with the expectation maximization (EM) algorithm [3 p. 445].

GM is a model where each data point belongs to one of several clusters or groups, and the data points within each cluster are Gaussian distributed.

EM generalizes maximum likelihood estimation to models with hidden (latent) variables, iteratively finding the most likely solutions or parameters. It is characterized by two steps: the “E” step, which constructs a function representing a lower bound of the likelihood, and the “M” step, which finds the parameters maximizing the chosen function.
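The E/M alternation can be sketched for the simplest case, a one-dimensional mixture of Gaussians. This is a minimal NumPy sketch under our own assumptions (scalar data, a fixed number of components, user-supplied initial guesses, and a fixed iteration count instead of a convergence test); it is not the sparse Bayesian estimator of [6]. The E step computes each sample’s posterior responsibility for each component, which defines the lower bound; the M step re-estimates the weights, means, and variances that maximize it.

```python
import numpy as np


def gaussian_pdf(x, mean, var):
    """Density of N(mean, var), evaluated elementwise with broadcasting."""
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)


def em_gmm_1d(x, means, variances, weights, n_iter=100):
    """Fit a 1-D Gaussian mixture by EM, starting from the given initial guesses."""
    x = np.asarray(x, dtype=float)
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    weights = np.asarray(weights, dtype=float)
    for _ in range(n_iter):
        # E step: responsibility r[n, k] = P(sample n came from component k | x_n).
        dens = weights * gaussian_pdf(x[:, None], means, variances)  # shape (N, K)
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M step: closed-form weights, means and variances maximizing the bound.
        nk = resp.sum(axis=0)
        weights = nk / len(x)
        means = (resp * x[:, None]).sum(axis=0) / nk
        variances = (resp * (x[:, None] - means) ** 2).sum(axis=0) / nk
    return means, variances, weights


# Synthetic data drawn from two Gaussians (illustrative values only).
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2.0, 0.5, 500), rng.normal(3.0, 1.0, 500)])
means, variances, weights = em_gmm_1d(x, means=[-1.0, 1.0],
                                      variances=[1.0, 1.0], weights=[0.5, 0.5])
print(np.round(means, 2), np.round(weights, 2))
```

EM only guarantees a non-decreasing likelihood, so in practice the result depends on the initial guesses; several random restarts are a common safeguard.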

HMM is a tool designed for representing probability distributions of sequences of observations. It can be considered a generalization of a mixture-based model, where the hidden variables, which control the specific mixture component to be selected for each observation, are related to each other through a Markov process, rather than being independent of each other.
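As an illustration of how an HMM links its hidden states through a Markov chain, the following NumPy sketch runs the standard forward (filtering) recursion on a hypothetical two-state PU-occupancy model (PU absent/present, channel sensed idle/busy), loosely in the spirit of the two-state model of [8] described later in this section. All probabilities are made-up values; the function returns the filtered posteriors P(x_t | y_1..y_t) and the log-likelihood of the observation sequence.

```python
import numpy as np


def hmm_filter(obs, init, trans, emit):
    """Forward algorithm with per-step normalization.

    init[i]     = P(x_1 = i)
    trans[i, j] = P(x_{t+1} = j | x_t = i)
    emit[i, o]  = P(y_t = o | x_t = i)
    Returns the filtered posteriors (T x states) and log P(y_1..y_T)."""
    alpha = init * emit[:, obs[0]]
    log_lik = np.log(alpha.sum())
    alpha = alpha / alpha.sum()
    history = [alpha]
    for o in obs[1:]:
        # Predict with the Markov chain, then weight by the observation likelihood.
        alpha = (alpha @ trans) * emit[:, o]
        log_lik += np.log(alpha.sum())
        alpha = alpha / alpha.sum()
        history.append(alpha)
    return np.array(history), log_lik


# Hypothetical parameters: state 0 = PU absent, 1 = PU present;
# observation 0 = channel sensed idle, 1 = sensed busy.
init = np.array([0.5, 0.5])
trans = np.array([[0.9, 0.1],
                  [0.2, 0.8]])
emit = np.array([[0.95, 0.05],
                 [0.10, 0.90]])
obs = [0, 0, 1, 1, 1, 0]
posteriors, log_lik = hmm_filter(obs, init, trans, emit)
print(np.round(posteriors[:, 1], 3))  # filtered P(PU present) at each step
```

Normalizing at every step avoids the numerical underflow that the unnormalized forward variables suffer on long sequences, while the sum of the log normalizers still yields the exact log-likelihood.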

Applications: The Bayesian learning model may be readily invoked for spectral characteristic learning and estimation in next-generation networks. To address the pilot contamination problem encountered in massive MIMO systems, the authors of [6] estimated both the channel parameters of the desired links in a target cell as well as those of the interfering links of the adjacent cells, where channel estimation was carried out with the aid of sparse Bayesian learning techniques. Based on the observation of received signals, the channel component was first modeled by a GM, namely by a weighted sum of Gaussian distributions having different variances, and then estimated with the aid of the EM algorithm.

Three further closely related applications may be found in cognitive radio networks. In [7], a cooperative wideband spectrum sensing scheme based on the EM algorithm was proposed for the detection of a primary user (PU) in a multi-antenna assisted cognitive radio network. This iterative technique first constructed the log-likelihood function of the unknown spectrum occupancy, of the channel information, and of the noise in the “E” step. Then, it maximized the log-likelihood function in order to infer the unknown information during the “M” step, which was carried out by jointly detecting the PU signal and estimating both the channel’s unknown frequency response and the noise variance of multiple subbands.

In contrast to [7], the authors of [8] constructed an HMM relying on a two-state hidden Markov process, where the PUs are either present or absent, and on a two-state observation space indicating whether the PUs are present or absent. Furthermore, the EM algorithm was invoked for finding the true channel parameters, such as the sojourn time of the available channels, the inactive states of the PUs, and the PUs’ signal strength. Finally, the third application of Bayesian learning was advocated in [9], where a tomography model, belonging to the Bayesian inference framework, was proposed for conceiving and statistically characterizing a range of techniques that are capable of extracting the prevalent parameters and traffic/interference patterns for employment in cognitive radio networks at both the link layer and network layer. The parameters collected included both the path delay and the proportion of successful packet receptions, while the estimated parameter was the link’s successful transmission probability. The Bayesian estimators were derived for single/multiple transmissions in single/multiple path scenarios. In Table 1, we summarize the basic characteristics and applications of supervised machine learning algorithms.

Table 1. Supervised machine learning algorithms.

Category | Learning techniques | Key characteristics | Application in 5G
Supervised learning | Regression models | Estimate the variables’ relationships; linear and logistic regression | Energy learning [5]
Supervised learning | K-nearest neighbor | Majority vote of neighbors | Energy learning [5]
Supervised learning | Support vector machines | Non-linear mapping to high dimension; separating hyperplane classification | MIMO channel learning [4]
Supervised learning | Bayesian learning | A posteriori distribution calculation; GM, EM, and HMM | Massive MIMO learning [6]; cognitive spectrum learning [7–9]

Unsupervised Learning in Wireless Communications

K-Means Clustering: Heterogeneous Networks

Models: K-means clustering aims to partition n observations into k clusters, where each observation belongs to the closest cluster. It defines the centroid of a cluster as its center of gravity, that is, the mean value of the points within the cluster. The clustering algorithm proceeds iteratively: each object is assigned to the cluster whose centroid is nearest to it according to the Euclidean distance ‘similarity metric’, and then the in-cluster differences are minimized by updating the cluster centroids, until ‘convergence’ is achieved. Explicitly, convergence is deemed to be achieved when the assignment becomes stable, that is, the clusters formed in the current round are the same as those formed in the previous round [3 p. 161, 317].
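The iteration described above, including the “stable assignment” stopping rule, can be sketched in NumPy as follows. The maximin initialization (start from a random point, then repeatedly add the point farthest from the centroids chosen so far), the handling of empty clusters, and the iteration cap are our own assumptions; the scheme of [10] additionally applies its own GAP initialization and virtual-channel rules, which are not modeled here.

```python
import numpy as np


def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's k-means: alternate nearest-centroid assignment and centroid update
    until the assignment no longer changes (or n_iter rounds have elapsed)."""
    rng = np.random.default_rng(seed)
    points = np.asarray(points, dtype=float)
    # Maximin initialization (our choice): a random first centroid, then the
    # point farthest from all centroids picked so far.
    centroids = [points[rng.integers(len(points))]]
    for _ in range(1, k):
        d2 = np.min([((points - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(points[d2.argmax()])
    centroids = np.array(centroids)
    labels = None
    for _ in range(n_iter):
        # Assignment step: squared Euclidean distance from every point to every centroid.
        dists = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = dists.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # converged: same clusters as in the previous round
        labels = new_labels
        # Update step: each centroid moves to the mean (center of gravity) of its members.
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster becomes empty
                centroids[j] = members.mean(axis=0)
    return labels, centroids


# Hypothetical 2-D coordinates, e.g., of access points forming two groups.
pts = np.array([[0, 0], [0, 1], [1, 0], [1, 1],
                [10, 10], [10, 11], [11, 10], [11, 11]], dtype=float)
labels, centroids = kmeans(pts, k=2)
print(labels, centroids)
```

Plain random initialization can get stuck in a poor local optimum even on well-separated data, which is why a spread-out initialization (or several restarts) is commonly used.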

Applications: Clustering is a common problem in 5G networks, especially in heterogeneous scenarios associated with diverse cell sizes as well as WiFi and D2D networks. For example, the small cells have to be carefully clustered to avoid interference using coordinated multi-point transmission (CoMP), the mobile users are clustered to obey an optimal offloading policy, the devices are clustered in D2D networks to achieve high energy efficiency, the WiFi users are clustered to maintain an optimal access point association, and so on. In [10], the authors considered a hybrid optical/wireless network scenario in order to reduce the overall wireless tele-traffic by encouraging the utilization of the high-capacity optical infrastructure. A mixed integer programming (MIP) problem was formulated to jointly optimize both the gateway partitioning and the virtual-channel allocation based on classic k-means clustering, which was employed to partition the mesh access points (MAPs) into several groups. The proposed scheme commenced its operation from an initial gateway access point (GAP) set, which can be obtained by random selection from the set of MAPs or can be more astutely determined using a meritorious initialization criterion. Next, each MAP is assigned to its nearest GAP. If several eligible GAPs are in the vicinity, then the specific GAP that has a readily available virtual channel is chosen. Finally, by using the classic k-means clustering algorithm, the MAPs are divided into k groups associated with the closest GAPs.

Principal and Independent Component Analysis: Smart Grid and Cognitive Radio

Models: Principal component analysis (PCA) transforms a set of potentially correlated variables into a set of uncorrelated variables, referred to as the principal components, where the number of principal components is less than or equal to the number of original variables. The first principal component has the largest possible variance (i.e., accounts for as much of the variability in the data as possible), and each succeeding component in turn has the highest variance possible under the constraint that it is orthogonal to (i.e., uncorrelated with) the preceding components. The principal components are orthogonal because they are the eigenvectors of the covariance matrix, which is symmetric. By contrast, independent component analysis (ICA) is a statistical technique conceived to reveal hidden factors that underlie sets of random variables, measurements, or signals. In the model, the data variables are assumed to be linear mixtures of some unknown latent variables, and the mixing system is also unknown. The latent variables are assumed to be non-Gaussian and mutually independent, and they are referred to as the independent components of the observed data, which can be found by ICA [3 p. 115].
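A minimal NumPy sketch of PCA exactly as characterized above: center the data, eigendecompose the symmetric covariance matrix, and sort the eigenvectors by decreasing eigenvalue (i.e., variance). The function name and the toy data are illustrative assumptions. ICA is not shown, since it needs an iterative non-Gaussianity criterion (e.g., FastICA) that goes beyond a short sketch.

```python
import numpy as np


def pca(X, n_components):
    """PCA via eigendecomposition of the sample covariance matrix.

    X has shape (n_samples, n_features). Returns the principal-component scores,
    the principal directions (as columns), and the variance along each direction."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                    # center every variable
    cov = np.cov(Xc, rowvar=False)             # symmetric (n_features x n_features)
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigh: real eigenpairs, ascending order
    order = np.argsort(eigvals)[::-1]          # largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    components = eigvecs[:, :n_components]     # orthonormal principal directions
    scores = Xc @ components                   # uncorrelated principal components
    return scores, components, eigvals[:n_components]


# Two strongly correlated synthetic variables (illustrative only).
rng = np.random.default_rng(1)
z = rng.normal(size=1000)
X = np.column_stack([z, 2.0 * z + 0.1 * rng.normal(size=1000)])
scores, components, variances = pca(X, n_components=2)
print(np.round(variances, 3))  # the first component carries almost all the variance
```

Because the eigenvectors of a symmetric matrix are orthogonal, the covariance of the scores is diagonal, which is precisely the “uncorrelated” property stated in the text.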

Applications: PCA and ICA constitute powerful statistical signal processing techniques; ICA in particular is devised to recover statistically independent source signals from their linear mixtures. One of their major applications may be found in the area of anomaly-detection, fault-detection, and intrusion-detection problems of wireless networks, which rely on traffic monitoring. Furthermore, similar problems may also be solved in sensor networks, mesh networks, and so on. They can also be invoked for the physical layer signal dimension reduction of massive MIMO systems or to classify the primary users’ behaviors in cognitive radio networks. As a further example, in [11] PCA and ICA were applied in a smart grid scenario to recover the simultaneous wireless transmissions of smart utility meters installed in each home. At the power utility station, the signals received from all the smart meters had to be separated before they could be decoded. The statistical properties of the signals were exploited to blindly separate them using ICA. This operation is capable of enhancing both the transmission efficiency, by avoiding channel estimation in each frame, and the data security, by eliminating any wideband interference or jamming signals. More explicitly, a substantial security enhancement was achieved by a robust version of the PCA-based method, which exploited the sparse, low-rank nature of the auto-covariance matrices of the smart metering signal and of the wideband interferer, respectively, in order to confidently separate them prior to ICA processing. Another pertinent example is found in cognitive radio scenarios, where the so-called Boolean ICA relied on the Boolean mixing of OR, XOR, and other functions of binary signals [12]. It was also incorporated into the PU separation problem often encountered in cognitive radio networks for the sake of distinguishing and characterizing the activities of PUs in the context of collaborative spectrum sensing. Furthermore, the observations of the secondary users (SUs) were modeled as Boolean OR mixtures of the underlying binary PU sources. An iterative algorithm, called Binary ICA, was developed to determine the activities of the underlying latent signal sources, such as the PUs. It was demonstrated that given m monitors or SUs, the activities of up to 2^m − 1 distinct PUs can be inferred. In Table 2, we summarize the basic characteristics and applications of unsupervised machine learning algorithms.
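The Boolean OR mixing model used for the SU observations can be written down directly: an SU reports the channel busy whenever at least one PU that it can sense is active. The sketch below only generates such mixtures (it does not implement Binary ICA), and the sensing matrix A and activity matrix S are hypothetical. It also counts the non-empty subsets of m monitors, which is one way (our reading, not a proof from [12]) to see where the bound of 2^m − 1 inferable PUs comes from: a PU can only be told apart by the distinct non-empty set of SUs that sense it.

```python
from itertools import product

import numpy as np


def boolean_or_mixture(A, S):
    """Boolean OR mixing: X[i, t] = OR_j (A[i, j] AND S[j, t]).

    A: (m x n) binary matrix, A[i, j] = 1 if SU/monitor i can sense PU j (hypothetical).
    S: (n x T) binary matrix of PU activity per time slot.
    Returns the (m x T) binary observations reported by the SUs."""
    A = np.asarray(A, dtype=bool)
    S = np.asarray(S, dtype=bool)
    # An SU reports "busy" if at least one PU that it can sense is active.
    return (A[:, :, None] & S[None, :, :]).any(axis=1).astype(int)


# Two SUs and three PUs; each PU is sensed by a different non-empty set of SUs.
A = [[1, 0, 1],
     [0, 1, 1]]
S = [[1, 0, 0, 0],   # PU 0 active in slot 0
     [0, 1, 0, 0],   # PU 1 active in slot 1
     [0, 0, 1, 0]]   # PU 2 active in slot 2
print(boolean_or_mixture(A, S))

# Non-empty subsets of m monitors, i.e., the possible distinct non-zero columns of A.
m = 2
signatures = [c for c in product([0, 1], repeat=m) if any(c)]
print(len(signatures), 2 ** m - 1)  # 3 3
```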

Reinforcement Learning in Wireless Communications

Partially Observable Markov Decision Process: Energy Harvesting

Models: Markov decision processes (MDPs) provide a mathematical framework for modeling decision making in specific situations where the outcomes are partly random and partly under the control of a decision maker, as illustrated in Fig. 3a. At each time step, the process is in some state s, and the decision maker may opt for any of the legitimate actions a that are available in state s. The process responds at the next time step by randomly moving into a new state s′ and giving the decision maker a corresponding reward U(s).

The probability that the process moves into its new state s′ is influenced both by the specific action chosen and by the system’s inherent transitions, formally described by the state transition probability P(s′|s, a). Given s and a, the state transition probability is conditionally independent of all previous states and actions, that is, the state transitions of an MDP satisfy the fundamental Markov property. By contrast, a partially observable Markov decision process (POMDP) may be viewed as a generalization of an MDP, where the agent is unable to directly observe the underlying state transitions and hence only has partial knowledge, as shown in Fig. 3b. The agent therefore has to keep track of a probability distribution over the legitimate states, based on a set of observations, the observation probabilities, and the underlying MDP [3 p. 517].
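The belief tracking that distinguishes a POMDP from an MDP is a Bayes filter, b′(s′) ∝ P(o|s′, a) Σ_s P(s′|s, a) b(s). A minimal NumPy sketch follows, assuming the common notation O[a][s′, o] = P(o | s′, a) for the observation probabilities (Fig. 3 uses a different O(s′|s,a) notation) and a hypothetical two-state channel (good/bad) that is observed only through ACK/NACK feedback, loosely analogous to the partial CSI conveyed by feedback messages in the energy harvesting example of [13].

```python
import numpy as np


def belief_update(belief, action, obs, P, O):
    """One Bayes-filter step for a POMDP.

    P[a][s, s2] = P(s2 | s, a)  (state transition probabilities)
    O[a][s2, o] = P(o | s2, a)  (observation probabilities, assumed notation)
    Returns the posterior belief over the next state s2."""
    predicted = belief @ P[action]             # sum_s P(s2 | s, a) * b(s)
    updated = predicted * O[action][:, obs]    # weight by the observation likelihood
    return updated / updated.sum()             # normalize to a probability distribution


# Hypothetical model: state 0 = good channel, 1 = bad channel; a single action
# (0 = transmit); observation 0 = ACK, 1 = NACK.
P = [np.array([[0.8, 0.2],
               [0.3, 0.7]])]
O = [np.array([[0.9, 0.1],
               [0.2, 0.8]])]
b = np.array([0.5, 0.5])
b = belief_update(b, action=0, obs=1, P=P, O=O)   # a NACK shifts belief toward "bad"
print(np.round(b, 3))
```

A POMDP policy then maps this belief vector (rather than the unobservable state) to an action, which is why exact POMDP solutions become expensive: the belief space is continuous.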

Applications: The family of MDP/POMDP models constitutes an ideal tool for supporting decision making in 5G networks, where the users may be regarded as agents and the network constitutes the environment. There are usually three steps associated with modeling a problem using an MDP. The first step is to specify the system’s state space and the decision maker’s action space, as well as to verify the Markov property. The second step is that of constructing the state transition probabilities P(s′|s, a), formulated as the probability of traversing from state s to s′ under action a. The last step is to quantify both the decision maker’s immediate reward U(s) and its long-term reward using Bellman’s equation [13]. Then, a carefully constructed iterative algorithm may be conceived to identify the optimal action in each state.
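The three modeling steps and the iterative solution can be illustrated with value iteration, a standard way to solve the discounted Bellman equation V(s) = max_a [U(s) + γ Σ_{s′} P(s′|s, a) V(s′)]. The discount factor γ, the stopping tolerance, and the two-state toy MDP below are our own assumptions, not parameters from [13].

```python
import numpy as np


def value_iteration(P, U, gamma=0.9, tol=1e-10, max_iter=10_000):
    """Solve V(s) = max_a [U(s) + gamma * sum_s2 P(s2|s,a) V(s2)] by fixed-point iteration.

    P has shape (A, S, S) with P[a, s, s2] = P(s2 | s, a); U has shape (S,).
    Returns the value function V and a greedy policy (one action index per state)."""
    V = np.zeros(P.shape[1])
    for _ in range(max_iter):
        # P @ V has shape (A, S): expected next-state value for each (action, state).
        V_new = (U + gamma * (P @ V)).max(axis=0)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    policy = (U + gamma * (P @ V)).argmax(axis=0)
    return V, policy


# Step 1: two states, two actions. Step 2: transition probabilities.
# Step 3: immediate reward U(s); only state 1 is rewarding (hypothetical values).
P = np.array([[[0.9, 0.1], [0.1, 0.9]],    # action 0 ("stay")
              [[0.2, 0.8], [0.5, 0.5]]])   # action 1 ("switch")
U = np.array([0.0, 1.0])
V, policy = value_iteration(P, U)
print(np.round(V, 3), policy)
```

Since the Bellman operator is a γ-contraction, the iteration converges from any starting V; the greedy policy here switches out of the unrewarding state 0 and stays in state 1.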

Classical applications found in the literature include the network selection/association problems of heterogeneous networks (HetNets), channel sensing, and user access in cognitive radio networks, and so on. Furthermore, energy harvesting (EH) has also been extensively modeled using MDP/POMDP, where the limited battery and the time-variant channels are usually regarded as the environment, while the users’ channel selection or battery utilization are usually considered as the actions. For instance, in [13] the transmission power control problems of EH systems were investigated using the POMDP model, where the state space was defined by the battery state, the channel state, and the packet transmission/reception states, while an action of the node corresponded to sending a packet at a certain power level. The feedback messages implicitly provided the EH system with partial channel state information (CSI), which resulted in the corresponding POMDP formulation. Since finding exact solutions to the POMDP tends to be computationally intractable [13], a pair of computationally efficient suboptimal solutions, namely the maximum-likelihood heuristic policy and the voting heuristic policy, were explored.

Q-Learning: Femto/Small Cells

Models: Q-learning may be invoked to find an optimal action policy for any given (finite) Markov decision process, especially when the system model is unknown, as shown in Fig. 3c. It is a model-free reinforcement learning technique, and as such it can be used in conjunction with MDP models. In such a case, the Q-learning model also consists of an agent, a set of states S, and a set of actions A per state. By executing an action in a specific state, the agent gleans a reward, and its goal is to maximize its accumulated reward. This accumulated reward is captured by a Q-function, where “Q” is initialized to an (arbitrary) fixed value. Then, “Q” is updated iteratively after the agent carries out an action and observes the resultant reward as well as the associated new state at each time instant [3 p. 517].
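A minimal tabular Q-learning sketch, using the standard update Q(s, a) ← Q(s, a) + α[r + γ max_{a′} Q(s′, a′) − Q(s, a)], i.e., “old value + learned value” as in Fig. 3c. The learning rate α, the discount γ, the ε-greedy exploration, and the simulated two-state environment are illustrative assumptions. The transition matrix is used only to simulate the environment: the update sees nothing but sampled transitions and rewards, which is what makes the method model-free.

```python
import numpy as np


def q_learning(P, U, n_steps=50_000, alpha=0.02, gamma=0.9, epsilon=0.3, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration.

    P[a, s, s2] and U[s] are used only to *simulate* the environment; the update
    itself sees nothing but sampled (state, action, reward, next state) tuples."""
    rng = np.random.default_rng(seed)
    n_actions, n_states, _ = P.shape
    Q = np.zeros((n_states, n_actions))                # arbitrary fixed initial value
    s = 0
    for _ in range(n_steps):
        if rng.random() < epsilon:                     # explore
            a = int(rng.integers(n_actions))
        else:                                          # exploit the current estimate
            a = int(Q[s].argmax())
        s_next = int(rng.choice(n_states, p=P[a, s]))  # sampled environment transition
        r = U[s]                                       # reward U(s) of the current state
        # Q <- old value + alpha * (learned value - old value)
        learned = r + gamma * Q[s_next].max()
        Q[s, a] += alpha * (learned - Q[s, a])
        s = s_next
    return Q


# Hypothetical two-state MDP: only state 1 yields a reward.
P = np.array([[[0.9, 0.1], [0.1, 0.9]],    # action 0 ("stay")
              [[0.2, 0.8], [0.5, 0.5]]])   # action 1 ("switch")
U = np.array([0.0, 1.0])
Q = q_learning(P, U)
print(Q.argmax(axis=1))  # greedy action per state, learned from samples alone
```

A small constant α keeps the estimates stable at the cost of slower learning; decaying learning rates are the usual choice when formal convergence guarantees are needed.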

Applications: Q-learning has also been extensively applied in heterogeneous networks, usually in conjunction with the aforementioned MDP models. In [14] the authors presented a heterogeneous fully distributed multi-objective strategy based on a reinforcement learning model con-

Table 2. Unsupervised machine learning algorithms.

Category | Learning techniques | Key characteristics | Application in 5G
Unsupervised learning | K-means clustering | K-partition clustering; iterative updating algorithm | Heterogeneous networks [10]
Unsupervised learning | PCA | Orthogonal transformation | Smart grid [11]
Unsupervised learning | ICA | Reveal hidden independent factors | Spectrum learning in cognitive radio [12]

Figure 3. Illustration of reinforcement learning: a) Markov decision process; b) partially observed Markov decision process; c) Q-learning. [Panels: a) actions and rewards exchanged with the system/environment, whose transition probabilities P(s'|s,a) are known; b) true transition probabilities P(s'|s,a), only partially observed through O(s'|s,a); c) observe, learn, rewards, with unknown P(s'|s,a) and the update “Q = old value + learned value”.]

