An Overview on Application of Machine Learning
Techniques in Optical Networks
Francesco Musumeci, Member, IEEE, Cristina Rottondi, Member, IEEE, Avishek Nag, Member, IEEE, Irene Macaluso, Darko Zibar, Member, IEEE, Marco Ruffini, Senior Member, IEEE, and Massimo Tornatore, Senior Member, IEEE

Francesco Musumeci and Massimo Tornatore are with Politecnico di Milano, Italy, e-mail: francesco.musumeci@polimi.it, massimo.tornatore@polimi.it.
Cristina Rottondi is with Dalle Molle Institute for Artificial Intelligence, Switzerland, email: cristina.rottondi@supsi.ch.
Avishek Nag is with University College Dublin, Ireland, email: avishek.nag@ucd.ie.
Irene Macaluso and Marco Ruffini are with Trinity College Dublin, Ireland, email: macalusi@tcd.ie, ruffinm@tcd.ie.
Darko Zibar is with Technical University of Denmark, Denmark, email: dazi@fotonik.dtu.dk.
Abstract—Today’s telecommunication networks have become
sources of enormous amounts of widely heterogeneous data. This
information can be retrieved from network traffic traces, network
alarms, signal quality indicators, users’ behavioral data, etc.
Advanced mathematical tools are required to extract meaningful
information from these data and take decisions pertaining to the
proper functioning of the networks from the network-generated
data. Among these mathematical tools, Machine Learning (ML)
is regarded as one of the most promising methodological approaches to perform network-data analysis and enable automated
network self-configuration and fault management.
The adoption of ML techniques in the field of optical communication networks is motivated by the unprecedented growth
of network complexity faced by optical networks in the last
few years. Such complexity increase is due to the introduction
of a huge number of adjustable and interdependent system
parameters (e.g., routing configurations, modulation format,
symbol rate, coding schemes, etc.) that are enabled by the usage
of coherent transmission/reception technologies, advanced digital
signal processing and compensation of nonlinear effects in optical
fiber propagation.
In this paper we provide an overview of the application of
ML to optical communications and networking. We classify and
survey relevant literature dealing with the topic, and we also
provide an introductory tutorial on ML for researchers and
practitioners interested in this field. Although a good number of
research papers have recently appeared, the application of ML
to optical networks is still in its infancy: to stimulate further
work in this area, we conclude the paper proposing new possible
research directions.
Index Terms—Machine learning, Data analytics, Optical communications and networking, Neural networks, Bit Error Rate,
Optical Signal-to-Noise Ratio, Network monitoring.
I. INTRODUCTION
Machine learning (ML) is a branch of Artificial Intelligence that pushes forward the idea that, given access to the right data, machines can learn by themselves how to solve a specific problem [1]. By leveraging complex mathematical and statistical tools, ML renders machines capable of independently performing intellectual tasks that have been traditionally
solved by human beings. This idea of automating complex
tasks has generated high interest in the networking field, on
the expectation that several activities involved in the design
and operation of communication networks can be offloaded to
machines. Some applications of ML have already matched these expectations in networking areas such as intrusion detection [2], traffic classification [3], and cognitive radios [4].
Among various networking areas, in this paper we focus
on ML for optical networking. Optical networks constitute
the basic physical infrastructure of all large-provider networks
worldwide, thanks to their high capacity, low cost and many
other attractive properties [5]. They are now penetrating new important telecom markets such as datacom [6] and the access
segment [7], and there is no sign that a substitute technology
might appear in the foreseeable future. Different approaches
to improve the performance of optical networks have been
investigated, such as routing, wavelength assignment, traffic
grooming and survivability [8], [9].
In this paper we give an overview of the application of
ML to optical networking. Specifically, the contribution of
the paper is twofold, namely, i) we provide an introductory
tutorial on the use of ML methods and on their application in
the optical networks field, and ii) we survey the existing work
dealing with the topic, also performing a classification of the
various use cases addressed in literature so far. We cover both
the areas of optical communication and optical networking
to potentially stimulate new cross-layer research directions.
In fact, ML applications can be especially useful in cross-layer settings, where data analysis at the physical layer, e.g., monitoring the Bit Error Rate (BER), can trigger changes at the network layer, e.g., in routing, spectrum and modulation format assignments.
The application of ML to optical communication and networking is still in its infancy, and the literature survey included in this paper aims at providing an introductory reference for researchers and practitioners willing to get acquainted with existing ML applications as well as to investigate new research directions.
A legitimate question that arises in the optical networking field today is: why is machine learning, a methodological area that has been applied and investigated for at least three decades, only gaining momentum now? The answer is certainly multifaceted, and it most likely involves aspects that are not purely technical [10]. From a technical perspective, though, recent technical progress at both the optical communication system and network level is at the basis of an unprecedented growth
in the complexity of optical networks.
On the system side, while optical channel modeling has always been complex, the recent adoption of coherent technologies [11] has made modeling even more difficult by introducing a plethora of adjustable design parameters (such as modulation formats, symbol rates, adaptive coding rates and flexible channel spacing) to optimize transmission systems in terms of the bit-rate × transmission-distance product. In addition, what makes this optimization even more challenging is that the optical channel is highly nonlinear.
From a networking perspective, the increased complexity of
the underlying transmission systems is reflected in a series of
advancements in both the data plane and the control plane. At the data plane, the Elastic Optical Network (EON) concept [12]–[15] has emerged as a novel optical network architecture able to respond to the increased need for elasticity in allocating optical network resources. In contrast to traditional fixed-grid Wavelength Division Multiplexing (WDM) networks, EON offers flexible (almost continuous) bandwidth allocation. Resource allocation in EON can be performed to adapt to the several above-mentioned decision variables made available by new transmission systems, including different transmission techniques, such as Orthogonal Frequency Division Multiplexing (OFDM) and Nyquist WDM (NWDM), transponder types (e.g., BVT and S-BVT; for a complete list of acronyms, the reader is referred to the Glossary at the end of the paper), modulation formats (e.g., QPSK, QAM), and coding rates. This flexibility makes the resource allocation
problems much more challenging for network engineers. At the control plane, dynamic control, as in Software-Defined Networking (SDN), promises to enable long-awaited on-demand reconfiguration and virtualization. Moreover, reconfiguring the optical substrate poses several challenges in terms of, e.g., network re-optimization, spectrum fragmentation, amplifier power settings, and unexpected penalties due to non-linearities, which call for strict integration between the control elements (SDN controllers, network orchestrators) and optical performance monitors working at the equipment level.
All these degrees of freedom and limitations do pose severe
challenges to system and network engineers when it comes
to deciding what the best system and/or network design
is. Machine learning is currently perceived as a paradigm shift for the design of future optical networks and systems. These techniques should make it possible to infer, from data obtained by various types of monitors (e.g., signal quality, traffic samples, etc.), useful characteristics that could not be easily or directly measured. Some envisioned applications in the optical domain include fault prediction, intrusion detection, physical-flow security, impairment-aware routing, low-margin design, and traffic-aware capacity reconfigurations, but many others can be envisioned and will be surveyed in the next sections.
The survey is organized as follows. In Section II, we
overview some preliminary ML concepts, focusing especially
on those targeted in the following sections. In Section III
we discuss the main motivations behind the application of
ML in the optical domain and we classify the main areas of
applications. In Section IV and Section V, we classify and
summarize a large number of studies describing applications
of ML at the transmission layer and network layer. In Section
VI, we quantitatively overview a selection of existing papers,
identifying, for some of the applications described in Section III, the ML algorithms which demonstrated the highest effectiveness for each specific use case, and the performance metrics considered for their evaluation. Finally, Section VII
discusses some possible open areas of research and future
directions, whereas Section VIII concludes the paper.
II. OVERVIEW OF MACHINE LEARNING METHODS USED IN
OPTICAL NETWORKS
This section provides an overview of some of the most
popular algorithms that are commonly classified as machine
learning. The literature on ML is so extensive that even a
superficial overview of all the main ML approaches goes far
beyond the scope of this section, and the reader can refer to a number of fundamental books on the subject [16]–[20]. However, in this section we provide a high-level view of
the main ML techniques that are used in the work we reference
in the remainder of this paper. We here provide the reader
with some basic insights that might help better understand the
remaining parts of this survey paper. We divide the algorithms into three main categories, described in the next sections, which are also represented in Fig. 1: supervised learning, unsupervised learning and reinforcement learning. Semi-supervised
learning, a hybrid of supervised and unsupervised learning, is
also introduced. ML algorithms have been successfully applied
to a wide variety of problems. Before delving into the different
ML methods, it is worth pointing out that, in the context of
telecommunication networks, there has been over a decade
of research on the application of ML techniques to wireless
networks, ranging from opportunistic spectrum access [21] to
channel estimation and signal detection in OFDM systems
[22], to Multiple-Input-Multiple-Output communications [23],
and dynamic frequency reuse [24].
A. Supervised learning
Supervised learning is used in a variety of applications, such
as speech recognition, spam detection and object recognition.
The goal is to predict the value of one or more output variables
given the value of a vector of input variables x. The output
variable can be a continuous variable (regression problem)
or a discrete variable (classification problem). A training
data set comprises N samples of the input variables and
the corresponding output values. Different learning methods construct a function y(x) that allows predicting the value of the output variables in correspondence to a new value of the inputs. Supervised learning can be broken down into two main classes, described below: parametric models, where the number of parameters to use in the model is fixed, and nonparametric models, where the number of parameters depends on the training set.
1) Parametric models: In this case, the function y is a
combination of a fixed number of parametric basis functions.
These models use training data to estimate a fixed set of
parameters w. After the learning stage, the training data can
be discarded since the prediction in correspondence to new inputs is computed using only the learned parameters w.

(a) Supervised Learning: the algorithm is trained on a dataset that consists of paths, wavelengths, modulation and the corresponding BER. Then it extrapolates the BER in correspondence to new inputs. (b) Unsupervised Learning: the algorithm identifies unusual patterns in the data, consisting of wavelengths, paths, BER, and modulation. (c) Reinforcement Learning: the algorithm learns by receiving feedback on the effect of modifying some parameters, e.g., the power and the modulation.
Fig. 1: Overview of machine learning algorithms applied to optical networks.
Fig. 2: Example of a NN with two layers of adaptive parameters. The bias parameters of the input layer and the hidden layer are represented as weights from additional units with fixed value 1 (x_0 and h_0).

Linear models for regression and classification, which consist of a linear combination of fixed nonlinear basis functions, are the simplest parametric models in terms of analytical and
computational properties. Many different choices are available
for the basis functions: from polynomial to Gaussian, to
sigmoidal, to Fourier basis, etc. In case of multiple output
values, it is possible to use separate basis functions for each
component of the output or, more commonly, apply the same
set of basis functions for all the components. Note that these
models are linear in the parameters w, and this linearity
results in a number of advantageous properties, e.g., closed-form solutions to the least-squares problem. However, their applicability is limited to problems with a low-dimensional input space.
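To make the linearity in the parameters concrete, the following minimal Python sketch (our illustration, not taken from the surveyed papers; the polynomial basis, the synthetic data and all numerical choices are arbitrary) fits a linear-in-w model with the closed-form least-squares solution:

```python
# A minimal sketch of a linear-in-the-parameters model: polynomial basis
# functions fitted in closed form by least squares. Purely illustrative data.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 1-D regression problem: y = sin(2*pi*x) + noise.
x = rng.uniform(0, 1, size=50)
t = np.sin(2 * np.pi * x) + rng.normal(scale=0.1, size=x.shape)

def polynomial_design_matrix(x, degree):
    """Map scalar inputs to polynomial basis functions phi_j(x) = x**j."""
    return np.vander(x, degree + 1, increasing=True)

Phi = polynomial_design_matrix(x, degree=5)

# Closed-form least-squares solution: w = (Phi^T Phi)^{-1} Phi^T t.
# lstsq is numerically preferable to forming the normal equations explicitly.
w, *_ = np.linalg.lstsq(Phi, t, rcond=None)

# Predict at new inputs using only the learned parameters w.
x_new = np.linspace(0, 1, 5)
y_new = polynomial_design_matrix(x_new, degree=5) @ w
print(y_new)
```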
In the remainder of this subsection we focus on neural networks (NNs), since they are the most successful example of parametric models. Note that NNs are often referred to as Artificial Neural Networks (ANNs); in this paper we use the two terms interchangeably.
NNs apply a series of functional transformations to the
inputs (see chapter V in [16], chapter VI in [17], and chapter
XVI in [20]). A NN is a network of units or neurons. The
basis function or activation function used by each unit is
a nonlinear function of a linear combination of the unit’s
inputs. Each neuron has a bias parameter that allows for any
fixed offset in the data. The bias is incorporated in the set of
parameters by adding a dummy input of unitary value to each
unit (see Figure 2). The coefficients of the linear combination
are the parameters w estimated during the training. The most
commonly used nonlinear functions are the logistic sigmoid
and the hyperbolic tangent. The activation function of the output units of the NN is the identity function, the logistic sigmoid function, or the softmax function, for regression, binary classification, and multiclass classification problems, respectively.
Different types of connections between the units result in
different NNs with distinct characteristics. All units between
the inputs and output of the NN are called hidden units. In
the case of a NN, the network is a directed acyclic graph.
Typically, NNs are organized in layers, with units in each layer
receiving inputs only from units in the immediately preceding
layer and forwarding their output only to the immediately
following layer. NNs with one layer of hidden units and linear output units can approximate arbitrarily well any continuous function on a compact domain, provided that a sufficient number of hidden units is used [25].
Given a training set, a NN is trained by minimizing an error
function with respect to the set of parameters w. Depending
on the type of problem and the corresponding choice of
activation function of the output units, different error functions
are used. Typically in case of regression models, the sum
of square error is used, whereas for classification the cross-
entropy error function is adopted. It is important to note that
the error function is a non convex function of the network
parameters, for which multiple optimal local solutions exist.
Iterative numerical methods based on gradient information are
the most common methods used to find the vector w that min-
imizes the error function. For a NN the error backpropagation
algorithm, which provides an efficient method for evaluating
the derivatives of the error function with respect to w, is the
most commonly used.
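The following minimal sketch (an illustration under simplifying assumptions: one hidden layer of tanh units, identity output units, sum-of-squares error, plain gradient descent on synthetic data) shows how backpropagation evaluates the derivatives used to update w:

```python
# A minimal sketch of a one-hidden-layer NN for regression, trained by
# gradient descent with error backpropagation. Shapes, learning rate and
# the synthetic data are illustrative choices, not the survey's setup.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))                 # 200 samples, 3 input features
t = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2      # synthetic regression target

n_hidden, lr = 10, 0.01
W1 = rng.normal(scale=0.5, size=(3, n_hidden)); b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.5, size=(n_hidden, 1)); b2 = np.zeros(1)

for epoch in range(2000):
    # Forward pass: tanh hidden units, identity output (regression).
    h = np.tanh(X @ W1 + b1)
    y = (h @ W2 + b2).ravel()

    # Gradient of the (mean) sum-of-squares error at the output.
    delta_out = (y - t)[:, None] / len(X)

    # Backpropagation: chain rule through the hidden layer.
    grad_W2 = h.T @ delta_out
    grad_b2 = delta_out.sum(axis=0)
    delta_h = (delta_out @ W2.T) * (1 - h ** 2)   # tanh'(a) = 1 - tanh(a)^2
    grad_W1 = X.T @ delta_h
    grad_b1 = delta_h.sum(axis=0)

    # Gradient-descent update of all parameters w.
    W2 -= lr * grad_W2; b2 -= lr * grad_b2
    W1 -= lr * grad_W1; b1 -= lr * grad_b1

print("final training MSE:", np.mean((y - t) ** 2))
```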
We should at this point mention that, before training the
network, the training set is typically pre-processed by applying
a linear transformation to rescale each of the input variables
independently in case of continuous data or discrete ordinal
data. The transformed variables have zero mean and unit
standard deviation. The same procedure is applied to the target
values in case of regression problems. In case of discrete
categorical data, a 1-of-K coding scheme is used. This form of
pre-processing is known as feature normalization and it is used
before training most ML algorithms, since most models are designed with the assumption that all features have comparable scales (decision-tree-based models are a well-known exception).
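A minimal sketch of this pre-processing (the data and the roles of the two columns are illustrative assumptions) might look as follows:

```python
# A minimal sketch of the pre-processing described above: standardize a
# continuous feature to zero mean and unit standard deviation, and apply
# 1-of-K (one-hot) coding to a categorical feature. Illustrative data.
import numpy as np

# Column 0: continuous feature; column 1: category label in {0, 1, 2}.
X = np.array([[10.0, 0], [12.0, 2], [9.0, 1], [11.0, 0]])

# Feature normalization (z-score) for the continuous column.
mean, std = X[:, 0].mean(), X[:, 0].std()
x_cont = (X[:, 0] - mean) / std

# 1-of-K coding for the categorical column (K = 3 categories).
categories = X[:, 1].astype(int)
one_hot = np.eye(3)[categories]

X_pre = np.column_stack([x_cont, one_hot])
print(X_pre)
```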
2) Nonparametric models: In nonparametric methods the
number of parameters depends on the training set. These
methods keep a subset or the entirety of the training data
and use them during prediction. The most widely used approaches
are k-nearest neighbor models (see chapter IV in [17]) and
support vector machines (SVMs) (see chapter VII in [16] and
chapter XIV in [20]). Both can be used for regression and
classification problems.
In the case of k-nearest neighbor methods, all training
data samples are stored (training phase). During prediction,
the k-nearest samples to the new input value are retrieved.
For classification problems, a voting mechanism is used; for
regression problems, the mean or median of the k nearest
samples provides the prediction. To select the best value of k,
cross-validation [26] can be used. Depending on the dimension
of the training set, iterating through all samples to compute
the closest k neighbors might not be feasible. In this case, k-d
trees or locality-sensitive hash tables can be used to compute
the k-nearest neighbors.
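A minimal k-nearest-neighbor sketch of the store-then-vote procedure just described (the data, the value of k and the Euclidean distance are illustrative assumptions):

```python
# A minimal sketch of k-nearest-neighbor prediction: store the training
# set, then classify a new point by majority vote among the k closest
# samples under Euclidean distance.
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    distances = np.linalg.norm(X_train - x_new, axis=1)
    nearest = np.argsort(distances)[:k]   # indices of the k closest samples
    votes = Counter(y_train[nearest])     # voting mechanism (classification)
    return votes.most_common(1)[0][0]
    # For regression, one would instead return np.mean(y_train[nearest]).

X_train = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([0.8, 0.9]), k=3))  # -> 1
```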
In SVMs, basis functions are centered on training samples;
the training procedure selects a subset of the basis functions.
3
However, decision tree based models are a well-known exception.
The number of selected basis functions, and the number of
training samples that have to be stored, is typically much
smaller than the cardinality of the training dataset. SVMs
build a linear decision boundary with the largest possible
distance from the training samples. Only the closest points to
the separators, the support vectors, are stored. To determine
the parameters of SVMs, a nonlinear optimization problem
with a convex objective function has to be solved, for which
efficient algorithms exist. An important feature of SVMs is
that by applying a kernel function they can embed data into a
higher dimensional space, in which data points can be linearly
separated. The kernel function measures the similarity between
two points in the input space; it is expressed as the inner
product of the input points mapped into a higher dimension
feature space in which data become linearly separable. The
simplest example is the linear kernel, in which the mapping
function is the identity function. However, provided that we
can express everything in terms of kernel evaluations, it is not
necessary to explicitly compute the mapping in the feature
space. Indeed, in the case of one of the most commonly used
kernel functions, the Gaussian kernel, the feature space has
infinite dimensions.
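As an illustration of a kernel SVM, the following sketch assumes scikit-learn is available and uses the Gaussian (RBF) kernel on data that are not linearly separable in the input space; all numerical choices are arbitrary:

```python
# A minimal kernel-SVM sketch, assuming scikit-learn is available. The
# Gaussian (RBF) kernel implicitly maps the data into an infinite-
# dimensional feature space; only the support vectors are stored.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(2)
# Two classes that are not linearly separable in the input space:
# class 0 inside the unit circle, class 1 outside it.
X = rng.normal(size=(200, 2))
y = (np.linalg.norm(X, axis=1) > 1.0).astype(int)

clf = SVC(kernel="rbf", C=1.0, gamma="scale")
clf.fit(X, y)

print("support vectors per class:", clf.n_support_)
print("prediction for the origin:", clf.predict([[0.0, 0.0]]))  # expected: 0
```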
B. Unsupervised learning
Social network analysis, gene clustering and market research are among the most successful applications of unsupervised learning methods.
In the case of unsupervised learning the training dataset
consists only of a set of input vectors x. While unsupervised
learning can address different tasks, clustering or cluster
analysis is the most common.
Clustering is the process of grouping data so that the intra-cluster similarity is high, while the inter-cluster similarity
is low. The similarity is typically expressed as a distance
function, which depends on the type of data. There exists
a variety of clustering approaches. Here, we focus on two algorithms, k-means and the Gaussian mixture model, as examples of partitioning approaches and model-based approaches,
respectively, given their wide area of applicability. The reader
is referred to [27] for a comprehensive overview of cluster
analysis.
k-means is perhaps the most well-known clustering algorithm (see chapter X in [27]). It is an iterative algorithm starting with an initial partition of the data into k clusters.
Then the centre of each cluster is computed and data points are assigned to the cluster with the closest centre. The procedure (centre computation and data assignment) is repeated until the assignment does not change or a predefined maximum number of iterations is exceeded. As a consequence, the algorithm may terminate at a locally optimal partition. Moreover, k-means is well known to be sensitive to outliers. It is worth noting that there exist ways to compute k automatically [26], and that an online version of the algorithm exists.
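The iteration just described can be sketched in a few lines (an illustration; the initialization by random sampling and the synthetic data are arbitrary choices):

```python
# A minimal sketch of the k-means iteration: assign points to the closest
# centre, recompute centres, repeat until the assignment stabilizes or a
# maximum number of iterations is reached.
import numpy as np

def k_means(X, k, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # Initial partition: pick k distinct points as the initial centres.
    centres = X[rng.choice(len(X), size=k, replace=False)]
    labels = None
    for _ in range(max_iter):
        # Data assignment: each point goes to the cluster with the closest centre.
        dists = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignment unchanged: stop
        labels = new_labels
        # Centre computation: mean of the points assigned to each cluster.
        for j in range(k):
            if np.any(labels == j):
                centres[j] = X[labels == j].mean(axis=0)
    return centres, labels

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(2, 0.3, (50, 2))])
centres, labels = k_means(X, k=2)
print(centres)
```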
While k-means assigns each point uniquely to one cluster, probabilistic approaches allow a soft assignment and provide a measure of the uncertainty associated with the assignment. Figure 3 shows the difference between k-means and
a probabilistic Gaussian Mixture Model (GMM).

Fig. 3: Difference between k-means and Gaussian mixture model clustering for a given set of data samples.

GMM, a
linear superposition of Gaussian distributions, is one of the
most widely used probabilistic approaches to clustering. The
parameters of the model are the mixing coefficient of each
Gaussian component, the mean and the covariance of each
Gaussian distribution. To maximize the log likelihood function
with respect to the parameters given a dataset, the expectation-maximization (EM) algorithm is used, since no closed-form solution exists in this case. The initialization of the parameters can be
done using k-means. In particular, the mean and covariance
of each Gaussian component can be initialized to sample
means and covariances of the clusters obtained by k-means,
and the mixing coefficients can be set to the fraction of data
points assigned by k-means to each cluster. After initializing
the parameters and evaluating the initial value of the log
likelihood, the algorithm alternates between two steps. In the
expectation step, the current values of the parameters are
used to determine the “responsibility” of each component for
the observed data (i.e., the conditional probability of latent
variables given the dataset). The maximization step uses these
responsibilities to compute a maximum likelihood estimate of
the model’s parameters. Convergence is checked with respect
to the log likelihood function or the parameters.
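A minimal sketch of GMM clustering via EM, assuming scikit-learn is available (its GaussianMixture initializes the parameters with k-means by default, matching the procedure described above; the synthetic data are illustrative):

```python
# A minimal sketch of Gaussian mixture clustering fitted with the EM
# algorithm, assuming scikit-learn is available.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(4)
X = np.vstack([rng.normal(0, 0.5, (100, 2)), rng.normal(3, 1.0, (100, 2))])

gmm = GaussianMixture(n_components=2, init_params="kmeans", random_state=0)
gmm.fit(X)  # alternates E and M steps until the log likelihood converges

# Soft assignment: the "responsibility" of each component for each point.
responsibilities = gmm.predict_proba(X[:3])
print("mixing coefficients:", gmm.weights_)
print("responsibilities of the first 3 points:\n", responsibilities)
```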
C. Semi-supervised learning
Semi-supervised learning methods are a hybrid of the two approaches introduced above, and address problems in which most of the training samples are unlabeled, while only a few labeled data points are available. The obvious advantage is that
in many domains a wealth of unlabeled data points is readily
available. Semi-supervised learning is used for the same type
of applications as supervised learning. It is particularly useful when labeled data points are scarce or too expensive to obtain, and the use of available unlabeled data can improve performance.
Self-training is the oldest form of semi-supervised learning
[28]. It is an iterative process: during the first stage, only labeled data points are used by a supervised learning algorithm. Then, at each step, some of the unlabeled points are labeled according to the predictions resulting from the trained decision function, and these points are used along with the original labeled data to retrain using the same supervised learning algorithm. This procedure is shown in Fig. 4.
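A minimal self-training sketch, assuming scikit-learn is available; the base classifier (logistic regression), the 0.95 confidence threshold, and the synthetic data are illustrative choices, not prescribed by the surveyed work:

```python
# A minimal sketch of self-training: train on the labeled points, label
# the unlabeled points the classifier is most confident about, add them
# to the training set, and repeat with the same supervised algorithm.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(5)
X_lab = rng.normal(loc=[[0, 0]] * 10 + [[3, 3]] * 10, scale=0.5)
y_lab = np.array([0] * 10 + [1] * 10)
X_unlab = rng.normal(loc=[[0, 0]] * 50 + [[3, 3]] * 50, scale=0.5)

for _ in range(5):
    clf = LogisticRegression().fit(X_lab, y_lab)
    if len(X_unlab) == 0:
        break
    proba = clf.predict_proba(X_unlab)
    confident = proba.max(axis=1) > 0.95   # pseudo-label only confident points
    if not confident.any():
        break
    X_lab = np.vstack([X_lab, X_unlab[confident]])
    y_lab = np.concatenate([y_lab, proba[confident].argmax(axis=1)])
    X_unlab = X_unlab[~confident]

print("labeled set size after self-training:", len(X_lab))
```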
Since the introduction of self-training, the idea of using labeled and unlabeled data has resulted in many semi-supervised learning algorithms.

Fig. 4: Sample step of the self-training mechanism, where an unlabeled point is matched against labeled data to become part of the labeled data set.

According to the classification proposed
in [28], semi-supervised learning techniques can be organized
in four classes: i) methods based on generative models (generative methods estimate the joint distribution of the input and output variables; from the joint distribution one can obtain the conditional distribution p(y|x), which is then used to predict the output values in correspondence to new input values; such methods can exploit both labeled and unlabeled data); ii)
methods based on the assumption that the decision boundary
should lie in a low-density region; iii) graph-based methods;
iv) two-step methods (first an unsupervised learning step to
change the data representation or construct a new kernel; then
a supervised learning step based on the new representation or
kernel).
D. Reinforcement Learning
Reinforcement Learning (RL) is used, in general, to address applications such as robotics, finance (investment decisions), and inventory management, where the goal is to learn a policy, i.e., a mapping from states of the environment to actions to be performed, while directly interacting with the environment.
The RL paradigm allows agents to learn by exploring the
available actions and refining their behavior using only an
evaluative feedback, referred to as the reward. The agent’s
goal is to maximize its long-term performance. Hence, the
agent does not just take into account the immediate reward,
but it evaluates the consequences of its actions on the future.
Delayed reward and trial-and-error constitute the two most
significant features of RL.
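Although no specific RL algorithm is detailed in this excerpt, tabular Q-learning (a classic RL algorithm, used here purely as an illustration) on a toy chain environment shows the explore/reward/refine loop; all parameters and the environment itself are illustrative assumptions:

```python
# A minimal Q-learning sketch: the agent explores actions, receives
# (delayed) rewards, and refines a state-action value table from which
# the learned policy is derived.
import numpy as np

rng = np.random.default_rng(6)
n_states, n_actions = 5, 2          # tiny chain environment: move left/right
Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.1, 0.9, 0.1

def step(s, a):
    """Move right (a=1) or left (a=0); reward 1 only in the last state."""
    s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    reward = 1.0 if s_next == n_states - 1 else 0.0
    return s_next, reward

for episode in range(500):
    s = 0
    for _ in range(20):
        # Epsilon-greedy exploration: mostly exploit, sometimes explore.
        a = rng.integers(n_actions) if rng.random() < epsilon else int(Q[s].argmax())
        s_next, r = step(s, a)
        # Q-learning update: move Q(s,a) toward r + gamma * max_a' Q(s',a').
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next

print("learned greedy policy (0=left, 1=right):", Q.argmax(axis=1))
```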
RL is usually performed in the context of Markov decision processes (MDP). The agent's perception at time k is represented as a state s_k ∈ S, where S is the finite set of environment states. The agent interacts with the environment by performing actions. At time k the agent selects an action a_k ∈ A, where A is the finite set of actions of the agent, which could trigger a transition to a new state. The agent will

References

R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. MIT Press.
N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, "Dropout: a simple way to prevent neural networks from overfitting," Journal of Machine Learning Research, 2014.
J. Han, M. Kamber, and J. Pei, Data Mining: Concepts and Techniques. Morgan Kaufmann.
C. M. Bishop, Pattern Recognition and Machine Learning. Springer, 2006.