An Overview on Application of Machine Learning
Techniques in Optical Networks
Francesco Musumeci, Member, IEEE, Cristina Rottondi, Member, IEEE, Avishek Nag, Member, IEEE, Irene Macaluso, Darko Zibar, Member, IEEE, Marco Ruffini, Senior Member, IEEE, and Massimo Tornatore, Senior Member, IEEE

Francesco Musumeci and Massimo Tornatore are with Politecnico di Milano, Italy, e-mail: francesco.musumeci@polimi.it, massimo.tornatore@polimi.it.
Cristina Rottondi is with Dalle Molle Institute for Artificial Intelligence, Switzerland, email: cristina.rottondi@supsi.ch.
Avishek Nag is with University College Dublin, Ireland, email: avishek.nag@ucd.ie.
Irene Macaluso and Marco Ruffini are with Trinity College Dublin, Ireland, email: macalusi@tcd.ie, ruffinm@tcd.ie.
Darko Zibar is with Technical University of Denmark, Denmark, email: dazi@fotonik.dtu.dk.
Abstract—Today’s telecommunication networks have become
sources of enormous amounts of widely heterogeneous data. This
information can be retrieved from network traffic traces, network
alarms, signal quality indicators, users’ behavioral data, etc.
Advanced mathematical tools are required to extract meaningful
information from these data and take decisions pertaining to the
proper functioning of the networks from the network-generated
data. Among these mathematical tools, Machine Learning (ML)
is regarded as one of the most promising methodological approaches to perform network-data analysis and enable automated
network self-configuration and fault management.
The adoption of ML techniques in the field of optical communication networks is motivated by the unprecedented growth
of network complexity faced by optical networks in the last
few years. Such complexity increase is due to the introduction
of a huge number of adjustable and interdependent system
parameters (e.g., routing configurations, modulation format,
symbol rate, coding schemes, etc.) that are enabled by the usage
of coherent transmission/reception technologies, advanced digital
signal processing and compensation of nonlinear effects in optical
fiber propagation.
In this paper we provide an overview of the application of
ML to optical communications and networking. We classify and
survey relevant literature dealing with the topic, and we also
provide an introductory tutorial on ML for researchers and
practitioners interested in this field. Although a good number of
research papers have recently appeared, the application of ML
to optical networks is still in its infancy: to stimulate further
work in this area, we conclude the paper proposing new possible
research directions.
Index Terms—Machine learning, Data analytics, Optical communications and networking, Neural networks, Bit Error Rate,
Optical Signal-to-Noise Ratio, Network monitoring.
I. INTRODUCTION
Machine learning (ML) is a branch of Artificial Intelligence that pushes forward the idea that, given access to the right data, machines can learn by themselves how to solve a specific problem [1]. By leveraging complex mathematical and statistical tools, ML renders machines capable of independently performing intellectual tasks that have been traditionally
solved by human beings. This idea of automating complex
tasks has generated high interest in the networking field, on
the expectation that several activities involved in the design
and operation of communication networks can be offloaded to
machines. Some applications of ML have already matched these expectations in networking areas such as intrusion detection [2], traffic classification [3], and cognitive radios [4].
Among various networking areas, in this paper we focus
on ML for optical networking. Optical networks constitute
the basic physical infrastructure of all large-provider networks
worldwide, thanks to their high capacity, low cost and many
other attractive properties [5]. They are now penetrating new important telecom markets such as datacom [6] and the access
segment [7], and there is no sign that a substitute technology
might appear in the foreseeable future. Different approaches
to improve the performance of optical networks have been
investigated, such as routing, wavelength assignment, traffic
grooming and survivability [8], [9].
In this paper we give an overview of the application of
ML to optical networking. Specifically, the contribution of
the paper is twofold, namely, i) we provide an introductory
tutorial on the use of ML methods and on their application in
the optical networks field, and ii) we survey the existing work
dealing with the topic, also performing a classification of the
various use cases addressed in literature so far. We cover both
the areas of optical communication and optical networking
to potentially stimulate new cross-layer research directions.
In fact, ML applications can be especially useful in cross-layer settings, where data analysis at the physical layer, e.g., monitoring the Bit Error Rate (BER), can trigger changes at the network layer, e.g., in routing, spectrum and modulation format assignments.
The application of ML to optical communication and networking is still in its infancy, and the literature survey included in this paper aims at providing an introductory reference for researchers and practitioners willing to get acquainted with existing ML applications as well as to investigate new research directions.
A legitimate question that arises in the optical networking field today is: why is machine learning, a methodological area that has been applied and investigated for at least three decades, only gaining momentum now? The answer is certainly multifaceted, and it most likely involves aspects that are not purely technical [10]. From a technical perspective, though, recent technical progress at both the optical communication system and network level is at the basis of an unprecedented growth
in the complexity of optical networks.
On the system side, while optical channel modeling has always been complex, the recent adoption of coherent technologies [11] has made modeling even more difficult by introducing a plethora of adjustable design parameters (such as modulation formats, symbol rates, adaptive coding rates and flexible channel spacing) to optimize transmission systems in terms of the bit-rate × transmission-distance product. In addition, what makes this optimization even more challenging is that the optical channel is highly nonlinear.
From a networking perspective, the increased complexity of
the underlying transmission systems is reflected in a series of
advancements in both the data plane and the control plane. At the data plane, the Elastic Optical Network (EON) concept [12]–[15] has emerged as a novel optical network architecture able to respond to the increased need for elasticity in allocating optical network resources. In contrast to traditional fixed-grid Wavelength Division Multiplexing (WDM) networks, EON offers flexible (almost continuous) bandwidth allocation. Resource allocation in EON can be performed to adapt to the several above-mentioned decision variables made available by new transmission systems, including different transmission techniques, such as Orthogonal Frequency Division Multiplexing (OFDM) and Nyquist WDM (NWDM), transponder types (e.g., BVT and S-BVT; for a complete list of acronyms, the reader is referred to the Glossary at the end of the paper), modulation formats (e.g., QPSK, QAM), and coding rates. This flexibility makes the resource allocation
problems much more challenging for network engineers. At the control plane, dynamic control, as in Software-Defined Networking (SDN), promises to enable long-awaited on-demand reconfiguration and virtualization. Moreover, reconfiguring the optical substrate poses several challenges in terms of, e.g., network re-optimization, spectrum fragmentation, amplifier power settings, and unexpected penalties due to non-linearities, which call for strict integration between the control elements (SDN controllers, network orchestrators) and optical performance monitors working at the equipment level.
All these degrees of freedom and limitations do pose severe
challenges to system and network engineers when it comes
to deciding what the best system and/or network design
is. Machine learning is currently perceived as a paradigm shift for the design of future optical networks and systems. These techniques should make it possible to infer, from data obtained by various types of monitors (e.g., signal quality, traffic samples, etc.), useful characteristics that could not be easily or directly measured. Some envisioned applications in the optical domain include fault prediction, intrusion detection, physical-flow security, impairment-aware routing, low-margin design, and traffic-aware capacity reconfigurations, but many others can be envisioned and will be surveyed in the next sections.
The survey is organized as follows. In Section II, we
overview some preliminary ML concepts, focusing especially
on those targeted in the following sections. In Section III
we discuss the main motivations behind the application of
ML in the optical domain and we classify the main areas of
applications. In Section IV and Section V, we classify and
summarize a large number of studies describing applications
of ML at the transmission layer and network layer. In Section
VI, we quantitatively overview a selection of existing papers,
identifying, for some of the applications described in Section III, the ML algorithms which demonstrated the highest effectiveness for each specific use case, and the performance metrics considered for their evaluation. Finally, Section VII
discusses some possible open areas of research and future
directions, whereas Section VIII concludes the paper.
II. OVERVIEW OF MACHINE LEARNING METHODS USED IN
OPTICAL NETWORKS
This section provides an overview of some of the most
popular algorithms that are commonly classified as machine
learning. The literature on ML is so extensive that even a
superficial overview of all the main ML approaches goes far
beyond the scope of this section, and the reader can refer to a number of fundamental books on the subject [16]–[20]. However, in this section we provide a high-level view of
the main ML techniques that are used in the work we reference
in the remainder of this paper. We here provide the reader
with some basic insights that might help better understand the
remaining parts of this survey paper. We divide the algorithms into three main categories, described in the next sections, which are also represented in Fig. 1: supervised learning, unsupervised learning and reinforcement learning. Semi-supervised
learning, a hybrid of supervised and unsupervised learning, is
also introduced. ML algorithms have been successfully applied
to a wide variety of problems. Before delving into the different
ML methods, it is worth pointing out that, in the context of
telecommunication networks, there has been over a decade
of research on the application of ML techniques to wireless
networks, ranging from opportunistic spectrum access [21] to
channel estimation and signal detection in OFDM systems
[22], to Multiple-Input-Multiple-Output communications [23],
and dynamic frequency reuse [24].
A. Supervised learning
Supervised learning is used in a variety of applications, such
as speech recognition, spam detection and object recognition.
The goal is to predict the value of one or more output variables
given the value of a vector of input variables x. The output
variable can be a continuous variable (regression problem)
or a discrete variable (classification problem). A training
data set comprises N samples of the input variables and
the corresponding output values. Different learning methods construct a function y(x) that allows predicting the value of the output variables in correspondence to a new value of the inputs. Supervised learning can be broken down into two main classes, described below: parametric models, where the number of parameters to use in the model is fixed, and nonparametric models, where the number of parameters depends on the training set.
1) Parametric models: In this case, the function y is a
combination of a fixed number of parametric basis functions.
These models use training data to estimate a fixed set of
parameters w. After the learning stage, the training data can
be discarded since the prediction in correspondence to new inputs is computed using only the learned parameters w.

(a) Supervised Learning: the algorithm is trained on a dataset that consists of paths, wavelengths, modulation and the corresponding BER. Then it extrapolates the BER in correspondence to new inputs. (b) Unsupervised Learning: the algorithm identifies unusual patterns in the data, consisting of wavelengths, paths, BER, and modulation. (c) Reinforcement Learning: the algorithm learns by receiving feedback on the effect of modifying some parameters, e.g., the power and the modulation.
Fig. 1: Overview of machine learning algorithms applied to optical networks.
Fig. 2: Example of a NN with two layers of adaptive parameters. The bias parameters of the input layer and the hidden layer are represented as weights from additional units with fixed value 1 (x_0 and h_0).

Linear models for regression and classification, which consist of a linear combination of fixed nonlinear basis functions, are the simplest parametric models in terms of analytical and
computational properties. Many different choices are available
for the basis functions: from polynomial to Gaussian, to
sigmoidal, to Fourier basis, etc. In case of multiple output
values, it is possible to use separate basis functions for each
component of the output or, more commonly, apply the same
set of basis functions for all the components. Note that these
models are linear in the parameters w, and this linearity
results in a number of advantageous properties, e.g., closed-form solutions to the least-squares problem. However, their applicability is limited to problems with a low-dimensional input space.
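To make the linearity in the parameters concrete, the following minimal Python sketch (our illustration, not taken from the surveyed papers; the polynomial basis, the synthetic data and all numerical choices are arbitrary) fits a linear-in-w model with the closed-form least-squares solution:

```python
# A minimal sketch of a linear-in-the-parameters model: polynomial basis
# functions fitted in closed form by least squares. Purely illustrative data.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 1-D regression problem: y = sin(2*pi*x) + noise.
x = rng.uniform(0, 1, size=50)
t = np.sin(2 * np.pi * x) + rng.normal(scale=0.1, size=x.shape)

def polynomial_design_matrix(x, degree):
    """Map scalar inputs to polynomial basis functions phi_j(x) = x**j."""
    return np.vander(x, degree + 1, increasing=True)

Phi = polynomial_design_matrix(x, degree=5)

# Closed-form least-squares solution: w = (Phi^T Phi)^{-1} Phi^T t.
# lstsq is numerically preferable to forming the normal equations explicitly.
w, *_ = np.linalg.lstsq(Phi, t, rcond=None)

# Predict at new inputs using only the learned parameters w.
x_new = np.linspace(0, 1, 5)
y_new = polynomial_design_matrix(x_new, degree=5) @ w
print(y_new)
```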
In the remainder of this subsection we focus on neural networks (NNs), since they are the most successful example of parametric models. Note that NNs are often referred to as Artificial Neural Networks (ANNs); in this paper we use the two terms interchangeably.
NNs apply a series of functional transformations to the
inputs (see chapter V in [16], chapter VI in [17], and chapter
XVI in [20]). A NN is a network of units or neurons. The
basis function or activation function used by each unit is
a nonlinear function of a linear combination of the unit’s
inputs. Each neuron has a bias parameter that allows for any
fixed offset in the data. The bias is incorporated in the set of
parameters by adding a dummy input of unitary value to each
unit (see Figure 2). The coefficients of the linear combination
are the parameters w estimated during the training. The most
commonly used nonlinear functions are the logistic sigmoid
and the hyperbolic tangent. The activation function of the output units of the NN is the identity function, the logistic sigmoid function, or the softmax function, for regression, binary classification, and multiclass classification problems, respectively.
Different types of connections between the units result in
different NNs with distinct characteristics. All units between
the inputs and output of the NN are called hidden units. In
the case of a NN, the network is a directed acyclic graph.
Typically, NNs are organized in layers, with units in each layer
receiving inputs only from units in the immediately preceding
layer and forwarding their output only to the immediately
following layer. NNs with one layer of hidden units and linear output units can approximate arbitrarily well any continuous function on a compact domain, provided that a sufficient number of hidden units is used [25].
Given a training set, a NN is trained by minimizing an error
function with respect to the set of parameters w. Depending
on the type of problem and the corresponding choice of
activation function of the output units, different error functions
are used. Typically in case of regression models, the sum
of square error is used, whereas for classification the cross-
entropy error function is adopted. It is important to note that
the error function is a non convex function of the network
parameters, for which multiple optimal local solutions exist.
Iterative numerical methods based on gradient information are
the most common methods used to find the vector w that min-
imizes the error function. For a NN the error backpropagation
algorithm, which provides an efficient method for evaluating
the derivatives of the error function with respect to w, is the
most commonly used.
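The following minimal sketch (an illustration under simplifying assumptions: one hidden layer of tanh units, identity output units, sum-of-squares error, plain gradient descent on synthetic data) shows how backpropagation evaluates the derivatives used to update w:

```python
# A minimal sketch of a one-hidden-layer NN for regression, trained by
# gradient descent with error backpropagation. Shapes, learning rate and
# the synthetic data are illustrative choices, not the survey's setup.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))                 # 200 samples, 3 input features
t = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2      # synthetic regression target

n_hidden, lr = 10, 0.01
W1 = rng.normal(scale=0.5, size=(3, n_hidden)); b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.5, size=(n_hidden, 1)); b2 = np.zeros(1)

for epoch in range(2000):
    # Forward pass: tanh hidden units, identity output (regression).
    h = np.tanh(X @ W1 + b1)
    y = (h @ W2 + b2).ravel()

    # Gradient of the (mean) sum-of-squares error at the output.
    delta_out = (y - t)[:, None] / len(X)

    # Backpropagation: chain rule through the hidden layer.
    grad_W2 = h.T @ delta_out
    grad_b2 = delta_out.sum(axis=0)
    delta_h = (delta_out @ W2.T) * (1 - h ** 2)   # tanh'(a) = 1 - tanh(a)^2
    grad_W1 = X.T @ delta_h
    grad_b1 = delta_h.sum(axis=0)

    # Gradient-descent update of all parameters w.
    W2 -= lr * grad_W2; b2 -= lr * grad_b2
    W1 -= lr * grad_W1; b1 -= lr * grad_b1

print("final training MSE:", np.mean((y - t) ** 2))
```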
We should at this point mention that, before training the
network, the training set is typically pre-processed by applying
a linear transformation to rescale each of the input variables
independently in case of continuous data or discrete ordinal
data. The transformed variables have zero mean and unit
standard deviation. The same procedure is applied to the target
values in case of regression problems. In case of discrete
categorical data, a 1-of-K coding scheme is used. This form of
pre-processing is known as feature normalization and it is used
before training most ML algorithms, since most models are designed with the assumption that all features have comparable scales (decision-tree-based models are a well-known exception).
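A minimal sketch of this pre-processing (the data and the roles of the two columns are illustrative assumptions) might look as follows:

```python
# A minimal sketch of the pre-processing described above: standardize a
# continuous feature to zero mean and unit standard deviation, and apply
# 1-of-K (one-hot) coding to a categorical feature. Illustrative data.
import numpy as np

# Column 0: continuous feature; column 1: category label in {0, 1, 2}.
X = np.array([[10.0, 0], [12.0, 2], [9.0, 1], [11.0, 0]])

# Feature normalization (z-score) for the continuous column.
mean, std = X[:, 0].mean(), X[:, 0].std()
x_cont = (X[:, 0] - mean) / std

# 1-of-K coding for the categorical column (K = 3 categories).
categories = X[:, 1].astype(int)
one_hot = np.eye(3)[categories]

X_pre = np.column_stack([x_cont, one_hot])
print(X_pre)
```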
2) Nonparametric models: In nonparametric methods the
number of parameters depends on the training set. These
methods keep a subset or the entirety of the training data
and use them during prediction. The most widely used approaches
are k-nearest neighbor models (see chapter IV in [17]) and
support vector machines (SVMs) (see chapter VII in [16] and
chapter XIV in [20]). Both can be used for regression and
classification problems.
In the case of k-nearest neighbor methods, all training
data samples are stored (training phase). During prediction,
the k-nearest samples to the new input value are retrieved.
For classification problems, a voting mechanism is used; for
regression problems, the mean or median of the k nearest
samples provides the prediction. To select the best value of k,
cross-validation [26] can be used. Depending on the dimension
of the training set, iterating through all samples to compute
the closest k neighbors might not be feasible. In this case, k-d
trees or locality-sensitive hash tables can be used to compute
the k-nearest neighbors.
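A minimal k-nearest-neighbor sketch of the store-then-vote procedure just described (the data, the value of k and the Euclidean distance are illustrative assumptions):

```python
# A minimal sketch of k-nearest-neighbor prediction: store the training
# set, then classify a new point by majority vote among the k closest
# samples under Euclidean distance.
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    distances = np.linalg.norm(X_train - x_new, axis=1)
    nearest = np.argsort(distances)[:k]   # indices of the k closest samples
    votes = Counter(y_train[nearest])     # voting mechanism (classification)
    return votes.most_common(1)[0][0]
    # For regression, one would instead return np.mean(y_train[nearest]).

X_train = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([0.8, 0.9]), k=3))  # -> 1
```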
In SVMs, basis functions are centered on training samples;
the training procedure selects a subset of the basis functions.
3
However, decision tree based models are a well-known exception.
The number of selected basis functions, and the number of
training samples that have to be stored, is typically much
smaller than the cardinality of the training dataset. SVMs
build a linear decision boundary with the largest possible
distance from the training samples. Only the closest points to
the separators, the support vectors, are stored. To determine
the parameters of SVMs, a nonlinear optimization problem
with a convex objective function has to be solved, for which
efficient algorithms exist. An important feature of SVMs is
that by applying a kernel function they can embed data into a
higher dimensional space, in which data points can be linearly
separated. The kernel function measures the similarity between
two points in the input space; it is expressed as the inner
product of the input points mapped into a higher dimension
feature space in which data become linearly separable. The
simplest example is the linear kernel, in which the mapping
function is the identity function. However, provided that we
can express everything in terms of kernel evaluations, it is not
necessary to explicitly compute the mapping in the feature
space. Indeed, in the case of one of the most commonly used
kernel functions, the Gaussian kernel, the feature space has
infinite dimensions.
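As an illustration of a kernel SVM, the following sketch assumes scikit-learn is available and uses the Gaussian (RBF) kernel on data that are not linearly separable in the input space; all numerical choices are arbitrary:

```python
# A minimal kernel-SVM sketch, assuming scikit-learn is available. The
# Gaussian (RBF) kernel implicitly maps the data into an infinite-
# dimensional feature space; only the support vectors are stored.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(2)
# Two classes that are not linearly separable in the input space:
# class 0 inside the unit circle, class 1 outside it.
X = rng.normal(size=(200, 2))
y = (np.linalg.norm(X, axis=1) > 1.0).astype(int)

clf = SVC(kernel="rbf", C=1.0, gamma="scale")
clf.fit(X, y)

print("support vectors per class:", clf.n_support_)
print("prediction for the origin:", clf.predict([[0.0, 0.0]]))  # expected: 0
```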
B. Unsupervised learning
Social network analysis, gene clustering and market research are among the most successful applications of unsupervised learning methods.
In the case of unsupervised learning the training dataset
consists only of a set of input vectors x. While unsupervised
learning can address different tasks, clustering or cluster
analysis is the most common.
Clustering is the process of grouping data so that the intra-cluster similarity is high, while the inter-cluster similarity
is low. The similarity is typically expressed as a distance
function, which depends on the type of data. There exists
a variety of clustering approaches. Here, we focus on two algorithms, k-means and the Gaussian mixture model, as examples of partitioning approaches and model-based approaches,
respectively, given their wide area of applicability. The reader
is referred to [27] for a comprehensive overview of cluster
analysis.
k-means is perhaps the most well-known clustering algorithm (see chapter X in [27]). It is an iterative algorithm starting with an initial partition of the data into k clusters.
Then the centre of each cluster is computed and data points are assigned to the cluster with the closest centre. The procedure (centre computation and data assignment) is repeated until the assignment does not change or a predefined maximum number of iterations is exceeded. As a consequence, the algorithm may terminate at a locally optimal partition. Moreover, k-means is well known to be sensitive to outliers. It is worth noting that there exist ways to compute k automatically [26], and that an online version of the algorithm exists.
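The iteration just described can be sketched in a few lines (an illustration; the initialization by random sampling and the synthetic data are arbitrary choices):

```python
# A minimal sketch of the k-means iteration: assign points to the closest
# centre, recompute centres, repeat until the assignment stabilizes or a
# maximum number of iterations is reached.
import numpy as np

def k_means(X, k, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # Initial partition: pick k distinct points as the initial centres.
    centres = X[rng.choice(len(X), size=k, replace=False)]
    labels = None
    for _ in range(max_iter):
        # Data assignment: each point goes to the cluster with the closest centre.
        dists = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignment unchanged: stop
        labels = new_labels
        # Centre computation: mean of the points assigned to each cluster.
        for j in range(k):
            if np.any(labels == j):
                centres[j] = X[labels == j].mean(axis=0)
    return centres, labels

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(2, 0.3, (50, 2))])
centres, labels = k_means(X, k=2)
print(centres)
```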
While k-means assigns each point uniquely to one cluster, probabilistic approaches allow a soft assignment and provide a measure of the uncertainty associated with the assignment. Figure 3 shows the difference between k-means and
a probabilistic Gaussian Mixture Model (GMM).

Fig. 3: Difference between k-means and Gaussian mixture model clustering for a given set of data samples.

GMM, a
linear superposition of Gaussian distributions, is one of the
most widely used probabilistic approaches to clustering. The
parameters of the model are the mixing coefficient of each
Gaussian component, the mean and the covariance of each
Gaussian distribution. To maximize the log likelihood function
with respect to the parameters given a dataset, the expectation-maximization (EM) algorithm is used, since no closed-form solution exists in this case. The initialization of the parameters can be
done using k-means. In particular, the mean and covariance
of each Gaussian component can be initialized to sample
means and covariances of the clusters obtained by k-means,
and the mixing coefficients can be set to the fraction of data
points assigned by k-means to each cluster. After initializing
the parameters and evaluating the initial value of the log
likelihood, the algorithm alternates between two steps. In the
expectation step, the current values of the parameters are
used to determine the “responsibility” of each component for
the observed data (i.e., the conditional probability of latent
variables given the dataset). The maximization step uses these
responsibilities to compute a maximum likelihood estimate of
the model’s parameters. Convergence is checked with respect
to the log likelihood function or the parameters.
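A minimal sketch of GMM clustering via EM, assuming scikit-learn is available (its GaussianMixture initializes the parameters with k-means by default, matching the procedure described above; the synthetic data are illustrative):

```python
# A minimal sketch of Gaussian mixture clustering fitted with the EM
# algorithm, assuming scikit-learn is available.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(4)
X = np.vstack([rng.normal(0, 0.5, (100, 2)), rng.normal(3, 1.0, (100, 2))])

gmm = GaussianMixture(n_components=2, init_params="kmeans", random_state=0)
gmm.fit(X)  # alternates E and M steps until the log likelihood converges

# Soft assignment: the "responsibility" of each component for each point.
responsibilities = gmm.predict_proba(X[:3])
print("mixing coefficients:", gmm.weights_)
print("responsibilities of the first 3 points:\n", responsibilities)
```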
C. Semi-supervised learning
Semi-supervised learning methods are a hybrid of the two approaches introduced above, and address problems in which most of the training samples are unlabeled, while only a few labeled data points are available. The obvious advantage is that
in many domains a wealth of unlabeled data points is readily
available. Semi-supervised learning is used for the same type
of applications as supervised learning. It is particularly useful when labeled data points are scarce or too expensive to obtain, and the use of available unlabeled data can improve performance.
Self-training is the oldest form of semi-supervised learning
[28]. It is an iterative process: during the first stage, only labeled data points are used by a supervised learning algorithm. Then, at each step, some of the unlabeled points are labeled according to the predictions resulting from the trained decision function, and these points are used along with the original labeled data to retrain using the same supervised learning algorithm. This procedure is shown in Fig. 4.
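A minimal self-training sketch, assuming scikit-learn is available; the base classifier (logistic regression), the 0.95 confidence threshold, and the synthetic data are illustrative choices, not prescribed by the surveyed work:

```python
# A minimal sketch of self-training: train on the labeled points, label
# the unlabeled points the classifier is most confident about, add them
# to the training set, and repeat with the same supervised algorithm.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(5)
X_lab = rng.normal(loc=[[0, 0]] * 10 + [[3, 3]] * 10, scale=0.5)
y_lab = np.array([0] * 10 + [1] * 10)
X_unlab = rng.normal(loc=[[0, 0]] * 50 + [[3, 3]] * 50, scale=0.5)

for _ in range(5):
    clf = LogisticRegression().fit(X_lab, y_lab)
    if len(X_unlab) == 0:
        break
    proba = clf.predict_proba(X_unlab)
    confident = proba.max(axis=1) > 0.95   # pseudo-label only confident points
    if not confident.any():
        break
    X_lab = np.vstack([X_lab, X_unlab[confident]])
    y_lab = np.concatenate([y_lab, proba[confident].argmax(axis=1)])
    X_unlab = X_unlab[~confident]

print("labeled set size after self-training:", len(X_lab))
```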
Since the introduction of self-training, the idea of using labeled and unlabeled data has resulted in many semi-supervised learning algorithms.

Fig. 4: Sample step of the self-training mechanism, where an unlabeled point is matched against labeled data to become part of the labeled data set.

According to the classification proposed
in [28], semi-supervised learning techniques can be organized
in four classes: i) methods based on generative models (generative methods estimate the joint distribution of the input and output variables; from the joint distribution one can obtain the conditional distribution p(y|x), which is then used to predict the output values in correspondence to new input values; such methods can exploit both labeled and unlabeled data); ii)
methods based on the assumption that the decision boundary
should lie in a low-density region; iii) graph-based methods;
iv) two-step methods (first an unsupervised learning step to
change the data representation or construct a new kernel; then
a supervised learning step based on the new representation or
kernel).
D. Reinforcement Learning
Reinforcement Learning (RL) is used, in general, to address applications such as robotics, finance (investment decisions), and inventory management, where the goal is to learn a policy, i.e., a mapping from states of the environment to actions to be performed, while directly interacting with the environment.
The RL paradigm allows agents to learn by exploring the
available actions and refining their behavior using only an
evaluative feedback, referred to as the reward. The agent’s
goal is to maximize its long-term performance. Hence, the
agent does not just take into account the immediate reward,
but it evaluates the consequences of its actions on the future.
Delayed reward and trial-and-error constitute the two most
significant features of RL.
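Although no specific RL algorithm is detailed in this excerpt, tabular Q-learning (a classic RL algorithm, used here purely as an illustration) on a toy chain environment shows the explore/reward/refine loop; all parameters and the environment itself are illustrative assumptions:

```python
# A minimal Q-learning sketch: the agent explores actions, receives
# (delayed) rewards, and refines a state-action value table from which
# the learned policy is derived.
import numpy as np

rng = np.random.default_rng(6)
n_states, n_actions = 5, 2          # tiny chain environment: move left/right
Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.1, 0.9, 0.1

def step(s, a):
    """Move right (a=1) or left (a=0); reward 1 only in the last state."""
    s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    reward = 1.0 if s_next == n_states - 1 else 0.0
    return s_next, reward

for episode in range(500):
    s = 0
    for _ in range(20):
        # Epsilon-greedy exploration: mostly exploit, sometimes explore.
        a = rng.integers(n_actions) if rng.random() < epsilon else int(Q[s].argmax())
        s_next, r = step(s, a)
        # Q-learning update: move Q(s,a) toward r + gamma * max_a' Q(s',a').
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next

print("learned greedy policy (0=left, 1=right):", Q.argmax(axis=1))
```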
RL is usually performed in the context of Markov decision processes (MDP). The agent's perception at time k is represented as a state s_k ∈ S, where S is the finite set of environment states. The agent interacts with the environment by performing actions. At time k the agent selects an action a_k ∈ A, where A is the finite set of actions of the agent, which could trigger a transition to a new state. The agent will

References

R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. MIT Press.
N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, "Dropout: a simple way to prevent neural networks from overfitting," Journal of Machine Learning Research, 2014.
J. Han, M. Kamber, and J. Pei, Data Mining: Concepts and Techniques. Morgan Kaufmann.
C. M. Bishop, Pattern Recognition and Machine Learning. Springer, 2006.