
Fault Detection and Isolation in Industrial Processes Using Deep Learning Approaches

IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS
Rahat Iqbal¹, Tomasz Maniak², Faiyaz Doctor³, Charalampos Karyotis⁴
¹ Institute of Future Transport and Cities (IFTC), Coventry University, UK (r.iqbal@coventry.ac.uk)
² Interactive Coventry Ltd, UK (tomasz.maniak@interactivecoventry.com)
³ School of Computer Science and Electronic Engineering, University of Essex, UK (fdocto@essex.ac.uk)
⁴ Interactive Coventry Ltd, UK (charalampos.karyotis@interactivecoventry.com)
Abstract- Automated fault detection is an important part of
a quality control system. It has the potential to increase the
overall quality of monitored products and processes. The fault
detection of automotive instrument cluster systems in computer-
based manufacturing assembly lines is currently limited to
simple boundary checking. The analysis of more complex non-
linear signals is performed manually by trained operators,
whose knowledge is used to supervise quality checking and
manual detection of faults. We present a novel approach for
automated Fault Detection and Isolation (FDI) based on deep
learning. The approach was tested on data generated by
computer-based manufacturing systems equipped with local and
remote sensing devices. The results show that the approach
models the different spatial/temporal patterns found in the data.
The approach can successfully diagnose and locate multiple
classes of faults under real-time working conditions. The
proposed method is shown to outperform other established FDI
methods.
Index Terms- Deep learning, Artificial Neural Networks
(ANNs), Computer aided manufacturing, Fault detection,
Machine learning, Manufacturing automation.
I. INTRODUCTION
The development of fault detection systems for complex
real-world industrial processes is difficult and poses many
challenges [1]. Modern computer-based manufacturing
systems consist of many manufacturing cells performing a
range of assembly operations and functional tests. The cells
are controlled by computer software supervising a given
production process; much of this control software is custom built [2].
Manuscript received February 4, 2019; accepted February 23,
2019. Paper no. TII-19-0392 (Corresponding author: Rahat Iqbal)
R.Iqbal is with the Institute of Future Transport and Cities (IFTC),
Coventry University, UK (e-mail: r.iqbal@coventry.ac.uk).
T. Maniak was with Nippon Seiki (UK-NSI), UK. He is now with
Interactive Coventry Ltd, UK (e-mail:
tomasz.maniak@interactivecoventry.com).
F. Doctor is with the School of Computer Science and Electronic
Engineering, University of Essex, UK (e-mail: fdocto@essex.ac.uk)
C. Karyotis is with Interactive Coventry Ltd, UK (e-mail:
charalampos.karyotis@interactivecoventry.com)
For computers assigned to the supervision of
manufacturing plants, one of the most important tasks is to
detect and diagnose product faults. The first step in this task
is to acquire the data necessary for process analysis. The
earliest inspection systems utilised a small number of data
generating processes and sensing elements. This resulted in
only a limited amount of data which could be analysed by
engineers for the fault identification process, a more
methodical approach supported by structured data analysis
was lacking.
To this day, the only forms of fault detection used in many
manufacturing plants are those based on limit checking [3]. In
such a case, minimal and maximal values, called thresholds,
are specified for a given characteristic in the manufacturing
process for a product. A normal operational state is when the
value of a feature is within these specified limits. Although
simple, robust and reliable, this method is slow to react to
changes of a given characteristic of the data and fails to
identify complex failures, which can only be identified by
looking at the correlations between features. Another problem
with this approach is the challenge of specifying the threshold
values for a given characteristic [4].
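
For illustration, limit checking amounts to comparing each monitored characteristic against a fixed band. The sketch below uses invented signal names and threshold values; it is not taken from the paper.

```python
# Minimal sketch of limit checking: a sample is flagged as faulty as soon as
# any monitored characteristic leaves its fixed [min, max] band.
# The signal names and threshold values below are hypothetical placeholders.
THRESHOLDS = {
    "supply_voltage": (11.5, 12.5),    # volts
    "backlight_current": (0.10, 0.35)  # amps
}

def limit_check(sample: dict) -> list:
    """Return the list of characteristics that violate their thresholds."""
    violations = []
    for name, value in sample.items():
        low, high = THRESHOLDS[name]
        if not (low <= value <= high):
            violations.append(name)
    return violations

print(limit_check({"supply_voltage": 12.1, "backlight_current": 0.41}))
# -> ['backlight_current']
```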
To resolve the above problem, most manufacturing
companies have historically adopted a technique called
Statistical Process Control (SPC), which was developed in the 1920s
by Walter Shewhart. SPC is a set of different methods to
understand, monitor and improve process performance over
time [5]. The most apparent limitation of SPC methods is the
fact that they are concerned mainly with one input at a certain
point in time [6] and ignore the spatial/temporal correlation
which could otherwise help to detect and isolate potential
faults. It is therefore crucial to investigate and propose new
fault detection and isolation techniques based on more
sophisticated modelling capabilities of methods, such as
advanced intelligent data analysis and machine learning
approaches.
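
As an example of the SPC family, a Shewhart-style individuals chart flags observations that fall outside limits of roughly three standard deviations around the mean of an in-control reference sample. A minimal sketch with synthetic data follows; the limits and data are illustrative only.

```python
import numpy as np

def shewhart_limits(reference, k=3.0):
    """Estimate control limits from an in-control reference series."""
    mu, sigma = np.mean(reference), np.std(reference)
    return mu - k * sigma, mu + k * sigma

def out_of_control(series, lcl, ucl):
    """Indices of observations falling outside the control limits."""
    series = np.asarray(series)
    return np.where((series < lcl) | (series > ucl))[0]

rng = np.random.default_rng(0)
reference = rng.normal(5.0, 0.1, size=500)             # in-control history
lcl, ucl = shewhart_limits(reference)
new_batch = np.append(rng.normal(5.0, 0.1, 50), 5.6)   # final point drifts
print(out_of_control(new_batch, lcl, ucl))             # the drifted point (index 50) is flagged
```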
Modern computer-based manufacturing systems produce a
large volume of data generated by sensor and control signals
during the manufacturing process. The data contains valuable
information about the state of the system and its potential
faults. In such systems, the available automated solutions to
assist engineers with fault detection are limited and only
consider one measured characteristic of a manufacturing
process at a time. This creates a simplified static image of a
complex dynamic system. State of the art tools can consider
multiple characteristics but disregard the temporal aspect of
the signal, creating a static model of the system. More
significantly, these tools ignore various correlations between
multiple characteristics, which dynamically change over time
and provide additional information about a fault occurrence.
Another problem is the limited automation of the fault
classification and inference, making it necessary to train staff
/ engineers to use the tools effectively. This results in
additional cost and places constraints on the flexible use of
human resources. Likewise, these methods cannot detect
faults at an early stage, respond to constantly changing fault
sources or learn new fault types from multi-type spatial-
temporal production data. Ignoring the above problems leads
to extensive production down-time and waste of resources,
unsafe machinery, poor production yield and suboptimal
human resource allocation.
The rest of the paper is organised as follows. Section II
provides an overview of existing FDI methods used in
manufacturing environments. Section III discusses the
proposed approach. Section IV discusses the implementation
and Section V describes the evaluation of the proposed
approach in a real-world setting. Finally, in Section VI
conclusions and future work are discussed.
II. EXISTING FAULT DETECTION METHODS
The importance of FDI was first recognised in
safety-critical areas such as flight control, railways, medicine
and nuclear plants. The need for fault detection is
also more relevant nowadays due to the new application of
computational intelligence for data analysis performed by
real-time systems. This is especially true in real-time energy
efficient management of distributed resources [7], real-time
control and mobile crowdsensing [8] (both a vital part of
smart and connected communities) and the protection of
sensitive information collected by wearable sensors [9].
A conventional method for ensuring the fault free
operation of manufacturing production lines is to periodically
check the process variables, which include software
configuration validation, sensor validation, measurement
device calibration and preventive maintenance [10]. This
method is widely popularised in industry and used for
preventing and detecting abrupt failures. However, it is not
able to detect failures that can only be detected by continuous
assessment of variables, such as incipient process faults,
which are especially relevant in the manufacture of
microelectronic components. Owing to an increase in the
process complexity and sophistication of production
equipment, this method is no longer cost effective and is
impractical to implement on large-scale computer-based
production lines [11].
Fault detection methods can be mostly categorised into two
main groups: hardware redundancy and analytical
redundancy [12]. The main idea behind redundancy-based
methods is to generate a residual signal which represents a
difference between the normal behaviour of a system and its
actual measured behaviour. By considering this comparison,
a fault occurrence can be detected. Hardware redundancy is
based on creating the residual signal by using hardware [13].
The general idea behind this approach is to measure a given
process variable with more than one sensor and detect a fault
by performing consistency checks on the different sensors.
Analytical redundancy is based on creating the residual
signal from a mathematical model which can be developed by
analysing either the actual measurements, or the underlying
physics of the process. There are three main approaches to
analytical redundancy: model-based methods, data driven
methods, and knowledge based expert systems [14]. They are
all categorised based on a priori knowledge, which is
required for the model. Model based methods require a good
mathematical model of the monitored system which can be
acquired using parameter estimators, parity relations or state
observers such as Luenberger observers and Kalman Filters
[12]. Data driven methods, instead of creating a mathematical
model, use historical data recorded by sensors to monitor a
given system. The data is used to describe and model the
normal behaviour of that system, which is subsequently used
to generate a residual signal. The data driven methods can be
used only if the given system can generate enough data from
the sensors [15]. Finally, a knowledge based expert system
uses domain knowledge which is very often described as a set
of rules [16].
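
Common to all these analytical-redundancy schemes is the residual: the difference between the behaviour predicted by some reference model and the behaviour actually measured. The sketch below illustrates the idea with a deliberately crude data-driven reference (a moving average) standing in for whatever model is available; the threshold and the injected fault are synthetic.

```python
import numpy as np

def residual_fault_flags(measured, predicted, threshold):
    """Flag time steps where |measured - predicted| exceeds a threshold."""
    residual = np.abs(np.asarray(measured) - np.asarray(predicted))
    return residual > threshold

# Stand-in "model": predict each value as the mean of the surrounding 5 samples.
rng = np.random.default_rng(1)
signal = np.sin(np.linspace(0, 6, 200)) + rng.normal(0, 0.01, 200)
signal[150] += 0.5                                    # injected fault
predicted = np.convolve(signal, np.ones(5) / 5, mode="same")
print(np.where(residual_fault_flags(signal, predicted, threshold=0.2))[0])
# -> fault flagged around index 150
```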
A different approach for the classification of fault detection
methods is to consider the different methods from the
perspective of the variables that are used to detect a fault
[17]. In this context, methods based on analysing single
signals or multiple signals and models can be considered. The
single signal methods consider one process variable in
isolation from other variables. They include methods based
on limit and trend checking such as fixed threshold, adaptive
threshold or change detection methods [17]. Thresholds are
set to detect whether a given characteristic of the system falls
outside the acceptable minimal and maximal values. This
method, whilst simple and reliable is slow to react to changes
in the value of a characteristic over time and is incapable of
identifying complex failures. To overcome this problem a set
of methods used to analyse multiple signals have been used.
Those are: principal component analysis (PCA), parameter
estimators, artificial neural networks, state observers, parity
equations and state estimators [15]. These methods identify
faults by analysing the correlations between multiple system
variables. Finally, a set of temporal methods for both single
and multiple signal variables have been used, which have
provided the tools necessary to identify faults in high
frequency signals. These methods are necessary for dynamic
systems where a fault can only be identified by looking at the
way signals change over time. Examples of these methods
are: spectrum analysis, wavelet analysis and analysis of
correlations [18].
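
Among the multi-signal methods, PCA-based monitoring is a representative example: a linear subspace is fitted to data from normal operation and the squared prediction error (SPE) of new samples against that subspace is monitored. The sketch below uses synthetic data and an empirical percentile as the control limit rather than the usual analytical approximation.

```python
import numpy as np

def fit_pca(X, n_components):
    """Fit PCA on normal operating data; return the mean and principal axes."""
    mu = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, Vt[:n_components]

def spe(X, mu, components):
    """Squared prediction error of each sample w.r.t. the PCA subspace."""
    Xc = X - mu
    reconstructed = Xc @ components.T @ components
    return np.sum((Xc - reconstructed) ** 2, axis=1)

rng = np.random.default_rng(2)
latent = rng.normal(size=(500, 3))                    # 3 underlying factors
mixing = rng.normal(size=(3, 10))
normal = latent @ mixing + 0.05 * rng.normal(size=(500, 10))
mu, comps = fit_pca(normal, n_components=3)
limit = np.percentile(spe(normal, mu, comps), 99)     # empirical 99% control limit
faulty = normal[:5].copy()
faulty[:, 1] += 2.0                                   # sensor offset breaks the learned correlations
print(spe(faulty, mu, comps) > limit)                 # -> all True
```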
Many fault detection systems used in computer-based
manufacturing environments are rule based expert systems.
An expert system is a specialised system that solves problems
in a domain of expertise. Such systems simulate human
reasoning for a problem domain; perform reasoning over a set
of previously defined logical statements and then solve the
problem using heuristic knowledge [19]. An expert system is
a computer program consisting of a large database of if-then-
else rules which mimics the cognitive behaviour and
knowledge of human experts [33]. The main advantages of
developing such systems include: ease of implementation and
development, ease of fault interpretation, transparent logical
reasoning, and the ability to deal with noise and uncertainty
in the data. Because of the large variety of processes to which
expert systems are applied, there is a significant number of
papers and scientific literature devoted to their
implementation [20] [21] [22]. Expert systems require
significant human effort and experience to precisely describe
the heuristic knowledge of a monitored process. Another
limitation in using this method is that the database of
symptoms should be modified each time a new rule is added.
Finally, another problem is their rigid structure as they lack
the ability to fully express the real-world understanding of the
underlying process [23]. This is the reason why they fail to
generalise and adapt when a new condition is encountered
that is not explicitly defined in the knowledge base. This kind
of knowledge is called ‘shallow’ since it lacks the deep
understanding of the underlying physics of the system [23].
That is why expert systems are very often impractical for
systems that have many variables, or systems with significant
variability.
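
Structurally, such a rule base is a collection of symptom-to-fault mappings evaluated against the current measurements; the toy fragment below (with invented conditions and fault names) illustrates why every new fault type requires an explicitly authored rule.

```python
# Toy rule base: each rule maps a symptom predicate to a diagnosis.
# The conditions and fault names are invented for illustration only.
RULES = [
    (lambda m: m["backlight_current"] < 0.05, "backlight LED open circuit"),
    (lambda m: m["supply_voltage"] < 11.0 and m["mcu_resets"] > 0, "supply brown-out"),
]

def diagnose(measurements: dict) -> list:
    return [fault for condition, fault in RULES if condition(measurements)]

print(diagnose({"backlight_current": 0.01, "supply_voltage": 12.0, "mcu_resets": 0}))
# -> ['backlight LED open circuit']
```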
Each manufacturing process is subject to uncertainty and
random disturbances. This uncertainty comes from many
sources, including measurement uncertainty, human
performance or part variation. That is why sometimes a
problem of fault detection needs to be formulated in the
context of stochastic systems. These systems are defined
using a probability distribution, which corresponds to the
state of the system under normal working conditions. Any
change in that probability distribution can be an indicator of a
fault occurrence in the monitored system. In real-time
systems, observations are analysed sequentially, and fault
occurrence is identified based on the observations over a
particular time period [20]. By monitoring the variable and
considering it as a function of time a fault occurrence could
be identified and a corrective action introduced. This action
would return the system to its normal operation by resetting
the variable to its desired value. Although statistical process
control (SPC) charts are still widely used in manufacturing
process control, the charting methods have not kept up with
the progress in data acquisition.
analysis is the fact that it is slow to respond to subtle changes
in monitored variables. Finally, SPC charts are generally
concerned with the input of one variable in isolation,
therefore if a given variable is dependent on other variables
the charts can be misleading.
III. PROPOSED APPROACH
To address the problem of FDI, we have proposed a novel
universal biologically-inspired generative-modelling
approach as shown in Fig. 1. The approach is designed to
mimic the natural fault detection functions that have evolved
and developed in the mammalian brain and is inspired by a
theory proposed by Jeff Hawkins [24].
The proposed approach is capable of modelling complex
correlations between input values and the temporal
consequences between different input states of the system
(phrased in this paper as spatial-temporal correlations) in high
volumes of data. Consequently, the approach predicts the
future states of a system based on its previous behaviour
while taking into account significant noise in the data. The
approach can automatically learn complex real-world patterns
to identify abnormal conditions. This gives it a competitive
advantage over rival methods where substantive human
supervision is required. Due to its unique capability for
handling data invariances, the approach is able to process a
broad range of data types to discover patterns, which are too
complex for humans or standard machine learning techniques
to identify.
The main elements of the proposed approach are as
follows, see Fig 1. Initially data produced from several
hardware / software sources (data layer) is transformed into
individual signals. Those signals (input layer) comprise
various data types and represent a measured physical
characteristic of a monitored process. Depending on the type
of signal, they are encoded in one of the following ways. This
encoding is performed in the data transformation layer as
follows: for signals representing a categorical entity, the
values are encoded using one-hot encoding, i.e., an input
space M_i of k possible values is mapped to k binary features
encoding that input. Where a signal is continuous, a range of
that signal is considered and divided into a fixed number of
bins depending on the mean and standard deviation of the
signal. The input space is then mapped to k binary features
encoding the bin that the value falls into. Finally, binary
signals are copied without the need to use a dedicated
encoder. During the operation of the manufacturing system, at
each time t the measured physical characteristics
{p_1, p_2, …, p_n} (where P is the set of all measured physical
characteristics for a given manufacturing system) are encoded
and concatenated to create a sparse binary input vector x_t.
The input vectors x_t ∈ X (where X is the set of all possible
input vectors and X ⊆ {0, 1}^d) generated during that
operation change dynamically over time and create a
sequence of input vectors S = (x_t), t = 1, …, T. Here d
denotes the number of elements in an input vector x_t and n is
the number of measured physical characteristics. For typical
complex manufacturing systems, n > 150 and, depending on
the type of the physical characteristics, d > 1000. A problem
with this representation of x_t is that, although the individual
elements of the vector are correlated, there is no mechanism
that captures those correlations. To solve this problem, the
method uses a set of Deep Auto-Encoders (DAEs) [25] to
learn a vector-space embedding A, where A ⊆ R^m. By using
an auto-encoder, a mapping f: X → A is achieved which
represents x_t in a continuous vector space where correlated
input vectors are mapped to nearby points.
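
For illustration, the following sketch shows one possible realisation of this encoding step; the signal names, category sets, bin count and value statistics are placeholders, not the parameters used in the paper. Categorical signals are one-hot encoded, continuous signals are binned around their mean, binary signals are copied, and the results are concatenated into the sparse binary vector x_t.

```python
import numpy as np

def encode_categorical(value, categories):
    """One-hot encoding: |categories| binary features."""
    vec = np.zeros(len(categories), dtype=np.uint8)
    vec[categories.index(value)] = 1
    return vec

def encode_continuous(value, mean, std, n_bins=11):
    """Bin a continuous signal into n_bins buckets spanning mean +/- 3 std."""
    edges = np.linspace(mean - 3 * std, mean + 3 * std, n_bins + 1)
    idx = np.clip(np.digitize(value, edges) - 1, 0, n_bins - 1)
    vec = np.zeros(n_bins, dtype=np.uint8)
    vec[idx] = 1
    return vec

def encode_binary(value):
    """Binary signals are copied through unchanged."""
    return np.array([int(value)], dtype=np.uint8)

# Example input vector x_t for three signals measured at time t
# (signal names, category sets and statistics are hypothetical).
x_t = np.concatenate([
    encode_categorical("PASS", ["PASS", "FAIL", "RETEST"]),
    encode_continuous(4.93, mean=5.0, std=0.1),
    encode_binary(True),
])
print(x_t)   # sparse binary vector, mostly zeros
```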
The discovery of correlations between individual inputs is
determined by the spatial transformation of input space into a
transformed vector-space embedding, by using the feature
encoder. The continuous space of vector embedding cannot
be directly used to infer a current state of a monitored system,
instead, hierarchical clustering is performed on the
transformed features derived from the DAE, to extract the
possible states for the modelled system. The process of
mapping input space into vector-space embedding and
performing hierarchical clustering using the distance between
individual input vectors is referred to as spatial pooling. The
main purpose of this operation is to reduce the input space to
a fixed number of the most probable states of the underlying
system being modelled. Temporal sequence learning is used
to train the model on the different temporal-consequential
relations between probable states of the system. This is used
to infer the next predicted state of the inputs as compared to
the actual behaviour of the system, which is termed as
temporal inference. The spatial pooling and temporal-
inference elements of the approach combine to produce a
spatial-temporal model of the operational behaviour of the
system being modelled. The model can then be used in
combination with prediction and classification approaches
such as standard Artificial Neural Networks (ANNs), to
predict future behaviour of the system under different
operational conditions and detect deviations and changes in
behaviour that might signify an underlying unknown effect or
problem. The prediction model can further provide inputs to
the optimisation framework or an interpretable fuzzy decision
model that is able to optimise processes based on quantitative
and qualitative inputs from various sources. This approach
can therefore be used to determine behaviour changes and
deviations of complex systems. The output of the model is
transferred for further control of the manufacturing
production system (see the application layer in Fig. 1).

Fig. 1. Proposed approach.
IV. IMPLEMENTATION
The approach has been implemented using the Python
programming language. The implementation of the proposed
approach makes use of the Theano library which benefits
from dynamic C code generation, stable and fast optimisation
algorithms, as well as integration with the NumPy numerical
library [29].
The implementation is divided into a learning module and
a real-time module. The learning module performs
continuous learning of the parameters for both spatial pooling
and temporal inference and uploads them into a database,
which is shared with the real-time module. The real-time
module performs real-time FDI with the use of parameters
stored in the shared database. The module does not perform
any learning and is concerned only with the execution of the
model with previously learned parameters. This operation of
splitting the learning process from the actual execution
process is necessary to ensure real-time operation which
would otherwise be unattainable. The execution of the
learning module is performed on a dedicated server, with the
deployed module running as a service. Initially the module
acquires several data samples from an SPC database. The
database contains a log of all signals generated by the
execution of the manufacturing process as they unfold in
time. This data is stored in a database as textual information
and loaded by the learning module to computer memory as a
list of string objects. Each element of this list represents the
current values for all manufacturing signals for a given time
frame f_i. The elements in that list are first fragmented into
separate signals and based on their type individually encoded
into sparsely distributed representations (SDR). The SDR
encodings for each signal at time t are combined into a
binary array to create an input vector. This process is
repeated for the remaining elements of that list and results in
a new list of binary input vectors being created and
subsequently used as an input to the DAE. An optimisation
algorithm is executed to adjust parameters of the DAE model
thus minimising the error on the input reconstructions. The
learned parameters of the model are saved and reused during
the next iteration of the algorithm.
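
As a rough illustration of this reconstruction objective, the sketch below trains a single tied-weight sigmoid auto-encoder layer on binary input vectors with gradient descent in plain NumPy. The actual implementation stacks several such layers and is written with Theano, so this is only a minimal stand-in with synthetic data.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class TiedAutoencoder:
    """Single tied-weight sigmoid auto-encoder layer trained by gradient descent."""
    def __init__(self, n_visible, n_hidden, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(0, 0.01, size=(n_visible, n_hidden))
        self.b = np.zeros(n_hidden)    # hidden bias
        self.c = np.zeros(n_visible)   # reconstruction bias

    def forward(self, X):
        H = sigmoid(X @ self.W + self.b)       # encoding
        R = sigmoid(H @ self.W.T + self.c)     # reconstruction
        return H, R

    def sgd_step(self, X, lr=0.1):
        H, R = self.forward(X)
        dR = R - X                             # gradient of cross-entropy w.r.t. pre-sigmoid output
        dH = (dR @ self.W) * H * (1.0 - H)
        self.W -= lr * (X.T @ dH + dR.T @ H) / len(X)
        self.b -= lr * dH.mean(axis=0)
        self.c -= lr * dR.mean(axis=0)
        return np.mean(np.sum((X - R) ** 2, axis=1))   # monitor reconstruction error

# Train on a batch of sparse binary input vectors (random stand-ins here).
X = (np.random.default_rng(1).random((256, 400)) < 0.05).astype(float)
ae = TiedAutoencoder(n_visible=400, n_hidden=64)
for epoch in range(20):
    err = ae.sgd_step(X)
print("final reconstruction error:", round(err, 3))
```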
The data generated by the DAE is subsequently processed
by the hierarchical-clustering module, which extracts
meaningful information about the data structure of the feature
space. The dendrogram created by the hierarchical-clustering
process is cut at a certain height to partition the feature space
into multiple regions. For each region, a centroid is assigned
and saved to a dictionary. This dictionary is used to map
signals for each time frame f_i into a state s_i, where s_i
belongs to the set of states extracted by the clustering.
The output of this operation creates a list of temporal
transitions between the different states. The list can therefore
be considered to describe state representations of an
underlying Markov process. The transition probabilities
between the different states s_i are discovered and used to
populate the transition matrix of an n-order Markov model.
To reduce the memory requirements necessary to store the
transition matrix it is implemented as a dictionary. The
entries of the dictionary are saved in the database and used by
the real-time module to predict future states of the monitored
system. This operation concludes the first iteration of the
algorithm. The entire process is repeated and reinitialised
with an acquisition of a new set of data samples from the SPC
database. This process is presented in Fig. 2.
Fig. 2 Learning module execution diagram.
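
A condensed sketch of these two learning steps is given below, using SciPy's agglomerative clustering as a stand-in for the clustering implementation and a first-order transition dictionary for brevity (the paper uses an n-order model); the feature matrix is a random placeholder for the DAE outputs.

```python
import numpy as np
from collections import defaultdict
from scipy.cluster.hierarchy import linkage, fcluster

def extract_states(features, n_states):
    """Hierarchical clustering of DAE feature vectors into n_states clusters."""
    labels = fcluster(linkage(features, method="ward"), n_states, criterion="maxclust")
    centroids = {s: features[labels == s].mean(axis=0) for s in np.unique(labels)}
    return labels, centroids

def transition_dictionary(state_sequence):
    """Count observed state-to-state transitions (first-order for brevity)."""
    counts = defaultdict(lambda: defaultdict(int))
    for prev, nxt in zip(state_sequence[:-1], state_sequence[1:]):
        counts[prev][nxt] += 1
    # normalise the counts into transition probabilities
    return {s: {t: c / sum(nxts.values()) for t, c in nxts.items()}
            for s, nxts in counts.items()}

features = np.random.default_rng(3).random((300, 64))   # stand-in DAE outputs
states, centroids = extract_states(features, n_states=10)
transitions = transition_dictionary(states.tolist())
```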
The real-time module is integrated with custom-built
Industrial Test, Control and Calibration (ITCC) software. It
starts its execution by downloading the DAE, centroid
dictionary and transition dictionary parameters from the
shared database. During real-time operation of the
manufacturing process, the generated spatial-temporal signals
are logged and, based on the data type of each signal,
transformed into the corresponding SDR representation. The encoded data is
subsequently forward-propagated through the DAE structure
(initialised with the parameters acquired from the shared
database). There is no learning performed in the real-time
module. The signals are processed by the DAE and, as a
consequence, transformed into a feature space used as input to
the centroid dictionary from where state information is
acquired. The inference of the state value is based on the
shortest distance between the feature vector and a given
centroid. The last n states are saved at any given time and
used with the transition dictionary of the n-order Markov
chain to infer the future state of the system. The predicted
state is transformed back to a feature space and saved to the
computer memory. During the next iteration, the predicted
feature vector is compared with an actual feature vector
generated by the manufacturing process. The residual vector
generated by this process is used as an input to a previously
trained MLP classifier, which indicates a fault occurrence in
the system. This process is described in Algorithm 1.
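
The real-time path can be summarised by the sketch below; the encoder, auto-encoder, centroid dictionary, transition dictionary and MLP classifier are assumed to be the previously learned objects (represented here by hypothetical helpers), and no parameter updates occur in this loop.

```python
import numpy as np

def nearest_state(feature, centroids):
    """Map a DAE feature vector to the closest stored centroid (state id)."""
    return min(centroids, key=lambda s: np.linalg.norm(feature - centroids[s]))

def predict_next_state(recent_states, transitions):
    """Most probable next state given the last observed state(s)."""
    candidates = transitions.get(recent_states[-1], {})
    return max(candidates, key=candidates.get) if candidates else recent_states[-1]

def realtime_step(raw_signals, encoder, autoencoder, centroids, transitions,
                  classifier, previous_prediction, recent_states):
    x_t = encoder(raw_signals)                       # SDR encoding of the current signals
    feature, _ = autoencoder.forward(x_t[None, :])   # forward pass only, no learning
    feature = feature[0]
    residual = feature - previous_prediction         # predicted vs. actual feature vector
    fault_class = classifier(residual)               # pre-trained MLP classifier
    state = nearest_state(feature, centroids)
    recent_states.append(state)
    next_prediction = centroids[predict_next_state(recent_states, transitions)]
    return fault_class, next_prediction
```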