
Fault Detection and Isolation in Industrial Processes Using Deep Learning Approaches

IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS
Rahat Iqbal¹, Tomasz Maniak², Faiyaz Doctor³, Charalampos Karyotis⁴
¹ Institute of Future Transport and Cities (IFTC), Coventry University, UK (r.iqbal@coventry.ac.uk)
² Interactive Coventry Ltd, UK (tomasz.maniak@interactivecoventry.com)
³ School of Computer Science and Electronic Engineering, University of Essex, UK (fdocto@essex.ac.uk)
⁴ Interactive Coventry Ltd, UK (charalampos.karyotis@interactivecoventry.com)
Abstract- Automated fault detection is an important part of
a quality control system. It has the potential to increase the
overall quality of monitored products and processes. The fault
detection of automotive instrument cluster systems in computer-
based manufacturing assembly lines is currently limited to
simple boundary checking. The analysis of more complex non-
linear signals is performed manually by trained operators,
whose knowledge is used to supervise quality checking and
manual detection of faults. We present a novel approach for
automated Fault Detection and Isolation (FDI) based on deep
learning. The approach was tested on data generated by
computer-based manufacturing systems equipped with local and
remote sensing devices. The results show that the approach
models the different spatial/temporal patterns found in the data.
The approach can successfully diagnose and locate multiple
classes of faults under real-time working conditions. The
proposed method is shown to outperform other established FDI
methods.
Index Terms- Deep learning, Artificial Neural Networks
(ANNs), Computer aided manufacturing, Fault detection,
Machine learning, Manufacturing automation.
I. INTRODUCTION
The development of fault detection systems for complex
real-world industrial processes is difficult and poses many
challenges [1]. Modern computer-based manufacturing
systems consist of many manufacturing cells performing a
range of assembly operations and functional tests. The cells
are controlled by computer software supervising a given
production process; much of this control software is custom built [2].
Manuscript received February 4, 2019; accepted February 23,
2019. Paper no. TII-19-0392 (Corresponding author: Rahat Iqbal)
R.Iqbal is with the Institute of Future Transport and Cities (IFTC),
Coventry University, UK (e-mail: r.iqbal@coventry.ac.uk).
T. Maniak was with Nippon Seiki (UK-NSI), UK. He is now with
Interactive Coventry Ltd, UK (e-mail:
tomasz.maniak@interactivecoventry.com).
F. Doctor is with the School of Computer Science and Electronic
Engineering, University of Essex, UK (e-mail: fdocto@essex.ac.uk)
C. Karyotis is with Interactive Coventry Ltd, UK (e-mail:
charalampos.karyotis@interactivecoventry.com)
For computers assigned to the supervision of
manufacturing plants, one of the most important tasks is to
detect and diagnose product faults. The first step in this task
is to acquire the data necessary for process analysis. The
earliest inspection systems utilised a small number of data
generating processes and sensing elements. This resulted in
only a limited amount of data which could be analysed by
engineers for the fault identification process, a more
methodical approach supported by structured data analysis
was lacking.
To this day, the only forms of fault detection used in many
manufacturing plants are those based on limit checking [3]. In
such a case, minimal and maximal values, called thresholds,
are specified for a given characteristic in the manufacturing
process for a product. A normal operational state is when the
value of a feature is within these specified limits. Although
simple, robust and reliable, this method is slow to react to
changes of a given characteristic of the data and fails to
identify complex failures, which can only be identified by
looking at the correlations between features. Another problem
with this approach is the challenge of specifying the threshold
values for a given characteristic [4].
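
For illustration, limit checking amounts to comparing each monitored characteristic against a fixed band. The sketch below uses invented signal names and threshold values; it is not taken from the paper.

```python
# Minimal sketch of limit checking: a sample is flagged as faulty as soon as
# any monitored characteristic leaves its fixed [min, max] band.
# The signal names and threshold values below are hypothetical placeholders.
THRESHOLDS = {
    "supply_voltage": (11.5, 12.5),    # volts
    "backlight_current": (0.10, 0.35)  # amps
}

def limit_check(sample: dict) -> list:
    """Return the list of characteristics that violate their thresholds."""
    violations = []
    for name, value in sample.items():
        low, high = THRESHOLDS[name]
        if not (low <= value <= high):
            violations.append(name)
    return violations

print(limit_check({"supply_voltage": 12.1, "backlight_current": 0.41}))
# -> ['backlight_current']
```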
To resolve the above problem, most manufacturing
companies have historically adopted a technique called
Statistical Process Control (SPC), which was developed in the 1920s
by Walter Shewhart. SPC is a set of different methods to
understand, monitor and improve process performance over
time [5]. The most apparent limitation of SPC methods is the
fact that they are concerned mainly with one input at a certain
point in time [6] and ignore the spatial/temporal correlation
which could otherwise help to detect and isolate potential
faults. It is therefore crucial to investigate and propose new
fault detection and isolation techniques based on more
sophisticated modelling capabilities of methods, such as
advanced intelligent data analysis and machine learning
approaches.
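
As an example of the SPC family, a Shewhart-style individuals chart flags observations that fall outside limits of roughly three standard deviations around the mean of an in-control reference sample. A minimal sketch with synthetic data follows; the limits and data are illustrative only.

```python
import numpy as np

def shewhart_limits(reference, k=3.0):
    """Estimate control limits from an in-control reference series."""
    mu, sigma = np.mean(reference), np.std(reference)
    return mu - k * sigma, mu + k * sigma

def out_of_control(series, lcl, ucl):
    """Indices of observations falling outside the control limits."""
    series = np.asarray(series)
    return np.where((series < lcl) | (series > ucl))[0]

rng = np.random.default_rng(0)
reference = rng.normal(5.0, 0.1, size=500)             # in-control history
lcl, ucl = shewhart_limits(reference)
new_batch = np.append(rng.normal(5.0, 0.1, 50), 5.6)   # final point drifts
print(out_of_control(new_batch, lcl, ucl))             # the drifted point (index 50) is flagged
```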
Modern computer-based manufacturing systems produce a
large volume of data generated by sensor and control signals
during the manufacturing process. The data contains valuable
information about the state of the system and its potential
faults. In such systems, the available automated solutions to
assist engineers with fault detection are limited and only
consider one measured characteristic of a manufacturing
process at a time. This creates a simplified static image of a
complex dynamic system. State of the art tools can consider
multiple characteristics but disregard the temporal aspect of
the signal, creating a static model of the system. More
significantly, these tools ignore various correlations between
multiple characteristics, which dynamically change over time
and provide additional information about a fault occurrence.
Another problem is the limited automation of the fault
classification and inference, making it necessary to train staff
/ engineers to use the tools effectively. This results in
additional cost and places constraints on the flexible use of
human resources. Likewise, these methods cannot detect
faults at an early stage, respond to constantly changing fault
sources or learn new fault types from multi-type spatial-
temporal production data. Ignoring the above problems leads
to extensive production down-time and waste of resources,
unsafe machinery, poor production yield and suboptimal
human resource allocation.
The rest of the paper is organised as follows. Section II
provides an overview of existing FDI methods used in
manufacturing environments. Section III discusses the
proposed approach. Section IV discusses the implementation
and Section V describes the evaluation of the proposed
approach in a real-world setting. Finally, in Section VI
conclusions and future work are discussed.
II. EXISTING FAULT DETECTION METHODS
The importance of FDI was first recognised in
safety-critical areas such as flight control, railways, medicine
and nuclear plants. The need for fault detection is
also more relevant nowadays due to the new application of
computational intelligence for data analysis performed by
real-time systems. This is especially true in real-time energy
efficient management of distributed resources [7], real-time
control and mobile crowdsensing [8] (both a vital part of
smart and connected communities) and the protection of
sensitive information collected by wearable sensors [9].
A conventional method for ensuring the fault free
operation of manufacturing production lines is to periodically
check the process variables, which include software
configuration validation, sensor validation, measurement
device calibration and preventive maintenance [10]. This
method is widely popularised in industry and used for
preventing and detecting abrupt failures. However, it is not
able to detect failures that can only be detected by continuous
assessment of variables, such as incipient process faults,
which are especially relevant in the manufacture of
microelectronic components. Owing to an increase in the
process complexity and sophistication of production
equipment, this method is no longer cost effective and is
impractical to implement on large-scale computer-based
production lines [11].
Fault detection methods can be mostly categorised into two
main groups: hardware redundancy and analytical
redundancy [12]. The main idea behind redundancy-based
methods is to generate a residual signal which represents a
difference between the normal behaviour of a system and its
actual measured behaviour. By considering this comparison,
a fault occurrence can be detected. Hardware redundancy is
based on creating the residual signal by using hardware [13].
The general idea behind this approach is to measure a given
process variable with more than one sensor and detect a fault
by performing consistency checks on the different sensors.
Analytical redundancy is based on creating the residual
signal from a mathematical model which can be developed by
analysing either the actual measurements, or the underlying
physics of the process. There are three main approaches to
analytical redundancy: model-based methods, data driven
methods, and knowledge based expert systems [14]. They are
all categorised based on a priori knowledge, which is
required for the model. Model based methods require a good
mathematical model of the monitored system which can be
acquired using parameter estimators, parity relations or state
observers such as Luenberger observers and Kalman Filters
[12]. Data driven methods, instead of creating a mathematical
model, use historical data recorded by sensors to monitor a
given system. The data is used to describe and model the
normal behaviour of that system, which is subsequently used
to generate a residual signal. The data driven methods can be
used only if the given system can generate enough data from
the sensors [15]. Finally, a knowledge based expert system
uses domain knowledge which is very often described as a set
of rules [16].
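
Common to all these analytical-redundancy schemes is the residual: the difference between the behaviour predicted by some reference model and the behaviour actually measured. The sketch below illustrates the idea with a deliberately crude data-driven reference (a moving average) standing in for whatever model is available; the threshold and the injected fault are synthetic.

```python
import numpy as np

def residual_fault_flags(measured, predicted, threshold):
    """Flag time steps where |measured - predicted| exceeds a threshold."""
    residual = np.abs(np.asarray(measured) - np.asarray(predicted))
    return residual > threshold

# Stand-in "model": predict each value as the mean of the surrounding 5 samples.
rng = np.random.default_rng(1)
signal = np.sin(np.linspace(0, 6, 200)) + rng.normal(0, 0.01, 200)
signal[150] += 0.5                                    # injected fault
predicted = np.convolve(signal, np.ones(5) / 5, mode="same")
print(np.where(residual_fault_flags(signal, predicted, threshold=0.2))[0])
# -> fault flagged around index 150
```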
A different approach for the classification of fault detection
methods is to consider the different methods from the
perspective of the variables that are used to detect a fault
[17]. In this context, methods based on analysing single
signals or multiple signals and models can be considered. The
single signal methods consider one process variable in
isolation from other variables. They include methods based
on limit and trend checking such as fixed threshold, adaptive
threshold or change detection methods [17]. Thresholds are
set to detect whether a given characteristic of the system falls
outside the acceptable minimal and maximal values. This
method, whilst simple and reliable is slow to react to changes
in the value of a characteristic over time and is incapable of
identifying complex failures. To overcome this problem a set
of methods used to analyse multiple signals have been used.
Those are: principal component analysis (PCA), parameter
estimators, artificial neural networks, state observers, parity
equations and state estimators [15]. These methods identify
faults by analysing the correlations between multiple system
variables. Finally, a set of temporal methods for both single
and multiple signal variables have been used, which have
provided the tools necessary to identify faults in high
frequency signals. These methods are necessary for dynamic
systems where a fault can only be identified by looking at the
way signals change over time. Examples of these methods
are: spectrum analysis, wavelet analysis and analysis of
correlations [18].
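
Among the multi-signal methods, PCA-based monitoring is a representative example: a linear subspace is fitted to data from normal operation and the squared prediction error (SPE) of new samples against that subspace is monitored. The sketch below uses synthetic data and an empirical percentile as the control limit rather than the usual analytical approximation.

```python
import numpy as np

def fit_pca(X, n_components):
    """Fit PCA on normal operating data; return the mean and principal axes."""
    mu = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, Vt[:n_components]

def spe(X, mu, components):
    """Squared prediction error of each sample w.r.t. the PCA subspace."""
    Xc = X - mu
    reconstructed = Xc @ components.T @ components
    return np.sum((Xc - reconstructed) ** 2, axis=1)

rng = np.random.default_rng(2)
latent = rng.normal(size=(500, 3))                    # 3 underlying factors
mixing = rng.normal(size=(3, 10))
normal = latent @ mixing + 0.05 * rng.normal(size=(500, 10))
mu, comps = fit_pca(normal, n_components=3)
limit = np.percentile(spe(normal, mu, comps), 99)     # empirical 99% control limit
faulty = normal[:5].copy()
faulty[:, 1] += 2.0                                   # sensor offset breaks the learned correlations
print(spe(faulty, mu, comps) > limit)                 # -> all True
```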
Many fault detection systems used in computer-based
manufacturing environments are rule based expert systems.
An expert system is a specialised system that solves problems
in a domain of expertise. Such systems simulate human
reasoning for a problem domain; perform reasoning over a set
of previously defined logical statements and then solve the
problem using heuristic knowledge [19]. An expert system is
a computer program consisting of a large database of if-then-
else rules which mimics the cognitive behaviour and
knowledge of human experts [33]. The main advantages of
developing such systems include: ease of implementation and
development, ease of fault interpretation, transparent logical
reasoning, and the ability to deal with noise and uncertainty
in the data. Because of the large variety of processes to which
expert systems are applied, there is a significant number of
papers and scientific literature devoted to their
implementation [20] [21] [22]. Expert systems require
significant human effort and experience to precisely describe
the heuristic knowledge of a monitored process. Another
limitation in using this method is that the database of
symptoms should be modified each time a new rule is added.
Finally, another problem is their rigid structure as they lack
the ability to fully express the real-world understanding of the
underlying process [23]. This is the reason why they fail to
generalise and adapt when a new condition is encountered
that is not explicitly defined in the knowledge base. This kind
of knowledge is called ‘shallow’ since it lacks the deep
understanding of the underlying physics of the system [23].
That is why expert systems are very often impractical for
systems that have many variables, or systems with significant
variability.
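
Structurally, such a rule base is a collection of symptom-to-fault mappings evaluated against the current measurements; the toy fragment below (with invented conditions and fault names) illustrates why every new fault type requires an explicitly authored rule.

```python
# Toy rule base: each rule maps a symptom predicate to a diagnosis.
# The conditions and fault names are invented for illustration only.
RULES = [
    (lambda m: m["backlight_current"] < 0.05, "backlight LED open circuit"),
    (lambda m: m["supply_voltage"] < 11.0 and m["mcu_resets"] > 0, "supply brown-out"),
]

def diagnose(measurements: dict) -> list:
    return [fault for condition, fault in RULES if condition(measurements)]

print(diagnose({"backlight_current": 0.01, "supply_voltage": 12.0, "mcu_resets": 0}))
# -> ['backlight LED open circuit']
```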
Each manufacturing process is subject to uncertainty and
random disturbances. This uncertainty comes from many
sources, including measurement uncertainty, human
performance or part variation. That is why sometimes a
problem of fault detection needs to be formulated in the
context of stochastic systems. These systems are defined
using a probability distribution, which corresponds to the
state of the system under normal working conditions. Any
change in that probability distribution can be an indicator of a
fault occurrence in the monitored system. In real-time
systems, observations are analysed sequentially, and fault
occurrence is identified based on the observations over a
particular time period [20]. By monitoring the variable and
considering it as a function of time a fault occurrence could
be identified and a corrective action introduced. This action
would return the system to its normal operation by resetting
the variable to its desired value. Although statistical process
control (SPC) charts are still widely used in manufacturing
process control, the charting methods have not kept up with
the progress in data acquisition.
analysis is the fact that it is slow to respond to subtle changes
in monitored variables. Finally, SPC charts are generally
concerned with the input of one variable in isolation,
therefore if a given variable is dependent on other variables
the charts can be misleading.
III. PROPOSED APPROACH
To address the problem of FDI, we have proposed a novel
universal biologically-inspired generative-modelling
approach as shown in Fig. 1. The approach is designed to
mimic the natural fault detection functions that have evolved
and developed in the mammalian brain and is inspired by a
theory proposed by Jeff Hawkins [24].
The proposed approach is capable of modelling complex
correlations between input values and the temporal
consequences between different input states of the system
(phrased in this paper as spatial-temporal correlations) in high
volumes of data. Consequently, the approach predicts the
future states of a system based on its previous behaviour
while taking into account significant noise in the data. The
approach can automatically learn complex real-world patterns
to identify abnormal conditions. This gives it a competitive
advantage over rival methods where substantive human
supervision is required. Due to its unique capability for
handling data invariances, the approach is able to process a
broad range of data types to discover patterns, which are too
complex for humans or standard machine learning techniques
to identify.
The main elements of the proposed approach are as
follows, see Fig 1. Initially data produced from several
hardware / software sources (data layer) is transformed into
individual signals. Those signals (input layer) comprise
various data types and represent a measured physical
characteristic of a monitored process. Depending on the type
of signal, they are encoded in one of the following ways. This
encoding is performed in the data transformation layer as
follows: for signals representing a categorical entity, the
values are encoded using one-hot encoding, i.e., an input
space M_i of k possible values is mapped to k binary features
encoding that input. Where a signal is continuous, a range of
that signal is considered and divided into a fixed number of
bins depending on the mean and standard deviation of the
signal. The input space is then mapped to k binary features
encoding the bin that the value falls into. Finally, binary
signals are copied without the need to use a dedicated
encoder. During the operation of the manufacturing system, at
each time t the measured physical characteristics
{p_1, p_2, …, p_n} (where P is the set of all measured physical
characteristics for a given manufacturing system) are encoded
and concatenated to create a sparse binary input vector x_t.
The input vectors x_t ∈ X (where X is the set of all possible
input vectors and X ⊆ {0, 1}^d) generated during that
operation change dynamically over time and create a
sequence of input vectors S = (x_t), t = 1, …, T. Here d
denotes the number of elements in an input vector x_t and n is
the number of measured physical characteristics. For typical
complex manufacturing systems, n > 150 and, depending on
the type of the physical characteristics, d > 1000. A problem
with this representation of x_t is that, although the individual
elements of the vector are correlated, there is no mechanism
that captures those correlations. To solve this problem, the
method uses a set of Deep Auto-Encoders (DAEs) [25] to
learn a vector-space embedding A, where A ⊆ R^m. By using
an auto-encoder, a mapping f: X → A is achieved which
represents x_t in a continuous vector space where correlated
input vectors are mapped to nearby points.
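
For illustration, the following sketch shows one possible realisation of this encoding step; the signal names, category sets, bin count and value statistics are placeholders, not the parameters used in the paper. Categorical signals are one-hot encoded, continuous signals are binned around their mean, binary signals are copied, and the results are concatenated into the sparse binary vector x_t.

```python
import numpy as np

def encode_categorical(value, categories):
    """One-hot encoding: |categories| binary features."""
    vec = np.zeros(len(categories), dtype=np.uint8)
    vec[categories.index(value)] = 1
    return vec

def encode_continuous(value, mean, std, n_bins=11):
    """Bin a continuous signal into n_bins buckets spanning mean +/- 3 std."""
    edges = np.linspace(mean - 3 * std, mean + 3 * std, n_bins + 1)
    idx = np.clip(np.digitize(value, edges) - 1, 0, n_bins - 1)
    vec = np.zeros(n_bins, dtype=np.uint8)
    vec[idx] = 1
    return vec

def encode_binary(value):
    """Binary signals are copied through unchanged."""
    return np.array([int(value)], dtype=np.uint8)

# Example input vector x_t for three signals measured at time t
# (signal names, category sets and statistics are hypothetical).
x_t = np.concatenate([
    encode_categorical("PASS", ["PASS", "FAIL", "RETEST"]),
    encode_continuous(4.93, mean=5.0, std=0.1),
    encode_binary(True),
])
print(x_t)   # sparse binary vector, mostly zeros
```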
The discovery of correlations between individual inputs is
determined by the spatial transformation of input space into a
transformed vector-space embedding, by using the feature
encoder. The continuous space of vector embedding cannot
be directly used to infer a current state of a monitored system,
instead, hierarchical clustering is performed on the
transformed features derived from the DAE, to extract the
possible states for the modelled system. The process of
mapping input space into vector-space embedding and
performing hierarchical clustering using the distance between
individual input vectors is referred to as spatial pooling. The
main purpose of this operation is to reduce the input space to
a fixed number of the most probable states of the underlying
system being modelled. Temporal sequence learning is used
to train the model on the different temporal-consequential
relations between probable states of the system. This is used
to infer the next predicted state of the inputs as compared to
the actual behaviour of the system, which is termed as
temporal inference. The spatial pooling and temporal-
inference elements of the approach combine to produce a
spatial-temporal model of the operational behaviour of the
system being modelled. The model can then be used in
combination with prediction and classification approaches
such as standard Artificial Neural Networks (ANNs), to
predict future behaviour of the system under different
operational conditions and detect deviations and changes in
behaviour that might signify an underlying unknown effect or
problem. The prediction model can further provide inputs to
the optimisation framework or an interpretable fuzzy decision
model that is able to optimise processes based on quantitative
and qualitative inputs from various sources. This approach
can therefore be used to determine behaviour changes and
deviations of complex systems. The output of the model is
transferred for further control of the manufacturing
production system (see the application layer in Fig. 1).

Fig. 1. Proposed approach.
IV. IMPLEMENTATION
The approach has been implemented using the Python
programming language. The implementation of the proposed
approach makes use of the Theano library which benefits
from dynamic C code generation, stable and fast optimisation
algorithms, as well as integration with the NumPy numerical
library [29].
The implementation is divided into a learning module and
a real-time module. The learning module performs
continuous learning of the parameters for both spatial pooling
and temporal inference and uploads them into a database,
which is shared with the real-time module. The real-time
module performs real-time FDI with the use of parameters
stored in the shared database. The module does not perform
any learning and is concerned only with the execution of the
model with previously learned parameters. This operation of
splitting the learning process from the actual execution
process is necessary to ensure real-time operation which
would otherwise be unattainable. The execution of the
learning module is performed on a dedicated server, with the
deployed module running as a service. Initially the module
acquires several data samples from an SPC database. The
database contains a log of all signals generated by the
execution of the manufacturing process as they unfold in
time. This data is stored in a database as textual information
and loaded by the learning module to computer memory as a
list of string objects. Each element of this list represents the
current values for all manufacturing signals for a given time
frame f_i. The elements in that list are first fragmented into
separate signals and based on their type individually encoded
into sparsely distributed representations (SDR). The SDR
encodings for each signal at time t are combined into a
binary array to create an input vector. This process is
repeated for the remaining elements of that list and results in
a new list of binary input vectors being created and
subsequently used as an input to the DAE. An optimisation
algorithm is executed to adjust parameters of the DAE model
thus minimising the error on the input reconstructions. The
learned parameters of the model are saved and reused during
the next iteration of the algorithm.
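
As a rough illustration of this reconstruction objective, the sketch below trains a single tied-weight sigmoid auto-encoder layer on binary input vectors with gradient descent in plain NumPy. The actual implementation stacks several such layers and is written with Theano, so this is only a minimal stand-in with synthetic data.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class TiedAutoencoder:
    """Single tied-weight sigmoid auto-encoder layer trained by gradient descent."""
    def __init__(self, n_visible, n_hidden, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(0, 0.01, size=(n_visible, n_hidden))
        self.b = np.zeros(n_hidden)    # hidden bias
        self.c = np.zeros(n_visible)   # reconstruction bias

    def forward(self, X):
        H = sigmoid(X @ self.W + self.b)       # encoding
        R = sigmoid(H @ self.W.T + self.c)     # reconstruction
        return H, R

    def sgd_step(self, X, lr=0.1):
        H, R = self.forward(X)
        dR = R - X                             # gradient of cross-entropy w.r.t. pre-sigmoid output
        dH = (dR @ self.W) * H * (1.0 - H)
        self.W -= lr * (X.T @ dH + dR.T @ H) / len(X)
        self.b -= lr * dH.mean(axis=0)
        self.c -= lr * dR.mean(axis=0)
        return np.mean(np.sum((X - R) ** 2, axis=1))   # monitor reconstruction error

# Train on a batch of sparse binary input vectors (random stand-ins here).
X = (np.random.default_rng(1).random((256, 400)) < 0.05).astype(float)
ae = TiedAutoencoder(n_visible=400, n_hidden=64)
for epoch in range(20):
    err = ae.sgd_step(X)
print("final reconstruction error:", round(err, 3))
```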
The data generated by the DAE is subsequently processed
by the hierarchical-clustering module, which extracts
meaningful information about the data structure of the feature
space. The dendrogram created by the hierarchical-clustering
process is cut at a certain height to partition the feature space
into multiple regions. For each region, a centroid is assigned
and saved to a dictionary. This dictionary is used to map
signals for each time frame f_i into a state s_i, where s_i
belongs to the set of states extracted by the clustering.
The output of this operation creates a list of temporal
transitions between the different states. The list can therefore
be considered to describe state representations of an
underlying Markov process. The transition probabilities
between the different states s_i are discovered and used to
populate the transition matrix of an n-order Markov model.
To reduce the memory requirements necessary to store the
transition matrix it is implemented as a dictionary. The
entries of the dictionary are saved in the database and used by
the real-time module to predict future states of the monitored
system. This operation concludes the first iteration of the
algorithm. The entire process is repeated and reinitialised
with an acquisition of a new set of data samples from the SPC
database. This process is presented in Fig. 2.
Fig. 2 Learning module execution diagram.
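
A condensed sketch of these two learning steps is given below, using SciPy's agglomerative clustering as a stand-in for the clustering implementation and a first-order transition dictionary for brevity (the paper uses an n-order model); the feature matrix is a random placeholder for the DAE outputs.

```python
import numpy as np
from collections import defaultdict
from scipy.cluster.hierarchy import linkage, fcluster

def extract_states(features, n_states):
    """Hierarchical clustering of DAE feature vectors into n_states clusters."""
    labels = fcluster(linkage(features, method="ward"), n_states, criterion="maxclust")
    centroids = {s: features[labels == s].mean(axis=0) for s in np.unique(labels)}
    return labels, centroids

def transition_dictionary(state_sequence):
    """Count observed state-to-state transitions (first-order for brevity)."""
    counts = defaultdict(lambda: defaultdict(int))
    for prev, nxt in zip(state_sequence[:-1], state_sequence[1:]):
        counts[prev][nxt] += 1
    # normalise the counts into transition probabilities
    return {s: {t: c / sum(nxts.values()) for t, c in nxts.items()}
            for s, nxts in counts.items()}

features = np.random.default_rng(3).random((300, 64))   # stand-in DAE outputs
states, centroids = extract_states(features, n_states=10)
transitions = transition_dictionary(states.tolist())
```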
The real-time module is integrated with custom-built
Industrial Test, Control and Calibration (ITCC) software. It
starts its execution by downloading the DAE, centroid
dictionary and transition dictionary parameters from the
shared database. During real-time operation of the
manufacturing process, the generated spatial-temporal signals
are logged and, based on the data type of each signal,
transformed into the corresponding SDR representation. The encoded data is
subsequently forward-propagated through the DAE structure
(initialised with the parameters acquired from the shared
database). There is no learning performed in the real-time
module. The signals are processed by the DAE and, as a
consequence, transformed into a feature space used as input to
the centroid dictionary from where state information is
acquired. The inference of the state value is based on the
shortest distance between the feature vector and a given
centroid. The last n states are saved at any given time and
used with the transition dictionary of the n-order Markov
chain to infer the future state of the system. The predicted
state is transformed back to a feature space and saved to the
computer memory. During the next iteration, the predicted
feature vector is compared with an actual feature vector
generated by the manufacturing process. The residual vector
generated by this process is used as an input to a previously
trained MLP classifier, which indicates a fault occurrence in
the system. This process is described in Algorithm 1.
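
The real-time path can be summarised by the sketch below; the encoder, auto-encoder, centroid dictionary, transition dictionary and MLP classifier are assumed to be the previously learned objects (represented here by hypothetical helpers), and no parameter updates occur in this loop.

```python
import numpy as np

def nearest_state(feature, centroids):
    """Map a DAE feature vector to the closest stored centroid (state id)."""
    return min(centroids, key=lambda s: np.linalg.norm(feature - centroids[s]))

def predict_next_state(recent_states, transitions):
    """Most probable next state given the last observed state(s)."""
    candidates = transitions.get(recent_states[-1], {})
    return max(candidates, key=candidates.get) if candidates else recent_states[-1]

def realtime_step(raw_signals, encoder, autoencoder, centroids, transitions,
                  classifier, previous_prediction, recent_states):
    x_t = encoder(raw_signals)                       # SDR encoding of the current signals
    feature, _ = autoencoder.forward(x_t[None, :])   # forward pass only, no learning
    feature = feature[0]
    residual = feature - previous_prediction         # predicted vs. actual feature vector
    fault_class = classifier(residual)               # pre-trained MLP classifier
    state = nearest_state(feature, centroids)
    recent_states.append(state)
    next_prediction = centroids[predict_next_state(recent_states, transitions)]
    return fault_class, next_prediction
```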