Railway Track Circuit Fault Diagnosis Using Recurrent Neural Networks

Delft University of Technology

Railway track circuit fault diagnosis using recurrent neural networks
de Bruin, Tim; Verbert, Kim; Babuska, Robert

DOI: 10.1109/TNNLS.2016.2551940
Publication date: 2017
Document version: Accepted author manuscript
Published in: IEEE Transactions on Neural Networks and Learning Systems

Citation (APA): de Bruin, T., Verbert, K., & Babuska, R. (2017). Railway track circuit fault diagnosis using recurrent neural networks. IEEE Transactions on Neural Networks and Learning Systems, 28(3), 523-533. https://doi.org/10.1109/TNNLS.2016.2551940

Railway Track Circuit Fault Diagnosis using Recurrent Neural Networks
Tim de Bruin, Kim Verbert and Robert Babuška
Abstract—Timely detection and identification of faults in railway track circuits are crucial for the safety and availability of railway networks. In this paper, the use of the Long Short-Term Memory (LSTM) recurrent neural network is proposed to accomplish these tasks based on the commonly available measurement signals. By considering the signals from multiple track circuits in a geographic area, faults are diagnosed from their spatial and temporal dependencies. A generative model is used to show that the LSTM network can learn these dependencies directly from the data. The network correctly classifies 99.7% of the test input sequences, with no false positive fault detections. Additionally, the t-SNE method is used to examine the resulting network, further showing that it has learned the relevant dependencies in the data. Finally, we compare our LSTM network to a convolutional network trained on the same task. From this comparison we conclude that the LSTM network architecture is better suited for the railway track circuit fault detection and identification tasks than the convolutional network.

Index Terms—Fault Diagnosis, Track Circuit, LSTM, Recurrent Neural Network.
I. INTRODUCTION

As railway networks are becoming busier, they are required to operate with increasing levels of availability and reliability [1]. To enable the safe operation of a railway network, it is crucial to detect the presence of trains in the sections of a railway track. The railway track circuit is worldwide the most commonly used component for train detection. To prevent accidents, the detection system is designed to be fail-safe, meaning that in the case of a fault the railway section is reported as occupied.

When this happens, trains are no longer allowed to enter the particular section. This avoids collisions, but leads to train delays. Moreover, in spite of the fail-safe design of the track circuit, there are situations in which the railway section can be incorrectly reported as free, which can potentially lead to dangerous situations. Therefore, to guarantee both safety and a high availability of the railway network, it is very important to prevent track circuit failures. This requires a preventive maintenance strategy to ensure that components are repaired or replaced before a fault develops into a failure. To schedule the maintenance of the track circuits in the most efficient and effective manner, it is necessary to detect and identify faults as soon as possible.
This research is part of the STW/ProRail project "Advanced monitoring of intelligent rail infrastructure (ADMIRE)", project 12235, supported by the Dutch Technology Foundation STW. It is also part of the research programme Deep Learning for Robust Robot Control (DL-Force) with project number 656.000.003. Both projects are partly financed by the Netherlands Organisation for Scientific Research (NWO).
All authors are with the Delft Center for Systems and Control, Delft University of Technology, Mekelweg 2, 2628 CD Delft, The Netherlands. E-mail: {t.d.debruin, k.a.j.verbert, r.babuska}@tudelft.nl
In this work, we propose a neural network approach to fault diagnosis in railway track circuits. The fault diagnosis task comprises the detection of faulty behavior and the determination of the cause(s) of that behavior.

Since the railway track circuit network is a large network, it is not realistic to assume that additional monitoring devices will be installed on each track circuit. Therefore, this paper assumes only the availability of data that are currently measured in track circuits. By analyzing the measurement signals from several track circuits in a small area over time, the fault cause can be inferred from the spatial and temporal dependencies [2]. In contrast to [2], in this work, a data-based approach to fault diagnosis is considered, namely an artificial recurrent neural network called the Long Short-Term Memory (LSTM) network [3].

Artificial neural networks have recently achieved state-of-the-art performance on a range of challenging pattern recognition tasks, such as image classification [4] and speech recognition [5]. Some of the advances made in these domains can be applied to fault diagnosis problems as well, which makes the use of neural networks an interesting option in this domain.

Learning the long-term temporal dependencies that are characteristic of the faults in the track circuit case presents a challenge to standard neural networks. The LSTM network deals with this problem by introducing memory cells into the network architecture.

Currently, not enough measurement data are available to train the network and to verify its performance. Therefore, we have combined the available data with qualitative knowledge of the fault behaviors [2] and we have constructed a generative model. The performance of the proposed approach is demonstrated using synthetic data produced by this model. However, as the amount of available track circuit data is expected to increase rapidly over time, we expect that the method will be relevant in practice.
Related work

Several methods for fault diagnosis in railway track circuits have been proposed in the literature [1], [2], [6]–[10]. A distinction can be made between methods that use data collected by a measurement train [6], [7], [9], [10] and methods that use data collected via track-side monitoring devices [1], [2], [8]. In this work, track-side monitoring devices are considered because they continuously monitor the system health and are therefore suitable for the early diagnosis of faults. The main difference compared to the approaches in [1], [8] is that in those works multiple monitoring signals are used, while in this
paper, for each track circuit, only one measurement signal is available. The main difference compared to the approach in [2] is that in [2] a knowledge-based approach is proposed, while we consider a data-based approach, namely a Long Short-Term Memory (LSTM) network.

The use of spatial fault dependencies for the diagnosis of faults is relatively new to the railway track circuit setting [2], although it is more commonly used in other domains (e.g., [11]–[13]).

To the authors' best knowledge, LSTM networks have not been previously proposed for fault diagnosis in railway track circuits. However, many applications of neural networks to fault diagnosis and condition monitoring problems can be found in the literature. One recently popular approach is to use a Deep Belief Network [14]. The stochastic nature of these networks makes them a natural fit for fault detection. By training exclusively on examples of healthy behavior, the network can determine the probability that a new input vector does not come from the class of healthy states.

One example of this principle is given in [15], where a deep belief network is trained to detect faults in electric motors. In [16], a deep belief network is used to create an industrial soft sensor. The network predicts the value of a process variable based on the values of many other variables. However, it does not take the temporal developments of these variables into account. When these methods do take a time sequence as an input, they often consider a sequence of fixed length. In contrast, we use a recurrent network, which allows the predictions of the network to be updated at every input time-step while keeping a memory of the past inputs.

Methods using recurrent neural networks have also been discussed in the literature. An example closely related to this work is given in [17], where Echo State Networks are trained to learn the spatial and temporal dependencies in a distributed sensor network. Faults are detected by predicting the values that the sensors will measure and comparing these to the true values. Methods for fault classification based on predicting the output of a system are common as well. One example is [18], in which for each fault category a separate recurrent neural network model predicts the output of the system given the inputs. The fault is then identified by determining which model best explains the measured outputs. In contrast to these methods, our method learns to detect and classify faults directly from the measurements. Additionally, using the LSTM network architecture allows us to learn longer-term temporal dependencies.

The rest of this paper is organized as follows. In Section II, the working of a track circuit is discussed. In Section III, the structure and working of the LSTM network that is used to identify the faults is discussed. The results of using the proposed neural network with the synthetic data are given in Section IV, together with an analysis of the trained network using the visualization method t-SNE [19]. In Section V, a comparison is made between the proposed LSTM network and a convolutional network. The conclusions of this work are given in Section VI. In Appendix A, a number of faults that can cause a track circuit to fail are presented, with special attention given to the spatio-temporal dependencies that make it possible to identify these faults from the measured or generated data. Appendix B describes the generative model that is used to produce the training and test data.
II. TRACK CIRCUITS

To enable the safe operation of a railway network, track circuits are used to detect the absence of a train in a section of railway track. Trains are only allowed to enter track sections which the corresponding track circuit has reported to be free. A track circuit works by using the rails in a track section as conductors that connect a transmitter at one end of the section to a receiver at the other end, as shown in Figure 1. When no train is present in the section, the transmitter will energize a relay in the receiver, which indicates that the section is free. When a train enters the section, the wheel-sets of the train form a short circuit, as shown in Figure 1. This causes the current flow through the receiver to decrease to a level where the relay is no longer energized and the section is reported as occupied.

The correct operation of a track circuit depends on the electrical current through the receiver. In the absence of a train in the section, the current must be high enough to energize the relay. Conversely, in the presence of a train, the current must be low enough so that the relay is de-energized. To maintain the safety and availability of the railway network, it is important to detect all possible faults in the system. Moreover, to schedule preventive maintenance on the track circuits, it is important to identify the fault type and to determine the development of the fault severity over time.
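To make the fail-safe relay behavior described above concrete, the following minimal Python sketch expresses the thresholding of the receiver current; the pick-up current value is a hypothetical placeholder, not a figure from the paper.

```python
# Minimal sketch of the relay logic described above. The pick-up current is a
# hypothetical placeholder; real installations use circuit-specific values.
RELAY_PICKUP_CURRENT = 0.25  # amperes required to keep the receiver relay energized

def section_reported_free(receiver_current: float) -> bool:
    """Fail-safe interpretation of the receiver current: the section is
    reported free only when the current is high enough to energize the relay.
    Any loss of current (train present, but also a broken conductor or a
    power failure) therefore results in the section being reported occupied."""
    return receiver_current >= RELAY_PICKUP_CURRENT

# Example: a passing train short-circuits the rails, so the receiver current
# drops and the section is reported as occupied.
print(section_reported_free(0.35))  # True  -> section free
print(section_reported_free(0.05))  # False -> section occupied
```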
A. Fault diagnosis

Every track circuit has different electrical properties, which results in different values of the 'high' current I_h(t) when no train is present, and of the 'low' current I_l(t) when a train is present. Additionally, the transients between these values may be different. The current levels also depend on environmental influences and on the properties of the train passing through the section. For these reasons, it is not possible to adequately detect the presence of a fault by only considering the electrical current I(t) during the passing of a single train. In this work, we consider the current signals from several track circuits in the same geographic area, measured over a longer period of time. This makes it possible to not only detect the presence of a fault, but to also distinguish between different fault types. The reasoning behind this approach is that different faults have different spatial and temporal footprints [2]. The faults that are considered in this paper are:
considered in this paper are:
Insulated joint defect
Conductive object (across the insula te d joints)
Mechanical rail defect
Electrical disturbance
Ballast degradation
A description of these fault types, together with their spatial
and temporal footprints, is given in Appendix A.

Fig. 1. Current flow in a track circuit. Each track circuit detects the absence of trains in a section of a railway track. Subsequent sections are separated from each other by insulated joints.
B. Generative Model

To enable the development, testing, and comparison of condition monitoring methods, we have developed a generative model. This model is based on a qualitative understanding of the system and of the effect of the faults considered, as well as on the limited set of measurement data available from real-world track circuits. This model, together with a strategy for sampling the electrical current, is described in Appendix B.
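Appendix B is not reproduced in this excerpt, so the sketch below only illustrates the kind of synthetic current sequences such a generative model could produce; all parameter values and the linear fault trend are invented for illustration and are not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def sample_passing_event(i_high, i_low, fault_severity=0.0):
    """Toy sampling of four current values around one train passing event:
    two 'no train' samples and two 'train present' samples, with a fault
    lowering the no-train current. Values are illustrative only."""
    i_free = i_high * (1.0 - 0.5 * fault_severity)
    noise = rng.normal(0.0, 0.005, size=4)
    return np.array([i_free, i_free, i_low, i_low]) + noise

# One synthetic sequence of 2000 train passing events for a single track
# circuit, with a fault whose severity grows linearly over the sequence.
severity = np.linspace(0.0, 0.3, 2000)
sequence = np.stack([sample_passing_event(0.35, 0.05, s) for s in severity])
print(sequence.shape)  # (2000, 4)
```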
III. NEURAL NETWORK

Artificial neural networks have achieved state-of-the-art performance on several pattern recognition tasks. One reason for these successes is the use of a strategy called 'end-to-end learning'. This strategy is based on moving away from hand-crafted feature detectors and from manually integrating prior knowledge into the network. Instead, networks are trained to produce their end results directly from the raw input data. To use end-to-end learning, a large labeled data set is required. When this requirement is met, the benefits of a holistic learning approach tend to be larger than the benefits of explicitly using prior knowledge [20].

One example of a field in which this strategy has been successfully applied is image recognition. On this problem, convolutional networks achieve state-of-the-art performance by using raw pixel values as inputs, instead of hand-crafted feature detectors [4]. Another example is speech recognition, in which methods using phonemes as an intermediate representation are being replaced by methods transcribing sound data directly into letters [5].

For the track circuit fault diagnosis case, there are currently not enough labeled data available. However, the measuring equipment that records these data has been installed. Therefore, it is reasonable to assume that at some future time the data requirement will be met. The neural network proposed in this paper is trained and tested with synthetic data from our generative model. This enables us to analyze the opportunities of applying end-to-end learning to the track circuit fault diagnosis problem.
A. Network Architecture

The prior knowledge of the spatial and temporal fault dependencies will not be explicitly integrated into the neural network. It is, however, important to give the network a structure that enables it to learn these dependencies from the data.

In order to take the spatial dependencies into account, the network input consists of the electrical current signals from five separate track circuits. The signals come from the track circuit that is being diagnosed, I_B(t), as well as from two other track circuits on the same track, {I_A(t), I_C(t)}, and two track circuits on an adjacent track, {I_D(t), I_E(t)}.
For detecting temporal dependencies, a Recurrent Neural Network (RNN) is a natural choice, since the recurrent connections in the network allow it to store memories of past events. However, standard RNNs struggle to learn long-term time dependencies. This is due to the vanishing gradient problem [3]. A popular solution to this problem is the use of the Long Short-Term Memory network architecture.
1) LSTM cell: LSTM networks are able to learn long-term time dependencies by introducing specialized memory cells into the network architecture. The structure of the memory cell is shown in Figure 2. The units a and b are the input and output units, respectively. The unit M is the memory unit. It can remember a value through a recurrent connection with itself. The neurons denoted by g are gate units. The input gate i determines when a new input is added to the value of the memory unit, by multiplying the output of the input unit a by the output of the gate unit. In a similar way, the forget gate f determines when the value in the memory unit is kept constant and when it is reduced or reset. The output gate o determines when the cell outputs its value.
Our network has two hidden layers containing 250 LSTM cells each. This configuration was empirically found to reliably yield good results for this problem. Smaller networks resulted in worse performance, and larger networks did not improve the performance further while requiring significantly increased training times. In general, the ideal size of the network is based on the complexity of the problem, the amount of available training data, and the available computational resources.

Fig. 2. Architecture of the LSTM memory cell. The black dots indicate a multiplication of the outputs of the gate units g by the outputs of the regular units.
The inputs to each LSTM cell j in layer l consist of the inputs to the layer at that time-step, x^l(T), as well as the outputs of all LSTM cells in layer l at the previous time-step, h^l(T-1). The equations that describe LSTM cell j in layer l are:

i^l_j(T) = \mathrm{sigm}\left( W^l_{xi,j} x^l(T) + W^l_{hi,j} h^l(T-1) + b^l_{i,j} \right)   (1)

f^l_j(T) = \mathrm{sigm}\left( W^l_{xf,j} x^l(T) + W^l_{hf,j} h^l(T-1) + b^l_{f,j} \right)   (2)

a^l_j(T) = \tanh\left( W^l_{xa,j} x^l(T) + W^l_{ha,j} h^l(T-1) + b^l_{a,j} \right)   (3)

o^l_j(T) = \mathrm{sigm}\left( W^l_{xo,j} x^l(T) + W^l_{ho,j} h^l(T-1) + b^l_{o,j} \right)   (4)

M^l_j(T) = f^l_j(T) \, M^l_j(T-1) + i^l_j(T) \, a^l_j(T)   (5)

h^l_j(T) = o^l_j(T) \tanh\left( M^l_j(T) \right)   (6)
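As an illustration of Eqs. (1)-(6), the following NumPy sketch computes one time-step of an LSTM layer; the dictionary keys used for the weights and biases are our own naming convention, not notation from the paper.

```python
import numpy as np

def sigm(z):
    """Logistic sigmoid, the 'sigm' nonlinearity in Eqs. (1), (2) and (4)."""
    return 1.0 / (1.0 + np.exp(-z))

def lstm_layer_step(x, h_prev, M_prev, W, b):
    """One train-passing-event time-step of an LSTM layer, following Eqs. (1)-(6).
    x:      layer input at event T,          shape (n_in,)
    h_prev: layer outputs at event T-1,      shape (n_cells,)
    M_prev: memory cell values at event T-1, shape (n_cells,)
    W, b:   dictionaries of weight matrices and bias vectors (key names are ours)."""
    i = sigm(W["xi"] @ x + W["hi"] @ h_prev + b["i"])     # input gate,  Eq. (1)
    f = sigm(W["xf"] @ x + W["hf"] @ h_prev + b["f"])     # forget gate, Eq. (2)
    a = np.tanh(W["xa"] @ x + W["ha"] @ h_prev + b["a"])  # input unit,  Eq. (3)
    o = sigm(W["xo"] @ x + W["ho"] @ h_prev + b["o"])     # output gate, Eq. (4)
    M = f * M_prev + i * a                                # memory cell, Eq. (5)
    h = o * np.tanh(M)                                    # cell output, Eq. (6)
    return h, M
```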
2) Inputs and outputs: For each of the five track circuits in Figure 3, the current magnitude is sampled four times during a train passing event. The details of this sampling procedure are described in Appendix B. The resulting 20 current values for each train passing event T are the inputs to the first hidden layer for that train passing event time-step: x^1(T) = [I^1_A(T) ... I^4_E(T)].

The outputs of the first hidden layer are the inputs of the second hidden layer: x^2(T) = h^1(T). The outputs of the second hidden layer are the inputs to the output layer of the network. This layer consists of six softmax classification units: one for the healthy state and one for each of the five fault categories. They give the likelihood that the network assigns to each category c at time-step T as:

P(Y = c)(T) = \frac{ e^{W_c h^2(T) + b_c} }{ \sum_{d=1}^{6} e^{W_d h^2(T) + b_d} }   (7)
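Putting the pieces together, a possible reconstruction of the overall architecture (20 inputs per train passing event, two hidden layers of 250 LSTM cells, and a six-way softmax output applied at every time-step) is sketched below in Keras; the paper does not state which software framework or training settings were used, so those choices here are assumptions.

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Input(shape=(None, 20)),            # 20 sampled current values per train passing event
    layers.LSTM(250, return_sequences=True),   # first hidden layer: 250 LSTM cells
    layers.LSTM(250, return_sequences=True),   # second hidden layer: 250 LSTM cells
    layers.Dense(6, activation="softmax"),     # Eq. (7): healthy state + five fault categories
])

# Optimizer and loss are placeholders; per-time-step targets have shape
# (batch, sequence_length, 6).
model.compile(optimizer="adam", loss="categorical_crossentropy")
model.summary()
```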
A complete overview of the network is given in Figure 3.
Fig. 3. Fault diagnosis process overview. For each train passing event T, the current time sequence of the five track circuits (I) is sampled (II). These samples are the input to the neural network (III), which uses them to update the likelihood of the six different fault classes.

B. Network training

To train the neural network, two data sets are generated. The first one is a training data set with 21600 sequences.
The second is a validation data set containing 600 sequences. For each sequence, the properties of the track circuits and the properties of the fault are stochastically determined. Each sequence has a length of 2000 train passing events. This relates to a time period of 100 days. Note that although more trains are likely to pass through the considered sections, it is important to keep the temporal dependencies from becoming too long-term. Therefore, it might be necessary to limit the number of train passing events per day that are used as network inputs.

The network is trained to give a classification of the sequence at every time-step T. The target for this classification, t(T), is the healthy state, unless the sequence contains a fault for which the severity at that time-step T is above 0.15.
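A minimal sketch of how the per-event classification targets described above could be constructed is given below; the class index mapping (0 for the healthy state, 1-5 for the fault categories) is our own convention, not one stated in the paper.

```python
import numpy as np

NUM_CLASSES = 6          # healthy state + five fault categories
SEVERITY_THRESHOLD = 0.15

def make_targets(fault_class, severity):
    """One-hot targets t(T) for every train passing event in one sequence.
    fault_class: integer in 1..5, or None for a healthy sequence (our indexing).
    severity:    array with the fault severity at each train passing event."""
    n_events = len(severity)
    targets = np.zeros((n_events, NUM_CLASSES))
    if fault_class is None:
        faulty = np.zeros(n_events, dtype=bool)
    else:
        faulty = np.asarray(severity) > SEVERITY_THRESHOLD
    targets[~faulty, 0] = 1.0               # healthy state
    if fault_class is not None:
        targets[faulty, fault_class] = 1.0  # one of the five fault categories
    return targets

# Example: a sequence of 2000 events in which fault category 3 slowly develops.
targets = make_targets(3, np.linspace(0.0, 0.3, 2000))
print(targets.shape)  # (2000, 6)
```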

Citations

- Multiscale Convolutional Neural Networks for Fault Diagnosis of Wind Turbine Gearbox. TL;DR: Experimental results and a comprehensive comparison analysis demonstrate the superiority of the proposed MSCNN approach, providing an end-to-end learning-based fault diagnosis system for WT gearboxes without additional signal processing or diagnostic expertise.
- Understanding and Learning Discriminant Features Based on Multiattention 1DCNN for Wheelset Bearing Fault Diagnosis. TL;DR: Experimental results on the wheelset bearing dataset show that the proposed multiattention mechanism can significantly improve the discriminant feature representation, so that the MA1DCNN outperforms eight state-of-the-art networks.
- Data-Based Line Trip Fault Prediction in Power Systems Using LSTM Networks and SVM. TL;DR: A method for data-based line trip fault prediction in power systems using long short-term memory (LSTM) networks and a support vector machine (SVM) is proposed; experiments demonstrate its improvements over current data mining methods.
- Distributed Soft Fault Detection for Interval Type-2 Fuzzy-Model-Based Stochastic Systems With Wireless Sensor Networks. TL;DR: Simulation results validate the effectiveness and applicability of the presented distributed fault detection scheme.
- A Review on Deep Learning Applications in Prognostics and Health Management. TL;DR: The survey validates the universal applicability of deep learning to various types of input in PHM, including vibration, imagery, time series, and structured data, and suggests the possibility of transfer learning across PHM applications.
References

- Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735-1780. TL;DR: A novel, efficient, gradient-based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete time steps by enforcing constant error flow through constant error carousels within special units.
- van der Maaten, L., & Hinton, G. (2008). Visualizing data using t-SNE. Journal of Machine Learning Research, 9, 2579-2605. TL;DR: A new technique called t-SNE visualizes high-dimensional data by giving each datapoint a location in a two- or three-dimensional map; it is a variation of Stochastic Neighbor Embedding that is much easier to optimize and produces significantly better visualizations by reducing the tendency to crowd points together in the center of the map.
- Hinton, G. E., Osindero, S., & Teh, Y.-W. (2006). A fast learning algorithm for deep belief nets. Neural Computation, 18(7), 1527-1554. TL;DR: A fast, greedy algorithm is derived that can learn deep, directed belief networks one layer at a time, provided the top two layers form an undirected associative memory.
- Werbos, P. J. (1990). Backpropagation through time: what it does and how to do it. Proceedings of the IEEE, 78(10), 1550-1560. TL;DR: This paper first reviews basic backpropagation, a simple method now widely used in areas such as pattern recognition and fault diagnosis, and then describes extensions of the method to systems other than neural networks, systems involving simultaneous equations or true recurrent networks, and other practical issues that arise with the method.