Railway Track Circuit Fault Diagnosis Using Recurrent Neural Networks

Delft University of Technology

Railway track circuit fault diagnosis using recurrent neural networks
de Bruin, Tim; Verbert, Kim; Babuska, Robert

DOI: 10.1109/TNNLS.2016.2551940
Publication date: 2017
Document version: Accepted author manuscript
Published in: IEEE Transactions on Neural Networks and Learning Systems

Citation (APA): de Bruin, T., Verbert, K., & Babuska, R. (2017). Railway track circuit fault diagnosis using recurrent neural networks. IEEE Transactions on Neural Networks and Learning Systems, 28(3), 523-533. https://doi.org/10.1109/TNNLS.2016.2551940

Railway Track Circuit Fault Diagnosis using Recurrent Neural Networks
Tim de Bruin, Kim Verbert and Robert Babuška
Abstract—Timely detection and identification of faults in railway track circuits are crucial for the safety and availability of railway networks. In this paper, the use of the Long Short-Term Memory (LSTM) recurrent neural network is proposed to accomplish these tasks based on the commonly available measurement signals. By considering the signals from multiple track circuits in a geographic area, faults are diagnosed from their spatial and temporal dependencies. A generative model is used to show that the LSTM network can learn these dependencies directly from the data. The network correctly classifies 99.7% of the test input sequences, with no false positive fault detections. Additionally, the t-SNE method is used to examine the resulting network, further showing that it has learned the relevant dependencies in the data. Finally, we compare our LSTM network to a convolutional network trained on the same task. From this comparison we conclude that the LSTM network architecture is better suited for the railway track circuit fault detection and identification tasks than the convolutional network.

Index Terms—Fault Diagnosis, Track Circuit, LSTM, Recurrent Neural Network.
I. INTRODUCTION

As railway networks are becoming busier, they are required to operate with increasing levels of availability and reliability [1]. To enable the safe operation of a railway network, it is crucial to detect the presence of trains in the sections of a railway track. The railway track circuit is worldwide the most commonly used component for train detection. To prevent accidents, the detection system is designed to be fail-safe, meaning that in the case of a fault the railway section is reported as occupied.

When this happens, trains are no longer allowed to enter the particular section. This avoids collisions, but leads to train delays. Moreover, in spite of the fail-safe design of the track circuit, there are situations in which the railway section can be incorrectly reported as free, which can potentially lead to dangerous situations. Therefore, to guarantee both safety and a high availability of the railway network, it is very important to prevent track circuit failures. This requires a preventive maintenance strategy to ensure that components are repaired or replaced before a fault develops into a failure. To schedule the maintenance of the track circuits in the most efficient and effective manner, it is necessary to detect and identify faults as soon as possible.
This research is part of the STW/ProRail project "Advanced monitoring of intelligent rail infrastructure (ADMIRE)", project 12235, supported by the Dutch Technology Foundation STW. It is also part of the research programme Deep Learning for Robust Robot Control (DL-Force) with project number 656.000.003. Both projects are partly financed by the Netherlands Organisation for Scientific Research (NWO).
All authors are with the Delft Center for Systems and Control, Delft University of Technology, Mekelweg 2, 2628 CD Delft, The Netherlands. E-mail: {t.d.debruin, k.a.j.verbert, r.babuska}@tudelft.nl
In this work, we propose a neural network approach to fault diagnosis in railway track circuits. The fault diagnosis task comprises the detection of faulty behavior and the determination of the cause(s) of that behavior.

Since the railway track circuit network is a large network, it is not realistic to assume that additional monitoring devices will be installed on each track circuit. Therefore, this paper assumes only the availability of data that are currently measured in track circuits. By analyzing the measurement signals from several track circuits in a small area over time, the fault cause can be inferred from the spatial and temporal dependencies [2]. In contrast to [2], in this work, a data-based approach to fault diagnosis is considered, namely an artificial recurrent neural network called the Long Short-Term Memory (LSTM) network [3].

Artificial neural networks have recently achieved state-of-the-art performance on a range of challenging pattern recognition tasks, such as image classification [4] and speech recognition [5]. Some of the advances made in these domains can be applied to fault diagnosis problems as well, which makes the use of neural networks an interesting option in this domain.

Learning the long-term temporal dependencies that are characteristic of the faults in the track circuit case presents a challenge to standard neural networks. The LSTM network deals with this problem by introducing memory cells into the network architecture.

Currently, not enough measurement data are available to train the network and to verify its performance. Therefore, we have combined the available data with qualitative knowledge of the fault behaviors [2] and we have constructed a generative model. The performance of the proposed approach is demonstrated using synthetic data produced by this model. However, as the amount of available track circuit data is expected to increase rapidly over time, we expect that the method will be relevant in practice.
Related work

Several methods for fault diagnosis in railway track circuits have been proposed in the literature [1], [2], [6]–[10]. A distinction can be made between methods that use data collected by a measurement train [6], [7], [9], [10] and methods that use data collected via track-side monitoring devices [1], [2], [8]. In this work, track-side monitoring devices are considered because they continuously monitor the system health and are therefore suitable for the early diagnosis of faults. The main difference compared to the approaches in [1], [8] is that in those works multiple monitoring signals are used, while in this
paper, for each track circuit, only one measurement signal is available. The main difference compared to the approach in [2] is that in [2] a knowledge-based approach is proposed, while we consider a data-based approach, namely a Long Short-Term Memory (LSTM) network.

The use of spatial fault dependencies for the diagnosis of faults is relatively new to the railway track circuit setting [2], although it is more commonly used in other domains (e.g., [11]–[13]).

To the authors' best knowledge, LSTM networks have not been previously proposed for fault diagnosis in railway track circuits. However, many applications of neural networks to fault diagnosis and condition monitoring problems can be found in the literature. One recently popular approach is to use a Deep Belief Network [14]. The stochastic nature of these networks makes them a natural fit for fault detection. By training exclusively on examples of healthy behavior, the network can determine the probability that a new input vector does not come from the class of healthy states.

One example of this principle is given in [15], where a deep belief network is trained to detect faults in electric motors. In [16], a deep belief network is used to create an industrial soft sensor. The network predicts the value of a process variable based on the values of many other variables. However, it does not take the temporal developments of these variables into account. When these methods do take a time sequence as an input, they often consider a sequence of fixed length. In contrast, we use a recurrent network, which allows the predictions of the network to be updated at every input time-step while keeping a memory of the past inputs.

Methods using recurrent neural networks have also been discussed in the literature. An example closely related to this work is given in [17], where Echo State Networks are trained to learn the spatial and temporal dependencies in a distributed sensor network. Faults are detected by predicting the values that the sensors will measure and comparing these to the true values. Methods for fault classification based on predicting the output of a system are common as well. One example is [18], in which for each fault category a separate recurrent neural network model predicts the output of the system given the inputs. The fault is then identified by determining which model best explains the measured outputs. In contrast to these methods, our method learns to detect and classify faults directly from the measurements. Additionally, using the LSTM network architecture allows us to learn longer-term temporal dependencies.

The rest of this paper is organized as follows. In Section II, the working of a track circuit is discussed. In Section III, the structure and working of the LSTM network that is used to identify the faults is discussed. The results of using the proposed neural network with the synthetic data are given in Section IV, together with an analysis of the trained network using the visualization method t-SNE [19]. In Section V, a comparison is made between the proposed LSTM network and a convolutional network. The conclusions of this work are given in Section VI. In Appendix A, a number of faults that can cause a track circuit to fail are presented, with special attention given to the spatio-temporal dependencies that make it possible to identify these faults from the measured or generated data. Appendix B describes the generative model that is used to produce the training and test data.
II. TRACK CIRCUITS

To enable the safe operation of a railway network, track circuits are used to detect the absence of a train in a section of railway track. Trains are only allowed to enter track sections which the corresponding track circuit has reported to be free. A track circuit works by using the rails in a track section as conductors that connect a transmitter at one end of the section to a receiver at the other end, as shown in Figure 1. When no train is present in the section, the transmitter will energize a relay in the receiver, which indicates that the section is free. When a train enters the section, the wheel-sets of the train form a short circuit, as shown in Figure 1. This causes the current flow through the receiver to decrease to a level where the relay is no longer energized and the section is reported as occupied.

The correct operation of a track circuit depends on the electrical current through the receiver. In the absence of a train in the section, the current must be high enough to energize the relay. Conversely, in the presence of a train, the current must be low enough so that the relay is de-energized. To maintain the safety and availability of the railway network, it is important to detect all possible faults in the system. Moreover, to schedule preventive maintenance on the track circuits, it is important to identify the fault type and to determine the development of the fault severity over time.
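To make the fail-safe relay behavior described above concrete, the following minimal Python sketch expresses the thresholding of the receiver current; the pick-up current value is a hypothetical placeholder, not a figure from the paper.

```python
# Minimal sketch of the relay logic described above. The pick-up current is a
# hypothetical placeholder; real installations use circuit-specific values.
RELAY_PICKUP_CURRENT = 0.25  # amperes required to keep the receiver relay energized

def section_reported_free(receiver_current: float) -> bool:
    """Fail-safe interpretation of the receiver current: the section is
    reported free only when the current is high enough to energize the relay.
    Any loss of current (train present, but also a broken conductor or a
    power failure) therefore results in the section being reported occupied."""
    return receiver_current >= RELAY_PICKUP_CURRENT

# Example: a passing train short-circuits the rails, so the receiver current
# drops and the section is reported as occupied.
print(section_reported_free(0.35))  # True  -> section free
print(section_reported_free(0.05))  # False -> section occupied
```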
A. Fault diagnosis

Every track circuit has different electrical properties, which results in different values of the 'high' current I_h(t) when no train is present, and of the 'low' current I_l(t) when a train is present. Additionally, the transients between these values may be different. The current levels also depend on environmental influences and on the properties of the train passing through the section. For these reasons, it is not possible to adequately detect the presence of a fault by only considering the electrical current I(t) during the passing of a single train. In this work, we consider the current signals from several track circuits in the same geographic area, measured over a longer period of time. This makes it possible to not only detect the presence of a fault, but to also distinguish between different fault types. The reasoning behind this approach is that different faults have different spatial and temporal footprints [2]. The faults that are considered in this paper are:
considered in this paper are:
Insulated joint defect
Conductive object (across the insula te d joints)
Mechanical rail defect
Electrical disturbance
Ballast degradation
A description of these fault types, together with their spatial
and temporal footprints, is given in Appendix A.

Fig. 1. Current flow in a track circuit. Each track circuit detects the absence of trains in a section of a railway track. Subsequent sections are separated from each other by insulated joints.
B. Generative Model

To enable the development, testing, and comparison of condition monitoring methods, we have developed a generative model. This model is based on a qualitative understanding of the system and of the effect of the faults considered, as well as on the limited set of measurement data available from real-world track circuits. This model, together with a strategy for sampling the electrical current, is described in Appendix B.
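Appendix B is not reproduced in this excerpt, so the sketch below only illustrates the kind of synthetic current sequences such a generative model could produce; all parameter values and the linear fault trend are invented for illustration and are not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def sample_passing_event(i_high, i_low, fault_severity=0.0):
    """Toy sampling of four current values around one train passing event:
    two 'no train' samples and two 'train present' samples, with a fault
    lowering the no-train current. Values are illustrative only."""
    i_free = i_high * (1.0 - 0.5 * fault_severity)
    noise = rng.normal(0.0, 0.005, size=4)
    return np.array([i_free, i_free, i_low, i_low]) + noise

# One synthetic sequence of 2000 train passing events for a single track
# circuit, with a fault whose severity grows linearly over the sequence.
severity = np.linspace(0.0, 0.3, 2000)
sequence = np.stack([sample_passing_event(0.35, 0.05, s) for s in severity])
print(sequence.shape)  # (2000, 4)
```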
III. NEURAL NETWORK

Artificial neural networks have achieved state-of-the-art performance on several pattern recognition tasks. One reason for these successes is the use of a strategy called 'end-to-end learning'. This strategy is based on moving away from hand-crafted feature detectors and from manually integrating prior knowledge into the network. Instead, networks are trained to produce their end results directly from the raw input data. To use end-to-end learning, a large labeled data set is required. When this requirement is met, the benefits of a holistic learning approach tend to be larger than the benefits of explicitly using prior knowledge [20].

One example of a field in which this strategy has been successfully applied is image recognition. On this problem, convolutional networks achieve state-of-the-art performance by using raw pixel values as inputs, instead of hand-crafted feature detectors [4]. Another example is speech recognition, in which methods using phonemes as an intermediate representation are being replaced by methods transcribing sound data directly into letters [5].

For the track circuit fault diagnosis case, there are currently not enough labeled data available. However, the measuring equipment that records these data has been installed. Therefore, it is reasonable to assume that at some future time the data requirement will be met. The neural network proposed in this paper is trained and tested with synthetic data from our generative model. This enables us to analyze the opportunities of applying end-to-end learning to the track circuit fault diagnosis problem.
A. Network Architecture

The prior knowledge of the spatial and temporal fault dependencies will not be explicitly integrated into the neural network. It is, however, important to give the network a structure that enables it to learn these dependencies from the data.

In order to take the spatial dependencies into account, the network input consists of the electrical current signals from five separate track circuits. The signals come from the track circuit that is being diagnosed, I_B(t), as well as from two other track circuits on the same track, {I_A(t), I_C(t)}, and two track circuits on an adjacent track, {I_D(t), I_E(t)}.
For detecting temporal dependencies, a Recurrent Neural Network (RNN) is a natural choice, since the recurrent connections in the network allow it to store memories of past events. However, standard RNNs struggle to learn long-term time dependencies. This is due to the vanishing gradient problem [3]. A popular solution to this problem is the use of the Long Short-Term Memory network architecture.
1) LSTM cell: LSTM networks are able to learn long-term time dependencies by introducing specialized memory cells into the network architecture. The structure of the memory cell is shown in Figure 2. The units a and b are the input and output units, respectively. The unit M is the memory unit. It can remember a value through a recurrent connection with itself. The neurons denoted by g are gate units. The input gate i determines when a new input is added to the value of the memory unit, by multiplying the output of the input unit a by the output of the gate unit. In a similar way, the forget gate f determines when the value in the memory unit is kept constant and when it is reduced or reset. The output gate o determines when the cell outputs its value.
Our network has two hidden layers containing 250 LSTM cells each. This configuration was empirically found to reliably yield good results for this problem. Smaller networks resulted in worse performance, and larger networks did not improve the performance further while requiring significantly increased training times. In general, the ideal size of the network is based on the complexity of the problem, the amount of available training data, and the available computational resources.

Fig. 2. Architecture of the LSTM memory cell. The black dots indicate a multiplication of the outputs of the gate units g by the outputs of the regular units.
The inputs to each LSTM cell j in layer l consist of the inputs to the layer at that time-step, x^l(T), as well as the outputs of all LSTM cells in layer l at the previous time-step, h^l(T-1). The equations that describe LSTM cell j in layer l are:

i^l_j(T) = \mathrm{sigm}\left( W^l_{xi,j} x^l(T) + W^l_{hi,j} h^l(T-1) + b^l_{i,j} \right)   (1)

f^l_j(T) = \mathrm{sigm}\left( W^l_{xf,j} x^l(T) + W^l_{hf,j} h^l(T-1) + b^l_{f,j} \right)   (2)

a^l_j(T) = \tanh\left( W^l_{xa,j} x^l(T) + W^l_{ha,j} h^l(T-1) + b^l_{a,j} \right)   (3)

o^l_j(T) = \mathrm{sigm}\left( W^l_{xo,j} x^l(T) + W^l_{ho,j} h^l(T-1) + b^l_{o,j} \right)   (4)

M^l_j(T) = f^l_j(T) \, M^l_j(T-1) + i^l_j(T) \, a^l_j(T)   (5)

h^l_j(T) = o^l_j(T) \tanh\left( M^l_j(T) \right)   (6)
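As an illustration of Eqs. (1)-(6), the following NumPy sketch computes one time-step of an LSTM layer; the dictionary keys used for the weights and biases are our own naming convention, not notation from the paper.

```python
import numpy as np

def sigm(z):
    """Logistic sigmoid, the 'sigm' nonlinearity in Eqs. (1), (2) and (4)."""
    return 1.0 / (1.0 + np.exp(-z))

def lstm_layer_step(x, h_prev, M_prev, W, b):
    """One train-passing-event time-step of an LSTM layer, following Eqs. (1)-(6).
    x:      layer input at event T,          shape (n_in,)
    h_prev: layer outputs at event T-1,      shape (n_cells,)
    M_prev: memory cell values at event T-1, shape (n_cells,)
    W, b:   dictionaries of weight matrices and bias vectors (key names are ours)."""
    i = sigm(W["xi"] @ x + W["hi"] @ h_prev + b["i"])     # input gate,  Eq. (1)
    f = sigm(W["xf"] @ x + W["hf"] @ h_prev + b["f"])     # forget gate, Eq. (2)
    a = np.tanh(W["xa"] @ x + W["ha"] @ h_prev + b["a"])  # input unit,  Eq. (3)
    o = sigm(W["xo"] @ x + W["ho"] @ h_prev + b["o"])     # output gate, Eq. (4)
    M = f * M_prev + i * a                                # memory cell, Eq. (5)
    h = o * np.tanh(M)                                    # cell output, Eq. (6)
    return h, M
```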
2) Inputs and outputs: For each of the five track circuits in Figure 3, the current magnitude is sampled four times during a train passing event. The details of this sampling procedure are described in Appendix B. The resulting 20 current values for each train passing event T are the inputs to the first hidden layer for that train passing event time-step: x^1(T) = [I^1_A(T) ... I^4_E(T)].

The outputs of the first hidden layer are the inputs of the second hidden layer: x^2(T) = h^1(T). The outputs of the second hidden layer are the inputs to the output layer of the network. This layer consists of six softmax classification units: one for the healthy state and one for each of the five fault categories. They give the likelihood that the network assigns to each category c at time-step T as:

P(Y = c)(T) = \frac{ e^{W_c h^2(T) + b_c} }{ \sum_{d=1}^{6} e^{W_d h^2(T) + b_d} }   (7)
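Putting the pieces together, a possible reconstruction of the overall architecture (20 inputs per train passing event, two hidden layers of 250 LSTM cells, and a six-way softmax output applied at every time-step) is sketched below in Keras; the paper does not state which software framework or training settings were used, so those choices here are assumptions.

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Input(shape=(None, 20)),            # 20 sampled current values per train passing event
    layers.LSTM(250, return_sequences=True),   # first hidden layer: 250 LSTM cells
    layers.LSTM(250, return_sequences=True),   # second hidden layer: 250 LSTM cells
    layers.Dense(6, activation="softmax"),     # Eq. (7): healthy state + five fault categories
])

# Optimizer and loss are placeholders; per-time-step targets have shape
# (batch, sequence_length, 6).
model.compile(optimizer="adam", loss="categorical_crossentropy")
model.summary()
```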
A complete overview of the network is given in Figure 3.
Fig. 3. Fault diagnosis process overview. For each train passing event T, the current time sequence of the five track circuits (I) is sampled (II). These samples are the input to the neural network (III), which uses them to update the likelihood of the six different fault classes.

B. Network training

To train the neural network, two data sets are generated. The first one is a training data set with 21600 sequences.
The second is a validation data set containing 600 sequences. For each sequence, the properties of the track circuits and the properties of the fault are stochastically determined. Each sequence has a length of 2000 train passing events. This relates to a time period of 100 days. Note that although more trains are likely to pass through the considered sections, it is important to keep the temporal dependencies from becoming too long-term. Therefore, it might be necessary to limit the number of train passing events per day that are used as network inputs.

The network is trained to give a classification of the sequence at every time-step T. The target for this classification, t(T), is the healthy state, unless the sequence contains a fault for which the severity at that time-step T is above 0.15.
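A minimal sketch of how the per-event classification targets described above could be constructed is given below; the class index mapping (0 for the healthy state, 1-5 for the fault categories) is our own convention, not one stated in the paper.

```python
import numpy as np

NUM_CLASSES = 6          # healthy state + five fault categories
SEVERITY_THRESHOLD = 0.15

def make_targets(fault_class, severity):
    """One-hot targets t(T) for every train passing event in one sequence.
    fault_class: integer in 1..5, or None for a healthy sequence (our indexing).
    severity:    array with the fault severity at each train passing event."""
    n_events = len(severity)
    targets = np.zeros((n_events, NUM_CLASSES))
    if fault_class is None:
        faulty = np.zeros(n_events, dtype=bool)
    else:
        faulty = np.asarray(severity) > SEVERITY_THRESHOLD
    targets[~faulty, 0] = 1.0               # healthy state
    if fault_class is not None:
        targets[faulty, fault_class] = 1.0  # one of the five fault categories
    return targets

# Example: a sequence of 2000 events in which fault category 3 slowly develops.
targets = make_targets(3, np.linspace(0.0, 0.3, 2000))
print(targets.shape)  # (2000, 6)
```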

Citations

- Multiscale Convolutional Neural Networks for Fault Diagnosis of Wind Turbine Gearbox. TL;DR: Experimental results and a comprehensive comparison analysis demonstrate the superiority of the proposed MSCNN approach, providing an end-to-end learning-based fault diagnosis system for WT gearboxes without additional signal processing or diagnostic expertise.
- Understanding and Learning Discriminant Features Based on Multiattention 1DCNN for Wheelset Bearing Fault Diagnosis. TL;DR: Experimental results on the wheelset bearing dataset show that the proposed multiattention mechanism can significantly improve the discriminant feature representation, so that the MA1DCNN outperforms eight state-of-the-art networks.
- Data-Based Line Trip Fault Prediction in Power Systems Using LSTM Networks and SVM. TL;DR: A method for data-based line trip fault prediction in power systems using long short-term memory (LSTM) networks and a support vector machine (SVM) is proposed; experiments demonstrate its improvements over current data mining methods.
- Distributed Soft Fault Detection for Interval Type-2 Fuzzy-Model-Based Stochastic Systems With Wireless Sensor Networks. TL;DR: Simulation results validate the effectiveness and applicability of the presented distributed fault detection scheme.
- A Review on Deep Learning Applications in Prognostics and Health Management. TL;DR: The survey validates the universal applicability of deep learning to various types of input in PHM, including vibration, imagery, time series, and structured data, and suggests the possibility of transfer learning across PHM applications.
References

- Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735-1780. TL;DR: A novel, efficient, gradient-based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete time steps by enforcing constant error flow through constant error carousels within special units.
- van der Maaten, L., & Hinton, G. (2008). Visualizing data using t-SNE. Journal of Machine Learning Research, 9, 2579-2605. TL;DR: A new technique called t-SNE visualizes high-dimensional data by giving each datapoint a location in a two- or three-dimensional map; it is a variation of Stochastic Neighbor Embedding that is much easier to optimize and produces significantly better visualizations by reducing the tendency to crowd points together in the center of the map.
- Hinton, G. E., Osindero, S., & Teh, Y.-W. (2006). A fast learning algorithm for deep belief nets. Neural Computation, 18(7), 1527-1554. TL;DR: A fast, greedy algorithm is derived that can learn deep, directed belief networks one layer at a time, provided the top two layers form an undirected associative memory.
- Werbos, P. J. (1990). Backpropagation through time: what it does and how to do it. Proceedings of the IEEE, 78(10), 1550-1560. TL;DR: This paper first reviews basic backpropagation, a simple method now widely used in areas such as pattern recognition and fault diagnosis, and then describes extensions of the method to systems other than neural networks, systems involving simultaneous equations or true recurrent networks, and other practical issues that arise with the method.