Delft University of Technology
Deep convolutional neural networks for detection of rail surface defects
Faghih Roohi, Shahrzad; Hajizadeh, Siamak; Nunez, Alfredo; Babuska, Robert; De Schutter, Bart
DOI
10.1109/IJCNN.2016.7727522
Publication date
2016
Document Version
Accepted author manuscript
Published in
Proceedings 2016 International Joint Conference on Neural Networks (IJCNN)
Citation (APA)
Faghih Roohi, S., Hajizadeh, S., Nunez, A., Babuska, R., & De Schutter, B. (2016). Deep convolutional neural networks for detection of rail surface defects. In P. A. Estevez, P. P. Angelov, & E. Del Moral Hernandez (Eds.), Proceedings 2016 International Joint Conference on Neural Networks (IJCNN) (pp. 2584-2589). IEEE. https://doi.org/10.1109/IJCNN.2016.7727522
Important note
To cite this publication, please use the final published version (if applicable).
Please check the document version above.
Copyright
Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent
of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.
Takedown policy
Please contact us and provide details if you believe this document breaches copyrights.
We will remove access to the work immediately and investigate your claim.
This work is downloaded from Delft University of Technology.
For technical reasons the number of authors shown on this cover page is limited to a maximum of 10.

Deep Convolutional Neural Networks for
Detection of Rail Surface Defects
Shahrzad Faghih-Roohi, Siamak Hajizadeh, Alfredo Núñez, Robert Babuska, and Bart De Schutter
Delft Center for Systems and Control, Delft University of Technology
Mekelweg 2, 2628 CD Delft, The Netherlands
Email: S.Faghihroohi@tudelft.nl; R.Babuska@tudelft.nl; B.DeSchutter@tudelft.nl
Section of Railway Engineering, Delft University of Technology
Stevinweg 1, 2628 CN Delft, The Netherlands
Email: S.Hajizadeh@tudelft.nl; A.A.NunezVicencio@tudelft.nl
Abstract—In this paper, we propose a deep convolutional
neural network solution to the analysis of image data for the
detection of rail surface defects. The images are obtained from
many hours of automated video recordings. This huge amount
of data makes it impossible to manually inspect the images and
detect rail surface defects. Therefore, automated detection of
rail defects can help to save time and costs, and to ensure rail
transportation safety. However, one major challenge is that the
extraction of suitable features for detection of rail surface defects
is a non-trivial and difficult task. Therefore, we propose to use
convolutional neural networks as a viable technique for feature
learning. Deep convolutional neural networks have recently been
applied to a number of similar domains with success. We compare
the results of different network architectures characterized by
different sizes and activation functions. In this way, we explore
the efficiency of the proposed deep convolutional neural net-
work for detection and classification. The experimental results
are promising and demonstrate the capability of the proposed
approach.
I. INTRODUCTION
Feature learning by using deep neural networks has recently
been applied to a variety of computer vision and classification
problems, and has proved successful in many domains. The
classification accuracy over several benchmark vision data
sets, commonly with a large number of samples per class,
has been improved over the shallow classical approaches with
hand-crafted features [1]–[3]. Shallow learning approaches are
based on general assumptions that ignore the characteristics
of the given real data. In comparison to these methods, deep learning techniques help to move away from hand-crafted feature design towards automated learning of problem-specific features directly from the data. Convolutional neural networks are based on this strategy. Large feature-learning networks such as convolutional neural nets normally have hundreds or thousands of parameters, which requires large data sets for training. In many real-world applications, however, not all of the collected data are good samples of the target classes. Therefore, the question of whether features can be efficiently learned for the classification of such data sets is important for these applications.
Automated rail defect detection using video cameras is an
example of a problem where the number of target defects in the
available data set is much smaller than the number of healthy
samples. Rail surface defects occur for different reasons, for example as a result of fatigue due to the repetitive passing of rolling stock over rail components such as welds, joints, and switches, or because of impacts from damaged wheels. If rail defects grow and are treated late, they may lead to high maintenance costs. Therefore, automated detection of defects is important [4]–[6].
Recently, the use of video cameras for the inspection of
rail tracks has become popular [7], [8], due to the error-
prone, costly, and time-consuming process of manual rail
monitoring. Detection models based on both learned (as in
[9]) and predefined features (as in [10]) have been applied
to different aspects of rail defect detection. However, the use
of video cameras for the detection of rail surface defects
caused by rolling contact fatigue (RCF) has been mostly
studied using hand-crafted or predefined features [8], [11].
Other methods based on spatial correlation statistics, gradient-based features, and hand-crafted features have been used in [12]–[14]. Compared to these approaches, convolutional neural networks require relatively little pre-processing.
In this paper, we present an application of deep convolu-
tional neural networks (DCNNs) for automatic detection of
rail surface defects. Our data resembles that of [8] for visual
inspection of rails. One immediate advantage of using a DCNN
is that unlike [8], we do not have to go into an elaborate
procedure for the extraction of features. We can rather use
raw images as input to the classification model, which is
subsequently optimized using a mini-batch gradient descent
method for the entire network. We compare three DCNNs
with different structures (i.e. different in size and number of
parameters) for their classification accuracy and computation
time.
This paper is organized as follows. In Section II, we review
some related work on rail defect detection. In Section III,
we describe the structure and operation of the convolutional
neural network that is used to detect defects. In Section
IV, we describe our data sets and present the proposed
deep convolutional neural network. Section V presents the
experimental results together with a comparison of different
training strategies. Section VI concludes the paper with a brief
discussion.
Accepted Author Manuscript. Link to published article (IEEE): http://dx.doi.org/10.1109/IJCNN.2016.7727522
© 2016 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/
republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted
component of this work in other works.

II. RELATED WORK
In the last decade, different object recognition methods have
been used in the field of rail defect detection. In [4], [15]–[17],
some signal processing techniques such as noise reduction and
wavelet transforms have been used for estimating irregularities
and detecting defects in railway tracks. Unlike signal pro-
cessing, the use of image processing techniques and image
data analysis is a very recent approach for detection of rail
defects. This is due to the new developments in the technology
of video cameras and machine vision for rail monitoring. In
[18] and [19], image processing techniques have been used for
steel defect detection. In [18], a convolutional neural network
is trained on a database of photometric stereo images. By
means of differently colored light-sources illuminating the rail
surfaces, the defects are made visible in a photometric dark-
field setup. Moreover, in [19], a max-pooling convolutional
neural network is applied for steel defect classification. In both
papers, it is indicated that the data sets for training and testing
are quite small, and the training is prone to over-fitting.
Classically in detection from visual data, gradient-based
features such as the histogram of oriented gradients (HoG),
scale-invariant feature transforms (SIFT), spatial pyramids,
and basis functions representations such as Gabor filters are
among the common choices of features (see e.g. [20]). In
recent years, the expansion of deep feature learning schemes
such as deep convolutional neural nets has provided such
applications with a better tool for extracting features that
are specifically tailored for each domain. Deep convolutional
neural nets have been developed rapidly in the field of object
recognition since the breakthrough work of [1]. In [21], [22],
deep convolutional neural networks have been used for rail
fastening condition monitoring. While [21], [22] have focused
on identification of track components such as ballast, concrete,
wood, and fastener, in this paper we focus on the detection and
classification of different defects that occur at the rail surface.
Moreover, we are interested in evaluating different structures
of deep convolutional neural networks that can provide good
accuracy rates on classification of rail defects compared to
classical learning methods.
III. CLASSIFICATION METHOD
A. Deep Convolutional Neural Network
A deep convolutional neural network (DCNN), based on the
classical convolutional neural network proposed by LeCun et
al. [23], consists of three main components:
1) Convolution: A convolution layer is connected to the
next layer in a similar manner as the traditional bipartite
multi-layer neural network, with the key difference that
the weights are shared between sets of connections. Each
set of weight sharing connections forms a filter that is
convoluted with the input data. There are usually several
such filters trained in parallel at each layer. Convolution
filters slide over small local receptive fields of input
image data in image classification applications. Every
filter acts as a feature detector. The result of applying a
convolution across an image forms a feature map.
2) Activation function: This function facilitates the dis-
crimination between image classes by being imposed
on the convolution filter output and performing a non-
linear transformation of a data space. Some examples of
activation functions are the hyperbolic tangent function
(Tanh), the sigmoid function, and rectified linear units
(ReLU) [24].
3) Max-pooling: The feature maps resulting from a con-
volution layer are sub-sampled in a pooling layer. In a
max-pooling layer, the dimensions of the feature maps
are reduced by merging local information (selecting
maximum values) within a neighborhood window [19].
Convolutional layers and max-pooling layers are laid suc-
cessively to create the DCNN architecture. Compared to
shallow architectures, a DCNN has multiple layers that can
represent complex functions with higher efficiency and gener-
alization accuracy.
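
To make this layer stacking concrete, the following minimal sketch composes one convolution-activation-pooling stage. It is written in PyTorch-style Python purely for illustration (the authors' implementation uses Torch 7 [25]), and all layer sizes in it are assumptions, not the configurations of Table I.

```python
import torch
import torch.nn as nn

# One DCNN stage: a convolution layer whose shared-weight filters slide over local
# receptive fields, a non-linear activation, and a max-pooling layer that merges
# local information by keeping the maximum in each window.
# All sizes below are illustrative assumptions.
stage = nn.Sequential(
    nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5),  # 6 filters -> 6 feature maps
    nn.ReLU(),                                                 # non-linear activation (Tanh is another option)
    nn.MaxPool2d(kernel_size=2),                               # 2 x 2 sub-sampling of each feature map
)

x = torch.randn(4, 1, 64, 64)    # a mini-batch of 4 single-channel images
print(stage(x).shape)            # torch.Size([4, 6, 30, 30])
```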
B. Training Methods
A batch (standard) gradient descent method involves op-
timizing the error over the entire training set. Since this
procedure can be computationally extremely expensive for
a large network, an approximation called the mini-batch stochastic gradient descent method is often used for DCNN training [2]. Here the difference is that instead of calculating the
gradient of the error over the entire training set, each iteration
calculates the gradient of the error for a small part of the
samples called the mini-batch. We denote by b and n the mini-batch size and the total number of training samples,
respectively. For the mini-batch gradient descent, there are
in total T = n/b iterations per training epoch. The weight
parameters w are therefore obtained through optimization of
the approximated expected value of an error function f defined
as:
$$E_t[f(w)] = \frac{1}{b} \sum_{i=(t-1)b+1}^{tb} f(w; x_i) \qquad (1)$$

where $t \in \{1, \ldots, T\}$ is the iteration index and $x_i$ is the $i$-th
training sample. At each iteration the weights are adjusted
using the gradient descent update rule:
$$w^{(t+1)} = w^{(t)} - \mu \, \nabla_w E_t[f(w^{(t)})] \qquad (2)$$
with µ being the learning rate.
While batch gradient descent runs through all the samples
in the training set to obtain a single update for w in each
iteration, stochastic gradient descent uses only a single training
sample, and mini-batch gradient descent method uses b (i.e. the
mini-batch size) samples at each iteration. Stochastic gradient
descent and mini-batch gradient descent are computationally
much cheaper than batch gradient descent. Mini-batch gradient
descent can often be as fast as stochastic gradient descent if
appropriate vectorization is applied in computing the deriva-
tive terms of (2). For problems with non-convex objective
functions, stochastic gradient descent has sometimes shown

the ability to escape from local optima where batch gradient
descent is trapped [2]. Therefore, stochastic gradient descent
may perform better in applications such as DCNN training.
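
As a sketch of equations (1) and (2), the following NumPy function performs one training epoch of mini-batch gradient descent; the function name and the generic gradient callback are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def minibatch_gd_epoch(w, samples, grad_f, b, mu):
    """One epoch of mini-batch gradient descent, following eqs. (1)-(2) (illustrative sketch).

    w       : current weight vector (np.ndarray)
    samples : sequence of n training samples
    grad_f  : callback returning the gradient of f(w; x_i) with respect to w
    b       : mini-batch size
    mu      : learning rate
    """
    n = len(samples)
    T = n // b                                       # T = n / b iterations per epoch
    for t in range(1, T + 1):
        batch = samples[(t - 1) * b : t * b]         # samples x_{(t-1)b+1}, ..., x_{tb}
        # gradient of E_t[f(w)] = (1/b) * sum_i f(w; x_i) over the mini-batch, eq. (1)
        g = np.mean([grad_f(w, x_i) for x_i in batch], axis=0)
        w = w - mu * g                               # update rule w^(t+1) = w^(t) - mu * grad, eq. (2)
    return w
```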
IV. IMPLEMENTATION
A. Data description
Our data sets are images of rail tracks that are collected from
a camera with a high frame rate. This camera is mounted on
a measurement vehicle and captures the top view of the rail
tracks. The video data covers approximately 350 kilometers of
track, equivalent to 700 kilometers of rail. Among the collected
frames, we manually labelled 22408 objects as belonging to
1 out of 6 classes (normal, weld, light squat, moderate squat,
severe squat, and joint). The weld class corresponds to those
parts of the track surface where the rails are welded together
to form one continuous rail, and in most of the cases in the
images, these are hardly distinguishable from the normal rail
surface even by the human eye when the weld is in good health
condition. Insulated joints electrically separate two consecutive
track sections with an insulating material that is easily seen in
the images. Squats are a type of surface-initiated track defects
[4]. A sample of images from the different types of squats is
shown in Figure 1. There are different classifications of the
squat types based on their severity and size, but often there
is no rigid distinction between the types, since squats have
a gradual growth process. Our data set contains 985 welds,
938 light squats and smaller trivial defects, 562 moderate and
severe squats, and 755 rail joints. These images are obtained
from the original images of rail tracks, after segmentation
of the track from the ballast and other background textures
surrounding the rails.
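
For illustration, the reported counts imply the size of the normal class and the severity of the class imbalance. The snippet below works this out and derives inverse-frequency class weights, one possible (assumed, not necessarily the authors') way of weighting classes to counter the imbalance.

```python
# Class counts reported in the text (moderate and severe squats are reported together).
counts = {"weld": 985, "light_squat": 938, "moderate_or_severe_squat": 562, "joint": 755}
total_labelled = 22408
counts["normal"] = total_labelled - sum(counts.values())   # implied: 19168 normal samples

# Inverse-frequency class weights, an assumed heuristic for handling the imbalance.
weights = {c: total_labelled / n for c, n in counts.items()}
print(counts["normal"], weights)
```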
B. DCNN Architecture
In this paper, three DCNN structures (small, medium, and
large) are considered. Table I summarizes the information of
each DCNN structure. The parameters of the DCNNs are
determined by implementing each network with various com-
binations of parameters such as the number of feature maps,
the sizes of the filters, the number of layers, and the number
of nodes of fully-connected layers. Then, the parameters that
lead to the highest classification accuracy have been selected
for building the DCNN models. Model parameters such as the
learning rate and the class weights are also adjusted during
the initial test runs. Out of each class, we reserve 10 percent
of the samples for testing and use the remaining 90 percent
for training.
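
A minimal sketch of this per-class hold-out split (plain NumPy; the function name and random seed are illustrative assumptions) reserves 10 percent of every class for testing.

```python
import numpy as np

def stratified_split(labels, test_fraction=0.1, seed=0):
    """Return train/test indices, holding out `test_fraction` of each class (sketch)."""
    rng = np.random.default_rng(seed)
    train_idx, test_idx = [], []
    for c in np.unique(labels):
        idx = rng.permutation(np.where(labels == c)[0])   # shuffle the samples of class c
        n_test = int(round(test_fraction * len(idx)))
        test_idx.extend(idx[:n_test])                      # 10 percent of the class for testing
        train_idx.extend(idx[n_test:])                     # the remaining 90 percent for training
    return np.array(train_idx), np.array(test_idx)
```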
Figure 1. A sample of images of defects and non-defects (normal, weld, joint, light squat, moderate squat, and severe squat)
Table I
THE STRUCTURES OF THE DEEP CONVOLUTIONAL NEURAL NETWORKS CONSIDERED IN THIS PAPER

Type of layer                                        Small     Medium    Large
Convolutional layer 1     Feature maps               6         10        20
                          Filter size                17 × 9    9 × 5     9 × 5
Max-pooling layer 1       Filter size                3 × 3     2 × 2     2 × 2
Convolutional layer 2     Feature maps               10        20        40
                          Filter size                8 × 3     9 × 6     9 × 6
Max-pooling layer 2       Filter size                3 × 3     2 × 2     2 × 2
Convolutional layer 3     Feature maps               -         30        60
                          Filter size                -         4 × 2     4 × 2
Max-pooling layer 3       Filter size                -         2 × 2     2 × 2
Fully-connected layer 1   Number of nodes            70        120       320
Fully-connected layer 2   Number of nodes            18        30        80
Fully-connected layer 3   Number of nodes            -         -         8
Estimated number of parameters (weights)             22000     120600    644200
Figure 2 illustrates the response of the filters trained at the
3rd convolutional layer of the large DCNN. For visualization
purposes, the filters are scaled. A number of filters have
been trained to capture the wave-like patterns of the defects.
This filtering acts very similarly to predefined feature extraction
functions such as the Gabor family of functions or the Cosine
transform.
For more details, the structure of the medium DCNN is
illustrated in Figure 3. This DCNN model consists of three
convolutional layers, three max-pooling layers, and three fully-
connected layers. Input images are of size 100 × 50 pixels
with 2 color channels (gray scale). The first convolution layer
takes a normalized image and filters it with kernels of size 9 × 5 pixels.

Figure 2. Visualization of the response of the last convolutional layer to random stimuli, in the large DCNN

Figure 3. Architecture of the proposed medium DCNN: input image (100 × 50) → convolution 9 × 5 → 10 feature maps (92 × 46) → max-pooling 2 × 2 → 10 feature maps (46 × 23) → convolution 9 × 6 → 20 feature maps (38 × 18) → max-pooling 2 × 2 → 20 feature maps (19 × 9) → convolution 4 × 2 → 30 feature maps (16 × 8) → max-pooling 2 × 2 → 30 feature maps (8 × 4) → fully-connected layers of 120 and 30 nodes → output: 6 classes (normal, weld, L-squat, M-squat, S-squat, joint)

The second convolution layer takes the pooled
feature map of the first layer and filters it with kernels of size
9 × 6 pixels. The kernel size of the third convolution layer is
4 × 2 pixels. In this model, max-pooling units of size 2 × 2
pixels are used. We use the hyperbolic tangent function (Tanh) and rectified linear units (ReLU) as activation functions. After three convolutional and max-pooling layers,
the high-level reasoning in the convolutional neural network
is performed via fully-connected layers. In this network, we
use two fully-connected layers which have 120 nodes and 30
nodes, respectively. The output of the network classifies the
input image using the 6 classes described in Section IV.A.
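
Under the assumption of a single-channel 100 × 50 input and Tanh activations, the medium architecture described above can be sketched as follows in PyTorch-style Python. This is an illustrative reconstruction, not the authors' Torch 7 code, and details such as the output layer and the absence of padding are assumptions; the commented shapes follow the feature-map sizes given in Figure 3.

```python
import torch
import torch.nn as nn

# Sketch of the medium DCNN described above (illustrative reconstruction).
# The input is assumed here to be a single-channel (grayscale) 100 x 50 image.
medium_dcnn = nn.Sequential(
    nn.Conv2d(1, 10, kernel_size=(9, 5)), nn.Tanh(),   # 100x50 -> 10 maps of 92x46
    nn.MaxPool2d(2),                                    # -> 10 maps of 46x23
    nn.Conv2d(10, 20, kernel_size=(9, 6)), nn.Tanh(),   # -> 20 maps of 38x18
    nn.MaxPool2d(2),                                    # -> 20 maps of 19x9
    nn.Conv2d(20, 30, kernel_size=(4, 2)), nn.Tanh(),   # -> 30 maps of 16x8
    nn.MaxPool2d(2),                                    # -> 30 maps of 8x4
    nn.Flatten(),                                       # -> 30 * 8 * 4 = 960 features
    nn.Linear(960, 120), nn.Tanh(),                     # fully-connected layer 1
    nn.Linear(120, 30), nn.Tanh(),                      # fully-connected layer 2
    nn.Linear(30, 6),                                   # output: 6 classes
)

logits = medium_dcnn(torch.randn(8, 1, 100, 50))        # mini-batch of 8 images
print(logits.shape)                                     # torch.Size([8, 6])
```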
V. EXPERIMENTAL RESULTS
A. Experimental setup
The implementation is based on the Torch 7 framework [25]. To decrease the unwanted variation due to different
lighting and rail texture conditions, we apply a simple nor-
malization to all samples. The learning rate (µ) is initially set to 10^-3 with a decay factor of 10^-5 to help avoid over-fitting to the training data. From the results acquired in our
parameter adjustment runs, we set the mini-batch size (b) to
8 for all three DCNNs. Then, we train each network over
40 epochs. For each test we shuffle all the existing samples
and divide each class into 10 sets. For 10 rounds of cross-
validation, we test each time on one out of 10 sets and use
the rest for training. For the testing, we randomly under-
sample the normal class to 250 test samples per round. This
is done due to the huge imbalance of the class sizes, which
can severely bias the test results if all normal test samples are
evaluated. The final results are averaged over the 10 rounds.
In order to convert the multi-class classification results to the
binary classification of normal samples versus anomalies, we
simply regard all the non-normal classes as one and compute
the numbers of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). The binary classification accuracy, (TP + TN)/(TP + TN + FP + FN), and the F1-score, 2TP/(2TP + FP + FN), are then defined based on the reduced classification matrices.
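
The reduction from multi-class predictions to the binary normal-versus-anomaly scores can be sketched as follows (plain NumPy; the label convention, with the normal class encoded as 0, is an assumption).

```python
import numpy as np

def binary_scores(y_true, y_pred, normal_class=0):
    """Collapse multi-class labels to normal vs. anomaly and compute accuracy and F1 (sketch)."""
    true_anom = np.asarray(y_true) != normal_class   # anomaly = any non-normal class
    pred_anom = np.asarray(y_pred) != normal_class
    tp = np.sum(true_anom & pred_anom)               # true positives
    tn = np.sum(~true_anom & ~pred_anom)              # true negatives
    fp = np.sum(~true_anom & pred_anom)               # false positives
    fn = np.sum(true_anom & ~pred_anom)               # false negatives
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    f1 = 2 * tp / (2 * tp + fp + fn)
    return accuracy, f1
```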
B. Classification results
We report two types of results from the experiments.
Initially the networks are trained to classify the data into 6
classes. We compare the confusion matrices of the classifi-
cation results trained on the three structures (small, medium,
and large DCNN). Then, we retrain the networks using both
Tanh and ReLU activation functions to classify samples into
3 classes. The first class represents normal rail and contains both weld and normal samples. The second class contains all types of small defects and squats, and the third class only consists of rail joints.
The confusion matrices of the classification results trained
on the three DCNN are presented in Tables II-IV. The rows
of the matrices correspond to the correct classes and the
columns correspond to the predicted classes. In Table III for
instance, the percentage of correctly classified severe squats
is 48.13, while 33.12 percent of the actual severe squats are
classified in the moderate squat class. Similarly, in Table IV,
the percentage of correctly classified welds is 61.95 percent,
while 30.97 percent of the actual welds are classified in
the normal class. This false classification value is relatively
high compared to the total number of welds. The comparison
shows that in the assessment of the binary and multi-class
classification accuracy, the two classes of normal and weld
should be integrated into one class (Normal). Also, all three
classes of light, moderate, and severe squats should be combined
together as a defect class (Defect). As a result, there are
Table II
CONFUSION MATRIX OF THE SMALL DCNN (%)
Normal Weld L-squat M-squat S-squat Joint
Normal 94.62 3.06 1.82 0.19 0.08 0.23
Weld 31.85 59.62 3.20 1.55 0.29 3.49
L-squat 21.09 3.27 66.93 7.42 0.10 1.19
M-squat 5.95 3.81 27.62 53.57 7.62 1.43
S-squat 6.25 5.62 2.50 37.50 44.38 3.75
Joint 3.13 5.75 1.12 1.75 2.12 86.13
Table III
CONFUSION MATRIX OF THE MEDIUM DCNN (%)
Normal Weld L-squat M-squat S-squat Joint
Normal 95.74 1.94 1.78 0.11 0.04 0.39
Weld 29.61 63.01 3.11 0.88 0.19 3.20
L-squat 22.47 3.17 65.05 7.72 0.30 1.29
M-squat 8.81 2.86 22.86 56.19 6.90 2.38
S-squat 8.12 4.37 3.13 33.12 48.13 3.13
Joint 2.25 5.50 1.62 1.00 1.00 88.63
Table IV
CONFUSION MATRIX OF THE LARGE DCNN (%)
Normal Weld L-squat M-squat S-squat Joint
Normal 96.32 1.82 1.32 0.15 0.08 0.31
Weld 30.97 61.95 2.52 0.68 0.29 3.59
L-squat 22.08 3.17 64.36 9.40 0.10 0.89
M-squat 6.90 3.33 24.76 56.67 7.15 1.19
S-squat 8.12 2.50 3.13 34.37 50.00 1.88
Joint 2.50 4.37 1.00 1.75 2.00 88.38
