Delft University of Technology
Deep convolutional neural networks for detection of rail surface defects
Faghih Roohi, Shahrzad; Hajizadeh, Siamak; Nunez, Alfredo; Babuska, Robert; De Schutter, Bart
DOI
10.1109/IJCNN.2016.7727522
Publication date
2016
Document Version
Accepted author manuscript
Published in
Proceedings 2016 International Joint Conference on Neural Networks (IJCNN)
Citation (APA)
Faghih Roohi, S., Hajizadeh, S., Nunez, A., Babuska, R., & De Schutter, B. (2016). Deep convolutional neural networks for detection of rail surface defects. In P. A. Estevez, P. P. Angelov, & E. Del Moral Hernandez (Eds.), Proceedings 2016 International Joint Conference on Neural Networks (IJCNN) (pp. 2584-2589). IEEE. https://doi.org/10.1109/IJCNN.2016.7727522
Important note
To cite this publication, please use the final published version (if applicable).
Please check the document version above.
Copyright
Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent
of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.
Takedown policy
Please contact us and provide details if you believe this document breaches copyrights.
We will remove access to the work immediately and investigate your claim.
This work is downloaded from Delft University of Technology.
For technical reasons the number of authors shown on this cover page is limited to a maximum of 10.

Deep Convolutional Neural Networks for
Detection of Rail Surface Defects
Shahrzad Faghih-Roohi, Siamak Hajizadeh, Alfredo Núñez, Robert Babuska, and Bart De Schutter
Delft Center for Systems and Control, Delft University of Technology
Mekelweg 2, 2628 CD Delft, The Netherlands
Email: S.Faghihroohi@tudelft.nl; R.Babuska@tudelft.nl; B.DeSchutter@tudelft.nl
Section of Railway Engineering, Delft University of Technology
Stevinweg 1, 2628 CN Delft, The Netherlands
Email: S.Hajizadeh@tudelft.nl; A.A.NunezVicencio@tudelft.nl
Abstract—In this paper, we propose a deep convolutional
neural network solution to the analysis of image data for the
detection of rail surface defects. The images are obtained from
many hours of automated video recordings. This huge amount
of data makes it impossible to manually inspect the images and
detect rail surface defects. Therefore, automated detection of
rail defects can help to save time and costs, and to ensure rail
transportation safety. However, one major challenge is that the
extraction of suitable features for detection of rail surface defects
is a non-trivial and difficult task. Therefore, we propose to use
convolutional neural networks as a viable technique for feature
learning. Deep convolutional neural networks have recently been
applied to a number of similar domains with success. We compare
the results of different network architectures characterized by
different sizes and activation functions. In this way, we explore
the efficiency of the proposed deep convolutional neural net-
work for detection and classification. The experimental results
are promising and demonstrate the capability of the proposed
approach.
I. INTRODUCTION
Feature learning by using deep neural networks has recently
been applied to a variety of computer vision and classification
problems, and has proved successful in many domains. The
classification accuracy over several benchmark vision data
sets, commonly with a large number of samples per class,
has been improved over the shallow classical approaches with
hand-crafted features [1]–[3]. Shallow learning approaches are
based on general assumptions that ignore the characteristics
of the given real data. In comparison to these methods, deep learning techniques help to move away from hand-crafted feature design towards automated learning of problem-specific features directly from the data. Convolutional neural networks are based on this strategy. Large feature-learning networks such as convolutional neural nets normally have hundreds or thousands of parameters, which requires large data sets for training. In many real-world applications, however, not all of the collected data are good samples of the target classes. Therefore, the question of whether features can be efficiently learned for the classification of such data sets is important for these applications.
Automated rail defect detection using video cameras is an
example of a problem where the number of target defects in the
available data set is much smaller than the number of healthy
samples. Rail surface defects occur for different reasons, for example as a result of fatigue due to the repetitive passing of rolling stock over rail components such as welds, joints, and switches, or because of impacts from damaged wheels. If rail defects grow and are treated late, they may lead to high maintenance costs. Therefore, automated detection of defects is important [4]–[6].
Recently, the use of video cameras for the inspection of
rail tracks has become popular [7], [8], due to the error-
prone, costly, and time-consuming process of manual rail
monitoring. Detection models based on both learned (as in
[9]) and predefined features (as in [10]) have been applied
to different aspects of rail defect detection. However, the use
of video cameras for the detection of rail surface defects
caused by rolling contact fatigue (RCF) has been mostly
studied using hand-crafted or predefined features [8], [11].
Other methods based on spatial correlation statistics, gradient-based features, and hand-crafted features have been used in [12]–[14]. Compared to these approaches, convolutional neural networks require relatively little pre-processing.
In this paper, we present an application of deep convolu-
tional neural networks (DCNNs) for automatic detection of
rail surface defects. Our data resembles that of [8] for visual
inspection of rails. One immediate advantage of using a DCNN
is that unlike [8], we do not have to go into an elaborate
procedure for the extraction of features. We can rather use
raw images as input to the classification model, which is
subsequently optimized using a mini-batch gradient descent
method for the entire network. We compare three DCNNs
with different structures (i.e. different in size and number of
parameters) for their classification accuracy and computation
time.
This paper is organized as follows. In Section II, we review
some related work on rail defect detection. In Section III,
we describe the structure and operation of the convolutional
neural network that is used to detect defects. In Section
IV, we describe our data sets and present the proposed
deep convolutional neural network. Section V presents the
experimental results together with a comparison of different
training strategies. Section VI concludes the paper with a brief
discussion.
Accepted Author Manuscript. Link to published article (IEEE): http://dx.doi.org/10.1109/IJCNN.2016.7727522
© 2016 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/
republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted
component of this work in other works.

II. RELATED WORK
In the last decade, different object recognition methods have
been used in the field of rail defect detection. In [4], [15]–[17],
some signal processing techniques such as noise reduction and
wavelet transforms have been used for estimating irregularities
and detecting defects in railway tracks. Unlike signal pro-
cessing, the use of image processing techniques and image
data analysis is a very recent approach for detection of rail
defects. This is due to the new developments in the technology
of video cameras and machine vision for rail monitoring. In
[18] and [19], image processing techniques have been used for
steel defect detection. In [18], a convolutional neural network
is trained on a database of photometric stereo images. By
means of differently colored light-sources illuminating the rail
surfaces, the defects are made visible in a photometric dark-
field setup. Moreover, in [19], a max-pooling convolutional
neural network is applied for steel defect classification. In both
papers, it is indicated that the data sets for training and testing
are quite small, and the training is prone to over-fitting.
Classically in detection from visual data, gradient-based
features such as the histogram of oriented gradients (HoG),
scale-invariant feature transforms (SIFT), spatial pyramids,
and basis functions representations such as Gabor filters are
among the common choices of features (see e.g. [20]). In
recent years, the expansion of deep feature learning schemes
such as deep convolutional neural nets has provided such
applications with a better tool for extracting features that
are specifically tailored for each domain. Deep convolutional
neural nets have been developed rapidly in the field of object
recognition since the breakthrough work of [1]. In [21], [22],
deep convolutional neural networks have been used for rail
fastening condition monitoring. While [21], [22] have focused
on identification of track components such as ballast, concrete,
wood, and fastener, in this paper we focus on the detection and
classification of different defects that occur at the rail surface.
Moreover, we are interested in evaluating different structures
of deep convolutional neural networks that can provide good
accuracy rates on classification of rail defects compared to
classical learning methods.
III. CLASSIFICATION METHOD
A. Deep Convolutional Neural Network
A deep convolutional neural network (DCNN), based on the
classical convolutional neural network proposed by LeCun et
al. [23], consists of three main components:
1) Convolution: A convolution layer is connected to the
next layer in a similar manner as the traditional bipartite
multi-layer neural network, with the key difference that
the weights are shared between sets of connections. Each
set of weight sharing connections forms a filter that is
convoluted with the input data. There are usually several
such filters trained in parallel at each layer. Convolution
filters slide over small local receptive fields of input
image data in image classification applications. Every
filter acts as a feature detector. The result of applying a
convolution across an image forms a feature map.
2) Activation function: This function facilitates the dis-
crimination between image classes by being imposed
on the convolution filter output and performing a non-
linear transformation of a data space. Some examples of
activation functions are the hyperbolic tangent function
(Tanh), the sigmoid function, and rectified linear units
(ReLU) [24].
3) Max-pooling: The feature maps resulting from a con-
volution layer are sub-sampled in a pooling layer. In a
max-pooling layer, the dimensions of the feature maps
are reduced by merging local information (selecting
maximum values) within a neighborhood window [19].
Convolutional layers and max-pooling layers are laid suc-
cessively to create the DCNN architecture. Compared to
shallow architectures, a DCNN has multiple layers that can
represent complex functions with higher efficiency and gener-
alization accuracy.
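
To make this layer stacking concrete, the following minimal sketch composes one convolution-activation-pooling stage. It is written in PyTorch-style Python purely for illustration (the authors' implementation uses Torch 7 [25]), and all layer sizes in it are assumptions, not the configurations of Table I.

```python
import torch
import torch.nn as nn

# One DCNN stage: a convolution layer whose shared-weight filters slide over local
# receptive fields, a non-linear activation, and a max-pooling layer that merges
# local information by keeping the maximum in each window.
# All sizes below are illustrative assumptions.
stage = nn.Sequential(
    nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5),  # 6 filters -> 6 feature maps
    nn.ReLU(),                                                 # non-linear activation (Tanh is another option)
    nn.MaxPool2d(kernel_size=2),                               # 2 x 2 sub-sampling of each feature map
)

x = torch.randn(4, 1, 64, 64)    # a mini-batch of 4 single-channel images
print(stage(x).shape)            # torch.Size([4, 6, 30, 30])
```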
B. Training Methods
A batch (standard) gradient descent method involves op-
timizing the error over the entire training set. Since this
procedure can be computationally extremely expensive for
a large network, an approximation called the mini-batch stochastic gradient descent method is often used for DCNN training [2]. Here the difference is that instead of calculating the
gradient of the error over the entire training set, each iteration
calculates the gradient of the error for a small part of the
samples called the mini-batch. We denote by b and n the mini-batch size and the total number of training samples,
respectively. For the mini-batch gradient descent, there are
in total T = n/b iterations per training epoch. The weight
parameters w are therefore obtained through optimization of
the approximated expected value of an error function f defined
as:
$$E_t[f(w)] = \frac{1}{b} \sum_{i=(t-1)b+1}^{tb} f(w; x_i) \qquad (1)$$

where $t \in \{1, \ldots, T\}$ is the iteration index and $x_i$ is the $i$-th
training sample. At each iteration the weights are adjusted
using the gradient descent update rule:
$$w^{(t+1)} = w^{(t)} - \mu \, \nabla_w E_t[f(w^{(t)})] \qquad (2)$$
with µ being the learning rate.
While batch gradient descent runs through all the samples
in the training set to obtain a single update for w in each
iteration, stochastic gradient descent uses only a single training
sample, and mini-batch gradient descent method uses b (i.e. the
mini-batch size) samples at each iteration. Stochastic gradient
descent and mini-batch gradient descent are computationally
much cheaper than batch gradient descent. Mini-batch gradient
descent can often be as fast as stochastic gradient descent if
appropriate vectorization is applied in computing the deriva-
tive terms of (2). For problems with non-convex objective
functions, stochastic gradient descent has sometimes shown

the ability to escape from local optima where batch gradient
descent is trapped [2]. Therefore, stochastic gradient descent
may perform better in applications such as DCNN training.
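
As a sketch of equations (1) and (2), the following NumPy function performs one training epoch of mini-batch gradient descent; the function name and the generic gradient callback are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def minibatch_gd_epoch(w, samples, grad_f, b, mu):
    """One epoch of mini-batch gradient descent, following eqs. (1)-(2) (illustrative sketch).

    w       : current weight vector (np.ndarray)
    samples : sequence of n training samples
    grad_f  : callback returning the gradient of f(w; x_i) with respect to w
    b       : mini-batch size
    mu      : learning rate
    """
    n = len(samples)
    T = n // b                                       # T = n / b iterations per epoch
    for t in range(1, T + 1):
        batch = samples[(t - 1) * b : t * b]         # samples x_{(t-1)b+1}, ..., x_{tb}
        # gradient of E_t[f(w)] = (1/b) * sum_i f(w; x_i) over the mini-batch, eq. (1)
        g = np.mean([grad_f(w, x_i) for x_i in batch], axis=0)
        w = w - mu * g                               # update rule w^(t+1) = w^(t) - mu * grad, eq. (2)
    return w
```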
IV. IMPLEMENTATION
A. Data description
Our data sets are images of rail tracks that are collected from
a camera with a high frame rate. This camera is mounted on
a measurement vehicle and captures the top view of the rail
tracks. The video data covers approximately 350 kilometers of
track, equivalent to 700 kilometers of rail. Among the collected
frames, we manually labelled 22408 objects as belonging to
1 out of 6 classes (normal, weld, light squat, moderate squat,
severe squat, and joint). The weld class corresponds to those
parts of the track surface where the rails are welded together
to form one continuous rail, and in most of the cases in the
images, these are hardly distinguishable from the normal rail
surface even by the human eye when the weld is in good health
condition. Insulated joints electrically separate two consecutive
track sections with an insulating material that is easily seen in
the images. Squats are a type of surface-initiated track defects
[4]. A sample of images from the different types of squats is
shown in Figure 1. There are different classifications of the
squat types based on their severity and size, but often there
is no rigid distinction between the types, since squats have
a gradual growth process. Our data set contains 985 welds,
938 light squats and smaller trivial defects, 562 moderate and
severe squats, and 755 rail joints. These images are obtained
from the original images of rail tracks, after segmentation
of the track from the ballast and other background textures
surrounding the rails.
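
For illustration, the reported counts imply the size of the normal class and the severity of the class imbalance. The snippet below works this out and derives inverse-frequency class weights, one possible (assumed, not necessarily the authors') way of weighting classes to counter the imbalance.

```python
# Class counts reported in the text (moderate and severe squats are reported together).
counts = {"weld": 985, "light_squat": 938, "moderate_or_severe_squat": 562, "joint": 755}
total_labelled = 22408
counts["normal"] = total_labelled - sum(counts.values())   # implied: 19168 normal samples

# Inverse-frequency class weights, an assumed heuristic for handling the imbalance.
weights = {c: total_labelled / n for c, n in counts.items()}
print(counts["normal"], weights)
```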
B. DCNN Architecture
In this paper, three DCNN structures (small, medium, and
large) are considered. Table I summarizes the information of
each DCNN structure. The parameters of the DCNNs are
determined by implementing each network with various com-
binations of parameters such as the number of feature maps,
the sizes of the filters, the number of layers, and the number
of nodes of fully-connected layers. Then, the parameters that
lead to the highest classification accuracy have been selected
for building the DCNN models. Model parameters such as the
learning rate and the class weights are also adjusted during
the initial test runs. Out of each class, we reserve 10 percent
of the samples for testing and use the remaining 90 percent
for training.
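
A minimal sketch of this per-class hold-out split (plain NumPy; the function name and random seed are illustrative assumptions) reserves 10 percent of every class for testing.

```python
import numpy as np

def stratified_split(labels, test_fraction=0.1, seed=0):
    """Return train/test indices, holding out `test_fraction` of each class (sketch)."""
    rng = np.random.default_rng(seed)
    train_idx, test_idx = [], []
    for c in np.unique(labels):
        idx = rng.permutation(np.where(labels == c)[0])   # shuffle the samples of class c
        n_test = int(round(test_fraction * len(idx)))
        test_idx.extend(idx[:n_test])                      # 10 percent of the class for testing
        train_idx.extend(idx[n_test:])                     # the remaining 90 percent for training
    return np.array(train_idx), np.array(test_idx)
```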
Figure 1. A sample of images of defects and non-defects (normal, weld, joint, light squat, moderate squat, and severe squat)
Table I
THE STRUCTURES OF THE DEEP CONVOLUTIONAL NEURAL NETWORKS CONSIDERED IN THIS PAPER

Type of layer                                        Small     Medium    Large
Convolutional layer 1     Feature maps               6         10        20
                          Filter size                17 × 9    9 × 5     9 × 5
Max-pooling layer 1       Filter size                3 × 3     2 × 2     2 × 2
Convolutional layer 2     Feature maps               10        20        40
                          Filter size                8 × 3     9 × 6     9 × 6
Max-pooling layer 2       Filter size                3 × 3     2 × 2     2 × 2
Convolutional layer 3     Feature maps               -         30        60
                          Filter size                -         4 × 2     4 × 2
Max-pooling layer 3       Filter size                -         2 × 2     2 × 2
Fully-connected layer 1   Number of nodes            70        120       320
Fully-connected layer 2   Number of nodes            18        30        80
Fully-connected layer 3   Number of nodes            -         -         8
Estimated number of parameters (weights)             22000     120600    644200
Figure 2 illustrates the response of the filters trained at the
3rd convolutional layer of the large DCNN. For visualization
purposes, the filters are scaled. A number of filters have
been trained to capture the wave-like patterns of the defects.
This filtering acts very similarly to predefined feature extraction
functions such as the Gabor family of functions or the Cosine
transform.
For more details, the structure of the medium DCNN is
illustrated in Figure 3. This DCNN model consists of three
convolutional layers, three max-pooling layers, and three fully-
connected layers. Input images are of size 100 × 50 pixels
with 2 color channels (gray scale). The first convolution layer
takes a normalized image and filters it with kernels of size 9 × 5 pixels.

Figure 2. Visualization of the response of the last convolutional layer to random stimuli, in the large DCNN

Figure 3. Architecture of the proposed medium DCNN: input image (100 × 50) → convolution 9 × 5 → 10 feature maps (92 × 46) → max-pooling 2 × 2 → 10 feature maps (46 × 23) → convolution 9 × 6 → 20 feature maps (38 × 18) → max-pooling 2 × 2 → 20 feature maps (19 × 9) → convolution 4 × 2 → 30 feature maps (16 × 8) → max-pooling 2 × 2 → 30 feature maps (8 × 4) → fully-connected layers of 120 and 30 nodes → output: 6 classes (normal, weld, L-squat, M-squat, S-squat, joint)

The second convolution layer takes the pooled
feature map of the first layer and filters it with kernels of size
9 × 6 pixels. The kernel size of the third convolution layer is
4 × 2 pixels. In this model, max-pooling units of size 2 × 2
pixels are used. We use the hyperbolic tangent function (Tanh) and rectified linear units (ReLU) as activation functions. After three convolutional and max-pooling layers,
the high-level reasoning in the convolutional neural network
is performed via fully-connected layers. In this network, we
use two fully-connected layers which have 120 nodes and 30
nodes, respectively. The output of the network classifies the
input image using the 6 classes described in Section IV.A.
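
Under the assumption of a single-channel 100 × 50 input and Tanh activations, the medium architecture described above can be sketched as follows in PyTorch-style Python. This is an illustrative reconstruction, not the authors' Torch 7 code, and details such as the output layer and the absence of padding are assumptions; the commented shapes follow the feature-map sizes given in Figure 3.

```python
import torch
import torch.nn as nn

# Sketch of the medium DCNN described above (illustrative reconstruction).
# The input is assumed here to be a single-channel (grayscale) 100 x 50 image.
medium_dcnn = nn.Sequential(
    nn.Conv2d(1, 10, kernel_size=(9, 5)), nn.Tanh(),   # 100x50 -> 10 maps of 92x46
    nn.MaxPool2d(2),                                    # -> 10 maps of 46x23
    nn.Conv2d(10, 20, kernel_size=(9, 6)), nn.Tanh(),   # -> 20 maps of 38x18
    nn.MaxPool2d(2),                                    # -> 20 maps of 19x9
    nn.Conv2d(20, 30, kernel_size=(4, 2)), nn.Tanh(),   # -> 30 maps of 16x8
    nn.MaxPool2d(2),                                    # -> 30 maps of 8x4
    nn.Flatten(),                                       # -> 30 * 8 * 4 = 960 features
    nn.Linear(960, 120), nn.Tanh(),                     # fully-connected layer 1
    nn.Linear(120, 30), nn.Tanh(),                      # fully-connected layer 2
    nn.Linear(30, 6),                                   # output: 6 classes
)

logits = medium_dcnn(torch.randn(8, 1, 100, 50))        # mini-batch of 8 images
print(logits.shape)                                     # torch.Size([8, 6])
```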
V. EXPERIMENTAL RESULTS
A. Experimental setup
The implementation is based on the Torch 7 framework [25]. To decrease the unwanted variation due to different
lighting and rail texture conditions, we apply a simple nor-
malization to all samples. The learning rate (µ) is initially set to 10^-3 with a decay factor of 10^-5 to help avoid over-fitting to the training data. From the results acquired in our
parameter adjustment runs, we set the mini-batch size (b) to
8 for all three DCNNs. Then, we train each network over
40 epochs. For each test we shuffle all the existing samples
and divide each class into 10 sets. For 10 rounds of cross-
validation, we test each time on one out of 10 sets and use
the rest for training. For the testing, we randomly under-
sample the normal class to 250 test samples per round. This
is done due to the huge imbalance of the class sizes, which
can severely bias the test results if all normal test samples are
evaluated. The final results are averaged over the 10 rounds.
In order to convert the multi-class classification results to the
binary classification of normal samples versus anomalies, we
simply regard all the non-normal classes as one and compute
the numbers of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). The binary classification accuracy, (TP + TN)/(TP + TN + FP + FN), and the F1-score, 2TP/(2TP + FP + FN), are then defined based on the reduced classification matrices.
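
The reduction from multi-class predictions to the binary normal-versus-anomaly scores can be sketched as follows (plain NumPy; the label convention, with the normal class encoded as 0, is an assumption).

```python
import numpy as np

def binary_scores(y_true, y_pred, normal_class=0):
    """Collapse multi-class labels to normal vs. anomaly and compute accuracy and F1 (sketch)."""
    true_anom = np.asarray(y_true) != normal_class   # anomaly = any non-normal class
    pred_anom = np.asarray(y_pred) != normal_class
    tp = np.sum(true_anom & pred_anom)               # true positives
    tn = np.sum(~true_anom & ~pred_anom)              # true negatives
    fp = np.sum(~true_anom & pred_anom)               # false positives
    fn = np.sum(true_anom & ~pred_anom)               # false negatives
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    f1 = 2 * tp / (2 * tp + fp + fn)
    return accuracy, f1
```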
B. Classification results
We report two types of results from the experiments.
Initially the networks are trained to classify the data into 6
classes. We compare the confusion matrices of the classifi-
cation results trained on the three structures (small, medium,
and large DCNN). Then, we retrain the networks using both
Tanh and ReLU activation functions to classify samples into
3 classes. The first class represents normal rail and contains both weld and normal samples. The second class contains all types of small defects and squats, and the third class only consists of rail joints.
The confusion matrices of the classification results trained
on the three DCNN are presented in Tables II-IV. The rows
of the matrices correspond to the correct classes and the
columns correspond to the predicted classes. In Table III for
instance, the percentage of correctly classified severe squats
is 48.13, while 33.12 percent of the actual severe squats are
classified in the moderate squat class. Similarly, in Table IV,
the percentage of correctly classified welds is 61.95 percent,
while 30.97 percent of the actual welds are classified in
the normal class. This false classification value is relatively
high compared to the total number of welds. The comparison
shows that in the assessment of the binary and multi-class
classification accuracy, the two classes of normal and weld
should be integrated into one class (Normal). Also, all three
classes of light, moderate, and severe squats should be combined
together as a defect class (Defect). As a result, there are
Table II
CONFUSION MATRIX OF THE SMALL DCNN (%)
Normal Weld L-squat M-squat S-squat Joint
Normal 94.62 3.06 1.82 0.19 0.08 0.23
Weld 31.85 59.62 3.20 1.55 0.29 3.49
L-squat 21.09 3.27 66.93 7.42 0.10 1.19
M-squat 5.95 3.81 27.62 53.57 7.62 1.43
S-squat 6.25 5.62 2.50 37.50 44.38 3.75
Joint 3.13 5.75 1.12 1.75 2.12 86.13
Table III
CONFUSION MATRIX OF THE MEDIUM DCNN (%)
Normal Weld L-squat M-squat S-squat Joint
Normal 95.74 1.94 1.78 0.11 0.04 0.39
Weld 29.61 63.01 3.11 0.88 0.19 3.20
L-squat 22.47 3.17 65.05 7.72 0.30 1.29
M-squat 8.81 2.86 22.86 56.19 6.90 2.38
S-squat 8.12 4.37 3.13 33.12 48.13 3.13
Joint 2.25 5.50 1.62 1.00 1.00 88.63
Table IV
CONFUSION MATRIX OF THE LARGE DCNN (%)
Normal Weld L-squat M-squat S-squat Joint
Normal 96.32 1.82 1.32 0.15 0.08 0.31
Weld 30.97 61.95 2.52 0.68 0.29 3.59
L-squat 22.08 3.17 64.36 9.40 0.10 0.89
M-squat 6.90 3.33 24.76 56.67 7.15 1.19
S-squat 8.12 2.50 3.13 34.37 50.00 1.88
Joint 2.50 4.37 1.00 1.75 2.00 88.38
