
An End-to-End Steel Surface Defect Detection Approach via Fusing Multiple Hierarchical Features

Abstract
A complete defect detection task aims to obtain the specific class and precise location of each defect in an image, which makes it challenging to apply in practice. Defect detection is a composite task of classification and localization, so it is often hard for related methods to achieve high accuracy on both. The implementation of defect detection also depends on a dedicated detection data set that contains expensive manual annotations. In this paper, we propose a novel defect detection system based on deep learning and focus on a practical industrial application: steel plate defect inspection. In order to achieve strong classification ability, this system employs a baseline convolutional neural network (CNN) to generate feature maps at each stage, and then the proposed multilevel feature fusion network (MFN) combines multiple hierarchical features into one feature, which can include more location details of defects. Based on these multilevel features, a region proposal network (RPN) is adopted to generate regions of interest (ROIs). For each ROI, a detector, consisting of a classifier and a bounding box regressor, produces the final detection results. Finally, we set up a defect detection data set, NEU-DET, for training and evaluating our method. On NEU-DET, our method achieves 74.8/82.3 mAP with baseline networks ResNet34/50 by using 300 proposals. In addition, by using only 50 proposals, our method can detect at 20 frames/s on a single GPU and reach 92% of the above performance, hence the potential for real-time detection.

Yu He, Kechen Song , Qinggang Meng , and Yunhui Yan
Index Terms: Automated defect inspection (ADI), defect detection
dataset (NEU-DET), defect detection network (DDN), multilevel-feature
fusion network (MFN).
I. INTRODUCTION
DEFECT inspection is a crucial step to guarantee the quality of
industrial production, especially for steel plates. However, this
process is usually performed manually in industry, which is
unreliable and time-consuming. In order to replace the manual work,
it is desirable to allow a machine to automatically inspect surface
defects on steel plates with the use of computer vision technologies.
This work was supported in part by the National Natural Science
Foundation of China under Grant 51805078 and Grant 51374063, in part
by the National Key Research and Development Program of China under
Grant 2017YFB0304200, in part by the Fundamental Research Funds for the
Central Universities under Grant N170304014 and Grant N150308001, and in
part by the China Scholarship Council under Grant 201806085007. The
Associate Editor coordinating the review process was Emanuele Zappa.
(Corresponding authors: Kechen Song; Yunhui Yan.)
Y. He, K. Song, and Y. Yan are with the School of Mechanical Engineering
and Automation, Northeastern University, Shenyang 110819, China, and
also with the Key Laboratory of Vibration and Control of Aero-Propulsion
Systems, Ministry of Education of China, Northeastern University, Shenyang
110819, China (e-mail: heyu142616@gmail.com; songkc@me.neu.edu.cn;
yanyh@mail.neu.edu.cn).
Q. Meng is with the Department of Computer Science, Loughborough
University, Loughborough LE11 3TU, U.K. (e-mail: q.meng@lboro.ac.uk).
Fig. 1. Defect classification and defect detection task. (a) Defect classification
task answers “What,” outputting only a defect class score. (b) Defect detection
task answers “What” and “Where,” outputting a bounding box with a defect
class score.
Fig. 2. Complicated defects. (a) Multiple defects. The yellow boxes indicate
that the defects belong to an identical class. (b) Multiclass defects. The red and
blue boxes indicate defects of different classes. (c) Overlapping defects.
The pink box surrounds an overlapping region of defects of different classes.
The founder of computer vision, British neurophysiologist
Marr, considers that a vision task can be defined as “What is
Where,” that is, the process of discovering what is present in an
image and where it is [1]. Therefore, object classification
and detection are the most fundamental problems in the field
of computer vision research [2]. Similarly, automated
defect inspection (ADI) can be divided into two types:
defect classification and defect detection. Given a defect
image, the defect classification task is to determine whether this
image contains some class of defect [Fig. 1(a)], and the defect
detection task is to determine where a defect exists in this image,
represented by a bounding box with a class score [Fig. 1(b)].
Therefore, a complete defect detection task consists of two
parts: defect classification, determining the specific categories of
defects, and defect localization, obtaining the detailed regions of
defects. For defect inspection on steel plates, the detection task
has clear advantages for complicated defects, e.g., multiple
defects [Fig. 2(a)], multiclass defects [Fig. 2(b)], and overlapping
defects [Fig. 2(c)]. The classification task can only find
the defect with the highest category confidence in an image
and cannot determine the number of defects shown in Fig. 2(a),
the classes of defects shown in Fig. 2(b), or the emergence of an
overlapping defect shown in Fig. 2(c). However, for the follow-up
quality assessment system, the quantity, category, and complexity of
defects serve as the chief indicators to evaluate the
quality of a steel plate. It is apparent that defect detection
provides a more comprehensive description of a steel
plate surface.
Fig. 3. Different styles of obtaining a defect region. (a) Many previous
detectors based on hand-crafted features directly combine related spatial cells
into a block through various special approaches. The block is regarded as a
detection region, which is a coarse box without refinement. (b) Detectors based
on DL mainly use regression methods to refine a predicted box. Through a
large amount of iterative learning, the predicted box gradually approaches the
groundtruth box. Finally, the refined box is regarded as the bounding box of
the defect, which can represent the precise location information of the defect.
The previous ADI methods have two common problems.
The first is the unclear usage of hand-crafted features [3]–[5]: the
determination of features is too subjective, and thereby human
experience usually plays a decisive role in it. The second problem
is imprecise defect localization [Fig. 3(a)]. Most methods
only perform defect classification [6]–[8] or an incomplete
defect detection. For example, some methods perform binary
classification to find the regions of defects [9], [10] or only
provide a coarse region of a defect [11], [12]. The recently
developed deep learning (DL) technology can overcome the
drawbacks of traditional ADI methods and has achieved
significant results on many vision tasks. DL can extract
discriminative representations through a deep network [e.g.,
a convolutional neural network (CNN)]. These representations
can reach a high level of abstraction and therefore have strong
representation ability. The hand-crafted features, by contrast, are
merely combinations of low-level features [16]. Moreover,
DL can train on location-annotated samples to obtain precise
location information.
At present, some studies have already applied DL to ADI.
However, most methods can only perform defect classification
due to the lack of dedicated data sets [18]–[21]. Defect
classification is an oversimplification and unable to provide
location information. Other methods use a combination
of DL and traditional image processing to perform defect
detection or segmentation [17]. These methods typically use
a DL classifier in parallel with a detector or a segmenter
that is based on traditional image processing. This approach can
eliminate the need for dedicated training data sets but damages
the end-to-end characteristic of a DL system and loses
intelligence and generalization to some extent. Unlike the
above-mentioned methods, we attempt to establish an end-to-end
defect detection system for ADI, which can provide
a bounding box with a class score for precisely classifying
and locating a defect [Fig. 3(b)]. A DL-based segmenter like
Mask R-CNN [13] seems to be better for showing the shape
of a defect. However, this kind of segmenter consumes
a large amount of computational resources, which cannot meet the
real-time demand of industrial inspection. Furthermore, it is
highly impractical for the industry to build a large instance-level
defect segmentation data set, and thereby this kind of
segmenter is almost impossible to apply. Therefore, performing
defect detection is the best tradeoff for ADI at present.
This paper mainly addresses three challenges. First, the
detection system needs strong classification ability. The common
classification problems such as interclass similarity, intraclass
difference, and background interference are also present
in ADI [9], [11]. Therefore, we equip the system with a deep
network, ResNet, as the backbone [23]. As shown by current research
in transfer learning [15], the key to driving large networks is
pretraining on ImageNet [22]. The detection system can gain
strong classification power by training ResNet on enough data.
Second, the challenge of performing defect localization
using CNN features in DL-based methods remains. As we
know, the convolutional layers of a CNN can be regarded as
filters, so some location details are gradually
lost as an image flows through the CNN. Usually, DL-based
methods perform localization based on the last convolutional
feature map [14], [28], [34]. Our method instead fuses multiple
feature maps, because the feature maps exhibit diverse
characteristics at each stage of a CNN: the shallow features
have rich information but are not discriminative enough, and the
deep features are semantically robust but lose too many details.
In other fields, HyperNet [34] also uses more features, but
they are mainly selected from the latter part of the network.
The proposed multilevel-feature fusion network (MFN) combines
multiple features covering all stages. We address the
detection from the industrial perspective. Since gray images
have less information than color images, the MFN must
include the lower level features that are discarded by HyperNet.
Furthermore, the MFN unifies the size of the multiple features
before fusion, which can not only preserve more details of images
but also use fewer model parameters.
Third, in defect detection, data annotation is expensive,
because one has to draw a defect’s bounding box and assign a
class label to it. Recent progress in this field can be attributed
to two factors: 1) ImageNet pretrained models and 2) large
baseline CNNs, which made great progress in DL-based defect
classification [18]–[20]. However, the limited data and expensive
annotation still limit the development of defect detection.
In this paper, we release a defect detection data set, NEU-DET,
for fine-tuning models. Once the DL models have finished
training on a dedicated data set, they can be used to perform the
defect detection task.
This paper establishes an end-to-end ADI system, called the
defect detection network (DDN), in an attempt to overcome
the above-mentioned challenges. The DDN 1) adopts a strong
ResNet for defect classification; 2) proposes the MFN to assemble
more location details; and 3) sets up a defect detection data
set for fine-tuning and reports improvements on it. In more
detail, first, we pretrain the ResNet on ImageNet and
fine-tune all the models on NEU-DET. The MFN
fuses the selected features into a multilevel feature, which has
characteristics covering all the stages of the ResNet. Next,
a region proposal network (RPN) is adopted to generate proposals
based on the multilevel features, and then the DDN
outputs the class scores and the coordinates of the bounding
boxes. Finally, we evaluate the proposed method on NEU-DET,
and the results demonstrate a clear superiority over other ADI
methods.
To summarize, the main contributions of this paper are as
follows.
1) The introduction of the end-to-end defect detection
pipeline DDN that integrates the ResNet and the RPN
for precise defect classification and localization.
2) The proposed MFN for fusing multilevel features. Compared
with other fusing methods, MFN can combine
the lower level and higher level features, which gives the
multilevel features more comprehensive characteristics.
3) A defect detection data set NEU-DET for fine-tuning
networks and a demonstration that the proposed DDN
has a very competitive performance on this data set.
II. RELATED WORK
A. Defect Inspection
Generally, a defect classification method includes two parts:
a feature extractor and a classifier. A classic feature extractor
obtains hand-crafted features such as HOG and LBP,
and it is always followed by a classifier, e.g., SVM.
Therefore, the combination of different feature extractors and
classifiers produces a variety of defect classification methods.
For instance, Song and Yan [3] improve the LBP to
resist noise and adopt NNC and SVM to classify defects.
Ghorai et al. [9] use a small set of wavelet features
and an SVM to perform defect classification. Different from the
above-mentioned two methods, Chu et al. [8] employ a general
feature extractor and enhance the SVM. From the perspective of
computer vision, the defect classification task is essentially
defect image classification, which struggles with complicated
defect images. To solve this, the simple and direct way is to
perform defect localization before defect classification, so that
the inspection task classifies regions of defects instead of a
whole defect image; this is the defect detection task. For
example, the defect detectors in [11] and [12] first perform
a 0–1 classification to judge whether features belong to a
defect class or a nondefect class, then find defect regions
based on the boundary of defect-class features, and finally perform
different classification methods to determine the specific class
of a defect. In addition, there is another simplified detector
for the requirement of quick detection, which only focuses on
regions of defects regardless of whether the defects are in different
categories [10].
However, the DL-based methods differ radically from the
above methods. A hand-crafted feature extractor locally analyzes
a single image and extracts features, whereas a CNN
constructs the representation of all the input data through
a large amount of learning. CNNs have good generalization
and transferability, so there are some defect inspection
methods based on CNNs. For example, Chen and Ho [21]
demonstrate that an object detector like Overfeat [24] can be
transferred to be a defect detector by some means. Similarly,
the methods in [18] and [19] demonstrate that using a sequential
CNN to extract features can improve classification accuracy
on defect inspection. Also based on a sequential CNN,
Ren et al. [17] perform an extra defect segmentation task on
classification results to define the boundary of a defect. Moreover,
Natarajan et al. [20] employ a deeper neural network,
VGG19, for defect classification. With the increasing depth of CNNs,
the defect classification accuracy has been further improved.
B. Baseline Networks
There are three popular CNN architectures at present, which
are used as baseline networks for pretraining. The early successful
networks are based on the sequential pipeline architecture [25],
which established the basic structure of CNNs and proved
the importance of network depth. Subsequently, the inception
networks employed modular units, which increase both
the depth and width of a network without increasing the
computational cost [26]. The third type is ResNet, which uses
residual blocks to make networks deeper without overfitting [23].
ResNet is widely applied in various vision tasks, achieving
competitive results with few parameters.
Choosing a proper baseline network is the key to obtaining
good results with DL methods. A large network has strong
representation ability for input data, and hence its extracted
features are at a high level of abstraction, but it demands a large
amount of training data.
C. CNN Detectors
The CNN detectors aim to classify and locate each target
with a bounding box. They are mainly divided into two types:
region-based methods and direct regression methods.
The most famous region-based detectors
are the “R-CNN family” [27], [28], [14]. In this framework,
thousands of class-independent region proposals are employed
for detection. Region-based methods are superior in precision
but require slightly more computation. The representative
direct regression methods are YOLO [29] and SSD [30].
They directly divide an image into small grids and then, for
each grid, predict bounding boxes, which are then regressed to
the groundtruth boxes. The direct regression methods are fast
but struggle with small instances.
III. DEFECT DETECTION NETWORK
In this section, the DDN is described in detail (see Fig. 4).
A single-scale image of an arbitrary size is processed by a
CNN, and the convolutional feature maps at each stage of
the ConvNet are produced (ConvNet represents the convolutional
part of a CNN). We extract multiple feature maps
and then aggregate them in the same dimension by using
a lightweight MFN. In this way, MFN features have the
characteristics of several hierarchical levels of the ConvNet.
Next, RPN [14] is employed to generate region proposals
[regions of interest (ROIs)] over the MFN feature. Finally,
the MFN feature corresponding to each ROI is transformed
into a fixed-length feature through the ROI pooling [28]
and the global average pooling (GAP) layers. The feature
is fed into two fully connected (fc) layers. One is a one-of-
(C + 1) defect classification layer (“cls”) and the other is a
bounding-box regression layer (“loc”).
Fig. 4. DDN. In a single pass, we extract features from each stage of the baseline ConvNet, which are then fused into a multilevel feature by MFN. RPN is
adopted to generate ROIs based on the multilevel feature. For each ROI, the corresponding multilevel feature is transformed into a fixed-length feature through
the ROI pooling and the GAP layers. Two fc layers process each fixed-length feature and feed into output layers producing two results: a one-of-(C + 1)
defect class prediction (cls) and a refined bounding box coordinate (loc).
The rest of this section introduces the details of DDN and
explains why the MFN is needed in the network for
the defect detection task.
A. Baseline ConvNet Architecture
Pretraining on the ImageNet data set is known to be
important for achieving competitive performance, and the
pretrained model can then be fine-tuned on a relatively small defect
data set. In this paper, we select the recent successful baseline
network ResNet as the backbone. ResNet presents several
attractive advantages as follows.
1) ResNet can achieve state-of-the-art precision with
far fewer parameters than a CNN of sequential pipeline
architecture of the same magnitude (ResNet50 vs. VGG16,
about 25.6 M vs. 138 M parameters). This suggests that
ResNet has a lower computational cost and a lower
probability of overfitting.
2) ResNet uses GAP to process the final convolutional
feature map instead of the dual stacked fc layers, which
preserves more comprehensive location information of
defects in the image.
3) ResNet has a modularized ConvNet, which is easy to
integrate.
In this paper, we select ResNet34 and ResNet50 as base-
line networks. The detailed structures of both networks
are shown in Table I, and residual blocks are denoted as
{R2, R3, R4, R5}.
B. Produce Multilevel Features
Previous excellent approaches only utilize high-level features
to extract region proposals (e.g., the Faster R-CNN extracts
proposals upon the last convolutional feature maps). In order
to obtain quality region proposals, single-level features should
be extended to multilevel features. Obviously, the simplest
way is to assemble feature maps from multiple layers [31].
This raises the question of which layers should be
combined. There are two essential conditions: the layers should be
nonadjacent, because adjacent layers have highly local correlation [32],
and they should provide coverage, including features from low level
to high level. For a ResNet, the most intuitive way is to combine
the last layers of each residual block.
TABLE I
ARCHITECTURE OF BASELINE NETWORKS
To fuse features at different levels, the proposed network
MFN is appended to the pretrained model. MFN has four
branches, denoted as {B2, B3, B4, B5}, and each branch
is a small network. B2, B3, B4, and B5 are connected to the
last layers of R2, R3, R4, and R5, respectively. When
an image flows through the baseline ConvNet, the Ri features
are produced in order. The Ri feature is the feature map
output from the last layer of the residual block Ri, i =
2,...,5. Similarly, the Bi feature is the feature map produced
by the last layer of the MFN branch Bi, i = 2,...,5. Each
of the Ri features is led to the corresponding branch in MFN,
producing the Bi features. Finally, multilevel features are obtained
by concatenating the B2, B3, B4, and B5 features, which come
from different stages of a CNN.
As a final note, MFN is efficient in computation and strong
in generalization. MFN can reduce the required parameters by
modifying the number of filters of the 1 × 1 conv layers. This
operation may hurt accuracy but prevents overfitting in the case of
insufficient training data.
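The branch-and-concatenate structure described above can be illustrated at the level of tensor shapes. The following is a minimal sketch, not the authors' implementation. It assumes the standard ResNet stage strides (4, 8, 16, 32 for R2 to R5) and stage output channels, assumes every branch resamples its input to the R4 resolution, and uses a hypothetical per-branch output width `branch_channels` standing in for the number of 1 × 1 conv filters, since the exact branch layers are not specified in this section.

```python
# Shape-level sketch of multilevel feature fusion (assumptions noted inline).

# Standard ResNet output strides of the residual stages R2..R5.
STAGE_STRIDES = {"R2": 4, "R3": 8, "R4": 16, "R5": 32}

# Standard output channels of the last layer of each stage.
STAGE_CHANNELS = {
    "resnet34": {"R2": 64, "R3": 128, "R4": 256, "R5": 512},
    "resnet50": {"R2": 256, "R3": 512, "R4": 1024, "R5": 2048},
}


def stage_shapes(backbone, height, width):
    """Return {stage: (channels, h, w)} for an input of the given size."""
    shapes = {}
    for stage, stride in STAGE_STRIDES.items():
        # Ceil division approximates the effect of padded strided layers.
        h = -(-height // stride)
        w = -(-width // stride)
        shapes[stage] = (STAGE_CHANNELS[backbone][stage], h, w)
    return shapes


def fused_shape(backbone, height, width, branch_channels=128, target="R4"):
    """Shape of the concatenated multilevel feature.

    branch_channels is a hypothetical number of 1x1-conv filters per branch;
    target is the stage whose resolution all branches are assumed to match.
    """
    shapes = stage_shapes(backbone, height, width)
    _, th, tw = shapes[target]
    # Each branch Bi emits branch_channels maps at the target resolution;
    # concatenation along the channel axis stacks the four branches.
    return (branch_channels * len(shapes), th, tw)


if __name__ == "__main__":
    for name, shape in stage_shapes("resnet50", 224, 224).items():
        print(name, shape)
    print("fused:", fused_shape("resnet50", 224, 224))
```

Under these assumptions the fused feature has the same channel count for ResNet34 and ResNet50, because the hypothetical 1 × 1 conv in each branch fixes the branch width regardless of the backbone; lowering `branch_channels` is the parameter-saving knob the paragraph above refers to.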
C. Extract Region Proposals
The RPN is employed to extract region proposals by sliding
on the multilevel feature maps. RPN takes an image of
arbitrary size as input and outputs anchor boxes (candidate
boxes), each with a score representing whether it is a defect
or not. The originality of RPN is the “anchor” scheme that
makes anchor boxes in multiple scales and aspect ratios. Then,
anchor boxes are hierarchically mapped to the input image
so that region proposals of multiple scales and aspect ratios
are produced. Because of the resolution of the MFN feature, the
RPN can be considered as sliding on the R4 feature. Following
[14], we set three aspect ratios {1:1, 1:2, 2:1}. Considering
the multiple sizes of defects, we set four scales
$\{64^2, 128^2, 256^2, 512^2\}$. Therefore, RPN produces
12 anchor boxes at each sliding location.
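As an illustration of how three aspect ratios and four scales give 12 anchors per location, the sketch below enumerates the anchor boxes centred at one point. It is a hedged example rather than the authors' code: the function name `make_anchors` is hypothetical, each scale is treated as a box area in pixels, and each ratio is interpreted as height divided by width, which is an assumption because the text does not state the convention.

```python
import math


def make_anchors(cx, cy,
                 areas=(64 ** 2, 128 ** 2, 256 ** 2, 512 ** 2),
                 ratios=(1.0, 0.5, 2.0)):
    """Return anchors (x1, y1, x2, y2) centred at (cx, cy).

    areas  : anchor areas in pixels (the four scales).
    ratios : height / width for 1:1, 1:2 and 2:1 (assumed convention).
    """
    anchors = []
    for area in areas:
        for ratio in ratios:
            # Solve w * h = area with h = ratio * w.
            w = math.sqrt(area / ratio)
            h = w * ratio
            anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors


if __name__ == "__main__":
    boxes = make_anchors(100.0, 100.0)
    print(len(boxes))  # 4 scales x 3 ratios
    for box in boxes[:3]:
        print(tuple(round(v, 1) for v in box))
```

Every anchor of a given scale keeps the same area, so changing the ratio only trades width for height.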
The region proposal extractor always ends with an ROI
pooling layer. This layer performs a max-pooling operation
over the feature map inside each ROI to convert it into a small
feature map (512-d for ResNet34 and 2048-d for ResNet50)
with a fixed spatial size of W × H (in this paper, 7 × 7). Finally,
based on these small cubes, the offset of each region
proposal from an adjacent groundtruth box and the probability
that a defect exists are calculated.
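The max-pooling step inside an ROI can be sketched for a single channel as follows. This is a simplified illustration under stated assumptions, not the layer used in the paper: the ROI is given in integer feature-map coordinates with inclusive corners, the bin boundaries use floor and ceiling division so that every bin covers at least one cell, and the helper names `ceil_div` and `roi_max_pool` are hypothetical.

```python
def ceil_div(a, b):
    """Integer ceiling division for non-negative a and positive b."""
    return -(-a // b)


def roi_max_pool(feature, roi, out_h=7, out_w=7):
    """Max-pool one channel of a feature map inside an ROI.

    feature : 2-D list of numbers, indexed as feature[row][col].
    roi     : (x1, y1, x2, y2) in feature-map cells, corners inclusive.
    Returns an out_h x out_w list of lists.
    """
    x1, y1, x2, y2 = roi
    roi_h = y2 - y1 + 1
    roi_w = x2 - x1 + 1
    pooled = []
    for i in range(out_h):
        # Rows covered by output row i (half-open interval [ys, ye)).
        ys = y1 + (i * roi_h) // out_h
        ye = y1 + ceil_div((i + 1) * roi_h, out_h)
        row = []
        for j in range(out_w):
            xs = x1 + (j * roi_w) // out_w
            xe = x1 + ceil_div((j + 1) * roi_w, out_w)
            row.append(max(feature[r][c]
                           for r in range(ys, ye)
                           for c in range(xs, xe)))
        pooled.append(row)
    return pooled


if __name__ == "__main__":
    fmap = [[4 * r + c for c in range(4)] for r in range(4)]
    print(roi_max_pool(fmap, (0, 0, 3, 3), out_h=2, out_w=2))
```

Because the output size is fixed, ROIs of any shape yield features of the same size, which is what allows the subsequent layers to have fixed input dimensions.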
For a single image, RPN may extract thousands of region
proposals. To deal with the redundant information, greedy
nonmaximum suppression (NMS) is often applied to eliminate
high-overlap region proposals. We set the intersection
over union (IOU) threshold for NMS at 0.7, which can discard
a majority of region proposals. After NMS, the top-K ranked
region proposals are selected from the rest. In the following,
we fine-tune DDN using the top-300 region proposals owing to
the quality of the extracted region proposals, but reduce this
number at test time to accelerate detection without harming
accuracy.
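The proposal filtering described above can be sketched in a few lines. The following is a plain-Python illustration of greedy NMS followed by top-K selection, using the 0.7 IOU threshold and K = 300 from the text as defaults; the function names `iou` and `nms_top_k` are hypothetical and the code is not taken from the authors' system.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms_top_k(boxes, scores, iou_threshold=0.7, top_k=300):
    """Greedy NMS: return indices of the kept boxes, best score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap a better box too much.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
            if len(keep) == top_k:
                break
    return keep


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (0, 1, 10, 11), (20, 20, 30, 30)]
    scores = [0.9, 0.8, 0.7]
    print(nms_top_k(boxes, scores))
```

Lowering `top_k` at test time (for example to 50) reduces the number of ROIs the detector has to process, which is the speed and accuracy tradeoff discussed in the paragraph above.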
IV. TRAINING
A. Multitask Loss Function
The defect detection task can be divided into two subtasks,
hence DDN has two output layers. The cls layer outputs a
discrete probability distribution, $k = (k_0, \ldots, k_C)$, for each
ROI over C + 1 categories (C defect categories plus one
background category). As usual, k is computed by a softmax
function. The cls loss $L_{cls}$ is a log loss over two classes (defect
or not defect), $L_{cls}(k, k^*) = -\log k_{k^*}$, where $k^*$ is the
groundtruth class. The loc layer outputs bounding box regression
offsets, $t = (t_x, t_y, t_w, t_h)$, for each of the C defect categories.
As in [28], the loc loss $L_{loc}$ is a smooth L1 loss function,
$L_{loc}(t, t^*) = \mathrm{SmoothL1}(t - t^*)$, where $t^*$ is the
groundtruth box associated with a positive sample. For bounding box
regression, we adopt the parameterizations of $t$ and $t^*$ given in [27]

$$
\begin{aligned}
t_x &= (x - x_a)/w_a, & t_y &= (y - y_a)/h_a \\
t_w &= \log(w/w_a), & t_h &= \log(h/h_a) \\
t_x^* &= (x^* - x_a)/w_a, & t_y^* &= (y^* - y_a)/h_a \\
t_w^* &= \log(w^*/w_a), & t_h^* &= \log(h^*/h_a)
\end{aligned}
\tag{1}
$$

where the subscripts x, y, w, and h denote each box’s center
coordinates and its width and height. The variables $x$, $x_a$, and
$x^*$ represent the predicted box, anchor box, and
groundtruth box, respectively (the same rules apply for y, w, and h).
With these definitions, we minimize a multitask loss function,
which is defined as

$$
L(k, k^*, t, t^*) = L_{cls}(k, k^*) + \lambda p^* L_{loc}(t, t^*) \tag{2}
$$
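To make Eqs. (1) and (2) concrete, the sketch below encodes a box against an anchor and evaluates the multitask loss for one ROI. It is an illustrative example under several assumptions that are not stated in this section: smooth L1 follows the usual definition from [28] (0.5 d² for |d| < 1, |d| − 0.5 otherwise), class index 0 is taken to be the background, the factor multiplying the loc loss is treated as an indicator that equals 1 for a positive (non-background) sample and 0 otherwise, and λ defaults to 1. All function names are hypothetical.

```python
import math


def encode_box(box, anchor):
    """Eq. (1): offsets of a box relative to an anchor.

    Both boxes are (center_x, center_y, width, height).
    """
    x, y, w, h = box
    xa, ya, wa, ha = anchor
    return ((x - xa) / wa, (y - ya) / ha,
            math.log(w / wa), math.log(h / ha))


def smooth_l1(d):
    """Smooth L1 penalty of a scalar difference (assumed standard form)."""
    a = abs(d)
    return 0.5 * d * d if a < 1.0 else a - 0.5


def multitask_loss(k, k_star, t, t_star, lam=1.0):
    """Eq. (2) for one ROI.

    k      : softmax probabilities over C + 1 classes (index 0 = background,
             an assumption of this sketch).
    k_star : groundtruth class index.
    t      : predicted offsets (t_x, t_y, t_w, t_h) for the groundtruth class.
    t_star : groundtruth offsets from encode_box.
    lam    : balancing weight lambda (1.0 is an assumed default).
    """
    cls_loss = -math.log(k[k_star])
    # Assumed indicator: only positive samples contribute a loc loss.
    p_star = 1.0 if k_star > 0 else 0.0
    loc_loss = sum(smooth_l1(a - b) for a, b in zip(t, t_star))
    return cls_loss + lam * p_star * loc_loss


if __name__ == "__main__":
    anchor = (100.0, 100.0, 128.0, 128.0)
    gt_box = (110.0, 96.0, 160.0, 100.0)
    t_star = encode_box(gt_box, anchor)
    pred = (0.0, 0.0, 0.0, 0.0)  # a prediction that leaves the anchor as is
    print(t_star)
    print(multitask_loss([0.2, 0.7, 0.1], 1, pred, t_star))
```

With this form, a background ROI is penalized only through the classification term, so the regressor is trained solely on ROIs matched to a groundtruth defect.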
