
Comparative Validation of Polyp Detection Methods
in Video Colonoscopy: Results from the MICCAI
2015 Endoscopic Vision Challenge
Jorge Bernal∗, Nima Tajbakhsh∗, F. Javier Sánchez, Bogdan J. Matuszewski, Hao Chen, Lequan Yu, Quentin Angermann, Olivier Romain, Bjørn Rustad, Ilangko Balasingham, Konstantin Pogorelov, Sungbin Choi, Quentin Debard, Lena Maier-Hein, Stefanie Speidel, Danail Stoyanov, Patrick Brandao, Henry Córdova, Cristina Sánchez-Montes, Suryakanth R. Gurudu, Gloria Fernández-Esparrach, Xavier Dray, Jianming Liang+, Aymeric Histace+
Abstract—Colonoscopy is the gold standard for colon cancer screening, though some polyps are still missed, thus preventing early disease detection and treatment. Several computational systems have been proposed to assist polyp detection during colonoscopy, but so far without consistent evaluation. The lack of publicly available annotated databases has made it difficult to compare methods and to assess whether they achieve performance levels acceptable for clinical use. The Automatic Polyp Detection sub-challenge, conducted as part of the Endoscopic Vision Challenge (http://endovis.grand-challenge.org) at the international conference on Medical Image Computing and Computer Assisted Intervention (MICCAI) in 2015, was an effort to address this need. In this paper, we report the results of this comparative evaluation of polyp detection methods, and describe additional experiments to further explore differences between methods. We define performance metrics and provide evaluation databases that allow comparison of multiple methodologies. Results show that convolutional neural networks (CNNs) are the state of the art. Nevertheless, it is also demonstrated that combining different methodologies can lead to an improved overall performance.
Index Terms—Endoscopic vision, Polyp detection, Hand-crafted features, Machine learning, Validation framework
∗These authors contributed equally to this work. +The last position is shared between these authors.
Jorge Bernal and F. Javier Sánchez are with Computer Science Department
at Universitat Autònoma de Barcelona and Computer Vision Center, Spain
Nima Tajbakhsh and Jianming Liang are with Arizona State Univ., USA
Aymeric Histace, Quentin Angermann, Olivier Romain and Xavier Dray are
with ETIS, ENSEA, Univ. of Cergy-Pontoise, CNRS, Cergy, France. Xavier
Dray is also with Lariboisière Hospital-APHP, France.
Hao Chen and Lequan Yu are with Dpt of Computer Science and Engi-
neering, Chinese University of Hong Kong, China
Bjørn Rustad and Ilangko Balasingham are with Oslo University Hospital.
Bjørn Rustad is also with OmniVision, University of Oslo, Norway.
Konstantin Pogorelov is with Media Performance Group, Simula Research
Laboratory and University of Oslo, Norway
Sungbin Choi is with Seoul National University, Seoul, South Korea
Bogdan J. Matuszewski is with School of Engineering, University of Central
Lancashire, Preston, United Kingdom
Quentin Debard is with University of Nice-Sophia Antipolis, Nice, France
Lena Maier-Hein is with the junior group Computer-assisted Interventions,
German Cancer Research Center (DKFZ), Germany
Stefanie Speidel is with the Institute for Anthropomatics, Karlsruhe Institute
of Technology, Germany
Danail Stoyanov and Patrick Brandao are with the Centre for Medical Image
Computing and Dept. of Computer Science, Univ. College London, UK
Henry Córdova, Cristina Sánchez-Montes and Gloria Fernández-Esparrach
are with Endoscopy Unit, Gastroenterology Department, Hospital Clínic,
IDIBAPS, CIBEREHD, University of Barcelona, Barcelona, Spain
Suryakanth R. Gurudu is with Division of Gastroenterology and Hepatol-
ogy, Mayo Clinic, Scottsdale, Arizona, USA
Copyright (c) 2010 IEEE. Personal use of this material is permitted.
However, permission to use this material for any other purposes must be
obtained from the IEEE by sending a request to pubs-permissions@ieee.org.
I. INTRODUCTION
This paper introduces the results and main conclusions
of the MICCAI 2015 Sub-Challenge on Automatic Polyp
Detection in Colonoscopy, conducted as part of the Endoscopic
Vision Challenge (http://endovis.grand-challenge.org). More
precisely, we present a validation study comparing the performance of the polyp detection methods proposed by the participating teams, which cover a range of methodologies, and provide an in-depth analysis of their detection yield. In this section, we
introduce both clinical and technical contexts.
A. Clinical context
Colorectal cancer (CRC) is the third leading cause of cancer deaths in the United States among men and women, and it is expected to have resulted in about 49,196 deaths in 2016 in the USA [1]. CRC arises from adenomatous polyps (or adenomas), which are growths of glandular tissue originating from the colonic mucosa. Though adenomas are initially benign, they might become malignant over time and spread to adjacent and distant organs such as lymph nodes, liver or lungs, being ultimately responsible for complications and death [2].
CRC prevention is first based on the detection of at-risk
patients: those with symptoms (such as hematochezia and
anemia), those with positive screening tests (such as a fecal
occult blood test or a fecal immunochemical test), and those
with a past history of adenoma or with a family history of
advanced adenoma or CRC. In these groups of patients, a
colonoscopy is proposed to detect polyps before any malignant
transformation or at an early cancer stage. This stage refers
to the most superficial colon layers, with no deep invasion,
and it is associated with a 5-year survival rate over 90% [3],
[1]. If any polyp found is characterized as a likely adenoma,
its removal should be considered to confirm the diagnosis, to
set its histological stage and to confirm its complete removal,
giving clinicians clues to determine the need and timing of the
next colonoscopy [4].
Though colonoscopy is the gold standard for colon screening, other alternatives, such as CT colonography [5] or wireless capsule endoscopy (WCE) [6], are also used to search for polyps. They are less invasive to patients and do not present perforation risk, although, like colonoscopy, they require bowel preparation. Nevertheless, in these cases, if a polyp is found, a colonoscopy must still be considered to remove the suspicious lesion. These alternatives have specific limitations that may affect the outcome of the screening. For instance, CT colonography has a low detection rate for small lesions (5 mm or less) due to resolution constraints [7], and it implies the use of ionising radiation. WCE can detect all kinds of lesions, but their observation depends on whether they are recorded during the progress of the camera through the gastrointestinal tract. Moreover, its diagnostic yield is highly dependent on the cleanliness of the colon (whereas colonoscopy has some in-situ lavage capabilities). Last but not least, the analysis of the information provided by WCE can be highly time-consuming, as the recorded videos can last up to 8 hours [8].
Colonoscopy presents some drawbacks, the polyp miss-rate being the most important among them. Colonoscopy rarely misses polyps bigger than 10 mm, but the miss-rate increases significantly for smaller and/or flat polyps [9], [10]. It should also be noted that colonoscopies are seldom recorded, so a new procedure must be performed to revisit explored areas. The outcome of the colonoscopy exploration depends on: 1) bowel preparation [11]; 2) the specific choice of endoscope and video processor, which affects image quality and may prevent the use of certain image enhancing tools; 3) clinicians' skills, as both the endoscopist's experience and his/her actual concentration during the intervention may influence the degree of procedure completion (reaching the cecum or not) and the percentage of the colon that has been explored [12], [13]; and 4) patient-specific issues: due to colon movements and the appearance of folds and angulations during the exploration, some parts of the colon which may potentially present polyps may not be reached [9]. Moreover, a patient's personal and family history can increase the risk of having a polyp and, in this case, the exploration should be even more thorough.
B. Technical strategies to improve polyp detection rate
Apart from the continuous improvement of clinicians’ skills
through training programs and practice [14], technical efforts
are being undertaken to improve colonoscopy’s outcome. We
clustered them into two groups: improvement of devices and
the development of computational support systems.
Amongst the device improvements, the following should be
highlighted: 1) increase in image resolution and, consequently,
textural information; 2) the use of wide-angle cameras showing
more colon wall surface; 3) the development of zooming and
magnification techniques [15] and 4) the development of new
imaging methodologies such as autofluorescence imaging [16]
or virtual chromoendoscopy (Olympus’ Narrow Band Imaging
[17], Fujinon's FICE [18] or Pentax's i-Scan [19]). This last group of techniques modifies how the scene is observed by improving the contrast of endoluminal scene elements, which may help in lesion detection and also with in-vivo lesion diagnosis thanks to the enhanced visualization of lesion tissues
[20]. These advances have fostered the cooperation between
clinicians and computer scientists in the development and
validation of computer-aided support systems for colonoscopy,
aimed at helping clinicians in all stages of CRC diagnosis. A significant part of this effort has been focused on computer-assisted polyp detection. As indicated in [21], cooperation
between technologists and clinicians is essential to develop
clinically useful solutions, with both these groups understand-
ing challenges and limitations in their respective domains.
Automatic polyp detection in colonoscopy videos has been
an active research topic during the last 20 years and several
approaches have been proposed. We present a review of the
most relevant methods in Section II but, to the best of our
knowledge, none of them has been adopted for a routine
patient treatment. There might be several reasons behind this. First of all, in order for a given method to be clinically useful, it has to meet real-time constraints; e.g., for videos acquired at 25 frames per second (fps), the maximum time available to process each image frame should be under 40 ms. Secondly, some of them are built from a theoretical model of polyp appearance [14], [22] and are therefore limited to only certain polyp morphologies, which may not translate to the actual scene, where polyp appearance varies greatly. Thirdly, the majority of methods are mainly focused on the polyps and do not consider the presence of other elements such as folds, blood vessels or the lumen that can affect a method's performance [14]. Last but not least, some of these methods have been trained and tested only on selected good-quality still image frames. The lack of temporal coherence and the great variability in polyp appearance due to camera progression and visibility conditions might impact their performance in full-sequence analysis, as they might cause instability in their response to similar stimuli.
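The 40 ms figure follows directly from the frame rate: the per-frame budget is simply the reciprocal of the acquisition rate. A minimal sketch of this bookkeeping (the helper names are ours, not from the paper):

```python
def per_frame_budget_ms(fps: float) -> float:
    """Maximum time (ms) available to process one frame in real time."""
    return 1000.0 / fps

def meets_real_time(processing_ms: float, fps: float = 25.0) -> bool:
    """True if a method's per-frame processing time fits the budget."""
    return processing_ms <= per_frame_budget_ms(fps)

# At 25 fps each frame must be processed in under 40 ms.
print(per_frame_budget_ms(25.0))   # -> 40.0
print(meets_real_time(10.0))       # a 10 ms/frame method is real-time capable
print(meets_real_time(200.0))      # a 200 ms/frame method is not
```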
Computational methods also have to deal with additional colonoscopy-specific challenges. For instance, they should consider the impact of image artifacts generated by scene illumination (specular highlights, overexposed regions) or by the specific configuration of the video processor attached to the colonoscope, which might overlay information on the scene view. These artifacts, apart from altering the view of the scene, might not be stable across consecutive frames, so methods should compensate for their impact both on single-frame polyp detection and on tracking in full-sequence analysis. Additionally, though an effort is made to ensure an adequate bowel preparation, some particles may still appear which, in some cases, could lead to false detections when isolated, or to occlusions leading to missed detections or localization errors. As mentioned before, these methods have to cope with a great degree of variability in polyp appearance, which depends on illumination conditions, camera position and on clinicians' skills when progressing through the colon. Finally, available methods have typically been validated on small and restricted databases, under specific endoscope device conditions (brand and resolution), in some cases even covering only one specific polyp type, shape or morphology, which hinders assessment of their actual performance in a more generic setting.
C. Motivation of the comparison study
Unfortunately, the lack of a common validation framework,
which is a frequent problem in medical and endoscopy image
analysis [21], has limited the effectiveness of the comparison
between existing approaches, making it difficult to determine
which of them could have actual advantage in clinical use.

To cope with this, efforts have been made on publishing fully
annotated databases [14], [22] and on organizing challenges
as part of international conferences (ISBI, MICCAI), which
offer a basis to discuss validation strategies.
Considering this, and taking inspiration from recent works on quantitative comparative analysis of methods in areas such as laparoscopic 3D surface reconstruction [23] or liver segmentation [24], we present in this paper a complete validation
study of polyp detection methods performed as part of the
2015 MICCAI sub-challenge on Automatic Polyp Detection.
This sub-challenge was organized jointly by three research
teams: 1) Computer Vision Center/Universitat Autònoma de
Barcelona and Hospital Clinic from Barcelona, Spain (CVC-
CLINIC); 2) ETIS Lab (ENSEA/CNRS/University of Cergy-
Pontoise) and Lariboisière Hospital-APHP at Paris, France
(ETIS-LARIB), and 3) Arizona State University and Mayo
Clinic, USA (ASU-Mayo).
The objective of this paper is to present a comparative study of polyp detection methods under a newly proposed validation framework. This validation framework was first introduced as part of the MICCAI 2015 Sub-Challenge on Automatic Polyp Detection in Colonoscopy, and we present in this paper the results of that sub-challenge. Beyond this, we also propose additional experiments to assess in greater depth the performance of an automatic polyp detection method. These new experiments focus on exploring the actual clinical applicability of a given method by assessing to what extent it is affected by some of the technical and clinical challenges reported in the literature, and whether it incorporates temporal coherence features. Finally, we also go beyond the individual analysis of methods and propose combination strategies in order to study whether combining methods may lead to improved overall performance.
The remainder of the paper is structured as follows: In
Section II we present the methods proposed by each of the
participating teams in the challenge, including them in the
context of existing published methods. In Section III we
describe the complete validation framework. Results from the
comparative study are presented in Section IV. Section V
provides an in-depth analysis of the results and discusses
some topics related to challenge organization. Finally, the
concluding remarks are drawn in Section VI.
II. AUTOMATIC POLYP DETECTION METHODS
A. Historical review of computational polyp detection methods
After analyzing approaches reported in the literature, we propose to cluster methods into three groups: 1) hand-crafted; 2) end-to-end learning; and 3) hybrid approaches. This taxonomy reflects the historical trends in polyp detection methods: in the early 2000s, the majority of methods used a given texture descriptor to guide a classification method but, subsequently, some researchers opted for hand-crafted features, aiming at a real-time implementation. As technology evolved and computational capabilities increased, techniques such as neural networks that had been developed in the past and abandoned due to excessive computational cost have now resurfaced.
Regarding hand-crafted methods, the majority are based
on exploiting low-level image processing methods to obtain
candidate polyp boundaries (using Hessian filters in the work
of Iwahori et al. [25], intensity valleys in the work of Bernal et
al. [14] or the Hough transform in the work of Silva et al. [26]) and then use the resulting information to define cues unique to polyps.
For instance, the work of Zhu et al. [27] analyzes curvatures
of detected boundaries whereas the method of Kang et al. [28]
is focused on searching ellipsoidal shapes typically associated
with polyps. Finally, the method of Hwang et al. [29] combines
curvature analysis and shape fitting in their strategy.
Concerning end-to-end learning, texture and color information were formerly used as descriptors, as in the work of Karkanis et al. [30], which proposed the use of color wavelets, the work of Ameling et al. [31], which exploits co-occurrence matrices, or the work of Gross et al. [32], which proposed the use of local binary patterns. Active learning methodologies have also been introduced, as in the work of Angermann et al. [33], to improve the tradeoff between performance and computation time. Some of the most recent
methods use deep learning tools to aid in polyp detection tasks,
as in the work of Park et al. [34] or in the work of Ribeiro et
al. [35]. In these very recent developments, differences among
methods are based on the selection of a specific network
architecture and databases used for training.
Finally, there are several hybrid methods which combine both methodologies for polyp detection, such as the work of Tajbakhsh et al. [22], which combines edge detection and feature extraction to boost detection accuracy; the work of Bae et al. [36], which proposes a system based on imbalanced learning and discriminative feature learning; the work of Silva et al. [26], which uses hand-crafted features to filter non-informative image regions; and the work of Ševo et al. [37], which combines edge density and convolutional networks.
As mentioned in Section I, the great majority of the methods
are tested on private databases though we can observe that
more recent publications such as the work of Park et al. [34]
or the work of Ribeiro et al. [35] have started to use publicly
available databases such as the ones used in the MICCAI
2015 Sub-challenge on Automatic Polyp Detection. Related
to this, apart from new proposals, some of the referenced
methods have been adopted by participants, such as the
works of Bernal et al. [14], Silva et al. [26] or the work of Tajbakhsh et al. [22]. We provide in the next subsection a brief
description of participating methods highlighting their most
relevant contributions to the field. We grouped the methods
following the taxonomy defined earlier in this subsection.
B. MICCAI 2015 Polyp Detection Sub-challenge methods
1) Hand-crafted features:
CVC-CLINIC: This method [14] is based on a model of appearance that considers polyps as protruding surfaces whose boundaries are defined by intensity-valley detection. The proposal includes a preprocessing stage to mitigate the impact of other valley-rich structures (blood vessels, specular highlights). To build the final energy maps highlighting polyp presence, four different constraints (continuity, completeness, concavity, and robustness against spurious structures) are imposed on candidate boundaries to differentiate polyps from other structures.

TABLE I
SUMMARY OF INFORMATION FROM THE TEAMS THAT TOOK PART IN THE MICCAI 2015 CHALLENGE ON AUTOMATIC POLYP DETECTION.

| Team acronym | Full team details | Methodology | Published | Still-frame analysis | Video analysis | Training (seconds) | Testing (seconds) | System tested |
|---|---|---|---|---|---|---|---|---|
| ASU | Arizona State University (USA) | Hybrid | Yes [22] | No | Yes | N/A | 2.7 | 2.4 GHz Intel quad-core processor and an NVIDIA GeForce GTX 760 video card |
| CUMED | Department of Computer Science and Engineering, Chinese University of Hong Kong (China) | End-to-end learning (CNNs) | No | Yes | Yes | 10800 | 0.2 | Standard PC with a 2.50 GHz Intel Xeon E5-1620 CPU and an NVIDIA GeForce GTX Titan X GPU |
| CVC-CLINIC | Computer Vision Center and Universitat Autònoma de Barcelona (Spain) | Hand-crafted | Yes [14] | Yes | Yes | N/A | 10 | Intel Core i7-4790 at 3.6 GHz |
| ETIS-LARIB | ETIS, ENSEA, University of Cergy-Pontoise, CNRS, Cergy (France) | Hybrid | Yes [26] | Yes | No | 196 | 2.14 | Intel i5-4200U at 2.30 GHz |
| OUS | Oslo University Hospital (OUS) and University of Oslo (Norway) | End-to-end learning (CNNs) | No | Yes | Yes | 86400 | 5 | Intel i5, 4 cores at 2.8 GHz, 4 GB RAM; graphics card with 4 GB memory used for training |
| PLS | Polyp Localize and Spot Team, Media Performance Group, Simula Research Laboratory and University of Oslo (Norway) | Hybrid | No | Yes | Yes | 0.33 per image | 0.145 | 2x Intel Xeon E5-2650 at 2.00 GHz, 64 GB RAM, NVIDIA GK110 GeForce GTX TITAN |
| SNU | Seoul National University, Seoul (South Korea) | End-to-end learning (CNNs) | No | Yes | Yes | 360 | 0.8-1 | NVIDIA TITAN X GPU |
| UNS-UCLAN | School of Engineering, University of Central Lancashire, Preston (UK) and University of Nice-Sophia Antipolis, Nice (France) | End-to-end learning (CNNs) | No | Yes | No | 18000 | 5 | i7-5930K @ 3.5 GHz (6 cores), 64 GB RAM, NVIDIA GeForce GTX TITAN X |
2) End-to-end learning:
CUMED: The architecture of the proposed network comprises two parts: a downsampling path and an upsampling path [38]. The former contains convolutional and max-pooling layers while the latter contains convolutional and upsampling layers, which increase the resolution of feature maps and output prediction masks. To alleviate the problem of vanishing gradients and encourage the back-propagation of gradient flow in deep neural networks, auxiliary classifiers are injected to train the network. Furthermore, they can serve as regularization to reduce over-fitting and improve the discriminative capability of features in intermediate layers [39], [40]. The classification layer, after fusing multi-level contextual information, produces the detection results. Network training is formulated as a pixel-wise classification problem with respect to ground-truth masks. The highlight of this approach is that it explores multi-level feature representations with fully convolutional networks in an end-to-end way, taking an image as input and directly producing the score map. In addition, feature-rich hierarchies from a large-scale auxiliary dataset are transferred into the model to reduce over-fitting and further boost detection performance [41].
UNS-UCLAN: This method, inspired by the works reported in [42], [43], [44], uses three CNNs trained at different image scales, namely 1, 0.5, and 0.25 of the original training images. For all scales the CNNs use the same architecture, but they are trained independently on the RGB images at their corresponding scale. After this initial training phase, the last fully connected part of each CNN is removed and the outputs from the 'convolutional part' of all three networks are fed as input to a single Multi-Layer Perceptron (MLP) network. This additional network is trained independently from the three CNNs. In this approach, CNNs are used as feature extraction engines operating at different spatial scales, and the MLP performs the classification based on these features.
The method's output is a polyp incidence probability map, which is then processed to locate dominant probability peaks; peak locations and probability values are returned as the final output of the system. The training was performed exclusively on the CVC-CLINIC database.
OUS: This method is based on the popular AlexNet model [44] for CNNs and its slight modification CaffeNet, which is pre-trained on the ILSVRC 2012 [45] dataset. Computations are carried out using the Caffe library [46]. The original model is modified to take input patches of size 96 × 96, the kernel size of the first two pooling layers is decreased from 3 to 2, and the last pooling layer is removed. The output layer is modified to give two outputs: polyp or non-polyp. In order to increase the number of training examples, data augmentation is performed in the form of random mirroring, rotation, up- and down-scaling, cropping, and brightness adjustment. Final polyp presence or absence is determined using a sliding-window strategy, with three scales for still-frame analysis and two for full video sequence analysis.
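The sliding-window strategy above can be sketched as follows. The CNN is replaced here by a toy brightness-based classifier, and the patch size, stride and threshold are illustrative assumptions rather than the OUS team's actual settings:

```python
import numpy as np

def sliding_window_scan(frame, classifier, patch=96, stride=48, thresh=0.5):
    """Scan a frame with fixed-size patches and report polyp presence.

    `classifier` maps a (patch, patch) array to a polyp probability in [0, 1];
    it stands in for the fine-tuned CNN described in the text. Returns
    (presence_flag, list of (row, col, score) for positive patches).
    """
    h, w = frame.shape
    hits = []
    for r in range(0, h - patch + 1, stride):
        for c in range(0, w - patch + 1, stride):
            score = classifier(frame[r:r + patch, c:c + patch])
            if score >= thresh:
                hits.append((r, c, score))
    return bool(hits), hits

# Toy classifier: flags patches with enough bright pixels (a real system
# would run the CNN on each patch instead).
bright = lambda p: float(p.mean() > 0.3)

frame = np.zeros((288, 384))
frame[100:180, 200:280] = 1.0  # bright blob standing in for a polyp
found, hits = sliding_window_scan(frame, bright)
```

Running the same scan at two or three image scales, as the OUS method does, amounts to repeating this loop on resized copies of the frame.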
SNU: This methodology proposes a two-step approach: detection and localization. For both steps, CNNs are used. Starting from GoogLeNet (pre-trained on the ImageNet dataset), the CNN was fine-tuned. Input images are resized to 224 × 224 pixels prior to training, and data augmentation is performed using several degrees of random rotation and scaling. Detection is treated as a simple binary classification task whereas, for localization, CNNs are applied to polyp-positive images, which are segmented into a uniform 8 × 8 grid (64 cells per image). Then, for each image, one cell at a time is overlaid in black and the CNN is applied to perform the binary classification task. The 64 overlaid grid images are then sorted by classification score to calculate the final polyp position.
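The grid-occlusion step can be sketched as follows; the CNN is replaced by a toy brightness score, and the helper name is ours:

```python
import numpy as np

def occlusion_localize(image, classifier, grid=8):
    """Localize a polyp by masking one grid cell at a time (SNU-style sketch).

    `classifier` returns a polyp probability for the whole image; it stands in
    for the fine-tuned CNN. The cell whose masking lowers the score the most
    is taken as the polyp position. Returns (row_cell, col_cell) indices.
    """
    h, w = image.shape
    ch, cw = h // grid, w // grid
    base = classifier(image)
    drops = np.zeros((grid, grid))
    for i in range(grid):
        for j in range(grid):
            masked = image.copy()
            masked[i * ch:(i + 1) * ch, j * cw:(j + 1) * cw] = 0.0  # overlay cell in black
            drops[i, j] = base - classifier(masked)
    return np.unravel_index(np.argmax(drops), drops.shape)

# Toy classifier: score proportional to total brightness.
score = lambda img: img.sum() / img.size

img = np.zeros((224, 224))
img[84:112, 140:168] = 1.0            # bright patch standing in for a polyp
cell = occlusion_localize(img, score)  # cell containing the bright patch
```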
3) Hybrid approaches:
PLS: The proposed full localization scheme consists of two parts: detection and localization. Regarding detection, two sets of images, one containing polyps and the other without polyps, are used for training. Global image features [47] are used as they are easy and fast to calculate. Based on similarity scores between the input frame and the training frames, and on the resulting ranks, the detection subsystem decides in real time to which class (polyp or no polyp) the input frame belongs.
The localization scheme is implemented as a sequence of preprocessing filters (RGB to YCbCr color space conversion, removal of borders and sub-images, flare masking and low-pass filtering) and uses the polyp's physical shape to find its exact position, approximating polyps by elliptical regions presenting local features that differentiate them from surrounding tissues. The final decision regarding polyp location is taken from the maximum values in an energy map computed using the elliptical shape of the polyp's usual appearance. Finally, the method outputs four possible locations per frame.
ETIS-LARIB: This method [26] is inspired by the psycho-visual methodology used by clinicians when performing an endoscopic examination. First, a detection of the Regions of Interest (ROIs) that may contain a polyp is performed using shape and size image features. This pre-selection allows a fast first scan of the image. Since circular/elliptical shapes are typically associated with polyps, a Hough transform is used for this first filtering stage. Once ROIs are detected, a second analysis, based on texture, is performed in order to discard those ROIs with no actual polyp content. To achieve this, an ad-hoc classifier based on a boosting-based learning process using texture features computed from co-occurrence matrices (standard Haralick features) is proposed.
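The first-stage Hough filtering can be illustrated with a crude circular Hough voting scheme over edge points. This is a sketch of the general idea only; the team's actual implementation, radii and sampling are not specified here and the helper name is ours:

```python
import numpy as np

def hough_circle_votes(edge_points, shape, radii):
    """Crude circular Hough voting (sketch of the ROI pre-selection idea).

    Each edge point votes for possible circle centres at every tested radius;
    the accumulator peak suggests a circular region worth a texture analysis.
    `edge_points` is a list of (row, col); returns (row, col, radius) of peak.
    """
    acc = np.zeros((len(radii), shape[0], shape[1]))
    thetas = np.linspace(0.0, 2.0 * np.pi, 64, endpoint=False)
    for r_idx, rad in enumerate(radii):
        for (er, ec) in edge_points:
            cr = np.round(er - rad * np.sin(thetas)).astype(int)
            cc = np.round(ec - rad * np.cos(thetas)).astype(int)
            ok = (cr >= 0) & (cr < shape[0]) & (cc >= 0) & (cc < shape[1])
            np.add.at(acc[r_idx], (cr[ok], cc[ok]), 1)
    r_idx, row, col = np.unravel_index(np.argmax(acc), acc.shape)
    return row, col, radii[r_idx]

# Synthetic edge map: a circle of radius 20 centred at (50, 60).
angles = np.linspace(0.0, 2.0 * np.pi, 120, endpoint=False)
pts = [(int(round(50 + 20 * np.sin(a))), int(round(60 + 20 * np.cos(a))))
       for a in angles]
row, col, rad = hough_circle_votes(pts, (100, 120), [15, 20, 25])
```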
ASU: This method [22] consists of two stages. In the first stage, a set of polyp candidates is generated using geometric features. Specifically, given a colonoscopy frame, a crude set of edge pixels is first obtained. This edge map is then refined using a classification and feature extraction scheme [48]. The goal of the edge classification scheme is to remove as many non-polyp boundary pixels as possible from the initial edge map. The geometry of the retained edges is then used in a voting scheme that localizes polyp candidates as objects with curved boundaries in the refined edge maps. The voting scheme further estimates a bounding box for each generated candidate based on the generated voting map. In the second stage, an ensemble of CNNs, each specialized in one type of feature, is applied to each candidate bounding box [49]. Finally, the outputs of the CNNs are averaged to generate a confidence score for a given polyp candidate.
Table I shows a summary of the different methods participating in the MICCAI 2015 Challenge on Automatic Polyp Detection. As each method was tested under different conditions, computation times are given to complete the information on the training and testing processes.
III. VALIDATION STUDY
We introduce in this section the complete validation study
proposed to assess and compare the performance of different
polyp detection methods.
A. Definitions and general performance metrics
We define polyp detection as the capability of a given method to determine polyp presence in a colonoscopy frame (polyp presence detection) and, once presence is determined, to provide the location of the polyp within the image (polyp localization). Consequently, a good polyp detection method should select the images (video frames) containing polyps and ignore all others, and it should indicate the position of all polyps present in an image. The terms defined next are key to setting the performance metrics. As we deal with images from real patient examinations, we encounter two different cases: images with polyps and images without polyps.
In the first case, if a detection output falls within the polyp, the method is said to provide a True Positive (TP), or correct alarm. It has to be noted that only one TP is counted per polyp, no matter how many detections fall within the polyp. Any detection that falls outside the polyp is considered a False Positive (FP), or false alarm. The absence of an alarm in an image with a polyp is counted as a False Negative (FN), one per each polyp in the image that has not been detected. Regarding images without polyps, we count a True Negative (TN) whenever the method does not provide any output for that particular image. Any detection provided for a frame without a polyp counts as a False Positive (FP). Considering these definitions, we propose the use of the frame-based performance metrics presented in Table II.
TABLE II
PERFORMANCE METRICS FOR POLYP DETECTION.

Metric        Abbreviation   Calculation
Precision     Prec           Prec = TP / (TP + FP)
Recall        Rec            Rec = TP / (TP + FN)
Specificity   Spec           Spec = TN / (TN + FP)
F1-measure    F1             F1 = (2 × Prec × Rec) / (Prec + Rec)
F2-measure    F2             F2 = (5 × Prec × Rec) / (4 × Prec + Rec)
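The metrics in Table II can be computed from the aggregated counts as follows; the division-by-zero guards are an implementation choice of this sketch, not part of the definitions.

```python
def metrics(tp, fp, fn, tn):
    """Compute the Table II metrics from aggregated TP/FP/FN/TN counts."""
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    # F1 balances precision and recall; F2 weights recall higher (beta = 2)
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    f2 = 5 * prec * rec / (4 * prec + rec) if 4 * prec + rec else 0.0
    return {"Prec": prec, "Rec": rec, "Spec": spec, "F1": f1, "F2": f2}
```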
Frequently Asked Questions (15)
Q1. What contributions have the authors mentioned in the paper "Comparative validation of polyp detection methods in video colonoscopy: results from the miccai 2015 endoscopic vision challenge" ?

In this paper, the authors report the results of this comparative evaluation of polyp detection methods, as well as describe additional experiments to further explore differences between methods. The authors define performance metrics and provide evaluation databases that allow comparison of multiple methodologies. 

More precisely, future studies should tackle some of the issues detected such as the variability in source data resolution and size and should aim to cover all different polyp morphological types. This may result in, apart from a more complete analysis, a deeper understanding on how each method works and in which scenarios each of them show the most benefit, thinking of potential optimized combinations of them to finally build up a clinically useful method. 

Considering the scope of the analysis presented in the paper, the metric that will be used to compare different methods will be F1-score, as it presents a balance between missed polyps and false alarms. 

The most straightforward conclusion from this experiment is that image quality matters, as methods' performance decreases when only bad-quality images are considered. 

The main result of this comparative study is that methods including some degree of machine learning outperform classic hand-crafted methods, especially regarding specificity scores in non-polyp videos. 

The lack of temporal coherence and the great variability in polyp appearance due to camera progression and visibility conditions might impact their performance in the full-sequence analysis, as they might cause instability in their response against similar stimuli. 

The analysis of sequences without polyp frames shows that PLS offers the best performance, which is possibly due to the presence of a specific polyp presence module in this approach. 
There are some image challenges that generally seem to make the detection of polyp frames difficult, such as the presence of overlay information and overexposed regions, with the latter being more prevalent in the explored images. 

Teams could also provide a confidence value (between 0 and 1) for the purpose of drawing performance curves, though this was not mandatory. 

The main feature that a clinically applicable system should have is that it should detect all polyps regardless of their appearance (high detection rate (DR), measured as the percentage of polyps detected in at least one frame out of the total number of polyps present in the testing videos). 
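The detection rate described above can be sketched as follows, assuming a hypothetical mapping from each frame to the set of polyp identifiers that received a true positive in it; this is an illustration, not the challenge's evaluation code.

```python
def detection_rate(frame_hits, all_polyp_ids):
    """Percentage of distinct polyps detected in at least one frame.

    frame_hits    -- dict mapping frame id -> iterable of polyp ids with a TP
    all_polyp_ids -- all ground-truth polyp ids in the testing videos
    """
    detected = set()
    for hits in frame_hits.values():
        detected |= set(hits)                # a single hit anywhere counts
    return 100.0 * len(detected & set(all_polyp_ids)) / len(all_polyp_ids)
```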

The authors can also observe how methods tend to provide a higher number of false alarms for good quality images, which the authors interpret as a result of structures likely to be confused with polyps being better visually defined. 

With respect to polyp frames, the first conclusion to be extracted is that low visibility images and the presence of specular highlights within the polyp affect all methods in the same way. 

For the sake of statistical representativeness of the results, the authors did not perform the same experiment for ETIS-LARIB database due to its smaller size. 

To account for differences in performance related to polyp morphology the authors will use Precision, Recall and F1 scores as defined in Table II. 

In order to provide these curves for all teams, confidence values should have been provided; in this case, only one team per subcategory (UNS-UCLAN in still-frame analysis and ASU-Mayo for full video analysis) provided this information whereas the17rest only provided what the authors assume are results obtained using the best configuration of each particular method.