
Comparative Validation of Polyp Detection Methods
in Video Colonoscopy: Results from the MICCAI
2015 Endoscopic Vision Challenge
Jorge Bernal∗, Nima Tajbakhsh∗, F. Javier Sánchez, Bogdan J. Matuszewski, Hao Chen, Lequan Yu, Quentin Angermann, Olivier Romain, Bjørn Rustad, Ilangko Balasingham, Konstantin Pogorelov, Sungbin Choi, Quentin Debard, Lena Maier-Hein, Stefanie Speidel, Danail Stoyanov, Patrick Brandao, Henry Córdova, Cristina Sánchez-Montes, Suryakanth R. Gurudu, Gloria Fernández-Esparrach, Xavier Dray, Jianming Liang+, Aymeric Histace+
Abstract—Colonoscopy is the gold standard for colon cancer screening, though some polyps are still missed, thus preventing early disease detection and treatment. Several computational systems have been proposed to assist polyp detection during colonoscopy, but so far without consistent evaluation. The lack of publicly available annotated databases has made it difficult to compare methods and to assess whether they achieve performance levels acceptable for clinical use. The Automatic Polyp Detection sub-challenge, conducted as part of the Endoscopic Vision Challenge (http://endovis.grand-challenge.org) at the international conference on Medical Image Computing and Computer Assisted Intervention (MICCAI) in 2015, was an effort to address this need. In this paper, we report the results of this comparative evaluation of polyp detection methods, and describe additional experiments to further explore differences between methods. We define performance metrics and provide evaluation databases that allow comparison of multiple methodologies. Results show that convolutional neural networks (CNNs) are the state of the art. Nevertheless, it is also demonstrated that combining different methodologies can lead to an improved overall performance.
Index Terms—Endoscopic vision, Polyp detection, Hand-crafted features, Machine learning, Validation framework
∗These authors contributed equally to this work. +The last position is shared between these authors.
Jorge Bernal and F. Javier Sánchez are with Computer Science Department
at Universitat Autònoma de Barcelona and Computer Vision Center, Spain
Nima Tajbakhsh and Jianming Liang are with Arizona State Univ., USA
Aymeric Histace, Quentin Angermann, Olivier Romain and Xavier Dray are
with ETIS, ENSEA, Univ. of Cergy-Pontoise, CNRS, Cergy, France. Xavier
Dray is also with Lariboisière Hospital-APHP, France.
Hao Chen and Lequan Yu are with Dpt of Computer Science and Engi-
neering, Chinese University of Hong Kong, China
Bjørn Rustad and Ilangko Balasingham are with Oslo University Hospital.
Bjørn Rustad is also with OmniVision, University of Oslo, Norway.
Konstantin Pogorelov is with Media Performance Group, Simula Research
Laboratory and University of Oslo, Norway
Sungbin Choi is with Seoul National University, Seoul, South Korea
Bogdan J. Matuszewski is with School of Engineering, University of Central
Lancashire, Preston, United Kingdom
Quentin Debard is with University of Nice-Sophia Antipolis, Nice, France
Lena Maier-Hein is with the junior group Computer-assisted Interventions,
German Cancer Research Center (DKFZ), Germany
Stefanie Speidel is with the Institute for Anthropomatics, Karlsruhe Institute
of Technology, Germany
Danail Stoyanov and Patrick Brandao are with the Centre for Medical Image
Computing and Dept. of Computer Science, Univ. College London, UK
Henry Córdova, Cristina Sánchez-Montes and Gloria Fernández-Esparrach
are with Endoscopy Unit, Gastroenterology Department, Hospital Clínic,
IDIBAPS, CIBEREHD, University of Barcelona, Barcelona, Spain
Suryakanth R. Gurudu is with Division of Gastroenterology and Hepatol-
ogy, Mayo Clinic, Scottsdale, Arizona, USA
Copyright (c) 2010 IEEE. Personal use of this material is permitted.
However, permission to use this material for any other purposes must be
obtained from the IEEE by sending a request to pubs-permissions@ieee.org.
I. INTRODUCTION
This paper introduces the results and main conclusions
of the MICCAI 2015 Sub-Challenge on Automatic Polyp
Detection in Colonoscopy, conducted as part of the Endoscopic
Vision Challenge (http://endovis.grand-challenge.org). More
precisely, we present a validation study comparing the performance of the polyp detection methods proposed by the participating teams, which cover a range of methodologies, and provide an in-depth analysis of their detection yield. In this section, we
introduce both clinical and technical contexts.
A. Clinical context
Colorectal cancer (CRC) is the third leading cause of cancer deaths in the United States among men and women, and it is expected to have resulted in about 49,196 deaths in 2016 in the USA [1]. CRC arises from adenomatous polyps (or adenomas), which are growths of glandular tissue originating from the colonic mucosa. Though adenomas are initially benign, they might become malignant over time and spread to adjacent and distant organs such as lymph nodes, liver or lungs, being ultimately responsible for complications and death [2].
CRC prevention is first based on the detection of at-risk
patients: those with symptoms (such as hematochezia and
anemia), those with positive screening tests (such as a fecal
occult blood test or a fecal immunochemical test), and those
with a past history of adenoma or with a family history of
advanced adenoma or CRC. In these groups of patients, a
colonoscopy is proposed to detect polyps before any malignant
transformation or at an early cancer stage. This stage refers
to the most superficial colon layers, with no deep invasion,
and it is associated with a 5-year survival rate over 90% [3],
[1]. If any polyp found is characterized as a likely adenoma,
its removal should be considered to confirm the diagnosis, to
set its histological stage and to confirm its complete removal,
giving clinicians clues to determine the need and timing of the
next colonoscopy [4].
Though colonoscopy is the gold standard for colon screening, other alternatives, such as CT colonography [5] or wireless capsule endoscopy (WCE) [6], are also used to search for polyps. They are less invasive to patients and do not present perforation risk, although, like colonoscopy, they require bowel preparation. Nevertheless, in these cases, if a polyp is found, a colonoscopy must still be considered to remove the suspicious lesion. These alternatives have specific limitations that may affect the outcome of the screening. For instance, CT colonography has a low detection rate for small lesions (5 mm or less) due to resolution constraints [7], and it implies the use of ionising radiation. WCE can detect all kinds of lesions, but their observation depends on whether they are recorded during the progress of the camera through the gastrointestinal tract. Moreover, its diagnostic yield is highly dependent on the cleanliness of the colon (whereas colonoscopy has some in-situ lavage capabilities). Last but not least, the analysis of the information provided by WCE can be highly time-consuming, as the recorded videos can last up to 8 hours [8].
Colonoscopy presents some drawbacks, the polyp miss-rate being the most important among them. Colonoscopy rarely misses polyps bigger than 10 mm, but the miss-rate increases significantly for smaller and/or flat polyps [9], [10]. It should also be noted that colonoscopies are seldom recorded, so a new procedure must be performed to revisit explored areas. The outcome of the colonoscopy exploration depends on: 1) bowel preparation [11]; 2) the specific choice of endoscope and video processor, which affects image quality and may prevent the use of certain image enhancing tools; 3) clinicians' skills, as both the endoscopist's experience and his/her actual concentration during the intervention may influence the degree of procedure completion (reaching the cecum or not) and the percentage of the colon that has been explored [12], [13]; and 4) patient-specific issues: due to colon movements and the appearance of folds and angulations during the exploration, some parts of the colon which may potentially present polyps may not be reached [9]. Moreover, a patient's personal and family history can increase the risk of having a polyp and, in this case, the exploration should be even more thorough.
B. Technical strategies to improve polyp detection rate
Apart from the continuous improvement of clinicians’ skills
through training programs and practice [14], technical efforts
are being undertaken to improve colonoscopy’s outcome. We
clustered them into two groups: improvement of devices and
the development of computational support systems.
Amongst the device improvements, the following should be
highlighted: 1) increase in image resolution and, consequently,
textural information; 2) the use of wide-angle cameras showing
more colon wall surface; 3) the development of zooming and
magnification techniques [15] and 4) the development of new
imaging methodologies such as autofluorescence imaging [16]
or virtual chromoendoscopy (Olympus’ Narrow Band Imaging
[17], Fujinon's FICE [18] or Pentax's i-Scan [19]). This last group of techniques modifies how the scene is observed by improving the contrast of endoluminal scene elements, which may help in lesion detection and also with in-vivo lesion diagnosis thanks to the enhanced visualization of lesion tissues
[20]. These advances have fostered the cooperation between
clinicians and computer scientists in the development and
validation of computer-aided support systems for colonoscopy,
aimed at helping clinicians in all stages of CRC diagnosis. A significant part of this effort has been focused on computer-assisted polyp detection. As indicated in [21], cooperation
between technologists and clinicians is essential to develop
clinically useful solutions, with both these groups understand-
ing challenges and limitations in their respective domains.
Automatic polyp detection in colonoscopy videos has been
an active research topic during the last 20 years and several
approaches have been proposed. We present a review of the
most relevant methods in Section II but, to the best of our
knowledge, none of them has been adopted for a routine
patient treatment. There might be several reasons behind this. First of all, in order for a given method to be clinically useful, it has to meet real-time constraints; e.g., for videos acquired at 25 frames per second (fps), the maximum time available to process each image frame should be under 40 ms. Secondly, some of them are built from a theoretical model of polyp appearance [14], [22] and are therefore limited to only certain polyp morphologies, which may not translate to the actual scene, where polyp appearance varies greatly. Thirdly, the majority of methods are mainly focused on the polyps and do not consider the presence of other elements such as folds, blood vessels or the lumen that can affect a method's performance [14]. Last but not least, some of these methods have been trained and tested only on selected good-quality still image frames. The lack of temporal coherence and the great variability in polyp appearance due to camera progression and visibility conditions might impact their performance in full-sequence analysis, as they might cause instability in their response to similar stimuli.
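The 40 ms figure follows directly from the frame rate: the per-frame budget is simply the reciprocal of the acquisition rate. A minimal sketch of this bookkeeping (the helper names are ours, not from the paper):

```python
def per_frame_budget_ms(fps: float) -> float:
    """Maximum time (ms) available to process one frame in real time."""
    return 1000.0 / fps

def meets_real_time(processing_ms: float, fps: float = 25.0) -> bool:
    """True if a method's per-frame processing time fits the budget."""
    return processing_ms <= per_frame_budget_ms(fps)

# At 25 fps each frame must be processed in under 40 ms.
print(per_frame_budget_ms(25.0))   # -> 40.0
print(meets_real_time(10.0))       # a 10 ms/frame method is real-time capable
print(meets_real_time(200.0))      # a 200 ms/frame method is not
```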
Computational methods also have to deal with additional colonoscopy-specific challenges. For instance, they should consider the impact of image artifacts generated by scene illumination (specular highlights, overexposed regions) or by the specific configuration of the video processor attached to the colonoscope, which might overlay information on the scene view. These artifacts, apart from altering the view of the scene, might not be stable across consecutive frames, so methods should compensate for their impact both on single-frame polyp detection and on tracking in full-sequence analysis. Additionally, though an effort is made to ensure an adequate bowel preparation, some particles may still appear which, in some cases, could lead to false detections when isolated, or to occlusions leading to missed detections or localization errors. As mentioned before, these methods have to cope with a great degree of variability in polyp appearance, which depends on illumination conditions, camera position and on clinicians' skills when progressing through the colon. Finally, available methods have typically been validated on small and restricted databases, under specific endoscope device conditions (brand and resolution), in some cases even covering only one specific polyp type, shape or morphology, which hinders assessment of their actual performance in a more generic setting.
C. Motivation of the comparison study
Unfortunately, the lack of a common validation framework,
which is a frequent problem in medical and endoscopy image
analysis [21], has limited the effectiveness of the comparison
between existing approaches, making it difficult to determine
which of them could have actual advantage in clinical use.

To cope with this, efforts have been made on publishing fully
annotated databases [14], [22] and on organizing challenges
as part of international conferences (ISBI, MICCAI), which
offer a basis to discuss validation strategies.
Considering this, and taking inspiration from recent works on quantitative comparative analysis of methods in areas such as laparoscopic 3D surface reconstruction [23] or liver segmentation [24], we present in this paper a complete validation
study of polyp detection methods performed as part of the
2015 MICCAI sub-challenge on Automatic Polyp Detection.
This sub-challenge was organized jointly by three research
teams: 1) Computer Vision Center/Universitat Autònoma de
Barcelona and Hospital Clinic from Barcelona, Spain (CVC-
CLINIC); 2) ETIS Lab (ENSEA/CNRS/University of Cergy-
Pontoise) and Lariboisière Hospital-APHP at Paris, France
(ETIS-LARIB), and 3) Arizona State University and Mayo
Clinic, USA (ASU-Mayo).
The objective of this paper is to present a comparative study of polyp detection methods under a newly proposed validation framework. This validation framework was first introduced as part of the MICCAI 2015 Sub-Challenge on Automatic Polyp Detection in Colonoscopy, and we present in this paper the results of that sub-challenge. Beyond this, we also propose additional experiments to assess in greater depth the performance of an automatic polyp detection method. These new experiments focus on exploring the actual clinical applicability of a given method by assessing to what extent it is affected by some of the technical and clinical challenges reported in the literature, and whether it incorporates temporal coherence features. Finally, we also go beyond the individual analysis of methods and propose combination strategies in order to study whether combining methods may lead to improved overall performance.
The remainder of the paper is structured as follows: In
Section II we present the methods proposed by each of the
participating teams in the challenge, including them in the
context of existing published methods. In Section III we
describe the complete validation framework. Results from the
comparative study are presented in Section IV. Section V
provides an in-depth analysis of the results and discusses
some topics related to challenge organization. Finally, the
concluding remarks are drawn in Section VI.
II. AUTOMATIC POLYP DETECTION METHODS
A. Historical review of computational polyp detection methods
After analyzing approaches reported in the literature, we propose to cluster methods into three groups: 1) hand-crafted; 2) end-to-end learning; and 3) hybrid approaches. This taxonomy reflects the historical trends in polyp detection methods: in the early 2000s, the majority of methods used a given texture descriptor to guide a classification method but, subsequently, some researchers opted for hand-crafted features, aiming at a real-time implementation. As technology evolved and computational capabilities increased, techniques such as neural networks that had been developed in the past and abandoned due to excessive computational cost have now resurfaced.
Regarding hand-crafted methods, the majority are based
on exploiting low-level image processing methods to obtain
candidate polyp boundaries (using Hessian filters in the work
of Iwahori et al. [25], intensity valleys in the work of Bernal et
al. [14] or the Hough transform in the work of Silva et al. [26]) and then use the resulting information to define cues unique to polyps.
For instance, the work of Zhu et al. [27] analyzes curvatures
of detected boundaries whereas the method of Kang et al. [28]
is focused on searching ellipsoidal shapes typically associated
with polyps. Finally, the method of Hwang et al. [29] combines
curvature analysis and shape fitting in their strategy.
Concerning end-to-end learning, texture and color information were formerly used as descriptors, as in the work of Karkanis et al. [30], which proposed the use of color wavelets, the work of Ameling et al. [31], which exploits co-occurrence matrices, or the work of Gross et al. [32], which proposed the use of local binary patterns. Active learning methodologies have also been introduced, as in the work of Angermann et al. [33], to improve the tradeoff between performance and computation time. Some of the most recent
methods use deep learning tools to aid in polyp detection tasks,
as in the work of Park et al. [34] or in the work of Ribeiro et
al. [35]. In these very recent developments, differences among
methods are based on the selection of a specific network
architecture and databases used for training.
Finally, there are several hybrid methods which combine both methodologies for polyp detection, such as the work of Tajbakhsh et al. [22], which combines edge detection and feature extraction to boost detection accuracy; the work of Bae et al. [36], which proposes a system based on imbalanced learning and discriminative feature learning; the work of Silva et al. [26], which uses hand-crafted features to filter non-informative image regions; and the work of Ševo et al. [37], which combines edge density and convolutional networks.
As mentioned in Section I, the great majority of the methods
are tested on private databases though we can observe that
more recent publications such as the work of Park et al. [34]
or the work of Ribeiro et al. [35] have started to use publicly
available databases such as the ones used in the MICCAI
2015 Sub-challenge on Automatic Polyp Detection. Related
to this, apart from new proposals, some of the referenced
methods have been adopted by participants, such as the
works of Bernal et al. [14], Silva et al. [26] or the work of Tajbakhsh et al. [22]. We provide in the next subsection a brief
description of participating methods highlighting their most
relevant contributions to the field. We grouped the methods
following the taxonomy defined earlier in this subsection.
B. MICCAI 2015 Polyp Detection Sub-challenge methods
1) Hand-crafted features:
CVC-CLINIC: This method [14] is based on a model of appearance that considers polyps as protruding surfaces whose boundaries are defined by intensity-valley detection. The proposal includes a preprocessing stage to mitigate the impact of other valley-rich structures (blood vessels, specular highlights). To build the final energy maps highlighting polyp presence, four different constraints (continuity, completeness, concavity, and robustness against spurious structures) are imposed on candidate boundaries to differentiate polyps from other structures.

TABLE I
SUMMARY OF INFORMATION FROM THE TEAMS THAT TOOK PART IN THE MICCAI 2015 CHALLENGE ON AUTOMATIC POLYP DETECTION.

| Team acronym | Full team details | Methodology | Published | Still-frame analysis | Video analysis | Training (seconds) | Testing (seconds) | System tested |
|---|---|---|---|---|---|---|---|---|
| ASU | Arizona State University (USA) | Hybrid | Yes [22] | No | Yes | N/A | 2.7 | 2.4 GHz Intel quad-core processor and an NVIDIA GeForce GTX 760 video card |
| CUMED | Department of Computer Science and Engineering, Chinese University of Hong Kong (China) | End-to-end learning (CNNs) | No | Yes | Yes | 10800 | 0.2 | Standard PC with a 2.50 GHz Intel Xeon E5-1620 CPU and an NVIDIA GeForce GTX Titan X GPU |
| CVC-CLINIC | Computer Vision Center and Universitat Autònoma de Barcelona (Spain) | Hand-crafted | Yes [14] | Yes | Yes | N/A | 10 | Intel Core i7-4790 at 3.6 GHz |
| ETIS-LARIB | ETIS, ENSEA, University of Cergy-Pontoise, CNRS, Cergy (France) | Hybrid | Yes [26] | Yes | No | 196 | 2.14 | Intel i5-4200U at 2.30 GHz |
| OUS | Oslo University Hospital (OUS) and University of Oslo (Norway) | End-to-end learning (CNNs) | No | Yes | Yes | 86400 | 5 | Intel i5, 4 cores at 2.8 GHz, 4 GB RAM; graphics card with 4 GB memory used for training |
| PLS | Polyp Localize and Spot Team, Media Performance Group, Simula Research Laboratory and University of Oslo (Norway) | Hybrid | No | Yes | Yes | 0.33 per image | 0.145 | 2x Intel Xeon E5-2650 at 2.00 GHz, 64 GB RAM, NVIDIA GK110 GeForce GTX TITAN |
| SNU | Seoul National University, Seoul (South Korea) | End-to-end learning (CNNs) | No | Yes | Yes | 360 | 0.8-1 | NVIDIA TITAN X GPU |
| UNS-UCLAN | School of Engineering, University of Central Lancashire, Preston (UK) and University of Nice-Sophia Antipolis, Nice (France) | End-to-end learning (CNNs) | No | Yes | No | 18000 | 5 | i7-5930K @ 3.5 GHz (6 cores), 64 GB RAM, NVIDIA GeForce GTX TITAN X |
2) End-to-end learning:
CUMED: The architecture of the proposed network comprises two parts: a downsampling path and an upsampling path [38]. The former contains convolutional and max-pooling layers while the latter contains convolutional and upsampling layers, which increase the resolution of feature maps and output prediction masks. To alleviate the problem of vanishing gradients and encourage the back-propagation of gradient flow in deep neural networks, auxiliary classifiers are injected to train the network. Furthermore, they can serve as regularization to reduce over-fitting and improve the discriminative capability of features in intermediate layers [39], [40]. The classification layer, after fusing multi-level contextual information, produces the detection results. Network training is formulated as a pixel-wise classification problem with respect to ground-truth masks. The highlight of this approach is that it explores multi-level feature representations with fully convolutional networks in an end-to-end way, taking an image as input and directly producing the score map. In addition, feature-rich hierarchies from a large-scale auxiliary dataset are transferred into the model to reduce over-fitting and further boost detection performance [41].
UNS-UCLAN: This method, inspired by the works reported in [42], [43], [44], uses three CNNs trained at different image scales, namely 1, 0.5, and 0.25 of the original training images. For all scales the CNNs use the same architecture, but they are trained independently on the RGB images at their corresponding scale. After this initial training phase, the last fully connected part of each CNN is removed and the outputs from the 'convolutional part' of all three networks are fed as input to a single Multi-Layer Perceptron (MLP) network. This additional network is trained independently from the three CNNs. In this approach, CNNs are used as feature extraction engines operating at different spatial scales, and the MLP performs the classification based on these features.
The method's output is a polyp incidence probability map, which is then processed to locate dominant probability peaks; peak locations and probability values are returned as the final output of the system. The training was performed exclusively on the CVC-CLINIC database.
OUS: This method is based on the popular AlexNet model [44] for CNNs and its slight modification CaffeNet, which is pre-trained on the ILSVRC 2012 [45] dataset. Computations are carried out using the Caffe library [46]. The original model is modified to take input patches of size 96 × 96, the kernel size of the first two pooling layers is decreased from 3 to 2, and the last pooling layer is removed. The output layer is modified to give two outputs: polyp or non-polyp. In order to increase the number of training examples, data augmentation is performed in the form of random mirroring, rotation, up- and down-scaling, cropping, and brightness adjustment. Final polyp presence or absence is determined using a sliding-window strategy, with three scales for still-frame analysis and two for full video sequence analysis.
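The sliding-window strategy above can be sketched as follows. The CNN is replaced here by a toy brightness-based classifier, and the patch size, stride and threshold are illustrative assumptions rather than the OUS team's actual settings:

```python
import numpy as np

def sliding_window_scan(frame, classifier, patch=96, stride=48, thresh=0.5):
    """Scan a frame with fixed-size patches and report polyp presence.

    `classifier` maps a (patch, patch) array to a polyp probability in [0, 1];
    it stands in for the fine-tuned CNN described in the text. Returns
    (presence_flag, list of (row, col, score) for positive patches).
    """
    h, w = frame.shape
    hits = []
    for r in range(0, h - patch + 1, stride):
        for c in range(0, w - patch + 1, stride):
            score = classifier(frame[r:r + patch, c:c + patch])
            if score >= thresh:
                hits.append((r, c, score))
    return bool(hits), hits

# Toy classifier: flags patches with enough bright pixels (a real system
# would run the CNN on each patch instead).
bright = lambda p: float(p.mean() > 0.3)

frame = np.zeros((288, 384))
frame[100:180, 200:280] = 1.0  # bright blob standing in for a polyp
found, hits = sliding_window_scan(frame, bright)
```

Running the same scan at two or three image scales, as the OUS method does, amounts to repeating this loop on resized copies of the frame.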
SNU: This methodology proposes a two-step approach: detection and localization. For both steps, CNNs are used. Starting from GoogLeNet (pre-trained on the ImageNet dataset), the CNN was fine-tuned. Input images are resized to 224 × 224 pixels prior to training, and data augmentation is performed using several degrees of random rotation and scaling. Detection is treated as a simple binary classification task whereas, for localization, CNNs are applied to polyp-positive images, which are segmented into a uniform 8 × 8 grid (64 cells per image). Then, for each image, one cell at a time is overlaid in black and the CNN is applied to perform the binary classification task. The 64 overlaid grid images are then sorted by classification score to calculate the final polyp position.
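The grid-occlusion step can be sketched as follows; the CNN is replaced by a toy brightness score, and the helper name is ours:

```python
import numpy as np

def occlusion_localize(image, classifier, grid=8):
    """Localize a polyp by masking one grid cell at a time (SNU-style sketch).

    `classifier` returns a polyp probability for the whole image; it stands in
    for the fine-tuned CNN. The cell whose masking lowers the score the most
    is taken as the polyp position. Returns (row_cell, col_cell) indices.
    """
    h, w = image.shape
    ch, cw = h // grid, w // grid
    base = classifier(image)
    drops = np.zeros((grid, grid))
    for i in range(grid):
        for j in range(grid):
            masked = image.copy()
            masked[i * ch:(i + 1) * ch, j * cw:(j + 1) * cw] = 0.0  # overlay cell in black
            drops[i, j] = base - classifier(masked)
    return np.unravel_index(np.argmax(drops), drops.shape)

# Toy classifier: score proportional to total brightness.
score = lambda img: img.sum() / img.size

img = np.zeros((224, 224))
img[84:112, 140:168] = 1.0            # bright patch standing in for a polyp
cell = occlusion_localize(img, score)  # cell containing the bright patch
```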
3) Hybrid approaches:
PLS: The proposed full localization scheme consists of two parts: detection and localization. Regarding detection, two sets of images, one containing polyps and the other without polyps, are used for training. Global image features [47] are used as they are easy and fast to calculate. Based on similarity scores between the input frame and the training frames, and on the resulting ranks, the detection subsystem decides in real time to which class (polyp or no polyp) the input frame belongs.
The localization scheme is implemented as a sequence of preprocessing filters (RGB to YCbCr color space conversion, removal of borders and sub-images, flare masking and low-pass filtering) and uses the polyp's physical shape to find its exact position, approximating polyps by elliptical regions presenting local features that differentiate them from surrounding tissues. The final decision regarding polyp location is taken from the maximum values in an energy map computed using the elliptical shape of the polyp's usual appearance. Finally, the method outputs four possible locations per frame.
ETIS-LARIB: This method [26] is inspired by the psycho-visual methodology used by clinicians when performing an endoscopic examination. First, a detection of the Regions of Interest (ROIs) that may contain a polyp is performed using shape and size image features. This pre-selection allows a fast first scan of the image. Since circular/elliptical shapes are typically associated with polyps, a Hough transform is used for this first filtering stage. Once ROIs are detected, a second analysis, based on texture, is performed in order to discard those ROIs with no actual polyp content. To achieve this, an ad-hoc classifier based on a boosting-based learning process using texture features computed from co-occurrence matrices (standard Haralick features) is proposed.
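The first-stage Hough filtering can be illustrated with a crude circular Hough voting scheme over edge points. This is a sketch of the general idea only; the team's actual implementation, radii and sampling are not specified here and the helper name is ours:

```python
import numpy as np

def hough_circle_votes(edge_points, shape, radii):
    """Crude circular Hough voting (sketch of the ROI pre-selection idea).

    Each edge point votes for possible circle centres at every tested radius;
    the accumulator peak suggests a circular region worth a texture analysis.
    `edge_points` is a list of (row, col); returns (row, col, radius) of peak.
    """
    acc = np.zeros((len(radii), shape[0], shape[1]))
    thetas = np.linspace(0.0, 2.0 * np.pi, 64, endpoint=False)
    for r_idx, rad in enumerate(radii):
        for (er, ec) in edge_points:
            cr = np.round(er - rad * np.sin(thetas)).astype(int)
            cc = np.round(ec - rad * np.cos(thetas)).astype(int)
            ok = (cr >= 0) & (cr < shape[0]) & (cc >= 0) & (cc < shape[1])
            np.add.at(acc[r_idx], (cr[ok], cc[ok]), 1)
    r_idx, row, col = np.unravel_index(np.argmax(acc), acc.shape)
    return row, col, radii[r_idx]

# Synthetic edge map: a circle of radius 20 centred at (50, 60).
angles = np.linspace(0.0, 2.0 * np.pi, 120, endpoint=False)
pts = [(int(round(50 + 20 * np.sin(a))), int(round(60 + 20 * np.cos(a))))
       for a in angles]
row, col, rad = hough_circle_votes(pts, (100, 120), [15, 20, 25])
```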
ASU: This method [22] consists of two stages. In the first stage, a set of polyp candidates is generated using geometric features. Specifically, given a colonoscopy frame, a crude set of edge pixels is first obtained. This edge map is then refined using a classification and feature extraction scheme [48]. The goal of the edge classification scheme is to remove as many non-polyp boundary pixels as possible from the initial edge map. The geometry of the retained edges is then used in a voting scheme that localizes polyp candidates as objects with curved boundaries in the refined edge maps. The voting scheme further estimates a bounding box for each generated candidate based on the generated voting map. In the second stage, an ensemble of CNNs, each specialized in one type of feature, is applied to each candidate bounding box [49]. Finally, the outputs of the CNNs are averaged to generate a confidence score for a given polyp candidate.
Table I shows a summary of the different methods participating in the MICCAI 2015 Challenge on Automatic Polyp Detection. As each method was tested under different conditions, computation times are given to complete the information on the training and testing processes.
III. VALIDATION STUDY
We introduce in this section the complete validation study
proposed to assess and compare the performance of different
polyp detection methods.
A. Definitions and general performance metrics
We define polyp detection as the capability of a given method to determine polyp presence in a colonoscopy frame (polyp presence detection) and, once presence is determined, to provide the location of the polyp within the image (polyp localization). Consequently, a good polyp detection method should select the images (video frames) containing polyps and ignore all others, and it should indicate the position of all polyps present in an image. The terms defined next are key to setting the performance metrics. As we deal with images from real patient examinations, we encounter two different cases: images with polyps and images without polyps.
In the first case, if a detection output falls within the polyp, the method is said to provide a True Positive (TP), or correct alarm. It has to be noted that only one TP is counted per polyp, no matter how many detections fall within the polyp. Any detection that falls outside the polyp is considered a False Positive (FP), or false alarm. The absence of an alarm in an image with a polyp is counted as a False Negative (FN), one per each polyp in the image that has not been detected. Regarding images without polyps, we count a True Negative (TN) whenever the method does not provide any output for that particular image. Any detection provided for a frame without a polyp counts as a False Positive (FP). Considering these definitions, we propose the use of the frame-based performance metrics presented in Table II.
TABLE II
PERFORMANCE METRICS FOR POLYP DETECTION.

Metric        Abbreviation   Calculation
Precision     Prec           Prec = TP / (TP + FP)
Recall        Rec            Rec = TP / (TP + FN)
Specificity   Spec           Spec = TN / (TN + FP)
F1-measure    F1             F1 = (2 × Prec × Rec) / (Prec + Rec)
F2-measure    F2             F2 = (5 × Prec × Rec) / (4 × Prec + Rec)
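The metrics in Table II can be computed from the aggregated counts as follows; the division-by-zero guards are an implementation choice of this sketch, not part of the definitions.

```python
def metrics(tp, fp, fn, tn):
    """Compute the Table II metrics from aggregated TP/FP/FN/TN counts."""
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    # F1 balances precision and recall; F2 weights recall higher (beta = 2)
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    f2 = 5 * prec * rec / (4 * prec + rec) if 4 * prec + rec else 0.0
    return {"Prec": prec, "Rec": rec, "Spec": spec, "F1": f1, "F2": f2}
```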
Frequently Asked Questions (15)
Q1. What contributions have the authors mentioned in the paper "Comparative validation of polyp detection methods in video colonoscopy: results from the miccai 2015 endoscopic vision challenge" ?

In this paper, the authors report the results of this comparative evaluation of polyp detection methods, as well as describe additional experiments to further explore differences between methods. The authors define performance metrics and provide evaluation databases that allow comparison of multiple methodologies. 

More precisely, future studies should tackle some of the issues detected such as the variability in source data resolution and size and should aim to cover all different polyp morphological types. This may result in, apart from a more complete analysis, a deeper understanding on how each method works and in which scenarios each of them show the most benefit, thinking of potential optimized combinations of them to finally build up a clinically useful method. 

Considering the scope of the analysis presented in the paper, the metric that will be used to compare different methods will be F1-score, as it presents a balance between missed polyps and false alarms. 

The most straightforward conclusion from this experiment is that image quality matters, as methods' performance decreases when only bad-quality images are considered. 

The main result of this comparative study is that methods including some degree of machine learning outperform classic hand-crafted methods, especially regarding specificity scores in non-polyp videos. 

The lack of temporal coherence and the great variability in polyp appearance due to camera progression and visibility conditions might impact their performance in the full-sequence analysis, as they might cause instability in their response against similar stimuli. 

The analysis of sequences without polyp frames shows that PLS offers the best performance, which is possibly due to the presence of a specific polyp presence module in this approach. 
There are some image challenges that generally seem to make the detection of polyp frames difficult, such as the presence of overlay information and overexposed regions, with the latter being more prevalent in the explored images. 

Teams could also provide a confidence value (between 0 and 1) for the purpose of drawing performance curves, though this was not mandatory. 

The main feature that a clinically applicable system should have is that it should detect all polyps regardless of their appearance (high detection rate (DR), measured as the percentage of polyps detected in at least one frame out of the total number of polyps present in the testing videos). 
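The detection rate described above can be sketched as follows, assuming a hypothetical mapping from each frame to the set of polyp identifiers that received a true positive in it; this is an illustration, not the challenge's evaluation code.

```python
def detection_rate(frame_hits, all_polyp_ids):
    """Percentage of distinct polyps detected in at least one frame.

    frame_hits    -- dict mapping frame id -> iterable of polyp ids with a TP
    all_polyp_ids -- all ground-truth polyp ids in the testing videos
    """
    detected = set()
    for hits in frame_hits.values():
        detected |= set(hits)                # a single hit anywhere counts
    return 100.0 * len(detected & set(all_polyp_ids)) / len(all_polyp_ids)
```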

The authors can also observe how methods tend to provide a higher number of false alarms for good quality images, which the authors interpret as a result of structures likely to be confused with polyps being better visually defined. 

With respect to polyp frames, the first conclusion to be extracted is that low visibility images and the presence of specular highlights within the polyp affect all methods in the same way. 

For the sake of statistical representativeness of the results, the authors did not perform the same experiment for ETIS-LARIB database due to its smaller size. 

To account for differences in performance related to polyp morphology the authors will use Precision, Recall and F1 scores as defined in Table II. 

In order to provide these curves for all teams, confidence values should have been provided; in this case, only one team per subcategory (UNS-UCLAN in still-frame analysis and ASU-Mayo for full video analysis) provided this information whereas the17rest only provided what the authors assume are results obtained using the best configuration of each particular method.