
Polyp Detection and Segmentation using Mask R-CNN: Does a Deeper Feature Extractor CNN Always Perform Better?

Hemin Ali Qadir (1,2,5), Younghak Shin (6), Johannes Solhusvik (2,5), Jacob Bergsland (1), Lars Aabakken (1,4), Ilangko Balasingham (1,3)

(1) Intervention Centre, Oslo University Hospital, Oslo, Norway
(2) Department of Informatics, University of Oslo, Oslo, Norway
(3) Department of Electronic Systems, Norwegian University of Science and Technology, Trondheim, Norway
(4) Department of Transplantation Medicine, University of Oslo, Oslo, Norway
(5) OmniVision Technologies Norway AS, Oslo, Norway
(6) LG CNS, Seoul, Korea
Abstract—Automatic polyp detection and segmentation are highly desirable for colon screening due to the polyp miss rate by physicians during colonoscopy, which is about 25%. However, this computerization is still an unsolved problem due to various polyp-like structures in the colon and high interclass polyp variations in terms of size, color, shape and texture. In this paper, we adapt Mask R-CNN and evaluate its performance with different modern convolutional neural networks (CNN) as its feature extractor for polyp detection and segmentation. We investigate the performance improvement of each feature extractor by adding extra polyp images to the training dataset, to answer whether we need deeper and more complex CNNs, or a better dataset for training, in automatic polyp detection and segmentation. Finally, we propose an ensemble method for further performance improvement. We evaluate the performance on the 2015 MICCAI polyp detection dataset. The best results achieved are 72.59% recall, 80% precision, 70.42% Dice, and 61.24% Jaccard. The model achieved state-of-the-art segmentation performance.
Index Terms—polyp detection, polyp segmentation, convolutional neural network, Mask R-CNN, ensemble
I. INTRODUCTION

Colorectal cancer is the second most common cause of cancer-related death in the United States for both men and women, and its incidence increases every year [1]. Colonic polyps, growths of glandular tissue at the colonic mucosa, are the major cause of colorectal cancer. Although they are initially benign, they might become malignant over time if left untreated [2]. Colonoscopy is the primary method for screening and preventing polyps from becoming cancerous [3]. However, colonoscopy depends on highly skilled endoscopists and a high level of eye-hand coordination, and recent clinical studies have shown that 22%–28% of polyps are missed in patients undergoing colonoscopy [4].
Over the past decades, various computer-aided diagnosis systems have been developed to reduce the polyp miss rate and improve the detection capability during colonoscopy [5]–[19]. The existing automatic polyp detection and segmentation methods can be roughly grouped into two categories: 1) those which use hand-crafted features [5]–[11], and 2) those which use a data-driven approach, more specifically deep learning [12]–[18].

This work was supported by the Research Council of Norway through the industrial Ph.D. project under contract number 271542/O30.
The majority of the hand-crafted methods can be categorized into two groups: texture/color based [5]–[8] and shape based [9]–[11]. In [5]–[8], color wavelet, texture, Haar, histogram of oriented gradients, and local binary pattern features were investigated to differentiate polyps from the normal mucosa. Hwang et al. [9] assumed that polyps have an elliptical shape that distinguishes them from non-polyp regions. Bernal et al. [10] used valley information based on polyp appearance to segment potential regions by watersheds, followed by region merging and classification. Tajbakhsh et al. [11] used edge shape and context information to accumulate votes for polyp regions. These feature patterns are frequently similar in polyps and polyp-like normal structures, resulting in decreased performance.
To overcome the shortcomings of hand-crafted features, data-driven approaches based on CNNs were proposed for polyp detection [12]–[19]. In the 2015 MICCAI sub-challenge on automatic polyp detection [12], most of the proposed methods were based on CNNs, including the winner. The authors in [13] and [14] showed that fully convolutional network (FCN) architectures could be refined and adapted to recognize polyp structures. Zhang et al. [15] used FCN-8S to segment polyp region candidates, and texton features computed from each region were used by a random forest classifier for the final decision. Shin et al. [16] showed that Faster R-CNN is a promising technique for polyp detection. Zhang et al. [17] added a tracker to enhance the performance of a CNN polyp detector. Yu et al. [18] adapted a 3D-CNN model in which a sequence of frames was used for polyp detection.
In this paper, we adapt Mask R-CNN [20] for polyp detection and segmentation. Segmenting out polyps from the normal mucosa can help physicians reduce their segmentation errors and subjectivity. We have several objectives in this study. We first evaluate the performance of Mask R-CNN and compare it to existing methods. Secondly, we evaluate different CNN architectures (e.g., Resnet50 and Resnet101 [21], and Inception Resnet V2 [22]) as the feature extractor of the Mask R-CNN for polyp segmentation. Thirdly, we aim to answer to what extent adding extra training images can help improve the performance of each of the CNN feature extractors. Do we really need to go for a deeper and more complex CNN to extract a higher level of features, or do we just need to build a better dataset for training? Finally, we propose an ensemble method for further performance improvement.
II. MATERIALS AND METHODS

A. Datasets
Most of the proposed methods mentioned in Section I were tested on different datasets. The authors in [14], [15] used a dataset containing images of the same polyps for the training and testing phases after randomly splitting it into two subsets. This is not a very realistic case for validating a method, as we may have the same polyps in the training and testing phases. These two issues limit the comparison between the reported results. The 2015 MICCAI sub-challenge on automatic polyp detection was an attempt to evaluate different methods on the same datasets. We therefore use the same datasets of the 2015 MICCAI polyp detection challenge for training and testing the models. We only use the two datasets of still images: 1) CVC-ClinicDB [23], containing 32 different polyps presented in 612 images, and 2) ETIS-Larib [24], containing 36 different polyps presented in 196 images. In addition, we use CVC-ColonDB [25], which contains 15 different polyps presented in 300 images.
B. Evaluation Metrics
For polyp detection performance evaluation, we calculate recall and precision using the well-known medical parameters True Positive (TP), False Positive (FP), True Negative (TN) and False Negative (FN) as follows:

$$\mathrm{recall} = \frac{TP}{TP + FN}, \quad (1)$$

$$\mathrm{precision} = \frac{TP}{TP + FP}. \quad (2)$$
For evaluation of polyp segmentation, we use common segmentation evaluation metrics: the Jaccard index (also known as intersection over union, IoU) and the Dice similarity score:

$$J(A, B) = \frac{|A \cap B|}{|A \cup B|} = \frac{|A \cap B|}{|A| + |B| - |A \cap B|}, \quad (3)$$

$$\mathrm{Dice}(A, B) = \frac{2\,|A \cap B|}{|A| + |B|}, \quad (4)$$

where A represents the output image of the method and B the actual ground truth.
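All four metrics are straightforward to compute from raw detection counts and binary masks. The following is a minimal sketch of Eqs. (1)–(4), assuming NumPy boolean arrays for the predicted and ground-truth masks; it is our illustration, not the authors' evaluation code:

```python
import numpy as np

def recall(tp, fn):
    # Eq. (1): fraction of true polyps that were detected.
    return tp / (tp + fn)

def precision(tp, fp):
    # Eq. (2): fraction of detections that are true polyps.
    return tp / (tp + fp)

def jaccard(pred, gt):
    # Eq. (3): |A ∩ B| / |A ∪ B| for boolean mask arrays.
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / union

def dice(pred, gt):
    # Eq. (4): 2|A ∩ B| / (|A| + |B|).
    inter = np.logical_and(pred, gt).sum()
    return 2 * inter / (pred.sum() + gt.sum())
```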
C. Mask R-CNN
Mask R-CNN [20] is a general framework for object instance segmentation. It is an intuitive extension of Faster R-CNN [26], the state-of-the-art object detector. Mask R-CNN adopts the same first stage as Faster R-CNN, namely the region proposal network (RPN). It adds a new branch to the second stage for predicting an object mask in parallel with the existing branches for bounding box regression and confidence value. Instead of using RoIPool, which performs coarse quantization for feature extraction in Faster R-CNN, Mask R-CNN uses RoIAlign, a quantization-free layer, to fix the misalignment problem.
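The practical difference is easy to probe with torchvision's `roi_align` op, which samples the feature map with bilinear interpolation instead of snapping box coordinates to the feature grid. A small sketch (our own illustration, not the paper's code; the tensor sizes are made up):

```python
import torch
from torchvision.ops import roi_align

# A 256-channel feature map for one image, e.g. stride-16 features of an
# 800x800 input.
features = torch.randn(1, 256, 50, 50)

# One region of interest: (batch_index, x1, y1, x2, y2) in input-image coordinates.
rois = torch.tensor([[0, 40.0, 60.0, 360.0, 420.0]])

# RoIAlign pools each region to a fixed 14x14 grid (the mask-branch resolution
# mentioned below) without quantizing the box coordinates; spatial_scale maps
# image coordinates onto the feature map.
pooled = roi_align(features, rois, output_size=(14, 14),
                   spatial_scale=1.0 / 16, sampling_ratio=2)
print(pooled.shape)  # torch.Size([1, 256, 14, 14])
```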
For our polyp detection and segmentation, we use the architecture shown in Fig. 1 to evaluate the performance of Mask R-CNN with different CNN based feature extractors. To train our models, we use a multi-task loss on each region of interest, called an anchor, proposed by the RPN. For each anchor $a$, we find the best matching ground-truth box $b$. If there is a match, anchor $a$ acts as a positive anchor, and we assign a class label $y_a = 1$ and a vector $\phi(b_a; a)$ encoding box $b$ with respect to anchor $a$. If there is no match, anchor $a$ acts as a negative sample, and the class label is set to $y_a = 0$. The mask branch has a $14 \times 14$ dimensional output for each anchor. The loss for each anchor $a$ then consists of three losses: location-based loss $\ell_{loc}$ for the predicted box $f_{loc}(I; a, \theta)$, classification loss $\ell_{cls}$ for the predicted class $f_{cls}(I; a, \theta)$, and mask loss $\ell_{mask}$ for the predicted mask $f_{mask}(I; a, \theta)$, where $I$ is the image and $\theta$ is the model parameter:

$$L(a, I; \theta) = \frac{1}{m} \sum_{i=1}^{m} \frac{1}{N} \sum_{j=1}^{N} \Big[\, \mathbb{1}[a\ \text{is positive}] \cdot \ell_{loc}\big(\phi(b_a; a), f_{loc}(I; a, \theta)\big) + \ell_{cls}\big(y_a, f_{cls}(I; a, \theta)\big) + \ell_{mask}\big(\mathrm{mask}_a, f_{mask}(I; a, \theta)\big) \,\Big], \quad (5)$$

where $m$ is the size of the mini-batch and $N$ is the number of anchors for each frame. We use the following loss functions: smooth L1 for the localization loss, softmax for the classification loss, and binary cross-entropy for the mask loss.
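Concretely, Eq. (5) can be assembled from standard library losses. Below is a hedged sketch in PyTorch; the tensor shapes and the function name are our assumptions, since the paper does not give implementation details:

```python
import torch
import torch.nn.functional as F

def multi_task_loss(loc_pred, loc_target, cls_logits, cls_target,
                    mask_logits, mask_target, positive):
    """Sketch of the per-image loss in Eq. (5).

    positive: boolean tensor over anchors. Localization and mask terms are
    only counted for anchors matched to a ground-truth box (the box and mask
    targets are undefined for negatives); classification covers all anchors.
    """
    # Smooth L1 between predicted box offsets and encoded targets phi(b_a; a).
    loc_loss = F.smooth_l1_loss(loc_pred[positive], loc_target[positive],
                                reduction="sum")
    # Softmax cross-entropy over the class labels y_a (0 or 1 here).
    cls_loss = F.cross_entropy(cls_logits, cls_target, reduction="sum")
    # Binary cross-entropy over the 14x14 mask outputs of positive anchors.
    mask_loss = F.binary_cross_entropy_with_logits(
        mask_logits[positive], mask_target[positive], reduction="sum")
    # Normalize by the number of anchors N; averaging over the mini-batch m
    # happens outside this function.
    n_anchors = cls_target.numel()
    return (loc_loss + cls_loss + mask_loss) / n_anchors
```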
D. CNN Feature Extractor Networks
In the first stage of Mask R-CNN, we need a CNN based feature extractor to extract high-level features from the input image. The choice of the feature extractor is essential because the CNN architecture, the number of parameters and the type of layers directly affect the speed, memory usage and, most importantly, the performance of the Mask R-CNN. In this study, we select three feature extractors to compare and evaluate their performance in polyp detection and segmentation: a deep CNN (Resnet50 [21]), a deeper CNN (Resnet101 [21]), and a complex CNN (Inception Resnet (v2) [22]). Resnet is a residual learning framework that eases the training of substantially deep networks and avoids the degradation problem, in which accuracy saturates and then degrades rapidly as depth increases [21]. With residual learning, we can benefit from deeper CNN networks to obtain an even higher level of features, which is essential for difficult tasks such as polyp detection and segmentation. With the inception technique, we can increase the depth and width of a CNN network without increasing the computational cost [27]. Szegedy et al. [22] proposed Inception Resnet (v2) to combine the optimization benefits of residual learning with the computational efficiency of inception units.

Fig. 1. Our Mask R-CNN framework. In the first stage, we use Resnet50, Resnet101 and Inception Resnet (v2) as the feature extractor for the performance evaluation of polyp detection and segmentation. The region proposal network (RPN) utilizes feature maps at one of the intermediate layers (usually the last convolutional layer) of the CNN feature extractor to generate box proposals (300 boxes in our study). The proposed boxes are a grid of anchors tiled in different aspect ratios and scales. The second stage predicts the confidence value, the offsets for the proposed box, and the mask within the box for each anchor.
For all three feature extractors, it is important to choose one of the layers at which to extract features for predicting region proposals by the RPN. In our experiments, we use the layers recommended by the original papers. For both Resnet50 and Resnet101, we use the last layer of the conv4 block. For Inception Resnet (v2), we use the Mixed_6a layer and its associated residual layers.
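As a rough illustration of how such a backbone comparison can be set up, torchvision allows ResNet feature extractors of different depths to be swapped under a single Mask R-CNN head. This is a sketch under our own assumptions: the authors' framework is not specified here, Inception Resnet (v2) is not available in torchvision, torchvision attaches an FPN rather than a single conv4 feature layer, and exact keyword arguments vary across versions.

```python
from torchvision.models.detection import MaskRCNN
from torchvision.models.detection.backbone_utils import resnet_fpn_backbone

def build_polyp_model(backbone_name="resnet50", num_classes=2):
    # num_classes = 2: background + polyp.
    backbone = resnet_fpn_backbone(backbone_name=backbone_name, weights=None)
    return MaskRCNN(backbone, num_classes=num_classes)

# Same detection/mask heads, different feature extractor depths.
model_resnet50 = build_polyp_model("resnet50")
model_resnet101 = build_polyp_model("resnet101")
```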
E. Ensemble Model
The three CNN feature extractors compute different types of features due to differences in their number of layers and architectures. A deeper CNN can compute a higher level of features from the input image, while it loses some spatial information due to the contraction and pooling layers. Some polyps might be missed by one of the CNN models while being detected by another. To partly solve this problem, we propose an ensemble model that combines the results of two Mask R-CNN models with two different CNN feature extractors. We use one of the models as the main model, whose output is always relied on, and the second model as an auxiliary model to support the main model. We only take into account the outputs from the auxiliary model when the confidence of the detection is > 95% (a value optimized using a validation dataset, see Section III-B).
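A sketch of this decision rule follows; the detection structure and field names are our assumptions, and the paper does not specify how overlapping detections from the two models are merged:

```python
CONFIDENCE_THRESHOLD = 0.95  # tuned on the validation set, see Section III-B

def ensemble_detections(main_dets, aux_dets, threshold=CONFIDENCE_THRESHOLD):
    # Each detection is assumed to be a dict: {"box": ..., "mask": ..., "score": float}.
    # The main model's outputs are always kept; the auxiliary model contributes
    # only its high-confidence detections.
    kept = list(main_dets)
    kept.extend(d for d in aux_dets if d["score"] > threshold)
    return kept
```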
F. Training Details
The available polyp datasets are not large enough to train a deep CNN. To prevent the models from overfitting, we enlarge the dataset by applying different augmentation strategies, following the same augmentation methods recommended by Shin et al. [16]. Image augmentation cannot improve the data distribution of the training set; it only provides image-level transformations through depth and scale, which does not prevent the model from overfitting. Therefore, we use transfer learning by initializing the weights of our CNN feature extractors from models pre-trained on Microsoft's COCO dataset [28]. We use SGD with a momentum of 0.9, a learning rate of 0.0003, and a batch size of 1 to fine-tune the pre-trained CNNs on the augmented dataset. We keep the original image size during both the training and test phases.
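For illustration, this configuration maps onto a short fine-tuning skeleton. The sketch below uses torchvision under our own assumptions (the authors' code and framework are not given), and omits re-sizing the predictor heads for the two-class polyp task:

```python
import torch
from torchvision.models.detection import maskrcnn_resnet50_fpn

# COCO pre-trained weights stand in for the transfer-learning initialization;
# for the 2-class (background + polyp) task the box/mask predictor heads
# would be replaced before fine-tuning.
model = maskrcnn_resnet50_fpn(weights="DEFAULT")
optimizer = torch.optim.SGD(model.parameters(), lr=0.0003, momentum=0.9)

def train_one_epoch(model, loader, optimizer, device="cuda"):
    # Batch size 1: the loader yields one (image, target) pair per step, and
    # images keep their original sizes.
    model.to(device).train()
    for images, targets in loader:
        images = [img.to(device) for img in images]
        targets = [{k: v.to(device) for k, v in t.items()} for t in targets]
        loss_dict = model(images, targets)  # dict of the detection losses
        loss = sum(loss_dict.values())
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```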
III. RESULTS AND DISCUSSION

A. Performance Evaluation of the CNN Feature Extractors
In this section, we report the performance of our Mask R-CNN model shown in Fig. 1 with the three CNN feature extractors as the base networks. In this experiment, we used CVC-ColonDB for training and CVC-ClinicDB for testing. We trained the three Mask R-CNN models for 10, 20, and 30 epochs and drew curves to show the performance improvement (see Fig. 2). We noticed that 20 epochs were enough to fine-tune the parameters of the three Mask R-CNN models for polyp detection and segmentation; in the case of Resnet50 and Resnet101, only 10 epochs were needed. It seems that the models become overfitted on the training dataset after 30 epochs, which results in performance degradation.
For comparison, we chose 20 epochs and summarized the results in Table I. Inception Resnet (v2) and Resnet101 have shown the best performance for many object classification, detection and segmentation tasks on datasets of natural images [29]. However, Mask R-CNN with Resnet50 outperformed the counterpart models in all evaluation metrics, with a recall of 83.49%, precision of 92.95%, Dice of 71.6% and Jaccard of 63.9%. This might be due to the fact that deeper and more complex networks need a larger number of images for training.
Fig. 2. Accuracy of the CNN feature extractors (Resnet50, Resnet101, Inception Resnet) vs. number of training epochs.
The CVC-ColonDB dataset contains 300 images with only 15 different polyps. This dataset might not have enough unique polyps for Resnet101 and Inception Resnet (v2) to show their actual performance. This outcome is important because it could be used as evidence to properly choose a CNN feature extractor according to the size of the available dataset.
TABLE I
COMPARISON OF THE RESULTS OBTAINED ON THE CVC-CLINICDB AFTER THE MODELS HAVE BEEN TRAINED FOR 20 EPOCHS

Mask R-CNNs         Recall %   Precision %   Dice %   Jaccard %
Resnet50            83.49      92.95         71.6     63.9
Resnet101           80.71      92.1          70.42    63.3
Inception Resnet    77.31      91.25         70.31    63.6
Fig. 3 illustrates three examples with different output results. The polyp shown in the first column is correctly detected and nicely segmented by all three models. The polyp in the second column is detected correctly by all three models, but only Resnet50 succeeded in segmenting out most of the polyp pixels from the background. The polyp in the third column is detected and segmented only by Resnet50.
B. Ensemble Results
It is important to know whether detection and segmentation performance can be improved by combining the output results of two Mask R-CNN models. Table II shows the results of this combination.
TABLE II
ENSEMBLE RESULTS OBTAINED ON THE CVC-CLINICDB BY COMBINING THE RESULTS OF TWO MASK R-CNN MODELS

Mask R-CNNs          Recall %   Precision %   Dice %   Jaccard %
Resnet50             83.49      92.95         71.6     63.9
Resnet101            80.71      92.1          70.42    63.3
Resnet Inception     77.31      91.25         70.31    63.6
Ensemble 50+101      86.42      92.41         75.72    68.28
Improvement          2.93       -0.54         4.12     4.38
Ensemble 50+Incep    83.95      90.67         74.73    67.41
Improvement          0.46       -2.28         3.13     3.51

50+101: Resnet50 used as main, Resnet101 used as auxiliary
50+Incep: Resnet50 used as main, Resnet Inception used as auxiliary
We chose Resnet50 as our main model because it performed better than its counterparts, as seen in Table I, and each of the two other models in turn as the auxiliary model.
Fig. 3. Example of three outputs produced by our Mask R-CNN models. The images in the 1st row show the ground truths for the polyps shown in the 2nd row. The images in the 3rd row show the output results produced by Mask R-CNN with Resnet50. The images in the 4th row are outputs from Mask R-CNN with Resnet101. The images in the 5th row are outputs from Mask R-CNN with Inception Resnet (v2).
We first used the ETIS-Larib dataset as the validation set to select a suitable confidence threshold for the auxiliary model. This is an essential preprocessing step to prevent an increase in the number of FP detections. Based on this optimization step, the output of the auxiliary model is only taken into account when the confidence of the detection is > 95%.
Table II demonstrates that the auxiliary model could only add a small improvement to the performance of the main model. Resnet101 improved recall by 2.93%, Dice by 4.12%, and Jaccard by 4.38%, whereas Resnet Inception only improved recall by 0.46%, Dice by 3.13%, and Jaccard by 3.51%. Precision decreased in both cases. The improvement in detection is smaller than in segmentation, which means that Resnet50 was able to detect most of the polyps detected by the two auxiliary models. Fig. 4 illustrates two polyp examples. The first polyp is partially segmented and the second polyp is missed by Resnet50. However, both are precisely segmented by Resnet101 and Resnet Inception with a confidence of 99%.

Fig. 4. Example of two outputs produced by the three Mask R-CNN models. Column 1 shows two polyps with their ground truths. Columns 2, 3 and 4 show the results of Resnet50, Resnet101 and Resnet Inception, respectively.
C. The Effect of Adding New Images to the Training Set
In this experiment, we aim to find out to what extent adding extra training images with new polyps can help the CNN feature extractors improve their performance. We thus trained the three models again for 20 epochs using the images in both the ETIS-Larib and CVC-ColonDB datasets for training (51 different polyps). Table III shows that all three models were able to greatly improve both the detection and segmentation capabilities of the Mask R-CNN (especially Inception Resnet) after adding the 36 new polyps of ETIS-Larib (196 images) to the training data. Unlike the ensemble approach, all the metrics, including precision, improved by larger margins in this experiment. As can be noticed in the results, Inception Resnet is the model with the most improvement in all metrics, which indicates the ability of this CNN architecture to extract richer features from larger training data.
TABLE III
COMPARISON OF RESULTS OBTAINED ON THE CVC-CLINICDB AFTER ETIS-LARIB WAS ADDED TO THE TRAINING DATA AND THE MODELS TRAINED FOR 20 EPOCHS

Mask R-CNNs          Recall %   Precision %   Dice %   Jaccard %
Resnet50*            83.49      92.95         71.6     63.9
Resnet50+            85.34      93.1          80.42    73.4
Improvement          1.85       0.15          8.82     9.5
Resnet101*           80.71      92.1          70.42    63.3
Resnet101+           84.87      95            77.48    70.13
Improvement          4.16       2.9           7.06     6.83
Inception Resnet*    77.31      91.25         70.31    63.6
Inception Resnet+    86.1       94.1          80.19    73.2
Improvement          8.79       2.85          9.88     9.6

*: only CVC-ColonDB was used for training
+: CVC-ColonDB and ETIS-Larib were used for training
As shown in Fig. 5, the new polyp images added to the training data helped Mask R-CNN with Inception Resnet (v2) to predict a better mask for the polyp shown in the first column, correctly detect and segment the previously missed polyp shown in the second column, and correct the FP detection for the polyp shown in the third column.
D. Comparison with Other Methods
Each output produced by Mask R-CNN consists of three components: a confidence value, the coordinates of a bounding box, and a mask (see Fig. 3). This makes Mask R-CNN eligible for performance comparison with other methods in terms of both detection and segmentation capabilities.
Fig. 5. Example of three outputs produced by Mask R-CNN with Inception Resnet (v2). The images in the 1st row show the ground truths for the polyps shown in the 2nd row. The images in the 3rd row are output results of the model when trained on CVC-ColonDB (Inception Resnet*). The images in the 4th row are output results of the model when trained on CVC-ColonDB and ETIS-Larib (Inception Resnet+).
For comparison against the methods presented in MICCAI 2015, we followed the same dataset guidelines, i.e., the CVC-ClinicDB dataset was used for the training stage whereas the ETIS-Larib dataset was used for the testing stage.
TABLE IV
SEGMENTATION RESULTS OBTAINED ON THE ETIS-LARIB DATASET

Segmentation Models                 Dice %   Jaccard %
FCN-VGG [13]                        70.23    54.20
Mask R-CNN with Resnet50            58.14    51.32
Mask R-CNN with Resnet101           70.42    61.24
Mask R-CNN with Inception Resnet    63.78    56.85
In Table IV, we compare our Mask R-CNN models against FCN-VGG [13], which is the only segmentation method fully tested on ETIS-Larib. Our Mask R-CNN with Resnet101 outperformed all the other methods including FCN-VGG, with a Dice of 70.42% and Jaccard of 61.24%. To fairly compare the detection capability of our Mask R-CNN models, we followed the same procedure as in MICCAI 2015 to compute TP, FP, FN, and TN. As can be seen in Table V, our Mask R-CNN with Resnet101 achieved the highest precision (80%) and a good recall (72.59%), outperforming Mask R-CNN with Resnet50, Mask R-CNN with Inception Resnet (v2), and the best method in MICCAI 2015. FCN-VGG has a better recall because both CVC-ClinicDB and ASU-Mayo were used in the training stage (more data for training). The results in Tables IV and V are inconsistent with the results in Table I, where Resnet50 achieved the best performance.

References (partial)
[1] "Cancer statistics, 2018."
[20] "Mask R-CNN."
[21] "Deep Residual Learning for Image Recognition."
[27] "Going deeper with convolutions."
[28] "Microsoft COCO: Common Objects in Context."