Polyp Detection and Segmentation using Mask R-CNN: Does a Deeper Feature Extractor CNN Always Perform Better?
Citations
DoubleU-Net: A Deep Convolutional Neural Network for Medical Image Segmentation
PolypSegNet: A modified encoder-decoder architecture for automated polyp segmentation from colonoscopy images
Consolidated domain adaptive detection and localization framework for cross-device colonoscopic images
AFP-Net: Realtime Anchor-Free Polyp Detection in Colonoscopy
Toward real-time polyp detection using fully CNNs for 2D Gaussian shapes prediction
References
Deep Residual Learning for Image Recognition
Going deeper with convolutions
Microsoft COCO: Common Objects in Context
Mask R-CNN
Cancer statistics, 2018
Related Papers (5)
Comparative Validation of Polyp Detection Methods in Video Colonoscopy: Results From the MICCAI 2015 Endoscopic Vision Challenge
Frequently Asked Questions (12)
Q2. What loss functions are used for the localization loss?
The authors use the following loss functions: Smooth L1 for the localization loss, softmax cross-entropy for the classification loss, and binary cross-entropy for the mask loss.
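The three loss terms can be sketched in plain NumPy. This is an illustrative reimplementation of the standard formulas, not the authors' code; the unweighted sum in `mask_rcnn_loss` assumes the equal per-term weighting of the original Mask R-CNN formulation.

```python
import numpy as np

def smooth_l1(pred, target):
    """Smooth L1 loss for box regression: 0.5*x^2 if |x| < 1, else |x| - 0.5."""
    diff = np.abs(pred - target)
    return np.where(diff < 1.0, 0.5 * diff ** 2, diff - 0.5).sum()

def softmax_cross_entropy(logits, label):
    """Softmax cross-entropy for the classification branch (one RoI)."""
    z = logits - logits.max()               # stabilize before exponentiating
    log_probs = z - np.log(np.exp(z).sum())
    return -log_probs[label]

def binary_cross_entropy(pred_mask, gt_mask, eps=1e-7):
    """Average per-pixel binary cross-entropy for the mask branch."""
    p = np.clip(pred_mask, eps, 1.0 - eps)  # avoid log(0)
    return -(gt_mask * np.log(p) + (1 - gt_mask) * np.log(1 - p)).mean()

def mask_rcnn_loss(box_pred, box_gt, logits, label, mask_pred, mask_gt):
    """Multi-task loss: L = L_cls + L_box + L_mask (equal weights assumed)."""
    return (softmax_cross_entropy(logits, label)
            + smooth_l1(box_pred, box_gt)
            + binary_cross_entropy(mask_pred, mask_gt))
```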
Q3. What is the way to train a CNN?
The choice of the feature extractor is essential because the CNN's architecture, number of parameters, and types of layers directly affect the speed, memory usage, and, most importantly, the performance of the Mask R-CNN.
Q4. What does the algorithm use to fine-tune the pre-trained CNNs?
The authors use SGD with a momentum of 0.9, learning rate of 0.0003, and batch size of 1 to fine-tune the pre-trained CNNs using the augmented dataset.
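As a rough sketch of the reported optimizer settings, the update rule of SGD with momentum can be written as follows. The dictionary-of-arrays parameter representation is purely illustrative; only the hyperparameter values come from the paper.

```python
import numpy as np

# Hyperparameters reported in the paper
LEARNING_RATE = 0.0003
MOMENTUM = 0.9
BATCH_SIZE = 1

def sgd_momentum_step(params, grads, velocity):
    """One SGD-with-momentum update per parameter:
    v <- momentum * v - lr * grad;  w <- w + v."""
    for name in params:
        velocity[name] = MOMENTUM * velocity[name] - LEARNING_RATE * grads[name]
        params[name] = params[name] + velocity[name]
    return params, velocity
```

With batch size 1, one such step is taken per training image.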
Q5. How long does it take to train the mask R-CNN models?
The models appear to overfit on the training dataset after 30 epochs, which results in performance degradation.
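Overfitting of this kind is commonly handled by early stopping on a validation metric. The `best_epoch` helper below is a hypothetical illustration of that idea, not part of the paper's training procedure.

```python
def best_epoch(val_scores, patience=5):
    """Return the index of the best-scoring epoch, scanning until the
    score has failed to improve for `patience` consecutive epochs.
    `val_scores` is a list of per-epoch validation scores (higher is better)."""
    best, best_i = float("-inf"), 0
    for i, score in enumerate(val_scores):
        if score > best:
            best, best_i = score, i        # new best checkpoint
        elif i - best_i >= patience:
            break                          # no improvement for `patience` epochs
    return best_i
```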
Q6. What is the purpose of this experiment?
In this experiment, the authors aim to know to what extent adding extra training images with new polyps can help the CNN feature extractors improve their performance.
Q7. What is the reason for the failure of Mask R-CNN?
The results confirm that with a better training dataset, Mask R-CNN will become a promising technique for polyp detection and segmentation, and using a deeper or more complex CNN feature extractor might become unnecessary.
Q8. What does the performance of the mask R-CNN with Resnet50 mean?
Mask R-CNN with Resnet50 outperformed the counterpart models in all evaluation metrics, with a recall of 83.49%, precision of 92.95%, Dice of 71.6%, and Jaccard of 63.9%.
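The Dice and Jaccard scores quoted here can be computed from binary segmentation masks as follows. This is a minimal sketch of the standard formulas, not the authors' evaluation code.

```python
import numpy as np

def dice_and_jaccard(pred, gt):
    """Dice = 2|A∩B| / (|A| + |B|); Jaccard = |A∩B| / |A∪B|,
    for binary masks `pred` (prediction) and `gt` (ground truth)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    dice = 2.0 * inter / (pred.sum() + gt.sum())
    jaccard = inter / union
    return dice, jaccard
```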
Q9. What is the reason for the adapted and evaluated Mask R-CNN?
In this paper, the authors adapted and evaluated Mask R-CNN with three recent CNN feature extractors, i.e., Resnet50, Resnet101, and Inception Resnet (v2), for polyp detection and segmentation.
Q10. What is the example of a new polyp?
As shown in Fig. 5, the new polyp images added to the training data helped Mask R-CNN with Inception Resnet (v2) predict a better mask for the polyp shown in the first column, correctly detect and segment the missed polyp shown in the second column, and correct the false-positive detection for the polyp shown in the third column.
Q11. What are the components of the mask R-CNN?
Each output produced by the Mask R-CNN consists of three components: a confidence value, the coordinates of a bounding box, and a mask (see Fig. 3).
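The three output components can be modeled as a small record type. The `PolypDetection` class and `keep_confident` helper below are hypothetical names used for illustration, and the 0.5 confidence threshold is an assumed example value, not one taken from the paper.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class PolypDetection:
    """One Mask R-CNN output (illustrative structure)."""
    confidence: float   # classification score in [0, 1]
    box: tuple          # bounding-box coordinates (x1, y1, x2, y2)
    mask: np.ndarray    # per-pixel binary segmentation mask

def keep_confident(detections, threshold=0.5):
    """Discard low-confidence outputs (threshold is an assumed example)."""
    return [d for d in detections if d.confidence >= threshold]
```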
Q12. What is the way to compare the Mask R-CNN with other methods?
Their Mask R-CNN with Resnet101 outperformed all the other methods, including FCN-VGG, with a Dice of 70.42% and Jaccard of 61.24%.