
Showing papers in "IEEE Transactions on Medical Imaging in 2020"


Journal ArticleDOI
TL;DR: UNet++ as mentioned in this paper proposes an efficient ensemble of U-Nets of varying depths, which partially share an encoder and co-learn simultaneously using deep supervision, leading to a highly flexible feature fusion scheme.
Abstract: The state-of-the-art models for medical image segmentation are variants of U-Net and fully convolutional networks (FCN). Despite their success, these models have two limitations: (1) their optimal depth is a priori unknown, requiring extensive architecture search or inefficient ensemble of models of varying depths; and (2) their skip connections impose an unnecessarily restrictive fusion scheme, forcing aggregation only at the same-scale feature maps of the encoder and decoder sub-networks. To overcome these two limitations, we propose UNet++, a new neural architecture for semantic and instance segmentation, by (1) alleviating the unknown network depth with an efficient ensemble of U-Nets of varying depths, which partially share an encoder and co-learn simultaneously using deep supervision; (2) redesigning skip connections to aggregate features of varying semantic scales at the decoder sub-networks, leading to a highly flexible feature fusion scheme; and (3) devising a pruning scheme to accelerate the inference speed of UNet++. We have evaluated UNet++ using six different medical image segmentation datasets, covering multiple imaging modalities such as computed tomography (CT), magnetic resonance imaging (MRI), and electron microscopy (EM), and demonstrating that (1) UNet++ consistently outperforms the baseline models for the task of semantic segmentation across different datasets and backbone architectures; (2) UNet++ enhances segmentation quality of varying-size objects—an improvement over the fixed-depth U-Net; (3) Mask RCNN++ (Mask R-CNN with UNet++ design) outperforms the original Mask R-CNN for the task of instance segmentation; and (4) pruned UNet++ models achieve significant speedup while showing only modest performance degradation. Our implementation and pre-trained models are available at https://github.com/MrGiovanni/UNetPlusPlus.
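
To make the deep-supervision and pruning ideas above concrete, here is a minimal PyTorch sketch (a toy stand-in of my own, not the released UNet++ code or its nested skip topology): several segmentation heads of increasing depth share one encoder, training averages the per-head losses, and inference can stop at a shallower head.

```python
import torch
import torch.nn as nn

class TinyNestedSeg(nn.Module):
    def __init__(self, ch=16, n_classes=1, n_heads=3):
        super().__init__()
        self.stem = nn.Conv2d(1, ch, 3, padding=1)
        # one extra conv block per "depth"; a segmentation head branches off at every depth
        self.blocks = nn.ModuleList(
            nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU()) for _ in range(n_heads))
        self.heads = nn.ModuleList(nn.Conv2d(ch, n_classes, 1) for _ in range(n_heads))

    def forward(self, x, max_depth=None):
        x = self.stem(x)
        outputs = []
        for depth, (block, head) in enumerate(zip(self.blocks, self.heads), start=1):
            x = block(x)
            outputs.append(head(x))
            if max_depth is not None and depth >= max_depth:   # "pruned" inference
                break
        return outputs

model = TinyNestedSeg()
imgs = torch.randn(2, 1, 64, 64)
masks = torch.randint(0, 2, (2, 1, 64, 64)).float()
outs = model(imgs)                                   # deep supervision: one loss per head
loss = sum(nn.functional.binary_cross_entropy_with_logits(o, masks) for o in outs) / len(outs)
loss.backward()
fast_pred = model(imgs, max_depth=1)[-1]             # keep only a shallow head at test time
```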

1,487 citations


Journal ArticleDOI
Yujin Oh, Sangjoon Park, Jong Chul Ye
TL;DR: Experimental results show that the proposed patch-based convolutional neural network approach achieves state-of-the-art performance and provides clinically interpretable saliency maps, which are useful for COVID-19 diagnosis and patient triage.
Abstract: Under the global pandemic of COVID-19, the use of artificial intelligence to analyze chest X-ray (CXR) image for COVID-19 diagnosis and patient triage is becoming important. Unfortunately, due to the emergent nature of the COVID-19 pandemic, a systematic collection of CXR data set for deep neural network training is difficult. To address this problem, here we propose a patch-based convolutional neural network approach with a relatively small number of trainable parameters for COVID-19 diagnosis. The proposed method is inspired by our statistical analysis of the potential imaging biomarkers of the CXR radiographs. Experimental results show that our method achieves state-of-the-art performance and provides clinically interpretable saliency maps, which are useful for COVID-19 diagnosis and patient triage.
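
As a rough illustration of patch-based inference on a chest X-ray, the sketch below crops random patches, scores each with a small classifier, and aggregates by majority vote; the patch size, patch count, aggregation rule, and the tiny classifier are all illustrative assumptions rather than the authors' configuration.

```python
import torch
import torch.nn as nn

# placeholder 3-class patch classifier (e.g. normal / other pneumonia / COVID-19)
classifier = nn.Sequential(nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
                           nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 3))

def patch_vote(image, n_patches=100, size=224):
    """image: (1, H, W) tensor; returns a majority-voted image-level class index."""
    _, h, w = image.shape
    votes = torch.zeros(3)
    with torch.no_grad():
        for _ in range(n_patches):
            y = torch.randint(0, h - size + 1, (1,)).item()
            x = torch.randint(0, w - size + 1, (1,)).item()
            patch = image[:, y:y + size, x:x + size].unsqueeze(0)   # (1, 1, size, size)
            votes[classifier(patch).argmax()] += 1
    return votes.argmax()

label = patch_vote(torch.rand(1, 512, 512))
```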

684 citations


Journal ArticleDOI
TL;DR: This paper proposes a COVID-19 Lung Infection Segmentation Deep Network (Inf-Net) to automatically identify infected regions from chest CT slices, in which a parallel partial decoder aggregates the high-level features and generates a global map.
Abstract: Coronavirus Disease 2019 (COVID-19) spread globally in early 2020, causing the world to face an existential health crisis. Automated detection of lung infections from computed tomography (CT) images offers a great potential to augment the traditional healthcare strategy for tackling COVID-19. However, segmenting infected regions from CT slices faces several challenges, including high variation in infection characteristics, and low intensity contrast between infections and normal tissues. Further, collecting a large amount of data is impractical within a short time period, inhibiting the training of a deep model. To address these challenges, a novel COVID-19 Lung Infection Segmentation Deep Network ( Inf-Net ) is proposed to automatically identify infected regions from chest CT slices. In our Inf-Net , a parallel partial decoder is used to aggregate the high-level features and generate a global map. Then, the implicit reverse attention and explicit edge-attention are utilized to model the boundaries and enhance the representations. Moreover, to alleviate the shortage of labeled data, we present a semi-supervised segmentation framework based on a randomly selected propagation strategy, which only requires a few labeled images and leverages primarily unlabeled data. Our semi-supervised framework can improve the learning ability and achieve a higher performance. Extensive experiments on our COVID-SemiSeg and real CT volumes demonstrate that the proposed Inf-Net outperforms most cutting-edge segmentation models and advances the state-of-the-art performance.
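
The reverse-attention step mentioned above can be pictured roughly as follows (my reading of the general idea, not the released Inf-Net code): a coarse global map is upsampled, inverted, and used to re-weight features so that refinement focuses on regions not yet captured.

```python
import torch
import torch.nn.functional as F

def reverse_attention(features, coarse_map):
    """features: (B, C, H, W); coarse_map: (B, 1, h, w) coarse prediction logits."""
    coarse = F.interpolate(coarse_map, size=features.shape[-2:], mode="bilinear",
                           align_corners=False)
    rev = 1.0 - torch.sigmoid(coarse)      # emphasize currently missed regions and boundaries
    return features * rev                  # re-weighted features for the refinement branch

feats = torch.randn(2, 64, 44, 44)
global_map = torch.randn(2, 1, 11, 11)
refined = reverse_attention(feats, global_map)
```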

633 citations


Journal ArticleDOI
TL;DR: A weakly-supervised deep learning model can accurately predict the COVID-19 infection probability and discover lesion regions in chest CT without the need for annotating the lesions for training.
Abstract: Accurate and rapid diagnosis of COVID-19 suspected cases plays a crucial role in timely quarantine and medical treatment. Developing a deep learning-based model for automatic COVID-19 diagnosis on chest CT is helpful to counter the outbreak of SARS-CoV-2. A weakly-supervised deep learning framework was developed using 3D CT volumes for COVID-19 classification and lesion localization. For each patient, the lung region was segmented using a pre-trained UNet; then the segmented 3D lung region was fed into a 3D deep neural network to predict the probability of COVID-19 infection; the COVID-19 lesions were localized by combining the activation regions in the classification network and the unsupervised connected components. 499 CT volumes were used for training and 131 CT volumes were used for testing. Our algorithm obtained 0.959 ROC AUC and 0.976 PR AUC. When using a probability threshold of 0.5 to classify COVID-positive and COVID-negative, the algorithm obtained an accuracy of 0.901, a positive predictive value of 0.840 and a very high negative predictive value of 0.982. The algorithm took only 1.93 seconds to process a single patient’s CT volume using a dedicated GPU. Our weakly-supervised deep learning model can accurately predict the COVID-19 infection probability and discover lesion regions in chest CT without the need for annotating the lesions for training. The easily-trained and high-performance deep learning algorithm provides a fast way to identify COVID-19 patients, which is beneficial to control the outbreak of SARS-CoV-2. The developed deep learning software is available at https://github.com/sydney0zq/covid-19-detection.

490 citations


Journal ArticleDOI
TL;DR: A novel deep network, derived from Spatial Transformer Networks, is presented, which simultaneously predicts the disease severity score associated with an input frame and provides localization of pathological artefacts in a weakly-supervised way.
Abstract: Deep learning (DL) has proved successful in medical imaging and, in the wake of the recent COVID-19 pandemic, some works have started to investigate DL-based solutions for the assisted diagnosis of lung diseases. While existing works focus on CT scans, this paper studies the application of DL techniques for the analysis of lung ultrasonography (LUS) images. Specifically, we present a novel fully-annotated dataset of LUS images collected from several Italian hospitals, with labels indicating the degree of disease severity at a frame-level, video-level, and pixel-level (segmentation masks). Leveraging these data, we introduce several deep models that address relevant tasks for the automatic analysis of LUS images. In particular, we present a novel deep network, derived from Spatial Transformer Networks, which simultaneously predicts the disease severity score associated with an input frame and provides localization of pathological artefacts in a weakly-supervised way. Furthermore, we introduce a new method based on uninorms for effective frame score aggregation at a video-level. Finally, we benchmark state-of-the-art deep models for estimating pixel-level segmentations of COVID-19 imaging biomarkers. Experiments on the proposed dataset demonstrate satisfactory results on all the considered tasks, paving the way to future research on DL for the assisted diagnosis of COVID-19 from LUS data.

398 citations


Journal ArticleDOI
TL;DR: A noise-robust Dice loss that is a generalization of Dice loss for segmentation and Mean Absolute Error (MAE) loss for robustness against noise is introduced and combined with an adaptive self-ensembling framework for training.
Abstract: Segmentation of pneumonia lesions from CT scans of COVID-19 patients is important for accurate diagnosis and follow-up. Deep learning has a potential to automate this task but requires a large set of high-quality annotations that are difficult to collect. Learning from noisy training labels that are easier to obtain has a potential to alleviate this problem. To this end, we propose a novel noise-robust framework to learn from noisy labels for the segmentation task. We first introduce a noise-robust Dice loss that is a generalization of Dice loss for segmentation and Mean Absolute Error (MAE) loss for robustness against noise, then propose a novel COVID-19 Pneumonia Lesion segmentation network (COPLE-Net) to better deal with the lesions with various scales and appearances. The noise-robust Dice loss and COPLE-Net are combined with an adaptive self-ensembling framework for training, where an Exponential Moving Average (EMA) of a student model is used as a teacher model that is adaptively updated by suppressing the contribution of the student to EMA when the student has a large training loss. The student model is also adaptive by learning from the teacher only when the teacher outperforms the student. Experimental results showed that: (1) our noise-robust Dice loss outperforms existing noise-robust loss functions, (2) the proposed COPLE-Net achieves higher performance than state-of-the-art image segmentation networks, and (3) our framework with adaptive self-ensembling significantly outperforms a standard training process and surpasses other noise-robust training approaches in the scenario of learning from noisy labels for COVID-19 pneumonia lesion segmentation.
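
A minimal sketch of a noise-robust Dice loss written from the description above (treat the exact form and the gamma value as assumptions, not the released code): with gamma = 2 it reduces to a squared-denominator Dice loss, with gamma = 1 it behaves like a normalized MAE, and intermediate values trade segmentation sensitivity for robustness to label noise.

```python
import torch

def noise_robust_dice_loss(probs, target, gamma=1.5, eps=1e-5):
    """probs, target: tensors of the same shape, probs in [0, 1]; gamma in [1, 2]."""
    numerator = torch.sum(torch.abs(probs - target) ** gamma)
    denominator = torch.sum(probs ** 2) + torch.sum(target ** 2) + eps
    return numerator / denominator

pred = torch.sigmoid(torch.randn(2, 1, 32, 32))
mask = torch.randint(0, 2, (2, 1, 32, 32)).float()
print(noise_robust_dice_loss(pred, mask).item())
```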

331 citations


Journal ArticleDOI
TL;DR: In this paper, a two-stage architecture and training procedure was proposed for breast cancer screening exam classification, trained and evaluated on over 200,000 exams (over 1,000,000 images).
Abstract: We present a deep convolutional neural network for breast cancer screening exam classification, trained and evaluated on over 200,000 exams (over 1,000,000 images). Our network achieves an AUC of 0.895 in predicting the presence of cancer in the breast, when tested on the screening population. We attribute the high accuracy to a few technical advances. 1) Our network’s novel two-stage architecture and training procedure, which allows us to use a high-capacity patch-level network to learn from pixel-level labels alongside a network learning from macroscopic breast-level labels. 2) A custom ResNet-based network used as a building block of our model, whose balance of depth and width is optimized for high-resolution medical images. 3) Pretraining the network on screening BI-RADS classification, a related task with more noisy labels. 4) Combining multiple input views in an optimal way among a number of possible choices. To validate our model, we conducted a reader study with 14 readers, each reading 720 screening mammogram exams, and show that our model is as accurate as experienced radiologists when presented with the same data. We also show that a hybrid model, averaging the probability of malignancy predicted by a radiologist with a prediction of our neural network, is more accurate than either of the two separately. To further understand our results, we conduct a thorough analysis of our network’s performance on different subpopulations of the screening population, the model’s design, training procedure, errors, and properties of its internal representations. Our best models are publicly available at https://github.com/nyukat/breast_cancer_classifier.

317 citations


Journal ArticleDOI
TL;DR: This paper proposes a fully data-driven deep learning algorithm for k-space interpolation, which can also be easily applied to non-Cartesian trajectories by adding an additional regridding layer.
Abstract: The annihilating filter-based low-rank Hankel matrix approach (ALOHA) is one of the state-of-the-art compressed sensing approaches that directly interpolates the missing k-space data using low-rank Hankel matrix completion. The success of ALOHA is due to the concise signal representation in the k-space domain, thanks to the duality between structured low-rankness in the k-space domain and the image domain sparsity. Inspired by the recent mathematical discovery that links convolutional neural networks to Hankel matrix decomposition using data-driven framelet basis, here we propose a fully data-driven deep learning algorithm for k-space interpolation. Our network can also be easily applied to non-Cartesian k-space trajectories by simply adding an additional regridding layer. Extensive numerical experiments show that the proposed deep learning method consistently outperforms the existing image-domain deep learning approaches.

261 citations


Journal ArticleDOI
TL;DR: This paper proposes a semi-supervised deep learning approach to recover high-resolution (HR) CT images from low-resolution (LR) counterparts by enforcing cycle-consistency in terms of the Wasserstein distance.
Abstract: In this paper, we present a semi-supervised deep learning approach to accurately recover high-resolution (HR) CT images from low-resolution (LR) counterparts. Specifically, with the generative adversarial network (GAN) as the building block, we enforce the cycle-consistency in terms of the Wasserstein distance to establish a nonlinear end-to-end mapping from noisy LR input images to denoised and deblurred HR outputs. We also include the joint constraints in the loss function to facilitate structural preservation. In this process, we incorporate deep convolutional neural network (CNN), residual learning, and network in network techniques for feature extraction and restoration. In contrast to the current trend of increasing network depth and complexity to boost the imaging performance, we apply a parallel 1×1 CNN to compress the output of the hidden layer and optimize the number of layers and the number of filters for each convolutional layer. The quantitative and qualitative evaluative results demonstrate that our proposed model is accurate, efficient and robust for super-resolution (SR) image restoration from noisy LR input images. In particular, we validate our composite SR networks on three large-scale CT datasets, and obtain promising results as compared to the other state-of-the-art methods.
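
The cycle-consistency constraint mentioned above can be sketched as follows, with trivial placeholder generators standing in for the paper's GAN building blocks: images mapped LR to HR and back (and HR to LR and back) should return to their starting point even without paired supervision.

```python
import torch
import torch.nn as nn

# placeholder generators: LR -> HR and HR -> LR (the real ones are deep GAN generators)
g_up = nn.Sequential(nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
                     nn.Conv2d(1, 1, 3, padding=1))
g_down = nn.Sequential(nn.Conv2d(1, 1, 3, padding=1), nn.AvgPool2d(2))

lr = torch.rand(2, 1, 32, 32)
hr = torch.rand(2, 1, 64, 64)
# cycle-consistency: LR -> HR -> LR and HR -> LR -> HR should reconstruct the inputs
cycle_loss = (nn.functional.l1_loss(g_down(g_up(lr)), lr) +
              nn.functional.l1_loss(g_up(g_down(hr)), hr))
cycle_loss.backward()
```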

257 citations


Journal ArticleDOI
TL;DR: This paper develops a dual-sampling attention network to automatically diagnose COVID-19 and distinguish it from community-acquired pneumonia (CAP) in chest computed tomography (CT), and proposes a novel online attention module with a 3D convolutional neural network (CNN) to focus on the infection regions in the lungs when making diagnostic decisions.
Abstract: The coronavirus disease (COVID-19) is rapidly spreading all over the world, and has infected more than 1,436,000 people in more than 200 countries and territories as of April 9, 2020. Detecting COVID-19 at early stage is essential to deliver proper healthcare to the patients and also to protect the uninfected population. To this end, we develop a dual-sampling attention network to automatically diagnose COVID-19 from the community acquired pneumonia (CAP) in chest computed tomography (CT). In particular, we propose a novel online attention module with a 3D convolutional network (CNN) to focus on the infection regions in lungs when making decisions of diagnoses. Note that there exists imbalanced distribution of the sizes of the infection regions between COVID-19 and CAP, partially due to fast progress of COVID-19 after symptom onset. Therefore, we develop a dual-sampling strategy to mitigate the imbalanced learning. Our method is evaluated (to our best knowledge) upon the largest multi-center CT data for COVID-19 from 8 hospitals. In the training-validation stage, we collect 2186 CT scans from 1588 patients for a 5-fold cross-validation. In the testing stage, we employ another independent large-scale testing dataset including 2796 CT scans from 2057 patients. Results show that our algorithm can identify the COVID-19 images with the area under the receiver operating characteristic curve (AUC) value of 0.944, accuracy of 87.5%, sensitivity of 86.9%, specificity of 90.1%, and F1-score of 82.0%. With this performance, the proposed algorithm could potentially aid radiologists with COVID-19 diagnosis from CAP, especially in the early stage of the COVID-19 outbreak.

256 citations


Journal ArticleDOI
Neeraj Kumar, Ruchika Verma, Deepak Anand, Yanning Zhou, Omer Fahri Onder, E. D. Tsougenis, Hao Chen, Pheng-Ann Heng, Jiahui Li, Zhiqiang Hu, Yunzhi Wang, Navid Alemi Koohbanani, Mostafa Jahanifar, Neda Zamani Tajeddin, Ali Gooya, Nasir M. Rajpoot, Xuhua Ren, Sihang Zhou, Qian Wang, Dinggang Shen, Cheng-Kun Yang, Chi-Hung Weng, Wei-Hsiang Yu, Chao-Yuan Yeh, Shuang Yang, Shuoyu Xu, Pak-Hei Yeung, Peng Sun, Amirreza Mahbod, Gerald Schaefer, Isabella Ellinger, Rupert Ecker, Örjan Smedby, Chunliang Wang, Benjamin Chidester, That-Vinh Ton, Minh-Triet Tran, Jian Ma, Minh N. Do, Simon Graham, Quoc Dang Vu, Jin Tae Kwak, Akshaykumar Gunda, Raviteja Chunduri, Corey Hu, Xiaoyang Zhou, Dariush Lotfi, Reza Safdari, Antanas Kascenas, Alison O'Neil, Dennis Eschweiler, Johannes Stegmaier, Yanping Cui, Baocai Yin, Kailin Chen, Xinmei Tian, Philipp Gruening, Erhardt Barth, Elad Arbel, Itay Remer, Amir Ben-Dor, Ekaterina Sirazitdinova, Matthias Kohl, Stefan Braunewell, Yuexiang Li, Xinpeng Xie, Linlin Shen, Jun Ma, Krishanu Das Baksi, Mohammad Azam Khan, Jaegul Choo, Adrián Colomer, Valery Naranjo, Linmin Pei, Khan M. Iftekharuddin, Kaushiki Roy, Debotosh Bhattacharjee, Anibal Pedraza, Maria Gloria Bueno, Sabarinathan Devanathan, Saravanan Radhakrishnan, Praveen Koduganty, Zihan Wu, Guanyu Cai, Xiaojie Liu, Yuqin Wang, Amit Sethi
TL;DR: In the MoNuSeg 2018 challenge, color normalization and heavy data augmentation were among the trends that contributed to increased accuracy, and several of the top techniques compared favorably to an individual human annotator and can be used with confidence for nuclear morphometrics.
Abstract: Generalized nucleus segmentation techniques can contribute greatly to reducing the time to develop and validate visual biomarkers for new digital pathology datasets. We summarize the results of MoNuSeg 2018 Challenge whose objective was to develop generalizable nuclei segmentation techniques in digital pathology. The challenge was an official satellite event of the MICCAI 2018 conference in which 32 teams with more than 80 participants from geographically diverse institutes participated. Contestants were given a training set with 30 images from seven organs with annotations of 21,623 individual nuclei. A test dataset with 14 images taken from seven organs, including two organs that did not appear in the training set was released without annotations. Entries were evaluated based on average aggregated Jaccard index (AJI) on the test set to prioritize accurate instance segmentation as opposed to mere semantic segmentation. More than half the teams that completed the challenge outperformed a previous baseline. Among the trends observed that contributed to increased accuracy were the use of color normalization as well as heavy data augmentation. Additionally, fully convolutional networks inspired by variants of U-Net, FCN, and Mask-RCNN were popularly used, typically based on ResNet or VGG base architectures. Watershed segmentation on predicted semantic segmentation maps was a popular post-processing strategy. Several of the top techniques compared favorably to an individual human annotator and can be used with confidence for nuclear morphometrics.
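
For readers unfamiliar with the ranking metric, below is an illustrative implementation of the aggregated Jaccard index (AJI) following its usual definition (each ground-truth nucleus is matched to the predicted instance with the highest IoU, matched intersections and unions are accumulated, and unmatched predicted pixels are added to the union); the official challenge evaluation code remains the reference.

```python
import numpy as np

def aggregated_jaccard_index(gt, pred):
    """gt, pred: integer instance maps of equal shape (0 = background, 1..N = nucleus ids)."""
    used_pred, inter_sum, union_sum = set(), 0, 0
    pred_ids = [p for p in np.unique(pred) if p != 0]
    for g in np.unique(gt):
        if g == 0:
            continue
        g_mask = gt == g
        # best-matching predicted instance for this ground-truth nucleus
        best_iou, best_p, best_i, best_u = 0.0, None, 0, int(g_mask.sum())
        for p in pred_ids:
            p_mask = pred == p
            inter = int(np.logical_and(g_mask, p_mask).sum())
            if inter == 0:
                continue
            union = int(np.logical_or(g_mask, p_mask).sum())
            if inter / union > best_iou:
                best_iou, best_p, best_i, best_u = inter / union, p, inter, union
        inter_sum += best_i
        union_sum += best_u
        if best_p is not None:
            used_pred.add(best_p)
    for p in pred_ids:                       # unmatched predictions penalize the union
        if p not in used_pred:
            union_sum += int((pred == p).sum())
    return inter_sum / union_sum if union_sum else 0.0
```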

Journal ArticleDOI
TL;DR: A deep stacked transformation approach for domain generalization that can be generalized to the design of highly robust deep segmentation models for clinical deployment and reaches the performance of state-of-the-art fully supervised models that are trained and tested on their source domains.
Abstract: Recent advances in deep learning for medical image segmentation demonstrate expert-level accuracy. However, application of these models in clinically realistic environments can result in poor generalization and decreased accuracy, mainly due to the domain shift across different hospitals, scanner vendors, imaging protocols, and patient populations, etc. Common transfer learning and domain adaptation techniques are proposed to address this bottleneck. However, these solutions require data (and annotations) from the target domain to retrain the model, and are therefore restrictive in practice for widespread model deployment. Ideally, we wish to have a trained (locked) model that can work uniformly well across unseen domains without further training. In this paper, we propose a deep stacked transformation approach for domain generalization. Specifically, a series of n stacked transformations are applied to each image during network training. The underlying assumption is that the “expected” domain shift for a specific medical imaging modality could be simulated by applying extensive data augmentation on a single source domain, and consequently, a deep model trained on the augmented “big” data (BigAug) could generalize well on unseen domains. We exploit four surprisingly effective, but previously understudied, image-based characteristics for data augmentation to overcome the domain generalization problem. We train and evaluate the BigAug model (with n = 9 transformations) on three different 3D segmentation tasks (prostate gland, left atrial, left ventricle) covering two medical imaging modalities (MRI and ultrasound) involving eight publicly available challenge datasets. The results show that when training on a relatively small dataset (n = 10~32 volumes, depending on the size of the available datasets) from a single source domain: (i) BigAug models degrade an average of 11% (Dice score change) from source to unseen domain, substantially better than conventional augmentation (degrading 39%) and a CycleGAN-based domain adaptation method (degrading 25%), (ii) BigAug is better than “shallower” stacked transforms (i.e. those with fewer transforms) on unseen domains and demonstrates modest improvement over conventional augmentation on the source domain, (iii) after training with BigAug on one source domain, performance on an unseen domain is similar to training a model from scratch on that domain when using the same number of training samples. When training on large datasets (n = 465 volumes) with BigAug, (iv) application to unseen domains reaches the performance of state-of-the-art fully supervised models that are trained and tested on their source domains. These findings establish a strong benchmark for the study of domain generalization in medical imaging, and can be generalized to the design of highly robust deep segmentation models for clinical deployment.
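
A loose sketch of the stacked-transformation idea: each training image passes through a chain of randomly parameterized appearance perturbations that simulate the expected domain shift from a single source domain. The concrete transforms and parameter ranges below are placeholders, not the paper's nine operations.

```python
import numpy as np

def stacked_transforms(img, rng, p=0.5):
    """img: float image array scaled to roughly [0, 1]; rng: numpy Generator."""
    if rng.random() < p:                                    # brightness shift
        img = img + rng.uniform(-0.1, 0.1)
    if rng.random() < p:                                    # contrast scaling
        img = (img - img.mean()) * rng.uniform(0.8, 1.2) + img.mean()
    if rng.random() < p:                                    # additive Gaussian noise
        img = img + rng.normal(0.0, 0.03, size=img.shape)
    if rng.random() < p:                                    # gamma (intensity) perturbation
        img = np.clip(img, 0, 1) ** rng.uniform(0.7, 1.5)
    return np.clip(img, 0, 1)

rng = np.random.default_rng(0)
augmented = stacked_transforms(np.random.rand(64, 64), rng)
```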

Journal ArticleDOI
TL;DR: Experimental results show that the proposed novel Context Pyramid Fusion Network (named CPFNet) is very competitive with other state-of-the-art methods on four different challenging tasks, including skin lesion segmentation, retinal linear lesion segmentation, multi-class segmentation of thoracic organs at risk and multi-class segmentation of retinal edema lesions.
Abstract: Accurate and automatic segmentation of medical images is a crucial step for clinical diagnosis and analysis. The convolutional neural network (CNN) approaches based on the U-shape structure have achieved remarkable performances in many different medical image segmentation tasks. However, the context information extraction capability of single stage is insufficient in this structure, due to the problems such as imbalanced class and blurred boundary. In this paper, we propose a novel Context Pyramid Fusion Network (named CPFNet) by combining two pyramidal modules to fuse global/multi-scale context information. Based on the U-shape structure, we first design multiple global pyramid guidance (GPG) modules between the encoder and the decoder, aiming at providing different levels of global context information for the decoder by reconstructing skip-connection. We further design a scale-aware pyramid fusion (SAPF) module to dynamically fuse multi-scale context information in high-level features. These two pyramidal modules can exploit and fuse rich context information progressively. Experimental results show that our proposed method is very competitive with other state-of-the-art methods on four different challenging tasks, including skin lesion segmentation, retinal linear lesion segmentation, multi-class segmentation of thoracic organs at risk and multi-class segmentation of retinal edema lesions.

Journal ArticleDOI
TL;DR: In this paper, the authors proposed a novel unsupervised domain adaptation framework, named as synergistic image and feature alignment (SIFA), to effectively adapt a segmentation network to an unlabeled target domain.
Abstract: Unsupervised domain adaptation has increasingly gained interest in medical image computing, aiming to tackle the performance degradation of deep neural networks when being deployed to unseen data with heterogeneous characteristics. In this work, we present a novel unsupervised domain adaptation framework, named as Synergistic Image and Feature Alignment (SIFA) , to effectively adapt a segmentation network to an unlabeled target domain. Our proposed SIFA conducts synergistic alignment of domains from both image and feature perspectives. In particular, we simultaneously transform the appearance of images across domains and enhance domain-invariance of the extracted features by leveraging adversarial learning in multiple aspects and with a deeply supervised mechanism. The feature encoder is shared between both adaptive perspectives to leverage their mutual benefits via end-to-end learning. We have extensively evaluated our method with cardiac substructure segmentation and abdominal multi-organ segmentation for bidirectional cross-modality adaptation between MRI and CT images. Experimental results on two different tasks demonstrate that our SIFA method is effective in improving segmentation performance on unlabeled target images, and outperforms the state-of-the-art domain adaptation approaches by a large margin.

Journal ArticleDOI
TL;DR: This paper proposes an attention-based deep 3D multiple instance learning (AD3D-MIL) where a patient-level label is assigned to a 3D chest CT that is viewed as a bag of instances, which can semantically generate deep 3D instances following the possible infection area.
Abstract: Automated screening of COVID-19 from chest CT became urgent and important during the worldwide outbreak of SARS-CoV-2 in 2020. However, accurate screening of COVID-19 is still a massive challenge due to the spatial complexity of 3D volumes, the labeling difficulty of infection areas, and the slight discrepancy between COVID-19 and other viral pneumonia in chest CT. While a few pioneering works have made significant progress, they either demand manual annotations of infection areas or lack interpretability. In this paper, we report our attempt towards achieving highly accurate and interpretable screening of COVID-19 from chest CT with weak labels. We propose an attention-based deep 3D multiple instance learning (AD3D-MIL) where a patient-level label is assigned to a 3D chest CT that is viewed as a bag of instances. AD3D-MIL can semantically generate deep 3D instances following the possible infection area. AD3D-MIL further applies an attention-based pooling approach to 3D instances to provide insight into each instance’s contribution to the bag label. AD3D-MIL finally learns Bernoulli distributions of the bag-level labels for more accessible learning. We collected 460 chest CT examples: 230 CT examples from 79 patients with COVID-19, 100 CT examples from 100 patients with common pneumonia, and 130 CT examples from 130 people without pneumonia. A series of empirical studies show that our algorithm achieves an overall accuracy of 97.9%, AUC of 99.0%, and Cohen kappa score of 95.7%. These advantages make our algorithm an efficient assistive tool in the screening of COVID-19.
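
The attention-based pooling over 3D instances can be sketched as below (dimensions and the classifier head are illustrative choices, not the paper's exact architecture): each instance embedding receives a learned attention weight, and the bag (patient) representation is their weighted sum, which also exposes per-instance contributions.

```python
import torch
import torch.nn as nn

class AttentionMILPooling(nn.Module):
    def __init__(self, dim=256, hidden=128):
        super().__init__()
        self.attn = nn.Sequential(nn.Linear(dim, hidden), nn.Tanh(), nn.Linear(hidden, 1))
        self.classifier = nn.Linear(dim, 1)   # bag-level (patient-level) logit

    def forward(self, instances):
        """instances: (num_instances, dim) embeddings of one CT bag."""
        a = torch.softmax(self.attn(instances), dim=0)    # (num_instances, 1) attention weights
        bag = (a * instances).sum(dim=0)                  # weighted bag embedding
        return self.classifier(bag), a.squeeze(-1)        # bag logit + interpretable weights

logit, weights = AttentionMILPooling()(torch.randn(12, 256))
```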

Journal ArticleDOI
TL;DR: This paper presents novel loss functions for training convolutional neural network (CNN)-based segmentation methods with the goal of reducing HD directly, and suggests three loss functions that can be used for training to reduce HD.
Abstract: The Hausdorff Distance (HD) is widely used in evaluating medical image segmentation methods. However, the existing segmentation methods do not attempt to reduce HD directly. In this paper, we present novel loss functions for training convolutional neural network (CNN)-based segmentation methods with the goal of reducing HD directly. We propose three methods to estimate HD from the segmentation probability map produced by a CNN. One method makes use of the distance transform of the segmentation boundary. Another method is based on applying morphological erosion on the difference between the true and estimated segmentation maps. The third method works by applying circular/spherical convolution kernels of different radii on the segmentation probability maps. Based on these three methods for estimating HD, we suggest three loss functions that can be used for training to reduce HD. We use these loss functions to train CNNs for segmentation of the prostate, liver, and pancreas in ultrasound, magnetic resonance, and computed tomography images and compare the results with commonly-used loss functions. Our results show that the proposed loss functions can lead to approximately 18–45% reduction in HD without degrading other segmentation performance criteria such as the Dice similarity coefficient. The proposed loss functions can be used for training medical image segmentation methods in order to reduce the large segmentation errors.
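
As an illustration of the first (distance-transform) variant, a rough PyTorch/SciPy sketch is given below; the exact weighting and the value of alpha are assumptions based on the abstract, and the distance maps are treated as fixed weights so that the loss stays differentiable with respect to the predictions.

```python
import torch
from scipy.ndimage import distance_transform_edt

def boundary_distance_map(mask):
    """Approximate unsigned distance of every pixel to the mask boundary (numpy 0/1 mask)."""
    return distance_transform_edt(mask) + distance_transform_edt(1 - mask)

def hd_style_loss(probs, target, alpha=2.0):
    """probs: tensor in [0,1], shape (H, W); target: binary tensor of the same shape."""
    dt_gt = torch.from_numpy(boundary_distance_map(target.numpy())).float()
    dt_pr = torch.from_numpy(boundary_distance_map((probs > 0.5).float().numpy())).float()
    weight = dt_gt ** alpha + dt_pr ** alpha          # constant weight map for this iteration
    return ((probs - target) ** 2 * weight).mean()    # large errors far from the boundary cost more

pred = torch.sigmoid(torch.randn(64, 64))
gt = (torch.rand(64, 64) > 0.5).float()
print(hd_style_loss(pred, gt).item())
```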

Journal ArticleDOI
Hyunseok Seo, Charles Huang, Maxime Bassenne, Ruoxiu Xiao, Lei Xing
TL;DR: In this paper, a residual path with deconvolution and activation operations was added to the skip connection of the U-Net to avoid duplication of low-resolution information of features.
Abstract: Segmentation of livers and liver tumors is one of the most important steps in radiation therapy of hepatocellular carcinoma. The segmentation task is often done manually, making it tedious, labor intensive, and subject to intra-/inter-operator variations. While various algorithms for delineating organs-at-risk (OARs) and tumor targets have been proposed, automatic segmentation of livers and liver tumors remains intractable due to their low tissue contrast with respect to the surrounding organs and their deformable shape in CT images. The U-Net has gained increasing popularity recently for image analysis tasks and has shown promising results. Conventional U-Net architectures, however, suffer from three major drawbacks. First, skip connections allow for the duplicated transfer of low resolution information in feature maps to improve efficiency in learning, but this often leads to blurring of extracted image features. Secondly, high level features extracted by the network often do not contain enough high resolution edge information of the input, leading to greater uncertainty where high resolution edge dominantly affects the network’s decisions such as liver and liver-tumor segmentation. Thirdly, it is generally difficult to optimize the number of pooling operations in order to extract high level global features, since the number of pooling operations used depends on the object size. To cope with these problems, we added a residual path with deconvolution and activation operations to the skip connection of the U-Net to avoid duplication of low resolution information of features. In the case of small object inputs, features in the skip connection are not incorporated with features in the residual path. Furthermore, the proposed architecture has additional convolution layers in the skip connection in order to extract high level global features of small object inputs as well as high level features of high resolution edge information of large object inputs. Efficacy of the modified U-Net (mU-Net) was demonstrated using the public dataset of the Liver Tumor Segmentation (LiTS) challenge 2017. For liver-tumor segmentation, a Dice similarity coefficient (DSC) of 89.72%, a volume of error (VOE) of 21.93%, and a relative volume difference (RVD) of −0.49% were obtained. For liver segmentation, a DSC of 98.51%, a VOE of 3.07%, and an RVD of 0.26% were calculated. For the public 3D Image Reconstruction for Comparison of Algorithm Database (3Dircadb), DSCs were 96.01% for the liver and 68.14% for liver-tumor segmentations, respectively. The proposed mU-Net outperformed existing state-of-the-art networks.

Journal ArticleDOI
TL;DR: This paper proposes the mutual bootstrapping deep convolutional neural networks (MB-DCNN) model, which is superior to the performance of representative state-of-the-art skin lesion segmentation and classification methods.
Abstract: Automated skin lesion segmentation and classification are two most essential and related tasks in the computer-aided diagnosis of skin cancer. Despite their prevalence, deep learning models are usually designed for only one task, ignoring the potential benefits in jointly performing both tasks. In this paper, we propose the mutual bootstrapping deep convolutional neural networks (MB-DCNN) model for simultaneous skin lesion segmentation and classification. This model consists of a coarse segmentation network (coarse-SN), a mask-guided classification network (mask-CN), and an enhanced segmentation network (enhanced-SN). On one hand, the coarse-SN generates coarse lesion masks that provide a prior bootstrapping for mask-CN to help it locate and classify skin lesions accurately. On the other hand, the lesion localization maps produced by mask-CN are then fed into enhanced–SN, aiming to transfer the localization information learned by mask-CN to enhanced-SN for accurate lesion segmentation. In this way, both segmentation and classification networks mutually transfer knowledge between each other and facilitate each other in a bootstrapping way. Meanwhile, we also design a novel rank loss and jointly use it with the Dice loss in segmentation networks to address the issues caused by class imbalance and hard-easy pixel imbalance. We evaluate the proposed MB-DCNN model on the ISIC-2017 and PH2 datasets, and achieve a Jaccard index of 80.4% and 89.4% in skin lesion segmentation and an average AUC of 93.8% and 97.7% in skin lesion classification, which are superior to the performance of representative state-of-the-art skin lesion segmentation and classification methods. Our results suggest that it is possible to boost the performance of skin lesion segmentation and classification simultaneously via training a unified model to perform both tasks in a mutual bootstrapping way.

Journal ArticleDOI
TL;DR: A novel cross-disease attention network (CANet) is presented to jointly grade DR and DME by exploring the internal relationship between the diseases with only image-level supervision and achieves the best result on the ISBI 2018 IDRiD challenge dataset and outperforms other methods on the Messidor dataset.
Abstract: Diabetic retinopathy (DR) and diabetic macular edema (DME) are the leading causes of permanent blindness in the working-age population. Automatic grading of DR and DME helps ophthalmologists design tailored treatments for patients, and thus is of vital importance in clinical practice. However, prior works either grade DR or DME, and ignore the correlation between DR and its complication, i.e., DME. Moreover, the location information, e.g., macula and soft/hard exudate annotations, is widely used as a prior for grading. Such annotations are costly to obtain; hence, it is desirable to develop automatic grading methods with only image-level supervision. In this article, we present a novel cross-disease attention network (CANet) to jointly grade DR and DME by exploring the internal relationship between the diseases with only image-level supervision. Our key contributions include the disease-specific attention module to selectively learn useful features for individual diseases, and the disease-dependent attention module to further capture the internal relationship between the two diseases. We integrate these two attention modules in a deep network to produce disease-specific and disease-dependent features, and to maximize the overall performance jointly for grading DR and DME. We evaluate our network on two public benchmark datasets, i.e., the ISBI 2018 IDRiD challenge dataset and the Messidor dataset. Our method achieves the best result on the ISBI 2018 IDRiD challenge dataset and outperforms other methods on the Messidor dataset. Our code is publicly available at https://github.com/xmengli999/CANet.

Journal ArticleDOI
TL;DR: A conceptually simple framework that can efficiently predict whether or not a CT scan contains pneumonia while simultaneously identifying pneumonia types between COVID-19 and Interstitial Lung Disease (ILD) caused by other viruses is proposed.
Abstract: We propose a conceptually simple framework for fast COVID-19 screening in 3D chest CT images. The framework can efficiently predict whether or not a CT scan contains pneumonia while simultaneously identifying pneumonia types between COVID-19 and Interstitial Lung Disease (ILD) caused by other viruses. In the proposed method, two 3D-ResNets are coupled together into a single model for the two above-mentioned tasks via a novel prior-attention strategy. We extend residual learning with the proposed prior-attention mechanism and design a new so-called prior-attention residual learning (PARL) block. The model can be easily built by stacking the PARL blocks and trained end-to-end using multi-task losses. More specifically, one 3D-ResNet branch is trained as a binary classifier using lung images with and without pneumonia so that it can highlight the lesion areas within the lungs. Simultaneously, inside the PARL blocks, prior-attention maps are generated from this branch and used to guide another branch to learn more discriminative representations for the pneumonia-type classification. Experimental results demonstrate that the proposed framework can significantly improve the performance of COVID-19 screening. Compared to other methods, it achieves a state-of-the-art result. Moreover, the proposed method can be easily extended to other similar clinical applications such as computer-aided detection and diagnosis of pulmonary nodules in CT images, glaucoma lesions in Retina fundus images, etc.

Journal ArticleDOI
TL;DR: This paper proposes a multi-site network (MS-Net) for improving prostate segmentation by learning robust representations from multiple sources of data, and develops Domain-Specific Batch Normalization layers in the network backbone, enabling the network to estimate statistics and perform feature normalization for each site separately.
Abstract: Automated prostate segmentation in MRI is highly demanded for computer-assisted diagnosis. Recently, a variety of deep learning methods have achieved remarkable progress in this task, usually relying on large amounts of training data. Due to the nature of scarcity for medical images, it is important to effectively aggregate data from multiple sites for robust model training, to alleviate the insufficiency of single-site samples. However, the prostate MRIs from different sites present heterogeneity due to the differences in scanners and imaging protocols, raising challenges for effective ways of aggregating multi-site data for network training. In this paper, we propose a novel multi-site network (MS-Net) for improving prostate segmentation by learning robust representations, leveraging multiple sources of data. To compensate for the inter-site heterogeneity of different MRI datasets, we develop Domain-Specific Batch Normalization layers in the network backbone, enabling the network to estimate statistics and perform feature normalization for each site separately. Considering the difficulty of capturing the shared knowledge from multiple datasets, a novel learning paradigm, i.e. , Multi-site-guided Knowledge Transfer, is proposed to enhance the kernels to extract more generic representations from multi-site data. Extensive experiments on three heterogeneous prostate MRI datasets demonstrate that our MS-Net improves the performance across all datasets consistently, and outperforms state-of-the-art methods for multi-site learning.
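
Domain-Specific Batch Normalization can be pictured with a few lines of PyTorch: convolution kernels are shared across sites while each site keeps its own normalization statistics and affine parameters, selected by a site index at run time. Layer sizes and the number of domains below are illustrative, not the MS-Net configuration.

```python
import torch
import torch.nn as nn

class DomainSpecificBN2d(nn.Module):
    def __init__(self, num_features, num_domains):
        super().__init__()
        # one BatchNorm (statistics + affine parameters) per site
        self.bns = nn.ModuleList(nn.BatchNorm2d(num_features) for _ in range(num_domains))

    def forward(self, x, domain_idx):
        return self.bns[domain_idx](x)

class SharedConvBlock(nn.Module):
    def __init__(self, in_ch, out_ch, num_domains=3):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, 3, padding=1)     # shared across all sites
        self.bn = DomainSpecificBN2d(out_ch, num_domains)       # per-site normalization
        self.act = nn.ReLU(inplace=True)

    def forward(self, x, domain_idx):
        return self.act(self.bn(self.conv(x), domain_idx))

block = SharedConvBlock(1, 16)
out = block(torch.randn(2, 1, 64, 64), domain_idx=1)   # a batch drawn from site 1
```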

Journal ArticleDOI
TL;DR: Pathomic fusion as discussed by the authors proposes an interpretable strategy for end-to-end multimodal fusion of histology image and genomic (mutations, CNV, RNASeq) features for survival outcome prediction.
Abstract: Cancer diagnosis, prognosis, and therapeutic response predictions are based on morphological information from histology slides and molecular profiles from genomic data. However, most deep learning-based objective outcome prediction and grading paradigms are based on histology or genomics alone and do not make use of the complementary information in an intuitive manner. In this work, we propose Pathomic Fusion, an interpretable strategy for end-to-end multimodal fusion of histology image and genomic (mutations, CNV, RNASeq) features for survival outcome prediction. Our approach models pairwise feature interactions across modalities by taking the Kronecker product of unimodal feature representations, and controls the expressiveness of each representation via a gating-based attention mechanism. Following supervised learning, we are able to interpret and saliently localize features across each modality, and understand how feature importance shifts when conditioning on multimodal input. We validate our approach using glioma and clear cell renal cell carcinoma datasets from the Cancer Genome Atlas (TCGA), which contains paired whole-slide image, genotype, and transcriptome data with ground truth survival and histologic grade labels. In a 15-fold cross-validation, our results demonstrate that the proposed multimodal fusion paradigm improves prognostic determinations from ground truth grading and molecular subtyping, as well as unimodal deep networks trained on histology and genomic data alone. The proposed method establishes insight and theory on how to train deep networks on multimodal biomedical data in an intuitive manner, which will be useful for other problems in medicine that seek to combine heterogeneous data streams for understanding diseases and predicting response and resistance to treatment. Code and trained models are made available at: https://github.com/mahmoodlab/PathomicFusion.
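
A compact sketch of gated Kronecker-product fusion as described in the abstract (layer sizes and the exact gating design are illustrative assumptions): each unimodal embedding is gated, a constant 1 is appended so that unimodal terms survive the product, and the outer product of the two augmented vectors captures pairwise cross-modal interactions.

```python
import torch
import torch.nn as nn

class GatedKroneckerFusion(nn.Module):
    def __init__(self, dim_h=32, dim_g=32, out_dim=64):
        super().__init__()
        self.gate_h = nn.Sequential(nn.Linear(dim_h + dim_g, dim_h), nn.Sigmoid())
        self.gate_g = nn.Sequential(nn.Linear(dim_h + dim_g, dim_g), nn.Sigmoid())
        self.post = nn.Linear((dim_h + 1) * (dim_g + 1), out_dim)

    def forward(self, h_hist, h_gen):
        """h_hist: (B, dim_h) histology embedding; h_gen: (B, dim_g) genomic embedding."""
        joint = torch.cat([h_hist, h_gen], dim=1)
        h = h_hist * self.gate_h(joint)                 # gating-based attention per modality
        g = h_gen * self.gate_g(joint)
        ones = torch.ones(h.size(0), 1, device=h.device)
        h1 = torch.cat([h, ones], dim=1)                # append 1 to keep unimodal terms
        g1 = torch.cat([g, ones], dim=1)
        fused = torch.bmm(h1.unsqueeze(2), g1.unsqueeze(1)).flatten(1)   # outer product
        return self.post(fused)

z = GatedKroneckerFusion()(torch.randn(4, 32), torch.randn(4, 32))
```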

Journal ArticleDOI
TL;DR: A novel Hybrid-fusion Network (Hi-Net) is proposed for multi-modal MR image synthesis, which learns a mapping from multi- modal source images to target images, and effectively exploits the correlations among multiple modalities.
Abstract: Magnetic resonance imaging (MRI) is a widely used neuroimaging technique that can provide images of different contrasts ( i.e. , modalities). Fusing this multi-modal data has proven particularly effective for boosting model performance in many tasks. However, due to poor data quality and frequent patient dropout, collecting all modalities for every patient remains a challenge. Medical image synthesis has been proposed as an effective solution, where any missing modalities are synthesized from the existing ones. In this paper, we propose a novel Hybrid-fusion Network (Hi-Net) for multi-modal MR image synthesis, which learns a mapping from multi-modal source images ( i.e. , existing modalities) to target images ( i.e. , missing modalities). In our Hi-Net, a modality-specific network is utilized to learn representations for each individual modality, and a fusion network is employed to learn the common latent representation of multi-modal data. Then, a multi-modal synthesis network is designed to densely combine the latent representation with hierarchical features from each modality, acting as a generator to synthesize the target images. Moreover, a layer-wise multi-modal fusion strategy effectively exploits the correlations among multiple modalities, where a Mixed Fusion Block (MFB) is proposed to adaptively weight different fusion strategies. Extensive experiments demonstrate the proposed model outperforms other state-of-the-art medical image synthesis methods.

Journal ArticleDOI
TL;DR: In this article, a conditional generative adversarial network (cGAN) was used to segment nuclei using synthetic and real histopathology data, and the network was trained with spectral normalization and gradient penalty for nuclei segmentation.
Abstract: Nuclei segmentation is a fundamental task for various computational pathology applications including nuclei morphology analysis, cell type classification, and cancer grading. Deep learning has emerged as a powerful approach to segmenting nuclei but the accuracy of convolutional neural networks (CNNs) depends on the volume and the quality of labeled histopathology data for training. In particular, conventional CNN-based approaches lack structured prediction capabilities, which are required to distinguish overlapping and clumped nuclei. Here, we present an approach to nuclei segmentation that overcomes these challenges by utilizing a conditional generative adversarial network (cGAN) trained with synthetic and real data. We generate a large dataset of H&E training images with perfect nuclei segmentation labels using an unpaired GAN framework. This synthetic data along with real histopathology data from six different organs are used to train a conditional GAN with spectral normalization and gradient penalty for nuclei segmentation. This adversarial regression framework enforces higher-order spatial consistency when compared to conventional CNN models. We demonstrate that this nuclei segmentation approach generalizes across different organs, sites, patients and disease states, and outperforms conventional approaches, especially in isolating individual and overlapping nuclei.
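
The gradient-penalty term mentioned above follows the standard WGAN-GP formulation; a generic sketch with a placeholder critic is shown below: the critic's gradient norm at random interpolations between real and generated inputs is pushed toward 1.

```python
import torch
import torch.nn as nn

def gradient_penalty(critic, real, fake):
    """Standard WGAN-GP penalty on interpolations between real and fake samples."""
    eps = torch.rand(real.size(0), 1, 1, 1, device=real.device)
    mixed = (eps * real + (1 - eps) * fake).requires_grad_(True)
    score = critic(mixed).sum()
    grad, = torch.autograd.grad(score, mixed, create_graph=True)
    return ((grad.flatten(1).norm(2, dim=1) - 1) ** 2).mean()

# tiny placeholder critic; the real one is a conditional segmentation discriminator
critic = nn.Sequential(nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
                       nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 1))
gp = gradient_penalty(critic, torch.rand(4, 1, 64, 64), torch.rand(4, 1, 64, 64))
```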

Journal ArticleDOI
TL;DR: In this article, a unified latent representation is learned which can completely encode information from different aspects of the features and is endowed with a promising class structure for separability: completeness is guaranteed with a group of backward neural networks (one for each type of features), while class labels enforce the representation to be compact within COVID-19/community-acquired pneumonia (CAP).
Abstract: Recently, the outbreak of Coronavirus Disease 2019 (COVID-19) has spread rapidly across the world. Due to the large number of infected patients and heavy labor for doctors, computer-aided diagnosis with machine learning algorithm is urgently needed, and could largely reduce the efforts of clinicians and accelerate the diagnosis process. Chest computed tomography (CT) has been recognized as an informative tool for diagnosis of the disease. In this study, we propose to conduct the diagnosis of COVID-19 with a series of features extracted from CT images. To fully explore multiple features describing CT images from different views, a unified latent representation is learned which can completely encode information from different aspects of features and is endowed with promising class structure for separability. Specifically, the completeness is guaranteed with a group of backward neural networks (each for one type of features), while by using class labels the representation is enforced to be compact within COVID-19/community-acquired pneumonia (CAP) and also a large margin is guaranteed between different types of pneumonia. In this way, our model can well avoid overfitting compared to the case of directly projecting high-dimensional features into classes. Extensive experimental results show that the proposed method outperforms all comparison methods, and rather stable performances are observed when varying the number of training data.

Journal ArticleDOI
TL;DR: A novel 3D self-attention convolutional neural network for the LDCT denoising problem and a self-supervised learning scheme to train a domain-specific autoencoder as the perceptual loss function are proposed.
Abstract: Computed tomography (CT) is a widely used screening and diagnostic tool that allows clinicians to obtain a high-resolution, volumetric image of internal structures in a non-invasive manner. Increasingly, efforts have been made to improve the image quality of low-dose CT (LDCT) to reduce the cumulative radiation exposure of patients undergoing routine screening exams. The resurgence of deep learning has yielded a new approach for noise reduction by training a deep multi-layer convolutional neural network (CNN) to map the low-dose to normal-dose CT images. However, CNN-based methods heavily rely on convolutional kernels, which use fixed-size filters to process one local neighborhood within the receptive field at a time. As a result, they are not efficient at retrieving structural information across large regions. In this paper, we propose a novel 3D self-attention convolutional neural network for the LDCT denoising problem. Our 3D self-attention module leverages the 3D volume of CT images to capture a wide range of spatial information both within CT slices and between CT slices. With the help of the 3D self-attention module, CNNs are able to leverage pixels with stronger relationships regardless of their distance and achieve better denoising results. In addition, we propose a self-supervised learning scheme to train a domain-specific autoencoder as the perceptual loss function. We combine these two methods and demonstrate their effectiveness on both CNN-based neural networks and WGAN-based neural networks with comprehensive experiments. Tested on the AAPM-Mayo Clinic Low Dose CT Grand Challenge data set, our experiments demonstrate that the self-attention (SA) module and the autoencoder (AE) perceptual loss function can efficiently enhance traditional CNNs and can achieve comparable or better results than the state-of-the-art methods.
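
A generic 3D self-attention (non-local style) block matching this description can be sketched as follows; channel sizes, the softmax scaling, and the learned residual weight are illustrative choices, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class SelfAttention3D(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.q = nn.Conv3d(ch, ch // 2, 1)
        self.k = nn.Conv3d(ch, ch // 2, 1)
        self.v = nn.Conv3d(ch, ch, 1)
        self.gamma = nn.Parameter(torch.zeros(1))     # learned weight on the attention branch

    def forward(self, x):
        b, c, d, h, w = x.shape
        n = d * h * w
        q = self.q(x).reshape(b, -1, n).transpose(1, 2)        # (B, N, C/2)
        k = self.k(x).reshape(b, -1, n)                         # (B, C/2, N)
        v = self.v(x).reshape(b, c, n)                          # (B, C, N)
        attn = torch.softmax(q @ k / (q.size(-1) ** 0.5), dim=-1)   # every voxel attends to all voxels
        out = (v @ attn.transpose(1, 2)).reshape(b, c, d, h, w)
        return x + self.gamma * out                             # residual connection

y = SelfAttention3D(16)(torch.randn(1, 16, 4, 16, 16))
```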

Journal ArticleDOI
TL;DR: It is confirmed that metric-sensitive losses are superior to cross-entropy based loss functions in case of evaluation with Dice Score or Jaccard Index in a multi-class setting, and across different object sizes and foreground/background ratios.
Abstract: In many medical imaging and classical computer vision tasks, the Dice score and Jaccard index are used to evaluate the segmentation performance. Despite the existence and great empirical success of metric-sensitive losses, i.e. relaxations of these metrics such as soft Dice, soft Jaccard and Lovasz-Softmax, many researchers still use per-pixel losses, such as (weighted) cross-entropy to train CNNs for segmentation. Therefore, the target metric is in many cases not directly optimized. We investigate from a theoretical perspective, the relation within the group of metric-sensitive loss functions and question the existence of an optimal weighting scheme for weighted cross-entropy to optimize the Dice score and Jaccard index at test time. We find that the Dice score and Jaccard index approximate each other relatively and absolutely, but we find no such approximation for a weighted Hamming similarity. For the Tversky loss, the approximation gets monotonically worse when deviating from the trivial weight setting where soft Tversky equals soft Dice. We verify these results empirically in an extensive validation on six medical segmentation tasks and can confirm that metric-sensitive losses are superior to cross-entropy based loss functions in case of evaluation with Dice Score or Jaccard Index. This further holds in a multi-class setting, and across different object sizes and foreground/background ratios. These results encourage a wider adoption of metric-sensitive loss functions for medical segmentation tasks where the performance measure of interest is the Dice score or Jaccard index.
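
For reference, the two most common metric-sensitive relaxations discussed here can be written in a few lines (illustrative binary, per-image form): soft Dice relaxes the Dice score to probabilities, and soft Tversky generalizes it with false-positive/false-negative weights, recovering soft Dice at alpha = beta = 0.5.

```python
import torch

def soft_dice_loss(probs, target, eps=1e-6):
    inter = (probs * target).sum()
    return 1 - (2 * inter + eps) / (probs.sum() + target.sum() + eps)

def soft_tversky_loss(probs, target, alpha=0.3, beta=0.7, eps=1e-6):
    tp = (probs * target).sum()
    fp = (probs * (1 - target)).sum()
    fn = ((1 - probs) * target).sum()
    return 1 - (tp + eps) / (tp + alpha * fp + beta * fn + eps)

p = torch.sigmoid(torch.randn(1, 128, 128))
y = (torch.rand(1, 128, 128) > 0.5).float()
print(soft_dice_loss(p, y).item(), soft_tversky_loss(p, y).item())
```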

Journal ArticleDOI
TL;DR: This paper proposes a relation-driven semi-supervised framework for medical image classification, which leverages a self-ensembling model to produce high-quality consistency targets for the unlabeled data.
Abstract: Training deep neural networks usually requires a large amount of labeled data to obtain good performance. However, in medical image analysis, obtaining high-quality labels for the data is laborious and expensive, as accurately annotating medical images demands expertise knowledge of the clinicians. In this paper, we present a novel relation-driven semi-supervised framework for medical image classification. It is a consistency-based method which exploits the unlabeled data by encouraging the prediction consistency of given input under perturbations, and leverages a self-ensembling model to produce high-quality consistency targets for the unlabeled data. Considering that human diagnosis often refers to previous analogous cases to make reliable decisions, we introduce a novel sample relation consistency (SRC) paradigm to effectively exploit unlabeled data by modeling the relationship information among different samples. Superior to existing consistency-based methods which simply enforce consistency of individual predictions, our framework explicitly enforces the consistency of semantic relation among different samples under perturbations, encouraging the model to explore extra semantic information from unlabeled data. We have conducted extensive experiments to evaluate our method on two public benchmark medical image classification datasets, i.e. , skin lesion diagnosis with ISIC 2018 challenge and thorax disease classification with ChestX-ray14. Our method outperforms many state-of-the-art semi-supervised learning methods on both single-label and multi-label image classification scenarios.
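
The sample relation consistency idea can be sketched as follows (details such as the similarity measure are assumptions): for one batch, build the matrix of pairwise feature similarities under the student and under the EMA teacher, and penalize their difference, so that the relations among samples, not just the individual predictions, stay consistent under perturbation.

```python
import torch
import torch.nn.functional as F

def relation_matrix(features):
    """features: (B, D) activations for one batch; returns a (B, B) similarity matrix."""
    f = F.normalize(features.flatten(1), dim=1)
    return f @ f.t()

def src_loss(student_feats, teacher_feats):
    return F.mse_loss(relation_matrix(student_feats), relation_matrix(teacher_feats))

stu = torch.randn(8, 512, requires_grad=True)   # student features for a perturbed batch
tea = torch.randn(8, 512)                       # e.g. EMA-teacher features, no gradient
print(src_loss(stu, tea.detach()).item())
```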

Journal ArticleDOI
TL;DR: A fully-automatic, rapid, accurate, and machine-agnostic method that can segment and quantify the infection regions on CT scans from different sources is proposed and its important application value in combating the disease is suggested.
Abstract: COVID-19 has caused a global pandemic and become the most urgent threat to the entire world. Tremendous efforts and resources have been invested in developing diagnosis, prognosis and treatment strategies to combat the disease. Although nucleic acid detection has been mainly used as the gold standard to confirm this RNA virus-based disease, it has been shown that such a strategy has a high false negative rate, especially for patients in the early stage, and thus CT imaging has been applied as a major diagnostic modality in confirming positive COVID-19. Despite the various, urgent advances in developing artificial intelligence (AI)-based computer-aided systems for CT-based COVID-19 diagnosis, most of the existing methods can only perform classification, whereas the state-of-the-art segmentation method requires a high level of human intervention. In this paper, we propose a fully-automatic, rapid, accurate, and machine-agnostic method that can segment and quantify the infection regions on CT scans from different sources. Our method is founded upon two innovations: 1) the first CT scan simulator for COVID-19, by fitting the dynamic change of real patients’ data measured at different time points, which greatly alleviates the data scarcity issue; and 2) a novel deep learning algorithm to solve the large-scene-small-object problem, which decomposes the 3D segmentation problem into three 2D ones, and thus reduces the model complexity by an order of magnitude and, at the same time, significantly improves the segmentation accuracy. Comprehensive experimental results over multi-country, multi-hospital, and multi-machine datasets demonstrate the superior performance of our method over the existing ones and suggest its important application value in combating the disease.

Journal ArticleDOI
TL;DR: The proposed AG-CNN approach significantly advances the state-of-the-art in glaucoma detection; the features are also visualized as the localized pathological area, which is further added into the AG-CNN structure to enhance glaucoma detection performance.
Abstract: Glaucoma is one of the leading causes of irreversible vision loss. Many approaches have recently been proposed for automatic glaucoma detection based on fundus images. However, none of the existing approaches can efficiently remove high redundancy in fundus images for glaucoma detection, which may reduce the reliability and accuracy of glaucoma detection. To avoid this disadvantage, this paper proposes an attention-based convolutional neural network (CNN) for glaucoma detection, called AG-CNN. Specifically, we first establish a large-scale attention-based glaucoma (LAG) database, which includes 11 760 fundus images labeled as either positive glaucoma (4878) or negative glaucoma (6882). Among the 11 760 fundus images, the attention maps of 5824 images are further obtained from ophthalmologists through a simulated eye-tracking experiment. Then, a new structure of AG-CNN is designed, including an attention prediction subnet, a pathological area localization subnet, and a glaucoma classification subnet. The attention maps are predicted in the attention prediction subnet to highlight the salient regions for glaucoma detection, under a weakly supervised training manner. In contrast to other attention-based CNN methods, the features are also visualized as the localized pathological area, which are further added in our AG-CNN structure to enhance the glaucoma detection performance. Finally, the experiment results from testing over our LAG database and another public glaucoma database show that the proposed AG-CNN approach significantly advances the state-of-the-art in glaucoma detection.