Showing papers in "IEEE Transactions on Medical Imaging in 2021"


Journal ArticleDOI
TL;DR: The proposed CAAD model outperforms binary classification models on the clinical X-VIRAL dataset, which contains 5,977 viral pneumonia (non-COVID-19) cases and 37,393 non-viral pneumonia or healthy cases, and, when tested directly on the X-COVID dataset, achieves an AUC of 83.61% and a sensitivity of 71.70%, comparable to the performance of radiologists reported in the literature.
Abstract: Clusters of viral pneumonia occurrences over a short period may be a harbinger of an outbreak or pandemic. Rapid and accurate detection of viral pneumonia using chest X-rays can be of significant value for large-scale screening and epidemic prevention, particularly when other more sophisticated imaging modalities are not readily accessible. However, the emergence of novel mutated viruses causes a substantial dataset shift, which can greatly limit the performance of classification-based approaches. In this paper, we formulate the task of differentiating viral pneumonia from non-viral pneumonia and healthy controls into a one-class classification-based anomaly detection problem. We therefore propose the confidence-aware anomaly detection (CAAD) model, which consists of a shared feature extractor, an anomaly detection module, and a confidence prediction module. If the anomaly score produced by the anomaly detection module is large enough, or the confidence score estimated by the confidence prediction module is small enough, the input will be accepted as an anomaly case (i.e., viral pneumonia). The major advantage of our approach over binary classification is that we avoid modeling individual viral pneumonia classes explicitly and treat all known viral pneumonia cases as anomalies to improve the one-class model. The proposed model outperforms binary classification models on the clinical X-VIRAL dataset, which contains 5,977 viral pneumonia (non-COVID-19) cases and 37,393 non-viral pneumonia or healthy cases. Moreover, when directly testing on the X-COVID dataset that contains 106 COVID-19 cases and 107 normal controls without any fine-tuning, our model achieves an AUC of 83.61% and a sensitivity of 71.70%, which is comparable to the performance of radiologists reported in the literature.
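The acceptance rule reduces to a simple disjunction over the two module outputs. A minimal sketch in Python, with threshold names and values as illustrative placeholders (the paper tunes its operating point on validation data):

```python
def caad_decision(anomaly_score: float, confidence_score: float,
                  t_anomaly: float = 0.5, t_confidence: float = 0.5) -> bool:
    """Flag an input as viral pneumonia (anomaly) if either branch fires.

    t_anomaly and t_confidence are illustrative placeholders, not values
    from the paper.
    """
    return anomaly_score >= t_anomaly or confidence_score <= t_confidence
```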

206 citations


Journal ArticleDOI
TL;DR: CA-Net, as proposed in this paper, combines a joint spatial attention module that makes the network focus more on the foreground region with a novel channel attention module that adaptively recalibrates channel-wise feature responses and highlights the most relevant feature channels.
Abstract: Accurate medical image segmentation is essential for diagnosis and treatment planning of diseases. Convolutional Neural Networks (CNNs) have achieved state-of-the-art performance for automatic medical image segmentation. However, they are still challenged by complicated conditions where the segmentation target has large variations of position, shape and scale, and existing CNNs have a poor explainability that limits their application to clinical decisions. In this work, we make extensive use of multiple attentions in a CNN architecture and propose a comprehensive attention-based CNN (CA-Net) for more accurate and explainable medical image segmentation that is aware of the most important spatial positions, channels and scales at the same time. In particular, we first propose a joint spatial attention module to make the network focus more on the foreground region. Then, a novel channel attention module is proposed to adaptively recalibrate channel-wise feature responses and highlight the most relevant feature channels. Also, we propose a scale attention module implicitly emphasizing the most salient feature maps among multiple scales so that the CNN is adaptive to the size of an object. Extensive experiments on skin lesion segmentation from ISIC 2018 and multi-class segmentation of fetal MRI showed that our proposed CA-Net significantly improved the average segmentation Dice score from 87.77% to 92.08% for skin lesions, 84.79% to 87.08% for the placenta and 93.20% to 95.88% for the fetal brain, respectively, compared with U-Net. It also reduced the model size by around 15 times compared with the state-of-the-art DeepLabv3+, with comparable or even better accuracy. In addition, it offers much higher explainability than existing networks through visualization of the attention weight maps. Our code is available at https://github.com/HiLab-git/CA-Net.
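For readers unfamiliar with channel attention, a squeeze-and-excitation-style recalibration block captures the basic idea; this is a generic sketch, not the exact CA-Net module:

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Generic channel recalibration: pool globally, score each channel,
    rescale the feature map. Reduction ratio is an illustrative choice."""
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.fc(x.mean(dim=(2, 3)))   # global average pool -> (b, c)
        return x * w.view(b, c, 1, 1)     # recalibrate each channel
```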

205 citations


Journal ArticleDOI
Along He1, Tao Li1, Ning Li1, Kai Wang1, Huazhu Fu 
TL;DR: A novel Category Attention Block (CAB) explores more discriminative region-wise features for each DR grade and treats each category equally, while the Global Attention Block (GAB) exploits detailed, class-agnostic global attention feature maps for fundus images.
Abstract: Diabetic Retinopathy (DR) grading is challenging due to the presence of intra-class variations, small lesions and imbalanced data distributions. The key for solving fine-grained DR grading is to find more discriminative features corresponding to subtle visual differences, such as microaneurysms, hemorrhages and soft exudates. However, small lesions are quite difficult to identify using traditional convolutional neural networks (CNNs), and an imbalanced DR data distribution will cause the model to pay too much attention to DR grades with more samples, greatly affecting the final grading performance. In this article, we focus on developing an attention module to address these issues. Specifically, for imbalanced DR data distributions, we propose a novel Category Attention Block (CAB), which explores more discriminative region-wise features for each DR grade and treats each category equally. In order to capture more detailed small lesion information, we also propose the Global Attention Block (GAB), which can exploit detailed and class-agnostic global attention feature maps for fundus images. By aggregating the attention blocks with a backbone network, the CABNet is constructed for DR grading. The attention blocks can be applied to a wide range of backbone networks and trained efficiently in an end-to-end manner. Comprehensive experiments are conducted on three publicly available datasets, showing that CABNet produces significant performance improvements for existing state-of-the-art deep architectures with few additional parameters and achieves state-of-the-art results for DR grading. Code and models will be available at https://github.com/he2016012996/CABnet .
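A loose sketch of the grouped, per-category attention idea behind the CAB follows; the channel counts and the pooling over the k maps per grade are illustrative assumptions, not the paper's exact design:

```python
import torch
import torch.nn as nn

class CategoryAttention(nn.Module):
    """Sketch of category-wise attention: k feature maps per DR grade,
    pooled into one spatial attention map. Details differ from the paper."""
    def __init__(self, in_channels: int, num_classes: int = 5, k: int = 5):
        super().__init__()
        self.num_classes, self.k = num_classes, k
        self.conv = nn.Conv2d(in_channels, num_classes * k, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, _, h, w = x.shape
        maps = self.conv(x).view(b, self.num_classes, self.k, h, w)
        # average the k maps of each category, then average over categories,
        # giving every grade equal weight in the attention map
        attn = maps.mean(dim=2).mean(dim=1, keepdim=True).sigmoid()
        return x * attn
```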

130 citations


Journal ArticleDOI
TL;DR: The results of the MICCAI 2020 Challenge on generalizable deep learning for cardiac segmentation are presented in this article, where 14 teams submitted different solutions to the problem, combining various baseline models, data augmentation strategies, and domain adaptation techniques.
Abstract: The emergence of deep learning has considerably advanced the state-of-the-art in cardiac magnetic resonance (CMR) segmentation. Many techniques have been proposed over the last few years, bringing the accuracy of automated segmentation close to human performance. However, these models have all too often been trained and validated using cardiac imaging samples from single clinical centres or homogeneous imaging protocols. This has prevented the development and validation of models that are generalizable across different clinical centres, imaging conditions or scanner vendors. To promote further research and scientific benchmarking in the field of generalizable deep learning for cardiac segmentation, this paper presents the results of the Multi-Centre, Multi-Vendor and Multi-Disease Cardiac Segmentation (M&Ms) Challenge, which was recently organized as part of the MICCAI 2020 Conference. A total of 14 teams submitted different solutions to the problem, combining various baseline models, data augmentation strategies, and domain adaptation techniques. The obtained results indicate the importance of intensity-driven data augmentation, as well as the need for further research to improve generalizability towards unseen scanner vendors or new imaging protocols. Furthermore, we present a new resource of 375 heterogeneous CMR datasets acquired using scanners from four different vendors in six hospitals across three countries (Spain, Canada and Germany), which we provide as open access to the community to enable future research in the field.

127 citations


Journal ArticleDOI
TL;DR: This article reports the second fastMRI challenge, which focused radiologist evaluations on pathological assessment in brain images and debuted a Transfer track requiring participants to submit models evaluated on MRI scanners from outside the training set.
Abstract: Accelerating MRI scans is one of the principal outstanding problems in the MRI research community. Towards this goal, we hosted the second fastMRI competition, aimed at reconstructing MR images from subsampled k-space data. We provided participants with data from 7,299 clinical brain scans (de-identified via a HIPAA-compliant procedure by NYU Langone Health), holding back the fully-sampled data from 894 of these scans for challenge evaluation purposes. In contrast to the 2019 challenge, we focused our radiologist evaluations on pathological assessment in brain images. We also debuted a new Transfer track that required participants to submit models evaluated on MRI scanners from outside the training set. We received 19 submissions from eight different groups. Results showed one team scoring best in both SSIM scores and qualitative radiologist evaluations. We also performed analysis on alternative metrics to mitigate the effects of background noise and collected feedback from the participants to inform future challenges. Lastly, we identified common failure modes across the submissions, highlighting areas in need of future research in the MRI reconstruction community.

124 citations


Journal ArticleDOI
TL;DR: In this article, a split-based coarse-to-fine vessel segmentation network for OCTA images (OCTA-Net) was proposed, with the ability to detect thick and thin vessels separately.
Abstract: Optical Coherence Tomography Angiography (OCTA) is a non-invasive imaging technique that has been increasingly used to image the retinal vasculature at capillary-level resolution. However, automated segmentation of retinal vessels in OCTA has been under-studied due to various challenges such as low capillary visibility and high vessel complexity, despite its significance in understanding many vision-related diseases. In addition, there is no publicly available OCTA dataset with manually graded vessels for training and validation of segmentation algorithms. To address these issues, for the first time in the field of retinal image analysis, we construct a dedicated Retinal OCTA SEgmentation dataset (ROSE), which consists of 229 OCTA images with vessel annotations at either centerline level or pixel level. This dataset, together with the source code, has been released for public access to assist researchers in the community in undertaking research in related topics. Secondly, we introduce a novel split-based coarse-to-fine vessel segmentation network for OCTA images (OCTA-Net), with the ability to detect thick and thin vessels separately. In OCTA-Net, a split-based coarse segmentation module is first utilized to produce a preliminary confidence map of vessels, and a split-based refined segmentation module is then used to optimize the shape/contour of the retinal microvasculature. We perform a thorough evaluation of the state-of-the-art vessel segmentation models and our OCTA-Net on the constructed ROSE dataset. The experimental results demonstrate that our OCTA-Net yields better vessel segmentation performance in OCTA than both traditional and other deep learning methods. In addition, we provide a fractal dimension analysis on the segmented microvasculature, and the statistical analysis demonstrates significant differences between the healthy control and Alzheimer’s Disease groups. This supports the view that analysis of the retinal microvasculature may offer a new scheme to study various neurodegenerative diseases.
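The coarse-to-fine design can be summarized as a two-stage cascade in which the refinement stage is conditioned on the preliminary confidence map. A generic sketch (OCTA-Net additionally splits thick and thin vessel branches, which is omitted here):

```python
import torch
import torch.nn as nn

def coarse_to_fine(coarse_net: nn.Module, fine_net: nn.Module,
                   image: torch.Tensor) -> torch.Tensor:
    """Generic two-stage cascade: coarse confidence map, then refinement.

    fine_net is assumed to accept the image channels plus one extra channel
    for the coarse map; both nets are placeholders.
    """
    coarse = torch.sigmoid(coarse_net(image))    # preliminary confidence map
    fused = torch.cat([image, coarse], dim=1)    # condition refinement on it
    return torch.sigmoid(fine_net(fused))        # refined vessel contours
```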

94 citations


Journal ArticleDOI
TL;DR: Deep Ultrasound Localization Microscopy (Deep-ULM) as discussed by the authors employs a convolutional neural network to perform localization microscopy in dense scenarios, learning the nonlinear image-domain implications of overlapping RF signals originating from such sets of closely spaced microbubbles.
Abstract: Ultrasound localization microscopy has enabled super-resolution vascular imaging through precise localization of individual ultrasound contrast agents (microbubbles) across numerous imaging frames. However, analysis of high-density regions with significant overlaps among the microbubble point spread responses yields high localization errors, constraining the technique to low-concentration conditions. As such, long acquisition times are required to sufficiently cover the vascular bed. In this work, we present a fast and precise method for obtaining super-resolution vascular images from high-density contrast-enhanced ultrasound imaging data. This method, which we term Deep Ultrasound Localization Microscopy (Deep-ULM), exploits modern deep learning strategies and employs a convolutional neural network to perform localization microscopy in dense scenarios, learning the nonlinear image-domain implications of overlapping RF signals originating from such sets of closely spaced microbubbles. Deep-ULM is trained effectively using realistic online synthesized data, enabling robust inference in vivo under a wide variety of imaging conditions. We show that deep learning attains super-resolution with challenging contrast-agent densities, both in silico and in vivo. Deep-ULM is suitable for real-time applications, resolving about 70 high-resolution patches (128×128 pixels) per second on a standard PC. With GPU computation, this number increases to 1250 patches per second.
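As a rough illustration of the patch-to-super-resolved-map formulation, a toy network mapping a 128×128 contrast-enhanced patch to an upsampled localization map might look as follows; the architecture and upscale factor are stand-ins, not the paper's design:

```python
import torch
import torch.nn as nn

class LocalizationNet(nn.Module):
    """Toy stand-in for Deep-ULM's mapping: a low-resolution patch goes in,
    a sparse super-resolved localization map comes out."""
    def __init__(self, upscale: int = 8):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, upscale ** 2, 3, padding=1),
            nn.PixelShuffle(upscale),   # (b,1,128u,128u) localization map
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.body(x)
```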

84 citations


Journal ArticleDOI
TL;DR: Wang et al. as mentioned in this paper proposed a dual-domain residual-based optimization (DRONE) network, which consists of three modules respectively for embedding, refinement, and awareness, and the results from the embedding and refinement modules in the data and image domains are regularized for optimized image quality in the awareness module.
Abstract: Deep learning has attracted rapidly increasing attention in the field of tomographic image reconstruction, especially for CT, MRI, PET/SPECT, ultrasound and optical imaging. Among various topics, sparse-view CT remains a challenge: the goal is a decent image reconstruction from very few projections. To address this challenge, in this article we propose a Dual-domain Residual-based Optimization NEtwork (DRONE). DRONE consists of three modules respectively for embedding, refinement, and awareness. In the embedding module, a sparse sinogram is first extended. Then, sparse-view artifacts are effectively suppressed in the image domain. After that, the refinement module recovers image details in the residual data and image domains synergistically. Finally, the results from the embedding and refinement modules in the data and image domains are regularized for optimized image quality in the awareness module, which ensures the consistency between measurements and images with the kernel awareness of compressed sensing. The DRONE network is trained, validated, and tested on preclinical and clinical datasets, demonstrating its merits in edge preservation, feature recovery, and reconstruction accuracy.

77 citations


Journal ArticleDOI
TL;DR: A novel two-stage framework named SpineParseNet is proposed to achieve automated spine parsing for volumetric MR images and has great potential in clinical spinal disease diagnoses and treatments.
Abstract: Spine parsing (i.e., multi-class segmentation of vertebrae and intervertebral discs (IVDs)) for volumetric magnetic resonance (MR) images plays a significant role in various spinal disease diagnoses and treatments of spine disorders, yet is still a challenge due to the inter-class similarity and intra-class variation of spine images. Existing fully convolutional network-based methods fail to explicitly exploit the dependencies between different spinal structures. In this article, we propose a novel two-stage framework named SpineParseNet to achieve automated spine parsing for volumetric MR images. SpineParseNet consists of a 3D graph convolutional segmentation network (GCSN) for 3D coarse segmentation and a 2D residual U-Net (ResUNet) for 2D segmentation refinement. In the 3D GCSN, region pooling is employed to project the image representation to a graph representation, in which each node representation denotes a specific spinal structure. The adjacency matrix of the graph is designed according to the connection of spinal structures. The graph representation is evolved by graph convolutions. Subsequently, the proposed region unpooling module re-projects the evolved graph representation to a semantic image representation, which facilitates the 3D GCSN to generate reliable coarse segmentation. Finally, the 2D ResUNet refines the segmentation. Experiments on T2-weighted volumetric MR images of 215 subjects show that SpineParseNet achieves impressive performance with mean Dice similarity coefficients of 87.32 ± 4.75%, 87.78 ± 4.64%, and 87.49 ± 3.81% for the segmentation of 10 vertebrae, 9 IVDs, and all 19 spinal structures, respectively. The proposed method has great potential in clinical spinal disease diagnoses and treatments.
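The region pooling step that projects the image representation onto graph nodes can be sketched as a mask-weighted average per spinal structure; the shapes and the soft-mask assumption below are illustrative:

```python
import torch

def region_pooling(features: torch.Tensor, masks: torch.Tensor) -> torch.Tensor:
    """Project voxel features to graph-node features by averaging per region.

    features: (C, D, H, W) feature volume; masks: (N, D, H, W) soft region
    probabilities, one per spinal structure. Returns (N, C) node features.
    A simplified sketch of the paper's region pooling.
    """
    weights = masks.flatten(1)                               # (N, V)
    weights = weights / weights.sum(dim=1, keepdim=True).clamp_min(1e-6)
    return weights @ features.flatten(1).t()                 # (N, C)
```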

77 citations


Journal ArticleDOI
TL;DR: An approach, SMORE, based on convolutional neural networks (CNNs) that restores image quality by improving resolution and reducing aliasing in MR images is presented and shown to be visually and quantitatively superior to previously reported methods.
Abstract: High-resolution magnetic resonance (MR) images are desired in many clinical and research applications. Acquiring such images with a high signal-to-noise ratio (SNR), however, can require a long scan duration, which is difficult for patient comfort, is more costly, and makes the images susceptible to motion artifacts. A very common practical compromise for both 2D and 3D MR imaging protocols is to acquire volumetric MR images with high in-plane resolution, but lower through-plane resolution. In addition to having poor resolution in one orientation, 2D MRI acquisitions will also have aliasing artifacts, which further degrade the appearance of these images. This paper presents an approach, SMORE, based on convolutional neural networks (CNNs) that restores image quality by improving resolution and reducing aliasing in MR images. This approach is self-supervised, which requires no external training data because the high-resolution and low-resolution data that are present in the image itself are used for training. For 3D MRI, the method consists of only one self-supervised super-resolution (SSR) deep CNN that is trained from the volumetric image data. For 2D MRI, there is a self-supervised anti-aliasing (SAA) deep CNN that precedes the SSR CNN, also trained from the volumetric image data. Both methods were evaluated on a broad collection of MR data, including filtered and downsampled images so that quantitative metrics could be computed and compared, and actual acquired low-resolution images for which visual and sharpness measures could be computed and compared. The super-resolution method is shown to be visually and quantitatively superior to previously reported methods.
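The self-supervised trick is to manufacture training pairs from the volume itself by degrading a high-resolution in-plane axis. A simplified sketch, assuming a Gaussian approximation of the slice profile:

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

def make_ssr_pair(volume: np.ndarray, factor: int = 4, axis: int = 0):
    """Build a self-supervised super-resolution training pair from one volume.

    Blur and subsample along a high-resolution in-plane axis so the degraded
    data mimics the low through-plane resolution; the original slices act as
    targets. The Gaussian blur is a simplification of the true slice profile.
    """
    sigma = factor / 2.355                     # FWHM ~ slice spacing (assumed)
    low = gaussian_filter1d(volume, sigma=sigma, axis=axis)
    low = np.take(low, np.arange(0, volume.shape[axis], factor), axis=axis)
    return low, volume                         # (network input, target)
```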

77 citations


Journal ArticleDOI
TL;DR: Zhang et al. as mentioned in this paper proposed a dual attention multi-instance deep learning network (DA-MIDL) for the early diagnosis of Alzheimer's disease and its prodromal stage mild cognitive impairment (MCI).
Abstract: Structural magnetic resonance imaging (sMRI) is widely used for brain neurological disease diagnosis, as it can reflect structural variations of the brain. However, due to local brain atrophy, only a few regions in sMRI scans show obvious structural changes, which are highly correlated with pathological features. Hence, the key challenge of sMRI-based brain disease diagnosis is to enhance the identification of discriminative features. To address this issue, we propose a dual attention multi-instance deep learning network (DA-MIDL) for the early diagnosis of Alzheimer’s disease (AD) and its prodromal stage, mild cognitive impairment (MCI). Specifically, DA-MIDL consists of three primary components: 1) Patch-Nets with spatial attention blocks for extracting discriminative features within each sMRI patch while enhancing the features of abnormally changed micro-structures in the cerebrum, 2) an attention multi-instance learning (MIL) pooling operation for balancing the relative contribution of each patch and yielding a globally weighted representation of the whole brain structure, and 3) an attention-aware global classifier for further learning the integral features and making the AD-related classification decisions. Our proposed DA-MIDL model is evaluated on the baseline sMRI scans of 1689 subjects from two independent datasets (i.e., ADNI and AIBL). The experimental results show that our DA-MIDL model can identify discriminative pathological locations and achieve better classification performance in terms of accuracy and generalizability, compared with several state-of-the-art methods.
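The attention MIL pooling component is conceptually close to the attention-based pooling of Ilse et al.; a generic sketch, with dimensions chosen for illustration:

```python
import torch
import torch.nn as nn

class AttentionMILPooling(nn.Module):
    """Attention-based MIL pooling: score each patch embedding, then take a
    weighted sum as the bag-level (whole-brain) representation. A generic
    sketch of the pooling idea, not DA-MIDL's exact module."""
    def __init__(self, dim: int, hidden: int = 128):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(dim, hidden), nn.Tanh(),
                                   nn.Linear(hidden, 1))

    def forward(self, patches: torch.Tensor) -> torch.Tensor:
        # patches: (num_patches, dim) features from the Patch-Nets
        a = torch.softmax(self.score(patches), dim=0)   # (num_patches, 1)
        return (a * patches).sum(dim=0)                 # (dim,)
```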

Journal ArticleDOI
TL;DR: A generalizable framework for metal artifact reduction (MAR) is proposed by simultaneously leveraging the advantages of image domain and sinogram domain-based MAR techniques and produces superior artifact-reduced results while preserving the anatomical structures and outperforms other MAR methods.
Abstract: Computed tomography (CT) has been widely used for medical diagnosis, assessment, and therapy planning and guidance. In reality, CT images may be affected adversely in the presence of metallic objects, which could lead to severe metal artifacts and influence clinical diagnosis or dose calculation in radiation therapy. In this article, we propose a generalizable framework for metal artifact reduction (MAR) by simultaneously leveraging the advantages of image domain and sinogram domain-based MAR techniques. We formulate our framework as a sinogram completion problem and train a neural network (SinoNet) to restore the metal-affected projections. To improve the continuity of the completed projections at the boundary of the metal trace and thus alleviate new artifacts in the reconstructed CT images, we train another neural network (PriorNet) to generate a good prior image to guide sinogram learning, and further design a novel residual sinogram learning strategy to effectively utilize the prior image information for better sinogram completion. The two networks are jointly trained in an end-to-end fashion with a differentiable forward projection (FP) operation so that the prior image generation and deep sinogram completion procedures can benefit from each other. Finally, the artifact-reduced CT images are reconstructed using filtered back-projection (FBP) from the completed sinogram. Extensive experiments on simulated and real artifact data demonstrate that our method produces superior artifact-reduced results while preserving the anatomical structures and outperforms other MAR methods.
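The residual sinogram learning strategy can be paraphrased as: predict a residual on top of the prior image's forward projection, and only replace data inside the metal trace. A schematic sketch with placeholder function names (not the paper's API):

```python
import numpy as np

def complete_sinogram(sino_in: np.ndarray, sino_prior: np.ndarray,
                      metal_trace: np.ndarray, sino_net) -> np.ndarray:
    """Residual sinogram completion restricted to the metal trace.

    sino_prior stands for the forward projection of PriorNet's prior image;
    sino_net is any callable predicting a residual correction. All names and
    the exact input stacking are illustrative assumptions.
    """
    residual = sino_net(np.stack([sino_in, sino_prior, metal_trace]))
    completed = sino_prior + residual                  # residual on the prior
    return np.where(metal_trace > 0, completed, sino_in)  # keep clean rays
```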

Journal ArticleDOI
TL;DR: The results and numerical analysis have collectively demonstrated the robust performance of the model to reconstruct PAM images with as few as 2% of the original pixels, which can effectively shorten the imaging time without substantially sacrificing the image quality.
Abstract: One primary technical challenge in photoacoustic microscopy (PAM) is the necessary compromise between spatial resolution and imaging speed. In this study, we propose a novel application of deep learning principles to reconstruct undersampled PAM images and transcend the trade-off between spatial resolution and imaging speed. We compared various convolutional neural network (CNN) architectures, and selected a Fully Dense U-net (FD U-net) model that produced the best results. To mimic various undersampling conditions in practice, we artificially downsampled fully-sampled PAM images of mouse brain vasculature at different ratios. This allowed us to not only definitively establish the ground truth, but also train and test our deep learning model at various imaging conditions. Our results and numerical analysis have collectively demonstrated the robust performance of our model to reconstruct PAM images with as few as 2% of the original pixels, which can effectively shorten the imaging time without substantially sacrificing the image quality.
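The artificial downsampling used to create training pairs can be mimicked with a sampling mask; the paper uses structured undersampling patterns, so the random mask below is a simplification:

```python
import numpy as np

def undersample_pam(image: np.ndarray, keep_ratio: float = 0.02, seed: int = 0):
    """Mimic sparse PAM scanning by keeping a fraction of the pixels.

    Returns the undersampled image and its binary mask; a CNN (FD U-net in
    the paper) is then trained to map the sparse input back to the
    fully-sampled image. keep_ratio=0.02 matches the abstract's 2% figure.
    """
    rng = np.random.default_rng(seed)
    mask = rng.random(image.shape) < keep_ratio
    return image * mask, mask
```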

Journal ArticleDOI
TL;DR: The FGADR dataset, as discussed by the authors, contains 1,842 images with pixel-level DR-related lesion annotations and 1,000 images with image-level labels graded by six board-certified ophthalmologists with intra-rater consistency.
Abstract: People with diabetes are at risk of developing an eye disease called diabetic retinopathy (DR). This disease occurs when high blood glucose levels cause damage to blood vessels in the retina. Computer-aided DR diagnosis has become a promising tool for the early detection and severity grading of DR, due to the great success of deep learning. However, most current DR diagnosis systems do not achieve satisfactory performance or interpretability for ophthalmologists, due to the lack of training data with consistent and fine-grained annotations. To address this problem, we construct a large fine-grained annotated DR dataset containing 2,842 images (FGADR). Specifically, this dataset has 1,842 images with pixel-level DR-related lesion annotations, and 1,000 images with image-level labels graded by six board-certified ophthalmologists with intra-rater consistency. The proposed dataset will enable extensive studies on DR diagnosis. Further, we establish three benchmark tasks for evaluation: 1. DR lesion segmentation; 2. DR grading by joint classification and segmentation; 3. Transfer learning for ocular multi-disease identification. Moreover, a novel inductive transfer learning method is introduced for the third task. Extensive experiments using different state-of-the-art methods are conducted on our FGADR dataset, which can serve as baselines for future research. Our dataset will be released at https://csyizhou.github.io/FGADR/.

Journal ArticleDOI
TL;DR: Self-Path as mentioned in this paper employs multi-task learning where the main task is tissue classification and pretext tasks are a variety of self-supervised tasks with labels inherent to the input images.
Abstract: While high-resolution pathology images lend themselves well to ‘data hungry’ deep learning algorithms, obtaining exhaustive annotations on these images for learning is a major challenge. In this article, we propose a self-supervised convolutional neural network (CNN) framework to leverage unlabeled data for learning generalizable and domain invariant representations in pathology images. Our proposed framework, termed Self-Path, employs multi-task learning where the main task is tissue classification and pretext tasks are a variety of self-supervised tasks with labels inherent to the input images. We introduce novel pathology-specific self-supervision tasks that leverage contextual, multi-resolution and semantic features in pathology images for semi-supervised learning and domain adaptation. We investigate the effectiveness of Self-Path on three different pathology datasets. Our results show that Self-Path with the pathology-specific pretext tasks achieves state-of-the-art performance for semi-supervised learning when small amounts of labeled data are available. Further, we show that Self-Path improves domain adaptation for histopathology image classification when there is no labeled data available for the target domain. This approach can potentially be employed for other applications in computational pathology, where the annotation budget is often limited or large amounts of unlabeled image data are available.

Journal ArticleDOI
TL;DR: In this paper, a label-free approach is proposed that segments COVID-19 lesions in CT via voxel-level anomaly modeling, mining relevant knowledge from normal CT lung scans.
Abstract: Scarcity of annotated images hampers the building of automated solutions for reliable COVID-19 diagnosis and evaluation from CT. To alleviate the burden of data annotation, we herein present a label-free approach for segmenting COVID-19 lesions in CT via voxel-level anomaly modeling that mines out the relevant knowledge from normal CT lung scans. Our modeling is inspired by the observation that the parts of tracheae and vessels, which lie in the same high-intensity range as lesions, exhibit strong patterns. To facilitate the learning of such patterns at a voxel level, we synthesize ‘lesions’ using a set of simple operations and insert the synthesized ‘lesions’ into normal CT lung scans to form training pairs, from which we learn a normalcy-recognizing network (NormNet) that recognizes normal tissues and separates them from possible COVID-19 lesions. Our experiments on three different public datasets validate the effectiveness of NormNet, which conspicuously outperforms a variety of unsupervised anomaly detection (UAD) methods.
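The training-pair synthesis can be illustrated with a deliberately simple blob-insertion routine; the paper's "set of simple operations" differs in detail, and the intensity values below are assumptions:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def insert_synthetic_lesion(ct: np.ndarray, lung_mask: np.ndarray,
                            intensity: float = -300.0, seed: int = 0):
    """Paste a smooth random blob into a normal lung CT to form a training
    pair (corrupted scan, lesion mask). Blob shape, blur width, and the
    lesion-like HU level are illustrative, not the paper's operations."""
    rng = np.random.default_rng(seed)
    blob = gaussian_filter(rng.random(ct.shape), sigma=8) > 0.6
    blob &= lung_mask.astype(bool)             # keep the blob inside the lung
    noise = 50 * rng.standard_normal(ct.shape)
    corrupted = np.where(blob, intensity + noise, ct)
    return corrupted, blob
```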

Journal ArticleDOI
TL;DR: An uncertainty-aware temporal ensembling (UATE) model for semi-supervised ABUS mass segmentation outperforms the fully supervised method and achieves promising results compared with existing semi-supervised methods.
Abstract: Accurate breast mass segmentation of automated breast ultrasound (ABUS) images plays a crucial role in 3D breast reconstruction, which can assist radiologists in surgery planning. Although the convolutional neural network has great potential for breast mass segmentation due to the remarkable progress of deep learning, the lack of annotated data limits the performance of deep CNNs. In this article, we present an uncertainty-aware temporal ensembling (UATE) model for semi-supervised ABUS mass segmentation. Specifically, a temporal ensembling segmentation (TEs) model is designed to segment breast mass using a few labeled images and a large number of unlabeled images. Considering that the network output contains correct predictions and unreliable predictions, equally treating each prediction in pseudo label update and loss calculation may degrade the network performance. To alleviate this problem, the uncertainty map is estimated for each image. Then an adaptive ensembling momentum map and an uncertainty-aware unsupervised loss are designed and integrated with the TEs model. The effectiveness of the proposed UATE model is mainly verified on an ABUS dataset of 107 patients with 170 volumes, including 13,382 labeled 2D slices. The Jaccard index (JI), Dice similarity coefficient (DSC), pixel-wise accuracy (AC) and Hausdorff distance (HD) of the proposed method on the testing set are 63.65%, 74.25%, 99.21% and 3.81 mm respectively. Experimental results demonstrate that our semi-supervised method outperforms the fully supervised method, and achieves promising results compared with existing semi-supervised methods.
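The core update can be read as an exponential moving average of predictions whose momentum adapts to the uncertainty map. A minimal sketch, assuming uncertainty is normalized to [0, 1] (e.g., normalized prediction entropy):

```python
import torch

def update_pseudo_labels(ensemble: torch.Tensor, pred: torch.Tensor,
                         uncertainty: torch.Tensor,
                         base_momentum: float = 0.9) -> torch.Tensor:
    """Uncertainty-adaptive temporal ensembling update (illustrative form).

    Where the network is uncertain, keep more of the previous ensemble
    (higher momentum); where it is confident, trust the new prediction.
    All tensors share the same spatial shape.
    """
    momentum = base_momentum + (1.0 - base_momentum) * uncertainty
    return momentum * ensemble + (1.0 - momentum) * pred
```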

Journal ArticleDOI
TL;DR: Zhang et al. as discussed by the authors proposed a mutual multi-scale triplet graph convolutional network (MMTGCN) to analyze functional and structural connectivity for brain disorder diagnosis, which employed several templates with different scales of ROI parcellation to construct coarse-to-fine brain connectivity networks for each subject.
Abstract: Brain connectivity alterations associated with mental disorders have been widely reported in both functional MRI (fMRI) and diffusion MRI (dMRI). However, extracting useful information from the vast amount of information afforded by brain networks remains a great challenge. By capturing network topology, graph convolutional networks (GCNs) have been demonstrated to be superior in learning network representations tailored for identifying specific brain disorders. Existing graph construction techniques generally rely on a specific brain parcellation to define regions-of-interest (ROIs) to construct networks, often limiting the analysis to a single spatial scale. In addition, most methods focus on the pairwise relationships between the ROIs and ignore high-order associations between subjects. In this letter, we propose a mutual multi-scale triplet graph convolutional network (MMTGCN) to analyze functional and structural connectivity for brain disorder diagnosis. We first employ several templates with different scales of ROI parcellation to construct coarse-to-fine brain connectivity networks for each subject. Then, a triplet GCN (TGCN) module is developed to learn functional/structural representations of brain connectivity networks at each scale, with the triplet relationship among subjects explicitly incorporated into the learning process. Finally, we propose a template mutual learning strategy to train different scale TGCNs collaboratively for disease classification. Experimental results on 1,160 subjects from three datasets with fMRI or dMRI data demonstrate that our MMTGCN outperforms several state-of-the-art methods in identifying three types of brain disorders.

Journal ArticleDOI
TL;DR: A convolutional neural network equipped with a novel and efficient adaptive dual attention module (ADAM) for automated skin lesion segmentation from dermoscopic images is presented, capable of achieving better segmentation performance than state-of-the-art deep learning models, particularly those equipped with attention mechanisms.
Abstract: We present a convolutional neural network (CNN) equipped with a novel and efficient adaptive dual attention module (ADAM) for automated skin lesion segmentation from dermoscopic images, which is an essential yet challenging step for the development of a computer-assisted skin disease diagnosis system. The proposed ADAM has three compelling characteristics. First, we integrate two global context modeling mechanisms into the ADAM, one aiming at capturing the boundary continuity of skin lesions by global average pooling while the other deals with shape irregularity by pixel-wise correlation. In this regard, our network, thanks to the proposed ADAM, is capable of extracting more comprehensive and discriminative features for recognizing the boundary of skin lesions. Second, the proposed ADAM supports multi-scale resolution fusion, and hence can capture multi-scale features to further improve the segmentation accuracy. Third, as we harness a spatial information weighting method in the proposed network, our method can reduce much of the redundancy of traditional CNNs. The proposed network is implemented based on a dual encoder architecture, which is able to enlarge the receptive field without greatly increasing the network parameters. In addition, we assign different dilation rates to different ADAMs so that they can adaptively capture distinguishing features according to the size of a lesion. We extensively evaluate the proposed method on both ISBI2017 and ISIC2018 datasets and the experimental results demonstrate that, without using network ensemble schemes, our method is capable of achieving better segmentation performance than state-of-the-art deep learning models, particularly those equipped with attention mechanisms.

Journal ArticleDOI
TL;DR: Li et al. as mentioned in this paper proposed an efficient architecture by distilling knowledge from well-trained medical image segmentation networks to train another lightweight network, which empowers the lightweight network to get a significant improvement on segmentation capability while retaining its runtime efficiency.
Abstract: Recent advances have been made in applying convolutional neural networks to achieve more precise prediction results for medical image segmentation problems. However, the success of existing methods has highly relied on huge computational complexity and massive storage, which is impractical in real-world scenarios. To deal with this problem, we propose an efficient architecture that distills knowledge from well-trained medical image segmentation networks to train a lightweight network. This architecture empowers the lightweight network to get a significant improvement in segmentation capability while retaining its runtime efficiency. We further devise a novel distillation module tailored for medical image segmentation to transfer semantic region information from the teacher to the student network. It forces the student network to mimic the extent of difference of representations calculated from different tissue regions. This module avoids the ambiguous boundary problem encountered when dealing with medical imaging and instead encodes the internal information of each semantic region for transfer. Benefiting from our module, the lightweight network achieved an improvement of up to 32.6% in our experiments while maintaining its portability in the inference phase. The entire structure has been verified on two widely accepted public CT datasets, LiTS17 and KiTS19. We demonstrate that a lightweight network distilled by our method has non-negligible value in scenarios that require relatively high operating speed and low storage usage.
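The distillation module's idea of mimicking the extent of difference between tissue-region representations can be sketched as matching inter-region prototype similarities between teacher and student; this is an interpretation of the description, not the paper's exact loss:

```python
import torch
import torch.nn.functional as F

def region_contrast_loss(feat_s: torch.Tensor, feat_t: torch.Tensor,
                         masks: torch.Tensor) -> torch.Tensor:
    """Match the student's inter-region feature differences to the teacher's.

    feat_*: (C, H, W) feature maps; masks: (R, H, W) soft region masks.
    A prototype is pooled per region, and the pairwise cosine-similarity
    matrices of the prototypes are aligned with an MSE loss.
    """
    def prototypes(feat: torch.Tensor) -> torch.Tensor:
        w = masks.flatten(1)
        w = w / w.sum(dim=1, keepdim=True).clamp_min(1e-6)
        return w @ feat.flatten(1).t()                      # (R, C)

    sim_s = F.cosine_similarity(prototypes(feat_s)[:, None],
                                prototypes(feat_s)[None], dim=-1)
    sim_t = F.cosine_similarity(prototypes(feat_t)[:, None],
                                prototypes(feat_t)[None], dim=-1)
    return F.mse_loss(sim_s, sim_t)
```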

Journal ArticleDOI
TL;DR: TransVW as discussed by the authors proposes a transferable visual words (TransVW) approach to achieve annotation efficiency for deep learning in medical image analysis, which exploits the semantics of visual words for self-supervised learning, requiring no expert annotation.
Abstract: This paper introduces a new concept called “transferable visual words” (TransVW), aiming to achieve annotation efficiency for deep learning in medical image analysis. Medical imaging—focusing on particular parts of the body for defined clinical purposes—generates images of great similarity in anatomy across patients and yields sophisticated anatomical patterns across images, which are associated with rich semantics about human anatomy and which are natural visual words. We show that these visual words can be automatically harvested according to anatomical consistency via self-discovery, and that the self-discovered visual words can serve as strong yet free supervision signals for deep models to learn semantics-enriched generic image representation via self-supervision (self-classification and self-restoration). Our extensive experiments demonstrate the annotation efficiency of TransVW by offering higher performance and faster convergence with reduced annotation cost in several applications. Our TransVW has several important advantages, including (1) TransVW is a fully autodidactic scheme, which exploits the semantics of visual words for self-supervised learning, requiring no expert annotation; (2) visual word learning is an add-on strategy, which complements existing self-supervised methods, boosting their performance; and (3) the learned representations are semantics-enriched, yielding models that have proven to be more robust and generalizable, saving annotation efforts for a variety of applications through transfer learning. Our code, pre-trained models, and curated visual words are available at https://github.com/JLiangLab/TransVW .

Journal ArticleDOI
TL;DR: A clinically oriented fundus enhancement network (cofe-Net) is proposed to suppress global degradation factors, while simultaneously preserving anatomical retinal structures and pathological characteristics for clinical observation and analysis and shows that the fundus correction method can benefit medical image analysis applications, e.g., retinal vessel segmentation and optic disc/cup detection.
Abstract: Retinal fundus images are widely used for the clinical screening and diagnosis of eye diseases. However, fundus images captured by operators with various levels of experience have a large variation in quality. Low-quality fundus images increase uncertainty in clinical observation and lead to the risk of misdiagnosis. However, due to the special optical beam of fundus imaging and structure of the retina, natural image enhancement methods cannot be utilized directly to address this. In this article, we first analyze the ophthalmoscope imaging system and simulate a reliable degradation of major inferior-quality factors, including uneven illumination, image blurring, and artifacts. Then, based on the degradation model, a clinically oriented fundus enhancement network (cofe-Net) is proposed to suppress global degradation factors, while simultaneously preserving anatomical retinal structures and pathological characteristics for clinical observation and analysis. Experiments on both synthetic and real images demonstrate that our algorithm effectively corrects low-quality fundus images without losing retinal details. Moreover, we also show that the fundus correction method can benefit medical image analysis applications, e.g., retinal vessel segmentation and optic disc/cup detection.
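Two of the inferior-quality factors, uneven illumination and blur, can be simulated in a few lines; the parameters below are illustrative, whereas the paper derives its degradation model from the ophthalmoscope imaging system:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def degrade_fundus(image: np.ndarray, seed: int = 0) -> np.ndarray:
    """Simulate uneven illumination plus blur on a clean fundus image.

    Assumes intensities in [0, 1], grayscale (H, W) or RGB (H, W, 3).
    The Gaussian illumination field and blur width are illustrative choices.
    """
    rng = np.random.default_rng(seed)
    h, w = image.shape[:2]
    yy, xx = np.mgrid[0:h, 0:w]
    cy, cx = rng.uniform(0.3, 0.7, 2) * np.array([h, w])  # off-center light
    illum = np.exp(-((yy - cy) ** 2 + (xx - cx) ** 2) / (0.5 * h * w))
    sigma = (2, 2) + (0,) * (image.ndim - 2)              # don't blur channels
    blurred = gaussian_filter(image, sigma=sigma)
    field = illum[..., None] if image.ndim == 3 else illum
    return np.clip(blurred * field, 0.0, 1.0)
```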

Journal ArticleDOI
TL;DR: The proposed SESV framework is capable of improving the accuracy of different DCNNs on different medical image segmentation tasks and introduces a verification network to determine whether to accept or reject the refined mask produced by the re-segmentation network on a region-by-region basis.
Abstract: Medical image segmentation is an essential task in computer-aided diagnosis. Despite their prevalence and success, deep convolutional neural networks (DCNNs) still need to be improved to produce accurate and robust enough segmentation results for clinical use. In this paper, we propose a novel and generic framework called Segmentation-Emendation-reSegmentation-Verification (SESV) to improve the accuracy of existing DCNNs in medical image segmentation, instead of designing a more accurate segmentation model. Our idea is to predict the segmentation errors produced by an existing model and then correct them. Since predicting segmentation errors is challenging, we design two ways to tolerate the mistakes in the error prediction. First, rather than using a predicted segmentation error map to correct the segmentation mask directly, we only treat the error map as the prior that indicates the locations where segmentation errors are prone to occur, and then concatenate the error map with the image and segmentation mask as the input of a re-segmentation network. Second, we introduce a verification network to determine whether to accept or reject the refined mask produced by the re-segmentation network on a region-by-region basis. The experimental results on the CRAG, ISIC, and IDRiD datasets suggest that using our SESV framework can improve the accuracy of DeepLabv3+ substantially and achieve advanced performance in the segmentation of gland cells, skin lesions, and retinal microaneurysms. Consistent conclusions can also be drawn when using PSPNet, U-Net, and FPN as the segmentation network, respectively. Therefore, our SESV framework is capable of improving the accuracy of different DCNNs on different medical image segmentation tasks.
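The four-stage inference flow can be wired up as follows; the module names are placeholders, and the region-wise accept/reject decision is simplified to an elementwise gate:

```python
import torch

def sesv_predict(seg_net, emend_net, reseg_net, verify_net,
                 image: torch.Tensor) -> torch.Tensor:
    """Illustrative SESV wiring: Segmentation -> Emendation -> reSegmentation
    -> Verification. The error map acts only as a prior for re-segmentation,
    and the verifier decides whether to keep the refined mask."""
    mask = seg_net(image)                                   # initial mask
    error_prior = emend_net(torch.cat([image, mask], dim=1))
    refined = reseg_net(torch.cat([image, mask, error_prior], dim=1))
    accept = verify_net(torch.cat([image, refined], dim=1)) > 0.5
    return torch.where(accept, refined, mask)               # keep or reject
```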

Journal ArticleDOI
TL;DR: The MoNuSAC2020 dataset, as discussed by the authors, contains over 46,000 nuclei from 37 hospitals, 71 patients, four organs, and four nucleus types, and was the basis of a challenge organized as a satellite event at the International Symposium on Biomedical Imaging (ISBI) 2020.
Abstract: Detecting various types of cells in and around the tumor matrix holds a special significance in characterizing the tumor micro-environment for cancer prognostication and research. Automating the tasks of detecting, segmenting, and classifying nuclei can free up the pathologists' time for higher value tasks and reduce errors due to fatigue and subjectivity. To encourage the computer vision research community to develop and test algorithms for these tasks, we prepared a large and diverse dataset of nucleus boundary annotations and class labels. The dataset has over 46,000 nuclei from 37 hospitals, 71 patients, four organs, and four nucleus types. We also organized a challenge around this dataset as a satellite event at the International Symposium on Biomedical Imaging (ISBI) in April 2020. The challenge saw a wide participation from across the world, and the top methods were able to match inter-human concordance for the challenge metric. In this paper, we summarize the dataset and the key findings of the challenge, including the commonalities and differences between the methods developed by various participants. We have released the MoNuSAC2020 dataset to the public.

Journal ArticleDOI
TL;DR: A new attention-driven weakly supervised algorithm comprising a hierarchical attention mining framework that unifies activation- and gradient-based visual attention in a holistic manner is proposed, enabling principled model training in a weakly-supervised fashion and facilitating the generation of visual-attention-driven model explanations by means of localization cues.
Abstract: We consider the problem of abnormality localization for clinical applications. While deep learning has driven much recent progress in medical imaging, many clinical challenges are not fully addressed, limiting its broader usage. While recent methods report high diagnostic accuracies, physicians have concerns about trusting these algorithms' results for diagnostic decision-making purposes because of a general lack of algorithm decision reasoning and interpretability. One potential way to address this problem is to further train these models to localize abnormalities in addition to just classifying them. However, doing this accurately will require a large amount of disease localization annotations by clinical experts, a task that is prohibitively expensive to accomplish for most applications. In this work, we take a step towards addressing these issues by means of a new attention-driven weakly supervised algorithm comprising a hierarchical attention mining framework that unifies activation- and gradient-based visual attention in a holistic manner. Our key algorithmic innovations include the design of explicit ordinal attention constraints, enabling principled model training in a weakly-supervised fashion, while also facilitating the generation of visual-attention-driven model explanations by means of localization cues. On two large-scale chest X-ray datasets (NIH ChestX-ray14 and CheXpert), we demonstrate significant localization performance improvements over the current state of the art while also achieving competitive classification performance.

Journal ArticleDOI
TL;DR: In this paper, the authors introduce new asymmetric variants of popular loss functions and regularization techniques including a large margin loss, focal loss, adversarial training, mixup and data augmentation to counter logit shift of the underrepresented classes.
Abstract: Class imbalance poses a challenge for developing unbiased, accurate predictive models. In particular, in image segmentation, neural networks may overfit to the foreground samples from small structures, which are often heavily under-represented in the training set, leading to poor generalization. In this study, we provide new insights on the problem of overfitting under class imbalance by inspecting the network behavior. We find empirically that when training with limited data and strong class imbalance, at test time the distribution of logit activations may shift across the decision boundary, while samples of the well-represented class seem unaffected. This bias leads to a systematic under-segmentation of small structures. This phenomenon is consistently observed for different databases, tasks and network architectures. To tackle this problem, we introduce new asymmetric variants of popular loss functions and regularization techniques including a large margin loss, focal loss, adversarial training, mixup and data augmentation, which are explicitly designed to counter logit shift of the under-represented classes. Extensive experiments are conducted on several challenging segmentation tasks. Our results demonstrate that the proposed modifications to the objective function can lead to significantly improved segmentation accuracy compared to baselines and alternative approaches.
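The asymmetric idea can be illustrated with a focal loss whose focusing term is applied only to the well-represented background class, so hard samples of the rare foreground class are not down-weighted; a binary sketch (the paper's multi-class formulation differs in detail):

```python
import torch
import torch.nn.functional as F

def asymmetric_focal_loss(logits: torch.Tensor, target: torch.Tensor,
                          gamma: float = 2.0) -> torch.Tensor:
    """Binary focal loss with the focusing term on background only.

    logits: (N,) raw scores; target: (N,) labels in {0, 1}, 1 = rare
    foreground. For background, the standard focal weight (1 - p_t)^gamma
    equals p^gamma with p = sigmoid(logit); foreground keeps full weight.
    """
    p = torch.sigmoid(logits)
    bce = F.binary_cross_entropy_with_logits(logits, target, reduction="none")
    weight = torch.where(target > 0.5, torch.ones_like(p), p ** gamma)
    return (weight * bce).mean()
```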

Journal ArticleDOI
TL;DR: In this article, a diagnostic approach capable of evaluating the skin complications of diabetes mellitus at a very early stage is presented; no methods for non-invasive assessment of glycation and associated metabolic processes in biotissues, or for prediction of possible skin complications such as ulcers, currently exist for endocrinologists and clinical diagnosis.
Abstract: Aging and diabetes lead to protein glycation and cause dysfunction of collagen-containing tissues. The accompanying structural and functional changes of collagen significantly contribute to the development of various pathological malformations affecting the skin, blood vessels, and nerves, causing a number of complications, increasing disability risks and threats to life. In fact, no methods of non-invasive assessment of glycation and associated metabolic processes in biotissues, or of prediction of possible skin complications, e.g., ulcers, currently exist for endocrinologists and clinical diagnosis. In this publication, utilizing emerging photonics-based technology, innovative solutions in machine learning, and definitive physiological characteristics, we introduce a diagnostic approach capable of evaluating the skin complications of diabetes mellitus at a very early stage. The results of the feasibility studies, as well as the actual tests on patients with diabetes and healthy volunteers, clearly show the ability of the approach to differentiate diabetic and control groups. Furthermore, the developed in-house polarization-based hyperspectral imaging technique, combined with the implementation of an artificial neural network, provides new horizons in the study and diagnosis of age-related diseases.

Journal ArticleDOI
TL;DR: In this paper, a meta-inversion network (MetaInv-Net) is proposed to learn an initializer for the conjugate gradient (CG) algorithm that is involved in one of the subproblems of the backbone model.
Abstract: X-ray Computed Tomography (CT) is widely used in clinical applications such as diagnosis and image-guided interventions. In this paper, we propose a new deep learning based model for CT image reconstruction with the backbone network architecture built by unrolling an iterative algorithm. However, unlike the existing strategy to include as many data-adaptive components in the unrolled dynamics model as possible, we find that it is enough to only learn the parts where traditional designs mostly rely on intuitions and experience. More specifically, we propose to learn an initializer for the conjugate gradient (CG) algorithm that is involved in one of the subproblems of the backbone model. Other components, such as image priors and hyperparameters, are kept as in the original design. Since a hypernetwork is introduced to infer the initialization of the CG module, the proposed model is a form of meta-learning model. Therefore, we call the proposed model the meta-inversion network (MetaInv-Net). The proposed MetaInv-Net can be designed with far fewer trainable parameters while still achieving image reconstruction performance superior to some state-of-the-art deep models in CT imaging. In simulated and real data experiments, MetaInv-Net performs very well and can be generalized beyond the training setting, i.e., to other scanning settings, noise levels, and data sets.
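A plain conjugate gradient solver started from a learned initializer shows where the hypernetwork plugs in; here A is any symmetric positive-definite operator supplied as a callable, and the surrounding unrolled model is omitted:

```python
import numpy as np

def cg(A, b: np.ndarray, x0: np.ndarray, n_iter: int = 20) -> np.ndarray:
    """Standard conjugate gradient for Ax = b, started from x0.

    In MetaInv-Net the hypernetwork supplies x0 for this subproblem; a good
    initializer lets a short, fixed number of iterations suffice.
    """
    x = x0.copy()
    r = b - A(x)          # initial residual
    p = r.copy()          # initial search direction
    rs = r @ r
    for _ in range(n_iter):
        Ap = A(p)
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x
```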

Journal ArticleDOI
TL;DR: Inspired by CycleGAN, which relies on the global constraints of the adversarial loss and cycle consistency, the proposed StillGAN treats low- and high-quality images as two distinct domains and introduces local structure and illumination constraints for learning both overall characteristics and local details.
Abstract: The development of medical imaging techniques has greatly supported clinical decision making. However, poor imaging quality, such as non-uniform illumination or imbalanced intensity, brings challenges for automated screening, analysis and diagnosis of diseases. Previously, bi-directional GANs (e.g., CycleGAN), have been proposed to improve the quality of input images without the requirement of paired images. However, these methods focus on global appearance, without imposing constraints on structure or illumination, which are essential features for medical image interpretation. In this paper, we propose a novel and versatile bi-directional GAN, named Structure and illumination constrained GAN (StillGAN), for medical image quality enhancement. Our StillGAN treats low- and high-quality images as two distinct domains, and introduces local structure and illumination constraints for learning both overall characteristics and local details. Extensive experiments on three medical image datasets (e.g., corneal confocal microscopy, retinal color fundus and endoscopy images) demonstrate that our method performs better than both conventional methods and other deep learning-based methods. In addition, we have investigated the impact of the proposed method on different medical image analysis and clinical tasks such as nerve segmentation, tortuosity grading, fovea localization and disease classification.

Journal ArticleDOI
TL;DR: A 3D fully convolutional network named Hyper-net is proposed to segment melanoma from hyperspectral pathology images, surpassing the 2D model with an accuracy of over 92%.
Abstract: Skin biopsy histopathological analysis is one of the primary methods used by pathologists to assess the presence and deterioration of melanoma in clinical practice. A comprehensive and reliable pathological analysis depends on correctly segmenting melanoma and its interaction with benign tissues, thereby supporting accurate therapy. In this study, we applied a deep convolutional network to hyperspectral pathology images to perform the segmentation of melanoma. To make the best use of the spectral properties of three-dimensional hyperspectral data, we proposed a 3D fully convolutional network named Hyper-net to segment melanoma from hyperspectral pathology images. In order to enhance the sensitivity of the model, we made a specific modification to the loss function to penalize false negatives in diagnosis. Hyper-net surpassed the 2D model, with an accuracy of over 92%. The false negative rate decreased by nearly 66% using Hyper-net with the modified loss function. These findings demonstrate the ability of Hyper-net to assist pathologists in the diagnosis of melanoma based on hyperspectral pathology images.
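The abstract does not give the exact loss modification; one plausible analogue that raises sensitivity is a Tversky loss with the false-negative term weighted more heavily (beta > 0.5), sketched below as an assumption rather than the paper's formula:

```python
import torch

def fn_weighted_tversky_loss(prob: torch.Tensor, target: torch.Tensor,
                             beta: float = 0.7) -> torch.Tensor:
    """Tversky loss that penalizes false negatives more strongly.

    prob: flattened soft predictions in [0, 1]; target: {0, 1} labels.
    beta > 0.5 trades precision for recall, a common sensitivity-raising
    choice offered here as a stand-in for Hyper-net's modification.
    """
    tp = (prob * target).sum()
    fp = (prob * (1 - target)).sum()
    fn = ((1 - prob) * target).sum()
    return 1 - tp / (tp + (1 - beta) * fp + beta * fn + 1e-6)
```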