
Showing papers in "IEEE Transactions on Medical Imaging in 2022"


Journal ArticleDOI
TL;DR: Pathomic fusion as discussed by the authors proposes an interpretable strategy for end-to-end multimodal fusion of histology image and genomic (mutations, CNV, RNA-Seq) features for survival outcome prediction.
Abstract: Cancer diagnosis, prognosis, and therapeutic response predictions are based on morphological information from histology slides and molecular profiles from genomic data. However, most deep learning-based objective outcome prediction and grading paradigms are based on histology or genomics alone and do not make use of the complementary information in an intuitive manner. In this work, we propose Pathomic Fusion, an interpretable strategy for end-to-end multimodal fusion of histology image and genomic (mutations, CNV, RNA-Seq) features for survival outcome prediction. Our approach models pairwise feature interactions across modalities by taking the Kronecker product of unimodal feature representations, and controls the expressiveness of each representation via a gating-based attention mechanism. Following supervised learning, we are able to interpret and saliently localize features across each modality, and understand how feature importance shifts when conditioning on multimodal input. We validate our approach using glioma and clear cell renal cell carcinoma datasets from the Cancer Genome Atlas (TCGA), which contains paired whole-slide image, genotype, and transcriptome data with ground truth survival and histologic grade labels. In a 15-fold cross-validation, our results demonstrate that the proposed multimodal fusion paradigm improves prognostic determinations from ground truth grading and molecular subtyping, as well as unimodal deep networks trained on histology and genomic data alone. The proposed method establishes insight and theory on how to train deep networks on multimodal biomedical data in an intuitive manner, which will be useful for other problems in medicine that seek to combine heterogeneous data streams for understanding diseases and predicting response and resistance to treatment. Code and trained models are made available at: https://github.com/mahmoodlab/PathomicFusion.
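
To make the fusion mechanism concrete, here is a minimal PyTorch sketch of gating-based attention followed by a Kronecker (outer) product of two unimodal feature vectors, in the spirit of Pathomic Fusion. Dimensions and layer choices are illustrative assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn

class GatedKroneckerFusion(nn.Module):
    def __init__(self, dim_h=32, dim_g=32):
        super().__init__()
        # Gates modulate each modality's expressiveness before fusion.
        self.gate_h = nn.Sequential(nn.Linear(dim_h + dim_g, dim_h), nn.Sigmoid())
        self.gate_g = nn.Sequential(nn.Linear(dim_h + dim_g, dim_g), nn.Sigmoid())

    def forward(self, h, g):            # h: (B, dim_h) histology, g: (B, dim_g) genomics
        joint = torch.cat([h, g], dim=1)
        h = h * self.gate_h(joint)      # attention-gated unimodal features
        g = g * self.gate_g(joint)
        # Append a constant 1 so unimodal terms survive in the outer product.
        h1 = torch.cat([h, torch.ones_like(h[:, :1])], dim=1)
        g1 = torch.cat([g, torch.ones_like(g[:, :1])], dim=1)
        # Pairwise cross-modal interactions: batched outer product, flattened.
        return torch.bmm(h1.unsqueeze(2), g1.unsqueeze(1)).flatten(1)

fused = GatedKroneckerFusion()(torch.randn(4, 32), torch.randn(4, 32))
print(fused.shape)  # torch.Size([4, 1089]) -> (32+1)*(32+1) interaction terms
```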

87 citations


Journal ArticleDOI
TL;DR: Wang et al. as mentioned in this paper proposed a 3D end-to-end synthesis network called Bidirectional Mapping Generative Adversarial Networks (BMGAN), which embeds the semantic information of PET images into the high-dimensional latent space.
Abstract: Fusing multi-modality medical images, such as magnetic resonance (MR) imaging and positron emission tomography (PET), can provide various anatomical and functional information about the human body. However, PET data is not always available for several reasons, such as high cost, radiation hazard, and other limitations. This paper proposes a 3D end-to-end synthesis network called Bidirectional Mapping Generative Adversarial Networks (BMGAN). Image contexts and latent vectors are effectively used for brain MR-to-PET synthesis. Specifically, a bidirectional mapping mechanism is designed to embed the semantic information of PET images into the high-dimensional latent space. Moreover, the 3D Dense-UNet generator architecture and the hybrid loss functions are further constructed to improve the visual quality of cross-modality synthetic images. The most appealing part is that the proposed method can synthesize perceptually realistic PET images while preserving the diverse brain structures of different subjects. Experimental results demonstrate that the performance of the proposed method outperforms other competitive methods in terms of quantitative measures, qualitative displays, and evaluation metrics for classification.
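
One plausible reading of the bidirectional mapping, sketched below under stated assumptions: besides the forward generator G(mr, z) -> pet, an encoder E embeds real PET images back into the latent space, and reconstruction losses tie the two directions together. G, E, and the loss choices here are placeholders, not the published architecture.

```python
import torch
import torch.nn.functional as F

def bmgan_mapping_losses(G, E, mr, pet, z_dim=128):
    """mr, pet: (B, 1, D, H, W) paired volumes; G, E: placeholder networks."""
    z = torch.randn(mr.size(0), z_dim)
    fake_pet = G(mr, z)                 # forward mapping: MR (+ latent) -> PET
    z_hat = E(pet)                      # inverse mapping: PET -> latent code
    recon_pet = G(mr, z_hat)            # PET should be recoverable from its code
    # Image-space and latent-space reconstruction terms (adversarial and
    # perceptual terms from the hybrid loss are omitted in this sketch).
    return F.l1_loss(recon_pet, pet), F.l1_loss(E(fake_pet), z)
```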

54 citations


Journal ArticleDOI
TL;DR: SLATER as discussed by the authors uses a deep adversarial network with cross-attention transformers to map noise and latent variables onto coil-combined MR images, and then performs a zero-shot reconstruction by incorporating the imaging operator and optimizing the prior to maximize consistency to undersampled data.
Abstract: Supervised reconstruction models are characteristically trained on matched pairs of undersampled and fully-sampled data to capture an MRI prior, along with supervision regarding the imaging operator to enforce data consistency. To reduce supervision requirements, the recent deep image prior framework instead conjoins untrained MRI priors with the imaging operator during inference. Yet, canonical convolutional architectures are suboptimal in capturing long-range relationships, and priors based on randomly initialized networks may yield suboptimal performance. To address these limitations, here we introduce a novel unsupervised MRI reconstruction method based on zero-Shot Learned Adversarial TransformERs (SLATER). SLATER embodies a deep adversarial network with cross-attention transformers to map noise and latent variables onto coil-combined MR images. During pre-training, this unconditional network learns a high-quality MRI prior in an unsupervised generative modeling task. During inference, a zero-shot reconstruction is then performed by incorporating the imaging operator and optimizing the prior to maximize consistency to undersampled data. Comprehensive experiments on brain MRI datasets clearly demonstrate the superior performance of SLATER against state-of-the-art unsupervised methods.
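
A minimal sketch of the zero-shot inference stage follows: given a pretrained unconditional generator G (a placeholder here, assumed to expose a `latent_dim` attribute and output a complex-valued, coil-combined image), only the latent inputs are optimized so that the undersampled k-space of G(z) matches the measured data. Single-coil Fourier sampling is assumed for brevity; SLATER also adapts the prior's weights, which this sketch omits.

```python
import torch

def zero_shot_recon(G, y, mask, steps=500, lr=0.1):
    """y: measured k-space (H, W), complex; mask: binary sampling mask (H, W)."""
    z = torch.randn(1, G.latent_dim, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        x = G(z).squeeze()                           # image estimate from the prior
        k = torch.fft.fft2(x)                        # imaging operator (Fourier)
        loss = ((mask * (k - y)).abs() ** 2).mean()  # data-consistency loss
        opt.zero_grad(); loss.backward(); opt.step()
    return G(z).detach()
```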

53 citations


Journal ArticleDOI
TL;DR: ResViT as mentioned in this paper employs a central bottleneck comprising novel aggregated residual transformer (ART) blocks that synergistically combine residual convolutional and transformer modules, while a weight sharing strategy is introduced among ART blocks to mitigate computational burden.
Abstract: Generative adversarial models with convolutional neural network (CNN) backbones have recently been established as state-of-the-art in numerous medical image synthesis tasks. However, CNNs are designed to perform local processing with compact filters, and this inductive bias compromises learning of contextual features. Here, we propose a novel generative adversarial approach for medical image synthesis, ResViT, that leverages the contextual sensitivity of vision transformers along with the precision of convolution operators and realism of adversarial learning. ResViT's generator employs a central bottleneck comprising novel aggregated residual transformer (ART) blocks that synergistically combine residual convolutional and transformer modules. Residual connections in ART blocks promote diversity in captured representations, while a channel compression module distills task-relevant information. A weight sharing strategy is introduced among ART blocks to mitigate computational burden. A unified implementation is introduced to avoid the need to rebuild separate synthesis models for varying source-target modality configurations. Comprehensive demonstrations are performed for synthesizing missing sequences in multi-contrast MRI, and CT images from MRI. Our results indicate superiority of ResViT against competing CNN- and transformer-based methods in terms of qualitative observations and quantitative metrics.
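
An illustrative sketch of an ART-style block: a residual convolutional path for local precision, a transformer path over tokenized feature maps for context, and a 1x1 channel compression over their concatenation. Layer sizes are assumptions, not the published configuration.

```python
import torch
import torch.nn as nn

class ARTBlock(nn.Module):
    def __init__(self, ch=64, heads=4):
        super().__init__()
        self.conv = nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
                                  nn.Conv2d(ch, ch, 3, padding=1))
        self.transformer = nn.TransformerEncoderLayer(d_model=ch, nhead=heads,
                                                      batch_first=True)
        self.compress = nn.Conv2d(2 * ch, ch, 1)  # distill task-relevant channels

    def forward(self, x):                         # x: (B, C, H, W)
        B, C, H, W = x.shape
        local_feat = x + self.conv(x)             # residual convolutional path
        tokens = x.flatten(2).transpose(1, 2)     # (B, H*W, C) tokens for attention
        global_feat = self.transformer(tokens).transpose(1, 2).reshape(B, C, H, W)
        return self.compress(torch.cat([local_feat, global_feat], dim=1))

out = ARTBlock()(torch.randn(2, 64, 16, 16))
print(out.shape)  # torch.Size([2, 64, 16, 16])
```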

47 citations


Journal ArticleDOI
TL;DR: SimCVD as mentioned in this paper predicts signed distance maps of object boundaries in a contrastive objective, with only two independent dropouts as masks, and performs structural distillation by distilling pair-wise similarities.
Abstract: Automated segmentation in medical image analysis is a challenging task that requires a large amount of manually labeled data. However, most existing learning-based approaches usually suffer from limited manually annotated medical data, which poses a major practical problem for accurate and robust medical image segmentation. In addition, most existing semi-supervised approaches are usually not robust compared with their supervised counterparts, and also lack explicit modeling of geometric structure and semantic information, both of which limit the segmentation accuracy. In this work, we present SimCVD, a simple contrastive distillation framework that significantly advances state-of-the-art voxel-wise representation learning. We first describe an unsupervised training strategy, which takes two views of an input volume and predicts their signed distance maps of object boundaries in a contrastive objective, with only two independent dropouts as masks. This simple approach works surprisingly well, performing on the same level as previous fully supervised methods with much less labeled data. We hypothesize that dropout can be viewed as a minimal form of data augmentation and makes the network robust to representation collapse. Then, we propose to perform structural distillation by distilling pair-wise similarities. We evaluate SimCVD on two popular datasets: the Left Atrial Segmentation Challenge (LA) and the NIH pancreas CT dataset. The results on the LA dataset demonstrate that, under two labeled-data ratios (i.e., 20% and 10%), SimCVD achieves an average Dice score of 90.85% and 89.03% respectively, a 0.91% and 2.22% improvement compared to previous best results. Our method can be trained in an end-to-end fashion, showing the promise of utilizing SimCVD as a general framework for downstream tasks, such as medical image synthesis, enhancement, and registration.
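
A minimal sketch of the "two independent dropout passes as augmentation" idea: the same volume goes through the network twice, dropout makes the two predicted signed-distance maps differ, and a contrastive objective pulls the paired views together. The network and loss details are simplified assumptions.

```python
import torch
import torch.nn.functional as F

def contrastive_dropout_loss(net, x, tau=0.1):
    """x: (B, 1, D, H, W). net must contain active nn.Dropout layers."""
    net.train()                                   # keep dropout stochastic
    z1 = net(x).flatten(1)                        # view 1 of the SDM prediction
    z2 = net(x).flatten(1)                        # view 2 (different dropout mask)
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / tau                    # (B, B) similarity matrix
    targets = torch.arange(x.size(0))             # positives on the diagonal
    return F.cross_entropy(logits, targets)
```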

42 citations


Journal ArticleDOI
TL;DR: KiU-Net as mentioned in this paper uses an overcomplete convolutional architecture where the input image is projected into a higher dimension such that the receptive field is constrained from increasing in the deep layers of the network.
Abstract: Most methods for medical image segmentation use U-Net or its variants as they have been successful in most of the applications. After a detailed analysis of these "traditional" encoder-decoder based approaches, we observed that they perform poorly in detecting smaller structures and are unable to segment boundary regions precisely. This issue can be attributed to the increase in receptive field size as we go deeper into the encoder. The extra focus on learning high level features causes U-Net based approaches to learn less information about low-level features which are crucial for detecting small structures. To overcome this issue, we propose using an overcomplete convolutional architecture where we project the input image into a higher dimension such that we constrain the receptive field from increasing in the deep layers of the network. We design a new architecture for image segmentation, KiU-Net, which has two branches: (1) an overcomplete convolutional network Kite-Net which learns to capture fine details and accurate edges of the input, and (2) U-Net which learns high level features. Furthermore, we also propose KiU-Net 3D which is a 3D convolutional architecture for volumetric segmentation. We perform a detailed study of KiU-Net by performing experiments on five different datasets covering various image modalities. We achieve a good performance with an additional benefit of fewer parameters and faster convergence. We also demonstrate that the extensions of KiU-Net based on residual blocks and dense blocks result in further performance improvements. Code: https://github.com/jeya-maria-jose/KiU-Net-pytorch.
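
A sketch of the overcomplete idea: the Kite-Net branch upsamples before convolving, so spatial resolution grows with depth and the effective receptive field stays small, preserving fine detail. Channel counts here are illustrative.

```python
import torch
import torch.nn as nn

kite_encoder = nn.Sequential(
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
    # Project to a higher spatial dimension instead of pooling.
    nn.Upsample(scale_factor=2, mode='bilinear', align_corners=False),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
    nn.Upsample(scale_factor=2, mode='bilinear', align_corners=False),
)
# A standard U-Net encoder would use nn.MaxPool2d(2) here instead, which is
# exactly what enlarges the receptive field and washes out small structures.
print(kite_encoder(torch.randn(1, 1, 64, 64)).shape)  # torch.Size([1, 32, 256, 256])
```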

41 citations


Journal ArticleDOI
TL;DR: A novel method based on adversarial diffusion modeling, SynDiff, for improved reliability in medical image synthesis and indicates that SynDiff offers superior performance against competing baselines both qualitatively and quantitatively.
Abstract: Imputation of missing images via source-to-target modality translation can improve diversity in medical imaging protocols. A pervasive approach for synthesizing target images involves one-shot mapping through generative adversarial networks (GAN). Yet, GAN models that implicitly characterize the image distribution can suffer from limited sample fidelity. Here, we propose a novel method based on adversarial diffusion modeling, SynDiff, for improved performance in medical image translation. To capture a direct correlate of the image distribution, SynDiff leverages a conditional diffusion process that progressively maps noise and source images onto the target image. For fast and accurate image sampling during inference, large diffusion steps are taken with adversarial projections in the reverse diffusion direction. To enable training on unpaired datasets, a cycle-consistent architecture is devised with coupled diffusive and non-diffusive modules that bilaterally translate between two modalities. Extensive assessments are reported on the utility of SynDiff against competing GAN and diffusion models in multi-contrast MRI and MRI-CT translation. Our demonstrations indicate that SynDiff offers quantitatively and qualitatively superior performance against competing baselines.

35 citations


Journal ArticleDOI
TL;DR: A fully automated multimodal MRI-based multi-task learning framework for simultaneous glioma segmentation and IDH genotyping and an uncertainty-aware pseudo-label selection is proposed to generate IDH pseudo-labels from larger unlabeled data for improving the accuracy of IDH genotyping by using semi-supervised learning.
Abstract: The accurate prediction of isocitrate dehydrogenase (IDH) mutation and glioma segmentation are important tasks for computer-aided diagnosis using preoperative multimodal magnetic resonance imaging (MRI). The two tasks are ongoing challenges due to the significant inter-tumor and intra-tumor heterogeneity. The existing methods to address them are mostly based on single-task approaches without considering the correlation between the two tasks. In addition, the acquisition of IDH genetic labels is costly, resulting in a limited amount of IDH mutation data for modeling. To comprehensively address these problems, we propose a fully automated multimodal MRI-based multi-task learning framework for simultaneous glioma segmentation and IDH genotyping. Specifically, the task correlation and heterogeneity are tackled with a hybrid CNN-Transformer encoder, which combines a convolutional neural network and a transformer to extract shared spatial and global information that is then fed to a decoder for glioma segmentation and a multi-scale classifier for IDH genotyping. Then, a multi-task learning loss is designed to balance the two tasks by combining the segmentation and classification loss functions with uncertain weights. Finally, an uncertainty-aware pseudo-label selection is proposed to generate IDH pseudo-labels from larger unlabeled data for improving the accuracy of IDH genotyping by using semi-supervised learning. We evaluate our method on a multi-institutional public dataset. Experimental results show that our proposed multi-task network achieves promising performance and outperforms the single-task learning counterparts and other existing state-of-the-art methods. With the introduction of unlabeled data, the semi-supervised multi-task learning framework further improves the performance of glioma segmentation and IDH genotyping. The source codes of our framework are publicly available at https://github.com/miacsu/MTTU-Net.git.
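
A sketch of a two-task loss with learned ("uncertain") weights, in the homoscedastic-uncertainty style commonly used to balance segmentation and classification objectives; the paper's exact weighting scheme may differ.

```python
import torch
import torch.nn as nn

class UncertaintyWeightedLoss(nn.Module):
    def __init__(self):
        super().__init__()
        self.log_var_seg = nn.Parameter(torch.zeros(1))  # log sigma^2 per task
        self.log_var_cls = nn.Parameter(torch.zeros(1))

    def forward(self, loss_seg, loss_cls):
        # L = exp(-s1)*L_seg + s1 + exp(-s2)*L_cls + s2, with s = log sigma^2;
        # the network learns how much to trust each task's loss.
        return (torch.exp(-self.log_var_seg) * loss_seg + self.log_var_seg
              + torch.exp(-self.log_var_cls) * loss_cls + self.log_var_cls)
```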

32 citations


Journal ArticleDOI
TL;DR: Wang et al. as mentioned in this paper proposed a single-model based strategy for classification of skin lesions on small and imbalanced datasets, which achieved the classification accuracy comparable or superior to those of multiple ensembling models on different dermoscopic image datasets.
Abstract: Deep convolutional neural network (DCNN) models have been widely explored for skin disease diagnosis and some of them have achieved the diagnostic outcomes comparable or even superior to those of dermatologists. However, broad implementation of DCNN in skin disease detection is hindered by small size and data imbalance of the publicly accessible skin lesion datasets. This paper proposes a novel single-model based strategy for classification of skin lesions on small and imbalanced datasets. First, various DCNNs are trained on different small and imbalanced datasets to verify that the models with moderate complexity outperform the larger models. Second, DropOut and DropBlock regularization are added to reduce overfitting and a Modified RandAugment augmentation strategy is proposed to deal with the defects of sample underrepresentation in the small dataset. Finally, a novel Multi-Weighted New Loss (MWNL) function and an end-to-end cumulative learning strategy (CLS) are introduced to overcome the challenge of uneven sample size and classification difficulty and to reduce the impact of abnormal samples on training. By combining Modified RandAugment, MWNL and CLS, our single DCNN model method achieved the classification accuracy comparable or superior to those of multiple ensembling models on different dermoscopic image datasets. Our study shows that this method is able to achieve a high classification performance at a low cost of computational resources and inference time, potentially suitable for implementation on mobile devices for automated screening of skin lesions and many other malignancies in low-resource settings.

27 citations


Journal ArticleDOI
TL;DR: Wang et al. as mentioned in this paper proposed a relation transformer block (RTB) to incorporate attention mechanisms at two main levels: a self-attention transformer exploits global dependencies among lesion features, while a cross-attention transformer allows interactions between lesion and vessel features to alleviate ambiguity in lesion detection caused by complex fundus structures.
Abstract: Automatic diabetic retinopathy (DR) lesion segmentation is of great value in assisting ophthalmologists in diagnosis. Although much research has been conducted on this task, most prior works paid too much attention to the designs of networks instead of considering the pathological association for lesions. Through investigating the pathogenic causes of DR lesions in advance, we found that certain lesions are close to specific vessels and present relative patterns to each other. Motivated by the observation, we propose a relation transformer block (RTB) to incorporate attention mechanisms at two main levels: a self-attention transformer exploits global dependencies among lesion features, while a cross-attention transformer allows interactions between lesion and vessel features by integrating valuable vascular information to alleviate ambiguity in lesion detection caused by complex fundus structures. In addition, to capture the small lesion patterns first, we propose a global transformer block (GTB) which preserves detailed information in the deep network. By integrating the above blocks in a dual-branch design, our network segments the four kinds of lesions simultaneously. Comprehensive experiments on IDRiD and DDR datasets well demonstrate the superiority of our approach, which achieves competitive performance compared to the state of the art.
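
A minimal sketch of cross-attention between lesion and vessel feature tokens using PyTorch's built-in multi-head attention: lesion features query the vessel features so vascular context can disambiguate lesions. The token layout and dimensions are assumptions.

```python
import torch
import torch.nn as nn

dim, heads = 64, 4
cross_attn = nn.MultiheadAttention(embed_dim=dim, num_heads=heads, batch_first=True)

lesion = torch.randn(2, 196, dim)   # (B, tokens, C) lesion-branch features
vessel = torch.randn(2, 196, dim)   # vessel-branch features
# Lesion tokens attend to vessel tokens; the output mixes in vascular context.
fused, _ = cross_attn(query=lesion, key=vessel, value=vessel)
print(fused.shape)                  # torch.Size([2, 196, 64])
```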

25 citations


Journal ArticleDOI
TL;DR: Li et al. as mentioned in this paper proposed a saliency-guided morphology-aware U-Net (SMU-Net) for lesion segmentation in breast ultrasound images, which consists of a main network with an additional middle stream and an auxiliary network.
Abstract: Deep learning methods, especially convolutional neural networks, have been successfully applied to lesion segmentation in breast ultrasound (BUS) images. However, pattern complexity and intensity similarity between the surrounding tissues (i.e., background) and lesion regions (i.e., foreground) bring challenges for lesion segmentation. Although such rich texture information is contained in the background, very few methods have tried to explore and exploit background-salient representations to assist foreground segmentation. Additionally, other characteristics of BUS images, i.e., 1) low-contrast appearance and blurry boundary, and 2) significant shape and position variation of lesions, also increase the difficulty in accurate lesion segmentation. In this paper, we present a saliency-guided morphology-aware U-Net (SMU-Net) for lesion segmentation in BUS images. The SMU-Net is composed of a main network with an additional middle stream and an auxiliary network. Specifically, we first propose generation of saliency maps which incorporate both low-level and high-level image structures, for foreground and background. These saliency maps are then employed to guide the main network and auxiliary network in respectively learning foreground-salient and background-salient representations. Furthermore, we devise an additional middle stream which basically consists of background-assisted fusion, shape-aware, edge-aware and position-aware units. This stream receives the coarse-to-fine representations from the main network and auxiliary network, efficiently fusing the foreground-salient and background-salient features and enhancing the network's ability to learn morphological information. Extensive experiments on five datasets demonstrate higher performance and superior robustness to dataset scale compared with several state-of-the-art deep learning approaches for breast lesion segmentation in ultrasound images.

Journal ArticleDOI
TL;DR: This work proposes a new denoising method based on score-based reverse diffusion sampling, which overcomes all the aforementioned drawbacks and establishes state-of-the-art performance, while having desirable properties which prior MMSE denoisers did not have.
Abstract: Patient scans from MRI often suffer from noise, which hampers the diagnostic capability of such images. As a method to mitigate such artifacts, denoising is largely studied both within the medical imaging community and beyond the community as a general subject. However, recent deep neural network-based approaches mostly rely on the minimum mean squared error (MMSE) estimates, which tend to produce a blurred output. Moreover, such models suffer when deployed in real-world situations: out-of-distribution data, and complex noise distributions that deviate from the usual parametric noise models. In this work, we propose a new denoising method based on score-based reverse diffusion sampling, which overcomes all the aforementioned drawbacks. Our network, trained only with coronal knee scans, excels even on out-of-distribution in vivo liver MRI data, contaminated with a complex mixture of noise. Moreover, we propose a method to enhance the resolution of the denoised image with the same network. With extensive experiments, we show that our method establishes state-of-the-art performance while having desirable properties which prior MMSE denoisers did not have: flexibly choosing the extent of denoising, and quantifying uncertainty.
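
A schematic of denoising by reverse diffusion: starting from the noisy scan (not pure noise), repeatedly step along the learned score at decreasing noise levels. `score_net` is a placeholder for a pretrained score model; the step sizes and schedule are simplified to the bare structure of annealed Langevin sampling.

```python
import torch

@torch.no_grad()
def diffusion_denoise(score_net, noisy, sigmas, step=0.1):
    """The extent of denoising is chosen by where in `sigmas` we start."""
    x = noisy.clone()
    for sigma in sigmas:                          # decreasing noise levels
        eps = step * sigma ** 2
        for _ in range(5):                        # Langevin steps per level
            x = x + eps * score_net(x, sigma) \
                  + torch.sqrt(torch.tensor(2 * eps)) * torch.randn_like(x)
    return x
```

Running the sampler several times with different random seeds yields an ensemble whose pixel-wise spread can serve as the uncertainty estimate mentioned in the abstract.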

Journal ArticleDOI
TL;DR: Zhang et al. as mentioned in this paper proposed a dual encoder-based dynamic-channel graph convolutional network with edge enhancement (DE-DCGCN-EE) for retinal vessel segmentation.
Abstract: Retinal vessel segmentation with deep learning technology is a crucial auxiliary method for clinicians to diagnose fundus diseases. However, deep learning approaches inevitably lose edge information, which contains spatial features of vessels, while performing down-sampling, leading to limited segmentation performance on fine blood vessels. Furthermore, the existing methods ignore the dynamic topological correlations among feature maps in the deep learning framework, resulting in the inefficient capture of the channel characterization. To address these limitations, we propose a novel dual encoder-based dynamic-channel graph convolutional network with edge enhancement (DE-DCGCN-EE) for retinal vessel segmentation. Specifically, we first design an edge detection-based dual encoder to preserve the edge of vessels in down-sampling. Secondly, we investigate a dynamic-channel graph convolutional network to map the image channels to the topological space and synthesize the features of each channel on the topological map, which solves the limitation of insufficient channel information utilization. Finally, we study an edge enhancement block, aiming to fuse the edge and spatial features in the dual encoder, which is beneficial to improve the accuracy of fine blood vessel segmentation. Competitive experimental results on five retinal image datasets validate the efficacy of the proposed DE-DCGCN-EE, which achieves better segmentation results than other state-of-the-art methods, indicating its potential clinical application.
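
A generic sketch of a dynamic-channel graph convolution, illustrating the idea rather than the published layer: treat each channel of a feature map as a graph node, build the adjacency from channel-to-channel similarity on the fly, then propagate messages across channels.

```python
import torch
import torch.nn as nn

class DynamicChannelGCN(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        self.weight = nn.Linear(channels, channels, bias=False)

    def forward(self, x):                         # x: (B, C, H, W)
        B, C, H, W = x.shape
        nodes = x.flatten(2)                      # (B, C, H*W): one vector per channel
        # Data-dependent adjacency from channel similarity (the "dynamic" part).
        adj = torch.softmax(nodes @ nodes.transpose(1, 2) / (H * W), dim=-1)
        out = adj @ nodes                         # message passing across channels
        out = self.weight(out.transpose(1, 2)).transpose(1, 2)
        return (out.reshape(B, C, H, W) + x).relu()

print(DynamicChannelGCN()(torch.randn(2, 64, 32, 32)).shape)
```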

Journal ArticleDOI
TL;DR: Li et al. as discussed by the authors proposed PF-Net, which combines 2D and 3D convolutions to deal with CT volumes with large inter-slice spacing, and uses multi-scale guided dense attention to segment complex Pulmonary Fibrosis (PF) lesions.
Abstract: Computed Tomography (CT) plays an important role in monitoring radiation-induced Pulmonary Fibrosis (PF), where accurate segmentation of the PF lesions is highly desired for diagnosis and treatment follow-up. However, the task is challenged by ambiguous boundary, irregular shape, various position and size of the lesions, as well as the difficulty in acquiring a large set of annotated volumetric images for training. To overcome these problems, we propose a novel convolutional neural network called PF-Net and incorporate it into a semi-supervised learning framework based on Iterative Confidence-based Refinement And Weighting of pseudo Labels (I-CRAWL). Our PF-Net combines 2D and 3D convolutions to deal with CT volumes with large inter-slice spacing, and uses multi-scale guided dense attention to segment complex PF lesions. For semi-supervised learning, our I-CRAWL employs pixel-level uncertainty-based confidence-aware refinement to improve the accuracy of pseudo labels of unannotated images, and uses image-level uncertainty for confidence-based image weighting to suppress low-quality pseudo labels in an iterative training process. Extensive experiments with CT scans of Rhesus Macaques with radiation-induced PF showed that: 1) PF-Net achieved higher segmentation accuracy than existing 2D, 3D and 2.5D neural networks, and 2) I-CRAWL outperformed state-of-the-art semi-supervised learning methods for the PF lesion segmentation task. Our method has the potential to improve the diagnosis of PF and clinical assessment of side effects of radiotherapy for lung cancers.
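
A sketch of mixing 2D (intra-slice) and 3D convolutions for volumes with large inter-slice spacing: a (1, 3, 3) kernel convolves within slices only, and a (3, 3, 3) kernel adds through-plane context. Channel sizes are illustrative.

```python
import torch
import torch.nn as nn

block = nn.Sequential(
    nn.Conv3d(1, 16, kernel_size=(1, 3, 3), padding=(0, 1, 1)),  # 2D: in-plane only
    nn.ReLU(),
    nn.Conv3d(16, 16, kernel_size=(3, 3, 3), padding=1),         # 3D: inter-slice context
    nn.ReLU(),
)
# 8 thick slices of 128x128: the anisotropic kernel avoids blurring across
# the large slice gap before the isotropic kernel integrates depth.
print(block(torch.randn(1, 1, 8, 128, 128)).shape)  # torch.Size([1, 16, 8, 128, 128])
```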

Journal ArticleDOI
TL;DR: Li et al. as discussed by the authors proposed a content-noise complementary learning (CNCL) strategy, in which two deep learning predictors are used to learn the respective content and noise of the image dataset complementarily.
Abstract: Medical imaging denoising faces great challenges, yet is in great demand. With its distinctive characteristics, medical imaging denoising in the image domain requires innovative deep learning strategies. In this study, we propose a simple yet effective strategy, the content-noise complementary learning (CNCL) strategy, in which two deep learning predictors are used to learn the respective content and noise of the image dataset complementarily. A medical image denoising pipeline based on the CNCL strategy is presented, and is implemented as a generative adversarial network, where various representative networks (including U-Net, DnCNN, and SRDenseNet) are investigated as the predictors. The performance of these implemented models has been validated on medical imaging datasets including CT, MR, and PET. The results show that this strategy outperforms state-of-the-art denoising algorithms in terms of visual quality and quantitative metrics, and the strategy demonstrates a robust generalization capability. These findings validate that this simple yet effective strategy demonstrates promising potential for medical image denoising tasks, which could exert a clinical impact in the future. Code is available at: https://github.com/gengmufeng/CNCL-denoising.
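
A minimal sketch of content-noise complementary learning: one predictor estimates the clean content, the other the noise; each is supervised against its own target and their outputs are fused. Fusion by simple averaging is an assumption for illustration; the networks are placeholders.

```python
import torch
import torch.nn.functional as F

def cncl_step(content_net, noise_net, noisy, clean):
    content_pred = content_net(noisy)             # learns the image content
    noise_pred = noise_net(noisy)                 # learns the noise component
    loss = F.l1_loss(content_pred, clean) \
         + F.l1_loss(noise_pred, noisy - clean)   # complementary targets
    denoised = 0.5 * (content_pred + (noisy - noise_pred))  # fuse both views
    return loss, denoised
```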

Journal ArticleDOI
TL;DR: Compared with existing competitive algorithms, quantitative and qualitative results show that the proposed DIOR brings a promising improvement in artifact removal, detail restoration and edge preservation.
Abstract: Limited-angle CT is a challenging problem in real applications. Incomplete projection data will lead to severe artifacts and distortions in reconstruction images. To tackle this problem, we propose a novel reconstruction framework termed Deep Iterative Optimization-based Residual-learning (DIOR) for limited-angle CT. Instead of directly deploying the regularization term on image space, the DIOR combines iterative optimization and deep learning based on the residual domain, significantly improving the convergence property and generalization ability. Specifically, the asymmetric convolutional modules are adopted to strengthen the feature extraction capacity in smooth regions for deep priors. Besides, in our DIOR method, the information contained in low-frequency and high-frequency components is also evaluated by perceptual loss to improve the performance in tissue preservation. Experiments on both simulated and clinical datasets are performed to validate the performance of DIOR. Compared with existing competitive algorithms, quantitative and qualitative results show that the proposed method brings a promising improvement in artifact removal, detail restoration and edge preservation.

Journal ArticleDOI
TL;DR: In this paper, a generative approach is proposed to learn image registration without acquired imaging data, producing powerful networks agnostic to contrast introduced by magnetic resonance imaging (MRI); while classical registration methods accurately estimate the spatial correspondence between images, they solve an optimization problem for every new image pair.
Abstract: We introduce a strategy for learning image registration without acquired imaging data, producing powerful networks agnostic to contrast introduced by magnetic resonance imaging (MRI). While classical registration methods accurately estimate the spatial correspondence between images, they solve an optimization problem for every new image pair. Learning-based techniques are fast at test time but limited to registering images with contrasts and geometric content similar to those seen during training. We propose to remove this dependency on training data by leveraging a generative strategy for diverse synthetic label maps and images that exposes networks to a wide range of variability, forcing them to learn more invariant features. This approach results in powerful networks that accurately generalize to a broad array of MRI contrasts. We present extensive experiments with a focus on 3D neuroimaging, showing that this strategy enables robust and accurate registration of arbitrary MRI contrasts even if the target contrast is not seen by the networks during training. We demonstrate registration accuracy surpassing the state of the art both within and across contrasts, using a single model. Critically, training on arbitrary shapes synthesized from noise distributions results in competitive performance, removing the dependency on acquired data of any kind. Additionally, since anatomical label maps are often available for the anatomy of interest, we show that synthesizing images from these dramatically boosts performance, while still avoiding the need for real intensity images. Our code is available at https://w3id.org/synthmorph.
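
A sketch of the generative strategy: draw a random label map from smoothed noise, then paint each label with a random intensity to synthesize a training image of arbitrary contrast. The smoothing, label count, and intensity model are illustrative simplifications (box smoothing instead of a proper Gaussian, no bias field or warps).

```python
import torch
import torch.nn.functional as F

def synth_image(shape=(128, 128), num_labels=8, blur=15):
    noise = torch.randn(num_labels, 1, *shape)
    kernel = torch.ones(1, 1, blur, blur) / blur**2          # cheap box smoothing
    smooth = F.conv2d(noise, kernel, padding=blur // 2)
    labels = smooth.squeeze(1).argmax(0)                     # random label map
    intensities = torch.rand(num_labels)                     # random contrast
    return labels, intensities[labels]                       # map + synthetic image

labels, image = synth_image()
print(labels.shape, image.shape)  # torch.Size([128, 128]) torch.Size([128, 128])
```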

Journal ArticleDOI
TL;DR: Detailed experiments on multi-institutional datasets clearly demonstrate enhanced generalization performance of FedGIMP against site-specific and federated methods based on conditional models, as well as traditional reconstruction methods.
Abstract: Multi-institutional efforts can facilitate training of deep MRI reconstruction models, albeit privacy risks arise during cross-site sharing of imaging data. Federated learning (FL) has recently been introduced to address privacy concerns by enabling distributed training without transfer of imaging data. Existing FL methods employ conditional reconstruction models to map from undersampled to fully-sampled acquisitions via explicit knowledge of the accelerated imaging operator. Since conditional models generalize poorly across different acceleration rates or sampling densities, imaging operators must be fixed between training and testing, and they are typically matched across sites. To improve patient privacy, performance and flexibility in multi-site collaborations, here we introduce Federated learning of Generative IMage Priors (FedGIMP) for MRI reconstruction. FedGIMP leverages a two-stage approach: cross-site learning of a generative MRI prior, and prior adaptation following injection of the imaging operator. The global MRI prior is learned via an unconditional adversarial model that synthesizes high-quality MR images based on latent variables. A novel mapper subnetwork produces site-specific latents to maintain specificity in the prior. During inference, the prior is first combined with subject-specific imaging operators to enable reconstruction, and it is then adapted to individual cross-sections by minimizing a data-consistency loss. Comprehensive experiments on multi-institutional datasets clearly demonstrate enhanced performance of FedGIMP against both centralized and FL methods based on conditional models.
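
Below is a minimal sketch of only the cross-site stage, rendered as plain federated averaging of an unconditional generator's weights; the site-specific mapper subnetwork and the inference-time adaptation are omitted, and `fedavg` is a hypothetical helper, not the published algorithm.

```python
import copy
import torch

def fedavg(global_model, site_models):
    """Average the parameters of per-site copies into the shared global prior."""
    state = copy.deepcopy(global_model.state_dict())
    for key in state:
        state[key] = torch.stack([m.state_dict()[key].float()
                                  for m in site_models]).mean(0)
    global_model.load_state_dict(state)   # no imaging data leaves any site
    return global_model
```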

Journal ArticleDOI
TL;DR: In this article, a graph neural network (GNN) based deep learning framework with multiple graph reasoning modules is proposed to explicitly leverage both region and boundary features in an end-to-end manner.
Abstract: Segmentation is a fundamental task in biomedical image analysis. Unlike the existing region-based dense pixel classification methods or boundary-based polygon regression methods, we build a novel graph neural network (GNN) based deep learning framework with multiple graph reasoning modules to explicitly leverage both region and boundary features in an end-to-end manner. The mechanism extracts discriminative region and boundary features, referred to as initialized region and boundary node embeddings, using a proposed Attention Enhancement Module (AEM). The weighted links between cross-domain nodes (region and boundary feature domains) in each graph are defined in a data-dependent way, which retains both global and local cross-node relationships. The iterative message aggregation and node update mechanism can enhance the interaction between each graph reasoning module's global semantic information and local spatial characteristics. Our model, in particular, is capable of concurrently addressing region and boundary feature reasoning and aggregation at several different feature levels due to the proposed multi-level feature node embeddings in different parallel graph reasoning modules. Experiments on two types of challenging datasets demonstrate that our method outperforms state-of-the-art approaches for segmentation of polyps in colonoscopy images and of the optic disc and optic cup in colour fundus images. The trained models will be made available at: https://github.com/smallmax00/Graph_Region_Boudnary.

Journal ArticleDOI
TL;DR: TranSMS as mentioned in this paper uses a vision transformer module to capture contextual relationships in low-resolution input images, a dense convolutional module for localizing high-resolution image features, and a data-consistency module to ensure measurement fidelity.
Abstract: Magnetic particle imaging (MPI) offers exceptional contrast for magnetic nanoparticles (MNP) at high spatio-temporal resolution. A common procedure in MPI starts with a calibration scan to measure the system matrix (SM), which is then used to set up an inverse problem to reconstruct images of the MNP distribution during subsequent scans. This calibration enables the reconstruction to sensitively account for various system imperfections. Yet time-consuming SM measurements have to be repeated under notable changes in system properties. Here, we introduce a novel deep learning approach for accelerated MPI calibration based on Transformers for SM super-resolution (TranSMS). Low-resolution SM measurements are performed using large MNP samples for improved signal-to-noise ratio efficiency, and the high-resolution SM is super-resolved via model-based deep learning. TranSMS leverages a vision transformer module to capture contextual relationships in low-resolution input images, a dense convolutional module for localizing high-resolution image features, and a data-consistency module to ensure measurement fidelity. Demonstrations on simulated and experimental data indicate that TranSMS significantly improves SM recovery and MPI reconstruction for up to 64-fold acceleration in two-dimensional imaging.

Journal ArticleDOI
TL;DR: In this paper, a Genetic U-Net is proposed to generate a U-shaped CNN that can achieve better retinal vessel segmentation but with fewer architecture-based parameters, thereby addressing the above issues.
Abstract: Recently, many methods based on hand-designed convolutional neural networks (CNNs) have achieved promising results in automatic retinal vessel segmentation. However, these CNNs remain constrained in capturing retinal vessels in complex fundus images. To improve their segmentation performance, these CNNs tend to have many parameters, which may lead to overfitting and high computational complexity. Moreover, the manual design of competitive CNNs is time-consuming and requires extensive empirical knowledge. Herein, a novel automated design method, called Genetic U-Net, is proposed to generate a U-shaped CNN that can achieve better retinal vessel segmentation but with fewer architecture-based parameters, thereby addressing the above issues. First, we devised a condensed but flexible search space based on a U-shaped encoder-decoder. Then, we used an improved genetic algorithm to identify better-performing architectures in the search space and investigated the possibility of finding a superior network architecture with fewer parameters. The experimental results show that the architecture obtained using the proposed method offers superior performance with less than 1% of the original U-Net's parameters, and with significantly fewer parameters than other state-of-the-art models. Furthermore, through in-depth investigation of the experimental results, several effective operations and patterns of networks to generate superior retinal vessel segmentations were identified. The codes of this work are available at https://github.com/96jhwei/Genetic-U-Net.
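
A generic sketch of the search loop, not the paper's exact algorithm: architectures are encoded as bit strings, scored by a fitness function (in practice, brief training plus validation Dice), and evolved by selection, crossover, and mutation. The encoding and fitness function are placeholders.

```python
import random

def evolve(fitness, genome_len=32, pop_size=20, generations=30, p_mut=0.05):
    pop = [[random.randint(0, 1) for _ in range(genome_len)] for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(pop, key=fitness, reverse=True)
        parents = scored[:pop_size // 2]                  # truncation selection
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, genome_len)         # one-point crossover
            child = a[:cut] + b[cut:]
            child = [g ^ (random.random() < p_mut) for g in child]  # bit-flip mutation
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)

best = evolve(fitness=sum)  # toy fitness; the real one trains the decoded U-Net
```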

Journal ArticleDOI
TL;DR: In this paper , an efficient non-Cartesian unrolled neural network-based reconstruction and an accurate approximation for backpropagation through the non-uniform fast Fourier transform (NUFFT) operator are used to accurately reconstruct and backpropagate multi-coil non-cartesian data.
Abstract: Optimizing k-space sampling trajectories is a promising yet challenging topic for fast magnetic resonance imaging (MRI). This work proposes to optimize a reconstruction method and sampling trajectories jointly concerning image reconstruction quality in a supervised learning manner. We parameterize trajectories with quadratic B-spline kernels to reduce the number of parameters and apply multi-scale optimization, which may help to avoid sub-optimal local minima. The algorithm includes an efficient non-Cartesian unrolled neural network-based reconstruction and an accurate approximation for backpropagation through the non-uniform fast Fourier transform (NUFFT) operator to accurately reconstruct and back-propagate multi-coil non-Cartesian data. Penalties on slew rate and gradient amplitude enforce hardware constraints. Sampling and reconstruction are trained jointly using large public datasets. To correct for possible eddy-current effects introduced by the curved trajectory, we use a pencil-beam trajectory mapping technique. In both simulations and in vivo experiments, the learned trajectory demonstrates significantly improved image quality compared to previous model-based and learning-based trajectory optimization methods for 10× acceleration factors. Though trained with neural network-based reconstruction, the proposed trajectory also leads to improved image quality with compressed sensing-based reconstruction.
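
A sketch of the B-spline parameterization idea under stated assumptions: a handful of trainable control points generate a long, smooth trajectory via uniform quadratic B-spline blending, and finite-difference penalties stand in for the slew-rate and gradient-amplitude constraints. This is an illustration of the parameterization only, not the paper's optimizer or NUFFT pipeline.

```python
import torch

ctrl = torch.randn(2, 16, requires_grad=True)          # (kx/ky, control points)

def bspline_traj(c, samples_per_span=64):
    t = torch.linspace(0, 1, samples_per_span).view(1, 1, -1)
    # Uniform quadratic B-spline basis over consecutive control-point triplets.
    b0, b1, b2 = (1 - t) ** 2 / 2, (-2 * t**2 + 2 * t + 1) / 2, t**2 / 2
    spans = torch.stack([c[:, :-2], c[:, 1:-1], c[:, 2:]], dim=0)  # (3, 2, P-2)
    traj = (b0 * spans[0].unsqueeze(-1) + b1 * spans[1].unsqueeze(-1)
            + b2 * spans[2].unsqueeze(-1))              # (2, P-2, S)
    return traj.reshape(2, -1)                          # smooth (kx, ky) samples

traj = bspline_traj(ctrl)
slew_penalty = traj.diff(dim=1).diff(dim=1).abs().mean()  # hardware-constraint proxy
```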

Journal ArticleDOI
TL;DR: In this article, a self-supervised few-shot semantic segmentation (FSS) framework for medical images, named SSL-ALPNet, exploits superpixel-based pseudo-labels to provide supervision signals.
Abstract: Fully-supervised deep learning segmentation models are inflexible when encountering new unseen semantic classes and their fine-tuning often requires significant amounts of annotated data. Few-shot semantic segmentation (FSS) aims to solve this inflexibility by learning to segment an arbitrary unseen semantically meaningful class by referring to only a few labeled examples, without involving fine-tuning. State-of-the-art FSS methods are typically designed for segmenting natural images and rely on abundant annotated data of training classes to learn image representations that generalize well to unseen testing classes. However, such a training mechanism is impractical in annotation-scarce medical imaging scenarios. To address this challenge, in this work, we propose a novel self-supervised FSS framework for medical images, named SSL-ALPNet, in order to bypass the requirement for annotations during training. The proposed method exploits superpixel-based pseudo-labels to provide supervision signals. In addition, we propose a simple yet effective adaptive local prototype pooling module which is plugged into the prototype networks to further boost segmentation accuracy. We demonstrate the general applicability of the proposed approach using three different tasks: organ segmentation of abdominal CT and MRI images respectively, and cardiac segmentation of MRI images. The proposed method yields higher Dice scores than conventional FSS methods which require manual annotations for training in our experiments.
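
A sketch of superpixel-based pseudo-labels for self-supervised episode construction: SLIC superpixels stand in for semantic classes, and one superpixel becomes the "foreground" of a support/query pair. The parameters and the random image are illustrative.

```python
import numpy as np
from skimage.segmentation import slic

image = np.random.rand(256, 256)                        # stand-in for a CT/MRI slice
superpixels = slic(image, n_segments=100, channel_axis=None)
chosen = np.random.choice(np.unique(superpixels))
pseudo_label = (superpixels == chosen).astype(np.uint8)  # binary pseudo-mask
# The slice with this pseudo-mask serves as the support, and an augmented
# (e.g., warped) copy as the query, requiring no manual annotations.
```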

Journal ArticleDOI
TL;DR: A GZSL method is proposed that uses self-supervised learning for selecting representative vectors of disease classes and for synthesizing features of unseen classes, along with a novel approach to generate GradCAM saliency maps that highlight diseased regions with greater accuracy.
Abstract: In many real-world medical image classification settings, access to samples of all disease classes is not feasible, affecting the robustness of a system expected to have high performance in analyzing novel test data. This is a case of generalized zero shot learning (GZSL) aiming to recognize seen and unseen classes. We propose a GZSL method that uses self-supervised learning (SSL) for: 1) selecting representative vectors of disease classes; and 2) synthesizing features of unseen classes. We also propose a novel approach to generate GradCAM saliency maps that highlight diseased regions with greater accuracy. We exploit information from the novel saliency maps to improve the clustering process by: 1) Enforcing the saliency maps of different classes to be different; and 2) Ensuring that clusters in the space of image and saliency features should yield class centroids having similar semantic information. This ensures the anchor vectors are representative of each class. Different from previous approaches, our proposed approach does not require class attribute vectors, which are an essential part of GZSL methods for natural images but are not available for medical images. Using a simple architecture, the proposed method outperforms state-of-the-art SSL-based GZSL performance for natural images as well as multiple types of medical images. We also conduct many ablation studies to investigate the influence of different loss terms in our method.

Journal ArticleDOI
TL;DR: In this paper, a conservative-radical network (CoraNet) is proposed for semi-supervised medical image segmentation, which consists of three major components: a conservative-radical module (CRM), a certain region segmentation network (C-SN), and an uncertain region segmentation network (UC-SN) that could be alternatively trained in an end-to-end manner.
Abstract: In semi-supervised medical image segmentation, most previous works draw on the common assumption that higher entropy means higher uncertainty. In this paper, we investigate a novel method of estimating uncertainty. We observe that, when a pixel is assigned different misclassification costs to a certain degree, if its segmentation result becomes inconsistent, this pixel shows relative uncertainty in its segmentation. Therefore, we present a new semi-supervised segmentation model, namely, conservative-radical network (CoraNet in short) based on our uncertainty estimation and separate self-training strategy. In particular, our CoraNet model consists of three major components: a conservative-radical module (CRM), a certain region segmentation network (C-SN), and an uncertain region segmentation network (UC-SN) that could be alternatively trained in an end-to-end manner. We have extensively evaluated our method on various segmentation tasks with publicly available benchmark datasets, including CT pancreas, MR endocardium, and MR multi-structures segmentation on the ACDC dataset. Compared with the current state of the art, our CoraNet has demonstrated superior performance. In addition, we have also analyzed its connection with and difference from conventional methods of uncertainty estimation in semi-supervised medical image segmentation.
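
An illustrative sketch of the conservative-radical idea: train two heads of the same segmenter under different misclassification costs (the class weights below are assumptions), then treat pixels where their predictions disagree as the uncertain region to be handled separately during self-training.

```python
import torch
import torch.nn.functional as F

w_conservative = torch.tensor([1.0, 0.5])  # down-weights foreground errors:
                                           # the head predicts foreground sparingly
w_radical = torch.tensor([0.5, 1.0])       # up-weights foreground errors:
                                           # the head predicts foreground liberally

def cost_sensitive_loss(logits, target, class_weights):
    """logits: (B, 2, H, W); target: (B, H, W) integer labels."""
    return F.cross_entropy(logits, target, weight=class_weights)

def certainty_mask(logits_conservative, logits_radical):
    # Pixels where both cost settings agree are "certain"; the rest are not.
    return logits_conservative.argmax(1) == logits_radical.argmax(1)
```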

Journal ArticleDOI
TL;DR: It is demonstrated that the eye movement of radiologists reading medical images can be a new form of supervision to train the DNN-based computer-aided diagnosis (CAD) system, yielding considerable improvement in diagnosis performance with the help of gaze supervision.
Abstract: When deep neural network (DNN) was first introduced to the medical image analysis community, researchers were impressed by its performance. However, it is evident now that a large number of manually labeled data is often a must to train a properly functioning DNN. This demand for supervision data and labels is a major bottleneck in current medical image analysis, since collecting a large number of annotations from experienced experts can be time-consuming and expensive. In this paper, we demonstrate that the eye movement of radiologists reading medical images can be a new form of supervision to train the DNN-based computer-aided diagnosis (CAD) system. Particularly, we record the tracks of the radiologists’ gaze when they are reading images. The gaze information is processed and then used to supervise the DNN’s attention via an Attention Consistency module. To the best of our knowledge, the above pipeline is among the earliest efforts to leverage expert eye movement for deep-learning-based CAD. We have conducted extensive experiments on knee X-ray images for osteoarthritis assessment. The results show that our method can achieve considerable improvement in diagnosis performance, with the help of gaze supervision.
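
A sketch of an attention-consistency objective under stated assumptions: align the network's spatial attention map with a heatmap rendered from the radiologist's recorded gaze points. The heatmap rendering and the MSE choice are illustrative, not the paper's exact module.

```python
import torch
import torch.nn.functional as F

def attention_consistency_loss(attn, gaze_heatmap):
    """attn, gaze_heatmap: (B, 1, H, W), non-negative spatial maps."""
    # Normalize both maps to probability distributions over pixels.
    attn = attn / attn.flatten(1).sum(1).view(-1, 1, 1, 1).clamp_min(1e-8)
    gaze = gaze_heatmap / gaze_heatmap.flatten(1).sum(1).view(-1, 1, 1, 1).clamp_min(1e-8)
    return F.mse_loss(attn, gaze)   # penalize attention far from expert gaze
```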

Journal ArticleDOI
TL;DR: The capability of properly restoring cataractous images in the absence of annotated data promises the proposed algorithm outstanding clinical practicability.
Abstract: Cataracts are the leading cause of vision loss worldwide. Restoration algorithms are developed to improve the readability of cataract fundus images in order to increase the certainty in diagnosis and treatment for cataract patients. Unfortunately, the requirement of annotation limits the application of these algorithms in clinics. This paper proposes a network that restores cataractous fundus images without annotations (ArcNet), so as to boost the clinical practicability of restoration. Annotations are unnecessary in ArcNet, where the high-frequency component is extracted from fundus images to replace segmentation in the preservation of retinal structures. The restoration model is learned from the synthesized images and adapted to real cataract images. Extensive experiments are implemented to verify the performance and effectiveness of ArcNet. Favorable performance is achieved using ArcNet against state-of-the-art algorithms, and the diagnosis of ocular fundus diseases in cataract patients is promoted by ArcNet. The capability of properly restoring cataractous images in the absence of annotated data promises the proposed algorithm outstanding clinical practicability.
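
A sketch of the annotation-free structure cue: the high-frequency component of a fundus image (the image minus its Gaussian blur) preserves vessel and lesion edges without any segmentation labels. The kernel size and sigma are illustrative.

```python
import torch
import torchvision.transforms.functional as TF

def high_frequency(image, kernel_size=21, sigma=9.0):
    """image: (B, C, H, W) fundus image in [0, 1]."""
    low = TF.gaussian_blur(image, kernel_size=kernel_size, sigma=sigma)
    return image - low   # retinal structure cue used in place of masks

hf = high_frequency(torch.rand(1, 3, 256, 256))
print(hf.shape)  # torch.Size([1, 3, 256, 256])
```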