
Showing papers in "IEEE Transactions on Medical Imaging in 2018"


Journal ArticleDOI
TL;DR: This work proposes a novel hybrid densely connected UNet (H-DenseUNet), which consists of a 2-D Dense UNet for efficiently extracting intra-slice features and a 3-D counterpart for hierarchically aggregating volumetric contexts under the spirit of the auto-context algorithm for liver and tumor segmentation.
Abstract: Liver cancer is one of the leading causes of cancer death. To assist doctors in hepatocellular carcinoma diagnosis and treatment planning, an accurate and automatic liver and tumor segmentation method is in high demand in clinical practice. Recently, fully convolutional neural networks (FCNs), including 2-D and 3-D FCNs, serve as the backbone in many volumetric image segmentation tasks. However, 2-D convolutions cannot fully leverage the spatial information along the third dimension, while 3-D convolutions suffer from high computational cost and GPU memory consumption. To address these issues, we propose a novel hybrid densely connected UNet (H-DenseUNet), which consists of a 2-D DenseUNet for efficiently extracting intra-slice features and a 3-D counterpart for hierarchically aggregating volumetric contexts under the spirit of the auto-context algorithm for liver and tumor segmentation. We formulate the learning process of the H-DenseUNet in an end-to-end manner, where the intra-slice representations and inter-slice features can be jointly optimized through a hybrid feature fusion layer. We extensively evaluated our method on the data set of the MICCAI 2017 Liver Tumor Segmentation Challenge and the 3DIRCADb data set. Our method outperformed other state-of-the-art methods on tumor segmentation and achieved very competitive performance for liver segmentation, even with a single model.
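To make the hybrid 2-D/3-D idea concrete, the following minimal PyTorch sketch restacks slice-wise 2-D features into a volume and fuses them with 3-D features through a 1×1×1 convolution. The tiny sub-networks, channel counts, and class count are illustrative assumptions, not the authors' DenseUNet branches.

```python
# Minimal sketch of hybrid 2-D/3-D feature fusion (illustrative stand-ins for
# the DenseUNet branches; channel counts and class count are assumptions).
import torch
import torch.nn as nn

class HybridFusion(nn.Module):
    def __init__(self, ch2d=32, ch3d=32, n_classes=3):
        super().__init__()
        self.net2d = nn.Sequential(nn.Conv2d(1, ch2d, 3, padding=1), nn.ReLU())
        self.net3d = nn.Sequential(nn.Conv3d(1, ch3d, 3, padding=1), nn.ReLU())
        self.fuse = nn.Conv3d(ch2d + ch3d, n_classes, 1)  # hybrid feature fusion

    def forward(self, volume):                  # volume: (B, 1, D, H, W)
        b, _, d, h, w = volume.shape
        # Intra-slice features: run every slice through the 2-D branch.
        slices = volume.permute(0, 2, 1, 3, 4).reshape(b * d, 1, h, w)
        f2d = self.net2d(slices).reshape(b, d, -1, h, w).permute(0, 2, 1, 3, 4)
        # Inter-slice (volumetric) context from the 3-D branch.
        f3d = self.net3d(volume)
        return self.fuse(torch.cat([f2d, f3d], dim=1))
```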

1,561 citations


Journal ArticleDOI
TL;DR: A framework for reconstructing dynamic sequences of 2-D cardiac magnetic resonance images from undersampled data using a deep cascade of convolutional neural networks (CNNs) to accelerate the data acquisition process is proposed and it is demonstrated that CNNs can learn spatio-temporal correlations efficiently by combining convolution and data sharing approaches.
Abstract: Inspired by recent advances in deep learning, we propose a framework for reconstructing dynamic sequences of 2-D cardiac magnetic resonance (MR) images from undersampled data using a deep cascade of convolutional neural networks (CNNs) to accelerate the data acquisition process. In particular, we address the case where data are acquired using aggressive Cartesian undersampling. First, we show that when each 2-D image frame is reconstructed independently, the proposed method outperforms state-of-the-art 2-D compressed sensing approaches, such as dictionary learning-based MR image reconstruction, in terms of reconstruction error and reconstruction speed. Second, when reconstructing the frames of the sequences jointly, we demonstrate that CNNs can learn spatio-temporal correlations efficiently by combining convolution and data sharing approaches. We show that the proposed method consistently outperforms state-of-the-art methods and is capable of preserving anatomical structure more faithfully up to 11-fold undersampling. Moreover, reconstruction is very fast: each complete dynamic sequence can be reconstructed in less than 10 s and, for the 2-D case, each image frame can be reconstructed in 23 ms, enabling real-time applications.
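The core mechanism, alternating a de-aliasing CNN with a data-consistency step that re-imposes the acquired k-space samples, can be sketched in a few lines of PyTorch. Depths, widths, and the real-valued simplification are assumptions, and the paper's data-sharing layers are omitted.

```python
# Minimal sketch of a deep cascade with data consistency (depths/widths are
# assumptions; complex data is simplified to the real part for brevity).
import torch
import torch.nn as nn

class DataConsistency(nn.Module):
    """Re-impose acquired k-space samples on the current estimate."""
    def forward(self, x, k0, mask):
        k = torch.fft.fft2(x)                   # image -> k-space
        k = torch.where(mask.bool(), k0, k)     # keep measured samples
        return torch.fft.ifft2(k).real          # back to image space

class CnnBlock(nn.Module):
    def __init__(self, depth=5, chans=64):
        super().__init__()
        layers = [nn.Conv2d(1, chans, 3, padding=1), nn.ReLU(inplace=True)]
        for _ in range(depth - 2):
            layers += [nn.Conv2d(chans, chans, 3, padding=1), nn.ReLU(inplace=True)]
        layers.append(nn.Conv2d(chans, 1, 3, padding=1))
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return x + self.net(x)                  # residual de-aliasing

class DeepCascade(nn.Module):
    def __init__(self, n_cascades=5):
        super().__init__()
        self.blocks = nn.ModuleList(CnnBlock() for _ in range(n_cascades))
        self.dc = DataConsistency()

    def forward(self, x, k0, mask):
        for block in self.blocks:               # CNN / DC alternation
            x = self.dc(block(x), k0, mask)
        return x
```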

1,062 citations


Journal ArticleDOI
TL;DR: This paper measures how far state-of-the-art deep learning methods can go at assessing CMRI, i.e., segmenting the myocardium and the two ventricles as well as classifying pathologies, opening the door to highly accurate and fully automatic analysis of cardiac MRI.
Abstract: Delineation of the left ventricular cavity, myocardium, and right ventricle from cardiac magnetic resonance images (multi-slice 2-D cine MRI) is a common clinical task to establish diagnosis. The automation of the corresponding tasks has thus been the subject of intense research over the past decades. In this paper, we introduce the “Automatic Cardiac Diagnosis Challenge” dataset (ACDC), the largest publicly available and fully annotated dataset for the purpose of cardiac MRI (CMR) assessment. The dataset contains data from 150 CMRI recordings acquired with multiple scanners, with reference measurements and classification from two medical experts. The overarching objective of this paper is to measure how far state-of-the-art deep learning methods can go at assessing CMRI, i.e., segmenting the myocardium and the two ventricles as well as classifying pathologies. In the wake of the 2017 MICCAI-ACDC challenge, we report results from deep learning methods provided by nine research groups for the segmentation task and four groups for the classification task. Results show that the best methods faithfully reproduce the expert analysis, leading to a mean correlation score of 0.97 for the automatic extraction of clinical indices and an accuracy of 0.96 for automatic diagnosis. These results clearly open the door to highly accurate and fully automatic analysis of CMRI. We also identify scenarios for which deep learning methods are still failing. Both the dataset and detailed results are publicly available online, while the platform will remain open for new submissions.

1,056 citations


Journal ArticleDOI
TL;DR: A new CT image denoising method is introduced based on a generative adversarial network (GAN) with Wasserstein distance and perceptual similarity, capable of not only reducing the image noise level but also keeping the critical information at the same time.
Abstract: The continuous development and extensive use of computed tomography (CT) in medical practice has raised a public concern over the associated radiation dose to the patient. Reducing the radiation dose may lead to increased noise and artifacts, which can adversely affect the radiologists’ judgment and confidence. Hence, advanced image reconstruction from low-dose CT data is needed to improve the diagnostic performance, which is a challenging problem due to its ill-posed nature. Over the past years, various low-dose CT methods have produced impressive results. However, most of the algorithms developed for this application, including the recently popularized deep learning techniques, aim for minimizing the mean-squared error (MSE) between a denoised CT image and the ground truth under generic penalties. Although the peak signal-to-noise ratio is improved, MSE- or weighted-MSE-based methods can compromise the visibility of important structural details after aggressive denoising. This paper introduces a new CT image denoising method based on the generative adversarial network (GAN) with Wasserstein distance and perceptual similarity. The Wasserstein distance is a key concept of the optimal transport theory and promises to improve the performance of GAN. The perceptual loss suppresses noise by comparing the perceptual features of a denoised output against those of the ground truth in an established feature space, while the GAN focuses more on migrating the data noise distribution from strong to weak statistically. Therefore, our proposed method transfers our knowledge of visual perception to the image denoising task and is capable of not only reducing the image noise level but also trying to keep the critical information at the same time. Promising results have been obtained in our experiments with clinical CT images.
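A minimal sketch of the composite generator objective follows: a Wasserstein adversarial term from a critic plus a perceptual distance computed in a frozen VGG-19 feature space. The critic, the weight lambda_p, and the chosen VGG layer are assumptions, and the gradient penalty used to train the critic is omitted.

```python
# Sketch of the WGAN + perceptual objective (loss weight, VGG layer, and the
# critic are assumptions; critic training with gradient penalty is omitted).
import torch
import torch.nn as nn
from torchvision.models import vgg19

class PerceptualLoss(nn.Module):
    def __init__(self):
        super().__init__()
        # Frozen VGG-19 features as the "established feature space"
        # (older torchvision versions use pretrained=True instead of weights=).
        self.features = vgg19(weights="IMAGENET1K_V1").features[:36].eval()
        for p in self.features.parameters():
            p.requires_grad_(False)
        self.mse = nn.MSELoss()

    def forward(self, denoised, target):
        # Single-channel CT slices are repeated to 3 channels for VGG.
        return self.mse(self.features(denoised.repeat(1, 3, 1, 1)),
                        self.features(target.repeat(1, 3, 1, 1)))

def generator_loss(critic, denoised, target, perceptual, lambda_p=0.1):
    # Maximize the critic score on denoised images (minimize its negation)
    # while staying perceptually close to the ground truth.
    return -critic(denoised).mean() + lambda_p * perceptual(denoised, target)
```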

916 citations


Journal ArticleDOI
TL;DR: This paper provides a deep learning-based strategy for reconstruction of CS-MRI, and bridges a substantial gap between conventional non-learning methods working only on data from a single image, and prior knowledge from large training data sets.
Abstract: Compressed sensing magnetic resonance imaging (CS-MRI) enables fast acquisition, which is highly desirable for numerous clinical applications. This can not only reduce the scanning cost and ease patient burden, but also potentially reduce motion artefacts and the effect of contrast washout, thus yielding better image quality. Different from parallel imaging-based fast MRI, which utilizes multiple coils to simultaneously receive MR signals, CS-MRI breaks the Nyquist–Shannon sampling barrier to reconstruct MRI images with much less required raw data. This paper provides a deep learning-based strategy for reconstruction of CS-MRI, and bridges a substantial gap between conventional non-learning methods working only on data from a single image, and prior knowledge from large training data sets. In particular, a novel conditional generative adversarial network-based model (DAGAN) is proposed to reconstruct CS-MRI. In our DAGAN architecture, we have designed a refinement learning method to stabilize our U-Net based generator, which provides an end-to-end network to reduce aliasing artefacts. To better preserve texture and edges in the reconstruction, we have coupled the adversarial loss with an innovative content loss. In addition, we incorporate frequency-domain information to enforce similarity in both the image and frequency domains. We have performed comprehensive comparison studies with both conventional CS-MRI reconstruction methods and newly investigated deep learning approaches. Compared with these methods, our DAGAN method provides superior reconstruction with preserved perceptual image details. Furthermore, each image is reconstructed in about 5 ms, which is suitable for real-time processing.

835 citations


Journal ArticleDOI
TL;DR: A deep learning architecture, M-Net, combining a multi-scale input layer, U-shape convolutional network, side-output layer, and multi-label loss function is proposed for joint OD and OC segmentation.
Abstract: Glaucoma is a chronic eye disease that leads to irreversible vision loss. The cup to disc ratio (CDR) plays an important role in the screening and diagnosis of glaucoma. Thus, the accurate and automatic segmentation of optic disc (OD) and optic cup (OC) from fundus images is a fundamental task. Most existing methods segment them separately, and rely on hand-crafted visual features from fundus images. In this paper, we propose a deep learning architecture, named M-Net, which solves the OD and OC segmentation jointly in a one-stage multi-label system. The proposed M-Net mainly consists of a multi-scale input layer, U-shape convolutional network, side-output layer, and multi-label loss function. The multi-scale input layer constructs an image pyramid to achieve multiple level receptive field sizes. The U-shape convolutional network is employed as the main body network structure to learn the rich hierarchical representation, while the side-output layer acts as an early classifier that produces a companion local prediction map for different scale layers. Finally, a multi-label loss function is proposed to generate the final segmentation map. To further improve the segmentation performance, we also introduce the polar transformation, which provides the representation of the original image in the polar coordinate system. The experiments show that our M-Net system achieves state-of-the-art OD and OC segmentation results on the ORIGA data set. Simultaneously, the proposed method also obtains satisfactory glaucoma screening performance with the calculated CDR values on both the ORIGA and SCES data sets.
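A minimal sketch of the polar transformation pre-processing step, assuming the disc center and radius are already estimated; the output size and interpolation order are illustrative choices.

```python
# Sketch of the polar transformation: resample the disc-centred image onto
# (radius, angle) coordinates (output size and interpolation are assumptions).
import numpy as np
from scipy.ndimage import map_coordinates

def polar_transform(image, center, radius, out_h=400, out_w=400):
    """Map a (H, W) channel to polar coordinates around `center` = (y, x)."""
    r = np.linspace(0, radius, out_h)            # rows index radius
    theta = np.linspace(0, 2 * np.pi, out_w)     # columns index angle
    rr, tt = np.meshgrid(r, theta, indexing="ij")
    ys = center[0] + rr * np.sin(tt)
    xs = center[1] + rr * np.cos(tt)
    return map_coordinates(image, [ys, xs], order=1, mode="nearest")
```

Segmentation is then performed in this polar representation and mapped back with the inverse transform; flattening around the disc center balances the cup, rim, and background proportions.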

653 citations


Journal ArticleDOI
TL;DR: A novel deep learning-based interactive segmentation framework is proposed that incorporates CNNs into a bounding-box and scribble-based segmentation pipeline, together with a weighted loss function that considers network- and interaction-based uncertainty during fine-tuning.
Abstract: Convolutional neural networks (CNNs) have achieved state-of-the-art performance for automatic medical image segmentation. However, they have not demonstrated sufficiently accurate and robust results for clinical use. In addition, they are limited by the lack of image-specific adaptation and the lack of generalizability to previously unseen object classes (a.k.a. zero-shot learning). To address these problems, we propose a novel deep learning-based interactive segmentation framework by incorporating CNNs into a bounding box and scribble-based segmentation pipeline. We propose image-specific fine tuning to make a CNN model adaptive to a specific test image, which can be either unsupervised (without additional user interactions) or supervised (with additional scribbles). We also propose a weighted loss function considering network and interaction-based uncertainty for the fine tuning. We applied this framework to two applications: 2-D segmentation of multiple organs from fetal magnetic resonance (MR) slices, where only two types of these organs were annotated for training; and 3-D segmentation of the brain tumor core (excluding edema) and whole brain tumor (including edema) from different MR sequences, where only the tumor core in one MR sequence was annotated for training. Experimental results show that: 1) our model is more robust in segmenting previously unseen objects than state-of-the-art CNNs; 2) image-specific fine tuning with the proposed weighted loss function significantly improves segmentation accuracy; and 3) our method leads to accurate results with fewer user interactions and less user time than traditional interactive segmentation methods.
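One plausible instantiation of the uncertainty-weighted fine-tuning loss is sketched below: scribbled pixels are fully trusted, while unscribbled pixels are weighted by the network's own confidence. The exact weighting in the paper may differ.

```python
# Sketch of an uncertainty-weighted fine-tuning loss (one plausible
# instantiation; the paper's exact formulation may differ).
import torch
import torch.nn.functional as F

def weighted_finetune_loss(logits, pseudo_labels, scribble_mask):
    """logits: (B, C, H, W); pseudo_labels: (B, H, W) long; scribble_mask: (B, H, W)."""
    prob = torch.softmax(logits, dim=1)
    confidence, _ = prob.max(dim=1)                    # per-pixel certainty
    weights = torch.where(scribble_mask.bool(),
                          torch.ones_like(confidence),  # trust user scribbles
                          confidence.detach())          # down-weight uncertain pixels
    ce = F.cross_entropy(logits, pseudo_labels, reduction="none")
    return (weights * ce).sum() / weights.sum().clamp_min(1e-8)
```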

582 citations


Journal ArticleDOI
TL;DR: The Learned Primal-Dual algorithm for tomographic reconstruction accounts for a (possibly non-linear) forward operator in a deep neural network by unrolling a proximal primal-dual optimization method, with the proximal operators replaced by convolutional neural networks.
Abstract: We propose the Learned Primal-Dual algorithm for tomographic reconstruction. The algorithm accounts for a (possibly non-linear) forward operator in a deep neural network by unrolling a proximal primal-dual optimization method, with the proximal operators replaced by convolutional neural networks. The algorithm is trained end-to-end, working directly from raw measured data, and it does not depend on any initial reconstruction such as filtered back-projection (FBP). We compare the performance of the proposed method on low-dose computed tomography reconstruction against FBP, total variation (TV), and deep learning based post-processing of FBP. For the Shepp-Logan phantom we obtain a >6 dB peak signal-to-noise ratio improvement against all compared methods. For human phantoms the corresponding improvement is 6.6 dB over TV and 2.2 dB over learned post-processing, along with a substantial improvement in the structural similarity index. Finally, our algorithm involves only ten forward-back-projection computations, making the method feasible for time-critical clinical applications.
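A condensed PyTorch sketch of the unrolling follows, with single-channel primal and dual iterates instead of the paper's multi-channel memory; forward_op and adjoint_op stand in for the projector and its adjoint, and the block widths are assumptions.

```python
# Sketch of Learned Primal-Dual unrolling (simplified single-channel iterates;
# forward_op/adjoint_op are placeholders for the projector and its adjoint).
import torch
import torch.nn as nn

def small_cnn(in_ch, out_ch):
    return nn.Sequential(
        nn.Conv2d(in_ch, 32, 3, padding=1), nn.PReLU(),
        nn.Conv2d(32, 32, 3, padding=1), nn.PReLU(),
        nn.Conv2d(32, out_ch, 3, padding=1),
    )

class LearnedPrimalDual(nn.Module):
    def __init__(self, forward_op, adjoint_op, n_iter=10):
        super().__init__()
        self.A, self.At = forward_op, adjoint_op
        self.dual_nets = nn.ModuleList(small_cnn(3, 1) for _ in range(n_iter))
        self.primal_nets = nn.ModuleList(small_cnn(2, 1) for _ in range(n_iter))

    def forward(self, g):                        # g: measured sinogram
        f = torch.zeros_like(self.At(g))         # primal iterate (image space)
        h = torch.zeros_like(g)                  # dual iterate (data space)
        for dual, primal in zip(self.dual_nets, self.primal_nets):
            h = h + dual(torch.cat([h, self.A(f), g], dim=1))   # learned dual prox
            f = f + primal(torch.cat([f, self.At(h)], dim=1))   # learned primal prox
        return f
```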

556 citations


Journal ArticleDOI
TL;DR: A generic training strategy is proposed that incorporates anatomical prior knowledge into CNNs through a new regularization model, trained end-to-end, encouraging models to follow the global anatomical properties of the underlying anatomy via learnt non-linear representations of shape.
Abstract: Incorporation of prior knowledge about organ shape and location is key to improving the performance of image analysis approaches. In particular, priors can be useful in cases where images are corrupted and contain artefacts due to limitations in image acquisition. The highly constrained nature of anatomical objects can be well captured with learning-based techniques. However, in most recent and promising techniques, such as CNN-based segmentation, it is not obvious how to incorporate such prior knowledge. State-of-the-art methods operate as pixel-wise classifiers where the training objectives do not incorporate the structure and inter-dependencies of the output. To overcome this limitation, we propose a generic training strategy that incorporates anatomical prior knowledge into CNNs through a new regularisation model, which is trained end-to-end. The new framework encourages models to follow the global anatomical properties of the underlying anatomy (e.g., shape, label structure) via learnt non-linear representations of the shape. We show that the proposed approach can be easily adapted to different analysis tasks (e.g., image enhancement, segmentation) and improves the prediction accuracy of state-of-the-art models. The applicability of our approach is shown on multi-modal cardiac data sets and public benchmarks. In addition, we demonstrate how the learnt deep models of 3-D shapes can be interpreted and used as biomarkers for classification of cardiac pathologies.
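A minimal sketch of such an anatomically constrained objective, assuming a shape_encoder pre-trained as an autoencoder on label maps and an illustrative weight lam:

```python
# Sketch of an anatomically constrained loss: per-pixel segmentation term plus
# a global shape term in a pre-trained latent shape space (`shape_encoder`
# and `lam` are assumptions).
import torch
import torch.nn.functional as F

def acnn_loss(logits, target_onehot, shape_encoder, lam=0.01):
    seg = F.cross_entropy(logits, target_onehot.argmax(dim=1))
    pred = torch.softmax(logits, dim=1)
    with torch.no_grad():
        z_gt = shape_encoder(target_onehot)   # latent code of the ground truth
    z_pred = shape_encoder(pred)              # gradients flow through predictions
    # Penalise predictions whose global shape deviates from plausible anatomy.
    return seg + lam * F.mse_loss(z_pred, z_gt)
```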

529 citations


Journal ArticleDOI
TL;DR: It is concluded that the deep-learning-based segmentation represents a registration-free method for multi-organ abdominal CT segmentation whose accuracy can surpass current methods, potentially supporting image-guided navigation in gastrointestinal endoscopy procedures.
Abstract: Automatic segmentation of abdominal anatomy on computed tomography (CT) images can support diagnosis, treatment planning, and treatment delivery workflows. Segmentation methods using statistical models and multi-atlas label fusion (MALF) require inter-subject image registrations, which are challenging for abdominal images, but alternative methods without registration have not yet achieved higher accuracy for most abdominal organs. We present a registration-free deep-learning-based segmentation algorithm for eight organs that are relevant for navigation in endoscopic pancreatic and biliary procedures, including the pancreas, the gastrointestinal tract (esophagus, stomach, and duodenum) and surrounding organs (liver, spleen, left kidney, and gallbladder). We directly compared the segmentation accuracy of the proposed method to the existing deep learning and MALF methods in a cross-validation on a multi-centre data set with 90 subjects. The proposed method yielded significantly higher Dice scores for all organs and lower mean absolute distances for most organs, including Dice scores of 0.78 versus 0.71, 0.74, and 0.74 for the pancreas, 0.90 versus 0.85, 0.87, and 0.83 for the stomach, and 0.76 versus 0.68, 0.69, and 0.66 for the esophagus. We conclude that the deep-learning-based segmentation represents a registration-free method for multi-organ abdominal CT segmentation whose accuracy can surpass current methods, potentially supporting image-guided navigation in gastrointestinal endoscopy procedures.

506 citations


Journal ArticleDOI
Yoseob Han, Jong Chul Ye
TL;DR: Deep learning approaches using large receptive field neural networks such as U-Net have recently demonstrated impressive performance for sparse-view CT reconstruction; however, theoretical justification is still lacking, and the main goal of this paper is to reveal the limitations of U-Net and propose new multi-resolution deep learning schemes.
Abstract: X-ray computed tomography (CT) using sparse projection views is a recent approach to reduce the radiation dose. However, due to the insufficient projection views, an analytic reconstruction approach using the filtered back projection (FBP) produces severe streaking artifacts. Recently, deep learning approaches using large receptive field neural networks such as U-Net have demonstrated impressive performance for sparse-view CT reconstruction. However, theoretical justification is still lacking. Inspired by the recent theory of deep convolutional framelets, the main goal of this paper is, therefore, to reveal the limitation of U-Net and propose new multi-resolution deep learning schemes. In particular, we show that the alternative U-Net variants such as dual frame and tight frame U-Nets satisfy the so-called frame condition which makes them better for effective recovery of high frequency edges in sparse-view CT. Using extensive experiments with a real patient data set, we demonstrate that the new network architectures provide better reconstruction performance.

Journal ArticleDOI
TL;DR: RefineGAN is a variant of a fully-residual convolutional autoencoder and generative adversarial networks (GANs) specifically designed for the CS-MRI formulation; it employs deeper generator and discriminator networks with a cyclic data consistency loss for faithful interpolation of the given under-sampled data.
Abstract: Compressed sensing magnetic resonance imaging (CS-MRI) has provided theoretical foundations upon which the time-consuming MRI acquisition process can be accelerated. However, it primarily relies on iterative numerical solvers, which still hinders its adoption in time-critical applications. In addition, recent advances in deep neural networks have shown their potential in computer vision and image processing, but their application to MRI reconstruction is still in an early stage. In this paper, we propose a novel deep learning-based generative adversarial model, RefineGAN, for fast and accurate CS-MRI reconstruction. The proposed model is a variant of fully-residual convolutional autoencoder and generative adversarial networks (GANs), specifically designed for the CS-MRI formulation; it employs deeper generator and discriminator networks with a cyclic data consistency loss for faithful interpolation in the given under-sampled k-space data. In addition, our solution leverages a chained network to further enhance the reconstruction quality. RefineGAN is fast and accurate: the reconstruction process is extremely rapid, as low as tens of milliseconds for reconstruction of a 256×256 image, because it is a one-way deployment on a feed-forward network, and the image quality is superior even for an extremely low sampling rate (as low as 10%) due to the data-driven nature of the method. We demonstrate that RefineGAN outperforms the state-of-the-art CS-MRI methods by a large margin in terms of both running time and image quality via evaluation using several open-source MRI databases.

Journal ArticleDOI
TL;DR: This special issue focuses on data-driven tomographic reconstruction and covers the whole workflow of medical imaging: from tomographic raw data/features to reconstructed images and then extracted diagnostic features/readings.
Abstract: Over the past several years, machine learning, or more generally artificial intelligence, has generated overwhelming research interest and attracted unprecedented public attention. As tomographic imaging researchers, we share the excitement from our imaging perspective [item 1) in the Appendix], and organized this special issue dedicated to the theme of “Machine learning for image reconstruction.” This special issue is a sister issue of the special issue published in May 2016 of this journal with the theme “Deep learning in medical imaging” [item 2) in the Appendix]. While the previous special issue targeted medical image processing/analysis, this special issue focuses on data-driven tomographic reconstruction. These two special issues are highly complementary, since image reconstruction and image analysis are two of the main pillars for medical imaging. Together we cover the whole workflow of medical imaging: from tomographic raw data/features to reconstructed images and then extracted diagnostic features/readings.

Journal ArticleDOI
TL;DR: A conveying path-based convolutional encoder-decoder (CPCE) network in 2-D and 3-D configurations within the GAN framework for LDCT denoising, which has a better performance in that it suppresses image noise and preserves subtle structures.
Abstract: Low-dose computed tomography (LDCT) has attracted major attention in the medical imaging field, since CT-associated X-ray radiation carries health risks for patients. The reduction of the CT radiation dose, however, compromises the signal-to-noise ratio, which affects image quality and diagnostic performance. Recently, deep-learning-based algorithms have achieved promising results in LDCT denoising, especially convolutional neural network (CNN) and generative adversarial network (GAN) architectures. This paper introduces a conveying path-based convolutional encoder-decoder (CPCE) network in 2-D and 3-D configurations within the GAN framework for LDCT denoising. A novel feature of this approach is that an initial 3-D CPCE denoising model can be directly obtained by extending a trained 2-D CNN, which is then fine-tuned to incorporate 3-D spatial information from adjacent slices. Based on the transfer learning from 2-D to 3-D, the 3-D network converges faster and achieves a better denoising performance compared with training from scratch. By comparing the CPCE network with recently published work based on the simulated Mayo data set and the real MGH data set, we demonstrate that the 3-D CPCE denoising model has a better performance in that it suppresses image noise and preserves subtle structures.
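One plausible reading of "extending a trained 2-D CNN" is sketched below: each 2-D kernel is copied into the central slice of a zero-initialized 3-D kernel, so the 3-D model initially reproduces the 2-D model before fine-tuning on adjacent slices. The authors' exact extension may differ.

```python
# Sketch of one plausible 2-D -> 3-D initialisation (the authors' exact
# extension may differ): centre slice of the 3-D kernel = trained 2-D kernel.
import torch
import torch.nn as nn

def inflate_conv2d(conv2d: nn.Conv2d, depth: int = 3) -> nn.Conv3d:
    conv3d = nn.Conv3d(conv2d.in_channels, conv2d.out_channels,
                       kernel_size=(depth, *conv2d.kernel_size),
                       padding=(depth // 2, *conv2d.padding),
                       bias=conv2d.bias is not None)
    with torch.no_grad():
        conv3d.weight.zero_()
        conv3d.weight[:, :, depth // 2] = conv2d.weight  # centre slice = 2-D kernel
        if conv2d.bias is not None:
            conv3d.bias.copy_(conv2d.bias)
    return conv3d
```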

Journal ArticleDOI
TL;DR: A relaxed version of PGD is proposed wherein gradient descent enforces measurement consistency while a CNN recursively projects the solution closer to the space of desired reconstruction images; it shows an improvement over total variation-based regularization, dictionary learning, and a state-of-the-art deep learning-based direct reconstruction technique.
Abstract: We present a new image reconstruction method that replaces the projector in a projected gradient descent (PGD) with a convolutional neural network (CNN). Recently, CNNs trained as image-to-image regressors have been successfully used to solve inverse problems in imaging. However, unlike existing iterative image reconstruction algorithms, these CNN-based approaches usually lack a feedback mechanism to enforce that the reconstructed image is consistent with the measurements. We propose a relaxed version of PGD wherein gradient descent enforces measurement consistency, while a CNN recursively projects the solution closer to the space of desired reconstruction images. We show that this algorithm is guaranteed to converge and, under certain conditions, converges to a local minimum of a non-convex inverse problem. Finally, we propose a simple scheme to train the CNN to act like a projector. Our experiments on sparse-view computed-tomography reconstruction show an improvement over total variation-based regularization, dictionary learning, and a state-of-the-art deep learning-based direct reconstruction technique.
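A compact sketch of the relaxed iteration, with the paper's adaptive relaxation parameter (which underpins the convergence guarantee) simplified to a constant alpha:

```python
# Sketch of relaxed projected gradient descent with a CNN projector (constant
# relaxation alpha stands in for the paper's adaptive scheme).
import torch

def relaxed_pgd(y, A, At, projector_cnn, n_iter=20, gamma=1e-3, alpha=0.5):
    """y: measurements; A/At: forward operator and adjoint; alpha in (0, 1]."""
    x = At(y)                                   # crude initial reconstruction
    for _ in range(n_iter):
        grad = At(A(x) - y)                     # gradient of 0.5 * ||Ax - y||^2
        z = projector_cnn(x - gamma * grad)     # CNN acts as a learned projector
        x = alpha * z + (1 - alpha) * x         # relaxation stabilises iterates
    return x
```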

Journal ArticleDOI
TL;DR: A learned experts' assessment-based reconstruction network (LEARN) is proposed for sparse-data computed tomography (CT) reconstruction, which utilizes application-oriented knowledge more effectively and recovers underlying images more favorably than competing algorithms.
Abstract: Compressive sensing (CS) has proved effective for tomographic reconstruction from sparsely collected data or under-sampled measurements, which are practically important for few-view computed tomography (CT), tomosynthesis, interior tomography, and so on. To perform sparse-data CT, iterative reconstruction commonly uses regularizers in the CS framework. Currently, how to choose the parameters adaptively for regularization is a major open problem. In this paper, inspired by the ideas of machine learning, especially deep learning, we unfold the state-of-the-art “fields of experts”-based iterative reconstruction scheme up to a number of iterations for data-driven training, construct a learned experts’ assessment-based reconstruction network (LEARN) for sparse-data CT, and demonstrate the feasibility and merits of our LEARN network. The experimental results with our proposed LEARN network produce superior performance on the well-known Mayo Clinic low-dose challenge data set relative to several state-of-the-art methods, in terms of artifact reduction, feature preservation, and computational speed. This is consistent with our insight that, because all the regularization terms and parameters used in the iterative reconstruction are now learned from the training data, our LEARN network utilizes application-oriented knowledge more effectively and recovers underlying images more favorably than competing algorithms. Also, the number of layers in the LEARN network is only 50, reducing the computational complexity of typical iterative algorithms by orders of magnitude.
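A minimal sketch of the unrolled structure follows: each of the 50 layers applies a data-fidelity gradient step with a learned step size plus a learned regularization increment. Filter sizes and widths are assumptions, and the paper's fields-of-experts parameterization is simplified to a plain CNN per layer.

```python
# Sketch of LEARN-style unrolling (fields-of-experts term simplified to a
# per-layer CNN; filter sizes, widths, and initial step size are assumptions).
import torch
import torch.nn as nn

class LearnIteration(nn.Module):
    def __init__(self):
        super().__init__()
        self.reg = nn.Sequential(                 # learned regularization term
            nn.Conv2d(1, 48, 5, padding=2), nn.ReLU(inplace=True),
            nn.Conv2d(48, 48, 5, padding=2), nn.ReLU(inplace=True),
            nn.Conv2d(48, 1, 5, padding=2),
        )
        self.step = nn.Parameter(torch.tensor(1e-3))  # learned step size

    def forward(self, x, y, A, At):
        # Data-fidelity gradient step plus learned regularization increment.
        return x - self.step * At(A(x) - y) + self.reg(x)

class LEARN(nn.Module):
    def __init__(self, A, At, n_iter=50):
        super().__init__()
        self.A, self.At = A, At
        self.iters = nn.ModuleList(LearnIteration() for _ in range(n_iter))

    def forward(self, x0, y):                     # x0: initial image, y: sinogram
        x = x0
        for it in self.iters:
            x = it(x, y, self.A, self.At)
        return x
```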

Journal ArticleDOI
TL;DR: This paper proposes to implement an adversarial autoencoder for the task of retinal vessel network synthesis, and uses the generated vessel trees as an intermediate stage for the generation of color retinal images, which is accomplished with a generative adversarial network.
Abstract: In medical image analysis applications, the availability of the large amounts of annotated data is becoming increasingly critical. However, annotated medical data is often scarce and costly to obtain. In this paper, we address the problem of synthesizing retinal color images by applying recent techniques based on adversarial learning. In this setting, a generative model is trained to maximize a loss function provided by a second model attempting to classify its output into real or synthetic. In particular, we propose to implement an adversarial autoencoder for the task of retinal vessel network synthesis. We use the generated vessel trees as an intermediate stage for the generation of color retinal images, which is accomplished with a generative adversarial network. Both models require the optimization of almost everywhere differentiable loss functions, which allows us to train them jointly. The resulting model offers an end-to-end retinal image synthesis system capable of generating as many retinal images as the user requires, with their corresponding vessel networks, by sampling from a simple probability distribution that we impose on the associated latent space. We show that the learned latent space contains a well-defined semantic structure, implying that we can perform calculations in the space of retinal images, e.g., smoothly interpolating new data points between two retinal images. Visual and quantitative results demonstrate that the synthesized images are substantially different from those in the training set, while being also anatomically consistent and displaying a reasonable visual quality.

Journal ArticleDOI
TL;DR: A deep neural network is presented that is specifically designed to provide high-resolution 3-D images from restricted photoacoustic measurements; it represents an iterative scheme and incorporates gradient information of the data fit to compensate for limited-view artifacts.
Abstract: Recent advances in deep learning for tomographic reconstructions have shown great potential to create accurate and high quality images with a considerable speed up. In this paper, we present a deep neural network that is specifically designed to provide high resolution 3-D images from restricted photoacoustic measurements. The network is designed to represent an iterative scheme and incorporates gradient information of the data fit to compensate for limited view artifacts. Due to the high complexity of the photoacoustic forward operator, we separate training and computation of the gradient information. A suitable prior for the desired image structures is learned as part of the training. The resulting network is trained and tested on a set of segmented vessels from lung computed tomography scans and then applied to in-vivo photoacoustic measurement data.

Journal ArticleDOI
TL;DR: The proposed DD-Net method for sparse-view CT reconstruction achieved competitive performance relative to state-of-the-art methods in terms of streaking artifact removal and structure preservation, increasing the structural similarity by up to 18% and reducing the root mean square error by up to 42%.
Abstract: Sparse-view computed tomography (CT) holds great promise for speeding up data acquisition and reducing radiation dose in CT scans. Recent advances in reconstruction algorithms for sparse-view CT, such as iterative reconstruction algorithms, obtain high-quality images while requiring advanced computing power. Lately, deep learning (DL) has been widely used in various applications and has obtained many remarkable outcomes. In this paper, we propose a new method for sparse-view CT reconstruction based on the DL approach. The method can be divided into two steps. First, filtered back-projection (FBP) was used to reconstruct the CT image from the sparsely sampled sinogram. Then, the FBP results were fed to a DL neural network, which is a DenseNet and deconvolution-based network (DD-Net). The DD-Net combines the advantages of DenseNet and deconvolution and applies shortcut connections to concatenate DenseNet and deconvolution to accelerate the training speed of the network; all of these operations can greatly increase the depth of the network while enhancing its expression ability. After training, the proposed DD-Net achieved a competitive performance relative to the state-of-the-art methods in terms of streaking artifact removal and structure preservation. Compared with other state-of-the-art reconstruction methods, the DD-Net method can increase the structural similarity by up to 18% and reduce the root mean square error by up to 42%. These results indicate that DD-Net has great potential for sparse-view CT image reconstruction.

Journal ArticleDOI
TL;DR: A convolutional neural network (CNN)-based open MAR framework is developed that fuses information from the original and corrected images to suppress artifacts, demonstrating superior artifact suppression and preservation of anatomical structures in the vicinity of metal implants compared with its competitors.
Abstract: In the presence of metal implants, metal artifacts are introduced to x-ray computed tomography (CT) images. Although a large number of metal artifact reduction (MAR) methods have been proposed in the past decades, MAR is still one of the major problems in clinical x-ray CT. In this paper, we develop a convolutional neural network (CNN)-based open MAR framework, which fuses the information from the original and corrected images to suppress artifacts. The proposed approach consists of two phases. In the CNN training phase, we build a database consisting of metal-free, metal-inserted and pre-corrected CT images, and image patches are extracted and used for CNN training. In the MAR phase, the uncorrected and pre-corrected images are used as the input of the trained CNN to generate a CNN image with reduced artifacts. To further reduce the remaining artifacts, water-equivalent tissues in the CNN image are set to a uniform value to yield a CNN prior, whose forward projections are used to replace the metal-affected projections, followed by FBP reconstruction. The effectiveness of the proposed method is validated on both simulated and real data. Experimental results demonstrate the superior MAR capability of the proposed method over its competitors in terms of artifact suppression and preservation of anatomical structures in the vicinity of metal implants.
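The projection-replacement stage that follows the CNN can be sketched with skimage's parallel-beam radon/iradon as a stand-in for the clinical fan-beam geometry; metal_trace marks the metal-affected sinogram bins.

```python
# Sketch of the projection-replacement stage after the CNN prior (skimage's
# parallel-beam transforms stand in for the clinical geometry).
import numpy as np
from skimage.transform import radon, iradon

def mar_correct(raw_sino, cnn_prior, metal_trace, theta):
    """raw_sino: measured sinogram; cnn_prior: CNN prior image; theta: angles (deg)."""
    prior_sino = radon(cnn_prior, theta=theta)               # project the CNN prior
    corrected = np.where(metal_trace, prior_sino, raw_sino)  # splice corrupted bins
    return iradon(corrected, theta=theta)                    # FBP reconstruction
```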

Journal ArticleDOI
TL;DR: A novel framelet-based denoising algorithm using a wavelet residual network is proposed, which synergistically combines the expressive power of deep learning and the performance guarantee of framelet-based denoising algorithms.
Abstract: Model-based iterative reconstruction algorithms for low-dose X-ray computed tomography (CT) are computationally expensive. To address this problem, we recently proposed a deep convolutional neural network (CNN) for low-dose X-ray CT and won second place in the 2016 AAPM Low-Dose CT Grand Challenge. However, some of the textures were not fully recovered. To address this problem, here we propose a novel framelet-based denoising algorithm using a wavelet residual network which synergistically combines the expressive power of deep learning and the performance guarantee of the framelet-based denoising algorithms. The new algorithms were inspired by the recent interpretation of the deep CNN as a cascaded convolution framelet signal representation. Extensive experimental results confirm that the proposed networks have significantly improved performance and preserve the detailed texture of the original images.

Journal ArticleDOI
TL;DR: A disc-aware ensemble network for automatic glaucoma screening is proposed, which integrates the deep hierarchical context of the global fundus image and the local optic disc region.
Abstract: Glaucoma is a chronic eye disease that leads to irreversible vision loss. Most of the existing automatic screening methods first segment the main structure and subsequently calculate the clinical measurement for the detection and screening of glaucoma. However, these measurement-based methods rely heavily on the segmentation accuracy and ignore various visual features. In this paper, we introduce a deep learning technique to gain additional image-relevant information and screen glaucoma from the fundus image directly. Specifically, a novel disc-aware ensemble network for automatic glaucoma screening is proposed, which integrates the deep hierarchical context of the global fundus image and the local optic disc region. Four deep streams on different levels and modules are, respectively, considered as global image stream, segmentation-guided network, local disc region stream, and disc polar transformation stream. Finally, the output probabilities of different streams are fused as the final screening result. The experiments on two glaucoma data sets (SCES and new SINDI data sets) show that our method outperforms other state-of-the-art algorithms.

Journal ArticleDOI
TL;DR: A multi-input multi-output fully convolutional neural network model for MRI synthesis that avoids the need for curriculum learning by exploiting the fact that the various input modalities are highly correlated.
Abstract: We propose a multi-input multi-output fully convolutional neural network model for MRI synthesis. The model is robust to missing data, as it benefits from, but does not require, additional input modalities. The model is trained end-to-end, and learns to embed all input modalities into a shared modality-invariant latent space. These latent representations are then combined into a single fused representation, which is transformed into the target output modality with a learnt decoder. We avoid the need for curriculum learning by exploiting the fact that the various input modalities are highly correlated. We also show that by incorporating information from segmentation masks the model can both decrease its error and generate data with synthetic lesions. We evaluate our model on the ISLES and BRATS data sets and demonstrate statistically significant improvements over state-of-the-art methods for single input tasks. This improvement increases further when multiple input modalities are used, demonstrating the benefits of learning a common latent space, again resulting in a statistically significant improvement over the current best method. Finally, we demonstrate our approach on non-skull-stripped brain images, producing a statistically significant improvement over the previous best method. Code is made publicly available at https://github.com/agis85/multimodal_brain_synthesis .
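A minimal sketch of the shared-latent design: one encoder per input modality, a fusion operator that tolerates missing inputs (element-wise max is one simple choice consistent with a modality-invariant space), and a decoder to the target modality. The tiny encoders and decoder are illustrative stand-ins, not the released architecture.

```python
# Sketch of shared-latent multimodal synthesis (tiny illustrative networks;
# element-wise max fusion is one simple choice for missing-input robustness).
import torch
import torch.nn as nn

class MultimodalSynth(nn.Module):
    def __init__(self, n_modalities=3, latent_ch=16):
        super().__init__()
        def make_enc():
            return nn.Sequential(nn.Conv2d(1, latent_ch, 3, padding=1), nn.ReLU())
        self.encoders = nn.ModuleList(make_enc() for _ in range(n_modalities))
        self.decoder = nn.Conv2d(latent_ch, 1, 3, padding=1)

    def forward(self, inputs):
        # `inputs`: dict {modality_index: (B, 1, H, W)}; missing modalities drop out.
        latents = [self.encoders[i](x) for i, x in inputs.items()]
        fused = torch.stack(latents).max(dim=0).values  # modality-invariant fusion
        return self.decoder(fused)
```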

Journal ArticleDOI
TL;DR: Based on the phase-transition-sensitive predictions from the SV-RCNet, a simple yet effective inference scheme, prior knowledge inference (PKI), is proposed that leverages the natural characteristics of surgical video, improving the consistency of results and largely boosting recognition performance.
Abstract: We propose an analysis of surgical videos that is based on a novel recurrent convolutional network (SV-RCNet), specifically for automatic workflow recognition from surgical videos online, which is a key component for developing context-aware computer-assisted intervention systems. Different from previous methods which harness visual and temporal information separately, the proposed SV-RCNet seamlessly integrates a convolutional neural network (CNN) and a recurrent neural network (RNN) to form a novel recurrent convolutional architecture in order to take full advantage of the complementary information of visual and temporal features learned from surgical videos. We effectively train the SV-RCNet in an end-to-end manner so that the visual representations and sequential dynamics can be jointly optimized in the learning process. In order to produce more discriminative spatio-temporal features, we exploit a deep residual network (ResNet) and a long short-term memory (LSTM) network, to extract visual features and temporal dependencies, respectively, and integrate them into the SV-RCNet. Moreover, based on the phase-transition-sensitive predictions from the SV-RCNet, we propose a simple yet effective inference scheme, namely the prior knowledge inference (PKI), by leveraging the natural characteristics of surgical video. Such a strategy further improves the consistency of results and largely boosts the recognition performance. Extensive experiments have been conducted with the MICCAI 2016 Modeling and Monitoring of Computer Assisted Interventions Workflow Challenge dataset and the Cholec80 dataset to validate SV-RCNet. Our approach not only achieves superior performance on these two datasets but also outperforms the state-of-the-art methods by a significant margin.

Journal ArticleDOI
TL;DR: A novel CAD system based on a multi-scale convolutional mixture of expert (MCME) ensemble model to identify normal retina, and two common types of macular pathologies, namely, dry age-related macular degeneration, and diabetic macular edema is presented.
Abstract: Computer-aided diagnosis (CAD) of retinal pathologies is a current active area in medical image analysis. Due to the increasing use of the retinal optical coherence tomography (OCT) imaging technique, a CAD system in retinal OCT is essential to assist ophthalmologists in the early detection of ocular diseases and treatment monitoring. This paper presents a novel CAD system based on a multi-scale convolutional mixture of expert (MCME) ensemble model to identify normal retina, and two common types of macular pathologies, namely, dry age-related macular degeneration, and diabetic macular edema. The proposed MCME modular model is a data-driven neural structure, which employs a new cost function for discriminative and fast learning of image features by applying convolutional neural networks on multiple-scale sub-images. MCME maximizes the likelihood function of the training data set and ground truth by considering a mixture model, which also tries to model the joint interaction between individual experts by using a correlated multivariate component for each expert module instead of only modeling the marginal distributions by independent Gaussian components. Two different macular OCT data sets from Heidelberg devices were considered for the evaluation of the method, i.e., a local data set of OCT images of 148 subjects and a public data set of 45 OCT acquisitions. For comparison purposes, we performed a wide range of classification measures to compare the results with the best configurations of the MCME method. With the MCME model of four scale-dependent experts, a precision rate of 98.86% and an area under the receiver operating characteristic curve (AUC) of 0.9985 were obtained on average.

Journal ArticleDOI
TL;DR: A new type of cone-beam back-projection layer with an efficiently calculated forward pass is proposed, and it is shown that the learned algorithm can be interpreted using known concepts from cone-beam reconstruction: the network automatically learns strategies such as compensation weights and apodization windows.
Abstract: In this paper, we present a new deep learning framework for 3-D tomographic reconstruction. To this end, we map filtered back-projection-type algorithms to neural networks. However, the back-projection cannot be implemented as a fully connected layer due to its memory requirements. To overcome this problem, we propose a new type of cone-beam back-projection layer, efficiently calculating the forward pass. We derive this layer’s backward pass as a projection operation. Unlike most deep learning approaches for reconstruction, our new layer permits joint optimization of correction steps in volume and projection domain. Evaluation is performed numerically on a public data set in a limited angle setting showing a consistent improvement over analytical algorithms while keeping the same computational test-time complexity by design. In the region of interest, the peak signal-to-noise ratio has increased by 23%. In addition, we show that the learned algorithm can be interpreted using known concepts from cone beam reconstruction: the network is able to automatically learn strategies such as compensation weights and apodization windows.

Journal ArticleDOI
TL;DR: In this paper, a method to automatically detect mitotic tumor cells in breast cancer tissue sections based on convolutional neural networks (CNNs) was developed, which was trained in a single-center cohort and evaluated in an independent multicenter cohort from the cancer genome atlas.
Abstract: Manual counting of mitotic tumor cells in tissue sections constitutes one of the strongest prognostic markers for breast cancer. This procedure, however, is time-consuming and error-prone. We developed a method to automatically detect mitotic figures in breast cancer tissue sections based on convolutional neural networks (CNNs). Application of CNNs to hematoxylin and eosin (H&E) stained histological tissue sections is hampered by noisy and expensive reference standards established by pathologists, lack of generalization due to staining variation across laboratories, and high computational requirements needed to process gigapixel whole-slide images (WSIs). In this paper, we present a method to train and evaluate CNNs to specifically solve these issues in the context of mitosis detection in breast cancer WSIs. First, by combining image analysis of mitotic activity in phosphohistone-H3 restained slides and registration, we built a reference standard for mitosis detection in entire H&E WSIs requiring minimal manual annotation effort. Second, we designed a data augmentation strategy that creates diverse and realistic H&E stain variations by modifying H&E color channels directly. Using it during training combined with network ensembling resulted in a stain invariant mitosis detector. Third, we applied knowledge distillation to reduce the computational requirements of the mitosis detection ensemble with a negligible loss of performance. The system was trained in a single-center cohort and evaluated in an independent multicenter cohort from the cancer genome atlas on the three tasks of the tumor proliferation assessment challenge. We obtained a performance within the top three best methods for most of the tasks of the challenge.
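The stain augmentation can be sketched with skimage's H&E color deconvolution: perturb each stain channel with a random scale and shift, then reconvolve. The perturbation ranges are illustrative assumptions, not the authors' settings.

```python
# Sketch of H&E stain-channel augmentation: deconvolve RGB into the
# hematoxylin/eosin/DAB space, perturb each channel, and reconvolve
# (perturbation ranges sigma/bias are assumptions).
import numpy as np
from skimage.color import rgb2hed, hed2rgb

def augment_he(image_rgb, rng, sigma=0.05, bias=0.05):
    hed = rgb2hed(image_rgb)                        # (H, W, 3) stain space
    for c in range(3):
        alpha = rng.uniform(1 - sigma, 1 + sigma)   # per-channel scale
        beta = rng.uniform(-bias, bias)             # per-channel shift
        hed[..., c] = alpha * hed[..., c] + beta
    return np.clip(hed2rgb(hed), 0.0, 1.0)

# Usage: rng = np.random.default_rng(0); augmented = augment_he(img, rng)
```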

Journal ArticleDOI
TL;DR: This paper proposes a novel CNN architecture, called Dense-Res-Inception Net (DRINet), which outperforms the U-Net in three different challenging applications, namely multi-class segmentation of cerebrospinal fluid on brain CT images, multi-organ segmentation on abdominal CT images, and multi-class brain tumor segmentation on MR images.
Abstract: Convolutional neural networks (CNNs) have revolutionized medical image analysis over the past few years. The U-Net architecture is one of the most well-known CNN architectures for semantic segmentation and has achieved remarkable successes in many different medical image segmentation applications. The U-Net architecture consists of standard convolution layers, pooling layers, and upsampling layers. These convolution layers learn representative features of input images and construct segmentations based on the features. However, the features learned by standard convolution layers are not distinctive when the differences among different categories are subtle in terms of intensity, location, shape, and size. In this paper, we propose a novel CNN architecture, called Dense-Res-Inception Net (DRINet), which addresses this challenging problem. The proposed DRINet consists of three blocks, namely a convolutional block with dense connections, a deconvolutional block with residual inception modules, and an unpooling block. Our proposed architecture outperforms the U-Net in three different challenging applications, namely multi-class segmentation of cerebrospinal fluid on brain CT images, multi-organ segmentation on abdominal CT images, and multi-class brain tumor segmentation on MR images.

Journal ArticleDOI
TL;DR: A discriminative image analysis model equipped with a stain normalization component that transfers stains across datasets is designed, which achieves superior results in terms of accuracy and quality of normalized images compared to various baselines.
Abstract: It is generally recognized that color information is central to the automatic and visual analysis of histopathology tissue slides. In practice, pathologists rely on color, which reflects the presence of specific tissue components, to establish a diagnosis. Similarly, automatic histopathology image analysis algorithms rely on color or intensity measures to extract tissue features. With the increasing access to digitized histopathology images, color variation and its implications have become a critical issue. These variations are the result of not only a variety of factors involved in the preparation of tissue slides but also in the digitization process itself. Consequently, different strategies have been proposed to alleviate stain-related tissue inconsistencies in automatic image analysis systems. Such techniques generally rely on collecting color statistics to perform color matching across images. In this work, we propose a different approach for stain normalization that we refer to as stain transfer. We design a discriminative image analysis model equipped with a stain normalization component that transfers stains across datasets. Our model comprises a generative network that learns data set-specific staining properties and image-specific color transformations as well as a task-specific network (e.g., classifier or segmentation network). The model is trained end-to-end using a multi-objective cost function. We evaluate the proposed approach in the context of automatic histopathology image analysis on three data sets and two different analysis tasks: tissue segmentation and classification. The proposed method achieves superior results in terms of accuracy and quality of normalized images compared to various baselines.

Journal ArticleDOI
TL;DR: For the highly nonlinear inverse problem of electrical impedance tomography (EIT), D-bar methods provide robust direct reconstructions by low-pass filtering the associated nonlinear Fourier data, and convolutional neural networks are shown to sharpen these reconstructions.
Abstract: The mathematical problem for electrical impedance tomography (EIT) is a highly nonlinear ill-posed inverse problem requiring carefully designed reconstruction procedures to ensure reliable image generation. D-bar methods are based on a rigorous mathematical analysis and provide robust direct reconstructions by using a low-pass filtering of the associated nonlinear Fourier data. Similarly to low-pass filtering of linear Fourier data, only using low frequencies in the image recovery process results in blurred images lacking sharp features, such as clear organ boundaries. Convolutional neural networks provide a powerful framework for post-processing such convolved direct reconstructions. In this paper, we demonstrate that these CNN techniques lead to sharp and reliable reconstructions even for the highly nonlinear inverse problem of EIT. The network is trained on data sets of simulated examples and then applied to experimental data without the need to perform an additional transfer training. Results for absolute EIT images are presented using experimental EIT data from the ACT4 and KIT4 EIT systems.