Low-Dose CT with a Residual Encoder-Decoder Convolutional
Neural Network (RED-CNN)
Hu Chen,
College of Computer Science, Sichuan University, Chengdu 610065, China
Yi Zhang [Member, IEEE],
College of Computer Science, Sichuan University, Chengdu 610065, China
Mannudeep K. Kalra,
Department of Radiology, Massachusetts General Hospital, Boston, MA 02114, USA
Feng Lin,
College of Computer Science, Sichuan University, Chengdu 610065, China
Yang Chen,
Laboratory of Image Science and Technology, Southeast University, Nanjing 210096, China, and
also with the Key Laboratory of Computer Network and Information Integration (Southeast
University), Ministry of Education, Nanjing 210096, China
Peixi Liao,
Department of Scientific Research and Education, The Sixth People’s Hospital of Chengdu,
Chengdu 610065, China
Jiliu Zhou [Senior Member, IEEE], and
College of Computer Science, Sichuan University, Chengdu 610065, China
Ge Wang [Fellow, IEEE]
Department of Biomedical Engineering, Rensselaer Polytechnic Institute, Troy, NY 12180 USA
Abstract
Given the potential risk of X-ray radiation to the patient, low-dose CT has attracted considerable interest in the medical imaging field. Currently, the mainstream low-dose CT methods include vendor-specific sinogram domain filtration and iterative reconstruction algorithms, but they need access to raw data, whose formats are not transparent to most users. Due to the difficulty of modeling the statistical characteristics in the image domain, the existing methods for directly processing reconstructed images cannot eliminate image noise very well while keeping structural details. Inspired by the idea of deep learning, here we combine the autoencoder, deconvolution network, and shortcut connections into the residual encoder-decoder convolutional neural network (RED-CNN) for low-dose CT imaging. After patch-based training, the proposed RED-CNN achieves a competitive performance relative to the state-of-the-art methods in both simulated and clinical cases. In particular, our method has been favorably evaluated in terms of noise suppression, structural preservation, and lesion detection.

Published in final edited form as: IEEE Trans Med Imaging. 2017 December; 36(12): 2524–2535. doi:10.1109/TMI.2017.2715284. Correspondence to: Yi Zhang. Personal use of this material is permitted; permission to use this material for any other purposes must be obtained from the IEEE (pubs-permissions@ieee.org).
Index Terms
Low-dose CT; deep learning; auto-encoder; convolutional; deconvolutional; residual neural
network
I. Introduction
X-ray computed tomography (CT) has been widely utilized in clinical, industrial and other
applications. Due to the increasing use of medical CT, concerns have been expressed about the overall radiation dose to a patient. Research interest in CT dose reduction has been strong under the well-known guiding principle of ALARA (as low as reasonably
achievable) [1]. The most common way to lower the radiation dose is to reduce the X-ray
flux by decreasing the operating current and shortening the exposure time of an X-ray tube.
In general, the weaker the X-ray flux, the noisier a reconstructed CT image, which degrades
the signal-to-noise ratio and could compromise the diagnostic performance. To address this
inherent physical problem, many algorithms have been designed to improve the image quality of low-dose CT (LDCT). These algorithms can be generally divided into three categories: (1) sinogram domain filtration, (2) iterative reconstruction, and (3) image post-processing.
Sinogram filtering techniques operate on either raw data or log-transformed data before image reconstruction, such as filtered backprojection (FBP). The main advantage of working in the data domain is that the noise characteristics are well known. Typical methods include
structural adaptive filtering [2], bilateral filtering [3], and penalized weighted least-squares
(PWLS) algorithms [4]. However, the sinogram filtering methods often suffer from spatial
resolution loss when edges in the sinogram domain are not well preserved.
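To make the flavor of such edge-preserving filtration concrete, here is a minimal one-dimensional bilateral filter sketch in plain Python; the window radius and the two Gaussian widths are illustrative choices, not parameters taken from the cited methods.

```python
import math

def bilateral_1d(signal, radius=2, sigma_d=1.5, sigma_r=20.0):
    """Edge-preserving smoothing: each sample becomes a weighted average
    of its neighbors, with weights that fall off with both spatial
    distance (sigma_d) and intensity difference (sigma_r)."""
    out = []
    for i, center in enumerate(signal):
        num, den = 0.0, 0.0
        for j in range(max(0, i - radius), min(len(signal), i + radius + 1)):
            w = (math.exp(-((i - j) ** 2) / (2 * sigma_d ** 2)) *
                 math.exp(-((signal[j] - center) ** 2) / (2 * sigma_r ** 2)))
            num += w * signal[j]
            den += w
        out.append(num / den)
    return out

# A noisy step edge: the small fluctuations are smoothed, while the
# 0 -> 100 jump survives because the range kernel gives near-zero
# weight to neighbors on the far side of the edge.
smoothed = bilateral_1d([0, 3, -2, 1, 100, 98, 102, 99])
```

The range kernel is what separates this from plain Gaussian smoothing; when edges are not handled this carefully, filtering in the sinogram causes exactly the resolution loss noted above.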
Over the past decade, iterative reconstruction (IR) algorithms have attracted much attention
especially in the field of LDCT. This approach combines the statistical properties of data in
the sinogram domain, prior information in the image domain, and even parameters of the
imaging system into one unified objective function. With compressive sensing (CS) [5],
several image priors were formulated as sparse transforms to deal with the low-dose, few-
view, limited-angle and interior CT issues, such as total variation (TV) and its variants [6]–
[9], nonlocal means (NLM) [10–12], dictionary learning [13], low-rank [14], and other
techniques. Model-based iterative reconstruction (MBIR) takes into account the physical acquisition processes and has been implemented on some current CT scanners [15].
Although IR methods have obtained exciting results, there are two weaknesses. First, on most modern MDCT scanners, IR techniques have replaced FBP-based image reconstruction
techniques for radiation dose reduction. However, these IR techniques are vendor-specific
since the details of the scanner geometry and correction steps are not available to users and
other vendors. Second, there are substantial computational overhead costs associated with
popular IR techniques. Fully model-based iterative reconstruction techniques have greater
potential for radiation dose reduction but slow reconstruction speed and changes in image
appearance limit their clinical applications.

An alternative for LDCT is post-processing of reconstructed images, which does not rely on
raw data. These techniques can be directly applied on LDCT images, and integrated into any
CT system. In [16], NLM was introduced to take advantage of the feature similarity within a
large neighborhood in a reconstructed image. Inspired by the theory of sparse representation, dictionary learning [17] was adapted for LDCT denoising, resulting in substantially improved abdominal image quality [18]. Meanwhile, block-matching 3D (BM3D) was
proven efficient for various X-ray imaging tasks [19–21]. In contrast to the other two kinds of methods, the noise distribution in the image domain cannot be accurately determined, which prevents users from achieving the optimal tradeoff between structure preservation and noise suppression.
Recently, deep learning (DL) has generated overwhelming enthusiasm in several imaging applications, ranging from low-level tasks such as image denoising, deblurring and super-resolution to high-level tasks such as segmentation, detection and recognition [22]. It simulates the information processing procedure of the human brain, and can efficiently learn high-level features from pixel data through a hierarchical network framework [23].
Several DL algorithms have been proposed for image restoration using different network
models [24–31]. As the autoencoder (AE) has great potential for image denoising, the stacked sparse denoising autoencoder (SSDA) and its variants were introduced [24–26].
Convolutional neural networks are powerful tools for feature extraction and were applied for
image denoising, deblurring and super resolution [27–29]. Burger et al. [30] analyzed the performance of the multilayer perceptron (MLP) as applied to image patches and obtained
competitive results as compared to the state-of-the-art methods. Previous studies also
applied DL for medical image analysis, such as tissue segmentation [32, 33], organ
classification [34] and nuclei detection [35]. Furthermore, reports started emerging on
tomographic imaging topics. For example, Wang et al. incorporated a DL-based
regularization term into a fast MRI reconstruction framework [36]. Chen et al. presented
preliminary results with a light-weight CNN-based framework for LDCT imaging [37]. A
deeper version using the wavelet transform as inputs was presented [38], which won the second place in the “2016 NIH-AAPM-Mayo Clinic Low Dose CT Grand Challenge.” The
filtered back-projection (FBP) workflow was mapped to a deep CNN architecture, reducing
the reconstruction error by a factor of two in the case of limited-angle tomography [39]. An
overall perspective was also published on deep learning, or machine learning in general, for
tomographic reconstruction [40].
Despite the interesting results on CNN for LDCT, the potential of the deep CNN has not
been fully realized. Although some studies involved construction of deeper networks [41,
42], most image denoising models had limited layers (usually 2~3 layers), since image denoising is considered a “low-level” task without intention to extract features. This is in
clear contrast to high-level tasks such as recognition or detection, in which pooling and other
operations are widely used to bypass image details and capture topological structures.
Inspired by the work of [31], we incorporated a deconvolution network [43] and shortcut
connections [41, 42] into a CNN model, which is referred to as a residual encoder-decoder
convolutional neural network (RED-CNN). In the second section, the proposed network

architecture is described. In the third section, the proposed model is evaluated and validated.
In the final section, the conclusion is drawn.
II. Methods
A. Noise Reduction Model
Our workflow starts with a straightforward FBP reconstruction from a low-dose scan, and
the image denoising problem is restricted within the image domain [37]. Since the DL-based
methods are independent of the statistical distribution of image noise, the LDCT problem
can be simplified to the following one. Assuming that $X \in \mathbb{R}^{m \times n}$ is an LDCT image and $Y \in \mathbb{R}^{m \times n}$ is the corresponding normal-dose CT (NDCT) image, the relationship between them can be formulated as

$$X = \sigma(Y), \tag{1}$$

where $\sigma : \mathbb{R}^{m \times n} \to \mathbb{R}^{m \times n}$ denotes the complex degradation process involving quantum noise and other factors. Then, the problem can be transformed to seek a function $f$:

$$\hat{f} = \arg\min_{f} \left\| f(X) - Y \right\|_2^2, \tag{2}$$

where $f$ is regarded as the optimal approximation of $\sigma^{-1}$, and can be estimated using DL techniques.
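As a toy illustration of the objective behind Eq. (2), the sketch below scores candidate mappings by mean squared error against the NDCT patch. The patch values and the two candidate mappings are invented for the example; a real $f$ would be the trained network searched over a far richer family of functions.

```python
def mse(pred, target):
    """Mean squared error between two equally sized patches,
    given as flat lists of pixel values."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

X = [10.0, 52.0, 31.0, 8.0]   # toy LDCT patch
Y = [12.0, 50.0, 30.0, 10.0]  # corresponding toy NDCT patch

# Two crude stand-ins for a learned mapping f; training keeps whichever
# candidate minimizes the discrepancy with the NDCT target.
candidates = {
    "identity": X,
    "offset": [x + 1.0 for x in X],
}
scores = {name: mse(out, Y) for name, out in candidates.items()}
best = min(scores, key=scores.get)
```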
B. Residual Autoencoder Network
The autoencoder (AE) was originally developed for unsupervised feature learning from
noisy inputs, which is also suitable for image restoration. In the context of image denoising,
CNN has also demonstrated an excellent performance. However, due to its multiple down-sampling operations, a CNN can miss some image details. For LDCT, here we propose a residual network combining AE and CNN, which originates from the work in [31].
Rather than adopting fully-connected layers for encoding and decoding, we use both
convolutional and deconvolutional layers in symmetry. Furthermore, different from the
typical encoder-decoder structure, residual learning [41] with shortcuts is included to
facilitate the operations of the convolutional and corresponding deconvolutional layers.
There are two modifications to the network described in [31]: (a) the ReLU layers before
summation with residuals have been removed to abandon the positivity constraint on learned
residuals; and (b) shortcuts have been added to improve the learning process.
The overall architecture of the proposed RED-CNN network is shown in Fig. 1. This
network consists of 10 layers, including 5 convolutional and 5 deconvolutional layers
symmetrically arranged. Shortcuts connect matching convolutional and deconvolutional
layers. Each layer is followed by rectified linear units (ReLU) [44]. The details of the network are described as follows.

1) Patch Extraction—DL-based methods need a huge number of samples. This
requirement cannot be easily met in practice, especially for clinical imaging. In this study,
we propose to use overlapped patches in CT images. This strategy has been found to be
effective and efficient, because the perceptual differences of local regions can be detected,
and the number of samples is significantly boosted [24, 27, 28]. In our experiments, we
extracted patches from LDCT and corresponding NDCT images with a fixed size.
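The overlapping-patch strategy can be sketched as follows; the 6×6 image, patch size, and stride are toy values chosen only to show how overlap multiplies the number of training samples, not the sizes used in the paper.

```python
def extract_patches(image, size, stride):
    """Collect overlapping size x size patches from a 2-D image
    (a list of rows), sliding a window with the given stride."""
    h, w = len(image), len(image[0])
    return [[row[x:x + size] for row in image[y:y + size]]
            for y in range(0, h - size + 1, stride)
            for x in range(0, w - size + 1, stride)]

# A 6x6 toy "image": with 4x4 patches and stride 2, overlapping
# extraction yields 4 training samples where non-overlapping tiling
# of the same image would yield only one.
img = [[r * 6 + c for c in range(6)] for r in range(6)]
patches = extract_patches(img, size=4, stride=2)
```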
2) Stacked Encoders (Noise and Artifact Reduction)—Unlike the traditional
stacked AE networks, we use a chain of fully-connected convolutional layers as the stacked
encoders. Image noise and artifacts are suppressed from low-level to high-level step by step
in order to preserve essential information in the extracted patches. Moreover, since the
pooling layer (down-sampling) after a convolutional layer may discard important structural
details, it is abandoned in our encoder. As a result, there are only two types of layers in our
encoder: convolutional layers and ReLU units, and the stacked encoders can be formulated as

$$x_i = \mathrm{ReLU}(W_i * x_{i-1} + b_i), \quad i = 1, 2, \ldots, N, \tag{3}$$

where $N$ is the number of convolutional layers, $W_i$ and $b_i$ denote the weights and biases respectively, $*$ represents the convolution operator, $x_0$ is the extracted patch from the input images, and $x_i$ ($i > 0$) is the features extracted from the previous layer. $\mathrm{ReLU}(x) = \max(0, x)$ is the activation function. After the stacked encoders, the image patches are transformed into a feature space, and the output is a feature vector $x_N$ whose size is $l_N$.
3) Stacked Decoders (Structural Detail Recovery)—Although the pooling operation
is removed, a series of convolutions, which essentially act as noise filters, will still diminish
the details of input signals. Inspired by the recent results on semantic segmentation [45, 46,
47] and biomedical image segmentation [48, 49], deconvolutional layers are integrated into
our model for recovery of structural details, which can be seen as image reconstruction from
extracted features. We use a chain of fully-connected deconvolutional layers to form the
stacked decoders for image reconstruction. Since the encoders and decoders should appear in pairs, the convolutional and deconvolutional layers are symmetric in the proposed network.
To ensure the input and output of the network match exactly, the convolutional and
deconvolutional layers must have the same kernel size. Note that the data flow through the
convolutional and deconvolutional layers in our framework follows the rule of “FILO” (First
In Last Out). As demonstrated in Fig. 1, the first convolution layer corresponds to the last
deconvolutional layer, the last convolution layer corresponds to the first deconvolutional
layer, and so on. In other words, this architecture is characterized by the symmetry of paired convolutional and deconvolutional layers.
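A quick size-bookkeeping sketch of this symmetry: with no padding, a convolution with kernel size k removes k − 1 pixels per dimension, and the matching deconvolution adds them back, so a symmetric stack restores the input size exactly. The patch size 55 and kernel size 5 below are illustrative assumptions, not values quoted from this section.

```python
def conv_out(n, k):
    """Output length of a valid (no-padding) convolution."""
    return n - (k - 1)

def deconv_out(n, k):
    """Output length of the matching full transposed convolution."""
    return n + (k - 1)

n, k = 55, 5
for _ in range(5):
    n = conv_out(n, k)     # 55 -> 51 -> 47 -> 43 -> 39 -> 35
for _ in range(5):
    n = deconv_out(n, k)   # 35 -> 39 -> 43 -> 47 -> 51 -> 55
```

A mismatched kernel size in any conv/deconv pair would break this round trip, which is why the text requires equal kernel sizes.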
There are two types of layers in our decoder network: deconvolutional layers and ReLU units. Thus, the stacked decoders can be formulated as

$$y_i = \mathrm{ReLU}(W_i' \circledast y_{i-1} + b_i'), \quad i = 1, 2, \ldots, N, \tag{4}$$

where $W_i'$ and $b_i'$ denote the weights and biases of the deconvolutional layers, $\circledast$ represents the deconvolution operator, $y_0 = x_N$ is the feature vector produced by the stacked encoders, and $y_N$ is the reconstructed patch.
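Deconvolution here can be read as the transpose of convolution: where a valid convolution gathers a kernel-sized window into one output sample, the transposed operation spreads each input sample back over a kernel-sized footprint. A minimal 1-D, single-channel sketch (toy signal and kernel) shows the size round trip:

```python
def conv1d_valid(x, w):
    """Valid 1-D cross-correlation: output length len(x) - len(w) + 1."""
    k = len(w)
    return [sum(x[i + j] * w[j] for j in range(k))
            for i in range(len(x) - k + 1)]

def deconv1d(y, w):
    """Transposed counterpart: each input sample spreads its value over
    the kernel footprint; output length len(y) + len(w) - 1."""
    k = len(w)
    out = [0.0] * (len(y) + k - 1)
    for i, v in enumerate(y):
        for j in range(k):
            out[i + j] += v * w[j]
    return out

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
w = [0.25, 0.5, 0.25]
y = conv1d_valid(x, w)   # length 4: the convolutional layer shrinks the signal
z = deconv1d(y, w)       # length 6: the deconvolutional layer restores the size
```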
