Low-Dose CT with a Residual Encoder-Decoder Convolutional
Neural Network (RED-CNN)
Hu Chen,
College of Computer Science, Sichuan University, Chengdu 610065, China
Yi Zhang [Member, IEEE],
College of Computer Science, Sichuan University, Chengdu 610065, China
Mannudeep K. Kalra,
Department of Radiology, Massachusetts General Hospital, Boston, MA 02114, USA
Feng Lin,
College of Computer Science, Sichuan University, Chengdu 610065, China
Yang Chen,
Laboratory of Image Science and Technology, Southeast University, Nanjing 210096, China, and
also with the Key Laboratory of Computer Network and Information Integration (Southeast
University), Ministry of Education, Nanjing 210096, China
Peixi Liao,
Department of Scientific Research and Education, The Sixth People’s Hospital of Chengdu,
Chengdu 610065, China
Jiliu Zhou [Senior Member, IEEE], and
College of Computer Science, Sichuan University, Chengdu 610065, China
Ge Wang [Fellow, IEEE]
Department of Biomedical Engineering, Rensselaer Polytechnic Institute, Troy, NY 12180 USA
Abstract
Given the potential risk of X-ray radiation to the patient, low-dose CT has attracted considerable interest in the medical imaging field. Currently, the mainstream low-dose CT methods include vendor-specific sinogram domain filtration and iterative reconstruction algorithms, but they need access to raw data, whose formats are not transparent to most users. Due to the difficulty of modeling the statistical characteristics in the image domain, the existing methods for directly processing reconstructed images cannot eliminate image noise very well while keeping structural details. Inspired by the idea of deep learning, here we combine the autoencoder, deconvolution network, and shortcut connections into the residual encoder-decoder convolutional neural network (RED-CNN) for low-dose CT imaging. After patch-based training, the proposed RED-CNN achieves a competitive performance relative to the state-of-the-art methods in both simulated and clinical cases. In particular, our method has been favorably evaluated in terms of noise suppression, structural preservation, and lesion detection.

Published in final edited form as: IEEE Trans Med Imaging. 2017 December; 36(12): 2524–2535. doi:10.1109/TMI.2017.2715284. Correspondence to: Yi Zhang. Personal use of this material is permitted; permission to use this material for any other purposes must be obtained from the IEEE (pubs-permissions@ieee.org).
Index Terms
Low-dose CT; deep learning; auto-encoder; convolutional; deconvolutional; residual neural
network
I. Introduction
X-ray computed tomography (CT) has been widely utilized in clinical, industrial and other
applications. Due to the increasing use of medical CT, concerns have been expressed about the overall radiation dose to a patient. Research interest in CT dose reduction has been strong under the well-known guiding principle of ALARA (as low as reasonably
achievable) [1]. The most common way to lower the radiation dose is to reduce the X-ray
flux by decreasing the operating current and shortening the exposure time of an X-ray tube.
In general, the weaker the X-ray flux, the noisier a reconstructed CT image, which degrades
the signal-to-noise ratio and could compromise the diagnostic performance. To address this
inherent physical problem, many algorithms have been designed to improve the image quality of low-dose CT (LDCT). These algorithms can be generally divided into three categories: (1) sinogram domain filtration, (2) iterative reconstruction, and (3) image post-processing.
Sinogram filtering techniques operate on either raw data or log-transformed data before image reconstruction, such as filtered backprojection (FBP). The main advantage of working in the data domain is that the noise characteristics are well known. Typical methods include
structural adaptive filtering [2], bilateral filtering [3], and penalized weighted least-squares
(PWLS) algorithms [4]. However, the sinogram filtering methods often suffer from spatial
resolution loss when edges in the sinogram domain are not well preserved.
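To make the flavor of such edge-preserving filtration concrete, here is a minimal one-dimensional bilateral filter sketch in plain Python; the window radius and the two Gaussian widths are illustrative choices, not parameters taken from the cited methods.

```python
import math

def bilateral_1d(signal, radius=2, sigma_d=1.5, sigma_r=20.0):
    """Edge-preserving smoothing: each sample becomes a weighted average
    of its neighbors, with weights that fall off with both spatial
    distance (sigma_d) and intensity difference (sigma_r)."""
    out = []
    for i, center in enumerate(signal):
        num, den = 0.0, 0.0
        for j in range(max(0, i - radius), min(len(signal), i + radius + 1)):
            w = (math.exp(-((i - j) ** 2) / (2 * sigma_d ** 2)) *
                 math.exp(-((signal[j] - center) ** 2) / (2 * sigma_r ** 2)))
            num += w * signal[j]
            den += w
        out.append(num / den)
    return out

# A noisy step edge: the small fluctuations are smoothed, while the
# 0 -> 100 jump survives because the range kernel gives near-zero
# weight to neighbors on the far side of the edge.
smoothed = bilateral_1d([0, 3, -2, 1, 100, 98, 102, 99])
```

The range kernel is what separates this from plain Gaussian smoothing; when edges are not handled this carefully, filtering in the sinogram causes exactly the resolution loss noted above.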
Over the past decade, iterative reconstruction (IR) algorithms have attracted much attention
especially in the field of LDCT. This approach combines the statistical properties of data in
the sinogram domain, prior information in the image domain, and even parameters of the
imaging system into one unified objective function. With compressive sensing (CS) [5],
several image priors were formulated as sparse transforms to deal with the low-dose, few-
view, limited-angle and interior CT issues, such as total variation (TV) and its variants [6]–
[9], nonlocal means (NLM) [10–12], dictionary learning [13], low-rank [14], and other
techniques. Model-based iterative reconstruction (MBIR) takes into account the physical acquisition processes and has been implemented on some current CT scanners [15].
Although IR methods have obtained exciting results, there are two weaknesses. First, on most modern MDCT scanners, IR techniques have replaced FBP-based image reconstruction
techniques for radiation dose reduction. However, these IR techniques are vendor-specific
since the details of the scanner geometry and correction steps are not available to users and
other vendors. Second, there are substantial computational overhead costs associated with
popular IR techniques. Fully model-based iterative reconstruction techniques have greater
potential for radiation dose reduction but slow reconstruction speed and changes in image
appearance limit their clinical applications.

An alternative for LDCT is post-processing of reconstructed images, which does not rely on
raw data. These techniques can be directly applied on LDCT images, and integrated into any
CT system. In [16], NLM was introduced to take advantage of the feature similarity within a
large neighborhood in a reconstructed image. Inspired by the theory of sparse representation, dictionary learning [17] was adapted for LDCT denoising, resulting in substantially improved abdominal image quality [18]. Meanwhile, block-matching 3D (BM3D) was
proven efficient for various X-ray imaging tasks [19–21]. In contrast to the other two kinds of methods, the noise distribution in the image domain cannot be accurately determined, which prevents users from achieving the optimal tradeoff between structure preservation and noise suppression.
Recently, deep learning (DL) has generated overwhelming enthusiasm in several imaging applications, ranging from low-level tasks such as image denoising, deblurring and super-resolution to high-level tasks such as segmentation, detection and recognition [22]. It simulates the information processing procedure of the human brain, and can efficiently learn high-level features from pixel data through a hierarchical network framework [23].
Several DL algorithms have been proposed for image restoration using different network
models [24–31]. As the autoencoder (AE) has great potential for image denoising, the stacked sparse denoising autoencoder (SSDA) and its variants were introduced [24–26].
Convolutional neural networks are powerful tools for feature extraction and were applied for
image denoising, deblurring and super resolution [27–29]. Burger et al. [30] analyzed the performance of the multilayer perceptron (MLP) as applied to image patches and obtained
competitive results as compared to the state-of-the-art methods. Previous studies also
applied DL for medical image analysis, such as tissue segmentation [32, 33], organ
classification [34] and nuclei detection [35]. Furthermore, reports started emerging on
tomographic imaging topics. For example, Wang et al. incorporated a DL-based
regularization term into a fast MRI reconstruction framework [36]. Chen et al. presented
preliminary results with a light-weight CNN-based framework for LDCT imaging [37]. A
deeper version using the wavelet transform as inputs was presented [38], which won the second place in the “2016 NIH-AAPM-Mayo Clinic Low Dose CT Grand Challenge.” The
filtered back-projection (FBP) workflow was mapped to a deep CNN architecture, reducing
the reconstruction error by a factor of two in the case of limited-angle tomography [39]. An
overall perspective was also published on deep learning, or machine learning in general, for
tomographic reconstruction [40].
Despite the interesting results on CNN for LDCT, the potential of the deep CNN has not
been fully realized. Although some studies involved construction of deeper networks [41,
42], most image denoising models had limited layers (usually 2~3 layers), since image denoising is considered a “low-level” task without intention to extract features. This is in
clear contrast to high-level tasks such as recognition or detection, in which pooling and other
operations are widely used to bypass image details and capture topological structures.
Inspired by the work of [31], we incorporated a deconvolution network [43] and shortcut
connections [41, 42] into a CNN model, which is referred to as a residual encoder-decoder
convolutional neural network (RED-CNN). In the second section, the proposed network

architecture is described. In the third section, the proposed model is evaluated and validated.
In the final section, the conclusion is drawn.
II. Methods
A. Noise Reduction Model
Our workflow starts with a straightforward FBP reconstruction from a low-dose scan, and
the image denoising problem is restricted within the image domain [37]. Since the DL-based
methods are independent of the statistical distribution of image noise, the LDCT problem
can be simplified to the following one. Assuming that $X \in \mathbb{R}^{m \times n}$ is an LDCT image and $Y \in \mathbb{R}^{m \times n}$ is the corresponding normal-dose CT (NDCT) image, the relationship between them can be formulated as

$$X = \sigma(Y), \tag{1}$$

where $\sigma : \mathbb{R}^{m \times n} \to \mathbb{R}^{m \times n}$ denotes the complex degradation process involving quantum noise and other factors. Then, the problem can be transformed to seek a function $f$:

$$\hat{f} = \arg\min_{f} \left\| f(X) - Y \right\|_2^2, \tag{2}$$

where $f$ is regarded as the optimal approximation of $\sigma^{-1}$, and can be estimated using DL techniques.
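As a toy illustration of the objective behind Eq. (2), the sketch below scores candidate mappings by mean squared error against the NDCT patch. The patch values and the two candidate mappings are invented for the example; a real $f$ would be the trained network searched over a far richer family of functions.

```python
def mse(pred, target):
    """Mean squared error between two equally sized patches,
    given as flat lists of pixel values."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

X = [10.0, 52.0, 31.0, 8.0]   # toy LDCT patch
Y = [12.0, 50.0, 30.0, 10.0]  # corresponding toy NDCT patch

# Two crude stand-ins for a learned mapping f; training keeps whichever
# candidate minimizes the discrepancy with the NDCT target.
candidates = {
    "identity": X,
    "offset": [x + 1.0 for x in X],
}
scores = {name: mse(out, Y) for name, out in candidates.items()}
best = min(scores, key=scores.get)
```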
B. Residual Autoencoder Network
The autoencoder (AE) was originally developed for unsupervised feature learning from
noisy inputs, which is also suitable for image restoration. In the context of image denoising,
CNN has also demonstrated an excellent performance. However, due to its multiple down-sampling operations, a CNN can miss some image details. For LDCT, here we propose a residual network combining AE and CNN, which originates from the work in [31].
Rather than adopting fully-connected layers for encoding and decoding, we use both
convolutional and deconvolutional layers in symmetry. Furthermore, different from the
typical encoder-decoder structure, residual learning [41] with shortcuts is included to
facilitate the operations of the convolutional and corresponding deconvolutional layers.
There are two modifications to the network described in [31]: (a) the ReLU layers before
summation with residuals have been removed to abandon the positivity constraint on learned
residuals; and (b) shortcuts have been added to improve the learning process.
The overall architecture of the proposed RED-CNN network is shown in Fig. 1. This
network consists of 10 layers, including 5 convolutional and 5 deconvolutional layers
symmetrically arranged. Shortcuts connect matching convolutional and deconvolutional
layers. Each layer is followed by rectified linear units (ReLU) [44]. The details of the network are described as follows.

1) Patch Extraction—DL-based methods need a huge number of samples. This
requirement cannot be easily met in practice, especially for clinical imaging. In this study,
we propose to use overlapped patches in CT images. This strategy has been found to be
effective and efficient, because the perceptual differences of local regions can be detected,
and the number of samples is significantly boosted [24, 27, 28]. In our experiments, we
extracted patches from LDCT and corresponding NDCT images with a fixed size.
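The overlapping-patch strategy can be sketched as follows; the 6×6 image, patch size, and stride are toy values chosen only to show how overlap multiplies the number of training samples, not the sizes used in the paper.

```python
def extract_patches(image, size, stride):
    """Collect overlapping size x size patches from a 2-D image
    (a list of rows), sliding a window with the given stride."""
    h, w = len(image), len(image[0])
    return [[row[x:x + size] for row in image[y:y + size]]
            for y in range(0, h - size + 1, stride)
            for x in range(0, w - size + 1, stride)]

# A 6x6 toy "image": with 4x4 patches and stride 2, overlapping
# extraction yields 4 training samples where non-overlapping tiling
# of the same image would yield only one.
img = [[r * 6 + c for c in range(6)] for r in range(6)]
patches = extract_patches(img, size=4, stride=2)
```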
2) Stacked Encoders (Noise and Artifact Reduction)—Unlike the traditional
stacked AE networks, we use a chain of fully-connected convolutional layers as the stacked
encoders. Image noise and artifacts are suppressed from low-level to high-level step by step
in order to preserve essential information in the extracted patches. Moreover, since the
pooling layer (down-sampling) after a convolutional layer may discard important structural
details, it is abandoned in our encoder. As a result, there are only two types of layers in our
encoder: convolutional layers and ReLU units, and the stacked encoders can be formulated as

$$x_i = \mathrm{ReLU}(W_i * x_{i-1} + b_i), \quad i = 1, 2, \ldots, N, \tag{3}$$

where $N$ is the number of convolutional layers, $W_i$ and $b_i$ denote the weights and biases respectively, $*$ represents the convolution operator, $x_0$ is the extracted patch from the input images, and $x_i$ ($i > 0$) is the features extracted from the previous layer. $\mathrm{ReLU}(x) = \max(0, x)$ is the activation function. After the stacked encoders, the image patches are transformed into a feature space, and the output is a feature vector $x_N$ whose size is $l_N$.
3) Stacked Decoders (Structural Detail Recovery)—Although the pooling operation
is removed, a series of convolutions, which essentially act as noise filters, will still diminish
the details of input signals. Inspired by the recent results on semantic segmentation [45, 46,
47] and biomedical image segmentation [48, 49], deconvolutional layers are integrated into
our model for recovery of structural details, which can be seen as image reconstruction from
extracted features. We use a chain of fully-connected deconvolutional layers to form the
stacked decoders for image reconstruction. Since the encoders and decoders should appear in pairs, the convolutional and deconvolutional layers are symmetric in the proposed network.
To ensure the input and output of the network match exactly, the convolutional and
deconvolutional layers must have the same kernel size. Note that the data flow through the
convolutional and deconvolutional layers in our framework follows the rule of “FILO” (First
In Last Out). As demonstrated in Fig. 1, the first convolution layer corresponds to the last
deconvolutional layer, the last convolution layer corresponds to the first deconvolutional
layer, and so on. In other words, this architecture is characterized by the symmetry of paired convolutional and deconvolutional layers.
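A quick size-bookkeeping sketch of this symmetry: with no padding, a convolution with kernel size k removes k − 1 pixels per dimension, and the matching deconvolution adds them back, so a symmetric stack restores the input size exactly. The patch size 55 and kernel size 5 below are illustrative assumptions, not values quoted from this section.

```python
def conv_out(n, k):
    """Output length of a valid (no-padding) convolution."""
    return n - (k - 1)

def deconv_out(n, k):
    """Output length of the matching full transposed convolution."""
    return n + (k - 1)

n, k = 55, 5
for _ in range(5):
    n = conv_out(n, k)     # 55 -> 51 -> 47 -> 43 -> 39 -> 35
for _ in range(5):
    n = deconv_out(n, k)   # 35 -> 39 -> 43 -> 47 -> 51 -> 55
```

A mismatched kernel size in any conv/deconv pair would break this round trip, which is why the text requires equal kernel sizes.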
There are two types of layers in our decoder network: deconvolutional layers and ReLU units. Thus, the stacked decoders can be formulated as

$$y_i = \mathrm{ReLU}(W_i' \circledast y_{i-1} + b_i'), \quad i = 1, 2, \ldots, N, \tag{4}$$

where $W_i'$ and $b_i'$ denote the weights and biases of the deconvolutional layers, $\circledast$ represents the deconvolution operator, $y_0 = x_N$ is the feature vector produced by the stacked encoders, and $y_N$ is the reconstructed patch.
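Deconvolution here can be read as the transpose of convolution: where a valid convolution gathers a kernel-sized window into one output sample, the transposed operation spreads each input sample back over a kernel-sized footprint. A minimal 1-D, single-channel sketch (toy signal and kernel) shows the size round trip:

```python
def conv1d_valid(x, w):
    """Valid 1-D cross-correlation: output length len(x) - len(w) + 1."""
    k = len(w)
    return [sum(x[i + j] * w[j] for j in range(k))
            for i in range(len(x) - k + 1)]

def deconv1d(y, w):
    """Transposed counterpart: each input sample spreads its value over
    the kernel footprint; output length len(y) + len(w) - 1."""
    k = len(w)
    out = [0.0] * (len(y) + k - 1)
    for i, v in enumerate(y):
        for j in range(k):
            out[i + j] += v * w[j]
    return out

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
w = [0.25, 0.5, 0.25]
y = conv1d_valid(x, w)   # length 4: the convolutional layer shrinks the signal
z = deconv1d(y, w)       # length 6: the deconvolutional layer restores the size
```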
