
Showing papers on "Iterative reconstruction published in 2020"


Journal ArticleDOI
TL;DR: Two versions of a novel deep learning architecture, dubbed ADMM-CSNet, are proposed by combining the traditional model-based CS method and the data-driven deep learning method for image reconstruction from sparsely sampled measurements; they achieve favorable reconstruction accuracy at fast computational speed compared with traditional and other deep learning methods.
Abstract: Compressive sensing (CS) is an effective technique for reconstructing an image from a small amount of sampled data. It has been widely applied in medical imaging, remote sensing, image compression, etc. In this paper, we propose two versions of a novel deep learning architecture, dubbed ADMM-CSNet, combining the traditional model-based CS method and the data-driven deep learning method for image reconstruction from sparsely sampled measurements. We first consider a generalized CS model for image reconstruction with undetermined regularizations in undetermined transform domains, and then propose two efficient solvers based on the Alternating Direction Method of Multipliers (ADMM) algorithm for optimizing the model. We further unroll and generalize the ADMM algorithm into two deep architectures, in which all parameters of the CS model and the ADMM algorithm are discriminatively learned by end-to-end training. For both applications, fast CS complex-valued MR imaging and CS imaging of real-valued natural images, the proposed ADMM-CSNet achieves favorable reconstruction accuracy at fast computational speed compared with traditional and other deep learning methods.

470 citations
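To make the unrolling concrete, below is a minimal PyTorch sketch of one ADMM stage in which the proximal operator of the unknown regularizer is replaced by a small learnable CNN and the penalty and step size are learned. The masking operator, the shapes, and the gradient-descent x-update are simplifying assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class ADMMStage(nn.Module):
    """One unrolled ADMM stage: the prox of the unknown regularizer is
    replaced by a small learnable CNN; penalty and step size are learned.
    Illustrative sketch, not the authors' code."""
    def __init__(self):
        super().__init__()
        self.rho = nn.Parameter(torch.tensor(0.1))   # learned ADMM penalty
        self.eta = nn.Parameter(torch.tensor(0.5))   # learned step size
        self.prox = nn.Sequential(                   # learned proximal mapping
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1))

    def forward(self, x, z, u, A, At, y):
        # x-update: gradient step on 0.5*||Ax - y||^2 + (rho/2)*||x - z + u||^2
        x = x - self.eta * (At(A(x) - y) + self.rho * (x - z + u))
        z = self.prox(x + u)        # z-update via the learned prox
        u = u + x - z               # dual ascent
        return x, z, u

# toy usage with a random undersampling mask standing in for the CS operator
mask = (torch.rand(1, 1, 32, 32) > 0.5).float()
A, At = (lambda v: mask * v), (lambda r: mask * r)
y = A(torch.rand(1, 1, 32, 32))
x = z = u = torch.zeros(1, 1, 32, 32)
stage = ADMMStage()                 # weights would normally differ per stage
for _ in range(5):
    x, z, u = stage(x, z, u, A, At, y)
```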


Proceedings ArticleDOI
14 Jun 2020
TL;DR: A framework that combines embedding with activation tensor manipulation to perform high-quality local edits along with global semantic edits on images; its noise optimization can restore high-frequency features and thus significantly improves the quality of reconstructed images.
Abstract: We propose Image2StyleGAN++, a flexible image editing framework with many applications. Our framework extends the recent Image2StyleGAN in three ways. First, we introduce noise optimization as a complement to the W+ latent space embedding. Our noise optimization can restore high-frequency features in images and thus significantly improves the quality of reconstructed images, e.g., an increase in PSNR from 20 dB to 45 dB. Second, we extend the global W+ latent space embedding to enable local embeddings. Third, we combine embedding with activation tensor manipulation to perform high-quality local edits along with global semantic edits on images. Such edits motivate various high-quality image editing applications, e.g., image reconstruction, image inpainting, image crossover, local style transfer, image editing using scribbles, and attribute-level feature transfer. Examples of the edited images are shown across the paper for visual inspection.

440 citations


Proceedings ArticleDOI
14 Jun 2020
TL;DR: In this paper, the authors proposed Implicit Feature Networks (IF-Nets), which deliver continuous outputs, can handle multiple topologies, and complete shapes for missing or sparse input data retaining the nice properties of recent learned implicit functions.
Abstract: While many works focus on 3D reconstruction from images, in this paper, we focus on 3D shape reconstruction and completion from a variety of 3D inputs, which are deficient in some respect: low- and high-resolution voxels, sparse and dense point clouds, complete or incomplete. Processing of such 3D inputs is an increasingly important problem as they are the output of 3D scanners, which are becoming more accessible, and are the intermediate output of 3D computer vision algorithms. Recently, learned implicit functions have shown great promise as they produce continuous reconstructions. However, we identified two limitations in reconstruction from 3D inputs: 1) details present in the input data are not retained, and 2) articulated humans are poorly reconstructed. To solve this, we propose Implicit Feature Networks (IF-Nets), which deliver continuous outputs, can handle multiple topologies, and complete shapes for missing or sparse input data, retaining the nice properties of recent learned implicit functions; critically, they can also retain detail when it is present in the input data and can reconstruct articulated humans. Our work differs from prior work in two crucial aspects. First, instead of using a single vector to encode a 3D shape, we extract a learnable 3-dimensional multi-scale tensor of deep features, which is aligned with the original Euclidean space embedding the shape. Second, instead of classifying x-y-z point coordinates directly, we classify deep features extracted from the tensor at a continuous query point. We show that this forces our model to make decisions based on global and local shape structure, as opposed to point coordinates, which are arbitrary under Euclidean transformations. Experiments demonstrate that IF-Nets outperform prior work in 3D object reconstruction on ShapeNet and obtain significantly more accurate 3D human reconstructions. Code and project website are available at https://virtualhumans.mpi-inf.mpg.de/ifnets/.

390 citations
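The core operation, classifying deep features sampled from an aligned 3D feature tensor at continuous query points, can be sketched with trilinear interpolation in PyTorch. The shapes and the single-scale sampling below are assumptions for illustration; the paper uses a multi-scale encoder.

```python
import torch
import torch.nn.functional as F

def query_features(feature_grid, points):
    """Sample a deep 3D feature tensor at continuous query points.
    feature_grid: (B, C, D, H, W); points: (B, N, 3), coords in [-1, 1]^3
    in grid_sample's (x, y, z) order. Returns (B, C, N) per-point features."""
    # grid_sample expects a (B, N, 1, 1, 3) sampling grid for 5-D input;
    # mode='bilinear' performs trilinear interpolation on volumes
    grid = points.view(points.shape[0], -1, 1, 1, 3)
    feats = F.grid_sample(feature_grid, grid, mode='bilinear',
                          align_corners=True)          # (B, C, N, 1, 1)
    return feats.view(feats.shape[0], feats.shape[1], -1)

# toy usage: classify occupancy from sampled features instead of xyz coords
B, C, R, N = 1, 16, 32, 1024
grid_feats = torch.randn(B, C, R, R, R)   # from a 3D feature encoder
pts = torch.rand(B, N, 3) * 2 - 1         # continuous query locations
f = query_features(grid_feats, pts)       # (B, C, N), fed to a point-wise MLP
```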


Journal ArticleDOI
TL;DR: A stability test with algorithms and easy-to-use software detects the instability phenomena; it is aimed at researchers, to test their networks for instabilities, and at government agencies, such as the Food and Drug Administration (FDA), to secure safe use of deep learning methods.
Abstract: Deep learning, due to its unprecedented success in tasks such as image classification, has emerged as a new tool in image reconstruction with potential to change the field. In this paper, we demonstrate a crucial phenomenon: Deep learning typically yields unstable methods for image reconstruction. The instabilities usually occur in several forms: 1) Certain tiny, almost undetectable perturbations, both in the image and sampling domain, may result in severe artefacts in the reconstruction; 2) a small structural change, for example, a tumor, may not be captured in the reconstructed image; and 3) (a counterintuitive type of instability) more samples may yield poorer performance. Our stability test with algorithms and easy-to-use software detects the instability phenomena. The test is aimed at researchers, to test their networks for instabilities, and for government agencies, such as the Food and Drug Administration (FDA), to secure safe use of deep learning methods.

355 citations
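A stability test of this kind can be viewed as a search for a tiny perturbation that maximally degrades the reconstruction. Below is a minimal sketch of such a search, assuming a differentiable reconstruction network `recon` and forward operator `A`; it illustrates the idea only and is not the paper's released software.

```python
import torch

def worst_case_perturbation(recon, A, x, steps=100, lr=0.01, eps=0.01):
    """Gradient-ascent search for a small measurement-domain perturbation
    that maximally degrades the reconstruction of image x.
    recon: network mapping measurements -> image; A: forward operator."""
    y = A(x)
    r = torch.zeros_like(y, requires_grad=True)
    opt = torch.optim.Adam([r], lr=lr)
    for _ in range(steps):
        loss = -torch.norm(recon(y + r) - x)   # ascend reconstruction error
        opt.zero_grad()
        loss.backward()
        opt.step()
        with torch.no_grad():                  # keep the perturbation tiny
            r.clamp_(-eps, eps)
    return r.detach()
```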


Journal ArticleDOI
TL;DR: A modified convolutional neural network architecture termed fully dense UNet (FD-UNet) is proposed for removing artifacts from two-dimensional PAT images reconstructed from sparse data, and the proposed CNN is compared with the standard UNet in terms of reconstructed image quality.
Abstract: Photoacoustic imaging is an emerging imaging modality that is based upon the photoacoustic effect. In photoacoustic tomography (PAT), the induced acoustic pressure waves are measured by an array of detectors and used to reconstruct an image of the initial pressure distribution. A common challenge faced in PAT is that the measured acoustic waves can only be sparsely sampled. Reconstructing sparsely sampled data using standard methods results in severe artifacts that obscure information within the image. We propose a modified convolutional neural network (CNN) architecture termed fully dense UNet (FD-UNet) for removing artifacts from two-dimensional PAT images reconstructed from sparse data and compare the proposed CNN with the standard UNet in terms of reconstructed image quality.

293 citations
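The "fully dense" qualifier refers to dense connectivity: each convolution in a block receives the concatenation of all earlier feature maps. A minimal PyTorch sketch of one dense block, with channel counts and depth chosen for illustration rather than taken from the paper:

```python
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    """Dense connectivity as used in 'fully dense' UNets (illustrative):
    each conv sees the concatenation of all previous feature maps."""
    def __init__(self, c_in, growth=16, layers=4):
        super().__init__()
        self.convs = nn.ModuleList([
            nn.Conv2d(c_in + i * growth, growth, 3, padding=1)
            for i in range(layers)])

    def forward(self, x):
        feats = [x]
        for conv in self.convs:
            feats.append(torch.relu(conv(torch.cat(feats, dim=1))))
        return torch.cat(feats, dim=1)   # (c_in + layers*growth) channels
```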


Proceedings ArticleDOI
14 Jun 2020
TL;DR: This paper proposes ARCH (Animatable Reconstruction of Clothed Humans), a novel end-to-end framework for accurate reconstruction of animation-ready 3D clothed humans from a monocular image and shows numerous qualitative examples of animated, high-quality reconstructed avatars unseen in the literature so far.
Abstract: In this paper, we propose ARCH (Animatable Reconstruction of Clothed Humans), a novel end-to-end framework for accurate reconstruction of animation-ready 3D clothed humans from a monocular image. Existing approaches to digitize 3D humans struggle to handle pose variations and recover details. Also, they do not produce models that are animation ready. In contrast, ARCH is a learned pose-aware model that produces detailed 3D rigged full-body human avatars from a single unconstrained RGB image. A Semantic Space and a Semantic Deformation Field are created using a parametric 3D body estimator. They allow the transformation of 2D/3D clothed humans into a canonical space, reducing ambiguities in geometry caused by pose variations and occlusions in training data. Detailed surface geometry and appearance are learned using an implicit function representation with spatial local features. Furthermore, we propose additional per-pixel supervision on the 3D reconstruction using opacity-aware differentiable rendering. Our experiments indicate that ARCH increases the fidelity of the reconstructed humans. We obtain more than 50% lower reconstruction errors for standard metrics compared to state-of-the-art methods on public datasets. We also show numerous qualitative examples of animated, high-quality reconstructed avatars unseen in the literature so far.

253 citations


Journal ArticleDOI
TL;DR: Several signal processing issues for maximizing the potential of deep reconstruction in fast MRI are discussed, which may facilitate further development of the networks and performance analysis from a theoretical point of view.
Abstract: Image reconstruction from undersampled k-space data has been playing an important role in fast magnetic resonance imaging (MRI). Recently, deep learning has demonstrated tremendous success in various fields and also shown potential in significantly accelerating MRI reconstruction with fewer measurements. This article provides an overview of deep-learning-based image reconstruction methods for MRI. Two types of deep-learning-based approaches are reviewed, those that are based on unrolled algorithms and those that are not, and the main structures of both are explained. Several signal processing issues for maximizing the potential of deep reconstruction in fast MRI are discussed, which may facilitate further development of the networks and performance analysis from a theoretical point of view.

232 citations


Journal ArticleDOI
TL;DR: An overview of the recent machine-learning approaches that have been proposed specifically for improving parallel imaging, with a general background introduction to parallel MRI structured around the classical view of image- and k-space-based methods.
Abstract: Following the success of deep learning in a wide range of applications, neural network-based machine-learning techniques have received interest as a means of accelerating magnetic resonance imaging (MRI). A number of ideas inspired by deep-learning techniques for computer vision and image processing have been successfully applied to nonlinear image reconstruction in the spirit of compressed sensing for both low-dose computed tomography and accelerated MRI. The additional integration of multicoil information to recover missing k-space lines in the MRI reconstruction process is studied less frequently, even though it is the de facto standard for the currently used accelerated MR acquisitions. This article provides an overview of the recent machine-learning approaches that have been proposed specifically for improving parallel imaging. A general background introduction to parallel MRI is given and structured around the classical view of image- and k-space-based methods. Linear and nonlinear methods are covered, followed by a discussion of the recent efforts to further improve parallel imaging using machine learning and, specifically, artificial neural networks. Image domain-based techniques that introduce improved regularizers are covered as well as k-space-based methods, where the focus is on better interpolation strategies using neural networks. Issues and open problems are discussed and recent efforts for producing open data sets and benchmarks for the community are examined.

204 citations


Journal ArticleDOI
TL;DR: An image CS framework using a convolutional neural network (dubbed CSNet) that includes a sampling network and a reconstruction network, which are optimized jointly; experiments suggest that the learned sampling matrices can significantly improve traditional image CS reconstruction methods.
Abstract: In the study of compressed sensing (CS), the two main challenges are the design of the sampling matrix and the development of the reconstruction method. On the one hand, the commonly used random sampling matrices (e.g., GRM) are signal-independent, ignoring the characteristics of the signal. On the other hand, the state-of-the-art image CS methods (e.g., GSR and MH) achieve quite good performance, but with much higher computational complexity. To deal with the two challenges, we propose an image CS framework using a convolutional neural network (dubbed CSNet) that includes a sampling network and a reconstruction network, which are optimized jointly. The sampling network adaptively learns the sampling matrix from the training images, which makes the CS measurements retain more image structural information for better reconstruction. Specifically, three types of sampling matrices are learned, i.e., a floating-point matrix, a {0, 1}-binary matrix, and a {−1, +1}-bipolar matrix. The last two matrices are specially designed for easy storage and hardware implementation. The reconstruction network, which contains a linear initial reconstruction network and a non-linear deep reconstruction network, learns an end-to-end mapping between the CS measurements and the reconstructed images. Experimental results demonstrate that CSNet offers state-of-the-art reconstruction quality while achieving fast running speed. In addition, CSNet with the {0, 1}-binary and {−1, +1}-bipolar matrices achieves performance comparable to existing deep-learning-based CS methods and outperforms traditional CS methods. Experimental results further suggest that the learned sampling matrices can significantly improve traditional image CS reconstruction methods.

195 citations
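One common way to implement such a jointly trained block-based sampling matrix and linear initial reconstruction is with a strided convolution and a 1x1 convolution followed by PixelShuffle. The sketch below follows that pattern under an assumed block size and sampling ratio; it is not the authors' code.

```python
import torch
import torch.nn as nn

B, ratio = 32, 0.1                 # block size and sampling ratio (assumed)
m = int(ratio * B * B)             # measurements per 32x32 block

sample = nn.Conv2d(1, m, kernel_size=B, stride=B, bias=False)  # learned sampling
init = nn.Sequential(                                          # linear initial recon
    nn.Conv2d(m, B * B, kernel_size=1, bias=False),
    nn.PixelShuffle(B))            # rearrange B*B channels back to a BxB block

x = torch.rand(1, 1, 96, 96)       # image whose sides are multiples of B
y = sample(x)                      # (1, m, 3, 3): per-block CS measurements
x0 = init(y)                       # (1, 1, 96, 96): initial reconstruction
# a deeper non-linear network would refine x0; binarizing/bipolarizing the
# sampling weights gives the {0,1} and {-1,+1} variants mentioned above
```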


Journal ArticleDOI
TL;DR: The new DLIR algorithm reduced noise and improved spatial resolution and detectability without perceived alteration of the texture, commonly reported with IR.
Abstract: To assess the impact on image quality and dose reduction of a new deep learning image reconstruction (DLIR) algorithm compared with a hybrid iterative reconstruction (IR) algorithm. Data acquisitions were performed at seven dose levels (CTDIvol: 15/10/7.5/5/2.5/1/0.5 mGy) using a standard phantom designed for image quality assessment. Raw data were reconstructed using filtered back projection (FBP), two levels of IR (ASiR-V50% (AV50); ASiR-V100% (AV100)), and three levels of DLIR (TrueFidelity™ low, medium, high). Noise power spectrum (NPS) and task-based transfer function (TTF) were computed. Detectability index (d′) was computed to model a large mass in the liver, a small calcification, and a small subtle lesion with low contrast. NPS peaks were higher with AV50 than with all DLIR levels and only higher with DLIR-H than with AV100. The average NPS spatial frequencies were higher with DLIR than with IR. For all DLIR levels, TTF50% obtained with DLIR was higher than that with IR. d′ was higher with DLIR than with AV50 but lower with DLIR-L and DLIR-M than with AV100. d′ values were higher with DLIR-H than with AV100 for the small low-contrast lesion (10 ± 4%) and in the same range for the other simulated lesions. The new DLIR algorithm reduced noise and improved spatial resolution and detectability without changing the noise texture. Images obtained with DLIR seem to indicate a greater potential for dose optimization than those with hybrid IR.
• This study assessed the impact on image quality and radiation dose of a new deep learning image reconstruction (DLIR) algorithm as compared with a hybrid iterative reconstruction (IR) algorithm.
• The new DLIR algorithm reduced noise and improved spatial resolution and detectability without perceived alteration of the texture, commonly reported with IR.
• As compared with IR, DLIR seems to open further possibilities for dose optimization.

172 citations
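For reference, the noise power spectrum used in studies like this one is conventionally estimated from an ensemble of noise-only ROIs: mean-subtract each ROI, average the squared 2D DFT magnitudes, and normalize by pixel area. A simplified NumPy sketch (windowing and detrending details omitted):

```python
import numpy as np

def nps_2d(rois, pixel_mm):
    """2-D noise power spectrum from an ensemble of noise-only ROIs.
    rois: (n, N, N) array of repeated ROI images from identical scans.
    Returns (NPS, freqs) with NPS in HU^2*mm^2 for a pixel size in mm."""
    n, N, _ = rois.shape
    rois = rois - rois.mean(axis=(1, 2), keepdims=True)   # remove DC per ROI
    dft2 = np.abs(np.fft.fftshift(np.fft.fft2(rois), axes=(1, 2))) ** 2
    nps = dft2.mean(axis=0) * (pixel_mm ** 2) / (N * N)   # ensemble average
    freqs = np.fft.fftshift(np.fft.fftfreq(N, d=pixel_mm))
    return nps, freqs

# the average NPS frequency used above to summarize texture is then
# f_av = sum(f * NPS_radial(f)) / sum(NPS_radial(f)) over radial frequency f
```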


Proceedings ArticleDOI
14 Jun 2020
TL;DR: The HDR-to-LDR image formation pipeline is modeled as dynamic range clipping, non-linear mapping from a camera response function, and quantization, and three specialized CNNs are learned to reverse these steps.
Abstract: Recovering a high dynamic range (HDR) image from a single low dynamic range (LDR) input image is challenging due to missing details in under-/over-exposed regions caused by quantization and saturation of camera sensors. In contrast to existing learning-based methods, our core idea is to incorporate the domain knowledge of the LDR image formation pipeline into our model. We model the HDR-to-LDR image formation pipeline as (1) dynamic range clipping, (2) non-linear mapping from a camera response function, and (3) quantization. We then propose to learn three specialized CNNs to reverse these steps. By decomposing the problem into specific sub-tasks, we impose effective physical constraints to facilitate the training of individual sub-networks. Finally, we jointly fine-tune the entire model end-to-end to reduce error accumulation. With extensive quantitative and qualitative experiments on diverse image datasets, we demonstrate that the proposed method performs favorably against state-of-the-art single-image HDR reconstruction algorithms.
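The forward LDR formation model being inverted is simple to state in code. The sketch below uses a gamma curve as a stand-in for an arbitrary camera response function; the paper learns one CNN to invert each of the three steps.

```python
import numpy as np

def hdr_to_ldr(hdr, gamma=2.2, bits=8):
    """Forward LDR formation model: (1) dynamic range clipping,
    (2) non-linear camera response (gamma curve as a stand-in CRF),
    (3) quantization. hdr: float array of linear radiance values."""
    clipped = np.clip(hdr, 0.0, 1.0)          # (1) saturate highlights
    crf = clipped ** (1.0 / gamma)            # (2) non-linear mapping
    levels = 2 ** bits - 1
    ldr = np.round(crf * levels) / levels     # (3) quantize to 8 bits
    return ldr
```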

Proceedings ArticleDOI
14 Jun 2020
TL;DR: Multiple latent codes are employed to generate multiple feature maps at an intermediate layer of the generator, which are then composed with adaptive channel importance to recover the input image.
Abstract: Despite the success of Generative Adversarial Networks (GANs) in image synthesis, applying trained GAN models to real image processing remains challenging. Previous methods typically invert a target image back to the latent space either by back-propagation or by learning an additional encoder. However, the reconstructions from both of the methods are far from ideal. In this work, we propose a novel approach, called mGANprior, to incorporate the well-trained GANs as effective prior to a variety of image processing tasks. In particular, we employ multiple latent codes to generate multiple feature maps at some intermediate layer of the generator, then compose them with adaptive channel importance to recover the input image. Such an over-parameterization of the latent space significantly improves the image reconstruction quality, outperforming existing competitors. The resulting high-fidelity image reconstruction enables the trained GAN models as prior to many real-world applications, such as image colorization, super-resolution, image inpainting, and semantic manipulation. We further analyze the properties of the layer-wise representation learned by GAN models and shed light on what knowledge each layer is capable of representing.
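The composition step can be sketched as follows, assuming the pre-trained generator is split into two halves g1 and g2 at the chosen intermediate layer; the softmax normalization over codes is one simple choice for illustration, not necessarily the paper's exact scheme.

```python
import torch

def compose(g1, g2, zs, alphas):
    """mGANprior-style composition (illustrative): run several latent codes
    through the early generator layers g1, weight the resulting feature maps
    channel-wise by importance scores, sum, and decode with g2.
    zs: (K, dz) latent codes; alphas: (K, C) channel importances."""
    feats = torch.stack([g1(z.unsqueeze(0)) for z in zs])   # (K, 1, C, H, W)
    w = alphas.softmax(dim=0).unsqueeze(1)                  # normalize over codes
    fused = (feats * w[..., None, None]).sum(dim=0)         # (1, C, H, W)
    return g2(fused)

# during inversion, zs and alphas are optimized so compose(...) matches the
# target image; g1/g2 are the two halves of the frozen pre-trained generator
```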

Journal ArticleDOI
TL;DR: A fully automatic solution to estimate the 3D layout of an indoor scene from a single 2D image, including a novel technique to automatically identify the layout topology of the input image, followed by a nonlinear optimization with equality constraints to estimate the final 3D layout of the scene.
Abstract: 3D layout is crucial for scene understanding and reconstruction, and very useful in applications like real estate and furniture design. In this paper, we propose a fully automatic solution to estimate the 3D layout of an indoor scene from a single 2D image. Our technique contains two key components. First, we train a neural network that directly estimates room structure lines from the input image. Second, we propose a novel technique to automatically identify the layout topology of an input image, followed by a nonlinear optimization with equality constraints to estimate the final 3D layout of a scene. To the best of our knowledge, this is the first fully automatic technique to achieve single-image-based 3D layout estimation of an indoor scene. We evaluate our method on the public datasets LSUN, Hedau, and 3DGP, and the results show that the proposed method achieves accurate 3D layout reconstruction on various images with different layout topologies.

Journal ArticleDOI
01 Jan 2020
TL;DR: The field of medical image reconstruction has seen roughly four types of methods: analytical methods, such as filtered backprojection (FBP) for X-ray computed tomography (CT) and the inverse Fourier transform for magnetic resonance imaging (MRI), based on simple mathematical models for the imaging systems.
Abstract: The field of medical image reconstruction has seen roughly four types of methods. The first type tended to be analytical methods, such as filtered backprojection (FBP) for X-ray computed tomography (CT) and the inverse Fourier transform for magnetic resonance imaging (MRI), based on simple mathematical models for the imaging systems. These methods are typically fast, but have suboptimal properties such as poor resolution-noise tradeoff for CT. A second type is iterative reconstruction methods based on more complete models for the imaging system physics and, where appropriate, models for the sensor statistics. These iterative methods improved image quality by reducing noise and artifacts. The U.S. Food and Drug Administration (FDA)-approved methods among these have been based on relatively simple regularization models. A third type of methods has been designed to accommodate modified data acquisition methods, such as reduced sampling in MRI and CT to reduce scan time or radiation dose. These methods typically involve mathematical image models involving assumptions such as sparsity or low rank. A fourth type of methods replaces mathematically designed models of signals and systems with data-driven or adaptive models inspired by the field of machine learning. This article focuses on the two most recent trends in medical image reconstruction: methods based on sparsity or low-rank models and data-driven methods based on machine learning techniques.

Journal ArticleDOI
TL;DR: DeepcomplexMRI is a multi-channel image reconstruction method that accelerates parallel MR imaging with a residual complex convolutional neural network; it takes advantage of the availability of a large number of existing multi-channel ground-truth images, using them as target data to train the deep residual convolutional neural network offline.

Journal ArticleDOI
TL;DR: Experiments demonstrate that the proposed Two-stream Fusion Network (TFNet) can fuse PAN and MS images effectively and produce pan-sharpened images competitive with, and even superior to, the state of the art.

Journal ArticleDOI
TL;DR: On DLR images, the image noise was lower, and high-contrast spatial resolution and task-based detectability were better than on images reconstructed with other state-of-the-art techniques.

Proceedings ArticleDOI
14 Jun 2020
TL;DR: This work addresses the problem of multi-person 3D pose estimation from a single image by incorporating the SMPL parametric body model in a top-down framework and proposing two novel losses that enable more coherent reconstruction in natural images.
Abstract: In this work, we address the problem of multi-person 3D pose estimation from a single image. A typical regression approach in the top-down setting of this problem would first detect all humans and then reconstruct each one of them independently. However, this type of prediction suffers from incoherent results, e.g., interpenetration and inconsistent depth ordering between the people in the scene. Our goal is to train a single network that learns to avoid these problems and generate a coherent 3D reconstruction of all the humans in the scene. To this end, a key design choice is the incorporation of the SMPL parametric body model in our top-down framework, which enables the use of two novel losses. First, a distance field-based collision loss penalizes interpenetration among the reconstructed people. Second, a depth ordering-aware loss reasons about occlusions and promotes a depth ordering of people that leads to a rendering which is consistent with the annotated instance segmentation. This provides depth supervision signals to the network, even if the image has no explicit 3D annotations. The experiments show that our approach outperforms previous methods on standard 3D pose benchmarks, while our proposed losses enable more coherent reconstruction in natural images. The project website with videos, results, and code can be found at: https://jiangwenpl.github.io/multiperson

Journal ArticleDOI
TL;DR: Pixel-DL employs pixel-wise interpolation governed by the physics of photoacoustic wave propagation and then uses a convolutional neural network to reconstruct an image, achieving performance comparable to or better than iterative methods and consistently outperforming other CNN-based approaches.
Abstract: Photoacoustic tomography (PAT) is a non-ionizing imaging modality capable of acquiring high contrast and resolution images of optical absorption at depths greater than traditional optical imaging techniques. Practical considerations with instrumentation and geometry limit the number of available acoustic sensors and their “view” of the imaging target, which results in image reconstruction artifacts degrading image quality. Iterative reconstruction methods can be used to reduce artifacts but are computationally expensive. In this work, we propose a novel deep learning approach termed pixel-wise deep learning (Pixel-DL) that first employs pixel-wise interpolation governed by the physics of photoacoustic wave propagation and then uses a convolutional neural network to reconstruct an image. Simulated photoacoustic data from synthetic, mouse-brain, lung, and fundus vasculature phantoms were used for training and testing. Results demonstrated that Pixel-DL achieved performance comparable to or better than iterative methods and consistently outperformed other CNN-based approaches for correcting artifacts. Pixel-DL is a computationally efficient approach that enables real-time PAT rendering and improved image reconstruction quality for limited-view and sparse PAT.
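Pixel-wise interpolation can be sketched as a time-of-flight lookup: for each pixel and each sensor, fetch the recorded pressure sample at the acoustic propagation delay, yielding one input channel per sensor for the CNN. The nearest-sample lookup, uniform speed of sound, and geometry handling below are simplifying assumptions, not the authors' implementation.

```python
import numpy as np

def pixel_interp(sinogram, sensor_xy, grid_xy, fs, c=1500.0):
    """Pixel-wise interpolation (illustrative): for every image pixel and
    every sensor, look up the recorded pressure at the acoustic time of
    flight, producing an (n_sensors, H, W) tensor for a CNN to fuse.
    sinogram: (n_sensors, n_samples); sensor_xy: (n_sensors, 2) positions [m];
    grid_xy: (H, W, 2) pixel positions [m]; fs: sampling rate [Hz]."""
    n_sensors, n_samples = sinogram.shape
    H, W = grid_xy.shape[:2]
    out = np.zeros((n_sensors, H, W))
    for s in range(n_sensors):
        dist = np.linalg.norm(grid_xy - sensor_xy[s], axis=-1)  # (H, W)
        idx = np.clip((dist / c * fs).astype(int), 0, n_samples - 1)
        out[s] = sinogram[s, idx]      # nearest-sample time-of-flight lookup
    return out
```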

Proceedings ArticleDOI
14 Jun 2020
TL;DR: Fast and flexible algorithms for SCI based on the plug-and-play (PnP) framework are developed, and it is shown for the first time that PnP can recover a UHD color video from a snapshot 2D measurement.
Abstract: Snapshot compressive imaging (SCI) aims to capture high-dimensional (usually 3D) images using a 2D sensor (detector) in a single snapshot. Though enjoying the advantages of low bandwidth, low power, and low cost, applying SCI to large-scale problems (HD or UHD videos) in our daily life is still challenging. The bottleneck lies in the reconstruction algorithms; they are either too slow (iterative optimization algorithms) or not flexible to the encoding process (deep-learning-based end-to-end networks). In this paper, we develop fast and flexible algorithms for SCI based on the plug-and-play (PnP) framework. In addition to the widely used PnP-ADMM method, we further propose the PnP-GAP (generalized alternating projection) algorithm with a lower computational workload and prove the global convergence of PnP-GAP under the SCI hardware constraints. By employing deep denoising priors, we show for the first time that PnP can recover a UHD color video (3840x1644x48 with PSNR above 30 dB) from a snapshot 2D measurement. Extensive results on both simulation and real datasets verify the superiority of our proposed algorithm.
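In SCI, the operator A A^T is diagonal, which makes the GAP projection step cheap. A NumPy sketch of the PnP-GAP iteration with an arbitrary plug-in denoiser follows; it is illustrative only, omitting the convergence safeguards and color/Bayer handling of the paper.

```python
import numpy as np

def pnp_gap(y, masks, denoise, iters=50):
    """PnP-GAP for video snapshot compressive imaging (illustrative sketch).
    y: (H, W) snapshot measurement; masks: (T, H, W) coding masks;
    denoise: any plug-in denoiser acting on the (T, H, W) video estimate."""
    Phi_sum = (masks ** 2).sum(axis=0)         # diagonal of A A^T
    Phi_sum[Phi_sum == 0] = 1.0                # avoid division by zero
    x = masks * (y / Phi_sum)                  # A^T applied to normalized y
    for _ in range(iters):
        yb = (masks * x).sum(axis=0)           # A x: re-project the estimate
        x = x + masks * ((y - yb) / Phi_sum)   # Euclidean projection step
        x = denoise(x)                         # plug-and-play prior
    return x

# usage: x_hat = pnp_gap(y, masks, denoise=lambda v: v)  # identity = plain GAP
```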

Journal ArticleDOI
TL;DR: This study presents a novel deep framework, termed NHDRRnet, which adopts an alternative direction and attempts to remove ghosting artifacts by exploiting the non-local correlation in inputs and incorporates a triple-pass residual module to capture more powerful local features, which proves to be effective in further boosting the performance.
Abstract: One of the most challenging problems in reconstructing a high dynamic range (HDR) image from multiple low dynamic range (LDR) inputs is the ghosting artifacts caused by the object motion across different inputs. When the object motion is slight, most existing methods can well suppress the ghosting artifacts by aligning LDR inputs based on optical flow or detecting anomalies among them. However, they often fail to produce satisfactory results in practice, since the real object motion can be very large. In this study, we present a novel deep framework, termed NHDRRnet, which adopts an alternative direction and attempts to remove ghosting artifacts by exploiting the non-local correlation in inputs. In NHDRRnet, we first adopt a UNet architecture to fuse all inputs and map the fusion results into a low-dimensional deep feature space. Then, we feed the resultant features into a novel global non-local module which reconstructs each pixel by weighted averaging all the other pixels, using weights determined by their correspondences. By doing this, the proposed NHDRRnet is able to adaptively select the useful information (e.g., information not corrupted by large motions or adverse lighting conditions) in the whole deep feature space to accurately reconstruct each pixel. In addition, we also incorporate a triple-pass residual module to capture more powerful local features, which proves to be effective in further boosting the performance. Extensive experiments on three benchmark datasets demonstrate the superiority of the proposed NHDRRnet in terms of suppressing the ghosting artifacts in HDR reconstruction, especially when the objects have large motions.
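A global non-local module of the kind described, where each pixel is reconstructed as a correspondence-weighted average of all other pixels, can be sketched as softmax attention over flattened spatial positions. Layer sizes below are assumptions for illustration, not the paper's configuration.

```python
import torch
import torch.nn as nn

class NonLocalBlock(nn.Module):
    """Each output pixel is a weighted average of all pixels, with weights
    from pairwise feature similarity (softmax attention). Illustrative."""
    def __init__(self, c, c_inner=None):
        super().__init__()
        c_inner = c_inner or c // 2
        self.theta = nn.Conv2d(c, c_inner, 1)
        self.phi = nn.Conv2d(c, c_inner, 1)
        self.g = nn.Conv2d(c, c_inner, 1)
        self.out = nn.Conv2d(c_inner, c, 1)

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.theta(x).flatten(2).transpose(1, 2)   # (b, hw, ci)
        k = self.phi(x).flatten(2)                     # (b, ci, hw)
        v = self.g(x).flatten(2).transpose(1, 2)       # (b, hw, ci)
        attn = torch.softmax(q @ k, dim=-1)            # (b, hw, hw) weights
        y = (attn @ v).transpose(1, 2).reshape(b, -1, h, w)
        return x + self.out(y)                         # residual connection
```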

Journal ArticleDOI
TL;DR: Compared with 30% ASIR-V, DLIR improved CT evaluation of the abdomen in the portal venous phase, and DLIR strength should be chosen to balance the degree of desired denoising for a clinical task relative to mild blurring, which increases with progressively higher DLIR strengths.
Abstract: OBJECTIVE. The purpose of this study was to perform quantitative and qualitative evaluation of a deep learning image reconstruction (DLIR) algorithm in contrast-enhanced oncologic CT of the abdomen...

Journal ArticleDOI
TL;DR: A Soft-edge assisted Network (SeaNet) to reconstruct the high-quality SR image with the help of image soft-edge; SeaNet converges rapidly and achieves excellent performance under the assistance of the image soft-edge.
Abstract: The task of single image super-resolution (SISR) is a highly ill-posed inverse problem since reconstructing the high-frequency details from a low-resolution image is challenging. Most previous CNN-based super-resolution (SR) methods tend to directly learn the mapping from the low-resolution image to the high-resolution image through some complex convolutional neural networks. However, blindly increasing the depth of the network is not the best choice because the performance improvement of such methods is marginal but the computational cost is huge. A more efficient method is to integrate image prior knowledge into the model to assist the image reconstruction. Indeed, the soft-edge has been widely applied in many computer vision tasks as an important image feature. In this paper, we propose a Soft-edge assisted Network (SeaNet) to reconstruct the high-quality SR image with the help of image soft-edge. The proposed SeaNet consists of three sub-nets: a rough image reconstruction network (RIRN), a soft-edge reconstruction network (Edge-Net), and an image refinement network (IRN). The complete reconstruction process consists of two stages. In Stage-I, the rough SR feature maps and the SR soft-edge are reconstructed by the RIRN and Edge-Net, respectively. In Stage-II, the outputs of the previous stages are fused and then fed to the IRN for high-quality SR image reconstruction. Extensive experiments show that our SeaNet converges rapidly and achieves excellent performance under the assistance of image soft-edge. The code is available at https://gitlab.com/junchenglee/seanet-pytorch.

Journal ArticleDOI
TL;DR: An end-to-end, data-driven method of solving inverse problems inspired by the Neumann series, which is called a Neumann network and outperforms traditional inverse problem solution methods, model-free deep learning approaches, and state-of-the-art unrolled iterative methods on standard datasets.
Abstract: Many challenging image processing tasks can be described by an ill-posed linear inverse problem: deblurring, deconvolution, inpainting, compressed sensing, and superresolution all lie in this framework. Traditional inverse problem solvers minimize a cost function consisting of a data-fit term, which measures how well an image matches the observations, and a regularizer, which reflects prior knowledge and promotes images with desirable properties like smoothness. Recent advances in machine learning and image processing have illustrated that it is often possible to learn a regularizer from training data that can outperform more traditional regularizers. We present an end-to-end, data-driven method of solving inverse problems inspired by the Neumann series, which we call a Neumann network. Rather than unroll an iterative optimization algorithm, we truncate a Neumann series which directly solves the linear inverse problem with a data-driven nonlinear regularizer. The Neumann network architecture outperforms traditional inverse problem solution methods, model-free deep learning approaches, and state-of-the-art unrolled iterative methods on standard datasets. Finally, when the images belong to a union of subspaces and under appropriate assumptions on the forward model, we prove there exists a Neumann network configuration that well-approximates the optimal oracle estimator for the inverse problem and demonstrate empirically that the trained Neumann network has the form predicted by theory.
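The architecture can be sketched directly from the truncated series: each term applies (I - eta * A^T A - R) to the previous term, with a learned nonlinear regularizer R, and the estimate is the running sum of all terms. A minimal sketch under assumed operators A/At and regularizer R:

```python
import torch

def neumann_net(A, At, y, R, J=6, eta=0.1):
    """Truncated Neumann-series solver for y = A x with a learned nonlinear
    regularizer R (illustrative sketch, not the authors' code). The estimate
    is the sum of J+1 series terms."""
    term = eta * At(y)          # first series term: eta * A^T y
    acc = term
    for _ in range(J):
        # next term: (I - eta * A^T A) term - R(term)
        term = term - eta * At(A(term)) - R(term)
        acc = acc + term        # accumulate the series
    return acc
```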

Proceedings ArticleDOI
01 Mar 2020
TL;DR: A novel neural network architecture for video reconstruction from events that is smaller (38k vs. 10M parameters) and faster than the state of the art, with minimal impact on performance.
Abstract: Event cameras are powerful new sensors able to capture high dynamic range with microsecond temporal resolution and no motion blur. Their strength is detecting brightness changes (called events) rather than capturing direct brightness images; however, algorithms can be used to convert events into usable image representations for applications such as classification. Previous works rely on hand-crafted spatial and temporal smoothing techniques to reconstruct images from events. State-of-the-art video reconstruction has recently been achieved using neural networks that are large (10M parameters) and computationally expensive, requiring 30ms for a forward-pass at 640 × 480 resolution on a modern GPU. We propose a novel neural network architecture for video reconstruction from events that is smaller (38k vs. 10M parameters) and faster (10ms vs. 30ms) than state-of-the-art with minimal impact to performance.

Journal ArticleDOI
TL;DR: This tutorial reviews the classical CS formulation and outline steps needed to transform this formulation into a deep-learning-based reconstruction framework, and discusses considerations in applying unrolled neural networks in the clinical setting.
Abstract: Compressed sensing (CS) reconstruction methods leverage sparse structure in underlying signals to recover high-resolution images from highly undersampled measurements. When applied to magnetic resonance imaging (MRI), CS has the potential to dramatically shorten MRI scan times, increase diagnostic value, and improve the overall patient experience. However, CS has several shortcomings that limit its clinical translation. These include 1) artifacts arising from inaccurate sparse modeling assumptions, 2) extensive parameter tuning required for each clinical application, and 3) clinically infeasible reconstruction times. Recently, CS has been extended to incorporate deep neural networks as a way of learning complex image priors from historical exam data. Commonly referred to as unrolled neural networks, these techniques have proven to be a compelling and practical approach to address the challenges of sparse CS. In this tutorial, we review the classical CS formulation and outline steps needed to transform this formulation into a deep-learning-based reconstruction framework. Supplementary open-source code in Python is used to demonstrate this approach with open databases. Further, we discuss considerations in applying unrolled neural networks in the clinical setting.

Journal ArticleDOI
TL;DR: In this article, the authors propose a regularization by artifact-removal (RARE) algorithm that can leverage priors learned on datasets containing only undersampled measurements, making it applicable to problems where it is practically impossible to have fully sampled ground-truth data for training.
Abstract: Regularization by denoising (RED) is an image reconstruction framework that uses an image denoiser as a prior. Recent work has shown the state-of-the-art performance of RED with learned denoisers corresponding to pre-trained convolutional neural nets (CNNs). In this work, we propose to broaden the current denoiser-centric view of RED by considering priors corresponding to networks trained for more general artifact removal. The key benefit of the proposed family of algorithms, called regularization by artifact-removal (RARE), is that it can leverage priors learned on datasets containing only undersampled measurements. This makes RARE applicable to problems where it is practically impossible to have fully sampled ground-truth data for training. We validate RARE on both simulated and experimentally collected data by reconstructing free-breathing whole-body 3D MRIs into ten respiratory phases from heavily undersampled k-space measurements. Our results corroborate the potential of learning regularizers for iterative inversion directly on undersampled and noisy measurements.
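A RED-style iteration with an artifact-removal network in place of a denoiser conveys the idea; the fixed step sizes and the simple gradient update below are illustrative assumptions, not the authors' exact algorithm.

```python
import numpy as np

def rare(y, A, At, artifact_net, iters=100, gamma=0.5, tau=0.1):
    """RED-style iteration with an artifact-removal network as the prior
    (illustrative sketch of the RARE idea). The regularizer gradient is
    tau * (x - N(x)), where N was trained to remove undersampling artifacts."""
    x = At(y)                                    # zero-filled initialization
    for _ in range(iters):
        grad_data = At(A(x) - y)                 # data-consistency gradient
        grad_prior = tau * (x - artifact_net(x)) # RED-style prior gradient
        x = x - gamma * (grad_data + grad_prior)
    return x
```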

Journal ArticleDOI
TL;DR: The deep-learning-based CT reconstruction demonstrated a strong noise magnitude reduction compared to FBP while maintaining similar noise texture and high-contrast spatial resolution; however, the algorithm resulted in images with locally non-stationary noise in lung-textured backgrounds and somewhat degraded low-contrast spatial resolution, similar to what has been observed in currently available iterative reconstruction techniques.
Abstract: PURPOSE To characterize the noise and spatial resolution properties of a commercially available deep learning-based computed tomography (CT) reconstruction algorithm. METHODS Two phantom experiments were performed. The first used a multisized image quality phantom (Mercury v3.0, Duke University) imaged at five radiation dose levels (CTDIvol: 0.9, 1.2, 3.6, 7.0, and 22.3 mGy) with a fixed tube current technique on a commercial CT scanner (GE Revolution CT). Images were reconstructed with conventional (FBP), iterative (GE ASiR-V), and deep learning-based (GE True Fidelity) reconstruction algorithms. Noise power spectrum (NPS), high-contrast (air-polyethylene interface), and intermediate-contrast (water-polyethylene interface) task transfer functions (TTF) were measured for each dose level and phantom size and summarized in terms of average noise frequency (fav) and the frequency at which the TTF was reduced to 50% (f50%), respectively. The second experiment used a custom phantom with low-contrast rods and lung texture sections for the assessment of low-contrast TTF and noise spatial distribution. The phantom was imaged at five dose levels (CTDIvol: 1.0, 2.1, 3.0, 6.0, and 10.0 mGy) with 20 repeated scans at each dose, and images were reconstructed with the same reconstruction algorithms. The local noise stationarity was assessed by generating spatial noise maps from the ensemble of repeated images and computing a noise inhomogeneity index, η, following AAPM TG233 methods. All measurements were compared among the algorithms. RESULTS Compared to FBP, noise magnitude was reduced on average (± one standard deviation) by 74 ± 6% and 68 ± 4% for ASiR-V (at "100%" setting) and True Fidelity (at "High" setting), respectively. The noise texture from ASiR-V had substantially lower noise frequency content with 55 ± 4% lower NPS fav compared to FBP, while True Fidelity had only marginally different noise frequency content with 9 ± 5% lower NPS fav compared to FBP. Both ASiR-V and True Fidelity demonstrated locally nonstationary noise in a lung texture background at all radiation dose levels, with higher noise near high-contrast edges of vessels and lower noise in uniform regions. At the 1.0 mGy dose level, η values were 314% and 271% higher in ASiR-V and True Fidelity compared to FBP, respectively. High-contrast spatial resolution was similar between all algorithms for all dose levels and phantom sizes (<3% difference in TTF f50%). Compared to FBP, low-contrast spatial resolution was lower for ASiR-V and True Fidelity with a reduction of TTF f50% of up to 42% and 36%, respectively. CONCLUSIONS The deep learning-based CT reconstruction demonstrated a strong noise magnitude reduction compared to FBP while maintaining similar noise texture and high-contrast spatial resolution. However, the algorithm resulted in images with locally nonstationary noise in lung-textured backgrounds and somewhat degraded low-contrast spatial resolution, similar to what has been observed in currently available iterative reconstruction techniques.

Journal ArticleDOI
TL;DR: In this paper, a review of learned image reconstruction is presented, summarizing the current trends and explaining how these approaches fit within, and to some extent have arisen from, a framework that encompasses classical reconstruction methods.
Abstract: Biomedical photoacoustic tomography, which can provide high-resolution 3D soft tissue images based on optical absorption, has advanced to the stage at which translation from the laboratory to clinical settings is becoming possible. The need for rapid image formation and the practical restrictions on data acquisition that arise from the constraints of a clinical workflow are presenting new image reconstruction challenges. There are many classical approaches to image reconstruction, but ameliorating the effects of incomplete or imperfect data through the incorporation of accurate priors is challenging and leads to slow algorithms. Recently, the application of deep learning (DL), or deep neural networks, to this problem has received a great deal of attention. We review the literature on learned image reconstruction, summarizing the current trends and explaining how these approaches fit within, and to some extent have arisen from, a framework that encompasses classical reconstruction methods. In particular, we show how these techniques can be understood from a Bayesian perspective, providing useful insights. We also provide a concise tutorial demonstration of three prototypical approaches to learned image reconstruction. The code and data sets for these demonstrations are available to researchers. It is anticipated that it is in in vivo applications—where data may be sparse, fast imaging critical, and priors difficult to construct by hand—that DL will have the most impact. With this in mind, we conclude with some indications of possible future research directions.

Journal ArticleDOI
TL;DR: The Deep Image Fusion Network (DIF-Net) uses an unsupervised loss function based on the structure tensor representation of the multi-channel image contrasts, minimized by a stochastic deep learning solver with large-scale examples.
Abstract: Convolutional neural networks (CNNs) have facilitated substantial progress on various problems in computer vision and image processing. However, applying them to image fusion has remained challenging due to the lack of labelled data for supervised learning. This paper introduces a deep image fusion network (DIF-Net), an unsupervised deep learning framework for image fusion. The DIF-Net parameterizes the entire process of image fusion, comprising feature extraction, feature fusion, and image reconstruction, using a CNN. The purpose of DIF-Net is to generate an output image that has an identical contrast to the high-dimensional input images. To realize this, we propose an unsupervised loss function using the structure tensor representation of the multi-channel image contrasts. Different from traditional fusion methods that involve time-consuming optimization or iterative procedures to obtain the results, our loss function is minimized by a stochastic deep learning solver with large-scale examples. Consequently, the proposed method can produce fused images that preserve source image details through a single forward network trained without reference ground-truth labels. The proposed method has broad applicability to various image fusion problems, including multi-spectral, multi-focus, and multi-exposure image fusions. Quantitative and qualitative evaluations show that the proposed technique outperforms existing state-of-the-art approaches for various applications.
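The structure tensor underlying the loss aggregates per-channel gradient outer products, so a fused image can be encouraged to reproduce the combined contrast of all inputs. A NumPy sketch of the tensor and one plausible (assumed) form of the loss:

```python
import numpy as np

def structure_tensor(img):
    """2x2 structure tensor of a multi-channel image:
    J = sum_c grad(I_c) grad(I_c)^T. img: (C, H, W); returns (H, W, 2, 2)."""
    gy, gx = np.gradient(img, axis=(1, 2))            # per-channel gradients
    J = np.empty(img.shape[1:] + (2, 2))
    J[..., 0, 0] = (gx * gx).sum(axis=0)
    J[..., 0, 1] = J[..., 1, 0] = (gx * gy).sum(axis=0)
    J[..., 1, 1] = (gy * gy).sum(axis=0)
    return J

def contrast_loss(fused, inputs):
    """Unsupervised fusion loss (illustrative, not the paper's exact form):
    penalize the gap between the fused image's structure tensor and the
    tensor of the stacked multi-channel inputs."""
    return np.mean((structure_tensor(fused) - structure_tensor(inputs)) ** 2)
```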