
Showing papers in "arXiv: Image and Video Processing in 2018"


Journal Article
TL;DR: In this article, a semi-supervised deep learning approach was proposed to recover high-resolution (HR) CT images from low resolution (LR) counterparts by enforcing the cycle-consistency in terms of Wasserstein distance to establish a nonlinear end-to-end mapping from noisy LR input images to denoised and deblurred HR outputs.
Abstract: Computed tomography (CT) is widely used in screening, diagnosis, and image-guided therapy for both clinical and research purposes. Since CT involves ionizing radiation, an overarching thrust of related technical research is the development of novel methods enabling ultrahigh quality imaging with fine structural details while reducing the X-ray radiation. In this paper, we present a semi-supervised deep learning approach to accurately recover high-resolution (HR) CT images from low-resolution (LR) counterparts. Specifically, with the generative adversarial network (GAN) as the building block, we enforce the cycle-consistency in terms of the Wasserstein distance to establish a nonlinear end-to-end mapping from noisy LR input images to denoised and deblurred HR outputs. We also include the joint constraints in the loss function to facilitate structural preservation. In this deep imaging process, we incorporate deep convolutional neural network (CNN), residual learning, and network in network techniques for feature extraction and restoration. In contrast to the current trend of increasing network depth and complexity to boost the CT imaging performance, which limits its real-world applications by imposing considerable computational and memory overheads, we apply a parallel $1\times1$ CNN to compress the output of the hidden layer and optimize the number of layers and the number of filters for each convolutional layer. Quantitative and qualitative evaluations demonstrate that our proposed model is accurate, efficient and robust for super-resolution (SR) image restoration from noisy LR input images. In particular, we validate our composite SR networks on three large-scale CT datasets, and obtain promising results as compared to the other state-of-the-art methods.

242 citations


Journal Article
TL;DR: This work develops a convolutional neural network (CNN) that is able to learn the statistical information contained in the speckle intensity patterns captured on a set of diffusers having the same macroscopic parameter.
Abstract: Imaging through scattering is an important, yet challenging problem. Tremendous progress has been made by exploiting the deterministic input-output "transmission matrix" for a fixed medium. However, this "one-to-one" mapping is highly susceptible to speckle decorrelations: small perturbations to the scattering medium lead to model errors and severe degradation of the imaging performance. Our goal here is to develop a new framework that is highly scalable to both medium perturbations and measurement requirements. To do so, we propose a statistical "one-to-all" deep learning technique that encapsulates a wide range of statistical variations for the model to be resilient to speckle decorrelations. Specifically, we develop a convolutional neural network (CNN) that is able to learn the statistical information contained in the speckle intensity patterns captured on a set of diffusers having the same macroscopic parameter. We then show for the first time, to the best of our knowledge, that the trained CNN is able to generalize and make high-quality object predictions through an entirely different set of diffusers of the same class. Our work paves the way to a highly scalable deep learning approach for imaging through scattering media.

225 citations


Posted Content
TL;DR: In this paper, an end-to-end trainable model for image compression based on variational autoencoders is proposed, which incorporates a hyperprior to effectively capture spatial dependencies in the latent representation.
Abstract: We describe an end-to-end trainable model for image compression based on variational autoencoders. The model incorporates a hyperprior to effectively capture spatial dependencies in the latent representation. This hyperprior relates to side information, a concept universal to virtually all modern image codecs, but largely unexplored in image compression using artificial neural networks (ANNs). Unlike existing autoencoder compression methods, our model trains a complex prior jointly with the underlying autoencoder. We demonstrate that this model leads to state-of-the-art image compression when measuring visual quality using the popular MS-SSIM index, and yields rate-distortion performance surpassing published ANN-based methods when evaluated using a more traditional metric based on squared error (PSNR). Furthermore, we provide a qualitative comparison of models trained for different distortion metrics.

126 citations


Journal Article
TL;DR: In this article, the authors demonstrate that deep neural networks can be trained to perform fringe analysis, which substantially enhances the accuracy of phase demodulation from a single fringe pattern, using carrier fringe patterns under the scenario of fringe projection profilometry.
Abstract: In many optical metrology techniques, fringe pattern analysis is the central algorithm for recovering the underlying phase distribution from the recorded fringe patterns. Despite extensive research efforts for decades, how to extract the desired phase information, with the highest possible accuracy, from the minimum number of fringe patterns remains one of the most challenging open problems. Inspired by recent successes of deep learning techniques for computer vision and other applications, here, we demonstrate for the first time, to our knowledge, that deep neural networks can be trained to perform fringe analysis, which substantially enhances the accuracy of phase demodulation from a single fringe pattern. The effectiveness of the proposed method is experimentally verified using carrier fringe patterns under the scenario of fringe projection profilometry. Experimental results demonstrate its superior performance in terms of high accuracy and edge preservation over two representative single-frame techniques: Fourier transform profilometry and windowed Fourier profilometry.

115 citations


Journal Article
TL;DR: In this article, a deep neural network is used to transform quantitative phase images (QPI) of label-free tissue sections into images that are equivalent to brightfield microscopy images of the same samples that are histochemically stained.
Abstract: Using a deep neural network, we demonstrate a digital staining technique, which we term PhaseStain, to transform quantitative phase images (QPI) of label-free tissue sections into images that are equivalent to brightfield microscopy images of the same samples that are histochemically stained. Through pairs of image data (QPI and the corresponding brightfield images, acquired after staining) we train a generative adversarial network (GAN) and demonstrate the effectiveness of this virtual staining approach using sections of human skin, kidney and liver tissue, matching the brightfield microscopy images of the same samples stained with Hematoxylin and Eosin, Jones' stain, and Masson's trichrome stain, respectively. This digital staining framework might further strengthen various uses of label-free QPI techniques in pathology applications and biomedical research in general, by eliminating the need for chemical staining, reducing sample-preparation costs and saving time. Our results provide a powerful example of some of the unique opportunities created by data-driven image transformations enabled by deep learning.

103 citations


Journal Article
TL;DR: This paper has constructed a large-scale video quality assessment database containing 585 videos of unique content, captured by a large number of users, with wide ranges of levels of complex, authentic distortions, and demonstrates the value of the new resource, which is called the live video quality challenge database (LIVE-VQC), by conducting a comparison with leading NR video quality predictors on it.
Abstract: The great variations of videographic skills, camera designs, compression and processing protocols, and displays lead to an enormous variety of video impairments. Current no-reference (NR) video quality models are unable to handle this diversity of distortions. This is true in part because available video quality assessment databases contain very limited content, fixed resolutions, were captured using a small number of camera devices by a few videographers and have been subjected to a modest number of distortions. As such, these databases fail to adequately represent real world videos, which contain very different kinds of content obtained under highly diverse imaging conditions and are subject to authentic, often commingled distortions that are impossible to simulate. As a result, NR video quality predictors tested on real-world video data often perform poorly. Towards advancing NR video quality prediction, we constructed a large-scale video quality assessment database containing 585 videos of unique content, captured by a large number of users, with wide ranges of levels of complex, authentic distortions. We collected a large number of subjective video quality scores via crowdsourcing. A total of 4776 unique participants took part in the study, yielding more than 205000 opinion scores, resulting in an average of 240 recorded human opinions per video. We demonstrate the value of the new resource, which we call the LIVE Video Quality Challenge Database (LIVE-VQC), by conducting a comparison of leading NR video quality predictors on it. This study is the largest video quality assessment study ever conducted along several key dimensions: number of unique contents, capture devices, distortion types and combinations of distortions, study participants, and recorded subjective scores. The database is available for download on this link: this http URL .

97 citations


Posted Content
Fabian Mentzer, Eirikur Agustsson, Michael Tschannen, Radu Timofte, Luc Van Gool
TL;DR: The first practical learned lossless image compression system, L3C, is proposed and it outperforms the popular engineered codecs, PNG, WebP and JPEG 2000, and finds that learning the auxiliary representation is crucial and outperforms predefined auxiliary representations such as an RGB pyramid significantly.
Abstract: We propose the first practical learned lossless image compression system, L3C, and show that it outperforms the popular engineered codecs, PNG, WebP and JPEG 2000. At the core of our method is a fully parallelizable hierarchical probabilistic model for adaptive entropy coding which is optimized end-to-end for the compression task. In contrast to recent autoregressive discrete probabilistic models such as PixelCNN, our method i) models the image distribution jointly with learned auxiliary representations instead of exclusively modeling the image distribution in RGB space, and ii) only requires three forward-passes to predict all pixel probabilities instead of one for each pixel. As a result, L3C obtains over two orders of magnitude speedups when sampling compared to the fastest PixelCNN variant (Multiscale-PixelCNN). Furthermore, we find that learning the auxiliary representation is crucial and outperforms predefined auxiliary representations such as an RGB pyramid significantly.

96 citations


Journal Article
TL;DR: In this article, a nonlocal sparsity reinforced deep convolutional neural network denoising (NN3D) is proposed, which is a combination of a local multiscale denoiser and a non-local filter.
Abstract: We introduce a paradigm for nonlocal sparsity reinforced deep convolutional neural network denoising. It is a combination of a local multiscale denoising by a convolutional neural network (CNN) based denoiser and a nonlocal denoising based on a nonlocal filter (NLF) exploiting the mutual similarities between groups of patches. CNN models are leveraged with noise levels that progressively decrease at every iteration of our framework, while their output is regularized by a nonlocal prior implicit within the NLF. Unlike complicated neural networks that embed the nonlocality prior within the layers of the network, our framework is modular: it uses standard pre-trained CNNs together with standard nonlocal filters. An instance of the proposed framework, called NN3D, is evaluated over large grayscale image datasets showing state-of-the-art performance.

83 citations


Posted Content
TL;DR: The objective of this paper is to study the relationship between 3D quality and bitrate at different frame rates and show that increasing the frame rate of 3D videos beyond 60 fps may not be visually distinguishable.
Abstract: Increasing the frame rate of a 3D video generally results in improved Quality of Experience (QoE). However, higher frame rates involve a higher degree of complexity in capturing, transmission, storage, and display. The question that arises here is what frame rate guarantees high viewing quality of experience given the existing/required 3D devices and technologies (3D cameras, 3D TVs, compression, transmission bandwidth, and storage capacity). This question has already been addressed for the case of 2D video, but not for 3D. The objective of this paper is to study the relationship between 3D quality and bitrate at different frame rates. Our performance evaluations show that increasing the frame rate of 3D videos beyond 60 fps may not be visually distinguishable. In addition, our experiments show that when the available bandwidth is reduced, the highest possible 3D quality of experience can be achieved by adjusting (decreasing) the frame rate instead of increasing the compression ratio. The results of our study are of particular interest to network providers for rate adaptation in variable bitrate channels.

81 citations


Journal Article
TL;DR: This diagnostic study describes a novel attention-based deep neural network framework for classifying microscopy images to identify Barrett esophagus and esophageal adenocarcinoma.
Abstract: Deep learning-based methods, such as the sliding window approach for cropped-image classification and heuristic aggregation for whole-slide inference, for analyzing histological patterns in high-resolution microscopy images have shown promising results. These approaches, however, require a laborious annotation process and are fragmented. This diagnostic study collected deidentified high-resolution histological images (N = 379) for training a new model composed of a convolutional neural network and a grid-based attention network, trainable without region-of-interest annotations. Histological images of patients who underwent endoscopic esophagus and gastroesophageal junction mucosal biopsy between January 1, 2016, and December 31, 2018, at Dartmouth-Hitchcock Medical Center (Lebanon, New Hampshire) were collected. The method achieved a mean accuracy of 0.83 in classifying 123 test images. These results were comparable with or better than the performance from the current state-of-the-art sliding window approach, which was trained with regions of interest. Results of this study suggest that the proposed attention-based deep neural network framework for Barrett esophagus and esophageal adenocarcinoma detection is important because it is based solely on tissue-level annotations, unlike existing methods that are based on regions of interest. This new model is expected to open avenues for applying deep learning to digital pathology.

78 citations


Posted Content
TL;DR: In this article, a patch-based singular value shrinkage method for diffusion magnetic resonance image estimation targeted at low signal to noise ratio and accelerated acquisitions is proposed, where asymptotically optimal signal recovery guarantees can be attained by modeling the noise propagation in the reconstruction and subsequently simulating or calculating the limit singular value spectrum.
Abstract: We propose a patch-based singular value shrinkage method for diffusion magnetic resonance image estimation targeted at low signal to noise ratio and accelerated acquisitions. It operates on the complex data resulting from a sensitivity encoding reconstruction, where asymptotically optimal signal recovery guarantees can be attained by modeling the noise propagation in the reconstruction and subsequently simulating or calculating the limit singular value spectrum. Simple strategies are presented to deal with phase inconsistencies and optimize patch construction. The pertinence of our contributions is quantitatively validated on synthetic data, an in vivo adult example, and challenging neonatal and fetal cohorts. Our methodology is compared with related approaches, which generally operate on magnitude-only data and use data-based noise level estimation and singular value truncation. Visual examples are provided to illustrate effectiveness in generating denoised and debiased diffusion estimates with well preserved spatial and diffusion detail.

Journal Article
TL;DR: DeepISP is a full end-to-end deep neural model of the camera image signal processing (ISP) pipeline that learns a mapping from the raw low-light mosaiced image to the final visually compelling image and encompasses low-level tasks such as demosaicing and denoising as well as higher-level tasks such as color correction and image adjustment.
Abstract: We present DeepISP, a full end-to-end deep neural model of the camera image signal processing (ISP) pipeline. Our model learns a mapping from the raw low-light mosaiced image to the final visually compelling image and encompasses low-level tasks such as demosaicing and denoising as well as higher-level tasks such as color correction and image adjustment. The training and evaluation of the pipeline were performed on a dedicated dataset containing pairs of low-light and well-lit images captured by a Samsung S7 smartphone camera in both raw and processed JPEG formats. The proposed solution achieves state-of-the-art performance in objective evaluation of PSNR on the subtask of joint denoising and demosaicing. For the full end-to-end pipeline, it achieves better visual quality compared to the manufacturer ISP, in both a subjective human assessment and when rated by a deep model trained for assessing image quality.

Posted Content
TL;DR: The LIVE-NFLX-II database is designed, a highly-realistic database which contains subjective QoE responses to various design dimensions, such as bitrate adaptation algorithms, network conditions and video content, and builds on recent advancements in content-adaptive encoding.
Abstract: Measuring Quality of Experience (QoE) and integrating these measurements into video streaming algorithms is a multi-faceted problem that fundamentally requires the design of comprehensive subjective QoE databases and metrics. To achieve this goal, we have recently designed the LIVE-NFLX-II database, a highly-realistic database which contains subjective QoE responses to various design dimensions, such as bitrate adaptation algorithms, network conditions and video content. Our database builds on recent advancements in content-adaptive encoding and incorporates actual network traces to capture realistic network variations on the client device. Using our database, we study the effects of multiple streaming dimensions on user experience and evaluate video quality and quality of experience models. We believe that the tools introduced here will help inspire further progress on the development of perceptually-optimized client adaptation and video streaming strategies. The database is publicly available at this http URL.

Journal Article
TL;DR: In this paper, a phase-control technique was introduced to characterize complex media based on the use of fast 1D spatial light modulators and a 1D-to-2D transformation performed by the same medium being analyzed.
Abstract: Controlling the propagation and interaction of light in complex media has sparked major interest in the last few years. Unfortunately, spatial light modulation devices suffer from limited speed that precludes real-time applications such as imaging in live tissue. To address this critical problem we introduce a phase-control technique to characterize complex media based on the use of fast 1D spatial light modulators and a 1D-to-2D transformation performed by the same medium being analyzed. We implement the concept using a micro-electro-mechanical grating light valve (GLV) with 1088 degrees of freedom modulated at 350 kHz, enabling unprecedented high-speed wavefront measurements. We continuously measure the transmission matrix, calculate the optimal wavefront and project a focus through various dynamic scattering samples in real time, all within 2.4 ms per cycle. These results improve prior wavefront shaping modulation speed by more than an order of magnitude and open new opportunities for optical processing using 1D-to-2D transformations.

Posted Content
TL;DR: This paper overviews recent graph spectral techniques in GSP specifically for image/video processing, including image compression, image restoration, image filtering, and image segmentation.
Abstract: The recent advent of graph signal processing (GSP) has spurred intensive studies of signals that live naturally on irregular data kernels described by graphs (e.g., social networks, wireless sensor networks). Though a digital image contains pixels that reside on a regularly sampled 2D grid, if one can design an appropriate underlying graph connecting pixels with weights that reflect the image structure, then one can interpret the image (or image patch) as a signal on a graph, and apply GSP tools for processing and analysis of the signal in the graph spectral domain. In this article, we overview recent graph spectral techniques in GSP specifically for image/video processing. The topics covered include image compression, image restoration, image filtering and image segmentation.

Journal Article
TL;DR: An umbrella under which the various research activities in the field are broadly probed and taxonomized is offered, and a taxonomy for the various detection methods is proposed, enabling an objective design and implementation for target detection in SAR imagery.
Abstract: Target detection is the front-end stage in any automatic target recognition system for synthetic aperture radar (SAR) imagery (SAR-ATR). The efficacy of the detector directly impacts the succeeding stages in the SAR-ATR processing chain. There are numerous methods reported in the literature for implementing the detector. We offer an umbrella under which the various research activities in the field are broadly probed and taxonomized. First, a taxonomy for the various detection methods is proposed. Second, the underlying assumptions for different implementation strategies are overviewed. Third, a tabular comparison between carefully selected representative examples is introduced. Finally, a novel discussion is presented, wherein the issues covered include suitability of SAR data models, understanding the multiplicative SAR data models, and two unique perspectives on constant false alarm rate (CFAR) detection: signal processing and pattern recognition. From a signal processing perspective, CFAR is shown to be a finite impulse response band-pass filter. From a statistical pattern recognition perspective, CFAR is shown to be a suboptimal one-class classifier: a Euclidean distance classifier and a quadratic discriminant with a missing term for one-parameter and two-parameter CFAR, respectively. We make a contribution toward enabling an objective design and implementation for target detection in SAR imagery.
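To make the CFAR idea concrete, here is a minimal sketch of a one-dimensional cell-averaging CFAR detector, a common one-parameter (mean-only) variant. It is not taken from the survey: the function name `ca_cfar`, the window sizes, and the threshold multiplier `scale` are all illustrative assumptions.

```python
def ca_cfar(signal, num_train=4, num_guard=1, scale=3.0):
    """Return indices flagged by a 1D cell-averaging CFAR (illustrative sketch).

    For each cell under test, the clutter level is estimated as the mean of
    `num_train` training cells on each side, skipping `num_guard` guard cells.
    All default parameter values are assumptions chosen for the example.
    """
    detections = []
    half = num_train + num_guard
    # Cells closer than `half` to either end lack a full window and are skipped.
    for i in range(half, len(signal) - half):
        left = signal[i - half : i - num_guard]
        right = signal[i + num_guard + 1 : i + half + 1]
        clutter = (sum(left) + sum(right)) / (len(left) + len(right))
        # Adaptive threshold: a fixed multiple of the local clutter estimate.
        if signal[i] > scale * clutter:
            detections.append(i)
    return detections


# Hypothetical data: flat clutter with one bright cell at index 10.
samples = [1.0] * 21
samples[10] = 10.0
print(ca_cfar(samples))
```

Because the threshold scales with the local clutter estimate, the false alarm rate stays roughly constant when the background level changes, which is the property the name refers to.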

Posted Content
TL;DR: This review systematically presents various unsupervised deep learning models, tools, and benchmark datasets applied to medical image analysis and discusses autoencoders and their variants, restricted Boltzmann machines (RBM), deep belief networks (DBN), deep Boltzmann machines (DBM), and generative adversarial networks (GAN).
Abstract: Interpretation of medical images for diagnosis and treatment of complex disease from high-dimensional and heterogeneous data remains a key challenge in transforming healthcare. In the last few years, both supervised and unsupervised deep learning achieved promising results in the area of medical imaging and image analysis. Unlike supervised learning, which is biased by how it is supervised and by the manual effort needed to create class labels for the algorithm, unsupervised learning derives insights directly from the data itself, groups the data and helps to make data-driven decisions without any external bias. This review systematically presents various unsupervised models applied to medical image analysis, including autoencoders and their several variants, restricted Boltzmann machines, deep belief networks, deep Boltzmann machines and generative adversarial networks. Future research opportunities and challenges of unsupervised techniques for medical image analysis are also discussed.

Posted Content
Johannes Ballé
TL;DR: Together, the Sadam and GDN techniques stabilize the training procedure of nonlinear image transforms and increase their capacity to approximate the (unknown) rate-distortion optimal transform functions.
Abstract: We assess the performance of two techniques in the context of nonlinear transform coding with artificial neural networks, Sadam and GDN. Both techniques have been successfully used in state-of-the-art image compression methods, but their performance has not been individually assessed to this point. Together, the techniques stabilize the training procedure of nonlinear image transforms and increase their capacity to approximate the (unknown) rate-distortion optimal transform functions. Besides comparing their performance to established alternatives, we detail the implementation of both methods and provide open-source code along with the paper.

Journal Article
TL;DR: This paper proposes a new penalty, which simultaneously enforces group and within-group sparsity at the cost of being nonconvex, and shows on simulated and real datasets that well-chosen penalties can significantly improve the unmixing performance compared to classical sparse regression techniques or to the naive bundle approach.
Abstract: Hyperspectral images provide much more information than conventional imaging techniques, allowing a precise identification of the materials in the observed scene, but because of the limited spatial resolution, the observations are usually mixtures of the contributions of several materials. The spectral unmixing problem aims at recovering the spectra of the pure materials of the scene (endmembers), along with their proportions (abundances) in each pixel. In order to deal with the intra-class variability of the materials and the induced spectral variability of the endmembers, several spectra per material, constituting endmember bundles, can be considered. However, the usual abundance estimation techniques do not take advantage of the particular structure of these bundles, organized into groups of spectra. In this paper, we propose to use group sparsity by introducing mixed norms in the abundance estimation optimization problem. In particular, we propose a new penalty which simultaneously enforces group and within-group sparsity, at the cost of being nonconvex. All the proposed penalties are compatible with the abundance sum-to-one constraint, which is not the case with traditional sparse regression. We show on simulated and real datasets that well-chosen penalties can significantly improve the unmixing performance compared to the naive bundle approach.
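As a rough illustration of what a mixed norm over endmember bundles looks like, the sketch below computes a generic $\ell_{p,q}$ norm: an $\ell_p$ norm inside each group of abundances, then an $\ell_q$ norm across the group values. The abstract does not give the exact penalties, so the function name `mixed_norm`, the choice $p=2$, $q=1$ (the familiar group-lasso case), and the sample numbers are all assumptions.

```python
def mixed_norm(abundances, groups, p=2.0, q=1.0):
    """Generic l_{p,q} mixed norm (illustrative, not the paper's exact penalty).

    `groups` is a list of index lists, one per material bundle. The l_p norm
    is taken within each group and the l_q norm across the group norms.
    """
    group_norms = []
    for group in groups:
        inner = sum(abs(abundances[i]) ** p for i in group)
        group_norms.append(inner ** (1.0 / p))
    return sum(g ** q for g in group_norms) ** (1.0 / q)


# Hypothetical pixel: five candidate spectra in three bundles; the second
# bundle is entirely inactive, which is what group sparsity encourages.
abundances = [0.3, 0.4, 0.0, 0.0, 0.3]
groups = [[0, 1], [2, 3], [4]]
print(mixed_norm(abundances, groups))
```

With $q=1$ the penalty grows with the number of active groups, so minimizing it tends to switch off whole bundles rather than individual spectra.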

Posted Content
TL;DR: Wang et al. propose a 3D convolutional neural network (3D CNN) based model to automatically segment gliomas, which achieves Dice scores of 0.72, 0.83 and 0.81 for the complete tumor, tumor core and enhancing tumor, respectively.
Abstract: Glioma is one of the most common and aggressive types of primary brain tumors. The accurate segmentation of subcortical brain structures is crucial to the study of gliomas in that it helps the monitoring of the progression of gliomas and aids the evaluation of treatment outcomes. However, the large amount of required human labor makes it difficult to obtain manually segmented Magnetic Resonance Imaging (MRI) data, limiting the use of precise quantitative measurements in clinical practice. In this work, we try to address this problem by developing a 3D Convolutional Neural Network (3D CNN) based model to automatically segment gliomas. The major difficulty of our segmentation model comes from the fact that the location, structure, and shape of gliomas vary significantly among different patients. In order to accurately classify each voxel, our model captures multi-scale contextual information by extracting features from two scales of receptive fields. To fully exploit the tumor structure, we propose a novel architecture that hierarchically segments different lesion regions of the necrotic and non-enhancing tumor (NCR/NET), peritumoral edema (ED) and GD-enhancing tumor (ET). Additionally, we utilize densely connected convolutional blocks to further boost the performance. We train our model with a patch-wise training schema to mitigate the class imbalance problem. The proposed method is validated on the BraTS 2017 dataset and it achieves Dice scores of 0.72, 0.83 and 0.81 for the complete tumor, tumor core and enhancing tumor, respectively. These results are comparable to the reported state-of-the-art results, and our method is better than existing 3D-based methods in terms of compactness, time and space efficiency.
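For readers unfamiliar with the metric, the Dice score between a predicted mask $A$ and a reference mask $B$ is $2|A \cap B| / (|A| + |B|)$, ranging from 0 (no overlap) to 1 (identical masks). The sketch below computes it for flattened binary masks; the function name `dice_score`, the toy masks, and the convention for two empty masks are assumptions, not details from the paper.

```python
def dice_score(pred, truth):
    """Dice coefficient 2|A n B| / (|A| + |B|) for two flat binary masks."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Convention (an assumption): two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Hypothetical six-voxel masks: 1 marks a tumor voxel.
pred = [0, 1, 1, 1, 0, 0]
truth = [0, 1, 1, 0, 1, 0]
print(dice_score(pred, truth))
```

In a multi-region setting such as the one described above, the score would be computed separately for each region (for example complete tumor, tumor core, enhancing tumor) by binarizing the label map once per region.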

Posted Content
TL;DR: The correlation between the subjective and objective results confirms that the VIF quality metric outperforms all the other tested metrics in the presence of the tested types of distortions.
Abstract: While there exists a wide variety of Low Dynamic Range (LDR) quality metrics, only a limited number of metrics are designed specifically for High Dynamic Range (HDR) content. With the introduction of the HDR video compression standardization effort by international standardization bodies, the need for an efficient video quality metric for HDR applications has become more pronounced. The objective of this study is to compare the performance of the existing full-reference LDR and HDR video quality metrics on HDR content and identify the most effective one for HDR applications. To this end, a new HDR video dataset is created, which consists of representative indoor and outdoor video sequences with different brightness and motion levels and different representative types of distortions. The quality of each distorted video in this dataset is evaluated both subjectively and objectively. The correlation between the subjective and objective results confirms that the VIF quality metric outperforms all the other tested metrics in the presence of the tested types of distortions.
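A quality metric is usually judged by how well its scores correlate with subjective ratings. The abstract does not say which correlation statistic was used; the sketch below assumes Spearman rank correlation, a common choice, and the function names and sample scores are hypothetical.

```python
def ranks(values):
    """Average 1-based ranks, so tied values share the same rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2.0 + 1.0
        for k in range(pos, end + 1):
            result[order[k]] = avg
        pos = end + 1
    return result


def pearson(x, y):
    """Pearson linear correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical mean opinion scores and metric outputs for five videos.
subjective = [4.5, 3.9, 2.1, 3.0, 1.2]
objective = [0.93, 0.88, 0.55, 0.61, 0.30]
print(spearman(subjective, objective))
```

Rank correlation only asks whether the metric orders the videos the same way viewers do, so it is insensitive to any monotonic rescaling of the metric's output.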

Journal Article
TL;DR: In this article, a human-in-the-loop strategy was proposed to reduce the burden of WSI annotation and display of neural network predictions on WSIs, and the network performance was evaluated for the segmentation of renal micro compartments.
Abstract: Neural networks promise to bring robust, quantitative analysis to medical fields, but adoption is limited by the technicalities of training these networks. To address this translation gap between medical researchers and neural networks in the field of pathology, we have created an intuitive interface which utilizes the commonly used whole slide image (WSI) viewer, Aperio ImageScope (Leica Biosystems Imaging, Inc.), for the annotation and display of neural network predictions on WSIs. Leveraging this, we propose the use of a human-in-the-loop strategy to reduce the burden of WSI annotation. We track network performance improvements as a function of iteration and quantify the use of this pipeline for the segmentation of renal histologic findings on WSIs. More specifically, we present network performance when applied to segmentation of renal micro compartments, and demonstrate multi-class segmentation in human and mouse renal tissue slides. Finally, to show the adaptability of this technique to other medical imaging fields, we demonstrate its ability to iteratively segment human prostate glands from radiology imaging data.

Posted Content
TL;DR: This paper shows that comparable performances can be obtained with a unique learned transform in the case of autoencoders, and saves a lot of training time.
Abstract: This paper explores the problem of learning transforms for image compression via autoencoders. Usually, the rate-distortion performances of image compression are tuned by varying the quantization step size. In the case of autoencoders, this in principle would require learning one transform per rate-distortion point at a given quantization step size. Here, we show that comparable performances can be obtained with a unique learned transform. The different rate-distortion points are then reached by varying the quantization step size at test time. This approach saves a lot of training time.
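The idea of reaching different rate-distortion points by changing only the quantization step size can be illustrated with a rough sketch. It assumes plain uniform scalar quantization applied to a fixed list of transform coefficients. The coefficient values, the function names, and the use of empirical entropy as a stand-in for rate are all assumptions made for illustration; no autoencoder is trained here, and this is not the paper's pipeline.

```python
import math
from collections import Counter


def quantize(coeffs, step):
    """Uniform scalar quantization: map each coefficient to an integer index."""
    return [round(c / step) for c in coeffs]


def dequantize(indices, step):
    """Reconstruct coefficients from their quantization indices."""
    return [q * step for q in indices]


def empirical_entropy_bits(indices):
    """Empirical entropy of the indices in bits per symbol (a crude rate proxy)."""
    n = len(indices)
    counts = Counter(indices)
    return -sum((k / n) * math.log2(k / n) for k in counts.values())


def mse(a, b):
    """Mean squared error between two equal-length lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


# Hypothetical coefficients standing in for the output of one fixed transform.
coeffs = [0.12, -0.37, 0.81, 1.46, -1.93, 0.04, 2.24, -0.66]

# The same coefficients are reused; only the step size changes at test time.
for step in (0.1, 1.0):
    indices = quantize(coeffs, step)
    recon = dequantize(indices, step)
    print(step, empirical_entropy_bits(indices), mse(coeffs, recon))
```

With these made-up coefficients, the larger step gives fewer distinct symbols (a lower rate proxy) and a higher reconstruction error, which is the trade-off the step size controls.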

Book ChapterDOI
TL;DR: This study proposes a novel elastography technique which uses a deep Convolutional Neural Network to get a coarse but robust time-delay estimation between two ultrasound images and calculates the finer displacement by exploiting the information of all the samples of RF data simultaneously.
Abstract: Displacement estimation is very important in ultrasound elastography, and failing to estimate displacement correctly results in failure to generate strain images. As conventional ultrasound elastography techniques suffer from decorrelation noise, they are prone to fail in estimating displacement between echo signals obtained during tissue distortions. This study proposes a novel elastography technique which addresses the decorrelation in estimating the displacement field. We call our method GLUENet (GLobal Ultrasound Elastography Network), which uses a deep Convolutional Neural Network (CNN) to get a coarse time-delay estimation between two ultrasound images. This displacement is later used for formulating a nonlinear cost function which incorporates similarity of RF data intensity and prior information of estimated displacement. By optimizing this cost function, we calculate the finer displacement by exploiting the information of all the samples of RF data simultaneously. The Contrast to Noise Ratio (CNR) and Signal to Noise Ratio (SNR) of the strain images from our technique are very close to those of strain images from GLUE. While most elastography algorithms are sensitive to parameter tuning, our robust algorithm is substantially less sensitive to parameter tuning.

Journal ArticleDOI
TL;DR: In this paper, the 3D structure to be recovered is modelled as a marked point process, and reversible jump Markov chain Monte Carlo (RJ-MCMC) moves are used to sample the posterior distribution of interest, which allows multiple surfaces to be identified in each pixel.
Abstract: Light detection and ranging (Lidar) data can be used to capture the depth and intensity profile of a 3D scene. This modality relies on constructing, for each pixel, a histogram of time delays between emitted light pulses and detected photon arrivals. In a general setting, more than one surface can be observed in a single pixel. The problem of estimating the number of surfaces, their reflectivity and position becomes very challenging in the low-photon regime (which equates to short acquisition times) or relatively high background levels (i.e., strong ambient illumination). This paper presents a new approach to 3D reconstruction using single-photon, single-wavelength Lidar data, which is capable of identifying multiple surfaces in each pixel. Adopting a Bayesian approach, the 3D structure to be recovered is modelled as a marked point process and reversible jump Markov chain Monte Carlo (RJ-MCMC) moves are proposed to sample the posterior distribution of interest. In order to promote spatial correlation between points belonging to the same surface, we propose a prior that combines an area interaction process and a Strauss process. New RJ-MCMC dilation and erosion updates are presented to achieve an efficient exploration of the configuration space. To further reduce the computational load, we adopt a multiresolution approach, processing the data from a coarse to the finest scale. The experiments performed with synthetic and real data show that the algorithm obtains better reconstructions than other recently published optimization algorithms, with shorter execution times.

Journal ArticleDOI
TL;DR: In this paper, a linear forward model is proposed to derive slice-wise phase and absorption transfer functions using angled illumination, which facilitates flexible and efficient data acquisition, enabling arbitrary sampling of the illumination angles.
Abstract: We demonstrate a motion-free intensity diffraction tomography technique that enables direct inversion of 3D phase and absorption from intensity-only measurements for weakly scattering samples. We derive a novel linear forward model, featuring slice-wise phase and absorption transfer functions using angled illumination. This new framework facilitates flexible and efficient data acquisition, enabling arbitrary sampling of the illumination angles. The reconstruction algorithm performs 3D synthetic aperture using a robust, computation- and memory-efficient slice-wise deconvolution to achieve resolution up to the incoherent limit. We demonstrate our technique with thick biological samples having both sparse 3D structures and dense cell clusters. We further investigate the limitations of our technique when imaging strongly scattering samples. Imaging performance and the influence of multiple scattering are evaluated using a 3D sample consisting of stacked phase and absorption resolution targets. This computational microscopy system is directly built on a standard commercial microscope with a simple LED array source add-on, and promises broad applications by leveraging the ubiquitous microscopy platforms with minimal hardware modifications.

Posted Content
TL;DR: This paper presents a new algorithm, Accelerated Wirtinger Flow (AWF), for ptychographic image reconstruction from phaseless diffraction pattern measurements, based on combining Nesterov's acceleration approach with Wirtinger gradient descent.
Abstract: This paper presents a new algorithm, Accelerated Wirtinger Flow (AWF), for ptychographic image reconstruction from phaseless diffraction pattern measurements. AWF is based on combining Nesterov's acceleration approach with Wirtinger gradient descent. Theoretical results enable prespecification of all AWF algorithm parameters, with no need for computationally-expensive line searches and no need for manual parameter tuning. AWF is evaluated in the context of simulated X-ray ptychography, where we demonstrate fast convergence and low per-iteration computational complexity. We also show examples where AWF reaches higher image quality with less computation than classical algorithms. AWF is also shown to have robustness to noise and probe misalignment.

Journal ArticleDOI
TL;DR: In this article, a deep neural network (DNN) is proposed to simultaneously detect the lateral and axial positions and the sizes of particles directly from an in-line hologram, and its performance is numerically investigated in terms of the errors in the detected positions and sizes.
Abstract: This paper proposes a particle volume reconstruction directly from an in-line hologram using a deep neural network. Digital holographic volume reconstruction conventionally uses multiple diffraction calculations to obtain sectional reconstructed images from an in-line hologram, followed by detection of the lateral and axial positions, and the sizes of particles by using focus metrics. However, the axial resolution is limited by the numerical aperture of the optical system, and the processes are time-consuming. The method proposed here can simultaneously detect the lateral and axial positions, and the particle sizes via a deep neural network (DNN). We numerically investigated the performance of the DNN in terms of the errors in the detected positions and sizes. The calculation is faster than conventional diffraction-based approaches.

Posted Content
TL;DR: This paper restricts the denoisers to the class of graph filters under a linearity assumption, or more specifically the symmetric smoothing filters, and introduces a new analysis technique via the concept of consensus equilibrium to provide interpretations to problems involving multiple priors.
Abstract: The Plug-and-Play (PnP) ADMM algorithm is a powerful image restoration framework that allows advanced image denoising priors to be integrated into physical forward models to generate high quality image restoration results. However, despite the enormous number of applications and several theoretical studies trying to prove the convergence by leveraging tools in convex analysis, very little is known about why the algorithm is doing so well. The goal of this paper is to fill the gap by discussing the performance of PnP ADMM. By restricting the denoisers to the class of graph filters under a linearity assumption, or more specifically the symmetric smoothing filters, we offer three contributions: (1) We show conditions under which an equivalent maximum-a-posteriori (MAP) optimization exists, (2) we present a geometric interpretation and show that the performance gain is due to an intrinsic pre-denoising characteristic of the PnP prior, (3) we introduce a new analysis technique via the concept of consensus equilibrium, and provide interpretations to problems involving multiple priors.
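To make the structure of a PnP ADMM iteration concrete, here is a minimal sketch under strong simplifying assumptions: the forward model is the identity (pure 1D denoising), and the plugged-in denoiser is a fixed 3-tap circular filter [1/4, 1/2, 1/4], chosen only because it is linear and symmetric. The function names, the signal values, and the parameter `rho` are hypothetical; this is an illustration of the generic scheme, not the paper's analysis or experiments.

```python
def smooth(z):
    """Symmetric 3-tap smoothing filter [1/4, 1/2, 1/4] with circular boundary."""
    n = len(z)
    return [0.25 * z[i - 1] + 0.5 * z[i] + 0.25 * z[(i + 1) % n] for i in range(n)]


def pnp_admm_denoise(y, rho=1.0, iters=100, denoiser=smooth):
    """PnP ADMM for the toy problem with data term 0.5 * ||x - y||^2.

    Returns the final x, v (denoised variable) and u (scaled dual variable).
    """
    x = list(y)
    v = list(y)
    u = [0.0] * len(y)
    for _ in range(iters):
        # x-update: closed-form minimizer of
        # 0.5 * ||x - y||^2 + (rho / 2) * ||x - (v - u)||^2
        x = [(yi + rho * (vi - ui)) / (1.0 + rho) for yi, vi, ui in zip(y, v, u)]
        # v-update: the proximal step of the prior is replaced by the denoiser
        v = denoiser([xi + ui for xi, ui in zip(x, u)])
        # scaled dual update
        u = [ui + xi - vi for ui, xi, vi in zip(u, x, v)]
    return x, v, u


# Hypothetical noisy 1D signal.
y = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.5, 0.2]
x, v, u = pnp_admm_denoise(y, rho=2.0, iters=200)
print([round(val, 4) for val in x])
```

Because the denoiser here is linear, the whole iteration is linear and its fixed point can be worked out by hand; for this toy setup with `rho = 1`, the iterates settle at `smooth(y)`. Real PnP ADMM uses a nontrivial forward model and far stronger denoisers, for which no such closed form is generally available.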

Journal ArticleDOI
TL;DR: A novel design of human visual system response in a convolutional filter form to decompose meaningful features that are closely tied with image sharpness level is proposed, and an innovative NR-ISA metric called HVS-MaxPol is designed that requires minimal computational cost, produces high correlation accuracy with image sharpness level, and scales to assess synthetic and natural image blur.
Abstract: In this paper, we propose a novel design of Human Visual System (HVS) response in a convolutional filter form to decompose meaningful features that are closely tied with image sharpness level. No-reference (NR) Image sharpness assessment (ISA) techniques have emerged as the standard of image quality assessment in diverse imaging applications. Despite their high correlation with subjective scoring, they are challenging to use in practice due to high computational cost and lack of scalability across different image blurs. We bridge this gap by synthesizing the HVS response as a linear combination of Finite Impulse Response (FIR) derivative filters to boost the falloff of high band frequency magnitudes in the natural imaging paradigm. The numerical implementation of the HVS filter is carried out with the MaxPol filter library, which can be arbitrarily set for any differential orders and cutoff frequencies to balance out the estimation of informative features and noise sensitivities. We then design an innovative NR-ISA metric called `HVS-MaxPol' that (a) requires minimal computational cost, (b) produces high correlation accuracy with image blurriness, and (c) scales to assess synthetic and natural image blur. Specifically, the synthetic blur images are constructed by blurring the raw images using a Gaussian filter, while natural blur is observed in real-life applications, such as motion and out-of-focus blur. Furthermore, we create a natural benchmark database in digital pathology for validation of image focus quality in whole slide imaging systems called `FocusPath' consisting of 864 blurred images. Thorough experiments are designed to test and validate the efficiency of HVS-MaxPol across different blur databases and state-of-the-art NR-ISA metrics. The experimental results indicate that our metric has the best overall performance with respect to speed, accuracy and scalability.