
Showing papers on "Image resolution published in 2019"


Proceedings ArticleDOI
02 May 2019
TL;DR: SinGAN, an unconditional generative model that can be learned from a single natural image, is introduced, trained to capture the internal distribution of patches within the image, and is then able to generate high quality, diverse samples that carry the same visual content as the image.
Abstract: We introduce SinGAN, an unconditional generative model that can be learned from a single natural image. Our model is trained to capture the internal distribution of patches within the image, and is then able to generate high quality, diverse samples that carry the same visual content as the image. SinGAN contains a pyramid of fully convolutional GANs, each responsible for learning the patch distribution at a different scale of the image. This allows generating new samples of arbitrary size and aspect ratio, that have significant variability, yet maintain both the global structure and the fine textures of the training image. In contrast to previous single image GAN schemes, our approach is not limited to texture images, and is not conditional (i.e. it generates samples from noise). User studies confirm that the generated samples are commonly confused to be real images. We illustrate the utility of SinGAN in a wide range of image manipulation tasks.
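As a rough illustration of the pyramid described above, the sketch below (not the authors' released code; the layer widths, 4/3 scale step, and noise amplitudes are assumptions) shows one fully convolutional per-scale generator and a coarse-to-fine sampling loop over the pyramid.

```python
# Minimal SinGAN-style sketch: one small fully convolutional generator per scale,
# run coarse-to-fine. All sizes are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ScaleGenerator(nn.Module):
    def __init__(self, channels=32):
        super().__init__()
        def block(cin, cout):
            return nn.Sequential(
                nn.Conv2d(cin, cout, 3, padding=1),
                nn.BatchNorm2d(cout),
                nn.LeakyReLU(0.2, inplace=True),
            )
        self.body = nn.Sequential(
            block(3, channels), block(channels, channels),
            block(channels, channels), block(channels, channels),
            nn.Conv2d(channels, 3, 3, padding=1), nn.Tanh(),
        )

    def forward(self, prev_upsampled, noise):
        # The small receptive field means the generator only models the
        # distribution of patches at this scale.
        residual = self.body(prev_upsampled + noise)
        return prev_upsampled + residual

def generate(generators, noise_amps, coarsest_shape):
    """Run the pyramid coarse-to-fine; each scale refines the upsampled output."""
    sample = torch.zeros(1, 3, *coarsest_shape)   # the coarsest scale starts from noise only
    for gen, amp in zip(generators, noise_amps):
        sample = F.interpolate(sample, scale_factor=4 / 3, mode='bilinear',
                               align_corners=False)
        noise = amp * torch.randn_like(sample)
        sample = gen(sample, noise)
    return sample
```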

660 citations


Proceedings ArticleDOI
01 Oct 2019
TL;DR: Li et al. propose a Laplacian pyramid based kernel prediction network (LP-KPN) that efficiently learns per-pixel kernels to recover the HR image, achieving better visual quality with sharper edges and finer textures on real-world scenes.
Abstract: Most of the existing learning-based single image super-resolution (SISR) methods are trained and evaluated on simulated datasets, where the low-resolution (LR) images are generated by applying a simple and uniform degradation (i.e., bicubic downsampling) to their high-resolution (HR) counterparts. However, the degradations in real-world LR images are far more complicated. As a consequence, the SISR models trained on simulated data become less effective when applied to practical scenarios. In this paper, we build a real-world super-resolution (RealSR) dataset where paired LR-HR images on the same scene are captured by adjusting the focal length of a digital camera. An image registration algorithm is developed to progressively align the image pairs at different resolutions. Considering that the degradation kernels are naturally non-uniform in our dataset, we present a Laplacian pyramid based kernel prediction network (LP-KPN), which efficiently learns per-pixel kernels to recover the HR image. Our extensive experiments demonstrate that SISR models trained on our RealSR dataset deliver better visual quality with sharper edges and finer textures on real-world scenes than those trained on simulated datasets. Though our RealSR dataset is built by using only two cameras (Canon 5D3 and Nikon D810), the trained model generalizes well to other camera devices such as Sony a7II and mobile phones.
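The core operation of a kernel prediction network is applying a different small kernel at every pixel. The sketch below is a hedged illustration of that step only (it omits the Laplacian pyramid); `kernels` is assumed to come from a hypothetical prediction network that outputs one k×k kernel per spatial location.

```python
# Apply per-pixel predicted kernels to an image, as in kernel prediction networks.
import torch
import torch.nn.functional as F

def apply_per_pixel_kernels(image, kernels, k=5):
    """image:   (B, C, H, W)
       kernels: (B, k*k, H, W) -- one k x k kernel per spatial location."""
    B, C, H, W = image.shape
    # Gather the k x k neighbourhood of every pixel: (B, C, k*k, H, W)
    patches = F.unfold(image, kernel_size=k, padding=k // 2)
    patches = patches.view(B, C, k * k, H, W)
    kernels = torch.softmax(kernels, dim=1)             # normalise each kernel
    out = (patches * kernels.unsqueeze(1)).sum(dim=2)   # per-pixel weighted sum
    return out
```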

318 citations


Proceedings ArticleDOI
01 Oct 2019
TL;DR: Hessian AWare Quantization (HAWQ), a novel second-order quantization method that allows for the automatic selection of the relative quantization precision of each layer, based on the layer's Hessian spectrum, is introduced.
Abstract: Model size and inference speed/power have become a major challenge in the deployment of neural networks for many applications. A promising approach to address these problems is quantization. However, uniformly quantizing a model to ultra-low precision leads to significant accuracy degradation. A novel solution for this is to use mixed-precision quantization, as some parts of the network may allow lower precision as compared to other layers. However, there is no systematic way to determine the precision of different layers. A brute force approach is not feasible for deep networks, as the search space for mixed-precision is exponential in the number of layers. Another challenge is a similar factorial complexity for determining block-wise fine-tuning order when quantizing the model to a target precision. Here, we introduce Hessian AWare Quantization (HAWQ), a novel second-order quantization method to address these problems. HAWQ allows for the automatic selection of the relative quantization precision of each layer, based on the layer's Hessian spectrum. Moreover, HAWQ provides a deterministic fine-tuning order for quantizing layers. We show the results of our method on Cifar-10 using ResNet20, and on ImageNet using Inception-V3, ResNet50 and SqueezeNext models. Comparing HAWQ with state-of-the-art shows that we can achieve similar/better accuracy with 8× activation compression ratio on ResNet20, as compared to DNAS, and up to 1% higher accuracy with up to 14% smaller models on ResNet50 and Inception-V3, compared to recently proposed methods of RVQuant and HAQ. Furthermore, we show that we can quantize SqueezeNext to just 1MB model size while achieving above 68% top1 accuracy on ImageNet.
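HAWQ's second-order signal is the Hessian spectrum of each layer. The sketch below is a generic illustration (not the authors' code) of estimating a layer's top Hessian eigenvalue with power iteration on Hessian-vector products; `loss` and `params` are assumed to be supplied by the caller.

```python
# Estimate the top Hessian eigenvalue of a group of parameters via power iteration.
import torch

def top_hessian_eigenvalue(loss, params, iters=20):
    """Power iteration on the Hessian of `loss` w.r.t. `params` (list of tensors)."""
    v = [torch.randn_like(p) for p in params]
    grads = torch.autograd.grad(loss, params, create_graph=True)
    eigenvalue = 0.0
    for _ in range(iters):
        norm = torch.sqrt(sum((x * x).sum() for x in v))
        v = [x / norm for x in v]
        # Hessian-vector product: differentiate (grad . v) w.r.t. the parameters.
        gv = sum((g * x).sum() for g, x in zip(grads, v))
        hv = torch.autograd.grad(gv, params, retain_graph=True)
        eigenvalue = sum((h * x).sum() for h, x in zip(hv, v)).item()
        v = list(hv)
    return eigenvalue

# Layers with a large top eigenvalue (sharp loss surface) would be kept at higher
# precision; flatter layers tolerate more aggressive quantization.
```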

313 citations


Journal ArticleDOI
TL;DR: A generative adversarial network (GAN)-based edge-enhancement network (EEGAN) is proposed for robust satellite image SR reconstruction, along with an adversarial learning strategy that is insensitive to noise.
Abstract: The current superresolution (SR) methods based on deep learning have shown remarkable comparative advantages but remain unsatisfactory in recovering the high-frequency edge details of the images in noise-contaminated imaging conditions, e.g., remote sensing satellite imaging. In this paper, we propose a generative adversarial network (GAN)-based edge-enhancement network (EEGAN) for robust satellite image SR reconstruction along with the adversarial learning strategy that is insensitive to noise. In particular, EEGAN consists of two main subnetworks: an ultradense subnetwork (UDSN) and an edge-enhancement subnetwork (EESN). In UDSN, a group of 2-D dense blocks is assembled for feature extraction and to obtain an intermediate high-resolution result that looks sharp but is eroded with artifacts and noise, as the results of previous GAN-based methods are. Then, EESN is constructed to extract and enhance the image contours by purifying the noise-contaminated components with mask processing. The recovered intermediate image and enhanced edges can be combined to generate the result that enjoys high credibility and clear contents. Extensive experiments on the Kaggle Open Source Dataset, Jilin-1 video satellite images, and DigitalGlobe data show superior reconstruction performance compared to the state-of-the-art SR approaches.

305 citations


Proceedings ArticleDOI
04 Mar 2019
TL;DR: Zhang et al. propose an improved network architecture consisting of four modules (an encoder, a decoder, a multi-scale feature fusion module, and a refinement module) that achieves higher accuracy than the current state of the art.
Abstract: This paper considers the problem of single image depth estimation. The employment of convolutional neural networks (CNNs) has recently brought about significant advancements in the research of this problem. However, most existing methods suffer from loss of spatial resolution in the estimated depth maps; a typical symptom is distorted and blurry reconstruction of object boundaries. In this paper, toward more accurate estimation with a focus on depth maps with higher spatial resolution, we propose two improvements to existing approaches. One is about the strategy of fusing features extracted at different scales, for which we propose an improved network architecture consisting of four modules: an encoder, decoder, multi-scale feature fusion module, and refinement module. The other is about loss functions for measuring inference errors used in training. We show that three loss terms, which measure errors in depth, gradients and surface normals, respectively, contribute to the improvement of accuracy in a complementary fashion. Experimental results show that these two improvements enable us to attain higher accuracy than the current state of the art, which is reflected in finer-resolution reconstruction, for example of small objects and object boundaries.
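The following sketch illustrates the three complementary loss terms mentioned above (depth, gradients, surface normals) under assumed tensor shapes; the exact robust norms and weights used in the paper may differ.

```python
# Three complementary depth-estimation losses: depth, gradient, and normal terms.
import torch

def image_gradients(d):
    dx = d[:, :, :, 1:] - d[:, :, :, :-1]   # horizontal differences
    dy = d[:, :, 1:, :] - d[:, :, :-1, :]   # vertical differences
    return dx, dy

def depth_losses(d_pred, d_gt):
    """d_pred, d_gt: (B, 1, H, W) predicted and ground-truth depth maps."""
    l_depth = (d_pred - d_gt).abs().mean()

    dx_p, dy_p = image_gradients(d_pred)
    dx_g, dy_g = image_gradients(d_gt)
    l_grad = (dx_p - dx_g).abs().mean() + (dy_p - dy_g).abs().mean()

    # Surface normals from depth gradients: n = (-dx, -dy, 1), compared by angle.
    ones = torch.ones_like(dx_p[:, :, 1:, :])
    n_p = torch.cat([-dx_p[:, :, 1:, :], -dy_p[:, :, :, 1:], ones], dim=1)
    n_g = torch.cat([-dx_g[:, :, 1:, :], -dy_g[:, :, :, 1:], ones], dim=1)
    cos = torch.nn.functional.cosine_similarity(n_p, n_g, dim=1)
    l_normal = (1.0 - cos).mean()

    return l_depth + l_grad + l_normal
```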

256 citations


Journal ArticleDOI
TL;DR: DeepISP is a full end-to-end deep neural model of the camera image signal processing pipeline that learns a mapping from the raw low-light mosaiced image to the final visually compelling image.
Abstract: We present DeepISP, a full end-to-end deep neural model of the camera image signal processing pipeline. Our model learns a mapping from the raw low-light mosaiced image to the final visually compelling image and encompasses low-level tasks, such as demosaicing and denoising, as well as higher-level tasks, such as color correction and image adjustment. The training and evaluation of the pipeline were performed on a dedicated data set containing pairs of low-light and well-lit images captured by a Samsung S7 smartphone camera in both raw and processed JPEG formats. The proposed solution achieves the state-of-the-art performance in objective evaluation of peak signal-to-noise ratio on the subtask of joint denoising and demosaicing. For the full end-to-end pipeline, it achieves better visual quality compared to the manufacturer ISP, in both a subjective human assessment and when rated by a deep model trained for assessing image quality.

204 citations


Journal ArticleDOI
TL;DR: An investigation of the potential of convolutional neural networks to provide accurate ear density estimates from nadir high-spatial-resolution RGB images showed high and similar heritability, demonstrating that ear density derived from high-resolution RGB imagery could replace the traditional counting method.

177 citations


Journal ArticleDOI
TL;DR: Decorrelation analysis offers an improved method for assessing image resolution that works on a single image, is insensitive to common image artifacts, and can be applied generally to any type of microscopy image.
Abstract: Super-resolution microscopy opened diverse new avenues of research by overcoming the resolution limit imposed by diffraction. Exploitation of the fluorescent emission of individual fluorophores made it possible to reveal structures beyond the diffraction limit. To accurately determine the resolution achieved during imaging is challenging with existing metrics. Here, we propose a method for assessing the resolution of individual super-resolved images based on image partial phase autocorrelation. The algorithm is model-free and does not require any user-defined parameters. We demonstrate its performance on a wide variety of imaging modalities, including diffraction-limited techniques. Finally, we show how our method can be used to optimize image acquisition and post-processing in super-resolution microscopy. Decorrelation analysis offers an improved method for assessing image resolution that works on a single image and is insensitive to common image artifacts. The method can be applied generally to any type of microscopy images.
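The sketch below is a deliberately simplified illustration of the decorrelation idea (not the published algorithm, which also applies a family of high-pass filters and a refinement step): the image spectrum is correlated with its phase-normalized copy under low-pass masks of increasing radius, and the radius at which the curve peaks relates to the cutoff frequency.

```python
# Simplified decorrelation curve: correlation between the spectrum and its
# phase-only copy under binary low-pass masks of increasing radius.
import numpy as np

def decorrelation_curve(img, n_radii=50):
    I = np.fft.fftshift(np.fft.fft2(img - img.mean()))
    In = I / (np.abs(I) + 1e-12)                 # phase-only (normalised) spectrum
    h, w = img.shape
    yy, xx = np.mgrid[:h, :w]
    dist = np.hypot(yy - h / 2, xx - w / 2) / (min(h, w) / 2)   # radius in [0, ~1]

    radii = np.linspace(0.05, 1.0, n_radii)
    d = np.zeros(n_radii)
    for i, r in enumerate(radii):
        mask = dist <= r
        num = np.abs(np.sum(I * np.conj(In) * mask))
        den = np.sqrt(np.sum(np.abs(I) ** 2) * np.sum(np.abs(In * mask) ** 2))
        d[i] = num / (den + 1e-12)
    return radii, d   # the radius where d(r) peaks relates to the cutoff frequency
```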

165 citations


Proceedings ArticleDOI
01 Jun 2019
TL;DR: Wang et al. propose a model-based deep learning approach for merging HrMS and LrHS images to generate a high-resolution hyperspectral (HrHS) image.
Abstract: Hyperspectral imaging can help better understand the characteristics of different materials, compared with traditional image systems. However, only high-resolution multispectral (HrMS) and low-resolution hyperspectral (LrHS) images can generally be captured at video rate in practice. In this paper, we propose a model-based deep learning approach for merging HrMS and LrHS images to generate a high-resolution hyperspectral (HrHS) image. Specifically, we construct a novel MS/HS fusion model which takes into consideration the observation models of the low-resolution images and the low-rankness knowledge along the spectral mode of the HrHS image. We then design an iterative algorithm to solve the model by exploiting the proximal gradient method. Then, by unfolding the designed algorithm, we construct a deep network, called MS/HS Fusion Net, in which the proximal operators and model parameters are learned by convolutional neural networks. Experimental results on simulated and real data substantiate the superiority of our method both visually and quantitatively as compared with state-of-the-art methods along this line of research.
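The sketch below illustrates the general unfolding idea described above, not the authors' MS/HS Fusion Net: one proximal-gradient stage for a two-term fusion model in which the proximal operator of the prior is replaced by a small learned CNN. The operators `A`, `B` and their adjoints `At`, `Bt` are assumed to be supplied by the caller.

```python
# One unfolded proximal-gradient stage: gradient step on the data terms,
# followed by a learned proximal operator.
import torch
import torch.nn as nn

class ProxCNN(nn.Module):
    """Learned proximal operator: a few conv layers acting as a denoiser."""
    def __init__(self, bands, feats=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(bands, feats, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feats, feats, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feats, bands, 3, padding=1),
        )

    def forward(self, x):
        return x + self.net(x)        # residual correction

class UnfoldedStage(nn.Module):
    def __init__(self, bands):
        super().__init__()
        self.step = nn.Parameter(torch.tensor(0.1))   # learned step size
        self.prox = ProxCNN(bands)

    def forward(self, X, Y_lr, Z_ms, A, At, B, Bt):
        grad = At(A(X) - Y_lr) + Bt(B(X) - Z_ms)      # gradient of the data terms
        return self.prox(X - self.step * grad)        # gradient step + learned prox
```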

164 citations


Proceedings ArticleDOI
16 Jun 2019
TL;DR: The application of super-resolution techniques to satellite imagery, and the effects of these techniques on object detection algorithm performance are explored, as well as the performance of object detection as a function of native resolution and object pixel size.
Abstract: We explore the application of super-resolution techniques to satellite imagery, and the effects of these techniques on object detection algorithm performance. Specifically, we enhance satellite imagery beyond its native resolution, and test if we can identify various types of vehicles, planes, and boats with greater accuracy than native resolution. Using the Very Deep Super-Resolution (VDSR) framework and a custom Random Forest Super-Resolution (RFSR) framework we generate enhancement levels of 2x, 4x, and 8x over five distinct resolutions ranging from 30 cm to 4.8 meters. Using both native and super-resolved data, we then train several custom detection models using the SIMRDWN object detection framework. SIMRDWN combines a number of popular object detection algorithms (e.g. SSD, YOLO) into a unified framework that is designed to rapidly detect objects in large satellite images. This approach allows us to quantify the effects of super-resolution techniques on object detection performance across multiple classes and resolutions. We also quantify the performance of object detection as a function of native resolution and object pixel size. For our test set we note that performance degrades from mean average precision (mAP) = 0.53 at 30 cm resolution, down to mAP = 0.11 at 4.8 m resolution. Super-resolving native 30 cm imagery to 15 cm yields the greatest benefit; a 13-36% improvement in mAP. Super-resolution is less beneficial at coarser resolutions, though still provides a small improvement in performance.

163 citations


Journal ArticleDOI
Renwei Dian1, Shutao Li1
TL;DR: A novel subspace-based low tensor multi-rank regularization method is proposed for the fusion, which fully exploits the spectral correlations and non-local similarities in the HR-HSI.
Abstract: Recently, combining a low spatial resolution hyperspectral image (LR-HSI) with a high spatial resolution multispectral image (HR-MSI) into an HR-HSI has become a popular scheme to enhance the spatial resolution of HSI. We propose a novel subspace-based low tensor multi-rank regularization method for the fusion, which fully exploits the spectral correlations and non-local similarities in the HR-HSI. To make use of high spectral correlations, the HR-HSI is approximated by spectral subspace and coefficients. We first learn the spectral subspace from the LR-HSI via singular value decomposition, and then estimate the coefficients via the low tensor multi-rank prior. More specifically, based on the learned cluster structure in the HR-MSI, the patches in coefficients are grouped. We collect the coefficients in the same cluster into a three-dimensional tensor and impose the low tensor multi-rank prior on these collected tensors, which fully model the non-local self-similarities in the HR-HSI. The coefficients optimization is solved by the alternating direction method of multipliers. Experiments on two public HSI datasets demonstrate the advantages of our method.
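A small sketch of the first step described above, learning the spectral subspace from the LR-HSI via SVD; array shapes are illustrative, and the low tensor multi-rank estimation of the coefficients (solved by ADMM in the paper) is not shown.

```python
# Learn a spectral subspace from an LR hyperspectral cube and project onto it.
import numpy as np

def learn_spectral_subspace(lr_hsi, rank=8):
    """lr_hsi: (H, W, L) hyperspectral cube; returns an (L, rank) spectral basis."""
    H, W, L = lr_hsi.shape
    X = lr_hsi.reshape(-1, L)             # pixels as rows, bands as columns
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return Vt[:rank].T                    # (L, rank) basis of the spectral subspace

def project_to_subspace(hsi, E):
    """Coefficients of a cube in the learned subspace: (H, W, rank)."""
    H, W, L = hsi.shape
    return (hsi.reshape(-1, L) @ E).reshape(H, W, -1)
```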

Proceedings ArticleDOI
20 Sep 2019
TL;DR: This work learns to invert the effects of bicubic downsampling in order to restore the natural image characteristics present in the data, and can be trained with direct pixel-wise supervision in the high resolution domain, while robustly generalizing to real input.
Abstract: Most current super-resolution methods rely on low and high resolution image pairs to train a network in a fully supervised manner. However, such image pairs are not available in real-world applications. Instead of directly addressing this problem, most works employ the popular bicubic downsampling strategy to artificially generate a corresponding low resolution image. Unfortunately, this strategy introduces significant artifacts, removing natural sensor noise and other real-world characteristics. Super-resolution networks trained on such bicubic images therefore struggle to generalize to natural images. In this work, we propose an unsupervised approach for image super-resolution. Given only unpaired data, we learn to invert the effects of bicubic downsampling in order to restore the natural image characteristics present in the data. This allows us to generate realistic image pairs, faithfully reflecting the distribution of real-world images. Our super-resolution network can therefore be trained with direct pixel-wise supervision in the high resolution domain, while robustly generalizing to real input. We demonstrate the effectiveness of our approach in quantitative and qualitative experiments.

Journal ArticleDOI
TL;DR: An extended super-resolution convolutional neural network (ESRCNN) is applied to a data fusion framework that blends Landsat-8 and Sentinel-2 images, which has the potential to help generate continuous reflectance observations at a higher temporal frequency than can be obtained from a single Landsat-like sensor.

Proceedings ArticleDOI
15 Jun 2019
TL;DR: A learning-based method using residual convolutional networks is proposed to reconstruct light fields with higher spatial resolution and shows good performance in preserving the inherent epipolar property of light field images.
Abstract: Light field cameras are considered to have many potential applications since angular and spatial information is captured simultaneously. However, the limited spatial resolution has brought many difficulties to the development of related applications and has become the main bottleneck of light field cameras. In this paper, a learning-based method using residual convolutional networks is proposed to reconstruct light fields with higher spatial resolution. The view images in one light field are first grouped into different image stacks with consistent sub-pixel offsets and fed into different network branches to implicitly learn inherent corresponding relations. The residual information in different spatial directions is then calculated from each branch and further integrated to supplement high-frequency details for the view image. Finally, a flexible solution is proposed to super-resolve entire light field images with various angular resolutions. Experimental results on synthetic and real-world datasets demonstrate that the proposed method outperforms other state-of-the-art methods by a large margin in both visual and numerical evaluations. Furthermore, the proposed method shows good performance in preserving the inherent epipolar property of light field images.

Journal ArticleDOI
TL;DR: This paper proposes effective and efficient end-to-end convolutional neural network models with an hourglass shape for spatially super-resolving LF images, which allows feature extraction to be performed at the low-resolution level to save both computational and memory costs.
Abstract: Light field (LF) photography is an emerging paradigm for capturing more immersive representations of the real world. However, arising from the inherent tradeoff between the angular and spatial dimensions, the spatial resolution of LF images captured by commercial micro-lens-based LF cameras is significantly constrained. In this paper, we propose effective and efficient end-to-end convolutional neural network models for spatially super-resolving LF images. Specifically, the proposed models have an hourglass shape, which allows feature extraction to be performed at the low-resolution level to save both the computational and memory costs. To fully make use of the 4D structure information of LF data in both the spatial and angular domains, we propose to use 4D convolution to characterize the relationship among pixels. Moreover, as an approximation of 4D convolution, we also propose to use spatial-angular separable (SAS) convolutions for more computationally and memory-efficient extraction of spatial-angular joint features. Extensive experimental results on 57 test LF images with various challenging natural scenes show significant advantages from the proposed models over the state-of-the-art methods. That is, an average PSNR gain of more than 3.0 dB and better visual quality are achieved, and our methods preserve the LF structure of the super-resolved LF images better, which is highly desirable for subsequent applications. In addition, the SAS convolution-based model can achieve three times speed up with only negligible reconstruction quality decrease when compared with the 4D convolution-based one. The source code of our method is available online.
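The following is a minimal sketch of a spatial-angular separable (SAS) convolution as described above: a 2D convolution over the spatial dimensions of every view, followed by a 2D convolution over the angular dimensions at every pixel. The 6D tensor layout and channel sizes are assumptions for illustration.

```python
# Spatial-angular separable convolution on a 6D light field tensor.
import torch
import torch.nn as nn

class SASConv(nn.Module):
    def __init__(self, cin, cout, k=3):
        super().__init__()
        self.spatial = nn.Conv2d(cin, cout, k, padding=k // 2)
        self.angular = nn.Conv2d(cout, cout, k, padding=k // 2)

    def forward(self, lf):
        # lf: (B, C, U, V, H, W) -- U x V views, each H x W pixels
        B, C, U, V, H, W = lf.shape
        x = lf.permute(0, 2, 3, 1, 4, 5).reshape(B * U * V, C, H, W)
        x = self.spatial(x)                                   # conv over (H, W)
        Cout = x.shape[1]
        x = x.reshape(B, U, V, Cout, H, W).permute(0, 4, 5, 3, 1, 2)
        x = x.reshape(B * H * W, Cout, U, V)
        x = self.angular(x)                                   # conv over (U, V)
        x = x.reshape(B, H, W, Cout, U, V).permute(0, 3, 4, 5, 1, 2)
        return x                                              # (B, Cout, U, V, H, W)
```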

Journal ArticleDOI
TL;DR: This paper designs a new detail injection based convolutional neural network (DiCNN) framework for pansharpening, with the MS details being directly formulated in an end-to-end manner, where the first detail injection based CNN mines MS details from the PAN image and the MS image, and the second one utilizes only the PAN image.
Abstract: Pansharpening aims to fuse a multispectral (MS) image with an associated panchromatic (PAN) image, producing a composite image with the spectral resolution of the former and the spatial resolution of the latter. Traditional pansharpening methods can be ascribed to a unified detail injection context, which views the injected MS details as the integration of PAN details and bandwise injection gains. In this paper, we design a new detail injection based convolutional neural network (DiCNN) framework for pansharpening with the MS details being directly formulated in an end-to-end manner, where the first detail injection based CNN (DiCNN1) mines MS details through the PAN image and the MS image, and the second one (DiCNN2) utilizes only the PAN image. The main advantage of the proposed DiCNNs is that they provide explicit physical interpretations and can achieve fast convergence while achieving high pansharpening quality. Furthermore, the effectiveness of the proposed approaches is also analyzed from a relatively theoretical point of view. Our methods are evaluated via experiments on real MS image datasets, achieving excellent performance when compared to other state-of-the-art methods.
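A compact sketch of the detail-injection idea described above (not the authors' exact architecture): the CNN predicts only the MS details, which are added to the upsampled MS image, so the spectral content is carried by the skip connection. This variant feeds both PAN and upsampled MS to the network, as in DiCNN1; a DiCNN2-style variant would feed the PAN image only.

```python
# Detail-injection pansharpening: output = upsampled MS + CNN-predicted details.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DetailInjectionNet(nn.Module):
    def __init__(self, ms_bands=4, feats=64):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(ms_bands + 1, feats, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feats, feats, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feats, ms_bands, 3, padding=1),
        )

    def forward(self, ms, pan):
        # ms: (B, bands, h, w) low-resolution MS; pan: (B, 1, H, W) panchromatic
        ms_up = F.interpolate(ms, size=pan.shape[-2:], mode='bicubic',
                              align_corners=False)
        details = self.cnn(torch.cat([ms_up, pan], dim=1))   # predicted MS details
        return ms_up + details                                # detail injection
```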

Journal ArticleDOI
TL;DR: This work proposes the use of a flat–curved–flat imaging strategy, in which the sample plane is magnified onto a large spherical image surface and then seamlessly conjugated to multiple planar sensors, to perform video-rate, gigapixel imaging of biological dynamics at centimetre scale and micrometre resolution.
Abstract: Large-scale imaging of biological dynamics with high spatiotemporal resolution is indispensable to systems biology studies. However, conventional microscopes have an inherent compromise between the achievable field of view and spatial resolution due to the space–bandwidth product theorem. In addition, a further challenge is the ability to handle the enormous amount of data generated by a large-scale imaging platform. Here, we break these bottlenecks by proposing the use of a flat–curved–flat imaging strategy, in which the sample plane is magnified onto a large spherical image surface and then seamlessly conjugated to multiple planar sensors. Our real-time, ultra-large-scale, high-resolution (RUSH) imaging platform operates with a 10 × 12 mm2 field of view, a uniform resolution of ~1.2 μm after deconvolution and a data throughput of 5.1 gigapixels per second. We use the RUSH platform to perform video-rate, gigapixel imaging of biological dynamics at centimetre scale and micrometre resolution, including brain-wide structural imaging and functional imaging in awake, behaving mice. Video-rate imaging of the brains of awake mice with a field of view of 10 × 12 mm2 and a spatial resolution of 1.2 µm is accomplished.

Journal ArticleDOI
TL;DR: Several BDSD-based approaches are proposed to make BDSD robust with respect to the spectral bands to be fused, and the validity of the proposed approaches is demonstrated against the benchmark.
Abstract: Pansharpening refers to the fusion of a multispectral (MS) image, which has a finer spectral resolution but a coarser spatial resolution, with a panchromatic (PAN) image. The classical pansharpening problem can be addressed with component substitution or multiresolution analysis techniques. One of the most notable approaches in the former class is the band-dependent spatial-detail (BDSD) method. It has shown state-of-the-art performance, in particular when the fusion of four-band data sets is addressed. However, new sensors, such as the WorldView-2/-3 ones, usually acquire MS images with more than four spectral bands to be fused with the PAN image. The BDSD method has shown limitations in performance in these cases. Thus, in this paper, several BDSD-based approaches are proposed to solve this issue, making BDSD robust with respect to the spectral bands to be fused. The experimental results, conducted both at reduced and at full resolution on four real data sets acquired by the IKONOS, QuickBird, WorldView-2, and WorldView-3 sensors, demonstrate the validity of the proposed approaches against the benchmark.
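The sketch below illustrates the band-dependent spatial-detail idea in simplified form (not the generalized estimators proposed in the paper): injection coefficients for each band are estimated by least squares at a degraded scale and then applied at full scale. The `degrade` and `expand` helpers (low-pass filtering plus decimation/interpolation) are assumed to be provided.

```python
# BDSD-style fusion sketch: per-band injection weights estimated by least squares
# at the degraded scale, then applied at the PAN scale.
import numpy as np

def bdsd_like_fusion(ms, pan, ratio, degrade, expand):
    """ms: (h, w, B) multispectral;  pan: (H, W) panchromatic, H = h * ratio.
       degrade/expand: assumed helpers for downsampling / upsampling by `ratio`."""
    h, w, B = ms.shape
    ms_deg = expand(degrade(ms, ratio), ratio)     # (h, w, B) detail-free MS
    pan_deg = degrade(pan, ratio)                  # (h, w) PAN at the MS scale
    A = np.column_stack([ms_deg.reshape(-1, B), pan_deg.reshape(-1)])

    ms_up = expand(ms, ratio)                      # (H, W, B) MS at the PAN scale
    A_full = np.column_stack([ms_up.reshape(-1, B), pan.reshape(-1)])
    fused = np.empty_like(ms_up)
    for k in range(B):
        # Band-dependent weights: regress the missing details of band k on the
        # detail-free MS bands and the degraded PAN.
        target = (ms[..., k] - ms_deg[..., k]).reshape(-1)
        gamma, *_ = np.linalg.lstsq(A, target, rcond=None)
        fused[..., k] = ms_up[..., k] + (A_full @ gamma).reshape(ms_up.shape[:2])
    return fused
```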

Proceedings ArticleDOI
01 Oct 2019
TL;DR: The proposed KMSR consists of two stages: a pool of realistic blur kernels is first built with a generative adversarial network (GAN), and a super-resolution network is then trained with HR and corresponding LR images constructed with the generated kernels, thereby incorporating blur-kernel modeling in the training.
Abstract: Deep convolutional neural networks (CNNs), trained on corresponding pairs of high- and low-resolution images, achieve state-of-the-art performance in single-image super-resolution and surpass previous signal-processing based approaches. However, their performance is limited when applied to real photographs. The reason lies in their training data: low-resolution (LR) images are obtained by bicubic interpolation of the corresponding high-resolution (HR) images. The applied convolution kernel significantly differs from real-world camera-blur. Consequently, while current CNNs well super-resolve bicubic-downsampled LR images, they often fail on camera-captured LR images. To improve generalization and robustness of deep super-resolution CNNs on real photographs, we present a kernel modeling super-resolution network (KMSR) that incorporates blur-kernel modeling in the training. Our proposed KMSR consists of two stages: we first build a pool of realistic blur-kernels with a generative adversarial network (GAN) and then we train a super-resolution network with HR and corresponding LR images constructed with the generated kernels. Our extensive experimental validations demonstrate the effectiveness of our single-image super-resolution approach on photographs with unknown blur-kernels.
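As a hedged illustration of the second stage's data construction, the sketch below blurs an HR training image with a kernel drawn from the GAN-generated pool and decimates it to form the LR counterpart; the kernel pool itself (stage one) is assumed to be available as a list of 2D tensors.

```python
# Synthesize an LR training image by blurring with a sampled kernel and decimating.
import random
import torch
import torch.nn.functional as F

def make_lr(hr, kernel_pool, scale=4):
    """hr: (B, 3, H, W) in [0, 1];  kernel_pool: list of (k, k) torch tensors."""
    k = random.choice(kernel_pool)
    k = (k / k.sum()).to(hr)                               # normalised blur kernel
    kh, kw = k.shape
    weight = k.view(1, 1, kh, kw).repeat(hr.shape[1], 1, 1, 1)   # one kernel per channel
    pad = kw // 2
    blurred = F.conv2d(F.pad(hr, (pad,) * 4, mode='reflect'),
                       weight, groups=hr.shape[1])
    return blurred[..., ::scale, ::scale]                  # decimate to LR
```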

Proceedings ArticleDOI
16 Jun 2019
TL;DR: The 3rd NTIRE challenge on single-image super-resolution (restoration of rich details in a low-resolution image) is reviewed with a focus on the proposed solutions and results, gauging the state of the art in real-world single-image super-resolution.
Abstract: This paper reviews the 3rd NTIRE challenge on single-image super-resolution (restoration of rich details in a low-resolution image) with a focus on proposed solutions and results. The challenge had one track, which was aimed at the real-world single image super-resolution problem with an unknown scaling factor. Participants mapped low-resolution images captured by a DSLR camera with a shorter focal length to their high-resolution counterparts captured at a longer focal length. With this challenge, we introduced a novel real-world super-resolution dataset (RealSR). The track had 403 registered participants, and 36 teams competed in the final testing phase. Their results gauge the state of the art in real-world single image super-resolution.

Journal ArticleDOI
TL;DR: In this paper, a multiframe super-resolution algorithm is proposed that creates a complete RGB image directly from a burst of color filter array (CFA) raw images, harnessing the natural hand tremor typical of handheld photography.
Abstract: Compared to DSLR cameras, smartphone cameras have smaller sensors, which limits their spatial resolution; smaller apertures, which limits their light gathering ability; and smaller pixels, which reduces their signal-to-noise ratio. The use of color filter arrays (CFAs) requires demosaicing, which further degrades resolution. In this paper, we supplant the use of traditional demosaicing in single-frame and burst photography pipelines with a multiframe super-resolution algorithm that creates a complete RGB image directly from a burst of CFA raw images. We harness natural hand tremor, typical in handheld photography, to acquire a burst of raw frames with small offsets. These frames are then aligned and merged to form a single image with red, green, and blue values at every pixel site. This approach, which includes no explicit demosaicing step, serves to both increase image resolution and boost the signal-to-noise ratio. Our algorithm is robust to challenging scene conditions: local motion, occlusion, or scene changes. It runs at 100 milliseconds per 12-megapixel RAW input burst frame on mass-produced mobile phones. Specifically, the algorithm is the basis of the Super-Res Zoom feature, as well as the default merge method in Night Sight mode (whether zooming or not) on Google's flagship phone.

Journal ArticleDOI
TL;DR: In this paper, the authors survey the competing factors that determine spatial and temporal resolution for transmission electron microscopy and scanning transmission electron microscopy of liquids, and discuss the effects of sample thickness, stability and dose sensitivity on spatial and temporal resolution.
Abstract: Liquid cell electron microscopy possesses a combination of spatial and temporal resolution that provides a unique view of static structures and dynamic processes in liquids. Optimizing the resolution in liquids requires consideration of both the microscope performance and the properties of the sample. In this Review, we survey the competing factors that determine spatial and temporal resolution for transmission electron microscopy and scanning transmission electron microscopy of liquids. We discuss the effects of sample thickness, stability and dose sensitivity on spatial and temporal resolution. We show that for some liquid samples, spatial resolution can be improved by spherical and chromatic aberration correction. However, other benefits offered by aberration correction may be even more useful for liquid samples. We consider the greater image interpretability offered by spherical aberration correction and the improved dose efficiency for thicker samples offered by chromatic aberration correction. Finally, we discuss the importance of detector and sample parameters for higher resolution in future experiments. Liquid cell electron microscopy provides a unique combination of spatial and temporal resolution, which is useful for imaging static and dynamic processes in liquids. In this Review, we discuss the resolution expected when imaging liquid specimens in transmission electron microscopy and scanning transmission electron microscopy and consider the benefits of spherical and chromatic aberration for resolution, image interpretability and dose efficiency.

Journal ArticleDOI
TL;DR: In this article, the authors introduce the Q-ISM principle and obtain super-resolved optical images of a biological sample stained with fluorescent quantum dots using photon antibunching, a quantum effect, as a resolution-enhancing contrast mechanism.
Abstract: The principles of quantum optics have yielded a plethora of ideas to surpass the classical limitations of sensitivity and resolution in optical microscopy. While some ideas have been applied in proof-of-principle experiments, imaging a biological sample has remained challenging, mainly due to the inherently weak signal measured and the fragility of quantum states of light. In principle, however, these quantum protocols can add new information without sacrificing the classical information and can therefore enhance the capabilities of existing super-resolution techniques. Image scanning microscopy, a recent addition to the family of super-resolution methods, generates a robust resolution enhancement without reducing the signal level. Here, we introduce quantum image scanning microscopy: combining image scanning microscopy with the measurement of quantum photon correlation allows increasing the resolution of image scanning microscopy up to twofold, four times beyond the diffraction limit. We introduce the Q-ISM principle and obtain super-resolved optical images of a biological sample stained with fluorescent quantum dots using photon antibunching, a quantum effect, as a resolution-enhancing contrast mechanism. Characterizing not only the fluorescence intensity but also the inherent quantum correlations of the fluorescent photon stream can enhance the spatial resolution of image scanning microscopy up to twofold, a fourfold improvement over the diffraction limit.

Proceedings ArticleDOI
01 Oct 2019
TL;DR: DSGAN is proposed to introduce natural image characteristics into bicubically downscaled images to improve the performance of the SR model; the low and high image frequencies are separated and treated differently during training.
Abstract: Most of the recent literature on image super-resolution (SR) assumes the availability of training data in the form of paired low resolution (LR) and high resolution (HR) images or the knowledge of the downgrading operator (usually bicubic downscaling). While the proposed methods perform well on standard benchmarks, they often fail to produce convincing results in real-world settings. This is because real-world images can be subject to corruptions such as sensor noise, which are severely altered by bicubic downscaling. Therefore, the models never see a real-world image during training, which limits their generalization capabilities. Moreover, it is cumbersome to collect paired LR and HR images in the same source domain. To address this problem, we propose DSGAN to introduce natural image characteristics in bicubically downscaled images. It can be trained in an unsupervised fashion on HR images, thereby generating LR images with the same characteristics as the original images. We then use the generated data to train a SR model, which greatly improves its performance on real-world images. Furthermore, we propose to separate the low and high image frequencies and treat them differently during training. Since the low frequencies are preserved by downsampling operations, we only require adversarial training to modify the high frequencies. This idea is applied to our DSGAN model as well as the SR model. We demonstrate the effectiveness of our method in several experiments through quantitative and qualitative analysis. Our solution is the winner of the AIM Challenge on Real World SR at ICCV 2019.
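The sketch below illustrates the frequency separation described above (the filter size and width are assumptions): the low frequencies, preserved by downsampling, can be supervised with a simple pixel loss, while only the high-frequency residual is passed to the adversarial loss.

```python
# Split an image into low- and high-frequency bands with a Gaussian filter.
import torch
import torch.nn.functional as F

def gaussian_kernel(size=5, sigma=1.5):
    x = torch.arange(size, dtype=torch.float32) - size // 2
    g = torch.exp(-(x ** 2) / (2 * sigma ** 2))
    k = g.unsqueeze(1) * g.unsqueeze(0)
    return k / k.sum()

def split_frequencies(img, size=5, sigma=1.5):
    """Returns (low, high) with low + high == img."""
    k = gaussian_kernel(size, sigma).to(img)
    weight = k.view(1, 1, size, size).repeat(img.shape[1], 1, 1, 1)
    low = F.conv2d(F.pad(img, (size // 2,) * 4, mode='reflect'),
                   weight, groups=img.shape[1])
    return low, img - low

# Training sketch: pixel loss on the low band, adversarial loss on the high band.
# low_fake, high_fake = split_frequencies(generated_lr)
# low_real, high_real = split_frequencies(bicubic_lr)
# loss = l1(low_fake, low_real) + adversarial(discriminator, high_fake)
```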

Journal ArticleDOI
TL;DR: A new end-to-end bidirectional pyramid network for pan-sharpening is introduced, which permits the network to process MS and PAN images in two separate branches level by level.
Abstract: Pan-sharpening is an important preprocessing step for remote sensing image processing tasks; it fuses a low-resolution multispectral image and a high-resolution (HR) panchromatic (PAN) image to reconstruct a HR multispectral (MS) image. This paper introduces a new end-to-end bidirectional pyramid network for pan-sharpening. The overall structure of the proposed network is a bidirectional pyramid, which permits the network to process MS and PAN images in two separate branches level by level. At each level of the network, spatial details extracted from the PAN image are injected into the upsampled MS image to reconstruct the pan-sharpened image from coarse resolution to fine resolution. Subpixel convolutional layers and the enhanced residual blocks are used to make the network efficient. Comparison of the results obtained with our proposed method and the results using other widely used state-of-the-art approaches confirms that our proposed method outperforms the others in visual appearance and objective indexes.

Journal ArticleDOI
TL;DR: In this article, a learning-based approach for constructing joint filters based on Convolutional Neural Networks is proposed, which can selectively transfer salient structures that are consistent with both guidance and target images.
Abstract: Joint image filters leverage the guidance image as a prior and transfer the structural details from the guidance image to the target image for suppressing noise or enhancing spatial resolution. Existing methods either rely on various explicit filter constructions or hand-designed objective functions, thereby making it difficult to understand, improve, and accelerate these filters in a coherent framework. In this paper, we propose a learning-based approach for constructing joint filters based on Convolutional Neural Networks. In contrast to existing methods that consider only the guidance image, the proposed algorithm can selectively transfer salient structures that are consistent with both guidance and target images. We show that the model trained on a certain type of data, e.g., RGB and depth images, generalizes well to other modalities, e.g., flash/non-flash and RGB/NIR images. We validate the effectiveness of the proposed joint filter through extensive experimental evaluations with state-of-the-art methods.

Journal ArticleDOI
TL;DR: This work explores a novel foveated reconstruction method that employs the recent advances in generative adversarial neural networks to reconstruct a plausible peripheral video from a small fraction of pixels provided every frame.
Abstract: In order to provide an immersive visual experience, modern displays require head mounting, high image resolution, low latency, as well as high refresh rate. This poses a challenging computational problem. On the other hand, the human visual system can consume only a tiny fraction of this video stream due to the drastic acuity loss in the peripheral vision. Foveated rendering and compression can save computations by reducing the image quality in the peripheral vision. However, this can cause noticeable artifacts in the periphery, or, if done conservatively, would provide only modest savings. In this work, we explore a novel foveated reconstruction method that employs the recent advances in generative adversarial neural networks. We reconstruct a plausible peripheral video from a small fraction of pixels provided every frame. The reconstruction is done by finding the closest matching video to this sparse input stream of pixels on the learned manifold of natural videos. Our method is more efficient than the state-of-the-art foveated rendering, while providing the visual experience with no noticeable quality degradation. We conducted a user study to validate our reconstruction method and compare it against existing foveated rendering and video compression techniques. Our method is fast enough to drive gaze-contingent head-mounted displays in real time on modern hardware. We plan to publish the trained network to establish a new quality bar for foveated rendering and compression as well as encourage follow-up research.

Journal ArticleDOI
TL;DR: A novel convolutional neural network (CNN)-based framework is developed for light field reconstruction from a sparse set of views and it is indicated that the reconstruction can be efficiently modeled as angular restoration on an epipolar plane image (EPI).
Abstract: In this paper, a novel convolutional neural network (CNN)-based framework is developed for light field reconstruction from a sparse set of views. We indicate that the reconstruction can be efficiently modeled as angular restoration on an epipolar plane image (EPI). The main problem in direct reconstruction on the EPI involves an information asymmetry between the spatial and angular dimensions, where the detailed portion in the angular dimensions is damaged by undersampling. Directly upsampling or super-resolving the light field in the angular dimensions causes ghosting effects. To suppress these ghosting effects, we contribute a novel “blur-restoration-deblur” framework. First, the “blur” step is applied to extract the low-frequency components of the light field in the spatial dimensions by convolving each EPI slice with a selected blur kernel. Then, the “restoration” step is implemented by a CNN, which is trained to restore the angular details of the EPI. Finally, we use a non-blind “deblur” operation to recover the spatial high frequencies suppressed by the EPI blur. We evaluate our approach on several datasets, including synthetic scenes, real-world scenes and challenging microscope light field data. We demonstrate the high performance and robustness of the proposed framework compared with state-of-the-art algorithms. We further show extended applications, including depth enhancement and interpolation for unstructured input. More importantly, a novel rendering approach is presented by combining the proposed framework and depth information to handle large disparities.

Journal ArticleDOI
TL;DR: The requirements of image CR are translated into operable optimization targets for training CNN-CR: the visual quality of the compact-resolved image is ensured by constraining its difference from a naively downsampled version, and the information loss of image CR is measured by upsampling/super-resolving the compact-resolved image and comparing that to the original image.
Abstract: We study the dual problem of image super-resolution (SR), which we term image compact-resolution (CR). Opposite to image SR that hallucinates a visually plausible high-resolution image given a low-resolution input, image CR provides a low-resolution version of a high-resolution image, such that the low-resolution version is both visually pleasing and as informative as possible compared to the high-resolution image. We propose a convolutional neural network (CNN) for image CR, namely, CNN-CR, inspired by the great success of CNN for image SR. Specifically, we translate the requirements of image CR into operable optimization targets for training CNN-CR: the visual quality of the compact resolved image is ensured by constraining its difference from a naively downsampled version and the information loss of image CR is measured by upsampling/super-resolving the compact-resolved image and comparing that to the original image. Accordingly, CNN-CR can be trained either separately or jointly with a CNN for image SR. We explore different training strategies as well as different network structures for CNN-CR. Our experimental results show that the proposed CNN-CR clearly outperforms simple bicubic downsampling and achieves on average 2.25 dB improvement in terms of the reconstruction quality on a large collection of natural images. We further investigate two applications of image CR, i.e., low-bit-rate image compression and image retargeting. Experimental results show that the proposed CNN-CR helps achieve significant bits saving than High Efficiency Video Coding when applied to image compression and produce visually pleasing results when applied to image retargeting.
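The sketch below illustrates the two training targets described above with placeholder module names (not the authors' code): the compact-resolved image is kept close to a naive bicubic downsample for visual quality, and an SR network measures how much information it retains.

```python
# Combined CNN-CR training loss: visual-quality term plus information-loss term.
import torch
import torch.nn.functional as F

def cnn_cr_loss(cr_net, sr_net, hr, scale=2, lam=1.0):
    """hr: (B, 3, H, W) high-resolution training images.
       cr_net / sr_net: hypothetical compact-resolution and super-resolution networks."""
    lr = cr_net(hr)                                           # compact-resolved image
    naive = F.interpolate(hr, scale_factor=1 / scale, mode='bicubic',
                          align_corners=False)
    quality_loss = F.l1_loss(lr, naive)                       # stay visually plausible
    reconstruction = sr_net(lr)                               # upsample back
    information_loss = F.l1_loss(reconstruction, hr)          # retain information
    return quality_loss + lam * information_loss
```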

Proceedings ArticleDOI
18 Nov 2019
TL;DR: The AIM 2019 challenge on real world super-resolution addresses the real world setting, where paired true high and low-resolution images are unavailable, and aims to advance the state-of-the-art and provide a standard benchmark for this newly emerging task.
Abstract: This paper reviews the AIM 2019 challenge on real world super-resolution. It focuses on the participating methods and final results. The challenge addresses the real world setting, where paired true high and low-resolution images are unavailable. For training, only one set of source input images is therefore provided in the challenge. In Track 1: Source Domain the aim is to super-resolve such images while preserving the low level image characteristics of the source input domain. In Track 2: Target Domain a set of high-quality images is also provided for training, that defines the output domain and desired quality of the super-resolved images. To allow for quantitative evaluation, the source input images in both tracks are constructed using artificial, but realistic, image degradations. The challenge is the first of its kind, aiming to advance the state-of-the-art and provide a standard benchmark for this newly emerging task. In total 7 teams competed in the final testing phase, demonstrating new and innovative solutions to the problem.