Journal ArticleDOI

Fast and Full-Resolution Light Field Deblurring Using a Deep Neural Network

TL;DR: This work generates a complex blurry light field dataset and proposes a learning-based deblurring approach that is about 16K times faster than Srinivasan et al. [22].
Abstract: Restoring a sharp light field image from its blurry input has become essential due to the increasing popularity of parallax-based image processing. State-of-the-art blind light field deblurring methods suffer from several issues such as slow processing, reduced spatial size, and a limited motion blur model. In this work, we address these challenging problems by generating a complex blurry light field dataset and proposing a learning-based deblurring approach. In particular, we model the full 6-degree-of-freedom (6-DOF) light field camera motion, which is used to create the blurry dataset using a combination of real light fields captured with a Lytro Illum camera and synthetic light field renderings of 3D scenes. Furthermore, we propose a light field deblurring network that is built with the capability of large receptive fields. We also introduce a simple strategy of angular sampling to train effectively on the large-scale blurry light field. We evaluate our method through both quantitative and qualitative measurements and demonstrate superior performance compared to the state-of-the-art method, with a massive speedup in execution time. Our method is about 16K times faster than Srinivasan et al. [22] and can deblur a full-resolution light field in less than 2 seconds.
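
The 6-DOF blur model lends itself to a short simulation. Below is a minimal sketch, not the paper's pipeline: it assumes the scene is a single fronto-parallel plane at depth d, samples a small 6-DOF camera trajectory, warps one sharp view by the plane-induced homography at each pose, and averages the warped frames into a blurry frame. All names and the trajectory are illustrative.

```python
import numpy as np
import cv2  # pip install opencv-python

def small_rotation(rx, ry, rz):
    """First-order rotation matrix for small angles (radians)."""
    return np.array([[1.0, -rz,  ry],
                     [ rz, 1.0, -rx],
                     [-ry,  rx, 1.0]])

def synthesize_blur(sharp, K, poses, depth):
    """Average plane-induced homography warps along a 6-DOF trajectory.

    sharp: HxWx3 float32 image, K: 3x3 intrinsics,
    poses: list of (R, t) camera poses, depth: plane depth in metres.
    """
    n = np.array([0.0, 0.0, 1.0])            # fronto-parallel plane normal
    acc = np.zeros_like(sharp)
    for R, t in poses:
        # Homography induced by the plane at 'depth': K (R + t n^T / d) K^-1
        H = K @ (R + np.outer(t, n) / depth) @ np.linalg.inv(K)
        acc += cv2.warpPerspective(sharp, H, sharp.shape[1::-1])
    return acc / len(poses)

# Hypothetical usage: a short shake mixing rotation and translation.
img = np.random.rand(256, 256, 3).astype(np.float32)  # stand-in sub-aperture view
K = np.array([[400.0, 0, 128], [0, 400, 128], [0, 0, 1]])
poses = [(small_rotation(0.0, 0.002 * s, 0.0), np.array([0.01 * s, 0.0, 0.0]))
         for s in np.linspace(-1, 1, 15)]
blurry = synthesize_blur(img, K, poses, depth=2.0)
```

The paper's actual model also varies the blur per sub-aperture view and per scene depth; this single-plane approximation only conveys why 6-DOF motion produces spatially varying blur.
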
Citations
Journal ArticleDOI
TL;DR: A comprehensive and timely survey of recently published deep-learning-based image deblurring approaches, in which the authors discuss common causes of image blur, introduce benchmark datasets and performance metrics, and summarize different problem formulations.
Abstract: Image deblurring is a classic problem in low-level computer vision with the aim to recover a sharp image from a blurred input image. Advances in deep learning have led to significant progress in solving this problem, and a large number of deblurring networks have been proposed. This paper presents a comprehensive and timely survey of recently published deep-learning based image deblurring approaches, aiming to serve the community as a useful literature review. We start by discussing common causes of image blur, introduce benchmark datasets and performance metrics, and summarize different problem formulations. Next, we present a taxonomy of methods using convolutional neural networks (CNN) based on architecture, loss function, and application, offering a detailed review and comparison. In addition, we discuss some domain-specific deblurring applications including face images, text, and stereo image pairs. We conclude by discussing key challenges and future research directions.

65 citations

Journal ArticleDOI
01 Nov 2022
TL;DR: A high-quality and challenging urban scene dataset, containing 1074 samples composed of real-world and synthetic light field images as well as pixel-wise annotations for 14 semantic classes, is proposed, believed to be the largest and the most diverse light field dataset for semantic segmentation.
Abstract: As one of the fundamental technologies for scene understanding, semantic segmentation has been widely explored in the last few years. Light field cameras encode the geometric information by simultaneously recording the spatial information and angular information of light rays, which provides us with a new way to solve this issue. In this paper, we propose a high-quality and challenging urban scene dataset, containing 1074 samples composed of real-world and synthetic light field images as well as pixel-wise annotations for 14 semantic classes. To the best of our knowledge, it is the largest and the most diverse light field dataset for semantic segmentation. We further design two new semantic segmentation baselines tailored for light field and compare them with state-of-the-art RGB, video and RGB-D-based methods using the proposed dataset. The outperforming results of our baselines demonstrate the advantages of the geometric information in light field for this task. We also provide evaluations of super-resolution and depth estimation methods, showing that the proposed dataset presents new challenges and supports detailed comparisons among different methods. We expect this work to inspire new research directions and stimulate scientific progress in related fields. The complete dataset is available at https://github.com/HAWKEYE-Group/UrbanLF.

37 citations

Journal ArticleDOI
TL;DR: A deep neural network L3Fnet is proposed for Low-Light Light Field (L3F) restoration, which not only performs visual enhancement of each LF view but also preserves the epipolar geometry across views and can be used for low-light enhancement of single-frame images, despite it being engineered for LF data.
Abstract: Light Field (LF) offers unique advantages such as post-capture refocusing and depth estimation, but low-light conditions limit these capabilities. To restore low-light LFs, we should harness the geometric cues present in different LF views, which is not possible using single-frame low-light enhancement techniques. We, therefore, propose a deep neural network for Low-Light Light Field (L3F) restoration, which we refer to as L3Fnet. The proposed L3Fnet not only performs the necessary visual enhancement of each LF view but also preserves the epipolar geometry across views. We achieve this by adopting a two-stage architecture for L3Fnet. Stage-I looks at all the LF views to encode the LF geometry. This encoded information is then used in Stage-II to reconstruct each LF view. To facilitate learning-based techniques for low-light LF imaging, we collected a comprehensive LF dataset of various scenes. For each scene, we captured four LFs, one with near-optimal exposure and ISO settings and the others at different levels of low-light conditions varying from low to extreme low-light settings. The effectiveness of the proposed L3Fnet is supported by both visual and numerical comparisons on this dataset. To further analyze the performance of low-light reconstruction methods, we also propose an L3F-wild dataset that contains LFs captured late at night with almost zero lux values. No ground truth is available in this dataset. To perform well on the L3F-wild dataset, any method must adapt to the light level of the captured scene. To do this we propose a novel pre-processing block that makes L3Fnet robust to various degrees of low-light conditions. Lastly, we show that L3Fnet can also be used for low-light enhancement of single-frame images, despite being engineered for LF data. We do so by converting the single-frame DSLR image into a form suitable for L3Fnet, which we call a pseudo-LF.
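
A minimal PyTorch sketch of the two-stage idea described above, not the published L3Fnet architecture: Stage-I sees all views jointly to produce a shared geometry encoding, and Stage-II restores each view conditioned on it. Layer counts, widths, and names are placeholders.

```python
import torch
import torch.nn as nn

class TwoStageLFRestorer(nn.Module):
    """Hypothetical two-stage restorer in the spirit of L3Fnet's description."""
    def __init__(self, n_views, feat=32):
        super().__init__()
        # Stage-I: encode LF geometry from all views stacked along channels.
        self.stage1 = nn.Sequential(
            nn.Conv2d(3 * n_views, feat, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feat, feat, 3, padding=1), nn.ReLU(inplace=True),
        )
        # Stage-II: restore one view given that view plus the shared encoding.
        self.stage2 = nn.Sequential(
            nn.Conv2d(3 + feat, feat, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feat, 3, 3, padding=1),
        )

    def forward(self, lf):
        # lf: (B, V, 3, H, W) low-light light field.
        b, v, c, h, w = lf.shape
        code = self.stage1(lf.reshape(b, v * c, h, w))   # shared geometry code
        views = [self.stage2(torch.cat([lf[:, i], code], dim=1)) for i in range(v)]
        return torch.stack(views, dim=1)                 # (B, V, 3, H, W)

restored = TwoStageLFRestorer(n_views=25)(torch.rand(1, 25, 3, 64, 64))
```
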

26 citations


Cites background from "Fast and Full-Resolution Light Fiel..."

  • ...This includes tasks such as spatial super-resolution [24]–[27], deblurring [28]–[30], denoising [31]–[35], and depth estimation [7]–[9]....

Journal ArticleDOI
TL;DR: Wang et al. propose a multitask learning mechanism for remote sensing image motion deblurring that contains an image restoration subtask and an image texture complexity recognition subtask; the recognition branch provides auxiliary texture-complexity information to help optimize the restoration branch.
Abstract: As a fundamental preprocessing technique, remote sensing image motion deblurring is important for visual understanding tasks. Most conventional approaches formulate the image motion deblurring task as a kernel estimation. Because kernel estimation is a highly ill-posed problem, many priors have been applied to model the images and kernels. Even though these methods have obtained relatively good performance, they are usually time-consuming and not robust across different conditions. To address this problem, we propose a multitask learning mechanism for remote sensing image motion deblurring in this article, which contains an image restoration subtask and an image texture complexity recognition subtask. First, we consider the image motion deblurring problem as a domain transformation problem, from the blurred domain to a clear one. Specifically, the blurred domain represents the data space consisting of blurred images, and the clear domain is defined similarly. Second, we design a novel weighted attention map loss to enhance the reconstruction capability of the restoration subbranch for difficult local regions. Third, based on the restoration subbranch, a recognition subbranch is incorporated into the framework to guide the deblurring process, providing auxiliary texture complexity information to help optimize the restoration subbranch. Additionally, in order to optimize the proposed network, we construct three large-scale datasets, where each sample contains a clear image, a blurred image, and a texture label derived from the corresponding texture complexity. Finally, experimental results on the three constructed datasets demonstrate the robustness and effectiveness of the proposed method.
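
The abstract does not give the exact form of the weighted attention map loss, so the following PyTorch sketch is one plausible instantiation of the stated goal, upweighting reconstruction error in detail-rich local regions; the weight-map construction is an assumption.

```python
import torch
import torch.nn.functional as F

def weighted_attention_l1(pred, gt, alpha=2.0):
    """L1 loss with extra weight on detail-rich regions (hypothetical form).

    pred, gt: (B, C, H, W). The weight map uses local high-frequency
    energy of the ground truth as a proxy for 'difficult' regions.
    """
    detail = (gt - F.avg_pool2d(gt, kernel_size=5, stride=1, padding=2)).abs()
    w = 1.0 + alpha * detail / (detail.mean() + 1e-8)      # attention map
    return (w * (pred - gt).abs()).mean()

loss = weighted_attention_l1(torch.rand(2, 3, 64, 64), torch.rand(2, 3, 64, 64))
```
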

7 citations

Posted Content
TL;DR: Zhang et al. introduce a large-scale dataset that enables versatile applications for RGB, RGB-D and light field saliency detection, containing 102 classes and 4204 samples.
Abstract: Light field data exhibit favorable characteristics conducive to saliency detection. The success of learning-based light field saliency detection is heavily dependent on how a comprehensive dataset can be constructed for higher generalizability of models, how high-dimensional light field data can be effectively exploited, and how a flexible model can be designed to achieve versatility for desktop computers and mobile devices. To answer these questions, first we introduce a large-scale dataset to enable versatile applications for RGB, RGB-D and light field saliency detection, containing 102 classes and 4204 samples. Second, we present an asymmetrical two-stream model consisting of the Focal stream and RGB stream. The Focal stream is designed to achieve higher performance on desktop computers and transfer focusness knowledge to the RGB stream, relying on two tailor-made modules. The RGB stream guarantees the flexibility and memory/computation efficiency on mobile devices through three distillation schemes. Experiments demonstrate that our Focal stream achieves state-of-the-art performance. The RGB stream achieves Top-2 F-measure on DUTLF-V2 while reducing the model size by 83% and increasing FPS by 5 times compared with the best-performing method. Furthermore, our proposed distillation schemes are applicable to RGB saliency models, achieving impressive performance gains while ensuring flexibility.
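
The three distillation schemes are not detailed in the abstract; this PyTorch sketch only illustrates the general mechanism of distilling a focal-stream teacher into an RGB-stream student at the feature and output levels. All tensors, names, and weights are placeholders.

```python
import torch
import torch.nn.functional as F

def distill_losses(stu_feat, tea_feat, stu_logits, tea_logits, gt_mask,
                   w_feat=0.5, w_soft=0.5):
    """Generic two-level distillation (illustrative, not the paper's schemes)."""
    task = F.binary_cross_entropy_with_logits(stu_logits, gt_mask)  # GT supervision
    feat = F.mse_loss(stu_feat, tea_feat.detach())                  # feature mimicry
    soft = F.binary_cross_entropy_with_logits(
        stu_logits, torch.sigmoid(tea_logits).detach())             # soft targets
    return task + w_feat * feat + w_soft * soft

loss = distill_losses(torch.rand(2, 64, 32, 32), torch.rand(2, 64, 32, 32),
                      torch.randn(2, 1, 32, 32), torch.randn(2, 1, 32, 32),
                      torch.rand(2, 1, 32, 32))
```
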

5 citations

References
Proceedings ArticleDOI
02 Nov 2016
TL;DR: TensorFlow is a machine learning system that operates at large scale and in heterogeneous environments, using dataflow graphs to represent computation, shared state, and the operations that mutate that state.
Abstract: TensorFlow is a machine learning system that operates at large scale and in heterogeneous environments. TensorFlow uses dataflow graphs to represent computation, shared state, and the operations that mutate that state. It maps the nodes of a dataflow graph across many machines in a cluster, and within a machine across multiple computational devices, including multicore CPUs, general-purpose GPUs, and custom-designed ASICs known as Tensor Processing Units (TPUs). This architecture gives flexibility to the application developer: whereas in previous "parameter server" designs the management of shared state is built into the system, TensorFlow enables developers to experiment with novel optimizations and training algorithms. TensorFlow supports a variety of applications, with a focus on training and inference on deep neural networks. Several Google services use TensorFlow in production, we have released it as an open-source project, and it has become widely used for machine learning research. In this paper, we describe the TensorFlow dataflow model and demonstrate the compelling performance that TensorFlow achieves for several real-world applications.
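
The dataflow-graph model is easy to see in modern TensorFlow, where tf.function traces Python code into a graph that the runtime can optimize and place across devices; the paper predates this API, so the snippet below is only a present-day illustration.

```python
import tensorflow as tf

@tf.function  # traces the Python body into a TensorFlow dataflow graph
def dense_layer(x, w, b):
    return tf.nn.relu(tf.matmul(x, w) + b)

x = tf.random.normal([4, 3])
w = tf.Variable(tf.random.normal([3, 2]))  # shared, mutable state
b = tf.Variable(tf.zeros([2]))

y = dense_layer(x, w, b)

# Inspect the traced graph: nodes are operations, edges are tensors.
graph = dense_layer.get_concrete_function(x, w, b).graph
print([op.name for op in graph.get_operations()])
```
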

10,913 citations

Posted Content
TL;DR: This work presents a highly accurate single-image super-resolution (SR) method that uses a very deep convolutional network inspired by the VGG-net used for ImageNet classification, with extremely high learning rates enabled by adjustable gradient clipping.
Abstract: We present a highly accurate single-image super-resolution (SR) method. Our method uses a very deep convolutional network inspired by VGG-net used for ImageNet classification (Simonyan and Zisserman, 2015). We find increasing our network depth shows a significant improvement in accuracy. Our final model uses 20 weight layers. By cascading small filters many times in a deep network structure, contextual information over large image regions is exploited in an efficient way. With very deep networks, however, convergence speed becomes a critical issue during training. We propose a simple yet effective training procedure. We learn residuals only and use extremely high learning rates (10^4 times higher than SRCNN; Dong et al., 2015) enabled by adjustable gradient clipping. Our proposed method performs better than existing methods in accuracy, and visual improvements in our results are easily noticeable.
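
Two ingredients above, residual-only learning and adjustable gradient clipping (clipping gradients to a range scaled by the current learning rate), can be sketched in PyTorch. Depth, theta, and optimizer settings below are placeholders; the paper's model is 20 layers deep.

```python
import torch
import torch.nn as nn

class ResidualSR(nn.Module):
    """VDSR-style net: predict the residual and add it to the input."""
    def __init__(self, depth=8, feat=64):    # the paper uses 20 weight layers
        super().__init__()
        layers = [nn.Conv2d(1, feat, 3, padding=1), nn.ReLU(inplace=True)]
        for _ in range(depth - 2):
            layers += [nn.Conv2d(feat, feat, 3, padding=1), nn.ReLU(inplace=True)]
        layers += [nn.Conv2d(feat, 1, 3, padding=1)]
        self.body = nn.Sequential(*layers)

    def forward(self, x):
        return x + self.body(x)              # residual learning

model = ResidualSR()
lr, theta = 0.1, 0.01                        # high LR; theta is the clipping budget
opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)

x, y = torch.rand(4, 1, 41, 41), torch.rand(4, 1, 41, 41)
loss = nn.functional.mse_loss(model(x), y)
opt.zero_grad()
loss.backward()
# Adjustable gradient clipping: clip to [-theta/lr, theta/lr] so the
# effective parameter update stays bounded even at very high learning rates.
torch.nn.utils.clip_grad_value_(model.parameters(), theta / lr)
opt.step()
```
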

3,628 citations


"Fast and Full-Resolution Light Fiel..." refers methods in this paper

  • ...We draw on a previous study [9] by predicting and adding the residual image with the input to produce the deblurred result....

Posted Content
TL;DR: A small change in the stylization architecture results in a significant qualitative improvement in the generated images, and can be used to train high-performance architectures for real-time image generation.
Abstract: In this paper we revisit the fast stylization method introduced in Ulyanov et al. (2016). We show how a small change in the stylization architecture results in a significant qualitative improvement in the generated images. The change is limited to swapping batch normalization with instance normalization, and applying the latter at both training and testing time. The resulting method can be used to train high-performance architectures for real-time image generation. The code is made available on GitHub at this https URL. The full paper can be found at arXiv:1701.02096.
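
The described change is a one-line swap in most frameworks. A PyTorch sketch (the framework choice is mine; the final Tanh head without normalization mirrors the citing paper's description quoted below):

```python
import torch.nn as nn

def conv_block(cin, cout, norm="instance"):
    """Conv + normalization + ReLU; 'instance' vs 'batch' is the only change."""
    Norm = nn.InstanceNorm2d if norm == "instance" else nn.BatchNorm2d
    # InstanceNorm2d normalizes each sample's feature maps independently and,
    # by default, behaves identically at training and test time.
    return nn.Sequential(nn.Conv2d(cin, cout, 3, padding=1),
                         Norm(cout),
                         nn.ReLU(inplace=True))

# Final layer without normalization, with Tanh activation.
head = nn.Sequential(nn.Conv2d(64, 3, 3, padding=1), nn.Tanh())
```
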

3,118 citations


"Fast and Full-Resolution Light Fiel..." refers methods in this paper

  • ...The residual blocks follow the traditional ResNet model [7] and simple modification is made by applying instance normalization [26] instead of batch normalization to normalize the features’ contrasts....

  • ...This strategy is supported by applying instance normalization [26] and ReLU activation on every 2D convolution layer, except for the last layer 2D Conv4, which utilizes Tanh activation without any normalization....

Proceedings ArticleDOI
01 Jul 2017
TL;DR: This work proposes a multi-scale convolutional neural network that restores sharp images in an end-to-end manner where blur is caused by various sources and presents a new large-scale dataset that provides pairs of realistic blurry image and the corresponding ground truth sharp image that are obtained by a high-speed camera.
Abstract: Non-uniform blind deblurring for general dynamic scenes is a challenging computer vision problem, as blurs arise not only from multiple object motions but also from camera shake and scene depth variation. To remove these complicated motion blurs, conventional energy-optimization-based methods rely on simple assumptions, such as the blur kernel being partially uniform or locally linear. Moreover, recent machine-learning-based methods also depend on synthetic blur datasets generated under these assumptions. This makes conventional deblurring methods fail to remove blurs where the blur kernel is difficult to approximate or parameterize (e.g. object motion boundaries). In this work, we propose a multi-scale convolutional neural network that restores sharp images in an end-to-end manner where blur is caused by various sources. Together with it, we present a multi-scale loss function that mimics conventional coarse-to-fine approaches. Furthermore, we propose a new large-scale dataset that provides pairs of realistic blurry images and the corresponding ground-truth sharp images obtained by a high-speed camera. With the proposed model trained on this dataset, we demonstrate empirically that our method achieves state-of-the-art performance in dynamic scene deblurring, not only qualitatively but also quantitatively.
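
The multi-scale loss mimicking coarse-to-fine deblurring can be sketched as follows, assuming the network emits a prediction at each scale, coarsest first; the equal scale weighting and MSE choice here are placeholders.

```python
import torch
import torch.nn.functional as F

def multiscale_loss(preds, sharp):
    """Average of per-scale MSE terms against downsampled ground truth.

    preds: list of (B, C, Hi, Wi) outputs, coarsest first.
    sharp: (B, C, H, W) full-resolution ground-truth sharp image.
    """
    total = 0.0
    for p in preds:
        gt = F.interpolate(sharp, size=p.shape[-2:], mode="bilinear",
                           align_corners=False)
        total = total + F.mse_loss(p, gt)
    return total / len(preds)

gt = torch.rand(1, 3, 256, 256)
preds = [torch.rand(1, 3, s, s) for s in (64, 128, 256)]
loss = multiscale_loss(preds, gt)
```
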

1,560 citations


"Fast and Full-Resolution Light Fiel..." refers background or methods in this paper

  • ...Network Architecture We use the combination of convolution-deconvolutional and residual styles on the network, which is proven to produce satisfying results on image deblurring [10, 17]....

  • ...Recent blur datasets are only available for 2D or 3D image (video) deblurring [10, 17, 24]....

  • ...Earlier works have achieved state of the art performance for deblurring 2D images and 3D videos [10, 17, 24]....

  • ...These recent works apply image deblurring directly without blur kernel estimation [10, 12, 17, 24]....

Proceedings ArticleDOI
29 Jul 2007
TL;DR: A simple modification to a conventional camera is proposed to insert a patterned occluder within the aperture of the camera lens, creating a coded aperture, and introduces a criterion for depth discriminability which is used to design the preferred aperture pattern.
Abstract: A conventional camera captures blurred versions of scene information away from the plane of focus. Camera systems have been proposed that allow for recording all-focus images, or for extracting depth, but to record both simultaneously has required more extensive hardware and reduced spatial resolution. We propose a simple modification to a conventional camera that allows for the simultaneous recovery of both (a) high resolution image information and (b) depth information adequate for semi-automatic extraction of a layered depth representation of the image. Our modification is to insert a patterned occluder within the aperture of the camera lens, creating a coded aperture. We introduce a criterion for depth discriminability which we use to design the preferred aperture pattern. Using a statistical model of images, we can recover both depth information and an all-focus image from single photographs taken with the modified camera. A layered depth map is then extracted, requiring user-drawn strokes to clarify layer assignments in some cases. The resulting sharp image and layered depth map can be combined for various photographic applications, including automatic scene segmentation, post-exposure refocusing, or re-rendering of the scene from an alternate viewpoint.
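
The key physical fact, that an out-of-focus point spreads into a scaled copy of the aperture pattern, is easy to simulate. A NumPy/SciPy sketch follows; the random binary pattern and nearest-neighbor scaling are illustrative, whereas the paper designs its pattern with a depth-discriminability criterion.

```python
import numpy as np
from scipy.signal import fftconvolve

def scaled_psf(pattern, radius):
    """Resize a binary aperture pattern to a (2r+1)^2 PSF and normalize it."""
    size = 2 * radius + 1
    idx = (np.arange(size) * pattern.shape[0] / size).astype(int)
    psf = pattern[np.ix_(idx, idx)].astype(np.float64)
    return psf / psf.sum()

def defocus(img, pattern, radius):
    """Blur a (H, W) image as if its plane were 'radius' pixels out of focus."""
    if radius == 0:
        return img
    return fftconvolve(img, scaled_psf(pattern, radius), mode="same")

# Toy 7x7 coded pattern (a real design would optimize depth discriminability).
pattern = (np.random.rand(7, 7) > 0.5).astype(np.float64)
pattern[3, 3] = 1.0                              # keep the pattern non-empty
img = np.random.rand(128, 128)
blurred_near = defocus(img, pattern, radius=2)   # slightly defocused plane
blurred_far = defocus(img, pattern, radius=6)    # strongly defocused plane
```

Since the PSF at each depth is a distinct scaled copy of the code, depth can be estimated by testing which scale best explains the observed local blur, which is the inverse problem the paper solves with a statistical image model.
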

1,489 citations


"Fast and Full-Resolution Light Fiel..." refers background in this paper

  • ...[15] introduced a sparse derivative prior that concentrates on the derivatives of low intensity pixels....
