Proceedings ArticleDOI

A Unified Deep Learning Approach for Foveated Rendering & Novel View Synthesis from Sparse RGB-D Light Fields

15 Dec 2020, pp. 1–8



Citations
Journal ArticleDOI


TL;DR: Foveated rendering adapts the image synthesis process to the user's gaze. By exploiting the human visual system's reduced acuity in peripheral vision, it strives to deliver high-quality visual experiences at greatly reduced computational, storage, and transmission costs.
Abstract: Foveated rendering adapts the image synthesis process to the user’s gaze. By exploiting the human visual system’s limitations, in particular in terms of reduced acuity in peripheral vision, it strives to deliver high-quality visual experiences at very reduced computational, storage, and transmission costs. Despite the very substantial progress made in the past decades, the solution landscape is still fragmented, and several research problems remain open. In this work, we present an up-to-date integrative view of the domain from the point of view of the rendering methods employed, discussing general characteristics, commonalities, differences, advantages, and limitations. We cover, in particular, techniques based on adaptive resolution, geometric simplification, shading simplification, chromatic degradation, as well as spatio-temporal deterioration. Next, we review the main areas where foveated rendering is already in use today. We finally point out relevant research issues and analyze research trends.
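To make the adaptive-resolution idea concrete, here is a minimal sketch (not from the survey) that maps each pixel's eccentricity from the gaze point to a coarser shading rate in the periphery. The thresholds, the linear degrees-per-pixel conversion, and the function name are illustrative assumptions:

```python
import numpy as np

def shading_rate_map(width, height, gaze_xy, fov_deg=100.0,
                     inner_deg=5.0, mid_deg=15.0):
    """Illustrative eccentricity-based shading-rate map.

    Returns an integer map: 1 = full rate near the gaze point,
    2 = half rate at mid eccentricity, 4 = quarter rate in the
    far periphery. Thresholds are hypothetical examples.
    """
    ys, xs = np.mgrid[0:height, 0:width]
    # Crude eccentricity in degrees from pixel distance to the gaze point.
    deg_per_px = fov_deg / width
    ecc = np.hypot(xs - gaze_xy[0], ys - gaze_xy[1]) * deg_per_px
    rate = np.full((height, width), 4, dtype=np.int32)
    rate[ecc < mid_deg] = 2
    rate[ecc < inner_deg] = 1
    return rate

rates = shading_rate_map(1920, 1080, gaze_xy=(960, 540))
```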

References
Proceedings ArticleDOI


27 Jun 2016
TL;DR: The authors propose a residual learning framework that eases the training of networks substantially deeper than those used previously; an ensemble of these residual nets won 1st place in the ILSVRC 2015 classification task.
Abstract: Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. We provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth. On the ImageNet dataset we evaluate residual nets with a depth of up to 152 layers—8× deeper than VGG nets [40] but still having lower complexity. An ensemble of these residual nets achieves 3.57% error on the ImageNet test set. This result won the 1st place on the ILSVRC 2015 classification task. We also present analysis on CIFAR-10 with 100 and 1000 layers. The depth of representations is of central importance for many visual recognition tasks. Solely due to our extremely deep representations, we obtain a 28% relative improvement on the COCO object detection dataset. Deep residual nets are foundations of our submissions to the ILSVRC & COCO 2015 competitions, where we also won the 1st places on the tasks of ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.

93,356 citations
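The core reformulation is easy to state in code: instead of learning a mapping H(x) directly, the stacked layers learn a residual F(x) = H(x) − x, and the block outputs F(x) + x via an identity shortcut. A minimal fully-connected sketch; the paper's actual blocks use convolutions and batch normalization:

```python
import numpy as np

def residual_block(x, w1, w2):
    """Sketch of a residual block with an identity shortcut.

    x:  activations, shape (batch, d)
    w1: weights, shape (d, h); w2: weights, shape (h, d)
    The two layers compute the residual function F(x); the
    block returns relu(F(x) + x).
    """
    f = np.maximum(0.0, x @ w1)    # first layer + ReLU
    f = f @ w2                     # second layer: residual F(x)
    return np.maximum(0.0, f + x)  # add shortcut, then ReLU
```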

Proceedings Article


01 Jan 2015
TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
Abstract: We introduce Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments. The method is straightforward to implement, is computationally efficient, has little memory requirements, is invariant to diagonal rescaling of the gradients, and is well suited for problems that are large in terms of data and/or parameters. The method is also appropriate for non-stationary objectives and problems with very noisy and/or sparse gradients. The hyper-parameters have intuitive interpretations and typically require little tuning. Some connections to related algorithms, on which Adam was inspired, are discussed. We also analyze the theoretical convergence properties of the algorithm and provide a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework. Empirical results demonstrate that Adam works well in practice and compares favorably to other stochastic optimization methods. Finally, we discuss AdaMax, a variant of Adam based on the infinity norm.

78,539 citations
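The update itself is compact. A sketch of a single Adam step following the paper's Algorithm 1, with the usual default hyper-parameters:

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3,
              beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update (paper's Algorithm 1).

    m, v are exponential moving averages of the gradient and its
    elementwise square; t is the 1-based step count used for the
    bias correction of those moment estimates.
    """
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad**2
    m_hat = m / (1 - beta1**t)  # bias-corrected first moment
    v_hat = v / (1 - beta2**t)  # bias-corrected second moment
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```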

Journal ArticleDOI


TL;DR: A structural similarity (SSIM) index is proposed for image quality assessment based on the degradation of structural information; it is validated against subjective ratings and state-of-the-art objective methods on a database of images compressed with JPEG and JPEG2000.
Abstract: Objective methods for assessing perceptual image quality traditionally attempted to quantify the visibility of errors (differences) between a distorted image and a reference image using a variety of known properties of the human visual system. Under the assumption that human visual perception is highly adapted for extracting structural information from a scene, we introduce an alternative complementary framework for quality assessment based on the degradation of structural information. As a specific example of this concept, we develop a structural similarity index and demonstrate its promise through a set of intuitive examples, as well as comparison to both subjective ratings and state-of-the-art objective methods on a database of images compressed with JPEG and JPEG2000. A MATLAB implementation of the proposed algorithm is available online at http://www.cns.nyu.edu/~lcv/ssim/.

30,333 citations
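For reference, the index combines luminance, contrast, and structure comparisons between two signals. A single-window sketch; the published implementation applies this within a sliding Gaussian window and averages the local values, so this whole-image variant is for illustration only:

```python
import numpy as np

def ssim_global(x, y, data_range=255.0):
    """Whole-image SSIM between two grayscale images x and y.

    Uses the standard stabilizing constants C1 = (0.01*L)^2 and
    C2 = (0.03*L)^2, where L is the dynamic range.
    """
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx**2 + my**2 + c1) * (vx + vy + c2))
```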

Posted Content


Sergey Ioffe, Christian Szegedy
TL;DR: Batch Normalization normalizes layer inputs for each training mini-batch to reduce internal covariate shift in deep neural networks, allowing much higher learning rates and achieving state-of-the-art performance on ImageNet classification.
Abstract: Training Deep Neural Networks is complicated by the fact that the distribution of each layer's inputs changes during training, as the parameters of the previous layers change. This slows down the training by requiring lower learning rates and careful parameter initialization, and makes it notoriously hard to train models with saturating nonlinearities. We refer to this phenomenon as internal covariate shift, and address the problem by normalizing layer inputs. Our method draws its strength from making normalization a part of the model architecture and performing the normalization for each training mini-batch. Batch Normalization allows us to use much higher learning rates and be less careful about initialization. It also acts as a regularizer, in some cases eliminating the need for Dropout. Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin. Using an ensemble of batch-normalized networks, we improve upon the best published result on ImageNet classification: reaching 4.9% top-5 validation error (and 4.8% test error), exceeding the accuracy of human raters.

17,151 citations
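A training-mode sketch of the transform from the paper's Algorithm 1: normalize each activation using the mini-batch statistics, then scale and shift with the learned parameters gamma and beta:

```python
import numpy as np

def batch_norm_train(x, gamma, beta, eps=1e-5):
    """Batch Normalization over a mini-batch (training mode).

    x has shape (batch, features); gamma and beta have shape
    (features,). Each feature is normalized to zero mean and unit
    variance using mini-batch statistics, then rescaled so the
    layer retains its representational power.
    """
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta
```

At inference time the paper replaces the mini-batch statistics with population estimates accumulated during training, which this sketch omits.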

Book


01 Jan 2006
TL;DR: This dissertation presents a unified solution to photographic focus problems by recording the light field inside the camera: not just the position but also the direction of the light rays striking the image plane.
Abstract: Focusing images well has been difficult since the beginnings of photography in 1839. Three manifestations of the problem are: the chore of having to choose what to focus on before clicking the shutter, the awkward coupling between aperture size and depth of field, and the high optical complexity of lenses required to produce aberration-free images. These problems arise because conventional cameras record only the sum of all light rays striking each pixel on the image plane. This dissertation presents a unified solution to these focus problems by instead recording the light field inside the camera: not just the position but also the direction of light rays striking the image plane. I describe the design, prototyping and performance of a digital camera that records this light field in a single photographic exposure. The basic idea is to use an array of microlenses in front of the photosensor in a regular digital camera. The main price behind this new kind of photography is the sacrifice of some image resolution to collect directional ray information. However, it is possible to smoothly vary the optical configuration from the light field camera back to a conventional camera by reducing the separation between the microlenses and photosensor. This allows a selectable trade-off between image resolution and refocusing power. More importantly, current semiconductor technology is already capable of producing sensors with an order of magnitude more resolution than we need in final images. The extra ray directional information enables unprecedented capabilities after exposure. For example, it is possible to compute final photographs that are refocused at different depths, or that have extended depth of field, by re-sorting the recorded light rays appropriately. Theory predicts, and experiments corroborate, that blur due to incorrect focus can be reduced by a factor approximately equal to the directional resolution of the recorded light rays. Similarly, digital correction of lens aberrations re-sorts aberrant light rays to where they should ideally have converged, improving image contrast and resolution. Future cameras based on these principles will be physically simpler, capture light more quickly, and provide greater flexibility in finishing photographs.

520 citations
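The post-capture refocusing described above is commonly implemented as shift-and-add over the sub-aperture images extracted from the microlens data: each view is translated in proportion to its position in the aperture, then the views are averaged. A simplified integer-shift sketch; the dissertation's formulation uses continuous resampling, and the array layout and alpha parameterization here are assumptions:

```python
import numpy as np

def refocus(subapertures, alpha):
    """Synthetic refocusing by shift-and-add.

    subapertures: array of shape (U, V, H, W) holding the
    sub-aperture images extracted from the microlens array;
    alpha controls the virtual focal depth (0 leaves the
    original focus; larger magnitudes refocus nearer/farther).
    """
    U, V, H, W = subapertures.shape
    out = np.zeros((H, W), dtype=np.float64)
    cu, cv = (U - 1) / 2.0, (V - 1) / 2.0
    for u in range(U):
        for v in range(V):
            # Shift each view in proportion to its aperture offset.
            du = int(round(alpha * (u - cu)))
            dv = int(round(alpha * (v - cv)))
            out += np.roll(subapertures[u, v], (du, dv), axis=(0, 1))
    return out / (U * V)  # average the re-sorted rays
```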