Proceedings ArticleDOI

A Unified Deep Learning Approach for Foveated Rendering & Novel View Synthesis from Sparse RGB-D Light Fields

TL;DR: In this paper, an end-to-end convolutional neural network was designed to perform both foveated reconstruction and view synthesis using only 1.2% of the total light field data.
Abstract: Near-eye light field displays provide a solution to visual discomfort when using head mounted displays by presenting accurate depth and focal cues. However, light field HMDs require rendering the scene from a large number of viewpoints. This paper tackles the computational challenge of rendering sharp imagery in the foveal region while reproducing the retinal defocus blur that correctly drives accommodation. We designed a novel end-to-end convolutional neural network that leverages human vision to perform both foveated reconstruction and view synthesis using only 1.2% of the total light field data. The proposed architecture comprises a log-polar sampling scheme followed by an interpolation stage and a convolutional neural network. To the best of our knowledge, this is the first attempt to synthesize the entire light field from sparse RGB-D inputs while simultaneously addressing foveated rendering for computational displays. Our algorithm achieves high fidelity in the fovea without any perceptible artifacts in the peripheral regions. The performance in the fovea is comparable to state-of-the-art view synthesis methods, despite using around 10x less light field data.
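A minimal sketch of the log-polar sampling idea described in the abstract: sample densely around the gaze point and increasingly sparsely toward the periphery, so only a small fraction of each view needs to be rendered. Function names and parameters are illustrative assumptions, not the authors' code.

```python
import numpy as np

def log_polar_samples(height, width, gaze_xy, n_rings=64, n_angles=128, r_min=1.0):
    """Return (y, x) sample coordinates on a log-polar grid centred on the gaze point."""
    gx, gy = gaze_xy
    r_max = np.hypot(max(gx, width - gx), max(gy, height - gy))
    # Ring radii grow exponentially: dense sampling in the fovea, sparse in the periphery.
    radii = np.geomspace(r_min, r_max, n_rings)
    angles = np.linspace(0.0, 2.0 * np.pi, n_angles, endpoint=False)
    rr, aa = np.meshgrid(radii, angles, indexing="ij")
    xs = np.clip(gx + rr * np.cos(aa), 0, width - 1)
    ys = np.clip(gy + rr * np.sin(aa), 0, height - 1)
    return ys, xs

def sample_view(img, gaze_xy, **kw):
    """Gather nearest-neighbour samples; a full pipeline would interpolate these
    back to a dense image before feeding the reconstruction CNN."""
    ys, xs = log_polar_samples(img.shape[0], img.shape[1], gaze_xy, **kw)
    return img[ys.round().astype(int), xs.round().astype(int)]

# Example: 64 x 128 samples of a 1080p view is roughly 0.4% of its pixels.
view = np.random.rand(1080, 1920, 3).astype(np.float32)
foveated = sample_view(view, gaze_xy=(960, 540))
print(foveated.shape)  # (64, 128, 3)
```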
Citations
Journal ArticleDOI
TL;DR: Foveated rendering adapts the image synthesis process to the user's gaze. By exploiting the human visual system's limitations, in particular its reduced acuity in peripheral vision, it strives to deliver high-quality visual experiences at greatly reduced computational, storage, and transmission costs.

6 citations

Journal ArticleDOI
TL;DR: This paper revisits the depth estimation problem, avoiding the explicit stereo matching step by using a simple two-tower convolutional neural network. The proposed algorithm, named 2T-UNet, surpasses state-of-the-art monocular and stereo depth estimation methods on the challenging Scene flow dataset.
Abstract: Stereo correspondence matching is an essential part of the multi-step stereo depth estimation process. This paper revisits the depth estimation problem, avoiding the explicit stereo matching step by using a simple two-tower convolutional neural network. The proposed algorithm is named 2T-UNet. The idea behind 2T-UNet is to replace cost volume construction with twin convolution towers, which are allowed to have different weights. Additionally, the inputs to the twin encoders in 2T-UNet differ from those of existing stereo methods. Generally, a stereo network takes a right and left image pair as input to determine the scene geometry. In the 2T-UNet model, however, the right stereo image is taken as one input, and the left stereo image along with its monocular depth clue is taken as the other input. Depth clues provide complementary suggestions that help enhance the quality of the predicted scene geometry. 2T-UNet surpasses state-of-the-art monocular and stereo depth estimation methods on the challenging Scene flow dataset, both quantitatively and qualitatively. The architecture performs remarkably well on complex natural scenes, highlighting its usefulness for various real-time applications. Pretrained weights and code will be made readily available.
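A minimal PyTorch sketch of the two-tower idea described above: two encoders with unshared weights, one fed the right image and the other the left image concatenated with its monocular depth clue. Layer sizes and names are illustrative assumptions, not the published 2T-UNet architecture.

```python
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
    )

class TwoTowerDepthNet(nn.Module):
    def __init__(self, feat=32):
        super().__init__()
        # Twin encoders with different weights replace explicit cost-volume construction.
        self.right_tower = conv_block(3, feat)   # right RGB image
        self.left_tower = conv_block(4, feat)    # left RGB + monocular depth clue
        self.decoder = nn.Sequential(conv_block(2 * feat, feat),
                                     nn.Conv2d(feat, 1, 1))  # per-pixel depth

    def forward(self, right_rgb, left_rgb, left_mono_depth):
        f_r = self.right_tower(right_rgb)
        f_l = self.left_tower(torch.cat([left_rgb, left_mono_depth], dim=1))
        return self.decoder(torch.cat([f_r, f_l], dim=1))

# Example forward pass on dummy stereo data.
net = TwoTowerDepthNet()
right = torch.rand(1, 3, 128, 160)
left = torch.rand(1, 3, 128, 160)
mono = torch.rand(1, 1, 128, 160)
print(net(right, left, mono).shape)  # torch.Size([1, 1, 128, 160])
```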
References
Proceedings ArticleDOI
21 Jul 2017
TL;DR: This paper takes advantage of the clear texture structure of the epipolar plane image (EPI) in light field data and models the problem of light field reconstruction from a sparse set of views as CNN-based angular detail restoration on EPIs.
Abstract: In this paper, we take advantage of the clear texture structure of the epipolar plane image (EPI) in light field data and model the problem of light field reconstruction from a sparse set of views as CNN-based angular detail restoration on EPIs. We indicate that one of the main challenges in sparsely sampled light field reconstruction is the information asymmetry between the spatial and angular domains, where the detail portion in the angular domain is damaged by undersampling. To balance the spatial and angular information, the spatial high-frequency components of an EPI are removed using EPI blur before the EPI is fed to the network. Finally, a non-blind deblur operation is used to recover the spatial detail suppressed by the EPI blur. We evaluate our approach on several datasets including synthetic scenes, real-world scenes and challenging microscope light field data. We demonstrate the high performance and robustness of the proposed framework compared with state-of-the-art algorithms. We also show a further application for depth enhancement by using the reconstructed light field.
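A schematic sketch of the processing order described in the abstract (blur, angular restoration, non-blind deblur). The blur kernel, the restoration step, and the deblur are stand-ins under stated assumptions, not the authors' implementation; in particular, plain interpolation replaces the trained CNN here.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

def epi_blur(epi, sigma=1.5):
    # Suppress spatial high frequencies (spatial axis only) so spatial and
    # angular information are balanced before the restoration step.
    return gaussian_filter1d(epi, sigma=sigma, axis=1)

def angular_restoration(blurred_epi, factor=4):
    # Placeholder for the trained CNN: linear interpolation along the angular
    # axis stands in for the learned angular detail restoration.
    n_views, width = blurred_epi.shape
    dense = np.linspace(0, n_views - 1, (n_views - 1) * factor + 1)
    idx = np.arange(n_views)
    return np.stack([np.interp(dense, idx, blurred_epi[:, x]) for x in range(width)], axis=1)

def non_blind_deblur(epi, sigma=1.5, eps=1e-2):
    # Wiener-style inverse of the known Gaussian blur to recover spatial detail.
    taps = np.exp(-0.5 * (np.arange(-7, 8) / sigma) ** 2)
    taps /= taps.sum()
    kernel = np.zeros(epi.shape[1])
    kernel[:taps.size] = taps
    kernel = np.roll(kernel, -7)              # centre the kernel at index 0 for the FFT
    K = np.fft.fft(kernel)
    E = np.fft.fft(epi, axis=1)
    return np.real(np.fft.ifft(E * np.conj(K) / (np.abs(K) ** 2 + eps), axis=1))

sparse_epi = np.random.rand(4, 256)           # 4 input views, 256 spatial samples
dense_epi = non_blind_deblur(angular_restoration(epi_blur(sparse_epi)))
print(dense_epi.shape)                        # (13, 256): 13 reconstructed views
```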

184 citations

Posted Content
TL;DR: This work presents a machine learning algorithm that takes as input a 2D RGB image and synthesizes a 4D RGBD light field (color and depth of the scene in each ray direction). The method is unique in predicting RGBD for each light field ray and in improving unsupervised single-image depth estimation by enforcing consistency of ray depths that should intersect the same scene point.
Abstract: We present a machine learning algorithm that takes as input a 2D RGB image and synthesizes a 4D RGBD light field (color and depth of the scene in each ray direction). For training, we introduce the largest public light field dataset, consisting of over 3300 plenoptic camera light fields of scenes containing flowers and plants. Our synthesis pipeline consists of a convolutional neural network (CNN) that estimates scene geometry, a stage that renders a Lambertian light field using that geometry, and a second CNN that predicts occluded rays and non-Lambertian effects. Our algorithm builds on recent view synthesis methods, but is unique in predicting RGBD for each light field ray and improving unsupervised single image depth estimation by enforcing consistency of ray depths that should intersect the same scene point. Please see our supplementary video at this https URL
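An illustrative sketch of the middle stage of the pipeline described above: given a central view and an estimated depth map, backward-warp the image to other (u, v) positions on the light field plane under a Lambertian assumption. In the paper a second CNN then fills occluded rays and non-Lambertian effects; the disparity model and constant depth used here are assumptions for illustration only.

```python
import numpy as np

def render_lambertian_view(rgb, depth, du, dv, baseline=1.0):
    """Backward-warp the central view by the disparity implied by depth:
    disparity ~ baseline * (du, dv) / depth (a common pinhole approximation)."""
    h, w, _ = rgb.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float32)
    disp = baseline / np.maximum(depth, 1e-3)
    src_x = np.clip(xs + du * disp, 0, w - 1).round().astype(int)
    src_y = np.clip(ys + dv * disp, 0, h - 1).round().astype(int)
    return rgb[src_y, src_x]

# Synthesise a 7x7 grid of sub-aperture views around the input view.
rgb = np.random.rand(256, 256, 3).astype(np.float32)
depth = np.full((256, 256), 5.0, dtype=np.float32)   # stand-in for the CNN's depth estimate
light_field = np.stack([
    render_lambertian_view(rgb, depth, du, dv)
    for dv in range(-3, 4) for du in range(-3, 4)
]).reshape(7, 7, 256, 256, 3)
print(light_field.shape)  # (7, 7, 256, 256, 3)
```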

150 citations

Journal ArticleDOI
Kaan Akşit, Ward Lopes, Jonghyun Kim, Peter Shirley, David Luebke
TL;DR: A new optical design for see-through near-eye displays is presented that is simple, compact, varifocal, and provides a wide field of view with clear peripheral vision and a large eyebox. The work also establishes fundamental trade-offs between resolution, field of view, and the form factor of the design.
Abstract: We present a new optical design for see-through near-eye displays that is simple, compact, varifocal, and provides a wide field of view with clear peripheral vision and a large eyebox. Key to this effort is a novel see-through rear-projection screen. We project an image to the see-through screen using an off-axis path, which is then relayed to the user's eyes through an on-axis partially-reflective magnifying surface. Converting the off-axis path to a compact on-axis imaging path simplifies the optical design. We establish fundamental trade-offs between the quantitative parameters of resolution, field of view, and the form factor of our design. We demonstrate a wearable binocular near-eye display using off-the-shelf projection displays, custom-designed see-through spherical concave mirrors, and see-through screen designs using either custom holographic optical elements or polarization-selective diffusers.

112 citations

Journal ArticleDOI
TL;DR: This work explores a novel foveated reconstruction method that employs the recent advances in generative adversarial neural networks to reconstruct a plausible peripheral video from a small fraction of pixels provided every frame.
Abstract: In order to provide an immersive visual experience, modern displays require head mounting, high image resolution, low latency, as well as high refresh rate. This poses a challenging computational problem. On the other hand, the human visual system can consume only a tiny fraction of this video stream due to the drastic acuity loss in the peripheral vision. Foveated rendering and compression can save computations by reducing the image quality in the peripheral vision. However, this can cause noticeable artifacts in the periphery, or, if done conservatively, would provide only modest savings. In this work, we explore a novel foveated reconstruction method that employs the recent advances in generative adversarial neural networks. We reconstruct a plausible peripheral video from a small fraction of pixels provided every frame. The reconstruction is done by finding the closest matching video to this sparse input stream of pixels on the learned manifold of natural videos. Our method is more efficient than the state-of-the-art foveated rendering, while providing the visual experience with no noticeable quality degradation. We conducted a user study to validate our reconstruction method and compare it against existing foveated rendering and video compression techniques. Our method is fast enough to drive gaze-contingent head-mounted displays in real time on modern hardware. We plan to publish the trained network to establish a new quality bar for foveated rendering and compression as well as encourage follow-up research.
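A small sketch, illustrative rather than the paper's method, of the gaze-contingent sparse sampling that feeds such a reconstruction network: the probability of rendering a pixel falls off with eccentricity, so only a few percent of pixels are provided each frame and the periphery is left to the generative model. The fall-off function and rates are assumptions.

```python
import numpy as np

def foveated_mask(height, width, gaze_xy, fovea_radius=80.0, peripheral_rate=0.02, seed=0):
    """Boolean mask of pixels to render this frame, dense at the gaze, sparse in the periphery."""
    rng = np.random.default_rng(seed)
    ys, xs = np.mgrid[0:height, 0:width]
    ecc = np.hypot(xs - gaze_xy[0], ys - gaze_xy[1])
    # Keep everything inside the fovea; sample the periphery ever more sparsely.
    keep_prob = np.clip(fovea_radius / np.maximum(ecc, fovea_radius), peripheral_rate, 1.0)
    return rng.random((height, width)) < keep_prob

mask = foveated_mask(1080, 1920, gaze_xy=(960, 540))
print(f"pixels rendered this frame: {mask.mean():.1%}")
```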

104 citations

Journal ArticleDOI
TL;DR: In this article, a focus-tunable lens is driven to sweep a range of focal lengths at a high frequency, and the focal length is subsequently tracked precisely at microsecond time resolutions using an optical module.
Abstract: We present a virtual reality display that is capable of generating a dense collection of depth/focal planes. This is achieved by driving a focus-tunable lens to sweep a range of focal lengths at a high frequency and, subsequently, tracking the focal length precisely at microsecond time resolutions using an optical module. Precise tracking of the focal length, coupled with a high-speed display, enables our lab prototype to generate 1600 focal planes per second. This enables a novel first-of-its-kind virtual reality multifocal display that is capable of resolving the vergence-accommodation conflict endemic to today's displays.
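A back-of-the-envelope sketch relating the high-speed display and the lens sweep to the focal-plane budget reported above. The display rate and sweep frequency below are assumed numbers for illustration, not values from the paper.

```python
# If each displayed frame can be timed to a tracked focal length, the number of
# focal planes per second equals the display frame rate (assumed values below).
display_fps = 1600        # assumed high-speed display rate, frames per second
sweep_hz = 100            # assumed focus-tunable lens sweep frequency

planes_per_second = display_fps
planes_per_sweep = display_fps // sweep_hz
print(planes_per_second, planes_per_sweep)  # 1600 focal planes/s, 16 planes per sweep
```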

62 citations