Author

Kadvekar Rohit Tushar

Bio: Kadvekar Rohit Tushar is an academic researcher from the Indian Institute of Technology Madras. The author has contributed to research in the topics Deep learning & Depth map, has an h-index of 1, and has co-authored 2 publications receiving 1 citation.

Papers
Proceedings ArticleDOI
15 Dec 2020
TL;DR: In this paper, the authors propose a novel Bilateral Grid based 3D convolutional neural network, dubbed 3DBG-UNet, that parameterizes high-dimensional feature space by encoding compact 3D bilateral grids with UNets and infers a sharp geometric layout of the scene.
Abstract: The task of predicting smooth and edge-consistent depth maps is notoriously difficult for single-image depth estimation. This paper proposes a novel Bilateral Grid based 3D convolutional neural network, dubbed 3DBG-UNet, that parameterizes high-dimensional feature space by encoding compact 3D bilateral grids with UNets and infers a sharp geometric layout of the scene. Further, another novel model, 3DBGES-UNet, is introduced that integrates 3DBG-UNet for inferring an accurate depth map given a single color view. The 3DBGES-UNet concatenates the 3DBG-UNet geometry map with an inception-network edge-accentuation map and a spatial object-boundary map obtained by leveraging semantic segmentation, and trains the UNet model with a ResNet backbone. Both models are designed with particular attention to explicitly accounting for edges and minute details. Preserving sharp discontinuities at depth edges is critical for many applications, such as realistic integration of virtual objects in AR video or occlusion-aware view synthesis for 3D display applications. The proposed depth prediction network achieves state-of-the-art performance in both qualitative and quantitative evaluations on the challenging NYUv2-Depth data. The code and corresponding pre-trained weights will be made publicly available.
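No reference implementation is shown on this page; as a rough illustration of the fusion step the abstract describes, the following PyTorch sketch concatenates a geometry map, an edge-accentuation map, and a semantic boundary map, then refines them with a small convolutional head. The module name, channel widths, and resolutions are assumptions, not the authors' design.

```python
# Minimal sketch (not the authors' code): fusing a 3DBG-UNet geometry map
# with edge and boundary cues before a final refinement network, as the
# abstract describes. Channel sizes and module names are assumptions.
import torch
import torch.nn as nn

class FusionHead(nn.Module):
    def __init__(self, feat_ch=16):
        super().__init__()
        # Each cue map is assumed to be a 1-channel HxW tensor.
        self.refine = nn.Sequential(
            nn.Conv2d(3, feat_ch, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(feat_ch, feat_ch, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(feat_ch, 1, kernel_size=3, padding=1),  # final depth map
        )

    def forward(self, geometry, edges, boundaries):
        # Concatenate the three cues along the channel axis, then refine.
        x = torch.cat([geometry, edges, boundaries], dim=1)
        return self.refine(x)

if __name__ == "__main__":
    head = FusionHead()
    g = torch.rand(1, 1, 240, 320)   # 3DBG-UNet geometry map
    e = torch.rand(1, 1, 240, 320)   # inception edge-accentuation map
    b = torch.rand(1, 1, 240, 320)   # semantic boundary map
    print(head(g, e, b).shape)       # torch.Size([1, 1, 240, 320])
```

In the actual 3DBGES-UNet, the refinement network is a UNet with a ResNet backbone rather than the three plain convolutions used here.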

6 citations

Posted Content
TL;DR: In this paper, the authors propose a novel Bilateral Grid based 3D convolutional neural network that parameterizes high-dimensional feature space by encoding compact 3D bilateral grids with UNets and infers a sharp geometric layout of the scene.
Abstract: The task of predicting smooth and edge-consistent depth maps is notoriously difficult for single-image depth estimation. This paper proposes a novel Bilateral Grid based 3D convolutional neural network, dubbed 3DBG-UNet, that parameterizes high-dimensional feature space by encoding compact 3D bilateral grids with UNets and infers a sharp geometric layout of the scene. Further, another novel 3DBGES-UNet model is introduced that integrates 3DBG-UNet for inferring an accurate depth map given a single color view. The 3DBGES-UNet concatenates the 3DBG-UNet geometry map with an inception-network edge-accentuation map and a spatial object-boundary map obtained by leveraging semantic segmentation, and trains the UNet model with a ResNet backbone. Both models are designed with particular attention to explicitly accounting for edges and minute details. Preserving sharp discontinuities at depth edges is critical for many applications, such as realistic integration of virtual objects in AR video or occlusion-aware view synthesis for 3D display applications. The proposed depth prediction network achieves state-of-the-art performance in both qualitative and quantitative evaluations on the challenging NYUv2-Depth data. The code and corresponding pre-trained weights will be made publicly available.

Cited by
Journal ArticleDOI
TL;DR: In this paper, the authors develop a deep learning framework, DL-ROM (deep learning reduced-order modeling), to create a neural network capable of non-linear projections to reduced-order states.
Abstract: Reduced order modeling (ROM) has been widely used to create lower-order, computationally inexpensive representations of higher-order dynamical systems. Using these representations, ROMs can efficiently model flow fields while using significantly fewer parameters. Conventional ROMs accomplish this by linearly projecting higher-order manifolds to a lower-dimensional space using dimensionality reduction techniques such as proper orthogonal decomposition (POD). In this work, we develop a novel deep learning framework, DL-ROM (deep learning reduced-order modeling), to create a neural network capable of non-linear projections to reduced-order states. We then use the learned reduced state to efficiently predict future time steps of the simulation using 3D Autoencoder and 3D U-Net based architectures. Our DL-ROM model can create highly accurate reconstructions from the learned ROM and is thus able to efficiently predict future time steps by temporally traversing the learned reduced state. All of this is achieved without ground truth supervision or needing to iteratively solve the expensive Navier–Stokes (NS) equations, thereby resulting in massive computational savings. To test the effectiveness and performance of our approach, we evaluate our implementation on five different computational fluid dynamics (CFD) datasets using reconstruction performance and computational runtime metrics. DL-ROM can reduce the computational run times of iterative solvers by nearly two orders of magnitude while maintaining an acceptable error threshold.
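As a minimal sketch of the non-linear projection idea behind DL-ROM (assuming a plain 3D convolutional autoencoder; all layer sizes are illustrative, not the published architecture):

```python
# Minimal sketch (not the DL-ROM release): a 3D convolutional autoencoder
# that non-linearly projects a flow-field snapshot to a reduced-order state
# and reconstructs it. All layer sizes are illustrative assumptions.
import torch
import torch.nn as nn

class TinyROMAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(           # snapshot -> reduced state
            nn.Conv3d(1, 8, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv3d(8, 16, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        self.decoder = nn.Sequential(           # reduced state -> snapshot
            nn.ConvTranspose3d(16, 8, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose3d(8, 1, 4, stride=2, padding=1),
        )

    def forward(self, x):
        z = self.encoder(x)              # learned reduced-order state
        return self.decoder(z), z

if __name__ == "__main__":
    model = TinyROMAutoencoder()
    snapshot = torch.rand(1, 1, 32, 32, 32)   # one flow-field volume
    recon, z = model(snapshot)
    print(recon.shape, z.shape)          # reconstruction and reduced state
```

Per the abstract, future time steps would then be predicted by temporally traversing the learned reduced state z (e.g. with a 3D U-Net style predictor) rather than by re-solving the NS equations.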

39 citations

Journal ArticleDOI
TL;DR: In this article, a dual-encoder single-decoder CNN with different weights for feature fusion is proposed for depth estimation of multi-exposure stereo image sequences in 3D HDR video content.
Abstract: Display technologies have evolved over the years. It is critical to develop practical HDR capturing, processing, and display solutions to bring 3D technologies to the next level. Depth estimation of multi-exposure stereo image sequences is an essential task in the development of cost-effective 3D HDR video content. In this paper, we develop a novel deep architecture for multi-exposure stereo depth estimation. The proposed architecture has two novel components. First, the stereo matching technique used in traditional stereo depth estimation is revamped: for the stereo depth estimation component of our architecture, a mono-to-stereo transfer learning approach is deployed. The proposed formulation circumvents the cost volume construction requirement, which is replaced by a ResNet-based dual-encoder single-decoder CNN with different weights for feature fusion; EfficientNet-based blocks are used to learn the disparity. Second, we combine disparity maps obtained from the stereo images at different exposure levels using a robust disparity feature fusion approach. The disparity maps obtained at different exposures are merged using weight maps calculated for different quality measures. The final predicted disparity map is more robust and retains the best features that preserve depth discontinuities. The proposed CNN offers the flexibility to train using standard dynamic range stereo data or multi-exposure low dynamic range stereo sequences. In terms of performance, the proposed model surpasses state-of-the-art monocular and stereo depth estimation methods, both quantitatively and qualitatively, on the challenging Scene Flow and differently exposed Middlebury stereo datasets. The architecture performs exceedingly well on complex natural scenes, demonstrating its usefulness for diverse 3D HDR applications.
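A minimal sketch of the exposure-level disparity fusion described above, assuming per-pixel quality weight maps that are normalized to sum to one at each pixel; the actual quality measures used by the authors are not specified here.

```python
# Minimal sketch (an assumption, not the authors' code): merging disparity
# maps predicted at different exposure levels with per-pixel quality weight
# maps, normalized so the weights sum to one at every pixel.
import torch

def fuse_disparities(disparities, weights, eps=1e-8):
    """disparities, weights: lists of (H, W) tensors, one per exposure."""
    d = torch.stack(disparities)                 # (E, H, W)
    w = torch.stack(weights)                     # (E, H, W), non-negative
    w = w / (w.sum(dim=0, keepdim=True) + eps)   # per-pixel normalization
    return (w * d).sum(dim=0)                    # fused disparity map

if __name__ == "__main__":
    disps = [torch.rand(240, 320) for _ in range(3)]   # 3 exposures
    qual = [torch.rand(240, 320) for _ in range(3)]    # quality measures
    print(fuse_disparities(disps, qual).shape)         # torch.Size([240, 320])
```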
Journal ArticleDOI
TL;DR: The depth estimation problem is revisited, avoiding the explicit stereo matching step by using a simple two-tower convolutional neural network; the proposed algorithm, entitled 2T-UNet, surpasses state-of-the-art monocular and stereo depth estimation methods on the challenging Scene Flow dataset.
Abstract: Stereo correspondence matching is an essential part of the multi-step stereo depth estimation process. This paper revisits the depth estimation problem, avoiding the explicit stereo matching step by using a simple two-tower convolutional neural network. The proposed algorithm is entitled 2T-UNet. The idea behind 2T-UNet is to replace cost volume construction with twin convolution towers. These towers have an allowance for different weights between them. Additionally, the inputs to the twin encoders in 2T-UNet differ from those of existing stereo methods. Generally, a stereo network takes a right and left image pair as input to determine the scene geometry. However, in the 2T-UNet model, the right stereo image is taken as one input, and the left stereo image, along with its monocular depth clue information, is taken as the other input. Depth clues provide complementary suggestions that help enhance the quality of the predicted scene geometry. The 2T-UNet surpasses state-of-the-art monocular and stereo depth estimation methods on the challenging Scene Flow dataset, both quantitatively and qualitatively. The architecture performs remarkably well on complex natural scenes, highlighting its usefulness for various real-time applications. Pretrained weights and code will be made readily available.
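As an illustrative sketch of the twin-tower idea (not the released 2T-UNet; channel widths, depths, and the fusion head are assumptions): two encoders with unshared weights take the asymmetric inputs the abstract describes, and their features are concatenated instead of being matched through a cost volume.

```python
# Minimal sketch (illustrative, not the released 2T-UNet): twin convolution
# towers with unshared weights and asymmetric inputs, as the abstract
# describes: right image (RGB) in one tower, left image plus its monocular
# depth clue (RGB-D) in the other. Channel widths are assumptions.
import torch
import torch.nn as nn

def tower(in_ch, width=16):
    return nn.Sequential(
        nn.Conv2d(in_ch, width, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(width, width, 3, stride=2, padding=1), nn.ReLU(inplace=True),
    )

class TwinTowers(nn.Module):
    def __init__(self):
        super().__init__()
        self.right_tower = tower(in_ch=3)   # right RGB image
        self.left_tower = tower(in_ch=4)    # left RGB image + depth clue
        self.head = nn.Conv2d(32, 1, 3, padding=1)  # fused features -> disparity

    def forward(self, right_rgb, left_rgbd):
        fr = self.right_tower(right_rgb)
        fl = self.left_tower(left_rgbd)
        # No cost volume: features are simply concatenated and decoded.
        return self.head(torch.cat([fr, fl], dim=1))

if __name__ == "__main__":
    net = TwinTowers()
    right = torch.rand(1, 3, 256, 512)
    left_d = torch.rand(1, 4, 256, 512)
    print(net(right, left_d).shape)   # torch.Size([1, 1, 64, 128])
```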
Journal ArticleDOI
TL;DR: In this paper, the authors propose an unsupervised monocular image depth prediction algorithm based on Fourier domain analysis that takes advantage of the complementary properties of small-scale and large-scale images.
Abstract: Aiming at the problems of high cost and low accuracy of scene details during depth map generation in 3D reconstruction, we propose an unsupervised monocular image depth prediction algorithm based on Fourier domain analysis. Generally speaking, small-scale images better display depth details, while large-scale images more reliably display the depth distribution of the entire image. In order to take advantage of these complementary properties, we crop the input image at different ratios to generate multiple disparity map candidates, and then use Fourier frequency-domain analysis algorithms to fuse the disparity map candidates into left and right disparity maps. At the same time, we propose a loss function based on MSSIM to compensate for the difference between the left and right views and realize unsupervised training of the monocular image depth prediction model. Experimental results show that our method performs well on the KITTI dataset.
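One plausible reading of the Fourier-domain fusion step, sketched below under the assumption of a hard low-pass cutoff: keep low frequencies from the full-scale candidate (reliable global depth distribution) and high frequencies from the small-scale candidate (sharper depth details). The cutoff value and the hard mask are illustrative choices, not the paper's exact algorithm.

```python
# Minimal sketch (one plausible reading of the abstract, not the authors'
# code): fuse two disparity candidates in the Fourier domain, taking low
# frequencies from the large-scale candidate (reliable global depth) and
# high frequencies from the small-scale candidate (sharper details).
import torch

def fourier_fuse(large_scale, small_scale, cutoff=0.1):
    """large_scale, small_scale: (H, W) disparity candidates."""
    H, W = large_scale.shape
    fy = torch.fft.fftfreq(H).abs().unsqueeze(1)   # (H, 1) row frequencies
    fx = torch.fft.fftfreq(W).abs().unsqueeze(0)   # (1, W) column frequencies
    low_pass = ((fy < cutoff) & (fx < cutoff)).to(large_scale.dtype)

    F_large = torch.fft.fft2(large_scale)
    F_small = torch.fft.fft2(small_scale)
    fused = F_large * low_pass + F_small * (1.0 - low_pass)
    return torch.fft.ifft2(fused).real

if __name__ == "__main__":
    a = torch.rand(240, 320)          # candidate from the full image
    b = torch.rand(240, 320)          # candidate from a cropped image
    print(fourier_fuse(a, b).shape)   # torch.Size([240, 320])
```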
Journal ArticleDOI
01 Mar 2023 - Sensors
TL;DR: NDWTN, proposed in this paper, is a moderately dense encoder-decoder network based on discrete wavelet decomposition and trainable coefficients (LL, LH, HL, HH) that preserves the high-frequency information otherwise lost during the downsampling process in the encoder.
Abstract: Applications such as medical diagnosis, navigation, robotics, etc., require 3D images. Recently, deep learning networks have been extensively applied to estimate depth. Depth prediction from 2D images is both an ill-posed and a non-linear problem. Such networks are computationally expensive and time-consuming, as they have dense configurations. Further, network performance depends on the trained model configuration, the loss functions used, and the dataset applied for training. We propose a moderately dense encoder-decoder network based on discrete wavelet decomposition and trainable coefficients (LL, LH, HL, HH). Our Nested Wavelet-Net (NDWTN) preserves the high-frequency information that is otherwise lost during the downsampling process in the encoder. Furthermore, we study the effect of activation functions, batch normalization, convolution layers, skip connections, etc., in our models. The network is trained on the NYU dataset. Our network trains faster while achieving good results.
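As a minimal sketch of the trainable wavelet downsampling idea (an illustration, not the NDWTN release): a single 2D convolution whose four filters are initialized to the Haar LL, LH, HL, and HH subbands and left trainable, so high-frequency detail survives the resolution reduction instead of being discarded by pooling.

```python
# Minimal sketch (an illustration, not the NDWTN release): a downsampling
# block whose four filters are initialized to the Haar wavelet subbands
# (LL, LH, HL, HH) and kept trainable.
import torch
import torch.nn as nn

class TrainableHaarDWT(nn.Module):
    def __init__(self):
        super().__init__()
        ll = torch.tensor([[0.5, 0.5], [0.5, 0.5]])     # low-low (average)
        lh = torch.tensor([[0.5, 0.5], [-0.5, -0.5]])   # low-high (vertical)
        hl = torch.tensor([[0.5, -0.5], [0.5, -0.5]])   # high-low (horizontal)
        hh = torch.tensor([[0.5, -0.5], [-0.5, 0.5]])   # high-high (diagonal)
        # (4, 1, 2, 2) kernel: one output subband per filter, trainable.
        self.weight = nn.Parameter(torch.stack([ll, lh, hl, hh]).unsqueeze(1))

    def forward(self, x):
        # x: (N, 1, H, W); stride 2 halves the resolution per subband.
        return nn.functional.conv2d(x, self.weight, stride=2)

if __name__ == "__main__":
    dwt = TrainableHaarDWT()
    img = torch.rand(1, 1, 64, 64)
    print(dwt(img).shape)   # torch.Size([1, 4, 32, 32]) -> LL, LH, HL, HH
```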