
Author

Himanshu Kumar

Bio: Himanshu Kumar is an academic researcher from the Indian Institute of Technology, Jodhpur. The author has contributed to research in the topics of depth map and point spread function. The author has an h-index of 5 and has co-authored 19 publications receiving 64 citations. Previous affiliations of Himanshu Kumar include the Indian Institute of Technology Kanpur.
Papers

Proceedings ArticleDOI
01 Dec 2015
TL;DR: A model that combines two monocular depth cues, namely texture and defocus, is presented; it focuses on correcting the erroneous regions of the defocus map by using the texture energy present in those regions.
Abstract: As imaging is a 2D projection of a 3D scene, depth information is lost at the time of capture with a conventional camera. This depth information can be inferred back from visual cues present in the image. In this work, we present a model that combines two monocular depth cues, namely texture and defocus. Depth is related to the spatial extent of the defocus blur under the assumption that the more an object is blurred, the farther it is from the camera. First, we estimate the amount of defocus blur present at the edge pixels of the image; this is referred to as the sparse defocus map. Using the sparse defocus map, we generate the full defocus map. However, such defocus maps always contain hole regions and depth ambiguities. To handle this problem, an additional depth cue, in our case texture, is integrated to generate a better defocus map. This integration mainly focuses on modifying the erroneous regions of the defocus map using the texture energy present in those regions. The sparse defocus map is corrected using texture-based rules. Hole regions, where there are no significant edges or texture, are detected and corrected in the sparse defocus map. We use region-wise propagation for better defocus map generation, which increases the accuracy of the full defocus map.
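The sparse defocus estimation step described above can be illustrated with a generic gradient-ratio re-blurring estimator (in the spirit of Zhuo and Sim). This is a sketch of the general technique, not necessarily the exact estimator used in the paper; the function name, the edge_mask input, and the sigma_r parameter are illustrative assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, sobel

def sparse_defocus_map(gray, edge_mask, sigma_r=1.0):
    """Estimate the defocus blur sigma at edge pixels by re-blurring.

    gray: 2-D float image; edge_mask: boolean map of edge pixels
    (e.g. from a Canny detector); sigma_r: known re-blur amount.
    """
    reblurred = gaussian_filter(gray, sigma_r)

    def grad_mag(img):
        return np.hypot(sobel(img, axis=1), sobel(img, axis=0))

    # Ratio of gradient magnitudes before/after re-blurring.  For an
    # ideal step edge blurred by sigma:
    #   ratio = sqrt(sigma^2 + sigma_r^2) / sigma
    # so  sigma = sigma_r / sqrt(ratio^2 - 1).
    ratio = grad_mag(gray) / np.maximum(grad_mag(reblurred), 1e-6)

    sparse = np.zeros_like(gray, dtype=np.float64)
    valid = edge_mask & (ratio > 1.0)
    sparse[valid] = sigma_r / np.sqrt(ratio[valid] ** 2 - 1.0)
    return sparse  # zero everywhere except at valid edge pixels
```

Propagating these sparse edge values into a full defocus map (and the texture-based corrections) would follow as separate steps.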

12 citations


Proceedings ArticleDOI
19 Jun 2021
Abstract: Camera scene detection is among the most popular computer vision problems on smartphones. While many custom solutions have been developed for this task by phone vendors, none of the designed models were publicly available until now. To address this problem, we introduce the first Mobile AI challenge, where the target is to develop quantized deep learning-based camera scene classification solutions that can demonstrate real-time performance on smartphones and IoT platforms. For this, the participants were provided with the large-scale CamSDD dataset consisting of more than 11K images belonging to the 30 most important scene categories. The runtime of all models was evaluated on the popular Apple Bionic A11 platform that can be found in many iOS devices. The proposed solutions are fully compatible with all major mobile AI accelerators, can demonstrate more than 100-200 FPS on the majority of recent smartphone platforms, and achieve a top-3 accuracy of more than 98%. A detailed description of all models developed in the challenge is provided in this paper.
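As a rough illustration of the kind of quantized classifier the challenge targets, the sketch below converts a small Keras scene classifier to a full-integer TFLite model via post-training quantization. The MobileNetV2 backbone, the 224x224 input size, and the random representative dataset are assumptions made for this sketch, not a submitted solution.

```python
import numpy as np
import tensorflow as tf

NUM_CLASSES = 30             # CamSDD covers 30 scene categories
INPUT_SHAPE = (224, 224, 3)  # assumed input resolution for this sketch

# Small classifier: MobileNetV2 backbone with a 30-way softmax head
# (backbone choice is an assumption, not a challenge-winning model).
base = tf.keras.applications.MobileNetV2(
    input_shape=INPUT_SHAPE, include_top=False, weights=None)
model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])

def representative_data():
    # Replace with real CamSDD images; random data only keeps the
    # converter happy for this sketch.
    for _ in range(100):
        yield [np.random.rand(1, *INPUT_SHAPE).astype(np.float32)]

# Full-integer post-training quantization, as required by NPU/IoT targets.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8
tflite_model = converter.convert()

with open("scene_classifier_int8.tflite", "wb") as f:
    f.write(tflite_model)
```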

12 citations


Proceedings ArticleDOI
01 Dec 2015
TL;DR: This paper presents a method to resolve focal plane ambiguity in depth map creation from defocus blur with the help of Chromatic Aberration (CA).
Abstract: Focal plane ambiguity in depth map creation from defocus blur has remained a challenging problem. In this paper, we present a method to resolve this issue with the help of chromatic aberration (CA). CA is a distortion caused by the variation of the focal length of a lens with the wavelength of light. When light, a mixture of various monochromatic components, passes through a lens, multiple focal planes are generated due to CA. There also exists an inherent ordering in the defocus blur of an object for the different RGB components, depending on whether the object lies in the near- or far-focus region, and this ordering reverses as soon as the focal planes are crossed. By using this difference in the ordering of the defocus amount for the different RGB components of an object, we can deduce whether the object is present in front of or behind the image plane, and hence a more reliable depth map can be obtained.
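A minimal sketch of the ordering idea: estimate the defocus blur separately for the red and blue channels at edge pixels and use the sign of their difference to decide on which side of the focal plane an object lies. The gradient-ratio blur estimator and the sign_calibration constant are assumptions of this sketch, not the paper's exact formulation.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, sobel

def channel_blur(channel, edge_mask, sigma_r=1.0):
    """Gradient-ratio blur estimate for one color channel."""
    reblurred = gaussian_filter(channel, sigma_r)
    g1 = np.hypot(sobel(channel, axis=1), sobel(channel, axis=0))
    g2 = np.hypot(sobel(reblurred, axis=1), sobel(reblurred, axis=0))
    ratio = g1 / np.maximum(g2, 1e-6)
    return np.where(edge_mask & (ratio > 1.0),
                    sigma_r / np.sqrt(np.maximum(ratio ** 2 - 1.0, 1e-6)),
                    np.nan)

def near_or_far(rgb, edge_mask, sign_calibration=1.0):
    """Classify edge pixels as lying on one side of focus or the other.

    Axial chromatic aberration focuses red and blue at different
    distances, so the sign of sigma_blue - sigma_red flips across the
    focal plane.  Which sign corresponds to "near" depends on the lens,
    so it is exposed here as a calibration constant (an assumption of
    this sketch).
    """
    r = rgb[..., 0].astype(np.float64)
    b = rgb[..., 2].astype(np.float64)
    return sign_calibration * np.sign(channel_blur(b, edge_mask)
                                      - channel_blur(r, edge_mask))
```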

7 citations


Proceedings ArticleDOI
01 Dec 2017
TL;DR: This work proposes a novel spectrum-based image compression technique, which first blurs the image with a Point Spread Function determined from the spectrum of the given image and then performs deconvolution using the known blur PSF to recover the image.
Abstract: Compression is an important aspect of image processing with respect to transmission and storage. It is an established field of research with very little scope for improvement through customary coding-based compression techniques. Consequently, non-customary approaches to compression have become an important area for future research. Any information that can be restored can be compressed. Using this principle, we propose a novel spectrum-based image compression technique. We first blur the image with a Point Spread Function (PSF) determined from the spectrum of the given image. Blurring increases the DC component in the image, which in turn allows it to be compressed further by DCT-based JPEG compression compared with the original image. To recover the image, we perform deconvolution using the known blur PSF.
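A minimal sketch of the pipeline under simplifying assumptions: a fixed Gaussian stands in for the spectrum-derived PSF, JPEG compression is done through Pillow, and the decoder uses a plain Wiener filter for deconvolution. None of these specific choices are taken from the paper.

```python
import io
import numpy as np
from PIL import Image
from scipy.ndimage import gaussian_filter

def encode(gray, sigma=1.5, quality=50):
    """Blur a grayscale image with a known Gaussian PSF, then JPEG-compress."""
    blurred = gaussian_filter(gray.astype(np.float64), sigma)
    buf = io.BytesIO()
    Image.fromarray(np.clip(blurred, 0, 255).astype(np.uint8)).save(
        buf, format="JPEG", quality=quality)
    return buf.getvalue(), sigma

def decode(jpeg_bytes, sigma, k=0.01):
    """Decode the JPEG, then Wiener-deconvolve with the known PSF."""
    blurred = np.asarray(Image.open(io.BytesIO(jpeg_bytes)), dtype=np.float64)
    h, w = blurred.shape
    # Gaussian PSF centred in the image, shifted so its peak is at the origin.
    yy, xx = np.mgrid[:h, :w]
    psf = np.exp(-(((yy - h // 2) ** 2 + (xx - w // 2) ** 2) / (2 * sigma ** 2)))
    psf /= psf.sum()
    H = np.fft.fft2(np.fft.ifftshift(psf))
    G = np.fft.fft2(blurred)
    # Wiener filter: F = G * conj(H) / (|H|^2 + k), k regularizes JPEG noise.
    F = G * np.conj(H) / (np.abs(H) ** 2 + k)
    return np.real(np.fft.ifft2(F))
```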

5 citations


Journal ArticleDOI
TL;DR: This paper presents a novel framework to generate a more accurate depth map for video using defocus and motion cues, and corrects errors in other parts of the depth map caused by inaccurate estimation of defocus blur and motion.
Abstract: Significant recent developments in 3D display technology have focused on techniques for converting 2D media into 3D. The depth map is an integral part of 2D-to-3D conversion. Combining multiple depth cues results in a more accurate depth map, as the errors caused by one depth cue, as well as its absence, are compensated for by the other depth cues. In this paper, we present a novel framework to generate a more accurate depth map for video using defocus and motion cues. Moving objects present in the scene are the source of errors in both defocus- and motion-based depth map estimation. The proposed method rectifies these errors in the depth map by integrating defocus blur and motion cues. In addition, it also corrects errors in other parts of the depth map caused by inaccurate estimation of defocus blur and motion. Since the proposed integration approach relies on the characteristics of the point spread functions of defocus and motion blur, along with their relations to camera parameters, it is more accurate and reliable.
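A deliberately simplified sketch of fusing the two cues: pixels with large optical-flow magnitude are treated as moving objects whose defocus estimate is unreliable and are assigned a motion-based depth value instead. The Farneback flow, the normalization, and the motion_thresh parameter are assumptions of this sketch, not the paper's PSF-based integration.

```python
import cv2
import numpy as np

def fuse_depth_cues(prev_gray, curr_gray, defocus_map, motion_thresh=2.0):
    """Fuse a per-frame defocus map with a motion cue from optical flow."""
    # Dense Farneback optical flow between consecutive frames
    # (positional args: pyr_scale, levels, winsize, iters, poly_n, poly_sigma, flags).
    flow = cv2.calcOpticalFlowFarneback(prev_gray, curr_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    motion = np.linalg.norm(flow, axis=2)

    def norm01(x):
        # Normalize a cue to [0, 1] so the two cues are comparable.
        return (x - x.min()) / (np.ptp(x) + 1e-6)

    defocus_depth = norm01(defocus_map)   # larger blur  -> farther
    motion_depth = 1.0 - norm01(motion)   # larger motion -> nearer (static camera)

    fused = defocus_depth.copy()
    moving = motion > motion_thresh       # defocus unreliable on moving objects
    fused[moving] = motion_depth[moving]
    return fused
```

This simple replacement rule stands in for the paper's joint treatment of the defocus and motion-blur point spread functions.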

5 citations


Cited by

Proceedings ArticleDOI
Majed El Helou, Ruofan Zhou, Sabine Süsstrunk, Radu Timofte, +40 more (10 institutions)
01 May 2021
Abstract: Image relighting is attracting increasing interest due to its various applications. From a research perspective, image relighting can be exploited both for image normalization for domain adaptation and for data augmentation. It also has multiple direct uses for photo montage and aesthetic enhancement. In this paper, we review the NTIRE 2021 depth-guided image relighting challenge. We rely on the VIDIT dataset, including depth information, for each of our two challenge tracks. The first track is one-to-one relighting, where the goal is to transform the illumination setup of an input image (color temperature and light source position) to the target illumination setup. In the second track, the any-to-any relighting challenge, the objective is to transform the illumination settings of the input image to match those of another guide image, similar to style transfer. In both tracks, participants were given depth information about the captured scenes. We had nearly 250 registered participants, leading to 18 confirmed team submissions in the final competition stage. The competitions, methods, and final results are presented in this paper.

22 citations


Proceedings ArticleDOI
01 Jan 2021
Abstract: Image super-resolution is one of the most popular computer vision problems, with many important applications to mobile devices. While many solutions have been proposed for this task, they are usually not optimized even for common smartphone AI hardware, not to mention more constrained smart TV platforms that often support only INT8 inference. To address this problem, we introduce the first Mobile AI challenge, where the target is to develop end-to-end deep learning-based image super-resolution solutions that can demonstrate real-time performance on mobile or edge NPUs. For this, the participants were provided with the DIV2K dataset and trained quantized models to perform efficient 3X image upscaling. The runtime of all models was evaluated on the Synaptics VS680 Smart Home board with a dedicated NPU capable of accelerating quantized neural networks. The proposed solutions are fully compatible with all major mobile AI accelerators and are capable of reconstructing Full HD images in under 40-60 ms while achieving high fidelity results. A detailed description of all models developed in the challenge is provided in this paper.
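The INT8 runtime evaluation described here can be mimicked on a development machine with the TFLite interpreter, as in the sketch below; the model file name is a placeholder and the timing loop only illustrates the measurement procedure, not the challenge's official benchmarking tool.

```python
import time
import numpy as np
import tensorflow as tf

# Load a quantized super-resolution model ("sr_int8.tflite" is a
# placeholder name, not a challenge artifact).
interpreter = tf.lite.Interpreter(model_path="sr_int8.tflite")
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

# Dummy low-resolution frame matching the model's quantized input spec.
lr = np.zeros(inp["shape"], dtype=inp["dtype"])

# Warm-up run, then average the latency over several invocations.
interpreter.set_tensor(inp["index"], lr)
interpreter.invoke()

runs = 20
start = time.perf_counter()
for _ in range(runs):
    interpreter.set_tensor(inp["index"], lr)
    interpreter.invoke()
elapsed_ms = (time.perf_counter() - start) * 1000 / runs

sr = interpreter.get_tensor(out["index"])
print(f"average latency: {elapsed_ms:.1f} ms, output shape: {sr.shape}")
```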

18 citations


Posted Content
Abstract: Depth estimation is an important computer vision problem with many practical applications to mobile devices. While many solutions have been proposed for this task, they are usually very computationally expensive and thus not applicable to on-device inference. To address this problem, we introduce the first Mobile AI challenge, where the target is to develop end-to-end deep learning-based depth estimation solutions that can demonstrate nearly real-time performance on smartphones and IoT platforms. For this, the participants were provided with a new large-scale dataset containing RGB-depth image pairs obtained with a dedicated stereo ZED camera producing high-resolution depth maps for objects located up to 50 meters away. The runtime of all models was evaluated on the popular Raspberry Pi 4 platform with a mobile ARM-based Broadcom chipset. The proposed solutions can generate VGA resolution depth maps at up to 10 FPS on the Raspberry Pi 4 while achieving high fidelity results, and are compatible with any Android or Linux-based mobile device. A detailed description of all models developed in the challenge is provided in this paper.

16 citations


Proceedings ArticleDOI
01 Jan 2021
Abstract: As the quality of mobile cameras starts to play a crucial role in modern smartphones, more and more attention is now being paid to ISP algorithms used to improve various perceptual aspects of mobile photos. In this Mobile AI challenge, the target was to develop an end-to-end deep learning-based image signal processing (ISP) pipeline that can replace classical hand-crafted ISPs and achieve nearly real-time performance on smartphone NPUs. For this, the participants were provided with a novel learned ISP dataset consisting of RAW-RGB image pairs captured with the Sony IMX586 Quad Bayer mobile sensor and a professional 102-megapixel medium format camera. The runtime of all models was evaluated on the MediaTek Dimensity 1000+ platform with a dedicated AI processing unit capable of accelerating both floating-point and quantized neural networks. The proposed solutions are fully compatible with the above NPU and are capable of processing Full HD photos in under 60-100 milliseconds while achieving high fidelity results. A detailed description of all models developed in this challenge is provided in this paper.

14 citations


Proceedings ArticleDOI
Andrey Ignatov, Andrés Romero, Heewon Kim, Radu Timofte, +27 more (5 institutions)
17 May 2021
Abstract: Video super-resolution has recently become one of the most important mobile-related problems due to the rise of video communication and streaming services. While many solutions have been proposed for this task, the majority of them are too computationally expensive to run on portable devices with limited hardware resources. To address this problem, we introduce the first Mobile AI challenge, where the target is to develop end-to-end deep learning-based video super-resolution solutions that can achieve real-time performance on mobile GPUs. The participants were provided with the REDS dataset and trained their models to perform efficient 4X video upscaling. The runtime of all models was evaluated on the OPPO Find X2 smartphone with the Snapdragon 865 SoC capable of accelerating floating-point networks on its Adreno GPU. The proposed solutions are fully compatible with any mobile GPU and can upscale videos to HD resolution at up to 80 FPS while demonstrating high fidelity results. A detailed description of all models developed in the challenge is provided in this paper.

14 citations


Network Information
Related Authors (2)
Sumana Gupta

74 papers, 338 citations

94% related
K. S. Venkatesh

117 papers, 511 citations

82% related
Performance Metrics

Author's H-index: 5

No. of papers from the Author in previous years
Year    Papers
2021    4
2020    4
2019    4
2018    1
2017    3
2015    3