Author

Himanshu Kumar

Bio: Himanshu Kumar is an academic researcher from the Indian Institute of Technology, Jodhpur. The author has contributed to research on depth maps and point spread functions, has an h-index of 5, and has co-authored 19 publications receiving 64 citations. Previous affiliations of Himanshu Kumar include the Indian Institute of Technology Kanpur.

Papers
Proceedings ArticleDOI
19 Jun 2021
TL;DR: In this article, the first Mobile AI challenge is introduced, targeting quantized deep learning-based camera scene classification solutions that demonstrate real-time performance on smartphones and IoT platforms.
Abstract: Camera scene detection is among the most popular computer vision problems on smartphones. While many custom solutions have been developed for this task by phone vendors, none of the designed models were publicly available until now. To address this problem, we introduce the first Mobile AI challenge, where the target is to develop quantized deep learning-based camera scene classification solutions that demonstrate real-time performance on smartphones and IoT platforms. For this, the participants were provided with the large-scale CamSDD dataset consisting of more than 11K images belonging to the 30 most important scene categories. The runtime of all models was evaluated on the popular Apple Bionic A11 platform found in many iOS devices. The proposed solutions are fully compatible with all major mobile AI accelerators and can run at more than 100-200 FPS on the majority of recent smartphone platforms while achieving a top-3 accuracy of more than 98%. A detailed description of all models developed in the challenge is provided in this paper.

26 citations
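A minimal sketch of the kind of full-integer (INT8) quantization such challenge entries require, using the standard TensorFlow Lite converter. The MobileNetV2 backbone, input size, and random calibration data are placeholders, not any participant's actual solution.

```python
# Sketch: post-training INT8 quantization of a 30-class scene classifier
# for NPU/IoT deployment. Backbone and calibration data are placeholders.
import numpy as np
import tensorflow as tf

NUM_CLASSES = 30          # CamSDD scene categories
INPUT_SIZE = (224, 224)   # assumed input resolution

# Placeholder backbone; challenge entries used their own architectures.
model = tf.keras.applications.MobileNetV2(
    input_shape=(*INPUT_SIZE, 3), weights=None, classes=NUM_CLASSES)

def representative_dataset():
    # Calibration samples would come from the training set (random here).
    for _ in range(100):
        yield [np.random.rand(1, *INPUT_SIZE, 3).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
# Force full-integer ops so the model runs on INT8-only accelerators.
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8

with open("scene_classifier_int8.tflite", "wb") as f:
    f.write(converter.convert())
```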

Proceedings ArticleDOI
01 Dec 2015
TL;DR: A model is presented that combines two monocular depth cues, texture and defocus; it mainly focuses on correcting erroneous regions of the defocus map using the texture energy present in those regions.
Abstract: As imaging is a 2D projection of a 3D scene, depth information is lost when an image is captured with a conventional camera. This depth information can be inferred back from a set of visual cues present in the image. In this work, we present a model that combines two monocular depth cues, namely texture and defocus. Depth is related to the spatial extent of the defocus blur under the assumption that the more an object is blurred, the farther it is from the camera. First, we estimate the amount of defocus blur present at the edge pixels of an image; this is referred to as the sparse defocus map. Using the sparse defocus map, we generate the full defocus map. However, such defocus maps always contain hole regions and depth ambiguity. To handle this problem, an additional depth cue, in our case texture, is integrated to generate a better defocus map. This integration mainly focuses on correcting the erroneous regions of the defocus map using the texture energy present in those regions. The sparse defocus map is corrected using texture-based rules, and hole regions, where there are no significant edges or texture, are detected and corrected. Region-wise propagation is used to further increase the accuracy of the full defocus map.

15 citations
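A sketch of one standard way to build the sparse defocus map described above: the gradient-ratio ("re-blur") estimator in the style of Zhuo & Sim, applied at Canny edge pixels. This is a common textbook estimator, not necessarily the paper's exact one; the re-blur sigma and edge thresholds are assumptions.

```python
# Sketch: sparse defocus estimation at edge pixels via the gradient-ratio
# (re-blur) method. Not necessarily the paper's exact estimator.
import cv2
import numpy as np

def sparse_defocus_map(gray, sigma0=1.0):
    """Estimate per-pixel defocus blur sigma at Canny edge pixels."""
    gray = gray.astype(np.float32) / 255.0
    reblur = cv2.GaussianBlur(gray, (0, 0), sigma0)

    def grad_mag(img):
        gx = cv2.Sobel(img, cv2.CV_32F, 1, 0, ksize=3)
        gy = cv2.Sobel(img, cv2.CV_32F, 0, 1, ksize=3)
        return np.sqrt(gx ** 2 + gy ** 2)

    g1, g2 = grad_mag(gray), grad_mag(reblur)
    edges = cv2.Canny((gray * 255).astype(np.uint8), 50, 150) > 0

    # For a step edge blurred by sigma, the gradient-magnitude ratio is
    # R = sqrt(sigma^2 + sigma0^2) / sigma, so sigma = sigma0 / sqrt(R^2 - 1).
    ratio = g1 / (g2 + 1e-8)
    sigma = np.zeros_like(gray)
    valid = edges & (ratio > 1.01)
    sigma[valid] = sigma0 / np.sqrt(ratio[valid] ** 2 - 1.0)
    return sigma  # non-zero only at edge pixels (the "sparse defocus map")
```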

Journal ArticleDOI
TL;DR: This paper presents a novel framework to generate a more accurate depth map for video using defocus and motion cues, and it corrects the errors in other parts of the depth map caused by inaccurate estimation of defocus blur and motion.
Abstract: Significant recent developments in 3D display technology have focused on techniques for converting 2D media into 3D. A depth map is an integral part of 2D-to-3D conversion. Combining multiple depth cues results in a more accurate depth map, as errors caused by one cue, or its absence, are compensated for by the other cues. In this paper, we present a novel framework to generate a more accurate depth map for video using defocus and motion cues. Moving objects in the scene are a source of errors in both defocus- and motion-based depth map estimation. The proposed method rectifies these errors in the depth map by integrating the defocus blur and motion cues. In addition, it corrects errors in other parts of the depth map caused by inaccurate estimation of defocus blur and motion. Since the proposed integration approach relies on the characteristics of the point spread functions of defocus and motion blur along with their relations to camera parameters, it is more accurate and reliable.

14 citations
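A minimal sketch of the general idea of fusing two per-pixel depth estimates with per-pixel confidences. The weighting scheme below is illustrative only; the paper derives its integration from the point spread functions and camera parameters rather than from generic confidence maps.

```python
# Sketch: confidence-weighted fusion of defocus-based and motion-based
# depth maps. Illustrative only; not the paper's PSF-derived integration.
import numpy as np

def fuse_depth(d_defocus, d_motion, c_defocus, c_motion, eps=1e-8):
    """Blend two depth maps using per-pixel confidences in [0, 1].

    Pixels where one cue is unreliable (e.g. moving objects for the motion
    cue, textureless areas for the defocus cue) should carry low confidence
    so the other cue dominates there.
    """
    w = c_defocus / (c_defocus + c_motion + eps)
    return w * d_defocus + (1.0 - w) * d_motion
```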

Journal ArticleDOI
TL;DR: A novel method to estimate the concurrent defocus and motion blurs in a single image is proposed, which works well for real images as well as for compressed images.
Abstract: The occurrence of motion blur along with defocus blur is a common phenomenon in natural images. These blurs are usually spatially varying in a general image, and the estimation of one type of blur is affected by the presence of the other. In this paper, we propose a novel method to estimate concurrent defocus and motion blurs in a single image. Unlike recent methods, which perform well only under simulated conditions or in the presence of a single type of blur, the proposed method works well for real images as well as for compressed images. We consider only the commonly associated motion and defocus blurs for analysis. Decoupling motion and defocus blur provides a fundamental tool that can be used for various analyses and applications.

11 citations
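For reference, a sketch of the two parametric PSF models conventionally assumed when separating these blurs: an isotropic pillbox (defocus) and an oriented line segment (uniform linear motion). These are the textbook parameterizations; the paper's exact models may differ.

```python
# Sketch: textbook PSF models for defocus (disk) and linear motion (line).
import numpy as np

def defocus_psf(radius, size=31):
    """Uniform disk PSF of the given radius (pillbox model of defocus)."""
    y, x = np.mgrid[-(size // 2):size // 2 + 1, -(size // 2):size // 2 + 1]
    psf = (x ** 2 + y ** 2 <= radius ** 2).astype(np.float32)
    return psf / psf.sum()

def motion_psf(length, angle_deg, size=31):
    """Line-segment PSF for uniform linear motion (length < size assumed)."""
    psf = np.zeros((size, size), np.float32)
    theta = np.deg2rad(angle_deg)
    for t in np.linspace(-length / 2, length / 2, 4 * int(length) + 1):
        r = int(round(size // 2 + t * np.sin(theta)))
        c = int(round(size // 2 + t * np.cos(theta)))
        psf[r, c] = 1.0
    return psf / psf.sum()
```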

Proceedings ArticleDOI
01 Aug 2017
TL;DR: The proposed method uses the color-uniformity principle to detect hole regions present in a depth map and provides a framework to identify falsely detected holes in order to increase the effectiveness of the method.
Abstract: Depth map estimation forms an integral part of many applications, such as 2D-to-3D conversion. Various methods exist in the literature for depth map estimation using different cues and structures. Usually, depth information is decoded from these cues at the edges, and matting is applied to spread it over neighboring regions. Defocus is one such cue; it occurs naturally and, unlike other cues, requires no preconditions. However, images can contain regions with no edges. These regions are referred to as hole regions and are the main source of error in the estimated depth map. In this paper, we propose a method to correct some of these errors to obtain an accurate depth map. The proposed method uses the color-uniformity principle to detect hole regions present in the depth map. We also provide a framework to identify falsely detected holes in order to increase the effectiveness of our method.

7 citations
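A rough sketch of hole detection by color uniformity as the abstract describes it: connected regions with no depth estimate whose color is near-uniform are flagged as holes. The zero-depth convention and the variance threshold are assumptions for illustration, not the paper's tuned values.

```python
# Sketch: flag depth-map "holes" as connected zero-depth regions whose
# color is near-uniform. Thresholds are illustrative assumptions.
import numpy as np
from scipy import ndimage

def detect_holes(depth, image, std_thresh=8.0):
    """Return a boolean mask of hole regions in a sparse depth map."""
    missing = depth <= 0                      # pixels with no depth estimate
    labels, n = ndimage.label(missing)
    holes = np.zeros_like(missing)
    for i in range(1, n + 1):
        region = labels == i
        # Uniform color inside the region suggests a single surface (a
        # true hole) rather than a false detection spanning several objects.
        if all(image[..., c][region].std() < std_thresh for c in range(3)):
            holes |= region
    return holes
```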


Cited by
Proceedings ArticleDOI
01 Jan 2021
TL;DR: In this paper, the authors introduce the first Mobile AI challenge, where the target is to develop end-to-end deep learning-based image super-resolution solutions that can demonstrate real-time performance on mobile or edge NPUs.
Abstract: Image super-resolution is one of the most popular computer vision problems, with many important applications to mobile devices. While many solutions have been proposed for this task, they are usually not optimized even for common smartphone AI hardware, not to mention more constrained smart TV platforms that often support INT8 inference only. To address this problem, we introduce the first Mobile AI challenge, where the target is to develop end-to-end deep learning-based image super-resolution solutions that can demonstrate real-time performance on mobile or edge NPUs. For this, the participants were provided with the DIV2K dataset and trained quantized models to perform efficient 3X image upscaling. The runtime of all models was evaluated on the Synaptics VS680 Smart Home board with a dedicated NPU capable of accelerating quantized neural networks. The proposed solutions are fully compatible with all major mobile AI accelerators and are capable of reconstructing Full HD images in under 40-60 ms while achieving high-fidelity results. A detailed description of all models developed in the challenge is provided in this paper.

74 citations
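A minimal sketch of the NPU-friendly design pattern such entries typically follow: a small all-convolutional network ending in a depth_to_space (pixel-shuffle) 3X upsampler, which quantizes well to INT8. The layer widths and depth below are illustrative, not any team's submission.

```python
# Sketch: tiny 3X super-resolution network using pixel shuffle
# (depth_to_space), a common NPU-friendly upsampling choice.
import tensorflow as tf

def tiny_sr_model(scale=3, channels=16):
    inp = tf.keras.Input(shape=(None, None, 3))
    x = tf.keras.layers.Conv2D(channels, 3, padding="same", activation="relu")(inp)
    x = tf.keras.layers.Conv2D(channels, 3, padding="same", activation="relu")(x)
    # scale^2 * 3 feature maps are rearranged into a (3H, 3W, 3) image.
    x = tf.keras.layers.Conv2D(3 * scale ** 2, 3, padding="same")(x)
    out = tf.keras.layers.Lambda(lambda t: tf.nn.depth_to_space(t, scale))(x)
    return tf.keras.Model(inp, out)
```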

Proceedings ArticleDOI
17 May 2021
TL;DR: In this paper, the first Mobile AI challenge is introduced, where the target is to develop end-to-end deep learning-based video super-resolution solutions that can achieve real-time performance on mobile GPUs.
Abstract: Video super-resolution has recently become one of the most important mobile-related problems due to the rise of video communication and streaming services. While many solutions have been proposed for this task, the majority are too computationally expensive to run on portable devices with limited hardware resources. To address this problem, we introduce the first Mobile AI challenge, where the target is to develop end-to-end deep learning-based video super-resolution solutions that can achieve real-time performance on mobile GPUs. The participants were provided with the REDS dataset and trained their models to perform efficient 4X video upscaling. The runtime of all models was evaluated on the OPPO Find X2 smartphone with the Snapdragon 865 SoC, capable of accelerating floating-point networks on its Adreno GPU. The proposed solutions are fully compatible with any mobile GPU and can upscale videos to HD resolution at up to 80 FPS while demonstrating high-fidelity results. A detailed description of all models developed in the challenge is provided in this paper.

54 citations
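A sketch of the kind of latency measurement that precedes on-device benchmarking in these challenges, using the TFLite Python interpreter. The model path, dummy input, and run count are placeholders; the reported numbers (e.g. on the Adreno GPU) come from on-target benchmark tools instead.

```python
# Sketch: averaging TFLite inference latency over repeated runs.
import time
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="video_sr.tflite")  # placeholder
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]

frame = np.random.rand(*inp["shape"]).astype(np.float32)  # dummy input
interpreter.set_tensor(inp["index"], frame)
interpreter.invoke()                                      # warm-up run

runs = 50
start = time.perf_counter()
for _ in range(runs):
    interpreter.invoke()
print(f"avg latency: {(time.perf_counter() - start) / runs * 1e3:.1f} ms")
```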

Proceedings ArticleDOI
01 Jan 2021
TL;DR: In this article, an end-to-end deep learning-based image signal processing (ISP) pipeline that can replace classical hand-crafted ISPs and achieve nearly real-time performance on smartphone NPUs was developed.
Abstract: As the quality of mobile cameras starts to play a crucial role in modern smartphones, more and more attention is now being paid to ISP algorithms used to improve various perceptual aspects of mobile photos. In this Mobile AI challenge, the target was to develop an end-to-end deep learning-based image signal processing (ISP) pipeline that can replace classical hand-crafted ISPs and achieve nearly real-time performance on smartphone NPUs. For this, the participants were provided with a novel learned ISP dataset consisting of RAW-RGB image pairs captured with the Sony IMX586 Quad Bayer mobile sensor and a professional 102-megapixel medium format camera. The runtime of all models was evaluated on the MediaTek Dimensity 1000+ platform with a dedicated AI processing unit capable of accelerating both floating-point and quantized neural networks. The proposed solutions are fully compatible with the above NPU and are capable of processing Full HD photos in under 60-100 ms while achieving high-fidelity results. A detailed description of all models developed in this challenge is provided in this paper.

45 citations
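A small sketch of the usual first step in such learned ISP pipelines: packing the single-channel RAW mosaic into a half-resolution 4-channel tensor before the RAW-to-RGB network. A plain 2x2 Bayer layout is assumed here for simplicity; the IMX586 actually uses a Quad Bayer arrangement, so the real packing differs.

```python
# Sketch: pack a Bayer mosaic into a 4-channel tensor (one channel per
# CFA position). Assumes an RGGB 2x2 pattern, not the sensor's Quad Bayer.
import numpy as np

def pack_raw(bayer):
    """(H, W) mosaic -> (H/2, W/2, 4) tensor with R, G, G, B planes."""
    return np.stack([bayer[0::2, 0::2],   # R
                     bayer[0::2, 1::2],   # G (even rows)
                     bayer[1::2, 0::2],   # G (odd rows)
                     bayer[1::2, 1::2]],  # B
                    axis=-1)
```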

Proceedings ArticleDOI
01 Jun 2021
TL;DR: In this article, the authors introduced the first Mobile AI challenge, where the target is to develop an end-to-end deep learning-based image denoising solution that can demonstrate high efficiency on smartphone GPUs.
Abstract: Image denoising is one of the most critical problems in mobile photo processing. While many solutions have been proposed for this task, they usually work with synthetic data and are too computationally expensive to run on mobile devices. To address this problem, we introduce the first Mobile AI challenge, where the target is to develop an end-to-end deep learning-based image denoising solution that can demonstrate high efficiency on smartphone GPUs. For this, the participants were provided with a novel large-scale dataset consisting of noisy-clean image pairs captured in the wild. The runtime of all models was evaluated on the Samsung Exynos 2100 chipset with a powerful Mali GPU capable of accelerating floating-point and quantized neural networks. The proposed solutions are fully compatible with any mobile GPU and are capable of processing 480p resolution images in under 40-80 ms while achieving high-fidelity results. A detailed description of all models developed in the challenge is provided in this paper.

34 citations

Proceedings ArticleDOI
01 May 2021
TL;DR: This paper reviews the NTIRE 2021 depth-guided image relighting challenge, whose first track focuses on one-to-one relighting, where the goal is to transform the illumination setup of an input image (color temperature and light source position) to a target illumination setup.
Abstract: Image relighting is attracting increasing interest due to its various applications. From a research perspective, image relighting can be exploited both for image normalization for domain adaptation and for data augmentation. It also has multiple direct uses for photo montage and aesthetic enhancement. In this paper, we review the NTIRE 2021 depth-guided image relighting challenge. We rely on the VIDIT dataset, including depth information, for each of our two challenge tracks. The first track is one-to-one relighting, where the goal is to transform the illumination setup of an input image (color temperature and light source position) to a target illumination setup. In the second track, the any-to-any relighting challenge, the objective is to transform the illumination settings of the input image to match those of another guide image, similar to style transfer. In both tracks, participants were given depth information about the captured scenes. We had nearly 250 registered participants, leading to 18 confirmed team submissions in the final competition stage. The competitions, methods, and final results are presented in this paper.

33 citations