Author

Shaojie Zhuo

Bio: Shaojie Zhuo is an academic researcher from Qualcomm. The author has contributed to research in topics: Image restoration & Internal medicine. The author has an h-index of 18 and has co-authored 23 publications receiving 1,462 citations. Previous affiliations of Shaojie Zhuo include National University of Singapore.

Papers
Proceedings ArticleDOI
24 Apr 2015
TL;DR: Proposes a novel, easily calibrated image formation model for RGB-IR cameras, together with an efficient algorithm that jointly addresses three restoration problems (channel deblurring, channel separation, and pixel demosaicing) using quadratic image regularizers.
Abstract: A convenient solution to RGB-Infrared photography is to extend the basic RGB mosaic with a fourth filter type with high transmittance in the near-infrared band. Unfortunately, applying conventional demosaicing algorithms to RGB-IR sensors is not possible for two reasons. First, the RGB and near-infrared images are focused differently due to the different refractive indices of each band. Second, manufacturing constraints introduce crosstalk between the RGB and IR channels. In this paper we propose a novel image formation model for RGB-IR cameras that can be easily calibrated, and an efficient algorithm that jointly addresses three restoration problems (channel deblurring, channel separation, and pixel demosaicing) using quadratic image regularizers. We also extend our algorithm to handle more general regularizers and pixel saturation. Experiments show that our method produces sharp, full-resolution images of pure RGB color and IR.
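
To illustrate the kind of quadratic regularizer the paper builds on, here is a minimal single-channel sketch: closed-form Tikhonov-regularized deblurring in the Fourier domain. The joint channel-separation and demosaicing terms and the calibrated crosstalk model are not reproduced; the function names and the Laplacian prior are illustrative assumptions.

```python
import numpy as np

def psf_to_otf(psf, shape):
    """Zero-pad a small blur kernel to the image size and circularly shift
    its center to the origin, then take the 2-D FFT."""
    pad = np.zeros(shape)
    pad[:psf.shape[0], :psf.shape[1]] = psf
    pad = np.roll(pad, (-(psf.shape[0] // 2), -(psf.shape[1] // 2)), axis=(0, 1))
    return np.fft.fft2(pad)

def tikhonov_deblur(blurred, psf, lam=1e-2):
    """Closed-form minimizer of ||h * x - y||^2 + lam * ||Laplacian(x)||^2."""
    H = psf_to_otf(psf, blurred.shape)
    L = psf_to_otf(np.array([[0., -1., 0.], [-1., 4., -1.], [0., -1., 0.]]),
                   blurred.shape)
    Y = np.fft.fft2(blurred)
    # Quadratic data and prior terms admit a per-frequency closed form.
    X = np.conj(H) * Y / (np.abs(H) ** 2 + lam * np.abs(L) ** 2)
    return np.real(np.fft.ifft2(X))
```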

35 citations

Patent
Shaojie Zhuo, Xiaopeng Zhang, Chen Feng, Liang Shen, Jiaya Jia
07 Apr 2014
TL;DR: Systems and methods for multispectral imaging that de-noise a visible light image using a gradient scale map generated from gradient vectors in the visible light image and a NIR image.
Abstract: Systems and methods for multispectral imaging are disclosed. The multispectral imaging system can include a near infrared (NIR) imaging sensor and a visible imaging sensor. The disclosed systems and methods can be implemented to de-noise a visible light image using a gradient scale map generated from gradient vectors in the visible light image and a NIR image. The gradient scale map may be used to determine the amount of de-noising guidance applied from the NIR image to the visible light image on a pixel-by-pixel basis.
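
A hypothetical sketch of the gradient-scale-map idea, not the patented method itself: NIR high-frequency detail is transferred onto a smoothed visible image, scaled per pixel by how strongly the visible and NIR gradients agree. The function names and the Gaussian base/detail decomposition are assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def gradient_magnitude(img):
    gy, gx = np.gradient(img.astype(float))
    return np.hypot(gx, gy)

def nir_guided_denoise(visible_gray, nir, sigma=2.0, eps=1e-3):
    """Blend NIR detail into a smoothed visible image, modulated per pixel
    by a gradient scale map (trust NIR detail only where its edges agree
    with the visible edges)."""
    vis = visible_gray.astype(float)
    base = gaussian_filter(vis, sigma)            # noise-suppressed base layer
    detail = nir.astype(float) - gaussian_filter(nir.astype(float), sigma)
    scale = np.clip(gradient_magnitude(vis) / (gradient_magnitude(nir) + eps),
                    0.0, 1.0)                     # the per-pixel scale map
    return base + gaussian_filter(scale, sigma) * detail
```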

28 citations

Patent
19 Feb 2014
TL;DR: Systems and methods for detecting and attenuating shadows in a visible light image, in which shadows on human skin are detected and attenuated using multispectral imaging techniques.
Abstract: Systems and methods for detecting and attenuating shadows in a visible light image are disclosed. In various embodiments, shadows on human skin may be detected and attenuated using multi-spectral imaging techniques. Multispectral image data that includes a living subject can be processed to detect live-subject portions of the multispectral image data. Shadows in the detected live-subject portions of the multispectral image data can be identified. The identified shadows in at least part of the multispectral image data can be attenuated.
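
One simple multispectral heuristic in this spirit is sketched below, under the loudly stated assumption that shadows attenuate the visible and NIR bands by similar factors on skin (the patent's actual detection logic is not reproduced): shadow pixels are dark in the visible band but keep roughly the visible/NIR ratio of well-lit skin.

```python
import numpy as np

def detect_skin_shadows(visible_gray, nir, skin_mask, dark_pct=30, ratio_tol=0.25):
    """Flag skin pixels that are dark in the visible band but keep roughly the
    visible/NIR ratio of lit skin, i.e. an illumination change rather than a
    material change. ASSUMPTION: shadows scale both bands similarly."""
    vis = visible_gray.astype(float)
    ratio = vis / (nir.astype(float) + 1e-6)
    thresh = np.percentile(vis[skin_mask], dark_pct)
    lit = skin_mask & (vis > thresh)
    ref = np.median(ratio[lit])                      # ratio of well-lit skin
    dark = skin_mask & (vis <= thresh)
    return dark & (np.abs(ratio - ref) < ratio_tol * ref)

def attenuate_shadows(visible_gray, shadow_mask, gain=1.6):
    """Brighten detected shadow pixels by a fixed gain (crude attenuation)."""
    out = visible_gray.astype(float).copy()
    out[shadow_mask] = np.minimum(out[shadow_mask] * gain, 255.0)
    return out
```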

22 citations

Journal ArticleDOI
TL;DR: This paper proposes a patch-based method to remove the out-of-focus blur of a video and build an all-in-focus video, and employs the idea of a bilateral filter to temporally smooth the reconstructed video.
Abstract: Amateur videos often contain focusing issues. A focusing mistake may produce out-of-focus blur, which seriously degrades the visual quality of the video. In this paper, we propose a patch-based method to remove the out-of-focus blur of a video and build an all-in-focus video. We assume that the out-of-focus blurry region in one frame will be clear in a portion of other frames; thus, the clear corresponding regions can be used to reconstruct the blurry one. We divide each video frame into a grid of patches and track each patch in the surrounding frames. We independently reconstruct each video frame by building a Markov random field model to identify the optimal target patches that are sharp, similar to the original patches, and are coherent with their neighboring patches within the overlapped regions. To recover an all-in-focus video, an iterative framework is utilized, in which the reconstructed video from each iteration is used as the input to the next. Finally, we employ the idea of a bilateral filter to temporally smooth the reconstructed video. The experimental results and the comparison with previous works demonstrate the effectiveness of our method.
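
A greedy stand-in for the paper's patch selection, assuming a stabilized video and in-bounds patch coordinates: score co-located patches in neighboring frames by gradient energy and keep the sharpest. The paper's patch tracking and MRF coherence terms are omitted in this sketch.

```python
import numpy as np

def sharpness(patch):
    """Gradient-energy focus measure: higher means sharper."""
    gy, gx = np.gradient(patch.astype(float))
    return float(np.mean(gx ** 2 + gy ** 2))

def sharpest_colocated_patch(frames, t, y, x, size=16, radius=5):
    """Among frames t-radius..t+radius, return the sharpest patch at (y, x)."""
    best, best_score = None, -1.0
    for f in range(max(0, t - radius), min(len(frames), t + radius + 1)):
        p = frames[f][y:y + size, x:x + size]
        s = sharpness(p)
        if s > best_score:
            best, best_score = p, s
    return best
```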

19 citations

Journal ArticleDOI
TL;DR: This article investigates human-scenery positional relationships and constructs a photographic assistance system to optimize the position of human subjects in a given background scene, thereby assisting the user in capturing high-quality souvenir photos.
Abstract: People often take photographs at tourist sites, and these pictures usually have two main elements: a person in the foreground and scenery in the background. This type of “souvenir photo” is one of the most common photos taken by tourists. Although algorithms exist that aid a user-photographer in taking a well-composed picture of a scene [Ni et al. 2013], few studies have addressed the issue of properly positioning human subjects in photographs. Common guidelines for composing portrait images exist in photography, but these rules usually do not consider the background scene. Therefore, in this article, we investigate human-scenery positional relationships and construct a photographic assistance system to optimize the position of human subjects in a given background scene, thereby assisting the user in capturing high-quality souvenir photos. We collect thousands of well-composed portrait photographs to learn human-scenery aesthetic composition rules. In addition, we define a set of negative rules to exclude undesirable compositions. Recommendation results are achieved by combining the learned positive rules with our proposed negative rules. We implement the proposed system on the Android platform on a smartphone. The system demonstrates its efficacy by producing well-composed souvenir photos.
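
As a toy illustration of combining positive and negative composition rules (the learned rules themselves are not reproduced), a rule-of-thirds score stands in for a learned positive rule, and a hypothetical negative_mask marks positions that the negative rules exclude:

```python
import numpy as np

def thirds_score(h, w):
    """Toy positive rule: prefer subject positions near rule-of-thirds points."""
    ys, xs = np.mgrid[0:h, 0:w]
    pts = [(h / 3, w / 3), (h / 3, 2 * w / 3),
           (2 * h / 3, w / 3), (2 * h / 3, 2 * w / 3)]
    d = np.min([np.hypot(ys - py, xs - px) for py, px in pts], axis=0)
    return np.exp(-(d / (0.1 * max(h, w))) ** 2)

def recommend_position(h, w, negative_mask):
    """Score every candidate subject position, veto cells hit by a negative
    rule (e.g. occluding a landmark), and return the best remaining cell."""
    score = thirds_score(h, w)
    score[negative_mask] = -np.inf
    return np.unravel_index(np.argmax(score), score.shape)
```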

14 citations


Cited by
Proceedings ArticleDOI
20 Jun 2011
TL;DR: This work proposes a regional contrast based saliency extraction algorithm, which simultaneously evaluates global contrast differences and spatial coherence, and consistently outperforms existing saliency detection methods.
Abstract: Automatic estimation of salient object regions across images, without any prior assumption or knowledge of the contents of the corresponding scenes, enhances many computer vision and computer graphics applications. We introduce a regional contrast based salient object detection algorithm, which simultaneously evaluates global contrast differences and spatial weighted coherence scores. The proposed algorithm is simple, efficient, naturally multi-scale, and produces full-resolution, high-quality saliency maps. These saliency maps are further used to initialize a novel iterative version of GrabCut, namely SaliencyCut, for high quality unsupervised salient object segmentation. We extensively evaluated our algorithm using traditional salient object detection datasets, as well as a more challenging Internet image dataset. Our experimental results demonstrate that our algorithm consistently outperforms 15 existing salient object detection and segmentation methods, yielding higher precision and better recall rates. We also show that our algorithm can be used to efficiently extract salient object masks from Internet images, enabling effective sketch-based image retrieval (SBIR) via simple shape comparisons. Despite such noisy internet images, where the saliency regions are ambiguous, our saliency guided image retrieval achieves a superior retrieval rate compared with state-of-the-art SBIR methods, and additionally provides important target object region information.
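
A compact sketch of the regional-contrast idea: each region's saliency is its color contrast to every other region, weighted by region size and a Gaussian falloff on centroid distance. Segmentation and the SaliencyCut stage are omitted; the inputs (per-region mean Lab colors, normalized centroids, pixel counts) are assumed to come from an upstream segmentation.

```python
import numpy as np

def regional_contrast_saliency(mean_lab, centroids, sizes, sigma_s=0.4):
    """Per-region saliency: size-weighted color contrast to all other regions,
    attenuated by a Gaussian on normalized centroid distance."""
    n = len(sizes)
    sal = np.zeros(n)
    for i in range(n):
        dc = np.linalg.norm(mean_lab - mean_lab[i], axis=1)    # color contrast
        ds = np.linalg.norm(centroids - centroids[i], axis=1)  # spatial distance
        w = sizes * np.exp(-(ds ** 2) / (2 * sigma_s ** 2))
        w[i] = 0.0                                             # skip self-contrast
        sal[i] = np.sum(w * dc)
    return sal / (sal.max() + 1e-12)
```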

3,653 citations

Book ChapterDOI
08 Oct 2016
TL;DR: This paper proposes a fully automatic approach to colorization that produces vibrant and realistic colorizations and shows that colorization can be a powerful pretext task for self-supervised feature learning, acting as a cross-channel encoder.
Abstract: Given a grayscale photograph as input, this paper attacks the problem of hallucinating a plausible color version of the photograph. This problem is clearly underconstrained, so previous approaches have either relied on significant user interaction or resulted in desaturated colorizations. We propose a fully automatic approach that produces vibrant and realistic colorizations. We embrace the underlying uncertainty of the problem by posing it as a classification task and use class-rebalancing at training time to increase the diversity of colors in the result. The system is implemented as a feed-forward pass in a CNN at test time and is trained on over a million color images. We evaluate our algorithm using a “colorization Turing test,” asking human participants to choose between a generated and ground truth color image. Our method successfully fools humans on 32% of the trials, significantly higher than previous methods. Moreover, we show that colorization can be a powerful pretext task for self-supervised feature learning, acting as a cross-channel encoder. This approach results in state-of-the-art performance on several feature learning benchmarks.
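
The class-rebalancing weights described above can be sketched as follows: mix the empirical distribution over quantized ab color bins with a uniform distribution, weight each bin by the inverse of the mixture, and normalize so the expected weight is one. In this sketch, bin_freq (empirical bin counts over a training set) and the mixing factor of 0.5 are assumptions.

```python
import numpy as np

def rebalancing_weights(bin_freq, lam=0.5):
    """Class-rebalancing weights over quantized ab color bins: rare,
    saturated colors receive the largest weights. Assumes all bins have
    nonzero counts."""
    q = len(bin_freq)
    p = bin_freq / bin_freq.sum()          # empirical distribution over bins
    w = 1.0 / ((1.0 - lam) * p + lam / q)  # inverse of the smoothed distribution
    return w / np.dot(p, w)                # normalize so E_p[w] = 1
```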

2,326 citations

Posted Content
TL;DR: In this article, the problem of hallucinating a plausible color version of a grayscale photograph is addressed by posing it as a classification task and using class-rebalancing at training time to increase the diversity of colors in the result.
Abstract: Given a grayscale photograph as input, this paper attacks the problem of hallucinating a plausible color version of the photograph. This problem is clearly underconstrained, so previous approaches have either relied on significant user interaction or resulted in desaturated colorizations. We propose a fully automatic approach that produces vibrant and realistic colorizations. We embrace the underlying uncertainty of the problem by posing it as a classification task and use class-rebalancing at training time to increase the diversity of colors in the result. The system is implemented as a feed-forward pass in a CNN at test time and is trained on over a million color images. We evaluate our algorithm using a "colorization Turing test," asking human participants to choose between a generated and ground truth color image. Our method successfully fools humans on 32% of the trials, significantly higher than previous methods. Moreover, we show that colorization can be a powerful pretext task for self-supervised feature learning, acting as a cross-channel encoder. This approach results in state-of-the-art performance on several feature learning benchmarks.

2,087 citations

Journal ArticleDOI
TL;DR: It is found that models designed specifically for salient object detection generally work better than models from closely related areas, a finding that in turn provides a precise definition of salient object detection and suggests treating it as a problem distinct from related ones.
Abstract: We extensively compare, qualitatively and quantitatively, 41 state-of-the-art models (29 salient object detection, 10 fixation prediction, 1 objectness, and 1 baseline) over seven challenging data sets for the purpose of benchmarking salient object detection and segmentation methods. From the results obtained so far, our evaluation shows consistent, rapid progress over the last few years in terms of both accuracy and running time. The top contenders in this benchmark significantly outperform the models identified as the best in the previous benchmark conducted three years ago. We find that the models designed specifically for salient object detection generally work better than models in closely related areas, a finding that in turn provides a precise definition and suggests an appropriate treatment of this problem that distinguishes it from other problems. In particular, we analyze the influences of center bias and scene complexity in model performance, which, along with the hard cases for the state-of-the-art models, provide useful hints toward constructing more challenging large-scale data sets and better saliency models. Finally, we propose probable solutions for tackling several open problems, such as evaluation scores and data set bias, which also suggest future research directions in the rapidly growing field of salient object detection.

1,372 citations

Proceedings ArticleDOI
23 Jun 2014
TL;DR: It is observed that generic objects with well-defined closed boundaries can be discriminated by looking at the norm of gradients, with a suitable resizing of their corresponding image windows to a small fixed size, so as to train a generic objectness measure.
Abstract: Training a generic objectness measure to produce a small set of candidate object windows has been shown to speed up the classical sliding window object detection paradigm. We observe that generic objects with well-defined closed boundaries can be discriminated by looking at the norm of gradients, with a suitable resizing of their corresponding image windows to a small fixed size. Based on this observation and for computational reasons, we propose to resize the window to 8 × 8 and use the norm of the gradients as a simple 64D feature to describe it, for explicitly training a generic objectness measure. We further show how the binarized version of this feature, namely binarized normed gradients (BING), can be used for efficient objectness estimation, which requires only a few atomic operations (e.g., ADD, BITWISE SHIFT). Experiments on the challenging PASCAL VOC 2007 dataset show that our method efficiently (300 fps on a single laptop CPU) generates a small set of category-independent, high quality object windows, yielding a 96.2% object detection rate (DR) with 1,000 proposals. By increasing the number of proposals and the number of color spaces used to compute BING features, performance can be further improved to 99.5% DR.
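
A sketch of the normed-gradients (NG) feature as the abstract describes it: resize a window to 8 × 8 and use the gradient norm at each cell as a 64-D descriptor. The nearest-neighbor resize and the clamp at 255 are simplifying assumptions here; the binarized BING approximation and the learned linear objectness model are omitted.

```python
import numpy as np

def ng_feature(window):
    """Normed-gradients (NG) feature: subsample a grayscale window to 8 x 8
    and use the clamped gradient norm at each cell as a 64-D descriptor."""
    h, w = window.shape                             # assumes h >= 8 and w >= 8
    small = window[np.ix_(np.arange(8) * h // 8,
                          np.arange(8) * w // 8)].astype(float)
    gy, gx = np.gradient(small)
    g = np.minimum(np.abs(gx) + np.abs(gy), 255.0)  # L1 gradient norm, clamped
    return g.ravel()                                # 64-D feature vector
```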

1,034 citations