
Showing papers by "Jian Sun" published in 2011


Journal ArticleDOI
TL;DR: A simple but effective image prior, the dark channel prior, is proposed to remove haze from a single input image, based on a key observation: most local patches in haze-free outdoor images contain some pixels that have very low intensities in at least one color channel.
Abstract: In this paper, we propose a simple but effective image prior, the dark channel prior, to remove haze from a single input image. The dark channel prior is a kind of statistic of outdoor haze-free images. It is based on a key observation: most local patches in outdoor haze-free images contain some pixels whose intensity is very low in at least one color channel. Using this prior with the haze imaging model, we can directly estimate the thickness of the haze and recover a high-quality haze-free image. Results on a variety of hazy images demonstrate the power of the proposed prior. Moreover, a high-quality depth map can also be obtained as a byproduct of haze removal.
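A minimal sketch of the dark channel computation and the haze-model inversion it enables is shown below. The patch size, the omega parameter, and the hard-coded airlight are illustrative assumptions; the paper further refines the transmission map (e.g., with soft matting) before recovering the scene radiance.

```python
import numpy as np
from scipy.ndimage import minimum_filter

def dark_channel(image, patch=15):
    """Dark channel: per-pixel minimum over color channels, then a local min filter."""
    min_rgb = image.min(axis=2)                    # min over R, G, B
    return minimum_filter(min_rgb, size=patch)     # min over the local patch

def estimate_transmission(hazy, airlight, omega=0.95, patch=15):
    """Transmission estimate t = 1 - omega * dark_channel(I / A), from the haze
    imaging model I = J*t + A*(1 - t)."""
    normalized = hazy / airlight                   # broadcast over channels
    return 1.0 - omega * dark_channel(normalized, patch)

# Usage sketch (values in [0, 1]; the airlight would normally be estimated from
# the brightest dark-channel pixels rather than hard-coded):
# hazy = np.random.rand(480, 640, 3)
# A = np.array([0.95, 0.96, 0.97])
# t = estimate_transmission(hazy, A)
# J = (hazy - A) / np.clip(t[..., None], 0.1, 1.0) + A   # recovered scene radiance
```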

3,668 citations


Journal ArticleDOI
TL;DR: A set of novel features, including multiscale contrast, center-surround histogram, and color spatial distribution, is proposed to describe a salient object locally, regionally, and globally.
Abstract: In this paper, we study the salient object detection problem for images. We formulate this problem as a binary labeling task where we separate the salient object from the background. We propose a set of novel features, including multiscale contrast, center-surround histogram, and color spatial distribution, to describe a salient object locally, regionally, and globally. A conditional random field is learned to effectively combine these features for salient object detection. Further, we extend the proposed approach to detect a salient object from sequential images by introducing dynamic salient features. We collected a large image database containing tens of thousands of images carefully labeled by multiple users, as well as a video segment database, and conducted a set of experiments over them to demonstrate the effectiveness of the proposed approach.
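As a rough illustration of how the three per-pixel feature maps might be fused, the sketch below combines normalized maps with fixed weights and thresholds the result. This is only a simplification: the paper learns the combination with a conditional random field that also includes a pairwise smoothness term, neither of which is reproduced here, and the weights and threshold below are placeholders.

```python
import numpy as np

def combine_salient_features(contrast, center_surround, color_spatial,
                             weights=(0.4, 0.3, 0.3), threshold=0.5):
    """Fuse the three per-pixel feature maps into a saliency map and threshold it.
    Weights/threshold are placeholders, not the CRF-learned parameters."""
    maps = [contrast, center_surround, color_spatial]
    normed = [(m - m.min()) / (np.ptp(m) + 1e-8) for m in maps]  # scale each map to [0, 1]
    saliency = sum(w * m for w, m in zip(weights, normed))
    return saliency, saliency > threshold  # continuous map and binary object mask
```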

2,319 citations


Proceedings ArticleDOI
20 Jun 2011
TL;DR: This paper proposes a global sampling method that uses all samples available in the image and, to handle the computational complexity introduced by the large number of samples, poses the sampling task as a correspondence problem.
Abstract: Alpha matting refers to the problem of softly extracting the foreground from an image. Given a trimap (specifying known foreground/background and unknown pixels), a straightforward way to compute the alpha value is to sample some known foreground and background colors for each unknown pixel. Existing sampling-based matting methods often collect samples near the unknown pixels only. They fail if good samples cannot be found nearby. In this paper, we propose a global sampling method that uses all samples available in the image. Our global sample set avoids missing good samples. A simple but effective cost function is defined to tackle the ambiguity in the sample selection process. To handle the computational complexity introduced by the large number of samples, we pose the sampling task as a correspondence problem. The correspondence search is efficiently achieved by generalizing a randomized algorithm previously designed for patch matching [3]. A variety of experiments show that our global sampling method produces both visually and quantitatively high-quality matting results.
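The following sketch shows the standard alpha estimate for a single (foreground, background) sample pair and an illustrative pair cost that mixes color fitness with the samples' spatial distances. The weighting of the two terms is an assumption; the paper defines its own cost function and replaces the brute-force pair search with a PatchMatch-style randomized correspondence search.

```python
import numpy as np

def estimate_alpha(I, F, B):
    """Project the observed color onto the F-B line (standard sampling-based matting)."""
    fb = F - B
    alpha = np.dot(I - B, fb) / (np.dot(fb, fb) + 1e-8)
    return np.clip(alpha, 0.0, 1.0)

def pair_cost(I, F, B, dist_f, dist_b, w=1.0):
    """Illustrative cost: color distortion under the estimated alpha plus the
    spatial distances (dist_f, dist_b) of the samples to the unknown pixel.
    The weight w and this exact form are assumptions."""
    alpha = estimate_alpha(I, F, B)
    color_fit = np.linalg.norm(I - (alpha * F + (1 - alpha) * B))
    return color_fit + w * (dist_f + dist_b)

# The global sample set consists of all (F, B) pairs drawn from the trimap's known
# regions; the paper searches this set with a randomized correspondence search
# rather than the exhaustive minimization implied by calling pair_cost on every pair.
```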

343 citations


Journal ArticleDOI
TL;DR: A novel generic image prior, the gradient profile prior, is proposed, which captures prior knowledge of natural image gradients; based on it, a gradient field transformation is proposed to constrain the gradient fields of the high-resolution image and the enhanced image when performing single-image super-resolution and sharpness enhancement.
Abstract: In this paper, we propose a novel generic image prior, the gradient profile prior, which encodes prior knowledge of natural image gradients. In this prior, the image gradients are represented by gradient profiles, which are 1-D profiles of gradient magnitudes perpendicular to image structures. We model the gradient profiles by a parametric gradient profile model. Using this model, the prior knowledge of the gradient profiles is learned from a large collection of natural images, which we call the gradient profile prior. Based on this prior, we propose a gradient field transformation to constrain the gradient fields of the high-resolution image and the enhanced image when performing single-image super-resolution and sharpness enhancement. With this simple but very effective approach, we are able to produce state-of-the-art results. The reconstructed high-resolution images or the enhanced images are sharp, with few ringing or jaggy artifacts.
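The abstract does not spell out the parametric profile model. A common parametric choice for such 1-D gradient profiles is a generalized Gaussian distribution; one standard parameterization is

g(x; \sigma, \lambda) \;=\; \frac{\lambda\,\alpha(\lambda)}{2\,\sigma\,\Gamma(1/\lambda)}\,
\exp\!\left\{-\left[\alpha(\lambda)\,\frac{|x|}{\sigma}\right]^{\lambda}\right\},
\qquad
\alpha(\lambda) \;=\; \sqrt{\frac{\Gamma(3/\lambda)}{\Gamma(1/\lambda)}},

where sigma measures the profile sharpness and lambda is the shape parameter. Treat this exact form as an assumption for illustration rather than a statement of the paper's model.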

297 citations


Proceedings ArticleDOI
Jie Feng1, Yichen Wei2, Litian Tao3, Chao Zhang1, Jian Sun2 
06 Nov 2011
TL;DR: This work proposes to detect salient objects by directly measuring the saliency of an image window in the original image and adopt the well established sliding window based object detection paradigm.
Abstract: Conventional saliency analysis methods measure the saliency of individual pixels. The resulting saliency map inevitably loses information in the original image and finding salient objects in it is difficult. We propose to detect salient objects by directly measuring the saliency of an image window in the original image and adopt the well established sliding window based object detection paradigm.

192 citations


Proceedings ArticleDOI
20 Jun 2011
TL;DR: The proposed associate-predict model is built on an extra generic identity data set, in which each identity contains multiple images with large intra-personal variation, and can substantially improve the performance of most existing face recognition methods.
Abstract: Handling intra-personal variation is a major challenge in face recognition. It is difficult to appropriately measure the similarity between human faces under significantly different settings (e.g., pose, illumination, and expression). In this paper, we propose a new model, called the “Associate-Predict” (AP) model, to address this issue. The associate-predict model is built on an extra generic identity data set, in which each identity contains multiple images with large intra-personal variation. When considering two faces under significantly different settings (e.g., non-frontal and frontal), we first “associate” one input face with alike identities from the generic identity data set. Using the associated faces, we generatively “predict” the appearance of one input face under the setting of the other input face, or discriminatively “predict” the likelihood that the two input faces are from the same person. We call the two proposed prediction methods “appearance-prediction” and “likelihood-prediction”. By leveraging an extra data set (“memory”) and the “associate-predict” model, the intra-personal variation can be effectively handled. To improve the generalization ability of our model, we further add a switching mechanism: we directly compare the appearances of two faces if they have close intra-personal settings; otherwise, we use the associate-predict model for the recognition. Experiments on two public face benchmarks (Multi-PIE and LFW) demonstrate that our final model can substantially improve the performance of most existing face recognition methods.
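A minimal sketch of the appearance-prediction path and the switching mechanism, assuming face features are plain vectors, settings are discrete labels, and the generic identity set is a dict mapping each identity to one feature per setting; the association step, the distance metric, and the data layout are illustrative assumptions rather than the paper's exact design.

```python
import numpy as np

def appearance_predict(probe_feat, probe_setting, target_setting, identity_set):
    """Associate the probe with its most similar generic identity under the
    probe's own setting, then reuse that identity's image under the target
    setting as the predicted appearance.
    identity_set: {identity_id: {setting: feature_vector}} (assumed layout)."""
    best_id = min(
        identity_set,
        key=lambda pid: np.linalg.norm(identity_set[pid][probe_setting] - probe_feat),
    )
    return identity_set[best_id][target_setting]

def compare(probe_feat, probe_setting, gallery_feat, gallery_setting, identity_set):
    """Switching mechanism: direct comparison when the settings are close,
    otherwise compare the predicted appearance against the gallery face."""
    if probe_setting == gallery_setting:
        return np.linalg.norm(probe_feat - gallery_feat)
    predicted = appearance_predict(probe_feat, probe_setting, gallery_setting, identity_set)
    return np.linalg.norm(predicted - gallery_feat)
```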

191 citations


Journal ArticleDOI
Tommer Leyvand1, Meekhof Casey Leon1, Yichen Wei1, Jian Sun1, Baining Guo1 
TL;DR: Kinect Identity, a key component of Microsoft's Kinect for the Xbox 360, combines multiple technologies and careful user interaction design to achieve the goal of recognizing and tracking player identity.
Abstract: Kinect Identity, a key component of Microsoft's Kinect for the Xbox 360, combines multiple technologies and careful user interaction design to achieve the goal of recognizing and tracking player identity.

161 citations


Proceedings ArticleDOI
20 Jun 2011
TL;DR: A novel clustering algorithm for tagging a face dataset (e.g., a personal photo album) is presented, which outperforms competitive clustering algorithms in terms of both precision/recall and efficiency.
Abstract: We present a novel clustering algorithm for tagging a face dataset (e.g., a personal photo album). The core of the algorithm is a new dissimilarity, called the Rank-Order distance, which measures the dissimilarity between two faces using their neighboring information in the dataset. The Rank-Order distance is motivated by the observation that faces of the same person usually share their top neighbors. Specifically, for each face, we generate a ranking order list by sorting all other faces in the dataset by absolute distance (e.g., L1 or L2 distance between extracted face recognition features). Then, the Rank-Order distance of two faces is calculated using their ranking orders. Using the new distance, a Rank-Order distance based clustering algorithm is designed to iteratively group all faces into a small number of clusters for effective tagging. The proposed algorithm outperforms competitive clustering algorithms in terms of both precision/recall and efficiency.
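A sketch of the Rank-Order distance as the abstract describes it, assuming each face's neighbor list is a complete ranking of the dataset that starts with the face itself. The exact summation limit and the normalization by the smaller mutual rank are stated here as assumptions, not as a verified reproduction of the paper.

```python
def asymmetric_rank_distance(order_a, order_b):
    """d(a, b): sum of a's ranks of b's top neighbors, summed up to b's rank of a.
    order_x is x's neighbor list (face ids sorted by absolute distance), assumed
    to start with x itself and to cover the whole dataset."""
    rank_a = {face: i for i, face in enumerate(order_a)}
    rank_b = {face: i for i, face in enumerate(order_b)}
    limit = rank_b[order_a[0]]                       # b's rank of face a
    return sum(rank_a[order_b[i]] for i in range(limit + 1))

def rank_order_distance(order_a, order_b):
    """Symmetric Rank-Order distance, normalized by the closer of the two mutual ranks."""
    rank_a = {face: i for i, face in enumerate(order_a)}
    rank_b = {face: i for i, face in enumerate(order_b)}
    d_ab = asymmetric_rank_distance(order_a, order_b)
    d_ba = asymmetric_rank_distance(order_b, order_a)
    denom = max(min(rank_a[order_b[0]], rank_b[order_a[0]]), 1)
    return (d_ab + d_ba) / denom
```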

135 citations


Journal ArticleDOI
TL;DR: A new scalable face representation is developed using both local and global features, and it is shown that the inverted index based on local features provides candidate images with good recall, while the multi-reference re-ranking with the global Hamming signature leads to good precision.
Abstract: State-of-the-art image retrieval systems achieve scalability by using a bag-of-words representation and textual retrieval methods, but their performance degrades quickly in the face image domain, mainly because they produce visual words with low discriminative power for face images and ignore the special properties of faces. The leading features for face recognition can achieve good retrieval performance, but these features are not suitable for inverted indexing, as they are high-dimensional and global and thus not scalable in either computational or storage cost. In this paper, we aim to build a scalable face image retrieval system. For this purpose, we develop a new scalable face representation using both local and global features. In the indexing stage, we exploit special properties of faces to design new component-based local features, which are subsequently quantized into visual words using a novel identity-based quantization scheme. We also use a very small Hamming signature (40 bytes) to encode the discriminative global feature for each face. In the retrieval stage, candidate images are first retrieved from the inverted index of visual words. We then use a new multi-reference distance to re-rank the candidate images using the Hamming signature. On a one million face database, we show that our local features and global Hamming signatures are complementary: the inverted index based on local features provides candidate images with good recall, while the multi-reference re-ranking with the global Hamming signature leads to good precision. As a result, our system is not only scalable but also outperforms the linear-scan retrieval system using the state-of-the-art face recognition feature in terms of quality.
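The re-ranking step lends itself to a short sketch: each 40-byte signature is 320 bits, and candidates can be rescored by Hamming distance to several reference signatures. The aggregation by averaging below is an assumption; the paper's multi-reference distance may combine the references differently.

```python
import numpy as np

def hamming_distance(sig_a, sig_b):
    """Hamming distance between two binary signatures stored as uint8 arrays
    (a 40-byte signature encodes 320 bits)."""
    return int(np.unpackbits(np.bitwise_xor(sig_a, sig_b)).sum())

def rerank(candidates, signatures, reference_sigs):
    """Multi-reference re-ranking sketch: score each candidate by its average
    Hamming distance to the reference signatures, then sort ascending.
    signatures: {candidate_id: uint8 array of length 40} (assumed layout)."""
    def score(candidate):
        return np.mean([hamming_distance(signatures[candidate], ref) for ref in reference_sigs])
    return sorted(candidates, key=score)
```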

121 citations


Proceedings ArticleDOI
20 Jun 2011
TL;DR: The results show that the learned NLR-MRF model significantly outperforms the traditional MRF models and produces state-of-the-art results.
Abstract: In this paper, we design a novel MRF framework called the Non-Local Range Markov Random Field (NLR-MRF). The local spatial range of a clique in a traditional MRF is extended to a non-local range defined over a local patch and its similar patches in a non-local window. The traditional local spatial filter is then extended to a non-local range filter that convolves an image over the non-local ranges of pixels. In this framework, we propose a gradient-based discriminative learning method to learn the potential functions and the non-local range filter bank. As the gradients of the loss function with respect to the model parameters are explicitly computed, efficient gradient-based optimization methods are utilized to train the proposed model. We implement this framework for image denoising and inpainting, and the results show that the learned NLR-MRF model significantly outperforms traditional MRF models and produces state-of-the-art results.

68 citations


Journal ArticleDOI
TL;DR: This paper proposes a new set of gradient-oriented features, Haar of Oriented Gradients (HOOG), to effectively capture the shape and texture of animal heads, and proposes two detection algorithms, Bruteforce detection and Deformable detection, to exploit the shape and texture features simultaneously.
Abstract: Robust object detection has many important applications in real-world online photo processing. For example, both Google image search and MSN live image search have integrated human face detectors to retrieve face or portrait photos. Inspired by the success of such face filtering approaches, in this paper we focus on another popular online photo category, animals, which is one of the top five categories in the MSN live image search query log. As a first attempt, we focus on the problem of animal head detection for a set of relatively large land animals that are popular on the internet, such as cat, tiger, panda, fox, and cheetah. First, we propose a new set of gradient-oriented features, Haar of Oriented Gradients (HOOG), to effectively capture the shape and texture features of animal heads. Then, we propose two detection algorithms, namely Bruteforce detection and Deformable detection, to effectively exploit the shape and texture features simultaneously. Experimental results on 14,379 well-labeled animal images validate the superiority of the proposed approach. Additionally, we apply the animal head detector to improve image search results through text-based online photo search result filtering.

Patent
Jiyang Liu1, Jian Sun1, Heung-Yeung Shum1, Xiaosong Yang1, Yu-Ting Kuo1, Lei Zhang1, Yi Li1, Qifa Ke1, Ce Liu1 
31 Oct 2011
TL;DR: In this article, search queries containing multiple modes of query input are used to identify responsive results, and in some embodiments additional query refinements or suggestions can be made based on the content of the query or the initially responsive results.
Abstract: Search queries containing multiple modes of query input are used to identify responsive results. The search queries can be composed of combinations of keyword or text input, image input, video input, audio input, or other modes of input. The multiple modes of query input can be present in an initial search request, or an initial request containing a single type of query input can be supplemented with a second type of input. In addition to providing responsive results, in some embodiments additional query refinements or suggestions can be made based on the content of the query or the initially responsive results.

Patent
Jian Sun1, Qi Yin1, Xiaoou Tang1
13 May 2011
TL;DR: Some implementations provide techniques and arrangements to address intrapersonal variations encountered during facial recognition, such as transforming a portion of an image from a first intrapersonal condition to a second intrapersonal condition to enable more accurate comparison with another image, as mentioned in this paper.
Abstract: Some implementations provide techniques and arrangements to address intrapersonal variations encountered during facial recognition. For example, some implementations transform at least a portion of an image from a first intrapersonal condition to a second intrapersonal condition to enable more accurate comparison with another image. Some implementations may determine a pose category of an input image and may modify at least a portion of the input image to a different pose category of another image for comparing the input image with the other image. Further, some implementations provide for compression of data representing at least a portion of the input image to decrease the dimensionality of the data.

Journal ArticleDOI
TL;DR: An algorithm to compute a set of cycles from point data that presumably sample a smooth manifold M ⊂ ℝ^d, approximating a shortest basis of the first homology group H_1(M) over coefficients in the finite field ℤ_2.
Abstract: Inference of topological and geometric attributes of a hidden manifold from its point data is a fundamental problem arising in many scientific studies and engineering applications. In this paper, we present an algorithm to compute a set of cycles from point data that presumably sample a smooth manifold M ⊂ ℝ^d. These cycles approximate a shortest basis of the first homology group H_1(M) over coefficients in the finite field ℤ_2. Previous results addressed the issue of computing the rank of the homology groups from point data, but there was no result on approximating a shortest basis of a manifold from its point sample. In arriving at our result, we also present a polynomial-time algorithm for computing a shortest basis of H_1(K) for any finite simplicial complex K whose edges have non-negative weights.

Patent
20 Dec 2011
TL;DR: In this paper, a plurality of images of a user's eye are acquired and an enhanced user eye gaze is estimated by narrowing a database of eye information and corresponding known gaze lines to a subset of the eye information having gaze lines corresponding to a gaze target area.
Abstract: Systems, methods, and computer media for estimating user eye gaze are provided. A plurality of images of a user's eye are acquired. At least one image of at least part of the user's field of view is acquired. At least one gaze target area in the user's field of view is determined based on the plurality of images of the user's eye. An enhanced user eye gaze is then estimated by narrowing a database of eye information and corresponding known gaze lines to a subset of the eye information having gaze lines corresponding to a gaze target area. User eye information derived from the images of the user's eye is then compared with the narrowed subset of the eye information, and an enhanced estimated user eye gaze is identified as the known gaze line of a matching eye image.

Patent
Jian Sun1, Qi Yin1, Xiaoou Tang1
13 May 2011
TL;DR: Some implementations employ an identity data set having a plurality of images representing different intrapersonal settings, as mentioned in this paper, and some implementations utilize a likelihood-prediction approach for comparing images that generates a classifier for an input image based on an association of the input image with the identity data set.
Abstract: Some implementations provide techniques and arrangements to address intrapersonal variations encountered during facial recognition. For example, some implementations employ an identity data set having a plurality of images representing different intrapersonal settings. A predictive model may associate one or more input images with one or more images in the identity data set. Some implementations may use an appearance-prediction approach to compare two images by predicting an appearance of at least one of the images under an intrapersonal setting of the other image. Further, some implementations may utilize a likelihood-prediction approach for comparing images that generates a classifier for an input image based on an association of an input image with the identity data set.

Proceedings ArticleDOI
Lu Yuan1, Jian Sun1
06 Nov 2011
TL;DR: This paper proposes a new “hybrid” image capture mode, a high-res JPEG file plus a low-res RAW file, as an alternative to the original RAW file, and reconstructs a high-quality image by combining the advantages of the two kinds of files.
Abstract: A camera RAW file contains minimally processed data from the image sensor. The contents of the RAW file include more information, and potentially higher quality, than the commonly used JPEG file. But the RAW file is typically several times larger than the JPEG file (fewer images fit in memory and quick shooting is slower) and lacks a standard file format (it is not ready to use, prolonging the image workflow). These drawbacks limit its applications. In this paper, we suggest a new “hybrid” image capture mode: a high-res JPEG file and a low-res RAW file as an alternative to the original RAW file. Most RAW users can benefit from such a combination. To address this problem, we provide an effective approach to reconstruct a high-quality image by combining the advantages of the two kinds of files. We formulate this reconstruction process as a global optimization problem by enforcing two constraints: a reconstruction constraint and a detail consistency constraint. The final recovered image is smaller than the full-res RAW file, enables faster quick shooting, and has both richer information (e.g., color space, dynamic range, lossless 14-bit data) and higher resolution. In practice, the functionality of capturing such a “hybrid” image pair in one shot is already supported by some existing digital cameras.

Patent
Yu-Ting Kuo1, Yi Li1, Fang Wen1, Qifa Ke1, Jian Sun1 
18 Aug 2011
TL;DR: In this paper, a refined user query may be generated based on a selected visual cue, in association with query result groups, which are presented to the user in conjunction with visual cues.
Abstract: Methods and computer-storage media having computer-executable instructions embodied thereon that facilitate refining query results using visual cues are provided. Query results are determined in response to an indication of a user query. One or more groups of query results are generated from the query results based on categories of query results that share similar features. Visual cues are associated with each of the query result groups. Visual cues, in association with query result groups, are presented to a user. Query results associated with a selected visual cue may be presented to a user. A refined user query may be generated based on a selected visual cue.

Patent
Jian Sun1, Zhimin Cao1, Qi Yin1, Xiaoou Tang1
24 May 2011
TL;DR: In this paper, techniques are described for obtaining compact face descriptors and using pose-specific comparisons to deal with different pose combinations for image comparison.
Abstract: Described herein are techniques for obtaining compact face descriptors and using pose-specific comparisons to deal with different pose combinations for image comparison.

Patent
Lu Yuan1, Jian Sun1
07 Jun 2011
TL;DR: In this article, a non-linear function that characterizes the luminance of shadow, mid-tone, and highlight portions of the image is applied to the input image to improve the appearance of underexposed and overexposed regions.
Abstract: Techniques for automatic exposure correction of images are provided. In particular, the exposure of an input image may be improved by automatically modifying a non-linear function that characterizes the luminance of shadow, mid-tone, and highlight portions of the image. The input image may be segmented into a number of regions and each region is assigned a zone, where the zone indicates a specified range of luminance values. An initial zone assigned to a region of the image may be changed in order to reflect an optimal zone of the region. Based, in part, on the optimal zones for each region of the image, luminance modification parameters may be calculated and applied to the non-linear function in order to produce a modified version of the input image that improves the appearance of overexposed and/or underexposed regions of the input image.
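A toy illustration of the kind of non-linear luminance curve the abstract describes: shadows are lifted and highlights compressed by amounts that would be derived from the per-region zone analysis. The increment function and its constant are placeholders invented for this sketch, not the patent's actual curve or parameters.

```python
import numpy as np

def shadow_highlight_curve(lum, shadow_amount, highlight_amount):
    """Illustrative luminance curve: lift dark pixels and compress bright ones.
    lum is the luminance channel scaled to [0, 1]; shadow_amount and
    highlight_amount would come from the zone analysis in the patent."""
    lift = lum * np.exp(-5.0 * lum)                     # acts mostly on dark pixels
    compress = (1 - lum) * np.exp(-5.0 * (1 - lum))     # acts mostly on bright pixels
    out = lum + shadow_amount * lift - highlight_amount * compress
    return np.clip(out, 0.0, 1.0)
```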

Patent
Lu Yuan1, Fang Wen1, Jian Sun1
26 May 2011
TL;DR: In this article, a dual-phase approach to red eye correction may prevent overly aggressive or overly conservative red eye reduction by detecting an eye portion in a digital image and performing a strong red eye correction when the eye portion includes a strong red eye, or a weak red eye correction when it includes a weak red eye.
Abstract: A dual-phase approach to red eye correction may prevent overly aggressive or overly conservative red eye reduction. The dual-phase approach may include detecting an eye portion in a digital image. Once the eye portion is detected, the dual-phase approach may include the performance of a strong red eye correction for the eye portion when the eye portion includes a strong red eye. Otherwise, the dual-phase approach may include the performance of a weak red eye correction for the eye portion when the eye portion includes a weak red eye. The weak red eye may be distinguished from the strong red eye based on a redness threshold that shows the weak red eye as having less redness hue than the strong red eye.
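A minimal sketch of the dual-phase dispatch, assuming the eye region is given as float RGB in [0, 1]. The redness measure, the threshold value, and the simple red-channel desaturation used for both phases are illustrative placeholders, not the patent's actual measures or corrections.

```python
import numpy as np

def redness(eye_rgb):
    """A common redness measure: how much the red channel exceeds green/blue on average."""
    r, g, b = eye_rgb[..., 0], eye_rgb[..., 1], eye_rgb[..., 2]
    return float(np.mean(np.clip(r - np.maximum(g, b), 0, None)))

def correct_red_eye(eye_rgb, threshold=0.3):
    """Dual-phase dispatch sketch: aggressive correction above the redness
    threshold, conservative correction below it. Strengths are placeholders."""
    strength = 1.0 if redness(eye_rgb) >= threshold else 0.5
    out = eye_rgb.copy()
    gb_mean = (out[..., 1] + out[..., 2]) / 2.0
    out[..., 0] = (1 - strength) * out[..., 0] + strength * gb_mean  # pull red toward g/b
    return out
```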

Patent
Jian Sun1, Fang Wen1, Chunhui Zhu1
17 May 2011
TL;DR: Rank order-based object image clustering may facilitate robust clustering of digital images, as discussed by the authors; the rank order distance for each pairing of object images is obtained by normalizing the asymmetric distances of the corresponding object images.
Abstract: Rank ordered-based object image clustering may facilitate robust clustering of digital images. The rank order-based clustering of object images may include defining asymmetric distances between each object image and one or more other object images in a set of multiple object images using generated ordered lists. The rank order-based clustering may further include obtaining a rank order distance for each pairing of object images by normalizing the asymmetric distances of corresponding object images. The multiple object images are further clustered into object image clusters based on the rank order distances and adaptive absolute distance.

Patent
16 May 2011
TL;DR: In this article, a computing device is described that is configured to select a pixel pair including a foreground pixel of an image and a background pixel of the image from a global set of pixels based at least on spatial distances from an unknown pixel and color distances from the unknown pixel.
Abstract: A computing device is described herein that is configured to select a pixel pair including a foreground pixel of an image and a background pixel of the image from a global set of pixels based at least on spatial distances from an unknown pixel and color distances from the unknown pixel. The computing device is further configured to determine an opacity measure for the unknown pixel based at least on the selected pixel pair.


Proceedings Article
01 Jan 2011
TL;DR: In this article, a novel Fourier-theoretic approach for estimating the symmetry group G of a geometric object X is proposed, which takes as input a geometric similarity matrix between low-order combinations of features of X and then searches within the tree of all feature permutations to detect the sparse subset that defines the symmetry groups G of X.
Abstract: In this paper, we propose a novel Fourier-theoretic approach for estimating the symmetry group G of a geometric object X. Our approach takes as input a geometric similarity matrix between low-order combinations of features of X and then searches within the tree of all feature permutations to detect the sparse subset that defines the symmetry group G of X. Using the Fourier-theoretic approach, we construct an efficient marginal-based search strategy, which can recover the symmetry group G effectively. The framework introduced in this paper can be used to discover symmetries of more abstract geometric spaces and is robust to deformation noise. Experimental results show that our approach can fully determine the symmetries of various geometric objects.