
Showing papers by "Jian Sun" published in 2010


Book ChapterDOI
05 Sep 2010
TL;DR: The guided filter is shown to be both effective and efficient in a great variety of computer vision and computer graphics applications including noise reduction, detail smoothing/enhancement, HDR compression, image matting/feathering, haze removal, and joint upsampling.
Abstract: In this paper, we propose a novel type of explicit image filter - guided filter. Derived from a local linear model, the guided filter generates the filtering output by considering the content of a guidance image, which can be the input image itself or another different image. The guided filter can perform as an edge-preserving smoothing operator like the popular bilateral filter [1], but has better behavior near the edges. It also has a theoretical connection with the matting Laplacian matrix [2], so is a more generic concept than a smoothing operator and can better utilize the structures in the guidance image. Moreover, the guided filter has a fast and non-approximate linear-time algorithm, whose computational complexity is independent of the filtering kernel size. We demonstrate that the guided filter is both effective and efficient in a great variety of computer vision and computer graphics applications including noise reduction, detail smoothing/enhancement, HDR compression, image matting/feathering, haze removal, and joint upsampling.

2,215 citations
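
The guided filter abstract above mentions a local linear model and a linear-time algorithm whose cost does not depend on the kernel size. A minimal sketch of that idea follows, restricted to 1D signals and written in plain Python. It assumes the commonly cited formulation (per-window coefficients `a` and `b` computed from box-filtered means, variances and covariances); the abstract itself does not spell out these steps, and the names `box_mean` and `guided_filter_1d`, the radius `r`, and the regularizer `eps` are illustrative choices.

```python
def box_mean(x, r):
    """Mean of x over the window [i - r, i + r], clipped at the borders.

    Uses a prefix sum, so the cost is O(len(x)) regardless of r.
    """
    n = len(x)
    prefix = [0.0]
    for v in x:
        prefix.append(prefix[-1] + v)
    out = []
    for i in range(n):
        lo = max(0, i - r)
        hi = min(n, i + r + 1)
        out.append((prefix[hi] - prefix[lo]) / (hi - lo))
    return out


def guided_filter_1d(guide, src, r, eps):
    """Filter `src` using `guide` as the guidance signal (assumed formulation).

    In each window the output is modelled as a * guide + b; the per-window
    coefficients are then averaged over all windows covering a sample.
    """
    mean_i = box_mean(guide, r)
    mean_p = box_mean(src, r)
    corr_ii = box_mean([g * g for g in guide], r)
    corr_ip = box_mean([g * s for g, s in zip(guide, src)], r)

    a, b = [], []
    for mi, mp, cii, cip in zip(mean_i, mean_p, corr_ii, corr_ip):
        var_i = cii - mi * mi        # variance of the guide in the window
        cov_ip = cip - mi * mp       # covariance of guide and source
        ak = cov_ip / (var_i + eps)  # eps damps a in flat (low-variance) windows
        a.append(ak)
        b.append(mp - ak * mi)

    mean_a = box_mean(a, r)
    mean_b = box_mean(b, r)
    return [ma * g + mb for ma, mb, g in zip(mean_a, mean_b, guide)]
```

Calling `guided_filter_1d(signal, signal, r=2, eps=1e-4)` uses the signal as its own guide, which is the edge-preserving smoothing case mentioned in the abstract. A larger `eps` smooths more strongly, and a different `guide` transfers that signal's structure to the output.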


Journal ArticleDOI
TL;DR: A novel examplar-based inpainting algorithm built on the sparsity of natural image patches, in which structure sparsity enables better discrimination of structure and texture, and the patch sparse representation forces the newly inpainted regions to be sharp and consistent with the surrounding textures.
Abstract: This paper introduces a novel examplar-based inpainting algorithm through investigating the sparsity of natural image patches. Two novel concepts of sparsity at the patch level are proposed for modeling the patch priority and patch representation, which are two crucial steps for patch propagation in the examplar-based inpainting approach. First, patch structure sparsity is designed to measure the confidence of a patch located at the image structure (e.g., the edge or corner) by the sparseness of its nonzero similarities to the neighboring patches. The patch with larger structure sparsity will be assigned higher priority for further inpainting. Second, it is assumed that the patch to be filled can be represented by the sparse linear combination of candidate patches under the local patch consistency constraint in a framework of sparse representation. Compared with the traditional examplar-based inpainting approach, structure sparsity enables better discrimination of structure and texture, and the patch sparse representation forces the newly inpainted regions to be sharp and consistent with the surrounding textures. Experiments on synthetic and natural images show the advantages of the proposed approach.

520 citations


Proceedings ArticleDOI
13 Jun 2010
TL;DR: This work proposes a pose-adaptive matching method that uses pose-specific classifiers to deal with different pose combinations of the matching face pair, and finds that a simple normalization mechanism after PCA can further improve the discriminative ability of the descriptor.
Abstract: We present a novel approach to address the representation issue and the matching issue in face recognition (verification). Firstly, our approach encodes the micro-structures of the face by a new learning-based encoding method. Unlike many previous manually designed encoding methods (e.g., LBP or SIFT), we use unsupervised learning techniques to learn an encoder from the training examples, which can automatically achieve a very good tradeoff between discriminative power and invariance. Then we apply PCA to get a compact face descriptor. We find that a simple normalization mechanism after PCA can further improve the discriminative ability of the descriptor. The resulting face representation, the learning-based (LE) descriptor, is compact, highly discriminative, and easy to extract. To handle the large pose variation in real-life scenarios, we propose a pose-adaptive matching method that uses pose-specific classifiers to deal with different pose combinations (e.g., frontal vs. frontal, frontal vs. left) of the matching face pair. Our approach is comparable with the state-of-the-art methods on the Labeled Faces in the Wild (LFW) benchmark (we achieved 84.45% recognition rate), while maintaining excellent compactness, simplicity, and generalization ability across different datasets.

470 citations


Proceedings ArticleDOI
13 Jun 2010
TL;DR: This paper derives an efficient algorithm to solve a large kernel matting Laplacian and uses adaptive kernel sizes by a KD-tree trimap segmentation technique to reduce running time.
Abstract: Image matting is of great importance in both computer vision and graphics applications. Most existing state-of-the-art techniques rely on large sparse matrices such as the matting Laplacian [12]. However, solving these linear systems is often time-consuming, which is unfavorable for user interaction. In this paper, we propose a fast method for high quality matting. We first derive an efficient algorithm to solve a large kernel matting Laplacian. A large kernel propagates information more quickly and may improve the matte quality. To further reduce running time, we also use adaptive kernel sizes by a KD-tree trimap segmentation technique. A variety of experiments show that our algorithm provides high quality results and is 5 to 20 times faster than previous methods.

181 citations


Proceedings ArticleDOI
13 Jun 2010
TL;DR: This paper proposes a context-constrained hallucination approach for image super-resolution: a training set of high-resolution/low-resolution image segment pairs is built, and each high-resolution pixel is hallucinated from texturally similar segments retrieved from that training set.
Abstract: This paper proposes a context-constrained hallucination approach for image super-resolution. Through building a training set of high-resolution/low-resolution image segment pairs, the high-resolution pixel is hallucinated from its texturally similar segments which are retrieved from the training set by texture similarity. Given the discrete hallucinated examples, a continuous energy function is designed to enforce the fidelity of high-resolution image to low-resolution input and the constraints imposed by the hallucinated examples and the edge smoothness prior. The reconstructed high-resolution image is sharp with minimal artifacts both along the edges and in the textural regions.

132 citations


Patent
Jian Sun, Kaiming He, Xiaoou Tang
01 Feb 2010
TL;DR: In this article, techniques and technologies for de-hazing hazy images are described, and some of the disclosed methods include removing the effects of the haze from a hazy image and outputting the recovered, dehazed image.
Abstract: Techniques and technologies for de-hazing hazy images are described. Some techniques provide for determining the effects of the haze and removing the same from an image to recover a de-hazed image. Thus, the de-hazed image does not contain the effects of the haze. Some disclosed technologies allow for similar results. This document also discloses systems and methods for de-hazing images. Some of the disclosed de-hazing systems include an image capture device for capturing the hazy image and a processor for removing the effects of the haze from the hazy image. These systems store the recovered, de-hazed images in a memory and/or display the de-hazed images on a display. Some of the disclosed methods include removing the effects of the haze from a hazy image and outputting the recovered, de-hazed image.

99 citations


Proceedings ArticleDOI
13 Jun 2010
TL;DR: A temporal matching constraint is introduced in AAM fitting and a color-based face segmentation is introduced as a soft constraint to improve the AAM tracker's performance, as demonstrated with experiments on various challenging real-world videos.
Abstract: Active Appearance Model (AAM) based face tracking has advantages of accurate alignment, high efficiency, and effectiveness for handling face deformation. However, AAM suffers from the generalization problem and has difficulties in images with cluttered backgrounds. In this paper, we introduce two novel constraints into AAM fitting to address the above problems. We first introduce a temporal matching constraint in AAM fitting. In the proposed fitting scheme, the temporal matching enforces an inter-frame local appearance constraint between frames. The resulting model takes advantage of temporal matching's good generalizability, but does not suffer from the mismatched points. To make AAM more stable for cluttered backgrounds, we introduce a color-based face segmentation as a soft constraint. Both constraints effectively improve the AAM tracker's performance, as demonstrated with experiments on various challenging real-world videos.

84 citations


Proceedings ArticleDOI
13 Jun 2010
TL;DR: A new scalable face representation is developed using both local and global features and it is shown that the inverted index based on local features provides candidate images with good recall, while the multi-reference re-ranking with global hamming signature leads to good precision.
Abstract: State-of-the-art image retrieval systems achieve scalability by using bag-of-words representation and textual retrieval methods, but their performance degrades quickly in the face image domain, mainly because they 1) produce visual words with low discriminative power for face images, and 2) ignore the special properties of faces. The leading features for face recognition can achieve good retrieval performance, but these features are not suitable for inverted indexing as they are high-dimensional and global, thus not scalable in either computational or storage cost. In this paper we aim to build a scalable face image retrieval system. For this purpose, we develop a new scalable face representation using both local and global features. In the indexing stage, we exploit special properties of faces to design new component-based local features, which are subsequently quantized into visual words using a novel identity-based quantization scheme. We also use a very small Hamming signature (40 bytes) to encode the discriminative global feature for each face. In the retrieval stage, candidate images are first retrieved from the inverted index of visual words. We then use a new multi-reference distance to re-rank the candidate images using the Hamming signature. On a one-million face database, we show that our local features and global Hamming signatures are complementary: the inverted index based on local features provides candidate images with good recall, while the multi-reference re-ranking with the global Hamming signature leads to good precision. As a result, our system is not only scalable but also outperforms the linear scan retrieval system using the state-of-the-art face recognition feature in terms of quality.

75 citations
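
The two-stage retrieval described in the face retrieval abstract above (candidates from an inverted index of visual words, then re-ranking by a compact binary signature) can be sketched as follows. This is only an illustration under stated assumptions: the function names, the toy integer signatures, and the vote-counting candidate stage are hypothetical, and plain Hamming distance to the query stands in for the paper's multi-reference distance, which the abstract does not define.

```python
from collections import defaultdict


def build_inverted_index(db_words):
    """Map each visual word to the set of image ids that contain it."""
    index = defaultdict(set)
    for image_id, words in db_words.items():
        for word in words:
            index[word].add(image_id)
    return index


def hamming(a, b):
    """Number of differing bits between two integer-encoded signatures."""
    return bin(a ^ b).count("1")


def retrieve(query_words, query_sig, index, db_sigs, top_k):
    """Stage 1: vote for images sharing visual words with the query.
    Stage 2: re-rank the top_k candidates by Hamming distance (assumed
    stand-in for the multi-reference distance)."""
    votes = defaultdict(int)
    for word in set(query_words):
        for image_id in index.get(word, ()):
            votes[image_id] += 1
    candidates = sorted(votes, key=lambda i: (-votes[i], i))[:top_k]
    return sorted(candidates, key=lambda i: (hamming(query_sig, db_sigs[i]), i))


# Toy data: image "c" has a matching signature but shares no visual word,
# so it never becomes a candidate.
db_words = {"a": [1, 2, 3], "b": [2, 3, 4], "c": [9]}
db_sigs = {"a": 0b1111, "b": 0b1101, "c": 0b1101}
index = build_inverted_index(db_words)
ranked = retrieve([2, 3], 0b1101, index, db_sigs, top_k=10)
```

The split mirrors the abstract's argument: the index lookup touches only images that share a word with the query, and the signature comparison is cheap enough to run on every candidate.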


Proceedings ArticleDOI
13 Jun 2010
TL;DR: A novel adaptive bottom-up approach to parallelizing the BK algorithm that is more cache-friendly within smaller subgraphs, keeps workloads balanced among computing cores, and causes little overhead while adapting to the number of available cores.
Abstract: Graph-cuts optimization is prevalent in vision and graphics problems. It is thus of great practical importance to parallelize the graph-cuts optimization using today's ubiquitous multi-core machines. However, the current best serial algorithm by Boykov and Kolmogorov (called the BK algorithm) still has the superior empirical performance. It is non-trivial to parallelize as expensive synchronization overhead easily offsets the advantage of parallelism. In this paper, we propose a novel adaptive bottom-up approach to parallelize the BK algorithm. We first uniformly partition the graph into a number of regularly-shaped disjoint subgraphs and process them in parallel, then we incrementally merge the subgraphs in an adaptive way to obtain the global optimum. The new algorithm has three benefits: 1) it is more cache-friendly within smaller subgraphs; 2) it keeps balanced workloads among computing cores; 3) it causes little overhead and is adaptable to the number of available cores. Extensive experiments in common applications such as 2D/3D image segmentations and 3D surface fitting demonstrate the effectiveness of our approach.

75 citations


Patent
Jian Sun, Kaiming He, Jiangyu Liu
21 Jul 2010
TL;DR: In this article, a user interface enables interactive image matting to be performed on an image and provides results including an alpha matte as feedback in real-time, in which the user can interactively refine the alpha matte to obtain a satisfactory result.
Abstract: A user interface enables interactive image matting to be performed on an image. The user interface may provide results including an alpha matte as feedback in real time. The user interface may provide interactive tools for selecting a portion of the image, and an unknown region for alpha matte processing may be automatically generated adjacent to the selected region. The user may interactively refine the alpha matte as desired to obtain a satisfactory result.

66 citations


Book ChapterDOI
01 Jan 2010
TL;DR: Introduces Super-level-set Hierarchical Clustering (SHC), to the authors' knowledge the first algorithm focused on constructing Markov State Models (MSMs) at multiple resolutions; it produces MSMs at different resolutions using different super density level sets.
Abstract: Simulating biologically relevant timescales at atomic resolution is a challenging task since typical atomistic simulations are at least two orders of magnitude shorter. Markov State Models (MSMs) provide one means of overcoming this gap without sacrificing atomic resolution by extracting long time dynamics from short simulations. MSMs coarse grain space by dividing conformational space into long-lived, or metastable, states. This is equivalent to coarse graining time by integrating out fast motions within metastable states. By varying the degree of coarse graining one can vary the resolution of an MSM; therefore, MSMs are inherently multi-resolution. Here we introduce a new algorithm Super-level-set Hierarchical Clustering (SHC), to our knowledge, the first algorithm focused on constructing MSMs at multiple resolutions. The key insight of this algorithm is to generate a set of super levels covering different density regions of phase space, then cluster each super level separately, and finally recombine this information into a single MSM. SHC is able to produce MSMs at different resolutions using different super density level sets. To demonstrate the power of this algorithm we apply it to a small RNA hairpin, generating MSMs at four different resolutions. We validate these MSMs by showing that they are able to reproduce the original simulation data. Furthermore, long time folding dynamics are extracted from these models. The results show that there are no metastable on-pathway intermediate states. Instead, the folded state serves as a hub directly connected to multiple unfolded/misfolded states which are separated from each other by large free energy barriers.
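
The abstract above describes how MSMs extract long-time dynamics from short simulations once conformational space has been divided into states. A minimal sketch of the generic estimation step follows, assuming trajectories that are already discretized into state indices. It does not implement SHC: the function name `estimate_transition_matrix`, the `lag` parameter, and the simple count-and-normalize estimator are illustrative assumptions, not details given in the abstract.

```python
def estimate_transition_matrix(trajectories, n_states, lag=1):
    """Row-normalized transition counts at the given lag time.

    `trajectories` is a list of sequences of integer state indices,
    one sequence per (short) simulation.
    """
    counts = [[0] * n_states for _ in range(n_states)]
    for traj in trajectories:
        for t in range(len(traj) - lag):
            counts[traj[t]][traj[t + lag]] += 1

    matrix = []
    for row in counts:
        total = sum(row)
        if total == 0:
            # State never observed as a starting point: leave the row empty.
            matrix.append([0.0] * n_states)
        else:
            matrix.append([c / total for c in row])
    return matrix


# Two short toy trajectories over two states.
T = estimate_transition_matrix([[0, 0, 1, 1, 0], [1, 1, 1, 0]], n_states=2)
```

Because counts from many short trajectories are pooled, no single simulation has to span the slow process, which is the gap-bridging idea the abstract refers to. Coarser or finer state definitions give the same estimator at different resolutions.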

Proceedings ArticleDOI
13 Jun 2010
TL;DR: An algorithm to compute a set of loops from point data that presumably sample a smooth manifold $M \subset \mathbb{R}^d$, approximating a shortest basis of the one-dimensional homology group $H_1(M)$ over coefficients in the finite field $\mathbb{Z}_2$.
Abstract: Inference of topological and geometric attributes of a hidden manifold from its point data is a fundamental problem arising in many scientific studies and engineering applications. In this paper we present an algorithm to compute a set of loops from point data that presumably sample a smooth manifold $M \subset \mathbb{R}^d$. These loops approximate a shortest basis of the one-dimensional homology group $H_1(M)$ over coefficients in the finite field $\mathbb{Z}_2$. Previous results addressed the issue of computing the rank of the homology groups from point data, but there is no result on approximating the shortest basis of a manifold from its point sample. In arriving at our result, we also present a polynomial time algorithm for computing a shortest basis of $H_1(K)$ for any finite simplicial complex $K$ whose edges have non-negative weights.

Proceedings ArticleDOI
17 Jan 2010
TL;DR: This work considers the problem of reconstructing a road network from a collection of path traces and provides guarantees on the accuracy of the reconstruction under reasonable assumptions; the algorithm can be used to process a collection of polygonal paths in the plane to allow efficient path similarity queries against new query paths on the same road network.
Abstract: We consider the problem of reconstructing a road network from a collection of path traces and provide guarantees on the accuracy of the reconstruction under reasonable assumptions. Our algorithm can be used to process a collection of polygonal paths in the plane so that shared structures (subpaths) among the paths in the collection can be discovered and the collection can be organized to allow efficient path similarity queries against new query paths on the same road network. This is a timely problem, as GPS and other location traces of both people and vehicles are becoming available on a large scale and there is a real need to create appropriate data structures and data bases for such data.

Patent
29 Jan 2010
TL;DR: In this article, a video game system (or other data processing system) can visually identify a person entering a field of view of the system and determine whether the person has been previously interacting with the system.
Abstract: A video game system (or other data processing system) can visually identify a person entering a field of view of the system and determine whether the person has been previously interacting with the system. In one embodiment, the system establishes thresholds, enrolls players, performs the video game (or other application) including interacting with a subset of the players based on the enrolling, determines that a person has become detectable in the field of view of the system, automatically determines whether the person is one of the enrolled players, maps the person to an enrolled player and interacts with the person based on the mapping if it is determined that the person is one of the enrolled players, and assigns a new identification to the person and interacts with the person based on the new identification if it is determined that the person is not one of the enrolled players.

Journal ArticleDOI
01 Jul 2010
TL;DR: A new concept called fuzzy geodesics is presented and it is shown that fuzzy geodesics are stable with respect to the Gromov-Hausdorff distance, and a new object called the intersection configuration for a set of points on a shape is proposed.
Abstract: A geodesic is a parameterized curve on a Riemannian manifold governed by a second order partial differential equation. Geodesics are notoriously unstable: small perturbations of the underlying manifold may lead to dramatic changes of the course of a geodesic. Such instability makes it difficult to use geodesics in many applications, in particular in the world of discrete geometry. In this paper, we consider a geodesic as the indicator function of the set of the points on the geodesic. From this perspective, we present a new concept called fuzzy geodesics and show that fuzzy geodesics are stable with respect to the Gromov-Hausdorff distance. Based on fuzzy geodesics, we propose a new object called the intersection configuration for a set of points on a shape and demonstrate its effectiveness in the application of finding consistent correspondences between sparse sets of points on shapes differing by extreme deformations.

Patent
Qifa Ke, Yi Li, Heung-Yeung Shum, Jian Sun, Zhong Wu
03 Jun 2010
TL;DR: In this paper, a system for identifying individuals in digital images and for providing matching digital images is provided, where faces are detected in the images and facial components are identified in each face.
Abstract: A system for identifying individuals in digital images and for providing matching digital images is provided. A set of images that include faces of known individuals is received. Faces are detected in the images and facial components are identified in each face. Visual words corresponding to the facial components are generated, stored, and associated with identifiers of the individuals. At a later time, a user may provide an image that includes the face of one of the known individuals. Visual words are determined from the face of the individual in the provided image and matched against the stored visual words. Images associated with matching visual words are ranked and presented to the user.

Journal ArticleDOI
TL;DR: This paper designs a Markov random field (MRF) scale selection model, which selects scales for image segments; the denoised image is then the composition of segments at their optimal scales in the scale space.

Proceedings ArticleDOI
02 May 2010
TL;DR: The present paper is a report of the 3D Shape Retrieval Contest 2010 (SHREC'10) feature detection and description benchmark results.
Abstract: Feature-based approaches have recently become very popular in computer vision and image analysis applications, and are becoming a promising direction in shape retrieval. The SHREC'10 feature detection and description benchmark simulates the feature detection and description stages of feature-based shape retrieval algorithms. The benchmark tests the performance of shape feature detectors and descriptors under a wide variety of transformations. The benchmark allows evaluating how algorithms cope with certain classes of transformations and strength of the transformations that can be dealt with. The present paper is a report of the 3D Shape Retrieval Contest 2010 (SHREC'10) feature detection and description benchmark results.