
Papers by Jianchao Yang published in 2014


Proceedings ArticleDOI
23 Jun 2014
TL;DR: It is shown that the dark-channel feature is the most informative one for this task, which confirms the observation of He et al.
Abstract: Haze is one of the major factors that degrade outdoor images. Removing haze from a single image is known to be severely ill-posed, and assumptions made in previous methods do not hold in many situations. In this paper, we systematically investigate different haze-relevant features in a learning framework to identify the best feature combination for image dehazing. We show that the dark-channel feature is the most informative one for this task, which confirms the observation of He et al. [8] from a learning perspective, while other haze-relevant features also contribute significantly in a complementary way. We also find that, surprisingly, the synthetic hazy image patches we use for feature investigation serve well as training data for real-world images, which allows us to train specific models for specific applications. Experimental results demonstrate that the proposed algorithm outperforms state-of-the-art methods on both synthetic and real-world datasets.

534 citations
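
The dark-channel feature highlighted above is straightforward to compute. A minimal NumPy sketch, assuming a float RGB array; the function name and the patch size of 15 are illustrative choices, not the paper's exact setting:

```python
import numpy as np

def dark_channel(image, patch_size=15):
    """Compute the dark channel of an RGB image: for each pixel, take
    the minimum intensity across the color channels, then the minimum
    over a local patch. Haze-free regions tend toward zero."""
    # Per-pixel minimum across the three color channels.
    min_channel = image.min(axis=2)
    pad = patch_size // 2
    padded = np.pad(min_channel, pad, mode='edge')
    h, w = min_channel.shape
    dark = np.empty_like(min_channel)
    for i in range(h):
        for j in range(w):
            # Minimum over the local patch centered at (i, j).
            dark[i, j] = padded[i:i + patch_size, j:j + patch_size].min()
    return dark
```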


Proceedings ArticleDOI
03 Nov 2014
TL;DR: The RAPID (RAting PIctorial aesthetics using Deep learning) system is presented, which adopts a novel deep neural network approach to enable automatic feature learning and style attributes of images to help improve the aesthetic quality categorization accuracy.
Abstract: Effective visual features are essential for computational aesthetic quality rating systems. Existing methods used machine learning and statistical modeling techniques on handcrafted features or generic image descriptors. A recently-published large-scale dataset, the AVA dataset, has further empowered machine learning based approaches. We present the RAPID (RAting PIctorial aesthetics using Deep learning) system, which adopts a novel deep neural network approach to enable automatic feature learning. The central idea is to incorporate heterogeneous inputs generated from the image, which include a global view and a local view, and to unify the feature learning and classifier training using a double-column deep convolutional neural network. In addition, we utilize the style attributes of images to help improve the aesthetic quality categorization accuracy. Experimental results show that our approach significantly outperforms the state of the art on the AVA dataset.

370 citations
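
A minimal PyTorch sketch of the double-column idea: two parallel columns receive the heterogeneous global and local views and are merged before the final classifier. The layer sizes and names here are illustrative assumptions, not the RAPID architecture:

```python
import torch
import torch.nn as nn

class DoubleColumnCNN(nn.Module):
    """Two parallel columns process a global view (resized whole image)
    and a local view (cropped patch); their features are concatenated
    to form the final classifier input."""

    def __init__(self, num_classes=2):
        super().__init__()

        def make_column():
            return nn.Sequential(
                nn.Conv2d(3, 16, kernel_size=5, stride=2), nn.ReLU(),
                nn.MaxPool2d(2),
                nn.Conv2d(16, 32, kernel_size=3), nn.ReLU(),
                nn.AdaptiveAvgPool2d(4),
                nn.Flatten(),
                nn.Linear(32 * 4 * 4, 128), nn.ReLU(),
            )

        self.global_column = make_column()
        self.local_column = make_column()
        self.classifier = nn.Linear(2 * 128, num_classes)

    def forward(self, global_view, local_view):
        g = self.global_column(global_view)
        v = self.local_column(local_view)
        # Merge the two columns before classification.
        return self.classifier(torch.cat([g, v], dim=1))
```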


Proceedings ArticleDOI
23 Jun 2014
TL;DR: This work proposes a unified framework for simultaneous human parsing and pose estimation based on semantic parts by utilizing Parselets and Mixture of Joint-Group Templates as the representations for these semantic parts.
Abstract: We study the problem of human body configuration analysis, more specifically, human parsing and human pose estimation. These two tasks, i.e., identifying the semantic regions and body joints respectively over the human body image, are intrinsically highly correlated. However, previous works generally solve these two problems separately or iteratively. In this work, we propose a unified framework for simultaneous human parsing and pose estimation based on semantic parts. By utilizing Parselets and Mixture of Joint-Group Templates (MJGTs) as the representations for these semantic parts, we seamlessly formulate the human parsing and pose estimation problem jointly within a unified framework via a tailored and-or graph. A novel Grid Layout Feature is then designed to effectively capture the spatial co-occurrence/occlusion information between/within the Parselets and MJGTs. Thus the mutually complementary nature of these two tasks can be harnessed to boost the performance of each other. The resultant unified model can be solved using the structure learning framework in a principled way. Comprehensive evaluations on two benchmark datasets for both tasks demonstrate the effectiveness of the proposed framework when compared with the state-of-the-art methods.

106 citations


Patent
30 Jul 2014
TL;DR: In this article, a regularized double-column neural network is proposed to combine the local and global representations of images as inputs and learn the best representation for a particular feature through multiple convolutional and fully connected layers.
Abstract: Deep convolutional neural networks receive local and global representations of images as inputs and learn the best representation for a particular feature through multiple convolutional and fully connected layers. A double-column neural network structure receives each of the local and global representations as two heterogeneous parallel inputs to the two columns. After some layers of transformations, the two columns are merged to form the final classifier. Additionally, features may be learned in one of the fully connected layers. The features of the images may be leveraged to boost classification accuracy of other features by learning a regularized double-column neural network.

84 citations


Proceedings ArticleDOI
23 Jun 2014
TL;DR: A scalable solution based on the nearest class mean classifier (NCM), built on local feature embedding, local feature metric learning and max-margin template selection, which is naturally amenable to NCM and thus to such open-ended classification problems.
Abstract: This paper addresses the large-scale visual font recognition (VFR) problem, which aims at automatic identification of the typeface, weight, and slope of the text in an image or photo without any knowledge of content. Although visual font recognition has many practical applications, it has largely been neglected by the vision community. To address the VFR problem, we construct a large-scale dataset containing 2,420 font classes, which easily exceeds the scale of most image categorization datasets in computer vision. As font recognition is inherently dynamic and open-ended, i.e., new classes and data for existing categories are constantly added to the database over time, we propose a scalable solution based on the nearest class mean classifier (NCM). The core algorithm is built on local feature embedding, local feature metric learning and max-margin template selection, which is naturally amenable to NCM and thus to such open-ended classification problems. The new algorithm can generalize to new classes and new data at little added cost. Extensive experiments demonstrate that our approach is very effective on our synthetic test images, and achieves promising results on real-world test images.

49 citations
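
The open-ended property of NCM is easy to illustrate. A minimal NumPy sketch of a plain nearest class mean classifier, without the paper's local feature embedding, metric learning, or template selection; new classes and new data only require updating per-class means:

```python
import numpy as np

class NearestClassMean:
    """Minimal nearest class mean (NCM) classifier: classification
    assigns each sample to the class with the closest mean."""

    def __init__(self):
        self.sums, self.counts = {}, {}

    def partial_fit(self, X, y):
        # Accumulate per-class sums and counts so new classes and new
        # data can be folded in at little added cost, no retraining.
        for xi, yi in zip(X, y):
            self.sums[yi] = self.sums.get(yi, 0) + np.asarray(xi, float)
            self.counts[yi] = self.counts.get(yi, 0) + 1
        return self

    def predict(self, X):
        labels = list(self.sums)
        means = np.stack([self.sums[c] / self.counts[c] for c in labels])
        X = np.asarray(X, float)
        # Squared Euclidean distance from every sample to every mean.
        d = ((X[:, None, :] - means[None, :, :]) ** 2).sum(-1)
        return [labels[i] for i in d.argmin(1)]
```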


Proceedings Article
27 Jul 2014
TL;DR: An iterative regularization scheme, where the sparse representation obtained from the previous iteration is used to build the graph Laplacian for the current iteration of regularization, which demonstrates the superiority of the algorithm compared to l1-Graph and other competing clustering methods.
Abstract: l1-Graph has been proven to be effective in data clustering, which partitions the data space by using the sparse representation of the data as the similarity measure. However, the sparse representation is performed for each datum separately without taking into account the geometric structure of the data. Motivated by l1-Graph and manifold learning, we propose Laplacian Regularized l1-Graph (LRl1-Graph) for data clustering. The sparse representations of LRl1-Graph are regularized by the geometric information of the data so that they vary smoothly along the geodesics of the data manifold by the graph Laplacian according to the manifold assumption. Moreover, we propose an iterative regularization scheme, where the sparse representation obtained from the previous iteration is used to build the graph Laplacian for the current iteration of regularization. The experimental results on real data sets demonstrate the superiority of our algorithm compared to l1-Graph and other competing clustering methods.

40 citations
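
One plausible way to write the LRl1-Graph objective, with notation assumed here rather than quoted from the paper: X is the data matrix with data points as columns, the columns of A are the sparse codes, L is the graph Laplacian, and λ, γ are trade-off weights. In the iterative scheme, the L used at iteration t is rebuilt from the codes of iteration t−1:

```latex
\min_{A}\;\; \|X - XA\|_F^2 \;+\; \lambda\,\|A\|_1
\;+\; \gamma\,\operatorname{tr}\!\big(A\,L\,A^{\top}\big)
\qquad \text{s.t. } \operatorname{diag}(A) = 0
```

The trace term equals the weighted sum of squared differences between codes of neighboring points, which is what makes the sparse representations vary smoothly along the data manifold.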


Journal ArticleDOI
TL;DR: A novel supervised super-vector encoding framework learns discriminative image feature representations, giving significant improvement over the unsupervised counterpart and outperforming the state of the art.

31 citations


Patent
30 Jul 2014
TL;DR: In this article, a first set of attributes (e.g., style) is generated through pre-trained single column neural networks and leveraged to regularize the training process of a regularized double-column convolutional neural network (RDCNN).
Abstract: A first set of attributes (e.g., style) is generated through pre-trained single-column neural networks and leveraged to regularize the training process of a regularized double-column convolutional neural network (RDCNN). Parameters of the first column (e.g., style) of the RDCNN are fixed during RDCNN training. Parameters of the second column (e.g., aesthetics) are fine-tuned while training the RDCNN, and the learning process is supervised by the label identified by the second column (e.g., aesthetics). Thus, features of the images may be leveraged to boost the classification accuracy of other features by learning an RDCNN.

31 citations


Patent
01 Jul 2014
TL;DR: Multi-feature image haze removal is described: feature maps are extracted from a hazy image of a scene, the portions of light not scattered by the atmosphere are computed from them, the airlight representing the constant light of the scene is ascertained, and a dehazed image is generated from the computed portions of light and the ascertained airlight.
Abstract: Multi-feature image haze removal is described. In one or more implementations, feature maps are extracted from a hazy image of a scene. The feature maps convey information about visual characteristics of the scene captured in the hazy image. Based on the feature maps, portions of light that are not scattered by the atmosphere and are captured to produce the hazy image are computed. Additionally, airlight of the hazy image is ascertained based on at least one of the feature maps. The calculated airlight represents constant light of the scene. Using the computed portions of light and the ascertained airlight, a dehazed image is generated from the hazy image.

25 citations
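
The quantities named here, the unscattered portions of light (transmission) and the airlight, are the terms of the standard atmospheric scattering model I(x) = J(x)·t(x) + A·(1 − t(x)). A minimal sketch of inverting that model, assuming the transmission map and airlight have already been estimated; the function name and the floor t_min are illustrative:

```python
import numpy as np

def dehaze(hazy, transmission, airlight, t_min=0.1):
    """Recover scene radiance J from hazy image I by inverting
    I = J * t + A * (1 - t). The transmission t is floored at t_min
    to avoid amplifying noise where almost no light survives."""
    t = np.clip(transmission, t_min, 1.0)[..., None]
    return (hazy - airlight) / t + airlight
```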


Patent
22 Jul 2014
TL;DR: Techniques for detecting and recognizing text in images are proposed, in which multiple color spaces and multi-stage filtering may be applied to detect the text components.
Abstract: Techniques for detecting and recognizing text may be provided. For example, an image may be analyzed to detect and recognize text therein. The analysis may involve detecting text components in the image. For example, multiple color spaces and multiple-stage filtering may be applied to detect the text components. Further, the analysis may involve extracting text lines based on the text components. For example, global information about the text components can be analyzed to generate best-fitting text lines. The analysis may also involve pruning and splitting the text lines to generate bounding boxes around groups of text components. Text recognition may be applied to the bounding boxes to recognize text therein.

20 citations


Proceedings ArticleDOI
01 Jan 2014
TL;DR: Regularized l1-Graph (Rl1-Graph) is proposed, which partitions the data space by using the sparse representation of the data as the similarity measure, with the sparse codes regularized through a graph Laplacian constructed from those codes.
Abstract: l1-Graph has been proven to be effective in data clustering, which partitions the data space by using the sparse representation of the data as the similarity measure. However, the sparse representation is performed for each datum independently without taking into account the geometric structure of the data. Motivated by l1-Graph and manifold learning, we propose Regularized l1-Graph (Rl1-Graph) for data clustering. Compared to l1-Graph, the sparse representations of Rl1-Graph are regularized by the geometric information of the data. In accordance with the manifold assumption, the sparse representations vary smoothly along the geodesics of the data manifold through the graph Laplacian constructed by the sparse codes. Experimental results on various data sets demonstrate the superiority of our algorithm compared to l1-Graph and other competing clustering methods.

Proceedings ArticleDOI
02 May 2014
TL;DR: An efficient iterative algorithm is developed to solve the NLCS recovery problem, which is shown to have stable convergence behavior in experiments and significantly improves the state-of-the-art of image compressive sampling.
Abstract: Compressive sampling (CS) aims at acquiring a signal at a sampling rate below the Nyquist rate by exploiting prior knowledge that a signal is sparse or correlated in some domain. Despite the remarkable progress in the theory of CS, the sampling rate on a single image required by CS is still very high in practice. In this paper, a non-local compressive sampling (NLCS) recovery method is proposed to further reduce the sampling rate by exploiting non-local patch correlation and local piecewise smoothness present in natural images. Two non-local sparsity measures, i.e., non-local wavelet sparsity and non-local joint sparsity, are proposed to exploit the patch correlation in NLCS. An efficient iterative algorithm is developed to solve the NLCS recovery problem, which is shown to have stable convergence behavior in experiments. The experimental results show that our NLCS significantly improves the state-of-the-art of image compressive sampling.

Patent
Hailin Jin, Zhuoyuan Chen, Scott Cohen, Jianchao Yang, Zhe Lin
11 Mar 2014
TL;DR: In this article, an image patch is denoised based on an average of the matching patches from the reference frame and the similar motion patches determined from the previous and subsequent image frames.
Abstract: In techniques for video denoising using optical flow, image frames of video content include noise that corrupts the video content. A reference frame is selected, and matching patches to an image patch in the reference frame are determined from within the reference frame. A noise estimate is computed for previous and subsequent image frames relative to the reference frame. The noise estimate for an image frame is computed based on optical flow, and is usable to determine a contribution of similar motion patches to denoise the image patch in the reference frame. The similar motion patches from the previous and subsequent image frames that correspond to the image patch in the reference frame are determined based on the optical flow computations. The image patch is denoised based on an average of the matching patches from the reference frame and the similar motion patches determined from the previous and subsequent image frames.
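
A minimal NumPy sketch of the final averaging step, assuming the within-frame matches and the flow-aligned motion patches have already been gathered; the noise-estimate weighting below is my assumption about how the per-frame estimates might enter, not the patent's specified formula:

```python
import numpy as np

def denoise_patch(ref_patches, motion_patches, noise_var=None):
    """Denoise a reference-frame patch by averaging its matching
    patches from the reference frame with the flow-aligned motion
    patches from previous and subsequent frames.

    ref_patches    : (m, h, w) matches found within the reference frame
    motion_patches : (k, h, w) flow-aligned patches from other frames
    noise_var      : optional length-k per-frame noise estimates used
                     to weight the motion patches' contribution
    """
    patches = np.concatenate([ref_patches, motion_patches], axis=0)
    if noise_var is None:
        return patches.mean(axis=0)
    # Weight motion patches inversely to their frames' noise estimates.
    w = np.concatenate([np.ones(len(ref_patches)),
                        1.0 / (1.0 + np.asarray(noise_var))])
    return np.tensordot(w / w.sum(), patches, axes=1)
```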

Proceedings Article
08 Dec 2014
TL;DR: This work introduces a cascaded scale space formulation for blind deblurring that suggests a natural approach robust to noise and small scale structures through tying the estimation across multiple scales and balancing the contributions of different scales automatically by learning from data.
Abstract: The presence of noise and small-scale structures usually leads to large kernel estimation errors in blind image deblurring empirically, if not a total failure. We present a scale space perspective on blind deblurring algorithms, and introduce a cascaded scale space formulation for blind deblurring. This new formulation suggests a natural approach robust to noise and small-scale structures through tying the estimation across multiple scales and balancing the contributions of different scales automatically by learning from data. The proposed formulation also allows handling non-uniform blur with a straightforward extension. Experiments are conducted on both a benchmark dataset and real-world images to validate the effectiveness of the proposed method. One surprising finding based on our approach is that blur kernel estimation is not necessarily best at the finest scale.
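
A skeleton of the coarse-to-fine scaffolding that such scale space methods rest on, where `estimate_kernel` is a placeholder for any single-scale estimator the caller supplies; the paper's actual contributions, tying the scales together and learning their weights from data, are not reproduced here:

```python
import cv2
import numpy as np

def coarse_to_fine_deblur(blurred, estimate_kernel, num_scales=4):
    """Estimate a blur kernel at the coarsest scale of an image
    pyramid, then upsample it to initialize estimation at each
    finer scale. `estimate_kernel(image, init)` returns a kernel
    as a float32 array."""
    pyramid = [blurred]
    for _ in range(num_scales - 1):
        pyramid.append(cv2.pyrDown(pyramid[-1]))
    kernel = None
    for img in reversed(pyramid):  # coarsest to finest
        if kernel is not None:
            # Upsample the previous scale's estimate as initialization.
            kernel = cv2.resize(kernel, None, fx=2, fy=2,
                                interpolation=cv2.INTER_LINEAR)
            kernel /= kernel.sum()
        kernel = estimate_kernel(img, kernel)
    return kernel
```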

Proceedings ArticleDOI
24 Mar 2014
TL;DR: This paper constructs a unified SR formulation, and proposes an iterative joint super resolution (IJSR) algorithm to solve the optimization, which leads to an impressive improvement of SR results both quantitatively and qualitatively.
Abstract: Existing example-based super resolution (SR) methods are built upon either external-examples or self-examples. Although effective in certain cases, both methods suffer from their inherent limitation. This paper goes beyond these two classes of most common example-based SR approaches, and proposes a novel joint SR perspective. The joint SR exploits and maximizes the complementary advantages of external- and self-example based methods. We elaborate on exploitable priors for image components of different nature, and formulate their corresponding loss functions mathematically. Equipped with that, we construct a unified SR formulation, and propose an iterative joint super resolution (IJSR) algorithm to solve the optimization. Such a joint perspective approach leads to an impressive improvement of SR results both quantitatively and qualitatively.

Patent
07 Nov 2014
TL;DR: In this article, a Gaussian Mixture Model (GMM) is used to build image feature representations that can serve any number of classification tasks and are particularly effective for fine-grained image classification.
Abstract: Techniques are disclosed for image feature representation. The techniques exhibit discriminative power that can be used in any number of classification tasks, and are particularly effective with respect to fine-grained image classification tasks. In an embodiment, a given image to be classified is divided into image patches. A vector is generated for each image patch. Each image patch vector is compared to the Gaussian mixture components (each mixture component is also a vector) of a Gaussian Mixture Model (GMM). Each such comparison generates a similarity score for each image patch vector. For each Gaussian mixture component, the image patch vectors associated with a similarity score that is too low are eliminated. The selectively pooled vectors from all the Gaussian mixture components are then concatenated to form the final image feature vector, which can be provided to a classifier so the given input image can be properly categorized.
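
A minimal NumPy sketch of the selective pooling step described above, assuming the patch vectors and fitted mixture parameters are given; mean pooling and a Mahalanobis-style similarity are simplifying assumptions of this sketch:

```python
import numpy as np

def selective_pooling(patch_vecs, means, inv_covs, threshold):
    """Compare every patch vector with each Gaussian mixture component,
    drop the patches whose similarity to that component is too low,
    pool the survivors, and concatenate the per-component pooled
    vectors into the final image feature."""
    pooled = []
    for mu, icov in zip(means, inv_covs):
        diff = patch_vecs - mu
        # Similarity as negative Mahalanobis distance to the component.
        sim = -np.einsum('nd,de,ne->n', diff, icov, diff)
        keep = patch_vecs[sim > threshold]
        pooled.append(keep.mean(axis=0) if len(keep) else np.zeros_like(mu))
    return np.concatenate(pooled)
```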

Patent
23 May 2014
TL;DR: In this article, the authors define a font graph as a set of nodes representing fonts and a finite set of undirected edges denoting similarities between fonts, enabling users to browse and identify similar fonts.
Abstract: Font graphs are defined having a finite set of nodes representing fonts and a finite set of undirected edges denoting similarities between fonts. The font graphs enable users to browse and identify similar fonts. Indications corresponding to a degree of similarity between connected nodes may be provided. A selection of a desired font or characteristics associated with one or more attributes of the desired font is received from a user interacting with the font graph. The font graph is dynamically redefined based on the selection.
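
A toy sketch of such a font graph as a weighted adjacency structure; the font names and similarity weights here are made up for illustration:

```python
# Nodes are fonts; undirected weighted edges denote similarity.
font_graph = {
    "Garamond":    {"Caslon": 0.92, "Baskerville": 0.85},
    "Caslon":      {"Garamond": 0.92},
    "Baskerville": {"Garamond": 0.85, "Didot": 0.70},
    "Didot":       {"Baskerville": 0.70},
}

def similar_fonts(graph, font, min_similarity=0.8):
    """Browse fonts connected to `font` above a similarity threshold."""
    return sorted((s, f) for f, s in graph.get(font, {}).items()
                  if s >= min_similarity)

print(similar_fonts(font_graph, "Garamond"))
# [(0.85, 'Baskerville'), (0.92, 'Caslon')]
```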

Patent
16 May 2014
TL;DR: In this article, patch partitioning and image processing techniques are described, where one or more modules are configured to perform operations including grouping a plurality of patches taken from a training sample of images into respective ones of partitions, calculating an image processing operator for each of the partitions, determining distances between the plurality of partitions that describe image similarity, and configuring a database to provide the determined distance and the image processing operators to process an image in response to identification of a respective partition that corresponds to a patch taken from the image.
Abstract: Patch partition and image processing techniques are described. In one or more implementations, a system includes one or more modules implemented at least partially in hardware. The one or more modules are configured to perform operations including grouping a plurality of patches taken from a plurality of training samples of images into respective ones of a plurality of partitions, calculating an image processing operator for each of the partitions, determining distances between the plurality of partitions that describe image similarity of patches of the plurality of partitions, one to another, and configuring a database to provide the determined distance and the image processing operator to process an image in response to identification of a respective partition that corresponds to a patch taken from the image.
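
A minimal sketch of the described pipeline using k-means for the partitioning; the per-partition "operator" below is a placeholder (the partition mean), since the patent leaves the operator itself unspecified:

```python
import numpy as np
from sklearn.cluster import KMeans

def build_patch_database(train_patches, n_partitions=64):
    """Group training patches into partitions, compute one operator
    per partition, and record inter-partition distances describing
    image similarity between the partitions."""
    km = KMeans(n_clusters=n_partitions, n_init=10).fit(train_patches)
    centers = km.cluster_centers_
    # Pairwise distances between partition centers, usable to route a
    # query patch to its partition and to nearby fallback partitions.
    dists = np.linalg.norm(centers[:, None] - centers[None, :], axis=-1)
    operators = {}
    for k in range(n_partitions):
        members = train_patches[km.labels_ == k]
        # Placeholder operator: the partition mean.
        operators[k] = members.mean(axis=0)
    return km, operators, dists
```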

Proceedings ArticleDOI
04 May 2014
TL;DR: A new image colorization method is developed, epitomic imagecolorization, which automatically transfers color from the reference color image to the target grayscale image by a robust feature matching scheme using a new feature representation, namely the heterogeneous feature epitome.
Abstract: Image colorization adds color to grayscale images. It not only increases the visual appeal of grayscale images, but also enriches the information conveyed by scientific images that lack color information. We develop a new image colorization method, epitomic image colorization, which automatically transfers color from the reference color image to the target grayscale image by a robust feature matching scheme using a new feature representation, namely the heterogeneous feature epitome. As a generative model, heterogeneous feature epitome is a condensed representation of image appearance which is employed for measuring the dissimilarity between reference patches and target patches in a way robust to noise in the reference image. We build a Markov Random Field (MRF) model with the learned heterogeneous feature epitome from the reference image, and inference in the MRF model achieves robust feature matching for transferring color. Our method renders better colorization results than the current state-of-the-art automatic colorization methods in our experiments.

Posted Content
TL;DR: A Convolutional Neural Network decomposition approach is introduced, leveraging a large training corpus of synthetic data to obtain effective features for classification, and achieves an accuracy higher than 80% (top-5) on a new large labeled real-world dataset the authors collected.
Abstract: We present a domain adaptation framework to address a domain mismatch between synthetic training and real-world testing data. We demonstrate our method on a challenging fine-grained classification problem: recognizing a font style from an image of text. In this task, it is very easy to generate lots of rendered font examples but very hard to obtain real-world labeled images. This real-to-synthetic domain gap caused poor generalization to new real data in previous font recognition methods (Chen et al. (2014)). In this paper, we introduce a Convolutional Neural Network decomposition approach, leveraging a large training corpus of synthetic data to obtain effective features for classification. This is done using an adaptation technique based on a Stacked Convolutional Auto-Encoder that exploits a large collection of unlabeled real-world text images combined with synthetic data preprocessed in a specific way. The proposed DeepFont method achieves a top-5 accuracy higher than 80% on a new large labeled real-world dataset we collected.
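
A minimal PyTorch sketch of one convolutional auto-encoder stage of the kind such an SCAE stacks, trained to reconstruct text images so the low-level features can bridge the synthetic/real gap; layer sizes are illustrative assumptions, not DeepFont's architecture:

```python
import torch.nn as nn

class ConvAutoEncoder(nn.Module):
    """One convolutional auto-encoder stage: an encoder downsamples a
    grayscale text image, a mirrored decoder reconstructs it. Trained
    with a reconstruction loss (e.g., MSE) on unlabeled real-world
    text images mixed with preprocessed synthetic ones."""

    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 3, stride=2, padding=1,
                               output_padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 3, stride=2, padding=1,
                               output_padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))
```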

Posted Content
TL;DR: This paper presents an image similarity learning method that can scale well in both the number of images and the dimensionality of image descriptors and exploits the ensemble of projections so that high-dimensional features can be processed in a set of lower-dimensional subspaces in parallel.
Abstract: Classifying large-scale image data into object categories is an important problem that has received increasing research attention. Given the huge amount of data, non-parametric approaches such as nearest neighbor classifiers have shown promising results, especially when they are underpinned by a learned distance or similarity measurement. Although metric learning has been well studied in the past decades, most existing algorithms are impractical to handle large-scale data sets. In this paper, we present an image similarity learning method that can scale well in both the number of images and the dimensionality of image descriptors. To this end, similarity comparison is restricted to each sample's local neighbors and a discriminative similarity measure is induced from large margin neighborhood embedding. We also exploit the ensemble of projections so that high-dimensional features can be processed in a set of lower-dimensional subspaces in parallel without much performance compromise. The similarity function is learned online using a stochastic gradient descent algorithm in which the triplet sampling strategy is customized for quick convergence of classification performance. The effectiveness of our proposed model is validated on several data sets with scales varying from tens of thousands to one million images. Recognition accuracies competitive with the state-of-the-art performance are achieved with much higher efficiency and scalability.
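
A minimal NumPy sketch of online bilinear similarity learning with a triplet hinge loss (OASIS-style); the paper's restriction of comparisons to local neighbors and its ensemble of lower-dimensional projections are omitted here:

```python
import numpy as np

def sgd_similarity(triplets, dim, lr=0.01, reg=1e-4):
    """Learn a bilinear similarity s(x, y) = x^T W y online: for each
    triplet (anchor a, positive p, negative n), take a hinge-loss SGD
    step whenever s(a, p) does not exceed s(a, n) by a margin of 1."""
    W = np.eye(dim)
    for a, p, n in triplets:
        if a @ W @ p < a @ W @ n + 1.0:  # margin violated
            # Gradient of the hinge loss, plus simple shrinkage.
            W += lr * (np.outer(a, p) - np.outer(a, n)) - lr * reg * W
    return W
```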