Author
Jian Zhang
Other affiliations: Nanjing University of Science and Technology
Bio: Jian Zhang is an academic researcher from Huaihai Institute of Technology. The author has contributed to research on topics including the k-nearest neighbors algorithm and facial recognition systems. The author has an h-index of 6 and has co-authored 6 publications receiving 187 citations. Previous affiliations of Jian Zhang include Nanjing University of Science and Technology.
Papers
TL;DR: A convolutional sparse auto-encoder (CSAE) is proposed, which leverages the structure of the convolutional auto-encoder and incorporates max-pooling to heuristically sparsify the feature maps for feature learning; this simple strategy makes the stochastic gradient descent algorithm work efficiently for CSAE training.
Abstract: Convolutional sparse coding (CSC) can model local connections between image content and reduce code redundancy compared with patch-based sparse coding. However, CSC needs a complicated optimization procedure to infer the codes (i.e., feature maps). In this brief, we propose a convolutional sparse auto-encoder (CSAE), which leverages the structure of the convolutional AE and incorporates max-pooling to heuristically sparsify the feature maps for feature learning. Together with competition over feature channels, this simple sparsifying strategy makes the stochastic gradient descent algorithm work efficiently for CSAE training; thus, no complicated optimization procedure is involved. We employed the features learned in the CSAE to initialize convolutional neural networks for classification and achieved competitive results on benchmark data sets. In addition, by building connections between the CSAE and CSC, we propose a strategy to construct local descriptors from the CSAE for classification. Experiments on Caltech-101 and Caltech-256 clearly demonstrate the effectiveness of the proposed method and verify that the CSAE, as a CSC model, has the ability to explore connections between neighboring image content for classification tasks.
114 citations
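The max-pooling-based sparsification described above can be illustrated with a minimal sketch (the function name, pooling size, and toy feature map are illustrative, not the authors' code): within each pooling window, only the maximal activation survives and the rest are zeroed, which heuristically sparsifies the feature map.

```python
import numpy as np

def sparsify_by_maxpool(fmap, pool=2):
    """Keep only the per-window maximum of each pooling region, zero the rest.

    A toy illustration of the CSAE sparsifying idea: max-pooling picks
    winners, and only winning activations are retained.
    """
    h, w = fmap.shape
    out = np.zeros_like(fmap)
    for i in range(0, h, pool):
        for j in range(0, w, pool):
            win = fmap[i:i + pool, j:j + pool]
            r, c = np.unravel_index(np.argmax(win), win.shape)
            out[i + r, j + c] = win[r, c]
    return out

fmap = np.array([[1., 3., 0., 2.],
                 [4., 2., 1., 5.],
                 [0., 1., 2., 0.],
                 [3., 0., 1., 1.]])
sparse = sparsify_by_maxpool(fmap)
```

On this 4x4 toy map, one winner survives per 2x2 window, so only 4 of the 16 activations remain nonzero.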
TL;DR: The experimental results demonstrate that when the sufficient conditions are satisfied, the pretraining models lead to sparseness; they also show that the performance of RePLU is better than that of ReLU and comparable with that of pretraining techniques such as RBMs and DAEs.
Abstract: A major advance in deep multilayer neural networks (DNNs) is the invention of various unsupervised pretraining methods to initialize network parameters, which lead to good prediction accuracy. This paper presents a sparseness analysis of the hidden units in the pretraining process. In particular, we use the $L_{1}$-norm to measure sparseness and provide some sufficient conditions under which pretraining leads to sparseness for the popular pretraining models, such as denoising autoencoders (DAEs) and restricted Boltzmann machines (RBMs). Our experimental results demonstrate that when the sufficient conditions are satisfied, the pretraining models lead to sparseness. Our experiments also reveal that when using sigmoid activation functions, pretraining plays an important sparseness role in DNNs with sigmoid (Dsigm), whereas when using rectified linear unit (ReLU) activation functions, pretraining becomes less effective for DNNs with ReLU (Drelu). Fortunately, Drelu can reach a higher recognition accuracy than DNNs with pretraining (DAEs and RBMs), as it can capture the main benefit (such as sparseness-encouraging) of pretraining in Dsigm. However, ReLU is not adapted to the different firing rates of biological neurons, because the firing rate actually changes along with the varying membrane resistances. To address this problem, we further propose a family of rectifier piecewise linear units (RePLUs) to fit the different firing rates. The experimental results show that the performance of RePLU is better than that of ReLU, and is comparable with that of some pretraining techniques, such as RBMs and DAEs.
65 citations
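The paper's $L_{1}$-norm sparseness measure can be illustrated with a minimal sketch (synthetic pre-activations, not the paper's experiments): ReLU zeroes all negative pre-activations outright, so its activation vector has a smaller L1 norm, i.e. is sparser, than sigmoid's on the same inputs.

```python
import numpy as np

# Toy comparison of activation sparseness under the L1-norm measure.
# The pre-activations are synthetic standard-normal draws.
rng = np.random.default_rng(0)
pre = rng.normal(size=1000)            # hypothetical hidden pre-activations

sigmoid = 1.0 / (1.0 + np.exp(-pre))   # sigmoid keeps every unit active
relu = np.maximum(pre, 0.0)            # ReLU silences negative units exactly

l1_sigmoid = np.abs(sigmoid).sum()
l1_relu = np.abs(relu).sum()
```

Roughly half of the ReLU activations are exactly zero, so its L1 norm is well below sigmoid's, matching the paper's observation that ReLU already captures the sparseness-encouraging benefit of pretraining.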
TL;DR: An intuitive interpretation and mathematical proofs are presented to reveal the efficient working mechanism of LRM, and it is found that LRM provides more useful information and advantages than the conventional similarity measure model, which calculates the distance between two entities.
Abstract: The linear reconstruction measure (LRM), which determines the nearest neighbors of the query sample among all known training samples by sorting the minimum L2-norm-error linear reconstruction coefficients, is introduced in this paper. An intuitive interpretation and mathematical proofs are presented to reveal the efficient working mechanism of LRM. By analyzing the physical meaning of the coefficients and regularization terms, we find that LRM provides more useful information and advantages than the conventional similarity measure model, which calculates the distance between two entities (i.e. conventional point-to-point, C-PtP). Inspired by the advantages of LRM, the linear reconstruction measure steered nearest neighbor classification framework (LRM-NNCF) is designed with eight classifiers according to different decision rules and models of LRM. Evaluations on several face databases demonstrate that the proposed classifiers can achieve better performance than the C-PtP-based 1-NNs and competitive recognition accuracy and robustness compared with the state-of-the-art classifiers.
19 citations
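The core of LRM can be sketched in a few lines (hypothetical parameter names; the paper's eight decision rules are not reproduced here): reconstruct the query as an L2-regularized linear combination of all training samples, then rank the training samples by their reconstruction coefficients instead of by point-to-point distance.

```python
import numpy as np

def lrm_neighbors(query, X, lam=0.1):
    """Rank training samples (rows of X) by their linear reconstruction
    coefficients for the query, a sketch of the LRM idea.

    Solves min_w ||query - X^T w||_2^2 + lam * ||w||_2^2 in closed form
    via the normal equations; larger coefficient = nearer neighbor.
    """
    G = X @ X.T + lam * np.eye(X.shape[0])
    w = np.linalg.solve(G, X @ query)
    return np.argsort(-w), w  # indices sorted from nearest to farthest

# Toy data: rows are training samples; the query is close to sample 2.
X = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.7, 0.7, 0.1]])
query = np.array([0.71, 0.69, 0.1])
order, coeffs = lrm_neighbors(query, X)
```

Here sample 2 receives the largest coefficient because it reconstructs the query almost by itself, so it is ranked first.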
TL;DR: This paper presents a simple but effective method for face recognition, named nearest orthogonal matrix representation (NOMR), which is more robust to the effects of illumination and heterogeneous imaging, and more intuitive and powerful for handling the small sample size problem.
Abstract: This paper presents a simple but effective method for face recognition, named nearest orthogonal matrix representation (NOMR). Specifically, the specific individual subspace of each image is estimated and represented uniquely by the sum of a set of basis matrices generated via singular value decomposition (SVD), i.e. the nearest orthogonal matrix (NOM) of the original image. Then, the nearest neighbor criterion is introduced for recognition. Compared with the current specific-individual-subspace-based methods (e.g. the sparse representation based classifier, the linear regression based classifier and so on), the proposed NOMR is more robust in alleviating the effects of illumination and heterogeneous imaging (e.g. sketch face recognition), and more intuitive and powerful for handling the small sample size problem. To evaluate the performance of the proposed method, a series of experiments was performed on several face databases: Extended Yale B, CMU-PIE, FRGCv2, AR and the CUHK Face Sketch database (CUFS). Experimental results demonstrate that the proposed method achieves encouraging performance compared with the state-of-the-art methods.
17 citations
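The nearest orthogonal matrix itself has a well-known closed form via SVD, which can be sketched as follows (a minimal illustration of the representation step only, not the full NOMR classifier): if A = U S V^T, the orthogonal matrix closest to A in the Frobenius norm is U V^T.

```python
import numpy as np

def nearest_orthogonal_matrix(A):
    """Return the orthogonal matrix nearest to A in the Frobenius norm.

    Via SVD: A = U S V^T, and the minimizer of ||A - Q||_F over
    orthogonal Q is Q = U V^T (the orthogonal polar factor of A).
    """
    U, _, Vt = np.linalg.svd(A)
    return U @ Vt

# Toy example: A scales a 90-degree rotation, so its nearest orthogonal
# matrix is that rotation itself.
A = np.array([[0., -2.],
              [1.,  0.]])
Q = nearest_orthogonal_matrix(A)
```

In NOMR this NOM plays the role of a per-image representation; here we only verify the algebraic step that produces it.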
TL;DR: A soft salient coding (SSaC) method is proposed, which overcomes the information suppression problem in the original salient coding (SaC) method, and multiple kernel learning (MKL) is used to combine the resulting features for classification tasks.
Abstract: Feature combination is an effective way to improve image classification. Most work in this line considers feature combination based on different low-level image descriptors, while ignoring the complementary property of different higher-level image features derived from the same type of low-level descriptor. In this paper, we explore the complementary property of different image features generated from a single type of low-level descriptor for image classification. Specifically, we propose a soft salient coding (SSaC) method, which overcomes the information suppression problem in the original salient coding (SaC) method. We analyse the physical meaning of the SSaC feature and the other two types of image features in the framework of Spatial Pyramid Matching (SPM), and propose using multiple kernel learning (MKL) to combine these features for classification tasks. Experiments on three image databases (Caltech-101, UIUC 8-Sports and 15-Scenes) not only verify the effectiveness of the proposed MKL combination method, but also reveal that collaboration is more important than selection when limited types of image features are employed.
12 citations
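The suppression problem in hard salient coding can be illustrated with a simplified sketch (the formula and names are simplified from the published SaC idea, not the authors' code): only the single nearest codeword receives a nonzero response, so a close runner-up codeword is suppressed entirely; SSaC softens this winner-take-all rule.

```python
import numpy as np

def salient_code(x, B, k=3):
    """Hard salient coding, simplified: the nearest codeword gets the
    only nonzero response, whose magnitude reflects how much closer it
    is than the other k-1 nearest codewords (its "saliency").
    """
    d = np.linalg.norm(B - x, axis=1)
    nearest = np.argsort(d)[:k]
    i = nearest[0]
    code = np.zeros(len(B))
    code[i] = 1.0 - d[i] / d[nearest[1:]].mean()
    return code

# Toy codebook (rows are codewords) and a descriptor near codeword 0.
B = np.array([[0., 0.], [1., 0.], [0., 1.], [5., 5.]])
code = salient_code(np.array([0.1, 0.0]), B)
```

Note how the response is concentrated on a single codeword; all information carried by the other nearby codewords is suppressed, which is exactly what SSaC's soft assignment is designed to avoid.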
Cited by
TL;DR: The proposed model outperforms the state-of-the-art deep learning approaches applied to place recognition and is easily trained via the standard backpropagation method.
Abstract: We propose an end-to-end place recognition model based on a novel deep neural network. First, we propose to exploit the spatial pyramid structure of the images to enhance the vector of locally aggregated descriptors (VLAD) such that the enhanced VLAD features can reflect the structural information of the images. To encode this feature extraction into the deep learning method, we build a spatial pyramid-enhanced VLAD (SPE-VLAD) layer. Next, we impose weight constraints on the terms of the traditional triplet loss (T-loss) function such that the weighted T-loss (WT-loss) function avoids the suboptimal convergence of the learning process. The loss function can work well under weakly supervised scenarios in that it determines the semantically positive and negative samples of each query through not only the GPS tags but also the Euclidean distance between the image representations. The SPE-VLAD layer and the WT-loss layer are integrated with the VGG-16 network or ResNet-18 network to form a novel end-to-end deep neural network that can be easily trained via the standard backpropagation method. We conduct experiments on three benchmark data sets, and the results demonstrate that the proposed model outperforms the state-of-the-art deep learning approaches applied to place recognition.
281 citations
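The weighted triplet loss idea can be sketched as follows (the weight scheme and parameter names here are illustrative, not the paper's exact formulation): per-term weights replace the uniform terms of the standard T-loss hinge, re-balancing how strongly the positive is pulled in and the negative pushed away.

```python
import numpy as np

def weighted_triplet_loss(q, pos, neg, margin=0.5, w_pos=1.0, w_neg=1.0):
    """Hinge-style triplet loss with per-term weights (illustrative).

    Pulls the query embedding q toward the positive and pushes it away
    from the negative; w_pos and w_neg re-weight the two squared-distance
    terms of the standard T-loss.
    """
    d_pos = np.sum((q - pos) ** 2)
    d_neg = np.sum((q - neg) ** 2)
    return max(w_pos * d_pos - w_neg * d_neg + margin, 0.0)

q = np.array([0.0, 0.0])
pos = np.array([0.1, 0.0])
loss_easy = weighted_triplet_loss(q, pos, np.array([1.0, 0.0]))  # negative far away
loss_hard = weighted_triplet_loss(q, pos, np.array([0.2, 0.0]))  # negative nearby
```

An easy triplet (negative already far beyond the margin) contributes zero loss, while a hard triplet with a nearby negative produces a positive penalty that drives the gradient.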
01 Jan 2013
TL;DR: A fast convolutional sparse coding algorithm with globally optimal subproblems and super-linear convergence is proposed, drawing on ideas from signal processing and augmented Lagrange methods.
Abstract: Sparse coding has become an increasingly popular method in learning and vision for a variety of classification, reconstruction and coding tasks. The canonical approach intrinsically assumes independence between observations during learning. For many natural signals, however, sparse coding is applied to sub-elements (i.e. patches) of the signal, where such an assumption is invalid. Convolutional sparse coding explicitly models local interactions through the convolution operator; however, the resulting optimization problem is considerably more complex than traditional sparse coding. In this paper, we draw upon ideas from signal processing and Augmented Lagrange Methods (ALMs) to produce a fast algorithm with globally optimal subproblems and super-linear convergence.
271 citations
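The signal-processing trick underlying such fast CSC solvers is that convolution diagonalizes in the Fourier domain, which can be verified in a few lines (a generic illustration of the convolution theorem, not the paper's full ALM solver):

```python
import numpy as np

# Circular convolution theorem: spatial convolution becomes elementwise
# multiplication in the Fourier domain, which is what turns each ALM
# subproblem into cheap independent per-frequency systems.
rng = np.random.default_rng(1)
n = 16
d = rng.normal(size=n)   # a filter, zero-padded to the signal length
z = rng.normal(size=n)   # a code map (dense here, for illustration)

# FFT route: multiply spectra, transform back.
spectral = np.real(np.fft.ifft(np.fft.fft(d) * np.fft.fft(z)))

# Direct route: explicit circular convolution sum.
direct = np.array([sum(d[j] * z[(i - j) % n] for j in range(n))
                   for i in range(n)])
```

The two routes agree to floating-point precision; the FFT route costs O(n log n) per subproblem instead of O(n^2), which is where the speedup of frequency-domain CSC solvers comes from.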
TL;DR: A snapshot of the fast-growing deep learning field for microscopy image analysis, which explains the architectures and principles of convolutional neural networks, fully convolutional networks, recurrent neural networks, stacked autoencoders, and deep belief networks, and their formulations or modelings for specific tasks on various microscopy images.
Abstract: Computerized microscopy image analysis plays an important role in computer aided diagnosis and prognosis. Machine learning techniques have powered many aspects of medical investigation and clinical practice. Recently, deep learning is emerging as a leading machine learning tool in computer vision and has attracted considerable attention in biomedical image analysis. In this paper, we provide a snapshot of this fast-growing field, specifically for microscopy image analysis. We briefly introduce the popular deep neural networks and summarize current deep learning achievements in various tasks, such as detection, segmentation, and classification in microscopy image analysis. In particular, we explain the architectures and the principles of convolutional neural networks, fully convolutional networks, recurrent neural networks, stacked autoencoders, and deep belief networks, and interpret their formulations or modelings for specific tasks on various microscopy images. In addition, we discuss the open challenges and the potential trends of future research in microscopy image analysis using deep learning.
235 citations
TL;DR: The experimental results demonstrate that the proposed GMDKNN performs better and is less sensitive to k, which makes it a promising method for pattern recognition in expert and intelligence systems.
Abstract: The k-nearest neighbor (KNN) rule is a well-known non-parametric classifier that is widely used in pattern recognition. However, sensitivity to the neighborhood size k often seriously degrades KNN-based classification performance, especially in the case of small sample sizes with outliers. To overcome this issue, in this article we propose a generalized mean distance-based k-nearest neighbor classifier (GMDKNN) by introducing multi-generalized mean distances and a nested generalized mean distance based on the characteristics of the generalized mean. In the proposed method, multi-local mean vectors of the given query sample in each class are calculated by adopting its class-specific k nearest neighbors. Using the obtained k local mean vectors per class, the corresponding k generalized mean distances are calculated and then used to design the categorical nested generalized mean distance. In the classification phase, the categorical nested generalized mean distance is used as the classification decision rule, and the query sample is classified into the class with the minimum nested generalized mean distance among all classes. Extensive experiments on the UCI and KEEL data sets, synthetic data sets, the KEEL noise data sets and the UCR time series data sets are conducted, comparing the proposed method to state-of-the-art KNN-based methods. The experimental results demonstrate that the proposed GMDKNN performs better and is less sensitive to k. Thus, the proposed GMDKNN, with its robust and effective classification performance, could be a promising method for pattern recognition in expert and intelligence systems.
231 citations
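A simplified sketch of the GMDKNN idea follows (the nesting and parameter choices are illustrative, not the authors' exact formulation): per class, build the j-local-mean vectors of the query's k nearest neighbors for j = 1..k, then score the class by a generalized (power) mean of the query's distances to those local means.

```python
import numpy as np

def gmd_knn_predict(query, X, y, k=3, p=-1.0):
    """Simplified generalized mean distance-based kNN classifier.

    For each class: take the query's k nearest neighbors in that class,
    form the j-local-mean vectors (mean of the j nearest, j = 1..k), and
    score the class by the generalized (power) mean of the query's
    distances to those local means. Predict the lowest-scoring class.
    """
    scores = {}
    for c in np.unique(y):
        Xc = X[y == c]
        d = np.linalg.norm(Xc - query, axis=1)
        nn = Xc[np.argsort(d)[:k]]
        local_means = np.cumsum(nn, axis=0) / np.arange(1, k + 1)[:, None]
        dists = np.linalg.norm(local_means - query, axis=1)
        scores[c] = np.mean(dists ** p) ** (1.0 / p)  # generalized mean
    return min(scores, key=scores.get)

# Toy data: class 0 clusters near the origin, class 1 near (2, 2).
X = np.array([[0.0, 0.0], [0.2, 0.0], [0.0, 0.2],
              [2.0, 2.0], [2.2, 2.0], [2.0, 2.2]])
y = np.array([0, 0, 0, 1, 1, 1])
pred = gmd_knn_predict(np.array([0.1, 0.1]), X, y, k=3)
```

Averaging over local means, rather than relying on raw neighbor distances, is what dampens the sensitivity to the exact choice of k that the paper targets.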
TL;DR: An overview of the field of automatic ear recognition (from 2D images), focusing specifically on the most recent, descriptor-based methods proposed in this area; the paper also introduces a new, fully unconstrained dataset of ear images gathered from the web and a toolbox implementing several state-of-the-art techniques.
Abstract: Automatic identity recognition from ear images represents an active field of research within the biometric community. The ability to capture ear images from a distance and in a covert manner makes the technology an appealing choice for surveillance and security applications as well as other application domains. Significant contributions have been made in the field over recent years, but open research problems still remain and hinder a wider (commercial) deployment of the technology. This paper presents an overview of the field of automatic ear recognition (from 2D images) and focuses specifically on the most recent, descriptor-based methods proposed in this area. Open challenges are discussed and potential research directions are outlined with the goal of providing the reader with a point of reference for issues worth examining in the future. In addition to a comprehensive review on ear recognition technology, the paper also introduces a new, fully unconstrained dataset of ear images gathered from the web and a toolbox implementing several state-of-the-art techniques for ear recognition. The dataset and toolbox are meant to address some of the open issues in the field and are made publicly available to the research community.
190 citations