Author

Wei Xu

Bio: Wei Xu is an academic researcher from Nanjing University of Science and Technology. The author has contributed to research in the topics Contextual image classification and Feature (computer vision). The author has an h-index of 4 and has co-authored 5 publications receiving 143 citations.

Papers
Journal ArticleDOI
TL;DR: A convolutional sparse auto-encoder (CSAE) is proposed, which leverages the structure of the convolutional auto-encoder and incorporates max-pooling to heuristically sparsify the feature maps for feature learning; this simple strategy makes the stochastic gradient descent algorithm work efficiently for CSAE training.
Abstract: Convolutional sparse coding (CSC) can model local connections between image content and reduce the code redundancy when compared with patch-based sparse coding. However, CSC needs a complicated optimization procedure to infer the codes (i.e., feature maps). In this brief, we proposed a convolutional sparse auto-encoder (CSAE), which leverages the structure of the convolutional AE and incorporates max-pooling to heuristically sparsify the feature maps for feature learning. Together with competition over feature channels, this simple sparsifying strategy makes the stochastic gradient descent algorithm work efficiently for the CSAE training; thus, no complicated optimization procedure is involved. We employed the features learned in the CSAE to initialize convolutional neural networks for classification and achieved competitive results on benchmark data sets. In addition, by building connections between the CSAE and CSC, we proposed a strategy to construct local descriptors from the CSAE for classification. Experiments on Caltech-101 and Caltech-256 clearly demonstrated the effectiveness of the proposed method and verified that the CSAE, as a CSC model, has the ability to explore connections between neighboring image content for classification tasks.
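A minimal PyTorch sketch of the core idea is given below: a convolutional auto-encoder whose feature maps are heuristically sparsified by keeping only the winners of each max-pooling window, trained with plain SGD on a reconstruction loss. The layer sizes, the `sparsify_by_pooling` helper, and the omission of the channel-competition step are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch (not the authors' code): a convolutional auto-encoder whose
# feature maps are sparsified by keeping only the max-pooling winners.
import torch
import torch.nn as nn
import torch.nn.functional as F

def sparsify_by_pooling(fmap, kernel=2):
    """Keep only the max activation in each pooling window; zero out the rest."""
    pooled, idx = F.max_pool2d(fmap, kernel, return_indices=True)
    flat = torch.zeros_like(fmap).flatten(2)
    flat = flat.scatter(2, idx.flatten(2), pooled.flatten(2))
    return flat.view_as(fmap)

class CSAE(nn.Module):
    def __init__(self, channels=16):
        super().__init__()
        self.encoder = nn.Conv2d(1, channels, 5, padding=2)
        self.decoder = nn.Conv2d(channels, 1, 5, padding=2)

    def forward(self, x):
        fmap = torch.relu(self.encoder(x))
        fmap = sparsify_by_pooling(fmap)   # heuristic sparsification via max-pooling
        return self.decoder(fmap)

model = CSAE()
opt = torch.optim.SGD(model.parameters(), lr=0.01)  # plain SGD, no special solver
x = torch.randn(8, 1, 28, 28)                       # dummy image batch
loss = F.mse_loss(model(x), x)                      # reconstruction objective
loss.backward()
opt.step()
```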

114 citations

Journal ArticleDOI
TL;DR: An alternating direction method of multipliers (ADMM) is derived to solve the matrix-based optimization problem under sparse regularization, and the proposed method is more effective than state-of-the-art methods when dealing with structural errors.
Abstract: Sparse representation learning has been successfully applied to image classification, which represents a given image as a linear combination of an over-complete dictionary. The classification result depends on the reconstruction residuals. Normally, the images are stretched into vectors for convenience, and the representation residuals are characterized by the $l_{2}$-norm or $l_{1}$-norm, which actually assumes that the elements in the residuals are independent and identically distributed variables. However, it is hard to satisfy this hypothesis when it comes to structural errors, such as illuminations, occlusions, and so on. In this paper, we represent the image data in their intrinsic matrix form rather than as concatenated vectors. The representation residual is considered as a matrix variate following the matrix elliptically contoured distribution, which is robust to dependent errors and has long tail regions to fit outliers. Then, we seek the maximum a posteriori probability estimation solution of the matrix-based optimization problem under sparse regularization. An alternating direction method of multipliers (ADMM) is derived to solve the resulting optimization problem. The convergence of the ADMM is proven theoretically. Experimental results demonstrate that the proposed method is more effective than state-of-the-art methods when dealing with structural errors.
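To make the alternating-update structure concrete, here is a generic NumPy ADMM sketch for a plain $l_{1}$-regularized least-squares (lasso) problem. It is not the paper's matrix-variate, elliptically contoured formulation; the dictionary, penalty weight, and toy data are illustrative assumptions.

```python
# Generic ADMM for l1-regularized least squares, illustrating the alternating
# x/z/u updates; the paper's matrix-variate model is not reproduced here.
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def admm_lasso(D, y, lam=0.1, rho=1.0, iters=100):
    n = D.shape[1]
    x = z = u = np.zeros(n)
    DtD, Dty = D.T @ D, D.T @ y
    inv = np.linalg.inv(DtD + rho * np.eye(n))   # cached factor for the x-update
    for _ in range(iters):
        x = inv @ (Dty + rho * (z - u))          # quadratic subproblem
        z = soft_threshold(x + u, lam / rho)     # proximal step enforcing sparsity
        u = u + x - z                            # scaled dual (multiplier) update
    return z

# Toy usage: recover a sparse code over a random dictionary.
rng = np.random.default_rng(0)
D = rng.standard_normal((64, 256))
code = admm_lasso(D, D @ (rng.standard_normal(256) * (rng.random(256) < 0.05)))
print("nonzero coefficients:", np.count_nonzero(code))
```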

35 citations

Journal ArticleDOI
TL;DR: The concept of locality is introduced into the auto-encoder, which enables the auto-encoder to encode similar inputs using similar features for image classification.
Abstract: We propose a locality-constrained sparse auto-encoder (LSAE) for image classification in this letter. Previous work has shown that locality is more essential than sparsity for classification tasks. We here introduce the concept of locality into the auto-encoder, which enables the auto-encoder to encode similar inputs using similar features. The proposed LSAE can be trained by the existing backpropagation algorithm; no complicated optimization is involved. Experiments on the CIFAR-10, STL-10 and Caltech-101 datasets validate the effectiveness of the LSAE for classification tasks.
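The exact LSAE objective is not reproduced here; as a hedged illustration of the general idea, the PyTorch sketch below adds a locality-style penalty (in the spirit of locality-constrained coding) to a plain auto-encoder reconstruction loss, so that hidden units whose weight vectors are far from the input are discouraged from activating. The `locality_penalty` helper, layer sizes, and weighting are assumptions, and the whole model still trains with ordinary backpropagation.

```python
# Hedged sketch: auto-encoder loss plus a locality-style penalty that weights each
# activation by the distance between the input and that unit's decoder weight vector.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LocalityAE(nn.Module):
    def __init__(self, dim=784, hidden=256):
        super().__init__()
        self.enc = nn.Linear(dim, hidden)
        self.dec = nn.Linear(hidden, dim)

    def forward(self, x):
        h = torch.sigmoid(self.enc(x))
        return self.dec(h), h

def locality_penalty(x, h, dec_weight):
    # distance between each input and each hidden unit's "basis" (decoder column)
    dist = torch.cdist(x, dec_weight.t())          # (batch, hidden)
    return ((dist / dist.max()) * h).pow(2).mean() # far-away units pay more for activating

model = LocalityAE()
x = torch.randn(32, 784)                           # dummy flattened images
recon, h = model(x)
loss = F.mse_loss(recon, x) + 0.01 * locality_penalty(x, h, model.dec.weight)
loss.backward()  # plain backpropagation, as the letter emphasizes
```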

20 citations

Journal ArticleDOI
Wei Luo, Jian Yang, Wei Xu, Jun Li, Jian Zhang
TL;DR: A soft salient coding (SSaC) method is proposed, which overcomes the information suppression problem in the original salient coding (SaC) method, and multiple kernel learning (MKL) is used to combine the resulting features for classification tasks.
Abstract: Feature combination is an effective way for image classification. Most of the work in this line mainly considers feature combination based on different low-level image descriptors, while ignoring the complementary property of different higher-level image features derived from the same type of low-level descriptor. In this paper, we explore the complementary property of different image features generated from one single type of low-level descriptor for image classification. Specifically, we propose a soft salient coding (SSaC) method, which overcomes the information suppression problem in the original salient coding (SaC) method. We analyse the physical meaning of the SSaC feature and the other two types of image features in the framework of Spatial Pyramid Matching (SPM), and propose using multiple kernel learning (MKL) to combine these features for classification tasks. Experiments on three image databases (Caltech-101, UIUC 8-Sports and 15-Scenes) not only verify the effectiveness of the proposed MKL combination method, but also reveal that collaboration is more important than selection for classification when limited types of image features are employed.
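As a hedged illustration of the kernel-combination step, the scikit-learn sketch below builds one kernel matrix per feature type and feeds their weighted sum to a precomputed-kernel SVM. A real MKL solver learns the combination weights; the uniform weights, synthetic features, and labels below are stand-ins for illustration only.

```python
# Simplified feature combination: one kernel per feature type, combined with fixed
# weights and classified by an SVM on the precomputed kernel. Real MKL learns the weights.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n = 200
labels = rng.integers(0, 2, n)
# three synthetic higher-level feature sets derived from "one descriptor type"
feature_sets = [rng.standard_normal((n, d)) + labels[:, None] for d in (64, 128, 32)]

def linear_kernel(F):
    return F @ F.T

K = sum(linear_kernel(F) for F in feature_sets) / len(feature_sets)  # uniform weights

clf = SVC(kernel="precomputed").fit(K, labels)
print("training accuracy:", clf.score(K, labels))
```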

12 citations

Book ChapterDOI
14 Jun 2015
TL;DR: This paper explores the ability of a denoising auto-encoder with ReLU to pre-train a CNN layer-by-layer, and investigates the performance of a CNN with weights initialized by the pre-trained features for image classification tasks where the number of training samples is limited and the size of the samples is large.
Abstract: The neural network (NN) with rectified linear units (ReLU) has achieved great success in image classification with a large number of labelled training samples. The performance, however, is unclear when the number of labelled training samples is limited and the size of the samples is large. Usually, the convolutional neural network (CNN) is used to process large-size images, but unsupervised pre-training methods for deep CNNs are still progressing slowly. Therefore, in this paper, we first explore the ability of a denoising auto-encoder with ReLU to pre-train a CNN layer-by-layer, and then investigate the performance of a CNN whose weights are initialized by the pre-trained features for image classification tasks where the number of training samples is limited and the size of the samples is large. Experiments on the Caltech-101 benchmark demonstrate the effectiveness of our method.
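A minimal PyTorch sketch of this kind of layer-wise pre-training is shown below: a single convolutional layer with ReLU is trained as a denoising auto-encoder on unlabeled data, and its weights are then copied into the first layer of a CNN classifier. The layer sizes, noise level, and training loop are illustrative assumptions, not the paper's exact setup.

```python
# Hedged sketch of convolutional denoising auto-encoder pre-training for one CNN layer.
import torch
import torch.nn as nn
import torch.nn.functional as F

conv = nn.Conv2d(1, 32, 5, padding=2)          # layer to be pre-trained
decoder = nn.Conv2d(32, 1, 5, padding=2)       # reconstruction head, discarded later
opt = torch.optim.SGD(list(conv.parameters()) + list(decoder.parameters()), lr=0.01)

for _ in range(10):                            # unsupervised pre-training loop
    x = torch.randn(16, 1, 64, 64)             # stand-in for unlabeled large images
    noisy = x + 0.3 * torch.randn_like(x)      # corrupt the input (denoising criterion)
    recon = decoder(F.relu(conv(noisy)))       # ReLU hidden units
    loss = F.mse_loss(recon, x)
    opt.zero_grad(); loss.backward(); opt.step()

# The pre-trained weights then initialize the first layer of a CNN classifier.
cnn_first_layer = nn.Conv2d(1, 32, 5, padding=2)
cnn_first_layer.load_state_dict(conv.state_dict())
```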

4 citations


Cited by
Journal ArticleDOI
TL;DR: The proposed model outperforms the state-of-the-art deep learning approaches applied to place recognition and is easily trained via the standard backpropagation method.
Abstract: We propose an end-to-end place recognition model based on a novel deep neural network. First, we propose to exploit the spatial pyramid structure of the images to enhance the vector of locally aggregated descriptors (VLAD) such that the enhanced VLAD features can reflect the structural information of the images. To encode this feature extraction into the deep learning method, we build a spatial pyramid-enhanced VLAD (SPE-VLAD) layer. Next, we impose weight constraints on the terms of the traditional triplet loss (T-loss) function such that the weighted T-loss (WT-loss) function avoids the suboptimal convergence of the learning process. The loss function can work well under weakly supervised scenarios in that it determines the semantically positive and negative samples of each query through not only the GPS tags but also the Euclidean distance between the image representations. The SPE-VLAD layer and the WT-loss layer are integrated with the VGG-16 network or ResNet-18 network to form a novel end-to-end deep neural network that can be easily trained via the standard backpropagation method. We conduct experiments on three benchmark data sets, and the results demonstrate that the proposed model outperforms the state-of-the-art deep learning approaches applied to place recognition.
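As a hedged sketch of the re-weighting idea (not the paper's actual WT-loss), the PyTorch snippet below computes a triplet loss whose per-triplet terms are multiplied by weights; the descriptor dimension, placeholder weights, and random data are assumptions for illustration.

```python
# Hedged sketch: a triplet loss with per-term weights, to illustrate re-weighting
# the terms of the standard T-loss. The paper's weighting scheme is not reproduced.
import torch
import torch.nn.functional as F

def weighted_triplet_loss(anchor, positives, negatives, weights, margin=0.3):
    """anchor: (d,), positives/negatives: (k, d), weights: (k,) placeholder weights."""
    d_pos = (anchor - positives).pow(2).sum(dim=1)
    d_neg = (anchor - negatives).pow(2).sum(dim=1)
    return (weights * F.relu(d_pos - d_neg + margin)).sum()

q = torch.randn(256)                    # query descriptor (e.g., an SPE-VLAD vector)
pos = torch.randn(5, 256)               # semantically positive samples
neg = torch.randn(5, 256)               # negatives (GPS tag + descriptor distance in the paper)
w = torch.softmax(-torch.cdist(q[None], pos)[0], dim=0)  # placeholder per-term weights
print(weighted_triplet_loss(q, pos, neg, w))
```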

281 citations

01 Jan 2013
TL;DR: In this paper, a fast convolutional sparse coding algorithm with globally optimal subproblems and super-linear convergence is proposed, drawing on ideas from signal processing and augmented Lagrange methods.
Abstract: Sparse coding has become an increasingly popular method in learning and vision for a variety of classification, reconstruction and coding tasks. The canonical approach intrinsically assumes independence between observations during learning. For many natural signals, however, sparse coding is applied to sub-elements (i.e., patches) of the signal, where such an assumption is invalid. Convolutional sparse coding explicitly models local interactions through the convolution operator; however, the resulting optimization problem is considerably more complex than traditional sparse coding. In this paper, we draw upon ideas from signal processing and Augmented Lagrange Methods (ALMs) to produce a fast algorithm with globally optimal subproblems and super-linear convergence.
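For concreteness, the NumPy/SciPy snippet below just evaluates the standard convolutional sparse coding objective (reconstruction by summed convolutions plus an $l_{1}$ penalty on the feature maps); it does not implement the paper's fast ALM solver, and the filter sizes and data are toy assumptions.

```python
# Numeric illustration of the convolutional sparse coding objective only;
# the paper's fast ALM/FFT solver is not implemented here.
import numpy as np
from scipy.signal import convolve2d

def csc_objective(image, filters, feature_maps, lam=0.1):
    # reconstruction is the sum of each filter convolved with its feature map
    recon = sum(convolve2d(z, d, mode="same") for d, z in zip(filters, feature_maps))
    data_term = 0.5 * np.sum((image - recon) ** 2)
    sparsity_term = lam * sum(np.abs(z).sum() for z in feature_maps)
    return data_term + sparsity_term

rng = np.random.default_rng(0)
image = rng.standard_normal((32, 32))
filters = [rng.standard_normal((5, 5)) for _ in range(4)]
feature_maps = [np.zeros((32, 32)) for _ in range(4)]   # all-zero (maximally sparse) codes
print(csc_objective(image, filters, feature_maps))
```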

271 citations

Journal ArticleDOI
TL;DR: A new DNN, the one-dimensional residual convolutional autoencoder (1-DRCAE), is proposed for learning features directly from vibration signals in an unsupervised way, and it performs better at feature extraction than typical DNNs, e.g., deep belief networks, stacked autoencoders, and 1-D CNNs.
Abstract: Vibration signals are generally utilized for machinery fault diagnosis to perform timely maintenance and thereby reduce losses. Thus, the feature extraction on one-dimensional vibration signals often determines the accuracy of those fault diagnosis models. Typical deep neural networks (DNNs), e.g., convolutional neural networks (CNNs), perform well in feature learning and have been applied to machine fault diagnosis. However, the supervised learning of a CNN often requires a large number of labeled images, which limits its wide application. In this article, a new DNN, the one-dimensional residual convolutional autoencoder (1-DRCAE), is proposed for learning features from vibration signals directly in an unsupervised way. First, a 1-D convolutional autoencoder is proposed in 1-DRCAE for feature extraction. Second, a deconvolution operation is developed as the decoder of 1-DRCAE to reconstruct the filtered signals. Third, residual learning is employed in 1-DRCAE to perform feature learning on 1-D vibration signals. The results show that 1-DRCAE has good signal denoising and feature extraction performance on vibration signals. It performs better at feature extraction than typical DNNs, e.g., deep belief networks, stacked autoencoders, and 1-D CNNs.
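A minimal PyTorch sketch of the ingredients the abstract names (a 1-D convolutional encoder, a transposed-convolution decoder, and a residual block, trained on a reconstruction loss) is given below; the depths, kernel sizes, and stride are assumptions, not the 1-DRCAE architecture from the paper.

```python
# Hedged sketch of a 1-D convolutional auto-encoder with a residual block for
# unsupervised feature learning on vibration-like signals.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock1D(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv1d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv1d(channels, channels, 3, padding=1)

    def forward(self, x):
        return torch.relu(x + self.conv2(torch.relu(self.conv1(x))))  # identity shortcut

class Residual1DConvAE(nn.Module):
    def __init__(self, channels=16):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv1d(1, channels, kernel_size=16, stride=4, padding=6),
            nn.ReLU(),
            ResidualBlock1D(channels),
        )
        # transposed convolution ("deconvolution") reconstructs the filtered signal
        self.decoder = nn.ConvTranspose1d(channels, 1, kernel_size=16, stride=4, padding=6)

    def forward(self, x):
        return self.decoder(self.encoder(x))

signals = torch.randn(8, 1, 1024)            # stand-in for vibration signal segments
model = Residual1DConvAE()
loss = F.mse_loss(model(signals), signals)   # unsupervised reconstruction criterion
loss.backward()
```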

116 citations

Journal ArticleDOI
Haiyong Zheng, Ruchen Wang, Zhibin Yu, Nan Wang, Zhaorui Gu, Bing Zheng
TL;DR: This study demonstrates an automatic plankton image classification system that combines multiple-view features via multiple kernel learning; combining three kernel functions (linear, polynomial and Gaussian) describes and exploits the feature information better and thus achieves higher classification accuracy.
Abstract: Plankton, including phytoplankton and zooplankton, are the main source of food for organisms in the ocean and form the base of the marine food chain. As the fundamental components of marine ecosystems, plankton are very sensitive to environmental changes, and the study of plankton abundance and distribution is crucial in order to understand environmental changes and protect marine ecosystems. This study was carried out to develop a widely applicable plankton classification system with high accuracy for the increasing number of various imaging devices. The literature shows that most plankton image classification systems were limited to only one specific imaging device and a relatively narrow taxonomic scope, and a truly practical system for automatic plankton classification does not yet exist; this study partly fills that gap. Inspired by the analysis of the literature and the development of technology, we focused on the requirements of practical application and proposed an automatic system for plankton image classification combining multiple-view features via multiple kernel learning (MKL). First, in order to describe the biomorphic characteristics of plankton more completely and comprehensively, we combined general features with robust features, especially by adding features like the Inner-Distance Shape Context for morphological representation. Second, we divided all the features into different types from multiple views and fed them to multiple classifiers instead of only one, by optimally combining different kernel matrices computed from different types of features via multiple kernel learning. Moreover, we also applied a feature selection method to choose the optimal feature subsets from redundant features to suit the different datasets from different imaging devices. We implemented our proposed classification system on three different datasets spanning more than 20 categories from phytoplankton to zooplankton. The experimental results validated that our system outperforms state-of-the-art plankton image classification systems in terms of accuracy and robustness. This study demonstrated an automatic plankton image classification system combining multiple-view features using multiple kernel learning. The results indicated that multiple-view features combined by NLMKL using three kernel functions (linear, polynomial and Gaussian) can describe and exploit the feature information better and thus achieve higher classification accuracy.
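As a hedged sketch of the kernel-combination step the abstract highlights, the scikit-learn snippet below sums linear, polynomial, and Gaussian (RBF) kernel matrices computed over concatenated multi-view features and trains a precomputed-kernel SVM; the fixed weights stand in for the coefficients an MKL solver such as the paper's NLMKL would learn, and the features and labels are synthetic.

```python
# Simplified three-kernel combination over multi-view features; the combination
# weights are fixed stand-ins for what an MKL solver would learn.
import numpy as np
from sklearn.metrics.pairwise import linear_kernel, polynomial_kernel, rbf_kernel
from sklearn.svm import SVC

rng = np.random.default_rng(1)
n = 150
labels = rng.integers(0, 3, n)                     # e.g., three plankton categories
views = [rng.standard_normal((n, d)) + labels[:, None] for d in (20, 40)]  # two feature views
X = np.hstack(views)

kernels = [linear_kernel(X), polynomial_kernel(X, degree=2), rbf_kernel(X, gamma=0.01)]
weights = [1 / 3] * 3                              # stand-ins for learned MKL weights
K = sum(w * k for w, k in zip(weights, kernels))

clf = SVC(kernel="precomputed").fit(K, labels)
print("training accuracy:", clf.score(K, labels))
```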

88 citations

Journal ArticleDOI
TL;DR: A unique survey of state-of-the-art image matching methods based on feature descriptors is presented, from which future research may benefit.
Abstract: Image registration is an important technique in many computer vision applications such as image fusion, image retrieval, object tracking, face recognition, change detection and so on. Local feature descriptors, i.e., how to detect features and how to describe them, play a fundamental and important role in the image registration process and directly influence the accuracy and robustness of image registration. This paper mainly focuses on the variety of local feature descriptors, including theoretical research, mathematical models, and methods or algorithms, along with their applications in the context of image registration. The existing local feature descriptors are roughly classified into six categories to comprehensively demonstrate and analyze their respective advantages. The current and future challenges of local feature descriptors are discussed. The major goal of the paper is to present a unique survey of state-of-the-art image matching methods based on feature descriptors, from which future research may benefit.

82 citations