
Showing papers on "Dimensionality reduction published in 2016"


Journal ArticleDOI
TL;DR: The basic ideas of PCA are introduced, including what it can and cannot do, along with some variants of the technique that have been developed and tailored to various data types and structures.
Abstract: Large datasets are increasingly common and are often difficult to interpret. Principal component analysis (PCA) is a technique for reducing the dimensionality of such datasets, increasing interpretability but at the same time minimizing information loss. It does so by creating new uncorrelated variables that successively maximize variance. Finding such new variables, the principal components, reduces to solving an eigenvalue/eigenvector problem, and the new variables are defined by the dataset at hand, not a priori , hence making PCA an adaptive data analysis technique. It is adaptive in another sense too, since variants of the technique have been developed that are tailored to various different data types and structures. This article will begin by introducing the basic ideas of PCA, discussing what it can and cannot do. It will then describe some variants of PCA and their application.
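
To make the eigenvalue/eigenvector formulation above concrete, here is a minimal NumPy sketch (not from the article; the function and variable names are our own) that centres the data, eigendecomposes the sample covariance matrix, and projects onto the leading components:

```python
# Minimal PCA sketch: uncorrelated new variables that successively maximize variance,
# obtained from an eigenvalue/eigenvector problem on the sample covariance matrix.
import numpy as np

def pca(X, n_components):
    """Project the rows of X onto the top principal components."""
    X_centered = X - X.mean(axis=0)                 # centre each variable
    cov = np.cov(X_centered, rowvar=False)          # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)          # symmetric eigendecomposition
    order = np.argsort(eigvals)[::-1]               # sort by decreasing variance
    components = eigvecs[:, order[:n_components]]   # leading eigenvectors
    scores = X_centered @ components                # principal component scores
    explained = eigvals[order[:n_components]] / eigvals.sum()
    return scores, components, explained

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 10))
    scores, comps, ratio = pca(X, n_components=2)
    print(scores.shape, ratio)
```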

4,289 citations


Journal ArticleDOI
TL;DR: This paper presents a comprehensive survey of the state-of-the-art work on EC for feature selection, which identifies the contributions of these different algorithms.
Abstract: Feature selection is an important task in data mining and machine learning to reduce the dimensionality of the data and increase the performance of an algorithm, such as a classification algorithm. However, feature selection is a challenging task due mainly to the large search space. A variety of methods have been applied to solve feature selection problems, where evolutionary computation (EC) techniques have recently gained much attention and shown some success. However, there are no comprehensive guidelines on the strengths and weaknesses of alternative approaches. This leads to a disjointed and fragmented field with ultimately lost opportunities for improving performance and successful applications. This paper presents a comprehensive survey of the state-of-the-art work on EC for feature selection, which identifies the contributions of these different algorithms. In addition, current issues and challenges are also discussed to identify promising areas for future research.

1,237 citations


Journal ArticleDOI
TL;DR: Extensive experiments on various widely used classification data sets show that the proposed algorithm achieves better and faster convergence than the existing state-of-the-art hierarchical learning methods, and multiple applications in computer vision further confirm the generality and capability of the proposed learning scheme.
Abstract: Extreme learning machine (ELM) is an emerging learning algorithm for the generalized single hidden layer feedforward neural networks, of which the hidden node parameters are randomly generated and the output weights are analytically computed. However, due to its shallow architecture, feature learning using ELM may not be effective for natural signals (e.g., images/videos), even with a large number of hidden nodes. To address this issue, in this paper, a new ELM-based hierarchical learning framework is proposed for multilayer perceptron. The proposed architecture is divided into two main components, self-taught feature extraction followed by supervised feature classification, and the two are bridged by randomly initialized hidden weights. The novelties of this paper are as follows: 1) unsupervised multilayer encoding is conducted for feature extraction, and an ELM-based sparse autoencoder is developed via an $\ell _{1}$ constraint. By doing so, it achieves more compact and meaningful feature representations than the original ELM; 2) by exploiting the advantages of ELM random feature mapping, the hierarchically encoded outputs are randomly projected before final decision making, which leads to better generalization with faster learning speed; and 3) unlike the greedy layerwise training of deep learning (DL), the hidden layers of the proposed framework are trained in a forward manner. Once the previous layer is established, the weights of the current layer are fixed without fine-tuning. Therefore, it has much better learning efficiency than DL. Extensive experiments on various widely used classification data sets show that the proposed algorithm achieves better and faster convergence than the existing state-of-the-art hierarchical learning methods. Furthermore, multiple applications in computer vision further confirm the generality and capability of the proposed learning scheme.
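
A minimal NumPy sketch of the basic ELM building block referred to above (random hidden-node parameters, analytically computed output weights) may help; this is the plain single-layer ELM, not the full hierarchical framework of the paper, and all names are illustrative:

```python
# Basic ELM: random input weights/biases, output weights solved in closed form
# by regularized least squares.
import numpy as np

class ELM:
    def __init__(self, n_hidden=200, reg=1e-3, seed=0):
        self.n_hidden, self.reg, self.rng = n_hidden, reg, np.random.default_rng(seed)

    def _hidden(self, X):
        return np.tanh(X @ self.W + self.b)          # random feature mapping

    def fit(self, X, Y):
        d = X.shape[1]
        self.W = self.rng.normal(size=(d, self.n_hidden))   # random input weights
        self.b = self.rng.normal(size=self.n_hidden)         # random biases
        H = self._hidden(X)
        # Output weights beta minimize ||H beta - Y||^2 + reg ||beta||^2.
        self.beta = np.linalg.solve(H.T @ H + self.reg * np.eye(self.n_hidden), H.T @ Y)
        return self

    def predict(self, X):
        return self._hidden(X) @ self.beta

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X, Y = rng.normal(size=(300, 20)), rng.normal(size=(300, 1))
    print(ELM().fit(X, Y).predict(X).shape)          # (300, 1)
```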

1,166 citations


Journal ArticleDOI
TL;DR: In this article, the authors provide an overview of FDA, starting with simple statistical notions such as mean and covariance functions, then covering some core techniques, the most popular of which is functional principal component analysis (FPCA).
Abstract: With the advance of modern technology, more and more data are being recorded continuously during a time interval or intermittently at several discrete time points. These are both examples of functional data, which has become a commonly encountered type of data. Functional data analysis (FDA) encompasses the statistical methodology for such data. Broadly interpreted, FDA deals with the analysis and theory of data that are in the form of functions. This paper provides an overview of FDA, starting with simple statistical notions such as mean and covariance functions, then covering some core techniques, the most popular of which is functional principal component analysis (FPCA). FPCA is an important dimension reduction tool, and in sparse data situations it can be used to impute functional data that are sparsely observed. Other dimension reduction approaches are also discussed. In addition, we review another core technique, functional linear regression, as well as clustering and classification of functional data.
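
For densely observed curves on a common grid, FPCA reduces to an eigenanalysis of the sample covariance surface; the sketch below (our own illustration, assuming NumPy, and omitting the smoothing machinery needed for sparsely observed data) shows that special case:

```python
# FPCA for densely sampled curves: eigendecompose the covariance surface and
# compute FPC scores by quadrature on the sampling grid.
import numpy as np

def fpca_dense(curves, grid, n_components=3):
    """curves: (n_curves, n_gridpoints) array sampled on a common, equally spaced grid."""
    dt = grid[1] - grid[0]
    mean_fn = curves.mean(axis=0)
    centered = curves - mean_fn
    cov_fn = centered.T @ centered / (len(curves) - 1)     # covariance surface
    eigvals, eigvecs = np.linalg.eigh(cov_fn)
    order = np.argsort(eigvals)[::-1][:n_components]
    # Normalize eigenfunctions so the integral of phi^2 over the grid equals 1.
    phi = eigvecs[:, order] / np.sqrt(dt)
    scores = centered @ phi * dt                           # FPC scores via quadrature
    return mean_fn, phi, eigvals[order] * dt, scores

if __name__ == "__main__":
    grid = np.linspace(0, 1, 101)
    freqs = np.random.default_rng(0).uniform(1, 2, 50)
    curves = np.sin(2 * np.pi * np.outer(freqs, grid))
    mean_fn, phi, lam, scores = fpca_dense(curves, grid)
    print(phi.shape, scores.shape)                         # (101, 3) and (50, 3)
```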

963 citations


Journal ArticleDOI
TL;DR: A spectral-spatial feature based classification (SSFC) framework that jointly uses dimension reduction and deep learning techniques for spectral and spatial feature extraction, respectively, is proposed.
Abstract: In this paper, we propose a spectral–spatial feature based classification (SSFC) framework that jointly uses dimension reduction and deep learning techniques for spectral and spatial feature extraction, respectively. In this framework, a balanced local discriminant embedding algorithm is proposed for spectral feature extraction from high-dimensional hyperspectral data sets. In the meantime, a convolutional neural network is utilized to automatically find spatially related features at high levels. Then, the fusion feature is extracted by stacking the spectral and spatial features together. Finally, a multiple-feature-based classifier is trained for image classification. Experimental results on well-known hyperspectral data sets show that the proposed SSFC method outperforms other commonly used methods for hyperspectral image classification.

872 citations


Journal ArticleDOI
TL;DR: The results show that the auto-encoder can indeed learn something different from other methods, and suggest a possible relation with the intrinsic dimensionality of the input data.

583 citations


Posted Content
TL;DR: This work proposes to overcome the SSS problem in re-id distance metric learning by matching people in a discriminative null space of the training data, which has a fixed dimension, a closed-form solution and is very efficient to compute.
Abstract: Most existing person re-identification (re-id) methods focus on learning the optimal distance metrics across camera views. Typically a person's appearance is represented using features of thousands of dimensions, whilst only hundreds of training samples are available due to the difficulties in collecting matched training images. With the number of training samples much smaller than the feature dimension, the existing methods thus face the classic small sample size (SSS) problem and have to resort to dimensionality reduction techniques and/or matrix regularisation, which lead to loss of discriminative power. In this work, we propose to overcome the SSS problem in re-id distance metric learning by matching people in a discriminative null space of the training data. In this null space, images of the same person are collapsed into a single point thus minimising the within-class scatter to the extreme and maximising the relative between-class separation simultaneously. Importantly, it has a fixed dimension, a closed-form solution and is very efficient to compute. Extensive experiments carried out on five person re-identification benchmarks including VIPeR, PRID2011, CUHK01, CUHK03 and Market1501 show that such a simple approach beats the state-of-the-art alternatives, often by a big margin.
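
The core computation, finding a projection with zero within-class scatter but non-zero total scatter so that each identity collapses to a point, can be sketched in plain NumPy as below; this is a simplified, linear (non-kernelized) illustration of the idea, not the authors' code:

```python
# Discriminative null space sketch: directions with zero within-class scatter
# inside the span of the centred data, so that every class collapses to a point.
import numpy as np

def null_space_transform(X, y, tol=1e-10):
    """X: (n_samples, n_features), y: identity labels. Returns a projection matrix P."""
    y = np.asarray(y)
    Xc = X - X.mean(axis=0)

    # Step 1: restrict to the span of the centred data (range of the total scatter);
    # in the small-sample-size regime this is at most n_samples - 1 dimensional.
    U, s, _ = np.linalg.svd(Xc.T, full_matrices=False)
    U = U[:, s > tol * s.max()]
    Z = Xc @ U                                   # data expressed in that basis

    # Step 2: within-class scatter in the reduced space.
    Sw = np.zeros((Z.shape[1], Z.shape[1]))
    for c in np.unique(y):
        Zc = Z[y == c] - Z[y == c].mean(axis=0)
        Sw += Zc.T @ Zc

    # Step 3: the null space of Sw gives directions where every class collapses
    # to a single point while the total scatter stays non-zero.
    evals, evecs = np.linalg.eigh(Sw)
    N = evecs[:, evals < tol * max(evals.max(), 1.0)]
    return U @ N                                 # back to the original feature space

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(60, 500))               # SSS regime: far more features than samples
    y = np.repeat(np.arange(6), 10)
    P = null_space_transform(X, y)
    proj = (X - X.mean(axis=0)) @ P
    # Projections of samples from the same class coincide up to numerical precision.
```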

556 citations


Proceedings ArticleDOI
07 Mar 2016
TL;DR: In this paper, images of the same person are collapsed into a single point, thus minimising the within-class scatter to the extreme and maximising the relative between-class separation simultaneously.
Abstract: Most existing person re-identification (re-id) methods focus on learning the optimal distance metrics across camera views. Typically a person's appearance is represented using features of thousands of dimensions, whilst only hundreds of training samples are available due to the difficulties in collecting matched training images. With the number of training samples much smaller than the feature dimension, the existing methods thus face the classic small sample size (SSS) problem and have to resort to dimensionality reduction techniques and/or matrix regularisation, which lead to loss of discriminative power. In this work, we propose to overcome the SSS problem in re-id distance metric learning by matching people in a discriminative null space of the training data. In this null space, images of the same person are collapsed into a single point thus minimising the within-class scatter to the extreme and maximising the relative between-class separation simultaneously. Importantly, it has a fixed dimension, a closed-form solution and is very efficient to compute. Extensive experiments carried out on five person re-identification benchmarks including VIPeR, PRID2011, CUHK01, CUHK03 and Market1501 show that such a simple approach beats the state-of-the-art alternatives, often by a big margin.

516 citations


Journal ArticleDOI
12 Apr 2016-eLife
TL;DR: A new dimensionality reduction technique, demixed principal component analysis (dPCA), that decomposes population activity into a few components and exposes the dependence of the neural representation on task parameters such as stimuli, decisions, or rewards is demonstrated.
Abstract: Neurons in higher cortical areas, such as the prefrontal cortex, are often tuned to a variety of sensory and motor variables, and are therefore said to display mixed selectivity. This complexity of single neuron responses can obscure what information these areas represent and how it is represented. Here we demonstrate the advantages of a new dimensionality reduction technique, demixed principal component analysis (dPCA), that decomposes population activity into a few components. In addition to systematically capturing the majority of the variance of the data, dPCA also exposes the dependence of the neural representation on task parameters such as stimuli, decisions, or rewards. To illustrate our method we reanalyze population data from four datasets comprising different species, different cortical areas and different experimental tasks. In each case, dPCA provides a concise way of visualizing the data that summarizes the task-dependent features of the population response in a single figure.

443 citations


Journal ArticleDOI
TL;DR: The basic taxonomy of feature selection is presented, and the state-of-the-art gene selection methods are reviewed by grouping the literature into three categories: supervised, unsupervised, and semi-supervised.
Abstract: Recently, feature selection and dimensionality reduction have become fundamental tools for many data mining tasks, especially for processing high-dimensional data such as gene expression microarray data. Gene expression microarray data comprises up to hundreds of thousands of features with relatively small sample size. Because learning algorithms usually do not work well with this kind of data, a challenge to reduce the data dimensionality arises. A huge number of gene selection methods are applied to select a subset of relevant features for model construction and to seek better cancer classification performance. This paper presents the basic taxonomy of feature selection, and also reviews the state-of-the-art gene selection methods by grouping the literature into three categories: supervised, unsupervised, and semi-supervised. The comparison of experimental results on top 5 representative gene expression datasets indicates that the classification accuracy of unsupervised and semi-supervised feature selection is competitive with supervised feature selection.

402 citations


Journal ArticleDOI
TL;DR: In this paper, the authors provide mathematical and graphical representations and interpretation of tensor networks, with the main focus on the Tucker and Tensor Train (TT) decompositions and their extensions or generalizations.
Abstract: Machine learning and data mining algorithms are becoming increasingly important in analyzing large volume, multi-relational and multi-modal datasets, which are often conveniently represented as multiway arrays or tensors. It is therefore timely and valuable for the multidisciplinary research community to review tensor decompositions and tensor networks as emerging tools for large-scale data analysis and data mining. We provide the mathematical and graphical representations and interpretation of tensor networks, with the main focus on the Tucker and Tensor Train (TT) decompositions and their extensions or generalizations. Keywords: Tensor networks, Function-related tensors, CP decomposition, Tucker models, tensor train (TT) decompositions, matrix product states (MPS), matrix product operators (MPO), basic tensor operations, multiway component analysis, multilinear blind source separation, tensor completion, linear/multilinear dimensionality reduction, large-scale optimization problems, symmetric eigenvalue decomposition (EVD), PCA/SVD, huge systems of linear equations, pseudo-inverse of very large matrices, Lasso and Canonical Correlation Analysis (CCA) (This is Part 1)
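
As a concrete taste of the tensor-train format discussed above, the following NumPy sketch implements the standard TT-SVD procedure (sequential truncated SVDs of matricizations); it is a generic textbook-style illustration with our own naming, not code accompanying the paper:

```python
# TT-SVD: decompose a dense tensor into tensor-train cores by sequential SVDs.
import numpy as np

def tt_svd(tensor, eps=1e-10):
    """Decompose an ndarray into TT cores G_k of shape (r_{k-1}, n_k, r_k)."""
    shape, d = tensor.shape, tensor.ndim
    cores, r_prev = [], 1
    C = tensor.reshape(shape[0], -1)                       # first matricization
    for k in range(d - 1):
        U, s, Vt = np.linalg.svd(C, full_matrices=False)
        r = max(1, int(np.sum(s > eps * s[0])))            # truncated TT rank
        cores.append(U[:, :r].reshape(r_prev, shape[k], r))
        C = s[:r, None] * Vt[:r]                           # carry the remainder
        if k + 1 < d - 1:
            C = C.reshape(r * shape[k + 1], -1)            # matricize for the next step
        r_prev = r
    cores.append(C.reshape(r_prev, shape[-1], 1))
    return cores

def tt_reconstruct(cores):
    out = cores[0]
    for G in cores[1:]:
        out = np.tensordot(out, G, axes=([-1], [0]))       # contract adjacent ranks
    return out.squeeze(axis=(0, out.ndim - 1))

if __name__ == "__main__":
    T = np.random.default_rng(1).normal(size=(4, 5, 6, 7))
    cores = tt_svd(T)
    print([G.shape for G in cores])
    print(np.allclose(tt_reconstruct(cores), T))           # exact up to the eps tolerance
```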

Journal ArticleDOI
TL;DR: The proposed general Laplacian regularized low-rank representation framework for data representation takes advantage of the graph regularizer and can not only represent the global low-dimensional structures but also capture the intrinsic non-linear geometric information in the data.
Abstract: Low-rank representation (LRR) has recently attracted a great deal of attention due to its pleasing efficacy in exploring low-dimensional subspace structures embedded in data. For a given set of observed data corrupted with sparse errors, LRR aims at learning a lowest-rank representation of all data jointly. LRR has broad applications in pattern recognition, computer vision and signal processing. In the real world, data often reside on low-dimensional manifolds embedded in a high-dimensional ambient space. However, the LRR method does not take into account the non-linear geometric structures within data, thus the locality and similarity information among data may be missing in the learning process. To improve LRR in this regard, we propose a general Laplacian regularized low-rank representation framework for data representation into which a hypergraph Laplacian regularizer can be readily introduced, i.e., a Non-negative Sparse Hyper-Laplacian regularized LRR model (NSHLRR). By taking advantage of the graph regularizer, our proposed method not only can represent the global low-dimensional structures, but also capture the intrinsic non-linear geometric information in data. The extensive experimental results on image clustering, semi-supervised image classification and dimensionality reduction tasks demonstrate the effectiveness of the proposed method.

Journal ArticleDOI
TL;DR: A Max-Relevance-Max-Distance (MRMD) feature ranking method is proposed, which balances the accuracy and stability of feature ranking and prediction, and runs faster than other filtering and wrapping methods such as mRMR and Information Gain.

Journal ArticleDOI
TL;DR: It is shown that Bayesian models are able to use prior information and model measurements with various distributions, and a range of deep neural networks can be integrated in multi-modal learning for capturing the complex mechanism of biological systems.
Abstract: Driven by high-throughput sequencing techniques, modern genomic and clinical studies are in a strong need of integrative machine learning models for better use of vast volumes of heterogeneous information in the deep understanding of biological systems and the development of predictive models. How data from multiple sources (called multi-view data) are incorporated in a learning system is a key step for successful analysis. In this article, we provide a comprehensive review on omics and clinical data integration techniques, from a machine learning perspective, for various analyses such as prediction, clustering, dimension reduction and association. We shall show that Bayesian models are able to use prior information and model measurements with various distributions; tree-based methods can either build a tree with all features or collectively make a final decision based on trees learned from each view; kernel methods fuse the similarity matrices learned from individual views together for a final similarity matrix or learning model; network-based fusion methods are capable of inferring direct and indirect associations in a heterogeneous network; matrix factorization models have potential to learn interactions among features from different views; and a range of deep neural networks can be integrated in multi-modal learning for capturing the complex mechanism of biological systems.

Journal ArticleDOI
TL;DR: In this paper, the authors introduce and investigate several factors affecting the transferability of such representations, such as parameters for training of the source ConvNet such as its architecture, distribution of the training data, etc., and also the parameters of feature extraction such as layer of the trained ConvNet, dimensionality reduction, etc.
Abstract: Evidence is mounting that Convolutional Networks (ConvNets) are the most effective representation learning method for visual recognition tasks. In the common scenario, a ConvNet is trained on a large labeled dataset (source) and the feed-forward units activation of the trained network, at a certain layer of the network, is used as a generic representation of an input image for a task with relatively smaller training set (target). Recent studies have shown this form of representation transfer to be suitable for a wide range of target visual recognition tasks. This paper introduces and investigates several factors affecting the transferability of such representations. It includes parameters for training of the source ConvNet such as its architecture, distribution of the training data, etc. and also the parameters of feature extraction such as layer of the trained ConvNet, dimensionality reduction, etc. Then, by optimizing these factors, we show that significant improvements can be achieved on various (17) visual recognition tasks. We further show that these visual recognition tasks can be categorically ordered based on their similarity to the source task such that a correlation between the performance of tasks and their similarity to the source task w.r.t. the proposed factors is observed.
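
In code, the transfer pipeline studied here amounts to freezing a source-trained ConvNet, reading off activations at a chosen layer, and optionally compressing them before fitting a small target classifier. The sketch below is a hedged illustration assuming PyTorch/torchvision and scikit-learn are available; the choice of ResNet-18, the penultimate layer, and 128 PCA dimensions are arbitrary stand-ins rather than the paper's configuration, and the hypothetical `target_train_images` / `target_train_labels` are placeholders for whatever the target dataset provides:

```python
import torch
import torchvision.models as models
import torchvision.transforms as T
from sklearn.decomposition import PCA
from sklearn.svm import LinearSVC

# Source network trained on a large labelled dataset (ImageNet ResNet-18 here).
# Note: recent torchvision versions prefer `weights="IMAGENET1K_V1"` over `pretrained=True`.
backbone = models.resnet18(pretrained=True)
backbone.fc = torch.nn.Identity()          # expose the 512-d penultimate activations
backbone.eval()

preprocess = T.Compose([
    T.Resize(256), T.CenterCrop(224), T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def extract_features(pil_images):
    batch = torch.stack([preprocess(img) for img in pil_images])
    return backbone(batch).numpy()

# Target task with a small training set: compress the representation, then train a
# simple classifier (placeholder variables, shown for illustration only).
# feats = extract_features(target_train_images)
# feats_128 = PCA(n_components=128).fit_transform(feats)
# clf = LinearSVC().fit(feats_128, target_train_labels)
```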

Journal ArticleDOI
TL;DR: In this paper, a discriminant correlation analysis (DCA) is proposed for feature fusion by maximizing the pairwise correlations across the two feature sets and eliminating the between-class correlations and restricting the correlations to be within the classes.
Abstract: Information fusion is a key step in multimodal biometric systems. The fusion of information can occur at different levels of a recognition system, i.e., at the feature level, matching-score level, or decision level. However, feature level fusion is believed to be more effective owing to the fact that a feature set contains richer information about the input biometric data than the matching score or the output decision of a classifier. The goal of feature fusion for recognition is to combine relevant information from two or more feature vectors into a single one with more discriminative power than any of the input feature vectors. In pattern recognition problems, we are also interested in separating the classes. In this paper, we present discriminant correlation analysis (DCA), a feature level fusion technique that incorporates the class associations into the correlation analysis of the feature sets. DCA performs an effective feature fusion by maximizing the pairwise correlations across the two feature sets and, at the same time, eliminating the between-class correlations and restricting the correlations to be within the classes. Our proposed method can be used in pattern recognition applications for fusing the features extracted from multiple modalities or combining different feature vectors extracted from a single modality. It is noteworthy that DCA is the first technique that considers class structure in feature fusion. Moreover, it has a very low computational complexity and it can be employed in real-time applications. Multiple sets of experiments performed on various biometric databases and using different feature extraction techniques, show the effectiveness of our proposed method, which outperforms other state-of-the-art approaches.
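
A hedged NumPy sketch of the DCA recipe follows (our own simplified reading, with illustrative names): each feature set is first transformed so that its between-class scatter becomes the identity, the transformed sets are then aligned by an SVD of their cross-covariance so that paired dimensions are maximally correlated, and the resulting features are fused by concatenation. Rank selection, regularization and the summation variant of fusion are glossed over:

```python
import numpy as np

def _between_class_whitener(X, y, tol=1e-10):
    """X: (d, n) with samples as columns. Returns W such that W.T @ Sb @ W ~= I."""
    classes = np.unique(y)
    overall_mean = X.mean(axis=1, keepdims=True)
    # Columns of Phi are sqrt(n_c) * (class mean - overall mean), so Sb = Phi @ Phi.T.
    Phi = np.column_stack([np.sqrt((y == c).sum()) *
                           (X[:, y == c].mean(axis=1, keepdims=True) - overall_mean)
                           for c in classes])
    # Diagonalize the small c x c matrix Phi.T @ Phi instead of the d x d scatter.
    evals, Q = np.linalg.eigh(Phi.T @ Phi)
    keep = evals > tol * evals.max()
    return (Phi @ Q[:, keep]) / evals[keep]        # whitens Sb on the kept subspace

def dca_fuse(X, Y, y):
    """X: (n, p) and Y: (n, q) are two feature sets describing the same n samples."""
    y = np.asarray(y)
    Xc, Yc = X.T, Y.T                              # samples as columns, to match the math
    Xp = _between_class_whitener(Xc, y).T @ Xc     # between-class-whitened sets
    Yp = _between_class_whitener(Yc, y).T @ Yc
    r = min(Xp.shape[0], Yp.shape[0])
    Xp, Yp = Xp[:r], Yp[:r]
    U, s, Vt = np.linalg.svd(Xp @ Yp.T)            # cross-set covariance
    s = np.maximum(s, 1e-12)
    Wcx, Wcy = U / np.sqrt(s), Vt.T / np.sqrt(s)   # make paired correlations equal to one
    return np.hstack([(Wcx.T @ Xp).T, (Wcy.T @ Yp).T])   # fuse by concatenation

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    y = np.repeat(np.arange(5), 20)
    X = rng.normal(size=(100, 64)) + 3 * rng.normal(size=(5, 64))[y]   # two noisy views
    Y = rng.normal(size=(100, 32)) + 3 * rng.normal(size=(5, 32))[y]   # of the same classes
    print(dca_fuse(X, Y, y).shape)                 # (100, 2 * (#classes - 1)) here
```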

Journal ArticleDOI
TL;DR: A segmented SAE (S-SAE) is proposed by segmenting the original features into smaller data segments, which are separately processed by different smaller SAEs; this results in reduced complexity but improved efficacy of data abstraction and accuracy of data classification.

Posted Content
TL;DR: In this article, a new CNN+LSTM architecture for camera pose regression for indoor and outdoor scenes is proposed, which makes use of LSTM units on the CNN output, which play the role of a structured dimensionality reduction on the feature vector.
Abstract: In this work we propose a new CNN+LSTM architecture for camera pose regression for indoor and outdoor scenes. CNNs allow us to learn suitable feature representations for localization that are robust against motion blur and illumination changes. We make use of LSTM units on the CNN output, which play the role of a structured dimensionality reduction on the feature vector, leading to drastic improvements in localization performance. We provide extensive quantitative comparison of CNN-based and SIFT-based localization methods, showing the weaknesses and strengths of each. Furthermore, we present a new large-scale indoor dataset with accurate ground truth from a laser scanner. Experimental results on both indoor and outdoor public datasets show our method outperforms existing deep architectures, and can localize images in hard conditions, e.g., in the presence of mostly textureless surfaces, where classic SIFT-based methods fail.
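
The architecture can be sketched in PyTorch as below; this is our own hedged approximation (the backbone choice, feature size, number of LSTM chunks and output heads are illustrative, and the backbone is left randomly initialized rather than pretrained), intended only to show how LSTM units can act as a structured reduction of the CNN feature vector before pose regression:

```python
import torch
import torch.nn as nn
import torchvision.models as models

class CNNLSTMPoseNet(nn.Module):
    def __init__(self, feat_dim=2048, seq_len=32, hidden=256):
        super().__init__()
        backbone = models.resnet34()                               # load pretrained weights in practice
        self.cnn = nn.Sequential(*list(backbone.children())[:-1])  # globally pooled CNN features
        self.proj = nn.Linear(512, feat_dim)                       # image encoding
        self.seq_len = seq_len
        # LSTM over chunks of the encoding: a structured reduction of the feature vector.
        self.lstm = nn.LSTM(input_size=feat_dim // seq_len, hidden_size=hidden,
                            batch_first=True, bidirectional=True)
        self.fc_xyz = nn.Linear(2 * hidden, 3)                     # camera position
        self.fc_quat = nn.Linear(2 * hidden, 4)                    # camera orientation (quaternion)

    def forward(self, images):
        f = self.cnn(images).flatten(1)                  # (B, 512) pooled CNN features
        f = self.proj(f)                                 # (B, feat_dim)
        seq = f.view(f.size(0), self.seq_len, -1)        # (B, seq_len, feat_dim // seq_len)
        _, (h, _) = self.lstm(seq)
        h = torch.cat([h[-2], h[-1]], dim=1)             # final forward/backward hidden states
        return self.fc_xyz(h), self.fc_quat(h)

if __name__ == "__main__":
    xyz, quat = CNNLSTMPoseNet()(torch.randn(2, 3, 224, 224))
    print(xyz.shape, quat.shape)                         # torch.Size([2, 3]) torch.Size([2, 4])
```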

Journal ArticleDOI
TL;DR: This study proposes a fault-relevant variable selection and Bayesian inference-based distributed method for efficient fault detection and isolation, which reduces redundancy and complexity, explores numerous local behaviors, and provides accurate description of faults, thus improving monitoring performance significantly.
Abstract: Multivariate statistical process monitoring involves dimension reduction and latent feature extraction in large-scale processes and typically incorporates all measured variables. However, involving variables without beneficial information may degrade monitoring performance. This study analyzes the effect of variable selection on principal component analysis (PCA) monitoring performance. Then, it proposes a fault-relevant variable selection and Bayesian inference-based distributed method for efficient fault detection and isolation. First, the optimal subset of variables is identified for each fault using an optimization algorithm. Second, a sub-PCA model is established in each subset. Finally, the monitoring results of all of the subsets are combined through Bayesian inference. The proposed method reduces redundancy and complexity, explores numerous local behaviors, and provides accurate description of faults, thus improving monitoring performance significantly. Case studies on a numerical example, the Tennessee Eastman benchmark process, and an industrial-scale plant demonstrate the efficiency.

Posted Content
TL;DR: In this paper, a joint dimensionality reduction and k-means clustering approach is proposed, in which the deep neural network (DNN) is employed to jointly optimize the two tasks, while exploiting the DNN's ability to approximate any nonlinear function.
Abstract: Most learning approaches treat dimensionality reduction (DR) and clustering separately (i.e., sequentially), but recent research has shown that optimizing the two tasks jointly can substantially improve the performance of both. The premise behind the latter genre is that the data samples are obtained via linear transformation of latent representations that are easy to cluster; but in practice, the transformation from the latent space to the data can be more complicated. In this work, we assume that this transformation is an unknown and possibly nonlinear function. To recover the `clustering-friendly' latent representations and to better cluster the data, we propose a joint DR and K-means clustering approach in which DR is accomplished via learning a deep neural network (DNN). The motivation is to keep the advantages of jointly optimizing the two tasks, while exploiting the deep neural network's ability to approximate any nonlinear function. This way, the proposed approach can work well for a broad class of generative models. Towards this end, we carefully design the DNN structure and the associated joint optimization criterion, and propose an effective and scalable algorithm to handle the formulated optimization problem. Experiments using different real datasets are employed to showcase the effectiveness of the proposed approach.
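
A hedged PyTorch sketch of the idea follows (our own simplification, not the authors' algorithm or hyperparameters): an autoencoder supplies the nonlinear DR, a K-means-style penalty pulls latent codes toward their assigned centroids, and network weights, cluster assignments and centroids are updated alternately:

```python
import torch
import torch.nn as nn

class AE(nn.Module):
    def __init__(self, d_in, d_latent=10):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(d_in, 256), nn.ReLU(), nn.Linear(256, d_latent))
        self.dec = nn.Sequential(nn.Linear(d_latent, 256), nn.ReLU(), nn.Linear(256, d_in))

    def forward(self, x):
        z = self.enc(x)
        return z, self.dec(z)

def joint_dr_kmeans(X, k=5, d_latent=10, epochs=100, lam=0.1, lr=1e-3):
    model = AE(X.shape[1], d_latent)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    with torch.no_grad():
        z, _ = model(X)
    centroids = z[torch.randperm(len(X))[:k]].clone()       # initialize centroids from codes

    for _ in range(epochs):
        # (1) Update the network: reconstruction loss plus distance to assigned centroid.
        z, x_hat = model(X)
        assign = torch.cdist(z, centroids).argmin(dim=1)
        loss = ((x_hat - X) ** 2).mean() + lam * ((z - centroids[assign]) ** 2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()

        # (2) Update assignments and centroids with the network fixed (K-means step).
        with torch.no_grad():
            z, _ = model(X)
            assign = torch.cdist(z, centroids).argmin(dim=1)
            for j in range(k):
                if (assign == j).any():
                    centroids[j] = z[assign == j].mean(dim=0)
    return model, centroids, assign

if __name__ == "__main__":
    X = torch.randn(500, 50)
    model, centroids, assign = joint_dr_kmeans(X, k=4)
    print(centroids.shape, assign.shape)                    # torch.Size([4, 10]) torch.Size([500])
```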

Journal ArticleDOI
TL;DR: The experimental results show that unsupervised feature selection algorithms benefit machine learning tasks by improving the performance of clustering.

Journal ArticleDOI
TL;DR: Experimental results demonstrate that the bands selected by the enhanced FDPC approach could achieve higher classification accuracy than the FDPC and other state-of-the-art band selection techniques, and that the isolated-point-stopping criterion is a reasonable way to determine the preferable number of bands to be selected.
Abstract: Through imaging the same spatial area by hyperspectral sensors at different spectral wavelengths simultaneously, the acquired hyperspectral imagery often contains hundreds of band images, which provide the possibility to accurately analyze and identify a ground object. However, due to the difficulty of obtaining sufficient labeled training samples in practice, the high number of spectral bands unavoidably leads to the problem of a “dimensionality disaster” (also called the Hughes phenomenon), and dimensionality reduction should be applied. Concerning band (or feature) selection, conventional methods choose the representative bands by ranking the bands with defined metrics (such as non-Gaussianity) or by formulating the band selection problem as a clustering procedure. Because of the different but complementary advantages of the two kinds of methods, it can be beneficial to use both methods together to accomplish the band selection task. Recently, a fast density-peak-based clustering (FDPC) algorithm has been proposed. Based on the computation of the local density and the intracluster distance of each point, the product of the two factors is sorted in decreasing order, and cluster centers are recognized as points with anomalously large values; hence, the FDPC algorithm can be considered a ranking-based clustering method. In this paper, the FDPC algorithm has been enhanced to make it suitable for hyperspectral band selection. First, the ranking score of each band is computed by weighting the normalized local density and the intracluster distance rather than equally taking them into account. Second, an exponential-based learning rule is employed to adjust the cutoff threshold for a different number of selected bands, whereas it is fixed in the FDPC. The proposed approach is thus named the enhanced FDPC (E-FDPC). Furthermore, an effective strategy, which is called the isolated-point-stopping criterion, is developed to automatically determine the appropriate number of bands to be selected. That is, the clustering process will be stopped by the emergence of an isolated point (the only point in one cluster). Experimental results on three real hyperspectral data sets demonstrate that the bands selected by our E-FDPC approach could achieve higher classification accuracy than the FDPC and other state-of-the-art band selection techniques, and that the isolated-point-stopping criterion is a reasonable way to determine the preferable number of bands to be selected.
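
In the density-peak view described above, each band is a point whose local density and distance to denser points jointly determine its rank; the sketch below is a generic NumPy illustration with our own choice of Gaussian density, cutoff quantile and weighting exponent (the paper's exact weighting, exponential cutoff rule and isolated-point-stopping criterion are not reproduced):

```python
import numpy as np

def rank_bands_density_peaks(cube, n_select=20, cutoff_quantile=0.02, weight=2.0):
    """cube: (rows, cols, bands) hyperspectral image; returns indices of top-ranked bands."""
    n_bands = cube.shape[-1]
    B = cube.reshape(-1, n_bands).T                      # one point (row) per band
    sq = (B ** 2).sum(axis=1)
    D = np.sqrt(np.maximum(sq[:, None] + sq[None, :] - 2 * B @ B.T, 0.0))

    dc = np.quantile(D[D > 0], cutoff_quantile)          # cutoff distance
    rho = np.exp(-(D / dc) ** 2).sum(axis=1) - 1.0       # Gaussian local density

    delta = np.empty(n_bands)                            # distance to the nearest denser point
    for i in range(n_bands):
        denser = rho > rho[i]
        delta[i] = D[i, denser].min() if denser.any() else D[i].max()

    rho_n = (rho - rho.min()) / (rho.max() - rho.min() + 1e-12)
    delta_n = (delta - delta.min()) / (delta.max() - delta.min() + 1e-12)
    score = rho_n * delta_n ** weight                    # weighted, not equal, contributions
    return np.argsort(score)[::-1][:n_select]

if __name__ == "__main__":
    cube = np.random.default_rng(0).normal(size=(30, 30, 100))
    print(rank_bands_density_peaks(cube, n_select=10))
```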

Proceedings ArticleDOI
07 Mar 2016
TL;DR: This paper claims that hand-crafted histogram features can be complementary to Convolutional Neural Network features and proposes a novel feature extraction model called Feature Fusion Net (FFN) for pedestrian image representation.
Abstract: Feature representation and metric learning are two critical components in person re-identification models. In this paper, we focus on the feature representation and claim that hand-crafted histogram features can be complementary to Convolutional Neural Network (CNN) features. We propose a novel feature extraction model called Feature Fusion Net (FFN) for pedestrian image representation. In FFN, back propagation makes CNN features constrained by the handcrafted features. Utilizing color histogram features (RGB, HSV, YCbCr, Lab and YIQ) and texture features (multi-scale and multi-orientation Gabor features), we get a new deep feature representation that is more discriminative and compact. Experiments on three challenging datasets (VIPeR, CUHK01, PRID450s) validate the effectiveness of our proposal.

Journal ArticleDOI
TL;DR: Results demonstrate that pre-trained neural networks represent microstructure image data well, and when used for feature extraction yield the highest classification accuracies for the majority of classifier and feature selection methods tested, suggesting that deep learning algorithms can successfully be applied to micrograph recognition tasks.

Book
19 Dec 2016
TL;DR: In this paper, the authors provide innovative solutions to low-rank tensor network decompositions and easy to interpret graphical representations of the mathematical operations on tensor networks, and demonstrate the ability of tensor networks to provide linearly or even super-linearly (e.g., logarithmically) scalable solutions, as illustrated in detail in Part 2.
Abstract: Modern applications in engineering and data science are increasingly based on multidimensional data of exceedingly high volume, variety, and structural richness. However, standard machine learning algorithms typically scale exponentially with data volume and complexity of cross-modal couplings - the so-called curse of dimensionality - which is prohibitive to the analysis of large-scale, multi-modal and multi-relational datasets. Given that such data are often efficiently represented as multiway arrays or tensors, it is therefore timely and valuable for the multidisciplinary machine learning and data analytic communities to review low-rank tensor decompositions and tensor networks as emerging tools for dimensionality reduction and large scale optimization problems. Our particular emphasis is on elucidating that, by virtue of the underlying low-rank approximations, tensor networks have the ability to alleviate the curse of dimensionality in a number of applied areas. In Part 1 of this monograph we provide innovative solutions to low-rank tensor network decompositions and easy to interpret graphical representations of the mathematical operations on tensor networks. Such a conceptual insight allows for seamless migration of ideas from the flat-view matrices to tensor network operations and vice versa, and provides a platform for further developments, practical applications, and non-Euclidean extensions. It also permits the introduction of various tensor network operations without an explicit notion of mathematical expressions, which may be beneficial for many research communities that do not directly rely on multilinear algebra. Our focus is on the Tucker and tensor train (TT) decompositions and their extensions, and on demonstrating the ability of tensor networks to provide linearly or even super-linearly (e.g., logarithmically) scalable solutions, as illustrated in detail in Part 2 of this monograph.

Journal ArticleDOI
TL;DR: This paper introduces a dimension reduction framework which to some extent represents data as parts, has fast learning speed, and learns the between-class scatter subspace; experimental results show the efficacy of linear and non-linear ELM-AE and SELM-AE in terms of discriminative capability, sparsity, training time, and normalized mean square error.
Abstract: Data may often contain noise or irrelevant information, which negatively affects the generalization capability of machine learning algorithms. The objective of dimension reduction algorithms, such as principal component analysis (PCA), non-negative matrix factorization (NMF), random projection (RP), and auto-encoder (AE), is to reduce the noise or irrelevant information of the data. The features of PCA (eigenvectors) and linear AE are not able to represent data as parts (e.g., the nose in a face image). On the other hand, NMF and non-linear AE are maimed by slow learning speed, and RP only represents a subspace of the original data. This paper introduces a dimension reduction framework which to some extent represents data as parts, has fast learning speed, and learns the between-class scatter subspace. To this end, this paper investigates a linear and non-linear dimension reduction framework referred to as extreme learning machine AE (ELM-AE) and sparse ELM-AE (SELM-AE). In contrast to tied-weight AE, the hidden neurons in ELM-AE and SELM-AE need not be tuned, and their parameters (e.g., input weights in additive neurons) are initialized using orthogonal and sparse random weights, respectively. Experimental results on the USPS handwritten digit recognition data set, the CIFAR-10 object recognition data set, and the NORB object recognition data set show the efficacy of linear and non-linear ELM-AE and SELM-AE in terms of discriminative capability, sparsity, training time, and normalized mean square error.
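
A minimal NumPy sketch of the ELM-AE reduction described above (illustrative names; the sparse SELM-AE variant and the reported experiments are not reproduced): random orthogonal input weights produce the hidden activations, the output weights beta are computed analytically to reconstruct the input, and beta-transpose is reused as the learned projection:

```python
import numpy as np

def elm_autoencoder(X, n_hidden=50, reg=1e-3, seed=0):
    """Returns (projection matrix of shape (d, n_hidden), reduced data).
    Assumes n_hidden <= number of input features."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.normal(size=(d, n_hidden))
    W, _ = np.linalg.qr(W)                      # orthogonal random input weights
    b = rng.normal(size=n_hidden)
    H = np.tanh(X @ W + b)                      # random hidden-layer activations
    # Output weights: solve min ||H beta - X||^2 + reg ||beta||^2 analytically.
    beta = np.linalg.solve(H.T @ H + reg * np.eye(n_hidden), H.T @ X)
    X_reduced = X @ beta.T                      # use beta^T as the learned projection
    return beta.T, X_reduced

if __name__ == "__main__":
    X = np.random.default_rng(1).normal(size=(400, 200))
    P, X_reduced = elm_autoencoder(X, n_hidden=30)
    print(P.shape, X_reduced.shape)             # (200, 30) and (400, 30)
```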

Journal ArticleDOI
TL;DR: Experimental results on well-known benchmark datasets with various classifiers indicate that IGFSS improves the performance of classification in terms of two widely-known metrics, namely Micro-F1 and Macro-F1.
Abstract: An improved global feature selection scheme is proposed for text classification. It is an ensemble method combining the power of two filter-based methods. The new method combines a global and a one-sided local feature selection method. By incorporating these methods, the feature set represents classes almost equally. This method outperforms the individual performances of feature selection methods. Feature selection is known as a good solution to the high dimensionality of the feature space, and mostly preferred feature selection methods for text classification are filter-based ones. In a common filter-based feature selection scheme, unique scores are assigned to features depending on their discriminative power and these features are sorted in descending order according to the scores. Then, the last step is to add the top-N features to the feature set, where N is generally an empirically determined number. In this paper, an improved global feature selection scheme (IGFSS), where the last step in a common feature selection scheme is modified in order to obtain a more representative feature set, is proposed. Although a feature set constructed by a common feature selection scheme successfully represents some of the classes, a number of classes may not even be represented. Consequently, IGFSS aims to improve the classification performance of global feature selection methods by creating a feature set representing all classes almost equally. For this purpose, a local feature selection method is used in IGFSS to label features according to their discriminative power on classes, and these labels are used while producing the feature sets. Experimental results on well-known benchmark datasets with various classifiers indicate that IGFSS improves the performance of classification in terms of two widely-known metrics, namely Micro-F1 and Macro-F1.
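
The scheme lends itself to a short sketch: compute a global filter score for every feature, give each feature a single class label with a one-sided local measure, and then fill the final set so that every class contributes a roughly equal share of its top-scoring features. The code below is a hedged approximation assuming scikit-learn's chi2 as the global method and a simple class-mean heuristic standing in for the paper's one-sided local metric; it returns approximately n_features indices, balanced across classes:

```python
import numpy as np
from sklearn.feature_selection import chi2

def igfss_like_select(X, y, n_features):
    """X: (n_docs, n_terms) non-negative term-frequency matrix, y: class labels."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    classes = np.unique(y)

    global_score, _ = chi2(X, y)                          # global filter scores
    class_means = np.stack([X[y == c].mean(axis=0) for c in classes])
    feature_label = class_means.argmax(axis=0)            # class each feature favors

    per_class = int(np.ceil(n_features / len(classes)))   # equal share per class
    selected = []
    for ci in range(len(classes)):
        idx = np.where(feature_label == ci)[0]
        best = idx[np.argsort(global_score[idx])[::-1][:per_class]]
        selected.extend(best.tolist())
    return np.array(sorted(selected))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.poisson(1.0, size=(200, 500)).astype(float)   # toy term-frequency matrix
    y = rng.integers(0, 4, size=200)
    print(igfss_like_select(X, y, n_features=40).shape)
```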

Journal ArticleDOI
TL;DR: This paper proposes hybrid feature selection approaches based on the Genetic Algorithm, combining the advantages of filter feature selection methods with an enhanced GA (EGA) in a wrapper approach to handle the high dimensionality of the feature space and improve categorization performance simultaneously.
Abstract: An enhanced genetic algorithm (EGA) is proposed to reduce text dimensionality. The proposed EGA outperformed the traditional genetic algorithm. The EGA is incorporated with six filter feature selection methods to create hybrid feature selection approaches. The proposed hybrid approaches outperformed the single filtering methods. This paper proposes hybrid feature selection approaches based on the Genetic Algorithm (GA). This approach uses a hybrid search technique that combines the advantages of filter feature selection methods with an enhanced GA (EGA) in a wrapper approach to handle the high dimensionality of the feature space and improve categorization performance simultaneously. First, we propose EGA by improving the crossover and mutation operators. The crossover operation is performed based on chromosome (feature subset) partitioning with term and document frequencies of chromosome entries (features), while the mutation is performed based on the classifier performance of the original parents and feature importance. Thus, the crossover and mutation operations are performed based on useful information instead of using probability and random selection. Second, we incorporate six well-known filter feature selection methods with the EGA to create hybrid feature selection approaches. In the hybrid approach, the EGA is applied to several feature subsets of different sizes, which are ranked in decreasing order based on their importance, and dimension reduction is carried out. The EGA operations are applied to the most important features that had the higher ranks. The effectiveness of the proposed approach is evaluated by using naive Bayes and associative classification on three different collections of Arabic text datasets. The experimental results show the superiority of EGA over GA; comparisons of GA with EGA showed that the latter achieved better results in terms of dimensionality reduction, time and categorization performance. Furthermore, six proposed hybrid FS approaches consisting of a filter method and the EGA are applied to various feature subsets. The results showed that these hybrid approaches are more effective than single filter methods for dimensionality reduction because they were able to produce a higher reduction rate without loss of categorization precision in most situations.

Journal ArticleDOI
TL;DR: Comprehensive experimental results on both the synthetic and real-world data demonstrate significant advantages of the proposed CIFE method in comparison with the state-of-the-art.
Abstract: Real-world data are often acquired as a collection of matrices rather than as a single matrix. Such multiblock data are naturally linked and typically share some common features while at the same time exhibiting their own individual features, reflecting the underlying data generation mechanisms. To exploit the linked nature of data, we propose a new framework for common and individual feature extraction (CIFE) which identifies and separates the common and individual features from the multiblock data. Two efficient algorithms, termed common orthogonal basis extraction (COBE), are proposed to extract the common basis shared by all data, independent of whether the number of common components is known beforehand. Feature extraction is then performed on the common and individual subspaces separately, by incorporating dimensionality reduction and blind source separation techniques. Comprehensive experimental results on both the synthetic and real-world data demonstrate significant advantages of the proposed CIFE method in comparison with the state-of-the-art.

Journal ArticleDOI
TL;DR: A novel descriptor that reveals the context of HSI efficiently; a dual clustering method that includes the contextual information in the clustering process; and a new strategy that selects the cluster representatives jointly considering the mutual effects of each cluster are proposed.
Abstract: Hyperspectral image (HSI) involves vast quantities of information that can help with the image analysis. However, this information has sometimes been proved to be redundant, considering specific applications such as HSI classification and anomaly detection. To address this problem, hyperspectral band selection is viewed as an effective dimensionality reduction method that can remove the redundant components of HSI. Various HSI band selection methods have been proposed recently, and the clustering-based method is a traditional one. This agglomerative method has been considered simple and straightforward, while its performance is generally inferior to the state of the art. To tackle the inherent drawbacks of the clustering-based band selection method, a new framework based on dual clustering is proposed in this paper. The main contributions can be summarized as follows: 1) a novel descriptor that reveals the context of HSI efficiently; 2) a dual clustering method that includes the contextual information in the clustering process; 3) a new strategy that selects the cluster representatives jointly considering the mutual effects of each cluster. Experimental results on three real-world HSIs verify the noticeable accuracy of the proposed method with regard to the HSI classification application. The main comparison has been conducted among several recent clustering-based band selection methods and constraint-based band selection methods, demonstrating the superiority of the technique that we present.