
Showing papers on "Dimensionality reduction published in 2018"


Posted Content
TL;DR: The UMAP algorithm is competitive with t-SNE for visualization quality, and arguably preserves more of the global structure with superior run time performance.
Abstract: UMAP (Uniform Manifold Approximation and Projection) is a novel manifold learning technique for dimension reduction. UMAP is constructed from a theoretical framework based in Riemannian geometry and algebraic topology. The result is a practical scalable algorithm that applies to real world data. The UMAP algorithm is competitive with t-SNE for visualization quality, and arguably preserves more of the global structure with superior run time performance. Furthermore, UMAP has no computational restrictions on embedding dimension, making it viable as a general purpose dimension reduction technique for machine learning.
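The algorithm is available as the open-source umap-learn package; a minimal usage sketch (assuming umap-learn and scikit-learn are installed, and using the digits dataset purely as placeholder data) might look like this:

```python
# Minimal sketch of using the UMAP reference implementation (umap-learn).
# Assumes `pip install umap-learn scikit-learn`; the dataset choice is illustrative.
import umap
from sklearn.datasets import load_digits

X, y = load_digits(return_X_y=True)          # 1797 samples, 64 features

reducer = umap.UMAP(
    n_neighbors=15,    # size of the local neighbourhood used to build the fuzzy graph
    min_dist=0.1,      # how tightly points may be packed in the embedding
    n_components=2,    # embedding dimension; UMAP is not limited to 2 or 3
    random_state=42,
)
embedding = reducer.fit_transform(X)          # shape (1797, 2)
print(embedding.shape)
```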

5,390 citations


Book
27 Sep 2018
TL;DR: A broad range of illustrations is embedded throughout, including classical and modern results for covariance estimation, clustering, networks, semidefinite programming, coding, dimension reduction, matrix completion, machine learning, compressed sensing, and sparse regression.
Abstract: High-dimensional probability offers insight into the behavior of random vectors, random matrices, random subspaces, and objects used to quantify uncertainty in high dimensions. Drawing on ideas from probability, analysis, and geometry, it lends itself to applications in mathematics, statistics, theoretical computer science, signal processing, optimization, and more. It is the first to integrate theory, key tools, and modern applications of high-dimensional probability. Concentration inequalities form the core, and it covers both classical results such as Hoeffding's and Chernoff's inequalities and modern developments such as the matrix Bernstein inequality. It then introduces the powerful methods based on stochastic processes, including such tools as Slepian's, Sudakov's, and Dudley's inequalities, as well as generic chaining and bounds based on VC dimension. A broad range of illustrations is embedded throughout, including classical and modern results for covariance estimation, clustering, networks, semidefinite programming, coding, dimension reduction, matrix completion, machine learning, compressed sensing, and sparse regression.
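For concreteness, Hoeffding's inequality, one of the classical concentration results named above, bounds the deviation of a sum of bounded independent random variables from its mean:

```latex
% Hoeffding's inequality for independent X_i with a_i <= X_i <= b_i
\Pr\!\left( \left| \sum_{i=1}^{n} \bigl(X_i - \mathbb{E}X_i\bigr) \right| \ge t \right)
\;\le\; 2 \exp\!\left( - \frac{2 t^{2}}{\sum_{i=1}^{n} (b_i - a_i)^{2}} \right).
```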

1,190 citations


Proceedings Article
15 Feb 2018
TL;DR: A Deep Autoencoding Gaussian Mixture Model (DAGMM) for unsupervised anomaly detection, which significantly outperforms state-of-the-art anomaly detection techniques, and achieves up to 14% improvement based on the standard F1 score.
Abstract: Unsupervised anomaly detection on multi- or high-dimensional data is of great importance in both fundamental machine learning research and industrial applications, for which density estimation lies at the core. Although previous approaches based on dimensionality reduction followed by density estimation have made fruitful progress, they mainly suffer from decoupled model learning with inconsistent optimization goals and the incapability of preserving essential information in the low-dimensional space. In this paper, we present a Deep Autoencoding Gaussian Mixture Model (DAGMM) for unsupervised anomaly detection. Our model utilizes a deep autoencoder to generate a low-dimensional representation and reconstruction error for each input data point, which is further fed into a Gaussian Mixture Model (GMM). Instead of using decoupled two-stage training and the standard Expectation-Maximization (EM) algorithm, DAGMM jointly optimizes the parameters of the deep autoencoder and the mixture model simultaneously in an end-to-end fashion, leveraging a separate estimation network to facilitate the parameter learning of the mixture model. The joint optimization, which well balances autoencoding reconstruction, density estimation of the latent representation, and regularization, helps the autoencoder escape from less attractive local optima and further reduce reconstruction errors, avoiding the need for pre-training. Experimental results on several public benchmark datasets show that DAGMM significantly outperforms state-of-the-art anomaly detection techniques, and achieves up to 14% improvement based on the standard F1 score.
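To illustrate the inputs DAGMM feeds to its density model (latent code plus reconstruction error), here is a deliberately simplified, decoupled sketch of the "compress, then fit a density" pipeline that the paper improves on by training jointly: PCA stands in for the deep autoencoder and scikit-learn's GaussianMixture stands in for the estimation network, with synthetic placeholder data.

```python
# Illustration only: the decoupled two-stage pipeline DAGMM replaces with joint training.
# PCA is a stand-in for the deep autoencoder; GaussianMixture for the estimation network.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))                    # placeholder "normal" data

pca = PCA(n_components=3).fit(X)
z = pca.transform(X)                               # low-dimensional representation
recon_err = np.linalg.norm(X - pca.inverse_transform(z), axis=1, keepdims=True)
features = np.hstack([z, recon_err])               # latent code + reconstruction error

gmm = GaussianMixture(n_components=4, random_state=0).fit(features)
energy = -gmm.score_samples(features)              # high energy = likely anomaly
threshold = np.percentile(energy, 95)
print("flagged anomalies:", int((energy > threshold).sum()))
```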

981 citations


Journal ArticleDOI
TL;DR: iFeature is a versatile Python‐based toolkit for generating various numerical feature representation schemes for both protein and peptide sequences, capable of calculating and extracting a comprehensive spectrum of 18 major sequence encoding schemes that encompass 53 different types of feature descriptors.
Abstract: Structural and physiochemical descriptors extracted from sequence data have been widely used to represent sequences and predict structural, functional, expression and interaction profiles of proteins and peptides as well as DNAs/RNAs. Here, we present iFeature, a versatile Python-based toolkit for generating various numerical feature representation schemes for both protein and peptide sequences. iFeature is capable of calculating and extracting a comprehensive spectrum of 18 major sequence encoding schemes that encompass 53 different types of feature descriptors. It also allows users to extract specific amino acid properties from the AAindex database. Furthermore, iFeature integrates 12 different types of commonly used feature clustering, selection and dimensionality reduction algorithms, greatly facilitating training, analysis and benchmarking of machine-learning models. The functionality of iFeature is made freely available via an online web server and a stand-alone toolkit. Availability and implementation: http://iFeature.erc.monash.edu/; https://github.com/Superzchen/iFeature/. Supplementary information: Supplementary data are available at Bioinformatics online.
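As a flavour of the kind of descriptor such a toolkit computes (this is not iFeature's own code), the amino acid composition (AAC) encoding is simply the frequency of each of the 20 standard residues in a sequence; a plain-Python sketch with a toy sequence:

```python
# Hand-rolled amino acid composition (AAC) descriptor, shown only to illustrate one of
# the many encoding schemes a toolkit like iFeature provides; not iFeature's own code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def aac(sequence: str) -> list[float]:
    """Return the 20-dimensional amino acid composition of a protein sequence."""
    seq = sequence.upper()
    total = len(seq)
    return [seq.count(aa) / total for aa in AMINO_ACIDS]

print(aac("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"))   # toy sequence
```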

411 citations


Journal ArticleDOI
TL;DR: Topological data analysis (TDA) is broadly described as a collection of data analysis methods that find structure in data; the methods reviewed include clustering, manifold estimation, nonlinear dimension reduction, mode estimation, ridge estimation and persistent homology.
Abstract: Topological data analysis (TDA) can broadly be described as a collection of data analysis methods that find structure in data. These methods include clustering, manifold estimation, nonlinear dimension reduction, mode estimation, ridge estimation and persistent homology. This paper reviews some of these methods.

353 citations


Journal ArticleDOI
TL;DR: Deep neural networks (DNN) are used to construct surrogate models for numerical simulators in a manner that lends the DNN surrogate the interpretation of recovering a low-dimensional nonlinear manifold.

340 citations


Book ChapterDOI
03 Sep 2018
TL;DR: Feature selection, one of the most widely used dimensionality reduction techniques among practitioners, is broadly categorized into four models: the filter model, wrapper model, embedded model, and hybrid model.
Abstract: Dimensionality reduction techniques can be categorized mainly into feature extraction and feature selection. In the feature extraction approach, features are projected into a new space with lower dimensionality. Feature selection is broadly categorized into four models: filter model, wrapper model, embedded model, and hybrid model. With the existence of a large number of features, learning models tend to overfit and their learning performance degenerates. Feature selection is one of the most used techniques to reduce dimensionality among practitioners. The existence of irrelevant features in the data set may degrade learning quality and consume more memory and computational time that could be saved if these features were removed. However, finding clusters in high-dimensional space is computationally expensive and may degrade the learning performance. Clustering is useful in several machine learning and data mining tasks including image segmentation, information retrieval, pattern recognition, pattern classification, and network analysis.
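To make the filter/wrapper distinction concrete, the sketch below (scikit-learn, with the iris data purely as a placeholder) scores features independently with mutual information (a filter model) and, for contrast, searches feature subsets by repeatedly refitting a classifier (a wrapper model); it is a generic illustration, not code from this chapter.

```python
# Filter vs. wrapper feature selection in scikit-learn -- a minimal generic sketch.
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, mutual_info_classif, RFE
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# Filter model: rank features by a statistic computed independently of any learner.
filt = SelectKBest(score_func=mutual_info_classif, k=2).fit(X, y)
print("filter scores:", filt.scores_)

# Wrapper model: repeatedly fit a learner and drop the weakest features.
wrapper = RFE(LogisticRegression(max_iter=1000), n_features_to_select=2).fit(X, y)
print("wrapper keeps:", wrapper.support_)
```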

326 citations


Journal ArticleDOI
TL;DR: It is shown that the time-lagged autoencoder reliably finds low-dimensional embeddings for high-dimensional feature spaces which capture the slow dynamics of the underlying stochastic processes-beyond the capabilities of linear dimension reduction techniques.
Abstract: Inspired by the success of deep learning techniques in the physical and chemical sciences, we apply a modification of an autoencoder type deep neural network to the task of dimension reduction of molecular dynamics data. We can show that our time-lagged autoencoder reliably finds low-dimensional embeddings for high-dimensional feature spaces which capture the slow dynamics of the underlying stochastic processes—beyond the capabilities of linear dimension reduction techniques.
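A minimal PyTorch sketch of the time-lagged autoencoder idea follows: encode the frame at time t through a low-dimensional bottleneck and decode towards the frame at time t+τ. Layer sizes, the lag, and the random toy trajectory are illustrative placeholders, not the authors' settings.

```python
# Time-lagged autoencoder sketch in PyTorch: encode x(t), decode towards x(t + tau).
# Layer sizes, lag and the random toy trajectory are placeholders.
import torch
from torch import nn

torch.manual_seed(0)
traj = torch.randn(5000, 30)                 # toy "trajectory": 5000 frames, 30 features
tau = 10
x_t, x_lag = traj[:-tau], traj[tau:]

model = nn.Sequential(
    nn.Linear(30, 64), nn.ELU(),
    nn.Linear(64, 2),                         # low-dimensional embedding (bottleneck)
    nn.Linear(2, 64), nn.ELU(),
    nn.Linear(64, 30),                        # decoder predicts the time-lagged frame
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for epoch in range(20):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(x_t), x_lag)   # reconstruct the future frame
    loss.backward()
    opt.step()
print("final loss:", float(loss))
```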

295 citations


Journal ArticleDOI
TL;DR: The proposed STL-IDS approach improves network intrusion detection and provides a new research method for intrusion detection, and has accelerated SVM training and testing times and performed better than most of the previous approaches in terms of performance metrics in binary and multiclass classification.
Abstract: Network intrusion detection systems (NIDSs) provide a better solution to network security than other traditional network defense technologies, such as firewall systems. The success of NIDS is highly dependent on the performance of the algorithms and improvement methods used to increase the classification accuracy and decrease the training and testing times of the algorithms. We propose an effective deep learning approach, self-taught learning (STL)-IDS, based on the STL framework. The proposed approach is used for feature learning and dimensionality reduction. It reduces training and testing time considerably and effectively improves the prediction accuracy of support vector machines (SVM) with regard to attacks. The proposed model is built using the sparse autoencoder mechanism, which is an effective learning algorithm for reconstructing a new feature representation in an unsupervised manner. After the pre-training stage, the new features are fed into the SVM algorithm to improve its detection capability for intrusion and classification accuracy. Moreover, the efficiency of the approach in binary and multiclass classification is studied and compared with that of shallow classification methods, such as J48, naive Bayesian, random forest, and SVM. Results show that our approach has accelerated SVM training and testing times and performed better than most of the previous approaches in terms of performance metrics in binary and multiclass classification. The proposed STL-IDS approach improves network intrusion detection and provides a new research method for intrusion detection.
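The overall shape of the pipeline (unsupervised feature learning for dimensionality reduction, followed by an SVM on the learned features) can be sketched as below; PCA is used as a plainly named stand-in for the paper's sparse autoencoder, and synthetic data replaces the intrusion detection benchmark.

```python
# Sketch of a "learn compact features, then classify with an SVM" pipeline.
# PCA is a stand-in for the sparse autoencoder used by STL-IDS; data is synthetic.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 120))                      # placeholder "traffic features"
y = rng.integers(0, 2, size=2000)                     # placeholder attack/normal labels
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = make_pipeline(StandardScaler(), PCA(n_components=20), LinearSVC())
clf.fit(X_tr, y_tr)
print("accuracy on held-out data:", clf.score(X_te, y_te))
```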

291 citations


Journal ArticleDOI
01 Feb 2018
TL;DR: This paper proposes to use a very recent PSO variant, known as the competitive swarm optimizer (CSO) and dedicated to large-scale optimization, for solving high-dimensional feature selection problems, and demonstrates that compared to the canonical PSO-based and a state-of-the-art PSO-variant feature selection methods, the proposed CSO-based feature selection algorithm not only selects a much smaller number of features but also results in better classification performance.
Abstract: When solving many machine learning problems such as classification, there exists a large number of input features. However, not all features are relevant for solving the problem, and sometimes including irrelevant features may deteriorate the learning performance. Therefore, it is essential to select the most relevant features, which is known as feature selection. Many feature selection algorithms have been developed, including evolutionary algorithms or particle swarm optimization (PSO) algorithms, to find a subset of the most important features for accomplishing a particular machine learning task. However, the traditional PSO does not perform well for large-scale optimization problems, which degrades the effectiveness of PSO for feature selection when the number of features dramatically increases. In this paper, we propose to use a very recent PSO variant, known as the competitive swarm optimizer (CSO), which was dedicated to large-scale optimization, for solving high-dimensional feature selection problems. In addition, the CSO, which was originally developed for continuous optimization, is adapted to perform feature selection, which can be considered a combinatorial optimization problem. An archive technique is also introduced to reduce computational cost. Experiments on six benchmark datasets demonstrate that, compared to the canonical PSO-based and a state-of-the-art PSO variant for feature selection, the proposed CSO-based feature selection algorithm not only selects a much smaller number of features but also results in better classification performance.
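A rough sketch of CSO-style feature selection is given below: particles are real vectors in [0, 1]^d, a feature counts as selected when its coordinate exceeds a threshold, and in every generation randomly paired particles compete, with the loser learning from the winner and from the swarm mean. The fitness function, threshold, swarm size and the social factor phi are illustrative choices, not the paper's settings, and the archive technique is omitted.

```python
# Rough sketch of competitive-swarm-optimizer (CSO) style feature selection.
# Parameter choices, fitness and the selection threshold are illustrative only.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_breast_cancer(return_X_y=True)
d = X.shape[1]
rng = np.random.default_rng(0)

def fitness(position, threshold=0.5):
    mask = position > threshold
    if not mask.any():
        return 1.0                                    # empty subset: worst error
    err = 1.0 - cross_val_score(KNeighborsClassifier(), X[:, mask], y, cv=3).mean()
    return err + 0.01 * mask.mean()                   # small penalty on subset size

m, phi, iters = 20, 0.1, 30                           # swarm size must be even
pos = rng.random((m, d))
vel = np.zeros((m, d))

for _ in range(iters):
    fit = np.array([fitness(p) for p in pos])
    mean_pos = pos.mean(axis=0)
    order = rng.permutation(m)
    for a, b in zip(order[0::2], order[1::2]):        # random pairwise competitions
        w, l = (a, b) if fit[a] < fit[b] else (b, a)  # lower error wins
        r1, r2, r3 = rng.random((3, d))
        vel[l] = r1 * vel[l] + r2 * (pos[w] - pos[l]) + phi * r3 * (mean_pos - pos[l])
        pos[l] = np.clip(pos[l] + vel[l], 0.0, 1.0)

best = pos[np.argmin([fitness(p) for p in pos])]
print("selected features:", int((best > 0.5).sum()), "of", d)
```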

273 citations


Journal ArticleDOI
TL;DR: A robust statistical model, scvis, is presented to capture and visualize the low-dimensional structures in single-cell gene expression data and preserves both the local and global neighbourhood structures in the data thus enhancing its interpretability.
Abstract: Single-cell RNA-sequencing has great potential to discover cell types, identify cell states, trace development lineages, and reconstruct the spatial organization of cells. However, dimension reduction to interpret structure in single-cell sequencing data remains a challenge. Existing algorithms are either not able to uncover the clustering structures in the data or lose global information such as groups of clusters that are close to each other. We present a robust statistical model, scvis, to capture and visualize the low-dimensional structures in single-cell gene expression data. Simulation results demonstrate that low-dimensional representations learned by scvis preserve both the local and global neighbor structures in the data. In addition, scvis is robust to the number of data points and learns a probabilistic parametric mapping function to add new data points to an existing embedding. We then use scvis to analyze four single-cell RNA-sequencing datasets, exemplifying interpretable two-dimensional representations of the high-dimensional single-cell RNA-sequencing data.

Journal ArticleDOI
TL;DR: In this article, the authors provide a magazine-style overview of the entire field of robust subspace learning (RSL) and tracking (RST) for long data sequences, assuming that the data lie in a low-dimensional subspace that can change over time, albeit gradually.
Abstract: Principal component analysis (PCA) is one of the most widely used dimension reduction techniques. A related easier problem is termed subspace learning or subspace estimation. Given relatively clean data, both are easily solved via singular value decomposition (SVD). The problem of subspace learning or PCA in the presence of outliers is called robust subspace learning (RSL) or robust PCA (RPCA). For long data sequences, if one tries to use a single lower-dimensional subspace to represent the data, the required subspace dimension may end up being quite large. For such data, a better model is to assume that it lies in a low-dimensional subspace that can change over time, albeit gradually. The problem of tracking such data (and the subspaces) while being robust to outliers is called robust subspace tracking (RST). This article provides a magazine-style overview of the entire field of RSL and tracking.

Journal ArticleDOI
TL;DR: The results show that the proposed criterion outperforms MIFS in both single-objective and multi-objective DE frameworks, and indicate that considering feature selection as a multi-objective problem can generally provide better performance in terms of the feature subset size and the classification accuracy.
Abstract: Feature selection is an essential step in various tasks, where filter feature selection algorithms are increasingly attractive due to their simplicity and fast speed. A common filter is to use mutual information to estimate the relationships between each feature and the class labels (mutual relevancy), and between each pair of features (mutual redundancy). This strategy has gained popularity, resulting in a variety of criteria based on mutual information. Other well-known strategies are to order each feature based on the nearest neighbor distance as in ReliefF, and based on the between-class variance and the within-class variance as in Fisher Score. However, each strategy comes with its own advantages and disadvantages. This paper proposes a new filter criterion inspired by the concepts of mutual information, ReliefF and Fisher Score. Instead of using mutual redundancy, the proposed criterion tries to choose the highest ranked features determined by ReliefF and Fisher Score while providing the mutual relevance between features and the class labels. Based on the proposed criterion, two new differential evolution (DE) based filter approaches are developed. While the former uses the proposed criterion as a single objective problem in a weighted manner, the latter considers the proposed criterion in a multi-objective design. Moreover, a well-known mutual information feature selection approach (MIFS) based on maximum-relevance and minimum-redundancy is also adopted in single-objective and multi-objective DE algorithms for feature selection. The results show that the proposed criterion outperforms MIFS in both single-objective and multi-objective DE frameworks. The results also indicate that considering feature selection as a multi-objective problem can generally provide better performance in terms of the feature subset size and the classification accuracy.
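For reference, the Fisher Score mentioned above ranks each feature by the ratio of between-class to within-class variance; a small NumPy sketch of that computation follows (it only illustrates the filter criterion, not the paper's DE-based approaches; the wine dataset is a placeholder).

```python
# Fisher Score per feature: between-class variance over within-class variance.
import numpy as np
from sklearn.datasets import load_wine

X, y = load_wine(return_X_y=True)

def fisher_score(X, y):
    overall_mean = X.mean(axis=0)
    num = np.zeros(X.shape[1])
    den = np.zeros(X.shape[1])
    for c in np.unique(y):
        Xc = X[y == c]
        num += len(Xc) * (Xc.mean(axis=0) - overall_mean) ** 2   # between-class part
        den += len(Xc) * Xc.var(axis=0)                          # within-class part
    return num / den

scores = fisher_score(X, y)
print("top-5 features by Fisher Score:", np.argsort(scores)[::-1][:5])
```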

Journal ArticleDOI
TL;DR: This work presents a comprehensive overview of various feature selection methods and their inherent pros and cons, and analyzes adaptive classification systems and parallel classification systems for chronic disease prediction.

Proceedings Article
27 Apr 2018
TL;DR: A multi-step framework of linear transformations that generalizes a substantial body of previous work is proposed; it allows new insights into the behavior of existing methods, including the effectiveness of inverse regression, and leads to a novel variant that obtains the best published results in zero-shot bilingual lexicon extraction.
Abstract: Using a dictionary to map independently trained word embeddings to a shared space has shown to be an effective approach to learn bilingual word embeddings. In this work, we propose a multi-step framework of linear transformations that generalizes a substantial body of previous work. The core step of the framework is an orthogonal transformation, and existing methods can be explained in terms of the additional normalization, whitening, re-weighting, de-whitening and dimensionality reduction steps. This allows us to gain new insights into the behavior of existing methods, including the effectiveness of inverse regression, and design a novel variant that obtains the best published results in zero-shot bilingual lexicon extraction. The corresponding software is released as an open source project.
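The core orthogonal-transformation step of such frameworks has a closed-form solution (orthogonal Procrustes): given dictionary-aligned source and target embedding matrices, the optimal rotation comes from an SVD. A NumPy sketch with random stand-in embeddings; the surrounding normalization, whitening, re-weighting and dimensionality reduction steps of the full framework are omitted.

```python
# Orthogonal Procrustes step at the core of dictionary-based embedding mapping:
# W = U V^T where U S V^T = SVD(X^T Y). Embeddings here are random stand-ins.
import numpy as np

rng = np.random.default_rng(0)
n_pairs, dim = 5000, 300
X = rng.normal(size=(n_pairs, dim))          # source-language vectors of dictionary pairs
true_rot, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
Y = X @ true_rot                             # target-language vectors (toy: exact rotation)

U, _, Vt = np.linalg.svd(X.T @ Y)
W = U @ Vt                                   # orthogonal map minimizing ||XW - Y||_F
print("recovery error:", np.linalg.norm(X @ W - Y))
```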

Journal ArticleDOI
TL;DR: A feature selection approach is proposed based on a new multi-objective artificial bee colony algorithm integrated with non-dominated sorting procedure and genetic operators that outperformed the other methods in terms of both the dimensionality reduction and the classification accuracy.

Journal ArticleDOI
TL;DR: Wang et al. propose a feature learning framework for spectral-spatial feature representation and classification of hyperspectral images, which learns a latent low-dimensional subspace by projecting the spectral and spatial features into a common feature space, where the complementary information is effectively exploited.
Abstract: In hyperspectral remote sensing data mining, it is important to take into account both spectral and spatial information, such as the spectral signature, texture feature, and morphological property, to improve performance, e.g., the image classification accuracy. From a feature representation point of view, a natural approach to handle this situation is to concatenate the spectral and spatial features into a single but high-dimensional vector and then apply a certain dimension reduction technique directly to that concatenated vector before feeding it into the subsequent classifier. However, multiple features from various domains have different physical meanings and statistical properties, and thus such concatenation does not efficiently explore the complementary properties among different features, which should help boost feature discriminability. Furthermore, it is also difficult to interpret the transformed results of the concatenated vector. Consequently, finding a physically meaningful consensus low-dimensional feature representation of the original multiple features is still a challenging task. In order to address these issues, we propose a novel feature learning framework, i.e., the simultaneous spectral-spatial feature selection and extraction algorithm, for spectral-spatial feature representation and classification of hyperspectral images. Specifically, the proposed method learns a latent low-dimensional subspace by projecting the spectral-spatial features into a common feature space, where the complementary information has been effectively exploited, and simultaneously, only the most significant original features have been transformed. Encouraging experimental results on three publicly available hyperspectral remote sensing datasets confirm that our proposed method is effective and efficient.

Journal ArticleDOI
TL;DR: Given that the model can learn features from data without having to use specialized feature extraction methods, DL should be considered as an alternative to established EEG classification methods, if enough data is available.
Abstract: Goal: To develop and implement a Deep Learning (DL) approach for an electroencephalogram (EEG) based Motor Imagery (MI) Brain-Computer Interface (BCI) system that could potentially be used to improve the current stroke rehabilitation strategies. Method: The DL model uses Convolutional Neural Network (CNN) layers for learning generalized features and dimension reduction, while a conventional Fully Connected (FC) layer is used for classification. Together they build a unified end-to-end model that can be applied to raw EEG signals. This previously proposed model was applied to a new set of data to validate its robustness against data variations. Furthermore, it was extended by subject-specific adaptation. Lastly, an analysis of the learned filters provides insights into how such a model derives a classification decision. Results: The selected global classifier reached 80.38%, 69.82%, and 58.58% mean accuracies for datasets with two, three, and four classes, respectively, validated using 5-fold cross-validation. As a novel approach in this context, transfer learning was used to adapt the global classifier to single individuals, improving the overall mean accuracy to 86.49%, 79.25%, and 68.51%, respectively. The global models were trained on 3s segments of EEG data from different subjects than they were tested on, which proved the generalization performance of the model. Conclusion: The results are comparable with the reported accuracy values in related studies and the presented model outperforms the results in the literature on the same underlying data. Given that the model can learn features from data without having to use specialized feature extraction methods, DL should be considered as an alternative to established EEG classification methods, if enough data is available.

Journal ArticleDOI
TL;DR: An autoencoder based framework for structural damage identification, which can support deep neural networks and be utilized to obtain optimal solutions for pattern recognition problems of highly non-linear nature, such as learning a mapping between the vibration characteristics and structural damage.

Journal ArticleDOI
TL;DR: This paper proposes a novel semisupervised NMF learning framework, called robust structured NMF, that learns a robust discriminative representation by leveraging the block-diagonal structure and the $\ell_{2,p}$-norm loss function, which addresses the problems of noise and outliers.
Abstract: Dimensionality reduction has attracted increasing attention, because high-dimensional data have arisen naturally in numerous domains in recent years. As one popular dimensionality reduction method, nonnegative matrix factorization (NMF), whose goal is to learn parts-based representations, has been widely studied and applied to various applications. In contrast to the previous approaches, this paper proposes a novel semisupervised NMF learning framework, called robust structured NMF, that learns a robust discriminative representation by leveraging the block-diagonal structure and the $\ell_{2,p}$-norm loss function. Specifically, the problems of noise and outliers are well addressed by the $\ell_{2,p}$-norm loss function, while the discriminative representations of both the labeled and unlabeled data are simultaneously learned by explicitly exploring the block-diagonal structure. The proposed problem is formulated as an optimization problem with a well-defined objective function solved by the proposed iterative algorithm. The convergence of the proposed optimization algorithm is analyzed both theoretically and empirically. In addition, we also discuss the relationships between the proposed method and some previous methods. Extensive experiments on both the synthetic and real-world data sets are conducted, and the experimental results demonstrate the effectiveness of the proposed method in comparison to the state-of-the-art methods.

Journal ArticleDOI
TL;DR: Experimental results on data visualization, clustering, and classification show that the LDFA method is competitive with several well-known dimension reduction techniques, and exploiting locality in deep learning is a research topic worth further exploring.
Abstract: This paper presents an unsupervised deep-learning framework named local deep-feature alignment (LDFA) for dimension reduction. We construct neighbourhood for each data sample and learn a local stacked contractive auto-encoder (SCAE) from the neighbourhood to extract the local deep features. Next, we exploit an affine transformation to align the local deep features of each neighbourhood with the global features. Moreover, we derive an approach from LDFA to map explicitly a new data sample into the learned low-dimensional subspace. The advantage of the LDFA method is that it learns both local and global characteristics of the data sample set: the local SCAEs capture local characteristics contained in the data set, while the global alignment procedures encode the interdependencies between neighbourhoods into the final low-dimensional feature representations. Experimental results on data visualization, clustering, and classification show that the LDFA method is competitive with several well-known dimension reduction techniques, and exploiting locality in deep learning is a research topic worth further exploring.

Journal ArticleDOI
25 Apr 2018
TL;DR: The main goal of this paper is to outline overarching advances, and develop a principled framework to capture nonlinearities through kernels, which are judiciously chosen from a preselected dictionary to optimally fit the data.
Abstract: Identifying graph topologies as well as processes evolving over graphs emerge in various applications involving gene-regulatory, brain, power, and social networks, to name a few. Key graph-aware learning tasks include regression, classification, subspace clustering, anomaly identification, interpolation, extrapolation, and dimensionality reduction. Scalable approaches to deal with such high-dimensional tasks experience a paradigm shift to address the unique modeling and computational challenges associated with data-driven sciences. Albeit simple and tractable, linear time-invariant models are limited since they are incapable of handling generally evolving topologies, as well as nonlinear and dynamic dependencies between nodal processes. To this end, the main goal of this paper is to outline overarching advances, and develop a principled framework to capture nonlinearities through kernels, which are judiciously chosen from a preselected dictionary to optimally fit the data. The framework encompasses and leverages (non) linear counterparts of partial correlation and partial Granger causality, as well as (non)linear structural equations and vector autoregressions, along with attributes such as low rank, sparsity, and smoothness to capture even directional dependencies with abrupt change points, as well as time-evolving processes over possibly time-evolving topologies. The overarching approach inherits the versatility and generality of kernel-based methods, and lends itself to batch and computationally affordable online learning algorithms, which include novel Kalman filters over graphs. Real data experiments highlight the impact of the nonlinear and dynamic models on consumer and financial networks, as well as gene-regulatory and functional connectivity brain networks, where connectivity patterns revealed exhibit discernible differences relative to existing approaches.

Journal ArticleDOI
Zhiqiang Ge
TL;DR: A tutorial review of probabilistic latent variable models (PLVMs) for process data analytics is provided, with detailed illustrations of different kinds of basic PLVMs and their research status.
Abstract: Dimensionality reduction is important for the high-dimensional nature of data in the process industry, which has made latent variable modeling methods popular in recent years. By projecting high-di...

Journal ArticleDOI
TL;DR: This paper characterize the intrinsic local structure by an adaptive reconstruction graph and simultaneously consider its multiconnected-components (multicluster) structure by imposing a rank constraint on the corresponding Laplacian matrix to achieve a desirable feature subset.
Abstract: Feature selection is one of the most important dimension reduction techniques for its efficiency and interpretation. Since practical data in large scale are usually collected without labels, and labeling these data are dramatically expensive and time-consuming, unsupervised feature selection has become a ubiquitous and challenging problem. Without label information, the fundamental problem of unsupervised feature selection lies in how to characterize the geometry structure of original feature space and produce a faithful feature subset, which preserves the intrinsic structure accurately. In this paper, we characterize the intrinsic local structure by an adaptive reconstruction graph and simultaneously consider its multiconnected-components (multicluster) structure by imposing a rank constraint on the corresponding Laplacian matrix. To achieve a desirable feature subset, we learn the optimal reconstruction graph and selective matrix simultaneously, instead of using a predetermined graph. We exploit an efficient alternative optimization algorithm to solve the proposed challenging problem, together with the theoretical analyses on its convergence and computational complexity. Finally, extensive experiments on clustering task are conducted over several benchmark data sets to verify the effectiveness and superiority of the proposed unsupervised feature selection algorithm.

Journal ArticleDOI
TL;DR: This paper proposes to model the mapping from the high-dimensional SPD manifold to the low-dimensional one with an orthonormal projection and shows that learning can be expressed as an optimization problem on a Grassmann manifold and discusses fast solutions for special cases.
Abstract: Representing images and videos with Symmetric Positive Definite (SPD) matrices, and considering the Riemannian geometry of the resulting space, has been shown to yield high discriminative power in many visual recognition tasks. Unfortunately, computation on the Riemannian manifold of SPD matrices –especially of high-dimensional ones– comes at a high cost that limits the applicability of existing techniques. In this paper, we introduce algorithms able to handle high-dimensional SPD matrices by constructing a lower-dimensional SPD manifold. To this end, we propose to model the mapping from the high-dimensional SPD manifold to the low-dimensional one with an orthonormal projection. This lets us formulate dimensionality reduction as the problem of finding a projection that yields a low-dimensional manifold either with maximum discriminative power in the supervised scenario, or with maximum variance of the data in the unsupervised one. We show that learning can be expressed as an optimization problem on a Grassmann manifold and discuss fast solutions for special cases. Our evaluation on several classification tasks evidences that our approach leads to a significant accuracy gain over state-of-the-art methods.
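The dimensionality-reduction map used in this line of work is the bilinear projection X ↦ WᵀXW with W an orthonormal (tall) matrix, which keeps the result symmetric positive definite; a small NumPy check follows, where W is random rather than learned on a Grassmann manifold as in the paper.

```python
# Bilinear map from a high-dimensional SPD matrix to a lower-dimensional one:
# X -> W^T X W with orthonormal W. W here is random, not learned; the sketch only
# checks that the result is still symmetric positive definite.
import numpy as np

rng = np.random.default_rng(0)
D, d = 50, 5
A = rng.normal(size=(D, D))
X = A @ A.T + D * np.eye(D)                     # a high-dimensional SPD matrix
W, _ = np.linalg.qr(rng.normal(size=(D, d)))    # orthonormal columns, W^T W = I_d

X_low = W.T @ X @ W                              # low-dimensional SPD matrix (d x d)
print("symmetric:", np.allclose(X_low, X_low.T))
print("min eigenvalue:", np.linalg.eigvalsh(X_low).min())   # > 0 => positive definite
```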

Journal ArticleDOI
TL;DR: This work proposes a rank-constrained SC with flexible embedding framework, which is superior to previous SC methods in that the block-diagonal affinity matrix, learned simultaneously with the adaptive graph construction process, more explicitly induces the cluster membership without further discretization.
Abstract: Spectral clustering (SC) has been proven to be effective in various applications. However, the learning scheme of SC is suboptimal in that it learns the cluster indicator from a fixed graph structure, which usually requires a rounding procedure to further partition the data. Also, the obtained cluster number cannot reflect the ground truth number of connected components in the graph. To alleviate these drawbacks, we propose a rank-constrained SC with flexible embedding framework. Specifically, an adaptive probabilistic neighborhood learning process is employed to recover the block-diagonal affinity matrix of an ideal graph. Meanwhile, a flexible embedding scheme is learned to unravel the intrinsic cluster structure in low-dimensional subspace, where the irrelevant information and noise in high-dimensional data have been effectively suppressed. The proposed method is superior to previous SC methods in that: 1) the block-diagonal affinity matrix learned simultaneously with the adaptive graph construction process, more explicitly induces the cluster membership without further discretization; 2) the number of clusters is guaranteed to converge to the ground truth via a rank constraint on the Laplacian matrix; and 3) the mismatch between the embedded feature and the projected feature allows more freedom for finding the proper cluster structure in the low-dimensional subspace as well as learning the corresponding projection matrix. Experimental results on both synthetic and real-world data sets demonstrate the promising performance of the proposed algorithm.

Journal ArticleDOI
TL;DR: A versatile framework for synthesis of simple, yet well-performing control strategies that mimic the behavior of optimization-based controllers, also for large scale multiple-input-multiple-output (MIMO) control problems which are common in the building sector.

Journal ArticleDOI
TL;DR: In this paper, a superpixelwise PCA (SuperPCA) approach is proposed to learn the intrinsic low-dimensional features of hyperspectral images (HSIs) for HSI processing and analysis tasks.
Abstract: As an unsupervised dimensionality reduction method, the principal component analysis (PCA) has been widely considered as an efficient and effective preprocessing step for hyperspectral image (HSI) processing and analysis tasks. It takes each band as a whole and globally extracts the most representative bands. However, different homogeneous regions correspond to different objects, whose spectral features are diverse. Therefore, it is inappropriate to carry out dimensionality reduction through a unified projection for an entire HSI. In this paper, a simple but very effective superpixelwise PCA (SuperPCA) approach is proposed to learn the intrinsic low-dimensional features of HSIs. In contrast to classical PCA models, the SuperPCA has four main properties: 1) unlike the traditional PCA method based on a whole image, the SuperPCA takes into account the diversity in different homogeneous regions, that is, different regions should have different projections; 2) most of the conventional feature extraction models cannot directly use the spatial information of HSIs, while the SuperPCA is able to incorporate the spatial context information into the unsupervised dimensionality reduction by superpixel segmentation; 3) since the regions obtained by superpixel segmentation have homogeneity, the SuperPCA can extract potential low-dimensional features even under noise; and 4) although the SuperPCA is an unsupervised method, it can achieve a competitive performance when compared with supervised approaches. The resulting features are discriminative, compact, and noise-resistant, leading to an improved HSI classification performance. Experiments on three public data sets demonstrate that the SuperPCA model significantly outperforms the conventional PCA-based dimensionality reduction baselines for HSI classification, and some state-of-the-art feature extraction approaches. The MATLAB source code is available at https://github.com/junjun-jiang/SuperPCA .
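A rough sketch of the superpixel-wise idea (not the released MATLAB code linked above): segment the scene into superpixels and run PCA separately on the pixels of each region. SLIC from scikit-image is used for segmentation, and a random cube with illustrative parameter choices stands in for a real HSI.

```python
# Superpixel-wise PCA sketch: per-region PCA instead of one global projection.
# Random data and parameter choices are placeholders; the authors' MATLAB code is the reference.
import numpy as np
from skimage.segmentation import slic
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
H, W, B, k = 64, 64, 100, 5                    # toy hyperspectral cube, k components
cube = rng.random((H, W, B))

# SLIC on a multichannel image; channel_axis=-1 treats the spectral bands as channels.
segments = slic(cube, n_segments=50, compactness=0.1, channel_axis=-1)

features = np.zeros((H, W, k))
for label in np.unique(segments):
    mask = segments == label
    region = cube[mask]                        # (n_pixels_in_region, B)
    n_comp = min(k, region.shape[0], B)
    reduced = PCA(n_components=n_comp).fit_transform(region)
    features[mask, :n_comp] = reduced          # region-specific low-dimensional features

print("per-pixel feature shape:", features.shape)
```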

Journal ArticleDOI
TL;DR: In this article, the authors propose a two-stage procedure called inspect for estimation of change points: first, a good projection direction is obtained as the leading left singular vector of the matrix that solves a convex optimization problem derived from the cumulative sum transformation of the time series, and then an existing univariate change point estimation algorithm is applied to the projected series.
Abstract: Change points are a very common feature of ‘big data’ that arrive in the form of a data stream. We study high dimensional time series in which, at certain time points, the mean structure changes in a sparse subset of the co-ordinates. The challenge is to borrow strength across the co-ordinates to detect smaller changes than could be observed in any individual component series. We propose a two-stage procedure called inspect for estimation of the change points: first, we argue that a good projection direction can be obtained as the leading left singular vector of the matrix that solves a convex optimization problem derived from the cumulative sum transformation of the time series. We then apply an existing univariate change point estimation algorithm to the projected series. Our theory provides strong guarantees on both the number of estimated change points and the rates of convergence of their locations, and our numerical studies validate its highly competitive empirical performance for a wide range of data-generating mechanisms. Software implementing the methodology is available in the R package InspectChangepoint.
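A simplified single-change-point sketch of the two-stage idea follows (not the InspectChangepoint R package itself): build the CUSUM transformation of the p-by-n series, take a projection direction from its leading left singular vector (plain SVD here instead of the paper's sparsity-inducing convex relaxation), and locate the change point where the projected CUSUM statistic peaks.

```python
# Simplified "project, then locate" change point sketch with a plain SVD direction.
import numpy as np

rng = np.random.default_rng(0)
p, n, true_cp = 100, 200, 120
X = rng.normal(size=(p, n))
X[:10, true_cp:] += 1.0                       # sparse mean shift in 10 coordinates

def cusum_transform(X):
    p, n = X.shape
    T = np.empty((p, n - 1))
    for t in range(1, n):
        scale = np.sqrt(t * (n - t) / n)
        T[:, t - 1] = scale * (X[:, :t].mean(axis=1) - X[:, t:].mean(axis=1))
    return T

T = cusum_transform(X)
u = np.linalg.svd(T)[0][:, 0]                 # leading left singular vector
projected = np.abs(u @ T)                     # CUSUM statistic of the projected series
print("estimated change point:", int(np.argmax(projected)) + 1, "(true:", true_cp, ")")
```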

Journal ArticleDOI
TL;DR: A novel filter method for feature selection, called Multivariate Relative Discrimination Criterion (MRDC), is proposed for text classification, which focuses on the reduction of redundant features using minimal-redundancy and maximal-relevancy concepts.