
Showing papers on "Dimensionality reduction published in 2018"


Posted Content
TL;DR: The UMAP algorithm is competitive with t-SNE for visualization quality, and arguably preserves more of the global structure with superior run time performance.
Abstract: UMAP (Uniform Manifold Approximation and Projection) is a novel manifold learning technique for dimension reduction. UMAP is constructed from a theoretical framework based in Riemannian geometry and algebraic topology. The result is a practical scalable algorithm that applies to real world data. The UMAP algorithm is competitive with t-SNE for visualization quality, and arguably preserves more of the global structure with superior run time performance. Furthermore, UMAP has no computational restrictions on embedding dimension, making it viable as a general purpose dimension reduction technique for machine learning.
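The algorithm is available as the open-source umap-learn package; a minimal usage sketch (assuming umap-learn and scikit-learn are installed, and using the digits dataset purely as placeholder data) might look like this:

```python
# Minimal sketch of using the UMAP reference implementation (umap-learn).
# Assumes `pip install umap-learn scikit-learn`; the dataset choice is illustrative.
import umap
from sklearn.datasets import load_digits

X, y = load_digits(return_X_y=True)          # 1797 samples, 64 features

reducer = umap.UMAP(
    n_neighbors=15,    # size of the local neighbourhood used to build the fuzzy graph
    min_dist=0.1,      # how tightly points may be packed in the embedding
    n_components=2,    # embedding dimension; UMAP is not limited to 2 or 3
    random_state=42,
)
embedding = reducer.fit_transform(X)          # shape (1797, 2)
print(embedding.shape)
```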

5,390 citations


Book
27 Sep 2018
TL;DR: A broad range of illustrations is embedded throughout, including classical and modern results for covariance estimation, clustering, networks, semidefinite programming, coding, dimension reduction, matrix completion, machine learning, compressed sensing, and sparse regression.
Abstract: High-dimensional probability offers insight into the behavior of random vectors, random matrices, random subspaces, and objects used to quantify uncertainty in high dimensions. Drawing on ideas from probability, analysis, and geometry, it lends itself to applications in mathematics, statistics, theoretical computer science, signal processing, optimization, and more. It is the first to integrate theory, key tools, and modern applications of high-dimensional probability. Concentration inequalities form the core, and it covers both classical results such as Hoeffding's and Chernoff's inequalities and modern developments such as the matrix Bernstein inequality. It then introduces the powerful methods based on stochastic processes, including such tools as Slepian's, Sudakov's, and Dudley's inequalities, as well as generic chaining and bounds based on VC dimension. A broad range of illustrations is embedded throughout, including classical and modern results for covariance estimation, clustering, networks, semidefinite programming, coding, dimension reduction, matrix completion, machine learning, compressed sensing, and sparse regression.
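For concreteness, Hoeffding's inequality, one of the classical concentration results named above, bounds the deviation of a sum of bounded independent random variables from its mean:

```latex
% Hoeffding's inequality for independent X_i with a_i <= X_i <= b_i
\Pr\!\left( \left| \sum_{i=1}^{n} \bigl(X_i - \mathbb{E}X_i\bigr) \right| \ge t \right)
\;\le\; 2 \exp\!\left( - \frac{2 t^{2}}{\sum_{i=1}^{n} (b_i - a_i)^{2}} \right).
```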

1,190 citations


Proceedings Article
15 Feb 2018
TL;DR: A Deep Autoencoding Gaussian Mixture Model (DAGMM) for unsupervised anomaly detection, which significantly outperforms state-of-the-art anomaly detection techniques, and achieves up to 14% improvement based on the standard F1 score.
Abstract: Unsupervised anomaly detection on multi- or high-dimensional data is of great importance in both fundamental machine learning research and industrial applications, for which density estimation lies at the core. Although previous approaches based on dimensionality reduction followed by density estimation have made fruitful progress, they mainly suffer from decoupled model learning with inconsistent optimization goals and the incapability of preserving essential information in the low-dimensional space. In this paper, we present a Deep Autoencoding Gaussian Mixture Model (DAGMM) for unsupervised anomaly detection. Our model utilizes a deep autoencoder to generate a low-dimensional representation and reconstruction error for each input data point, which is further fed into a Gaussian Mixture Model (GMM). Instead of using decoupled two-stage training and the standard Expectation-Maximization (EM) algorithm, DAGMM jointly optimizes the parameters of the deep autoencoder and the mixture model simultaneously in an end-to-end fashion, leveraging a separate estimation network to facilitate the parameter learning of the mixture model. The joint optimization, which well balances autoencoding reconstruction, density estimation of the latent representation, and regularization, helps the autoencoder escape from less attractive local optima and further reduce reconstruction errors, avoiding the need for pre-training. Experimental results on several public benchmark datasets show that DAGMM significantly outperforms state-of-the-art anomaly detection techniques, and achieves up to 14% improvement based on the standard F1 score.
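To illustrate the inputs DAGMM feeds to its density model (latent code plus reconstruction error), here is a deliberately simplified, decoupled sketch of the "compress, then fit a density" pipeline that the paper improves on by training jointly: PCA stands in for the deep autoencoder and scikit-learn's GaussianMixture stands in for the estimation network, with synthetic placeholder data.

```python
# Illustration only: the decoupled two-stage pipeline DAGMM replaces with joint training.
# PCA is a stand-in for the deep autoencoder; GaussianMixture for the estimation network.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))                    # placeholder "normal" data

pca = PCA(n_components=3).fit(X)
z = pca.transform(X)                               # low-dimensional representation
recon_err = np.linalg.norm(X - pca.inverse_transform(z), axis=1, keepdims=True)
features = np.hstack([z, recon_err])               # latent code + reconstruction error

gmm = GaussianMixture(n_components=4, random_state=0).fit(features)
energy = -gmm.score_samples(features)              # high energy = likely anomaly
threshold = np.percentile(energy, 95)
print("flagged anomalies:", int((energy > threshold).sum()))
```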

981 citations


Journal ArticleDOI
TL;DR: iFeature is a versatile Python‐based toolkit for generating various numerical feature representation schemes for both protein and peptide sequences, capable of calculating and extracting a comprehensive spectrum of 18 major sequence encoding schemes that encompass 53 different types of feature descriptors.
Abstract: Structural and physiochemical descriptors extracted from sequence data have been widely used to represent sequences and predict structural, functional, expression and interaction profiles of proteins and peptides as well as DNAs/RNAs. Here, we present iFeature, a versatile Python-based toolkit for generating various numerical feature representation schemes for both protein and peptide sequences. iFeature is capable of calculating and extracting a comprehensive spectrum of 18 major sequence encoding schemes that encompass 53 different types of feature descriptors. It also allows users to extract specific amino acid properties from the AAindex database. Furthermore, iFeature integrates 12 different types of commonly used feature clustering, selection and dimensionality reduction algorithms, greatly facilitating training, analysis and benchmarking of machine-learning models. The functionality of iFeature is made freely available via an online web server and a stand-alone toolkit. Availability and implementation: http://iFeature.erc.monash.edu/; https://github.com/Superzchen/iFeature/. Supplementary information: Supplementary data are available at Bioinformatics online.
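As a flavour of the kind of descriptor such a toolkit computes (this is not iFeature's own code), the amino acid composition (AAC) encoding is simply the frequency of each of the 20 standard residues in a sequence; a plain-Python sketch with a toy sequence:

```python
# Hand-rolled amino acid composition (AAC) descriptor, shown only to illustrate one of
# the many encoding schemes a toolkit like iFeature provides; not iFeature's own code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def aac(sequence: str) -> list[float]:
    """Return the 20-dimensional amino acid composition of a protein sequence."""
    seq = sequence.upper()
    total = len(seq)
    return [seq.count(aa) / total for aa in AMINO_ACIDS]

print(aac("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"))   # toy sequence
```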

411 citations


Journal ArticleDOI
TL;DR: Topological data analysis (TDA) is broadly described as a collection of data analysis methods that find structure in data; the methods reviewed include clustering, manifold estimation, nonlinear dimension reduction, mode estimation, ridge estimation and persistent homology.
Abstract: Topological data analysis (TDA) can broadly be described as a collection of data analysis methods that find structure in data. These methods include clustering, manifold estimation, nonlinear dimension reduction, mode estimation, ridge estimation and persistent homology. This paper reviews some of these methods.

353 citations


Journal ArticleDOI
TL;DR: Deep neural networks (DNN) are used to construct surrogate models for numerical simulators in a manner that lends the DNN surrogate the interpretation of recovering a low-dimensional nonlinear manifold.

340 citations


Book ChapterDOI
03 Sep 2018
TL;DR: Feature selection, one of the most widely used dimensionality reduction techniques among practitioners, is broadly categorized into four models: the filter model, wrapper model, embedded model, and hybrid model.
Abstract: Dimensionality reduction techniques can be categorized mainly into feature extraction and feature selection. In the feature extraction approach, features are projected into a new space with lower dimensionality. Feature selection is broadly categorized into four models: filter model, wrapper model, embedded model, and hybrid model. With the existence of a large number of features, learning models tend to overfit and their learning performance degenerates. Feature selection is one of the most used techniques to reduce dimensionality among practitioners. The existence of irrelevant features in the data set may degrade learning quality and consume more memory and computational time that could be saved if these features were removed. However, finding clusters in high-dimensional space is computationally expensive and may degrade the learning performance. Clustering is useful in several machine learning and data mining tasks including image segmentation, information retrieval, pattern recognition, pattern classification, and network analysis.
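To make the filter/wrapper distinction concrete, the sketch below (scikit-learn, with the iris data purely as a placeholder) scores features independently with mutual information (a filter model) and, for contrast, searches feature subsets by repeatedly refitting a classifier (a wrapper model); it is a generic illustration, not code from this chapter.

```python
# Filter vs. wrapper feature selection in scikit-learn -- a minimal generic sketch.
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, mutual_info_classif, RFE
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# Filter model: rank features by a statistic computed independently of any learner.
filt = SelectKBest(score_func=mutual_info_classif, k=2).fit(X, y)
print("filter scores:", filt.scores_)

# Wrapper model: repeatedly fit a learner and drop the weakest features.
wrapper = RFE(LogisticRegression(max_iter=1000), n_features_to_select=2).fit(X, y)
print("wrapper keeps:", wrapper.support_)
```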

326 citations


Journal ArticleDOI
TL;DR: It is shown that the time-lagged autoencoder reliably finds low-dimensional embeddings for high-dimensional feature spaces which capture the slow dynamics of the underlying stochastic processes-beyond the capabilities of linear dimension reduction techniques.
Abstract: Inspired by the success of deep learning techniques in the physical and chemical sciences, we apply a modification of an autoencoder type deep neural network to the task of dimension reduction of molecular dynamics data. We can show that our time-lagged autoencoder reliably finds low-dimensional embeddings for high-dimensional feature spaces which capture the slow dynamics of the underlying stochastic processes—beyond the capabilities of linear dimension reduction techniques.
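A minimal PyTorch sketch of the time-lagged autoencoder idea follows: encode the frame at time t through a low-dimensional bottleneck and decode towards the frame at time t+τ. Layer sizes, the lag, and the random toy trajectory are illustrative placeholders, not the authors' settings.

```python
# Time-lagged autoencoder sketch in PyTorch: encode x(t), decode towards x(t + tau).
# Layer sizes, lag and the random toy trajectory are placeholders.
import torch
from torch import nn

torch.manual_seed(0)
traj = torch.randn(5000, 30)                 # toy "trajectory": 5000 frames, 30 features
tau = 10
x_t, x_lag = traj[:-tau], traj[tau:]

model = nn.Sequential(
    nn.Linear(30, 64), nn.ELU(),
    nn.Linear(64, 2),                         # low-dimensional embedding (bottleneck)
    nn.Linear(2, 64), nn.ELU(),
    nn.Linear(64, 30),                        # decoder predicts the time-lagged frame
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for epoch in range(20):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(x_t), x_lag)   # reconstruct the future frame
    loss.backward()
    opt.step()
print("final loss:", float(loss))
```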

295 citations


Journal ArticleDOI
TL;DR: The proposed STL-IDS approach improves network intrusion detection and provides a new research method for intrusion detection, and has accelerated SVM training and testing times and performed better than most of the previous approaches in terms of performance metrics in binary and multiclass classification.
Abstract: Network intrusion detection systems (NIDSs) provide a better solution to network security than other traditional network defense technologies, such as firewall systems. The success of NIDS is highly dependent on the performance of the algorithms and improvement methods used to increase the classification accuracy and decrease the training and testing times of the algorithms. We propose an effective deep learning approach, self-taught learning (STL)-IDS, based on the STL framework. The proposed approach is used for feature learning and dimensionality reduction. It reduces training and testing time considerably and effectively improves the prediction accuracy of support vector machines (SVM) with regard to attacks. The proposed model is built using the sparse autoencoder mechanism, which is an effective learning algorithm for reconstructing a new feature representation in an unsupervised manner. After the pre-training stage, the new features are fed into the SVM algorithm to improve its detection capability for intrusion and classification accuracy. Moreover, the efficiency of the approach in binary and multiclass classification is studied and compared with that of shallow classification methods, such as J48, naive Bayesian, random forest, and SVM. Results show that our approach has accelerated SVM training and testing times and performed better than most of the previous approaches in terms of performance metrics in binary and multiclass classification. The proposed STL-IDS approach improves network intrusion detection and provides a new research method for intrusion detection.
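The overall shape of the pipeline (unsupervised feature learning for dimensionality reduction, followed by an SVM on the learned features) can be sketched as below; PCA is used as a plainly named stand-in for the paper's sparse autoencoder, and synthetic data replaces the intrusion detection benchmark.

```python
# Sketch of a "learn compact features, then classify with an SVM" pipeline.
# PCA is a stand-in for the sparse autoencoder used by STL-IDS; data is synthetic.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 120))                      # placeholder "traffic features"
y = rng.integers(0, 2, size=2000)                     # placeholder attack/normal labels
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = make_pipeline(StandardScaler(), PCA(n_components=20), LinearSVC())
clf.fit(X_tr, y_tr)
print("accuracy on held-out data:", clf.score(X_te, y_te))
```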

291 citations


Journal ArticleDOI
01 Feb 2018
TL;DR: This paper proposes to use a very recent PSO variant, known as the competitive swarm optimizer (CSO) and dedicated to large-scale optimization, for solving high-dimensional feature selection problems, and demonstrates that compared to the canonical PSO-based and a state-of-the-art PSO-variant feature selection methods, the proposed CSO-based feature selection algorithm not only selects a much smaller number of features but also results in better classification performance.
Abstract: When solving many machine learning problems such as classification, there exists a large number of input features. However, not all features are relevant for solving the problem, and sometimes including irrelevant features may deteriorate the learning performance. Therefore, it is essential to select the most relevant features, which is known as feature selection. Many feature selection algorithms have been developed, including evolutionary algorithms or particle swarm optimization (PSO) algorithms, to find a subset of the most important features for accomplishing a particular machine learning task. However, the traditional PSO does not perform well for large-scale optimization problems, which degrades the effectiveness of PSO for feature selection when the number of features dramatically increases. In this paper, we propose to use a very recent PSO variant, known as the competitive swarm optimizer (CSO), which was dedicated to large-scale optimization, for solving high-dimensional feature selection problems. In addition, the CSO, which was originally developed for continuous optimization, is adapted to perform feature selection, which can be considered a combinatorial optimization problem. An archive technique is also introduced to reduce computational cost. Experiments on six benchmark datasets demonstrate that, compared to the canonical PSO-based and a state-of-the-art PSO variant for feature selection, the proposed CSO-based feature selection algorithm not only selects a much smaller number of features but also results in better classification performance.
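A rough sketch of CSO-style feature selection is given below: particles are real vectors in [0, 1]^d, a feature counts as selected when its coordinate exceeds a threshold, and in every generation randomly paired particles compete, with the loser learning from the winner and from the swarm mean. The fitness function, threshold, swarm size and the social factor phi are illustrative choices, not the paper's settings, and the archive technique is omitted.

```python
# Rough sketch of competitive-swarm-optimizer (CSO) style feature selection.
# Parameter choices, fitness and the selection threshold are illustrative only.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_breast_cancer(return_X_y=True)
d = X.shape[1]
rng = np.random.default_rng(0)

def fitness(position, threshold=0.5):
    mask = position > threshold
    if not mask.any():
        return 1.0                                    # empty subset: worst error
    err = 1.0 - cross_val_score(KNeighborsClassifier(), X[:, mask], y, cv=3).mean()
    return err + 0.01 * mask.mean()                   # small penalty on subset size

m, phi, iters = 20, 0.1, 30                           # swarm size must be even
pos = rng.random((m, d))
vel = np.zeros((m, d))

for _ in range(iters):
    fit = np.array([fitness(p) for p in pos])
    mean_pos = pos.mean(axis=0)
    order = rng.permutation(m)
    for a, b in zip(order[0::2], order[1::2]):        # random pairwise competitions
        w, l = (a, b) if fit[a] < fit[b] else (b, a)  # lower error wins
        r1, r2, r3 = rng.random((3, d))
        vel[l] = r1 * vel[l] + r2 * (pos[w] - pos[l]) + phi * r3 * (mean_pos - pos[l])
        pos[l] = np.clip(pos[l] + vel[l], 0.0, 1.0)

best = pos[np.argmin([fitness(p) for p in pos])]
print("selected features:", int((best > 0.5).sum()), "of", d)
```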

273 citations


Journal ArticleDOI
TL;DR: A robust statistical model, scvis, is presented to capture and visualize the low-dimensional structures in single-cell gene expression data and preserves both the local and global neighbourhood structures in the data thus enhancing its interpretability.
Abstract: Single-cell RNA-sequencing has great potential to discover cell types, identify cell states, trace development lineages, and reconstruct the spatial organization of cells. However, dimension reduction to interpret structure in single-cell sequencing data remains a challenge. Existing algorithms are either not able to uncover the clustering structures in the data or lose global information such as groups of clusters that are close to each other. We present a robust statistical model, scvis, to capture and visualize the low-dimensional structures in single-cell gene expression data. Simulation results demonstrate that low-dimensional representations learned by scvis preserve both the local and global neighbor structures in the data. In addition, scvis is robust to the number of data points and learns a probabilistic parametric mapping function to add new data points to an existing embedding. We then use scvis to analyze four single-cell RNA-sequencing datasets, exemplifying interpretable two-dimensional representations of the high-dimensional single-cell RNA-sequencing data.

Journal ArticleDOI
TL;DR: In this article, the authors provide a magazine-style overview of the entire field of robust subspace learning (RSL) and tracking (RST) for long data sequences, assuming that the data lie in a low-dimensional subspace that can change over time, albeit gradually.
Abstract: Principal component analysis (PCA) is one of the most widely used dimension reduction techniques. A related easier problem is termed subspace learning or subspace estimation. Given relatively clean data, both are easily solved via singular value decomposition (SVD). The problem of subspace learning or PCA in the presence of outliers is called robust subspace learning (RSL) or robust PCA (RPCA). For long data sequences, if one tries to use a single lower-dimensional subspace to represent the data, the required subspace dimension may end up being quite large. For such data, a better model is to assume that it lies in a low-dimensional subspace that can change over time, albeit gradually. The problem of tracking such data (and the subspaces) while being robust to outliers is called robust subspace tracking (RST). This article provides a magazine-style overview of the entire field of RSL and tracking.

Journal ArticleDOI
TL;DR: The results show that the proposed criterion outperforms MIFS in both single-objective and multi-objective DE frameworks, and indicate that considering feature selection as a multi-objective problem can generally provide better performance in terms of the feature subset size and the classification accuracy.
Abstract: Feature selection is an essential step in various tasks, where filter feature selection algorithms are increasingly attractive due to their simplicity and fast speed. A common filter is to use mutual information to estimate the relationships between each feature and the class labels (mutual relevancy), and between each pair of features (mutual redundancy). This strategy has gained popularity, resulting in a variety of criteria based on mutual information. Other well-known strategies are to order each feature based on the nearest neighbor distance as in ReliefF, and based on the between-class variance and the within-class variance as in Fisher Score. However, each strategy comes with its own advantages and disadvantages. This paper proposes a new filter criterion inspired by the concepts of mutual information, ReliefF and Fisher Score. Instead of using mutual redundancy, the proposed criterion tries to choose the highest ranked features determined by ReliefF and Fisher Score while providing the mutual relevance between features and the class labels. Based on the proposed criterion, two new differential evolution (DE) based filter approaches are developed. While the former uses the proposed criterion as a single objective problem in a weighted manner, the latter considers the proposed criterion in a multi-objective design. Moreover, a well-known mutual information feature selection approach (MIFS) based on maximum-relevance and minimum-redundancy is also adopted in single-objective and multi-objective DE algorithms for feature selection. The results show that the proposed criterion outperforms MIFS in both single-objective and multi-objective DE frameworks. The results also indicate that considering feature selection as a multi-objective problem can generally provide better performance in terms of the feature subset size and the classification accuracy.
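For reference, the Fisher Score mentioned above ranks each feature by the ratio of between-class to within-class variance; a small NumPy sketch of that computation follows (it only illustrates the filter criterion, not the paper's DE-based approaches; the wine dataset is a placeholder).

```python
# Fisher Score per feature: between-class variance over within-class variance.
import numpy as np
from sklearn.datasets import load_wine

X, y = load_wine(return_X_y=True)

def fisher_score(X, y):
    overall_mean = X.mean(axis=0)
    num = np.zeros(X.shape[1])
    den = np.zeros(X.shape[1])
    for c in np.unique(y):
        Xc = X[y == c]
        num += len(Xc) * (Xc.mean(axis=0) - overall_mean) ** 2   # between-class part
        den += len(Xc) * Xc.var(axis=0)                          # within-class part
    return num / den

scores = fisher_score(X, y)
print("top-5 features by Fisher Score:", np.argsort(scores)[::-1][:5])
```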

Journal ArticleDOI
TL;DR: This work presents a comprehensive overview of various feature selection methods and their inherent pros and cons, and analyzes adaptive classification systems and parallel classification systems for chronic disease prediction.

Proceedings Article
27 Apr 2018
TL;DR: A multi-step framework of linear transformations that generalizes a substantial body of previous work is proposed; it allows new insights into the behavior of existing methods, including the effectiveness of inverse regression, and leads to a novel variant that obtains the best published results in zero-shot bilingual lexicon extraction.
Abstract: Using a dictionary to map independently trained word embeddings to a shared space has shown to be an effective approach to learn bilingual word embeddings. In this work, we propose a multi-step framework of linear transformations that generalizes a substantial body of previous work. The core step of the framework is an orthogonal transformation, and existing methods can be explained in terms of the additional normalization, whitening, re-weighting, de-whitening and dimensionality reduction steps. This allows us to gain new insights into the behavior of existing methods, including the effectiveness of inverse regression, and design a novel variant that obtains the best published results in zero-shot bilingual lexicon extraction. The corresponding software is released as an open source project.
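The core orthogonal-transformation step of such frameworks has a closed-form solution (orthogonal Procrustes): given dictionary-aligned source and target embedding matrices, the optimal rotation comes from an SVD. A NumPy sketch with random stand-in embeddings; the surrounding normalization, whitening, re-weighting and dimensionality reduction steps of the full framework are omitted.

```python
# Orthogonal Procrustes step at the core of dictionary-based embedding mapping:
# W = U V^T where U S V^T = SVD(X^T Y). Embeddings here are random stand-ins.
import numpy as np

rng = np.random.default_rng(0)
n_pairs, dim = 5000, 300
X = rng.normal(size=(n_pairs, dim))          # source-language vectors of dictionary pairs
true_rot, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
Y = X @ true_rot                             # target-language vectors (toy: exact rotation)

U, _, Vt = np.linalg.svd(X.T @ Y)
W = U @ Vt                                   # orthogonal map minimizing ||XW - Y||_F
print("recovery error:", np.linalg.norm(X @ W - Y))
```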

Journal ArticleDOI
TL;DR: A feature selection approach is proposed based on a new multi-objective artificial bee colony algorithm integrated with non-dominated sorting procedure and genetic operators that outperformed the other methods in terms of both the dimensionality reduction and the classification accuracy.

Journal ArticleDOI
TL;DR: Wang et al. propose a feature learning framework for spectral-spatial feature representation and classification of hyperspectral images, which learns a latent low-dimensional subspace by projecting the spectral and spatial features into a common feature space, where the complementary information is effectively exploited.
Abstract: In hyperspectral remote sensing data mining, it is important to take into account both spectral and spatial information, such as the spectral signature, texture feature, and morphological property, to improve performance, e.g., the image classification accuracy. From a feature representation point of view, a natural approach to handle this situation is to concatenate the spectral and spatial features into a single but high-dimensional vector and then apply a certain dimension reduction technique directly to that concatenated vector before feeding it into the subsequent classifier. However, multiple features from various domains have different physical meanings and statistical properties, and thus such concatenation does not efficiently explore the complementary properties among different features, which should help boost feature discriminability. Furthermore, it is also difficult to interpret the transformed results of the concatenated vector. Consequently, finding a physically meaningful consensus low-dimensional feature representation of the original multiple features is still a challenging task. In order to address these issues, we propose a novel feature learning framework, i.e., the simultaneous spectral-spatial feature selection and extraction algorithm, for spectral-spatial feature representation and classification of hyperspectral images. Specifically, the proposed method learns a latent low-dimensional subspace by projecting the spectral-spatial features into a common feature space, where the complementary information has been effectively exploited, and simultaneously, only the most significant original features have been transformed. Encouraging experimental results on three publicly available hyperspectral remote sensing datasets confirm that our proposed method is effective and efficient.

Journal ArticleDOI
TL;DR: Given that the model can learn features from data without having to use specialized feature extraction methods, DL should be considered as an alternative to established EEG classification methods, if enough data is available.
Abstract: Goal: To develop and implement a Deep Learning (DL) approach for an electroencephalogram (EEG) based Motor Imagery (MI) Brain-Computer Interface (BCI) system that could potentially be used to improve the current stroke rehabilitation strategies. Method: The DL model uses Convolutional Neural Network (CNN) layers for learning generalized features and dimension reduction, while a conventional Fully Connected (FC) layer is used for classification. Together they build a unified end-to-end model that can be applied to raw EEG signals. This previously proposed model was applied to a new set of data to validate its robustness against data variations. Furthermore, it was extended by subject-specific adaptation. Lastly, an analysis of the learned filters provides insights into how such a model derives a classification decision. Results: The selected global classifier reached 80.38%, 69.82%, and 58.58% mean accuracies for datasets with two, three, and four classes, respectively, validated using 5-fold cross-validation. As a novel approach in this context, transfer learning was used to adapt the global classifier to single individuals, improving the overall mean accuracy to 86.49%, 79.25%, and 68.51%, respectively. The global models were trained on 3s segments of EEG data from different subjects than they were tested on, which proved the generalization performance of the model. Conclusion: The results are comparable with the reported accuracy values in related studies and the presented model outperforms the results in the literature on the same underlying data. Given that the model can learn features from data without having to use specialized feature extraction methods, DL should be considered as an alternative to established EEG classification methods, if enough data is available.

Journal ArticleDOI
TL;DR: An autoencoder based framework for structural damage identification, which can support deep neural networks and be utilized to obtain optimal solutions for pattern recognition problems of highly non-linear nature, such as learning a mapping between the vibration characteristics and structural damage.

Journal ArticleDOI
TL;DR: This paper proposes a novel semisupervised NMF learning framework, called robust structured NMF, that learns a robust discriminative representation by leveraging the block-diagonal structure and the $\ell_{2,p}$-norm loss function, which addresses the problems of noise and outliers.
Abstract: Dimensionality reduction has attracted increasing attention, because high-dimensional data have arisen naturally in numerous domains in recent years. As one popular dimensionality reduction method, nonnegative matrix factorization (NMF), whose goal is to learn parts-based representations, has been widely studied and applied to various applications. In contrast to the previous approaches, this paper proposes a novel semisupervised NMF learning framework, called robust structured NMF, that learns a robust discriminative representation by leveraging the block-diagonal structure and the $\ell_{2,p}$-norm loss function. Specifically, the problems of noise and outliers are well addressed by the $\ell_{2,p}$-norm loss function, while the discriminative representations of both the labeled and unlabeled data are simultaneously learned by explicitly exploring the block-diagonal structure. The proposed problem is formulated as an optimization problem with a well-defined objective function solved by the proposed iterative algorithm. The convergence of the proposed optimization algorithm is analyzed both theoretically and empirically. In addition, we also discuss the relationships between the proposed method and some previous methods. Extensive experiments on both the synthetic and real-world data sets are conducted, and the experimental results demonstrate the effectiveness of the proposed method in comparison to the state-of-the-art methods.

Journal ArticleDOI
TL;DR: Experimental results on data visualization, clustering, and classification show that the LDFA method is competitive with several well-known dimension reduction techniques, and exploiting locality in deep learning is a research topic worth further exploring.
Abstract: This paper presents an unsupervised deep-learning framework named local deep-feature alignment (LDFA) for dimension reduction. We construct neighbourhood for each data sample and learn a local stacked contractive auto-encoder (SCAE) from the neighbourhood to extract the local deep features. Next, we exploit an affine transformation to align the local deep features of each neighbourhood with the global features. Moreover, we derive an approach from LDFA to map explicitly a new data sample into the learned low-dimensional subspace. The advantage of the LDFA method is that it learns both local and global characteristics of the data sample set: the local SCAEs capture local characteristics contained in the data set, while the global alignment procedures encode the interdependencies between neighbourhoods into the final low-dimensional feature representations. Experimental results on data visualization, clustering, and classification show that the LDFA method is competitive with several well-known dimension reduction techniques, and exploiting locality in deep learning is a research topic worth further exploring.

Journal ArticleDOI
25 Apr 2018
TL;DR: The main goal of this paper is to outline overarching advances, and develop a principled framework to capture nonlinearities through kernels, which are judiciously chosen from a preselected dictionary to optimally fit the data.
Abstract: Identifying graph topologies as well as processes evolving over graphs emerge in various applications involving gene-regulatory, brain, power, and social networks, to name a few. Key graph-aware learning tasks include regression, classification, subspace clustering, anomaly identification, interpolation, extrapolation, and dimensionality reduction. Scalable approaches to deal with such high-dimensional tasks experience a paradigm shift to address the unique modeling and computational challenges associated with data-driven sciences. Albeit simple and tractable, linear time-invariant models are limited since they are incapable of handling generally evolving topologies, as well as nonlinear and dynamic dependencies between nodal processes. To this end, the main goal of this paper is to outline overarching advances, and develop a principled framework to capture nonlinearities through kernels, which are judiciously chosen from a preselected dictionary to optimally fit the data. The framework encompasses and leverages (non) linear counterparts of partial correlation and partial Granger causality, as well as (non)linear structural equations and vector autoregressions, along with attributes such as low rank, sparsity, and smoothness to capture even directional dependencies with abrupt change points, as well as time-evolving processes over possibly time-evolving topologies. The overarching approach inherits the versatility and generality of kernel-based methods, and lends itself to batch and computationally affordable online learning algorithms, which include novel Kalman filters over graphs. Real data experiments highlight the impact of the nonlinear and dynamic models on consumer and financial networks, as well as gene-regulatory and functional connectivity brain networks, where connectivity patterns revealed exhibit discernible differences relative to existing approaches.

Journal ArticleDOI
Zhiqiang Ge
TL;DR: A tutorial review of probabilistic latent variable models (PLVMs) for process data analytics is provided, with detailed illustrations of different kinds of basic PLVMs and their research status.
Abstract: Dimensionality reduction is important for the high-dimensional nature of data in the process industry, which has made latent variable modeling methods popular in recent years. By projecting high-di...

Journal ArticleDOI
TL;DR: This paper characterize the intrinsic local structure by an adaptive reconstruction graph and simultaneously consider its multiconnected-components (multicluster) structure by imposing a rank constraint on the corresponding Laplacian matrix to achieve a desirable feature subset.
Abstract: Feature selection is one of the most important dimension reduction techniques for its efficiency and interpretation. Since practical data in large scale are usually collected without labels, and labeling these data are dramatically expensive and time-consuming, unsupervised feature selection has become a ubiquitous and challenging problem. Without label information, the fundamental problem of unsupervised feature selection lies in how to characterize the geometry structure of original feature space and produce a faithful feature subset, which preserves the intrinsic structure accurately. In this paper, we characterize the intrinsic local structure by an adaptive reconstruction graph and simultaneously consider its multiconnected-components (multicluster) structure by imposing a rank constraint on the corresponding Laplacian matrix. To achieve a desirable feature subset, we learn the optimal reconstruction graph and selective matrix simultaneously, instead of using a predetermined graph. We exploit an efficient alternative optimization algorithm to solve the proposed challenging problem, together with the theoretical analyses on its convergence and computational complexity. Finally, extensive experiments on clustering task are conducted over several benchmark data sets to verify the effectiveness and superiority of the proposed unsupervised feature selection algorithm.

Journal ArticleDOI
TL;DR: This paper proposes to model the mapping from the high-dimensional SPD manifold to the low-dimensional one with an orthonormal projection and shows that learning can be expressed as an optimization problem on a Grassmann manifold and discusses fast solutions for special cases.
Abstract: Representing images and videos with Symmetric Positive Definite (SPD) matrices, and considering the Riemannian geometry of the resulting space, has been shown to yield high discriminative power in many visual recognition tasks. Unfortunately, computation on the Riemannian manifold of SPD matrices –especially of high-dimensional ones– comes at a high cost that limits the applicability of existing techniques. In this paper, we introduce algorithms able to handle high-dimensional SPD matrices by constructing a lower-dimensional SPD manifold. To this end, we propose to model the mapping from the high-dimensional SPD manifold to the low-dimensional one with an orthonormal projection. This lets us formulate dimensionality reduction as the problem of finding a projection that yields a low-dimensional manifold either with maximum discriminative power in the supervised scenario, or with maximum variance of the data in the unsupervised one. We show that learning can be expressed as an optimization problem on a Grassmann manifold and discuss fast solutions for special cases. Our evaluation on several classification tasks evidences that our approach leads to a significant accuracy gain over state-of-the-art methods.
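The dimensionality-reduction map used in this line of work is the bilinear projection X ↦ WᵀXW with W an orthonormal (tall) matrix, which keeps the result symmetric positive definite; a small NumPy check follows, where W is random rather than learned on a Grassmann manifold as in the paper.

```python
# Bilinear map from a high-dimensional SPD matrix to a lower-dimensional one:
# X -> W^T X W with orthonormal W. W here is random, not learned; the sketch only
# checks that the result is still symmetric positive definite.
import numpy as np

rng = np.random.default_rng(0)
D, d = 50, 5
A = rng.normal(size=(D, D))
X = A @ A.T + D * np.eye(D)                     # a high-dimensional SPD matrix
W, _ = np.linalg.qr(rng.normal(size=(D, d)))    # orthonormal columns, W^T W = I_d

X_low = W.T @ X @ W                              # low-dimensional SPD matrix (d x d)
print("symmetric:", np.allclose(X_low, X_low.T))
print("min eigenvalue:", np.linalg.eigvalsh(X_low).min())   # > 0 => positive definite
```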

Journal ArticleDOI
TL;DR: This work proposes a rank-constrained SC with flexible embedding framework, which is superior to previous SC methods in that the block-diagonal affinity matrix, learned simultaneously with the adaptive graph construction process, more explicitly induces the cluster membership without further discretization.
Abstract: Spectral clustering (SC) has been proven to be effective in various applications. However, the learning scheme of SC is suboptimal in that it learns the cluster indicator from a fixed graph structure, which usually requires a rounding procedure to further partition the data. Also, the obtained cluster number cannot reflect the ground truth number of connected components in the graph. To alleviate these drawbacks, we propose a rank-constrained SC with flexible embedding framework. Specifically, an adaptive probabilistic neighborhood learning process is employed to recover the block-diagonal affinity matrix of an ideal graph. Meanwhile, a flexible embedding scheme is learned to unravel the intrinsic cluster structure in low-dimensional subspace, where the irrelevant information and noise in high-dimensional data have been effectively suppressed. The proposed method is superior to previous SC methods in that: 1) the block-diagonal affinity matrix learned simultaneously with the adaptive graph construction process, more explicitly induces the cluster membership without further discretization; 2) the number of clusters is guaranteed to converge to the ground truth via a rank constraint on the Laplacian matrix; and 3) the mismatch between the embedded feature and the projected feature allows more freedom for finding the proper cluster structure in the low-dimensional subspace as well as learning the corresponding projection matrix. Experimental results on both synthetic and real-world data sets demonstrate the promising performance of the proposed algorithm.

Journal ArticleDOI
TL;DR: A versatile framework for synthesis of simple, yet well-performing control strategies that mimic the behavior of optimization-based controllers, also for large scale multiple-input-multiple-output (MIMO) control problems which are common in the building sector.

Journal ArticleDOI
TL;DR: In this paper, a superpixelwise PCA (SuperPCA) approach is proposed to learn the intrinsic low-dimensional features of hyperspectral images (HSIs) for HSI processing and analysis tasks.
Abstract: As an unsupervised dimensionality reduction method, the principal component analysis (PCA) has been widely considered as an efficient and effective preprocessing step for hyperspectral image (HSI) processing and analysis tasks. It takes each band as a whole and globally extracts the most representative bands. However, different homogeneous regions correspond to different objects, whose spectral features are diverse. Therefore, it is inappropriate to carry out dimensionality reduction through a unified projection for an entire HSI. In this paper, a simple but very effective superpixelwise PCA (SuperPCA) approach is proposed to learn the intrinsic low-dimensional features of HSIs. In contrast to classical PCA models, the SuperPCA has four main properties: 1) unlike the traditional PCA method based on a whole image, the SuperPCA takes into account the diversity in different homogeneous regions, that is, different regions should have different projections; 2) most of the conventional feature extraction models cannot directly use the spatial information of HSIs, while the SuperPCA is able to incorporate the spatial context information into the unsupervised dimensionality reduction by superpixel segmentation; 3) since the regions obtained by superpixel segmentation have homogeneity, the SuperPCA can extract potential low-dimensional features even under noise; and 4) although the SuperPCA is an unsupervised method, it can achieve a competitive performance when compared with supervised approaches. The resulting features are discriminative, compact, and noise-resistant, leading to an improved HSI classification performance. Experiments on three public data sets demonstrate that the SuperPCA model significantly outperforms the conventional PCA-based dimensionality reduction baselines for HSI classification, and some state-of-the-art feature extraction approaches. The MATLAB source code is available at https://github.com/junjun-jiang/SuperPCA .
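A rough sketch of the superpixel-wise idea (not the released MATLAB code linked above): segment the scene into superpixels and run PCA separately on the pixels of each region. SLIC from scikit-image is used for segmentation, and a random cube with illustrative parameter choices stands in for a real HSI.

```python
# Superpixel-wise PCA sketch: per-region PCA instead of one global projection.
# Random data and parameter choices are placeholders; the authors' MATLAB code is the reference.
import numpy as np
from skimage.segmentation import slic
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
H, W, B, k = 64, 64, 100, 5                    # toy hyperspectral cube, k components
cube = rng.random((H, W, B))

# SLIC on a multichannel image; channel_axis=-1 treats the spectral bands as channels.
segments = slic(cube, n_segments=50, compactness=0.1, channel_axis=-1)

features = np.zeros((H, W, k))
for label in np.unique(segments):
    mask = segments == label
    region = cube[mask]                        # (n_pixels_in_region, B)
    n_comp = min(k, region.shape[0], B)
    reduced = PCA(n_components=n_comp).fit_transform(region)
    features[mask, :n_comp] = reduced          # region-specific low-dimensional features

print("per-pixel feature shape:", features.shape)
```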

Journal ArticleDOI
TL;DR: In this article, the authors propose a two-stage procedure called inspect for estimation of change points: first, a good projection direction is obtained as the leading left singular vector of the matrix that solves a convex optimization problem derived from the cumulative sum transformation of the time series, and then an existing univariate change point estimation algorithm is applied to the projected series.
Abstract: Change points are a very common feature of ‘big data’ that arrive in the form of a data stream. We study high dimensional time series in which, at certain time points, the mean structure changes in a sparse subset of the co-ordinates. The challenge is to borrow strength across the co-ordinates to detect smaller changes than could be observed in any individual component series. We propose a two-stage procedure called inspect for estimation of the change points: first, we argue that a good projection direction can be obtained as the leading left singular vector of the matrix that solves a convex optimization problem derived from the cumulative sum transformation of the time series. We then apply an existing univariate change point estimation algorithm to the projected series. Our theory provides strong guarantees on both the number of estimated change points and the rates of convergence of their locations, and our numerical studies validate its highly competitive empirical performance for a wide range of data-generating mechanisms. Software implementing the methodology is available in the R package InspectChangepoint.
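A simplified single-change-point sketch of the two-stage idea follows (not the InspectChangepoint R package itself): build the CUSUM transformation of the p-by-n series, take a projection direction from its leading left singular vector (plain SVD here instead of the paper's sparsity-inducing convex relaxation), and locate the change point where the projected CUSUM statistic peaks.

```python
# Simplified "project, then locate" change point sketch with a plain SVD direction.
import numpy as np

rng = np.random.default_rng(0)
p, n, true_cp = 100, 200, 120
X = rng.normal(size=(p, n))
X[:10, true_cp:] += 1.0                       # sparse mean shift in 10 coordinates

def cusum_transform(X):
    p, n = X.shape
    T = np.empty((p, n - 1))
    for t in range(1, n):
        scale = np.sqrt(t * (n - t) / n)
        T[:, t - 1] = scale * (X[:, :t].mean(axis=1) - X[:, t:].mean(axis=1))
    return T

T = cusum_transform(X)
u = np.linalg.svd(T)[0][:, 0]                 # leading left singular vector
projected = np.abs(u @ T)                     # CUSUM statistic of the projected series
print("estimated change point:", int(np.argmax(projected)) + 1, "(true:", true_cp, ")")
```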

Journal ArticleDOI
TL;DR: A novel filter method for feature selection, called Multivariate Relative Discrimination Criterion (MRDC), is proposed for text classification, which focuses on the reduction of redundant features using minimal-redundancy and maximal-relevancy concepts.