
Showing papers on "Principal component analysis" published in 2021


Book
30 Sep 2021
TL;DR: This book covers exploratory data analysis, including linear and nonlinear dimensionality reduction, clustering (hierarchical methods, k-means optimization, model-based clustering), scatterplot smoothing, and graphical methods for EDA.
Abstract: Table of contents. INTRODUCTION TO EXPLORATORY DATA ANALYSIS: Introduction to Exploratory Data Analysis (What Is Exploratory Data Analysis; Overview of the Text; A Few Words about Notation; Data Sets Used in the Book; Transforming Data). EDA AS PATTERN DISCOVERY: Dimensionality Reduction - Linear Methods (Introduction; Principal Component Analysis (PCA); Singular Value Decomposition (SVD); Nonnegative Matrix Factorization; Factor Analysis; Fisher's Linear Discriminant; Intrinsic Dimensionality); Dimensionality Reduction - Nonlinear Methods (Multidimensional Scaling (MDS); Manifold Learning; Artificial Neural Network Approaches); Data Tours (Grand Tour; Interpolation Tours; Projection Pursuit; Projection Pursuit Indexes; Independent Component Analysis); Finding Clusters (Introduction; Hierarchical Methods; Optimization Methods - k-Means; Spectral Clustering; Document Clustering; Evaluating the Clusters); Model-Based Clustering (Overview of Model-Based Clustering; Finite Mixtures; Expectation-Maximization Algorithm; Hierarchical Agglomerative Model-Based Clustering; Model-Based Clustering; MBC for Density Estimation and Discriminant Analysis; Generating Random Variables from a Mixture Model); Smoothing Scatterplots (Introduction; Loess; Robust Loess; Residuals and Diagnostics with Loess; Smoothing Splines; Choosing the Smoothing Parameter; Bivariate Distribution Smooths; Curve Fitting Toolbox). GRAPHICAL METHODS FOR EDA: Visualizing Clusters (Dendrogram; Treemaps; Rectangle Plots; ReClus Plots; Data Image); Distribution Shapes (Histograms; Boxplots; Quantile Plots; Bagplots; Rangefinder Boxplot); Multivariate Visualization (Glyph Plots; Scatterplots; Dynamic Graphics; Coplots; Dot Charts; Plotting Points as Curves; Data Tours Revisited; Biplots). Appendix A: Proximity Measures; Appendix B: Software Resources for EDA; Appendix C: Description of Data Sets; Appendix D: Introduction to MATLAB; Appendix E: MATLAB Functions. References; Index. Summary, Further Reading, and Exercises appear at the end of each chapter.

320 citations


Journal ArticleDOI
TL;DR: Experimental results demonstrate that the hybrid filter model reduces data dimensions, selects appropriate feature sets, and reduces training time, hence providing better classification performance as measured by accuracy, precision and recall.
Abstract: Feature selection and classification have been widely applied in areas such as business, medicine, and media. High dimensionality in datasets is one of the main challenges in classifying data, data mining, and sentiment analysis. Irrelevant and redundant attributes also have a negative impact on the complexity and operation of classification algorithms; consequently, the algorithms record poor results or performance. Some existing work uses all attributes for classification, some of which are insignificant for the task, thereby leading to poor performance. This paper therefore develops a hybrid filter model for feature selection based on principal component analysis and information gain. The hybrid model is then applied to support classification using machine learning techniques, e.g. the Naive Bayes technique. Experimental results demonstrate that the hybrid filter model reduces data dimensions, selects appropriate feature sets, and reduces training time, hence providing better classification performance as measured by accuracy, precision, and recall.
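A minimal sketch of the hybrid filter idea described above, assuming scikit-learn; the Iris data, the three-feature cut-off, and the two-component PCA are placeholders rather than the paper's settings.

```python
# Hybrid filter sketch: rank features by information gain, project the
# retained features with PCA, then classify with Naive Bayes.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import mutual_info_classif
from sklearn.decomposition import PCA
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# Step 1: information gain (estimated as mutual information) per feature.
ig = mutual_info_classif(X, y, random_state=0)
keep = np.argsort(ig)[::-1][:3]          # keep the 3 most informative features

# Step 2: PCA on the retained features to remove redundancy.
X_reduced = PCA(n_components=2).fit_transform(X[:, keep])

# Step 3: Naive Bayes on the reduced representation.
scores = cross_val_score(GaussianNB(), X_reduced, y, cv=5)
print("Mean CV accuracy:", scores.mean())
```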

100 citations


Journal ArticleDOI
01 Feb 2021-Energy
TL;DR: The decomposition-ensemble learning model is an efficient and accurate model for wind energy forecasting, outperforming the CEEMD, STACK, and single models in all forecasting horizons, with performance improvements ranging from 0.06% to 97.53%.

98 citations


Journal ArticleDOI
TL;DR: A data-driven methodology for fault detection and diagnosis that integrates principal component analysis (PCA) with the Bayesian network (BN) and a combination of vine copula and Bayes' theorem; results suggest that the proposed framework provides superior performance.

86 citations


Journal ArticleDOI
TL;DR: This paper proposes a novel supervised learning technique for forecasting, scaled principal component analysis (sPCA), and shows that, under some appropriate conditions on the data, the sPCA forecast beats the PCA forecast; when these conditions break down, extensive simulations indicate that sPCA still has a good chance of outperforming PCA.
Abstract: This paper proposes a novel supervised learning technique for forecasting: scaled principal component analysis (sPCA). The sPCA improves the traditional principal component analysis (PCA) by scalin...
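The abstract is truncated here, but a common reading of scaled PCA is that each predictor is first scaled by its estimated slope from a univariate regression of the target on that predictor, and PCA is then applied to the scaled panel. The sketch below follows that reading on synthetic data and should not be taken as the authors' exact procedure.

```python
# Scaled PCA sketch: weight each predictor by its univariate predictive slope
# before extracting the principal components used for forecasting.
import numpy as np

rng = np.random.default_rng(0)
T, N = 200, 30
F = rng.normal(size=(T, 1))                        # latent factor
X = F @ rng.normal(size=(1, N)) + rng.normal(size=(T, N))
y = 0.8 * F[:, 0] + rng.normal(scale=0.5, size=T)  # target to forecast

Xc = X - X.mean(axis=0)
yc = y - y.mean()

# Univariate slopes of y on each predictor (the "scaling" step).
betas = (Xc * yc[:, None]).sum(axis=0) / (Xc ** 2).sum(axis=0)

# PCA on the slope-scaled predictors via SVD.
X_scaled = Xc * betas                              # scale each column by its slope
U, s, Vt = np.linalg.svd(X_scaled, full_matrices=False)
spca_factor = U[:, 0] * s[0]                       # first scaled principal component

print("Correlation of first sPCA factor with target:",
      np.corrcoef(spca_factor, y)[0, 1].round(3))
```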

74 citations


Journal ArticleDOI
TL;DR: Empirical wavelet transform (EWT) helped to explore the hidden patterns of MI tasks by decomposing EEG data into different modes and regularization parameter tuning of NCA guaranteed to improve classification performance with significant features for each subject.
Abstract: Background: Analysis and classification of extensive medical data (e.g. electroencephalography (EEG) signals) is a significant challenge in developing an effective brain–computer interface (BCI) system. Therefore, it is necessary to build an automated classification framework to decode different brain signals. Methods: In the present study, a two-step filtering approach is utilized to achieve resilience towards cognitive and external noises. Then, empirical wavelet transform (EWT) and four data reduction techniques, principal component analysis (PCA), independent component analysis (ICA), linear discriminant analysis (LDA) and neighborhood component analysis (NCA), are integrated together for the first time to explore the dynamic nature and pattern mining of motor imagery (MI) EEG signals. Specifically, EWT helped to explore the hidden patterns of MI tasks by decomposing EEG data into different modes, where every mode was considered as a feature vector in this study, and each data reduction technique was applied to all these modes to reduce the dimension of the huge feature matrix. Moreover, automated correlation-based component/coefficient selection criteria and parameter tuning were implemented for PCA, ICA, LDA, and NCA, respectively. For comparison purposes, all the experiments were performed on two publicly available datasets (BCI competition III datasets IVa and IVb). The performance of the experiments was verified by decoding three different channel combination strategies along with several neural networks. The regularization parameter tuning of NCA guaranteed improved classification performance with significant features for each subject. Results: The experimental results revealed that NCA provides an average sensitivity, specificity, accuracy, precision, F1 score and kappa-coefficient of 100% for the subject-dependent case, and 93%, 93%, 92.9%, 93%, 96.4% and 90%, respectively, for the subject-independent case. All the results were obtained with artificial neural networks, cascade-forward neural networks and multilayer perceptron neural networks (MLP) for the subject-dependent case, and with MLP for the subject-independent case, by utilizing 7 channels out of a total of 118. Such an improvement in results can help users convey their MI activities more clearly. For instance, a physically impaired person will be able to manage their wheelchair quite effectively, and rehabilitated persons may be able to improve their activities.
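A sketch of the reduction-and-classification stage only, assuming scikit-learn; the EWT decomposition, EEG filtering and the BCI competition data are not reproduced, and the feature matrix below is a synthetic placeholder.

```python
# Neighborhood component analysis (NCA) to reduce a feature matrix, then an
# MLP classifier, mirroring the NCA + MLP stage described above.
from sklearn.datasets import make_classification
from sklearn.neighbors import NeighborhoodComponentsAnalysis
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score

# Placeholder for a per-trial feature matrix built from EWT modes.
X, y = make_classification(n_samples=300, n_features=60, n_informative=10,
                           random_state=0)

clf = make_pipeline(
    StandardScaler(),
    NeighborhoodComponentsAnalysis(n_components=10, random_state=0),
    MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000, random_state=0),
)
print("Mean CV accuracy:", cross_val_score(clf, X, y, cv=5).mean())
```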

71 citations


Journal ArticleDOI
15 Apr 2021
TL;DR: This review starts by introducing the basic ideas of Principal Component Analysis (PCA), describes some concepts related to PCA, and discusses what the method can do.
Abstract: Big databases are increasingly widespread and are therefore hard to understand. In exploratory biomedical science, big data in health research is highly exciting because data-based analyses can travel more quickly than hypothesis-based research. Principal Component Analysis (PCA) is a method to reduce the dimensionality of certain datasets. It improves interpretability without losing much information, and it achieves this by creating new covariates that are uncorrelated with each other. Finding those new variables, the principal components, reduces to solving an eigenvalue/eigenvector problem. PCA can be described as an adaptive data analysis technique because its variants have been developed to adapt to different data types and structures. This review starts by introducing the basic ideas of PCA, describes some concepts related to PCA, discusses what it can do, and reviews fifteen PCA articles that have been published in the last three years.
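A minimal numpy illustration of the eigenvalue/eigenvector problem mentioned above, on toy data; it is a sketch of textbook PCA, not code from the review.

```python
# Minimal PCA from scratch: center the data, eigendecompose the covariance
# matrix, and project onto the leading eigenvectors (principal components).
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5)) @ rng.normal(size=(5, 5))   # correlated toy data

Xc = X - X.mean(axis=0)                     # center each variable
cov = np.cov(Xc, rowvar=False)              # covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)      # eigenvalue/eigenvector problem

order = np.argsort(eigvals)[::-1]           # sort by explained variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

scores = Xc @ eigvecs[:, :2]                # data expressed in the first 2 PCs
explained = eigvals[:2].sum() / eigvals.sum()
print("Variance explained by 2 components:", round(explained, 3))
```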

68 citations


Journal ArticleDOI
TL;DR: An information-theoretic normalized Mutual Information (nMI)-based minimum Redundancy Maximum Relevance (mRMR) non-linear measure to select the intrinsic features from the transformed space of the previously proposed Segmented-Folded-PCA (Seg-Fol-PCA) and Spectrally-Segmented-Folded-PCA (SSeg-Fol-PCA) FE methods.
Abstract: Hyperspectral image (HSI) usually holds information of land cover classes as a set of many contiguous narrow spectral wavelength bands. For its efficient thematic mapping or classification, band (f...

68 citations


Journal ArticleDOI
TL;DR: In this article, the authors trace the journey of the spectra themselves through the operations behind principal component analysis, with each step illustrated by simulated spectra; each PC is shown to be a successive refinement of the estimated spectra, improving the fit between the PC-reconstructed data and the original data.
Abstract: Spectroscopy rapidly captures a large amount of data that is not directly interpretable. Principal component analysis is widely used to simplify complex spectral datasets into comprehensible information by identifying recurring patterns in the data with minimal loss of information. The linear algebra underpinning principal component analysis is not well understood by many applied analytical scientists and spectroscopists who use it. The meaning of features identified through principal component analysis is often unclear. This manuscript traces the journey of the spectra themselves through the operations behind principal component analysis, with each step illustrated by simulated spectra. Principal component analysis relies solely on the information within the spectra; consequently, the mathematical model is dependent on the nature of the data itself. The direct links between model and spectra allow concrete spectroscopic explanation of principal component analysis, such as the scores representing "concentration" or "weights". The principal components (loadings) are by definition hidden, repeated and uncorrelated spectral shapes that linearly combine to generate the observed spectra. They can be visualized as subtraction spectra between extreme differences within the dataset. Each PC is shown to be a successive refinement of the estimated spectra, improving the fit between the PC-reconstructed data and the original data. Understanding the data-led development of a principal component analysis model shows how to interpret the application-specific chemical meaning of the principal component analysis loadings and how to analyze the scores. A critical benefit of principal component analysis is its simplicity and the succinctness of its description of a dataset, making it powerful and flexible.
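A small numpy sketch of the reconstruction idea described above: simulated mixture spectra are rebuilt from an increasing number of principal components, and the fit to the original data improves with each added PC. The peak positions and concentrations are arbitrary placeholders.

```python
# Reconstructing simulated spectra from an increasing number of principal
# components: the residual error shrinks as each PC refines the estimate.
import numpy as np

rng = np.random.default_rng(2)
wavelengths = np.linspace(0, 1, 200)
peaks = np.vstack([np.exp(-((wavelengths - c) / 0.05) ** 2) for c in (0.3, 0.5, 0.7)])
conc = rng.uniform(0, 1, size=(50, 3))                 # "concentrations"
spectra = conc @ peaks + rng.normal(scale=0.01, size=(50, 200))

mean_spec = spectra.mean(axis=0)
Xc = spectra - mean_spec
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)      # rows of Vt = loadings

for k in range(1, 5):
    recon = mean_spec + (U[:, :k] * s[:k]) @ Vt[:k]    # scores @ loadings
    rmse = np.sqrt(np.mean((recon - spectra) ** 2))
    print(f"{k} PC(s): reconstruction RMSE = {rmse:.4f}")
```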

63 citations


Journal ArticleDOI
TL;DR: The qualitative and quantitative recognition of various drug-producing chemicals has been realized, providing a new approach to on-line rapid sensor detection of drug-producing chemicals.
Abstract: With rampant drug crime, the detection of drug-producing chemicals faces great demand for multi-type, multi-concentration, on-line rapid detection. The rapid development of dynamic measurement for semiconductor gas sensors provides a solution to this problem. However, the mutual blending of type and concentration information negatively affects sensor performance. In this paper, principal component analysis (PCA) was used for weak separation of type and concentration; k-Nearest Neighbor (KNN) was used for qualitative recognition; polynomial regression was used for quantitative recognition. The physical meaning of the dynamic response signal after PCA transformation was first proposed: PC1 has a weak concentration meaning; the combination of PC2, PC3, and PC4 has a weak type meaning. Based on this weak separation, a stepwise recognition method of qualitative classification followed by quantitative regression was used for the first time to improve the recognition rate, resolution, and generalization performance of the sensor. Using the inverse transformation of PCA, the principle of PCA and ideal data verified the feasibility of this method. The qualitative and quantitative recognition of various drug-producing chemicals has been realized, providing a new approach to on-line rapid sensor detection of drug-producing chemicals.
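A sketch of the stepwise scheme described above on toy sensor responses: PCA on the dynamic responses, KNN on PC2 to PC4 for chemical type, and polynomial regression on PC1 for concentration. The response model, numbers of types, and polynomial degree are assumptions, not the paper's setup.

```python
# Stepwise recognition sketch: PCA on dynamic responses, KNN on PC2-PC4 for
# chemical type, polynomial regression on PC1 for concentration.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(3)
n, d = 150, 40                                     # samples x time points
types = rng.integers(0, 3, size=n)                 # three chemical types
conc = rng.uniform(1, 10, size=n)                  # concentrations
t = np.linspace(0, 1, d)
# Toy dynamic responses: shape depends on type, amplitude on concentration.
X = conc[:, None] * np.exp(-t[None, :] * (types[:, None] + 1)) \
    + rng.normal(scale=0.05, size=(n, d))

pcs = PCA(n_components=4).fit_transform(X)

# Qualitative step: type from PC2-PC4.
knn = KNeighborsClassifier(n_neighbors=5).fit(pcs[:, 1:4], types)
print("Type accuracy:", knn.score(pcs[:, 1:4], types))

# Quantitative step: concentration from PC1 via polynomial regression.
coeffs = np.polyfit(pcs[:, 0], conc, deg=2)
pred_conc = np.polyval(coeffs, pcs[:, 0])
print("Concentration RMSE:", np.sqrt(np.mean((pred_conc - conc) ** 2)).round(3))
```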

59 citations


Journal ArticleDOI
TL;DR: In this article, the bagging ensemble learning method with a decision tree achieved the best performance in predicting heart disease, which is the deadliest disease and one of the leading causes of death worldwide.
Abstract: Heart disease is the deadliest disease and one of the leading causes of death worldwide. Machine learning is playing an essential role in the medical field. In this paper, ensemble learning methods are used to enhance the performance of predicting heart disease. Two feature extraction methods, linear discriminant analysis (LDA) and principal component analysis (PCA), are used to select essential features from the dataset. A comparison between machine learning algorithms and ensemble learning methods is applied to the selected features. Different metrics are used to evaluate the models: accuracy, recall, precision, F-measure, and ROC. The results show that the bagging ensemble learning method with a decision tree achieved the best performance.
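A hedged sketch of the PCA-plus-bagging combination described above, assuming scikit-learn; the synthetic data stands in for a heart-disease dataset, and the number of components and trees are placeholder choices.

```python
# PCA feature extraction followed by a bagging ensemble of decision trees
# (BaggingClassifier's default base learner is a decision tree).
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import BaggingClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_validate

X, y = make_classification(n_samples=500, n_features=13, n_informative=8,
                           random_state=0)   # placeholder for a heart dataset

model = make_pipeline(
    StandardScaler(),
    PCA(n_components=8),
    BaggingClassifier(n_estimators=100, random_state=0),
)
res = cross_validate(model, X, y, cv=5,
                     scoring=["accuracy", "recall", "precision", "f1", "roc_auc"])
for key, values in res.items():
    if key.startswith("test_"):
        print(key, round(values.mean(), 3))
```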

Journal ArticleDOI
TL;DR: In this article, a novel intelligent diagnosis method based on multisensor fusion (MSF) and a convolutional neural network (CNN) is explored; results show that the proposed method outperforms other DL-based methods in terms of accuracy.
Abstract: Diagnosis of mechanical faults in manufacturing systems is critical for ensuring safety and saving costs. With the development of data transmission and sensor technologies, measuring systems can acquire massive amounts of multi-sensor data. Although Deep-Learning (DL) provides an end-to-end way to address the drawbacks of traditional methods, it is necessary to do deep research on an intelligent fault diagnosis method based on Multi-Sensor Data. In this project, a novel intelligent diagnosis method based on Multi-Sensor Fusion (MSF) and Convolutional Neural Network (CNN) is explored. Firstly, a Multi-Signals-to-RGB-Image conversion method based on Principal Component Analysis (PCA) is applied to fuse multi-signal data into three-channel RGB images. Then, an improved CNN with residual networks is proposed, which can balance the relationship between computational cost and accuracy. Two datasets are used to verify the effectiveness of the proposed method. The results show the proposed method outperforms other DL-based methods in terms of accuracy.
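A minimal sketch of the fusion idea described above: project windowed multi-sensor samples onto three principal components and rescale each component to the 0-255 range as an RGB channel. The window size and sensor count are placeholders, and this is only an illustration of the Multi-Signals-to-RGB-Image concept, not the paper's exact conversion.

```python
# PCA-based multi-signal fusion into a three-channel image ready for a CNN.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(4)
n_sensors, n_samples = 8, 64 * 64                    # one analysis window per image
signals = rng.normal(size=(n_samples, n_sensors))    # placeholder sensor data

rgb = PCA(n_components=3).fit_transform(signals)     # (n_samples, 3)
rgb -= rgb.min(axis=0)
rgb /= rgb.max(axis=0)                               # scale each channel to [0, 1]
image = (rgb * 255).astype(np.uint8).reshape(64, 64, 3)

print(image.shape, image.dtype)                      # (64, 64, 3) uint8
```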

Journal ArticleDOI
TL;DR: In this paper, the potential of using PCA for dimensionality reduction is illustrated on several real-world datasets, and several theoretical and practical aspects of PCA are reported in an accessible and integrated manner.
Abstract: Principal component analysis (PCA) is often applied for analyzing data in the most diverse areas. This work reports, in an accessible and integrated manner, several theoretical and practical aspects of PCA. The basic principles underlying PCA, data standardization, possible visualizations of the PCA results, and outlier detection are subsequently addressed. Next, the potential of using PCA for dimensionality reduction is illustrated on several real-world datasets. Finally, we summarize PCA-related approaches and other dimensionality reduction techniques. All in all, the objective of this work is to assist researchers from the most diverse areas in using and interpreting PCA.

Journal ArticleDOI
TL;DR: A recursive algorithm is proposed in this article, which is shown to have lower computational complexity, along with a randomized algorithm to determine the width of the sliding window.
Abstract: This paper proposes a new combination of correlative statistical analysis and the sliding window technique to detect incipient faults. Compared with existing monitoring methods based on principal component analysis and transformed component analysis, the combination fully uses the information from the process and quality variables. The sliding window, however, inevitably increases the computational burden due to the repeated window calculations. Therefore, a recursive algorithm is proposed in this paper, which is shown to have lower computational complexity. Furthermore, a randomized algorithm is proposed to determine the width of the sliding window. A numerical example and the thermal power plant process are presented to show the effectiveness and advantages of the proposed method.
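A plain-numpy sketch of why a recursive window update avoids repeated full computations: the window sum and sum of outer products are updated as one sample leaves and one enters, and the window covariance is recovered from them. This illustrates the general recursive idea, not the paper's specific algorithm.

```python
# Recursive sliding-window statistics: update the window sum and sum of outer
# products when one sample leaves and one enters, avoiding full recomputation.
import numpy as np

rng = np.random.default_rng(5)
data = rng.normal(size=(500, 4))        # process/quality variables over time
w = 50                                  # window width

window = data[:w]
S1 = window.sum(axis=0)                 # running sum
S2 = window.T @ window                  # running sum of outer products

for t in range(w, len(data)):
    x_old, x_new = data[t - w], data[t]
    S1 += x_new - x_old
    S2 += np.outer(x_new, x_new) - np.outer(x_old, x_old)

    mean = S1 / w
    cov = (S2 - w * np.outer(mean, mean)) / (w - 1)   # unbiased window covariance

# Check against a direct computation on the final window.
direct = np.cov(data[len(data) - w:], rowvar=False)
print("Max deviation from direct covariance:", np.abs(cov - direct).max())
```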

Journal ArticleDOI
TL;DR: This paper proposes four improved RF methods that first reduce the amount of training data and select the leading kernel principal components using different kernel principal component analysis (KPCA)-based dimensionality reduction schemes.
Abstract: Random Forest (RF) is one of the mostly used machine learning techniques in fault detection and diagnosis of industrial systems. However, its implementation suffers from certain drawbacks when considering the correlations between variables. In addition, to perform a fault detection and diagnosis, the classical RF only uses the raw data by the direct use of measured variables. The direct raw data could yield to poor performance due to the data redundancies and noises. Thus, this paper proposes four improved RF methods to overcome the above-mentioned limitations. The developed methods aim to reduce at first the amount of the training data and select the first kernel principal components (KPCs) using different kernel principal component analysis (PCA) based dimensionality reduction schemes. Then, the retained KPCs are fed to the RF classifier for fault diagnosis purposes. Finally, the proposed techniques are applied to a wind energy conversion (WEC) system. Different case studies were investigated in order to illustrate the effectiveness and robustness of the developed techniques compared to the state-of-the-art methods. The obtained results show the low computation time and high diagnosis accuracy of the proposed approaches (an average accuracy of 91%).
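One KPCA-then-RF variant in miniature, assuming scikit-learn; the synthetic data stands in for WEC sensor measurements, and the kernel, number of components, and forest size are placeholder choices rather than the paper's settings.

```python
# Extract kernel principal components, then train a random forest on the
# retained components, as in the KPCA + RF scheme described above.
from sklearn.datasets import make_classification
from sklearn.decomposition import KernelPCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=600, n_features=20, n_informative=6,
                           random_state=0)   # stand-in for WEC sensor data

model = make_pipeline(
    StandardScaler(),
    KernelPCA(n_components=10, kernel="rbf", gamma=0.05),
    RandomForestClassifier(n_estimators=200, random_state=0),
)
print("Mean CV accuracy:", cross_val_score(model, X, y, cv=5).mean().round(3))
```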

Journal ArticleDOI
TL;DR: The convergence of the outlier eigenvalues and of the generalized components of the outlier eigenvectors (i.e. the principal components) of the spiked separable covariance model $\widetilde{\mathcal{Q}}_1$ is proved with optimal convergence rates.
Abstract: We study a class of separable sample covariance matrices of the form $\widetilde{Q}_1 := \widetilde{A}^{1/2} X \widetilde{B} X^* \widetilde{A}^{1/2}$. Here, $\widetilde{A}$ and $\widetilde{B}$ are positive definite matrices whose spectrums consist of bulk spectrums plus several spikes, that is, larger eigenvalues that are separated from the bulks. Conceptually, we call $\widetilde{Q}_1$ a spiked separable covariance matrix model. On the one hand, this model includes the spiked covariance matrix as a special case with $\widetilde{B}=I$. On the other hand, it allows for more general correlations of datasets. In particular, for spatio-temporal datasets, $\widetilde{A}$ and $\widetilde{B}$ represent the spatial and temporal correlations, respectively. In this paper, we study the outlier eigenvalues and eigenvectors, that is, the principal components, of the spiked separable covariance model $\widetilde{Q}_1$. We prove the convergence of the outlier eigenvalues $\widetilde{\lambda}_i$ and the generalized components (i.e., $\langle \mathbf{v}, \widetilde{\xi}_i \rangle$ for any deterministic vector $\mathbf{v}$) of the outlier eigenvectors $\widetilde{\xi}_i$ with optimal convergence rates. Moreover, we also prove the delocalization of the nonoutlier eigenvectors. We state our results in full generality, in the sense that they also hold near the so-called BBP transition and for degenerate outliers. Our results highlight both the similarity and difference between the spiked separable covariance matrix model and the spiked covariance matrix model in (Probab. Theory Related Fields 164 (2016) 459–552). In particular, we show that the spikes of both $\widetilde{A}$ and $\widetilde{B}$ will cause outliers of the eigenvalue spectrum, and the eigenvectors can help to select the outliers that correspond to the spikes of $\widetilde{A}$ (or $\widetilde{B}$).

Journal ArticleDOI
Baigang Du, Qiliang Zhou, Jun Guo, Shunsheng Guo, Lei Wang
TL;DR: In this article, the authors propose a hybrid long short-term memory model combined with discrete wavelet transform (DWT) and principal component analysis (PCA) pre-processing techniques for water demand forecasting, i.e., DWT-PCA-LSTM.
Abstract: A reliable and accurate urban water demand forecast plays a significant role in building an intelligent water supply system and smart city. Due to the high-frequency noise and complicated relationships in water demand series, forecasting urban water demand is not an easy task. In order to improve the model's ability to handle complex patterns and catch the peaks in time series, we propose a hybrid long short-term memory model combined with discrete wavelet transform (DWT) and principal component analysis (PCA) pre-processing techniques for water demand forecasting, i.e., DWT-PCA-LSTM. First, the outliers of the water demand series are identified and smoothed by the 3σ criterion and a weighted average method, respectively. Then, the noise component of the water demand series is eliminated by the DWT method and the principal components (PCs) among the influencing factors of water demand are selected by the PCA method. In addition, two LSTM networks are built to yield the daily urban water demand predictions using the results of the DWT and PCA techniques. At last, the superiority of the proposed model is demonstrated by comparing with other benchmark predictive models. The water demand from 2016 to 2020 of a waterworks located in Suzhou, China is used for the experiment. The predictive performance of the experiments is evaluated by the mean absolute percentage error (MAPE), mean absolute percentage error of peaks (pMAPE), explained variance score (EVS) and correlation coefficient (R). The results show that the DWT-PCA-LSTM model outperforms the other models and has satisfactory performance both in catching the peaks and in average prediction accuracy.
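A hedged sketch of the preprocessing chain described above only: 3σ outlier smoothing, DWT-based denoising via PyWavelets, and PCA on influencing factors. The two LSTM forecasting networks are omitted, the data are synthetic placeholders, and the wavelet, level, and threshold rule are assumptions rather than the paper's settings.

```python
# Preprocessing sketch: 3-sigma outlier smoothing, DWT soft-threshold
# denoising (PyWavelets), and PCA on the influencing factors.
import numpy as np
import pywt
from sklearn.decomposition import PCA

rng = np.random.default_rng(6)
demand = np.sin(np.linspace(0, 20, 512)) * 100 + 500 + rng.normal(0, 5, 512)
factors = rng.normal(size=(512, 8))              # placeholder influencing factors

# 1) 3-sigma rule: replace outliers with a local average of normal neighbours.
mu, sigma = demand.mean(), demand.std()
outliers = np.abs(demand - mu) > 3 * sigma
smoothed = demand.copy()
for i in np.where(outliers)[0]:
    lo, hi = max(i - 3, 0), min(i + 4, len(demand))
    smoothed[i] = demand[lo:hi][~outliers[lo:hi]].mean()

# 2) DWT denoising: soft-threshold the detail coefficients.
coeffs = pywt.wavedec(smoothed, "db4", level=3)
sigma_n = np.median(np.abs(coeffs[-1])) / 0.6745          # noise level estimate
thr = sigma_n * np.sqrt(2 * np.log(len(smoothed)))
coeffs[1:] = [pywt.threshold(c, thr, mode="soft") for c in coeffs[1:]]
denoised = pywt.waverec(coeffs, "db4")[:len(smoothed)]

# 3) PCA on the influencing factors, keeping the leading components.
pcs = PCA(n_components=3).fit_transform(factors)
print(denoised.shape, pcs.shape)   # inputs that would feed the LSTM networks
```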

Journal ArticleDOI
TL;DR: In this paper, the authors compared the performance of support vector regression, relevance vector regression and Gaussian process regression on whole-brain region-based or voxel-based structural magnetic resonance imaging data with or without dimensionality reduction through principal component analysis.
Abstract: Brain morphology varies across the ageing trajectory and the prediction of a person's age using brain features can aid the detection of abnormalities in the ageing process. Existing studies on such "brain age prediction" vary widely in terms of their methods and type of data, so at present the most accurate and generalisable methodological approach is unclear. Therefore, we used the UK Biobank data set (N = 10,824, age range 47-73) to compare the performance of the machine learning models support vector regression, relevance vector regression and Gaussian process regression on whole-brain region-based or voxel-based structural magnetic resonance imaging data with or without dimensionality reduction through principal component analysis. Performance was assessed in the validation set through cross-validation as well as an independent test set. The models achieved mean absolute errors between 3.7 and 4.7 years, with those trained on voxel-level data with principal component analysis performing best. Overall, we observed little difference in performance between models trained on the same data type, indicating that the type of input data had greater impact on performance than model choice. All code is provided online in the hope that this will aid future research.
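In the spirit of the comparison described above, a small scikit-learn sketch contrasting a regression model with and without PCA-based dimensionality reduction; only support vector regression is shown, the features are synthetic, and nothing here reproduces the UK Biobank analysis.

```python
# Support vector regression on high-dimensional features, with and without a
# PCA dimensionality-reduction step, scored by mean absolute error.
from sklearn.datasets import make_regression
from sklearn.decomposition import PCA
from sklearn.svm import SVR
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score

X, age = make_regression(n_samples=400, n_features=300, n_informative=30,
                         noise=5.0, random_state=0)   # stand-in imaging features

with_pca = make_pipeline(StandardScaler(), PCA(n_components=50), SVR())
without_pca = make_pipeline(StandardScaler(), SVR())

for name, model in [("with PCA", with_pca), ("without PCA", without_pca)]:
    mae = -cross_val_score(model, X, age, cv=5,
                           scoring="neg_mean_absolute_error").mean()
    print(f"{name}: MAE = {mae:.2f}")
```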

Journal ArticleDOI
TL;DR: A novel spectral-spatial SuperPCA (S³-PCA) method is proposed to learn effective and low-dimensional features of HSIs; experiments demonstrate the superiority of the proposed method over state-of-the-art methods.
Abstract: As the most classical unsupervised dimension reduction algorithm, principal component analysis (PCA) has been widely used in hyperspectral images (HSIs) preprocessing and analysis tasks. Recently proposed superpixelwise PCA (SuperPCA) has shown promising accuracy where superpixels segmentation technique was first used to segment an HSI to various homogeneous regions and then PCA was adopted in each superpixel block to extract the local features. However, the local features could be ineffective due to the neglect of global information especially in some small homogeneous regions and/or in some large homogeneous regions with mixed ground truth objects. In this article, a novel spectral-spatial and SuperPCA (S³-PCA) is proposed to learn the effective and low-dimensional features of HSIs. Inspired by SuperPCA we further adopt superpixels-based local reconstruction to filter the HSIs and use the PCA-based global features as the supplement of local features. It turns out that the global-local and spectral-spatial features can be well exploited. Specifically, each pixel of an HSI is reconstructed by the nearest neighbors' pixels in the same superpixel block, which could eliminate the noise and enhance the spatial information adaptively. After the local reconstruction-based data preprocessing, PCA is performed on each region and the entire HSI to obtain local and global features, respectively. Then we simply concatenate them to get the global-local and spectral-spatial features for HSIs classification. The experimental results on two HSIs data sets demonstrate the superiority of the proposed method over the state-of-the-art methods. The source code of the proposed model is available at https://github.com/XinweiJiang/S3-PCA.

Journal ArticleDOI
TL;DR: A preprocessing method to remove noisy features is coupled with prediction methods to improve the performance of energy consumption prediction models; results show that the proposed method enables practitioners to efficiently acquire a smart dataset from any big dataset for energy consumption prediction problems.

Journal ArticleDOI
TL;DR: An effective method based on three-order tensor creation and Tucker decomposition (TCTD) is proposed, which detects targets with various brightness, spatial sizes, and intensities and ensures that targets can be preserved on the remaining minor principal components.
Abstract: Existing infrared small-target detection methods tend to perform unsatisfactorily when encountering complex scenes, mainly due to the following: 1) the infrared image itself has a low signal-to-noise ratio (SNR) and insufficient detailed/texture knowledge; 2) spatial and structural information is not fully excavated. To avoid these difficulties, an effective method based on three-order tensor creation and Tucker decomposition (TCTD) is proposed, which detects targets with various brightness, spatial sizes, and intensities. In the proposed TCTD, multiple morphological profiles, i.e., diverse attributes and different shapes of trees, are designed to create three-order tensors, which can exploit more spatial and structural information to make up for lacking detailed/texture knowledge. Then, Tucker decomposition is employed, which is capable of estimating and eliminating the major principal components (i.e., most of the background) from three dimensions. Thus, targets can be preserved on the remaining minor principal components. Image contrast is further enhanced by fusing the detection maps of multiple morphological profiles and several groups with discontinuous pruning values. Extensive experiments validated on two synthetic data and six real data sets demonstrate the effectiveness and robustness of the proposed TCTD.
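A plain-numpy sketch of the underlying idea: build a three-order tensor, project it onto the major principal components of each mode unfolding (a truncated higher-order SVD, computed manually rather than with a Tucker library call), and keep the residual where a small target survives. The tensor construction from morphological profiles is not reproduced; the data and ranks below are placeholders.

```python
# Background suppression sketch: major components along each tensor mode are
# removed and the small bright "target" remains in the residual.
import numpy as np

rng = np.random.default_rng(7)
H, W, P = 64, 64, 6                       # height x width x (placeholder) profiles
background = np.outer(np.linspace(0.5, 1.5, H), np.linspace(1.0, 2.0, W))
tensor = np.stack([background * (p + 1) for p in range(P)], axis=2)
tensor += rng.normal(scale=0.01, size=tensor.shape)
tensor[30:32, 40:42, :] += 1.0            # small bright "target"

def mode_n_product(T, M, mode):
    """Multiply tensor T by matrix M along the given mode."""
    Tm = np.moveaxis(T, mode, 0)
    out = M @ Tm.reshape(Tm.shape[0], -1)
    return np.moveaxis(out.reshape((M.shape[0],) + Tm.shape[1:]), 0, mode)

def major_subspace(T, mode, rank):
    """Leading left singular vectors of the mode-n unfolding."""
    unfold = np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)
    U, _, _ = np.linalg.svd(unfold, full_matrices=False)
    return U[:, :rank]

# Major (background) part: projection onto a rank-(1, 1, 1) subspace.
approx = tensor
for mode in range(3):
    U = major_subspace(tensor, mode, rank=1)
    approx = mode_n_product(approx, U @ U.T, mode)

residual = tensor - approx                # minor components: the target stands out
peak = np.unravel_index(np.abs(residual).argmax(), residual.shape)
print("Strongest residual at (row, col, profile):", peak)
```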

Journal ArticleDOI
TL;DR: For handling the nonlinearity problem that is extremely common in industrial processes, this paper combines artificial neural networks (ANN) with principal component analysis (PCA).
Abstract: Nonlinearity is extremely common in industrial processes. For handling the nonlinearity problem, this paper combines artificial neural networks (ANN) with principal component analysis (PCA) and pro...

Journal ArticleDOI
TL;DR: In this article, the authors consider 8 different optimization formulations for computing a single sparse loading vector; these are obtained by combining the following factors: they employ two norms for measuring variance (L2, L1) and two sparsityinducing norms (L0, L 1), which are used in two different ways (constraint, penalty) and give a unifying reformulation which is solved via a natural alternating maximization (AM) method.
Abstract: Given a multivariate data set, sparse principal component analysis (SPCA) aims to extract several linear combinations of the variables that together explain the variance in the data as much as possible, while controlling the number of nonzero loadings in these combinations. In this paper we consider 8 different optimization formulations for computing a single sparse loading vector; these are obtained by combining the following factors: we employ two norms for measuring variance (L2, L1) and two sparsity-inducing norms (L0, L1), which are used in two different ways (constraint, penalty). Three of our formulations, notably the one with L0 constraint and L1 variance, have not been considered in the literature. We give a unifying reformulation which we propose to solve via a natural alternating maximization (AM) method. We show that the AM
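The abstract cuts off above, but the constrained formulation it mentions can be illustrated with a simple thresholded power iteration for a single sparse loading vector under an L0 (cardinality) constraint. This is a generic sketch of the idea, not the authors' exact alternating maximization algorithm.

```python
# Extracting one sparse loading vector: power iteration on the sample
# covariance with a hard-threshold (keep-k / L0 constraint) step.
import numpy as np

rng = np.random.default_rng(8)
n, p, k = 200, 20, 5                      # samples, variables, nonzeros allowed
X = rng.normal(size=(n, p))
X[:, :3] += 3 * rng.normal(size=(n, 1))   # a few correlated, high-variance columns
S = np.cov(X, rowvar=False)

v = rng.normal(size=p)
v /= np.linalg.norm(v)
for _ in range(100):
    w = S @ v                              # power step (increase explained variance)
    keep = np.argsort(np.abs(w))[-k:]      # keep the k largest loadings (L0 constraint)
    w_sparse = np.zeros(p)
    w_sparse[keep] = w[keep]
    v = w_sparse / np.linalg.norm(w_sparse)

print("Nonzero loadings:", np.flatnonzero(v))
print("Explained variance:", float(v @ S @ v))
```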

Journal ArticleDOI
TL;DR: Well log data from the Volve Field are used for validation of the prediction obtained by random forest, in which a high correlation coefficient between prediction and reference is achieved.

Journal ArticleDOI
TL;DR: The work presented in this paper aims to reduce the amount of data that needs to be transmitted by extracting descriptive features of the voltage and then reducing the number of features.

Journal ArticleDOI
TL;DR: A water quality prediction model utilizing the principal component regression technique and the Gradient Boosting Classifier method, which shows credible performance compared with state-of-the-art models.

Journal ArticleDOI
TL;DR: An efficient method of kernel sample equivalence replacement is established to replace the partial differential operations of the kernel gradient algorithm, which can convert nonlinear fault detection indicators into standard quadratic forms of the original variable sample, thereby making it possible to solve the nonlinear fault diagnosis problem by linear means.
Abstract: This article is devoted to solving the problem of quality-related root cause diagnosis for nonlinear processes. First, an orthogonal kernel principal component regression model is constructed to achieve orthogonal decomposition of the feature space, such that quality-related and quality-unrelated faults can be separately detected in subspaces of opposite correlations to the output, without any effect on each other. Then, in view of the high complexity of traditional nonlinear fault diagnosis methods, an efficient method of kernel sample equivalence replacement is established to replace the partial differential operations of the kernel gradient algorithm, which can convert nonlinear fault detection indicators into standard quadratic forms of the original variable sample, thereby making it possible to solve the nonlinear fault diagnosis problem by linear means. Furthermore, a transfer entropy algorithm is applied to the new model to analyze the causality between the diagnosed candidate faulty variables and find the accurate root cause of the fault. Finally, comparative studies between the latest results and the proposed method are carried out on the Tennessee Eastman process to verify the effectiveness and superiority of the new method.

Journal ArticleDOI
TL;DR: In this paper, a superpixel-wise adaptive singular spectrum analysis (SpaSSA) was proposed for hyperspectral image (HSI) feature extraction, where the SSA and 2D-SSA are combined and adaptively applied to each superpixel derived from an oversegmented HSI.
Abstract: Singular spectrum analysis (SSA) has recently been successfully applied to feature extraction in hyperspectral images (HSI), including conventional (1-D) SSA in the spectral domain and 2-D SSA in the spatial domain. However, there are some drawbacks, such as sensitivity to the window size, high computational complexity under a large window, and failure to extract joint spectral-spatial features. To tackle these issues, in this article we propose superpixelwise adaptive SSA (SpaSSA) for exploiting local spatial information of HSI. The extraction of local (instead of global) features, particularly in HSI, can be more effective for characterizing the objects within an image. In SpaSSA, conventional SSA and 2-D SSA are combined and adaptively applied to each superpixel derived from an oversegmented HSI. According to the size of the derived superpixels, either SSA or 2-D singular spectrum analysis (2D-SSA) is adaptively applied for feature extraction, where the embedding window in 2D-SSA is also adaptive to the size of the superpixel. Experimental results on three datasets have shown that the proposed SpaSSA outperforms both SSA and 2D-SSA in terms of classification accuracy and computational complexity. By combining SpaSSA with principal component analysis (SpaSSA-PCA), the accuracy of land-cover analysis can be further improved, outperforming several state-of-the-art approaches.
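A plain-numpy sketch of 1-D SSA on a toy signal (embedding, SVD, grouping, diagonal averaging), which is the building block adapted per superpixel above; the superpixel segmentation and the 2-D variant are not reproduced, and the window size and grouping are placeholder choices.

```python
# Plain 1-D singular spectrum analysis: Hankel embedding, SVD, grouping of the
# leading components, and diagonal averaging back to a 1-D series.
import numpy as np

rng = np.random.default_rng(9)
N, L = 200, 40                                   # signal length, window size
t = np.arange(N)
signal = np.sin(2 * np.pi * t / 25) + 0.5 * rng.normal(size=N)

K = N - L + 1
traj = np.column_stack([signal[i:i + L] for i in range(K)])   # L x K trajectory matrix
U, s, Vt = np.linalg.svd(traj, full_matrices=False)

# Group the two leading components (one sinusoid ~ a pair of singular triples).
recon_traj = (U[:, :2] * s[:2]) @ Vt[:2]

# Diagonal averaging (Hankelization) back to a 1-D series.
recon = np.zeros(N)
counts = np.zeros(N)
for j in range(K):
    recon[j:j + L] += recon_traj[:, j]
    counts[j:j + L] += 1
recon /= counts

print("RMSE to the noise-free sinusoid:",
      np.sqrt(np.mean((recon - np.sin(2 * np.pi * t / 25)) ** 2)).round(3))
```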

Journal ArticleDOI
TL;DR: Light is shed on the versatility and capability of PCA in processing high-dimensional data in general, and specifically on its potential in forensic studies, to help other research communities interested in unravelling latent structure from high-dimensional data.

Journal ArticleDOI
Yao Duan, Chuanchuan Yang, Chen Hao, Weizhen Yan, Hongbin Li
TL;DR: A new noise reduction method to filter LiDAR point clouds, i.e. an adaptive clustering method based on principal component analysis (PCA), which has low computational complexity and effectively removes noise while retaining details of environmental features.
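As a rough illustration of PCA-based point-cloud denoising (not the paper's adaptive clustering method), the sketch below eigen-decomposes each point's neighbourhood covariance and flags points with high "surface variation" as likely noise; the plane data, neighbourhood size, and threshold are assumptions.

```python
# Local-PCA noise criterion for point clouds: the smallest-eigenvalue ratio of
# each neighbourhood covariance is near zero on flat surfaces and large for
# scattered noise points.
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(10)
plane = np.column_stack([rng.uniform(0, 10, 2000), rng.uniform(0, 10, 2000),
                         rng.normal(0, 0.02, 2000)])        # points near a plane
noise = rng.uniform(0, 10, size=(100, 3))                    # scattered outliers
cloud = np.vstack([plane, noise])

k = 20
_, idx = NearestNeighbors(n_neighbors=k).fit(cloud).kneighbors(cloud)

variation = np.empty(len(cloud))
for i, nb in enumerate(idx):
    nbh = cloud[nb] - cloud[nb].mean(axis=0)
    eigvals = np.linalg.eigvalsh(np.cov(nbh, rowvar=False))  # ascending order
    variation[i] = eigvals[0] / eigvals.sum()                # ~0 on flat surfaces

keep = variation < 0.05                                      # threshold is a guess
print("Points kept:", keep.sum(), "of", len(cloud))
```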