
Showing papers on "Principal component analysis" published in 2020


Journal ArticleDOI
TL;DR: Two prominent dimensionality reduction techniques, Linear Discriminant Analysis (LDA) and Principal Component Analysis (PCA), are investigated with four popular Machine Learning (ML) algorithms on the publicly available Cardiotocography dataset from the University of California, Irvine Machine Learning Repository; the results show that PCA outperforms LDA on all measures.
Abstract: Due to digitization, a huge volume of data is being generated across several sectors such as healthcare, production, sales, IoT devices, the Web, and organizations. Machine learning algorithms are used to uncover patterns among the attributes of this data. Hence, they can be used to make predictions that help medical practitioners and people at the managerial level make executive decisions. Not all the attributes in the generated datasets are important for training the machine learning algorithms: some attributes might be irrelevant and some might not affect the outcome of the prediction. Ignoring or removing these irrelevant or less important attributes reduces the burden on machine learning algorithms. In this work, two of the prominent dimensionality reduction techniques, Linear Discriminant Analysis (LDA) and Principal Component Analysis (PCA), are investigated on four popular Machine Learning (ML) algorithms, Decision Tree Induction, Support Vector Machine (SVM), Naive Bayes Classifier and Random Forest Classifier, using the publicly available Cardiotocography (CTG) dataset from the University of California, Irvine Machine Learning Repository. The experimental results show that PCA outperforms LDA in all the measures. Also, the performance of the Decision Tree and Random Forest classifiers is not affected much by using PCA or LDA. To further analyze the performance of PCA and LDA, the experimentation is carried out on the Diabetic Retinopathy (DR) and Intrusion Detection System (IDS) datasets. The results show that ML algorithms with PCA produce better results when the dimensionality of the datasets is high. When the dimensionality of the datasets is low, it is observed that the ML algorithms without dimensionality reduction yield better results.
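
As a rough, hedged illustration of the comparison described above (not the authors' code), the following scikit-learn sketch evaluates the four classifiers with PCA and LDA preprocessing; the breast-cancer dataset stands in for the CTG data, and the component counts are arbitrary assumptions.

```python
# Hedged sketch: compare PCA vs. LDA preprocessing for four classifiers.
# The breast-cancer dataset is a stand-in for the CTG data; component
# counts and hyperparameters are illustrative, not the paper's settings.
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

classifiers = {
    "DecisionTree": DecisionTreeClassifier(random_state=0),
    "SVM": SVC(),
    "NaiveBayes": GaussianNB(),
    "RandomForest": RandomForestClassifier(random_state=0),
}
reducers = {
    "PCA": PCA(n_components=10),
    # LDA keeps at most (n_classes - 1) components; 1 for a binary problem.
    "LDA": LinearDiscriminantAnalysis(n_components=1),
}

for red_name, reducer in reducers.items():
    for clf_name, clf in classifiers.items():
        pipe = make_pipeline(StandardScaler(), reducer, clf)
        score = cross_val_score(pipe, X, y, cv=5).mean()
        print(f"{red_name} + {clf_name:<12}: {score:.3f}")
```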

414 citations


Journal ArticleDOI
TL;DR: In this article, a combination of principal component analysis (PCA) and convolutional neural networks (CNN) is used to predict the entire stress-strain behavior of binary composites evaluated over the entire failure path.

186 citations


Journal ArticleDOI
TL;DR: The experimental results showed that combining chi-square feature selection with PCA yields higher performance for most classifiers, whereas applying PCA directly to the raw data produced lower results and would require a larger number of dimensions to improve them.
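
A hedged sketch of the combination described in the TL;DR, as it might be wired up in scikit-learn (the digits dataset, k, and the component count are placeholder assumptions, not the paper's setup):

```python
# Hedged sketch of chi-square feature selection followed by PCA.
# Dataset, k, and n_components are illustrative placeholders.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)  # non-negative features, as chi2 requires

pipe = make_pipeline(
    SelectKBest(chi2, k=40),  # filter weakly relevant features first
    PCA(n_components=20),     # then decorrelate/compress the survivors
    SVC(),
)
print(cross_val_score(pipe, X, y, cv=5).mean())
```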

180 citations


Journal ArticleDOI
TL;DR: A deep feedforward method is developed to classify the given microarray cancer data into a set of classes for subsequent diagnosis, using a 7-layer deep neural network architecture with parameters chosen for each dataset.

138 citations


Journal ArticleDOI
TL;DR: This work aims at finding links between cognitive symptoms and the underlying neurodegeneration process by fusing the information of neuropsychological test outcomes, diagnoses, and other clinical data with the imaging features extracted solely via a data-driven decomposition of MRI.
Abstract: Many classical machine learning techniques have been used to explore Alzheimer's disease (AD), evolving from image decomposition techniques such as principal component analysis toward higher complexity, non-linear decomposition algorithms. With the arrival of the deep learning paradigm, it has become possible to extract high-level abstract features directly from MRI images that internally describe the distribution of data in low-dimensional manifolds. In this work, we try a new exploratory data analysis of AD based on deep convolutional autoencoders. We aim at finding links between cognitive symptoms and the underlying neurodegeneration process by fusing the information of neuropsychological test outcomes, diagnoses, and other clinical data with the imaging features extracted solely via a data-driven decomposition of MRI. The distribution of the extracted features in different combinations is then analyzed and visualized using regression and classification analysis, and the influence of each coordinate of the autoencoder manifold over the brain is estimated. The imaging-derived markers could then predict clinical variables with correlations above 0.6 in the case of neuropsychological evaluation variables such as the MMSE or the ADAS11 scores, achieving a classification accuracy over 80% for the diagnosis of AD.
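
As a loose illustration of the overall approach (a convolutional autoencoder whose latent coordinates are then related to clinical variables), here is a minimal PyTorch sketch; the architecture, image size, training length, and the synthetic stand-in data are all assumptions, not the paper's configuration.

```python
# Hedged PyTorch sketch: train a small convolutional autoencoder, then
# regress a clinical score on the latent coordinates. Synthetic 64x64
# "slices" and random scores stand in for MRI data and MMSE/ADAS11 values.
import torch
import torch.nn as nn
import torch.nn.functional as F
from sklearn.linear_model import LinearRegression

class ConvAutoencoder(nn.Module):
    def __init__(self, latent_dim=8):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 8, 3, stride=2, padding=1),   # 64x64 -> 32x32
            nn.ReLU(),
            nn.Conv2d(8, 16, 3, stride=2, padding=1),  # 32x32 -> 16x16
            nn.ReLU(),
            nn.Flatten(),
            nn.Linear(16 * 16 * 16, latent_dim),
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 16 * 16 * 16),
            nn.Unflatten(1, (16, 16, 16)),
            nn.ConvTranspose2d(16, 8, 3, stride=2, padding=1, output_padding=1),
            nn.ReLU(),
            nn.ConvTranspose2d(8, 1, 3, stride=2, padding=1, output_padding=1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), z

images = torch.rand(200, 1, 64, 64)        # stand-in for MRI slices
scores = torch.rand(200).numpy()           # stand-in for a neuropsychological score

model = ConvAutoencoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(5):                         # a few epochs, purely illustrative
    recon, _ = model(images)
    loss = F.mse_loss(recon, images)
    opt.zero_grad()
    loss.backward()
    opt.step()

with torch.no_grad():
    _, latents = model(images)
# Relate the autoencoder manifold coordinates to the clinical variable.
reg = LinearRegression().fit(latents.numpy(), scores)
print("R^2 on training data:", reg.score(latents.numpy(), scores))
```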

124 citations


Journal ArticleDOI
TL;DR: A novel cross-attention mechanism and graph convolution integration algorithm is proposed that achieves better performance than other well-known algorithms under different training-set divisions, yielding improved hyperspectral data classification results.
Abstract: An attention mechanism assigns different weights to different features to help a model select the features most valuable for accurate classification. However, the traditional attention mechanism algorithm often allocates weights in a one-way fashion, which can result in a loss of feature information. To obtain better hyperspectral data classification results, a novel cross-attention mechanism and graph convolution integration algorithm are proposed in this letter. First, principal component analysis is used to reduce the dimensionality of hyperspectral images to obtain low-dimensional features that are more expressive. Second, the model uses a cross (horizontal and vertical directions) attention algorithm to allocate weights jointly based on its two strategies; then, it adopts a graph convolution algorithm to generate the directional relationships between the features. Finally, the generated deep features and the relationship between the deep features are used to complete the prediction of hyperspectral data. Experiments on three well-known hyperspectral data sets--Indian Pines, the University of Pavia, and Salinas--show that the proposed algorithm achieves better performances than do other well-known algorithms using different methods of training set division.
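
The PCA step described first in the abstract can be sketched as follows (a hedged illustration only; the random cube stands in for a real scene such as Indian Pines, and the component count is arbitrary):

```python
# Hedged sketch of the first step only: PCA dimensionality reduction of a
# hyperspectral cube. The random array stands in for a real scene
# (e.g. 145 x 145 pixels x 200 bands); n_components is arbitrary.
import numpy as np
from sklearn.decomposition import PCA

cube = np.random.rand(145, 145, 200)      # height x width x spectral bands
h, w, bands = cube.shape

pixels = cube.reshape(-1, bands)          # one row per pixel spectrum
pca = PCA(n_components=30)
reduced = pca.fit_transform(pixels)       # (h*w, 30) low-dimensional features
reduced_cube = reduced.reshape(h, w, -1)  # back to a spatial layout

print(reduced_cube.shape, pca.explained_variance_ratio_.sum())
```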

117 citations


Journal ArticleDOI
TL;DR: In this paper, an augmented geographically weighted regression (GWR) model was developed to analyze the spatial distribution of PM2.5 concentrations through the incorporation of Geodetector analysis and principal component analysis (PCA).

105 citations


Journal ArticleDOI
TL;DR: This work proposes a new formulation of logistic PCA which extends Pearson’s formulation of a low dimensional data representation with minimum error to binary data and derives explicit solutions for data matrices of special structure and provides a computationally efficient algorithm for solving for the principal component loadings.

99 citations


Journal ArticleDOI
TL;DR: Experimental results on several real hyperspectral data sets demonstrate that the proposed method outperforms other state-of-the-art methods.
Abstract: In this article, a novel hyperspectral anomaly detection method with kernel Isolation Forest (iForest) is proposed. The method is based on an assumption that anomalies rather than background can be more susceptible to isolation in the kernel space. Based on this idea, the proposed method detects anomalies as follows. First, the hyperspectral data are mapped into the kernel space, and the first K principal components are used. Then, the isolation samples in the image are detected with the iForest constructed using randomly selected samples in the principal components. Finally, the initial anomaly detection map is iteratively refined with locally constructed iForest in connected regions with large areas. Experimental results on several real hyperspectral data sets demonstrate that the proposed method outperforms other state-of-the-art methods.
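
A loose scikit-learn analogue of this pipeline (kernel mapping, leading principal components, isolation-based scoring) might look like the sketch below; it omits the iterative local refinement step and uses random data as a stand-in for a real scene.

```python
# Hedged sketch: kernel PCA projection followed by Isolation Forest scoring,
# loosely mirroring the detection pipeline above (the iterative refinement in
# connected regions is omitted). Random data stands in for a real scene.
import numpy as np
from sklearn.decomposition import KernelPCA
from sklearn.ensemble import IsolationForest

cube = np.random.rand(50, 50, 50)                 # H x W x bands stand-in
pixels = cube.reshape(-1, cube.shape[-1])

# Map spectra into a kernel feature space and keep the first K components.
kpca = KernelPCA(n_components=20, kernel="rbf", gamma=0.1)
features = kpca.fit_transform(pixels)

# Isolation Forest built from randomly subsampled pixels; higher = more anomalous.
iforest = IsolationForest(n_estimators=100, max_samples=256, random_state=0)
iforest.fit(features)
anomaly_score = -iforest.score_samples(features)
detection_map = anomaly_score.reshape(50, 50)
print(detection_map.shape)
```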

95 citations


Journal ArticleDOI
TL;DR: This work successfully implements spatial PCA to reduce signal dimensionality and selects suitable features based on t-statistical inference among the classes, achieving a highly efficient brain-computer interface (BCI) system for emotion recognition from electroencephalogram signals.

86 citations


Journal ArticleDOI
TL;DR: This study provides a highly robust and accurate method for predicting and mapping regional SOC contents and indicates that at a low decomposition scale, DWT can effectively eliminate the noise in satellite hyperspectral data, and the FDR combined with DWT can improve the SOC prediction accuracy significantly.

Journal ArticleDOI
TL;DR: An advanced approach, the t-Distributed Stochastic Neighbor Embedding (t-SNE) algorithm, is introduced to the ink analysis problem; it extracts the non-linear similarity features between spectra to scale them into a lower dimension.

Journal ArticleDOI
TL;DR: It is found that gap-filling uncertainty is much larger than measurement uncertainty in the accumulated CH4 budget, and therefore the approach used for FCH4 gap filling can have important implications for characterizing annual ecosystem-scale methane budgets.
Abstract: Methane flux (FCH4) measurements using the eddy covariance technique have increased over the past decade. FCH4 measurements commonly include data gaps, as is the case with CO2 and energy fluxes. However, gap-filling FCH4 data is more challenging than for other fluxes due to FCH4's unique characteristics, including multidriver dependency, variabilities across multiple timescales, nonstationarity, spatial heterogeneity of flux footprints, and lagged influence of biophysical drivers. Some researchers have applied a marginal distribution sampling (MDS) algorithm, a standard gap-filling method for other fluxes, to FCH4 datasets, and others have applied artificial neural networks (ANN) to resolve the challenging characteristics of FCH4. However, there is still no consensus regarding FCH4 gap-filling methods due to limited comparative research. We are not aware of applications of machine learning (ML) algorithms beyond ANN to FCH4 datasets. Here, we compare the performance of MDS and three ML algorithms (ANN, random forest [RF], and support vector machine [SVM]) using multiple combinations of ancillary variables. In addition, we applied principal component analysis (PCA) as an input to the algorithms to address multidriver dependency of FCH4 and reduce the internal complexity of the algorithmic structures. We applied this approach to five benchmark FCH4 datasets from both natural and managed systems located in temperate and tropical wetlands and rice paddies. Results indicate that PCA improved the performance of MDS compared to traditional inputs. ML algorithms performed better when using all available biophysical variables compared to using PCA-derived inputs. Overall, RF was found to outperform other techniques for all sites. We found that gap-filling uncertainty is much larger than measurement uncertainty in the accumulated CH4 budget. Therefore, the approach used for FCH4 gap filling can have important implications for characterizing annual ecosystem-scale methane budgets, the accuracy of which is important for evaluating natural and managed systems and their interactions with global change processes.
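
One of the compared configurations (PCA-compressed drivers feeding a random forest, then predicting the gaps) can be sketched roughly as below; the synthetic drivers and flux series are placeholders, not the benchmark datasets.

```python
# Hedged sketch of one compared configuration: compress ancillary drivers
# with PCA, train a random forest on observed FCH4, and predict the gaps.
# Synthetic drivers/fluxes replace a real eddy-covariance record.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 5000
drivers = rng.normal(size=(n, 8))       # e.g. temperature, radiation, water table...
fch4 = drivers @ rng.normal(size=8) + rng.normal(scale=0.5, size=n)
gaps = rng.random(n) < 0.3              # ~30% of flux values missing

scores = PCA(n_components=4).fit_transform(drivers)   # decorrelated driver inputs

rf = RandomForestRegressor(n_estimators=200, random_state=0)
rf.fit(scores[~gaps], fch4[~gaps])      # train on observed periods
filled = fch4.copy()
filled[gaps] = rf.predict(scores[gaps]) # fill the gaps
print("gap-filled budget:", filled.sum())
```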

Journal ArticleDOI
TL;DR: A robust and scalable SPCA algorithm is demonstrated by formulating it as a value-function optimization problem, which can further leverage randomized methods from linear algebra to extend the approach to the large-scale (big data) setting.
Abstract: Sparse principal component analysis (SPCA) has emerged as a powerful technique for modern data analysis, providing improved interpretation of low-rank structures by identifying localized spatial st...
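
For orientation, scikit-learn's SparsePCA gives a rough feel for the sparse, localized loadings that SPCA targets; note its solver is not the value-function/randomized formulation described above.

```python
# Hedged sketch: sparse principal components via scikit-learn. This solver
# differs from the paper's value-function formulation; it only illustrates
# the sparse loadings that make SPCA more interpretable than plain PCA.
import numpy as np
from sklearn.decomposition import PCA, SparsePCA

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 50))

spca = SparsePCA(n_components=5, alpha=1.0, random_state=0).fit(X)
print("nonzero loadings per sparse component:",
      (spca.components_ != 0).sum(axis=1))

pca = PCA(n_components=5).fit(X)
print("nonzero loadings per dense PCA component:",
      (np.abs(pca.components_) > 1e-12).sum(axis=1))
```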

Journal ArticleDOI
30 Mar 2020-Sensors
TL;DR: It is observed that the proposed combined path loss and shadowing model is more accurate and flexible compared to the conventional linear path loss plus log-normal shadowing model.
Abstract: Although various linear log-distance path loss models have been developed for wireless sensor networks, advanced models are required to more accurately and flexibly represent the path loss for complex environments. This paper proposes a machine learning framework for modeling path loss using a combination of three key techniques: artificial neural network (ANN)-based multi-dimensional regression, Gaussian process-based variance analysis, and principal component analysis (PCA)-aided feature selection. In general, the measured path loss dataset comprises multiple features such as distance, antenna height, etc. First, PCA is adopted to reduce the number of features of the dataset and simplify the learning model accordingly. ANN then learns the path loss structure from the dataset with reduced dimension, and Gaussian process learns the shadowing effect. Path loss data measured in a suburban area in Korea are employed. We observe that the proposed combined path loss and shadowing model is more accurate and flexible compared to the conventional linear path loss plus log-normal shadowing model.
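
A toy version of the three-stage framework (PCA-reduced features, an ANN for the mean path loss, a Gaussian process for the shadowing residual) might be assembled as follows; the synthetic measurements and all hyperparameters are assumptions, not the paper's setup.

```python
# Hedged toy version of the three-stage framework: PCA feature reduction,
# ANN regression for the mean path loss, and a Gaussian process for the
# shadowing residual. Synthetic data replaces the suburban measurements.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 1000
features = rng.normal(size=(n, 6))       # e.g. log-distance, antenna heights, ...
path_loss = 40 + 20 * features[:, 0] + rng.normal(scale=4, size=n)  # dB

X = PCA(n_components=3).fit_transform(StandardScaler().fit_transform(features))

ann = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000, random_state=0)
ann.fit(X, path_loss)                    # learns the mean path loss structure

residual = path_loss - ann.predict(X)    # remaining shadowing component
gp = GaussianProcessRegressor(alpha=1.0).fit(X[:200], residual[:200])
mean_shadow, std_shadow = gp.predict(X[:5], return_std=True)
print(std_shadow)                        # location-dependent shadowing spread
```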

Journal ArticleDOI
TL;DR: A benchmark shows that some PCA algorithms based on Krylov subspace methods and randomized singular value decomposition are fast, memory-efficient, and more accurate than the other algorithms, and a guideline is developed to select an appropriate PCA implementation based on differences in the computational environments of users and developers.
Abstract: Principal component analysis (PCA) is an essential method for analyzing single-cell RNA-seq (scRNA-seq) datasets, but for large-scale scRNA-seq datasets, computation time is long and consumes large amounts of memory. In this work, we review the existing fast and memory-efficient PCA algorithms and implementations and evaluate their practical application to large-scale scRNA-seq datasets. Our benchmark shows that some PCA algorithms based on Krylov subspace and randomized singular value decomposition are fast, memory-efficient, and more accurate than the other algorithms. We develop a guideline to select an appropriate PCA implementation based on the differences in the computational environment of users and developers.
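
In practice, a randomized-SVD reduction of the kind favored by this benchmark can be run on a large sparse expression matrix with scikit-learn; the random count matrix below is a stand-in for a real cells-by-genes dataset, and TruncatedSVD skips centering so that the matrix can stay sparse.

```python
# Hedged sketch: randomized-SVD dimensionality reduction on a large sparse
# count matrix, the class of algorithm the benchmark finds fast and
# memory-efficient. The random matrix stands in for real scRNA-seq data.
import numpy as np
import scipy.sparse as sp
from sklearn.decomposition import TruncatedSVD

counts = sp.random(50_000, 2_000, density=0.05, random_state=0, format="csr")
counts.data = np.log1p(counts.data)      # crude log-normalization of nonzeros

# TruncatedSVD with the randomized solver works directly on sparse input
# (no dense copy, no centering), unlike an exact full PCA.
svd = TruncatedSVD(n_components=50, algorithm="randomized", random_state=0)
cell_embedding = svd.fit_transform(counts)
print(cell_embedding.shape, svd.explained_variance_ratio_[:5])
```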

Journal ArticleDOI
TL;DR: In this article, principal component analysis (PCA) was applied to angle-resolved XPS spectra for thermally oxidized nickel titanium alloy to determine the ratio of various oxides in the mixture.

Journal ArticleDOI
TL;DR: A machine learning framework to explore the predictability limits of catalytic activity from experimental descriptor data (which characterizes catalyst formulations and reaction conditions) is presented.
Abstract: We present a machine learning framework to explore the predictability limits of catalytic activity from experimental descriptor data (which characterizes catalyst formulations and reaction conditions). Artificial neural networks are used to fuse descriptor data to predict activity and we use principal component analysis (PCA) and sparse PCA to project the experimental data into an information space and with this identify regions that exhibit low- and high-predictability. Our framework also incorporates a constrained-PCA optimization formulation that identifies new experimental points while filtering out regions in the experimental space due to constraints on technology, economics, and expert knowledge. This allows us to navigate the experimental space in a more targeted manner. Our framework is applied to a comprehensive water–gas shift reaction data set, which contains 2228 experimental data points collected from the literature. Neural network analysis reveals strong predictability of activity across reaction conditions (e.g., varying temperature) but also reveals important gaps in predictability across catalyst formulations (e.g., varying metal, support, and promoter). PCA analysis reveals that these gaps are due to the fact that most experiments reported in the literature lie within narrow regions in the information space. We demonstrate that our framework can systematically guide experiments and the selection of descriptors in order to improve predictability and identify new promising formulations.

Journal ArticleDOI
TL;DR: The objectives of this study were to compare the performance of a nonlinear dimensionality reduction technique with that of a standard linear method for single-subject EMG-based hand movement classification, and to examine their performance with a limited number of training samples.
Abstract: Surface electromyography (EMG) is a non-invasive signal acquisition technique that plays a central role in many applications, including clinical diagnostics, control of prosthetic devices, and human-machine interaction. The processing typically begins with a feature extraction step, which may be followed by the application of a dimensionality reduction technique. The obtained reduced features are the input for a machine learning classifier. The constructed machine learning model may then classify new recorded movements. The features extracted from EMG signals usually capture information from both the time and the frequency domain. The short-time Fourier transform (STFT) is commonly used for signal processing, and in particular for EMG processing, since it captures the temporal and the frequency characteristics of the data. Since the number of calculated STFT features is large, a common approach in signal processing and machine learning applications is to apply a linear or a nonlinear dimensionality reduction technique to simplify the feature space. Another aspect that arises in medical applications in general, and in EMG-based hand movement classification in particular, is the large variability between subjects. Due to this variability, many studies focus on single-subject classification. This requires acquiring a large training set for each tested participant, which is not practical in real-life applications. The objectives of this study were, first, to compare the performance of a nonlinear dimensionality reduction technique with that of a standard linear method for single-subject EMG-based hand movement classification, and to examine their performance with a limited number of training samples. The second objective was to propose an algorithm for multi-subject classification that utilizes a data alignment step to overcome the large variability between subjects. The data set included EMG signals from 5 subjects who performed 6 different hand movements. The STFT was calculated for feature extraction, and principal component analysis (PCA) and diffusion maps (DM) were compared for dimensionality reduction. An affine transformation for aligning the reduced feature spaces of two subjects was investigated. K-nearest neighbors (KNN) was used for single- and multi-subject classification. The results of this study clearly show that DM outperformed PCA in the case of limited training data. In addition, the multi-subject classification approach, which utilizes dimensionality reduction methods along with an alignment algorithm, enables robust classification of a new subject based on other subjects' data sets. The proposed framework is general and can be adopted for many EMG classification tasks.
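
The single-subject pipeline (STFT features, linear dimensionality reduction, KNN) reduces to a few lines in the sketch below; synthetic signals replace the recorded EMG, and a diffusion-maps embedding would substitute for PCA in the nonlinear variant compared in the paper.

```python
# Hedged sketch of the single-subject pipeline: STFT magnitude features per
# trial, PCA for dimensionality reduction, KNN classification. Synthetic
# signals stand in for real EMG; diffusion maps would replace PCA in the
# nonlinear variant.
import numpy as np
from scipy.signal import stft
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
fs = 1000                                        # sampling rate (Hz), assumed
n_trials, n_samples = 120, 2000
signals = rng.normal(size=(n_trials, n_samples)) # fake EMG trials
labels = rng.integers(0, 6, size=n_trials)       # 6 hand movements

def stft_features(trial):
    _, _, Z = stft(trial, fs=fs, nperseg=256)
    return np.abs(Z).ravel()                     # magnitude spectrogram as features

X = np.array([stft_features(s) for s in signals])

pipe = make_pipeline(StandardScaler(), PCA(n_components=20), KNeighborsClassifier(5))
print(cross_val_score(pipe, X, labels, cv=5).mean())   # ~chance on fake data
```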

Journal ArticleDOI
TL;DR: In this paper, the authors developed an estimator for latent factors in a large-dimensional panel of financial data that can explain expected excess returns. Their estimator can find weak asset-pricing factors, which cannot be detected with PCA, even if a large amount of data is available.

Journal ArticleDOI
TL;DR: A novel dynamic weight principal component analysis (DWPCA) algorithm and a hierarchical monitoring strategy are proposed to further increase the fault detection rate while preserving the universality of the algorithm.
Abstract: Traditional monitoring algorithms use only normal data for modeling, which makes them universal for different types of faults. However, these algorithms may sometimes perform poorly because of the lack of fault information. In order to further increase the fault detection rate while preserving the universality of the algorithm, a novel dynamic weight principal component analysis (DWPCA) algorithm and a hierarchical monitoring strategy are proposed. In the first layer, dynamic PCA is used for fault detection and diagnosis; if no fault is detected, the DWPCA-based second-layer monitoring is triggered. In the second layer, the principal components (PCs) are weighted according to their ability to distinguish between normal and fault conditions, and the PCs with larger weights are selected to construct the monitoring model. Compared to the DPCA method, the proposed DWPCA algorithm establishes the monitoring model by incorporating fault information. Afterward, the DWPCA-based variable relative contribution and a novel control limit for the variable relative contribution are presented for fault diagnosis. Finally, the superiority of the proposed method is demonstrated on a numerical case and the Tennessee Eastman process.
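
A heavily simplified illustration of the second-layer idea (weighting principal components by how well they separate normal and faulty data, then monitoring with a T²-like statistic on the selected PCs) is given below; it is not the authors' DWPCA algorithm, and the weights and control limit are ad hoc.

```python
# Hedged, heavily simplified illustration: score each principal component by
# how well it separates normal and faulty data, keep the highest-weighted
# PCs, and monitor with a T^2-like statistic. Not the authors' DWPCA.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
normal = rng.normal(size=(500, 10))
fault = rng.normal(size=(200, 10))
fault[:, 2] += 2.0                      # fault shifts one underlying variable

pca = PCA().fit(normal)
t_normal = pca.transform(normal)
t_fault = pca.transform(fault)

# Weight = Fisher-like separation of the score distributions per PC.
weights = (t_normal.mean(0) - t_fault.mean(0)) ** 2 / (
    t_normal.var(0) + t_fault.var(0))
keep = np.argsort(weights)[::-1][:3]    # PCs most sensitive to this fault

lam = pca.explained_variance_[keep]
t2 = lambda scores: np.sum(scores[:, keep] ** 2 / lam, axis=1)
limit = np.quantile(t2(t_normal), 0.99) # empirical control limit
print("fault detection rate:", np.mean(t2(t_fault) > limit))
```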

Journal ArticleDOI
TL;DR: This paper explores dimensionality reduction on a real telecom dataset and evaluates customer clustering in the reduced and latent spaces, compared to the original space, in order to achieve better-quality clustering results.
Abstract: Telecom companies log customer actions, which generates a huge amount of data that can yield important findings related to customer behavior and needs. The main characteristics of such data are the large number of features and the high sparsity, which impose challenges on the analytics steps. This paper aims to explore dimensionality reduction on a real telecom dataset and to evaluate customer clustering in the reduced and latent spaces, compared to the original space, in order to achieve better-quality clustering results. The original dataset contains 220 features belonging to 100,000 customers. Dimensionality reduction is an important data preprocessing step in the data mining process, especially in the presence of the curse of dimensionality. In particular, the aim of data reduction techniques is to filter out irrelevant features and noisy data samples. To reduce the high-dimensional data, we projected it down to a subspace using the well-known Principal Component Analysis (PCA) decomposition and a novel approach based on an autoencoder neural network, performing in this way dimensionality reduction of the original data. Then K-Means clustering is applied to both the original and the reduced data sets. Different internal measures were computed to evaluate the clustering for different numbers of dimensions, and we then evaluated how the reduction method impacts the clustering task.
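
The PCA branch of that evaluation loop could look like the sketch below; an autoencoder bottleneck would replace PCA for the latent-space variant, and the random matrix stands in for the 220-feature customer dataset.

```python
# Hedged sketch of the PCA branch: reduce the customer matrix to several
# dimensionalities, cluster with K-Means, compare silhouette scores. An
# autoencoder bottleneck would supply the latent-space variant; random
# sparse-ish counts stand in for the real 220-feature dataset.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
customers = rng.poisson(0.2, size=(5000, 220)).astype(float)
X = StandardScaler().fit_transform(customers)

for n_dim in (220, 50, 10):             # original space vs. reduced spaces
    Z = X if n_dim == 220 else PCA(n_components=n_dim).fit_transform(X)
    labels = KMeans(n_clusters=6, n_init=10, random_state=0).fit_predict(Z)
    score = silhouette_score(Z, labels, sample_size=2000, random_state=0)
    print(n_dim, "dims -> silhouette:", round(score, 3))
```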

Journal ArticleDOI
TL;DR: This paper combines multi-strategy feature selection and grouped feature extraction into a novel fast hybrid dimension reduction method that incorporates their respective advantages in removing irrelevant and redundant information, reducing the dimensionality of the raw data quickly.
Abstract: Dimensionality reduction is a basic and critical technology for data mining, especially in the current "big data" era. As two different types of methods, feature selection and feature extraction each have their pros and cons. In this paper, we combine multi-strategy feature selection with grouped feature extraction and propose a novel fast hybrid dimension reduction method, incorporating their advantages in removing irrelevant and redundant information. Firstly, the intrinsic dimensionality of the data set is estimated by the maximum likelihood estimation method. Fisher Score and Information Gain based feature selection are used as the multi-strategy methods to remove irrelevant features. With the redundancy among the selected features as the clustering criterion, the features are grouped into a certain number of clusters. In every cluster, Principal Component Analysis (PCA) based feature extraction is carried out to remove redundant information. Four classical classifiers and representation entropy are used to evaluate the classification performance and information loss of the reduced set. The runtime results of different methods show that the proposed hybrid method is consistently much faster than the other three on almost all of the sets used. Meanwhile, the proposed method shows competitive classification performance, with essentially no significant difference from the other methods. The proposed method reduces the dimensionality of the raw data quickly and has excellent efficiency and competitive classification performance compared with the contrastive methods.
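
A rough sketch of the pipeline's main stages under simplified choices is shown below: mutual information stands in for the Fisher-score/Information-Gain pair, correlation distance drives the redundancy grouping, and the intrinsic-dimension estimate is fixed by hand rather than by maximum likelihood.

```python
# Hedged sketch of the main stages with simplified stand-ins: relevance
# filtering, redundancy-based grouping of features, and per-group PCA.
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, mutual_info_classif

X, y = load_breast_cancer(return_X_y=True)

# 1) Remove irrelevant features with a relevance filter.
selector = SelectKBest(mutual_info_classif, k=20).fit(X, y)
X_sel = selector.transform(X)

# 2) Group redundant features: hierarchical clustering on correlation distance.
corr = np.corrcoef(X_sel, rowvar=False)
dist = 1 - np.abs(corr)
Z = linkage(dist[np.triu_indices_from(dist, k=1)], method="average")
groups = fcluster(Z, t=5, criterion="maxclust")   # 5 groups, chosen by hand

# 3) Within each group, keep one principal component to drop redundancy.
parts = [PCA(n_components=1).fit_transform(X_sel[:, groups == g])
         for g in np.unique(groups)]
X_reduced = np.hstack(parts)
print(X_reduced.shape)                            # (n_samples, number of groups)
```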

Posted ContentDOI
25 Aug 2020-bioRxiv
TL;DR: A generative model is developed to simulate synthetic datasets with multivariate associations and is used to characterize how the obtained feature profiles can be unstable, hindering interpretability and generalizability, unless a sufficient number of samples is available to estimate them.
Abstract: Associations between high-dimensional datasets, each comprising many features, can be discovered through multivariate statistical methods, like Canonical Correlation Analysis (CCA) or Partial Least Squares (PLS). CCA and PLS are widely used methods which reveal which features carry the association. Despite the longevity and popularity of CCA/PLS approaches, their application to high-dimensional datasets raises critical questions about the reliability of CCA/PLS solutions. In particular, overfitting can produce solutions that are not stable across datasets, which severely hinders their interpretability and generalizability. To study these issues, we developed a generative model to simulate synthetic datasets with multivariate associations, parameterized by feature dimensionality, data variance structure, and assumed latent association strength. We found that resulting CCA/PLS associations could be highly inaccurate when the number of samples per feature is relatively small. For PLS, the profiles of feature weights exhibit detrimental bias toward leading principal component axes. We confirmed these model trends in state-of-the-art datasets containing neuroimaging and behavioral measurements in large numbers of subjects, namely the Human Connectome Project (n ≈ 1000) and UK Biobank (n = 20000), where we found that only the latter comprised enough samples to obtain stable estimates. Analysis of the neuroimaging literature using CCA to map brain-behavior relationships revealed that the commonly employed sample sizes yield unstable CCA solutions. Our generative modeling framework provides a calculator of dataset properties required for stable estimates. Collectively, our study characterizes dataset properties needed to limit the potentially detrimental effects of overfitting on stability of CCA/PLS solutions, and provides practical recommendations for future studies. Significance Statement: Scientific studies often begin with an observed association between different types of measures. When datasets comprise large numbers of features, multivariate approaches such as canonical correlation analysis (CCA) and partial least squares (PLS) are often used. These methods can reveal the profiles of features that carry the optimal association. We developed a generative model to simulate data, and characterized how obtained feature profiles can be unstable, which hinders interpretability and generalizability, unless a sufficient number of samples is available to estimate them. We determine sufficient sample sizes, depending on properties of datasets. We also show that these issues arise in neuroimaging studies of brain-behavior relationships. We provide practical guidelines and computational tools for future CCA and PLS studies.
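
The core stability check can be mimicked in a few lines: fit CCA on two independent samples at each sample size and compare the estimated weight vectors. The simulation below is far simpler than the paper's generative model and is only meant to show the qualitative effect of sample size.

```python
# Hedged sketch of the stability check: at each sample size, fit CCA on two
# independent draws from the same model and correlate the fitted weights.
# The simulation is much simpler than the paper's generative model.
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
p, q = 50, 50
wx_true = rng.normal(size=(1, p))        # fixed loading patterns
wy_true = rng.normal(size=(1, q))

def simulate(n):
    latent = rng.normal(size=(n, 1))     # one shared latent variable
    X = latent @ wx_true + rng.normal(size=(n, p))
    Y = latent @ wy_true + rng.normal(size=(n, q))
    return X, Y

def fitted_weights(n):
    X, Y = simulate(n)
    return CCA(n_components=1).fit(X, Y).x_weights_.ravel()

for n in (100, 1000, 10000):
    w1, w2 = fitted_weights(n), fitted_weights(n)
    stability = abs(np.corrcoef(w1, w2)[0, 1])   # near 1 = reproducible weights
    print(f"n={n:>5}: weight similarity {stability:.2f}")
```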

Journal ArticleDOI
TL;DR: A new method about the multi-fault condition monitoring of slurry pump based on principal component analysis (PCA) and sequential probability ratio test (SPRT) is proposed.
Abstract: A new method about the multi-fault condition monitoring of slurry pump based on principal component analysis (PCA) and sequential probability ratio test (SPRT) is proposed. The method identifies th...

Journal ArticleDOI
TL;DR: The objective of this review is to demonstrate the analytical performance of High Resolution Mass Spectrometry (HRMS) in the field of food authenticity assessment, allowing the determination of a wide range of food constituents with exceptional identification capabilities.

Journal ArticleDOI
TL;DR: This study provides a novel methodology to predict monthly water demand based on several weather variables scenarios by using combined techniques including discrete wavelet transform, principal component analysis, and particle swarm optimisation.
Abstract: This study provides a novel methodology to predict monthly water demand based on several weather-variable scenarios by using combined techniques including the discrete wavelet transform, principal component analysis, and particle swarm optimisation. To our knowledge, the adopted approach is the first such technique to be proposed and applied to water demand prediction. Compared to traditional methods, the developed methodology is superior in terms of predictive accuracy and runtime. Water consumption data coupled with weather variables for the City of Melbourne, from 2006 to 2015, were obtained from the South East Water retail company. The results showed that using data pre-processing techniques can significantly improve the quality of the data and help select the best model input scenario. Additionally, it was noticed that the particle swarm optimisation algorithm accurately estimates the constants of the suggested model. Furthermore, the results confirmed that the proposed methodology accurately estimated the monthly municipal water demand based on a range of statistical criteria.

Journal ArticleDOI
TL;DR: A novel reformulation of L1-norm kernel PCA is provided through which an equivalent, geometrically interpretable problem is obtained and a “fixed-point” type algorithm that iteratively computes a binary weight for each observation is presented.
Abstract: We present an algorithm for L1-norm kernel PCA and provide a convergence analysis for it. While an optimal solution of L2-norm kernel PCA can be obtained through matrix decomposition, finding that of L1-norm kernel PCA is not trivial due to its non-convexity and non-smoothness. We provide a novel reformulation through which an equivalent, geometrically interpretable problem is obtained. Based on the geometric interpretation of the reformulated problem, we present a "fixed-point" type algorithm that iteratively computes a binary weight for each observation. As the algorithm requires only inner products of data vectors, it is computationally efficient and the kernel trick is applicable. In the convergence analysis, we show that the algorithm converges to a local optimal solution in a finite number of steps. Moreover, we provide a rate of convergence analysis, which has never been done for any L1-norm PCA algorithm, proving that the sequence of objective values converges at a linear rate. In numerical experiments, we show that the algorithm is robust in the presence of entry-wise perturbations and computationally scalable, especially in a large-scale setting. Lastly, we introduce an application to outlier detection where the model based on the proposed algorithm outperforms the benchmark algorithms.
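
For the linear (non-kernel) case, this style of fixed-point iteration, which assigns a binary weight sign(w·x_i) to each observation at every step, reduces to a few lines; the sketch below is a generic L1-PCA iteration under that interpretation, not the paper's kernelized algorithm.

```python
# Hedged sketch of the linear case: a fixed-point iteration that assigns a
# binary weight sign(w.x_i) per observation and converges to a local
# maximizer of the L1 dispersion ||Xw||_1 subject to ||w|| = 1. This is a
# generic L1-PCA iteration, not the paper's kernel algorithm.
import numpy as np

def l1_pca_direction(X, n_iter=100, seed=0):
    """First L1 principal direction of (robustly centered) X, shape (n, d)."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=X.shape[1])
    w /= np.linalg.norm(w)
    for _ in range(n_iter):
        signs = np.sign(X @ w)            # binary weight per observation
        signs[signs == 0] = 1.0
        w_new = X.T @ signs
        w_new /= np.linalg.norm(w_new)
        if np.allclose(w_new, w):         # fixed point reached
            break
        w = w_new
    return w

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
X[:5] += 20                               # a few gross outliers
X -= np.median(X, axis=0)                 # robust centering
print(l1_pca_direction(X))
```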

Journal ArticleDOI
TL;DR: The HPCA model markedly reduces the interference of redundant information and effectively separates the salient object from the background, and it achieves higher detection accuracy than other methods.
Abstract: Aiming at the problems of intensive background noise, low accuracy, and high computational complexity in current salient object detection methods, a visual saliency detection algorithm based on Hierarchical Principal Component Analysis (HPCA) is proposed in this paper. Firstly, the original RGB image is converted to a grayscale image, and the grayscale image is divided into eight layers by a bit-plane stratification technique. Each image layer contains salient object information matching that layer's image features. Secondly, taking the color structure of the original image as the reference image, the grayscale image is reassigned by a grayscale-to-color conversion method, so that the layered image not only reflects the original structural features but also effectively preserves the color features of the original image. Thirdly, Principal Component Analysis (PCA) is performed on the layered image to obtain the structural difference characteristics and color difference characteristics of each layer in the principal component direction. Fourthly, the two features are integrated to get a saliency map with high robustness; to further refine the results, known priors on image organization are incorporated, which place the subject of the photograph near the center of the image. Finally, an entropy calculation is used to determine the optimal image from the layered saliency maps; the optimal map has the least background information and the most prominent salient objects. The object detection results of the proposed model are closer to the ground truth and show advantages in performance measures including precision rate (PRE), recall rate (REC), and F-measure (FME). The HPCA model markedly reduces the interference of redundant information and effectively separates the salient object from the background. At the same time, it achieves higher detection accuracy than other methods.
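
Two of the early steps (bit-plane stratification into eight layers and a per-layer PCA) can be sketched as below; the feature fusion, center prior, and entropy-based selection described above are omitted, and the random image is a placeholder.

```python
# Hedged sketch of two early steps only: split a grayscale image into eight
# bit-plane layers and run PCA on each layer's 8x8 patch statistics. The
# fusion, priors, and entropy selection are omitted; the image is random.
import numpy as np
from sklearn.decomposition import PCA

gray = (np.random.rand(128, 128) * 255).astype(np.uint8)     # stand-in image

# Eight bit-plane layers: layer k keeps bit k of every pixel.
layers = [((gray >> k) & 1).astype(float) for k in range(8)]

for k, layer in enumerate(layers):
    patches = layer.reshape(16, 8, 16, 8).transpose(0, 2, 1, 3).reshape(-1, 64)
    pc1 = PCA(n_components=1).fit(patches)
    print(f"bit plane {k}: explained variance {pc1.explained_variance_ratio_[0]:.2f}")
```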

Journal ArticleDOI
TL;DR: The experiments on two hyperspectral data sets show that the LSSTRPCA can successfully remove outliers or gross errors and achieve higher accuracies than both the original robust principal component analysis (RPCA) and tensor robust principal component analysis (TRPCA).
Abstract: This letter proposes a lateral-slice sparse tensor robust principal component analysis (LSSTRPCA) method to remove gross errors or outliers from hyperspectral images so as to promote the performance of subsequent classification. The LSSTRPCA assumes that a third-order hyperspectral tensor has a low-rank structure, and gross errors or outliers are sparsely scattered in a 2-D space (i.e., lateral slice) of the tensor. It formulates a low-rank and sparse tensor decomposition problem as a convex problem and then implements the inexact augmented Lagrange multiplier method to solve it. The experiments on two hyperspectral data sets show that the LSSTRPCA can successfully remove outliers or gross errors and achieve higher accuracies than both the original robust principal component analysis (RPCA) and tensor robust principal component analysis (TRPCA).
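
For orientation, the matrix (non-tensor) baseline that LSSTRPCA generalizes, a low-rank plus sparse decomposition solved by an inexact augmented Lagrange multiplier scheme, can be written compactly; the sketch below is that generic matrix RPCA, not the lateral-slice tensor method.

```python
# Hedged sketch of the matrix baseline: robust PCA via an inexact augmented
# Lagrange multiplier scheme, splitting M into low-rank L plus sparse S.
# The paper's LSSTRPCA generalizes this to third-order tensors with
# lateral-slice sparsity.
import numpy as np

def rpca_ialm(M, lam=None, rho=1.5, n_iter=200, tol=1e-7):
    m, n = M.shape
    lam = lam if lam is not None else 1.0 / np.sqrt(max(m, n))
    norm_M = np.linalg.norm(M)
    mu = 1.25 / np.linalg.norm(M, 2)
    S = np.zeros_like(M)
    Y = M / max(np.linalg.norm(M, 2), np.abs(M).max() / lam)
    for _ in range(n_iter):
        # Singular-value thresholding for the low-rank part.
        U, sig, Vt = np.linalg.svd(M - S + Y / mu, full_matrices=False)
        L = (U * np.maximum(sig - 1.0 / mu, 0)) @ Vt
        # Soft thresholding for the sparse part.
        T = M - L + Y / mu
        S = np.sign(T) * np.maximum(np.abs(T) - lam / mu, 0)
        Y = Y + mu * (M - L - S)
        mu *= rho
        if np.linalg.norm(M - L - S) / norm_M < tol:
            break
    return L, S

rng = np.random.default_rng(0)
low_rank = rng.normal(size=(100, 5)) @ rng.normal(size=(5, 80))
sparse = (rng.random((100, 80)) < 0.05) * rng.normal(scale=10, size=(100, 80))
L, S = rpca_ialm(low_rank + sparse)
print(np.linalg.matrix_rank(L, tol=1e-3), int((np.abs(S) > 1e-6).sum()))
```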