
Showing papers on "Principal component analysis" published in 2021


Book
30 Sep 2021
TL;DR: This book covers exploratory data analysis, including linear and nonlinear dimensionality reduction, clustering (hierarchical methods, k-means optimization, model-based clustering), scatterplot smoothing, and graphical methods for EDA.
Abstract: Table of contents. INTRODUCTION TO EXPLORATORY DATA ANALYSIS: Introduction to Exploratory Data Analysis (What Is Exploratory Data Analysis; Overview of the Text; A Few Words about Notation; Data Sets Used in the Book; Transforming Data). EDA AS PATTERN DISCOVERY: Dimensionality Reduction - Linear Methods (Introduction; Principal Component Analysis (PCA); Singular Value Decomposition (SVD); Nonnegative Matrix Factorization; Factor Analysis; Fisher's Linear Discriminant; Intrinsic Dimensionality); Dimensionality Reduction - Nonlinear Methods (Multidimensional Scaling (MDS); Manifold Learning; Artificial Neural Network Approaches); Data Tours (Grand Tour; Interpolation Tours; Projection Pursuit; Projection Pursuit Indexes; Independent Component Analysis); Finding Clusters (Introduction; Hierarchical Methods; Optimization Methods - k-Means; Spectral Clustering; Document Clustering; Evaluating the Clusters); Model-Based Clustering (Overview of Model-Based Clustering; Finite Mixtures; Expectation-Maximization Algorithm; Hierarchical Agglomerative Model-Based Clustering; Model-Based Clustering; MBC for Density Estimation and Discriminant Analysis; Generating Random Variables from a Mixture Model); Smoothing Scatterplots (Introduction; Loess; Robust Loess; Residuals and Diagnostics with Loess; Smoothing Splines; Choosing the Smoothing Parameter; Bivariate Distribution Smooths; Curve Fitting Toolbox). GRAPHICAL METHODS FOR EDA: Visualizing Clusters (Dendrogram; Treemaps; Rectangle Plots; ReClus Plots; Data Image); Distribution Shapes (Histograms; Boxplots; Quantile Plots; Bagplots; Rangefinder Boxplot); Multivariate Visualization (Glyph Plots; Scatterplots; Dynamic Graphics; Coplots; Dot Charts; Plotting Points as Curves; Data Tours Revisited; Biplots). Appendix A: Proximity Measures; Appendix B: Software Resources for EDA; Appendix C: Description of Data Sets; Appendix D: Introduction to MATLAB; Appendix E: MATLAB Functions. References; Index. Summary, Further Reading, and Exercises appear at the end of each chapter.

320 citations


Journal ArticleDOI
TL;DR: Experimental results demonstrate that the hybrid filter model reduces data dimensions, selects appropriate feature sets, and reduces training time, hence providing better classification performance as measured by accuracy, precision and recall.
Abstract: Feature selection and classification have been widely applied in areas such as business, medicine, and media. High dimensionality in datasets is one of the main challenges in classifying data, data mining, and sentiment analysis. Irrelevant and redundant attributes also have a negative impact on the complexity and operation of classification algorithms; consequently, the algorithms record poor results or performance. Some existing work uses all attributes for classification, some of which are insignificant for the task, thereby leading to poor performance. This paper therefore develops a hybrid filter model for feature selection based on principal component analysis and information gain. The hybrid model is then applied to support classification using machine learning techniques, e.g. the Naive Bayes technique. Experimental results demonstrate that the hybrid filter model reduces data dimensions, selects appropriate feature sets, and reduces training time, hence providing better classification performance as measured by accuracy, precision, and recall.
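A minimal sketch of the hybrid filter idea described above, assuming scikit-learn; the Iris data, the three-feature cut-off, and the two-component PCA are placeholders rather than the paper's settings.

```python
# Hybrid filter sketch: rank features by information gain, project the
# retained features with PCA, then classify with Naive Bayes.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import mutual_info_classif
from sklearn.decomposition import PCA
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# Step 1: information gain (estimated as mutual information) per feature.
ig = mutual_info_classif(X, y, random_state=0)
keep = np.argsort(ig)[::-1][:3]          # keep the 3 most informative features

# Step 2: PCA on the retained features to remove redundancy.
X_reduced = PCA(n_components=2).fit_transform(X[:, keep])

# Step 3: Naive Bayes on the reduced representation.
scores = cross_val_score(GaussianNB(), X_reduced, y, cv=5)
print("Mean CV accuracy:", scores.mean())
```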

100 citations


Journal ArticleDOI
01 Feb 2021-Energy
TL;DR: The decomposition-ensemble learning model is an efficient and accurate model for wind energy forecasting, outperforming the CEEMD, STACK, and single models in all forecasting horizons, with performance improvements ranging from 0.06% to 97.53%.

98 citations


Journal ArticleDOI
TL;DR: A data-driven methodology for fault detection and diagnosis that integrates principal component analysis (PCA) with the Bayesian network (BN) and a combination of vine copula and Bayes' theorem; results suggest that the proposed framework provides superior performance.

86 citations


Journal ArticleDOI
TL;DR: This paper proposes a novel supervised learning technique for forecasting, scaled principal component analysis (sPCA), and shows that, under some appropriate conditions on the data, the sPCA forecast beats the PCA forecast; when these conditions break down, extensive simulations indicate that sPCA still has a good chance of outperforming PCA.
Abstract: This paper proposes a novel supervised learning technique for forecasting: scaled principal component analysis (sPCA). The sPCA improves the traditional principal component analysis (PCA) by scalin...
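The abstract is truncated here, but a common reading of scaled PCA is that each predictor is first scaled by its estimated slope from a univariate regression of the target on that predictor, and PCA is then applied to the scaled panel. The sketch below follows that reading on synthetic data and should not be taken as the authors' exact procedure.

```python
# Scaled PCA sketch: weight each predictor by its univariate predictive slope
# before extracting the principal components used for forecasting.
import numpy as np

rng = np.random.default_rng(0)
T, N = 200, 30
F = rng.normal(size=(T, 1))                        # latent factor
X = F @ rng.normal(size=(1, N)) + rng.normal(size=(T, N))
y = 0.8 * F[:, 0] + rng.normal(scale=0.5, size=T)  # target to forecast

Xc = X - X.mean(axis=0)
yc = y - y.mean()

# Univariate slopes of y on each predictor (the "scaling" step).
betas = (Xc * yc[:, None]).sum(axis=0) / (Xc ** 2).sum(axis=0)

# PCA on the slope-scaled predictors via SVD.
X_scaled = Xc * betas                              # scale each column by its slope
U, s, Vt = np.linalg.svd(X_scaled, full_matrices=False)
spca_factor = U[:, 0] * s[0]                       # first scaled principal component

print("Correlation of first sPCA factor with target:",
      np.corrcoef(spca_factor, y)[0, 1].round(3))
```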

74 citations


Journal ArticleDOI
TL;DR: Empirical wavelet transform (EWT) helped to explore the hidden patterns of MI tasks by decomposing EEG data into different modes and regularization parameter tuning of NCA guaranteed to improve classification performance with significant features for each subject.
Abstract: Background: Analysis and classification of extensive medical data (e.g. electroencephalography (EEG) signals) is a significant challenge in developing an effective brain–computer interface (BCI) system. Therefore, it is necessary to build an automated classification framework to decode different brain signals. Methods: In the present study, a two-step filtering approach is utilized to achieve resilience towards cognitive and external noises. Then, empirical wavelet transform (EWT) and four data reduction techniques, principal component analysis (PCA), independent component analysis (ICA), linear discriminant analysis (LDA) and neighborhood component analysis (NCA), are integrated together for the first time to explore the dynamic nature and pattern mining of motor imagery (MI) EEG signals. Specifically, EWT helped to explore the hidden patterns of MI tasks by decomposing EEG data into different modes, where every mode was considered as a feature vector in this study, and each data reduction technique was applied to all these modes to reduce the dimension of the huge feature matrix. Moreover, automated correlation-based component/coefficient selection criteria and parameter tuning were implemented for PCA, ICA, LDA, and NCA, respectively. For comparison purposes, all the experiments were performed on two publicly available datasets (BCI competition III datasets IVa and IVb). The performance of the experiments was verified by decoding three different channel combination strategies along with several neural networks. The regularization parameter tuning of NCA guaranteed improved classification performance with significant features for each subject. Results: The experimental results revealed that NCA provides an average sensitivity, specificity, accuracy, precision, F1 score and kappa-coefficient of 100% for the subject-dependent case, and 93%, 93%, 92.9%, 93%, 96.4% and 90%, respectively, for the subject-independent case. All the results were obtained with artificial neural networks, cascade-forward neural networks and multilayer perceptron neural networks (MLP) for the subject-dependent case, and with MLP for the subject-independent case, by utilizing 7 channels out of a total of 118. Such an improvement in results can help users convey their MI activities more clearly. For instance, a physically impaired person will be able to manage their wheelchair quite effectively, and rehabilitated persons may be able to improve their activities.
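A sketch of the reduction-and-classification stage only, assuming scikit-learn; the EWT decomposition, EEG filtering and the BCI competition data are not reproduced, and the feature matrix below is a synthetic placeholder.

```python
# Neighborhood component analysis (NCA) to reduce a feature matrix, then an
# MLP classifier, mirroring the NCA + MLP stage described above.
from sklearn.datasets import make_classification
from sklearn.neighbors import NeighborhoodComponentsAnalysis
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score

# Placeholder for a per-trial feature matrix built from EWT modes.
X, y = make_classification(n_samples=300, n_features=60, n_informative=10,
                           random_state=0)

clf = make_pipeline(
    StandardScaler(),
    NeighborhoodComponentsAnalysis(n_components=10, random_state=0),
    MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000, random_state=0),
)
print("Mean CV accuracy:", cross_val_score(clf, X, y, cv=5).mean())
```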

71 citations


Journal ArticleDOI
15 Apr 2021
TL;DR: This review starts by introducing the basic ideas of Principal Component Analysis (PCA), describes some concepts related to PCA, and discusses what the method can do.
Abstract: Big databases are increasingly widespread and are therefore hard to understand. In exploratory biomedical science, big data in health research is highly exciting because data-based analyses can travel more quickly than hypothesis-based research. Principal Component Analysis (PCA) is a method to reduce the dimensionality of certain datasets. It improves interpretability without losing much information, and it achieves this by creating new covariates that are uncorrelated with each other. Finding those new variables, the principal components, reduces to solving an eigenvalue/eigenvector problem. PCA can be described as an adaptive data analysis technique because its variants have been developed to adapt to different data types and structures. This review starts by introducing the basic ideas of PCA, describes some concepts related to PCA, discusses what it can do, and reviews fifteen PCA articles that have been published in the last three years.
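A minimal numpy illustration of the eigenvalue/eigenvector problem mentioned above, on toy data; it is a sketch of textbook PCA, not code from the review.

```python
# Minimal PCA from scratch: center the data, eigendecompose the covariance
# matrix, and project onto the leading eigenvectors (principal components).
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5)) @ rng.normal(size=(5, 5))   # correlated toy data

Xc = X - X.mean(axis=0)                     # center each variable
cov = np.cov(Xc, rowvar=False)              # covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)      # eigenvalue/eigenvector problem

order = np.argsort(eigvals)[::-1]           # sort by explained variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

scores = Xc @ eigvecs[:, :2]                # data expressed in the first 2 PCs
explained = eigvals[:2].sum() / eigvals.sum()
print("Variance explained by 2 components:", round(explained, 3))
```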

68 citations


Journal ArticleDOI
TL;DR: An information-theoretic normalized Mutual Information (nMI)-based minimum Redundancy Maximum Relevance (mRMR) non-linear measure to select the intrinsic features from the transformed space of the previously proposed Segmented-Folded-PCA (Seg-Fol-PCA) and Spectrally-Segmented-Folded-PCA (SSeg-Fol-PCA) FE methods.
Abstract: Hyperspectral image (HSI) usually holds information of land cover classes as a set of many contiguous narrow spectral wavelength bands. For its efficient thematic mapping or classification, band (f...

68 citations


Journal ArticleDOI
TL;DR: In this article, the authors trace the journey of the spectra themselves through the operations behind principal component analysis, with each step illustrated by simulated spectra; each PC is shown to be a successive refinement of the estimated spectra, improving the fit between the PC-reconstructed data and the original data.
Abstract: Spectroscopy rapidly captures a large amount of data that is not directly interpretable. Principal component analysis is widely used to simplify complex spectral datasets into comprehensible information by identifying recurring patterns in the data with minimal loss of information. The linear algebra underpinning principal component analysis is not well understood by many applied analytical scientists and spectroscopists who use it. The meaning of features identified through principal component analysis is often unclear. This manuscript traces the journey of the spectra themselves through the operations behind principal component analysis, with each step illustrated by simulated spectra. Principal component analysis relies solely on the information within the spectra; consequently, the mathematical model is dependent on the nature of the data itself. The direct links between model and spectra allow concrete spectroscopic explanation of principal component analysis, such as the scores representing "concentration" or "weights". The principal components (loadings) are by definition hidden, repeated and uncorrelated spectral shapes that linearly combine to generate the observed spectra. They can be visualized as subtraction spectra between extreme differences within the dataset. Each PC is shown to be a successive refinement of the estimated spectra, improving the fit between the PC-reconstructed data and the original data. Understanding the data-led development of a principal component analysis model shows how to interpret the application-specific chemical meaning of the principal component analysis loadings and how to analyze the scores. A critical benefit of principal component analysis is its simplicity and the succinctness of its description of a dataset, making it powerful and flexible.
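A small numpy sketch of the reconstruction idea described above: simulated mixture spectra are rebuilt from an increasing number of principal components, and the fit to the original data improves with each added PC. The peak positions and concentrations are arbitrary placeholders.

```python
# Reconstructing simulated spectra from an increasing number of principal
# components: the residual error shrinks as each PC refines the estimate.
import numpy as np

rng = np.random.default_rng(2)
wavelengths = np.linspace(0, 1, 200)
peaks = np.vstack([np.exp(-((wavelengths - c) / 0.05) ** 2) for c in (0.3, 0.5, 0.7)])
conc = rng.uniform(0, 1, size=(50, 3))                 # "concentrations"
spectra = conc @ peaks + rng.normal(scale=0.01, size=(50, 200))

mean_spec = spectra.mean(axis=0)
Xc = spectra - mean_spec
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)      # rows of Vt = loadings

for k in range(1, 5):
    recon = mean_spec + (U[:, :k] * s[:k]) @ Vt[:k]    # scores @ loadings
    rmse = np.sqrt(np.mean((recon - spectra) ** 2))
    print(f"{k} PC(s): reconstruction RMSE = {rmse:.4f}")
```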

63 citations


Journal ArticleDOI
TL;DR: The qualitative and quantitative recognition of various drug-producing chemicals has been realized, providing a new approach to on-line rapid sensor detection of drug-producing chemicals.
Abstract: With rampant drug crime, the detection of drug-producing chemicals faces great demand for multi-type, multi-concentration, on-line rapid detection. The rapid development of dynamic measurement for semiconductor gas sensors provides a solution to this problem. However, the mutual blending of type and concentration information negatively affects sensor performance. In this paper, principal component analysis (PCA) was used for weak separation of type and concentration; k-Nearest Neighbor (KNN) was used for qualitative recognition; polynomial regression was used for quantitative recognition. The physical meaning of the dynamic response signal after PCA transformation was first proposed: PC1 has a weak concentration meaning; the combination of PC2, PC3, and PC4 has a weak type meaning. Based on this weak separation, a stepwise recognition method of qualitative classification followed by quantitative regression was used for the first time to improve the recognition rate, resolution, and generalization performance of the sensor. Using the inverse transformation of PCA, the principle of PCA and ideal data verified the feasibility of this method. The qualitative and quantitative recognition of various drug-producing chemicals has been realized, providing a new approach to on-line rapid sensor detection of drug-producing chemicals.
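A sketch of the stepwise scheme described above on toy sensor responses: PCA on the dynamic responses, KNN on PC2 to PC4 for chemical type, and polynomial regression on PC1 for concentration. The response model, numbers of types, and polynomial degree are assumptions, not the paper's setup.

```python
# Stepwise recognition sketch: PCA on dynamic responses, KNN on PC2-PC4 for
# chemical type, polynomial regression on PC1 for concentration.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(3)
n, d = 150, 40                                     # samples x time points
types = rng.integers(0, 3, size=n)                 # three chemical types
conc = rng.uniform(1, 10, size=n)                  # concentrations
t = np.linspace(0, 1, d)
# Toy dynamic responses: shape depends on type, amplitude on concentration.
X = conc[:, None] * np.exp(-t[None, :] * (types[:, None] + 1)) \
    + rng.normal(scale=0.05, size=(n, d))

pcs = PCA(n_components=4).fit_transform(X)

# Qualitative step: type from PC2-PC4.
knn = KNeighborsClassifier(n_neighbors=5).fit(pcs[:, 1:4], types)
print("Type accuracy:", knn.score(pcs[:, 1:4], types))

# Quantitative step: concentration from PC1 via polynomial regression.
coeffs = np.polyfit(pcs[:, 0], conc, deg=2)
pred_conc = np.polyval(coeffs, pcs[:, 0])
print("Concentration RMSE:", np.sqrt(np.mean((pred_conc - conc) ** 2)).round(3))
```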

59 citations


Journal ArticleDOI
TL;DR: In this article, the bagging ensemble learning method with a decision tree achieved the best performance in predicting heart disease, which is the deadliest disease and one of the leading causes of death worldwide.
Abstract: Heart disease is the deadliest disease and one of the leading causes of death worldwide. Machine learning is playing an essential role in the medical field. In this paper, ensemble learning methods are used to enhance the performance of predicting heart disease. Two feature extraction methods, linear discriminant analysis (LDA) and principal component analysis (PCA), are used to select essential features from the dataset. A comparison between machine learning algorithms and ensemble learning methods is applied to the selected features. Different metrics are used to evaluate the models: accuracy, recall, precision, F-measure, and ROC. The results show that the bagging ensemble learning method with a decision tree achieved the best performance.
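A hedged sketch of the PCA-plus-bagging combination described above, assuming scikit-learn; the synthetic data stands in for a heart-disease dataset, and the number of components and trees are placeholder choices.

```python
# PCA feature extraction followed by a bagging ensemble of decision trees
# (BaggingClassifier's default base learner is a decision tree).
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import BaggingClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_validate

X, y = make_classification(n_samples=500, n_features=13, n_informative=8,
                           random_state=0)   # placeholder for a heart dataset

model = make_pipeline(
    StandardScaler(),
    PCA(n_components=8),
    BaggingClassifier(n_estimators=100, random_state=0),
)
res = cross_validate(model, X, y, cv=5,
                     scoring=["accuracy", "recall", "precision", "f1", "roc_auc"])
for key, values in res.items():
    if key.startswith("test_"):
        print(key, round(values.mean(), 3))
```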

Journal ArticleDOI
TL;DR: In this article, a novel intelligent diagnosis method based on multisensor fusion (MSF) and a convolutional neural network (CNN) is explored; results show that the proposed method outperforms other DL-based methods in terms of accuracy.
Abstract: Diagnosis of mechanical faults in manufacturing systems is critical for ensuring safety and saving costs. With the development of data transmission and sensor technologies, measuring systems can acquire massive amounts of multi-sensor data. Although Deep-Learning (DL) provides an end-to-end way to address the drawbacks of traditional methods, it is necessary to do deep research on an intelligent fault diagnosis method based on Multi-Sensor Data. In this project, a novel intelligent diagnosis method based on Multi-Sensor Fusion (MSF) and Convolutional Neural Network (CNN) is explored. Firstly, a Multi-Signals-to-RGB-Image conversion method based on Principal Component Analysis (PCA) is applied to fuse multi-signal data into three-channel RGB images. Then, an improved CNN with residual networks is proposed, which can balance the relationship between computational cost and accuracy. Two datasets are used to verify the effectiveness of the proposed method. The results show the proposed method outperforms other DL-based methods in terms of accuracy.
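A minimal sketch of the fusion idea described above: project windowed multi-sensor samples onto three principal components and rescale each component to the 0-255 range as an RGB channel. The window size and sensor count are placeholders, and this is only an illustration of the Multi-Signals-to-RGB-Image concept, not the paper's exact conversion.

```python
# PCA-based multi-signal fusion into a three-channel image ready for a CNN.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(4)
n_sensors, n_samples = 8, 64 * 64                    # one analysis window per image
signals = rng.normal(size=(n_samples, n_sensors))    # placeholder sensor data

rgb = PCA(n_components=3).fit_transform(signals)     # (n_samples, 3)
rgb -= rgb.min(axis=0)
rgb /= rgb.max(axis=0)                               # scale each channel to [0, 1]
image = (rgb * 255).astype(np.uint8).reshape(64, 64, 3)

print(image.shape, image.dtype)                      # (64, 64, 3) uint8
```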

Journal ArticleDOI
TL;DR: In this paper, the potential of using PCA for dimensionality reduction is illustrated on several real-world datasets, and several theoretical and practical aspects of PCA are reported in an accessible and integrated manner.
Abstract: Principal component analysis (PCA) is often applied for analyzing data in the most diverse areas. This work reports, in an accessible and integrated manner, several theoretical and practical aspects of PCA. The basic principles underlying PCA, data standardization, possible visualizations of the PCA results, and outlier detection are subsequently addressed. Next, the potential of using PCA for dimensionality reduction is illustrated on several real-world datasets. Finally, we summarize PCA-related approaches and other dimensionality reduction techniques. All in all, the objective of this work is to assist researchers from the most diverse areas in using and interpreting PCA.

Journal ArticleDOI
TL;DR: A recursive algorithm is proposed in this article, which is shown to have lower computational complexity, along with a randomized algorithm to determine the width of the sliding window.
Abstract: This paper proposes a new combination of correlative statistical analysis and the sliding window technique to detect incipient faults. Compared with existing monitoring methods based on principal component analysis and transformed component analysis, the combination fully uses the information from the process and quality variables. The sliding window, however, inevitably increases the computational burden due to the repeated window calculations. Therefore, a recursive algorithm is proposed in this paper, which is shown to have lower computational complexity. Furthermore, a randomized algorithm is proposed to determine the width of the sliding window. A numerical example and the thermal power plant process are presented to show the effectiveness and advantages of the proposed method.
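A plain-numpy sketch of why a recursive window update avoids repeated full computations: the window sum and sum of outer products are updated as one sample leaves and one enters, and the window covariance is recovered from them. This illustrates the general recursive idea, not the paper's specific algorithm.

```python
# Recursive sliding-window statistics: update the window sum and sum of outer
# products when one sample leaves and one enters, avoiding full recomputation.
import numpy as np

rng = np.random.default_rng(5)
data = rng.normal(size=(500, 4))        # process/quality variables over time
w = 50                                  # window width

window = data[:w]
S1 = window.sum(axis=0)                 # running sum
S2 = window.T @ window                  # running sum of outer products

for t in range(w, len(data)):
    x_old, x_new = data[t - w], data[t]
    S1 += x_new - x_old
    S2 += np.outer(x_new, x_new) - np.outer(x_old, x_old)

    mean = S1 / w
    cov = (S2 - w * np.outer(mean, mean)) / (w - 1)   # unbiased window covariance

# Check against a direct computation on the final window.
direct = np.cov(data[len(data) - w:], rowvar=False)
print("Max deviation from direct covariance:", np.abs(cov - direct).max())
```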

Journal ArticleDOI
TL;DR: This paper proposes four improved RF methods that first reduce the amount of training data and select the leading kernel principal components using different kernel principal component analysis (KPCA)-based dimensionality reduction schemes.
Abstract: Random Forest (RF) is one of the mostly used machine learning techniques in fault detection and diagnosis of industrial systems. However, its implementation suffers from certain drawbacks when considering the correlations between variables. In addition, to perform a fault detection and diagnosis, the classical RF only uses the raw data by the direct use of measured variables. The direct raw data could yield to poor performance due to the data redundancies and noises. Thus, this paper proposes four improved RF methods to overcome the above-mentioned limitations. The developed methods aim to reduce at first the amount of the training data and select the first kernel principal components (KPCs) using different kernel principal component analysis (PCA) based dimensionality reduction schemes. Then, the retained KPCs are fed to the RF classifier for fault diagnosis purposes. Finally, the proposed techniques are applied to a wind energy conversion (WEC) system. Different case studies were investigated in order to illustrate the effectiveness and robustness of the developed techniques compared to the state-of-the-art methods. The obtained results show the low computation time and high diagnosis accuracy of the proposed approaches (an average accuracy of 91%).
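One KPCA-then-RF variant in miniature, assuming scikit-learn; the synthetic data stands in for WEC sensor measurements, and the kernel, number of components, and forest size are placeholder choices rather than the paper's settings.

```python
# Extract kernel principal components, then train a random forest on the
# retained components, as in the KPCA + RF scheme described above.
from sklearn.datasets import make_classification
from sklearn.decomposition import KernelPCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=600, n_features=20, n_informative=6,
                           random_state=0)   # stand-in for WEC sensor data

model = make_pipeline(
    StandardScaler(),
    KernelPCA(n_components=10, kernel="rbf", gamma=0.05),
    RandomForestClassifier(n_estimators=200, random_state=0),
)
print("Mean CV accuracy:", cross_val_score(model, X, y, cv=5).mean().round(3))
```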

Journal ArticleDOI
TL;DR: The convergence of the outlier eigenvalues and of the generalized components of the outlier eigenvectors (i.e. the principal components) of the spiked separable covariance model $\widetilde{\mathcal{Q}}_1$ is proved with optimal convergence rates.
Abstract: We study a class of separable sample covariance matrices of the form $\widetilde{Q}_1 := \widetilde{A}^{1/2} X \widetilde{B} X^* \widetilde{A}^{1/2}$. Here, $\widetilde{A}$ and $\widetilde{B}$ are positive definite matrices whose spectrums consist of bulk spectrums plus several spikes, that is, larger eigenvalues that are separated from the bulks. Conceptually, we call $\widetilde{Q}_1$ a spiked separable covariance matrix model. On the one hand, this model includes the spiked covariance matrix as a special case with $\widetilde{B}=I$. On the other hand, it allows for more general correlations of datasets. In particular, for spatio-temporal datasets, $\widetilde{A}$ and $\widetilde{B}$ represent the spatial and temporal correlations, respectively. In this paper, we study the outlier eigenvalues and eigenvectors, that is, the principal components, of the spiked separable covariance model $\widetilde{Q}_1$. We prove the convergence of the outlier eigenvalues $\widetilde{\lambda}_i$ and the generalized components (i.e., $\langle \mathbf{v}, \widetilde{\xi}_i \rangle$ for any deterministic vector $\mathbf{v}$) of the outlier eigenvectors $\widetilde{\xi}_i$ with optimal convergence rates. Moreover, we also prove the delocalization of the nonoutlier eigenvectors. We state our results in full generality, in the sense that they also hold near the so-called BBP transition and for degenerate outliers. Our results highlight both the similarity and difference between the spiked separable covariance matrix model and the spiked covariance matrix model in (Probab. Theory Related Fields 164 (2016) 459–552). In particular, we show that the spikes of both $\widetilde{A}$ and $\widetilde{B}$ will cause outliers of the eigenvalue spectrum, and the eigenvectors can help to select the outliers that correspond to the spikes of $\widetilde{A}$ (or $\widetilde{B}$).

Journal ArticleDOI
Baigang Du, Qiliang Zhou, Jun Guo, Shunsheng Guo, Lei Wang
TL;DR: In this article, the authors propose a hybrid long short-term memory model combined with discrete wavelet transform (DWT) and principal component analysis (PCA) pre-processing techniques for water demand forecasting, i.e., DWT-PCA-LSTM.
Abstract: A reliable and accurate urban water demand forecast plays a significant role in building an intelligent water supply system and smart city. Due to the high-frequency noise and complicated relationships in water demand series, forecasting urban water demand is not an easy task. In order to improve the model's ability to handle complex patterns and catch the peaks in time series, we propose a hybrid long short-term memory model combined with discrete wavelet transform (DWT) and principal component analysis (PCA) pre-processing techniques for water demand forecasting, i.e., DWT-PCA-LSTM. First, the outliers of the water demand series are identified and smoothed by the 3σ criterion and a weighted average method, respectively. Then, the noise component of the water demand series is eliminated by the DWT method and the principal components (PCs) among the influencing factors of water demand are selected by the PCA method. In addition, two LSTM networks are built to yield the daily urban water demand predictions using the results of the DWT and PCA techniques. At last, the superiority of the proposed model is demonstrated by comparing with other benchmark predictive models. The water demand from 2016 to 2020 of a waterworks located in Suzhou, China is used for the experiment. The predictive performance of the experiments is evaluated by the mean absolute percentage error (MAPE), mean absolute percentage error of peaks (pMAPE), explained variance score (EVS) and correlation coefficient (R). The results show that the DWT-PCA-LSTM model outperforms the other models and has satisfactory performance both in catching the peaks and in average prediction accuracy.
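A hedged sketch of the preprocessing chain described above only: 3σ outlier smoothing, DWT-based denoising via PyWavelets, and PCA on influencing factors. The two LSTM forecasting networks are omitted, the data are synthetic placeholders, and the wavelet, level, and threshold rule are assumptions rather than the paper's settings.

```python
# Preprocessing sketch: 3-sigma outlier smoothing, DWT soft-threshold
# denoising (PyWavelets), and PCA on the influencing factors.
import numpy as np
import pywt
from sklearn.decomposition import PCA

rng = np.random.default_rng(6)
demand = np.sin(np.linspace(0, 20, 512)) * 100 + 500 + rng.normal(0, 5, 512)
factors = rng.normal(size=(512, 8))              # placeholder influencing factors

# 1) 3-sigma rule: replace outliers with a local average of normal neighbours.
mu, sigma = demand.mean(), demand.std()
outliers = np.abs(demand - mu) > 3 * sigma
smoothed = demand.copy()
for i in np.where(outliers)[0]:
    lo, hi = max(i - 3, 0), min(i + 4, len(demand))
    smoothed[i] = demand[lo:hi][~outliers[lo:hi]].mean()

# 2) DWT denoising: soft-threshold the detail coefficients.
coeffs = pywt.wavedec(smoothed, "db4", level=3)
sigma_n = np.median(np.abs(coeffs[-1])) / 0.6745          # noise level estimate
thr = sigma_n * np.sqrt(2 * np.log(len(smoothed)))
coeffs[1:] = [pywt.threshold(c, thr, mode="soft") for c in coeffs[1:]]
denoised = pywt.waverec(coeffs, "db4")[:len(smoothed)]

# 3) PCA on the influencing factors, keeping the leading components.
pcs = PCA(n_components=3).fit_transform(factors)
print(denoised.shape, pcs.shape)   # inputs that would feed the LSTM networks
```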

Journal ArticleDOI
TL;DR: In this paper, the authors compared the performance of support vector regression, relevance vector regression and Gaussian process regression on whole-brain region-based or voxel-based structural magnetic resonance imaging data with or without dimensionality reduction through principal component analysis.
Abstract: Brain morphology varies across the ageing trajectory and the prediction of a person's age using brain features can aid the detection of abnormalities in the ageing process. Existing studies on such "brain age prediction" vary widely in terms of their methods and type of data, so at present the most accurate and generalisable methodological approach is unclear. Therefore, we used the UK Biobank data set (N = 10,824, age range 47-73) to compare the performance of the machine learning models support vector regression, relevance vector regression and Gaussian process regression on whole-brain region-based or voxel-based structural magnetic resonance imaging data with or without dimensionality reduction through principal component analysis. Performance was assessed in the validation set through cross-validation as well as an independent test set. The models achieved mean absolute errors between 3.7 and 4.7 years, with those trained on voxel-level data with principal component analysis performing best. Overall, we observed little difference in performance between models trained on the same data type, indicating that the type of input data had greater impact on performance than model choice. All code is provided online in the hope that this will aid future research.
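In the spirit of the comparison described above, a small scikit-learn sketch contrasting a regression model with and without PCA-based dimensionality reduction; only support vector regression is shown, the features are synthetic, and nothing here reproduces the UK Biobank analysis.

```python
# Support vector regression on high-dimensional features, with and without a
# PCA dimensionality-reduction step, scored by mean absolute error.
from sklearn.datasets import make_regression
from sklearn.decomposition import PCA
from sklearn.svm import SVR
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score

X, age = make_regression(n_samples=400, n_features=300, n_informative=30,
                         noise=5.0, random_state=0)   # stand-in imaging features

with_pca = make_pipeline(StandardScaler(), PCA(n_components=50), SVR())
without_pca = make_pipeline(StandardScaler(), SVR())

for name, model in [("with PCA", with_pca), ("without PCA", without_pca)]:
    mae = -cross_val_score(model, X, age, cv=5,
                           scoring="neg_mean_absolute_error").mean()
    print(f"{name}: MAE = {mae:.2f}")
```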

Journal ArticleDOI
TL;DR: A novel spectral-spatial SuperPCA (S³-PCA) method is proposed to learn effective and low-dimensional features of HSIs; experiments demonstrate the superiority of the proposed method over state-of-the-art methods.
Abstract: As the most classical unsupervised dimension reduction algorithm, principal component analysis (PCA) has been widely used in hyperspectral images (HSIs) preprocessing and analysis tasks. Recently proposed superpixelwise PCA (SuperPCA) has shown promising accuracy where superpixels segmentation technique was first used to segment an HSI to various homogeneous regions and then PCA was adopted in each superpixel block to extract the local features. However, the local features could be ineffective due to the neglect of global information especially in some small homogeneous regions and/or in some large homogeneous regions with mixed ground truth objects. In this article, a novel spectral-spatial and SuperPCA (S³-PCA) is proposed to learn the effective and low-dimensional features of HSIs. Inspired by SuperPCA we further adopt superpixels-based local reconstruction to filter the HSIs and use the PCA-based global features as the supplement of local features. It turns out that the global-local and spectral-spatial features can be well exploited. Specifically, each pixel of an HSI is reconstructed by the nearest neighbors' pixels in the same superpixel block, which could eliminate the noise and enhance the spatial information adaptively. After the local reconstruction-based data preprocessing, PCA is performed on each region and the entire HSI to obtain local and global features, respectively. Then we simply concatenate them to get the global-local and spectral-spatial features for HSIs classification. The experimental results on two HSIs data sets demonstrate the superiority of the proposed method over the state-of-the-art methods. The source code of the proposed model is available at https://github.com/XinweiJiang/S3-PCA.

Journal ArticleDOI
TL;DR: A preprocessing method to remove noisy features is coupled with prediction methods to improve the performance of energy consumption prediction models; results show that the proposed method enables practitioners to efficiently acquire a smart dataset from any big dataset for energy consumption prediction problems.

Journal ArticleDOI
TL;DR: An effective method based on three-order tensor creation and Tucker decomposition (TCTD) is proposed, which detects targets with various brightness, spatial sizes, and intensities and ensures that targets can be preserved on the remaining minor principal components.
Abstract: Existing infrared small-target detection methods tend to perform unsatisfactorily when encountering complex scenes, mainly due to the following: 1) the infrared image itself has a low signal-to-noise ratio (SNR) and insufficient detailed/texture knowledge; 2) spatial and structural information is not fully excavated. To avoid these difficulties, an effective method based on three-order tensor creation and Tucker decomposition (TCTD) is proposed, which detects targets with various brightness, spatial sizes, and intensities. In the proposed TCTD, multiple morphological profiles, i.e., diverse attributes and different shapes of trees, are designed to create three-order tensors, which can exploit more spatial and structural information to make up for lacking detailed/texture knowledge. Then, Tucker decomposition is employed, which is capable of estimating and eliminating the major principal components (i.e., most of the background) from three dimensions. Thus, targets can be preserved on the remaining minor principal components. Image contrast is further enhanced by fusing the detection maps of multiple morphological profiles and several groups with discontinuous pruning values. Extensive experiments validated on two synthetic data and six real data sets demonstrate the effectiveness and robustness of the proposed TCTD.
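A plain-numpy sketch of the underlying idea: build a three-order tensor, project it onto the major principal components of each mode unfolding (a truncated higher-order SVD, computed manually rather than with a Tucker library call), and keep the residual where a small target survives. The tensor construction from morphological profiles is not reproduced; the data and ranks below are placeholders.

```python
# Background suppression sketch: major components along each tensor mode are
# removed and the small bright "target" remains in the residual.
import numpy as np

rng = np.random.default_rng(7)
H, W, P = 64, 64, 6                       # height x width x (placeholder) profiles
background = np.outer(np.linspace(0.5, 1.5, H), np.linspace(1.0, 2.0, W))
tensor = np.stack([background * (p + 1) for p in range(P)], axis=2)
tensor += rng.normal(scale=0.01, size=tensor.shape)
tensor[30:32, 40:42, :] += 1.0            # small bright "target"

def mode_n_product(T, M, mode):
    """Multiply tensor T by matrix M along the given mode."""
    Tm = np.moveaxis(T, mode, 0)
    out = M @ Tm.reshape(Tm.shape[0], -1)
    return np.moveaxis(out.reshape((M.shape[0],) + Tm.shape[1:]), 0, mode)

def major_subspace(T, mode, rank):
    """Leading left singular vectors of the mode-n unfolding."""
    unfold = np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)
    U, _, _ = np.linalg.svd(unfold, full_matrices=False)
    return U[:, :rank]

# Major (background) part: projection onto a rank-(1, 1, 1) subspace.
approx = tensor
for mode in range(3):
    U = major_subspace(tensor, mode, rank=1)
    approx = mode_n_product(approx, U @ U.T, mode)

residual = tensor - approx                # minor components: the target stands out
peak = np.unravel_index(np.abs(residual).argmax(), residual.shape)
print("Strongest residual at (row, col, profile):", peak)
```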

Journal ArticleDOI
TL;DR: For handling the nonlinearity problem that is extremely common in industrial processes, this paper combines artificial neural networks (ANN) with principal component analysis (PCA).
Abstract: Nonlinearity is extremely common in industrial processes. For handling the nonlinearity problem, this paper combines artificial neural networks (ANN) with principal component analysis (PCA) and pro...

Journal ArticleDOI
TL;DR: In this article, the authors consider 8 different optimization formulations for computing a single sparse loading vector; these are obtained by combining the following factors: they employ two norms for measuring variance (L2, L1) and two sparsityinducing norms (L0, L 1), which are used in two different ways (constraint, penalty) and give a unifying reformulation which is solved via a natural alternating maximization (AM) method.
Abstract: Given a multivariate data set, sparse principal component analysis (SPCA) aims to extract several linear combinations of the variables that together explain the variance in the data as much as possible, while controlling the number of nonzero loadings in these combinations. In this paper we consider 8 different optimization formulations for computing a single sparse loading vector; these are obtained by combining the following factors: we employ two norms for measuring variance (L2, L1) and two sparsity-inducing norms (L0, L1), which are used in two different ways (constraint, penalty). Three of our formulations, notably the one with L0 constraint and L1 variance, have not been considered in the literature. We give a unifying reformulation which we propose to solve via a natural alternating maximization (AM) method. We show that the AM
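The abstract cuts off above, but the constrained formulation it mentions can be illustrated with a simple thresholded power iteration for a single sparse loading vector under an L0 (cardinality) constraint. This is a generic sketch of the idea, not the authors' exact alternating maximization algorithm.

```python
# Extracting one sparse loading vector: power iteration on the sample
# covariance with a hard-threshold (keep-k / L0 constraint) step.
import numpy as np

rng = np.random.default_rng(8)
n, p, k = 200, 20, 5                      # samples, variables, nonzeros allowed
X = rng.normal(size=(n, p))
X[:, :3] += 3 * rng.normal(size=(n, 1))   # a few correlated, high-variance columns
S = np.cov(X, rowvar=False)

v = rng.normal(size=p)
v /= np.linalg.norm(v)
for _ in range(100):
    w = S @ v                              # power step (increase explained variance)
    keep = np.argsort(np.abs(w))[-k:]      # keep the k largest loadings (L0 constraint)
    w_sparse = np.zeros(p)
    w_sparse[keep] = w[keep]
    v = w_sparse / np.linalg.norm(w_sparse)

print("Nonzero loadings:", np.flatnonzero(v))
print("Explained variance:", float(v @ S @ v))
```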

Journal ArticleDOI
TL;DR: Well log data from the Volve Field are used for validation of the prediction obtained by random forest, in which a high correlation coefficient between prediction and reference is achieved.

Journal ArticleDOI
TL;DR: The work presented in this paper aims to reduce the amount of data that needs to be transmitted by extracting descriptive features of the voltage and then reducing the number of features.

Journal ArticleDOI
TL;DR: A water quality prediction model utilizing the principal component regression technique and the Gradient Boosting Classifier method, which shows credible performance compared with state-of-the-art models.

Journal ArticleDOI
TL;DR: An efficient method of kernel sample equivalence replacement is established to replace the partial differential operations of the kernel gradient algorithm, which can convert nonlinear fault detection indicators into standard quadratic forms of the original variable sample, thereby making it possible to solve the nonlinear fault diagnosis problem by linear means.
Abstract: This article is devoted to solving the problem of quality-related root cause diagnosis for nonlinear processes. First, an orthogonal kernel principal component regression model is constructed to achieve orthogonal decomposition of the feature space, such that quality-related and quality-unrelated faults can be separately detected in subspaces of opposite correlations to the output, without any effect on each other. Then, in view of the high complexity of traditional nonlinear fault diagnosis methods, an efficient method of kernel sample equivalence replacement is established to replace the partial differential operations of the kernel gradient algorithm, which can convert nonlinear fault detection indicators into standard quadratic forms of the original variable sample, thereby making it possible to solve the nonlinear fault diagnosis problem by linear means. Furthermore, a transfer entropy algorithm is applied to the new model to analyze the causality between the diagnosed candidate faulty variables and find the accurate root cause of the fault. Finally, comparative studies between the latest results and the proposed method are carried out on the Tennessee Eastman process to verify the effectiveness and superiority of the new method.

Journal ArticleDOI
TL;DR: In this paper, a superpixel-wise adaptive singular spectrum analysis (SpaSSA) was proposed for hyperspectral image (HSI) feature extraction, where the SSA and 2D-SSA are combined and adaptively applied to each superpixel derived from an oversegmented HSI.
Abstract: Singular spectrum analysis (SSA) has recently been successfully applied to feature extraction in hyperspectral images (HSI), including conventional (1-D) SSA in the spectral domain and 2-D SSA in the spatial domain. However, there are some drawbacks, such as sensitivity to the window size, high computational complexity under a large window, and failure to extract joint spectral-spatial features. To tackle these issues, in this article we propose superpixelwise adaptive SSA (SpaSSA) for exploiting local spatial information of HSI. The extraction of local (instead of global) features, particularly in HSI, can be more effective for characterizing the objects within an image. In SpaSSA, conventional SSA and 2-D SSA are combined and adaptively applied to each superpixel derived from an oversegmented HSI. According to the size of the derived superpixels, either SSA or 2-D singular spectrum analysis (2D-SSA) is adaptively applied for feature extraction, where the embedding window in 2D-SSA is also adaptive to the size of the superpixel. Experimental results on three datasets have shown that the proposed SpaSSA outperforms both SSA and 2D-SSA in terms of classification accuracy and computational complexity. By combining SpaSSA with principal component analysis (SpaSSA-PCA), the accuracy of land-cover analysis can be further improved, outperforming several state-of-the-art approaches.
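A plain-numpy sketch of 1-D SSA on a toy signal (embedding, SVD, grouping, diagonal averaging), which is the building block adapted per superpixel above; the superpixel segmentation and the 2-D variant are not reproduced, and the window size and grouping are placeholder choices.

```python
# Plain 1-D singular spectrum analysis: Hankel embedding, SVD, grouping of the
# leading components, and diagonal averaging back to a 1-D series.
import numpy as np

rng = np.random.default_rng(9)
N, L = 200, 40                                   # signal length, window size
t = np.arange(N)
signal = np.sin(2 * np.pi * t / 25) + 0.5 * rng.normal(size=N)

K = N - L + 1
traj = np.column_stack([signal[i:i + L] for i in range(K)])   # L x K trajectory matrix
U, s, Vt = np.linalg.svd(traj, full_matrices=False)

# Group the two leading components (one sinusoid ~ a pair of singular triples).
recon_traj = (U[:, :2] * s[:2]) @ Vt[:2]

# Diagonal averaging (Hankelization) back to a 1-D series.
recon = np.zeros(N)
counts = np.zeros(N)
for j in range(K):
    recon[j:j + L] += recon_traj[:, j]
    counts[j:j + L] += 1
recon /= counts

print("RMSE to the noise-free sinusoid:",
      np.sqrt(np.mean((recon - np.sin(2 * np.pi * t / 25)) ** 2)).round(3))
```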

Journal ArticleDOI
TL;DR: Light is shed on the versatility and capability of PCA in processing high-dimensional data in general, and specifically on its potential in forensic studies, to help other research communities interested in unravelling latent structure from high-dimensional data.

Journal ArticleDOI
Yao Duan, Chuanchuan Yang, Chen Hao, Weizhen Yan, Hongbin Li
TL;DR: A new noise reduction method to filter LiDAR point clouds, i.e. an adaptive clustering method based on principal component analysis (PCA), which has low computational complexity and effectively removes noise while retaining details of environmental features.
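As a rough illustration of PCA-based point-cloud denoising (not the paper's adaptive clustering method), the sketch below eigen-decomposes each point's neighbourhood covariance and flags points with high "surface variation" as likely noise; the plane data, neighbourhood size, and threshold are assumptions.

```python
# Local-PCA noise criterion for point clouds: the smallest-eigenvalue ratio of
# each neighbourhood covariance is near zero on flat surfaces and large for
# scattered noise points.
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(10)
plane = np.column_stack([rng.uniform(0, 10, 2000), rng.uniform(0, 10, 2000),
                         rng.normal(0, 0.02, 2000)])        # points near a plane
noise = rng.uniform(0, 10, size=(100, 3))                    # scattered outliers
cloud = np.vstack([plane, noise])

k = 20
_, idx = NearestNeighbors(n_neighbors=k).fit(cloud).kneighbors(cloud)

variation = np.empty(len(cloud))
for i, nb in enumerate(idx):
    nbh = cloud[nb] - cloud[nb].mean(axis=0)
    eigvals = np.linalg.eigvalsh(np.cov(nbh, rowvar=False))  # ascending order
    variation[i] = eigvals[0] / eigvals.sum()                # ~0 on flat surfaces

keep = variation < 0.05                                      # threshold is a guess
print("Points kept:", keep.sum(), "of", len(cloud))
```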