
Showing papers on "Principal component analysis published in 2023"


Journal ArticleDOI
TL;DR: Wang et al. proposed an urban travel demand prediction model considering influencing factors (UTDP-IF), which uses a principal component analysis algorithm to extract the principal components of different influencing factors to avoid multicollinearity.
Abstract: Predicting urban travel demand is important in perceiving the future state of a city, deploying public transportation resources, and building intelligent cities. Influenced by multifarious factors, urban travel demand data have high-frequency noise and complex fluctuation patterns. Current studies have focused on predicting urban travel demand via various models. However, little work comprehensively considers the natural environmental and socioeconomic factors affecting urban travel demand. Some improvements are made in this work. First, multifarious influencing factors are taken into consideration. Second, a novel random forest-based method for influencing factor data preprocessing is introduced. Finally, this work proposes an urban travel demand prediction model considering influencing factors (UTDP-IF). A principal component analysis algorithm is used to extract the principal components of the different influencing factors to avoid multicollinearity. Based on four data sets, this work evaluates the UTDP-IF and compares it with several typical models. Compared with the baselines, the root-mean-square error of the UTDP-IF is reduced by approximately 29.44% on average, demonstrating its strong ability to predict urban travel demand.
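The multicollinearity-removal role of PCA described above can be sketched in a few lines of NumPy; the data and factor names here are synthetic illustrations, not the paper's UTDP-IF pipeline:

```python
import numpy as np

def pca_decorrelate(X, n_components):
    """Project centered data onto the top principal components; the
    projected columns are mutually uncorrelated, removing multicollinearity."""
    Xc = X - X.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    order = np.argsort(eigvals)[::-1][:n_components]
    return Xc @ eigvecs[:, order]

rng = np.random.default_rng(0)
temp = rng.normal(20, 5, 500)                  # hypothetical influencing factor
humidity = 0.8 * temp + rng.normal(0, 1, 500)  # strongly collinear with it
income = rng.normal(50, 10, 500)               # a third hypothetical factor
X = np.column_stack([temp, humidity, income])

Z = pca_decorrelate(X, 2)
off_diag = np.cov(Z, rowvar=False)[0, 1]       # ~0: components are decorrelated
```

Because the projection directions are eigenvectors of the covariance matrix, the off-diagonal covariance of the projected features vanishes up to floating-point error.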

23 citations


Journal ArticleDOI
TL;DR: Wang et al. proposed a tensor low-rank and sparse representation (TLRSR) method for hyperspectral anomaly detection, in which a 3-D TLR model is developed to separate the LR background part, represented by a tensorial background dictionary and corresponding coefficients.
Abstract: Recently, low-rank representation (LRR) methods have been widely applied to hyperspectral anomaly detection due to their potential for separating backgrounds and anomalies. However, existing LRR models generally convert 3-D hyperspectral images (HSIs) into 2-D matrices, inevitably destroying the intrinsic 3-D structure of HSIs. To this end, we propose a novel tensor low-rank and sparse representation (TLRSR) method for hyperspectral anomaly detection. A 3-D TLR model is developed to separate the LR background part, represented by a tensorial background dictionary and corresponding coefficients. This representation characterizes the multiple-subspace property of the complex LR background. Based on the weighted tensor nuclear norm and the $L_{F,1}$ sparse norm, a dictionary is designed to make its atoms more relevant to the background. Moreover, a principal component analysis (PCA) method can be applied as a preprocessing step to extract a subset of HSI bands, retaining enough of the HSI object information while reducing the computational time of the subsequent tensorial operations. The proposed model is efficiently solved by a well-designed alternating direction method of multipliers (ADMM). Experimental comparisons establish the competitiveness of the proposed method with state-of-the-art algorithms in the hyperspectral anomaly detection task.
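The PCA band-reduction preprocessing mentioned in the abstract amounts to flattening the spatial dimensions and projecting the spectral dimension; a minimal sketch on random data (not the TLRSR implementation):

```python
import numpy as np

def pca_band_reduction(cube, n_bands):
    """Reduce the spectral dimension of an HSI cube (H, W, B) with PCA."""
    H, W, B = cube.shape
    X = cube.reshape(-1, B).astype(float)      # one row per pixel
    Xc = X - X.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    top = eigvecs[:, np.argsort(eigvals)[::-1][:n_bands]]
    return (Xc @ top).reshape(H, W, n_bands)

rng = np.random.default_rng(1)
cube = rng.normal(size=(32, 32, 100))          # toy 100-band image
reduced = pca_band_reduction(cube, 10)         # keep 10 spectral components
```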

23 citations


Journal ArticleDOI
TL;DR: In this paper, a novel image processing algorithm and a multi-class support vector machine (SVM) were used to diagnose and classify grape leaf diseases, i.e., black measles, black rot, and leaf blight.
Abstract: Plant diseases often reduce crop yield and product quality; therefore, plant disease diagnosis plays a vital role in farmers’ management decisions. Visual crop inspection by humans is a time-consuming and challenging task and, practically, can only be performed in small areas at a given time, especially since many diseases have similar symptoms. An intelligent machine vision monitoring system for automatic inspection can be a great help for farmers in this regard. Although many algorithms have been introduced for plant disease diagnosis in recent years, a simple method relying on minimal information from the images is of interest for field conditions. In this study, a novel image processing algorithm and a multi-class support vector machine (SVM) were used to diagnose and classify grape leaf diseases, i.e., black measles, black rot, and leaf blight. The area of disease symptoms was separated from the healthy parts of the leaf automatically using K-means clustering, and the features were then extracted in three color models, namely RGB, HSV, and L*a*b*. SVM was used as an efficient classification method in this study, with principal component analysis (PCA) performed for feature dimension reduction. Finally, the most important features were selected by the Relief feature selection method. Gray-level co-occurrence matrix (GLCM) features resulted in an accuracy of 98.71%, while feature dimension reduction using PCA resulted in an accuracy of 98.97%. The proposed method was compared with two deep learning methods, i.e., CNN and GoogLeNet, which achieved classification accuracies of 86.82% and 94.05%, respectively, while the processing time for the proposed method was significantly shorter than those of these models.
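Choosing how many principal components to keep before a classifier such as the SVM is typically driven by the cumulative explained variance; a minimal NumPy sketch of that selection rule (illustrative synthetic data, not the paper's grape-leaf features):

```python
import numpy as np

def n_components_for_variance(X, target=0.95):
    """Smallest number of principal components whose cumulative explained
    variance reaches `target` (the usual rule for sizing the PCA step)."""
    Xc = X - X.mean(axis=0)
    eigvals = np.linalg.eigvalsh(np.cov(Xc, rowvar=False))[::-1]  # descending
    ratio = np.cumsum(eigvals) / eigvals.sum()
    return int(np.searchsorted(ratio, target) + 1)

rng = np.random.default_rng(2)
latent = rng.normal(size=(300, 5))                 # 5 underlying factors
X = latent @ rng.normal(size=(5, 40)) + 0.01 * rng.normal(size=(300, 40))
k = n_components_for_variance(X, 0.95)             # small: data is ~rank 5
```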

16 citations


Journal ArticleDOI
TL;DR: In this paper, a novel hybrid framework of optimized deep learning models combined with multi-sensor fusion is developed for the condition diagnosis of a concrete arch beam, where the vibration responses of the structure are first processed by principal component analysis for dimensionality reduction and noise elimination.
Abstract: A novel hybrid framework of optimized deep learning models combined with multi-sensor fusion is developed for the condition diagnosis of a concrete arch beam. The vibration responses of the structure are first processed by principal component analysis for dimensionality reduction and noise elimination. Then, a deep network based on stacked autoencoders (SAE) is established at each sensor for initial condition diagnosis, with the extracted principal components and corresponding condition categories as inputs and outputs, respectively. To enhance the diagnostic accuracy of the proposed deep SAE, an enhanced whale optimization algorithm is proposed to optimize the network meta-parameters. Eventually, the Dempster-Shafer fusion algorithm is employed to combine the initial diagnosis results from each sensor into a final diagnosis. A miniature structural component of the Sydney Harbour Bridge with artificial multiple progressive damages is tested in the laboratory. The results demonstrate that the proposed method can detect structural damage accurately, even with limited sensors and high levels of uncertainty.

13 citations


Journal ArticleDOI
TL;DR: Wang et al. proposed a diagnosis method for non-severe depression based on the cognitive behavior of emotional conflict, in which four classifiers (k-nearest neighbor (KNN), support vector machine (SVM), kernel extreme learning machine (KELM), and random forest (RF)) were used to classify patients and normal subjects.
Abstract: To improve the diagnosis accuracy of non-severe depression (NSD), this article proposes a diagnosis method for NSD based on the cognitive behavior of emotional conflict. First, the original classification features are constructed based on the cognitive behavior of emotional conflict and its statistical distribution, and a classification normalization method is proposed to preprocess the feature data. Then, the Relief algorithm and principal component analysis (PCA) are employed for feature processing. Finally, four classifiers (k-nearest neighbor (KNN), support vector machine (SVM), kernel extreme learning machine (KELM), and random forest (RF)) are used to classify NSD patients and normal subjects. The test results show that among all the classifiers, RF achieves the highest classification sensitivity and specificity of 92% and 88%, respectively. Compared with the results of other NSD diagnosis methods in recent years, it offers better performance. The diagnostic method for NSD proposed in this article has clear performance advantages and provides technical support for improving the accuracy of clinical depression diagnosis. Furthermore, it also provides a new idea and method for the diagnosis and screening of depression.

11 citations


Journal ArticleDOI
TL;DR: In this article, a folded-PCA and 2-D singular spectral analysis (2DSSA) approach is proposed for spectral-spatial feature mining in hyperspectral images.
Abstract: Principal component analysis (PCA) and 2-D singular spectral analysis (2DSSA) are widely used for spectral-domain and spatial-domain feature extraction in hyperspectral images (HSI). However, PCA itself suffers from low efficacy if no spatial information is incorporated, whilst 2DSSA can extract the spatial information yet has a high computational complexity. Therefore, we propose in this paper a PCA-domain 2DSSA approach for spectral-spatial feature mining in HSI. Specifically, PCA and its variation, folded-PCA, are fused with 2DSSA, as folded-PCA can extract both global and local spectral features. By applying 2DSSA only on a small number of PCA components, the overall computational complexity is significantly reduced whilst preserving the discrimination ability of the features. In addition, with the effective fusion of spectral and spatial features, the proposed approach works well on the uncorrected dataset without removing the noisy and water absorption bands, even with a small number of training samples. Experiments on two publicly available datasets fully demonstrate the superiority of the proposed approach in comparison to several state-of-the-art HSI classification methods and deep-learning models.
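The folding idea behind folded-PCA — reshape each spectral vector into short segments and diagonalize the much smaller segment covariance — can be sketched as follows (a rough illustration of the folding step only, not the authors' folded-PCA/2DSSA fusion):

```python
import numpy as np

def folded_pca_basis(X, fold, n_components):
    """Fold each spectral vector (length B, B divisible by `fold`) into
    B // fold short segments and diagonalize the small fold x fold segment
    covariance, capturing local spectral structure at low cost."""
    N, B = X.shape
    segs = X.reshape(N * (B // fold), fold)
    segs = segs - segs.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(segs, rowvar=False))
    return eigvecs[:, np.argsort(eigvals)[::-1][:n_components]]

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 96))                       # 200 pixels, 96 bands
basis = folded_pca_basis(X, fold=8, n_components=3)  # 8 x 3 local basis
feats = (X.reshape(-1, 8) @ basis).reshape(200, -1)  # 12 segments x 3 comps
```

The eigendecomposition here is of an 8 x 8 matrix rather than 96 x 96, which is the computational advantage the abstract alludes to.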

10 citations


Journal ArticleDOI
TL;DR: In this paper, the parallel nonlinear transformation (PNT) is proposed for decorrelating neutral vector variables, and the results demonstrate the superiority of PNT over PCA and ICA.
Abstract: As a typical non-Gaussian vector variable, a neutral vector variable contains nonnegative elements only, and its $l_{1}$-norm equals one. In addition, its neutral properties make it significantly different from the commonly studied vector variables (e.g., the Gaussian vector variables). Due to the aforementioned properties, the conventionally applied linear transformation approaches [e.g., principal component analysis (PCA) and independent component analysis (ICA)] are not suitable for neutral vector variables, as PCA cannot transform a neutral vector variable, which is highly negatively correlated, into a set of mutually independent scalar variables and ICA cannot preserve the bounded property after transformation. In recent work, we proposed an efficient nonlinear transformation approach, i.e., the parallel nonlinear transformation (PNT), for decorrelating neutral vector variables. In this article, we extensively compare PNT with PCA and ICA through both theoretical analysis and experimental evaluations. The results of our investigations demonstrate the superiority of PNT for decorrelating the neutral vector variables.

9 citations


Journal ArticleDOI
TL;DR: In this paper, the authors proposed an early diagnostic method for Alzheimer's disease (AD) that discriminates among AD patients, Mild Cognitive Impairment (MCI) patients, and healthy control (HC) participants using structural Magnetic Resonance (sMR) imaging with an Import Vector Machine (IVM), a Regularized Extreme Learning Machine (RELM), and a Support Vector Machine (SVM).

9 citations


Journal ArticleDOI
TL;DR: Wang et al. proposed an efficient hashing method for image copy detection using 2D-2D (two-directional two-dimensional) PCA (Principal Component Analysis).
Abstract: Image copy detection is an important technology for copyright protection. This paper proposes an efficient hashing method for image copy detection using 2D-2D (two-directional two-dimensional) PCA (Principal Component Analysis). The key is the discovery of the translation invariance of 2D-2D PCA. With this property, a novel model for extracting rotation-invariant low-dimensional features is designed by combining PCT (Polar Coordinate Transformation) and 2D-2D PCA. The PCT converts an input rotated image into a translation matrix. Since 2D-2D PCA is invariant to translation, the low-dimensional features learned from the translation matrix are rotation-invariant. Moreover, vector distances of the low-dimensional features are stable under common digital operations, so hash construction with these vector distances is robust and compact. Three open image datasets are used in various experiments to validate the efficiency of the proposed method. The results demonstrate that the proposed method substantially outperforms several representative hashing methods in classification and copy-detection performance.
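The two-directional projection at the heart of 2D-2D PCA keeps each image as a matrix and compresses it from both sides; a minimal sketch of the standard (2D)^2PCA formulation on random images (the PCT and hashing stages are omitted):

```python
import numpy as np

def two_directional_2dpca(images, p, q):
    """(2D)^2PCA sketch: V diagonalizes the column-wise scatter (right
    projection), U the row-wise scatter (left projection); each image A
    is compressed to U[:, :p].T @ A @ V[:, :q]."""
    A = np.asarray(images, dtype=float)
    Ac = A - A.mean(axis=0)
    G_right = np.einsum('nij,nik->jk', Ac, Ac) / len(A)   # average of A^T A
    G_left = np.einsum('nij,nkj->ik', Ac, Ac) / len(A)    # average of A A^T
    V = np.linalg.eigh(G_right)[1][:, ::-1][:, :q]
    U = np.linalg.eigh(G_left)[1][:, ::-1][:, :p]
    return np.einsum('ip,nij,jq->npq', U, A, V)           # U^T A V per image

rng = np.random.default_rng(4)
imgs = rng.normal(size=(50, 16, 16))
feats = two_directional_2dpca(imgs, 4, 4)  # 50 images -> 4x4 feature matrices
```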

7 citations


Journal ArticleDOI
07 Feb 2023-eLight
TL;DR: PCA-SIM is based on the observation that the ideal phasor matrix of a SIM pattern is of rank one, leading to low-complexity, precise identification of the non-integer pixel wave vector and pattern phase while rejecting components that are unrelated to the parameter estimation.
Abstract: Structured illumination microscopy (SIM) is one of the powerful super-resolution modalities in bioscience, with the advantages of full-field imaging and high photon efficiency. However, artifact-free super-resolution image reconstruction requires precise knowledge of the illumination parameters. The sample- and environment-dependent on-the-fly experimental parameters need to be retrieved a posteriori from the acquired data, posing a major challenge for real-time, long-term live-cell imaging, where low photobleaching, phototoxicity, and light dose are a must. In this work, we present an efficient and robust SIM algorithm based on principal component analysis (PCA-SIM). PCA-SIM is based on the observation that the ideal phasor matrix of a SIM pattern is of rank one, leading to low-complexity, precise identification of the non-integer pixel wave vector and pattern phase while rejecting components that are unrelated to the parameter estimation. We demonstrate that PCA-SIM achieves non-iterative, fast, accurate (below 0.01-pixel wave vector error and 0.1% of 2π relative phase error under typical noise levels), and robust parameter estimation at low SNRs, which allows real-time super-resolution imaging of live cells in complicated experimental scenarios where other state-of-the-art methods inevitably fail. In particular, we provide the open-source MATLAB toolbox of our PCA-SIM algorithm and associated datasets. The combination of iteration-free reconstruction, robustness to noise, and limited computational complexity makes PCA-SIM a promising method for high-speed, long-term, artifact-free super-resolution imaging of live cells.
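The rank-one observation that PCA-SIM exploits can be illustrated on a toy complex matrix: the leading singular pair of a noisy rank-one phasor matrix recovers the underlying wave-vector component (a synthetic illustration, not the PCA-SIM toolbox):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 64
u = np.exp(1j * 2 * np.pi * 0.13 * np.arange(n))   # hypothetical phase ramps
v = np.exp(1j * 2 * np.pi * 0.07 * np.arange(n))
noise = 0.05 * (rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n)))
M = np.outer(u, v) + noise                         # noisy rank-one phasor matrix

U, s, Vh = np.linalg.svd(M)
rank1_ratio = s[0] / s[1]                          # large: effectively rank one
# The phase step between adjacent entries of the leading left singular
# vector recovers the wave-vector component (the global phase cancels out).
d = U[1:, 0] * np.conj(U[:-1, 0])
k_est = np.angle(d.mean()) / (2 * np.pi)           # close to 0.13
```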

7 citations


Journal ArticleDOI
TL;DR: In this article, the authors combine several data-driven methods to realize thermal runaway prognosis in two steps, i.e., temperature prediction by a modified extreme gradient boosting (XGBoost) model, followed by abnormality detection using principal component analysis (PCA) and density-based spatial clustering of applications with noise (DBSCAN).
Abstract: Hundreds of thermal-runaway-induced battery fire accidents have occurred in real-world electric vehicles (EVs) in recent years, endangering lives and causing property losses. Timely and fast battery thermal runaway prognosis is essential but restricted by the limited parameters and complex influencing factors during real-world operation of EVs, i.e., environment, driving behavior, and weather. To cope with this issue, several data-driven methods are combined, and the thermal runaway prognosis is realized in two steps, i.e., temperature prediction by a modified extreme gradient boosting (XGBoost) model and then abnormality detection by principal component analysis (PCA) and density-based spatial clustering of applications with noise (DBSCAN). The XGBoost model is modified and trained on data from real-world EVs to incorporate the influencing factors present during real-world operation. For parameter optimization, the “pretraining and adjacent grid optimizing method” (P-AGOM) and the “adjacent grid optimizing method” (AGOM) are proposed to achieve locally optimal hyperparameters for XGBoost and DBSCAN. The verification results show that XGBoost-PCA-DBSCAN achieves accurate 5-min-forward temperature prediction, with mean square errors (MSEs) for the four seasons of only 0.0729, 0.0594, 0.0747, and 0.0523, respectively. With the modification of XGBoost, the MSE of temperature prediction is reduced by 31.2%. In addition, the 35-min-forward thermal runaway prognosis by XGBoost-PCA-DBSCAN provides the driver sufficient response time to minimize the loss of life and property.
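The PCA stage of such an abnormality-detection pipeline is often implemented as a squared reconstruction error against the principal subspace; a minimal sketch on synthetic data (the XGBoost and DBSCAN stages are omitted):

```python
import numpy as np

def pca_spe(X, n_components):
    """Squared prediction error (SPE) of each sample after reconstruction
    from the top principal components; large SPE flags abnormal samples."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_components].T
    return ((Xc - Xc @ P @ P.T) ** 2).sum(axis=1)

rng = np.random.default_rng(6)
W = rng.normal(size=(2, 8))                   # normal data lies near a 2-D
X = rng.normal(size=(200, 2)) @ W             # subspace of an 8-D space
X += 0.01 * rng.normal(size=(200, 8))
X[0] += 5.0                                   # inject one abnormal sample
spe = pca_spe(X, 2)                           # sample 0 stands out
```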

Journal ArticleDOI
TL;DR: Wang et al. proposed recursive innovational component statistical analysis (RICSA), which estimates the dynamic structure of the data and accurately divides the data space into dynamic components and innovational components.
Abstract: Fault detection has long been a hot research issue in industry. Many common algorithms, such as principal component analysis, recursive transformed component statistical analysis, and moments-based robust principal component analysis, can deal with static processes only, whereas most industrial processes are dynamic. Therefore, dynamic principal component analysis and recursive dynamic transformed component statistical analysis have been proposed to handle dynamic processes by expanding the dimensions. However, the computational complexity of these algorithms is greatly increased, and they cannot divide the data space accurately. In this paper, we propose a novel algorithm called recursive innovational component statistical analysis (RICSA), which estimates the dynamic structure of the data and accurately divides the data space into dynamic components and innovational components. In unsteady-state processes, the statistical characteristics of the data change, and RICSA classifies these characteristics into dynamic components by dividing the data space, instead of treating them as faults, thereby reducing the false alarm rate. Through a series of comparative experiments, especially on a practical coal pulverizing system in the 1000-MW ultra-supercritical Zhoushan Power Plant, we found that RICSA achieves a higher accuracy rate and a lower false alarm rate and detection delay, which verifies its superiority. We also discuss the reduced computational complexity associated with RICSA. Note to Practitioners—Aiming at dynamic processes, the RICSA algorithm proposed in this paper divides the data space into dynamic components and innovational components by estimating the dynamic structure of the data. In addition, in the monitoring process, computational complexity is also a key point. Compared with recursive dynamic transformed component statistical analysis, RICSA has lower computational complexity and higher accuracy. Multiple sets of experiments verify that RICSA provides a strong monitoring effect in actual industrial processes and can give early warning of faults.

Journal ArticleDOI
TL;DR: In this article, the authors proposed a hybrid system based on the advantages of fused CNN models, which achieved promising results for diagnosing dermatoscopic images of the ISIC 2019 dataset and distinguishing skin cancer from other skin lesions.
Abstract: Melanoma is one of the deadliest types of skin cancer and leads to death if not diagnosed early. Many skin lesions are similar in the early stages, which causes inaccurate diagnosis. Accurate diagnosis of the types of skin lesions helps dermatologists save patients’ lives. In this paper, we propose hybrid systems based on the advantages of fused CNN models. The CNN models receive dermoscopy images of the ISIC 2019 dataset after the lesion areas are segmented and isolated from healthy skin through the Geometric Active Contour (GAC) algorithm. An artificial neural network (ANN) and a Random Forest (RF) receive the fused CNN features and classify them with high accuracy. The first methodology involved analyzing the area of skin lesions and diagnosing their type early using the hybrid models CNN-ANN and CNN-RF. The CNN models (AlexNet, GoogLeNet, and VGG16) receive the lesion area only and produce high-depth feature maps, which were reduced by PCA and then classified by the ANN and RF networks. The second methodology involved the same analysis using the hybrid CNN-ANN and CNN-RF models based on the features of the fused CNN models; the features of the CNN models were serially integrated after reducing their high dimensionality by Principal Component Analysis (PCA). Hybrid models based on fused CNN features achieved promising results for diagnosing dermatoscopic images of the ISIC 2019 dataset and distinguishing skin cancer from other skin lesions. The AlexNet-GoogLeNet-VGG16-ANN hybrid model achieved an AUC of 94.41%, sensitivity of 88.90%, accuracy of 96.10%, precision of 88.69%, and specificity of 99.44%.

Journal ArticleDOI
TL;DR: Wang et al. applied multivariate statistical methods and an absolute principal component score-multiple linear regression (APCS-MLR) model to assess the water quality of the lake and identify the main pollution sources.

Posted ContentDOI
11 Jan 2023-bioRxiv
TL;DR: Principal component analysis and tensor decomposition-based unsupervised feature extraction with optimized standard deviation, whose effectiveness for differentially expressed gene (DEG) identification was recently recognized, is shown to be a promising candidate standard method for identifying differentially methylated cytosines (DMCs).
Abstract: In contrast to RNA-seq analysis, which has various standard methods, no standard methods exist for identifying differentially methylated cytosines (DMCs). To identify DMCs, we tested principal component analysis and tensor decomposition-based unsupervised feature extraction with optimized standard deviation, which has been shown to be effective for differentially expressed gene (DEG) identification. The proposed method outperformed certain conventional methods, including those that assume a beta-binomial distribution for methylation (an assumption the proposed method does not require), especially when applied to methylation profiles measured using high-throughput sequencing. DMCs identified by the proposed method also significantly overlapped with various functional sites, including known differentially methylated regions, enhancers, and DNase I hypersensitive sites. The proposed method was also applied to data sets retrieved from The Cancer Genome Atlas to identify DMCs using American Joint Committee on Cancer staging system edition labels. These results suggest that the proposed method is a promising standard method for identifying DMCs.

Journal ArticleDOI
TL;DR: In this paper, the authors explored the reliability of the most commonly used countermovement jump (CMJ) metrics, and reduced a large pool of metrics with acceptable levels of reliability via principal component analysis to the significant factors capable of providing distinctive aspects of CMJ performance.
Abstract: The purpose of the present study was (i) to explore the reliability of the most commonly used countermovement jump (CMJ) metrics, and (ii) to reduce a large pool of metrics with acceptable levels of reliability via principal component analysis to the significant factors capable of providing distinctive aspects of CMJ performance. Seventy-nine physically active participants (thirty-seven females and forty-two males) performed three maximal CMJs while standing on a force platform. Each participant visited the laboratory on two occasions, separated by 24–48 h. The most reliable variables were performance variables (CV = 4.2–11.1%), followed by kinetic variables (CV = 1.6–93.4%), and finally kinematic variables (CV = 1.9–37.4%). Of the 45 computed CMJ metrics, only 24 demonstrated acceptable levels of reliability (CV ≤ 10%). These variables were included in the principal component analysis and loaded onto a total of four factors, explaining 91% of the CMJ variance: a performance component (variables responsible for overall jump performance), an eccentric component (variables related to the braking phase), a concentric component (variables related to the upward phase), and a jump strategy component (variables influencing the jumping style). Overall, the findings revealed important implications for sports scientists and practitioners regarding the CMJ-derived metrics that should be considered to gain a comprehensive insight into the biomechanical parameters related to CMJ performance.
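The CV ≤ 10% reliability screen applied to the CMJ metrics can be sketched as a between-session coefficient of variation per metric (synthetic two-session data, not the study's force-platform measurements):

```python
import numpy as np

def reliable_metric_indices(day1, day2, cv_threshold=10.0):
    """Mean within-subject coefficient of variation (CV%) across two
    sessions per metric; keep metrics at or below the threshold."""
    paired = np.stack([day1, day2])            # (2, subjects, metrics)
    cv = 100.0 * (paired.std(axis=0, ddof=1) / paired.mean(axis=0)).mean(axis=0)
    return np.where(cv <= cv_threshold)[0], cv

rng = np.random.default_rng(7)
metric_a = rng.normal(30, 3, 40)               # hypothetical jump metrics
metric_b = rng.normal(30, 3, 40)
day1 = np.column_stack([metric_a, metric_b])
day2 = np.column_stack([metric_a * rng.normal(1.0, 0.02, 40),   # repeatable
                        metric_b * rng.normal(1.0, 0.5, 40)])   # unreliable
keep, cv = reliable_metric_indices(day1, day2)  # only metric 0 survives
```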

Journal ArticleDOI
01 Jan 2023-Sensors
TL;DR: In this article, an efficient feature extraction approach is proposed using a normalized mutual information (NMI)-based band grouping strategy, where classical PCA is applied to each band subgroup for intrinsic feature extraction, and the subspace of the most effective features is generated by the NMI-based minimum redundancy maximum relevance (mRMR) FS criteria.
Abstract: A hyperspectral image (HSI), which contains a number of contiguous and narrow spectral wavelength bands, is a valuable source of data for ground cover examinations. Classification using the entire original HSI suffers from the “curse of dimensionality” problem because (i) the image bands are highly correlated both spectrally and spatially, (ii) not every band can carry equal information, (iii) there is a lack of enough training samples for some classes, and (iv) the overall computational cost is high. Therefore, effective feature (band) reduction is necessary through feature extraction (FE) and/or feature selection (FS) for improving the classification in a cost-effective manner. Principal component analysis (PCA) is a frequently adopted unsupervised FE method in HSI classification. Nevertheless, its performance worsens when the dataset is noisy, and the computational cost becomes high. Consequently, this study first proposed an efficient FE approach using a normalized mutual information (NMI)-based band grouping strategy, where the classical PCA was applied to each band subgroup for intrinsic FE. Finally, the subspace of the most effective features was generated by the NMI-based minimum redundancy and maximum relevance (mRMR) FS criteria. The subspace of features was then classified using the kernel support vector machine. Two real HSIs collected by the AVIRIS and HYDICE sensors were used in an experiment. The experimental results demonstrated that the proposed feature reduction approach significantly improved the classification performance. It achieved the highest overall classification accuracy of 94.93% for the AVIRIS dataset and 99.026% for the HYDICE dataset. Moreover, the proposed approach reduced the computational cost compared with the studied methods.
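The group-then-reduce idea can be sketched with a simple contiguous-band grouping followed by PCA inside each group; here absolute Pearson correlation stands in for the paper's NMI criterion, and the mRMR selection stage is omitted:

```python
import numpy as np

def group_and_reduce(X, threshold=0.8, per_group=1):
    """Group contiguous bands whose neighbours are strongly similar, then
    keep the leading principal component(s) of each group."""
    corr = np.corrcoef(X, rowvar=False)
    groups, current = [], [0]
    for b in range(1, X.shape[1]):
        if abs(corr[b, b - 1]) >= threshold:
            current.append(b)
        else:
            groups.append(current)
            current = [b]
    groups.append(current)
    feats = []
    for g in groups:
        Xg = X[:, g] - X[:, g].mean(axis=0)
        _, _, Vt = np.linalg.svd(Xg, full_matrices=False)
        feats.append(Xg @ Vt[:min(per_group, len(g))].T)
    return np.hstack(feats), groups

rng = np.random.default_rng(8)
base1, base2 = rng.normal(size=300), rng.normal(size=300)
X = np.column_stack([base1 + 0.05 * rng.normal(size=300) for _ in range(5)] +
                    [base2 + 0.05 * rng.normal(size=300) for _ in range(5)])
reduced, groups = group_and_reduce(X)          # two groups of 5 bands each
```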


Journal ArticleDOI
TL;DR: In this paper, the authors used principal component analysis, k-means clustering, and convolutional neural networks to reconstruct the phase diagram of an interacting superconductor.
Abstract: The recent advances in machine learning algorithms have boosted the application of these techniques to the field of condensed matter physics, e.g., to classify the phases of matter at equilibrium or to predict the real-time dynamics of a large class of physical models. Typically in these works, a machine learning algorithm is trained and tested on data coming from the same physical model. Here we demonstrate that unsupervised and supervised machine learning techniques are able to predict phases of a non-exactly solvable model when trained on data from a solvable model. In particular, we employ a training set made of single-particle correlation functions of a non-interacting quantum wire and, by using principal component analysis, k-means clustering, and convolutional neural networks, we reconstruct the phase diagram of an interacting superconductor. We show that both the principal component analysis and the convolutional neural networks trained on the data of the non-interacting model can identify the topological phases of the interacting model with a high degree of accuracy. Our findings indicate that non-trivial phases of matter emerging from the presence of interactions can be identified by means of unsupervised and supervised techniques applied to data of non-interacting systems.
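The unsupervised branch of this approach — PCA on correlation functions followed by k-means — can be sketched on toy "correlation functions" (illustrative decaying vs. long-ranged profiles, not the quantum-wire data):

```python
import numpy as np

def two_means(Z, iters=50):
    """Two-cluster Lloyd's algorithm with farthest-point initialization
    (a tiny stand-in for scikit-learn's KMeans)."""
    centers = np.stack([Z[0], Z[((Z - Z[0]) ** 2).sum(axis=1).argmax()]])
    for _ in range(iters):
        labels = ((Z[:, None] - centers) ** 2).sum(axis=-1).argmin(axis=1)
        centers = np.stack([Z[labels == j].mean(axis=0) for j in range(2)])
    return labels

# Toy "correlation functions": short-ranged (decaying) in one phase,
# long-ranged (plateau) in the other; the phase labels are illustrative only.
rng = np.random.default_rng(9)
r = np.arange(1, 21)
short = np.exp(-r / 2) + 0.02 * rng.normal(size=(30, 20))
long_ = 0.5 + 0.02 * rng.normal(size=(30, 20))
X = np.vstack([short, long_])

Xc = X - X.mean(axis=0)                    # PCA down to two components,
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
labels = two_means(Xc @ Vt[:2].T)          # then cluster in the PC plane
```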

Journal ArticleDOI
TL;DR: In this paper, a Deep Learning Multi-Label Feature Extraction and Classification (ML-FEC) model based on pre-trained Convolutional Neural Network (CNN) architectures was proposed.
Abstract: Diabetic Retinopathy (DR) is the most common cause of eyesight loss, affecting millions of people worldwide. Although there are recognized screening procedures for detecting the condition, such as fluorescein angiography and optical coherence tomography, the majority of patients are unaware of them and fail to have such tests at the proper time. Prompt identification of the condition is critical in avoiding vision loss, which occurs when Diabetes Mellitus (DM) is left untreated for an extended length of time. Several Machine Learning (ML) and Deep Learning (DL) algorithms have been applied to DR datasets for disease prediction and classification; however, the majority of them have ignored data pre-processing and dimensionality reduction, a major gap that results in biased findings. In the first stage of this research, data preprocessing was performed on the color Fundus Photographs (CFPs). Subsequently, we performed feature extraction with Principal Component Analysis (PCA). A Deep Learning Multi-Label Feature Extraction and Classification (ML-FEC) model based on pre-trained Convolutional Neural Network (CNN) architectures was proposed. Then, transfer learning was applied to train a subset of the images using three state-of-the-art CNN architectures, namely ResNet50, ResNet152, and SqueezeNet1, with parameter tuning to identify and classify the lesions. The experiments revealed an accuracy of 93.67% with a Hamming loss of 0.0603 for ResNet50, an accuracy of 91.94% with a Hamming loss of 0.0805 for SqueezeNet1, and an accuracy of 94.40% with a Hamming loss of 0.0560 for ResNet152, which demonstrates the suitability of the model for implementation in daily clinical practice and for supporting large-scale DR screening programs.

Journal ArticleDOI
TL;DR: In this paper, the authors explored and compared various face identification algorithms, such as Linear Discriminant Analysis (LDA), Local Binary Pattern Histogram (LBPH), Principal Component Analysis (PCA), Elastic Bunch Graph Matching (EBGM), and neural networks.

Journal ArticleDOI
TL;DR: Wang et al. proposed a coefficient of variation (CV)-based feature selection method for stock prediction. CV is widely used to measure variability among data distributions; the method integrates an existing technique, the k-means algorithm, as well as two proposed techniques, median range and top-M, to select sets of features with specific characteristics, namely features belonging to the largest cluster, features within the defined range, and features with the highest CV values, respectively.
Abstract: Stock market forecasting has been a subject of interest for many researchers; the essential market analyses can be integrated with historical stock market data to derive a set of features. It is crucial to select features carrying useful information about the specific aspect being predicted. In this article, we propose coefficient of variation (CV)-based feature selection for stock prediction. The unitless statistical measure, CV, is widely used to quantify variability among data distributions. We calculate the CV for each feature and integrate an existing method, the k-means algorithm, as well as proposed methods, median range and top-M, to select sets of features with specific characteristics: features belonging to the largest cluster, within the defined range, and with the highest CV values, respectively. We apply the selected features to models such as backpropagation neural network (BPNN), long short-term memory (LSTM), gated recurrent unit (GRU), and convolutional neural network (CNN) for stock price and trend prediction. We demonstrate the applicability of our proposed approach against five existing feature selection methods, namely, correlation coefficient, Chi2, mutual information, principal component analysis, and variance threshold; the comparison indicates remarkable performance enhancement on several accuracy-based as well as error-based metrics, and the improvement is statistically supported by the Wilcoxon signed-rank test.
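The top-M variant of the CV selection described above is straightforward to sketch. The following is a minimal numpy illustration, not the authors' implementation; the toy data and function name are invented for this example:

```python
import numpy as np

def top_m_by_cv(X, m):
    """Rank features by coefficient of variation (std / |mean|) and
    return the column indices of the m highest-CV features."""
    cv = X.std(axis=0) / np.abs(X.mean(axis=0))
    return np.argsort(cv)[::-1][:m]

# toy data: 5 features around mean 10; feature 2 gets a much larger spread
rng = np.random.default_rng(1)
X = rng.normal(loc=10.0, scale=1.0, size=(200, 5))
X[:, 2] = rng.normal(loc=10.0, scale=8.0, size=200)
idx = top_m_by_cv(X, m=2)  # feature 2 should rank first
```

The median-range and k-means variants differ only in how the per-feature CV values are filtered after this ranking step.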

Journal ArticleDOI
TL;DR: In this paper, the authors proposed a structural damage detection method that combines the advantages of the variational mode decomposition algorithm and kernel principal component analysis in the presence of environmental effects; the spectral centroid feature, corresponding to the statistical characteristics of the spectrum shape in the short-time Fourier transform, is extracted from selected IMF components to construct the damage feature matrix.

Journal ArticleDOI
TL;DR: Wang et al. as discussed by the authors developed an efficient fault diagnostic scheme for battery packs using a novel sensor topology and signal processing procedure, where cross-cell voltages are measured to capture electrical abnormalities and recursive correlation coefficients between adjacent voltages are calculated to embody the system state.
Abstract: This article develops an efficient fault diagnostic scheme for battery packs using a novel sensor topology and signal processing procedure. Cross-cell voltages are measured to capture electrical abnormalities, and recursive correlation coefficients between adjacent voltages are calculated to embody system state. Then discrete wavelet packet transform is applied on the correlation sequences to extract diverse characteristic indexes, wherein the most representative components are refined as fault features by principal component analysis. Afterward, resorting to multiclass relevance vector machine, sparse classification models are constructed to cognize fault patterns, and accordingly, fault types and grades are evaluated. Common faults, including external and internal short-circuit, thermal abuse, and loose connection, are physically triggered on a series pack to acquire realistic data set. Experimental verifications under different conditions and algorithmic configurations suggest that the proposed diagnosis scheme can give accurate and reliable assessments on different fault specifics, with a fault isolation success rate of 84% and a fault severity grading success rate of 90%.
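The correlation-tracking idea behind this scheme can be sketched with a plain sliding-window Pearson correlation (a simplification of the paper's recursive formulation); the simulated voltages and fault below are purely illustrative:

```python
import numpy as np

def moving_corr(x, y, w):
    """Sliding-window Pearson correlation between two voltage signals;
    a drop toward zero flags an abnormality between neighbouring cells."""
    out = np.empty(len(x) - w + 1)
    for i in range(len(out)):
        out[i] = np.corrcoef(x[i:i + w], y[i:i + w])[0, 1]
    return out

# two healthy cell voltages track each other; cell 2 then "faults"
rng = np.random.default_rng(4)
t = np.linspace(0.0, 10.0, 500)
v1 = 3.7 + 0.05 * np.sin(t)
v2 = v1.copy()
v2[300:] = 3.4 + 0.02 * rng.normal(size=200)  # simulated short-circuit
r = moving_corr(v1, v2, w=50)
```

Before the fault the windows are identical, so the correlation sits at 1; after the fault it collapses toward zero, which is the feature the downstream wavelet/PCA stages would then refine.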

Journal ArticleDOI
13 Feb 2023-Aestimum
TL;DR: In this paper, a data-driven quantitative methodology is proposed to compute cultural performance indices of cities (C4 Index) and to compare results derived from subjective and objective assessment methods within the case study of the Metropolitan City of Naples.
Abstract: Culture, creativity and circularity are driving forces for the transition of cities towards sustainable development models. This contribution proposes a data-driven quantitative methodology to compute cultural performance indices of cities (C4 Index) and thus compare results derived from subjective and objective assessment methods within the case study of the Metropolitan City of Naples. After data processing with Machine-Learning (ML) algorithms, two methods for weighting the indicators were compared: principal component analysis (PCA) and geographically weighted linear combination (WLC) with budget allocation. The results highlight similar trends, with higher performance in seaside cities and lower levels in the inner areas, albeit with some divergences between the rankings. The proposed methodology addresses the research gap in comparing results obtained with different aggregation methods, allowing a choice consistent with the decision-making environment.
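One common way to realise PCA-based indicator weighting of this kind is to take the loadings of the first principal component as weights. The sketch below assumes that convention (the paper may use a different aggregation); the indicator matrix is synthetic:

```python
import numpy as np

def pca_weights(X):
    """Weight indicators by the loadings of the first principal component
    of the standardised indicator matrix, normalised to sum to 1."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    vals, vecs = np.linalg.eigh(np.cov(Z, rowvar=False))
    w = np.abs(vecs[:, -1])  # loadings of the leading component
    return w / w.sum()

# toy indicator matrix: 40 municipalities x 4 cultural indicators
rng = np.random.default_rng(2)
X = rng.normal(size=(40, 4))
w = pca_weights(X)
index = ((X - X.mean(axis=0)) / X.std(axis=0)) @ w  # composite index
```

The WLC alternative in the paper would replace `w` with expert-allocated budget weights while keeping the same linear aggregation.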

Journal ArticleDOI
TL;DR: In this article, different modeling methods were applied to establish the relationship between the near-infrared (NIR) spectra of the frozen samples and quality indicators: drip loss; texture parameters including hardness, chewiness and gumminess; and gel strength.

Journal ArticleDOI
TL;DR: In this paper , an evolutionary machine learning (ML) approach was introduced to quantify the byproducts of the biomass-polymer co-pyrolysis process, where the input features were constructed using an innovative approach to reflect the physics of the process.

Journal ArticleDOI
TL;DR: In this article, a Shallow-to-Deep Feature Enhancement (SDFE) model with three modules based on Convolutional Neural Networks (CNNs) and Vision Transformer (ViT) is proposed.
Abstract: Since Hyperspectral Images (HSIs) contain plenty of ground object information, they are widely used in fine-grained classification of ground objects. However, some ground objects are similar, and the number of spectral bands is far larger than the number of ground object categories. Therefore, it is hard to deeply explore spatial–spectral joint features with greater discrimination. To mine the spatial–spectral features of HSIs, a Shallow-to-Deep Feature Enhancement (SDFE) model with three modules based on Convolutional Neural Networks (CNNs) and Vision Transformer (ViT) is proposed. Firstly, the bands containing important spectral information are selected using Principal Component Analysis (PCA). Secondly, a two-layer 3D-CNN-based Shallow Spatial–Spectral Feature Extraction (SSSFE) module is constructed to preserve the spatial and spectral correlations across spaces and bands at the same time. Thirdly, to enhance the nonlinear representation ability of the network and avoid the loss of spectral information, a channel attention residual module based on 2D-CNN is designed to capture the deeper spatial–spectral complementary information. Finally, a ViT-based module is used to extract the joint spatial–spectral features (SSFs) with greater robustness. Experiments are carried out on the Indian Pines (IP), Pavia University (PU) and Salinas (SA) datasets. The experimental results show that better classification results can be achieved by using the proposed feature enhancement method as compared to other methods.
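The PCA band-reduction step that opens this pipeline can be sketched in a few lines of numpy. This is a generic illustration, not the SDFE code; the tiny random cube stands in for a real HSI such as Indian Pines:

```python
import numpy as np

def reduce_bands(cube, k):
    """Flatten an H x W x B hyperspectral cube to (pixels x bands),
    project onto the top-k principal components, reshape to H x W x k."""
    H, W, B = cube.shape
    X = cube.reshape(-1, B)
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return (Xc @ Vt[:k].T).reshape(H, W, k)

rng = np.random.default_rng(3)
cube = rng.normal(size=(8, 8, 30))  # toy cube: 8x8 pixels, 30 bands
reduced = reduce_bands(cube, k=5)   # 30 spectral bands -> 5 components
```

The reduced cube keeps the spatial layout intact, so the 3D-CNN stage can still exploit spatial neighbourhoods.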

Journal ArticleDOI
13 Apr 2023-Machines
TL;DR: In this paper, a wavelet transform-based FNR (Fault to Noise Ratio) enhancement is realized to highlight incipient fault information and a Deep PCA (Principal Component Analysis)-based diagnosability analysis framework is proposed.
Abstract: In recent years, data-driven FDD (Fault Detection and Diagnosis) of high-speed train electric traction systems has made rapid progress, as the safe operation of the traction system is closely related to the reliability and stability of high-speed trains. The internal complexity of the system and the complexity of the external environment mean that fault diagnosis of high-speed train traction systems faces great challenges. In this paper, a wavelet transform-based FNR (Fault to Noise Ratio) enhancement is realized to highlight incipient fault information, and a Deep PCA (Principal Component Analysis)-based diagnosability analysis framework is proposed. First, a scheme for FNR enhancement-based fault data preprocessing, with intelligent selection of the decomposition levels and the optimal noise threshold, is proposed. Second, fault information enhancement technology based on the continuous wavelet transform is proposed from the perspective of energy. Further, Deep PCA-based incipient fault detectability and isolatability analyses are provided via geometric descriptions. Finally, experiments on the TDCS-FIB (Traction Drive Control System–Fault Injection Benchmark) platform fully demonstrate the effectiveness of the method proposed in this paper.
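The wavelet thresholding at the heart of the FNR-enhancement step can be illustrated with a single-level Haar soft-thresholding sketch. This is a deliberately simplified stand-in for the paper's multi-level scheme with intelligent level and threshold selection; the signal and threshold value are arbitrary:

```python
import numpy as np

def haar_denoise(x, thresh):
    """Single-level Haar wavelet shrinkage: split a signal into
    approximation and detail coefficients, soft-threshold the details,
    and reconstruct the signal."""
    a = (x[0::2] + x[1::2]) / np.sqrt(2)
    d = (x[0::2] - x[1::2]) / np.sqrt(2)
    d = np.sign(d) * np.maximum(np.abs(d) - thresh, 0.0)  # soft threshold
    y = np.empty_like(x)
    y[0::2] = (a + d) / np.sqrt(2)
    y[1::2] = (a - d) / np.sqrt(2)
    return y

rng = np.random.default_rng(5)
t = np.linspace(0.0, 1.0, 512)
clean = np.sin(2 * np.pi * 4 * t)
noisy = clean + 0.2 * rng.normal(size=512)
denoised = haar_denoise(noisy, thresh=0.3)
```

Suppressing the noise-dominated detail coefficients raises the fault-to-noise ratio of the residual signal, which is what makes the subsequent Deep PCA detectability analysis feasible for incipient faults.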

Journal ArticleDOI
TL;DR: In this paper, the authors used the MATLAB R2018a application with a dataset of 137 raw, 137 ripe, and 136 rotten samples, totaling 410 tempe images.
Abstract: Tempe is one of the traditional foods in Indonesia, with nutritional content and benefits that make it a favorite across the country. Ripeness is determined by fermenting tempe at a certain temperature, and tempe entrepreneurs usually do this traditionally; as a result, producers do not know the exact temperature and humidity required for ripeness. In this study, the researchers used the MATLAB R2018a application with a dataset of 137 raw, 137 ripe, and 136 rotten samples, totaling 410 tempe images. The purpose of this research is to produce a system that can detect the ripeness of tempe using the K-Nearest Neighbor (KNN) method, equipped with GLCM texture feature extraction and eight color features, using Principal Component Analysis (PCA) for feature selection, and to compare the results and running time with the same KNN method without PCA feature selection. KNN with the PCA selection feature achieved an average accuracy of 80.63% in 1.06 seconds, while KNN without the selection feature achieved an average accuracy of 81.67% in 1.18 seconds.
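The PCA-then-KNN pipeline compared in this study can be sketched generically in numpy. This is not the MATLAB implementation: the two well-separated random clusters below merely stand in for GLCM/color feature vectors of ripe and rotten tempe:

```python
import numpy as np

def knn_predict(Xtr, ytr, Xte, k=3):
    """Plain k-nearest-neighbour majority vote using Euclidean distance."""
    d = ((Xte[:, None, :] - Xtr[None, :, :]) ** 2).sum(axis=-1)
    nearest = np.argsort(d, axis=1)[:, :k]
    return np.array([np.bincount(ytr[row]).argmax() for row in nearest])

# two well-separated toy classes standing in for ripe / rotten features
rng = np.random.default_rng(6)
X = np.vstack([rng.normal(0.0, 0.3, (30, 8)),
               rng.normal(3.0, 0.3, (30, 8))])
y = np.array([0] * 30 + [1] * 30)

# PCA to 2 components before KNN, mirroring the PCA + KNN pipeline
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
Z = Xc @ Vt[:2].T
pred = knn_predict(Z[:50], y[:50], Z[50:])  # last 10 samples as test set
```

Reducing the feature dimension before the distance computation is what buys the running-time difference the study measures; as its results show, it can also cost a little accuracy.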