Journal ArticleDOI

PCA-based unsupervised feature extraction for gene expression analysis of COVID-19 patients.

30 Aug 2021-Scientific Reports (Springer Science and Business Media LLC)-Vol. 11, Iss: 1, pp 17351-17351
TL;DR: In this paper, the authors applied principal-component-analysis-based unsupervised feature extraction (PCAUFE) to the RNA expression profiles of 16 COVID-19 patients and 18 healthy control subjects.
Abstract: Coronavirus disease 2019 (COVID-19) is raging worldwide. This potentially fatal infectious disease is caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). However, the complete mechanism of COVID-19 is not well understood. Therefore, we analyzed gene expression profiles of COVID-19 patients to identify disease-related genes through an innovative machine learning method that enables a data-driven strategy for gene selection from a data set with a small number of samples and many candidates. Principal-component-analysis-based unsupervised feature extraction (PCAUFE) was applied to the RNA expression profiles of 16 COVID-19 patients and 18 healthy control subjects. The results identified 123 genes as critical for COVID-19 progression from 60,683 candidate probes, including immune-related genes. The 123 genes were enriched in binding sites for transcription factors NFKB1 and RELA, which are involved in various biological phenomena such as immune response and cell survival: the primary mediator of canonical nuclear factor-kappa B (NF-κB) activity is the heterodimer RelA-p50. The genes were also enriched in histone modification H3K36me3, and they largely overlapped the target genes of NFKB1 and RELA. We found that the overlapping genes were downregulated in COVID-19 patients. These results suggest that canonical NF-κB activity was suppressed by H3K36me3 in COVID-19 patient blood.
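As a rough illustration of the PCAUFE idea described above (not the authors' exact pipeline), the sketch below embeds probes into principal-component space, converts standardized PC scores into chi-squared P-values, and keeps probes passing a Benjamini-Hochberg-adjusted threshold. The matrix sizes, the chosen component, and the 0.01 cutoff are illustrative assumptions.

```python
import numpy as np
from scipy import stats
from sklearn.decomposition import PCA

# Toy expression matrix: rows = probes, columns = samples
# (the study used 60,683 probes x 34 subjects; random data here).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 34))

# PCA with probes as observations, so each probe gets a score per component.
scores = PCA(n_components=5).fit_transform(X)

# Standardize the scores on one component of interest and assign
# chi-squared (df=1) P-values, as in PCA-based unsupervised feature extraction.
pc = scores[:, 1]
z = (pc - pc.mean()) / pc.std()
pvals = 1.0 - stats.chi2.cdf(z ** 2, df=1)

# Benjamini-Hochberg adjustment; keep probes with adjusted P < 0.01.
order = np.argsort(pvals)
m = len(pvals)
adj = np.minimum.accumulate((pvals[order] * m / np.arange(1, m + 1))[::-1])[::-1]
selected_probes = order[adj < 0.01]
print(len(selected_probes), "probes selected")
```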


Citations
More filters
Journal ArticleDOI
01 Jan 2023-Sensors
TL;DR: In this paper, the authors present a comprehensive review of artificial intelligence, with specific attention to the machine learning, deep learning, image processing, object detection, image segmentation, and few-shot learning studies that were utilized in several tasks related to the COVID-19 pandemic.
Abstract: Artificial intelligence has significantly enhanced the research paradigm and spectrum with a substantiated promise of continuous applicability in the real world domain. Artificial intelligence, the driving force of the current technological revolution, has been used in many frontiers, including education, security, gaming, finance, robotics, autonomous systems, entertainment, and most importantly the healthcare sector. With the rise of the COVID-19 pandemic, several prediction and detection methods using artificial intelligence have been employed to understand, forecast, handle, and curtail the ensuing threats. In this study, the most recent related publications, methodologies and medical reports were investigated with the purpose of studying artificial intelligence’s role in the pandemic. This study presents a comprehensive review of artificial intelligence with specific attention to machine learning, deep learning, image processing, object detection, image segmentation, and few-shot learning studies that were utilized in several tasks related to COVID-19. In particular, genetic analysis, medical image analysis, clinical data analysis, sound analysis, biomedical data classification, socio-demographic data analysis, anomaly detection, health monitoring, personal protective equipment (PPE) observation, social control, and COVID-19 patients’ mortality risk approaches were used in this study to forecast the threatening factors of COVID-19. This study demonstrates that artificial-intelligence-based algorithms integrated into Internet of Things wearable devices were quite effective and efficient in COVID-19 detection and forecasting insights which were actionable through wide usage. The results produced by the study prove that artificial intelligence is a promising arena of research that can be applied for disease prognosis, disease forecasting, drug discovery, and to the development of the healthcare sector on a global scale. We prove that artificial intelligence indeed played a significantly important role in helping to fight against COVID-19, and the insightful knowledge provided here could be extremely beneficial for practitioners and research experts in the healthcare domain to implement the artificial-intelligence-based systems in curbing the next pandemic or healthcare disaster.

9 citations

Journal ArticleDOI
TL;DR: Wearable sensors hold great potential in empowering personalized health monitoring, predictive analytics, and timely intervention toward personalized healthcare, as discussed by the authors. This article provides a comprehensive review of wearable sweat sensors and outlines state-of-the-art technologies and research that strive to bridge these gaps.
Abstract: Wearable sensors hold great potential in empowering personalized health monitoring, predictive analytics, and timely intervention toward personalized healthcare. Advances in flexible electronics, materials science, and electrochemistry have spurred the development of wearable sweat sensors that enable the continuous and noninvasive screening of analytes indicative of health status. Existing major challenges in wearable sensors include: improving the sweat extraction and sweat sensing capabilities, improving the form factor of the wearable device for minimal discomfort and reliable measurements when worn, and understanding the clinical value of sweat analytes toward biomarker discovery. This review provides a comprehensive review of wearable sweat sensors and outlines state-of-the-art technologies and research that strive to bridge these gaps. The physiology of sweat, materials, biosensing mechanisms and advances, and approaches for sweat induction and sampling are introduced. Additionally, design considerations for the system-level development of wearable sweat sensing devices, spanning from strategies for prolonged sweat extraction to efficient powering of wearables, are discussed. Furthermore, the applications, data analytics, commercialization efforts, challenges, and prospects of wearable sweat sensors for precision medicine are discussed.

7 citations

Journal ArticleDOI
01 Jan 2021-PLOS ONE
TL;DR: The authors applied principal component analysis (PCA) to the daily time series of the COVID-19 death cases and confirmed cases for the top 25 countries from April of 2020 to February of 2021.
Abstract: COVID-19 is one of the worst pandemics in modern history. We applied principal component analysis (PCA) to the daily time series of COVID-19 death cases and confirmed cases for the top 25 countries from April 2020 to February 2021. We calculated the eigenvalues and eigenvectors of the cross-correlation matrix of the changes in daily accumulated data over monthly time windows. The largest eigenvalue describes the overall evolution dynamics of COVID-19 and indicates that evolution was faster in April 2020 than in any other period. By using the first two PC coefficients, we can identify the group dynamics of the COVID-19 evolution. We observed groups under critical states in the loading plot and found that American and European countries are represented by strong clusters in the loading plot. The first PC plays an important role, and the correlations (C1) between the normalized logarithmic changes in deaths or confirmed cases and the first PCs may be used as indicators of different phases of COVID-19. By varying C1 over time, we identified different phases of COVID-19 in the analyzed countries over the target time period.
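A minimal sketch of the kind of analysis this abstract describes: normalized logarithmic changes of cumulative counts, a cross-correlation matrix over a monthly window, and its leading eigenvalues and eigenvectors. The synthetic data, window length, and country count are placeholders, not the paper's dataset.

```python
import numpy as np

# Toy daily cumulative case counts (rows = days, columns = countries);
# the real analysis used the top 25 countries.
rng = np.random.default_rng(1)
cumulative = np.cumsum(rng.poisson(100, size=(60, 5)), axis=0)

# Normalized logarithmic changes in the accumulated data.
log_changes = np.diff(np.log(cumulative + 1), axis=0)
norm = (log_changes - log_changes.mean(axis=0)) / log_changes.std(axis=0)

# Cross-correlation matrix over a monthly (30-day) window and its eigen-decomposition.
window = norm[-30:]
C = np.corrcoef(window, rowvar=False)               # countries x countries
eigvals, eigvecs = np.linalg.eigh(C)
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]  # sort descending

# Largest eigenvalue tracks the overall evolution dynamics; the first two
# PC coefficients give one point per country for a loading plot.
print("largest eigenvalue:", eigvals[0])
loading_plot = eigvecs[:, :2]
```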

3 citations

Journal ArticleDOI
01 May 2022-Cancers
TL;DR: This report focused on the up-regulated genes of crucial cell signaling pathways, which are key hallmarks of unregulated cell division and apoptosis and analyzed the genes of the WNT pathway and seven cross-linked pathways that may explain the differences in aggressiveness among cancer types.
Abstract: Simple Summary: Traditionally, chemotherapy has been approached through one-size-fits-all strategies. However, personalized oncology would allow a rational approach to chemotherapies. Classically, cancer diagnosis and prognosis are performed through mutation mapping, but this genomic approach has an indirect relationship with the disease since it is based on the results of statistics accumulated over time. By contrast, a strategy based on gene expression would enable figuring out the actual disease phenotype and focusing on its specific molecular targets. In previous reports, we paved the way in that direction by successively showing that targeting up-regulated hubs is a suitable strategy to forward a tumor toward cell death and that the number of proteins to be targeted is typically between 3 and 10 according to tumor aggressiveness. In this report, we focused on the up-regulated genes of crucial cell signaling pathways, which are key hallmarks of unregulated cell division and apoptosis. By principal component analysis, we identified the genes that most explain the aggressiveness among cancer types. We also identified the genes that maximized the classification between aggressive and mild cancers using the random forest algorithm. Finally, by mapping these genes on the human interactome, we showed that they were close neighbors. Abstract: The main hallmarks of cancer include sustaining proliferative signaling and resisting cell death. We analyzed the genes of the WNT pathway and seven cross-linked pathways that may explain the differences in aggressiveness among cancer types. We divided six cancer types (liver, lung, stomach, kidney, prostate, and thyroid) into classes of high (H) and low (L) aggressiveness considering the TCGA data, and their correlations between Shannon entropy and 5-year overall survival (OS). Then, we used principal component analysis (PCA), a random forest classifier (RFC), and protein–protein interactions (PPI) to find the genes that correlated with aggressiveness. Using PCA, we found GRB2, CTNNB1, SKP1, CSNK2A1, PRKDC, HDAC1, YWHAZ, YWHAB, and PSMD2. Except for PSMD2, the RFC analysis showed a different list, which was CAD, PSMD14, APH1A, PSMD2, SHC1, TMEFF2, PSMD11, H2AFZ, PSMB5, and NOTCH1. Both methods use different algorithmic approaches and have different purposes, which explains the discrepancy between the two gene lists. The key genes of aggressiveness found by PCA were those that maximized the separation of the H and L classes according to its third component, which represented 19% of the total variance. By contrast, RFC classified whether the RNA-seq of a tumor sample was of the H or L type. Interestingly, PPIs showed that the genes of the PCA and RFC lists were connected neighbors in the PPI signaling network of WNT and cross-linked pathways.
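A hedged sketch of how PCA loadings and random-forest feature importances could each be used to rank genes, in the spirit of the analysis above; the data, labels, and the choice of the third component are illustrative, not the authors' TCGA pipeline.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier

# Toy expression matrix: rows = tumor samples, columns = pathway genes,
# with binary labels for high (1) / low (0) aggressiveness.
rng = np.random.default_rng(2)
X = rng.normal(size=(120, 50))
y = rng.integers(0, 2, size=120)

# PCA: inspect which genes load most heavily on the component that best
# separates the two classes (the paper used the third component).
pca = PCA(n_components=5).fit(X)
pc3_loadings = np.abs(pca.components_[2])
top_pca_genes = np.argsort(pc3_loadings)[::-1][:10]

# Random forest: rank genes by impurity-based feature importance.
rf = RandomForestClassifier(n_estimators=500, oob_score=True, random_state=0).fit(X, y)
top_rf_genes = np.argsort(rf.feature_importances_)[::-1][:10]

print("PCA-ranked genes:", top_pca_genes)
print("RF-ranked genes:", top_rf_genes, "OOB accuracy:", rf.oob_score_)
```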

2 citations

Journal ArticleDOI
TL;DR: In this paper, a principal component analysis (PCA)-based database is constructed to correlate surface images with empirically determined sub-surface structures, and from this database, the morphology of buried sub-surface structures is determined using surface topography alone.
Abstract: Empty space in germanium (ESG) or germanium-on-nothing (GON) are unique self-assembled germanium structures with multiscale cavities of various morphologies. Due to their simple fabrication process and high-quality crystallinity after self-assembly, they can be applied in various fields including micro-/nanoelectronics, optoelectronics, and precision sensors, to name a few. In contrast to their simple fabrication, inspection is intrinsically difficult due to the buried structures. Today, ultrasonic atomic force microscopy and interferometry are some prevalent non-destructive 3-D imaging methods that are used to inspect the underlying ESG structures. However, these non-destructive characterization methods suffer from low throughput due to slow measurement speed and limited measurable thickness. To overcome these limitations, this work proposes a new methodology to construct a principal-component-analysis-based database that correlates surface images with empirically determined sub-surface structures. Then, from this database, the morphology of the buried sub-surface structure is determined using only surface topography. Since the acquisition rate of a single nanoscale surface micrograph is up to a few orders of magnitude faster than a thorough 3-D sub-surface analysis, the proposed methodology benefits from improved throughput compared to current inspection methods. Also, an empirical destructive test essentially resolves the measurable-thickness limitation. We also demonstrate the practicality of the proposed methodology by applying it to GON devices to selectively detect and quantitatively analyze surface defects. Compared to state-of-the-art deep-learning-based defect detection schemes, our method is much more easily fine-tunable for specific applications. In terms of sub-surface analysis, this work proposes a fast, robust, and high-resolution methodology that could potentially replace conventional exhaustive sub-surface inspection schemes.
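A speculative sketch of the general workflow suggested by this abstract: project reference surface-topography maps into PC space, pair the scores with destructively measured sub-surface properties, and infer the buried morphology of a new sample by nearest-neighbor lookup. Array shapes, the depth label, and the neighbor count are invented for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import NearestNeighbors

# Toy database: flattened surface-topography maps with empirically measured
# (destructively characterized) sub-surface cavity depths as labels.
rng = np.random.default_rng(3)
surface_maps = rng.normal(size=(200, 64 * 64))     # 200 reference samples
cavity_depth = rng.uniform(50, 500, size=200)      # nm, known from destructive tests

# Build the PCA "database": low-dimensional scores for every reference map.
pca = PCA(n_components=10).fit(surface_maps)
db_scores = pca.transform(surface_maps)
nn = NearestNeighbors(n_neighbors=3).fit(db_scores)

# Inspection step: estimate the buried morphology of a new device from
# its surface topography alone, via nearest neighbors in PC space.
new_map = rng.normal(size=(1, 64 * 64))
_, idx = nn.kneighbors(pca.transform(new_map))
estimated_depth = cavity_depth[idx[0]].mean()
print("estimated sub-surface cavity depth (nm):", estimated_depth)
```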

1 citation

References
More filters
Journal ArticleDOI
TL;DR: In this paper, a different approach to problems of multiple significance testing is presented, which calls for controlling the expected proportion of falsely rejected hypotheses (the false discovery rate); this error rate is equivalent to the FWER when all hypotheses are true but is smaller otherwise.
Abstract: The common approach to the multiplicity problem calls for controlling the familywise error rate (FWER). This approach, though, has faults, and we point out a few. A different approach to problems of multiple significance testing is presented. It calls for controlling the expected proportion of falsely rejected hypotheses - the false discovery rate. This error rate is equivalent to the FWER when all hypotheses are true but is smaller otherwise. Therefore, in problems where the control of the false discovery rate rather than that of the FWER is desired, there is potential for a gain in power. A simple sequential Bonferroni-type procedure is proved to control the false discovery rate for independent test statistics, and a simulation study shows that the gain in power is substantial. The use of the new procedure and the appropriateness of the criterion are illustrated with examples.
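The step-up procedure described here is simple to state in code. The sketch below is a generic Benjamini-Hochberg implementation (sorted P-values compared against (i/m)·q), with an invented set of ten P-values as input.

```python
import numpy as np

def benjamini_hochberg(pvals, q=0.05):
    """Step-up FDR procedure: reject the hypotheses with the k smallest
    P-values, where k is the largest i with p_(i) <= (i/m) * q."""
    pvals = np.asarray(pvals)
    m = len(pvals)
    order = np.argsort(pvals)
    thresholds = q * np.arange(1, m + 1) / m
    passed = pvals[order] <= thresholds
    k = np.max(np.nonzero(passed)[0]) + 1 if passed.any() else 0
    rejected = np.zeros(m, dtype=bool)
    rejected[order[:k]] = True
    return rejected

# Example: ten P-values, controlling the false discovery rate at 5%.
print(benjamini_hochberg([0.001, 0.008, 0.039, 0.041, 0.042,
                          0.06, 0.074, 0.205, 0.212, 0.36]))
```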

83,420 citations

Journal ArticleDOI
01 Oct 2001
TL;DR: Internal estimates monitor error, strength, and correlation and these are used to show the response to increasing the number of features used in the forest, and are also applicable to regression.
Abstract: Random forests are a combination of tree predictors such that each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest. The generalization error for forests converges a.s. to a limit as the number of trees in the forest becomes large. The generalization error of a forest of tree classifiers depends on the strength of the individual trees in the forest and the correlation between them. Using a random selection of features to split each node yields error rates that compare favorably to AdaBoost (Y. Freund & R. Schapire, Machine Learning: Proceedings of the Thirteenth International Conference, 148–156), but are more robust with respect to noise. Internal estimates monitor error, strength, and correlation, and these are used to show the response to increasing the number of features used in the splitting. Internal estimates are also used to measure variable importance. These ideas are also applicable to regression.
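A brief usage sketch of the ideas summarized above using scikit-learn's random forest: a bootstrap sample per tree, a random subset of features considered at each split, an out-of-bag estimate of generalization error, and impurity-based variable importance. The dataset and parameters are arbitrary choices for illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)

# Each tree is grown on a bootstrap sample, with a random subset of
# features (max_features) considered at every split; samples left out of
# the bootstrap give an internal, out-of-bag estimate of generalization error.
rf = RandomForestClassifier(n_estimators=500, max_features="sqrt",
                            oob_score=True, random_state=0).fit(X, y)

print("OOB accuracy:", rf.oob_score_)
# Impurity-based variable importance, averaged over all trees.
print("most important feature index:", rf.feature_importances_.argmax())
```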

79,257 citations

Journal ArticleDOI
TL;DR: This work presents DESeq2, a method for differential analysis of count data, using shrinkage estimation for dispersions and fold changes to improve stability and interpretability of estimates, which enables a more quantitative analysis focused on the strength rather than the mere presence of differential expression.
Abstract: In comparative high-throughput sequencing assays, a fundamental task is the analysis of count data, such as read counts per gene in RNA-seq, for evidence of systematic changes across experimental conditions. Small replicate numbers, discreteness, large dynamic range and the presence of outliers require a suitable statistical approach. We present DESeq2, a method for differential analysis of count data, using shrinkage estimation for dispersions and fold changes to improve stability and interpretability of estimates. This enables a more quantitative analysis focused on the strength rather than the mere presence of differential expression. The DESeq2 package is available at http://www.bioconductor.org/packages/release/bioc/html/DESeq2.html .

47,038 citations

Journal ArticleDOI
TL;DR: High generalization ability of support-vector networks utilizing polynomial input transformations is demonstrated, and the performance of the support-vector network is compared to various classical learning algorithms that all took part in a benchmark study of Optical Character Recognition.
Abstract: The support-vector network is a new learning machine for two-group classification problems. The machine conceptually implements the following idea: input vectors are non-linearly mapped to a very high-dimension feature space. In this feature space a linear decision surface is constructed. Special properties of the decision surface ensures high generalization ability of the learning machine. The idea behind the support-vector network was previously implemented for the restricted case where the training data can be separated without errors. We here extend this result to non-separable training data. High generalization ability of support-vector networks utilizing polynomial input transformations is demonstrated. We also compare the performance of the support-vector network to various classical learning algorithms that all took part in a benchmark study of Optical Character Recognition.
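A minimal example in the spirit of this abstract: a soft-margin SVM with a polynomial kernel on a small optical-character-recognition dataset. The dataset, kernel degree, and C value are illustrative assumptions, not the original benchmark.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Small optical-character-recognition dataset (8x8 digit images).
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Soft-margin SVM: inputs are implicitly mapped to a high-dimensional
# feature space via a polynomial kernel, where a linear decision surface
# is constructed; the C parameter handles non-separable training data.
clf = SVC(kernel="poly", degree=3, C=1.0).fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```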

37,861 citations

Journal Article
TL;DR: A new technique called t-SNE that visualizes high-dimensional data by giving each datapoint a location in a two or three-dimensional map, a variation of Stochastic Neighbor Embedding that is much easier to optimize, and produces significantly better visualizations by reducing the tendency to crowd points together in the center of the map.
Abstract: We present a new technique called “t-SNE” that visualizes high-dimensional data by giving each datapoint a location in a two or three-dimensional map. The technique is a variation of Stochastic Neighbor Embedding (Hinton and Roweis, 2002) that is much easier to optimize, and produces significantly better visualizations by reducing the tendency to crowd points together in the center of the map. t-SNE is better than existing techniques at creating a single map that reveals structure at many different scales. This is particularly important for high-dimensional data that lie on several different, but related, low-dimensional manifolds, such as images of objects from multiple classes seen from multiple viewpoints. For visualizing the structure of very large datasets, we show how t-SNE can use random walks on neighborhood graphs to allow the implicit structure of all of the data to influence the way in which a subset of the data is displayed. We illustrate the performance of t-SNE on a wide variety of datasets and compare it with many other non-parametric visualization techniques, including Sammon mapping, Isomap, and Locally Linear Embedding. The visualizations produced by t-SNE are significantly better than those produced by the other techniques on almost all of the datasets.
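A short usage sketch of t-SNE (here via scikit-learn, not the authors' original implementation): high-dimensional points are embedded into a 2-D map; the dataset, perplexity, and initialization are illustrative choices.

```python
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

# Embed high-dimensional points into a 2-D map; nearby points in the
# original space tend to stay nearby, and crowding in the center of the
# map is reduced relative to earlier SNE-style embeddings.
X, y = load_digits(return_X_y=True)
embedding = TSNE(n_components=2, perplexity=30, init="pca",
                 random_state=0).fit_transform(X)
print(embedding.shape)   # (n_samples, 2)
```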

30,124 citations