Journal Article

Validation of miRNAs as Breast Cancer Biomarkers with a Machine Learning Approach.

26 Mar 2019-Cancers (Multidisciplinary Digital Publishing Institute)-Vol. 11, Iss: 3, pp 431
TL;DR: The validated importance of certain small noncoding microRNAs using a machine learning approach on miRNA expression data suggests that machine learning is a useful tool for functional studies of miRNAs for cancer detection and diagnosis.
Abstract: Certain small noncoding microRNAs (miRNAs) are differentially expressed in normal tissues and cancers, which makes them strong candidates as cancer biomarkers. A selected subset of miRNAs has previously been experimentally verified to be linked to breast cancer. In this paper, we validated the importance of these miRNAs using a machine learning approach on miRNA expression data. We performed feature selection on the set of these relevant miRNAs, using Information Gain (IG), Chi-Squared (CHI2) and the Least Absolute Shrinkage and Selection Operator (LASSO), to rank them by importance. We then performed cancer classification with these miRNAs as features, using Random Forest (RF) and Support Vector Machine (SVM) classifiers. Our results demonstrated that classifiers performed better with the miRNAs ranked higher by our analysis, and that performance dropped as the rank of the miRNA decreased, confirming that these miRNAs have different degrees of importance as biomarkers. Furthermore, we discovered that using a minimum of three miRNAs as biomarkers for breast cancer can be as effective as using the entire set of 1800 miRNAs. This work suggests that machine learning is a useful tool for functional studies of miRNAs for cancer detection and diagnosis.
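The ranking step described in the abstract can be illustrated with a minimal sketch of two of the three criteria, Information Gain and Chi-Squared, in plain Python. The abstract does not say how expression values were discretised or which software was used, so the binary "high/low expression" encoding, the toy data and the names `miR-A`, `miR-B` and `miR-C` are all hypothetical. LASSO and the RF/SVM classifiers are left out.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature, labels):
    """IG = H(labels) - H(labels | feature) for a discrete feature."""
    n = len(labels)
    conditional = 0.0
    for value in set(feature):
        subset = [y for x, y in zip(feature, labels) if x == value]
        conditional += len(subset) / n * entropy(subset)
    return entropy(labels) - conditional

def chi_squared(feature, labels):
    """Pearson chi-squared statistic of the feature/label contingency table."""
    n = len(labels)
    observed = Counter(zip(feature, labels))
    feature_totals = Counter(feature)
    label_totals = Counter(labels)
    stat = 0.0
    for f, f_n in feature_totals.items():
        for y, y_n in label_totals.items():
            expected = f_n * y_n / n
            stat += (observed[(f, y)] - expected) ** 2 / expected
    return stat

# Hypothetical toy data: 1 = high expression after thresholding; label 1 = tumour.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
features = {
    "miR-A": [1, 1, 1, 1, 0, 0, 0, 0],  # separates the classes perfectly
    "miR-B": [1, 1, 1, 0, 1, 0, 0, 0],  # partially informative
    "miR-C": [1, 0, 1, 0, 1, 0, 1, 0],  # uninformative
}

# Rank the miRNAs from most to least informative under each criterion.
ig_ranking = sorted(features, key=lambda m: information_gain(features[m], labels), reverse=True)
chi2_ranking = sorted(features, key=lambda m: chi_squared(features[m], labels), reverse=True)
```

On this toy table both criteria give the same order, but on real expression data the rankings can differ, which may be one reason to compare several selection methods.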
Citations
Journal Article
15 May 2020-Sensors
TL;DR: The present work proposes a cervical cancer prediction model (CCPM) that offers early prediction of cervical cancer using risk factors as inputs and employs random forest (RF) as a classifier.
Abstract: Globally, cervical cancer remains the most prevalent cancer in females. Hence, it is necessary to distinguish the importance of the risk factors of cervical cancer in order to classify potential patients. The present work proposes a cervical cancer prediction model (CCPM) that offers early prediction of cervical cancer using risk factors as inputs. The CCPM first removes outliers using outlier detection methods such as density-based spatial clustering of applications with noise (DBSCAN) and isolation forest (iForest), and then increases the number of cases in the dataset in a balanced way, for example, through the synthetic minority over-sampling technique (SMOTE) and SMOTE with Tomek link (SMOTETomek). Finally, it employs random forest (RF) as a classifier. The CCPM thus comprises four scenarios: (1) DBSCAN + SMOTETomek + RF, (2) DBSCAN + SMOTE + RF, (3) iForest + SMOTETomek + RF, and (4) iForest + SMOTE + RF. A dataset of 858 potential patients was used to validate the performance of the proposed method. We found that combinations of iForest with SMOTE and iForest with SMOTETomek provided better performance than those of DBSCAN with SMOTE and DBSCAN with SMOTETomek. We also observed that RF performed the best among several popular machine learning classifiers. Furthermore, the proposed CCPM showed better accuracy than previously proposed methods for forecasting cervical cancer. In addition, a mobile application that collects cervical cancer risk factor data and provides results from the CCPM was developed for instant and proper action at the initial stage of cervical cancer.
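The over-sampling step named in the abstract can be sketched as SMOTE-style interpolation: each synthetic case lies on the line segment between a minority-class sample and one of its nearest minority-class neighbours. This is a simplified illustration in plain Python and not the authors' implementation. The function name `smote_like`, its parameters and the toy points are hypothetical, and outlier removal, Tomek-link cleaning and the RF classifier are omitted.

```python
import random

def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` by squared Euclidean distance.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, other)))
    return synthetic

# Hypothetical two-feature minority class.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_like(minority, n_new=4)
```

Because every synthetic point is a convex combination of two real minority samples, it stays inside the region already occupied by that class.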

155 citations


Cites methods from "Validation of miRNAs as Breast Canc..."

  • ...Chi-squared feature selection is used to infer a feature’s reliance on the class label [37]....

    [...]

  • ...Several studies used chi-square as a feature extraction technique, such as in breast cancer [37], Parkinson’s disease using voice signal [39], cancer classification [38], computer-aided diagnosis of Parkinson’s disease [40], and healthcare tweet classification [41]....

    [...]

Journal Article
TL;DR: Pd-Au synthetic alloys are reported for mass-spectrometry-based metabolic fingerprinting and analysis, toward medulloblastoma diagnosis and radiotherapy evaluation, and will lead to the application-driven development of novel materials with tailored structural design and the establishment of new protocols for precision medicine in the near future.
Abstract: Diagnostics is the key in screening and treatment of cancer. As an emerging tool in precision medicine, metabolic analysis detects end products of pathways, and thus is more distal than proteomic/genetic analysis. However, metabolic analysis is far from ideal in clinical diagnosis due to the sample complexity and metabolite abundance in patient specimens. A further challenge is real-time and accurate tracking of treatment effect, e.g., radiotherapy. Here, Pd-Au synthetic alloys are reported for mass-spectrometry-based metabolic fingerprinting and analysis, toward medulloblastoma diagnosis and radiotherapy evaluation. A core-shell structure is designed using magnetic core particles to support Pd-Au alloys on the surface. Optimized synthetic alloys enhance the laser desorption/ionization efficacy and achieve direct detection of 100 nL of biofluids in seconds. Medulloblastoma patients are differentiated from healthy controls with an average diagnostic sensitivity of 94.0%, specificity of 85.7%, and accuracy of 89.9%, by machine learning of metabolic fingerprinting. Furthermore, the radiotherapy process of patients is monitored and a preliminary panel of serum metabolite biomarkers is identified with gradual changes. This work will lead to the application-driven development of novel materials with tailored structural design and the establishment of new protocols for precision medicine in the near future.
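The three diagnostic figures quoted in the abstract follow the standard confusion-matrix definitions, sketched below. The counts in the example are invented for illustration and are not taken from the study, which reports averaged values.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts.

    tp/fn: patients correctly/incorrectly classified,
    tn/fp: healthy controls correctly/incorrectly classified.
    """
    sensitivity = tp / (tp + fn)              # fraction of patients detected
    specificity = tn / (tn + fp)              # fraction of controls cleared
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy

# Hypothetical counts: 50 patients and 50 controls.
sens, spec, acc = diagnostic_metrics(tp=45, fn=5, tn=40, fp=10)
```

With equal group sizes, accuracy is the mean of sensitivity and specificity; with unequal groups it is weighted toward the larger one.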

91 citations

Journal Article
12 Dec 2019-Cancers
TL;DR: The Random Forest algorithm outperformed all other algorithms, with an accuracy of approximately 84% and an area under the curve (AUC) of 0.82 ± 0.10 for predicting five-year DFS.
Abstract: This study uses machine learning (ML) to predict the tumor stage of colon cancer under TNM (tumor, node, and metastasis) staging from the most influential histopathology parameters, and to predict the five-year disease-free survival (DFS) period. From the colorectal cancer (CRC) registry of Chang Gung Memorial Hospital, Linkou, Taiwan, 4021 patients were selected for the analysis. Various ML algorithms were applied to predict the tumor stage of colon cancer, with the Tumor Aggression Score (TAS) considered as a prognostic factor. The performance of the different ML algorithms was evaluated using five-fold cross-validation, which is an effective method of model validation. The accuracy achieved by the algorithms was determined both for standard TNM staging and for TNM staging with the Tumor Aggression Score. The Random Forest model achieved an F-measure of 0.89 when the Tumor Aggression Score was considered as an attribute along with the standard attributes normally used for TNM stage prediction. We also found that the Random Forest algorithm outperformed all other algorithms, with an accuracy of approximately 84% and an area under the curve (AUC) of 0.82 ± 0.10 for predicting five-year DFS.
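As a rough illustration of the evaluation protocol mentioned in the abstract, the sketch below splits sample indices into five folds and computes a binary F-measure. It is a generic outline, not the study's pipeline: the helper names are hypothetical, no shuffling or stratification is applied, and the abstract does not state how the F-measure was averaged across classes.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_idx, test_idx) pairs; each sample is tested exactly once."""
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size

def f_measure(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

folds = list(k_fold_indices(12, k=5))
score = f_measure([1, 1, 0, 0], [1, 0, 1, 0])
```

In practice a model would be trained on each `train_idx` and scored on the matching `test_idx`, with the five scores averaged.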

40 citations

Journal Article
TL;DR: It is demonstrated that the Fisher score followed by the application of the MCC algorithm can accurately identify hub genes in HCC and has similar performance to the WGCNA and random forest algorithms.
Abstract: This study aimed to select the feature genes of hepatocellular carcinoma (HCC) with the Fisher score algorithm and to identify hub genes with the Maximal Clique Centrality (MCC) algorithm. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis was performed to examine the enrichment of terms. Gene set enrichment analysis (GSEA) was used to identify the classes of genes that are overrepresented. Following the construction of a protein-protein interaction network with the feature genes, hub genes were identified with the MCC algorithm. The Kaplan–Meier plotter was utilized to assess the prognosis of patients based on expression of the hub genes. The feature genes were closely associated with cancer and the cell cycle, as revealed by GO, KEGG and GSEA enrichment analyses. Survival analysis showed that the overexpression of the Fisher score–selected hub genes was associated with decreased survival time (P < 0.05). Weighted gene co-expression network analysis (WGCNA), Lasso, ReliefF and random forest were used for comparison with the Fisher score algorithm. The comparison among these approaches showed that the Fisher score algorithm is superior to the Lasso and ReliefF algorithms in terms of hub gene identification and has similar performance to the WGCNA and random forest algorithms. Our results demonstrated that the Fisher score followed by the application of the MCC algorithm can accurately identify hub genes in HCC.
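The abstract does not give the formula used, so the sketch below assumes the commonly used definition of the Fisher score: for one feature, the class-size-weighted squared distance of each class mean from the overall mean, divided by the class-size-weighted within-class variance. The gene names and expression values are hypothetical.

```python
from statistics import fmean, pvariance

def fisher_score(values, labels):
    """Fisher score of one feature: between-class scatter / within-class scatter."""
    overall_mean = fmean(values)
    between = 0.0
    within = 0.0
    for cls in set(labels):
        group = [v for v, y in zip(values, labels) if y == cls]
        between += len(group) * (fmean(group) - overall_mean) ** 2
        within += len(group) * pvariance(group)
    # A feature with zero within-class variance separates the classes exactly.
    return between / within if within > 0 else float("inf")

# Hypothetical expression values for two genes across six samples.
labels = [1, 1, 1, 0, 0, 0]
gene_x = [5.0, 6.0, 7.0, 1.0, 2.0, 3.0]  # class means 6 and 2: well separated
gene_y = [1.0, 5.0, 3.0, 2.0, 4.0, 3.0]  # class means both 3: no separation

score_x = fisher_score(gene_x, labels)
score_y = fisher_score(gene_y, labels)
```

Genes would then be sorted by score and the top-ranked ones kept as feature genes; the cutoff used in the study is not stated in the abstract.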

34 citations

Journal Article
26 Apr 2021-Analyst
TL;DR: In this article, an electrochemical sensor is fabricated using a nanocomposite, consisting of graphene (GP), polypyrrole (PPY), and gold nanoparticles (AuNPs), modified onto a screen-printed carbon electrode (SPCE) to improve electron transfer properties and increase the degree of methylene blue (MB) intercalation for signal amplification.
Abstract: Numerous clinical studies suggest that microRNAs (miRNAs) are indicative biomolecules for the early diagnosis of cancer. This work aims to develop a cost-effective and label-free electrochemical biosensor to detect miRNA-21, a biomarker of breast cancer. An electrochemical sensor is fabricated using a nanocomposite, consisting of graphene (GP), polypyrrole (PPY) and gold nanoparticles (AuNPs), modified onto a screen-printed carbon electrode (SPCE) to improve electron transfer properties and increase the degree of methylene blue (MB) intercalation for signal amplification. The GP/PPY-modified electrode offers good electrochemical reactivity and high dispersibility of AuNPs, resulting in excellent sensor performance. Peak current of the MB redox process, which is proportional to miRNA-21 concentration on the electrode surface, is monitored by differential pulse voltammetry (DPV). Under optimal conditions, this sensor is operated by monitoring the MB signal response due to the amount of hybridization products between miRNA-21 target molecules and DNA-21 probes immobilized on the electrode. The proposed biosensor reveals a linear range from 1.0 fM to 1.0 nM with a low detection limit of 0.020 fM. In addition, the miRNA-21 biosensor provides good selectivity, high stability, and satisfactory reproducibility, which shows promising potential in clinical research and diagnostic applications.

32 citations

References
Journal Article
09 Jun 2005-Nature
TL;DR: A new, bead-based flow cytometric miRNA expression profiling method is used to present a systematic expression analysis of 217 mammalian miRNAs from 334 samples, including multiple human cancers, and finds the miRNA profiles are surprisingly informative, reflecting the developmental lineage and differentiation state of the tumours.
Abstract: Recent work has revealed the existence of a class of small non-coding RNA species, known as microRNAs (miRNAs), which have critical functions across various biological processes. Here we use a new, bead-based flow cytometric miRNA expression profiling method to present a systematic expression analysis of 217 mammalian miRNAs from 334 samples, including multiple human cancers. The miRNA profiles are surprisingly informative, reflecting the developmental lineage and differentiation state of the tumours. We observe a general downregulation of miRNAs in tumours compared with normal tissues. Furthermore, we were able to successfully classify poorly differentiated tumours using miRNA expression profiles, whereas messenger RNA profiles were highly inaccurate when applied to the same samples. These findings highlight the potential of miRNA profiling in cancer diagnosis.

9,470 citations


"Validation of miRNAs as Breast Canc..." refers methods in this paper

  • ...[5] used hierarchical clustering on 73 bone marrow samples and determined that miRNA expression distinguishes tumors of different subtypes within acute lymphoblastic leukemia....

    [...]

Journal Article
TL;DR: Two founding members of the microRNA family were originally identified in Caenorhabditis elegans as genes that were required for the timed regulation of developmental events and indicate the existence of multiple RISCs that carry out related but specific biological functions.
Abstract: MicroRNAs are a family of small, non-coding RNAs that regulate gene expression in a sequence-specific manner. The two founding members of the microRNA family were originally identified in Caenorhabditis elegans as genes that were required for the timed regulation of developmental events. Since then, hundreds of microRNAs have been identified in almost all metazoan genomes, including worms, flies, plants and mammals. MicroRNAs have diverse expression patterns and might regulate various developmental and physiological processes. Their discovery adds a new dimension to our understanding of complex gene regulatory networks.

6,282 citations

Journal Article
TL;DR: The results indicate that miRNAs are extensively involved in cancer pathogenesis of solid tumors and support their function as either dominant or recessive cancer genes.
Abstract: Small noncoding microRNAs (miRNAs) can contribute to cancer development and progression and are differentially expressed in normal tissues and cancers. From a large-scale miRnome analysis on 540 samples including lung, breast, stomach, prostate, colon, and pancreatic tumors, we identified a solid cancer miRNA signature composed by a large portion of overexpressed miRNAs. Among these miRNAs are some with well characterized cancer association, such as miR-17-5p, miR-20a, miR-21, miR-92, miR-106a, and miR-155. The predicted targets for the differentially expressed miRNAs are significantly enriched for protein-coding tumor suppressors and oncogenes (P < 0.0001). A number of the predicted targets, including the tumor suppressors RB1 (Retinoblastoma 1) and TGFBR2 (transforming growth factor, beta receptor II) genes were confirmed experimentally. Our results indicate that miRNAs are extensively involved in cancer pathogenesis of solid tumors and support their function as either dominant or recessive cancer genes.

5,791 citations

Journal Article
TL;DR: Detailed deletion and expression analysis shows that miR15 and miR16 are located within a 30-kb region of loss in CLL, and that both genes are deleted or down-regulated in the majority (≈68%) of CLL cases.
Abstract: Micro-RNAs (miR genes) are a large family of highly conserved noncoding genes thought to be involved in temporal and tissue-specific gene regulation. MiRs are transcribed as short hairpin precursors (≈70 nt) and are processed into active 21- to 22-nt RNAs by Dicer, a ribonuclease that recognizes target mRNAs via base-pairing interactions. Here we show that miR15 and miR16 are located at chromosome 13q14, a region deleted in more than half of B cell chronic lymphocytic leukemias (B-CLL). Detailed deletion and expression analysis shows that miR15 and miR16 are located within a 30-kb region of loss in CLL, and that both genes are deleted or down-regulated in the majority (≈68%) of CLL cases.

5,113 citations


"Validation of miRNAs as Breast Canc..." refers background in this paper

  • ...[2] were among the first who established the relationship between miRNAs and cancers after discovering that mir15 and mir16 are deleted or down-regulated in a majority of chronic lymphocytic leukemia cases....

    [...]

Journal Article
TL;DR: A basic taxonomy of feature selection techniques is provided, and their use, variety and potential are discussed for a number of both common and upcoming bioinformatics applications.
Abstract: Feature selection techniques have become an apparent need in many bioinformatics applications. In addition to the large pool of techniques that have already been developed in the machine learning and data mining fields, specific applications in bioinformatics have led to a wealth of newly proposed techniques. In this article, we make the interested reader aware of the possibilities of feature selection, providing a basic taxonomy of feature selection techniques, and discussing their use, variety and potential in a number of both common as well as upcoming bioinformatics applications.
Contact: yvan.saeys@psb.ugent.be
Supplementary information: http://bioinformatics.psb.ugent.be/supplementary_data/yvsae/fsreview

4,706 citations