Journal ArticleDOI

μHEM for identification of differentially expressed miRNAs using hypercuboid equivalence partition matrix

04 Sep 2013 - BMC Bioinformatics (BioMed Central) - Vol. 14, Iss. 1, p. 266
TL;DR: The results on several microarray data sets demonstrate that the proposed method can bring a remarkable improvement to the miRNA selection problem and is a potentially useful tool for exploration of miRNA expression data and identification of differentially expressed miRNAs worth further investigation.
Abstract: The miRNAs, a class of short, approximately 22-nucleotide non-coding RNAs, often act post-transcriptionally to inhibit mRNA expression. In effect, they control gene expression by targeting mRNA. They also support the normal functioning of a cell, as they play an important role in various cellular processes. However, dysregulation of miRNAs is found to be a major cause of disease. It has been demonstrated that miRNA expression is altered in many human cancers, suggesting that miRNAs may play an important role as disease biomarkers. Multiple reports have also noted the utility of miRNAs for the diagnosis of cancer. Among the large number of miRNAs present in a microarray data set, a modest number might be sufficient to classify human cancers. Hence, the identification of differentially expressed miRNAs is an important problem, particularly for data sets with a large number of miRNAs and a small number of samples. In this regard, a new miRNA selection algorithm, called μHEM, is presented based on the rough hypercuboid approach. It selects a set of miRNAs from a microarray data set by maximizing both the relevance and the significance of the selected miRNAs. The degree of dependency of sample categories on miRNAs is defined, based on the concept of hypercuboid equivalence partition matrix, to measure both relevance and significance of miRNAs. The effectiveness of the new approach is demonstrated on six publicly available miRNA expression data sets using a support vector machine. The .632+ bootstrap error estimate is used to minimize the variability and bias of the derived results. An important finding is that the μHEM algorithm achieves the lowest B.632+ error rate of the support vector machine with a reduced set of differentially expressed miRNAs on four expression data sets compared to some existing machine learning and statistical methods, while for the other two data sets, the error rate of the μHEM algorithm is comparable with the existing techniques.
The results on several microarray data sets demonstrate that the proposed method can bring a remarkable improvement to the miRNA selection problem. The method is a potentially useful tool for exploration of miRNA expression data and identification of differentially expressed miRNAs worth further investigation.
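
The selection idea described in the abstract (pick miRNAs that are individually relevant to the sample categories and that add significance to the set already chosen) can be illustrated with a small greedy sketch. This is not the authors' implementation. The dependency measure below is an assumed, simplified stand-in: each class gets a [min, max] interval per feature, and a sample counts as unambiguous when it falls inside exactly one class interval. The function names `partition_matrix`, `dependency`, and `greedy_select`, and the additive score, are hypothetical choices made for illustration only.

```python
import numpy as np

def partition_matrix(x, y, classes):
    """Row i marks the samples whose value lies inside the
    [min, max] interval spanned by class i on this feature."""
    rows = []
    for c in classes:
        lo, hi = x[y == c].min(), x[y == c].max()
        rows.append((x >= lo) & (x <= hi))
    return np.array(rows)

def dependency(H):
    """Assumed measure: fraction of samples covered by exactly
    one class interval (i.e. not in any overlap region)."""
    return float(np.mean(H.sum(axis=0) == 1))

def greedy_select(X, y, d):
    """Greedily pick up to d feature indices by relevance plus
    significance (gain in joint dependency). Illustrative only."""
    classes = np.unique(y)
    n_features = X.shape[1]
    mats = [partition_matrix(X[:, j], y, classes) for j in range(n_features)]
    relevance = [dependency(H) for H in mats]

    first = int(np.argmax(relevance))
    selected = [first]
    joint = mats[first]  # partition matrix of the selected set

    while len(selected) < min(d, n_features):
        best, best_score = None, -np.inf
        for j in range(n_features):
            if j in selected:
                continue
            # Significance: how much feature j raises the joint dependency.
            significance = dependency(joint & mats[j]) - dependency(joint)
            score = relevance[j] + significance
            if score > best_score:
                best, best_score = j, score
        selected.append(best)
        joint = joint & mats[best]
    return selected
```

For example, with a feature matrix `X` of shape (samples, miRNAs) and a label vector `y`, `greedy_select(X, y, 5)` would return five column indices. In practice the selected columns would then be passed to a classifier such as a support vector machine, as the abstract describes.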


Citations
Journal ArticleDOI
TL;DR: The proposed prediction model provides an effective tool for DLB classification and predicted candidate target genes from the miRNAs, including 6 functional genes included in the DHA signaling pathway associated with DLB pathology.
Abstract: Dementia with Lewy bodies (DLB) is the second most common subtype of neurodegenerative dementia in humans following Alzheimer’s disease (AD). Present clinical diagnosis of DLB has high specificity and low sensitivity and finding potential biomarkers of prodromal DLB is still challenging. MicroRNAs (miRNAs) have recently received a lot of attention as a source of novel biomarkers. In this study, using serum miRNA expression of 478 Japanese individuals, we investigated potential miRNA biomarkers and constructed an optimal risk prediction model based on several machine learning methods: penalized regression, random forest, support vector machine, and gradient boosting decision tree. The final risk prediction model, constructed via a gradient boosting decision tree using 180 miRNAs and two clinical features, achieved an accuracy of 0.829 on an independent test set. We further predicted candidate target genes from the miRNAs. Gene set enrichment analysis of the miRNA target genes revealed 6 functional genes included in the DHA signaling pathway associated with DLB pathology. Two of them were further supported by gene-based association studies using a large number of single nucleotide polymorphism markers (BCL2L1: P = 0.012, PIK3R2: P = 0.021). Our proposed prediction model provides an effective tool for DLB classification. Also, a gene-based association test of rare variants revealed that BCL2L1 and PIK3R2 were statistically significantly associated with DLB.

24 citations


Cites methods from "μHEM for identification of differen..."

  • ...This final risk prediction model using μHEM algorithm achieved an accuracy of 0.803 on an independent test set when pre-selecting the top-ranked 330 miRNAs and three clinical features....


  • ...Paul S, Maji P. muHEM for identification of differentially expressed miRNAs using hypercuboid equivalence partition matrix....



  • ...We also constructed a GBDT risk prediction model using another feature selection algorithm, μHEM [23], publicly available at http://www.isical.ac.in/~bibl/results/mihem/mihem.html, and investigated whether this feature selection methodology can further improve the predictive ability of our model....


  • ...Hyperparameter values in the final GBDT model when using μHEM algorithm....


Journal ArticleDOI
TL;DR: The formulation enables the proposed method to extract required number of correlated features sequentially with lesser computational cost as compared to existing methods, and provides an efficient way to find optimum regularization parameters employed in CCA.
Abstract: One of the main problems associated with high dimensional multimodal real life data sets is how to extract relevant and significant features. In this regard, a fast and robust feature extraction algorithm, termed as FaRoC, is proposed, integrating judiciously the merits of canonical correlation analysis (CCA) and rough sets. The proposed method extracts new features sequentially from two multidimensional data sets by maximizing their relevance with respect to class label and significance with respect to already-extracted features. To generate canonical variables sequentially, an analytical formulation is introduced to establish the relation between regularization parameters and CCA. The formulation enables the proposed method to extract required number of correlated features sequentially with lesser computational cost as compared to existing methods. To compute both significance and relevance measures of a feature, the concept of hypercuboid equivalence partition matrix of rough hypercuboid approach is used. It also provides an efficient way to find optimum regularization parameters employed in CCA. The efficacy of the proposed FaRoC algorithm, along with a comparison with other existing methods, is extensively established on several real life data sets.

23 citations


Cites methods from "μHEM for identification of differen..."

  • ...It has been applied successfully for analyzing omics data [34], [45], [46]....


Journal ArticleDOI
TL;DR: Results indicate that the integrated method presented is quite promising and may become a useful tool for identifying disease genes.
Abstract: One of the most important and challenging problems in functional genomics is how to select the disease genes. In this regard, the paper presents a new computational method to identify disease genes. It judiciously integrates the information of gene expression profiles and shortest path analysis of protein-protein interaction networks. While the f-information based maximum relevance-maximum significance framework is used to select differentially expressed genes as disease genes using gene expression profiles, the functional protein association network is used to study the mechanism of diseases. An important finding is that some f-information measures are shown to be effective for selecting relevant and significant genes from microarray data. Extensive experimental study on colorectal cancer establishes the fact that the genes identified by the integrated method have more colorectal cancer genes than the genes identified from the gene expression profiles alone, irrespective of any gene selection algorithm. Also, these genes have greater functional similarity with the reported colorectal cancer genes than the genes identified from the gene expression profiles alone. The enrichment analysis of the obtained genes reveals that they are associated with some of the important KEGG pathways. All these results indicate that the integrated method is quite promising and may become a useful tool for identifying disease genes.

12 citations


Cites methods from "μHEM for identification of differen..."

  • ...The f-MRMS algorithm judiciously integrates the merits of maximum relevance-maximum significance (MRMS) criterion (Maji and Paul 2011; Paul and Maji 2013a, b) and f-information measures....


Journal ArticleDOI
TL;DR: This study presents an application of the RH-SAC algorithm on miRNA and mRNA expression data for identification of potential miRNA-mRNA modules and identified novel miRNA/mRNA interactions in colorectal cancer.
Abstract: Differences in the expression profiles of miRNAs and mRNAs have been reported in colorectal cancer. Nevertheless, information on important miRNA-mRNA regulatory modules in colorectal cancer is still lacking. In this regard, this study presents an application of the RH-SAC algorithm on miRNA and mRNA expression data for identification of potential miRNA-mRNA modules. First, a set of miRNA rules was generated using the RH-SAC algorithm. The mRNA targets of the selected miRNAs were identified using the miRTarBase database. Next, the expression values of target mRNAs were used to generate mRNA rules using the RH-SAC. Then all miRNA-mRNA rules have been integrated for generating networks. The RH-SAC algorithm unlike other existing methods selects a group of co-expressed miRNAs and mRNAs that are also differentially expressed. In total 17 miRNAs and 141 mRNAs were selected. The enrichment analysis of selected mRNAs revealed that our method selected mRNAs that are significantly associated with colorectal cancer. We identified novel miRNA/mRNA interactions in colorectal cancer. Through experiment, we could confirm that one of our discovered miRNAs, hsa-miR-93-5p, was significantly up-regulated in 75.8% CRC in comparison to their corresponding non-tumor samples. It could have the potential to examine colorectal cancer subtype specific unique miRNA/mRNA interactions.

9 citations

Journal ArticleDOI
TL;DR: A novel supervised regularized canonical correlation analysis, termed as CuRSaR, to extract relevant and significant features from multimodal high dimensional omics datasets by maximizing the relevance of extracted features with respect to sample categories and significance among them.
Abstract: Objective: This paper presents a novel supervised regularized canonical correlation analysis, termed as CuRSaR, to extract relevant and significant features from multimodal high dimensional omics datasets. Methods: The proposed method extracts a new set of features from two multidimensional datasets by maximizing the relevance of extracted features with respect to sample categories and significance among them. It integrates judiciously the merits of regularized canonical correlation analysis (RCCA) and rough hypercuboid approach. An analytical formulation, based on spectral decomposition, is introduced to establish the relation between canonical correlation analysis (CCA) and RCCA. The concept of hypercuboid equivalence partition matrix of rough hypercuboid is used to compute both relevance and significance of a feature. Significance: The analytical formulation makes the computational complexity of the proposed algorithm significantly lower than existing methods. The equivalence partition matrix offers an efficient way to find optimum regularization parameters employed in CCA. Results: The superiority of the proposed algorithm over other existing methods, in terms of computational complexity and classification accuracy, is established extensively on real life data.

9 citations


Cites methods from "μHEM for identification of differen..."

  • ...It has been applied successfully to feature selection and clustering [27] as well as to omics data analysis [26]–[30]....


References
Journal ArticleDOI
TL;DR: This study represents the first integrated analysis of miRNA expression, mRNA expression and genomic changes in human breast cancer and may serve as a basis for functional studies of the role of miRNAs in the etiology of breast cancer.
Abstract: MicroRNAs (miRNAs), a class of short non-coding RNAs found in many plants and animals, often act post-transcriptionally to inhibit gene expression. Here we report the analysis of miRNA expression in 93 primary human breast tumors, using a bead-based flow cytometric miRNA expression profiling method. Of 309 human miRNAs assayed, we identify 133 miRNAs expressed in human breast and breast tumors. We used mRNA expression profiling to classify the breast tumors as luminal A, luminal B, basal-like, HER2+ and normal-like. A number of miRNAs are differentially expressed between these molecular tumor subtypes and individual miRNAs are associated with clinicopathological factors. Furthermore, we find that miRNAs could classify basal versus luminal tumor subtypes in an independent data set. In some cases, changes in miRNA expression correlate with genomic loss or gain; in others, changes in miRNA expression are likely due to changes in primary transcription and or miRNA biogenesis. Finally, the expression of DICER1 and AGO2 is correlated with tumor subtype and may explain some of the changes in miRNA expression observed. This study represents the first integrated analysis of miRNA expression, mRNA expression and genomic changes in human breast cancer and may serve as a basis for functional studies of the role of miRNAs in the etiology of breast cancer. Furthermore, we demonstrate that bead-based flow cytometric miRNA expression profiling might be a suitable platform to classify breast cancer into prognostic molecular subtypes.

961 citations


"μHEM for identification of differen..." refers background in this paper

  • ...Recently, few studies are carried out to identify differentially expressed miRNAs [4-9]....


  • ...Different statistical tests are also employed to identify differentially expressed miRNAs [1,4-8,17-20]....


Journal ArticleDOI
TL;DR: Hierarchical clustering of the tumor samples by their miRNA expression accurately separated the carcinomas from the BPH samples and also further classified the carcinoma tumors according to their androgen dependence, indicating the potential of miRNAs as a novel diagnostic and prognostic tool for prostate cancer.
Abstract: MicroRNAs (miRNA) are small, endogenously expressed noncoding RNAs that negatively regulate expression of protein-coding genes at the translational level. Accumulating evidence, such as aberrant expression of miRNAs, suggests that they are involved in the development of cancer. They have been identified in various tumor types, showing that different sets of miRNAs are usually deregulated in different cancers. To identify the miRNA signature specific for prostate cancer, miRNA expression profiling of 6 prostate cancer cell lines, 9 prostate cancer xenografts samples, 4 benign prostatic hyperplasia (BPH), and 9 prostate carcinoma samples was carried out by using an oligonucleotide array hybridization method. Differential expression of 51 individual miRNAs between benign tumors and carcinoma tumors was detected, 37 of them showing down-regulation and 14 up-regulation in carcinoma samples, thus identifying those miRNAs that could be significant in prostate cancer development and/or growth. There was a significant trend (P=0.029) between the expression of miRNAs and miRNA locus copy number determined by array comparative genomic hybridization, indicating that genetic aberrations may target miRNAs. Hierarchical clustering of the tumor samples by their miRNA expression accurately separated the carcinomas from the BPH samples and also further classified the carcinoma tumors according to their androgen dependence (hormone naive versus hormone refractory), indicating the potential of miRNAs as a novel diagnostic and prognostic tool for prostate cancer.

959 citations


"μHEM for identification of differen..." refers background in this paper

  • ...In [57,58], it is shown that hsa-miR-143 expression is clearly down-regulated during prostate cancer progression....


Journal ArticleDOI
TL;DR: In this paper, a computationally simple variant of boosting, L2Boost, which is constructed from a functional gradient descent algorithm with the L2-loss function, is investigated in both regression and classification.
Abstract: This article investigates a computationally simple variant of boosting, L2Boost, which is constructed from a functional gradient descent algorithm with the L2-loss function. Like other boosting algorithms, L2Boost uses many times in an iterative fashion a prechosen fitting method, called the learner. Based on the explicit expression of refitting of residuals of L2Boost, the case with (symmetric) linear learners is studied in detail in both regression and classification. In particular, with the boosting iteration m working as the smoothing or regularization parameter, a new exponential bias-variance trade-off is found with the variance (complexity) term increasing very slowly as m tends to infinity. When the learner is a smoothing spline, an optimal rate of convergence result holds for both regression and classification and the boosted smoothing spline even adapts to higher-order, unknown smoothness. Moreover, a simple expansion of a (smoothed) 0–1 loss function is derived to reveal the importance of the d...

759 citations

Journal ArticleDOI
13 Mar 2008-Oncogene
TL;DR: It is found that changes in miRNA expression may have an important role in the biology of human prostate cancer, and widespread, but not universal, downregulation of miRNAs is found in clinically localized prostate cancer relative to benign peripheral zone tissue.
Abstract: MicroRNAs (miRNAs) are small regulatory RNAs that can regulate gene expression by binding to mRNA sequences and repressing target-gene expression post-transcriptionally, either by inhibiting translation or promoting RNA degradation. We have analysed expression of 328 known and 152 novel human miRNAs in 10 benign peripheral zone tissues and 16 prostate cancer tissues using microarrays and found widespread, but not universal, downregulation of miRNAs in clinically localized prostate cancer relative to benign peripheral zone tissue. These findings have been verified by real-time RT-PCR assays on select miRNAs, including miR-125b, miR-145 and let-7c. The downregulated miRNAs include several with proven target mRNAs whose proteins have been previously shown to be increased in prostate cancer by immunohistochemistry, including RAS, E2F3, BCL-2 and MCL-1. Using a bioinformatics approach, we have identified additional potential mRNA targets of one of the miRNAs, (miR-125b) that are upregulated in prostate cancer and confirmed increased expression of one of these targets, EIF4EBP1, in prostate cancer tissues. Our findings indicate that changes in miRNA expression may have an important role in the biology of human prostate cancer.

678 citations


"μHEM for identification of differen..." refers background in this paper

  • ...The down regulation of hsa-miR-145 is also mentioned in [52,53]....


Journal ArticleDOI
TL;DR: The gene shaving method is a potentially useful tool for exploration of gene expression data and identification of interesting clusters of genes worth further investigation.
Abstract: Large gene expression studies, such as those conducted using DNA arrays, often provide millions of different pieces of data. To address the problem of analyzing such data, we describe a statistical method, which we have called 'gene shaving'. The method identifies subsets of genes with coherent expression patterns and large variation across conditions. Gene shaving differs from hierarchical clustering and other widely used methods for analyzing gene expression studies in that genes may belong to more than one cluster, and the clustering may be supervised by an outcome measure. The technique can be 'unsupervised', that is, the genes and samples are treated as unlabeled, or partially or fully supervised by using known properties of the genes or samples to assist in finding meaningful groupings. We illustrate the use of the gene shaving method to analyze gene expression measurements made on samples from patients with diffuse large B-cell lymphoma. The method identifies a small cluster of genes whose expression is highly predictive of survival. The gene shaving method is a potentially useful tool for exploration of gene expression data and identification of interesting clusters of genes worth further investigation.

618 citations


"μHEM for identification of differen..." refers background in this paper

  • ...In this regard, the Gap function [49] is generally used to know whether the obtained B1 error is smaller than that would be expected by chance, if the distribution of the class-membership label of the sample did not depend on its feature vector....
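
The B1 error in this excerpt is the leave-one-out bootstrap error, which the .632+ estimate named in the abstract combines with the training error. A minimal sketch of that combination is given below, following one common form of the Efron and Tibshirani formulation. It is an illustration, not the code used in the paper. The function names `no_information_rate` and `b632_plus` are hypothetical, and the no-information rate is used here as the "expected by chance" reference, which is an assumption about how it relates to the Gap function in the excerpt.

```python
import numpy as np

def no_information_rate(y_true, y_pred):
    """Error expected if labels were independent of predictions:
    sum over classes of p_c * (1 - q_c), where p_c is the observed
    class proportion and q_c the predicted class proportion."""
    classes = np.unique(np.concatenate([y_true, y_pred]))
    p = np.array([np.mean(y_true == c) for c in classes])
    q = np.array([np.mean(y_pred == c) for c in classes])
    return float(np.sum(p * (1.0 - q)))

def b632_plus(err_train, err_b1, gamma):
    """Combine training error and B1 error into a .632+ style estimate.
    err_train: resubstitution (training) error
    err_b1:    leave-one-out bootstrap (B1) error
    gamma:     no-information error rate
    """
    # B1 is not allowed to exceed the no-information rate.
    err_b1 = min(err_b1, gamma)
    # Relative overfitting rate, kept in [0, 1].
    if err_b1 > err_train and gamma > err_train:
        R = (err_b1 - err_train) / (gamma - err_train)
    else:
        R = 0.0
    # Weight moves from 0.632 (no overfitting) to 1 (severe overfitting).
    w = 0.632 / (1.0 - 0.368 * R)
    return (1.0 - w) * err_train + w * err_b1
```

When the classifier does not overfit (R = 0), the result reduces to the plain .632 weighting, 0.368 times the training error plus 0.632 times the B1 error. When the B1 error reaches the no-information rate (R = 1), the estimate equals the B1 error itself.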
