Rough sets and support vector machine for selecting differentially expressed miRNAs

doi:10.1109/BIBMW.2012.6470255

Proceedings Article•DOI•

Rough sets and support vector machine for selecting differentially expressed miRNAs

Sushmita Paul¹, Pradipta Maji¹•Institutions (1)

Indian Statistical Institute¹

04 Oct 2012-pp 864-871

TL;DR: A rough set based feature selection algorithm to select miRNAs from expression data that can classify tissue samples into their respective category with minimal error rate is presented.

Abstract: The microRNAs, also known as miRNAs, are the class of small non-coding RNAs that repress the expression of a gene post-transcriptionally. In effect, they regulate expression of a gene or protein. It has been observed that they play an important role in various cellular processes and thus help in carrying out normal functioning of a cell. However, dysregulation of miRNAs is found to be a major cause of a disease. Various studies have also shown the role of miRNAs in cancer and the utility of miRNAs for the diagnosis of cancer and other diseases. A large number of works have been conducted to identify differentially expressed miRNAs because, unlike with mRNA expression, a modest number of miRNAs might be sufficient to classify human cancers. In this regard, this paper presents a rough set based feature selection algorithm to select miRNAs from expression data that can classify tissue samples into their respective category with minimal error rate. It selects a set of miRNAs by maximizing both the relevance and significance of miRNAs. The effectiveness of the rough set based algorithm, along with a comparison with other related algorithms, is demonstrated on three miRNA microarray expression data sets using the B.632+ bootstrap error rate of support vector machine.
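The B.632+ error rate named in the abstract combines the apparent (training) error with the out-of-bag bootstrap error. The sketch below follows the commonly cited .632+ rule. The function and argument names are illustrative and not taken from the paper, the clipping of the bootstrap error at the no-information rate is an assumed safeguard, and the three input error values are assumed to have been computed elsewhere (for example from repeated SVM fits on bootstrap samples).

```python
def b632_plus(apparent_error, bootstrap_error, no_info_rate):
    """Combine training and out-of-bag errors with the .632+ weighting.

    apparent_error  : error on the data used for training (AE)
    bootstrap_error : average error on samples left out of each bootstrap (B1)
    no_info_rate    : error expected if features and labels were independent
    All names are hypothetical; inputs are error rates in [0, 1].
    """
    # Assumed safeguard: never let the out-of-bag error exceed the
    # no-information rate.
    b1 = min(bootstrap_error, no_info_rate)

    # Relative overfitting rate R in [0, 1]; set to 0 when there is no
    # measurable overfitting.
    if b1 > apparent_error and no_info_rate > apparent_error:
        r = (b1 - apparent_error) / (no_info_rate - apparent_error)
    else:
        r = 0.0

    # The weight grows from 0.632 (no overfitting) to 1 (maximal overfitting).
    w = 0.632 / (1.0 - 0.368 * r)
    return (1.0 - w) * apparent_error + w * b1
```

With no overfitting the result reduces to the plain .632 mix of the two errors; as the overfitting rate approaches 1, the estimate moves toward the out-of-bag error alone.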

Citations

Journal Article•DOI•

A Study on the Relevance of Feature Selection Methods in Microarray Data

Barnali Sahu, Satchidananda Dehuri, Alok Kumar Jagadev

31 Jul 2018-The Open Bioinformatics Journal

TL;DR: A generic categorizing framework is proposed that systematically groups algorithms into categories based on search strategies and evaluation criteria and provides guidelines for selecting feature selection algorithms in general and in specific to the context of this study.

Abstract: This paper studies the relevance of feature selection algorithms in microarray data for effective analysis. With no loss of generality, we present a list of feature selection algorithms and propose a generic categorizing framework that systematically groups algorithms into categories. The generic categorizing framework is based on search strategies and evaluation criteria. Further, it provides guidelines for selecting feature selection algorithms in general and in specific to the context of this study. In the context of microarray data analysis, the feature selection algorithms are classified into soft and non-soft computing categories. Their performance analysis with respect to microarray data analysis has been presented.

23 citations

Cites methods from "Rough sets and support vector machi..."

...Redundant Gene selection using PSO(RGS-PSO) [123] – ▪ Redundant gene selection using PSO (RGS-PSO) is a novel approach....
[...]
...Rough set and SVM based [123] ▪ Rough set and MRMS for gene selection ▪ SVM for classification The MRMS selects a set of miRNAs having a lowest B....
[...]

Journal Article•DOI•

μHEM for identification of differentially expressed miRNAs using hypercuboid equivalence partition matrix

Sushmita Paul¹, Pradipta Maji¹•Institutions (1)

Indian Statistical Institute¹

04 Sep 2013-BMC Bioinformatics

TL;DR: The results on several microarray data sets demonstrate that the proposed method can bring a remarkable improvement on miRNA selection problem and is a potentially useful tool for exploration of miRNA expression data and identification of differentially expressed miRNAs worth further investigation.

Abstract: The miRNAs, a class of short approximately 22‐nucleotide non‐coding RNAs, often act post‐transcriptionally to inhibit mRNA expression. In effect, they control gene expression by targeting mRNA. They also help in carrying out normal functioning of a cell as they play an important role in various cellular processes. However, dysregulation of miRNAs is found to be a major cause of a disease. It has been demonstrated that miRNA expression is altered in many human cancers, suggesting that they may play an important role as disease biomarkers. Multiple reports have also noted the utility of miRNAs for the diagnosis of cancer. Among the large number of miRNAs present in microarray data, a modest number might be sufficient to classify human cancers. Hence, the identification of differentially expressed miRNAs is an important problem, particularly for data sets with a large number of miRNAs and a small number of samples. In this regard, a new miRNA selection algorithm, called μHEM, is presented based on the rough hypercuboid approach. It selects a set of miRNAs from microarray data by maximizing both relevance and significance of the selected miRNAs. The degree of dependency of sample categories on miRNAs is defined, based on the concept of hypercuboid equivalence partition matrix, to measure both relevance and significance of miRNAs. The effectiveness of the new approach is demonstrated on six publicly available miRNA expression data sets using support vector machine. The .632+ bootstrap error estimate is used to minimize the variability and bias of the derived results. An important finding is that the μHEM algorithm achieves the lowest B.632+ error rate of support vector machine with a reduced set of differentially expressed miRNAs on four expression data sets compared to some existing machine learning and statistical methods, while for the other two data sets, the error rate of the μHEM algorithm is comparable with the existing techniques. The results on several microarray data sets demonstrate that the proposed method can bring a remarkable improvement on the miRNA selection problem. The method is a potentially useful tool for exploration of miRNA expression data and identification of differentially expressed miRNAs worth further investigation.

14 citations

Cites methods from "Rough sets and support vector machi..."

...The theory of rough sets has also been successfully applied to microarray data analysis in [9,24-35]....
[...]

Book Chapter•DOI•

Rough Sets for Insilico Identification of Differentially Expressed miRNAs

Pradipta Maji¹, Sushmita Paul¹•Institutions (1)

Indian Statistical Institute¹

01 Jan 2014

TL;DR: This chapter presents a new approach for selecting miRNAs from microarray expression data that integrates the merit of the rough set-based feature selection algorithm reported in Chap. 4 and the theory of B.632+ bootstrap error rate.

Abstract: The microRNAs or miRNAs regulate expression of a gene or protein. It has been observed that they play an important role in various cellular processes and thus help in carrying out normal functioning of a cell. However, dysregulation of miRNAs is found to be a major cause of a disease. Various studies have also shown the role of miRNAs in cancer and utility of miRNAs for the diagnosis of cancer. In this regard, this chapter presents a new approach for selecting miRNAs from microarray expression data. It integrates the merit of rough set-based feature selection algorithm reported in Chap. 4 and theory of B.632+ bootstrap error rate. The effectiveness of the new approach, along with a comparison with other algorithms, is demonstrated on several miRNA data sets.

8 citations

Journal Article•DOI•

Rough sets for in silico identification of differentially expressed miRNAs

Sushmita Paul¹, Pradipta Maji¹•Institutions (1)

Indian Statistical Institute¹

16 Sep 2013-International Journal of Nanomedicine

TL;DR: A novel approach for in silico identification of differentially expressed miRNAs from microarray expression data sets by integrating judiciously the theory of rough sets and merit of the so-called B.632+ bootstrap error estimate.

Abstract: The microRNAs, also known as miRNAs, are the class of small noncoding RNAs. They repress the expression of a gene posttranscriptionally. In effect, they regulate expression of a gene or protein. It has been observed that they play an important role in various cellular processes and thus help in carrying out normal functioning of a cell. However, dysregulation of miRNAs is found to be a major cause of a disease. Various studies have also shown the role of miRNAs in cancer and the utility of miRNAs for the diagnosis of cancer and other diseases. Unlike with mRNAs, a modest number of miRNAs might be sufficient to classify human cancers. However, the absence of a robust method to identify differentially expressed miRNAs makes this an open problem. In this regard, this paper presents a novel approach for in silico identification of differentially expressed miRNAs from microarray expression data sets. It integrates judiciously the theory of rough sets and merit of the so-called B.632+ bootstrap error estimate. While rough sets select relevant and significant miRNAs from expression data, the B.632+ error rate minimizes the variability and bias of the derived results. The effectiveness of the proposed approach, along with a comparison with other related approaches, is demonstrated on several miRNA microarray expression data sets, using the support vector machine.

7 citations

Journal Article•DOI•

Rough Hypercuboid Based Supervised Regularized Canonical Correlation for Multimodal Data Analysis

Pradipta Maji¹, Ankita Mandal¹•Institutions (1)

Indian Statistical Institute¹

01 Jan 2016-Fundamenta Informaticae

TL;DR: This paper introduces a new SRCCA algorithm, integrating judiciously the merits of SRCCA and rough hypercuboid approach, to extract relevant and nonredundant features in approximation spaces from multimodal omics data sets.

Abstract: One of the main problems in real life omics data analysis is how to extract relevant and non-redundant features from high dimensional multimodal data sets. In general, supervised regularized canonical correlation analysis (SRCCA) plays an important role in extracting new features from multimodal omics data sets. However, the existing SRCCA optimizes regularization parameters based on the quality of first pair of canonical variables only using standard feature evaluation indices. In this regard, this paper introduces a new SRCCA algorithm, integrating judiciously the merits of SRCCA and rough hypercuboid approach, to extract relevant and nonredundant features in approximation spaces from multimodal omics data sets. The proposed method optimizes regularization parameters of the SRCCA based on the quality of a set of pairs of canonical variables using rough hypercuboid approach. While the rough hypercuboid approach provides an efficient way to calculate the degree of dependency of class labels on feature set in approximation spaces, the merit of SRCCA helps in extracting non-redundant features from multimodal data sets. The effectiveness of the proposed approach, along with a comparison with related existing approaches, is demonstrated on several real life data sets.

5 citations

Cites background from "Rough sets and support vector machi..."

...Rough set theory and its several variants have been successfully applied to omics data analysis [33, 34, 35, 36, 20, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47]....
[...]

References

Book•

The Nature of Statistical Learning Theory

Vladimir Vapnik¹•Institutions (1)

Bell Labs¹

01 Jan 1995

TL;DR: Setting of the learning problem; consistency of learning processes; bounds on the rate of convergence of learning processes; controlling the generalization ability of learning processes; constructing learning algorithms; what is important in learning theory?

Abstract: Setting of the learning problem; consistency of learning processes; bounds on the rate of convergence of learning processes; controlling the generalization ability of learning processes; constructing learning algorithms; what is important in learning theory?

40,147 citations

"Rough sets and support vector machi..." refers methods in this paper

...The error rate of support vector machine (SVM) [52] is used to evaluate the performance of different algorithms....
[...]
...632+ error rate of support vector machine (SVM) [52] is used to evaluate the performance of different miRNA selection algorithms....
[...]

Journal Article•DOI•

Molecular classification of cancer: class discovery and class prediction by gene expression monitoring.

Todd R. Golub¹,², Donna K. Slonim¹, Pablo Tamayo¹, Christine Huard¹, Michelle Gaasenbeek¹, Jill P. Mesirov¹, Hilary A. Coller¹, Mignon L. Loh², James R. Downing³, Michael A. Caligiuri⁴, Clara D. Bloomfield⁴, Eric S. Lander¹•Institutions (4)

Massachusetts Institute of Technology¹, Harvard University², St. Jude Children's Research Hospital³, Ohio State University⁴

15 Oct 1999-Science

TL;DR: A generic approach to cancer classification based on gene expression monitoring by DNA microarrays is described and applied to human acute leukemias as a test case and suggests a general strategy for discovering and predicting cancer classes for other types of cancer, independent of previous biological knowledge.

Abstract: Although cancer classification has improved over the past 30 years, there has been no general approach for identifying new cancer classes (class discovery) or for assigning tumors to known classes (class prediction). Here, a generic approach to cancer classification based on gene expression monitoring by DNA microarrays is described and applied to human acute leukemias as a test case. A class discovery procedure automatically discovered the distinction between acute myeloid leukemia (AML) and acute lymphoblastic leukemia (ALL) without previous knowledge of these classes. An automatically derived class predictor was able to determine the class of new leukemia cases. The results demonstrate the feasibility of cancer classification based solely on gene expression monitoring and suggest a general strategy for discovering and predicting cancer classes for other types of cancer, independent of previous biological knowledge.

12,530 citations

"Rough sets and support vector machi..." refers background in this paper

..., n}, where $w_{ij} \in \mathbb{R}$ is the measured expression level of the miRNA $\mathcal{M}_i$ in the $j$th sample, and $m$ and $n$ represent the total number of miRNAs and samples, respectively [5]....
[...]

Journal Article•DOI•

MicroRNA expression profiles classify human cancers

Jun Lu¹, Gad Getz¹, Eric A. Miska²,³, Ezequiel Alvarez-Saavedra², Justin Lamb¹, David Peck¹, Alejandro Sweet-Cordero²,⁴, Benjamin L. Ebert¹,⁴, Raymond H. Mak¹,⁴, Adolfo A. Ferrando⁴, James R. Downing⁵, Tyler Jacks², H. Robert Horvitz²,⁶, Todd R. Golub¹,⁴,⁶•Institutions (6)

Broad Institute¹, Massachusetts Institute of Technology², Wellcome Trust/Cancer Research UK Gurdon Institute³, Harvard University⁴, St. Jude Children's Research Hospital⁵, Howard Hughes Medical Institute⁶

09 Jun 2005-Nature

TL;DR: A new, bead-based flow cytometric miRNA expression profiling method is used to present a systematic expression analysis of 217 mammalian miRNAs from 334 samples, including multiple human cancers, and finds the miRNA profiles are surprisingly informative, reflecting the developmental lineage and differentiation state of the tumours.

Abstract: Recent work has revealed the existence of a class of small non-coding RNA species, known as microRNAs (miRNAs), which have critical functions across various biological processes. Here we use a new, bead-based flow cytometric miRNA expression profiling method to present a systematic expression analysis of 217 mammalian miRNAs from 334 samples, including multiple human cancers. The miRNA profiles are surprisingly informative, reflecting the developmental lineage and differentiation state of the tumours. We observe a general downregulation of miRNAs in tumours compared with normal tissues. Furthermore, we were able to successfully classify poorly differentiated tumours using miRNA expression profiles, whereas messenger RNA profiles were highly inaccurate when applied to the same samples. These findings highlight the potential of miRNA profiling in cancer diagnosis.

9,470 citations

"Rough sets and support vector machi..." refers background or methods in this paper

...Different statistical tests are also employed to identify differentially expressed miRNAs [1], [12], [13], [14], [15], [16], [17], [18], [19], [20]....
[...]
...Unlike with mRNA expression, a modest number of miRNAs (200 in total) might be sufficient to classify human cancers [1]....
[...]

Journal Article•DOI•

Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy

Hanchuan Peng¹, Fuhui Long¹, Chris Ding¹•Institutions (1)

Lawrence Berkeley National Laboratory¹

01 Aug 2005-IEEE Transactions on Pattern Analysis and Machine Intelligence

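TL;DR: This article studies how to select good features according to the maximal statistical dependency criterion based on mutual information. Because the maximal dependency condition is difficult to implement directly, it derives an equivalent minimal-redundancy-maximal-relevance (mRMR) criterion for first-order incremental feature selection and combines mRMR with more sophisticated selectors in a two-stage algorithm.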

Abstract: Feature selection is an important problem for pattern classification systems. We study how to select good features according to the maximal statistical dependency criterion based on mutual information. Because of the difficulty in directly implementing the maximal dependency condition, we first derive an equivalent form, called minimal-redundancy-maximal-relevance criterion (mRMR), for first-order incremental feature selection. Then, we present a two-stage feature selection algorithm by combining mRMR and other more sophisticated feature selectors (e.g., wrappers). This allows us to select a compact set of superior features at very low cost. We perform extensive experimental comparison of our algorithm and other methods using three different classifiers (naive Bayes, support vector machine, and linear discriminate analysis) and four different data sets (handwritten digits, arrhythmia, NCI cancer cell lines, and lymphoma tissues). The results confirm that mRMR leads to promising improvement on feature selection and classification accuracy.
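The first-order incremental selection described in this abstract can be sketched as a greedy loop: start with the most relevant feature, then repeatedly add the feature whose relevance to the class minus its mean redundancy with the already selected features is largest. This is a minimal sketch assuming discrete feature values and the difference form of the criterion; the names `mutual_information` and `mrmr` are illustrative and not the authors' code.

```python
from collections import Counter
from math import log


def mutual_information(xs, ys):
    """Empirical mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(c / n * log(c * n / (px[x] * py[y])) for (x, y), c in pxy.items())


def mrmr(features, labels, k):
    """Greedy mRMR selection.

    features : dict mapping a feature name to its list of discrete values
    labels   : list of class labels, aligned with the feature values
    k        : number of features to select
    """
    relevance = {f: mutual_information(v, labels) for f, v in features.items()}
    # Start from the single most relevant feature.
    selected = [max(relevance, key=relevance.get)]
    while len(selected) < min(k, len(features)):
        def score(f):
            # Mean mutual information with the features chosen so far.
            redundancy = sum(
                mutual_information(features[f], features[s]) for s in selected
            ) / len(selected)
            return relevance[f] - redundancy

        candidates = [f for f in features if f not in selected]
        selected.append(max(candidates, key=score))
    return selected
```

In this sketch a duplicate of an already selected feature scores poorly, because its redundancy equals the entropy of that feature, which is at least as large as its relevance.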

8,078 citations

Book•

Rough Sets: Theoretical Aspects of Reasoning about Data

Zdzisław Pawlak

31 Oct 1991

TL;DR: Theoretical Foundations.

Abstract: I. Theoretical Foundations.- 1. Knowledge.- 1.1. Introduction.- 1.2. Knowledge and Classification.- 1.3. Knowledge Base.- 1.4. Equivalence, Generalization and Specialization of Knowledge.- Summary.- Exercises.- References.- 2. Imprecise Categories, Approximations and Rough Sets.- 2.1. Introduction.- 2.2. Rough Sets.- 2.3. Approximations of Set.- 2.4. Properties of Approximations.- 2.5. Approximations and Membership Relation.- 2.6. Numerical Characterization of Imprecision.- 2.7. Topological Characterization of Imprecision.- 2.8. Approximation of Classifications.- 2.9. Rough Equality of Sets.- 2.10. Rough Inclusion of Sets.- Summary.- Exercises.- References.- 3. Reduction of Knowledge.- 3.1. Introduction.- 3.2. Reduct and Core of Knowledge.- 3.3. Relative Reduct and Relative Core of Knowledge.- 3.4. Reduction of Categories.- 3.5. Relative Reduct and Core of Categories.- Summary.- Exercises.- References.- 4. Dependencies in Knowledge Base.- 4.1. Introduction.- 4.2. Dependency of Knowledge.- 4.3. Partial Dependency of Knowledge.- Summary.- Exercises.- References.- 5. Knowledge Representation.- 5.1. Introduction.- 5.2. Examples.- 5.3. Formal Definition.- 5.4. Significance of Attributes.- 5.5. Discernibility Matrix.- Summary.- Exercises.- References.- 6. Decision Tables.- 6.1. Introduction.- 6.2. Formal Definition and Some Properties.- 6.3. Simplification of Decision Tables.- Summary.- Exercises.- References.- 7. Reasoning about Knowledge.- 7.1. Introduction.- 7.2. Language of Decision Logic.- 7.3. Semantics of Decision Logic Language.- 7.4. Deduction in Decision Logic.- 7.5. Normal Forms.- 7.6. Decision Rules and Decision Algorithms.- 7.7. Truth and Indiscernibility.- 7.8. Dependency of Attributes.- 7.9. Reduction of Consistent Algorithms.- 7.10. Reduction of Inconsistent Algorithms.- 7.11. Reduction of Decision Rules.- 7.12. Minimization of Decision Algorithms.- Summary.- Exercises.- References.- II. Applications.- 8. Decision Making.- 8.1. 
Introduction.- 8.2. Optician's Decisions Table.- 8.3. Simplification of Decision Table.- 8.4. Decision Algorithm.- 8.5. The Case of Incomplete Information.- Summary.- Exercises.- References.- 9. Data Analysis.- 9.1. Introduction.- 9.2. Decision Table as Protocol of Observations.- 9.3. Derivation of Control Algorithms from Observation.- 9.4. Another Approach.- 9.5. The Case of Inconsistent Data.- Summary.- Exercises.- References.- 10. Dissimilarity Analysis.- 10.1. Introduction.- 10.2. The Middle East Situation.- 10.3. Beauty Contest.- 10.4. Pattern Recognition.- 10.5. Buying a Car.- Summary.- Exercises.- References.- 11. Switching Circuits.- 11.1. Introduction.- 11.2. Minimization of Partially Defined Switching Functions.- 11.3. Multiple-Output Switching Functions.- Summary.- Exercises.- References.- 12. Machine Learning.- 12.1. Introduction.- 12.2. Learning From Examples.- 12.3. The Case of an Imperfect Teacher.- 12.4. Inductive Learning.- Summary.- Exercises.- References.

7,826 citations

"Rough sets and support vector machi..." refers background in this paper

...It is proposed for indiscernibility in classification according to some similarity [26], [34]....
[...]
...An approximation space is also called an information system [26]....
[...]
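...Definition 3: Given $\mathbb{C}$, $\mathbb{D}$ and an attribute $\mathcal{A} \in \mathbb{C}$, the significance of the attribute $\mathcal{A}$ is defined as [26]: [...] while the total significance among the selected miRNAs is $\mathcal{J}_{\mathrm{signf}} = \sum \sigma_{\{\mathcal{M}_i,\mathcal{M}_j\}}(\mathbb{D}, \mathcal{M}_j)$....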
[...]
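...One may characterize $X$ by a pair of lower and upper approximations defined as follows [26]: $\underline{\mathbb{P}}(X) = \bigcup \{ [x_i]_{\mathbb{P}} \mid [x_i]_{\mathbb{P}} \subseteq X \}$ and $\overline{\mathbb{P}}(X) = \bigcup \{ [x_i]_{\mathbb{P}} \mid [x_i]_{\mathbb{P}} \cap X \neq \emptyset \}$....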
[...]
...The dependency between $\mathbb{C}$ and $\mathbb{D}$ can be defined as [26]...
[...]
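The excerpts above refer to lower and upper approximations, the dependency between condition and decision attributes, and the significance of an attribute. The sketch below illustrates these notions on a small discrete decision table. It assumes the standard Pawlak definitions (dependency as the fraction of objects in the positive region, significance as the drop in dependency when an attribute is removed); the excerpts elide the formulas, so these are not quoted from the paper, and all function names are illustrative.

```python
from collections import defaultdict


def partition(table, attrs):
    """Equivalence classes of objects that agree on every attribute in attrs."""
    classes = defaultdict(set)
    for obj, row in table.items():
        classes[tuple(row[a] for a in attrs)].add(obj)
    return list(classes.values())


def lower_upper(table, attrs, target):
    """Lower and upper approximations of the object set `target`."""
    lower, upper = set(), set()
    for block in partition(table, attrs):
        if block <= target:   # block lies entirely inside the target set
            lower |= block
        if block & target:    # block overlaps the target set
            upper |= block
    return lower, upper


def dependency(table, cond, dec):
    """Fraction of objects whose decision class is determined by cond."""
    positive = set()
    for decision_class in partition(table, dec):
        positive |= lower_upper(table, cond, decision_class)[0]
    return len(positive) / len(table)


def significance(table, cond, dec, attr):
    """Loss of dependency when attr is dropped from the condition attributes."""
    rest = [a for a in cond if a != attr]
    return dependency(table, cond, dec) - dependency(table, rest, dec)
```

For continuous expression values, the table would first have to be discretized; that step is outside this sketch.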