Proceedings Article

Rough sets and support vector machine for selecting differentially expressed miRNAs

04 Oct 2012, pp. 864-871
TL;DR: A rough set based feature selection algorithm to select miRNAs from expression data that can classify tissue samples into their respective categories with minimal error rate is presented.
Abstract: The microRNAs, also known as miRNAs, are the class of small non-coding RNAs that repress the expression of a gene post-transcriptionally. In effect, they regulate the expression of a gene or protein. It has been observed that they play an important role in various cellular processes and thus help in carrying out the normal functioning of a cell. However, dysregulation of miRNAs is found to be a major cause of disease. Various studies have also shown the role of miRNAs in cancer and the utility of miRNAs for the diagnosis of cancer and other diseases. A large number of works have been conducted to identify differentially expressed miRNAs since, unlike with mRNA expression, a modest number of miRNAs might be sufficient to classify human cancers. In this regard, this paper presents a rough set based feature selection algorithm to select miRNAs from expression data that can classify tissue samples into their respective categories with minimal error rate. It selects a set of miRNAs by maximizing both the relevance and the significance of miRNAs. The effectiveness of the rough set based algorithm, along with a comparison with other related algorithms, is demonstrated on three miRNA microarray expression data sets using the B.632+ bootstrap error rate of the support vector machine.
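The selection criterion in this abstract can be illustrated with a minimal Python sketch of a greedy rough set selector. It is an illustration under stated assumptions, not the authors' code: expression values are assumed to be already discretized (the discretization step is not shown), relevance is taken as the dependency of the class labels on a single miRNA, the significance of a candidate is taken as the gain in dependency when it is paired with each already-selected miRNA, and the two are combined with a hypothetical trade-off parameter `weight`. The names `dependency` and `mrms_select` are likewise hypothetical.

```python
from collections import defaultdict

def dependency(columns, labels):
    """Rough set dependency of the class labels on a set of discretized miRNAs.

    columns: list of discretized expression profiles (one per miRNA), each of length n.
    Returns |POS| / n, where a sample is in the positive region POS when every
    sample sharing its discretized values also shares its class label.
    """
    n = len(labels)
    blocks = defaultdict(list)
    for j in range(n):
        blocks[tuple(col[j] for col in columns)].append(j)
    pos = sum(len(b) for b in blocks.values() if len({labels[j] for j in b}) == 1)
    return pos / n

def mrms_select(data, labels, k, weight=0.5):
    """Greedily pick k miRNAs (row indices of `data`) by relevance and significance."""
    m = len(data)
    # Relevance of each miRNA: dependency of the labels on that miRNA alone.
    relevance = [dependency([row], labels) for row in data]
    selected = [max(range(m), key=lambda i: relevance[i])]
    while len(selected) < min(k, m):
        best, best_score = None, float("-inf")
        for i in range(m):
            if i in selected:
                continue
            # Significance of candidate i with respect to each selected miRNA j:
            # the gain in dependency from adding i to {j}.
            gains = [dependency([data[i], data[j]], labels) - relevance[j] for j in selected]
            score = weight * relevance[i] + (1 - weight) * sum(gains) / len(gains)
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
    return selected

labels = [0, 0, 0, 0, 1, 1, 1, 1]
data = [
    [0, 0, 0, 1, 1, 1, 1, 1],  # partially relevant miRNA
    [0, 0, 0, 1, 1, 1, 1, 1],  # exact duplicate of the first one
    [1, 1, 1, 0, 1, 1, 1, 1],  # weak alone, complementary to the first
    [1, 1, 1, 1, 1, 1, 1, 1],  # constant, uninformative
]
print(mrms_select(data, labels, k=2))  # [0, 2]
```

On this toy data the exact duplicate of the first miRNA adds no dependency and is skipped, while the third miRNA, weak on its own, separates the one sample the first miRNA cannot.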
Citations
Journal Article
TL;DR: A generic categorizing framework is proposed that systematically groups algorithms into categories based on search strategies and evaluation criteria, and provides guidelines for selecting feature selection algorithms both in general and specifically in the context of this study.
Abstract: This paper studies the relevance of feature selection algorithms in microarray data for effective analysis. Without loss of generality, we present a list of feature selection algorithms and propose a generic categorizing framework that systematically groups algorithms into categories. The generic categorizing framework is based on search strategies and evaluation criteria. Further, it provides guidelines for selecting feature selection algorithms both in general and specifically in the context of this study. In the context of microarray data analysis, the feature selection algorithms are classified into soft and non-soft computing categories. An analysis of their performance on microarray data is presented.

23 citations


Cites methods from "Rough sets and support vector machi..."

  • ...Redundant Gene selection using PSO(RGS-PSO) [123] – ▪ Redundant gene selection using PSO (RGS-PSO) is a novel approach....

  • ...Rough set and SVM based [123] ▪ Rough set and MRMS for gene selection ▪ SVM for classification The MRMS selects a set of miRNAs having a lowest B....

Journal Article
TL;DR: The results on several microarray data sets demonstrate that the proposed method can bring a remarkable improvement to the miRNA selection problem and is a potentially useful tool for exploration of miRNA expression data and identification of differentially expressed miRNAs worth further investigation.
Abstract: The miRNAs, a class of short, approximately 22‐nucleotide non‐coding RNAs, often act post‐transcriptionally to inhibit mRNA expression. In effect, they control gene expression by targeting mRNA. They also help in carrying out the normal functioning of a cell, as they play an important role in various cellular processes. However, dysregulation of miRNAs is found to be a major cause of disease. It has been demonstrated that miRNA expression is altered in many human cancers, suggesting that miRNAs may play an important role as disease biomarkers. Multiple reports have also noted the utility of miRNAs for the diagnosis of cancer. Among the large number of miRNAs present in microarray data, a modest number might be sufficient to classify human cancers. Hence, the identification of differentially expressed miRNAs is an important problem, particularly for data sets with a large number of miRNAs and a small number of samples. In this regard, a new miRNA selection algorithm, called μHEM, is presented based on the rough hypercuboid approach. It selects a set of miRNAs from microarray data by maximizing both the relevance and the significance of the selected miRNAs. The degree of dependency of sample categories on miRNAs is defined, based on the concept of the hypercuboid equivalence partition matrix, to measure both the relevance and the significance of miRNAs. The effectiveness of the new approach is demonstrated on six publicly available miRNA expression data sets using the support vector machine. The .632+ bootstrap error estimate is used to minimize the variability and bias of the derived results. An important finding is that the μHEM algorithm achieves the lowest B.632+ error rate of the support vector machine with a reduced set of differentially expressed miRNAs on four expression data sets compared to some existing machine learning and statistical methods, while for the other two data sets, the error rate of the μHEM algorithm is comparable with that of the existing techniques.
The results on several microarray data sets demonstrate that the proposed method can bring a remarkable improvement to the miRNA selection problem. The method is a potentially useful tool for exploration of miRNA expression data and identification of differentially expressed miRNAs worth further investigation.
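Several of these works score selected miRNAs with the .632+ bootstrap error. The following Python sketch follows the .632+ rule of Efron and Tibshirani: it blends the resubstitution error with the leave-one-out bootstrap error, shifting weight toward the bootstrap error as the relative overfitting rate grows, and caps the bootstrap error at the no-information error rate. To stay dependency-free it uses a simple nearest-centroid classifier as a stand-in for the support vector machine; the names `b632plus_error` and `nearest_centroid_fit` and the defaults `n_boot=50` and `seed=0` are illustrative assumptions, and any function that takes training data and returns a predictor could be passed as `fit`.

```python
import random
from collections import Counter

def nearest_centroid_fit(X, y):
    """Stand-in classifier (NOT an SVM): predicts the class with the closest mean vector."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def predict(x):
        return min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))

    return predict

def b632plus_error(X, y, fit=nearest_centroid_fit, n_boot=50, seed=0):
    """B.632+ bootstrap error estimate for the classifier produced by `fit`."""
    rng = random.Random(seed)
    n = len(y)
    # Resubstitution error: train and test on all samples.
    full_model = fit(X, y)
    preds = [full_model(x) for x in X]
    err_bar = sum(p != t for p, t in zip(preds, y)) / n

    # Leave-one-out bootstrap error: each sample is scored only by models
    # trained on bootstrap samples that did not contain it.
    losses = [[] for _ in range(n)]
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        model = fit([X[i] for i in idx], [y[i] for i in idx])
        in_bag = set(idx)
        for i in range(n):
            if i not in in_bag:
                losses[i].append(model(X[i]) != y[i])
    per_sample = [sum(votes) / len(votes) for votes in losses if votes]
    if not per_sample:
        raise ValueError("no out-of-bag predictions; increase n_boot")
    err1 = sum(per_sample) / len(per_sample)

    # No-information error rate: sum over classes k of p_k * (1 - q_k).
    p, q = Counter(y), Counter(preds)
    gamma = sum((p[k] / n) * (1 - q[k] / n) for k in p)

    # Relative overfitting rate r and the .632+ weight w.
    err1 = min(err1, gamma)
    if err1 > err_bar and gamma > err_bar:
        r = (err1 - err_bar) / (gamma - err_bar)
    else:
        r = 0.0
    w = 0.632 / (1 - 0.368 * r)
    return (1 - w) * err_bar + w * err1

X = [[0.0], [0.1], [0.2], [1.0], [1.1], [1.2]]
y = [0, 0, 0, 1, 1, 1]
print(b632plus_error(X, y, n_boot=100, seed=1))
```

Because `w` lies between 0.632 and 1, the estimate always falls between the optimistic resubstitution error and the capped bootstrap error.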

14 citations


Cites methods from "Rough sets and support vector machi..."

  • ...The theory of rough sets has also been successfully applied to microarray data analysis in [9,24-35]....

Book Chapter
01 Jan 2014
TL;DR: This chapter presents a new approach for selecting miRNAs from microarray expression data that integrates the merit of rough set-based feature selection algorithm reported in Chap.
Abstract: The microRNAs or miRNAs regulate the expression of a gene or protein. It has been observed that they play an important role in various cellular processes and thus help in carrying out the normal functioning of a cell. However, dysregulation of miRNAs is found to be a major cause of disease. Various studies have also shown the role of miRNAs in cancer and the utility of miRNAs for the diagnosis of cancer. In this regard, this chapter presents a new approach for selecting miRNAs from microarray expression data. It integrates the merit of the rough set-based feature selection algorithm reported in Chap. 4 and the theory of the B.632+ bootstrap error rate. The effectiveness of the new approach, along with a comparison with other algorithms, is demonstrated on several miRNA data sets.

8 citations

Journal Article
TL;DR: A novel approach for in silico identification of differentially expressed miRNAs from microarray expression data sets by judiciously integrating the theory of rough sets and the merit of the so-called B.632+ bootstrap error estimate.
Abstract: The microRNAs, also known as miRNAs, are the class of small noncoding RNAs. They repress the expression of a gene posttranscriptionally. In effect, they regulate the expression of a gene or protein. It has been observed that they play an important role in various cellular processes and thus help in carrying out the normal functioning of a cell. However, dysregulation of miRNAs is found to be a major cause of disease. Various studies have also shown the role of miRNAs in cancer and the utility of miRNAs for the diagnosis of cancer and other diseases. Unlike with mRNAs, a modest number of miRNAs might be sufficient to classify human cancers. However, the absence of a robust method to identify differentially expressed miRNAs makes this an open problem. In this regard, this paper presents a novel approach for in silico identification of differentially expressed miRNAs from microarray expression data sets. It judiciously integrates the theory of rough sets and the merit of the so-called B.632+ bootstrap error estimate. While rough sets select relevant and significant miRNAs from expression data, the B.632+ error rate minimizes the variability and bias of the derived results. The effectiveness of the proposed approach, along with a comparison with other related approaches, is demonstrated on several miRNA microarray expression data sets, using the support vector machine.

7 citations

Journal Article
TL;DR: This paper introduces a new SRCCA algorithm, judiciously integrating the merits of SRCCA and the rough hypercuboid approach, to extract relevant and nonredundant features in approximation spaces from multimodal omics data sets.
Abstract: One of the main problems in real life omics data analysis is how to extract relevant and non-redundant features from high dimensional multimodal data sets. In general, supervised regularized canonical correlation analysis (SRCCA) plays an important role in extracting new features from multimodal omics data sets. However, the existing SRCCA optimizes regularization parameters based only on the quality of the first pair of canonical variables, using standard feature evaluation indices. In this regard, this paper introduces a new SRCCA algorithm, judiciously integrating the merits of SRCCA and the rough hypercuboid approach, to extract relevant and nonredundant features in approximation spaces from multimodal omics data sets. The proposed method optimizes the regularization parameters of the SRCCA based on the quality of a set of pairs of canonical variables using the rough hypercuboid approach. While the rough hypercuboid approach provides an efficient way to calculate the degree of dependency of class labels on the feature set in approximation spaces, the merit of SRCCA helps in extracting non-redundant features from multimodal data sets. The effectiveness of the proposed approach, along with a comparison with related existing approaches, is demonstrated on several real life data sets.

5 citations


Cites background from "Rough sets and support vector machi..."

  • ...Rough set theory and its several variants have been successfully applied to omics data analysis [33, 34, 35, 36, 20, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47]....

References
Book
Vladimir Vapnik
01 Jan 1995
Abstract: Setting of the learning problem; consistency of learning processes; bounds on the rate of convergence of learning processes; controlling the generalization ability of learning processes; constructing learning algorithms; what is important in learning theory?

40,147 citations


"Rough sets and support vector machi..." refers methods in this paper

  • ...The error rate of support vector machine (SVM) [52] is used to evaluate the performance of different algorithms....

  • ...632+ error rate of support vector machine (SVM) [52] is used to evaluate the performance of different miRNA selection algorithms....

Journal Article
15 Oct 1999, Science
TL;DR: A generic approach to cancer classification based on gene expression monitoring by DNA microarrays is described and applied to human acute leukemias as a test case and suggests a general strategy for discovering and predicting cancer classes for other types of cancer, independent of previous biological knowledge.
Abstract: Although cancer classification has improved over the past 30 years, there has been no general approach for identifying new cancer classes (class discovery) or for assigning tumors to known classes (class prediction). Here, a generic approach to cancer classification based on gene expression monitoring by DNA microarrays is described and applied to human acute leukemias as a test case. A class discovery procedure automatically discovered the distinction between acute myeloid leukemia (AML) and acute lymphoblastic leukemia (ALL) without previous knowledge of these classes. An automatically derived class predictor was able to determine the class of new leukemia cases. The results demonstrate the feasibility of cancer classification based solely on gene expression monitoring and suggest a general strategy for discovering and predicting cancer classes for other types of cancer, independent of previous biological knowledge.

12,530 citations


"Rough sets and support vector machi..." refers background in this paper

  • ..., n}, where $w_{ij} \in \mathbb{R}$ is the measured expression level of the $i$th miRNA in the $j$th sample, and $m$ and $n$ represent the total numbers of miRNAs and samples, respectively [5]....

Journal Article
09 Jun 2005, Nature
TL;DR: A new, bead-based flow cytometric miRNA expression profiling method is used to present a systematic expression analysis of 217 mammalian miRNAs from 334 samples, including multiple human cancers, and finds the miRNA profiles are surprisingly informative, reflecting the developmental lineage and differentiation state of the tumours.
Abstract: Recent work has revealed the existence of a class of small non-coding RNA species, known as microRNAs (miRNAs), which have critical functions across various biological processes. Here we use a new, bead-based flow cytometric miRNA expression profiling method to present a systematic expression analysis of 217 mammalian miRNAs from 334 samples, including multiple human cancers. The miRNA profiles are surprisingly informative, reflecting the developmental lineage and differentiation state of the tumours. We observe a general downregulation of miRNAs in tumours compared with normal tissues. Furthermore, we were able to successfully classify poorly differentiated tumours using miRNA expression profiles, whereas messenger RNA profiles were highly inaccurate when applied to the same samples. These findings highlight the potential of miRNA profiling in cancer diagnosis.

9,470 citations


"Rough sets and support vector machi..." refers background or methods in this paper

  • ...Different statistical tests are also employed to identify differentially expressed miRNAs [1], [12], [13], [14], [15], [16], [17], [18], [19], [20]....

  • ...Unlike with mRNA expression, a modest number of miRNAs (200 in total) might be sufficient to classify human cancers [1]....

Journal Article
TL;DR: Because the maximal statistical dependency criterion based on mutual information is difficult to implement directly, an equivalent form, the minimal-redundancy-maximal-relevance (mRMR) criterion, is derived for first-order incremental feature selection and combined with more sophisticated feature selectors in a two-stage algorithm.
Abstract: Feature selection is an important problem for pattern classification systems. We study how to select good features according to the maximal statistical dependency criterion based on mutual information. Because of the difficulty in directly implementing the maximal dependency condition, we first derive an equivalent form, called the minimal-redundancy-maximal-relevance criterion (mRMR), for first-order incremental feature selection. Then, we present a two-stage feature selection algorithm by combining mRMR and other more sophisticated feature selectors (e.g., wrappers). This allows us to select a compact set of superior features at very low cost. We perform extensive experimental comparison of our algorithm and other methods using three different classifiers (naive Bayes, support vector machine, and linear discriminant analysis) and four different data sets (handwritten digits, arrhythmia, NCI cancer cell lines, and lymphoma tissues). The results confirm that mRMR leads to promising improvement on feature selection and classification accuracy.
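A minimal Python sketch of the first-order incremental mRMR search, assuming discrete (already binned) features and plain empirical mutual information estimates. It uses the difference form of the criterion (relevance to the target minus mean mutual information with the already-selected features); Peng et al. also describe a quotient form, which is not shown. The function names `mutual_information` and `mrmr` are illustrative, not taken from the paper's code.

```python
from collections import Counter
from math import log

def mutual_information(a, b):
    """Empirical mutual information I(a; b), in nats, between two discrete sequences."""
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    joint = Counter(zip(a, b))
    # Sum over observed pairs of p(x,y) * log(p(x,y) / (p(x) p(y))).
    return sum((c / n) * log(c * n / (count_a[x] * count_b[y])) for (x, y), c in joint.items())

def mrmr(features, target, k):
    """First-order incremental mRMR (difference form) over discrete features."""
    relevance = [mutual_information(f, target) for f in features]
    selected = [max(range(len(features)), key=lambda i: relevance[i])]
    while len(selected) < min(k, len(features)):
        def score(i):
            redundancy = sum(mutual_information(features[i], features[j]) for j in selected)
            return relevance[i] - redundancy / len(selected)
        candidates = [i for i in range(len(features)) if i not in selected]
        selected.append(max(candidates, key=score))
    return selected

target = [0, 0, 0, 0, 1, 1, 1, 1]
features = [
    [0, 0, 0, 1, 1, 1, 1, 1],  # informative feature
    [0, 0, 0, 1, 1, 1, 1, 1],  # redundant copy of the first
    [1, 1, 1, 0, 1, 1, 1, 1],  # weakly informative, complementary
    [1, 1, 1, 1, 1, 1, 1, 1],  # constant
]
print(mrmr(features, target, k=2))  # [0, 2]
```

Here the copy of the first feature is penalized by its large mutual information with the first pick, so the complementary third feature is chosen instead.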

8,078 citations

Book
31 Oct 1991
Abstract:
I. Theoretical Foundations
1. Knowledge: 1.1 Introduction; 1.2 Knowledge and Classification; 1.3 Knowledge Base; 1.4 Equivalence, Generalization and Specialization of Knowledge.
2. Imprecise Categories, Approximations and Rough Sets: 2.1 Introduction; 2.2 Rough Sets; 2.3 Approximations of Set; 2.4 Properties of Approximations; 2.5 Approximations and Membership Relation; 2.6 Numerical Characterization of Imprecision; 2.7 Topological Characterization of Imprecision; 2.8 Approximation of Classifications; 2.9 Rough Equality of Sets; 2.10 Rough Inclusion of Sets.
3. Reduction of Knowledge: 3.1 Introduction; 3.2 Reduct and Core of Knowledge; 3.3 Relative Reduct and Relative Core of Knowledge; 3.4 Reduction of Categories; 3.5 Relative Reduct and Core of Categories.
4. Dependencies in Knowledge Base: 4.1 Introduction; 4.2 Dependency of Knowledge; 4.3 Partial Dependency of Knowledge.
5. Knowledge Representation: 5.1 Introduction; 5.2 Examples; 5.3 Formal Definition; 5.4 Significance of Attributes; 5.5 Discernibility Matrix.
6. Decision Tables: 6.1 Introduction; 6.2 Formal Definition and Some Properties; 6.3 Simplification of Decision Tables.
7. Reasoning about Knowledge: 7.1 Introduction; 7.2 Language of Decision Logic; 7.3 Semantics of Decision Logic Language; 7.4 Deduction in Decision Logic; 7.5 Normal Forms; 7.6 Decision Rules and Decision Algorithms; 7.7 Truth and Indiscernibility; 7.8 Dependency of Attributes; 7.9 Reduction of Consistent Algorithms; 7.10 Reduction of Inconsistent Algorithms; 7.11 Reduction of Decision Rules; 7.12 Minimization of Decision Algorithms.
II. Applications
8. Decision Making: 8.1 Introduction; 8.2 Optician's Decisions Table; 8.3 Simplification of Decision Table; 8.4 Decision Algorithm; 8.5 The Case of Incomplete Information.
9. Data Analysis: 9.1 Introduction; 9.2 Decision Table as Protocol of Observations; 9.3 Derivation of Control Algorithms from Observation; 9.4 Another Approach; 9.5 The Case of Inconsistent Data.
10. Dissimilarity Analysis: 10.1 Introduction; 10.2 The Middle East Situation; 10.3 Beauty Contest; 10.4 Pattern Recognition; 10.5 Buying a Car.
11. Switching Circuits: 11.1 Introduction; 11.2 Minimization of Partially Defined Switching Functions; 11.3 Multiple-Output Switching Functions.
12. Machine Learning: 12.1 Introduction; 12.2 Learning From Examples; 12.3 The Case of an Imperfect Teacher; 12.4 Inductive Learning.
Each chapter closes with a Summary, Exercises and References.

7,826 citations


"Rough sets and support vector machi..." refers background in this paper

  • ...It is proposed for indiscernibility in classification according to some similarity [26], [34]....

  • ...An approximation space is also called an information system [26]....

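  • ...Definition 3: Given $\mathbb{C}$, $\mathbb{D}$ and an attribute $\mathcal{A} \in \mathbb{C}$, the significance of the attribute $\mathcal{A}$ is defined as [26]: $\sigma_{\mathbb{C}}(\mathbb{D}, \mathcal{A}) = \gamma_{\mathbb{C}}(\mathbb{D}) - \gamma_{\mathbb{C} - \{\mathcal{A}\}}(\mathbb{D})$, while the total significance among the selected miRNAs is $\mathcal{J}_{\text{signf}} = \sum_{i \neq j} \sigma_{\{\mathcal{G}_i, \mathcal{G}_j\}}(\mathbb{D}, \mathcal{G}_j)$....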

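  • ...One may characterize $X$ by a pair of lower and upper approximations defined as follows [26]: $\underline{\mathbb{B}}(X) = \bigcup\{[x_i]_{\mathbb{B}} \mid [x_i]_{\mathbb{B}} \subseteq X\}$ and $\overline{\mathbb{B}}(X) = \bigcup\{[x_i]_{\mathbb{B}} \mid [x_i]_{\mathbb{B}} \cap X \neq \emptyset\}$....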

  • ...The dependency between $\mathbb{C}$ and $\mathbb{D}$ can be defined as [26]...

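The rough set notions quoted in these snippets (lower and upper approximations, the dependency of the decision attribute on a set of attributes, and attribute significance) can be checked on a toy decision table. In this Python sketch the table, the object names, and the attribute names `g1`, `g2` (two discretized miRNAs) and `d` (the class) are hypothetical. Because the dependency formula is cut off in the snippet, the code assumes the standard Pawlak definition: the size of the positive region (the union of the lower approximations of all decision classes) divided by the size of the universe.

```python
from collections import defaultdict

def partition(table, attrs):
    """Equivalence classes of the indiscernibility relation on `attrs`."""
    blocks = defaultdict(set)
    for obj, row in table.items():
        blocks[tuple(row[a] for a in attrs)].add(obj)
    return list(blocks.values())

def approximations(table, attrs, target):
    """Return the (lower, upper) approximations of the object set `target`."""
    lower, upper = set(), set()
    for block in partition(table, attrs):
        if block <= target:  # block lies entirely inside target
            lower |= block
        if block & target:   # block overlaps target
            upper |= block
    return lower, upper

def dependency(table, attrs, decision):
    """|POS| / |U|, with POS the union of lower approximations of all decision classes."""
    pos = set()
    for decision_class in partition(table, [decision]):
        pos |= approximations(table, attrs, decision_class)[0]
    return len(pos) / len(table)

def significance(table, attrs, decision, attr):
    """Drop in dependency when `attr` is removed from `attrs`."""
    reduced = [a for a in attrs if a != attr]
    return dependency(table, attrs, decision) - dependency(table, reduced, decision)

table = {
    "x1": {"g1": "low", "g2": "low", "d": "normal"},
    "x2": {"g1": "low", "g2": "high", "d": "normal"},
    "x3": {"g1": "high", "g2": "low", "d": "normal"},
    "x4": {"g1": "high", "g2": "high", "d": "tumor"},
    "x5": {"g1": "high", "g2": "high", "d": "tumor"},
    "x6": {"g1": "high", "g2": "low", "d": "tumor"},
}
tumor = {"x4", "x5", "x6"}
print(approximations(table, ["g1", "g2"], tumor))
print(dependency(table, ["g1", "g2"], "d"))
print(significance(table, ["g1", "g2"], "d", "g2"))
```

With both miRNAs, objects x3 and x6 remain indiscernible yet belong to different classes, so they fall in the boundary region and the dependency is 4/6; dropping `g2` lowers it to 2/6, which is the significance of `g2`.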