Proceedings ArticleDOI

Robust RFCM algorithm for identification of co-expressed miRNAs

04 Oct 2012-pp 1-4
TL;DR: The application of robust rough-fuzzy c-means (rRFCM) algorithm to discover co-expressed miRNA clusters is presented and the effectiveness of the rRFCM algorithm and different initialization methods, along with a comparison with other related methods, is demonstrated.
Abstract: MicroRNAs (miRNAs) are short, endogenous RNAs with the ability to regulate gene expression at the post-transcriptional level. Various studies have revealed that miRNAs tend to cluster on chromosomes. Members of a cluster that lie in close proximity on a chromosome are highly likely to be processed as cotranscribed units. Therefore, a large proportion of miRNAs are co-expressed. Expression profiling of miRNAs generates a huge volume of data. Complicated networks of miRNA-mRNA interaction make it a big challenge for scientists to decipher this huge volume of expression data. In order to extract meaningful information from expression data, this paper presents the application of the robust rough-fuzzy c-means (rRFCM) algorithm to discover co-expressed miRNA clusters. The rRFCM algorithm comprises a judicious integration of rough sets, fuzzy sets, and the c-means algorithm. The effectiveness of the rRFCM algorithm and different initialization methods, along with a comparison with other related methods, is demonstrated on three miRNA microarray expression data sets using the Silhouette index, Davies-Bouldin index, Dunn index, β index, and gene ontology based analysis.
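As an illustration of one of the cluster validity indices named in the abstract, the sketch below computes a Dunn index in its common textbook form: the smallest distance between points of different clusters divided by the largest within-cluster distance. The abstract does not state which variant of the index the paper uses, so the Euclidean distance, the function names, and the toy data are assumptions made only for this example.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dunn_index(clusters):
    """Dunn index for a list of clusters, each a list of points.

    Assumed variant: minimum between-cluster point distance divided by
    the maximum cluster diameter. Larger values suggest compact,
    well-separated clusters.
    """
    # Largest distance between two points of the same cluster.
    max_diameter = max(
        euclidean(p, q) for c in clusters for p in c for q in c
    )
    # Smallest distance between two points of different clusters.
    min_separation = min(
        euclidean(p, q)
        for i, ci in enumerate(clusters)
        for cj in clusters[i + 1:]
        for p in ci
        for q in cj
    )
    return min_separation / max_diameter


# Hypothetical toy data: two tight groups far apart.
toy = [[(0.0, 0.0), (0.0, 1.0)], [(5.0, 0.0), (5.0, 1.0)]]
print(dunn_index(toy))
```

The sketch needs at least two clusters and at least one cluster with two distinct points; if every cluster is a single point the diameter is zero and the ratio is undefined.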
Citations
Journal ArticleDOI
TL;DR: The results on several microarray data sets demonstrate that the proposed method can bring a remarkable improvement on miRNA selection problem and is a potentially useful tool for exploration of miRNA expression data and identification of differentially expressed miRNAs worth further investigation.
Abstract: The miRNAs, a class of short, approximately 22-nucleotide non-coding RNAs, often act post-transcriptionally to inhibit mRNA expression. In effect, they control gene expression by targeting mRNA. They also help in carrying out normal functioning of a cell as they play an important role in various cellular processes. However, dysregulation of miRNAs is found to be a major cause of disease. It has been demonstrated that miRNA expression is altered in many human cancers, suggesting that they may play an important role as disease biomarkers. Multiple reports have also noted the utility of miRNAs for the diagnosis of cancer. Among the large number of miRNAs present in microarray data, a modest number might be sufficient to classify human cancers. Hence, the identification of differentially expressed miRNAs is an important problem, particularly for data sets with a large number of miRNAs and a small number of samples. In this regard, a new miRNA selection algorithm, called μHEM, is presented based on the rough hypercuboid approach. It selects a set of miRNAs from microarray data by maximizing both relevance and significance of the selected miRNAs. The degree of dependency of sample categories on miRNAs is defined, based on the concept of hypercuboid equivalence partition matrix, to measure both relevance and significance of miRNAs. The effectiveness of the new approach is demonstrated on six publicly available miRNA expression data sets using support vector machine. The .632+ bootstrap error estimate is used to minimize the variability and bias of the derived results. An important finding is that the μHEM algorithm achieves the lowest B.632+ error rate of support vector machine with a reduced set of differentially expressed miRNAs on four expression data sets compared to some existing machine learning and statistical methods, while for the other two data sets, the error rate of the μHEM algorithm is comparable with the existing techniques.
The results on several microarray data sets demonstrate that the proposed method can bring a remarkable improvement on miRNA selection problem. The method is a potentially useful tool for exploration of miRNA expression data and identification of differentially expressed miRNAs worth further investigation.

14 citations


Cites methods from "Robust RFCM algorithm for identific..."

  • ...The theory of rough sets has also been successfully applied to microarray data analysis in [9,24-35]....

    [...]

Book ChapterDOI
01 Jan 2014
TL;DR: This chapter presents a new approach for selecting miRNAs from microarray expression data that integrates the merit of rough set-based feature selection algorithm reported in Chap.
Abstract: The microRNAs or miRNAs regulate expression of a gene or protein. It has been observed that they play an important role in various cellular processes and thus help in carrying out normal functioning of a cell. However, dysregulation of miRNAs is found to be a major cause of a disease. Various studies have also shown the role of miRNAs in cancer and utility of miRNAs for the diagnosis of cancer. In this regard, this chapter presents a new approach for selecting miRNAs from microarray expression data. It integrates the merit of rough set-based feature selection algorithm reported in Chap. 4 and theory of B.632+ bootstrap error rate. The effectiveness of the new approach, along with a comparison with other algorithms, is demonstrated on several miRNA data sets.

8 citations

Journal ArticleDOI
TL;DR: An automatic clustering technique using the search capability of multiobjective optimization which can automatically determine the relevant distance measure and the corresponding partitioning from a given data set is developed.
Abstract: Distance plays an important role in the clustering process for allocating data points to different clusters. Several distance or proximity measures have been developed and reported in the literature to determine dissimilarities between two given points. The choice of distance measure depends on a particular domain as well as different data sets of the same domain. It is important to automatically determine the appropriate distance measure which acts best for a particular data set. In this study we have developed an automatic clustering technique using the search capability of multiobjective optimization which can automatically determine the relevant distance measure and the corresponding partitioning from a given data set. Our proposed automated framework is generic in nature i.e., any number of different distance measures can be incorporated into it. In our work we have used four existing widely used distance measures, i.e., Euclidean, line symmetry, point symmetry and city block distance to be explored for each data set. In order to measure the richness of an obtained partitioning using a particular distance, four cluster validity indices, the Silhouette index, the DB index, the adjusted rand index and classification accuracy are used. A new encoding strategy which can encode the set of cluster centers and the particular distance function is used to represent the problem. The appropriate distance function and the corresponding partitioning are determined using the search capability of a multiobjective optimization based technique. The efficiency of the proposed technique is shown on clustering three microRNA and three microarray gene expression data sets having varying complexities. The results show the usefulness of the proposed automated approach.

8 citations

Book ChapterDOI
19 Dec 2013
TL;DR: The proposed method judiciously integrates the merits of robust rough-fuzzy c-means algorithm and normalized range-normalized city block distance to discover co-expressed miRNA clusters and helps to handle minute differences between two miRNA expression profiles.
Abstract: The microRNAs or miRNAs are short, endogenous RNAs with the ability to regulate gene expression at the post-transcriptional level. Various studies have revealed that a large proportion of miRNAs are co-expressed. Expression profiling of miRNAs generates a huge volume of data. Complicated networks of miRNA-mRNA interaction increase the challenges of comprehending and interpreting the resulting mass of data. In this regard, this paper presents the application of city block distance in order to extract meaningful information from miRNA expression data. The proposed method judiciously integrates the merits of robust rough-fuzzy c-means algorithm and normalized range-normalized city block distance to discover co-expressed miRNA clusters. The city block distance is used to calculate the membership functions of fuzzy sets, and thereby helps to handle minute differences between two miRNA expression profiles. The effectiveness of the proposed approach, along with a comparison with other related methods, is demonstrated on several miRNA expression data sets using different cluster validity indices and gene ontology.
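The abstract does not define the normalized range-normalized city block distance. One plausible reading, sketched here purely as an assumption, divides each per-feature absolute difference by that feature's range over the data set and then averages over the features. The function names and the toy profiles are hypothetical.

```python
def feature_ranges(profiles):
    """Range (max - min) of every feature across all expression profiles."""
    columns = list(zip(*profiles))
    return [max(col) - min(col) for col in columns]


def normalized_city_block(x, y, ranges):
    """Assumed form: mean over features of |x_k - y_k| / range_k.

    Features with zero range carry no information and are skipped, so
    the result always lies between 0 and 1 for profiles drawn from the
    data set used to compute the ranges.
    """
    terms = [abs(a - b) / r for a, b, r in zip(x, y, ranges) if r > 0]
    return sum(terms) / len(terms) if terms else 0.0


# Hypothetical profiles: three miRNAs measured under two conditions.
profiles = [(0.0, 10.0), (2.0, 20.0), (4.0, 30.0)]
ranges = feature_ranges(profiles)
print(normalized_city_block(profiles[0], profiles[1], ranges))
```

Dividing by the range keeps a feature with large absolute values from dominating the sum, which fits the abstract's remark about handling minute differences between two profiles.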

8 citations

Journal ArticleDOI
TL;DR: A novel approach for in silico identification of differentially expressed miRNAs from microarray expression data sets by integrating judiciously the theory of rough sets and merit of the so-called B.632+ bootstrap error estimate.
Abstract: The microRNAs, also known as miRNAs, are a class of small noncoding RNAs. They repress the expression of a gene posttranscriptionally. In effect, they regulate expression of a gene or protein. It has been observed that they play an important role in various cellular processes and thus help in carrying out normal functioning of a cell. However, dysregulation of miRNAs is found to be a major cause of disease. Various studies have also shown the role of miRNAs in cancer and the utility of miRNAs for the diagnosis of cancer and other diseases. Unlike with mRNAs, a modest number of miRNAs might be sufficient to classify human cancers. However, the absence of a robust method to identify differentially expressed miRNAs makes this an open problem. In this regard, this paper presents a novel approach for in silico identification of differentially expressed miRNAs from microarray expression data sets. It judiciously integrates the theory of rough sets and the merit of the so-called B.632+ bootstrap error estimate. While rough sets select relevant and significant miRNAs from expression data, the B.632+ error rate minimizes the variability and bias of the derived results. The effectiveness of the proposed approach, along with a comparison with other related approaches, is demonstrated on several miRNA microarray expression data sets, using the support vector machine.
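For readers unfamiliar with the .632+ bootstrap estimate, the sketch below shows one common way of writing it: a weighted mix of the resubstitution error and the out-of-bag bootstrap error, with a weight that grows as overfitting increases. This is a simplified form that clips the out-of-bag error at the no-information rate throughout; it is not taken from the paper, and the function and parameter names are hypothetical.

```python
def b632_plus(err_train, err_oob, gamma):
    """Simplified .632+ style error estimate (assumed form).

    err_train: error rate on the full training set (resubstitution error)
    err_oob:   mean error on samples left out of each bootstrap replicate
    gamma:     no-information error rate (error if labels were unrelated
               to the features)
    """
    # The out-of-bag error is not allowed to exceed the no-information rate.
    err_oob = min(err_oob, gamma)
    # Relative overfitting rate, from 0 (none) to 1 (as bad as no information).
    if err_oob > err_train and gamma > err_train:
        r = (err_oob - err_train) / (gamma - err_train)
    else:
        r = 0.0
    # The weight moves from 0.632 (no overfitting) to 1 (severe overfitting).
    w = 0.632 / (1.0 - 0.368 * r)
    return (1.0 - w) * err_train + w * err_oob


# Hypothetical error rates for a classifier that overfits moderately.
print(b632_plus(err_train=0.0, err_oob=0.2, gamma=0.5))
```

With no overfitting the estimate reduces to the plain .632 mix, 0.368 times the training error plus 0.632 times the out-of-bag error.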

7 citations

References
Journal ArticleDOI
01 Mar 2005-RNA
TL;DR: The results show that proximal pairs of miRNAs are generally coexpressed, and that in situ analyses of host gene expression can be used to probe the spatial and temporal localization of intronic miRNAs.
Abstract: MicroRNAs (miRNAs) are short endogenous RNAs known to post-transcriptionally repress gene expression in animals and plants. A microarray profiling survey revealed the expression patterns of 175 human miRNAs across 24 different human organs. Our results show that proximal pairs of miRNAs are generally coexpressed. In addition, an abrupt transition in the correlation between pairs of expressed miRNAs occurs at a distance of 50 kb, implying that miRNAs separated by <50 kb typically derive from a common transcript. Some microRNAs are within the introns of host genes. Intronic miRNAs are usually coordinately expressed with their host gene mRNA, implying that they also generally derive from a common transcript, and that in situ analyses of host gene expression can be used to probe the spatial and temporal localization of intronic miRNAs.
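The abstract refers to correlation between pairs of miRNA expression profiles without naming the measure. Assuming a Pearson correlation coefficient, which is a common choice for expression data but not confirmed by the abstract, a pairwise coexpression score can be sketched as follows. The profile values are made up for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


# Hypothetical expression of two miRNAs across four tissues.
mir_a = [1.0, 2.0, 3.0, 4.0]
mir_b = [2.1, 3.9, 6.2, 7.8]
print(pearson(mir_a, mir_b))
```

A profile that is constant across all tissues has zero variance, in which case the coefficient is undefined and the function raises a division error.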

1,445 citations


"Robust RFCM algorithm for identific..." refers methods in this paper

  • ...Existence of co-expressed miRNAs is also demonstrated using expression profiling analysis in [2]....

    [...]

Journal ArticleDOI
TL;DR: This study raises the proportion of clustered human miRNAs that are <3000 nt apart to 42%.
Abstract: MicroRNAs (miRNAs) are ~22 nt-long non-coding RNA molecules, believed to play important roles in gene regulation. We present a comprehensive analysis of the conservation and clustering patterns of known miRNAs in human. We show that human miRNA gene clustering is significantly higher than expected at random. A total of 37% of the known human miRNA genes analyzed in this study appear in clusters of two or more with pairwise chromosomal distances of at most 3000 nt. Comparison of the miRNA sequences with their homologs in four other organisms reveals a typical conservation pattern, persistent throughout the clusters. Furthermore, we show enrichment in the typical conservation patterns and other miRNA-like properties in the vicinity of known miRNA genes, compared with random genomic regions. This may imply that additional, yet unknown, miRNAs reside in these regions, consistent with the current recognition that there are overlooked miRNAs. Indeed, by comparing our predictions with cloning results and with identified miRNA genes in other mammals, we corroborate the predictions of 18 additional human miRNA genes in the vicinity of the previously known ones. Our study raises the proportion of clustered human miRNAs that are <3000 nt apart to 42%. This suggests that the clustering of miRNA genes is higher than currently acknowledged, alluding to its evolutionary and functional implications.
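The notion of a genomic cluster used above can be sketched in a few lines, under the assumption that "pairwise distance" means the gap between neighbouring genes on the same chromosome. The function names and the coordinates are hypothetical, and the 3000 nt threshold is simply the figure quoted in the abstract.

```python
def cluster_by_distance(positions, max_gap=3000):
    """Group gene start positions on one chromosome into clusters.

    A new cluster starts whenever the gap to the previous gene exceeds
    max_gap (assumed reading of the abstract's distance criterion).
    """
    clusters = []
    for pos in sorted(positions):
        if clusters and pos - clusters[-1][-1] <= max_gap:
            clusters[-1].append(pos)
        else:
            clusters.append([pos])
    return clusters


def clustered_fraction(positions, max_gap=3000):
    """Fraction of genes that fall in a cluster of two or more."""
    clusters = cluster_by_distance(positions, max_gap)
    in_clusters = sum(len(c) for c in clusters if len(c) >= 2)
    return in_clusters / len(positions)


# Hypothetical coordinates of six miRNA genes on one chromosome.
genes = [100, 1500, 4000, 20000, 50000, 51000]
print(cluster_by_distance(genes))
print(clustered_fraction(genes))
```

A real analysis would run this per chromosome and strand; the sketch ignores both for brevity.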

806 citations


"Robust RFCM algorithm for identific..." refers background in this paper

  • ...It has been reported that at a very conservative maximum inter-miRNA distance of 1kb, over 30% of all miRNAs are organized into clusters [1]....

    [...]

Journal ArticleDOI
TL;DR: DIANA-microT 3.0 was found to achieve the highest precision among the most widely used microRNA target prediction programs reaching approximately 66%.
Abstract: MicroRNAs are small endogenously expressed non-coding RNA molecules that regulate target gene expression through translation repression or messenger RNA degradation. MicroRNA regulation is performed through pairing of the microRNA to sites in the messenger RNA of protein coding genes. Since experimental identification of miRNA target genes poses difficulties, computational microRNA target prediction is one of the key means in deciphering the role of microRNAs in development and disease. DIANA-microT 3.0 is an algorithm for microRNA target prediction which is based on several parameters calculated individually for each microRNA and combines conserved and non-conserved microRNA recognition elements into a final prediction score, which correlates with protein production fold change. Specifically, for each predicted interaction the program reports a signal to noise ratio and a precision score which can be used as an indication of the false positive rate of the prediction. Recently, several computational target prediction programs were benchmarked based on a set of microRNA target genes identified by the pSILAC method. In this assessment DIANA-microT 3.0 was found to achieve the highest precision among the most widely used microRNA target prediction programs reaching approximately 66%. The DIANA-microT 3.0 prediction results are available online in a user friendly web server at http://www.microrna.gr/microT

347 citations


"Robust RFCM algorithm for identific..." refers methods in this paper

  • ...0 [6], a miRNA target prediction algorithm is used to predict miRNA target genes for all miRNA clusters generated by different clustering algorithms....

    [...]

Journal ArticleDOI
01 Dec 2007
TL;DR: The RFPCM comprises a judicious integration of the principles of rough and fuzzy sets that incorporates both probabilistic and possibilistic memberships simultaneously to avoid the problems of noise sensitivity of fuzzy C-means and the coincident clusters of PCM.
Abstract: A generalized hybrid unsupervised learning algorithm, which is termed as rough-fuzzy possibilistic C-means (RFPCM), is proposed in this paper. It comprises a judicious integration of the principles of rough and fuzzy sets. While the concept of lower and upper approximations of rough sets deals with uncertainty, vagueness, and incompleteness in class definition, the membership function of fuzzy sets enables efficient handling of overlapping partitions. It incorporates both probabilistic and possibilistic memberships simultaneously to avoid the problems of noise sensitivity of fuzzy C-means and the coincident clusters of PCM. The concept of crisp lower bound and fuzzy boundary of a class, which is introduced in the RFPCM, enables efficient selection of cluster prototypes. The algorithm is generalized in the sense that all existing variants of C-means algorithms can be derived from the proposed algorithm as a special case. Several quantitative indices are introduced based on rough sets for the evaluation of performance of the proposed C-means algorithm. The effectiveness of the algorithm, along with a comparison with other algorithms, has been demonstrated both qualitatively and quantitatively on a set of real-life data sets.
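The two kinds of membership mentioned in this abstract can be illustrated with the standard fuzzy c-means (probabilistic) and possibilistic c-means formulas. The sketch below shows those textbook formulas only, not the combined RFPCM update; the fuzzifier `m`, the scale parameter `eta`, and the function names are assumptions made for the example.

```python
def probabilistic_memberships(distances, m=2.0):
    """FCM-style memberships of one object to every cluster.

    distances: distance from the object to each cluster centroid.
    The memberships sum to 1 across clusters, which is what makes a
    distant noisy point still receive substantial membership.
    """
    # An object lying exactly on a centroid belongs fully to that cluster.
    if any(d == 0 for d in distances):
        hits = [1.0 if d == 0 else 0.0 for d in distances]
        return [h / sum(hits) for h in hits]
    power = 2.0 / (m - 1.0)
    return [
        1.0 / sum((di / dk) ** power for dk in distances)
        for di in distances
    ]


def possibilistic_membership(distance, eta, m=2.0):
    """PCM-style membership to a single cluster.

    It depends only on the distance to that cluster, so memberships need
    not sum to 1 and far-away points get low values everywhere.
    """
    return 1.0 / (1.0 + (distance ** 2 / eta) ** (1.0 / (m - 1.0)))


print(probabilistic_memberships([1.0, 3.0]))
print(possibilistic_membership(1.0, eta=1.0))
```

The contrast between the two functions is the point of the abstract's remark: the sum-to-one constraint causes noise sensitivity in fuzzy C-means, while dropping it altogether can let PCM clusters coincide.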

220 citations


"Robust RFCM algorithm for identific..." refers background or methods in this paper

  • ...In this section, the performance of the rRFCM algorithm is compared with that of hard c-means (HCM), fuzzy c-means (FCM), rough-fuzzy c-means (RFCM) [3], cluster identification via connectivity kernels (CLICK), and self organizing map (SOM) on three miRNA microarray data sets, which are downloaded from Gene Expression Omnibus (www....

    [...]

  • ...Both fuzzy set and rough set provide a mathematical framework to capture uncertainties associated with human cognition process [3]....

    [...]

  • ...The lower bound of the rRFCM algorithm differentiates it from the RFCM [3] algorithm used in [5]....

    [...]

Journal ArticleDOI
TL;DR: An efficient method is proposed to select initial prototypes of different gene clusters, which enables the proposed c-means algorithm to converge to an optimum or near optimum solutions and helps to discover coexpressed gene clusters.
Abstract: Gene expression data clustering is one of the important tasks of functional genomics as it provides a powerful tool for studying functional relationships of genes in a biological process. Identifying coexpressed groups of genes represents the basic challenge in gene clustering problem. In this regard, a gene clustering algorithm, termed as robust rough-fuzzy c-means, is proposed judiciously integrating the merits of rough sets and fuzzy sets. While the concept of lower and upper approximations of rough sets deals with uncertainty, vagueness, and incompleteness in cluster definition, the integration of probabilistic and possibilistic memberships of fuzzy sets enables efficient handling of overlapping partitions in noisy environment. The concept of possibilistic lower bound and probabilistic boundary of a cluster, introduced in robust rough-fuzzy c-means, enables efficient selection of gene clusters. An efficient method is proposed to select initial prototypes of different gene clusters, which enables the proposed c-means algorithm to converge to an optimum or near optimum solutions and helps to discover coexpressed gene clusters. The effectiveness of the algorithm, along with a comparison with other algorithms, is demonstrated both qualitatively and quantitatively on 14 yeast microarray data sets.
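The split of each cluster into a lower approximation and a boundary can be sketched with the thresholding rule commonly described for rough-fuzzy c-means: an object goes to the lower approximation of its best cluster when its highest membership exceeds the second highest by more than a threshold, and otherwise to the boundaries of both clusters. Whether rRFCM uses exactly this rule is not stated in the abstract, so the rule, the threshold name `delta`, and the function name are assumptions.

```python
def rough_assign(memberships, delta):
    """Partition objects into lower approximations and boundaries.

    memberships: one list per object, holding its membership to each of
                 the c clusters (c must be at least 2).
    delta:       assumed threshold on the gap between the two highest
                 memberships.
    Returns (lower, boundary), each a list of c sets of object indices.
    """
    c = len(memberships[0])
    lower = [set() for _ in range(c)]
    boundary = [set() for _ in range(c)]
    for j, mu in enumerate(memberships):
        # Cluster indices ordered from highest to lowest membership.
        ranked = sorted(range(c), key=lambda i: mu[i], reverse=True)
        best, second = ranked[0], ranked[1]
        if mu[best] - mu[second] > delta:
            lower[best].add(j)      # clearly belongs to one cluster
        else:
            boundary[best].add(j)   # ambiguous between two clusters
            boundary[second].add(j)
    return lower, boundary


# Hypothetical memberships of three genes to two clusters.
mu = [[0.9, 0.1], [0.55, 0.45], [0.2, 0.8]]
print(rough_assign(mu, delta=0.3))
```

In the rough-fuzzy family of algorithms, objects in the lower approximation typically influence the centroid more strongly than boundary objects, which is how ambiguous profiles are kept from distorting cluster prototypes.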

95 citations


"Robust RFCM algorithm for identific..." refers methods in this paper

  • ...To generate initial cluster prototypes of c-means algorithms, the initialization method reported in [4] is used, where a quantitative measure, called degree of similarity, is used to evaluate the similarity between two objects....

    [...]

  • ...In this paper, the application of a newly developed hybrid algorithm, called robust rough-fuzzy c-means (rRFCM) [4], is presented for clustering miRNA expression data....

    [...]

  • ...In effect, it has a direct influence on the performance of the initialization method, used in [4]....

    [...]

  • ...The rRFCM has been used to group functionally similar genes from gene microarray data [4]....

    [...]

  • ...An efficient method developed in [4] is used to select initial prototypes of different miRNA clusters; thereby circumventing the initialization and local minima problems of c-means algorithm....

    [...]