City Block Distance for Identification of Co-expressed MicroRNAs
TL;DR: The proposed method integrates the robust rough-fuzzy c-means algorithm with the normalized range-normalized city block distance to discover co-expressed miRNA clusters; the distance helps to handle minute differences between two miRNA expression profiles.
Abstract: MicroRNAs (miRNAs) are short, endogenous RNAs with the ability to regulate gene expression at the post-transcriptional level. Various studies have revealed that a large proportion of miRNAs are co-expressed. Expression profiling of miRNAs generates a huge volume of data, and complicated networks of miRNA-mRNA interaction make the resulting mass of data harder to comprehend and interpret. In this regard, this paper presents the application of city block distance to extract meaningful information from miRNA expression data. The proposed method integrates the merits of the robust rough-fuzzy c-means algorithm and the normalized range-normalized city block distance to discover co-expressed miRNA clusters. The city block distance is used to calculate the membership functions of fuzzy sets, and thereby helps to handle minute differences between two miRNA expression profiles. The effectiveness of the proposed approach, along with a comparison with other related methods, is demonstrated on several miRNA expression data sets using different cluster validity indices and gene ontology.
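The abstract states that the city block distance feeds the fuzzy membership computation but gives no formula. The sketch below shows one way this could look. It rests on two assumptions that are not taken from the paper: the range-normalized distance is assumed to be the mean over features of the absolute difference divided by that feature's range, and the membership is assumed to follow the standard fuzzy c-means rule rather than the full robust rough-fuzzy c-means update. All function names are hypothetical.

```python
from typing import List, Sequence


def feature_ranges(data: Sequence[Sequence[float]]) -> List[float]:
    """Per-feature range (max - min) over all profiles; 1.0 for constant features."""
    ranges = []
    for column in zip(*data):
        span = max(column) - min(column)
        ranges.append(span if span > 0 else 1.0)
    return ranges


def nrn_city_block(x: Sequence[float], y: Sequence[float],
                   ranges: Sequence[float]) -> float:
    """ASSUMED form: mean over features of |x_k - y_k| / range_k."""
    return sum(abs(a - b) / r for a, b, r in zip(x, y, ranges)) / len(ranges)


def fuzzy_memberships(x: Sequence[float],
                      centroids: Sequence[Sequence[float]],
                      ranges: Sequence[float],
                      fuzzifier: float = 2.0) -> List[float]:
    """Standard fuzzy c-means membership of x in each cluster (an assumption,
    not the rRFCM update), computed from the distance above."""
    dists = [nrn_city_block(x, c, ranges) for c in centroids]
    # If x coincides with one or more centroids, share full membership among them.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    exponent = 2.0 / (fuzzifier - 1.0)
    return [1.0 / sum((d_i / d_k) ** exponent for d_k in dists) for d_i in dists]


# Toy expression profiles (made-up numbers, two features each).
profiles = [[0.0, 10.0], [1.0, 20.0], [4.0, 30.0]]
ranges = feature_ranges(profiles)
memberships = fuzzy_memberships(profiles[1], [profiles[0], profiles[2]], ranges)
```

Dividing each feature by its range keeps a feature with a wide numeric scale from dominating the distance, so small differences on narrow-range features still influence the memberships.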
Citations
31 citations
Cites methods from "City Block Distance for Identificat..."
...In order to include fuzziness, the normalized city-block distance was employed, adopting the probability function Px(U) as a dimension to compare dissimilarities between the two sample sets (M1, M2) (Webb and Copsey, 2003; Paul and Maji, 2014): $d_{NCB} = \frac{1}{N} \times \sum_{x=1}^{N}$...
[...]
18 citations
Cites methods from "City Block Distance for Identificat..."
...For example, the microRNA datasets used in [10] and the real-life gene expression datasets used in [11] are unlabeled datasets....
[...]
14 citations
10 citations
Additional excerpts
...But most of them utilize either supervised or semi-supervised classification (An & Doerge, 2012; Saha et al., 2016; Wang & Pan, 2014) techniques. In cancer diagnosis, these classification methodologies help in classifying tumor samples as benign or malignant or any other subtypes (Alizadeh et al., 2000; de Souto et al., 2008; Yeung & Bumgarner, 2003). But in many cases, it may not be possible to have labeled tissue samples. For example, the microRNA datasets used in (Paul & Maji, 2013) and the real-life gene expression datasets used in (Saha, Ekbal, Gupta, & Bandyopadhyay, 2013) are unlabeled datasets. Because of the unavailability of labeled data, it is difficult to apply any supervised classification technique to solve this problem. Thus unsupervised classification techniques have become popular in solving this problem. In recent years the use of multi-objective optimization (MOO) (Saha et al., 2016) has become popular in solving the cancer tissue sample classification problem. Several objective functions related to partitioning the cancer tissues are simultaneously optimized using MOO-based techniques. In Horng et al. (2009), the authors developed a supervised system that selects a small group of gene markers for classification by using all the necessary information on well-defined pathways available from KEGG. They used a C4.5 decision tree for generating the classification model. In Alonso-González, Moro-Sancho, Simon-Hurtado, and Varela-Arrabal (2012), the authors propose relaxing the maximum accuracy criterion to select the combination of attribute selection and classification algorithm that, using fewer attributes, has an accuracy not statistically significantly worse than the best....
[...]
4 citations
Cites background or methods or result from "City Block Distance for Identificat..."
...In this section, we report the results of [16] for the six above-mentioned algorithms with their best DB index values (for either Euclidean distance or NRNCBD) on two miRNA microarray data sets, GSE16473 and GSE29495, and compare them with the outcome of our proposed clustering algorithm....
[...]
...In [16], the authors incorporated the normalized range-normalized city block distance (NRNCBD) instead of Euclidean distance into the robust rough-fuzzy c-means (rRFCM) [30] clustering algorithm....
[...]
...In [16], the authors showed the superiority of NRNCBD over the Euclidean and Pearson distance versions of different clustering algorithms such as fuzzy c-means (FCM) [26], hard c-means (HCM) [7], rough-fuzzy c-means (RFCM) [27], and robust rough-fuzzy c-means (rRFCM) [17]....
[...]
...Recently, in [16], a clustering algorithm combining the concepts of the robust rough-fuzzy c-means algorithm [17] and the normalized range-normalized city block distance (NRNCBD) was proposed to discover co-regulated miRNAs from miRNA expression data sets....
[...]
...As the table shows, the best performance of each clustering algorithm with respect to the DB index, as reported in [16], is listed....
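The excerpts above compare algorithms by their DB (Davies-Bouldin) index, where lower values indicate more compact and better separated clusters. A minimal sketch of that index for a hard partition follows. Using the plain city block distance for both the scatter and the centroid separation is an assumption made for illustration, as are the function names and the toy data; the cited papers may compute the index with a different distance.

```python
from typing import Callable, List, Sequence

Point = Sequence[float]


def city_block(x: Point, y: Point) -> float:
    """Plain city block (Manhattan) distance."""
    return sum(abs(a - b) for a, b in zip(x, y))


def centroid(cluster: Sequence[Point]) -> List[float]:
    """Feature-wise mean of the points in a cluster."""
    return [sum(col) / len(cluster) for col in zip(*cluster)]


def davies_bouldin(clusters: Sequence[Sequence[Point]],
                   dist: Callable[[Point, Point], float] = city_block) -> float:
    """Davies-Bouldin index: the mean, over clusters, of the worst-case ratio
    (scatter_i + scatter_j) / dist(centroid_i, centroid_j)."""
    centers = [centroid(c) for c in clusters]
    # Scatter: average distance of a cluster's members to its centroid.
    scatter = [sum(dist(p, v) for p in c) / len(c)
               for c, v in zip(clusters, centers)]
    total = 0.0
    for i in range(len(clusters)):
        total += max((scatter[i] + scatter[j]) / dist(centers[i], centers[j])
                     for j in range(len(clusters)) if j != i)
    return total / len(clusters)


# Toy partitions (made-up points): compact, distant clusters versus loose, close ones.
tight = [[[0.0, 0.0], [0.0, 1.0]], [[10.0, 0.0], [10.0, 1.0]]]
loose = [[[0.0, 0.0], [0.0, 4.0]], [[5.0, 0.0], [5.0, 4.0]]]
db_tight = davies_bouldin(tight)
db_loose = davies_bouldin(loose)
```

Here `db_tight` comes out smaller than `db_loose`, matching the convention that the best-performing algorithm is the one with the lowest DB index.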
[...]