Book Chapter

City Block Distance for Identification of Co-expressed MicroRNAs

19 Dec 2013, pp. 387-396
TL;DR: The proposed method judiciously integrates the merits of robust rough-fuzzy c-means algorithm and normalized range-normalized city block distance to discover co-expressed miRNA clusters and helps to handle minute differences between two miRNA expression profiles.
Abstract: MicroRNAs (miRNAs) are short, endogenous RNAs that can regulate gene expression at the post-transcriptional level. Various studies have revealed that a large proportion of miRNAs are co-expressed. Expression profiling of miRNAs generates a huge volume of data, and the complicated networks of miRNA-mRNA interaction make the resulting mass of data harder to comprehend and interpret. In this regard, this paper presents the application of city block distance to extract meaningful information from miRNA expression data. The proposed method judiciously integrates the merits of the robust rough-fuzzy c-means algorithm and the normalized range-normalized city block distance to discover co-expressed miRNA clusters. The city block distance is used to calculate the membership functions of fuzzy sets, and thereby helps to handle minute differences between two miRNA expression profiles. The effectiveness of the proposed approach, along with a comparison with other related methods, is demonstrated on several miRNA expression data sets using different cluster validity indices and gene ontology.
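The abstract does not spell out the distance or the membership formula, so the following is only a minimal sketch under stated assumptions: each feature difference is divided by that feature's range and the sum is averaged over features, and memberships follow the standard fuzzy c-means form rather than the paper's robust rough-fuzzy variant. The names `city_block_distance` and `fcm_memberships` are hypothetical.

```python
from typing import List, Sequence


def city_block_distance(x: Sequence[float], y: Sequence[float],
                        ranges: Sequence[float]) -> float:
    """Range-normalized city block (L1) distance, averaged over features.

    Assumption: ranges[k] is max - min of feature k over the data set.
    """
    if not (len(x) == len(y) == len(ranges)) or not x:
        raise ValueError("x, y and ranges must be non-empty and equally long")
    total = 0.0
    for xi, yi, r in zip(x, y, ranges):
        # A constant feature (range 0) cannot separate profiles; skip it.
        if r > 0:
            total += abs(xi - yi) / r
    return total / len(x)


def fcm_memberships(profile: Sequence[float],
                    centroids: Sequence[Sequence[float]],
                    ranges: Sequence[float], m: float = 2.0) -> List[float]:
    """Standard fuzzy c-means memberships of one profile to each centroid."""
    dists = [city_block_distance(profile, c, ranges) for c in centroids]
    zeros = [i for i, d in enumerate(dists) if d == 0.0]
    if zeros:
        # The profile coincides with one or more centroids.
        return [1.0 / len(zeros) if i in zeros else 0.0
                for i in range(len(dists))]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((di / dk) ** exponent for dk in dists) for di in dists]


if __name__ == "__main__":
    # Toy expression profiles over two samples (made-up numbers).
    centroids = [[1.0, 2.0], [3.0, 6.0]]
    print(fcm_memberships([1.5, 3.0], centroids, ranges=[4.0, 8.0]))
```

Because memberships depend on ratios of distances, two profiles that differ only slightly still receive different graded memberships instead of being forced into the same hard cluster.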
Citations
Journal Article
TL;DR: The study shows that different types of geological data have disparate effects on model uncertainty and model geometry, and the presented approach using both information entropy and distance measures can be a major help in the optimization of 3-D geological models.
Abstract: The quality of a 3-D geological model strongly depends on the type of integrated geological data, their interpretation and associated uncertainties. In order to improve an existing geological model and effectively plan further site investigation, it is of paramount importance to identify existing uncertainties within the model space. Information entropy, a voxel-based measure, provides a method for assessing structural uncertainties, comparing multiple model interpretations and tracking changes across consecutively built models. The aim of this study is to evaluate the effect of data integration (i.e., update of an existing model through successive addition of different types of geological data) on model uncertainty, model geometry and overall structural understanding. Several geological 3-D models of increasing complexity, incorporating different input data categories, were built for the study site Staufen (Germany). We applied the concept of information entropy in order to visualize and quantify changes in uncertainty between these models. Furthermore, we propose two measures, the Jaccard and the city-block distance, to directly compare dissimilarities between the models. The study shows that different types of geological data have disparate effects on model uncertainty and model geometry. The presented approach using both information entropy and distance measures can be a major help in the optimization of 3-D geological models.
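The three measures named in this abstract can be illustrated with a short sketch. The definitions below are the common textbook forms (Shannon entropy in bits, Jaccard distance on sets, mean absolute difference of probabilities) and are assumptions here; the article's exact formulations may differ. All function names are hypothetical.

```python
import math
from typing import Hashable, Sequence, Set


def voxel_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (bits) of the unit probabilities at one voxel."""
    return -sum(p * math.log2(p) for p in probs if p > 0.0)


def jaccard_distance(a: Set[Hashable], b: Set[Hashable]) -> float:
    """1 - |A and B| / |A or B|, e.g. for voxel ids assigned to one unit."""
    union = a | b
    if not union:
        return 0.0  # two empty sets are treated as identical
    return 1.0 - len(a & b) / len(union)


def mean_city_block(p1: Sequence[float], p2: Sequence[float]) -> float:
    """Mean absolute difference between two per-voxel probability fields."""
    if len(p1) != len(p2) or not p1:
        raise ValueError("fields must be non-empty and equally long")
    return sum(abs(u - v) for u, v in zip(p1, p2)) / len(p1)


if __name__ == "__main__":
    print(voxel_entropy([0.5, 0.5]))               # maximal uncertainty, 2 units
    print(jaccard_distance({1, 2, 3}, {2, 3, 4}))  # half of the voxels differ
    print(mean_city_block([0.2, 0.8], [0.6, 0.8]))
```

Entropy scores uncertainty within one set of models, while the two distances compare one model against another, which is why the abstract uses them side by side.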

39 citations


Cites methods from "City Block Distance for Identificat..."

  • ...In order to include fuzziness, the normalized city-block distance was employed, adopting the probability function $P_x(U)$ as a dimension to compare dissimilarities between the two sample sets $(M_1, M_2)$ (Webb and Copsey, 2003; Paul and Maji, 2014): $d_{NCB} = \frac{1}{N} \times \sum_{x=1}^{N}$ ...

Journal Article
TL;DR: A MOO-based clustering technique utilizing archived multiobjective simulated annealing (AMOSA) as the underlying optimization strategy for classification of tissue samples from cancer datasets and significant gene markers have been identified and demonstrated visually from the clustering solutions obtained.
Abstract: In the field of pattern recognition, the study of the gene expression profiles of different tissue samples over different experimental conditions has become feasible with the arrival of microarray-based technology. In cancer research, classification of tissue samples is necessary for cancer diagnosis, which can be done with the help of microarray technology. In this paper, we have presented a multiobjective optimization (MOO)-based clustering technique utilizing archived multiobjective simulated annealing (AMOSA) as the underlying optimization strategy for classification of tissue samples from cancer datasets. The presented clustering technique is evaluated for three open source benchmark cancer datasets [Brain tumor dataset, Adult Malignancy, and Small Round Blue Cell Tumors (SRBCT)]. In order to evaluate the quality or goodness of produced clusters, two cluster quality measures, viz., adjusted Rand index and classification accuracy (%CoA), are calculated. Comparative results of the presented clustering algorithm with ten state-of-the-art existing clustering techniques are shown for three benchmark datasets. Also, we have conducted a statistical significance test called the t-test to prove the superiority of our presented MOO-based clustering technique over other clustering techniques. Moreover, significant gene markers have been identified and demonstrated visually from the clustering solutions obtained. In the field of cancer subtype prediction, this study can have an important impact.
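One of the two quality measures named in this abstract, the adjusted Rand index, can be sketched as follows. This is the usual pair-counting formula written from the standard definition, not code from the article; the function name and the toy labels are hypothetical.

```python
from collections import Counter
from math import comb
from typing import Hashable, Sequence


def adjusted_rand_index(labels_true: Sequence[Hashable],
                        labels_pred: Sequence[Hashable]) -> float:
    """Chance-corrected agreement between two partitions of the same items."""
    n = len(labels_true)
    if n != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    cells = Counter(zip(labels_true, labels_pred))  # contingency table
    rows = Counter(labels_true)
    cols = Counter(labels_pred)
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: partitions trivially agree
    expected = sum_rows * sum_cols / total_pairs
    max_index = 0.5 * (sum_rows + sum_cols)
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both partitions are one cluster
    return (sum_cells - expected) / (max_index - expected)


if __name__ == "__main__":
    truth = ["tumorA", "tumorA", "tumorB", "tumorB"]
    print(adjusted_rand_index(truth, [1, 1, 0, 0]))  # same grouping, new names
    print(adjusted_rand_index(truth, [0, 1, 0, 1]))  # grouping cuts across truth
```

The index ignores how clusters are named, which suits unsupervised results whose cluster ids carry no meaning of their own.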

23 citations


Cites methods from "City Block Distance for Identificat..."

  • ...For example microRNA datasets used in [10] or real life gene expression datasets used in [11] are some unlabeled datasets....


Journal Article
23 May 2019, PLOS ONE
TL;DR: A late integration based multiobjective multi-view clustering algorithm which uses a special perturbation operator to generate a single set of non-dominated solutions for patient sub-classification of multi-omics datasets.
Abstract: Recent high-throughput omics technology has been used to assemble large biomedical omics datasets. Clustering of single omics data has proven invaluable in biomedical research. For the task of patient sub-classification, all the available omics data should be utilized together rather than treated individually. Clustering of multi-omics datasets has the potential to reveal deep insights. Here, we propose a late integration based multiobjective multi-view clustering algorithm which uses a special perturbation operator. Initially, a large number of diverse clustering solutions (called base partitionings) are generated for each omic dataset using four clustering algorithms, viz., k-means, complete linkage, spectral and fast search clustering. These base partitionings of multi-omic datasets are suitably combined using a special perturbation operator. The perturbation operator uses an ensemble technique to generate new solutions from the base partitionings. The optimal combination of multiple partitioning solutions across different views is determined after optimizing the objective functions, namely conn-XB, for checking the quality of partitionings for different views, and agreement index, for checking agreement between the views. The search capability of a multiobjective simulated annealing approach, namely AMOSA, is used for this purpose. Lastly, the non-dominated solutions of the different views are combined based on similarity to generate a single set of non-dominated solutions. The proposed algorithm is evaluated on 13 multi-view cancer datasets. An elaborate comparative study with several baseline methods and five state-of-the-art models is performed to show the effectiveness of the algorithm.

15 citations

Journal Article
TL;DR: Results of the newly developed stability based clustering namely Stab-clustering with respect to existing approaches are shown for twelve microarray cancer datasets in terms of different cluster quality measures, confirming the robustness of the proposed technique over state-of-the-art.
Abstract: The concept of stability is one of the commonly used physical phenomena. The current paper builds on the hypothesis that the optimal number of clusters present in the dataset corresponds to the partitioning that is most stable under small changes in the dataset. In order to quantify the degree of stability, a new measure is also proposed in the paper. Thereafter, an expert clustering approach is developed which utilizes the properties of stability for automatically detecting the number of clusters from a given dataset. Initially, several different variants of the dataset are generated by introducing small perturbations. A multi-objective based expert clustering framework is developed to automatically partition different variants of the data. A new objective function capturing the stability property of a clustering solution, namely ‘Agreement-index’, along with two well-known objective functions, are optimized simultaneously using a multi-objective simulated annealing based process, namely AMOSA, for the purpose of clustering. Finally, the problem of cancer classification is addressed as the application domain of the proposed expert framework. Results of our newly developed stability based clustering, namely Stab-clustering, with respect to existing approaches are shown for twelve microarray cancer datasets in terms of different cluster quality measures. The obtained results confirm the robustness of our proposed technique over the state-of-the-art. Thorough biological and statistical significance tests are also conducted to prove the effectiveness of the proposed approach.

10 citations


Additional excerpts

  • ...But most of them utilize either supervised or semi-supervised classification (An & Doerge, 2012; Saha et al., 2016; Wang & Pan, 2014) techniques. In cancer diagnosis, these classification methodologies help in classifying tumor samples as benign or malignant or any other subtypes (Alizadeh et al., 2000; de Souto et al., 2008; Yeung & Bumgarner, 2003). But in many cases, it may not be possible to have labeled tissue samples. For example, microRNA datasets used in (Paul & Maji, 2013) or real-life gene expression datasets used in (Saha, Ekbal, Gupta, & Bandyopadhyay, 2013) are some unlabeled datasets. Because of the unavailability of labeled data, it is difficult to apply any supervised classification technique to solve this problem. Thus unsupervised classification techniques become popular in solving this problem. In recent years the use of multi-objective optimization (MOO) (Saha et al., 2016) becomes popular in solving the cancer tissue sample classification problem. Several objective functions related to partitioning the cancer tissues are simultaneously optimized using some MOO-based techniques. In Horng et al. (2009), the authors have developed a supervised system that selects a small group of gene markers for classification by using all the necessary information on well-defined pathways available from KEGG. They have used the C4.5 decision tree for generating the classification model. In Alonso-González, Moro-Sancho, Simon-Hurtado, and Varelarrabal (2012), the authors propose, relaxing the maximum accuracy criterion, to select the combination of attribute selection and classification algorithm that, using fewer attributes, has an accuracy not statistically significantly worse than the best....


Posted Content
TL;DR: In this article, the effect of data assimilation on model uncertainty, model geometry and overall structural understanding was evaluated using the concept of information entropy in order to visualize and quantify changes in uncertainty between these models.
Abstract: The quality of a 3D geological model strongly depends on the type of integrated geological data, their interpretation and associated uncertainties. In order to improve an existing geological model and effectively plan further site investigation, it is of paramount importance to identify existing uncertainties within the model space. Information entropy, a voxel based measure, provides a method for assessing structural uncertainties, comparing multiple model interpretations and tracking changes across consecutively built models. The aim of this study is to evaluate the effect of data assimilation on model uncertainty, model geometry and overall structural understanding. Several geological 3D models of increasing complexity, incorporating different input data categories, were built for the study site Staufen (Germany). We applied the concept of information entropy in order to visualize and quantify changes in uncertainty between these models. Furthermore, we propose two measures, the Jaccard and the City-Block distance, to directly compare dissimilarities between the models. The study shows that different types of geological data have disparate effects on model uncertainty and model geometry. The presented approach using both information entropy and distance measures can be a major help in the optimization of 3D geological models.

4 citations

References
Book
01 Aug 1996
TL;DR: A separation theorem for convex fuzzy sets is proved without requiring that the fuzzy sets be disjoint.
Abstract: A fuzzy set is a class of objects with a continuum of grades of membership. Such a set is characterized by a membership (characteristic) function which assigns to each object a grade of membership ranging between zero and one. The notions of inclusion, union, intersection, complement, relation, convexity, etc., are extended to such sets, and various properties of these notions in the context of fuzzy sets are established. In particular, a separation theorem for convex fuzzy sets is proved without requiring that the fuzzy sets be disjoint.
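The notions listed in this abstract have simple pointwise forms in the classical formulation: union as the maximum of the membership grades, intersection as the minimum, complement as one minus the grade, and inclusion as a pointwise inequality. A minimal sketch over a finite universe follows; representing a fuzzy set as a dictionary from object to grade is an assumption made for illustration, and the function names are hypothetical.

```python
from typing import Dict, Hashable

FuzzySet = Dict[Hashable, float]  # object -> membership grade in [0, 1]


def grade(s: FuzzySet, x: Hashable) -> float:
    """Membership grade of x; objects not listed have grade 0."""
    return s.get(x, 0.0)


def fuzzy_union(a: FuzzySet, b: FuzzySet) -> FuzzySet:
    return {x: max(grade(a, x), grade(b, x)) for x in set(a) | set(b)}


def fuzzy_intersection(a: FuzzySet, b: FuzzySet) -> FuzzySet:
    return {x: min(grade(a, x), grade(b, x)) for x in set(a) | set(b)}


def fuzzy_complement(a: FuzzySet) -> FuzzySet:
    # Only defined here over the objects listed in `a`.
    return {x: 1.0 - mu for x, mu in a.items()}


def is_subset(a: FuzzySet, b: FuzzySet) -> bool:
    """A is contained in B when every grade in A is at most that in B."""
    return all(grade(a, x) <= grade(b, x) for x in set(a) | set(b))


if __name__ == "__main__":
    tall = {"ann": 0.2, "bob": 0.7}
    heavy = {"ann": 0.5, "bob": 0.4}
    print(fuzzy_union(tall, heavy))
    print(fuzzy_intersection(tall, heavy))
    print(is_subset(fuzzy_intersection(tall, heavy), tall))
```

With grades restricted to 0 and 1, these operations reduce to ordinary set union, intersection, complement and inclusion.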

52,705 citations

Book
31 Jul 1981
Abstract: A book on pattern recognition with fuzzy objective function algorithms.

15,662 citations

Journal Article
TL;DR: A new graphical display is proposed for partitioning techniques, where each cluster is represented by a so-called silhouette, which is based on the comparison of its tightness and separation, and provides an evaluation of clustering validity.
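The silhouette compares tightness and separation per object: with a(i) the mean distance from object i to the rest of its own cluster and b(i) the smallest mean distance to any other cluster, the usual definition is s(i) = (b(i) - a(i)) / max(a(i), b(i)). The sketch below follows that definition; using city block distance and assigning 0 to singleton clusters are choices made here for illustration, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Hashable, List, Sequence


def manhattan(x: Sequence[float], y: Sequence[float]) -> float:
    """City block (L1) distance between two points."""
    return sum(abs(a - b) for a, b in zip(x, y))


def silhouette_values(points: Sequence[Sequence[float]],
                      labels: Sequence[Hashable]) -> List[float]:
    """Silhouette value s(i) for every point under a hard clustering."""
    clusters = defaultdict(list)
    for idx, lab in enumerate(labels):
        clusters[lab].append(idx)
    values = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own or len(clusters) < 2:
            values.append(0.0)  # singleton cluster or only one cluster
            continue
        # a: tightness, mean distance to the other members of own cluster
        a = sum(manhattan(points[i], points[j]) for j in own) / len(own)
        # b: separation, mean distance to the nearest other cluster
        b = min(
            sum(manhattan(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items() if other != lab
        )
        denom = max(a, b)
        values.append((b - a) / denom if denom > 0 else 0.0)
    return values


if __name__ == "__main__":
    pts = [[0.0], [1.0], [10.0], [11.0]]
    print(silhouette_values(pts, [0, 0, 1, 1]))  # values close to 1
    print(silhouette_values(pts, [0, 1, 0, 1]))  # negative values
```

Values near 1 indicate an object sits well inside its cluster, values near 0 indicate it lies between clusters, and negative values suggest it may be misassigned.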

14,144 citations

Journal Article
09 Jun 2005, Nature
TL;DR: A new, bead-based flow cytometric miRNA expression profiling method is used to present a systematic expression analysis of 217 mammalian miRNAs from 334 samples, including multiple human cancers, and finds the miRNA profiles are surprisingly informative, reflecting the developmental lineage and differentiation state of the tumours.
Abstract: Recent work has revealed the existence of a class of small non-coding RNA species, known as microRNAs (miRNAs), which have critical functions across various biological processes. Here we use a new, bead-based flow cytometric miRNA expression profiling method to present a systematic expression analysis of 217 mammalian miRNAs from 334 samples, including multiple human cancers. The miRNA profiles are surprisingly informative, reflecting the developmental lineage and differentiation state of the tumours. We observe a general downregulation of miRNAs in tumours compared with normal tissues. Furthermore, we were able to successfully classify poorly differentiated tumours using miRNA expression profiles, whereas messenger RNA profiles were highly inaccurate when applied to the same samples. These findings highlight the potential of miRNA profiling in cancer diagnosis.

9,470 citations

Book
31 Oct 1991
TL;DR: Theoretical Foundations.
Abstract: I. Theoretical Foundations.- 1. Knowledge.- 1.1. Introduction.- 1.2. Knowledge and Classification.- 1.3. Knowledge Base.- 1.4. Equivalence, Generalization and Specialization of Knowledge.- Summary.- Exercises.- References.- 2. Imprecise Categories, Approximations and Rough Sets.- 2.1. Introduction.- 2.2. Rough Sets.- 2.3. Approximations of Set.- 2.4. Properties of Approximations.- 2.5. Approximations and Membership Relation.- 2.6. Numerical Characterization of Imprecision.- 2.7. Topological Characterization of Imprecision.- 2.8. Approximation of Classifications.- 2.9. Rough Equality of Sets.- 2.10. Rough Inclusion of Sets.- Summary.- Exercises.- References.- 3. Reduction of Knowledge.- 3.1. Introduction.- 3.2. Reduct and Core of Knowledge.- 3.3. Relative Reduct and Relative Core of Knowledge.- 3.4. Reduction of Categories.- 3.5. Relative Reduct and Core of Categories.- Summary.- Exercises.- References.- 4. Dependencies in Knowledge Base.- 4.1. Introduction.- 4.2. Dependency of Knowledge.- 4.3. Partial Dependency of Knowledge.- Summary.- Exercises.- References.- 5. Knowledge Representation.- 5.1. Introduction.- 5.2. Examples.- 5.3. Formal Definition.- 5.4. Significance of Attributes.- 5.5. Discernibility Matrix.- Summary.- Exercises.- References.- 6. Decision Tables.- 6.1. Introduction.- 6.2. Formal Definition and Some Properties.- 6.3. Simplification of Decision Tables.- Summary.- Exercises.- References.- 7. Reasoning about Knowledge.- 7.1. Introduction.- 7.2. Language of Decision Logic.- 7.3. Semantics of Decision Logic Language.- 7.4. Deduction in Decision Logic.- 7.5. Normal Forms.- 7.6. Decision Rules and Decision Algorithms.- 7.7. Truth and Indiscernibility.- 7.8. Dependency of Attributes.- 7.9. Reduction of Consistent Algorithms.- 7.10. Reduction of Inconsistent Algorithms.- 7.11. Reduction of Decision Rules.- 7.12. Minimization of Decision Algorithms.- Summary.- Exercises.- References.- II. Applications.- 8. Decision Making.- 8.1. Introduction.- 8.2. Optician's Decisions Table.- 8.3. Simplification of Decision Table.- 8.4. Decision Algorithm.- 8.5. The Case of Incomplete Information.- Summary.- Exercises.- References.- 9. Data Analysis.- 9.1. Introduction.- 9.2. Decision Table as Protocol of Observations.- 9.3. Derivation of Control Algorithms from Observation.- 9.4. Another Approach.- 9.5. The Case of Inconsistent Data.- Summary.- Exercises.- References.- 10. Dissimilarity Analysis.- 10.1. Introduction.- 10.2. The Middle East Situation.- 10.3. Beauty Contest.- 10.4. Pattern Recognition.- 10.5. Buying a Car.- Summary.- Exercises.- References.- 11. Switching Circuits.- 11.1. Introduction.- 11.2. Minimization of Partially Defined Switching Functions.- 11.3. Multiple-Output Switching Functions.- Summary.- Exercises.- References.- 12. Machine Learning.- 12.1. Introduction.- 12.2. Learning From Examples.- 12.3. The Case of an Imperfect Teacher.- 12.4. Inductive Learning.- Summary.- Exercises.- References.

7,826 citations