Author

Jasmine Irani

Bio: Jasmine Irani is an academic researcher. The author has contributed to research in the topics Consensus clustering and Cluster analysis. The author has an h-index of 1 and has co-authored 1 publication receiving 70 citations.

Papers
Journal Article
TL;DR: This paper surveys various clustering techniques and the current similarity measures used in distance-based clustering, explains the limitations of existing clustering techniques, and proposes that combining the advantages of the existing systems can help overcome their limitations.
Abstract: Clustering is an unsupervised learning technique that aims to group a set of objects into clusters so that objects in the same cluster are as similar as possible, whereas objects in one cluster are as dissimilar as possible from objects in other clusters. Cluster analysis aims to group a collection of patterns into clusters based on similarity. A typical clustering technique uses a similarity function for comparing various data items. This paper surveys various clustering techniques and the current similarity measures used in distance-based clustering, explains the limitations associated with existing clustering techniques, and proposes that combining the advantages of the existing systems can help overcome their limitations. General Terms: Data Mining, Machine Learning, Clustering, Pattern based Similarity, Negative Data, et al.

92 citations
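
The abstract above notes that a typical clustering technique compares data items with a similarity function. As a minimal sketch of that idea (not the method proposed in the paper), the Python snippet below uses Euclidean distance as the similarity function and assigns each point to its nearest centroid. The function names, the toy points, and the fixed centroids are all hypothetical and chosen only for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign_to_nearest(points, centroids):
    """Return, for each point, the index of its closest centroid."""
    return [
        min(range(len(centroids)), key=lambda k: euclidean(p, centroids[k]))
        for p in points
    ]


# Hypothetical toy data: two well-separated groups in 2-D.
points = [(1.0, 1.0), (1.2, 0.8), (8.0, 8.0), (7.9, 8.3)]
# Assumed fixed centroids; a full algorithm such as k-means would update them.
centroids = [(1.0, 1.0), (8.0, 8.0)]

labels = assign_to_nearest(points, centroids)
print(labels)
```

Swapping `euclidean` for another function is the only change needed to try a different similarity measure in this sketch, which is why the choice of measure matters so much in distance-based clustering.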


Cited by
Journal Article
TL;DR: Modifying a state-of-the-art kernel-based clustering algorithm (SIMLR) to use Pearson's correlation as its similarity measure yielded a significant performance improvement over Euclidean distance on scRNA-seq data clustering.
Abstract: Advances in high-throughput sequencing on single-cell gene expressions [single-cell RNA sequencing (scRNA-seq)] have enabled transcriptome profiling on individual cells from complex samples. A common goal in scRNA-seq data analysis is to discover and characterise cell types, typically through clustering methods. The quality of the clustering therefore plays a critical role in biological discovery. While numerous clustering algorithms have been proposed for scRNA-seq data, fundamentally they all rely on a similarity metric for categorising individual cells. Although several studies have compared the performance of various clustering algorithms for scRNA-seq data, currently there is no benchmark of different similarity metrics and their influence on scRNA-seq data clustering. Here, we compared a panel of similarity metrics on clustering a collection of annotated scRNA-seq datasets. Within each dataset, a stratified subsampling procedure was applied and an array of evaluation measures was employed to assess the similarity metrics. This produced a highly reliable and reproducible consensus on their performance assessment. Overall, we found that correlation-based metrics (e.g. Pearson's correlation) outperformed distance-based metrics (e.g. Euclidean distance). To test if the use of correlation-based metrics can benefit the recently published clustering techniques for scRNA-seq data, we modified a state-of-the-art kernel-based clustering algorithm (SIMLR) using Pearson's correlation as a similarity measure and found significant performance improvement over Euclidean distance on scRNA-seq data clustering. These findings demonstrate the importance of similarity metrics in clustering scRNA-seq data and highlight Pearson's correlation as a favourable choice. Further comparison on different scRNA-seq library preparation protocols suggests that they may also affect clustering performance. 
Finally, the benchmarking framework is available at http://www.maths.usyd.edu.au/u/SMS/bioinformatics/software.html.

101 citations
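
The abstract above contrasts correlation-based metrics with distance-based ones. The following Python sketch, which is neither the benchmarking framework nor SIMLR, illustrates why the two can disagree: Pearson correlation ignores differences in scale, while Euclidean distance does not. The three "cell" profiles are hypothetical values invented for the example, and `pearson_distance` is an assumed helper defined as one minus the Pearson correlation.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pearson_distance(a, b):
    """1 - Pearson correlation: 0 for perfectly correlated profiles, 2 for anti-correlated."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return 1.0 - cov / math.sqrt(var_a * var_b)


# Hypothetical expression profiles over four genes.
cell_a = [1.0, 2.0, 3.0, 4.0]
cell_b = [10.0, 20.0, 30.0, 40.0]  # same pattern as cell_a at 10x the scale
cell_c = [4.0, 3.0, 2.0, 1.0]      # same scale as cell_a, reversed pattern

print("Euclidean a-b:", euclidean(cell_a, cell_b))
print("Euclidean a-c:", euclidean(cell_a, cell_c))
print("Pearson distance a-b:", pearson_distance(cell_a, cell_b))
print("Pearson distance a-c:", pearson_distance(cell_a, cell_c))
```

In this toy case Euclidean distance places `cell_a` nearer to `cell_c`, whereas the correlation-based distance places it nearer to `cell_b`, so a clustering built on one metric could group the cells differently from a clustering built on the other.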

Journal Article
TL;DR: An overview of machine learning in hydrologic sciences provides a non‐technical introduction, placed within a historical context, to commonly used machine learning algorithms and deep learning architectures.

52 citations

Journal Article
TL;DR: A state-of-the-art review of brain MRI studies that use clustering techniques for different tasks, including segmentation of brain regions and tissues and clustering of the atrophy in different parts of the brain.
Abstract: Clustering is a vital task in magnetic resonance imaging (MRI) brain imaging and plays an important role in the reliability of brain disease detection, diagnosis, and effectiveness of the treatment. Clustering is used in processing and analysis of brain images for different tasks, including segmentation of brain regions and tissues (grey matter, white matter, and cerebrospinal fluid) and clustering of the atrophy in different parts of the brain. This paper presents a state-of-the-art review of brain MRI studies that use clustering techniques for different tasks.

44 citations

Journal Article
TL;DR: This work performs a scientometric analysis of driving simulation reviews and a selective review of reviews focused on validity and fidelity, finding substantial agreement in support of the validity of driving simulation relative to neuropsychological and on-road testing.
Abstract: Driving behaviors and fitness to drive have been assessed over time using different tools: standardized neuropsychological, on-road and driving simulation testing. Nowadays, the great variability of topics related to driving simulation has elicited a high number of reviews. The present work aims to perform a scientometric analysis on driving simulation reviews and to propose a selective review of reviews focusing on relevant aspects related to validity and fidelity. A scientometric analysis of driving simulation reviews published from 1988 to 2019 was conducted. Bibliographic data from 298 reviews were extracted from Scopus and WoS. Performance analysis was conducted to investigate the most prolific countries, journals, institutes and authors. A cluster analysis on authors' keywords was performed to identify relevant associations between different research topics. Based on the reviews extracted from cluster analysis, a selective review of reviews was conducted to answer questions regarding validity, fidelity and critical issues. The United States and Germany are the top two countries by number of driving simulation reviews. The United States is the leading country, with 5 institutes in the top ten. Top authors wrote from 3 to 7 reviews each and belong to institutes located in North America and Europe. Cluster analysis identified three clusters and eight keywords. The selective review of reviews showed substantial agreement in support of the validity of driving simulation with respect to neuropsychological and on-road testing, while for fidelity with respect to real-world driving experience a blurred picture emerged. The most relevant critical issues were (a) the lack of a common set of standards, (b) the phenomenon of simulation sickness, (c) the need for psychometric properties, and the lack of studies investigating (d) predictive validity with respect to collision rates and (e) ecological validity.
Driving simulation represents a cross-cutting topic in the scientific literature on driving, and there are several lines of evidence for considering it a valid alternative to neuropsychological and on-road testing. Further research efforts could be aimed at establishing a consensus statement for protocols assessing fitness to drive, in order to (a) use standardized systems, (b) systematically compare driving simulators with regard to their validity and fidelity, and (c) employ shared criteria for conducting studies in a given sub-topic.

29 citations

Proceedings Article
25 Sep 2019
TL;DR: The Mixture-of-Experts Similarity Variational Autoencoder (MoE-Sim-VAE), a novel generative clustering model that can learn multi-modal distributions of high-dimensional data and use these to generate realistic data with high efficacy and efficiency is introduced.
Abstract: Clustering high-dimensional data, such as images or biological measurements, is a long-standing problem and has been studied extensively. Recently, Deep Clustering gained popularity due to its flexibility in fitting the specific peculiarities of complex data. Here we introduce the Mixture-of-Experts Similarity Variational Autoencoder (MoE-Sim-VAE), a novel generative clustering model. The model can learn multi-modal distributions of high-dimensional data and use these to generate realistic data with high efficacy and efficiency. MoE-Sim-VAE is based on a Variational Autoencoder (VAE), where the decoder consists of a Mixture-of-Experts (MoE) architecture. This specific architecture allows for various modes of the data to be automatically learned by means of the experts. Additionally, we encourage the lower dimensional latent representation of our model to follow a Gaussian mixture distribution and to accurately represent the similarities between the data points. We assess the performance of our model on the MNIST benchmark data set and a challenging real-world task of defining cell subpopulations from mass cytometry (CyTOF) measurements on hundreds of different datasets. MoE-Sim-VAE exhibits superior clustering performance on all these tasks in comparison to the baselines as well as competitor methods and we show that the MoE architecture in the decoder reduces the computational cost of sampling specific data modes with high fidelity.

23 citations