Author

Jasmine Irani

Bio: Jasmine Irani is an academic researcher who has contributed to research on consensus clustering and cluster analysis. The author has an h-index of 1 and has co-authored 1 publication, which has received 70 citations.

Papers

Journal Article•DOI•

Clustering Techniques and the Similarity Measures used in Clustering: A Survey

Jasmine Irani, Nitin Pise, Madhura V. Phatak

15 Jan 2016-International Journal of Computer Applications

TL;DR: This paper surveys various clustering techniques and the current similarity measures used in distance-based clustering, explains the limitations of existing clustering techniques, and proposes that combining the advantages of existing systems can help overcome their limitations.

Abstract: Clustering is an unsupervised learning technique that aims to group a set of objects into clusters so that objects in the same cluster are as similar as possible, whereas objects in one cluster are as dissimilar as possible from objects in other clusters. Cluster analysis aims to group a collection of patterns into clusters based on similarity. A typical clustering technique uses a similarity function to compare data items. This paper surveys various clustering techniques and the current similarity measures used in distance-based clustering, explains the limitations of the existing clustering techniques, and proposes that combining the advantages of the existing systems can help overcome their limitations. General Terms: Data Mining, Machine Learning, Clustering, Pattern based Similarity, Negative Data, et al.
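To make the role of the similarity function concrete, here is a minimal sketch, not taken from the survey itself, of how a distance function can drive cluster assignment. The function names (`euclidean`, `assign_to_nearest`) and the toy points and centroids are illustrative assumptions.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign_to_nearest(points, centroids, distance=euclidean):
    """Return, for each point, the index of the closest centroid.

    `distance` is pluggable: any function taking two points and
    returning a non-negative dissimilarity can be passed in.
    """
    return [
        min(range(len(centroids)), key=lambda k: distance(p, centroids[k]))
        for p in points
    ]


# Hypothetical toy data: two well-separated groups in 2-D.
points = [(0.0, 0.1), (0.2, 0.0), (5.0, 5.1), (4.9, 5.0)]
centroids = [(0.0, 0.0), (5.0, 5.0)]
labels = assign_to_nearest(points, centroids)
print(labels)
```

Swapping the `distance` argument changes which objects count as "similar" without touching the assignment logic, which is why the choice of measure matters for the resulting clusters.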

92 citations

Cited by

Journal Article•DOI•

Taiyun Kim¹, Irene Rui Chen¹, Yingxin Lin¹, Andy Wang¹, Jean Yee Hwa Yang¹, Pengyi Yang¹•Institutions (1)

University of Sydney¹

27 Nov 2019-Briefings in Bioinformatics

TL;DR: A state-of-the-art kernel-based clustering algorithm (SIMLR) is modified to use Pearson's correlation as its similarity measure, yielding a significant performance improvement over Euclidean distance on scRNA-seq data clustering.

Abstract: Advances in high-throughput sequencing on single-cell gene expressions [single-cell RNA sequencing (scRNA-seq)] have enabled transcriptome profiling on individual cells from complex samples. A common goal in scRNA-seq data analysis is to discover and characterise cell types, typically through clustering methods. The quality of the clustering therefore plays a critical role in biological discovery. While numerous clustering algorithms have been proposed for scRNA-seq data, fundamentally they all rely on a similarity metric for categorising individual cells. Although several studies have compared the performance of various clustering algorithms for scRNA-seq data, currently there is no benchmark of different similarity metrics and their influence on scRNA-seq data clustering. Here, we compared a panel of similarity metrics on clustering a collection of annotated scRNA-seq datasets. Within each dataset, a stratified subsampling procedure was applied and an array of evaluation measures was employed to assess the similarity metrics. This produced a highly reliable and reproducible consensus on their performance assessment. Overall, we found that correlation-based metrics (e.g. Pearson's correlation) outperformed distance-based metrics (e.g. Euclidean distance). To test if the use of correlation-based metrics can benefit the recently published clustering techniques for scRNA-seq data, we modified a state-of-the-art kernel-based clustering algorithm (SIMLR) using Pearson's correlation as a similarity measure and found significant performance improvement over Euclidean distance on scRNA-seq data clustering. These findings demonstrate the importance of similarity metrics in clustering scRNA-seq data and highlight Pearson's correlation as a favourable choice. Further comparison on different scRNA-seq library preparation protocols suggests that they may also affect clustering performance. 
Finally, the benchmarking framework is available at http://www.maths.usyd.edu.au/u/SMS/bioinformatics/software.html.
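The contrast between correlation-based and distance-based metrics can be illustrated with a small sketch. This is not the authors' benchmarking framework; the function names and the three toy "expression profiles" below are hypothetical, chosen only to show how the two kinds of metric can disagree.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance; sensitive to overall magnitude."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pearson_correlation(a, b):
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def pearson_distance(a, b):
    """One common way to turn correlation into a dissimilarity: 1 - r."""
    return 1.0 - pearson_correlation(a, b)


# Hypothetical profiles (assumed values, not real scRNA-seq data).
cell_a = [1.0, 2.0, 3.0, 4.0]
cell_b = [10.0, 20.0, 30.0, 40.0]  # same pattern as cell_a, 10x the magnitude
cell_c = [4.0, 3.0, 2.0, 1.0]      # similar magnitude, reversed pattern

# Euclidean distance places cell_a nearer to cell_c than to cell_b...
print(euclidean_distance(cell_a, cell_b), euclidean_distance(cell_a, cell_c))
# ...while the correlation-based distance places cell_a nearer to cell_b.
print(pearson_distance(cell_a, cell_b), pearson_distance(cell_a, cell_c))
```

Because Pearson's correlation ignores overall scale and offset, two profiles with the same pattern but different magnitudes are treated as near-identical. That is one plausible reason such a metric could behave differently from Euclidean distance on expression data, though the abstract does not itself state the cause.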

101 citations

Journal Article•DOI•

Machine learning for hydrologic sciences: An introductory overview

Tianfang Xu¹, Feng Liang²•Institutions (2)

Arizona State University¹, University of Illinois at Urbana–Champaign²

01 Sep 2021-Wiley Interdisciplinary Reviews: Water

TL;DR: An overview of machine learning in hydrologic sciences provides a non‐technical introduction, placed within a historical context, to commonly used machine learning algorithms and deep learning architectures.

52 citations

Journal Article•DOI•

Segmentation and clustering in brain MRI imaging.

Golrokh Mirzaei¹, Hojjat Adeli¹•Institutions (1)

Ohio State University¹

19 Dec 2018-Reviews in The Neurosciences

TL;DR: A state-of-the-art review of brain MRI studies that use clustering techniques for different tasks, including segmentation of brain regions and tissues and clustering of the atrophy in different parts of the brain.

Abstract: Clustering is a vital task in magnetic resonance imaging (MRI) brain imaging and plays an important role in the reliability of brain disease detection, diagnosis, and effectiveness of the treatment. Clustering is used in processing and analysis of brain images for different tasks, including segmentation of brain regions and tissues (grey matter, white matter, and cerebrospinal fluid) and clustering of the atrophy in different parts of the brain. This paper presents a state-of-the-art review of brain MRI studies that use clustering techniques for different tasks.

44 citations

Journal Article•DOI•

The Drives for Driving Simulation: A Scientometric Analysis and a Selective Review of Reviews on Simulated Driving Research

Alessandro Oronzo Caffò¹, Luigi Tinella¹, Antonella Lopez¹, Giuseppina Spano¹, Ylenia Massaro¹, Andrea Lisi¹, Fabrizio Stasolla, Roberto Catanesi¹, Francesco Nardulli, Ignazio Grattagliano¹, Andrea Bosco¹•Institutions (1)

University of Bari¹

27 May 2020-Frontiers in Psychology

TL;DR: The present work aims to perform a scientometric analysis on driving simulation reviews and to propose a selective review of reviews focusing on relevant aspects related to validity and fidelity, showing a substantial agreement for supporting validity of driving simulation with respect to neuropsychological and on-road testing.

Abstract: Driving behaviors and fitness to drive have been assessed over time using different tools: standardized neuropsychological, on-road and driving simulation testing. Nowadays, the great variability of topics related to driving simulation has elicited a high number of reviews. The present work aims to perform a scientometric analysis on driving simulation reviews and to propose a selective review of reviews focusing on relevant aspects related to validity and fidelity. A scientometric analysis of driving simulation reviews published from 1988 to 2019 was conducted. Bibliographic data from 298 reviews were extracted from Scopus and WoS. Performance analysis was conducted to investigate most prolific Countries, Journals, Institutes and Authors. A cluster analysis on authors' keywords was performed to identify relevant associations between different research topics. Based on the reviews extracted from cluster analysis, a selective review of reviews was conducted to answer questions regarding validity, fidelity and critical issues. United States and Germany are the first two Countries for number of driving simulation reviews. United States is the leading Country with 5 Institutes in the top-ten. Top Authors wrote from 3 to 7 reviews each and belong to Institutes located in North America and Europe. Cluster analysis identified three clusters and eight keywords. The selective review of reviews showed a substantial agreement for supporting validity of driving simulation with respect to neuropsychological and on-road testing, while for fidelity with respect to real-world driving experience a blurred representation emerged. The most relevant critical issues were the a) lack of a common set of standards, b) phenomenon of simulation sickness, c) need for psychometric properties, lack of studies investigating d) predictive validity with respect to collision rates and e) ecological validity. 
Driving simulation represents a cross-cutting topic in the scientific literature on driving, and there are several lines of evidence for considering it a valid alternative to neuropsychological and on-road testing. Further research efforts could be aimed at establishing a consensus statement for protocols assessing fitness to drive, in order to (a) use standardized systems, (b) systematically compare driving simulators with regard to their validity and fidelity, and (c) employ shared criteria for conducting studies in a given sub-topic.

29 citations

Proceedings Article•

Mixture-of-Experts Variational Autoencoder for clustering and generating from similarity-based representations

Andreas Kopf¹, Vincent Fortuin¹, Vignesh Ram Somnath¹, Manfred Claassen²•Institutions (2)

ETH Zurich¹, University of Tübingen²

25 Sep 2019

TL;DR: This paper introduces the Mixture-of-Experts Similarity Variational Autoencoder (MoE-Sim-VAE), a novel generative clustering model that can learn multi-modal distributions of high-dimensional data and use these to generate realistic data with high efficacy and efficiency.

Abstract: Clustering high-dimensional data, such as images or biological measurements, is a long-standing problem and has been studied extensively. Recently, Deep Clustering gained popularity due to its flexibility in fitting the specific peculiarities of complex data. Here we introduce the Mixture-of-Experts Similarity Variational Autoencoder (MoE-Sim-VAE), a novel generative clustering model. The model can learn multi-modal distributions of high-dimensional data and use these to generate realistic data with high efficacy and efficiency. MoE-Sim-VAE is based on a Variational Autoencoder (VAE), where the decoder consists of a Mixture-of-Experts (MoE) architecture. This specific architecture allows for various modes of the data to be automatically learned by means of the experts. Additionally, we encourage the lower dimensional latent representation of our model to follow a Gaussian mixture distribution and to accurately represent the similarities between the data points. We assess the performance of our model on the MNIST benchmark data set and a challenging real-world task of defining cell subpopulations from mass cytometry (CyTOF) measurements on hundreds of different datasets. MoE-Sim-VAE exhibits superior clustering performance on all these tasks in comparison to the baselines as well as competitor methods and we show that the MoE architecture in the decoder reduces the computational cost of sampling specific data modes with high fidelity.

23 citations