Author

Kalyani Desikan

Bio: Kalyani Desikan is an academic researcher from VIT University. The author has contributed to research in topics: Cluster analysis & Gibbs sampling. The author has an h-index of 3 and has co-authored 12 publications receiving 13 citations.

Papers
Journal ArticleDOI
TL;DR: A density- and distance-based method that ensures seed artefacts are identified from different clusters, which leads to more accurate clustering results; its results are compared with the random, Wu, Cao and Khan methods of initial seed artefact selection.

9 citations

Journal ArticleDOI
Kalyani Desikan
01 Jan 2021

8 citations

Journal ArticleDOI
TL;DR: A novel technique for identifying initial seeds for heterogeneous data clustering is proposed, through the introduction of a unique distance measure where the distance of the numerical attributes is scaled such that it is comparable to that of categorical attributes.
Abstract: Data sets to which clustering is applied may be homogeneous (numerical or categorical) or heterogeneous (numerical and categorical) in nature. Handling homogeneous data is easier than handling heterogeneous data. We propose a novel technique for identifying initial seeds for heterogeneous data clustering, through the introduction of a unique distance measure in which the distance of the numerical attributes is scaled so that it is comparable to that of the categorical attributes. The proposed initial seed selection algorithm ensures that the initial seed points are selected from different clusters of the clustering solution; these seeds are then given as input to the modified K-means clustering algorithm along with the data set. This technique is independent of any user-defined parameter and thus can be easily applied to clusterable data sets with mixed attributes. We have also modified the K-means clustering algorithm to handle mixed attributes by incorporating our novel distance measure for numerical data and assigning the value one or zero when categorical values are dissimilar or similar, respectively. Finally, a comparison is made with existing algorithms to bring out the significance of our approach. We also perform a statistical test to evaluate the statistical significance of our proposed technique.

5 citations
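
The abstract above does not give the exact scaling used for the numerical attributes. As a minimal sketch only, the following assumes min-max range scaling, so that each numerical attribute contributes a value in [0, 1], comparable to the 0/1 contribution of a categorical attribute. The function name `mixed_distance` and the `numeric_ranges` parameter are hypothetical and not taken from the paper.

```python
from typing import Dict, Sequence, Tuple, Union

Value = Union[float, str]


def mixed_distance(
    a: Sequence[Value],
    b: Sequence[Value],
    numeric_ranges: Dict[int, Tuple[float, float]],
) -> float:
    """Distance between two records with mixed attributes.

    numeric_ranges maps the index of each numerical attribute to its
    (min, max) over the data set; every other index is treated as
    categorical. Range scaling is an assumption of this sketch.
    """
    total = 0.0
    for i, (x, y) in enumerate(zip(a, b)):
        if i in numeric_ranges:
            low, high = numeric_ranges[i]
            span = high - low
            # Scale the numerical difference into [0, 1].
            total += abs(x - y) / span if span > 0 else 0.0
        else:
            # Categorical: 0 when the values match, 1 when they differ.
            total += 0.0 if x == y else 1.0
    return total
```

For example, with attribute 0 ranging over [1, 5], the records `(1.0, "red")` and `(3.0, "blue")` are at distance 0.5 + 1 = 1.5 under this sketch.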

Posted Content
TL;DR: A novel method is proposed that assigns weights to individual features with the help of out-of-bag errors obtained from constructing multiple decision tree models.
Abstract: The Nearest Neighbors algorithm is a lazy learning algorithm that approximates predictions with the help of similar existing vectors in the training dataset. The predictions made by the K-Nearest Neighbors algorithm are based on averaging the target values of the spatial neighbors. The selection of neighbors in the Hermitian space is done with the help of distance metrics such as Euclidean distance, Minkowski distance, Mahalanobis distance, etc. A majority of these metrics, such as Euclidean distance, are scale variant, meaning that the results could vary for different ranges of values used for the features. The standard techniques for normalizing the scaling factors are feature scaling methods such as Z-score normalization, Min-Max scaling, etc. Scaling methods uniformly assign equal weights to all the features, which might result in a non-ideal situation. This paper proposes a novel method to assign weights to individual features with the help of out-of-bag errors obtained from constructing multiple decision tree models.

4 citations
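
The abstract above does not state how out-of-bag errors are turned into feature weights. The sketch below assumes, purely for illustration, that each feature has one error value and that weights are proportional to the inverse of that error; it then uses the weights in a weighted Euclidean distance for K-Nearest Neighbors regression. All function names here are hypothetical.

```python
import math
from typing import List, Sequence


def weights_from_errors(errors: Sequence[float]) -> List[float]:
    """Assumed mapping: a lower per-feature error gives a higher weight.

    Weights are inverse errors normalized to sum to 1. This rule is an
    assumption of the sketch, not the paper's formula.
    """
    inverse = [1.0 / e for e in errors]
    total = sum(inverse)
    return [v / total for v in inverse]


def weighted_distance(a: Sequence[float], b: Sequence[float],
                      weights: Sequence[float]) -> float:
    """Euclidean distance with a separate weight per feature."""
    return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))


def knn_predict(train_x: Sequence[Sequence[float]], train_y: Sequence[float],
                query: Sequence[float], weights: Sequence[float], k: int) -> float:
    """Average the targets of the k nearest training vectors."""
    ranked = sorted(
        zip(train_x, train_y),
        key=lambda pair: weighted_distance(pair[0], query, weights),
    )
    nearest = ranked[:k]
    return sum(y for _, y in nearest) / len(nearest)
```

With equal weights this reduces to ordinary K-Nearest Neighbors on the raw features; unequal weights let a more informative feature dominate the neighbor selection.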

Journal ArticleDOI
31 Oct 2014
TL;DR: In this paper, the authors show experimentally how to determine the number of clusters based on cluster quality, addressing the need to specify the number of clusters in advance, which is a drawback of the majority of clustering algorithms.
Abstract: Text Clustering is a text mining technique which divides the given set of text documents into significant clusters. It is used for organizing a huge number of text documents into a well-organized form. In the majority of the clustering algorithms, the number of clusters must be specified a priori, which is a drawback of these algorithms. The aim of this paper is to show experimentally how to determine the number of clusters based on cluster quality. Since partitional clustering algorithms are well-suited for clustering large document datasets, we have confined our analysis to a partitional clustering algorithm.

3 citations
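
The abstract above does not name the quality measure or the partitional algorithm used. As a hedged illustration of the general idea, the sketch below runs a small K-means (with deterministic farthest-first seeding, an assumption) for several candidate cluster counts and keeps the count with the highest mean silhouette score, a swapped-in quality measure. The names `kmeans`, `silhouette` and `best_k` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dist(a: Point, b: Point) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points: Sequence[Point], k: int, iters: int = 20) -> List[int]:
    """Plain K-means; seeds are chosen farthest-first for determinism."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step, then recompute each non-empty cluster's mean.
        labels = [min(range(k), key=lambda j: dist(p, centers[j])) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels


def silhouette(points: Sequence[Point], labels: Sequence[int]) -> float:
    """Mean silhouette score; requires at least two distinct labels."""
    scores = []
    for i, p in enumerate(points):
        own = [dist(p, q) for j, q in enumerate(points)
               if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)  # singleton clusters score 0 by convention
            continue
        a = sum(own) / len(own)
        b = min(
            sum(dist(p, q) for q, lab in zip(points, labels) if lab == other)
            / list(labels).count(other)
            for other in set(labels) if other != labels[i]
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


def best_k(points: Sequence[Point], candidates: Sequence[int]) -> int:
    """Pick the candidate k whose clustering has the highest quality."""
    return max(candidates, key=lambda k: silhouette(points, kmeans(points, k)))
```

For text documents, the points would be document vectors (for instance term-weight vectors) rather than the low-dimensional tuples this sketch is easiest to try with.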


Cited by
01 Jan 2016
TL;DR: Using Multivariate Statistics is universally compatible with any device, and the digital library lets you download any of its books, like this one, with minimal latency.
Abstract: Thank you for downloading Using Multivariate Statistics. As you may know, people have looked hundreds of times for their favorite novels like this one, but ended up with infectious downloads. Rather than reading a good book with a cup of tea in the afternoon, they instead juggled some harmful bugs inside their laptop. Using Multivariate Statistics is available in our digital library; online access to it is set as public, so you can download it instantly. Our book collection is saved in multiple locations, allowing you to download any of our books, like this one, with minimal latency. In short, Using Multivariate Statistics is universally compatible with any device.

14,604 citations

Journal ArticleDOI
TL;DR: Investigation of the global patent databases of DT patents and summarizes related technologies, effects, and applications reveals that DT fails to form a comprehensively connected technology, which is a typical phenomenon for a technology that remains in its early stage of development.
Abstract: Digital twin (DT) can facilitate interaction between the physical and the cyber worlds and achieve smart manufacturing. However, the DT’s development in the industry remains vague. This study investigates the global patent databases of DT patents and summarizes related technologies, effects, and applications. Patent map analysis is used to uncover the patent development trajectory of DT in the patent databases of the USA, China, and the World Intellectual Property Organization among European nations. In addition, a nation-based survey is conducted to explore their DT patent trends. Findings reveal that DT fails to form a comprehensively connected technology, which is a typical phenomenon for a technology that remains in its early stage of development. In the present study, the two-dimensional matrix analysis of patent technology and effect exhibits that several patents created a variety of effects and reached saturation. Moreover, several technology–effect domains remain, and DT-related technology gaps exist in a number of potential effects. The DT-related patents are distributed unevenly in various industries. For instance, most of the DT-related patents appear in the manufacturing industry. Furthermore, our K-mode cluster analysis reveals that the DT-related patents are distributed in five subgroups of the three dimensions, namely, technology, effect, and application.

19 citations

Journal ArticleDOI
TL;DR: In this paper, a modified Roger's distance for mixed quantitative-qualitative phenotypes was developed to select 30 accessions (denoted as the core collection) that had a maximum pairwise genetic distance.
Abstract: Vegetable soybeans [Glycine max (L.) Merr.] have characteristics of larger seeds, less beany flavor, tender texture, and green-colored pods and seeds. Rich in nutrients, vegetable soybeans are conducive to preventing neurological disease. Due to the change of dietary habits and increasing health awareness, the demand for vegetable soybeans has increased. To conserve vegetable soybean germplasms in Taiwan, we built a core collection of vegetable soybeans, with minimum accessions, minimum redundancy, and maximum representation. Initially, a total of 213 vegetable soybean germplasms and 29 morphological traits were used to construct the core collection. After redundant accessions were removed, 200 accessions were retained as the entire collection, which was grouped into nine clusters. Here, we developed a modified Roger’s distance for mixed quantitative–qualitative phenotypes to select 30 accessions (denoted as the core collection) that had a maximum pairwise genetic distance. No significant differences were observed in all phenotypic traits (p-values > 0.05) between the entire and the core collections, except plant height. Compared to the entire collection, we found that most traits retained diversities, but seven traits were slightly lost (ranged from 2 to 9%) in the core collection. The core collection demonstrated a small percentage of significant mean difference (3.45%) and a large coincidence rate (97.70%), indicating representativeness of the entire collection. Furthermore, large values in variable rate (149.80%) and coverage (92.5%) were in line with high diversity retained in the core collection. The results suggested that phenotype-based core collection can retain diversity and genetic variability of vegetable soybeans, providing a basis for further research and breeding programs.

9 citations

Journal ArticleDOI
TL;DR: In this article, the authors derive two formulas for the number of spanning trees in a chain of diphenylene planar graphs in which consecutive diphenylenes intersect in one edge and all diphenylenes have the same size.
Abstract: Cheminformatics is a modern field of chemistry, information science, and mathematics that is very helpful for storing data and retrieving information about chemicals. A new two-dimensional carbon material known as diphenylene was identified and synthesized. It is considered one of the materials that have many applications in most fields, such as catalysis. The number of spanning trees of a graph G, also known as the complexity of a graph G and denoted by τ(G), is an important, well-studied quantity in graph theory that appears in a number of applications. In this paper, we introduce a new chemical compound that is a chain of diphenylene in which any two diphenylenes intersect in one edge. We derive two formulas for the number of spanning trees in a chain of diphenylene planar graphs whose units intersect in one edge and have the same size.

7 citations
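
The closed-form formulas from the paper are not reproduced in the abstract above. For small graphs, τ(G) can instead be computed numerically with Kirchhoff's matrix-tree theorem, which states that τ(G) equals any cofactor of the graph Laplacian. The sketch below is this general technique, not the authors' derivation; the function name `count_spanning_trees` is hypothetical, and exact fractions are used so the determinant has no rounding error.

```python
from fractions import Fraction
from typing import Iterable, Tuple


def count_spanning_trees(n: int, edges: Iterable[Tuple[int, int]]) -> int:
    """Number of spanning trees of an undirected graph on vertices 0..n-1."""
    # Build the Laplacian L = D - A.
    lap = [[Fraction(0)] * n for _ in range(n)]
    for u, v in edges:
        lap[u][u] += 1
        lap[v][v] += 1
        lap[u][v] -= 1
        lap[v][u] -= 1
    # Delete row 0 and column 0, then take the determinant by elimination.
    m = [row[1:] for row in lap[1:]]
    size = n - 1
    det = Fraction(1)
    for c in range(size):
        pivot = next((r for r in range(c, size) if m[r][c] != 0), None)
        if pivot is None:
            return 0  # singular minor: the graph is disconnected
        if pivot != c:
            m[c], m[pivot] = m[pivot], m[c]
            det = -det
        det *= m[c][c]
        for r in range(c + 1, size):
            factor = m[r][c] / m[c][c]
            for k in range(c, size):
                m[r][k] -= factor * m[c][k]
    return int(det)
```

As a small check of the "two cycles sharing one edge" situation, two 4-cycles glued along an edge have 4 · 4 − 1 = 15 spanning trees, which this function reproduces.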