Journal ArticleDOI

A simple and fast algorithm for K-medoids clustering

01 Mar 2009-Expert Systems With Applications (Pergamon Press, Inc.)-Vol. 36, Iss: 2, pp 3336-3341
TL;DR: Experimental results show that the proposed algorithm requires significantly less computation time while achieving performance comparable to partitioning around medoids (PAM).
Abstract: This paper proposes a new algorithm for K-medoids clustering which runs like the K-means algorithm, and tests several methods for selecting initial medoids. The proposed algorithm calculates the distance matrix once and uses it for finding new medoids at every iterative step. To evaluate the proposed algorithm, we use some real and artificial data sets and compare the results with those of other algorithms in terms of the adjusted Rand index. Experimental results show that the proposed algorithm requires significantly less computation time while achieving performance comparable to partitioning around medoids (PAM).
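The abstract describes the update rule only at a high level, so here is a minimal sketch of a K-medoids iteration of this kind: the pairwise distance matrix is computed once, objects are assigned to their nearest medoid, and each cluster's medoid is replaced by the member that minimizes the total distance to the other members. The function name, random initialization, and Euclidean distances are illustrative assumptions, not details taken from the paper (which also compares several initialization strategies).

```python
import numpy as np

def simple_kmedoids(X, k, max_iter=100, seed=None):
    """Hypothetical sketch of a K-means-like K-medoids loop on a precomputed distance matrix."""
    rng = np.random.default_rng(seed)
    n = len(X)
    # Pairwise Euclidean distance matrix, computed a single time and reused every iteration.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    medoids = rng.choice(n, size=k, replace=False)  # random start; the paper tests other choices
    for _ in range(max_iter):
        labels = np.argmin(D[:, medoids], axis=1)   # assign each object to its nearest medoid
        new_medoids = medoids.copy()
        for j in range(k):
            members = np.where(labels == j)[0]
            if members.size == 0:
                continue
            # New medoid: the member with the smallest summed distance to its own cluster.
            within = D[np.ix_(members, members)].sum(axis=1)
            new_medoids[j] = members[np.argmin(within)]
        if np.array_equal(np.sort(new_medoids), np.sort(medoids)):
            break                                    # medoids unchanged: converged
        medoids = new_medoids
    labels = np.argmin(D[:, medoids], axis=1)
    return medoids, labels
```

For example, `simple_kmedoids(np.random.rand(100, 2), k=3)` returns the indices of three medoid objects and a cluster label for each point; only the assignment and update steps touch the precomputed matrix D.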
Citations
Journal ArticleDOI
TL;DR: This review paper begins with the definition of clustering, considers the basic elements of the clustering process, such as distance or similarity measures and evaluation indicators, and analyzes clustering algorithms from two perspectives: the traditional and the modern.
Abstract: Data analysis is a common method in modern scientific research, spanning communication science, computer science, and biology. Clustering, as a basic component of data analysis, plays a significant role. On the one hand, many tools for cluster analysis have been created as information has grown and subjects have intersected. On the other hand, each clustering algorithm has its own strengths and weaknesses, owing to the complexity of the information. In this review paper, we begin with the definition of clustering, consider the basic elements involved in the clustering process, such as distance or similarity measures and evaluation indicators, and analyze the clustering algorithms from two perspectives, the traditional ones and the modern ones. All the discussed clustering algorithms are compared in detail and comprehensively summarized in Appendix Table 22.

1,234 citations


Cites background from "A simple and fast algorithm for K-m..."

  • ...K-means [7] and K-medoids [8] are the two most famous ones of this kind of clustering algorithms....

    [...]

Journal ArticleDOI
TL;DR: This paper introduces concepts and algorithms related to clustering, provides a concise survey of existing clustering algorithms, and compares them from both a theoretical and an empirical perspective.
Abstract: Clustering algorithms have emerged as a powerful alternative meta-learning tool for accurately analyzing the massive volumes of data generated by modern applications. Their main goal is to categorize data into clusters such that objects grouped in the same cluster are similar according to specific metrics. There is a vast body of knowledge in the area of clustering, and there have been attempts to analyze and categorize these algorithms for a large number of applications. However, one of the major issues in using clustering algorithms for big data, and one that causes confusion among practitioners, is the lack of consensus on the definition of their properties as well as the lack of a formal categorization. With the intention of alleviating these problems, this paper introduces concepts and algorithms related to clustering and provides a concise survey of existing clustering algorithms as well as a comparison, from both a theoretical and an empirical perspective. From a theoretical perspective, we developed a categorizing framework based on the main properties pointed out in previous studies. Empirically, we conducted extensive experiments comparing the most representative algorithm from each category on a large number of real (big) data sets. The effectiveness of the candidate clustering algorithms is measured through a number of internal and external validity metrics, as well as stability, runtime, and scalability tests. In addition, we highlight the set of clustering algorithms that perform best for big data.

833 citations

Journal ArticleDOI
TL;DR: This review describes how molecular docking was first applied to assist drug discovery tasks and illustrates newer and emerging uses and applications of docking, including prediction of adverse effects, polypharmacology, drug repurposing, and target fishing and profiling.
Abstract: Molecular docking is an established in silico structure-based method widely used in drug discovery. Docking enables the identification of novel compounds of therapeutic interest, predicting ligand-target interactions at a molecular level, or delineating structure-activity relationships (SAR), without knowing a priori the chemical structure of other target modulators. Although it was originally developed to help in understanding the mechanisms of molecular recognition between small and large molecules, the uses and applications of docking in drug discovery have changed considerably in recent years. In this review, we describe how molecular docking was first applied to assist in drug discovery tasks. Then, we illustrate newer and emerging uses and applications of docking, including prediction of adverse effects, polypharmacology, drug repurposing, and target fishing and profiling, and we also discuss future applications and the further potential of this technique when combined with emerging approaches such as artificial intelligence.

663 citations


Cites methods from "A simple and fast algorithm for K-m..."

  • ...Then, they performed clustering via the K-medoids method on the calculated molecular dynamics trajectories [88] to identify MD-derived representative conformations of the investigated targets....

    [...]

Journal ArticleDOI
22 Mar 2021
TL;DR: In this paper, the authors present a comprehensive view of machine learning algorithms that can be applied to enhance the intelligence and the capabilities of an application, and highlight challenges and potential research directions based on their study.
Abstract: In the current age of the Fourth Industrial Revolution (4IR or Industry 4.0), the digital world has a wealth of data, such as Internet of Things (IoT) data, cybersecurity data, mobile data, business data, social media data, health data, etc. To intelligently analyze these data and develop the corresponding smart and automated applications, knowledge of artificial intelligence (AI), and particularly machine learning (ML), is key. Various types of machine learning algorithms, such as supervised, unsupervised, semi-supervised, and reinforcement learning, exist in the area. In addition, deep learning, which is part of a broader family of machine learning methods, can intelligently analyze data on a large scale. In this paper, we present a comprehensive view of these machine learning algorithms that can be applied to enhance the intelligence and the capabilities of an application. Thus, this study's key contribution is explaining the principles of different machine learning techniques and their applicability in various real-world application domains, such as cybersecurity systems, smart cities, healthcare, e-commerce, agriculture, and many more. We also highlight the challenges and potential research directions based on our study. Overall, this paper aims to serve as a reference point for both academia and industry professionals, as well as for decision-makers in various real-world situations and application areas, particularly from a technical point of view.

659 citations

Journal ArticleDOI
TL;DR: This survey provides an overview of how machine learning has been used so far for malware analysis in Windows environments, i.e., for the analysis of Portable Executables.

316 citations


Cites background from "A simple and fast algorithm for K-m..."

  • ...3, apply to k-medoids as well, but it is less sensitive to outliers [84]....

    [...]

References
01 Jan 1967
TL;DR: The k-means algorithm described in this paper partitions an N-dimensional population into k sets on the basis of a sample; the k-means concept generalizes the ordinary sample mean, and the resulting partitions are shown to be reasonably efficient in the sense of within-class variance.
Abstract: The main purpose of this paper is to describe a process for partitioning an N-dimensional population into k sets on the basis of a sample. The process, which is called 'k-means,' appears to give partitions which are reasonably efficient in the sense of within-class variance. That is, if $p$ is the probability mass function for the population, $S = \{S_1, S_2, \ldots, S_k\}$ is a partition of $E^N$, and $u_i$, $i = 1, 2, \ldots, k$, is the conditional mean of $p$ over the set $S_i$, then $W^2(S) = \sum_{i=1}^{k} \int_{S_i} \lVert z - u_i \rVert^2 \, dp(z)$ tends to be low for the partitions $S$ generated by the method. We say 'tends to be low,' primarily because of intuitive considerations, corroborated to some extent by mathematical analysis and practical computational experience. Also, the k-means procedure is easily programmed and is computationally economical, so that it is feasible to process very large samples on a digital computer. Possible applications include methods for similarity grouping, nonlinear prediction, approximating multivariate distributions, and nonparametric tests for independence among several variables. In addition to suggesting practical classification methods, the study of k-means has proved to be theoretically interesting. The k-means concept represents a generalization of the ordinary sample mean, and one is naturally led to study the pertinent asymptotic behavior, the object being to establish some sort of law of large numbers for the k-means. This problem is sufficiently interesting, in fact, for us to devote a good portion of this paper to it. The k-means are defined in section 2.1, and the main results which have been obtained on the asymptotic behavior are given there. The rest of section 2 is devoted to the proofs of these results. Section 3 describes several specific possible applications, and reports some preliminary results from computer experiments conducted to explore the possibilities inherent in the k-means idea. The extension to general metric spaces is indicated briefly in section 4. The original point of departure for the work described here was a series of problems in optimal classification (MacQueen [9]) which represented special [...]
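To make the criterion concrete, the following is a minimal sketch of its sample analogue: for a finite data set, the integral over each $S_i$ becomes the sum of squared distances of the points in cluster i to that cluster's mean. The function name and the use of integer labels for the partition are illustrative assumptions, not taken from the abstract.

```python
import numpy as np

def within_class_variance(X, labels):
    """Sample analogue of W^2(S): total squared deviation from each cluster's mean."""
    total = 0.0
    for c in np.unique(labels):
        pts = X[labels == c]
        centroid = pts.mean(axis=0)              # the conditional mean u_i over cluster S_i
        total += ((pts - centroid) ** 2).sum()   # within-cluster sum of squared deviations
    return total
```

The k-means procedure alternates between assigning each point to its nearest current mean and recomputing the means, and each of these two steps can only decrease (or leave unchanged) this quantity.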

24,320 citations


"A simple and fast algorithm for K-m..." refers methods in this paper

  • ...The proposed algorithm calculates the distance matrix once and uses it for finding new medoids at every iterative step....

    [...]

  • ...K-means clustering (MacQueen, 1967) and partitioning around medoids (Kaufman & Rousseeuw, 1990) are well known techniques for performing non-hierarchical clustering....

    [...]

Journal ArticleDOI
TL;DR: A new graphical display is proposed for partitioning techniques, where each cluster is represented by a so-called silhouette, which is based on the comparison of its tightness and separation, and provides an evaluation of clustering validity.
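As a reminder of how the silhouette of Rousseeuw (1987) is computed for a single object i: a(i) is the mean distance to the other members of i's own cluster, b(i) is the smallest mean distance from i to the objects of any other cluster, and s(i) = (b(i) - a(i)) / max(a(i), b(i)) lies in [-1, 1]. The sketch below assumes a precomputed distance matrix D, integer cluster labels, and at least two clusters; the function name is illustrative.

```python
import numpy as np

def silhouette_value(D, labels, i):
    """Silhouette s(i) of object i, given a distance matrix D and cluster labels."""
    labels = np.asarray(labels)
    own = labels[i]
    others = (labels == own) & (np.arange(len(labels)) != i)   # other members of i's cluster
    a = D[i, others].mean() if others.any() else 0.0           # mean intra-cluster distance
    b = min(D[i, labels == c].mean()                           # closest neighbouring cluster
            for c in np.unique(labels) if c != own)
    return (b - a) / max(a, b) if max(a, b) > 0 else 0.0
```

Averaging s(i) over all objects gives the quantity that, as the excerpt below notes, van der Laan, Pollard, and Bryan (2003) maximize instead of minimizing PAM's sum of distances to the closest medoid.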

14,144 citations


"A simple and fast algorithm for K-m..." refers background in this paper

  • ...Ng and Han (1994) proposed an efficient PAM-based algorithm, which updates new medoids from some neighboring objects. van der Laan, Pollard, and Bryan (2003) tried to maximize the silhouette proposed by Rousseeuw (1987) instead of minimizing the sum of distances to the closest medoid in PAM....

    [...]

Book
01 Jan 1990
Abstract: 1. Introduction. 2. Partitioning Around Medoids (Program PAM). 3. Clustering large Applications (Program CLARA). 4. Fuzzy Analysis. 5. Agglomerative Nesting (Program AGNES). 6. Divisive Analysis (Program DIANA). 7. Monothetic Analysis (Program MONA). Appendix 1. Implementation and Structure of the Programs. Appendix 2. Running the Programs. Appendix 3. Adapting the Programs to Your Needs. Appendix 4. The Program CLUSPLOT. References. Author Index. Subject Index.

10,537 citations


"A simple and fast algorithm for K-m..." refers methods in this paper

  • ...Among many algorithms for K-medoids clustering, partitioning around medoids (PAM) proposed by Kaufman and Rousseeuw (1990) is known to be most powerful....

    [...]

  • ...Kaufman and Rousseeuw (1990) also proposed an algorithm called CLARA, which applies the PAM to sampled objects instead of all objects....

    [...]

  • ...K-means clustering (MacQueen, 1967) and partitioning around medoids (Kaufman & Rousseeuw, 1990) are well known techniques for performing non-hierarchical clustering....

    [...]

BookDOI
01 Jan 1990

9,011 citations