Author

Hongchao Gao

Bio: Hongchao Gao is an academic researcher from Shaanxi Normal University. The author has contributed to research in the topics of the nearest-neighbor chain algorithm and k-medians clustering, has an h-index of 1, and has co-authored 1 publication receiving 170 citations.

Papers
Journal Article
TL;DR: The experimental results demonstrate that the proposed clustering algorithm can find cluster centers, recognize clusters regardless of their shape and of the dimension of the space in which they are embedded, remain unaffected by outliers, and often outperform DPC, AP, DBSCAN, and K-means.

272 citations


Cited by
Journal Article
TL;DR: In this article, the authors proposed a measure of local outlierness based on a symmetric neighborhood relationship, which considers both the neighbors and the reverse neighbors of an object when estimating its density distribution.
Abstract: Mining outliers in a database means finding exceptional objects that deviate from the rest of the data set. Besides classical outlier analysis algorithms, recent studies have focused on mining local outliers, i.e., outliers whose density distribution differs significantly from that of their neighborhood. The estimation of the density distribution at the location of an object has so far been based on the density distribution of its k-nearest neighbors [2,11]. However, when outliers lie where the density distributions in the neighborhood differ significantly, for example, for objects from a sparse cluster close to a denser cluster, this may result in a wrong estimation. To avoid this problem, here we propose a simple but effective measure of local outlierness based on a symmetric neighborhood relationship. The proposed measure considers both the neighbors and the reverse neighbors of an object when estimating its density distribution. As a result, the outliers so discovered are more meaningful. To compute such local outliers efficiently, several mining algorithms are developed that detect top-n outliers under our definition. A comprehensive performance evaluation and analysis shows that our methods are not only efficient in computation but also more effective in ranking outliers.

321 citations
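The measure described above combines an object's own density with the densities of its k-nearest neighbors and its reverse nearest neighbors. The Python snippet below is a minimal sketch of that idea only; the function name, the inverse-k-distance density, and the plain averaging over the influence set are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def influenced_outlierness(X, k=3):
    """Score each point by comparing its density with the densities of the
    points in its 'influence space' (its k nearest neighbors plus its
    reverse nearest neighbors). Larger scores suggest likelier outliers."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                 # exclude self-distances
    nn = np.argsort(d, axis=1)[:, :k]           # k nearest neighbors of each point
    k_dist = d[np.arange(len(X)), nn[:, -1]]    # distance to the k-th neighbor
    density = 1.0 / k_dist                      # simple inverse-k-distance density
    scores = np.empty(len(X))
    for i in range(len(X)):
        reverse = {j for j in range(len(X)) if i in nn[j]}   # reverse neighbors of i
        influence = set(nn[i].tolist()) | reverse
        scores[i] = np.mean([density[j] for j in influence]) / density[i]
    return scores

# Toy usage: the isolated point gets a clearly larger score than the blob points.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [3.0, 3.0]])
print(influenced_outlierness(X, k=2))
```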

Journal Article
TL;DR: A shared-nearest-neighbor-based clustering by fast search and find of density peaks (SNN-DPC) algorithm is proposed that can recognize clusters regardless of their size, shape, and dimensionality; is robust to noise; and is remarkably superior to DPC, FKNN-DPC, AP, OPTICS, DBSCAN, and K-means.

245 citations
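The ingredient named in the TL;DR is shared-nearest-neighbor similarity: two points count as similar when their k-nearest-neighbor lists overlap heavily. The Python sketch below shows only that ingredient; the function name and the plain overlap count are simplifications assumed for illustration, not the SNN-DPC paper's exact definition.

```python
import numpy as np

def snn_similarity(X, k=5):
    """Shared-nearest-neighbor similarity matrix: entry (i, j) counts how many
    of the k nearest neighbors points i and j have in common."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                          # exclude self-distances
    nn = [set(row.tolist()) for row in np.argsort(d, axis=1)[:, :k]]
    n = len(X)
    sim = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(i + 1, n):
            sim[i, j] = sim[j, i] = len(nn[i] & nn[j])   # size of the shared-neighbor set
    return sim

# Toy usage: points inside the same tight group share most of their neighbors.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.1, (5, 2)), rng.normal(5, 0.1, (5, 2))])
print(snn_similarity(X, k=3))
```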

Journal Article
TL;DR: The idea of K-nearest neighbors is introduced to compute the global parameter dc and the local density ρi of each point; a new approach is applied to select initial cluster centers automatically; and clusters are finally aggregated if they are density reachable.
Abstract: Recently, a density-peaks-based clustering algorithm (dubbed DPC) was proposed to group data by setting up a decision graph and quickly finding cluster centers from the graph. It is simple yet efficient, since it is noniterative and needs few parameters. However, an improper selection of its cutoff-distance parameter dc leads to the wrong selection of initial cluster centers, and DPC cannot correct this in the subsequent assignment process. Furthermore, in some cases, even when a proper value of dc is set, initial cluster centers are still difficult to select from the decision graph. To overcome these defects, an adaptive clustering algorithm (named ADPC-KNN) is proposed in this paper. We introduce the idea of K-nearest neighbors to compute the global parameter dc and the local density ρi of each point, apply a new approach to select initial cluster centers automatically, and finally aggregate clusters if they are density reachable. ADPC-KNN requires only one parameter, and the clustering is automatic. Experiments on synthetic and real-world data show that the proposed clustering algorithm can often outperform DBSCAN, DPC, K-Means++, Expectation Maximization (EM), and single-link.

164 citations
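The abstract above rests on two DPC quantities: a local density ρi, here computed from the K nearest neighbors, and (from the original DPC decision graph) the distance δi to the nearest point of higher density, from which cluster-center candidates are read off. The Python sketch below illustrates those two quantities under assumed formulas (an exponential of the mean KNN distance for ρi); it is not the ADPC-KNN paper's exact definition and omits the automatic center selection and the density-reachability merging step.

```python
import numpy as np

def knn_rho_delta(X, k=5):
    """Compute DPC-style quantities with a KNN-based density:
       rho[i]   - local density from the mean distance to the k nearest neighbors
       delta[i] - distance to the nearest point with higher density
    Points with both large rho and large delta are cluster-center candidates."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    knn_d = np.sort(d, axis=1)[:, :k]          # distances to the k nearest neighbors
    rho = np.exp(-knn_d.mean(axis=1))          # denser neighborhood -> larger rho
    delta = np.empty(len(X))
    for i in range(len(X)):
        higher = np.where(rho > rho[i])[0]
        if len(higher):
            delta[i] = d[i, higher].min()      # nearest point of higher density
        else:
            delta[i] = d[i][np.isfinite(d[i])].max()   # densest point: use its largest distance
    return rho, delta

# Toy usage: two blobs; the two points with largest rho * delta are the center candidates.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.2, (20, 2)), rng.normal(4, 0.2, (20, 2))])
rho, delta = knn_rho_delta(X, k=5)
print(np.argsort(rho * delta)[-2:])
```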

Journal Article
TL;DR: This Review provides a comprehensive overview of the methods of unsupervised learning that have been most commonly used to investigate simulation data and indicates likely directions for further developments in the field.
Abstract: Unsupervised learning is becoming an essential tool for analyzing the increasingly large amounts of data produced by atomistic and molecular simulations in materials science, solid state physics, biophysics, and biochemistry. In this Review, we provide a comprehensive overview of the methods of unsupervised learning that have been most commonly used to investigate simulation data and indicate likely directions for further developments in the field. In particular, we discuss feature representation of molecular systems and present state-of-the-art algorithms for dimensionality reduction, density estimation, and clustering, as well as kinetic models. We divide our discussion into self-contained sections, each discussing a specific method. In each section, we briefly touch upon the mathematical and algorithmic foundations of the method, highlight its strengths and limitations, and describe the specific ways in which it has been used, or can be used, to analyze molecular simulation data.

144 citations

Journal Article
TL;DR: To identify disease symptoms more accurately, a Density-Peaked Clustering Analysis (DPCA) algorithm is introduced for disease-symptom clustering, and the proposed Disease Diagnosis and Treatment Recommendation System (DDTRS) derives disease treatment recommendations intelligently and accurately.

107 citations