Author

Hongchao Gao

Bio: Hongchao Gao is an academic researcher from Shaanxi Normal University. The author has contributed to research in the topics of the nearest-neighbor chain algorithm and k-medians clustering, has an h-index of 1, and has co-authored 1 publication receiving 170 citations.

Papers
Journal Article
TL;DR: The experimental results demonstrate that the proposed clustering algorithm can find cluster centers, recognize clusters regardless of their shape and of the dimension of the space in which they are embedded, remain unaffected by outliers, and often outperform DPC, AP, DBSCAN, and K-means.

272 citations


Cited by
Journal Article
TL;DR: In this article, the authors proposed a measure of local outlierness based on a symmetric neighborhood relationship, which considers both the neighbors and the reverse neighbors of an object when estimating its density distribution.
Abstract: Mining outliers in a database means finding exceptional objects that deviate from the rest of the data set. Besides classical outlier analysis algorithms, recent studies have focused on mining local outliers, i.e., outliers whose density distribution differs significantly from that of their neighborhood. The estimation of the density distribution at the location of an object has so far been based on the density distribution of its k-nearest neighbors [2,11]. However, when outliers lie where the density distributions in the neighborhood differ significantly, for example, for objects from a sparse cluster close to a denser cluster, this may result in a wrong estimation. To avoid this problem, here we propose a simple but effective measure of local outlierness based on a symmetric neighborhood relationship. The proposed measure considers both the neighbors and the reverse neighbors of an object when estimating its density distribution. As a result, the outliers so discovered are more meaningful. To compute such local outliers efficiently, several mining algorithms are developed that detect top-n outliers under our definition. A comprehensive performance evaluation and analysis shows that our methods are not only efficient in computation but also more effective in ranking outliers.

321 citations
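The measure described above combines an object's own density with the densities of its k-nearest neighbors and its reverse nearest neighbors. The Python snippet below is a minimal sketch of that idea only; the function name, the inverse-k-distance density, and the plain averaging over the influence set are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def influenced_outlierness(X, k=3):
    """Score each point by comparing its density with the densities of the
    points in its 'influence space' (its k nearest neighbors plus its
    reverse nearest neighbors). Larger scores suggest likelier outliers."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                 # exclude self-distances
    nn = np.argsort(d, axis=1)[:, :k]           # k nearest neighbors of each point
    k_dist = d[np.arange(len(X)), nn[:, -1]]    # distance to the k-th neighbor
    density = 1.0 / k_dist                      # simple inverse-k-distance density
    scores = np.empty(len(X))
    for i in range(len(X)):
        reverse = {j for j in range(len(X)) if i in nn[j]}   # reverse neighbors of i
        influence = set(nn[i].tolist()) | reverse
        scores[i] = np.mean([density[j] for j in influence]) / density[i]
    return scores

# Toy usage: the isolated point gets a clearly larger score than the blob points.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [3.0, 3.0]])
print(influenced_outlierness(X, k=2))
```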

Journal Article
TL;DR: A shared-nearest-neighbor-based clustering by fast search and find of density peaks (SNN-DPC) algorithm is proposed that can recognize clusters regardless of their size, shape, and dimensionality; is robust to noise; and is remarkably superior to DPC, FKNN-DPC, AP, OPTICS, DBSCAN, and K-means.

245 citations
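The ingredient named in the TL;DR is shared-nearest-neighbor similarity: two points count as similar when their k-nearest-neighbor lists overlap heavily. The Python sketch below shows only that ingredient; the function name and the plain overlap count are simplifications assumed for illustration, not the SNN-DPC paper's exact definition.

```python
import numpy as np

def snn_similarity(X, k=5):
    """Shared-nearest-neighbor similarity matrix: entry (i, j) counts how many
    of the k nearest neighbors points i and j have in common."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                          # exclude self-distances
    nn = [set(row.tolist()) for row in np.argsort(d, axis=1)[:, :k]]
    n = len(X)
    sim = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(i + 1, n):
            sim[i, j] = sim[j, i] = len(nn[i] & nn[j])   # size of the shared-neighbor set
    return sim

# Toy usage: points inside the same tight group share most of their neighbors.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.1, (5, 2)), rng.normal(5, 0.1, (5, 2))])
print(snn_similarity(X, k=3))
```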

Journal Article
TL;DR: The idea of K-nearest neighbors is introduced to compute the global parameter dc and the local density ρi of each point; a new approach is applied to select initial cluster centers automatically; and clusters are finally aggregated if they are density reachable.
Abstract: Recently, a density-peaks-based clustering algorithm (dubbed DPC) was proposed to group data by setting up a decision graph and quickly finding cluster centers from the graph. It is simple yet efficient, since it is noniterative and needs few parameters. However, an improper selection of its cutoff-distance parameter dc leads to the wrong selection of initial cluster centers, and DPC cannot correct this in the subsequent assignment process. Furthermore, in some cases, even when a proper value of dc is set, initial cluster centers are still difficult to select from the decision graph. To overcome these defects, an adaptive clustering algorithm (named ADPC-KNN) is proposed in this paper. We introduce the idea of K-nearest neighbors to compute the global parameter dc and the local density ρi of each point, apply a new approach to select initial cluster centers automatically, and finally aggregate clusters if they are density reachable. ADPC-KNN requires only one parameter, and the clustering is automatic. Experiments on synthetic and real-world data show that the proposed clustering algorithm can often outperform DBSCAN, DPC, K-Means++, Expectation Maximization (EM), and single-link.

164 citations
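The abstract above rests on two DPC quantities: a local density ρi, here computed from the K nearest neighbors, and (from the original DPC decision graph) the distance δi to the nearest point of higher density, from which cluster-center candidates are read off. The Python sketch below illustrates those two quantities under assumed formulas (an exponential of the mean KNN distance for ρi); it is not the ADPC-KNN paper's exact definition and omits the automatic center selection and the density-reachability merging step.

```python
import numpy as np

def knn_rho_delta(X, k=5):
    """Compute DPC-style quantities with a KNN-based density:
       rho[i]   - local density from the mean distance to the k nearest neighbors
       delta[i] - distance to the nearest point with higher density
    Points with both large rho and large delta are cluster-center candidates."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    knn_d = np.sort(d, axis=1)[:, :k]          # distances to the k nearest neighbors
    rho = np.exp(-knn_d.mean(axis=1))          # denser neighborhood -> larger rho
    delta = np.empty(len(X))
    for i in range(len(X)):
        higher = np.where(rho > rho[i])[0]
        if len(higher):
            delta[i] = d[i, higher].min()      # nearest point of higher density
        else:
            delta[i] = d[i][np.isfinite(d[i])].max()   # densest point: use its largest distance
    return rho, delta

# Toy usage: two blobs; the two points with largest rho * delta are the center candidates.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.2, (20, 2)), rng.normal(4, 0.2, (20, 2))])
rho, delta = knn_rho_delta(X, k=5)
print(np.argsort(rho * delta)[-2:])
```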

Journal Article
TL;DR: This Review provides a comprehensive overview of the methods of unsupervised learning that have been most commonly used to investigate simulation data and indicates likely directions for further developments in the field.
Abstract: Unsupervised learning is becoming an essential tool for analyzing the increasingly large amounts of data produced by atomistic and molecular simulations in materials science, solid state physics, biophysics, and biochemistry. In this Review, we provide a comprehensive overview of the methods of unsupervised learning that have been most commonly used to investigate simulation data and indicate likely directions for further developments in the field. In particular, we discuss feature representation of molecular systems and present state-of-the-art algorithms for dimensionality reduction, density estimation, and clustering, as well as kinetic models. We divide our discussion into self-contained sections, each discussing a specific method. In each section, we briefly touch upon the mathematical and algorithmic foundations of the method, highlight its strengths and limitations, and describe the specific ways in which it has been used, or can be used, to analyze molecular simulation data.

144 citations

Journal Article
TL;DR: To identify disease symptoms more accurately, a Density-Peaked Clustering Analysis (DPCA) algorithm is introduced for disease-symptom clustering, and the proposed Disease Diagnosis and Treatment Recommendation System (DDTRS) derives disease treatment recommendations intelligently and accurately.

107 citations