Graph clustering based on structural/attribute similarities

doi:10.14778/1687627.1687709

Journal Article

Vol. 2, Iss. 1, pp. 718-729

TL;DR

This paper proposes a novel graph clustering algorithm, SA-Cluster, based on both structural and attribute similarities through a unified distance measure, which partitions a large graph associated with attributes into k clusters so that each cluster contains a densely connected subgraph with homogeneous attribute values.

Abstract:

The goal of graph clustering is to partition vertices in a large graph into different clusters based on various criteria such as vertex connectivity or neighborhood similarity. Graph clustering techniques are very useful for detecting densely connected groups in a large graph. Many existing graph clustering methods focus mainly on the topological structure, but largely ignore the vertex properties, which are often heterogeneous. In this paper, we propose a novel graph clustering algorithm, SA-Cluster, based on both structural and attribute similarities through a unified distance measure. Our method partitions a large graph associated with attributes into k clusters so that each cluster contains a densely connected subgraph with homogeneous attribute values. An effective method is proposed to automatically learn the degree of contributions of structural similarity and attribute similarity. Theoretical analysis is provided to show that SA-Cluster converges. Extensive experimental results demonstrate the effectiveness of SA-Cluster through comparison with state-of-the-art graph clustering and summarization methods.
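The idea of a unified distance that mixes structural and attribute similarity can be illustrated with a small Python sketch. This is not the SA-Cluster algorithm itself: the abstract does not spell out its distance measure or its weight-learning step, so the sketch swaps in simple stand-ins. It assumes Jaccard similarity of closed neighborhoods for structure, the fraction of matching attribute values for attributes, a fixed hand-chosen weight `alpha` (the paper learns the contributions automatically), and a plain k-medoids loop for the partitioning. All function names and the toy graph are hypothetical.

```python
def structural_sim(adj, u, v):
    """Jaccard similarity of closed neighborhoods (each vertex includes itself)."""
    nu, nv = adj[u] | {u}, adj[v] | {v}
    return len(nu & nv) / len(nu | nv)


def attribute_sim(attrs, u, v):
    """Fraction of attribute positions on which u and v agree."""
    a, b = attrs[u], attrs[v]
    return sum(x == y for x, y in zip(a, b)) / len(a)


def unified_distance(adj, attrs, u, v, alpha=0.5):
    """1 - weighted similarity; alpha is a fixed assumption, not a learned weight."""
    sim = alpha * structural_sim(adj, u, v) + (1 - alpha) * attribute_sim(attrs, u, v)
    return 1.0 - sim


def k_medoids(vertices, dist, k, max_iter=20):
    """Plain k-medoids with deterministic farthest-first initialization.

    Assumes the chosen medoids are pairwise distinct under `dist`,
    so that no cluster ends up empty.
    """
    medoids = [vertices[0]]
    while len(medoids) < k:
        # Next medoid: the vertex farthest from all medoids chosen so far.
        medoids.append(max(vertices, key=lambda v: min(dist(v, m) for m in medoids)))
    for _ in range(max_iter):
        # Assignment step: each vertex joins its nearest medoid.
        clusters = {m: [] for m in medoids}
        for v in vertices:
            clusters[min(medoids, key=lambda m: dist(v, m))].append(v)
        # Update step: the new medoid minimizes total distance within its cluster.
        new_medoids = [
            min(members, key=lambda c: sum(dist(c, x) for x in members))
            for members in clusters.values()
        ]
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return list(clusters.values())


# Toy graph: two triangles joined by the edge 2-3, one categorical attribute each.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = {v: set() for v in range(6)}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)
attrs = {0: ("db",), 1: ("db",), 2: ("db",), 3: ("ml",), 4: ("ml",), 5: ("ml",)}

clusters = k_medoids(
    list(adj), lambda u, v: unified_distance(adj, attrs, u, v, alpha=0.5), k=2
)
print(clusters)
```

On this toy input the two triangles, which are both densely connected and homogeneous in their attribute, end up in separate clusters. Setting `alpha` to 1 or 0 reduces the distance to a purely structural or purely attribute-based one, which is the trade-off the paper's learned weighting is meant to resolve.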

Citations

Journal Article

Graph embedding techniques, applications, and performance: A survey

Palash Goyal, +1 more

01 Jul 2018, Knowledge-Based Systems

TL;DR: A comprehensive and structured analysis of various graph embedding techniques proposed in the literature, together with an open-source Python library, GEM (Graph Embedding Methods, available at https://github.com/palash1992/GEM), which provides all presented algorithms within a unified interface to foster and facilitate research on the topic.

Journal Article

Toward Scalable Systems for Big Data Analytics: A Technology Tutorial

Han Hu, +3 more

24 Jun 2014, IEEE Access

TL;DR: This paper presents a systematic framework to decompose big data systems into four sequential modules, namely data generation, data acquisition, data storage, and data analytics, and presents the prevalent Hadoop framework for addressing big data challenges.

Book

Data Clustering: Algorithms and Applications

Charu C. Aggarwal, +1 more

TL;DR: Top researchers from around the world explore the characteristics of clustering problems in a variety of application areas and explain how to glean detailed insight from the clustering process, including how to verify the quality of the underlying clusters through supervision, human intervention, or the automated generation of alternative clusters.

Proceedings Article

Community Detection in Networks with Node Attributes

Jaewon Yang, +2 more

28 Jan 2014, arXiv: Social and Information Networks

TL;DR: This paper develops Communities from Edge Structure and Node Attributes (CESNA), an accurate and scalable algorithm for detecting overlapping communities in networks with node attributes. CESNA statistically models the interaction between the network structure and the node attributes, which leads to more accurate community detection as well as improved robustness in the presence of noise in the network structure.
