Author

# Sungsu Lim

Other affiliations: KAIST

Bio: Sungsu Lim is an academic researcher from Chungnam National University. The author has contributed to research in topics: Computer science & Graph (abstract data type). The author has an h-index of 8 and has co-authored 23 publications receiving 198 citations. Previous affiliations of Sungsu Lim include KAIST.

##### Papers

19 May 2014

TL;DR: A novel framework of the link-space transformation is proposed that transforms a given original graph into a link-space graph, facilitating overlapping community detection and improving the resulting quality, and the algorithm LinkSCAN, which performs structural clustering on the link-space graph, is developed.

Abstract: In this paper, for overlapping community detection, we propose a novel framework of the link-space transformation that transforms a given original graph into a link-space graph. Its unique idea is to consider topological structure and link similarity separately using two distinct types of graphs: the line graph and the original graph. For topological structure, each link of the original graph is mapped to a node of the link-space graph, which enables us to discover overlapping communities using non-overlapping community detection algorithms as in the line graph. For link similarity, it is calculated on the original graph and carried over into the link-space graph, which enables us to keep the original structure on the transformed graph. Thus, our transformation, by combining these two advantages, facilitates overlapping community detection as well as improves the resulting quality. Based on this framework, we develop the algorithm LinkSCAN that performs structural clustering on the link-space graph. Moreover, we propose the algorithm LinkSCAN* that enhances the efficiency of LinkSCAN by sampling. Extensive experiments were conducted using the LFR benchmark networks as well as some real-world networks. The results show that our algorithms achieve higher accuracy, quality, and coverage than the state-of-the-art algorithms.
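The core of the transformation can be sketched as follows. This is a minimal illustration, not the authors' LinkSCAN implementation: the Jaccard similarity over closed neighborhoods and the threshold-based link clustering are simplifying assumptions standing in for the paper's similarity measure and structural clustering.

```python
from itertools import combinations

def build_adj(edges):
    g = {}
    for u, v in edges:
        g.setdefault(u, set()).add(v)
        g.setdefault(v, set()).add(u)
    return g

def jaccard(g, u, v):
    # Link similarity measured on the ORIGINAL graph (closed neighborhoods).
    nu, nv = g[u] | {u}, g[v] | {v}
    return len(nu & nv) / len(nu | nv)

def link_space(edges):
    # Each link of the original graph becomes a node of the link-space graph.
    # Two link-nodes sharing an endpoint are connected, weighted by the
    # similarity of their non-shared endpoints computed on the original graph.
    g = build_adj(edges)
    links = [tuple(sorted(e)) for e in edges]
    w = {}
    for a, b in combinations(links, 2):
        if len(set(a) & set(b)) == 1:
            x, y = sorted(set(a) ^ set(b))
            w[(a, b)] = jaccard(g, x, y)
    return links, w

def overlapping_communities(edges, threshold=0.5):
    # Non-overlapping clustering of link-nodes (here: connected components
    # above a similarity threshold) induces OVERLAPPING node communities,
    # since a node belongs to every community its incident links fall into.
    links, w = link_space(edges)
    adj = {l: set() for l in links}
    for (a, b), s in w.items():
        if s >= threshold:
            adj[a].add(b)
            adj[b].add(a)
    seen, comms = set(), []
    for l in links:
        if l in seen:
            continue
        stack, comp = [l], set()
        while stack:
            cur = stack.pop()
            if cur in seen:
                continue
            seen.add(cur)
            comp.add(cur)
            stack.extend(adj[cur] - seen)
        comms.append({n for link in comp for n in link})
    return comms

# Two triangles sharing node 3: node 3 should belong to both communities.
bowtie = [(1, 2), (1, 3), (2, 3), (3, 4), (3, 5), (4, 5)]
comms = overlapping_communities(bowtie)
```

On the bowtie graph, the link-nodes split into two clusters, and node 3 appears in both induced communities, which is exactly the overlap a line-graph-style transformation makes reachable with non-overlapping clustering.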

72 citations

23 Aug 2020

TL;DR: SSumM is a scalable and effective graph-summarization algorithm that yields a sparse summary graph: it not only merges nodes but also sparsifies the summary graph, and the two strategies are carefully balanced based on the minimum description length principle.

Abstract: Given a graph G and the desired size k in bits, how can we summarize G within k bits, while minimizing the information loss? Large-scale graphs have become omnipresent, posing considerable computational challenges. Analyzing such large graphs can be fast and easy if they are compressed sufficiently to fit in main memory or even cache. Graph summarization, which yields a coarse-grained summary graph with merged nodes, stands out with several advantages among graph compression techniques. Thus, a number of algorithms have been developed for obtaining a concise summary graph with little information loss or equivalently small reconstruction error. However, the existing methods focus solely on reducing the number of nodes, and they often yield dense summary graphs, failing to achieve better compression rates. Moreover, due to their limited scalability, they can be applied only to moderate-size graphs. In this work, we propose SSumM, a scalable and effective graph-summarization algorithm that yields a sparse summary graph. SSumM not only merges nodes together but also sparsifies the summary graph, and the two strategies are carefully balanced based on the minimum description length principle. Compared with state-of-the-art competitors, SSumM is (a) Concise: yields up to 11.2X smaller summary graphs with similar reconstruction error, (b) Accurate: achieves up to 4.2X smaller reconstruction error with similarly concise outputs, and (c) Scalable: summarizes 26X larger graphs while exhibiting linear scalability. We validate these advantages through extensive experiments on 10 real-world graphs.
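The MDL balance between summary size and correction cost can be illustrated with a toy cost function. The bit prices below are an assumed encoding for illustration, not SSumM's actual one:

```python
import math
from itertools import combinations

def mdl_cost(edges, partition, n):
    """Illustrative MDL-style description length (NOT SSumM's exact encoding):
    a superedge costs 2*log2(k) bits for its two endpoints; every correction
    edge (a node pair the summary reconstructs wrongly) costs 2*log2(n) bits."""
    k = len(partition)
    label = {v: i for i, block in enumerate(partition) for v in block}
    between = {}
    for u, v in edges:
        key = tuple(sorted((label[u], label[v])))
        between[key] = between.get(key, 0) + 1
    blocks = [set(b) for b in partition]
    bits = 0.0
    for i in range(k):
        for j in range(i, k):
            pairs = (len(blocks[i]) * (len(blocks[i]) - 1) // 2 if i == j
                     else len(blocks[i]) * len(blocks[j]))
            if pairs == 0:
                continue
            actual = between.get((i, j), 0)
            # Keep the superedge and correct the missing pairs, or drop it and
            # correct the present pairs, whichever is cheaper: the kind of
            # balance the MDL principle formalizes.
            keep = 2 * math.log2(max(k, 2)) + (pairs - actual) * 2 * math.log2(n)
            drop = actual * 2 * math.log2(n)
            bits += min(keep, drop)
    return bits

# Two 4-cliques joined by one edge: merging each clique into a supernode
# shortens the description dramatically compared with keeping all 8 nodes.
edges = (list(combinations(range(4), 2))
         + list(combinations(range(4, 8), 2)) + [(3, 4)])
singletons = [[v] for v in range(8)]
merged = [list(range(4)), list(range(4, 8))]
```

Under this toy encoding the singleton partition costs 78 bits while the merged two-supernode summary costs 10 bits, which is the intuition behind merging nodes (and dropping cheap-to-correct superedges) until the description stops shrinking.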

29 citations

02 Dec 2013

TL;DR: This paper proposes a generalization of the SIS model by allowing intermediate states between susceptible and infected states, and presents the analytical derivation and shows experimentally how different factors can affect the phase transition process and the final equilibrium.

Abstract: There is a growing interest in understanding the fundamental principles of how epidemics, ideas, or information spread over large networks (e.g., the Internet or online social networks). The conventional approach is to use SIS models (or their derivatives). However, these models are usually over-simplified and may not be applicable in realistic situations. In this paper, we propose a generalization of the SIS model by allowing intermediate states between the susceptible and infected states. To analyze the diffusion process on large graphs, we use the ``mean-field analysis technique'' to determine which initial conditions lead to or prevent an information or virus outbreak. Numerical results show our methodology can accurately predict the behavior of the phase-transition process for various large graphs (e.g., complete graphs, random graphs, or power-law graphs). We also extend our generalized SIS model to consider the interaction of two competing sources (i.e., competing products or virus-antidote modeling). We present the analytical derivation and show experimentally how different factors, i.e., transmission rates, recovery rates, the number of states, and the initial condition, can affect the phase transition process and the final equilibrium. Our models and methodology can serve as an essential tool in understanding information diffusion in large networks.
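The mean-field style of analysis can be sketched for a toy variant with a single intermediate state E between susceptible and infected. The paper's generalized model allows arbitrarily many intermediate states; the specific rates, the well-mixed (complete-graph) population, and the forward-Euler integration here are illustrative assumptions.

```python
def mean_field(beta, alpha, delta, i0, t_end=300.0, dt=0.1):
    """Toy mean-field ODEs for an SIS variant with one intermediate state E:
    S -> E on contact with I (rate beta), E -> I (rate alpha), I -> S (rate
    delta), on a well-mixed population. Forward-Euler integration."""
    s, e, i = 1.0 - i0, 0.0, i0
    for _ in range(int(t_end / dt)):
        inf = beta * s * i            # new exposures via infected contacts
        ds = -inf + delta * i         # infected recover back to susceptible
        de = inf - alpha * e          # intermediate state fills and drains
        di = alpha * e - delta * i    # intermediates become infected
        s, e, i = s + dt * ds, e + dt * de, i + dt * di
    return s, e, i

# Above the threshold (beta/delta > 1) the outbreak settles at a positive
# endemic level; below it, the infection dies out regardless of the seed.
s_hi, e_hi, i_high = mean_field(beta=0.6, alpha=0.5, delta=0.2, i0=0.01)
s_lo, e_lo, i_low = mean_field(beta=0.1, alpha=0.5, delta=0.3, i0=0.05)
```

The phase transition in this toy case sits at beta/delta = 1 (the intermediate state only delays the dynamics; at equilibrium s* = delta/beta), mirroring the kind of threshold analysis the paper carries out for its richer model.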

21 citations

16 May 2016

TL;DR: This paper proposes a novel community detection algorithm for undirected graphs, called BlackHole, which imports a geometric embedding technique from graph drawing and proves that a common idea in graph drawing improves the clusterability of an embedding.

Abstract: With regard to social network analysis, we concentrate on two widely-accepted building blocks: community detection and graph drawing. Although community detection and graph drawing have been studied separately, they have a great commonality, which means that it is possible to advance one field using the techniques of the other. In this paper, we propose a novel community detection algorithm for undirected graphs, called BlackHole, by importing a geometric embedding technique from graph drawing. Our proposed algorithm transforms the vertices of a graph to a set of points on a low-dimensional space whose coordinates are determined by a variant of graph drawing algorithms, following the overall procedure of spectral clustering. The set of points is then clustered using a conventional clustering algorithm to form communities. Our primary contribution is to prove that a common idea in graph drawing, which is characterized by consideration of repulsive forces in addition to attractive forces, improves the clusterability of an embedding. As a result, our algorithm has the advantage of being robust, especially when the community structure is not easily detectable. Through extensive experiments, we have shown that BlackHole achieves accuracy higher than or comparable to that of the state-of-the-art algorithms.
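The overall pipeline (embed vertices with a layout that combines attractive and repulsive forces, then run a conventional clustering algorithm on the embedded points) can be sketched as follows. The spring-electrical layout, its parameters, and the 2-means step are simplifying assumptions, not the actual BlackHole algorithm.

```python
import math
from itertools import combinations

def force_embed(n, edges, iters=300, step=0.05, rep=0.05):
    """Schematic spring-electrical embedding: attractive forces along edges
    plus repulsive forces between ALL pairs, the graph-drawing idea the paper
    argues improves the clusterability of the embedding."""
    pos = [[math.cos(2 * math.pi * k / n), math.sin(2 * math.pi * k / n)]
           for k in range(n)]
    adj = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    for _ in range(iters):
        disp = [[0.0, 0.0] for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                dx = pos[i][0] - pos[j][0]
                dy = pos[i][1] - pos[j][1]
                d2 = dx * dx + dy * dy + 1e-6
                # Repel every pair; additionally attract along edges.
                f = rep / d2 - (1.0 if j in adj[i] else 0.0)
                disp[i][0] += f * dx
                disp[i][1] += f * dy
        for i in range(n):
            pos[i][0] += step * disp[i][0]
            pos[i][1] += step * disp[i][1]
    return pos

def two_means(points, seeds, iters=50):
    # Conventional clustering on the embedded points (k = 2, fixed seeds).
    cent = [list(points[s]) for s in seeds]
    groups = [[], []]
    for _ in range(iters):
        groups = [[], []]
        for idx, p in enumerate(points):
            d = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in cent]
            groups[d.index(min(d))].append(idx)
        for g in (0, 1):
            if groups[g]:
                cent[g] = [sum(points[i][0] for i in groups[g]) / len(groups[g]),
                           sum(points[i][1] for i in groups[g]) / len(groups[g])]
    return groups

# Two 4-cliques joined by a single bridge edge: the layout contracts each
# clique into a tight blob while repulsion keeps the blobs apart.
edges = (list(combinations(range(4), 2))
         + list(combinations(range(4, 8), 2)) + [(3, 4)])
pos = force_embed(8, edges)
groups = two_means(pos, seeds=(0, 5))
```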

21 citations

KAIST

TL;DR: The approach of differential flattening, applied to community detection, leads to the discovery of higher-quality communities than baseline approaches and the state-of-the-art algorithms.

Abstract: A multi-layer graph consists of multiple layers of weighted graphs, where the multiple layers represent the different aspects of relationships. Considering multiple aspects (i.e., layers) together is essential to achieve a comprehensive and consolidated view. In this article, we propose a novel framework of differential flattening, which facilitates the analysis of multi-layer graphs, and apply this framework to community detection. Differential flattening merges multiple graphs into a single graph such that the graph structure with the maximum clustering coefficient is obtained from the single graph. It has two distinct features compared with existing approaches. First, dealing with multiple layers is done independently of a specific community detection algorithm, whereas previous approaches rely on a specific algorithm. Thus, any algorithm for a single graph becomes applicable to multi-layer graphs. Second, the contribution of each layer to the single graph is determined automatically for the maximum clustering coefficient. Since differential flattening is formulated as an optimization problem, the optimal solution is easily obtained by well-known algorithms such as interior point methods. Extensive experiments were conducted using the Lancichinetti-Fortunato-Radicchi (LFR) benchmark networks as well as the DBLP, 20 Newsgroups, and MIT Reality Mining networks. The results show that our approach of differential flattening leads to discovery of higher-quality communities than baseline approaches and the state-of-the-art algorithms.
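The idea of choosing layer weights to maximize the clustering coefficient of the flattened graph can be sketched as follows. The edge threshold and the grid search over the mixing weight are simplifying assumptions standing in for the paper's weighted formulation and interior point solver.

```python
from itertools import combinations

def transitivity(adj):
    # Global clustering coefficient: closed connected triples over all
    # connected triples.
    closed = triples = 0
    for v, nbrs in adj.items():
        k = len(nbrs)
        triples += k * (k - 1) // 2
        closed += sum(1 for a, b in combinations(sorted(nbrs), 2)
                      if b in adj.get(a, set()))
    return closed / triples if triples else 0.0

def flatten(layers, alpha, threshold=0.5):
    # Convex combination of two layers' edge indicators; keep edges whose
    # combined weight exceeds the threshold. (The paper works on the weighted
    # graph directly and optimizes the layer contributions exactly.)
    weight = {}
    for coef, layer in zip((alpha, 1.0 - alpha), layers):
        for u, v in layer:
            e = (min(u, v), max(u, v))
            weight[e] = weight.get(e, 0.0) + coef
    adj = {}
    for (u, v), w in weight.items():
        if w > threshold:
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
    return adj

layer1 = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]  # two triangles
layer2 = [(0, 1), (0, 2), (0, 3), (0, 4), (0, 5)]          # a star: no triangles
best_alpha = max((a / 4 for a in range(5)),
                 key=lambda a: transitivity(flatten([layer1, layer2], a)))
```

The search settles on a weight above 0.5, i.e., it automatically favors the triangle-rich layer, which is the behavior differential flattening obtains by optimization rather than enumeration.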

20 citations

##### Cited by

TL;DR: Some of the major results in random graphs and some of the more challenging open problems are reviewed, including those related to the WWW.

Abstract: We will review some of the major results in random graphs and some of the more challenging open problems. We will cover algorithmic and structural questions. We will touch on newer models, including those related to the WWW.

7,116 citations

01 Jan 2003

TL;DR: In this article, the authors propose a web of trust, in which each user maintains trust in a small number of other users and then composes these trust values into trust values for all other users.

Abstract: Though research on the Semantic Web has progressed at a steady pace, its promise has yet to be realized. One major difficulty is that, by its very nature, the Semantic Web is a large, uncensored system to which anyone may contribute. This raises the question of how much credence to give each source. We cannot expect each user to know the trustworthiness of each source, nor would we want to assign top-down or global credibility values due to the subjective nature of trust. We tackle this problem by employing a web of trust, in which each user maintains trusts in a small number of other users. We then compose these trusts into trust values for all other users. The result of our computation is not an agglomerate "trustworthiness" of each user. Instead, each user receives a personalized set of trusts, which may vary widely from person to person. We define properties for combination functions which merge such trusts, and define a class of functions for which merging may be done locally while maintaining these properties. We give examples of specific functions and apply them to data from Epinions and our BibServ bibliography server. Experiments confirm that the methods are robust to noise, and do not put unreasonable expectations on users. We hope that these methods will help move the Semantic Web closer to fulfilling its promise.
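One simple member of the class of local composition functions is max-product composition along trust paths, sketched below. The composition rule and the toy web of trust are illustrative assumptions, not the specific functions or data studied in the paper.

```python
import heapq

def personalized_trust(direct, source):
    """Compose direct trust values into personalized trust for all reachable
    users via max-product over paths (a Dijkstra-style traversal works because
    trust in [0, 1] can only shrink along a path)."""
    best = {source: 1.0}
    heap = [(-1.0, source)]
    while heap:
        neg, u = heapq.heappop(heap)
        t = -neg
        if t < best.get(u, 0.0):
            continue  # stale entry
        for v, w in direct.get(u, {}).items():
            nt = t * w
            if nt > best.get(v, 0.0):
                best[v] = nt
                heapq.heappush(heap, (-nt, v))
    return best

# Each user maintains trust in only a few others; composition extends it.
web = {
    "alice": {"bob": 0.9, "carol": 0.5},
    "bob":   {"dave": 0.8},
    "carol": {"dave": 0.9},
}
t = personalized_trust(web, "alice")
```

Note the personalization: alice's composed trust in dave (0.72, via bob) differs from carol's (0.9, direct), matching the paper's point that each user receives their own set of trust values rather than one global credibility score.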

567 citations

TL;DR: In this paper, a comprehensive set of user, structural, linguistic, and temporal features is examined and their relative strengths compared using near-complete data from Twitter, and a new rumor classification algorithm is proposed that achieves competitive accuracy over both short and long time windows.

Abstract: This study determines the major differences between rumors and non-rumors and explores rumor classification performance levels over varying time windows, from the first three days to nearly two months. A comprehensive set of user, structural, linguistic, and temporal features was examined and their relative strengths were compared using near-complete data from Twitter. Our contribution lies in providing deep insight into the cumulative spreading patterns of rumors over time as well as in tracking the precise changes in predictive power across rumor features. Statistical analysis finds that structural and temporal features distinguish rumors from non-rumors over a long-term window, yet they are not available during the initial propagation phase. In contrast, user and linguistic features are readily available and act as good indicators during the initial propagation phase. Based on these findings, we suggest a new rumor classification algorithm that achieves competitive accuracy over both short and long time windows. These findings provide new insights for explaining rumor mechanism theories and for identifying features for early rumor detection.

314 citations

TL;DR: In this article, a bipartite graph based data clustering method is proposed, in which terms and documents are simultaneously grouped into semantically meaningful clusters.

Abstract: Bipartite Graph Partitioning and Data Clustering (Hongyuan Zha, Xiaofeng He, Chris Ding, Horst Simon, Ming Gu; CIKM '01, November 5-10, 2001, Atlanta, Georgia, USA). Many data types arising from data mining applications can be modeled as bipartite graphs: examples include terms and documents in a text corpus, customers and purchased items in market basket analysis, and reviewers and movies in a movie recommender system. In this paper, we propose a new data clustering method based on partitioning the underlying bipartite graph. The partition is constructed by minimizing a normalized sum of edge weights between unmatched pairs of vertices of the bipartite graph. We show that an approximate solution to the minimization problem can be obtained by computing a partial singular value decomposition (SVD) of the associated edge weight matrix of the bipartite graph. We point out the connection of our clustering algorithm to correspondence analysis used in multivariate analysis. We also briefly discuss the issue of assigning data objects to multiple clusters. In the experimental results, we apply our clustering algorithm to the problem of document clustering to illustrate its effectiveness and efficiency.
Cluster analysis seeks to partition a given data set into compact clusters so that data objects within a cluster are more similar than those in distinct clusters. Many traditional clustering algorithms assume that the dataset consists of covariate information (or attributes) for each data object, so that clustering becomes a problem of grouping n-dimensional vectors. A familiar example is document clustering using the vector space model, where each document is an n-dimensional vector whose coordinates correspond to terms in a vocabulary of size n; this leads to the term-document matrix A = (a_ij), where a_ij is the term frequency, i.e., the number of times term i occurs in document j. In that model, terms and documents are treated asymmetrically, with terms as the attributes of documents. It is also possible to treat both as first-class citizens in a symmetric fashion and regard a_ij as the frequency of co-occurrence of term i and document j, as in probabilistic latent semantic indexing. We follow this principle and model terms and documents as vertices of a bipartite graph whose edges indicate co-occurrence, optionally weighted by its frequency. Cluster analysis then rests on an intuitive notion: documents in a topic tend to use the same subset of terms, which form a term cluster, while a topic is characterized by a subset of terms, and documents heavily using those terms tend to be about that topic. This interplay of terms and documents gives rise to what we call bi-clustering, by which terms and documents are simultaneously grouped into semantically coherent clusters. Our clustering algorithm computes an approximate globally optimal solution, whereas probabilistic latent semantic indexing relies on the EM algorithm and may be prone to local minima even with the help of an annealing process.
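The partial-SVD relaxation can be sketched on a toy term-document matrix. The matrix below and the sign-based rounding of the second singular vectors are illustrative assumptions; the paper's formulation covers general weight matrices and k-way extensions.

```python
import numpy as np

# Toy term-document matrix: terms 0-2 dominate docs 0-1, terms 3-4 dominate
# docs 2-3, with one noisy co-occurrence (term 2, doc 2) linking the blocks.
A = np.array([
    [2., 1., 0., 0.],
    [1., 2., 0., 0.],
    [1., 1., 1., 0.],
    [0., 0., 2., 1.],
    [0., 0., 1., 2.],
])

# Scaled matrix A_hat = D1^{-1/2} A D2^{-1/2}, as in the spectral relaxation
# of the normalized cut on the bipartite graph.
d1 = A.sum(axis=1)   # term degrees
d2 = A.sum(axis=0)   # document degrees
A_hat = A / np.sqrt(d1)[:, None] / np.sqrt(d2)[None, :]

# Partial SVD: the SECOND left/right singular vector pair gives the relaxed
# bipartition of terms and documents SIMULTANEOUSLY; rounding by sign yields
# the co-clusters.
U, s, Vt = np.linalg.svd(A_hat)
term_side = np.sign(U[:, 1])
doc_side = np.sign(Vt[1, :])
```

Terms and documents on the same side of the second singular vectors form one co-cluster; here the sign pattern recovers the two planted blocks despite the noisy cross edge.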

295 citations