(Open Access) Detection and Interpretation of Communities in Complex Networks: Practical Methods and Application (2012) | Vincent Labatut

HAL Id: hal-00633653

https://hal.archives-ouvertes.fr/hal-00633653v3

Submitted on 27 May 2012

Detection and Interpretation of Communities in Complex Networks: Methods and Practical Application

Vincent Labatut, Jean-Michel Balasque

To cite this version:

Vincent Labatut, Jean-Michel Balasque. Detection and Interpretation of Communities in Complex Networks: Methods and Practical Application. Computational Social Networks: Tools, Perspectives and Applications, Springer, pp.81-113, 2012, ⟨10.1007/978-1-4471-4048-1_4⟩. ⟨hal-00633653v3⟩

Detection and Interpretation of Communities in Complex Networks: Practical Methods and Application

Vincent Labatut and Jean-Michel Balasque

Summary Community detection, an important part of network analysis, has become a very popular field of research. This activity resulted in a profusion of community detection algorithms, all different in some not always clearly defined sense. This makes it very difficult to select an appropriate tool when facing the concrete task of having to identify and interpret groups of nodes, relative to a system of interest. In this chapter, we tackle this problem in a very practical way, from the user’s point of view. We first review community detection algorithms and characterize them in terms of the nature of the communities they detect. We then focus on the methodological tools one can use to analyze the obtained community structure, both in terms of topological features and nodal attributes. To be as concrete as possible, we use a real-world social network to illustrate the application of the presented tools, and give examples of interpretation of their results from a Business Science perspective.

Vincent Labatut
Galatasaray University, Computer Science Department, Çırağan Cad. No:36, 34357 Ortaköy/İstanbul, Turkey
e-mail: vlabatut@gsu.edu.tr

Jean-Michel Balasque
Galatasaray University, Business Science & Marketing Department, Çırağan Cad. No:36, 34357 Ortaköy/İstanbul, Turkey
e-mail: jmbalasque@gsu.edu.tr

1 Introduction

Network modeling has been used for years in many application fields: biological, social, technological, communication, information (see [1] for a very comprehensive review of applied studies). The necessity to focus on some subparts appeared quite soon, for instance in sociology [2], and this was initially performed manually, with a qualitative approach. However, this type of analysis changed radically during the last decades, with the coming of the information age. Technology provided scientists with means to store, access and take advantage of very large amounts of data (databases, internet, computing power). The analysis of very large networks became possible, provided appropriate techniques were used. Network analysis took a quantitative turn, which initiated a very creative phase, leading to the development of powerful tools.

Large real-world networks are characterized by a heterogeneous structure, which leads to particular properties. Various subfields of network analysis focus on different properties: efficiency of information propagation, robustness, stability, synchronization, etc. [1]. In particular, a heterogeneous distribution of links often leads to a so-called community structure [3]. A community roughly corresponds to a group of nodes more densely interconnected, relative to the rest of the network [4]. Note that this concept has been translated into different, more formal definitions, which we will review later in this document. The way such a structure can be interpreted obviously depends on the modeled system. However, independently of the nature of this system, the study of communities constitutes a mesoscopic analysis, complementary to the microscopic (node-wise) and macroscopic (network-wise) approaches one can also adopt. Because of this intermediate position, the community structure conveys some very important information, necessary for a good understanding of the system [5]. Consequently, detecting communities is an essential part of modern network analysis.

In this chapter, we focus on this task with a very practical and operational approach, and adopt the user’s point of view. In our opinion, someone willing to perform community detection on his data needs to answer three important questions: Which algorithms should I apply? How will I compare their results? How will I interpret the obtained communities? As stated before, networks are used in many application fields. However, modern community detection tools have not significantly penetrated certain research areas yet. We believe one of the reasons for this is the profusion of tools and the lack of information regarding their similarities and differences, which underlines the importance of our first question. Most articles present new community detection algorithms and compare them to existing ones, using real-world and artificially generated data. However, the algorithms are generally compared only in a quantitative way, using performance measures [6]. Yet, algorithms rely on different formal definitions of what a community is. It therefore seems incomplete, or even unfair, to compare algorithms which do not actually try to detect the same objects. Moreover, once communities have been identified, one wants to give them a meaning relative to the studied system, and this task is largely dependent on the selected algorithm.

We aim at offering the user the information he needs to determine which algorithms are adapted to his data, apply and compare them, and interpret their results in meaningful terms, relative to the applicative context. As an illustration, we will apply the described methods to some data describing a sample of university students. These data were gathered during a survey performed at Galatasaray University in Istanbul, Turkey [7]. Its goal was to retrieve the information needed to extract a network representing the students’ social interactions, and perform an analysis of their purchasing behavior. Thus, besides the social network itself, the data include a whole set of nodal attributes describing factual (age, gender, club membership, etc.), behavioral (perceived actions in terms of human interaction and purchasing behavior) and sentimental (personal thoughts and feelings relative to university, friends, desires, favorite brands, etc.) information. In this chapter, however, we do not mean to conduct an exhaustive analysis of these data, but simply to use them as a practical example (cf. [7] for the details regarding the survey and this analysis).

The rest of this chapter is organized as follows. Section two is dedicated to community detection algorithms: we describe their properties, how to compare their results, and how to select the most relevant community structure. In the third section, we show different types of analysis oriented towards the interpretation of the community structure. We focus on different methods for characterizing communities, based on both topological information and nodal attributes. Finally, we conclude by mentioning alternative methods which we could not describe in detail.

2 Community Detection Process

Our goal in this section is first to review the existing community detection methods from the user’s perspective. Usually, these algorithms are presented from the author’s perspective, with emphasis on process, performance and computational cost [6]. However, the community detection problem is known to be ill-defined [3,8,9,5], which is why so many different algorithms exist: they do not define the concept of community in the same formal way. Consequently, they do not necessarily detect the same communities. Under these conditions, comparing raw performances obtained from different algorithms seems of little relevance.

We think the final user is basically interested in three properties. First, the type of information the algorithm is able to process. Indeed, there are various ways of describing a network and one can embed different sorts of data: link attributes (weights, directions), node attributes, different classes of links (multiplex networks) or nodes (n-mode or multipartite networks), temporal information, etc. The user may want to select a method able to take advantage of all the available data. In this chapter, we decided to focus on plain networks, with simple links.

Second, the kind of community structure the algorithm produces. One generally distinguishes partitions and covers, i.e. mutually exclusive and overlapping communities, respectively. We decided to focus on the former, because only a few algorithms are able to identify covers so far. Most algorithms output a single partition, but some of them are able to produce a collection of community structures estimated for different granularities. In the case of hierarchical algorithms, communities belonging to neighboring granularities are hierarchically related. At a given level, communities may correspond to the merging of several lower-level communities, while themselves being part of larger communities at the upper level. Multiresolution methods also estimate the community structure at different granularities, but without looking specifically for hierarchical relationships between them. They either automatically scan various scales or allow the user to specify them parametrically [10].
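The distinction between partitions and covers can be illustrated with a short Python sketch. It assumes a community structure is given as a list of node sets and that every node must belong to at least one community; the function name `classify_structure` and the `"incomplete"` label are hypothetical, introduced only for this example.

```python
from itertools import combinations


def classify_structure(nodes, communities):
    """Classify a list of node sets as 'partition', 'cover' or 'incomplete'.

    Assumption: every node must belong to at least one community,
    otherwise the structure is reported as 'incomplete'.
    """
    nodes = set(nodes)
    communities = [set(c) for c in communities]

    # Union of all communities: must match the node set exactly.
    covered = set().union(*communities) if communities else set()
    if covered != nodes:
        return "incomplete"

    # Any shared node between two communities means they overlap.
    overlapping = any(a & b for a, b in combinations(communities, 2))
    return "cover" if overlapping else "partition"


# Mutually exclusive communities form a partition.
disjoint = classify_structure(range(5), [{0, 1}, {2, 3, 4}])

# Node 2 belongs to both communities, so this is a cover.
shared = classify_structure(range(5), [{0, 1, 2}, {2, 3, 4}])
```

Representing communities as sets keeps the overlap test to a single intersection per pair, which is enough for small examples; a membership table would scale better for large networks.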

Third, the nature of the communities the algorithm is able to identify. As stated before, there are many ways to define formally what a community is. Yet this concept is at the center of the analysis, and is therefore of utmost importance. The user should select his tool mainly based on this feature.

In order to give the user all the information he needs, we reviewed community detection methods according to the three properties we mentioned. Note that excellent reviews exist, which describe in great detail the points we chose to ignore here [3,8,9,11]. The rest of the section is more practical. We present a list of publicly available tools and summarize their features in the previously mentioned terms. We then consider the very common case where one could estimate several community structures for a network of interest. We present various ways to tackle the problem of selecting the most appropriate community structure depending on the user’s criteria and objectives.

2.1 Concept of Community

A very widespread informal definition of the community concept considers it as a group of nodes densely interconnected compared to the rest of the network [3,8,9,11]. In other terms, a community is a cohesive subset clearly separated from the rest of the network. Formal interpretations try to formalize and combine both these aspects of cohesion and separation. Note that this definition is not always explicit: procedural approaches exist, in which the notion of community is implicitly defined as the result of the processing. Although it is not always straightforward to categorize the definitions, we group them into four classes: density-, pattern-, node similarity- and link centrality-based approaches. The last subsection is dedicated to methods which did not fit in the previous definitions.

2.1.1 Density

A whole family of formalizations is based on a direct translation of the informal community definition given above. The general approach consists first in specifying two distinct measures to assess cohesion and separation separately, and then in defining a global measure by considering their difference or ratio. For instance, Mancoridis et al. [12] defined their intra-connectivity and inter-connectivity to measure the cohesion and separation of a community, respectively. The former is simply the regular density computed when considering only the links located inside a community, i.e. connecting two nodes belonging to the community. The latter is the density computed when considering only the links between a pair of communities. Let us note $n_i$ the number of nodes in community $C_i$, and $m_{ij}$ the number of links between communities $C_i$ and $C_j$. Then, for an undirected network, the intra-connectivity of community $C_i$ is:
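Reading "regular density" as the number of links divided by the number of possible node pairs (an assumption on our part), the intra-connectivity of a community with $n$ nodes and $m$ internal links in an undirected network without self-loops would be $m / (n(n-1)/2)$, and the inter-connectivity of two communities of sizes $n_a$ and $n_b$ joined by $m_{ab}$ links would be $m_{ab} / (n_a n_b)$. The following Python sketch computes both quantities under these assumptions; the function name `connectivity` and the input layout (an edge list plus a node-to-community dictionary) are hypothetical.

```python
from collections import Counter


def connectivity(edges, membership):
    """Return (intra, inter) density dictionaries for an undirected simple graph.

    edges: iterable of (u, v) pairs, each undirected link listed once.
    membership: dict mapping every node to its community label.
    Assumption: density = observed links / possible node pairs.
    """
    sizes = Counter(membership.values())  # number of nodes per community
    links = Counter()                     # link counts per sorted label pair
    for u, v in edges:
        key = tuple(sorted((membership[u], membership[v])))
        links[key] += 1

    # Intra-connectivity: internal links over possible internal pairs.
    intra = {}
    for label, n in sizes.items():
        pairs = n * (n - 1) / 2
        intra[label] = links[(label, label)] / pairs if pairs else 0.0

    # Inter-connectivity: links between two communities over possible pairs.
    inter = {}
    labels = sorted(sizes)
    for i, a in enumerate(labels):
        for b in labels[i + 1:]:
            inter[(a, b)] = links[(a, b)] / (sizes[a] * sizes[b])
    return intra, inter


# Two triangles joined by a single link between nodes 2 and 3.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
membership = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
intra, inter = connectivity(edges, membership)
# Each triangle is fully connected (intra-connectivity 1.0), while only
# 1 of the 9 possible links between the two communities exists.
```

A cohesive and well-separated community structure shows up here as high intra-connectivity values and low inter-connectivity values, which is the contrast the difference or ratio mentioned above is meant to capture.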
