Detection and Interpretation of Communities in Complex Networks: Practical Methods and Application

HAL Id: hal-00633653
https://hal.archives-ouvertes.fr/hal-00633653v3
Submitted on 27 May 2012
Detection and Interpretation of Communities in
Complex Networks: Methods and Practical Application
Vincent Labatut, Jean-Michel Balasque
To cite this version:
Vincent Labatut, Jean-Michel Balasque. Detection and Interpretation of Communities in Complex
Networks: Methods and Practical Application. Computational Social Networks: Tools, Perspectives
and Applications, Springer, pp.81-113, 2012, ⟨10.1007/978-1-4471-4048-1_4⟩. ⟨hal-00633653v3⟩

Detection and Interpretation of Communities in
Complex Networks: Practical Methods and
Application
Vincent Labatut and Jean-Michel Balasque
Vincent Labatut
Galatasaray University, Computer Science Department, Çırağan Cad. No:36, 34357
Ortaköy/İstanbul, Turkey
e-mail: vlabatut@gsu.edu.tr
Jean-Michel Balasque
Galatasaray University, Business Science & Marketing Department, Çırağan Cad. No:36, 34357
Ortaköy/İstanbul, Turkey
e-mail: jmbalasque@gsu.edu.tr
Summary Community detection, an important part of network analysis, has become
a very popular field of research. This activity has resulted in a profusion of
community detection algorithms, which all differ in some not always clearly defined
sense. This makes it very difficult to select an appropriate tool when facing the
concrete task of identifying and interpreting groups of nodes relative to a system
of interest. In this chapter, we tackle this problem in a very practical way, from
the user’s point of view. We first review community detection algorithms and
characterize them in terms of the nature of the communities they detect. We then
focus on the methodological tools one can use to analyze the obtained community
structure, both in terms of topological features and nodal attributes. To be as
concrete as possible, we use a real-world social network to illustrate the
application of the presented tools, and give examples of interpretation of their
results from a Business Science perspective.
1 Introduction
Network modeling has been used for years in many application fields: biological,
social, technological, communication, information (see [1] for a very comprehensive
review of applied studies). The need to focus on some subparts of these networks
appeared quite early, for instance in sociology [2], and this analysis was initially
performed manually, with a qualitative approach. However, this type of analysis has
changed radically over the last decades, with the coming of the information age.
Technology has provided scientists with the means to store, access and take
advantage of very large amounts of data (databases, internet, computing power). The
analysis of very large networks became possible, provided appropriate techniques
were used. Network analysis took a quantitative turn, which initiated a very
creative phase, leading to the development of powerful tools.
Large real-world networks are characterized by a heterogeneous structure, which
leads to particular properties. Various subfields of network analysis focus on
different properties: efficiency of information propagation, robustness, stability,
synchronization, etc. [1]. In particular, a heterogeneous distribution of links
often leads to a so-called community structure [3]. A community roughly corresponds
to a group of nodes that are more densely interconnected with each other than with
the rest of the network [4]. Note that this concept has been translated into several
more formal definitions, which we will review later in this document. The way such a
structure can be interpreted obviously depends on the modeled system. However,
independently of the nature of this system, the study of communities constitutes a
mesoscopic analysis, complementary to the microscopic (node-wise) and macroscopic
(network-wise) approaches one can also adopt. Because of this intermediate position,
the community structure conveys some very important information, necessary for a
good understanding of the system [5]. Consequently, detecting communities is an
essential part of modern network analysis.
In this chapter, we focus on this task with a very practical and operational
approach, and adopt the user’s point of view. In our opinion, someone wishing to
perform community detection on his data needs to answer three important questions:
Which algorithms should I apply? How will I compare their results? How will I
interpret the obtained communities? As stated before, networks are used in many
application fields. However, modern community detection tools have not yet
significantly penetrated certain research areas. We believe one of the reasons for
this is the profusion of tools and the lack of information regarding their
similarities and differences, which underlines the importance of our first question.
Most articles present new community detection algorithms and compare them to
existing ones, using real-world and artificially generated data. However, the
algorithms are generally compared only in a quantitative way, by means of some
performance measures [6]. Yet, algorithms rely on different formal definitions of
what a community is. It therefore seems incomplete, or even unfair, to compare
algorithms which do not actually try to detect the same objects. Moreover, once
communities have been identified, one wants to give them a meaning relative to the
studied system, and this task is largely dependent on the selected algorithm.
We aim at offering the user the information he needs to determine which algorithms
are suited to his data, apply and compare them, and interpret their results in
meaningful terms, relative to the application context. As an illustration, we will
apply the described methods to some data describing a sample of university students.
These data were gathered during a survey performed at Galatasaray University in
Istanbul, Turkey [7]. The goal of the survey was to collect the information needed
to extract a network representing the students’ social interactions, and to perform
an analysis of their purchasing behavior. Thus, besides the social network itself,
the data include a whole set of nodal attributes describing factual (age, gender,
club membership, etc.), behavioral (perceived actions in terms of human interaction
and purchasing behavior) and sentimental (personal thoughts and feelings relative to
university, friends, desires, favorite brands, etc.) information. In this chapter,
however, we do not mean to conduct an exhaustive analysis of these data, but simply
to use them as a practical example (cf. [7] for the details regarding the survey and
this analysis).
The rest of this chapter is organized as follows. Section two is dedicated to
community detection algorithms: we describe their properties, how to compare their
results, and how to select the most relevant community structure. In the third
section, we show different types of analysis oriented towards the interpretation of
the community structure. We focus on different methods for characterizing
communities, based on both topological information and nodal attributes. Finally,
we conclude by mentioning alternative methods which we could not describe in detail.
2 Community Detection Process
Our goal in this section is first to review the existing community detection methods
from the user’s perspective. Usually, these algorithms are presented from the
author’s perspective, with emphasis on process, performance and computational cost
[6]. However, the community detection problem is known to be ill-defined [3,5,8,9],
which is why so many different algorithms exist: they do not define the concept of
community in the same formal way. Consequently, they do not necessarily detect the
same communities. Under these conditions, comparing the raw performances obtained by
different algorithms seems of little relevance.
We think the final user is basically interested in three properties. First, the type
of information the algorithm is able to process. Indeed, there are various ways of
describing a network, and one can embed different sorts of data: link attributes
(weights, directions), node attributes, different classes of links (multiplex
networks) or nodes (n-mode or multipartite networks), temporal information, etc. The
user may want to select a method able to take advantage of all the available data.
In this chapter, we decided to focus on plain networks, with simple links.
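As an illustration of what restricting richer data to a plain network can involve, here is a minimal Python sketch. It assumes that "plain" means undirected and unweighted, with no self-loops and no repeated links; this reading, the function name `to_plain_network` and the toy link list are illustrative assumptions rather than part of the chapter.

```python
def to_plain_network(links):
    """Reduce (source, target, weight) triples to an undirected,
    unweighted simple graph stored as a dict of neighbor sets.

    Assumption: weights and directions are simply discarded, and
    self-loops are dropped.
    """
    adjacency = {}
    for source, target, _weight in links:
        # Register both endpoints, even if the link itself is discarded.
        adjacency.setdefault(source, set())
        adjacency.setdefault(target, set())
        if source == target:
            continue  # ignore self-loops
        # Sets absorb duplicates, so a->b and b->a collapse to one link.
        adjacency[source].add(target)
        adjacency[target].add(source)
    return adjacency


# Hypothetical weighted, directed links between three nodes.
links = [("a", "b", 2.0), ("b", "a", 0.5), ("b", "c", 1.0), ("c", "c", 3.0)]
plain = to_plain_network(links)
```

Discarding attributes in this way loses information, which is why a user holding weighted or directed data may prefer an algorithm able to process it directly.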
Second, the kind of community structure the algorithm produces. One generally
distinguishes partitions and covers, i.e. mutually exclusive and overlapping
communities. We decided to focus on the former, because only a few algorithms are
able to identify covers so far. Most algorithms output a single partition, but some
of them are able to produce a collection of community structures estimated for
different granularities. In the case of hierarchical algorithms, communities
belonging to neighboring granularities are hierarchically related. At a given level,
communities may correspond to the merging of several lower-level communities, while
being themselves part of larger communities at the upper level. Multiresolution
methods also estimate the community structure at different granularities, but
without looking specifically for hierarchical relationships between them. They
either scan various scales automatically or allow the user to specify them through
parameters [10].
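The distinction between partitions, covers and hierarchically related levels can be made concrete with a short Python sketch. It assumes a community structure is stored as a list of node sets; the function names and the six-node example are hypothetical and only serve the illustration.

```python
def is_partition(communities, nodes):
    """True if every node belongs to exactly one community."""
    seen = set()
    for community in communities:
        if seen & community:
            return False  # some node is shared by two communities
        seen |= community
    return seen == set(nodes)


def is_cover(communities, nodes):
    """True if every node belongs to at least one community
    (overlaps between communities are allowed)."""
    return set().union(*communities) == set(nodes)


def is_refinement(fine, coarse):
    """True if each community of `fine` lies entirely inside one
    community of `coarse`, as in two levels of a hierarchy."""
    return all(any(f <= c for c in coarse) for f in fine)


nodes = set(range(1, 7))
lower = [{1, 2}, {3}, {4, 5}, {6}]      # finer granularity
upper = [{1, 2, 3}, {4, 5, 6}]          # coarser granularity
overlapping = [{1, 2, 3}, {3, 4, 5, 6}]  # node 3 belongs to both

checks = (
    is_partition(lower, nodes),
    is_partition(overlapping, nodes),
    is_cover(overlapping, nodes),
    is_refinement(lower, upper),
)
```

Under these definitions every partition is also a cover, and the output of a hierarchical algorithm is a sequence of partitions in which each level refines the next one. Multiresolution methods give no such guarantee between scales.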

Third, the nature of the communities the algorithm is able to identify. As stated
before, there are many ways to define formally what a community is. Yet this concept
is at the center of the analysis, and is therefore of utmost importance. The user
should select his tool mainly based on this feature.
In order to give the user all the information he needs, we reviewed community
detection methods according to the three properties we mentioned. Note that
excellent reviews exist, which describe in great detail the points we chose to
ignore here [3,8,9,11]. The rest of the section is more practical. We present a list
of publicly available tools and summarize their features in the previously mentioned
terms. We then consider the very common case where one could estimate several
community structures for a network of interest. We present various ways to tackle
the problem of selecting the most appropriate community structure depending on the
user’s criteria and objectives.
2.1 Concept of Community
A very widespread informal definition of the community concept considers it as a
group of nodes densely interconnected compared to the rest of the network
[3,8,9,11]. In other words, a community is a cohesive subset clearly separated from
the rest of the network. Formal definitions try to express and combine both these
aspects of cohesion and separation. Note that this definition is not always
explicit: procedural approaches exist, in which the notion of community is
implicitly defined as the result of the processing. Although it is not always
straightforward to categorize the definitions, we group them into four classes:
density-, pattern-, node similarity- and link centrality-based approaches. The last
subsection is dedicated to methods which do not fit into the previous classes.
2.1.1 Density
A whole family of formalizations is based on a direct translation of the informal
community definition given above. The general approach consists in first specifying
two distinct measures to assess cohesion and separation separately, and then
defining a global measure by considering their difference or ratio. For instance,
Mancoridis et al. [12] defined their intra-connectivity and inter-connectivity to
measure the cohesion and separation of a community, respectively. The former is
simply the regular density computed when considering only the links located inside a
community, i.e. connecting two nodes belonging to the community. The latter is the
density computed when considering only the links between a pair of communities. Let
us denote by $n_i$ the number of nodes in community $C_i$, and by $m_{ij}$ the
number of links between communities $C_i$ and $C_j$. Then, for an undirected
network, the intra-connectivity of community $C_i$ is:
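Both measures can be sketched in Python, assuming the usual density of an undirected simple graph: the number of links present divided by the number of possible node pairs. Under that assumption, there are n(n-1)/2 possible pairs inside a community of n nodes, and n_a * n_b possible pairs between two disjoint communities of sizes n_a and n_b. The function names, the link-list representation and the toy network are illustrative assumptions, not the exact formulation of [12].

```python
def intra_connectivity(links, community):
    """Density of the links located inside `community`.

    Assumptions: `links` lists each undirected link once as a pair of
    nodes, with no self-loops; a community of fewer than two nodes is
    given density 0.
    """
    n = len(community)
    if n < 2:
        return 0.0
    internal = sum(1 for u, v in links if u in community and v in community)
    return internal / (n * (n - 1) / 2)


def inter_connectivity(links, community_a, community_b):
    """Density of the links running between two disjoint communities."""
    between = sum(
        1
        for u, v in links
        if (u in community_a and v in community_b)
        or (u in community_b and v in community_a)
    )
    return between / (len(community_a) * len(community_b))


# Toy network: a triangle {1, 2, 3}, a path 4-5-6, and one bridge 3-4.
links = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (5, 6)]
c1, c2 = {1, 2, 3}, {4, 5, 6}

cohesion_c1 = intra_connectivity(links, c1)      # 3 of 3 possible links
cohesion_c2 = intra_connectivity(links, c2)      # 2 of 3 possible links
separation = inter_connectivity(links, c1, c2)   # 1 of 9 possible links
```

A high intra-connectivity combined with a low inter-connectivity corresponds to the cohesive, well-separated groups described informally above.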
