Stochastic Block Models for Multiplex networks: an
application to networks of researchers
Pierre Barbillon∗¹, Sophie Donnet¹, Emmanuel Lazega², and Avner Bar-Hen³

¹ AgroParisTech / UMR INRA MIA 518, 16 rue Claude Bernard, 75231 Paris Cedex 05, France
² Institut d’Études Politiques de Paris (Sciences Po), Département de Sociologie,
Centre de Sociologie des Organisations, 19 rue Amélie, 75007 Paris, France
³ MAP5, UFR de Mathématiques et Informatique, Université Paris Descartes, 45 rue des
Saints-Pères, 75270 Paris cedex 06
∗ pierre.barbillon@agroparistech.fr
arXiv:1501.06444v1 [stat.ME] 26 Jan 2015

Abstract
Modeling relations between individuals is a classical question in social sciences,
and clustering individuals according to the observed patterns of interactions makes it
possible to uncover a latent structure in the data. The stochastic block model (SBM) is a
popular approach for grouping individuals with respect to their social behavior.
When several relationships of various types can occur jointly between individuals,
the data are represented by multiplex networks, where more than one edge can
exist between two nodes. In this paper, we extend the SBM to multiplex networks
in order to obtain a clustering based on more than one kind of relationship. We
propose to estimate the parameters, such as the marginal probabilities of assignment
to groups (blocks) and the matrix of probabilities of connections between groups,
through a variational Expectation-Maximization procedure. Consistency of the
estimates as well as statistical properties of the model are obtained. The number of
groups is chosen using the Integrated Completed Likelihood criterion, a penalized
likelihood criterion. The multiplex stochastic block model arises in many situations, but
our applied example is motivated by a network of French cancer researchers. The
two possible links (edges) between researchers are a direct connection or a
connection through their labs. Our results show strong interactions between these two
kinds of connections, and the groups that are obtained are discussed to emphasize
the common features of researchers grouped together.
Keywords— Bivariate Stochastic Block Model, Multilevel / Multiplex networks,
Social network.
1 Introduction
Network analysis has emerged as a key technique for understanding and for investigating
social interactions through the properties of relations between and within units. From a
statistical point of view, a network is a realization of a random graph formed by a set of
nodes V representing the units (e.g. individuals, actors, companies) and a set of edges
E representing relationships between pairs of nodes.
The system in which the same nodes belong to multiple networks is typically referred
to as a multiplex network or multigraph (see Wasserman (1994) for example). In recent
literature, there has been an upsurge of interest in multiplex networks (see for example
Cozzo et al. (2012); Loe and Jeldtoft Jensen (2014); Rank et al. (2010); Szell et al. (2010);
Mucha et al. (2010); Maggioni et al. (2013); Brummitt et al. (2012); Saumell-Mendiola
et al. (2012); Bianconi (2013); Nicosia et al. (2013)). In these multiplex networks,
different kinds of links (or connections) are possible for each pair of nodes. This induced
link multiplexity is a fundamental aspect of social relations (Snijders and Baerveldt, 2003)
since these multiple links are frequently interdependent: links in one network may have
an influence on the formation or dissolution of links in other networks.
The simultaneous analysis of several networks also arises when one is interested in the
social behavior of individuals belonging to organized entities (such as companies,
laboratories, political groups, etc.), with some individuals possibly belonging to the same
institution. While the actors will exchange resources (such as advice for instance) at the
individual level, their respective organizations of affiliation will also share resources at
the institutional level (financial resources for instance). Each level (individuals and
organizations) constitutes a system of exchange of different resources that has its own
logic and could be studied separately. However, studying the two networks jointly (and
hence embedding the individuals in the multilevel relational and organizational
structures constituting the inter-organizational context of their actions) would allow us to
identify the individuals that benefit from relatively easy access to the resources
circulating in each level, which is of much more interest. In other words, studying the two
levels jointly could help us understand how an individual can benefit from the position
of their organization in the institutional network.
In this paper, we are interested in studying the advice relations between researchers
and the exchanges of resources between laboratories. We adopt the following
individual-oriented strategy (this point is discussed in the paper): the institutional network
is used to define a new network on the individual level, i.e. the set of nodes consists of the
set of individuals and, for a pair of individuals, two kinds of links are possible: a direct
connection given by the individual network and a connection through their organizations
given by the organizational network. As a consequence, the individual and institutional
levels are fused into a multiplex.
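
As a minimal sketch of this fusion step, the following Python snippet builds the two layers on the individual level. The names `build_multiplex`, `advice`, `lab_exchange` and `affiliation` are hypothetical, and the handling of two researchers from the same laboratory (read here from the diagonal of `lab_exchange`) is an assumption made for illustration, not something specified in the text.

```python
import numpy as np

def build_multiplex(advice, lab_exchange, affiliation):
    """Fuse an individual-level and an organization-level network.

    advice:       (n, n) 0/1 array, direct links between individuals.
    lab_exchange: (m, m) 0/1 array, links between organizations.
    affiliation:  length-n integer array, affiliation[i] = organization of i.
    Returns an (n, n, 2) 0/1 array X where X[i, j, 0] is the direct link
    and X[i, j, 1] is the link through the organizations of i and j.
    """
    advice = np.asarray(advice)
    lab_exchange = np.asarray(lab_exchange)
    affiliation = np.asarray(affiliation)
    n = advice.shape[0]
    # Lift the organizational network to the individual level:
    # through_labs[i, j] = lab_exchange[affiliation[i], affiliation[j]].
    through_labs = lab_exchange[affiliation[:, None], affiliation[None, :]]
    X = np.stack([advice, through_labs], axis=-1).astype(int)
    # Assumption: self-relations are ignored, so the diagonal is set to 0.
    X[np.arange(n), np.arange(n), :] = 0
    return X
```

Storing the multiplex as a single `(n, n, 2)` array keeps the pair of links of each dyad together, which is the object the model below works with.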
We then develop a statistical model able to detect substantial non-trivial topological
features in multiplex networks, with patterns of connection between their elements that are
not purely regular. Several models such as scale-free networks and small-world networks
have been proposed to describe and understand the heterogeneity observed in networks.
These models make it possible to derive properties of the network at the macro-scale and to
understand the outcomes of interactions. To explore heterogeneity at other scales (such
as micro or meso-scale) in social networks, specific models such as the stochastic block
models (SBM) (Snijders and Nowicki (1997)) have been developed for uniplex networks.
In this paper, we propose an original extension of the SBMs to the multiplex case. Our
model captures not only the main effects (which correspond to a classical uniplex
network) but also the pairwise interactions between the nodes. We estimate the parameters
of the multiplex SBM using an extension of the variational EM algorithm. Consistency
of the estimation of the parameters is proved. As for the uniplex SBM, a key issue is to
choose the number of blocks. We use a penalized likelihood criterion, namely the Integrated
Completed Likelihood (ICL). The inference procedure is performed on the cancer
researchers / laboratories dataset.
The paper is organized as follows. The extension of the SBM to multiplex networks is
presented in Section 2; the proofs of model identifiability and of the consistency of the
variational EM procedure are postponed to Appendices A and B. In Section 3, we describe the
dataset of Lazega et al. (2008), apply the new model and discuss the results. Finally, the
contribution of the multiplex SBM to the analysis of multiplex networks is highlighted in
Section 4.
2 Multiplex stochastic block model
The main objective is to cluster the individuals (or nodes) into blocks sharing connection
properties with the other individuals of the multiplex network. Stochastic block models
(Nowicki and Snijders, 2001) for random graphs have emerged as a natural tool to
perform such a clustering based on uniplex networks (directed or not, valued or not).
In the following, we propose an extension of the Stochastic Block Model (SBM) to
multiplex networks. The SBM for multiplex networks is derived from a multiplex
Erdős-Rényi model which is described in subsection 2.1. The SBM for multiplex networks is
derived in subsection 2.2.
2.1 Erdős-Rényi model for multiplex networks
Let $X^1, \dots, X^K$ be $K$ directed graphs relying on the same set of nodes $E = \{1, \dots, n\}$.
We assume that $\forall (i, j)$, $i \neq j$, $\forall k \in \{1, \dots, K\}$, $X^k_{ij} \in \{0, 1\}$, and that
$X^k_{ii} = 0$ (no self-loops). We define a joint distribution on $X^{1:K} = (X^1, \dots, X^K)$ as:
$\forall (i, j) \in \{1, \dots, n\}^2$, $i \neq j$, $\forall w \in \{0, 1\}^K$,
$$\mathbb{P}(X^{1:K}_{ij} = w) = \pi^{(w)} \quad \text{where} \quad \sum_{w \in \{0,1\}^K} \pi^{(w)} = 1, \tag{1}$$
and $(X^{1:K}_{ij})_{i,j}$ are mutually independent.
The maximum likelihood estimate of the parameter of interest $\pi = (\pi^{(w)})_{w \in \{0,1\}^K}$
is, for all $w \in \{0, 1\}^K$:
$$\hat{\pi}^{(w)} = \frac{1}{n(n-1)} \sum_{i,j,\, i \neq j} \mathbb{I}_{\{X^{1:K}_{ij} = w\}}.$$
This model is quite simple since any relation between two individuals (a relation being
a collection of edges) does not depend on the relations between the other individuals.
However, the different kinds of relations (edges) between two individuals are not assumed
to be independent.
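
To make the estimator concrete, here is a minimal Python sketch that simulates a multiplex network from model (1) and recovers $\pi$ by counting edge patterns over ordered pairs $i \neq j$. The function names `simulate_multiplex_er` and `estimate_pi`, the storage of the network as an `(n, n, K)` array, and the numerical values of `pi` are illustrative assumptions, not part of the paper.

```python
import itertools
import numpy as np

def simulate_multiplex_er(n, pi, rng):
    """Draw X of shape (n, n, K) from the multiplex Erdos-Renyi model.

    pi maps each pattern w in {0,1}^K (a tuple) to its probability pi^(w).
    """
    patterns = list(pi.keys())
    probs = np.array([pi[w] for w in patterns], dtype=float)
    # One independent pattern index per ordered pair (i, j).
    idx = rng.choice(len(patterns), size=(n, n), p=probs)
    X = np.array(patterns, dtype=int)[idx]  # shape (n, n, K)
    # Diagonal entries play no role in the model; set them to 0.
    X[np.arange(n), np.arange(n), :] = 0
    return X

def estimate_pi(X):
    """Empirical frequency of each pattern w over ordered pairs i != j."""
    n, _, K = X.shape
    off_diag = ~np.eye(n, dtype=bool)
    pairs = X[off_diag]  # shape (n * (n - 1), K)
    return {
        w: float(np.mean(np.all(pairs == np.array(w), axis=1)))
        for w in itertools.product((0, 1), repeat=K)
    }

# Hypothetical parameter values for K = 2 layers.
rng = np.random.default_rng(0)
pi = {(0, 0): 0.7, (1, 0): 0.1, (0, 1): 0.1, (1, 1): 0.1}
X = simulate_multiplex_er(200, pi, rng)
pi_hat = estimate_pi(X)  # should be close to pi for large n
```

In this hypothetical setting the two layers are dependent: a joint link has probability 0.1, whereas independent layers with the same marginals (0.2 each) would give 0.04.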
Remark 1. This model is clearly an extension of the Erdős-Rényi model since the
marginal distribution of $X^k_{ij}$ (for any $k = 1, \dots, K$) is Bernoulli with density:
$$\mathbb{P}(X^k_{ij} = x^k_{ij}) = \Bigg( \sum_{w \in \{0,1\}^K \,|\, w_k = 1} \pi^{(w)} \Bigg)^{x^k_{ij}} \Bigg( \sum_{w \in \{0,1\}^K \,|\, w_k = 0} \pi^{(w)} \Bigg)^{1 - x^k_{ij}}.$$
Moreover, any conditional distribution of $X^k_{ij}$ given $(X^l_{ij})_{l \in S_{\setminus k}}$ (where $S_{\setminus k}$ is a subset of