A note on maximizing the spread of influence in social networks

Eyal Even-Dar, Asaf Shapira
pp. 281-286


Eyal Even-Dar (Google Research, Email: evendar@google.com) and Asaf Shapira (Microsoft Research, Email: asafico@microsoft.com)
Abstract. We consider the spread maximization problem that was defined by Domingos and Richardson [7, 22]. In this problem, we are given a social network represented as a graph and are required to find the set of the most “influential” individuals, such that introducing them to a new technology maximizes the expected number of individuals in the network that adopt the technology at a later time. This problem has applications in viral marketing, where a company may wish to spread the rumor of a new product via the most influential individuals in popular social networks such as Myspace and Blogsphere.
The spread maximization problem was recently studied in several models of social networks [14, 15, 20]. In this short paper we study this problem in the context of the well-studied probabilistic voter model. We provide very simple and efficient algorithms for solving this problem. An interesting special case of our result is that the most natural heuristic solution, which picks the nodes in the network with the highest degree, is indeed the optimal solution.
1 Introduction
1.1 Background
With the emerging Web 2.0, the importance of social networks as a marketing tool is growing rapidly. Their use as a marketing tool spans diverse areas, and they have even been used recently by the campaigns of presidential candidates in the United States¹. Social networks are networks (i.e. graphs) in which the nodes represent individuals and the edges represent relations between them. To illustrate the viral marketing channel (see [2, 3, 7, 11, 19]), consider a new company that wishes to promote its new specialized search engine. A promising way to do so these days would be through popular social networks such as Myspace, Blogsphere etc., rather than through classical advertising channels. By convincing several key persons in each network to adopt (or even to try) the new search engine, the company can obtain an effective marketing campaign and enjoy the diffusion effect over the network. If we assume that “convincing” each key person to “spread” the rumor about the new product costs money, then a natural problem is the following: given a social network, how can we detect the players through whom we can spread, or “diffuse”, the new technology in the most effective way?
¹ Hillary Clinton - http://profile.myspace.com/index.cfm?fuseaction=user.viewprofile&friendID=64552165, Barack Obama - http://profile.myspace.com/index.cfm?fuseaction=user.viewprofile&friendid=184040237, John Edwards - http://profile.myspace.com/index.cfm?fuseaction=user.viewprofile&friendid=9736082, Rudy Giuliani - http://www.myspace.com/rudygiulianiisgod
Diffusion processes in social networks have been studied for a long time in the social sciences, see e.g. [5, 12, 3, 23, 24]. The algorithmic aspect of marketing in social networks was introduced by Domingos and Richardson [7, 22] and can be formulated as follows. Given a social network structure and a diffusion dynamics (i.e. how the individuals influence each other), find a set $S$ of nodes of cost at most $K$ such that introducing them to a new technology/product maximizes the spread of the technology/product. We refer to the problem of finding such a maximizing set $S$ as the Spread maximization set problem. The work of Domingos and Richardson [7, 22] studied this problem in a probabilistic setting and mainly provided heuristics to compute a maximizing set. Following [7, 22], Kempe et al. [14, 15] and Mossel and Roch [20] considered a threshold network, in which users adopt a new technology only if a fixed fraction of their neighbors have already adopted it. Their results show that finding the optimal subset of size $K$ is NP-hard to approximate within a factor better than $1 - 1/e$, and also that a greedy algorithm achieves this ratio.
1.2 Our contribution
In this paper we consider the Spread maximization set problem in the case where the underlying social network behaves like the voter model. The voter model, which was introduced by Clifford and Sudbury [4] and Holley and Liggett [13], is probably one of the most basic and natural probabilistic models of the diffusion of opinions in a social network. It works as follows: in each step, each person changes his opinion by choosing one of his neighbors at random and adopting the neighbor’s opinion. The model has been studied extensively in the field of interacting particle systems [17, 18, 10, 1], and many variations of the network structure have been analyzed, e.g. the d-dimensional integer lattice [4, 13], the finite torus [6], finite graphs [8], regular graphs [1] and small world graphs [10].
While the voter model is different from the threshold models that were studied in [14, 15, 20], it still has the same key property that a person is more likely to change his opinion to the one held by most of his neighbors. The threshold models of [14, 15, 20] are monotone in the sense that once a vertex becomes “activated” it stays activated forever. This makes these models suitable for studying phenomena such as infection processes. However, some processes, such as which product a user is currently using, are not monotone in this sense. Therefore, the voter model, which allows vertices to be deactivated, may be more suitable for studying non-monotone processes. Another important property of the voter model is that a consensus is reached with probability one (see Theorem 4). It is interesting to observe that many technologies (almost) reach consensus, for instance Windows as an operating system, Google as a search engine, YouTube for sharing videos and many more.
Our main contributions are an exact solution to the spread maximization set problem in the voter model when all nodes have the same cost (the cost of a node is the cost of introducing the person to a new technology/product), and an FPTAS² for the more general case in which different nodes may have different costs. In contrast to most of the previous results, which considered only the status of the network in the “limit”, that is, when the network converges to a steady state, our algorithms easily adapt to the case of different target times.³ An interesting special case of our result is that the most natural heuristic solution, which picks the nodes in the network with the highest degree, is indeed the optimal solution when all nodes have the same cost. We show that the optimal set for the long term is the set that maximizes the chances of reaching consensus with the new technology/product.
We note that while our results assume a synchronous model, i.e. one in which all users update their opinions at each step, and an unweighted graph, all the results apply to asynchronous models and weighted graphs with very simple modifications that are omitted from this extended abstract.
² An FPTAS, short for Fully Polynomial Time Approximation Scheme, is an algorithm that for any $\epsilon > 0$ approximates the optimal solution up to an error of $(1 + \epsilon)$ in time $\mathrm{poly}(n/\epsilon)$.
2 The Voter Model
We start by providing a formal definition of the voter model (see [4, 13] for more details).
Definition 1. Let $G = G(V, E)$ be an undirected graph with self loops. For a node $v \in V$, we denote by $N(v)$ the set of neighbors of $v$ in $G$. Starting from an arbitrary initial 0/1 assignment to the vertices of $G$, at each time $t \ge 1$, each node picks uniformly at random one of its neighbors and adopts its opinion. More formally, starting from any assignment $f_0 : V \to \{0, 1\}$, we inductively define
$$f_{t+1}(v) = \begin{cases} 1, & \text{with probability } \dfrac{|\{u \in N(v) : f_t(u) = 1\}|}{|N(v)|} \\[2ex] 0, & \text{with probability } \dfrac{|\{u \in N(v) : f_t(u) = 0\}|}{|N(v)|} \end{cases}$$
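The update rule of Definition 1 can be sketched as a short simulation. This is only an illustration under stated assumptions: the names `voter_step` and `simulate`, the adjacency-list representation, and the 4-node path graph are hypothetical and do not come from the paper. Each node lists itself among its neighbors to model the self loops.

```python
import random

def voter_step(neighbors, f, rng):
    """One synchronous round: every node copies a uniformly random neighbor."""
    # All nodes read the old assignment f, so the updates are simultaneous.
    return {v: f[rng.choice(neighbors[v])] for v in neighbors}

def simulate(neighbors, f0, t, seed=0):
    """Run t rounds of the voter model from the initial assignment f0."""
    rng = random.Random(seed)
    f = dict(f0)
    for _ in range(t):
        f = voter_step(neighbors, f, rng)
    return f

# Hypothetical example: the path 0-1-2-3, with a self loop at every node.
neighbors = {0: [0, 1], 1: [0, 1, 2], 2: [1, 2, 3], 3: [2, 3]}
f0 = {0: 0, 1: 1, 2: 0, 3: 0}
print(simulate(neighbors, f0, t=5))
```

Because the process is random, a single run says little. Averaging the number of nodes holding opinion 1 over many seeds gives an estimate of the expected spread at time t.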
Note that the voter model is a random process whose behavior depends on the initial assignment $f_0$. If we think of $f_t(v) = 1$ as indicating that $v$ is using the product we wish to advertise, then a natural quantity we wish to study is the expected number of nodes satisfying $f_t(v) = 1$ at any given time $t$. Of course, a simple way to maximize the number of such nodes is to start from an initial assignment $f_0$ in which $f_0(v) = 1$ for all $v$. However, in reality we may not be able to start from such an assignment, as there is a cost $c_v$ for setting $f_0(v) = 1$ and we have a limited budget $B$. For example, $c_v$ can be the cost of “convincing” a website to use a certain application we want other websites to use as well. This is the main motivation for the spread maximization set problem that is defined below in the context of the voter model. As we have previously mentioned, this (meta) problem was first defined by Domingos and Richardson [7, 22] and was studied in [22, 14, 15, 20] in other models of social networks.
Definition 2 (The spread maximization set problem). Let $G$ be a graph representing a social network, $c \in \mathbb{R}^n$ a vector of costs indicating the cost $c_v$ of setting $f_0(v) = 1$, $B$ a budget, and $t$ a target time. The spread maximization set problem is the problem of finding an assignment $f_0 : V \to \{0, 1\}$ that will maximize the expectation $E\left[\sum_{v \in V} f_t(v)\right]$ subject to the budget constraint $\sum_{\{v : f_0(v) = 1\}} c_v \le B$.
³ Kempe et al. [14] also considered a finite horizon, but under a different objective function, namely, for every individual, the number of time steps during which she held the desired opinion until the target time. Furthermore, their approach required maintaining a graph whose size is proportional to the size of the original graph times the target time.

3 Solving the Spread Maximization Set Problem
Our algorithms for solving the spread maximization set problem all rely on the well known fact that the voter model can be analyzed using graphical models (see [9] for more details). Let us state a very simple yet crucial fact regarding the voter model that follows from this perspective. Recall that in the voter model, the probability that node $v$ adopts the opinion of one of its neighbors $u$ is precisely $1/|N(v)|$. Stated equivalently, this is the probability that a random walk of length 1 that starts at $v$ ends up in $u$. Generalizing this observation to more than one step, one can easily prove the following by induction on $t$.
Proposition 1. Let $p^t_{u,v}$ denote the probability that a random walk of length $t$ starting at node $u$ stops at node $v$. Then the probability that after $t$ iterations of the voter model, node $u$ will adopt the opinion that node $v$ had at time $t = 0$ is precisely $p^t_{u,v}$.
We thus get the following corollary.
Corollary 1. Let $S = \{u : f_0(u) = 1\}$. The probability that $f_t(v) = 1$ is the probability that a random walk of length $t$ starting at $v$ ends in $S$.
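Corollary 1 turns a statement about the random process into a deterministic computation. The sketch below evaluates the adoption probabilities exactly by applying the one-step recursion t times. It assumes a row-stochastic convention (row v of the matrix holds the one-step walk probabilities out of v), and the helper names and the example graph are hypothetical.

```python
from fractions import Fraction

def transition_matrix(neighbors):
    """Row-stochastic matrix: entry [v][u] is 1/|N(v)| when u is in N(v)."""
    nodes = sorted(neighbors)
    index = {v: i for i, v in enumerate(nodes)}
    n = len(nodes)
    M = [[Fraction(0)] * n for _ in range(n)]
    for v in nodes:
        for u in neighbors[v]:
            M[index[v]][index[u]] += Fraction(1, len(neighbors[v]))
    return M

def adoption_probabilities(neighbors, S, t):
    """P[f_t(v) = 1] for every v, i.e. P[a length-t walk from v ends in S]."""
    nodes = sorted(neighbors)
    M = transition_matrix(neighbors)
    n = len(nodes)
    # At t = 0 the probability is the indicator of membership in S.
    p = [Fraction(1) if v in S else Fraction(0) for v in nodes]
    for _ in range(t):
        # One more walk step: average the previous probabilities over N(v).
        p = [sum(M[i][j] * p[j] for j in range(n)) for i in range(n)]
    return dict(zip(nodes, p))

# Hypothetical example: the path 0-1-2-3 with self loops, seeded at node 1.
neighbors = {0: [0, 1], 1: [0, 1, 2], 2: [1, 2, 3], 3: [2, 3]}
probs = adoption_probabilities(neighbors, S={1}, t=3)
print(probs, sum(probs.values()))
```

Summing the returned probabilities gives the expected number of adopters at time t, which is the objective of Definition 2.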
Equipped with the above facts we can now turn to describe the simple algorithms for the
spread maximization set problem.
3.1 The case of short term
We start by showing how to solve the problem for the case of the short term, that is, when $t$ is (any) polynomial in $n$. We note that studying the spread maximization problem for the short term is crucial to the early stages of introducing a new technology into the market. As usual, let $M$ be the normalized transition matrix of $G$, i.e. $M(v, u) = 1/|N(v)|$ for every $u \in N(v)$. For a subset $S \subseteq \{1, \ldots, n\}$ we will denote by $1_S$ the 0/1 vector whose $i$-th entry is 1 iff $i \in S$. The following lemma gives a characterization of the spread maximizing set.
Lemma 1. For any graph $G$ with transition matrix $M$, the spread maximizing set $S$ is the set which maximizes $1_S M^t$ subject to $\sum_{v \in S} c_v \le B$.
Proof. Recall the well known fact that $p^t_{u,v}$, which is the probability that a random walk of length $t$ starting at $u$ ends in $v$, is given by the $(u, v)$ entry of the matrix $M^t$. The spread maximizing set problem asks for maximizing $E\left[\sum_{v \in V} f_t(v)\right]$ subject to $\sum_{v \in S} c_v \le B$. By linearity of expectation, we have that
$$E\left[\sum_{v \in V} f_t(v)\right] = \sum_{v \in V} \mathrm{Prob}[f_t(v) = 1].$$
By Corollary 1 we have that if we set $f_0(v) = 1$ for every $v \in S$ then
$$\mathrm{Prob}[f_t(v) = 1] = 1_S M^t 1^T_{\{v\}}.$$
Therefore,
$$E\left[\sum_{v \in V} f_t(v)\right] = \sum_{v \in V} 1_S M^t 1^T_{\{v\}} = 1_S M^t,$$
where the last expression stands for the sum of the entries of the row vector $1_S M^t$, and we conclude that the optimal set $S$ is indeed the one maximizing $1_S M^t$ subject to $\sum_{v \in S} c_v \le B$.
Using this formulation we can obtain the following theorems, which shed light on how well the spread maximization set problem can be solved. We note that these positive results are in contrast to the inapproximability results in the model introduced by [14] for threshold networks.
Theorem 1. If the cost vector $c$ is uniform, that is, if for all $v$ we have $c_v = c$, then the spread maximization set problem can be solved exactly in polynomial time for any $t = \mathrm{poly}(n)$.
Proof. First note that the entries of $M^t$ can be computed efficiently for any $t = \mathrm{poly}(n)$: for any $t$, computing $M^t$ requires $O(\log t)$ matrix multiplications, each of which can be done efficiently. For every node $v$ denote $g_v = 1_{\{v\}} M^t$. By Lemma 1, the problem is equivalent to maximizing $1_S M^t$ subject to $\sum_{v \in S} c_v \le B$. As $1_S M^t = \sum_{v \in S} g_v$ and the cost of every node is identical, we get that for every budget $B$, the optimal set consists of the first $\lfloor B/c \rfloor$ nodes when the nodes are sorted by $g_v$ in decreasing order.
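The procedure in the proof of Theorem 1 can be sketched as follows: compute $M^t$ by repeated squaring, score every node, and keep the $\lfloor B/c \rfloor$ best. The sketch assumes nodes labelled 0..n-1 and a row-stochastic matrix, so the score of a node u is the sum of column u of $M^t$, i.e. the total probability over all start nodes v that a length-t walk from v ends at u. Reading this as the paper's $g_u$ is an interpretation, and all function names are hypothetical.

```python
from fractions import Fraction

def transition_matrix(neighbors):
    """Row-stochastic M with M[v][u] = 1/|N(v)| for u in N(v); nodes are 0..n-1."""
    n = len(neighbors)
    M = [[Fraction(0)] * n for _ in range(n)]
    for v, nbrs in neighbors.items():
        for u in nbrs:
            M[v][u] = Fraction(1, len(nbrs))
    return M

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_pow(M, t):
    """M^t by repeated squaring, using O(log t) matrix multiplications."""
    n = len(M)
    result = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    while t > 0:
        if t & 1:
            result = mat_mul(result, M)
        M = mat_mul(M, M)
        t >>= 1
    return result

def influence_scores(neighbors, t):
    """Score of u: sum over v of P[a length-t walk from v ends at u]."""
    P = mat_pow(transition_matrix(neighbors), t)
    n = len(P)
    return [sum(P[v][u] for v in range(n)) for u in range(n)]

def best_seed_set(neighbors, t, budget, cost):
    """Uniform cost: keep the floor(budget / cost) nodes with the largest scores."""
    scores = influence_scores(neighbors, t)
    k = int(budget // cost)
    ranked = sorted(range(len(scores)), key=lambda u: (-scores[u], u))
    return ranked[:k]

# Hypothetical example: the path 0-1-2-3 with a self loop at every node.
neighbors = {0: [0, 1], 1: [0, 1, 2], 2: [1, 2, 3], 3: [2, 3]}
print(best_seed_set(neighbors, t=3, budget=2, cost=1))
```

Exact rationals keep the ranking free of rounding effects on small graphs. For larger graphs a floating-point matrix library would be the more practical choice.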
Theorem 2. There exists an FPTAS for the spread maximization set problem for any $t = \mathrm{poly}(n)$.
Proof. Once again, for every node $v$ denote $g_v = 1_{\{v\}} M^t$. Our goal is to maximize $1_S M^t = \sum_{v \in S} g_v$ subject to $\sum_{v \in S} c_v \le B$. Observe that this is just an instance of the Knapsack problem, and thus we can use the well known linear time FPTAS for Knapsack [16] to obtain an FPTAS for the spread maximization set problem.
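To make the reduction concrete, the sketch below implements the textbook value-scaling FPTAS for Knapsack, with the scores $g_v$ playing the role of item values and the costs $c_v$ the role of item weights. This is not the linear time scheme of [16]; it is a simpler and slower variant that still runs in time polynomial in n and 1/ε. The function name and the sample numbers are hypothetical.

```python
def knapsack_fptas(values, costs, budget, eps):
    """Pick items with total cost <= budget and total value >= (1 - eps) * OPT."""
    # Items that cannot fit alone, or that add no value, are ignored.
    items = [i for i in range(len(values))
             if costs[i] <= budget and values[i] > 0]
    if not items:
        return []
    vmax = max(values[i] for i in items)
    scale = eps * vmax / len(items)
    # Rounding each value down loses at most `scale` per item, hence at
    # most eps * vmax <= eps * OPT in total.
    scaled = {i: int(values[i] / scale) for i in items}
    # best[p] = (minimum cost, chosen items) reaching scaled value exactly p.
    best = {0: (0, [])}
    for i in items:
        # Iterate over a snapshot so that item i is used at most once.
        for p, (c, chosen) in list(best.items()):
            q, new_cost = p + scaled[i], c + costs[i]
            if new_cost <= budget and (q not in best or new_cost < best[q][0]):
                best[q] = (new_cost, chosen + [i])
    return best[max(best)][1]

# Hypothetical scores g_v and costs c_v for three nodes.
g = [60, 100, 120]
c = [10, 20, 30]
print(knapsack_fptas(g, c, budget=50, eps=0.1))
```

In the setting of Theorem 2 one would pass the influence scores computed from $M^t$ as `values` and the vector $c$ as `costs`.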
Observe that in general we cannot expect to be able to solve the spread maximization set problem exactly because when $t = 0$ this problem is equivalent to the Knapsack problem, which is NP-hard.
3.2 The case of long term
In the previous subsection we considered the case where $t$ is polynomial in $n$. Let us now consider the case of large $t$, where by large we mean $t \ge n^5$. Recall the well known fact that for any graph $G$ with self loops, a random walk starting from any node $v$ converges to the steady state distribution after $O(n^3)$ steps (see [21]). Furthermore, if we set $d_v = |N(v)|$ then the (unique) steady state distribution is that the probability of being at node $u$ is $d_u/2|E|$. In other words, if $t \gg n^3$ then $M^t_{u,v} = (1 + o(1))\, d_u/2|E|$.⁴ Once again, using Lemma 1 we can obtain the following corollaries.
⁴ More precisely, the smaller we want the $o(1)$ term to be the larger we need $t$ to be.
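The convergence claim can be checked numerically on a small graph. The sketch below compares the influence scores at a large target time with the degree-proportional limit. It assumes a connected graph with a self loop at every node and a row-stochastic matrix, and it normalizes by the sum of the $d_v$, which plays the role of $2|E|$ in the text. The function names and the star graph are hypothetical.

```python
def transition_matrix(neighbors):
    """Row-stochastic M with M[v][u] = 1/|N(v)| for u in N(v); nodes are 0..n-1."""
    n = len(neighbors)
    M = [[0.0] * n for _ in range(n)]
    for v, nbrs in neighbors.items():
        for u in nbrs:
            M[v][u] = 1.0 / len(nbrs)
    return M

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_pow(M, t):
    """M^t by repeated squaring."""
    n = len(M)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    while t > 0:
        if t & 1:
            result = mat_mul(result, M)
        M = mat_mul(M, M)
        t >>= 1
    return result

def long_term_scores(neighbors, t):
    """Score of u: sum over v of P[a length-t walk from v ends at u]."""
    P = mat_pow(transition_matrix(neighbors), t)
    n = len(P)
    return [sum(P[v][u] for v in range(n)) for u in range(n)]

def degree_limit(neighbors):
    """Limiting scores n * d_u / (sum of all d_v), with d_v = |N(v)|."""
    n = len(neighbors)
    total = sum(len(nbrs) for nbrs in neighbors.values())
    return [n * len(neighbors[u]) / total for u in range(n)]

# Hypothetical example: a star with center 0, with a self loop at every node.
neighbors = {0: [0, 1, 2, 3], 1: [0, 1], 2: [0, 2], 3: [0, 3]}
n = len(neighbors)
print(long_term_scores(neighbors, t=n ** 5))
print(degree_limit(neighbors))
```

On this example the two lists agree up to floating-point error, and the highest-degree node has the largest score, in line with the degree heuristic described in the abstract.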
