A note on maximizing the spread of influence in social networks

doi:10.1007/978-3-540-77105-0_27


Eyal Even-Dar¹ and Asaf Shapira²

¹ Google Research, Email: evendar@google.com
² Microsoft Research, Email: asafico@microsoft.com

Abstract. We consider the spread maximization problem that was defined by Domingos and Richardson [7, 22]. In this problem, we are given a social network represented as a graph and are required to find the set of the most "influential" individuals such that, by introducing them to a new technology, we maximize the expected number of individuals in the network that later adopt the new technology. This problem has applications in viral marketing, where a company may wish to spread the rumor of a new product via the most influential individuals in popular social networks such as Myspace and Blogsphere.

The spread maximization problem was recently studied in several models of social networks [14, 15, 20]. In this short paper we study this problem in the context of the well-studied probabilistic voter model. We provide very simple and efficient algorithms for solving this problem. An interesting special case of our result is that the most natural heuristic solution, which picks the nodes in the network with the highest degree, is indeed the optimal solution.

1 Introduction

1.1 Background

With the emergence of Web 2.0, the importance of social networks as a marketing tool is growing rapidly. Their use for marketing spans diverse areas, and they have recently been used even by the campaigns of presidential candidates in the United States.¹ Social networks are networks (i.e. graphs) in which the nodes represent individuals and the edges represent relations between them. To illustrate the viral marketing channel (see [2, 3, 7, 11, 19]), consider a new company that wishes to promote its new specialized search engine. A promising way to do so these days would be through popular social networks such as Myspace, Blogsphere, etc., rather than through classical advertising channels. By convincing several key persons in each network to adopt (or even to try) the new search engine, the company can obtain an effective marketing campaign and enjoy the diffusion effect over the network. If we assume that "convincing" each key person to "spread" the rumor about the new product costs money, then a natural problem is the following: given a social network, how can we detect the players through which we can spread, or "diffuse", the new technology most effectively?

¹ Hillary Clinton - http://profile.myspace.com/index.cfm?fuseaction=user.viewprofile&friendID=64552165, Barack Obama - http://profile.myspace.com/index.cfm?fuseaction=user.viewprofile&friendid=184040237, John Edwards - http://profile.myspace.com/index.cfm?fuseaction=user.viewprofile&friendid=9736082, Rudy Giuliani - http://www.myspace.com/rudygiulianiisgod

Diffusion processes in social networks have been studied for a long time in the social sciences, see e.g. [5, 12, 3, 23, 24]. The algorithmic aspect of marketing in social networks was introduced by Domingos and Richardson [7, 22] and can be formulated as follows. Given a social network structure and diffusion dynamics (i.e. how the individuals influence each other), find a set S of nodes of cost at most K such that, by introducing them to a new technology/product, the spread of the technology/product is maximized. We refer to the problem of finding such a maximizing set S as the spread maximization set problem. The work of Domingos and Richardson [7, 22] studied this problem in a probabilistic setting and mainly provided heuristics to compute a maximizing set. Following [7, 22], Kempe et al. [14, 15] and Mossel and Roch [20] considered a threshold network, in which users adopt a new technology only if a fixed fraction of their neighbors have already adopted it. Their results show that finding the optimal subset of size K is NP-hard to approximate within a factor better than 1 − 1/e, and that a greedy algorithm achieves this ratio.

1.2 Our contribution

In this paper we consider the spread maximization set problem in the case where the underlying social network behaves like the voter model. The voter model, which was introduced by Clifford and Sudbury [4] and Holley and Liggett [13], is probably one of the most basic and natural probabilistic models of the diffusion of opinions in a social network. It models the diffusion of opinions as follows: in each step, each person changes his opinion by choosing one of his neighbors at random and adopting that neighbor's opinion. The model has been studied extensively in the field of interacting particle systems [17, 18, 10, 1], and many variations of the network structure have been analyzed, e.g. the d-dimensional integer lattice [4, 13], the finite torus [6], finite graphs [8], regular graphs [1] and small-world graphs [10].

While the voter model is different from the threshold models that were studied in [14, 15, 20], it still has the same key property that a person is more likely to change his opinion to the one held by most of his neighbors. On the other hand, the threshold models of [14, 15, 20] are monotone in the sense that once a vertex becomes "activated" it stays activated forever. This makes these models suitable for studying phenomena such as infection processes. However, some processes, such as which product a user is currently using, are not monotone in this sense. Therefore, the voter model, which allows vertices to be deactivated, may be more suitable for studying non-monotone processes. Another important property of the voter model is that consensus is reached with probability one (see Theorem 4). It is interesting to observe that many technologies (almost) reach consensus, for instance Windows as an operating system, Google as a search engine, YouTube for sharing videos and many more.

Our main contributions are an exact solution to the spread maximization set problem in the voter model when all nodes have the same cost (the cost of a node is the cost of introducing the person to a new technology/product), and an FPTAS² for the more general case in which different nodes may have different costs. In contrast to most of the previous results, which considered only the status of the network in the "limit", that is, when the network converges to a steady state, our algorithms easily adapt to the case of different target times.³ An interesting special case of our result is that the most natural heuristic solution, which picks the nodes in the network with the highest degree, is indeed the optimal solution when all nodes have the same cost and the target time is large. We show that the optimal set for the long term is the set that maximizes the chances of reaching consensus with the new technology/product.

² An FPTAS, short for Fully Polynomial Time Approximation Scheme, is an algorithm that for any ε > 0 approximates the optimal solution up to a factor of (1 + ε) in time poly(n/ε).

We note that while our results assume a synchronous model (i.e. at each step all the users update their opinions) and unweighted graphs, all the results apply to asynchronous models and weighted graphs with very simple modifications that are omitted from this extended abstract.

2 The Voter Model

We start by providing a formal definition of the voter model (see [4, 13] for more details).

Definition 1. Let G = G(V, E) be an undirected graph with self loops. For a node v ∈ V, we denote by N(v) the set of neighbors of v in G. Starting from an arbitrary initial 0/1 assignment to the vertices of G, at each time t ≥ 1, each node picks one of its neighbors uniformly at random and adopts its opinion. More formally, starting from any assignment $f_0 : V \to \{0, 1\}$, we inductively define

$$
f_{t+1}(v) =
\begin{cases}
1, & \text{with probability } \dfrac{|\{u \in N(v) : f_t(u) = 1\}|}{|N(v)|},\\[4pt]
0, & \text{with probability } \dfrac{|\{u \in N(v) : f_t(u) = 0\}|}{|N(v)|}.
\end{cases}
$$
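The update rule of Definition 1 can be simulated directly. The following is a minimal Python sketch, assuming the graph is given as a dictionary mapping each node v to its neighbor list N(v), with v appearing in its own list to represent the self loop; the names `voter_step`, `run_voter_model` and the toy graph `PATH` are hypothetical, introduced only for illustration.

```python
import random


def voter_step(neighbors, f, rng=random):
    """One synchronous step of the voter model (Definition 1).

    neighbors: dict mapping each node v to the list N(v); a self loop is
        represented by v appearing in its own list.
    f: dict mapping each node to its current opinion f_t(v) in {0, 1}.
    rng: anything with a .choice method (the random module or random.Random).
    Returns f_{t+1} as a new dict; every node reads the *old* assignment f.
    """
    # Each node picks a neighbor uniformly at random and copies its opinion,
    # so f_{t+1}(v) = 1 with probability |{u in N(v): f_t(u) = 1}| / |N(v)|.
    return {v: f[rng.choice(nbrs)] for v, nbrs in neighbors.items()}


def run_voter_model(neighbors, f0, t, seed=None):
    """Apply t synchronous steps starting from the assignment f0."""
    rng = random.Random(seed)
    f = dict(f0)
    for _ in range(t):
        f = voter_step(neighbors, f, rng)
    return f


# Hypothetical toy graph: the path a - b - c with a self loop at every node.
PATH = {"a": ["a", "b"], "b": ["a", "b", "c"], "c": ["b", "c"]}

if __name__ == "__main__":
    print(run_voter_model(PATH, {"a": 0, "b": 1, "c": 0}, t=5, seed=1))
```

Because every node reads the previous assignment f, the sketch implements the synchronous variant assumed throughout the paper.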

Note that the voter model is a random process whose behavior depends on the initial assignment $f_0$. If we think of $f_t(v) = 1$ as indicating that v is using the product we wish to advertise, then a natural quantity to study is the expected number of nodes satisfying $f_t(v) = 1$ at a given time t. Of course, a simple way to maximize the number of such nodes is to start from an initial assignment $f_0$ in which $f_0(v) = 1$ for all v. However, in reality we may not be able to start from such an assignment, as there is a cost $c_v$ for setting $f_0(v) = 1$ and we have a limited budget B. For example, $c_v$ can be the cost of "convincing" a website to use a certain application we want other websites to use as well. This is the main motivation for the spread maximization set problem, which is defined below in the context of the voter model. As we have previously mentioned, this (meta) problem was first defined by Domingos and Richardson [7, 22] and was studied by [22, 14, 15, 20] in other models of social networks.

Definition 2 (The spread maximization set problem). Let G be a graph representing a social network, $c \in \mathbb{R}^n$ a vector of costs indicating the cost $c_v$ of setting $f_0(v) = 1$, B a budget, and t a target time. The spread maximization set problem is the problem of finding an assignment $f_0 : V \to \{0, 1\}$ that maximizes the expectation $\mathbb{E}\left[\sum_{v \in V} f_t(v)\right]$ subject to the budget constraint $\sum_{\{v : f_0(v) = 1\}} c_v \le B$.
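Taking expectations in Definition 1 gives $\mathbb{E}[f_{t+1}(v)] = \frac{1}{|N(v)|}\sum_{u \in N(v)} \mathbb{E}[f_t(u)]$, so the objective of Definition 2 can be evaluated exactly for any candidate seed set. For very small instances this yields an exhaustive reference solver, which may be handy for checking faster algorithms. The Python sketch below works under that assumption; `expected_spread`, `brute_force_spread_max` and `PATH` are hypothetical names, and the search is exponential in n.

```python
from fractions import Fraction
from itertools import combinations


def expected_spread(neighbors, seeds, t):
    """Exact E[sum_v f_t(v)] when f_0(v) = 1 exactly for v in seeds.

    Uses E[f_{s+1}(v)] = (1/|N(v)|) * sum_{u in N(v)} E[f_s(u)], which follows
    from Definition 1 by linearity of expectation.
    """
    x = {v: Fraction(1 if v in seeds else 0) for v in neighbors}
    for _ in range(t):
        x = {v: sum((x[u] for u in nbrs), Fraction(0)) / len(nbrs)
             for v, nbrs in neighbors.items()}
    return sum(x.values(), Fraction(0))


def brute_force_spread_max(neighbors, cost, budget, t):
    """Try every seed set within budget; exponential, for tiny graphs only."""
    nodes = list(neighbors)
    best_set = frozenset()
    best_value = expected_spread(neighbors, best_set, t)
    for k in range(1, len(nodes) + 1):
        for combo in combinations(nodes, k):
            if sum(cost[v] for v in combo) > budget:
                continue  # violates the budget constraint of Definition 2
            value = expected_spread(neighbors, set(combo), t)
            if value > best_value:
                best_set, best_value = frozenset(combo), value
    return best_set, best_value


# Hypothetical toy graph: the path a - b - c with a self loop at every node.
PATH = {"a": ["a", "b"], "b": ["a", "b", "c"], "c": ["b", "c"]}

if __name__ == "__main__":
    print(brute_force_spread_max(PATH, {"a": 1, "b": 1, "c": 1}, budget=1, t=1))
```

Exact rational arithmetic (`Fraction`) avoids ties being broken by floating-point noise when comparing candidate sets.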

³ Kempe et al. [14] also considered a finite horizon, but under a different objective function, namely, for every individual, how many time steps she held the desired opinion until the target time. Furthermore, their approach required maintaining a graph whose size is proportional to the size of the original graph times the target time.

3 Solving the Spread Maximization Set Problem

Our algorithms for solving the spread maximization set problem all rely on the well-known fact that the voter model can be analyzed using graphical models (see [9] for more details). Let us state a very simple yet crucial fact regarding the voter model that follows from this perspective. Recall that in the voter model, the probability that node v adopts the opinion of a given neighbor u is precisely $1/|N(v)|$. Stated equivalently, this is the probability that a random walk of length 1 that starts at v ends up at u. Generalizing this observation to more than one step, one can easily prove the following by induction on t.

Proposition 1. Let $p^t_{u,v}$ denote the probability that a random walk of length t starting at node u stops at node v. Then the probability that after t iterations of the voter model, node u will adopt the opinion that node v had at time t = 0 is precisely $p^t_{u,v}$.

We thus get the following corollary.

Corollary 1. Let $S = \{u : f_0(u) = 1\}$. The probability that $f_t(v) = 1$ is the probability that a random walk of length t starting at v ends in S.
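Corollary 1 can be checked numerically. The sketch below computes the probability that a t-step random walk from v ends in S, and compares it with the empirical frequency of $f_t(v) = 1$ over repeated runs of the synchronous voter model. This is a Monte Carlo sanity check under the assumptions of Definition 1, not a proof; `walk_end_probability`, `simulated_probability` and the toy graph `PATH` are hypothetical names.

```python
import random


def walk_end_probability(neighbors, start, targets, t):
    """Probability that a t-step random walk from `start` ends in `targets`.

    At each step the walk moves from u to a uniformly random node of N(u).
    """
    dist = {v: 0.0 for v in neighbors}
    dist[start] = 1.0
    for _step in range(t):
        nxt = {v: 0.0 for v in neighbors}
        for u, mass in dist.items():
            for w in neighbors[u]:
                nxt[w] += mass / len(neighbors[u])
        dist = nxt
    return sum(dist[v] for v in targets)


def simulated_probability(neighbors, seeds, v, t, trials, seed=0):
    """Monte Carlo estimate of Pr[f_t(v) = 1] when f_0 is 1 exactly on seeds."""
    rng = random.Random(seed)
    hits = 0
    for _trial in range(trials):
        f = {u: int(u in seeds) for u in neighbors}
        for _step in range(t):
            # Synchronous update: every node copies from the previous assignment.
            f = {u: f[rng.choice(nbrs)] for u, nbrs in neighbors.items()}
        hits += f[v]
    return hits / trials


# Hypothetical toy graph: the path a - b - c with a self loop at every node.
PATH = {"a": ["a", "b"], "b": ["a", "b", "c"], "c": ["b", "c"]}

if __name__ == "__main__":
    S = {"a"}
    for v in PATH:
        exact = walk_end_probability(PATH, v, S, t=3)
        approx = simulated_probability(PATH, S, v, t=3, trials=20000)
        print(v, round(exact, 4), round(approx, 4))
```

The two printed columns should agree up to sampling error, which shrinks roughly like $1/\sqrt{\text{trials}}$.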

Equipped with the above facts, we can now describe the simple algorithms for the spread maximization set problem.

3.1 The case of short term

We start by showing how to solve the problem for the case of the short term, that is, when t is (any) polynomial in n. We note that studying the spread maximization problem for the short term is crucial in the early stages of introducing a new technology into the market.

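As usual, let M be the normalized transition matrix of G, i.e. $M(v, u) = 1/|N(v)|$ if $u \in N(v)$ and $M(v, u) = 0$ otherwise. For a subset $S \subseteq \{1, \ldots, n\}$ we denote by $\mathbf{1}_S$ the 0/1 row vector whose i-th entry is 1 iff $i \in S$; in particular, $\mathbf{1}_V$ is the all-ones row vector. The following lemma gives a characterization of the spread maximizing set.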

Lemma 1. For any graph G with transition matrix M, the spread maximizing set S is the set which maximizes $\mathbf{1}_V M^t \mathbf{1}_S^T$ subject to $\sum_{v \in S} c_v \le B$.

Proof. Recall the well-known fact that $p^t_{u,v}$, the probability that a random walk of length t starting at u ends in v, is given by the (u, v) entry of the matrix $M^t$. The spread maximization set problem asks for maximizing $\mathbb{E}\left[\sum_{v \in V} f_t(v)\right]$ subject to $\sum_{v \in S} c_v \le B$. By linearity of expectation, we have that

$$\mathbb{E}\left[\sum_{v \in V} f_t(v)\right] = \sum_{v \in V} \Pr[f_t(v) = 1].$$

By Corollary 1, if we set $f_0(v) = 1$ for every $v \in S$ (and $f_0(v) = 0$ otherwise), then

$$\Pr[f_t(v) = 1] = \sum_{u \in S} p^t_{v,u} = \mathbf{1}_{\{v\}} M^t \mathbf{1}_S^T.$$

Therefore,

$$\mathbb{E}\left[\sum_{v \in V} f_t(v)\right] = \sum_{v \in V} \mathbf{1}_{\{v\}} M^t \mathbf{1}_S^T = \mathbf{1}_V M^t \mathbf{1}_S^T,$$

and we conclude that the optimal set S is indeed the one maximizing $\mathbf{1}_V M^t \mathbf{1}_S^T$ subject to $\sum_{v \in S} c_v \le B$.
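A small worked instance (a hypothetical toy graph, not taken from the paper): let G be the path a - b - c with a self loop at every node, so N(a) = {a, b}, N(b) = {a, b, c} and N(c) = {b, c}, and take t = 1 with unit costs and B = 1. By Corollary 1, $\Pr[f_t(v) = 1] = \sum_{u \in S} (M^t)(v, u)$, so the expected spread is the sum of the entries in the columns of $M^t$ indexed by S. With rows and columns ordered a, b, c:

$$
M = \begin{pmatrix} 1/2 & 1/2 & 0 \\ 1/3 & 1/3 & 1/3 \\ 0 & 1/2 & 1/2 \end{pmatrix},
\qquad
\text{column sums of } M = \left(\tfrac{5}{6},\ \tfrac{4}{3},\ \tfrac{5}{6}\right).
$$

Hence seeding b gives an expected spread of 4/3, which matches a direct computation from Definition 1: $\Pr[f_1(a) = 1] + \Pr[f_1(b) = 1] + \Pr[f_1(c) = 1] = \tfrac{1}{2} + \tfrac{1}{3} + \tfrac{1}{2} = \tfrac{4}{3}$, while seeding a (or c) gives only 5/6. The column sums add up to n = 3, as they must, since every row of M sums to 1.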

Using this formulation we can obtain the following theorems, which shed light on how well the spread maximization set problem can be solved. We note that these positive results are in contrast to the inapproximability results in the model introduced by [14] for threshold networks.

Theorem 1. If the cost vector c is uniform, that is, if $c_v = c$ for all v, then the spread maximization set problem can be solved exactly in polynomial time for any t = poly(n).

Proof. First note that the entries of $M^t$ can be computed efficiently for any t = poly(n): by repeated squaring, computing $M^t$ requires only O(log t) matrix multiplications. For every node v denote $g_v = \mathbf{1}_V M^t \mathbf{1}_{\{v\}}^T$, the sum of the v-th column of $M^t$; by Proposition 1, $g_v$ is the expected number of nodes that hold v's initial opinion at time t. By Lemma 1, the problem is equivalent to maximizing $\mathbf{1}_V M^t \mathbf{1}_S^T$ subject to $\sum_{v \in S} c_v \le B$. As $\mathbf{1}_V M^t \mathbf{1}_S^T = \sum_{v \in S} g_v$ and the cost of every node is identical, we get that for every budget B, the optimal set consists of the first $\lfloor B/c \rfloor$ nodes when sorted in decreasing order of $g_v$.
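The algorithm in the proof of Theorem 1 can be sketched as follows: build M, raise it to the t-th power with O(log t) multiplications by repeated squaring, take column sums to obtain $g_v$, and keep the $\lfloor B/c \rfloor$ largest. This is a minimal pure-Python sketch, assuming the dictionary-of-neighbor-lists representation with self loops included; all function names are hypothetical.

```python
def transition_matrix(neighbors, order):
    """M[i][j] = 1/|N(v_i)| if v_j is in N(v_i), else 0 (each row sums to 1)."""
    index = {v: i for i, v in enumerate(order)}
    n = len(order)
    M = [[0.0] * n for _ in range(n)]
    for v in order:
        for u in neighbors[v]:
            M[index[v]][index[u]] += 1.0 / len(neighbors[v])
    return M


def mat_mul(A, B):
    """Dense product of an n x m and an m x p matrix given as lists of rows."""
    n, m, p = len(A), len(B), len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(m)) for j in range(p)]
            for i in range(n)]


def mat_pow(M, t):
    """M^t using O(log t) matrix multiplications (repeated squaring)."""
    n = len(M)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    base = M
    while t > 0:
        if t & 1:
            result = mat_mul(result, base)
        base = mat_mul(base, base)
        t >>= 1
    return result


def uniform_cost_spread_max(neighbors, c, budget, t):
    """Theorem 1 sketch: pick the floor(B / c) nodes with the largest g_v.

    g_v is the sum of column v of M^t.
    """
    order = list(neighbors)
    Mt = mat_pow(transition_matrix(neighbors, order), t)
    g = {v: sum(row[j] for row in Mt) for j, v in enumerate(order)}
    k = min(int(budget // c), len(order))
    ranked = sorted(order, key=lambda v: g[v], reverse=True)
    return ranked[:k], g
```

Dense matrix powering costs $O(n^3 \log t)$ arithmetic operations here. Since only the column sums of $M^t$ are needed, t successive vector-matrix products, costing $O(t\,|E|)$ in total, are an alternative when t is small relative to n.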

Theorem 2. There exists an FPTAS for the spread maximization set problem for any t = poly(n).

Proof. Once again, for every node v denote $g_v = \mathbf{1}_V M^t \mathbf{1}_{\{v\}}^T$. Our goal is to maximize $\mathbf{1}_V M^t \mathbf{1}_S^T = \sum_{v \in S} g_v$ subject to $\sum_{v \in S} c_v \le B$. Observe that this is just an instance of the Knapsack problem, with item values $g_v$ and item weights $c_v$, and thus we can use the well-known linear time FPTAS for Knapsack [16] to obtain an FPTAS for the spread maximization set problem.
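The proof of Theorem 2 reduces the problem to Knapsack with values $g_v$ and weights $c_v$. As an illustration, here is the textbook profit-scaling FPTAS: round values down to multiples of $\varepsilon \cdot \max_v g_v / n$, then run an exact dynamic program over the rounded values; the result is at least $(1 - \varepsilon)$ times the optimum. This is a swapped-in standard variant chosen for clarity, not necessarily the faster algorithm of [16]; `knapsack_fptas` is a hypothetical name, and in this setting the values would be the $g_v$ computed as in the proof of Theorem 1.

```python
import math


def knapsack_fptas(values, costs, budget, eps):
    """Profit-scaling FPTAS for 0/1 knapsack (textbook variant, assumed here).

    Returns indices of a subset with total cost <= budget whose total value is
    at least (1 - eps) times the optimum. Time and memory are polynomial in
    n and 1/eps; this sketch favors clarity over speed.
    """
    # Discard items that cannot fit on their own; the max over the remaining
    # items is then a lower bound on the optimum, which the analysis needs.
    items = [i for i in range(len(values)) if costs[i] <= budget and values[i] > 0]
    if not items:
        return []
    n = len(items)
    scale = eps * max(values[i] for i in items) / n
    scaled = {i: math.floor(values[i] / scale) for i in items}
    total = sum(scaled.values())
    # min_cost[p] = (cheapest total cost, item tuple) reaching scaled value p.
    min_cost = [(math.inf, ())] * (total + 1)
    min_cost[0] = (0, ())
    for i in items:
        # Iterate p downwards so that each item is used at most once.
        for p in range(total, scaled[i] - 1, -1):
            prev_cost, prev_set = min_cost[p - scaled[i]]
            if prev_cost + costs[i] < min_cost[p][0]:
                min_cost[p] = (prev_cost + costs[i], prev_set + (i,))
    best = max(p for p in range(total + 1) if min_cost[p][0] <= budget)
    return list(min_cost[best][1])


if __name__ == "__main__":
    # Hypothetical g_v values and node costs.
    print(knapsack_fptas([60, 100, 120], [10, 20, 30], budget=50, eps=0.1))
```

The rounding loses at most one scaling unit per item, i.e. at most $\varepsilon \cdot \max_v g_v \le \varepsilon \cdot \mathrm{OPT}$ in total, which is where the $(1 - \varepsilon)$ guarantee comes from.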

Observe that in general we cannot expect to be able to solve the spread maximization set problem exactly, because when t = 0 this problem is equivalent to the Knapsack problem, which is NP-hard.

3.2 The case of long term

In the previous subsection we considered the case where t is polynomial in n. Let us now consider the case of large t, where by large we mean $t \ge n^5$. Recall the well-known fact that for any connected graph G with self loops, a random walk starting from any node v converges to the steady state distribution after $O(n^3)$ steps (see [21]). Furthermore, if we set $d_v = |N(v)|$, then the (unique) steady state distribution is that the probability of being at node u is $d_u/2|E|$. In other words, if $t \gg n^3$ then $M^t_{v,u} = (1 + o(1))\, d_u/2|E|$ for every starting node v.⁴ Once again, using Lemma 1 we can obtain the following corollaries.

⁴ More precisely, the smaller we want the o(1) term to be, the larger we need t to be.
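The long-term behavior can be observed numerically. By Corollary 1, the expected spread of a seed set S is the sum over $u \in S$ of the column sums $g_u$ of $M^t$, and the steady-state fact above implies $g_u \to n\, d_u / \sum_w d_w$, so for uniform costs the best seeds are the highest-degree nodes. The sketch computes the column sums by t vector-matrix products and compares them with this limit on a small connected graph with self loops. The star graph `STAR` and the function names are hypothetical, and the limit is normalized by the total degree $\sum_w d_w$ with $d_u = |N(u)|$ (self loop counted once).

```python
def column_sums_of_power(neighbors, t):
    """Return g with g[u] = sum over v of M^t(v, u), via t vector-matrix products.

    Each product costs O(sum_v |N(v)|), so the whole loop is O(t * |E|).
    """
    r = {v: 1.0 for v in neighbors}  # the all-ones row vector
    for _ in range(t):
        nxt = {v: 0.0 for v in neighbors}
        for u, nbrs in neighbors.items():
            share = r[u] / len(nbrs)  # row u of M has 1/|N(u)| on each neighbor
            for w in nbrs:
                nxt[w] += share
        r = nxt
    return r


def degree_limit(neighbors):
    """n * d_u / sum_w d_w with d_u = |N(u)|: the t -> infinity limit of g[u]."""
    n = len(neighbors)
    total_degree = sum(len(nbrs) for nbrs in neighbors.values())
    return {u: n * len(nbrs) / total_degree for u, nbrs in neighbors.items()}


# Hypothetical connected graph with self loops: a star centered at "hub".
STAR = {
    "hub": ["hub", "x", "y", "z"],
    "x": ["x", "hub"],
    "y": ["y", "hub"],
    "z": ["z", "hub"],
}

if __name__ == "__main__":
    g = column_sums_of_power(STAR, t=200)
    limit = degree_limit(STAR)
    for u in STAR:
        print(u, round(g[u], 6), round(limit[u], 6))
    # With equal costs and a budget for one seed, the best node is the hub,
    # which is also the node of largest degree.
    print(max(g, key=g.get))
```

The sketch only illustrates convergence on one small graph; it does not verify the $O(n^3)$ mixing bound quoted from [21].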
