Event detection in activity networks
Polina Rozenshtein
Aalto University
Espoo, Finland
polina.rozenshtein@aalto.fi
Aris Anagnostopoulos
Sapienza University of Rome
Italy
aris@dis.uniroma1.it
Aristides Gionis
Aalto University and HIIT
Espoo, Finland
aristides.gionis@aalto.fi
Nikolaj Tatti
Aalto University and HIIT
Espoo, Finland
nikolaj.tatti@aalto.fi
ABSTRACT
With the fast growth of smart devices and social networks,
a lot of computing systems collect data that record different
types of activities. An important computational challenge
is to analyze these data, extract patterns, and understand
activity trends. We consider the problem of mining activity
networks to identify interesting events, such as a big concert
or a demonstration in a city, or a trending keyword in a user
community in a social network.
We define an event to be a subset of nodes in the network
that are close to each other and have high activity levels.
We formalize the problem of event detection using two
graph-theoretic formulations. The first one captures the
compactness of an event using the sum of distances among
all pairs of the event nodes. We show that this formulation
can be mapped to the MaxCut problem, and thus, it can
be solved by applying standard semidefinite programming
techniques. The second formulation captures compactness
using a minimum-distance tree. This formulation leads to
the prize-collecting Steiner-tree problem, which we solve by
adapting existing approximation algorithms. For the two
problems we introduce, we also propose efficient and effective
greedy approaches and we prove performance guarantees for
one of them. We experiment with the proposed algorithms
on real datasets from a public bicycling system and a
geolocation-enabled social network dataset collected from
Twitter. The results show that our methods are able to
detect meaningful events.
Categories and Subject Descriptors
H.3.4 [Database Management]: Database Applications—
Data mining
General Terms
Algorithms, Experimentation
Permission to make digital or hard copies of all or part of this work for personal or
classroom use is granted without fee provided that copies are not made or distributed
for profit or commercial advantage and that copies bear this notice and the full citation
on the first page. Copyrights for components of this work owned by others than the
author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or
republish, to post on servers or to redistribute to lists, requires prior specific permission
and/or a fee. Request permissions from permissions@acm.org.
KDD’14, August 24–27, 2014, New York, NY, USA.
Copyright is held by the owner/author(s). Publication rights licensed to ACM.
ACM 978-1-4503-2956-9/14/08 ...$15.00.
http://dx.doi.org/10.1145/2623330.2623674 .
Keywords
Event detection; Submodular function maximization; Group
Steiner tree; Maximum cut.
1. INTRODUCTION
Detecting events is a fundamental problem in data mining
and numerous methods have been applied to a variety of
scenarios, including time series and data streams [19], point
clouds and vector spaces [12], and networks [11]. The work
in this paper concentrates on the latter category: discovering
events in networks.
At a high level, our goal is to identify parts of the network
with unusually high activity confined to a small space or a
dense part of the graph. We consider a network G = (V,E)
whose nodes monitor and record a certain activity. Such
a network can represent a sensor network, a social-media
site, activity levels from brain imaging, and so on. Given
a time instance t, each node v ∈ V in the network keeps a value w_t(v) with the measurement value for the monitored activity. The objective of our approach is to detect an event happening in the network at time instance t. The event is defined with respect to the activity values w_t(·) as well as network connectivity. In particular, we aim at finding a subset of network nodes S, such that all nodes in S are close to each other and they all have high levels of activity.
The activity values w_t(v) may be absolute or normalized. Normalization here can be used to capture the abnormality level of a node at a certain time with respect to the routine operation of the network. In general, arbitrarily complex models can be used to define appropriately normalized activity values w_t(v). For instance, periodicity phenomena can be taken into account; for example, when monitoring the traffic activity in a city, the reference values for weekend 8am traffic are much lower than those of a weekday. In this paper, we assume that the activity values w_t(v) are provided as input to our problem. Devising models to obtain finely tuned and appropriately normalized activity values w_t(v) depends on the application at hand, and we consider it an orthogonal problem, outside the scope of this paper.
The problems we define can be applied to a variety
of scenarios, such as medical diagnosis, performance
monitoring, image or video surveillance, and fraud detection.
In this paper we experiment with two applications: event
detection in sensor networks and in social networks.
Sensor networks. Consider a network of sensors deployed
in a certain region and recording a measurement of interest, such as traffic, pollution, or water-quality level. In some
cases, the network nodes are geolocated so that a distance
measure is defined among all nodes, whereas in other cases
only a list of direct neighbors is known for each node. Events
are defined with respect to the activity value monitored:
we are interested in finding compact subareas where the
traffic measurements are abnormal, the pollution levels are
unusually high, and so on.
Social networks. Social networks model social interactions
between individuals. The values recorded at each node
correspond to a type of activity we are interested in
monitoring, for example, the number of messages posted by
an individual during the last few hours, an average sentiment
score of those messages, or the frequency of keywords
associated with a topic of interest. In this application, the
events discovered by our methods will correspond to dense
network subgraphs that exhibit high values with respect to
the activity monitored.
Motivated by the previous discussion, we formalize event detection as the problem of finding a subgraph S ⊆ V in a graph G = (V,E,w) with node weights w(v), for each v ∈ V. The subgraph S needs to be compact in terms of graph distances, and the sum of the weights of the nodes of S needs to be large. We express subgraph compactness using two different definitions: sum of pairwise distances and Steiner-tree cost. The first definition leads to a graph-cut problem. We show that the optimization function is submodular and we provide two simple greedy algorithms with provable approximation guarantees. We also show how we can transform the problem into a variant of the MaxCut problem to obtain an algorithm that is less practical but has a better approximation guarantee than the greedy algorithms.
The second subgraph-compactness definition, which is
based on the cost of the minimum Steiner tree, leads to
a prize-collecting Steiner-tree problem. For that problem
we apply a well known 2-approximation algorithm, which
is based on the primal–dual paradigm. We also experiment
with a greedy heuristic, which in practice gives almost as
good solutions as the primal–dual algorithm.
We evaluate our problem formulation and the proposed
algorithms on three sensor-network datasets and two
social-network datasets. The former are real-world datasets
from public bicycling systems in three large cities, Barcelona, Washington D.C., and Minneapolis, while the latter are datasets collected from Twitter for two cities, New York City and Los Angeles. Using our methods, we successfully discover real events in these cities, as reflected in the usage of shared bikes or in Twitter volume.
1.1 Related work
Statistical methods. A statistical approach for finding
anomalies is the spatial scan statistic [23]. The method
typically assumes that the data are distributed according
to some distribution on a Euclidean space. The goal is
then to detect whether there exists a subarea where the
data are distributed according to the same distribution
but with a higher density parameter. This approach is
related to our setting but it has important differences.
First, the statistical approach defines a null hypothesis,
therefore, it assumes an underlying distribution over the
data. Instead, our approach is formalized through an
optimization function, and no such assumptions are needed.
Second, the classical approaches are rather heuristic in nature: the methods for detecting the most diverse subareas are usually based on Monte Carlo sampling, even though there have been approaches that formulate them as optimization problems and provide algorithmic solutions [1, 2]. Yet a third difference is that these approaches usually assume that the shape is predefined (e.g., a circle or an axis-parallel rectangle), which allows for the design of algorithms that can search the space of shapes. Statisticians have proposed generalizations where the shape of the dense cluster is not fixed a priori [7, 25, 27]; however, this forces the solution of the detection problem to be heuristic (e.g., Monte Carlo simulations). Finally, all these approaches assume that there exists an underlying Euclidean geometry on the space. For several of the applications in which we are interested, such as social networks, such a Euclidean geometry is not present, thus requiring alternative approaches.
In summary, the approach of this paper is conceptually
different from all the related work on spatial scan statistics.
Here we provide a formal graph-theoretic definition of
the problem and algorithms that have theoretical quality
guarantees and are efficient in practice.
Event detection in social media. Event detection based
on geospatial information from social media and Twitter is a research area that has attracted significant attention in recent years. Baldwin et al. [8] developed an interactive system in which a user can issue queries and obtain the volume of tweets containing the query terms at different granularities of space and time. Walther and Kaisser [30] developed a system for detecting events as they take place, based on the Twitter stream. It gathers tweets as they are created and clusters them online based on geolocation. A machine-learning module evaluates whether a cluster of tweets refers to an event. Watanabe et al. [31] developed a similar system that identifies tweets created close in time and space and, by looking at co-occurring terms, attempts to discover whether they refer to the same event. Olson et al. [24] designed a system that aims at rapidly detecting large rare events: events that happen with very low frequency but have large consequences, such as an earthquake or a tsunami. The difference between this line of work and our approach is that we offer a graph-theoretic formulation of the problem, which can be applied to any graph, not just geography-induced graphs. Additionally, our method does not use text; instead, it assumes numerical measurements on the graph nodes.
Anomaly and outlier detection. Our problem also bears resemblance to problems related to outlier and anomaly detection in networks. The main objective of these works is to identify patterns that differ from normal behavior. For instance, Heard et al. [20] apply statistical techniques to discover a subset of the nodes that significantly change their communication patterns within a small period of time. Bhuyan
et al. [9] apply clustering on network data for detecting
intrusion attacks. The tutorial of Akoglu and Faloutsos [3]
has an extensive reference list with related publications.
Dense subgraphs. Our problem is related to finding dense
subgraphs [6, 14, 16, 22, 29]. Here, the goal is to find small
parts of the network with a high number of edges. We are
interested in small subgraphs with a high number of nodes,
thus obtaining different objective functions.
Finally, in a work related to the Steiner-tree problem
formulation presented in this paper, Seufert et al. [26] consider the problem of finding a subtree with k nodes such
that the total node weight is maximized. Their approach
also relies on the prize-collecting Steiner-tree, but their
methods focus on identifying a heavy tree of exactly k nodes.
2. PROBLEM FORMULATION
We consider a graph G = (V,E,w,c), with V being a set of n vertices and E being a set of m edges. The weight function w : V → R assigns a nonnegative value w(v) to each vertex v, whereas the distance function c : E → R assigns a distance value c(u,v) to each edge (u,v). The edge distance function c can be used to define a new distance function over all pairs of vertices (u,v) ∈ V × V. This can be done by considering the shortest-path distance closure between vertices. Namely, for each u,v ∈ V we define d(u,v) to be equal to the distance of a shortest path from u to v using edges of G, or ∞ if no such path exists. Unless specified otherwise, in the rest of the paper we assume that the shortest-path distance d is used, and that a finite distance value is defined for all pairs of vertices in the graph.
Our goal is to find a subset of vertices S ⊆ V that has large total weight according to the weight function w, and is sufficiently compact, that is, the vertices in S are close to each other according to the distance function d.
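For concreteness, the shortest-path closure d can be precomputed from the edge distances c. The following is a minimal sketch under our own naming, assuming networkx (a choice of ours, not a dependency of the paper) with c and w stored as edge and node attributes.

import networkx as nx

def distance_closure(G, weight="c"):
    # Shortest-path distance d(u, v) between all pairs of vertices, computed
    # from the edge distances c; pairs in different components are omitted
    # (their distance is infinite).
    return dict(nx.all_pairs_dijkstra_path_length(G, weight=weight))

# Toy example: edge distances in attribute "c", vertex weights in attribute "w".
G = nx.Graph()
G.add_edge("a", "b", c=1.0)
G.add_edge("b", "c", c=2.0)
nx.set_node_attributes(G, {"a": 3.0, "b": 0.5, "c": 2.0}, name="w")
d = distance_closure(G)
print(d["a"]["c"])   # 3.0 = 1.0 + 2.0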
To capture our objective we need to define appropriate
weight and distance functions for subsets of vertices. Given S ⊆ V we denote such weight and distance functions by W(S) and D(S), respectively. As a set weight function we consider simply the sum of all the weights in the set, that is,
W(S) = Σ_{v ∈ S} w(v).
For measuring the total distance of a set S we consider two options. The first option is to sum the distances of all pairs of vertices in S, namely
D_AP(S) = (1/2) Σ_{v ∈ S} Σ_{u ∈ S} d(u,v).
This type of objective function is suitable for events that are concentrated in a small, round-shaped area, for instance a football game. But events might have different shapes, such as a parade, a street concert, a firework show, and so on; for such events, we need a distance function that does not penalize long distances, as long as points in between are also active. This leads to using the minimum Steiner tree of the subgraph induced by the set of vertices S. We denote this total-distance measure by D_T(S). Thus, we have
D_T(S) = min_{T ∈ T(G[S])} Σ_{(u,v) ∈ T} d(u,v),
where G[S] denotes the subgraph of G induced by a subset of vertices S ⊆ V, and T(H) denotes the set of all spanning trees of a graph H.
Subsets of vertices S ⊆ V that correspond to meaningful events need to have a large weight value W(S) and a small total distance value D(S). There are many different options for combining the two measures so as to simultaneously maximize W(S) and minimize D(S), for example, optimizing one measure while setting a budget constraint on the other, or taking a linear combination of the two measures. The former approach induces difficulties in devising approximation algorithms. To avoid them, we combine the two measures into a single objective as a linear combination with a normalization coefficient λ. The coefficient λ provides an easy way to control the relative importance of the two measures. In addition, as we will see in the next section, such a linear combination leads to neat mathematical forms, which can be viewed as a quadratic integer program or as well-studied tree-based problems.
Thus, we consider the following problem formulation.
Problem 1 (Event). Given a graph G = (V,E,w,c) with vertex weights w and edge distance c, and a normalization coefficient λ, find a subset of vertices S ⊆ V that maximizes the objective function
Q(S) = λ W(S) − D(S).   (1)
Problem 1 is a generic problem. The exact structure of the problem and solution methods depend on which distance function D is used. We obtain two instantiations of Problem 1 by considering the two set distance functions D_AP and D_T that we discussed previously. Thus, we consider the following two specific problems.
Problem 2 (EventAllPairs). Given a graph G = (V,E,w,c) with vertex weights w and edge distance c, and a normalization coefficient λ, find a subset of vertices S ⊆ V that maximizes the objective function
Q_AP(S) = λ W(S) − D_AP(S).   (2)
Problem 3 (EventTree). Given a graph G = (V,E,w,c) with vertex weights w and edge distance c, and a normalization coefficient λ, find a subset of vertices S ⊆ V that maximizes the objective function
Q_T(S) = λ W(S) − D_T(S).   (3)
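To make the objectives concrete, a candidate set S can be scored as follows; this is a minimal illustrative sketch (not the authors' code), reusing the toy graph G and the distance closure d from the previous sketch, with lam playing the role of λ.

def W(S, G):
    # total weight of the vertices in S
    return sum(G.nodes[v]["w"] for v in S)

def D_AP(S, d):
    # half of the sum of pairwise shortest-path distances within S
    return 0.5 * sum(d[u][v] for u in S for v in S)

def Q_AP(S, G, d, lam):
    # objective of Problem 2: lambda * W(S) - D_AP(S)
    return lam * W(S, G) - D_AP(S, d)

print(Q_AP({"a", "c"}, G, d, lam=2.0))   # 2 * (3.0 + 2.0) - 3.0 = 7.0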
One should note that in the above problem formulations,
the objective functions Q, Q_AP, and Q_T can take negative
values. Negative values do not create any problems as long
as one is after exact solutions. However, in the context of
approximation algorithms, objective functions with negative
values are problematic because it becomes more difficult
to apply the concept of multiplicative approximation
guarantee. To overcome this problem we modify the
objective functions to ensure that they take nonnegative
values. We do so by adding a constant term, and we thus
consider a shifted version of our objective functions.
First, for the problem EventAllPairs we consider the shifted function Q^+_AP(S) = Q_AP(S) + D_AP(V). It is easy to see that the function Q^+_AP is nonnegative for all S ⊆ V. This makes it easier to design approximation algorithms with multiplicative approximation guarantees. A modified version of the EventAllPairs problem, which we denote by EventAllPairs+, is now defined as follows.
Problem 4 (EventAllPairs+). Given a graph G = (V,E,w,c) with vertex weights w and edge distance c, and a normalization coefficient λ, find a subset of vertices S ⊆ V that maximizes the objective function
Q^+_AP(S) = Q_AP(S) + D_AP(V)
          = λ W(S) − D_AP(S) + D_AP(V).   (4)
For the EventTree problem we follow a different approach: first we note that, with respect to finding an exact solution, maximizing the function Q_T is equivalent to minimizing −Q_T. We choose to work with this minimization problem, and we consider minimizing the shifted function Q^+_T(S) = −Q_T(S) + λ W(V), which is nonnegative for all S ⊆ V. For this shifted objective function it holds that
Q^+_T(S) = λ W(V) − λ W(S) + D_T(S)
         = λ W(V \ S) + D_T(S).
The interpretation of the above objective function is to find a set S so that the tree cost D_T(S) plus the (scaled by λ) weight of the vertices not included in S is minimized. This problem is known as the prize-collecting Steiner-tree (PCST) problem [4, 21]. The term "prize collecting" comes from thinking of the weights on the vertices of the graph as prizes; the goal is to find a tree that minimizes the tree cost plus the total value of the prizes not spanned by it. The shifted version of the EventTree problem, denoted by EventTree+, is now defined as follows.
Problem 5 (EventTree+). Given a graph G = (V,E,w,c) with vertex weights w and edge distance c, and a normalization coefficient λ, find a subset of vertices S ⊆ V that minimizes the objective function
Q^+_T(S) = λ W(V \ S) + D_T(S).   (5)
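For intuition, the shifted EventTree+ objective can be evaluated for a given candidate S by plugging in an approximate Steiner tree. The sketch below is our own illustration and uses the Steiner-tree approximation shipped with networkx; it is not the primal-dual PCST algorithm applied in the paper.

from networkx.algorithms.approximation import steiner_tree

def Q_plus_T(S, G, lam):
    # lambda * W(V \ S) plus the cost of an (approximate) Steiner tree
    # spanning S, with edge distances taken from attribute "c".
    penalty = lam * sum(G.nodes[v]["w"] for v in G.nodes if v not in S)
    if len(S) <= 1:
        return penalty            # a single vertex needs no tree edges
    T = steiner_tree(G, list(S), weight="c")
    tree_cost = sum(data["c"] for _, _, data in T.edges(data=True))
    return penalty + tree_cost

# Reusing the toy graph G from the earlier sketch:
print(Q_plus_T({"a", "c"}, G, lam=2.0))   # 2 * 0.5 + (1.0 + 2.0) = 4.0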
For general graphs, the problem EventAllPairs+ is
NP-hard. The proof of the next lemma is obtained by
a reduction from the IndependentSet problem and is
omitted because of lack of space.
Lemma 1. The problem EventAllPairs+ is NP-hard
for graphs with general edge distance functions.
In the more restricted version, in which the input graph is a metric, the complexity of the problem is open.
On the other hand, the problem EventTree+ is NP-hard even for metric edge distances, as it generalizes the Steiner-tree problem:
Lemma 2. The problem EventTree+ is NP-hard.
3. ALGORITHMS FOR THE
EventAllPairs+ PROBLEM
In this section we present our algorithms for the
EventAllPairs+ problem, starting with efficient greedy
approaches and continuing with a slower but more effective
approach based on the MaxCut problem.
3.1 Greedy algorithms
We start our discussion on the EventAllPairs+ problem by considering the properties of the underlying objective function Q^+_AP. In particular, we can show that the function Q^+_AP is submodular. Submodularity is a desirable property because a number of approximation algorithms rely on it. The approximability of a submodular function depends on other properties as well, in particular whether the function is monotone and/or symmetric. We recall that if V is a ground set, a set function f : 2^V → R is submodular if for all S, T ⊆ V we have that f(S) + f(T) ≥ f(S ∪ T) + f(S ∩ T). The function f is monotone if for all S ⊆ T ⊆ V it holds that f(S) ≤ f(T). It is symmetric if for all S ⊆ V we have that f(S) = f(V \ S).
Lemma 3. The function Q^+_AP is submodular.
Algorithm 1: BFNS: Greedy algorithm for EventAllPairs+ using the approach of Buchbinder et al. [13]
X_0 ← ∅, Y_0 ← V
for i = 1 to n do
  a_i ← Q^+_AP(X_{i−1} ∪ {u_i}) − Q^+_AP(X_{i−1})
  b_i ← Q^+_AP(Y_{i−1} \ {u_i}) − Q^+_AP(Y_{i−1})
  a'_i ← max{a_i, 0},  b'_i ← max{b_i, 0}
  with probability a'_i / (a'_i + b'_i) do
    X_i ← X_{i−1} ∪ {u_i},  Y_i ← Y_{i−1}
  else, with complementary probability b'_i / (a'_i + b'_i), do
    Y_i ← Y_{i−1} \ {u_i},  X_i ← X_{i−1}
return X_n (or, equivalently, Y_n)
Proof. (Sketch) We set I(S) := λ W(S) and D(S) := D_AP(V) − D_AP(S). Then Q^+_AP(S) = I(S) + D(S). It is easy to see that both functions I and D are positive and submodular. The lemma follows from the fact that the sum of two positive submodular functions is submodular.
Thus we want to maximize a submodular function without any constraints. A recent paper by Buchbinder et al. [13] provides a linear-time 1/2-approximation algorithm for this problem, which we explain below.
The algorithm of Buchbinder et al. is based on a randomized double-greedy approach. The technique utilizes the fact that for a submodular function f, the function f̄(S) = f(V \ S) is also submodular. Furthermore, if a set S* is optimal for f, then V \ S* is also optimal for f̄, and the optimal values of the two functions are equal.
The suggested approach is a randomized algorithm, which performs two types of greedy steps, searching for the optimal solution for f and f̄. The search strategy can be viewed as running two greedy algorithms simultaneously. We start with an arbitrary order of the elements in V, say u_1, ..., u_n. The two greedy processes traverse the sequences of sets {X_0, X_1, ...} and {Y_0, Y_1, ...}, respectively. The one greedy process starts from the empty set X_0 = ∅ and grows it to optimize f, while the other greedy process starts from the ground set Y_0 = V and shrinks it to optimize f̄. The algorithm guarantees that the growing set X_i is always included in the shrinking set Y_i. Which of the two steps is taken (grow X_i or reduce Y_i) is a random choice that depends on the marginal improvement obtained from each move. The algorithm stops when the two sets become equal. A formal description of the algorithm, which we call BFNS, is shown in Algorithm 1. We have the following.
Theorem 1 (Buchbinder et al. [13]). The BFNS algorithm provides a 1/2-factor approximation for the EventAllPairs+ problem.
For the EventAllPairs+ problem the BFNS algorithm has the following interpretation: it examines each graph vertex v one by one and decides whether to keep v in the solution or remove it. The decision is randomized and the probabilities depend on the marginal gains incurred in the Q^+_AP function with respect to the current lower and upper set solutions, X_i and Y_i.
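A compact Python rendering of this double-greedy scheme follows (our sketch of Algorithm 1, not the authors' implementation); f stands for any nonnegative submodular set function, here Q^+_AP(S) = Q_AP(S) + D_AP(V).

import random

def bfns(V, f, seed=None):
    # Double greedy of Buchbinder et al. [13]: X grows from the empty set,
    # Y shrinks from V; each element is kept or dropped at random with
    # probability proportional to its clipped marginal gain.
    rng = random.Random(seed)
    X, Y = set(), set(V)
    for u in V:
        a = f(X | {u}) - f(X)        # gain of adding u to X
        b = f(Y - {u}) - f(Y)        # gain of removing u from Y
        a, b = max(a, 0.0), max(b, 0.0)
        if a + b == 0.0 or rng.random() < a / (a + b):
            X.add(u)                 # keep u (ties resolved by keeping it)
        else:
            Y.discard(u)             # drop u
    return X                         # at this point X == Y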
Our objective has additional structure, which leads to a trivial 1/2-approximation algorithm. Recall the definitions of the functions I and D in the proof of Lemma 3. Both functions are positive and I(S) is increasing in S, whereas D(S) is decreasing in S. Therefore, for the optimal solution S* we have that
Q^+_AP(S*) = I(S*) + D(S*)
           ≤ (I(V) + D(V)) + (I(∅) + D(∅))
           = Q^+_AP(V) + Q^+_AP(∅)
           ≤ 2 max{Q^+_AP(∅), Q^+_AP(V)}.
This means that simply taking the better of the empty set and the entire vertex set V also provides a 1/2 approximation. We call this algorithm Trivial.
We finally propose the standard greedy algorithm, which
starts from the empty set, adds one vertex at a time, and
stops when the solution cannot be improved. We refer to
this simple greedy as GreedyAP.
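A minimal sketch of GreedyAP (again our own illustration, with f standing for Q^+_AP):

def greedy_ap(V, f):
    # Start from the empty set and repeatedly add the vertex with the largest
    # marginal gain, stopping as soon as no addition improves the objective.
    S, current = set(), f(set())
    candidates = set(V)
    while candidates:
        u_best, gain_best = None, 0.0
        for u in candidates:
            gain = f(S | {u}) - current
            if gain > gain_best:
                u_best, gain_best = u, gain
        if u_best is None:           # no vertex with positive marginal gain
            break
        S.add(u_best)
        current += gain_best
        candidates.remove(u_best)
    return S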
For general unconstrained submodular functions, the greedy algorithm does not provide any guarantee. Here we are able to exploit the specific structure of the objective, and we prove that GreedyAP also provides a 1/2 approximation. The result follows from the following lemma.
Lemma 4. Consider a submodular function F and let S be the solution given by the greedy algorithm optimizing F. Then F(S) ≥ F(V).
Proof. We prove a more general statement: any hill-climbing algorithm, that is, any algorithm that starts with the empty set and keeps adding elements for as long as the marginal gain is positive, returns a solution S for which F(S) ≥ F(V).
Assume that V \ S = {x_1, x_2, ..., x_r}. Then we have
F(S) − F(V) = Σ_{i=1}^{r} ( F(S ∪ {x_1, ..., x_{i−1}}) − F(S ∪ {x_1, ..., x_i}) )
            ≥ Σ_{i=1}^{r} ( F(S) − F(S ∪ {x_i}) ) ≥ 0,
where the first inequality follows from the submodularity of F and the second from the fact that the algorithm returned S without adding any element x_i, so each term in the sum is nonnegative.
Corollary 1. The GreedyAP algorithm provides a 1/2-factor approximation for the EventAllPairs+ problem. This bound is tight.
Proof. By the discussion just before Lemma 4, it suffices to show that F(S) ≥ F(∅) and F(S) ≥ F(V). The former is trivially true, otherwise the algorithm would have returned S = ∅, and the latter follows from Lemma 4. The counterexample showing that, in the worst case, the greedy cannot achieve an approximation better than 1/2 will appear in the full version of this work.
Although all three algorithms, BFNS, Trivial, and GreedyAP, have the same theoretical performance, our experiments show that GreedyAP produces solutions of higher quality, slightly better than or equal to those of the MaxCut-based algorithm, which is theoretically superior and which we discuss next.
3.2 MaxCut formulation
In this section we reduce the EventAllPairs+ problem to a variant of the MaxCut problem. This allows us to use the well-known algorithm of Goemans and Williamson [18], which has a 0.878-approximation guarantee. For the reduction we need the following variant of MaxCut.
Problem 6 ((s,t)-MaxCut). Given a graph G and two vertices s and t, partition the vertices of G into two sets A and B such that s ∈ A and t ∈ B and the total weight of the cross edges is maximized. We denote the cost of such a solution A, B by Q(A, B).
The only difference between (s,t)-MaxCut and the traditional MaxCut is that there are two vertices, s and t, that are forbidden from being on the same side of the cut. This technicality does not have any complexity consequences.
To reduce EventAllPairs+ to the (s,t)-MaxCut problem, assume that we are given a graph G = (V,E) equipped with the distance function c. Assume also that we are given a parameter λ. We then construct a new graph H by adding two special vertices s and t into G. We connect s to each v ∈ V with a weight c(s,v) = Σ_{u∈V} d(v,u). We connect t to each v ∈ V with a weight c(t,v) = 2 λ w(v).
Consider an (A, B) cut of H such that s ∈ A and t ∈ B. Let S = A \ {s} and let T = B \ {t}. We argue that the cost of the cut is twice the cost of EventAllPairs+, that is, Q(A,B) = 2 Q^+_AP(S). To see this, notice that each vertex v ∈ S will contribute 2 λ w(v) to the cost, and each vertex v ∈ T will contribute c(s,v) = Σ_{u∈V} d(v,u) to the cost. Additional costs will come from edges (u,v), where v ∈ S and u ∈ T. Combining these costs gives us
Q(A,B) = Σ_{v∈S} 2 λ w(v) + Σ_{v∈T} Σ_{u∈V} d(v,u) + Σ_{v∈T} Σ_{u∈S} d(v,u)
       = 2 λ W(S) + Σ_{v∈T} Σ_{u∈T} d(v,u) + Σ_{v∈T} Σ_{u∈S} d(v,u) + Σ_{v∈T} Σ_{u∈S} d(v,u)
       = 2 λ W(S) + Σ_{v∈V} Σ_{u∈V} d(v,u) − Σ_{v∈S} Σ_{u∈S} d(v,u)
       = 2 λ W(S) + 2 D_AP(V) − 2 D_AP(S)
       = 2 Q^+_AP(S).
Thus, solving (s,t)-MaxCut also solves EventAllPairs+.
Moreover, since the costs of both problems are the same,
up to a scaling factor, any approximation guarantee for
(s,t)-MaxCut yields the same approximation guarantee for
EventAllPairs+.
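A sketch of the construction of H in matrix form follows (our own code and naming, assuming the pairwise distances d and vertex weights w are available as dictionaries, as in the earlier sketches); the returned indices of s and t are meant for the SDP-based rounding sketched further below.

import numpy as np

def build_st_graph(nodes, d, w, lam):
    # Weight matrix of H over nodes + [s, t]:
    #   edge (u, v) with u, v in V : d(u, v)
    #   edge (s, v)                : sum_{u in V} d(v, u)
    #   edge (t, v)                : 2 * lam * w(v)
    n = len(nodes)
    M = np.zeros((n + 2, n + 2))
    s, t = n, n + 1
    for i, u in enumerate(nodes):
        for j, v in enumerate(nodes):
            M[i, j] = d[u][v]
        M[s, i] = M[i, s] = sum(d[u][v] for v in nodes)
        M[t, i] = M[i, t] = 2.0 * lam * w[u]
    return M, s, t

# Example, reusing G and d from the earlier sketches:
nodes = list(G)
M, s, t = build_st_graph(nodes, d, {v: G.nodes[v]["w"] for v in nodes}, lam=2.0)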
Our final step is to solve (s,t)-MaxCut. Following the seminal algorithm of Goemans and Williamson [18], the problem can be formulated as the following integer program:
maximize   Σ_{u,v ∈ V(H)} c(u,v) (1 − x_u x_v) / 2
such that  x_u ∈ {−1, +1}, for all u ∈ V(H),
           x_s x_t = −1.
The only difference with the original MaxCut formulation is the additional constraint x_s x_t = −1. This constraint ensures that the vertices s and t are in different partitions and the resulting cut is a feasible solution for (s,t)-MaxCut.
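The following is a minimal sketch of relaxing this program to a semidefinite program and rounding with a random hyperplane, in the spirit of the procedure described next. It is our own code, not the authors' implementation, and assumes cvxpy with an SDP-capable solver (e.g., SCS) together with the matrix M and the indices s, t from the previous sketch.

import cvxpy as cp
import numpy as np

def st_maxcut_sdp(M, s, t, n_rounds=100, seed=0):
    # Relax x_u in {-1, +1} to unit vectors: X[u, v] stands for x_u * x_v,
    # X is PSD with unit diagonal, and X[s, t] = -1 keeps s and t apart.
    n = M.shape[0]
    X = cp.Variable((n, n), PSD=True)
    constraints = [cp.diag(X) == 1, X[s, t] == -1]
    objective = cp.Maximize(cp.sum(cp.multiply(M, 1 - X)) / 4)
    cp.Problem(objective, constraints).solve()
    # Factor X into vectors (rows of U) and round with random hyperplanes.
    vals, vecs = np.linalg.eigh(X.value)
    U = vecs * np.sqrt(np.clip(vals, 0.0, None))
    rng = np.random.default_rng(seed)
    best_val, best_cut = -np.inf, None
    for _ in range(n_rounds):
        signs = np.sign(U @ rng.standard_normal(n))
        signs[signs == 0] = 1.0
        if signs[s] == signs[t]:
            continue                     # numerically degenerate draw
        if signs[s] < 0:
            signs = -signs               # convention: s is on the +1 side
        val = np.sum(M * (1 - np.outer(signs, signs))) / 4
        if val > best_val:
            best_val, best_cut = val, signs
    return best_val, best_cut

Under the reduction above, the vertices v (other than s) with best_cut[v] = +1 form the set S, and best_val, the weight of the rounded cut, equals 2 Q^+_AP(S).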
The algorithm of Goemans and Williamson proceeds by making a semidefinite relaxation, and then rounds the solution based on a random projection. Fortunately, the additional constraint x_s x_t = −1 can be easily added to
