Event detection in activity networks
Polina Rozenshtein
Aalto University
Espoo, Finland
polina.rozenshtein@aalto.fi
Aris Anagnostopoulos
Sapienza University of Rome
Italy
aris@dis.uniroma1.it
Aristides Gionis
Aalto University and HIIT
Espoo, Finland
aristides.gionis@aalto.fi
Nikolaj Tatti
Aalto University and HIIT
Espoo, Finland
nikolaj.tatti@aalto.fi
ABSTRACT
With the fast growth of smart devices and social networks,
a lot of computing systems collect data that record different
types of activities. An important computational challenge
is to analyze these data, extract patterns, and understand
activity trends. We consider the problem of mining activity
networks to identify interesting events, such as a big concert
or a demonstration in a city, or a trending keyword in a user
community in a social network.
We define an event to be a subset of nodes in the network
that are close to each other and have high activity levels.
We formalize the problem of event detection using two
graph-theoretic formulations. The first one captures the
compactness of an event using the sum of distances among
all pairs of the event nodes. We show that this formulation
can be mapped to the MaxCut problem, and thus, it can
be solved by applying standard semidefinite programming
techniques. The second formulation captures compactness
using a minimum-distance tree. This formulation leads to
the prize-collecting Steiner-tree problem, which we solve by
adapting existing approximation algorithms. For the two
problems we introduce, we also propose efficient and effective
greedy approaches and we prove performance guarantees for
one of them. We experiment with the proposed algorithms
on real datasets from a public bicycling system and a
geolocation-enabled social network dataset collected from
Twitter. The results show that our methods are able to
detect meaningful events.
Categories and Subject Descriptors
H.3.4 [Database Management]: Database Applications—
Data mining
General Terms
Algorithms, Experimentation
Permission to make digital or hard copies of all or part of this work for personal or
classroom use is granted without fee provided that copies are not made or distributed
for profit or commercial advantage and that copies bear this notice and the full citation
on the first page. Copyrights for components of this work owned by others than the
author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or
republish, to post on servers or to redistribute to lists, requires prior specific permission
and/or a fee. Request permissions from permissions@acm.org.
KDD’14, August 24–27, 2014, New York, NY, USA.
Copyright is held by the owner/author(s). Publication rights licensed to ACM.
ACM 978-1-4503-2956-9/14/08 ...$15.00.
http://dx.doi.org/10.1145/2623330.2623674 .
Keywords
Event detection; Submodular function maximization; Group
Steiner tree; Maximum cut.
1. INTRODUCTION
Detecting events is a fundamental problem in data mining
and numerous methods have been applied to a variety of
scenarios, including time series and data streams [19], point
clouds and vector spaces [12], and networks [11]. The work
in this paper concentrates on the latter category: discovering
events in networks.
At a high level, our goal is to identify parts of the network
with unusually high activity confined to a small space or a
dense part of the graph. We consider a network G = (V,E)
whose nodes monitor and record a certain activity. Such
a network can represent a sensor network, a social-media
site, activity levels from brain imaging, and so on. Given
a time instance t, each node v ∈ V in the network keeps a value w_t(v) with the measurement value for the monitored activity. The objective of our approach is to detect an event happening in the network at time instance t. The event is defined with respect to the activity values w_t(·) as well as network connectivity. In particular, we aim at finding a subset of network nodes S, such that all nodes in S are close to each other and they all have high levels of activity.
The activity values w_t(v) may be absolute or normalized. Normalization here can be used to capture the abnormality level of a node at a certain time with respect to the routine operation of the network. In general, arbitrarily complex models can be used to define appropriately normalized activity values w_t(v). For instance, periodicity phenomena can be taken into account; for example, when monitoring the traffic activity in a city, the reference values for weekend 8am traffic are much lower than those of a weekday. In this paper, we assume that the activity values w_t(v) are provided as input to our problem. Devising models to obtain finely tuned and appropriately normalized activity values w_t(v) depends on the application at hand, and we consider it an orthogonal problem, outside the scope of this paper.
The problems we define can be applied to a variety
of scenarios, such as medical diagnosis, performance
monitoring, image or video surveillance, and fraud detection.
In this paper we experiment with two applications: event
detection in sensor networks and in social networks.
Sensor networks. Consider a network of sensors deployed
in a certain region and recording a measurement of interest, such as traffic, pollution, or water-quality level. In some
cases, the network nodes are geolocated so that a distance
measure is defined among all nodes, whereas in other cases
only a list of direct neighbors is known for each node. Events
are defined with respect to the activity value monitored:
we are interested in finding compact subareas where the
traffic measurements are abnormal, the pollution levels are
unusually high, and so on.
Social networks. Social networks model social interactions
between individuals. The values recorded at each node
correspond to a type of activity we are interested in
monitoring, for example, the number of messages posted by
an individual during the last few hours, an average sentiment
score of those messages, or the frequency of keywords
associated with a topic of interest. In this application, the
events discovered by our methods will correspond to dense
network subgraphs that exhibit high values with respect to
the activity monitored.
Motivated by the previous discussion, we formalize event detection as the problem of finding a subgraph S ⊆ V in a graph G = (V,E,w) with node weights w(v), for each v ∈ V. The subgraph S needs to be compact in terms of graph distances, and the sum of the weights of the nodes of S needs to be large. We express subgraph compactness using two different definitions: sum of pairwise distances and Steiner-tree cost. The first definition leads to a graph-cut problem. We show that the optimization function is submodular and we provide two simple greedy algorithms with provable approximation guarantees. We also show how we can transform the problem into a variant of the MaxCut problem to obtain an algorithm that is less practical but has a better approximation guarantee than the greedy algorithms.
The second subgraph-compactness definition, which is
based on the cost of the minimum Steiner tree, leads to
a prize-collecting Steiner-tree problem. For that problem
we apply a well known 2-approximation algorithm, which
is based on the primal–dual paradigm. We also experiment
with a greedy heuristic, which in practice gives almost as
good solutions as the primal–dual algorithm.
We evaluate our problem formulation and the proposed
algorithms on three sensor-network datasets and two
social-network datasets. The former are real-world datasets
from public bicycling systems in three large cities, Barcelona, Washington D.C., and Minneapolis, while the latter are datasets collected from Twitter for two cities, New York City and Los Angeles. Using our methods, we successfully discover real events in these cities, as reflected in the usage of shared bikes or in Twitter volume.
1.1 Related work
Statistical methods. A statistical approach for finding
anomalies is the spatial scan statistic [23]. The method
typically assumes that the data are distributed according
to some distribution on a Euclidean space. The goal is
then to detect whether there exists a subarea where the
data are distributed according to the same distribution
but with a higher density parameter. This approach is
related to our setting but it has important differences.
First, the statistical approach defines a null hypothesis,
therefore, it assumes an underlying distribution over the
data. Instead, our approach is formalized through an
optimization function, and no such assumptions are needed.
Second, the classical approaches are rather heuristic in nature: the methods for detecting the most diverse subareas are usually based on Monte Carlo sampling, even though there have been approaches that formulate them as optimization problems and provide algorithmic solutions [1, 2]. Yet a third difference is that these approaches usually assume that the shape is predefined (e.g., a circle or an axis-parallel rectangle), which allows for the design of algorithms that can search the space of shapes. Statisticians have proposed generalizations where the shape of the dense cluster is not fixed a priori [7, 25, 27]; however, this forces the solution of the detection problem to be heuristic (e.g., Monte Carlo simulations). Finally, all these approaches assume that there exists an underlying Euclidean geometry on the space. For several of the applications in which we are interested, such as social networks, such a Euclidean geometry is not present, thus requiring alternative approaches.
In summary, the approach of this paper is conceptually
different from all the related work on spatial scan statistics.
Here we provide a formal graph-theoretic definition of
the problem and algorithms that have theoretical quality
guarantees and are efficient in practice.
Event detection in social media. Event detection based
on geospatial information from social media and Twitter is a research area that has attracted significant attention in recent years. Baldwin et al. [8] developed an interactive system in which a user can issue queries and obtain the volume of tweets containing the query terms at different granularities of space and time. Walther and Kaisser [30] developed a system for detecting events as they take place, based on the Twitter stream. It gathers tweets as they are created and clusters them online based on geolocation. A machine-learning module evaluates whether a cluster of tweets refers to an event. Watanabe et al. [31] developed a similar system that identifies tweets created close in time and space and, by looking at co-occurring terms, attempts to discover whether they refer to the same event. Olson et al. [24] designed a system that aims at rapidly detecting large rare events: events that happen with very low frequency but have large consequences, such as an earthquake or a tsunami. The difference between this line of work and our approach is that we offer a graph-theoretic formulation of the problem, which can be applied to any graph, not just geography-induced graphs. Additionally, our method does not use text; instead, it assumes numerical measurements on the graph nodes.
Anomaly and outlier detection. Our problem also bears resemblance to problems related to outlier and anomaly detection in networks. The main objective of these works is to identify patterns that differ from normal behavior. For instance, Heard et al. [20] apply statistical techniques to discover a subset of the nodes that significantly change their communication patterns within a small period of time. Bhuyan
et al. [9] apply clustering on network data for detecting
intrusion attacks. The tutorial of Akoglu and Faloutsos [3]
has an extensive reference list with related publications.
Dense subgraphs. Our problem is related to finding dense
subgraphs [6, 14, 16, 22, 29]. Here, the goal is to find small
parts of the network with a high number of edges. We are
interested in small subgraphs with a high number of nodes,
thus obtaining different objective functions.
Finally, in a work related to the Steiner-tree problem
formulation presented in this paper, Seufert et al. [26] consider the problem of finding a subtree with k nodes such
that the total node weight is maximized. Their approach
also relies on the prize-collecting Steiner-tree, but their
methods focus on identifying a heavy tree of exactly k nodes.
2. PROBLEM FORMULATION
We consider a graph G = (V,E,w,c), with V being a set of n vertices and E being a set of m edges. The weight function w : V → R assigns a nonnegative value w(v) to each vertex v, whereas the distance function c : E → R assigns a distance value c(u,v) to each edge (u,v). The edge distance function c can be used to define a new distance function over all pairs of vertices (u,v) ∈ V × V. This can be done by considering the shortest-path distance closure between vertices. Namely, for each u,v ∈ V we define d(u,v) to be equal to the distance of a shortest path from u to v using edges of G, or ∞ if no such path exists. Unless specified otherwise, in the rest of the paper we assume that the shortest-path distance d is used, and that a finite distance value is defined for all pairs of vertices in the graph.
Our goal is to find a subset of vertices S ⊆ V that has large total weight according to the weight function w, and is sufficiently compact, that is, the vertices in S are close to each other according to the distance function d.
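For concreteness, the shortest-path closure d can be precomputed from the edge distances c. The following is a minimal sketch under our own naming, assuming networkx (a choice of ours, not a dependency of the paper) with c and w stored as edge and node attributes.

import networkx as nx

def distance_closure(G, weight="c"):
    # Shortest-path distance d(u, v) between all pairs of vertices, computed
    # from the edge distances c; pairs in different components are omitted
    # (their distance is infinite).
    return dict(nx.all_pairs_dijkstra_path_length(G, weight=weight))

# Toy example: edge distances in attribute "c", vertex weights in attribute "w".
G = nx.Graph()
G.add_edge("a", "b", c=1.0)
G.add_edge("b", "c", c=2.0)
nx.set_node_attributes(G, {"a": 3.0, "b": 0.5, "c": 2.0}, name="w")
d = distance_closure(G)
print(d["a"]["c"])   # 3.0 = 1.0 + 2.0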
To capture our objective we need to define appropriate
weight and distance functions for subsets of vertices. Given S ⊆ V we denote such weight and distance functions by W(S) and D(S), respectively. As a set weight function we consider simply the sum of all the weights in the set, that is,
W(S) = Σ_{v ∈ S} w(v).
For measuring the total distance of a set S we consider two options. The first option is to sum the distances of all pairs of vertices in S, namely
D_AP(S) = (1/2) Σ_{v ∈ S} Σ_{u ∈ S} d(u,v).
This type of objective function is suitable for events that are concentrated in a small, round-shaped area, for instance a football game. But events might have different shapes, such as a parade, a street concert, a firework show, and so on; for such events, we need a distance function that does not penalize long distances, as long as points in between are also active. This leads to using the minimum Steiner tree of the subgraph induced by the set of vertices S. We denote this total-distance measure by D_T(S). Thus, we have
D_T(S) = min_{T ∈ T(G[S])} Σ_{(u,v) ∈ T} d(u,v),
where G[S] denotes the subgraph of G induced by a subset of vertices S ⊆ V, and T(H) denotes the set of all spanning trees of a graph H.
Subsets of vertices S ⊆ V that correspond to meaningful events need to have a large weight value W(S) and a small total distance value D(S). There are many different options for combining the two measures so as to simultaneously maximize W(S) and minimize D(S), for example, optimizing one measure while setting a budget constraint on the other, or taking a linear combination of the two measures. The former approach induces difficulties in devising approximation algorithms. To avoid them, we combine the two measures into a single objective as a linear combination with a normalization coefficient λ. The coefficient λ provides an easy way to control the relative importance of the two measures. In addition, as we will see in the next section, such a linear combination leads to neat mathematical forms, which can be viewed as a quadratic integer program or as well-studied tree-based problems.
Thus, we consider the following problem formulation.
Problem 1 (Event). Given a graph G = (V,E,w,c) with vertex weights w and edge distance c, and a normalization coefficient λ, find a subset of vertices S ⊆ V that maximizes the objective function
Q(S) = λ W(S) − D(S).   (1)
Problem 1 is a generic problem. The exact structure of the problem and solution methods depend on which distance function D is used. We obtain two instantiations of Problem 1 by considering the two set distance functions D_AP and D_T that we discussed previously. Thus, we consider the following two specific problems.
Problem 2 (EventAllPairs). Given a graph G = (V,E,w,c) with vertex weights w and edge distance c, and a normalization coefficient λ, find a subset of vertices S ⊆ V that maximizes the objective function
Q_AP(S) = λ W(S) − D_AP(S).   (2)
Problem 3 (EventTree). Given a graph G = (V,E,w,c) with vertex weights w and edge distance c, and a normalization coefficient λ, find a subset of vertices S ⊆ V that maximizes the objective function
Q_T(S) = λ W(S) − D_T(S).   (3)
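To make the objectives concrete, a candidate set S can be scored as follows; this is a minimal illustrative sketch (not the authors' code), reusing the toy graph G and the distance closure d from the previous sketch, with lam playing the role of λ.

def W(S, G):
    # total weight of the vertices in S
    return sum(G.nodes[v]["w"] for v in S)

def D_AP(S, d):
    # half of the sum of pairwise shortest-path distances within S
    return 0.5 * sum(d[u][v] for u in S for v in S)

def Q_AP(S, G, d, lam):
    # objective of Problem 2: lambda * W(S) - D_AP(S)
    return lam * W(S, G) - D_AP(S, d)

print(Q_AP({"a", "c"}, G, d, lam=2.0))   # 2 * (3.0 + 2.0) - 3.0 = 7.0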
One should note that in the above problem formulations,
the objective functions Q, Q_AP, and Q_T can take negative
values. Negative values do not create any problems as long
as one is after exact solutions. However, in the context of
approximation algorithms, objective functions with negative
values are problematic because it becomes more difficult
to apply the concept of multiplicative approximation
guarantee. To overcome this problem we modify the
objective functions to ensure that they take nonnegative
values. We do so by adding a constant term, and we thus
consider a shifted version of our objective functions.
First, for the problem EventAllPairs we consider the shifted function Q^+_AP(S) = Q_AP(S) + D_AP(V). It is easy to see that the function Q^+_AP is nonnegative for all S ⊆ V. This makes it easier to design approximation algorithms with multiplicative approximation guarantees. A modified version of the EventAllPairs problem, which we denote by EventAllPairs+, is now defined as follows.
Problem 4 (EventAllPairs+). Given a graph G = (V,E,w,c) with vertex weights w and edge distance c, and a normalization coefficient λ, find a subset of vertices S ⊆ V that maximizes the objective function
Q^+_AP(S) = Q_AP(S) + D_AP(V)
          = λ W(S) − D_AP(S) + D_AP(V).   (4)
For the EventTree problem we follow a different approach: first we note that, with respect to finding an exact solution, maximizing the function Q_T is equivalent to minimizing −Q_T. We choose to work with this minimization problem, and we consider minimizing the shifted function Q^+_T(S) = −Q_T(S) + λ W(V), which is nonnegative for all S ⊆ V. For this shifted objective function it holds that
Q^+_T(S) = λ W(V) − λ W(S) + D_T(S)
         = λ W(V \ S) + D_T(S).
The interpretation of the above objective function is to find a set S so that the tree cost D_T(S) plus the (scaled by λ) weight of the vertices not included in S is minimized. This problem is known as the prize-collecting Steiner-tree (PCST) problem [4, 21]. The term "prize collecting" comes from thinking of the weights on the vertices of the graph as prizes; the goal is to find a tree that minimizes the tree cost plus the total value of the prizes not spanned by it. The shifted version of the EventTree problem, denoted by EventTree+, is now defined as follows.
Problem 5 (EventTree+). Given a graph G = (V,E,w,c) with vertex weights w and edge distance c, and a normalization coefficient λ, find a subset of vertices S ⊆ V that minimizes the objective function
Q^+_T(S) = λ W(V \ S) + D_T(S).   (5)
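For intuition, the shifted EventTree+ objective can be evaluated for a given candidate S by plugging in an approximate Steiner tree. The sketch below is our own illustration and uses the Steiner-tree approximation shipped with networkx; it is not the primal-dual PCST algorithm applied in the paper.

from networkx.algorithms.approximation import steiner_tree

def Q_plus_T(S, G, lam):
    # lambda * W(V \ S) plus the cost of an (approximate) Steiner tree
    # spanning S, with edge distances taken from attribute "c".
    penalty = lam * sum(G.nodes[v]["w"] for v in G.nodes if v not in S)
    if len(S) <= 1:
        return penalty            # a single vertex needs no tree edges
    T = steiner_tree(G, list(S), weight="c")
    tree_cost = sum(data["c"] for _, _, data in T.edges(data=True))
    return penalty + tree_cost

# Reusing the toy graph G from the earlier sketch:
print(Q_plus_T({"a", "c"}, G, lam=2.0))   # 2 * 0.5 + (1.0 + 2.0) = 4.0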
For general graphs, the problem EventAllPairs+ is
NP-hard. The proof of the next lemma is obtained by
a reduction from the IndependentSet problem and is
omitted because of lack of space.
Lemma 1. The problem EventAllPairs+ is NP-hard
for graphs with general edge distance functions.
In the more restricted version, in which the input graph is a metric, the complexity of the problem is open.
On the other hand, the problem EventTree+ is NP-hard even for metric edge distances, as it generalizes the Steiner-tree problem:
Lemma 2. The problem EventTree+ is NP-hard.
3. ALGORITHMS FOR THE
EventAllPairs+ PROBLEM
In this section we present our algorithms for the
EventAllPairs+ problem, starting with efficient greedy
approaches and continuing with a slower but more effective
approach based on the MaxCut problem.
3.1 Greedy algorithms
We start our discussion on the EventAllPairs+ problem by considering the properties of the underlying objective function Q^+_AP. In particular, we can show that the function Q^+_AP is submodular. Submodularity is a desirable property because a number of approximation algorithms rely on it. The approximability of a submodular function depends on other properties as well, in particular whether the function is monotone and/or symmetric. We recall that if V is a ground set, a set function f : 2^V → R is submodular if for all S, T ⊆ V we have that f(S) + f(T) ≥ f(S ∪ T) + f(S ∩ T). The function f is monotone if for all S ⊆ T ⊆ V it holds that f(S) ≤ f(T). It is symmetric if for all S ⊆ V we have that f(S) = f(V \ S).
Lemma 3. The function Q^+_AP is submodular.
Algorithm 1: BFNS: Greedy algorithm for EventAllPairs+ using the approach of Buchbinder et al. [13]
X_0 ← ∅, Y_0 ← V
for i = 1 to n do
  a_i ← Q^+_AP(X_{i−1} ∪ {u_i}) − Q^+_AP(X_{i−1})
  b_i ← Q^+_AP(Y_{i−1} \ {u_i}) − Q^+_AP(Y_{i−1})
  a'_i ← max{a_i, 0},  b'_i ← max{b_i, 0}
  with probability a'_i / (a'_i + b'_i) do
    X_i ← X_{i−1} ∪ {u_i},  Y_i ← Y_{i−1}
  else, with complementary probability b'_i / (a'_i + b'_i), do
    Y_i ← Y_{i−1} \ {u_i},  X_i ← X_{i−1}
return X_n (or, equivalently, Y_n)
Proof. (Sketch) We set I(S) := λ W(S) and D(S) := D_AP(V) − D_AP(S). Then Q^+_AP(S) = I(S) + D(S). It is easy to see that both functions I and D are positive and submodular. The lemma follows from the fact that the sum of two positive submodular functions is submodular.
Thus we want to maximize a submodular function without any constraints. A recent paper by Buchbinder et al. [13] provides a linear-time 1/2-approximation algorithm for this problem, which we explain below.
The algorithm of Buchbinder et al. is based on a randomized double-greedy approach. The technique utilizes the fact that for a submodular function f, the function f̄(S) = f(V \ S) is also submodular. Furthermore, if a set S* is optimal for f, then V \ S* is also optimal for f̄, and the optimal values of the two functions are equal.
The suggested approach is a randomized algorithm, which performs two types of greedy steps, searching for the optimal solution for f and f̄. The search strategy can be viewed as running two greedy algorithms simultaneously. We start with an arbitrary order of the elements in V, say u_1, ..., u_n. The two greedy processes traverse the sequences of sets {X_0, X_1, ...} and {Y_0, Y_1, ...}, respectively. The one greedy process starts from the empty set X_0 = ∅ and grows it to optimize f, while the other greedy process starts from the ground set Y_0 = V and shrinks it to optimize f̄. The algorithm guarantees that the growing set X_i is always included in the shrinking set Y_i. Which of the two steps is taken (grow X_i or reduce Y_i) is a random choice that depends on the marginal improvement obtained from each move. The algorithm stops when the two sets become equal. A formal description of the algorithm, which we call BFNS, is shown in Algorithm 1. We have the following.
Theorem 1 (Buchbinder et al. [13]). The BFNS algorithm provides a 1/2-factor approximation for the EventAllPairs+ problem.
For the EventAllPairs+ problem the BFNS algorithm has the following interpretation: it examines each graph vertex v one by one and decides whether to keep v in the solution or remove it. The decision is randomized and the probabilities depend on the marginal gains incurred in the Q^+_AP function with respect to the current lower and upper set solutions, X_i and Y_i.
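A compact Python rendering of this double-greedy scheme follows (our sketch of Algorithm 1, not the authors' implementation); f stands for any nonnegative submodular set function, here Q^+_AP(S) = Q_AP(S) + D_AP(V).

import random

def bfns(V, f, seed=None):
    # Double greedy of Buchbinder et al. [13]: X grows from the empty set,
    # Y shrinks from V; each element is kept or dropped at random with
    # probability proportional to its clipped marginal gain.
    rng = random.Random(seed)
    X, Y = set(), set(V)
    for u in V:
        a = f(X | {u}) - f(X)        # gain of adding u to X
        b = f(Y - {u}) - f(Y)        # gain of removing u from Y
        a, b = max(a, 0.0), max(b, 0.0)
        if a + b == 0.0 or rng.random() < a / (a + b):
            X.add(u)                 # keep u (ties resolved by keeping it)
        else:
            Y.discard(u)             # drop u
    return X                         # at this point X == Y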
Our objective has additional structure, which leads to a trivial 1/2-approximation algorithm. Recall the definitions of the functions I and D in the proof of Lemma 3. Both functions are positive and I(S) is increasing in S, whereas D(S) is decreasing in S. Therefore, for the optimal solution S* we have that
Q^+_AP(S*) = I(S*) + D(S*)
           ≤ (I(V) + D(V)) + (I(∅) + D(∅))
           = Q^+_AP(V) + Q^+_AP(∅)
           ≤ 2 max{Q^+_AP(∅), Q^+_AP(V)}.
This means that simply taking the better of the empty set and the entire vertex set V also provides a 1/2 approximation. We call this algorithm Trivial.
We finally propose the standard greedy algorithm, which
starts from the empty set, adds one vertex at a time, and
stops when the solution cannot be improved. We refer to
this simple greedy as GreedyAP.
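A minimal sketch of GreedyAP (again our own illustration, with f standing for Q^+_AP):

def greedy_ap(V, f):
    # Start from the empty set and repeatedly add the vertex with the largest
    # marginal gain, stopping as soon as no addition improves the objective.
    S, current = set(), f(set())
    candidates = set(V)
    while candidates:
        u_best, gain_best = None, 0.0
        for u in candidates:
            gain = f(S | {u}) - current
            if gain > gain_best:
                u_best, gain_best = u, gain
        if u_best is None:           # no vertex with positive marginal gain
            break
        S.add(u_best)
        current += gain_best
        candidates.remove(u_best)
    return S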
For general unconstrained submodular functions, the greedy algorithm does not provide any guarantee. Here we are able to exploit the specific structure of the objective, and we prove that GreedyAP also provides a 1/2 approximation. The result follows from the following lemma.
Lemma 4. Consider a submodular function F and let S be the solution given by the greedy algorithm optimizing F. Then F(S) ≥ F(V).
Proof. We prove a more general statement: any hill-climbing algorithm, that is, any algorithm that starts with the empty set and keeps adding elements for as long as the marginal gain is positive, returns a solution S for which F(S) ≥ F(V).
Assume that V \ S = {x_1, x_2, ..., x_r}. Then we have
F(S) − F(V) = Σ_{i=1}^{r} ( F(S ∪ {x_1, ..., x_{i−1}}) − F(S ∪ {x_1, ..., x_i}) )
            ≥ Σ_{i=1}^{r} ( F(S) − F(S ∪ {x_i}) ) ≥ 0,
where the first inequality follows from the submodularity of F and the second from the fact that the algorithm returned S without adding any element x_i, so each term in the sum is nonnegative.
Corollary 1. The GreedyAP algorithm provides a 1/2-factor approximation for the EventAllPairs+ problem. This bound is tight.
Proof. By the discussion just before Lemma 4, it suffices to show that F(S) ≥ F(∅) and F(S) ≥ F(V). The former is trivially true, otherwise the algorithm would have returned S = ∅, and the latter follows from Lemma 4. The counterexample showing that, in the worst case, the greedy cannot achieve an approximation better than 1/2 will appear in the full version of this work.
Although all three algorithms, BFNS, Trivial, and GreedyAP, have the same theoretical performance, our experiments show that GreedyAP produces solutions of higher quality, slightly better than or equal to those of the MaxCut-based algorithm, which is theoretically superior and which we discuss next.
3.2 MaxCut formulation
In this section we reduce the EventAllPairs+ problem to a variant of the MaxCut problem. This allows us to use the well-known algorithm of Goemans and Williamson [18], which has a 0.878-approximation guarantee. For the reduction we need the following variant of MaxCut.
Problem 6 ((s,t)-MaxCut). Given a graph G and two vertices s and t, partition the vertices of G into two sets A and B such that s ∈ A and t ∈ B and the total weight of the cross edges is maximized. We denote the cost of such a solution A, B by Q(A, B).
The only difference between (s,t)-MaxCut and the traditional MaxCut is that there are two vertices, s and t, that are forbidden from being on the same side of the cut. This technicality does not have any complexity consequences.
To reduce EventAllPairs+ to the (s,t)-MaxCut problem, assume that we are given a graph G = (V,E) equipped with the distance function c. Assume also that we are given a parameter λ. We then construct a new graph H by adding two special vertices s and t into G. We connect s to each v ∈ V with a weight c(s,v) = Σ_{u∈V} d(v,u). We connect t to each v ∈ V with a weight c(t,v) = 2 λ w(v).
Consider an (A, B) cut of H such that s ∈ A and t ∈ B. Let S = A \ {s} and let T = B \ {t}. We argue that the cost of the cut is twice the cost of EventAllPairs+, that is, Q(A,B) = 2 Q^+_AP(S). To see this, notice that each vertex v ∈ S will contribute 2 λ w(v) to the cost, and each vertex v ∈ T will contribute c(s,v) = Σ_{u∈V} d(v,u) to the cost. Additional costs will come from edges (u,v), where v ∈ S and u ∈ T. Combining these costs gives us
Q(A,B) = Σ_{v∈S} 2 λ w(v) + Σ_{v∈T} Σ_{u∈V} d(v,u) + Σ_{v∈T} Σ_{u∈S} d(v,u)
       = 2 λ W(S) + Σ_{v∈T} Σ_{u∈T} d(v,u) + Σ_{v∈T} Σ_{u∈S} d(v,u) + Σ_{v∈T} Σ_{u∈S} d(v,u)
       = 2 λ W(S) + Σ_{v∈V} Σ_{u∈V} d(v,u) − Σ_{v∈S} Σ_{u∈S} d(v,u)
       = 2 λ W(S) + 2 D_AP(V) − 2 D_AP(S)
       = 2 Q^+_AP(S).
Thus, solving (s,t)-MaxCut also solves EventAllPairs+.
Moreover, since the costs of both problems are the same,
up to a scaling factor, any approximation guarantee for
(s,t)-MaxCut yields the same approximation guarantee for
EventAllPairs+.
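A sketch of the construction of H in matrix form follows (our own code and naming, assuming the pairwise distances d and vertex weights w are available as dictionaries, as in the earlier sketches); the returned indices of s and t are meant for the SDP-based rounding sketched further below.

import numpy as np

def build_st_graph(nodes, d, w, lam):
    # Weight matrix of H over nodes + [s, t]:
    #   edge (u, v) with u, v in V : d(u, v)
    #   edge (s, v)                : sum_{u in V} d(v, u)
    #   edge (t, v)                : 2 * lam * w(v)
    n = len(nodes)
    M = np.zeros((n + 2, n + 2))
    s, t = n, n + 1
    for i, u in enumerate(nodes):
        for j, v in enumerate(nodes):
            M[i, j] = d[u][v]
        M[s, i] = M[i, s] = sum(d[u][v] for v in nodes)
        M[t, i] = M[i, t] = 2.0 * lam * w[u]
    return M, s, t

# Example, reusing G and d from the earlier sketches:
nodes = list(G)
M, s, t = build_st_graph(nodes, d, {v: G.nodes[v]["w"] for v in nodes}, lam=2.0)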
Our final step is to solve (s,t)-MaxCut. Following the seminal algorithm of Goemans and Williamson [18], the problem can be formulated as the following integer program:
maximize   Σ_{u,v ∈ V(H)} c(u,v) (1 − x_u x_v) / 2
such that  x_u ∈ {−1, +1}, for all u ∈ V(H),
           x_s x_t = −1.
The only difference with the original MaxCut formulation is the additional constraint x_s x_t = −1. This constraint ensures that the vertices s and t are in different partitions and the resulting cut is a feasible solution for (s,t)-MaxCut.
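The following is a minimal sketch of relaxing this program to a semidefinite program and rounding with a random hyperplane, in the spirit of the procedure described next. It is our own code, not the authors' implementation, and assumes cvxpy with an SDP-capable solver (e.g., SCS) together with the matrix M and the indices s, t from the previous sketch.

import cvxpy as cp
import numpy as np

def st_maxcut_sdp(M, s, t, n_rounds=100, seed=0):
    # Relax x_u in {-1, +1} to unit vectors: X[u, v] stands for x_u * x_v,
    # X is PSD with unit diagonal, and X[s, t] = -1 keeps s and t apart.
    n = M.shape[0]
    X = cp.Variable((n, n), PSD=True)
    constraints = [cp.diag(X) == 1, X[s, t] == -1]
    objective = cp.Maximize(cp.sum(cp.multiply(M, 1 - X)) / 4)
    cp.Problem(objective, constraints).solve()
    # Factor X into vectors (rows of U) and round with random hyperplanes.
    vals, vecs = np.linalg.eigh(X.value)
    U = vecs * np.sqrt(np.clip(vals, 0.0, None))
    rng = np.random.default_rng(seed)
    best_val, best_cut = -np.inf, None
    for _ in range(n_rounds):
        signs = np.sign(U @ rng.standard_normal(n))
        signs[signs == 0] = 1.0
        if signs[s] == signs[t]:
            continue                     # numerically degenerate draw
        if signs[s] < 0:
            signs = -signs               # convention: s is on the +1 side
        val = np.sum(M * (1 - np.outer(signs, signs))) / 4
        if val > best_val:
            best_val, best_cut = val, signs
    return best_val, best_cut

Under the reduction above, the vertices v (other than s) with best_cut[v] = +1 form the set S, and best_val, the weight of the rounded cut, equals 2 Q^+_AP(S).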
The algorithm of Goemans and Williamson proceeds by making a semidefinite relaxation, and then rounds the solution based on a random projection. Fortunately, the additional constraint x_s x_t = −1 can be easily added to
