Learning from Labeled and Unlabeled Data on a Directed Graph
Dengyong Zhou dengyong.zhou@tuebingen.mpg.de
Max Planck Institute for Biological Cybernetics, Spemannstr. 38, 72076 Tübingen, Germany
Jiayuan Huang j9huang@cs.uwaterloo.ca
School of Computer Science, University of Waterloo, Waterloo ON, N2L 3G1, Canada
Bernhard Schölkopf bernhard.schoelkopf@tuebingen.mpg.de
Max Planck Institute for Biological Cybernetics, Spemannstr. 38, 72076 Tübingen, Germany
Abstract

We propose a general framework for learning from labeled and unlabeled data on a directed graph in which the structure of the graph, including the directionality of the edges, is considered. The time complexity of the algorithm derived from this framework is nearly linear due to recently developed numerical techniques. In the absence of labeled instances, this framework can be utilized as a spectral clustering method for directed graphs, which generalizes the spectral clustering approach for undirected graphs. We have applied our framework to real-world web classification problems and obtained encouraging results.

1. Introduction

Given a directed graph, the vertices in a subset of the graph are labeled. Our problem is to classify the remaining unlabeled vertices. Typical examples of this kind are web page categorization based on hyperlink structure and document classification based on citation graphs (Fig. 1). The main issue to be resolved is to determine how to effectively exploit the structure of directed graphs.

One may assign a label to an unclassified vertex on the basis of the most common label present among the classified neighbors of the vertex. However we want to exploit the structure of the graph globally rather than locally such that the classification or clustering is consistent over the whole graph.
Such a point of view has been considered previously in the method of Zhou et al. (2005). It is motivated by the framework of hubs and authorities (Kleinberg, 1999), which separates web pages into two categories and uses the following recursive notion: a hub is a web page with links to many good authorities, while an authority is a web page that receives links from many good hubs. In contrast, the approach that we will present is inspired by the ranking algorithm PageRank used by the Google search engine (Page et al., 1998). Different from the framework of hubs and authorities, PageRank is based on a direct recursion as follows: an authoritative web page is one that receives many links from other authoritative web pages. When the underlying graph is undirected, the approach that we will present reduces to the method of Zhou et al. (2004).

There has been a large amount of activity on how to exploit the link structure of the web for ranking web pages, detecting web communities, finding web pages similar to a given web page or web pages of interest to a given geographical region, and other applications. One may refer to (Henzinger, 2001) for a comprehensive survey. Unlike those works, the present work is on how to classify the unclassified vertices of a directed graph in which some vertices have been classified, by globally exploiting the structure of the graph. Classifying a finite set of objects in which some are labeled is called transductive inference (Vapnik, 1998). In the absence of labeled instances, our approach reduces to a spectral clustering method for directed graphs, which generalizes the work of Shi and Malik (2000), which may be the most popular spectral clustering scheme for undirected graphs. We would like to mention that understanding how eigenvectors partition a directed graph has been proposed as one of six algorithmic challenges in web search engines by Henzinger (2003). The framework of probabilistic relational models may also be used to deal with structured data like the web (e.g. Getoor et al. (2002)). In contrast to the spirit of the present work, however, it focuses on modeling the probabilistic distribution over the attributes of the related entities in the model.

Figure 1. The World Wide Web can be thought of as a directed graph, in which the vertices represent web pages, and the directed edges hyperlinks.

The structure of this paper is as follows. We first introduce some basic notions from graph theory and Markov chains in Section 2. The framework for learning from directed graphs is presented in Section 3. In the absence of labeled instances, as shown in Section 4, this framework can be utilized as a spectral clustering approach for directed graphs. In Section 5, we develop discrete analysis for directed graphs, and characterize this framework in terms of discrete analysis. Experimental results on web classification problems are described in Section 6.

2. Preliminaries

A directed graph G = (V, E) consists of a finite set V, together with a subset E ⊆ V × V. The elements of V are the vertices of the graph, and the elements of E are the edges of the graph. An edge of a directed graph is an ordered pair [u, v] where u and v are vertices of the graph. We say that the vertex v is adjacent from the vertex u, and the vertex u is adjacent to the vertex v. Moreover, we say that the edge is incident from the vertex u and incident to the vertex v. When u = v the edge is called a loop. A graph is simple if it has no loop.

A path in a directed graph is a tuple of vertices (v_1, v_2, ..., v_p) with the property that [v_i, v_{i+1}] ∈ E for 1 ≤ i ≤ p − 1. We say that a directed graph is strongly connected when for every pair of vertices u and v there is a path in which v_1 = u and v_p = v. For a strongly connected graph, there is an integer k ≥ 1 and a unique partition V = V_0 ∪ V_1 ∪ ··· ∪ V_{k−1} such that for all 0 ≤ r ≤ k − 1 each edge [u, v] ∈ E with u ∈ V_r has v ∈ V_{r+1}, where V_k = V_0, and k is maximal, that is, there is no other such partition V = V'_0 ∪ ··· ∪ V'_{k'−1} with k' > k. When k = 1, we say that the graph is aperiodic; otherwise we say that the graph is periodic.

A graph is weighted when there is a function w : E → R^+ which associates a positive value w([u, v]) with each edge [u, v] ∈ E. The function w is called a weight function. Typically, we can equip a graph with a canonical weight function defined by w([u, v]) := 1 at each edge [u, v] ∈ E. Given a weighted directed graph and a vertex v of this graph, the in-degree function d^− : V → R^+ and out-degree function d^+ : V → R^+ are respectively defined by d^−(v) := Σ_{u→v} w([u, v]) and d^+(v) := Σ_{u←v} w([v, u]), where u → v denotes the set of vertices adjacent to the vertex v, and u ← v the set of vertices adjacent from the vertex v.

Let H(V) denote the space of functions in which each function f : V → R assigns a real value f(v) to each vertex v. A function in H(V) can be thought of as a column vector in R^{|V|}, where |V| denotes the number of vertices in V. The function space H(V) can then be endowed with the standard inner product in R^{|V|} as ⟨f, g⟩_{H(V)} = Σ_{v∈V} f(v)g(v) for all f, g ∈ H(V). Similarly, define the function space H(E) consisting of the real-valued functions on edges. When the function space of the inner product is clear from the context, we omit the subscript H(V) or H(E).

For a given weighted directed graph, there is a natural random walk on the graph with the transition probability function p : V × V → R^+ defined by p(u, v) = w([u, v])/d^+(u) for all [u, v] ∈ E, and 0 otherwise. The random walk on a strongly connected and aperiodic directed graph has a unique stationary distribution π, i.e. a unique probability distribution satisfying the balance equations π(v) = Σ_{u→v} π(u)p(u, v) for all v ∈ V. Moreover, π(v) > 0 for all v ∈ V.

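As a concrete illustration of these preliminaries (not part of the original paper), the following minimal Python sketch builds the natural random walk on a small strongly connected, aperiodic directed graph and approximates its stationary distribution by iterating the balance equations; the example graph and all variable names are illustrative assumptions.

```python
import numpy as np

# Illustrative weighted adjacency matrix W of a strongly connected, aperiodic
# directed graph: W[u, v] = w([u, v]) > 0 iff [u, v] is an edge.
W = np.array([[0., 1., 1., 0.],
              [0., 0., 1., 1.],
              [1., 0., 0., 1.],
              [1., 1., 0., 0.]])

d_out = W.sum(axis=1)               # out-degrees d^+(u)
P = W / d_out[:, None]              # natural random walk: p(u, v) = w([u, v]) / d^+(u)

pi = np.full(len(W), 1.0 / len(W))  # start from the uniform distribution
for _ in range(10000):              # iterate pi <- pi P until the balance equations hold
    pi_next = pi @ P
    if np.allclose(pi_next, pi, atol=1e-12):
        break
    pi = pi_next

print(pi)                           # unique stationary distribution, pi(v) > 0 for all v
```
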
3. Regularization Framework

Given a directed graph G = (V, E) and a label set Y = {1, −1}, the vertices in a subset S ⊂ V are labeled. The problem is to classify the vertices in the complement of S. The graph G is assumed to be strongly connected and aperiodic. Later we will discuss how to dispose of this assumption.

Assume a classification function f ∈ H(V), which assigns a label sign f(v) to each vertex v ∈ V. On the one hand, the classification function should be as smooth as possible. Specifically, a pair of vertices linked by an edge are likely to have the same label; moreover, vertices lying on a densely linked subgraph are likely to have the same label.
Thus we define a functional

\[
\Omega(f) := \frac{1}{2} \sum_{[u,v] \in E} \pi(u)\, p(u,v) \left( \frac{f(u)}{\sqrt{\pi(u)}} - \frac{f(v)}{\sqrt{\pi(v)}} \right)^2, \qquad (1)
\]
which sums the weighted variation of a function on each edge of the directed graph. On the other hand, the initial label assignment should be changed as little as possible. Let y denote the function in H(V) defined by y(v) = 1 or −1 if vertex v has been labeled as positive or negative respectively, and 0 if it is unlabeled. Then we may consider the optimization problem

\[
\operatorname{argmin}_{f \in H(V)} \left\{ \Omega(f) + \mu \| f - y \|^2 \right\}, \qquad (2)
\]

where µ > 0 is the parameter specifying the tradeoff between the two competitive terms.

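As a concrete illustration (not part of the original paper), the functional in (1) can be evaluated directly once the transition probabilities are arranged in a matrix P and the stationary distribution in a vector pi; the function name and array layout below are assumptions.

```python
import numpy as np

def omega(P, pi, f):
    """Eq. (1): 0.5 * sum over edges [u,v] of pi(u) p(u,v) (f(u)/sqrt(pi(u)) - f(v)/sqrt(pi(v)))^2."""
    g = f / np.sqrt(pi)             # normalized values f(v) / sqrt(pi(v))
    diff = g[:, None] - g[None, :]  # diff[u, v] = g(u) - g(v)
    # Non-edges contribute nothing since p(u, v) = 0 there.
    return 0.5 * float((pi[:, None] * P * diff**2).sum())
```
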
We will provide the motivations for the functional defined by (1). At the end of this section, this functional will be compared with another choice which may seem more natural. The comparison may give us more insight into this functional. In Section 4, it will be shown that this functional may be naturally derived from a combinatorial optimization problem. In Section 5, we will further characterize this functional in terms of discrete analysis on directed graphs.

For an undirected graph G = (V, E), it is well known that the stationary distribution of the natural random walk has the closed-form expression π(v) = d(v)/vol V, where d(v) denotes the degree of the vertex v, and vol V = Σ_{u∈V} d(u). Substituting the closed-form expression into (1), we have

\[
\Omega(f) = \frac{1}{2} \sum_{[u,v] \in E} w([u,v]) \left( \frac{f(u)}{\sqrt{d(u)}} - \frac{f(v)}{\sqrt{d(v)}} \right)^2,
\]

which is the regularizer of the transductive inference algorithm of Zhou et al. (2004) operating on undirected graphs.

For solving the optimization problem (2), we introduce an operator Θ : H(V) → H(V) defined by

\[
(\Theta f)(v) = \frac{1}{2} \left( \sum_{u \to v} \frac{\pi(u)\, p(u,v) f(u)}{\sqrt{\pi(u)\pi(v)}} + \sum_{u \leftarrow v} \frac{\pi(v)\, p(v,u) f(u)}{\sqrt{\pi(v)\pi(u)}} \right). \qquad (3)
\]

Let Π denote the diagonal matrix with Π(v, v) = π(v) for all v ∈ V. Let P denote the transition probability matrix and P^T the transpose of P. Then

\[
\Theta = \frac{\Pi^{1/2} P \Pi^{-1/2} + \Pi^{-1/2} P^T \Pi^{1/2}}{2}. \qquad (4)
\]

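In matrix terms, equation (4) can be assembled directly from P and π. The short sketch below is an illustration rather than the authors' code; it uses dense numpy arrays and assumes π is the (positive) stationary distribution of the random walk, e.g. as computed in the Section 2 sketch.

```python
import numpy as np

def theta_matrix(P, pi):
    """Theta = (Pi^{1/2} P Pi^{-1/2} + Pi^{-1/2} P^T Pi^{1/2}) / 2, cf. eq. (4)."""
    s = np.sqrt(pi)                       # diagonal of Pi^{1/2}
    A = (P * s[:, None]) / s[None, :]     # Pi^{1/2} P Pi^{-1/2}
    return (A + A.T) / 2.0                # symmetric by construction
```
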
Lemma 3.1. Let ∆ = I − Θ, where I denotes the identity operator. Then Ω(f) = ⟨f, ∆f⟩.

Proof. The idea is to use summation by parts, a discrete analogue of the more common integration by parts.

\[
\begin{aligned}
& \sum_{[u,v] \in E} \pi(u)\, p(u,v) \left( \frac{f(u)}{\sqrt{\pi(u)}} - \frac{f(v)}{\sqrt{\pi(v)}} \right)^2 \\
&\quad = \frac{1}{2} \sum_{v \in V} \Bigg\{ \sum_{u \to v} \pi(u)\, p(u,v) \left( \frac{f(u)}{\sqrt{\pi(u)}} - \frac{f(v)}{\sqrt{\pi(v)}} \right)^2 + \sum_{u \leftarrow v} \pi(v)\, p(v,u) \left( \frac{f(v)}{\sqrt{\pi(v)}} - \frac{f(u)}{\sqrt{\pi(u)}} \right)^2 \Bigg\} \\
&\quad = \frac{1}{2} \sum_{v \in V} \Bigg\{ \sum_{u \to v} p(u,v) f^2(u) + \sum_{u \to v} \frac{\pi(u)\, p(u,v)}{\pi(v)} f^2(v) - 2 \sum_{u \to v} \frac{\pi(u)\, p(u,v) f(u) f(v)}{\sqrt{\pi(u)\pi(v)}} \Bigg\} \\
&\qquad + \frac{1}{2} \sum_{v \in V} \Bigg\{ \sum_{u \leftarrow v} p(v,u) f^2(v) + \sum_{u \leftarrow v} \frac{\pi(v)\, p(v,u)}{\pi(u)} f^2(u) - 2 \sum_{u \leftarrow v} \frac{\pi(v)\, p(v,u) f(v) f(u)}{\sqrt{\pi(v)\pi(u)}} \Bigg\}.
\end{aligned}
\]

The first term on the right-hand side may be written

\[
\sum_{[u,v] \in E} p(u,v) f^2(u) = \sum_{u \in V} \sum_{v \leftarrow u} p(u,v) f^2(u) = \sum_{u \in V} \Bigg( \sum_{v \leftarrow u} p(u,v) \Bigg) f^2(u) = \sum_{u \in V} f^2(u) = \sum_{v \in V} f^2(v),
\]

and the second term

\[
\sum_{v \in V} \Bigg( \sum_{u \to v} \frac{\pi(u)\, p(u,v)}{\pi(v)} \Bigg) f^2(v) = \sum_{v \in V} f^2(v).
\]

Similarly, for the fourth and fifth terms, we can show that

\[
\sum_{v \in V} \sum_{u \leftarrow v} p(v,u) f^2(v) = \sum_{v \in V} f^2(v),
\]

and

\[
\sum_{v \in V} \sum_{u \leftarrow v} \frac{\pi(v)\, p(v,u)}{\pi(u)} f^2(u) = \sum_{v \in V} f^2(v),
\]

respectively. Therefore,

\[
\Omega(f) = \sum_{v \in V} \Bigg\{ f^2(v) - \frac{1}{2} \Bigg( \sum_{u \to v} \frac{\pi(u)\, p(u,v) f(u) f(v)}{\sqrt{\pi(u)\pi(v)}} + \sum_{u \leftarrow v} \frac{\pi(v)\, p(v,u) f(v) f(u)}{\sqrt{\pi(v)\pi(u)}} \Bigg) \Bigg\},
\]

which completes the proof.

Lemma 3.2. The eigenvalues of the operator Θ are in [−1, 1], and the eigenvector with the eigenvalue equal to 1 is √π.

Proof. It is easy to see that Θ is similar to the operator Ψ : H(V) → H(V) defined by Ψ = (P + Π^{−1} P^T Π)/2. Hence Θ and Ψ have the same set of eigenvalues. Assume that f is the eigenvector of Ψ with eigenvalue λ. Choose a vertex v such that |f(v)| = max_{u∈V} |f(u)|. Then we can show that |λ| ≤ 1 by

\[
|\lambda|\, |f(v)| = \Bigg| \sum_{u \in V} \Psi(v,u) f(u) \Bigg| \le \sum_{u \in V} \Psi(v,u)\, |f(v)| = \frac{|f(v)|}{2} \Bigg( \sum_{u \leftarrow v} p(v,u) + \sum_{u \to v} \frac{\pi(u)\, p(u,v)}{\pi(v)} \Bigg) = |f(v)|.
\]

In addition, we can show that Θ√π = √π by

\[
\begin{aligned}
& \frac{1}{2} \Bigg( \sum_{u \to v} \frac{\pi(u)\, p(u,v) \sqrt{\pi(u)}}{\sqrt{\pi(u)\pi(v)}} + \sum_{u \leftarrow v} \frac{\pi(v)\, p(v,u) \sqrt{\pi(u)}}{\sqrt{\pi(v)\pi(u)}} \Bigg)
 = \frac{1}{2} \Bigg( \sum_{u \to v} \frac{\pi(u)\, p(u,v)}{\sqrt{\pi(v)}} + \sum_{u \leftarrow v} \frac{\pi(v)\, p(v,u)}{\sqrt{\pi(v)}} \Bigg) \\
&\quad = \frac{1}{2} \Bigg( \frac{1}{\sqrt{\pi(v)}} \sum_{u \to v} \pi(u)\, p(u,v) + \sqrt{\pi(v)} \sum_{u \leftarrow v} p(v,u) \Bigg) = \sqrt{\pi(v)}.
\end{aligned}
\]

Theorem 3.3. The solution of (2) is f* = (1 − α)(I − αΘ)^{−1} y, where α = 1/(1 + µ).

Proof. From Lemma 3.1, differentiating (2) with respect to the function f, we have (I − Θ)f* + µ(f* − y) = 0. Define α = 1/(1 + µ). This system may be written as (I − αΘ)f* = (1 − α)y. From Lemma 3.2, we easily know that (I − αΘ) is positive definite and thus invertible. This completes the proof.

At the beginning of this section, we assumed the graph to be strongly connected and aperiodic such that the natural random walk over the graph converges to a unique and positive stationary distribution. Obviously this assumption cannot be guaranteed for a general directed graph. To remedy this problem, we may introduce the so-called teleporting random walk (Page et al., 1998) as the replacement of the natural one. Given that we are currently at vertex u with d^+(u) > 0, the next step of this random walk proceeds as follows: (1) with probability 1 − η jump to a vertex chosen uniformly at random over the whole vertex set except u; and (2) with probability η w([u, v])/d^+(u) jump to a vertex v adjacent from u. If we are at vertex u with d^+(u) = 0, just jump to a vertex chosen uniformly at random over the whole vertex set except u.

Algorithm. Given a directed graph G = (V, E) and a label set Y = {1, −1}, the vertices in a subset S ⊂ V are labeled. Then the remaining unlabeled vertices may be classified as follows:

1. Define a random walk over G with a transition probability matrix P such that it has a unique stationary distribution, such as the teleporting random walk.

2. Let Π denote the diagonal matrix with its diagonal elements being the stationary distribution of the random walk. Compute the matrix Θ = (Π^{1/2} P Π^{−1/2} + Π^{−1/2} P^T Π^{1/2})/2.

3. Define a function y on V with y(v) = 1 or −1 if vertex v is labeled as 1 or −1, and 0 if v is unlabeled. Compute the function f = (I − αΘ)^{−1} y, where α is a parameter in ]0, 1[, and classify each unlabeled vertex v as sign f(v).

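The following is a minimal end-to-end sketch of the three steps above in Python (an illustration under stated assumptions, not the authors' code): it uses the teleporting random walk with an assumed η = 0.99, obtains the stationary distribution by power iteration, and solves the dense linear system directly; W, y, the parameter values, and all function names are hypothetical.

```python
import numpy as np

def teleporting_random_walk(W, eta=0.99):
    """Step 1: transition matrix of the teleporting random walk (assumes a simple graph)."""
    n = len(W)
    P = np.zeros((n, n))
    d_out = W.sum(axis=1)
    for u in range(n):
        if d_out[u] > 0:
            P[u] = eta * W[u] / d_out[u]        # follow an out-edge with probability eta
            P[u] += (1.0 - eta) / (n - 1)       # otherwise teleport uniformly ...
            P[u, u] -= (1.0 - eta) / (n - 1)    # ... to any vertex except u itself
        else:
            P[u] = 1.0 / (n - 1)                # no out-edges: always teleport
            P[u, u] = 0.0
    return P

def stationary_distribution(P, iters=10000, tol=1e-12):
    pi = np.full(len(P), 1.0 / len(P))
    for _ in range(iters):
        nxt = pi @ P
        if np.allclose(nxt, pi, atol=tol):
            break
        pi = nxt
    return pi

def classify(W, y, eta=0.99, alpha=0.9):
    """Steps 2-3: build Theta and classify by the sign of f = (I - alpha*Theta)^{-1} y."""
    P = teleporting_random_walk(W, eta)
    pi = stationary_distribution(P)
    s = np.sqrt(pi)
    A = (P * s[:, None]) / s[None, :]           # Pi^{1/2} P Pi^{-1/2}
    Theta = (A + A.T) / 2.0
    # The positive factor (1 - alpha) in Theorem 3.3 does not change the sign of f.
    f = np.linalg.solve(np.eye(len(W)) - alpha * Theta, y)
    return np.sign(f)
```

For instance, with the example graph W from the Section 2 sketch and y = np.array([1., 0., 0., -1.]), classify(W, y) assigns sign f(v) to every vertex.
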
It is worth mentioning that the approach of Zhou et al. (2005) can also be derived from this algorithmic framework by defining a two-step random walk. Assume a directed graph G = (V, E) with d^+(v) > 0 and d^−(v) > 0 for all v ∈ V. Given that we are currently at vertex u, the next step of this random walk proceeds as follows: first jump backward to a vertex h adjacent to u with probability p^−(u, h) = w([h, u])/d^−(u); then jump forward to a vertex v adjacent from h with probability p^+(h, v) = w([h, v])/d^+(h). Thus the transition probability from u to v is p(u, v) = Σ_{h∈V} p^−(u, h) p^+(h, v). It is easy to show that the stationary distribution of this random walk is π(v) = d^−(v)/Σ_{u∈V} d^−(u) for all v ∈ V. Substituting the quantities of p(u, v) and π(v) into (1), we then recover one of the two regularizers proposed by Zhou et al. (2005). The other one can also be recovered simply by reversing this two-step random walk.

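The two-step walk just described is also easy to realize in matrix form. The sketch below is illustrative (not from the paper) and assumes d^+(v) > 0 and d^−(v) > 0 for all vertices, as in the text; the stationary distribution is then proportional to the in-degrees, as stated above.

```python
import numpy as np

def two_step_transition(W):
    """p(u, v) = sum_h p^-(u, h) p^+(h, v), with p^-(u, h) = w([h, u])/d^-(u) and p^+(h, v) = w([h, v])/d^+(h)."""
    d_in = W.sum(axis=0)              # in-degrees d^-(v)
    d_out = W.sum(axis=1)             # out-degrees d^+(h)
    P_minus = (W / d_in[None, :]).T   # P_minus[u, h] = w([h, u]) / d^-(u), the backward step
    P_plus = W / d_out[:, None]       # P_plus[h, v]  = w([h, v]) / d^+(h), the forward step
    return P_minus @ P_plus

# pi = d_in / d_in.sum() satisfies pi @ two_step_transition(W) == pi up to rounding.
```
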
Now we discuss implementation issues. The closed form solution shown in Theorem 3.3 involves a matrix inverse. Given an n × n invertible matrix A, the time required to compute the inverse A^{−1} is generally O(n³), and the representation of the inverse requires Ω(n²) space. Recent progress in numerical analysis (Spielman & Teng, 2003), however, shows that, for an n × n symmetric positive semi-definite, diagonally dominant matrix A with m non-zero entries and an n-vector b, we can obtain a vector x̃ within relative distance ε of the solution to Ax = b in time O(m^{1.31} log(n κ_f(A)/ε)^{O(1)}), where κ_f(A) is the log of the ratio of the largest to smallest non-zero eigenvalue of A. It can be shown that our approach can benefit from this numerical technique. From Theorem 3.3,

\[
\left( I - \alpha\, \frac{\Pi^{1/2} P \Pi^{-1/2} + \Pi^{-1/2} P^T \Pi^{1/2}}{2} \right) f^* = (1 - \alpha)\, y,
\]

which may be transformed into

\[
\left( \Pi - \alpha\, \frac{\Pi P + P^T \Pi}{2} \right) \left( \Pi^{-1/2} f^* \right) = (1 - \alpha)\, \Pi^{1/2} y.
\]

Let A = Π − α(ΠP + P^TΠ)/2. It is easy to verify that A is diagonally dominant.

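As a sketch of this inverse-free route (an illustration, not the solver cited above): since A is symmetric and diagonally dominant with positive diagonal, the transformed system can be handed to an off-the-shelf conjugate-gradient solver; the Spielman–Teng algorithm referenced in the text has stronger guarantees, and CG is used here only for concreteness. P, pi, y, and the function name are assumptions.

```python
import numpy as np
from scipy.sparse.linalg import cg

def classify_via_linear_system(P, pi, y, alpha=0.9):
    """Solve (Pi - alpha*(Pi P + P^T Pi)/2) g = (1 - alpha) Pi^{1/2} y, then f = Pi^{1/2} g."""
    Pi = np.diag(pi)
    A = Pi - alpha * (Pi @ P + P.T @ Pi) / 2.0   # symmetric, diagonally dominant
    b = (1.0 - alpha) * np.sqrt(pi) * y
    g, info = cg(A, b)                           # conjugate gradients; info == 0 on convergence
    f = np.sqrt(pi) * g                          # undo the change of variables g = Pi^{-1/2} f
    return np.sign(f)
```
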
To better understand this regularization framework, we may compare it with an alternative approach in which the regularizer is defined by

\[
\Omega(f) = \sum_{[u,v] \in E} w([u,v]) \left( \frac{f(u)}{\sqrt{d^+(u)}} - \frac{f(v)}{\sqrt{d^-(v)}} \right)^2. \qquad (5)
\]

A similar closed form solution can be obtained from the corresponding optimization problem. Clearly, for undirected graphs, this functional also reduces to that in (Zhou et al., 2004). At first glance, this functional may look natural, but in the later experiments we will show that the algorithm based on this functional does not work as well as the previous one. This is because the directionality is only slightly taken into account by this functional via the degree normalization, such that much valuable information for classification conveyed by the directionality is ignored by the corresponding algorithm. Once we remove the degree normalization from this functional, the resulting functional is totally insensitive to the directionality.

4. Directed Spectral Clustering

In the absence of labeled instances, this framework can be utilized in an unsupervised setting as a spectral clustering method for directed graphs. We first define a combinatorial partition criterion, which generalizes the normalized cut criterion for undirected graphs (Shi & Malik, 2000). Then relaxing the combinatorial optimization problem into a real-valued one leads to the functional defined in Section 3.

Given a subset S of the vertices of a directed graph G, define the volume of S by vol S := Σ_{v∈S} π(v). Clearly, vol S is the probability with which the random walk occupies some vertex in S, and consequently vol V = 1. Let S^c denote the complement of S (Fig. 2). The out-boundary ∂S of S is defined by ∂S := {[u, v] | u ∈ S, v ∈ S^c}. The value vol ∂S := Σ_{[u,v]∈∂S} π(u)p(u, v) is called the volume of ∂S. Note that vol ∂S is the probability with which one sees a jump from S to S^c.

Figure 2. A subset S and its complement S^c. Note that there is only one edge in the out-boundary of S.

Generalizing the normalized cut criterion for undirected graphs is based on a key observation stated by

Proposition 4.1. vol ∂S = vol ∂S^c.

Proof. It immediately follows from the fact that the probability with which the random walk leaves a vertex equals the probability with which the random walk arrives at this vertex. Formally, for each vertex v in V, it is easy to see that

\[
\sum_{u \to v} \pi(u)\, p(u,v) - \sum_{u \leftarrow v} \pi(v)\, p(v,u) = 0.
\]

Summing the above equation over the vertices of S (see also Fig. 2), we then have

\[
\sum_{v \in S} \Bigg( \sum_{u \to v} \pi(u)\, p(u,v) - \sum_{u \leftarrow v} \pi(v)\, p(v,u) \Bigg) = \sum_{[u,v] \in \partial S^c} \pi(u)\, p(u,v) - \sum_{[u,v] \in \partial S} \pi(u)\, p(u,v) = 0,
\]

which completes the proof.
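Proposition 4.1 is also easy to check numerically. The snippet below is illustrative (reusing P and pi from the earlier sketches, with an assumed vertex subset S given as a boolean mask): it computes vol ∂S by summing π(u)p(u, v) over pairs leaving S, and the same value is obtained when S is replaced by its complement.

```python
import numpy as np

def boundary_volume(P, pi, S):
    """vol dS: sum of pi(u) * p(u, v) over pairs with u in S and v in the complement of S."""
    S = np.asarray(S, dtype=bool)
    return float((pi[S, None] * P[np.ix_(S, ~S)]).sum())

# With the 4-vertex example graph and S = {0, 1}, the two values agree up to rounding:
# boundary_volume(P, pi, [True, True, False, False])
# boundary_volume(P, pi, [False, False, True, True])
```
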
From Proposition 4.1, we may partition the vertex set of a directed graph into two nonempty parts S and S^c by minimizing

\[
\mathrm{Ncut}(S) = \mathrm{vol}\,\partial S \left( \frac{1}{\mathrm{vol}\, S} + \frac{1}{\mathrm{vol}\, S^c} \right), \qquad (6)
\]
which is a directed generalization of the normalized cut criterion for undirected graphs. Clearly, the ratio of vol ∂S to vol S is the probability with which the random walk leaves S in the next step, given that it is currently in S.

References

Kleinberg, J. M. (1999). Authoritative sources in a hyperlinked environment. Journal of the ACM, 46(5), 604-632.

Page, L., Brin, S., Motwani, R., & Winograd, T. (1998). The PageRank citation ranking: Bringing order to the web. Technical report, Stanford Digital Library Technologies Project.

Shi, J., & Malik, J. (2000). Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8), 888-905.

Vapnik, V. (1998). Statistical learning theory. New York: Wiley.