
Maintaining bipartite matchings in the presence of failures

Edwin H.-M. Sha, Kenneth Steiglitz
01 Aug 1993, Vol. 23, Iss. 5, pp. 459-471
TLDR
The algorithm is deadlock-free and, with k failures, maintains at least M – k matching pairs during the reconfiguration process, where M is the size of the original maximum matching.
Abstract
We present an on-line distributed reconfiguration algorithm for finding a new maximum matching incrementally after some nodes have failed. Our algorithm is deadlock-free and, with k failures, maintains at least M – k matching pairs during the reconfiguration process, where M is the size of the original maximum matching. The algorithm tolerates failures that occur during reconfiguration. The worst-case reconfiguration time is O(k min(|A|, |B|)) after k failures, where A and B are the node sets, but simulations show that the average-case reconfiguration time is much better. The algorithm is also simple enough to be implemented in hardware. © 1993 by John Wiley & Sons, Inc.


Maintaining Bipartite Matchings in the Presence of Failures*
Edwin Hsing-Mean Sha
Department of Computer Science & Engineering, University of Notre Dame, Notre Dame, Indiana 46556
Kenneth Steiglitz
Department of Computer Science, Princeton University, Princeton, New Jersey 08544
We present an on-line distributed reconfiguration algorithm for finding a new maximum matching incrementally after some nodes have failed. Our algorithm is deadlock-free and, with k failures, maintains at least M - k matching pairs during the reconfiguration process, where M is the size of the original maximum matching. The algorithm tolerates failures that occur during reconfiguration. The worst-case reconfiguration time is O(k min(|A|, |B|)) after k failures, where A and B are the node sets, but simulations show that the average-case reconfiguration time is much better. The algorithm is also simple enough to be implemented in hardware. © 1993 by John Wiley & Sons, Inc.
1. INTRODUCTION
Imagine that there are n persons in Village A and m in Village B. Two persons from different villages can be matched to become a couple, and at any time, each person can be matched to at most one other person. Initially, the matching is maximum. Sometimes, however, people decide to be alone. Without loss of generality, assume that some in B change their minds. Let G = (A, B, E) be a bipartite graph with |A| = n and |B| = m. An edge between two nodes means that they are allowed to become a couple. After a person b has changed his or her mind, b's original partner in A must find another available partner in B, if possible. The process of finding a new matching to obtain the maximum number of pairs is called reconfiguration. Unfortunately, there is no central agency to perform the reconfiguration process, so this process must be done in a distributed and parallel way. It is also desirable that, during the reconfiguration process, as many matched pairs be maintained as possible and that failures during the process be tolerated. Ideally, there should always be at least M - k matching pairs after k persons have changed their minds, where M is the original number of matching pairs. The number of matching pairs should monotonically increase in the reconfiguration process. Therefore, if no new persons change their minds, the reconfiguration process will finally regenerate a new maximum matching, if one is possible.
*This work was supported in part by NSF Grant MIP-8912100 and U.S. Army Research Office-Durham Grant DAAL03-89-K-0074.
One motivation for this problem is that such an algorithm can be applied to any fault-tolerant system that involves bipartite matching. For example, Kuo and Fuchs [5] showed that many problems of spare allocation in VLSI arrays can be modeled as bipartite matching. Based on our bipartite matching algorithm, we can have a distributed reconfiguration mechanism to replace faulty nodes by spare nodes in a redundant array. In [9, 10], highly reliable structures with the asymptotically optimal number of nodes and edges for one-dimensional and treelike array architectures were given. They used bipartite matchings between levels in layered graphs and so these are particularly well suited for the run-time-tolerant algorithm described in this paper.
NETWORKS, Vol. 23 (1993) 459-471. © 1993 by John Wiley & Sons, Inc. CCC 0028-3045/93/050459-13
The general matching problem has been extensively studied. For maximum matching in bipartite graphs, the algorithm of Hopcroft and Karp [3] is the fastest known, and the algorithm by Micali and Vazirani [6] is the most efficient one for finding matchings in general graphs. More recently, an algorithm for on-line bipartite matching was presented [4]. Some papers [8, 11] also gave distributed algorithms for maximum matching in general graphs.
Our problem is different from the usual matching problem, which starts with an empty matching. We assume that we start with a maximum matching, and after some nodes fail, we would like to have a simple, efficient, and distributed way to find a new maximum matching. Further, the algorithm should start to reconfigure the system as soon as failures occur, even though new failures may occur during the reconfiguration process. We say a reconfiguration algorithm is on-line if it can start to reconfigure the system immediately after a failure occurs and can endure new failures during reconfiguration. This is an especially desirable property for run-time fault tolerance, since the system need not stop to do a reconfiguration process.
We will not be concerned so much with the number of messages that processing elements (PEs) need to send to achieve a new matching, such as is done in the matching algorithms in [8, 11], which, in any event, are not designed to operate in the presence of faults. Rather, we want to minimize the effects of failures during reconfiguration. Our algorithm does tolerate faults during operation and ensures that after k failures there are always at least M - k matching pairs, where M is the original number of matching pairs. If there are no further failures, the size of the matching grows monotonically until it becomes maximum. The algorithm is simple enough to be implemented in hardware. The overall reconfiguration time is O(k min(|A|, |B|)) after k failures. The simulation results show that the average-case reconfiguration time is much better.
2. THE BASIC IDEAS OF OUR ALGORITHM
We first explain our model: An array architecture is represented by a graph G; each node of G is regarded as a processor, and each edge as a connection between two processors. If nodes have failed, the failed nodes and all the edges incident to them will be removed. If later a failed node is repaired, this node with the corresponding edges will be added to the graph. We assume that if two nodes have not failed, and are connected, they can communicate, i.e., we do not model failures of communication.
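The model above can be sketched in a few lines of Python. This is only an illustrative data structure, not something given in the paper: the class name `BipartiteArray` and its methods are hypothetical, and node names are assumed to be unique across A and B. Failed nodes are kept in a set so that a repair restores exactly the edges that were hidden by the failure.

```python
class BipartiteArray:
    """Bipartite graph G = (A, B, E) whose nodes can fail and be repaired."""

    def __init__(self, a_nodes, b_nodes, edges):
        self.a_nodes = set(a_nodes)
        self.b_nodes = set(b_nodes)
        self.edges = set(edges)      # physical wiring (a, b); never changes
        self.failed = set()          # nodes currently out of service

    def fail(self, node):
        # A failed node and all edges incident to it drop out of the graph.
        self.failed.add(node)

    def repair(self, node):
        # A repaired node comes back together with its original edges.
        self.failed.discard(node)

    def is_good(self, node):
        return node not in self.failed

    def live_edges(self):
        # Only edges between two good nodes can carry messages.
        return {(a, b) for a, b in self.edges
                if a not in self.failed and b not in self.failed}


g = BipartiteArray(["a1", "a2"], ["b1", "b2"],
                   [("a1", "b1"), ("a1", "b2"), ("a2", "b2")])
g.fail("b2")
after_failure = g.live_edges()
g.repair("b2")
after_repair = g.live_edges()
```

Keeping the wiring immutable and tracking failures separately is one simple way to make repair symmetric with failure.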
Definition 2.1. Given a bipartite graph G = (A, B, E), a matching M is a subset of the edges such that no two edges in M share the same end node.
Definition 2.2. If an edge (a, b) is in M, we say that a is b's matching node in M or vice versa. This pair (a, b) is also called a matching pair or a matching edge. If no edge in M is connected to node x, we say x is a free node.
Definition 2.3. A matching is maximum if no other matching of G contains more edges. Given a matching M, an alternating path P is a path that does not contain two consecutive edges that are not in M. If an alternating path P starts and ends at free nodes, it is an augmenting path.
It is well known that M is not a maximum matching if and only if there is an augmenting path. Our algorithm searches for augmenting paths to obtain the maximum matching of G.
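To make the augmenting-path idea concrete, here is a minimal sequential sketch of the textbook depth-first augmentation step. It is a centralized illustration only, not the distributed algorithm of this paper; the function name `augment_from` and the dictionary layout are assumptions made for the example.

```python
def augment_from(a, adj, match_a, match_b, reached):
    """Try to find an augmenting path starting at the free A-node `a`.

    adj maps each A-node to its B-neighbours; match_a and match_b hold the
    current matching in both directions. On success the matching is flipped
    along the path (its size grows by one) and True is returned.
    """
    for b in adj[a]:
        if b in reached:
            continue                 # each B-node is explored at most once
        reached.add(b)
        owner = match_b.get(b)
        # Either b is free, or b's current partner can be re-matched elsewhere.
        if owner is None or augment_from(owner, adj, match_a, match_b, reached):
            match_a[a] = b
            match_b[b] = a
            return True
    return False


# a1 is free; a2 holds b1 but could move to the free node b2.
adj = {"a1": ["b1"], "a2": ["b1", "b2"]}
match_a, match_b = {"a2": "b1"}, {"b1": "a2"}
found = augment_from("a1", adj, match_a, match_b, set())
```

The path a1, b1, a2, b2 starts and ends at free nodes, so flipping it raises the matching size from one to two.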
After some nodes have failed, the search for augmenting paths to find free nodes will traverse the graph. Basically, our algorithm performs a depth-first search for finding free nodes. In this section, we describe our algorithm informally. A formal description of our algorithm is given in the next section. Let G = (A, B, E) be a bipartite graph. We think of sets A and B as two levels of nodes in a bipartite graph. Initially, we assume that a maximum matching already exists. An initial maximum matching can be obtained from our algorithm in the following way: Initially, every node in A regards its matching node as failed and starts to run the bipartite matching algorithm. We assume that a failure of a matched node can be detected by its current matching node.
Nodes in both A and B can fail. For failures in B (resp., A), nodes in A (resp., B) will search for free nodes. We have two versions of our algorithm: Version A is for failures in B and Version B is for failures in A. These versions are the same except A and B are interchanged. However, if our algorithm is to be used as a reconfiguration algorithm for the layered fault-tolerant structure in [10], we only need Version A because each layer can be regarded as level A.
Let a be a matched node in A and b be a's matching node in B. If node b fails, Version A of our reconfiguration starts at node a. Node a becomes what we call a supernode because it has the privilege of choosing a good node to be its matching node. If a node in A fails, the matching node of this failed node will become a supernode to initiate Version B of our reconfiguration algorithm. These two versions of our algorithm are performed independently to obtain a maximum matching. In this section, without loss of generality, we only explain Version A. However, we need to show that the failures in A do not affect the correctness of Version A.
Fig. 1. The figure for passing supernodes. Legend: S = supernode; line = current matching edge.
Here, we explain the actions that a supernode a performs. First, supernode a tries to find a free node in B that is connected to a. If this node is available, it becomes a's matching node. Otherwise, supernode a will try to steal a node that is already matched to another node in A. For example, in Figure 1, after node b fails, a becomes a supernode. Since there is no free node connected to a, a will steal node b' that was matched to a'.
Definition 2.4. If a supernode x chooses a node y that has been matched to x' to be its new matching node, we say that x steals y from x'.
After b' has been stolen by a, node a' will become a supernode because a' does not have a matching node. We can think of this process as the token of supernode traversing the path from node a to node a' [Fig. 1(b)].
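The token passing described above can be sketched as a single sequential step. This is a simplified illustration under stated assumptions: the function `supernode_step`, the dictionary `cur`, and the extra free node `b_free` are hypothetical names introduced for the example, backtracking is left out, and all listed neighbours are assumed to be good.

```python
def supernode_step(s, adj, cur, reached):
    """One move of supernode s (a node in A without a matching node).

    adj[s] lists the good B-neighbours of s, cur maps every matched node to
    its current partner, and reached is the set of B-nodes marked reached.
    Returns ("matched", None), ("stolen", victim) or ("stuck", s).
    """
    # 1. Prefer a free neighbour: the search ends here.
    for b in adj[s]:
        if b not in cur:
            cur[s], cur[b] = b, s
            return ("matched", None)
    # 2. Otherwise steal an unreached neighbour from its current partner.
    for b in adj[s]:
        if b not in reached:
            victim = cur.pop(b)
            reached.add(b)
            del cur[victim]          # the victim loses its matching node
            cur[s], cur[b] = b, s
            return ("stolen", victim)
    return ("stuck", s)              # the caller would have to backtrack


# Situation in the spirit of Fig. 1: a's partner has failed, b_prime is
# matched to a_prime, and a_prime also sees the free node b_free.
adj = {"a": ["b_prime"], "a_prime": ["b_prime", "b_free"]}
cur = {"a_prime": "b_prime", "b_prime": "a_prime"}
reached = set()
first = supernode_step("a", adj, cur, reached)
second = supernode_step(first[1], adj, cur, reached)
```

Note that a steal swaps one matched pair for another, so the number of matched pairs does not drop while the token moves.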
A root node is a node that initiates a search process for finding a new matching after its matching node becomes faulty. The root node is the first supernode in a search process. There may be several searches going on simultaneously, each having a root node.
Our algorithm does a depth-first search (DFS) for finding augmenting paths [7]. The process of searching can be represented as a search tree called an alternating tree. A typical alternating tree is shown in Figure 2. Each root node is the root of an alternating tree, and at any time, a supernode is associated with the node that is performing DFS in a tree. There will be precisely one supernode in each alternating tree. A new matching is found when a supernode acquires a free node. To prevent cycles in searching, we can simply store a bit in each node b to indicate if it has been reached. We say that this node is marked reached. When a supernode finds a free node, this supernode sends messages to unmark the corresponding nodes, as explained later in this section.
Fig. 2. An example of alternating tree. Legend: line = current matching edge.
If a supernode at a particular point cannot find an adjacent free node, and finds that all the adjacent nodes are marked reached (either by this tree search or some other), it backtracks immediately. Under backtracking, some supernodes may backtrack to root nodes, and these supernodes remain there in an idle state. Thus, we need a way to reactivate them when some other supernodes find free nodes. After a supernode has found a free node, this supernode sends a message, called UNMARK-BACKTRACK, recursively to unmark all the nodes that have been passed through by a backtracking supernode along an alternating path. For example, in Figure 3 there are two idle supernodes, S1 and S2. After S3 has found a free node, S3 will send the message UNMARK-BACKTRACK to wake up the idle supernodes S1 and S2.
Versions A and B of our algorithm are performed alternately. In each version, there are three phases as shown in Figure 4. Every node performs the same phase in the same version. Therefore, we need to synchronize all the nodes to perform the same version and the same phase. One possible implementation is to use common wires connected to every node. Because we consider our algorithm to be performed in tightly coupled processor arrays, a few wires connected to every node (PE) are a practical assumption. We can assume there are three signal wires connected to every node (PE). Wire wCLOCK is the clock wire to synchronize the phases of a clock. Wires wA and wB are to indicate which version is running. When wA (resp., wB) is high, Version A (resp., B) is running. If we do not want to use these common wires, we can use a more complicated message-passing protocol for synchronization [1].
Fig. 3. An example of breaking idleness. (S1 and S2 are idle; S3 has found an unmatched node b.)
Fig. 4. A running sequence of our algorithm; each version has three phases. (Sequence: Version A, Version B, Version A, Version B.)
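The alternation of versions and phases can be pictured as a global schedule that every node follows. The sketch below is only an illustration: the names `schedule` and `wires` are hypothetical, and it assumes one clock step per phase, which the text does not state explicitly.

```python
from itertools import cycle, islice


def schedule():
    """Yield (version, phase) in the order A1, A2, A3, B1, B2, B3, A1, ..."""
    for version in cycle("AB"):
        for phase in (1, 2, 3):
            yield version, phase


def wires(version):
    """Levels of the two version wires as a pair (wA, wB)."""
    return (version == "A", version == "B")


first_eight = list(islice(schedule(), 8))
```

Because every node reads the same wires, no node needs to know anything about its neighbours to agree on the current version and phase.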
3. OUR RECONFIGURATION ALGORITHM
In this section, we explain our algorithm. Since Versions A and B are essentially the same, we only present Version A in this section. First, we define some terms for Version A of our algorithm:
Definition 3.1. The node old(n) is n's original matching node before the reconfiguration, and the node cur(n) is n's current matching node during reconfiguration.
Initially, for every node n, we set cur(n) = old(n).
In our algorithm, there are several attributes for nodes in A and B, which are used and set during the operation of the algorithm. First, any node is good if it has not malfunctioned. The attributes of a node b in B are summarized as follows: A node b ∈ B is free if it has no matching node under the current matching, and reached if it has been reached by some DFS in our algorithm. When a node is not reached, we say that this node is unreached.
The attributes of a node a ∈ A that is reached by some search process can be marked by message passing as follows: Node a ∈ A is super if cur(a) is not good, or it is unmatched because its matching node cur(a) has been stolen by some other node; backtracked if a search that reaches node a finishes searching node a's subtree and must backtrack to a's parent.
We call a node super if and only if it has a supernode token. This token can be transferred to other nodes along the DFS traversed in our algorithm. Messages need to be passed in our algorithm for changing the current states of nodes a ∈ A. There are three messages that can be sent: SUPERNODE, UNMARK-BACKTRACK, and CHANGE-OLD-MATCHING. We discuss these three messages one by one as follows:
1. The message SUPERNODE represents the supernode token. If node a receives the message SUPERNODE, a becomes the supernode. There are two situations when a node a sends this message. The first situation is when node a steals some other's matching node. The second situation will be explained later in the section (see Fig. 5).
2. After a supernode s has found a free and good node in B, s will send the message UNMARK-BACKTRACK to all the backtracked nodes that are adjacent to node old(s). This message is used to set some nodes in B as not reached so that some idle supernodes can start to search for free nodes. When a node a ∈ A receives the message UNMARK-BACKTRACK, a will set node old(a) as unreached. Then, after a sends UNMARK-BACKTRACK to old(a), old(a) will immediately send this message to all the backtracked nodes that are adjacent to old(a).
3. When a supernode finds a free and good node in B, this supernode will send the message CHANGE-OLD-MATCHING to the nodes in the alternating path so that their old matching nodes are set to be the current matching nodes. When a node a gets the message CHANGE-OLD-MATCHING, node a will mark the node old(a) as unreached and ask old(a) to send UNMARK-BACKTRACK to all the backtracked nodes that are adjacent to old(a).
Fig. 5. A failure in A that is in an active alternating path.
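The recursive spread of UNMARK-BACKTRACK in item 2 can be sketched sequentially as a breadth-first propagation. This is an interpretation under stated assumptions, not the paper's implementation: the function `unmark_backtrack` and its arguments are hypothetical names, and the `delivered` set is purely a device of this sequential sketch to make the propagation terminate (the text does not say how a node treats a repeated message).

```python
from collections import deque


def unmark_backtrack(start_b, adj_b, old, backtracked, reached):
    """Propagate UNMARK-BACKTRACK starting from the B-node start_b.

    adj_b[b]    -> A-nodes adjacent to b
    old[a]      -> old matching node of a (a node in B)
    backtracked -> set of A-nodes marked backtracked
    reached     -> set of B-nodes marked reached (modified in place)
    Returns the set of A-nodes that received the message.
    """
    delivered = set()                # assumption: each node handles it once
    queue = deque([start_b])
    while queue:
        b = queue.popleft()
        for a in adj_b.get(b, ()):
            if a in backtracked and a not in delivered:
                delivered.add(a)
                if a in old:
                    reached.discard(old[a])   # a sets old(a) as unreached
                    queue.append(old[a])      # old(a) forwards the message
    return delivered


adj_b = {"b0": ["a1"], "b1": ["a1", "a2"], "b2": ["a2"]}
old = {"a1": "b1", "a2": "b2"}
backtracked = {"a1", "a2"}
reached = {"b1", "b2", "b3"}
woken = unmark_backtrack("b0", adj_b, old, backtracked, reached)
```

Here `start_b` plays the role of old(s) for the supernode s that found a free node; b3 stays marked because no backtracked node points to it.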
Our algorithm runs in parallel at all the nodes. Initially, there is a bipartite maximum matching. In Phase 1, each node checks if it needs to initiate a searching process because of the failure of its current matching node.
The real search process is performed in Phase 2. If the supernode a is successful in finding a free and good node, a sends messages CHANGE-OLD-MATCHING and UNMARK-BACKTRACK as we explained previously. If node a cannot find a free node, node a will try to steal others' matching nodes. The supernode a will steal an unreached and good node b, and send the message SUPERNODE to node cur(b). Otherwise, if all a's adjacent nodes have been marked reached and the node old(a) is good, a will backtrack. Node a will retain its old matching node and send SUPERNODE to node cur(old(a)). Otherwise, if the supernode token has backtracked to a root node, this supernode token will wait there.
In Phase 3, node a will do the appropriate operations depending on which message a has received. If there are failures in A, their corresponding old matching nodes become supernodes. We will explain the details later. In Version A, a supernode in A should not steal any supernode in B, since these supernodes in B will start their searches later in Version B. Denote by N the node that is performing the following algorithm. The following is a sketch of Version A of our algorithm that runs at all the nodes in A in parallel. A more detailed algorithm is presented in the Appendix.
/* Let set E be the set of nodes in B which are good, adjacent to N, and not supernodes. */
Phase 1
    If cur(N) is not good, N is a supernode.
Phase 2
    If N is a supernode
        If there exists a free node in E
            Set old(N) to be not reached
            Ask old(N) to send CHANGE-OLD-MATCHING to cur(old(N))
            Ask old(N) to send UNMARK-BACKTRACK to all adjacent backtracked nodes
        Else if N can steal an unreached node b in E
            Ask b to send SUPERNODE to cur(b)
        Else if old(N) is good
            backtrack from N
        Else
            Do nothing
Phase 3
    If N receives SUPERNODE
        Set N to be a supernode.
    If N receives CHANGE-OLD-MATCHING
        Set old(N) to be not reached
        Ask old(N) to send CHANGE-OLD-MATCHING to cur(old(N))
        Ask old(N) to send UNMARK-BACKTRACK to all adjacent backtracked nodes
    If N receives UNMARK-BACKTRACK
        Set old(N) to be not reached
        Ask old(N) to send UNMARK-BACKTRACK to all adjacent backtracked nodes
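The priority order of the Phase 2 branches can be written as a small pure function. This is a hedged sketch of the decision only: the name `phase2_action` and its arguments are hypothetical, `candidates` stands for the set E defined in the comment of the sketch above, and the messages that each branch would trigger are not modeled.

```python
def phase2_action(is_super, candidates, free, reached, old_is_good):
    """Return which Phase 2 branch a node N in A would take.

    candidates  -> good, non-supernode B-neighbours of N (the set E)
    free        -> B-nodes that are currently free
    reached     -> B-nodes currently marked reached
    old_is_good -> whether old(N) is still good
    """
    if not is_super:
        return ("none", None)            # only supernodes search
    for b in candidates:
        if b in free:
            return ("match", b)          # a free node ends the search
    for b in candidates:
        if b not in reached:
            return ("steal", b)          # token moves on to cur(b)
    if old_is_good:
        return ("backtrack", None)       # give the token back via old(N)
    return ("wait", None)                # nothing to do in this round


took_free = phase2_action(True, ["b1", "b2"], {"b2"}, {"b1"}, True)
took_steal = phase2_action(True, ["b1", "b2"], set(), {"b1"}, True)
took_back = phase2_action(True, ["b1"], set(), {"b1"}, True)
took_wait = phase2_action(True, ["b1"], set(), {"b1"}, False)
```

Checking for a free node before any steal mirrors the order of the branches in the sketch above.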
We would like to discuss the operations that nodes in B perform in Version A. We need to define the following terms:
Definition 3.2. A supernode a is called idle if node a is a supernode and every adjacent node of a is labeled reached, and old(a) is not good; otherwise, a supernode is called active. We say an alternating path is active if the corresponding supernode is active.
In Version A, nodes in B basically perform the message passing for nodes in A. However, when there are failures in A, the old matching nodes of these failures become supernodes. These supernodes in B do not perform any search while the algorithm is running Version A, but they need to do some operations for nodes in A. There are two cases for failure of a node a in A: Either a is not in an active alternating path or a is. Let b be the old matching node of a. If a is not in an active alternating path, b will do nothing except become a supernode for Version B. If a is in an active alternating path as Figure 5(a) shows, b becomes a supernode and initiates a backtracking to cur(b) [b sends SUPERNODE to cur(b)]. This backtracking is to restore the alternating path. We can regard this original

References

An $n^{5/2}$ Algorithm for Maximum Matchings in Bipartite Graphs
TL;DR: This paper shows how to construct a maximum matching in a bipartite graph with n vertices and m edges in a number of computation steps proportional to $(m + n)\sqrt n$.

An O(√|V|·|E|) algorithm for finding maximum matching in general graphs
TL;DR: An O(√|V|·|E|) algorithm for finding a maximum matching in general graphs works in 'phases'.

An optimal algorithm for on-line bipartite matching
TL;DR: This work applies the general approach to data structures, bin packing, and graph coloring to bipartite matching and shows that a simple randomized on-line algorithm achieves the best possible performance.

Complexity of network synchronization
TL;DR: A new simulation technique, referred to as a synchronizer, which is a new, simple methodology for designing efficient distributed algorithms in asynchronous networks, is proposed and is proved to be within a constant factor of the lower bound.

Configuration of VLSI arrays in the presence of defects
TL;DR: The penalties for configuring VLSI arrays for yield enhancement are assessed and algorithms are presented that connect any fraction R < 1 - p of the elements with yield approaching one as N increases.