
Improved Fast Rerouting Using Postprocessing

TL;DR: This paper presents an algorithmic framework for improving a given FRR network decomposition, using postprocessing, based on iterative arc swapping strategies and supports a number of use cases, from strengthening the resilience to improving the quality of the resulting routes.
Abstract: To provide fast traffic recovery upon failures, most modern networks support static Fast Rerouting (FRR) mechanisms for mission critical services. However, configuring FRR mechanisms to tolerate multiple failures poses challenging algorithmic problems. While state-of-the-art solutions leveraging arc-disjoint arborescence-based network decompositions ensure that failover routes always reach their destinations eventually, even under multiple concurrent failures, these routes may be long and introduce unnecessary loads; moreover, they are tailored to worst-case failure scenarios. This paper presents an algorithmic framework for improving a given FRR network decomposition, using postprocessing. In particular, our framework is based on iterative arc swapping strategies and supports a number of use cases, from strengthening the resilience (e.g., in the presence of shared risk link groups) to improving the quality of the resulting routes (e.g., reducing route lengths and induced loads). Our simulations show that postprocessing is indeed beneficial in various scenarios, and can therefore enhance today's approaches.

Summary

1 INTRODUCTION

  • Communication networks have become a critical infrastructure of their digital society: enterprises which outsource their IT infrastructure to the cloud, as well as many applications related to health monitoring, power grid management, or disaster response [1], depend on the uninterrupted availability of such networks.
  • When encountering a failure, a packet is rerouted onto the next arborescence according to some pre-defined order.
  • This paper presents an algorithmic framework for postprocessing state-of-the-art FRR mechanisms based on network decompositions, to improve the resilience, performance, and flexibility of fast rerouting.
  • The authors show that focusing on arc-disjoint arborescence network decompositions is not a limitation, by proving that arborescence-based decompositions are as good as any deterministic local failover method.

2 IMPOSSIBILITY OF BEATING ARBORESCENCES

  • The authors first motivate their focus on failover algorithms based on arborescence network decompositions, showing that this approach provides not only high resilience but also competitive route quality (in terms of route lengths).
  • The additive stretch of the routing scheme is then the maximum stretch along all failover routes, i.e., from all v to t.
  • The authors start with some definitions for arborescence-based re-routing.
  • For those nodes, there is no shorter path to the destination after such a failure, and hence from a competitive point of view, their failover route is optimal.
  • To this end, the authors will show that there are k-connected k-regular graphs where every deterministic local algorithm has to take large detours, even though short routes are available.

3 THE POSTPROCESSING FRAMEWORK

  • This section presents their algorithmic framework to postprocess arborescence-based network decompositions for improved resilience and performance.
  • The authors consider two classes of objectives in this paper and present two examples each.
  • Of the above correctness conditions, (1), (4), and (5) are always satisfied, while (2) and (3) are irrelevant.
  • Based on this arc-swap operation, the idea of their algorithmic framework is then to swap arcs only if it improves a certain objective function, see Algorithm 2.
  • When exactly two arcs e, e′ are swapped in a given valid arborescence decomposition, both must be outgoing from the same node v; otherwise at least one arborescence becomes disconnected, i.e., the decomposition is invalid.
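The swap-validity condition in the bullets above can be sketched in a few lines. The following is our illustrative reconstruction (not the authors' implementation): each arborescence is stored as a child-to-parent map, and a swap of node v's two outgoing arcs is accepted only if neither arborescence acquires a cycle.

```python
def creates_cycle(parent, v, new_parent, root):
    # Walk towards the root after tentatively re-pointing v's arc;
    # a cycle arises iff the walk returns to v before reaching the root.
    cur = new_parent
    while cur != root:
        if cur == v:
            return True
        cur = parent[cur]
    return False

def try_swap(parents, i, j, v, root):
    """Swap v's outgoing arcs between arborescences T_i and T_j
    (child->parent maps) iff both stay acyclic; report success."""
    a, b = parents[i][v], parents[j][v]
    if creates_cycle(parents[i], v, b, root) or \
       creates_cycle(parents[j], v, a, root):
        return False
    parents[i][v], parents[j][v] = b, a
    return True
```

On a valid decomposition this either applies the swap or leaves it untouched, matching the invariant that every intermediate decomposition stays valid.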

4 USE CASES AND EVALUATION

  • The authors' framework for postprocessing a decomposition can be configured with different objective functions, depending on the specific needs.
  • In the following, the authors discuss and evaluate different use cases, namely two traffic scenario optimization use cases (for stretch/load) and two pure network decomposition optimizations (SRLG and independent paths).
  • For the experimental evaluation, the authors generate 100 instances of undirected (bi-directional) 5-regular random graphs with 100 nodes, using the NetworkX implementation of Steger and Wormald's algorithm [16].
  • The authors then compare the unoptimized and optimized arborescences by failing a fraction of the network links picked at random, and simulate a circular arborescence routing process on the resulting infrastructure.
  • In the latter case, they continue on the next available arborescence, i.e., if a packet has used arborescence Ti up to the failed link, it will then follow arborescence Ti+1 provided that the corresponding outgoing link is available, or try arborescences Ti+2, . . . otherwise.
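The circular routing process described in these bullets can be simulated with a short sketch. This is our simplified illustration (real schemes also match on the packet's in-port): arborescences are child-to-parent maps, and failed links form a set of undirected pairs.

```python
def route(parents, v, root, failed, max_hops=10_000):
    """Circular arborescence routing: follow T_i towards the root; upon
    hitting a failed link, try T_{i+1}, T_{i+2}, ... (indices mod k).
    Returns the hop count, or None if the packet is stuck."""
    k = len(parents)
    i, hops = 0, 0
    while v != root and hops < max_hops:
        for step in range(k):          # try arborescences i, i+1, ...
            j = (i + step) % k
            nxt = parents[j][v]
            if frozenset({v, nxt}) not in failed:
                v, i = nxt, j
                break
        else:
            return None                # every outgoing link has failed
        hops += 1
    return hops if v == root else None
```

For example, on a small two-arborescence decomposition, failing one primary link makes the packet hop onto the second arborescence and still reach the root.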

4.1 Impact of the Original Network Decomposition

  • The authors first study the impact of the network arborescence decomposition algorithm (that is, the input of the optimization process) on the optimization efficiency, before analyzing the optimization scenario in more detail.
  • Both of them are described next in more detail.
  • Hence, as each of the O(|E|) arcs might get tested O(k) times, the construction finishes in O(|E|²k²) time.
  • The greedy decomposition is analogous to the random decomposition and is used for the experimental evaluation in [17].
  • First, one can observe that the Random arborescence decomposition (top) performs worse than the Greedy arborescence decomposition before optimization: for instance, facing x = 20 random link failures, the median stretch is 11 for Random but only 5 for Greedy, and 10% of the samples have a stretch above 22 for Random, compared to above 9 for Greedy.

4.2 Optimization Use Cases

  • A first fundamental objective is to ensure that failover routes are short.
  • Given a subset of nodes that are deemed crucial and need to send packets to some destination node (the root of the arborescence), as well as a set of links highly susceptible to failures, the packets should reach the destination with short detours even if all of these links, or a subset of them, fail.
  • The next two metrics exhibit a mirrored trend compared to the stretch figure: optimizing load efficiently reduces the load in both the median and the 10% worst cases.
  • When the number of SRLG links increases, the algorithm manages to place proportionally fewer such links in the last arborescences.
  • Thus, paths are already independent with high probability (949/1000 on average), and this quantity varies considerably across networks (high dispersion of values).

4.3 Runtime Analysis

  • The authors now turn their attention to the runtime of their optimization framework.
  • The single-threaded code is executed on a 24-core Intel Xeon E5-2620 platform with 32 GB of memory.
  • Figure 9 presents the distribution of those results.
  • It shows that optimizing stretch or load on an 80-node topology takes around 750 seconds on average.
  • Quite surprisingly, connectivity only has a slight impact on runtime.

4.4 Optimizing Network Decomposition Heuristics

  • So far in this section, the authors evaluated their postprocessing framework on network decomposition algorithms that always yield a valid output.
  • Recent work [14] also proposed a heuristic called Bonsai that attempts to generate arborescences of small depth, without guaranteeing that a valid output is produced.
  • This is in contrast to the random and greedy schemes, which build arborescences sequentially.
  • Even though the Bonsai round-robin scheme outperforms the greedy and random schemes regarding stretch quality in evaluations in [14], it has the downside that it might not produce a valid decomposition.

4.5 Experiments on Real World Graphs

  • To complement their experiments on synthetic graphs, the authors also ran them on well-connected cores of network topologies, taken from the Topology Zoo data set [18].
  • The authors trim the Topology Zoo graphs such that only the well-connected cores remain, as follows.
  • Next, the authors replace each node of degree 3 with three edges between its three neighbors.
  • The results of the experiments are very similar to the results on synthetic graphs.
  • In all cases, the optimizations are computed quickly and yield improvements in the same percentage range as the authors have observed on synthetic graphs.
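The degree-3 replacement used in the trimming above can be sketched roughly as follows. This is our reconstruction of the described rule, not the authors' code: the graph is a dict mapping each node to its neighbor set, and the step repeats until no degree-3 node remains.

```python
def smooth_degree3(adj):
    """Repeatedly replace a node of degree 3 by a triangle of edges
    between its three neighbors (sketch of the trimming rule)."""
    changed = True
    while changed:
        changed = False
        for v in list(adj):
            if len(adj[v]) == 3:
                a, b, c = tuple(adj[v])
                for u in (a, b, c):              # detach v
                    adj[u].discard(v)
                for x, y in ((a, b), (a, c), (b, c)):
                    adj[x].add(y)                # connect the neighbors
                    adj[y].add(x)
                del adj[v]
                changed = True
                break
    return adj
```

For example, attaching a degree-3 node to three vertices of a K5 and smoothing yields K5 again.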

5 AN EXAMPLE ILP MODEL FOR THE CIRCULAR ROUTING SCHEME

  • The existence of a valid circular routing scheme based on k arc-disjoint spanning arborescences in a given network graph containing a known set of failed links can also be analyzed with the aid of Integer Linear Programming (ILP) tools.
  • To illustrate one of the possible approaches, the authors formulate an example mathematical model of the corresponding ILP optimization problem for path lengths and stretch below.
  • The remaining terms in Formula (2) guarantee that the corresponding binary variables are set to 0, unless the positive value is required to satisfy the constraints.
  • Then, the authors eliminate the forbidden combinations of used arborescences, which is enforced by the following groups of constraints (16: non-consecutive trees) to (19: prohibited rerouting).

7 CONCLUSION

  • This paper was motivated by the computational challenges involved in computing network decompositions which do not only provide basic connectivity but also account for the quality of routes after failures.
  • The authors proposed and evaluated a simple solution which improves an arbitrary network decomposition, using fast postprocessing, in terms of basic traffic engineering metrics such as route length and load.
  • Furthermore, the authors showed that their framework can also be used to improve resiliency for shared risk link groups: an important extension in practice.
  • Lastly, in order to guarantee reproducibility and enable other researchers to build upon their algorithms, the code is publicly available at https://gitlab.cs.univie.ac.at/ctpapers/fast-failover.


HAL Id: hal-03048830
https://hal.laas.fr/hal-03048830
Submitted on 11 Dec 2020
HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.
Improved Fast Rerouting Using Postprocessing
Klaus-Tycho Foerster, Andrzej Kamisinski, Yvonne-Anne Pignolet, Stefan Schmid, Gilles Trédan
To cite this version:
Klaus-Tycho Foerster, Andrzej Kamisiński, Yvonne-Anne Pignolet, Stefan Schmid, Gilles Trédan. Improved Fast Rerouting Using Postprocessing. IEEE Transactions on Dependable and Secure Computing, Institute of Electrical and Electronics Engineers, 2022, 19 (1), pp. 537-550. doi:10.1109/TDSC.2020.2998019. hal-03048830

IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, VOL. 1, NO. 1, MAY 2020 1
Improved Fast Rerouting Using Postprocessing
Klaus-Tycho Foerster, Andrzej Kamisiński, Yvonne-Anne Pignolet, Stefan Schmid, and Gilles Tredan
Abstract—To provide fast traffic recovery upon failures, most modern networks support static Fast Rerouting (FRR) mechanisms for mission critical services. However, configuring FRR mechanisms to tolerate multiple failures poses challenging algorithmic problems. While state-of-the-art solutions leveraging arc-disjoint arborescence-based network decompositions ensure that failover routes always reach their destinations eventually, even under multiple concurrent failures, these routes may be long and introduce unnecessary loads; moreover, they are tailored to worst-case failure scenarios.
This paper presents an algorithmic framework for improving a given FRR network decomposition, using postprocessing. In particular, our framework is based on iterative arc swapping strategies and supports a number of use cases, from strengthening the resilience (e.g., in the presence of shared risk link groups) to improving the quality of the resulting routes (e.g., reducing route lengths and induced loads). Our simulations show that postprocessing is indeed beneficial in various scenarios, and can therefore enhance today's approaches.
Index Terms—Resilience, fault-tolerance, computer networks, failover.
1 INTRODUCTION
Communication networks have become a critical infrastructure of our digital society: enterprises which outsource their IT infrastructure to the cloud, as well as many applications related to health monitoring, power grid management, or disaster response [1], depend on the uninterrupted availability of such networks. To meet their dependability requirements, most modern networks provide static Fast Rerouting (FRR) mechanisms [2], [3], [4], [5]. Since FRR mechanisms pre-configure conditional failover behaviors, they enable a very fast traffic recovery upon failures, which only involves the data plane but not the (typically much slower [6]) control plane.
However, while allowing to pre-configure conditional failover behavior is the key benefit of FRR, enabling the fast response to failures, it is also the key challenge when it comes to designing algorithms for such mechanisms: as the conditional failover behavior needs to be configured before the failures are known, the algorithmic problem of how to optimally configure the failover rules at the different routers, for all possible failures, seems inherently combinatorial. The problem is particularly challenging in scenarios where packet headers cannot be used to carry meta-information about encountered failures: such header rewriting is often undesired and introduces overhead (related to header rewriting itself, but also in terms of additional rules required at the routers to process such information).
While FRR technology has been used for many years already in modern communication networks, a major algorithmic result on how to configure FRR mechanisms is relatively recent: Chiesa et al. [7], [8] showed that by decomposing the network into k arc-disjoint spanning arborescences [9], highly resilient FRR configurations can be defined. Edmonds [10] proved that k-connected graphs always allow for k such arborescences, and they can be computed rapidly [9].

However, Chiesa et al.'s conjecture that for any k-connected graph, there exists a failover routing resilient to any k - 1 failures, remains an open problem. What is more, while this network decomposition approach ensures connectivity, the failover routes may be far from optimal regarding latency (i.e., route length) and congestion.

K.-T. Foerster and S. Schmid are with the Faculty of Computer Science at the University of Vienna, Austria. Andrzej Kamisiński is with the AGH University of Science and Technology, Poland. Yvonne-Anne Pignolet is with DFINITY, Switzerland. Gilles Tredan is with LAAS-CNRS, France. Manuscript submitted December 15, 2019, revised May 11, 2020.
The goal of this paper is to improve the network decomposition approach, in terms of resilience, performance, and flexibility. In particular, we are motivated by the observation that in practice, additional information about failure scenarios and failover objectives may be available, e.g., about shared risk link groups [11], [12], [13] or about critical flows for which it is important to be routed along short paths, even after failures. Existing optimizations of arborescence-based failover schemes are oblivious to such aspects.

Model. In a nutshell, we consider the problem of pre-defining (static) conditional failover rules at a network's nodes (i.e., switches or routers), which define to which link to forward an incoming packet. These forwarding rules can only depend on the destination t, the in-port at which a packet arrives at the current node, as well as the status of the links directly incident to the node. At the same time, they should not depend on non-local failures or the packet source. In particular, we do not allow for packet tagging (i.e., header rewriting) or carrying failure information in the header.

More specifically, we consider FRR mechanisms leveraging arc-disjoint arborescence network decompositions [7], [8]: for each destination, a set of arborescences is defined which are rooted at the destination and span the entire network without two arborescences sharing an arc. As long as no failure is encountered, a packet travels along an arbitrary arborescence towards the root, being the destination. When encountering a failure, a packet is rerouted onto the next arborescence according to some pre-defined order. The logic of the latter is defined by the arborescence routing strategy.
Contribution. This paper presents an algorithmic framework for postprocessing state-of-the-art FRR mechanisms based on network decompositions, to improve the resilience, performance, and flexibility of fast rerouting. The framework relies on an iterative swapping of arcs, hence changing the network decompositions towards a certain objective. More specifically, such swapping operations can be used to account for specific failure scenarios (e.g., given by shared risk link groups), to improve traffic engineering properties of failover paths (such as load and stretch), or to flexibly adjust the failover routes to the specific requirements or priorities of flows (and their applications).

We show that we do not limit ourselves by focusing on arc-disjoint arborescence network decompositions, by proving that arborescence-based decompositions are as good as any deterministic local failover method. Furthermore, we demonstrate the potential of our arc-swapping framework in four different use cases: two related to routing (i.e., improving stretch and load), and two related to properties of the decomposition (namely, depth and independence of paths). We report on extensive simulations using synthetic network topologies, which illustrate the benefits of our approach. Moreover, we also provide a novel Integer Linear Program (ILP) formulation, to directly create optimized arborescences, instead of postprocessing them.

Organization. The remainder of this paper is organized as follows. Section 2 provides intuition on why focusing on arborescence-based network decompositions is not a limitation. Our postprocessing framework is described in Section 3. We discuss and evaluate case studies in Section 4 and present our ILP in Section 5. After reviewing related work in Section 6, we conclude in Section 7.
2 IMPOSSIBILITY OF BEATING ARBORESCENCES
We first motivate our focus on failover algorithms based on arborescence network decompositions, showing that this approach does not only provide a high resilience but also competitive route qualities (in terms of lengths).
In general, while the static fast rerouting algorithms considered in this paper have the advantage that they do not require header rewriting nor control plane reconvergence, the resulting failover routes may have a high additive stretch. More formally, the (additive) stretch of a failover route from v to t is defined as the difference between 1) the number of hops taken along the failover route from v to t and 2) the number of hops along the shortest route from v to t. The additive stretch of the routing scheme is then the maximum stretch along all failover routes, i.e., from all v to t.
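To make the definition concrete, here is an illustrative sketch (ours, not from the paper) computing the additive stretch of a single failover route, with a BFS distance helper we introduce for the failure-free shortest path; graphs are dicts of neighbor sets:

```python
from collections import deque

def dist(adj, s, t):
    """Hop distance from s to t via BFS; adj maps node -> neighbor set."""
    seen, q = {s: 0}, deque([s])
    while q:
        u = q.popleft()
        if u == t:
            return seen[u]
        for w in adj[u]:
            if w not in seen:
                seen[w] = seen[u] + 1
                q.append(w)
    return None  # t unreachable

def additive_stretch(adj, failover_route, t):
    """Additive stretch of one failover route: its hop count minus the
    failure-free shortest-path length from the route's source to t."""
    v = failover_route[0]
    return (len(failover_route) - 1) - dist(adj, v, t)
```

On a 5-cycle rooted at node 0, for instance, the failover route [1, 2, 3, 4, 0] that detours the long way around has additive stretch 4 - 1 = 3.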
We will see that this is a feature inherent to all local fast failover algorithms, though as the later evaluation sections show, it is more of a rarely occurring worst-case scenario.

We start with some definitions for arborescence-based re-routing. Let (u, v) denote a directed arc from node u to v.
A directed subgraph T is an r-rooted spanning arborescence of G if (i) r ∈ V(G), (ii) V(T) = V(G), (iii) r is the only node without outgoing arcs, and (iv) for each v ∈ V \ {r}, there exists a single directed path from v to r.

Fig. 1. Example network from [14] with two different t-rooted arc-disjoint spanning arborescence decompositions, T_1 (left) and T_2 (right). In both of them one arborescence is drawn with dotted red arrows, while the second arborescence is depicted with dashed blue arrows. Note that the mean path length of the arborescences of T_1 is 2.5, while it is less than 2 in T_2.

When it is clear from the context, we use the term "arborescence" to
refer to a t-rooted spanning arborescence, where t is the destination node. A set of arborescences T = {T_1, . . . , T_k} is arc-disjoint if no pair of arborescences in T shares common arcs, i.e., if (u, v) ∈ E(T_i) then (u, v) ∉ E(T_j) for all i ≠ j. A set of t-rooted arc-disjoint spanning arborescences is a valid arborescence-based decomposition. See Fig. 1 for two examples of such arborescence decompositions. In arborescence-based routing, packets follow an arborescence towards its root. In case of encountering failures on its path to the root, the packet switches to another arborescence. Let the blue dashed arborescence be the failover route for x if its direct link to t fails. In this case the additive stretch is 3 for the decomposition T_1, while it is 1 for T_2. This illustrates that the choice of the decomposition has an impact on the quality of service in case of failures.
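The validity conditions above (spanning, a unique acyclic path from every node to the root, and pairwise arc-disjointness) can be checked mechanically. A minimal sketch (our illustration, not the authors' code), with arborescences again given as child-to-parent maps:

```python
def is_valid_decomposition(parents, nodes, root):
    """Check a set of t-rooted arborescences (child->parent maps) for
    (i) spanning, (ii) every node reaching the root acyclically, and
    (iii) pairwise arc-disjointness."""
    arcs_seen = set()
    for parent in parents:
        if set(parent) != nodes - {root}:   # spanning; root has no out-arc
            return False
        for v in parent:                    # each v must reach the root
            cur, visited = v, set()
            while cur != root:
                if cur in visited:
                    return False            # cycle detected
                visited.add(cur)
                cur = parent[cur]
        for v, p in parent.items():         # arc-disjointness
            if (v, p) in arcs_seen:
                return False
            arcs_seen.add((v, p))
    return True
```

For instance, two arborescences on K4 that share the arc (v, a) are correctly rejected.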
In the following, we show that the arborescence-based routing scheme depicted in Fig. 2 may lead to a detour of length Ω(n), even though a constant-length detour is available. In our example, the arborescences (depicted with different colours and line patterns in Fig. 2) to be used have been constructed such that a certain set of failures leads to a long detour for packets emitted by node 22, even though 22 is very close to the destination t. The only link out of node 22 belongs to an arborescence that takes this long detour if no other links fail, as the packet will stay on this arborescence until it reaches t.

In general though, no failover algorithm can obtain a better stretch than Ω(n) for three failures: an adversary could fail the links (22, t), (21, 11), (23, 13), in which case even algorithms with global information would take a detour of length Ω(n).
However, what happens when we strengthen the definition of additive stretch to a competitive [15] point of view? Recall that we so far defined the stretch in comparison to the shortest path in the network without failures. In a competitive setting, we compare the stretch under some failure set to the shortest path in the network with failures. To give some intuition, in the simple network setting of a cycle, a local failover strategy is to switch between clockwise and counterclockwise routing. This strategy induces an additive stretch according to the size of the cycle, for the nodes neighboring the destination, when their direct connection to the destination fails. However, for those nodes, there is no shorter path to the destination after such a failure, and hence from a competitive point of view, their failover route is optimal. We next consider short failover routes. In the failure example of Fig. 2, an algorithm with global information could simply take a tour of length 5 from node 22

Fig. 2. Example of a (4, ℓ)-clique-torus (see Definition 1), with 4 t-rooted arc-disjoint arborescences, in blue (dotted), red (dash-dotted), green (dashed), and olive (loosely dashed). Three links (struck out) incident to 22 have failed in this scenario, forcing a circular scheme to use the olive (loosely dashed) arborescence at 22, which takes a tour of length at least ℓ - 1, even though a short 5-hop alternative is available.
to t, as already pointed out above. Is it possible to find better deterministic local failover algorithms that can outperform arborescence-based routing in this example?

In this context, a deterministic algorithm makes all decisions on which out-port is used by a packet depend entirely on the available information only; no randomness is used, e.g., when switching to another arborescence. An algorithm is local if its failover decisions do not take the state of other routers into account, but only the locally available information (in-port, destination). In particular, a router does not know where in the network other failures have happened.
Succinctly stated, the answer is no: all deterministic local fast failover routing schemes perform badly in such cases, i.e., they do not outperform arborescence-based routing. To this end, we will show that there are k-connected k-regular graphs where every deterministic local algorithm has to take large detours, even though short routes are available.

The intuition behind this statement lies in the fact that even with the freedom of taking other decisions, there are cases that lead to long detours and/or high load when only local knowledge can be used (i.e., the router does not know where else failures have happened). Thus the power of algorithms that make deterministic decisions without knowing anything about the state of other flows and routers coincides with the power of arborescence-based algorithms.
For our proof we define the following graph class: start with a cycle of ℓ nodes and replace each link with k - 1 parallel links. Observe that these graphs are k-connected and k-regular, but have parallel links between neighboring nodes. In order to obtain a simple graph without parallel links, we expand each node into a clique of k - 1 nodes, preserving connectivity and regularity. For example, for k - 1 = 3, this results in a 3 × ℓ torus graph, as in Fig. 2.
Definition 1. Let k, ℓ ∈ ℕ with k ≥ 3, ℓ ≥ 3. A (k, ℓ)-clique-torus is a graph with (k - 1)ℓ nodes and ℓ(k - 1) + ℓ(k - 1)(k - 2)/2 links, constructed as follows: create ℓ cliques C_j, 1 ≤ j ≤ ℓ, of k - 1 nodes each, i.e., so far every node has degree k - 2. Denote the k - 1 nodes of each clique C_j as v_{j,1}, v_{j,2}, . . . , v_{j,k-1}. Next and last, for each 1 ≤ j ≤ ℓ and each 1 ≤ i ≤ k - 1, connect v_{j,i} with v_{(j mod ℓ)+1,i}.
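Definition 1 is straightforward to instantiate programmatically. The sketch below (our helper, named hypothetically) builds the edge set so that the node count, link count, and k-regularity can be verified:

```python
from itertools import combinations

def clique_torus(k, l):
    """Build the edge set of a (k, l)-clique-torus; nodes are pairs
    (j, i) for clique j and index i, edges are frozensets of endpoints."""
    assert k >= 3 and l >= 3
    edges = set()
    for j in range(l):                       # l cliques of k-1 nodes each
        for a, b in combinations(range(k - 1), 2):
            edges.add(frozenset({(j, a), (j, b)}))
    for j in range(l):                       # ring links between cliques
        for i in range(k - 1):
            edges.add(frozenset({(j, i), ((j + 1) % l, i)}))
    return edges
```

For k = 4 and ℓ = 5 this yields (k - 1)ℓ = 15 nodes and ℓ(k - 1) + ℓ(k - 1)(k - 2)/2 = 30 links, every node having degree exactly k = 4.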
We show that every deterministic local fast failover algorithm sometimes has to take detours with a length in the order of the diameter of the graph, even though a route with a constant number of hops is available. We note that our results here refer to deterministic algorithms.

Theorem 1. For all k ≥ 3, ℓ ≥ 6: for deterministic local fast failover algorithms ALG resilient to k - 1 failures on k-connected k-regular graphs, matching on in-port (from which link the packet arrives) and destination, the competitive additive stretch of ALG vs. a globally optimal algorithm is at least ℓ - 6.
We utilize the following lemma for the proof of the theorem:

Lemma 1. Let G = (V, E) be a connected graph with a link failure set F ⊆ E. Let U be a (V_1, V_2)-node separator of G s.t. 1) F ⊆ E(V_1), 2) all links in F are not of the type (v_1, v_2), v_1 ∈ V_1, v_2 ∈ V_2, and 3) all nodes in V_1 which are adjacent to nodes in V_2 (denoted as N_{V_2}(V_1)) have at most degree |F| + 1. Let t ∈ V_2 s.t. there is no path to t in E(V_1) \ F from any node in N_{V_2}(V_1), but there is one in E \ F. Let R^{(V_1,V_2)}_{t,F} be the shortest path from any node in N_{V_2}(V_1) to t only using edges from E \ F and, over all nodes v ∈ V_2, v' ∈ V_1, with F' = E(v'), let R^{(V_1,V_2)}_{t,F'} be the maximum length of all shortest paths from nodes v ∈ N_{V_1}(V_2) to t only using edges from E \ E(v'). Then, the competitive additive stretch of any |F|-resilient local fast failover algorithm A, matching on in-port and destination, on G is at least, over all eligible t, V_1, V_2, F:

    max_{t, V_1, V_2, F} ( R^{(V_1,V_2)}_{t,F} - R^{(V_1,V_2)}_{t,F'} - 1 ).    (1)
Proof: We start by considering fixed t, V_1, V_2, F fulfilling the lemma requirements. Observe that A must provision routes from all nodes in N_{V_2}(V_1) to t when the links F fail, where the shortest of such routes has the length R^{(V_1,V_2)}_{t,F}. Let e be the first link used by R^{(V_1,V_2)}_{t,F}, i.e., e is from some v_1 ∈ V_1 to some v_2 ∈ V_2. After traversing e, the route is deterministically predefined, never encountering incident links from F. We can furthermore enforce that e will be traversed by A, by setting the failure set F' as all links incident to v_1 except e, with |F'| ≤ |F|. Now, being at node v_2, both failure sets F, F' are indistinguishable to A, i.e., the remaining route of R^{(V_1,V_2)}_{t,F} will be used. On the other hand, the globally optimal route from v_2 to t has at most the length R^{(V_1,V_2)}_{t,F'}, with one additional hop from v_1. As such, we proved a competitive additive stretch of (R^{(V_1,V_2)}_{t,F}) - (R^{(V_1,V_2)}_{t,F'} + 1) for fixed eligible t, V_1, V_2, F, from which the lemma statement follows directly.
We can now prove Theorem 1 in a succinct fashion, using Lemma 1 as follows for (k, ℓ)-clique-torus graphs: a local algorithm cannot distinguish between 1) the situation where all links between two cliques failed, forcing a long detour, and 2) being forced to take a hop onto the long detour by a dense cluster of failures which leaves a short detour intact.
Proof of Theorem 1: We pick t from clique 1 and set F as the k - 1 links between cliques 1 and 2, with V_1 being clique 2 and V_2 being V \ V_1, where the picked t, V_1, V_2, F fulfill the requirements of Lemma 1. The theorem statement follows from R^{(V_1,V_2)}_{t,F} ≥ ℓ - 1 and R^{(V_1,V_2)}_{t,F'} + 1 ≤ 5 (1 hop for e, 1 in C_3, 2 to reach C_1, at most 1 extra to reach t ∈ C_1).
Combining the fact that no deterministic local algorithm can have a better competitive additive stretch than Ω(ℓ) with the fact that a (k, ℓ)-clique-torus graph has (k - 1)ℓ nodes, i.e., ℓ ∈ Ω(n/(k - 1)), yields the following:

Corollary 1. For all k ≥ 3, deterministic local fast failover algorithms resilient to k - 1 failures, matching destination and in-port, have competitive additive stretch of Ω(n/(k - 1)).

Algorithm 1: Basic Arc-Swap Operation
  Input: valid arborescence-based decomposition
  Output: modified valid arborescence-based decomposition
  1: given a node v and two outgoing arcs e, e'
  2: if arborescence conditions hold for e, e' then
  3:     swap arborescences
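Around this basic operation, the postprocessing loop accepts a swap only if it improves a given objective. The following hill-climbing sketch illustrates the idea; it is our simplified illustration, not the paper's Algorithm 2, and `depth_of_first` is a toy objective we chose for demonstration:

```python
import itertools

def acyclic_after(parent, v, new_p, root):
    # Would re-pointing v's arc to new_p keep this arborescence acyclic?
    cur = new_p
    while cur != root:
        if cur == v:
            return False
        cur = parent[cur]
    return True

def postprocess(parents, root, objective):
    """Hill climbing over arc swaps: swap the arborescences of two
    outgoing arcs of one node whenever the result stays a valid
    decomposition and strictly lowers the objective."""
    improved = True
    while improved:
        improved = False
        for v, (i, j) in itertools.product(parents[0],
                itertools.combinations(range(len(parents)), 2)):
            a, b = parents[i][v], parents[j][v]
            if not (acyclic_after(parents[i], v, b, root)
                    and acyclic_after(parents[j], v, a, root)):
                continue
            before = objective(parents)
            parents[i][v], parents[j][v] = b, a      # tentative swap
            if objective(parents) < before:
                improved = True                       # keep it
            else:
                parents[i][v], parents[j][v] = a, b  # revert

def _depth(parent, v, root='t'):
    d = 0
    while v != root:
        d += 1
        v = parent[v]
    return d

def depth_of_first(parents):
    """Toy objective (our choice): total hop count to root 't'
    summed over the first arborescence."""
    return sum(_depth(parents[0], v) for v in parents[0])
```

Since the objective is a strictly decreasing integer, the loop terminates; on a K4 instance it shortens a chain-shaped primary arborescence from total depth 6 to 4 while keeping the second arborescence valid.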
3 THE POSTPROCESSING FRAMEWORK

This section presents our algorithmic framework to postprocess arborescence-based network decompositions for improved resilience and performance. In the following, we first present the general framework, before we discuss concrete use cases. In particular, we do not make any assumptions on the given arborescence-based network decompositions nor the (re)routing strategy used, which can be arbitrary (specific examples will be considered in our simulations).

The framework can be used to optimize a large set of objectives. We consider two classes of objectives in this paper and present two examples each. In the first class, we aim to improve traffic-engineering metrics of failover routes (like load or stretch) or account for flow or application priorities, given certain assumptions about a traffic scenario and failure model, without sacrificing maximum resilience. As a shorthand, we will refer to this first class as the traffic scenario. In the second class, the concrete routing mechanism is ignored and properties of the decomposition, e.g., depth and independence, are improved, which can lead to shorter paths and higher resilience, respectively. We will refer to this second class as the decomposition. In the remainder of this section, we introduce our framework without concrete instantiations of objective functions, which we will cover in the next section.
At the heart of our postprocessing framework lies an
arc-swapping algorithm which can come in different flavors,
depending on the use case. All the different variants of the
arc-swapping algorithm have in common that they always
preserve connectivity: if a source-destination pair features
a certain property that influences the objective, then this
property can only be improved in each arc-swap operation.
In particular, these swaps must maintain the arborescence
character of the decompositions, i.e., they cannot introduce
cycles.
The general principle is quite simple (see Algorithm 1): we only swap the arborescences of two outgoing arcs of the same node v and ensure that no cycles are generated. For simplicity, we refer to the set of arcs that do not belong to any arborescence as the 0-arborescence, even though they do not form an arborescence. This allows us to treat all arcs in a uniform manner and simplifies the description of our algorithm.
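For illustration, this uniform treatment can be sketched in a few lines of Python (a hypothetical representation, not taken from the paper): each arc carries the index of its arborescence, with 0 standing for the 0-arborescence, and a swap merely exchanges two labels, with the validity checks applied beforehand.

```python
# Hypothetical sketch (not from the paper): label every arc with the index
# of the arborescence it belongs to; 0 denotes the "0-arborescence" of
# arcs not assigned to any tree.

def make_labeling(arcs, arborescences):
    """arcs: iterable of (u, v) pairs; arborescences: dict index -> set of arcs."""
    label = {arc: 0 for arc in arcs}  # default: the 0-arborescence
    for i, tree_arcs in arborescences.items():
        for arc in tree_arcs:
            label[arc] = i            # arc-disjointness: one label per arc
    return label

def swap_labels(label, e, e_prime):
    """Basic arc-swap: exchange the arborescence labels of two arcs of a node."""
    label[e], label[e_prime] = label[e_prime], label[e]

arcs = [("v1", "t"), ("v1", "v2"), ("v2", "t"), ("v2", "v1")]
label = make_labeling(arcs, {1: {("v1", "t"), ("v2", "t")}, 2: set()})
swap_labels(label, ("v2", "t"), ("v2", "v1"))   # free (v2, t) for another tree
print(label[("v2", "v1")], label[("v2", "t")])  # 1 0
```

The label dictionary is shared by all arborescences, so a swap never duplicates an arc across trees.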
More formally, we revisit the approach used to generate arborescences as in, e.g., [9], where arcs are added to arborescences incrementally until no further arcs can be added. When this situation is reached, arcs belonging to different arborescences (and possibly the 0-arborescence) are swapped to allow the process to continue. The (incomplete) arborescence set is denoted by {T_1, . . . , T_k}.
Fig. 3. Introductory example with three nodes where growing arborescences sequentially can end in a deadlock. On the left side, the dotted blue arborescence uses both arcs to t, leaving no possibility for the remaining dashed red arborescence to route to the destination. However, after swapping (v_2, t) with (v_2, v_1), the dashed red arborescence may use the link (v_2, t) on the right side (and subsequently, the link (v_1, v_2) to complete the construction).
When growing an arborescence T_i, the following minimal conditions must hold when swapping e = (u, v) from the 0-arborescence with e' = (u, v') ∈ E(T_j) to ensure that the resulting arborescences are valid [14]:^1
(1) u has a neighbor v' ∈ V(T_i)
(2) e = (u, v) does not belong to any arborescence yet, i.e., e ∉ ∪_{ρ=1..k} E(T_ρ)
(3) u ∉ V(T_i)
(4) ∃j s.t. e' = (u, v') ∈ E(T_j)
(5) v ∈ V(T_j)
(6) v' is not on the path from v to the root in T_j.
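These conditions can be checked mechanically. The following is a minimal sketch under assumptions of ours (hypothetical Python, not the paper's code): each arborescence T_i is stored as a child -> parent map parent[i] with root root[i], and label maps arcs to arborescence indices, 0 meaning unassigned.

```python
def nodes(parent, root):
    """Node set of an arborescence stored as a child -> parent map."""
    return set(parent) | set(parent.values()) | {root}

def path_to_root(parent, v):
    """Nodes on the walk from v to the root (the root has no parent entry)."""
    path = [v]
    while v in parent:
        v = parent[v]
        path.append(v)
    return path

def may_swap(label, parent, root, e, e_prime, i):
    """Conditions (1)-(6) for swapping the unassigned arc e = (u, v) with
    e_prime = (u, v_prime) taken from some T_j, while growing T_i."""
    (u, v), (_, v_prime) = e, e_prime
    j = label.get(e_prime, 0)
    return (v_prime in nodes(parent[i], root[i])            # (1)
            and label.get(e, 0) == 0                        # (2)
            and u not in nodes(parent[i], root[i])          # (3)
            and j != 0                                      # (4)
            and v in nodes(parent[j], root[j])              # (5)
            and v_prime not in path_to_root(parent[j], v))  # (6)

# Toy instance: T_1 = {(a, r)} is being grown, T_2 = {(b, a), (c, r)}.
parent = {1: {"a": "r"}, 2: {"b": "a", "c": "r"}}
root = {1: "r", 2: "r"}
label = {("a", "r"): 1, ("b", "a"): 2, ("c", "r"): 2}
print(may_swap(label, parent, root, ("b", "c"), ("b", "a"), 1))  # True
```

Here swapping (b, c) into T_2 in exchange for (b, a) would let T_1 grow over node b, since a is not on the walk from c to the root in T_2.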
Let us consider an example, assuming that arborescences have different colors. In Fig. 3, when we swap the dotted blue arc (v_1, t) to the unused arc (v_1, v_2), the dashed red arborescence may now take over (v_1, t), removing the current deadlock situation. In general, when we cannot add an arc to T_i in the normal round-robin fashion, as explained in [14] and discussed further in Section 4.4, we can check for candidate arc pairs e = (u, v), e' = (u, v') leaving node u to see if we could perform a swapping operation. Analogously to the above conditions, we can formulate the criteria for swapping two arcs belonging to arborescences T_i and T_j.
In contrast to the swapping checks necessary when constructing arborescences, we do not have to test whether each node is incident to an arborescence in this case: this is already guaranteed by the existing decomposition (conditions (1), (4), (5)). Thus, in contrast to the swapping conditions during the arborescence decomposition, there are two cases to consider.
Case (i): e = (u, v) ∈ E(T_i), e' = (u, v') ∈ E(T_j). From the above correctness conditions, (1), (4), and (5) are always satisfied, while (2) and (3) are irrelevant. In addition to (6), it must hold that v is not on the path from v' to the root in T_i. If these conditions are satisfied, then e can be added to T_j and e' can be added to T_i. An example is provided in Figure 4, which improves the depth of both arborescences.
Case (ii): e = (u, v) ∈ E(T_i) and e' = (u, v') does not belong to any (real) arborescence, i.e., e' ∉ ∪_{ρ=1..k} E(T_ρ). In this case, (1) is always satisfied and (2)-(6) are irrelevant. Instead, to be able to remove e from T_i and replace it with e', it must hold that v does not belong to the path from v' to the root in T_i. An example is provided in Figure 5, which gives a better depth for the dashed red arborescence.
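Both cases reduce to the same kind of acyclicity test: after the swap, the walk from a node's new parent towards the root must not run back through that node. A hedged Python sketch (hypothetical representation as above: parent[i] is the child -> parent map of T_i, label maps arcs to indices with 0 for the 0-arborescence) could look as follows.

```python
def path_to_root(parent, v):
    """Nodes on the walk from v to the root (the root has no parent entry)."""
    path = [v]
    while v in parent:
        v = parent[v]
        path.append(v)
    return path

def try_swap(label, parent, e, e_prime):
    """Attempt to swap the outgoing arcs e = (u, v) in T_i and
    e_prime = (u, v_prime) in T_j at node u; j = 0 covers case (ii).
    Performs the swap and returns True iff no cycle would be created."""
    (u, v), (_, v_prime) = e, e_prime
    i, j = label[e], label.get(e_prime, 0)
    # T_i: u's parent becomes v_prime; a cycle arises exactly if the
    # walk from v_prime to the root of T_i runs through u.
    if u in path_to_root(parent[i], v_prime):
        return False
    # Case (i) only: T_j now routes u via v instead of v_prime.
    if j != 0 and u in path_to_root(parent[j], v):
        return False
    label[e], label[e_prime] = j, i
    parent[i][u] = v_prime
    if j != 0:
        parent[j][u] = v
    return True

# Fig. 3 deadlock: the blue tree T_1 holds both arcs into t; giving up
# (v2, t) for the unassigned arc (v2, v1) frees (v2, t) for another tree.
parent = {1: {"v1": "t", "v2": "t"}}
label = {("v1", "t"): 1, ("v2", "t"): 1}
ok = try_swap(label, parent, ("v2", "t"), ("v2", "v1"))
print(ok, parent[1]["v2"], label[("v2", "t")])  # True v1 0
```

Checking for u on the walk to the root is the acyclicity requirement behind the conditions stated above; with spanning arborescences each check costs at most one walk to the root, i.e., O(depth) time per candidate swap.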
If the conditions are met, then the arborescence set after the swap is still valid. The time complexity of picking all
^1. [9] and similar approaches use additional criteria which are immaterial to this discussion.

Citations
Journal ArticleDOI
TL;DR: This survey presents a systematic, tutorial-like overview of packet-based fast-recovery mechanisms in the data plane, focusing on concepts but structured around different networking technologies, from traditional link-layer and IP-based mechanisms, over BGP and MPLS to emerging software-defined networks and programmable data planes.
Abstract: In order to meet their stringent dependability requirements, most modern packet-switched communication networks support fast-recovery mechanisms in the data plane. While reactions to failures in the data plane can be significantly faster compared to control plane mechanisms, implementing fast recovery in the data plane is challenging, and has recently received much attention in the literature. This survey presents a systematic, tutorial-like overview of packet-based fast-recovery mechanisms in the data plane, focusing on concepts but structured around different networking technologies, from traditional link-layer and IP-based mechanisms, over BGP and MPLS to emerging software-defined networks and programmable data planes. We examine the evolution of fast-recovery standards and mechanisms over time, and identify and discuss the fundamental principles and algorithms underlying different mechanisms. We then present a taxonomy of the state of the art, summarize the main lessons learned, and propose a few concrete future directions.

42 citations

Posted Content
TL;DR: It is proved that it is impossible to achieve perfect resilience on any non-planar graph, and it is shown that graph families which are closed under the subdivision of links, can allow for simple and efficient failover algorithms which simply skip failed links.
Abstract: In order to provide a high resilience and to react quickly to link failures, modern computer networks support fully decentralized flow rerouting, also known as local fast failover. In a nutshell, the task of a local fast failover algorithm is to pre-define fast failover rules for each node using locally available information only. These rules determine for each incoming link from which a packet may arrive and the set of local link failures (i.e., the failed links incident to a node), on which outgoing link a packet should be forwarded. Ideally, such a local fast failover algorithm provides a perfect resilience deterministically: a packet emitted from any source can reach any target, as long as the underlying network remains connected. Feigenbaum et al. showed that it is not always possible to provide perfect resilience and showed how to tolerate a single failure in any network. Interestingly, not much more is known currently about the feasibility of perfect resilience. This paper revisits perfect resilience with local fast failover, both in a model where the source can and cannot be used for forwarding decisions. We first derive several fairly general impossibility results: By establishing a connection between graph minors and resilience, we prove that it is impossible to achieve perfect resilience on any non-planar graph; furthermore, while planarity is necessary, it is also not sufficient for perfect resilience. On the positive side, we show that graph families which are closed under the subdivision of links, can allow for simple and efficient failover algorithms which simply skip failed links. We demonstrate this technique by deriving perfect resilience for outerplanar graphs and related scenarios, as well as for scenarios where the source and target are topologically close after failures.

10 citations


Cites background from "Improved Fast Rerouting Using Postp..."

  • ...who introduced a powerful approach which decomposes the network into arc-disjoint arborescence covers [13–15], further investigated in [23–26] to reduce stretch and load....


  • ..., [5, 10, 16, 25, 35, 36, 39, 43, 46]....


Proceedings ArticleDOI
10 May 2021
TL;DR: In this article, the authors present several fast rerouting algorithms which are not limited by spanning trees, but rather extend and combine multiple spanning arborescences to improve resilience.
Abstract: To provide a high availability and to be able to quickly react to link failures, most communication networks feature fast rerouting (FRR) mechanisms in the data plane. However, configuring these mechanisms to provide a high resilience against multiple failures is algorithmically challenging, as rerouting rules can only depend on local failure information and need to be pre-defined. This paper is motivated by the observation that the common approach to design fast rerouting algorithms, based on spanning trees and covering arborescences, comes at a cost of reduced resilience as it does not fully exploit the available links in heterogeneous topologies. We present several novel fast rerouting algorithms which are not limited by spanning trees, but rather extend and combine ("graft") multiple spanning arborescences to improve resilience. We compare our algorithms analytically and empirically, and show that they can significantly improve not only the resilience, but also accelerate the preprocessing to generate the local fast failover rules.

8 citations

Posted ContentDOI
30 Jun 2020
TL;DR: This survey presents a systematic, tutorial-like overview of packet-based fast-recovery mechanisms in the data plane, focusing on concepts but structured around different networking technologies, from traditional link-layer and IP-based mechanisms, over BGP and MPLS to emerging software-defined networks and programmable data planes.
Abstract: In order to meet their stringent dependability requirements, most modern communication networks support fast-recovery mechanisms in the data plane. While reactions to failures in the data plane can be significantly faster compared to control plane mechanisms, implementing fast recovery in the data plane is challenging, and has recently received much attention in the literature. This survey presents a systematic, tutorial-like overview of packet-based fast-recovery mechanisms in the data plane, focusing on concepts but structured around different networking technologies, from traditional link-layer and IP-based mechanisms, over BGP and MPLS to emerging software-defined networks and programmable data planes. We examine the evolution of fast-recovery standards and mechanisms over time, and identify and discuss the fundamental principles and algorithms underlying different mechanisms. We then present a taxonomy of the state of the art and compile open research questions.

8 citations

Posted ContentDOI
TL;DR: In this article, the authors propose to leverage the network's path diversity to extend edge disjoint path mechanisms to tree routing, in order to improve the performance of fast rerouting.
Abstract: Today's communication networks have stringent availability requirements and hence need to rapidly restore connectivity after failures. Modern networks thus implement various forms of fast reroute mechanisms in the data plane, to bridge the gap to slow global control plane convergence. State-of-the-art fast reroute commonly relies on disjoint route structures, to offer multiple independent paths to the destination. We propose to leverage the network's path diversity to extend edge disjoint path mechanisms to tree routing, in order to improve the performance of fast rerouting. We present two such tree-mechanisms in detail and show that they boost resilience by up to 12% and 25% respectively on real-world, synthetic, and data center topologies, while still retaining good path length qualities.

5 citations

References
Proceedings ArticleDOI
07 Jun 2003
TL;DR: This paper presents the first formal performance analysis of link reversal algorithms in terms of work (number of node reversals) and the time needed until the network stabilizes to a state in which all the routes are reestablished.
Abstract: Link reversal algorithms provide a simple mechanism for routing in mobile ad hoc networks. These algorithms maintain routes to any particular destination in the network, even when the network topology changes frequently. In link reversal, a node reverses its incident links whenever it loses routes to the destination. Link reversal algorithms have been studied experimentally and have been used in practical routing algorithms, including [8]. This paper presents the first formal performance analysis of link reversal algorithms. We study these algorithms in terms of work (number of node reversals) and the time needed until the network stabilizes to a state in which all the routes are reestablished. We focus on the full reversal algorithm and the partial reversal algorithm, both due to Gafni and Bertsekas [5]; the first algorithm is simpler, while the latter has been found to be more efficient for typical cases. Our results are as follows: (1) The full reversal algorithm requires O(n^2) work and time, where n is the number of nodes which have lost the routes to the destination. (2) The partial reversal algorithm requires O(n · a* + n^2) work and time, where a* is a non-negative integer which depends on the state of the network. This bound is tight in the worst case, for any a*. (3) There are networks such that for every deterministic link reversal algorithm, there are initial states which require Ω(n^2) work and time to stabilize. Therefore, surprisingly, the full reversal algorithm is asymptotically optimal in the worst case, while the partial reversal algorithm is not, since a* can grow arbitrarily large.

57 citations

Proceedings ArticleDOI
17 Aug 2014
TL;DR: Unlike efforts to redesign the Internet from scratch, it is shown that ARROW can address a set of well-known Internet vulnerabilities, for most users, with the adoption of only a single transit ISP.
Abstract: A longstanding problem with the Internet is that it is vulnerable to outages, black holes, hijacking and denial of service. Although architectural solutions have been proposed to address many of these issues, they have had difficulty being adopted due to the need for widespread adoption before most users would see any benefit. This is especially relevant as the Internet is increasingly used for applications where correct and continuous operation is essential. In this paper, we study whether a simple, easy to implement model is sufficient for addressing the aforementioned Internet vulnerabilities. Our model, called ARROW (Advertised Reliable Routing Over Waypoints), is designed to allow users to configure reliable and secure end to end paths through participating providers. With ARROW, a highly reliable ISP offers tunneled transit through its network, along with packet transformation at the ingress, as a service to remote paying customers. Those customers can stitch together reliable end to end paths through a combination of participating and non-participating ISPs in order to improve the fault-tolerance, robustness, and security of mission critical transmissions. Unlike efforts to redesign the Internet from scratch, we show that ARROW can address a set of well-known Internet vulnerabilities, for most users, with the adoption of only a single transit ISP. To demonstrate ARROW, we have added it to a small-scale wide-area ISP we control. We evaluate its performance and failure recovery properties in both simulation and live settings.

53 citations


"Improved Fast Rerouting Using Postp..." refers background in this paper

  • ...Communication networks have become a critical infrastructure of our digital society: enterprises which outsource their IT infrastructure to the cloud, as well as many applications related to health monitoring, power grid management, or disaster response [1], depend on the uninterrupted availability of such networks....


Journal ArticleDOI
TL;DR: This paper embarked upon a systematic algorithmic study of the resiliency of forwarding tables in a variety of models (i.e., deterministic/probabilistic routing, with packet-header-rewriting, with packet-duplication), and shows that resiliency to four simultaneous link failures, with limited path stretch, can be achieved without any packet modification/duplication or randomization.
Abstract: Fast reroute and other forms of immediate failover have long been used to recover from certain classes of failures without invoking the network control plane. While the set of such techniques is growing, the level of resiliency to failures that this approach can provide is not adequately understood. In this paper, we embarked upon a systematic algorithmic study of the resiliency of forwarding tables in a variety of models (i.e., deterministic/probabilistic routing, with packet-header-rewriting, with packet-duplication). Our results show that the resiliency of a routing scheme depends on the “connectivity” $k$ of a network, i.e., the minimum number of link deletions that partition a network. We complement our theoretical result with extensive simulations. We show that resiliency to four simultaneous link failures, with limited path stretch, can be achieved without any packet modification/duplication or randomization. Furthermore, our routing schemes provide resiliency against $k-1$ failures, with limited path stretch, by storing $\log (k)$ bits in the packet header, with limited packet duplication, or with randomized forwarding technique.

53 citations

Proceedings ArticleDOI
21 Nov 2013
TL;DR: This paper introduces Plinko, a network architecture that uses a novel forwarding model and routing algorithm to build networks with forwarding paths that, assuming arbitrarily large forwarding tables, are provably resilient against t link failures, ∀t ∈ N.
Abstract: This paper introduces Plinko, a network architecture that uses a novel forwarding model and routing algorithm to build networks with forwarding paths that, assuming arbitrarily large forwarding tables, are provably resilient against t link failures, ∀t ∈ N. However, in practice, there are clearly limits on the size of forwarding tables. Nonetheless, when constrained to hardware comparable to modern top-of-rack (TOR) switches, Plinko scales with high resilience to networks with up to ten thousand hosts. Thus, as long as t or fewer links have failed, the only reason packets of any flow in a Plinko network will be dropped are congestion, packet corruption, and a partitioning of the network topology, and, even after t + 1 failures, most, if not all, flows may be unaffected. In addition, Plinko is topology independent, supports arbitrary paths for routing, provably bounds stretch, and does not require any additional computation during forwarding. To the best of our knowledge, Plinko is the first network to have all of these properties.

51 citations

Journal ArticleDOI
TL;DR: An IP fast reroute method that employs rooted arc-disjoint spanning trees to guarantee recovery from up to (k-1) link failures in a k-edge-connected network is developed.
Abstract: IP fast reroute methods are used to recover packets in the data plane upon link failures. Previous work provided methods that guarantee failure recovery from at most two-link failures. We develop an IP fast reroute method that employs rooted arc-disjoint spanning trees to guarantee recovery from up to $(k-1)$ link failures in a $k$ -edge-connected network. As arc-disjoint spanning trees may be constructed in sub-quadratic time in the size of the network, our approach offers excellent scalability. Through experimental results, we show that employing arc-disjoint spanning trees to recover from multiple failures reduces path stretch in comparison with previously known techniques.

50 citations

Frequently Asked Questions (14)
Q1. What have the authors contributed in "Improved fast rerouting using postprocessing" ?

This paper presents an algorithmic framework for improving a given FRR network decomposition, using postprocessing. Their simulations show that postprocessing is indeed beneficial in various scenarios, and can therefore enhance today ’ s approaches. 

The authors understand their work as the first step and believe that it opens several interesting avenues for future research. In particular, it will be interesting to study alternative postprocessing algorithms, and derive formal performance guarantees for them. It would also be interesting to study further use cases for their framework, beyond the ones given in this paper, e. g., for SRLGs combined with load and stretch. 

For low load some flows must take detours, so in general optimizing for low load leads to higher stretch, as the authors will see in their next experiments. 

The existence of a valid circular routing scheme based on k arc-disjoint spanning arborescences in a given network graph containing a known set of failed links can also be analyzed with the aid of Integer Linear Programming (ILP) tools. 

The first group of constraints (1: Arc in one tree) guarantees that each arc in the network graph belongs to at most one of k arc-disjoint spanning arborescences covering the graph. 

This paper was motivated by the computational challenges involved in computing network decompositions which do not only provide basic connectivity but also account for the quality of routes after failures. 

The authors in this paper are interested in static fast rerouting algorithms in the data plane, which rely on precomputed failover rules and do not require packet header rewriting. 

Even under a high number of failures (e.g., 40), the median of routing failures is 0 in both optimized and unoptimized arborescences; only the 10% worst unoptimized arborescences seem to rise to a low 5% failure rate.

One can first observe (top) that this optimization has an impact on the routing failure rate: before optimizing, some packets do not reach their destination, but after swapping, the failure rate is 0. 

Figure 8 (right) presents the results of swapping edges with the objective of increasing the number of independent paths from all nodes in all arborescence pairs. 

The authors note that their algorithmic framework can also be generalized to swap multiple (i.e., more than two) arcs before an improvement of the objective function is required, even from multiple nodes at once.

To be able to minimize the maximum path stretch among all user demands d in the network graph containing failed links (arcs belonging to the set F), the authors first introduce additional virtual unit flows to find the shortest paths between the source nodes s_d and the root node r; they then determine the maximum path stretch based on the difference in length between the actually used paths (circular routing) and the reference paths (the shortest paths avoiding the failed links).

The problem is particularly challenging in scenarios where packet headers cannot be used to carry meta-information about encountered failures: such header rewriting is often undesired and introduces overhead (related to header rewriting itself, but also in terms of additional rules required at the routers to process such information). 

In particular, the authors are motivated by the observation that in practice, additional information about failure scenarios and failover objectives may be available, e.g., about shared risk link groups [11], [12], [13] or about critical flows for which it is important to be routed along short paths, even after failures.