
Measuring Control Plane Latency in SDN-enabled Switches
Keqiang He, Junaid Khalid, Aaron Gember-Jacobson, Sourav Das, Chaithan Prakash, Aditya Akella, Li Erran Li*, Marina Thottan*
University of Wisconsin-Madison, *Bell Labs
ABSTRACT
Timely interaction between an SDN controller and switches is cru-
cial to many SDN applications—e.g., fast rerouting during link fail-
ure and fine-grained traffic engineering in data centers. However, it
is not well understood how the control plane in SDN switches im-
pacts these applications. To this end, we conduct a comprehensive
measurement study using four types of production SDN switches.
Our measurements show that control actions, such as rule instal-
lation, have surprisingly high latency, due to both software imple-
mentation inefficiencies and fundamental traits of switch hardware.
Categories and Subject Descriptors
C.2.0 [Computer-Communication Networks]: General; C.4 [Performance of Systems]: Metrics—performance measures
Keywords
Software-defined Networking (SDN); Latency; Measurement
1. INTRODUCTION
Software defined networking (SDN) advocates for the separation
of control and data planes in network devices, and provides a logi-
cally centralized platform to program data plane state [3, 14]. This
has opened the door to rich network control applications that can
adapt to changes in network topology or traffic patterns more flexi-
bly and more quickly than legacy control planes [2,6,7,9,10,13,16].
However, to optimally satisfy network objectives, many important
control applications require the ability to reprogram data plane state
at very fine time-scales. For instance, fine-grained data center traf-
fic engineering requires routes to be set up within a few hundred
milliseconds to leverage short-term traffic predictability [2]. Simi-
larly, setting up routes in cellular networks (when a device becomes
active, or during a handoff) must complete within 30-40ms to en-
sure users can interact with Web services in a timely fashion [10].
Timeliness is determined by: (1) the speed of control programs,
(2) the latency to/from the logically central controller, and (3) the
responsiveness of network switches in interacting with the controller—
specifically, in generating the necessary input messages for control
programs, and in modifying forwarding state as dictated by the
programs. Robust control software design and advances in distributed
controllers [12] have helped overcome the first two issues.
However, with the focus in current/upcoming generations of SDN
switches being on the flexibility benefits of SDN w.r.t. legacy
technology, the third issue has not gained much attention. Thus, it is
unknown whether SDN can provide sufficiently responsive control
to support the aforementioned applications.

Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are not
made or distributed for profit or commercial advantage and that copies bear
this notice and the full citation on the first page. Copyrights for components
of this work owned by others than ACM must be honored. Abstracting with
credit is permitted. To copy otherwise, or republish, to post on servers or to
redistribute to lists, requires prior specific permission and/or a fee. Request
permissions from Permissions@acm.org.
SOSR 2015, June 17–18, 2015, Santa Clara, CA, USA.
Copyright 2015 ACM 978-1-4503-3451-8/15/06 ...$15.00
http://dx.doi.org/10.1145/2774993.2775069.
To this end, we present a thorough systematic exploration of la-
tencies in four types of production SDN switches from three dif-
ferent vendors—Broadcom, Intel, and IBM—using a variety of
workloads. We investigate the relationship between switch design
and observed latencies using greybox probes and feedback from
vendors. Key highlights from our measurements are as follows:
(1) We find that inbound latency, i.e., the latency involved in the
switch generating events (e.g., when a flow is seen for the first
time) can be high—8 ms per packet on average on Intel. The
delay is particularly high whenever the switch is simultaneously
processing forwarding rules received from the controller. (2) We
find that outbound latency, i.e., the latency involved in the switch
installing/modifying/deleting forwarding rules provided by control
applications, is also high—3ms and 30ms per rule for insertion and
modification, respectively, in Broadcom. The latency crucially de-
pends on the priority patterns of both the rules being inserted and
those already in a switch’s table. (3) We find significant differ-
ences in latency trends across switches with different chipsets and
firmware, pointing to different internal optimizations.
These observations highlight two important gaps in current switch
designs. First, some of our findings show that poor switch soft-
ware design contributes significantly to observed latencies (affirm-
ing [5,8, 17]). We believe near term work will address these issues;
our measurements with an early release of Broadcom’s OpenFlow
1.3 firmware exemplify this. More crucially, our measurements re-
veal latencies that appear to be fundamentally rooted in hardware
design: e.g., rules must be organized in switch hardware tables in
priority order, and simultaneous switch control actions must con-
tend for limited bus bandwidth between a switch’s CPU and ASIC.
Unless the hardware significantly changes—and our first-of-a-kind
in-depth measurement study may engender such changes—we be-
lieve these latencies will manifest even in next generation switches.
2. BACKGROUND
Instead of running a complex control plane on each switch, SDN
delegates network control to external applications running on a log-
ically central controller. Applications determine the routes traffic
should take, and they instruct the controller to update switches with
the appropriate forwarding state. These decisions may be based
on data packets that are received by switches and sent to the con-

[Figure omitted: schematic showing the switch ASIC (forwarding engine, hardware tables, lookup path, DMA over PCIe, switch fabric, 1G/10G ports) and the CPU board (memory, SDK, OpenFlow agent), the controller, the inbound steps I1–I3 on the packet_in path, and the outbound steps O1–O4 on the flow_mod path.]
Figure 1: Schematic of an OpenFlow switch. We also show the
factors contributing to inbound and outbound latency
troller. Such packet events and state update operations are enabled
by OpenFlow [14]—a standard API implemented by switches to fa-
cilitate communication with the controller. Although SDN moves
control plane logic from switches to a central controller, switches
must still perform several steps to generate packet events and up-
date forwarding state. We describe these steps below.
Packet Arrival. When a packet arrives, the switch ASIC first per-
forms a lookup in the switch’s hardware forwarding tables. If a
match is found, the packet is forwarded at line rate. Otherwise the
following steps occur (Figure 1): (I1) The ASIC sends the packet to
the switch’s CPU via the PCIe bus. (I2) An OS interrupt is raised, at
which point the ASIC SDK gets the packet and dispatches it to the
switch-side OpenFlow agent. (I3) The agent wakes up, processes
the packet, and sends to the controller a packet_in message con-
taining metadata and the first 128B of the packet. All three steps,
I1–I3, can impact the latency in generating a packet_in message.
We categorize this as inbound latency, since the controller receives
the message as input.
Forwarding Table Updates. The controller sends flow_mod mes-
sages to update a switch’s forwarding tables. A switch takes the
following steps to handle a flow_mod (Figure 1): (O1) The Open-
Flow agent running on the CPU parses the message. (O2) The agent
schedules the addition (or removal) of the forwarding rule in hard-
ware tables, typically TCAM. (O3) Depending on the nature of the
rule, the chip SDK may require existing rules in the tables to be
rearranged, e.g., to accommodate high priority rules. (O4) The rule
is inserted (or removed) in the hardware table. All four steps, O1–
O4, impact the total latency in executing a flow_mod action. We
categorize this as outbound latency, since the controller outputs a
flow_mod message.
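The inbound (I1–I3) and outbound (O1–O4) paths above can be sketched as a toy model. This is not a real switch API; all names, the dict-based match key, and the 128B packet_in payload handling are illustrative simplifications of the steps the section describes.

```python
# Toy model of the OpenFlow switch pipeline: hardware lookup on packet
# arrival with punt-to-controller on a miss, and flow_mod installation.
# All names are illustrative, not a real switch or controller API.

def handle_packet(hw_table, packet):
    """Packet arrival: hardware lookup; on a miss, build a packet_in."""
    key = (packet["dst_ip"],)          # simplified match key
    if key in hw_table:                # lookup hit: forward at line rate
        return ("forward", hw_table[key])
    # I1: ASIC sends the packet to the CPU via the PCIe bus
    # I2: an interrupt is raised; the SDK dispatches to the OF agent
    # I3: the agent builds a packet_in with metadata + first 128B
    return ("packet_in", {"metadata": {"in_port": packet["in_port"]},
                          "data": packet["raw"][:128]})

def handle_flow_mod(hw_table, flow_mod):
    """flow_mod handling: parse (O1), schedule (O2), rearrange if
    needed (O3), and write the rule into the hardware table (O4)."""
    key = (flow_mod["match_dst_ip"],)  # O1: parse the message
    # O2/O3: scheduling and TCAM rearrangement (modeled as free here)
    # are where most of the outbound latency measured below comes from
    hw_table[key] = flow_mod["out_port"]   # O4: install the rule
    return hw_table

table = {}
handle_flow_mod(table, {"match_dst_ip": "10.0.0.1", "out_port": 2})
hit = handle_packet(table, {"dst_ip": "10.0.0.1", "in_port": 1, "raw": b"x" * 200})
miss = handle_packet(table, {"dst_ip": "10.0.0.2", "in_port": 1, "raw": b"x" * 200})
```

The measurements that follow quantify how long the punt path (I1–I3) and the install path (O1–O4) actually take on production hardware.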
3. LATENCY MEASUREMENTS
In this section, we systematically measure in/outbound latencies
to understand what factors contribute to high latencies. We gen-
erate a variety of workloads to isolate specific factors, and we use
production switches from three vendors, running switch software
with support for OpenFlow 1.0 [14] or, if available, OpenFlow 1.3,
to highlight the generality of our observations and to understand
how software evolution impacts latencies.¹ Henceforth, we refer
to the four hardware and software combinations (Table 1) as Intel,
BCM-1.0, BCM-1.3, and IBM. To ensure we are experimenting in
the optimal regimes for each switch, we take into account factors
such as flow table capacity and support for packet_in.
3.1 Measurement Methodology
Figure 2 shows our measurement setup. The host has one 1Gbps
and two 10Gbps interfaces connected to the switch under test. The
eth0 interface is connected to the control port of the switch, and
¹When using OpenFlow 1.3 firmware, we only leverage features
also available in OpenFlow 1.0 for an apples-to-apples comparison.
Model              CPU    RAM   OF Ver.  Flow Table Size   Ifaces
Intel FM6000       2GHz   2GB   1.0      4096              40x10G + 4x40G
Broadcom 956846K   1GHz   1GB   1.0      896               14x10G + 4x40G
                                1.3      1792 (ACL tbl)
IBM G8264          ?      ?     1.0      750               48x10G + 4x40G
Table 1: Switch specifications
[Figure omitted: the host's eth0 carries the control channel to the OpenFlow switch; eth1 sends flows in; eth2 receives flows out.]
Figure 2: Measurement experiment setup
an SDN controller (POX for Intel, BCM-1.0, and IBM; RYU for
BCM-1.3) running on the host listens on this interface. The RTT
between switch and controller is negligible (0.1ms). We use the
controller to send a burst of OpenFlow flow_mod commands to the
switch. For Intel, BCM-1.0, and IBM, we install/modify/delete
rules in the single table supported by OpenFlow 1.0; for BCM-
1.3, we use the highest numbered table, which supports rules de-
fined over any L2, L3, or L4 header fields. The host’s eth1 and
eth2 interfaces are connected to data ports on the switch. We run
pktgen [15] in kernel space to generate traffic on eth1 at a rate of
600-1000Mbps using minimum Ethernet frame size.
Prior work notes that accurate execution times for OpenFlow
commands on commercial switches can only be observed in the
data plane [17]. Thus, we craft our experiments to ensure the la-
tency impact of various factors can be measured directly from the
data plane (at eth2 in Figure 2), with the exception of packet_in
generation latency. We run libpcap on our measurement host to ac-
curately timestamp the packet and rule processing events of each
flow. We first log the timestamps in memory, and when the exper-
imental run is complete, the results are dumped to disk and pro-
cessed. We use the timestamp of the first packet associated with
a particular flow as the finish time of the corresponding flow_mod
command; more details are provided later in this section.
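The first-packet bookkeeping described above can be sketched in a few lines. The record format (timestamp, flow id) is hypothetical; a real harness would derive flow ids from libpcap header fields.

```python
# Derive flow_mod finish times from timestamped data-plane packets:
# the first packet observed for a flow marks when its rule became
# active in hardware. The (timestamp, flow_id) record format is a
# stand-in for parsed libpcap captures.

def finish_times(packets):
    """packets: iterable of (timestamp_s, flow_id) in capture order.
    Returns {flow_id: timestamp of that flow's first packet}."""
    first = {}
    for ts, flow in packets:
        first.setdefault(flow, ts)   # keep only the earliest packet
    return first

log = [(0.0010, "f1"), (0.0012, "f1"), (0.0041, "f2"), (0.0043, "f2")]
t = finish_times(log)
```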
3.2 Dissecting Inbound Latency
To measure inbound latency, we empty the table at the switch,
and we generate traffic such that packet_in events are generated at
a certain rate (i.e., we create packets for new flows at a fixed rate).
To isolate the impact of packet_in processing from other message
processing, we perform two kinds of experiments: (1) the packet_in
will trigger corresponding flow_mod (insert simple OpenFlow rules
differing just in destination IP) and packet_out messages; (2) the
packet_in message is dropped silently by the controller.
We record the timestamp (t1) when each packet is transmitted on
the measurement host's eth1 interface (Figure 2). We also record
the timestamp (t2) when the host receives the corresponding packet_in
[Plots omitted: inbound delay (ms) vs. flow #; (a) with flow_mod/pkt_out; (b) w/o flow_mod/pkt_out.]
Figure 3: Inbound delay on Intel, flow arrival rate = 200/s

                          flow rate 100/s   flow rate 200/s
with flow_mod/pkt_out     15.7%             26.5%
w/o flow_mod/pkt_out      9.8%              14.4%
Table 2: CPU usage on Intel
message on eth0. The difference (t2 − t1) is the inbound latency.²
Representative results for an Intel switch are shown in Figure 3;
IBM has similar performance (5ms latency per packet_in on average).³
In the first experiment (Figure 3a), we see the inbound latency
is quite variable with a mean of 8.33ms, a median of 0.73ms,
and a standard deviation of 31.34ms. In the second experiment
(Figure 3b), the inbound delay is lower (mean of 1.72ms, median
of 0.67ms) and less variable (standard deviation of 6.09ms). We
also observe that inbound latency depends on the packet_in rate:
e.g. in the first experiment the mean is 3.32ms for 100 flows/s (not
shown) vs. 8.33ms for 200 flows/s (Figure 3a).
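The summary statistics above come straight from the (t1, t2) pairs. A minimal sketch, with made-up sample values chosen so that one long-tail sample drives the standard deviation above the mean, mirroring the heavy-tailed distribution reported for Figure 3a:

```python
# Summarize inbound latency samples (t2 - t1) as mean/median/stdev.
# The four sample pairs below are illustrative, not measured data;
# the single 40ms outlier mimics the long tail seen on Intel.
from statistics import mean, median, stdev

def inbound_stats(samples):
    """samples: list of (t1, t2) timestamp pairs in milliseconds."""
    d = [t2 - t1 for t1, t2 in samples]
    return mean(d), median(d), stdev(d)

pairs = [(0.0, 0.7), (1.0, 1.8), (2.0, 2.6), (3.0, 43.0)]
m, med, sd = inbound_stats(pairs)
```

Note how, as in the measured data, the median stays sub-millisecond while a few slow packet_in events inflate the mean and make the standard deviation several times larger than it.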
The only difference between the two experiments is that in the
former case the switch CPU must process flow_mod and packet_out
messages, and send forwarding entries and outbound packets across
the PCIe bus to the ASIC, in addition to generating packet_in mes-
sages. As such, we observe that the CPU usage is higher when the
switch is handling concurrent OpenFlow operations and generat-
ing more packet_in messages (Table 2). However, since the Intel
switch features a powerful CPU (Table 1), plenty of CPU capacity
remains. Our conversations with the switch vendor suggest that the
limited bus bandwidth between the ASIC and switch CPU is the
primary factor contributing to inbound latency.
3.3 Dissecting Outbound Delay
We now study the outbound latencies for three different flow_mod
operations: insertion, modification, and deletion. For each opera-
tion, we examine the latency impact of key factors, including table
occupancy and rule priority.
Before measuring outbound latency, we install a single default
low priority rule which instructs the switch to drop all traffic. We
then install a set of non-overlapping OpenFlow rules that output
traffic on the port connected to the eth2 interface of our measure-
ment host. For some experiments, we systematically vary the rule
priorities.
3.3.1 Insertion Latency
We first examine how different rule workloads impact insertion
latency. We insert a burst of B rules: r_1, ..., r_B. Let T(r_i) be
the time we observe the first packet matching r_i emerging from
the output port specified in the rule. We define per-rule insertion
latency as T(r_i) − T(r_{i-1}).
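Given the first-match times T(r_1)..T(r_B) recovered from the data plane, the per-rule latencies are just successive differences. A minimal sketch with illustrative integer timestamps (in ms):

```python
# Per-rule insertion latency as defined above: T(r_i) - T(r_{i-1}),
# computed from the first-match time of each rule in a burst.
# The timestamps below are illustrative, not measured values.

def per_rule_latency(T):
    """T: first-match times T(r_1)..T(r_B), in ms, in rule order."""
    return [b - a for a, b in zip(T, T[1:])]

# e.g. a burst of 4 rules whose first matching packets appeared at:
lat = per_rule_latency([10, 13, 16, 19])
```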
Rule Complexity. To understand the impact of rule complexity
(i.e., the number of header fields specified in a rule), we install
bursts of rules that specify either 2, 8, or 12 fields. In particular,
we specify destination IP and EtherType (others wildcarded) in the
2-field case; input port, EtherType, source and destination IPs, ToS,
protocol, and source and destination ports in the 8-field case; and
all supported header fields in the 12-field (exact match) case. We
use a burst size of 100 and all rules have the same priority.
We find that rule complexity does not impact insertion latency.
The mean per-rule insertion delay for 2-field, 8-field, and exact
match cases is 3.31ms, 3.44ms, and 3.26ms, respectively, for BCM-
1.0. Similarly, the mean per-rule insertion delay for Intel, IBM, and
BCM-1.3 is 1 ms irrespective of the number of fields. All exper-
iments that follow use rules with 2 fields.
²Our technique differs from [8], where the delay was captured from
the switch to the controller, which includes controller overhead.
³BCM-1.0 and BCM-1.3 do not support packet_in messages.
Table occupancy. To understand the impact of table occupancy,
we insert a burst of B rules into a switch that already has S rules
installed. All B + S rules have the same priority. We fix B and
vary S, ensuring B+S rules can be accommodated in each switch’s
hardware table.
We find that flow table occupancy does not impact insertion de-
lay if all rules have the same priority. Taking B = 400 as an exam-
ple, the mean per-rule insertion delay is 3.14ms, 1.09ms, 1.12ms,
and 1.11ms (standard deviation 2.14ms, 1.24ms, 1.53ms, and 0.18ms)
for BCM-1.0, BCM-1.3, IBM and Intel, respectively, regardless of
the value of S.
Rule priority. To understand the effect of rule priority on the in-
sertion operations, we conducted three different experiments each
covering different patterns of priorities. In each, we insert a burst
of B rules into an empty table (S = 0); we vary B. In the same
priority experiment, all rules have the same priority. In the increas-
ing and decreasing priority experiments, each rule has a different
priority and the rules are inserted in increasing/decreasing priority
order, respectively.
Representative results for same priority rules are shown in Fig-
ure 4a and 4b for B = 100 and B = 200, respectively; the switch
is BCM-1.0. For both burst sizes, the per-rule insertion delay is
similar, with medians of 3.12ms and 3.02ms, and standard devia-
tions of 1.70ms and 2.60ms for B = 100 and B = 200, respec-
tively. The same priority insertion delays on BCM-1.3, IBM, and
Intel are slightly lower, but still similar: mean per-rule insertion de-
lay is 1.09ms, 1.1ms, and 1.17ms, respectively, for B = 100. We
conclude that same priority rule insertion delay does not vary with
burst size.
In contrast, the per-rule insertion delay of increasing priority
rules increases linearly with the number of rules inserted for BCM-
1.0, BCM-1.3, and IBM. Figure 4c and 4d shows this effect for
B = 100 and B = 200, respectively, for BCM-1.0. Compared
with the same priority experiment, the average per-rule delay is
much larger: 9.47ms (17.66ms) vs. 3.12ms (3.02ms), for B = 100
(200). The results are similar for BCM-1.3 and IBM: the average
per-rule insertion delay is 7.75ms (16.81ms) and 10.14ms (18.63ms)
for B = 100 (200), respectively. We also observe the slope of the
latency increase is constant—for a given switch—regardless of B.
The increasing latency in BCM-1.0, BCM-1.3, and IBM stems
from the TCAM storing high priority rules at low (preferred) mem-
ory addresses. Each rule inserted in the increasing priority experi-
ments displaces all prior rules!
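This displacement behavior can be modeled with a list that keeps higher-priority rules at lower (preferred) addresses. The model is a deliberate simplification of the hardware described above: the count of shifted entries, not the insert itself, is what grows linearly and drives the latency trend in the increasing-priority experiments.

```python
# Model of a TCAM that stores higher-priority rules at lower
# (preferred) addresses, as on BCM-1.0, BCM-1.3, and IBM. insert()
# returns how many existing entries had to shift down one address;
# this count is the source of the linear per-rule latency growth.
# Hardware-level model only: it does not capture BCM-1.0's firmware
# reordering of decreasing-priority bursts discussed below.

def tcam_insert(tcam, priority):
    """tcam: list of priorities, highest first. Returns entries moved."""
    pos = 0
    while pos < len(tcam) and tcam[pos] >= priority:
        pos += 1                      # equal priorities append after
    moved = len(tcam) - pos           # everything below shifts down
    tcam.insert(pos, priority)
    return moved

tcam = []
incr = [tcam_insert(tcam, p) for p in range(1, 101)]   # increasing priority
tcam2 = []
same = [tcam_insert(tcam2, 5) for _ in range(100)]     # same priority
```

In this model an increasing-priority burst displaces every prior rule on each insert (0, 1, 2, ... moves), while a same-priority burst displaces nothing, matching the contrast between Figures 4a/4b and 4c/4d.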
Surprisingly, latency does not increase when increasing prior-
ity rules are inserted in Intel. As shown in Figure 5a, the median
per-rule insertion delay for Intel is 1.18ms (standard deviation of
1.08ms), even with B = 800! Results for other values of B are
similar. This shows that the Intel TCAM architecture is fundamen-
tally different from Broadcom and IBM. Rules are ordered in Intel’s
TCAM such that higher priority rules do not displace existing low
priority rules.
However, displacement does still occur in Intel. Figure 5b shows
per-rule insertion latencies for decreasing priority rules for B =
800. We see two effects: (1) the latencies alternate between two
modes at any given time, and (2) there is a step-function effect after
every 300 or so rules.
A likely explanation for the former is bus buffering. Since rule
insertion is part of the switch’s control path, it is not really opti-
mized for latency. The latter effect can be explained as follows: Ex-
amining the Intel switch architecture, we find that it has 24 slices,
A_1, ..., A_24, and each slice holds 300 flow entries. There exists a
consumption order (low-priority first) across all slices. Slice A_i
stores the i-th lowest priority rule group. If rules are inserted in de-

[Plots omitted: insertion delay (ms) vs. rule #; (a) burst size 100, same priority; (b) burst size 200, same priority; (c) burst size 100, incr. priority; (d) burst size 200, incr. priority.]
Figure 4: BCM-1.0 priority per-rule insert latency
[Plots omitted: insertion delay (ms) vs. rule #; (a) burst size 800, incr. priority; (b) burst size 800, decr. priority.]
Figure 5: Intel priority per-rule insert
creasing priority, A_1 is consumed first until it becomes full. When
the next low priority rule is inserted, this causes one rule to be
displaced from A_1 to A_2. This happens for each of the next 300
rules, after which cascaded displacements happen: A_1 → A_2 → A_3,
and so on. We confirmed this with Intel.
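The slice mechanism confirmed above reproduces the step function in Figure 5b. A minimal simulation, assuming 24 slices of 300 entries with lowest-priority rules kept in the lowest slice and the highest entry of an overflowing slice evicted upward (the eviction choice is our simplifying assumption):

```python
# Simulation of the Intel slice organization: 24 slices of 300
# entries, consumed lowest-priority-first, with cascaded displacement
# when a slice overflows. Bookkeeping is simplified; which entry a
# full slice evicts is an assumption, not vendor-confirmed detail.

SLICES, CAP = 24, 300

def insert_decreasing(slices, priority):
    """Insert a rule lower-priority than all existing ones.
    Returns how many entries were displaced into higher slices."""
    moves, carry, i = 0, priority, 0
    while True:
        s = slices[i]
        s.append(carry)
        s.sort()                      # keep ascending priority order
        if len(s) <= CAP:
            return moves
        carry = s.pop()               # evict the highest entry upward
        moves += 1
        i += 1

slices = [[] for _ in range(SLICES)]
# 700 rules inserted in strictly decreasing priority order
moves = [insert_decreasing(slices, p) for p in range(10000, 10000 - 700, -1)]
```

The displacement count is 0 for the first 300 rules, 1 for the next 300, then 2, and so on: every ~300 rules the per-insert work steps up, matching the step-function effect in Figure 5b.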
We observe different trends when inserting decreasing priority
rules in BCM-1.0, BCM-1.3, and Intel. With BCM-1.0, we find the
average per-rule insertion delay increases with burst size: 8.19ms
for B = 100 vs. 15.5ms for B = 200. Furthermore, we observe
that the burst of B rules is divided into several groups, and each
group is reordered and inserted in the TCAM in order of increasing
priority. This indicates that BCM-1.0 firmware reorders the rules
and prefers increasing priority insertion. In contrast, BCM-1.3’s
per-rule insertion delay for decreasing priority rules is similar to
same priority rule insertion: 1ms. Hence, the BCM-1.3 firmware
has been better optimized to handle decreasing priority rule inser-
tions. The same applies to Intel: per-rule insertion delay for de-
creasing priority rules is similar to same priority rule insertion:
1.1ms.
Priority and table occupancy combined effects. We now study
the combined impact of rule priority and table occupancy. We con-
duct two experiments: For the first experiment, the table starts with
S high priority rules, and we insert B low priority rules. For the
second experiment, the priorities are inverted. For both experi-
ments, we measure the total time to install all rules in the burst,
T(r_B) − T(r_1).
For BCM-1.0, BCM-1.3, and IBM, we expect that as long as
the same number of rules are displaced, the completion time for
different values of S should be the same. Indeed, from Figure 6a
(for BCM-1.0), we see that even with 400 high priority rules in the
table, the insertion delay for the first experiment is no different from
the setting with only 100 high priority rules in the table. In contrast,
in Figure 6b, newly inserted high priority rules will displace low
priority rules in the table, so when S = 400 the completion time
is about 3x higher than S = 100. For IBM (not shown), inserting
300 high priority rules into a table with 400 low priority rules takes
more than 20 seconds.
For Intel, the results are similar to same priority rule insertion.
This indicates that Intel uses different TCAM organization schemes
[Plots omitted: avg completion time (ms) vs. burst size, for initial table occupancy S = 100 and S = 400; (a) insert low priority rules into a table with high priority rules; (b) insert high priority rules into a table with low priority rules.]
Figure 6: Overall completion time on BCM-1.0. Initial table oc-
cupancy is S high (low) priority rules; insert a burst of low (high)
priority rules. Averaged over 5 runs.
than the Broadcom and IBM switches.
Summary and root causes. We observe that: (1) rule complex-
ity does not affect insertion delay; (2) same priority insertions in
BCM-1.0, BCM-1.3, Intel and IBM are fast and not affected by
flow table occupancy; and (3) priority insertion patterns can affect
insertion delay very differently. For Intel, increasing priority in-
sertion is similar to same priority insertion, but decreasing priority
incurs much higher delay. For BCM-1.3 and IBM the behavior is
inverted: decreasing priority insertion is similar to same priority
insertion and increasing priority insertion incurs higher delay. For
BCM-1.0, insertions with different priority patterns are all much
higher than insertions with same priority.
Key root causes for observed latencies are: (1) how rules are
organized in the TCAM, and (2) the number of slices. Both of
these are intrinsically tied to switch hardware. Even in the best
case (Intel), per-rule insertion latency of 1ms is higher than what
native TCAM hardware can support (100M updates/s [1]). Thus,
in addition to the above two causes, there appears to be an intrinsic
switch software overhead contributing to all latencies.
3.3.2 Modification Latency
We now study modification operations. As before, we use bursts
of rules and a similar definition of latency.
Table occupancy. To study the impact of table occupancy, we pre-
insert S rules into a switch, all with the same priority. We then
modify one rule at a time by changing the rule’s output port, send-
ing modification requests back to back.
Per-rule modification delay for BCM-1.0 when S = 100 and
S = 200 are shown in Figure 7a and 7b, respectively. We see that
the per-rule delay is more than 30 ms for S = 100. When we dou-
ble the number of rules, S = 200, latency doubles as well. It grows
linearly with S (not shown). Note that this latency is much higher
than the corresponding insertion latency (3.12ms per rule) (§3.3.1).
IBM’s per-rule modification latency is also affected significantly
by the table occupancy— the per-rule modification latencies for
S = 100 and S = 200 are 18.77ms and 37.13ms, respectively.

[Plots omitted: modification delay (ms) vs. rule #; (a) 100 rules in table; (b) 200 rules in table.]
Figure 7: BCM-1.0 per-rule mod. latency, same priority
[Plots omitted: modification delay (ms) vs. rule #; (a) burst size 100, incr. priority; (b) burst size 100, decr. priority.]
Figure 8: BCM-1.0 priority per-rule modification latency
In contrast, Intel and BCM-1.3 have lower modification delay,
and it does not vary with table occupancy. For Intel (BCM-1.3)
the per-rule modification delay for both S = 100 and S = 200 is
around 1 ms (2ms) for all modified rules, similar to (2X more than)
same priority insertion delay.
Rule Priority. We conduct two experiments on each switch to
study the impact of rule priority. In each experiment, we insert
B rules into an empty table (S = 0). In the increasing priority
experiments, the rules in the table each have a unique priority, and
we send back-to-back modification requests for rules in increasing
priority order. We do the opposite in the decreasing priority exper-
iment. We vary B.
Figure 8a and 8b show the results for the increasing and decreas-
ing priority experiments, respectively, for B = 100 on BCM-1.0.
In both cases, we see: (1) the per-rule modification delay is similar
across the rules, with a median of 25.10ms and a standard devia-
tion of 6.74ms, and (2) the latencies are identical across the experi-
ments. We similarly observe that priority does not affect modifica-
tion delay in BCM-1.3, Intel and IBM (not shown).
Summary and root causes. We conclude that the per-rule modi-
fication latency on BCM-1.0 and IBM is impacted purely by table
occupancy, not by rule priority structure. For BCM-1.3 and Intel,
the per-rule modification delay is independent of rule priority, table
occupancy, and burst size; BCM-1.3’s per-rule modification delay
is 2X higher than insertion.
Conversations with Broadcom indicated that TCAM modifica-
tion should ideally be fast and independent of table size, so the
underlying cause appears to be less optimized switch software in
BCM-1.0. Indeed, our measurements with BCM-1.3 show that this
issue has (at least partly) been fixed.
3.3.3 Deletion Latency
We now study the latency of rule deletions. We again use bursts
of operations. T(r_i) denotes the time we stop observing packets
matching rule r_i from the intended port of the rule action. We
define deletion latency as T(r_i) − T(r_{i-1}).
Table Occupancy. We pre-insert S rules into a switch, all with the
same priority. We then delete one rule at a time, sending deletion
requests back-to-back. The results for BCM-1.0 at S = 100 and
S = 200 are shown in Figure 9a and 9b, respectively. We see that
per-rule deletion delay decreases as the table occupancy drops. We
see a similar trend for Intel (Figure 10a and 10b), BCM-1.3, and
IBM (not shown).

[Plots omitted: deletion delay (ms) vs. rule #; (a) 100 rules in table; (b) 200 rules in table.]
Figure 9: BCM-1.0 per-rule del. latency, same priority

[Plots omitted: deletion delay (ms) vs. rule #; (a) 100 rules in table; (b) 200 rules in table.]
Figure 10: Intel per-rule del. latency, same priority

[Plots omitted: deletion delay (ms) vs. rule #; (a) increasing priority; (b) decreasing priority.]
Figure 11: BCM-1.0 priority per-rule del. latency, B=100

[Plots omitted: deletion delay (ms) vs. rule #; (a) increasing priority; (b) decreasing priority.]
Figure 12: Intel priority per-rule del. latency, B=100
Rule Priorities. We start with B existing rules in the switch, and
delete one rule at a time in increasing and decreasing priority order.
For all switches (BCM-1.0 and Intel shown in Figure 11 and 12,
respectively) deletion is not affected by the priorities of rules in the
table or the order of deletion.
Root cause. Since deletion delay decreases with rule number in
all cases, we conclude that deletion is incurring TCAM reordering.
We also observe that processing rule timeouts at the switch does
not noticeably impact flow_mod operations. Given these two ob-
servations, we recommend allowing rules to time out rather than
explicitly deleting them, if possible.
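The table-compaction root cause above can be illustrated with the same list-based TCAM model. A sketch under the simplifying assumption that deleting an entry compacts everything stored after it by one slot (the direction of compaction is our assumption, not a measured detail):

```python
# Model consistent with the deletion root cause: removing a rule from
# a densely packed TCAM compacts the entries stored after it, so the
# per-delete work shrinks as the table drains. Illustrative only; the
# compaction direction is an assumption about the hardware.

def tcam_delete_first(tcam):
    """Delete the entry at the lowest address; every remaining entry
    shifts up one slot. Returns the number of entries moved."""
    tcam.pop(0)
    return len(tcam)                  # all remaining entries compacted

tcam = list(range(100))               # 100 same-priority rules
moves = [tcam_delete_first(tcam) for _ in range(100)]
```

The per-delete move count falls from 99 toward 0 as the table empties, matching the downward trend in Figures 9 and 10, and motivating the recommendation to prefer timeouts over explicit deletes.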
3.4 Implications

Frequently Asked Questions (14)
Q1. What are the contributions in "Measuring control plane latency in sdn-enabled switches" ?

To this end, the authors conduct a comprehensive measurement study using four types of production SDN switches. 

Their conversations with the switch vendor suggest that the limited bus bandwidth between the ASIC and switch CPU is the primary factor contributing to inbound latency. 

The authors find that the underlying causes are linked to software inefficiencies, as well as pathological interactions between switch hardware properties (shared resources and how forwarding rules are organized) and the control operation workload (the order of operations, and concurrent switch activities). 

Conversations with Broadcom indicated that TCAM modification should ideally be fast and independent of table size, so the underlying cause appears to be less optimized switch software in BCM-1.0. 

(2) The authors find that outbound latency, i.e., the latency involved in the switch installing/modifying/deleting forwarding rules provided by control applications, is also high—3ms and 30ms per rule for insertion and modification, respectively, in Broadcom. 

In [8], the authors of that study measured three commercial switches (HP ProCurve, Fulcrum, Quanta) and found that delay distributions were distinct, mainly due to variable control delays. 

The authors observe that a burst of B rules is divided into several groups, and each group is reordered and inserted in the TCAM in order of increasing priority. 

The authors conclude that the per-rule modification latency on BCM-1.0 and IBM is impacted purely by table occupancy, not by rule priority structure. 

The authors see two effects: (1) the latencies alternate between two modes at any given time, and (2) there is a step-function effect after every 300 or so rules. 

Dionysus [11] optimally schedules a set of rule updates while maintaining desirable consistency properties (e.g., no loops and no blackholes). 

Given that software will continue to bridge the control and data planes in SDN switches, the authors remain skeptical that latencies will ever reach what the hardware can natively support. 

Even in the best case (Intel), per-rule insertion latency of 1ms is higher than what native TCAM hardware can support (100M updates/s [1]). 

For Intel, BCM-1.0, and IBM, the authors install/modify/delete rules in the single table supported by OpenFlow 1.0; for BCM-1.3, the authors use the highest-numbered table, which supports rules defined over any L2, L3, or L4 header fields. 

In contrast, in Figure 6b, newly inserted high priority rules will displace low priority rules in the table, so when S = 400 the completion time is about 3x higher than S = 100.
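The displacement behavior described in that excerpt can be illustrated with a toy model of a priority-sorted TCAM (our simplification, not the paper's model): rules live in an array sorted by decreasing priority, and inserting above existing entries shifts every lower entry down one slot, each shift costing one hardware write.

```python
import bisect

def tcam_insert_cost(priorities):
    """Count entry moves in a naive priority-sorted TCAM.
    The table keeps rules in decreasing-priority order; inserting a
    rule above existing entries shifts everything below it down one
    slot, and each shift is one write."""
    table = []   # rule priorities, kept in decreasing order
    moves = 0
    for p in priorities:
        # position of the new rule in the decreasing-order table
        pos = bisect.bisect_left([-q for q in table], -p)
        moves += len(table) - pos   # entries displaced downward
        table.insert(pos, p)
    return moves
```

Under this model, installing N rules in decreasing priority order costs zero moves, while increasing order costs N(N-1)/2, which matches the direction of the slowdown the measurements report for high-priority inserts into an occupied table.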