A practical scheduling algorithm to achieve 100% throughput in input-queued switches
Summary
1 Introduction
- Traditionally, switches and routers have been most often designed as a collection of line-cards connected to a single shared bus.
- If the aggregate bandwidths of the bus and memory are high enough, the system is able to keep all of the outgoing links continuously busy, making the system highly efficient.
- Furthermore, the system is able to control packet departure times and hence can provide guaranteed quality of service (QoS) [3][15][20][21].
- Switch and router designers are finding that the continued growth in bandwidth is making it increasingly difficult to design a shared bus and centralized memory that run fast enough.
- The data rate of a shared bus is limited by electrical considerations, such as the loading on the bus, and reflections from connectors.
- Increasingly, a passive shared bus is being replaced by an active non-blocking switch fabric -most often a crossbar switch.
- The very fastest switches and routers usually transfer packets across the switching fabric in fixed-size units, which the authors refer to as "cells."
- Increased overflows occur because a maximum size matching algorithm does not consider queue lengths when deciding which input queues to service.
- With LPF, the authors' goal is to combine the benefits of a maximum size matching algorithm with those of a maximum weight matching algorithm, while lending itself to a simple hardware implementation.
- This enables LPF to achieve both the high instantaneous throughput of a maximum size matching algorithm and the high sustained throughput, with few overflows, of a maximum weight matching algorithm, even when the arriving traffic is non-uniform.
LPF has a running-time complexity of O(N^2.5), the same as a maximum size matching algorithm.
- Furthermore, the comparators that limit the performance of LQF are removed from the critical path of the LPF algorithm.
- In fact, the heart of the LPF algorithm uses a slightly modified maximum size matching algorithm, for which there are a variety of existing, heuristic approximations [1][9][10][17].
- In Section 3, the authors describe LPF and its properties before presenting their performance analysis.
2 Our Switch Model
- Figure 1 shows an input-queued switch consisting of input and output ports, a non-blocking switching fabric and a scheduler.
- The scheduler determines which inputs and outputs are connected during each slot.
3 The LPF Algorithm
- Together, the sum of the input and output occupancies represents the work load or congestion that a cell faces as it competes for transmission to its output.
- The authors call this sum the port occupancy; LPF favors queues with high port occupancy.
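The port-occupancy weight described above can be sketched in a few lines of Python (the function name and the example matrix are illustrative, not from the paper): the weight of queue (i, j) combines the total backlog at input i with the total backlog destined to output j.

```python
# Hypothetical sketch: computing LPF port-occupancy weights from VOQ lengths.
# L[i][j] = number of cells queued at input i for output j (the VOQ occupancy).

def port_occupancy_weights(L):
    n = len(L)
    row = [sum(L[i]) for i in range(n)]                       # input occupancy R_i
    col = [sum(L[i][j] for i in range(n)) for j in range(n)]  # output occupancy C_j
    # A queue only competes for service when it is nonempty.
    return [[(row[i] + col[j]) if L[i][j] > 0 else 0 for j in range(n)]
            for i in range(n)]

L = [[2, 0, 1],
     [0, 3, 0],
     [1, 0, 0]]
W = port_occupancy_weights(L)
# W[0][0] = R_0 + C_0 = 3 + 3 = 6; empty queues get weight 0.
```

Giving empty queues zero weight reflects that LPF only considers queues that actually hold a cell to transfer.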
Property 1: The total weight of an LPF match equals the sum of the occupancies of all matched inputs and outputs, i.e., W = Σ_{i∈I} R_i + Σ_{j∈O} C_j, where I and O are the sets of matched inputs and matched outputs, and R_i and C_j denote the occupancies of input i and output j.
- Theorem 1: LPF finds a match that is both a maximum size match and a maximum weight match.
3.1 Finding an LPF Match Using a Maximum Size Matching Algorithm
- Existing maximum size matching algorithms cannot be used to implement LPF because they are unable to select the maximum size match with the largest weight.
- The authors therefore use a modified Edmonds-Karp maximum size matching algorithm [2][19] to find the LPF match.
- First, LPFS builds a search tree rooted at the selected vertex.
- Initially every input and output is colored white -undiscovered, then is grayed when it is discovered, and finally is blackened when it is finished.
- From the tree, an augmenting path, which must pass through an unmatched input, can be found by walking the predecessor list starting at a selected unmatched input.
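The augmenting-path idea behind the matching step can be illustrated with a plain recursive bipartite-matching sketch (hypothetical names; the paper's LPFS procedure modifies Edmonds-Karp with the white/gray/black breadth-first coloring, which this simplified version omits):

```python
# Sketch of the augmenting-path core of a maximum size matching.
# request[i][j] is True when input i has a cell queued for output j.

def max_size_match(request, input_order):
    n = len(request)
    match_out = [-1] * n   # match_out[j] = input currently matched to output j

    def augment(i, visited):
        for j in range(n):
            if request[i][j] and j not in visited:
                visited.add(j)
                # Take output j if it is free, or if its current input
                # can be re-routed along another augmenting path.
                if match_out[j] == -1 or augment(match_out[j], visited):
                    match_out[j] = i
                    return True
        return False

    for i in input_order:      # e.g. inputs sorted by descending occupancy
        augment(i, set())
    return match_out

req = [[True, True, False],
       [True, False, False],
       [False, False, True]]
m = max_size_match(req, [0, 1, 2])
# m == [1, 0, 2]: input 0 was re-routed to output 1 to make room for input 1.
```

Passing the inputs in descending occupancy order mimics the pre-ordering that lets the modified algorithm prefer heavily loaded ports without comparing weights inside the search.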
3.2 A Practical Approximation to LPF
- LPF can be adapted to run at higher speed using simple heuristic approximations.
- The second step consists of a double for-loop used to find a maximal size match.
- Since the requests have already been ordered in the first step, the maximal size matching in the second step does not need to compare request weights.
- The authors' exploratory design work suggests that the second step can be implemented with simple hardware; their synthesized design can make a scheduling decision in just 10 ns using a commercial 0.25 μm CMOS ASIC technology.
- The first step, which requires only simple integer arithmetic, can also run in 10 ns, allowing the switch to run at a line rate of 20 Gb/s.
Figure 6:
- First, the algorithm builds a sorted list of all inputs and outputs based on their occupancies.
- Then, starting from the largest output and input, the algorithm finds a maximal size match.
Iterative LPF algorithm:
- Step 1: (1) sort the inputs and outputs based on their occupancies; (2) reorder the requests according to their input and output occupancies.
- Step 2: find a maximal size match over the reordered requests using the double for-loop.
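The two steps above can be sketched as follows (a simplified software model with hypothetical names, not the paper's hardware design):

```python
# Sketch of the two-step iLPF heuristic.
# Step 1 sorts ports by occupancy; step 2 greedily builds a maximal match
# with a double for-loop that never needs to compare request weights.

def ilpf(L):
    n = len(L)
    row = [sum(L[i]) for i in range(n)]                       # input occupancies
    col = [sum(L[i][j] for i in range(n)) for j in range(n)]  # output occupancies
    inputs = sorted(range(n), key=lambda i: -row[i])          # Step 1: sort ports
    outputs = sorted(range(n), key=lambda j: -col[j])
    in_free, out_free = [True] * n, [True] * n
    match = {}
    for i in inputs:                                          # Step 2: double for-loop
        for j in outputs:
            if in_free[i] and out_free[j] and L[i][j] > 0:
                match[i] = j
                in_free[i] = out_free[j] = False
                break
    return match

L = [[2, 0, 1],
     [0, 3, 0],
     [1, 0, 0]]
m = ilpf(L)
```

Note that the inner loop never compares weights: the sorted traversal order already encodes them. The result is maximal but not necessarily maximum; in this example input 2 loses its only output (output 0) to the more heavily loaded input 0.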
- The authors define a switch to be stable for a particular arrival process if the expected length of the input queues does not grow without bound, i.e., E[L_ij(n)] remains finite as n → ∞ for every input i and output j.
- Definition 5: A switch can achieve 100% throughput if it is stable for all independent and admissible arrival processes.
- Theorem 3: The LPF algorithm is stable for all admissible independent arrival processes.
3.4 Stability With a Finite Pipeline Delay
- Because the modified maximum size matching algorithm requires the input and outputs to be pre-ordered, LPF and iLPF need sorting networks to sort all inputs and outputs.
- Due to the relatively high complexity of the sorting networks, they could dominate the running time of the algorithm.
- This means that the maximum size matching algorithm is operating on weights that are now one slot out of date.
- In the pipelined implementation, the inputs and outputs are pre-sorted by the two sorter networks; the raw requests (requests with their weights removed) are given in matrix form, and the resulting match must be permuted back to its natural order.
- Because of the speed benefits of pipelining, the authors consider here its effect on throughput.
- A k-slot pipeline delay is equivalent to non-pipelined LPF but with weights that are k slots old.
- Hence, it finds the match that maximizes the total port occupancy as measured k slots earlier.
- Perhaps surprisingly, the authors can verify the following. Theorem 4: Using k-slot-old weights, the LPF algorithm is stable for all admissible independent arrival processes, for any finite k.
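The k-slot-old-weights idea can be modeled with a small pipeline sketch (a hypothetical class, not the paper's design; it assumes the scheduler simply consumes occupancy snapshots k slots after observing them):

```python
from collections import deque

# Sketch of a k-stage scheduling pipeline: the match applied at slot n is
# computed from queue occupancies observed at slot n - k (k-slot-old weights).

class PipelinedScheduler:
    def __init__(self, k):
        self.k = k
        self.snapshots = deque()

    def observe(self, L):
        # Record a copied occupancy snapshot at the start of each slot.
        self.snapshots.append([row[:] for row in L])

    def weights_for_current_slot(self):
        # While the pipeline is still filling, no match is available yet.
        if len(self.snapshots) <= self.k:
            return None
        return self.snapshots.popleft()   # snapshot from k slots ago

sched = PipelinedScheduler(k=2)
seen = []
for slot in range(4):
    sched.observe([[slot]])               # toy 1x1 "occupancy" per slot
    seen.append(sched.weights_for_current_slot())
# After a 2-slot warm-up, slot 2 is scheduled from slot 0's weights,
# slot 3 from slot 1's, and so on.
```

Theorem 4 says this staleness costs nothing in throughput: as long as k is finite, the match quality degrades only transiently while queue occupancies drift.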
4 Conclusion
- Input-queued non-blocking switches offer much higher aggregate bandwidth than systems based on shared buses and centralized shared memory.
- While VOQs make it theoretically possible for an input-queued switch to achieve high throughput, most existing scheduling algorithms yield low throughput or are too complex to run at high speed.
- The authors' new scheduling algorithm, LPF, is both practical and able to achieve 100% throughput for all traffic with independent arrivals.
- Because LPF uses a maximum size matching algorithm, it leads to a fast, iterative, heuristic algorithm called iLPF that is simple to implement in hardware.
- Initial investigation suggests that iLPF can configure a switch in 10ns using today's ASIC technology.