Proceedings ArticleDOI

A practical scheduling algorithm to achieve 100% throughput in input-queued switches

29 Mar 1998, Vol. 2, pp. 792-799
TL;DR: This work introduces a new algorithm called longest port first (LPF), which is designed to overcome the complexity problems of LQF, and can be implemented in hardware at high speed.
Abstract: Input queueing is becoming increasingly used for high-bandwidth switches and routers. In previous work, it was proved that it is possible to achieve 100% throughput for input-queued switches using a combination of virtual output queueing and a scheduling algorithm called LQF. However, this is only a theoretical result: LQF is too complex to implement in hardware. We introduce a new algorithm called longest port first (LPF), which is designed to overcome the complexity problems of LQF, and can be implemented in hardware at high speed. By giving preferential service based on queue lengths, we prove that LPF can achieve 100% throughput.

Summary (3 min read)

1 Introduction

  • Traditionally, switches and routers have been most often designed as a collection of line-cards connected to a single shared bus.
  • If the aggregate bandwidths of the bus and memory are high enough, the system is able to keep all of the outgoing links continuously busy, making the system highly efficient.
  • Furthermore, the system is able to control packet departure times and hence provides guaranteed qualities-of-service (QoS) [3][15][20][21].
  • Switch and router designers are finding that the continued growth in bandwidth is making it increasingly difficult to design a shared bus and centralized memory that run fast enough.
  • The data rate of a shared bus is limited by electrical considerations, such as the loading on the bus, and reflections from connectors.

  • And the data rate of a centralized shared memory is limited because it requires buffer memories that run N times faster than the line rate, where N is the number of switch ports.

  • Increasingly, a passive shared bus is being replaced by an active non-blocking switch fabric, most often a crossbar switch.
  • The very fastest switches and routers usually transfer packets across the switching fabric in fixed-size units, which the authors refer to as "cells".
  • Increased overflows occur because a maximum size matching algorithm does not consider queue lengths when deciding which input queues to service.
  • With LPF their goal is to combine the benefits of a maximum size matching algorithm with those of a maximum weight algorithm, while lending itself to simple implementation in hardware.
  • This enables LPF to take advantage of both the high instantaneous throughput of a maximum size matching algorithm, and the ability of a maximum weight matching algorithm to achieve high throughput, and a small number of overflows even when the arriving traffic is non-uniform.

LPF has a running-time complexity of $O(N^{2.5})$, lower than that of LQF.

  • Furthermore, the comparators that limit the performance of LQF are removed from the critical path of the LPF algorithm.
  • In fact, the heart of the LPF algorithm uses a slightly modified maximum size matching algorithm, for which there are a variety of existing, heuristic approximations [1][9][10][17].
  • In Section 3, the authors describe LPF and its properties before presenting their performance analysis.

2 Our Switch Model

  • Figure 1 shows an input-queued switch consisting of M input and N output ports, a non-blocking switching fabric and a scheduler.
  • The scheduler determines which inputs and outputs are connected during each slot.

3 The LPF Algorithm

  • Together, the sum of the input and output occupancies represents the work load or congestion that a cell faces as it competes for transmission to its output.
  • The authors call this sum the port occupancy; LPF favors queues with high port occupancy.

Property 1: The total weight of an LPF match is equal to the occupancy sum of all matched inputs and outputs, i.e., $\sum_{i,j} S_{i,j}(n)\, w_{i,j}(n) = \sum_{i \in I} R_i + \sum_{j \in J} C_j$, where $I$ and $J$ are the set of matched inputs and matched outputs respectively.
  • LPF finds a match that is both maximum size and maximum weight (Theorem 1).

3.1 Finding an LPF Match Using a Maximum Size Matching Algorithm

  • Existing maximum size matching algorithms cannot be used to implement LPF because they are unable to select the maximum size match with the largest weight.
  • Then the authors use a modified Edmonds-Karp maximum size matching algorithm [2][19] to find the LPF match.
  • First, LPFS builds a tree with t as its root.
  • Initially every input and output is colored white (undiscovered), then is grayed when it is discovered, and finally is blackened when it is finished.
  • From the tree, an augmenting path from s to t, which must go through an unmatched input, can be found by walking the predecessor list which begins at a selected unmatched input.

3.2 A Practical Approximation to LPF

  • LPF can be adapted to run at higher speed using simple heuristic approximations.
  • The second step consists of a double for-loop used to find a maximal size match.
  • Since the requests have already been ordered in the first step, the maximal size matching in the second step does not need to compare request weights.
  • The authors' exploratory design work suggests that the second step can be implemented using simple hardware; for a 32 × 32 switch, their synthesized design can make a scheduling decision in just 10ns using a commercial 0.25 µm CMOS ASIC technology.
  • The first step, which requires simple integer arithmetic, can also run in 10ns, allowing the switch to run at a line rate of 20 Gb/s.

Figure 6:

  • First, the algorithm builds a sorted list of all inputs and outputs based on their occupancies.
  • Then, starting from the largest output and input, the algorithm finds a maximal size match.

Iterative LPF algorithm

  • Step 1: (1) sort inputs & outputs based on their occupancies; (2) reorder requests according to their input and output occupancies. Step 2: maximal size matching.
  • The authors define a switch to be stable for a particular arrival process if the expected length of the input queues does not grow without bound, i.e., $E[L_{i,j}(n)] < \infty$ for all $i$, $j$, $n$.
  • A switch can achieve 100% throughput if it is stable for all independent and admissible arrivals (Definition 5).
  • The LPF algorithm is stable for all admissible independent arrival processes (Theorem 3).

3.4 Stability With a Finite Pipeline Delay

  • Because the modified maximum size matching algorithm requires the inputs and outputs to be pre-ordered, LPF and iLPF need sorting networks to sort all inputs and outputs.
  • Due to the relatively high complexity of the sorting networks, they could dominate the running time of the algorithm.
  • This means that the maximum size matching algorithm is operating on weights that are now one slot out of date.
  • In the pipelined design, inputs and outputs are pre-sorted by the two sorter networks, and raw requests (requests with weights removed) are given in matrix form.
  • The match needs to be permuted back to its natural order.


  • Because of the speed benefits of pipelining, the authors consider here its effect on throughput.
  • A k-slot pipeline delay is equivalent to non-pipelined LPF but with k-slot-old weights.
  • Hence, it finds the match that maximizes $\sum_{i,j} S_{i,j}(n)\, w_{i,j}(n-k)$.
  • Perhaps surprisingly, the authors can verify the following. Theorem 4: Using k-slot-old weights, the LPF algorithm is stable for all admissible independent arrival processes, for any finite k.
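A minimal sketch of this pipelining idea (our own illustration; the helper and its name are hypothetical, not from the paper): feed the matcher occupancies recorded k slots earlier by keeping a short history buffer.

    from collections import deque

    def k_slot_old_weights(history: deque, L_now, k: int = 1):
        # history holds past occupancy snapshots, newest last; the matcher
        # is driven by the snapshot taken k slots ago (k-slot-old weights).
        history.append([row[:] for row in L_now])
        if len(history) > k + 1:
            history.popleft()
        return history[0]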

4 Conclusion

  • Input-queued non-blocking switches offer much higher aggregate bandwidth than systems based on shared buses and centralized shared memory.
  • While VOQs make it theoretically possible for an input-queued switch to achieve high throughput, most existing scheduling algorithms yield low throughput or are too complex to run at high speed.
  • The authors' new scheduling algorithm, LPF, is practical and can achieve 100% throughput for all traffic with independent arrivals.
  • Because LPF uses a maximum size matching algorithm, it leads to a fast, iterative, heuristic algorithm called iLPF that is simple to implement in hardware.
  • Initial investigation suggests that iLPF can configure a switch in 10ns using today's ASIC technology.




Abstract

Input queueing is becoming increasingly used for high-bandwidth switches and routers. In previous work, it was proved that it is possible to achieve 100% throughput for input-queued switches using a combination of virtual output queueing and a scheduling algorithm called LQF. However, this is only a theoretical result: LQF is too complex to implement in hardware. In this paper we introduce a new algorithm called Longest Port First (LPF), which is designed to overcome the complexity problems of LQF, and can be implemented in hardware at high speed. By giving preferential service based on queue lengths, we prove that LPF can achieve 100% throughput.
1 Introduction
Traditionally, switches and routers have been most often designed as a collection of line-cards connected to a single shared bus. Packets waiting to be transmitted on outgoing links are stored in a centralized, shared pool of memory. If the aggregate bandwidths of the bus and memory are high enough, the system is able to keep all of the outgoing links continuously busy, making the system highly efficient. Furthermore, the system is able to control packet departure times and hence provides guaranteed qualities-of-service (QoS) [3][15][20][21]. However, switch and router designers are finding that the continued growth in bandwidth is making it increasingly difficult to design a shared bus and centralized memory that run fast enough. The data rate of a shared bus is limited by electrical considerations, such as the loading on the bus, and reflections from connectors. And the data rate of a centralized shared memory is limited because it requires buffer memories that run N times faster than the line rate, where N is the number of switch ports.
Increasingly, a passive shared bus is being replaced by an active non-blocking switch fabric — most often a crossbar switch. Each line card is connected by a dedicated point-to-point link to the central switch fabric, and therefore has fewer electrical limitations due to loading and reflections. More importantly, each connection to the switch need run only as fast as the line rate, rather than at the aggregate bandwidth of the switch. Centralized shared memory is also being replaced — by separate queues at each input of the switching fabric. Input queues need only run at the line rate, and therefore allow a faster overall system to be built [6][11].
The very fastest switches and routers usually transfer packets across the switching fabric in fixed size units, that we shall refer to as "cells." Variable length packets are segmented into cells upon arrival, transferred across the switch fabric and then reassembled again before they depart. At the beginning of each cell time, a (usually centralized) scheduler selects a configuration for the switching fabric and then transfers cells from inputs to outputs. Using fixed sized cells simplifies the switch design, and makes it easier for the scheduler to configure the switch fabric for high throughput.

But systems that use input queues have two potential problems: low throughput due to head-of-line (HOL) blocking and the difficulty of controlling cell delay. In this paper, we focus on the first problem: achieving high throughput.

It is well known that if an input-queued switch employs a single FIFO queue at each input, HOL blocking limits the throughput to just 58.6% of the maximum [7]. But HOL blocking can be eliminated entirely using a queueing technique known as virtual output queueing (VOQ) in which each input maintains a separate queue for each output [1][10][12][13][17]. It has been shown that with a suitable centralized scheduling algorithm, the throughput can be increased from 58.6% to 100% [12].
Adisak Mekkittikul and Nick McKeown
Computer Systems Laboratory
Stanford University, Stanford, CA 94305-9030
{adisak, nickm}@stanford.edu

This work was funded by a fellowship from National Semiconductor and also by Texas Instruments, Cisco Systems, the Alfred P. Sloan Foundation and a Robert N. Noyce faculty fellowship.

Unfortunately, the algorithms known to-date (LQF [12] and OCF [13]) are too complex to implement in hardware, and are therefore unsuitable for switches operating at high speed. Instead, most switches and routers use a much simpler scheduling algorithm to configure the switch fabric [1][10][18]. Typically, a configuration is selected in an attempt to maximize the number of connections made during each cell time. Such an algorithm is called a maximum size bipartite matching algorithm, and is found to perform well when the arriving traffic is uniformly distributed over all the switch outputs.

But real traffic is not uniform: traffic tends to be focused on a relatively small number of active ports. And unfortunately, a maximum size matching algorithm is known to perform poorly when traffic is non-uniform [12]. The algorithm performs poorly in two (albeit related) ways: increased buffer overflows, and reduced throughput. Increased overflows occur because a maximum size matching algorithm does not consider queue lengths when deciding which input queues to service. When traffic is non-uniform, the occupancies of the various input queues can differ greatly, and queues with heavy traffic can overflow while ones with light traffic remain empty most of the time. The reason for reduced throughput is a little more complex. For a given number of cells in the system, if the traffic is non-uniform, the cells are concentrated on a relatively small number of VOQs. This reduces the number of configurations available to the scheduler, and therefore reduces the size of the maximum size match. If instead the traffic was uniform, the cells in the system would be distributed uniformly over a relatively large number of VOQs, making available a larger number of configurations for the scheduler to choose from.

In earlier work [12][13], it was found that LQF (longest queue first) can achieve 100% for both uniform and non-uniform traffic by considering the occupancies of the queues. LQF gives preferential service to long queues by using a maximum weight matching algorithm, where each weight is set to the corresponding queue length. But LQF is very difficult to implement in hardware at high speed. First of all, it takes too long to run — the most efficient algorithm known to-date has a running-time complexity of $O(N^3 \log N)$. Second, an implementation requires a large number of multi-bit comparators to perform many weight comparisons in parallel. Attempts to implement LQF (and even heuristic approximations [10]) have been limited by the design of a single-chip scheduler that: (i) has fast enough comparators, (ii) can support a sufficient number of comparators, and (iii) can interconnect them in a rich enough pattern.
Motivated by the desire to overcome the impracticalities of LQF, yet achieve its high performance, we propose a new algorithm: LPF (longest port first). With LPF our goal is to combine the benefits of a maximum size matching algorithm, with those of a maximum weight algorithm, while lending itself to simple implementation in hardware. LPF effectively finds the set of maximum size matches, and from among this set chooses the match with the largest total weight. In LPF each weight is a function of queue lengths (we shall see later that the weights in LPF are not exactly equal to the queue lengths, but are similar). This enables LPF to take advantage of both the high instantaneous throughput of a maximum size matching algorithm, and the ability of a maximum weight matching algorithm to achieve high throughput, and a small number of overflows even when the arriving traffic is non-uniform. We find that LPF — like LQF — can achieve 100% throughput for both uniform and non-uniform traffic.

LPF has a running-time complexity of $O(N^{2.5})$, lower than LQF. Furthermore, the comparators that limit the performance of LQF are removed from the critical path of the LPF algorithm. In fact, the heart of the LPF algorithm uses a slightly modified maximum size matching algorithm, for which there are a variety of existing, heuristic approximations [1][9][10][17].

The paper is organized as follows. In Section 2, we provide some definitions. In Section 3, we describe LPF and its properties before presenting our performance analysis.
2 Our Switch Model
We follow the general definitions used in [12]. Figure 1 shows an $M \times N$ input-queued switch consisting of $M$ input and $N$ output ports, a non-blocking switching fabric and a scheduler. To eliminate head-of-line (HOL) blocking, each input maintains $N$ FIFO virtual output queues, one for each output. $Q_{i,j}$ denotes the VOQ at input $i$ containing cells destined to output $j$. Arrivals are fixed size packets or cells, allowing us to split time into discrete cell times, or slots. During any given slot, there is at most one arrival to and departure from each input, and similarly for each output. $A_{i,j}(n)$ is the arrival process of cells to input $i$ destined to output $j$ at rate $\lambda_{i,j}$. Consequently, $A_i(n)$ is the aggregate process of all arrivals to input $i$ at rate $\lambda_i = \sum_{j=1}^{N} \lambda_{i,j}$.

Definition 1: An arrival process is said to be admissible when no input or output is oversubscribed, i.e., when

$$\sum_{i=1}^{M} \lambda_{i,j} < 1, \qquad \sum_{j=1}^{N} \lambda_{i,j} < 1, \qquad \lambda_{i,j} \ge 0.$$

Definition 2: The traffic is uniform if all arrival processes have the same arrival rate, and if the destinations of cells are uniformly distributed over all outputs. Otherwise the traffic is non-uniform.
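As a concrete illustration of Definition 1, the sketch below (our own Python fragment, not from the paper; the function name is ours) checks a rate matrix for admissibility:

    import numpy as np

    def is_admissible(rates: np.ndarray) -> bool:
        # Definition 1: every rate is non-negative and no input row or
        # output column of the M x N rate matrix sums to 1 or more.
        if (rates < 0).any():
            return False
        input_loads = rates.sum(axis=1)    # sum_j lambda_ij for each input i
        output_loads = rates.sum(axis=0)   # sum_i lambda_ij for each output j
        return bool((input_loads < 1.0).all() and (output_loads < 1.0).all())

    uniform = np.full((3, 3), 0.3)         # uniform traffic at 90% load
    assert is_admissible(uniform)
    hot_spot = uniform.copy()
    hot_spot[:, 0] = 0.4                   # oversubscribes output 0 (load 1.2)
    assert not is_admissible(hot_spot)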
Figure 1: A Simple Model of VOQ Switches. (The figure shows inputs 1 through M, each holding VOQs $Q_{i,1}, \dots, Q_{i,N}$ fed by arrival process $A_i(t)$; a crossbar fabric configured by the scheduler; and outputs 1 through N with departure processes $D_1(t), \dots, D_N(t)$.)

The scheduler determines which inputs and outputs are connected during each slot. The scheduling problem can be viewed as a bipartite graph matching problem [2][19], an example of which is shown in Figure 2. Each input makes a request to every output for which it has cells queued. An edge in the graph represents a request from $Q_{i,j}$ with weight $w_{i,j}(n)$ (denoted in Figure 2 as $w_{i,j}$). Let $S_{i,j}(n)$ be a service indicator such that $\sum_{i=1}^{M} S_{i,j}(n) \le 1$ and $\sum_{j=1}^{N} S_{i,j}(n) \le 1$; a value of one indicates that input $i$ is matched to output $j$, i.e., $Q_{i,j}$ is allowed to forward one cell to its output.

Figure 2: A request graph and a matching graph of an $M \times N$ switch. Define $G = [V, E]$ as an undirected graph connecting the set of vertices $V$ with the set of edges $E$. The edge connecting vertices $i$, $1 \le i \le M$, and $j$, $1 \le j \le N$, has an associated weight denoted $w_{i,j}$. Graph $G$ is bipartite if the set of inputs $I = \{i : 1 \le i \le M\}$ and outputs $J = \{j : 1 \le j \le N\}$ partition $V$ such that every edge has one end in $I$ and one end in $J$. Matching $M$ on $G$ is any subset of $E$ such that no two edges in $M$ have a common vertex.

Definition 3: A maximum size match is one that maximizes $\sum_{i,j} S_{i,j}(n)$, i.e., the number of connections.

Definition 4: A maximum weight match is one that maximizes $\sum_{i,j} S_{i,j}(n)\, w_{i,j}(n)$, i.e., the total weight.

Alternatively, a bipartite graph matching problem can be easily solved and understood by transforming it into a flow network [2][19], as illustrated in Figure 3.

Figure 3: Transformation of a request graph into a flow network. (a) A weighted request graph. (b) The corresponding flow network, $G$, whose edges all have unit capacity. A source $s$ and a target $t$ are added. The cost of every edge from $s$ and to $t$ is set to zero. The costs of all other edges are equal to the negated values of the corresponding weights.
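To make Definitions 3 and 4 concrete, the following small Python fragment (ours, for illustration only; the function name is hypothetical) checks the matching constraints on a service-indicator matrix S and evaluates both objectives:

    import numpy as np

    def match_size_and_weight(S: np.ndarray, W: np.ndarray):
        # S is an M x N 0/1 service-indicator matrix, W the weight matrix.
        # Matching constraint: at most one 1 in every row and every column.
        assert ((S == 0) | (S == 1)).all()
        assert (S.sum(axis=0) <= 1).all() and (S.sum(axis=1) <= 1).all()
        return int(S.sum()), int((S * W).sum())

    W = np.array([[5, 0, 2],
                  [3, 0, 0],
                  [0, 4, 1]])
    S = np.array([[0, 0, 1],
                  [1, 0, 0],
                  [0, 1, 0]])
    print(match_size_and_weight(S, W))   # (3, 9): three connections, weight 2+3+4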
3 The LPF Algorithm
Although in practice LPF can be thought of as a special maximum size matching algorithm, in theory it is easier to consider LPF as a maximum weight matching algorithm. Each LPF request weight, $w_{i,j}(n)$, for a request from input $i$ to output $j$ is defined as follows:

$$w_{i,j}(n) = \begin{cases} R_i(n) + C_j(n), & L_{i,j}(n) > 0, \\ 0, & \text{otherwise}, \end{cases} \qquad (1)$$

where $L_{i,j}(n)$ is the occupancy of $Q_{i,j}$ at slot $n$, $R_i(n) = \sum_{j=1}^{N} L_{i,j}(n)$, and $C_j(n) = \sum_{i=1}^{M} L_{i,j}(n)$. $R_i(n)$, which we call the input occupancy, is the total number of cells that are currently waiting at input $i$ to be forwarded to their respective outputs. Similarly, $C_j(n)$, the output occupancy, is the total number of cells at all inputs waiting to be forwarded to output $j$. Together, the sum of the input and output occupancies represents the work load or congestion that a cell faces as it competes for transmission to its output. We call this sum the port occupancy; LPF favors queues with high port occupancy.
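As a small worked example of these definitions and of Equation 1 (our own Python illustration, not from the paper):

    import numpy as np

    # L[i, j] is the occupancy of VOQ Q_ij at the current slot.
    L = np.array([[2, 0, 1],
                  [0, 3, 0],
                  [1, 0, 0]])

    R = L.sum(axis=1)    # input occupancies R_i  -> [3, 3, 1]
    C = L.sum(axis=0)    # output occupancies C_j -> [3, 3, 1]

    # Equation 1: w_ij = R_i + C_j (the port occupancy) if L_ij > 0, else 0.
    W = np.where(L > 0, R[:, None] + C[None, :], 0)
    # W == [[6, 0, 4],
    #       [0, 6, 0],
    #       [4, 0, 0]]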
Property 1: The total weight of an LPF match is equal to the occupancy sum of all matched inputs and outputs, i.e.,

$$\sum_{i,j} S_{i,j}(n)\, w_{i,j}(n) = \sum_{i \in I} R_i + \sum_{j \in J} C_j,$$

where $I$ and $J$ are the set of matched inputs and matched outputs respectively.

We now show that LPF is a special case of a maximum size matching algorithm.

Theorem 1: LPF finds a match that is both maximum size and maximum weight.

Proof: see Appendix A.

Since an LPF match is a maximum size match, we can use a maximum size matching algorithm to find an LPF match. But we need to make sure that among all possible maximum size matches we choose one with the largest total weight.
3.1 Finding an LPF Match Using a Maximum Size Matching Algorithm

Existing maximum size matching algorithms cannot be used to implement LPF because they are unable to select the maximum size match with the largest weight. A simple modification is called for. First, in order to keep the algorithm free of complex magnitude comparisons, all inputs and outputs are pre-ordered according to their LPF weights prior to running the maximum size matching algorithm. Then we use a modified Edmonds-Karp maximum size matching algorithm [2][19] to find the LPF match (see Figure 4). A breadth-first search (BFS) in the Edmonds-Karp algorithm is replaced by a largest-unmatched-port first search (LPFS) described in Figure 5. LPFS enables the modified algorithm to search for a maximum weight match while performing path augmentation [19] to find a maximum size match. As a result, line 2 of the LPFS-Visit does not involve any magnitude comparison. It is proved in [14] that the modified algorithm finds an LPF match.

Figure 4: Modified Edmonds-Karp algorithm [2]. $G$ is a flow network or graph constructed as described in Figure 3. $E[G]$ is the set of all edges in $G$; $u$ or $v$ is a vertex in $G$ representing an input or output; $(u,v)$ is an edge from $u$ to $v$; $f$ is the total flow through the network; $f[u,v]$ denotes a flow from $u$ to $v$. $G_f$ is a residual network [2][19], also called a residual graph. LPFS is a largest unmatched port first search.

Modified Edmonds-Karp algorithm
1  for each edge (u,v) ∈ E[G]
2    do f[u,v] ← 0
3       f[v,u] ← 0
4  while LPFS finds a path p from s to t in the residual network G_f
5    for each edge (u,v) in p
6      do if f[v,u] = 0
7           then f[u,v] ← c[u,v]
8           else f[v,u] ← 0
Figure 5: A largest-unmatched-port first search (LPFS). First, LPFS builds a tree with $t$ as its root. Initially every input and output is colored white — undiscovered, then is grayed when it is discovered, and finally is blackened when it is finished. $\pi[v]$ is the predecessor of $v$. From the tree, an augmenting path from $s$ to $t$, which must go through an unmatched input, can be found by walking the predecessor list which begins at a selected unmatched input.

LPFS(G)
1  for each vertex u ∈ V[G]
2    do color[u] ← white
3       π[u] ← nil
4  LPFS-Visit(t)

LPFS-Visit(u)
1  color[u] ← gray
2  for each v ∈ Adjacent[u], starting from the largest to the smallest
3    do if color[v] = white
4         then π[v] ← u
5              LPFS-Visit(v)
6  color[u] ← black

Theorem 2: The maximum size match found by the modified Edmonds-Karp algorithm is also a maximum weight match with weights as defined in Equation 1.

Proof: see reference [14].
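The following Python sketch (our reconstruction of the idea, not the authors' implementation) shows an augmenting-path maximum size matching in which the search simply visits ports in decreasing occupancy order; with ports pre-sorted, the inner loop needs no magnitude comparisons, which is the point of the modification:

    def lpf_style_match(requests, R, C):
        # requests[i][j] is True if Q_ij is non-empty; R and C are the
        # input and output occupancies, used only to fix the visit order.
        M, N = len(R), len(C)
        owner = [None] * N                        # output j -> matched input
        outputs_by_occ = sorted(range(N), key=lambda j: -C[j])

        def augment(i, visited):
            for j in outputs_by_occ:              # largest output first
                if requests[i][j] and j not in visited:
                    visited.add(j)
                    if owner[j] is None or augment(owner[j], visited):
                        owner[j] = i
                        return True
            return False

        for i in sorted(range(M), key=lambda i: -R[i]):  # largest input first
            augment(i, set())
        return {j: i for j, i in enumerate(owner) if i is not None}

Because augmentation starts from the heaviest ports, ties among maximum size matches tend to be resolved in favor of high port occupancy; the exact tie-breaking rule of LPF is the LPFS order given above.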
3.2 A Practical Approximation to LPF

LPF can be adapted to run at higher speed using simple heuristic approximations. Shown in Figure 6 is an iterative algorithm called iLPF that approximates LPF. All weight processing is done in step 1 prior to the iterative steps. The second step consists of a double for-loop used to find a maximal size match. Since the requests have already been ordered in the first step, the maximal size matching in the second step does not need to compare request weights. Figure 7 shows the schematic of a hardware implementation of iLPF. Our exploratory design work suggests that the second step can be implemented using simple hardware; for a 32 × 32 switch, our synthesized design can make a scheduling decision in just 10ns using a commercial 0.25 µm CMOS ASIC technology. The first step, which requires simple integer arithmetic, can also run in 10ns, allowing the switch to run at a line rate of 20 Gb/s (calculated based on the size of an ATM cell).

Figure 6: An iterative LPF algorithm. First, the algorithm builds a sorted list of all inputs and outputs based on their occupancies. Then, starting from the largest output and input, the algorithm finds a maximal size match.

Iterative LPF algorithm
Step 1.
1  Sort inputs & outputs based on their occupancies
2  Reorder requests according to their input and output occupancies
Step 2. Maximal size matching
1  for each output, from largest to smallest
2    for each input, from largest to smallest
3      if (there is a request) and (both input and output unmatched)
4        then match them
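In software, the two steps of Figure 6 can be sketched as follows (an illustrative Python rendering of ours, not the synthesized hardware design):

    import numpy as np

    def ilpf(L: np.ndarray):
        # L[i, j] is the occupancy of VOQ Q_ij.
        R = L.sum(axis=1)                  # input occupancies
        C = L.sum(axis=0)                  # output occupancies
        inputs = np.argsort(-R)            # Step 1: sort ports by occupancy
        outputs = np.argsort(-C)
        input_free = np.ones(len(R), dtype=bool)
        match = {}                         # output j -> input i
        for j in outputs:                  # Step 2: double for-loop,
            for i in inputs:               # largest ports first
                if L[i, j] > 0 and input_free[i]:
                    match[int(j)] = int(i) # no weight comparisons needed here
                    input_free[i] = False
                    break
        return match

The result is a maximal (not necessarily maximum) size match, which is what the double for-loop in the hardware produces.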
3.3 Stability
We now prove that LPF can achieve 100% throughput for all traffic patterns with independent arrivals, using the notion of stability [8]. We define a switch to be stable for a particular arrival process if the expected length of the input queues does not grow without bound, i.e.,

$$E[L_{i,j}(n)] < \infty, \qquad \forall\, i, j, n. \qquad (2)$$
Definition 5: A switch can achieve 100% throughput if it is stable for all independent and admissible arrivals.

Theorem 3: The LPF algorithm is stable for all admissible independent arrival processes.

Proof: see Appendix B.
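Stability in the sense of Equation 2 can also be probed empirically. The toy simulation below (our own illustration, not part of the paper's proof) applies the greedy occupancy-ordered match to admissible uniform Bernoulli traffic and reports the backlog, which should stay bounded under a stable policy:

    import numpy as np

    rng = np.random.default_rng(0)
    N, slots, load = 4, 20000, 0.95        # load < 1, so traffic is admissible
    L = np.zeros((N, N), dtype=int)        # VOQ occupancies L_ij

    for _ in range(slots):
        L += rng.random((N, N)) < load / N # Bernoulli arrivals, uniform outputs
        R, C = L.sum(axis=1), L.sum(axis=0)
        free_in = set(range(N))
        for j in sorted(range(N), key=lambda j: -C[j]):
            for i in sorted(range(N), key=lambda i: -R[i]):
                if i in free_in and L[i, j] > 0:
                    L[i, j] -= 1           # serve one cell from Q_ij
                    free_in.remove(i)
                    break

    print("mean cells per VOQ after", slots, "slots:", L.sum() / N**2)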
3.4 Stability With a Finite Pipeline Delay
Because the modified maximum size matching algorithm requires the inputs and outputs to be pre-ordered, LPF and iLPF need sorting networks to sort all inputs and outputs. Due to the relatively high complexity of the sorting networks, they could dominate the running time of the algorithm. Alternatively, we can pipeline the design to reduce its running time; the sorting networks can operate in one slot, and the maximum size matching algorithm in the next. This means that the maximum size matching algorithm is operating on weights that are now one slot out of date — it is possible for the algorithm to favor the …

Figure 7: A block diagram of iLPF. Referring to the algorithm in Figure 6, inputs and outputs are pre-sorted by the two sorter networks. Raw requests (requests with weights removed) are given in matrix form. Request reordering is done by the two crossbars, which are configured by the sorting results. The maximal size matching block, which implements the double for-loop, finds a maximal size match that approximates an LPF match. The match needs to be permuted back to its natural order.

Citations
Journal ArticleDOI
Nick McKeown
TL;DR: This paper presents a scheduling algorithm called iSLIP, an iterative, round-robin algorithm that can achieve 100% throughput for uniform traffic, yet is simple to implement in hardware, and describes the implementation complexity of the algorithm.
Abstract: An increasing number of high performance internetworking protocol routers, LAN and asynchronous transfer mode (ATM) switches use a switched backplane based on a crossbar switch. Most often, these systems use input queues to hold packets waiting to traverse the switching fabric. It is well known that if simple first in first out (FIFO) input queues are used to hold packets then, even under benign conditions, head-of-line (HOL) blocking limits the achievable bandwidth to approximately 58.6% of the maximum. HOL blocking can be overcome by the use of virtual output queueing, which is described in this paper. A scheduling algorithm is used to configure the crossbar switch, deciding the order in which packets will be served. Previous results have shown that with a suitable scheduling algorithm, 100% throughput can be achieved. In this paper, we present a scheduling algorithm called iSLIP. An iterative, round-robin algorithm, iSLIP can achieve 100% throughput for uniform traffic, yet is simple to implement in hardware. Iterative and noniterative versions of the algorithms are presented, along with modified versions for prioritized traffic. Simulation results are presented to indicate the performance of iSLIP under benign and bursty traffic conditions. Prototype and commercial implementations of iSLIP exist in systems with aggregate bandwidths ranging from 50 to 500 Gb/s. When the traffic is nonuniform, iSLIP quickly adapts to a fair scheduling policy that is guaranteed never to starve an input queue. Finally, we describe the implementation complexity of iSLIP. Based on a two-dimensional (2-D) array of priority encoders, single-chip schedulers have been built supporting up to 32 ports, and making approximately 100 million scheduling decisions per second.

1,277 citations


Cites methods from "A practical scheduling algorithm to..."

  • ...For example, the algorithms described in [25] and [ 28 ] that achieve 100% throughput, use maximum weight bipartite matching algorithms [35], which have a running-time complexity of...


  • ...When VOQ’s are used, it has been shown possible to increase the throughput of an input-queued switch from 58.6% to 100% for both uniform and nonuniform traffic [25], [ 28 ]....


Journal ArticleDOI
TL;DR: The main objective of this sequel is to solve the out-of-sequence problem that occurs in the load balanced Birkhoff-von Neumann switch with one-stage buffering by adding a load-balancing buffer in front of the first stage and a resequencing-and-output buffer after the second stage.

328 citations

Journal ArticleDOI
TL;DR: A power-allocation policy is developed which stabilizes the system whenever the rate vector lies within the capacity region and provides a performance bound for the Choose-the-K-Largest-Connected-Queues policy.
Abstract: We consider power and server allocation in a multibeam satellite downlink which transmits data to N different ground locations over N time-varying channels. Packets destined for each ground location are stored in separate queues and the server rate for each queue, i, depends on the power, p/sub i/(t), allocated to that server and the channel state, c/sub i/(t), according to a concave rate-power curve /spl mu//sub i/(p/sub i/,c/sub i/). We establish the capacity region of all arrival rate vectors (/spl lambda//sub 1/,...,/spl lambda//sub N/) which admit a stabilizable system. We then develop a power-allocation policy which stabilizes the system whenever the rate vector lies within the capacity region. Such stability is guaranteed even if the channel model and the specific arrival rates are unknown. Furthermore, the algorithm is shown to be robust to arbitrary variations in the input rates and a bound on average delay is established. As a special case, this analysis verifies stability and provides a performance bound for the choose-the-K-largest-connected-queues policy when channels can be in one of two states (ON or OFF ) and K servers are allocated at every timestep (K

314 citations


Cites background from "A practical scheduling algorithm to..."

  • ...This policy is shown to maintain average queue occupancy within a fixed upper bound and is robust to arbitrary changes in the input rates....


Dissertation
01 Jan 2003
TL;DR: The notion of network layer capacity is developed and capacity achieving power allocation and routing algorithms for general networks with wireless links and adaptive transmission rates are described and a fundamental rate-delay tradeoff curve is established.
Abstract: Satellite and wireless networks operate over time varying channels that depend on attenuation conditions, power allocation decisions, and inter-channel interference. In order to reliably integrate these systems into a high speed data network and meet the increasing demand for high throughput and low delay, it is necessary to develop efficient network layer strategies that fully utilize the physical layer capabilities of each network element. In this thesis, we develop the notion of network layer capacity and describe capacity achieving power allocation and routing algorithms for general networks with wireless links and adaptive transmission rates. Fundamental issues of delay, throughput optimality, fairness, implementation complexity, and robustness to time varying channel conditions and changing user demands are discussed. Analysis is performed at the packet level and fully considers the queueing dynamics in systems with arbitrary, potentially bursty, arrival processes. Applications of this research are examined for the specific cases of satellite networks and ad-hoc wireless networks. Indeed, in Chapter 3 we consider a multi-beam satellite downlink and develop a dynamic power allocation algorithm that allocates power to each link in reaction to queue backlog and current channel conditions. The algorithm operates without knowledge of the arriving traffic or channel statistics, and is shown to achieve maximum throughput while maintaining average delay guarantees. At the end of Chapter 4, a crosslinked collection of such satellites is considered and a satellite separation principle is developed, demonstrating that joint optimal control can be implemented with separate algorithms for the downlinks and crosslinks. Ad-hoc wireless networks are given special attention in Chapter 6. A simple cell-partitioned model for a mobile ad-hoc network with N users is constructed, and exact expressions for capacity and delay are derived. End-to-end delay is shown to be O(N), and hence grows large as the size of the network is increased. To reduce delay, a transmission protocol which sends redundant packet information over multiple paths is developed and shown to provide O( N ) delay at the cost of reducing throughput. A fundamental rate-delay tradeoff curve is established, and the given protocols for achieving O(N) and O( N ) delay are shown to operate on distinct boundary points of this curve. In Chapters 4 and 5 we consider optimal control for a general time-varying network. A cross-layer strategy is developed that stabilizes the network whenever possible, and makes fair decisions about which data to serve when inputs exceed capacity. The strategy is decoupled into separate algorithms for dynamic flow control, power allocation, and routing, and allows for each user to make greedy decisions independent of the actions of others. The combined strategy is shown to yield data rates that are arbitrarily close to the optimally fair operating point that is achieved when all network controllers are coordinated and have perfect knowledge of future events. The cost of approaching this fair operating point is an end-to-end delay increase for data that is served by the network. (Copies available exclusively from MIT Libraries, Rm. 14-0551, Cambridge, MA 02139-4307. Ph. 617-253-5668; Fax 617-253-1690.)

311 citations


Cites background or methods from "A practical scheduling algorithm to..."

  • ...Maximum weight metrics are also considered in the switching and scheduling literature [95] [97] [88] [132] [81] [62], and recently for multi-access uplink communication in [149] [84] and for a single server downlink with heavy traffic in [124]....


  • ...Such a technique has been recently used for establishing stability in an uplink with static channels in [149], [84], in a one-hop static network in [71], and in the switching literature [97] [95] [75] [88] [109]....


Patent
17 Oct 2005
TL;DR: In this article, the authors present methods and devices for implementing a Low Latency Ethernet (LLE) solution, referred to herein as a Data Center Ethernet (DCE) solution which simplifies the connectivity of data centers and provides a high bandwidth, low latency network for carrying Ethernet and storage traffic.
Abstract: The present invention provides methods and devices for implementing a Low Latency Ethernet (“LLE”) solution, also referred to herein as a Data Center Ethernet (“DCE”) solution, which simplifies the connectivity of data centers and provides a high bandwidth, low latency network for carrying Ethernet and storage traffic. Some aspects of the invention involve transforming FC frames into a format suitable for transport on an Ethernet. Some preferred implementations of the invention implement multiple virtual lanes (“VLs”) in a single physical connection of a data center or similar network. Some VLs are “drop” VLs, with Ethernet-like behavior, and others are “no-drop” lanes with FC-like behavior. Some preferred implementations of the invention provide guaranteed bandwidth based on credits and VL. Active buffer management allows for both high reliability and low latency while using small frame buffers. Preferably, the rules for active buffer management are different for drop and no drop VLs.

260 citations

References
Book
01 Jan 1990
TL;DR: The updated new edition of the classic Introduction to Algorithms is intended primarily for use in undergraduate or graduate courses in algorithms or data structures and presents a rich variety of algorithms and covers them in considerable depth while making their design and analysis accessible to all levels of readers.
Abstract: From the Publisher: The updated new edition of the classic Introduction to Algorithms is intended primarily for use in undergraduate or graduate courses in algorithms or data structures. Like the first edition,this text can also be used for self-study by technical professionals since it discusses engineering issues in algorithm design as well as the mathematical aspects. In its new edition,Introduction to Algorithms continues to provide a comprehensive introduction to the modern study of algorithms. The revision has been updated to reflect changes in the years since the book's original publication. New chapters on the role of algorithms in computing and on probabilistic analysis and randomized algorithms have been included. Sections throughout the book have been rewritten for increased clarity,and material has been added wherever a fuller explanation has seemed useful or new information warrants expanded coverage. As in the classic first edition,this new edition of Introduction to Algorithms presents a rich variety of algorithms and covers them in considerable depth while making their design and analysis accessible to all levels of readers. Further,the algorithms are presented in pseudocode to make the book easily accessible to students from all programming language backgrounds. Each chapter presents an algorithm,a design technique,an application area,or a related topic. The chapters are not dependent on one another,so the instructor can organize his or her use of the book in the way that best suits the course's needs. Additionally,the new edition offers a 25% increase over the first edition in the number of problems,giving the book 155 problems and over 900 exercises thatreinforcethe concepts the students are learning.

21,651 citations

Journal ArticleDOI
TL;DR: In this article, a language similar to logo is used to draw geometric pictures using this language and programs are developed to draw geometrical pictures using it, which is similar to the one we use in this paper.
Abstract: The primary purpose of a programming language is to assist the programmer in the practice of her art. Each language is either designed for a class of problems or supports a different style of programming. In other words, a programming language turns the computer into a ‘virtual machine’ whose features and capabilities are unlimited. In this article, we illustrate these aspects through a language similar to logo. Programs are developed to draw geometric pictures using this language.

5,749 citations

Journal ArticleDOI
Abhay Parekh, Robert G. Gallager
TL;DR: Worst-case bounds on delay and backlog are derived for leaky bucket constrained sessions in arbitrary topology networks of generalized processor sharing (GPS) servers and the effectiveness of PGPS in guaranteeing worst-case session delay is demonstrated under certain assignments.
Abstract: Worst-case bounds on delay and backlog are derived for leaky bucket constrained sessions in arbitrary topology networks of generalized processor sharing (GPS) servers. The inherent flexibility of the service discipline is exploited to analyze broad classes of networks. When only a subset of the sessions are leaky bucket constrained, we give succinct per-session bounds that are independent of the behavior of the other sessions and also of the network topology. However, these bounds are only shown to hold for each session that is guaranteed a backlog clearing rate that exceeds the token arrival rate of its leaky bucket. A much broader class of networks, called consistent relative session treatment (CRST) networks is analyzed for the case in which all of the sessions are leaky bucket constrained. First, an algorithm is presented that characterizes the internal traffic in terms of average rate and burstiness, and it is shown that all CRST networks are stable. Next, a method is presented that yields bounds on session delay and backlog given this internal traffic characterization. The links of a route are treated collectively, yielding tighter bounds than those that result from adding the worst-case delays (backlogs) at each of the links in the route. The bounds on delay and backlog for each session are efficiently computed from a universal service curve, and it is shown that these bounds are achieved by "staggered" greedy regimes when an independent sessions relaxation holds. Propagation delay is also incorporated into the model. Finally, the analysis of arbitrary topology GPS networks is related to Packet GPS networks (PGPS). The PGPS scheme was first proposed by Demers, Shenker and Keshav (1991) under the name of weighted fair queueing. For small packet sizes, the behavior of the two schemes is seen to be virtually identical, and the effectiveness of PGPS in guaranteeing worst-case session delay is demonstrated under certain assignments. >

3,967 citations


"A practical scheduling algorithm to..." refers background in this paper

  • ...Furthermore, the system is able to control packet departure times and hence provides guaranteed qualities-of-service (QoS) [3][ 15 ][20][21]....


Journal ArticleDOI
TL;DR: This paper shows how to construct a maximum matching in a bipartite graph with n vertices and m edges in a number of computation steps proportional to $(m + n)\sqrt n $.
Abstract: The present paper shows how to construct a maximum matching in a bipartite graph with n vertices and m edges in a number of computation steps proportional to $(m + n)\sqrt n $.

2,785 citations

Journal ArticleDOI
TL;DR: In this article, a fair gateway queueing algorithm based on an earlier suggestion by Nagle is proposed to control congestion in datagram networks, based on the idea of fair queueing.
Abstract: We discuss gateway queueing algorithms and their role in controlling congestion in datagram networks. A fair queueing algorithm, based on an earlier suggestion by Nagle, is proposed. Analysis and s...

2,639 citations