Matching Output Queueing with a Combined Input Output Queued Switch¹
Shang-Tse Chuang
Ashish Goel
Nick McKeown
Balaji Prabhakar
Stanford University
Abstract — The Internet is facing two problems simultaneously: there is a need for a
faster switching/routing infrastructure, and a need to introduce guaranteed qualities of
service (QoS). Each problem can be solved independently: switches and routers can be
made faster by using input-queued crossbars, instead of shared memory systems; and QoS
can be provided using WFQ-based packet scheduling. However, until now, the two
solutions have been mutually exclusive — all of the work on WFQ-based scheduling
algorithms has required that switches/routers use output-queueing, or centralized shared
memory. This paper demonstrates that a Combined Input Output Queueing (CIOQ) switch
running twice as fast as an input-queued switch can provide precise emulation of a broad
class of packet scheduling algorithms, including WFQ and strict priorities. More
precisely, we show that for an N × N switch, a “speedup” of 2 - 1/N is necessary and a
speedup of two is sufficient for this exact emulation. Perhaps most interestingly, this result
holds for all traffic arrival patterns. On its own, the result is primarily a theoretical
observation; it shows that it is possible to emulate purely OQ switches with CIOQ
switches running at approximately twice the line-rate. To make the result more practical,
we introduce several scheduling algorithms that, with a speedup of two, can emulate an
OQ switch. We focus our attention on the simplest of these algorithms, Critical Cells First
(CCF), and consider its running-time and implementation complexity. We conclude that
additional techniques are required to make the scheduling algorithms implementable at
high speed, and propose two specific strategies.
1 Introduction
Many commercial switches and routers today employ output-queueing.² When a packet
arrives at an output-queued (OQ) switch, it is immediately placed in a queue that is dedicated to its
outgoing line, where it waits until departing from the switch. This approach is known to maximize
the throughput of the switch: so long as no input or output is oversubscribed, the switch is able to
support the traffic and the occupancies of queues remain bounded. Furthermore, by carefully
scheduling the time a packet is placed onto the outgoing line, a switch or router can control the
packet’s latency, and hence provide quality-of-service (QoS) guarantees. But output queueing is
1. This paper was presented at Infocom ‘99, New York, USA.
2. When we refer to output-queueing in this paper, we include designs that employ centralized shared memory.
impractical for switches with high line rates and/or with a large number of ports, since the fabric
and memory of an N × N switch must run N times as fast as the line rate. Unfortunately, at high
line rates, memories with sufficient bandwidth are simply not available.
On the other hand, the fabric and the memory of an input queued (IQ) switch need only run as
fast as the line rate. This makes input queueing very appealing for switches with fast line rates, or
with a large number of ports. For this reason, the highest performance switches and routers use
input-queued crossbar switches [3][4]. But IQ switches can suffer from head-of-line (HOL) block-
ing, which can have a severe effect on throughput. It is well-known that if each input maintains a
single FIFO, then HOL blocking can limit the throughput to just 58.6% [5].
One method that has been proposed to reduce HOL blocking is to increase the “speedup” of a
switch. A switch with a speedup of S can remove up to S packets from each input and deliver up
to S packets to each output within a time slot, where a time slot is the time between packet arrivals
at input ports. Hence, an OQ switch has a speedup of N while an IQ switch has a speedup of one.
For values of S between 1 and N, packets need to be buffered at the inputs before switching as
well as at the outputs after switching. We call this architecture a combined input and output queued
(CIOQ) switch.
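The role of speedup within a time slot can be sketched in code. The phase structure below, the function names, and the naive head-of-line scheduler are illustrative assumptions of ours, not part of the paper (which develops far more careful scheduling algorithms later):

```python
# Illustrative sketch of one time slot in a CIOQ switch with speedup S.
# A cell is a (destination, id) pair. The naive head-of-line scheduler
# here is an assumption for illustration only.

def head_of_line_matching(inputs):
    """Match each input's head cell to its destination, at most one input
    per output and one output per input."""
    matching, used = {}, set()
    for i, queue in enumerate(inputs):
        if queue and queue[0][0] not in used:
            matching[i] = queue[0][0]
            used.add(queue[0][0])
    return matching

def one_time_slot(inputs, outputs, S):
    """Run S scheduling phases, then one departure per output line.
    (The arrival phase is omitted for brevity.)"""
    for _ in range(S):
        for i, j in head_of_line_matching(inputs).items():
            # transfer the first cell at input i destined to output j
            for k, cell in enumerate(inputs[i]):
                if cell[0] == j:
                    outputs[j].append(inputs[i].pop(k))
                    break
    return [q.pop(0) for q in outputs if q]  # cells departing this slot
```

With S = 1 this degenerates to a FIFO input-queued switch (and so exhibits HOL blocking); larger S lets more cells cross the crossbar per slot at the cost of a faster fabric.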
Both analytical and simulation studies of a CIOQ switch that maintains a single FIFO at
each input have been conducted for various values of speedup [6][7][8][9]. A common conclusion
of these studies is that with S = 4 or 5 one can achieve about 99% throughput when arrivals are
independent and identically distributed at each input, and the distribution of packet destinations is
uniform across the outputs. While these studies consider average delay (and simplistic input
traffic patterns), they make no guarantees about the delay of individual packets. This is particularly
important if a switch or router is to offer QoS guarantees.
We believe that a well-designed network switch should perform predictably in the face of all
types of arrival process¹ and allow the delay of individual packets to be controlled. Hence our
approach is quite different: rather than find values of speedup that work well on average, or with
simplistic and unrealistic traffic models, we find the minimum speedup such that a CIOQ switch
behaves identically to an OQ switch for all types of traffic. (Here, “behave identically” means that
when the same inputs are applied to both the OQ switch and to the CIOQ switch, the correspond-
ing output processes from the two switches are completely indistinguishable.) This approach was
first formulated in the recent work of Prabhakar and McKeown [12]. They show that a CIOQ
switch with a speedup of four can behave identically to a FIFO OQ switch for arbitrary input traf-
fic patterns and switch sizes. In this sense, this paper builds upon and extends the results in [12], as
described in the next paragraph. A number of researchers have recently considered various aspects
of the speedup problem, most notably [18] which obtains packet delay bounds and [19] which
finds sufficient conditions for maximizing throughput through work conservation and mimicking
of output queueing.²
In this paper, we show that a CIOQ switch with a speedup of two can behave identically to an
OQ switch. The result holds for switches with an arbitrary number of ports, and for any traffic
1. The need for a switch that can deliver a certain grade of service, irrespective of the applied traffic is par-
ticularly important given the number of recent studies that show how little we understand network traffic
processes [11]. Indeed, a sobering conclusion of these studies is that it is not yet possible to accurately
model or simulate a trace of actual network traffic. Furthermore, new applications, protocols or data-cod-
ing mechanisms may bring new traffic types in future years.
2. [20] aimed to extend the results of [12], but the algorithms and proofs presented there are incorrect and do
not solve the speedup problem. See http://www.cs.cmu.edu/~istoica/IWQoS98-fix.html for a discussion
of the errors.
arrival pattern. It is also found to be true for a broad class of widely used output link scheduling
algorithms such as weighted fair queueing, strict priorities, and FIFO. We introduce some specific
scheduling algorithms that achieve this result. We also show more generally that a speedup of
2 - 1/N is both necessary and sufficient for a CIOQ switch to behave identically to a FIFO OQ
switch.
It is worth briefly considering the implications of this result. It demonstrates that it is possible
to emulate an OQ switch using buffer memory operating at only twice the speed of the
external line. Previously, an OQ switch could only be implemented with memories operating at N
times the speed of the external line. However, the advantages do not come for free. In essence, the
memory bandwidth is reduced at the expense of a fast cell scheduling algorithm that is required to
configure the crossbar. As we shall see, the scheduling algorithms are complex, the best known-to-
date having a running-time complexity of N. (We discuss the implementation complexity in some
detail in Section 5). This means that it is not yet practicable to emulate fast OQ switches with a
large number of ports. While we propose some strategies in this paper, this is a topic for further
research.
1.1 Background
Consider the single stage, N × N switch shown in Figure 1. Throughout the paper we assume
that packets begin to arrive at the switch from time t = 1, the switch having been empty before
that time. Although packets arriving to the switch or router may have variable length, we will
assume that they are treated internally as fixed length “cells”. This is common practice in high per-
formance LAN switches and routers; variable length packets are segmented into cells as they
arrive, carried across the switch as cells, and reassembled back into packets again before they
depart [4][3]. We take the arrival time between cells as the basic time unit and refer to it as a time
slot. The switch is said to have a speedup of S, for S ∈ {1, 2, …, N}, if it can remove up to S
cells from each input and transfer at most S cells to each output in a time slot. A speedup of S
requires the switch fabric to run S times as fast as the input or output line rate. For 1 < S < N,
buffering is required both at the inputs and at the outputs, and leads to a combined input and output
queued (CIOQ) architecture. The following is the problem we wish to solve.
The speedup problem: Determine the smallest value of S and an appropriate cell scheduling
algorithm that
1. allows a CIOQ switch to exactly mimic the performance of an output-queued switch (in a
sense that will be made precise),
2. achieves this for any arbitrary input traffic pattern,
3. is independent of switch size.
In an OQ switch, arriving cells are immediately forwarded to their corresponding outputs.
This (a) ensures that the switch is work-conserving, i.e. an output never idles so long as there is a
cell destined for it in the system, and (b) allows the departure of cells to be scheduled to meet
latency constraints.¹ We will require that any solution of the speedup problem possess these two
desirable features; that is, a CIOQ switch must behave identically to an OQ switch in the following
sense:
1. For ease of exposition, we will at times assume that the output uses a FIFO queueing discipline, i.e. cells depart from
the output in the same order that they arrived to the inputs of the switch. However, we are interested in a broader class
of queueing disciplines: ones that allow cells to depart in time to meet particular bandwidth and delay guarantees.
Identical Behavior: A CIOQ switch is said to behave identically to an OQ switch if, under iden-
tical inputs, the departure time of every cell from both switches is identical.
Figure 1: A General Combined Input and Output Queued (CIOQ) switch.
As a benchmark with which to compare our CIOQ switch, we will assume there exists a
shadow OQ switch that is fed the same input traffic pattern as the CIOQ switch. Our goal is
to arrange for each cell to depart from the CIOQ switch at exactly the same time as its counterpart
cell departs from the OQ switch. In the CIOQ switch, the sequence in which cells are transferred
from their input queues to the output queue is determined by a scheduling algorithm. In each time
slot, the scheduling algorithm matches each non-empty input with at most one output and, con-
versely, each output is matched with at most one input. The matching is used to configure the
crossbar fabric before cells are transferred from the input side to the output side. A CIOQ switch
with a speedup of S is able to make S such transfers during each time slot.
Selecting the appropriate scheduling algorithm is the key to achieving identical behavior
between the CIOQ switch and its shadow OQ switch.
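The matching constraint just described (each non-empty input paired with at most one output, and each output with at most one input) can be checked mechanically. The helper below is a hypothetical illustration under our own naming, not code from the paper:

```python
# Illustrative check that a crossbar configuration is a valid matching:
# each input and each output may appear at most once.

def is_valid_matching(matching):
    """matching: dict mapping input index -> output index.
    Dict keys already guarantee each input appears at most once,
    so it remains to check that no output is claimed twice."""
    outputs = list(matching.values())
    return len(outputs) == len(set(outputs))
```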
1.2 Push-in Queues
Throughout this paper, we will make repeated use of what we will call a push-in queue. Simi-
lar to a discrete-event queue, a push-in queue is one in which an arriving cell is inserted at an arbi-
trary location in the queue based on some criterion. For example, each cell may carry with it a
departure time, and is placed in the queue ahead of all cells with a later departure time, yet behind
cells with an earlier departure time. The only property that defines a push-in queue is that once
placed in the queue, cells may not switch places with other cells. In other words, their relative
ordering remains unchanged. In general, we distinguish two types of push-in queues: (1) “Push-In
First-Out” (PIFO) queues, in which arriving cells are placed at an arbitrary location, and the cell at
the head of the queue is always the next to depart. PIFO queues are quite general — for example, a
WFQ scheduling discipline operating at an output queued switch is a special case of a PIFO queue.
(2) “Push-In Arbitrary-Out” (PIAO) queues, in which cells are removed from the queue in an arbi-
trary order, i.e. it is not necessarily the case that the next cell to depart is the one currently at the
head of the queue.
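A PIFO queue can be sketched directly from this definition. The class below is a minimal illustration under our own naming, keyed here by a cell's departure time; it is not an implementation proposed by the paper:

```python
# Minimal "Push-In First-Out" (PIFO) queue sketch. An arriving cell is
# inserted ahead of all cells with a later departure time and behind
# those with an earlier (or equal) one. Once queued, cells never swap
# places, and the head is always the next to depart.

class PIFOQueue:
    def __init__(self):
        self._cells = []  # (cell, departure_time), increasing departure time

    def push(self, cell, departure_time):
        # find the first queued cell with a strictly later departure time
        i = 0
        while i < len(self._cells) and self._cells[i][1] <= departure_time:
            i += 1
        self._cells.insert(i, (cell, departure_time))

    def pop(self):
        return self._cells.pop(0)[0]  # the head always departs first
```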
It is assumed that each input of the CIOQ switch maintains a queue, which can be thought of
as an ordered set of cells waiting at the input port. In general, the CIOQ switches that we consider
can all be described using PIAO input queues.¹ Many orderings of the cells are possible, each
ordering leading to a different switch scheduling algorithm, as we shall see.
Each output maintains a queue for the cells waiting to depart from the switch. In addition, each
output also maintains an output priority list: an ordered list of cells at the inputs waiting to be
transferred to this particular output. The output priority list is drawn in the order in which the cells
would depart from the OQ switch we wish to emulate (i.e. the shadow OQ switch). This priority
list will depend on the queueing policy followed by the OQ switch (FIFO, WFQ, strict priorities
etc.).
1.3 Definitions
The following definitions are crucial to the rest of the paper.
Definition 1: Time to Leave — The “time to leave” for cell c, TL(c), is the time slot at which c
will leave the shadow OQ switch. Note that it is possible for TL(c) to increase. This happens if
new cells arrive to the switch, destined for c’s output, and have a higher priority than c. (Of
course, TL(c) is also the time slot in which c must leave from our CIOQ switch for the identical
behavior to be achieved.)
Definition 2: Output Cushion — At any time, the “output cushion of a cell c”, OC(c), is the
number of cells waiting in the output buffer at cell c’s output port with a smaller time to leave
value than cell c.
Notice that if a cell is still on the input side and has a small (or zero) output cushion, the sched-
uling algorithm must urgently deliver the cell to its output so that it may depart on time. Since the
switch is work-conserving, a cell’s output cushion decreases by one during every time slot, and can
only be increased by newly arriving cells that are destined to the same output and have a more
urgent time to leave.
Definition 3: Input Thread — At any time, the “input thread of cell c”, IT(c), is the number of
cells ahead of cell c in its input priority list.
In other words, IT(c) represents the number of cells currently at the input that need to be trans-
ferred to their outputs more urgently than cell c. A cell’s input thread is decremented only when a
cell ahead of it is transferred from the input, and is possibly incremented by newly arriving cells.
Notice that it would be undesirable for a cell to simultaneously have a large input thread and a
small output cushion — the cells ahead of it at the input may prevent it from reaching its output
before its time to leave. This motivates our definition of slackness.
Definition 4: Slackness — At any time, the “slackness of cell c”, L(c), equals the output cushion
of cell c minus its input thread, i.e. L(c) = OC(c) − IT(c).
Slackness is a measure of how large a cell’s output cushion is with respect to its input thread. If
a cell’s slackness is small, then it urgently needs to be transferred to its output. Conversely, if a cell
has a large slackness, then it may languish at the input without fear of missing its time to leave.¹
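Putting Definitions 2 through 4 together, slackness can be computed from a snapshot of the queues. The function names and snapshot representation below are illustrative assumptions of ours:

```python
# Illustrative computation of L(c) = OC(c) - IT(c) from a snapshot.
# output_tls: time-to-leave values of cells already at c's output queue.
# cells_ahead: number of cells ahead of c in its input priority list.

def output_cushion(cell_tl, output_tls):
    # OC(c): cells at the output that will leave before c
    return sum(1 for tl in output_tls if tl < cell_tl)

def slackness(cell_tl, output_tls, cells_ahead):
    # L(c) = OC(c) - IT(c)
    return output_cushion(cell_tl, output_tls) - cells_ahead
```

In this sketch, a small slackness flags a cell that urgently needs to cross to its output, matching the intuition in the text.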
Figure 2: A snapshot of a CIOQ switch
1. In practice, we need not necessarily use a PIAO queue to implement these techniques. But we will use the PIAO
queue as a general way of describing the input queueing mechanism.
1. Note that a cell’s input thread and slackness are only defined when the cell is waiting at the input side of
the switch.