A three-stage ATM switch with cell-level path allocation

doi:10.1109/26.592612

Martin Collier

01 Jun 1997 - IEEE Transactions on Communications (Institute of Electrical and Electronics Engineers) - Vol. 45, Iss. 6, pp. 701-709

TL;DR: A method is described for performing routing in three-stage asynchronous transfer mode (ATM) switches which feature multiple channels between the switch modules in adjacent stages, which allows cell-level routing to be performed, whereby routes are updated in each time slot.

Abstract: A method is described for performing routing in three-stage asynchronous transfer mode (ATM) switches which feature multiple channels between the switch modules in adjacent stages. The method is suited to hardware implementation using parallelism to achieve a very short execution time. This allows cell-level routing to be performed, whereby routes are updated in each time slot. The algorithm allows a contention-free routing to be performed, so that buffering is not required in the intermediate stage. An algorithm with this property, which preserves the cell sequence, is referred to as a path allocation algorithm. A detailed description of the necessary hardware is presented. This hardware uses a novel circuit to count the number of cells requesting each output module, it allocates a path through the intermediate stage of the switch to each cell, and it generates a routing tag for each cell, indicating the path assigned to it. The method of routing tag assignment described employs a nonblocking copy network. The use of highly parallel hardware reduces the clock rate required of the circuitry, for a given switch size. The performance of ATM switches using this path allocation algorithm has been evaluated by simulation, and is described.

I. INTRODUCTION

The throughput achievable (in bits/second) in an asynchronous transfer mode (ATM) switch depends heavily on the process used to fabricate it.
Some method of routing is then necessary, to select among the available paths from source to destination, through the second stage of the switch.
In one approach (call-level routing), all cells belonging to a virtual connection ("call") are allocated the same route.
The algorithm described here requires fewer iterations than that in [6], does not require input buffering (which degrades the throughput), unlike [7], and is fairer than that presented in [5], in addition to readily supporting intermediate channel grouping.

A. The Objectives of a Path Allocation Algorithm

There are $S_1$ routes from each input module to each intermediate module.
There are $S_2$ routes from each intermediate module to each output module.
The authors must choose, for every input cell (if possible), an intermediate switch module through which to pass on the way to the selected destination, such that no input module attempts to route more than $S_1$ cells via any intermediate module, and no intermediate module attempts to route more than $S_2$ cells to any output module, in any one time slot.
It will be assumed, for simplicity, that all input ports of the switch operate at the same rate, and thus that the duration of the time slot (the interval between successive cell boundaries) is the same for every cell.

B. Basic Principles of the Path Allocation Algorithm

A new and efficient algorithm will now be described.
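Note that $A_{ir}$ (the number of channels available from input module $i$ to intermediate switch module $r$) and $K_{ij}$ (the number of requests from input module $i$ for output module $j$) need only be local to the input module.
The procedure determines the capacity available from input module $i$ to output module $j$ via intermediate switch module $r$ (i.e., the minimum of $A_{ir}$ and $B_{rj}$, where $B_{rj}$ is the number of channels available from intermediate switch module $r$ to output module $j$).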
The number of requests which can be satisfied is equal to the minimum of the number of requests outstanding and the available capacity.
A parallel implementation requires multiple processors, each executing the procedure for a different set of procedure parameters, subject to the following constraints: no two processors shall simultaneously require access to the same quantity.

C. Implementation of the Algorithm

Suppose that there are $m$ modules in each stage of the switch.
The processor in the th row (numbered from the right) and th column (numbered from the bottom) of the array is labeled .
The values stored in the processor array are shown in Fig. 2(a) for the case where $m = 4$.
The algorithm then requires $m$ iterations (iterations zero through $m - 1$).
Specifically, processor is initialized as follows: otherwise.

An examination of the operation of the resulting algorithm reveals that the processors in row or higher and in column or above never modify the and values they receive, and thus may be replaced by simple delays.
If , each column requires additional registers.
Hence a relatively high clock speed will be required in the array, so as to complete iterations of the algorithm in the time available (which is less than the duration of one time slot).
A switch with intermediate channel grouping affords the possibility of reducing cell loss probability by increasing $S_1$ and $S_2$, rather than by increasing $m$. Thus, the proposed algorithm is fairer than that described in [5].

D. Implementation Issues

The processor must execute the procedure, and thus must perform two types of operation: 1) find the minimum of three numbers; 2) perform three subtractions.
The and values are obtained from (and forwarded to) adjacent processors.
A fast implementation using bit-serial arithmetic, and which does not require the calculation of the minimum of three numbers, was described in [10] .
The input and output port controllers must perform the necessary bit rate adaptation (and multiplexing/demultiplexing) for links operating at other rates, so that cells traverse the switch fabric at a common rate.
This requires the path allocation algorithm to preferentially allocate paths to cells with the CLP bit set to zero.

III. A FAST METHOD OF REQUEST COUNTING

Suitable hardware to simultaneously calculate $K_{ij}$ (the number of requests from input module $i$ for output module $j$) for all values of $j$ will now be described.
The execution time for this hardware is clock cycles.
Under these circumstances, it may readily be shown that where is the number of data cells requesting output module , and is fixed, since the Batcher network processes only requests from input module .
A total of control packets is thus simultaneously launched into the concentrator, and these are routed to the serial adders at outputs zero through without blocking.
The concentrated list of values is then read by these serial adders, the lower input (as shown in Fig. 5 ) being inverted.

A. Principles of Operation

The processor generates a sequence of values, one after every iteration of the path allocation algorithm, commencing with (the initial value of determined by the request counting hardware) and decrementing, after every iteration, in accordance with the procedure, as paths are allocated to cells.
Thus represents the number of outstanding requests from input module for output module .
When the path allocation process is complete, a special null token is broadcast to the cells which have lost contention.
During each iteration of the algorithm, submits a routing packet to the network, to be broadcast to address generators through containing in the data field the token address, i.e., the address of the intermediate switch module through which a route has been allocated.
Two bits (one each from the upper and lower address), in addition to the activity bit, must be processed at each node of the network.

TABLE I PATTERN OF REQUESTS AND POSSIBLE OUTCOME OF PATH ALLOCATION PROCESS

Changes after the first iteration of the algorithm [16] .
Hence, on subsequent iterations of the algorithm, there is no need to distribute the lower address, so that the header on the routing packet may be shortened, reducing the delay through the copy network.

B. An Example of Routing Tag Assignment

Table I indicates the number of cells from input module 0 which have requested each of the four output modules and a possible pattern of path allocations which might be generated by the processors.
The copy network must be initialized before path allocation commences.
After each iteration of the path allocation algorithm (i.e., iterations 0, 1, 2 and 3), the corresponding iteration of the routing tag assignment algorithm is performed (iterations 0+, 1+, 2+ and 3+, respectively).
Also shown are the lower address bits processed by each switch element.
The token address is not broadcast, except during the first iteration.

V. PERFORMANCE OF THE PATH ALLOCATION ALGORITHM

The performance of a three-stage switch using the cell-level path allocation algorithm described above will now be evaluated.
The simulation model is based on the following assumptions.
3) The destination of each cell is drawn from a uniform distribution; all output modules receive the same load.
The probability of an individual cell being lost is obviously much less, but cannot be evaluated without knowing how the probability of a given cell losing contention, and the corresponding probabilities for the cells with which it contends, are correlated.
These graphs can be used to find the maximum number of input ports which a switch with a given capacity in the intermediate stage can support, for a given probability of cell loss during path allocation.

VI. A DESIGN EXAMPLE

The resulting switch has a cell loss probability (due to loss of contention during path allocation) below 10 even in the presence of a nonuniform load [14] .
The input modules must accept data from the address generators in Fig. 4 , and so must have 128 inputs, even though at most 96 data cells will be present.
One execution of the procedure will require nine clock cycles, using the efficient implementation described in [10] .
The number of processors required is 1024 (32 × 32), but the IC count should be relatively low because of the simplicity of the processor design.
The complexity of the path allocation circuitry is relatively high, but the switch modules in the first and second stages are of simple design, because of the avoidance of output contention.

VII. CONCLUSIONS

A new algorithm for path allocation in three-stage broadband networks has been described.
A complete hardware implementation of this algorithm has been presented, including a method for generating the initial data required by the algorithm, and for forwarding the results to each cell at the input side of the switch, in the form of a routing tag.
The operating speed required of the design appears within the capabilities of VLSI technology in the short term.
The resulting switch offers the delay performance of an output-buffered switch, unlike either three-stage switches featuring call-level routing, which buffer the cells at each stage, or those featuring input buffers.
It avoids the fairness problem intrinsic to the "cell scheduling" algorithm of the Growable Packet Switch [5] .

Figures (8)

Fig. 6. An example of routing tag assignment. (a) Initialization (i.e., iteration 0). (b) Iteration 0+. (c) Iteration 1+. (d) Iteration 2+. (e) Iteration 3+. The type of token being broadcast is shown on the input and output sides of the copy network. The type of packet receiving the token, and the value of the last token received, are shown at the network outputs (clearly only data cells receive tokens, as required). RPG: Routing packet generator; ISM: Intermediate stage module.

Fig. 1. A three-stage switch with intermediate channel grouping.

TABLE I PATTERN OF REQUESTS AND POSSIBLE OUTCOME OF THE PATH ALLOCATION PROCESS

Fig. 7. Performance of the switch. (a) Performance with uniform traffic (L1 = m = L2; S1 = S2 = 4). (b) Performance with uniform traffic (L1 = m = L2; S1 = 8; S2 = 4). (c) Performance with uniform traffic (L1 = m = L2; S1 = 4; S2 = 8).

Fig. 3. Implementation of the atomic( ) processor. Min: Calculator of minimum; Dx: Delay (needed to synchronise arrival times; may be zero).

TABLE II VALUES OF K REGISTERS, AND ADDRESSES TO WHICH TOKENS ARE BROADCAST, IN EACH ITERATION, FOR THE EXAMPLE OF TABLE I

Fig. 4. The circuitry for request counting and routing tag assignment. CG: Count generator; RPG: Routing packet generator; AG: Address generator.

Fig. 5. An example of request counting. CG: Count generator.

IEEE TRANSACTIONS ON COMMUNICATIONS, VOL. 45, NO. 6, JUNE 1997 701

A Three-Stage ATM Switch with Cell-Level Path Allocation

Martin Collier, Member, IEEE

Abstract: A method is described for performing routing in three-stage asynchronous transfer mode (ATM) switches which feature multiple channels between the switch modules in adjacent stages. The method is suited to hardware implementation using parallelism to achieve a very short execution time. This allows cell-level routing to be performed, whereby routes are updated in each time slot. The algorithm allows a contention-free routing to be performed, so that buffering is not required in the intermediate stage. An algorithm with this property, which preserves the cell sequence, is referred to here as a path allocation algorithm. A detailed description of the necessary hardware is presented. This hardware uses a novel circuit to count the number of cells requesting each output module, it allocates a path through the intermediate stage of the switch to each cell, and it generates a routing tag for each cell, indicating the path assigned to it. The method of routing tag assignment described employs a nonblocking copy network. The use of highly parallel hardware reduces the clock rate required of the circuitry, for a given switch size. The performance of ATM switches using this path allocation algorithm has been evaluated by simulation, and is described here.

Index Terms: Asynchronous transfer mode, communication switching, communication system routing.

I. INTRODUCTION

THE THROUGHPUT achievable (in bits/second) in an asynchronous transfer mode (ATM) switch depends heavily on the process used to fabricate it. For example, Bianchini and Kim [1] have described a single-board switch prototype with 155-Mb/s link rate and a throughput of 2.48 Gb/s, constructed using "off-the-shelf" integrated circuits and PLD's. Collivignarelli et al. [2] have described a 16 × 16 switch chip with a 311-Mb/s link rate (and hence, with a throughput close to 5 Gb/s) fabricated using a 0.8-µm BiCMOS process, which dissipates 7 W. Merayo et al. [3] have reported a switch with a 10-Gb/s throughput and a 2.5-Gb/s link rate, using a 0.7-µm BiCMOS process and requiring approximately twenty chips. Hino et al. [4] have developed a 4 × 4 switching element (for a rerouting banyan network) with link rates of 10 Gb/s using a 0.2-µm GaAs MESFET technology. The power dissipated by this switch (some 30 W) necessitates its implementation on three integrated circuits.

It may be concluded, from the results reported above, which are typical of the current state of the art, that the tradeoffs to be performed between circuit complexity, power dissipation and process cost in designing ATM switches are such as to restrict single-chip and single-board switch fabrics to throughputs below perhaps 40 Gb/s for the foreseeable future, even when using leading-edge (and thus expensive) IC technologies. Hence, a large switch fabric (i.e., a switch with a throughput exceeding, say, 200 Gb/s) will require a modular architecture, allowing the switch fabric to be distributed across multiple boards or cabinets.

Paper approved by G. P. O'Reilly, the Editor for Communications Switching of the IEEE Communications Society. Manuscript received July 3, 1995; revised December 1, 1995.

The author is with the School of Electronic Engineering, Dublin City University, Glasnevin, Dublin 9, Ireland.

Publisher Item Identifier S 0090-6778(97)04172-X.

An obvious method of implementing a large switch, given these constraints, is to design the switch with three stages, where each stage consists of smaller switch modules. Many authors have proposed such switches [5]–[9]. This approach typically introduces a new problem (not present in a single-stage switch) whereby multiple paths from source to destination become available. Thus even if the individual switch modules possess the self-routing feature, this feature is not retained by the overall switch. Some method of routing is then necessary, to select among the available paths from source to destination, through the second stage of the switch.

Routing may be performed over a number of time scales. In one approach (call-level routing), all cells belonging to a virtual connection ("call") are allocated the same route. Thus the routing decision is made at connection setup time, and this route is fixed for the duration of the connection. Cell-level routing is performed if the routing decision is made independently in each time slot. The process of determining a routing pattern such that no blocking can occur in the second stage of the switch is referred to here as cell-level path allocation.

This paper considers cell-level path allocation, and, specifically, the problem of implementing a cell-level algorithm for path allocation in the channel-grouped three-stage network of Fig. 1. This is an $N \times N$ switch, with $L_1$, $m$ and $L_2$ modules in the input, intermediate and output stages, respectively. There are $S_1$ links in the channel group connecting input and intermediate stage modules, and $S_2$ links in the channel group connecting intermediate and output stage modules. The use of channel grouping allows additional flexibility when dimensioning the three-stage switch. Cell-level path allocation has been proposed by a number of authors [5]–[7]. The algorithm described here requires fewer iterations than that in [6], does not require input buffering (which degrades the throughput), unlike [7], and is fairer than that presented in [5], in addition to readily supporting intermediate channel grouping.

The path allocation algorithm and the hardware necessary to implement it are described in Section II of this paper. The algorithm requires ancillary hardware to count incoming cells and to deliver routing tags to them. Suitable hardware is described in Sections III and IV of this paper. The switch performance is discussed in Section V.

Fig. 1. A three-stage switch with intermediate channel grouping.

0090-6778/97$10.00 © 1997 IEEE

Authorized licensed use limited to: DUBLIN CITY UNIVERSITY. Downloaded on July 19,2010 at 08:54:07 UTC from IEEE Xplore. Restrictions apply.

II. AN ALGORITHM FOR PATH ALLOCATION AT CELL LEVEL

A. The Objectives of a Path Allocation Algorithm

There are $S_1$ routes from each input module to each intermediate module. There are $S_2$ routes from each intermediate module to each output module. We must choose, for every input cell (if possible), an intermediate switch module through which to pass on the way to the selected destination, such that no input module attempts to route more than $S_1$ cells via any intermediate module, and no intermediate module attempts to route more than $S_2$ cells to any output module, in any one time slot. This strategy ensures that:

1) the intermediate stage can never be congested;
2) no queueing occurs in the intermediate stage; thus the delay through the intermediate stage is uniform, regardless of the path taken; this makes it possible to preserve cell sequence on a virtual connection;
3) contention can never occur in the intermediate stage, simplifying its design.

An algorithm to implement this strategy will now be described. It will be assumed, for simplicity, that all input ports of the switch operate at the same rate, and thus that the duration of the time slot (the interval between successive cell boundaries) is the same for every cell.
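The two capacity constraints stated above can be expressed as a small check on a candidate routing pattern. The following is a minimal sketch, not the paper's hardware: the nested-list layout `R[i][r][j]` (cells from input module `i` to output module `j` via intermediate module `r`) and the function name are assumptions made for illustration.

```python
def is_contention_free(R, S1, S2):
    """Return True if routing pattern R respects both channel-group limits.

    R[i][r][j] is the (hypothetical) number of cells routed from input
    module i to output module j via intermediate module r in one time slot.
    """
    n_in, n_mid, n_out = len(R), len(R[0]), len(R[0][0])
    # No input module may send more than S1 cells via any intermediate module.
    for i in range(n_in):
        for r in range(n_mid):
            if sum(R[i][r]) > S1:
                return False
    # No intermediate module may send more than S2 cells to any output module.
    for r in range(n_mid):
        for j in range(n_out):
            if sum(R[i][r][j] for i in range(n_in)) > S2:
                return False
    return True


# One input module, two intermediate modules, one output module, S1 = S2 = 1.
ok_pattern = [[[1], [1]]]    # one cell via each intermediate module
bad_pattern = [[[2], [0]]]   # two cells forced through the same channel group
```

With `S1 = S2 = 1`, the first pattern passes the check and the second fails it, since two cells would share a single-channel group.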

B. Basic Principles of the Path Allocation Algorithm

A new and efficient algorithm will now be described. It is suitable for use in a channel-grouped three-stage switch and requires only knowledge obtainable at the input side of the switch. It operates on the following quantities:

$A_{ir}$: number of channels available from input module $i$ to intermediate switch module $r$;
$B_{rj}$: number of channels available from intermediate switch module $r$ to output module $j$;
$K_{ij}$: number of requests from input module $i$ for output module $j$.

Note that $A_{ir}$ and $K_{ij}$ need only be local to the input module. The $B_{rj}$'s must be forwarded to each input module in turn. Let $R_{irj}$ be the number of cells to be routed from input module $i$ to output module $j$ via intermediate switch module $r$. The values of $A_{ir}$, $B_{rj}$ and $K_{ij}$ are updated using the procedure atomic( ) described below:

$$R_{irj} = \min(A_{ir}, B_{rj}, K_{ij}), \qquad A_{ir} \leftarrow A_{ir} - R_{irj}, \qquad B_{rj} \leftarrow B_{rj} - R_{irj}, \qquad K_{ij} \leftarrow K_{ij} - R_{irj}.$$

This procedure is "atomic" in the sense that it is the basic building block from which the path allocation algorithm is constructed. The procedure determines the capacity available from input module $i$ to output module $j$ via intermediate switch module $r$ (i.e., the minimum of $A_{ir}$ and $B_{rj}$). The number of requests which can be satisfied is equal to the minimum of the number of requests outstanding ($K_{ij}$) and the available capacity.

Fig. 2. Examples of the processor array (a) showing contents of processors during Iteration Zero ($m = 4$) and (b) showing initial conditions for , and
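The update described above (capacity as the minimum of the two channel counts, then the grant as the minimum of that capacity and the outstanding requests) can be written out as a short function. This is a sketch under assumed names: `A`, `B` and `K` are taken here to be nested lists indexed as `A[i][r]`, `B[r][j]` and `K[i][j]`, and the function name `atomic` follows the label in the Fig. 3 caption; none of this is a transcription of the paper's listing.

```python
def atomic(A, B, K, i, r, j):
    """Grant as many requests as possible from input module i to output
    module j via intermediate module r, updating A, B and K in place."""
    capacity = min(A[i][r], B[r][j])      # channels free on both hops
    granted = min(K[i][j], capacity)      # cannot grant more than requested
    A[i][r] -= granted                    # three subtractions, as in Section II-D
    B[r][j] -= granted
    K[i][j] -= granted
    return granted


# Single-module toy case: 2 channels in, 1 channel out, 3 requests.
A = [[2]]
B = [[1]]
K = [[3]]
granted = atomic(A, B, K, 0, 0, 0)
```

Here one request is granted (the outgoing group has a single channel), leaving `A == [[1]]`, `B == [[0]]` and two requests outstanding in `K`.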

A parallel implementation requires multiple processors, each executing the atomic( ) procedure for a different set of procedure parameters, subject to the following constraints:

• no two processors shall simultaneously require access to the same quantity. For example, atomic($i$, $r$, $j$) uses $A_{ir}$ and $B_{rj}$, so that neither atomic($i$, $r$, $j'$) nor atomic($i'$, $r$, $j$) can be executed concurrently with atomic($i$, $r$, $j$) for any $i' \neq i$, $j' \neq j$;
• the data required by a processor for the next iteration of the algorithm should be available locally, or from adjacent processors.

An implementation satisfying these two constraints will now be presented.

Fig. 3. Implementation of the atomic( ) processor. Min: Calculator of minimum; Dx: Delay (needed to synchronise arrival times; may be zero).

COLLIER: A THREE-STAGE ATM SWITCH 703

C. Implementation of the Algorithm

Suppose that there are $m$ modules in each stage of the switch. An array of $m \times m$ processors is used. The processor in the $i$th row (numbered from the right) and $j$th column (numbered from the bottom) of the array is labeled $P_{ij}$. Processor $P_{ij}$ is initialized by loading the following three values:

1) initial value of $K_{ij}$;
2) initial value of $A$ (i.e., $S_1$);
3) initial value of $B$ (i.e., $S_2$).

The values stored in the processor array are shown in Fig. 2(a) for the case where $m = 4$.

The algorithm then requires $m$ iterations (iterations zero through $m - 1$). Processor $P_{ij}$ executes atomic( ) during each iteration; after each iteration it forwards the updated $A$ and $B$ values to adjacent processors, and retains $K_{ij}$.
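The behaviour of such an array can be simulated sequentially. The sketch below is illustrative only: the exact index expression used by each processor is not legible in this copy, so a diagonal schedule `r = (i + j + t) % m` is assumed. That choice is one schedule that satisfies the first constraint of Section II-B (within an iteration, no two processors touch the same `A[i][r]` or `B[r][j]`), but it is not claimed to be the paper's. The function name and list layouts are likewise hypothetical.

```python
def allocate_paths(K, S1, S2):
    """Simulate m iterations of an m x m processor array.

    K[i][j] holds the requests from input module i for output module j.
    Returns (R, leftover), where R[i][r][j] is the number of cells routed
    via intermediate module r and leftover[i][j] counts cells losing contention.
    """
    m = len(K)
    A = [[S1] * m for _ in range(m)]          # A[i][r]: free channels, stage 1 to 2
    B = [[S2] * m for _ in range(m)]          # B[r][j]: free channels, stage 2 to 3
    leftover = [row[:] for row in K]          # work on a copy of the request counts
    R = [[[0] * m for _ in range(m)] for _ in range(m)]
    for t in range(m):                        # iterations zero through m - 1
        for i in range(m):
            for j in range(m):
                r = (i + j + t) % m           # assumed schedule (not from the source)
                granted = min(A[i][r], B[r][j], leftover[i][j])
                A[i][r] -= granted
                B[r][j] -= granted
                leftover[i][j] -= granted
                R[i][r][j] = granted
    return R, leftover


# Two modules per stage, single-channel groups.
R_ok, lost_ok = allocate_paths([[2, 0], [0, 1]], S1=1, S2=1)
R_full, lost_full = allocate_paths([[3, 0], [0, 0]], S1=1, S2=1)
```

In the first case every request is served; in the second, three cells compete for the two single-channel paths from input module 0 to output module 0, so one cell loses contention. Because processors within an iteration never share a quantity under this schedule, the sequential inner loops give the same result as a parallel execution would.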

If we choose the same algorithm may be used for a switch with an arbitrary number of modules in each stage. Suppose that a square array of processors is used. Some of the processor registers must be initialized to zero if their contents pertain to a nonexistent switch module. Specifically, processor is initialized as follows: otherwise. where

An examination of the operation of the resulting algorithm reveals that the processors in row or higher and in column or above never modify the $A$ and $B$ values they receive, and thus may be replaced by simple delays.

In general, a switch with $L_1$ input modules and $L_2$ output modules requires a processor array with rows and columns. If , each column requires additional registers. If , each row requires additional registers. The initial conditions in the array for the case where and are shown in Fig. 2(b).

A unichannel architecture may require a large value for $m$ to obtain low cell loss probabilities. Hence a relatively high clock speed will be required in the array, so as to complete $m$ iterations of the algorithm in the time available (which is less than the duration of one time slot). A switch with intermediate channel grouping affords the possibility of reducing cell loss probability by increasing $S_1$ and $S_2$, rather than by increasing $m$. This can reduce the clock speed requirements. Note that, unlike the cell scheduling algorithm in [5], this algorithm attempts to allocate a path to each cell at the switch inputs during every iteration of the algorithm. Thus, the proposed algorithm is fairer than that described in [5].

D. Implementation Issues

The processor must execute the atomic( ) procedure, and thus must perform two types of operation:

1) find the minimum of three numbers;
2) perform three subtractions.

Hence, in principle, the processor may be implemented as shown in Fig. 3. The value of $K_{ij}$ is stored locally. The $A$ and $B$ values are obtained from (and forwarded to) adjacent processors. The simple structure of the atomic( ) processor ensures that many copies of it may be constructed on a single integrated circuit (IC), and also ensures that it can operate at high speed. A fast implementation using bit-serial arithmetic, and which does not require the calculation of the minimum of three numbers, was described in [10].

Fig. 4. The circuitry for request counting and routing tag assignment. CG: Count generator; RPG: Routing packet generator; AG: Address generator.

Hardware is also needed in each input module to perform the following tasks before and during the path allocation process:

• to count the number of requests for each output module so as to obtain the initial values of the $K_{ij}$'s;
• to forward a routing tag based on the results of path allocation to each input cell.

The circuitry to implement these functions is shown in Fig. 4. Its operation will be described in Sections III and IV. It is assumed that cells losing contention are discarded. If this is not the case, additional hardware will be required to forward acknowledgments to the input port controllers, and this circuitry will introduce an additional delay.

The switch fabric, as described above, operates at a single rate (which will typically be the OC-3/STM-1 rate of 155 Mb/s). The input and output port controllers must perform the necessary bit rate adaptation (and multiplexing/demultiplexing) for links operating at other rates, so that cells traverse the switch fabric at a common rate. The demultiplexing of incoming cell streams of high bit rate to a number of switch fabric inputs has implications for the switch performance (since correlations are then possible between the arrival processes on adjacent input ports), and for cell sequence preservation, which will be addressed in a future paper.

The switch will be required to support multiple loss priorities in practice. This requires the path allocation algorithm to preferentially allocate paths to cells with the CLP bit set to zero. The simplest way of modifying the described algorithm to achieve this is to perform path allocation twice, once for cells with CLP = 0, and a second time for the cells tolerating higher loss rates, with the initialization of the processor array being appropriately modified. However, this approach doubles the required operating speed of the array, which may be impractical in many cases. A less expensive method for introducing differentials in loss probabilities is described in [11].
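The two-pass approach mentioned above can be sketched as follows. This is an illustration under stated assumptions rather than the paper's design: the "appropriately modified" initialization is interpreted here as starting the second pass from the channel counts left over by the first, the diagonal schedule `(i + j + t) % m` is an assumed stand-in for the array's real index pattern, and all function and variable names are hypothetical.

```python
def run_pass(A, B, K):
    """One full path allocation pass over an m x m array; mutates A, B, K."""
    m = len(K)
    for t in range(m):
        for i in range(m):
            for j in range(m):
                r = (i + j + t) % m           # assumed schedule, not from the source
                granted = min(A[i][r], B[r][j], K[i][j])
                A[i][r] -= granted
                B[r][j] -= granted
                K[i][j] -= granted
    return K                                   # requests still outstanding (lost)


def allocate_two_priorities(K_high, K_low, S1, S2):
    """Allocate CLP = 0 cells first, then CLP = 1 cells on what remains."""
    m = len(K_high)
    A = [[S1] * m for _ in range(m)]
    B = [[S2] * m for _ in range(m)]
    lost_high = run_pass(A, B, [row[:] for row in K_high])
    # Second pass reuses A and B, so it only sees the leftover capacity.
    lost_low = run_pass(A, B, [row[:] for row in K_low])
    return lost_high, lost_low


lost_high, lost_low = allocate_two_priorities(
    K_high=[[2, 0], [0, 0]], K_low=[[1, 0], [0, 0]], S1=1, S2=1)
```

In this toy case the two high-priority cells consume both paths to output module 0, so the single low-priority cell is the one that loses contention. The cost noted in the text is visible in the structure: the array performs `2 * m` iterations per time slot instead of `m`.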

III. A FAST METHOD OF REQUEST COUNTING

Suitable hardware to simultaneously calculate $K_{ij}$ (the number of requests from input module $i$ for output module $j$) for all values of $j$ will now be described. The execution time for this hardware is clock cycles. A slower solution, requiring less hardware, was described in [12].

The hardware required is shown in Fig. 5. Data cells from the input ports associated with input module $i$ are merged with $m$ control packets (one per output module) by a Batcher sorting network. The merge operation is performed in such a way that idle cells (i.e., empty cells from inactive input ports) are sorted to the highest output ports of the Batcher network. If the control packet for output module $j$ appears at output $p_j$ of the Batcher network, then the data cells (if any) requesting that output module appear at lower output ports of the sorter (ports $p_j + 1$, $p_j + 2$, etc.), as shown in Fig. 5.

Under these circumstances, it may readily be shown that

$$p_j = j + \sum_{k=0}^{j-1} K_{ik}$$

where $K_{ik}$ is the number of data cells requesting output module $k$, and $i$ is fixed, since the Batcher network processes only requests from input module $i$.

The key to this method of request counting is the observation that

$$K_{ij} = p_{j+1} - p_j - 1.$$

The necessary subtraction can be performed very efficiently, since

$$p_{j+1} - p_j - 1 = p_{j+1} + \bar{p}_j$$

(with the carry out of the most significant bit discarded), where $\bar{p}_j$ is the 1's complement of $p_j$ obtained by bitwise inversion of $p_j$. It follows that the value of $K_{ij}$ can be generated using a serial adder, and can then be stored in the appropriate processor.

It is necessary to generate a concentrated list of the $p_j$ values as input data for the serial adders. These values are obviously available at the sorter outputs which have received control packets (since, for example, control packet 4 appears at output $K_{i0} + K_{i1} + K_{i2} + K_{i3} + 4$), but are not concentrated onto contiguous outputs. Hence a concentrator is required. This is the purpose of the binary self-routing network shown in Fig. 5, which is often called the "reverse banyan" [13]. A well-known property of this network is that it is nonblocking when acting as a concentrator. A formal proof that blocking cannot occur in Fig. 5 was given in [14].

The count generators forward only control packets to this network. Count generators which have received a data cell or an idle cell through the Batcher network submit an inactive packet to the concentrator. The count generator which receives control packet $j$ from output $p_j$ of the Batcher network appends a data field to the packet containing the value of $p_j$. This packet is then routed to output $j$ of the concentrator. A total of $m$ control packets is thus simultaneously launched into the concentrator, and these are routed to the serial adders at outputs zero through $m - 1$ without blocking.

The concentrated list of $p_j$ values is then read by these serial adders, the lower input (as shown in Fig. 5) being inverted. Hence the $K_{ij}$ values are generated, and passed to the processors. The example considered in Fig. 5 shows three requests for output module zero, two for output module one, and none for output module two. It can be seen that the correct values (i.e., 3, 2 and 0) are returned to the processors for output modules zero, one and two, respectively.

Fig. 5. An example of request counting. CG: Count generator.
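The counting scheme can be imitated in software: sort control packets together with data cells, read off the positions of the control packets, and obtain each count by adding one position to the bitwise inverse of the previous one. The sketch below makes several assumptions that are not taken from the source: Python's `sort` stands in for the Batcher network, control packets are keyed to sort ahead of data cells for the same module, idle cells are simply omitted, and an extra sentinel control packet (index `m`) is added so that the last module also has an upper position. The word width `bits` is a free parameter.

```python
def count_requests(destinations, m, bits=8):
    """Count requests per output module using sorted positions and
    1's-complement addition (a + ~b == a - b - 1 modulo 2**bits).

    destinations: requested output module for each active data cell.
    """
    # Kind 0 = control packet, kind 1 = data cell; control sorts first.
    packets = [(j, 0) for j in range(m + 1)]   # sentinel packet m is an assumption
    packets += [(d, 1) for d in destinations]
    packets.sort()                             # stands in for the Batcher sorter
    # "Concentrated" list of control-packet positions, in module order.
    positions = [p for p, (_, kind) in enumerate(packets) if kind == 0]
    mask = (1 << bits) - 1
    counts = []
    for j in range(m):
        inverted = ~positions[j] & mask        # bitwise inversion of the lower input
        counts.append((positions[j + 1] + inverted) & mask)  # carry out discarded
    return counts


# Pattern of the Fig. 5 example: three requests for module 0, two for module 1.
example_counts = count_requests([0, 1, 0, 0, 1], m=4)
```

For this input the control packets land at positions 0, 4, 7, 8 and 9, and the adder outputs are 3, 2, 0 and 0. The point of the identity is that no subtractor is needed: an inverter on one adder input is enough, provided `bits` is wide enough to hold every position.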

The submitted packets take two cycles to propagate through each stage of the concentrator (one cycle to identify if the packet is active, and another to determine where to route it) and an additional clock cycle is required before the serial adder generates the least significant bit of the appropriate $K_{ij}$ value. Thus the number of clock cycles required by the request count hardware before path allocation can commence is

Hence, for a switch with and , the number of clock cycles required is just 15.
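As a plausibility check on the figure of 15 cycles, the count described above (two cycles per concentrator stage, plus one) can be worked through for a specific size. This assumes that the concentrator is a binary self-routing network with $\log_2 M$ stages for $M$ inputs, and it takes $M = 128$ from the design example of Section VI; linking that figure to this sentence is an assumption, since the parameter values are not legible here.

```latex
\[
T \;=\; 2\log_2 M + 1 \;=\; 2\log_2 128 + 1 \;=\; 2 \cdot 7 + 1 \;=\; 15 .
\]
```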

IV. ROUTING TAG ASSIGNMENT

A. Principles of Operation

The atomic( ) processor generates a sequence of $K_{ij}$ values, one after every iteration of the path allocation algorithm, commencing with the initial value of $K_{ij}$ determined by the request counting hardware and decrementing, after every iteration, in accordance with the atomic( ) procedure, as paths are allocated to cells. Thus $K_{ij}$ represents the number of outstanding requests from input module $i$ for output module $j$. The relevant cells must be informed of the path through the intermediate stage which they have been assigned. The relevant information is obtained from the output of the processor shown in Fig. 3. After each iteration of the algorithm, tokens are broadcast to cells by the circuitry for routing tag assignment. A cell may receive multiple tokens, but only the last token it receives contains valid routing information. When the path allocation process is complete, a special null token is broadcast to the cells which have lost contention. The address generator then prefixes a routing tag to each data cell whose value equals the token value. Cells losing contention are marked as inactive.
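The "last token wins" rule can be illustrated with a small simulation for the cells requesting a single output module. This is one reading of the mechanism, not a description of the copy network itself: it assumes that the token for iteration `t` is broadcast to every cell still outstanding before that iteration, so that cells granted a path simply stop receiving later tokens. The function name, the `NULL_TOKEN` marker and the argument layout are hypothetical.

```python
NULL_TOKEN = "null"   # hypothetical marker for cells that lose contention


def assign_tags(k_sequence, token_addresses):
    """Return the final token held by each cell requesting one output module.

    k_sequence[t] is the number of outstanding requests before iteration t;
    its last entry is the number left when path allocation ends.
    token_addresses[t] is the intermediate module considered in iteration t.
    """
    last_token = [None] * k_sequence[0]
    for t, address in enumerate(token_addresses):
        # Broadcast to every cell still outstanding; earlier tokens are overwritten.
        for cell in range(k_sequence[t]):
            last_token[cell] = address
    # Cells never granted a path receive the null token at the end.
    for cell in range(k_sequence[-1]):
        last_token[cell] = NULL_TOKEN
    return last_token


# Three requests; iteration 0 grants one path, iteration 2 grants two.
tags_all_served = assign_tags([3, 2, 2, 0, 0], [0, 1, 2, 3])
# Three requests, two iterations, one cell left without a path.
tags_with_loss = assign_tags([3, 2, 1], [0, 1])
```

In the first case one cell keeps the token for intermediate module 0 and two cells end with the token for module 2; in the second, the cell that is still outstanding after the final iteration ends with the null token.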

The broadcasting is done by the copy network shown in Fig. 4. This must copy tokens and perform routing in such a way that the token required by the data cell at a given Batcher network output in Fig. 4 appears at the corresponding copy network output, and is thus received by the correct address generator.

The copy network has inputs and outputs. The routing packet generators are connected to of the copy network inputs, and the remaining inputs are idle. Routing packet generator $j$ receives the value of $K_{ij}$ from the appropriate atomic( ) processor.

The cells requesting output module $j$ appear at outputs $p_j + 1$ through $p_j + K_{ij}$ of the Batcher network, where (as before) $p_j = j + \sum_{k=0}^{j-1} K_{ik}$ is the Batcher network output at which the control packet for output module $j$ appears. The routing packet generator for output module $j$ must forward the relevant routing tokens to the data cells at outputs $p_j + 1$ through $p_j + K_{ij}$ of the Batcher network. The value of $p_j$ is readily obtainable from the request counting hardware.
