A three-stage ATM switch with cell-level path allocation
I. INTRODUCTION
- The throughput achievable (in bits/second) in an asynchronous transfer mode (ATM) switch depends heavily on the process used to fabricate it.
- Some method of routing is then necessary to select, for each cell, among the available paths from source to destination through the second stage of the switch.
- In one approach (call-level routing), all cells belonging to a virtual connection ("call") are allocated the same route.
- The algorithm described here requires fewer iterations than that in [6]; unlike [7], it does not require input buffering (which degrades throughput); and it is fairer than that presented in [5], in addition to readily supporting intermediate channel grouping.
A. The Objectives of a Path Allocation Algorithm
- There are multiple routes from each input module to each intermediate module, and likewise from each intermediate module to each output module.
- The algorithm must choose, for every input cell (if possible), an intermediate switch module through which to pass on the way to the selected destination, such that no input module attempts to route more cells via any intermediate module, and no intermediate module attempts to route more cells to any output module, than the corresponding link capacity permits in any one time slot.
- It will be assumed, for simplicity, that all input ports of the switch operate at the same rate, and thus that the duration of the time slot (the interval between successive cell boundaries) is the same for every cell.
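The feasibility conditions above can be stated compactly in code. The sketch below checks a tentative allocation against both constraints; the single `capacity` figure and the triple representation of a cell's route are illustrative assumptions, not notation from the paper.

```python
from collections import Counter

def allocation_is_feasible(allocation, capacity):
    """Check a tentative allocation of cells to middle-stage routes.

    allocation: list of (input_module, intermediate_module, output_module)
        triples, one per cell in the current time slot.
    capacity: maximum number of cells permitted per time slot on any
        link between adjacent stages (a single figure, for simplicity).
    """
    # Count cells per first-stage link and per second-stage link.
    in_to_mid = Counter((i, k) for i, k, _ in allocation)
    mid_to_out = Counter((k, j) for _, k, j in allocation)
    return (all(c <= capacity for c in in_to_mid.values())
            and all(c <= capacity for c in mid_to_out.values()))
```

For example, two cells from input module 0 forced through intermediate module 1 are feasible with link capacity 2 but not with capacity 1.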
B. Basic Principles of the Path Allocation Algorithm
- A new and efficient algorithm will now be described.
- Note that these quantities need only be local to the input module.
- The procedure determines the capacity available from an input module to an output module via a given intermediate switch module (i.e., the minimum of the remaining capacities on the two links involved).
- The number of requests which can be satisfied is equal to the minimum of the number of requests outstanding and the available capacity.
- A parallel implementation requires multiple processors, each executing the procedure for a different set of procedure parameters, subject to the following constraint: no two processors shall simultaneously require access to the same quantity.
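The sequential form of the procedure can be sketched as follows. The array names and the triple loop order are assumptions for illustration; the essential step, granting the minimum of the outstanding requests and the two remaining link capacities, is as the text describes.

```python
def allocate(requests, cap_in, cap_out):
    """Sequential sketch of the allocation step over every
    (input i, output j, intermediate k) triple.

    requests[i][j] -- outstanding requests from input module i to output j
    cap_in[i][k]   -- remaining capacity from input i to intermediate k
    cap_out[k][j]  -- remaining capacity from intermediate k to output j
    Returns paths[i][j][k], the number of cells routed via each triple.
    Mutates its arguments, as the hardware decrements its registers.
    """
    n = len(requests)
    paths = [[[0] * n for _ in range(n)] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                # Satisfiable requests = min(outstanding, both capacities).
                grant = min(requests[i][j], cap_in[i][k], cap_out[k][j])
                requests[i][j] -= grant
                cap_in[i][k] -= grant
                cap_out[k][j] -= grant
                paths[i][j][k] = grant
    return paths
```

With two modules per stage, unit link capacities, and two requests per busy input, every request finds a path.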
C. Implementation of the Algorithm
- Suppose that each stage of the switch contains the same number of modules.
- Each processor is identified by its row (numbered from the right) and column (numbered from the bottom) within the array.
- The values stored in the processor array are shown in Fig. 2(a) for an example configuration.
- The algorithm then runs for a fixed number of iterations, numbered from zero.
- Specifically, each processor is initialized with the request counts and link capacities appropriate to its position in the array, and with zero otherwise.
- An examination of the operation of the resulting algorithm reveals that the processors in the uppermost rows and columns never modify the values they receive, and thus may be replaced by simple delays.
- In certain configurations, each column requires additional registers.
- Hence a relatively high clock speed will be required in the array, so as to complete all iterations of the algorithm in the time available (which is less than the duration of one time slot).
- A switch with intermediate channel grouping affords the possibility of reducing the cell loss probability by increasing the intermediate channel group sizes, rather than by increasing the number of intermediate modules. Thus, the proposed algorithm is fairer than that described in [5].
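The constraint that no two processors touch the same quantity at once admits a simple rotating schedule. The sketch below is one conflict-free schedule chosen for illustration (the rotation rule is an assumption, not the paper's exact array wiring): in step t, the processor for (input i, intermediate k) works on output j = (i + k + t) mod n, so no two processors in the same row, and no two in the same column, share an output index.

```python
def parallel_schedule(n, iterations):
    """Illustrative conflict-free schedule for an n-by-n processor array.

    Returns, for each iteration, a dict mapping the processor position
    (input i, intermediate k) to the output module j it works on.
    Within one step, processors sharing a row (same i, hence the same
    requests[i][j] counters) all get distinct j, and processors sharing
    a column (same k, hence the same cap_out[k][j] counters) likewise.
    """
    schedule = []
    for t in range(iterations):
        step = {(i, k): (i + k + t) % n
                for i in range(n) for k in range(n)}
        schedule.append(step)
    return schedule
```

After n iterations every (i, j, k) triple has been visited exactly once.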
D. Implementation Issues
- The processor must execute the procedure, and thus must perform two types of operation: 1) find the minimum of three numbers; 2) perform three subtractions.
- The and values are obtained from (and forwarded to) adjacent processors.
- A fast implementation using bit-serial arithmetic, and which does not require the calculation of the minimum of three numbers, was described in [10] .
- The input and output port controllers must perform the necessary bit rate adaptation (and multiplexing/demultiplexing) for links operating at other rates, so that cells traverse the switch fabric at a common rate.
- Supporting cell loss priority requires the path allocation algorithm to preferentially allocate paths to cells with the CLP bit set to zero.
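One simple way to realize this preference, sketched here as an assumption rather than the paper's circuit, is to serve CLP = 0 cells before CLP = 1 cells when forming the request list:

```python
def order_by_clp(cells):
    """Serve high-priority (CLP = 0) cells before low-priority (CLP = 1)
    cells when allocating paths. `cells` is a list of
    (clp_bit, destination) pairs; Python's sort is stable, so arrival
    order is preserved within each priority class.
    """
    return sorted(cells, key=lambda cell: cell[0])
```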
III. A FAST METHOD OF REQUEST COUNTING
- Suitable hardware to simultaneously calculate the number of requests from a given input module for every output module will now be described.
- The execution time for this hardware is a fixed number of clock cycles.
- Under these circumstances, it may readily be shown that the count produced for each output module equals the number of data cells requesting that output module, since the Batcher network processes only the requests from a single input module.
- A batch of control packets is thus launched simultaneously into the concentrator, and these are routed without blocking to the serial adders at the lowest-numbered outputs.
- The concentrated list of values is then read by these serial adders, the lower input (as shown in Fig. 5 ) being inverted.
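The end result of the sort-and-concentrate hardware is just a per-output request count. A software analogue, offered only to make the intent concrete (the hardware computes this with a Batcher sorting network and serial adders, not a counting loop), is:

```python
def count_requests(destinations, n_outputs):
    """Software analogue of the request-counting hardware: given the
    requested output module of each cell held at one input module,
    produce the request count for every output module simultaneously.
    """
    counts = [0] * n_outputs
    for d in destinations:
        counts[d] += 1
    return counts
```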
A. Principles of Operation
- The processor generates a sequence of values, one after every iteration of the path allocation algorithm, commencing with the initial request count determined by the request-counting hardware and decrementing, after every iteration, in accordance with the procedure, as paths are allocated to cells.
- This value thus represents the number of outstanding requests from a given input module for a given output module.
- When the path allocation process is complete, a special null token is broadcast to the cells which have lost contention.
- During each iteration of the algorithm, the processor submits a routing packet to the network, to be broadcast to the relevant address generators; the data field contains the token address, i.e., the address of the intermediate switch module through which a route has been allocated.
- Two bits (one each from the upper and lower address), in addition to the activity bit, must be processed at each node of the network.
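The token bookkeeping for one (input, output) pair can be sketched as follows. The function name and list representation are assumptions; the behavior, serving cells as grants arrive per iteration and handing the null token to cells that remain when allocation completes, follows the text.

```python
NULL_TOKEN = None  # stands in for the special null token

def assign_tokens(initial_requests, grants_per_iteration):
    """Token-address bookkeeping for one (input, output) pair.

    grants_per_iteration: list of (intermediate_module, n_granted)
        pairs, one per iteration of the path allocation algorithm.
    Cells are served in order; any cell left unserved when allocation
    completes receives the null token (it lost contention).
    """
    tokens = []
    for intermediate, granted in grants_per_iteration:
        for _ in range(min(granted, initial_requests - len(tokens))):
            tokens.append(intermediate)
    while len(tokens) < initial_requests:
        tokens.append(NULL_TOKEN)
    return tokens
```

For instance, three requests with two grants via intermediate module 1 leave one cell holding the null token.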
TABLE I PATTERN OF REQUESTS AND POSSIBLE OUTCOME OF PATH ALLOCATION PROCESS
- Only the token address changes after the first iteration of the algorithm [16].
- Hence, on subsequent iterations of the algorithm, there is no need to distribute the lower address, so that the header on the routing packet may be shortened, reducing the delay through the copy network.
B. An Example of Routing Tag Assignment
- Table I indicates the number of cells from input module 0 which have requested each of the four output modules and a possible pattern of path allocations which might be generated by the processors.
- The copy network must be initialized before path allocation commences.
- After each iteration of the path allocation algorithm (i.e., iterations 0, 1, 2 and 3), the corresponding iteration of the routing tag assignment algorithm is performed.
- Also shown are the lower address bits processed by each switch element.
- The lower address is not broadcast, except during the first iteration.
V. PERFORMANCE OF THE PATH ALLOCATION ALGORITHM
- The performance of a three-stage switch using the cell-level path allocation algorithm described above will now be evaluated.
- The simulation model is based on the following assumptions.
- The destination of each cell is drawn from a uniform distribution; all output modules receive the same load.
- The probability of an individual cell being lost is obviously much less, but cannot be evaluated without knowing how the probability of a given cell losing contention, and the corresponding probabilities for the cells with which it contends, are correlated.
- These graphs can be used to find the maximum number of input ports which a switch with a given capacity in the intermediate stage can support, for a given probability of cell loss during path allocation.
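A Monte Carlo sketch of such an evaluation is given below. It is a simplified stand-in for the paper's simulation model (intermediate-stage contention is ignored, and all parameter values are assumptions): each input port holds one cell per slot with a uniformly random destination module, and a cell counts as lost if its output module's capacity for that slot is already exhausted.

```python
import random

def simulate_loss(n_modules, ports_per_module, capacity, slots, seed=0):
    """Estimate the fraction of cells lost to output-side contention.

    Simplifying assumptions (not the paper's full model): every input
    port is busy in every slot, destinations are uniform, and only the
    per-output-module capacity limit causes loss.
    """
    rng = random.Random(seed)  # seeded for reproducibility
    lost = total = 0
    for _ in range(slots):
        demand = [0] * n_modules
        for _ in range(n_modules * ports_per_module):
            demand[rng.randrange(n_modules)] += 1
            total += 1
        # Cells beyond `capacity` at any output module are lost.
        lost += sum(max(0, d - capacity) for d in demand)
    return lost / total
```

When the per-module capacity equals the total offered load, no cell can be lost; reducing the capacity raises the loss fraction, mirroring the trade-off the graphs capture.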
VI. A DESIGN EXAMPLE
- The resulting switch has a very low cell loss probability (due to loss of contention during path allocation), even in the presence of a nonuniform load [14].
- The input modules must accept data from the address generators in Fig. 4 , and so must have 128 inputs, even though at most 96 data cells will be present.
- One execution of the procedure will require nine clock cycles, using the efficient implementation described in [10] .
- The number of processors required is 1024 (32 × 32), but the IC count should be relatively low because of the simplicity of the processor design.
- The complexity of the path allocation circuitry is relatively high, but the switch modules in the first and second stages are of simple design, because of the avoidance of output contention.
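The timing feasibility claimed above can be checked with back-of-envelope arithmetic. In the sketch below, the nine clock cycles per procedure execution come from the implementation of [10] cited in the text; the iteration count, slot duration, and clock rate are illustrative assumptions, not figures from the paper.

```python
def clock_budget(iterations, cycles_per_iteration, slot_ns, clock_mhz):
    """Does `iterations` executions of the procedure, each taking
    `cycles_per_iteration` clock cycles, fit within one cell slot?

    slot_ns   -- duration of one time slot in nanoseconds (assumed)
    clock_mhz -- array clock rate in MHz (assumed)
    """
    cycles_needed = iterations * cycles_per_iteration
    # MHz (1e6 cycles/s) * ns (1e-9 s) = 1e-3 cycles, hence / 1000.
    cycles_available = slot_ns * clock_mhz / 1000.0
    return cycles_needed <= cycles_available
```

For example, 32 iterations at nine cycles each (288 cycles) fit a 2700 ns slot at 200 MHz (540 cycles available) but not at 100 MHz.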
VII. CONCLUSIONS
- A new algorithm for path allocation in three-stage broadband networks has been described.
- A complete hardware implementation of this algorithm has been presented, including a method for generating the initial data required by the algorithm, and for forwarding the results to each cell at the input side of the switch, in the form of a routing tag.
- The operating speed required of the design appears within the capabilities of VLSI technology in the short term.
- The resulting switch offers the delay performance of an output-buffered switch, unlike either three-stage switches featuring call-level routing, which buffer the cells at each stage, or those featuring input buffers.
- It avoids the fairness problem intrinsic to the "cell scheduling" algorithm of the Growable Packet Switch [5] .