Showing papers on "Multistage interconnection networks" published in 1997


Journal ArticleDOI
TL;DR: The ATLANTA™ switching architecture as discussed by the authors uses an innovative structure with ingress and egress buffers, where selective backpressure is applied from the fabric to the ingress cards, achieving "sharing" of the distributed buffers and buffer utilization comparable with a centralized shared memory switch.
Abstract: The ATLANTA™ switching architecture has the following distinguishing characteristics: (1) is nonblocking, (2) scales modularly over a wide range of switching and buffering capacities using commonly available implementation technology, (3) achieves high buffer utilization while using distributed buffers, (4) has low complexity, and (5) provides a clear path for future growth in features. The ATLANTA architecture uses an innovative structure with ingress and egress buffers, where selective backpressure is applied from the fabric to the ingress cards. Selective backpressure makes the buffers in the ingress cards act as an extension of the output buffers in the fabric, achieving "sharing" of the distributed buffers and buffer utilization comparable with a centralized shared-memory switch. The advantage is that the majority of the buffers are in the ingress and egress port cards, and are implemented using low-cost off-the-shelf memories regardless of the total switching capacity. Different arrangements are possible for the switch fabric. In the smallest configuration, the fabric consists of a single standalone switching module; for larger switching capacities, the fabric is a modular three-stage memory/space/memory (MSM) arrangement. The ATLANTA architecture provides optimal support of multicast traffic. The ATLANTA chipset provides the complete set of building blocks for implementing ATM switches ranging in capacity from 622 Mb/s to 25 Gb/s. The chipset consists of four chips, two devices to be used in the fabric and two in the port cards. The port devices provide full-duplex ingress and egress functionality at 622 Mb/s port rate (plus the overhead due to the local header used internally to the switch). The physical interface to the incoming/outgoing lines supports the UTOPIA II multiplexing standard, and the port devices manage multiplexing/demultiplexing from/to a maximum of 30 subports per port.
Although our current implementation of the architecture is targeted primarily to ATM, the principles behind the architecture are more general, and apply to IP switching and routing technologies.

109 citations


Journal ArticleDOI
TL;DR: It is demonstrated that middle buffering with virtual channels provides better performance than input buffering with virtual channels in multistage interconnection networks, two-dimensional meshes, and hypercubes.
Abstract: Wormhole switched input-buffered and middle-buffered routers with virtual channels are analyzed in this paper. Middle buffering refers to the placement of virtual channels between the demultiplexers and multiplexers of a crossbar switch. An analytical model for multistage interconnection networks using middle-buffered switches is developed. In addition, extensive simulation is conducted to assess the performance of the two buffering techniques in different network topologies. The study demonstrates that middle buffering with virtual channels provides better performance than input buffering with virtual channels in multistage interconnection networks, two-dimensional meshes, and hypercubes.

40 citations


Journal ArticleDOI
TL;DR: In this article, four wormhole multistage interconnection networks (MINs) are considered: traditional MINs, dilated MINs (DMINs), MINs with virtual channels (VMINs) and bidirectional MINs.
Abstract: Multistage interconnection networks (MINs) are a popular class of switch-based network architectures for constructing scalable parallel computers. Four wormhole MINs built from k×k switches, where k = 2^j for some j, are considered in this paper: traditional MINs (TMINs), dilated MINs (DMINs), MINs with virtual channels (VMINs), and bidirectional MINs (BMINs). The first three MINs are unidirectional networks, and we show that the cube interconnection pattern can provide contention-free and channel-balanced partitioning of binary cube clusters. BMINs based on butterfly interconnection are essentially a fat tree, and their routing properties are described. Performance comparison among these four networks using simulation experiments is presented with respect to different network traffic patterns. Both DMINs (dilation two) and BMINs have a similar hardware complexity. We conclude that a two-dilated MIN outperforms the corresponding BMIN (or fat tree) for most of the traffic conditions and is a better choice for the design of scalable parallel computers.
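As background for the unidirectional MINs built from k×k switches mentioned in this abstract, here is a minimal sketch of destination-tag self-routing. It assumes an omega-style wiring (a k-ary perfect shuffle before every stage), which is only one of several equivalent MIN wirings and is not necessarily the cube pattern the authors analyze. The function name `omega_route` and its parameters are hypothetical.

```python
def omega_route(src, dst, k, stages):
    """Trace a packet through an omega-style MIN of k x k switches.

    Assumption: the network has k**stages terminals and a k-ary perfect
    shuffle precedes every stage. Returns the list of line positions the
    packet occupies, starting at src and ending after the last stage.
    """
    n = k ** stages
    if not (0 <= src < n and 0 <= dst < n):
        raise ValueError("src and dst must be in range(k**stages)")
    pos = src
    path = [pos]
    for s in range(stages):
        # k-ary perfect shuffle: rotate the base-k digits of pos left by one.
        pos = (pos * k) % n + (pos * k) // n
        # The switch output port is the s-th most significant base-k digit
        # of the destination (the "destination tag").
        digit = (dst // k ** (stages - 1 - s)) % k
        pos = pos - (pos % k) + digit
        path.append(pos)
    return path


if __name__ == "__main__":
    # 8-terminal network of 2x2 switches: route from terminal 5 to terminal 3.
    print(omega_route(5, 3, k=2, stages=3))
```

Each stage overwrites one base-k digit of the position with a digit of the destination, so after all stages the packet sits on the destination line regardless of where it started.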

35 citations


Journal ArticleDOI
TL;DR: An ATM switch architecture which uses only a single shift-register-type buffering element to store and queue cells, and within the same (physical) queue, switches the cells by organizing them in logical queues destined for different output lines is proposed.
Abstract: We introduce a new approach to ATM switching. We propose an ATM switch architecture which uses only a single shift-register-type buffering element to store and queue cells, and within the same (physical) queue, switches the cells by organizing them in logical queues destined for different output lines. The buffer is also a sequencer which allows flexible ordering of the cells in each logical queue to achieve any appropriate scheduling algorithm. This switch is proposed for use as the building block of large-scale multistage ATM switches because of low hardware complexity and flexibility in providing (per-VC) scheduling among the cells. The switch can also be used as scheduler/controller for RAM-based switches. The single-queue switch implements output queueing and performs full buffer sharing. The hardware complexity is low. The number of input and output lines can vary independently without affecting the switch core. The size of the buffering space can be increased simply by cascading the buffering elements.

32 citations


Proceedings ArticleDOI
25 May 1997
TL;DR: The proposed OPTIMA architecture and 640 Gb/s system can be applied to realize future broadband ATM networks and an 8×8 interconnection is realized.
Abstract: A Tb/s throughput ATM switching architecture, OPTIMA, is proposed for a quasi-non-blocking large switch. The switch uses hardware self-rearrangement with a three-stage network; that is, traffic control is automatically performed by hardware. The switch thus acts as a non-blocking switch. In addition, optical wavelength routing is used to avoid interconnection limitations. An 8×8 interconnection is realized that uses 8 wavelengths to transfer 10 Gb/s signals. A 640 Gb/s OPTIMA prototype is described. The proposed OPTIMA architecture and 640 Gb/s system can be applied to realize future broadband ATM networks.

28 citations


Journal ArticleDOI
TL;DR: The results of implementations on a 64-node SP-1 show that the proposed algorithm significantly outperforms the application-level broadcast primitives provided by currently existing collective communication libraries including the public domain MPI.
Abstract: Multistage interconnection networks are a popular class of interconnection architecture for constructing scalable parallel computers (SPCs). The focus of this paper is on the multistage network system which supports wormhole routed turnaround routing. Existing machines characterized by such a system model include the IBM SP-1 and SP-2, TMC CM-5, and Meiko CS-2. Efficient collective communication among processor nodes is critical to the performance of SPCs. A system-level multicast service, in which the same message is delivered from a source node to an arbitrary number of destination nodes, is fundamental in supporting collective communication primitives including the application-level broadcast, reduction, and barrier synchronization. This paper addresses how to efficiently implement multicast services in wormhole-routed multistage networks, in the absence of hardware multicast support, by exploiting the properties of the turnaround switching technology. An optimal multicast algorithm is proposed. The results of implementations on a 64-node SP-1 show that the proposed algorithm significantly outperforms the application-level broadcast primitives provided by currently existing collective communication libraries including the public domain MPI.
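To make the idea of software multicast without hardware support concrete, the following is a generic recursive-doubling sketch: in every step, each node that already holds the message forwards it by unicast to one node that does not. This is not the paper's algorithm. It assumes each node sends at most one message per step and ignores channel contention inside the network, which is exactly what the paper's turnaround-routing analysis addresses. The name `multicast_schedule` is hypothetical.

```python
import math


def multicast_schedule(source, destinations):
    """Build a unicast-based multicast schedule by recursive doubling.

    Returns a list of steps; each step is a list of (sender, receiver)
    pairs that are assumed to proceed in parallel.
    """
    holders = [source]
    pending = list(destinations)
    steps = []
    while pending:
        sends = []
        # Every current holder forwards to at most one pending destination.
        for sender in list(holders):
            if not pending:
                break
            sends.append((sender, pending.pop(0)))
        holders.extend(receiver for _, receiver in sends)
        steps.append(sends)
    return steps


if __name__ == "__main__":
    schedule = multicast_schedule(0, range(1, 8))
    for i, sends in enumerate(schedule, start=1):
        print(f"step {i}: {sends}")
    # With d destinations the number of steps is ceil(log2(d + 1)).
    print(len(schedule), math.ceil(math.log2(7 + 1)))
```

Because the set of holders at most doubles per step, no one-send-per-step schedule can finish in fewer than ceil(log2(d + 1)) steps; whether the steps are actually contention-free depends on the order in which destinations are served and on the network's routing.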

26 citations


Journal ArticleDOI
Laxmi N. Bhuyan, Ravi Iyer, T. Askar, Ashwini K. Nanda, M. Kumar
TL;DR: The authors develop self routing techniques for the various paths, present an algorithm to route a request along the path with minimum distance, and analyze the probabilities of a packet taking different routes to show that the MBN provides similar performance to a BMIN while offering simplicity in hardware and more fault-tolerance than a conventional MIN.
Abstract: A multistage bus network (MBN) is proposed to overcome some of the shortcomings of the conventional multistage interconnection networks (MINs), single bus, and hierarchical bus interconnection networks. The MBN consists of multiple stages of buses connected in a manner similar to the MINs and has the same bandwidth at each stage. A switch in an MBN is similar to that in a MIN switch except that there is a single bus connection instead of a crossbar. MBNs support bidirectional routing and there exist a number of paths between any source and destination pair. The authors develop self routing techniques for the various paths, present an algorithm to route a request along the path with minimum distance, and analyze the probabilities of a packet taking different routes. Further, they derive a performance analysis of a synchronous packet-switched MBN in a distributed shared memory environment and compare the results with those of an equivalent bidirectional MIN (BMIN). Finally, they present the execution time of various applications on the MBN and the BMIN through an execution-driven simulation. They show that the MBN provides similar performance to a BMIN while offering simplicity in hardware and more fault-tolerance than a conventional MIN.

22 citations


Journal ArticleDOI
TL;DR: It is shown that, for the proposed design, a higher degree of heterogeneity results in better performance than the baseline network and another banyan network based parallel interconnection network.
Abstract: This paper presents a new self-routing packet network called the plane interconnected parallel network (PIPN). In the proposed design, the traffic arriving at the network is shaped and routed through two banyan network based interconnected planes. The interconnections between the planes distribute the incoming load more homogeneously over the network. The throughput of the network under uniform and heterogeneous traffic requirements is studied analytically and by simulation. The results are compared with the results of the baseline network and another banyan network based parallel interconnection network. It is shown that, for the proposed design, a higher degree of heterogeneity results in better performance.

22 citations


Journal ArticleDOI
TL;DR: It is argued that intuitive optimizations for multistage switching networks may not be cost-effective, and changes to increase the network bandwidth at the root of the traffic convergence tree and to delay traffic convergence up until the final stages of the network are suggested.
Abstract: While multistage switching networks for vector multiprocessors have been studied extensively, detailed evaluations of their performance are rare. Indeed, analytical models, simulations with pseudosynthetic loads, studies focused on average-value parameters, and measurements of networks disconnected from the machine, all provide limited information. In this paper, instead, we present an in-depth empirical analysis of a multistage switching network in a realistic setting: We use hardware probes to examine the performance of the omega network of the Cedar shared-memory machine executing real applications. The machine is configured with 16 vector processors. The analysis suggests that the performance of multistage switching networks is limited by traffic nonuniformities. We identify two major nonuniformities that degrade Cedar's performance and are likely to slow down other networks too. The first one is the contention caused by the return messages in a vector access as they converge from the memories to one processor port. This traffic convergence penalizes vector reads and, more importantly, causes tree saturation. The second nonuniformity is the uneven contention delays induced by a relatively fair scheme to resolve message collisions. Based on our observations, we argue that intuitive optimizations for multistage switching networks may not be the most cost-effective ones. Instead, we suggest changes to increase the network bandwidth at the root of the traffic convergence tree and to delay traffic convergence up until the final stages of the network.

21 citations


Journal ArticleDOI
TL;DR: The designs of electrical and optical switch cores with Terabits of bisection bandwidth for Networks-of-Workstations (NOWs) are described and meet Shannon's lower bound on memory requirements.
Abstract: Principles for designing practical self-routing nonblocking N×N circuit-switched connection networks with optimal Θ(N·log N) hardware at the bit-level of complexity are described. The overall principles behind the architecture can be described as "Expand-Route-Contract". A self-routing nonblocking network with w-bit wide datapaths can be achieved by expanding the datapaths to w+z independent bit-serial connections, routing these connections through self-routing networks with blocking, and by contracting the data at the output and recovering the w-bit wide datapaths. For an appropriate redundancy z, the blocking probability can be made arbitrarily small and the fault tolerance arbitrarily high. By using efficient space domain concentrators, the architecture yields self-routing nonblocking switching networks with an optimal O(N·log N) bits of memory or O(N·log N·log log log N) logic gates. By using a linear-cost time domain concentrator, the architecture yields self-routing nonblocking switching networks with an optimal Θ(N·log N) bits of memory or logic gates. These designs meet Shannon's lower bound on memory requirements, established in the 1950s. The number of stages of crossbars can match the theoretical minimum, which has not been achieved by previous self-routing networks. The architecture is feasible with existing electrical or optical technologies. The designs of electrical and optical switch cores with Terabits of bisection bandwidth for Networks-of-Workstations (NOWs) are described.

17 citations


Journal ArticleDOI
TL;DR: A new control design for single queue MINs is proposed that reduces the duration of the clock period by making use of output buffers and acknowledgments and develops an analytical model to compare its performance with the existing designs reported in the literature.
Abstract: Small switching elements are the key components of multistage interconnection networks (MINs) used in multiprocessors and in high speed switching fabrics. Clock design for synchronous MINs is an important issue. The existing models assume that the clock period consists of two parts. The control messages are transferred between switching stages during the first part, and the actual data transfer takes place during the second part. We propose a new control design for single queue MINs that reduces the duration of the clock period by making use of output buffers and acknowledgments. The reduction in the clock period comes from the addition of two-unit output buffers, introducing a sophisticated hardware control mechanism, and sacrificing the FIFO feature. We develop an analytical model to compare its performance with the existing designs reported in the literature. We validate our model with extensive simulation studies.

Book ChapterDOI
26 Aug 1997
TL;DR: This paper analyzes the general case of Multistage Interconnection Networks, made of k × k switches with finite, infinite or zero length buffers (unbuffered) and derives an approximation for the steady state distributions in the second stage and beyond.
Abstract: In this paper we analyze the general case of Multistage Interconnection Networks (MINs), made of k × k switches with finite, infinite or zero length buffers (unbuffered). The exact solution of the steady state distribution of the first stage is derived for all cases. We use this to get an approximation for the steady state distributions in the second stage and beyond. In the case of unbuffered switches we reach the known exact solution for all the stages of the MIN. Our results are validated by extensive simulations.
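For the unbuffered case mentioned in this abstract, the stage-by-stage solution usually quoted for delta-type networks is the recurrence p_{i+1} = 1 - (1 - p_i/k)^k, where p_i is the probability that a link entering stage i+1 carries a packet. The sketch below evaluates it under the usual assumptions (synchronous operation, independent and uniformly distributed destinations, blocked packets dropped). Whether this is the exact form derived in the paper is an assumption, and the function name is hypothetical.

```python
def unbuffered_min_rates(k, stages, p0=1.0):
    """Per-stage link utilisation of an unbuffered MIN of k x k switches.

    Assumptions: each input of a switch independently carries a packet
    with probability p, destinations are uniform, and when several packets
    contend for one output all but one are dropped.
    Returns [p0, p1, ..., p_stages].
    """
    if k < 1 or stages < 0 or not 0.0 <= p0 <= 1.0:
        raise ValueError("need k >= 1, stages >= 0 and 0 <= p0 <= 1")
    rates = [p0]
    p = p0
    for _ in range(stages):
        # An output is idle only if none of the k inputs requests it;
        # each input requests a given output with probability p / k.
        p = 1.0 - (1.0 - p / k) ** k
        rates.append(p)
    return rates


if __name__ == "__main__":
    # 64-terminal network of 2x2 switches (6 stages) under full load.
    for stage, rate in enumerate(unbuffered_min_rates(2, 6)):
        print(stage, round(rate, 4))
```

The last entry of the returned list is the normalised throughput per output port, and it falls with every added stage, which is one motivation for the buffered variants the abstract also covers.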

Journal ArticleDOI
TL;DR: Experimental results on the beam combination of signal- and power-beam arrays at a node stage for three-dimensional multistage interconnection networks show the feasibility of cascading operations in the planar integrated optics.
Abstract: We propose a configuration of planar integrated optics for three-dimensional multistage interconnection networks. To show the feasibility of cascading operations in the planar integrated optics, we present experimental results on the beam combination of signal- and power-beam arrays at a node stage. The beam-combination efficiency measured in the experiment is ∼42% of the theoretical limit.

Journal ArticleDOI
TL;DR: A buffer management technique called delayed pushout is applied to a multistage ATM switch in which shared-memory switching elements are arranged in a banyan topology, and a synergy emerges when pushout, backpressure, and this threshold are all employed together.
Abstract: We study a multistage ATM switch in which shared-memory switching elements are arranged in a banyan topology. By "shared-memory," we mean that each switching element uses output queueing and shares its local cell buffer memory among all its output ports. We apply a buffer management technique called delayed pushout that was originally designed for multistage ATM switches with hierarchical topologies. Delayed pushout combines a pushout mechanism, for sharing memory efficiently among queues within the same switching element, and a backpressure mechanism, for sharing memory across switch stages. The backpressure component has a threshold to restrict the amount of sharing between stages. A synergy emerges when pushout, backpressure, and this threshold are all employed together. Using a computer simulation of the switch under bursty traffic, we study delayed pushout as well as several simpler pushout and backpressure schemes under a variety of traffic conditions. Of the five schemes we simulate, delayed pushout is the only one that performs well under all load conditions.

Proceedings ArticleDOI
11 Aug 1997
TL;DR: An asynchronous tree-based multicasting algorithm is developed in which deadlocks are prevented by serializing the initiations of branching operations that have potential for creating deadlocks.
Abstract: In this paper, we propose a tree-based multicasting algorithm for Multistage Interconnection Networks. We first analyze the necessary conditions for deadlocks in MINs. Based on these observations, an asynchronous tree-based multicasting algorithm is developed in which deadlocks are prevented by serializing the initiations of branching operations that have potential for creating deadlocks. The serialization is done using a technique based on grouping of the switching elements. The preliminary simulation results are encouraging: the algorithm lowers the latency by almost a factor of 4 when compared with the software multicasting approach proposed earlier.

Proceedings ArticleDOI
08 Jun 1997
TL;DR: A new architectural design of a very large next generation gigabit switch, called BATMAN (Banyan ATM Architectural Network), is introduced, which allows for the modular growth of its size from small to very large dimensions without sacrificing its overall delay/throughput performance.
Abstract: In spite of the recent advances of technology, the limitation on the switching size is the primary implementation constraint. Practical dimensions are limited to the small size of a module. To build a larger dimension, more than one module is interconnected in a multistage configuration. Moreover, internal switching fabrics of these interconnected modules are usually sped up to a higher data rate in order to reduce excessive queuing delay. In this paper, a new architectural design of a very large next generation gigabit switch, called BATMAN (Banyan ATM Architectural Network), is introduced. The proposed switch has the structure of an N×N Banyan network, recursively followed by 2^k groups of shared buffers and N/2^k 2^k×2^k Banyan networks, where k is incremented from 1 to [log_2 N / log_2 ρ], and ρ is the speed-up factor. In its simplest form, it has the structure of an N×N Banyan network, N/4 groups of shared buffers, and N/4 4×4 Banyan routing networks. In each Banyan network module, universal packet timeslot (UPTS) is adopted. Because the hardware complexity of the proposed switch architecture is low, the architecture allows for the modular growth of its size from small to very large dimensions without sacrificing its overall delay/throughput performance.

Proceedings ArticleDOI
19 Mar 1997
TL;DR: A timed Petri net model is used to derive the performance of buffered Banyan networks, in which messages may also be multicasted, and the automatic generation of timed Petri net models is possible for arbitrary destination patterns of the packets.
Abstract: Multistage Banyan networks are frequently proposed as connections in multiprocessor systems. There exist several studies to determine the performance of networks in which messages are unicasted. (One processor sends a message to one and only one other processor.) In this paper, a timed Petri net model is used to derive the performance of buffered Banyan networks, in which messages may also be multicasted. (One processor can send a message to more than one other processor.) We consider a Banyan network with 2×2 switches and the two cases of complete and partial broadcasting within the switching elements. An algorithm is presented to calculate the destination distribution in all network stages for arbitrary destination patterns of incoming uniform packet traffic. Thus, the automatic generation of timed Petri net models is possible for arbitrary destination patterns of the packets. The dependency upon the network size is also considered.

Journal ArticleDOI
TL;DR: A unified model for analysing multistage interconnection networks with multi-queue buffered strategies shows that the DAFC scheme has the best performance over all the four buffer allocation schemes under both uniform and non-uniform load.
Abstract: This paper presents a unified model for analysing multistage interconnection networks with multi-queue buffered strategies. Buffering strategies include SAFC (Statically Allocated Fully Connected), SAMQ (Statically Allocated Multi-Queue), DAMQ (Dynamically Allocated Multi-Queue), and DAFC (Dynamically Allocated Fully Connected) schemes. We develop a unified model to evaluate the performance of all these buffer allocation schemes under the uniform and non-uniform traffic patterns. The analytical model is validated through extensive simulations. Using the unified model, we conducted performance comparisons for the four buffer allocation schemes under both uniform and non-uniform traffic load. It is shown that the DAFC scheme has the best performance over all the four buffer allocation schemes under both uniform and non-uniform load.

Proceedings ArticleDOI
12 Jan 1997
TL;DR: The scalable coherent interface (SCI) defines a high-speed interconnect system that provides a coherent memory system and specifies a topology-independent communication protocol with the possibility of connecting up to 64 K nodes.
Abstract: The scalable coherent interface (SCI) defines a high-speed interconnect system that provides a coherent memory system. It specifies a topology-independent communication protocol with the possibility of connecting up to 64 K nodes. SCI switches are the key components in building large SCI systems effectively. An SCI switch which uses several internal buses is studied as well as more complex systems composed of several switches. Computer simulations are used to compare the different models and to determine system parameters.

01 Jan 1997
TL;DR: This paper studied a more general class of networks, which is called (m + 1)-stage d-nary bit permutation networks, and characterized the equivalence of such networks by sequences of positive integers.
Abstract: In recent years, many multistage interconnection networks using 2 × 2 switching elements have been proposed for parallel architectures. Typical examples are baseline networks, banyan networks, shuffle-exchange networks, and their inverses. As these networks are blocking, such networks with extra stages have also been studied extensively. These include Benes networks and Δ ⊕ Δ' networks. Recently, Hwang et al. studied k-extra-stage networks, which are a generalization of the above networks. They also investigated the equivalence issue among some of these networks. In this paper, we study a more general class of networks, which we call (m + 1)-stage d-nary bit permutation networks. We characterize the equivalence of such networks by sequences of positive integers.

Proceedings ArticleDOI
22 Jun 1997
TL;DR: In this article, the two bounce free-space arbitrary interconnection architecture is introduced, which combines the global optical interconnection with the minimum nonblocking multistage interconnection network, the Benes network, to achieve arbitrary interconnections across a multichip backplane.
Abstract: The two bounce free-space arbitrary interconnection architecture is introduced. It requires 3 stages of local electronic routing and 2 passes, or bounces, through a common retro-reflective optical system. The concept combines the global optical interconnection with the minimum nonblocking multistage interconnection network, the Benes network, to achieve arbitrary interconnections across a multichip backplane. The arbitrary interconnection requires only one additional pass through the optical system. The architecture is experimentally validated with an optical module and a fiber coupled LED and detector array to simulate the smart pixel I/O placement in the backplane of the module. The architecture is further evaluated using VCSEL arrays and a CCD camera for resolution and registration measurements.

Proceedings ArticleDOI
08 Jun 1997
TL;DR: It is shown that if the set of input connection requests is ordered, the broadcast Clos network is non-blocking and route assignment can be done by using the rank of each connection request, and the proposed copy network is the generalization of Lee's architecture (1988).
Abstract: A generalized non-blocking copy network based on a broadcast Clos (1953) network is proposed. We show that if the set of input connection requests is ordered, the broadcast Clos network is non-blocking and route assignment can be done by using the rank of each connection request. Packet replications and routing are achieved by the generalized interval splitting algorithm. We show that the broadcast Clos network can be considered as the cascade combination of a reverse omega network and a broadcast omega network. The construction of copy network is therefore no longer limited to 2×2 switching elements. By recursively constructing the reverse omega and the omega networks using 2×2 switching elements, we show that the proposed copy network is the generalization of Lee's architecture (1988).

Proceedings ArticleDOI
09 Apr 1997
TL;DR: A newly proposed large-scale ATM switch called the cross-path switch has been shown to be capable of handling multirate traffic efficiently and it is observed that, to achieve the same throughput and loss requirement, the second architecture may require fewer switching elements than the first one.
Abstract: A newly proposed large-scale ATM switch called the cross-path switch has been shown to be capable of handling multirate traffic efficiently. We study two replication approaches to enhance the switch to support multicasting. The first approach replicates multicast cells at both the input and output stages, while the second one replicates cells at the input stage only. A feasible configuration for each scheme is considered and the effect of multicast traffic on the switch performance in terms of the throughput and cell loss probability is studied. We observed that, to achieve the same throughput and loss requirement, the second architecture may require fewer switching elements than the first one.

Journal ArticleDOI
TL;DR: A new analytical model on the blocking probability of the three-stage Clos (1953) network is presented that can more accurately describe the blocking behavior of the network and is consistent with the deterministic nonblocking condition.
Abstract: We present a new analytical model on the blocking probability of the three-stage Clos (1953) network. Due to the effect of approximations, a common problem with previously proposed analytical models is that they may not be very accurate in some cases. In particular, the blocking probability in these models contradicts the well-known deterministic nonblocking condition for the Clos network. The most notable feature of the newly proposed model is that it can more accurately describe the blocking behavior of the network and is consistent with the deterministic nonblocking condition.
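The inconsistency described in this abstract can be illustrated with a small sketch. For a symmetric three-stage Clos network with n inputs per first-stage switch and m middle switches, Clos's deterministic condition for strict-sense nonblocking operation is m ≥ 2n − 1. The classical Lee approximation, which treats interstage links as independently busy, gives a blocking probability of [1 − (1 − p)²]^m with p = a·n/m for input load a. That Lee's formula is among the earlier models the authors have in mind is an assumption here, and the function names are hypothetical.

```python
def is_strictly_nonblocking(n, m):
    """Clos (1953) condition for a symmetric three-stage network:
    n inputs per first-stage switch, m middle-stage switches."""
    return m >= 2 * n - 1


def lee_blocking_probability(n, m, a):
    """Lee's link-independence approximation of the blocking probability.

    a is the probability that an input line is busy; the load spreads
    evenly over the m interstage links, so each is busy with p = a*n/m.
    A path through one middle switch is free with probability (1 - p)**2.
    """
    if n < 1 or m < 1 or not 0.0 <= a <= 1.0:
        raise ValueError("need n >= 1, m >= 1 and 0 <= a <= 1")
    p = min(1.0, a * n / m)
    return (1.0 - (1.0 - p) ** 2) ** m


if __name__ == "__main__":
    n, m, a = 4, 7, 0.7  # m = 2n - 1, so the network cannot block
    print(is_strictly_nonblocking(n, m))
    print(lee_blocking_probability(n, m, a))  # still strictly positive
```

With m = 2n − 1 the network is provably nonblocking, yet the approximation returns a positive probability. This is the kind of contradiction with the deterministic condition that the abstract refers to.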

Proceedings ArticleDOI
08 Jun 1997
TL;DR: A high-performance buffered-Banyan switch which encompasses multiple input-queueing as its buffering strategy is presented and described, and simulation results are given to demonstrate its throughput, mean waiting time and cell-loss performance considering different switch and buffer sizes.
Abstract: Multistage interconnection networks (MINs) are very popular in ATM switching since they can achieve high-performance switching and are easy to implement and expand due to their modular design. In this paper we present and describe in detail a high-performance buffered-Banyan switch which encompasses multiple input-queueing as its buffering strategy. We call this switching architecture Dual-Banyan switch. Simulation results are given to demonstrate its throughput, mean waiting time and cell-loss performance considering different switch and buffer sizes. We further compare it to the simple, single-queue buffered Banyan network, assuming, for reasons of fairness, the same total buffer capacity with respect to uniform and non-uniform traffic patterns.

Journal ArticleDOI
TL;DR: Two multichannel time slot sorters which sort N^2 time-division multiplexed (TDM) optical inputs, arranged as N frames with N time slots per frame using O(N log^2 N) optical switch elements are proposed.
Abstract: The general time-space-time switching problem in telecommunications requires the use of multichannel time slot interchangers. We propose two multichannel time slot sorters which sort N^2 time-division multiplexed (TDM) optical inputs, arranged as N frames with N time slots per frame using O(N log^2 N) optical switch elements. The TDM optical inputs are sorted in place without expanding the space-time fabric into a space-division switch. The hardware components used are 2×2 optical switches (LiNbO₃ directional couplers) and optical delay lines connected in a feedforward fashion. Two space-time variants of the spatial odd-even merge algorithm are used to design the sorters. By maintaining the number of shift-exchange operations invariant at each stage, the proposed sorters use fewer switches than previously proposed sorters using switches with feedback line delays. The use of local control at each 2×2 switch makes the proposed sorters more practical for high-speed optical inputs than Benes-based time slot permuters with global control and high latency, which affects interframe distance. Both time slot sorters support pipelining of input frames and sorted outputs are available at each time slot after an initial frame delay. The proposed sorters find practical application in the time-domain equivalents of space-division, nonblocking, self-routing packet switches using the sort-banyan architecture, such as the Starlite switch, Sunshine switch, etc.
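The sorters in this abstract are derived from the spatial odd-even merge algorithm. As a reference point, here is a minimal sketch of the purely spatial version (Batcher's odd-even merge sort) expressed as a list of 2×2 compare-exchange elements. The space-time variants with optical delay lines proposed in the paper are not reproduced, and all function names are hypothetical.

```python
from itertools import product


def oddeven_merge(lo, hi, r):
    """Yield comparators that merge the subsequence lo, lo+r, lo+2r, ...
    (up to hi, inclusive), assuming its two halves are each sorted."""
    step = r * 2
    if step < hi - lo:
        yield from oddeven_merge(lo, hi, step)        # even-indexed elements
        yield from oddeven_merge(lo + r, hi, step)    # odd-indexed elements
        for i in range(lo + r, hi - r, step):
            yield (i, i + r)
    else:
        yield (lo, lo + r)


def oddeven_merge_sort(lo, hi):
    """Yield comparators that sort positions lo..hi (inclusive)."""
    if hi - lo >= 1:
        mid = lo + (hi - lo) // 2
        yield from oddeven_merge_sort(lo, mid)
        yield from oddeven_merge_sort(mid + 1, hi)
        yield from oddeven_merge(lo, hi, 1)


def build_network(n):
    """Comparator list for n inputs; n must be a power of two."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    return list(oddeven_merge_sort(0, n - 1))


def apply_network(network, values):
    """Run the values through the comparators (each acts like a 2x2 switch)."""
    out = list(values)
    for i, j in network:
        if out[i] > out[j]:
            out[i], out[j] = out[j], out[i]
    return out


def sorts_all_zero_one_inputs(n):
    """Zero-one principle: a comparator network that sorts every 0/1
    input of length n sorts every input of length n."""
    net = build_network(n)
    return all(apply_network(net, bits) == sorted(bits)
               for bits in product([0, 1], repeat=n))


if __name__ == "__main__":
    net = build_network(8)
    print(len(net), "comparators")
    print(apply_network(net, [7, 3, 5, 0, 6, 1, 4, 2]))
```

The comparator count of this construction grows as O(N log^2 N), which matches the order of switch elements quoted in the abstract, and each comparator needs only local control.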

Proceedings ArticleDOI
M. Jurczyk
11 Aug 1997
TL;DR: An analytical upper bound on the achievable network bandwidth under nonuniform traffic patterns is derived and compared to simulation results, and it is discussed how central memory buffered switch boxes can be efficiently changed into higher order HOL-blocking switch boxes through only minor changes in the switch box control path.
Abstract: Nonuniform traffic can substantially degrade the overall performance of multistage interconnection networks. In the literature, this degradation has been traced back to higher order head-of-line blocking (higher order HOL-blocking) effects within the network. This paper further elaborates on higher order HOL-blocking networks, on their performance under nonuniform traffic patterns, and on methods for efficiently implementing switch boxes to construct such networks. An analytical upper bound on the achievable network bandwidth under nonuniform traffic patterns is derived and compared to simulation results. Furthermore, it is discussed how central memory buffered switch boxes can be efficiently changed into higher order HOL-blocking switch boxes through only minor changes in the switch box control path. With those switch boxes, high network performance under nonuniform traffic patterns can be achieved with regular hardware effort.

Journal ArticleDOI
TL;DR: This paper proposes a new request combining based architecture to reduce the hot spot performance degradation in multistage interconnection networks, referred to as interconnection network front-end controller combining (IN-FEC).

Proceedings ArticleDOI
03 Nov 1997
TL;DR: This work proposes a high capacity switch network called the Multi-channel ATM Switch with Crossbar Oriented Network (MASCON), implemented as a custom ASIC, and constructs multi-stage interconnected networks (MINs) using the multi-channel concept to give greater throughput than is possible with MINs constructed from single-channel modules.
Abstract: We propose a high capacity switch network called the Multi-channel ATM Switch with Crossbar Oriented Network (MASCON). Implementing this switch network as a custom ASIC, we construct multi-stage interconnected networks (MINs) using the multi-channel concept to give greater throughput than that possible from MINs constructed from single-channel modules. Flexible multi-channel switching is supported in which any k ports may be grouped logically to form a higher bandwidth pipe. Multi-channel switching results in improved cell loss and delay performance due to the economy of scale in aggregating shared resources. MASCON is an internally non-blocking switch. Sixteen inputs/outputs are implemented with a 622 Mbps port speed at each port for a total module capacity of 10 Gbps. A fully shared buffer is incorporated into the design. MASCON's shared buffering can interact well with MIN input and output buffering through the use of our backpressure flow-control scheme.
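
The capacity figures quoted above can be checked with a short calculation. This is a sketch that assumes a k-port logical pipe simply aggregates k full port rates, ignoring any internal header overhead; the names `pipe_rate_mbps` and `module_capacity_gbps` are hypothetical.

```python
PORT_RATE_MBPS = 622   # per-port speed stated in the abstract
NUM_PORTS = 16         # inputs/outputs per MASCON module


def pipe_rate_mbps(k, port_rate_mbps=PORT_RATE_MBPS):
    """Aggregate rate of a logical pipe formed by grouping k ports.

    Assumption: the pipe bandwidth is k times the port rate, with no
    allowance for overhead.
    """
    if not 1 <= k <= NUM_PORTS:
        raise ValueError("k must be between 1 and the number of ports")
    return k * port_rate_mbps


# Grouping all 16 ports gives the total module capacity.
module_capacity_gbps = pipe_rate_mbps(NUM_PORTS) / 1000

if __name__ == "__main__":
    print(pipe_rate_mbps(4), module_capacity_gbps)
```

Sixteen ports at 622 Mbps give 9.952 Gbps, which the abstract rounds to 10 Gbps.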

Proceedings ArticleDOI
28 Jan 1997
TL;DR: The SSS-PBSF chip uses the PBSF connection structure, which obtains a higher bandwidth than a crossbar by connecting banyan networks in a 3D direction, and it solves the pin-limitation problem.
Abstract: A high speed switch is a critical component of multiprocessors. The multistage interconnection network (MIN) has been utilized as a switch for connecting processors and memory modules in multiprocessors. Unlike the crossbar, it consists of small switching elements, and provides a high bandwidth with relatively small hardware. Most traditional MINs are blocking networks, and packets are transferred in a store-and-forward manner between switching elements over bit-parallel (8-64 bit) lines. Since the width of the communication paths and the transfer method cause pin-limitation problems and a complicated structure, high density implementation and a high speed clock are not utilized. In order to solve these problems, we implemented the SSS-PBSF chip. This switch uses the PBSF connection structure, which can obtain a higher bandwidth than a crossbar by connecting banyan networks in a 3D direction. A simple serial synchronized (SSS) style control mechanism is adopted both for high speed operation and for solving the pin-limitation problem.