Showing papers on "Multistage interconnection networks" published in 1996


Proceedings ArticleDOI
Robert W. Horst
15 Apr 1996
TL;DR: A new class of scalable topologies, called "fractahedrons", is proposed for constructing large networks without introducing loops that could cause deadlocks; the topologies are deadlock-free and reduce the maximum link contention compared to other networks.
Abstract: This paper examines the problems of deadlock avoidance in multistage networks, and proposes a new class of scalable topologies for constructing large networks without introducing loops that could cause deadlocks. The new topologies, called "fractahedrons," are deadlock-free and reduce the maximum link contention compared to other networks. The use of fractahedral topologies is illustrated by various configurations of 6-port ServerNet routers. The properties of fractahedral networks are compared with networks configured as a mesh, hypercube or fat tree.

83 citations


Journal ArticleDOI
TL;DR: A novel architecture that offers the flexibility of implementing widely varying motion-estimation algorithms by employing multiple processing elements which communicate with multiple memory banks via a multistage interconnection network is described.
Abstract: This paper describes a novel architecture that offers the flexibility of implementing widely varying motion-estimation algorithms. To achieve real-time performance, we employ multiple processing elements (PEs) which communicate with multiple memory banks via a multistage interconnection network. Three different block-matching algorithms (full search, three-step search, and conjugate-direction search) have been mapped onto this architecture to illustrate its programmability. We schedule the desired operations and design the required data-flow in such a way that processor utilization is high and memory bandwidth is at a feasible level. The details regarding the flow of the pixel data and the scheduling and allocation of the desired ALU operations (which pixels are processed on which processors in which clock cycles) are described in the paper. We analyze the performance of the proposed architecture for several different interconnection networks and data-memory organizations.

59 citations


Proceedings ArticleDOI
24 Mar 1996
TL;DR: A parallel routing algorithm is developed by solving a set of Boolean equations which are derived from the connection requests and the symmetric structure of the Benes-Clos network and can be applied to the Clos network if the number of central modules is a power of two.
Abstract: A new parallel algorithm for route assignment in the Benes-Clos network is studied. In packet switching systems, switch fabrics must be able to provide internally conflict-free paths simultaneously and to accommodate packets requesting connections in real time as they arrive at the inputs. Most known sequential route assignment algorithms, such as the looping algorithm for Benes (1962) networks or Clos (1953) networks, are designed for circuit switching systems where the switching configuration can be rearranged at a relatively low speed. Most existing parallel routing algorithms are not practical for packet switching because they either assume the set of connection requests is a full permutation or fail to deal with output contentions among the set of input packets. We develop a parallel routing algorithm by solving a set of Boolean equations which are derived from the connection requests and the symmetric structure of the Benes network. Our approach can handle both partial permutations and the output contention problem easily. The time complexity of our algorithm is O(log² N), where N is the network size. Furthermore, we extend the algorithm and show that it can be applied to the Clos network if the number of central modules is a power of two.

49 citations


Proceedings ArticleDOI
15 Apr 1996
TL;DR: The adaptive source routing (ASR) method, a first attempt to combine adaptive routing and source routing, is described, along with a route generation algorithm that determines maximally adaptive routes in multistage networks.
Abstract: We describe the adaptive source routing (ASR) method which is a first attempt to combine adaptive routing and source routing methods. In ASR, the adaptivity of each packet is determined at the source processor. Every packet can be routed in a fully adaptive or partially adaptive or non-adaptive manner, all within the same network at the same time. We evaluate and compare performance of the proposed adaptive source routing networks and oblivious routing networks by simulations. We also describe a route generation algorithm that determines maximally adaptive routes in multistage networks.

42 citations


Proceedings ArticleDOI
23 Oct 1996
TL;DR: Simulation studies indicate that improvement in broadcast/multicast latency up to a factor of 4 is feasible using the new approach and this approach is able to implement multicast with reduced latency as the number of destinations increases beyond a certain number.
Abstract: This paper proposes a new approach for implementing fast multicast and broadcast in multistage interconnection networks (MINs) with multiport encoded multidestination worms. For a MIN with k×k switches and n stages, such worms use n header flits each. One flit is used for each stage of the network and it indicates the output ports to which a multicast message must be replicated. A single multiport encoded worm has the capability to cover a large number of destinations with a single communication startup. A switch architecture is proposed for implementing multidestination worms without deadlock. Grouping algorithms of varying complexity are presented to derive the associated multiport encoded worms for a multicast to an arbitrary set of destinations. Using these worms, a multinomial tree-based scheme is proposed to implement the multicast. This approach significantly reduces broadcast/multicast latency compared to schemes using unicast messages. Simulation studies indicate that improvement in broadcast/multicast latency up to a factor of 4 is feasible using the new approach. Interestingly, this approach is able to implement multicast with reduced latency as the number of destinations increases beyond a certain number.

40 citations
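
The multiport encoding described in the abstract above (one header flit per stage, each flit naming the output ports to replicate to) can be illustrated with a small sketch. It is not taken from the paper: the function names are hypothetical, each flit is modelled as a k-bit mask, and the sketch assumes that the port chosen at stage i is the i-th most significant base-k digit of the destination address.

```python
from itertools import product

def ports_in_flit(flit, k):
    """Return the output ports (0..k-1) whose bit is set in a k-bit header flit."""
    return [port for port in range(k) if (flit >> port) & 1]

def covered_destinations(flits, k):
    """Enumerate the destinations reached by one multiport encoded worm.

    Assumption (not from the source): stage i consumes flits[i], and the
    port taken at stage i is the i-th most significant base-k digit of
    the destination address.
    """
    digit_choices = [ports_in_flit(f, k) for f in flits]
    dests = []
    # The worm is replicated to every combination of per-stage ports.
    for digits in product(*digit_choices):
        addr = 0
        for d in digits:
            addr = addr * k + d
        dests.append(addr)
    return sorted(dests)

# 2x2 switches, 3 stages: both ports at stage 0, port 0 at stage 1, port 1 at stage 2.
print(covered_destinations([0b11, 0b01, 0b10], k=2))
```

Under these assumptions a single worm covers the Cartesian product of its per-stage port sets, so an arbitrary destination set generally has to be split across several worms, which is the role of the grouping algorithms mentioned in the abstract.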



Proceedings ArticleDOI
24 Mar 1996
TL;DR: An approach for the analysis of multistage switching networks with a variety of buffer sharing strategies is described; it allows non-uniform bursty traffic and has a computational complexity that is independent of the buffer size.
Abstract: This paper describes an approach for the analysis of multistage switching networks with a variety of buffer sharing strategies. The approach allows non-uniform bursty traffic, and it features a computational complexity which is independent of the buffer size. We decompose the complex shared buffer analysis problem into an equivalent dedicated buffer problem through an iterative series of buffer size approximations. Results are compared with simulations and are used to quantify the performance differences of several buffer sharing policies.

21 citations


Proceedings ArticleDOI
24 Jun 1996
TL;DR: This paper investigates the steady-state throughput of single-buffered multistage interconnection networks using the so-called relaxed blocking model, where a message is deleted if the receiving buffer is occupied, and gives tight upper and lower bounds on the steady-state distribution of I/O-sequences.
Abstract: Multistage networks (MINs) are used as interconnection structures in a large number of applications. Their performance is mainly determined by their communication throughput which, in most cases, has to be investigated by time-consuming simulations or approximated by simple models. In this paper, we investigate the steady-state throughput of single-buffered multistage interconnection networks using the so-called relaxed blocking model, where a message is deleted if the receiving buffer is occupied. We derive upper and lower bounds on the throughput of MINs of arbitrary height and show that the throughput of single-buffered networks is an order of magnitude higher than the throughput of non-buffered MINs. In detail, we show that the throughput is @(n/@) if n is the size of the network. Because the time-dynamic of finite buffered MINs defies each Markov or semi-Markov approach, we analyze the equilibrium situation of the network and give tight upper and lower bounds on the steady-state distribution of I/O-sequences.

14 citations


Journal ArticleDOI
TL;DR: A queueing model for performance analysis of finite-buffered multistage interconnection networks and various design decisions using this model are drawn with respect to delay, throughput, and system power.
Abstract: We present a queueing model for performance analysis of finite-buffered multistage interconnection networks. The proposed model captures network behaviour in an asynchronous communication mode and is based on realistic assumptions. A uniform traffic model is developed first and then extended to capture nonuniform traffic in the presence of a hot-spot. Throughput and delay are computed using the proposed model and the results are validated via simulation. The analysis is extended to predict performance of MIN-based multiprocessors. The effects of buffer length, switch size, and the maximum allowable outstanding requests on the system performance are discussed. Various design decisions using this model are drawn with respect to delay, throughput, and system power.

12 citations


Journal ArticleDOI
TL;DR: A general design technique for high-performance fault-tolerant networks in multiprocessor systems is applied to some specific networks, i.e., the CIN (cube interconnection network) and the d-dilated CIN, to show how to maximize the number of redundant paths.
Abstract: We propose a general design technique for high-performance fault-tolerant networks in multiprocessor systems. The proposed technique called extra link multistage interconnection network (ELMIN) can distribute the load evenly and tolerate faults by providing maximal independent paths at the expense of some additional hardware (extra links), which is much smaller than most of the networks proposed earlier. In this paper, the technique is applied to some specific networks, i.e., the CIN (cube interconnection network) and the d-dilated CIN, to show how to maximize the number of redundant paths. The routing algorithms for the ELMIN have the same simplicity as that of the original MIN. We analyze the performance of the proposed networks and also simulate them along with several others under the buffered and unbuffered packet switching environment. Both analysis and simulation show the high performance of the proposed networks without regard to the presence of faults.

12 citations


Journal ArticleDOI
TL;DR: A new characterization of the baseline network is presented and a heuristic is proposed for finding XOR-matrices by determining the constraints of each template-matrix and solving a set of simultaneous equations for each row.
Abstract: Finding general XOR-schemes to minimize memory and network contention for accessing arrays with arbitrary sets of data templates is presented. A combined XOR-matrix is proposed together with a necessary and sufficient condition for conflict-free access. We present a new characterization of the baseline network. Finding an XOR-matrix for combined templates is shown to be an NP-complete problem. A heuristic is proposed for finding XOR-matrices by determining the constraints of each template-matrix and solving a set of simultaneous equations for each row. Evaluation shows significant reduction of memory and network contention compared to interleaving and to static row-column-diagonals storage.

Journal ArticleDOI
TL;DR: The objective of this paper is to develop an accurate model for MINs using finite output buffered SEs and operating in the presence of nonuniform traffic patterns and it is shown that the proposed analytical model is much more accurate than existing models.
Abstract: The performance of Multistage Interconnection Networks (MINs) constructed from output buffered switching elements (SE) is higher than those having input buffered SEs. Many of the existing analytical models for output buffered MINs assume uniform traffic and infinite buffers at each output port of an SE. The models are not realistic because, in practice buffers are finite and the traffic may not be uniform. Moreover, because of simplifying assumptions, the models do not produce accurate results. For the purpose of network design and proper buffer dimensioning, it is important to develop an accurate analytical model under realistic traffic patterns and finite buffered SEs. The objective of this paper is to develop an accurate model for MINs using finite output buffered SEs and operating in the presence of nonuniform traffic patterns. It is shown that the proposed analytical model is much more accurate than existing models.

Journal ArticleDOI
TL;DR: The throughput of this approach in designing nonblocking networks is computed and it is shown that a fixed-path-routing buffered network will have a throughput even lower than that of an unbuffered network.
Abstract: Two implementation styles (buffered and unbuffered) have been used for constructing multistage interconnection networks for ATM switching. Conventional studies have shown that an unbuffered network, while having a simpler design, produces a lower throughput than a buffered network. But most of these studies, based on the assumption that each cell is routed independently (i.e. per-cell routing), ignored the out-of-sequence transmission problem of a buffered network in a virtual-channel environment. One way to keep the packet sequence for a buffered network without adding additional hardware is to fix the path for each virtual channel. We compute the throughput of this approach in designing nonblocking networks and compare it with that of the unbuffered approach. The base of our comparison is log_d(N,e,p) networks. The results show that a fixed-path-routing buffered network will have a throughput even lower than that of an unbuffered network.

Journal ArticleDOI
TL;DR: The results show that recycling is a practicable option for multicasting and that the delay does not increase drastically when cells are recycled.

Journal ArticleDOI
TL;DR: This work presents what it believes to be the first analytical model that allows calculation of the bandwidth of the general class of unbuffered, packet-switched, multipath, multistage networks.
Abstract: Because of their ability to tolerate faults, multipath, multistage networks provide useful interconnection schemes for large-scale parallel computers. However, the analytical models that have been used to analyze the performance of Banyan networks cannot be used to evaluate the performance of multipath networks. We present here what we believe to be the first analytical model that allows calculation of the bandwidth of the general class of unbuffered, packet-switched, multipath, multistage networks. The equations yielded by the model can be solved either exactly or by Monte Carlo approximation. The model agrees well with the results of a more complex simulation and provides a first step towards solution of the open problem of modeling of buffered, packet-switched, multipath, multistage networks.

Journal ArticleDOI
TL;DR: A basic algorithm is developed that balances the established connections among middle-stage switches by performing a small number of rearrangements per disconnection in the semi-rearrangeably nonblocking (SRN) operation of asymmetrical three-stage Clos (1953) switching networks in the multirate environment.
Abstract: We study the semi-rearrangeably nonblocking (SRN) operation of asymmetrical three-stage Clos (1953) switching networks in the multirate environment. We develop a basic algorithm that balances the established connections among middle-stage switches by performing a small number of rearrangements per disconnection. For this algorithm, we first derive general conditions under which rearranging from a single middle-stage switch is sufficient to achieve SRN operation. In the most general case, however, a sequence of rearrangements from several middle-stage switches may be required for SRN operation. An algorithm to achieve this sequence of rearrangements is presented and its correctness is proved. The minimum resource requirements to achieve SRN operation, in terms of the number of middle-stage switches, are derived for various cases.

Journal ArticleDOI
TL;DR: The multistage off-line method is presented, a new and rather natural way to model off-line packet routing problems, which reduces the problem of off-line packet routing to that of finding edge disjoint paths on a multistage graph.
Abstract: In this paper we present the multistage off-line method, a new and rather natural way to model off-line packet routing problems, which reduces the problem of off-line packet routing to that of finding edge disjoint paths on a multistage graph. The multistage off-line method can model any kind of routing pattern on any graph and can incorporate the size of the maximum queue allowed in any processor. The paths for the packets are computed by a greedy heuristic method. Based on the multistage off-line method, we study the permutation packet routing problem on two-dimensional meshes. We ran millions of experiments based on random generated data and, for all of our experiments, we were able to compute a solution of length equal to the maximum distance a packet had to travel, and thus, match the actual lower bound for each routing pattern.

Proceedings ArticleDOI
12 Aug 1996
TL;DR: This paper presents the construction of a new recirculating bitonic sorting network which reduces the O(N log² N) cost complexity of the original bitonic sorting network to O(N log N) while preserving the well known time complexity of O(log² N).
Abstract: This paper presents the construction of a new recirculating bitonic sorting network which reduces the O(N log² N) cost complexity of the original bitonic sorting network to O(N log N) while preserving the well known time complexity of O(log² N). Network communication is reduced by one half by leaving the N/2 even-parity keys in the local memory of each comparator.
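
For reference, the cost and time figures quoted for the original bitonic sorting network can be checked with a short sketch. This is the classic (non-recirculating) network, not the construction proposed in the paper; the function names are hypothetical and the number of inputs is assumed to be a power of two.

```python
def bitonic_stages(n):
    """Build the comparator stages of the classic bitonic sorting network.

    n must be a power of two. Each stage is a list of
    (low_index, high_index, ascending) comparators that can run in parallel.
    """
    stages = []
    k = 2
    while k <= n:              # size of the bitonic sequences being merged
        j = k // 2
        while j >= 1:          # comparator distance within the merge
            stage = []
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    stage.append((i, partner, ascending))
            stages.append(stage)
            j //= 2
        k *= 2
    return stages

def apply_network(stages, keys):
    """Run the keys through the network and return the sorted list."""
    keys = list(keys)
    for stage in stages:
        for i, p, ascending in stage:
            # Swap when the pair is out of order for its direction.
            if (keys[i] > keys[p]) == ascending:
                keys[i], keys[p] = keys[p], keys[i]
    return keys

stages = bitonic_stages(8)
print(len(stages), sum(len(s) for s in stages))
print(apply_network(stages, [5, 3, 7, 1, 0, 6, 2, 4]))
```

With N = 8 the network has log N (log N + 1) / 2 = 6 stages of N/2 = 4 comparators each, which is where the O(log² N) time and O(N log² N) comparator cost come from.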

Proceedings ArticleDOI
24 Mar 1996
TL;DR: The results imply that space-time tradeoffs are improved by using Banyans instead of dilated Banyan networks under either switch or stage control.
Abstract: A photonic switching network may be dilated in either space or time to establish crosstalk-free connections. Space-time tradeoffs are evaluated using an analytical model based on a Markov process. The probability that a new connection can be established without crosstalk is calculated by taking into consideration the traffic correlations between stages. The model is applicable to both Banyan and dilated Banyan networks under either switch or stage control. Our results imply that space-time tradeoffs are improved by using Banyans instead of dilated Banyans. If hardware cost is not a concern, a multi-plane Banyan network, which is more effective than a dilated Banyan, may be used.

Proceedings ArticleDOI
18 Nov 1996
TL;DR: An analytical model for the switch under uniform input traffic is studied and numerical examples demonstrate that the cell loss probability is very small when the expansion ratio is properly chosen.
Abstract: A large-scale modular multicast ATM switch based on a three-stage Clos network architecture is proposed and its performance is studied. Although it is a multipath network, the cell sequence is preserved because only output buffers are used in this architecture. The proposed multicast switch has the following advantages: (1) it is modular and suitable for large scale deployment; (2) no dedicated copy network is required since copying and switching are performed simultaneously; (3) two-stage packet replication is used which gives a maximum fan-out of n²; (4) translation tables are distributive which gives manageable table sizes; (5) high throughput performance for both uniform and nonuniform input traffic; and (6) a self-routing scheme is used. An analytical model for the switch under uniform input traffic is studied and numerical examples demonstrate that the cell loss probability is very small when the expansion ratio is properly chosen.

Proceedings ArticleDOI
23 Jun 1996
TL;DR: An improved analysis of ATM switching architectures adopting a replicated banyan interconnection network provided with dedicated input and output queues, one per switch inlet and outlet is described.
Abstract: This paper describes an improved analysis of ATM switching architectures adopting a replicated banyan interconnection network provided with dedicated input and output queues, one per switch inlet and outlet. Two different plane selection policies are studied, random selection and alternate sharing. The analysis, which assumes that the network is loaded by uniform traffic, always provides conservative results whereas known models are less accurate and give optimistic traffic results.

Journal ArticleDOI
TL;DR: An optimal algorithm is presented which solves the problem of partitioning an arbitrary permutation into a minimum number of groups such that conflict-free paths for all source-destination pairs in each group can be established on an omega network.
Abstract: It is difficult to partition an arbitrary permutation into a minimum number of groups such that conflict-free paths for all source-destination pairs in each group can be established on an omega network. Based on linear algebra theory, this paper presents an optimal algorithm which solves this problem for the LC class of permutations on a large class of multi-stage networks. This algorithm extends the previous result which deals with the BPC class of permutations on the omega network.
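
The partitioning problem stated in the abstract above can be made concrete with a small sketch. It is not the paper's optimal algorithm: it uses standard destination-tag routing on an omega network with 2×2 switches and a simple first-fit grouping, which gives a valid but not necessarily minimum partition. All function names are hypothetical.

```python
def omega_links(src, dst, n):
    """Return the (stage, link) pairs used by src -> dst on a 2**n-input omega network.

    Destination-tag routing: each stage applies a perfect shuffle and then
    the switch output is selected by the next destination bit (MSB first).
    """
    mask = (1 << n) - 1
    pos = src
    links = []
    for stage in range(n):
        bit = (dst >> (n - 1 - stage)) & 1
        pos = ((pos << 1) & mask) | bit   # shuffle, then overwrite the low bit
        links.append((stage, pos))
    return links

def greedy_partition(perm, n):
    """First-fit split of a permutation into conflict-free groups (not guaranteed minimal)."""
    groups = []  # each entry: (set of links in use, list of (src, dst) pairs)
    for src, dst in enumerate(perm):
        links = set(omega_links(src, dst, n))
        for used, pairs in groups:
            if used.isdisjoint(links):
                used |= links
                pairs.append((src, dst))
                break
        else:
            groups.append((links, [(src, dst)]))
    return [pairs for _, pairs in groups]

identity = list(range(8))
bit_reversal = [0, 4, 2, 6, 1, 5, 3, 7]
print(len(greedy_partition(identity, 3)))
print(len(greedy_partition(bit_reversal, 3)))
```

Two source-destination pairs conflict when they need the same link after the same stage; the identity permutation passes in a single group, while bit reversal needs more than one.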

Journal ArticleDOI
TL;DR: It is shown that multipass routing may degrade the system performance if the communication loads are not well balanced among processors; congestion may appear in some processors and the useful communication bandwidth is badly affected.

Journal ArticleDOI
TL;DR: This paper considers the problem of performability of a class of multistage interconnection networks, namely, the Clos network, and examines the performance and availability of gracefully degradable fault-tolerant multiprocessor systems.

Journal ArticleDOI
TL;DR: The proposed graph-theoretic method is found to be simple and computationally efficient compared to the existing techniques, and therefore can be applied for reliability evaluation of other large interconnection networks used in parallel computing systems.

Journal ArticleDOI
TL;DR: This paper studies the class of non-blocking multistage interconnection networks for ATM switching architectures in which the packet storage capability is obtained through a shared queueing technique and proposes three classes of selection algorithms for this network.
Abstract: The paper studies the class of non-blocking multistage interconnection networks for ATM switching architectures in which the packet storage capability is obtained through a shared queueing technique. The basic components of this architecture are a sorting network and a routing network; a recirculation network is also provided that accomplishes the packet shared queueing. The issues of congestion control and fairness in bandwidth allocation by this network are here investigated. We point out that the key component to be designed in order to fulfill the congestion control and fairness requirements is the selection algorithm of the packets that cannot be stored in the shared queue. Three classes of selection algorithms are proposed and compared in terms of possible hardware implementation and traffic performance.

Proceedings ArticleDOI
11 Jun 1996
TL;DR: It is first shown that the structures of the incomplete WK-recursive networks are conveniently represented with multistage graphs and shown that they are Hamiltonian if their connectivities are greater than one.
Abstract: WK-recursive networks, which were originally proposed by Vecchia and Sanges (1988), have suffered from the rigorous restriction of the number of nodes. Like other incomplete networks, incomplete WK-recursive networks are proposed to relieve this restriction. It is first shown that the structures of the incomplete WK-recursive networks are conveniently represented with multistage graphs. This representation can provide a uniform look at the incomplete WK-recursive networks. Using this they: (1) compute the connectivities of the incomplete WK-recursive networks; (2) show that they are Hamiltonian if their connectivities are greater than one; and (3) propose a sufficient and necessary condition for a Hamiltonian path in an incomplete WK-recursive network with connectivity 1.

Proceedings ArticleDOI
18 Nov 1996
TL;DR: A new switch architecture called extended baseline networks (EBN) is proposed for nonblocking photonic switching and most of the characteristics are shown to be better than those of other well-known networks fabricated on a single Ti:LiNbO₃ substrate.
Abstract: A new switch architecture called extended baseline networks (EBN) is proposed for nonblocking photonic switching. This switch is a space-division multistage network using 2×2 optical switch elements which may be directional couplers fabricated on titanium diffused lithium niobate (Ti:LiNbO₃) substrates. A recursive definition for the proposed architecture is presented. Some properties including the number of switch elements required, blocking characteristics, number of crossovers, system attenuation, and signal-to-noise ratio (SNR) are derived and analyzed. Most of the characteristics are shown to be better than those of other well-known networks fabricated on a single Ti:LiNbO₃ substrate.

Journal Article
TL;DR: Performance degradation under nonuniform traffic is traced to a blocking effect termed higher order Head-of-Line-blocking (HOLk-blocking), and it is shown that network bandwidth and packet delay improve under nonuniform traffic with increasing HOL-blocking order of a network.
Abstract: Nonuniform traffic can degrade the overall performance of multistage interconnection networks substantially. In this paper, this performance degradation is traced back to blocking effects that are not present under uniform traffic patterns within a network. This blocking phenomenon is not mentioned in the literature and is termed higher order Head-of-Line-blocking (HOLk-blocking) in this paper. Methods to determine the HOL-blocking order of multistage networks in order to classify the networks are presented. The performance of networks under hot-spot traffic as a function of their HOL-blocking characteristics is studied by simulation. It is shown that network bandwidth and packet delay improve under nonuniform traffic with increasing HOL-blocking order of a network.

Proceedings ArticleDOI
TL;DR: In this article, the authors presented holographic polarization-selective elements with electro-optic halfwave plates, which can be combined to implement star couplers to distribute equal optical power from each input channel to all output channels.
Abstract: Highly polarization-selective holographic elements can be achieved with suitable designs. The presented holographic polarization-selective elements are compact and lightweight, and the feature of normally incident and output coupling provides better flexibility and easier alignment for system applications. With suitable designs and arrangements, these elements can be combined to implement star couplers to distribute equal optical power from each input channel to all output channels. In addition, based on our holographic polarization-selective elements with electro-optic halfwave plates, holographic polarization-dependent and polarization-independent optical switches are introduced. The structures to use these switches in various compact 3D multistage interconnection networks for reconfigurable interconnections and in self-healing rings for network service restoration are presented. © (1996) SPIE, The International Society for Optical Engineering.