Showing papers on "Multistage interconnection networks" published in 1999


Journal ArticleDOI
TL;DR: Although optical MINs hold great promise and have demonstrated advantages over their electronic counterparts, they also introduce new challenges such as how to deal with the unique problem of avoiding crosstalk in the SEs.
Abstract: Optical interconnections for communication networks and multiprocessor systems have been studied extensively. A basic element of optical switching networks is a directional coupler with two inputs and two outputs or switching elements (SEs). Depending on the control voltage applied to it, an input optical signal is coupled to either of the two outputs, setting the SE to either the straight or cross state. A class of topologies that can be used to construct optical networks is multistage interconnection networks, which interconnect their inputs and outputs via several stages of SEs. Although optical MINs hold great promise and have demonstrated advantages over their electronic counterparts, they also introduce new challenges such as how to deal with the unique problem of avoiding crosstalk in the SEs. In this article we survey the research carried out, including major challenges encountered and approaches taken, on optical MINs.

88 citations


Journal ArticleDOI
TL;DR: A modified dilated Benes (1965) network composed of directional couplers is proposed to improve the signal-to-noise ratio (SNR) characteristics of dilated Benes networks, and a new estimate shows that the SNR of dilated Benes networks is much worse than previously known.
Abstract: A modified dilated Benes (1965) network composed of directional couplers is proposed. This structure is introduced to improve the signal-to-noise ratio (SNR) characteristics of dilated Benes networks. A new estimation of the SNR for dilated Benes networks is derived, and it is shown that this SNR is much worse than that previously known. The SNR for modified dilated Benes networks is estimated and compared to dilated Benes and other network architectures. Some other properties including the number of switching elements required, number of crossovers, and system attenuation are also derived and analyzed. Most of the characteristics are shown to be similar to dilated Benes networks and better than those of other well-known networks fabricated in Ti:LiNbO$_3$.

35 citations


Journal ArticleDOI
TL;DR: A new self-routing multicast network which can realize arbitrary multicast assignments between its inputs and outputs without any blocking is proposed.
Abstract: In this paper, we propose a design for a new self-routing multicast network which can realize arbitrary multicast assignments between its inputs and outputs without any blocking. The network design uses a recursive decomposition approach and is based on the binary radix sorting concept. All functional components of the network are reverse banyan networks. Specifically, the new multicast network is recursively constructed by cascading a binary splitting network and two half-size multicast networks. The binary splitting network, in turn, consists of two recursively constructed reverse banyan networks. The first reverse banyan network serves as a scatter network and the second reverse banyan network serves as a quasisorting network. The advantage of this approach is that it provides a way to self-route multicast assignments through the network and makes it possible to reuse part of the network to reduce the network cost. The new multicast network compares favorably with previously proposed multicast networks. It uses $O(n \log^2 n)$ logic gates, and has $O(\log^2 n)$ depth and $O(\log^2 n)$ routing time, where the unit of time is a gate delay. By reusing part of the network, the feedback implementation of the network can further reduce the network cost to $O(n \log n)$.

33 citations


Journal ArticleDOI
TL;DR: This work proposes a high-performance large-scale ATM switch that deals with the link contention problem of bursty Internet traffic. Its design relies on the property that all the SEs of the Banyan network are arranged in a topologically regular pattern, which is formulated and proved through an algebraic formalism.
Abstract: Because Internet traffic, which will be the major traffic of broadband integrated services digital networks, is bursty, there is a higher probability that multiple cells arriving simultaneously at a switching element through different incoming links within the multistage switching network will have to be forwarded along the same outgoing link. We propose a high-performance large-scale ATM switch that deals with this link contention problem. It is a new unbuffered augmented Banyan network using fully adaptive self-routing control: the deflection self-routing Banyan network. To utilize all the links of the network as alternate paths, we employ the deflection-routing algorithm in each switching element, such that cells failing to get selected for the intended link are sent along different links, in the hope that they later return, or detour around the contended link and continue their journey to the destination. Cells are never dropped within the switching network, even though the switch has no multiple-cell buffers. The proposed routing is as simple as that of the generic Banyan network, and all the switch elements (SEs) have a uniform structure. To design the proposed network and its self-routing, we use the topological property that all the SEs of the Banyan network are arranged in a regular pattern. We formulate and prove these properties through an algebraic formalism. We also ran a performance analysis to provide a quantitative comparison against the Banyan network and the replicated Banyan networks. As a result, we show that the new network has far better performance and scalability than the other networks.
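The link contention described in this abstract can be illustrated with a minimal sketch of plain destination-tag self-routing. The sketch assumes an omega network (one member of the banyan class) with $N = 2^n$ ports; it does not implement the paper's deflection routing, and the function names `omega_route` and `first_contention` are hypothetical.

```python
def omega_route(src: int, dst: int, n: int) -> list[int]:
    """Return the output-link index a cell occupies after each of the n stages.

    Assumes an omega network: every stage is a perfect shuffle followed by a
    column of 2x2 switching elements steered by one destination bit.
    """
    mask = (1 << n) - 1
    pos = src
    links = []
    for stage in range(n):
        # Perfect shuffle: rotate the n-bit position left by one bit.
        pos = ((pos << 1) | (pos >> (n - 1))) & mask
        # The SE overwrites the low bit with the destination bit (MSB first).
        bit = (dst >> (n - 1 - stage)) & 1
        pos = (pos & ~1) | bit
        links.append(pos)
    return links


def first_contention(cells: list[tuple[int, int]], n: int):
    """Return (stage, link) of the first link requested by two cells, else None."""
    routes = [omega_route(src, dst, n) for src, dst in cells]
    for stage in range(n):
        seen = set()
        for route in routes:
            if route[stage] in seen:
                return (stage, route[stage])
            seen.add(route[stage])
    return None
```

In an unbuffered network, a cell that loses such a contention must be dropped, blocked, or, as in the scheme summarized above, deflected onto another link.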

24 citations


Journal ArticleDOI
TL;DR: This article categorizes, reviews, and compares existing strategies for reducing "tree saturation" effects of memory contention in shared-memory multiprocessor systems that use multistage interconnection networks.
Abstract: In shared-memory multiprocessor systems that use multistage interconnection networks, memory contention can produce "tree saturation", which ultimately degrades system performance. This article categorizes, reviews, and compares existing strategies for reducing these effects.

23 citations


Patent
17 Feb 1999
TL;DR: A method for all-to-all personalized exchange for a class of multistage interconnecting networks (MINs) is presented, based on a Latin square matrix corresponding to a set of admissible permutations of a multistage interconnecting network.
Abstract: Disclosed is a method for all-to-all personalized exchange for a class of multistage interconnecting networks (MINs). The method is based on a Latin square matrix corresponding to a set of admissible permutations of a multistage interconnecting network. Disclosed are first and second methods for constructing a Latin square matrix used in the personalized exchange technique. Also disclosed is a generic method for decomposing all-to-all personalized exchange patterns into admissible permutations to form the Latin square matrix for self-routing networks which are a subclass of the MINs.
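As a rough illustration of why a Latin square suits all-to-all personalized exchange, the sketch below builds one from bitwise XOR and turns its rows into communication steps. This is not the patent's construction: the XOR square, the function names, and the schedule layout are assumptions, and whether each row is an admissible permutation depends on the particular MIN.

```python
def xor_latin_square(n_bits: int) -> list[list[int]]:
    """Build a 2^n_bits x 2^n_bits Latin square with entry (i, j) = i XOR j."""
    size = 1 << n_bits
    return [[i ^ j for j in range(size)] for i in range(size)]


def is_latin_square(square: list[list[int]]) -> bool:
    """Check that every row and every column is a permutation of 0..size-1."""
    size = len(square)
    symbols = set(range(size))
    rows_ok = all(set(row) == symbols for row in square)
    cols_ok = all({square[i][j] for i in range(size)} == symbols
                  for j in range(size))
    return rows_ok and cols_ok


def exchange_schedule(square: list[list[int]]) -> list[list[tuple[int, int]]]:
    """Step t is the list of (source, destination) pairs taken from row t."""
    return [[(src, dst) for src, dst in enumerate(row)] for row in square]
```

Because each row is a permutation, no output port receives two messages in the same step; because each column is a permutation, every source reaches every destination exactly once over the $N$ steps.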

20 citations


Proceedings ArticleDOI
12 Apr 1999
TL;DR: The worst case performance of the Earliest Due Date (EDD) algorithm when applied to packet scheduling in distributed systems is investigated, establishing that EDD is always able to produce a schedule meeting this objective whenever the so-called link utilization is no more than 1/2.
Abstract: In this paper we investigate the worst case performance of the Earliest Due Date (EDD) algorithm when applied to packet scheduling in distributed systems. We assume that the processing elements communicate via a multistage interconnection network, and that the system is synchronous. When two or more packets are simultaneously sent over the same input port, or received through the same output port, the packets undergo a collision and are damaged, needing to be retransmitted later. This causes a performance degradation in terms of both throughput and delay. So, collisions must be avoided. The special type of traffic to be scheduled by Earliest Due Date is periodic hard real-time traffic, and the objective is to schedule all the packets within their individual due dates. We establish that EDD is always able to produce a schedule meeting this objective whenever the so-called link utilization is no more than 1/2, and show that this worst case performance bound is tight. Such a bound can be effectively used as a feasibility test before actually running the algorithm.
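The feasibility test mentioned at the end of this abstract might look like the sketch below. The abstract does not define link utilization, so the definition used here is an assumption: each periodic stream is a tuple `(src_port, dst_port, packets, period)`, and the utilization of a port is the sum of `packets / period` over the streams that use it. The function names are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction


def link_utilization(streams: list[tuple[int, int, int, int]]) -> Fraction:
    """Return the largest per-port utilization over all input and output ports.

    Assumed definition: a stream sending `packets` packets every `period`
    slots loads both its input port and its output port by packets/period.
    """
    load = defaultdict(Fraction)
    for src, dst, packets, period in streams:
        share = Fraction(packets, period)
        load[("in", src)] += share
        load[("out", dst)] += share
    return max(load.values(), default=Fraction(0))


def passes_edd_bound(streams: list[tuple[int, int, int, int]]) -> bool:
    """Sufficient test from the abstract: utilization no greater than 1/2."""
    return link_utilization(streams) <= Fraction(1, 2)
```

A stream set that fails this check is not necessarily unschedulable; the bound only guarantees success when it holds.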

18 citations


Journal ArticleDOI
01 Jul 1999-Networks
TL;DR: A more general class of networks, called (m + 1)-stage d-nary bit permutation networks, is studied, and the equivalence of such networks is characterized by a sequence of positive integers.
Abstract: In recent years, many multistage interconnection networks using 2 x 2 switching elements have been proposed for parallel architectures. Typical examples are baseline networks, banyan networks, shuffle-exchange networks, and their inverses. As these networks are blocking, such networks with extra stages have also been studied extensively. These include Benes networks and Δ ⊕ Δ' networks. Recently, Hwang et al. studied k-extra-stage networks, which are a generalization of the above networks. They also investigated the equivalence issue among some of these networks. In this paper, we study a more general class of networks, which we call (m + 1)-stage d-nary bit permutation networks. We characterize the equivalence of such networks by a sequence of positive integers.

16 citations


Proceedings ArticleDOI
21 Mar 1999
TL;DR: The design rules to configure a minimum-cost rearrangeable photonic switching network where the most serious cause of crosstalk (interference inside the switching elements) is eliminated are found.
Abstract: The substantial growth expected in the near future in the demand for transmission bandwidth in a dynamic and flexible transport network makes the development of all-optical digital crossconnect architectures very important. We consider here in particular the class of rearrangeable non-blocking space-division switching fabrics configured as multistage structures built with very simple optical switching elements. Rearrangeable networks look more attractive today than strict-sense non-blocking networks, since the former have a lower complexity and are compatible with the data loss rate required in a circuit-switched network even with the current technology of optical switching devices. We address one of the most important problems arising in a multiple space-channel optical system such as a photonic switching fabric, namely the build-up of crosstalk noise on a certain channel due to interference with other signals inside the system. We find here the design rules to configure a minimum-cost rearrangeable photonic switching network where the most serious cause of crosstalk (interference inside the switching elements) is eliminated.

13 citations


Journal ArticleDOI
TL;DR: Two analytical decomposition techniques for computing the transient state space solution of large stochastic PN (SPN) models of MINs and HINs are proposed and shown that the suggested techniques give results quite close to those obtained by the exact method with an enormous saving in computation time and memory usage.

12 citations


Journal ArticleDOI
TL;DR: The results indicate that the proposed hardware-based ATBM scheme reduces the communication latency when compared to the software multicasting approach proposed earlier.
Abstract: Multicast operation is an important operation in multicomputer communication systems and can be used to support several collective communication operations. A significant performance improvement can be achieved by supporting multicast operations at the hardware level. We propose an asynchronous tree-based multicasting (ATBM) technique for multistage interconnection networks (MINs). The deadlock issues in tree-based multicasting in MINs are analyzed first to examine the main causes of deadlocks. An ATBM framework is developed in which deadlocks are prevented by serializing the initiations of tree operations that have a potential to create deadlocks. These tree operations are identified through a grouping algorithm. The ATBM approach is not only simple to implement but also provides good communication performance using minimal overheads in terms of additional hardware requirements and synchronization delay. Using the ATBM framework, algorithms are developed for both unidirectional and bidirectional multistage interconnection networks. The performances of the proposed algorithms are evaluated through simulation experiments. The results indicate that the proposed hardware-based ATBM scheme reduces the communication latency when compared to the software multicasting approach proposed earlier.

Proceedings ArticleDOI
06 Jun 1999
TL;DR: A fast cell selection method is proposed for input-buffered Banyan networks with an internal speed twice that of the external links, avoiding slow cell selection and costly network setup in multistage network based input-buffered ATM switches.
Abstract: Multistage network based input-buffered ATM switches, which have been studied extensively, are cheaper compared to crossbar designs but suffer from elaborate cell selection methods or expensive network setup. In this paper, a fast cell selection method is proposed to avoid slow cell selection and costly network setup for these designs. In particular, we propose network hardware specific selection techniques for cell selection in input buffered Banyan network with an internal speed twice that of the external links. Our simulation results show that cell selection by looking at up to 10 cells in each input queue for switch sizes up to N=64 yields 95% or higher switch utilization.

Proceedings ArticleDOI
01 Jul 1999
TL;DR: The performance is shown to increase significantly when the replicated PIPN is used which supports the idea of using this switch as a new high-performance ATM switch.
Abstract: Banyan networks are commonly used as interconnection structures in ATM switches. This paper is concerned with the replication technique which was applied to the standard banyan networks. We apply this technique to the plane interconnected parallel network (PIPN) which is a switch introduced previously as a better banyan-based interconnection structure. The normalized throughput of unbuffered and buffered replicated PIPN is analyzed analytically under uniform traffic model. We apply the simulation technique to verify the analytical results under the uniform traffic model and to study the performance of different heterogeneous traffic models. The performance is shown to increase significantly when the replicated PIPN is used which supports the idea of using this switch as a new high-performance ATM switch.

Journal ArticleDOI
TL;DR: This article presents an analytical model for the performance of buffered banyan networks which support multicast communication, and indicates that the throughput of a multicast banyan network is generally higher than that of a unicast banyan network.

Journal ArticleDOI
TL;DR: An analytical model for the routing blocking probability of the Clos network is presented which incorporates the probability of interstage link failure to allow for a more realistic and useful determination of the approximation of blocking probability.
Abstract: The well-known Clos network has been extensively used for telephone switching, multiprocessor interconnection and data communications. Much work has been done to develop analytical models for understanding the routing blocking probability of the Clos network. However, none of the analytical models for estimating the blocking probability of this type of network have taken into account the very real possibility of the interstage links in the network failing. In this paper, we consider the routing between arbitrary network inputs and outputs in the Clos network in the presence of interstage link faults. In particular, we present an analytical model for the routing blocking probability of the Clos network which incorporates the probability of interstage link failure to allow for a more realistic and useful determination of the approximation of blocking probability. We also conduct extensive simulations to validate the model. Our analytical and simulation results demonstrate that for a relatively small interstage link failure probability, the blocking behavior of the Clos network is similar to that of a fault-free network, and indicate that the Clos network has a good fault-tolerant capability. The new integrated analytical model can guide network designers in the determination of the effects of network failure on the overall connecting capability of the network and allows for the examination of the relationship between network utilization and network failure.
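To give a feel for how interstage link failures enter a blocking estimate, the sketch below uses Lee's classical link-independence approximation for a three-stage Clos network, extended in the simplest possible way. It is not the analytical model of the paper: the parameters `a` (load per input link), `n` (inputs per first-stage switch), `m` (number of middle switches), and `f` (independent failure probability of each interstage link) are assumptions, as is the function name.

```python
def lee_blocking(a: float, n: int, m: int, f: float = 0.0) -> float:
    """Lee-style blocking estimate for a three-stage Clos network.

    A path through one middle switch needs two interstage links. Each link is
    assumed busy with probability p = a * n / m and failed with probability f,
    independently, so it is usable with probability (1 - p) * (1 - f).
    The connection is blocked when all m candidate paths are unusable.
    """
    p = a * n / m
    if not (0.0 <= p <= 1.0 and 0.0 <= f <= 1.0):
        raise ValueError("link occupancy and failure probability must be in [0, 1]")
    usable = (1.0 - p) * (1.0 - f)
    return (1.0 - usable ** 2) ** m
```

Under these assumptions, a small `f` changes the estimate only slightly, which is at least qualitatively in line with the abstract's observation that blocking behavior stays close to the fault-free case for small link failure probabilities.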

Journal ArticleDOI
TL;DR: A cut-based technique is proposed to compute bounds on the full access probability of an extra stage shuffle exchange network (ESEN) and a wrap-around inverse banyan network (WIBN); the resulting bounds are tighter than those obtained with existing techniques.
Abstract: This paper proposes a cut-based technique to compute bounds on the full access probability of an extra stage shuffle exchange network (ESEN) and a wrap-around inverse banyan network (WIBN). Note that the problem of finding an exact full access probability is known to be NP-hard. Our results obtain tighter bounds as compared to those using existing techniques. For a small size multistage interconnection network, it deviates less from the exact value. We also notice that our proposed lower bound is conservative. Further, the lower bound is important as it suggests that a network is at least this much reliable.

Journal ArticleDOI
TL;DR: A preliminary low-level design and partial experimental implementation of a multi-credit RSFQ network switching node with an estimated throughput of $7 \cdot 10^{10}$ 85-bit-parallel packets per second, service latency of 109 ps, and dissipated power of 4.6 mW is presented.
Abstract: This work is part of a project to design a petaflops-scale computer using a hybrid technology multi-threaded architecture (HTMT). A high-bandwidth low-latency switching network (CNET) based on the RSFQ logic/memory family comprises the core of the superconductor part of the HTMT system, interconnecting 4,096 processors. We present a preliminary low-level design and partial experimental implementation of a multi-credit RSFQ network switching node with an estimated throughput of $7 \cdot 10^{10}$ 85-bit-parallel packets per second, service latency of 109 ps, and dissipated power of 4.6 mW.

Journal ArticleDOI
TL;DR: A new switching mechanism for MINs called unit step buffering (USB) is proposed which significantly improves the network performance and does not require any additional hardware or operational overhead.
Abstract: Multistage interconnection networks (MINs) have been widely used for parallel computer systems, and also recognized as an efficient switching fabric for digital communication. In this paper, we propose a new switching mechanism for MINs called unit step buffering (USB) which significantly improves the network performance. Here each cell is allowed to move only one buffer entry position using short network cycle. The proposed USB scheme is compared to the traditional scheme by analytical modeling and computer simulation. They reveal that throughput and delay are improved about 60%-80% for practical size MINs with reasonable traffic in the asynchronous transfer mode (ATM) switching environment. Improvement on parallel computer systems with larger size packets is more significant at about 100%. More importantly, the scheme does not require any additional hardware or operational overhead.

Journal ArticleDOI
TL;DR: A broadcast 2 x 2 switch is presented, an extension of the standard bypass-exchange switch that allows for the broadcasting of the inputs in addition to the conventional modes.
Abstract: Conventional switching systems connect each input channel to one output channel. Broadcasting systems permit the connection of each input channel to more than a single output. A broadcast 2 x 2 switch is presented. This switch is an extension of the standard bypass-exchange switch. It allows for the broadcasting of the inputs in addition to the conventional modes. Multistage interconnection networks can be constructed with this switch as the basic building block. Such networks will extend their capabilities, allowing for broadcasting features. Three implementations of this type are described, and experimental results for the 2 x 2 switch are also presented.

Journal ArticleDOI
TL;DR: A scalable pipelined asynchronous transfer mode (ATM) switch architecture employing a family of dilated banyan (DB) networks is presented together with its complexity analysis and performance, and it is shown that performance does not degrade under ATM traffic with temporal and spatial burstiness generated using the on-off model.
Abstract: In the pipeline banyan (PB), the reservation cycle in the control plane is made several times faster than payload transmission in the data plane. This enables pipelining multiple banyans. It is observed that the ratio of throughput to switching delay (service rate) is relatively low in the PB due to the banyan. For this, we present a scalable pipelined asynchronous transfer mode (ATM) switch architecture employing a family of dilated banyan (DB) networks together with their complexity analysis and performance. A DB can be engineered between two extremes: (1) a low-cost banyan with internal and external conflicts, or (2) a high-cost conflict-free fully connected network with multiple outlets. Between the two extremes lies a family of DBs having different switching delays and throughputs. Increasing the dilation degree reduces path conflicts, which produces a noticeable increase in service rate due to an increase in throughput and a decrease in path delay. Compared to PB, the pipelined dilated banyan (PDB) requires a smaller number of data planes for the same throughput, or provides higher throughput for a given number of data planes. Simulation of PDB is carried out under uniform traffic and simulated ATM traffic. We study the switch performance while varying the load, buffer size, and number of data planes. To analyze the robustness of the switch, we show that performance does not degrade under ATM traffic with temporal and spatial burstiness generated using the on-off model. The PDB is scalable with respect to service rate and can be engineered with respect to: (1) cell loss rate; (2) hardware resources; (3) size of buffers; (4) switching delays; and (5) delay incurred to higher priority traffic. The PDB can deliver up to 3.5 times the service rate of the PB with only a linear increase in hardware cost.

Proceedings ArticleDOI
05 Dec 1999
TL;DR: The necessary and sufficient condition for a strictly non-blocking three-stage network is found and demonstrated, and it is shown that, in the worst case of unrestricted fan-out and with a simple assumption on the path selection algorithm, the complexity of an $N \times N$ multicast network can be limited to $O(N^{5/3})$.
Abstract: This paper deals with non-blocking properties of multicast three-stage interconnection networks. The necessary and sufficient condition for a strictly non-blocking three-stage network is found and demonstrated. This condition represents a real innovative result with respect to those already available in the literature that only refer to sufficient bounds. Moreover, it is also demonstrated that, in the worst case of unrestricted fan-out and with a simple assumption on the path selection algorithm, the complexity of an $N \times N$ multicast network can be limited to $O(N^{5/3})$. Such complexity is lower than that relevant to a crossbar network of the same size.

Journal ArticleDOI
01 Sep 1999
TL;DR: From empirical evaluation with some application programs, it appears that the latency and synchronization overhead of the SSS-MIN are tolerable, and the bandwidth of the SSS-MIN is sufficient.
Abstract: Simple Serial Synchronized (SSS)-Multistage Interconnection Network (MIN) is a novel MIN architecture for connecting processors and memory modules in multiprocessors. Synchronized bit-serial communication simplifies the structure/control, and also solves the pin-limitation problem. Here, design, implementation, and evaluation of a multiprocessor prototype called the SNAIL with the SSS-MIN are presented. The heart of SNAIL is a prototype 1 μm CMOS SSS-MIN gate array chip which exchanges packets from 16 inputs at a 50 MHz clock speed. The message combining is implemented with only a 20% increase in hardware. From empirical evaluation with some application programs, it appears that the latency and synchronization overhead of the SSS-MIN are tolerable, and the bandwidth of the SSS-MIN is sufficient.

Proceedings ArticleDOI
24 May 1999
TL;DR: A model to study the performance guarantees in a cross-path switch is developed and it is shown that due to the quasi-static nature of the routing scheme implemented at the central stage, data traffic always becomes more bursty after passing through the cross-path switch.
Abstract: Recently, a novel quasi-static routing scheme called path switching has been proposed for a large-scale ATM packet switch, and a Clos-like switching network called the cross-path switch was designed for its implementation. In principle, the cross-path switch can support both multicast and multirate traffic and now we add the capacity of supporting multimedia traffic in it. In this paper, we develop a model to study the performance guarantees in a cross-path switch. Based on this model, we derive a set of deterministic bounds on propagation delay as well as backlogs at the switch on a per-connection basis. The results show that due to the quasi-static nature of the routing scheme implemented at the central stage, data traffic always becomes more bursty after passing through the cross-path switch. This leads to the criteria for the design of the route assignment algorithm.

Proceedings ArticleDOI
06 Jun 1999
TL;DR: This I-Cubeout makes use of far simpler SEs and requires fewer stages to achieve a given cell drop rate than the earlier design known as the shuffleout, making it suitable for the ATM switch residing in wireless base stations.
Abstract: In this paper, we present a cost-effective design for ATM switching fabrics based on multistage structures, which involve no internal buffers at the constituent switching elements (SEs). The design consists of repeated copies of multiple stages of SEs, that are interconnected according to the indirect n-cube connection style between stages and that provide outlets for cells to terminate at their respective output queues when they reach their destined SEs, referred to as the I-Cubeout. This I-Cubeout makes use of far simpler SEs and requires fewer stages to achieve a given cell drop rate than the earlier design known as the shuffleout. It routes cells in a distributed manner and exhibits low hardware complexity, making it suitable for the ATM switch residing in wireless base stations.

Proceedings ArticleDOI
20 Dec 1999
TL;DR: The labeling scheme used in Benes-equivalent networks is extended to a class of concatenated omega networks with modified central stage connection and the class is proved to be nonblocking rearrangeable.
Abstract: Benes networks are known to be nonblocking rearrangeable networks which can realize arbitrary permutations. Topological equivalence extends the nonblocking rearrangeability to a class of multistage interconnection networks (MINs) which have the same topology as Benes networks. There is another class of well-known multistage interconnection networks, such as omega+omega networks, which is not yet known to be either nonblocking rearrangeable or blocking. In this paper we extend the labeling scheme used in Benes-equivalent networks to a class of concatenated omega networks with modified central stage connection. This class of concatenated omega networks is proved to be nonblocking rearrangeable. A looping algorithm is proposed to route through the networks and realize arbitrary permutations for the whole class of $2\log_2 N$-stage networks.

Proceedings ArticleDOI
17 Oct 1999
TL;DR: This paper utilizes a labeling scheme to define a class of $2\log_2 N$-stage omega-based networks and an algorithm is proposed to solve the rearrangeability of this class of networks.
Abstract: The rearrangeability of most omega-based $2\log_2 N$-stage networks, such as omega+omega networks, remains an open question. This paper utilizes a labeling scheme to define a class of $2\log_2 N$-stage omega-based networks. An algorithm is proposed to solve the rearrangeability of this class of networks. This algorithm focuses on the central stage connection labeling patterns instead of specific network topologies. Compared with the original looping algorithm, this algorithm can route some $2\log_2 N$-stage omega-based networks in $O(N \log_2 N)$ time.

Journal ArticleDOI
TL;DR: A heuristic based on the Simulated Annealing method for efficient routing in the Indirect Star based Multistage Interconnection Networks of O [n] is proposed.
Abstract: We propose a heuristic based on the Simulated Annealing method for efficient routing in the Indirect Star based Multistage Interconnection Networks of O [n]. We assert and demonstrate through simulation results, that this heuristic is more efficient than those previously proposed for this class of MINs since it offers a higher probability of acceptance.

Journal ArticleDOI
TL;DR: The unbuffered Beta topology of SSIN is evaluated using stochastic Petri nets and an approximate analytical model is presented, showing that the bandwidth increases when the data transfer increases and the average transfer time increases slowly compared to the increase of processors.
Abstract: Many multistage interconnection networks (MINs) and single stage interconnection networks (SSINs) have been proposed for parallel computer systems and for fast packet switching in high speed networks. The cost, performance, and fault-tolerance capability of the interconnection networks (INs) become very important in the design considerations of multiprocessor systems. Several types of INs have been proposed, notably multistage and single-stage interconnection networks. There have been extensive studies on MINs (e.g., performance analysis, methods to improve the throughput, priority, etc.), but relatively little work on SSINs has appeared in the literature. In this paper we evaluate the unbuffered Beta topology of SSIN using stochastic Petri nets. We present an approximate analytical model. We analyze the random delay experienced by a message traversing the network for uniform traffic. Messages can have different sizes. Each sender can accept one packet per cycle and route it to the appropriate receiver. It is shown that the bandwidth increases when the data transfer increases. In addition, it is shown that the average transfer time increases slowly compared to the increase of processors. The power of this model is that, firstly, it presents an acceptable number of states, and secondly, the model can be easily generalized.

Journal ArticleDOI
TL;DR: Simulation results show that the blocking performance of EGS networks under multicast traffic is much better than that of three-stage Clos networks of equal complexity.
Abstract: Extended generalized shuffle (EGS) networks are a wide class of interconnection networks introduced by Richards (1993). In this work, we study the blocking performance of EGS networks under point-to-multipoint traffic. Two new routing algorithms for multicast connections in EGS networks are defined, and a theorem proving that these algorithms construct minimum-cost connection trees is enclosed. Simulation results show that the blocking performance of EGS networks under multicast traffic is much better than that of three-stage Clos networks of equal complexity.

Proceedings ArticleDOI
17 Oct 1999
TL;DR: A study of the CNET for two alternative architectures: banyan and pruned high-dimensional meshes is presented, and a simple $2 \times 2$ internal switching node is designed which can be used to construct more complex networks using either of the architectures.
Abstract: This work is a part of a project to design a petaflops-scale computer using a hybrid technology multi-threaded architecture (HTMT). In the core of the superconductor part of the HTMT system there should be a high-bandwidth low-latency superconductor RSFQ switching network (CNET) connecting 4,096 computing modules with each other and with room-temperature semiconductor components. We present a study of the CNET for two alternative architectures: banyan and pruned high-dimensional meshes. The results indicate that with the speed and space limitations accepted in the HTMT concept, CNET will be able to provide a cross-sectional bandwidth of about 3/5 packet per processor per network clock cycle (in the HTMT concept, 32 ps). We have designed a simple $2 \times 2$ internal switching node which can be used to construct more complex networks using either of the architectures, and experimentally demonstrated successful operation of a 2-bit-wide data path.