Showing papers on "Multistage interconnection networks" published in 2008


Proceedings ArticleDOI
31 Oct 2008
TL;DR: Simulations with application-based patterns showed that the difference between effective and rated bisection bandwidth could impact overall application performance by up to 12%, so a new metric is introduced: effective bisection bandwidth.
Abstract: Multistage interconnection networks based on central switches are ubiquitous in high-performance computing. Applications and communication libraries typically make use of such networks without consideration of the actual internal characteristics of the switch. However, application performance of these networks, particularly with respect to bisection bandwidth, does depend on communication paths through the switch. In this paper we discuss the limitations of the hardware definition of bisection bandwidth (capacity-based) and introduce a new metric: effective bisection bandwidth. We assess the effective bisection bandwidth of several large-scale production clusters by simulating artificial communication patterns on them. Networks with full bisection bandwidth typically provided effective bisection bandwidth in the range of 55-60%. Simulations with application-based patterns showed that the difference between effective and rated bisection bandwidth could impact overall application performance by up to 12%.

109 citations


Journal ArticleDOI
TL;DR: A common network topology with a 2×2 basic building block in a SEN and its variants in terms of extra-stages is investigated and three measures of reliability: terminal, broadcast, and network reliability for the three SEN systems are analyzed.

72 citations


Journal ArticleDOI
TL;DR: Reliability of an MIN is used as a measure of system’s ability to transform information from input to output devices, and reliability bounds to estimate the exact reliability of a gamma network are proposed.

45 citations


Journal ArticleDOI
TL;DR: It is shown that all kinds of multicast traffic particularly benefit from the new topology, and performance and costs of the new architecture are determined and compared to other network topologies.

37 citations


Proceedings ArticleDOI
08 Dec 2008
TL;DR: This paper proposes a reduced unidirectional fat-tree (RUFT) that uses a simplified version of the aforementioned deterministic routing algorithm, and shows that RUFT obtains lower latency than the fat-tree for low and medium traffic loads and that, in large networks, it obtains almost the same throughput as the classical fat-tree.
Abstract: The fat-tree is one of the topologies most widely used by interconnection network manufacturers. Recently, a deterministic routing algorithm that optimally balances the network traffic in fat-trees was proposed. It can not only achieve almost the same performance as adaptive routing, but also outperforms it for some traffic patterns. Nevertheless, fat-trees require a high number of switches with a non-negligible wiring complexity. In this paper, we propose replacing the fat-tree by a unidirectional multistage interconnection network referred to as reduced unidirectional fat-tree (RUFT) that uses a simplified version of the aforementioned deterministic routing algorithm. As a consequence, switch hardware is reduced almost by half, thereby decreasing power consumption, arbitration complexity, switch size, and network cost. Evaluation results show that RUFT obtains lower latency than fat-tree for low and medium traffic loads. Furthermore, in large networks, it obtains almost the same throughput as the classical fat-tree.

37 citations


Proceedings ArticleDOI
05 May 2008
TL;DR: An approximate performance model for self routing multistage interconnection networks, applied for 2 x 2 switches which are subject to blocking situations when the packets compete for a full output port of a next stage switch is presented.
Abstract: Multistage Interconnection Networks (MINs) allow efficient communication between network components and also among the components of parallel systems. This paper presents an approximate performance model for self routing multistage interconnection networks, applied for 2 x 2 switches which are subject to blocking situations when the packets compete for a full output port of a next stage switch. We apply our model to variable network size MINs and we study the performance under different traffic conditions. In our approximation the bulk of packets that arrive in each cycle to the MIN inputs, follow a Bernoulli distribution. We derive an approximate formula for the utilization of each queue and based on this, we approximate the blocking behavior (probabilities) and the steady-state distributions of populations for each queue of the MIN. This novel analytical model is validated by extensive simulations. Our analytical method gives more accurate results than previous existing analytical models and converges very fast.

25 citations


Journal ArticleDOI
TL;DR: It is shown that the number of planes required is smaller than that derived earlier in other papers for WSNB multiplane banyan-type switching fabrics under the crosstalk constraint.
Abstract: A new control algorithm for log2(N, 0, p) switching networks composed of 2 x 2 switching elements has been proposed recently. Under this algorithm, log2(N, 0, p) switching networks with an even number of stages are wide-sense nonblocking (WSNB) if p is the same as for the rearrangeable nonblocking (RNB) one. The considered algorithm and WSNB conditions did not take into account the crosstalk constraint, which is an important factor in photonic switching. This paper extends the algorithm to the case when crosstalk in the switching fabric is not allowed. WSNB conditions for this enhanced algorithm are also derived. It is shown that the number of planes required is smaller than that derived earlier in other papers for WSNB multiplane banyan-type switching fabrics under the crosstalk constraint. Under this algorithm, log2(N, 0, p) switching networks with an odd number of stages and with zero crosstalk are WSNB if p is the same as for the RNB one.

20 citations


Journal ArticleDOI
TL;DR: This paper gives a local memoryless switch policy that uses back pressure and achieves a competitive ratio of (4h + 1), where h is the number of stages of the MIN.
Abstract: Combined input and output queued (CIOQ) architectures with a moderate fabric speedup S > 1 have come to play a major role in the design of high performance switches. In this paper we study CIOQ switches in two settings. The first is a setting of a single CIOQ switch with Priority Queuing (PQ) buffers, which provide better Quality of Service (QoS) guarantees by decreasing the delay experienced by mission-critical and real-time traffic. The second is a setting of a Multistage Interconnection Network (MIN), where each Switching Element (SE) is a CIOQ switch. In the first setting, we consider the case of traffic with packets having variable values. The goal of the switch policy is to maximize the total value of packets sent out of the switch. We present a switch policy that is 6-competitive for any speedup. In the second setting, we study a MIN architecture in which each internal buffer is further divided into virtual buffers, one per each MIN output port reachable from that buffer. We consider the case of traffic with unit value packets, and the goal of the policy managing the MIN is to maximize the total number of packets sent out of the MIN. We give a local memoryless switch policy that uses back pressure and achieves a competitive ratio of (4h + 1), where h is the number of stages of the MIN. The proposed policy is simple and can be efficiently implemented at high speeds. We also demonstrate a lower bound of h/2 on the competitive ratio of any local work-conserving deterministic memoryless switch policy. We further show that without back pressure, no online local work-conserving deterministic switch policy can break the competitive ratio of , where N is the size of the MIN and k is the size of an SE.

20 citations


Journal Article
TL;DR: This paper proposes a new algorithm called the ZeroY algorithm (ZeroY) to avoid crosstalk and route the traffic in an OMIN more efficiently; it outperforms all the other compared algorithms in terms of the running time required for one permutation.
Abstract: Multistage interconnection networks (MINs) are popular in switching and communication applications. However, optical MINs (OMINs) introduce crosstalk, which results from coupling two signals within one Switching Element (SE). Under the constraint of avoiding crosstalk, what we discuss is how to realize a permutation in the minimum number of passes. In this paper, we are interested in a network called the Omega network, which has a shuffle-exchange connection pattern. We propose a new algorithm called the ZeroY algorithm (ZeroY) to avoid crosstalk and route the traffic in an OMIN more efficiently. The results of the ZeroY algorithm are analyzed and compared with those of other algorithms (except the GA) in an Omega network. The ZeroY algorithm outperforms all the other algorithms in terms of the running time that is required for one permutation.
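
The shuffle-exchange pattern and the crosstalk condition described in this abstract can be illustrated with a minimal Python sketch. It is not the ZeroY algorithm, whose steps the abstract does not give. It assumes the textbook Omega network: log2(N) stages, each a perfect shuffle followed by a column of 2 x 2 switching elements, with standard destination-tag routing. The function names (`perfect_shuffle`, `omega_path`, `share_switch`) are hypothetical.

```python
def perfect_shuffle(pos: int, n_bits: int) -> int:
    """Rotate an n_bits-wide line address left by one bit (perfect shuffle)."""
    msb = (pos >> (n_bits - 1)) & 1
    return ((pos << 1) & ((1 << n_bits) - 1)) | msb


def omega_path(src: int, dst: int, n_bits: int) -> list[int]:
    """Return the index of the switching element visited at each stage."""
    pos = src
    switches = []
    for stage in range(n_bits):
        pos = perfect_shuffle(pos, n_bits)
        switches.append(pos >> 1)  # lines 2i and 2i+1 enter switching element i
        # Destination-tag routing: stage k exits on the k-th most significant
        # bit of the destination address.
        out_bit = (dst >> (n_bits - 1 - stage)) & 1
        pos = (pos & ~1) | out_bit
    assert pos == dst  # the message must arrive at its destination
    return switches


def share_switch(m1: tuple[int, int], m2: tuple[int, int], n_bits: int) -> bool:
    """True if two (src, dst) messages meet in a switching element at some stage."""
    p1 = omega_path(m1[0], m1[1], n_bits)
    p2 = omega_path(m2[0], m2[1], n_bits)
    return any(a == b for a, b in zip(p1, p2))
```

In an 8 x 8 network (`n_bits = 3`), the messages 0 -> 0 and 4 -> 1 pass through the same switching element and so cannot be sent in the same pass under the crosstalk constraint, whereas 0 -> 0 and 1 -> 7 can.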

20 citations


Journal ArticleDOI
TL;DR: This paper proposes replacing the fat-tree by a unidirectional multistage interconnection network (UMIN) that uses a traffic balancing deterministic routing algorithm, and preliminary evaluation results show that the UMIN with the load balancing scheme obtains lower latency than fat-tree for low and medium traffic loads.
Abstract: The fat-tree is one of the topologies most widely used by interconnection network manufacturers. Recently, it has been demonstrated that a deterministic routing algorithm that optimally balances the network traffic can not only achieve almost the same performance as an adaptive routing algorithm but also outperform it. On the other hand, fat-trees require a high number of switches with a non-negligible wiring complexity. In this paper, we propose replacing the fat-tree by a unidirectional multistage interconnection network (UMIN) that uses a traffic balancing deterministic routing algorithm. As a consequence, switch hardware is reduced almost by half, thereby decreasing the power consumption, the arbitration complexity, the switch size itself, and the network cost. Preliminary evaluation results show that the UMIN with the load balancing scheme obtains lower latency than fat-tree for low and medium traffic loads. Furthermore, in networks with a high number of stages or with high radix switches, it obtains the same, or even higher, throughput than fat-tree.

20 citations


Journal Article
TL;DR: A fast window method based on bitwise operations (BWM) is presented; the algorithm is applied to the Omega network and reduces the execution time by approximately a factor of ten or more compared with previous algorithms.
Abstract: One undesirable problem introduced by the Optical Multistage Interconnection Network is crosstalk, which is caused by coupling two signals within a switching element. To avoid crosstalk, many approaches have been proposed, such as time domain and space domain approaches. Because the messages must be partitioned into several groups before being sent to the network, some method is needed to find conflicts between the messages. The Window Method is used to find out which messages conflict and should not be in the same group. In this paper, a fast window method based on bitwise operations (BWM) is presented. The algorithm is applied to the Omega network. The comparison results show the good performance of this algorithm, which reduces the execution time by approximately a factor of ten or more compared with previous algorithms.
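
A window-style conflict check can be sketched with plain bitwise operations, as below. This is an illustration under stated assumptions, not the authors' BWM implementation. It assumes the usual formulation for an N = 2^n Omega network: the source and destination addresses are concatenated into a 2n-bit word, and two messages conflict if they agree on any of the n windows of n-1 consecutive bits that exclude the first and last bit. The names `pack` and `conflict` are hypothetical.

```python
def pack(src: int, dst: int, n_bits: int) -> int:
    """Concatenate source and destination addresses into one 2*n_bits word."""
    return (src << n_bits) | dst


def conflict(m1: tuple[int, int], m2: tuple[int, int], n_bits: int) -> bool:
    """True if two (src, dst) messages match on any (n_bits - 1)-bit window."""
    diff = pack(m1[0], m1[1], n_bits) ^ pack(m2[0], m2[1], n_bits)
    mask = (1 << (n_bits - 1)) - 1
    # Shifts 1..n_bits slide the window across the word while skipping the
    # least significant bit; the mask width leaves out the most significant bit.
    # A window of zeros in the XOR means both messages have the same bits there.
    return any(((diff >> shift) & mask) == 0 for shift in range(1, n_bits + 1))
```

For `n_bits = 3`, `conflict((0, 0), (4, 1), 3)` is true and `conflict((0, 0), (1, 7), 3)` is false. The XOR is computed once per message pair, so each window test costs only a shift, a mask, and a comparison.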

Proceedings ArticleDOI
01 Dec 2008
TL;DR: A model of multistage interconnection network and a design of prototyping on FPGA are presented and this enabled the comparison of the proposed model with the full crossbar network, and the estimation of performance in terms of area, latency and energy consumption.
Abstract: Multiprocessor system on chip is a concept that aims to integrate multiple hardware and software in a chip. Multistage interconnection network is considered as a promising solution for applications which use parallel architectures integrating a large number of processors and memories. In this paper, we present a model of multistage interconnection network and a design of prototyping on FPGA. This enabled the comparison of the proposed model with the full crossbar network, and the estimation of performance in terms of area, latency and energy consumption. The Multistage Interconnection Networks are well adapted to MPSoC architecture. They meet the needs of intensive signal processing and they are scalable to connect a large number of modules.

Journal ArticleDOI
TL;DR: This paper studies the rearrangeable f-cast multi-log2 N networks under both the node-blocking scenario (relevant to photonic switches) and the link-blocking scenario (relevant to electronic switches).
Abstract: Multi-log2 N networks (or vertically stacked banyan networks) have been an attractive class of switching networks due to their small depth O(log N), absolute signal loss uniformity and good fault tolerance property. Recently, F. K. Hwang extended the study of multi-log2 N networks to the general f-cast case, which covers the unicast case (f = 1) and multicast case (f = N) as special cases, and determined the conditions for these networks to be f-cast strictly nonblocking when the fan-out capability is available at both the input stage and middle banyan stage. In this paper, we study the rearrangeable f-cast multi-log2 N networks under both the node-blocking scenario (relevant to photonic switches) and the link-blocking scenario (relevant to electronic switches). In particular, we consider the following three fan-out cases in our study: (1) no restriction on fan-out capability; (2) input stage has no fan-out capability; (3) middle banyan stage has no fan-out capability. We determine the necessary conditions for the first two cases while obtaining the necessary and also sufficient condition for the third one.

Journal ArticleDOI
TL;DR: This letter proposes a scalable packet switch architecture that is called the central-stage buffered Clos-network (CBC), and analyzes the memory requirements to be strictly non-blocking, especially for emulating an output-queuing packet switch.
Abstract: We consider using the Clos-network to scale high performance routers, especially the space-memory-space (SMS) packet switches. In circuit switching, the Clos-network is responsible for pure connections and the internal links are the only blocking sources. In packet switching, however, the buffers cause additional blockings. In this letter, we first propose a scalable packet switch architecture that we call the central-stage buffered Clos-network (CBC). Then, we analyze the memory requirements for the CBC to be strictly non-blocking, especially for emulating an output-queuing packet switch. Results show that even with the additional memory blockings the CBC still inherits advantages from the Clos-network, e.g., modular design and cost efficiency.

Journal ArticleDOI
TL;DR: It is shown that the Monte Carlo method is capable of providing reliability evaluation for the SEN+ system, here confined to a multiprocessor environment in which identical switching elements interconnect multiple processors.
Abstract: Multistage interconnection networks (MINs) have been widely adopted in communication networks, especially in telecommunication and multiprocessor environments. This paper aims to evaluate the reliability performance of the shuffle exchange network with an additional stage (SEN+), based on the Monte Carlo method using computerized simulation. The evaluation is further improved by deploying stratified sampling in the Monte Carlo method. The SEN+ described in this paper is confined to a multiprocessor environment based on identical switching elements used in interconnecting multiple processors. It is shown that the Monte Carlo method is capable of providing reliability evaluation for the SEN+ system.
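
The following Python sketch shows what a crude Monte Carlo estimate of terminal reliability might look like. It is not the paper's simulation and does not include stratified sampling. It assumes a commonly used series-parallel model of an N x N SEN+: a first-stage and a last-stage switching element in series with two disjoint paths of log2(N) - 1 switching elements each, with every element working independently with probability `r`. The function names and the parameter `r` are hypothetical.

```python
import random


def sen_plus_terminal_exact(r: float, n_inputs: int) -> float:
    """Closed form under the assumed model: r^2 * (1 - (1 - r^m)^2)."""
    m = n_inputs.bit_length() - 2  # log2(N) - 1 elements per disjoint path
    return r * r * (1.0 - (1.0 - r ** m) ** 2)


def sen_plus_terminal_mc(r: float, n_inputs: int, trials: int, seed: int = 0) -> float:
    """Estimate the same quantity by sampling independent element states."""
    rng = random.Random(seed)
    m = n_inputs.bit_length() - 2

    def up() -> bool:
        return rng.random() < r  # one element works with probability r

    successes = 0
    for _ in range(trials):
        first, last = up(), up()
        path_a = all(up() for _ in range(m))
        path_b = all(up() for _ in range(m))
        # A source-destination pair is connected if both end elements work
        # and at least one of the two disjoint middle paths is intact.
        if first and last and (path_a or path_b):
            successes += 1
    return successes / trials
```

With `r = 0.9` and `n_inputs = 8`, the closed form gives about 0.7808, and the sampled estimate approaches that value as `trials` grows. Comparing the two is a simple way to sanity-check a simulator before moving to topologies that lack a closed form.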

Proceedings ArticleDOI
26 Oct 2008
TL;DR: This paper compares the performance of multi-class priority mechanism against the single priority one, by gathering metrics for the two most important network performance factors, namely packet throughput and delay under uniform traffic conditions and various offered loads, using simulations.
Abstract: In this paper the modeling of Omega Networks supporting multi-class routing traffic is presented and their performance is analyzed. We compare the performance of the multi-class priority mechanism against the single priority one, by gathering metrics for the two most important network performance factors, namely packet throughput and delay under uniform traffic conditions and various offered loads, using simulations. Moreover, two different test-bed setups were used in order to investigate and analyze the performance of all priority-class traffic, under different quality of service (QoS) configurations. In the considered environment, switching elements (SEs) that natively support multi-class priority routing traffic are used for constructing the MIN, while we also consider double-buffered SEs, two configuration parameters that have not been addressed so far. The rationale behind introducing a multiple-priority scheme is to provide different QoS guarantees to traffic from different applications, which is a highly desired feature for many IP network operators, and particularly for enterprise networks.

Proceedings ArticleDOI
07 May 2008
TL;DR: This paper presents a modeling methodology, based on the notion of multidimensional multiplicity from the MARTE profile for repetitive structures and topologies, to model the delta network family of interconnection networks for NoC construction.
Abstract: As systems-on-chip (SoCs) become more complex, high performance interconnection mediums are required to handle their complexity. Networks-on-chip (NoCs) enable integration of more intellectual properties (IPs) into the SoC with increased performance. In the recent MARTE (modeling and analysis of real-time and embedded systems) profile, a notion of multidimensional multiplicity has been proposed to model repetitive structures and topologies. This paper presents a modeling methodology based on that notion to model the delta network family of interconnection networks for NoC construction.

Proceedings ArticleDOI
08 Jun 2008
TL;DR: It is found that the use of asymmetric-sized buffered systems leads to better exploitation of network capacity, while the increments in delays can be tolerated.
Abstract: In this paper the performance of asymmetric-sized finite-buffered Delta Networks with 2-class routing traffic is presented and analyzed in the uniform traffic conditions under various loads using simulations. We compare the performance of 2-class priority mechanism against the single priority one, by gathering metrics for the two most important network performance factors, namely packet throughput and delay. We also introduce and calculate a universal performance factor, which includes the importance aspect of each of the above main performance factors. We found that the use of asymmetric-sized buffered systems leads to better exploitation of network capacity, while the increments in delays can be tolerated. The goal of this paper is to help network designers in performance prediction before actual network implementation and in understanding the impact of each parameter factor.

Proceedings ArticleDOI
26 Sep 2008
TL;DR: A fast and efficient crosstalk-free algorithm for message routing in optical Omega multistage networks is proposed based on the Zero algorithms; the inverse Conflict Matrix is used to map identified conflicts between messages in the network.
Abstract: Limited by the properties of optical signals, it is not possible to route more than one message simultaneously, without optical crosstalk, over a switching element in an Optical Multistage Interconnection Network (OMIN). One solution, called the time domain approach, avoids optical crosstalk by arranging the permutation in such a way that a set of crosstalk-free connections can be established and each connection set is made active in a different time slot. Based on the Zero algorithms, we propose a fast and efficient crosstalk-free algorithm for message routing in optical Omega multistage networks. The Bitwise Window Method (BWM) is used to identify potential message conflicts that may further lead to optical crosstalk. In addition, the inverse Conflict Matrix (iCM) is used to map identified conflicts between messages in the network. It is shown that the new algorithm successfully improved the execution time in comparison to the original Zero algorithm.
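
The time domain approach described above can be sketched as a greedy partition of a permutation into crosstalk-free passes. This is a simple first-fit heuristic for illustration only; it is neither one of the Zero algorithms nor the iCM-based method. The conflict test assumes the usual window formulation for an N = 2^n Omega network, and the names `conflict` and `schedule_passes` are hypothetical.

```python
def conflict(m1: tuple[int, int], m2: tuple[int, int], n_bits: int) -> bool:
    """Assumed window test: messages conflict if any (n_bits - 1)-bit window
    of their concatenated source/destination words is identical."""
    diff = ((m1[0] << n_bits) | m1[1]) ^ ((m2[0] << n_bits) | m2[1])
    mask = (1 << (n_bits - 1)) - 1
    return any(((diff >> shift) & mask) == 0 for shift in range(1, n_bits + 1))


def schedule_passes(perm: list[int], n_bits: int) -> list[list[tuple[int, int]]]:
    """First-fit grouping: put each message in the earliest pass (time slot)
    where it conflicts with no message already scheduled there."""
    passes: list[list[tuple[int, int]]] = []
    for msg in enumerate(perm):  # msg is (source, destination)
        for group in passes:
            if not any(conflict(msg, other, n_bits) for other in group):
                group.append(msg)
                break
        else:
            passes.append([msg])  # no existing pass fits, open a new one
    return passes
```

For the identity permutation on 8 inputs, `schedule_passes(list(range(8)), 3)` yields two passes of four messages each. Two is also the minimum here, since each pass can use a 2 x 2 switching element for at most one message. First-fit gives no such guarantee in general.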

Patent
07 Oct 2008
TL;DR: An overlaid switching network is derived by overlaying one multistage interconnection network perpendicularly with a second one, placing a switching element at each position occupied by a switching element in either network; each switching element in the overlaid network has the ports defined by the two multistage interconnection networks.
Abstract: An overlaid switching network is derived by overlaying perpendicularly one multistage interconnection network with a second multistage interconnection network. The new network is formed by placing a switching element corresponding to the position of switching elements in either multistage interconnection network. Each switching element in the overlaid network has the ports defined by the two multistage interconnection networks as does its interconnection networks. A special case occurs when the number of rows and columns of the first multistage interconnection network is the number of columns and rows of the second multistage interconnection network, respectively. The overlaid switching networks also inherit their upgradeability from the multistage interconnection networks from which they are derived, such as in the case of a redundant blocking compensated cyclic group multistage network.

Journal ArticleDOI
TL;DR: A new irregular interconnection network IABN (Irregular Augmented Baseline) has been proposed, which provides much better fault tolerance and almost double the bandwidth at the expense of slightly higher cost than ABN.
Abstract: The design of a suitable interconnection network for inter-processor communication is one of the key issues of the system performance. In this study a new irregular interconnection network IABN (Irregular Augmented Baseline) has been proposed. IABN is designed by modifying the existing ABN (Augmented Baseline Network). ABN is a regular multi-path network with limited fault tolerance. IABN provides three times more paths between any source-destination pair in comparison to ABN. The ABN and IABN MINs are analyzed and compared in terms of performance parameters, namely Bandwidth, Cost and Bandwidth per unit Cost. The proposed network IABN provides much better fault tolerance and almost double the bandwidth at the expense of slightly higher cost than ABN.

Proceedings ArticleDOI
19 May 2008
TL;DR: This work proposes a configuration scheme for IQC switches that hierarchizes the matching process, and shows that the switching performance of the proposed approach using weight-based and weightless selection schemes is high under uniform and nonuniform traffic.
Abstract: Clos-network switches were proposed as a scalable architecture for the implementation of large-capacity circuit switches. In packet switching, the three-stage Clos-network architecture uses small switches as modules to assemble a switch with a large number of ports or aggregated ports with high data rates. Current schemes for configuration of input-queued three-stage Clos-network (IQC) switches involve port matching and path routing assignment, in that order. The implementation of a scheduler capable of matching thousands of ports in large-size switches is complex because of the large port count. To decrease the scheduler complexity for such switches (e.g., 1024 ports or more), we propose a configuration scheme for IQC switches that hierarchizes the matching process. In a practical scenario our scheme performs routing first and port matching thereafter. This approach applies the reduction concept of Clos networks to the matching process. The application of this approach results in a feasible size of schedulers for up to Exabit-capacity switches, an independent configuration of the middle stage modules from port matches, a reduction of the matching communication overhead between different stages, and a release of the switching function to the last-stage modules in a 3-stage switch. We show that the switching performance of the proposed approach using weight-based and weightless selection schemes is high under uniform and nonuniform traffic.

Proceedings ArticleDOI
01 Dec 2008
TL;DR: It is found how crosstalk adds a new dimension to the performance analysis of practical VSOB networks where link failures are present, and the results can guide network designers in finding a tradeoff among the blocking probability, the degree of crosstalk and link failures of VSOB networks.
Abstract: Vertical stacking of multiple copies of an optical banyan network is a novel scheme for building nonblocking optical switching networks. The resulting network, namely the vertically stacked optical banyan (VSOB) network, preserves all the properties of the banyan network, but increases the hardware cost significantly under the first order crosstalk-free constraint. Therefore, blocking behavior analysis could be an effective approach to studying network performance, and finding a graceful compromise between hardware costs and blocking probability with different degrees of crosstalk constraint and link failure probability. However, an upper bound on blocking probability has been presented in the literature only for such networks with link failures. In this paper, we present the simulation results for the upper bound on blocking probability considering both link failures and a given degree of crosstalk constraint. We find how crosstalk adds a new dimension to the performance analysis of practical VSOB networks where link failures are present. The simulation results presented in this paper can guide network designers in finding a tradeoff among the blocking probability, the degree of crosstalk and link failures of VSOB networks.

Journal ArticleDOI
TL;DR: The new multiplane rearrangeable reduced baseline switching network requires fewer switching elements and crosspoints than the multiplane switching network which is based on the plain baseline network.
Abstract: The new concept of the multiplane rearrangeable switching network is presented. The new switching network's architecture is based on the well-known baseline network (the log2(N, 0, 1) switching network). This new architecture can easily be obtained from the baseline network by the removal of some switching elements. It is therefore called the reduced baseline switching network and is denoted by log^r_2(N, 0, 1). The new multiplane rearrangeable reduced baseline switching network requires fewer switching elements and crosspoints than the multiplane switching network which is based on the plain baseline network.

Proceedings ArticleDOI
01 Nov 2008
TL;DR: Simulation results have shown that integrating RLP to FastZ algorithm successfully improved routing performance and the new fast zero with RLP (FastRLP) algorithm is developed based on the time domain approach for solving optical crosstalk in the optical omega network.
Abstract: In this paper, we explore the idea of integrating the remove last pass (RLP) algorithm into the fast zero (FastZ) algorithm as the prior initial solution to improve routing performance in optical multistage interconnection networks (OMINs). OMINs are popular for their cost-effectiveness and self-routable characteristics, which meet the demand for high speed switching capability. A great challenge in dealing with OMINs is the optical crosstalk caused by optical signal coupling when propagating through the switching elements comprising the architecture. Many algorithms have been developed to solve optical crosstalk using different approaches. The new fast zero with RLP (FastRLP) algorithm is developed based on the time domain approach for solving optical crosstalk in the optical omega network. Simulation results have shown that integrating RLP into the FastZ algorithm successfully improved routing performance.

Proceedings ArticleDOI
13 May 2008
TL;DR: Using the concept of inverse conflict matrix (iCM), another representation of a conflict matrix in which it summarizes all possible conflicts between each node in the network, the ZeroX algorithm is simplified, thus improved by reducing the time needed for routing process.
Abstract: Based on the ZeroX algorithm, we proposed a fast and efficient crosstalk-free algorithm called the Fast ZeroX algorithm for solving optical crosstalk problem in omega networks. In our approach, we introduced the concept of inverse conflict matrix (iCM), another representation of a conflict matrix in which it summarizes all possible conflicts between each node in the network. Using iCM, the ZeroX algorithm is simplified, thus improved by reducing the time needed for routing process. From our simulation results, it is shown that our approach yields better performance in terms of minimal routing time in comparison to the original ZeroX algorithm.

Proceedings ArticleDOI
10 Oct 2008
TL;DR: This paper presents a formal specification of the Delta multistage interconnection networks for MPSoCs in the ACL2 logic, based on a generic model for networks on chip (GeNoC).
Abstract: The design of modern multiprocessor systems-on-chip has performance constraints which must be satisfied by the interconnection architecture. Multistage interconnection networks, also denoted MINs, seem to be a promising alternative for solving the problems of on-chip communications. This paper presents a formal specification of the Delta multistage interconnection networks for MPSoCs in the ACL2 logic. This work is based on a generic model for networks on chip (GeNoC).


Proceedings ArticleDOI
08 Dec 2008
TL;DR: This paper proposes an Ethernet for which both time and space are carrier sensed, and calls the space-time transmission medium Terabit Ethernet (TbE), which allows routing in ether to be done by the network interface card (NIC).
Abstract: To achieve Terabit and Petabit switching, both time (high transmission speed) and space (multi-stage interconnection network) technologies are required. We propose an Ethernet for which both time and space are carrier sensed. We extend CSMA/CD to a time-space protocol called CSMA/TS. We call the space-time transmission medium Terabit Ethernet (TbE). This space sensing CSMA/TS protocol allows routing in ether to be done by the network interface card (NIC). The advantages are scalability, lower cost, and most importantly, reduced end-to-end delay. Simple analysis is given for evaluating throughput for 2-stage and 3-stage TbE networks.

Proceedings ArticleDOI
29 Jun 2008
TL;DR: The proposed architecture of Delta Networks, which has been thoroughly analyzed by the analytical model, can be useful in the study and development of communication links that support both voice and data traffic co-instantaneously, with good quality of service.
Abstract: In this research, the performance evaluation of two-class priority Delta Multistage Interconnection Networks (MINs) is analyzed, using a discrete priority queuing model. All Delta Networks are constructed by special switching elements (SEs) based on an architecture which natively supports two classes of routing traffic. The proposed analytical method, which was developed for performance metrics investigation, can be applied on any class of Banyan Switches, because the model is independent from their internal link permutations, providing results for all intermediate stages. The analytical method was validated by simulation experiments, and the obtained results in the marginal cases of single priority MINs were found to be more accurate as compared with those of three older classic models. The proposed architecture of Delta Networks, which has been thoroughly analyzed by our analytical model, can be useful in the study and development of communication links that support both voice and data traffic co-instantaneously, with good quality of service.