scispace - formally typeset
Search or ask a question
Topic

Myrinet

About: Myrinet is a research topic. Over the lifetime, 545 publications have been published within this topic receiving 15611 citations.


Papers
More filters
Journal ArticleDOI
TL;DR: The Myrinet local area network employs the same technology used for packet communication and switching within massively parallel processors, but with the highest performance per unit cost of any current LAN.
Abstract: The Myrinet local area network employs the same technology used for packet communication and switching within massively parallel processors. In realizing this distributed MPP network, we developed specialized communication channels, cut-through switches, host interfaces, and software. To our knowledge, Myrinet demonstrates the highest performance per unit cost of any current LAN. >

1,857 citations

Proceedings ArticleDOI
01 Apr 1992
TL;DR: It is shown that active messages are sufficient to implement the dynamically scheduled languages for which message driven machines were designed and, with this mechanism, latency tolerance becomes a programming/compiling concern.
Abstract: The design challenge for large-scale multiprocessors is (1) to minimize communication overhead, (2) allow communication to overlap computation, and (3) coordinate the two without sacrificing processor cost/performance. We show that existing message passing multiprocessors have unnecessarily high communication costs. Research prototypes of message driven machines demonstrate low communication overhead, but poor processor cost/performance. We introduce a simple communication mechanism, Active Messages, show that it is intrinsic to both architectures, allows cost effective use of the hardware, and offers tremendous flexibility. Implementations on nCUBE/2 and CM-5 are described and evaluated using a split-phase shared-memory extension to C, Split-C. We further show that active messages are sufficient to implement the dynamically scheduled languages for which message driven machines were designed. With this mechanism, latency tolerance becomes a programming/compiling concern. Hardware support for active messages is desirable and we outline a range of enhancements to mainstream processors.

1,402 citations

10 Oct 2000
TL;DR: The design and implementation of PVFS are described and performance results on the Chiba City cluster at Argonne are presented, both for a concurrent read/write workload and for the BTIO benchmark.
Abstract: As Linux clusters have matured as platforms for low-cost, high-performance parallel computing, software packages to provide many key services have emerged, especially in areas such as message passing and networking. One area devoid of support, however, has been parallel file systems, which are critical for high-performance I/O on such clusters. We have developed a parallel file system for Linux clusters, called the Parallel Virtual File System (PVFS). PVFS is intended both as a high-performance parallel file system that anyone can download and use and as a tool for pursuing further research in parallel I/O and parallel file systems for Linux clusters. In this paper, we describe the design and implementation of PVFS and present performance results on the Chiba City cluster at Argonne. We provide performance results for a workload of concurrent reads and writes for various numbers of compute nodes, I/O nodes, and I/O request sizes. We also present performance results for MPI-IO on PVFS, both for a concurrent read/write workload and for the BTIO benchmark. We compare the I/O performance when using a Myrinet network versus a fast-ethernet network for I/O-related communication in PVFS. We obtained read and write bandwidths as high as 700 Mbytes/sec with Myrinet and 225 Mbytes/sec with fast ethernet.

1,029 citations

Journal ArticleDOI
01 Feb 2005
TL;DR: The work on improving the performance of collective communication operations in MPICH is described, with results indicating that to achieve the best performance for a collective communication operation, one needs to use a number of different algorithms and select the right algorithm for a particular message size and number of processes.
Abstract: We describe our work on improving the performance of collective communication operations in MPICH for clusters connected by switched networks. For each collective operation, we use multiple algorithms depending on the message size, with the goal of minimizing latency for short messages and minimizing bandwidth use for long messages. Although we have implemented new algorithms for all MPI Message Passing Interface collective operations, because of limited space we describe only the algorithms for allgather, broadcast, all-to-all, reduce-scatter, reduce, and allreduce. Performance results on a Myrinet-connected Linux cluster and an IBM SP indicate that, in all cases, the new algorithms significantly outperform the old algorithms used in MPICH on the Myrinet cluster, and, in many cases, they outperform the algorithms used in IBM's MPI on the SP. We also explore in further detail the optimization of two of the most commonly used collective operations, allreduce and reduce, particularly for long messages and nonpower-of-two numbers of processes. The optimized algorithms for these operations perform several times better than the native algorithms on a Myrinet cluster, IBM SP, and Cray T3E. Our results indicate that to achieve the best performance for a collective communication operation, one needs to use a number of different algorithms and select the right algorithm for a particular message size and number of processes.

838 citations

Proceedings ArticleDOI
03 Dec 1995
TL;DR: U-Net as mentioned in this paper provides processes with a virtual view of a network interface to enable user-level access to high-speed communication devices using off-the-shelf ATM communication hardware.
Abstract: The U-Net communication architecture provides processes with a virtual view of a network interface to enable userlevel access to high-speed communication devices. The architecture, implemented on standard workstations using offthe-shelf ATM communication hardware, removes the kernel from the communication path, while still providing full protection. The model presented by U-Net allows for the construction of protocols at user level whose performance is only limited by the capabilities of network. The architecture is extremely flexible in the sense that traditional protocols like TCP and UDP, as well as novel abstractions like Active Messages can be implemented efficiently. A U-Net prototype on an 8node ATM cluster of standard workstations offers 65 microseconds round-trip latency and 15 Mbytes/sec bandwidth. It achieves TCP performance at maximum network bandwidth and demonstrates performance equivalent to Meiko CS-2 and TMC CM-5 supercomputers on a set of Split-C benchmarks.

809 citations


Network Information
Related Topics (5)
Scalability
50.9K papers, 931.6K citations
81% related
Cache
59.1K papers, 976.6K citations
81% related
Server
79.5K papers, 1.4M citations
80% related
Compiler
26.3K papers, 578.5K citations
78% related
Dynamic priority scheduling
28.2K papers, 585.1K citations
78% related
Performance
Metrics
No. of papers in the topic in previous years
YearPapers
20221
20211
20203
20183
20161
20151