Showing papers by "Paul W. Coteus" published in 2005

Journal Article

Overview of the Blue Gene/L system architecture

Alan Gara¹, Matthias A. Blumrich¹, Dong Chen¹, G. L.-T. Chiu¹, Paul W. Coteus¹, Mark E. Giampapa¹, R. A. Haring¹, Philip Heidelberger¹, Dirk Hoenicke¹, G.V. Kopcsay¹, T. A. Liebsch², Martin Ohmacht¹, Burkhard Steinmacher-Burow¹, Todd E. Takken¹, Pavlos M. Vranas¹

IBM¹, University of Rochester²

01 Mar 2005 - IBM Journal of Research and Development

TL;DR: The key architectural features of Blue Gene/L are introduced: the link chip component and five Blue Gene/L networks, the PowerPC® 440 core and floating-point enhancements, the on-chip and off-chip distributed memory system, the node- and system-level design for high reliability, and the comprehensive approach to fault isolation.

Abstract: The Blue Gene®/L computer is a massively parallel supercomputer based on IBM system-on-a-chip technology. It is designed to scale to 65,536 dual-processor nodes, with a peak performance of 360 teraflops. This paper describes the project objectives and provides an overview of the system architecture that resulted. We discuss our application-based approach and rationale for a low-power, highly integrated design. The key architectural features of Blue Gene/L are introduced in this paper: the link chip component and five Blue Gene/L networks, the PowerPC® 440 core and floating-point enhancements, the on-chip and off-chip distributed memory system, the node- and system-level design for high reliability, and the comprehensive approach to fault isolation.

422 citations

Journal Article

Blue Gene/L torus interconnection network

N. R. Adiga¹, Matthias A. Blumrich¹, Dong Chen¹, Paul W. Coteus¹, Alan Gara¹, Mark E. Giampapa¹, Philip Heidelberger¹, Suryabhan Singh¹, Burkhard Steinmacher-Burow¹, Todd E. Takken¹, M. Tsao¹, Pavlos M. Vranas¹

IBM¹

01 Mar 2005 - IBM Journal of Research and Development

TL;DR: Both the architecture and the microarchitecture of the torus and a network performance simulator are described and simulation results and hardware measurements are presented.

Abstract: The main interconnect of the massively parallel Blue Gene®/L is a three-dimensional torus network with dynamic virtual cut-through routing. This paper describes both the architecture and the microarchitecture of the torus and a network performance simulator. Both simulation results and hardware measurements are presented.
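
The wraparound topology that the abstract names can be illustrated with a minimal Python sketch. It covers only the three-dimensional torus addressing (nearest neighbors and minimal hop count), not the dynamic virtual cut-through routing. The function names and the 8 x 8 x 8 shape are hypothetical and are not taken from the paper.

```python
def torus_neighbors(node, dims):
    """Return the six nearest neighbors of `node` in a 3D torus of shape `dims`."""
    neighbors = []
    for axis in range(3):
        for step in (-1, 1):
            coords = list(node)
            # Modulo arithmetic provides the wraparound link at each edge.
            coords[axis] = (coords[axis] + step) % dims[axis]
            neighbors.append(tuple(coords))
    return neighbors


def torus_hops(src, dst, dims):
    """Minimal hop count, taking the shorter way around each ring."""
    hops = 0
    for s, d, size in zip(src, dst, dims):
        delta = abs(s - d)
        hops += min(delta, size - delta)
    return hops


DIMS = (8, 8, 8)  # hypothetical partition shape, chosen only for illustration
print(torus_neighbors((0, 0, 0), DIMS))
print(torus_hops((0, 0, 0), (7, 4, 1), DIMS))
```

In this sketch the node at coordinate 7 is one hop from coordinate 0 on the same ring, which is the property that distinguishes a torus from a plain mesh.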

361 citations

Patent

Collective Network For Computer Structures

Matthias A. Blumrich¹, Paul W. Coteus¹, Dong Chen¹, Alan Gara¹, Mark E. Giampapa¹, Philip Heidelberger¹, Dirk Hoenicke¹, Todd E. Takken¹, Burkhard Steinmacher-Burow¹, Pavlos M. Vranas¹

IBM¹

18 Jul 2005

TL;DR: The Global Collective Network (GCN) is a system and method for enabling high-speed, low-latency global collective communications among interconnected processing nodes; it enables collective reduction operations to be performed during parallel algorithm operations executing in a computer structure having a plurality of the interconnected nodes.

Abstract: A system and method for enabling high-speed, low-latency global collective communications among interconnected processing nodes. The global collective network optimally enables collective reduction operations to be performed during parallel algorithm operations executing in a computer structure having a plurality of the interconnected processing nodes. Router devices are included that interconnect the nodes of the network via links to facilitate performance of low-latency global processing operations at nodes of the virtual network and class structures. The global collective network may be configured to provide global barrier and interrupt functionality in asynchronous or synchronized manner. When implemented in a massively-parallel supercomputing structure, the global collective network is physically and logically partitionable according to needs of a processing algorithm.
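
A collective reduction of the kind the abstract describes can be sketched as follows. The abstract does not state the topology or fan-out, so the tree arrangement, the `fanout` parameter, and the function name `tree_reduce` are assumptions made purely for illustration. The input list is assumed to be non-empty.

```python
import operator
from functools import reduce


def tree_reduce(values, op, fanout=2):
    """Combine one value per node up a tree; return (result, number of levels)."""
    level = list(values)
    levels = 0
    while len(level) > 1:
        # Each (assumed) router combines the values arriving from its children
        # and forwards a single partial result toward the root.
        level = [reduce(op, level[i:i + fanout])
                 for i in range(0, len(level), fanout)]
        levels += 1
    return level[0], levels


contributions = [3, 1, 4, 1, 5, 9, 2, 6]  # one hypothetical value per node
total, depth = tree_reduce(contributions, operator.add)
maximum, _ = tree_reduce(contributions, max)
print(total, depth, maximum)
```

The point of the sketch is that the number of combining steps grows with the depth of the tree rather than with the number of nodes, which is why a dedicated combining network can keep global reductions at low latency.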

49 citations

Journal Article

Packaging the Blue Gene/L supercomputer

Paul W. Coteus¹, Harry R. Bickford¹, Thomas Mario Cipolla¹, Paul G. Crumley¹, Alan Gara¹, Shawn A. Hall¹, G.V. Kopcsay¹, Alphonso P. Lanzetta¹, Lawrence Shungwei Mok¹, Rick A. Rand¹, Richard A. Swetz¹, Todd E. Takken¹, P. La Rocca², C. M. Marroquin², P. R. Germann², M. J. Jeanson²

IBM¹, University of Rochester²

01 Mar 2005 - IBM Journal of Research and Development

TL;DR: How 1,024 dual-processor compute application-specific integrated circuits (ASICs) are packaged in a scalable rack, and how racks are combined and augmented with host computers and remote storage are described.

Abstract: As 1999 ended, IBM announced its intention to construct a one-petaflop supercomputer. The construction of this system was based on a cellular architecture--the use of relatively small but powerful building blocks used together in sufficient quantities to construct large systems. The first step on the road to a petaflop machine (one quadrillion floating-point operations in a second) is the Blue Gene®/L supercomputer. Blue Gene/L combines a low-power processor with a highly parallel architecture to achieve unparalleled computing performance per unit volume. Implementing the Blue Gene/L packaging involved trading off considerations of cost, power, cooling, signaling, electromagnetic radiation, mechanics, component selection, cabling, reliability, service strategy, risk, and schedule. This paper describes how 1,024 dual-processor compute application-specific integrated circuits (ASICs) are packaged in a scalable rack, and how racks are combined and augmented with host computers and remote storage. The Blue Gene/L interconnect, power, cooling, and control systems are described individually and as part of the synergistic whole.

47 citations

Proceedings Article

Power and performance optimization at the system level

Valentina Salapura¹, R. Bickford¹, Matthias A. Blumrich¹, A. A. Bright¹, Dong Chen¹, Paul W. Coteus¹, Alan Gara¹, Mark E. Giampapa¹, Michael K. Gschwind¹, Manish Gupta¹, Shawn A. Hall¹, R. A. Haring¹, Philip Heidelberger¹, Dirk Hoenicke¹, Gerard V. Kopcsay¹, Martin Ohmacht¹, Rick A. Rand¹, Todd E. Takken¹, Pavlos M. Vranas¹

IBM¹

04 May 2005

TL;DR: The BlueGene/L supercomputer was designed with a focus on power/performance efficiency to achieve high application performance under the thermal constraints of common data centers, with emphasis on system solutions to engineer a power-efficient system.

Abstract: The BlueGene/L supercomputer has been designed with a focus on power/performance efficiency to achieve high application performance under the thermal constraints of common data centers. To achieve this goal, emphasis was put on system solutions to engineer a power-efficient system. To exploit thread-level parallelism, the BlueGene/L system can scale to 64 racks with a total of 65,536 compute nodes, each consisting of a single compute ASIC integrating all system functions with two industry-standard PowerPC microprocessor cores in a chip multiprocessor configuration. Each PowerPC processor exploits data-level parallelism with a high-performance SIMD floating-point unit. To support good application scaling on such a massive system, special emphasis was put on efficient communication primitives by including five highly optimized communication networks. After an initial introduction of the BlueGene/L system architecture, we analyze power/performance efficiency for the BlueGene system using performance and power characteristics for the overall system performance (as exemplified by peak performance numbers). To understand application scaling behavior, and its impact on performance and power/performance efficiency, we analyze the NAMD molecular dynamics package using the ApoA1 benchmark. We find that even for strong scaling problems, BlueGene/L systems can deliver superior performance scaling and deliver significant power/performance efficiency. Application benchmark power/performance scaling for the voltage-invariant energy-delay² power/performance metric demonstrates that choosing a power-efficient 700 MHz embedded PowerPC processor core and relying on application parallelism was the right decision to build a powerful and power/performance-efficient system.
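
The energy-delay-squared metric named in the abstract can be made concrete with a small Python sketch. It assumes a first-order model in which energy scales with the square of supply voltage and delay scales inversely with voltage. That model, the function names, and all numbers are assumptions for illustration, not values from the paper.

```python
import math


def energy_delay_squared(energy_joules, delay_seconds):
    """Energy x delay^2: lower is better."""
    return energy_joules * delay_seconds ** 2


def scale_with_voltage(energy, delay, v_ratio):
    """Assumed first-order model: energy ~ V^2, delay ~ 1/V."""
    return energy * v_ratio ** 2, delay / v_ratio


base_energy, base_delay = 100.0, 10.0  # hypothetical workload figures
scaled_energy, scaled_delay = scale_with_voltage(base_energy, base_delay, 0.8)

base_metric = energy_delay_squared(base_energy, base_delay)
scaled_metric = energy_delay_squared(scaled_energy, scaled_delay)

# Under the assumed model the metric is unchanged by voltage scaling,
# whereas the plain energy x delay product is not.
print(math.isclose(base_metric, scaled_metric))
print(base_energy * base_delay, scaled_energy * scaled_delay)
```

Because the metric does not move when only the voltage is turned up or down in this model, differences between two designs reflect architectural choices rather than the operating point, which is presumably why the authors use it to compare processor options.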

30 citations

Patent

Wafer level I/O test and repair enabled by I/O layer

Kerry Bernstein¹, Paul W. Coteus², Ibrahim M. Elfadel², Philip G. Emma¹, Daniel Friedman¹, Ruchir Puri¹, Mark B. Ritter¹, Jeannine M. Trewhella¹, Albert M. Young¹

IBM¹, GlobalFoundries²

07 Oct 2005

TL;DR: A 3D chip having at least one I/O layer connected to other 3D chip layers by a vertical bus can accommodate protection and off-chip device drive circuits, customization circuits, translation circuits, conversion circuits and/or built-in self-test circuits in the I/O layer(s).

Abstract: A 3D chip having at least one I/O layer connected to other 3D chip layers by a vertical bus such that the I/O layer(s) may accommodate protection and off-chip device drive circuits, customization circuits, translation circuits, conversions circuits and/or built-in self-test circuits capable of comprehensive chip or wafer level testing wherein the I/O layers function as a testhead. Substitution of I/O circuits or structures may be performed using E-fuses or the like responsive to such testing.

16 citations

Patent

Metalized elastomeric electrical contacts

Gareth G. Hougham¹, Ali Afzali¹, Steven Allen Cordes¹, Paul W. Coteus¹, Matthew J. Farinelli¹, Sherif A. Goma¹, Alphonso P. Lanzetta¹, Daniel Peter Morris¹, Joanna Rosner¹, Nisha Yohannan¹

IBM¹

30 Sep 2005

TL;DR: Techniques for forming enhanced electrical connections are proposed, in which an electrical connecting device comprises an electrically insulating carrier having one or more contact structures traversing a plane thereof.

Abstract: Techniques for forming enhanced electrical connections are provided. In one aspect, an electrical connecting device comprises an electrically insulating carrier having one or more contact structures traversing a plane thereof. Each contact structure comprises an elastomeric material having an electrically conductive layer running along at least one surface thereof continuously through the plane of the carrier.

15 citations

Patent

Methods and apparatus using commutative error detection values for fault isolation in multiple node computers

Gheorghe Almasi¹, Matthias A. Blumrich¹, Dong Chen¹, Paul W. Coteus¹, Alan Gara¹, Mark E. Giampapa¹, Philip Heidelberger¹, Dirk Hoenicke¹, Sarabjeet Singh¹, Burkhard Steinmacher-Burow¹, Todd E. Takken¹, Pavlos M. Vranas¹

IBM¹

14 Apr 2005

TL;DR: In this article, a node fault detection apparatus identifies faulty nodes by comparing commutative error detection values associated with reproducible portions of the application program generated by a particular node from different runs of the program.

Abstract: Methods and apparatus perform fault isolation in multiple node computing systems using commutative error detection values (for example, checksums) to identify and to isolate faulty nodes. When information associated with a reproducible portion of a computer program is injected into a network by a node, a commutative error detection value is calculated. At intervals, node fault detection apparatus associated with the multiple node computer system retrieves commutative error detection values associated with the node and stores them in memory. When the computer program is executed again by the multiple node computer system, new commutative error detection values are created and stored in memory. The node fault detection apparatus identifies faulty nodes by comparing commutative error detection values associated with reproducible portions of the application program generated by a particular node from different runs of the application program. Differences in values indicate a possible faulty node.
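
The comparison step described in the abstract can be sketched in Python. A wrapping 32-bit sum is used here as one example of a commutative error detection value. The word width, the function names, and the sample data are assumptions for illustration and are not taken from the patent. Both runs are assumed to cover the same set of node IDs.

```python
MASK = 0xFFFFFFFF  # assumed 32-bit checksum width


def commutative_checksum(words):
    """Wrapping sum: the result does not depend on the order of the words."""
    total = 0
    for word in words:
        total = (total + word) & MASK
    return total


def find_suspect_nodes(run_a, run_b):
    """Return node IDs whose checksums differ between two runs.

    Each run maps a node ID to the words that node injected into the network
    during a reproducible portion of the program.
    """
    return sorted(
        node for node in run_a
        if commutative_checksum(run_a[node]) != commutative_checksum(run_b[node])
    )


# Hypothetical traffic: nodes 0 and 2 inject the same words in a different
# order on the second run; node 1 injects one corrupted word.
first_run = {0: [10, 20, 30], 1: [7, 8, 9], 2: [1, 2, 3]}
second_run = {0: [30, 10, 20], 1: [7, 8, 13], 2: [3, 2, 1]}
print(find_suspect_nodes(first_run, second_run))
```

Commutativity matters because packets may be injected in a different order from run to run. An order-sensitive code such as a CRC over the packet stream would flag nodes 0 and 2 as well, even though their data is identical.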

13 citations

Patent

Metalized elastomeric probe structure

Gareth G. Hougham¹, Ali Afzali², Steven Allen Cordes¹, Paul W. Coteus¹, Matthew J. Farinelli¹, Sherif A. Goma¹, Alphonso P. Lanzetta¹, Daniel Peter Morris¹, Joanna Rosner¹, Nisha Yohannan¹

IBM¹, GlobalFoundries²

30 Sep 2005

TL;DR: A probe structure for an electronic device is described that includes an electrically insulating carrier whose contact structures comprise an elastomeric material having an electrically conductive layer running along at least one surface thereof continuously through the plane of the carrier.

Abstract: A probe structure for an electronic device is provided. In one aspect, the probe structure includes an electrically insulating carrier having one or more contact structures traversing a plane thereof. Each contact structure includes an elastomeric material having an electrically conductive layer running along at least one surface thereof continuously through the plane of the carrier. The probe structure includes one or more other contact structures adapted for connection to a test apparatus.

6 citations

Patent

Providing indeterminate read data latency in a memory system

Paul W. Coteus¹, Kevin C. Gower¹, Warren E. Maule¹, Robert B. Tremaine¹

IBM¹

28 Nov 2005

TL;DR: A method for providing indeterminate read data latency in a memory system stores a received local data packet in a buffer device and transmits a buffered data packet to the upstream driver when that driver, which sends data packets to a memory controller via an upstream channel, is idle.

Abstract: A method for providing indeterminate read data latency in a memory system. The method includes determining if a local data packet has been received and storing it into a buffer device. The method also includes determining if the buffer device contains a data packet and determining if an upstream driver for transmitting data packets to a memory controller via an upstream channel is idle, and in response thereto the data packet is transmitted to the upstream driver. The method further includes determining if an upstream data packet has been received; if so and the upstream driver is not idle, the upstream data packet is stored into the buffer device. The upstream data packet is selectively transmitted to the upstream driver. If the upstream driver is not idle, any data packets in progress continue to be transmitted to the upstream driver.
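
The decision steps listed in the abstract can be sketched as a cycle-by-cycle Python model. The class name `HubBuffer`, the fixed two-cycle packet length, the first-in-first-out buffer order, and the rule that an upstream packet passes straight through when the driver is idle and nothing is queued are all assumptions for illustration. The abstract says only that the upstream packet is "selectively" transmitted.

```python
from collections import deque


class HubBuffer:
    """Toy model of one buffer device feeding an upstream driver."""

    def __init__(self):
        self.buffer = deque()   # packets waiting for the upstream driver
        self.in_flight = None   # packet currently being transmitted
        self.cycles_left = 0
        self.sent = []          # packets fully transmitted, in order

    def driver_idle(self):
        return self.in_flight is None

    def _start(self, packet, length):
        self.in_flight, self.cycles_left = packet, length

    def step(self, local=None, upstream=None, length=2):
        # A received local data packet is stored into the buffer device.
        if local is not None:
            self.buffer.append(local)
        # An upstream data packet is buffered if the driver is busy
        # (or, by assumption, if older packets are already queued).
        if upstream is not None:
            if self.driver_idle() and not self.buffer:
                self._start(upstream, length)
            else:
                self.buffer.append(upstream)
        # An idle driver takes the oldest buffered packet.
        if self.driver_idle() and self.buffer:
            self._start(self.buffer.popleft(), length)
        # A packet already in progress continues to be transmitted.
        if not self.driver_idle():
            self.cycles_left -= 1
            if self.cycles_left == 0:
                self.sent.append(self.in_flight)
                self.in_flight = None


hub = HubBuffer()
arrivals = [("L0", "U0"), (None, None), (None, None), (None, None),
            (None, "U1"), ("L1", None), (None, None), (None, None)]
for local_packet, upstream_packet in arrivals:
    hub.step(local_packet, upstream_packet)
print(hub.sent)
```

In this model the delay between a packet's arrival and its transmission depends on what else is queued, so the memory controller cannot predict the return time of read data in advance, which is the sense in which the latency is indeterminate.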

1 citation

Overview of the Blue Gene/L system

Alan Gara, Matthias A. Blumrich, Dong Chen, G. L.-T. Chiu, Paul W. Coteus, Mark E. Giampapa, R. A. Haring, Philip Heidelberger, Dirk Hoenicke, G.V. Kopcsay, T. Liebsch, Martin Ohmacht, Burkhard Steinmacher-Burow, Todd E. Takken, Pavlos Vranas

01 Jan 2005

TL;DR: The key architectural features of Blue Gene®/L are introduced: the link chip component and five Blue Gene/L networks, the PowerPC® 440 core and floating-point enhancements, the on-chip and off-chip distributed memory system, the node- and system-level design for high reliability, and the comprehensive approach to fault isolation.

Abstract: The Blue Gene®/L computer is a massively parallel supercomputer based on IBM system-on-a-chip technology. It is designed to scale to 65,536 dual-processor nodes, with a peak performance of 360 teraflops. This paper describes the project objectives and provides an overview of the system architecture that resulted. We discuss our application-based approach and rationale for a low-power, highly integrated design. The key architectural features of Blue Gene/L are introduced in this paper: the link chip component and five Blue Gene/L networks, the PowerPC® 440 core and floating-point enhancements, the on-chip and off-chip distributed memory system, the node- and system-level design for high reliability, and the comprehensive approach to fault isolation.
