Proceedings ArticleDOI

Maxwell - a 64 FPGA Supercomputer

TL;DR: The machine itself, Maxwell, and its hardware and software environment are described, and very early benchmark results from runs of the demonstrators are presented.
Abstract: We present the initial results from the FHPCA Supercomputer project at the University of Edinburgh. The project has successfully built a general-purpose 64 FPGA computer and ported to it three demonstration applications from the oil, medical and finance sectors. This paper describes in brief the machine itself - Maxwell - its hardware and software environment and presents very early benchmark results from runs of the demonstrators.


Citations
Journal ArticleDOI
TL;DR: The authors deployed the reconfigurable fabric in a bed of 1,632 servers and FPGAs in a production datacenter and successfully used it to accelerate the ranking portion of the Bing Web search engine by nearly a factor of two.
Abstract: Datacenter workloads demand high computational capabilities, flexibility, power efficiency, and low cost. It is challenging to improve all of these factors simultaneously. To advance datacenter capabilities beyond what commodity server designs can provide, we designed and built a composable, reconfigurable hardware fabric based on field programmable gate arrays (FPGA). Each server in the fabric contains one FPGA, and all FPGAs within a 48-server rack are interconnected over a low-latency, high-bandwidth network. We describe a medium-scale deployment of this fabric on a bed of 1,632 servers, and measure its effectiveness in accelerating the ranking component of the Bing web search engine. We describe the requirements and architecture of the system, detail the critical engineering challenges and solutions needed to make the system robust in the presence of failures, and measure the performance, power, and resilience of the system. Under high load, the large-scale reconfigurable fabric improves the ranking throughput of each server by 95% at a desirable latency distribution or reduces tail latency by 29% at a fixed throughput. In other words, the reconfigurable fabric enables the same throughput using only half the number of servers.

835 citations


Cites methods from "Maxwell - a 64 FPGA Supercomputer"

  • ...Maxwell [4] is the most similar to our design, as it directly connects FPGAs in a 2-D torus using InfiniBand cables, although the FPGAs do not implement routing logic....


Journal ArticleDOI
14 Jun 2014
TL;DR: The requirements and architecture of the fabric are described, the critical engineering challenges and solutions needed to make the system robust in the presence of failures are detailed, and the performance, power, and resilience of the system when ranking candidate documents are measured.
Abstract: Datacenter workloads demand high computational capabilities, flexibility, power efficiency, and low cost. It is challenging to improve all of these factors simultaneously. To advance datacenter capabilities beyond what commodity server designs can provide, we have designed and built a composable, reconfigurable fabric to accelerate portions of large-scale software services. Each instantiation of the fabric consists of a 6x8 2-D torus of high-end Stratix V FPGAs embedded into a half-rack of 48 machines. One FPGA is placed into each server, accessible through PCIe, and wired directly to other FPGAs with pairs of 10 Gb SAS cables. In this paper, we describe a medium-scale deployment of this fabric on a bed of 1,632 servers, and measure its efficacy in accelerating the Bing web search engine. We describe the requirements and architecture of the system, detail the critical engineering challenges and solutions needed to make the system robust in the presence of failures, and measure the performance, power, and resilience of the system when ranking candidate documents. Under high load, the large-scale reconfigurable fabric improves the ranking throughput of each server by a factor of 95% for a fixed latency distribution, or, while maintaining equivalent throughput, reduces the tail latency by 29%.
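As a concrete illustration of the wrap-around wiring a 6x8 2-D torus implies, the sketch below computes the four directly cabled neighbours of an FPGA at grid position (x, y). This is a hypothetical helper for illustration only, not code from the paper:

```python
def torus_neighbours(x, y, cols=6, rows=8):
    """Return the west, east, north, and south neighbours of FPGA (x, y)
    on a cols x rows 2-D torus; modular arithmetic gives the wrap-around
    links at the grid edges."""
    return [((x - 1) % cols, y), ((x + 1) % cols, y),
            (x, (y - 1) % rows), (x, (y + 1) % rows)]

# A corner node's links wrap to the opposite edges of the grid:
print(torus_neighbours(0, 0))  # [(5, 0), (1, 0), (0, 7), (0, 1)]
```

Every node has exactly four links, so the cable count per half-rack stays fixed regardless of position in the grid.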

712 citations


Cites methods from "Maxwell - a 64 FPGA Supercomputer"

  • ...Maxwell [4] is the most similar to our design, as it directly connects FPGAs in a 2-D torus using InfiniBand cables, although the FPGAs do not implement routing logic....


Proceedings ArticleDOI
15 Oct 2016
TL;DR: A new cloud architecture that uses reconfigurable logic to accelerate both network plane functions and applications, and is much more scalable than prior work which used secondary rack-scale networks for inter-FPGA communication.
Abstract: Hyperscale datacenter providers have struggled to balance the growing need for specialized hardware (efficiency) with the economic benefits of homogeneity (manageability). In this paper we propose a new cloud architecture that uses reconfigurable logic to accelerate both network plane functions and applications. This Configurable Cloud architecture places a layer of reconfigurable logic (FPGAs) between the network switches and the servers, enabling network flows to be programmably transformed at line rate, enabling acceleration of local applications running on the server, and enabling the FPGAs to communicate directly, at datacenter scale, to harvest remote FPGAs unused by their local servers. We deployed this design over a production server bed, and show how it can be used for both service acceleration (Web search ranking) and network acceleration (encryption of data in transit at high speeds). This architecture is much more scalable than prior work which used secondary rack-scale networks for inter-FPGA communication. By coupling to the network plane, direct FPGA-to-FPGA messages can be achieved at comparable latency to previous work, without the secondary network. Additionally, the scale of direct inter-FPGA messaging is much larger. The average round-trip latencies observed in our measurements among 24, 1,000, and 250,000 machines are under 3, 9, and 20 microseconds, respectively. The Configurable Cloud architecture has been deployed at hyperscale in Microsoft's production datacenters worldwide.

512 citations


Cites background from "Maxwell - a 64 FPGA Supercomputer"

  • ...Another example is Maxwell [18], which also provides rack-scale direct FPGA-to-FPGA communication....


Proceedings ArticleDOI
21 Feb 2010
TL;DR: A Map-Reduce framework for the Axel cluster is presented which exploits spatial and temporal locality through different types of processing elements and communication channels, and enables the first demonstration of FPGAs, GPUs and CPUs running collaboratively for N-body simulation.
Abstract: This paper describes a heterogeneous computer cluster called Axel. Axel contains a collection of nodes; each node can include multiple types of accelerators such as FPGAs (Field Programmable Gate Arrays) and GPUs (Graphics Processing Units). A Map-Reduce framework for the Axel cluster is presented which exploits spatial and temporal locality through different types of processing elements and communication channels. The Axel system enables the first demonstration of FPGAs, GPUs and CPUs running collaboratively for N-body simulation. Performance improvement from 4.4 times to 22.7 times has been achieved using our approach, which shows that the Axel system can combine the benefits of the specialization of FPGA, the parallelism of GPU, and the scalability of computer clusters.
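The map-reduce pattern that Axel's framework builds on can be sketched in a few lines. The single-machine Python below illustrates the programming model only; it is not the Axel framework's actual API, and the word-count map/reduce functions are hypothetical examples:

```python
from collections import defaultdict

def map_reduce(inputs, map_fn, reduce_fn):
    """Generic map-reduce skeleton: map each input record to (key, value)
    pairs, group values by key, then reduce each group to one result."""
    groups = defaultdict(list)
    for record in inputs:                      # map phase
        for key, value in map_fn(record):
            groups[key].append(value)
    return {key: reduce_fn(key, values)        # reduce phase
            for key, values in groups.items()}

# Classic word-count example of the model:
docs = ["a b a", "b c"]
counts = map_reduce(docs,
                    lambda doc: [(w, 1) for w in doc.split()],
                    lambda key, values: sum(values))
print(counts)  # {'a': 2, 'b': 2, 'c': 1}
```

In a heterogeneous cluster such as Axel, the map tasks are what get dispatched to FPGAs, GPUs, or CPUs according to which processing element suits each workload.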

176 citations


Cites methods from "Maxwell - a 64 FPGA Supercomputer"

  • ...Building a cluster of computers is a common technique to realize the parallel computing model....


Patent
17 Apr 2012
TL;DR: In this article, a method, system, and computer-readable storage medium for using a distributed computing system are disclosed, and a workflow is generated to indicate that the application is to be executed using the computing resource(s).
Abstract: A method, system, and computer-readable storage medium for using a distributed computing system are disclosed. For example, one method involves receiving one or more parameters. The one or more parameters indicate one or more operations. The method also involves selecting one or more computing resources from computing resources. This selecting is based on the parameter(s). An application is configured to be executed using the computing resource(s). The method also involves generating a workflow. The workflow indicates that the application is to be executed using the computing resource(s). The workflow indicates that the application performs the operation(s). The method also involves communicating at least a part of the workflow to one or more nodes, where the node(s) include the computing resource(s).

98 citations

References
Journal ArticleDOI
TL;DR: In this paper, a theoretical valuation formula for options is derived, based on the assumption that options are correctly priced in the market and it should not be possible to make sure profits by creating portfolios of long and short positions in options and their underlying stocks.
Abstract: If options are correctly priced in the market, it should not be possible to make sure profits by creating portfolios of long and short positions in options and their underlying stocks. Using this principle, a theoretical valuation formula for options is derived. Since almost all corporate liabilities can be viewed as combinations of options, the formula and the analysis that led to it are also applicable to corporate liabilities such as common stock, corporate bonds, and warrants. In particular, the formula can be used to derive the discount that should be applied to a corporate bond because of the possibility of default.
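This is the Black-Scholes paper, one of Maxwell's finance-demonstrator references. The standard closed form it derives for a European call, with spot S, strike K, maturity T, risk-free rate r, and volatility sigma, can be rendered in a few lines of Python (a minimal sketch of the textbook formula, not code from either paper):

```python
from math import log, sqrt, exp, erf

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def black_scholes_call(S, K, T, r, sigma):
    """Black-Scholes price of a European call:
    C = S*N(d1) - K*exp(-r*T)*N(d2)."""
    d1 = (log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    return S * norm_cdf(d1) - K * exp(-r * T) * norm_cdf(d2)

# At-the-money call, 1 year out, r = 5%, sigma = 20%:
print(round(black_scholes_call(100.0, 100.0, 1.0, 0.05, 0.2), 2))  # 10.45
```

Monte Carlo evaluation of option prices like this is a typical workload for the FPGA-accelerated finance demonstrator described in the Maxwell paper.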

28,434 citations

Journal ArticleDOI
TL;DR: The hardware aspects of reconfigurable computing machines, from single chip architectures to multi-chip systems, including internal structures and external coupling are explored, and the software that targets these machines is focused on.
Abstract: Due to its potential to greatly accelerate a wide variety of applications, reconfigurable computing has become a subject of a great deal of research. Its key feature is the ability to perform computations in hardware to increase performance, while retaining much of the flexibility of a software solution. In this survey, we explore the hardware aspects of reconfigurable computing machines, from single chip architectures to multi-chip systems, including internal structures and external coupling. We also focus on the software that targets these machines, such as compilation tools that map high-level algorithms directly to the reconfigurable substrate. Finally, we consider the issues involved in run-time reconfigurable systems, which reuse the configurable hardware during program execution.

1,666 citations

Proceedings ArticleDOI
23 Apr 2007
TL;DR: The reconfigurable computing cluster (RCC) project is a multi-institution, multi-disciplinary project investigating the use of Platform FPGAs to build cost-effective petascale computers and a 64-node prototype cluster is described.
Abstract: While medium- and large-sized computing centers have increasingly relied on clusters of commodity PC hardware to provide cost-effective capacity and capability, it is not clear that this technology will scale to the PetaFLOP range. It is expected that semiconductor technology will continue its exponential advancements over the next fifteen years; however, new issues are rapidly emerging and the relative importance of current performance metrics is shifting. Future PetaFLOP architectures will require system designers to solve computer architecture problems ranging from how to house, power, and cool the machine, all the while remaining sensitive to cost. The reconfigurable computing cluster (RCC) project is a multi-institution, multi-disciplinary project investigating the use of Platform FPGAs to build cost-effective petascale computers. This paper describes the nascent project's objectives and a 64-node prototype cluster. Specifically, the aim is to provide a detailed motivation for the project, describe the design principles guiding development, and present a preliminary performance assessment. Microbenchmark results are reported to answer several pragmatic questions about key subsystems, including the system software, network performance, memory bandwidth, and power consumption of nodes in the cluster. Results suggest that the approach is sound.

67 citations

Journal ArticleDOI
TL;DR: A comparative analysis of FPGAs and traditional processors is presented, focusing on floating-point performance and procurement costs, revealing economic hurdles in the adoption of FPGAs for general high-performance computing (HPC).
Abstract: For certain applications, custom computational hardware created using field programmable gate arrays (FPGAs) can produce significant performance improvements over processors, leading some in academia and industry to call for the inclusion of FPGAs in supercomputing clusters. This paper presents a comparative analysis of FPGAs and traditional processors, focusing on floating-point performance and procurement costs, revealing economic hurdles in the adoption of FPGAs for general high-performance computing (HPC).

63 citations

Journal ArticleDOI
TL;DR: The marine controlled-source electromagnetic (CSEM) sounding method is rapidly gaining acceptance as an exploration tool for detecting and delineating hydrocarbon reservoirs, as discussed by the authors; however, distinguishing hydrocarbon fluids from water within these structures is more problematic.
Abstract: The marine controlled-source electromagnetic (CSEM) sounding method is rapidly gaining acceptance as an exploration tool for detecting and delineating hydrocarbon reservoirs. Whereas seismic surveys can detect the structures that may contain hydrocarbons with great accuracy, distinguishing hydrocarbon fluids from water within these structures is more problematic. As a result, less than a third of exploration wells result in a commercial discovery.

52 citations